About
Researching a result is not nearly as valuable without appropriately communicating it…
Activity
-
I enjoy finding statues of Nikola Tesla, and luckily, there are many around the world. One was at the Croatian Machine Learning Workshop (CMLW)…
Liked by Petar Veličković
-
I've just been promoted to Senior Staff Research Scientist at Google DeepMind! 🚀 This energises me to push even stronger on exciting foundational…
Posted by Petar Veličković
-
I love a good leaderboard... or several! ⏫ TGR 🐅 is our graph rewiring method for temporal graphs leveraging expander graph propagation…
Shared by Petar Veličković
Experience
Education
-
University of Cambridge
-
Activities and Societies: Computer Science Tripos Supervisor (Computer Laboratory, Colleges: Trinity, Jesus, Wolfson, Girton, Churchill, Sidney Sussex, Homerton, Magdalene, Hughes Hall, Newnham, Queens')
Thesis title: “The resurgence of structure in deep neural networks” (Accepted without corrections; thesis committee: Prof John Daugman, Prof Jure Leskovec)
Worked within the Artificial Intelligence Group of the Department of Computer Science and Technology.
Supervised by Prof Pietro Liò.
-
Activities and Societies: Computer Laboratory Open Days Demonstrator
Final year dissertation: "Molecular multiplex network inference", supervised by Dr Pietro Liò.
-
-
Final year project: "Development of a software emulator for the GameBoy console"
Publications
-
Pointer Graph Networks
Thirty-fourth Conference on Neural Information Processing Systems (NeurIPS 2020)
Graph neural networks (GNNs) are typically applied to static graphs that are assumed to be known upfront. This static input structure is often informed purely by insight of the machine learning practitioner, and might not be optimal for the actual task the GNN is solving. In the absence of reliable domain expertise, one might resort to inferring the latent graph structure, which is often difficult due to the vast search space of possible graphs. Here we introduce Pointer Graph Networks (PGNs) which augment sets or graphs with additional inferred edges for improved model expressivity. PGNs allow each node to dynamically point to another node, followed by message passing over these pointers. The sparsity of this adaptable graph structure makes learning tractable while still being sufficiently expressive to simulate complex algorithms. Critically, the pointing mechanism is directly supervised to model long-term sequences of operations on classical data structures, incorporating useful structural inductive biases from theoretical computer science. Qualitatively, we demonstrate that PGNs can learn parallelisable variants of pointer-based data structures, namely disjoint set unions and link/cut trees. PGNs generalise out-of-distribution to 5x larger test inputs on dynamic graph connectivity tasks, outperforming unrestricted GNNs and Deep Sets.
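A minimal NumPy sketch of the pointer mechanism described in the abstract may help: each node scores all others, points at its argmax, and the resulting sparse pointer edges augment the graph before one round of message passing. All weights here are random stand-ins, and the actual PGN supervises the pointers against ground-truth data-structure operations rather than choosing them unsupervised:

```python
import numpy as np

def pgn_step(h, adj, W_q, W_k, W_msg):
    scores = (h @ W_q) @ (h @ W_k).T          # pointer logits over all pairs
    np.fill_diagonal(scores, -np.inf)         # disallow self-pointers
    ptr = scores.argmax(axis=1)               # node i points at ptr[i]
    P = np.zeros_like(adj)
    P[np.arange(len(h)), ptr] = 1.0           # one inferred edge per node
    G = np.clip(adj + P + P.T, 0.0, 1.0)      # symmetrised augmented graph
    deg = G.sum(axis=1, keepdims=True)
    return np.maximum(0, (G / deg) @ h @ W_msg)  # message passing over G

rng = np.random.default_rng(0)
N, d = 6, 8
h = rng.normal(size=(N, d))
out = pgn_step(h, np.eye(N),                  # start from a set (self-loops only)
               *(rng.normal(size=(d, d)) for _ in range(3)))
```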
-
Neural Execution of Graph Algorithms
The 8th International Conference on Learning Representations (ICLR 2020)
Graph Neural Networks (GNNs) are a powerful representational tool for solving problems on graph-structured inputs. In almost all cases so far, however, they have been applied to directly recovering a final solution from raw inputs, without explicit guidance on how to structure their problem-solving. Here, instead, we focus on learning in the space of algorithms: we train several state-of-the-art GNN architectures to imitate individual steps of classical graph algorithms, parallel (breadth-first search, Bellman-Ford) as well as sequential (Prim's algorithm). As graph algorithms usually rely on making discrete decisions within neighbourhoods, we hypothesise that maximisation-based message passing neural networks are best-suited for such objectives, and validate this claim empirically. We also demonstrate how learning in the space of algorithms can yield new opportunities for positive transfer between tasks---showing how learning a shortest-path algorithm can be substantially improved when simultaneously learning a reachability algorithm.
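To make the alignment argument concrete, here is a hedged NumPy sketch (not the paper's code) contrasting one Bellman-Ford relaxation step with one step of a max-aggregation message-passing layer; weight matrices are illustrative stand-ins for learned parameters, and self-loops are assumed so every node receives at least one message:

```python
import numpy as np

def bellman_ford_step(dist, w):
    """One relaxation step: the 'ground truth' the GNN is trained to imitate.
    dist: (N,) current distance estimates;  w: (N, N) weights, np.inf = no edge."""
    return np.minimum(dist, (dist[:, None] + w).min(axis=0))

def max_mpnn_step(h, adj, W_msg, W_upd):
    """Each node aggregates neighbour messages with an elementwise max,
    mirroring the discrete min/max choice Bellman-Ford makes per neighbourhood."""
    msgs = h @ W_msg                                       # (N, d) messages
    m = np.where(adj[:, :, None] > 0,                      # mask non-neighbours
                 msgs[None, :, :], -np.inf).max(axis=1)    # max aggregation
    return np.maximum(0, np.concatenate([h, m], axis=-1) @ W_upd)

# tiny demo of the target computation on a 3-node path graph
w = np.array([[0, 1, np.inf],
              [np.inf, 0, 2],
              [np.inf, np.inf, 0]], dtype=float)
dist = np.array([0, np.inf, np.inf])
print(bellman_ford_step(bellman_ford_step(dist, w), w))    # [0. 1. 3.]
```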
-
Deep Graph Infomax
The 7th International Conference on Learning Representations (ICLR 2019)
We present Deep Graph Infomax (DGI), a general approach for learning node representations within graph-structured data in an unsupervised manner. DGI relies on maximizing mutual information between patch representations and corresponding high-level summaries of graphs---both derived using established graph convolutional network architectures. The learnt patch representations summarize subgraphs centered around nodes of interest, and can thus be reused for downstream node-wise learning tasks. In contrast to most prior approaches to unsupervised learning with GCNs, DGI does not rely on random walk objectives, and is readily applicable to both transductive and inductive learning setups. We demonstrate competitive performance on a variety of node classification benchmarks, which at times even exceeds the performance of supervised learning.
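As a rough illustration of the objective (a toy sketch under simplifying assumptions, not the released implementation): a one-layer GCN produces patch representations, a mean readout gives the graph summary, and a bilinear discriminator separates real patches from ones computed on corrupted (row-shuffled) features:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dgi_objective(X, A_hat, W_enc, W_disc, rng):
    """X: (N, F) features; A_hat: (N, N) normalised adjacency with self-loops;
    W_enc: (F, d) encoder weights; W_disc: (d, d) bilinear discriminator."""
    H = np.maximum(0, A_hat @ X @ W_enc)           # patch (node) representations
    s = sigmoid(H.mean(axis=0))                    # readout -> graph summary
    H_neg = np.maximum(0, A_hat @ X[rng.permutation(len(X))] @ W_enc)  # corrupted
    pos = sigmoid(H @ W_disc @ s)                  # discriminator on real patches
    neg = sigmoid(H_neg @ W_disc @ s)              # ... and on corrupted ones
    eps = 1e-9                                     # objective to be maximised
    return np.mean(np.log(pos + eps)) + np.mean(np.log(1.0 - neg + eps))
```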
-
The resurgence of structure in deep neural networks
University of Cambridge
Machine learning with deep neural networks ("deep learning") allows for learning complex features directly from raw input data, completely eliminating hand-crafted, "hard-coded" feature extraction from the learning pipeline. This has led to state-of-the-art performance being achieved across several---previously disconnected---problem domains, including computer vision, natural language processing, reinforcement learning and generative modelling. These success stories nearly universally go hand-in-hand with the availability of immense quantities of labelled training examples ("big data") exhibiting simple grid-like structure (e.g. text or images), exploitable through convolutional or recurrent layers. This is due to the extremely large number of degrees-of-freedom in neural networks, leaving their generalisation ability vulnerable to effects such as overfitting. However, there remain many domains where extensive data gathering is not always appropriate, affordable, or even feasible. Furthermore, data is generally organised in more complicated kinds of structure---which most existing approaches would simply discard. Examples of such tasks are abundant in the biomedical space, with e.g. small numbers of subjects available for any given clinical study, or relationships between proteins specified via interaction networks. I hypothesise that, if deep learning is to reach its full potential in such environments, we need to reconsider "hard-coded" approaches---integrating assumptions about inherent structure in the input data directly into our architectures and learning algorithms, through structural inductive biases. In this dissertation, I directly validate this hypothesis by developing three structure-infused neural network architectures (operating on sparse multimodal and graph-structured data), and a structure-informed learning algorithm for graph neural networks, demonstrating significant outperformance of conventional baseline models and algorithms.
-
Cross-modal recurrent models for weight objective prediction from multimodal time-series data
The 12th EAI International Conference on Pervasive Computing Technologies for Healthcare (PervasiveHealth 2018)
We analyse multimodal time-series data corresponding to weight, sleep and steps measurements. We focus on predicting whether a user will successfully achieve his/her weight objective. For this, we design several deep long short-term memory (LSTM) architectures, including a novel cross-modal LSTM (X-LSTM), and demonstrate their superiority over baseline approaches. The X-LSTM improves parameter efficiency by processing each modality separately and allowing for information flow between them by way of recurrent cross-connections. We present a general hyperparameter optimisation technique for X-LSTMs, which allows us to significantly improve on the LSTM and a prior state-of-the-art cross-modal approach, using a comparable number of parameters. Finally, we visualise the model’s predictions, revealing implications about latent variables in this task.
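A simplified sketch of the cross-connection wiring, with a plain tanh RNN cell standing in for the LSTM cell (an illustrative simplification; all weights are random stand-ins for learned parameters):

```python
import numpy as np

def xrnn_step(h, x, W_in, W_rec, W_cross):
    """One time step for two modalities: each hidden state is updated from its
    own input and recurrence, plus a recurrent cross-connection from the other
    modality's previous hidden state."""
    new_h = []
    for m in range(2):
        other = 1 - m
        pre = x[m] @ W_in[m] + h[m] @ W_rec[m] + h[other] @ W_cross[m]
        new_h.append(np.tanh(pre))
    return new_h

rng = np.random.default_rng(0)
d_in, d_h = 3, 5
W_in    = [rng.normal(size=(d_in, d_h)) for _ in range(2)]
W_rec   = [rng.normal(size=(d_h, d_h)) for _ in range(2)]
W_cross = [rng.normal(size=(d_h, d_h)) for _ in range(2)]
h = [np.zeros(d_h), np.zeros(d_h)]
for t in range(4):   # e.g. one stream per modality (weight, sleep, ...)
    h = xrnn_step(h, [rng.normal(size=d_in), rng.normal(size=d_in)],
                  W_in, W_rec, W_cross)
```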
-
Graph Attention Networks
The 6th International Conference on Learning Representations (ICLR 2018)
We present graph attention networks (GATs), novel neural network architectures that operate on graph-structured data, leveraging masked self-attentional layers to address the shortcomings of prior methods based on graph convolutions or their approximations. By stacking layers in which nodes are able to attend over their neighborhoods' features, we enable (implicitly) specifying different weights to different nodes in a neighborhood, without requiring any kind of computationally intensive matrix operation (such as inversion) or depending on knowing the graph structure upfront. In this way, we address several key challenges of spectral-based graph neural networks simultaneously, and make our model readily applicable to inductive as well as transductive problems. Our GAT models have achieved or matched state-of-the-art results across four established transductive and inductive graph benchmarks: the Cora, Citeseer and Pubmed citation network datasets, as well as a protein-protein interaction dataset (wherein test graphs remain unseen during training).
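For readers who want the mechanism concretely, here is a minimal single-head NumPy sketch of the layer described above; dimensions and names are illustrative, and the published model additionally uses multi-head attention, dropout and output nonlinearities:

```python
import numpy as np

def gat_layer(h, adj, W, a, alpha=0.2):
    """h: (N, F) features; adj: (N, N) adjacency with self-loops;
    W: (F, F') shared transform; a: (2F',) attention vector."""
    z = h @ W                                      # transformed features
    N = z.shape[0]
    e = np.zeros((N, N))
    for i in range(N):                             # e_ij = LeakyReLU(a^T [z_i || z_j])
        for j in range(N):
            s = np.concatenate([z[i], z[j]]) @ a
            e[i, j] = s if s > 0 else alpha * s
    e = np.where(adj > 0, e, -np.inf)              # mask: attend only over neighbours
    e -= e.max(axis=1, keepdims=True)              # stable softmax
    att = np.where(adj > 0, np.exp(e), 0.0)
    att /= att.sum(axis=1, keepdims=True)
    return att @ z                                 # attention-weighted combination

rng = np.random.default_rng(0)
N, F_in, F_out = 5, 8, 4
adj = ((rng.random((N, N)) < 0.3) | np.eye(N, dtype=bool)).astype(float)
out = gat_layer(rng.normal(size=(N, F_in)), adj,
                rng.normal(size=(F_in, F_out)), rng.normal(size=(2 * F_out,)))
```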
-
X-CNN: Cross-modal convolutional neural networks for sparse datasets
The 7th IEEE Symposium Series on Computational Intelligence (IEEE SSCI 2016)
In this paper we propose cross-modal convolutional neural networks (X-CNNs), a novel, biologically inspired type of CNN architecture, treating gradient descent-specialised CNNs as individual units of processing in a larger-scale network topology, while allowing for unconstrained information flow and/or weight sharing between analogous hidden layers of the network---thus generalising the already well-established concept of neural network ensembles (where information typically may flow only between the output layers of the individual networks). The constituent networks are individually designed to learn the output function on their own subset of the input data, after which cross-connections between them are introduced after each pooling operation to periodically allow for information exchange between them. This injection of knowledge into a model (by prior partition of the input data through domain knowledge or unsupervised methods) is expected to yield greatest returns in sparse data environments, which are typically less suitable for training CNNs. For evaluation purposes, we have compared a standard four-layer CNN as well as a sophisticated FitNet4 architecture against their cross-modal variants on the CIFAR-10 and CIFAR-100 datasets with differing percentages of the training data being removed, and find that at lower levels of data availability, the X-CNNs significantly outperform their baselines (typically providing a 2--6% benefit, depending on the dataset size and whether data augmentation is used), while still maintaining an edge on all of the full dataset tests.
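A structural NumPy sketch of the cross-connection idea (illustrative only, far simpler than the evaluated architectures): after each pooling stage, every stream concatenates a projected copy of the other stream's feature maps. The luma/chroma split below is one plausible modality partition, not necessarily the paper's exact setup:

```python
import numpy as np

def pool(f):
    # 2x mean-pooling along the spatial axis of a (channels, length) map
    return f.reshape(f.shape[0], -1, 2).mean(axis=2)

def cross_connect(fa, fb, P_ab, P_ba):
    # each stream concatenates a projected copy of the other's feature maps
    return (np.concatenate([fa, P_ba @ fb], axis=0),
            np.concatenate([fb, P_ab @ fa], axis=0))

rng = np.random.default_rng(0)
fa = rng.normal(size=(4, 8))    # e.g. luma-stream feature maps
fb = rng.normal(size=(2, 8))    # e.g. chroma-stream feature maps
fa, fb = cross_connect(pool(fa), pool(fb),
                       P_ab=rng.normal(size=(2, 4)),
                       P_ba=rng.normal(size=(4, 2)))
```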
Patents
-
Generating Prediction Outputs Using Dynamic Graphs
Filed 17/338,974
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating prediction outputs characterizing a set of entities. In one aspect, a method comprises: obtaining data defining a graph, comprising: (i) a set of nodes, wherein each node represents a respective entity from the set of entities, (ii) a current set of edges, wherein each edge connects a pair of nodes, and (iii) a respective current embedding of each node; at each of a plurality of time steps: updating the respective current embedding of each node, comprising processing data defining the graph using a graph neural network; and updating the current set of edges based at least in part on the updated embeddings of the nodes; and at one or more of the plurality of time steps: generating a prediction output characterizing the set of entities based on the current embeddings of the nodes.
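A toy sketch of the claimed loop, with k-nearest-neighbour rewiring in embedding space as an illustrative choice of edge update (not necessarily the filed mechanism):

```python
import numpy as np

def dynamic_graph_rollout(h, adj, W, k, steps, predict):
    """At each time step: (i) update node embeddings with a GNN pass,
    (ii) update the edge set from the new embeddings, (iii) read out a
    prediction over the set of entities."""
    preds = []
    for _ in range(steps):
        deg = np.maximum(adj.sum(axis=1, keepdims=True), 1.0)
        h = np.maximum(0, (adj / deg) @ h @ W)        # (i) GNN embedding update
        sims = h @ h.T
        np.fill_diagonal(sims, -np.inf)
        nbrs = np.argsort(-sims, axis=1)[:, :k]       # (ii) top-k neighbours
        adj = np.zeros_like(adj)
        np.put_along_axis(adj, nbrs, 1.0, axis=1)     # updated edge set
        preds.append(predict(h))                      # (iii) per-step prediction
    return preds

rng = np.random.default_rng(0)
N, d = 8, 4
preds = dynamic_graph_rollout(rng.normal(size=(N, d)), np.eye(N),
                              rng.normal(size=(d, d)), k=3, steps=5,
                              predict=lambda h: h.mean())
```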
-
User Behaviour Monitoring
Filed EU EP3483792A1
A method and apparatus is described comprising: receiving first user data of a first user from a plurality of data sources; inferring at least some missing data points in said first user data using a model to predict the missing data points, wherein the model is trained using selected data extracted from stored data relating to a plurality of users; generating modified user data from said first user data and at least some of said inferred missing data points; and converting said modified user data into a plurality of representations having defined length sequences for each data source.
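An illustrative sketch of the claimed pipeline for a single data source, with linear interpolation standing in for the trained imputation model (a simplifying assumption; the filing describes a model trained on data from many users):

```python
import numpy as np

def impute_and_resample(t, v, length):
    """t: (n,) increasing observation times; v: (n,) values, np.nan = missing.
    Infers missing points, then converts the series into a defined-length
    representation by resampling on a uniform grid."""
    ok = ~np.isnan(v)
    filled = np.interp(t, t[ok], v[ok])          # infer the missing points
    grid = np.linspace(t[0], t[-1], length)      # defined-length time grid
    return np.interp(grid, t, filled)

t = np.arange(10.0)                              # observation times, one source
v = np.sin(t)
v[[2, 5, 6]] = np.nan                            # some missing measurements
fixed = impute_and_resample(t, v, length=32)     # 32-step representation
```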
Honors & Awards
-
Outstanding Reviewer
Ninth International Conference on Learning Representations
Awarded to the top reviewers for ICLR 2021.
-
Best Short Paper Runner-Up
AAAI-21 International Workshop on Health Intelligence (W3PHIAI 2021)
Awarded for "Predicting Patient Outcomes with Graph Representation Learning", co-authored with Emma Rocheteau, Catherine Tong, Nicholas Lane and Pietro Liò
-
Top Reviewer
Thirty-seventh International Conference on Machine Learning
Awarded to the top 33% of reviewers for ICML 2020.
-
Best Reviewer Award
Thirty-third Conference on Neural Information Processing Systems
Awarded to the top 400 reviewers for NeurIPS 2019.
-
ICLR 2019 Travel Award
The 7th International Conference on Learning Representations
-
TeleSign Datathon Winner
TeleSign
-
Bronze Medal
Hack Cambridge 4D
Awarded the 3rd best project award (out of ~70 participating teams) at the 24h Hack Cambridge 4D hackathon (a Major League Hacking event), as a member of team N-hance (pls).
-
ICLR 2018 Travel Award
The 6th International Conference on Learning Representations
-
NIPS TSW 2017 Travel Award
NIPS 2017 Time Series Workshop
-
ACM-ICPC Northwestern Europe Regional Contest 2017 - Gold Medal (Coach)
Association for Computing Machinery
-
Computer Laboratory Wiseman Award
Department of Computer Science and Technology, University of Cambridge
Awarded for making a commendable contribution to the work of the Department in the 2016/17 academic year.
-
Finalist
Google Hash Code 2017
Ranked 5th in the Final Round (out of 50 participating teams)
Ranked 53rd in the Qualification Round (out of ~3000 participating teams)
-
Bronze Medal
Hack Cambridge Recurse
Awarded the 3rd best project award (out of ~100 participating teams) at the 24h Hack Cambridge Recurse hackathon (a Major League Hacking event), as a member of team facejack.
-
ACM-ICPC Northwestern Europe Regional Contest 2016 - Silver Medal
Association for Computing Machinery
-
Finalist
Hack Cambridge
Nominated as one of the top 7 projects (out of ~100 participating teams) at the 24h Hack Cambridge hackathon (a Major League Hacking event), as a member of team Viral.
-
ACM-ICPC Northwestern Europe Regional Contest 2014 - Bronze Medal
Association for Computing Machinery
-
Senior Scholar
Trinity College, University of Cambridge
Elected due to high performance in the examinations for Part IB of the Computer Science Tripos.
Re-elected for another year (in July 2015), due to high performance in the examinations for Part II of the Computer Science Tripos.
-
Junior Scholar
Trinity College, University of Cambridge
Elected due to high performance in the examinations for Part IA of the Computer Science Tripos.
-
Dositeja Award
Fund for Young Talents of Serbia
Awarded to high-school students for strong performance at recognised national and international competitions during the 2011/12 school year.
-
Best Final Year Project in the area of Informatics
Matematička gimnazija
-
Cambridge Overseas Trust Scholarship
Cambridge Commonwealth, European & International Trust
-
Trinity Overseas Bursary
Trinity College, University of Cambridge
-
Serbian High School Competition Awards
Mathematical and Physical Societies of Serbia
– Serbian Olympiad in Informatics - Third Prize (May 2012)
– National Competition in Informatics - Second Prize (April 2012)
– National Competition in Physics - Honourable Mention (April 2011)
– National Competition in Mathematics - Honourable Mention (March 2010)
– National Competition in Physics - Honourable Mention (April 2009)
Languages
-
English
Full professional proficiency
-
Serbian
Native or bilingual proficiency
-
French
Elementary proficiency
-
Romanian
Limited working proficiency
More activity by Petar
-
The first day of DND-24 was awesome 🔥 Today we go all in on STEM topics. My talks will be: * 7 min plenary in the second AI session (starting…
Shared by Petar Veličković
-
I'm attending DND-24 at the Rectorate of the University of Montenegro in Podgorica 🇲🇪 for the next two days; looking forward to meeting the most…
Shared by Petar Veličković
-
Round and Round we Go! 🔄 Rotary Positional Encodings (RoPE) are a common staple of frontier LLMs. _Why_ do they work so well, and _how_ do LLMs…
Shared by Petar Veličković
-
I did say Google DeepMind would one day win the Nobel Prize, I just didn't expect it to come so fast! 🧬 Heartfelt congratulations to the entire…
Shared by Petar Veličković
-
Next stop: Podgorica 🇲🇪 See you soon at DND-24 🚀
Shared by Petar Veličković
-
Expander Graph Propagation has levelled up! ⏫ Introducing Cayley Graph Propagation 🕸 https://lnkd.in/edVnQpQY While EGP truncates Cayley graphs…
Shared by Petar Veličković
-
I recently made my very first visit to Skopje 🇲🇰 for the AI Tech Summit! I had ample and inspiring interactions with the local AI community, and…
Shared by Petar Veličković