Petar Veličković

London Area, United Kingdom
20K followers · 500+ connections

About

Researching a result is not nearly as valuable without appropriately communicating it…


Experience

  • Google DeepMind

    London, England, United Kingdom

  • -

    Cambridge, England, United Kingdom

  • -

    Cambridge, England, United Kingdom

  • -

    London, England, United Kingdom

  • -

    London, England, United Kingdom

  • -

    London, England, United Kingdom

  • -

    Cambridge, United Kingdom

  • -

    Montreal, Canada Area

  • -

    Montreal, Canada Area

  • -

    Cambridge, United Kingdom

  • -

    London, United Kingdom

  • -

    Cambridge, United Kingdom

  • -

    Belgrade, Serbia

Education

  • University of Cambridge

    -

    Activities and Societies: Computer Science Tripos Supervisor (Computer Laboratory, Colleges: Trinity, Jesus, Wolfson, Girton, Churchill, Sidney Sussex, Homerton, Magdalene, Hughes Hall, Newnham, Queens')

    Thesis title: “The resurgence of structure in deep neural networks” (Accepted without corrections; thesis committee: Prof John Daugman, Prof Jure Leskovec)

    Worked within the Artificial Intelligence Group of the Department of Computer Science and Technology.

    Supervised by Prof Pietro Liò.

  • -

    Activities and Societies: Computer Laboratory Open Days Demonstrator

    Final year dissertation: "Molecular multiplex network inference", supervised by Dr Pietro Liò.

  • -

    Final year project: "Development of a software emulator for the GameBoy console"

Publications

  • Pointer Graph Networks

    Thirty-fourth Conference on Neural Information Processing Systems (NeurIPS 2020)

    Graph neural networks (GNNs) are typically applied to static graphs that are assumed to be known upfront. This static input structure is often informed purely by insight of the machine learning practitioner, and might not be optimal for the actual task the GNN is solving. In absence of reliable domain expertise, one might resort to inferring the latent graph structure, which is often difficult due to the vast search space of possible graphs. Here we introduce Pointer Graph Networks (PGNs) which augment sets or graphs with additional inferred edges for improved model expressivity. PGNs allow each node to dynamically point to another node, followed by message passing over these pointers. The sparsity of this adaptable graph structure makes learning tractable while still being sufficiently expressive to simulate complex algorithms. Critically, the pointing mechanism is directly supervised to model long-term sequences of operations on classical data structures, incorporating useful structural inductive biases from theoretical computer science. Qualitatively, we demonstrate that PGNs can learn parallelisable variants of pointer-based data structures, namely disjoint set unions and link/cut trees. PGNs generalise out-of-distribution to 5x larger test inputs on dynamic graph connectivity tasks, outperforming unrestricted GNNs and Deep Sets.
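    The pointer supervision described above comes from classical data structures. As a concrete illustration (a sketch, not the paper's code), a minimal disjoint-set union with path compression shows the kind of long-term pointer rewiring a PGN is trained to imitate:

```python
class DSU:
    """Disjoint-set union with path compression: each node stores a
    pointer to a parent; find() rewires pointers toward the root."""
    def __init__(self, n):
        self.parent = list(range(n))  # every node initially points to itself

    def find(self, u):
        # Path compression: pointers are dynamically rewired -- the kind
        # of pointer update a PGN is directly supervised to predict.
        while self.parent[u] != u:
            self.parent[u] = self.parent[self.parent[u]]
            u = self.parent[u]
        return u

    def union(self, u, v):
        ru, rv = self.find(u), self.find(v)
        if ru != rv:
            self.parent[ru] = rv
        return ru != rv  # True if the two sets were merged

dsu = DSU(5)
dsu.union(0, 1)
dsu.union(1, 2)
print(dsu.find(0) == dsu.find(2))  # → True: 0 and 2 are now connected
```

    Each node holding exactly one outgoing pointer is what keeps the inferred graph sparse enough for tractable message passing.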

  • Neural Execution of Graph Algorithms

    The 8th International Conference on Learning Representations (ICLR 2020)

    Graph Neural Networks (GNNs) are a powerful representational tool for solving problems on graph-structured inputs. In almost all cases so far, however, they have been applied to directly recovering a final solution from raw inputs, without explicit guidance on how to structure their problem-solving. Here, instead, we focus on learning in the space of algorithms: we train several state-of-the-art GNN architectures to imitate individual steps of classical graph algorithms, parallel (breadth-first search, Bellman-Ford) as well as sequential (Prim's algorithm). As graph algorithms usually rely on making discrete decisions within neighbourhoods, we hypothesise that maximisation-based message passing neural networks are best-suited for such objectives, and validate this claim empirically. We also demonstrate how learning in the space of algorithms can yield new opportunities for positive transfer between tasks---showing how learning a shortest-path algorithm can be substantially improved when simultaneously learning a reachability algorithm.
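    The per-step imitation target can be pictured concretely. A hedged sketch of one parallel Bellman-Ford relaxation step (illustrative names and graph, not the paper's code), producing the intermediate distance estimates a GNN would be trained to reproduce:

```python
import math

def bellman_ford_step(dist, edges):
    """One parallel relaxation step: every edge (u, v, w) proposes
    dist[u] + w as a new estimate for v. Returns the next distances."""
    new = list(dist)
    for u, v, w in edges:
        if dist[u] + w < new[v]:
            new[v] = dist[u] + w
    return new

# Tiny graph: 0 -> 1 (weight 1), 1 -> 2 (weight 2), 0 -> 2 (weight 5)
edges = [(0, 1, 1.0), (1, 2, 2.0), (0, 2, 5.0)]
dist = [0.0, math.inf, math.inf]   # source is node 0
trajectory = [dist]
for _ in range(2):                 # |V| - 1 steps suffice here
    dist = bellman_ford_step(dist, edges)
    trajectory.append(dist)
print(trajectory[-1])  # → [0.0, 1.0, 3.0]
```

    Supervising on every entry of `trajectory`, rather than only the final distances, is what "learning in the space of algorithms" refers to; the relaxation's min-style update also motivates the paper's preference for maximisation-based message passing.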

  • Deep Graph Infomax

    The 7th International Conference on Learning Representations (ICLR 2019)

    We present Deep Graph Infomax (DGI), a general approach for learning node representations within graph-structured data in an unsupervised manner. DGI relies on maximizing mutual information between patch representations and corresponding high-level summaries of graphs---both derived using established graph convolutional network architectures. The learnt patch representations summarize subgraphs centered around nodes of interest, and can thus be reused for downstream node-wise learning tasks. In contrast to most prior approaches to unsupervised learning with GCNs, DGI does not rely on random walk objectives, and is readily applicable to both transductive and inductive learning setups. We demonstrate competitive performance on a variety of node classification benchmarks, which at times even exceeds the performance of supervised learning.
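    A minimal numpy sketch of a DGI-style objective (an illustration under assumptions: real DGI derives patches with a GCN encoder, and all shapes and names here are invented for the example). True (patch, summary) pairs are pushed toward high scores and corrupted pairs toward low scores, as a proxy for maximising mutual information:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Patch (node) representations from some encoder, shape (n_nodes, dim).
H = rng.normal(size=(8, 4))
# Corrupted patches: here, a simple row shuffle of the representations.
H_corrupt = H[rng.permutation(8)]
# High-level summary: mean-pooled (readout) patch representations.
s = H.mean(axis=0)
# Bilinear discriminator D(h, s) = sigmoid(h^T W s).
W = rng.normal(size=(4, 4))

pos = sigmoid(H @ W @ s)          # scores for true (patch, summary) pairs
neg = sigmoid(H_corrupt @ W @ s)  # scores for corrupted pairs
# Binary cross-entropy over both sets of pairs.
loss = -(np.log(pos + 1e-9).mean() + np.log(1 - neg + 1e-9).mean()) / 2
print(float(loss))
```

    Because the objective needs no random walks, only an encoder, a readout and a corruption function, it applies unchanged in both transductive and inductive setups.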

  • The resurgence of structure in deep neural networks

    University of Cambridge

    Machine learning with deep neural networks ("deep learning") allows for learning complex features directly from raw input data, completely eliminating hand-crafted, "hard-coded" feature extraction from the learning pipeline. This has led to state-of-the-art performance being achieved across several---previously disconnected---problem domains, including computer vision, natural language processing, reinforcement learning and generative modelling. These success stories nearly universally go hand-in-hand with availability of immense quantities of labelled training examples ("big data") exhibiting simple grid-like structure (e.g. text or images), exploitable through convolutional or recurrent layers. This is due to the extremely large number of degrees-of-freedom in neural networks, leaving their generalisation ability vulnerable to effects such as overfitting. However, there remain many domains where extensive data gathering is not always appropriate, affordable, or even feasible. Furthermore, data is generally organised in more complicated kinds of structure---which most existing approaches would simply discard. Examples of such tasks are abundant in the biomedical space; with e.g. small numbers of subjects available for any given clinical study, or relationships between proteins specified via interaction networks. I hypothesise that, if deep learning is to reach its full potential in such environments, we need to reconsider "hard-coded" approaches---integrating assumptions about inherent structure in the input data directly into our architectures and learning algorithms, through structural inductive biases. In this dissertation, I directly validate this hypothesis by developing three structure-infused neural network architectures (operating on sparse multimodal and graph-structured data), and a structure-informed learning algorithm for graph neural networks, demonstrating significant outperformance of conventional baseline models and algorithms.

  • Cross-modal recurrent models for weight objective prediction from multimodal time-series data

    The 12th EAI International Conference on Pervasive Computing Technologies for Healthcare (PervasiveHealth 2018)

    We analyse multimodal time-series data corresponding to weight, sleep and steps measurements. We focus on predicting whether a user will successfully achieve his/her weight objective. For this, we design several deep long short-term memory (LSTM) architectures, including a novel cross-modal LSTM (X-LSTM), and demonstrate their superiority over baseline approaches. The X-LSTM improves parameter efficiency by processing each modality separately and allowing for information flow between them by way of recurrent cross-connections. We present a general hyperparameter optimisation technique for X-LSTMs, which allows us to significantly improve on the LSTM and a prior state-of-the-art cross-modal approach, using a comparable number of parameters. Finally, we visualise the model’s predictions, revealing implications about latent variables in this task.

  • Graph Attention Networks

    The 6th International Conference on Learning Representations (ICLR 2018)

    We present graph attention networks (GATs), novel neural network architectures that operate on graph-structured data, leveraging masked self-attentional layers to address the shortcomings of prior methods based on graph convolutions or their approximations. By stacking layers in which nodes are able to attend over their neighborhoods' features, we enable (implicitly) specifying different weights to different nodes in a neighborhood, without requiring any kind of computationally intensive matrix operation (such as inversion) or depending on knowing the graph structure upfront. In this way, we address several key challenges of spectral-based graph neural networks simultaneously, and make our model readily applicable to inductive as well as transductive problems. Our GAT models have achieved or matched state-of-the-art results across four established transductive and inductive graph benchmarks: the Cora, Citeseer and Pubmed citation network datasets, as well as a protein-protein interaction dataset (wherein test graphs remain unseen during training).
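    The masked self-attention described above fits in a few lines. A single-head layer in plain numpy (a sketch: shapes, initialisation and the loop-based neighbourhood masking are illustrative assumptions, not the reference implementation):

```python
import numpy as np

def leaky_relu(x, alpha=0.2):
    return np.where(x > 0, x, alpha * x)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def gat_layer(X, adj, W, a):
    """Single-head GAT layer.
    X: (n, f_in) node features; adj: (n, n) binary adjacency with self-loops;
    W: (f_in, f_out) shared linear map; a: (2 * f_out,) attention vector."""
    H = X @ W
    n = X.shape[0]
    out = np.zeros_like(H)
    for i in range(n):
        nbrs = np.flatnonzero(adj[i])
        # e_ij = LeakyReLU(a^T [W h_i || W h_j]), masked to the neighbourhood
        e = leaky_relu(np.array([a @ np.concatenate([H[i], H[j]]) for j in nbrs]))
        alpha = softmax(e)            # normalised attention coefficients
        out[i] = alpha @ H[nbrs]      # attention-weighted aggregation
    return out

rng = np.random.default_rng(1)
X = rng.normal(size=(4, 3))
adj = np.array([[1, 1, 0, 0],
                [1, 1, 1, 0],
                [0, 1, 1, 1],
                [0, 0, 1, 1]])
out = gat_layer(X, adj, rng.normal(size=(3, 2)), rng.normal(size=(4,)))
print(out.shape)  # → (4, 2)
```

    Note that only local neighbourhoods and shared weights are used, with no matrix inversion and no global graph quantities, which is what makes the layer applicable to inductive settings with unseen test graphs.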

  • X-CNN: Cross-modal convolutional neural networks for sparse datasets

    The 7th IEEE Symposium Series on Computational Intelligence (IEEE SSCI 2016)

    In this paper we propose cross-modal convolutional neural networks (X-CNNs), a novel biologically inspired type of CNN architectures, treating gradient descent-specialised CNNs as individual units of processing in a larger-scale network topology, while allowing for unconstrained information flow and/or weight sharing between analogous hidden layers of the network---thus generalising the already well-established concept of neural network ensembles (where information typically may flow only between the output layers of the individual networks). The constituent networks are individually designed to learn the output function on their own subset of the input data, after which cross-connections between them are introduced after each pooling operation to periodically allow for information exchange between them. This injection of knowledge into a model (by prior partition of the input data through domain knowledge or unsupervised methods) is expected to yield greatest returns in sparse data environments, which are typically less suitable for training CNNs. For evaluation purposes, we have compared a standard four-layer CNN as well as a sophisticated FitNet4 architecture against their cross-modal variants on the CIFAR-10 and CIFAR-100 datasets with differing percentages of the training data being removed, and find that at lower levels of data availability, the X-CNNs significantly outperform their baselines (typically providing a 2--6% benefit, depending on the dataset size and whether data augmentation is used), while still maintaining an edge on all of the full dataset tests.
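    The cross-connection idea can be sketched abstractly (a toy numpy illustration under assumptions: real X-CNNs use convolutional towers per modality, while here each "stream" is a flat feature vector and the projections are random):

```python
import numpy as np

rng = np.random.default_rng(2)

def pool(x):
    """2x downsampling by average pooling along the feature axis."""
    return x.reshape(-1, 2).mean(axis=1)

def cross_connect(a, b, P_ab, P_ba):
    """Exchange information between two modality streams after pooling:
    each stream receives a linear projection of the other's features."""
    return a + b @ P_ba, b + a @ P_ab

# Two modality streams (e.g. two subsets of the input channels), flattened.
a = rng.normal(size=16)
b = rng.normal(size=16)
for _ in range(2):                 # two pool + cross-connect stages
    a, b = pool(a), pool(b)
    d = a.shape[0]
    P = rng.normal(size=(d, d)) * 0.1
    a, b = cross_connect(a, b, P, P)
print(a.shape, b.shape)  # → (4,) (4,)
```

    The design choice is that information flows between the streams at every pooling stage, rather than only at the output layer as in a conventional ensemble.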


Patents

  • Generating Prediction Outputs Using Dynamic Graphs

    Filed US 17/338,974

    Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating prediction outputs characterizing a set of entities. In one aspect, a method comprises: obtaining data defining a graph, comprising: (i) a set of nodes, wherein each node represents a respective entity from the set of entities, (ii) a current set of edges, wherein each edge connects a pair of nodes, and (iii) a respective current embedding of each node; at each of a plurality of time steps: updating the respective current embedding of each node, comprising processing data defining the graph using a graph neural network; and updating the current set of edges based at least in part on the updated embeddings of the nodes; and at one or more of the plurality of time steps: generating a prediction output characterizing the set of entities based on the current embeddings of the nodes.

  • User Behaviour Monitoring

    Filed EU EP3483792A1

    A method and apparatus is described comprising: receiving first user data of a first user from a plurality of data sources; inferring at least some missing data points in said first user data using a model to predict the missing data points, wherein the model is trained using selected data extracted from stored data relating to a plurality of users; generating modified user data from said first user data and at least some of said inferred missing data points; and converting said modified user data into a plurality of representations having defined length sequences for each data source.

  • Persistent Message Passing for Graph Neural Networks

    Filed US 17/829,204

Honors & Awards

  • Outstanding Reviewer

    Ninth International Conference on Learning Representations

    Awarded to the top reviewers for ICLR 2021.

  • Best Short Paper Runner-Up

    AAAI-21 International Workshop on Health Intelligence (W3PHIAI 2021)

    Awarded for "Predicting Patient Outcomes with Graph Representation Learning", co-authored with Emma Rocheteau, Catherine Tong, Nicholas Lane and Pietro Liò

  • Top Reviewer

    Thirty-seventh International Conference on Machine Learning

    Awarded to the top 33% reviewers for ICML 2020.

  • Best Reviewer Award

    Thirty-third Conference on Neural Information Processing Systems

    Awarded to the top 400 reviewers for NeurIPS 2019.

  • ICLR 2019 Travel Award

    The 7th International Conference on Learning Representations

  • TeleSign Datathon Winner

    TeleSign

  • Bronze Medal

    Hack Cambridge 4D

    Awarded the 3rd best project award (out of ~70 participating teams) at the 24h Hack Cambridge 4D hackathon (a Major League Hacking event), as a member of team N-hance (pls).

  • ICLR 2018 Travel Award

    The 6th International Conference on Learning Representations

  • NIPS TSW 2017 Travel Award

    NIPS 2017 Time Series Workshop

  • ACM-ICPC Northwestern Europe Regional Contest 2017 - Gold Medal (Coach)

    Association for Computing Machinery

  • Computer Laboratory Wiseman Award

    Department of Computer Science and Technology, University of Cambridge

    Awarded for making a commendable contribution to the work of the Department in the 2016/17 academic year.

  • Finalist

    Google Hash Code 2017

    Ranked 5th in the Final Round (out of 50 participating teams)
    Ranked 53rd in the Qualification Round (out of ~3000 participating teams)

  • Bronze Medal

    Hack Cambridge Recurse

    Awarded the 3rd best project award (out of ~100 participating teams) at the 24h Hack Cambridge Recurse hackathon (a Major League Hacking event), as a member of team facejack.

  • ACM-ICPC Northwestern Europe Regional Contest 2016 - Silver Medal

    Association for Computing Machinery

  • Finalist

    Hack Cambridge

    Nominated as one of the top 7 projects (out of ~100 participating teams) at the 24h Hack Cambridge hackathon (a Major League Hacking event), as a member of team Viral.

  • ACM-ICPC Northwestern Europe Regional Contest 2014 - Bronze Medal

    Association for Computing Machinery

  • Senior Scholar

    Trinity College, University of Cambridge

    Elected due to high performance in the examinations for Part IB of the Computer Science Tripos.
    Re-elected for another year (in July 2015), due to high performance in the examinations for Part II of the Computer Science Tripos.

  • Junior Scholar

    Trinity College, University of Cambridge

    Elected due to high performance in the examinations for Part IA of the Computer Science Tripos.

  • Dositeja Award

    Fund for Young Talents of Serbia

    Awarded to high-school students for outstanding performance at recognised national and international competitions during the 2011/12 school year.

  • Best Final Year Project in the area of Informatics

    Matematička gimnazija

  • Cambridge Overseas Trust Scholarship

    Cambridge Commonwealth, European & International Trust

  • Trinity Overseas Bursary

    Trinity College, University of Cambridge

  • Serbian High School Competition Awards

    Mathematical and Physical Societies of Serbia

    – Serbian Olympiad in Informatics - Third Prize (May 2012)
    – National Competition in Informatics - Second Prize (April 2012)
    – National Competition in Physics - Honourable Mention (April 2011)
    – National Competition in Mathematics - Honourable Mention (March 2010)
    – National Competition in Physics - Honourable Mention (April 2009)

Languages

  • English

    Full professional proficiency

  • Serbian

    Native or bilingual proficiency

  • French

    Elementary proficiency

  • Romanian

    Limited working proficiency
