How to deal with graph data?

So I was working on a problem with two datasets: one containing a graph and another containing the features of that graph’s nodes. I found two things which can be done:

  1. DeepWalk
  2. PyTorch’s BigGraph

But for both, I couldn’t find any resources that implement them in code. Can someone please help me with this?
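For reference, the random-walk half of DeepWalk is small enough to sketch in plain Python. This is only a rough illustration, not the reference implementation: the function names, the toy edge list, and the parameter defaults are all made up for the example.

```python
import random

def build_adjacency(edges):
    """Turn an undirected edge list into a dict: node -> list of neighbours."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, []).append(v)
        adjacency.setdefault(v, []).append(u)
    return adjacency

def random_walk(adjacency, start, walk_length, rng):
    """One truncated, uniform random walk of at most `walk_length` nodes."""
    walk = [start]
    while len(walk) < walk_length:
        neighbours = adjacency[walk[-1]]
        if not neighbours:  # dead end: stop early
            break
        walk.append(rng.choice(neighbours))
    return walk

def deepwalk_corpus(edges, walks_per_node=10, walk_length=40, seed=0):
    """Generate the 'sentences' of node ids that a skip-gram model is trained on."""
    rng = random.Random(seed)
    adjacency = build_adjacency(edges)
    nodes = list(adjacency)
    corpus = []
    for _ in range(walks_per_node):
        rng.shuffle(nodes)  # visit the nodes in a fresh random order on each pass
        for node in nodes:
            corpus.append(random_walk(adjacency, node, walk_length, rng))
    return corpus

# Toy graph (hypothetical data): two triangles joined by the edge 2-3.
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
corpus = deepwalk_corpus(edges, walks_per_node=2, walk_length=5)
```

The walks would then be treated like sentences and fed to a skip-gram model (for example gensim’s Word2Vec, typically after converting the node ids to strings) to get one embedding per node. The node feature table is not used at this stage; it could be concatenated with the embeddings afterwards.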


Start with stuff that already has code available.


Thanks, Jeremy. This is a great codebase which I found useful.


I recently attended a talk by Jure Leskovec from Stanford on Graph Neural Networks and it was the most interesting talk I’ve seen in quite a while. I couldn’t find a recording of him giving the presentation, but most of the slides seem to be available on Stanford’s SNAP.

One of the more impressive things he described was the scale of application - at Pinterest, they are applying this method to a graph with 3 billion nodes and 20 billion edges. A description of that work is here on Medium.

Lastly, in addition to the GraphSAGE code above, which is in TensorFlow, there appears to be a PyTorch version at this repo.


PyTorch Geometric was also getting some love here recently:

We introduce PyTorch Geometric, a library for deep learning on irregularly structured input data such as graphs, point clouds and manifolds, built upon PyTorch. In addition to general graph data structures and processing methods, it contains a variety of recently published methods from the domains of relational learning and 3D data processing. PyTorch Geometric achieves high data throughput by leveraging sparse GPU acceleration, by providing dedicated CUDA kernels and by introducing efficient mini-batch handling for input examples of different size. In this work, we present the library in detail and perform a comprehensive comparative study of the implemented methods in homogeneous evaluation scenarios.

Graph Neural Networks (GNNs) recently emerged as a powerful approach for representation learning on graphs, point clouds and manifolds (Bronstein et al., 2017; Kipf & Welling, 2017). Similar to the concepts of convolutional and pooling layers on regular domains, GNNs are able to (hierarchically) extract localized embeddings by passing, transforming, and aggregating information (Bronstein et al., 2017; Gilmer et al., 2017; Battaglia et al., 2018; Ying et al., 2018). However, implementing GNNs is challenging, as high GPU throughput needs to be achieved on highly sparse and irregular data of varying size. Here, we introduce PyTorch Geometric (PyG), a geometric deep learning extension library for PyTorch (Paszke et al., 2017) which achieves high performance by leveraging dedicated CUDA kernels. Following a simple message passing API, it bundles most of the recently proposed convolutional and pooling layers into a single and unified framework. All implemented methods support both CPU and GPU computations and follow an immutable data flow paradigm that enables dynamic changes in graph structures through time. PyG is released under the MIT license and is available on GitHub. It is thoroughly documented and provides accompanying tutorials and examples as a first starting point.

Runtime Experiments. We conduct several experiments on a number of dataset-model pairs to report the runtime of a whole training procedure obtained on a single NVIDIA GTX 1080 Ti (cf. Table 4). As it shows, PyG is very fast despite working on sparse data. Compared to the Deep Graph Library (DGL) 0.1.3 (Wang et al., 2018a), PyG trains models up to 15 times faster.


Interestingly, Soumith said on Twitter yesterday that something like Julia is what’s needed to do stuff with graph data properly.


When I was looking for resources on graph data, I found graphs implemented in Neo4j (Java) and PyTorch. But the problem with PyTorch’s graph implementation is that it is fairly new and people haven’t used it much. Maybe we’ll learn how to do that in Python, or maybe in Swift, in the coming days.

PyTorch’s BigGraph was recently open sourced:


This popped up on my YouTube feed this week, since I have also been intrigued by the idea. Great talk by Jure.


I’m just starting to dig into graph data and GCNNs. Currently I’m using DGL, as I’m working with molecular data and DGL has existing functions specific for molecules and seems more developed. I’ll give PyG a try though.

If I end up with anything useful as far as dataloaders and whatnot that graph libraries can plug into, I’ll post them here. Currently it looks like none of the major graph libraries have built-in functions for training, so getting graph data + a graph model to plug into a Learner could be really useful.


PyTorch Geometric has implementations of almost all the graph layers that I came across when I worked with graph data on a project. Simple layers such as Spectral Graph Convolutions are easy enough to implement yourself. However, if you want to train in mini-batches, you have to use more advanced approaches that sample the graph, such as GraphSAGE.
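To make the two ideas concrete, here is a minimal plain-Python sketch. The first function follows the usual GCN propagation rule, ReLU(D^-1/2 (A + I) D^-1/2 X W); the second shows only the neighbour-sampling and mean-aggregation step in the spirit of GraphSAGE, not the full layer. Function names and the toy graph are made up for illustration, and a real model would use tensors and learned weights (for example via PyTorch Geometric) rather than Python lists.

```python
import math
import random

def gcn_layer(adjacency, features, weight):
    """One GCN-style layer: ReLU(D^-1/2 (A + I) D^-1/2 X W), in plain Python.

    adjacency: dict node -> list of neighbours (undirected, no self-loops)
    features:  dict node -> list of floats (length F_in)
    weight:    F_in x F_out matrix as a list of rows
    """
    # Degree counts the added self-loop, hence the +1.
    degree = {v: len(nbrs) + 1 for v, nbrs in adjacency.items()}
    f_in, f_out = len(weight), len(weight[0])
    output = {}
    for v, nbrs in adjacency.items():
        # Symmetrically normalised sum over the node itself and its neighbours.
        aggregated = [0.0] * f_in
        for u in [v] + list(nbrs):
            coeff = 1.0 / math.sqrt(degree[v] * degree[u])
            for k in range(f_in):
                aggregated[k] += coeff * features[u][k]
        # Linear transform followed by ReLU.
        output[v] = [
            max(0.0, sum(aggregated[k] * weight[k][j] for k in range(f_in)))
            for j in range(f_out)
        ]
    return output

def sampled_mean_aggregate(adjacency, features, batch, num_samples, rng):
    """For each node in a mini-batch, average the features of a fixed-size neighbour sample."""
    result = {}
    for v in batch:
        nbrs = adjacency[v]
        # Sample without replacement when there are more neighbours than needed, else take them all.
        sampled = rng.sample(nbrs, num_samples) if len(nbrs) > num_samples else list(nbrs)
        if not sampled:
            sampled = [v]  # isolated node: fall back to its own features
        dim = len(features[v])
        result[v] = [sum(features[u][k] for u in sampled) / len(sampled) for k in range(dim)]
    return result

# Toy data (hypothetical): a path 0 - 1 - 2 with 2-dimensional features.
adjacency = {0: [1], 1: [0, 2], 2: [1]}
features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
identity = [[1.0, 0.0], [0.0, 1.0]]  # stand-in for a learned weight matrix

hidden = gcn_layer(adjacency, features, identity)
batch_means = sampled_mean_aggregate(
    adjacency, features, batch=[1], num_samples=2, rng=random.Random(0)
)
```

The point of the second function is that it only touches the nodes in the batch and a bounded number of their neighbours, whereas the first one needs the whole graph for every forward pass.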

This paper has a nice overview of the various layers that have been developed for deep learning on graphs.


There was a Kaggle challenge in 2019 on predicting molecular properties, specifically NMR (nuclear magnetic resonance) data for small molecules, where graph-based networks outperformed most other approaches. There are a lot of interesting threads and Kernels available there introducing different kinds of graph (convolutional) networks (SchNet, MPNN/enn-s2s) for chemistry applications (molecules can be described well as undirected graphs).
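As a small illustration of the "molecule as undirected graph" idea, here is a sketch that encodes ethanol with atoms as nodes and bonds as edges. The helper names and the three-element vocabulary are made up for the example; real pipelines usually add richer atom and bond features.

```python
# Ethanol (CH3-CH2-OH) with hydrogens left implicit: heavy atoms only.
atoms = ["C", "C", "O"]   # node i has element atoms[i]
bonds = [(0, 1), (1, 2)]  # each bond is one undirected edge

def to_undirected_edge_index(bonds):
    """Store every bond in both directions, a common way to encode an undirected graph."""
    src, dst = [], []
    for i, j in bonds:
        src += [i, j]
        dst += [j, i]
    return src, dst

def one_hot(symbols, vocabulary):
    """One-hot encode each atom's element as a simple node feature vector."""
    return [[1 if s == element else 0 for element in vocabulary] for s in symbols]

src, dst = to_undirected_edge_index(bonds)
node_features = one_hot(atoms, vocabulary=["C", "N", "O"])
```

The `src`/`dst` pair and the per-node feature rows are roughly the shape of input that graph libraries expect, once converted to tensors.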


Hi all! I am using Neo4j as a way to structure my data for input to a GNN like GAT. Does anyone have a suggestion on how to go from Neo4j to the data structure that DGL (on PyTorch) would need for training? I’m new at this and there are a lot of different libraries (pyneo, neo4jdriver, networkx, etc.), so I’m just not sure where to begin. Any help appreciated!
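One possible route, sketched under assumptions: pull the relationships out of Neo4j as (source id, target id) rows, remap the database ids to contiguous integers, and hand the two resulting lists to DGL. The rows below are hypothetical stand-ins for a query result, and `rows_to_edge_lists` is a made-up helper, not part of any library.

```python
def rows_to_edge_lists(rows):
    """Map arbitrary database node ids to 0..N-1 and split the edges into src/dst lists."""
    node_index = {}
    src, dst = [], []
    for source_id, target_id in rows:
        for node_id in (source_id, target_id):
            if node_id not in node_index:
                node_index[node_id] = len(node_index)
        src.append(node_index[source_id])
        dst.append(node_index[target_id])
    return src, dst, node_index

# Hypothetical rows, shaped like the result of a Cypher query such as
#   MATCH (a)-[r]->(b) RETURN id(a), id(b)
rows = [(101, 205), (205, 330), (101, 330)]
src, dst, node_index = rows_to_edge_lists(rows)

# With DGL and PyTorch installed, the graph would then be built roughly like this:
#   import dgl, torch
#   g = dgl.graph((torch.tensor(src), torch.tensor(dst)), num_nodes=len(node_index))
```

Keeping `node_index` around matters because node features fetched from Neo4j have to be ordered by the same integer ids before being attached to the graph.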

Here is my attempt to implement Graph Convolutional Networks in Julia with Knet:

You can also find the Colab link there.

(I will try to rewrite the PyCall parts (in utils) from the original code. For now, please ignore the Chebyshev polynomial k=2 and k=3 parts, which need some simplification.)