On the bottleneck of graph neural networks and its practical implications




Uri Alon and Eran Yahav. On the Bottleneck of Graph Neural Networks and its Practical Implications. In 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net, 2021.




  title={On the Bottleneck of Graph Neural Networks and its Practical Implications},
  author={Uri Alon and Eran Yahav},

Graph neural networks (GNNs) were shown to effectively learn from highly structured data containing elements (nodes) with relationships (edges) between them. GNN variants differ in how each node in the graph absorbs the information flowing from its neighbor nodes. In this paper, we highlight an inherent problem in GNNs: the mechanism of propagating information between neighbors creates a bottleneck when every node aggregates messages from its neighbors. This bottleneck causes the over-squashing… 





Abstract: Since the proposal of the graph neural network (GNN) by Gori et al. (2005) and Scarselli et al. (2008), one of the major problems in training GNNs was their struggle to propagate information between distant nodes in the graph. We propose a new explanation for this problem: GNNs are susceptible to a bottleneck when aggregating messages across a long path. This bottleneck causes the over-squashing of exponentially growing information into fixed-size vectors. As a result, GNNs fail to propagate messages originating from distant nodes and perform poorly when the prediction task depends on long-range interaction. In this paper, we highlight the inherent problem of over-squashing in GNNs: we demonstrate that the bottleneck hinders popular GNNs from fitting long-range signals in the training data; we further show that GNNs that absorb incoming edges equally, such as GCN and GIN, are more susceptible to over-squashing than GAT and GGNN; finally, we show that prior work, which extensively tuned GNN models of long-range problems, suffers from over-squashing, and that breaking the bottleneck improves their state-of-the-art results without any tuning or additional weights. Our code is available at https://github.com/tech-srl/bottleneck/ .
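The "exponentially growing information" can be made concrete with a little arithmetic. In a complete tree where every node has b children, the number of nodes within r hops below the root is 1 + b + ... + b^r, and all of it must reach the root through a single fixed-size vector. The sketch below is an illustration only. The branching factor of 2 and the hidden size of 32 are arbitrary choices, not values taken from the paper.

```python
def receptive_field_size(branching: int, radius: int) -> int:
    """Number of nodes within `radius` hops below the root of a complete tree
    in which every internal node has `branching` children (root included)."""
    return sum(branching ** k for k in range(radius + 1))


def leaves_per_dimension(branching: int, radius: int, hidden_dim: int) -> float:
    """How many leaves must share each dimension of a fixed-size node vector."""
    return branching ** radius / hidden_dim


if __name__ == "__main__":
    HIDDEN_DIM = 32  # arbitrary illustrative hidden size (assumption, not from the paper)
    for r in range(2, 9):
        print(f"r={r}: receptive field={receptive_field_size(2, r):4d} nodes, "
              f"{leaves_per_dimension(2, r, HIDDEN_DIM):.2f} leaves per dimension")
```

The vector size stays fixed while the receptive field doubles with each additional hop. This gap is the bottleneck the abstract refers to.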

On the Bottleneck of Graph Neural Networks and its Practical Implications

This is the official implementation of the paper: On the Bottleneck of Graph Neural Networks and its Practical Implications (ICLR'2021), which introduces the over-squashing problem of GNNs.

By Uri Alon and Eran Yahav.

This repository is divided into three sub-projects:

  1. The subdirectory tf-gnn-samples is a clone of https://github.com/microsoft/tf-gnn-samples by Brockschmidt (ICML'2020). This project can be used to reproduce the QM9 and VarMisuse experiments of Sections 4.2 and 4.4 in the paper. This sub-project depends on TensorFlow 1.13. The instructions for our clone are the same as for the original code, except that our experiments (the QM9 dataset and VarMisuse) are reproduced by running the script tf-gnn-samples/run_qm9_benchs_fa.py or tf-gnn-samples/run_varmisuse_benchs_fa.py instead of the original scripts. For additional dependencies and instructions, see the original README: https://github.com/microsoft/tf-gnn-samples/blob/master/README.md. The main modification we made is using a Fully-Adjacent layer as the last GNN layer, as described in our paper.
  2. The subdirectory gnn-comparison is a clone of https://github.com/diningphil/gnn-comparison by Errica et al. (ICLR'2020). This project can be used to reproduce the biological experiments (Section 4.3, the ENZYMES and NCI1 datasets). This sub-project depends on PyTorch 1.4 and PyTorch Geometric. For additional dependencies and instructions, see the original README: https://github.com/diningphil/gnn-comparison/blob/master/README.md. The instructions for our clone are the same, except that we added a flag called last_layer_fa to every config_*.yml file. This flag is set to True by default and reproduces our experiments. The main modification we made is using a Fully-Adjacent layer as the last GNN layer.
  3. The main directory (in which this file resides) can be used to reproduce the experiments of Section 4.1 in the paper, for the "Tree-NeighborsMatch" problem. The rest of this README file includes the instructions for this main directory.
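Items 1 and 2 above mention a Fully-Adjacent (FA) layer, meaning a GNN layer in which every pair of nodes in a graph is connected. The sketch below shows one framework-free way to build its edge list for a mini-batch of graphs. It assumes a PyTorch Geometric-style `batch` vector that maps each node to its graph id. This is not the repository's code, and excluding self-loops is our own assumption.

```python
from collections import defaultdict
from typing import List, Tuple


def fully_adjacent_edges(batch: List[int]) -> Tuple[List[int], List[int]]:
    """Return (sources, targets) connecting every ordered pair of distinct nodes
    that belong to the same graph. `batch[i]` is the graph id of node i."""
    nodes_by_graph = defaultdict(list)
    for node, graph_id in enumerate(batch):
        nodes_by_graph[graph_id].append(node)

    sources, targets = [], []
    for nodes in nodes_by_graph.values():
        for u in nodes:
            for v in nodes:
                if u != v:  # self-loops excluded (an assumption of this sketch)
                    sources.append(u)
                    targets.append(v)
    return sources, targets


# Two graphs in one mini-batch: nodes 0-1 form graph 0, nodes 2-4 form graph 1.
src, dst = fully_adjacent_edges([0, 0, 1, 1, 1])
print(len(src))  # 8 directed edges (2 + 6); none crosses a graph boundary
```

With PyTorch Geometric, the two lists could be turned into an edge index with `torch.tensor([src, dst], dtype=torch.long)`. That tensor would then be passed to the last layer in place of the graph's own edges.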

This project was designed to be useful in experimenting with new GNN architectures and new solutions for the over-squashing problem.

Feel free to open an issue with any questions.

The Tree-NeighborsMatch problem




This project is based on PyTorch 1.4.0 and the PyTorch Geometric library.

  • First, install PyTorch from the official website: https://pytorch.org/.
  • Then install PyTorch Geometric: https://pytorch-geometric.readthedocs.io/en/latest/notes/installation.html
  • Finally, run the following to make sure all dependencies are satisfied:

pip install -r requirements.txt

The requirements.txt file lists the additional requirements. However, PyTorch Geometric might require manual installation, so we recommend using the requirements.txt file only afterward.

Verify that importing the dependencies goes without errors:

python -c 'import torch; import torch_geometric'
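If the one-liner above fails, it stops at the first ImportError. The standard library's `importlib.util.find_spec` can instead list every missing package at once, as the sketch below does. The module list is an assumption based on the dependencies named above, and it could be extended with the packages in requirements.txt.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found on the current path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(["torch", "torch_geometric"])
    print(("Missing: " + ", ".join(missing)) if missing else "All dependencies found.")
```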


Training on large trees (depth=8) might require ~60GB of RAM and about 10GB of GPU memory. GPU memory usage can be reduced by using a smaller batch size together with the --accum_grad flag.

For example, instead of running:

python main.py --batch_size 1024 --type GGNN

The following uses gradient accumulation, and takes less GPU memory:

python main.py --batch_size 512 --accum_grad 2 --type GGNN
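The two commands should produce equivalent updates. With gradient accumulation, the gradients of two 512-example batches are summed before a single optimizer step. If each batch loss is a mean and is scaled by 1/accum_steps, that sum equals the gradient of one 1024-example batch, while only 512 examples occupy GPU memory at a time. The pure-Python sketch below checks this on a toy one-parameter least-squares model. How main.py actually scales the accumulated gradient is an assumption we have not verified.

```python
from typing import List, Tuple


def mean_grad(w: float, batch: List[Tuple[float, float]]) -> float:
    """Gradient w.r.t. w of the mean squared error (w*x - y)^2 over `batch`."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w: float, data: List[Tuple[float, float]], accum_steps: int) -> float:
    """Split `data` into `accum_steps` equal chunks (len(data) must be divisible
    by accum_steps) and sum their mean gradients, each scaled by 1/accum_steps."""
    chunk = len(data) // accum_steps
    total = 0.0
    for step in range(accum_steps):
        part = data[step * chunk:(step + 1) * chunk]
        total += mean_grad(w, part) / accum_steps
    return total


data = [(float(i), 3.0 * i + 1.0) for i in range(1024)]
full = mean_grad(0.5, data)             # one batch of 1024
accum = accumulated_grad(0.5, data, 2)  # two batches of 512
print(abs(full - accum) <= 1e-9 * abs(full))  # True
```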

Reproducing Experiments

To run a single experiment from the paper, run:

python main.py --help

and see the available flags. For example, to train a GGNN with depth=4, run:

python main.py --task DICTIONARY --eval_every 1000 --depth 4 --num_layers 5 --batch_size 1024 --type GGNN

To train a GNN across all depths, run one of the following:

python run-gcn-2-8.py
python run-gat-2-8.py
python run-ggnn-2-8.py
python run-gin-2-8.py
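The sketch below shows one hypothetical way to script such a depth sweep for readers who want to adapt it. It reuses only the flags from the single-run example above. Following that example (`--depth 4 --num_layers 5`), it assumes one more layer than the depth, and we have not checked that the provided run-*-2-8.py scripts use exactly these settings. The commands are printed rather than executed. Passing each list to `subprocess.run(cmd, check=True)` would launch them.

```python
import sys
from typing import Iterable, List


def sweep_commands(gnn_type: str, depths: Iterable[int] = range(2, 9),
                   batch_size: int = 1024) -> List[List[str]]:
    """Build one main.py command line per tree depth (hypothetical sweep)."""
    commands = []
    for depth in depths:
        commands.append([
            sys.executable, "main.py",
            "--task", "DICTIONARY",
            "--eval_every", "1000",
            "--depth", str(depth),
            "--num_layers", str(depth + 1),  # assumption: one more layer than the depth
            "--batch_size", str(batch_size),
            "--type", gnn_type,
        ])
    return commands


if __name__ == "__main__":
    for cmd in sweep_commands("GGNN"):
        print("python " + " ".join(cmd[1:]))
```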


The results of running the above scripts are (Section 4.1 in the paper):


Depth  2    3    4     5     6     7     8
GGNN   1.0  1.0  1.0   0.60  0.38  0.21  0.16
GAT    1.0  1.0  1.0   0.41  0.21  0.15  0.11
GIN    1.0  1.0  0.77  0.29  0.20  -     -
GCN    1.0  1.0  0.70  0.19  0.14  0.09  0.08
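One way to read the table is to find, for each GNN type, the largest depth at which it still scores a perfect 1.0. The sketch below transcribes the table, with None marking GIN's missing entries at depths 7 and 8, and computes that depth. GGNN and GAT reach depth 4, while GIN and GCN stop at depth 3. This is consistent with the abstract's statement that GCN and GIN are more susceptible to over-squashing than GAT and GGNN.

```python
from typing import Dict, List, Optional

DEPTHS = [2, 3, 4, 5, 6, 7, 8]

# Values transcribed from the table above; None marks a missing entry.
RESULTS: Dict[str, List[Optional[float]]] = {
    "GGNN": [1.0, 1.0, 1.0, 0.60, 0.38, 0.21, 0.16],
    "GAT":  [1.0, 1.0, 1.0, 0.41, 0.21, 0.15, 0.11],
    "GIN":  [1.0, 1.0, 0.77, 0.29, 0.20, None, None],
    "GCN":  [1.0, 1.0, 0.70, 0.19, 0.14, 0.09, 0.08],
}


def max_perfect_depth(scores: List[Optional[float]]) -> Optional[int]:
    """Largest depth reached before the first score that is not exactly 1.0."""
    best = None
    for depth, score in zip(DEPTHS, scores):
        if score != 1.0:
            break
        best = depth
    return best


for name, scores in RESULTS.items():
    print(name, max_perfect_depth(scores))  # GGNN 4, GAT 4, GIN 3, GCN 3 (one per line)
```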

Experiment with other GNN types

To experiment with other GNN types:

  • Add the new GNN type to the GNN_TYPE enum, for example: MY_NEW_TYPE = auto()
  • Add another elif self is GNN_TYPE.MY_NEW_TYPE: branch that instantiates the new GNN layer object
  • Use the new type as a flag for the main.py file:

python main.py --type MY_NEW_TYPE ...
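The three steps above follow a common Python pattern: an `Enum` whose members are parsed from the `--type` flag and which dispatches to a layer constructor. The sketch below is a simplified, self-contained illustration of that pattern, not the repository's code. The method names `from_string` and `get_layer`, the tuple placeholders, and the `MyNewLayer` class are all hypothetical stand-ins. In the real project, each branch would construct a PyTorch Geometric layer.

```python
import argparse
from enum import Enum, auto


class MyNewLayer:
    """Hypothetical placeholder for a new message-passing layer."""

    def __init__(self, in_dim: int, out_dim: int):
        self.in_dim, self.out_dim = in_dim, out_dim


class GNN_TYPE(Enum):
    GCN = auto()
    GGNN = auto()
    MY_NEW_TYPE = auto()  # step 1: register the new type

    def __str__(self) -> str:
        return self.name  # nicer display in --help

    @staticmethod
    def from_string(s: str) -> "GNN_TYPE":
        try:
            return GNN_TYPE[s]
        except KeyError:
            raise argparse.ArgumentTypeError(f"unknown GNN type: {s}")

    def get_layer(self, in_dim: int, out_dim: int):
        if self is GNN_TYPE.GCN:
            return ("GCN layer placeholder", in_dim, out_dim)
        elif self is GNN_TYPE.GGNN:
            return ("GGNN layer placeholder", in_dim, out_dim)
        elif self is GNN_TYPE.MY_NEW_TYPE:  # step 2: instantiate the new layer
            return MyNewLayer(in_dim, out_dim)
        raise ValueError(f"no layer for {self}")


parser = argparse.ArgumentParser()
# Step 3: every enum member becomes a valid value of the --type flag.
parser.add_argument("--type", type=GNN_TYPE.from_string, choices=list(GNN_TYPE),
                    default=GNN_TYPE.GCN)
args = parser.parse_args(["--type", "MY_NEW_TYPE"])
layer = args.type.get_layer(in_dim=32, out_dim=32)
print(type(layer).__name__)  # MyNewLayer
```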


If you want to cite this work, please use this bibtex entry:

    @inproceedings{alon2021on,
    title={On the Bottleneck of Graph Neural Networks and its Practical Implications},
    author={Uri Alon and Eran Yahav},
    booktitle={International Conference on Learning Representations},
    year={2021}
    }

What is a bottleneck in a neural network?

A bottleneck in a neural network is a layer with fewer neurons than the layers below and above it. Such a layer forces the network to compress its feature representations, keeping the features most salient for the target variable, so that they fit in the available space.
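The following is a small illustration of this definition, reading "bottleneck" as a layer narrower than both of its neighbors. The layer widths are arbitrary autoencoder-style numbers chosen for the example. In the over-squashing setting described in the abstract, the narrow "layer" is a node's fixed-size vector, into which an exponentially growing neighborhood must be compressed.

```python
from typing import List


def bottleneck_layers(widths: List[int]) -> List[int]:
    """Indices of hidden layers with fewer units than both the previous and next layer."""
    return [
        i for i in range(1, len(widths) - 1)
        if widths[i] < widths[i - 1] and widths[i] < widths[i + 1]
    ]


widths = [784, 256, 32, 256, 784]  # input -> encoder -> code -> decoder -> output
print(bottleneck_layers(widths))   # [2]
```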

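What is the oversmoothing problem?

Oversmoothing is the tendency of node representations in a GNN to become increasingly similar as the number of layers grows, until nodes can no longer be distinguished from each other. As a result, the performance of GNNs does not improve, and often degrades, as more layers are added.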
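The following is a minimal sketch of the effect. It assumes the simplest possible GNN layer, in which each node replaces its scalar feature with the mean over itself and its neighbors, using no weights and no nonlinearity. On a small connected path graph, repeated layers drive every node's feature toward a common value. Real GNN layers add learned weights and nonlinearities, which this toy omits.

```python
from typing import Dict, List


def mean_aggregate(features: List[float], neighbors: Dict[int, List[int]]) -> List[float]:
    """One weightless GNN layer: average each node's feature with its neighbors'."""
    new = []
    for node, value in enumerate(features):
        group = [value] + [features[n] for n in neighbors[node]]  # self-loop included
        new.append(sum(group) / len(group))
    return new


def spread(features: List[float]) -> float:
    """Gap between the largest and smallest node feature."""
    return max(features) - min(features)


# Path graph 0 - 1 - 2 - 3 with distinct initial features.
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
x = [4.0, 0.0, 1.0, 3.0]
for layer in range(1, 51):
    x = mean_aggregate(x, neighbors)
    if layer in (1, 5, 50):
        print(f"after {layer:2d} layers: spread = {spread(x):.6f}")
```

The spread shrinks toward zero as layers are added, so deeper stacks of this layer make the nodes progressively harder to tell apart.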