A Gentle Introduction to Graph Neural Networks
https://medium.com/comet-app/review-of-deep-learning-algorithms-for-object-detection-c1f3d437b852
Non-Euclidean data structure
Successful deep learning architectures
1. Convolutional neural networks
Sentiment analysis
2. Recurrent neural network
Social Graph (Facebook, Wikipedia), 3D Mesh, Molecular Graph
Graph = G(X, A)
X : Node, Vertex
- Individual person in a social network
- Atoms in a molecule
Graph = G(X, A)
A : Adjacency matrix
- Edges of a graph
- Connectivity, Relationship
Donald Trump
• 72 years old
• Male
• American
• President, business man
• …
Vladimir Putin
• 65 years old
• Male
• Russian
• President
• …
Graph structure
Edge features
Aromatic bond
• Node classification
• Link prediction
Input node features, H_i^(0) : raw node information
Final node states, H_i^(L) : how the NN recognizes the nodes
Graph features, Z
Principles of graph neural network
Updates in a graph neural network
What NN learns
X_i^(l+1) = σ( Σ_{j ∈ [i−k, i+k]} W_j^(l) X_j^(l) + b_j^(l) )
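A minimal numpy sketch of this 1-D convolution update; the weight layout (one weight and bias per window offset), the zero-padding at the boundary, and ReLU standing in for σ are assumptions for illustration.

```python
import numpy as np

def conv1d_layer(X, W, b, k=1):
    """One 1-D convolution update: out_i = relu(sum over the window
    [i-k, i+k] of W_j * X_j + b_j), with zero contribution outside
    the sequence."""
    n = len(X)
    out = np.zeros(n)
    for i in range(n):
        for offset, j in enumerate(range(i - k, i + k + 1)):
            if 0 <= j < n:                 # skip positions outside the sequence
                out[i] += W[offset] * X[j] + b[offset]
    return np.maximum(out, 0.0)            # sigma = ReLU

X = np.array([1.0, 2.0, 3.0, 4.0])
W = np.array([0.5, 1.0, 0.5])              # one weight per offset -1, 0, +1
b = np.zeros(3)
Y = conv1d_layer(X, W, b)
```

Each output mixes a position with its k nearest neighbors, which is exactly the "local neighborhood" structure the graph convolution below generalizes.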
GCN : Graph Convolutional Network
(Example: updating node 2, whose neighbors are nodes 1, 3, and 4)
H_2^(l+1) = σ( H_1^(l) W^(l) + H_2^(l) W^(l) + H_3^(l) W^(l) + H_4^(l) W^(l) )
GCN : Graph Convolutional Network
What NN learns
H_i^(l+1) = σ( Σ_{j ∈ N(i)} H_j^(l) W^(l) ) = σ( A H^(l) W^(l) )
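The whole-graph form of this update is a pair of matrix products. A minimal numpy sketch, assuming a binary adjacency matrix with self-loops, one-hot input features, and ReLU for σ:

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN update H^(l+1) = sigma(A H^(l) W^(l)); ReLU plays sigma.
    A contains self-loops, so each node aggregates itself and its
    neighbours in a single matrix product."""
    return np.maximum(A @ H @ W, 0.0)

# 4-node graph with self-loops on the diagonal
A = np.array([[1, 1, 0, 0],
              [1, 1, 1, 0],
              [0, 1, 1, 1],
              [0, 0, 1, 1]], dtype=float)
H = np.eye(4)                  # one-hot input node features
W = np.full((4, 2), 0.5)       # shared weight matrix, 4 -> 2 features
H_next = gcn_layer(A, H, W)
```

Every node shares the same W^(l), so the layer's parameter count is independent of the graph size.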
GCN : Graph Convolutional Network
What NN learns
GCN : H_i^(l+1) = σ( Σ_{j ∈ N(i)} H_j^(l) W^(l) ) = σ( A H^(l) W^(l) )
GAT : Graph Attention Network
• Attention revisits
What NN learns
: Convolution weight and attention coefficient
GAT : H_i^(l+1) = σ( Σ_{j ∈ N(i)} α_ij H_j^(l) W^(l) )
Velickovic, Petar, et al.
"Graph attention networks." arXiv preprint arXiv:1710.10903 (2017).
GAT : Graph Attention Network
• Attention mechanism in natural language processing
https://devblogs.nvidia.com/introduction-neural-machine-translation-gpus-part-3/figure6_sample_translations-2/
GAT : Graph Attention Network
• Attention mechanism in natural language processing
http://www.wildml.com/2016/01/attention-and-memory-in-deep-learning-and-nlp/
GAT : Graph Attention Network
• Attention mechanism in natural language processing
score(h_t, h_s) = β · h_t^T h_s
score(h_t, h_s) = h_t^T W_a h_s
What NN learns
: Convolution weight and attention coefficient
H_i^(l+1) = σ( Σ_{j ∈ N(i)} α_ij H_j^(l) W^(l) ),   α_ij = f(H_i W, H_j W)
α_ij = tanh( (H_i W)^T C (H_j W) )
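A sketch of this bilinear tanh form of the attention coefficients, with C as the learnable matrix. The softmax normalisation over each node's neighbourhood is an assumption borrowed from the GAT paper (which itself parameterises the score differently, with a LeakyReLU over concatenated features); everything here is illustrative.

```python
import numpy as np

def attention_coeffs(H, W, C, adj):
    """alpha_ij = tanh((H_i W)^T C (H_j W)) for each edge (i, j),
    normalised with a softmax over each node's neighbourhood."""
    Z = H @ W                                    # transformed node features
    scores = np.tanh(Z @ C @ Z.T)                # pairwise bilinear scores
    masked = np.where(adj > 0, scores, -np.inf)  # keep edges only
    e = np.exp(masked - masked.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)      # rows sum to 1

rng = np.random.default_rng(0)
H = rng.normal(size=(4, 3))                      # 4 nodes, 3 input features
W = rng.normal(size=(3, 2))
C = rng.normal(size=(2, 2))                      # learnable bilinear form
adj = np.array([[1, 1, 0, 0],
                [1, 1, 1, 0],
                [0, 1, 1, 1],
                [0, 0, 1, 1]], dtype=float)      # includes self-loops
alpha = attention_coeffs(H, W, C, adj)
```

Non-edges get coefficient exactly zero, so each node still only attends to its own neighbourhood.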
GAT : Graph Attention Network
• Multi-head attention
• Average
H_i^(l+1) = σ( (1/K) Σ_{k=1}^{K} Σ_{j ∈ N(i)} α_ij^(k) H_j^(l) W^(k) )
• Concatenation
H_i^(l+1) = ∥_{k=1}^{K} σ( Σ_{j ∈ N(i)} α_ij^(k) H_j^(l) W^(k) )
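The two aggregation modes can be sketched side by side. The random attention coefficients below are placeholders for what the network would compute from node features; ReLU stands in for σ.

```python
import numpy as np

def multi_head(alpha, H, W, mode):
    """Aggregate K attention heads: 'average' applies sigma (ReLU) once
    to the mean of the head outputs; 'concat' applies sigma per head and
    concatenates along the feature axis."""
    heads = [alpha[k] @ H @ W[k] for k in range(len(W))]
    if mode == "average":
        return np.maximum(np.mean(heads, axis=0), 0.0)
    return np.concatenate([np.maximum(h, 0.0) for h in heads], axis=1)

K, N, F_in, F_out = 3, 4, 5, 2
rng = np.random.default_rng(1)
alpha = rng.random((K, N, N))        # per-head attention coefficients
H = rng.normal(size=(N, F_in))
W = rng.normal(size=(K, F_in, F_out))
avg = multi_head(alpha, H, W, "average")   # output dim stays F_out
cat = multi_head(alpha, H, W, "concat")    # output dim grows to K * F_out
```

Averaging keeps the feature dimension fixed (typical for the final layer), while concatenation multiplies it by K (typical for hidden layers).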
h_i^(l+1) = U( h_i^(l), m_i^(l+1) ),   m_i^(l+1) = Σ_{j ∈ N(i)} M( h_i^(l), h_j^(l), e_ij )
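The message function M and update function U above can be any learnable functions; a toy sketch with scalar node states and hand-picked M and U (both are illustrative assumptions) shows the two-phase structure:

```python
def mpnn_step(h, edges, M, U):
    """One message-passing step: m_i = sum of M(h_i, h_j, e_ij) over
    neighbours j, followed by the update h_i' = U(h_i, m_i)."""
    m = {i: 0.0 for i in h}
    for (i, j), e_ij in edges.items():   # edge (i, j) sends j's message to i
        m[i] += M(h[i], h[j], e_ij)
    return {i: U(h[i], m[i]) for i in h}

# toy graph with scalar states: message = edge weight * neighbour state,
# update = old state + aggregated message
h = {0: 1.0, 1: 2.0, 2: 3.0}
edges = {(0, 1): 1.0, (1, 0): 1.0, (1, 2): 0.5, (2, 1): 0.5}
h_new = mpnn_step(h, edges,
                  M=lambda hi, hj, e: e * hj,
                  U=lambda hi, mi: hi + mi)
```

GCN and GAT are both special cases of this scheme with particular choices of M and U.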
Input node features, H_i^(0) : raw node information
Final node states, H_i^(L) : how the NN recognizes the nodes
Graph features, Z
Readout : permutation invariance under reordering of nodes
Mapping Images to Scene Graphs with Permutation-Invariant Structured Prediction - Scientific Figure on ResearchGate. Available from:
https://www.researchgate.net/Graph-permutation-invariance-and-structured-prediction-A-graph-labeling-function-F-is_fig1_323217335 [accessed 8 Sep, 2018]
• Graph feature
z_G = f( {H_i^(L)} )
• Node-wise summation
z_G = τ( Σ_{i ∈ G} MLP( H_i^(L) ) )
• Graph gathering
z_G = τ( Σ_{i ∈ G} σ( MLP_1( [H_i^(L), H_i^(0)] ) ) ⊙ MLP_2( H_i^(L) ) )
• τ : ReLU activation
• σ : sigmoid activation
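A sketch of the graph-gathering readout, with single linear maps standing in for MLP_1 and MLP_2 (an assumption for brevity). The point to notice is that the sum over i makes z_G invariant to node reordering:

```python
import numpy as np

def readout(H_L, H_0, W1, W2):
    """Graph-gathering readout:
    z_G = ReLU( sum_i sigmoid(MLP1([H_i^L, H_i^0])) * MLP2(H_i^L) ).
    Single linear maps W1, W2 stand in for MLP1 and MLP2."""
    gate = 1.0 / (1.0 + np.exp(-np.concatenate([H_L, H_0], axis=1) @ W1))
    return np.maximum((gate * (H_L @ W2)).sum(axis=0), 0.0)

rng = np.random.default_rng(2)
H_0 = rng.normal(size=(5, 3))      # input node features
H_L = rng.normal(size=(5, 3))      # final node states
W1 = rng.normal(size=(6, 3))
W2 = rng.normal(size=(3, 3))
z = readout(H_L, H_0, W1, W2)

perm = [3, 1, 4, 0, 2]             # reorder the nodes
z_perm = readout(H_L[perm], H_0[perm], W1, W2)
```

Because summation commutes, z and z_perm agree, which is exactly the permutation invariance the slide asks for.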
• Clustering
• Link prediction
• Matrix completion and recommendation
Kipf, Thomas N., and Max Welling.
"Variational graph auto-encoders." arXiv preprint arXiv:1611.07308 (2016).
https://github.com/tkipf/gae
Graph Auto-Encoders (GAE)
Encoder
Decoder
Reconstructed adjacency matrix, Â = σ(Z Z^T)
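The inner-product decoder is simple enough to check directly; the 2-D embeddings below are made up to show that nearby embeddings decode to high edge probabilities:

```python
import numpy as np

# Inner-product decoder of the graph auto-encoder: A_hat = sigmoid(Z Z^T),
# where Z holds the latent node embeddings produced by the encoder.
Z = np.array([[ 2.0, 0.0],
              [ 1.8, 0.1],   # embedding close to node 0 -> likely edge
              [-2.0, 0.0]])  # embedding far from both  -> unlikely edge
A_hat = 1.0 / (1.0 + np.exp(-(Z @ Z.T)))
```

Since Z Z^T is symmetric, the reconstructed adjacency is symmetric too, matching an undirected graph.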
https://towardsdatascience.com/an-intuitive-guide-to-deep-network-architectures-65fdc477db41
Practical Issues
: Inception
Inception module
H_i^(1) = σ( A H^(0) W^(0) ),   H_i^(2) = σ( A H^(1) W^(1) )
Practical Issues
: Inception
Going Deeeeeeeeper!
Practical Issues
: Skip-connection
ResNet – Winner of 2015 ImageNet Challenge
y = H_i^(l+1) + H_i^(l)
Practical Issues
: Dropout
Dropout rate (p) = 0.25
For a dense network, applying dropout to the hidden states reduces the expected number of active parameters from N_w to N_w(1 − p)^2, since each weight is used only when both its input and output units are kept.
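A quick check of that parameter-count arithmetic; the layer size is a hypothetical example.

```python
# With dropout rate p, an input unit and an output unit are both kept with
# probability (1 - p), so each weight of a dense layer is active with
# probability (1 - p)**2: N_w weights shrink to N_w * (1 - p)**2 on average.
p = 0.25
N_w = 1000 * 1000                       # hypothetical 1000 x 1000 dense layer
expected_active = N_w * (1 - p) ** 2    # 562500.0
```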
Hidden states in a graph neural network
Graph Conv. : H_i^(l+1) = A H^(l) W
[Table of node hidden states: Age (28, 40, 36, …), Sex (M, F, F, …), Job (Student, Politician, Medical Doctor, …), Nationality (Korean, American, French, …); dropout randomly masks entries of these hidden states]
• Molecular Applications
1. Neural molecular fingerprint
2. Quantitative Structure-Property Relationship (QSPR)
3. Molecular generative model
Left: Karate club graph, colors denote communities obtained via modularity-based clustering (Brandes et al., 2008). Right: GCN embedding (with random weights) for nodes in the karate club network.
• All figures and descriptions are taken from Thomas N. Kipf’s blog.
• Watch the video on his blog.
Kipf, Thomas N., and Max Welling.
"Semi-supervised classification with graph convolutional networks." arXiv preprint arXiv:1609.02907 (2016)
http://tkipf.github.io/graph-convolutional-networks/
Network analysis
1. Node classification
• Good node features → Good node classification results
Renormalization trick
I_N + D^(−1/2) A D^(−1/2) → D̃^(−1/2) Ã D̃^(−1/2), with Ã = A + I_N and D̃_ii = Σ_j Ã_ij
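A sketch of the renormalization trick on a 3-node path graph, building Ã = A + I and its degree matrix D̃ explicitly:

```python
import numpy as np

# Renormalization trick: instead of I + D^(-1/2) A D^(-1/2), use
# D~^(-1/2) A~ D~^(-1/2) with A~ = A + I and D~ the degree matrix of A~,
# which keeps the propagation operator's eigenvalues bounded.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)   # 3-node path graph
A_tilde = A + np.eye(3)                  # add self-loops
d_tilde = A_tilde.sum(axis=1)            # degrees of A~
D_inv_sqrt = np.diag(1.0 / np.sqrt(d_tilde))
A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt

evals = np.linalg.eigvalsh(A_hat)        # largest eigenvalue is 1
```

Repeated application of A_hat therefore cannot blow up activations, which is what makes stacking layers numerically stable.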
Network analysis
2. Link prediction
Encoder
Decoder
Reconstructed adjacency matrix, Â = σ(Z Z^T)
Â = g(Z), which takes pairs of node embeddings (z_i, z_j) and predicts the respective entries Â_ij
Encoder
U, V = f(X, M_1, …, M_R), where M_r ∈ {0,1}^(N_u × N_v) is the adjacency matrix associated with rating type r ∈ R
Decoder
Encoder
μ_{j→i,r} = (1 / c_ij) W_r x_j : message function from item j to user i, with normalization constant c_ij = √(|N_i||N_j|)
Decoder
p(M_ij = r) = exp(u_i^T Q_r v_j) / Σ_{s ∈ R} exp(u_i^T Q_s v_j) : probability that rating M_ij is r
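A sketch of this bilinear softmax decoder; the embeddings and the Q_r matrices below are random placeholders for what the encoder would actually learn.

```python
import numpy as np

def rating_probs(u_i, v_j, Q):
    """Bilinear softmax decoder:
    p(M_ij = r) = exp(u_i^T Q_r v_j) / sum_s exp(u_i^T Q_s v_j),
    with one learnable matrix Q_r per rating level."""
    logits = np.array([u_i @ Q_r @ v_j for Q_r in Q])
    e = np.exp(logits - logits.max())    # numerically stabilised softmax
    return e / e.sum()

rng = np.random.default_rng(3)
d, R = 4, 5                              # embedding size, rating levels 1..5
u_i = rng.normal(size=d)                 # user embedding from the encoder
v_j = rng.normal(size=d)                 # item embedding from the encoder
Q = rng.normal(size=(R, d, d))           # one bilinear matrix per rating
p = rating_probs(u_i, v_j, Q)            # distribution over rating levels
```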
Loss function : negative log-likelihood of the predicted ratings, L = − Σ_{observed (i,j)} Σ_{r=1}^{R} I[M_ij = r] log p(M_ij = r)
• 12 types of toxicity
• Molecule species (represented with SMILES)
and toxicity labels are given
• But too small to train a DL model
https://tripod.nih.gov/tox21/challenge/data.jsp
Molecular applications
1. Neural molecular fingerprint
https://mc.ai/machine-learning-for-drug-discovery-in-a-nutshell%E2%80%8A-%E2%80%8Apart-ii/
Such molecular fingerprints can be easily obtained with open-source packages, e.g., RDKit.
http://kehang.github.io/basic_project/2017/04/18/machine-learning-in-molecular-property-prediction/
Recently, neural fingerprints generated by graph convolutional networks have been widely used for more accurate molecular property prediction.
http://kehang.github.io/basic_project/2017/04/18/machine-learning-in-molecular-property-prediction/
Graph features: molecule pairs sorted in ascending order of ‖Z_i − Z_j‖^2
Similar molecules are located close together in the graph latent space.
Molecular applications
3. Molecular generative model
Motivation : de novo molecular design
Segler, Marwin HS, et al. "Generating focused molecule libraries for drug discovery
with recurrent neural networks." ACS central science 4.1 (2017): 120-131.
Literature
• Li, Y., Vinyals, O., Dyer, C., Pascanu, R., & Battaglia, P. (2018). Learning deep generative models of graphs.
arXiv preprint arXiv:1803.03324.
• Jin, Wengong, Regina Barzilay, and Tommi Jaakkola. "Junction Tree Variational Autoencoder for Molecular
Graph Generation." arXiv preprint arXiv:1802.04364 (2018).
• Liu, Qi, et al. "Constrained Graph Variational Autoencoders for Molecule Design." arXiv preprint arXiv:1805.09076 (2018).
• Simonovsky, Martin, and Nikos Komodakis. "GraphVAE: Towards Generation of Small Graphs Using Variational Autoencoders." arXiv preprint arXiv:1802.03480 (2018).
• De Cao, Nicola, and Thomas Kipf. "MolGAN: An implicit generative model for small molecular graphs."
arXiv preprint arXiv:1805.11973 (2018).
• …
1. Sample whether to add a new node of a particular type or to terminate; if a node type is chosen,
2. Add a node of this type to the graph,
3. Check whether any further edges are needed to connect the new node to the existing graph,
4. If yes, select a node in the graph and add an edge connecting the new node to the selected node.
• Whenever a node is added, decide whether to add edges between the current node and the other nodes.
Graph propagation process : read out all node states and generate a graph feature
h'_v = f_n( a_v, h_v )  ∀v ∈ V,   a_v = Σ_{(u,v) ∈ E} f_e( h_u, h_v, x_{u,v} )  ∀v ∈ V
Li, Yujia, et al.
"Learning deep generative models of graphs." arXiv preprint arXiv:1803.03324 (2018).
f_addedge(G, v) = σ( f_ae( h_G^(T), h_v^(T) ) ) : the probability of adding an edge to the newly created node v
f_nodes(G, v) = softmax( f_s( h_u^(T), h_v^(T) ) ) : score for each node u; if "yes", connect the edge to the selected node
Objective function
p(G) : marginal likelihood over node permutations π,
p(G) = Σ_{π ∈ P(G)} p(G, π) = E_{q(π|G)} [ p(G, π) / q(π|G) ]
• Interacting systems
• Nodes : particles, Edges : (physical) interaction between particle pairs
• Latent code : the underlying interaction graph
e → v : h_j^(l+1) = f_v^l( Σ_{i ∈ N_j} h_{(i,j)}^l, x_j ),   where x_j are initial node features and x_{(i,j)} initial edge features
h_{(i,j)}^1 = f_e^1( h_i^1, h_j^1 ),   h_j^2 = f_v^1( Σ_{i ≠ j} h_{(i,j)}^1 ),   h_{(i,j)}^2 = f_e^2( h_i^2, h_j^2 )
q_φ( z_ij | x ) = softmax( f_enc,φ(x)_{ij,1:K} ) = softmax( h_{(i,j)}^2 )
p_θ( x | z ) = Π_{t=1}^{T} p_θ( x^{t+1} | x^t, …, x^1, z ) = Π_{t=1}^{T} p_θ( x^{t+1} | x^t, z ), since the dynamics are Markovian
p(z) = Π_{i≠j} p_θ( z_ij ) : the prior is a factorized uniform distribution over edge types
• Bronstein, Michael M., et al. "Geometric deep learning: going beyond euclidean data." IEEE Signal Processing
Magazine 34.4 (2017): 18-42.
• [NIPS 2017] Tutorial - Geometric deep learning on graphs and manifolds,
https://nips.cc/Conferences/2017/Schedule?showEvent=8735
• Goyal, Palash, and Emilio Ferrara. "Graph embedding techniques, applications, and performance: A survey."
Knowledge-Based Systems 151 (2018): 78-94., https://github.com/palash1992/GEM
• Battaglia, Peter W., et al. "Relational inductive biases, deep learning, and graph networks." arXiv preprint
arXiv:1806.01261 (2018).
• Awesome Graph Embedding And Representation Learning Papers,
https://github.com/benedekrozemberczki/awesome-graph-embedding
https://github.com/SeongokRyu/Graph-neural-networks
References
• Defferrard, Michaël, Xavier Bresson, and Pierre Vandergheynst. "Convolutional neural networks on graphs with fast
localized spectral filtering." Advances in Neural Information Processing Systems. 2016.
• Kipf, Thomas N., and Max Welling. "Semi-supervised classification with graph convolutional networks." arXiv preprint
arXiv:1609.02907 (2016).
• van den Berg, Rianne, Thomas N. Kipf, and Max Welling. "Graph Convolutional Matrix Completion." stat 1050 (2017): 7.
• Schlichtkrull, Michael, et al. "Modeling relational data with graph convolutional networks." European Semantic Web
Conference. Springer, Cham, 2018.
• Levie, Ron, et al. "Cayleynets: Graph convolutional neural networks with complex rational spectral filters." arXiv preprint
arXiv:1705.07664 (2017).
• Velickovic, Petar, et al. "Graph attention networks." arXiv preprint arXiv:1710.10903 (2017).
• Choi, Edward, et al. "GRAM: Graph-based Attention Model for Healthcare Representation Learning." KDD 2017.
• Shang, C., Liu, Q., Chen, K. S., Sun, J., Lu, J., Yi, J., & Bi, J. (2018). Edge Attention-based Multi-Relational Graph
Convolutional Networks. arXiv preprint arXiv:1802.04944.
• Lee, John Boaz, et al. "Attention Models in Graphs: A Survey." arXiv preprint arXiv:1807.07984 (2018).
• Li, Yujia, et al. "Gated graph sequence neural networks." arXiv preprint arXiv:1511.05493 (2015).
• Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O., & Dahl, G. E. (2017). Neural message passing for quantum
chemistry. arXiv preprint arXiv:1704.01212.
• Kipf, Thomas N., and Max Welling. "Variational graph auto-encoders." arXiv preprint arXiv:1611.07308 (2016).
• Simonovsky, Martin, and Nikos Komodakis. "GraphVAE: Towards Generation of Small Graphs Using Variational
Autoencoders." arXiv preprint arXiv:1802.03480 (2018).
• Liu, Qi, et al. "Constrained Graph Variational Autoencoders for Molecule Design." arXiv preprint arXiv:1805.09076
(2018).
• Pan, Shirui, et al. "Adversarially Regularized Graph Autoencoder." arXiv preprint arXiv:1802.04407 (2018).
• Li, Y., Vinyals, O., Dyer, C., Pascanu, R., & Battaglia, P. (2018). Learning deep generative models of graphs. arXiv
preprint arXiv:1803.03324.
• Jin, Wengong, Regina Barzilay, and Tommi Jaakkola. "Junction Tree Variational Autoencoder for Molecular Graph
Generation." arXiv preprint arXiv:1802.04364 (2018).
Applications of GNN
• Duvenaud, David K., et al. "Convolutional networks on graphs for learning molecular fingerprints." Advances in
neural information processing systems. 2015.
• Kearnes, Steven, et al. "Molecular graph convolutions: moving beyond fingerprints." Journal of computer-aided
molecular design 30.8 (2016): 595-608.
• Schütt, K. T., Arbabzadah, F., Chmiela, S., Müller, K. R., & Tkatchenko, A. (2017). Quantum-chemical insights from
deep tensor neural networks. Nature communications, 8, 13890.
• Wu, Z., Ramsundar, B., Feinberg, E. N., Gomes, J., Geniesse, C., Pappu, A. S., ... & Pande, V. (2018). MoleculeNet: a
benchmark for molecular machine learning. Chemical Science, 9(2), 513-530.
• Feinberg, Evan N., et al. "Spatial Graph Convolutions for Drug Discovery." arXiv preprint arXiv:1803.04465 (2018).
• Jin, Wengong, Regina Barzilay, and Tommi Jaakkola. "Junction Tree Variational Autoencoder for Molecular Graph
Generation." arXiv preprint arXiv:1802.04364 (2018).
• Liu, Qi, et al. "Constrained Graph Variational Autoencoders for Molecule Design." arXiv preprint arXiv:1805.09076
(2018).
• De Cao, Nicola, and Thomas Kipf. "MolGAN: An implicit generative model for small molecular graphs." arXiv preprint
arXiv:1805.11973 (2018).
• Selvan, Raghavendra, et al. "Extraction of Airways using Graph Neural Networks." arXiv preprint arXiv:1804.04436
(2018).
• Kipf, Thomas, et al. "Neural relational inference for interacting systems." arXiv preprint arXiv:1802.04687 (2018).