06 GNN3
Goal: identify a specific use case and
demonstrate how GNNs and PyG can be used
to solve this problem
Output: blog post, Google colab
Example use cases
▪ Fraud detection
▪ Predicting drug interactions
▪ Friend recommendation
Check out the featured posts from our course
last year as examples of this type of project
Goal: develop a tutorial that explains how to
use existing PyG functionality
Output: blog post, Google colab
Example topics for tutorials
▪ PyG’s explainability module
▪ Methods for graph sampling (e.g., negative
sampling, sampling on heterogeneous graphs)
▪ Tutorial on GraphGym, a platform for designing
and evaluating GNNs
Check out example tutorials from PyG
Goal: implement interesting methods from a
recent research paper in graph ML
Output: PR to PyG contrib, short blog post
Project details
▪ Implementation should include comprehensive
testing and documentation on new functionality
▪ Try to build on existing PyG and PyTorch code
wherever possible
▪ Note: this project is more manageable if you are
already comfortable with PyTorch and deep
learning. We also highly recommend a group of 3.
Project is worth 20% of your course grade
▪ Project proposal (2 pages), due February 7
▪ Final reports, due March 21
We recommend groups of 3, but groups of 2
are also allowed
Full project description will be released
tonight! We will provide much more detail on
each project type, examples, pointers to
datasets, tips for writing blog posts and
Google Colabs, etc.
CS224W: Machine Learning with Graphs
Jure Leskovec, Stanford University
http://cs224w.stanford.edu
J. You, R. Ying, J. Leskovec. Design Space of Graph Neural Networks, NeurIPS 2020
[Figure: the general GNN framework — within each GNN layer: (1) Message, (2) Aggregation; across layers: (3) Layer connectivity between GNN Layer 1, GNN Layer 2, …]
He et al. Deep Residual Learning for Image Recognition, CVPR 2015
Feature augmentation: constant vs. one-hot
[Figure: left, a 6-node graph where every node has the constant feature 1; right, the same graph where each node carries a one-hot ID 1–6]
▪ Expressive power: Constant — medium; all nodes are identical, but the GNN can still learn from the graph structure. One-hot — high; each node has a unique ID, so node-specific information can be stored.
▪ Inductive learning (generalizing to unseen nodes): Constant — high; simple to generalize, we assign the constant feature to new nodes and apply our GNN. One-hot — low; new nodes introduce new IDs, and the GNN doesn't know how to embed unseen IDs.
▪ Computational cost: Constant — low; only a 1-dimensional feature. One-hot — high; high-dimensional features, which cannot be applied to large graphs.
▪ Use cases: Constant — any graph, inductive settings (generalize to new nodes). One-hot — small graphs, transductive settings (no new nodes).
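As a concrete illustration, here is a minimal sketch of both augmentation choices as PyG node features; the toy 6-node cycle graph and all names are made up for the example.

```python
import torch
from torch_geometric.data import Data

# A toy 6-node cycle graph (made-up example); edge_index lists directed edges.
# Add the reverse edges as well if an undirected cycle is wanted.
edge_index = torch.tensor([[0, 1, 2, 3, 4, 5],
                           [1, 2, 3, 4, 5, 0]])
num_nodes = 6

# Option 1: constant node feature (every node gets the scalar 1).
x_const = torch.ones(num_nodes, 1)

# Option 2: one-hot node IDs (each node gets a unique num_nodes-dim indicator).
x_onehot = torch.eye(num_nodes)

data_const = Data(x=x_const, edge_index=edge_index)    # cheap, inductive-friendly
data_onehot = Data(x=x_onehot, edge_index=edge_index)  # expressive, transductive only
```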
Why do we need feature augmentation?
(2) Certain structures are hard for a GNN to learn
Example: cycle count feature
▪ Can a GNN learn the length of the cycle that 𝑣1 resides in?
▪ Unfortunately, no
𝒗𝟏 cannot differentiate which graph it resides in
▪ Because every node (in either cycle) has degree 2
▪ The computational graphs will be the same binary tree
J. You, J. Gomes-Selman, R. Ying, J. Leskovec. Identity-aware Graph Neural Networks, AAAI 2021
Why do we need feature augmentation?
(2) Certain structures are hard for a GNN to learn
Other commonly used augmented features:
▪ Degree distribution
▪ Clustering coefficient
▪ PageRank
▪ Centrality
▪ …
Any feature we have introduced can be used!
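A sketch of how such structural features could be computed with networkx and attached to a PyG Data object; the helper name is made up, and it assumes the graph is small enough to compute exact clustering coefficients and PageRank.

```python
import networkx as nx
import torch
from torch_geometric.utils import to_networkx

# Sketch: augment node features with structural statistics
# (degree, clustering coefficient, PageRank). `data` is a PyG Data object.
def augment_with_structure(data):
    G = to_networkx(data, to_undirected=True)
    deg = dict(G.degree())
    clust = nx.clustering(G)
    pr = nx.pagerank(G)
    extra = torch.tensor(
        [[deg[v], clust[v], pr[v]] for v in range(data.num_nodes)],
        dtype=torch.float)
    # Concatenate with existing features (or use alone if data.x is None).
    data.x = extra if data.x is None else torch.cat([data.x, extra], dim=-1)
    return data
```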
Motivation: Augment sparse graphs
(1) Add virtual edges
▪ Common approach: Connect 2-hop neighbors via
virtual edges
▪ Intuition: Instead of using the adjacency matrix 𝐴 for GNN computation, use 𝐴 + 𝐴²
▪ Use cases: bipartite graphs, e.g., an author-to-paper (authorship) graph; the 2-hop virtual edges form an author collaboration graph
[Figure: a bipartite graph of authors (A, B, C, E) and the papers they authored]
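A minimal sketch of the 𝐴 + 𝐴² idea using a dense adjacency matrix (fine for small graphs; for large graphs you would use sparse operations, and PyG's transforms module also provides a TwoHop transform that serves the same purpose, if you prefer a built-in). The function name is illustrative.

```python
import torch

# Sketch: add virtual 2-hop edges by using A + A^2 instead of A.
def add_two_hop_edges(edge_index, num_nodes):
    A = torch.zeros(num_nodes, num_nodes)
    A[edge_index[0], edge_index[1]] = 1.0
    A2 = (A + A @ A).clamp(max=1.0)
    A2.fill_diagonal_(0)                     # drop self-loops created by A^2
    return A2.nonzero(as_tuple=False).t()    # back to [2, E] edge_index format
```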
Motivation: Augment sparse graphs
(2) Add virtual nodes
▪ The virtual node will connect to all the nodes in the graph
▪ Suppose in a sparse graph, two nodes have
shortest path distance of 10
▪ After adding the virtual node, all node pairs will have a distance of at most 2
▪ Node A – Virtual node – Node B
▪ Benefits: Greatly improves message
passing in sparse graphs
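A minimal sketch of manually adding a virtual node (PyG also ships a VirtualNode transform for this, if you prefer a built-in). The virtual node here gets a zero feature vector; a learned embedding is another common choice. All names are illustrative.

```python
import torch

# Sketch: add a single virtual node connected to every real node.
# x: [N, d] node features, edge_index: [2, E].
def add_virtual_node(x, edge_index):
    num_nodes = x.size(0)
    v = num_nodes                                  # index of the new virtual node
    real = torch.arange(num_nodes)
    extra = torch.cat([torch.stack([real, torch.full_like(real, v)]),
                       torch.stack([torch.full_like(real, v), real])], dim=1)
    edge_index = torch.cat([edge_index, extra], dim=1)
    x = torch.cat([x, torch.zeros(1, x.size(1))], dim=0)
    return x, edge_index
```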
Hamilton et al. Inductive Representation Learning on Large Graphs, NeurIPS 2017
Previously:
▪ All the nodes are used for message passing
Next time when we compute the embeddings,
we can sample different neighbors
▪ Only nodes 𝐶 and 𝐷 will pass messages to 𝐴
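As an illustration, here is a minimal sketch of GraphSAGE-style neighbor sampling with PyG's NeighborLoader; the `data` object, the `train_mask` attribute, and the fan-out numbers are assumptions made for the example.

```python
from torch_geometric.loader import NeighborLoader

# Sketch: sample at most 15 neighbors in hop 1 and 10 in hop 2 for each seed node.
loader = NeighborLoader(
    data,
    num_neighbors=[15, 10],
    batch_size=128,
    input_nodes=data.train_mask,   # seed nodes whose embeddings we compute
    shuffle=True,
)

for batch in loader:
    # `batch` is the subgraph induced by the sampled neighbors of the seeds;
    # a different neighbor set is drawn each time we iterate.
    pass
```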
Ying et al. Graph Convolutional Neural Networks for Web-Scale Recommender Systems, KDD 2018
CS224W: Machine Learning with Graphs
Jure Leskovec, Stanford University
http://cs224w.stanford.edu
J. You, R. Ying, J. Leskovec. Design Space of Graph Neural Networks, NeurIPS 2020
So far what we have covered:
[Pipeline: Input Graph → Graph Neural Network → Node embeddings → Prediction head → Predictions vs. Labels → Loss function, Evaluation metrics]
(1) Different prediction heads:
- Node-level tasks
- Edge-level tasks
- Graph-level tasks
Idea: Different task levels require different prediction heads: node-level, edge-level, and graph-level prediction.
Node-level prediction: We can directly make predictions using node embeddings!
After GNN computation, we have 𝑑-dimensional node embeddings: {𝐡_v^(L) ∈ ℝ^d, ∀v ∈ G}
Suppose we want to make 𝑘-way prediction
▪ Classification: classify among 𝑘 categories
▪ Regression: regress on 𝑘 targets
▪ ŷ_v = Head_node(𝐡_v^(L)) = 𝐖^(H) 𝐡_v^(L), where 𝐖^(H) ∈ ℝ^{k×d} maps the node embedding to the 𝑘-dimensional prediction
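A node-level head is typically just a linear map from the 𝑑-dimensional embedding to 𝑘 outputs; a minimal sketch (all dimensions and tensors are made up):

```python
import torch
import torch.nn as nn

d, k = 64, 7                       # made-up embedding size / number of classes
head_node = nn.Linear(d, k)        # W^(H): d -> k

h = torch.randn(100, d)            # h_v^(L) for 100 nodes (stand-in for GNN output)
y_hat = head_node(h)               # [100, k] node-level predictions (logits)
```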
Options for Head_edge(𝐡_u^(L), 𝐡_v^(L)):
(1) Concatenation + Linear
▪ We have seen this in graph attention
▪ ŷ_uv = Linear(Concat(𝐡_u^(L), 𝐡_v^(L)))
▪ Here Linear(⋅) maps the 2𝑑-dimensional concatenated embedding to a 𝑘-dimensional output (𝑘-way prediction)
Options for Head_edge(𝐡_u^(L), 𝐡_v^(L)):
(2) Dot product
▪ ŷ_uv = (𝐡_u^(L))^T 𝐡_v^(L)
▪ This approach only applies to 1-way prediction (e.g., link prediction: predict the existence of an edge)
▪ Applying it to 𝑘-way prediction:
▪ Similar to multi-head attention: trainable weights 𝐖^(1), …, 𝐖^(k)
▪ ŷ_uv^(1) = (𝐡_u^(L))^T 𝐖^(1) 𝐡_v^(L)
▪ …
▪ ŷ_uv^(k) = (𝐡_u^(L))^T 𝐖^(k) 𝐡_v^(L)
▪ ŷ_uv = Concat(ŷ_uv^(1), …, ŷ_uv^(k)) ∈ ℝ^k
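Both edge-level heads can be sketched in a few lines of PyTorch; all dimensions and tensors below are made up for illustration.

```python
import torch
import torch.nn as nn

d, k = 64, 4                                         # made-up dimensions
h_u, h_v = torch.randn(32, d), torch.randn(32, d)    # embeddings of 32 node pairs

# (1) Concatenation + Linear: 2d -> k
concat_head = nn.Linear(2 * d, k)
y_concat = concat_head(torch.cat([h_u, h_v], dim=-1))   # [32, k]

# (2) Dot product: 1-way prediction (e.g., a link existence score)
y_dot = (h_u * h_v).sum(dim=-1)                          # [32]

# (2') k-way dot product with trainable W^(1..k), similar to multi-head attention
W = nn.Parameter(torch.randn(k, d, d))
y_kway = torch.einsum('nd,kde,ne->nk', h_u, W, h_v)      # [32, k]
```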
Graph-level prediction: Make predictions using all the node embeddings in our graph
Suppose we want to make 𝑘-way prediction
▪ Head_graph({𝐡_v^(L) ∈ ℝ^d, ∀v ∈ G}) is similar to AGG(⋅) in a GNN layer!
K. Xu*, W. Hu*, J. Leskovec, S. Jegelka. How Powerful Are Graph Neural Networks, ICLR 2019
Options for Head_graph({𝐡_v^(L) ∈ ℝ^d, ∀v ∈ G}):
(1) Global mean pooling: ŷ_G = Mean({𝐡_v^(L) ∈ ℝ^d, ∀v ∈ G})
(2) Global max pooling: ŷ_G = Max({𝐡_v^(L) ∈ ℝ^d, ∀v ∈ G})
(3) Global sum pooling: ŷ_G = Sum({𝐡_v^(L) ∈ ℝ^d, ∀v ∈ G})
These options work great for small graphs
Can we do better for large graphs?
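In PyG these poolings are available as global_mean_pool / global_max_pool / global_add_pool, which reduce node embeddings per graph using a batch vector; a minimal sketch with made-up tensors:

```python
import torch
from torch_geometric.nn import global_mean_pool, global_max_pool, global_add_pool

h = torch.randn(7, 64)                         # embeddings for 7 nodes (made-up)
batch = torch.tensor([0, 0, 0, 1, 1, 1, 1])    # first 3 nodes -> graph 0, rest -> graph 1

hg_mean = global_mean_pool(h, batch)   # [2, 64], one embedding per graph
hg_max = global_max_pool(h, batch)     # [2, 64]
hg_sum = global_add_pool(h, batch)     # [2, 64]
```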
Issue: Global pooling over a (large) graph will lose information
Toy example: we use 1-dimensional node embeddings
▪ Node embeddings for 𝐺1: {−1, −2, 0, 1, 2}
▪ Node embeddings for 𝐺2: {−10, −20, 0, 10, 20}
▪ Clearly 𝐺1 and 𝐺2 have very different node embeddings → their structures should be different
If we do global sum pooling:
▪ Prediction for 𝐺1: ŷ_G = Sum({−1, −2, 0, 1, 2}) = 0
▪ Prediction for 𝐺2: ŷ_G = Sum({−10, −20, 0, 10, 20}) = 0
▪ We cannot differentiate 𝐺1 and 𝐺2!
A solution: Let's aggregate all the node embeddings hierarchically
▪ Toy example: We will aggregate via ReLU(Sum(⋅))
▪ We first separately aggregate the first 2 nodes and last 3 nodes
▪ Then we aggregate again to make the final prediction
▪ 𝐺1 node embeddings: {−1, −2, 0, 1, 2}
▪ Round 1: ŷ_a = ReLU(Sum({−1, −2})) = 0, ŷ_b = ReLU(Sum({0, 1, 2})) = 3
▪ Round 2: ŷ_G = ReLU(Sum({ŷ_a, ŷ_b})) = 3
▪ 𝐺2 node embeddings: {−10, −20, 0, 10, 20}
▪ Round 1: ŷ_a = ReLU(Sum({−10, −20})) = 0, ŷ_b = ReLU(Sum({0, 10, 20})) = 30
▪ Round 2: ŷ_G = ReLU(Sum({ŷ_a, ŷ_b})) = 30
Now we can differentiate 𝐺1 and 𝐺2!
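The toy computation above can be checked in a few lines of PyTorch (purely illustrative):

```python
import torch
import torch.nn.functional as F

def relu_sum(xs):
    return F.relu(torch.tensor(xs, dtype=torch.float).sum())

for emb in ([-1, -2, 0, 1, 2], [-10, -20, 0, 10, 20]):
    y_a = relu_sum(emb[:2])                     # aggregate the first 2 nodes
    y_b = relu_sum(emb[2:])                     # aggregate the last 3 nodes
    y_g = relu_sum([y_a.item(), y_b.item()])    # aggregate again for the final prediction
    print(y_g)                                  # prints 3.0, then 30.0 -> G1 and G2 differ
```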
Ying et al. Hierarchical Graph Representation Learning with Differentiable Pooling , NeurIPS 2018
DiffPool idea:
▪ Hierarchically pool node embeddings
Supervised learning on graphs
▪ Labels come from external sources
▪ E.g., predict drug likeness of a molecular graph
Unsupervised learning on graphs
▪ Signals come from graphs themselves
▪ E.g., link prediction: predict if two nodes are connected
Sometimes the differences are blurry
▪ We still have “supervision” in unsupervised learning
▪ E.g., train a GNN to predict node clustering coefficient
▪ An alternative name for “unsupervised” is “self-
supervised”
Supervised labels come from the specific use
cases. For example:
▪ Node labels 𝒚𝒗: in a citation network, which subject
area does a node belong to
▪ Edge labels 𝒚𝒖𝒗: in a transaction network, whether an
edge is fraudulent
▪ Graph labels 𝒚𝐺 : among molecular graphs, the drug
likeness of graphs
Advice: Reduce your task to node / edge / graph
labels, since they are easy to work with
▪ E.g., if we know that some nodes form clusters, we can treat the cluster a node belongs to as its node label
The problem: sometimes we only have a graph,
without any external labels
The solution: “self-supervised learning”, we can
find supervision signals within the graph.
▪ For example, we can let GNN predict the following:
▪ Node-level 𝒚𝑣 . Node statistics: such as clustering
coefficient, PageRank, …
▪ Edge-level 𝒚𝑢𝑣 . Link prediction: hide the edge
between two nodes, predict if there should be a link
▪ Graph-level 𝒚𝐺 . Graph statistics: for example, predict
if two graphs are isomorphic
▪ These tasks do not require any external labels!
Classification: labels 𝒚^(i) with discrete values
▪ E.g., node classification: which category does a node belong to?
Regression: labels 𝒚^(i) with continuous values
▪ E.g., predict the drug likeness of a molecular graph
GNNs can be applied to both settings
Differences: loss function & evaluation metrics
As discussed in Lecture 6, cross entropy (CE) is a very common loss function in classification.
𝐾-way prediction for the 𝑖-th data point:
CE(𝒚^(i), ŷ^(i)) = − Σ_{j=1}^{K} 𝒚_j^(i) log ŷ_j^(i)
where:
▪ 𝒚^(i) ∈ ℝ^K is the one-hot label encoding, e.g., [0 0 1 0 0]
▪ ŷ^(i) ∈ ℝ^K is the prediction, e.g., the output of a Softmax(⋅)
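A minimal PyTorch sketch of the cross-entropy loss for 𝐾-way node classification (tensors are made up):

```python
import torch
import torch.nn.functional as F

logits = torch.randn(8, 5)               # raw predictions for 8 nodes, 5 classes (made-up)
labels = torch.randint(0, 5, (8,))        # class indices (the index of the one-hot 1)

loss = F.cross_entropy(logits, labels)    # applies log-softmax + CE internally
```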
For regression tasks we often use Mean Squared Error (MSE), a.k.a. L2 loss.
𝐾-way regression for the 𝑖-th data point:
MSE(𝒚^(i), ŷ^(i)) = Σ_{j=1}^{K} (𝒚_j^(i) − ŷ_j^(i))²
where:
▪ 𝒚^(i) ∈ ℝ^K is a real-valued vector of targets, e.g., [1.4 2.3 1.0 0.5 0.6]
▪ ŷ^(i) ∈ ℝ^K is a real-valued vector of predictions, e.g., [0.9 2.8 2.0 0.3 0.8]
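A corresponding PyTorch sketch of the MSE loss, reusing the example target and prediction vectors above:

```python
import torch
import torch.nn.functional as F

y = torch.tensor([[1.4, 2.3, 1.0, 0.5, 0.6]])       # targets for one data point
y_hat = torch.tensor([[0.9, 2.8, 2.0, 0.3, 0.8]])   # predictions

loss = F.mse_loss(y_hat, y)   # mean of the squared per-target errors
```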
(4) How do we measure the success of a GNN?
- Accuracy
- ROC AUC
We use standard evaluation metrics for GNNs
▪ (The content below can be found in any ML course)
▪ In practice we will use sklearn for the implementation
▪ Suppose we make predictions for 𝑁 data points
Evaluating regression tasks on graphs:
▪ Root mean square error (RMSE): RMSE = √( (1/N) Σ_{i=1}^{N} (𝒚^(i) − ŷ^(i))² )
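A sketch of RMSE with sklearn (the arrays are made up):

```python
import numpy as np
from sklearn.metrics import mean_squared_error

y_true = np.array([1.2, 0.7, 3.1, 2.0])   # N graph-level regression targets (made-up)
y_pred = np.array([1.0, 0.9, 2.8, 2.4])   # model predictions

rmse = np.sqrt(mean_squared_error(y_true, y_pred))
```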
Evaluate classification tasks on graphs:
(1) Multi-class classification
▪ We simply report the accuracy
(2) Binary classification
▪ Metrics sensitive to the classification threshold (computed from the confusion matrix of TP / TN / FP / FN):
▪ Accuracy = (TP + TN) / (TP + TN + FP + FN)
▪ Precision (P) = TP / (TP + FP)
▪ Recall (R) = TP / (TP + FN)
▪ F1-Score = 2PR / (P + R)
▪ Metric agnostic to the threshold: ROC AUC, the area under the curve of TPR vs. FPR
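A sketch of these metrics with sklearn (labels and scores are made up); note that accuracy, precision, recall, and F1 depend on the 0.5 threshold chosen below, while ROC AUC is computed from the raw scores:

```python
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

y_true = np.array([1, 0, 1, 1, 0, 0])                 # ground-truth labels (made-up)
y_score = np.array([0.9, 0.2, 0.6, 0.4, 0.3, 0.7])    # predicted probabilities
y_pred = (y_score > 0.5).astype(int)                  # threshold-dependent predictions

acc = accuracy_score(y_true, y_pred)
prec = precision_score(y_true, y_pred)
rec = recall_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
auc = roc_auc_score(y_true, y_score)                  # threshold-agnostic
```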
CS224W: Machine Learning with Graphs
Jure Leskovec, Stanford University
http://cs224w.stanford.edu
(5) How do we split our dataset into train / validation / test sets?
Fixed split: We will split our dataset once
▪ Training set: used for optimizing GNN parameters
▪ Validation set: develop model/hyperparameters
▪ Test set: held out until we report final performance
A concern: sometimes we cannot guarantee
that the test set will really be held out
Random split: we will randomly split our
dataset into training / validation / test
▪ We report average performance over different
random seeds
Suppose we want to split an image dataset
▪ Image classification: Each data point is an image
▪ Here data points are independent
▪ Image 5 will not affect our prediction on image 1
[Figure: six images (data points 1–6) assigned to the training, validation, and test sets]
Splitting a graph dataset is different!
▪ Node classification: Each data point is a node
▪ Here data points are NOT independent
▪ Node 5 will affect our prediction on node 1, because it will
participate in message passing → affect node 1’s embedding
[Figure: a 6-node graph with nodes 1–2 in the training set, nodes 3–4 in the validation set, and nodes 5–6 in the test set]
Solution 1 (Transductive setting): The input graph can be observed in all dataset splits; we only split the node labels
▪ At training time, we compute embeddings using the entire graph, and train using node 1&2's labels
▪ At validation time, we compute embeddings using the entire graph, and evaluate on node 3&4's labels
Solution 2 (Inductive setting): We break the edges
between splits to get multiple graphs
▪ Now we have 3 graphs that are independent. Node 5 will
not affect our prediction on node 1 any more
▪ At training time, we compute embeddings using the
graph over node 1&2, and train using node 1&2’s labels
▪ At validation time, we compute embeddings using the
graph over node 3&4, and evaluate on node 3&4’s labels
[Figure: the same graph broken into three independent subgraphs — nodes 1&2 (training), nodes 3&4 (validation), nodes 5&6 (test)]
Transductive setting: training / validation / test
sets are on the same graph
▪ The dataset consists of one graph
▪ The entire graph can be observed in all dataset splits,
we only split the labels
▪ Only applicable to node / edge prediction tasks
Inductive setting: training / validation / test sets
are on different graphs
▪ The dataset consists of multiple graphs
▪ Each split can only observe the graph(s) within the split.
A successful model should generalize to unseen graphs
▪ Applicable to node / edge / graph tasks
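As an illustration, a transductive node split is usually expressed with boolean masks over a single graph (a sketch with made-up sizes; PyG also ships a RandomNodeSplit transform for this, if you prefer a built-in):

```python
import torch

# Sketch of a transductive node split: one graph, all edges visible everywhere,
# only the node labels are partitioned via boolean masks (PyG convention).
num_nodes = 6
perm = torch.randperm(num_nodes)
train_idx, val_idx, test_idx = perm[:2], perm[2:4], perm[4:]

train_mask = torch.zeros(num_nodes, dtype=torch.bool)
val_mask = torch.zeros(num_nodes, dtype=torch.bool)
test_mask = torch.zeros(num_nodes, dtype=torch.bool)
train_mask[train_idx], val_mask[val_idx], test_mask[test_idx] = True, True, True

# At training time the loss is computed only on train_mask nodes,
# but message passing still uses the whole graph, e.g.:
# loss = F.cross_entropy(out[train_mask], y[train_mask])
```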
Transductive node classification
▪ All the splits can observe the entire graph structure, but can only observe the labels of their respective nodes
Inductive node classification
▪ Suppose we have a dataset of 3 graphs
▪ Each split contains an independent graph
Only the inductive setting is well defined for
graph classification
▪ Because we have to test on unseen graphs
▪ Suppose we have a dataset of 5 graphs. Each split
will contain independent graph(s).
Goal of link prediction: predict missing edges
Setting up link prediction is tricky:
▪ Link prediction is an unsupervised / self-supervised
task. We need to create the labels and dataset
splits on our own
▪ Concretely, we need to hide some edges from the GNN and let the GNN predict whether those edges exist
[Figure: Original graph → Input graph to GNN (with some edges removed) → Predictions made by GNN for the missing edges]
Step 1: Assign 2 types of edges in the original graph: message edges (used for GNN message passing) and supervision edges (used for computing objectives, not fed to the GNN)
[Figure: a dataset of 3 graphs — 𝐺1 (nodes 1–5), 𝐺2 (nodes 6–10), 𝐺3 (nodes 11–15) — assigned to the training, validation, and test sets]
Step 2: Split edges into train / validation / test
Option 1: Inductive link prediction split
▪ Suppose we have a dataset of 3 graphs. Each
inductive split will contain an independent graph
▪ In each of the train / validation / test sets, a graph will have 2 types of edges: message edges + supervision edges
▪ Supervision edges are not the input to the GNN
[Figure: 𝐺1 in the training set, 𝐺2 in the validation set, 𝐺3 in the test set; each graph contains both message edges and supervision edges]
Option 2: Transductive link prediction split:
▪ This is the default setting when people talk about
link prediction
▪ Suppose we have a dataset of 1 graph
Option 2: Transductive link prediction split:
▪ By definition of “transductive”, the entire graph can
be observed in all dataset splits
▪ But since edges are both part of graph structure and the
supervision, we need to hold out validation / test edges
▪ To train on the training set, we further need to hold out supervision edges within the training set
[Figure: the original graph, and three copies of it showing which edges are used at training, validation, and test time]
(1) At training time: use training message edges to predict training supervision edges
(2) At validation time: use training message edges & training supervision edges to predict validation edges
(3) At test time: use training message edges & training supervision edges & validation edges to predict test edges
Summary: Transductive link prediction split: the set of edges the GNN can observe grows from training to validation to test time
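In PyG, this kind of transductive link split can be sketched with the RandomLinkSplit transform (assuming `data` is a single-graph Data object; the split ratios below are illustrative):

```python
import torch_geometric.transforms as T

# Each returned Data object carries message-passing edges in `edge_index` and
# held-out supervision edges (plus sampled negatives) in
# `edge_label_index` / `edge_label`.
transform = T.RandomLinkSplit(
    num_val=0.1,
    num_test=0.1,
    is_undirected=True,
    add_negative_train_samples=True,
)
train_data, val_data, test_data = transform(data)
```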
The full GNN training pipeline: Input Graph → Graph Neural Network → Node embeddings → Prediction head → Predictions vs. Labels → Loss function, Evaluation metrics, Dataset split
Implementation resources:
▪ DeepSNAP provides core modules for this pipeline
▪ GraphGym further implements the full pipeline to facilitate GNN design
We introduce a general GNN framework:
▪ GNN Layer:
▪ Transformation + Aggregation
▪ Classic GNN layers: GCN, GraphSAGE, GAT
▪ Layer connectivity:
▪ The over-smoothing problem
▪ Solution: skip connections
▪ Graph Augmentation:
▪ Feature augmentation
▪ Structure augmentation
▪ Learning Objectives
▪ The full training pipeline of a GNN