09 Hetero
material useful for giving your own lectures. Feel free to use these slides verbatim, or to modify
them to fit your own needs. If you make use of a significant portion of these slides in your own
lecture, please include this message, or a link to our web site: http://cs224w.Stanford.edu
10/22/24 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, http://cs224w.stanford.edu 2
¡ Slide pre-viewing
§ We upload the slides the day before the lecture. Please check them out!
CS224W: Machine Learning with Graphs
Jure Leskovec, Stanford University
http://cs224w.stanford.edu
¡ So far we have only handled graphs with one edge type
¡ How do we handle graphs with multiple node or edge types (a.k.a. heterogeneous graphs)?
¡ Goal: Learning with heterogeneous graphs
§ Relational GCNs
§ Design space for heterogeneous GNNs
§ Heterogeneous Graph Transformer (time permitting)
2 types of nodes:
¡ Node type A: Paper nodes
¡ Node type B: Author nodes
2 types of edges:
¡ Edge type A: Like
¡ Edge type B: Cite
A graph could have multiple types of nodes and
edges! 2 types of nodes + 2 types of edges.
A relation type is a (node_start, edge, node_end) triple, so 2 node types × 2 edge types × 2 node types = 8 possible relation types!
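The counting above can be made concrete with a short sketch that enumerates all (node_start, edge, node_end) triples for the paper/author example. Note that in a real graph not all 8 combinations necessarily occur (e.g., an author rarely "cites" an author directly); this is just the upper bound.

```python
from itertools import product

node_types = ["paper", "author"]
edge_types = ["cite", "like"]

# A relation type is a (source node type, edge type, target node type) triple.
relation_types = [
    (src, edge, dst)
    for src, edge, dst in product(node_types, edge_types, node_types)
]

print(len(relation_types))  # 8
```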
¡ Example: E-Commerce Graph
§ Node types: User, Item, Query, Location, ...
§ Edge types: Purchase, Visit, Guide, Search, …
§ Different node types can have different feature spaces!
¡ Example: Academic Graph
§ Node types: Author, Paper, Venue, Field, ...
§ Edge types: Publish, Cite, …
§ Benchmark dataset: Microsoft Academic Graph
¡ Observation: We can also treat the types of nodes and edges as features
§ Example: Add a one-hot type indicator to nodes and edges
§ Append feature [1, 0] to each “author node”; append feature [0, 1] to each “paper node”
§ Similarly, we can assign type-indicator features to edges with different types
§ Then, a heterogeneous graph reduces to a standard graph
¡ When do we need a heterogeneous graph?
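The one-hot reduction can be sketched in a few lines of NumPy. The node counts and the shared 3-dim base feature space below are hypothetical; the point is only that after appending the type indicator, all nodes live in one feature space, so a standard (homogeneous) GNN applies.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy node features (assumed 3-dim for BOTH types here, so that
# appending a type indicator is all that is needed).
author_feats = rng.normal(size=(2, 3))  # 2 author nodes
paper_feats = rng.normal(size=(4, 3))   # 4 paper nodes

# Append a one-hot type indicator: [1, 0] for authors, [0, 1] for papers.
author_aug = np.hstack([author_feats, np.tile([1.0, 0.0], (2, 1))])
paper_aug = np.hstack([paper_feats, np.tile([0.0, 1.0], (4, 1))])

# All nodes now live in one 5-dim feature space -> standard graph.
all_feats = np.vstack([author_aug, paper_aug])
print(all_feats.shape)  # (6, 5)
```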
¡ When do we need a heterogeneous graph?
§ Case 1: Different node/edge types have different shapes of features
§ An “author node” has a 4-dim feature, while a “paper node” has a 5-dim feature
§ Case 2: We know that different relation types represent different types of interactions
§ (English, translate, French) and (English, translate, Chinese) require different models
¡ There are many ways to convert a heterogeneous graph to a standard graph (that is, a homogeneous graph)
¡ Ultimately, a heterogeneous graph is a more expressive graph representation
§ It captures different types of interactions between entities
¡ But it also comes with costs
§ More expensive (computation, storage)
§ More complex implementation
CS224W: Machine Learning with Graphs
Jure Leskovec, Stanford University
http://cs224w.stanford.edu
Kipf and Welling. Semi-Supervised Classification with Graph Convolutional Networks, ICLR 2017
¡ Recall the GCN layer:
$$\mathbf{h}_v^{(l)} = \sigma\left( \mathbf{W}^{(l)} \sum_{u \in N(v)} \frac{\mathbf{h}_u^{(l-1)}}{|N(v)|} \right)$$
§ (1) Message: each neighbor's embedding $\mathbf{h}_u^{(l-1)}$, transformed by $\mathbf{W}^{(l)}$ and normalized by $|N(v)|$
§ (2) Aggregation: sum over the neighbors $u \in N(v)$, then apply $\sigma$
¡ We will extend GCN to handle heterogeneous
graphs with multiple edge/relation types
¡ We start with a directed graph with one relation
§ How do we run GCN and update the representation of
the target node A on this graph?
[Figure: input graph — target node A with neighbors B, C, D, E, F]
¡ What if the graph has multiple relation types?
[Figure: input graph — target node A; edges labeled with relation types $r_1$, $r_2$, $r_3$]
¡ What if the graph has multiple relation types?
¡ Use different neural network weights for
different relation types.
[Figure: the same input graph, now with weights $\mathbf{W}_{r_1}$ for $r_1$, $\mathbf{W}_{r_2}$ for $r_2$, and $\mathbf{W}_{r_3}$ for $r_3$]
¡ What if the graph has multiple relation types?
¡ Use different neural network weights for different relation types!
[Figure: computation graph for target node A — messages from each relation type pass through that relation's own neural network before aggregation]
¡ Recall the GCN layer:
$$\mathbf{h}_v^{(l)} = \sigma\left( \sum_{u \in N(v)} \mathbf{W}^{(l)} \frac{\mathbf{h}_u^{(l-1)}}{|N(v)|} \right)$$
¡ We add a self-loop:
$$\mathbf{h}_v^{(l)} = \sigma\left( \sum_{u \in N(v)} \mathbf{W}^{(l)} \frac{\mathbf{h}_u^{(l-1)}}{|N(v)|} + \mathbf{W}_0^{(l)} \mathbf{h}_v^{(l-1)} \right)$$
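A minimal NumPy sketch of this GCN update with the self-loop term, assuming a dense adjacency matrix with the convention A[v, u] = 1 if u is a neighbor of v, and ReLU standing in for σ (all names and sizes below are hypothetical):

```python
import numpy as np

def gcn_layer(A, H, W, W0):
    """One GCN-style update with a self-loop term:
    h_v = ReLU( sum_{u in N(v)} W h_u / |N(v)|  +  W0 h_v ).
    A: (n, n) adjacency with A[v, u] = 1 if u -> v; H: (n, d_in)."""
    deg = np.maximum(A.sum(axis=1, keepdims=True), 1.0)  # |N(v)|, avoid /0
    neighbor_avg = (A @ H) / deg                         # mean over neighbors
    return np.maximum(neighbor_avg @ W.T + H @ W0.T, 0.0)

rng = np.random.default_rng(0)
n, d_in, d_out = 5, 4, 3
A = (rng.random((n, n)) < 0.5).astype(float)
H = rng.normal(size=(n, d_in))
W = rng.normal(size=(d_out, d_in))
W0 = rng.normal(size=(d_out, d_in))
H_next = gcn_layer(A, H, W, W0)
print(H_next.shape)  # (5, 3)
```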
¡ Introduce a set of neural networks for each
relation type!
¡ Relational GCN (RGCN):
$$\mathbf{h}_v^{(l+1)} = \sigma\left( \sum_{r \in R} \sum_{u \in N_v^r} \frac{1}{c_{v,r}} \mathbf{W}_r^{(l)} \mathbf{h}_u^{(l)} + \mathbf{W}_0^{(l)} \mathbf{h}_v^{(l)} \right)$$
¡ How to write this as Message + Aggregation?
¡ Message:
§ Each neighbor of a given relation, normalized by the node degree of the relation $c_{v,r} = |N_v^r|$:
$$\mathbf{m}_{u,r}^{(l)} = \frac{1}{c_{v,r}} \mathbf{W}_r^{(l)} \mathbf{h}_u^{(l)}$$
§ Self-loop:
$$\mathbf{m}_v^{(l)} = \mathbf{W}_0^{(l)} \mathbf{h}_v^{(l)}$$
¡ Aggregation:
§ Sum over messages from neighbors and the self-loop, then apply activation:
$$\mathbf{h}_v^{(l+1)} = \sigma\left( \mathrm{Sum}\left( \{ \mathbf{m}_{u,r}^{(l)}, u \in N(v) \} \cup \{ \mathbf{m}_v^{(l)} \} \right) \right)$$
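The RGCN update above can be sketched densely in NumPy: one weight matrix per relation, a per-relation degree normalization $c_{v,r}$, and a self-loop term. This is a toy illustration (random hypothetical adjacencies, ReLU for σ), not an efficient sparse implementation.

```python
import numpy as np

def rgcn_layer(adj_per_rel, H, W_rel, W0):
    """RGCN update (a minimal dense sketch):
    h_v = ReLU( sum_r sum_{u in N_v^r} W_r h_u / c_{v,r}  +  W0 h_v ),
    with c_{v,r} = |N_v^r|.
    adj_per_rel: dict relation -> (n, n) adjacency, A[v, u] = 1 if u -> v."""
    out = H @ W0.T                                        # self-loop messages
    for rel, A in adj_per_rel.items():
        c = np.maximum(A.sum(axis=1, keepdims=True), 1.0) # c_{v,r} per node
        out = out + ((A @ H) / c) @ W_rel[rel].T          # relation-r messages
    return np.maximum(out, 0.0)                           # ReLU activation

rng = np.random.default_rng(0)
n, d_in, d_out = 6, 4, 3
H = rng.normal(size=(n, d_in))
adj = {r: (rng.random((n, n)) < 0.4).astype(float) for r in ["r1", "r2", "r3"]}
W_rel = {r: rng.normal(size=(d_out, d_in)) for r in adj}
W0 = rng.normal(size=(d_out, d_in))
H_next = rgcn_layer(adj, H, W_rel, W0)
print(H_next.shape)  # (6, 3)
```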
¡ Each relation has $L$ matrices: $\mathbf{W}_r^{(1)}, \mathbf{W}_r^{(2)}, \cdots, \mathbf{W}_r^{(L)}$
¡ The size of each $\mathbf{W}_r^{(l)}$ is $d^{(l+1)} \times d^{(l)}$, where $d^{(l)}$ is the hidden dimension in layer $l$
[Figure: $\mathbf{W}_r$ as a block-diagonal matrix]
§ Limitation: only nearby neurons/dimensions can interact through $\mathbf{W}$
¡ What are the options for $f(\mathbf{h}_u^{(L)}, \mathbf{h}_v^{(L)})$?
¡ Options for $f(\mathbf{h}_u^{(L)}, \mathbf{h}_v^{(L)})$:
¡ Dot product
§ $\hat{y}_{uv} = (\mathbf{h}_u^{(L)})^T \mathbf{h}_v^{(L)}$
§ This approach only applies to 1-way prediction (e.g., link prediction: predict the existence of an edge)
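The dot-product head is a one-liner; the sketch below only emphasizes that it produces a single scalar per node pair, which is why it is restricted to 1-way prediction (the embeddings are hypothetical):

```python
import numpy as np

def score_dot(h_u, h_v):
    """Dot-product head: y_uv = (h_u)^T h_v. A single scalar per pair,
    so it supports only 1-way prediction such as link existence."""
    return float(h_u @ h_v)

h_u = np.array([1.0, 2.0, 0.0])
h_v = np.array([0.5, 1.0, 3.0])
print(score_dot(h_u, h_v))  # 2.5
```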
¡ Transductive link prediction split:
[Figure: the original graph (nodes 1–5) and three copies of it, one per split (training / validation / test)]
[Figure: input graph with relation types $r_1$, $r_2$, $r_3$; one $r_3$ edge highlighted]
¡ Training:
1. Use RGCN to score the training supervision edge $(E, r_3, A)$
2. Create a negative edge by perturbing the supervision edge, e.g., $(E, r_3, B)$
• Corrupt the tail of $(E, r_3, A)$
• e.g., $(E, r_3, B)$, $(E, r_3, D)$
[Figure: input graph with the supervision edge $(E, r_3, A)$ highlighted]
§ $\sigma$: sigmoid function
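The tail-corruption step can be sketched as follows. This is a toy version: it only filters out existing edges and the true tail, and samples one corrupted tail uniformly (a real pipeline would sample many negatives per positive edge).

```python
import random

def corrupt_tail(edge, candidate_tails, existing_edges, rng=random):
    """Create a negative edge by replacing the tail of (head, rel, tail)
    with a node that does not form an existing edge."""
    head, rel, tail = edge
    choices = [v for v in candidate_tails
               if v != tail and (head, rel, v) not in existing_edges]
    return (head, rel, rng.choice(choices))

existing = {("E", "r3", "A")}
neg = corrupt_tail(("E", "r3", "A"), ["A", "B", "C", "D"], existing)
print(neg[2] in {"B", "C", "D"})  # True
```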
¡ Evaluation:
§ Validation time as an example; same at the test time
§ Evaluate how well the model predicts the validation edges with their relation types
§ Let's predict validation edge $(E, r_3, D)$
§ Intuition: the score of $(E, r_3, D)$ should be higher than that of all $(E, r_3, v)$ where $(E, r_3, v)$ is NOT in the training message edges and training supervision edges, e.g., $(E, r_3, B)$
[Figure: input graph; the dashed edge $(E, r_3, D)$ is the validation edge to predict]
§ Validation edges: $(E, r_3, D)$
§ Training message edges & training supervision edges: all existing edges (solid lines)
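This intuition leads to a filtered ranking metric: rank the true tail among all candidate tails, excluding candidates that already form training edges. A minimal sketch with hypothetical scores (higher = more plausible):

```python
def tail_rank(scores, true_tail, filtered_out):
    """Rank of the true tail among all candidates, excluding candidates
    that already appear as training edges ('filtered' ranking)."""
    true_score = scores[true_tail]
    rank = 1
    for v, s in scores.items():
        if v == true_tail or v in filtered_out:
            continue
        if s > true_score:
            rank += 1
    return rank

# Hypothetical scores for (E, r3, ?).
scores = {"A": 0.9, "B": 0.8, "C": 0.2, "D": 0.7}
# A and B already form training edges (E, r3, .), so they are filtered out.
print(tail_rank(scores, "D", filtered_out={"A", "B"}))  # 1
```

A rank of 1 here means $(E, r_3, D)$ beats every remaining candidate, i.e., a Hits@1 success for this validation edge.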
¡ Benchmark dataset
§ ogbn-mag from Microsoft Academic Graph (MAG)
¡ Four (4) types of entities
§ Papers: 736k nodes
§ Authors: 1.1m nodes
§ Institutions: 9k nodes
§ Fields of study: 60k nodes
Wang et al. Microsoft academic graph: When experts are not enough. Quantitative Science Studies 2020.
¡ Benchmark dataset
§ ogbn-mag from Microsoft Academic Graph (MAG)
¡ Four (4) directed relations
§ An author is "affiliated with" an institution
§ An author "writes" a paper
§ A paper "cites" a paper
§ A paper "has a topic of" a field of study
¡ Prediction task
§ Each paper has a 128-dimensional word2vec feature vector
§ Given the content, references, authors, and author affiliations
from ogbn-mag, predict the venue of each paper
§ A 349-class classification problem, since 349 venues are considered
¡ Time-based dataset splitting
§ Training set: papers published before 2018
§ Test set: papers published after 2018
¡ Benchmark results:
[Figure: ogbn-mag leaderboard — R-GCN accuracy compared with the SOTA method]
¡ Relational GCN, a graph neural network for
heterogeneous graphs
[Figure: the general GNN design space — (1) Message, (2) Aggregation, (3) Layer connectivity across GNN layers — together with the computation graph for target node A on the input graph]
¡ (1) Heterogeneous message computation
§ Message function: $\mathbf{m}_u^{(l)} = \mathrm{MSG}^{(l)}(\mathbf{h}_u^{(l-1)})$
§ Observation: A node could receive multiple types of messages. Num of message types = Num of relation types
§ Idea: Create a different message function for each relation type:
§ $\mathbf{m}_u^{(l)} = \mathrm{MSG}_r^{(l)}(\mathbf{h}_u^{(l-1)})$, where $r = (u, e, v)$ is the relation type between node $u$ that sends the message, edge type $e$, and node $v$ that receives the message
§ Example: a linear layer $\mathbf{m}_u^{(l)} = \mathbf{W}_r^{(l)} \mathbf{h}_u^{(l-1)}$
¡ (2) Aggregation
§ Intuition: Node $v$ aggregates the messages from its neighbors:
$$\mathbf{h}_v^{(l)} = \mathrm{AGG}^{(l)}\left( \{ \mathbf{m}_u^{(l)}, u \in N(v) \} \right)$$
§ Example: $\mathrm{Sum}(\cdot)$, $\mathrm{Mean}(\cdot)$ or $\mathrm{Max}(\cdot)$ aggregator
§ $\mathbf{h}_v^{(l)} = \mathrm{Sum}(\{ \mathbf{m}_u^{(l)}, u \in N(v) \})$
[Figure: computation graph for target node A — (1) Message, (2) Aggregation]
¡ (2) Heterogeneous aggregation
§ Observation: Each node could receive multiple types of messages from its neighbors, and multiple neighbors may belong to each message type.
§ Idea: We can define a 2-stage message passing:
$$\mathbf{h}_v^{(l)} = \mathrm{AGG}_{all}^{(l)}\left( \mathrm{AGG}_r^{(l)}\left( \{ \mathbf{m}_u^{(l)}, u \in N_r(v) \} \right) \right)$$
§ Given all the messages sent to a node:
§ Within each message type, aggregate the messages that belong to that relation type with $\mathrm{AGG}_r^{(l)}$
§ Aggregate across the edge types with $\mathrm{AGG}_{all}^{(l)}$
§ Example: $\mathbf{h}_v^{(l)} = \mathrm{Concat}\left( \mathrm{Sum}\left( \{ \mathbf{m}_u^{(l)}, u \in N_r(v) \} \right) \right)$
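The Sum-within / Concat-across example can be sketched as below. The relation names and message dimensions are hypothetical; note that Concat requires a fixed relation ordering so that the output dimension is stable across nodes.

```python
import numpy as np

def hetero_aggregate(messages_per_rel):
    """2-stage aggregation: Sum within each relation type,
    then Concat across relation types (fixed, sorted relation order)."""
    per_rel = [np.sum(msgs, axis=0)
               for _, msgs in sorted(messages_per_rel.items())]
    return np.concatenate(per_rel)

msgs = {
    "cite": np.ones((3, 4)),      # 3 neighbors via 'cite', 4-dim messages
    "like": 2 * np.ones((2, 4)),  # 2 neighbors via 'like'
}
h_v = hetero_aggregate(msgs)
print(h_v.shape)  # (8,)
```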
¡ (3) Layer connectivity
§ Add skip connections, pre/post-process layers
¡ Heterogeneous pre/post-process layers:
§ MLP layers with respect to each node type
§ Since the output of the GNN is node embeddings:
§ $\mathbf{h}_v^{(l)} = \mathrm{MLP}_{T(v)}(\mathbf{h}_v^{(l)})$, where $T(v)$ is the type of node $v$
¡ Other successful GNN designs are also encouraged for heterogeneous GNNs: skip connections, batch/layer normalization, …
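A minimal sketch of per-type pre-processing, with a single linear map standing in for the MLP. The node types `author` (4-dim) and `paper` (5-dim) and the shared 3-dim output space are hypothetical; the point is that each type gets its own weights, mapping different input dims to one common embedding space.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-type input dims mapped to one shared hidden dim.
dims = {"author": 4, "paper": 5}
hidden = 3
W_type = {t: rng.normal(size=(hidden, d)) for t, d in dims.items()}

def preprocess(node_type, x):
    """Per-node-type 'MLP' (one linear layer as a sketch): h_v = W_{T(v)} x_v."""
    return W_type[node_type] @ x

print(preprocess("author", np.ones(4)).shape)  # (3,)
print(preprocess("paper", np.ones(5)).shape)   # (3,)
```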
¡ Graph Feature manipulation
§ The input graph lacks features → feature augmentation
¡ Graph Structure manipulation
§ The graph is too sparse → add virtual nodes / edges
§ The graph is too dense → sample neighbors when doing message passing
§ The graph is too large → sample subgraphs to compute embeddings
§ Will cover later in the lecture: Scaling up GNNs
¡ Graph Feature manipulation
§ 2 Common options: compute graph statistics (e.g.,
node degree) within each relation type, or across the
full graph (ignoring the relation types)
¡ Graph Structure manipulation
§ Neighbor and subgraph sampling are also common
for heterogeneous graphs.
§ 2 common options: sampling within each relation type (to ensure neighbors from each type are covered), or sampling across the full graph
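The per-relation sampling option can be sketched in a few lines of stdlib Python. The relation names and neighbor lists are hypothetical; capping at `k` neighbors per relation type keeps every relation represented after sampling.

```python
import random

def sample_neighbors_per_relation(neighbors_per_rel, k, rng=random):
    """Sample up to k neighbors within each relation type, so every
    relation type stays represented after sampling."""
    return {
        rel: rng.sample(nbrs, min(k, len(nbrs)))
        for rel, nbrs in neighbors_per_rel.items()
    }

nbrs = {"cite": ["p1", "p2", "p3"], "write": ["a1"]}
sampled = sample_neighbors_per_relation(nbrs, k=2)
print(len(sampled["cite"]), len(sampled["write"]))  # 2 1
```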
Node-level prediction:
¡ $\hat{\mathbf{y}}_v = \mathrm{Head}_{node}(\mathbf{h}_v^{(L)}) = \mathbf{W}^{(H)} \mathbf{h}_v^{(L)}$
Edge-level prediction:
¡ $\hat{\mathbf{y}}_{uv} = \mathrm{Head}_{edge}(\mathbf{h}_u^{(L)}, \mathbf{h}_v^{(L)}) = \mathrm{Linear}(\mathrm{Concat}(\mathbf{h}_u^{(L)}, \mathbf{h}_v^{(L)}))$
Graph-level prediction:
¡ $\hat{\mathbf{y}}_G = \mathrm{Head}_{graph}(\{ \mathbf{h}_v^{(L)} \in \mathbb{R}^d, \forall v \in G \})$
Node-level prediction:
¡ $\hat{\mathbf{y}}_v = \mathrm{Head}_{node, T(v)}(\mathbf{h}_v^{(L)}) = \mathbf{W}_{T(v)}^{(H)} \mathbf{h}_v^{(L)}$
Edge-level prediction:
¡ $\hat{\mathbf{y}}_{uv} = \mathrm{Head}_{edge, r}(\mathbf{h}_u^{(L)}, \mathbf{h}_v^{(L)}) = \mathrm{Linear}_r(\mathrm{Concat}(\mathbf{h}_u^{(L)}, \mathbf{h}_v^{(L)}))$
Graph-level prediction:
¡ $\hat{\mathbf{y}}_G = \mathrm{AGG}(\mathrm{Head}_{graph, i}(\{ \mathbf{h}_v^{(L)} \in \mathbb{R}^d, \forall T(v) = i \}))$
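The node-level case above can be sketched with per-type head weights $\mathbf{W}_{T(v)}^{(H)}$. The node types, the embedding dim, and the 3-class output are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_classes = 4, 3

# Hypothetical per-node-type weights W_{T(v)} for the node-level head.
W_head = {t: rng.normal(size=(n_classes, d)) for t in ["author", "paper"]}

def node_head(node_type, h):
    """Node-level head with node-type-specific weights: y_v = W_{T(v)} h_v."""
    return W_head[node_type] @ h

print(node_head("paper", np.ones(d)).shape)  # (3,)
```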
CS224W: Machine Learning with Graphs
Jure Leskovec, Stanford University
http://cs224w.stanford.edu
¡ Graph Attention Networks (GAT)
$$\mathbf{h}_v^{(l)} = \sigma\left( \sum_{u \in N(v)} \alpha_{vu} \mathbf{W}^{(l)} \mathbf{h}_u^{(l-1)} \right)$$
§ $\alpha_{vu}$: attention weights
Hu et al. Heterogeneous Graph Transformer. WWW 2020.
¡ Mutual Attention:
[Figure: Heterogeneous Graph Transformer attention — node-type-specific Q-Linear and K-Linear projections (e.g., Q-Linear for Paper, K-Linear for Paper and Author) combined with edge-type-specific attention for relations such as Write and Cite]
[Figure: the GNN layer design space recap — (1) Message and (2) Aggregation within each GNN layer, (3) Layer connectivity between layers]