
Note to other teachers and users of these slides: We would be delighted if you found our

material useful for giving your own lectures. Feel free to use these slides verbatim, or to modify
them to fit your own needs. If you make use of a significant portion of these slides in your own
lecture, please include this message, or a link to our web site: http://cs224w.Stanford.edu

CS224W: Machine Learning with Graphs


Charilaos Kanatsoulis and Jure Leskovec, Stanford
University
http://cs224w.stanford.edu
¡ Project Proposal due today
§ Gradescope submissions close at 11:59 PM
¡ Colab 2 due this Thursday
¡ Homework 2: UPDATED + NEW DUE DATE
§ HW2 Problem 4 has been removed
§ Updated Due Date: Monday Nov 4th, 2024

¡ Slide pre-viewing
We upload the slides the day before the lecture.
Please check it out!

CS224W: Machine Learning with Graphs
Jure Leskovec, Stanford University
http://cs224w.stanford.edu
¡ So far we have only handled graphs with a single edge type
¡ How do we handle graphs with multiple node or edge types (a.k.a. heterogeneous graphs)?
¡ Goal: Learning with heterogeneous graphs
§ Relational GCNs
§ Design space for heterogeneous GNNs
§ Heterogeneous Graph Transformer (Time
permitting)

2 types of nodes:
¡ Node type A: Paper nodes
¡ Node type B: Author nodes
2 types of edges:
¡ Edge type A: Like
¡ Edge type B: Cite
A graph could have multiple types of nodes and
edges! 2 types of nodes + 2 types of edges.

8 possible relation types!

(Paper, Cite, Paper) (Author, Cite, Author)

(Paper, Like, Paper) (Author, Like, Author)

(Paper, Cite, Author) (Author, Cite, Paper)

(Paper, Like, Author) (Author, Like, Paper)

Relation types: (node_start, edge, node_end)


¡ We use relation type to describe an edge (as
opposed to edge type)
¡ Relation type better captures the interaction
between nodes and edges
¡ A heterogeneous graph is defined as $G = (V, E, \tau, \phi)$
§ Nodes with node types: $v \in V$
§ Node type for node $v$: $\tau(v)$
§ Edges with edge types: $(u, v) \in E$ (an edge can be described as a pair of nodes)
§ Edge type for edge $(u, v)$: $\phi(u, v)$
§ Relation type for edge $(u, v)$ is a tuple: $r(u, v) = (\tau(u), \phi(u, v), \tau(v))$
¡ There are other definitions for heterogeneous graphs as well – they all describe graphs with node & edge types
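To make the definition concrete, here is a minimal sketch of one way to store such a graph in code. This is a hand-rolled container, not a standard library API; the type names, feature dimensions, and edge lists are all illustrative.

```python
import torch

# Node features are stored per node type; edges are stored per relation type
# (node_start, edge_type, node_end), mirroring r(u, v) = (tau(u), phi(u, v), tau(v)).
node_features = {
    "author": torch.randn(3, 4),   # 3 author nodes with 4-dim features
    "paper":  torch.randn(2, 5),   # 2 paper nodes with 5-dim features
}
edge_index = {
    # (tau(u), phi(u, v), tau(v)) -> [source ids; target ids]
    ("author", "writes", "paper"): torch.tensor([[0, 1, 2], [0, 0, 1]]),
    ("paper",  "cites",  "paper"): torch.tensor([[0], [1]]),
}
```

Here the node type map $\tau(v)$ is simply the dictionary key a node lives under, and the relation type is the 3-tuple key of `edge_index`.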
Biomedical Knowledge Graphs:
§ Example node: Migraine
§ Example relation: (fulvestrant, Treats, Breast Neoplasms)
§ Example node type: Protein
§ Example edge type: Causes

Event Graphs:
§ Example node: SFO
§ Example relation: (UA689, Origin, LAX)
§ Example node type: Flight
§ Example edge type: Destination
¡ Example: E-Commerce Graph
§ Node types: User, Item, Query, Location, ...
§ Edge types: Purchase, Visit, Guide, Search, …
§ Different node types can have different feature spaces!

¡ Example: Academic Graph
§ Node types: Author, Paper, Venue, Field, ...
§ Edge types: Publish, Cite, …
§ Benchmark dataset: Microsoft Academic Graph

¡ Observation: We can also treat the types of nodes and edges as features
§ Example: add a one-hot type indicator to nodes and edges
§ Append feature [1, 0] to each “author node”; append feature [0, 1] to each “paper node”
§ Similarly, we can assign type-indicating features to edges of different types
§ Then, a heterogeneous graph reduces to a standard graph (see the sketch below)
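A minimal sketch of this reduction, assuming both node types already share the same base feature dimension (otherwise one would pad first); all shapes are illustrative:

```python
import torch

# Append one-hot type indicators: [1, 0] for authors, [0, 1] for papers.
author_x = torch.randn(3, 4)   # 3 authors, 4-dim features
paper_x  = torch.randn(2, 4)   # 2 papers, 4-dim features

author_x = torch.cat([author_x, torch.tensor([[1., 0.]]).expand(3, 2)], dim=1)
paper_x  = torch.cat([paper_x,  torch.tensor([[0., 1.]]).expand(2, 2)], dim=1)

x = torch.cat([author_x, paper_x], dim=0)   # one homogeneous (5, 6) node set
```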
¡ When do we need a heterogeneous graph?
§ Case 1: Different node/edge types have different shapes of features
§ E.g., an “author node” has a 4-dim feature, while a “paper node” has a 5-dim feature
§ Case 2: We know that different relation types represent different types of interactions
§ E.g., (English, translate, French) and (English, translate, Chinese) require different models
¡ There are many ways to convert a heterogeneous graph to a standard graph (that is, a homogeneous graph)
¡ Ultimately, a heterogeneous graph is a more expressive graph representation
§ Captures different types of interactions between entities
¡ But it also comes with costs
§ More expensive (computation, storage)
§ More complex implementation

CS224W: Machine Learning with Graphs
Jure Leskovec, Stanford University
http://cs224w.stanford.edu
Kipf and Welling. Semi-Supervised Classification with Graph Convolutional Networks, ICLR 2017

¡ (1) Graph Convolutional Networks (GCN)

$$\mathbf{h}_v^{(l)} = \sigma\left(\mathbf{W}^{(l)} \sum_{u \in N(v)} \frac{\mathbf{h}_u^{(l-1)}}{|N(v)|}\right)$$

¡ How to write this as Message + Aggregation?

$$\mathbf{h}_v^{(l)} = \sigma\Bigg(\underbrace{\sum_{u \in N(v)}}_{\text{(2) Aggregation}} \underbrace{\mathbf{W}^{(l)} \frac{\mathbf{h}_u^{(l-1)}}{|N(v)|}}_{\text{(1) Message}}\Bigg)$$
¡ We will extend GCN to handle heterogeneous
graphs with multiple edge/relation types
¡ We start with a directed graph with one relation
§ How do we run GCN and update the representation of
the target node A on this graph?

[Figure: directed input graph with target node A and its neighbors B–F]
§ Key idea: only pass messages along the direction of the edges
[Figure: the resulting computation graph for target node A]
¡ What if the graph has multiple relation types?
[Figure: input graph whose edges carry relation types $r_1$, $r_2$, $r_3$]
¡ Use different neural network weights for different relation types!
[Figure: weights $\mathbf{W}_{r_1}$ for $r_1$, $\mathbf{W}_{r_2}$ for $r_2$, $\mathbf{W}_{r_3}$ for $r_3$ applied to the input graph]
[Figure: computation graph for target node A — relation-specific neural networks transform each neighbor's message, followed by aggregation]
Kipf and Welling. Semi-Supervised Classification with Graph Convolutional Networks, ICLR 2017

¡ (1) Graph Convolutional Networks (GCN)

$$\mathbf{h}_v^{(l)} = \sigma\left(\sum_{u \in N(v)} \mathbf{W}^{(l)} \frac{\mathbf{h}_u^{(l-1)}}{|N(v)|}\right)$$

¡ We add a self-loop:

$$\mathbf{h}_v^{(l)} = \sigma\left(\sum_{u \in N(v)} \mathbf{W}^{(l)} \frac{\mathbf{h}_u^{(l-1)}}{|N(v)|} + \mathbf{W}_0^{(l)} \mathbf{h}_v^{(l-1)}\right)$$
¡ Introduce a set of neural networks for each relation type!
[Figure: one weight matrix per relation (rel_1 … rel_N), plus a weight for the self-loop]
¡ Relational GCN (RGCN):

$$\mathbf{h}_v^{(l+1)} = \sigma\left(\sum_{r \in R} \sum_{u \in N_v^r} \frac{1}{c_{v,r}} \mathbf{W}_r^{(l)} \mathbf{h}_u^{(l)} + \mathbf{W}_0^{(l)} \mathbf{h}_v^{(l)}\right)$$

¡ How to write this as Message + Aggregation?
¡ Message:
§ Each neighbor of a given relation, normalized by the node degree of the relation $c_{v,r} = |N_v^r|$:

$$\mathbf{m}_{u,r}^{(l)} = \frac{1}{c_{v,r}} \mathbf{W}_r^{(l)} \mathbf{h}_u^{(l)}$$

§ Self-loop:

$$\mathbf{m}_v^{(l)} = \mathbf{W}_0^{(l)} \mathbf{h}_v^{(l)}$$

¡ Aggregation:
§ Sum over messages from neighbors and self-loop, then apply activation:

$$\mathbf{h}_v^{(l+1)} = \sigma\left(\mathrm{Sum}\left(\left\{\mathbf{m}_{u,r}^{(l)}, u \in N(v)\right\} \cup \left\{\mathbf{m}_v^{(l)}\right\}\right)\right)$$

(a minimal layer sketch follows)
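A minimal PyTorch sketch of one RGCN layer, assuming dense per-relation adjacency matrices and ReLU as the nonlinearity; all names are illustrative, not an official implementation:

```python
import torch
import torch.nn as nn

class RGCNLayer(nn.Module):
    """Minimal RGCN layer sketch with dense per-relation adjacency matrices."""
    def __init__(self, in_dim, out_dim, num_relations):
        super().__init__()
        self.rel_weights = nn.ModuleList(
            [nn.Linear(in_dim, out_dim, bias=False) for _ in range(num_relations)])
        self.self_weight = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, h, adj_per_rel):
        out = self.self_weight(h)                              # W_0 h_v (self-loop)
        for W_r, A_r in zip(self.rel_weights, adj_per_rel):
            deg = A_r.sum(dim=1, keepdim=True).clamp(min=1.0)  # c_{v,r} = |N_v^r|
            out = out + (A_r @ W_r(h)) / deg                   # normalized sum over N_v^r
        return torch.relu(out)                                 # sigma = ReLU here

layer = RGCNLayer(in_dim=4, out_dim=8, num_relations=2)
h = torch.randn(5, 4)                                          # 5 nodes
adjs = [torch.bernoulli(torch.full((5, 5), 0.3)) for _ in range(2)]
h_next = layer(h, adjs)                                        # (5, 8)
```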
" # $
¡ Each relation has 𝐿 matrices: 𝐖! , 𝐖! ⋯ 𝐖!
%
¡ The size of each 𝐖! is 𝑑 (%'") ×𝑑 (%) 𝑑 is the hidden (")

dimension in layer 𝑙

¡ Rapid growth of the number of parameters w.r.t


number of relations!
§ Overfitting becomes an issue
(𝒍)
¡ Two methods to regularize the weights 𝐖𝒓
§ (1) Use block diagonal matrices
§ (2) Basis/Dictionary learning
10/22/24 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, http://cs224w.stanford.edu 28
¡ Key insight: make the weights sparse!
¡ Use block diagonal matrices for $\mathbf{W}_r$
[Figure: $\mathbf{W}_r$ drawn as a block diagonal matrix]
§ Limitation: only nearby neurons/dimensions can interact through $\mathbf{W}_r$
¡ If we use $B$ blocks, then the number of parameters reduces from $d^{(l+1)} \times d^{(l)}$ to $B \times \frac{d^{(l+1)}}{B} \times \frac{d^{(l)}}{B}$ (see the sketch below)
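A sketch of the block-diagonal parameterization for one relation's weight, with made-up dimensions:

```python
import torch
import torch.nn as nn

# With B blocks, parameters drop from d_in * d_out to B * (d_in/B) * (d_out/B).
# Only dimensions within the same block can interact.
B, d_in, d_out = 4, 64, 64
blocks = nn.Parameter(torch.randn(B, d_in // B, d_out // B))

def block_diag_matmul(h):
    h_blocks = h.view(-1, B, d_in // B)                    # split into B groups
    out = torch.einsum('nbi,bio->nbo', h_blocks, blocks)   # per-block matmul
    return out.reshape(-1, d_out)

h = torch.randn(10, d_in)
print(block_diag_matmul(h).shape)   # torch.Size([10, 64])
```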
¡ Key insight: share weights across different relations!
¡ Represent the matrix of each relation as a linear combination of basis transformations:
$$\mathbf{W}_r = \sum_{b=1}^{B} a_{rb} \cdot \mathbf{V}_b,$$
where $\mathbf{V}_b$ is shared across all relations
§ $\mathbf{V}_b$ are the basis matrices
§ $a_{rb}$ is the importance weight of matrix $\mathbf{V}_b$
¡ Now each relation only needs to learn $\{a_{rb}\}_{b=1}^{B}$, which is $B$ scalars (a sketch follows)
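A minimal sketch of basis decomposition, with illustrative sizes:

```python
import torch
import torch.nn as nn

# Each relation's weight is a learned mixture of B shared basis matrices,
# so each extra relation only adds B scalars instead of a full d x d matrix.
num_relations, B, d = 8, 3, 16
V = nn.Parameter(torch.randn(B, d, d))           # shared basis matrices V_b
a = nn.Parameter(torch.randn(num_relations, B))  # per-relation weights a_rb

def weight_for_relation(r):
    # W_r = sum_b a_rb * V_b
    return torch.einsum('b,bij->ij', a[r], V)

W_2 = weight_for_relation(2)   # (16, 16) weight matrix for relation 2
```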
¡ Goal: predict the label of a given node
¡ RGCN uses the representation of the final layer:
§ If we predict the class of node $A$ from $k$ classes
§ Take the final layer (prediction head): $\mathbf{h}_A^{(L)} \in \mathbb{R}^k$; each entry of $\mathbf{h}_A^{(L)}$ represents the probability of the corresponding class
[Figure: input graph with target node A and relation types $r_1$, $r_2$, $r_3$]
¡ Link prediction: make predictions using pairs of node embeddings

$$\hat{y}_{uv} = f\left(\mathbf{h}_u^{(L)}, \mathbf{h}_v^{(L)}\right)$$

¡ What are the options for $f\left(\mathbf{h}_u^{(L)}, \mathbf{h}_v^{(L)}\right)$?
¡ Options for $f\left(\mathbf{h}_u^{(L)}, \mathbf{h}_v^{(L)}\right)$:
¡ Dot product
§ $\hat{y}_{uv} = \left(\mathbf{h}_u^{(L)}\right)^{\top} \mathbf{h}_v^{(L)}$
§ This approach only applies to 1-way prediction (e.g., link prediction: predicting the existence of an edge)
¡ Transductive link prediction split:
[Figure: the original graph and its three copies used at (1) training, (2) validation, and (3) test time]
§ (1) At training time: use training message edges to predict training supervision edges
§ (2) At validation time: use training message edges & training supervision edges to predict validation edges
§ (3) At test time: use training message edges & training supervision edges & validation edges to predict test edges
¡ Link prediction split: every edge also has a relation type; the relation type is independent of the 4 edge categories (training message, training supervision, validation, test)
§ In a heterogeneous graph, the homogeneous graph formed by each single relation also has the 4 splits: training message edges for $r_1$, training supervision edges for $r_1$, validation edges for $r_1$, test edges for $r_1$, …, and likewise for every relation up to $r_n$
[Figure: the original graph, split into the 4 categories of edges]
¡ Assume $(E, r_3, A)$ is the training supervision edge and all the other edges are training message edges
¡ Use RGCN to score $(E, r_3, A)$!
§ Take the final-layer embeddings of $E$ and $A$: $\mathbf{h}_E^{(L)}, \mathbf{h}_A^{(L)} \in \mathbb{R}^d$
§ Relation-specific score function: $f_r: \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$
§ One example: $f_{r_3}(\mathbf{h}_E, \mathbf{h}_A) = \mathbf{h}_E^{\top} \mathbf{W}_{r_3} \mathbf{h}_A$, with $\mathbf{W}_{r_3} \in \mathbb{R}^{d \times d}$ (a scoring sketch follows)
[Figure: input graph with supervision edge $(E, r_3, A)$ and relation types $r_1$, $r_2$, $r_3$]
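A sketch of the relation-specific bilinear score; in a real model the embeddings come from the RGCN's final layer, here they are random stand-ins:

```python
import torch

# Relation-specific bilinear scoring: f_r3(h_E, h_A) = h_E^T W_r3 h_A.
d = 16
W_r3 = torch.randn(d, d)                     # learnable in a real model
h_E, h_A = torch.randn(d), torch.randn(d)    # final-layer node embeddings

score = h_E @ W_r3 @ h_A                     # scalar score for (E, r3, A)
```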
¡ Training:
1. Use RGCN to score the training supervision edge $(E, r_3, A)$
2. Create a negative edge by perturbing the supervision edge: corrupt the tail of $(E, r_3, A)$, e.g., $(E, r_3, B)$ or $(E, r_3, D)$
§ Note: the negative edges should NOT belong to the training message edges or the training supervision edges! E.g., $(E, r_3, C)$ is NOT a valid negative edge
[Figure: input graph — training supervision edge $(E, r_3, A)$; training message edges are all the remaining existing edges (solid lines)]
§ (1) Use training message edges to predict training supervision edges
¡ Training (continued):
3. Use the GNN model to score the negative edge
4. Optimize a standard cross-entropy loss (as discussed in Lecture 6):
§ Maximize the score of the training supervision edge
§ Minimize the score of the negative edge

$$\ell = -\log \sigma\left(f_{r_3}(\mathbf{h}_E, \mathbf{h}_A)\right) - \log\left(1 - \sigma\left(f_{r_3}(\mathbf{h}_E, \mathbf{h}_B)\right)\right)$$

where $\sigma$ is the sigmoid function (a training-step sketch follows).
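A minimal sketch of one such training step, with random stand-ins for the RGCN embeddings:

```python
import torch
import torch.nn.functional as F

# Push the positive triple (E, r3, A) up and a corrupted-tail triple
# (E, r3, B) down via the negative-sampling cross-entropy loss.
def score_fn(h_u, W_r, h_v):
    return h_u @ W_r @ h_v                     # bilinear score f_r(h_u, h_v)

d = 16
W_r3 = torch.randn(d, d, requires_grad=True)
h_E, h_A, h_B = torch.randn(d), torch.randn(d), torch.randn(d)

pos = score_fn(h_E, W_r3, h_A)                 # training supervision edge
neg = score_fn(h_E, W_r3, h_B)                 # corrupted-tail negative edge
# -log sigma(pos) - log(1 - sigma(neg)); F.logsigmoid(-neg) is the
# numerically stable form of log(1 - sigmoid(neg)).
loss = -F.logsigmoid(pos) - F.logsigmoid(-neg)
loss.backward()
```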
¡ Evaluation:
§ Validation time as an example; the same applies at test time
§ Evaluate how well the model predicts the validation edges with their relation types. Let's predict the validation edge $(E, r_3, D)$
§ Intuition: the score of $(E, r_3, D)$ should be higher than that of all $(E, r_3, v)$ where $(E, r_3, v)$ is NOT in the training message edges and training supervision edges, e.g., $(E, r_3, B)$
[Figure: input graph — validation edge $(E, r_3, D)$; training message edges & training supervision edges are all existing edges (solid lines)]
§ (2) At validation time: use training message edges & training supervision edges to predict validation edges
¡ Evaluation procedure for the validation edge $(E, r_3, D)$:
1. Calculate the score of $(E, r_3, D)$
2. Calculate the scores of all the negative edges: $\{(E, r_3, v) \mid v \in \{B, F\}\}$, since $(E, r_3, A)$ and $(E, r_3, C)$ belong to the training message edges & training supervision edges
3. Obtain the rank $RK$ of $(E, r_3, D)$
4. Calculate the metrics:
§ Hits@$k$: $\mathbf{1}[RK \le k]$. Higher is better
§ Reciprocal Rank: $\frac{1}{RK}$. Higher is better
(a minimal metric sketch follows, then three worked examples)
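A minimal sketch of these metrics for a single validation edge; the scores below are made-up numbers, not taken from the slides:

```python
import torch

# Rank the positive edge's score against its negative edges, then compute
# Hits@k and Reciprocal Rank.
def rank_of_positive(pos_score, neg_scores):
    # RK = 1 + number of negatives scoring strictly higher than the positive
    return 1 + int((neg_scores > pos_score).sum())

pos = torch.tensor(2.5)              # score of (E, r3, D)
negs = torch.tensor([3.1, 0.7])      # scores of (E, r3, B) and (E, r3, F)

rk = rank_of_positive(pos, negs)     # 2
hits_at_2 = int(rk <= 2)             # 1
reciprocal_rank = 1.0 / rk           # 0.5
```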
¡ Example 1: if $f_{r_3}(E, r_3, D) > f_{r_3}(E, r_3, B) > f_{r_3}(E, r_3, F)$, the rank $RK$ of $(E, r_3, D)$ is $1$, so Hits@2 $= 1$ and Reciprocal Rank $= 1$
¡ Example 2: if instead $f_{r_3}(E, r_3, B) > f_{r_3}(E, r_3, D) > f_{r_3}(E, r_3, F)$, then $RK = 2$, so Hits@2 $= 1$ and Reciprocal Rank $= \frac{1}{2}$
¡ Example 3: and if $f_{r_3}(E, r_3, B) > f_{r_3}(E, r_3, F) > f_{r_3}(E, r_3, D)$, then $RK = 3$, so Hits@2 $= 0$ and Reciprocal Rank $= \frac{1}{3}$
Wang et al. Microsoft academic graph: When experts are not enough. Quantitative Science Studies 2020.

¡ Benchmark dataset
§ ogbn-mag from Microsoft Academic Graph (MAG)
¡ Four (4) types of entities
§ Papers: 736k nodes
§ Authors: 1.1m nodes
§ Institutions: 9k nodes
§ Fields of study: 60k nodes

Wang et al. Microsoft academic graph: When experts are not enough. Quantitative Science Studies 2020.

¡ Benchmark dataset
§ ogbn-mag from Microsoft Academic Graph (MAG)
¡ Four (4) directed relations
§ An author is "affiliated with" an institution
§ An author "writes" a paper
§ A paper "cites" a paper
§ A paper "has a topic of" a field of study

Wang et al. Microsoft academic graph: When experts are not enough. Quantitative Science Studies 2020.

¡ Prediction task
§ Each paper has a 128-dimensional word2vec feature vector
§ Given the content, references, authors, and author affiliations
from ogbn-mag, predict the venue of each paper
§ A 349-class classification problem, since 349 venues are considered
¡ Time-based dataset splitting
§ Training set: papers published before 2018
§ Test set: papers published after 2018

Wang et al. Microsoft academic graph: When experts are not enough. Quantitative Science Studies 2020.

¡ Benchmark results:
[Figure: ogbn-mag leaderboard excerpt — the SOTA method sits well above R-GCN]
§ SOTA method: SeHGNN
§ ComplEx (next lecture) + Simplified GCN
¡ Relational GCN: a graph neural network for heterogeneous graphs
¡ Can perform entity classification as well as link prediction tasks
¡ The ideas can easily be extended to other relational GNNs (RGraphSAGE, RGIN, RGAT, etc.)
¡ Benchmark: ogbn-mag from the Microsoft Academic Graph, predicting paper venues
CS224W: Machine Learning with Graphs
Jure Leskovec, Stanford University
http://cs224w.stanford.edu
J. You, R. Ying, J. Leskovec. Design Space of Graph Neural Networks, NeurIPS 2020

How do we extend the general GNN design space to heterogeneous graphs?
[Figure: the GNN design space — (1) message, (2) aggregation, (3) layer connectivity, (4) graph augmentation, (5) learning objective]
¡ (1) Message computation
§ Message function: $\mathbf{m}_u^{(l)} = \mathrm{MSG}^{(l)}\left(\mathbf{h}_u^{(l-1)}\right)$
§ Intuition: each node creates a message, which will be sent to other nodes later
§ Example: a linear layer $\mathbf{m}_u^{(l)} = \mathbf{W}^{(l)} \mathbf{h}_u^{(l-1)}$
[Figure: computation graph for target node $v$ — (1) message, (2) aggregation]
¡ (1) Heterogeneous message computation
§ Message function: $\mathbf{m}_u^{(l)} = \mathrm{MSG}_r^{(l)}\left(\mathbf{h}_u^{(l-1)}\right)$
§ Observation: a node could receive multiple types of messages; the number of message types equals the number of relation types
§ Idea: create a different message function for each relation type $r = (u, e, v)$, where $u$ is the node that sends the message, $e$ is the edge type, and $v$ is the node that receives the message
§ Example: a linear layer $\mathbf{m}_u^{(l)} = \mathbf{W}_r^{(l)} \mathbf{h}_u^{(l-1)}$ (a sketch follows)
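A minimal sketch of per-relation message functions; relation names and dimensions are illustrative:

```python
import torch
import torch.nn as nn

# One linear message function per relation type r = (node_start, edge, node_end).
msg_fns = nn.ModuleDict({
    "author__writes__paper": nn.Linear(4, 8),   # author features are 4-dim
    "paper__cites__paper":   nn.Linear(5, 8),   # paper features are 5-dim
})

def message(h_u, relation):
    # m_u^(l) = W_r^(l) h_u^(l-1), with W_r chosen by the relation type
    return msg_fns[relation](h_u)

m = message(torch.randn(4), "author__writes__paper")   # 8-dim message
```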
¡ (2) Aggregation
§ Intuition: each node aggregates the messages from its neighbors $N(v)$:
$$\mathbf{h}_v^{(l)} = \mathrm{AGG}^{(l)}\left(\left\{\mathbf{m}_u^{(l)}, u \in N(v)\right\}\right)$$
§ Example: $\mathrm{Sum}(\cdot)$, $\mathrm{Mean}(\cdot)$ or $\mathrm{Max}(\cdot)$ aggregator, e.g., $\mathbf{h}_v^{(l)} = \mathrm{Sum}\left(\{\mathbf{m}_u^{(l)}, u \in N(v)\}\right)$
[Figure: computation graph for target node $v$ — (1) message, (2) aggregation]
¡ (2) Heterogeneous aggregation
§ Observation: each node could receive multiple types of messages from its neighbors, and multiple neighbors may belong to each message type
§ Idea: define a 2-stage message passing:
$$\mathbf{h}_v^{(l)} = \mathrm{AGG}_{\text{all}}^{(l)}\left(\mathrm{AGG}_r^{(l)}\left(\left\{\mathbf{m}_u^{(l)}, u \in N_r(v)\right\}\right)\right)$$
§ Given all the messages sent to a node: within each message type, aggregate the messages that belong to that relation type with $\mathrm{AGG}_r^{(l)}$, then aggregate across the edge types with $\mathrm{AGG}_{\text{all}}^{(l)}$
§ Example: $\mathbf{h}_v^{(l)} = \mathrm{Concat}\left(\mathrm{Sum}\left(\left\{\mathbf{m}_u^{(l)}, u \in N_r(v)\right\}\right)\right)$ (a sketch follows)
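A minimal sketch of the 2-stage aggregation with Sum within relation types and Concat across them; relation names and sizes are illustrative:

```python
import torch

# Stage 1 (AGG_r): sum messages within each relation type.
# Stage 2 (AGG_all): concatenate the per-relation results.
def aggregate(messages_by_rel):
    # messages_by_rel: {relation type: (num_messages, dim) tensor}
    per_rel = [m.sum(dim=0) for m in messages_by_rel.values()]   # AGG_r = Sum
    return torch.cat(per_rel, dim=-1)                            # AGG_all = Concat

h_v = aggregate({
    "writes": torch.randn(3, 8),   # 3 messages arriving via "writes"
    "cites":  torch.randn(2, 8),   # 2 messages arriving via "cites"
})                                 # -> 16-dim embedding (8 per relation type)
```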
¡ (3) Layer connectivity
§ Add skip connections, pre/post-process layers

Pre-processing layers: Important when


encoding node features is necessary.
E.g., when nodes represent images/text

Post-processing layers: Important when reasoning / transformation over node embeddings is needed.
E.g., graph classification, knowledge graphs

In practice, adding these layers works great!

¡ Heterogeneous pre/post-process layers:
§ MLP layers with respect to each node type
§ Since the outputs of the GNN are node embeddings: $\mathbf{h}_v^{(l)} = \mathrm{MLP}_{T(v)}\left(\mathbf{h}_v^{(l)}\right)$, where $T(v)$ is the type of node $v$ (see the sketch below)
¡ Other successful GNN designs are also encouraged for heterogeneous GNNs: skip connections, batch/layer normalization, …
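A minimal sketch of per-node-type post-processing; here the node type $T(v)$ is simply the dictionary key the embedding is stored under:

```python
import torch
import torch.nn as nn

# Per-node-type post-processing: h_v <- MLP_{T(v)}(h_v).
post_mlps = nn.ModuleDict({
    "author": nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 8)),
    "paper":  nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 8)),
})

h = {"author": torch.randn(3, 8), "paper": torch.randn(2, 8)}
h = {node_type: post_mlps[node_type](x) for node_type, x in h.items()}
```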
¡ Graph feature manipulation
§ The input graph lacks features → feature augmentation
¡ Graph structure manipulation
§ The graph is too sparse → add virtual nodes / edges
§ The graph is too dense → sample neighbors when doing message passing
§ The graph is too large → sample subgraphs to compute embeddings
§ Will cover later in the lecture: scaling up GNNs
¡ Graph Feature manipulation
§ 2 Common options: compute graph statistics (e.g.,
node degree) within each relation type, or across the
full graph (ignoring the relation types)
¡ Graph Structure manipulation
§ Neighbor and subgraph sampling are also common
for heterogeneous graphs.
§ 2 Common options: sampling within each relation
type (ensure neighbors from each type are covered),
or sample across the full graph

Node-level prediction:
$$\hat{\mathbf{y}}_v = \mathrm{Head}_{\text{node}}\left(\mathbf{h}_v^{(L)}\right) = \mathbf{W}^{(H)} \mathbf{h}_v^{(L)}$$
Edge-level prediction:
$$\hat{\mathbf{y}}_{uv} = \mathrm{Head}_{\text{edge}}\left(\mathbf{h}_u^{(L)}, \mathbf{h}_v^{(L)}\right) = \mathrm{Linear}\left(\mathrm{Concat}\left(\mathbf{h}_u^{(L)}, \mathbf{h}_v^{(L)}\right)\right)$$
Graph-level prediction:
$$\hat{\mathbf{y}}_G = \mathrm{Head}_{\text{graph}}\left(\left\{\mathbf{h}_v^{(L)} \in \mathbb{R}^d, \forall v \in G\right\}\right)$$
The heterogeneous versions condition each head on the node type or relation type:

Node-level prediction:
$$\hat{\mathbf{y}}_v = \mathrm{Head}_{\text{node},T(v)}\left(\mathbf{h}_v^{(L)}\right) = \mathbf{W}_{T(v)} \mathbf{h}_v^{(L)}$$
Edge-level prediction:
$$\hat{\mathbf{y}}_{uv} = \mathrm{Head}_{\text{edge},r}\left(\mathbf{h}_u^{(L)}, \mathbf{h}_v^{(L)}\right) = \mathrm{Linear}_r\left(\mathrm{Concat}\left(\mathbf{h}_u^{(L)}, \mathbf{h}_v^{(L)}\right)\right)$$
Graph-level prediction:
$$\hat{\mathbf{y}}_G = \mathrm{AGG}\left(\mathrm{Head}_{\text{graph},i}\left(\left\{\mathbf{h}_v^{(L)} \in \mathbb{R}^d, \forall T(v) = i\right\}\right)\right)$$
CS224W: Machine Learning with Graphs
Jure Leskovec, Stanford University
http://cs224w.stanford.edu
¡ Graph Attention Networks (GAT)
$$\mathbf{h}_v^{(l)} = \sigma\left(\sum_{u \in N(v)} \alpha_{vu} \mathbf{W}^{(l)} \mathbf{h}_u^{(l-1)}\right)$$
where $\alpha_{vu}$ are the attention weights.
¡ Not all of a node's neighbors are equally important:
§ Attention is inspired by cognitive attention.
§ The attention $\alpha_{vu}$ focuses on the important parts of the input data and fades out the rest.
§ Idea: the NN should devote more computing power to the small but important part of the data.
¡ Can we adapt GAT for heterogeneous graphs?
¡ HGT uses Scaled Dot-Product Attention (proposed in the Transformer)
¡ Query $Q$, Key $K$, Value $V$:
§ $Q$, $K$, $V$ have shape (batch_size, dim)
¡ How do we obtain $Q$, $K$, $V$? Apply a linear layer to the input:
§ $Q = Q\_\mathrm{Linear}(X)$
§ $K = K\_\mathrm{Linear}(X)$
§ $V = V\_\mathrm{Linear}(X)$
(a sketch follows)
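A minimal single-head sketch of scaled dot-product attention; sizes and names are illustrative:

```python
import torch
import torch.nn as nn

# Softmax over Q K^T / sqrt(d), followed by a weighted sum of V.
def attention(Q, K, V):
    d = Q.shape[-1]
    scores = Q @ K.transpose(-2, -1) / d ** 0.5   # (n, n) attention logits
    return torch.softmax(scores, dim=-1) @ V      # weighted sum of values

X = torch.randn(5, 16)                            # 5 items, 16-dim features
Q_linear = nn.Linear(16, 16)
K_linear = nn.Linear(16, 16)
V_linear = nn.Linear(16, 16)
out = attention(Q_linear(X), K_linear(X), V_linear(X))   # (5, 16)
```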
Hu et al. Heterogeneous Graph Transformer. WWW 2020.

¡ Recall: applying GAT to a homogeneous graph
§ $H^{(l)}$ is the $l$-th layer representation
[Figure: the GAT attention computation over $H^{(l)}$]
¡ How do we take the relation type (node_s, edge, node_e) into the attention computation?
Hu et al. Heterogeneous Graph Transformer. WWW 2020.

¡ Mutual Attention:
¡ A set of neural networks computes the attention score of each edge.
Hu et al. Heterogeneous Graph Transformer. WWW 2020.

¡ Motivation: GAT is unable to represent different node & edge types
¡ Introducing a set of neural networks for each relation type is too expensive for attention
§ Recall: a relation describes (node_s, edge, node_e)
[Figure: one weight matrix per relation (rel_1 … rel_N) — too expensive!]
Hu et al. Heterogeneous Graph Transformer. WWW 2020.

¡ Innovation: decompose heterogeneous attention into node- and edge-type dependent attention mechanisms
§ 3 node weight matrices, 2 edge weight matrices
§ Without decomposition: $3 \times 2 \times 3 = 18$ relation types → 18 weight matrices (supposing all relation types exist)
[Figure: Paper and Author nodes feed node-type-specific Q-Linear/K-Linear layers; the Write and Cite edge types contribute edge-type attention weights]
Hu et al. Heterogeneous Graph Transformer. WWW 2020.

¡ Heterogeneous Mutual Attention (First Attempt):

¡ Introduce a set of neural networks for the attention


scores of each relation type.
¡ Too expensive for attention!

Hu et al. Heterogeneous Graph Transformer. WWW 2020.

¡ Heterogeneous Mutual Attention:
¡ Each relation $(\tau(s), \varphi(e), \tau(t))$ has a distinct set of projection weights
§ $\tau(s)$: type of node $s$; $\varphi(e)$: type of edge $e$
§ $\tau(s)$ & $\tau(t)$ parameterize $K\_\mathrm{Linear}_{\tau(s)}$ & $Q\_\mathrm{Linear}_{\tau(t)}$, which in turn return the Key and Query vectors $K(s)$ & $Q(t)$
§ The edge type $\varphi(e)$ directly parameterizes $\mathbf{W}_{\varphi(e)}$
(a sketch follows)
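A minimal single-head sketch of an HGT-style decomposed attention score (the full HGT layer adds multi-head attention and further terms); type names and dimensions are illustrative:

```python
import torch
import torch.nn as nn

# Key/Query linears are indexed by node type and an edge-type matrix W_phi
# sits between them, so no full per-relation network is needed.
d = 16
K_linear = nn.ModuleDict({"author": nn.Linear(d, d), "paper": nn.Linear(d, d)})
Q_linear = nn.ModuleDict({"author": nn.Linear(d, d), "paper": nn.Linear(d, d)})
W_edge   = nn.ParameterDict({"writes": nn.Parameter(torch.randn(d, d))})

def attn_score(h_s, type_s, h_t, type_t, edge_type):
    K = K_linear[type_s](h_s)                 # key from source node type tau(s)
    Q = Q_linear[type_t](h_t)                 # query from target node type tau(t)
    return (K @ W_edge[edge_type] @ Q) / d ** 0.5

s = attn_score(torch.randn(d), "author", torch.randn(d), "paper", "writes")
```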
¡ A full HGT layer: we have just computed the attention scores
¡ Similarly, HGT decomposes the weights by node & edge types in the message computation
[Figure: the full HGT layer, with separate weights for each node type and each edge type]
Hu et al. Heterogeneous Graph Transformer. WWW 2020.

¡ Benchmark: ogbn-mag from the Microsoft Academic Graph, predicting paper venues
¡ HGT uses far fewer parameters than R-GCN, even though the attention computation is expensive, while performing better
§ Thanks to the weight decomposition over node & edge types
J. You, R. Ying, J. Leskovec. Design Space of Graph Neural Networks, NeurIPS 2020

Heterogeneous GNNs extend GNNs by separately modeling node/relation types, plus an additional aggregation stage across relation types.
[Figure: the GNN design space — (1) message, (2) aggregation, (3) layer connectivity, (4) graph augmentation, (5) learning objective]
¡ Heterogeneous graphs: graphs with multiple node or edge types
§ Key concept: relation type (node_s, edge, node_e)
§ Be aware that we don’t always need
heterogeneous graphs
¡ Learning with heterogeneous graphs
§ Key idea: separately model each relation type
§ Relational GCNs
§ Design space for heterogeneous GNNs
§ Heterogeneous Graph Transformer