Graph Neural Network Node Embedding: Node, Edge, and Subgraph

CS224W: Machine Learning with Graphs

Jure Leskovec, Stanford University


http://cs224w.stanford.edu
Given an input graph, extract node-, link-, and graph-level features, then learn a model (SVM, neural network, etc.) that maps features to labels.

Pipeline: input graph → structured features (feature engineering: node-level, edge-level, graph-level features) → learning algorithm → prediction (downstream prediction task).
Graph representation learning alleviates the need to do feature engineering every single time.

Pipeline: input graph → representation learning (automatically learn the features) → learning algorithm → prediction (downstream prediction task).
Goal: efficient, task-independent feature learning for machine learning with graphs.

We learn a mapping f: u → ℝ^d that takes each node u to a d-dimensional vector, its feature representation or embedding.
 Task: Map nodes into an embedding space
▪ Similarity of embeddings between nodes indicates
their similarity in the network. For example:
▪ Both nodes are close to each other (connected by an edge)
▪ Encode network information
▪ Potentially used for many downstream predictions

Tasks that node embeddings in ℝ^d can support:
• Node classification
• Link prediction
• Graph classification
• Anomalous node detection
• Clustering
• …
Example: a 2D embedding of the nodes of Zachary's Karate Club network.

Image from: Perozzi et al. DeepWalk: Online Learning of Social Representations. KDD 2014.
CS224W: Machine Learning with Graphs
Jure Leskovec, Stanford University
http://cs224w.stanford.edu
 Assume we have a graph G:
▪ V is the vertex set.
▪ A is the adjacency matrix (assume binary).
▪ For simplicity: No node features or extra
information is used
(Example graph: 4 nodes with edges (1,2), (1,4), (2,4), (3,4).)

V = {1, 2, 3, 4}

        0 1 0 1
A  =    1 0 0 1
        0 0 0 1
        1 1 1 0
 Goal is to encode nodes so that similarity in
the embedding space (e.g., dot product)
approximates similarity in the graph

Goal: similarity(u, v) ≈ z_v^T z_u, where the left-hand side is the similarity of u and v in the original network (which we still need to define) and the right-hand side is the similarity of their embeddings (dot product).
1. Encoder maps from nodes to embeddings
2. Define a node similarity function (i.e., a
measure of similarity in the original network)
3. Decoder 𝐃𝐄𝐂 maps from embeddings to the
similarity score
4. Optimize the parameters of the encoder so that:
DEC(z_v^T z_u) ≈ similarity(u, v) in the original network.
 Encoder: maps each node v in the input graph to a low-dimensional (d-dimensional) embedding:
ENC(v) = z_v
 Similarity function (decoder): specifies how the relationships in vector space map to the relationships in the original network:
similarity(u, v) ≈ z_v^T z_u
where the left-hand side is the similarity of u and v in the original network and the right-hand side is the dot product between the node embeddings.
Simplest encoding approach: the encoder is just an embedding lookup:
ENC(v) = z_v = Z · v
where Z is the embedding matrix (each column is a node embedding; this is what we learn/optimize) and v is an indicator vector, all zeroes except a one in the column indicating node v.
In the embedding matrix Z there is one column per node: each column is the embedding vector for a specific node, and the number of rows is the dimension/size of the embeddings.
Each node is assigned a unique embedding vector (i.e., we directly optimize the embedding of each node). Many methods take this approach, e.g., DeepWalk and node2vec. A minimal lookup sketch follows.
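As a rough illustration of such a shallow, embedding-lookup encoder (a sketch assuming PyTorch; the sizes and variable names are illustrative, not from the lecture):

```python
import torch
import torch.nn as nn

num_nodes, dim = 4, 16  # |V| nodes, d-dimensional embeddings (toy sizes)

# Z: one learnable d-dimensional embedding per node
# (stored as rows here; the slide draws them as columns).
Z = nn.Embedding(num_embeddings=num_nodes, embedding_dim=dim)

def encode(v: torch.Tensor) -> torch.Tensor:
    """ENC(v) = z_v: look up the embedding of node index v."""
    return Z(v)

z_u = encode(torch.tensor(0))      # embedding of node 0
z_v = encode(torch.tensor(3))      # embedding of node 3
score = torch.dot(z_u, z_v)        # decoder: dot-product similarity
```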
 Encoder + decoder framework
▪ Shallow encoder: embedding lookup
▪ Parameters to optimize: Z, which contains the node embeddings z_u for all nodes u ∈ V
▪ We will cover deep encoders (GNNs) in Lecture 6
▪ Decoder: based on node similarity
▪ Objective: maximize z_v^T z_u for node pairs (u, v) that are similar
 The key design choice across methods is how they define node similarity.
 Should two nodes have a similar embedding if they…
▪ are linked?
▪ share neighbors?
▪ have similar "structural roles"?
 We will now study a node similarity definition that uses random walks, and how to optimize embeddings for such a similarity measure.
 This is an unsupervised/self-supervised way of learning node embeddings.
▪ We are not utilizing node labels.
▪ We are not utilizing node features.
▪ The goal is to directly estimate a set of coordinates (i.e., the embedding) of a node so that some aspect of the network structure (captured by DEC) is preserved.
 These embeddings are task-independent:
▪ They are not trained for a specific task but can be used for any task.
CS224W: Machine Learning with Graphs
Jure Leskovec, Stanford University
http://cs224w.stanford.edu
 Vector z_u:
▪ The embedding of node u (what we aim to find).
 Probability P(v | z_u): our model's prediction based on z_u.
▪ The (predicted) probability of visiting node v on random walks starting from node u.

Two non-linear functions are used to produce predicted probabilities:
 Softmax function: turns a vector z of K real values (model predictions) into K probabilities that sum to 1:
σ(z)[i] = e^{z[i]} / Σ_{j=1}^{K} e^{z[j]}
 Sigmoid function: an S-shaped function that maps real values into the range (0, 1), written as
S(x) = 1 / (1 + e^{−x})
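A small sketch of these two functions (assuming NumPy; the printed values are approximate):

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    """Turn K real-valued scores into K probabilities that sum to 1."""
    e = np.exp(z - z.max())          # subtract max for numerical stability
    return e / e.sum()

def sigmoid(x: np.ndarray) -> np.ndarray:
    """Map real values into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

print(softmax(np.array([2.0, 1.0, 0.1])))   # roughly [0.66, 0.24, 0.10]
print(sigmoid(np.array([0.0, 2.0])))        # roughly [0.50, 0.88]
```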
Given a graph and a starting point, we select a neighbor of it at random and move to this neighbor; then we select a neighbor of this point at random and move to it, etc. The (random) sequence of points visited this way is a random walk on the graph.

(Figure: a random walk of five steps on a small example graph.)
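A minimal sketch of simulating such a uniform random walk (assuming NetworkX; the function name and the example graph are illustrative):

```python
import random
import networkx as nx

def random_walk(G: nx.Graph, start, length: int) -> list:
    """Uniform random walk: repeatedly move to a randomly chosen neighbor."""
    walk = [start]
    for _ in range(length):
        neighbors = list(G.neighbors(walk[-1]))
        if not neighbors:            # dead end (isolated node)
            break
        walk.append(random.choice(neighbors))
    return walk

G = nx.karate_club_graph()
print(random_walk(G, start=0, length=5))   # a random walk, e.g. [0, 2, 32, ...]
```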
The quantity z_u^T z_v is trained to approximate the probability that u and v co-occur on a random walk over the graph.
1. Estimate the probability of visiting node v on a random walk starting from node u, using some random walk strategy R.

2. Optimize embeddings to encode these random walk statistics: similarity in the embedding space (here: the dot product, which equals cos(θ) for normalized embeddings) encodes random walk "similarity".
1. Expressivity: a flexible, stochastic definition of node similarity that incorporates both local and higher-order neighborhood information. Idea: if a random walk starting from node u visits v with high probability, then u and v are similar (higher-order, multi-hop information).

2. Efficiency: we do not need to consider all node pairs when training; we only need to consider pairs that co-occur on random walks.
 Intuition: find an embedding of nodes in d-dimensional space that preserves similarity.
 Idea: learn node embeddings such that nodes that are nearby in the network end up close together in the embedding space.
 Given a node u, how do we define nearby nodes?
▪ N_R(u) … the neighborhood of u obtained by some random walk strategy R.
 Given G = (V, E), our goal is to learn a mapping f: u → ℝ^d, with f(u) = z_u.

 Log-likelihood objective:
max_f Σ_{u∈V} log P(N_R(u) | z_u)
▪ N_R(u) is the neighborhood of node u obtained by strategy R.

 Given node u, we want to learn feature representations that are predictive of the nodes in its random walk neighborhood N_R(u).
1. Run short fixed-length random walks starting from each node u in the graph using some random walk strategy R.
2. For each node u, collect N_R(u), the multiset* of nodes visited on random walks starting from u.
3. Optimize embeddings according to: given node u, predict its neighbors N_R(u) (maximum likelihood objective).

*N_R(u) can have repeated elements, since nodes can be visited multiple times on random walks.
Equivalently, we minimize the negative log-likelihood
ℒ = Σ_{u∈V} Σ_{v∈N_R(u)} −log P(v | z_u)
• Intuition: optimize embeddings z_u to maximize the likelihood of random walk co-occurrences.
• Parameterize P(v | z_u) using the softmax:
P(v | z_u) = exp(z_u^T z_v) / Σ_{n∈V} exp(z_u^T z_n)
Why the softmax? We want node v to be the most similar to node u out of all nodes n. Intuition: Σ_i exp(x_i) ≈ max_i exp(x_i).
Putting it all together:
ℒ = Σ_{u∈V} Σ_{v∈N_R(u)} −log( exp(z_u^T z_v) / Σ_{n∈V} exp(z_u^T z_n) )
The outer sum runs over all nodes u, the inner sum over the nodes v seen on random walks starting from u, and the term inside the log is the predicted probability of u and v co-occurring on a random walk.

Optimizing random walk embeddings = finding the embeddings z_u that minimize ℒ.
But doing this naively is too expensive: the nested sum over nodes gives O(|V|²) complexity!
The normalization term from the softmax (the sum over all nodes n ∈ V) is the culprit. Can we approximate it?
 Solution: negative sampling.
Why is this approximation valid? Technically, it is a different objective, but negative sampling is a form of Noise Contrastive Estimation (NCE), which approximately maximizes the log-probability of the softmax. The new formulation corresponds to using logistic regression (the sigmoid function) to distinguish the target node v from nodes n_i sampled from a background distribution P_V. More at https://arxiv.org/pdf/1402.3722.pdf

log( exp(z_u^T z_v) / Σ_{n∈V} exp(z_u^T z_n) ) ≈ log σ(z_u^T z_v) − Σ_{i=1}^{k} log σ(z_u^T z_{n_i}),  n_i ~ P_V

Here σ is the sigmoid function (it makes each term a "probability" between 0 and 1) and P_V is a random distribution over nodes. Instead of normalizing with respect to all nodes, we just normalize against k random "negative samples" n_i.
 Negative sampling allows for quick likelihood calculation.
P_V is a random distribution over nodes:
▪ Sample k negative nodes, each with probability proportional to its degree.
▪ Two considerations for k (the number of negative samples):
1. Higher k gives more robust estimates.
2. Higher k corresponds to a higher bias toward negative events.
In practice, k = 5–20. Can a negative sample be any node, or only nodes not on the walk? People often use any node (for efficiency), but the most "correct" way is to use nodes not on the walk. A sketch of the negatively-sampled objective follows.
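A minimal sketch of the negatively-sampled objective for one (u, v) co-occurrence (assuming PyTorch; the degree-proportional sampling and variable names are illustrative, not the lecture's reference code):

```python
import torch

def neg_sampled_loss(z_u, z_v, Z, deg, k=5):
    """-( log sigma(z_u.z_v) - sum_i log sigma(z_u.z_{n_i}) ), with n_i ~ P_V."""
    pos = torch.sigmoid(z_u @ z_v).log()
    probs = deg / deg.sum()                       # P_V: proportional to node degree
    neg_idx = torch.multinomial(probs, k, replacement=True)
    neg = torch.sigmoid(Z[neg_idx] @ z_u).log().sum()
    return -(pos - neg)

# toy usage: 4 nodes, 8-dimensional embeddings
Z = torch.randn(4, 8, requires_grad=True)
deg = torch.tensor([2., 2., 1., 3.])
loss = neg_sampled_loss(Z[0], Z[3], Z, deg)
loss.backward()
```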
▪ After we have obtained the objective function, how do we optimize (minimize) it?
ℒ = Σ_{u∈V} Σ_{v∈N_R(u)} −log P(v | z_u)
▪ Gradient descent: a simple way to minimize ℒ (η is the learning rate):
▪ Initialize z_u at some random value for all nodes u.
▪ Iterate until convergence:
▪ For all u, compute the derivative ∂ℒ/∂z_u.
▪ For all u, make a step in the direction opposite to the derivative: z_u ← z_u − η ∂ℒ/∂z_u.
▪ Stochastic gradient descent: instead of evaluating the gradient over all examples, evaluate it for each individual training example.
▪ Initialize z_u at some random value for all nodes u.
▪ Iterate until convergence:
▪ Sample a node u and form its loss ℒ^(u) = Σ_{v∈N_R(u)} −log P(v | z_u).
▪ For all v, compute the derivative ∂ℒ^(u)/∂z_v.
▪ For all v, update: z_v ← z_v − η ∂ℒ^(u)/∂z_v.
1. Run short fixed-length random walks starting from each node on the graph.
2. For each node u, collect N_R(u), the multiset of nodes visited on random walks starting from u.
3. Optimize embeddings using stochastic gradient descent; we can efficiently approximate the objective using negative sampling. A compact end-to-end sketch follows.
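Tying the pieces together, a compact DeepWalk-style training loop that reuses the random_walk and neg_sampled_loss sketches above (assuming PyTorch and NetworkX; all hyperparameters are illustrative):

```python
import torch
import networkx as nx

G = nx.karate_club_graph()
n, d = G.number_of_nodes(), 16
Z = torch.randn(n, d, requires_grad=True)             # shallow encoder: one row per node
deg = torch.tensor([float(G.degree(v)) for v in G])
opt = torch.optim.SGD([Z], lr=0.01)

for epoch in range(10):
    for u in G.nodes():
        walk = random_walk(G, start=u, length=10)     # step 1: short fixed-length walk
        for v in walk[1:]:                            # step 2: N_R(u) from the walk
            opt.zero_grad()
            loss = neg_sampled_loss(Z[u], Z[v], Z, deg, k=5)  # step 3: SGD + neg. sampling
            loss.backward()
            opt.step()
```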
 So far we have described how to optimize embeddings given a random walk strategy R.
 What strategies should we use to run these random walks?
▪ Simplest idea: just run fixed-length, unbiased random walks starting from each node (i.e., DeepWalk from Perozzi et al., 2014).
▪ The issue is that such a notion of similarity is too constrained.
 How can we generalize this?
Reference: Perozzi et al. 2014. DeepWalk: Online Learning of Social Representations. KDD.
 Goal: embed nodes with similar network neighborhoods close together in the feature space.
 We frame this goal as a maximum likelihood optimization problem, independent of the downstream prediction task.
 Key observation: a flexible notion of the network neighborhood N_R(u) of node u leads to rich node embeddings.
 node2vec develops a biased, 2nd-order random walk R to generate the network neighborhood N_R(u) of node u.
Reference: Grover et al. 2016. node2vec: Scalable Feature Learning for Networks. KDD.
Idea of node2vec: use flexible, biased random walks that can trade off between local and global views of the network (Grover and Leskovec, 2016).

(Figure 1 from the node2vec paper: BFS and DFS search strategies from node u, with k = 3.)
Two classic strategies to define a neighborhood N_R(u) of a given node u, illustrated with walks of length 3 (N_R(u) of size 3):
▪ N_BFS(u) = {s1, s2, s3}: local, microscopic view
▪ N_DFS(u) = {s4, s5, s6}: global, macroscopic view
BFS gives a micro-view of the neighbourhood; DFS gives a macro-view of the neighbourhood.
node2vec uses a biased, fixed-length random walk R that, given a node u, generates its neighborhood N_R(u).
 Two parameters:
▪ Return parameter p: controls returning back to the previous node.
▪ In-out parameter q: controls moving outwards (DFS) vs. inwards (BFS); intuitively, q is the "ratio" of BFS vs. DFS.
Biased 2nd-order random walks explore network neighborhoods as follows:
▪ Suppose the walk just traversed edge (s1, w) and is now at w.
▪ Insight: the neighbors of w can only be at the same distance from s1, farther from s1, or back at s1 itself.
▪ Idea: remember where the walk came from.
 The walker came over edge (s1, w) and is now at w. Where does it go next?
▪ The unnormalized transition probabilities are 1/p (back to s1), 1 (to a neighbor at the same distance from s1), and 1/q (to a neighbor farther from s1).
 p and q model the transition probabilities:
▪ p … return parameter
▪ q … "walk away" parameter
 The walker came over edge (s1, w) and is at w. The unnormalized transition probabilities, segmented by distance from s1:

Target t | Prob. | Dist.(s1, t)
s1       | 1/p   | 0
s2       | 1     | 1
s3       | 1/q   | 2
s4       | 1/q   | 2

▪ BFS-like walk: low value of p
▪ DFS-like walk: low value of q
 N_R(u) are the nodes visited by the biased walk. A sketch of one biased step follows.
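A minimal sketch of one such biased step (assuming NetworkX; the function name and the use of unnormalized weights with random.choices are illustrative):

```python
import random
import networkx as nx

def node2vec_step(G: nx.Graph, prev, curr, p: float, q: float):
    """Pick the next node after traversing edge (prev, curr), biased by p and q."""
    neighbors = list(G.neighbors(curr))
    weights = []
    for x in neighbors:
        if x == prev:                       # distance 0 from prev: return
            weights.append(1.0 / p)
        elif G.has_edge(x, prev):           # distance 1 from prev: stay close
            weights.append(1.0)
        else:                               # distance 2 from prev: move outward
            weights.append(1.0 / q)
    return random.choices(neighbors, weights=weights, k=1)[0]

G = nx.karate_club_graph()
print(node2vec_step(G, prev=0, curr=1, p=1.0, q=0.5))
```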
 1) Compute the random walk transition probabilities.
 2) Simulate r random walks of length l starting from each node u.
 3) Optimize the node2vec objective using stochastic gradient descent.

 node2vec has linear-time complexity, and all three steps are individually parallelizable.
 Different kinds of biased random walks:
▪ Based on node attributes (Dong et al., 2017).
▪ Based on learned weights (Abu-El-Haija et al., 2017).
 Alternative optimization schemes:
▪ Directly optimize based on 1-hop and 2-hop random walk probabilities (as in LINE from Tang et al. 2015).
 Network preprocessing techniques:
▪ Run random walks on modified versions of the original network (e.g., Ribeiro et al. 2017's struc2vec, Chen et al. 2016's HARP).
 Core idea: embed nodes so that distances in the embedding space reflect node similarities in the original network.
 Different notions of node similarity:
▪ Naïve: similar if two nodes are connected
▪ Neighborhood overlap (covered in Lecture 2)
▪ Random walk approaches (covered today)
 So which method should I use?
 No one method wins in all cases…
▪ E.g., node2vec performs better on node classification, while alternative methods perform better on link prediction (Goyal and Ferrara, 2017 survey).
 Random walk approaches are generally more efficient.
 In general: you must choose a definition of node similarity that matches your application.
CS224W: Machine Learning with Graphs
Jure Leskovec, Stanford University
http://cs224w.stanford.edu
 Goal: we want to embed a subgraph or an entire graph G, producing a graph embedding z_G.
 Tasks:
▪ Classifying toxic vs. non-toxic molecules
▪ Identifying anomalous graphs
Simple (but effective) approach 1:
 Run a standard graph embedding technique on the (sub)graph G.
 Then just sum (or average) the node embeddings in the (sub)graph G (see the sketch below).
 Used by Duvenaud et al., 2016 to classify molecules based on their graph structure.
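A minimal sketch of approach 1 (assuming NumPy; node_embeddings stands in for the output of any node embedding method, e.g., node2vec):

```python
import numpy as np

# node_embeddings: |V| x d matrix, one row per node (e.g., learned by node2vec)
node_embeddings = np.random.randn(34, 16)

z_G_sum = node_embeddings.sum(axis=0)    # graph embedding by summing node embeddings
z_G_avg = node_embeddings.mean(axis=0)   # or by averaging them
print(z_G_sum.shape, z_G_avg.shape)      # (16,) (16,)
```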
 Approach 2: introduce a "virtual node" to represent the (sub)graph and run a standard graph embedding technique.
 Proposed by Li et al., 2016 as a general technique for subgraph embedding.
 Approach 3: anonymous walks. States in an anonymous walk correspond to the index of the first time we visited that node in the random walk.
Anonymous Walk Embeddings, ICML 2018: https://arxiv.org/pdf/1805.11921.pdf
 An anonymous walk is agnostic to the identity of the nodes visited (hence "anonymous").
 Example: a random walk w1 visiting nodes A, B, C, B, C:
 Step 1: node A → node 1
 Step 2: node B → node 2 (different from node 1)
 Step 3: node C → node 3 (different from nodes 1 and 2)
 Step 4: node B → node 2 (same as the node in step 2)
 Step 5: node C → node 3 (same as the node in step 3)
so the anonymous walk is 1-2-3-2-3.
 Note: a different random walk w2 that visits other nodes in the same pattern gives the same anonymous walk, 1-2-3-2-3 (see the sketch below).
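A minimal sketch of converting a concrete random walk into its anonymous walk (plain Python; the function name is illustrative):

```python
def to_anonymous(walk):
    """Relabel each node by the index of its first visit in the walk."""
    first_visit = {}
    anon = []
    for node in walk:
        if node not in first_visit:
            first_visit[node] = len(first_visit) + 1
        anon.append(first_visit[node])
    return anon

print(to_anonymous(["A", "B", "C", "B", "C"]))   # [1, 2, 3, 2, 3]
print(to_anonymous(["C", "D", "B", "D", "B"]))   # [1, 2, 3, 2, 3] -- same anonymous walk
```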
The number of distinct anonymous walks grows exponentially with the walk length:
▪ There are 5 anonymous walks w_i of length 3: w1 = 111, w2 = 112, w3 = 121, w4 = 122, w5 = 123.
 Simulate anonymous walks w_i of l steps and record their counts.
 Represent the graph as a probability distribution over these walks.
 For example:
▪ Set l = 3.
▪ Then we can represent the graph as a 5-dimensional vector, since there are 5 anonymous walks w_i of length 3: 111, 112, 121, 122, 123.
▪ z_G[i] = probability of anonymous walk w_i in graph G (see the sketch below).
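A minimal sketch of this walk-count representation for l = 3 (assuming NetworkX; the number of samples per node is illustrative):

```python
import random
from collections import Counter
import networkx as nx

def anon_walk(G, start, steps=2):
    """Sample a random walk of `steps` moves (3 visited nodes) and anonymize it."""
    walk = [start]
    for _ in range(steps):
        walk.append(random.choice(list(G.neighbors(walk[-1]))))
    first = {}
    return tuple(first.setdefault(v, len(first) + 1) for v in walk)

G = nx.karate_club_graph()
samples = [anon_walk(G, u) for u in G.nodes() for _ in range(100)]
counts = Counter(samples)

order = [(1, 1, 1), (1, 1, 2), (1, 2, 1), (1, 2, 2), (1, 2, 3)]   # the 5 anon. walks of length 3
z_G = [counts[w] / len(samples) for w in order]                    # probability vector for G
print(z_G)
```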
 Sampling anonymous walks: generate a set of m independent random walks and represent the graph as a probability distribution over their anonymous walks.
 How many random walks m do we need?
▪ We want the estimated distribution to deviate from the true one by more than ε with probability less than δ, which requires
m = ⌈ (2/ε²) · ( log(2^η − 2) − log(δ) ) ⌉
where η is the total number of anonymous walks of length l.
▪ For example: there are η = 877 anonymous walks of length l = 7. If we set ε = 0.1 and δ = 0.01, then we need to generate m = 122,500 random walks (a quick numeric check follows).
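A quick check of this bound (plain Python; log is the natural logarithm, as the numbers on the slide imply):

```python
import math

def num_walks_needed(eta: int, eps: float, delta: float) -> int:
    """m = ceil( (2/eps^2) * (log(2^eta - 2) - log(delta)) )"""
    return math.ceil((2 / eps**2) * (math.log(2**eta - 2) - math.log(delta)))

print(num_walks_needed(eta=877, eps=0.1, delta=0.01))   # 122500, matching the slide
```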
Rather than simply representing each walk by the fraction of times it occurs, we can learn an embedding z_i of each anonymous walk w_i.
 Learn a graph embedding z_G together with all of the anonymous walk embeddings z_i:
Z = {z_i : i = 1, …, η}, where η is the number of sampled anonymous walks.
 How do we embed walks? Idea: embed walks so that the next walk can be predicted.
 Output: a vector z_G for the input graph G (the embedding of the entire graph, to be learned).
 Starting from node 1, sample anonymous random walks, e.g. w1, w2, w3, w4, …
 Learn to predict walks that co-occur in a Δ-size window (e.g., predict w3 given w1, w2 if Δ = 2).
 Objective:
max (1/T) Σ_{t=Δ}^{T} log P(w_t | w_{t−Δ}, …, w_{t−1}, z_G)
▪ where T is the total number of walks.
 Run T different random walks from u, each of length l: N_R(u) = {w1^u, w2^u, …, wT^u}.
 Learn to predict walks that co-occur in a Δ-size window.
 Estimate the embedding z_i of anonymous walk w_i; let η be the number of all possible walk embeddings.

Objective: maximize the average log-probability of each sampled walk given the Δ walks preceding it, where
▪ P(w_t | {w_{t−Δ}, …, w_{t−1}}, z_G) = exp(y(w_t)) / Σ_{i=1}^{η} exp(y(w_i))
  (the sum runs over all possible anonymous walks, so this requires negative sampling)
▪ y(w_t) = b + U · cat( (1/Δ) Σ_{i=1}^{Δ} z_i, z_G )
▪ cat( (1/Δ) Σ_{i=1}^{Δ} z_i, z_G ) means an average of the anonymous walk embeddings z_i in the window, concatenated with the graph embedding z_G.
▪ b ∈ ℝ, U ∈ ℝ^D are learnable parameters. This represents a linear layer.
Anonymous Walk Embeddings, ICML 2018: https://arxiv.org/pdf/1805.11921.pdf
 We obtain the graph embedding z_G (a learnable parameter) after the optimization.
▪ Is z_G simply the sum over the walk embeddings z_i? Or is z_G a residual embedding next to the z_i's?
▪ According to the paper, z_G is a separately optimized vector parameter, just like the other z_i's.
 Use z_G to make predictions (e.g., graph classification):
▪ Option 1: inner product kernel z_{G1}^T z_{G2} (Lecture 2).
▪ Option 2: use a neural network that takes z_G as input to classify G.
(Figure: overall architecture.)
We discussed 3 ideas for graph embeddings:
 Approach 1: embed nodes and sum/average them.
 Approach 2: create a super-node that spans the (sub)graph and then embed that node.
 Approach 3: anonymous walk embeddings.
▪ Idea 1: sample the anonymous walks and represent the graph as the fraction of times each anonymous walk occurs.
▪ Idea 2: learn the graph embedding together with the anonymous walk embeddings.
 We will discuss more advanced ways to obtain
graph embeddings in Lecture 8.
 We can hierarchically cluster nodes in graphs,
and sum/avg the node embeddings according
to these clusters.

 How to use node embeddings z_i:
▪ Clustering/community detection: cluster the points z_i.
▪ Node classification: predict the label of node i based on z_i.
▪ Link prediction: predict edge (i, j) based on (z_i, z_j), where we can concatenate, average, take a product of, or take a difference between the embeddings (see the sketch below):
▪ Concatenate: f(z_i, z_j) = g([z_i, z_j])
▪ Hadamard: f(z_i, z_j) = g(z_i ∗ z_j) (per-coordinate product)
▪ Sum/Avg: f(z_i, z_j) = g(z_i + z_j)
▪ Distance: f(z_i, z_j) = g(‖z_i − z_j‖₂)
▪ Graph classification: obtain a graph embedding z_G by aggregating node embeddings or via anonymous random walks, and predict the label based on z_G.
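A minimal sketch of these edge-feature operators for link prediction (assuming NumPy; g stands for any downstream scorer, e.g., a logistic regression, and is not shown):

```python
import numpy as np

def edge_features(z_i: np.ndarray, z_j: np.ndarray) -> dict:
    """Four ways of combining two node embeddings into an edge feature."""
    return {
        "concat":   np.concatenate([z_i, z_j]),
        "hadamard": z_i * z_j,                        # per-coordinate product
        "average":  (z_i + z_j) / 2,
        "distance": np.array([np.linalg.norm(z_i - z_j)]),
    }

z_i, z_j = np.random.randn(16), np.random.randn(16)
feats = edge_features(z_i, z_j)
print({k: v.shape for k, v in feats.items()})
```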
We discussed graph representation learning, a way to learn node and graph embeddings for downstream tasks without feature engineering.
 Encoder-decoder framework:
▪ Encoder: embedding lookup
▪ Decoder: predict a score based on the embeddings to match node similarity
 Node similarity measure: (biased) random walks
▪ Examples: DeepWalk, node2vec
 Extension to graph embeddings: node embedding aggregation and anonymous walk embeddings
