09 Node2vec
◾ Example machine learning task on networks: node classification.
◾ The standard machine learning pipeline: Raw Data → Structured Data → Learning Algorithm → Model.
◾ Instead of manual feature engineering over the structured data, we automatically learn the features needed for the downstream task.
Goal: efficient task-independent feature learning for machine learning in networks!
We learn a mapping from nodes to vectors:

ƒ: u → ℝ^d

The vector z_u ∈ ℝ^d is the feature representation (embedding) of node u.
◾ Task: map each node in a network into a low-dimensional space:
  - Distributed representation for nodes
  - Similarity of embeddings between nodes indicates their network similarity
  - Encodes network information and generates node representations
◾ Example: 2D embedding of the nodes of Zachary's Karate Club network [figure omitted].
◾ Goal:

similarity(u, v) ≈ z_v^T z_u

where similarity(u, v) is the similarity of u and v in the original network (which we need to define!), and z_v^T z_u is the dot product between the node embeddings.
◾ Simplest encoding approach: the encoder is just an embedding-lookup:

ENC(v) = Z v

where Z ∈ ℝ^{d×|V|} is a matrix whose columns are the node embeddings [what we learn!], and v ∈ 𝕀^{|V|} is an indicator vector: all zeroes except a one in the column indicating node v.
◾ In other words, the embedding matrix Z has one column per node; each column is the embedding vector for a specific node, and the number of rows is the dimension/size of the embeddings.
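As a minimal sketch of this lookup (numpy, with illustrative sizes; not code from the lecture), ENC(v) is just a matrix-vector product that selects column v of Z:

```python
import numpy as np

d, num_nodes = 64, 1000          # illustrative: d-dim embeddings, |V| nodes

# Z: the embedding matrix we learn; one column per node.
Z = np.random.randn(d, num_nodes)

def encode(v):
    """ENC(v) = Z @ indicator(v), i.e. column v of Z."""
    indicator = np.zeros(num_nodes)  # all zeroes ...
    indicator[v] = 1.0               # ... except a one at position v
    return Z @ indicator

assert np.allclose(encode(42), Z[:, 42])  # lookup == selecting a column
```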
◾ Random-walk embeddings: run short fixed-length random walks starting from each node on the graph using some strategy R, and collect N_R(u), the multiset of nodes visited on random walks starting from u.
◾ Optimize embeddings so that each node v ∈ N_R(u) is predicted to be most similar to node u (out of all nodes n), using a softmax parametrization:

P(v | z_u) = exp(z_u ⋅ z_v) / ∑_{n∈V} exp(z_u ⋅ z_n)

◾ Intuition behind the softmax: ∑_i exp(x_i) ≈ max_i exp(x_i)
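A small numpy sketch of this parametrization (illustrative, assuming the column-per-node matrix Z from above):

```python
import numpy as np

def p_v_given_u(Z, u, v):
    """Softmax co-occurrence probability:
    exp(z_u . z_v) / sum over all nodes n of exp(z_u . z_n)."""
    scores = Z[:, u] @ Z        # z_u . z_n for every node n at once
    scores -= scores.max()      # stabilize the exponentials
    weights = np.exp(scores)
    return weights[v] / weights.sum()

Z = np.random.randn(16, 100)
print(p_v_given_u(Z, u=0, v=7))
```

Summing −log of this probability over all pairs (u, v ∈ N_R(u)) gives the loss below; note the denominator already touches every node, which is where the cost blows up.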
◾ Putting it all together:

L = ∑_{u∈V} ∑_{v∈N_R(u)} −log( exp(z_u^T z_v) / ∑_{n∈V} exp(z_u^T z_n) )

i.e., a sum over all nodes u and over the nodes v seen on random walks starting from u.
◾ But doing this naively is too expensive!
◾ The nested sum over nodes gives O(|V|²) complexity!
◾ Solution: negative sampling (https://arxiv.org/pdf/1402.3722.pdf):

log( exp(z_u^T z_v) / ∑_{n∈V} exp(z_u^T z_n) )
  ≈ log(σ(z_u^T z_v)) − ∑_{i=1}^{k} log(σ(z_u^T z_{n_i})),   n_i ∼ P_V

where σ is the sigmoid function (squashing values to between 0 and 1) and P_V is a random distribution over all nodes.
◾ Instead of normalizing with respect to all nodes, we just normalize against k random "negative samples" n_i, drawn from P_V proportionally to node degree.
◾ Two considerations for k (the number of negative samples):
  1. Higher k gives more robust estimates.
  2. Higher k corresponds to a higher prior on negative events.
◾ In practice, k = 5–20.
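A hedged sketch of the negative-sampling estimate for a single (u, v) term, mirroring the slide's formula (function names are mine; sampling proportional to degree as above):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def approx_log_prob(Z, u, v, degrees, k=10, rng=None):
    """Negative-sampling estimate of
    log( exp(z_u.z_v) / sum_n exp(z_u.z_n) )."""
    rng = rng or np.random.default_rng()
    p = degrees / degrees.sum()             # P_V: proportional to degree
    negatives = rng.choice(len(degrees), size=k, p=p)
    est = np.log(sigmoid(Z[:, u] @ Z[:, v]))
    for n in negatives:
        # Note: the word2vec papers use sigmoid(-z_u.z_n) for negatives;
        # the slide's simplified form subtracts log sigmoid(z_u.z_n).
        est -= np.log(sigmoid(Z[:, u] @ Z[:, n]))
    return est
```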
◾ Summary of the procedure:
  1. Run short fixed-length random walks starting from each node on the graph using some strategy R.
  2. For each node u, collect N_R(u), the multiset of nodes visited on random walks starting from u.
  3. Optimize the node embeddings using stochastic gradient descent.
◾ Idea: use flexible, biased random walks that can trade off between local and global views of the network (Grover and Leskovec, 2016).
[Figure: random walks from node u; BFS stays near u (s1, s2, s3) while DFS ventures out to distant nodes (s4–s9).]
◾ Two classic strategies to define a neighborhood N_R(u) of a given node u:
  - N_BFS(u) = {s1, s2, s3}: local, microscopic view
  - N_DFS(u) = {s4, s5, s6}: global, macroscopic view
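An illustrative way to materialize these two neighborhoods with networkx (the size-3 truncation mirrors the figure; this is a sketch, not the lecture's code):

```python
import networkx as nx

G = nx.karate_club_graph()
u = 0

# N_BFS(u): first 3 nodes reached breadth-first -> local, microscopic view
bfs_nodes = [w for _, w in nx.bfs_edges(G, source=u)]
n_bfs = bfs_nodes[:3]

# N_DFS(u): first 3 nodes reached depth-first -> global, macroscopic view
dfs_nodes = [w for w in nx.dfs_preorder_nodes(G, source=u) if w != u]
n_dfs = dfs_nodes[:3]

print("N_BFS(u):", n_bfs, " N_DFS(u):", n_dfs)
```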
◾ Biased fixed-length random walk R that, given a node u, generates neighborhood N_R(u)
◾ Two parameters:
  - Return parameter p: return back to the previous node
  - In-out parameter q: moving outwards (DFS) vs. inwards (BFS)
◾ p and q model the transition probabilities of the walk: 1/p, 1, 1/q are unnormalized transition probabilities (p … return parameter; q … "walk away" parameter)
◾ Walker came over edge (s1, w) and is now at w. Where to go next? The unnormalized transition probabilities, by the target's distance from s1:

  Target t      Prob.    Dist.(s1, t)
  s1            1/p      0
  s2            1        1
  s3            1/q      2
  s4            1/q      2
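One step of this rule as a sketch (assuming a networkx-style unweighted graph; the helper is hypothetical):

```python
import random

def biased_step(G, prev, curr, p, q):
    """Sample the next node from curr, given the walk arrived from prev.
    Unnormalized weights: 1/p to return to prev, 1 for neighbors of prev
    (distance 1 from prev), 1/q for nodes farther away (distance 2)."""
    neighbors = list(G[curr])
    weights = []
    for t in neighbors:
        if t == prev:
            weights.append(1.0 / p)     # return
        elif G.has_edge(t, prev):
            weights.append(1.0)         # stay at distance 1 from prev
        else:
            weights.append(1.0 / q)     # walk away
    return random.choices(neighbors, weights=weights, k=1)[0]
```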
◾ The node2vec algorithm (see the sketch below):
  1. Compute the random walk transition probabilities
  2. Simulate r random walks of length l starting from each node u
  3. Optimize the node2vec objective using stochastic gradient descent
◾ Linear-time complexity; all 3 steps are individually parallelizable.
◾ BFS-like walks give a micro-view of the neighbourhood; DFS-like walks give a macro-view.
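A rough end-to-end sketch of steps 1–2 under the same assumptions (probabilities computed on the fly rather than precomputed; step 3 would feed the walks to word2vec-style SGD):

```python
import random
import networkx as nx

def step_weight(G, prev, t, p, q):
    if prev is None:
        return 1.0                # first step of a walk: unbiased
    if t == prev:
        return 1.0 / p            # return parameter
    if G.has_edge(t, prev):
        return 1.0                # distance 1 from prev
    return 1.0 / q                # in-out parameter

def node2vec_walks(G, r=10, l=80, p=1.0, q=2.0):
    """Simulate r biased random walks of length l from every node."""
    walks = []
    for _ in range(r):
        for u in G.nodes():
            walk, prev = [u], None
            while len(walk) < l:
                curr = walk[-1]
                nbrs = list(G[curr])
                if not nbrs:
                    break
                w = [step_weight(G, prev, t, p, q) for t in nbrs]
                prev = curr
                walk.append(random.choices(nbrs, weights=w, k=1)[0])
            walks.append(walk)
    return walks

walks = node2vec_walks(nx.karate_club_graph())
```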
◾ Interactions of characters in a novel: [figure omitted]
[Plots: Macro-F1 score (predictive performance, 0.00–0.15) vs. fraction of missing edges (left) and fraction of additional edges (right), each 0.0–0.6.]
◾ How does predictive performance change as we:
  - randomly remove a fraction of edges (left)?
  - randomly add a fraction of edges (right)?
◾ Different kinds of biased random walks:
  - Based on node attributes (Dong et al., 2017)
  - Based on learned weights (Abu-El-Haija et al., 2017)
◾ Alternative optimization schemes:
  - Directly optimize based on 1-hop and 2-hop random walk probabilities (as in LINE from Tang et al., 2015)
◾ Network preprocessing techniques:
  - Run random walks on modified versions of the original network (e.g., Ribeiro et al. 2017's struc2vec, Chen et al. 2016's HARP)
◾ Basic idea: embed nodes so that distances in the embedding space reflect node similarities in the original network.
◾ Different notions of node similarity:
  - Adjacency-based (i.e., similar if connected)
  - Multi-hop similarity definitions
  - Random walk approaches (covered today)
◾ Goal: embed an entire graph (or subgraph) G into a vector z_G.
◾ Tasks:
  - Classifying toxic vs. non-toxic molecules
  - Identifying anomalous graphs
◾ Simple idea: embed the nodes of G and sum (or average) their embeddings:

z_G = ∑_{v∈G} z_v

◾ Used by Duvenaud et al., 2016 to classify molecules based on their graph structure.
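As a one-line numpy sketch (Z and the node list are illustrative):

```python
import numpy as np

Z = np.random.randn(64, 1000)       # node embeddings, one column per node
nodes = [3, 17, 42]                 # nodes of the (sub)graph G
z_G = Z[:, nodes].sum(axis=1)       # or .mean(axis=1) to average instead
```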
◾ Idea: Introduce a “virtual node” to represent
the (sub)graph and run a standard graph
embedding technique
◾ Anonymous walk embeddings: states in an anonymous walk correspond to the index of the first time each node was visited in a random walk, so the walk records the visit pattern rather than node identities.
◾ For example, set l = 3. Then we can represent the graph as a 5-dimensional vector, since there are 5 anonymous walks a_i of length 3: 111, 112, 121, 122, 123.
◾ z_G[i] = probability of anonymous walk a_i in G.
◾ Sampling anonymous walks: generate a set of m independent random walks and use the empirical distribution of their anonymous versions; m is chosen from the desired error bound and confidence (e.g., for anonymous walks of length l = 7, setting error bound ε = 0.1 and a confidence parameter determines the required m).
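An illustrative sketch of anonymizing walks and building this vector as an empirical distribution (helper names are mine):

```python
from collections import Counter

def anonymize(walk):
    """Replace each node by the index of its first visit,
    e.g. ['A', 'B', 'A', 'C'] -> (1, 2, 1, 3)."""
    first_seen = {}
    return tuple(first_seen.setdefault(v, len(first_seen) + 1) for v in walk)

# Toy set of sampled length-3 walks; the 5 possible anonymous walks are
# (1,1,1), (1,1,2), (1,2,1), (1,2,2), (1,2,3).
walks = [["a", "a", "a"], ["x", "y", "x"], ["u", "v", "w"], ["u", "v", "u"]]
counts = Counter(anonymize(w) for w in walks)
z_G = {a: c / len(walks) for a, c in counts.items()}  # z_G[i] = P(a_i in G)
print(z_G)
```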
◾ Alternative: learn an embedding z_i for each anonymous walk a_i, and take z_G as the sum/avg/concatenation of walk embeddings.
◾ Set z_G such that the next walk can be predicted: we maximize P(w_t | w_{t−Δ}, …, w_{t−1}, z_G), where w_t is the t-th random walk starting at node u.
◾ Run T different random walks from u, each of length l: N_R(u) = {w_1^u, w_2^u, …, w_T^u}
◾ Let a_i be the anonymous version of walk w_i
◾ Learn to predict walks that co-occur in a Δ-size window (Δ … context window size):

max (1/T) ∑_{t=Δ}^{T} log P(w_t | w_{t−Δ}, …, w_{t−1})

where

P(w_t | w_{t−Δ}, …, w_{t−1}) = exp(y(w_t)) / ∑_i exp(y(a_i))
y(w_t) = b + U ⋅ (1/Δ ∑_{i=1}^{Δ} z_i)

with b ∈ ℝ, U ∈ ℝ^D, and z_i the embedding of the anonymized version of walk w_i.
Anonymous Walk Embeddings, ICML 2018 https://arxiv.org/pdf/1805.11921.pdf
We discussed three ideas for graph embeddings:
◾ Approach 1: Embed nodes and sum/avg them
◾ Approach 2: Create a super-node that spans the (sub)graph and then embed that node
◾ Approach 3: Anonymous walk embeddings (estimate the walk distribution, or learn walk embeddings)