09 Node2vec
◾ Example machine learning task on networks: node classification.
◾ The standard machine learning pipeline: Raw Data → Structured Data → Learning Algorithm → Model.
◾ Instead of manual feature engineering over the structured data, we automatically learn the features needed for the downstream task.
Goal: efficient task-independent feature learning for machine learning in networks!
We learn a mapping from nodes to vectors:

ƒ: u → ℝ^d

The vector z_u ∈ ℝ^d is the feature representation (embedding) of node u.
◾ Task: map each node in a network into a low-dimensional space:
  - Distributed representation for nodes
  - Similarity of embeddings between nodes indicates their network similarity
  - Encodes network information and generates node representations
◾ Example: 2D embedding of the nodes of Zachary's Karate Club network [figure omitted].
◾ Goal:

similarity(u, v) ≈ z_v^T z_u

where similarity(u, v) is the similarity of u and v in the original network (which we need to define!), and z_v^T z_u is the dot product between the node embeddings.
◾ Simplest encoding approach: the encoder is just an embedding-lookup:

ENC(v) = Z v

where Z ∈ ℝ^{d×|V|} is a matrix whose columns are the node embeddings [what we learn!], and v ∈ 𝕀^{|V|} is an indicator vector: all zeroes except a one in the column indicating node v.
◾ In other words, the embedding matrix Z has one column per node; each column is the embedding vector for a specific node, and the number of rows is the dimension/size of the embeddings.
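As a minimal sketch of this lookup (numpy, with illustrative sizes; not code from the lecture), ENC(v) is just a matrix-vector product that selects column v of Z:

```python
import numpy as np

d, num_nodes = 64, 1000          # illustrative: d-dim embeddings, |V| nodes

# Z: the embedding matrix we learn; one column per node.
Z = np.random.randn(d, num_nodes)

def encode(v):
    """ENC(v) = Z @ indicator(v), i.e. column v of Z."""
    indicator = np.zeros(num_nodes)  # all zeroes ...
    indicator[v] = 1.0               # ... except a one at position v
    return Z @ indicator

assert np.allclose(encode(42), Z[:, 42])  # lookup == selecting a column
```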
◾ Random-walk embeddings: run short fixed-length random walks starting from each node on the graph using some strategy R, and collect N_R(u), the multiset of nodes visited on random walks starting from u.
◾ Optimize embeddings so that each node v ∈ N_R(u) is predicted to be most similar to node u (out of all nodes n), using a softmax parametrization:

P(v | z_u) = exp(z_u ⋅ z_v) / ∑_{n∈V} exp(z_u ⋅ z_n)

◾ Intuition behind the softmax: ∑_i exp(x_i) ≈ max_i exp(x_i)
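A small numpy sketch of this parametrization (illustrative, assuming the column-per-node matrix Z from above):

```python
import numpy as np

def p_v_given_u(Z, u, v):
    """Softmax co-occurrence probability:
    exp(z_u . z_v) / sum over all nodes n of exp(z_u . z_n)."""
    scores = Z[:, u] @ Z        # z_u . z_n for every node n at once
    scores -= scores.max()      # stabilize the exponentials
    weights = np.exp(scores)
    return weights[v] / weights.sum()

Z = np.random.randn(16, 100)
print(p_v_given_u(Z, u=0, v=7))
```

Summing −log of this probability over all pairs (u, v ∈ N_R(u)) gives the loss below; note the denominator already touches every node, which is where the cost blows up.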
◾ Putting it all together:

L = ∑_{u∈V} ∑_{v∈N_R(u)} −log( exp(z_u^T z_v) / ∑_{n∈V} exp(z_u^T z_n) )

i.e., a sum over all nodes u and over the nodes v seen on random walks starting from u.
◾ But doing this naively is too expensive!
◾ The nested sum over nodes gives O(|V|²) complexity!
◾ Solution: negative sampling (https://arxiv.org/pdf/1402.3722.pdf):

log( exp(z_u^T z_v) / ∑_{n∈V} exp(z_u^T z_n) )
  ≈ log(σ(z_u^T z_v)) − ∑_{i=1}^{k} log(σ(z_u^T z_{n_i})),   n_i ∼ P_V

where σ is the sigmoid function (squashing values to between 0 and 1) and P_V is a random distribution over all nodes.
◾ Instead of normalizing with respect to all nodes, we just normalize against k random "negative samples" n_i, drawn from P_V proportionally to node degree.
◾ Two considerations for k (the number of negative samples):
  1. Higher k gives more robust estimates.
  2. Higher k corresponds to a higher prior on negative events.
◾ In practice, k = 5–20.
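A hedged sketch of the negative-sampling estimate for a single (u, v) term, mirroring the slide's formula (function names are mine; sampling proportional to degree as above):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def approx_log_prob(Z, u, v, degrees, k=10, rng=None):
    """Negative-sampling estimate of
    log( exp(z_u.z_v) / sum_n exp(z_u.z_n) )."""
    rng = rng or np.random.default_rng()
    p = degrees / degrees.sum()             # P_V: proportional to degree
    negatives = rng.choice(len(degrees), size=k, p=p)
    est = np.log(sigmoid(Z[:, u] @ Z[:, v]))
    for n in negatives:
        # Note: the word2vec papers use sigmoid(-z_u.z_n) for negatives;
        # the slide's simplified form subtracts log sigmoid(z_u.z_n).
        est -= np.log(sigmoid(Z[:, u] @ Z[:, n]))
    return est
```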
◾ Summary of the procedure:
  1. Run short fixed-length random walks starting from each node on the graph using some strategy R.
  2. For each node u, collect N_R(u), the multiset of nodes visited on random walks starting from u.
  3. Optimize the node embeddings using stochastic gradient descent.
◾ Idea: use flexible, biased random walks that can trade off between local and global views of the network (Grover and Leskovec, 2016).
[Figure: random walks from node u; BFS stays near u (s1, s2, s3) while DFS ventures out to distant nodes (s4–s9).]
◾ Two classic strategies to define a neighborhood N_R(u) of a given node u:
  - N_BFS(u) = {s1, s2, s3}: local, microscopic view
  - N_DFS(u) = {s4, s5, s6}: global, macroscopic view
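An illustrative way to materialize these two neighborhoods with networkx (the size-3 truncation mirrors the figure; this is a sketch, not the lecture's code):

```python
import networkx as nx

G = nx.karate_club_graph()
u = 0

# N_BFS(u): first 3 nodes reached breadth-first -> local, microscopic view
bfs_nodes = [w for _, w in nx.bfs_edges(G, source=u)]
n_bfs = bfs_nodes[:3]

# N_DFS(u): first 3 nodes reached depth-first -> global, macroscopic view
dfs_nodes = [w for w in nx.dfs_preorder_nodes(G, source=u) if w != u]
n_dfs = dfs_nodes[:3]

print("N_BFS(u):", n_bfs, " N_DFS(u):", n_dfs)
```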
◾ Biased fixed-length random walk R that, given a node u, generates neighborhood N_R(u)
◾ Two parameters:
  - Return parameter p: return back to the previous node
  - In-out parameter q: moving outwards (DFS) vs. inwards (BFS)
◾ p and q model the transition probabilities of the walk: 1/p, 1, 1/q are unnormalized transition probabilities (p … return parameter; q … "walk away" parameter)
◾ Walker came over edge (s1, w) and is now at w. Where to go next? The unnormalized transition probabilities, by the target's distance from s1:

  Target t      Prob.    Dist.(s1, t)
  s1            1/p      0
  s2            1        1
  s3            1/q      2
  s4            1/q      2
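One step of this rule as a sketch (assuming a networkx-style unweighted graph; the helper is hypothetical):

```python
import random

def biased_step(G, prev, curr, p, q):
    """Sample the next node from curr, given the walk arrived from prev.
    Unnormalized weights: 1/p to return to prev, 1 for neighbors of prev
    (distance 1 from prev), 1/q for nodes farther away (distance 2)."""
    neighbors = list(G[curr])
    weights = []
    for t in neighbors:
        if t == prev:
            weights.append(1.0 / p)     # return
        elif G.has_edge(t, prev):
            weights.append(1.0)         # stay at distance 1 from prev
        else:
            weights.append(1.0 / q)     # walk away
    return random.choices(neighbors, weights=weights, k=1)[0]
```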
◾ The node2vec algorithm (see the sketch below):
  1. Compute the random walk transition probabilities
  2. Simulate r random walks of length l starting from each node u
  3. Optimize the node2vec objective using stochastic gradient descent
◾ Linear-time complexity; all 3 steps are individually parallelizable.
◾ BFS-like walks give a micro-view of the neighbourhood; DFS-like walks give a macro-view.
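A rough end-to-end sketch of steps 1–2 under the same assumptions (probabilities computed on the fly rather than precomputed; step 3 would feed the walks to word2vec-style SGD):

```python
import random
import networkx as nx

def step_weight(G, prev, t, p, q):
    if prev is None:
        return 1.0                # first step of a walk: unbiased
    if t == prev:
        return 1.0 / p            # return parameter
    if G.has_edge(t, prev):
        return 1.0                # distance 1 from prev
    return 1.0 / q                # in-out parameter

def node2vec_walks(G, r=10, l=80, p=1.0, q=2.0):
    """Simulate r biased random walks of length l from every node."""
    walks = []
    for _ in range(r):
        for u in G.nodes():
            walk, prev = [u], None
            while len(walk) < l:
                curr = walk[-1]
                nbrs = list(G[curr])
                if not nbrs:
                    break
                w = [step_weight(G, prev, t, p, q) for t in nbrs]
                prev = curr
                walk.append(random.choices(nbrs, weights=w, k=1)[0])
            walks.append(walk)
    return walks

walks = node2vec_walks(nx.karate_club_graph())
```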
◾ Interactions of characters in a novel: [figure omitted]
[Plots: Macro-F1 score (predictive performance, 0.00–0.15) vs. fraction of missing edges (left) and fraction of additional edges (right), each 0.0–0.6.]
◾ How does predictive performance change as we:
  - randomly remove a fraction of edges (left)?
  - randomly add a fraction of edges (right)?
◾ Different kinds of biased random walks:
  - Based on node attributes (Dong et al., 2017)
  - Based on learned weights (Abu-El-Haija et al., 2017)
◾ Alternative optimization schemes:
  - Directly optimize based on 1-hop and 2-hop random walk probabilities (as in LINE from Tang et al., 2015)
◾ Network preprocessing techniques:
  - Run random walks on modified versions of the original network (e.g., Ribeiro et al. 2017's struc2vec, Chen et al. 2016's HARP)
◾ Basic idea: embed nodes so that distances in the embedding space reflect node similarities in the original network.
◾ Different notions of node similarity:
  - Adjacency-based (i.e., similar if connected)
  - Multi-hop similarity definitions
  - Random walk approaches (covered today)
◾ Goal: embed an entire graph (or subgraph) G into a vector z_G.
◾ Tasks:
  - Classifying toxic vs. non-toxic molecules
  - Identifying anomalous graphs
◾ Simple idea: embed the nodes of G and sum (or average) their embeddings:

z_G = ∑_{v∈G} z_v

◾ Used by Duvenaud et al., 2016 to classify molecules based on their graph structure.
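As a one-line numpy sketch (Z and the node list are illustrative):

```python
import numpy as np

Z = np.random.randn(64, 1000)       # node embeddings, one column per node
nodes = [3, 17, 42]                 # nodes of the (sub)graph G
z_G = Z[:, nodes].sum(axis=1)       # or .mean(axis=1) to average instead
```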
◾ Idea: Introduce a “virtual node” to represent
the (sub)graph and run a standard graph
embedding technique
◾ Anonymous walk embeddings: states in an anonymous walk correspond to the index of the first time each node was visited in a random walk, so the walk records the visit pattern rather than node identities.
◾ For example, set l = 3. Then we can represent the graph as a 5-dimensional vector, since there are 5 anonymous walks a_i of length 3: 111, 112, 121, 122, 123.
◾ z_G[i] = probability of anonymous walk a_i in G.
◾ Sampling anonymous walks: generate a set of m independent random walks and use the empirical distribution of their anonymous versions; m is chosen from the desired error bound and confidence (e.g., for anonymous walks of length l = 7, setting error bound ε = 0.1 and a confidence parameter determines the required m).
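An illustrative sketch of anonymizing walks and building this vector as an empirical distribution (helper names are mine):

```python
from collections import Counter

def anonymize(walk):
    """Replace each node by the index of its first visit,
    e.g. ['A', 'B', 'A', 'C'] -> (1, 2, 1, 3)."""
    first_seen = {}
    return tuple(first_seen.setdefault(v, len(first_seen) + 1) for v in walk)

# Toy set of sampled length-3 walks; the 5 possible anonymous walks are
# (1,1,1), (1,1,2), (1,2,1), (1,2,2), (1,2,3).
walks = [["a", "a", "a"], ["x", "y", "x"], ["u", "v", "w"], ["u", "v", "u"]]
counts = Counter(anonymize(w) for w in walks)
z_G = {a: c / len(walks) for a, c in counts.items()}  # z_G[i] = P(a_i in G)
print(z_G)
```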
◾ Alternative: learn an embedding z_i for each anonymous walk a_i, and take z_G as the sum/avg/concatenation of walk embeddings.
◾ Set z_G such that the next walk can be predicted: we maximize P(w_t | w_{t−Δ}, …, w_{t−1}, z_G), where w_t is the t-th random walk starting at node u.
◾ Run T different random walks from u, each of length l: N_R(u) = {w_1^u, w_2^u, …, w_T^u}
◾ Let a_i be the anonymous version of walk w_i
◾ Learn to predict walks that co-occur in a Δ-size window (Δ … context window size):

max (1/T) ∑_{t=Δ}^{T} log P(w_t | w_{t−Δ}, …, w_{t−1})

where

P(w_t | w_{t−Δ}, …, w_{t−1}) = exp(y(w_t)) / ∑_i exp(y(a_i))
y(w_t) = b + U ⋅ (1/Δ ∑_{i=1}^{Δ} z_i)

with b ∈ ℝ, U ∈ ℝ^D, and z_i the embedding of the anonymized version of walk w_i.
Anonymous Walk Embeddings, ICML 2018 https://arxiv.org/pdf/1805.11921.pdf
We discussed three ideas for graph embeddings:
◾ Approach 1: Embed nodes and sum/avg them
◾ Approach 2: Create a super-node that spans the (sub)graph and then embed that node
◾ Approach 3: Anonymous walk embeddings (estimate the walk distribution, or learn walk embeddings)