09 Node2vec

CS224W: Analysis of Networks
Jure Leskovec, Stanford University
http://cs224w.stanford.edu
[Figure: a network with unlabeled ("?") nodes fed into machine learning for node classification]

[Figure: node feature vectors x as input to machine learning]


◾ (Supervised) Machine Learning Lifecycle requires feature engineering every single time!

Raw Data → Structured Data → Learning Algorithm → Model → Downstream task
(Feature engineering produces the structured data; the goal is to automatically learn the features instead)
Goal: Efficient task-independent feature learning for machine learning in networks!

ƒ: u → ℝ^d   (map each node u to a d-dimensional vector)

Feature representation, embedding
◾ Task: We map each node in a network into a
low-dimensional space
 Distributed representation for nodes
 Similarity of embedding between nodes indicates
their network similarity
 Encode network information and generate node
representation

◾ 2D embedding of nodes of Zachary's Karate Club network:

Image from: Perozzi et al. DeepWalk: Online Learning of Social Representations. KDD 2014.
◾ Modern deep learning toolbox is designed for
simple sequences or grids.
 CNNs for fixed-size images/grids….

 RNNs or word2vec for text/sequences…



◾ But networks are far more complex!
 Complex topological structure
(i.e., no spatial locality like grids)

 No fixed node ordering or reference point


(i.e., the isomorphism problem)
 Often dynamic and have multimodal features.
◾ Assume we have a graph G:
 V is the vertex set.
 A is the adjacency matrix (assume binary).
 No node features or extra information is used!



◾ Goal is to encode nodes so that similarity in
the embedding space (e.g., dot product)
approximates similarity in the original
network



Goal: similarity(u, v) ≈ z_v^T z_u

(similarity in the original network ≈ similarity of the embedding)

Need to define!


1. Define an encoder (i.e., a mapping from nodes to embeddings)
2. Define a node similarity function (i.e., a measure of similarity in the original network).
3. Optimize the parameters of the encoder so that:

   similarity(u, v) ≈ z_v^T z_u

   (similarity in the original network ≈ similarity of the embedding)


◾ Encoder maps each node to a low-dimensional vector:

   ENC(v) = z_v
   (v: node in the input graph; z_v: d-dimensional embedding)

◾ Similarity function specifies how relationships in vector space map to relationships in the original network:

   similarity(u, v) ≈ z_v^T z_u
   (similarity of u and v in the original network ≈ dot product between node embeddings)
◾ Simplest encoding approach: encoder is just an embedding-lookup

   ENC(v) = Z v

   Z ∈ ℝ^(d×|V|): matrix, each column is a node embedding [what we learn!]
   v ∈ I^|V|: indicator vector, all zeroes except a one in the column indicating node v
◾ Simplest encoding approach: encoder is just an embedding-lookup

   Z = embedding matrix: one column per node, each column is the embedding vector for a specific node; the number of rows is the dimension/size of the embeddings
Simplest encoding approach: encoder is just an embedding-lookup

 Each node is assigned a unique embedding vector
 Many methods: node2vec, DeepWalk, LINE
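A minimal NumPy sketch of the embedding-lookup encoder (the toy sizes and random Z are assumptions; in practice Z is learned):

```python
import numpy as np

d, num_nodes = 4, 6                       # embedding dimension, |V| (toy values)
rng = np.random.default_rng(0)
Z = rng.normal(size=(d, num_nodes))       # embedding matrix: one column per node (learned in practice)

def encode(v: int) -> np.ndarray:
    """ENC(v) = Z v, implemented as a column lookup."""
    return Z[:, v]

# Equivalent formulation with an explicit indicator (one-hot) vector:
v = 3
one_hot = np.zeros(num_nodes)
one_hot[v] = 1.0
assert np.allclose(Z @ one_hot, encode(v))
```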


◾ Key choice of methods is how they define
node similarity.
◾ E.g., should two nodes have similar
embeddings if they….
 are connected?
 share neighbors?
 have similar “structural roles”?
 …?



Material based on:
• Perozzi et al. 2014. DeepWalk: Online Learning of Social Representations. KDD.
• Grover et al. 2016. node2vec: Scalable Feature Learning for Networks. KDD.
z_u^T z_v ≈ probability that u and v co-occur on a random walk over the network
1. Estimate the probability of visiting node v on a random walk starting from node u using some random walk strategy R

2. Optimize embeddings to encode these random walk statistics:
   Similarity (here: dot product = cos(θ)) encodes random walk "similarity"
1. Expressivity: Flexible stochastic definition of
node similarity that incorporates both local
and higher-order neighborhood information

2. Efficiency: Do not need to consider all node


pairs when training; only need to consider
pairs that co-occur on random walks



◾ Intuition: Find embedding of nodes to d-dimensions that preserves similarity

◾ Idea: Learn node embedding such that nearby nodes are close together in the network

◾ Given a node u, how do we define nearby nodes?
 N_R(u) … neighbourhood of u obtained by some strategy R
◾ Given G = (V, E),
◾ our goal is to learn a mapping z: u → ℝ^d.
◾ Log-likelihood objective:

   max_z Σ_{u∈V} log P(N_R(u) | z_u)

 where N_R(u) is the neighborhood of node u

◾ Given node u, we want to learn feature representations that are predictive of the nodes in its neighborhood N_R(u)
1. Run short fixed-length random walks starting from each node on the graph using some strategy R

2. For each node u collect N_R(u), the multiset* of nodes visited on random walks starting from u

3. Optimize embeddings according to: given node u, predict its neighbors N_R(u)

   max_z Σ_{u∈V} log P(N_R(u) | z_u)

*N_R(u) can have repeat elements since nodes can be visited multiple times on random walks
   max_z Σ_{u∈V} log P(N_R(u) | z_u)

◾ Assumption: Conditional likelihood factorizes over the set of neighbors:

   log P(N_R(u) | z_u) = Σ_{v∈N_R(u)} log P(z_v | z_u)

◾ Softmax parametrization:

   P(v | z_u) = exp(z_u^T z_v) / Σ_{n∈V} exp(z_u^T z_n)

 Why softmax? We want node v to be most similar to node u (out of all nodes n).
 Intuition: Σ_i exp(x_i) ≈ max_i exp(x_i)
Putting it all together:

   L = Σ_{u∈V} Σ_{v∈N_R(u)} −log( exp(z_u^T z_v) / Σ_{n∈V} exp(z_u^T z_n) )

   (outer sum over all nodes u; inner sum over nodes v seen on random walks starting from u;
    the term inside the log is the predicted probability of u and v co-occurring on a random walk)

◾ Optimizing random walk embeddings = finding embeddings z_u that minimize L
But doing this naively is too expensive!!

   L = Σ_{u∈V} Σ_{v∈N_R(u)} −log( exp(z_u^T z_v) / Σ_{n∈V} exp(z_u^T z_n) )

The nested sum over nodes gives O(|V|²) complexity!
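A minimal NumPy sketch of the naive objective (the random embeddings and toy neighborhoods are assumptions), which makes the cost concrete: the softmax normalization sums over all |V| nodes for every (u, v) pair.

```python
import numpy as np

rng = np.random.default_rng(0)
d, num_nodes = 8, 100
Z = rng.normal(size=(num_nodes, d))                 # row u = embedding z_u (toy, random)

# Toy neighborhoods N_R(u): multisets of nodes seen on walks from u (assumed given).
N_R = {u: rng.integers(0, num_nodes, size=10).tolist() for u in range(num_nodes)}

def naive_loss(Z, N_R):
    """L = sum_u sum_{v in N_R(u)} -log( exp(z_u.z_v) / sum_n exp(z_u.z_n) )."""
    loss = 0.0
    for u, neighbors in N_R.items():
        scores = Z @ Z[u]                           # z_u . z_n for every node n: the O(|V|) inner sum
        log_norm = np.log(np.exp(scores).sum())
        for v in neighbors:
            loss += -(Z[u] @ Z[v] - log_norm)
    return loss

print(naive_loss(Z, N_R))
```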


But doing this naively is too expensive!!

   L = Σ_{u∈V} Σ_{v∈N_R(u)} −log( exp(z_u^T z_v) / Σ_{n∈V} exp(z_u^T z_n) )

The normalization term from the softmax is the culprit… can we approximate it?


◾ Solution: Negative sampling

   log( exp(z_u^T z_v) / Σ_{n∈V} exp(z_u^T z_n) )
   ≈ log(σ(z_u^T z_v)) − Σ_{i=1}^{k} log(σ(z_u^T z_{n_i})),   n_i ∼ P_V

   (σ: sigmoid function, makes each term a "probability" between 0 and 1;
    P_V: random distribution over all nodes)

◾ Instead of normalizing w.r.t. all nodes, just normalize against k random "negative samples" n_i

Why is the approximation valid? Technically, this is a different objective. But Negative Sampling is a form of Noise Contrastive Estimation (NCE), which approximately maximizes the log probability of the softmax. The new formulation corresponds to using a logistic regression (sigmoid function) to distinguish the target node v from nodes n_i sampled from the background distribution P_V.
More at https://arxiv.org/pdf/1402.3722.pdf
   log( exp(z_u^T z_v) / Σ_{n∈V} exp(z_u^T z_n) )
   ≈ log(σ(z_u^T z_v)) − Σ_{i=1}^{k} log(σ(z_u^T z_{n_i})),   n_i ∼ P_V   (P_V: random distribution over all nodes)

 Sample k negative nodes proportional to degree
 Two considerations for k (# negative samples):
  1. Higher k gives more robust estimates
  2. Higher k corresponds to a higher prior on negative events
 In practice k = 5–20
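A minimal NumPy sketch of the negative-sampled term for a single (u, v) pair; the random embeddings, toy degrees, and k = 5 are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d, num_nodes, k = 8, 100, 5
Z = rng.normal(size=(num_nodes, d))              # toy embeddings
degree = rng.integers(1, 10, size=num_nodes)     # toy node degrees
P_V = degree / degree.sum()                      # negatives sampled proportional to degree

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def neg_sampling_objective(u, v):
    """log sigma(z_u.z_v) - sum_i log sigma(z_u.z_{n_i}), with n_i ~ P_V."""
    negatives = rng.choice(num_nodes, size=k, p=P_V)
    positive_term = np.log(sigmoid(Z[u] @ Z[v]))
    negative_term = sum(np.log(sigmoid(Z[u] @ Z[n])) for n in negatives)
    return positive_term - negative_term

print(neg_sampling_objective(0, 1))
```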
1. Run short fixed-length random walks starting from each node on the graph using some strategy R.

2. For each node u collect N_R(u), the multiset of nodes visited on random walks starting from u

3. Optimize embeddings using Stochastic Gradient Descent:

   L = Σ_{u∈V} Σ_{v∈N_R(u)} −log(P(v | z_u))

   We can efficiently approximate this using negative sampling!
◾ So far we have described how to optimize
embeddings given random walk statistics
◾ What strategies should we use to run these
random walks?
 Simplest idea: Just run fixed-length, unbiased random walks starting from each node (i.e., DeepWalk from Perozzi et al., 2014); a sketch follows below.
 The issue is that such a notion of similarity is too constrained
 How can we generalize this?

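A minimal sketch of the unbiased (DeepWalk-style) walk strategy, assuming the graph is given as an adjacency-list dict; the toy graph, walk length, and walks-per-node are illustrative:

```python
import random

# Toy graph as an adjacency list (assumption for illustration).
graph = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}

def unbiased_walk(start, length=5):
    """Fixed-length, unbiased random walk: pick a uniformly random neighbor at each step."""
    walk = [start]
    for _ in range(length - 1):
        walk.append(random.choice(graph[walk[-1]]))
    return walk

# Several walks per node; the multiset of nodes visited on walks from u gives N_R(u).
walks = [unbiased_walk(u) for u in graph for _ in range(3)]
print(walks[:2])
```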


◾ Goal: Embed nodes with similar network neighborhoods close in the feature space

◾ We frame this goal as a prediction-task-independent maximum likelihood optimization problem

◾ Key observation: A flexible notion of the network neighborhood N_R(u) of node u leads to rich node embeddings

◾ Develop a biased 2nd-order random walk R to generate the network neighborhood N_R(u) of node u
Idea: use flexible, biased random walks that can trade off between local and global views of the network (Grover and Leskovec, 2016).

[Figure: node u and nearby nodes s1–s9; BFS explores nodes close to u, DFS walks farther away]
Two classic strategies to define a neighborhood N_R(u) of a given node u: BFS and DFS

[Figure: same network with node u and nodes s1–s9]

Walk of length 3 (N_R(u) of size 3):
  N_BFS(u) = {s1, s2, s3}   Local microscopic view
  N_DFS(u) = {s4, s5, s6}   Global macroscopic view
Biased fixed-length random walk R that given a node u generates neighborhood N_R(u)
◾ Two parameters:
 Return parameter p: return back to the previous node
 In-out parameter q: moving outwards (DFS) vs. inwards (BFS)
 Intuitively, q is the "ratio" of BFS vs. DFS


Biased 2nd-order random walks explore network neighborhoods:
 The random walk just traversed edge (s1, w) and is now at w
 Insight: Neighbors of w can only be:
   • at the same distance (2) from s1,
   • farther from s1, or
   • back at s1

Idea: Remember where that walk came from
◾ Walker came over edge (s1, w) and is at w. Where to go next?

   [Figure: from w, unnormalized probabilities 1/p back to s1, 1 to s2, and 1/q to s3 and s4]

◾ p, q model transition probabilities (1/p, 1, 1/q are unnormalized probabilities)
 p … return parameter
 q … "walk away" parameter
◾ Walker came over edge (s1, w) and is at w. Where to go next?

   Unnormalized transition probabilities from w, segmented by distance from s1:
     target s1 (back to the previous node): 1/p
     target s2 (same distance from s1):     1
     target s3 (farther from s1):           1/q
     target s4 (farther from s1):           1/q

 BFS-like walk: low value of p
 DFS-like walk: low value of q

 N_R(u) are the nodes visited by the biased walk
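A minimal sketch of one biased 2nd-order step, assuming an adjacency-list toy graph; the p and q values are illustrative, and real implementations typically precompute these transition probabilities per edge.

```python
import random

# Toy graph as an adjacency list (assumption for illustration).
graph = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 3], 3: [1, 2]}

def biased_step(prev, curr, p=1.0, q=2.0):
    """Sample the next node given that the walk just traversed edge (prev, curr)."""
    neighbors = graph[curr]
    weights = []
    for x in neighbors:
        if x == prev:                  # back to the previous node: weight 1/p
            weights.append(1.0 / p)
        elif x in graph[prev]:         # same distance from prev: weight 1
            weights.append(1.0)
        else:                          # farther from prev: weight 1/q
            weights.append(1.0 / q)
    return random.choices(neighbors, weights=weights, k=1)[0]

# One biased walk of length 5 starting from node 0.
walk = [0, random.choice(graph[0])]
while len(walk) < 5:
    walk.append(biased_step(walk[-2], walk[-1]))
print(walk)
```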
◾ 1) Compute random walk probabilities
◾ 2) Simulate r random walks of length l starting from each node u
◾ 3) Optimize the node2vec objective using Stochastic Gradient Descent

Linear-time complexity.
All 3 steps are individually parallelizable


[Figure: node u with its two kinds of neighborhoods]
BFS: micro-view of neighbourhood
DFS: macro-view of neighbourhood
Interactions of characters in a novel:

[Figure: two colorings of the character network]
p = 1, q = 2: microscopic view of the network neighbourhood
p = 1, q = 0.5: macroscopic view of the network neighbourhood
[Figure: two plots of predictive performance (Macro-F1 score, 0.00–0.20) vs. fraction of missing edges (left) and fraction of additional edges (right), both from 0.0 to 0.6]

How does predictive performance change as we
◾ randomly remove a fraction of edges (left)
◾ randomly add a fraction of edges (right)
◾ Different kinds of biased random walks:
 Based on node attributes (Dong et al., 2017).
 Based on learned weights (Abu-El-Haija et al., 2017)
◾ Alternative optimization schemes:
 Directly optimize based on 1-hop and 2-hop random walk probabilities (as in LINE from Tang et al. 2015).
◾ Network preprocessing techniques:
 Run random walks on modified versions of the original network (e.g., Ribeiro et al. 2017's struc2vec, Chen et al. 2016's HARP).


◾ How to use embeddings z_i of nodes:
 Clustering/community detection: Cluster points z_i
 Node classification: Predict label f(z_i) of node i based on z_i
 Link prediction: Predict edge (i, j) based on f(z_i, z_j)
  Where we can: concatenate, avg, product, or take a difference between the embeddings:
   Concatenate: f(z_i, z_j) = g([z_i, z_j])
   Hadamard: f(z_i, z_j) = g(z_i ∗ z_j) (per-coordinate product)
   Sum/Avg: f(z_i, z_j) = g(z_i + z_j)
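A minimal NumPy sketch of these edge-feature operators for link prediction; the random toy embeddings are an assumption, and g(·) would be a downstream classifier such as logistic regression.

```python
import numpy as np

rng = np.random.default_rng(0)
z_i, z_j = rng.normal(size=8), rng.normal(size=8)    # toy node embeddings

edge_features = {
    "concatenate": np.concatenate([z_i, z_j]),
    "hadamard":    z_i * z_j,                        # per-coordinate product
    "average":     (z_i + z_j) / 2,
}
# Any of these vectors can be fed to a classifier g(.) that predicts whether edge (i, j) exists.
print({name: vec.shape for name, vec in edge_features.items()})
```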


◾ Basic idea: Embed nodes so that distances in embedding space reflect node similarities in the original network.
◾ Different notions of node similarity:
 Adjacency-based (i.e., similar if connected)
 Multi-hop similarity definitions.
 Random walk approaches (covered today)


◾ So what method should I use?
◾ No one method wins in all cases…
 E.g., node2vec performs better on node classification while multi-hop methods perform better on link prediction (Goyal and Ferrara, 2017 survey)
◾ Random walk approaches are generally more efficient
◾ In general: Must choose a definition of node similarity that matches your application!


◾ Goal: Want to embed an entire graph G into an embedding z_G

◾ Tasks:
 Classifying toxic vs. non-toxic molecules
 Identifying anomalous graphs
Simple idea:
◾ Run a standard graph embedding technique on the (sub)graph G

◾ Then just sum (or average) the node embeddings in the (sub)graph G:

   z_G = Σ_{v∈G} z_v

◾ Used by Duvenaud et al., 2016 to classify molecules based on their graph structure
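A minimal NumPy sketch of this sum/average readout, assuming node embeddings for the (sub)graph are already available as a matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
Z = rng.normal(size=(5, 8))      # toy: one row per node of the (sub)graph G

z_G_sum = Z.sum(axis=0)          # z_G = sum of the node embeddings
z_G_avg = Z.mean(axis=0)         # or the average
print(z_G_sum.shape, z_G_avg.shape)
```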
◾ Idea: Introduce a "virtual node" to represent the (sub)graph and run a standard graph embedding technique

◾ Proposed by Li et al., 2016 as a general technique for subgraph embedding
States in an anonymous walk correspond to the index of the first time we visited the node in a random walk

Anonymous Walk Embeddings, ICML 2018: https://arxiv.org/pdf/1805.11921.pdf
Number of anonymous walks grows exponentially:
 There are 5 anonymous walks a_i of length 3:
   a_1 = 111, a_2 = 112, a_3 = 121, a_4 = 122, a_5 = 123
◾ Enumerate all possible anonymous walks a_i of l steps and record their counts
◾ Represent the graph as a probability distribution over these walks

◾ For example:
 Set l = 3
 Then we can represent the graph as a 5-dim vector, since there are 5 anonymous walks a_i of length 3: 111, 112, 121, 122, 123
 Z_G[i] = probability of anonymous walk a_i in G
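A minimal sketch of turning sampled walks into this anonymous-walk distribution; the toy walks are assumptions, and in practice they would be random walks sampled from G.

```python
from collections import Counter

def anonymize(walk):
    """Replace each node by the index of its first appearance, e.g. (a, b, a) -> (1, 2, 1)."""
    first_seen = {}
    return tuple(first_seen.setdefault(node, len(first_seen) + 1) for node in walk)

# Toy walks of length 3 (assumed to come from random walks on G).
walks = [("a", "b", "a"), ("a", "b", "c"), ("x", "y", "x"), ("x", "x", "y")]
counts = Counter(anonymize(w) for w in walks)

# Z_G[i] = empirical probability of anonymous walk a_i in G.
Z_G = {walk: c / len(walks) for walk, c in counts.items()}
print(Z_G)    # {(1, 2, 1): 0.5, (1, 2, 3): 0.25, (1, 1, 2): 0.25}
```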


◾ Complete counting of all anonymous walks in a large graph may be infeasible

◾ Sampling approach to approximating the true distribution: Generate independently a set of m random walks and calculate the corresponding empirical distribution of anonymous walks

◾ How many random walks m do we need?
 We want the probability that the distribution has error of more than ε to be less than δ:

   m = ⌈ (2/ε²) (log(2^η − 2) − log δ) ⌉

 where η is the number of anonymous walks of length l.

 For example: there are η = 877 anonymous walks of length l = 7. If we set ε = 0.1 and δ = 0.01, then we need to generate m = 122500 random walks
Learn embedding z_i of every anonymous walk a_i. The embedding of a graph G is then the sum/avg/concatenation of the walk embeddings.

How to embed walks?
◾ Idea: Embed walks s.t. the next walk can be predicted
 Set z_i s.t. we maximize P(w_t | w_{t−Δ}, …, w_{t−1})

 where w_t is the t-th random walk starting at node u
Run T different random walks from u, each of length l: N_R(u) = {w_1^u, w_2^u, …, w_T^u}
 Let a_i be the anonymous version of walk w_i
Learn to predict walks that co-occur in a Δ-size window

Estimate embedding z_i of the anonymous walk a_i of w_i:

   max (1/T) Σ_{t=Δ}^{T} log P(w_t | w_{t−Δ}, …, w_{t−1})

 where Δ is the context window size and

   P(w_t | w_{t−Δ}, …, w_{t−1}) = exp(y(w_t)) / Σ_i exp(y(w_i))
   y(w_t) = b + U · (1/Δ) Σ_{i=1}^{Δ} z_i

 with b ∈ ℝ, U ∈ ℝ^D, and z_i the embedding of the anonymized version of walk w_i

Anonymous Walk Embeddings, ICML 2018: https://arxiv.org/pdf/1805.11921.pdf
We discussed 3 approaches to graph embeddings
◾ Approach 1: Embed nodes and sum/avg them
◾ Approach 2: Create a super-node that spans the (sub)graph and then embed that node

◾ Approach 3: Anonymous Walk Embeddings
 Idea 1: Represent the graph via the distribution over all the anonymous walks
 Idea 2: Sample the walks to approximate the distribution
 Idea 3: Embed anonymous walks
