Dynamic Network Embedding: An Extended Approach For Skip-Gram Based Network Embedding
Lun Du∗ , Yun Wang∗ , Guojie Song† , Zhicong Lu, Junshan Wang
Peking University
{dulun, wangyun94, gjsong, phyluzhicong, wangjunshan}@pku.edu.cn
vertices and edges, and the update of edge weights. It is challenging to design one framework that covers all of these changes. Secondly, to improve efficiency, only the most affected vertices should be updated, and the vertices nearest to a new vertex are not always the most affected ones; how to measure the influence of a change on the vertex representations is therefore another challenge. Thirdly, when the network changes substantially, it is challenging to keep the accuracy of the network embedding from degrading noticeably.
In this paper, we propose an efficient and stable embedding framework for dynamic networks. It extends skip-gram based network embedding methods to the dynamic setting. All SGNE methods are applicable to our framework, and the framework theoretically achieves the optimal solution of retraining. As an example, we apply the framework to extend LINE to dynamic settings. Moreover, our model can handle multiple dynamic changes at once and maintains its learning quality even when the dynamic network changes considerably.
In detail, we divide dynamic network embedding into two tasks: computing the representations of new vertices and adjusting the representations of the original vertices that are greatly affected. Since the change of a dynamic network at each time step is small compared with the network size, we learn the new vertices and update the representations of only a subset of the original vertices to improve efficiency. We therefore first propose a decomposable objective equivalent to the Skip-gram objective, which allows the representation of each vertex to be learned separately. Second, we quantitatively analyze the influence of dynamic network changes on the original vertex representations, and propose a selection mechanism that chooses the most affected original vertices and updates their representations. In addition, through a smoothness regularization, our model ensures the stability of the embedding results.
To summarize, we make the following contributions:
• We propose Dynamic Network Embedding (DNE), an extended dynamic network embedding framework for Skip-gram based network embedding methods, which can approximately achieve the performance of retraining while being more efficient.
• We present a decomposable objective which can learn the representation of each vertex separately. In theory, we quantitatively analyze the degree of impact on the original vertices during network evolution, and propose a selection mechanism to select the greatly affected vertices for updating.
• We conduct extensive experiments on four real networks to validate the effectiveness of our model in vertex classification and network layout. The results demonstrate that DNE can approximately achieve the performance of retraining LINE, and it is about 4 times faster than LINE. Besides, DNE shows strong layout stability.
2 Related Work
Static Network Embedding Network embedding methods can be categorized into structure-preserving methods [Henderson et al., 2012; Ribeiro et al., 2017] and property-preserving methods [Li et al., 2015; Kipf and Welling, 2016]. Inspired by the skip-gram model of word2vec [Mikolov et al., 2013], several approaches represent a node by its nearby nodes. DeepWalk [Perozzi et al., 2014] generalizes word embedding and employs truncated random walks to learn latent representations of a network. LINE [Tang et al., 2015] designs an optimized objective function that preserves first-order and second-order proximities to learn network representations. Besides structure-preserving methods, many property-preserving works [Li et al., 2017] are specially designed for attributed networks. However, all of the aforementioned methods are designed specifically for static network embedding.

Dynamic Network Embedding Similar to static network embedding, dynamic network embedding can also be categorized into structure-preserving and property-preserving methods. DeepWalk [Perozzi et al., 2014] and LINE [Tang et al., 2015] can in fact be regarded as structure-preserving dynamic network embeddings, since both can handle new vertices on top of a static embedding. However, the new vertices do not update the original vertices, and the relationships among the new vertices are not preserved in the representations. [Zhou et al., 2018] focuses on mining the pattern of network evolution to predict whether a link will appear between two vertices at the next time step, but it cannot handle the addition of vertices. In terms of property-preserving dynamic network embedding, SAGE [Hamilton et al., 2017] proposes an inductive method to learn a projection from node features to node representations. However, the parameters of the model are fixed after training, which greatly limits its scalability. Besides, although the paper proves that the model remains structure-preserving when dealing with high-dimensional embeddings, our experiments show that its ability to preserve structure is inferior to our skip-gram based structure-preserving framework.
3 Problem Definition and Analysis
3.1 Problem Definition
Definition 1 (Dynamic Network). A dynamic network G is a sequence of network snapshots within a time interval, evolving over time: G = (G_1, ..., G_T). Each G_τ = (V_τ, E_τ) ∈ G is a weighted and directed network snapshot recorded at time τ, where V_τ is the set of vertices and E_τ is the set of edges within time interval τ. Each edge (i, j) ∈ E_τ is associated with a weight w_ij > 0; for each (i, j) ∉ E_τ, w_ij is set to 0.

As undirected networks and unweighted networks are special cases of the network defined above, they are included in our problem definition. In addition, ΔG_τ = (ΔV_τ, ΔE_τ) denotes the change of the whole network, where ΔV_τ and ΔE_τ are the sets of vertices and edges added (or removed) at time τ. Based on the definition of a dynamic network, we define dynamic network embedding as follows:

Definition 2 (Dynamic Network Embedding). Given a dynamic network G, a dynamic network embedding is a sequence of functions Φ = (Φ_1, ..., Φ_T), where each function Φ_τ ∈ Φ maps V_τ → R^d (d ≪ min_τ |V_τ|) and preserves the structural properties of G_τ.
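To make the snapshot-based notation concrete, the following is a minimal sketch (not part of the original paper) of how a snapshot sequence and the per-step change ΔG_τ = (ΔV_τ, ΔE_τ) might be represented; the class and function names are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

Edge = Tuple[int, int]  # directed edge (i, j)

@dataclass
class Snapshot:
    """One network snapshot G_tau = (V_tau, E_tau) with edge weights w_ij > 0."""
    vertices: Set[int] = field(default_factory=set)
    weights: Dict[Edge, float] = field(default_factory=dict)  # absent edge => w_ij = 0

def snapshot_delta(prev: Snapshot, curr: Snapshot):
    """Return (added_vertices, removed_vertices, added_edges, removed_edges),
    a concrete view of Delta G_tau between two consecutive snapshots."""
    added_v = curr.vertices - prev.vertices
    removed_v = prev.vertices - curr.vertices
    added_e = set(curr.weights) - set(prev.weights)
    removed_e = set(prev.weights) - set(curr.weights)
    return added_v, removed_v, added_e, removed_e
```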
Since we focus on the vector representations of vertices, the dynamic embedding process can be divided into two parts: learning the representations of the new vertices and adjusting those of the original vertices. All of the dynamic changes mentioned above influence the representations of the original vertices, while only the addition of new vertices requires learning new vectors. In our framework, the other changes are treated as special cases of adding vertices. We therefore use the addition of vertices as the running example of our model, and briefly discuss how to handle the other situations later.

3.2 Analysis
Our method is an extended framework for Skip-gram network embedding (SGNE) methods in a dynamic setting. We first summarize the main idea of SGNE; since we use LINE as the example of SGNE to introduce our method, we then review the objective of LINE [Tang et al., 2015].
Main Idea of SGNE The SGNE methods learn vertex representations that are useful for predicting neighboring vertices. Thus, they share the same essential objective, which is to maximize the log probability
\[
\sum_{v_i \in V} \sum_{v_j \in N_S(v_i)} \log p(v_j \mid v_i), \tag{1}
\]
where N_S(v_i) is the neighbor set of v_i and p(v_j | v_i) is modeled using a Softmax. Different methods adopt different definitions of the neighbor set N_S. Among the SGNE methods, the neighborhood definition of LINE is the most direct, so we use LINE as the example to introduce our model, and later present how the other methods fit into it.
Objective of LINE In LINE, the neighborhood of vertex v_i is defined directly, namely N_S(v_i) = {v_j | (i, j) ∈ E}. LINE introduces two objectives; the one we focus on, LINE with Second-order Proximity (LINE-SP), is based on Skip-gram.

LINE-SP learns vertex representations that preserve the similarity between the neighborhood structures of vertex pairs. The objective is defined as
\[
\max_{\vec{u}, \vec{c}} L = \sum_{(i,j) \in E} w_{ij}\Big(\log \sigma(\vec{c}_j \cdot \vec{u}_i) + k \cdot \mathbb{E}_{v_n \sim P_n(v)}\big[\log \sigma(-\vec{c}_n \cdot \vec{u}_i)\big]\Big), \tag{2}
\]
where σ(x) = 1/(1 + e^{-x}) is the sigmoid function, w_ij is the weight of edge (i, j), ~u_i is the representation of v_i when it is treated as a central vertex, and ~c_i is the representation of v_i when it is treated as a specific "context" (i.e., a specific successor vertex). P_n(v) ∝ d_v^α is the noise distribution for negative sampling, where d_v is the out-degree of vertex v and α is a hyperparameter set to 3/4 in LINE.
In our model, we make two adjustments to d_v^α. First, to facilitate the theoretical proof, we set α = 1, the same setting as [Levy and Goldberg, 2014]. Sampling according to P^{3/4} produces somewhat better results on some semantic evaluation tasks [Levy and Goldberg, 2014], but it has not been shown to be more effective for network representation. Second, we set d_v to the in-degree of v instead of the out-degree: since P_n(v) is the probability that a vertex is sampled as a negative sample when v is a specific successor node, we believe the in-degree is the better choice.
4 Dynamic Network Embedding Model
In this section, we first propose a decomposable objective equivalent to the objective of LINE (Eq.(2)), which can be optimized for ~u_i, ~c_i under the assumption that the representations of most vertices do not need to be adjusted. Based on this objective, we then introduce a method to learn the representations of new vertices. Third, we present an embedding adjustment mechanism for the original vertices, analyzing quantitatively how dynamic changes influence them. Finally, we discuss how the framework applies to other SGNE methods and how it deals with other kinds of dynamic changes of the network.

4.1 Decomposable Objective Equivalent to LINE
The objective of LINE-SP, Eq.(2), cannot be decomposed into local objectives for ~u_i and ~c_i simultaneously. We therefore give the following decomposable objective:
\[
\sum_{(i,j) \in E} w_{ij}\Big(\log \sigma(\vec{c}_j \cdot \vec{u}_i) + k\big(\mu \cdot \mathbb{E}_{v_n \sim P_{in}(v)}[\log \sigma(-\vec{c}_n \cdot \vec{u}_i)] + (1-\mu) \cdot \mathbb{E}_{v_n \sim P_{out}(v)}[\log \sigma(-\vec{c}_j \cdot \vec{u}_n)]\big)\Big), \tag{3}
\]
where µ is an arbitrary real number between 0 and 1. P_in(v) ∝ d_v^(in) and P_out(v) ∝ d_v^(out) are the noise distributions for negative sampling, where d_i^(in) is the in-degree of vertex v_i and d_i^(out) is the out-degree of vertex v_i, i.e. d_i^(in) = Σ_j w_ji and d_i^(out) = Σ_j w_ij.
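As a concrete aid, here is a minimal sketch of building the two degree-proportional noise distributions P_in and P_out from a weighted edge list and drawing negative samples from them. It is an illustrative reading of the definitions above, not code from the paper, and the function names are assumptions.

```python
import random
from collections import defaultdict

def degree_noise_distributions(weights):
    """weights: dict mapping directed edge (i, j) -> w_ij.
    Returns two (vertices, probabilities) pairs: P_in proportional to in-degree,
    P_out proportional to out-degree."""
    d_in, d_out = defaultdict(float), defaultdict(float)
    for (i, j), w in weights.items():
        d_out[i] += w   # d_i^(out): total weight of edges leaving i
        d_in[j] += w    # d_j^(in):  total weight of edges entering j
    def normalize(deg):
        verts = list(deg)
        total = sum(deg.values())
        return verts, [deg[v] / total for v in verts]
    return normalize(d_in), normalize(d_out)

def draw_negatives(dist, k):
    """Draw k negative samples (with replacement) from a (vertices, probabilities) pair."""
    verts, probs = dist
    return random.choices(verts, weights=probs, k=k)
```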
Lemma 1. For any real number µ ∈ [0, 1], the objective Eq.(3) is equivalent to Eq.(2), i.e., the objective of LINE-SP.

Proof. Since w_ij = 0 for any (i, j) ∉ E, Eq.(2) can be rewritten as
\[
L = \sum_{v_i \in V} \sum_{v_j \in V} w_{ij} \log \sigma(\vec{c}_j \cdot \vec{u}_i)
 + \sum_{v_i \in V} \sum_{v_j \in V} w_{ij}\, k \cdot \mathbb{E}_{v_n \sim P_{in}(v)}[\log \sigma(-\vec{c}_n \cdot \vec{u}_i)]
 = \sum_{v_i \in V} \sum_{v_j \in V} w_{ij} \log \sigma(\vec{c}_j \cdot \vec{u}_i)
 + \sum_{v_i \in V} d_i^{(out)}\, k \cdot \mathbb{E}_{v_n \sim P_{in}(v)}[\log \sigma(-\vec{c}_n \cdot \vec{u}_i)], \tag{4}
\]
where the expectation term can be written explicitly as
\[
\mathbb{E}_{v_n \sim P_{in}(v)}[\log \sigma(-\vec{c}_n \cdot \vec{u}_i)] = \sum_{v_n \in V} \frac{d_n^{(in)}}{D} \log \sigma(-\vec{c}_n \cdot \vec{u}_i),
\]
with D = Σ_{(i,j)∈E} w_ij.

We focus on the latter term of Eq.(4):
\[
\sum_{v_i \in V} d_i^{(out)}\, k \cdot \mathbb{E}_{v_n \sim P_{in}(v)}[\log \sigma(-\vec{c}_n \cdot \vec{u}_i)]
= k \sum_{v_i \in V} \sum_{v_j \in V} d_i^{(out)} \frac{d_j^{(in)}}{D} \log \sigma(-\vec{c}_j \cdot \vec{u}_i)
= \sum_{v_j \in V} d_j^{(in)}\, k \cdot \mathbb{E}_{v_n \sim P_{out}(v)}[\log \sigma(-\vec{c}_j \cdot \vec{u}_n)]. \tag{5}
\]
Since both expressions in Eq.(5) equal the latter term of Eq.(4), any convex combination of them with weights µ and 1 − µ equals it as well. Combining Eq.(4) and Eq.(5), for any µ ∈ [0, 1],
\[
L = \sum_{v_i \in V} \sum_{v_j \in V} w_{ij} \log \sigma(\vec{c}_j \cdot \vec{u}_i)
 + \mu \sum_{v_i \in V} d_i^{(out)} k \cdot \mathbb{E}_{v_n \sim P_{in}(v)}[\log \sigma(-\vec{c}_n \cdot \vec{u}_i)]
 + (1-\mu) \sum_{v_j \in V} d_j^{(in)} k \cdot \mathbb{E}_{v_n \sim P_{out}(v)}[\log \sigma(-\vec{c}_j \cdot \vec{u}_n)]
 = \sum_{(i,j) \in E} w_{ij}\Big(\log \sigma(\vec{c}_j \cdot \vec{u}_i) + k\big(\mu \cdot \mathbb{E}_{v_n \sim P_{in}(v)}[\log \sigma(-\vec{c}_n \cdot \vec{u}_i)] + (1-\mu) \cdot \mathbb{E}_{v_n \sim P_{out}(v)}[\log \sigma(-\vec{c}_j \cdot \vec{u}_n)]\big)\Big).
\]
Thus, the lemma is proved.
When µ is set to 1, the objective decomposes over each ~u_i:
\[
\max_{\vec{u}_i} \sum_{v_j \in N_{out}(v_i)} w_{ij}\big(\log \sigma(\vec{c}_j \cdot \vec{u}_i) + k \cdot \mathbb{E}_{v_n \sim P_{in}(v)}[\log \sigma(-\vec{c}_n \cdot \vec{u}_i)]\big),
\]
and when µ is set to 0, it decomposes over each ~c_i:
\[
\max_{\vec{c}_i} \sum_{v_j \in N_{in}(v_i)} w_{ji}\big(\log \sigma(\vec{c}_i \cdot \vec{u}_j) + k \cdot \mathbb{E}_{v_n \sim P_{out}(v)}[\log \sigma(-\vec{c}_i \cdot \vec{u}_n)]\big).
\]
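To illustrate how this decomposition lets a single vertex be trained in isolation, below is a minimal sketch of optimizing the local objective for one vector ~u_i by stochastic gradient ascent while all context vectors ~c_j are held fixed (the µ = 1 case). It is a simplified illustration under the definitions above, not the authors' released implementation; the names and hyperparameters are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def update_u_i(u_i, out_neighbors, C, neg_sampler, k=5, lr=0.025, epochs=20):
    """Optimize u_i for the mu = 1 local objective:
    sum_{j in N_out(i)} w_ij (log sigma(c_j.u_i) + k * E_{n~P_in}[log sigma(-c_n.u_i)]).
    out_neighbors: list of (j, w_ij); C: dict vertex -> fixed context vector;
    neg_sampler(k): returns k vertex ids drawn from P_in."""
    u_i = u_i.copy()
    for _ in range(epochs):
        for j, w_ij in out_neighbors:
            grad = w_ij * (1.0 - sigmoid(C[j] @ u_i)) * C[j]   # gradient of the positive term
            for n in neg_sampler(k):                            # Monte Carlo negative samples
                grad -= w_ij * sigmoid(C[n] @ u_i) * C[n]
            u_i += lr * grad                                    # gradient ascent step
    return u_i
```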
[Levy and Goldberg, 2014] derive the theoretical optimal solution of Skip-gram with negative sampling. Based on that work, we give the theoretical optimal solution of LINE-SP:
\[
x_{ij} = \vec{c}_j \cdot \vec{u}_i = \log\Big(\frac{w_{ij} \cdot D}{d_i^{(out)} \cdot d_j^{(in)}}\Big) - \log k. \tag{6}
\]
It shows that the optimum constrains only the inner product ~c_j · ~u_i of each vertex pair rather than each specific ~u_i and ~c_j. Thus, with a fixed ~u_i (or ~c_j), we can optimize the objective only for ~c_j (or ~u_i), which drives ~c_j · ~u_i toward x_ij.
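The closed form in Eq.(6) can be computed directly from the edge weights and degrees. The following is a small sketch of that computation (illustrative only; the helper name is an assumption), which can serve as a reference target when checking learned embeddings.

```python
import math
from collections import defaultdict

def optimal_inner_products(weights, k=5):
    """weights: dict (i, j) -> w_ij. Returns dict (i, j) -> x_ij from Eq.(6):
    x_ij = log(w_ij * D / (d_i_out * d_j_in)) - log k."""
    D = sum(weights.values())
    d_out, d_in = defaultdict(float), defaultdict(float)
    for (i, j), w in weights.items():
        d_out[i] += w
        d_in[j] += w
    return {(i, j): math.log(w * D / (d_out[i] * d_in[j])) - math.log(k)
            for (i, j), w in weights.items()}
```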
4.2 New Vertex Representation
We discuss the situation in which several vertices are added at time τ. The edges incident to the new vertices fall into three types. For any edge (i, j) ∈ ΔE_τ: ΔE_τ^(1) = {(i, j) | v_i ∈ ΔV_τ ∧ v_j ∈ ΔV_τ}, ΔE_τ^(2) = {(i, j) | v_i ∈ ΔV_τ ∧ v_j ∉ ΔV_τ}, and ΔE_τ^(3) = {(i, j) | v_i ∉ ΔV_τ ∧ v_j ∈ ΔV_τ}. They correspond to different values of µ in the objective: 1/2, 1, and 0, respectively. Thus, the objective for new vertices is defined as
\[
\min_{\vec{u}^{(\tau)}, \vec{c}^{(\tau)}} \ell = \ell_1 + \ell_2 + \ell_3, \tag{7}
\]
where
\[
\ell_1 = -\sum_{(i,j) \in \Delta E_\tau^{(1)}} w_{ij}\Big(\log \sigma(\vec{c}_j^{(\tau)} \cdot \vec{u}_i^{(\tau)}) + \tfrac{k}{2} F_{in}(\vec{u}_i^{(\tau)}) + \tfrac{k}{2} F_{out}(\vec{c}_j^{(\tau)})\Big),
\]
\[
\ell_2 = -\sum_{(i,j) \in \Delta E_\tau^{(2)}} w_{ij}\Big(\log \sigma(\vec{c}_j^{(\tau-1)} \cdot \vec{u}_i^{(\tau)}) + k \cdot F_{in}(\vec{u}_i^{(\tau)})\Big),
\]
\[
\ell_3 = -\sum_{(i,j) \in \Delta E_\tau^{(3)}} w_{ij}\Big(\log \sigma(\vec{c}_j^{(\tau)} \cdot \vec{u}_i^{(\tau-1)}) + k \cdot F_{out}(\vec{c}_j^{(\tau)})\Big),
\]
\[
F_{in}(\vec{u}_i^{(\tau)}) = \mathbb{E}_{v_n \sim P_{in}(v)}[\log \sigma(-\vec{c}_n^{(\tau_n)} \cdot \vec{u}_i^{(\tau)})], \qquad
F_{out}(\vec{c}_j^{(\tau)}) = \mathbb{E}_{v_n \sim P_{out}(v)}[\log \sigma(-\vec{c}_j^{(\tau)} \cdot \vec{u}_n^{(\tau_n)})],
\]
where ~u_i^(τ) and ~c_i^(τ) are the vector representations of v_i at time τ, and τ_n equals τ when v_n ∈ ΔV_τ and τ − 1 otherwise. This objective learns only the vector representations of the new vertices and does not adjust the original ones.
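As a concrete companion to the three edge types above, here is a short sketch (illustrative, with assumed names) that partitions the edges added at time τ into ΔE^(1), ΔE^(2), ΔE^(3) and records the µ value associated with each group.

```python
def categorize_new_edges(delta_edges, delta_vertices):
    """delta_edges: iterable of directed edges (i, j) added at time tau.
    delta_vertices: set of vertices added at time tau.
    Returns a dict {1: [...], 2: [...], 3: [...]} of edge groups."""
    groups = {1: [], 2: [], 3: []}
    for (i, j) in delta_edges:
        if i in delta_vertices and j in delta_vertices:
            groups[1].append((i, j))   # both endpoints new  -> mu = 1/2 (term l_1)
        elif i in delta_vertices:
            groups[2].append((i, j))   # only the source new -> mu = 1   (term l_2)
        elif j in delta_vertices:
            groups[3].append((i, j))   # only the target new -> mu = 0   (term l_3)
    return groups
```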
4.3 Adjustment of Original Vertex Representation
Generally, a single dynamic change in the network influences only a few vertices. Therefore, to improve efficiency, we do not adjust the representations of all vertices; we only adjust the vertices that are greatly influenced. We analyze theoretically how the optimal solution for the original vertex representations changes when the network changes dynamically.

Based on Eq.(6), we can calculate the delta of the theoretical optimal solution Δx_ij between two snapshots. When a new vertex v_* is added at time τ, the delta is
\[
\Delta x_{ij}^{(\tau)} = x_{ij}^{(\tau)} - x_{ij}^{(\tau-1)}
= \log\frac{D + d_*^{(out)} + d_*^{(in)}}{D} + \log\frac{d_i^{(out)}}{d_i^{(out)} + w_{i*}} + \log\frac{d_j^{(in)}}{d_j^{(in)} + w_{*j}}.
\]
In practice, we cannot obtain the optimal solution itself. Therefore, we define the following criterion, combining the edge weights, to judge whether v_i should be adjusted:
\[
\epsilon_i = \frac{1}{Z_i^{(\tau-1)}}\Big(\sum_{j \in N_{out}^{(\tau-1)}(v_i)} w_{ij}\big(x_{ij}^{(\tau)} - \vec{c}_j^{(\tau-1)} \cdot \vec{u}_i^{(\tau-1)}\big)
+ \sum_{j \in N_{in}^{(\tau-1)}(v_i)} w_{ji}\big(x_{ji}^{(\tau)} - \vec{c}_i^{(\tau-1)} \cdot \vec{u}_j^{(\tau-1)}\big)\Big), \tag{8}
\]
where x_ij^(τ) is calculated according to Eq.(6) and Z_i^(τ−1) is a normalization factor, i.e. Z_i^(τ−1) = Σ_{j∈N_out^(τ−1)(v_i)} w_ij + Σ_{j∈N_in^(τ−1)(v_i)} w_ji.

We adjust the top m original vertices with the largest ε_i. The original vertices that need to be adjusted are added, together with their edges, into ΔG_τ, and the objective Eq.(7) is adopted to learn them. To guarantee the stability of the original vertex representations, we introduce a smooth regularization term. The overall objective is defined as
\[
\max_{\vec{u}^{(\tau)}, \vec{c}^{(\tau)}} L = \ell + \lambda \ell_{reg}, \tag{9}
\]
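To make the selection mechanism concrete, the following is a minimal sketch (assumed names, not the paper's code) that scores each original vertex by Eq.(8), using the closed-form targets x_ij^(τ) of Eq.(6) computed on the new snapshot, and returns the top m vertices to re-train.

```python
import numpy as np

def influence_scores(orig_vertices, out_nbrs, in_nbrs, x_new, U, C):
    """Eq.(8): weighted, normalized gap between the new closed-form targets x_ij^(tau)
    and the current inner products c_j^(tau-1) . u_i^(tau-1).
    out_nbrs / in_nbrs: dict i -> list of (j, w); x_new: dict (i, j) -> x_ij^(tau);
    U, C: dicts of current central / context vectors."""
    scores = {}
    for i in orig_vertices:
        num, Z = 0.0, 0.0
        for j, w in out_nbrs.get(i, []):        # outgoing edges (i, j)
            num += w * (x_new[(i, j)] - float(C[j] @ U[i]))
            Z += w
        for j, w in in_nbrs.get(i, []):         # incoming edges (j, i)
            num += w * (x_new[(j, i)] - float(C[i] @ U[j]))
            Z += w
        scores[i] = num / Z if Z > 0 else 0.0
    return scores

def select_top_m(scores, m):
    """Pick the m original vertices with the largest influence score."""
    return sorted(scores, key=scores.get, reverse=True)[:m]
```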
Figure 1: The performance comparison of vertex classification on three real networks. For each sub-graph, the left panel shows the effect of the number of snapshots on vertex classification accuracy, and the right panel shows the average time cost on each snapshot. Panel (c) compares the vertex classification accuracy and time cost of DNE, LINE, LINE-NV and SAGE on the Auburn dataset.

Figure 2: 2D visualization on a dynamic network. The Karate dataset is embedded into 2-dimensional space at each snapshot; panel (c) shows the Karate network layout with SAGE at TimeStep 0, 1 and 2. In order to evaluate the layout stability of the different models, four vertices (0, 2, 6, 9) in Karate are tracked. According to the changes of the tracked vertex positions, DNE is superior to the other models in layout stability.
SAGE, it can be found that the time cost of DNE is more stable, especially as the network grows larger. This is because the sampling strategy of SAGE causes more and more original vertices to need updating as the network grows, whereas DNE's embedding adjustment mechanism updates only the top m original vertices. In general, our model performs better, especially on large-scale networks.

nodes, and 12 nodes are added into G at each snapshot. In order to evaluate the layout stability of the models, G is embedded into 2-dimensional space. Besides, we track four points (0, 2, 6, 9), which can be seen in Fig.2, and observe their positions at different snapshots. From the visualization, LINE, as the representative static embedding method, exhibits obvious embedding space drift. Our model DNE is superior to the other two models, because only the m most affected nodes are updated at each snapshot.
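The layout-stability comparison relies on following the positions of a few tracked vertices across snapshots. One simple way to turn that into a number (an illustrative metric of our own, not one defined in this excerpt) is the mean displacement of the tracked embeddings between consecutive snapshots; a sketch:

```python
import numpy as np

def mean_tracked_drift(snapshots, tracked=(0, 2, 6, 9)):
    """snapshots: list of dicts, each mapping vertex id -> 2D embedding (np.array).
    Returns the average Euclidean displacement of the tracked vertices
    between consecutive snapshots (lower = more stable layout)."""
    drifts = []
    for prev, curr in zip(snapshots, snapshots[1:]):
        for v in tracked:
            if v in prev and v in curr:
                drifts.append(float(np.linalg.norm(curr[v] - prev[v])))
    return sum(drifts) / len(drifts) if drifts else 0.0
```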
static network embedding methods in a dynamic setting based on our idea.
Acknowledgments
This work was supported by the National Natural Science Foundation of China (Grant No. 61572041).
References
[Belkin and Niyogi, 2001] Mikhail Belkin and Partha Niyogi. Laplacian eigenmaps and spectral techniques for embedding and clustering. pages 585–591, 2001.
[Cao et al., 2015] Shaosheng Cao, Wei Lu, and Qiongkai Xu. GraRep: Learning graph representations with global structural information. In ACM International Conference on Information and Knowledge Management, pages 891–900, 2015.
[Cao et al., 2016] Shaosheng Cao, Wei Lu, and Qiongkai Xu. Deep neural networks for learning graph representations. In Association for the Advancement of Artificial Intelligence Conference, pages 1145–1152, 2016.
[Cui et al., 2017] P. Cui, X. Wang, J. Pei, and W. Zhu. A survey on network embedding. ArXiv e-prints, November 2017.
[Grover and Leskovec, 2016] Aditya Grover and Jure Leskovec. node2vec: Scalable feature learning for networks. In ACM Knowledge Discovery and Data Mining, pages 855–864, 2016.
[Hamilton et al., 2017] Will Hamilton, Zhitao Ying, and Jure Leskovec. Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems, pages 1025–1035, 2017.
[He et al., 2012] Xinran He, Guojie Song, Wei Chen, and Qingye Jiang. Influence blocking maximization in social networks under the competitive linear threshold model. pages 463–474, 2012.
[Henderson et al., 2012] Keith Henderson, Brian Gallagher, Tina Eliassi-Rad, Hanghang Tong, Sugato Basu, Leman Akoglu, Danai Koutra, Christos Faloutsos, and Lei Li. RolX: structural role extraction & mining in large graphs. In ACM Knowledge Discovery and Data Mining, pages 1231–1239, 2012.
[Kipf and Welling, 2016] Thomas N. Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. CoRR, abs/1609.02907, 2016.
[Levy and Goldberg, 2014] Omer Levy and Yoav Goldberg. Neural word embedding as implicit matrix factorization. In Advances in Neural Information Processing Systems, pages 2177–2185, 2014.
[Li et al., 2015] Yujia Li, Daniel Tarlow, Marc Brockschmidt, and Richard S. Zemel. Gated graph sequence neural networks. CoRR, abs/1511.05493, 2015.
[Li et al., 2017] Jundong Li, Harsh Dani, Xia Hu, Jiliang Tang, Yi Chang, and Huan Liu. Attributed network embedding for learning in a dynamic environment. CoRR, abs/1706.01860, 2017.
[Ma et al., 2018] Jianxin Ma, Peng Cui, and Wenwu Zhu. DepthLGP: Learning embeddings of out-of-sample nodes in dynamic networks. In Association for the Advancement of Artificial Intelligence Conference, pages 1–9, 2018.
[Mikolov et al., 2013] Tomas Mikolov, Ilya Sutskever, Kai Chen, Gregory S. Corrado, and Jeffrey Dean. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, pages 3111–3119, 2013.
[Perozzi et al., 2014] Bryan Perozzi, Rami Alrfou, and Steven Skiena. DeepWalk: online learning of social representations. In ACM Knowledge Discovery and Data Mining, pages 701–710, 2014.
[Ribeiro et al., 2017] Leonardo F.R. Ribeiro, Pedro H.P. Saverese, and Daniel R. Figueiredo. struc2vec: Learning node representations from structural identity. In ACM Knowledge Discovery and Data Mining, pages 385–394, 2017.
[Rushing et al., 2005] John Rushing, Udaysankar Nair, Ron Welch, and Hong Lin. Adam: a data mining toolkit for scientists and engineers. Computers and Geosciences, 31(5):607–618, 2005.
[Song et al., 2015] Guojie Song, Xiabing Zhou, Yu Wang, and Kunqing Xie. Influence maximization on large-scale mobile social network: A divide-and-conquer method. IEEE Transactions on Parallel and Distributed Systems, 26(5):1379–1392, 2015.
[Tang et al., 2015] Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. LINE: Large-scale information network embedding. In International Conference on World Wide Web, pages 1067–1077, 2015.
[Traud et al., 2012] Amanda L. Traud, Peter J. Mucha, and Mason A. Porter. Social structure of Facebook networks. Social Science Electronic Publishing, 391(16):4165–4180, 2012.
[Wang et al., 2016] Daixin Wang, Peng Cui, and Wenwu Zhu. Structural deep network embedding. In ACM Knowledge Discovery and Data Mining, pages 1225–1234, 2016.
[Zachary, 1977] Wayne W. Zachary. An information flow model for conflict and fission in small groups. Journal of Anthropological Research, 33(4):452–473, 1977.
[Zhang et al., 2017] Ziwei Zhang, Peng Cui, Jian Pei, Xiao Wang, and Wenwu Zhu. TIMERS: Error-bounded SVD restart on dynamic networks. In Association for the Advancement of Artificial Intelligence Conference, pages 1–8, 2017.
[Zhou et al., 2018] Lekui Zhou, Yang Yang, Xiang Ren, Fei Wu, and Yueting Zhuang. Dynamic network embedding by modeling triadic closure process. In Association for the Advancement of Artificial Intelligence Conference, 2018.
[Zhu et al., 2018] Dingyuan Zhu, Peng Cui, Ziwei Zhang, Jian Pei, and Wenwu Zhu. High-order proximity preserved embedding for dynamic networks. IEEE Transactions on Knowledge & Data Engineering, PP(99):1–1, 2018.