A General View For Network Embedding As Matrix Factorization
show that the resulting embeddings are sensitive to β and we can refine the embeddings by tuning β. For example, based on this tuning strategy we achieved a performance gain of up to 46.9%.

Our second contribution is the proposition of a Global Resource Allocation (GRA) similarity measure. Experiments on ten commonly used datasets show that matrix factorization based on this new measure and the β-tuning strategy significantly outperforms GraRep, LINE, node2vec, DeepWalk, and HOPE on the multilabel classification and link prediction tasks.

The rest of the paper is organized as follows. Section 2 proposes the objective function and demonstrates the relationship between network embedding and matrix factorization. Section 3 introduces GRA similarity. Section 4 presents our algorithm. Section 5 reports the experiment results. Section 6 surveys related work. Finally, Section 7 gives our conclusion.

2 OBJECTIVE FUNCTION

We begin by identifying the symbols that will be used. For simplicity and clarity, we restrict our attention to an undirected network G = (V, E), where V = {v_i | i = 1, · · · , n} is the node set and E ⊆ V × V is the edge set. A denotes the adjacency matrix, with the element A_ij = A_ji ∈ {0, 1} indicating whether (v_i, v_j) ∈ E. The goal is to map each node v_i to a vector, or embedding, e_i ∈ R^d, such that the proximities between nodes are maximally preserved by the embeddings; d ≪ n is the embedding dimension.

2.2 As A Framework

Our objective function is a framework that allows freely defining s^v, s^e, f↑^v, f↑^e, f↓^v, and f↓^e. By appropriate selection of them, many network embedding methods, such as DeepWalk [36], node2vec [16], Cauchy Graph Embedding [30], GraRep [4], and Laplacian Eigenmap [2], can be explained and unified.

As an example, we demonstrate that the objective function used in Cauchy Graph Embedding [30] is a special case of our definition. To see this, we first give an alternative form of Eq. (1):

O_1 = Σ_{i,j} f↑^v(s^v_ij) · f↓^e(d^e_ij),    (4)

where d^e : (e_i, e_j) ↦ R_{≥0} is a distance measure for a pair of embeddings: the greater d^e_ij is, the more distant (less similar) e_i and e_j are. Then, if we specify

s^v_ij = A_ij,    (5)
d^e_ij = ∥e_i − e_j∥²,    (6)
f↑^v(x) = x,    (7)
f↓^e(x) = 1/(x + τ²),    (8)

we obtain

O_1 = Σ_{i,j} A_ij / (∥e_i − e_j∥² + τ²),    (9)

which is exactly the objective used in Cauchy Graph Embedding [30]. More examples of our objective function as a unified framework are omitted due to space limitations.
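To make this instantiation concrete, the following sketch evaluates the Cauchy objective of Eq. (9) for a given adjacency matrix and embedding matrix. It is a minimal illustration only; the function name and the use of dense NumPy arrays are our own assumptions, not part of the original method.

    import numpy as np

    def cauchy_objective(A, E, tau=1.0):
        """O_1 = sum_{i,j} A_ij / (||e_i - e_j||^2 + tau^2)  (Eq. 9)."""
        # Pairwise squared Euclidean distances between all embeddings.
        sq = np.sum(E ** 2, axis=1)
        d2 = sq[:, None] + sq[None, :] - 2.0 * (E @ E.T)
        d2 = np.maximum(d2, 0.0)  # guard against tiny negatives from round-off
        return float(np.sum(A / (d2 + tau ** 2)))

    # Toy usage: a 4-node path graph with random 2-dimensional embeddings.
    A = np.array([[0, 1, 0, 0],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [0, 0, 1, 0]], dtype=float)
    E = np.random.rand(4, 2)
    print(cauchy_objective(A, E))

Maximizing this quantity pulls the embeddings of adjacent nodes (A_ij = 1) close together, which is exactly the behavior the framework encodes through f↑^v and f↓^e.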
where S has elements

S_ij = ((ρ_1 + ρ_2)/ρ_3) · (s^v_ij − log ρ/(ρ_1 + ρ_2)).    (18)

This implies that we can learn the embeddings by performing a truncated Singular Value Decomposition (SVD) on S. Due to the properties of SVD, the factor (ρ_1 + ρ_2)/ρ_3 merely contributes an overall multiplicative factor to the resulting embeddings, in which we are not interested, so we will henceforth drop it. Thus, the elements of S reduce to

S_ij = s^v_ij − β,    (19)

where β = log ρ/(ρ_1 + ρ_2) is a parameter.

The above analysis provides a general view for network embedding as matrix factorization. First, it demonstrates the relationship between matrix factorization and maximizing the objective function, in the sense that similar nodes [...]

3.1 Local Measure

The simplest local measure is Common Neighbor (CN) similarity, which is based on the assumption that two nodes are similar if they have many common neighbors. Thus, this measure simply counts the number of common neighbors:

s^cn_ij = |Γ_i ∩ Γ_j|,    (20)

where Γ_i = {v_j | A_ij > 0} is the set of immediate neighbors of v_i. CN similarity can be regarded as a rudimentary measure between v_i and v_j. It is, however, not entirely satisfactory: it can take large values for nodes of high degree even if only a small fraction of their neighbors are the same, and in many cases this runs contrary to our intuition about what constitutes similarity [23]. Therefore, people commonly normalize it in some way, yielding variants such as the Cosine (COS), Jaccard (JAC), Hub-Promoted (HUB), and Resource Allocation (RA) similarities [64]. These definitions are shown in the following.

s^jac_ij = Σ_{v_t ∈ Γ_i ∩ Γ_j} 1/|Γ_i ∪ Γ_j|,    (21)

[...]

where l denotes the path length, α is a parameter controlling how fast the weight decays with the path length, and [A^l]_ij is exactly equal to the number of length-l paths between v_i and v_j.
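To make the matrix-factorization view concrete, the sketch below builds a node similarity matrix s^v (using the CN and Jaccard measures of Eqs. (20)–(21) purely as examples), shifts it by β as in Eq. (19), and reads embeddings off a truncated SVD of the resulting S. The shift value, the rank, and the choice of U·√Σ as the embedding are illustrative assumptions rather than the paper's exact algorithm.

    import numpy as np
    from scipy.sparse.linalg import svds

    def cn_similarity(A):
        """Common Neighbor counts: s_ij = |Gamma_i ∩ Gamma_j|  (Eq. 20)."""
        return A @ A.T

    def jaccard_similarity(A):
        """Jaccard: sum over common neighbors of 1 / |Gamma_i ∪ Gamma_j|  (Eq. 21)."""
        inter = A @ A.T
        deg = A.sum(axis=1)
        union = deg[:, None] + deg[None, :] - inter
        return np.divide(inter, union, out=np.zeros_like(inter), where=union > 0)

    def embed_by_svd(sim, beta=0.0, dim=2):
        """Factorize S = s^v - beta (Eq. 19) with a truncated SVD."""
        S = sim - beta
        U, sigma, _ = svds(S, k=dim)
        return U * np.sqrt(sigma)  # one conventional way to turn the factors into embeddings

    A = np.array([[0, 1, 1, 0],
                  [1, 0, 1, 1],
                  [1, 1, 0, 1],
                  [0, 1, 1, 0]], dtype=float)
    print(embed_by_svd(jaccard_similarity(A), beta=0.1, dim=2))

Any of the other normalized variants (COS, HUB, RA) could be dropped in for jaccard_similarity without changing the factorization step.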
[...] because the local version of Katz similarity that is based [...]

S^gra = αA + α² AD^{-1}A + α³ AD^{-1}AD^{-1}A + · · · .    (31)
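A minimal sketch of Eq. (31), obtained by simply truncating the series at a small maximum path length; the cutoff K, the dense-matrix representation, and the guard for zero-degree nodes are illustrative assumptions, not the computation scheme used by the paper (see Section 4.2).

    import numpy as np

    def gra_similarity(A, alpha=0.5, K=5):
        """Truncated S^gra = alpha*A + alpha^2*A D^-1 A + alpha^3*A D^-1 A D^-1 A + ...  (Eq. 31)."""
        deg = np.maximum(A.sum(axis=1), 1e-12)  # guard against isolated (zero-degree) nodes
        Dinv = np.diag(1.0 / deg)
        S = np.zeros_like(A)
        term = A.copy()                # A, then A D^-1 A, then A D^-1 A D^-1 A, ...
        for k in range(1, K + 1):
            S += (alpha ** k) * term
            term = term @ Dinv @ A     # append one more "D^-1 A" hop to the paths
        return S

For small enough α the series has a closed form, which is what the iterative scheme of Eq. (36) in Section 4.2 exploits.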
[Figure: Micro-F1 scores of the resulting embeddings under different values of β.]

[...] logarithmically spaced values over (S^gra_{80%}, S^gra_{100%}]. The y-axis represents the Micro-F1 scores as a quantitative evaluation of the embeddings. We can find that the embeddings are sensitive to β and that setting β = 0 does not produce the best embeddings. Importantly, the quality of the embeddings changes with β in a regular way. Therefore, we can search for the best β in SET_β based on some validation data in a semi-supervised fashion.

4.2 Computational Issues

Note that S^gra has the form S^gra = M_g^{-1} M_l, where M_g = I − αAD^{-1} and M_l = αA, as indicated in (35). This implies that when β = 0 we can obtain the embeddings by following the fast generalized SVD algorithm of Ou et al. [35], which avoids calculating the similarity matrix S^gra and only requires a time complexity of nearly O(|E|).

When β ≠ 0, we cannot use the computation-saving shortcut of the generalized SVD algorithm, and thus require more computing power. In this case, the most time-consuming part is the calculation of S^gra, which involves a matrix inversion. In practice, the calculation is most simply achieved by direct multiplication. We can rewrite (35) as

S^gra D^{-1} = αAD^{-1} (I + S^gra D^{-1}).    (36)

Making any guess for an initial value of S^gra D^{-1}, such as S^gra D^{-1} = 0, we iterate this equation repeatedly until it converges. We have empirically found good convergence after 100 iterations or fewer.

5 EXPERIMENTS

In this section, we conduct experiments to answer the following questions:
Q1. Is GRA a good measure to produce embeddings?
Q2. Does tuning β help improve the embeddings?
Q3. Is the proposed method better than existing matrix factorization related approaches?

Experiment Setup. To answer these questions, we do experiments on two embedding-enabled tasks: multilabel classification [36] and link prediction [16]. We uniformly set the embedding dimension to 120 for all the experiments.

Multilabel classification aims to predict the correct node labels. To be specific, we randomly sample a portion of the labeled nodes for training, with the rest for testing. Then, we use the learned embeddings and the corresponding labels of the training nodes to train a one-vs-all logistic regression (LR) classifier. Feeding the embeddings of the testing nodes to the classifier, we predict their labels, which are compared to the true labels for evaluation. We repeat this procedure 10 times and evaluate the performance in terms of the Macro-F1 and Micro-F1 scores. Due to space limitations, we report only Micro-F1, since we observe similar behaviors with Macro-F1.

In the link prediction task, we are given a network G′ with 50% of the edges removed from the original network G, and we aim to predict the missing edges (i.e., the 50% removed edges). Specifically, based on the node embeddings learned from G′, we generate edge embeddings for pairs of nodes using the Hadamard operator¹:

[e_ij]_t = [e_i]_t · [e_j]_t,    (37)

where t ∈ {1, · · · , d} denotes the index of the t-th element of an embedding. We label an edge embedding as positive if the corresponding edge exists in G′ and negative otherwise. Then, we train a binary LR classifier using all of the edge embeddings that have positive labels and the same number of randomly sampled edge embeddings that have negative labels. After that, feeding an edge embedding to the LR classifier, we can calculate the existence probability of the corresponding edge and perform link prediction. We evaluate the performance based on the probabilities of the missing edges and non-existent edges (i.e., the edges that do not exist in G) in terms of the Area Under the Curve (AUC).

¹ We omit several element-wise operators and only use the Hadamard operator to generate the edge embeddings. One reason is that all of the methods considered here express the node similarity in a (varied) inner product form that conforms to the Hadamard operator. Another reason is that [16] shows that the Hadamard operator is superior to the others when used with node2vec and DeepWalk.

Datasets. We use ten real-world network datasets, which come from various domains and are commonly used by other researchers. A brief description of these datasets follows.
• BrazilAir [39], EuropeAir [39], USAir [39]: The air-traffic networks of Brazil, Europe, and the USA, respectively. The nodes indicate airports and the edges denote the existence of commercial flights. The labels represent the capacity levels of the airports.
• Cora [58], Citeseer [22], DBLP [41]: Paper citation networks. The labels represent the topics of the papers.
• WikiPage [49]: A network of webpages in Wikipedia, with edges indicating hyperlinks. The labels represent the topic categories of the webpages.
• WikiWord [16]: A co-occurrence network of the words appearing in Wikipedia. The labels represent the part-of-speech tags inferred using the Stanford POS-Tagger.
• PPI [16]: A part of the protein-protein interaction network for Homo sapiens. The labels represent the biological states.
• Flickr [43]: A network of the contacts between users in Flickr. The labels represent the interest groups of the users.

We remove self-loop edges and transform bi-directional edges into undirected edges for each network. The statistics of the datasets after pre-processing are summarized in Table 1.

Table 1: Statistics of the datasets: number of nodes |V|; number of edges |E|; number of labels |L|.

Dataset     |V|      |E|         |L|
BrazilAir   131      1,003       4
EuropeAir   399      5,993       4
USAir       1,190    13,599      4
Cora        2,708    5,278       7
Citeseer    3,264    4,551       6
DBLP        13,184   47,937      5
WikiPage    2,363    11,596      17
WikiWord    4,777    92,295      40
PPI         3,860    37,845      50
Flickr      80,513   5,899,882   195
Table 2: Multilabel classification results for different similarities in terms of Micro-F1 scores.

              EuropeAir                                        USAir
Method        10%      30%      50%      70%      90%         10%      30%      50%      70%      90%
GRA           0.3666   0.4480   0.4668   0.4925   0.5550      0.5139   0.5847   0.6071   0.6336   0.6597
Katz          0.3844   0.4599   0.4985   0.5042   0.5625      0.5020   0.5805   0.5876   0.5949   0.5966
SimRank       0.3014   0.3670   0.3895   0.4125   0.4875      0.5262   0.5808   0.6019   0.6347   0.6366
LHN           0.3819   0.3867   0.3944   0.4120   0.4325      0.4691   0.4917   0.4944   0.4983   0.5118
GRA-β         0.5384   0.5695   0.5945   0.6000   0.6725      0.6360   0.6793   0.7003   0.7151   0.7387
  (gain)      (46.9%)  (27.1%)  (27.3%)  (21.8%)  (21.2%)     (23.8%)  (16.2%)  (15.4%)  (12.9%)  (12.0%)
Katz-β        0.5209   0.5495   0.5874   0.5780   0.6700      0.6044   0.6483   0.6659   0.6748   0.7193
  (gain)      (35.5%)  (19.5%)  (17.8%)  (15.0%)  (19.1%)     (20.4%)  (11.7%)  (13.3%)  (13.4%)  (20.6%)

              Cora                                             CiteSeer
Method        10%      30%      50%      70%      90%         10%      30%      50%      70%      90%
GRA           0.7788   0.8197   0.8352   0.8442   0.8638      0.5622   0.6040   0.6184   0.6272   0.6245
Katz          0.7225   0.7859   0.8036   0.8121   0.8284      0.5018   0.5458   0.5585   0.5706   0.5743
SimRank       0.7672   0.8036   0.8169   0.8241   0.8376      0.5538   0.5826   0.5905   0.6014   0.6046
LHN           0.6012   0.6611   0.6749   0.6819   0.6996      0.3758   0.4207   0.4380   0.4439   0.4617
GRA-β         0.7846   0.8236   0.8418   0.8525   0.8731      0.5688   0.6204   0.6362   0.6508   0.6610
  (gain)      (0.7%)   (0.6%)   (0.8%)   (1.0%)   (1.1%)      (1.2%)   (2.7%)   (2.9%)   (3.8%)   (5.8%)
Katz-β        0.7321   0.7964   0.8096   0.8221   0.8424      0.5231   0.5676   0.5796   0.5909   0.6092
  (gain)      (1.3%)   (1.3%)   (0.8%)   (1.2%)   (1.7%)      (4.2%)   (4.0%)   (3.8%)   (3.6%)   (6.1%)
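For reference, the GRA-β and Katz-β rows in Table 2 come from the semi-supervised β search described below: candidate β values are drawn from a grid (mirroring the logarithmically spaced SET_β) and the value with the best validation score is kept. A minimal sketch follows; the callable interfaces and the grid construction are our own illustrative assumptions.

    def tune_beta(sim, embed_fn, score_fn, betas):
        """Pick the beta that maximizes a validation score.

        sim      : precomputed similarity matrix s^v
        embed_fn : callable (sim, beta) -> embeddings, e.g. a truncated SVD of sim - beta
        score_fn : callable (embeddings) -> validation Micro-F1 (or AUC for link prediction)
        betas    : candidate values, e.g. numpy.geomspace(lo, hi, num=20) for a logarithmic grid
        """
        best_beta, best_score = None, float("-inf")
        for beta in betas:
            score = score_fn(embed_fn(sim, beta))
            if score > best_score:
                best_beta, best_score = beta, score
        return best_beta, best_score

The same routine applies unchanged to the link prediction task by swapping the validation scorer.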
Comparing with Different Similarities. We compare the embeddings generated by different similarities (setting β = 0). In addition to Katz, we consider SimRank [21] and LHN [23] similarities as baselines. Both SimRank and LHN are based on the concept that two nodes are similar if they are connected to similar nodes. All of the similarities here are global measures and widely used. Table 2 reports the multilabel classification results on four of the networks².

² Because SimRank and LHN are computationally expensive, the results are limited to small and medium networks.

GRA achieves the best overall performance. We attribute this to the proper definition of GRA similarity, which depresses the contribution of paths that contain high-degree intermediate nodes.

Improving Embeddings by Tuning β. Table 2 also compares the multilabel classification results with and without tuning β. GRA-β and Katz-β indicate the results obtained by tuning β based on 10% ground-truth validation data in a semi-supervised fashion for each network and each task (this is done in the same way as node2vec [16] tunes its in-out and return hyperparameters). The figures in the parentheses show the relative improvement after tuning β. We can see that by tuning β we achieve a significant performance gain. For example, the gain for GRA-β is as high as 46.9% in the EuropeAir network when the prediction is made based on 10% labeled nodes. The tuning strategy works not only for GRA similarity but also for other similarities such as Katz. This is because β is closely related to the parameter ρ that balances the two sub-objectives of bringing together the embeddings of similar nodes and separating the embeddings of distant nodes.

Comparing with Existing Approaches. Next, we compare GRA-β with DeepWalk [36], node2vec [16], HOPE [35], GraRep [4], and LINE [42], which are representatives of matrix factorization related approaches.

The parameter settings for these approaches are the same as in the original literature. Specifically, for DeepWalk and node2vec, we set the window size to 10, the walk length to 80, and the number of walks per node to 10. For HOPE, we set the decay rate to 0.95 divided by the spectral radius of A. For LINE, we set the number of negative samples to 5. For GraRep, we set the maximum transition step to 6. Lastly, for node2vec, we obtain the best in-out and return hyperparameters based on a grid search over {0.25, 0.50, 1, 2, 4}.

Figure 2 displays the results for multilabel classification. Table 3 lists the results for link prediction³. GRA-β demonstrates the best performance over the baselines. For the multilabel classification task, it is markedly superior to the others in eight out of the ten networks. In particular, it outperforms the baselines by a considerable margin in networks such as BrazilAir, EuropeAir, USAir, Cora, Citeseer, WikiPage, and PPI.

³ The results for GraRep in the Flickr network are not reported due to the scalability issue.
Figure 2: Multilabel classification results for different methods. The x-axis represents the ratio of nodes with known labels. The y-axis represents the Micro-F1 scores. (Panels: (a) BrazilAir, (b) EuropeAir, (c) USAir, (d) Cora, (e) Citeseer, (f) DBLP, (g) WikiPage, (h) WikiWord, (i) PPI, (j) Flickr.)
For the link prediction task, it obtains the best scores in all of the networks. In particular, it outperforms the opponents by up to 13.8%, 25.9%, 47.1%, and 37.7% in the Cora, Citeseer, WikiWord, and PPI networks, respectively.

6 RELATED WORK

The study of network embedding dates back to the early 2000s. In this period, researchers attempted to keep connected nodes closer to each other in the embedding space and developed methods such as LLE [40], IsoMap [46], and Laplacian Eigenmap [2]. In the past decade, thanks to a few pioneering works such as SocDim [43, 45], DeepWalk [36], and LINE [42], this study has attracted a great deal of interest. Researchers have proposed various methods such as matrix factorization [4, 35, 58] and deep neural networks [5, 7, 12, 17, 22, 51, 52, 56]. The topics are also varied, including semi-supervised network embedding [22, 49, 51, 59], community preserving embedding [6, 10, 55], network reconstruction based on embedding [28, 35, 51], and embedding in different types of networks, such as heterogeneous networks [7, 13, 20, 41, 53, 61], multi-relational networks [33, 38], signed networks [9, 53, 54], dynamic networks [25, 32, 63], scale-free networks [14], hyper-networks [50], and attributed networks [19, 25, 48, 54, 57].

Our work is closely related to similarity matrix factorization. Previous studies have shown the effectiveness of factorizing different matrices such as the adjacency matrix [1], the high-order adjacency matrix [58], the modularity matrix [43], graph Laplacians [37, 45], and the Katz similarity matrix [35]. These matrices can more or less be viewed as similarity matrices. However, none of the authors studied why their method works, except Hamilton et al., who recently explained the effectiveness of similarity matrix factorization based on a unified encoder-decoder framework [18].

Researchers have shown that DeepWalk [36] is related to matrix factorization and derived the closed form of the matrix [49, 57]. Further, Qiu et al. proved that some other skip-gram based methods (DeepWalk [36], LINE [42], PTE [41], and node2vec [16]) are also equivalent to matrix factorization [37]. However, there are significant differences between their work and ours. Firstly, their work only involves the skip-gram based methods, while our objective function provides a more general framework that unifies several other methods such as HOPE, Cauchy Graph Embedding, GraRep, and Laplacian Eigenmap. Secondly, the proofs are totally different. They separately derive the matrices that are being factorized, while we approach the proofs from another angle, namely the relation between the skip-gram models and the similarity matrices. Thirdly, the aims are different. They focus on the exact mathematical form of the matrices that DeepWalk, LINE, PTE, and node2vec are factorizing, while we center on how to improve the similarity matrix factorization method.

Besides, Chen et al. introduced the GEM-D framework [8] and Hamilton et al. put forward an encoder-decoder framework [18] for network embedding. The two frameworks are based on a high-level point of view, decomposing a method into several components. With appropriate choices of each component, many methods including similarity matrix factorization can be unified. On the other hand, the framework proposed in our paper is based on a more specific point of view, designing an objective that involves node similarities. With appropriate choice of the similarity function [...]
Table 3: Link prediction results for different methods in terms of AUC scores.
Dataset HOPE GRA GRA-β GraRep node2vec DeepWalk LINE
BrazilAir 0.8472 0.8977 0.9117 0.8555 0.7505 0.7025 0.5215
EuropeAir 0.8895 0.9001 0.9157 0.9083 0.7387 0.7004 0.6905
USAir 0.9366 0.9527 0.9621 0.9434 0.8295 0.8045 0.8372
Cora 0.7018 0.7294 0.7707 0.6775 0.7372 0.7306 0.6779
Citeseer 0.6601 0.6255 0.7047 0.5597 0.6315 0.6130 0.5712
DBLP 0.9077 0.9237 0.9311 0.9211 0.9227 0.9176 0.8680
WikiPage 0.8839 0.9066 0.9180 0.8836 0.8568 0.8555 0.8495
WikiWord 0.8839 0.8036 0.9127 0.8910 0.6690 0.6205 0.6433
PPI 0.8635 0.8704 0.9024 0.8546 0.6796 0.6554 0.6980
Flickr 0.9286 0.9337 0.9563 —— 0.8686 0.8620 0.8455
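For completeness, the link-prediction protocol behind Table 3 (edge features from the Hadamard product of Eq. (37), a binary LR classifier, and AUC scoring) can be sketched as follows. The scikit-learn estimators and the way node-pair arrays are passed in are illustrative assumptions.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score

    def hadamard_edge_features(E, pairs):
        """Edge embedding [e_ij]_t = [e_i]_t * [e_j]_t  (Eq. 37); pairs is an (m, 2) index array."""
        return E[pairs[:, 0]] * E[pairs[:, 1]]

    def link_prediction_auc(E, pos_train, neg_train, pos_test, neg_test):
        """Train a binary LR on positive/negative edge embeddings and report AUC on test pairs."""
        X_tr = np.vstack([hadamard_edge_features(E, pos_train),
                          hadamard_edge_features(E, neg_train)])
        y_tr = np.concatenate([np.ones(len(pos_train)), np.zeros(len(neg_train))])
        clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
        X_te = np.vstack([hadamard_edge_features(E, pos_test),
                          hadamard_edge_features(E, neg_test)])
        y_te = np.concatenate([np.ones(len(pos_test)), np.zeros(len(neg_test))])
        return roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])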
[14] Rui Feng, Yang Yang, Wenjie Hu, Fei Wu, and Yueting Zhuang. 2018. Representation learning for scale-free networks. In Proceedings of AAAI. 282–289.
[15] Palash Goyal and Emilio Ferrara. 2018. Graph embedding techniques, applications, and performance: A survey. Knowledge-Based Systems 151 (2018), 78–94.
[16] Aditya Grover and Jure Leskovec. 2016. node2vec: Scalable feature learning for networks. In Proceedings of KDD. 855–864.
[17] William L Hamilton, Rex Ying, and Jure Leskovec. 2017. Inductive representation learning on large graphs. In Proceedings of NIPS. 1025–1035.
[18] William L Hamilton, Rex Ying, and Jure Leskovec. 2017. Representation learning on graphs: methods and applications. IEEE Data Engineering Bulletin 40 (2017), 52–74.
[19] Xiao Huang, Jundong Li, and Xia Hu. 2017. Label informed attributed network embedding. In Proceedings of WSDM. 731–739.
[20] Yann Jacob, Ludovic Denoyer, and Patrick Gallinari. 2014. Learning latent representations of nodes for classifying in heterogeneous social networks. In Proceedings of WSDM. 373–382.
[21] Glen Jeh and Jennifer Widom. 2002. SimRank: a measure of structural-context similarity. In Proceedings of KDD. 538–543.
[22] Thomas N Kipf and Max Welling. 2017. Semi-supervised classification with graph convolutional networks. In Proceedings of ICLR.
[23] Elizabeth A Leicht, Petter Holme, and M. E. J. Newman. 2006. Vertex similarity in networks. Phys. Rev. E 73, 2 (2006), 026120.
[24] Omer Levy and Yoav Goldberg. 2014. Neural word embedding as implicit matrix factorization. In Proceedings of NIPS. 2177–2185.
[25] Jundong Li, Harsh Dani, Xia Hu, Jiliang Tang, Yi Chang, and Huan Liu. 2017. Attributed network embedding for learning in a dynamic environment. In Proceedings of CIKM. 387–396.
[26] David Liben-Nowell and Jon Kleinberg. 2007. The link-prediction problem for social networks. Journal of the Association for Information Science and Technology 58, 7 (2007), 1019–1031.
[27] Xin Liu, Natthawut Kertkeidkachorn, Tsuyoshi Murata, Kyoung-Sook Kim, Julien Leblay, and Steven Lynden. 2018. Network embedding based on a quasi-local similarity measure. In Proceedings of PRICAI. 429–440.
[28] Xin Liu, Tsuyoshi Murata, and Kyoung-Sook Kim. 2018. Measuring graph reconstruction precisions—how well do embeddings preserve the graph proximity structure? In Proceedings of WIMS. 25:1–4.
[29] L. Lü and T. Zhou. 2011. Link prediction in complex networks: a survey. Physica A 390 (2011), 1150–1170.
[30] Dijun Luo, Chris Ding, Feiping Nie, and Heng Huang. 2011. Cauchy graph embedding. In Proceedings of ICML. 553–560.
[31] Tianshu Lyu, Yuan Zhang, and Yan Zhang. 2017. Enhancing the network embedding quality with structural similarity. In Proceedings of CIKM. 147–156.
[32] Jianxin Ma, Peng Cui, and Wenwu Zhu. 2018. DepthLGP: learning embeddings of out-of-sample nodes in dynamic networks. In Proceedings of AAAI. 370–377.
[33] Yao Ma, Zhaochun Ren, Ziheng Jiang, Jiliang Tang, and Dawei Yin. 2018. Multi-dimensional network embedding with hierarchical structure. In Proceedings of WSDM. 387–395.
[34] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Distributed representations of words and phrases and their compositionality. In Proceedings of NIPS. 3111–3119.
[35] Mingdong Ou, Peng Cui, Jian Pei, Ziwei Zhang, and Wenwu Zhu. 2016. Asymmetric transitivity preserving graph embedding. In Proceedings of KDD. 1105–1114.
[36] Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. 2014. DeepWalk: Online learning of social representations. In Proceedings of KDD. 701–710.
[37] Jiezhong Qiu, Yuxiao Dong, Hao Ma, Jian Li, Kuansan Wang, and Jie Tang. 2018. Network embedding as matrix factorization: unifying DeepWalk, LINE, PTE, and node2vec. In Proceedings of WSDM. 459–467.
[38] Meng Qu, Jian Tang, Jingbo Shang, Xiang Ren, Ming Zhang, and Jiawei Han. 2017. An attention-based collaboration framework for multi-view network representation learning. In Proceedings of CIKM. 1767–1776.
[39] Leonardo F. R. Ribeiro, Pedro H. P. Saverese, and Daniel R. Figueiredo. 2017. struc2vec: Learning node representations from structural identity. In Proceedings of KDD. 385–394.
[40] Sam T Roweis and Lawrence K Saul. 2000. Nonlinear dimensionality reduction by locally linear embedding. Science 290, 5500 (2000), 2323–2326.
[41] Jian Tang, Meng Qu, and Qiaozhu Mei. 2015. PTE: Predictive text embedding through large-scale heterogeneous text networks. In Proceedings of KDD. 1165–1174.
[42] Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. 2015. LINE: Large-scale information network embedding. In Proceedings of WWW. 1067–1077.
[43] Lei Tang and Huan Liu. 2009. Relational learning via latent social dimensions. In Proceedings of KDD. 817–826.
[44] Lei Tang and Huan Liu. 2009. Scalable learning of collective behavior based on sparse social dimensions. In Proceedings of CIKM. 1107–1116.
[45] Lei Tang and Huan Liu. 2011. Leveraging social media networks for classification. Data Min. Knowl. Discov. 23, 3 (2011), 447–478.
[46] Joshua B Tenenbaum, Vin De Silva, and John C Langford. 2000. A global geometric framework for nonlinear dimensionality reduction. Science 290, 5500 (2000), 2319–2323.
[47] Anton Tsitsulin, Davide Mottin, Panagiotis Karras, and Emmanuel Müller. 2018. VERSE: versatile graph embeddings from similarity measures. In Proceedings of WWW. 539–548.
[48] Cunchao Tu, Han Liu, Zhiyuan Liu, and Maosong Sun. 2017. CANE: Context-aware network embedding for relation modeling. In Proceedings of ACL. 1722–1731.
[49] Cunchao Tu, Weicheng Zhang, Zhiyuan Liu, and Maosong Sun. 2016. Max-margin DeepWalk: discriminative learning of network representation. In Proceedings of IJCAI. 3889–3895.
[50] Ke Tu, Peng Cui, Xiao Wang, Fei Wang, and Wenwu Zhu. 2018. Structural deep embedding for hyper-networks. In Proceedings of AAAI. 426–433.
[51] Daixin Wang, Peng Cui, and Wenwu Zhu. 2016. Structural deep network embedding. In Proceedings of KDD. 1225–1234.
[52] Hongwei Wang, Jia Wang, Jialin Wang, Miao Zhao, Weinan Zhang, Fuzheng Zhang, Xing Xie, and Minyi Guo. 2018. GraphGAN: graph representation learning with generative adversarial nets. In Proceedings of AAAI. 2508–2515.
[53] Hongwei Wang, Fuzheng Zhang, Min Hou, Xing Xie, Minyi Guo, and Qi Liu. 2018. SHINE: Signed heterogeneous information network embedding for sentiment link prediction. In Proceedings of WSDM. 592–600.
[54] Suhang Wang, Charu Aggarwal, Jiliang Tang, and Huan Liu. 2017. Attributed signed network embedding. In Proceedings of CIKM. 137–146.
[55] Xiao Wang, Peng Cui, Jing Wang, Jian Pei, Wenwu Zhu, and Shiqiang Yang. 2017. Community preserving network embedding. In Proceedings of AAAI. 203–209.
[56] Linchuan Xu, Xiaokai Wei, Jiannong Cao, and Philip S Yu. 2018. On exploring semantic meanings of links for embedding social networks. In Proceedings of WWW. 479–488.
[57] Cheng Yang, Zhiyuan Liu, Deli Zhao, Maosong Sun, and Edward Chang. 2015. Network representation learning with rich text information. In Proceedings of IJCAI. 2111–2117.
[58] Cheng Yang, Maosong Sun, Zhiyuan Liu, and Cunchao Tu. 2017. Fast network embedding enhancement via high order proximity approximation. In Proceedings of IJCAI. 3894–3900.
[59] Zhilin Yang, William Cohen, and Ruslan Salakhudinov. 2016. Revisiting semi-supervised learning with graph embeddings. In Proceedings of ICML. 40–48.
[60] Daokun Zhang, Jie Yin, Xingquan Zhu, and Chengqi Zhang. 2018. Network representation learning: a survey. IEEE Transactions on Big Data (2018).
[61] Jiawei Zhang, Congying Xia, Chenwei Zhang, Limeng Cui, Yanjie Fu, and Philip S Yu. 2017. BL-MNE: emerging heterogeneous social network embedding through broad learning with aligned autoencoder. In Proceedings of ICDM. 605–614.
[62] Yuan Zhang, Tianshu Lyu, and Yan Zhang. 2018. COSINE: community-preserving social network embedding from information diffusion cascades. In Proceedings of AAAI. 2620–2627.
[63] Ziwei Zhang, Peng Cui, Jian Pei, Xiao Wang, and Wenwu Zhu. 2018. TIMERS: error-bounded SVD restart on dynamic networks. In Proceedings of AAAI. 224–231.
[64] Tao Zhou, Linyuan Lü, and Yi-Cheng Zhang. 2009. Predicting missing links via local information. Eur. Phys. J. B 71, 4 (2009), 623–630.