
A General View for Network Embedding as Matrix Factorization


Xin Liu (National Institute of Advanced Industrial Science and Technology), [email protected]
Tsuyoshi Murata (Dept. of Computer Science, Tokyo Institute of Technology), [email protected]
Kyoung-Sook Kim (National Institute of Advanced Industrial Science and Technology), [email protected]
Chatchawan Kotarasu (Faculty of ICT, Mahidol University), [email protected]
Chenyi Zhuang (National Institute of Advanced Industrial Science and Technology), [email protected]
ABSTRACT

We propose a general view that demonstrates the relationship between network embedding approaches and matrix factorization. Unlike previous works that present the equivalence for the approaches from a skip-gram model perspective, we provide a more fundamental connection from an optimization (objective function) perspective. We demonstrate that matrix factorization is equivalent to optimizing two objectives: one is for bringing together the embeddings of similar nodes; the other is for separating the embeddings of distant nodes. The matrix to be factorized has a general form: S − β·1. The elements of S indicate pairwise node similarities. They can be based on any user-defined similarity/distance measure or learned from random walks on networks. The shift number β is related to a parameter that balances the two objectives. More importantly, the resulting embeddings are sensitive to β, and we can improve the embeddings by tuning β. Experiments show that matrix factorization based on a newly proposed similarity measure and a β-tuning strategy significantly outperforms existing matrix factorization approaches on a range of benchmark networks.

KEYWORDS

graph embedding, network representation learning, matrix factorization, node similarity, graph mining, social networks

ACM Reference Format:
Xin Liu, Tsuyoshi Murata, Kyoung-Sook Kim, Chatchawan Kotarasu, and Chenyi Zhuang. 2019. A General View for Network Embedding as Matrix Factorization. In The Twelfth ACM International Conference on Web Search and Data Mining (WSDM '19), February 11–15, 2019, Melbourne, VIC, Australia. ACM, New York, NY, USA, 9 pages. https://doi.org/10.1145/3289600.3291029

1 INTRODUCTION

Recently, there has been a surge of interest in network embedding, or learning representations of network nodes in a low-dimensional vector space, such that the network structural information and properties are maximally preserved [3, 11, 15, 18, 60]. We can use the learned embeddings as feature inputs for downstream machine learning tasks. This technology is beneficial for many network analysis tasks, such as community detection [6, 55, 62], label classification [43, 44], link prediction [16], and visualization [41].

Many studies focus on preserving the proximity structure between nodes. That is, the (dis)similarity of embeddings in the low-dimensional space should, to some extent, approximate the (dis)similarity of nodes in the original network (the similarity depends on the user's definition). To address this problem, researchers have proposed different methods, and many of them are based on matrix factorization. For example, SocDim [45] factorizes the modularity matrix [43] and the normalized Laplacian matrix; NEU [58] factorizes similarity matrices that encode higher orders of the adjacency matrix; HOPE [35] factorizes several matrices based on different similarity measures; GraRep [4] factorizes a matrix that is related to the k-step transition probability matrix. Further, skip-gram [34] model based network embedding approaches, such as LINE [42], DeepWalk [36], PTE [41], and node2vec [16], are also equivalent to implicit matrix factorization [37, 49, 57].

In this paper, we propose a general view that demonstrates the relationship between network embedding approaches and matrix factorization. Unlike previous works [37, 49, 57], which present the equivalence for the approaches from a skip-gram perspective, we provide a more fundamental connection from an optimization (objective function) perspective. Our contribution is twofold. First, we propose an objective function with two sub-objectives: one is for bringing together the embeddings of similar nodes; the other is for separating the embeddings of distant nodes. This function is a framework that connects many network embedding methods. As a special case, maximizing this function is equivalent to factorizing a shifted similarity matrix S − β·1. The elements of S indicate pairwise node similarities, and their definition is flexible. The shift number β is related to a parameter that balances the two sub-objectives.


More importantly, we show that the resulting embeddings are sensitive to β and we can refine the embeddings by tuning β. For example, based on this tuning strategy we achieved a performance gain of up to 46.9%.

Our second contribution is the proposition of a Global Resource Allocation (GRA) similarity measure. Experiments on ten commonly used datasets show that matrix factorization based on this new measure and the β-tuning strategy significantly outperforms GraRep, LINE, node2vec, DeepWalk, and HOPE on the multilabel classification and link prediction tasks.

The rest of the paper is organized as follows. Section 2 proposes the objective function and demonstrates the relationship between network embedding and matrix factorization. Section 3 introduces GRA similarity. Section 4 presents our algorithm. Section 5 reports the experiment results. Section 6 surveys related work. Finally, Section 7 gives our conclusion.

2 OBJECTIVE FUNCTION

We begin by identifying the symbols that will be used. For simplicity and clarity, we limit our vision to an undirected network G = (V, E), where V = {v_i | i = 1, ..., n} is the node set and E ⊆ V × V is the edge set. A denotes the adjacency matrix, with the element A_ij = A_ji ∈ {0, 1} indicating whether (v_i, v_j) ∈ E. The goal is to map each node v_i to a vector or embedding e_i ∈ R^d, such that the proximities between nodes can be maximally preserved by the embeddings. d ≪ n is the embedding dimension.
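To make the notation concrete, the following is a minimal sketch (not from the paper) of how A and the node degrees might be represented; a dense numpy array is used purely for illustration, whereas real networks would call for sparse matrices.

```python
import numpy as np

def build_adjacency(edges, n):
    """Adjacency matrix A (A_ij = A_ji in {0,1}) and degree vector k of an
    undirected network with n nodes, built from a list of edges (i, j)."""
    A = np.zeros((n, n))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    k = A.sum(axis=1)               # k_i = sum_j A_ij, the degree of v_i
    return A, k

# toy example: a path graph on 4 nodes
A, k = build_adjacency([(0, 1), (1, 2), (2, 3)], n=4)
```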

2.1 Definition

We first define the following objective

O_1 = \sum_{i,j} f_{\uparrow}^{v}(s_{ij}^{v}) \cdot f_{\uparrow}^{e}(s_{ij}^{e}).   (1)

s^v : (v_i, v_j) → R is a similarity measure for a pair of nodes (we will see the definition of this measure later): the greater s^v_ij is, the more similar v_i and v_j are. s^e : (e_i, e_j) → R is a similarity measure for a pair of embeddings: the greater s^e_ij is, the more similar e_i and e_j are. f_↑^v, f_↑^e : R → R≥0 are increasing functions. They serve to adjust the relative magnitudes of s^v_ij and s^e_ij, respectively. Intuitively, maximizing O_1 entails a large s^e_ij for a large s^v_ij. Therefore, the significance is that similar nodes v_i and v_j will have similar embeddings e_i and e_j.

Similarly, we can define another objective

O_2 = \sum_{i,j} f_{\downarrow}^{v}(s_{ij}^{v}) \cdot f_{\downarrow}^{e}(s_{ij}^{e}),   (2)

where f_↓^v, f_↓^e : R → R≥0 are decreasing functions. Intuitively, maximizing O_2 entails a small s^e_ij for a small s^v_ij. Therefore, the result is that distant nodes v_i and v_j will have distant embeddings e_i and e_j. Combining (1) and (2), our final objective function is

O = O_1 + \rho O_2,   (3)

where ρ > 0 is a parameter for balancing the two objectives.

2.2 As a Framework

Our objective function is a framework that allows freely defining s^v, s^e, f_↑^v, f_↑^e, f_↓^v, and f_↓^e. By appropriate selection of them, many network embedding methods such as DeepWalk [36], node2vec [16], Cauchy Graph Embedding [30], GraRep [4], and Laplacian Eigenmap [2] can be explained and unified.

As an example, we demonstrate that the objective function used in Cauchy Graph Embedding [30] is a special case of our definition. To see it, we first give an alternative definition of Eq. (1) as

O_1 = \sum_{i,j} f_{\uparrow}^{v}(s_{ij}^{v}) \cdot f_{\downarrow}^{e}(d_{ij}^{e}),   (4)

where d^e : (e_i, e_j) → R≥0 is a distance measure for a pair of embeddings: the greater d^e_ij is, the more distant (less similar) e_i and e_j are. Then, if we specify

s_{ij}^{v} = A_{ij},   (5)
d_{ij}^{e} = \lVert e_i - e_j \rVert^2,   (6)
f_{\uparrow}^{v}(x) = x,   (7)
f_{\downarrow}^{e}(x) = 1/(x + \tau^2),   (8)

we obtain

O_1 = \sum_{i,j} \frac{A_{ij}}{\lVert e_i - e_j \rVert^2 + \tau^2},   (9)

which is just the objective used in Cauchy Graph Embedding [30]. More examples of our objective function as a unifying framework are omitted due to space limitations.

2.3 Relation to Matrix Factorization

Suppose we specify

s_{ij}^{e} = e_i^{\top} e_j,   (10)
f_{\uparrow}^{v}(x) = \exp(\rho_1 x),   (11)
f_{\uparrow}^{e}(x) = \log \sigma_{\rho_3}(x),   (12)
f_{\downarrow}^{v}(x) = \exp(-\rho_2 x),   (13)
f_{\downarrow}^{e}(x) = \log \sigma_{\rho_3}(-x),   (14)

where σ_{ρ3}(x) = 1/(1 + exp(−ρ_3 x)) denotes the logistic function and ρ_1, ρ_2, ρ_3 > 0 are parameters, obtaining

O = \sum_{i,j} \Big[ \exp(\rho_1 s_{ij}^{v}) \log \sigma_{\rho_3}(e_i^{\top} e_j) \Big] + \rho \sum_{i,j} \Big[ \exp(-\rho_2 s_{ij}^{v}) \log \sigma_{\rho_3}(-e_i^{\top} e_j) \Big].   (15)

To maximize (15), we set its partial derivative with respect to e_i^⊤ e_j to zero as in [24], and after simplification arrive at

e_i^{\top} e_j = \frac{(\rho_1 + \rho_2) s_{ij}^{v} - \log \rho}{\rho_3}.   (16)

Let E = (e_1, e_2, ..., e_n)^⊤ denote the embedding matrix, where the i-th row represents the embedding of v_i. The equivalent matrix form of (16) can be expressed as

E E^{\top} = S,   (17)


where S has elements

S_{ij} = \frac{\rho_1 + \rho_2}{\rho_3} \Big( s_{ij}^{v} - \frac{\log \rho}{\rho_1 + \rho_2} \Big).   (18)

This implies that we can learn the embeddings by performing a truncated Singular Value Decomposition (SVD) on S. Due to the properties of SVD, the factor (ρ_1 + ρ_2)/ρ_3 merely rescales the resulting embeddings by an overall multiplicative constant, in which we are not interested, so we will henceforth drop it. Thus, the elements of S can be reduced to

S_{ij} = s_{ij}^{v} - \beta,   (19)

where β = log ρ/(ρ_1 + ρ_2) is a parameter.
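As a concrete illustration of Eq. (19), the following sketch factorizes the shifted matrix S − β·1 with a truncated SVD and takes E = U_d Σ_d^{1/2} as the embeddings. It assumes S is a precomputed dense similarity matrix; the function name and the square-root split of the singular values are illustrative conventions rather than choices prescribed by the derivation above.

```python
import numpy as np

def embed_by_factorization(S, beta, d):
    """Rank-d factorization of the shifted matrix S - beta*1 (Eq. (19));
    returns an n x d matrix E such that E E^T approximates S - beta*1."""
    M = S - beta                        # subtract beta from every element
    U, sigma, Vt = np.linalg.svd(M)     # full SVD; adequate for small networks
    E = U[:, :d] * np.sqrt(sigma[:d])   # keep the d leading singular directions
    return E
```

For large networks one would replace the full SVD by a sparse or randomized truncated SVD, but the construction of the embeddings from the leading singular pairs is the same.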
The above analysis provides a general view for network embedding as matrix factorization. First, it demonstrates the relationship between matrix factorization and maximizing the objective function, in the sense that similar nodes have similar embeddings and distant nodes have distant embeddings. Secondly, the choice of the matrix S is flexible, since s^v_ij can be based on any user-defined similarity/distance measure or can be learned from random walks. Thus, our view is not only limited to skip-gram based methods (such as DeepWalk [36], LINE [42], PTE [41], and node2vec [16]), as has already been addressed [37, 49, 57], but is also applicable to a broad range of approaches that factorize various matrices (such as the adjacency matrix [1], the high-order adjacency matrix [58], the modularity matrix [43], graph Laplacians [45], and the Rooted PageRank, Adamic-Adar, and Katz similarity matrices [35]). Thirdly, previous works [24, 37, 49, 57] that present the equivalence between matrix factorization and skip-gram based approaches arrive at a shifted matrix where the shift number indicates the number of negative samples. This number is just used to reduce noise and to replace negative elements of the matrix with zero. However, from our development, we come to see that the shift number β is actually related to the balance parameter of the two sub-objectives. As we will see, the resulting embeddings are sensitive to β and we can refine the embeddings by tuning β.

3 GRA SIMILARITY

A problem is how to define a similarity measure s^v_ij that will produce good embeddings. In recent years, researchers have proposed different measures, which can be classified into two categories based on whether they use the local or the global structure of the network [29]. Local measures have the advantage of being easy to compute. However, they seem insufficient because many node pairs are assigned the same similarity scores. Therefore, we are more interested in global measures. In the following, we review some local and global measures. Then, we propose a new global measure called GRA similarity.

3.1 Local Measure

The simplest local measure is Common Neighbor (CN) similarity, which is based on the assumption that two nodes are similar if they have many common neighbors. Thus, this measure simply counts the number of common neighbors, as

s_{ij}^{cn} = |\Gamma_i \cap \Gamma_j|,   (20)

where Γ_i = {v_j | A_ij > 0} is the set of immediate neighbors of v_i. CN similarity can be regarded as a rudimentary measure between v_i and v_j. It is, however, not entirely satisfactory. It can take large values for nodes with high degree even if only a small fraction of their neighbors are the same, and in many cases this runs contrary to our intuition about what constitutes similarity [23]. Therefore, people commonly normalize it in some way, which leads to variants such as the Cosine (COS), Jaccard (JAC), Hub-Promoted (HUB), and Resource Allocation (RA) similarities [64]. These definitions are shown in the following:

s_{ij}^{jac} = \sum_{v_t \in \Gamma_i \cap \Gamma_j} \frac{1}{|\Gamma_i \cup \Gamma_j|},   (21)

s_{ij}^{cos} = \sum_{v_t \in \Gamma_i \cap \Gamma_j} \frac{1}{\sqrt{|\Gamma_i| |\Gamma_j|}},   (22)

s_{ij}^{hub} = \sum_{v_t \in \Gamma_i \cap \Gamma_j} \frac{1}{\min\{k_i, k_j\}},   (23)

s_{ij}^{ra} = \sum_{v_t \in \Gamma_i \cap \Gamma_j} \frac{1}{k_t},   (24)

where k_i = \sum_{j=1}^{n} A_{ij} denotes the degree of v_i.

The differences among (21)-(24) lie only in the normalization parts (the denominators). Liben-Nowell et al. [26] and Zhou et al. [64] systematically compared these local measures and found that RA similarity has the best performance. Note that RA similarity appends a normalization to depress the contribution of high-degree common neighbors. This normalization can be justified by a simple example. Consider a social network in which v_i and v_j are two friends of v_t. If v_t is an ordinary person, it is quite possible that v_i and v_j are friends with each other. But if v_t is a celebrity, it is much less likely that v_i and v_j know each other. This is because a celebrity has thousands of friends, and thus it is natural to depress its contribution as a "binding agent" between v_i and v_j.

3.2 Global Measure

One of the most famous global measures is Katz similarity. It is based on a weighted summation over the number of paths between two nodes. The weight decays exponentially with the path length, so as to assign higher weights to shorter paths. The mathematical expression of Katz similarity is

s_{ij}^{katz} = \sum_{l=1}^{\infty} \alpha^{l} [A^{l}]_{ij},   (25)

where l denotes the path length, α is a parameter controlling how fast the weight decays with the path length, and [A^l]_ij is exactly equal to the number of length-l paths between v_i and v_j.
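For concreteness, here is a small sketch of the RA similarity of Eq. (24) and a length-truncated approximation of the Katz similarity of Eq. (25). The dense-matrix representation, the truncation length, and the assumption that no node is isolated are simplifications introduced only for illustration.

```python
import numpy as np

def ra_similarity(A):
    """RA similarity (Eq. (24)) for all pairs at once:
    (A D^{-1} A)_ij = sum over common neighbours v_t of 1/k_t."""
    k = A.sum(axis=1)                          # degrees; assumes no isolated nodes
    return A @ np.diag(1.0 / k) @ A

def katz_similarity(A, alpha=0.1, max_len=6):
    """Katz similarity (Eq. (25)) truncated at paths of length max_len."""
    S = np.zeros_like(A, dtype=float)
    term = np.eye(A.shape[0])
    for _ in range(max_len):
        term = alpha * term @ A                # alpha^l A^l after l steps
        S += term
    return S
```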


3.3 Definition of GRA Similarity

Now we propose GRA similarity as an extension of the RA and Katz similarities. Let us first point out that Katz similarity can be viewed as an extension of CN similarity. This is because the local version of Katz similarity based on l = 2 paths is just CN similarity, as can be seen from the following equation:

s_{ij}^{katz} \big|_{l=2} = [A^2]_{ij} = \sum_{v_t \in \Gamma_i \cap \Gamma_j} 1 = s_{ij}^{cn},   (26)

where we have omitted the overall multiplicative factor α².

Our idea is to extend RA similarity to longer paths, in a similar way to how Katz similarity extends CN similarity. The specific definition of GRA similarity follows. Suppose p^l_ij = (v_{i_0}, v_{i_1}, ..., v_{i_l}), where i_0 = i and i_l = j, is a length-l path between v_i and v_j. First, we assume the contribution of p^l_ij to the similarity of v_i and v_j is equal to the reciprocal of the product of the degrees of the intermediate nodes of the path. That is,

c(p_{ij}^{l}) = \frac{k_{i_0} k_{i_l}}{k_{i_0} k_{i_1} \cdots k_{i_l}}.   (27)

Secondly, for each contribution c(p^l_ij), we associate a weight α^l, where again α is a parameter controlling the decay rate. Finally, the similarity is defined as a summation of the weighted contributions over all possible paths:

s_{ij}^{gra} = \sum_{l=1}^{\infty} \sum_{p_{ij}^{l}} \alpha^{l} c(p_{ij}^{l}).   (28)

We can derive that the local version of GRA similarity based on l = 2 paths is RA similarity, as

s_{ij}^{gra} \big|_{l=2} = \sum_{p_{ij}^{l=2}} c(p_{ij}^{l=2}) = \sum_{v_t \in \Gamma_i \cap \Gamma_j} \frac{1}{k_t} = s_{ij}^{ra},   (29)

where again we have omitted the overall multiplicative factor α².

Katz similarity can be defined in the same manner as GRA similarity by considering the contribution of each path as one, as

s_{ij}^{katz} = \sum_{l=1}^{\infty} \alpha^{l} [A^{l}]_{ij} = \sum_{l=1}^{\infty} \sum_{p_{ij}^{l}} \alpha^{l} \cdot 1 = \sum_{l=1}^{\infty} \sum_{p_{ij}^{l}} \alpha^{l} c'(p_{ij}^{l}),   (30)

where c'(p^l_ij) = 1 for every path. Therefore, both Katz and GRA similarities render a high score if there are a large number of paths between two nodes. A difference is that the former accepts equal contributions from the paths, whereas the latter depresses the contribution of the paths that contain high-degree intermediate nodes. Note that a high-degree intermediate node would result in many paths, and thus it is reasonable to depress the contributions of these paths. This is in line with the normalization used by RA similarity.

We can equivalently express (28) in matrix form and define a similarity matrix

S^{gra} = \alpha A + \alpha^2 A D^{-1} A + \alpha^3 A D^{-1} A D^{-1} A + \cdots,   (31)

where D = diag(k_1, k_2, ..., k_n) is the degree diagonal matrix. Multiplying both sides of (31) by D^{-1}, we get

S^{gra} D^{-1} = -I + I + \alpha A D^{-1} + \alpha^2 (A D^{-1})^2 + \cdots   (32)
             = (I - \alpha A D^{-1})^{-1} - I   (33)
             = (I - \alpha A D^{-1})^{-1} \alpha A D^{-1},   (34)

where I is the identity matrix. Finally, we have

S^{gra} = (I - \alpha A D^{-1})^{-1} \alpha A.   (35)

4 THE PROPOSED METHOD

As a special proposal, we can obtain embeddings by factorizing the shifted GRA similarity matrix S^gra − β·1, where 1 denotes a matrix with all elements equal to one.

4.1 Addressing the Parameters

Let us examine the parameters. One parameter is α, which indicates the decay rate in the definition of GRA similarity. To ensure convergence of the series in (32), α should lie in the range 0 < α < 1 (the spectral radius of AD^{-1} is one). We empirically found that larger values of α produce good results. Furthermore, for values of α in the range 0.9 < α < 0.98, the precise value is not critical. Hence, we simply set α = 0.95.

Another parameter is β = log ρ/(ρ_1 + ρ_2). Note that ρ, ρ_1, ρ_2 > 0; hence, theoretically, β can be any real number. It is not easy to select the best β automatically. However, the β values that can produce sensible embeddings are highly related to S^gra, since the elements of the shifted matrix should fall within a significant range.

In practice, we limit the search space of β to a small set SET_β. Specifically, suppose S^gra_{0%}, S^gra_{60%}, S^gra_{80%}, and S^gra_{100%} denote the 0th, 60th, 80th, and 100th percentiles of the elements in S^gra, respectively; SET_β is composed of ten evenly spaced values over (S^gra_{0%}, S^gra_{60%}] and over (S^gra_{60%}, S^gra_{80%}], and ten logarithmically spaced values over (S^gra_{80%}, S^gra_{100%}].
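A sketch of how SET_β could be assembled from the percentiles of S^gra as described above; the use of numpy's percentile, linspace, and logspace, and the assumption that the elements above the 80th percentile are positive (so that logarithmic spacing is well defined), are illustrative choices.

```python
import numpy as np

def build_beta_set(S_gra, n_per_interval=10):
    """Candidate shift values following Section 4.1: ten evenly spaced values
    over (S_0%, S_60%] and (S_60%, S_80%], and ten logarithmically spaced
    values over (S_80%, S_100%]. beta = 0 is also evaluated (index 0 in Figure 1)."""
    p0, p60, p80, p100 = np.percentile(S_gra, [0, 60, 80, 100])
    betas = [0.0]
    betas += list(np.linspace(p0, p60, n_per_interval + 1)[1:])    # drop left endpoint
    betas += list(np.linspace(p60, p80, n_per_interval + 1)[1:])
    betas += list(np.logspace(np.log10(p80), np.log10(p100), n_per_interval + 1)[1:])
    return betas
```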
Figure 1 displays an example of the multilabel classi-
that a high-degree intermediate node would result in many
fication results generated by different β in the air-traffic
paths, and thus it is reasonable to depress the contributions
network of Brazil [39]. The x-axis represents the index of β in
of these paths. This comes down in a continuous line with
SETβ . That is, index=0 represents β = 0; index=1,2,· · · ,10
the normalization method of RA similarity.
represent the ten evenly spaced values over (Sgr a gra
0% , S60% ];
We could equivalently express (28) in matrix form and
index=11,12,· · · ,20 represent the ten evenly spaced values
define a similarity matrix
over (Sgr 60% , S80% ]; index=21,22,· · · ,30 represent the ten log-
a gra

Sgra = αA + α2 AD−1 A + α3 AD−1 AD−1 A + · · · , (31) arithmically spaced values over (Sgr a gra
80% , S100% ]. The y-axis


The y-axis represents the Micro-F1 scores as a quantitative evaluation of the embeddings. We can see that the embeddings are sensitive to β and that setting β = 0 does not produce the best embeddings. Importantly, the quality of the embeddings changes with β in a regular way. Therefore, we can search for the best β from SET_β based on some validation data in a semi-supervised fashion.
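A sketch of this semi-supervised search, assuming the embed_by_factorization helper from the earlier sketch and a single-label classification setting for brevity; the function and argument names are illustrative, not part of the original algorithm description.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

def select_beta(S_gra, betas, d, idx_train, y_train, idx_val, y_val):
    """Pick the beta in SET_beta whose embeddings give the best validation Micro-F1."""
    best_beta, best_f1 = None, -1.0
    for beta in betas:
        E = embed_by_factorization(S_gra, beta, d)        # sketch from Section 2.3
        clf = LogisticRegression(max_iter=1000).fit(E[idx_train], y_train)
        f1 = f1_score(y_val, clf.predict(E[idx_val]), average="micro")
        if f1 > best_f1:
            best_beta, best_f1 = beta, f1
    return best_beta
```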
4.2 Computational Issues

Note that S^gra has the form S^gra = M_g^{-1} M_l, where M_g = I − αAD^{-1} and M_l = αA, as indicated in (35). This implies that when β = 0 we can obtain the embeddings by following the fast generalized SVD algorithm of Ou et al. [35], which avoids calculating the similarity matrix S^gra and only requires a time complexity of nearly O(|E|).

When β ≠ 0, we cannot use the computation-saving shortcut of the generalized SVD algorithm, and thus require more computing power. In this case, the most time-consuming part is the calculation of S^gra, which involves a matrix inversion operation. In practice, the calculation is most simply achieved by direct multiplication. We can rewrite (35) as

S^{gra} D^{-1} = \alpha A D^{-1} (I + S^{gra} D^{-1}).   (36)

Making any guess for an initial value of S^gra D^{-1}, such as S^gra D^{-1} = 0, we iterate this equation repeatedly until it converges. We have empirically found good convergence after 100 iterations or fewer.
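A sketch of the fixed-point iteration of Eq. (36), assuming a dense adjacency matrix with no isolated nodes; the convergence tolerance is an illustrative addition.

```python
import numpy as np

def gra_similarity(A, alpha=0.95, max_iter=100, tol=1e-8):
    """Compute S^gra by iterating Eq. (36): X <- alpha A D^{-1} (I + X),
    where X = S^gra D^{-1}, starting from X = 0."""
    n = A.shape[0]
    degrees = A.sum(axis=1)                      # assumes no isolated nodes
    P = alpha * A @ np.diag(1.0 / degrees)       # alpha A D^{-1}
    X = np.zeros((n, n))
    for _ in range(max_iter):
        X_new = P @ (np.eye(n) + X)
        if np.max(np.abs(X_new - X)) < tol:
            X = X_new
            break
        X = X_new
    return X @ np.diag(degrees)                  # S^gra = X D
```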
5 EXPERIMENTS

In this section, we conduct experiments to answer the following questions:
Q1. Is GRA a good measure for producing embeddings?
Q2. Does tuning β help improve the embeddings?
Q3. Is the proposed method better than existing matrix factorization related approaches?

Experiment Setup. To answer these questions, we run experiments on two embedding-enabled tasks: multilabel classification [36] and link prediction [16]. We uniformly set the embedding dimension to 120 for all the experiments.

Multilabel classification aims to predict the correct node labels. To be specific, we randomly sample a portion of the labeled nodes for training, with the rest for testing. Then, we use the learned embeddings and the corresponding labels of the training nodes to train a one-vs-all logistic regression (LR) classifier. Feeding the embeddings of the testing nodes to the classifier, we predict their labels, which are compared to the true labels for evaluation. We repeat this procedure 10 times and evaluate the performance in terms of the Macro-F1 and Micro-F1 scores. Due to space limitations, we report only Micro-F1, since we observe similar behavior with Macro-F1.

In the link prediction task, we are given a network G′ with 50% of the edges removed from the original network G, and we aim to predict the missing edges (i.e., the 50% removed edges). Specifically, based on the node embeddings learned from G′, we generate edge embeddings for pairs of nodes using the Hadamard operator:

[e_{ij}]_t = [e_i]_t \cdot [e_j]_t,   (37)

where t ∈ {1, ..., d} denotes the subscript of the t-th element of an embedding. (We omit several element-wise operators and only use the Hadamard operator to generate the edge embeddings. One reason is that all of the methods considered here express the node similarity in a (varied) inner product form that conforms to the Hadamard operator. Another reason is that [16] shows that the Hadamard operator is superior to the others when used with node2vec and DeepWalk.) We label an edge embedding as positive if the corresponding edge exists in G′ and negative otherwise. Then, we train a binary LR classifier using all of the edge embeddings that have positive labels and the same number of randomly sampled edge embeddings that have negative labels. After that, by feeding an edge embedding to the LR classifier, we can calculate the existence probability of the corresponding edge and perform link prediction. We evaluate the performance based on the probabilities of the missing edges and of non-existent edges (i.e., edges that do not exist in G) in terms of the Area Under the Curve (AUC).
Datasets. We use ten real-world network datasets, which come from various domains and are commonly used by other researchers. A brief description of these datasets follows.
• BrazilAir [39], EuropeAir [39], USAir [39]: The air-traffic networks of Brazil, Europe, and the USA, respectively. The nodes indicate airports and the edges denote the existence of commercial flights. The labels represent the capacity levels of the airports.
• Cora [58], Citeseer [22], DBLP [41]: Paper citation networks. The labels represent the topics of the papers.
• WikiPage [49]: A network of webpages in Wikipedia, with edges indicating hyperlinks. The labels represent the topic categories of the webpages.
• WikiWord [16]: A co-occurrence network of the words appearing in Wikipedia. The labels represent the part-of-speech tags inferred using the Stanford POS-Tagger.
• PPI [16]: A part of the protein-protein interaction network for Homo sapiens. The labels represent the biological states.
• Flickr [43]: A network of the contacts between users in Flickr. The labels represent the interest groups of the users.

We remove self-loop edges and transform bi-directional edges to undirected edges for each network. The statistics of the datasets after pre-processing are summarized in Table 1.

Table 1: Statistics of the datasets: number of nodes |V|; number of edges |E|; number of labels |L|.

Dataset     |V|      |E|        |L|
BrazilAir   131      1,003      4
EuropeAir   399      5,993      4
USAir       1,190    13,599     4
Cora        2,708    5,278      7
Citeseer    3,264    4,551      6
DBLP        13,184   47,937     5
WikiPage    2,363    11,596     17
WikiWord    4,777    92,295     40
PPI         3,860    37,845     50
Flickr      80,513   5,899,882  195


Table 2: Multilabel classification results for different similarities in terms of Micro-F1 scores (columns: 10%, 30%, 50%, 70%, 90% labeled nodes; figures in parentheses are the relative improvement of the β-tuned variant over its untuned counterpart).

EuropeAir
GRA      0.3666  0.4480  0.4668  0.4925  0.5550
Katz     0.3844  0.4599  0.4985  0.5042  0.5625
SimRank  0.3014  0.3670  0.3895  0.4125  0.4875
LHN      0.3819  0.3867  0.3944  0.4120  0.4325
GRA-β    0.5384 (46.9%)  0.5695 (27.1%)  0.5945 (27.3%)  0.6000 (21.8%)  0.6725 (21.2%)
Katz-β   0.5209 (35.5%)  0.5495 (19.5%)  0.5874 (17.8%)  0.5780 (15.0%)  0.6700 (19.1%)

USAir
GRA      0.5139  0.5847  0.6071  0.6336  0.6597
Katz     0.5020  0.5805  0.5876  0.5949  0.5966
SimRank  0.5262  0.5808  0.6019  0.6347  0.6366
LHN      0.4691  0.4917  0.4944  0.4983  0.5118
GRA-β    0.6360 (23.8%)  0.6793 (16.2%)  0.7003 (15.4%)  0.7151 (12.9%)  0.7387 (12.0%)
Katz-β   0.6044 (20.4%)  0.6483 (11.7%)  0.6659 (13.3%)  0.6748 (13.4%)  0.7193 (20.6%)

Cora
GRA      0.7788  0.8197  0.8352  0.8442  0.8638
Katz     0.7225  0.7859  0.8036  0.8121  0.8284
SimRank  0.7672  0.8036  0.8169  0.8241  0.8376
LHN      0.6012  0.6611  0.6749  0.6819  0.6996
GRA-β    0.7846 (0.7%)  0.8236 (0.6%)  0.8418 (0.8%)  0.8525 (1.0%)  0.8731 (1.1%)
Katz-β   0.7321 (1.3%)  0.7964 (1.3%)  0.8096 (0.8%)  0.8221 (1.2%)  0.8424 (1.7%)

CiteSeer
GRA      0.5622  0.6040  0.6184  0.6272  0.6245
Katz     0.5018  0.5458  0.5585  0.5706  0.5743
SimRank  0.5538  0.5826  0.5905  0.6014  0.6046
LHN      0.3758  0.4207  0.4380  0.4439  0.4617
GRA-β    0.5688 (1.2%)  0.6204 (2.7%)  0.6362 (2.9%)  0.6508 (3.8%)  0.6610 (5.8%)
Katz-β   0.5231 (4.2%)  0.5676 (4.0%)  0.5796 (3.8%)  0.5909 (3.6%)  0.6092 (6.1%)

Comparing with Different Similarities. We compare the embeddings generated by different similarities (setting β = 0). In addition to Katz, we consider the SimRank [21] and LHN [23] similarities as baselines. Both SimRank and LHN are based on the concept that two nodes are similar if they are connected to similar nodes. All of the similarities here are global measures and are widely used. Table 2 reports the multilabel classification results in four of the networks. (Because SimRank and LHN are computationally expensive, the results are limited to small and medium networks.) GRA achieves the best overall performance. We attribute this to the proper definition of GRA similarity, which depresses the contribution of the paths that contain high-degree intermediate nodes.

Improving Embeddings by Tuning β. Table 2 also compares the multilabel classification results with and without tuning β. GRA-β and Katz-β indicate the results obtained by tuning β based on 10% ground-truth validation data in a semi-supervised fashion for each network and each task (this is the same way node2vec [16] tunes its in-out and return hyperparameters). The figures in the parentheses show the relative improvement after tuning β. We can see that by tuning β we achieve a significant performance gain. For example, the gain for GRA-β is as high as 46.9% in the EuropeAir network when the prediction is made based on 10% labeled nodes. The tuning strategy works not only for GRA similarity but also for other similarities such as Katz. This is because β is closely related to the parameter ρ that balances the two sub-objectives for bringing together the embeddings of similar nodes and separating the embeddings of distant nodes.

Comparing with Existing Approaches. Next, we compare GRA-β with DeepWalk [36], node2vec [16], HOPE [35], GraRep [4], and LINE [42], which are representatives of matrix factorization related approaches.

The parameter settings for these approaches are the same as in the original literature. Specifically, for DeepWalk and node2vec, we set the window size to 10, the walk length to 80, and the number of walks per node to 10. For HOPE, we set the decay rate to 0.95 divided by the spectral radius of A. For LINE, we set the number of negative samples to 5. For GraRep, we set the maximum transition step to 6. Lastly, for node2vec, we obtain the best in-out and return hyperparameters based on a grid search over {0.25, 0.50, 1, 2, 4}.

Figure 2 displays the results for multilabel classification. Table 3 lists the results for link prediction. (The results for GraRep in the Flickr network are not reported due to a scalability issue.) GRA-β demonstrates the best performance over the baselines. For the multilabel classification task, it is markedly superior to the others in eight out of the ten networks. In particular, it outperforms the baselines by a considerable margin in networks such as BrazilAir, EuropeAir, USAir, Cora, Citeseer, WikiPage, and PPI. For the link prediction task, it obtains the best scores in all of the networks. In particular, it outperforms the opponents by up to 13.8%, 25.9%, 47.1%, and 37.7% in the Cora, Citeseer, WikiWord, and PPI networks, respectively.


[Figure 2: Multilabel classification results for different methods (HOPE, LINE, GRA-β, GraRep, node2vec, DeepWalk) on (a) BrazilAir, (b) EuropeAir, (c) USAir, (d) Cora, (e) Citeseer, (f) DBLP, (g) WikiPage, (h) WikiWord, (i) PPI, and (j) Flickr. The x-axis represents the ratio of nodes with known labels. The y-axis represents the Micro-F1 scores.]

6 RELATED WORK

The study of network embedding dates back to the early 2000s. In this period, researchers attempted to keep connected nodes closer to each other in the embedding space and developed methods such as LLE [40], IsoMap [46], and Laplacian Eigenmap [2]. In the past decade, thanks to a few pioneering works such as SocDim [43, 45], DeepWalk [36], and LINE [42], this line of study has attracted a great deal of interest. Researchers have proposed various methods such as matrix factorization [4, 35, 58] and deep neural networks [5, 7, 12, 17, 22, 51, 52, 56]. The topics are also varied, including semi-supervised network embedding [22, 49, 51, 59], community preserving embedding [6, 10, 55], network reconstruction based on embedding [28, 35, 51], and embedding in different types of networks, such as heterogeneous networks [7, 13, 20, 41, 53, 61], multi-relational networks [33, 38], signed networks [9, 53, 54], dynamic networks [25, 32, 63], scale-free networks [14], hyper-networks [50], and attributed networks [19, 25, 48, 54, 57].

Our work is closely related to similarity matrix factorization. Previous studies have shown the effectiveness of factorizing different matrices such as the adjacency matrix [1], the high-order adjacency matrix [58], the modularity matrix [43], graph Laplacians [37, 45], and the Katz similarity matrix [35]. These matrices can more or less be viewed as similarity matrices. However, none of the authors studied why their methods work, except Hamilton et al., who recently explained the effectiveness of similarity matrix factorization based on a unified encoder-decoder framework [18].

Researchers have shown that DeepWalk [36] is related to matrix factorization and derived the closed form of the matrix [49, 57]. Further, Qiu et al. proved that some other skip-gram based methods (DeepWalk [36], LINE [42], PTE [41], and node2vec [16]) are also equivalent to matrix factorization [37]. However, there are significant differences between their work and ours. Firstly, their work only involves the skip-gram based methods, while our objective function provides a more general framework that also unifies several other methods such as HOPE, Cauchy Graph Embedding, GraRep, and Laplacian Eigenmap. Secondly, the proofs are totally different. They separately derive the matrices that are being factorized, whereas our proof works from another angle, namely the relation between the skip-gram models and the similarity matrices. Thirdly, the aims are different. They focus on the exact mathematical form of the matrices that DeepWalk, LINE, PTE, and node2vec factorize, while we center on how to improve the similarity matrix factorization method.

Besides, Chen et al. introduced the GEM-D framework [8] and Hamilton et al. put forward an encoder-decoder framework [18] for network embedding. The two frameworks take a high-level point of view, decomposing a method into several components. With an appropriate choice of each component, many methods including similarity matrix factorization can be unified. On the other hand, the framework proposed in our paper takes a more specific point of view, designing an objective that involves node similarities. With an appropriate choice of the similarity function and the adjusting functions, our objective also connects several methods.


Table 3: Link prediction results for different methods in terms of AUC scores.

Dataset     HOPE    GRA     GRA-β   GraRep  node2vec  DeepWalk  LINE
BrazilAir   0.8472  0.8977  0.9117  0.8555  0.7505    0.7025    0.5215
EuropeAir   0.8895  0.9001  0.9157  0.9083  0.7387    0.7004    0.6905
USAir       0.9366  0.9527  0.9621  0.9434  0.8295    0.8045    0.8372
Cora        0.7018  0.7294  0.7707  0.6775  0.7372    0.7306    0.6779
Citeseer    0.6601  0.6255  0.7047  0.5597  0.6315    0.6130    0.5712
DBLP        0.9077  0.9237  0.9311  0.9211  0.9227    0.9176    0.8680
WikiPage    0.8839  0.9066  0.9180  0.8836  0.8568    0.8555    0.8495
WikiWord    0.8839  0.8036  0.9127  0.8910  0.6690    0.6205    0.6433
PPI         0.8635  0.8704  0.9024  0.8546  0.6796    0.6554    0.6980
Flickr      0.9286  0.9337  0.9563  ——      0.8686    0.8620    0.8455

Similarity information has been utilized for network embedding. struc2vec [39] measures node similarities at different scales and obtains embeddings from structural identity. SNS [31] uses both neighbor information and local subgraph similarity to learn embeddings. AA+Emb [27] employs node similarities instead of random walk simulation. VERSE [47] derives embeddings by sampling similarity information. However, these methods are not related to matrix factorization.

Another relevant work is [24], which is the first to demonstrate that the word embedding method based on the skip-gram model [34] implicitly factorizes a shifted word-context mutual information matrix. However, the shift number, which indicates the number of negative samples, is a global constant, and thus it is not tuned for improving the embeddings.

To the best of our knowledge, we are the first to 1) show the general form of factorizing a shifted similarity matrix for network embedding; 2) prove that this general form is equivalent to maximizing two objectives, one for bringing together the embeddings of similar nodes and the other for separating the embeddings of distant nodes; and 3) improve the embeddings by tuning the shift number, which is related to a parameter that balances the two objectives.

7 CONCLUSIONS

We proposed a general view that demonstrates the relationship between network embedding and matrix factorization. It is general because its scope is not limited to skip-gram based approaches but also covers a broad range of methods that factorize various matrices. We also proposed a network embedding method based on factorizing the GRA similarity matrix, together with a parameter tuning strategy. Experiments showed that our method significantly outperforms the state-of-the-art matrix factorization approaches.

The proposed method can be extended to different types of networks, such as directed networks, heterogeneous networks, and uncertain networks. The key point is to define proper similarity measures. Take the directed network as an example. If we choose a directed similarity measure s^v_{i→j}, our definitions (1)-(4) still hold, and thus we can factorize a shifted asymmetric matrix for embedding. We will extend our method to these networks in the future.

ACKNOWLEDGMENTS

This paper is based on results obtained from a project commissioned by the New Energy and Industrial Technology Development Organization (NEDO). Tsuyoshi Murata would like to thank JSPS Grant-in-Aid for Scientific Research (B) (Grant Number 17H01785) and JST CREST (Grant Number JPMJCR1687).

REFERENCES
[1] Amr Ahmed, Nino Shervashidze, Shravan Narayanamurthy, Vanja Josifovski, and Alexander J Smola. 2013. Distributed large-scale natural graph factorization. In Proceedings of WWW. 37–48.
[2] Mikhail Belkin and Partha Niyogi. 2002. Laplacian eigenmaps and spectral techniques for embedding and clustering. In Advances in Neural Information Processing Systems, Vol. 14. 585–591.
[3] Hongyun Cai, Vincent W Zheng, and Kevin Chen-Chuan Chang. 2018. A comprehensive survey of graph embedding: problems, techniques and applications. IEEE Transactions on Knowledge and Data Engineering 30, 9 (2018), 1616–1637.
[4] Shaosheng Cao, Wei Lu, and Qiongkai Xu. 2015. GraRep: Learning graph representations with global structural information. In Proceedings of CIKM. 891–900.
[5] Shaosheng Cao, Wei Lu, and Qiongkai Xu. 2016. Deep neural networks for learning graph representations. In Proceedings of AAAI. 1145–1152.
[6] Sandro Cavallari, Vincent W. Zheng, Hongyun Cai, Kevin Chen-Chuan Chang, and Erik Cambria. 2017. Learning community embedding with community detection and node embedding on graphs. In Proceedings of CIKM. 377–386.
[7] Shiyu Chang, Wei Han, Jiliang Tang, Guo-Jun Qi, Charu C Aggarwal, and Thomas S Huang. 2015. Heterogeneous network embedding via deep architectures. In Proceedings of KDD. 119–128.
[8] Siheng Chen, Sufeng Niu, Leman Akoglu, Jelena Kovačević, and Christos Faloutsos. 2017. Fast, warped graph embedding: Unifying framework and one-click algorithm. arXiv preprint arXiv:1702.05764 (2017).
[9] Kewei Cheng, Jundong Li, and Huan Liu. 2017. Unsupervised feature selection in signed social networks. In Proceedings of KDD. 777–786.
[10] Jun Jin Choong, Xin Liu, and Tsuyoshi Murata. 2018. Learning community structure with variational autoencoder. In Proceedings of ICDM. 69–78.
[11] Peng Cui, Xiao Wang, Jian Pei, and Wenwu Zhu. 2018. A survey on network embedding. IEEE Transactions on Knowledge and Data Engineering (2018).
[12] Quanyu Dai, Qiang Li, Jian Tang, and Dan Wang. 2018. Adversarial network embedding. In Proceedings of AAAI. 2167–2174.
[13] Yuxiao Dong, Nitesh V Chawla, and Ananthram Swami. 2017. metapath2vec: Scalable representation learning for heterogeneous networks. In Proceedings of KDD. 135–144.


[14] Rui Feng, Yang Yang, Wenjie Hu, Fei Wu, and Yueting Zhuang. 2018. Representation learning for scale-free networks. In Proceedings of AAAI. 282–289.
[15] Palash Goyal and Emilio Ferrara. 2018. Graph embedding techniques, applications, and performance: A survey. Knowledge-Based Systems 151 (2018), 78–94.
[16] Aditya Grover and Jure Leskovec. 2016. node2vec: Scalable feature learning for networks. In Proceedings of KDD. 855–864.
[17] William L Hamilton, Rex Ying, and Jure Leskovec. 2017. Inductive representation learning on large graphs. In Proceedings of NIPS. 1025–1035.
[18] William L Hamilton, Rex Ying, and Jure Leskovec. 2017. Representation learning on graphs: methods and applications. IEEE Data Engineering Bulletin 40 (2017), 52–74.
[19] Xiao Huang, Jundong Li, and Xia Hu. 2017. Label informed attributed network embedding. In Proceedings of WSDM. 731–739.
[20] Yann Jacob, Ludovic Denoyer, and Patrick Gallinari. 2014. Learning latent representations of nodes for classifying in heterogeneous social networks. In Proceedings of WSDM. 373–382.
[21] Glen Jeh and Jennifer Widom. 2002. SimRank: a measure of structural-context similarity. In Proceedings of KDD. 538–543.
[22] Thomas N Kipf and Max Welling. 2017. Semi-supervised classification with graph convolutional networks. In Proceedings of ICLR.
[23] Elizabeth A Leicht, Petter Holme, and M. E. J. Newman. 2006. Vertex similarity in networks. Phys. Rev. E 73, 2 (2006), 026120.
[24] Omer Levy and Yoav Goldberg. 2014. Neural word embedding as implicit matrix factorization. In Proceedings of NIPS. 2177–2185.
[25] Jundong Li, Harsh Dani, Xia Hu, Jiliang Tang, Yi Chang, and Huan Liu. 2017. Attributed network embedding for learning in a dynamic environment. In Proceedings of CIKM. 387–396.
[26] David Liben-Nowell and Jon Kleinberg. 2007. The link-prediction problem for social networks. Journal of the Association for Information Science and Technology 58, 7 (2007), 1019–1031.
[27] Xin Liu, Natthawut Kertkeidkachorn, Tsuyoshi Murata, Kyoung-Sook Kim, Julien Leblay, and Steven Lynden. 2018. Network embedding based on a quasi-local similarity measure. In Proceedings of PRICAI. 429–440.
[28] Xin Liu, Tsuyoshi Murata, and Kyoung-Sook Kim. 2018. Measuring graph reconstruction precisions: how well do embeddings preserve the graph proximity structure? In Proceedings of WIMS. 25:1–4.
[29] L. Lü and T. Zhou. 2011. Link prediction in complex networks: a survey. Physica A 390 (2011), 1150–1170.
[30] Dijun Luo, Chris Ding, Feiping Nie, and Heng Huang. 2011. Cauchy graph embedding. In Proceedings of ICML. 553–560.
[31] Tianshu Lyu, Yuan Zhang, and Yan Zhang. 2017. Enhancing the network embedding quality with structural similarity. In Proceedings of CIKM. 147–156.
[32] Jianxin Ma, Peng Cui, and Wenwu Zhu. 2018. DepthLGP: learning embeddings of out-of-sample nodes in dynamic networks. In Proceedings of AAAI. 370–377.
[33] Yao Ma, Zhaochun Ren, Ziheng Jiang, Jiliang Tang, and Dawei Yin. 2018. Multi-dimensional network embedding with hierarchical structure. In Proceedings of WSDM. 387–395.
[34] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Distributed representations of words and phrases and their compositionality. In Proceedings of NIPS. 3111–3119.
[35] Mingdong Ou, Peng Cui, Jian Pei, Ziwei Zhang, and Wenwu Zhu. 2016. Asymmetric transitivity preserving graph embedding. In Proceedings of KDD. 1105–1114.
[36] Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. 2014. DeepWalk: Online learning of social representations. In Proceedings of KDD. 701–710.
[37] Jiezhong Qiu, Yuxiao Dong, Hao Ma, Jian Li, Kuansan Wang, and Jie Tang. 2018. Network embedding as matrix factorization: unifying DeepWalk, LINE, PTE, and node2vec. In Proceedings of WSDM. 459–467.
[38] Meng Qu, Jian Tang, Jingbo Shang, Xiang Ren, Ming Zhang, and Jiawei Han. 2017. An attention-based collaboration framework for multi-view network representation learning. In Proceedings of CIKM. 1767–1776.
[39] Leonardo F. R. Ribeiro, Pedro H. P. Saverese, and Daniel R. Figueiredo. 2017. struc2vec: Learning node representations from structural identity. In Proceedings of KDD. 385–394.
[40] Sam T Roweis and Lawrence K Saul. 2000. Nonlinear dimensionality reduction by locally linear embedding. Science 290, 5500 (2000), 2323–2326.
[41] Jian Tang, Meng Qu, and Qiaozhu Mei. 2015. PTE: Predictive text embedding through large-scale heterogeneous text networks. In Proceedings of KDD. 1165–1174.
[42] Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. 2015. LINE: Large-scale information network embedding. In Proceedings of WWW. 1067–1077.
[43] Lei Tang and Huan Liu. 2009. Relational learning via latent social dimensions. In Proceedings of KDD. 817–826.
[44] Lei Tang and Huan Liu. 2009. Scalable learning of collective behavior based on sparse social dimensions. In Proceedings of CIKM. 1107–1116.
[45] Lei Tang and Huan Liu. 2011. Leveraging social media networks for classification. Data Min. Knowl. Discov. 23, 3 (2011), 447–478.
[46] Joshua B Tenenbaum, Vin De Silva, and John C Langford. 2000. A global geometric framework for nonlinear dimensionality reduction. Science 290, 5500 (2000), 2319–2323.
[47] Anton Tsitsulin, Davide Mottin, Panagiotis Karras, and Emmanuel Müller. 2018. VERSE: versatile graph embeddings from similarity measures. In Proceedings of WWW. 539–548.
[48] Cunchao Tu, Han Liu, Zhiyuan Liu, and Maosong Sun. 2017. CANE: Context-aware network embedding for relation modeling. In Proceedings of ACL. 1722–1731.
[49] Cunchao Tu, Weicheng Zhang, Zhiyuan Liu, and Maosong Sun. 2016. Max-margin DeepWalk: discriminative learning of network representation. In Proceedings of IJCAI. 3889–3895.
[50] Ke Tu, Peng Cui, Xiao Wang, Fei Wang, and Wenwu Zhu. 2018. Structural deep embedding for hyper-networks. In Proceedings of AAAI. 426–433.
[51] Daixin Wang, Peng Cui, and Wenwu Zhu. 2016. Structural deep network embedding. In Proceedings of KDD. 1225–1234.
[52] Hongwei Wang, Jia Wang, Jialin Wang, Miao Zhao, Weinan Zhang, Fuzheng Zhang, Xing Xie, and Minyi Guo. 2018. GraphGAN: graph representation learning with generative adversarial nets. In Proceedings of AAAI. 2508–2515.
[53] Hongwei Wang, Fuzheng Zhang, Min Hou, Xing Xie, Minyi Guo, and Qi Liu. 2018. SHINE: Signed heterogeneous information network embedding for sentiment link prediction. In Proceedings of WSDM. 592–600.
[54] Suhang Wang, Charu Aggarwal, Jiliang Tang, and Huan Liu. 2017. Attributed signed network embedding. In Proceedings of CIKM. 137–146.
[55] Xiao Wang, Peng Cui, Jing Wang, Jian Pei, Wenwu Zhu, and Shiqiang Yang. 2017. Community preserving network embedding. In Proceedings of AAAI. 203–209.
[56] Linchuan Xu, Xiaokai Wei, Jiannong Cao, and Philip S Yu. 2018. On exploring semantic meanings of links for embedding social networks. In Proceedings of WWW. 479–488.
[57] Cheng Yang, Zhiyuan Liu, Deli Zhao, Maosong Sun, and Edward Chang. 2015. Network representation learning with rich text information. In Proceedings of IJCAI. 2111–2117.
[58] Cheng Yang, Maosong Sun, Zhiyuan Liu, and Cunchao Tu. 2017. Fast network embedding enhancement via high order proximity approximation. In Proceedings of IJCAI. 3894–3900.
[59] Zhilin Yang, William Cohen, and Ruslan Salakhutdinov. 2016. Revisiting semi-supervised learning with graph embeddings. In Proceedings of ICML. 40–48.
[60] Daokun Zhang, Jie Yin, Xingquan Zhu, and Chengqi Zhang. 2018. Network representation learning: a survey. IEEE Transactions on Big Data (2018).
[61] Jiawei Zhang, Congying Xia, Chenwei Zhang, Limeng Cui, Yanjie Fu, and Philip S Yu. 2017. BL-MNE: emerging heterogeneous social network embedding through broad learning with aligned autoencoder. In Proceedings of ICDM. 605–614.
[62] Yuan Zhang, Tianshu Lyu, and Yan Zhang. 2018. COSINE: community-preserving social network embedding from information diffusion cascades. In Proceedings of AAAI. 2620–2627.
[63] Ziwei Zhang, Peng Cui, Jian Pei, Xiao Wang, and Wenwu Zhu. 2018. TIMERS: error-bounded SVD restart on dynamic networks. In Proceedings of AAAI. 224–231.
[64] Tao Zhou, Linyuan Lü, and Yi-Cheng Zhang. 2009. Predicting missing links via local information. Eur. Phys. J. B 71, 4 (2009), 623–630.

