Cross-Network Embedding For Multi-Network Alignment
Xiaokai Chu1,2, Xinxin Fan1, Di Yao1,2, Zhihua Zhu1,2, Jianhui Huang1, Jingping Bi1,∗
1Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
2University of Chinese Academy of Sciences, China
{chuxiaokai,fanxinxin,yaodi,zhuzhihua,huangjianhui,bjp}@ict.ac.cn
ABSTRACT
Recently, data mining that analyzes the complex structure and diverse relationships of multiple networks has attracted much attention in both academia and industry. One crucial prerequisite for this kind of multi-network mining is to map the nodes across different networks, i.e., so-called network alignment. In this paper, we propose a cross-network embedding method, CrossMNA, which addresses the multi-network alignment problem using structural information only. Unlike previous methods, which focus on pair-wise learning and hold the topology consistency assumption, CrossMNA targets multi-network scenarios that involve at least two networks with diverse structures. CrossMNA leverages cross-network information to refine two types of node embedding vectors: an inter-vector for network alignment and an intra-vector for other downstream network analysis tasks. Finally, we verify the effectiveness and efficiency of our method on several real-world datasets. Extensive experiments show that CrossMNA significantly outperforms existing baseline methods on the multi-network alignment task, and also achieves better performance on the link prediction task with less memory usage.

CCS CONCEPTS
• Computing methodologies → Learning latent representations; • Information systems → Data mining.

KEYWORDS
multi-network alignment, network embedding, node representation, network mining

ACM Reference Format:
Xiaokai Chu, Xinxin Fan, Di Yao, Zhihua Zhu, Jianhui Huang, and Jingping Bi. 2019. Cross-Network Embedding for Multi-Network Alignment. In Proceedings of the 2019 World Wide Web Conference (WWW '19), May 13–17, 2019, San Francisco, CA, USA. ACM, New York, NY, USA, 11 pages. https://doi.org/10.1145/3308558.3313499

1 INTRODUCTION
The unprecedented growth of diverse information has produced a large volume of networks, such as social networks, citation networks, and biological networks. Nowadays, people usually participate in multiple networks: on the one hand, each network depicts the topological structure of all participants under some particular relationship; on the other hand, all the networks can be related through the same participants/nodes. In recent years, many approaches have been proposed to mine the potential information inherent in these related networks, such as cross-network recommendation [42], mutual community detection [39], and genetic disease classification [33]. Although related networks can share the same participants, the participants are mostly isolated in the different networks, without any known connections among them. Therefore, a crucial prerequisite for multi-network mining is to map the nodes/participants among these related networks, i.e., so-called network alignment. In the field of network alignment, the participants shared among the networks are defined as anchor nodes, as they act like anchors aligning the networks they participate in, and the relationships among anchor nodes across networks are called anchor links [12]. In many cases, a few anchor links are known beforehand; for example, some users in Foursquare may list their corresponding Twitter accounts, but most correspondences are unknown. Network alignment therefore aims at inferring these unknown or potential anchor links among the networks.

In recent years, many studies have been proposed to handle the network alignment problem [13, 16, 37, 40, 43, 44]. However, several issues still need further attention. First, most existing works only consider two-network scenarios or perform pair-wise learning in multi-network applications. In the real world, however, a network is usually related to multiple networks, so pair-wise learning methods can ignore much valuable complementary information across the networks. Second, many previous works hold the assumption of topology consistency, which states that a node tends to have a consistent connectivity structure across different networks. Although the same node may show some similar features across networks, differences in network semantics can lead to quite diverse local structures for that node in each network, and this assumption is easily violated in many applications [16, 23]. These methods can thereby produce misleading alignments in multi-network scenarios. Third, most previous works rely heavily on attributes, e.g., username, gender, or other profile information. However, attribute information is usually incomplete and unreliable [16] or unavailable [1], so attribute-based methods are not applicable in many realistic scenarios.

Keeping these problems in mind, we study the network alignment problem in a multi-network scenario wherein the number of networks is at least two, no attribute information is available, and the topology of each network can be diverse. Taking into account the effectiveness of preserving network structures and the efficiency of calculation in continuous vector space, we propose a novel cross-network embedding method for multi-network alignment, namely CrossMNA.

∗Jingping Bi ([email protected]) is the corresponding author.
CrossMNA integrates cross-network information to refine much more powerful embedding vectors for alignment tasks, and can also be effectively applied to other downstream multi-network analysis tasks, such as link prediction. Compared to previous embedding-based network alignment methods [10, 15, 45] or multi-network embedding methods [19, 35, 38], CrossMNA also dramatically reduces the space overhead, especially in large-scale multi-network scenarios.

In CrossMNA, an additional vector named the network vector is proposed to extract the semantic meaning of each network, which reflects the differences in global structure among the networks. Moreover, two kinds of embedding vectors are refined for each node: (i) the inter-vector, which reflects the common features of the anchor nodes in different networks and is shared among the known anchor nodes; and (ii) the intra-vector, which preserves the specific structural features of a node in its selected network and is generated through the combination of the network vector and the inter-vector. This combination strategy saves much space overhead without sacrificing performance. We also propose a transformation matrix to align these vectors across different dimensions. We summarize our contributions as follows:

(i) We propose a novel cross-network embedding approach, CrossMNA, to deal with network alignment in a more general scenario, where the number of networks is at least two, no attribute information is available, and the structure of each network can be quite diverse.

(ii) CrossMNA reduces the physical memory overhead for model storage to a large extent compared to other embedding-based network alignment methods or multi-network embedding models, which makes CrossMNA appropriate for today's large-scale multi-network applications.

(iii) Extensive experiments show that CrossMNA significantly outperforms existing network alignment methods, achieving a 10% to 20% improvement on the Twitter dataset and 10% on the arXiv dataset. Furthermore, CrossMNA achieves a 5% improvement on the link prediction task on the Twitter dataset compared to existing multi-network embedding methods.

We organize the remainder of our work as follows. Section 2 states the problem of multi-network alignment: we investigate several inherent weaknesses in previous works and lay out the challenges of multi-network alignment. We formally describe CrossMNA in Section 3, validate our approach through extensive experiments in Section 4, present related work in Section 5, and conclude in Section 6.

2 PROBLEM STATEMENT
As a crucial prerequisite for many cross-network applications, network alignment aims to establish the node correspondences across different networks. Therefore, in recent years, network alignment has become a hot spot in both academia and industry. However, there still exist some problems in currently existing works:

(i) Topology Consistency. Many previous works [2, 3, 10, 13, 14, 29, 41, 45] hold the assumption of topology consistency, namely that the same node has a consistent connectivity structure across different networks. For example, NetAlign [3] utilizes max-product belief propagation based on the network topology. REGAL [10] proposes an embedding-based method built on the assumption that nodes with similar structural connectivity or degrees have a high probability of being aligned. Although one node may share some similar features across related networks, its local structural connections can be entirely different in each network due to the distinctiveness of network semantics, such as the types of interaction between proteins in bioinformatics. Some works have shown that topology consistency is easily violated in many multi-network scenarios. For example, in bioinformatics, different types of genetic interactions construct diverse network structures [23]. A similar example can be found in online social network sites [16], such as Twitter and Facebook, where users' behavior may be divergent and platform dependent, making different social platforms show various connection relationships. Thereby, previous methods may produce sub-optimal or even misleading alignments in these scenarios.

(ii) Pair-Wise Learning. Most previous works only consider two-network scenarios or perform pair-wise learning in the multi-network scenario. However, if we jointly consider all related networks, we can obtain much more useful information to benefit node matching. Figure 1 illustrates a toy example with three related networks. If we only consider each pair of networks, it is hard to infer that node v1 in network G3 is the counterpart of node v1 in networks G1 and G2. However, if we consider the three networks together, this alignment can easily be found.

(iii) Attribute Dependence. Many previous works utilize attribute information, like username or gender in social networks, to directly match nodes in different networks [26] or to guide the model in learning structural information [44]. However, on the one hand, users may deliberately hide certain pieces of personal information or provide false data on their selected networks [16]. On the other hand, network data available for research is usually anonymized by the service providers for privacy concerns, with the attribute information removed or replaced by meaningless unique identifiers [41]. Therefore, methods based on attribute information are not general enough for many applications.

Considering the issues above, we intend to study the network alignment problem in a more general scenario wherein the number of networks is at least two, only the topological structure is available, and the networks may have different structures. Beyond the importance and novelty of this problem, we identify the following challenges:

• Semantics diversity. The diversity in network semantics leads to different interactional behaviors of the same node in each network, which can affect the accuracy of node matching. Moreover, the more networks taken into consideration, the more complicated the problem becomes. Therefore, how to alleviate the impact of diverse network semantics on node matching is a big challenge.
Figure 1: An illustration of a multi-network scenario. CrossMNA learns two types of embedding vectors, i.e., a network vector for each network and an inter-vector for each node, the latter being used for node matching. The intra-vector, which can be adopted for other downstream tasks, is a combination of these two types of embedding vectors.
• Data imbalance. The data imbalance has two aspects. First, the size of each network may vary considerably, including the numbers of nodes and edges. Second, the number of anchor links between each pair of networks can be unequal. Previous pair-wise learning methods inevitably suffer from this data imbalance problem, as they only consider each pair of networks. Thus, how to make full use of the information across all the networks to deal with data imbalance is another challenge.

• Model Storage. Network embedding is a practical approach to extract the structural features of nodes and has been applied in some network alignment methods [10, 15, 45]. However, in large-scale multi-network scenarios, it is essential to take the space overhead of a method into account. Previous embedding-based methods need to generate an embedding vector for each node in each pair of networks, which takes too much storage space in multi-network scenarios. Thereby, how to make the overhead cheaper without sacrificing performance is another concern.

To address the challenges above, we propose a cross-network embedding based multi-network alignment method, CrossMNA. It extracts an extra feature vector named the network vector for each network, which reflects the differences in global structure among the networks; if the global structures of two networks are similar, their network vectors will be close in vector space. For each node in a network, we propose two types of embedding vectors: the inter-vector and the intra-vector. The former depicts the commonness of anchor nodes in different networks and is shared among the known anchor nodes. The latter reflects the specific structural features of the node in its selected network and is generated through a combination of the network vector and the inter-vector. By jointly training all the networks, CrossMNA refines powerful inter-vectors for network alignment tasks and intra-vectors for other downstream multi-network analysis tasks. Moreover, the shared inter-vectors and the combination-based intra-vectors significantly save space overhead without sacrificing performance.

3 CROSSMNA: A MULTI-NETWORK ALIGNMENT APPROACH

3.1 Problem Formulation
In this work, we suppose the networks are unweighted and all the edges are directed, as an undirected edge can be divided into two directed edges. For the sake of easy understanding, we follow the definitions of network alignment reported in [41, 42].

Definition 1 (Multiple Aligned Networks). We define a set of networks $\mathcal{G} = ((G^1, G^2, \cdots, G^N), (\mathcal{A}^{(1,2)}, \mathcal{A}^{(1,3)}, \cdots, \mathcal{A}^{(N-1,N)}))$ as multiple aligned networks, where $G^i, i \in \{1, 2, \cdots, N\}$ represents the $i$-th network in the set, $N$ is the number of related networks, and $\mathcal{A}^{(i,j)}, i, j \in \{1, 2, \cdots, N\}$ denotes the set of anchor links between $G^i$ and $G^j$. We define each network $G^i$ as $\{V^i, E^i\}$, where $V^i$ is the set of nodes in $G^i$ and $E^i$ is the set of links.

Definition 2 (Anchor Link). Given two networks $G^i$ and $G^j$, we define an anchor link between them as $(v_k^i, v_k^j) \in \mathcal{A}^{(i,j)}$, where $v_k^i$ and $v_k^j$ are the anchor nodes in networks $G^i$ and $G^j$ respectively. The anchor links in multiple networks follow the transitivity law defined in [41]: if $(v_k^i, v_k^j) \in \mathcal{A}^{(i,j)}$ and $(v_k^i, v_k^h) \in \mathcal{A}^{(i,h)}$, then $(v_k^h, v_k^j) \in \mathcal{A}^{(h,j)}$. For notational convenience, nodes in different networks with the same subscript are the known anchor nodes.

Definition 3 (Multi-Network Alignment Problem). Given a set of networks and part of the known anchor links among the networks, this problem is to discover the unknown or potential anchor links. It is noteworthy that the networks are partially aligned [42], which means not all nodes have counterparts in other networks, and that the anchor nodes follow the one-to-one matching constraint [41].
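To make these definitions concrete, the following is a minimal Python sketch (ours, not from the paper; all names are illustrative) of one way to store multiple aligned networks and to propagate the transitivity law over known anchor links with a union-find structure:

```python
from collections import defaultdict

class MultipleAlignedNetworks:
    """Container for N directed networks plus known anchor links."""
    def __init__(self):
        self.edges = defaultdict(list)   # edges[k] = list of (src, dst) in G^k
        self.parent = {}                 # union-find over (network, node) pairs

    def add_edge(self, k, src, dst):
        self.edges[k].append((src, dst))

    def _find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:       # path halving
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def add_anchor_link(self, i, u, j, v):
        """Record that node u in G^i and node v in G^j are the same entity.
        Union-find makes the transitivity law hold automatically: linking
        (G^i,u)~(G^j,v) and (G^i,u)~(G^h,w) implies (G^j,v)~(G^h,w)."""
        ru, rv = self._find((i, u)), self._find((j, v))
        if ru != rv:
            self.parent[ru] = rv

    def anchor_groups(self):
        """Each group is one shared identity; in CrossMNA, all members of
        a group share a single inter-vector."""
        groups = defaultdict(list)
        for x in list(self.parent):
            groups[self._find(x)].append(x)
        return list(groups.values())
```

Grouping anchor nodes this way is exactly what lets CrossMNA tie one shared inter-vector to each identity rather than one vector per node per network.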
3.2 Cross-Network Embedding
As stated before, the different network semantic meanings lead to diverse interactional behaviors of the anchor nodes, which influences the global structure of the networks. For instance, in Figure 1 the anchor node $v_1$ shows different connection relationships in $G^1$ and $G^2$. However, as the same entity, an anchor node should also display some common features across the networks. For example, suppose we already know that node $v_4$ in $G^3$ is the counterpart of node $v_1$; we can then observe that this anchor node tends to interact with nodes $v_2$ or $v_3$ in each network. Therefore, we propose CrossMNA for multi-network scenarios under the assumption that an anchor node in a selected network shows both similar structural features to its counterparts and distinctive connection relationships due to its network's semantic meaning. It is noteworthy that if the related networks have the same semantic meaning, the anchor nodes will display consistent connection relationships among these networks, which is exactly the topology consistency assumption held by previous works.

It is obvious that the common features among the anchor nodes are what we need for network alignment. To this end, we propose an inter-vector $\mathbf{u}$ to preserve the common features among the anchor nodes. Through training, we hope the inter-vector of an unknown anchor node becomes close to those of its counterparts in vector space. Nevertheless, this inter-vector is hard to learn directly, as there is no direct correlation between the unknown anchor nodes. For example, there is no direct relationship between node $v_1$ in $G^1$ and node $v_4$ in $G^3$. Therefore, we have to learn the inter-vector indirectly.

On the other side, it is straightforward to extract the structural features of nodes in a network via network embedding methods; we name the resulting vector the intra-vector $\mathbf{v}$ in CrossMNA. This type of vector contains both the commonness among counterparts and the specific local connections in its selected network due to the semantics, so it cannot be applied to node matching unless we can strip away the impact of the network semantics. Thus, we are motivated to present the following equation to build a correlation among the intra-vector, the inter-vector, and the network semantics:

$$\mathbf{v}_i^k = \mathbf{u}_i + \mathbf{r}^k, \qquad (1)$$

where $\mathbf{v}_i^k$ is the intra-vector of node $v_i$ in network $G^k$, which can be easily learnt; $\mathbf{u}_i$ is the inter-vector of node $v_i$, shared with its known counterparts; and $\mathbf{r}^k$ is the network vector, which extracts the unique characteristics of $G^k$ and reflects the global differences among the networks. Thus, we can refine the inter-vectors of the anchor nodes indirectly by training the combination-based intra-vectors.

We take the toy multi-network in Figure 1 as an example for further explanation. Considering the anchor node $v_1$, it should share some common features among the networks, and this commonness is what the inter-vector should represent. Meanwhile, as $G^1$ and $G^2$ are two different networks, the local connections of $v_1$ can also be distinct because of the diverse network semantic meanings. Following Equation (1), we can give the intra-vectors of node $v_1$ in $G^1$ and $G^2$ as:

$$\mathbf{v}_1^1 = \mathbf{u}_1 + \mathbf{r}^1, \quad \mathbf{v}_1^2 = \mathbf{u}_1 + \mathbf{r}^2. \qquad (2)$$

On the one hand, through jointly training the intra-vectors, the shared $\mathbf{u}_1$ can store the complementary information between the two networks: for example, that $v_1$ can be connected to $v_2$ and $v_3$ at the same time. On the other hand, $\mathbf{r}^1$ and $\mathbf{r}^2$ can reflect the global differences between the two networks, owing to the common information transmitted through $\mathbf{u}_1$.

Then we consider the node $v_4$ in $G^3$. Its intra-vector can be written as:

$$\mathbf{v}_4^3 = \mathbf{u}_4 + \mathbf{r}^3. \qquad (3)$$

By learning the structural information in $G^3$, $\mathbf{v}_4^3$ comes to contain some common features similar to $\mathbf{v}_1^1$ or $\mathbf{v}_1^2$; for example, $v_4$ also tends to interact with $v_2$ and $v_3$. Owing to the other known anchor links between $G^3$ and the other networks, the network vector $\mathbf{r}^3$ can preserve the specific semantic features of $G^3$. Therefore, by peeling off the impact of the network semantic difference via $\mathbf{u}_4 = \mathbf{v}_4^3 - \mathbf{r}^3$, the inter-vector $\mathbf{u}_4$ reflects much more similar features to $\mathbf{u}_1$, from which we can infer that $v_4$ is the counterpart of $v_1$.

Because the meanings of the inter-vector and the intra-vector are different, they can live in different vector spaces. Therefore, we propose a transformation matrix $\mathbf{W}$ to align them across different dimensions and rewrite Equation (1) as:

$$\mathbf{v}_i^k = \mathbf{W}\mathbf{u}_i + \mathbf{r}^k, \quad \mathbf{u}_i \in \mathbb{U}^{d_1},\ \mathbf{v}_i^k \in \mathbb{R}^{d_2}, \qquad (4)$$

where $\mathbb{U}^{d_1}$ and $\mathbb{R}^{d_2}$ are two different vector spaces with dimensions $d_1$ and $d_2$ respectively. $\mathbf{W}$, $\mathbf{u}_i$, and $\mathbf{r}^k$ are the parameters to be learned in CrossMNA.
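As a concrete illustration of the combination strategy in Equation (4), here is a toy numpy sketch (our own, with illustrative sizes and names, not the authors' released code); the intra-vector of every node in every network is derived on the fly from the shared parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
num_nodes, num_nets = 1000, 3    # illustrative sizes
d1, d2 = 200, 50                 # inter- and intra-vector dimensions

# Shared parameters: one inter-vector per node identity, one network
# vector per network, one transformation matrix. This is all CrossMNA
# stores -- intra-vectors are derived, never materialized per network.
U = rng.normal(scale=0.1, size=(num_nodes, d1))   # inter-vectors u_i
R = rng.normal(scale=0.1, size=(num_nets, d2))    # network vectors r^k
W = rng.normal(scale=0.1, size=(d2, d1))          # transformation matrix

def intra_vector(i, k):
    """Equation (4): v_i^k = W u_i + r^k."""
    return W @ U[i] + R[k]

def peel_off(v, k):
    """Recover an alignment signal from an intra-vector by removing the
    network semantics, mirroring u_4 = v_4^3 - r^3 in the text."""
    return v - R[k]
```

Storing U, R, and W instead of one $d$-dimensional vector per node per network is precisely where the space savings quantified in Section 4.7 come from.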
From the above discussion, we find that, theoretically, the more related networks we consider, the more accurate the node matching becomes. Therefore, CrossMNA jointly trains all the related networks, and we propose the total objective function:

$$J = \sum_k J_k, \qquad (5)$$

where $J_k$ is the objective function for each $G^k$, which tries to preserve the structural information of each node in $G^k$. Following [31], for each directed edge $(v_i^k, v_j^k)$ in $G^k$, we define the conditional probability of $v_j^k$ being generated by $v_i^k$ as:

$$p(v_j^k \mid v_i^k) = \frac{\exp(\mathbf{v}_i^k \cdot \mathbf{v}_j^k)}{\sum_{v_z^k \in V^k} \exp(\mathbf{v}_i^k \cdot \mathbf{v}_z^k)} = \frac{\exp((\mathbf{W}\mathbf{u}_i + \mathbf{r}^k) \cdot (\mathbf{W}\mathbf{u}_j + \mathbf{r}^k))}{\sum_{v_z^k \in V^k} \exp((\mathbf{W}\mathbf{u}_i + \mathbf{r}^k) \cdot (\mathbf{W}\mathbf{u}_z + \mathbf{r}^k))}, \qquad (6)$$

and the objective for each network $G^k$ can be defined as:

$$J_k = \sum_{(v_i^k, v_j^k) \in E^k} p(v_j^k \mid v_i^k). \qquad (7)$$

By jointly optimizing, the known anchor nodes play their roles as much as possible to transmit structural information through the shared inter-vector $\mathbf{u}$, making this vector contain the commonness among the anchor nodes. At the same time, through Equation (6), the intra-vector of each anchor node preserves the diverse structural features in the different networks, which makes each network vector $\mathbf{r}^k$ extract the specific features of its own global structure.

For each unknown anchor node $v_i^k$, although we cannot directly connect it to its counterparts, its combination-based intra-vector can preserve some features similar to those of its counterparts in other networks. By peeling off the impact of the network semantic meaning $\mathbf{r}^k$, the inter-vector $\mathbf{u}_i$ will display the same characteristic features as its anchor nodes do, which can be effectively used for node matching.
To speed up the training process, following word2vec [21], we perform negative sampling to approximate objective (6) as:

$$E_{(v_i^k, v_j^k)} = -\log \sigma(\mathbf{v}_i^k \cdot \mathbf{v}_j^k) - \sum_{v_z^k \sim P(v)} \log \sigma(-\mathbf{v}_i^k \cdot \mathbf{v}_z^k), \qquad (8)$$

where $\sigma(x) = 1/(1 + \exp(-x))$ is the sigmoid function and the noise distribution is $P(v) \propto d_v^{3/4}$, with $d_v$ the degree of node $v$ in the given network. Therefore, we rewrite objective (5) as the total loss function over the whole multi-network:

$$L = \sum_{k \in [1,N]} \sum_{(v_i^k, v_j^k) \in E^k} \Big[ -\log \sigma(\mathbf{v}_i^k \cdot \mathbf{v}_j^k) - \sum_{v_z^k \sim P(v)} \log \sigma(-\mathbf{v}_i^k \cdot \mathbf{v}_z^k) \Big]. \qquad (9)$$

We adopt Adam [11] to minimize the total loss function.
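A compact PyTorch-style sketch of one training step under Equations (8)–(9) might look as follows; this is our own minimal reconstruction (parameter names, sizes, and the batching scheme are illustrative), not the authors' implementation:

```python
import torch

num_ids, num_nets, d1, d2 = 1000, 3, 200, 50     # illustrative sizes
U = torch.nn.Parameter(0.1 * torch.randn(num_ids, d1))   # shared inter-vectors
R = torch.nn.Parameter(0.1 * torch.randn(num_nets, d2))  # network vectors
W = torch.nn.Parameter(0.1 * torch.randn(d2, d1))        # transformation matrix
opt = torch.optim.Adam([U, R, W], lr=1e-3)

def intra(ids, k):
    """Batched Equation (4): v = W u + r^k. `ids` are identity indices, so
    known anchor nodes in different networks map to the same row of U."""
    return U[ids] @ W.T + R[k]

def step(k, src, dst, neg):
    """One Adam step on a batch of edges (src, dst) from network k, with
    `neg` holding negative nodes per edge, drawn from P(v) ~ degree^(3/4)."""
    vi, vj = intra(src, k), intra(dst, k)            # (B, d2)
    vz = intra(neg, k)                               # (B, n_neg, d2)
    pos = torch.nn.functional.logsigmoid((vi * vj).sum(-1))
    neg_s = torch.nn.functional.logsigmoid(-(vz * vi.unsqueeze(1)).sum(-1))
    loss = -(pos + neg_s.sum(-1)).mean()             # Equation (8), batch mean
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```

Here `src`, `dst`, and `neg` are LongTensors of identity indices; iterating batches of edges over all networks realizes Equation (9), and the paper's setting of one negative sample per edge corresponds to `n_neg = 1`.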
As we know, the data imbalance problem is ubiquitous among multiple real-world networks. Previous methods easily suffer from it because they operate under the pair-wise learning scheme. Our CrossMNA, in contrast, can make up for this problem: it jointly trains all the networks and transmits the complementary information among the known anchor nodes through the shared inter-vectors, which alleviates the influence of data imbalance. What's more, compared to existing embedding-based methods, CrossMNA is much more lightweight. Instead of generating an embedding vector for each node in each pair of networks, CrossMNA shares the common inter-vector across the networks and uses the more flexible combination-based vector generation strategy defined in Equation (4) to produce the intra-vector for each node in each network. Whether compared to embedding-based network alignment methods or to existing multi-network embedding methods, CrossMNA always reduces the memory overhead significantly; we compare the size of CrossMNA with other models in Section 4.7.

3.3 Multi-Network Mining
After training, we obtain two types of embedding vectors for each node: the intra-vector reflects the structural information of a node in a selected network, which is suitable for intra-network analysis tasks, while the inter-vector captures the commonness of the anchor nodes in different networks, which can be effectively applied to the multi-network alignment task.

A naive way to find the alignments for a node is to compute all pairwise similarities between the inter-vectors, which is time-consuming. In practice, however, we usually only need to find soft alignments for each node by returning its top-α most likely counterparts in the other networks. So, following [10], we use the k-d tree data structure to accelerate the similarity search [4] in the matching process. We use cosine similarity to define the distance between two nodes in vector space.
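As an illustration of this matching step, the sketch below (ours, using SciPy's cKDTree and illustrative names) retrieves top-α soft alignments; normalizing the inter-vectors first makes Euclidean k-d tree queries rank identically to cosine similarity:

```python
import numpy as np
from scipy.spatial import cKDTree

def soft_align(U_query, U_target, alpha=30):
    """Return the indices of the top-alpha candidate counterparts in the
    target network for every query node, using normalized inter-vectors.
    For unit vectors, ||a - b||^2 = 2 - 2 cos(a, b), so nearest Euclidean
    neighbors are exactly the most cosine-similar nodes."""
    q = U_query / np.linalg.norm(U_query, axis=1, keepdims=True)
    t = U_target / np.linalg.norm(U_target, axis=1, keepdims=True)
    tree = cKDTree(t)                      # build once per target network
    _, idx = tree.query(q, k=alpha)        # accelerated similarity search
    return idx                             # shape (len(U_query), alpha)
```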
In some applications, we can obtain the complete groundtruth among multiple networks beforehand, which means all the aligned links are known. As CrossMNA is a cross-network embedding approach, it can easily be exploited in these applications to make data mining tasks on multiple networks more actionable. During training, the inter-vector takes the role of transmitting structural information among the anchor nodes; therefore, CrossMNA can preserve the complementary information across multiple networks. At the same time, because we extract the specific features of each network into its network vector, the intra-vector, which combines the inter-vector and the specific network vector, can keep both the complementary information of a node's counterparts in other networks and the distinctive properties of its selected network. This is effective in many downstream multi-network analysis tasks, such as link prediction and vertex identification.

4 EXPERIMENTAL EVALUATION
In this section, we perform a set of experiments to validate our proposed CrossMNA. We first compare our method with several state-of-the-art methods on the multi-network alignment task. Then, to verify the effectiveness of CrossMNA in extracting useful node features, we also compare CrossMNA with several multi-network embedding methods on the link prediction task.

4.1 Dataset Description
We employ three real-world multi-network datasets from different fields: social platforms, bioinformatics, and academia. To reflect the data imbalance problem in each dataset, we present the distribution of anchor links between each pair of networks in Figure 2(a) and the size of each network in Figure 2(b). We find that the scale of each network varies greatly in all datasets and the distribution of anchor links can be uneven, especially in Twitter and SacchCere. We also present the global structural similarities between each pair of networks, computed by comparing the common neighbors reachable within three steps of the anchor nodes. From Figure 2(c) we find that the related networks have quite diverse structures, especially in Twitter and SacchCere. We detail the datasets as follows:

arXiv¹ [6] consists of various networks corresponding to different arXiv categories. There are 14,489 nodes, 59,026 coauthorship connections, and 13 networks. The number of anchor links among the networks is 23,626.

Twitter¹ [24] is a specific Twitter dataset focused on People's Climate March in 2014. It comprises 3 networks, corresponding to retweets, mentions, and replies. 102,439 nodes, 353,495 edges, and 55,600 anchor links are included.

SacchCere¹ [7, 30] is a subset of BioGRID, a public database that archives and disseminates genetic and protein interaction data from humans and model organisms. It contains 6,570 nodes, 282,755 connections, and 7 networks, each representing one type of genetic interaction. There are a total of 55,831 anchor links.

¹https://comunelab.fbk.eu/data.php
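These multiplex datasets are distributed by CoMuNe Lab as plain edge lists; assuming a per-line format of `layer src dst weight` (an assumption worth verifying against the actual files), a minimal loader could look like this:

```python
from collections import defaultdict

def load_multiplex(path):
    """Parse a multiplex edge list into {layer: [(src, dst), ...]}.
    Assumed line format: <layer> <src> <dst> [weight]. In these datasets
    the same node id reused in several layers denotes the same entity,
    i.e., a known anchor node."""
    edges = defaultdict(list)
    seen_in = defaultdict(set)            # node id -> set of layers
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) < 3:
                continue                  # skip headers / blank lines
            k, src, dst = int(parts[0]), int(parts[1]), int(parts[2])
            edges[k].append((src, dst))
            seen_in[src].add(k)
            seen_in[dst].add(k)
    anchors = {v: ks for v, ks in seen_in.items() if len(ks) > 1}
    return edges, anchors
```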
Figure 2: Detailed information of the datasets. (a) Distribution of anchor links: each grid cell represents $|\mathcal{A}^{(i,j)}|/|\mathcal{A}|$. (b) Size of networks: each dot is a network; the abscissa/ordinate give the number of nodes/edges. (c) Global similarity between networks: each grid cell reflects the global similarity between two networks.
4.2 Baseline Methods
Multi-Network Alignment. To show the effectiveness of our method in addressing the multi-network alignment problem, we compare it with four baseline methods, using network structure information only.

NetAlign [3] proposes message passing algorithms to match networks under an unsupervised scheme.

FINAL [43] proposes a family of algorithms to align attributed networks. It formulates network alignment from an optimization perspective based on the alignment consistency principle.

REGAL [10] is an unsupervised multi-network alignment method which extracts similarity-based representations among graphs and infers soft alignments by comparing the learnt embedding vectors.

IONE [15] is a state-of-the-art network embedding based method under a semi-supervised scheme, which solves the network embedding problem and the user alignment problem simultaneously within a unified optimization framework.

As NetAlign and FINAL both require prior alignment information, following previous work we construct a degree similarity matrix and take the top-log|V| entries for each node.

Intra-Link Prediction. To verify the ability of CrossMNA to extract effective node features across the networks, we also compare it with five network embedding methods on the link prediction task, divided into two categories, single-network embedding and multi-network embedding:

DeepWalk [27] is a typical network embedding method that learns vertex representations based on a single network's structure. It performs random walks on the network to obtain vertex sequences and trains a Skip-Gram model on the sequences.

LINE [31] is another network embedding model, aimed at learning node embeddings in large-scale networks. It minimizes a loss function that preserves both first-order and second-order proximity between nodes.

node2vec [8] defines a flexible notion of a node's neighborhood and proposes a biased random walk that trades off between breadth-first and depth-first search.

PMNE [17] is a multi-network embedding method which proposes two simple merging approaches and a Co-analysis method to obtain one overall embedding for each node. In our experiments, we compare with its Co-analysis variant, which performs a biased random walk across the networks and trains a Skip-Gram model on the node sequences.

MELL [19] is a multi-network embedding method which simultaneously learns an embedding vector for each node and a layer embedding for each single network, using all of the network structures.

4.3 Experiment Configuration
The parameters of the compared methods are set as follows. For FINAL, we follow the default setting, i.e., {α, t_max} = {0.3, 30}. For REGAL, we set the maximum hop distance K = 2. For DeepWalk, we set walks per vertex to 20, window size to 5, and walk length to 80. For LINE, we employ both first-order and second-order proximity and obtain representations via concatenation, and we set the number of negative samples k = 5. For node2vec, we empirically set p = 2 and q = 0.5. For PMNE and MELL, we follow their default settings, i.e., {α, p, q} = {0.5, 0.5, 0.5} and {k, λ, β, γ} = {4, 1, 1, 1}. For fair comparison, we set the same node dimension d = 200 for all embedding-based methods in network alignment, and d = 100 for each network embedding model in link prediction. For our method CrossMNA, we accordingly set d1 = 200, d2 = 100, and the number of negative samples to 1 to speed up the training process. We use the inter-vector u for the network alignment task and the intra-vector v for the link prediction task.

For the multi-network alignment task, we randomly remove part of the anchor links as the test set and keep the rest as the training set. We compare the precision of the soft alignments for each method. We first introduce Precision$^{(i,j)}@\alpha$ as the evaluation metric for mapping nodes from $G^i$ to $G^j$:

$$\text{Precision}^{(i,j)}@\alpha = \frac{|\text{CorrectNodes}^{(i,j)}@\alpha|}{|\text{UnMappedAnchors}^{(i,j)}|}, \qquad (10)$$

where $|\text{UnMappedAnchors}^{(i,j)}|$ is the number of unknown anchor links between $G^i$ and $G^j$ and $|\text{CorrectNodes}^{(i,j)}@\alpha|$ is the number of correct alignments from $G^i$ to $G^j$ among the top-α choices. The evaluation metric for multi-network alignment is then defined as:

$$\text{Precision}@\alpha = \frac{1}{N(N-1)} \sum_i \sum_{j \neq i} \text{Precision}^{(i,j)}@\alpha, \qquad (11)$$

where $N$ is the number of networks.

For the intra-link prediction task, we randomly split all edges into two sets, for training and testing respectively. We also randomly sample an unconnected node pair as a negative edge for each positive edge in the test set and use both the positive and negative edges for testing. We adopt the standard evaluation metric ROC-AUC [9] in our experiments.
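Equations (10)–(11) translate directly into a short scoring routine; the sketch below (ours, building on the `soft_align` helper assumed earlier) expects results for all N(N−1) ordered network pairs, with `truth` mapping each query row to the index of its true counterpart:

```python
def precision_at_alpha(pair_results, num_networks):
    """pair_results[(i, j)] = (idx, truth): `idx` is the top-alpha candidate
    matrix returned by soft_align for mapping G^i -> G^j, and `truth` maps
    each test query row to its true counterpart's index in G^j."""
    total = 0.0
    for (i, j), (idx, truth) in pair_results.items():
        correct = sum(truth[q] in set(idx[q]) for q in truth)   # Eq. (10)
        total += correct / len(truth)
    return total / (num_networks * (num_networks - 1))          # Eq. (11)
```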
Figure 6: Parameter study on multi-network alignment. (a) P@α vs. α. (b) P@30 vs. training ratio.
Figure 7: Parameter study on link prediction. (a) P@α vs. α. (b) P@30 vs. training ratio.
…the common features among the anchor nodes as much as possible, which shows a significant improvement on this dataset.

To summarize, the above observations illustrate that our method is significantly effective and appropriate for addressing the multi-network alignment problem.

Intra-Link Prediction. Besides network alignment, the node embeddings learnt by CrossMNA can also be adopted in other downstream network analysis tasks, e.g., link prediction and vertex identification. We take the link prediction task as an example to evaluate the quality of the learnt features. We compare with five state-of-the-art methods, three designed for single-layer networks and two for multiple networks. For the single-layer models, we train a separate embedding for each network and use it to predict the links of the corresponding network. We evaluate the AUC values of the different models with training ratios of 30%, 50%, and 80%, and report the results in Table 2.

From the experimental results, we find that on all datasets our CrossMNA achieves comparable performance to, or even significantly outperforms, all the baselines. What calls for special attention is that CrossMNA achieves a nearly 5% improvement on the Twitter dataset at all training ratios. This is because the inter-vector in CrossMNA transmits the complementary information across the networks while the network vectors store the distinctive properties of each network, which makes the combination-based intra-vector integrate the cross-network information without sacrificing the distinctive properties of its selected network. To summarize, the above results indicate the effectiveness of the features refined by our method in multi-network mining tasks.

4.5 Parameter Sensitivity
There are two parameters in our method: the dimension of the inter-vector, d1, and of the intra-vector, d2. To explore their influence on model performance, we set d1 = d2 = 100 as the default, pick one parameter at a time, and vary it while fixing the other to check its impact on the network alignment and link prediction tasks. For the network alignment task, we keep the training ratio at 50% and use P@30 as the metric. For the link prediction task, we also keep the training ratio at 50%.

From Figures 6 and 7, we observe that d1 has a great influence on both tasks: as d1 grows, the performance improves significantly. Once d1 exceeds a threshold, the performance becomes stable, and the threshold is related to the size of the network. When d1 is larger than 200 on arXiv and SacchCere, the performance shifts towards a steady distribution. However, d1 = 300 yields a significant improvement over d1 = 200 on the Twitter dataset in the alignment task, as the scale of this dataset is very large. We also find that d2 has a relatively small effect on performance if d1 is chosen suitably. Upon these observations, in practice we are motivated to set a relatively large d1, such as 200 or 300, and a very small d2, such as 30 or 50, to save memory.

4.6 Case Study
In this section, we perform a set of experiments to explore whether CrossMNA gains better performance as more related networks are taken into account. To make the experimental results more convincing, we choose the arXiv dataset, which has the largest number of networks, meaning the relationships among its networks are the most complicated.

We vary the number of networks from 2 to 13 and set the training ratio to 50%. We perform this experiment 10 times, each time shuffling the order in which the networks are taken into consideration. We calculate the average score and present the P@30 results in Figure 8. We observe that as the number of networks increases, the overall trend of precision goes up. These results indicate that the complementary information among the networks benefits the alignment result: the more related networks are considered, the better the performance CrossMNA achieves.
Figure 8: Precision w.r.t. N.
Figure 9: Memory use comparisons among multi-network embedding methods.
4.7 Scalability
In this section, we explore the scalability of CrossMNA. Concretely, we analyze the space complexity and time complexity of our method respectively.

4.7.1 Space Complexity Analysis. As we know, real-life networks can be very large. In large-scale multi-network analysis tasks, if we learn embeddings for all nodes in each network, storage becomes a big challenge. Therefore, we ought to analyze the space complexity of the embedding-based models. We first analyze the number of parameters in each model; then we perform a set of experiments to show the actual space overhead with regard to the size of the network. For convenience, we assume that each network shares the same set of nodes. We use a subgraph of YouTube [32] and adopt randomized permutation to generate multiple networks. We analyze the embedding-based network alignment methods and the multi-network embedding methods respectively.

From Table 3, we observe that IONE inevitably takes too much memory, especially when many related networks exist: since IONE performs pair-wise learning, it needs to generate embeddings for each pair of networks. Unlike IONE, REGAL saves much space, as it can train multiple networks together directly, owing to its strategy of randomly sampling landmarks to approximate the similarities among nodes. Our CrossMNA, because of jointly training multiple networks and the combination strategy, only generates the inter-vector for each node and the network vector for each network. It is noteworthy that the inter-vector is shared among the known anchor nodes, so the more known anchor nodes there are, the less space CrossMNA takes. For illustration, we give an example with five networks and 100,000 nodes and present the space overhead under different ratios of known anchor links, setting d = d1 = 200 and d2 = 50. We find that even in the extreme case where no anchor link is known, our method still takes up nearly the same amount of space as REGAL. Once a few anchor links are known, CrossMNA takes advantage of its combination strategy and shows less space overhead.

Then we compare CrossMNA with the existing multi-network embedding methods. From Table 4, we find that PMNE has the fewest parameters among all the methods. Compared to the other methods, the size of SMNE is relatively small, as its hyper-parameter s can be set to a small number. However, our CrossMNA takes the smallest memory overhead in most cases, because we do not need to generate an embedding vector for each node in each network, which would require dN|V| parameters. Instead, we use the strategy of combining the common inter-vector and the network vector via a transformation matrix to generate the vector for each node in each network. Therefore, CrossMNA only generates d1|V| + d2N + d1d2 parameters, where d1 ≈ d and d2 can be very small. To show the difference more clearly, we extend the size of a multi-network in the number of networks and in the number of nodes respectively, and present the memory overhead of each model in Figure 9. We observe that CrossMNA and PMNE are significantly lighter than the other models, and as the size of the network increases, their space overhead grows very slowly. However, as PMNE merges the information from all networks into one type of embedding, it inevitably loses the distinctive information in each network, as shown in our previous experiments. Thus, CrossMNA is dramatically more space efficient for large-scale multi-network applications than existing methods.

4.7.2 Time Complexity Analysis. The runtime of CrossMNA has two parts: learning the cross-network embeddings and matching nodes through vector comparison. The total time complexity of learning the embeddings is approximately O(tN(d1d2|V| + d2|E|)), where t is the number of iterations, N denotes the number of networks, and |V|, |E| denote the number of nodes and edges in each network respectively. The time complexity of finding the soft alignments between each two networks is O(|V|log|V|).
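To make the parameter counts above tangible, here is a back-of-the-envelope comparison (our own arithmetic) for the example configuration used in the text, N = 5 networks, |V| = 100,000 nodes, d = d1 = 200, d2 = 50:

```python
N, V = 5, 100_000
d, d1, d2 = 200, 200, 50

# One d-dimensional vector per node per network: the footprint of typical
# multi-network embedding methods, as discussed in the text.
per_network = d * N * V                      # 100,000,000 parameters

# CrossMNA: shared inter-vectors + network vectors + transformation matrix.
# With known anchor nodes, rows of U are shared, shrinking d1*V further.
crossmna = d1 * V + d2 * N + d1 * d2         # 20,010,250 parameters

print(per_network / crossmna)                # ~5x fewer parameters here
```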
5 RELATED WORK
5.1 Network Alignment
Network alignment is a fundamental problem for cross-network mining, and much literature has been devoted to it. Most previous works make full use of the attribute information of nodes, e.g., username, gender, etc. The username is the most commonly used feature in almost all these works. Perito et al. [26] employ binary classifiers to determine whether cross-platform users with the same name are the same person. Vosecky et al. [34] extract distance-based profile features and build classifiers to match users across multiple networks; similar works can be found in [18, 25]. In some scenarios, special content information about the nodes is available, and several methods exploit such content for more precise node matching. Riederer et al. [28] link users by considering their trajectory-based content features. Almishari et al. [22] align users by exploiting their writing style.

The connection relationships between nodes are another common and vital feature, and many works consider both content information and network structure. COSNET [44] uses an energy-based model to link users by considering both local and global consistency. HYDRA [16] presents a multi-objective framework to model heterogeneous behaviors and structure consistency simultaneously.

However, as the attribute information of nodes is usually missing, unreliable, or even unavailable in real-life applications, some methods align nodes based only on structural information. BigAlign [13] proposes a fast alignment algorithm for two bipartite graphs. UMA [41] jointly optimizes multiple anonymized social networks in an unsupervised scheme, under the constraints of the transitivity law and the one-to-one property. IONE [15] learns the follower-ship/followee-ship of each user under the framework of network embedding and utilizes the embedding vectors to match unknown anchor users.

People nowadays usually participate in multiple diverse networks simultaneously, and jointly learning from these related networks can provide much more useful complementary information for alignment. UMA [41] and REGAL [10] have been proposed to optimize multiple networks together considering only structural information. However, the assumption of topology consistency makes them fail on networks with distinct structures. Considering the problems above, we propose CrossMNA, a light-weight cross-network embedding based network alignment method, to jointly learn structural information across diverse networks.

5.2 Network Embedding
Network embedding works can be roughly divided into two categories: single-network embedding and multi-network embedding.

Most previous works focus on the single-layer network. Inspired by the distributed representation learning of words in NLP [20], DeepWalk [27] performs random walks over networks to generate vertex sequences and conducts Skip-Gram to obtain node embeddings. On top of DeepWalk, node2vec [8] modifies the random walk strategy into a biased random walk to explore the network structure more efficiently. Unlike the random walk based methods above, LINE [31] optimizes two objective functions that separately approximate first-order and second-order proximity in large-scale networks. Another general approach for obtaining node embeddings is matrix factorization. GraRep [5] proposes a matrix factorization based method to encode k-step representations, where each step reflects different local information. TADW [36] incorporates text contents into network embedding under the framework of matrix factorization. Although these methods have achieved satisfactory performance on many single-layer network mining tasks, they ignore the multiple-network scenario.

To make data mining tasks on multiple networks more actionable, several works have recently been proposed to transform multiple networks into a low-dimensional vector space. PMNE [17] proposes two simple merge-based methods, which only consider inter-layer or intra-layer edges, and one cross-layer method, which performs a biased random walk across the layers, to obtain one overall embedding for each node. MTNE [35] and SMNE [38] build a bridge among the different layers by sharing a common embedding across each layer of the multiple networks. MELL [19] simultaneously learns node embeddings and layer embeddings using all of the network structures.

However, most of these multi-network embedding models ignore the problem of space overhead, which makes them hard to apply in large-scale multi-network applications. Unlike these methods, our CrossMNA occupies much less physical memory without sacrificing performance, owing to the shared inter-vectors and the flexible combination strategy.

6 CONCLUSION
We have studied the problem of multi-network alignment and proposed CrossMNA, a light-weight cross-network embedding based network alignment method for tackling today's large-scale multi-network applications. CrossMNA defines two categories of embedding vectors for each node, i.e., the inter-vector and the intra-vector. The commonness of anchor nodes in different networks is represented by the inter-vector, while the specific structural features are captured by the intra-vector. In addition, the coordination of the inter-vector and the intra-vector dramatically reduces the space overhead while preserving adequate performance. Extensive experiments show that CrossMNA significantly outperforms other existing baselines and multi-network embedding methods.