FINAL: Fast Attributed Network Alignment

ABSTRACT
Multiple networks naturally appear in numerous high-impact applications. Network alignment (i.e., finding the node correspondence across different networks) is often the very first step for many data mining tasks. Most, if not all, of the existing alignment methods are solely based on the topology of the underlying networks. Nonetheless, many real networks often have rich attribute information on nodes and/or edges. In this paper, we propose a family of algorithms (FINAL) to align attributed networks. The key idea is to leverage the node/edge attribute information to guide the (topology-based) alignment process. We formulate this problem from an optimization perspective based on the alignment consistency principle, and develop effective and scalable algorithms to solve it. Our experiments on real networks show that (1) by leveraging the attribute information, our algorithms can significantly improve the alignment accuracy (i.e., up to a 30% improvement over the existing methods); (2) compared with the exact solution, our proposed fast alignment algorithm leads to a more than 10× speed-up, while preserving a 95% accuracy; and (3) our on-query alignment method scales linearly, with an around 90% ranking accuracy compared with our exact full alignment method and a near real-time response time.

Categories and Subject Descriptors
H.2.8 [Database Management]: Database Applications—Data Mining

Keywords
Attributed network alignment; Alignment consistency; On-query alignment

1. INTRODUCTION
Multiple networks naturally appear in many high-impact application domains, ranging from computer vision, bioinformatics, web mining, and chemistry to social network analysis. More often than not, network alignment (i.e., finding the node correspondence across different networks) is virtually the very first step for any data mining task in these applications. For example, by linking users from different social network sites, we could recommend the products from one site (e.g., eBay) to the users from another site (e.g., Facebook) [29]. In bioinformatics, integrating different tissue-specific protein-protein interaction (PPI) networks has led to a significant improvement for candidate gene prioritization [16].

Despite the extensive research on network alignment (see Section 6 for a review), most, if not all, of those works focus on inferring the node correspondence solely based on the topology. For instance, IsoRank [21] propagates pairwise topology similarities in the product graph. NetAlign utilizes max-product belief propagation based on the network topology [2]. BigAlign and UniAlign [11] aim to infer the soft alignment based on the assumption that the adjacency matrix of one network is a noisy permutation of the other. A fundamental assumption behind these existing methods is topology consistency. That is, the same node has a consistent connectivity structure across different networks (e.g., connecting to the same or a similar set of neighbors). However, such an assumption could easily be violated in some applications. For example, a user might be very active on one social network site (e.g., Facebook), but behave more quietly on another site (e.g., LinkedIn) [11]; the same gene might exhibit dramatically different behaviors across different tissue-specific PPI networks [16]. In these cases, the topology-based methods could lead to sub-optimal or even misleading alignment results.

At the same time, many real networks often have rich accompanying attributes on the nodes and/or edges (e.g., demographic information of the users, the communication types between different users), which might provide a complementary solution to address the violation of the topology consistency assumption. Nonetheless, it remains a daunting task to align such attributed networks. To be specific, the following questions have largely remained open. First (Q1. Formulation), it is not clear how to assimilate node/edge attribute information into the topology-based network alignment formulation. For instance, many topology-based alignment approaches can be formulated from the optimization perspective, yet it is unknown what their attributed counterparts are. Second (Q2. Algorithms), the optimization problem behind topology-based network alignment is often non-convex or even NP-hard, and introducing attributes into the alignment process could only further complicate it. How can we develop an effective solver for the attributed network alignment, with a time complexity similar or comparable to its topology-only counterpart? Third (Q3. Scalable Computation), how can we scale up the attributed network alignment process by taking advantage of some intrinsic properties (e.g., low rank) of real networks? For some applications (e.g., cross-network search), we might be interested in finding similar nodes across different networks (e.g., to find similar users on LinkedIn for a given user on Facebook). How can we further speed up the computation for such an on-query attributed network alignment process, without solving the full alignment problem?

In this paper, we address the attributed network alignment problem, aiming to answer all these questions. The main contributions of this paper are as follows:

1. Formulation. We formulate the attributed network alignment from the optimization perspective. The key idea behind the proposed formulation is to leverage the node/edge attribute information to guide the (topology-based) alignment process based on the alignment consistency principle. As a side product, our formulation helps reveal the quantitative relationships between the (attributed) network alignment problem and several other network mining problems (e.g., graph kernel, node proximity).

2. Algorithms and Analysis. We propose a family of algorithms (FINAL) to solve the attributed network alignment problem. Our analysis shows that the proposed FINAL algorithms are both effective and efficient: they converge to the global optimum with a complexity that is comparable to the topology-only counterpart.

3. Computations. We further develop (1) an approximate algorithm FINAL-N+ to solve the full attributed network alignment problem, which reduces the time complexity from O(mn) to O(n²) (where m and n are the numbers of edges and nodes in the network, respectively); and (2) a linear algorithm to solve the on-query attributed network alignment, which achieves a good quality-speed trade-off.

4. Evaluations. We perform extensive experiments to validate the effectiveness and the efficiency of the proposed algorithms. Our experimental evaluations demonstrate that (1) our FINAL algorithms significantly improve the alignment accuracy by up to 30% over the existing methods; (2) the proposed FINAL-N+ algorithm leads to a more than 10× speed-up, while preserving a 95% accuracy compared with the exact method; and (3) our on-query alignment method scales linearly, with an around 90% ranking accuracy compared with the exact full alignment method and a near real-time response time.

The rest of the paper is organized as follows. Section 2 defines the attributed network alignment problem and the on-query attributed network alignment problem. Section 3 presents the proposed optimization formulation of FINAL and its solutions. Section 4 proposes two speed-up methods for approximate full alignment and on-query alignment. Section 5 presents the experimental results. Related work and conclusions are given in Section 6 and Section 7.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
KDD '16, August 13-17, 2016, San Francisco, CA, USA
© 2016 ACM. ISBN 978-1-4503-4232-2/16/08...$15.00
DOI: http://dx.doi.org/10.1145/2939672.2939766

2. PROBLEM DEFINITIONS
Table 1 summarizes the main symbols and notation used throughout the paper.

Table 1: Symbols and Notation
  G = {A, N, E}        an attributed network
  A                    the adjacency matrix of the network
  N                    the node attribute matrix of the network
  E                    the edge attribute matrix of the network
  n1, n2               # of nodes in G1 and G2
  m1, m2               # of edges in G1 and G2
  K, L                 # of the node and edge labels
  a, b                 node/edge indices of G1
  x, y                 node/edge indices of G2
  v, w                 node-pair indices of the vectorized alignment s = vec(S)
  k, l                 node/edge label indices
  I, 1                 an identity matrix and a vector of 1s, respectively
  H                    n2 × n1 prior alignment preference
  S                    n2 × n1 alignment matrix
  r, p                 reduced ranks
  α                    the regularization parameter, 0 < α < 1
  a = vec(A)           vectorize a matrix A in column order
  Q = mat(q, n2, n1)   reshape vector q into an n2 × n1 matrix in column order
  Ã                    the symmetrically normalized matrix A
  D = diag(d)          diagonalize a vector d
  ⊗                    Kronecker product
  ∘                    element-wise matrix product

We use bold uppercase letters for matrices (e.g., A), bold lowercase letters for vectors (e.g., s), and lowercase letters (e.g., α) for scalars. For matrix indexing, we use a convention similar to Matlab's syntax as follows. We use A(i, j) to denote the entry at the intersection of the i-th row and j-th column of matrix A, A(i, :) to denote the i-th row of A, and A(:, j) to denote the j-th column of A. We denote the transpose of a matrix by the superscript T (e.g., Aᵀ is the transpose of A). We use a tilde on top to denote the symmetric normalization of a matrix (e.g., Ã = D^{-1/2} A D^{-1/2}, where D is the degree matrix of A). The vectorization of a matrix (in column order) is denoted by vec(·), and the resulting vector is denoted by the corresponding bold lowercase letter (e.g., a = vec(A)).

We represent an attributed network by a triplet G = {A, N, E}, where (1) A is the adjacency matrix, and (2) N and E are the node attribute matrix and the edge attribute matrix, respectively¹. The attribute of node-a corresponds to the value of N(a, a), and E(a, b) describes the attribute of the edge between node-a and node-b. For a given node attribute value k, we define N^k as a diagonal matrix of the same size as N, where N^k(a, a) = 1 if node-a has the attribute value k and N^k(a, a) = 0 otherwise. For a given edge attribute value l, we define E^l as a matrix of the same size as E, where E^l(a, b) = 1 if edge (a, b) has the attribute value l and E^l(a, b) = 0 otherwise.

¹In this paper, we use 'graph' and 'network' interchangeably, and '(categorical) attribute' and 'label' interchangeably.

Figure 1 presents an illustrative example. We can see from Figure 1(a) that the set of nodes (2, 3, 4 and 5) from the first network shares the exact same topology as another set of nodes (2′, 3′, 4′ and 5′). The topology alone would be inadequate to differentiate these two sets. On the other hand, we can see that (1) 2, 2′, 5 and 5′ share the same node attribute value; (2) 3, 3′, 4 and 4′ share the same node attribute value; and (3) the two edges incident to 3 share the same edge attribute value as those incident to 4′. These node/edge attributes could provide vital information to establish the accurate node-level alignment (i.e., 2 aligns to 5′, 5 aligns to 2′, 3 aligns to 4′ and 4 aligns to 3′). This is exactly what this paper aims to address. Formally, the attributed network alignment problem is defined as follows.
Problem 1. Attributed Network Alignment.
Given: (1) two attributed networks G1 = {A1, N1, E1} and G2 = {A2, N2, E2} with n1 and n2 nodes respectively, and (2 - optional) a prior alignment preference H.
Output: the n2 × n1 alignment/similarity matrix S, where S(x, a) represents the alignment/similarity between node-a in G1 and node-x in G2.

Figure 1: An illustrative example of the attributed network alignment problem. (a): two input attributed networks. (b): the matrix representation of the attributed networks, where the upper matrices are the adjacency matrices, and the bottom matrices hold the node attributes (the diagonal entries) and the edge attributes (the off-diagonal entries). (c): the desired alignment output (denoted by the red dashed lines).

In the above definition, we have an optional input to encode the prior knowledge of pairwise alignment preference, H, which is an n2 × n1 matrix. An entry in H reflects our prior knowledge of the likelihood of aligning the two corresponding nodes across the two input networks. When such prior knowledge is absent, we set all entries of H equal, i.e., a uniform distribution. Without loss of generality, we assume that A1 and A2 share a comparable size, i.e., O(n1) = O(n2) = O(n) and O(m1) = O(m2) = O(m). This also helps simplify the complexity analysis in the next two sections.

Notice that the alignment matrix S in Problem 1 is essentially a cross-network node similarity matrix. In some applications, we might be interested in finding a small number of similar nodes in one network w.r.t. a query node from the other network. For instance, we might want to find the top-10 most similar LinkedIn users for a given Facebook user. We could first solve Problem 1 and then return the corresponding row or column of the alignment matrix S, but this might be computationally too costly as well as unnecessary. With this in mind, we further define the on-query attributed network alignment problem as follows:

Problem 2. On-Query Attributed Network Alignment.
Given: (1) two attributed networks G1 = {A1, N1, E1} and G2 = {A2, N2, E2}, (2 - optional) a prior alignment preference H, and (3) a query node-a in G1.
Output: an n2 × 1 vector sa measuring the similarities between the query node-a in G1 and all the nodes in G2, computed efficiently.

3. TOPOLOGY MEETS ATTRIBUTES
In this section, we present our solutions for Problem 1. We start by formulating Problem 1 as a regularized optimization problem, and then propose effective algorithms to solve it, followed by some theoretic analysis.

3.1 FINAL: Optimization Formulation
The key idea behind our proposed formulation lies in the alignment consistency principle, which basically says that the alignments between two pairs of nodes across the two input networks should be consistent if these two pairs of nodes themselves are "similar/consistent" with each other. Let us elaborate on this using the following example. In Figure 2, we are given two pairs of nodes: (1) node-a in G1 and node-x in G2; and (2) node-b in G1 and node-y in G2. By the alignment consistency principle, we require the alignment between a and x and that between b and y to be consistent (i.e., small |S(x, a) − S(y, b)|) if all of the following conditions hold:

C1 Topology Consistency. a and b are close neighbors in G1 (i.e., large A1(a, b)), and x and y are also close neighbors in G2 (i.e., large A2(x, y));
C2 Node Attribute Consistency. a and x share the same node attribute value, and so do b and y;
C3 Edge Attribute Consistency. Edges (a, b) and (x, y) share the same edge attribute value.

The intuition behind the alignment consistency principle is as follows. If we already know that node-a is aligned to node-x (i.e., large S(x, a)), then their close neighbors (e.g., b and y) with the same node attribute value should have a high chance of being aligned with each other (i.e., large S(y, b)), where we say that b and y are close neighbors of a and x respectively if they are connected to them by edges with the same edge attribute value and large edge weights (i.e., large A1(a, b) and A2(x, y)). This naturally leads to the following objective function, which we wish to minimize in terms of the alignment matrix S:

J1(S) = Σ_{a,b,x,y} [S(x, a)/√f(x, a) − S(y, b)/√f(y, b)]² A1(a, b)A2(x, y)      (C1: Topology Consistency)
                    × 1(N1(a, a) = N2(x, x)) · 1(N1(b, b) = N2(y, y))           (C2: Node Attribute Consistency)
                    × 1(E1(a, b) = E2(x, y))                                    (C3: Edge Attribute Consistency)     (1)

where (1) a, b = 1, ..., n1 and x, y = 1, ..., n2; (2) 1(·) is the indicator function, which takes the value 1 if the condition inside the parentheses is true and zero otherwise; and (3) f(·) is a node-pair normalization function defined as

f(x, a) = Σ_{b,y} 1(E1(a, b) = E2(x, y)) A1(a, b)A2(x, y) 1(N1(b, b) = N2(y, y))   if N1(a, a) = N2(x, x),
f(x, a) = 1   otherwise.     (2)

The function f(x, a) measures how many (weighted) neighbor-pairs a and x have that (1) share the same node attribute value between themselves (e.g., b and y), and (2) connect to a and x via the same edge attribute value, respectively.
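As a concrete sanity check, Eq. (1) and Eq. (2) can be transcribed brute-force in a few lines of Python. This is a toy sketch of ours, not the paper's implementation: node attributes are encoded as 1-D arrays (the diagonal of N) and edge attributes as plain matrices with 0 meaning "no edge".

```python
import numpy as np

def ind(p, q):
    """Indicator function 1(p = q)."""
    return 1.0 if p == q else 0.0

def f_norm(A1, A2, N1, N2, E1, E2, x, a):
    """Node-pair normalization f(x, a) of Eq. (2)."""
    if N1[a] != N2[x]:
        return 1.0
    return sum(ind(E1[a, b], E2[x, y]) * A1[a, b] * A2[x, y] * ind(N1[b], N2[y])
               for b in range(len(N1)) for y in range(len(N2)))

def J1(S, A1, A2, N1, N2, E1, E2):
    """Brute-force objective of Eq. (1); enumerates all O(n1^2 * n2^2) terms."""
    val = 0.0
    for a in range(len(N1)):
        for b in range(len(N1)):
            for x in range(len(N2)):
                for y in range(len(N2)):
                    fxa = f_norm(A1, A2, N1, N2, E1, E2, x, a)
                    fyb = f_norm(A1, A2, N1, N2, E1, E2, y, b)
                    # Convention from Section 3.1: D(v, v) = 0 => D(v, v)^{-1/2} := 0
                    sxa = S[x, a] / np.sqrt(fxa) if fxa > 0 else 0.0
                    syb = S[y, b] / np.sqrt(fyb) if fyb > 0 else 0.0
                    val += ((sxa - syb) ** 2 * A1[a, b] * A2[x, y]     # C1
                            * ind(N1[a], N2[x]) * ind(N1[b], N2[y])    # C2
                            * ind(E1[a, b], E2[x, y]))                 # C3
    return val

# Two identical 2-node networks with one labeled edge: the correct
# one-to-one alignment (the identity) incurs zero cost.
A = np.array([[0., 1.], [1., 0.]])
Nattr = np.array([0, 0])
E = np.array([[0, 5], [5, 0]])
S_good = np.eye(2)
assert J1(S_good, A, A, Nattr, Nattr, E, E) == 0.0
```

On this toy pair, a partial alignment such as S = [[1, 0], [0, 0]] yields a strictly positive cost, as expected; the brute-force version is only useful for checking faster implementations on small inputs.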
Notice that the indicator function 1(·) reflects whether
the two input nodes/edges share the same attribute value2 ,
we factorize it as follows
XK
1(N1 (a, a) = N2 (x, x)) = Nk1 (a, a)Nk2 (x, x)
k=1
(3)
L
X Figure 2: An illustration of alignment consistency.
1(E1 (a, b) = E2 (x, y)) = El1 (a, b)El2 (x, y)
l=1
which leads to the following equation
Substitute Eq. (3) into Eq. (1) and Eq. (2), we have f + (1 − α)h
s = αWs
K L
X S(x, a) S(y, b) 2 X X k k 1 1
(8)
J1 (S) = [p
f (x, a)
− p
f (y, b)
] N1 (a, a)N2 (x, x) = αD− 2 N(E (A1 ⊗ A2 ))ND− 2 s + (1 − α)h
a,b,x,y k,k0 =1 l=1
We could directly develop an iterative algorithm based on
k0 k0
l l
×A1 (a, b)E1 (a, b)A2 (x, y)E2 (x, y)N1 (b, b)N2 (y, y) (4) Eq. (8). However, such an iterative procedure involves the
and for nodes with the same attribute value, i.e., N1 (a, a) = Kronecker product between A1 and A2 whose time com-
N2 (x, x) K X
XX L plexity is O(m2 ).
f (x, a) = Nk1 (b, b)Nk2 (y, y)El1 (a, b)El2 (x, y) In order to develop a more efficient algorithm, thanks to
b,y k=1 l=1 (5) a key Kronecker product property (i.e., vec(ABC) = (CT ⊗
A)vec(B)), we re-write Eq. (8) as follows
×A1 (a, b)A2 (x, y) L
1 X
Next, we present an equivalent matrix form of J1 , which is s = αD− 2 Nvec( (El2 A2 )Q(El1 A1 )T )+(1−α)h (9)
more convenient for the following algorithm description and l=1
1
the theoretic proof. By vectorizing the alignment matrix where Q is an n2 × n1 matrix reshaped by q = ND− 2 s in
S (i.e., s = vec(S)), and with the notation of element-wise column order, i.e., Q = mat(q, n2 , n1 ). We can show that
product and Kronecker product, Eq. (1) can be re-written Eq. (8) and Eq. (9) are equivalent with each other (detailed
as X s(v) s(w) proofs are omitted due to space). The advantage of Eq. (9)
J1 (s) = [p −p ]2 W(v, w) is that it avoids the expensive matrix Kronecker product,
D(v, v) D(w, w) (6)
v,w which leads to a more efficient iterative algorithm FINAL-
=sT (I − W)s
f NE (summarized in Algorithm 1).
where v = n2 (a − 1) + x, w = n2 (b − 1) + y, N = K k
P
k=1 N1 ⊗ Algorithm 1 FINAL-NE: Attributed Network Alignment.
k P L l l
N2 , E = l=1 E1 ⊗ E2 and W = N[E (A1 ⊗ A2 )]N. Input: (1) G1 = {A1 , N1 , E1 } and G2 = {A2 , N2 , E2 }, (2) op-
f = D− 21 WD− 21 is the symmetric normalized matrix of
W tional prior alignment preference H, (3) the regularization
W. The diagonal degree matrix D of W is defined as parameter α, and (4) the maximum iteration number tmax .
K L
Output: the alignment matrix S between G1 and G2 .
X X 0 0 1: Construct degree matrix D and node attribute matrix N;
D = diag( (Nk1 (El1 A1 )Nk1 1) ⊗ (Nk2 (El2 A2 )Nk2 1))
2: Initiate the alignment s = h = vec(H), and t = 1;
k,k0 =1 l=1 3: while t ≤ tmax do
1
Note that some diagonal elements in D could be zero (e.g., 4: Compute vector q = ND− 2 s;
D(v, v) = 0). For such elements, we define the correspond- 5: Reshape q as Q = mat(q, n2 , n1 );
ing D(v, v)−1/2 , 0. 6: Initiate an n2 × n1 zero matrix T;
Putting everything together, our proposed optimization 7: for l = 1 → L do
problem can be stated as follows. 8: Update T ← T + (El2 A2 )Q(El1 A1 )T ;
9: end for
argmins J(s) = αsT (I − W)s
f + (1 − α) k s − h k2
F (7) 10:
1
Update s ← αD− 2 Nvec(T) + (1 − α)h;
where k · kF denotes the Frobenius norm, and α is the reg- 11: Set t ← t + 1;
ularization parameter. Notice that compared with J1 , we 12: end while
13: Reshape s to S = mat(s, n2 , n1 ).
have an additional regularization term ks − hk2F to reflect
the prior alignment preference, where h = vec(H). When
no such prior information is given, we set h as a uniform
Variants of FINAL-NE.
Our proposed FINAL-NE algorithm assumes that the in-
column vector. From the optimization perspective, this ad-
put networks have both node and edge attributes. It is worth
ditional regularization term would also help prevent a trivial
pointing out that it also works when the node and/or the
solution of J1 with a zero alignment matrix S.
edge attribute information is missing.
3.2 FINAL: Optimization Algorithms First, when only node attributes are available, we can set
all entries in the edge attribute matrix E to 1 where an edge
The objective function in Eq. (7) is essentially a quadratic indeed exists. The intuition is that we treat all the edges
function w.r.t. s. We seek to find its fixed-point solution by in the networks to share a common edge attribute value. In
setting its derivative to be zero this case, the fixed-point solution in Eq. (8) becomes
∂J(s) −1 −1
= 2(I − αW)s
f + 2(1 − α)h = 0
s = αDN 2 WN DN 2 s + (1 − α)h
∂s
(10)
2 −1 −1
We remark that by replacing the indicator function 1(·) by = αDN 2 N(A1 ⊗ A2 )NDN 2 s + (1 − α)h
an attribute value similarity function, the proposed formu- k0 k0
where DN = diag( K k k
P
lation can be naturally generalized to handle the numerical k,k0 =1 (N1 A1 N1 1) ⊗ (N2 A2 N2 1)) de-
attributes on nodes and/or edges. notes the degree matrix of WN . Similar to Eq. (9), we can
1348
use the vectorization operator to accelerate the computa- Lemma 1. Complexity of FINAL-NE. The time com-
tion. We refer to this variant as FINAL-N, and omit the plexity of Algorithm 1 is O(Lmntmax + LK 2 n2 ), and its
detailed algorithm description due to space. space complexity is O(n2 ). Here, n and m are the orders
Second, when only the edge attributes are available, we treat all nodes as sharing one common node attribute value by setting N to be an identity matrix. In this case, the fixed-point solution in Eq. (8) becomes

s = αD_E^{-1/2} (E ∘ (A1 ⊗ A2)) D_E^{-1/2} s + (1 − α)h     (11)

where D_E = diag(Σ_{l=1}^{L} [(E1^l ∘ A1)1] ⊗ [(E2^l ∘ A2)1]). Again, we omit the detailed algorithm description due to space, and refer to this variant as FINAL-E.

Finally, if neither the node attributes nor the edge attributes are available, Eq. (8) degenerates to

s = αD_U^{-1/2} (A1 ⊗ A2) D_U^{-1/2} s + (1 − α)h     (12)

where D_U = D1 ⊗ D2, and D1 and D2 are the degree matrices of A1 and A2, respectively. This variant is referred to as FINAL-P.

3.3 Proofs and Analysis
In this subsection, we first analyze the convergence, the optimality and the complexity of our FINAL algorithms. Due to the space limit, we only present the results for the most general case (i.e., FINAL-NE). Then we analyze the relationships between FINAL and several classic graph mining problems.

We start with Theorem 1, which says that the proposed FINAL-NE algorithm converges to the global optimal solution of Eq. (7).

Theorem 1. Convergence and Optimality of FINAL-NE. Algorithm 1 converges to the closed-form global minimal solution of J(s): s = (1 − α)(I − αW̃)⁻¹h.

Proof. To prove the convergence of Eq. (8), we first show that the eigenvalues of αW̃ are in (−1, 1). Since W̃ is similar to the stochastic matrix WD⁻¹ = D^{1/2} W̃ D^{-1/2}, the eigenvalues of W̃ are within [−1, 1]. Since 0 < α < 1, the eigenvalues of αW̃ are in (−1, 1).

We denote the alignment vector s in the t-th iteration as s(t). We have that

s(t) = αᵗ W̃ᵗ h + (1 − α) Σ_{i=0}^{t−1} αⁱ W̃ⁱ h

Since the eigenvalues of αW̃ are in (−1, 1), we have lim_{t→+∞} αᵗ W̃ᵗ = 0 and lim_{t→+∞} Σ_{i=0}^{t−1} αⁱ W̃ⁱ = (I − αW̃)⁻¹. Putting these together, we have that

s = lim_{t→+∞} s(t) = (1 − α)(I − αW̃)⁻¹ h

Next, we prove that the above result is indeed the global minimal solution of the objective function defined in Eq. (7). We prove this by showing that J(s) in Eq. (7) is convex. To see this, note that the Hessian matrix of Eq. (7) is ∇²J = 2(I − αW̃). By Weyl's inequality theorem [4], all eigenvalues of 2(I − αW̃) are greater than 0. In other words, ∇²J is positive definite. Therefore, the objective function defined in Eq. (7) is convex, and the fixed-point solution computed by Algorithm 1 corresponds to its global minimal solution, which completes the proof.

The time and space complexity of Algorithm 1 are summarized in Lemma 1. Notice that such a complexity is comparable to the complexity of topology-alone network alignment methods, such as IsoRank [21]. In the next section, we will propose an even faster algorithm.

Lemma 1. Complexity of FINAL-NE. The time complexity of Algorithm 1 is O(Lmn·t_max + LK²n²), and its space complexity is O(n²). Here, n and m are the orders of the number of nodes and edges of the input networks, respectively; K and L denote the number of unique node and edge attributes, respectively; and t_max is the maximum iteration number.

Proof. Omitted for space.

Finally, we analyze the relationships between the proposed FINAL algorithms and several classic graph mining problems. Due to the space limit, we omit the detailed proofs and summarize the major findings as follows.

A - FINAL vs. Node Proximity.
An important (single-)network mining task is node proximity, i.e., measuring the proximity/similarity between two nodes of the same network. By ignoring the node/edge attributes and setting A1 = A2, our FINAL algorithms, up to a scaling operation D^{1/2}, degenerate to SimRank [12], a prevalent choice for node proximity. Our FINAL algorithms are also closely related to another popular node proximity method, random walk with restart [24]. That is, Eq. (8) can be viewed as random walk with restart on the attributed Kronecker graph with h being the starting vector. Note that neither the standard SimRank nor random walk with restart considers the node or edge attribute information.

B - FINAL vs. Graph Kernel.
The alignment result s of our FINAL algorithms is closely related to the random walk based graph kernel [25]. To be specific, if k(G1, G2) is the random walk graph kernel between the two input graphs and p is the stopping vector, we can show that k(G1, G2) = pᵀs. This intuitively makes sense, as we can view the graph kernel/similarity as the weighted aggregation (by the stopping vector p) of the pairwise cross-network node similarities (encoded by the alignment vector s). We also remark that the original random walk graph kernel [25] mainly focuses on the edge attribute information.

C - FINAL vs. Existing Network Alignment Methods.
If we ignore all the node and edge attribute information, our FINAL-P algorithm is equivalent to IsoRank [21] after scaling the alignment result and the alignment preference by D^{1/2}. We would like to point out that such a scaling operation is important to ensure the convergence of the iterative procedure. Recall that the key idea behind our optimization formulation is the alignment consistency. When the attribute information is absent, the alignment consistency principle is closely related to the concept of "squares" behind the NetAlign algorithm [2]. Like most, if not all, of the existing network alignment algorithms, the node and edge attribute information is ignored in IsoRank and NetAlign.

We remark that these findings are important in the following two aspects. First, they help establish a quantitative relationship between several seemingly unrelated graph mining problems, which might in turn help better understand these existing problems. Second, these findings also have an important algorithmic implication. Take SimRank as an example: it was originally designed for plain graphs (i.e., without attributes), it was formulated from a random walk perspective, and it is not clear what the algorithm tries to optimize. By setting G1 = G2 and ignoring the attribute information, our objective function in Eq. (7) provides a natural way to interpret SimRank from an optimization perspective.
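Concretely, with G1 = G2 = G and no attributes, Eq. (12) reads s = α(Ã ⊗ Ã)s + (1 − α)h, and the identity vec(ÃSÃ) = (Ã ⊗ Ã)vec(S) turns its fixed point into the SimRank-style matrix recursion S = αÃSÃ + (1 − α)H. A toy numeric check of this single-network specialization (our own sketch, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(3)
n, alpha = 5, 0.5

# One network, no attributes: A1 = A2 = A, with A~ = D^{-1/2} A D^{-1/2}.
A = rng.random((n, n)); A = (A + A.T) / 2; np.fill_diagonal(A, 0)
dis = 1.0 / np.sqrt(A.sum(axis=1))
At = A * np.outer(dis, dis)

H = np.full((n, n), 1.0 / n**2)

# Matrix recursion S = alpha * A~ S A~ + (1 - alpha) H,
# solved by fixed-point iteration.
S = H.copy()
for _ in range(200):
    S = alpha * At @ S @ At + (1 - alpha) * H

# Same fixed point via the vectorized form of Eq. (12):
# s = (1 - alpha)(I - alpha * A~ (x) A~)^{-1} h.
h = H.flatten(order="F")
s = (1 - alpha) * np.linalg.solve(np.eye(n * n) - alpha * np.kron(At, At), h)
assert np.allclose(S.flatten(order="F"), s)
```

The matrix form needs only O(n²)-sized iterates, which is exactly why reading the fixed point as an optimization problem over S (rather than over the n² × n² Kronecker system) is useful in practice.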
By setting G1 = G2 alone (i.e., keeping the attribute information), our FINAL algorithms can be directly used to measure node proximity on an attributed network. Finally, our upcoming FINAL On-Query algorithm also naturally provides an efficient way (i.e., with a linear time complexity) for on-query SimRank, with or without attribute information (i.e., finding the similarity between a given query node and all the remaining nodes in the same network).

4. SPEED-UP COMPUTATION

Proof. Omitted for space.

Algorithm 2 FINAL-N+: Low-Rank Approximation of FINAL-N.
Input: (1) G1 = {A1, N1} and G2 = {A2, N2}, (2) optional prior alignment preference H, (3) the regularization parameter α, and (4) the rank r of the eigenvalue decomposition.
Output: approximate alignment matrix S between G1 and G2.
1: Construct degree matrix D_N and node attribute matrix N;
2: Construct alignment preference vector h = vec(H);
3: Eigenvalue decomposition U1 Λ1 U1ᵀ ← A1, U2 Λ2 U2ᵀ ← A2;
4: Compute U = U1 ⊗ U2;
5: Compute Λ = [(Λ1 ⊗ Λ2)⁻¹ − α Uᵀ N D_N⁻¹ N U]⁻¹;
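Steps 3-5 above rest on the Kronecker eigen-structure: if A1 = U1Λ1U1ᵀ and A2 = U2Λ2U2ᵀ, then A1 ⊗ A2 = (U1 ⊗ U2)(Λ1 ⊗ Λ2)(U1 ⊗ U2)ᵀ, so rank-r eigendecompositions of the two inputs yield a rank-r² approximation of the Kronecker product without ever forming it. A quick dense check (our own toy sketch; the real algorithm keeps only the top-r eigenpairs):

```python
import numpy as np

rng = np.random.default_rng(1)
A1 = rng.random((3, 3)); A1 = (A1 + A1.T) / 2
A2 = rng.random((4, 4)); A2 = (A2 + A2.T) / 2

# Eigendecompositions of the two symmetric adjacency matrices (step 3).
lam1, U1 = np.linalg.eigh(A1)
lam2, U2 = np.linalg.eigh(A2)

# A1 (x) A2 = (U1 (x) U2)(Lam1 (x) Lam2)(U1 (x) U2)^T; steps 4-5 never
# materialize this at full size, we do so here only to verify.
U = np.kron(U1, U2)
Lam = np.kron(np.diag(lam1), np.diag(lam2))
assert np.allclose(np.kron(A1, A2), U @ Lam @ U.T)
```

With a rank-r truncation, U1 and U2 are n × r, so the matrix inverted in step 5 is only r² × r², which is how FINAL-N+ avoids a full n² × n² solve.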
• Effectiveness: How accurate are our algorithms for
O(nr) O(nr)
aligning attributed networks?
p X
K
z }| { z }| { • Efficiency: How fast are our proposed algorithms?
X −1 −1
g= σi (UT k
1 N 1 D1
2
v i ) ⊗ ( UT k
2 N 2 D2
2
ui ) (19)
i=1 k=1
| {z } 5.1 Experimental Setup
O(nr+r 2 )=O(nr)
Datasets. We evaluate our proposed algorithms on six real-
We can see that the time cost for Eq. (19) is reduced to world attributed networks.
O(pKrn), which is linear w.r.t the number of nodes n. • DBLP Co-Authorship Network: This dataset contains
We reduce the time cost for computing Λ̂ by reformulating 42,252 nodes and 210,320 edges [19]. Each author has
as follows, whose time complexity is O(Knr2 + Kr4 + r6 ) a feature vector which represents the number of publi-
O(nr 2 ) O(nr 2 ) cations of the author in each of 29 major conferences.
K z }| { z }| {
Λ̂ = [(Λ2 ⊗ Λ1 )−1 −α
X
(UT k −1 T k −1 −1 • Douban: This Douban dataset was collected in 2010
1 N1 D1 U1 ) ⊗ (U2 N2 D2 U2 )]
| {z } k=1
| {z } and contains 50k users and 5M edges [31]. Each user
O(r 2 ) O(nr 2 +r 4 ) has rich information, such as the location and offline
(20) event participation. Each edge has an attribute repre-
Putting everything together, the ranking vector of node-a senting whether two users are contacts or friends.
now becomes O(n) O(Kn)
z }| {z }| { • Flickr: This dataset was collected in 2014 and con-
K sists of 215,495 users and 9,114,557 friend relation-
1
−2
X
sa =(1 − α)H(:, a) + α(1 − α)[(D1 (a, a)D2 ) Nk1 (a, a)Nk2 ] ships. Users have detailed profile information, such
k=1
as gender, hometown and occupation, each of which
×[(U1 (a, :) ⊗ U2 ) Λ̂ g ] (21) can be treated as the node attributes [30].
| {z } |{z} |{z}
O(nr 2 ) O(Knr 2 +Kr 4 +r 6 ) O(pKnr) • Lastfm: This dataset was collected in 2013 and con-
tains 136,420 users and 1,685,524 following relation-
Based on Eq. (21), our proposed FINAL On-Query algo-
ships [30]. A detailed profile of some users is also pro-
rithm is summarized in Algorithm 3. The time complexity
vided, including gender, age and location, etc.
of FINAL On-Query is summarized in Lemma 3. Notice
that we often have r, p n, mH m n2 and K n. • Myspace: This dataset contains 854,498 users and 6
FINAL On-Query has a linear time complexity w.r.t the ,489,736 relationships. The profile of users includes
size of the input network, which is much more scalable than gender, hometown and religion, etc. [30].
both FINAL-N and FINAL-N+. • ArnetMiner: ArnetMiner dataset consists of the in-
Algorithm 3 FINAL On-Query: Approximate On-Query Algorithm for Node Attributed Networks.
Input: (1) G_1 = {A_1, N_1} and G_2 = {A_2, N_2}, (2) optional prior alignment preference H, (3) the regularization parameter α, (4) the rank of the eigenvalue decomposition r, and (5) the rank of the SVD for H, p.
Output: approximate ranking vector s_a between node-a in G_1 and all nodes in G_2.
Pre-Compute:
1: Compute degree matrices D_1 and D_2;
2: Compute D_a = D_1(a, a) D_2, and N_a = Σ_{k=1}^{K} N_1^k(a, a) N_2^k;
3: Rank-r eigenvalue decomposition U_1 Λ_1 U_1^T ← A_1;
4: Rank-r eigenvalue decomposition U_2 Λ_2 U_2^T ← A_2;
5: Rank-p singular value decomposition Σ_{i=1}^{p} σ_i u_i v_i^T ← H;
6: Compute g by Eq. (19);
7: Compute Λ̂ by Eq. (20);
Online-Query:
8: Compute s_a by Eq. (21).
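To make the pre-compute/online-query split of Algorithm 3 concrete, below is a minimal NumPy sketch for the special case of a single node attribute (K = 1). Eq. (19) for g lies outside this excerpt, so `g` is stubbed with a hypothetical placeholder; the dense eigendecomposition and the stub are illustrative assumptions, not the authors' MATLAB implementation.

```python
import numpy as np

def on_query_align(A1, A2, n1attr, n2attr, H, a, alpha=0.3, r=4):
    """Sketch of FINAL On-Query (Algorithm 3) for K = 1 node attribute.
    A1, A2: symmetric adjacency matrices; n1attr, n2attr: the diagonals of
    the node-attribute matrices N1, N2; H: n2 x n1 prior preference matrix."""
    # Step 1: degree vectors (the diagonals of D1, D2).
    d1, d2 = A1.sum(axis=1), A2.sum(axis=1)

    # Steps 3-4: rank-r eigendecomposition U Lambda U^T <- A
    # (a sparse Lanczos solver would be used at scale).
    def topr_eig(A):
        w, V = np.linalg.eigh(A)
        idx = np.argsort(-np.abs(w))[:r]     # keep r largest-magnitude pairs
        return V[:, idx], w[idx]
    U1, L1 = topr_eig(A1)
    U2, L2 = topr_eig(A2)

    # Step 7 / Eq. (20): the small (r^2 x r^2) matrix hat_Lambda.
    T1 = U1.T @ (U1 * (n1attr / d1)[:, None])   # U1^T N1 D1^{-1} U1, r x r
    T2 = U2.T @ (U2 * (n2attr / d2)[:, None])   # U2^T N2 D2^{-1} U2, r x r
    hat_L = np.linalg.inv(
        np.linalg.inv(np.kron(np.diag(L2), np.diag(L1))) - alpha * np.kron(T1, T2))

    # Hypothetical stand-in for g of Eq. (19), which is not reproduced here:
    g = np.kron(U1.T @ H.T.sum(axis=1), U2.T @ H.sum(axis=1))  # length r^2

    # Step 8 / Eq. (21): assemble the ranking vector for query node a.
    diag_part = (d1[a] * d2) ** -0.5 * (n1attr[a] * n2attr)
    s_a = (1 - alpha) * H[:, a] + alpha * (1 - alpha) * diag_part * (
        np.kron(U1[a, :][None, :], U2) @ hat_L @ g)
    return s_a
```

Note that everything except step 8 is query-independent and can be cached, which is what makes the online phase fast; only shapes and the pre-compute/online split are taken from the paper.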
Lemma 3. Time complexity of FINAL On-Query. The time complexity of FINAL On-Query is O(r^6 + mr + nr^2 + m_H p + np^2 + Knr^2 + Kr^4 + pKnr), where n and m are the numbers of nodes and edges respectively, r and p are the ranks of the eigenvalue decomposition and the SVD respectively, K is the number of node attributes, and m_H is the number of non-zero elements in H.

Proof. Omitted for brevity.

5. EXPERIMENTAL RESULTS
In this section, we present the experimental results and analysis of our proposed FINAL algorithms. The experiments are designed to evaluate the effectiveness (alignment accuracy), the quality-speed trade-off, and the scalability of the proposed algorithms. We use the following datasets.
• DBLP: This is a co-authorship network, in which each author has a feature vector which represents the number of publications of the author in each of 29 major conferences.
• Douban: This Douban dataset was collected in 2010 and contains 50k users and 5M edges [31]. Each user has rich information, such as the location and offline event participation. Each edge has an attribute representing whether two users are contacts or friends.
• Flickr: This dataset was collected in 2014 and consists of 215,495 users and 9,114,557 friend relationships. Users have detailed profile information, such as gender, hometown and occupation, each of which can be treated as a node attribute [30].
• Lastfm: This dataset was collected in 2013 and contains 136,420 users and 1,685,524 following relationships [30]. A detailed profile of some users is also provided, including gender, age, location, etc.
• Myspace: This dataset contains 854,498 users and 6,489,736 relationships. The profile of users includes gender, hometown, religion, etc. [30].
• ArnetMiner: The ArnetMiner dataset consists of the information up to year 2013. The whole dataset has 1,053,188 nodes and 3,916,907 undirected edges [30].
Based on these datasets, we construct the following five alignment scenarios for evaluation.
S1. DBLP vs. DBLP. We extract a subnetwork with 9,143 users/nodes from the original dataset, together with their publications in each conference. We randomly permute this subnetwork with noisy edge weights and treat it as the second network. We choose the most active conference of a given author as the node attribute, i.e., the conference with the most publications. We construct the prior alignment preference H based on the node degree similarity. For this scenario, the prior alignment matrix H alone leads to a very poor alignment result, with only 0.6% one-to-one alignment accuracy.
S2. Douban Online vs. Douban Offline. We construct an alignment scenario for the Douban dataset in the same way as [31]. We construct the offline network according to users' co-occurrence in social gatherings. We treat people as (1) 'contacts' of each other if they participate in the same offline events more than ten times but fewer than twenty times, and (2) 'friends' if they co-participate in more than twenty social gatherings. The constructed offline network has 1,118 users, and we extract a subnetwork with 3,906 nodes from the provided online network that contains all these offline users. We treat the location of a user as the node attribute, and 'contacts'/'friends' as the edge attribute. We use the degree similarity to construct the prior alignment preference H. The prior alignment matrix H alone leads to 7.07% one-to-one alignment accuracy.
S3. Flickr vs. Lastfm. We have the partial ground-truth alignment for these two datasets [30]. We extract the subnetworks from them that contain the given ground-truth nodes. The two subnetworks have 12,974 nodes and 15,436 nodes, respectively. We consider the gender of a user as the node attribute; for those users with missing gender information, we treat the gender as 'unknown'. Same as [30], we sort nodes by their PageRank scores and label the 1% highest-ranked nodes as 'opinion leaders', the next 10% of nodes as 'middle class' and the remaining nodes as 'ordinary users'. Edges are attributed accordingly.
S4. Flickr vs. Myspace. We use the same method as S3 for node attributes, edge attributes and the prior alignment preference. The username similarity alone can correctly align 61.80% of the users.
S5. ArnetMiner vs. ArnetMiner. We use the same method as S1.
For our proposed algorithms, we test the following variants: (1) FINAL-NE, with both node and edge attributes; (2) FINAL-N, with node attributes only; (3) FINAL-E, with edge attributes only; and (4) FINAL-N+, a low-rank based approximate algorithm for FINAL-N. We compare them with the following existing network alignment algorithms: (1) IsoRank [21], (2) NetAlign [2], (3) UniAlign [11] and (4) Klau's Algorithm [2, 9].
Machines and Repeatability. All experiments are performed on a Windows machine with four 3.6GHz Intel Cores and 32GB RAM. The algorithms are implemented in MATLAB using a single thread. We intend to release the source code as well as all the non-proprietary datasets after the paper is published.
First, our FINAL algorithms achieve a 20%-30% improvement in terms of the alignment accuracy over the existing methods (i.e., IsoRank, NetAlign and UniAlign). Second, FINAL-N and FINAL-E both outperform the existing methods, yet are not as good as FINAL-NE, suggesting that node attributes and edge attributes might be complementary in terms of improving the alignment accuracy. Third, the alignment accuracy of FINAL-N+ is very close to that of its exact counterpart FINAL-N (i.e., a 95% accuracy compared with FINAL-N). Fourth, by jointly considering the attributes and the topology of the networks, our methods are more resilient to the permutation noise. Finally, for two networks whose topologies are dramatically different from each other (e.g., the Douban online-offline networks), the accuracy gap between FINAL-N+ and the existing methods is even bigger (Figure 3(b)). This is because in this case, the topology information alone (IsoRank, NetAlign and Klau) could actually mislead the alignment process.
Second, we evaluate the impact of the noise in the prior alignment preference (i.e., H) on the alignment results, which is summarized in Figure 4. As expected, a higher noise level in H has a more negative impact on the alignment accuracy for most of the methods. Nonetheless, our FINAL algorithms still consistently outperform the other four existing methods across different noise levels.
Figure 4: (Higher is better.) Alignment accuracy vs. the noise level in H (α = 0.3, tmax = 30, r = 5). (a) Flickr-Lastfm networks. (b) Flickr-Myspace networks.
Quality-Speed Trade-off. We first evaluate how different methods balance the alignment accuracy and the running time for the full network alignment problem (i.e., Problem 1). The results are summarized in Figure 5. As we can see, the running time of our proposed exact methods (e.g., FINAL-N, FINAL-E) is only slightly higher than that of their topology-alone counterpart (i.e., IsoRank), and meanwhile they all achieve a 10%-20% accuracy improvement. FINAL-N+ and UniAlign are the fastest, yet the proposed FINAL-N+ produces a much higher alignment accuracy. NetAlign takes the longest running time, as it involves a time-consuming greedy/max-weight matching process during each iteration. We do not show the trade-off of Klau's Algorithm because its running time is usually several hours, which is not comparable with the other methods. Overall, FINAL-N+ achieves the best trade-off between the alignment accuracy and the running time.
Figure 5: Balance between the accuracy and the speed (tmax = 30, r = 5). (a) DBLP co-author networks, α = 0.8. (b) Flickr-Lastfm networks, α = 0.3.
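Scenarios S1 and S2 above construct the prior alignment preference H from node-degree similarity, and the Figure 4 experiment injects noise into this H. The paper's exact construction is not shown in this excerpt; one plausible sketch is below, where the min/max degree-similarity form and the uniform noise model are assumptions:

```python
import numpy as np

def degree_similarity_prior(A1, A2, noise_level=0.0, seed=0):
    """Hypothetical degree-similarity prior H (n2 x n1), optionally perturbed
    with uniform noise as in the Figure 4 experiment. Assumes no isolated
    nodes, so every degree is positive."""
    rng = np.random.default_rng(seed)
    d1, d2 = A1.sum(axis=1), A2.sum(axis=1)
    # Pairwise degree similarity between every node of G2 and every node of G1:
    # similar degrees give values close to 1, dissimilar degrees close to 0.
    H = np.minimum.outer(d2, d1) / np.maximum.outer(d2, d1)
    H = H + noise_level * rng.random(H.shape)   # inject noise into the prior
    return H / H.sum()                          # normalize the preference mass
```

Raising `noise_level` progressively drowns the degree signal, which mirrors the x-axis of Figure 4.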
Second, we evaluate the quality-speed trade-off for the on-query alignment problem (i.e., Problem 2). Here, we treat the top-10 ranking results by FINAL-N as the ground-truth, and compare the average ranking accuracy of 500 random nodes for the two proposed approximate algorithms (FINAL-N+ and FINAL On-Query). The results are summarized in Figure 6. We can see that (1) FINAL-N+ preserves a 95% ranking accuracy with a more than 10× speedup over FINAL-N, and (2) FINAL On-Query preserves about a 90% ranking accuracy and is 100× faster than the exact FINAL-N.
Figure 6: Alignment accuracy vs. running time for on-query alignment.
Scalability. We first evaluate the scalability of FINAL-N+, which is summarized in Figure 7. We can see that the running time is quadratic w.r.t. the number of nodes of the input networks, which is consistent with the time complexity results in Lemma 2. Second, we evaluate the scalability of FINAL On-Query, for both its pre-compute phase and its online-query phase. As we can see from Figure 8, the running time is linear w.r.t. the number of nodes in both stages, which is consistent with Lemma 3. In addition, the actual online-query time on the entire ArnetMiner dataset (with r = 10) is less than 1 second, suggesting that the proposed FINAL On-Query method might be well suited for real-time query response.
Figure 7: Scalability of FINAL-N+.
Figure 8: Scalability of FINAL On-Query. (a) Pre-compute phase. (b) Online-query phase.

6. RELATED WORK
Network alignment has attracted a lot of research interest, and there is an extensive literature. The problem appears in numerous domains, ranging from database schema matching [15], bioinformatics [21, 13, 8], chemistry [22] and computer vision [7, 20] to data mining [11, 2].
A classic alignment approach is the IsoRank algorithm [21], which is in turn inspired by PageRank [17]. The original IsoRank algorithm propagates the pairwise node similarity in the Kronecker product graph. Several approximate algorithms have been proposed to speed up its computation. Kollias et al. [10] propose an efficient method based on uncoupling and decomposition. SpaIsoRank modifies IsoRank to maximize the number of "squares" for sparse networks [2]. In addition, IsoRankN [13] extends the original IsoRank algorithm and uses an approach similar to PageRank-Nibble [1] to align multiple networks.
Bayati et al. [3] propose a maximum weight matching algorithm for graph alignment using max-product belief propagation [26]. Bradde et al. [5] propose another distributed message-passing algorithm based on belief propagation for protein-protein interaction network alignment. More recently, NetAlign [2] formulates the network alignment problem as an integer quadratic programming problem that maximizes the number of "squares"; a near-optimal solution is obtained by finding the maximum a posteriori (MAP) assignment with a message-passing approach. BigAlign formulates the bipartite network alignment problem and uses alternating projected gradient descent to solve it [11]. Zhang et al. solve the multiple anonymized network alignment problem in two steps, i.e., unsupervised anchor link inference and transitive multi-network matching [28].
A related problem is to identify users across multiple social networks (i.e., the cross-site user identification problem). Zafarani et al. identify users by modeling user behavior patterns based on human limitations, exogenous factors and endogenous factors [27]. Tan et al. [23] propose a subspace learning method which models user relationships with a hypergraph. Liu et al. propose a method to identify the same users by behavior modeling, structure consistency modeling, and learning via multi-objective optimization [14]. COSNET [30] considers both local and global consistency, and uses an energy-based model to find connections among multiple heterogeneous networks.

7. CONCLUSION
In this paper, we study the attributed network alignment problem, including the full alignment version as well as its on-query variant. We propose an optimization-based formulation based on the alignment consistency principle, together with a family of effective and efficient algorithms to solve it. In detail, we first propose exact algorithms (FINAL) which are proved to converge to the global optimum, with a complexity comparable to that of their topology-alone counterparts. We then propose (1) an approximate alignment algorithm (FINAL-N+) to further reduce the time complexity, and (2) an effective alignment algorithm (FINAL On-Query) to solve the on-query network alignment problem with a linear time complexity. We conduct extensive empirical evaluations on real networks, which demonstrate that (1) by assimilating the attribute information during the alignment process, the proposed FINAL algorithms significantly improve the alignment accuracy, by up to 30% over the existing methods; (2) the proposed approximate alignment algorithm (FINAL-N+) achieves a good balance between the running time and the alignment accuracy; and (3) the proposed on-query alignment algorithm (FINAL On-Query) (a) preserves around a 90%+ ranking accuracy, (b) scales linearly w.r.t. the size of the input networks, and (c) responds in near real time. Future work includes generalizing the FINAL algorithms to handle dynamic networks.
8. ACKNOWLEDGEMENTS
This work is partially supported by the National Science Foundation under Grant No. IIS1017415, by DTRA under grant number HDTRA1-16-0017, by the Army Research Office under contract number W911NF-16-1-0168, by the National Institutes of Health under grant number R01LM011986, by the Region II University Transportation Center under project number 49997-33 25, and by a Baidu gift. We would like to sincerely thank Dr. Jie Tang and Dr. Yutao Zhang for their generosity in sharing the datasets, and the anonymous reviewers for their insightful and constructive comments.

9. REFERENCES
[1] R. Andersen, F. Chung, and K. Lang. Local graph partitioning using pagerank vectors. IEEE, 2006.
[2] M. Bayati, M. Gerritsen, D. F. Gleich, A. Saberi, and Y. Wang. Algorithms for large, sparse network alignment problems. IEEE, 2009.
[3] M. Bayati, D. Shah, and M. Sharma. Maximum weight matching via max-product belief propagation. IEEE, 2005.
[4] R. Bhatia. Linear algebra to quantum cohomology: the story of Alfred Horn's inequalities. The American Mathematical Monthly, 108(4):289–318, 2001.
[5] S. Bradde, A. Braunstein, H. Mahmoudi, F. Tria, M. Weigt, and R. Zecchina. Aligning graphs and finding substructures by a cavity approach. EPL (Europhysics Letters), 89(3):37009, 2010.
[6] W. Cohen, P. Ravikumar, and S. Fienberg. A comparison of string metrics for matching names and records. 2003.
[7] D. Conte, P. Foggia, C. Sansone, and M. Vento. Thirty years of graph matching in pattern recognition. International Journal of Pattern Recognition and Artificial Intelligence, 18(03):265–298, 2004.
[8] S. Hashemifar and J. Xu. HubAlign: an accurate and efficient method for global alignment of protein–protein interaction networks. Bioinformatics, 30(17):i438–i444, 2014.
[9] G. W. Klau. A new graph-based method for pairwise global network alignment. BMC Bioinformatics, 10(Suppl 1):S59, 2009.
[10] G. Kollias, S. Mohammadi, and A. Grama. Network similarity decomposition (NSD): A fast and scalable approach to network alignment. IEEE Transactions on Knowledge and Data Engineering, 24(12):2232–2243, 2012.
[11] D. Koutra, H. Tong, and D. Lubensky. Big-Align: Fast bipartite graph alignment. In Data Mining (ICDM), 2013 IEEE 13th International Conference on, pages 389–398. IEEE, 2013.
[12] C. Li, J. Han, G. He, X. Jin, Y. Sun, Y. Yu, and T. Wu. Fast computation of SimRank for static and dynamic information networks. ACM, 2010.
[13] C.-S. Liao, K. Lu, M. Baym, R. Singh, and B. Berger. IsoRankN: spectral methods for global alignment of multiple protein networks. Bioinformatics, 25(12):i253–i258, 2009.
[14] S. Liu, S. Wang, F. Zhu, J. Zhang, and R. Krishnan. HYDRA: Large-scale social identity linkage via heterogeneous behavior modeling. ACM, 2014.
[15] S. Melnik, H. Garcia-Molina, and E. Rahm. Similarity flooding: A versatile graph matching algorithm and its application to schema matching. IEEE, 2002.
[16] J. Ni, H. Tong, W. Fan, and X. Zhang. Inside the atoms: ranking on a network of networks. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1356–1365. ACM, 2014.
[17] L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank citation ranking: bringing order to the web. 1999.
[18] W. W. Piegorsch and G. Casella. Erratum: inverting a sum of matrices. SIAM Review, 32(3):470, 1990.
[19] A. Prado, M. Plantevit, C. Robardet, and J.-F. Boulicaut. Mining graph topological patterns: Finding covariations among vertex descriptors. IEEE Transactions on Knowledge and Data Engineering, 25(9):2090–2104, 2013.
[20] C. Schellewald and C. Schnörr. Probabilistic subgraph matching based on convex relaxation. Springer, 2005.
[21] R. Singh, J. Xu, and B. Berger. Global alignment of multiple protein interaction networks with application to functional orthology detection. Proceedings of the National Academy of Sciences, 105(35):12763–12768, 2008.
[22] A. Smalter, J. Huan, and G. Lushington. GPM: A graph pattern matching kernel with diffusion for chemical compound classification. IEEE, 2008.
[23] S. Tan, Z. Guan, D. Cai, X. Qin, J. Bu, and C. Chen. Mapping users across networks by manifold alignment on hypergraph. 2014.
[24] H. Tong, C. Faloutsos, and J.-Y. Pan. Fast random walk with restart and its applications. 2006.
[25] S. V. N. Vishwanathan, N. N. Schraudolph, R. Kondor, and K. M. Borgwardt. Graph kernels. The Journal of Machine Learning Research, 11:1201–1242, 2010.
[26] J. S. Yedidia, W. T. Freeman, and Y. Weiss. Understanding belief propagation and its generalizations. Exploring Artificial Intelligence in the New Millennium, 8:236–239, 2003.
[27] R. Zafarani and H. Liu. Connecting users across social media sites: a behavioral-modeling approach. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 41–49. ACM, 2013.
[28] J. Zhang and S. Y. Philip. Multiple anonymized social networks alignment. Network, 3(3):6, 2015.
[29] Y. Zhang. Browser-oriented universal cross-site recommendation and explanation based on user browsing logs. ACM, 2014.
[30] Y. Zhang, J. Tang, Z. Yang, J. Pei, and P. S. Yu. COSNET: Connecting heterogeneous social networks with local and global consistency. ACM, 2015.
[31] E. Zhong, W. Fan, J. Wang, L. Xiao, and Y. Li. ComSoc: adaptive transfer of user behaviors over composite social network. ACM, 2012.