
Incremental Algorithms for Closeness Centrality

Ahmet Erdem Sarıyüce (1,2), Kamer Kaya (1), Erik Saule (1), Ümit V. Çatalyürek (1,3)

Depts. of (1) Biomedical Informatics, (2) Computer Science and Engineering, (3) Electrical and Computer Engineering
The Ohio State University
Email: [email protected], [email protected], [email protected], [email protected]

Abstract—Centrality metrics have been shown to be highly correlated with the importance and loads of the nodes in network traffic. In this work, we provide fast incremental algorithms for closeness centrality computation. Our algorithms efficiently compute the closeness centrality values upon changes in network topology, i.e., edge insertions and deletions. We show that the proposed techniques are efficient on many real-life networks, especially on small-world networks, which have a small diameter and a spike-shaped shortest-distance distribution. We experimentally validate the efficiency of our algorithms on large-scale networks and show that they can update the closeness centrality values of 1.2 million authors in the temporal DBLP-coauthorship network 460 times faster than it would take to recompute them from scratch.

Keywords—closeness centrality; dynamic networks; small-world networks

[Figure 1. A toy network with eight nodes (a through h), three consecutive edge (ah, fh, and ab, respectively) insertions/deletions, and the resulting closeness centrality values.]
I. INTRODUCTION

Centrality metrics, such as closeness or betweenness, quantify how central a node is in a network. They have been successfully used to carry out analyses for various purposes, such as structural analysis of knowledge networks [14, 18], power grid contingency analysis [7], quantifying importance in social networks [12], analysis of covert networks [9], and even finding the best store locations in cities [15]. Several works on the rapid computation of these metrics exist in the literature. The algorithm with the best time complexity to compute centrality metrics [2] is believed to be asymptotically optimal [8]. Research has focused either on approximation algorithms for computing centrality metrics [3, 4, 13] or on high performance computing techniques [11, 19]. Today, the networks one needs to analyze can be quite large and dynamic, and better analysis techniques are always required.

In a dynamic and streaming network, ensuring the correctness of the centralities is a challenging task [5, 10]. Furthermore, for some applications involving a static network, such as the contingency analysis of power grids and the robustness evaluation of networks, we need to know, in order to be prepared and take proactive measures, how the centrality values change when the network topology is modified by an adversary or by outer effects such as natural disasters. As Figure 1 shows, the effect of a local topology modification is usually global. To quantify these effects and find exact centrality scores, existing algorithms are not efficient enough to be used in practice. Novel, incremental algorithms are essential to quickly evaluate the effects of topology modifications on the centrality values.

Our main contributions are incremental algorithms which efficiently update the closeness centralities upon edge insertions and deletions. Compared with the existing algorithms, our algorithms have a low memory footprint, which makes them practical and applicable to very large graphs. For random edge insertions/deletions in the Wikipedia users' communication graph, we reduced the centrality (re)computation time from 2 days to 16 minutes. And for the real-life temporal DBLP coauthorship network, we reduced the time from 1.3 days to 4.2 minutes.

The rest of the paper is organized as follows: Section II introduces the notation and the closeness centrality metric. Our algorithms are explained in detail in Section III. Related work is given in Section IV. An experimental analysis is given in Section V. Section VI concludes the paper.

II. BACKGROUND

Let G = (V, E) be a network modeled as a simple graph with n = |V| vertices and m = |E| edges, where each node is represented by a vertex in V and each node-node interaction is represented by an edge in E. Let Γ_G(v) be the set of vertices connected to v.

A graph G' = (V', E') is a subgraph of G if V' ⊆ V and E' ⊆ E. A path is a sequence of vertices such that there exists an edge between each consecutive vertex pair. A path between two vertices s and t is denoted by s ⇝ t (or s ⇝_P t if a specific path P with endpoints s and t is mentioned). Two vertices u, v ∈ V are connected if there is a path between u and v. If all vertex pairs in G are connected, we say that G is connected; otherwise, it is disconnected, and each maximal connected subgraph of G is a connected component, or simply a component, of G. We use d_G(u, v) to denote the length of the shortest path between two vertices u, v in a graph G. If u = v, then d_G(u, v) = 0; if u and v are disconnected, then d_G(u, v) = ∞.
Given a graph G = (V, E), a vertex v ∈ V is called an articulation vertex if the graph G − v (obtained by removing v) has more connected components than G. Similarly, an edge e ∈ E is called a bridge if G − e (obtained by removing e from E) has more connected components than G. G is biconnected if it is connected and does not contain an articulation vertex. A maximal biconnected subgraph of G is a biconnected component.

A. Closeness Centrality

Given a graph G, the farness of a vertex u is defined as

    far[u] = Σ_{v ∈ V, d_G(u,v) ≠ ∞} d_G(u, v),

and the closeness centrality of u is defined as

    cc[u] = 1 / far[u].    (1)

If u cannot reach any vertex in the graph, then cc[u] = 0. For example, in a path graph a–b–c, far[b] = 1 + 1 = 2 and far[a] = 1 + 2 = 3, so cc[b] = 1/2 > cc[a] = 1/3, matching the intuition that the middle vertex is the most central.

For a sparse unweighted graph G = (V, E), the complexity of cc computation is O(n(m + n)) [2]. For each vertex s ∈ V, Algorithm 1 executes a Single-Source Shortest Paths (SSSP) computation: it initiates a breadth-first search (BFS) from s, computes the distances to the other vertices and far[s], the sum of the distances that are different from ∞, and, as the last step, computes cc[s]. Since a BFS takes O(m + n) time and n SSSPs are required in total, the complexity follows.
Algorithm 1: CC: Basic centrality computation

Data: G = (V, E)
Output: cc[.]
for each s ∈ V do
    ▷ SSSP(G, s) with centrality computation
    Q ← empty queue
    d[v] ← ∞, ∀v ∈ V \ {s}
    Q.push(s), d[s] ← 0
    far[s] ← 0
    while Q is not empty do
        v ← Q.pop()
        for all w ∈ Γ_G(v) do
            if d[w] = ∞ then
                Q.push(w)
                d[w] ← d[v] + 1
                far[s] ← far[s] + d[w]
    cc[s] ← 1 / far[s]
return cc[.]
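For concreteness, the following is a minimal C sketch of Algorithm 1 on a graph stored in the compressed row storage (CRS) format used by our implementation (see Section V). The struct layout and the names xadj and adj are illustrative assumptions of this sketch, not the exact code of the paper.

#include <stdio.h>
#include <stdlib.h>

/* CRS graph: the neighbors of vertex v are adj[xadj[v]] .. adj[xadj[v+1]-1]. */
typedef struct {
    int n;      /* number of vertices */
    int *xadj;  /* n+1 row pointers */
    int *adj;   /* 2m column indices (each undirected edge stored twice) */
} graph;

/* Algorithm 1: one BFS per source s; far[s] is accumulated as vertices
 * are discovered, and cc[s] = 1 / far[s] at the end. */
void cc_basic(const graph *g, double *cc) {
    int *dist  = malloc(g->n * sizeof *dist);
    int *queue = malloc(g->n * sizeof *queue);
    for (int s = 0; s < g->n; s++) {
        for (int v = 0; v < g->n; v++) dist[v] = -1;  /* -1 encodes infinity */
        long long far = 0;
        int head = 0, tail = 0;
        dist[s] = 0;
        queue[tail++] = s;
        while (head < tail) {
            int v = queue[head++];
            for (int i = g->xadj[v]; i < g->xadj[v + 1]; i++) {
                int w = g->adj[i];
                if (dist[w] == -1) {          /* w discovered for the first time */
                    dist[w] = dist[v] + 1;
                    far += dist[w];
                    queue[tail++] = w;
                }
            }
        }
        cc[s] = (far > 0) ? 1.0 / (double)far : 0.0;  /* cc[s] = 0 if s reaches no vertex */
    }
    free(dist);
    free(queue);
}

int main(void) {
    /* toy path graph 0-1-2-3 */
    int xadj[] = {0, 1, 3, 5, 6};
    int adj[]  = {1, 0, 2, 1, 3, 2};
    graph g = {4, xadj, adj};
    double cc[4];
    cc_basic(&g, cc);
    for (int v = 0; v < 4; v++) printf("cc[%d] = %.3f\n", v, cc[v]);
    return 0;
}

Each BFS touches every edge a constant number of times, which gives the O(n(m + n)) total cost stated above.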
III. MAINTAINING CENTRALITY

Many real-life networks are scale free. The diameters of these networks grow proportionally to the logarithm of the number of nodes; that is, even with hundreds of millions of vertices, the diameter is small, and when the graph is modified with minor updates, it tends to stay small. Combining this with the power-law degree distribution of scale-free networks, we obtain the spike-shaped shortest-distance distribution shown in Figure 2. We use work filtering with level differences and the utilization of special vertices to exploit these observations and reduce the centrality computation time. In addition, we apply SSSP hybridization to speed up each SSSP computation.

[Figure 2. The probability that the distance between two (connected) vertices equals x, for four social and web networks (amazon0601, soc-sign-epinions, web-Google, web-NotreDame); x ranges over the shortest path distances 1 to 20.]

A. Work Filtering with Level Differences

For efficient maintenance of the closeness centrality values in case of an edge insertion/deletion, we propose a work filter which reduces the number of SSSPs in Algorithm 1, and the cost of each SSSP, by utilizing the level differences. Level-based filtering detects the unnecessary updates and filters them out. Let G = (V, E) be the current graph and uv be an edge to be inserted into G. Let G' = (V, E ∪ {uv}) be the updated graph. The centrality definition in (1) implies that for a vertex s ∈ V, if d_G(s, t) = d_{G'}(s, t) for all t ∈ V, then cc[s] = cc'[s]. The following theorem is used to detect such vertices and filter their SSSPs.

Theorem 1: Let G = (V, E) be a graph and u and v be two vertices in V s.t. uv ∉ E. Let G' = (V, E ∪ {uv}). Then cc[s] = cc'[s] if and only if |d_G(s, u) − d_G(s, v)| ≤ 1.

Proof: If s is disconnected from both u and v, the insertion of uv will not change cc[s]; hence, cc[s] = cc'[s]. If s is connected to only one of u and v in G, the difference |d_G(s, u) − d_G(s, v)| is ∞, and cc[s] needs to be updated by using the new, larger connected component containing s. When s is connected to both u and v in G, we investigate the edge insertion in three cases, as shown in Figure 3:

Case 1: d_G(s, u) = d_G(s, v). Assume that the path s ⇝ u–v ⇝ t is a shortest s ⇝ t path in G' containing uv. Since d_G(s, u) = d_G(s, v), there exists a shorter path s ⇝ v ⇝ t with one less edge. Hence, ∀t ∈ V, d_G(s, t) = d_{G'}(s, t).

Case 2: |d_G(s, u) − d_G(s, v)| = 1. Let d_G(s, u) < d_G(s, v). Assume that s ⇝ u–v ⇝ t is a shortest path in G' containing uv. Since d_G(s, v) = d_G(s, u) + 1, there exists another path s ⇝ v ⇝ t with the same length. Hence, ∀t ∈ V, d_G(s, t) = d_{G'}(s, t).

Case 3: |d_G(s, u) − d_G(s, v)| > 1. Let d_G(s, u) < d_G(s, v). The path s ⇝ u–v in G' is shorter than the shortest s ⇝ v path in G, since d_G(s, v) > d_G(s, u) + 1. Hence, ∀t ∈ V \ {v}, d_{G'}(s, t) ≤ d_G(s, t) and d_{G'}(s, v) < d_G(s, v), i.e., an update on cc[s] is necessary.

[Figure 3. Three cases of edge insertion: when an edge uv is inserted into the graph G, for each vertex s, one of the following is true: (1) d_G(s, u) = d_G(s, v), (2) |d_G(s, u) − d_G(s, v)| = 1, or (3) |d_G(s, u) − d_G(s, v)| > 1.]

Although Theorem 1 yields a filter only in case of edge insertions, the following corollary, which is used for edge deletions, easily follows.

Corollary 2: Let G = (V, E) be a graph and u and v be two vertices in V s.t. uv ∈ E. Let G' = (V, E \ {uv}). Then cc[s] = cc'[s] if and only if |d_{G'}(s, u) − d_{G'}(s, v)| ≤ 1.

With this corollary, the work filter can be implemented for both edge insertions and deletions. The pseudocode of the update algorithm in case of an edge insertion is given in Algorithm 2: when an edge uv is inserted/deleted, to employ the filter, we first compute the distances from u and v to all other vertices, and we filter out the vertices satisfying the condition of Theorem 1.
Algorithm 2: Simple work filtering

Data: G = (V, E), cc[.], uv
Output: cc'[.]
G' ← (V, E ∪ {uv})
du[.] ← SSSP(G, u)    ▷ distances from u in G
dv[.] ← SSSP(G, v)    ▷ distances from v in G
for each s ∈ V do
    if |du[s] − dv[s]| ≤ 1 then
        cc'[s] ← cc[s]
    else
        ▷ recompute cc'[s] with the SSSP of Algorithm 1, on G'
return cc'[.]
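A sketch of Algorithm 2 in the same style, reusing the graph struct and BFS conventions of the previous sketch (the factoring into a bfs_far helper and the treatment of unreachable vertices are our own choices):

#include <stdlib.h>

/* One BFS from s on g; fills dist[] (-1 = unreachable) and returns far[s]. */
static long long bfs_far(const graph *g, int s, int *dist, int *queue) {
    for (int v = 0; v < g->n; v++) dist[v] = -1;
    long long far = 0;
    int head = 0, tail = 0;
    dist[s] = 0;
    queue[tail++] = s;
    while (head < tail) {
        int v = queue[head++];
        for (int i = g->xadj[v]; i < g->xadj[v + 1]; i++) {
            int w = g->adj[i];
            if (dist[w] == -1) {
                dist[w] = dist[v] + 1;
                far += dist[w];
                queue[tail++] = w;
            }
        }
    }
    return far;
}

/* Algorithm 2 (sketch): update cc[] after inserting edge uv.
 * g_old is G, g_new is G' = (V, E ∪ {uv}); cc[] holds the old values
 * and is overwritten with the new ones. */
void cc_update_insert(const graph *g_old, const graph *g_new,
                      int u, int v, double *cc) {
    int n = g_old->n;
    int *du = malloc(n * sizeof *du), *dv = malloc(n * sizeof *dv);
    int *dist = malloc(n * sizeof *dist), *queue = malloc(n * sizeof *queue);
    bfs_far(g_old, u, du, queue);  /* distances from u in G */
    bfs_far(g_old, v, dv, queue);  /* distances from v in G */
    for (int s = 0; s < n; s++) {
        /* Theorem 1: cc[s] is unchanged iff |d_G(s,u) - d_G(s,v)| <= 1;
         * two -1 values mean s is disconnected from both, also unchanged. */
        if (du[s] == dv[s] ||
            (du[s] >= 0 && dv[s] >= 0 && abs(du[s] - dv[s]) <= 1))
            continue;                                    /* filtered: keep cc[s] */
        long long far = bfs_far(g_new, s, dist, queue);  /* as in Algorithm 1 */
        cc[s] = (far > 0) ? 1.0 / (double)far : 0.0;
    }
    free(du); free(dv); free(dist); free(queue);
}

Only the sources that fail the level-difference test pay for a BFS; on small-world graphs this is a small fraction of V, which is where the speedups reported in Section V come from.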
B. Utilization of Special Vertices

We exploit some special vertices to speed up the incremental closeness centrality computation further; in particular, we leverage the articulation vertices and the identical vertices in networks. Although it has been previously shown that the articulation vertices in real social networks are limited in number and yield an unbalanced shattering [17], we present the related techniques here to give a complete view.

1) Filtering with biconnected components: Our filter can be assisted by maintaining a biconnected component decomposition (BCD) of G = (V, E). A BCD is a partitioning Π of E where Π(e) is the component of each edge e ∈ E. When uv is inserted into G and G' = (V, E' = E ∪ {uv}) is obtained, we check whether the intersection

    {Π(uw) : w ∈ Γ_G(u)} ∩ {Π(vw) : w ∈ Γ_G(v)}

is empty or not. If the intersection is not empty, it contains exactly one element, cid, which is the id of the biconnected component of G' containing uv (otherwise, Π is not a valid BCD); in this case, Π'(e) is set to Π(e) for all e ∈ E, and Π'(uv) is set to cid. If there is no biconnected component containing both u and v, i.e., if the intersection above is empty, we construct Π' from scratch and set cid = Π'(uv). Π can be computed in linear O(m + n) time [6]. Hence, the cost of BCD maintenance is negligible compared to the cost of updating the closeness centrality. Details can be found in [16]. The intersection test is sketched below.
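As an illustration, the intersection test can be written as a scan over the two neighbor lists, again on the CRS struct from the earlier sketches; storing the component id of each edge in an array bcc[] parallel to adj[] is our own representation choice, not the paper's:

/* BCD-assisted check for an inserted edge uv (sketch).
 * bcc[i] is the biconnected component id of the edge stored at adj[i], so
 * the ids of the edges incident to u are bcc[xadj[u]] .. bcc[xadj[u+1]-1].
 * Returns the common component id cid, or -1 if u and v share no
 * biconnected component (in which case Π' must be rebuilt from scratch). */
int bcd_common_component(const graph *g, const int *bcc, int u, int v) {
    for (int i = g->xadj[u]; i < g->xadj[u + 1]; i++)
        for (int j = g->xadj[v]; j < g->xadj[v + 1]; j++)
            if (bcc[i] == bcc[j])
                return bcc[i];  /* a valid BCD guarantees this id is unique */
    return -1;
}

The quadratic scan is acceptable in practice because it touches only the neighbors of u and v, and a valid BCD guarantees that the first match is the only one.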
2) Filtering with identical vertices: Our preliminary analyses show that real-life networks can contain a significant number of identical vertices with the same, or a similar, neighborhood structure. We investigate two types of identical vertices.

Definition 3: In a graph G, two vertices u and v are type-I-identical if and only if Γ_G(u) = Γ_G(v).

Definition 4: In a graph G, two vertices u and v are type-II-identical if and only if {u} ∪ Γ_G(u) = {v} ∪ Γ_G(v).

Both types induce an equivalence relation, since they are reflexive, symmetric, and transitive; hence, all the classes they form are disjoint. Let u, v ∈ V be two identical vertices. One can see that for any vertex w ∈ V \ {u, v}, d_G(u, w) = d_G(v, w). Then the following holds.

Corollary 5: Let I ⊆ V be a vertex class containing type-I or type-II identical vertices. Then the closeness centrality values of all the vertices in I are equal.

Consequently, one SSSP per class suffices; a sketch of the pairwise type-I test is given below.
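Assuming the CRS structure stores each adjacency list in sorted order (an assumption of this sketch, not a statement about the paper's code), two vertices can be compared in O(deg) time, and classes can be built by grouping vertices on a hash of their neighbor lists and verifying candidates with this test:

#include <string.h>

/* Returns 1 iff u and v are type-I-identical, i.e., Γ(u) = Γ(v).
 * Assumes each CRS adjacency list is stored sorted. */
int type1_identical(const graph *g, int u, int v) {
    int deg_u = g->xadj[u + 1] - g->xadj[u];
    int deg_v = g->xadj[v + 1] - g->xadj[v];
    if (deg_u != deg_v) return 0;
    return memcmp(&g->adj[g->xadj[u]], &g->adj[g->xadj[v]],
                  deg_u * sizeof(int)) == 0;
}

A type-II test is analogous: compare the two lists after conceptually adding u and v to their own neighborhoods.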
C. SSSP Hybridization

The spike-shaped distribution given in Figure 2 can also be exploited for SSSP hybridization. Consider the execution of Algorithm 1: while executing an SSSP with source s, for each vertex pair {u, v}, u is processed before v if and only if d_G(s, u) < d_G(s, v). That is, Algorithm 1 consecutively uses the vertices at distance k to find the vertices at distance k + 1; hence, it visits the vertices in a top-down manner. An SSSP can also be performed in a bottom-up manner: after all distance-(level-) k vertices are found, the vertices whose levels are still unknown can be processed to see if they have a neighbor at level k. The top-down variant is expected to be much cheaper for small k values, but it can be more expensive for the upper levels, where far fewer unprocessed vertices remain. Following the idea of Beamer et al. [1], we hybridize the SSSPs: while processing the vertices at an SSSP level, we simply compare the number of edges that need to be processed by each variant and choose the cheaper one. A sketch of this decision step follows.
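Following the direction-optimizing idea of Beamer et al. [1], the per-level decision can be based on simple edge counts; the function below is a hedged sketch (the names and the exact cost estimates are ours, and a real implementation would maintain the unvisited edge count incrementally instead of rescanning every level):

typedef enum { TOP_DOWN, BOTTOM_UP } bfs_dir;

/* Choose the direction for the next BFS level. Top-down work is roughly
 * the number of edges leaving the current frontier; bottom-up work is
 * roughly the number of edges incident to the still-unvisited vertices. */
bfs_dir choose_direction(const graph *g, const int *frontier, int fsize,
                         const int *dist) {
    long long edges_frontier = 0, edges_unvisited = 0;
    for (int i = 0; i < fsize; i++) {
        int v = frontier[i];
        edges_frontier += g->xadj[v + 1] - g->xadj[v];
    }
    for (int v = 0; v < g->n; v++)
        if (dist[v] == -1)  /* not yet assigned a level */
            edges_unvisited += g->xadj[v + 1] - g->xadj[v];
    return (edges_frontier > edges_unvisited) ? BOTTOM_UP : TOP_DOWN;
}

Around the spike of the distance distribution, the frontier covers most of the graph and the bottom-up variant wins; before and after the spike, top-down is cheaper.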
IV. RELATED WORK

To the best of our knowledge, there are only two works on maintaining centrality in dynamic networks, and both are interested in betweenness centrality. Lee et al. proposed the QUBE framework, which uses a BCD and updates the betweenness centrality values in case of edge insertions and deletions in the network [10]. Unfortunately, the performance of QUBE is only reported on small graphs (less than 100K edges) with very low edge density. In other words, it only performs significantly well on small graphs with a tree-like structure having many small biconnected components.

Green et al. proposed a technique to update the betweenness centrality scores, rather than recomputing them from scratch, upon edge insertions (it can be extended to edge deletions) [5]. The idea is to store the whole data structure used by the previous computation. However, as the authors stated, it takes O(n² + nm) space to store all the required values. Compared to their work, our algorithms are much more practical since their memory footprint is linear.
V. EXPERIMENTAL RESULTS

We implemented the algorithms in C and compiled them with gcc v4.6.2 using the optimization flags -O2 -DNDEBUG. The graphs are kept in the compressed row storage (CRS) format. The experiments are run sequentially on a computer with two Intel Xeon E5520 CPUs clocked at 2.27GHz and equipped with 48GB of main memory.

For the experiments, we used 10 networks from the UFL Sparse Matrix Collection (http://www.cise.ufl.edu/research/sparse/matrices/) and also extracted the coauthor network from the current set of DBLP papers. Properties of the graphs are summarized in Table I. They come from different application areas: social networks (hep-th, PGPgiantcompo, astro-ph, cond-mat-2005, soc-sign-epinions, loc-gowalla, amazon0601, wiki-Talk, DBLP-coauthor) and web networks (web-NotreDame, web-Google). The graphs are listed by increasing number of edges, and a distinction is made between small graphs (with fewer than 500K edges) and large graphs (with more than 500K edges).

Table I. The graphs used in the experiments. Column Org. shows the initial closeness computation time of CC, and Best is the best update time we obtain in case of streaming data.

name                 |V|       |E|       Org. (s)   Best (s)   Speedup
hep-th               8.3K      15.7K     1.41       0.05       29.4
PGPgiantcompo        10.6K     24.3K     4.96       0.04       111.2
astro-ph             16.7K     121.2K    14.56      0.36       40.5
cond-mat-2005        40.4K     175.6K    77.90      2.87       27.2
geometric mean                                                 43.5
soc-sign-epinions    131K      711K      778        6.25       124.5
loc-gowalla          196K      950K      2,267      53.18      42.6
web-NotreDame        325K      1,090K    2,845      53.06      53.6
amazon0601           403K      2,443K    14,903     298        50.0
web-Google           875K      4,322K    65,306     824        79.2
wiki-Talk            2,394K    4,659K    175,450    922        190.1
DBLP-coauthor        1,236K    9,081K    115,919    251        460.8
geometric mean                                                 99.8

Although the filtering techniques can reduce the update cost significantly in theory, their practical effectiveness depends on the underlying structure of G. Since the diameters of the social networks are small, the range of the shortest distances is small. Furthermore, the distribution of these distances is unimodal. When the distance with the peak (mode) is combined with the ones on its right and left, they cover a significant fraction of the pairs (56% for web-NotreDame, 65% for web-Google, 79% for amazon0601, and 91% for soc-sign-epinions). We expect the filtering procedure to have a significant impact on social networks because of this structure. Besides, that specific structure is also important for the SSSP hybridization.

A. Handling topology modifications

To assess the effectiveness of our algorithms, we need to know when each edge is inserted into or deleted from the graph. Our datasets from the UFL collection do not have this information. To conduct our experiments on these datasets, we delete 1,000 randomly chosen edges from each graph in the following way: a vertex u ∈ V is selected uniformly at random, and then a vertex v ∈ Γ_G(u) is selected uniformly at random. Since we do not want to change the connectivity of the graph (having disconnected components can make our algorithms much faster, and the comparison with CC would not be fair), we discard uv if it is a bridge; otherwise, we delete it from G and continue. We construct the initial graph by deleting these 1,000 edges. Each edge is then re-inserted one by one, and our algorithms are used to recompute the closeness centrality scores after each insertion. The bridge test is sketched below.
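The bridge test in this procedure can be done with a single BFS: traverse the graph from u while ignoring the edge uv, and check whether v is still reachable. A sketch under the same CRS assumptions as before:

/* Returns 1 iff edge uv is a bridge: BFS from u that skips the edge uv
 * (in both directions); uv is a bridge iff v is never reached. */
int is_bridge(const graph *g, int u, int v, int *dist, int *queue) {
    for (int x = 0; x < g->n; x++) dist[x] = -1;
    int head = 0, tail = 0;
    dist[u] = 0;
    queue[tail++] = u;
    while (head < tail) {
        int a = queue[head++];
        for (int i = g->xadj[a]; i < g->xadj[a + 1]; i++) {
            int b = g->adj[i];
            if ((a == u && b == v) || (a == v && b == u))
                continue;                /* pretend uv has been deleted */
            if (dist[b] == -1) {
                if (b == v) return 0;    /* v reachable without uv */
                dist[b] = dist[a] + 1;
                queue[tail++] = b;
            }
        }
    }
    return 1;
}

A linear-time decomposition in the style of [6] would answer all such queries at once; the per-edge BFS is simply the most direct way to express the check.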
In addition to the random insertion experiments, we also evaluated our algorithms on a real temporal dataset, the DBLP coauthor graph (http://www.informatik.uni-trier.de/~ley/db/). In this graph, there is an edge between two authors if they have published a paper together. We used the publication dates as timestamps and constructed the initial graph from the papers published before January 1, 2013. We used the coauthorship edges of the later papers for edge insertions. Although we used insertions in our experiments, a deletion is a very similar process which should give comparable results.

In addition to CC, we configure our algorithms in four different ways: CC-B only uses BCD, CC-BL uses BCD and filtering with levels, CC-BLI uses all three work filtering techniques including identical vertices, and CC-BLIH uses all the techniques described in this paper including the SSSP hybridization.
Table II presents the results of the experiments. The second column, CC, shows the time to run the full base algorithm for computing the closeness centrality values on the original version of the graph. Columns 3–6 of the table present absolute runtimes (in seconds) of the centrality computation algorithms. The next four columns, 7–10, give the speedups achieved by each configuration; for instance, on average, updating the closeness values by using CC-B on PGPgiantcompo is 11.5 times faster than running CC. Finally, the last column gives the overhead of our algorithms per edge insertion, i.e., the time necessary to filter the source vertices and to maintain the BCD and the identical-vertex classes. Geometric means of these times and speedups are also given to provide a comparison across all the instances.

The times to compute the closeness values using CC range between 1 and 77 seconds on the small graphs. On large graphs, they range from 13 minutes to 49 hours. Clearly, CC is not suitable for real-time network analysis and management based on shortest paths and closeness centrality.

When all the techniques are used (CC-BLIH), the time necessary to update the closeness centrality values of the small graphs drops below 3 seconds per edge insertion. The improvements range from a factor of 27.2 (cond-mat-2005) to 111.2 (PGPgiantcompo) on small instances, with an average improvement of 43.5, and from a factor of 42.6 (loc-gowalla) to 458.8 (DBLP-coauthor) on large graphs, with an average of 99.7. For all graphs, the time spent on overheads is below one second, which indicates that the majority of the time is spent on SSSPs. Note that this part is pleasingly parallel, since each SSSP is independent of the others. Hence, by combining the techniques proposed in this work with straightforward parallelism, one can obtain a framework that maintains the closeness centrality values within a dynamic network in real time.

The overall improvement obtained by the proposed algorithms is significant. The speedups obtained by using BCDs (CC-B) are 3.5 and 3.2 on average for small and large graphs, respectively. The graphs PGPgiantcompo and wiki-Talk benefit the most from BCDs (with speedups of 11.5 and 6.8, respectively). Clearly, using the biconnected component decomposition improves the update performance. However, filtering by level differences is the most efficient technique: CC-BL brings major improvements over CC-B. For all social networks, when CC-BL is compared with CC-B, the speedups range from 4.8 (web-NotreDame) to 64 (DBLP-coauthor). Overall, CC-BL brings a 7.61 times improvement on small graphs and a 13.44 times improvement on large graphs over CC.

For each added edge uv, let X be the random variable equal to |d_G(u, w) − d_G(v, w)|. By using 1,000 uv edges, we computed the probabilities of the three cases we investigated before and give them in Figure 4. For each graph in the figure, the sum of the first two columns gives the ratio of the vertices not updated by CC-BL. For the networks in the figure, not even 20% of the vertices require an update (Pr(X > 1)). This explains the speedup achieved by filtering using level differences. Therefore, level filtering is more useful for graphs having characteristics similar to small-world networks.

[Figure 4. The bars show the distribution of the random variable X = |d_G(u, w) − d_G(v, w)| over the three cases we investigated (Pr(X = 0), Pr(X = 1), and Pr(X > 1)) when an edge uv is added.]

Filtering with identical vertices is not as useful as the other two techniques in the work filter. Overall, there is a 1.15 times improvement with CC-BLI, on both small and large graphs, compared to CC-BL. For some graphs, such as web-NotreDame and web-Google, the improvements are much higher (30% and 31%, respectively).

The algorithm with the hybrid SSSP implementation, CC-BLIH, is faster than CC-BLI by a factor of 1.42 on small graphs and by a factor of 1.96 on large graphs. Although it seems to improve the performance for all graphs, in some cases the performance is not improved significantly. This can be attributed to incorrect decisions on the SSSP variant to be used; indeed, we did not benchmark the architecture to discover the proper parameter. CC-BLIH performs best on social network graphs, with improvement ratios of 3.18 (soc-sign-epinions), 2.54 (loc-gowalla), and 2.30 (wiki-Talk).

All the previous results present the average single-edge update time over 1,000 successively added edges; hence, they do not say anything about the variance. Figure 5 shows the runtimes of CC-B and CC-BLIH per edge insertion for web-NotreDame, in sorted order. The runtime distribution of CC-B clearly has multiple modes: either the runtime is lower than 100 milliseconds, or it is around 700 seconds. We see here the benefit of BCD. According to the runtime distribution, about 59% of web-NotreDame's vertices are inside small biconnected components; hence, the time per edge insertion drops from 2,845 seconds to 700. Indeed, the largest component only contains 41% of the vertices and 76% of the edges of the original graph. The decrease in the size of the components accounts for the gain in performance.

[Figure 5. Sorted list of the runtimes per edge insertion (update time in seconds, log scale) for the first 100 added edges of web-NotreDame, for CC-B and CC-BLIH.]
Table II. Execution times in seconds of all the algorithms, and speedups when compared with the basic closeness centrality algorithm CC. In the table, CC-B is the variant which uses only BCDs, CC-BL uses BCDs and filtering with levels, CC-BLI uses all three work filtering techniques including identical vertices, and CC-BLIH uses all the techniques described in this paper including SSSP hybridization.

                         Time (secs)                                               Speedups                      Filter
Graph              CC           CC-B        CC-BL      CC-BLI     CC-BLIH     CC-B  CC-BL  CC-BLI  CC-BLIH   time (secs)
hep-th             1.413        0.317       0.057      0.053      0.048       4.5   24.8   26.6    29.4      0.001
PGPgiantcompo      4.960        0.431       0.059      0.055      0.045       11.5  84.1   89.9    111.2     0.001
astro-ph           14.567       9.431       0.809      0.645      0.359       1.5   18.0   22.6    40.5      0.004
cond-mat-2005      77.903       39.049      5.618      4.687      2.865       2.0   13.9   16.6    27.2      0.010
Geometric mean     9.444        2.663       0.352      0.306      0.217       3.5   26.8   30.7    43.5      0.003
soc-sign-epinions  778.870      257.410     20.603     19.935     6.254       3.0   37.8   39.1    124.5     0.041
loc-gowalla        2,267.187    1,270.820   132.955    135.015    53.182      1.8   17.1   16.8    42.6      0.063
web-NotreDame      2,845.367    579.821     118.861    83.817     53.059      4.9   23.9   33.9    53.6      0.050
amazon0601         14,903.080   11,953.680  540.092    551.867    298.095     1.2   27.6   27.0    50.0      0.158
web-Google         65,306.600   22,034.460  2,457.660  1,701.249  824.417     3.0   26.6   38.4    79.2      0.267
wiki-Talk          175,450.720  25,701.710  2,513.041  2,123.096  922.828     6.8   69.8   82.6    190.1     0.491
DBLP-coauthor      115,919.518  18,501.147  288.269    251.557    252.647     6.2   402.1  460.8   458.8     0.530
Geometric mean     13,884.152   4,218.031   315.777    273.036    139.170     3.2   43.9   50.8    99.7      0.146
The impact of level filtering can also be seen in Figure 5. 60% of the edges in the main biconnected component do not change the closeness values of many vertices, and the updates induced by their addition take less than 1 second. The remaining edges trigger more expensive updates upon insertion. Within these 30% expensive edge insertions, using identical vertices and SSSP hybridization provides a significant improvement (not shown in the figure).

Better Speedups on Real Temporal Data: The best speedups are obtained on the DBLP coauthor network, which uses real temporal data. Using CC-B, we reach a 6.2 speedup w.r.t. CC, which is bigger than the average speedup over all networks. The main reason for this behavior is that 10% of the inserted edges actually correspond to new vertices joining the network, i.e., authors with their first publication, and CC-B handles these edges quite fast. Applying CC-BL gives a 64.8 speedup over CC-B, which is drastically higher than for all other graphs. Indeed, only 0.7% of the vertices require an SSSP to be run when an edge is inserted in the DBLP network; for the synthetic cases, this number is 12%. Overall, the speedups obtained with real temporal data reach 460.8, i.e., 4.6 times greater than the average speedup over all graphs. Our algorithms appear to perform much better on real applications than on synthetic ones.

VI. CONCLUSION

In this paper, we propose the first algorithms to achieve fast updates of exact closeness centrality values upon incremental network modifications at such a large scale. Our techniques exploit the spike-shaped shortest-distance distributions of these networks, their biconnected component decomposition, and the existence of nodes with identical neighborhoods. In large networks with more than 500K edges, the proposed techniques bring a 99 times speedup on average. For the temporal DBLP coauthorship graph, which has the most edges, we reduced the centrality update time from 1.3 days to 4.2 minutes.

VII. ACKNOWLEDGMENTS

This work was partially supported by the NIH/NCI grant R01CA141090; the NSF grant OCI-0904809; and the NPRP grant 4-1454-1-233 from the Qatar National Research Fund (a member of Qatar Foundation). The statements made herein are solely the responsibility of the authors.

REFERENCES

[1] S. Beamer, K. Asanović, and D. Patterson. Direction-optimizing breadth-first search. In Proc. of Supercomputing, 2012.
[2] U. Brandes. A faster algorithm for betweenness centrality. Journal of Mathematical Sociology, 25(2):163–177, 2001.
[3] S. Y. Chan, I. X. Y. Leung, and P. Liò. Fast centrality approximation in modular networks. In Proc. of CIKM-CNIKM, 2009.
[4] D. Eppstein and J. Wang. Fast approximation of centrality. In Proc. of SODA, 2001.
[5] O. Green, R. McColl, and D. A. Bader. A fast algorithm for streaming betweenness centrality. In Proc. of SocialCom, 2012.
[6] J. Hopcroft and R. Tarjan. Algorithm 447: Efficient algorithms for graph manipulation. Communications of the ACM, 16(6):372–378, June 1973.
[7] S. Jin, Z. Huang, Y. Chen, D. G. Chavarría-Miranda, J. Feo, and P. C. Wong. A novel application of parallel betweenness centrality to power grid contingency analysis. In Proc. of IPDPS, 2010.
[8] S. Kintali. Betweenness centrality: Algorithms and lower bounds. CoRR, abs/0809.1906, 2008.
[9] V. Krebs. Mapping networks of terrorist cells. Connections, 24, 2002.
[10] M.-J. Lee, J. Lee, J. Y. Park, R. H. Choi, and C.-W. Chung. QUBE: a Quick algorithm for Updating BEtweenness centrality. In Proc. of WWW, 2012.
[11] K. Madduri, D. Ediger, K. Jiang, D. A. Bader, and D. G. Chavarría-Miranda. A faster parallel algorithm and efficient multithreaded implementations for evaluating betweenness centrality on massive datasets. In Proc. of IPDPS, 2009.
[12] E. L. Merrer and G. Trédan. Centralities: Capturing the fuzzy notion of importance in social graphs. In Proc. of SNS, 2009.
[13] K. Okamoto, W. Chen, and X.-Y. Li. Ranking of closeness centrality for large-scale social networks. In Proc. of FAW, 2008.
[14] M. C. Pham and R. Klamma. The structure of the computer science knowledge network. In Proc. of ASONAM, 2010.
[15] S. Porta, V. Latora, F. Wang, E. Strano, A. Cardillo, S. Scellato, V. Iacoviello, and R. Messora. Street centrality and densities of retail and services in Bologna, Italy. Environment and Planning B: Planning and Design, 36(3):450–465, 2009.
[16] A. E. Sarıyüce, K. Kaya, E. Saule, and Ü. V. Çatalyürek. Incremental algorithms for network management and analysis based on closeness centrality. CoRR, abs/1303.0422, 2013.
[17] A. E. Sarıyüce, E. Saule, K. Kaya, and Ü. V. Çatalyürek. Shattering and compressing networks for betweenness centrality. In Proc. of SDM, 2013.
[18] X. Shi, J. Leskovec, and D. A. McFarland. Citing for high impact. In Proc. of JCDL, 2010.
[19] Z. Shi and B. Zhang. Fast network centrality analysis using GPUs. BMC Bioinformatics, 12:149, 2011.
