An Incremental Algorithm For Updating Betweenness Centrality and K-Betweenness Centrality and Its Performance On Realistic Dynamic Social Network Data
An Incremental Algorithm For Updating Betweenness Centrality and K-Betweenness Centrality and Its Performance On Realistic Dynamic Social Network Data
(2014) 4:235
DOI 10.1007/s13278-014-0235-z
ORIGINAL ARTICLE
Received: 5 January 2014 / Revised: 30 September 2014 / Accepted: 1 November 2014 / Published online: 25 November 2014
Springer-Verlag Wien 2014
Abstract The increasing availability of dynamically k-betweenness centrality on both synthetic social network
changing digital data that can be used for extracting social data sets and on several real-world social network data sets.
networks over time has led to an upsurge of interest in the The presented incremental algorithm can achieve sub-
analysis of dynamic social networks. One key aspect of stantial performance speedup (3–4 orders of magnitude
dynamic social network analysis is finding the central faster for some data sets) when compared to the state of the
nodes in a network. However, dynamic calculation of art. And, incremental k-betweenness centrality, which is a
centrality values for rapidly changing networks can be good predictor of betweenness centrality, can be carried out
computationally expensive, with the result that data are on social network data sets with millions of nodes.
frequently aggregated over many time periods and only
intermittently analyzed for centrality measures. This paper Keywords Betweenness centrality k-Betweenness
presents an incremental betweenness centrality algorithm centrality Incremental algorithms Dynamic network
that efficiently updates betweenness centralities or analysis All-pairs shortest paths
k-betweenness centralities of nodes in dynamic social
networks by avoiding re-computations through the efficient
storage of information from earlier computations. In this 1 Introduction
paper, we evaluate the performance of the proposed algo-
rithms for incremental betweenness centrality and For decades, social network analysis has been an important
tool for solving a number of problems such as revealing
patterns of information dissemination, assessing the impact
This paper was invited as an extended version of Miray Kas, Matthew of business decisions on organizational structures, and
Wachs, Kathleen M. Carley, L. Richard Carley. Incremental identifying influential actors in social networks. There are a
Algorithm for Updating Betweenness Centrality in Dynamically large number of algorithms in network science that seek to
Growing Networks. The 2013 IEEE/ACM International Conference
on Advances in Social Networks Analysis and Mining (ASONAM),
identify the most prominent nodes/actors and the relevant
August 25–28, 2013, Niagara Falls, Canada. characteristics of nodes in human social networks. This
paper is focused on betweenness centrality, which is one of
Present Address: the most widely used social network metrics for identifying
M. Kas
key nodes/actors. The betweenness centrality of a node is
Google Inc., Menlo Park, CA, USA
e-mail: [email protected] the fraction of the shortest paths between all possible pairs
of nodes that contain that node. And, because the best
M. Kas K. M. Carley L. R. Carley (&) known algorithm for computing betweenness centrality
Center for Computational Analysis of Social and Organizational
(Brandes’ algorithm 2001) starts by solving the all-pairs
Systems (CASOS), Carnegie Mellon University, Pittsburgh,
PA 15213, USA shortest path problem, it is also one of the most compu-
e-mail: [email protected] tationally expensive social network metrics to calculate.
K. M. Carley In many important cases, the social network being
e-mail: [email protected] analyzed, and/or an approximate model of the true
123
235 Page 2 of 23 Soc. Netw. Anal. Min. (2014) 4:235
underlying social network, is evolving over time. An not only included the shortest paths, but also included paths
example of a social network that evolves over time is a whose length was up to ‘‘k’’ longer than the shortest paths.
group of politicians. Their social network may change over Solving for k-betweenness centrality is advantageous
time as elections are lost or won and new political factions when considering extremely large social networks because
are formed and/or old ones are dissolved. There are also k-betweenness centrality is generally orders of magnitude
many cases of static social networks for which the analyst’s faster to compute than betweenness centrality. Some
approximate model evolves over time. Often an analyst researchers have proposed the use of parallel techniques to
does not ‘‘know’’ the true linkages between people; how- address such extremely large problems (Bader and Madduri
ever, by tracking their communications over time, through 2006). Extremely large social networks already exist; e.g.,
as many media channels for which information is available, (Yang and Leskovec 2011). Other sources for huge network
the analyst can infer the presence and strength of linkages data are patents (Batagelj et al. 2006), wikis (Leskovec et al.
in the underlying social network. But, this means that as 2010), or communication networks (Onnela et al. 2007).
new data about communications arrive, the model of the What makes k-betweenness centrality of interest for ana-
social network changes over time. In both of these cases, it lyzing extremely large social networks is that there is sig-
is important to analyze the betweenness centrality of the nificant evidence that for values of k in the range of 2–3 (on
social network repeatedly as it evolves over time. And, in binary networks), it is a good predictor of betweenness
many of these cases, rapidly updating the values of centrality in most human social networks, especially social
betweenness centrality may be critical for making deci- media networks (Pfeffer and Carley 2012).
sions. For example, an intelligence analyst may be trying to And, because many of the social networks of interest are
identify the leaders of a terrorist cell or a business-mar- ones that change over time, an incremental algorithm for
keting analyst may be deciding which agent to target with computing k-betweenness centrality on evolving social net-
what kind of advertising for what kind of product. In some works has the potential to facilitate over-time analysis of
cases the ideal scenario is to update the calculation of betweenness centrality on extremely large and sparse social
betweenness centrality every time there is a new observa- networks; e.g., Facebook and Twitter. The relative compute
tion about the communications, and hence an update in the time and memory requirements for the incremental
model of the underlying social network. k-betweenness centrality algorithm will be presented. And,
An incremental algorithm for efficiently calculating the the potential compute time speedup of the incremental
betweenness centrality of growing social networks was k-betweenness centrality algorithm relative to repeated invo-
presented in (Kas et al. 2012a) and described in detail in cations of Brandes’ k-betweenness algorithm (called boun-
(Kas et al. 2012a) and (Kas 2013b). In this paper, that ded-distance betweenness centrality) (Brandes 2008), is
algorithm is extended to cover all types of network presented for synthetic social network data sets and for sev-
updates. Results on synthetic and real-world social net- eral real-world social network data sets. It is believed that this
works using the extended algorithm are provided. The is the first algorithm that incrementally computes the exact
relative compute time and memory requirements for this value of k-betweenness centrality; though, there has been
extended algorithm are summarized. An important addi- research on developing a randomized method for speeding up
tional contribution of this paper is that the extension of this the incremental computation of the approximate value of
algorithm handles the incremental update of k-betweenness k-hop betweenness centrality (Kourtellis et al. 2013).
centrality (Borgatti and Everett 2006). The reader is cau-
tioned that ‘‘k betweenness’’ is not a uniquely defined term
in the literature. The term ‘‘k betweenness centrality’’ was 2 Background
originally used by (Borgatti and Everett 2006) to mean the
betweenness centrality calculated on a set of paths whose 2.1 Notation and betweenness centrality definition
maximum cost is k. Brandes (2008) offers an efficient
algorithm for calculating k-betweenness centrality and The betweenness centrality of node x is defined as the
termed it ‘‘bounded-distance betweenness centrality.’’ This fraction of the shortest paths that pass through x across all
is the meaning that is being used in this paper. Other papers pairs of nodes in a network. A dynamic social network can
consider k-betweenness centrality to be the centrality be represented as a directed graph, G(t), that consists of a
computed on paths of at most k hops, independent of the set of nodes VðGðtÞÞ and a set of edges EðGðtÞÞ where n is
weights on the links [e.g., (Kourtellis et al. 2013)], which is the number of nodes, and m is the number of edges in the
equivalent for binary networks, but different for weighted network. x ! y 2 EðGðtÞÞ represents an edge directed
networks. In addition, (Jiang et al. 2009) presented an from node x to node y, where x 2 VðGðtÞÞ is a predecessor
algorithm, also called ‘‘k betweenness centrality’’ that of y, and y 2 VðGðtÞÞ is a successor of x. PredðxÞ is used to
calculated the betweenness centrality on a subgraph that denote all predecessors of x in the graph. Px(y) denotes the
123
Soc. Netw. Anal. Min. (2014) 4:235 Page 3 of 23 235
set of predecessors of node y on the shortest paths from An algorithm to solve this single-source shortest path
node x. In positively weighted networks, each edge e in the problem was originally proposed by Dijkstra (1959). By
network has an associated real-valued traversal cost where traversing the network of shortest paths from node j, the
C(x ? y) [ 0. The cost of a path through multiple nodes is values of rðj;kÞ and rðj;kÞ ðiÞ for every possible value of i and
the sum of the costs of the sequence of edges. Let D(x, y) k can be calculated. Since Dijkstra’s original algorithm has
denote the lowest cost path (also referred to as the a computational complexity of O(n2), the overall compu-
‘‘shortest’’ path) between nodes x and y. In addition, r(x, y) tational complexity becomes O(n3). It is also possible to
is defined to be the number of distinct lowest cost paths solve the all-pairs shortest path algorithm directly using the
from node x to node y and rðj;kÞ ðiÞ is defined to be the Floyd–Warshall algorithm (Floyd 1962); however, its
number of shortest paths from j to k that contain node i. computational complexity is also O(n3). For networks that
The vector B holds the betweenness centrality value of are sparse, the number of edges, m, is much less than the
each node. Finally, SP(x, y, z) is defined to be true if the number of nodes squared (m n2), which is the case for a
edge x ! y 2 EðGÞ is on a shortest path from x to z, sat- majority of real-world social networks. For sparse net-
isfying the following two conditions: (1) there is a path works, a variation of Dijkstra’s algorithm was developed
from x to z (i.e. the distance from x to z, D(x, z), = ?) and that is based on a min-priority queue implemented by a
(2) D(x, z) = C(x, y) ? D(y, z). SP is false otherwise. The Fibonacci heap, which reduced the computational com-
betweenness centrality of node i can then be written as: plexity of the single-source all-pairs shortest path problem
X rðj;kÞ ðiÞ to O(m ? n log n) (Fredman and Tarjan 1984). This is
BðiÞ ¼ where i 6¼ k; i 6¼ j; j 6¼ k asymptotically the fastest known single-source shortest
rðj;kÞ
j 2 V ðG Þ path algorithm for arbitrary directed graphs with unboun-
k 2 VðGÞ ded non-negative weights; however, for special cases, such
as only integer or binary weights, even faster solutions are
k-betweenness centrality is defined exactly as betweenness
available. Brandes (2001) was the first to propose an
centrality except that the only paths whose total cost (the
algorithm for calculating betweenness centrality that has a
sum of the traversal cost of all of the edges in the path) is
computational complexity of O(nm ? n2 log n) for a
less than k are counted in r(x, y) and rðj;kÞ ðiÞ.
positive arbitrarily weighted directed network. Brandes’
A weighted network can be used to represent a binary algorithm has become the standard implementation
network by restricting edge costs to have a value of either 1 approach for calculating betweenness centrality. Therefore,
or infinity. Undirected networks can be handled by having it will be used as the primary basis for comparison in this
all of the edges be symmetric. Therefore, the directed paper. Note, Brandes (2008) also developed a variant of
positively weighted network description is extremely broad this algorithm for calculating k-betweenness centrality
and includes the cases of undirected and unweighted net- (bounded-distance betweenness centrality), which will be
works; hence, this paper will focus on algorithms that work used as the comparison basis for the speed improvement of
for any positively weighted, directed graphs. In computer the proposed incremental k-betweenness centrality
data networks, each edge cost is typically that the delay algorithm.
data will experience when traveling from node x to node y,
in which case infinite delay corresponds to a non-existent 2.3 Incremental algorithms for betweenness centrality
path. In social networks, the relationship specified between
node x and node y is often the strength of the tie between When betweenness centrality was first proposed as a metric
them, in which case zero corresponds to a non-existent for analyzing social network data, the primary application
connection. In social networks, the edge cost is typically was the analysis of static snapshots of small networks (e.g.
taken to be the reciprocal of the strength of the tie but other 20–30 nodes) (Freeman 1977) The algorithmic complexi-
mappings may be employed. ties and computation times of the betweenness centrality
measure were not a significant problem for such small,
2.2 Brief review of prior research work on betweenness static networks. In the modern era of social media, a goal of
centrality dynamic network analysis must be to analyze extremely
large social networks that evolve over time with the best
The original definition of betweenness centrality and the possible computational efficiency. Increasingly, network
proposal of its use for identifying important individuals in data are available through sensors and online resources
social networks were due to Freeman (1977). A straight- such as social media networks where the participants may
forward implementation of the calculation of betweenness be changing and/or the level of participation is changing
centrality that would start at every node, j, would find the rapidly over time. Examples include SMS networks,
lowest cost paths from j to every other node in the network. Twitter networks, inter-organizational alliance networks,
123
235 Page 4 of 23 Soc. Netw. Anal. Min. (2014) 4:235
etc. And, it is of great interest to social scientists, business note that this requires significant additional computation
decision makers and military analysts to quickly spot and memory over the incremental all-pairs shortest path
changes and trends on such networks. Therefore, it is of algorithms because betweenness centrality and k-
great value to have a methodology that can update calcu- betweenness centrality depend not just only on the cost of
lations of betweenness centrality with every change in a the shortest path, but also on the number of different
network in real time. Using incremental algorithms, this shortest paths with the same cost.
goal can be achieved for betweenness centrality, even for There has been substantial prior work on incremental
very large social networks. calculations of betweenness centrality. For example,
To facilitate solutions to costly problems on continually (Green et al. 2012) presented an incremental algorithm for
changing networks, incremental algorithms are commonly calculating betweenness centrality on growing binary-
adopted. An incremental algorithm is an algorithm that weighted social networks. Binary-weighted social networks
updates the solution to a problem after an incremental are sufficient for analyzing ties that exist or do not exist;
change is made on its input (Berman 1992). Incremental e.g., the friendship network in Facebook. However,
algorithms arrive at solutions for computationally complex because the frequency of communication is often taken as a
problems in an efficient manner without recomputing proxy for tie strength in communications networks (email,
everything from scratch by preserving significant infor- Twitter, etc.), citation networks, coauthorship networks,
mation from prior computations. In the literature, there are patent networks, etc., it is important to develop incremental
several different techniques proposed for solving the all- betweenness centrality algorithms that can handle weighted
pairs shortest path problem dynamically; e.g., (Ramalin- social networks, such as the one presented in this paper.
gam and Reps 1991a, b; King 1999; Demetrescu and Ital- Another approach that calculates incremental betweenness
iano 2004). The algorithm proposed in (King 1999) solves centrality is QuBE, which focuses on updating between-
the all-pairs shortest path problem in networks that have ness centralities without solving the all-pairs shortest paths
positive integer edge costs that are less than a certain in the network (Lee et al. 2012a, b). As will be seen in the
number, b, which is inapplicable for more general case of results section, the proposed algorithm achieves dramati-
networks whose edges are positive real-valued numbers. cally greater reduction in computation time relative to
The Demetrescu and Italiano algorithm (Demetrescu and Brandes’ algorithm than does the QuBE algorithm.
Italiano 2004) depends on the notions of locally shortest Many researchers have suggested algorithms for vari-
paths and locally historical paths. The main idea is to ants of betweenness centrality that generally require less
maintain dynamically the set of locally historical paths, computational complexity. One set of variants of
which is a path that has been identified as a shortest path at betweenness centrality focuses on incorporating over-time
some point and has not been modified since then. In this information into the definition of betweenness for dynam-
study, the dynamic all-pairs shortest path algorithm pro- ically changing graphs [e.g. (Kim and Anderson 2012;
posed by Ramalingam and Reps 1991a, b is used to Lerman et al. 2010; Tang et al. 2010; Habiba et al. 2007)].
maintain all-pairs shortest paths dynamically. There are In contrast, this work focuses on faster computation of the
several reasons why Ramalingam and Reps algorithm was original betweenness centrality metric and the k-between-
selected for solving the incremental shortest path problem. ness centrality metric as defined by Freeman (1977) in
First, Ramalingam and Reps algorithm is the most com- dynamically changing networks. Another recent study
monly used dynamic all-pairs shortest paths algorithm in focusing on speeding up the exact computation of
the literature. Second, it has good performance compared betweenness centrality is (Puzis et al. 2012). These authors
to other dynamic all-pairs shortest path algorithms avail- used two different heuristics: structural equivalence and
able in the literature considering the experiments presented partitioning the network into smaller components.
in (Demetrescu and Italiano 2006). Although this work focuses on speeding up exact
When computing betweenness centrality or k-between- betweenness computation, it targets static networks and
ness centrality a network update such as an edge insertion does not support efficient incremental betweenness cen-
or edge cost decrease might result in the creation of new trality computation.
shortest paths in the network. However, a considerable
portion of the older paths might remain intact, especially in
unaffected parts of the network. Therefore, accurate 3 Incremental betweenness algorithm
maintenance of the number of shortest paths and the pre-
decessors on the shortest paths will suffice for accurately In this section, the proposed incremental algorithm for
updating betweenness values in the case of dynamic net- calculating betweenness centrality is described. Network
work updates. This observation is a key in the design of our updates can be broadly classified into two categories: (1)
incremental betweenness centrality algorithm. However, growing network updates, and (2) shrinking network
123
Soc. Netw. Anal. Min. (2014) 4:235 Page 5 of 23 235
updates. The first group of updates, the growing network then to update the shortest paths to/from each affected sink/
updates, includes (1) inserting a new node, (2) inserting a source node. Algorithm-1 in ‘‘Appendix’’ provides the
new edge, and (3) decreasing the cost of an existing edge. pseudocode for the DELETEEDGE algorithm. The DELE-
They are called ‘growing network updates’ because they TEEDGE algorithm resembles the INSERTEDGE algorithm.
are usually observed due to new actors/agents joining the However, it has minor differences. The most important
network or more/new interactions. Due to space constraints difference is the call for an algorithm called ADJUSTNPS,
minimal discussion of the algorithm for growing network which is used to calculate the number of shortest paths
updates is included in this paper. For a detailed description accurately. After all the shortest distances and the prede-
of the growing update algorithm the reader is referred to cessors on the shortest paths are updated accurately, the
(Kas et al. 2013). call to the ADJUSTNPS algorithm (Line 9 of DELETEEDGE)
Shrinking network updates include (1) deleting an finalizes the computation of the number of shortest paths
existing node, (2) deleting an existing edge, and (3) and the call to the ADJUSTBETWEENNESS algorithm (Line 10
increasing the cost of an existing edge. These are called of DELETEEDGE) completes the computation of betweenness
‘shrinking network updates’ because they usually occur centrality values.
due to existing actors/agents departing from the network or The data structures initialized in Line 2 of the DELE-
broken/weakening relationships among agents. Handling TEEDGE algorithm are initialized as data structures that are
deletion of a node with several edges reduces to several visible to all the algorithms used for handling the shrinking
edge deletions to handle each edge emanating from or network updates. The data structures initialized in Line 2 of
entering into the deleted node. Similarly, deleting an the DELETEEDGE algorithm have similar structures and
existing edge can be represented as a special case of the similar usages to those in the INSERTEDGE algorithm. The
network update that increases the cost of an edge, because Dold and rold are implemented as hash maps. The keys for
deleting an existing edge corresponds to increasing the cost the data structure Dold are composed of the related \x, y[
of an edge from a real positive value to infinity in the node identifiers, which hold the original D(x, y) value
adjacency matrix. Shrinking network updates modify the before the network update has been issued. Similarly, the
shortest paths in a network only if the modified edge/node keys for the data structure rold are composed of the related
was previously on a shortest path; resulting in a change on \x, y[ node identifiers, which hold the original r(x,
the number of shortest paths or rediscovery of the shortest y) value before the network update has been issued. The
paths among the remaining connections. Since the manner data structure used for trackLost is also a hash map whose
in which shrinking network updates affect the shortest keys are again constructed as the identifiers of the related
paths in a network is markedly different from that of nodes\x, y[. The values held in trackLost are implemented
growing network updates, it requires its own set of algo- as hash set, holding the identifiers of the nodes that were
rithms, which will be described in this section. previously intermediates on the shortest paths from node
x to node y, and that are not on the shortest paths any more.
3.1 Handling shrinking network updates
for betweenness centrality 3.3 DeleteUpdate procedure
Next, the details of the part of the incremental betweenness The core of the incremental betweenness algorithm for
algorithm that handles the shrinking network updates (e.g. handling the shrinking network updates is the DELE-
edge/node deletion or edge cost increase) are discussed. TEUPDATE algorithm (Algorithm-2 in ‘‘Appendix’’). The
There are five sub-algorithms that are used to achieve first phase of the DELETEUPDATE algorithm is between Lines
accurate maintenance and incremental update of between- 2 and 7 while the second phase is between Lines 8 and 40.
ness centrality values: DELETEEDGE, DELETEUPDATE, The first phase of the algorithm identifies the set of affected
CLEARBETWEENNESS, ADJUSTNPS, and ADJUSTBETWEENNESS. nodes. For betweenness centrality, the affected nodes are
In the rest of this section, each sub-algorithm is discussed, the nodes whose shortest paths to node z have changed in
respectively. terms of number or length. The shortest paths from a node
x to another node z may change only if the deleted/modified
3.2 DeleteEdge procedure edge is an edge that used to lie on the shortest path(s) from
node x to node z. Such nodes are inserted into the Affect-
When a shrinking network update is issued (e.g. edge/node edVertices set for further processing to find the new
deletion or edge cost increase), the entry point of execution shortest paths from each node x to node z.
is the DELETEEDGE algorithm. The DELETEEDGE algorithm The second phase of the DELETEUPDATE algorithm
calls the DELETEUPDATE algorithm several times; first, to determines the new shortest path distances from all affec-
find the complete set of affected sink and source nodes, ted nodes to node z as well as the predecessors on the
123
235 Page 6 of 23 Soc. Netw. Anal. Min. (2014) 4:235
shortest paths. In the second phase of the DELETEUPDATE The rest of the DELETEUPDATE algorithm, Lines 25–40,
algorithm, the betweenness centrality values are also processes this priority queue to see if the shortest distances
opportunistically updated for the node pairs whose previ- can be further updated. In the first part of the algorithm, if
ously known shortest paths are invalidated. In Line 11, one there is no node b that would satisfy the condition in Line
of the AffectedVertices, node a, is removed from the Af- 16 or Line 20, then no shortest paths from node a to node
fectedVertices for finding the new shortest path(s) from it z were found and the shortest distance from node a to node
to node z. If this is the first time, the shortest paths from z is forced to be infinity. Starting with Line 25 of the
node a to node z are examined, the node pair \a, z[ is DELETEUPDATE algorithm, the shortest paths that are prop-
inserted into rold and Dold to keep a record of their previ- erly discovered earlier in the algorithm are examined to see
ously known shortest distance and shortest path counts if new shortest paths that are even shorter can be discov-
before any update is made on them. In Line 14, setting ered. The impact of an incremental network update prop-
myMin to infinity starts with the assumption that this is not agates in the form of ripples expanding outward from the
a shortest path from node a to node z anymore. Since a modification point of the update. Hence, it may be possible
shrinking network update might result in disconnecting the to find shorter paths in the outer levels of these update
two nodes a and z and making node z unreachable from ripples that use shorter paths from earlier levels.
node a. In order for node a to reach to node z, there needs For each node a inserted in the priority queue Q_inc,
to be at least one immediate neighbor of node a that con- check the predecessors of node a (c 2 PredðaÞ) and see if
nects node a to node z. Therefore, each successor b of node the newly discovered shortest distance from node a to node
a is examined one by one to see which node or nodes z is useful in finding a shorter path from node c to node z.
provide the shortest path(s) to node z and a running min is For each node c, check if the path from node c to node z
kept to identify the minimum shortest path distance. If a which uses the edge fc ! ag is shorter than or equal to
new shortest path that would pass from at least one of the the currently known shortest paths from node c to node z.
successor nodes of node a cannot be found, then there When a shorter path from node c to node z is found
would be no node b that would satisfy the condition on (Line 28), also check if the node c is not in the list of
Line 16, and the value of myMin has to be chosen as affected nodes (AffVert). If node c is not an element of
infinity. This process of discovering the new shortest AffVert, check if the node pair \c, z[ has been updated
path(s) from node a to node z incrementally builds on the before during the propagation of the current network
previous knowledge on the shortest paths as it uses prior update. If it is the first time the shortest paths from node
knowledge of the shortest path distance from each suc- c to node z are to be updated, to keep track of the original
cessor node b to node z. Every time a node b that satisfies values known before the network update started, the pre-
one of the ‘shorter path’ (Line 19) or ‘equivalent to the viously known shortest distance from node c to node z and
shortest path(s)’ (Line 21) conditions is found, the prede- the shortest path count from node c to node z are inserted
cessors on the shortest paths from node a to node z need to into Dold and rold, respectively. In addition, the between-
be updated. When a new path from node a to node z that is ness centrality values of the predecessors on the shortest
strictly shorter than the currently known shortest paths is paths from node c to node z are reduced as required by the
found (Line 19), the set of predecessors on the shortest CLEARBETWEENNESS algorithm in Line 29 of the DELE-
paths from node a to node z is cleared (Line 18), and the set TEUPDATE algorithm, and then cleared in Line 31. Line 32
of predecessors from node b to node z is inserted instead. of the DELETEUPDATE algorithm updates the predecessors on
However, if node b is equal to node z in Line 21 (i.e. node the shortest paths from node c to node z to include the
a can reach to node z in one hop and the shortest path is predecessors on the shortest paths from node c to node
actually the edge from node a to node z), only node a is z that pass through node a while Line 33 updates D(c, z),
inserted into the set of predecessors on the shortest paths the shortest distance from node c to node z, to be the new
from node a to node z. shortest distance discovered through node a. Following a
After all the successor nodes are examined, in Line 22, similar reasoning to the first part of the algorithm, in Lines
the shortest path distance from node a to node z, D(a, z), is 34–35, node c is inserted into the priority queue Q_inc to
updated to hold the value of the running min, myMin. In see if there are any shortest paths that might use the
Line 24, node a is inserted into the priority queue that holds shortest paths from node c to node z as their subpaths to
the distances to node z in ascending fashion. A node is reach to node z.
inserted into this priority queue only if its shortest distance When an equivalent shortest path from node c to node
to node z is changed and if it is still able to reach to node z is found (Line 36) and node c is not an element of Aff-
z with the possibility of discovering a shorter path from Vert, the algorithm goes through steps that are similar to
node a to node z as the update continues to propagate in the those when a new shorter path is found. First, check if the
network. node pair \c, z[ has been updated before during the
123
Soc. Netw. Anal. Min. (2014) 4:235 Page 7 of 23 235
propagation of the current network update. If it is the first node a to node z, but cannot be accessed now because some
time the shortest paths from node c to node z are to be of the pointers are broken when the current shrinking
updated, to keep track of the original values, the previously network update started propagating in the network. Such
known shortest path count from node c to node z is inserted nodes are processed in the second half of the CLEARBET-
into rold. In addition, the betweenness values of the pre- WEENNESS algorithm (Lines 8–15). These nodes are inserted
decessors on the shortest paths from node c to node z are into trackLost before they are cleared later in the DELE-
reduced as required by calling the CLEARBETWEENNESS TEUPDATE algorithm when the predecessors of the invali-
algorithm in Line 37 of the DELETEUPDATE algorithm. dated shortest paths are cleared. Lines 9–15 of the
However, unlike the case when a strictly shorter path is CLEARBETWEENNESS algorithm go through the entries of
found, in this case the previously known predecessors on trackLost to find the previously existing intermediates that
the shortest paths from node c to node z are unclear. Fol- cannot be found otherwise. Such intermediates are usually
lowing a similar reasoning to the earlier parts of the lost as a part of a subpath that is lost before the current
algorithm, in Lines 39–40, node c is inserted into the pri- network update propagated this far.
ority queue Q_inc to see if there are any shortest paths that Assume that there is a path in the following form:
might use the shortest paths from node c to node z as their a….?….x.…?…. interm …?….y…?…..z. By the time
subpaths to reach to node z. the propagation of the shrinking network update reaches
the level of the shortest paths from node a to node z, the
3.4 ClearBetweenness procedure subpath(s) from node x to y might have already been
updated and the node interm may not necessarily be an
Next, CLEARBETWEENNESS algorithm (Algorithm-3 in intermediate on the shortest path(s) from node x to node
‘‘Appendix’’) is discussed. It is invoked from the DELE- y anymore. Hence, the node interm cannot be found as an
TEUPDATE algorithm when a shortest path is invalidated. intermediate for any shortest path from node a to node
The betweenness values of the intermediates on the z although the node pair \a, z[ still has a contribution on
invalidated path(s) should be reduced before their rela- the betweenness centrality value of the node interm. This is
tionships with the previously known shortest paths become why such cases are tracked in a separate data structure (e.g.
intractable. In other words, the main functionality of the trackLost) and the betweenness values of such nodes are
CLEARBETWEENNESS algorithm is to reduce the betweenness reduced later as required (Line 13 of CLEARBETWEENNESS).
values of the intermediates that lie on the old set of shortest In this example, it was known that node interm was an
paths from node a to node z, where nodes a and z are given intermediate on the shortest path(s) from node x to node
as the input parameters to the algorithm. The CLEARBET- y. However, it was not obvious that node interm was an
WEENNESS algorithm is invoked only once for each node pair intermediate on a shortest path from node a to node z. In
\a, z[ when it is first discovered that the shortest Line 14 of the CLEARBETWEENNESS algorithm, the infor-
path(s) from node a to node z needs to be updated during mation that interm is a lost intermediate of the shortest
the propagation of the current shrinking network update. path(s) from node a to node z is stored by inserting a tuple
The first part of the CLEARBETWEENNESS algorithm (Lines of a, z[, interm[into trackLost. To avoid processing the
1–7) processes the old set of intermediate nodes that are same node multiple times, a list of prior lost nodes that are
currently accessible when the shortest path(s) from node already processed is stored in the AlreadyDone set.
a to node z is traced. The betweenness centrality value of
each intermediate node v is reduced by the contribution of 3.5 AdjustNPs procedure
the shortest path(s) from node a to node z (Line 6) and
inserted into trackLost. Before their betweenness centrality Next, the ADJUSTNPS algorithm (Algorithm-4) is discussed.
values are modified, check if the intermediate node The ADJUSTNPS algorithm is called from the DELETEBET-
v belongs to the original set of intermediates that existed WEENNESS algorithm after all the shortest path distances and
before the network update or if it is one of the new inter- the predecessors on the shortest paths are updated accu-
mediates that became accessible due to currently partially rately. The main functionality of the ADJUSTNPS algorithm
updated shortest paths from node a to node z. If node v is a is to accurately update the number of shortest for all
new intermediate node, it is skipped without further pro- modified shortest paths, either by length or number. The
cessing (Line 4). An intermediate node v satisfies the fol- ADJUSTNPS algorithm loops through the list of node pairs
lowing equality if it is an old intermediate: Dold (a, \a, z[ whose shortest paths changed in terms of length or
v) ? Dold (v, z) = Dold (a, z). number. Such node pairs are stored in rold during the
Other than the nodes that are still accessible by fol- course of a shrinking network update propagation (Line 1).
lowing the currently known predecessors, there might be There are two corner cases for the number of shortest paths.
some other nodes that were once on the shortest paths from If the nodes a and z are the same, then r (a, z) = 1 by
123
235 Page 8 of 23 Soc. Netw. Anal. Min. (2014) 4:235
definition. If the distance from node a to node z is unde- where we are comparing two distances with a less than
fined (e.g. ?), then there is no shortest path from a to z, equality. To give a more precise example, in the DELE-
and r (a, z) = 0. Lines 2–5 of the ADJUSTNPS algorithm TEUPDATE algorithm provided in the ‘‘Appendix’’, we
handle these two conditions. modify Line 16, Line 20, Line 28, and Line 36 to include
The rest of the algorithm is the core of the ADJUSTNPS conditions to check that the new distance found is less
algorithm and handles the common case. The key idea is than k.
the following. If, in a network, it is possible to reach from
node a to node z via multiple paths, then each path can be 3.8 Discussion on algorithmic complexity
reconstructed separately following the predecessors on the
shortest paths. Hence, starting with the destination node z, In this section, the time and memory complexities of the
each of the predecessors is pushed onto a stack at every proposed algorithms are discussed. Earlier, it has been
level until the source node a is reached. The number of shown that an incremental algorithm can perform asymp-
times the source node a is hit is equal to the number of totically no better than its static counterpart for some
distinct routes one can take, which is represented as the dynamic problems (Even and Gazit 1985) because in the
number of shortest paths from node a to node z. Every time worst case an incremental algorithm needs to solve the
an intermediate node is processed, it is checked to see if entire problem from scratch. In the worst case, the pro-
node a is hit. If so, then the counter is incremented (Lines posed incremental betweenness centrality algorithms’
15–16; Lines 20–21). complexity is not any lower than that of Brandes algorithm
(2001).
3.6 AdjustBetweenness procedure To be able to give a comprehensive analysis, we discuss
the complexities of the algorithms proposed in this paper as
The final step required for completing a shrinking network well as the algorithms in Kas et al. (2013). The
update is the ADJUSTBETWEENNESS algorithm (Algorithm-5). INCREASEBETWEENNESS procedure runs a for loop for rold
It loops through the list of node pairs \a, z[ where the many iterations and inside the outer for loop, there is one
shortest paths from node a to node z changed in terms of for loop, and one while loop. These two loops should be
length or number. For every node pair, the shortest path considered in combination because the intermediate nodes
distances, the intermediates on the shortest paths, and the on the shortest paths from src to dest are handled by one or
number of shortest paths are already computed before the the other and the distinction is irrelevant. The complexity
ADJUSTBETWEENNESS algorithm is called. The ADJUSTBET- of the bodies of these loops is O(1), and they are executed
WEENNESS algorithm increments the betweenness value of once for each intermediate node. So, the overall complexity
each intermediate node n by the fraction of the shortest of the procedure is O(|rold| I) where I represents the total
paths from node a to node z it lies on. With this step, the number of intermediates processed for all node pairs listed
incremental update of betweenness centralities for the in rold. In the REDUCEBETWEENNESS procedure, the run time
shrinking network updates is complete. is dominated by the if block at the end (Lines 25–29 of
Algorithm-3). This block performs a search over the map
3.7 Discussion on algorithmic modifications of all known intermediate nodes on the shortest paths from
for computing k-betweenness centrality a to z and uses two intermediates at a time to form the key
to the map. Hence, its complexity is O(I2a,z) where Ia,z
In this section, the modifications of the betweenness cen- represents the number of intermediates on the shortest
trality algorithm to handle the case of k-betweenness cen- paths from a to z.
trality are discussed. In the case of any path-based metric, The overall complexity of the INSERTUPDATE procedure
the part of the algorithm that is responsible from path is dominated by the complexity of the priority queue
discovery should be modified to limit the shortest paths to Workset. The Workset is used to track all the affected
be less than or equal to a specified maximum total cost, nodes as the propagation of the update progresses. INSER-
k. Path discovery is responsible for identifying paths that TUPDATE essentially performs a traversal in the neighbor-
are shorter than the previously known paths. When a hood of every AffectedSink and AffectedSource. The work
shorter path is discovered, this newly discovered shortest performed inside the while loop is O(||Affected|| log
path is tracked for further processing as a part of the path ||Affected||) ? I2) where ||Affected|| is used to denote the
discovery. sum of the number of the edges and the nodes in the
To compute k-betweenness centrality, our algorithm was subgraph formed by AffectedSource and AffectedSink
extended to check for the distance being less than or equal nodes’ neighborhoods. Finally, the INSERTEDGE procedure
to k. Since we are interested in tracking paths that are less invokes the INSERTUPDATE procedure for each AffectedSink
than k for our computations, this check is inserted in places and AffectedSource node once, followed by an invocation
123
Soc. Netw. Anal. Min. (2014) 4:235 Page 9 of 23 235
of the INCREASEBETWEENNESS procedure, yielding O((|Af- For the ADJUSTBETWEENNESS algorithm, the overall
fectedSink| ? |AffectedSource|)||Affected||log complexity of the procedure is O(|rold| I) where I represents
||Affected||) ? I2 ? |rold| I) time complexity overall. the total number of intermediates processed for all node
Next, we discuss the time complexities of the algorithms pairs listed in rold. For the ADJUSTNPS algorithm, the time
proposed for handling shrinking network updates. Starting complexity analysis resembles that of the INCREASEBETWE-
with the REDUCEBETWEENNESS and INCREASEBETWEENNESS ENNESS. The ADJUSTNPS algorithm runs a for loop for rold
algorithms, the INCREASEBETWEENNESS algorithm runs a many iterations. Inside this outer for loop, there is one for
for loop for rold many iterations and inside the outer for loop and one while loop. These two loops should be con-
loop, there is one for loop, and one while loop. These two sidered in combination because the intermediate nodes on
loops should be considered in combination because the the shortest paths from the source to the destination are
intermediate nodes on the shortest paths from the source to handled by one or the other loop and the distinction is
the destination are handled by one or the other loop and the irrelevant. The complexity of the bodies of these loops is
distinction is irrelevant. The complexity of the bodies of O(1), and they are executed once for each intermediate
these loops is O(1), and they are executed once for each node. Therefore, the overall complexity of the procedure is
intermediate node. So, the overall complexity of the pro- O(|rold| I) where I represents the total number of interme-
cedure is O(|rold| I) where I represents the total number of diates processed for all node pairs \a, z[ listed in rold.
intermediates processed for all node pairs listed in rold. In Similarly, the time complexity analysis of the
the REDUCEBETWEENNESS algorithm, the run time is domi- CLEARBETWEENNESS algorithm is in line with the time com-
nated by the if block at the end (Lines 25–29 of plexity analysis of the REDUCEBETWEENNESS algorithm. The
REDUCEBETWEENNESS). This block performs a search over run time of the CLEARBETWEENNESS algorithm is dominated
the map of all known intermediate nodes on the shortest by the if block at the end (Lines 9–15 of CLEARBETWEEN-
paths from node x to node z and uses two intermediates at a NESS). This block performs a search over the map of all
time to form the key to the map. Hence, its complexity is known intermediate nodes on the shortest paths from node
O(I(x, z)2) where I(x, z) represents the number of inter- a to node z (nodes a and z are the two parameters given as
mediates on the shortest paths from node x to node z. input to the algorithm) and uses two intermediates at a time
The overall run-time complexity of the INSERTUPDATE to form the key to the map. Hence, its complexity is O(I(a,
algorithm is dominated by the complexity of the priority z)2) where I(a, z) represents the number of intermediates on
queue Workset. The Workset is used to track all the the shortest paths from node a to node z.
affected nodes as the shrinking network update keeps In the DELETEUPDATE algorithm, the time complexity of
propagating. The INSERTUPDATE algorithm essentially per- the second phase of the algorithm dominates the time
forms a traversal in the neighborhood of every Affected- complexity of the first phase of the algorithm. The time
Sink and AffectedSource, respectively. The work complexity of the second phase of the algorithm is gov-
performed inside the while loop is O(||Affected|| log erned by the time complexity of the AffVert and Q_inc
||Affected||) ? I2) where ||Affected|| is used to denote the priority queues maintained by the algorithm. The Q_inc
sum of the number of the edges and the nodes in the priority queue is a subset of the AffVert as only certain
subgraph formed by AffectedSource and AffectedSink elements are added into it conditionally in Line 24 of the
nodes’ neighborhoods. Finally, the INSERTBETWEENNESS DELETEUPDATE algorithm. The operations performed on the
algorithm invokes the INSERTUPDATE algorithm for each while block starting on Line 10 and the while starting on
AffectedSink and AffectedSource node once, followed by a Line 25 are similar in terms of time complexity. However,
call for the INCREASEBETWEENNESS algorithm, yielding in the worst case, all the elements in the AffVert are added
O((|AffectedSink| ? |AffectedSource|) ||Affected|| log into Q_inc. In the first while block the successors of each
||Affected||) ? I2 ? |rold| I) time complexity overall. element, and in the second while block, the predecessors of
For handling the shrinking network updates, the each element are checked, which results in probing of the
DELETEBETWEENNESS algorithm is the entry point for exe- two-hop neighborhood of the affected vertices with a time
cution. It has multiple invocations of DELETEUPDATE and at complexity of O(||AffectedVertices||2,z) where the subscript
the end; it also calls the ADJUSTNPS and ADJUSTBETWEEN- 2 denotes the size of two-hop neighborhood of all affected
NESS algorithms. The DELETEUPDATE algorithm calls the nodes and z refers to the last parameter of the algorithm
CLEARBETWEENNESS algorithm, which as a result has (i.e. the node to which the shortest path distances are
a contribution to the overall time complexity of the DELE- updated). The insertions into and extractions from a pri-
TEUPDATE algorithm. Hence, the ADJUSTNPS, the ority queue are on the order of log n for a priority queue of
ADJUSTBETWEENNESS, and the CLEARBETWEENNESS algo- size n. Hence, these operations will take O(|AffectedVer-
rithms’ time complexities are independent of the other tices| log|AffectedVertices|) time in total. The CLEARBET-
algorithms’ time complexities. WEENNESS algorithm, in the worst case, might be called for
123
235 Page 10 of 23 Soc. Netw. Anal. Min. (2014) 4:235
all the elements in the AffectedVertices, covering all pos- searched around each node no longer increases with of the
sible AffectedSource–AffectedSink node pairs, which number of nodes. If that is strictly true, then the compu-
would result in O(I2) time complexity where I represents tational complexity of the k-betweenness centrality will be
the total number of intermediates processed for all node O(||Affected|| n), which becomes O(n) when ||Affected|| is
pairs listed in rold. Hence, the overall time complexity of independent of n. The size of the constant Affected region
the DELETEUPDATE algorithm is O(||Affected||2,z ? |- will grow as k is increased, eventually reaching the com-
Affected| (log |Affected| ? I2) ? I2 ? |rold| I) where the plexity of the original betweenness centrality algorithm.
set of affected nodes (Affected) is given by the combina- And, because it is the set of shortest paths out of every
tion of AffectedSink and AffectedSource nodes. In general, node that must be maintained from one iteration to the
because real-world social networks tend to be very sparse, next, the memory complexity of the algorithm is also
the computational complexity of the proposed incremental O(||Affected|| n), which for very large social networks
algorithm is frequently much less than repeated invocations becomes O(n). Therefore, the proposed incremental
of Brandes’ algorithm. However, in terms of the upper k-betweenness centrality algorithm has the potential to be
bound worst-case time complexity, the incremental used for over-time analysis of extremely large and sparse
betweenness algorithm does not do better than the Brandes’ social networks; e.g., Facebook and Twitter.
betweenness algorithm. This is because, in the worst case,
the size of the Affected set can be as large as the entire
node set of the network, and rold can be on the order of 4 Implementation, datasets, and results
O(n2). To assess the practical advantages of the proposed
algorithms, we evaluate its performance relatively to 4.1 Implementation environment
Brandes’ algorithm for a range of synthetic and real-world
social network data sets in the next section. The algorithms were implemented on top of an open
The run-time and memory consumption analyses done source, dynamic Java graph library called GraphStream
for the incremental betweenness centrality algorithm are (GraphStream Team 2010). The performance results were
valid for the incremental k-betweenness algorithm as well. collected on a machine with a 3.20-GHz Intel Xeon CPU
However, a lot of the complexities are expressed in terms with 256 GB of RAM.
of the sizes of affected sink and source node sets, the
number of shortest paths, and the average shortest path 4.2 Performance results on synthetic networks
length. When the shortest paths in a network are bounded
to remain within a distance of k, the percentages of affected To be able to collect a set of results with well-controlled
nodes drop substantially as shown later in the results sec- network parameters, synthetic social networks were created
tion. This is because some of the nodes that were connected using four well-established stylized social network pat-
by long, multi-hop paths before are not necessarily con- terns, three different numbers of nodes, and three different
nected anymore when the paths connecting them cannot average nodal degrees; in particular, preferential attach-
have a cumulative cost higher than k. For instance, the ment networks (PF) (Barabasi and Albert 1999), Erdos–
percentages of affected nodes drop from 0.89–68.31 % Renyi (ER) networks (Renyi and Erdos 1959), small-world
range to 0.32–3.47 % range for the incremental between- (SW) networks (Watts and Strogatz 1998), and directed
ness centrality when k = 3 is used as the bounding cycles. Directed cycles are formed as a variant of small-
parameter on typical binary social network data. Therefore, world networks using the same parameters and the network
k-betweenness centrality offers significant performance generation algorithm. The only difference between directed
improvements especially for very sparse networks such as cycles and small-world networks is that in directed cycles,
retweet networks. And, researchers have already found that the edges are directed while in small-world networks the
k = 3 on binary social networks is often sufficient to make edges are undirected or bidirectional. Therefore, when the
a good approximation to actual betweenness centrality edges are directed in one direction only, the resulting
rankings (Pfeffer and Carley 2012). network is a big ring of nodes, with random bounds in it.
Moreover, one of the typical properties of real-world Such networks have been studied in the literature in dif-
social networks is that the degree (number of edges into, ferent contexts before and appear under various names
in-degree, or number of edges out of a node, out-degree) including ‘directed small world networks’, ‘asymmetric
tends to follow a distribution based on the type of nodes, small world networks’, ‘rings with random bounds’, and
but that for large networks does not grow as the number of ‘directed cycles’ (Ramezanpour and Karimipour 2008; Xu
agents grows. This is crucial because it suggests that in the 2008; Liao et al. 2011; Zhu et al. 2004). In the rest of this
limit of very large social networks, the size of the bounded- paper, these networks are referred to as ‘directed cycles’.
distance network of precursors and successors that must be First, for each synthetic network topology, the number of
123
Soc. Netw. Anal. Min. (2014) 4:235 Page 11 of 23 235
nodes is varied from 1,000 to 5,000 with a step size of separately to prior literature. We generate these synthetic
2,000, and the average degree is fixed to 6. For small-world networks with all but 100 edges that are selected randomly.
networks, the rewiring probability is 0.5. The last 100 edges were inserted incrementally to generate
For growing networks, the synthetic networks were the average update performance in terms of execution time
generated with all but 100 edges that are selected ran- over the repeated invocations of Brandes’ algorithm. Similar
domly; and the last 100 edges were inserted incrementally experiments have been designed for measuring the perfor-
to get the average update performance in terms of the mance of the proposed incremental centrality algorithms for
execution time over repeated invocations of Brandes’ shrinking network updates as well. In the experiments per-
algorithm. That is, 100 runs of the incremental between- formed for shrinking network updates, the starting point is the
ness algorithm were averaged and 100 runs of Brandes’ complete version of the network and the test incrementally
algorithm were averaged. For example, for a single update removes the same set of randomly selected edges used in the
on the 1,000-node Erdos–Renyi network, the incremental experiments for growing network updates.
betweenness algorithm is 7.99 times faster on average than Tables 3 and 4 list the average speedups obtained by the
Brandes algorithm is on average when performing exactly incremental betweenness centrality algorithm over the
the same 100 analyses. Table 1 lists basic statistical Brandes’ algorithm both for the growing and the shrinking
information about the synthetic networks generated with network updates along with the percentages of the total
different topology generation models and network sizes. In number of nodes that are affected. The percentages of
the second set of synthetic networks, for each synthetic affected nodes are calculated in terms of the size of the
network topology, the number of nodes is fixed to 3,000, combination of the two sets, AffectedSinks and Affected-
and the average number of nodes is varied from 4 to 8 with Sources. The standard deviations for the performance
a step size of 2. Table 2 lists statistical information for the improvements are added in parenthesis for each corre-
networks with varied average number of nodes, similar to sponding average performance value. The performance
the information presented in Table 1. results in Table 3 show how the performance improve-
ments of the incremental betweenness centrality algorithm
4.3 Betweenness centrality results change along with the changing network size and topology.
On the other hand, Table 4 reports the changes in the
The proposed betweenness algorithm can run growing and performance improvements along with the changing aver-
shrinking network updates in any order or combination. The age node degree and the network topology.
reason that the performance improvements are shown sepa- The performance results presented in both tables indi-
rately in the following tables is to provide opportunities for cate that the incremental betweenness algorithm provides
comparison of the performances of the incremental centrality the highest performance benefits on the preferential
algorithms for the shrinking and growing network updates attachment networks and the lowest performance
123
235 Page 12 of 23 Soc. Netw. Anal. Min. (2014) 4:235
Table 3 Performance improvements of the incremental betweenness algorithm over repeated invocations of Brandes’ betweenness algorithm
obtained over networks with different sizes and topologies
Topology Growing network updates Shrinking network updates
Betweenness speedup Betweenness Betweenness speedup Betweenness
affected % affected %
improvements on the directed cycles. Directed cycles the performance obtained; the longer the shortest paths are
provide examples of pathological test cases for the incre- in a network, the lower the performance improvements
mental betweenness centrality algorithm because they tend achieved in that network.
to have very long diameters and relatively large average In the preferential attachment networks, the character-
(characteristic) shortest path lengths both of which increase istic path length and the diameter are substantially lower
the lengths of the shortest paths to be updated. compared to the other topologies. And, the shortest paths in
Comparing the network statistics and the performance the directed cycles are substantially longer, yielding per-
results obtained on different networks, the speedup formance results that are much lower compared to the other
obtained using the incremental betweenness update algo- network types as shown in Table 3.
rithm increases with the increased network size. It is also Apart from the lengths of the shortest paths, the aver-
observed that other parameters such as network diameter, age betweenness value (unscaled) and the network size
characteristic path length, and average/min/max unscaled are also very important factors, especially the network
betweenness centrality values are inversely correlated with size. As mentioned earlier, the performance benefits of the
123
Soc. Netw. Anal. Min. (2014) 4:235 Page 13 of 23 235
Table 4 Performance improvements of the incremental betweenness algorithm over repeated invocations of the Brandes’ algorithm obtained on
networks with different topologies and different average node degrees
Topology Avg deg. Growing network updates Shrinking network updates
Betweenness speedup Betweenness Betweenness speedup Betweenness
affected % affected %
incremental betweenness update algorithm increase with network data sets. There are two main trends that stand out:
the increasing network size. When the average between- (1) the incremental k-betweenness centrality algorithm
ness values are examined, it is observed that the differ- provides higher performance improvements for the grow-
ence across different topologies is very large. This is ing network updates than they do for the shrinking network
because in preferential attachment networks, compared to updates, and (2) the performance benefits of the incre-
other network topologies, there are fewer intermediate mental k-betweenness algorithm increase with the
nodes that are on the shortest paths. Hence, when there is increasing network size. All of these observations are in
a network update, there are fewer nodes whose between- line with the results presented earlier.
ness values should be adjusted. When most of the shortest However, there is one major difference that can only be
paths in a network consist of a couple of hops only (i.e., observed in the behavior of k- betweenness centrality.
when it takes a social agent only a number of interme- When the shortest paths are not limited by a distance of k,
diaries to reach out to the majority of the network), then the maximum performance benefits are obtained on the
there are fewer nodes that lie on the shortest paths and the preferential attachment networks. However, in k-
overall depth of the shortest path tree is shorter. As a betweenness, the incremental centrality algorithms provide
result, there are fewer predecessors to be tracked and the minimum performance benefits in the preferential
maintained when there is need for reconstructing the attachment networks and the maximum performance ben-
shortest paths in the network. efits in the small-world networks. In other words, the
ordering of the topologies in terms of how much they
4.4 k-Betweenness centrality results benefit from the incorporation of the incremental approach
is reversed when the shortest paths are limited within a
Next, the performance of the incremental k-betweenness distance of k both for the incremental algorithms and the
centrality algorithms is examined both for shrinking and baseline algorithms.
growing network updates. In the performance results pre- The performance benefits of the incremental k-
sented in Tables 5 and 6, k = 2 and k = 3, respectively. betweenness algorithm depend primarily on how much
The bounded-distance version of the Brandes’ algorithm work it can avoid. In preferential attachment networks, the
(Brandes 2001) was used as the reference point for all shortest paths are mostly composed of a couple of hops,
comparisons. In Tables 5 and 6, the same experiments as with relatively smaller average characteristic path length
presented in Tables 3 and 4 were repeated. and network diameter compared to the other network
Considering the performance results provided in topologies. With the introduction of the limiting parameter
Tables 5 and 6, it is observed that the incremental cen- k, the shortest paths become even shorter but the amount of
trality algorithms provide performance benefits that are on work that can be avoided is not as high as the work that can
the order of thousands of times faster for many social be avoided in other network topologies.
123
235 Page 14 of 23 Soc. Netw. Anal. Min. (2014) 4:235
The highest performance benefits are obtained on decrease in the percentage of the nodes that are affected by
directed cycles for k-betweenness centrality algorithm. In the incremental network updates. Consider the percentages
directed cycles, some nodes are connected by the shortcuts of the networks affected by the incremental network
in the network formed by rewiring. However, there are also updates as listed in Tables 5 and 6. When the shortest paths
other node pairs whose shortest paths do not include any are not bounded by any k distance limits, the percentages of
shortcut edges and such shortest paths tend to be very long. the affected nodes are in the range of 42.07–68.32 and
When the shortest paths are confined to the 2-hop or 3-hop 36.98–72.89 % (if only the Erdos–Renyi, small-world, and
neighborhood of each node for binary data, most of the directed cycles networks are accounted for). These ranges
very long shortest paths are eliminated from computations. extend to 1.16–68.32 and 0.89–72.89 % if the preferential
In addition, how far a network update can propagate across attachment networks are included as well. When the
the network is very limited as well (limited to the k dis- shortest paths are restricted to remain within a distance of
tance neighborhood only). k, the percentages of the affected nodes remain within
The abovementioned factors result in a dramatic 0.11–3.91 % for all network types (Table 5) for k = 2
increase in the speedup that can be achieved by incre- while for k = 3, these affected percentages remain within
mental k-centrality algorithms as well as a dramatic 0.25–14.61 % (Table 6).
123
Soc. Netw. Anal. Min. (2014) 4:235 Page 15 of 23 235
When the results in Tables 5 and 6 are compared against node degrees (e.g. 4.0, 6.0, and 8.0). Considering the
one another, it is observed that the performance benefits of incremental k-betweenness results, the performance
the incremental k-centralities are higher when k = 2 and improvements provided by the incremental k-betweenness
the percentages of affected nodes are lower. However, in algorithm over the Brandes’ k-betweenness algorithm
most cases the total percentages of affected nodes are less decreases with the increasing average node degree.
than 10–15 %, mostly staying below 5 % of the entire When k = 2, the largest performance improvements
network. The underlying reasons are the same as those are observed in the directed cycles, followed by the Er-
resulting in high-performance benefits. When the shortest dos–Renyi networks, preferential attachment networks,
paths are confined to the distance k neighborhood, how far and small-world networks. This is also in line with the
a network can propagate across the network is also limited. percentage of the nodes affected by the incremental
Next, in Tables 7 and 8, the performance results of the updates. In all three different types of networks, the per-
incremental k-betweenness centrality for networks with formance improvements decrease with the increasing
varying average node degrees are presented, using the average node degree. This is because when there are more
k distance limits of k = 2 and k = 3, respectively. The immediate neighbors connected to each node, an incre-
performance results are presented in Tables 7 and 8 col- mental network update is inevitably felt by more nodes in
lected on networks with 3,000 nodes and varying average the network.
123
235 Page 16 of 23 Soc. Netw. Anal. Min. (2014) 4:235
Table 9 Topological features of real-life networks. ‘D? U?’ column displays information on the directionality of edges in the networks. D
represents directed networks while U represents undirected (bidirectional) networks
Network D? Nodes Edges Avg deg min deg Max deg Std. dev. deg Diameter Char. path Clustering
U? length coefficient
When k = 3, similar to the case where k = 2, the per- observed during a single time period. Observed interactions
formance improvements decrease with the increasing were mapped to edge cost for these data sets as follows. If
average node degree. However, when k = 2, most of the an interaction between two nodes x and y has been recorded
longer shortest paths are eliminated while k = 3 retains twice up to a certain point in time, then the edge x ? y has
some of the longer shortest paths. The impact of the the cost of 1/2. When a third update is recorded between
structure of the shortest paths become more visible and the x and y, then the cost of the edge x ? y is updated to be
performance improvements are lower than the performance 1/3. The cost of edges between non-interacting pairs is
improvements obtained when the limiting parameter k = 2. infinity. The performance of our incremental betweenness
While the specific values for the speedups obtained over update algorithm was compared against Brandes’ algo-
the baseline algorithms depend very much on the network rithm (Brandes 2001) to determine the speed up. Table 9
topology and the shortest paths in the networks, the key shows the characteristics of the real-life networks that were
takeaway point is that in the computations of the k- used to test the proposed algorithms.
betweenness centrality, the incremental approach provides We use five different real-life networks: SocioPatterns
the best performance improvements in directed cycles, (communication between conference attendees) (Isella
followed by the Erdos–Renyi, the preferential attachment 2011), Facebook-like (onlineforum communication
networks, and the small-world networks. This is the reverse between students) (Opsahl and Panzarasa 2009), HEP
of the trend observed in non-approximate versions of the Coauthorship Network (coauthorship relations between
betweenness centrality computed on the full graph and is High-Energy Physics researchers) (Kas et al. 2012b; Kas
related with the amount of work that can be pruned with the et al. 2013), P2P communication network (P2P file sharing)
incorporation of the incremental computation approach. (Gringoli 2009), and retweet network of tweets about the
In these results, small-world networks have a number of sanctions on Iran (Gringoli 2009).
properties that cause them to behave slightly differently
than the other network types. Small-world networks have
low average shortest path lengths, the networks are highly 4.6 Betweenness centrality results
clustered compared to the other network types, and their
edges are undirected, all of which help increasing the Tables 10 and 11 present the performance results of the
connectivity in the network. Hence, the k distance neigh- incremental betweenness algorithm collected on real-life
borhood of a node might be more crowded than it is for the networks. In the performance results presented in Table 10,
other network types that have the same number of nodes the networks were modeled as weighted networks while
and the same average node degree, which reflects as higher Table 11 presents performance results collected on the
portions of networks being affected and a higher number of unweighted (binary) versions of the same networks. Both
shortest paths to be updated. Tables 10 and 11 provide the percentages of the total
number of nodes that are affected by the network updates.
The performance values and the percentage of affected
4.5 Performance results on real-life networks nodes are different in unweighted versions of the networks,
when compared with the results collected on their weighted
Next, the performance of incremental betweenness cen- versions. Due to increased redundancy in the number of
trality algorithm is assessed using a number of real-life shortest paths in SocioPatterns, Email, and HEP Coau-
networks. The networks used in our evaluations are pre- thorship networks, the performance improvements of the
pared as weighted networks where the cost of an edge is incremental betweenness algorithm over the Brandes’
inversely proportional to the strength of relationship. In algorithm decreased while the percentages of the affected
many of these real-life networks, multiple interactions are nodes have increased substantially. On the other hand, the
123
Soc. Netw. Anal. Min. (2014) 4:235 Page 17 of 23 235
Table 10 Performance improvements of the incremental betweenness algorithm over repeated invocations of the Brandes’ algorithm, collected
on real-life networks
Network Growing network updates Shrinking network updates
Betweenness speedup Betweenness affected % Betweenness speedup Betweenness affected %
Table 11 Performance improvements of the incremental betweenness algorithm over the repeated invocations of the Brandes’ algorithm,
collected on real-life networks
Network Growing network updates Shrinking network updates
Betweenness speedup Betweenness affected % Betweenness speedup Betweenness affected %
performance results remain similar in weighted and because authorship networks are bidirectional and in
unweighted version of the Twitter and P2P networks where authorship networks there are several small cliques that
the redundancy of shortest paths is not a significant factor. overlap to some extent and form densely connected com-
When comparing the performance results obtained on munities. Hence, the changes on the network have wide-
the weighted and unweighted versions of the networks, it is spread impact, which causes the incremental network
observed that the percentages of affected nodes for updates that may propagate farther than they would in other
shrinking network updates demonstrate different behaviors. types of networks.
For betweenness centrality, apart from the information on
the shortest distances the number of equivalent shortest 4.7 k-Betweenness centrality results
paths from one node to another is also important. Assume
that an edge e which lies on a shortest path from a node x to Next, the performance of the incremental k-betweenness
another y is removed. When one of the shortest paths from centrality algorithm on real-life networks is discussed. For
a node x to node y changes, the betweenness values of all the performance evaluations of the k-betweenness central-
the intermediate nodes on the shortest paths from node x to ity algorithm, the unweighted (binary) versions of the real-
node y change. life networks are used. In the performance results presented
In Fig. 1, the speedup obtained relative to Brandes’ in Tables 12 and 13, the bounding parameter k is set as
algorithm for the five real-life data sets is shown on the k = 2 and k = 3, respectively. The baseline algorithm (the
same speedup vs. # nodes axes as the small-world, Erdos– Brandes’ algorithm) is also bounded by maximum distance
Renyi, and scale-free synthetic networks. From this figure, of k.
it appears that the Iran Retweet network, the Facebook Considering performance results presented in Tables 12
Forum network, and the P2P network are all similar (to first (k = 2) and 13 (k = 3), it is observed that performance
order) to the scale-free synthetic networks. The sociopat- improvements of the incremental k-centrality algorithms
terns RFID network is much smaller than the simulated are higher when the bounding parameter k is lower in
results, so it is difficult to interpret. And, the HEP coauthor general. However, the specific performance values are
network achieves substantially lower speedup than would affected by the amount of work that can be avoided by
be suggested by any of the synthetic network styles. This is setting k to a lower value. For instance, in the Email and
123
235 Page 18 of 23 Soc. Netw. Anal. Min. (2014) 4:235
Table 12 Performance benefits Topology Growing network updates (k = 2) Shrinking network updates (k = 2)
of the incremental k-centrality
algorithms obtained on real-life Betweenness Betweenness Betweenness Betweenness
networks and the portions of the speedup affected % speedup affected %
networks affected by these
changes (k = 2) SocioPatterns 10.17x 43.85 12.66x 43.90
Twitter (Iran) 9,808.72x 0.29 7,876.06x 0.31
Email 327.61x 3.72 118.77x 27.85
HEP Coauthor 38,011.02x 0.22 6,958.46x 1.54
P2P 53,980.70x 0.03 26,442.56x 0.04
Table 13 Performance benefits Topology Growing network updates (k = 3) Shrinking network updates (k = 3)
of the incremental k-centrality
algorithms obtained on real-life Betweenness Betweenness Betweenness Betweenness
networks and the portion of the speedup affected % speedup affected %
networks affected by these
changes (k = 3) SocioPatterns 8.23x 44.14 9.66x 44.17
Twitter (Iran) 9,764.90x 0.31 6,949.98x 0.33
Email 34.04x 30.26 30.69x 39.74
HEP Coauthor 3,640.68x 1.55 845.69x 7.35
P2P 52,633.31x 0.04 29,980.52x 0.04
123
Soc. Netw. Anal. Min. (2014) 4:235 Page 19 of 23 235
Fig. 2 Incremental
betweenness centrality
algorithm execution time per
edge insertion in milliseconds
over the number of edges
inserted. The number of edges is
represented in thousands where
200 on the x axes represented
200,000 edges. The execution
time per edge insertion is given
in milliseconds, 30,000
represents the 30 s
4.8 Scaling of incremental algorithm the shortest path lengths limits the Affected region around
a node, which also limits the data that needs to be kept
In Sect. 3, we discussed the scaling of the incremental from one iteration to the next. For this reason,
algorithm with the number of nodes, the number of edges, k-betweenness centrality can handle problems with mil-
and the average number of nodes along the shortest paths. In lions of nodes and 10’s of millions of edges. The exact size
this section, we carried out a practical experiment to see how limit for incremental k-betweenness centrality depends on
the physical machine limits would affect scaling for very the value of k and the precise structure of the social
large data sets. This is a concern in any incremental algo- network.
rithm because the historical data stored from previous
evaluations may become large enough to cause the machine 4.9 Comparison with QuBE algorithm (Lee et al.
to run out of memory (RAM) and to begin to store inter- 2012a, b)
mediate results in the cache storage on a disk drive. Because
the latency for access to the disk drive is much higher than The idea of the QuBE algorithm depends on estimating
the latency for access to the RAM, this can cause a dramatic the nodes whose betweenness values might change due to
increase in execution time/edge insertion in any incremental an update in a network while avoiding computation of all-
algorithm. To carry out this exploration a large Preferential pairs shortest paths. In contrast, the presented algorithm
Attachment network was synthesized as described above. depends on dynamic maintenance of all-pairs shortest
However, edge insertion was continued allowing the overall paths and the related auxiliary data. The QuBE algorithm
network to grow while keeping the average nodal out-degree covers edge insertions/deletions, leaving out node inser-
constant at approximately 6 edges/node. tions for growing networks and edge cost modifications
In Fig. 2, we see that the execution time per evaluation for weighted network types. In contrast, the presented
of the incremental algorithm abruptly increases when the algorithm supports node/edge insertions and deletions as
network reaches about 180,000 edges (and about 30,000 well as edge cost modifications for both binary and
nodes). This is primarily because the size of the informa- weighted networks.
tion stored to allow incremental updates is too large for the Providing support for weighted networks makes the
DRAM (in this case 256 GB). In general, when the net- algorithm more complex. For example, assume that there is
work starts becoming very large, the number of node pairs a path from x to y. Then, with a network update an edge
that are kept in rold, Dold, and trackLost also become larger. from node x to y is inserted into the network. In binary
In addition, each node can connect to a higher number of networks, it is obvious that no path between x and y can be
nodes, which in turn requires larger memory requirements smaller than a direct edge between x and y, and several
for the information to be stored for each node. This is one changes on the shortest paths can be maintained by con-
motivation for employing the incremental k-betweenness sidering the distance limit. However, in weighted networks,
centrality approximation for problems that are too large for when an edge from x to y is inserted, it is still necessary to
the incremental betweenness centrality algorithm. As dis- check the paths of equivalent length before ruling out all
cussed in Sect. 3, the k maximum distance constraint on previously known shortest paths between x and y.
123
235 Page 20 of 23 Soc. Netw. Anal. Min. (2014) 4:235
The algorithm presented in this paper is compared supporting addition and deletion of edges and nodes as well
against the QuBE algorithm using two of the datasets the as changes in edge weights. The goal is to avoid re-com-
authors used in their paper (Lee et al. 2012a, b); specifi- putations involved in the analysis of dynamic social net-
cally, the dataset on which QuBE performs the best (Eva) works and reflect changes triggered by a network update as
and the dataset on which QuBE performs the worst efficiently as possible. For application to extremely large
(CAGrQc). Table 14 reports the average performance social network data sets, an incremental k-betweenness
results for 100 random updates on the networks. Both centrality algorithm was also presented. Our performance
QuBE and the presented algorithm are compared against results indicate substantial performance improvements for
the Brandes’ algorithm as baseline. The presented algo- both the incremental betweenness centrality algorithm and
rithm performs 10–30 times better than the QuBE algo- the incremental k-betweenness centrality algorithm over
rithm while providing substantial improvements over the state of the art on realistic social network data.
Brandes’ algorithm. Additional analyses of speedup and
memory consumption are presented in (Kas 2013b). Acknowledgments This work is supported in part by the Defense
Threat Reduction Agency (HDTRA11010102), and by the center for
Computational Analysis of Social and Organizational Systems (CA-
SOS) at Carnegie Mellon. The views and conclusions contained in
5 Conclusion this document are those of the authors and should not be interpreted as
representing the official policies, either expressed or implied by the
DTRA or the US government. This work is done when Miray Kas was
This paper proposes an incremental betweenness algorithm a Ph.D. student in Carnegie Mellon University’s Electrical and
that performs dynamic maintenance of betweenness values Computer Engineering department.
123
Soc. Netw. Anal. Min. (2014) 4:235 Page 21 of 23 235
Appendix
123
235 Page 22 of 23 Soc. Netw. Anal. Min. (2014) 4:235
123
Soc. Netw. Anal. Min. (2014) 4:235 Page 23 of 23 235
Demetrescu C, Italiano GF (2004) A new approach to dynamic all Lee MJ, Lee J, Park JY, Choi R, Chung CW (2012b) QUBE: a quick
pairs shortest paths. J ACM (JACM) 51(6):968–992 algorithm for updating BEtweenness centrality. In: Proceedings
Demetrescu C, Italiano GF (2006) Experimental analysis of dynamic of the 21st international conference on World Wide Web
all pairs shortest path algorithms. ACM Trans Algorithms (WWW). ACM, Lyon, pp 351–360
(TALG) 2(4):578–601 Lerman K, Ghosh R, Kang JH (2010) Centrality metric for dynamic
Dijkstra E (1959) A note on two problems in connexion with graphs. networks. 8th workshop on mining and learning with graphs
Numer Math 1(1):269–271 (MLG). ACM, pp 70–77
Even S, Gazit H (1985) Updating distances in dynamic graphs. Leskovec J, Kleinberg J, Faloutsos C (2007) Graph evolution:
Methods Oper Res 49:371–387 densification and shrinking diameters. ACM Trans KDD
Floyd R (1962) Algorithm 97: shortest path. Commun ACM 5(6):345 1(2):1–41
Fredman ML, Tarjan RE (1984). Fibonacci heaps and their uses in Leskovec J, Huttenlocher DP, Kleinberg JM (2010) Governance in
improved network optimization algorithms. 25th annual sympo- social media: a case study of the Wikipedia promotion process.
sium on foundations of computer science. IEEE, pp 338–346 The international AAAI conference on weblogs and social media
Freeman LC (1977) A set of measures of centrality based on (ICWSM)
betweenness. Sociometry 40(1):35–41 Liao W, Ding J, Marinazzo D, Xu Q, Wang Z, Yuan C et al (2011)
GraphStream Team (2010) GraphStream. Retrieved 3 February 2012. Small-world directed networks in the human brain: multivariate
http://graphstream-project.org/ Granger causality analysis of resting-state fMRI. Neuroimage
Green O, McColl R, Bader DA (2012). A fast algorithm for streaming 54(4):2683–2694
betweenness centrality. International Conference on Privacy, Norlen K, Lucas G, Gebbie M, Chuang J (2002) EVA: extraction,
Security, Risk and Trust (PASSAT) and International Confer- visualization and analysis of the telecommunications and media
ence on Social Computing (SocialCom). IEEE, Amsterdam, ownership network. International telecommunications society
pp 11–20 14th biennial conference
Gringoli FE (2009) GT: picking up the truth from the ground for Onnela JP, Saramäki J, Hyvönen J, Szabó G, Lazer D, Kaski K et al
internet traffic. Comput Commun Rev 39(5):13–18 (2007) Structure and tie strengths in mobile communication
Habiba H, Tantipathananandh C, Berger-Wolf T (2007) Betweenness networks. Proc Natl Acad Sci 104(18):7332–7336
centrality measure in dynamic networks. University of Illinois at Opsahl T, Panzarasa P (2009) Clustering in weighted networks. Soc
Chicago, Department of Computer Science. DIMACS, Chicago Netw 31(2):155–163
Isella LE (2011) What’s in a crowd? Analysis of face-to-face Pfeffer J, Carley KM (2012) k-Centralities: local approximations of
behavioral networks. J Theor Biol 271(1):166–180 global measures based on shortest paths. WWW. ACM,
Jiang K, Ediger D, Bader DA (2009) Generalizing k-betweenness pp 1043–1050
centrality using short paths and a parallel multithreaded imple- Puzis R, Zilberman P, Elovici Y, Dolev S, Brandes U (2012)
mentation. International conference on parallel processing Heuristics for speeding up betweenness centrality computation.
(ICPP), Vienna, pp 542–549 Social computing and on privacy, security, risk and trust. IEEE
Kas M (2013b) Incremental centrality algorithms for dynamic Computer Society, pp 302–311
network analysis. Ph.D. Dissertation, Carnegie Mellon Univer- Ramalingam G, Reps T (1991a) On the computational complexity of
sity, ECE, Pittsburgh incremental algorithms. Technical report, University of Wiscon-
Kas M, Wachs M, Carley L, Carley K (2012a). Incremental centrality sin at Madison
computations for dynamic social networks. XXXII international Ramalingam G, Reps T (1991b) On the computational complexity of
sunbelt social network conference (Sunbelt 2012). INSNA, incremental algorithms. Technical Report, University of Wis-
Redondo Beach consin at Madison, Computer Science, Madison
Kas M, Carley KM, Carley LR (2012b) Trends in science networks: Ramezanpour A, Karimipour V (2008) Simple models of small world
understanding structures and statistics of scientific networks. networks with directed links. Sharif University of Technology,
Social network analysis and mining (SNAM) Department of Physics, Tehran
Kas M, Wachs M, Carley K, Carley L (2013) incremental algorithm Renyi A, Erdos P (1959). On random graphs. Publicationes Math-
for updating betweenness centrality in dynamically growing ematicae, 6
networks. The 2013 IEEE/ACM international conference on Tang J, Musolesi M, Mascolo C, Latora V, Nicosia V (2010)
advances in social networks analysis and mining (ASONAM). Analysing information flows and key mediators through tempo-
IEEE, Niagara Falls ral centrality metrics. 3rd workshop on social network systems
Kim H, Anderson R (2012) Temporal node centrality in complex (SNS). April
networks. Phys Rev E 85(026107):1–8 Watts D, Strogatz S (1998) Collective dynamics of ‘small-world’
King, V. (1999). Fully dynamic algorithms for maintaining all-pairs networks. Nature 393:440–442
shortest paths and transitive closure in digraphs. 40th annual Xu J (2008) Markov chain small world model with asymmetric
symposium on foundations of computer science. IEEE, pp 81–89 transition probabilities. Electron J Linear Algebra 17:616–636
Kourtellis N, Alahakoon T, Simha R, Iamnitchi A, Tripathi R (2013) Yang J, Leskovec J (2011). Patterns of temporal variation in online
Identifying high betweenness centrality nodes in large social media. International conference on web search and data minig
networks. Soc Netw Anal Min (SNAM) 4(3):899–914 (WSDM). ACM
Lee MJ, Lee J, Park JY, Choi R, Chung CW (2012a) QUBE: a Quick Zhu C, Xiong S, Tian Y, Li N, Jiang K (2004). Scaling of directed
algorithm for updating BEtweenness centrality. WWW, ACM, dynamical small-world networks with random responses. Phys
pp 351–360 Rev Lett 92 218702(24):1–4
123