
Burstiness and Memory in Complex Systems: Offprint


OFFPRINT

Burstiness and memory in complex systems


K.-I. Goh and A.-L. Barabási
EPL, 81 (2008) 48002

February 2008
EPL, 81 (2008) 48002 www.epljournal.org
doi: 10.1209/0295-5075/81/48002

Burstiness and memory in complex systems


K.-I. Goh¹,² and A.-L. Barabási¹,³
¹ Center for Complex Network Research and Department of Physics, University of Notre Dame - Notre Dame, IN 46556, USA
² Department of Physics, Korea University - Seoul 136-713, Korea
³ Department of Physics, Biology, and Computer Science, Northeastern University - Boston, MA 02115, USA

received 9 September 2007; accepted in final form 13 December 2007


published online 17 January 2008

PACS 89.75.-k – Complex systems


PACS 05.45.Tp – Time series analysis

Abstract – The dynamics of a wide range of real systems, from email patterns to earthquakes,
display a bursty, intermittent nature, characterized by short timeframes of intense activity followed
by long times of no or reduced activity. The understanding of the origin of such bursty patterns
is hindered by the lack of tools to compare different systems using a common framework. Here we
propose to characterize the bursty nature of real signals using orthogonal measures quantifying
two distinct mechanisms leading to burstiness: the interevent time distribution and the memory.
We find that while the burstiness of natural phenomena is rooted in both the interevent time
distribution and memory, for human dynamics memory is weak, and the bursty character is due
to the changes in the interevent time distribution. Finally, we show that current models lack in
their ability to reproduce the activity pattern observed in real systems, opening up avenues for
future work.

Copyright © EPLA, 2008

The dynamics of most complex systems is driven by the loosely coordinated activity of a large
number of components, such as individuals in the society or molecules in the cell. While we have
witnessed much progress in the study of the networks behind these systems [1–4], advances towards
understanding the system's dynamics have been slower. With increasing potential to monitor the
time-resolved activity of most components of selected complex systems, such as time-resolved
email [5–7], web browsing [8], and gene expression [9,10] patterns, we have the opportunity to ask
an important question: is the dynamics of complex systems governed by generic organizing
principles, or does each system have its own distinct dynamical features? While it is difficult to
offer a definite answer to this question, a common feature across many systems is increasingly
documented: the burstiness of the system's activity patterns. Bursts, vaguely corresponding to
significantly enhanced activity levels over short periods of time followed by long periods of
inactivity, have been observed in a wide range of systems, from email patterns [6] to
earthquakes [11,12] and gene expression [9]. Yet burstiness is often more of a metaphor than a
quantitative feature, and opinions about its origin diverge. In human dynamics, burstiness has
been reduced to the fat-tailed nature of the response time distribution [6,7], in contrast with
earthquakes and weather patterns, where memory effects appear to play a key role as well [13,14].
Once present, burstiness can affect the spreading of viruses [15] or resource allocation [16,17].
Also, deviations towards a regular, "anti-bursty" behavior in heartbeat indicate disease
progression [18]. Given the diversity of systems in which it emerges, there is a need to place
burstiness on a firmer quantitative basis. Our goal in this letter is to take a step in this
direction, by developing a diagnostic tool that can help quantify the magnitude and potential
origin of the bursty patterns seen in different real systems. Such a tool may also lend insights
into the analysis of fractal and self-similar bursty signals [19].

Let us consider a system whose components have a measurable activity pattern that can be mapped
into a discrete signal, recording the moments when some events take place, like an email being
sent, or a protein being translated¹. The activity pattern is random (Poisson process) if the
probability of an event is time-independent. In this case the interevent time, τ, between two
consecutive events follows an exponential distribution, P_P(τ) ∼ exp(−τ/τ_0) (fig. 1a).

¹ For systems with a continuous signal, we may adopt a threshold method to transform it into a
discrete one, and in many systems the statistical properties of the obtained signal are known to
be threshold-independent [12,13].

[Figure 1 omitted: signal panels (a)–(e), each drawn against time t from 0 to 50.]

Fig. 1: (a) A signal generated by a Poisson process with a unit rate (B = −0.05, M = 0.02; see
text for the definitions of B and M parameters). (b,c) Bursty character through the interevent
time distribution: A bursty signal (B = 0.44, M = −0.04) generated by the power law interevent
time distribution P(τ) ∼ τ^(−1) (b), and an anti-bursty signal (B = −0.81, M = −0.02) generated by
the Gaussian interevent time distribution with m = 1 and σ = 0.1 (c). A bursty-looking signal can
emerge through memory as well. For example, the bursty-looking signal shown in (d) (M = 0.90) is
obtained by shuffling the Poisson signal of (a) to increase the memory effect. A more regular
looking signal, with negative memory, is obtained by the same shuffling procedure (e) (M = −0.74).
Note that signals in (a), (d) and (e) have identical interevent time distribution, thus the same
B-value.

An apparently bursty (or anti-bursty) signal emerges if P(τ) is different from the exponential,
such as the bursty pattern of fig. 1b, or the more regular pattern of fig. 1c. Yet, the change in
the interevent time distribution is not the only way to generate a bursty signal. For example, the
signals shown in fig. 1d,e have exactly the same P(τ) as in fig. 1a, yet they have a more bursty
or a more regular character. This is achieved by introducing memory: in fig. 1d the short
interevent times tend to follow short ones, resulting in a bursty look. In fig. 1e the relative
regularity is due to the memory effect acting in the opposite direction: short (long) interevent
times tend to be followed by long (short) ones. Therefore, the apparent burstiness of a signal can
be rooted in two mechanistically different deviations from a Poisson process: changes in the
interevent time distribution or memory. To distinguish these orthogonal effects, we consider two
measures, the burstiness parameter B based on the interevent time distribution and the memory
parameter M based on the interevent time correlations, that quantify the degree of each effect in
real signals.

[Figure 2 omitted: B (from −1 to 1) plotted against (a) u and (b) s on logarithmic axes from 0.01
to 100; insets show P(τ) versus τ for u = 10, u = 0.1, s = 0.1 and s = 2.]

Fig. 2: The burstiness parameter B for (a) the stretched exponential and (b) log-normal
interevent time distributions. Both distributions interpolate between a highly bursty (B = 1),
neutral (B = 0), and a regular (B = −1) signal. Insets show the form of P(τ) in the bursty and
anti-bursty regimes of each distribution along with a typical time signal generated with the
corresponding P(τ). The dashed line in the insets refers to the exponential distribution for the
Poisson process.

Distribution-based measure. – We may characterize the deviation from the Poisson signal in several
ways. Perhaps the simplest measure in the literature would be the so-called coefficient of
variation, defined as the ratio of the standard deviation to the mean, σ_τ/m_τ, where m_τ and σ_τ
are the mean and the standard deviation of P(τ), respectively. It has the value 1 for a Poisson
signal with the exponential P(τ), 0 for completely regular δ function-like P(τ), and ∞ for signals
with a heavy-tailed P(τ) with infinite variance. Higher moments of the distribution such as
skewness or kurtosis, or a more complicated measure based on the area between P(τ) and the
exponential or that between its cumulative functions may also be used for this purpose. Here we
use the coefficient of variation to define a burstiness parameter B as

$$ B \equiv \frac{\sigma_\tau/m_\tau - 1}{\sigma_\tau/m_\tau + 1} = \frac{\sigma_\tau - m_\tau}{\sigma_\tau + m_\tau}. \qquad (1) $$

This definition is meaningful when both the mean and the standard deviation of P(τ) exist, which
is always the case for real-world finite signals. When meaningful, B has a value in the bounded
range (−1, 1), and its magnitude correlates with the signal's burstiness: B = 1 is the most bursty
signal, B = 0 is neutral, and B = −1 corresponds to a completely regular (periodic) signal. For
example, in fig. 2a we show B for the stretched exponential distribution,

$$ P_{\mathrm{SE}}(\tau) = u\,(\tau/\tau_0)^{u-1} \exp[-(\tau/\tau_0)^{u}]/\tau_0 , \qquad (2) $$

often used to approximate the interevent time distributions of complex systems [20]. Here the
smaller the parameter u is, the burstier is the signal, and for u → 0, P_SE(τ) follows a power law
with the exponent −1, for which B = 1. For u = 1, P_SE(τ) is simply the exponential distribution
with B = 0. Finally, for u > 1, the larger u is, the more regular is the signal, and for u → ∞,
P(τ) converges to a Dirac delta function with B = −1.
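As a hedged illustration of eq. (1), the Python sketch below estimates B from samples of interevent times. It is not taken from the letter: the helper name `burstiness`, the sample size and the seed are assumptions, and the population standard deviation is used for σ_τ. The stretched exponential of eq. (2) has the same functional form as the Weibull density with shape u and scale τ_0, so the standard-library generator `random.weibullvariate(scale, shape)` is used to sample it.

```python
import math
import random


def burstiness(taus):
    """Estimate B = (sigma - m) / (sigma + m) from interevent times, eq. (1)."""
    n = len(taus)
    m = sum(taus) / n
    sigma = math.sqrt(sum((tau - m) ** 2 for tau in taus) / n)
    return (sigma - m) / (sigma + m)


rng = random.Random(1)
n = 50000

# Exponential interevent times (Poisson process): B should be close to 0.
b_poisson = burstiness([rng.expovariate(1.0) for _ in range(n)])

# Strictly periodic signal: sigma = 0, so B = -1.
b_periodic = burstiness([1.0] * n)

# Stretched exponential (Weibull) with u < 1: heavier tail, B > 0.
b_bursty = burstiness([rng.weibullvariate(1.0, 0.5) for _ in range(n)])

# Stretched exponential (Weibull) with u > 1: narrower than exponential, B < 0.
b_regular = burstiness([rng.weibullvariate(1.0, 5.0) for _ in range(n)])

print("Poisson  B =", round(b_poisson, 3))
print("periodic B =", round(b_periodic, 3))
print("u = 0.5  B =", round(b_bursty, 3))
print("u = 5    B =", round(b_regular, 3))
```

For the Weibull form the moments are known in closed form (m_τ = τ_0 Γ(1 + 1/u) and ⟨τ²⟩ = τ_0² Γ(1 + 2/u)), which gives B ≈ 0.38 for u = 0.5 and B ≈ −0.63 for u = 5; the sampled estimates should scatter around these values.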
[Figure 3 omitted: panels (a) Email, (b) Text, (c) Heartbeat, showing τ_0 P(τ) versus τ/τ_0;
insets show the unscaled distributions.]

Fig. 3: Interevent time distributions P(τ) for some real signals. (a) P(τ) for the email activity
of individuals from a University [4]. τ corresponds to the time interval between two emails sent
by the same user. (b) P(τ) for the occurrence of a letter in the text of C. Dickens' David
Copperfield [25]. Here τ corresponds to the number of letters between two consecutive occurrences
of the same alphabet letter. (c) P(τ) of the cardiac rhythm of individuals [30]. Each event
corresponds to the beat in the heartbeat signal and τ is the time interval between two consecutive
heartbeats for an individual. In each panel, we also show for reference the exponential interevent
time distribution (dotted). Unscaled interevent time distributions are shown in the inset for each
dataset.

We also show in fig. 2b the behavior of B for the log-normal distribution,

$$ P_{\mathrm{LN}}(\tau) = \frac{1}{\tau s \sqrt{2\pi}} \exp\left(-\frac{[\ln(\tau) - \mu]^2}{2 s^2}\right), \qquad (3) $$

also frequently used for the statistics of complex systems [21,22]. Here the larger the parameter
s is, the larger is the variance of P(τ) hence the signal gets burstier (B → 1). The smaller s is,
the more regular is the signal, and B approaches −1 as s → 0. We note however that even though B
becomes zero for a specific value s* (fig. 2b), P(τ) does not become an exponential there, which
is a caveat of the present measure.

Most complex systems display a remarkable heterogeneity: some components may be very active, and
others much less so. For example, some users may send dozens of emails during a day, while others
only one or two. To combine the activity levels of such different components, we can group the
signals based on their average activity level, and measure P(τ) only for components with similar
activity level. As the insets in fig. 3 show, the obtained curves are systematically shifted. If
we plot, however, τ_0 P(τ) as a function of τ/τ_0, where τ_0 is the average interevent time, the
data collapse into a single curve F(x) (fig. 3), indicating that the interevent time distribution
follows P(τ) = (1/τ_0) F(τ/τ_0), where F(x) is independent of the average activity level of the
component, and represents a universal characteristic of the particular system [12,23,24]. This
raises an important question: will B depend on τ_0? The burstiness parameter B is indeed invariant
under the time rescaling as τ̃ ≡ τ/τ_0 and P̃(τ̃) ≡ τ_0 P(τ) with a constant τ_0. Such an invariance
enables us to assign to each system a characteristic burstiness parameter, despite the different
activity level of its components. The scaling in fig. 3 could be a starting point of further
theoretical work, aiming to answer how generic it is and what is the mechanism behind it.
Currently, we have only a partial answer to these questions for specific systems [23].

Correlation-based measure. – The way we can characterize the correlation properties of a signal is
not unique either. The joint probability distribution parameterized by a time lag k, P(τ, τ′; k),
defined as the probability density that we have two interevent times τ and τ′ separated by k
events, contains the most information about the two-point correlation properties. The
autocorrelation function C(k) = ⟨(τ_i − m_τ)(τ_{i+k} − m_τ)⟩/σ_τ^2, where ⟨·⟩ means the average
over the index i, is also widely used in many applications. A simple measure is offered by the
correlation coefficient of consecutive interevent time values (τ_i, τ_{i+1}), defining the memory
coefficient M as

$$ M \equiv \frac{1}{n_\tau - 1} \sum_{i=1}^{n_\tau - 1} \frac{(\tau_i - m_1)(\tau_{i+1} - m_2)}{\sigma_1 \sigma_2}, \qquad (4) $$

where n_τ is the number of interevent times measured from the signal and m_1 (m_2) and σ_1 (σ_2)
are the sample mean and sample standard deviation of τ_i's (τ_{i+1}'s), respectively
(i = 1, ..., n_τ − 1). Note that M is a biased estimator for C(k = 1), which is more appropriate
for real-world finite signals, particularly if there are possible long-range correlations in the
system. With this definition, the memory coefficient has a value in the range (−1, 1) and is
positive when a short (long) interevent time tends to be followed by a short (long) one, and it is
negative when a short (long) interevent time is likely to be followed by a long (short) one.
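The memory coefficient of eq. (4) can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' code: the helper name `memory_coefficient` is hypothetical, and σ_1 and σ_2 are taken as population standard deviations over the n_τ − 1 pairs, so that M coincides with the Pearson correlation coefficient of (τ_i, τ_{i+1}). The two reorderings used to induce positive and negative memory (sorting, and alternating values from the lower and upper halves) are simple stand-ins chosen here; the letter does not specify its shuffling procedure.

```python
import math
import random


def memory_coefficient(taus):
    """Correlation coefficient of consecutive interevent times, eq. (4)."""
    x = taus[:-1]  # tau_i
    y = taus[1:]   # tau_{i+1}
    n = len(x)
    m1 = sum(x) / n
    m2 = sum(y) / n
    s1 = math.sqrt(sum((a - m1) ** 2 for a in x) / n)
    s2 = math.sqrt(sum((b - m2) ** 2 for b in y) / n)
    return sum((a - m1) * (b - m2) for a, b in zip(x, y)) / (n * s1 * s2)


rng = random.Random(2)
taus = [rng.expovariate(1.0) for _ in range(20000)]

# Independent exponential interevent times: no memory expected.
m_random = memory_coefficient(taus)

# Same values in sorted order: short follows short, long follows long.
m_sorted = memory_coefficient(sorted(taus))

# Same values with short and long ones alternating: negative memory.
ordered = sorted(taus)
half = len(ordered) // 2
low, high = ordered[:half], ordered[half:]
rng.shuffle(low)
rng.shuffle(high)
alternating = [tau for pair in zip(low, high) for tau in pair]
m_alternating = memory_coefficient(alternating)

print("random      M =", round(m_random, 3))
print("sorted      M =", round(m_sorted, 3))
print("alternating M =", round(m_alternating, 3))
```

All three sequences contain exactly the same interevent times, so they share the same P(τ) and therefore the same B; only the ordering, and hence M, differs.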
[Figure 4 omitted: (a) B versus M, both from −1 to 1, with clusters labelled human activities,
natural phenomena, texts and heartbeat; (b) close-up with M from −0.4 to 0.4 and B from −0.1 to
0.7.]

Fig. 4: (Color online) (a) The (M, B) phase diagram. Human activities (red) are captured by
activity patterns pertaining to email (⋆) [5], library loans (◦) [7], and printing () [28] of
individuals in Universities, call center record at an anonymous bank () [29], and phone initiation
record from a mobile phone company (⋄). Data for natural phenomena (black) are earthquake records
in Japan (•) [26] and daily precipitation records in New Mexico, USA () [27]. Data for written
texts (blue) [25] are the English text of David Copperfield (△) and the Hungarian text of Isten
Rabjai by Gárdonyi Géza (▽). Data for physiological behaviors (green) are the normal sinus rhythm
() and the cardiac rhythm with CHF () of human subjects [30]. The dark-grey area is the region
occupied by the 2-state model [34]. (b) Close-up of the most populated region (light-grey region
in (a)). Data in each class are indicated by grouping with the respective dimmer color for the
eye.

For example, the synthetic signals shown in figs. 1(a,d,e) with identical P(τ) have the memory
coefficient M = 0.02 (neutral; a), M = 0.90 (positive memory; d) and M = −0.74 (negative memory;
e), respectively.

Mapping complex systems on the (M, B)-space. – Given that the burstiness of a signal can have two
qualitatively different origins, it is desirable to characterize real-world complex systems by
quantifying both effects, using the corresponding B and M parameters to place them in a
(M, B)-space (fig. 4). As a first example, we measured the spacing between the consecutive
occurrences of the same letter in written texts of different kinds, eras, and languages [25]. For
these signals, we find B ≈ 0, i.e., the interevent time distribution follows closely an
exponential (fig. 3b) and M ≈ 0.01, indicating the lack of short-term memory. Thus, this signal is
at the origin of the phase diagram (fig. 4). In contrast, natural phenomena, like earthquakes [26]
and weather patterns [27] are in the vicinity of the diagonal, indicating that P(τ) and memory
equally contribute to their bursty character. The situation is quite different, however, for human
activities, ranging from email and phone communication to web browsing and library visitation
patterns [5,7,8,28,29]. For these we find a high B and small or negligible M, indicating that
while these systems display significant burstiness rooted in P(τ), memory plays a small role in
their temporal inhomogeneity. This lack of memory is quite unexpected, as it suggests the lack of
predictability in these systems in contrast with natural phenomena, where strong memory effects
could lead to predictive tools. Finally, for cardiac rhythms describing the time interval between
two consecutive heartbeats (fig. 3c) [30], we find B_healthy = −0.69(6) for healthy individuals
and B_CHF = −0.8(1) for patients with congestive heart failure (CHF), both signals being highly
regular. Thus the B parameter captures the fact that cardiac rhythm is more regular with CHF than
in the healthy condition [18]. Furthermore, we find M ≈ 0.97, indicating that memory also plays an
important role in the signal's regularity.

The discriminative nature of the (M, B) phase diagram is illustrated by the clustering of the
different systems in the plane: human-activity patterns locate themselves in the high-B, low-M
region, natural phenomena near the diagonal, heartbeats in the high-M, negative-B region and texts
near the origin, suggesting the existence of distinct classes of dynamical mechanisms driving the
temporal activity in these systems. It will also be interesting to study how chaotic (real or
model-generated) signals are placed in the (M, B)-plane, and whether there exist clear boundaries
in the phase diagram separating systems into distinct classes.

Discussion. – Following the clustering of the empirical measurements in the phase diagram, a
natural question emerges: to what degree can current models reproduce the observed quantitative
features of bursty processes? Queueing models, proposed to capture human-activity patterns, are
designed to capture the waiting times of the tasks, rather than interevent times [6,7,31–33].
Therefore, placing them on the phase diagram is not meaningful. A bursty signal can be generated
by the 2-state model [34].
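A two-state generator of this kind can be sketched in a few lines of Python. The sketch is an illustration under explicit assumptions and is not the construction of ref. [34]: the generator holds one of two Poisson rates (called `lam0` and `lam1` here), draws an exponential interevent time at the current rate, and after every event switches to the other rate with probability `p_switch`. All names, parameter values and the choice to re-evaluate the state after every event are assumptions made for this example.

```python
import math
import random


def two_state_interevent_times(lam0, lam1, p_switch, n, seed=0):
    """Interevent times from a two-rate switching Poisson generator (sketch)."""
    rng = random.Random(seed)
    rates = (lam0, lam1)
    state = 0
    taus = []
    for _ in range(n):
        # Exponential waiting time at the rate of the current state.
        taus.append(rng.expovariate(rates[state]))
        # After each event, switch state with probability p_switch.
        if rng.random() < p_switch:
            state = 1 - state
    return taus


def burstiness(taus):
    """B = (sigma - m) / (sigma + m)."""
    n = len(taus)
    m = sum(taus) / n
    sigma = math.sqrt(sum((tau - m) ** 2 for tau in taus) / n)
    return (sigma - m) / (sigma + m)


def memory_coefficient(taus):
    """Correlation coefficient of consecutive interevent times."""
    x, y = taus[:-1], taus[1:]
    n = len(x)
    m1, m2 = sum(x) / n, sum(y) / n
    s1 = math.sqrt(sum((a - m1) ** 2 for a in x) / n)
    s2 = math.sqrt(sum((b - m2) ** 2 for b in y) / n)
    return sum((a - m1) * (b - m2) for a, b in zip(x, y)) / (n * s1 * s2)


# Equal rates reduce to a single Poisson process: B and M near 0.
taus_equal = two_state_interevent_times(1.0, 1.0, 0.1, 50000, seed=3)
b_equal, m_equal = burstiness(taus_equal), memory_coefficient(taus_equal)

# Very different rates with rare switching: positive B and positive M.
taus_mixed = two_state_interevent_times(0.1, 10.0, 0.1, 50000, seed=3)
b_mixed, m_mixed = burstiness(taus_mixed), memory_coefficient(taus_mixed)

print("equal rates: B =", round(b_equal, 3), " M =", round(m_equal, 3))
print("mixed rates: B =", round(b_mixed, 3), " M =", round(m_mixed, 3))
```

Under these assumptions the interevent time distribution is an equal-weight mixture of two exponentials, and both B and M grow as the two rates move apart and as switching becomes rarer.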
The 2-state model is a probabilistic automaton with two internal states q_0 and q_1, in each of
which the system performs a Poisson process with rates λ_0 and λ_1, respectively. Each time the
system switches its state (changes λ) with probability p, or remains in its current state with
probability (1 − p). Thus the system alternates between two Poisson processes randomly, generating
a bursty signal when λ_0 ≠ λ_1. The B and M parameters for the 2-state model can be calculated
analytically [35]. The region in the (M, B)-space occupied by the 2-state model with different λ
rates and switching probability p is shown as the dark-grey area in fig. 4a, suggesting that the
model could account for some of the observed behaviors by tuning its parameters. Yet, the
agreement is misleading: for example, P(τ) of real bursty systems is often skewed and fat-tailed,
which is not the case for the 2-state model for which we have the sum of two exponentials. This
indicates that B and M parameters offer only a first-order discrimination for the origin of the
burstiness. More sophisticated measures are needed to improve the comparison between models and
real systems by, e.g., using the full functional form of P(τ) and the autocorrelation function, or
by developing measures to capture long-term correlations and non-linear effects present in real
systems, such as those exhibiting self-organized criticality or chaotic behavior [36,37]. These
topics deserve further investigation. This discrepancy also indicates the lack of satisfactory
modeling tools to capture the detailed mechanisms responsible for the bursty activities seen in
real complex systems, opening up possibilities for future work.

∗∗∗

We would like to thank S. Havlin and A. Vázquez for helpful discussions, and an anonymous reviewer
for comments on the manuscript. This work is supported by the S. McDonnell Foundation and the
National Science Foundation under Grant No. CNS-0540348 and ITR DMR-0426737. K-IG is also
supported by the Korea Research Foundation Grant funded by the Korean Government (MOEHRD)
(KRF-2007-331-C00111).

REFERENCES

[1] Albert R. and Barabási A.-L., Rev. Mod. Phys., 74 (2001) 47.
[2] Boccaletti S., Latora V., Moreno Y., Chavez M. and Hwang D.-U., Phys. Rep., 424 (2006) 175.
[3] Newman M. E. J., Barabási A.-L. and Watts D. J. (Editors), Structure and Dynamics of Complex Networks (Princeton University Press, Princeton) 2006.
[4] Caldarelli G., Scale-free networks (Oxford University Press, Oxford) 2007.
[5] Eckmann J. P., Moses E. and Sergi D., Proc. Natl. Acad. Sci. U.S.A., 101 (2004) 14333.
[6] Barabási A.-L., Nature, 435 (2005) 207.
[7] Vázquez A., Oliveira J. G., Dezső Z., Goh K.-I., Kondor I. and Barabási A.-L., Phys. Rev. E, 73 (2006) 036127.
[8] Dezső Z., Almaas E., Lukacs A., Racz B., Szakadat I. and Barabási A.-L., Phys. Rev. E, 73 (2006) 066132.
[9] Golding I., Paulsson J., Zawilski S. M. and Cox E. C., Cell, 123 (2005) 1025.
[10] Chubb J. R., Trcek T., Shenoy S. M. and Singer R. H., Curr. Biol., 16 (2006) 1018.
[11] Bak P., Christensen K., Danon L. and Scanlon T., Phys. Rev. Lett., 88 (2002) 178501.
[12] Corral A., Phys. Rev. E, 68 (2003) 035102(R).
[13] Bunde A., Eichner J. F., Kantelhardt J. W. and Havlin S., Phys. Rev. Lett., 94 (2005) 048701.
[14] Livina V. N., Havlin S. and Bunde A., Phys. Rev. Lett., 95 (2005) 208501.
[15] Vázquez A., Rácz B., Lukács A. and Barabási A.-L., Phys. Rev. Lett., 98 (2007) 158702.
[16] Leland W. E., Taqqu M. S., Willinger W. and Wilson D. V., IEEE/ACM Trans. Netw., 2 (1994) 1.
[17] Paxson V. and Floyd S., IEEE/ACM Trans. Netw., 3 (1995) 226.
[18] Thurner S., Feurstein M. C. and Teich M. C., Phys. Rev. Lett., 80 (1998) 1544.
[19] Lowen S. B. and Teich M. C., Fractal-based point processes (Wiley, Hoboken, NJ) 2005.
[20] Laherrère J. and Sornette D., Eur. Phys. J. B, 2 (1998) 525.
[21] Mitzenmacher M., Internet Math., 1 (2004) 226.
[22] Stouffer D. B., Malmgren R. D. and Amaral L. A. N., e-print physics/0605027v1.
[23] Saichev A. and Sornette D., Phys. Rev. Lett., 97 (2006) 078501.
[24] Candia J., González M. C., Wang P., Schoenharl T., Madey G. and Barabási A.-L., to be published in J. Phys. A, arXiv:0710.2939v2.
[25] Project Gutenberg, http://gutenberg.org.
[26] Japan University Network Earthquake Catalog, http://wwweic.eri.u-tokyo.ac.jp/CATALOG/junec/.
[27] National Resources Conservation Service, http://www.nm.nrcs.usda.gov/Snow/data/historic.htm.
[28] Harder U. and Paczuski M., Physica A, 361 (2006) 329.
[29] Guedj I. and Mandelbaum A., http://iew3.technion.ac.il/serveng/callcenterdata/.
[30] PhysioBank, http://www.physionet.org/physiobank/.
[31] Oliveira J. G. and Barabási A.-L., Nature, 437 (2005) 1251.
[32] Vázquez A., Phys. Rev. Lett., 95 (2005) 248701.
[33] Gabrielli A. and Caldarelli G., Phys. Rev. Lett., 98 (2007) 208701.
[34] Kleinberg J., in Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery Data Mining (2002) pp. 91.
[35] Goh K.-I. et al., unpublished.
[36] Bak P., Tang C. and Wiesenfeld K., Phys. Rev. A, 38 (1988) 364.
[37] Abarbanel H. D., Brown R., Sidorowich J. J. and Tsimring L. Sh., Rev. Mod. Phys., 65 (1993) 1331.
