Fractal Boundaries of Complex Networks
November 2008
EPL, 84 (2008) 48004 doi: 10.1209/0295-5075/84/48004 www.epljournal.org
received 12 May 2008; accepted in final form 16 October 2008; published online 21 November 2008
PACS 89.75.Hc: Networks and genealogical trees
PACS 89.75.-k: Complex systems
PACS 64.60.aq: Networks
Abstract: We introduce the concept of the boundary of a complex network as the set of nodes at distance larger than the mean distance from a given node in the network. We study the statistical properties of the boundary nodes seen from a given node of complex networks. We find that for both Erdős-Rényi and scale-free model networks, as well as for several real networks, the boundaries have fractal properties. In particular, the number of boundary nodes $B$ follows a power law probability density function which scales as $B^{-2}$. The clusters formed by the boundary nodes seen from a given node are fractals with a fractal dimension $d_f \approx 2$. We present analytical and numerical evidence supporting these results for a broad class of networks.
Copyright © EPLA, 2008
Many complex networks are "small world" due to the very small average distance $d$ between two randomly chosen nodes. Often $d \sim \ln N$, where $N$ is the number of nodes [1-6]. Thus, starting from a randomly chosen node and following the shortest path, one can reach any other node in a very small number of steps. This phenomenon is called "six degrees of separation" in social networks [4]. That is, for most pairs of randomly chosen people, the shortest distance between them is not more than six. Many random network models, such as the Erdős-Rényi (ER) network [1], the Watts-Strogatz (WS) network [5] and the scale-free (SF) network [3,6-8], as well as many real networks, have been shown to possess this small-world property. Much attention has been devoted to the structural properties of networks within the average distance $d$ from a given node. However, almost no attention has been given to nodes which are at distances greater than $d$ from a given node. We define these nodes as the boundaries of the network and study the ensemble of boundaries formed by all possible starting nodes. An interesting question is: how many "friends of friends of friends, etc." does one have at a distance greater than the average distance $d$? What is their probability distribution, and what is the structure of the boundaries? The boundaries have an important role in several scenarios, such as the spread of viruses or information in a human social network. If the virus (information) spreads from one node to all its nearest neighbors, and from them to all next-nearest neighbors, and further on until distance $d$, how many nodes do not get the virus (information), and what is their distribution with respect to the origin of the infection?

In this letter, we find theoretically and numerically that the nodes at the boundaries, which are of order $N$, exhibit similar fractal features for many types of networks, including ER and SF models as well as several real networks. Song et al. [9] found that some networks have fractal properties while others do not. Properties of fractal networks were also studied [10,11]. Here we show that almost all model and real networks, including non-fractal networks, have fractal features at their boundaries.

Figure 1 demonstrates our approach and analysis. For each "root" node, we call the nodes at distance $\ell$ from it "nodes in shell $\ell$". We choose a random root node and count the number of nodes $B_\ell$ at shell $\ell$. We see that $B_1 = 10$, $B_2 = 11$, $B_3 = 13$, etc. We estimate the average distance (diameter) $d \approx 2.9$ by averaging the distances between all pairs of nodes. After removing nodes with $\ell < d \approx 2.9$, the network is fragmented into 12 clusters, with sizes $s_3 = \{1, 1, 2, 5, 1, 3, 1, 1, 8, 1, 2, 3\}$. Note that the
Fig. 1: (Color online) Illustration of shells and clusters originating from a randomly chosen root node, which is shown in the center (red). Its neighboring nodes are defined as shell 1 (green), and the nodes at distance $\ell$ are defined as shell $\ell$. When removing all nodes with $\ell < 3$, the remaining network (purple) becomes fragmented into 12 clusters.
boundary of the network is always seen from a given node, and is thus not a unique set of nodes in the network.

We begin by simulating ER and SF networks, and then we present analytical proofs. Figure 2a shows simulation results for the number of nodes $B_\ell$ reached from a randomly chosen origin node for an ER network. The results shown are for a single network realization of size $N = 10^6$, with average degree $\langle k \rangle = 6$ and $d \approx 7.9$ (see footnote 1). For $\ell < d$, the cumulative distribution function $P(B_\ell)$, which is the probability that shell $\ell$ has more than $B_\ell$ nodes, decays exponentially for $B_\ell > B_\ell^*$, where $B_\ell^*$ is the maximum typical size of shell $\ell$ (see footnote 2). However, for $\ell > d$, we observe a clear transition to a power law decay, $P(B_\ell) \sim B_\ell^{-\beta}$ with $\beta \approx 1$, so that the pdf of $B_\ell$ is $\tilde{P}(B_\ell) \equiv dP(B_\ell)/dB_\ell \sim B_\ell^{-2}$. For different networks, the emergence of the power law can occur at shell $\ell = d + 1$ or $\ell = d + 2$. Thus, our results suggest a broad scale-free distribution for the number of nodes at distances larger than $d$. This power law behavior demonstrates that there is no characteristic size, and a broad range of sizes can appear in a shell at the boundaries.

In SF networks, the degrees of the nodes, $k$, follow a power law distribution function $q(k) \sim k^{-\lambda}$, where the minimum degree of the network, $k_{\min}$, is chosen to be 2. Figure 2b shows, for SF networks with $\lambda = 2.5$, similar power law results, $P(B_\ell) \sim B_\ell^{-\beta}$ with $\beta \approx 1$ for $\ell > d$,
Footnote 1: Different realizations yield similar results. In one realization, a certain fraction of nodes are randomly taken to be origins. The histogram is obtained from the $B_\ell$ belonging to different origin nodes.
Footnote 2: The behavior of the pdf of $B_\ell$ for $\ell < d$ will be discussed later and is shown in fig. 6a.
Fig. 2: (Color online) The cumulative distribution function, $P(B_\ell)$, for two random network models: (a) ER network with $N = 10^6$ nodes and $\langle k \rangle = 6$, and (b) SF network with $N = 10^6$ nodes and $\lambda = 2.5$, and two real networks: (c) the High Energy Particle (HEP) physics citations network and (d) the Autonomous System (AS) Internet network. The shells with $\ell > d$ are marked with their shell number. The thin lines from left to right represent shells $\ell = 1, 2, \ldots$, respectively, with $\ell < d$. For $\ell > d$, $P(B_\ell)$ follows a power law distribution $P(B_\ell) \sim B_\ell^{-\beta}$, with $\beta \approx 1$ (corresponding to $\tilde{P}(B_\ell) \sim B_\ell^{-2}$ for the pdf). The power law decay appears only for $\ell$ larger than $d \approx 7.9$ for the ER network and $d \approx 4.7$ for the SF network. The straight lines have slopes of $-1$.
Fig. 3: (Color online) (a) Normalized average number of nodes at shell $\ell$, $\langle B_\ell \rangle / N$, as a function of $\ell - \ln N / \ln \langle k \rangle$ for ER networks with $\langle k \rangle = 6$. For different $N$, the curves collapse. (b) $\tilde{k}_\ell + 1$, which is $\langle k_\ell^2 \rangle / \langle k_\ell \rangle$, as a function of $\ell$, shown for both ER and SF networks with different $N$.
which is similar to the ER case. We find similar results also for $\lambda > 3$ (not shown).

To test how general our finding is, we also study several real networks (figs. 2c, d), including the High Energy Particle (HEP) physics citations network [12] and the Autonomous System (AS) Internet network [13,14]. For the HEP and AS networks, $d \approx 4.2$ and $3.3$, respectively. The degree distribution of the HEP network is not a power law (see fig. 2c), while the AS network shows a power law degree distribution with $\lambda \approx 2.1$ (see fig. 2d). Our results suggest that the power law decay appears in both networks, with similar values of $\beta \approx 1$ for $\ell > d$ (see footnote 3).

Next we ask: how many nodes are on average at the boundaries? Are they a nonzero fraction of $N$? We calculate the mean number $\langle B_\ell \rangle$ of nodes in shell $\ell$, and in fig. 3a plot $\langle B_\ell \rangle / N$ as a function of $\ell - \ln N / \ln \langle k \rangle$ for different values of $N$ for ER networks. The term $\ln N / \ln \langle k \rangle$ represents the diameter $d$ of the network [2]. We find that, for different values of $N$, the curves collapse, supporting a relation independent of network size $N$. Since $\langle B_\ell \rangle / N$ at fixed $\ell - \ln N / \ln \langle k \rangle$ is apparently constant and independent of $N$, it follows that $\langle B_\ell \rangle \sim N$, i.e., a finite fraction of the $N$ nodes appears at each shell, including shells with $\ell > d$. We find similar behavior for SF networks with $\lambda = 3.5$ (not shown).
Footnote 3: We also find similar results (not shown here) for other real networks.
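As an illustration of the quantities used so far, the shell sizes $B_\ell$, the mean distance $d$ and the clusters formed by the boundary nodes can all be obtained with breadth-first search. The following is a minimal sketch and not the code used for the simulations reported here; it assumes an undirected, connected graph stored as an adjacency dictionary, and the function names and the toy path graph are hypothetical. The boundary is taken as the nodes with $\ell > d$, following the definition in the abstract.

```python
from collections import deque


def shells(adj, root):
    """Return {l: [nodes at shortest-path distance l from root]} via BFS."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    by_shell = {}
    for node, l in dist.items():
        by_shell.setdefault(l, []).append(node)
    return by_shell


def mean_distance(adj):
    """Average shortest-path distance over all pairs (small graphs only)."""
    total = pairs = 0
    for root in adj:
        for l, nodes in shells(adj, root).items():
            if l > 0:
                total += l * len(nodes)
                pairs += len(nodes)
    return total / pairs


def boundary_clusters(adj, root, d):
    """Sorted sizes of the clusters formed by nodes at distance l > d from root."""
    kept = {v for l, nodes in shells(adj, root).items() if l > d for v in nodes}
    sizes, seen = [], set()
    for start in kept:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            u = queue.popleft()
            size += 1
            for v in adj[u]:
                # Only walk along links whose both ends are boundary nodes.
                if v in kept and v not in seen:
                    seen.add(v)
                    queue.append(v)
        sizes.append(size)
    return sorted(sizes)


if __name__ == "__main__":
    # Hypothetical toy graph: the path 5-3-1-0-2-4-6, rooted at node 0.
    adj = {0: [1, 2], 1: [0, 3], 2: [0, 4], 3: [1, 5], 4: [2, 6], 5: [3], 6: [4]}
    d = mean_distance(adj)
    B = {l: len(nodes) for l, nodes in shells(adj, 0).items() if l > 0}
    print(B, d, boundary_clusters(adj, 0, d))
```

Repeating `shells` over many randomly chosen roots and collecting the values of $B_\ell$ for a fixed $\ell$ would give the histograms behind a distribution such as $P(B_\ell)$. The all-pairs loop in `mean_distance` costs one BFS per node, so for large networks the mean distance would have to be estimated from a sample of roots instead.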
Fig. 4: (Color online) The number of clusters of size $s_\ell$, $n(s_\ell)$, as a function of $s_\ell$ after removing nodes within shell $\ell$ for: (a) SF network with $N = 10^6$ and $\lambda = 2.5$, (b) HEP citations network. The relation between $n(s_\ell)$ and $s_\ell$ is characterized by a power law decay with an exponent of about 3, i.e. $n(s_\ell) \sim s_\ell^{-3}$. In order to show all curves clearly, vertical shifts are made. Note that the points in the tails of the distributions represent the rare occurrences of large clusters which are formed by nodes outside shell $\ell - 1$.
The branching factor [15] of the network is $\tilde{k} = \langle k^2 \rangle / \langle k \rangle - 1$, where the averages are calculated for the entire network. For ER networks, $\tilde{k}$ can be shown to equal $\langle k \rangle$. Similarly, we define $\tilde{k}_\ell \equiv \langle k_\ell^2 \rangle / \langle k_\ell \rangle - 1$, where the averages are calculated only for nodes in shell $\ell$. Above the diameter, $\tilde{k}_\ell + 1$ decreases with $\ell$ for both ER and SF networks (fig. 3b). Thus, at the shells where the power law behavior of $P(B_\ell)$ appears (fig. 2), the nodes have a much lower $\tilde{k}_\ell + 1$ than the entire network. The approach of $\tilde{k}_\ell + 1$ to 1 (ER network) and 2 (SF network) is consistent with a critical behavior at the boundaries of the network [15].

Next, we study the structural properties of the boundaries. Removing all nodes that are within a distance $\ell$ (not including shell $\ell$) for $\ell > d$, the network becomes fragmented into several clusters (see fig. 1). We denote the size of those clusters as $s_\ell$, the number of clusters of size $s_\ell$ as $n(s_\ell)$, and the diameter of a cluster as $d_\ell$. We find that $n(s_\ell)$ decays as a power law in $s_\ell$ with an exponent of about 3.0 (figs. 4a and b). The points in the tails of figs. 4a and b represent the rare appearances of large clusters. We find similar results for ER and other real networks. The relation between the sizes of the clusters $s_\ell$ and their diameters $d_\ell$ is shown as scatter plots in figs. 5a
for ER, SF with $\lambda = 3.5$ and several other real networks. Root nodes with different degrees yield different average distances of the rest of the nodes from the root [18,19]. However, using our definition of boundaries, the fractal clusters can be observed for both large and small degree roots (see fig. 5c).

Next we present analytical derivations supporting the above numerical results. We denote the degree distribution of a network as $q(k)$. For infinitely large networks we can neglect loops for $\ell < d$ and approximate the forming of a network as a branching process [20-23]. The probability of reaching a node with $k$ outgoing links through a randomly selected link is $\tilde{q}(k) = (k + 1) q(k + 1) / \langle k \rangle$. We define $G_0(x) \equiv \sum_{k=0}^{\infty} q(k) x^k$ as the generating function of $q(k)$, and $G_1(x) \equiv \sum_{k=0}^{\infty} \tilde{q}(k) x^k = G_0'(x) / \langle k \rangle$ as the generating function of $\tilde{q}(k)$. For ER networks we have $G_0(x) = G_1(x) = e^{\langle k \rangle (x - 1)}$ and $\tilde{k} = \langle k \rangle$. The generating function for the number of nodes, $B_m$, at shell $m$ is [23]

$$G_m(x) = G_0(G_1(\ldots(G_1(x)))) = G_0(G_{m-1}(x)), \qquad (1)$$
where $G_{m-1}(x)$ inside the outer $G_0$ denotes the result of applying $G_1(x)$ $m - 1$ times. $\tilde{P}(B_m)$, which is the probability distribution of $B_m$, is the coefficient of $x^{B_m}$ in the Taylor expansion of $G_m(x)$. For shells with large $m$ which is still smaller than $d$, it is expected [23] that the number of nodes will increase by a factor of $\tilde{k}$ from shell to shell. It is possible to show [21] that $G_{m-1}(x)$ converges to a function of the form $f((1 - x)\tilde{k}^m)$ for large $m$ ($m \ll d$), where $f$ satisfies the Poincaré functional relation

$$G_1(f(y)) = f(y\tilde{k}), \qquad (2)$$

where $y = 1 - x$. The functional form of $f(y)$ can be uniquely determined from eq. (2). It is known [21] that $f(y)$ has the asymptotic functional form $f(y) = f_\infty + a y^{-\delta} + o(y^{-\delta})$, where $f_\infty$ satisfies $G_1(f_\infty) = f_\infty$. It can be shown [22] that $f_\infty$ also gives the probability that a link is not connected to the giant component of the network by one of its ends. Expanding both sides of eq. (2), we obtain

$$G_1(f_\infty) + G_1'(f_\infty)\, a y^{-\delta} = f_\infty + a \tilde{k}^{-\delta} y^{-\delta} + o(y^{-\delta}). \qquad (3)$$
Fig. 5: (Color online) The size of clusters, $s_\ell$, shown as scatter plots against the diameters $d_\ell$ of the clusters for (a) SF network with $N = 10^6$ and $\lambda = 2.5$, (b) HEP citations network, for nodes outside shell $\ell - 1$. Vertical shifts of the curves are made for clarity. $s_\ell$ scales with $d_\ell$ as $s_\ell \sim d_\ell^{\,d_f}$, with $d_f \approx 2$. (c) For ER network with $N = 10^5$ and $\langle k \rangle = 6$, $s_\ell$ as a function of $d_\ell$ for small degree ($k < 4$) and large degree ($k > 9$) nodes chosen as roots. Here, $s_\ell$ as a function of $d_\ell$ is shown for $\ell = 8$. The diameter of the entire network is $d \approx 6.6$. Depending on the degree of the root node, the average distance of all nodes from the root may change. Large degree roots ($k > 9$) have small average distances ($\approx 6.3$) while small degree roots ($k < 4$) have large average distances ($\approx 7.1$). However, $s_\ell \sim d_\ell^{\,d_f}$, with $d_f \approx 2$, can be observed for both small and large degree roots. We ignore the large clusters which appear in the flat regions of fig. 4.
and b, for SF ($\lambda = 2.5$) and HEP citations networks, respectively. In order to show all curves, vertical shifts are made. Figures 5a and b show a power law relation, $s_\ell \sim d_\ell^{\,d_f}$, with $d_f \approx 2$, suggesting that the clusters at the boundaries are fractals with fractal dimension $d_f = 2$, like percolation clusters at criticality [16,17]. Here, we ignore the non-fractal large clusters which appear in the flat regions of fig. 4. We find that the fractal dimension is $d_f = 2$ also
Since $G_1(f_\infty) = f_\infty$, we have $\delta = -\ln G_1'(f_\infty) / \ln \tilde{k}$. If $q(1) = 0$ and $q(2) \neq 0$, from $G_1(f_\infty) = f_\infty$ we have $f_\infty = 0$ and $G_1'(f_\infty) = G_0''(0) / \langle k \rangle = 2 q(2) / \langle k \rangle$. If $q(2) = q(1) = 0$ (Böttcher case [21]), then $\delta = \infty$, which indicates that $f(y)$ has an exponential singularity. Therefore, networks with minimum degree $k_{\min} \geq 3$ do not have the power law distribution of $B_\ell$ shown in fig. 2, and therefore have no fractal boundaries. Applying Tauberian-like theorems [21,24] to $f(y)$, which has a power law behavior for $y \to \infty$, Dubuc [25] concluded that the Taylor expansion coefficient of $G_m(x)$, $\tilde{P}(B_m)$, behaves as $B_m^{\mu}$ with an exponential cutoff at $B_m^* \sim \tilde{k}^m$, where

$$\mu = \begin{cases} \delta - 1, & q(1) \neq 0 \text{ and } q(2) \neq 0; \\ 2\delta - 1, & q(1) = 0 \text{ and } q(2) \neq 0. \end{cases}$$
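For the ER case these exponents can be evaluated numerically. The sketch below is an illustration under stated assumptions, not part of the original derivation: it uses the ER generating function $G_1(x) = e^{\langle k \rangle (x - 1)}$ given earlier, so that $G_1'(f_\infty) = \langle k \rangle f_\infty$ and $\tilde{k} = \langle k \rangle$, finds $f_\infty$ by simple fixed-point iteration started from zero, and takes the first branch $\mu = \delta - 1$ because $q(1) \neq 0$ for a Poisson degree distribution. The function names are hypothetical.

```python
import math


def f_infinity(mean_k, tol=1e-15, max_iter=10_000):
    """Smallest root of f = exp(mean_k * (f - 1)), by iteration from f = 0."""
    f = 0.0
    for _ in range(max_iter):
        f_next = math.exp(mean_k * (f - 1.0))
        if abs(f_next - f) < tol:
            return f_next
        f = f_next
    return f


def er_exponents(mean_k):
    """Return (f_inf, delta, mu) for an ER network with mean degree mean_k > 1."""
    if mean_k <= 1.0:
        raise ValueError("a giant component requires mean_k > 1")
    f_inf = f_infinity(mean_k)
    slope = mean_k * f_inf                        # G_1'(f_inf) for Poisson degrees
    delta = -math.log(slope) / math.log(mean_k)   # branching factor equals mean_k
    mu = delta - 1.0                              # branch with q(1) != 0
    return f_inf, delta, mu


if __name__ == "__main__":
    f_inf, delta, mu = er_exponents(6.0)
    print(f"f_inf={f_inf:.6g}  delta={delta:.4f}  mu={mu:.4f}")
```

For $\langle k \rangle = 6$ this evaluates to $\mu$ close to 1.34, the theoretical value quoted for fig. 6a.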
where $r_n = 1 - (\sum_{i=1}^{n-1} B_i)/N$ is the fraction of nodes outside shell $n - 1$. Note that eq. (5) has almost the same structure as eq. (1). It can also be shown that the branching factor of nodes outside shell $n - 1$ is $\tilde{k}(r_n) = u G_0''(u) / G_0'(u)$, where $u = G_0^{-1}(r_n)$. For ER networks, eq. (5) yields
$$r_{\ell+1} = e^{\langle k \rangle (r_\ell - 1)} = \sum_{k=0}^{\infty} q(k)\, r_\ell^{k}, \qquad (6)$$
which is valid for all possible $\ell$. We test it in fig. 6b for ER networks. The relation between $r_{\ell+m}$ and $r_\ell$ can be obtained by applying eq. (6) $m$ times to $r_\ell$. In fig. 6b we show the fraction of nodes outside shell $\ell + m - 1$, $r_{\ell+m}$, as a function of $r_\ell$ for ER networks. Different values of $m$ are tested in the plot. When $m \ll d$ and $n \gg d$, using the same considerations as we used for eq. (1), one can show that
$$r_n = [a \tilde{k}^{\,n-m} (1 - r_m)]^{-\mu-1} + r_\infty, \qquad (7)$$
Fig. 6: (Color online) (a) For ER network, the probability distribution function $\tilde{P}(B_\ell)$ of the number of nodes $B_\ell$ in shells $\ell \ll d$. For small values of $B_\ell$, $\tilde{P}(B_\ell) \sim B_\ell^{\mu}$, where $\mu$ depends on the $\langle k \rangle$ of the network (eq. (4)). The slopes of the least-squares fits represented by the straight lines give $\mu = 1.4 \pm 0.1$, which is in good agreement with the theoretically predicted value $\mu = 1.34$. (b) The fraction of nodes outside shell $\ell + m - 1$, $r_{\ell+m}$, as a function of $r_\ell$ for ER network, where $r_\ell = 1 - (\sum_{i=1}^{\ell-1} B_i)/N$ is calculated for any possible $\ell$. The (red) lines represent the theoretical iteration function (eq. (6)).
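The theoretical curves of fig. 6b amount to applying the ER iteration of eq. (6), $r \to e^{\langle k \rangle (r - 1)}$, $m$ times to a starting value $r_\ell$. A minimal sketch of that iteration is given below; the function name and the sample values of $r_\ell$ are illustrative assumptions, and the map applies only to ER (Poisson) networks.

```python
import math


def iterate_eq6(r, mean_k, m=1):
    """Apply the ER map r -> exp(mean_k * (r - 1)) m times (eq. (6))."""
    for _ in range(m):
        r = math.exp(mean_k * (r - 1.0))
    return r


if __name__ == "__main__":
    mean_k = 6.0
    # r_{l+m} as a function of r_l for m = 1, 2, 3, as in the axes of fig. 6b.
    for r_l in (0.2, 0.4, 0.6, 0.8):
        row = [iterate_eq6(r_l, mean_k, m) for m in (1, 2, 3)]
        print(r_l, ["%.4g" % x for x in row])
    # Repeated application from any r < 1 approaches the lower fixed point.
    print(iterate_eq6(0.5, mean_k, m=100))
```

Note that $r = 1$ is itself a fixed point of the map, so the iteration only moves once a nonzero fraction of nodes has been removed. For $\langle k \rangle > 1$ and any starting value below 1 it converges to the smaller fixed point, which for ER networks coincides with $f_\infty$ because $G_0 = G_1$.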
where $r_\infty = G_0(f_\infty)$ is the fraction of nodes not belonging to the giant component of the network, and $a$ is a constant. Based on eqs. (4) and (7), expressing $r_m$ and $r_n$ in terms of $B_m$ and $B_n$, we find that for $m \ll d$ and $n \gg d$, $B_n \sim B_m^{-\mu-1}$. Using $\tilde{P}(B_n)\,dB_n = \tilde{P}(B_m)\,dB_m$, we obtain

$$\tilde{P}(B_n) \sim B_n^{-1 - \mu/(\mu+1) - 1/(\mu+1)} = B_n^{-2}, \qquad (8)$$
Thus the probability distribution function of the number of nodes in shell $m$ with $m \ll d$ has a power law tail for small values of $B_m$:

$$\tilde{P}(B_m) \sim B_m^{\mu}. \qquad (4)$$
For an ER network, eq. (4) is supported by simulations for $m \ll d$ in fig. 6a. Figure 6a shows for ER networks that $\tilde{P}(B_\ell)$ for $\ell < d$ and small values of $B_\ell$ increases as a power law, $\tilde{P}(B_\ell) \sim B_\ell^{\mu}$. For ER networks, we have $\tilde{k} = \langle k \rangle$, $\mu = \delta - 1$, and $\delta = -\ln G_1'(f_\infty) / \ln \tilde{k}$. Thus $\mu = -\ln(\langle k \rangle f_\infty) / \ln \langle k \rangle - 1$, where $f_\infty$ can be obtained numerically from $f_\infty = e^{\langle k \rangle (f_\infty - 1)}$. In the case of $\langle k \rangle = 6$, $\mu \approx 1.34$, which is close to the result shown in fig. 6a.

The above considerations are correct only for $m < d$, for which the depletion of nodes with large degree in the network is insignificant. In a large network, the shells with $m \gg 1$ behave almost deterministically, and one can apply the mean-field approximation for the number of nodes and links in each shell. Writing down the master equation for the degree distribution in the outer shells, one can obtain a system of ordinary differential equations, which can be solved analytically using the apparatus of generating functions. Using this solution one can show that
$$r_n = G_0\big(G_1^{\,n-m}(G_0^{-1}(r_m))\big), \qquad (5)$$

supporting the numerical findings in fig. 2. These results are rigorous when $\tilde{k}$ exists and when the minimum degree $k_{\min} \leq 2$. For SF networks with $\lambda < 3$, $\tilde{k}$ diverges for $N \to \infty$. But for finite $N$, $\tilde{k}$ still exists. Thus the above results can also be applied to the case of $\lambda < 3$. For both ER and SF networks with $k_{\min} \geq 3$, the power law of $\tilde{P}(B_n)$ with $n \gg d$ cannot be observed, as we indeed confirm by simulations.

Relating our problem to percolation theory, we can explain the simulation results for the probability distribution of the cluster size $s_\ell$. The cluster size distribution in percolation at some concentration $p$ close to $p_c$ is determined by the formula [15]

$$P_p(s > S) \sim S^{-\tau+1} \exp(-S |p - p_c|^{1/\sigma}). \qquad (9)$$
In the case of random networks the percolation threshold is given by $p_c = 1/\tilde{k}$. In the exterior of shell $n - 1$ ($n \gg d$), we can estimate $|p - p_c| \sim (\tilde{k}(r_n) - 1)/\tilde{k}$, where $\tilde{k}(r_n)$ decreases and reaches the critical percolation value of 1. Near the percolation threshold the nodes outside shell $n - 1$ are split into a number of finite clusters and, if $\tilde{k} > 1$, a giant component. These finite clusters have fractal dimension $d_f = 2$ [16,17]. This theoretical prediction is confirmed in fig. 5. The cluster size distribution can be estimated by introducing a sharp exponential cutoff at $s = S_n^* \sim |\tilde{k}(r_n) - 1|^{-1/\sigma}$, so that $P_n(s > S) \sim S^{-\tau+1} P(S_n^* > S)$, where $P(S_n^* > S)$ is the probability for a given shell to have
We thank ONR and the Israel Science Foundation for financial support. SVB thanks the Office of the Academic Affairs of Yeshiva University for funding the Yeshiva University high-performance computer cluster and acknowledges the partial support of this research through the Dr. Bernard W. Gamson Computational Science Center at Yeshiva College.
REFERENCES
[1] Erdős P. and Rényi A., Publ. Math., 6 (1959) 290; Publ. Math. Inst. Hung. Acad. Sci., 5 (1960) 17.
[2] Bollobás B., Random Graphs (Academic, London) 1985.
[3] Albert R. and Barabási A.-L., Rev. Mod. Phys., 74 (2002) 47.
[4] Milgram S., Psychol. Today, 2 (1967) 60.
[5] Watts D. J. and Strogatz S. H., Nature (London), 393 (1998) 440.
[6] Cohen R. and Havlin S., Phys. Rev. Lett., 90 (2003) 058701.
[7] Dorogovtsev S. N. and Mendes J. F. F., Evolution of Networks: from Biological Nets to the Internet and WWW (Oxford University Press, New York) 2003.
[8] Pastor-Satorras R. and Vespignani A., Evolution and Structure of the Internet: Statistical Physics Approach (Cambridge University Press) 2004.
[9] Song C. et al., Nature (London), 433 (2005) 392; Nat. Phys., 2 (2006) 275.
[10] Goh K. I. et al., Phys. Rev. Lett., 96 (2006) 018701.
[11] Kitsak M. et al., Phys. Rev. E, 75 (2007) 056115.
[12] Derived from the HEP section of arxiv.org; http://vlado.fmf.uni-lj.si/pub/networks/data/hep-th/hepth.htm (website of Pajek).
[13] Carmi S. et al., Proc. Natl. Acad. Sci. U.S.A., 104 (2007) 11150.
[14] Shavitt Y. and Shir E., DIMES - Letting the Internet Measure Itself, http://www.arxiv.org/abs/cs.NI/0506099.
[15] Cohen R. et al., Phys. Rev. Lett., 85 (2000) 4626.
[16] Cohen R. et al., Phys. Rev. E, 66 (2002) 036113; Bornholdt S. and Schuster H. G. (Editors), Handbook of Graphs and Networks (Wiley-VCH) 2002, Chapt. 4.
[17] Bunde A. and Havlin S., Fractals and Disordered Systems (Springer) 1996.
[18] Hołyst J. et al., Phys. Rev. E, 72 (2005) 026108.
[19] Dorogovtsev S. N. et al., Phys. Rev. E, 73 (2006) 056122.
[20] Harris T. E., Ann. Math. Stat., 41 (1948) 474; The Theory of Branching Processes (Springer-Verlag, Berlin) 1963.
[21] Bingham N. H., J. Appl. Probab. A, 25 (1988) 215.
[22] Braunstein L. A. et al., Int. J. Bifurcat. Chaos, 17 (2007) 2215; Phys. Rev. Lett., 91 (2003) 168701.
[23] Newman M. E. J. et al., Phys. Rev. E, 64 (2001) 026118.
[24] Weiss G. H., Aspects and Applications of the Random Walk (North Holland Press, Amsterdam) 1994.
[25] Dubuc S., Ann. Inst. Fourier, 21 (1971) 171.
[26] Barabási A. L. et al., Phys. Rev. Lett., 76 (1996) 2192.