
Random Graphs With Arbitrary Degree Distributions and Their Applications

The paper discusses the theory of random graphs with arbitrary degree distributions, moving beyond the traditional Poisson distribution to better model real-world networks like the internet and social collaborations. It derives key properties such as phase transitions and component sizes, demonstrating that these random graphs can accurately predict behaviors in certain real-world scenarios while highlighting discrepancies that suggest additional underlying structures. The authors provide a framework for analyzing directed and bipartite graphs, expanding the applications of random graph theory in various fields.


Random graphs with arbitrary degree distributions and their applications

M. E. J. Newman (1,2), S. H. Strogatz (2,3), and D. J. Watts (1,4)

(1) Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501
(2) Center for Applied Mathematics, Cornell University, Ithaca NY 14853–3401
(3) Department of Theoretical and Applied Mechanics, Cornell University, Ithaca NY 14853–1503
(4) Department of Sociology, Columbia University, 1180 Amsterdam Avenue, New York, NY 10027

arXiv:cond-mat/0007235v2 [cond-mat.stat-mech] 7 May 2001

Recent work on the structure of social networks and the internet has focussed attention on graphs with distributions of vertex degree that are significantly different from the Poisson degree distributions that have been widely studied in the past. In this paper we develop in detail the theory of random graphs with arbitrary degree distributions. In addition to simple undirected, unipartite graphs, we examine the properties of directed and bipartite graphs. Among other results, we derive exact expressions for the position of the phase transition at which a giant component first forms, the mean component size, the size of the giant component if there is one, the mean number of vertices a certain distance away from a randomly chosen vertex, and the average vertex–vertex distance within a graph. We apply our theory to some real-world graphs, including the world-wide web and collaboration graphs of scientists and Fortune 1000 company directors. We demonstrate that in some cases random graphs with appropriate distributions of vertex degree predict with surprising accuracy the behavior of the real world, while in others there is a measurable discrepancy between theory and reality, perhaps indicating the presence of additional social structure in the network that is not captured by the random graph.

I. INTRODUCTION

A random graph [1] is a collection of points, or vertices, with lines, or edges, connecting pairs of them at random (Fig. 1a). The study of random graphs has a long history. Starting with the influential work of Paul Erdős and Alfréd Rényi in the 1950s and 1960s [2–4], random graph theory has developed into one of the mainstays of modern discrete mathematics, and has produced a prodigious number of results, many of them highly ingenious, describing statistical properties of graphs, such as distributions of component sizes, existence and size of a giant component, and typical vertex–vertex distances.

FIG. 1. (a) A schematic representation of a random graph, the circles representing vertices and the lines edges. (b) A directed random graph, i.e., one in which each edge runs in only one direction.

In almost all of these studies the assumption has been made that the presence or absence of an edge between two vertices is independent of the presence or absence of any other edge, so that each edge may be considered to be present with independent probability $p$. If there are $N$ vertices in a graph, and each is connected to an average of $z$ edges, then it is trivial to show that $p = z/(N-1)$, which for large $N$ is usually approximated by $z/N$. The number of edges connected to any particular vertex is called the degree $k$ of that vertex, and has a probability distribution $p_k$ given by

$$p_k = \binom{N}{k} p^k (1-p)^{N-k} \simeq \frac{z^k e^{-z}}{k!}, \tag{1}$$

where the second equality becomes exact in the limit of large $N$. This distribution we recognize as the Poisson distribution: the ordinary random graph has a Poisson distribution of vertex degrees, a point which turns out to be crucial, as we now explain.

Random graphs are not merely a mathematical toy; they have been employed extensively as models of real-world networks of various types, particularly in epidemiology. The passage of a disease through a community depends strongly on the pattern of contacts between those infected with the disease and those susceptible to it. This pattern can be depicted as a network, with individuals represented by vertices and contacts capable of transmitting the disease by edges. The large class of epidemiological models known as susceptible/infectious/recovered (or SIR) models [5–7] makes frequent use of the so-called fully mixed approximation, which is the assumption that contacts are random and uncorrelated, i.e., that they form a random graph.

Random graphs however turn out to have severe shortcomings as models of such real-world phenomena. Although it is difficult to determine experimentally the structure of the network of contacts by which a disease is spread [8], studies have been performed of other social networks such as networks of friendships within a variety
of communities [9–11], networks of telephone calls [12,13], airline timetables [14], and the power grid [15], as well as networks in physical or biological systems, including neural networks [15], the structure and conformation space of polymers [16,17], metabolic pathways [18,19], and food webs [20,21]. It is found [13,14] that the distribution of vertex degrees in many of these networks is measurably different from a Poisson distribution (often wildly different), and this strongly suggests, as has been emphasized elsewhere [22], that there are features of such networks which we would miss if we were to approximate them by an ordinary (Poisson) random graph.

Another very widely studied network is the internet, whose structure has attracted an exceptional amount of scrutiny, academic and otherwise, following its meteoric rise to public visibility starting in 1993. Pages on the world-wide web may be thought of as the vertices of a graph and the hyperlinks between them as edges. Empirical studies [23–26] have shown that this graph has a distribution of vertex degree which is heavily right-skewed and possesses a fat (power-law) tail with an exponent between −2 and −3. (The underlying physical structure of the internet also has a degree distribution of this type [27].) This distribution is very far from Poisson, and therefore we would expect that a simple random graph would give a very poor approximation of the structural properties of the web. However, the web differs from a random graph in another way also: it is directed. Links on the web lead from one page to another in only one direction (see Fig. 1b). As discussed by Broder et al. [26] this has a significant practical effect on the typical accessibility of one page from another, and this effect also will not be captured by a simple (undirected) random graph model.

A further class of networks that has attracted scrutiny is the class of collaboration networks. Examples of such networks include the boards of directors of companies [28–31], co-ownership networks of companies [32], and collaborations of scientists [33–37] and movie actors [15]. As well as having strongly non-Poisson degree distributions [14,36], these networks have a bipartite structure; there are two distinct kinds of vertices on the graph with links running only between vertices of unlike kinds [38] (see Fig. 2). In the case of movie actors, for example, the two types of vertices are movies and actors, and the network can be represented as a graph with edges running between each movie and the actors that appear in it. Researchers have also considered the projection of this graph onto the unipartite space of actors only, also called a one-mode network [38]. In such a projection two actors are considered connected if they have appeared in a movie together. The construction of the one-mode network however involves discarding some of the information contained in the original bipartite network, and for this reason it is more desirable to model collaboration networks using the full bipartite structure.

FIG. 2. A schematic representation (top) of a bipartite graph, such as the graph of movies and the actors who have appeared in them. In this small graph we have four movies, labeled 1 to 4, and eleven actors, labeled A to K, with edges joining each movie to the actors in its cast. In the lower part of the picture we show the one-mode projection of the graph for the eleven actors.

Given the high current level of interest in the structure of many of the graphs described here, and given their substantial differences from the ordinary random graphs that have been studied in the past, it would clearly be useful if we could generalize the mathematics of random graphs to non-Poisson degree distributions, and to directed and bipartite graphs. In this paper we do just that, demonstrating in detail how the statistical properties of each of these graph types can be calculated exactly in the limit of large graph size. We also give examples of the application of our theory to the modeling of a number of real-world networks, including the world-wide web and collaboration graphs.

II. RANDOM GRAPHS WITH ARBITRARY DEGREE DISTRIBUTIONS

In this section we develop a formalism for calculating a variety of quantities, both local and global, on large unipartite undirected graphs with arbitrary probability distribution of the degrees of their vertices. In all respects other than their degree distribution, these graphs are assumed to be entirely random. This means that the degrees of all vertices are independent identically-distributed random integers drawn from a specified distribution. For a given choice of these degrees, also called the "degree sequence," the graph is chosen uniformly at random from the set of all graphs with that degree sequence. All properties calculated in this paper are averaged over the ensemble of graphs generated in this way. In the limit of large graph size an equivalent procedure is to study only one particular degree sequence, averaging uniformly over all graphs with that sequence, where the
sequence is chosen to approximate as closely as possible the desired probability distribution. The latter procedure can be thought of as a "microcanonical ensemble" for random graphs, where the former is a "canonical ensemble."

Some results are already known for random graphs with arbitrary degree distributions: in two beautiful recent papers [39,40], Molloy and Reed have derived formulas for the position of the phase transition at which a giant component first appears, and the size of the giant component. (These results are calculated within the microcanonical ensemble, but apply equally to the canonical one in the large system size limit.) The formalism we present in this paper yields an alternative derivation of these results and also provides a framework for obtaining other quantities of interest, some of which we calculate. In Sections III and IV we extend our formalism to the case of directed graphs (such as the world-wide web) and bipartite graphs (such as collaboration graphs).

A. Generating functions

Our approach is based on generating functions [41], the most fundamental of which, for our purposes, is the generating function $G_0(x)$ for the probability distribution of vertex degrees $k$. Suppose that we have a unipartite undirected graph (an acquaintance network, for example) of $N$ vertices, with $N$ large. We define

$$G_0(x) = \sum_{k=0}^{\infty} p_k x^k, \tag{2}$$

where $p_k$ is the probability that a randomly chosen vertex on the graph has degree $k$. The distribution $p_k$ is assumed correctly normalized, so that

$$G_0(1) = 1. \tag{3}$$

The same will be true of all generating functions considered here, with a few important exceptions, which we will note at the appropriate point. Because the probability distribution is normalized and positive definite, $G_0(x)$ is also absolutely convergent for all $|x| \le 1$, and hence has no singularities in this region. All the calculations of this paper will be confined to the region $|x| \le 1$.

The function $G_0(x)$, and indeed any probability generating function, has a number of properties that will prove useful in subsequent developments.

Derivatives. The probability $p_k$ is given by the $k$th derivative of $G_0$ according to

$$p_k = \frac{1}{k!} \left. \frac{d^k G_0}{dx^k} \right|_{x=0}. \tag{4}$$

Thus the one function $G_0(x)$ encapsulates all the information contained in the discrete probability distribution $p_k$. We say that the function $G_0(x)$ "generates" the probability distribution $p_k$.

Moments. The average over the probability distribution generated by a generating function (for instance, the average degree $z$ of a vertex in the case of $G_0(x)$) is given by

$$z = \langle k \rangle = \sum_k k p_k = G_0'(1). \tag{5}$$

Thus if we can calculate a generating function we can also calculate the mean of the probability distribution which it generates. Higher moments of the distribution can be calculated from higher derivatives also. In general, we have

$$\langle k^n \rangle = \sum_k k^n p_k = \left[ \left( x \frac{d}{dx} \right)^{n} G_0(x) \right]_{x=1}. \tag{6}$$

Powers. If the distribution of a property $k$ of an object is generated by a given generating function, then the distribution of the total of $k$ summed over $m$ independent realizations of the object is generated by the $m$th power of that generating function. For example, if we choose $m$ vertices at random from a large graph, then the distribution of the sum of the degrees of those vertices is generated by $[G_0(x)]^m$. To see why this is so, consider the simple case of just two vertices. The square $[G_0(x)]^2$ of the generating function for a single vertex can be expanded as

$$\begin{aligned}
[G_0(x)]^2 &= \Bigl[ \sum_k p_k x^k \Bigr]^2 = \sum_{jk} p_j p_k x^{j+k} \\
&= p_0 p_0 x^0 + (p_0 p_1 + p_1 p_0) x^1 + (p_0 p_2 + p_1 p_1 + p_2 p_0) x^2 \\
&\quad + (p_0 p_3 + p_1 p_2 + p_2 p_1 + p_3 p_0) x^3 + \ldots
\end{aligned} \tag{7}$$

It is clear that the coefficient of the power of $x^n$ in this expression is precisely the sum of all products $p_j p_k$ such that $j + k = n$, and hence correctly gives the probability that the sum of the degrees of the two vertices will be $n$. It is straightforward to convince oneself that this property extends also to all higher powers of the generating function.

All of these properties will be used in the derivations given in this paper.

Another quantity that will be important to us is the distribution of the degree of the vertices that we arrive at by following a randomly chosen edge. Such an edge arrives at a vertex with probability proportional to the degree of that vertex, and the vertex therefore has a probability distribution of degree proportional to $k p_k$. The correctly normalized distribution is generated by

$$\frac{\sum_k k p_k x^k}{\sum_k k p_k} = x \frac{G_0'(x)}{G_0'(1)}. \tag{8}$$
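
As a quick numerical illustration of the properties listed in this section, the following minimal sketch evaluates Eqs. (2), (3), (5), (7), and (8) for a small discrete distribution. The distribution `p` and all helper names (`G0`, `moment`, `convolve`, `edge_end_degree`) are hypothetical choices made for the example and do not come from the paper.

```python
from math import isclose

# Hypothetical degree distribution p_k for k = 0..3 (illustrative values only).
p = [0.1, 0.3, 0.4, 0.2]

def G0(x, p=p):
    """Eq. (2): G0(x) = sum_k p_k x^k."""
    return sum(pk * x**k for k, pk in enumerate(p))

def moment(n, p=p):
    """<k^n> = sum_k k^n p_k, the middle expression of Eq. (6)."""
    return sum(k**n * pk for k, pk in enumerate(p))

def convolve(a, b):
    """Coefficients of the product of two polynomials (the "powers" property)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for j, aj in enumerate(a):
        for k, bk in enumerate(b):
            out[j + k] += aj * bk
    return out

def edge_end_degree(p=p):
    """Eq. (8): distribution k p_k / sum_k k p_k of the degree of a vertex
    reached by following a randomly chosen edge."""
    z = moment(1, p)
    return [k * pk / z for k, pk in enumerate(p)]

assert isclose(G0(1.0), 1.0)       # normalization, Eq. (3)
z = moment(1)                      # mean degree, Eq. (5)
two_vertex_sum = convolve(p, p)    # coefficients of [G0(x)]^2, Eq. (7)
```

Here `two_vertex_sum[n]` is the probability that two independently chosen vertices have degrees summing to `n`, which is the coefficient of $x^n$ in Eq. (7).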

If we start at a randomly chosen vertex and follow each of the edges at that vertex to reach the $k$ nearest neighbors, then the vertices arrived at each have the distribution of remaining outgoing edges generated by this function, less one power of $x$, to allow for the edge that we arrived along. Thus the distribution of outgoing edges is generated by the function

$$G_1(x) = \frac{G_0'(x)}{G_0'(1)} = \frac{1}{z} G_0'(x), \tag{9}$$

where $z$ is the average vertex degree, as before. The probability that any of these outgoing edges connects to the original vertex that we started at, or to any of its other immediate neighbors, goes as $N^{-1}$ and hence can be neglected in the limit of large $N$. Thus, making use of the "powers" property of the generating function described above, the generating function for the probability distribution of the number of second neighbors of the original vertex can be written as

$$\sum_k p_k [G_1(x)]^k = G_0(G_1(x)). \tag{10}$$

Similarly, the distribution of third-nearest neighbors is generated by $G_0(G_1(G_1(x)))$, and so on. The average number $z_2$ of second neighbors is

$$z_2 = \left[ \frac{d}{dx} G_0(G_1(x)) \right]_{x=1} = G_0'(1) G_1'(1) = G_0''(1), \tag{11}$$

where we have made use of the fact that $G_1(1) = 1$. (One might be tempted to conjecture that since the average number of first neighbors is $G_0'(1)$, Eq. (5), and the average number of second neighbors is $G_0''(1)$, Eq. (11), then the average number of $m$th neighbors should be given by the $m$th derivative of $G_0$ evaluated at $x = 1$. As we show in Section II F, however, this conjecture is wrong.)

B. Examples

To make things more concrete, we immediately introduce some examples of specific graphs to illustrate how these calculations are carried out.

a. Poisson-distributed graphs. The simplest example of a graph of this type is one for which the distribution of degree is binomial, or Poisson in the large $N$ limit. This distribution yields the standard random graph studied by many mathematicians and discussed in Section I. In this graph the probability $p = z/N$ of the existence of an edge between any two vertices is the same for all vertices, and $G_0(x)$ is given by

$$G_0(x) = \sum_{k=0}^{N} \binom{N}{k} p^k (1-p)^{N-k} x^k = (1 - p + px)^N = e^{z(x-1)}, \tag{12}$$

where the last equality applies in the limit $N \to \infty$. It is then trivial to show that the average degree of a vertex is indeed $G_0'(1) = z$ and that the probability distribution of degree is given by $p_k = z^k e^{-z}/k!$, which is the ordinary Poisson distribution. Notice also that for this special case we have $G_1(x) = G_0(x)$, so that the distribution of outgoing edges at a vertex is the same, regardless of whether we arrived there by choosing a vertex at random, or by following a randomly chosen edge. This property, which is peculiar to the Poisson-distributed random graph, is the reason why the theory of random graphs of this type is especially simple.

b. Exponentially distributed graphs. Perhaps the next simplest type of graph is one with an exponential distribution of vertex degrees

$$p_k = (1 - e^{-1/\kappa}) e^{-k/\kappa}, \tag{13}$$

where $\kappa$ is a constant. The generating function for this distribution is

$$G_0(x) = (1 - e^{-1/\kappa}) \sum_{k=0}^{\infty} e^{-k/\kappa} x^k = \frac{1 - e^{-1/\kappa}}{1 - x e^{-1/\kappa}}, \tag{14}$$

and

$$G_1(x) = \left[ \frac{1 - e^{-1/\kappa}}{1 - x e^{-1/\kappa}} \right]^2. \tag{15}$$

An example of a graph with an exponential degree distribution is given in Section V A.

c. Power-law distributed graphs. The recent interest in the properties of the world-wide web and of social networks leads us to investigate the properties of graphs with a power-law distribution of vertex degrees. Such graphs have been discussed previously by Barabási et al. [22,23] and by Aiello et al. [13]. In this paper, we will look at graphs with degree distribution given by

$$p_k = C k^{-\tau} e^{-k/\kappa} \quad \text{for } k \ge 1, \tag{16}$$

where $C$, $\tau$, and $\kappa$ are constants. The reason for including the exponential cutoff is two-fold: first many real-world graphs appear to show this cutoff [14,36]; second it makes the distribution normalizable for all $\tau$, and not just $\tau \ge 2$. The constant $C$ is fixed by the requirement of normalization, which gives $C = [\mathrm{Li}_\tau(e^{-1/\kappa})]^{-1}$ and hence

$$p_k = \frac{k^{-\tau} e^{-k/\kappa}}{\mathrm{Li}_\tau(e^{-1/\kappa})} \quad \text{for } k \ge 1, \tag{17}$$

where $\mathrm{Li}_n(x)$ is the $n$th polylogarithm of $x$, a function familiar to those who have worked with Feynman integrals.

Substituting (17) into Eq. (2), we find that the generating function for graphs with this degree distribution is

$$G_0(x) = \frac{\mathrm{Li}_\tau(x e^{-1/\kappa})}{\mathrm{Li}_\tau(e^{-1/\kappa})}. \tag{18}$$
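
The closed forms for the Poisson and exponential examples, Eqs. (12), (14), and (15), can be checked against the defining series of Eqs. (2) and (9). The sketch below does this under stated assumptions: the parameter values `z`, `kappa`, and `x`, the truncation `K_MAX`, and the helper names are all hypothetical and chosen only for illustration.

```python
from math import exp, factorial, isclose

K_MAX = 150  # truncation of the infinite sums (assumption: the tails are negligible here)

def G0_series(p, x):
    """Eq. (2) evaluated from a truncated list of probabilities p_k."""
    return sum(pk * x**k for k, pk in enumerate(p))

def G1_series(p, x):
    """Eq. (9): G1(x) = G0'(x) / G0'(1), differentiating the series term by term."""
    z = sum(k * pk for k, pk in enumerate(p))
    return sum(k * pk * x**(k - 1) for k, pk in enumerate(p) if k > 0) / z

z, kappa, x = 2.5, 3.0, 0.7   # hypothetical parameter values

# Poisson degree distribution and exponential degree distribution, Eq. (13).
poisson = [z**k * exp(-z) / factorial(k) for k in range(K_MAX)]
a = exp(-1.0 / kappa)
exponential = [(1 - a) * a**k for k in range(K_MAX)]

# Poisson case: G0(x) = exp(z(x - 1)), Eq. (12), and G1(x) = G0(x).
assert isclose(G0_series(poisson, x), exp(z * (x - 1)), rel_tol=1e-9)
assert isclose(G1_series(poisson, x), G0_series(poisson, x), rel_tol=1e-9)

# Exponential case: Eqs. (14) and (15).
closed_G0 = (1 - a) / (1 - x * a)
assert isclose(G0_series(exponential, x), closed_G0, rel_tol=1e-9)
assert isclose(G1_series(exponential, x), closed_G0**2, rel_tol=1e-9)
```

The same series helpers apply to the power-law distribution of Eq. (17), with the polylogarithm in the normalization replaced by a truncated sum over $k \ge 1$.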

In the limit $\kappa \to \infty$ (the case considered in Refs. [13] and [23]), Eq. (18) simplifies to

$$G_0(x) = \frac{\mathrm{Li}_\tau(x)}{\zeta(\tau)}, \tag{19}$$

where $\zeta(x)$ is the Riemann $\zeta$-function.

The function $G_1(x)$ is given by

$$G_1(x) = \frac{\mathrm{Li}_{\tau-1}(x e^{-1/\kappa})}{x \, \mathrm{Li}_{\tau-1}(e^{-1/\kappa})}. \tag{20}$$

Thus, for example, the average number of neighbors of a randomly-chosen vertex is

$$z = G_0'(1) = \frac{\mathrm{Li}_{\tau-1}(e^{-1/\kappa})}{\mathrm{Li}_\tau(e^{-1/\kappa})}, \tag{21}$$

and the average number of second neighbors is

$$z_2 = G_0''(1) = \frac{\mathrm{Li}_{\tau-2}(e^{-1/\kappa}) - \mathrm{Li}_{\tau-1}(e^{-1/\kappa})}{\mathrm{Li}_\tau(e^{-1/\kappa})}. \tag{22}$$

d. Graphs with arbitrary specified degree distribution. In some cases we wish to model specific real-world graphs which have known degree distributions, known because we can measure them directly. A number of the graphs described in the introduction fall into this category. For these graphs, we know the exact numbers $n_k$ of vertices having degree $k$, and hence we can write down the exact generating function for that probability distribution in the form of a finite polynomial:

$$G_0(x) = \frac{\sum_k n_k x^k}{\sum_k n_k}, \tag{23}$$

where the sum in the denominator ensures that the generating function is properly normalized. As an example, suppose that in a community of 1000 people, each person knows between zero and five of the others, the exact numbers of people in each category being, from zero to five: {86, 150, 363, 238, 109, 54}. This distribution will then be generated by the polynomial

$$G_0(x) = \frac{86 + 150x + 363x^2 + 238x^3 + 109x^4 + 54x^5}{1000}. \tag{24}$$

C. Component sizes

We are now in a position to calculate some properties of interest for our graphs. First let us consider the distribution of the sizes of connected components in the graph. Let $H_1(x)$ be the generating function for the distribution of the sizes of components which are reached by choosing a random edge and following it to one of its ends. We explicitly exclude from $H_1(x)$ the giant component, if there is one; the giant component is dealt with separately below. Thus, except when we are precisely at the phase transition where the giant component appears, typical component sizes are finite, and the chance of a component containing a closed loop of edges goes as $N^{-1}$, which is negligible in the limit of large $N$. This means that the distribution of components generated by $H_1(x)$ can be represented graphically as in Fig. 3; each component is tree-like in structure, consisting of the single site we reach by following our initial edge, plus any number (including zero) of other tree-like clusters, with the same size distribution, joined to it by single edges. If we denote by $q_k$ the probability that the initial site has $k$ edges coming out of it other than the edge we came in along, then, making use of the "powers" property of Section II A, $H_1(x)$ must satisfy a self-consistency condition of the form

$$H_1(x) = x q_0 + x q_1 H_1(x) + x q_2 [H_1(x)]^2 + \ldots \tag{25}$$

However, $q_k$ is nothing other than the coefficient of $x^k$ in the generating function $G_1(x)$, Eq. (9), and hence Eq. (25) can also be written

$$H_1(x) = x G_1(H_1(x)). \tag{26}$$

FIG. 3. Schematic representation of the sum rule for the connected component of vertices reached by following a randomly chosen edge. The probability of each such component (left-hand side) can be represented as the sum of the probabilities (right-hand side) of having only a single vertex, having a single vertex connected to one other component, or two other components, and so forth. The entire sum can be expressed in closed form as Eq. (26).

If we start at a randomly chosen vertex, then we have one such component at the end of each edge leaving that vertex, and hence the generating function for the size of the whole component is

$$H_0(x) = x G_0(H_1(x)). \tag{27}$$

In principle, therefore, given the functions $G_0(x)$ and $G_1(x)$, we can solve Eq. (26) for $H_1(x)$ and substitute into Eq. (27) to get $H_0(x)$. Then we can find the probability that a randomly chosen vertex belongs to a component of size $s$ by taking the $s$th derivative of $H_0$. In practice, unfortunately, this is usually impossible; Eq. (26) is a complicated and frequently transcendental equation, which rarely has a known solution. On the other hand, we note that the coefficient of $x^s$ in the Taylor expansion
of $H_1(x)$ (and therefore also the $s$th derivative) is given exactly by only $s + 1$ iterations of Eq. (26), starting with $H_1 = 1$, so that the distribution generated by $H_0(x)$ can be calculated exactly to finite order in finite time. With current symbolic manipulation programs, it is quite possible to evaluate the first one hundred or so derivatives in this way. Failing this, an approximate solution can be found by numerical iteration and the distribution of cluster sizes calculated from Eq. (4) by numerical differentiation. Since direct evaluation of numerical derivatives is prone to machine-precision problems, we recommend evaluating the derivatives by numerical integration of the Cauchy formula, giving the probability distribution $P_s$ of cluster sizes thus:

$$P_s = \frac{1}{s!} \left. \frac{d^s H_0}{dz^s} \right|_{z=0} = \frac{1}{2\pi i} \oint \frac{H_0(z)}{z^{s+1}} \, dz. \tag{28}$$

The best numerical precision is obtained by using the largest possible contour, subject to the condition that it enclose no poles of the generating function. The largest contour for which this condition is satisfied in general is the unit circle $|z| = 1$ (see Section II A), and we recommend using this contour for Eq. (28). It is possible to find the first thousand derivatives of a function without difficulty using this method [42].

D. The mean component size, the phase transition, and the giant component

Although it is not usually possible to find a closed-form expression for the complete distribution of cluster sizes on a graph, we can find closed-form expressions for the average properties of clusters from Eqs. (26) and (27). For example, the average size of the component to which a randomly chosen vertex belongs, for the case where there is no giant component in the graph, is given in the normal fashion by

$$\langle s \rangle = H_0'(1) = 1 + G_0'(1) H_1'(1). \tag{29}$$

From Eq. (26) we have

$$H_1'(1) = 1 + G_1'(1) H_1'(1), \tag{30}$$

and hence

$$\langle s \rangle = 1 + \frac{G_0'(1)}{1 - G_1'(1)} = 1 + \frac{z_1^2}{z_1 - z_2}, \tag{31}$$

where $z_1 = z$ is the average number of neighbors of a vertex and $z_2$ is the average number of second neighbors. We see that this expression diverges when

$$G_1'(1) = 1. \tag{32}$$

This point marks the phase transition at which a giant component first appears. Substituting Eqs. (2) and (9) into Eq. (32), we can also write the condition for the phase transition as

$$\sum_k k(k-2) p_k = 0. \tag{33}$$

Indeed, since this sum increases monotonically as edges are added to the graph, it follows that the giant component exists if and only if this sum is positive. This result has been derived by different means by Molloy and Reed [39]. An equivalent and intuitively reasonable statement, which can also be derived from Eq. (31), is that the giant component exists if and only if $z_2 > z_1$.

Our generating function formalism still works when there is a giant component in the graph, but, by definition, $H_0(x)$ then generates the probability distribution of the sizes of components excluding the giant component. This means that $H_0(1)$ is no longer unity, as it is for the other generating functions considered so far, but instead takes the value $1 - S$, where $S$ is the fraction of the graph occupied by the giant component. We can use this to calculate the size of the giant component from Eqs. (26) and (27) thus:

$$S = 1 - G_0(u), \tag{34}$$

where $u \equiv H_1(1)$ is the smallest non-negative real solution of

$$u = G_1(u). \tag{35}$$

This result has been derived in a different but equivalent form by Molloy and Reed [40], using different methods.

The correct general expression for the average component size, excluding the (formally infinite) giant component, if there is one, is

$$\begin{aligned}
\langle s \rangle = \frac{H_0'(1)}{H_0(1)} &= \frac{1}{H_0(1)} \left[ G_0(H_1(1)) + \frac{G_0'(H_1(1)) \, G_1(H_1(1))}{1 - G_1'(H_1(1))} \right] \\
&= 1 + \frac{z u^2}{[1 - S][1 - G_1'(u)]},
\end{aligned} \tag{36}$$

which is equivalent to (31) when there is no giant component ($S = 0$, $u = 1$).

For example, in the ordinary random graph with Poisson degree distribution, we have $G_0(x) = G_1(x) = e^{z(x-1)}$ (Eq. (12)), and hence we find simply that $1 - S = u$ is a solution of $u = G_0(u)$, or equivalently that

$$S = 1 - e^{-zS}. \tag{37}$$

The average component size is given by

$$\langle s \rangle = \frac{1}{1 - z + zS}. \tag{38}$$

These are all well-known results [1].
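
The results of this section lend themselves to a short numerical check. The sketch below, which is only one possible implementation, tests the criterion of Eq. (33), solves Eq. (35) by fixed-point iteration starting from $u = 0$ (an assumed solver, chosen because it approaches the smallest non-negative solution from below), and evaluates Eqs. (34) and (36). All function names, the tolerance, and the truncation `kmax` are hypothetical; the `community` data are the counts quoted with Eq. (24).

```python
from math import exp, factorial, isclose

def G0(p, x):
    return sum(pk * x**k for k, pk in enumerate(p))

def G0_prime(p, x):
    return sum(k * pk * x**(k - 1) for k, pk in enumerate(p) if k > 0)

def G1(p, x):
    """Eq. (9)."""
    return G0_prime(p, x) / G0_prime(p, 1.0)

def G1_prime(p, x):
    z = G0_prime(p, 1.0)
    return sum(k * (k - 1) * pk * x**(k - 2) for k, pk in enumerate(p) if k > 1) / z

def has_giant_component(p):
    """Eq. (33): a giant component exists iff sum_k k(k-2) p_k > 0."""
    return sum(k * (k - 2) * pk for k, pk in enumerate(p)) > 0

def giant_component(p, tol=1e-13, max_iter=100000):
    """Iterate u <- G1(u) from u = 0, Eq. (35); return (u, S) with S from Eq. (34)."""
    u = 0.0
    for _ in range(max_iter):
        u_next = G1(p, u)
        converged = abs(u_next - u) < tol
        u = u_next
        if converged:
            break
    return u, 1.0 - G0(p, u)

def mean_component_size(p):
    """Eq. (36): <s> = 1 + z u^2 / ([1 - S][1 - G1'(u)])."""
    u, S = giant_component(p)
    z = G0_prime(p, 1.0)
    return 1.0 + z * u**2 / ((1.0 - S) * (1.0 - G1_prime(p, u)))

def poisson(z, kmax=100):
    """Truncated Poisson degree distribution (kmax is an assumed cutoff)."""
    return [z**k * exp(-z) / factorial(k) for k in range(kmax)]

community = [n / 1000 for n in (86, 150, 363, 238, 109, 54)]  # data of Eq. (24)
u, S = giant_component(community)   # S estimates the giant-component fraction
```

For a Poisson distribution the value of `S` returned by `giant_component(poisson(z))` should satisfy Eq. (37) to within the iteration tolerance, which gives a simple consistency test of the solver.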

For graphs with purely power-law distributions G1 (w∗ ) − w∗ G′1 (w∗ ) = 0. (43)
(Eq. (17) with κ → ∞), S is given by (34) with u the
smallest non-negative real solution of Then x∗ (and hence s∗ ) is given by Eq. (42). Note that
there is no guarantee that (43) has a finite solution, and
Liτ −1 (u) that if it does not, then Ps will not in general follow the
u= . (39)
uζ(τ − 1) form of Eq. (40).
When we are precisely at the phase transition of our
For all τ ≤ 2 this gives u = 0, and hence S = 1, imply- system, we have G1 (1) = G′1 (1) = 1, and hence the so-
ing that a randomly chosen vertex belongs to the giant lution of Eq. (43) gives w∗ = x∗ = 1—a result which
component with probability tending to 1 as κ → ∞. For we used above—and s∗ → ∞. We can use the fact that
graphs with τ > 2, the probability of belonging to the gi- x∗ = 1 at the transition to calculate the value of the ex-
ant component is strictly less than 1, even for infinite κ. ponent α as follows. Expanding H1−1 (w) about w∗ = 1
In other words, the giant component essentially fills the entire graph for $\tau \le 2$, but not for $\tau > 2$. These results have been derived by different means by Aiello et al. [13].

E. Asymptotic form of the cluster size distribution

A variety of results are known about the asymptotic properties of the coefficients of generating functions, some of which can usefully be applied to the distribution of cluster sizes $P_s$ generated by $H_0(x)$. Close to the phase transition, we expect the tail of the distribution $P_s$ to behave as

$$P_s \sim s^{-\alpha} e^{-s/s^*}, \tag{40}$$

where the constants $\alpha$ and $s^*$ can be calculated from the properties of $H_0(x)$ as follows.

The cutoff parameter $s^*$ is simply related to the radius of convergence $|x^*|$ of the generating function [41,43], according to

$$s^* = \frac{1}{\log |x^*|}. \tag{41}$$

The radius of convergence $|x^*|$ is equal to the magnitude of the position $x^*$ of the singularity in $H_0(x)$ nearest to the origin. From Eq. (27) we see that such a singularity may arise either through a singularity in $G_0(x)$ or through one in $H_1(x)$. However, since the first singularity in $G_0(x)$ is known to be outside the unit circle (Section II A), and the first singularity in $H_1(x)$ tends to $x = 1$ as we go to the phase transition (see below), it follows that, sufficiently close to the phase transition, the singularity in $H_0(x)$ closest to the origin is also a singularity in $H_1(x)$. With this result $x^*$ is easily calculated. Although we do not in general have a closed-form expression for $H_1(x)$, it is easy to derive one for its functional inverse. Putting $w = H_1(x)$ and $x = H_1^{-1}(w)$ in Eq. (26) and rearranging, we find

$$x = H_1^{-1}(w) = \frac{w}{G_1(w)}. \tag{42}$$

The singularity of interest corresponds to the point $w^*$ at which the derivative of $H_1^{-1}(w)$ is zero, which is a solution of

by putting $w = 1 + \epsilon$ in Eq. (42), we find that

$$H_1^{-1}(1 + \epsilon) = 1 - \tfrac{1}{2} G_1''(1)\,\epsilon^2 + O(\epsilon^3), \tag{44}$$

where we have made use of $G_1(1) = G_1'(1) = 1$ at the phase transition. So long as $G_1''(1) \ne 0$, which in general it is not, this implies that $H_1(x)$ and hence also $H_0(x)$ are of the form

$$H_0(x) \sim (1 - x)^{\beta} \quad \text{as } x \to 1, \tag{45}$$

with $\beta = \tfrac{1}{2}$. This exponent is related to the exponent $\alpha$ as follows. Equation (40) implies that $H_0(x)$ can be written in the form

$$H_0(x) = \sum_{s=0}^{a-1} P_s x^s + C \sum_{s=a}^{\infty} s^{-\alpha} e^{-s/s^*} x^s + \epsilon(a), \tag{46}$$

where $C$ is a constant and the last (error) term $\epsilon(a)$ is assumed much smaller than the second term. The first term in this expression is a finite polynomial and therefore has no singularities on the finite plane; the singularity resides in the second term. Using this equation, the exponent $\beta$ can be written:

$$\beta = \lim_{x \to 1} \left[ 1 + (x - 1) \frac{H_0''(x)}{H_0'(x)} \right]
= \lim_{a \to \infty} \lim_{x \to 1} \left[ \frac{1}{x} + \frac{x - 1}{x} \, \frac{\sum_{s=a}^{\infty} s^{2-\alpha} x^{s-1}}{\sum_{s=a}^{\infty} s^{1-\alpha} x^{s-1}} \right]
= \lim_{a \to \infty} \lim_{x \to 1} \left[ \frac{1}{x} + \frac{1 - x}{x \log x} \, \frac{\Gamma(3 - \alpha, -a \log x)}{\Gamma(2 - \alpha, -a \log x)} \right], \tag{47}$$

where we have replaced the sums with integrals as $a$ becomes large, and $\Gamma(\nu, \mu)$ is the incomplete $\Gamma$-function. Taking the limits in the order specified and rearranging for $\alpha$, we then get

$$\alpha = \beta + 1 = \tfrac{3}{2}, \tag{48}$$

regardless of degree distribution, except in the special case where $G_1''(1)$ vanishes (see Eq. (44)). The result $\alpha = \tfrac{3}{2}$ was known previously for the ordinary Poisson random graph [1], but not for other degree distributions.
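As an illustrative sketch of Eqs. (41) and (42), the cutoff $s^*$ can be evaluated numerically for the ordinary Poisson random graph, for which $G_1(w) = e^{z(w-1)}$. The function names, the finite-difference step, and the bisection bracket below are assumptions made for this example and are not part of the original derivation. The code locates the point $w^*$ at which the derivative of $H_1^{-1}(w) = w/G_1(w)$ vanishes and then applies Eq. (41). For the Poisson case the same calculation can be done by hand, giving $w^* = 1/z$ and $s^* = 1/(z - 1 - \log z)$, which serves as a check.

```python
import math


def h1_inverse(w, g1):
    """Functional inverse of H1 from Eq. (42): x = w / G1(w)."""
    return w / g1(w)


def find_w_star(g1, lo, hi, h=1e-6, tol=1e-12):
    """Bisect for the zero of d/dw [w / G1(w)] on the bracket [lo, hi].

    Assumes (for this sketch) that the slope is positive at lo and
    negative at hi; the slope is estimated by a central difference.
    """
    def slope(w):
        return (h1_inverse(w + h, g1) - h1_inverse(w - h, g1)) / (2.0 * h)

    if not (slope(lo) > 0.0 > slope(hi)):
        raise ValueError("bracket does not contain a sign change")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if slope(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def cutoff(g1, lo, hi):
    """Cutoff s* = 1 / log|x*| from Eq. (41), with x* = H1^{-1}(w*)."""
    w_star = find_w_star(g1, lo, hi)
    x_star = h1_inverse(w_star, g1)
    return 1.0 / math.log(abs(x_star))


z = 0.5  # mean degree, chosen below the phase transition at z = 1


def g1_poisson(w):
    """G1 for the Poisson random graph with mean degree z."""
    return math.exp(z * (w - 1.0))


s_star = cutoff(g1_poisson, lo=0.1, hi=10.0)
closed_form = 1.0 / (z - 1.0 - math.log(z))
print(s_star, closed_form)
```

The two printed numbers should agree closely; as $z \to 1$ the denominator $z - 1 - \log z$ goes to zero and the cutoff diverges, consistent with the power-law tail at the transition.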

F. Numbers of neighbors and average path length

We turn now to the calculation of the number of neighbors who are $m$ steps away from a randomly chosen vertex. As shown in Section II A, the probability distributions for first- and second-nearest neighbors are generated by the functions $G_0(x)$ and $G_0(G_1(x))$. By extension, the distribution of $m$th neighbors is generated by $G_0(G_1(\ldots G_1(x) \ldots))$, with $m - 1$ iterations of the function $G_1$ acting on itself. If we define $G^{(m)}(x)$ to be this generating function for $m$th neighbors, then we have

$$G^{(m)}(x) = \begin{cases} G_0(x) & \text{for } m = 1 \\ G^{(m-1)}(G_1(x)) & \text{for } m \ge 2. \end{cases} \tag{49}$$

Then the average number $z_m$ of $m$th-nearest neighbors is

$$z_m = \left. \frac{dG^{(m)}}{dx} \right|_{x=1} = G_1'(1)\, G^{(m-1)\prime}(1) = G_1'(1)\, z_{m-1}. \tag{50}$$

Along with the initial condition $z_1 = z = G_0'(1)$, this then tells us that

$$z_m = [G_1'(1)]^{m-1} G_0'(1) = \left[ \frac{z_2}{z_1} \right]^{m-1} z_1. \tag{51}$$

From this result we can make an estimate of the typical length $\ell$ of the shortest path between two randomly chosen vertices on the graph. This typical path length is reached approximately when the total number of neighbors of a vertex out to that distance is equal to the number of vertices on the graph, i.e., when

$$1 + \sum_{m=1}^{\ell} z_m = N. \tag{52}$$

Using Eq. (51) this gives us

$$\ell = \frac{\log[(N - 1)(z_2 - z_1) + z_1^2] - \log z_1^2}{\log(z_2/z_1)}. \tag{53}$$

In the common case where $N \gg z_1$ and $z_2 \gg z_1$, this reduces to

$$\ell = \frac{\log(N/z_1)}{\log(z_2/z_1)} + 1. \tag{54}$$

This result is only approximate for two reasons. First, the conditions used to derive it are only an approximation; the exact answer depends on the detailed structure of the graph. Second, it assumes that all vertices are reachable from a randomly chosen starting vertex. In general however this will not be true. For graphs with no giant component it is certainly not true and Eq. (54) is meaningless. Even when there is a giant component however, it is usually not the case that it fills the entire graph. A better approximation to $\ell$ may therefore be given by replacing $N$ in Eq. (54) by $NS$, where $S$ is the fraction of the graph occupied by the giant component, as in Section II D.

Such shortcomings notwithstanding, there are a number of remarkable features of Eq. (54):

1. It shows that the average vertex–vertex distance for all random graphs, regardless of degree distribution, should scale logarithmically with size $N$, according to $\ell = A + B \log N$, where $A$ and $B$ are constants. This result is of course well-known for a number of special cases.

2. It shows that the average distance, which is a global property, can be calculated from a knowledge only of the average numbers of first- and second-nearest neighbors, which are local properties. It would be possible therefore to measure these numbers empirically by purely local measurements on a graph such as an acquaintance network and from them to determine the expected average distance between vertices. For some networks at least, this gives a surprisingly good estimate of the true average distance [37].

3. It shows that only the average numbers of first- and second-nearest neighbors are important to the calculation of average distances, and thus that two random graphs with completely different distributions of vertex degrees, but the same values of $z_1$ and $z_2$, will have the same average distances.

For the case of the purely theoretical example graphs we discussed earlier, we cannot make an empirical measurement of $z_1$ and $z_2$, but we can still employ Eq. (54) to calculate $\ell$. In the case of the ordinary (Poisson) random graph, for instance, we find from Eq. (12) that $z_1 = z$, $z_2 = z^2$, and so $\ell = \log N / \log z$, which is the standard result for graphs of this type [1]. For the graph with degree distributed according to the truncated power law, Eq. (17), $z_1$ and $z_2$ are given by Eqs. (21) and (22), and the average vertex–vertex distance is

$$\ell = \frac{\log N + \log\left[ \mathrm{Li}_{\tau}(e^{-1/\kappa}) / \mathrm{Li}_{\tau-1}(e^{-1/\kappa}) \right]}{\log\left[ \mathrm{Li}_{\tau-2}(e^{-1/\kappa}) / \mathrm{Li}_{\tau-1}(e^{-1/\kappa}) - 1 \right]} + 1. \tag{55}$$

In the limit $\kappa \to \infty$, this becomes

$$\ell = \frac{\log N + \log\left[ \zeta(\tau) / \zeta(\tau - 1) \right]}{\log\left[ \zeta(\tau - 2) / \zeta(\tau - 1) - 1 \right]} + 1. \tag{56}$$

Note that this expression does not have a finite positive real value for any $\tau < 3$, indicating that one must specify a finite cutoff $\kappa$ for the degree distribution to get a well-defined average vertex–vertex distance on such graphs.

G. Simulation results

As a check on the results of this section, we have performed extensive computer simulations of random graphs

with various distributions of vertex degree. Such graphs are relatively straightforward to generate. First, we generate a set of $N$ random numbers $\{k_i\}$ to represent the degrees of the $N$ vertices in the graph. These may be thought of as the "stubs" of edges, emerging from their respective vertices. Then we choose pairs of these stubs at random and place edges on the graph joining them up. It is simple to see that this will generate all graphs with the given set of vertex degrees with equal probability. The only small catch is that the sum $\sum_i k_i$ of the degrees must be even, since each edge added to the graph must have two ends. This is not difficult to contrive however. If the set $\{k_i\}$ is such that the sum is odd, we simply throw it away and generate a new set.
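The stub-matching procedure just described, together with the path-length estimate of Eq. (54), can be sketched as follows. This is a minimal illustration under stated assumptions, not the simulation code used for the figures: the function names are hypothetical, the degree distribution (uniform on 1 to 6) is chosen only for convenience, self-loops and repeated edges are simply kept, and $z_2$ is computed from the realized degrees as the mean of $k(k-1)$, which equals $G_0'(1)G_1'(1)$ for an undirected graph.

```python
import math
import random
from collections import deque


def match_stubs(degrees, rng):
    """Pair edge stubs uniformly at random; returns a list of (u, v) edges.

    Self-loops and repeated edges are not filtered out in this sketch.
    """
    if sum(degrees) % 2 != 0:
        raise ValueError("degree sum must be even; draw a new degree set")
    stubs = [v for v, k in enumerate(degrees) for _ in range(k)]
    rng.shuffle(stubs)
    # Consecutive entries of a shuffled list form a uniformly random pairing.
    return list(zip(stubs[0::2], stubs[1::2]))


def estimated_path_length(n, z1, z2):
    """Typical vertex-vertex distance from Eq. (54)."""
    return math.log(n / z1) / math.log(z2 / z1) + 1.0


def mean_distance(n, edges, sources):
    """Mean shortest-path length from the given sources to reachable vertices."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    total = pairs = 0
    for s in sources:
        dist = {s: 0}
        queue = deque([s])
        while queue:  # breadth-first search from s
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else float("nan")


rng = random.Random(0)
n = 2000
degrees = [rng.randint(1, 6) for _ in range(n)]
while sum(degrees) % 2:  # odd sum: throw the set away and generate a new one
    degrees = [rng.randint(1, 6) for _ in range(n)]

edges = match_stubs(degrees, rng)
z1 = sum(degrees) / n
z2 = sum(k * (k - 1) for k in degrees) / n
print("Eq. (54) estimate:", estimated_path_length(n, z1, z2))
print("measured (20 sources):", mean_distance(n, edges, rng.sample(range(n), 20)))
```

Because the measured value averages only over vertices reachable from each source, it is closer in spirit to the $NS$ correction discussed in Section II F than to the bare Eq. (54); the two printed numbers should be comparable rather than identical.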
As a practical matter, integers representing vertex degrees with any desired probability distribution can be generated using the transformation method if applicable, or failing that, a rejection or hybrid method [44]. For example, degrees obeying the power-law-plus-cutoff form of Eq. (17) can be generated using a two-step hybrid transformation/rejection method as follows. First, we generate random integers $k \ge 1$ with distribution proportional to $e^{-k/\kappa}$ using the transformation [45]

$$k = \lceil -\kappa \log(1 - r) \rceil, \tag{57}$$

where $r$ is a random real number uniformly distributed in the range $0 \le r < 1$. Second, we accept this number with probability $k^{-\tau}$, where by "accept" we mean that if the number is not accepted we discard it and generate another one according to Eq. (57), repeating the process until one is accepted.

In Fig. 4 we show results for the size of the giant component in simulations of undirected unipartite graphs with vertex degrees distributed according to Eq. (17) for a variety of different values of $\tau$ and $\kappa$. On the same plot we also show the expected value of the same quantity derived by numerical solution of Eqs. (34) and (35). As the figure shows, the agreement between simulation and theory is excellent.

[Figure 4: size of giant component versus cutoff parameter $\kappa$; legend entries $\tau = 1.0$, $1.5$, $2.0$, $2.5$.]

FIG. 4. The size of the giant component in random graphs with vertex degrees distributed according to Eq. (17), as a function of the cutoff parameter $\kappa$ for five different values of the exponent $\tau$. The points are results from numerical simulations on graphs of $N = 1\,000\,000$ vertices, and the solid lines are the theoretical value for infinite graphs, Eqs. (34) and (35). The error bars on the simulation results are smaller than the data points.

III. DIRECTED GRAPHS

We turn now to directed graphs with arbitrary degree distributions. An example of a directed graph is the world-wide web, since every hyperlink between two pages on the web goes in only one direction. The web has a degree distribution that follows a power-law, as discussed in Section I.

Directed graphs introduce a subtlety that is not present in undirected ones, and which becomes important when we apply our generating function formalism. In a directed graph it is not possible to talk about a "component" (i.e., a group of connected vertices) because even if vertex A can be reached by following (directed) edges from vertex B, that does not necessarily mean that vertex B can be reached from vertex A. There are two correct generalizations of the idea of the component to a directed graph: the set of vertices which are reachable from a given vertex, and the set from which a given vertex can be reached. We will refer to these as "out-components" and "in-components" respectively. An in-component can also be thought of as those vertices reachable by following edges backwards (but not forwards) from a specified vertex. It is possible to study directed graphs by allowing both forward and backward traversal of edges (see Ref. [26], for example). In this case, however, the graph effectively becomes undirected and should be treated with the formalism of Section II.

With these considerations in mind, we now develop the generating function formalism appropriate to random directed graphs with arbitrary degree distributions.

A. Generating functions

In a directed graph, each vertex has separate in-degree and out-degree for links running into and out of that vertex. Let us define $p_{jk}$ to be the probability that a randomly chosen vertex has in-degree $j$ and out-degree $k$. It is important to realize that in general this joint distribution of $j$ and $k$ is not equal to the product $p_j p_k$ of the separate distributions of in- and out-degree. In the world-wide web, for example, it seems likely (although this question has not been investigated to our knowledge) that sites with a large number of outgoing links also have

a large number of incoming ones, i.e., that $j$ and $k$ are correlated, so that $p_{jk} \ne p_j p_k$. We appeal to those working on studies of the structure of the web to measure the joint distribution of in- and out-degrees of sites; empirical data on this distribution would make theoretical work much easier!

[Figure 5: diagram with labels "links in" (1), "strongly connected component", and "links out" (2).]

FIG. 5. The "bow-tie" diagram proposed by Broder et al. as a representation of the giant component of the world-wide web (although it can be used to visualize any directed graph).

We now define a generating function for the joint probability distribution of in- and out-degrees, which is necessarily a function of two independent variables, $x$ and $y$, thus:

$$G(x, y) = \sum_{jk} p_{jk}\, x^j y^k. \tag{58}$$
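The difference between the joint distribution $p_{jk}$ and the product of its marginals, and the generating function of Eq. (58), can be illustrated with a small sketch. The degree pairs below are made-up data chosen so that in- and out-degree are perfectly correlated, and all helper names are hypothetical.

```python
from collections import Counter


def joint_distribution(pairs):
    """Empirical p_jk from a list of (in_degree, out_degree) pairs."""
    counts = Counter(pairs)
    n = len(pairs)
    return {jk: c / n for jk, c in counts.items()}


def marginals(p):
    """Separate in-degree and out-degree distributions p_j and p_k."""
    p_in, p_out = Counter(), Counter()
    for (j, k), prob in p.items():
        p_in[j] += prob
        p_out[k] += prob
    return p_in, p_out


def G(p, x, y):
    """Joint generating function of Eq. (58)."""
    return sum(prob * x**j * y**k for (j, k), prob in p.items())


def max_deviation_from_product(p):
    """Largest |p_jk - p_j p_k|; zero when in- and out-degree are independent."""
    p_in, p_out = marginals(p)
    return max(
        abs(p.get((j, k), 0.0) - p_in[j] * p_out[k])
        for j in p_in
        for k in p_out
    )


# Hypothetical data: every vertex has equal in- and out-degree.
pairs = [(0, 0), (1, 1), (1, 1), (2, 2), (0, 0), (2, 2)]
p = joint_distribution(pairs)

print("G(1, 1) =", G(p, 1.0, 1.0))  # normalization of p_jk
print("max |p_jk - p_j p_k| =", max_deviation_from_product(p))
```

A nonzero deviation signals that the joint distribution cannot be replaced by the product of its marginals, which is the situation the text anticipates for the web.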

Since every edge on a directed graph must leave some vertex and enter another, the net average number of edges entering a vertex is zero, and hence $p_{jk}$ must satisfy the constraint

$$\sum_{jk} (j - k)\, p_{jk} = 0. \tag{59}$$

This implies that $G(x, y)$ must satisfy

$$\left. \frac{\partial G}{\partial x} \right|_{x,y=1} = \left. \frac{\partial G}{\partial y} \right|_{x,y=1} = z, \tag{60}$$

where $z$ is the average degree (both in and out) of vertices in the graph.

Using the function $G(x, y)$, we can, as before, define generating functions $G_0$ and $G_1$ for the number of outgoing edges leaving a randomly chosen vertex, and the number leaving the vertex reached by following a randomly chosen edge. We can also define generating functions $F_0$ and $F_1$ for the number arriving at such a vertex. These functions are given by

$$F_0(x) = G(x, 1), \qquad F_1(x) = \frac{1}{z} \left. \frac{\partial G}{\partial y} \right|_{y=1}, \tag{61}$$

$$G_0(y) = G(1, y), \qquad G_1(y) = \frac{1}{z} \left. \frac{\partial G}{\partial x} \right|_{x=1}. \tag{62}$$

Once we have these functions, many results follow as before. The average numbers of first and second neighbors reachable from a randomly chosen vertex are given by Eq. (60) and

$$z_2 = G_0'(1)\, G_1'(1) = \left. \frac{\partial^2 G}{\partial x\, \partial y} \right|_{x,y=1}. \tag{63}$$

These are also the numbers of first and second neighbors from which a random vertex can be reached, since Eqs. (60) and (63) are manifestly symmetric in $x$ and $y$. We can also make an estimate of the average path length on the graph from

$$\ell = \frac{\log(N/z_1)}{\log(z_2/z_1)} + 1, \tag{64}$$

as before. However, this equation should be used with caution. As discussed in Section II F, the derivation of this formula assumes that we are in a regime in which the bulk of the graph is reachable from most vertices. On a directed graph however, this may be far from true, as appears to be the case with the world-wide web [26].

The probability distribution of the numbers of vertices reachable from a randomly chosen vertex in a directed graph (i.e., of the sizes of the out-components) is generated by the function $H_0(y) = y\, G_0(H_1(y))$, where $H_1(y)$ is a solution of $H_1(y) = y\, G_1(H_1(y))$, just as before. (A similar and obvious pair of equations governs the sizes of the in-components.) The results for the asymptotic behavior of the component size distribution from Section II E generalize straightforwardly to directed graphs. The average out-component size for the case where there is no giant component is given by Eq. (31), and thus the point at which a giant component first appears is given once more by $G_1'(1) = 1$. Substituting Eq. (58) into this expression gives the explicit condition

$$\sum_{jk} (2jk - j - k)\, p_{jk} = 0 \tag{65}$$

for the first appearance of the giant component. This expression is the equivalent for the directed graph of Eq. (33). It is also possible, and equally valid, to define the position at which the giant component appears by $F_1'(1) = 1$, which provides an alternative derivation for Eq. (65).

Just as with the individual in- and out-components for vertices, the size of the giant component on a directed graph can also be defined in different ways. The giant component can be represented using the "bow-tie" diagram of Broder et al. [26], which we depict (in simplified form) in Fig. 5. The diagram has three parts. The strongly connected portion of the giant component, represented by the central circle, is that portion in which every vertex can be reached from every other. The two sides of the bow-tie represent (1) those vertices from which the strongly connected component can be reached but which it is not possible to reach from the strongly connected component and (2) those vertices which can be reached from the strongly connected component but from which it is not possible to reach the strongly connected component. The solution of Eqs. (34) and (35) with $G_0(x)$ and $G_1(x)$ defined according to Eq. (62) gives the number of vertices, as a fraction of $N$, in the giant strongly connected component plus those vertices from which the giant strongly connected component can be reached. Using $F_0(x)$ and $F_1(x)$ (Eq. (61)) in place of $G_0(x)$ and $G_1(x)$ gives a different solution, which represents the fraction of the graph in the giant strongly connected component plus those vertices which can be reached from it.

B. Simulation results

We have performed simulations of directed graphs as a check on the results above. Generation of random directed graphs with known joint degree distribution $p_{jk}$ is somewhat more complicated than generation of undirected graphs discussed in Section II G. The method we use is as follows. First, it is important to ensure that the averages of the distributions of in- and out-degree of the graph are the same, or equivalently that $p_{jk}$ satisfies Eq. (59). If this is not the case, at least to good approximation, then generation of the graph will be impossible. Next, we generate a set of $N$ in/out-degree pairs $(j_i, k_i)$, one for each vertex $i$, according to the joint distribution $p_{jk}$, and calculate the sums $\sum_i j_i$ and $\sum_i k_i$. These sums are required to be equal if there are to be no dangling edges in the graph, but in most cases we find that they are not. To rectify this we use a simple procedure. We choose a vertex $i$ at random, discard the numbers $(j_i, k_i)$ for that vertex and generate new ones from the distribution $p_{jk}$. We repeat this procedure until the two sums are found to be equal. Finally, we choose random in/out pairs of edges and join them together to make a directed graph. The resulting graph has the desired number of vertices and the desired joint distribution of in and out degree.

We have simulated directed graphs in which the distribution $p_{jk}$ is given by a simple product of independent distributions of in- and out-degree. (As pointed out in Section III A, this is not generally the case for real-world directed graphs, where in- and out-degree may be correlated.) In Fig. 6 we show results from simulations of graphs with identically distributed (but independent) in- and out-degrees drawn from the exponential distribution, Eq. (13). For this distribution, solution of the critical-point equation $G_1'(1) = 1$ shows that the giant component first appears at $\kappa_c = [\log 2]^{-1} = 1.4427$. The three curves in the figure show the distribution of numbers of vertices accessible from each vertex in the graph for $\kappa = 0.5$, $0.8$, and $\kappa_c$. The critical distribution follows a power-law form (see Section II C), while the others show an exponential cutoff. We also show the exact distribution derived from the coefficients in the expansion of $H_1(x)$ about zero. Once again, theory and simulation are in good agreement. A fit to the distribution for the case $\kappa = \kappa_c$ gives a value of $\alpha = 1.50 \pm 0.02$, in good agreement with Eq. (48).

[Figure 6: frequency $P_s$ versus number of accessible sites $s$; legend entries $\kappa = \kappa_c = 1.4427$, $\kappa = 0.8$, $\kappa = 0.5$.]

FIG. 6. The distribution $P_s$ of the numbers of vertices accessible from each vertex of a directed graph with identically exponentially distributed in- and out-degree. The points are simulation results for systems of $N = 1\,000\,000$ vertices and the solid lines are the analytic solution.

IV. BIPARTITE GRAPHS

The collaboration graphs of scientists, company directors, and movie actors discussed in Section I are all examples of bipartite graphs. In this section we study the theory of bipartite graphs with arbitrary degree distributions. To be concrete, we will speak in the language of "actors" and "movies," but clearly all the developments here are applicable to academic collaborations, boards of directors, or any other bipartite graph structure.

A. Generating functions and basic results

Consider then a bipartite graph of $M$ movies and $N$ actors, in which each actor has appeared in an average of $\mu$ movies and each movie has a cast of average size $\nu$ actors. Note that only three of these parameters are independent, since the fourth is given by the equality

$$\frac{\mu}{M} = \frac{\nu}{N}. \tag{66}$$

Let $p_j$ be the probability distribution of the degree of actors (i.e., of the number of movies in which they have appeared) and $q_k$ be the distribution of degree (i.e., cast size) of movies. We define two generating functions which generate these probability distributions thus:

$$f_0(x) = \sum_j p_j x^j, \qquad g_0(x) = \sum_k q_k x^k. \tag{67}$$
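As a small illustration of Eqs. (66) and (67), the two generating functions can be built directly from lists of observed degrees. The data below are invented (six actors, three movies, nine actor-movie edges) and the helper names are hypothetical; the point of the sketch is only that Eq. (66) holds automatically whenever the two degree lists describe the same set of edges.

```python
from collections import Counter


def degree_distribution(degrees):
    """Empirical probability distribution of a list of vertex degrees."""
    n = len(degrees)
    return {k: c / n for k, c in Counter(degrees).items()}


def generating_function(dist):
    """Return the polynomial x -> sum_k dist[k] * x**k, as in Eq. (67)."""
    return lambda x: sum(prob * x**k for k, prob in dist.items())


def mean(dist):
    """Mean degree of a distribution."""
    return sum(k * prob for k, prob in dist.items())


# Hypothetical data: both lists sum to the same number of edges (9).
movies_per_actor = [1, 1, 2, 1, 2, 2]
actors_per_movie = [3, 4, 2]

p = degree_distribution(movies_per_actor)  # p_j
q = degree_distribution(actors_per_movie)  # q_k
f0 = generating_function(p)
g0 = generating_function(q)

N, M = len(movies_per_actor), len(actors_per_movie)
mu, nu = mean(p), mean(q)

print("mu =", mu, "nu =", nu)
print("mu / M =", mu / M, "nu / N =", nu / N)  # the two sides of Eq. (66)
print("f0(1) =", f0(1.0), "g0(1) =", g0(1.0))  # normalization
```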

(It may be helpful to think of $f$ as standing for "film," in order to keep these two straight.) As before, we necessarily have

$$f_0(1) = g_0(1) = 1, \qquad f_0'(1) = \mu, \qquad g_0'(1) = \nu. \tag{68}$$

If we now choose a random edge on our bipartite graph and follow it both ways to reach the movie and actor which it connects, then the distribution of the number of other edges leaving those two vertices is generated by the equivalent of (9):

$$f_1(x) = \frac{1}{\mu} f_0'(x), \qquad g_1(x) = \frac{1}{\nu} g_0'(x). \tag{69}$$

Now we can write the generating function for the distribution of the number of co-stars (i.e., actors in shared movies) of a randomly chosen actor as

$$G_0(x) = f_0(g_1(x)). \tag{70}$$

If we choose a random edge, then the distribution of number of co-stars of the actor to which it leads is generated by

$$G_1(x) = f_1(g_1(x)). \tag{71}$$

These two functions play the same role in the one-mode network of actors as the functions of the same name did for the unipartite random graphs of Section II. Once we have calculated them, all the results from Section II follow exactly as before.

The numbers of first and second neighbors of a randomly chosen actor are

$$z_1 = G_0'(1) = f_0'(1)\, g_1'(1), \tag{72}$$

$$z_2 = G_0'(1)\, G_1'(1) = f_0'(1)\, f_1'(1)\, [g_1'(1)]^2. \tag{73}$$

Explicit expressions for these quantities can be obtained by substituting from Eqs. (67) and (69). The average vertex–vertex distance on the one-mode graph is given as before by Eq. (54). Thus, it is possible to estimate average distances on such graphs by measuring only the numbers of first and second neighbors.

The distribution of the sizes of the connected components in the one-mode network is generated by Eq. (27), where $H_1(x)$ is a solution of Eq. (26). The asymptotic results of Section II E generalize simply to the bipartite case, and the average size of a connected component in the absence of a giant component is

$$\langle s \rangle = 1 + \frac{G_0'(1)}{1 - G_1'(1)}, \tag{74}$$

as before. This diverges when $G_1'(1) = 1$, marking the first appearance of the giant component. Equivalently, the giant component first appears when

$$f_0''(1)\, g_0''(1) = f_0'(1)\, g_0'(1). \tag{75}$$

Substituting from Eq. (67), we then derive the explicit condition for the first appearance of the giant component:

$$\sum_{jk} jk\,(jk - j - k)\, p_j q_k = 0. \tag{76}$$

The size $S$ of the giant component, as a fraction of the total number $N$ of actors, is given as before by the solution of Eqs. (34) and (35).

Of course, all of these results work equally well if "actors" and "movies" are interchanged. One can calculate the average distance between movies in terms of common actors shared, the size and distribution of connected components of movies, and so forth, using the formulas given above, with only the exchange of $f_0$ and $f_1$ for $g_0$ and $g_1$. The formula (75) is, not surprisingly, invariant under this interchange, so that the position of the onset of the giant component is the same regardless of whether one is looking at actors or movies.

B. Clustering

Watts and Strogatz [15] have introduced the concept of clustering in social networks, also sometimes called network transitivity. Clustering refers to the increased propensity of pairs of people to be acquainted with one another if they have another acquaintance in common. Watts and Strogatz defined a clustering coefficient which measures the degree of clustering on a graph. For our purposes, the definition of this coefficient is

$$C = \frac{3 \times \text{number of triangles on the graph}}{\text{number of connected triples of vertices}} = \frac{3 N_\triangle}{N_3}. \tag{77}$$

Here "triangles" are trios of vertices each of which is connected to both of the others, and "connected triples" are trios in which at least one is connected to both the others. The factor of 3 in the numerator accounts for the fact that each triangle contributes to three connected triples of vertices, one for each of its three vertices. With this factor of 3, the value of $C$ lies strictly in the range from zero to one. In the directed and undirected unipartite random graphs of Sections II and III, $C$ is trivially zero in the limit $N \to \infty$. In the one-mode projections of bipartite graphs, however, both the actors and the movies can be expected to have non-zero clustering. We here treat the case for actors. The case for movies is easily derived by swapping $f$s and $g$s.

An actor who has $z \equiv z_1$ co-stars in total contributes $\tfrac{1}{2} z(z - 1)$ connected triples to $N_3$, so that

$$N_3 = \tfrac{1}{2} N \sum_z z(z - 1)\, r_z, \tag{78}$$

where $r_z$ is the probability of having $z$ co-stars. As shown above (Eq. (70)), the distribution $r_z$ is generated by $G_0(x)$ and so

$$N_3 = \tfrac{1}{2} N\, G_0''(1). \tag{79}$$

A movie which stars $k$ actors contributes $\tfrac{1}{6} k(k - 1)(k - 2)$ triangles to the total triangle count in the one-mode graph. Thus the total number of triangles on the graph is the sum of $\tfrac{1}{6} k(k - 1)(k - 2)$ over all movies, which is given by

$$N_\triangle = \tfrac{1}{6} M \sum_k k(k - 1)(k - 2)\, q_k = \tfrac{1}{6} M\, g_0'''(1). \tag{80}$$

Substituting into Eq. (77), we then get

$$C = \frac{M}{N} \frac{g_0'''(1)}{G_0''(1)}. \tag{81}$$

Making use of Eqs. (66), (67), and (70), this can also be written as

$$\frac{1}{C} - 1 = \frac{(\mu_2 - \mu_1)(\nu_2 - \nu_1)^2}{\mu_1 \nu_1 (2\nu_1 - 3\nu_2 + \nu_3)}, \tag{82}$$

where $\mu_n = \sum_k k^n p_k$ is the $n$th moment of the distribution of numbers of movies in which actors have appeared, and $\nu_n$ is the same for cast size (number of actors in a movie).

C. Example

To give an example, consider a random bipartite graph with Poisson-distributed numbers of both movies per actor and actors per movie. In this case, following the derivation of Eq. (12), we find that

$$f_0(x) = e^{\mu(x - 1)}, \qquad g_0(x) = e^{\nu(x - 1)}, \tag{83}$$

and $f_1(x) = f_0(x)$ and $g_1(x) = g_0(x)$. Thus

$$G_0(x) = G_1(x) = e^{\mu\left(e^{\nu(x - 1)} - 1\right)}. \tag{84}$$

This implies that $z_1 = \mu\nu$ and $z_2 = (\mu\nu)^2$, so that

$$\ell = \frac{\log N}{\log \mu\nu} = \frac{\log N}{\log z}, \tag{85}$$

just as in an ordinary Poisson-distributed random graph. From Eq. (74), the average size $\langle s \rangle$ of a connected component of actors, below the phase transition, is

$$\langle s \rangle = \frac{1}{1 - \mu\nu}, \tag{86}$$

which diverges, yielding a giant component, at $\mu\nu = z = 1$, also as in the ordinary random graph. From Eqs. (34) and (35), the size $S$ of the giant component as a fraction of $N$ is a solution of

$$S = 1 - e^{\mu\left(e^{-\nu S} - 1\right)}. \tag{87}$$

And from Eq. (81), the clustering coefficient for the one-mode network of actors is

$$C = \frac{M \nu^3}{N \nu^2 (\mu^2 + \mu)} = \frac{1}{\mu + 1}, \tag{88}$$

where we have made use of Eq. (66).

Another quantity of interest is the distribution of numbers of co-stars, i.e., of the numbers of people with whom each actor has appeared in a movie. As discussed above, this distribution is generated by the function $G_0(x)$ defined in Eq. (70). For the case of the Poisson degree distribution, we can perform the derivatives, Eq. (4), and setting $x = 0$ we find that the probability $r_z$ of having appeared with a total of exactly $z$ co-stars is

$$r_z = \frac{\nu^z}{z!}\, e^{\mu(e^{-\nu} - 1)} \sum_{k=1}^{z} \left\{ {z \atop k} \right\} \left[ \mu e^{-\nu} \right]^k, \tag{89}$$

where the coefficients $\left\{ {z \atop k} \right\}$ are the Stirling numbers of the second kind [46]

$$\left\{ {z \atop k} \right\} = \sum_{r=1}^{k} \frac{(-1)^{k - r}}{r!\,(k - r)!}\, r^z. \tag{90}$$

D. Simulation results

Random bipartite graphs can be generated using an algorithm similar to the one described in Section III B for directed graphs. After making sure that the required degree distributions for both actor and movie vertices have means consistent with the required total numbers of actors and movies according to Eq. (66), we generate vertex degrees for each actor and movie at random and calculate their sum. If these sums are unequal, we discard the degree of one actor and one movie, chosen at random, and replace them with new degrees drawn from the relevant distributions. We repeat this process until the total actor and movie degrees are equal. Then we join vertices up in pairs.

In Fig. 7 we show the results of such a simulation for a bipartite random graph with Poisson degree distribution. (In fact, for the particular case of the Poisson distribution, the graph can be generated simply by joining up actors and movies at random, without regard for individual vertex degrees.) The figure shows the distribution of the number of co-stars of each actor, along with the analytic solution, Eqs. (89) and (90). Once more, numerical and analytic results are in good agreement.

V. APPLICATIONS TO REAL-WORLD NETWORKS

In this section we construct random graph models of two types of real-world networks, namely collaboration

graphs and the world-wide web, using the results of Sections III and IV to incorporate realistic degree distributions into the models. As we will show, the results are in reasonably good agreement with empirical data, although there are some interesting discrepancies also, perhaps indicating the presence of social phenomena that are not incorporated in the random graph.

[Figure 7: frequency $r_z$ versus number of costars $z$.]

FIG. 7. The frequency distribution of numbers of co-stars of an actor in a bipartite graph with $\mu = 1.5$ and $\nu = 15$. The points are simulation results for $M = 10\,000$ and $N = 100\,000$. The line is the exact solution, Eqs. (89) and (90). The error bars on the numerical results are smaller than the points.

A. Collaboration networks

In this section we construct random bipartite graph models of the known collaboration networks of company directors [29–31], movie actors [15], and scientists [36]. As we will see, the random graph works well as a model of these networks, giving good order-of-magnitude estimates of all quantities investigated, and in some cases giving results of startling accuracy.

Our first example is the collaboration network of the members of the boards of directors of the Fortune 1000 companies (the one thousand US companies with the highest revenues). The data come from the 1999 Fortune 1000 [29–31] and in fact include only 914 of the 1000, since data on the boards of the remaining 86 were not available. The data form a bipartite graph in which one type of vertex represents the boards of directors, and the other type the members of those boards, with edges connecting boards to their members. In Fig. 8 we show the frequency distribution of the numbers of boards on which each member sits, and the numbers of members of each board. As we see, the former distribution is close to exponential, with the majority of directors sitting on only one board, while the latter is strongly peaked around 10, indicating that most boards have about 10 members.

[Figure 8: two panels, frequency versus number of boards and frequency versus number of members.]

FIG. 8. Frequency distributions for the boards of directors of the Fortune 1000. Left panel: the numbers of boards on which each director sits. Right panel: the numbers of directors on each board.

Using these distributions, we can define generating functions $f_0(x)$ and $g_0(x)$ as in Eq. (23), and hence find the generating functions $G_0(x)$ and $G_1(x)$ for the distributions of numbers of co-workers of the directors. We have used these generating functions and Eqs. (72) and (81) to calculate the expected clustering coefficient $C$ and the average number of co-workers $z$ in the one-mode projection of board directors on a random bipartite graph with the same vertex degree distributions as the original dataset. In Table I we show the results of these calculations, along with the same quantities for the real Fortune 1000. As the table shows, the two are in remarkable, almost perfect, agreement.

                            clustering C        average degree z
network                     theory   actual     theory   actual
company directors           0.590    0.588      14.53    14.44
movie actors                0.084    0.199      125.6    113.4
physics (arxiv.org)         0.192    0.452      16.74    9.27
biomedicine (MEDLINE)       0.042    0.088      18.02    16.93

TABLE I. Summary of results of the analysis of four collaboration networks.

It is not just the average value of $z$ that we can calculate from our generating functions, but the entire distribution: since the generating functions are finite polynomials in this case, we can simply perform the derivatives to get the probability distribution $r_z$. In Fig. 9, we show the results of this calculation for the Fortune 1000 graph. The points in the figure show the actual distribution of $z$ for the real-world data, while the solid line shows

0.12

boards codirectors sit on


0.02 3
movies
0.10
0.01 0.10
2

frequency
0 0.08
frequency rz

probability
collaborations
0.1 in physics 0
0.06 0 5 10
0.05
boards you sit on
0
0 10 20 30 40 50 0.04
collaborators
0.02

0.00
0 10 20 30 40 50 0.00
0 10 20 30
number of codirectors z
number of interlocks

FIG. 9. The probability distribution of numbers of


FIG. 10. The distribution of the number of other boards
co-directors in the Fortune 1000 graph. The points are the
with which each board of directors is “interlocked” in the
real-world data, the solid line is the bipartite graph model,
Fortune 1000 data. An interlock between two boards means
and the dashed line is the Poisson distribution with the same
that they share one or more common members. The points are
mean. Insets: the equivalent distributions for the numbers of
the empirical data, the solid line is the theoretical prediction.
collaborators of movie actors and physicists.
Inset: the number of boards on which one’s codirectors sit, as
a function of the number of boards one sits on oneself.
the theoretical results. Again the agreement is excellent.
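The two quantities attached to Fig. 10, the number of interlocks per board and the mean number of boards held by a director's codirectors, can both be read off a bipartite membership list. The following is a minimal sketch only: the function names, the toy data, and the choice to average first over each director's distinct codirectors and then over directors with the same number of seats are assumptions made for illustration, and may differ from the procedure actually used for the figure.

```python
from collections import defaultdict
from itertools import combinations


def board_interlocks(boards):
    """Count, for each board, the other boards sharing at least one member."""
    boards_of = defaultdict(set)  # director -> set of boards they sit on
    for board, members in boards.items():
        for director in members:
            boards_of[director].add(board)

    linked = {board: set() for board in boards}
    for seats in boards_of.values():
        # Any two boards with a common director are interlocked (counted once).
        for a, b in combinations(sorted(seats), 2):
            linked[a].add(b)
            linked[b].add(a)
    return {board: len(others) for board, others in linked.items()}


def codirector_board_curve(boards):
    """Map k -> mean number of boards held by codirectors of k-board directors."""
    boards_of = defaultdict(set)
    for board, members in boards.items():
        for director in members:
            boards_of[director].add(board)

    per_k = defaultdict(list)
    for director, seats in boards_of.items():
        codirectors = set()
        for board in seats:
            codirectors |= set(boards[board])
        codirectors.discard(director)
        if codirectors:  # directors with no codirectors are skipped
            mean_seats = sum(len(boards_of[c]) for c in codirectors) / len(codirectors)
            per_k[len(seats)].append(mean_seats)
    return {k: sum(v) / len(v) for k, v in sorted(per_k.items())}


# Hypothetical toy data (not from the Fortune 1000): board -> set of directors.
toy_boards = {
    "A": {"ann", "bob", "cat"},
    "B": {"ann", "bob", "dan"},
    "C": {"eve", "fay"},
    "D": {"cat", "gus"},
}
print(board_interlocks(toy_boards))
print(codirector_board_curve(toy_boards))
```

If the second function returns values that increase with k, directors on many boards tend to sit with others on many boards; values independent of k would indicate no such correlation.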
The dashed line in the figure shows the distribution for an ordinary Poisson random graph with the same mean. Clearly this is a significantly inferior fit.

In fact, within the business world, attention has focussed not on the collaboration patterns of company directors, but on the “interlocks” between boards, i.e., on the one-mode network in which vertices represent boards of directors and two boards are connected if they have one or more directors in common [28,29]. This is also simple to study with our model. In Fig. 10 we show the distribution of the numbers of interlocks that each board has, along with the theoretical prediction from our model. As we see, the agreement between empirical data and theory is significantly worse in this case than for the distribution of co-directors. In particular, it appears that our theory significantly underestimates the number of boards which are interlocked with very small or very large numbers of other boards, while overestimating those with intermediate numbers of interlocks. One possible explanation of this is that “big-shots work with other big-shots.” That is, the people who sit on many boards tend to sit on those boards with other people who sit on many boards. And conversely the people who sit on only one board (who form the majority of all directors) tend to do so with others who sit on only one board. This would tend to stretch the distribution of numbers of interlocks, just as seen in the figure, producing a disproportionately high number of boards with very many or very few interlocks to others. To test this hypothesis, we have calculated, as a function of the number of boards on which a director sits, the average number of boards on which each of their codirectors sit. The results are shown in the inset of Fig. 10. If these two quantities were uncorrelated, the plot would be flat. Instead, however, it slopes clearly upwards, indicating indeed that on the average the big-shots work with other big-shots. (This idea is not new. It has been discussed previously by a number of others; see Refs. [47] and [48], for example.)

The example of the boards of directors is a particularly instructive one. What it illustrates is that the cases in which our random graph models agree well with real-world phenomena are not necessarily the most interesting. Certainly it is satisfying, as in Fig. 9, to have the theory agree well with the data. But probably Fig. 10 is more instructive: we have learned something about the structure of the network of the boards of directors by observing the way in which the pattern of board interlocks differs from the predictions of the purely random network. Thus it is perhaps best to regard our random graph as a null model, a baseline from which our expectations about network structure should be measured. It is deviation from the random graph behavior, not agreement with it, that allows us to draw conclusions about real-world networks.

We now look at three other graphs for which our theory also works well, although again there are some noticeable deviations from the random graph predictions, indicating the presence of social or other phenomena at work in the network.

We consider the graph of movie actors and the movies in which they appear [15,49] and graphs of scientists and the papers they write in physics and biomedical research [36]. In Table I we show results for the clustering coefficients and average coordination numbers of the
one-mode projections of these graphs onto the actors or scientists. As the table shows, our theory gives results for these figures which are of the right general order of magnitude, but typically deviate from the empirically measured figures by a factor of two or so. In the insets of Fig. 9 we show the distributions of numbers of collaborators in the movie actor and physics graphs, and again the match between theory and real data is good, but not as good as with the Fortune 1000.

The figures for clustering and mean numbers of collaborators are particularly revealing. The former is uniformly about twice as high in real life as our model predicts for the actor and scientist networks. This shows that there is a significant tendency to clustering in these networks, in addition to the trivial clustering one expects on account of the bipartite structure. This may indicate, for example, that scientists tend to introduce pairs of their collaborators to one another, thereby encouraging clusters of collaboration. The figures for average numbers of collaborators show less deviation from theory than the clustering coefficients, but nonetheless there is a clear tendency for the numbers of collaborators to be smaller in the real-world data than in the models. This probably indicates that scientists and actors collaborate repeatedly with the same people, thereby reducing their total number of collaborators below the number that would naively be expected if we consider only the numbers of papers that they write or movies they appear in. It would certainly be possible to take effects such as these into account in a more sophisticated model of collaboration practices.

B. The world-wide web

In this section we consider the application of our theory of random directed graphs to the modeling of the world-wide web. As we pointed out in Section III A, it is not at present possible to make a very accurate random-graph model of the web, because to do so we need to know the joint distribution $p_{jk}$ of in- and out-degrees of vertices, which has not to our knowledge been measured. However, we can make a simple model of the web by assuming in- and out-degree to be independently distributed according to their known distributions. Equivalently, we assume that the joint probability distribution factors according to $p_{jk} = p_j q_k$.

Broder et al. [26] give results showing that the in- and out-degree distributions of the web are approximately power-law in form with exponents $\tau_{\rm in} = 2.1$ and $\tau_{\rm out} = 2.7$, although there is some deviation from the perfect power law for small degree. In Fig. 11, we show histograms of their data with bins chosen to be of uniform width on the logarithmic scales used. (This avoids certain systematic errors known to afflict linearly histogrammed data plotted on log scales.) We find both distributions to be well-fitted by the form

$$p_k = C (k + k_0)^{-\tau}, \qquad (91)$$

where the constant $C$ is fixed by the requirement of normalization and takes the value $1/\zeta(\tau, k_0)$, with $\zeta(x, y)$ the generalized $\zeta$-function [46]. The constants $k_0$ and $\tau$ are found by least-squares fits, giving values of 0.58 and 3.94 for $k_0$, and 2.17 and 2.69 for $\tau$, for the in- and out-degree distributions respectively, in reasonable agreement with the fits performed by Broder et al. With these choices, the data and Eq. (91) match closely (see Fig. 11 again).

FIG. 11. The probability distribution of in-degree (left panel) and out-degree (right panel) on the world-wide web, rebinned from the data of Broder et al. [26]. The solid lines are best fits of form (91).
[Axes: in-degree and out-degree (horizontal, logarithmic, 1 to 1000), number of pages (vertical, logarithmic).]

Neither the raw data nor our fits to them satisfy the constraint (59), that the total number of links leaving pages should equal the total number arriving at them. This is because the data set is not a complete picture of the web. Only about 200 million of the web’s one billion or so pages were included in the study. Within this subset, our estimate of the distribution of out-degree is presumably quite accurate, but many of the outgoing links will not connect to other pages within the subset studied. At the same time, no incoming links which originate outside the subset of pages studied are included, because the data are derived from “crawls” in which web pages are found by following links from one to another. In such a crawl one only finds links by finding the pages that they originate from. Thus our data for the incoming links are quite incomplete, and we would expect the total number of incoming links in the dataset to fall short of the number of outgoing ones. This indeed is what we see. The totals for incoming and outgoing links are approximately $2.3 \times 10^8$ and $1.1 \times 10^9$.

The incompleteness of the data for incoming links limits the information we can at present extract from a random graph model of the web. There are however some calculations which only depend on the out-degree distribution.

Given Eq. (91), the generating functions for the out-degree distribution take the form

$$G_0(x) = G_1(x) = \frac{\Phi(x, \tau, k_0)}{\zeta(\tau, k_0)}, \qquad (92)$$

where $\Phi(x, y, z)$ is the Lerch $\Phi$-function [46]. The corresponding generating functions $F_0$ and $F_1$ we cannot calculate accurately because of the incompleteness of the data. The equality $G_0 = G_1$ (and also $F_0 = F_1$) is a general property of all directed graphs for which $p_{jk} = p_j q_k$ as above. It arises because in such graphs in- and out-degree are uncorrelated, and therefore the distribution of the out-degree of a vertex does not depend on whether you arrived at it by choosing a vertex at random, or by following a randomly chosen edge.

One property of the web which we can estimate from the generating functions for out-degree alone is the fraction $S_{\rm in}$ of the graph taken up by the giant strongly connected component plus those sites from which the giant strongly connected component can be reached. This is given by

$$S_{\rm in} = 1 - G_0(1 - S_{\rm in}). \qquad (93)$$

In other words, $1 - S_{\rm in}$ is a fixed point of $G_0(x)$. Using the measured values of $k_0$ and $\tau$, we find by numerical iteration that $S_{\rm in} = 0.527$, or about 53%. The direct measurements of the web made by Broder et al. show that in fact about 49% of the web falls in $S_{\rm in}$, in reasonable agreement with our calculation. Possibly this implies that the structure of the web is close to that of a directed random graph with a power-law degree distribution, though it is possible also that it is merely coincidence. Other comparisons between random graph models and the web will have to wait until we have more accurate data on the joint distribution $p_{jk}$ of in- and out-degree.

VI. CONCLUSIONS

In this paper we have studied in detail the theory of random graphs with arbitrary distributions of vertex degree, including directed and bipartite graphs. We have shown how, using the mathematics of generating functions, one can calculate exactly many of the statistical properties of such graphs in the limit of large numbers of vertices. Among other things, we have given explicit formulas for the position of the phase transition at which a giant component forms, the size of the giant component, the average and distribution of the sizes of the other components, the average numbers of vertices a certain distance from a given vertex, the clustering coefficient, and the typical vertex–vertex distance on a graph. We have given examples of the application of our theory to the modeling of collaboration graphs, which are inherently bipartite, and the world-wide web, which is directed. We have shown that the random graph theory gives good order-of-magnitude estimates of the properties of known collaboration graphs of business-people, scientists and movie actors, although there are measurable differences between theory and data which point to the presence of interesting sociological effects in these networks. For the web we are limited in what calculations we can perform because of the lack of appropriate data to determine the generating functions. However, the calculations we can perform agree well with empirical results, offering some hope that the theory will prove useful once more complete data become available.

ACKNOWLEDGEMENTS

The authors would like to thank Lada Adamic, Andrei Broder, Jon Kleinberg, and Cris Moore for useful comments and suggestions, and Jerry Davis, Paul Ginsparg, Oleg Khovayko, David Lipman, Grigoriy Starchenko, and Janet Wiener for supplying data used in this study. This work was funded in part by the National Science Foundation, the Army Research Office, the Electric Power Research Institute, and Intel Corporation.

[1] B. Bollobás, Random Graphs, Academic Press, New York (1985).
[2] P. Erdős and A. Rényi, “On random graphs,” Publicationes Mathematicae 6, 290–297 (1959).
[3] P. Erdős and A. Rényi, “On the evolution of random graphs,” Publications of the Mathematical Institute of the Hungarian Academy of Sciences 5, 17–61 (1960).
[4] P. Erdős and A. Rényi, “On the strength of connectedness of a random graph,” Acta Mathematica Scientia Hungary 12, 261–267 (1961).
[5] L. Sattenspiel and C. P. Simon, “The spread and persistence of infectious diseases in structured populations,” Mathematical Biosciences 90, 367–383 (1988).
[6] R. M. Anderson and R. M. May, “Susceptible–infectious–recovered epidemic models with dynamic partnerships,” Journal of Mathematical Biology 33, 661–675 (1995).
[7] M. Kretschmar and M. Morris, “Measures of concurrency in networks and the spread of infectious disease,” Mathematical Biosciences 133, 165–195 (1996).
[8] D. D. Heckathorn, “Respondent-driven sampling: A new approach to the study of hidden populations,” Social Problems 44, 174–199 (1997).
[9] C. C. Foster, A. Rapoport, and C. J. Orwant, “A study of a large sociogram: Elimination of free parameters,” Behavioural Science 8, 56–65 (1963).
[10] T. J. Fararo and M. Sunshine, A study of a biased friendship network, Syracuse University Press, Syracuse, NY (1964).

[11] H. R. Bernard, P. D. Kilworth, M. J. Evans, C. McCarty, and G. A. Selley, “Studying social relations cross-culturally,” Ethnology 2, 155–179 (1988).
[12] J. Abello, A. Buchsbaum, and J. Westbrook, “A functional approach to external graph algorithms,” in Proceedings of the 6th European Symposium on Algorithms (2000).
[13] W. Aiello, F. Chung, and L. Lu, “A random graph model for massive graphs,” in Proceedings of the 32nd Annual ACM Symposium on Theory of Computing (2000).
[14] L. A. N. Amaral, A. Scala, M. Barthélémy, and H. E. Stanley, “Classes of small-world networks,” Proc. Natl. Acad. Sci. USA 97, 11149–11152 (2000).
[15] D. J. Watts and S. H. Strogatz, “Collective dynamics of ‘small-world’ networks,” Nature 393, 440–442 (1998).
[16] S. Jespersen, I. M. Sokolov, and A. Blumen, “Small-world Rouse networks as models of cross-linked polymers,” J. Chem. Phys. 113, 7652–7655 (2000).
[17] A. Scala, L. A. N. Amaral, and M. Barthélémy, “Small-world networks and the conformation space of a lattice polymer chain,” cond-mat/0004380.
[18] D. Fell and A. Wagner, “The small world of metabolism,” Nature Biotechnology 18, 1121–1122 (2000).
[19] H. Jeong, B. Tombor, R. Albert, Z. N. Oltvai, and A.-L. Barabási, “The large-scale organization of metabolic networks,” Nature 407, 651–654 (2000).
[20] R. J. Williams and N. D. Martinez, “Simple rules yield complex food webs,” Nature 404, 180–183 (2000).
[21] J. M. Montoya and R. V. Solé, “Small world patterns in food webs,” cond-mat/0011195.
[22] A.-L. Barabási and R. Albert, “Emergence of scaling in random networks,” Science 286, 509–512 (1999).
[23] R. Albert, H. Jeong, and A.-L. Barabási, “Diameter of the world-wide web,” Nature 401, 130–131 (1999).
[24] B. A. Huberman and L. A. Adamic, “Growth dynamics of the world-wide web,” Nature 401, 131 (1999).
[25] J. M. Kleinberg, S. R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins, “The web as a graph: Measurements, models, and methods,” in Lecture Notes in Computer Science, No. 1627, T. Asano, H. Imai, D. T. Lee, S.-I. Nakano, and T. Tokuyama (eds.), Springer-Verlag, Berlin (1999).
[26] A. Broder, R. Kumar, F. Maghoul, P. Raghavan, S. Rajagopalan, R. Stata, A. Tomkins, and J. Wiener, “Graph structure in the web,” Computer Networks 33, 309–320 (2000).
[27] M. Faloutsos, P. Faloutsos, and C. Faloutsos, “On power-law relationships of the internet topology,” Comp. Comm. Rev. 29, 251–262 (1999).
[28] P. Mariolis, “Interlocking directorates and control of corporations,” Social Science Quarterly 56, 425–439 (1975).
[29] G. F. Davis, “The significance of board interlocks for corporate governance,” Corporate Governance 4, 154–159 (1996).
[30] G. F. Davis and H. R. Greve, “Corporate elite networks and governance changes in the 1980s,” American Journal of Sociology 103, 1–37 (1997).
[31] G. F. Davis, M. Yoo, and W. E. Baker, “The small world of the corporate elite,” preprint, University of Michigan Business School (2001).
[32] B. Kogut and G. Walker, “The small world of firm ownership in Germany: Social capital and structural holes in large firm acquisitions–1993-1997,” Working paper, Reginald H. Jones Center, Wharton School (1999).
[33] J. W. Grossman and P. D. F. Ion, “On a portion of the well-known collaboration graph,” Congressus Numerantium 108, 129–131 (1995).
[34] R. De Castro and J. W. Grossman, “Famous trails to Paul Erdős,” Mathematical Intelligencer 21, 51–63 (1999).
[35] V. Batagelj and A. Mrvar, “Some analyses of Erdős collaboration graph,” Social Networks 22, 173–186 (2000).
[36] M. E. J. Newman, “The structure of scientific collaboration networks,” Proc. Natl. Acad. Sci. USA 98, 409–415 (2001).
[37] M. E. J. Newman, “A study of scientific collaboration networks: I. Network construction and fundamental results,” Phys. Rev. E, in press; “A study of scientific collaboration networks: II. Shortest paths, weighted networks, and centrality,” Phys. Rev. E, in press.
[38] S. Wasserman and K. Faust, Social Network Analysis, Cambridge University Press, Cambridge (1994).
[39] M. Molloy and B. Reed, “A critical point for random graphs with a given degree sequence,” Random Structures and Algorithms 6, 161–179 (1995).
[40] M. Molloy and B. Reed, “The size of the giant component of a random graph with a given degree sequence,” Combinatorics, Probability and Computing 7, 295–305 (1998).
[41] H. S. Wilf, Generatingfunctionology, 2nd Edition, Academic Press, London (1994).
[42] C. Moore and M. E. J. Newman, “Exact solution of site and bond percolation on small-world networks,” Phys. Rev. E 62, 7059–7064 (2000).
[43] G. H. Hardy and J. E. Littlewood, “Tauberian theorems concerning power series and Dirichlet’s series whose coefficients are positive,” Proc. London Math. Soc. 13, 174–191 (1914).
[44] M. E. J. Newman and G. T. Barkema, Monte Carlo Methods in Statistical Physics, Oxford University Press, Oxford (1999).
[45] Note that one must use log(1 − r) in this expression, and not log r, even though one might expect the two to give the same result. The reason is that r can be zero where 1 − r cannot. With the standard 32-bit random number generators used on most computers, r will be zero about once in every 4 billion calls, and when it is, calculating log r will give an error but log(1 − r) will not. For simulations on large graphs of a few million vertices or more this will happen with some frequency, and calculation of log r should therefore be avoided.
[46] M. Abramowitz and I. Stegun, Handbook of Mathematical Functions, Dover, New York (1965).
[47] B. Mintz and M. Schwartz, The Power Structure of American Business, University of Chicago Press, Chicago (1985).
[48] G. F. Davis and M. S. Mizruchi, “The money center cannot hold: Commercial banks in the U.S. system of corporate governance,” Administrative Science Quarterly 44, 215–239 (1999).
[49] The figures given in our table differ from those given by Watts and Strogatz in Ref. [15] because we use a more recent version of the actor database. Our version dates from May 1, 2000 and contains about 450 000 actors, whereas the 1998 version contained only about 225 000.
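Supplementary numerical note on Eq. (93). The fixed-point calculation described in Section V B can be reproduced approximately with a short script. This is only a sketch under stated assumptions: instead of library routines for the Lerch Φ-function and the generalized ζ-function, it sums the series for the out-degree distribution of Eq. (91) directly, with a simple integral correction for the tail of the normalization sum; the function names, tolerances, and starting value are illustrative choices, not taken from the paper.

```python
def hurwitz_zeta(s, a, n_terms=10000):
    """Approximate zeta(s, a) = sum_{k>=0} (k + a)^(-s), for s > 1 and a > 0.

    The first n_terms are summed directly; the remainder is estimated by the
    leading Euler-Maclaurin terms (integral plus half of the first tail term).
    """
    head = sum((k + a) ** -s for k in range(n_terms))
    b = n_terms + a
    tail = b ** (1.0 - s) / (s - 1.0) + 0.5 * b ** -s
    return head + tail


def g0(x, tau, k0, norm, tol=1e-15, max_terms=1000000):
    """G0(x) = sum_k p_k x^k with p_k = (k + k0)^(-tau) / norm, for 0 <= x <= 1."""
    total, power = 0.0, 1.0
    for k in range(max_terms):
        term = power * (k + k0) ** -tau
        total += term
        if term < tol:  # remaining terms are negligible
            break
        power *= x
    return total / norm


def solve_s_in(tau, k0, s=0.5, tol=1e-12, max_iter=10000):
    """Iterate S -> 1 - G0(1 - S), i.e. Eq. (93), until successive values agree."""
    norm = hurwitz_zeta(tau, k0)
    for _ in range(max_iter):
        s_new = 1.0 - g0(1.0 - s, tau, k0, norm)
        if abs(s_new - s) < tol:
            return s_new
        s = s_new
    raise RuntimeError("fixed-point iteration did not converge")


# Fitted out-degree parameters quoted in the text: tau = 2.69, k0 = 3.94.
s_in = solve_s_in(tau=2.69, k0=3.94)
print(f"S_in ~ {s_in:.3f}")
```

The iteration starts from S = 0.5 rather than S = 0, because S = 0 is always a trivial solution of Eq. (93) (G0(1) = 1) and the iteration would stay there.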
