Random Graph Theory
I RANDOM WALKS
1.1 The plot of the mean-squared displacement < R 2 > versus Time, for an
one dimensional system.
1.2 The plot of the mean value of the displacement < R > versus Time, for
an one dimensional system.
1.3 The plot of the distribution of R for N= 500 steps and N=1000 steps.
1.4 Trapping on random walks in two dimensional lattice.
1.5 The distribution of trapping times.
1.6 The survival probability.
1.7 Simulations for calculating S N .
1.8 The plot of S N versus Time for 1-D.
1.9 The plot of S N versus Time for 2-D.
1.10 The plot of S N versus Time for 3-D.
1.11 The plot of the distribution of S N for 1-D.
1.12 The plot of the distribution of S N for 2-D.
1.13 The plot of the distribution of S N for 3-D.
1.14 The plot of the relation of < S N > versus Time for 1-D.
1.15 The plot of the relation of < S N > versus Time for 2-D.
1.16 The plot of the relation of < S N > versus Time for 3-D.
II NETWORKS
2.1 What are networks?
2.1.1 Basic notions
2.1.2 Adjacency matrix
2.1.3 The adjacency matrix for a random network
2.1.4 Degree distribution
2.2 Random networks, the Erdős-Rényi model
2.2.1 The plot of the distribution of random networks
2.3 Scale-free networks
2.3.1 Characteristics of scale-free networks for g = 2.0, 2.5,3.0
2.3.2 Plot of the distribution of scale-free networks for g = 2.0
2.3.3 Plot of the distribution of scale-free networks for g = 2.5
2.3.4 Plot of the distribution of scale-free networks for g = 3.0
2.3.5 The algorithm of constructing the scale-free network
2.3.6 The Breadth-First search for finding the greatest cluster for the scale-free networks
2.4 The Barabasi-Albert model
2.4.1 The plot of the distribution of B-A model
2.5 Real networks
2.5.1 Internet robustness
2.5.2 The plot of distribution of the Internet
2.6 Resilience of scale-free networks and internet as a special case, to
random breakdowns
2.6.1 The theory of resilience to random failures
2.6.2 The plot of k versus P of scale-free networks under random attack
2.6.3 The plot of k versus P of scale-free networks and internet under
random attack
2.6.4 The plot of k versus Pc of scale-free networks under random attack
2.7 Breakdown of scale-free networks and internet as a special case, under
intentional attack
2.7.1 The plot of k versus P of scale-free networks under intentional attack
2.7.2 The plot of k versus Pc of scale-free networks under intentional
attack
2.7.3 The plot of k versus Pc of scale-free networks under random attack
and intentional attack
III REFERENCES
Random Walks
A one-dimensional random walk can also be viewed as a Markov chain whose state space
is the integers i = 0, ±1, ±2, …, with transition probabilities P(i, i+1) = p and
P(i, i−1) = 1 − p for some probability p, 0 < p < 1. We call it a random walk because
we may think of it as a model for an individual walking on a straight line who at each
point of time either takes one step to the right with probability p or one step to the
left with probability 1 − p.
A random walk is a stochastic process over a discrete domain (e.g. time). It can be
viewed as the running sum of a Bernoulli process. The simplest random walk is a path
constructed by taking, at each time step, a single unit step to the left or to the right.
Suppose we draw a line some distance from the origin of the walk. How many times
will the random walk cross the line? The following, perhaps surprising, theorem is the
answer: for any random walk in one dimension, every point in the domain will be
crossed an infinite number of times almost surely. This problem has many names: the
level-crossing problem, the recurrence problem or the gambler’s ruin problem. The
source of the last name is as follows: if you are a gambler with a finite amount of
money playing a fair game against a bank with an infinite amount of money, you will
almost surely lose. The amount of money you have will perform a random walk, it will
almost surely reach 0 at some time, and the game will be over.
Higher dimensions
Imagine now a drunkard walking around in the city. The city is infinite and arranged
in a square grid, and at every corner he chooses one of the four possible routes
(including the one he has come from) with equal probability. Formally, this is a
random walk on the set of all points in the plane with integer coordinates. Will the
drunkard ever get back to his home from the bar? It turns out that he will (almost
surely). This is the high dimensional equivalent of the level crossing problem
discussed above. However, the similarity stops here. In three dimensions and above,
this no longer holds: a drunk bird might forever wander around, never finding its
nest. The formal term for this phenomenon is that the random walk is recurrent in
dimensions 1 and 2 and transient in dimension 3 and above. This was proved by
Pólya in 1921.
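Pólya's dichotomy can be checked numerically. The sketch below is a minimal Monte Carlo estimate (the walk length, number of walks and seed are arbitrary choices, not taken from the original study) of the probability that a simple random walk returns to the origin within a fixed number of steps; the 2-D estimate comes out clearly larger than the 3-D one.

```python
import random

def returns_to_origin(dim, n_steps, rng):
    """Simulate one simple random walk on Z^dim; True if it revisits the origin."""
    pos = [0] * dim
    for _ in range(n_steps):
        axis = rng.randrange(dim)            # pick one coordinate axis
        pos[axis] += rng.choice((-1, 1))     # step left or right along it
        if all(c == 0 for c in pos):
            return True
    return False

def return_probability(dim, n_steps=200, n_walks=2000, seed=1):
    rng = random.Random(seed)
    hits = sum(returns_to_origin(dim, n_steps, rng) for _ in range(n_walks))
    return hits / n_walks

p2 = return_probability(2)   # recurrent: grows toward 1 as n_steps grows
p3 = return_probability(3)   # transient: stays bounded away from 1
print(p2, p3)
```

With longer and longer walks the 2-D estimate keeps climbing toward 1, while the 3-D one saturates, which is exactly the recurrent/transient distinction.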
1.1 The plot of the mean-squared displacement ( < R 2 > ) versus Time
One of the most basic quantities in random walk theory is the mean-squared
displacement <R^2(n)> of a particle diffusing in a given space, which is a measure of
the distance R covered by a random walker after performing n steps. In most cases,
this quantity is described by an expression of the form <R^2(n)> ~ n^a. The value of the
parameter a classifies the type of diffusion into normal linear diffusion (a = 1),
subdiffusion (a < 1), or superlinear diffusion (a > 1).
The algorithm for constructing the plot of <R^2> versus time is simple: perform a large
number of independent walks, record R^2 after each step n, and average over the walks.
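A minimal sketch of such a procedure (the numbers of steps and walks are arbitrary choices):

```python
import random

def msd_curve(n_steps=200, n_walks=5000, seed=2):
    """Mean-squared displacement <R^2(n)> of a 1-D random walker, averaged over walks."""
    rng = random.Random(seed)
    msd = [0.0] * (n_steps + 1)
    for _ in range(n_walks):
        r = 0
        for n in range(1, n_steps + 1):
            r += rng.choice((-1, 1))   # one unit step left or right
            msd[n] += r * r
    return [m / n_walks for m in msd]

msd = msd_curve()
# for the simple 1-D walk <R^2(n)> = n in expectation: normal diffusion, a = 1
print(msd[100])
```

Fitting log <R^2> against log n (or a linear fit of <R^2> against n, as in the plot below) recovers the diffusion exponent a.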
[Figure: <R^2> versus time for a one-dimensional system, with a linear fit.]
1.2 The plot of the mean value of the displacement ( < R > ) versus Time
[Figure: <R> versus time for a one-dimensional system.]
1.3 The plot of the distribution of R for 500 steps and 1000 steps
[Figure: the distribution P(R) for N = 500 steps and N = 1000 steps.]
1.4 Trapping on random walks in a two dimensional lattice
For the random walk in two dimensions, recall the picture of the drunkard walking
around an infinite city arranged in a square grid: this is a random walk on the set of
all points in the plane with integer coordinates.
We will perform a random walk on a two dimensional lattice, on which we have placed
trap molecules at random positions with concentration c. We place one particle at a
random position on the lattice and let it perform a random walk. The walk stops when
the particle falls into a trap; the time needed for this to happen is the trapping
time. When the particle reaches the border of the lattice it does not escape, but is
placed on the opposite side of the lattice (periodic boundary conditions). We perform
a large number of walks (runs), save the trapping times, and build the distribution of
these times.
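A sketch of this experiment (the lattice size, trap concentration and number of runs are arbitrary illustration values; the traps are placed once and reused for all runs):

```python
import random

def trapping_time(L, traps, rng, max_steps=10000):
    """Walk on an L x L periodic lattice until a trap site is hit; return the step count."""
    x, y = rng.randrange(L), rng.randrange(L)
    for t in range(1, max_steps + 1):
        dx, dy = rng.choice(((1, 0), (-1, 0), (0, 1), (0, -1)))
        x, y = (x + dx) % L, (y + dy) % L   # periodic boundaries: re-enter opposite side
        if (x, y) in traps:
            return t
    return max_steps

def trapping_distribution(L=50, c=0.05, n_runs=2000, seed=3):
    rng = random.Random(seed)
    traps = set()
    while len(traps) < int(c * L * L):      # traps at random positions, concentration c
        traps.add((rng.randrange(L), rng.randrange(L)))
    times = [trapping_time(L, traps, rng) for _ in range(n_runs)]
    hist = {}
    for t in times:
        hist[t] = hist.get(t, 0) + 1
    return {t: n / n_runs for t, n in sorted(hist.items())}

dist = trapping_distribution()
```

The returned dictionary is the normalized distribution of trapping times plotted below.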
[Figure: the distribution of trapping times, P(trapping time) versus trapping time.]
The main property monitored during the process introduced in the previous section is
the survival probability, which denotes the probability that a particle A survives after
performing n steps in a space which includes traps B with a concentration c.
In the following lines we describe the way of calculating the survival probability
of random walks that move in the presence of randomly distributed traps. We calculate
the survival probability from the distribution of the trapping times.
If the probability that a particle has been trapped by time t is the integral
∫_0^t p(x) dx, where p(x) is the distribution of the trapping times, then the survival
probability, i.e. the probability that the particle is still alive at time t, is

Φ(t) = 1 − ∫_0^t p(x) dx.

The following plot shows the survival probability for the problem described in the
previous paragraph, for which we found the distribution of the trapping times.
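Numerically, the survival probability is just one minus the cumulative sum of the normalized trapping-time distribution; a sketch with a toy distribution (the numbers here are purely illustrative, not simulation results):

```python
def survival_probability(trap_dist):
    """Phi(t) = 1 - sum_{t' <= t} p(t'): probability of not yet being trapped at time t."""
    phi, cum = {}, 0.0
    for t in sorted(trap_dist):
        cum += trap_dist[t]       # running value of the integral of p up to t
        phi[t] = 1.0 - cum
    return phi

# toy trapping-time distribution (hypothetical values, for illustration only)
toy = {1: 0.5, 2: 0.3, 3: 0.2}
phi = survival_probability(toy)
```

Applied to the measured distribution of the previous section, this produces the monotonically decreasing curve shown in the plot.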
[Figure: the survival probability versus time.]
In the following graph the distribution of the trapping time and the survival
probability are presented in the same plot.
[Figure: the trapping-time distribution and the survival probability in the same plot.]
The behavior of a random walk is also characterized by the coverage of the space, as
expressed by the average number of distinct sites visited, <S_N>, after N steps.
S_N is the number of distinct sites visited by a single random walker. We will examine
the behavior of S_N on one, two and three dimensional lattices, using Monte Carlo
simulation techniques to model a single random walk on spatially anisotropic lattice
structures.
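A sketch of such a Monte Carlo estimate of <S_N> (the step and walk counts are arbitrary); for 1-D the estimate can be compared against the Montroll-Weiss asymptotic law (8N/π)^(1/2):

```python
import math
import random

def distinct_sites(n_steps, dim, rng):
    """Count distinct lattice sites visited by one n-step simple random walk."""
    pos = (0,) * dim
    visited = {pos}
    for _ in range(n_steps):
        axis = rng.randrange(dim)
        pos = pos[:axis] + (pos[axis] + rng.choice((-1, 1)),) + pos[axis + 1:]
        visited.add(pos)
    return len(visited)

def mean_SN(n_steps=1000, dim=1, n_walks=2000, seed=4):
    rng = random.Random(seed)
    return sum(distinct_sites(n_steps, dim, rng) for _ in range(n_walks)) / n_walks

s1 = mean_SN(dim=1)
print(s1, math.sqrt(8 * 1000 / math.pi))   # simulation vs asymptotic 1-D value
```

Repeating with dim=2 and dim=3 reproduces the qualitative behavior described next: almost linear with a logarithmic correction in 2-D, and linear in 3-D.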
The analytic expression for S_N in the asymptotic limit N → ∞, where N is
the number of steps, has been given by Montroll and Weiss for all three
dimensionalities. In 1-D, S_N follows an N^(1/2) power law, in 3-D it is linear in N,
and in 2-D it is “almost” linear, with an additional logarithmic term:

1-D:  S_N ~ (8N/π)^(1/2),   N → ∞   (1)
2-D:  S_N ~ πN / log N,     N → ∞   (2)
3-D:  S_N ~ N,              N → ∞   (3)
These are the asymptotic equations. Correction terms, which add accuracy to the early
time behavior, have also been derived. We use the analytical expressions (1), (2) and
(3) with their associated correction terms:

1-D:  S_N = (8N/π)^(1/2) [1 + 1/(4N) − 3/(64N^2) + …]   (4)

2-D:  S_N = (AN / ln(BN)) Σ_{j=0}^{d} (ln BN)^(−j) [1 + O(N^(−b))],  b = 2   (5)

3-D:  S_N = aN + bN^(1/2) + …   (6)

The analytical solution, which in the plots corresponds to the continuous line, is
found using equations (4), (5) and (6).
1.8 The plot of S_N versus Time for 1-D
[Figure: simulation results and the analytic solution for S_N versus time on a one dimensional lattice.]
1.9 The plot of S_N versus Time for 2-D
[Figure: simulation results and the analytic solution for S_N versus time on a two dimensional lattice.]
1.10 The plot of S N versus Time for 3-D
[Figure: simulation results and the analytic solution for S_N versus time on a three dimensional lattice.]
1.11 The plot of the distribution of S_N for 1-D
[Figure: the frequency distribution of S_N for the one dimensional lattice.]
1.12 The plot of the distribution of S_N for 2-D
[Figure: the frequency distribution of S_N for the two dimensional lattice.]
1.13 The plot of the distribution of S_N for 3-D
[Figure: the frequency distribution of S_N for the three dimensional lattice.]
1.14 The plot of the relation of < S N > versus Time for 1-D
[Figure: <S_N> versus time for a one dimensional lattice; LAT[100000], NRUN = 100000.]
1.15 The plot of the relation of < S N > versus Time for 2-D
[Figure: <S_N> versus time for a two dimensional lattice.]
1.16 The plot of the relation of < S N > versus Time for 3-D
[Figure: <S_N> versus time for a three dimensional lattice; LAT[50][50][50], NRUN = 10000.]
II NETWORKS
2.1 What are networks?
Networks with undirected edges are called undirected networks; networks with
directed edges are directed networks. This study concerns undirected networks.
The total number of connections of a vertex is called its degree k. In general
terms, random networks are networks with a disordered arrangement of edges. Note
that, usually, in graph theory the meaning of a random graph is much narrower. A
random network means an ensemble of networks. In principle, random networks may
contain vertices of fixed degree. Usually, however, the degrees of the vertices are
statistically distributed.
2.1.2 Adjacency matrix
The adjacency matrix of a network provides a complete description of it: it indicates
which of the vertices are connected (adjacent). This is a square N × N matrix, where N
is the total number of vertices in the network. Its element a_ij is equal to 1 if there
is an edge that connects vertex i with vertex j, and 0 otherwise. The adjacency matrix
of an undirected network is symmetric, and its diagonal elements are equal to zero:
a_ii = 0.
[Figure: the adjacency matrix of a random network with 20 nodes, in which two nodes
are linked with probability p = 1/6.]
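Such a matrix can be generated directly; a minimal sketch for the same parameters (N = 20, p = 1/6):

```python
import random

def random_adjacency(N=20, p=1/6, seed=5):
    """Adjacency matrix of an undirected random network: each pair linked with prob p."""
    rng = random.Random(seed)
    a = [[0] * N for _ in range(N)]
    for i in range(N):
        for j in range(i + 1, N):        # each unordered pair considered once
            if rng.random() < p:
                a[i][j] = a[j][i] = 1    # undirected: the matrix is symmetric
    return a

A = random_adjacency()
```

By construction the diagonal stays zero and the matrix is symmetric, matching the properties stated above.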
2.1.4 Degree distribution
(1) Poisson distribution:

P(k) = e^(−<k>) <k>^k / k!

The classical random graph asymptotically has just this degree distribution, if its
number of vertices approaches infinity under the constraint that the mean degree is
fixed.
(2) Exponential distribution P(k) = a e^(−k/<k>). For instance, this is the
distribution of the trapping times.
Here we list the main structural characteristics of networks. Some of them have been
introduced, the others will be necessary in what follows.
2.2 Random networks, the Erdős-Rényi model
A particularly rich source of ideas has been the study of random graphs, graphs in
which the edges are distributed randomly. The theory of random graphs was
introduced by Paul Erdős and Alfréd Rényi.
According to the Erdős-Rényi model, we start with N nodes (a fixed number) and
connect every randomly selected pair of nodes with probability p, creating a graph
with approximately pN(N − 1)/2 randomly distributed edges. The distribution below
is obtained for a network with 100000 nodes where two nodes are connected with
probability p = 1/6.
As we can see from the plot, the random network has a Poisson degree distribution.
This result agrees with what we said in paragraph 2.1.4.
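The agreement with the Poisson law can be checked on a smaller example (here N = 2000 and p = 0.005, so <k> ≈ 10, instead of the much larger network of the figure):

```python
import math
import random

def er_degree_sequence(N=2000, p=0.005, seed=6):
    """Degree sequence of an Erdos-Renyi graph G(N, p)."""
    rng = random.Random(seed)
    deg = [0] * N
    for i in range(N):
        for j in range(i + 1, N):
            if rng.random() < p:        # link the pair with probability p
                deg[i] += 1
                deg[j] += 1
    return deg

deg = er_degree_sequence()
mean_k = sum(deg) / len(deg)
# Poisson prediction for the fraction of nodes with degree k
poisson = lambda k: math.exp(-mean_k) * mean_k**k / math.factorial(k)
frac_10 = deg.count(10) / len(deg)
print(mean_k, frac_10, poisson(10))
```

The empirical fraction of degree-10 nodes lands close to the Poisson value, as the plot for the larger network also shows.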
[Figure: the distribution P(k) for a random network with N_NODE = 100000 nodes and
p = 1/6; Mean Value = 16666.69062.]
2.3 Scale-free networks
The term “scale-free” refers to any functional form f(x) that remains unchanged to
within a multiplicative factor under a rescaling of the independent variable x. In effect
this means power-law forms, since these are the only solutions to f(ax) = bf(x), and
hence “power-law” and “scale-free” are, for our purposes, synonymous.
Here we derive the formula for assigning the degree k of a vertex, for a power-law
distribution. Let x be uniformly distributed on [0, 1]; then P_x(x) dx = P_k(k) dk,
and since P_x(x) = 1 it follows that dx = P_k(k) dk, so

x = ∫_{k_min}^{k} P_k(k') dk'.

Replacing P_k(k') = C (k')^(−γ), we obtain

x = C ∫_{k_min}^{k} (k')^(−γ) dk' = C (k')^(−γ+1)/(−γ+1) |_{k_min}^{k}
  = C/(1 − γ) [k^(1−γ) − k_min^(1−γ)],  for γ ≠ 1.

So

k^(1−γ) = (1 − γ) x / C + k_min^(1−γ).   (*)

Setting x = 1 gives the largest degree:

k_max^(1−γ) = (1 − γ)/C + k_min^(1−γ).

Raising (*) to the power 1/(1 − γ) we receive

k = [ (k_max^(1−γ) − k_min^(1−γ)) x + k_min^(1−γ) ]^(1/(1−γ)).

This equation will be used widely for finding the number of connections for each
node of a scale-free network.
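The closed form translates directly into an inverse-transform sampler; a sketch (the values of γ, k_min and k_max here are arbitrary illustration choices):

```python
import random

def power_law_degree(gamma, k_min, k_max, rng):
    """Inverse-transform sampling: map a uniform x in [0,1) to a power-law degree."""
    x = rng.random()
    e = 1.0 - gamma
    # k = [(k_max^(1-gamma) - k_min^(1-gamma)) x + k_min^(1-gamma)]^(1/(1-gamma))
    k = (x * (k_max**e - k_min**e) + k_min**e) ** (1.0 / e)
    return int(k)

rng = random.Random(7)
degrees = [power_law_degree(2.5, 1, 1000, rng) for _ in range(10000)]
```

Most sampled degrees are small while a few are very large, which is the heavy-tail signature of the power law.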
2.3.1 Characteristics of scale-free networks for γ = 2.0, 2.5, 3.0
In the following table are shown the values of the parameters <K> and <K^2>, where
K stands for the number of connections. To provide a heuristic characterization of the
level of heterogeneity of networks we define the parameter κ = <k^2>/<k>.
Indeed, fluctuations are measured by the normalized variance, which can be expressed
as κ/<k> − 1; scale-free networks are characterized by κ → ∞, whereas
homogeneous networks have κ ≈ <k>. For this reason, we will generally refer to all
networks with heterogeneity parameter κ ≫ <k> as scale-free networks. We shall see
in the following paragraphs that κ is a key parameter for all properties and physical
processes in networks which are affected by the degree fluctuations.
2.3.2 Plot of the distribution of scale-free networks for γ = 2.0
We obtain the following distribution plot (with logarithmic axes), where K is the
number of links and Frequency is the number of times each value of K is encountered.
[Figure: log-log plot of the degree distribution for γ = 2.0.]
2.3.3 Plot of the distribution of scale-free networks for γ = 2.5
[Figure: log-log plot of the degree distribution for γ = 2.5.]
2.3.4 Plot of the distribution of scale-free networks for γ = 3.0
[Figure: log-log plot of the degree distribution for γ = 3.0.]
From these plots we see that the network for γ = 2.0 is very dense, the network for
γ = 2.5 is less dense, and the network for γ = 3.0 is less dense still. Below we
present the distributions of the scale-free networks for γ = 2.0, 2.5, 3.0 in the
same plot.
[Figure: the three degree distributions (γ = 2.0, 2.5, 3.0) in the same log-log plot.]
2.3.5 The algorithm that constructs the scale-free network for
γ = 2.0, 2.5, 3.0 with N fixed nodes
1. Find the number of connections for each node using the equation
   k = [ (k_max^(1−γ) − k_min^(1−γ)) x + k_min^(1−γ) ]^(1/(1−γ)), in our case k_min = 1.
2. Find the total number of legs: T_legs = Σ_{i=0}^{N−1} legs(i), where legs(i) is the
   number of connections node i has.
3. If the total number of legs is odd, choose one node at random and add a leg to it,
   so that T_legs increases by one.
4. Find the number of links: N_links = T_legs / 2.
5. Randomly choose a link and then two nodes. Using the rejection method we give
   higher priority to nodes with larger ‘weights’, that is, nodes that have more
   connections. If we chose the nodes uniformly at random, we would face the problem
   that nodes with many connections would have most of them left unconnected.
6. Construct the network.
After the network construction we ignore the links which cannot be connected.
Below we present the scale-free network for γ = 2.5 in a linked-list format
(node, number of links, list of neighbors):
22 2 47 44
23 3 29 44 46
24 2 44 4
25 1 40
26 4 29 14 7 12
27 1 42
28 1 7
29 12 9 7 45 26 21 2 41 23 35 17
11 20
30 1 36
31 1 47
32 1 11
33 2 7 38
34 2 47 40
35 2 29 7
36 2 7 30
37 1 7
38 1 33
39 1 17
40 5 8 34 7 6 25
41 1 29
42 4 9 7 27 15
43 2 3 44
44 9 9 2 21 47 22 24 43 23 11
45 2 29 7
46 1 23
47 7 34 7 31 44 21 22 14
48 1 21
49 2 16 1
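The construction above can be sketched as follows. This simplified version pairs legs uniformly at random instead of using the rejection-method weighting of step 5, and it discards self-loops and duplicate links (the links which “cannot be connected”); the parameters are arbitrary illustration values:

```python
import random

def build_scale_free(N=500, gamma=2.5, seed=8):
    """Sketch: sample power-law leg counts, pair legs at random, drop bad links."""
    rng = random.Random(seed)
    k_min, k_max = 1, N - 1
    e = 1.0 - gamma
    # step 1: number of legs for each node, from the inverse-transform formula
    legs = [int((rng.random() * (k_max**e - k_min**e) + k_min**e) ** (1 / e))
            for _ in range(N)]
    if sum(legs) % 2 == 1:                 # step 3: make the total number of legs even
        legs[rng.randrange(N)] += 1
    stubs = [i for i, l in enumerate(legs) for _ in range(l)]
    rng.shuffle(stubs)                     # simplified step 5: uniform random pairing
    edges = set()
    for u, v in zip(stubs[::2], stubs[1::2]):
        if u != v:                         # ignore self-loops
            edges.add((min(u, v), max(u, v)))   # ignore duplicate links
    return legs, edges

legs, edges = build_scale_free()
```

Duplicate pairings collapse in the edge set, so the realized degree of a node can be slightly smaller than its assigned number of legs, mirroring the "ignored links" of the construction.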
2.3.6 The Breadth-First search for finding the greatest cluster for the scale-free networks
Breadth-first search is an algorithm for searching a graph. Given a graph G = (V, E)
and a distinguished source vertex s, breadth-first search systematically explores the
edges of G to “discover” every vertex that is reachable from s. It computes the
distance (smallest number of edges) from s to all such reachable vertices. It also
produces a “breadth-first tree” with root s that contains all reachable vertices. For
any vertex v reachable from s, the path in the breadth-first tree from s to v
corresponds to a “shortest path” from s to v in G, that is, a path containing the
fewest edges.
Breadth-first search is so named because it expands the frontier between discovered
and undiscovered vertices uniformly across the breadth of the frontier. That is, the
algorithm discovers all vertices at distance k from s before discovering any vertices
at distance k + 1.
To keep track of progress, breadth-first search colors each vertex white, gray, or
black. All vertices start out white and may later become gray and then black. A vertex
is discovered the first time it is encountered during the search, at which time it
becomes nonwhite. Gray and black vertices, therefore, have been discovered, but
breadth-first search distinguishes between them to ensure that the search proceeds in
a breadth-first manner. If (u, v) ∈ E and vertex u is black, then vertex v is either
gray or black; that is, all vertices adjacent to black vertices have been discovered.
Gray vertices may have some adjacent white vertices; they represent the frontier
between discovered and undiscovered vertices.
Breadth-first search constructs a breadth-first tree, initially containing only its
root, which is the source vertex s. Whenever a white vertex v is discovered in the
course of scanning the adjacency list of an already discovered vertex u, the vertex v
and the edge (u, v) are added to the tree. We say that u is the predecessor or parent
of v in the breadth-first tree. Since a vertex is discovered at most once, it has at
most one parent. Ancestor and descendant relationships in the breadth-first tree are
defined relative to the root s as usual: if u is on a path in the tree from the root s
to vertex v, then u is an ancestor of v and v is a descendant of u.
The breadth-first-search procedure BFS below assumes that the input graph G = (V, E)
is represented using adjacency lists. It maintains several additional data structures
with each vertex in the graph. The color of each vertex u ∈ V is stored in the
variable color[u], and the predecessor of u is stored in the variable P[u]. If u has no
predecessor (for example, if u = s or u has not been discovered), then P[u] = NIL. The
distance from the source s to vertex u computed by the algorithm is stored in d[u].
The algorithm also uses a first-in, first-out queue Q to manage the set of gray vertices.
BFS(G, s)
1   for each vertex u ∈ V[G] − {s}
2        do color[u] ← WHITE
3           d[u] ← ∞
4           P[u] ← NIL
5   color[s] ← GRAY
6   d[s] ← 0
7   P[s] ← NIL
8   Q ← {s}
9   while Q ≠ ∅
10       do u ← head[Q]
11          for each v ∈ Adj[u]
12              do if color[v] = WHITE
13                    then color[v] ← GRAY
14                         d[v] ← d[u] + 1
15                         P[v] ← u
16                         ENQUEUE(Q, v)
17          DEQUEUE(Q)
18          color[u] ← BLACK
The procedure BFS works as follows. Lines 1-4 paint every vertex white, set d[u] to
infinity for every vertex u, and set the parent of every vertex to be NIL. Line 5
paints the source vertex s gray, since it is considered to be discovered when the
procedure begins. Line 6 initializes d[s] to 0, and line 7 sets the predecessor of the
source to be NIL. Line 8 initializes Q to the queue containing just the vertex s;
thereafter, Q always contains the set of gray vertices.
The main loop of the program is contained in lines 9-18. The loop iterates as long as
there remain gray vertices, which are discovered vertices that have not yet had their
adjacency lists fully examined. Line 10 determines the gray vertex u at the head of the
queue Q. The for loop of lines 11-16 considers each vertex v in the adjacency list of u.
If v is white, then it has not yet been discovered, and the algorithm discovers it by
executing lines 13-16. It is first grayed, and its distance d[v] is set to d[u] + 1. Then,
u is recorded as its parent. Finally, it is placed at the tail of the queue Q. When all the
vertices on u’s adjacency list have been examined, u is removed from Q and
blackened in lines 17-18.
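The procedure, together with the greatest-cluster search it is used for in this work, can be written compactly; a sketch (the colors are kept to mirror the description above, though a simple visited set would suffice, and the small example graph is purely illustrative):

```python
from collections import deque

WHITE, GRAY, BLACK = 0, 1, 2

def bfs(adj, s):
    """Breadth-first search over adjacency lists; returns distances and parents."""
    color = {u: WHITE for u in adj}
    d = {u: float('inf') for u in adj}
    parent = {u: None for u in adj}          # NIL predecessor
    color[s], d[s] = GRAY, 0
    q = deque([s])
    while q:
        u = q[0]                             # gray vertex at the head of the queue
        for v in adj[u]:
            if color[v] == WHITE:            # discovered for the first time
                color[v] = GRAY
                d[v] = d[u] + 1
                parent[v] = u
                q.append(v)                  # placed at the tail of the queue
        q.popleft()
        color[u] = BLACK
    return d, parent

def greatest_cluster(adj):
    """Run BFS from every unvisited vertex; return the largest reachable component."""
    seen, best = set(), []
    for s in adj:
        if s not in seen:
            d, _ = bfs(adj, s)
            comp = [u for u, dist in d.items() if dist != float('inf')]
            seen.update(comp)
            if len(comp) > len(best):
                best = comp
    return best

adj = {0: [1], 1: [0, 2], 2: [1], 3: [4], 4: [3], 5: []}
print(len(greatest_cluster(adj)))  # → 3
```

Running `greatest_cluster` on the linked-list representation of a scale-free network gives the size of its largest connected component.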
Figure: The operation of BFS on an undirected graph. Tree edges are shown shaded as
they are produced by BFS. Within each vertex u is shown d[u]. The queue Q is shown
at the beginning of each iteration of the while loop of lines 9-18. Vertex distances
are shown next to vertices in the queue.
2.4 The Barabasi-Albert model
Networks growing under the mechanism of preferential linking are growing networks
where new edges become attached to vertices preferentially. By definition, this
means that more connected vertices have a better chance to get new connections.
The linear type of preferential attachment produces fat-tailed degree distributions. On
the other hand, if any preference of linking is absent, and new connections are
distributed at random, the degree distribution decreases rapidly. These networks are
strongly correlated.
So far we started with a fixed number N of vertices that were then randomly connected
or rewired, without modifying N. In contrast, most real-world networks describe open
systems that grow by the continuous addition of new nodes. Starting from a small
nucleus of nodes, the number of nodes increases throughout the lifetime of the
network by the subsequent addition of new nodes.
The network models discussed so far also assume that the probability that two nodes
are connected is independent of the nodes’ degree, i.e., new nodes are placed
randomly. Most real networks, however, exhibit preferential attachment, such that the
likelihood of connecting to a node depends on the node’s degree.
The algorithm of the Barabasi-Albert model is the following: starting with a small
number m_0 of nodes, at every time step we add a new node with m (≤ m_0) edges that
link the new node to m different nodes already present in the system. The probability
that the new node will be connected to node i depends on the degree k_i of node i:

P(k_i) = k_i / Σ_j k_j

After t time steps this procedure results in a network with N = t + m_0 nodes and mt
edges. Numerical simulations indicate that this network evolves into a scale-invariant
state, with the probability that a node has k edges following a power law with an
exponent γ_BA ≈ 3.
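A sketch of this growth rule (here the nucleus is a triangle of m_0 = 3 nodes and each new node brings m = 2 edges, an arbitrary choice for illustration; preferential attachment is implemented with the standard repeated-nodes trick, in which a node appearing deg(i) times in a target list is picked with probability proportional to its degree):

```python
import random

def barabasi_albert(t=500, m=2, seed=9):
    """Grow a BA network for t steps; returns the edge list."""
    rng = random.Random(seed)
    edges = [(0, 1), (1, 2), (0, 2)]       # nucleus: m_0 = 3 nodes in a triangle
    targets = [0, 1, 2, 0, 1, 2]           # node i appears deg(i) times
    for step in range(t):
        new = 3 + step
        chosen = set()
        while len(chosen) < m:             # m distinct, degree-biased targets
            chosen.add(rng.choice(targets))
        for v in chosen:
            edges.append((new, v))
            targets += [new, v]            # both endpoints gain one degree
    return edges

edges = barabasi_albert()
print(len(edges))  # → 1003, i.e. the 3 nucleus edges plus m*t = 1000 new ones
```

The resulting degree sequence is heavily skewed toward the earliest nodes, and its tail follows the power law with exponent near 3 quoted above.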
The following plots show the distribution of the BA model. The first is obtained for a
network that contains 10000 nodes, averaged over 1000 runs, starting from 2 nodes
linked together and adding one new node at a time; the second is obtained for a
network that contains 100000 nodes, for the same number of runs.
[Figure: log-log degree distribution of the BA model for 10000 nodes and 1000 runs,
with a linear fit of slope = −2.91344.]
[Figure: log-log degree distribution of the BA model for 100000 nodes, with a linear fit.]
2.5 Real Networks
Only recently have we realized that we reside in a world of networks. The Internet
and World Wide Web (WWW) are changing our lives. Our physical existence is based
on various biological networks. The extent of the development of communication
networks is a good indicator of the level of development in a country. ‘Networks’ turn
out to be a central notion in our time, and the explosion of interest in networks is
already a social and cultural phenomenon. The Internet will be the case of this study.
2.5.1 Internet robustness
The resilience of the Internet to random failures and intentional attacks is a major
issue, with several practical implications.
The study of the resilience of the Internet to failures is not an easy task. After
any router or connection fails, the Internet responds very quickly by updating the
routing tables of the routers in the neighborhood of the failure point. Therefore, the
error tolerance of this network is a dynamical process, which should take into account
the time response of the routers to different damage configurations. However, a first
approach to the analysis of the Internet’s robustness can be made at the
topological level by studying the behavior of the AS and IR level maps under the
removal of vertices or edges. These studies have shown that the Internet presents two
faces in front of component failures: it is extremely robust to the loss of a large
number of randomly selected vertices, but extremely fragile in response to a targeted
attack.
2.5.2 The plot of the distribution of the Internet
The Internet can be viewed as a special case of a random, scale-free network,
whose degree distribution follows a power law: P(k) ~ k^(−γ) (γ ≈ 2.1 for the
Internet). The following plot shows the distribution of the Internet for results taken
from www.netdimes.org. From this web site I have taken the edges of the network; I
then constructed the network (in a linked-list format) and from the adjacency list of
the network I found the distribution of the Internet, which is shown in the following
graph. The data are taken from a registration of NETDIMES done in October 2004.
[Figure: log-log degree distribution of the Internet, with a linear fit of
slope = −2.10297.]
In this work I provide a review of results on the topological resilience of the
Internet to damage. I will present numerical experiments which show that the Internet
can withstand a considerable amount of random damage and still maintain overall
connectivity in the surviving network.
connectivity in the surviving network. In particular the Internet’s tolerance to massive
random damage is much higher than for meshes or random homogeneous networks,
suggesting that the cause for this robustness resides in its power law degree
distribution. The very nature of Internet degree distribution, on the other hand, implies
the presence of heavily connected hubs. A targeted attack, aimed at knocking down
those hubs, has dramatic consequences for Internet connectivity. In this case we shall
see that the deletion of a very small fraction of hubs is enough to break the network
down into small, isolated components, hugely reducing its communication
capabilities.
2.6 Resilience of scale-free networks, and the Internet as a special case, to random
breakdowns
This robustness can be understood by recalling that the power-law form of the degree
distribution implies that the vast majority of vertices have a very small degree,
while a few hubs collect a very large number of edges, providing the necessary
connectivity to the whole network. When removing vertices at random, chances are that
the largest fraction of deleted elements will have a very small degree. Their deletion
will imply, in turn, that only a limited number of adjacent edges are eliminated.
Therefore, the overall damage exerted on the network’s global connectivity properties
will be limited, even for very large values of p. This intuition will be confirmed in
the analytical study performed in the following paragraphs, where it is shown that
scale-free networks with degree exponent γ ≤ 3 have an infinite tolerance to random
damage, in the sense that it is necessary to delete a fraction p → 1 in order to
induce the complete breakdown of the largest component. This fact has led on some
occasions to the erroneous statement that scale-free networks have a topology that is
designed or optimized to resist random failures.
2.6.1 The theory of resilience to random failures
Recently there has been increasing interest in the formation of random networks and
in the connectivity of these networks, especially in the context of the Internet. When
such networks are subject to random breakdowns, in which a fraction p of the nodes and
their connections are removed at random, their integrity may be compromised: when p
exceeds a certain threshold, p > p_c, the network disintegrates into smaller,
disconnected parts. Below that critical threshold, there still exists a connected
cluster that spans the entire system.
In this paragraph we consider random breakdowns in the Internet and introduce an
analytical approach to finding the critical point. The site connectivity of the physical
structure of the Internet, where each communication node is considered as a site, is
power law, to a good approximation. We introduce a new general criterion for the
percolation critical threshold of randomly connected networks. Using this criterion,
we show analytically that the Internet undergoes no transition under random
breakdowns of its nodes. In other words, a connected cluster of sites that spans the
Internet survives even for arbitrarily large fractions of crashed sites.
We consider networks whose nodes are connected randomly to each other, so
that the probability for any two nodes to be connected depends solely on their
respective connectivities. We argue that, for randomly connected networks with
connectivity distribution P(k), the critical breakdown threshold may be found by the
following criterion: if loops of connected nodes may be neglected, the percolation
transition takes place when a node (i), connected to a node (j) in the spanning
cluster, is also connected to at least one other node; otherwise the spanning cluster
is fragmented. This may be written as

⟨k_i | i ↔ j⟩ = Σ_{k_i} k_i P(k_i | i ↔ j) = 2,   (1)
where the angular brackets denote an ensemble average, k_i is the connectivity of node
i, and P(k_i | i ↔ j) is the conditional probability that node i has connectivity k_i,
given that it is connected to node j. But, by the Bayes rule for conditional
probabilities,

P(k_i | i ↔ j) = P(k_i, i ↔ j) / P(i ↔ j) = P(k_i) P(i ↔ j | k_i) / P(i ↔ j),

where P(k_i, i ↔ j) is the joint probability that node i has connectivity k_i and that
it is connected to node j. For randomly connected networks P(i ↔ j) = ⟨k⟩/(N − 1) and
P(i ↔ j | k_i) = k_i/(N − 1), where N is the total number of nodes in the network. It
follows that the criterion (1) is equivalent to
κ ≡ ⟨k²⟩ / ⟨k⟩ = 2   (2)

at criticality.
The strategy thus consists in finding at which damage density p the surviving
network fulfills the percolation condition ⟨k²⟩_p / ⟨k⟩_p = 2, where ⟨k²⟩_p and ⟨k⟩_p
refer to the moments of the degree distribution of the damaged graph.
Following the intuitive approach proposed by Cohen, Erez, ben-Avraham, and
Havlin (2000), let us consider a sparse uncorrelated generalized random graph with
degree distribution P_0(k) and first moments ⟨k⟩_0 and ⟨k²⟩_0. After the random
deletion of a fraction p of the vertices, a node that originally had degree k' retains
degree k with binomial probability, so the degree distribution of the damaged network
is
P_p(k) = Σ_{k' = k}^{∞} P_0(k') C(k', k) (1 − p)^k p^(k' − k),

where C(k', k) is the binomial coefficient.
From this equation we can compute the first and second moments of the degree
distribution in the damaged network, obtaining

⟨k⟩_p = Σ_k k P_p(k) = (1 − p) ⟨k⟩_0,
⟨k²⟩_p = Σ_k k² P_p(k) = (1 − p)² ⟨k²⟩_0 + p(1 − p) ⟨k⟩_0.
These expressions can be plugged into Equation (2), which gives the condition for the
presence of a giant component in the surviving network, yielding the precise value of
the threshold p_c as the one that satisfies the equation

⟨k²⟩_{p_c} / ⟨k⟩_{p_c} = (1 − p_c) ⟨k²⟩_0 / ⟨k⟩_0 + p_c = 2,

whose solution is

p_c = 1 − 1 / (κ_0 − 1),

where κ_0 = ⟨k²⟩_0 / ⟨k⟩_0 is the heterogeneity parameter of the undamaged network.
That is, the critical threshold for the destruction of the graph’s connectivity differs
from unity by a term that is inversely proportional to the degree fluctuations of the
undamaged network. This readily implies that the topological robustness to damage is
related to the graph’s degree heterogeneity. In homogeneous networks, in which the
heterogeneity parameter is κ ≈ ⟨k⟩, the threshold is finite and simply depends on
the average degree. In highly heterogeneous networks, in which κ ≫ ⟨k⟩, the
threshold approaches larger values, being dominated by the magnitude of ⟨k²⟩. In
particular, all scale-free graphs with diverging ⟨k²⟩_0 have κ → ∞, and therefore
exhibit an infinite tolerance to random failures, i.e. p_c → 1. Scale-free graphs
thus define a class of networks characterized by a distinctive resistance to high
levels of damage, which is clearly welcome in many situations.
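The threshold formula p_c = 1 − 1/(κ − 1) is straightforward to evaluate for a given degree sequence; a sketch (the example sequences are illustrative, not taken from the study's networks):

```python
def critical_threshold(degrees):
    """p_c = 1 - 1/(kappa - 1), with kappa = <k^2>/<k> of the undamaged network."""
    n = len(degrees)
    k1 = sum(degrees) / n
    k2 = sum(d * d for d in degrees) / n
    kappa = k2 / k1
    return 1.0 - 1.0 / (kappa - 1.0)

# homogeneous example: a 3-regular network has kappa = 3, so p_c = 0.5
print(critical_threshold([3] * 1000))  # → 0.5
# heterogeneous example: one huge hub among small-degree nodes pushes p_c toward 1
print(critical_threshold([1] * 99 + [1000]))
```

The second example shows the scale-free effect in miniature: a single hub inflates ⟨k²⟩ and drives the threshold close to 1.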
2.6.2 The plot of k versus P for scale-free networks under random attack
At each damage fraction P we compute κ = ⟨k²⟩/⟨k⟩ (k is the number of connections of
each node); the threshold is reached when κ = 2.
[Figure: κ versus P under random attack, for γ = 2.0, 2.5, 3.0.]
2.6.3 The plot of k versus P for scale-free networks & Internet under
random attack
36
5/28/2007 15:34:14
300 g = 2.0
Internet
g = 2.5
250
200
Kapa
150
100
50
[Figure: the critical threshold Pc versus γ for scale-free networks under random attack.]
The results shown in the previous section point to scale-free graphs as very robust networks that in principle can be broken apart only by damaging all of their vertices. From the plot in paragraph 2.6.3 we conclude that even the Internet network exhibits stunning robustness to random failures.
However, real-world networks, such as the Internet studied in this case, necessarily show finite-size effects due to resource or size constraints. In the limit of infinite size such scale-free networks undergo no percolation transition, although finite networks (such as the Internet) may eventually be disrupted when nearly all of their sites are removed.
Albert et al. (2000) introduced a model for intentional attack, or sabotage, of random networks: the removal of sites is not random; rather, sites with the highest connectivity are targeted first. Their numerical simulations suggest that scale-free networks are highly sensitive to this kind of attack. It has been proved, both analytically and numerically, that scale-free networks are highly sensitive to sabotage of a small fraction of the sites, for all values of γ.
With simulations I have studied the problem of intentional attack in scale-free networks. The study focuses on the exact value of the critical fraction needed for disruption.
The simulation proceeds as follows:
Find the number of connections for each node using the equation

$$k = \left[ \left(k_{\max}^{1-\gamma} - k_{\min}^{1-\gamma}\right) x + k_{\min}^{1-\gamma} \right]^{1/(1-\gamma)} ,$$

where x is a uniform random number in (0, 1); in our case kmin = 1 and kmax = N.
Sort the array of connections using Selection Sort.
Construct the network from this sorted array of connections.
Choose randomly 10% of the nodes and remove their legs, and also the legs of the nodes adjacent to them; then choose randomly 20%, and so on, until the threshold is reached. As in the case of random attack we test κ = ⟨k²⟩/⟨k⟩ (k is the number of connections of each node), and the threshold is reached when κ = 2.
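The degree-sampling step above can be sketched as follows (a minimal Python illustration of the inverse-CDF formula, using a continuous approximation; Python's built-in sort stands in here for the thesis's Selection Sort):

```python
import random

def powerlaw_degree(gamma, kmin, kmax, rng):
    """Draw one degree from P(k) ~ k^(-gamma) on [kmin, kmax] by inverting
    the cumulative distribution (continuous approximation, truncated to int)."""
    x = rng.random()
    a = 1.0 - gamma
    return int(((kmax ** a - kmin ** a) * x + kmin ** a) ** (1.0 / a))

rng = random.Random(1)
N = 10_000
# One degree per node (kmin = 1, kmax = N), then sorted in decreasing order.
degrees = sorted((powerlaw_degree(2.5, 1, N, rng) for _ in range(N)), reverse=True)
```

Most draws land at small k while occasional draws produce hubs, reproducing the heavy tail; the network is then constructed to match this sorted degree sequence.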
The first plot shows the relation of κ versus P, where P is the percentage of removed sites, for scale-free networks with γ = 2.0, 2.5, 3.0.
The second plot shows the threshold Pc as a function of γ; Pc is the threshold such that for values of P above it the network disintegrates.
[Figure: κ versus P for scale-free networks with γ = 2.0, 2.5, 3.0 under intentional attack.]
[Figure: the critical threshold Pc versus γ for scale-free networks under intentional attack.]
2.7.3 The plot of Pc versus γ for scale-free networks under random and intentional attack
[Figure: Pc versus γ for scale-free networks under random attack and under intentional attack.]
The scale-free nature of the Internet protects it from random failures, since the hubs that hold the network together with their many links are difficult to hit in a random selection of vertices. Since hubs are the key elements ensuring the connectivity of the network, however, it is easy to imagine that a targeted attack, aimed at the destruction of the most connected vertices, has a very disruptive effect. In practice this is done by removing vertices following a list ordered by decreasing degree. The first vertex to be removed is therefore the one with the highest degree; then the second-highest-degree vertex is removed, and so on, until the total number of removed vertices represents a fraction p of the total number of vertices forming the network. As for random removal, the behavior of κ can be studied for increasing damage p. The scale-free nature of the Internet graph makes the long tail of large-degree vertices, the hubs, extremely important for keeping the graph connected. Their removal leads immediately to the network's collapse.
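The targeted-removal procedure just described can be sketched as follows (a minimal Python illustration; the configuration-model wiring and the helper names are my assumptions, not the thesis's actual code): nodes are ranked by degree, the top fraction is deleted together with all edges touching them, and κ of the survivors is recomputed.

```python
import random
from collections import defaultdict

def configuration_graph(degrees, rng):
    """Wire a multigraph by randomly matching edge stubs (configuration model)."""
    stubs = [v for v, k in enumerate(degrees) for _ in range(k)]
    rng.shuffle(stubs)
    if len(stubs) % 2:          # drop one stub if the total degree is odd
        stubs.pop()
    adj = defaultdict(list)
    for i in range(0, len(stubs), 2):
        a, b = stubs[i], stubs[i + 1]
        adj[a].append(b)
        adj[b].append(a)
    return adj

def kappa_after_targeted_removal(adj, frac):
    """Delete the top `frac` highest-degree nodes and all edges touching them,
    then return kappa = <k^2>/<k> over the surviving nodes."""
    ranked = sorted(adj, key=lambda v: len(adj[v]), reverse=True)
    removed = set(ranked[:int(frac * len(ranked))])
    deg = [sum(u not in removed for u in adj[v]) for v in adj if v not in removed]
    k1, k2 = sum(deg), sum(d * d for d in deg)
    return k2 / k1 if k1 else 0.0

rng = random.Random(2)
# Heavy-tailed degrees (gamma = 2.5, kmin = 1), capped for a small demo.
degrees = [min(100, int((1.0 - rng.random()) ** (-1.0 / 1.5))) for _ in range(2000)]
adj = configuration_graph(degrees, rng)
for frac in (0.0, 0.01, 0.05):
    print(frac, kappa_after_targeted_removal(adj, frac))
```

With a heavy-tailed degree sequence, κ drops rapidly as the highest-degree nodes are deleted, which is the signature of the fragility to intentional attack discussed above.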
[Figure: κ versus P for scale-free networks with γ = 2.0 and γ = 2.5 and for the Internet under intentional attack.]
The targeted-attack scenario is also studied for the Internet network. The heterogeneity parameter κ reaches the value 2 at Pc ≈ 0.11.
This result does not match my expectations: since the Internet shows scale-free behavior with γ ≈ 2.1, I expected 0.048 < Pc < 0.072, where Pc = 0.048 for γ = 2.0 and Pc = 0.072 for γ = 2.5.
Further work
To use more registrations from NETDIMES, examine their behavior, and take the average over them.
To try the same attacks with the registrations of the Oregon Route-Views project and compare the results.
REFERENCES
[22] Frank S. Henyey, V. Seshadri, On the number of distinct sites visited in 2D
lattices, Mathematical Physics 6, 2, (1965).
[23] Harvey Gould, Jan Tobochnik (1996), An Introduction to Computer Simulation
Methods.