Complex Network Models
Johannes Lengler

Contents
1 Introduction
1.1 Preliminaries
3.5.1 Strengths
3.5.2 Weaknesses
4 Geometric Graphs
4.1 Weak Ties and the Watts-Strogatz Model
4.2 Navigability and the Kleinberg model
4.2.1 Shortcomings of the Kleinberg model
Introduction
In this course, we will study some of the most important random net-
work models. Formally, a (finite) random network model is a probability
distribution over all graphs with vertex set V = [n]. Practically, it is a
randomized procedure to generate a graph on n vertices. We will study
these models asymptotically in the limit n → ∞.
There are many reasons to study random network models, from exis-
tence proofs in graph theory to the design of efficient data structures and
algorithms. But in this lecture, we focus on a different goal: to understand
complex real-world networks better.
Complex real-world networks include social networks. In these net-
works, the nodes are usually people, or users. There are online social
networks like the facebook graph, in which edges are friendship links in the
facebook social network; or a collaboration network, where the nodes are
researchers, and two researchers share an edge if they have co-authored a
publication; or a mobile phone graph, in which two phones are linked by
an edge if there was a phone call between the two phones in a particular
month. Or the friendship network, in which two people are connected by
an edge if they know each other on a first-name basis.
Another class of networks are technological networks like the internet
graph, in which the nodes are given by routers and links are given by con-
necting cables1 . Another example is the web graph, in which the nodes are
the pages of the world wide web, and the edges are given by the hyperlinks.
Many other examples of networks can be found in public repositories such
as https://snap.stanford.edu/data/.
1 The term internet graph is also used to refer only to the highest layer of the physical
structure of the internet, where the nodes are the autonomous systems of the internet.
There are several reasons why we want to study real-world networks
through models. One is that some of the networks are not directly available.
For example, the data of an online social network may not be available
because the company does not want to share it. In some cases like for mobile
phone data, there are data protection laws preventing them from making
the network public. In other cases like the friendship network, the data
does not exist at all in computer-readable form.2 While it is in principle
possible to query links in the network by asking the people involved, this
yields at best tiny samples of the network. Nevertheless, it is possible
to run algorithms on the friendship network, and we will learn
about one such algorithm in the lecture.
2 This can also be difficult for technological networks. It is surprisingly unclear how
big the web graph really is, since search engines only give us access to the parts that are
accessible by web crawlers.
Another reason to study network models instead of the real networks
is that we are able to create variations of the networks. For example,
assume we want to understand how the efficiency of a routing protocol like
the Border Gateway Protocol (BGP) scales as the internet graph grows over
time. We have a few data points: we know the internet graph right now,
and we know how the internet graph looked in the past. So we can
run experiments on these real instances, find a trend, and extrapolate to
the future. However, network models allow us a different approach: we can
choose a model that generates networks like the internet graph, and then
we can use the model to generate networks which are twice as large, and run
experiments on those networks. Of course, for this approach it is crucial
that the network model captures the properties of the internet graph that
are relevant for routing and the BGP protocol. So we need to find a good
model. This is precisely the goal of this course: to introduce the students to
a collection of network models to choose from, and to discuss some network
properties that may be relevant for such tasks. Beware that the course can
and will not be exhaustive on either side: there are many more interesting
network models, and many more important network properties than what
we can cover in this course.
A third reason to study network models is rather fundamental: assume
we want to understand a real-world phenomenon. Let us take the following
empirical fact as an example. We have an unweighted network G and two
vertices u, v, and we want to find a shortest path from u to v. The textbook
solution is to use breadth-first search (BFS), starting from u, until v is
found. However, in practice, a different algorithm, called bi-directional
BFS is more efficient: start two BFS in parallel, one from u and one from
v. As soon as there is a vertex w that is discovered by both BFS, the path
from u to v through w is a shortest path. Why is the second algorithm
more efficient than the first one? Classic worst-case or average-case analysis
does not distinguish between the two variants, so it must have to do
with the networks on which the algorithm is run in practice. What aspects
of the networks are important for the runtime? Is it relevant that there are
nodes of rather high degree? Does it play a role that those networks tend to
be clustered into communities? In such a situation, the best approach is the
following: we develop different levels of abstraction of the phenomenon.
So, we need different network models, some more basic, and some more
complex. In the most basic ones, we will not see a difference between uni-
and bi-directional BFS. But as we add more and more aspects of real-world
networks to the model, at some point the bi-directional variant will start
to have an advantage. By studying when exactly this happens, we get a
much better understanding of why the bi-directional variant is superior in
practice. We will come back to bi-directional search for several network
models in this course.
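To make the comparison concrete, here is a minimal sketch of both variants (my own illustration, not from the script; the graph is assumed to be given as a dict mapping each vertex to a list of neighbours, and all names are chosen for this example only):

from collections import deque

def bfs_distance(adj, u, v):
    """Textbook solution: BFS from u until v is found; returns d(u, v) or None."""
    dist = {u: 0}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        if x == v:
            return dist[x]
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return None  # v is not reachable from u

def bidirectional_bfs_distance(adj, u, v):
    """Grow two BFS trees level by level, alternating between u and v.
    As soon as one search encounters a vertex w already discovered by the
    other search, the sum of the depths of w in the two trees is d(u, v)."""
    if u == v:
        return 0
    dist = {u: {u: 0}, v: {v: 0}}       # depths of discovered vertices in the two trees
    frontier = {u: [u], v: [v]}
    side, other = u, v
    while frontier[u] and frontier[v]:
        next_frontier = []
        for x in frontier[side]:
            for y in adj[x]:
                if y in dist[other]:            # w = y was discovered by both searches
                    return dist[side][x] + 1 + dist[other][y]
                if y not in dist[side]:
                    dist[side][y] = dist[side][x] + 1
                    next_frontier.append(y)
        frontier[side] = next_frontier
        side, other = other, side               # alternate the two searches
    return None  # u and v lie in different components

The bidirectional version follows the description above: the searches are expanded level by level, alternating between the two sides, and the first vertex found by both searches determines the distance.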
This last aspect is a rather fundamental approach in the study of com-
plex systems. To understand such a system thoroughly, we need to develop
different levels of abstraction of the system, and choose the appropriate
one to understand different aspects of the system – the level should be as
simple as possible, but as complex as necessary.
Literature
The script and lecture will be self-contained, and it is not necessary to read
further literature for passing the course. However, for students who want
additional material, we want to highlight especially two excellent books on
the topic.
• The book Complex Networks: Principles, Methods and Applications
by Latora, Nicosia and Russo gives a gentle introduction to complex
networks for readers of all domains. It covers a wider range of topics,
in particular more real-world network models and a larger collection
of network properties than we can cover in this lecture. It is available
through ETH library (vpn required).
Compared to our course, the book is less math-heavy. It does work
with some mathematical concepts, but the authors try to make them
accessible for people without a mathematical background.
The book comes with an excellent homepage, which contains all the
networks discussed in the book, and C programs for all the network
properties and algorithms that are discussed. This webpage is a great
place for anyone who wants to play around with some instances of real-
world networks.
3 Though beware that there is no clear-cut definition for a fixed network of finite
size. E.g., the collaboration network on high energy physics (based on arXiv papers) has
n ≈ 12,000 vertices and m ≈ 120,000 edges. Whether this is sparse or not is interpretation,
not math.
• The book Random Graphs and Complex Networks, Volume 1 by
van der Hofstad is more mathematical than our course. The reader
can find all the nasty technical details which we skip in our course in
this book, and many more. The book covers graph-theoretic aspects
(e.g., component structures, typical distances, clustering coefficients)
in much more depth and detail. It mostly does not cover the algorith-
mic aspects that we discuss in this lecture (e.g., routing, bidirectional
search). The book (as well as a preliminary volume 2) is freely avail-
able at https://www.win.tue.nl/~rhofstad/NotesRGCN.html.
1.1 Preliminaries
Probabilistic tools
Basic objects like graphs often have two names, one coming from graph the-
ory and one from network theory. We will use the terms “graph/network”,
“vertex/node” and “edge/link” interchangeably in this course.
Throughout the course, we will always consider a graph G = (V, E) with
vertex set V and edge set E. We use the convention n = |V|, m = |E|.
Unless otherwise mentioned, G is undirected, simple and finite. G is usu-
ally obtained from a random graph model where the set of vertices is fixed,
but the set of edges is random. We are interested in the behaviour of G
for n → ∞. The Landau notation O(.), o(.), Θ(.), Ω(.), ω(.) is always with
respect to the limit n → ∞ unless otherwise stated. If D is a probabil-
ity distribution, we will slightly abuse notation by writing Pr[D = x] as
abbreviation for “Pr[X = x] for a random variable X with distribution D”.
Further notation
• N := {1, 2, . . . }.
• E(S1, S2) = {{v1, v2} ∈ E | v1 ∈ S1, v2 ∈ S2}.
• u ∼ v = “u is adjacent to v”.
4
This term is sometimes used differently in the literature and may mean "with proba-
bility n^{−ω(1)}" or "with probability n^{−Ω(1)}". In this case, probability 1 − o(1) is also called
"asymptotically almost surely" or "aas".
Chapter 2
In this section, we will discuss the most basic random network model,
the Erdős-Rényi random graph Gn,p . In particular, we will study three
aspects of the component structure: the existence of a giant component,
the asymptotic absence of medium-size components and the number and
structure of small components.
We will use Gn,p to introduce two perspectives on a network: the local
and the global perspective. Even though the Erdős-Rényi model is not
a good model for real-world networks, the techniques developed here are
useful to understand more complex models.
    Pr[deg(v) = k] → Pr[Po(µ) = k] = e^{−µ} µ^k / k!   (n → ∞)   in G_{n,p} with p = µ/n.     (2.1)
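As a quick illustration of (2.1) (a hedged sketch, not part of the script; the parameter choices n = 2000 and µ = 2 are mine), one can sample Gn,p with p = µ/n and compare the empirical degree distribution with Po(µ):

import random
from collections import Counter
from math import exp, factorial

def gnp_degrees(n, p):
    """Sample G(n, p) and return the list of vertex degrees."""
    deg = [0] * n
    for u in range(n):
        for v in range(u + 1, n):
            if random.random() < p:   # each edge is present independently with probability p
                deg[u] += 1
                deg[v] += 1
    return deg

random.seed(0)
n, mu = 2000, 2.0
counts = Counter(gnp_degrees(n, mu / n))
for k in range(8):
    print(f"k={k}:  empirical {counts[k]/n:.3f}   Po(mu) {exp(-mu) * mu**k / factorial(k):.3f}")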
2.1 The local perspective and the Galton-Watson branching process
We want to know how many vertices of Gn,p with p = µ/n are contained
in connected components of size s = 1, 2, . . ., or in larger components.
The model Gn,p is so symmetric that we could compute this directly. For
example, the expected number of components of size s = 2 is
    ∑_{u≠v∈V} Pr[{u, v} is a component of size 2]
        = ∑_{u≠v∈V} Pr[uv ∈ E, none of the other 2(n − 2) incident edges exist]
        = (n choose 2) · (µ/n) · (1 − µ/n)^{2n−4} = (1 − o(1)) · (µ e^{−2µ}/2) · n.
Each component of size s = 2 contains two vertices, so the expected number
of vertices in such components is µe−2µ n. It is an easy exercise to show
that the true value is concentrated around its expectation. The calculation
becomes slightly more complicated for components of larger sizes, but can
be done.
However, we want to use a different approach, which generalizes better
to more complex situations. To this end, we fix a vertex v 2 V and com-
pute the probability that v is in a component of size s. We explore the
graph starting from v in a breadth-first search (BFS). So, we take the local
perspective of vertex v.
In the first step, we uncover the number of neighbours of v. This is
distributed as Bin(n − 1, p) → Po(µ), since we assumed p = µ/n. Assume
v has X1 = o(n) neighbours. In the next step, for every neighbour w we
reveal how many edges go out from w. One edge clearly goes to the parent
v, and we ignore this edge at this point. For the other n−2 potential edges,
each of them has the same probability p to be present, so the number of
outgoing edges is distributed as Bin(n − 2, p) → Po(µ). So the number of
new edges that we find per neighbour w follows essentially again the same
distribution.
Of course, not every new edge necessarily leads to a new vertex. It
could also lead to a vertex that we have already found before. However, if
we process the vertices one by one, as long as the number of vertices that
we have found is x = o(n), the number of new vertices that we find from
w is Bin(n − x, p) → Po(µ). So, this effect only starts to matter when we
have discovered a linear fraction of the graph.
Summarizing, exploring the graph resembles the process of growing a
tree with parent v, where each node has a random number of children,
distributed as Po(µ). This process is known as Galton-Watson branching
process.
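Such a branching process is also easy to simulate directly (a sketch with illustrative parameters, not from the script), which is a convenient way to build intuition for the component-size distribution before doing the calculations:

import math
import random

def poisson(mu):
    """Sample Po(mu) by Knuth's method: multiply uniforms until the product drops below e^{-mu}."""
    threshold, k, prod = math.exp(-mu), 0, 1.0
    while True:
        prod *= random.random()
        if prod <= threshold:
            return k
        k += 1

def galton_watson_size(mu, cap=1000):
    """Total number of vertices of a Galton-Watson tree with Po(mu) offspring.
    Trees reaching `cap` vertices are reported as `cap` and treated as 'infinite'."""
    alive, total = 1, 1            # unexplored vertices / vertices generated so far
    while alive > 0 and total < cap:
        children = poisson(mu)     # offspring of the vertex explored next
        alive += children - 1
        total += children
    return min(total, cap)

random.seed(1)
mu, trials, cap = 2.0, 2000, 1000
sizes = [galton_watson_size(mu, cap) for _ in range(trials)]
print("empirical survival probability:", sum(s >= cap for s in sizes) / trials)
print("fraction of trees of size 1   :", sizes.count(1) / trials, "vs Pr[Po(mu)=0] =", math.exp(-mu))

For µ = 2 the empirical survival probability should come out close to the survival probability p∞ that we compute analytically later in this chapter.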
Proof. We will only show that E[ns /n] → ps . For the theorem, one also
needs to show concentration, which can be obtained by probabilistic tools
like Azuma’s inequality.
Fix s ∈ N and v ∈ V. Let E_s(v) be the event that v is in a component
of size s. By symmetry, we have E[n_s] = ∑_{u∈V} Pr[E_s(u)] = n · Pr[E_s(v)]. Thus we
and the probability to obtain a path P3 where the root has degree 1 is

    Pr[P3 with deg(root) = 1] = (Pr[Po(µ) = 1])^3 · Pr[Po(µ) = 0]
        = (e^{−µ} µ^1 / 1!)^3 · (e^{−µ} µ^0 / 0!) = e^{−4µ} µ^3.

Note that by this calculation we count every component S4 exactly once (because there
is only one vertex of degree 3), while we count every P3 twice. On the other hand, every
component has four vertices. Therefore, in Gn,p the fraction of vertices in S4-components
is (2/3) · e^{−4µ} µ^3, and the fraction of vertices in P3-components is 2 e^{−4µ} µ^3. Thus there are
three times as many vertices in P3-components as in S4-components.3
3 General rules can be obtained here. It can be shown that the fraction of vertices is
inversely proportional to the number of automorphisms of the structure.
Now that we have established Theorem 2.3, we can derive the number
of small (constant-size) components in Erdős-Rényi graphs by analyzing
the properties of branching processes. First we will estimate how the prob-
ability ps scales with s, and in particular, when a Galton-Watson process
has extinction probability of one.
It remains to show the tail bound in s for µ > 1. Recall that a necessary
condition for |T| = s is ∑_{i=1}^{s} Z_i = s − 1. The left hand side is a sum
of independent Poisson distributed random variables, so it also follows a
Poisson distribution with mean sµ. Hence,

    Pr[|T| = s] ≤ Pr[∑_{i=1}^{s} Z_i = s − 1] ≤ Pr[∑_{i=1}^{s} Z_i < s] ≤ η^s,     (2.4)
where the last step again follows from concentration of the Poisson distri-
bution, Theorem 1.1. This concludes the proof.
Remark 2.2. It can be shown that Pr[|T | = ∞] = 0 also holds for µ = 1, except for the
trivial case that the distribution is constant with Pr[Zi = 1] = 1. However, the tail bounds
in s are no longer true, not even in the case D = Po(1). We will generally ignore such
threshold cases in this lecture. We are interested in models for real-world networks. If
some properties of the model only hold for a parameter µ which is exactly on the threshold,
then this means that they are not robust against tiny parameter changes. It is then hard
to argue that the model is relevant.
There is an important implication of the different cases (a) and (b) of
Theorem 2.4 for Erdős-Rényi graphs. Obviously, if we consider any finite
n and sum up the number of vertices in components of size s, then we
P
obtain ∞ s=1 ns = n because we count every vertex exactly once. Dividing
by n gives the fraction of vertices in components of size s,
X
∞
ns
= 1. (2.5)
s=1
n
Informally, this says that all vertices (except for an asymptotically neg-
ligible number) are either in the giant component or in components of
constant size. Indeed, it can be shown that for every function f(n) with
P
limn→∞ f(n) = ∞, whp limn→∞ ∞ s=f(n) ns = |C1 | + o(n). In other words,
there are only o(n) vertices outside of the giant component and outside of
constant-size components.
Equation (2.6) raises the question how we can compute p∞ . Of course,
we could compute the first few of p1 , p2 , . . ., and thus approximate p∞ . But
there is a more elegant way, as the following proposition shows.
    1 − p_∞ = e^{−µ p_∞}.     (2.8)
Proof. Equation (2.7) follows directly from the law of total probability,
where we discriminate between the number i of children of the root. The
Galton-Watson tree is finite if and only if the subtrees below all children
of the root are finite. Since all children are the roots of independent Galton-
Watson trees, the probability that all of them are finite is p_ext^i. Hence,

    p_ext = Pr[GW tree becomes extinct] = ∑_{i=0}^{∞} Pr[root has i children] · p_ext^i,
We already know by Theorem 2.2 that p_∞ > 0 for µ > 1, and it is easy
to see that the function f(x) = 1 − x − e^{−µx} has a unique positive zero.
(E.g., because f is concave, starts at f(0) = 0 with positive slope f'(0) =
µ − 1 > 0, and becomes negative since f(1) < 0.) Therefore, p_∞ must be
the unique positive root of the equation 1 − x = e^{−µx}.
and some positive x. In this case, we could rule out the solution x = 0 by
Theorem 2.2.
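Equation (2.8) has no closed-form solution, but it is easy to solve numerically, for example by fixed-point iteration (a small sketch of my own, not from the script):

import math

def survival_probability(mu, iterations=200):
    """Solve 1 - p = e^{-mu*p} for the survival probability p = p_inf (mu > 1).
    Iterating p -> 1 - e^{-mu*p} from p = 1 decreases monotonically to the
    unique positive root, since the map is increasing and concave."""
    p = 1.0
    for _ in range(iterations):
        p = 1.0 - math.exp(-mu * p)
    return p

for mu in (1.5, 2.0, 3.0):
    print(mu, survival_probability(mu))

For µ = 2 this yields p∞ ≈ 0.797, which matches what a direct simulation of the branching process gives.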
Lemma 2.6. Let µ ≠ 1 be a constant, let p = µ/n, and let C = C(µ) and δ = δ(µ) > 0 be
suitable constants. If µ < 1, then whp every component of G_{n,p} has size at most C log n.
If µ > 1, then whp G_{n,p} has no component whose size lies in the interval [C log n, δn].
Proof. We omit the proof. It follows the same line of argument as The-
orems 2.3 and 2.4, except that the coupling in Theorem 2.3 needs to be
made a bit more carefully. As key insight for the case µ > 1, note that
we want to couple the BFS in G up to size δn with a Galton-Watson tree
with offspring distribution Bin(n − δn, p). If δ is sufficiently small then
this distribution still has expectation (1 − δ)np = (1 − δ)µ > 1, so the
Galton-Watson tree will be very unlikely to have size s for any large s.
Proof. We construct G in two steps. In the first step, we insert every edge
with probability p1 := p − n−3/2 . In a second step, we insert every edge
with probability p2 such that (1 − p1 )(1 − p2 ) = 1 − p, or equivalently
    p_2 := (p − p_1)/(1 − p_1) = (1 + o(1)) · n^{−3/2}.     (2.9)
Note that this gives exactly the correct probability for the Erdős-Rényi
graph, because the probability that an edge is not present after both rounds
is (1 − p1 )(1 − p2 ) = 1 − p, independently for all edges. The second round
is also called sprinkling.
Now we consider the graph G1 after the first round. Since there are
vertices for which the corresponding Galton-Watson process has infinite
size, there must be super-constant components, and it is easy to see that
there must also be components larger than C log n, where C is the constant
from Lemma 2.6. (Because the coupling from Theorem 2.3 still works after
the BFS has discovered O(log n) vertices.) By the same lemma, there are
no medium-sized components in G1 .5 Therefore, there must be components
of size at least δn in G1 .
In the second round, we add at most O(n² p_2) = O(√n) edges in expec-
tation, and also with high probability by the Chernoff bound. In particu-
lar, we cannot create a new linear-size component, since r edges can
join at most r + 1 components. Since all non-linear components have size
O(log n) in G1, the additional edges could create at most a component of
size O(√n log n). Thus every linear-size component in G must contain a
linear-size component from G1.
5
We cheat very slightly here, since Lemma 2.6 assumes a constant µ, while the value for G1 is
µ1 := µ − n^{−1/2}, which depends on n. However, µ1 stays bounded away from one, and is
also bounded from above, and Lemma 2.6 also holds under this weaker condition.
Assume that there are two linear-size components C1 and C2 in G1 .
Since |C1|, |C2| ≥ δn, there are at least δ²n² pairs (v1, v2) with v1 ∈ C1 and
v2 ∈ C2. The probability that none of these pairs is hit by an edge is

    Pr[no pair hit] ≤ (1 − p_2)^{δ²n²} ≤ e^{−p_2 δ²n²} = e^{−Ω(√n)}.     (2.10)
So whp at least one such pair is hit, and the components C1 and C2 are
joined in G.
Let k be the number of linear-size components in G1, and call them
C1, . . . , Ck. Then k ≤ 1/δ, since |Ci| ≥ δn for all i, and since the Ci
are disjoint. By a union bound and (2.10), the probability that there is
an i ∈ {2, . . . , k} that is not joined with C1 in the second round is O((k −
1) e^{−Ω(√n)}) = o(1). Hence, with high probability all components are joined
with C1 in the second round, and G only has a single giant component.
2.3.3 Clustering
We start with a definition.
The (local) clustering coefficient of a vertex v is defined as the number of pairs
{u, u'} of neighbours of v that are adjacent, divided by the total number of pairs of
neighbours of v, if deg(v) ≥ 2; and CC(v) := 0 if deg(v) ≤ 1. Note that the denominator
in the first case is simply (deg(v) choose 2). The (local) clustering coefficient of G is
then defined as

    CC(G) := (1/n) ∑_{v∈V} CC(v).

In other words, CC(v) is the probability that two random neighbours u, u' of v
are adjacent, see also Figure 2.1. (We count the probability as zero if deg(v) ≤ 1.)
Some authors use an alternative definition, where the clustering coef-
ficient of G is three times the number of triangles in G, divided by the
number of paths of length 2. This is also sometimes called the global clus-
tering coefficient. The global clustering coefficient puts more weight on
large-degree vertices than the local clustering coefficient. In fact, the global
clustering coefficient can be written as the weighted average of all CC(v),
where vertex v is weighted by (deg(v) choose 2).
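In code, the definition translates directly (a sketch of my own, not from the script; the graph is again assumed to be a dict of neighbour lists, and no attempt is made at efficiency):

def local_clustering(adj, v):
    """CC(v): fraction of pairs of neighbours of v that are themselves adjacent."""
    neigh = adj[v]
    d = len(neigh)
    if d <= 1:
        return 0.0
    links = sum(1 for i in range(d) for j in range(i + 1, d)
                if neigh[j] in adj[neigh[i]])
    return links / (d * (d - 1) / 2)       # denominator is (deg(v) choose 2)

def clustering_coefficient(adj):
    """CC(G): average of CC(v) over all vertices."""
    return sum(local_clustering(adj, v) for v in adj) / len(adj)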
For Gn,p with constant µ = pn > 1, the clustering coefficient is easy to
compute. If we pick a vertex v, then with constant probability it has degree
at least two. (In the limit the probability is Pr[Po(µ) ≥ 2] = 1 − e^{−µ}(1 + µ).)
Conditional on that, after picking two random neighbours u, u' of v, the
probability that they are connected is exactly p = µ/n. Thus the local
clustering coefficient is Θ(1/n).
Figure 2.1: The clustering coefficient CC(v) is the probability that two
random neighbours of v are adjacent.
For many real-world networks, this is a poor match. They have rather
high clustering coefficient. This is not surprising. For example, in the
friendship network, if you pick two random friends of yourself, then what
is the probability that they know each other? It’s not one, but it is not
Θ(1/n) either, where n ≈ 8 · 10^9 is the number of people on earth. In
fact, taking Gn,p as a model for the friendship network, this would predict
that the probability that two random friends of yours know each other is
exactly the same as the probability that two random people on earth know
each other. This is clearly a mismatch between Erdős-Rényi networks and
reality.
2.3.4 Communities
There are several definitions of communities, none of them very formal.
One is that communities are subgraphs that are much denser than the to-
tal graph. Another definition is that a community is a subgraph which
has much more internal edges (within the subgraph) than external edges
(from the subgraph to the remainder). It can be tricky to find an appro-
priate definition
of “much” more. If we pick the densest subset S of size k,
then we have (n choose k) options for S. This is a huge number of options to pick
from, especially if k is large. Even random fluctuations can create a large
difference in such a case.
Let us look at a few examples to see how tricky the intuitive definition
can be. The model Gn,p is considered as a graph without communities simply
due to its definition: the edges are all completely independent of each other.
Thus it is sometimes used as a baseline for the definition of communities,
i.e., communities are subgraphs that are denser than anything that we
typically find in Gn,p . But what do we find in Gn,p ?
Start in any vertex v in the giant component of Gn,p and let S be the
set of the first k vertices that we find via a BFS. For small k the induced
subgraph will typically be a tree, so it has k − 1 edges and average degree
roughly two. For µ = 1+ε, this is almost twice as dense as the whole graph!
Superficially, it looks like a community. Even more extreme, if we choose S
to be the giant component itself, then it has average degree > 2, but zero
external edges! Again, this can easily be mistaken for a community. Or
take S as the union of all components of size exactly 2. Then S still has
a linear number of internal edges, but zero external edges. Nevertheless,
we would hardly want to classify S as a community.
While Gn,p does have sets which superficially look like communities, the
picture changes a bit if we restrict to connected subgraphs of Gn,p . This
restricts the number of options that we can choose from. For example,
for small k (e.g., constant), almost all connected subgraphs of k vertices
are trees, so they only have k − 1 edges, which is the minimal density
possible among all connected graphs. Globally, we would consider the
giant component C1 of Gn,p . This graph does not have subgraphs which are
much denser than C1 itself, and no subgraphs with much more internal than
external edges. For example, it is impossible to split the giant component
into two subsets of linear size such that the number of edges between both
sides is o(n) [LM01].9 On the other hand, real-world networks usually have
communities of all sizes, from small ones to linear-size ones, even within
the giant component. Thus Erdős-Rényi graphs are not a good model for
community aspects of graphs.
9 Another application of the sprinkling technique.
2.3.5 Distances
If we pick two nodes u, v uniformly at random, what is the graph distance
between them, i.e., what is the length of a shortest path from u to v? This
is also called the typical distance of the graph.10 To answer this question
for Gn,p , we can use the following algorithm for finding a shortest path.
Assume we start a BFS from u and at the same time a BFS from v. Then
in depth d we find all vertices which have distance exactly d from u and v
respectively. Let us call the set of these vertices the d-th level. We continue
with the two BFS level by level (alternating between the two) until we find
a vertex w that belongs to both BFS trees. Then a shortest path from u
to v runs through w, and its length is given by the sum of the depths of w
in the two BFS tree, see Figure 2.2.
Figure 2.2: BFS from u and BFS from v meet in a vertex w; then d(u, v) = d_u + d_v.

10 We do not give a formal definition of typical distances, as the term is used in different
ways. The most common situation is that one can show that for two random vertices u, v,
their distance is whp in some small interval. Then this interval is called the set of typical
distances.
Now let us analyze the process for Gn,p . We will only give an informal
argument, but it can be turned into a formal one with standard probabilistic
tools like the Chernoff bound. For each BFS, we know that it branches like
a Galton-Watson tree T . We have two possibilities for T . Either it dies out
quickly, and then the vertex is in a small component. Or it grows to infinite
size. In this latter case, it is not hard to see that it grows rather reliably.
Every vertex has offspring distribution Po(µ), which has expectation µ. If
we have x nodes in depth d, then in expectation the next level has size
µx. Since each node draws the number of its children independently, the
actual size of the next level is sharply concentrated around µx if x is large.
So after some initial phase (where x is still small), once x becomes large, it
will reliably grow by a factor of µ in each level.
One can indeed show that, if the process survives, whp the trees grow
like Θ(µd ), up to some small fluctuations in the beginning that essentially
add or subtract a constant number of rounds. In other words, the number
of vertices in distance d from u is roughly Θ(µd ), and similarly for v.
When is the first time that we find a shared vertex w in both search
trees? You may have encountered a variation of this question as the birthday
paradox. If we have two random subsets11 of V of size s, then the expected
number of collisions (elements which appear in both subsets) is s²/|V|. So
if s = o(√n), then by Markov's inequality whp we will not see a collision,
but if s = ω(√n) then whp we do see collisions. In other words, the first
time when we encounter a collision is when s = Θ(√n). This happens
after d = log_µ(Θ(√n)) = (1/2) log_µ n ± O(1) rounds. Since d is the distance
from u to w and from w to v, the distance from u to v is 2d = log_µ n ± O(1) =
Θ(log n). Summarizing, for two vertices u and v, conditional on being in
the giant component, the distance between them is typically log_µ n ± O(1) =
Θ(log n).
11 In a formal proof one would need to argue why the nodes in the search trees are
random subsets, but this follows from the symmetry of Gn,p.
Is this a good match for real-world networks? As for the component
structures, the answer is mixed. On the one hand, log n is a pretty small
function, and we do see that real-world networks tend to have very small
typical distances. Graphs with typical distance O(log n) are called small-
world graphs. On the other hand, many real-world networks have ex-
tremely small typical distances, so small that log n might still be consid-
ered large. For example, the facebook graph was studied in 2016 with
n ≈ 1.6 · 10^9 nodes [EDF+16]. It has average distance (in the giant compo-
nent) of 4.57, which is a surprisingly small number for more than a billion
users. It might still be compatible with average distances of log_µ n (espe-
cially because the facebook graph has a large average degree of µ ≈ 200),
but we will learn later about models which have even smaller typical dis-
tances.
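As a rough back-of-the-envelope check (my own arithmetic, using the numbers quoted above):

    log_µ n = ln(1.6 · 10^9) / ln(200) ≈ 21.2 / 5.3 ≈ 4.0,

which is indeed in the same ballpark as the measured average distance of 4.57.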
More importantly, Erdős-Rényi graphs completely miss two aspects
which are crucial for average distances: on the one hand, they lack cluster-
ing and communities, which increase typical distances. To see this, consider
the BFS search from a vertex v. For Erdős-Rényi graphs, we could assume
that all found vertices are new vertices, which we have not encountered
before. But in a graph with large clustering coefficient, this already goes
wrong in the second step. When we explore the first neighbour u of v, then
a large clustering coefficient means that many of the neighbours of u were
already neighbours of v, and thus have already been revealed in the first
step. Thus the BFS tree is much smaller than a corresponding Galton-
Watson tree. The existence of communities causes similar problems later
in the BFS search. Since the BFS trees grow slower with clustering and
communities, these phenomena increase typical distances.
On the other hand, Gn,p has a very homogeneous degree distribution.
As we will see for other models, the degree distribution plays an important
role for typical distances, and heterogeneous distributions can massively
decrease them. Thus Erdős-Rényi graphs lack two aspects that are both
important for typical distances, one which increases distances, and another
one which decreases them. So we should be careful to draw conclusions
about graph distances from this model.
Chapter 3
3.1 Power-laws
3.1.1 Power-law probability distributions
We start with a general introduction into power laws. They are also some-
times called scale-free.2 Throughout the section, D will denote a proba-
bility distribution either on N0 or on [1, ∞). Let X ∼ D, and let f_D(x) be
the density of D.
2 Some authors reserve the phrase scale-free for power-laws with exponent τ ∈ (2, 3).
For our purposes, the differences between the four different versions of
power-laws do not matter much. A strong/weak density power-law implies
a strong/weak cumulative power-law, but not vice versa. In this lecture, we
will usually assume strict density power-laws, i.e., we make the strongest
possible assumption. We do this to simplify calculations, not because it is
necessary. Without further specification, “power-law” means “strict density
power-law”.
where the hidden constants are uniform over all d. In other words,
we require that there are c1 , c2 > 0 such that with high probability
the following holds.
tices of degree at least d in G(n). This requirement can hold a bit longer,
and in many models it holds until D = n1/(τ−1)−ε . This is why one can
sometimes prove stronger results by working with cumulative power-laws.
A second difference between power-laws of distributions and degree
sequences is that (3.1) can actually be checked for a concrete network.
In practice, power-laws (of degrees, but also of any other quantity) can
be checked by plotting Nd,n on a log-log-plot. On such plots, a power-
law corresponds to a straight line: if Nd,n = c d−τ n, then log(Nd,n ) =
log(cn) − τ log d, so there is a linear relation between the quantities log d
and log(Nd,n ) that are used in the axes of a log-log-plot. Moreover, the
slope of the line is −τ, so we can recover the power-law parameter by esti-
mating the slope of the line in the log-log-plot. In practice one indeed often
finds a linear relationship in log-log-plots, though often with a cut-off point
that is earlier than the theoretically achievable value. We refer the reader
to Figure 5.5 in [LNR17] and Chapter 1.6 in [VDH09] for log-log-plots of
the degree distributions of various real-world networks.
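The slope-estimation procedure just described can be sketched in a few lines (illustrative only, not from the script; real data calls for more careful fitting, e.g. of cumulative counts or via maximum likelihood):

import math

def estimate_power_law_exponent(degrees, d_min=1, d_max=None):
    """Fit a straight line to (log d, log N_{d,n}) by least squares; -slope estimates tau."""
    counts = {}
    for d in degrees:
        counts[d] = counts.get(d, 0) + 1
    pts = [(math.log(d), math.log(c)) for d, c in counts.items()
           if d >= d_min and (d_max is None or d <= d_max)]
    mx = sum(x for x, _ in pts) / len(pts)
    my = sum(y for _, y in pts) / len(pts)
    slope = (sum((x - mx) * (y - my) for x, y in pts)
             / sum((x - mx) ** 2 for x, _ in pts))
    return -slope   # the fitted line has slope -tau on the log-log plot

In practice one fits only over an intermediate range of degrees (hence the d_min and d_max cutoffs), since both very small and very large degrees typically deviate from the straight line.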
Proof. We only give the calculation in the simplest case of a strong power-
law for densities. Then

    E[X] = ∑_{k=0}^{∞} k · Pr[X = k] = Θ( ∑_{k=1}^{∞} k · k^{−τ} ) = Θ( ∑_{k=1}^{∞} k^{−τ+1} ),

and the sum is finite for τ > 2 and infinite for τ < 2. Note that Θ(·) was
taken with respect to the limit k → ∞ and thus the hidden factors are
independent of k (and not, as usual, independent of n). This is why we
could take Θ(·) out of the sum.
For the second moment, the calculation is almost identical:

    E[X²] = ∑_{k=0}^{∞} k² · Pr[X = k] = Θ( ∑_{k=1}^{∞} k² · k^{−τ} ) = Θ( ∑_{k=1}^{∞} k^{−τ+2} ),

which is finite for τ > 3 and infinite for τ < 3.
For power-laws of the cumulative distribution, a sum like ∑_{k=0}^{∞} k² Pr[X = k]
can be linked to the sum ∑_{k=0}^{∞} k Pr[X ≥ k] via Abel summation,
the discrete version of integration by parts. We do not give the details.
Proof. (i). Let Nd,n be the number of vertices of degree d in G(n). Let
C > 0 be arbitrary. We will show that whp m > Cn if n is sufficiently
large.
By (3.2), the number of edges is at least
    m = (1/2) ∑_{v∈V} deg(v) ≥ (1/2) ∑_{d=0}^{D} d · N_{d,n} ≥ (whp) (1/2) ∑_{d=0}^{D} d · c_1 d^{−τ} n = (c_1 n / 2) ∑_{d=0}^{D} d^{1−τ}.     (3.3)

Since 1 − τ > −1, the sum over d^{1−τ} diverges. Hence there is a constant
k such that ∑_{d=0}^{k} d^{1−τ} > 2C/c_1. If we make n large enough, then D =
D(n) ≥ k, and (3.3) implies m > Cn.
(ii). We make a similar computation to (3.3). Since we want to estimate
m_{≤D} instead of m, we may use the following upper bound. (It is not an
equality because we count edges twice if both their endpoints have degrees
at most D.)

    m_{≤D} ≤ ∑_{d=0}^{D} d · N_{d,n} ≤ (whp) ∑_{d=0}^{D} d · c_2 d^{−τ} n ≤ c_2 n ∑_{d=0}^{∞} d^{1−τ}.     (3.4)

Unlike in (i), the sum in (3.4) converges to a constant C since 1 − τ <
−1. Hence m_{≤D} ≤ C c_2 n. The second statement in (ii) follows directly from
the definition of negligible cutoff error since m − m_{≤D} ≤ (whp) ∑_{d=D+1}^{n} N_{d,n} =
o(n).
We have found out that τ > 2 leads to sparse graphs (constant degrees),
while τ < 2 does not. Since we are interested in sparse graphs in this course,
we will from now on restrict to τ > 2. Let us compute how many edges
come from vertices of degree between K and D, for some large K. By a
similar calculation as above, the number of incident edges to such vertices
is at most
    ∑_{d=K}^{D} d · (c_2 d^{−τ} n) ≤ ∑_{d=K}^{∞} c_2 d^{1−τ} n = Θ( ∫_K^∞ c_2 x^{1−τ} n dx ) = Θ(K^{2−τ} n).
    E[deg(v)] = (n − 1) · Pr[v ∼ u]
              = (n − 1) ∫_1^∞ Pr[w_u = w] · Pr[v ∼ u | w_u = w] dw
              = Θ(n) ∫_1^∞ w^{−τ} · min{1, w w_v / n} dw
              = Θ(n) ( ∫_1^{n/w_v} w^{−τ} · (w w_v / n) dw + ∫_{n/w_v}^{∞} w^{−τ} · 1 dw )
              = Θ(w_v) ∫_1^{n/w_v} w^{1−τ} dw + Θ(n) · [w^{1−τ}]_{n/w_v}^{∞}
              = Θ(w_v) + Θ(n · (n/w_v)^{1−τ})            (using τ > 2)
              = Θ(w_v) + Θ((n/w_v)^{2−τ} · w_v) = Θ(w_v),

where the last step uses that n/w_v ≥ 1 and τ > 2. One checks that the
same end result also holds for n/2 < w_v ≤ n.
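A direct simulation illustrates Lemma 3.6 (a sketch under the assumption, as in the model above, that vertices receive independent weights with density ∝ w^{−τ} on [1, ∞) and that u ∼ v with probability min{1, w_u w_v / n}; the parameter choices are mine):

import random

def pareto_weight(tau):
    """Weight with density proportional to w^{-tau} on [1, infinity)."""
    return (1.0 - random.random()) ** (-1.0 / (tau - 1))

def sample_chung_lu(n, tau, seed=0):
    """Chung-Lu-type graph: u ~ v independently with probability min(1, w_u*w_v/n)."""
    random.seed(seed)
    w = [pareto_weight(tau) for _ in range(n)]
    adj = [[] for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if random.random() < min(1.0, w[u] * w[v] / n):
                adj[u].append(v)
                adj[v].append(u)
    return w, adj

w, adj = sample_chung_lu(2000, 2.5)
# degrees should be of the same order as the weights (Lemma 3.6: E[deg(v)] = Theta(w_v))
for v in sorted(range(len(w)), key=lambda v: w[v])[-5:]:
    print(f"w_v = {w[v]:8.1f}   deg(v) = {len(adj[v])}")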
In particular, for w_max := n^{1/(τ−1)} this expectation is Θ(1). This means that
the largest weight among the n vertices is roughly w_max. For w = ω(w_max)
we have E[n_{≥w}] = o(1), so such vertices are unlikely to exist by Markov's
In the proof we evaluated both of them, and it was a bit tricky to see that the second one
is negligible. However, none of this was actually necessary. With the right perspective,
it was a priori clear that the second integral was negligible.
Let us first summarize what we have already used in the computation. Both integrals
are over polynomials in w. Integrating over polynomials is easy. Since we do not consider
threshold cases, we only have exponents s ≠ −1, and the inverse derivative of w^s is
(1/(1+s)) · w^{1+s}. The integral ∫_{w_0}^{w_1} w^s dw can thus be easily evaluated and is
[(1/(1+s)) · w^{1+s}]_{w_0}^{w_1}. If s > −1, then w^{1+s} is increasing, and we get Θ(w_1^{1+s}).
(We assume here that w_1 is at least by a constant factor larger than w_0.) If s < −1,
then w^{1+s} is decreasing, so we obtain Θ(w_0^{1+s}). Note that the signs also work out in
the second case, since there are two canceling minus signs: from 1/(1+s) < 0 and from
evaluating the lower boundary. So the integral is always dominated either by the upper
boundary term or by the lower boundary term.
Let us call I_1^{low} and I_1^{upp} the two terms needed to evaluate I_1, i.e., I_1^{low} and I_1^{upp}
are obtained by plugging the lower (the upper) boundary into the inverse derivative
(1/(2−τ)) · w^{2−τ} · w_v/n, and similarly for I_2. In the proof of Lemma 3.6, the dominating
term was given by the lower boundary for both integrals, and we evaluated both I_1^{low}
and I_2^{low}.
However, a more clever argument uses the observation that |I_1^{upp}| = Θ(|I_2^{low}|). Before we
argue why this is generally true, let us check it by hand. For |I_1^{upp}|, we need to plug
n/w_v into the inverse derivative w^{2−τ} · w_v/n, and obtain up to constant factors

    |I_1^{upp}| = Θ((n/w_v)^{2−τ} · w_v/n) = Θ((n/w_v)^{1−τ}).

For |I_2^{low}|, we plug n/w_v into w^{1−τ}, and obtain the same term.
This is an incredibly helpful observation. Knowing this relation, we know that
• I_1^{low} dominates I_1^{upp}.
• I_1^{upp} is of the same order as I_2^{low}.
• I_2^{low} dominates I_2^{upp}.
From these observations it is obvious that I_1^{low} dominates everything else. Even more, in
the proof of Lemma 3.6 we had to give special treatment to the case n/2 < w_v ≤ n,
because our estimation of I_1 may fail if w_v is very close to n. This is because if the upper
and lower boundary (1 and n/w_v) are too close to each other, the terms I_1^{low} and I_1^{upp}
are very similar to each other and may cancel. (They have opposite signs.) But in fact,
we have nothing to fear: if I_1^{low} and I_1^{upp} are of the same order, then we have a third
term I_2^{low} which is also of the same order and which cannot be canceled out. So by our
meta-argument it becomes obvious that the asymptotics do not change in this case either.
Why is it generally true that |I_1^{upp}| = Θ(|I_2^{low}|)? This is because the two integrals I_1
and I_2 were obtained by splitting a single integral I = ∫_1^∞ f(w) dw into two parts. But
I was taken over a continuous function f. (The min function is not smooth, but it is
continuous.) This means that the functions in I_1 and I_2 take the same values when the
splitting point w_split = n/w_v is plugged in; it is just the same as plugging w_split into f.
How do we get I_1^{upp} and I_2^{low} from f(w_split)? Since we integrate over polynomials in
both cases, we just increase the exponent by one and plug in w_split, so in both cases we
obtain w_split · f(w_split) up to constant factors.
The argument may seem a little magic, so let us rephrase it in terms of the quantities
that we compute. We compute how many neighbours a vertex v of weight wv has in
expectation. The integral I has a natural interpretation: the range from w0 to w1 gives
us the number of neighbours with weight between w0 and w1 (all in expectation). The fact
that the first integral has exponent < −1 tells us that there are more neighbours of constant
weight than of larger weight, for example of weight in the interval [wsplit /2, wsplit ]. The
key insight is that the latter number is essentially the same as the number of neighbours
with weight in [wsplit , 2 wsplit ], due to three ingredients:
1. Both intervals have the same length, up to a constant factor of 2.
2. The probability density is the same up to a constant factor: increasing w by a
constant factor κ decreases the probability density Pr[wu = w] by the constant
factor κ−τ .
3. The connection probability Pr[u ∼ v | wu = w] only changes by a constant factor
if we vary w within [wsplit /2, 2 wsplit ]. This is because the connection probability
is continuous and piecewise smooth in wu .
So let us summarize how we should actually think about Lemma 3.6. (All statements
about expectations.)
(a) Since I1 has exponent < −1, there are more neighbours of constant weight than of
larger weights, in particular than weights in [wsplit /2, wsplit ].
(b) There are the same number of neighbours with weights in [wsplit /2, wsplit ] and with
weights in [wsplit , 2 wsplit ], up to constant factors.
(c) Since I2 has exponent < −1, there are more neighbours of weights in [wsplit , 2 wsplit ]
than of larger weights.
Thus, most neighbours have constant weight, and we can neglect any term that comes
from neighbours of larger weight. It thus suffices to compute how many neighbours of
constant weight there are. This is easy to compute. There are Θ(n) vertices of constant
weight, and each of them has probability Θ(wv /n) to connect to v. Hence, wv has Θ(wv )
neighbours.
As a final exercise, let us try to apply the same reasoning for 1 < τ < 2. How many
neighbours does wv have in this case? We still obtain the same integrals I1 and I2 , but
now the exponent 1 − τ of I1 is larger than −1. This means that v has more neighbours of
weight in [wsplit /2, wsplit ] than of smaller weights. On the other hand, the exponent −τ
in I2 is still smaller than −1, so there are more neighbours with weight in [wsplit , 2 wsplit ]
than neighbours with larger weight. So we only need to evaluate either I_1^{upp} or I_2^{low} (both
automatically give the same value up to constant factors). The term I_2^{low} looks a bit
simpler, so we plug w_split = n/w_v into the inverse derivative w^{1−τ} and obtain that v has
Θ((n/w_v)^{1−τ}) neighbours in expectation.4
    Pr[w_u = w | u ∼ v] = Pr[w_u = w and u ∼ v] / Pr[u ∼ v]
                        = Θ(1) · ( w^{−τ} · min{1, w w_v / n} ) / ( w_v / n ) = Θ(w^{1−τ}),
Of course, since degrees and weights are tightly coupled, a similar state-
ment would be true for the distribution of the degree of u instead of the
weight of u. Theorem 3.7 is rather remarkable since it says that the distri-
bution of wu does not depend on wv . This property is also called neutral
assortativity (or no assortativity). Assortativity is a measure for how much
the distribution of deg(u) depends on the value of deg(v), and in which di-
rection this connection goes. We do not give a formal definition (since there
are several competing ones), but informally speaking a graph has positive
assortativity if the distribution of deg(u) is more skewed towards larger
values if deg(v) is large, and more skewed towards smaller values if deg(v)
is small. If the connection goes into the opposite direction, we speak of
negative assortativity. As a rule of thumb, social networks tend to have
positive assortativity (nodes of large degree connect especially well to other
nodes of large degree), while many technological networks have negative as-
sortativity (nodes of large degree connect well to nodes of small degree).
Note that in the most common case τ ∈ (2, 3), the random variable W2
in Theorem 3.7 has infinite expectation. A bit more precisely, the limiting
distribution for n → ∞ has infinite expectation, while for finite n there
is a cut-off point that goes to infinity. Pointedly speaking, while you have
a constant number of friends, your friends have in expectation an infinite
number of friends.5
Of course, it is prevented by the cut-off point and the finite size of
the universe that your friends actually have infinitely many friends. But
the cut-off point goes to infinity as a polynomial in n, so it grows rather
fast. The expectation of wu is really large even for finite networks. On the
other hand, this is one of the cases where the expectation is dominated by
low-probability events (the rare event that you have a superstar as friend:
for most people it does not happen, but the slim chance that it happens
dominates the expectation). So for most people the situation looks a bit
less depressing. A more accurate estimate of the typical "most popular
friend" is given by Theorem 3.8.
In Theorem 3.8, note that the exponent 1/(τ − 2) is larger than one
for τ ∈ (2, 3). Hence, v has a neighbour of much larger weight than v
itself. On the other hand, if τ > 3 then for a large-weight vertex v typically
all its neighbours have much smaller weight than v. However, the model
is generally less interesting for τ > 3. The large-degree vertices are then
negligible for most questions since there are too few of them, for example for
the small-world properties discussed in the next section. In such respects,
the Chung-Lu model for τ > 3 behaves just like the Erdős-Rényi model.
For the probabilities in Theorem 3.8, both terms approach 1 as w in-
creases. However, the probability in (i) approaches 1 very rapidly as w in-
creases (“stretched exponentially”), while the probability in (ii) approaches
1 more slowly (polynomially fast in w).
6
This is the only reason for assuming w ≥ 2, to get rid of some notational ballast. This
simplification would not be true for w = 1.
uniformly at random from the giant component. Then the graph
distance d(x, y) satisfies
    d(x, y) = (2 ± o(1)) · (log log n) / |log(τ − 2)|.     (3.7)
Proof (sketch). We will only show the upper bound, and only sketch the
main idea. Let ε > 0, and let us write η := 1/(τ − 2) − ε for brevity. Since
τ < 3, we may choose ε so small that η > 1.
Consider a vertex v0 of large constant weight w. Then by Theorem 3.8
v0 has a neighbour v1 of weight w1 = w^η. Applying the same theorem
again, v1 has a neighbour v2 of weight w2 = w1^η = w^{η²}. Iterating, we find
after k steps a vertex v_k of weight w_k = w^{η^k}. We want to reach weight at least n^{1/2},
which gives the condition k ≥ log_η((1/2) log_w n) = (log log n)/(log η) − O(log log w).
We apply this reasoning for both x and y. From each of them, we
find paths of length k to vertices x' and y' of weight at least n^{1/2}. By
The last formula falls essentially into this category since there are constants
c1, c2 such that whp c1·n ≤ W ≤ c2·n. There are some subtleties: it is no
longer true that the events "u ∼ v" and "u ∼ v'" are independent of each
other.
Other distributions
Multigraph Variation
Figure 3.2: In the configuration model, every vertex vi gets di stubs (half-
edges), and the stubs are connected via a random perfect matching. It may
happen that loops or multi-edges are created.
the number of loops and multiple edges is very small, so we are still close
to the target degree sequence. Moreover, by the same argument as above,
loops and multiple edges mostly affect vertices of large degree, where one
edge fewer might be tolerable.
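The construction from Figure 3.2 is only a few lines of code (a sketch of my own, not from the script; it returns a multigraph as a list of edges, possibly including loops and multi-edges):

import random

def configuration_model(degrees, seed=0):
    """Create d_i stubs for vertex i and pair all stubs by a uniformly random
    perfect matching. The total degree must be even."""
    random.seed(seed)
    stubs = [v for v, d in enumerate(degrees) for _ in range(d)]
    assert len(stubs) % 2 == 0, "sum of degrees must be even"
    random.shuffle(stubs)                  # a uniform shuffle induces a uniform matching
    return [(stubs[i], stubs[i + 1]) for i in range(0, len(stubs), 2)]

edges = configuration_model([3, 2, 2, 2, 1, 1, 1])   # degree sum 12, even
loops = sum(1 for u, v in edges if u == v)
print(edges, "loops:", loops)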
One reason why the configuration model is popular is that in the case
of finite second moments E[D2 ], it can be analyzed with amazing precision.
For example, let us call D2 the degree of a random neighbour u of a random
vertex v. It is not hard to see that D2 only depends on the degree sequence,
and to compute its expectation:
    E[D_2] = E[D²] / E[D].
This means that we can employ all the machinery that we have learned for
Erdős-Rényi graphs in this more general case. In particular, we can couple
a local exploration of the configuration model with a Galton-Watson tree
with offspring distribution D2 − 1. The “−1” accounts for the fact that
the parent is also a neighbour that must not be counted as offspring. In
particular, the Galton-Watson tree has positive survival probability if and
only if µ = E[D_2] − 1 > 1, or equivalently

    E[D(D − 2)] > 0.     (3.12)
Hence, the giant component would also form in the truncated graph where
we reduce all large degrees to C (and even if we just remove them from the
graph). Likewise, since we can approximate µ to arbitrary precision by a
truncated distribution D C , the typical distance log n/ log µ is a property
that arises from vertices of degree at most C. Vertices of larger degree play
essentially no role for shortest paths in the graph. Note how different this
is from Chung-Lu graphs with exponent τ ∈ (2, 3), where shortest paths
were obtained by reaching vertices of very large weight in O(log log n) steps.
A particular case for which E[D2 ] = O(1) are power-law networks with
exponent τ > 3 (Chung-Lu or configuration model). In both models, the
high-weight vertices do not even help to cut the lengths of shortest paths.
Moreover, they do not even play a role for the formation of a giant com-
ponent: a giant exists if the low-weight vertices are dense enough to form
it on their own, and otherwise it does not exist. For most purposes, it is
adequate to consider power-law networks with τ > 3 as networks in which
there are too few large-degree vertices to affect the global structure. Of
course, some things do change. For example, power-law networks with
τ > 3 still contain vertices of polynomial degree n1/(τ−1) . As a triv-
ial consequence, in the subcritical regime without a giant component, they
still contain components of size at least n1/(τ−1) , even if they are little more
than a star around a central vertex of that degree. Recall that such com-
8
The reference uses a slight variation of our model here.
ponents do not exist in Erdős-Rényi networks by Lemma 2.6. On the other
hand, the supercritical regime (both for τ ∈ (2, 3) and for larger τ if the
Molloy-Reed criterion (3.12) is satisfied) is similar to Erdős-Rényi graphs
with µ > 1: the fraction of components of size s decays exponentially in
s as in Theorem 2.4. The reason is the same as for the Erdős-Rényi case:
in the corresponding Galton-Watson process, if we have k vertices in some
layer, then each of them has the same positive probability to become the
root of an infinite subtree, and this is independent for all k vertices. So the
probability of staying finite is exponentially small in k. (For the configura-
tion model, it is still true that a BFS can be coupled to a Galton-Watson
process, though the proof is a bit harder than for the Chung-Lu model.)
    ( deg_{G_k}(v_i) + δ ) / ∑_{j=1}^{k} ( deg_{G_k}(v_j) + δ ),
Note that we allow any δ > −M, so we can achieve any power-law
exponent τ > 2.
We will not give a proof of Theorem 3.10, but in the following we will
try to make it plausible. Firstly, it may seem that the random process is
rather unpredictable, but it is not. Rather the opposite, it can be shown
that the time of “birth” (i.e., the index k of a vertex) plays almost the same
role as the weight for Chung-Lu random graphs, where the “weight” of vk is
(n/k)^{1/(τ−1)}. I.e., whp the number of neighbours of vertex vk is proportional
to (n/k)1/(τ−1) if n/k is large. This is the same deterministic formula that
is sometimes used as a fixed weight sequence in the Chung-Lu model to
generate power-law graphs of exponent τ. However, there are also some
systematic deviations from the Chung-Lu model. In particular, recall that
every vertex receives M edges at birth, so there are no isolated vertices.
Moreover, by induction the graph is connected at all times.11
11 There are also variants where small components can form, similar to Erdős-Rényi
graphs. For example, we may not equip a new vertex with exactly M edges, but only with
M edges in expectation, where the exact number may be zero with positive probability.
It is mostly a matter of taste whether generating only connected graphs is a bug or a
feature of the model.
So where does the power-law come from? Let us denote the degree of
vertex vi at time t (i.e., when the graph has t vertices) by Di,t , and assume
that Di,t is large for some t. At this time, the total number of edges is tM.
Therefore, when an edge chooses a random endpoint, then the probability
p_{i,t} that it chooses v_i is

    p_{i,t} = (D_{i,t} + δ) / ((2M + δ) t).     (3.13)
Now consider the next εt rounds. A total of εtM edges will be added
during this time. This means that the denominator of (3.13) changes little
during this time. Let us momentarily assume that pi,t also stays almost
constant in this period. (We can achieve this by choosing ε small enough.)
Then the expected number of edges that hit vi during this time is roughly
    εtM · (D_{i,t} + δ) / ((2M + δ) t) ≈ εtM · D_{i,t} / ((2M + δ) t) = ε D_{i,t} / (2 + δ/M).     (3.14)

So during this phase, the total number of vertices and edges in the graph
increases by a factor of (1 + ε), while the degree increases by a factor of
(1 + ε/(2 + δ/M)) = (1 + ε/(τ − 1)) ≈ (1 + ε)^{1/(τ−1)}. By iterating the argument, we see
that this is true for any factor: when the number of vertices increases by
a factor C, then the degrees increase by a factor C1/(τ−1) . In particular, if
vertex vk is inserted at time k, then its degree may grow in the time interval
from k to n. In this time interval, the number of vertices grows by a factor
of n/k (it grows from k to n), and therefore the degree grows from Θ(1) to
Θ((n/k)1/(τ−1) ). In particular, how many vertices are there with degree at
least x? Assuming concentration and ignoring the hidden constant factors,
it is exactly those vk for which (n/k)^{1/(τ−1)} ≥ x, or equivalently those vk for
which k ≤ x^{1−τ} n. There are exactly x^{1−τ} n such integers k, so the fraction
of vertices of degree at least x is x^{1−τ}. This is the cumulative power-law
condition with exponent τ.
Of course, a full proof needs a lot of concentration bounds to make the
argument precise, but not much more than that.
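The growth rule itself is easy to simulate (a sketch of my own, not from the script; for simplicity it uses δ = 0, in which case an endpoint chosen proportionally to the current degree can be sampled as a uniform entry of a list that contains every vertex once per incident half-edge):

import random

def preferential_attachment(n, M, seed=0):
    """Grow a graph vertex by vertex; each new vertex sends M edges whose endpoints
    are chosen with probability proportional to the current degree (delta = 0).
    Returns the degree of every vertex."""
    random.seed(seed)
    deg = [M] * 2 + [0] * (n - 2)
    endpoints = [0] * M + [1] * M      # start with M parallel edges between vertices 0 and 1
    for v in range(2, n):
        targets = [random.choice(endpoints) for _ in range(M)]
        deg[v] = M
        for t in targets:
            deg[t] += 1
            endpoints.extend([v, t])   # the new edge adds one half-edge at v and one at t
    return deg

deg = preferential_attachment(20000, M=2)
# old vertices should have much higher degrees: deg(v_k) ~ (n/k)^{1/(tau-1)} with tau = 3 for delta = 0
print([deg[k] for k in (2, 20, 200, 2000, 19999)])

This is only one variant of the model (the script's version with general δ and its exact tie-breaking may differ in details), but it already exhibits the age-degree correlation discussed below.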
The preferential attachment model is compelling for two reasons. Firstly,
it gives a possible explanation for the origin of the power law. Secondly,
it is dynamic and models how the graph changes over time. For some
questions, we would like to work with such models. For example, for some
networks there is birth time data available, i.e., data about the time when
nodes joined a network. If we want to study questions related to this,
then dynamic models like preferential attachment are the models of choice.
If one is only interested in the static network of size n that is generated
in the end, and not in the history of the process, then the preferential at-
tachment model is less attractive. The process introduces dependencies and
thus makes it very technical to analyze rigorously. In fact, most analyses
of the model show that the resulting graph is very similar to a Chung-Lu
model or configuration model, and use this connection to prove that the
corresponding statements transfer to the preferential attachment model.
(Just mind the obvious differences in component structure and low-degree
vertices since the minimum degree is M.) As usual, there is no “best”
model, and it depends on the question of interest which model to choose.
Preferential attachment networks have a rather rigid dynamics. On the
one hand, this is very helpful for analyzing them. On the other hand, the
dynamics are too rigid to match real evolving networks well, like the web
graph (webpages and hyperlinks) or citation networks (scientific papers and
citations). To understand the problem, let us assume in the preferential
attachment model that the degrees of two vertices u, v differ by a constant
factor at time t, e.g. deg_t(u) ≥ 2 deg_t(v). If both u and v are large12 then
this will stay true throughout the whole process: it is very unlikely that v
will ever overtake u. In particular, the final degree is mostly determined
by the age of the vertex, and the highest degrees are obtained by the oldest
vertices. In real-world networks, the age of a node (a webpage, a paper,
. . .) is certainly correlated with its degree, but the correlation is much
weaker than in preferential attachment networks. This limitation can be
overcome by combining the ideas of Chung-Lu random graphs and pref-
erential attachment. In this model, the graph is still generated vertex by
vertex, but each vertex also obtains a weight. The connection probability
to vertex v is then a function of the degree of v (at time t) and of the weight
of v. The resulting graphs are similar to ordinary preferential attachment
graphs, but the dynamics are more realistic. We do not go into further
detail. A discussion can be found in [LNR17, Chapter 6.5].
3.5.2 Weaknesses
The biggest weakness of Chung-Lu and configuration models is that they
have no clustering or community structure. The clustering coefficient is
easily seen to be o(1). It is slightly larger than for Erdős-Rényi graphs
because of the skewed neighbourhood degree distribution: the neighbour-
hood of a vertex v has an increased probability to contain vertices of large
weight, and those are more likely to connect to each other. However, the
effect is not large: most neighbours of v are still of constant weight, and
those still have probability Θ(1/n) to connect to each other.
This lack of triangles extends to larger cycles, to cliques, and to other
small and dense subgraphs. The networks look locally tree-like (which
is essentially the same as saying that they are well described by Galton-
Watson processes). In particular, if we start from a random vertex and
explore the graph, usually we obtain in the first rounds a subgraph of k
vertices which is a tree. Thus it has only k − 1 edges and has minimal
density. However, it can be tricky to find the proper statistics here. For
example, if we try to simply count the number of connected subgraphs of
size k which have more than α·k edges for some α > 1, then this may
yield a large number. The vertices of weight at least √n form a gigantic
clique, and this contains very many subcliques of size k, all of which are
counted in the statistics. Still, the networks are considered not to have
community structures, even though measurement can be tricky.13 This
lack of communities is the weakest point of the models.
13
In fact, the configuration model is sometimes used as baseline, and other networks are
said to have community structure if they have more densely connected subgraphs than
the corresponding configuration model with the same degree distribution.
Chapter 4
Geometric Graphs
more fundamental property of real-world networks, locality. The problem
is that we do not really know what exactly we mean by locality, so we
measure it by auxiliary measures. The clustering coefficient is one of them,
and the hexagonal lattice happens to behave much better with respect to
this measure than the square lattice. However, this does not necessarily
mean that the hexagonal lattice is a better representative of real-world
networks than the square lattice. We should be careful not to overfit to a
single auxiliary measure. In this case, we can easily see this by switching
to other auxiliary measures. For example, real-world networks also have
many copies of K4 as subgraphs, and those exist neither in the square lattice nor
in the hexagonal lattice. Still, we will see some applications where either
type of lattice models locality just fine.
As an alternative, for d ≥ 2 it is possible to use Random Geometric
Graphs instead of grids. In this setting, n vertices are randomly placed
in a d-dimensional cube of volume n (with or without torus topology).
Then two vertices are connected if and only if they have distance at most
r, where r is a parameter of the model. If d ≥ 2 and r is a sufficiently large
constant, then one can show that the graph has a giant component, and
the remaining components show a stretched exponential tail bound as for
Erdős-Rényi graphs.2
2
Stretched exponential means that the fraction of vertices in components of size s is at
most η^(s^ε) for some constants η < 1 and ε > 0. This is still a very fast decaying function
in s, though not quite as fast as an exponential function. Arguably, this type of tail
bound is even more plausible than a proper exponential, since the non-giant components
in real-world networks are small, but not quite as small as an exponential tail would
suggest.
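The following Python sketch generates a random geometric graph as just described: n points placed uniformly at random in a cube of volume n, connected whenever their distance is at most r. The Euclidean distance, the torus option and the brute-force O(n^2) edge test are illustrative simplifications.

import random, math

def random_geometric_graph(n, r, d=2, torus=True, seed=0):
    """Place n points uniformly in the cube [0, n^(1/d)]^d (volume n) and
    connect two points iff their Euclidean distance is at most r."""
    rng = random.Random(seed)
    side = n ** (1.0 / d)
    pts = [[rng.uniform(0, side) for _ in range(d)] for _ in range(n)]

    def dist(p, q):
        s = 0.0
        for a, b in zip(p, q):
            diff = abs(a - b)
            if torus:                            # wrap around on the torus
                diff = min(diff, side - diff)
            s += diff * diff
        return math.sqrt(s)

    edges = [(u, v) for u in range(n) for v in range(u + 1, n)
             if dist(pts[u], pts[v]) <= r]
    return pts, edges

pts, edges = random_geometric_graph(n=2000, r=1.5)
print("average degree:", 2 * len(edges) / len(pts))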
with probability p we add the edge from u to v.
Proof. We will only prove the upper bound. Partition the grid into n' :=
n^2·p/2 cubes of volume U := 2/(np), i.e., the side length of the cubes is
U^{1/d}. We will ignore rounding issues and assume for simplicity that U^{1/d}
3
The paper The strength of weak ties by Mark Granovetter [Gra73] is the most cited
paper in sociology of all time.
4
In the sense that for every ε > 0 there are c1, c2 > 0 such that two randomly chosen
vertices u, v have distance c1·n ≤ d(u, v) ≤ c2·n with probability at least 1 − ε.
is an integer. By construction, U is the number of vertices in each block,
and by the assumption on p we have U = o(n) and U = Ω(1).
Consider the graph G' = (V', E') where the vertex set is the set of cubes,
G'. We will next show that p' ≥ 1.5/n'. Note that this is plausible since
Figure 4.1: Partitioning into blocks for d = 1. The induced graph between
blocks is a supercritical Erdős-Rényi graph on n' vertices.
Thus it has
where in step (∗) we have used that pn^2 = ω(1). Hence, for large n we have
p' ≥ 1.5/n' and the Erdős-Rényi graph G' is supercritical. In particular,
this means that the typical distances in the giant component of G' are
Θ(log n').
Now we consider two random vertices v1, v2 ∈ V and bound their dis-
tance. For simplicity, let us assume that their corresponding cubes C1, C2
are in the giant component of G'. (Otherwise we can walk along the grid
to cubes which are in the giant.) Then we need to use O(log n') edges in
G' to get from C1 to C2. Each such edge corresponds to an edge of G between
two vertices in the respective cubes, so between consecutive edges we may
have to walk within a cube from some vertex u1 to some vertex u2.
But since all the cubes have side-length O(U^{1/d}), we can walk from u1 to
u2 along the grid in O(U^{1/d}) steps. Thus we can walk from v1 to v2 in G
in O(U^{1/d} · log n') steps.
majority of test subjects continued to seemingly torture a fellow test person to death,
simply because a scientist in a lab told them that this is how the protocol goes. If you
don’t know about this experiment, you should read about it: https://en.wikipedia.
org/wiki/Milgram_experiment.
networks. In the most famous experiment, Milgram gave a letter to some
person A in the US Midwest that was addressed to some person B at the
US East Coast. However, A was not allowed to send the letter directly to
B. Instead, A was only allowed to send the letter to a personally known
contact A', defined as someone whom A knew on a first-name basis. A was
supposed to pick a neighbour A' who was more likely to know the target
B. Then A' continues the process in the same manner, i.e., A' sends the
diameter. However, the following theorem shows that these shortcuts are
not too helpful for navigation.
6
The success rate in the first experiments was only 5-30%, but could be increased to
up to 85% in later variations where letters were replaced by phone and email.
7
In the original experiment, the participants were also told the job of the target.
Proof. For the lower bound, we may assume that s and t have Manhattan
distance at least ∆ := ½·p^{−1/(d+1)}, since this is asymptotically smaller than
the side length of the grid. Consider the event E that during the first ∆
steps, the algorithm does not uncover a random edge whose endpoint has
Manhattan distance at most ∆ from t. First we show that conditional on
E, it is impossible for the algorithm to reach t in ∆ steps. Consider the last
random edge e that the algorithm takes during those ∆ steps. Then after
taking this edge, the algorithm has Manhattan distance more than ∆ from
t. By definition of e, it only uses grid edges afterwards, so it does not reach
t in ∆ steps. The same applies if the algorithm does not take any random
edges at all during the first ∆ steps.
Next we show Pr[E] ≥ 1/2. Note that this will conclude the proof of
the lower bound, since it implies that the expected number of steps is at
least ∆/2, as required. The crucial insight is that since the random edges
are uniformly at random, it does not matter which path the algorithm
takes during the first ∆ steps. By symmetry, the probability of E does not
depend on the set of explored vertices, but only on the number of explored
vertices. So let us compute Pr[E].
Let us consider the ball around t of radius ∆ with respect to the Man-
hattan distance. It contains less than ∆^d vertices. Thus, when we explore a
new vertex v, the probability to find a random edge into this Manhattan ball
is at most p·∆^d by a union bound. By another union bound over
the first ∆ steps, the probability that this happens in any of those steps is
at most p·∆^{d+1} = 2^{−d−1} ≤ 1/2, so Pr[E] ≥ 1/2. This concludes the proof
of the lower bound.
For the upper bound, we compute the time until we reach Manhattan
distance ∆ from t. We pessimistically assume that we need to wait for a
random edge into that region. Since the Manhattan ball has size Ω(∆d ),
in each step we have probability of Ω(p∆d ) of discovering such an edge.
Hence, the expected time until we find such an edge is O(p−1 ∆−d ) = O(∆).
Afterwards, we need at most ∆ more steps to proceed to the target. Thus
the expected number of steps is O(∆).
To make Theorem 4.3 more concrete, for d = 1 and p = 1/n the lower
bound is Ω(√n) steps, even though the typical distances are only O(log n).
Random edges are not completely useless. Without them, the typical dis-
tance would be Ω(n). However, navigation is much less efficient than
shortest paths. In general, for any dimension d and any p in the specified
range, the time for local navigation is always polynomial in n, while the
typical distance may be polylogarithmic for some values of p.
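As a small experiment matching this discussion, the following Python sketch runs greedy local navigation on a ring whose vertices each get one uniformly random shortcut; this is only a simplified stand-in for the model with p ≈ 1/n, and the concrete greedy rule (always move to the neighbour closest to t in ring distance) is an illustrative assumption. The measured number of steps grows roughly like √n, in line with the lower bound, even though the graph has much smaller typical distances.

import random

def greedy_routing_ring(n, num_trials=200, seed=0):
    """Ring of n vertices with grid edges {i, i+1 mod n}; in addition every
    vertex gets one shortcut to a uniformly random vertex. Greedy routing
    always moves to the neighbour that is closest to the target in ring
    distance, so the distance decreases in every step and the walk ends."""
    rng = random.Random(seed)
    shortcut = [rng.randrange(n) for _ in range(n)]

    def ring_dist(a, b):
        d = abs(a - b)
        return min(d, n - d)

    total_steps = 0
    for _ in range(num_trials):
        s, t = rng.randrange(n), rng.randrange(n)
        v, steps = s, 0
        while v != t:
            nbrs = [(v - 1) % n, (v + 1) % n, shortcut[v]]
            v = min(nbrs, key=lambda u: ring_dist(u, t))
            steps += 1
        total_steps += steps
    return total_steps / num_trials

for n in (1000, 4000, 16000):
    print(n, round(greedy_routing_ring(n), 1), round(n ** 0.5, 1))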
In 2000, Jon Kleinberg proposed a model in which routing is possible in
poly-logarithmic time [Kle00]. The main difference to the Watts-Strogatz
model is that edges are no longer placed uniformly at random, but rather
the probability for placing an edge depends on the distance of the two
endpoints.
Figure 4.2: The distance between v and t is ∆. The annulus A_r(v) contains
all points in distance [∆/2, ∆) from v, and the ball B_{3∆/4}(t) contains all
vertices in distance at most 3∆/4 from t. Both have volume Θ(∆^d), and
their intersection also has volume Θ(∆^d). (Balls in Manhattan distance
look like diamonds, but the conclusion would also hold for any other norm.)
The idea behind the model is that the geometric position captures prop-
erties and categories of the nodes. In social networks, this might be pro-
fession, place of living, or hobbies. The weight captures the popularity of
a node. The connection probability increases with the popularity of the
nodes, and is larger for nodes which are geometrically close to each other.
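A minimal Python sketch of such a graph is given below. It assumes the connection probability from (5.1) has the form puv = min{1, (wu wv/‖xu − xv‖^d)^α} (the expression discussed in the following paragraphs), Pareto weights with exponent τ, uniform positions in a cube of volume n, and the ∞-norm on the torus; the quadratic-time loop is only suitable for small n.

import random

def sample_girg(n, d=2, tau=2.5, alpha=1.5, seed=0):
    """Sketch of a GIRG sampler: power-law weights with exponent tau, uniform
    positions in a cube of volume n, and connection probability
        p_uv = min(1, (w_u * w_v / dist(u, v)**d) ** alpha),
    where dist is the infinity-norm on the torus."""
    rng = random.Random(seed)
    side = n ** (1.0 / d)
    # weights with tail P[W > x] = x^(1 - tau) for x >= 1
    w = [(1.0 - rng.random()) ** (-1.0 / (tau - 1)) for _ in range(n)]
    x = [[rng.uniform(0, side) for _ in range(d)] for _ in range(n)]

    def dist(u, v):
        return max(min(abs(a - b), side - abs(a - b)) for a, b in zip(x[u], x[v]))

    edges = []
    for u in range(n):
        for v in range(u + 1, n):
            p = min(1.0, (w[u] * w[v] / dist(u, v) ** d) ** alpha)
            if rng.random() < p:
                edges.append((u, v))
    return w, x, edges

w, x, edges = sample_girg(3000)
print("edges:", len(edges), " max weight:", round(max(w), 1))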
There are some choices in (5.1) which do not immediately have an ob-
vious reason. One is the appearance of the exponent α. Another is why
the distance ‖.‖ should have exactly exponent d. We will return to both
of these questions later in more detail (Sections 5.1.1 and 5.5). We will also
see that we could have used any other norm. The only reason to use the
∞-norm is that formulas look a little bit nicer, since any two points in X
have distance at most n^{1/d}. In the following, we will omit the index ∞
from the norm and simply write ‖xu − xv‖.
But first we will show that formula (5.1) yields the same marginal prob-
abilities as the Chung-Lu model.
Note that the splitting point (wu wv)^{1/d} lies in the integration range [0, n^{1/d}]
because wu wv ≤ n. In the first integral, the minimum is taken by 1, in the
second integral the minimum is taken by wu wv/r^d. Hence, (5.2) simplifies
to
∫_0^{(wu wv)^{1/d}} Θ(r^{d−1}) · 1 dr + ∫_{(wu wv)^{1/d}}^{n^{1/d}} Θ(r^{d−1−dα}) · (wu wv)^α dr.
as required.
(d) There are n^{Ω(1)} vertices of weight at least n^{1/2}, which form a
single clique. We call this set of vertices the inner core.
The GIRG model is less fragile than the Kleinberg model with respect to
the exponent d in the term ‖xu − xv‖^d in (5.1). If instead we choose any
different exponent d' > d/α, then the resulting graph still has a power-law
degree distribution and is a
small world. Thus we do not rely on d' being one specific value. There is a
whole range [d/α, 2d/(τ − 1)] of possible exponents, and one easily checks
that this interval is non-empty. However, the case d' = d has a convenient
parametrization: for other values of d' we do not have E[deg(v)] = Θ(wv),
but instead we have E[deg(v)] = Θ(wv^{d/d'}). Since weights and degrees fall
The role of α
The terms weak ties and strong ties also make sense for GIRG. However,
unlike for the Watts-Strogatz model, in GIRGs there is a continuous
spectrum between strong and weak ties. For an edge uv, if wu wv/‖xu −
xv‖^d ≥ 1 then the edge is a strong tie, and if wu wv/‖xu − xv‖^d is “much”
smaller than one, then the edge is a weak tie. It is a matter of taste where to
draw the line. In order to have a clear distinction, we define an edge uv to be
a strong tie if and only if wu wv/‖xu − xv‖^d ≥ 1. Informally, a more natural
convention might be that uv is a strong tie if wu wv/‖xu − xv‖^d = Ω(1), and
that it is a weak tie if wu wv/‖xu − xv‖^d = o(1). However, this would not
give us a clear distinction between strong and weak ties for a fixed edge in
a fixed graph for some concrete value of n, which is why we do not use this
convention.
The exponent α ensures that “most” edges are strong ties.4 For illustra-
tion, let us focus on vertices of constant weight. Recall that those form the
majority of vertices. Fix a vertex v of weight wv = O(1). We want to study
the number of neighbours of weight O(1) of v, and we want to understand
how this is affected by the exponent α. As in the analysis of the Kleinberg
model, Theorem 4.5, we want to understand the number Nr of neighbours
of constant weight in distance [r, 2r] from v. There are Θ(r^d) vertices in
this distance range. If we omitted the exponent α, then the connection
probability would be wu wv/r^d = Θ(r^{−d}), so E[Nr] would be Θ(1). Thus
we would have exactly the same situation as in the Kleinberg model, and a
vertex would have the same number of neighbours (of constant weight) in
every distance range. In total, this would lead to a degree of Θ(log n). (Or
we could put a factor 1/ log n in front of the probability as in the Kleinberg
4
This is to be taken with a grain of salt. With our strict definition of strong ties, still
a Θ(1)-fraction of all edges are weak ties, and it could even be more than half. With the
informal alternative, it would be a o(1)-fraction.
model, to get constant degrees.) However, with the exponent α we obtain
E[Nr] = Θ(r^d · (r^{−d})^α) = Θ(r^{d(1−α)}). Thus, for α > 1 the number of neighbours per
distance range decreases with r, and most neighbours are close to v. In
fact, there is nothing special about the function xα that we applied here.
It can be shown that we could take any non-negative increasing function
f(x) with f(1) = 1 and ∫_1^∞ f(1/x) dx < ∞, and define
puv := min{ 1, f( wu wv / ‖xu − xv‖^d ) }.
The resulting graph model would work just as well as the GIRG model.
The function f determines how quickly the number of weak ties decays with
increasing distance.
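The effect of α can also be seen with a few lines of Python: summing the per-band counts Θ(r^{d(1−α)}) over the dyadic distance bands r = 1, 2, 4, ... (constants suppressed) gives a total of order log n for α = 1, but a constant for α > 1, with most of the mass in the innermost bands.

def expected_neighbours_per_band(alpha, d=2, n=10**6):
    """For a constant-weight vertex, the expected number of constant-weight
    neighbours in distance [r, 2r] scales like r^d * (r^-d)^alpha = r^(d(1-alpha)).
    Sum this over dyadic bands r = 1, 2, 4, ... up to the diameter n^(1/d)."""
    bands = []
    r = 1.0
    while r <= n ** (1.0 / d):
        bands.append(r ** (d * (1 - alpha)))
        r *= 2
    return bands

for alpha in (1.0, 1.5, 2.0):
    bands = expected_neighbours_per_band(alpha)
    print(f"alpha={alpha}: total ~ {sum(bands):.1f}, "
          f"first band {bands[0]:.2f}, last band {bands[-1]:.2g}")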
An important special case is known as the threshold GIRG model or
as α = ∞. In this case, we set f(x) := 0 for all 0 ≤ x < 1, and f(x) := 1 for
x ≥ 1. So, we connect two vertices if and only if wu wv/‖xu − xv‖^d ≥ 1. This is an
same. The only minor difference to a GIRG is that there we place exactly
n = Vol(X) points in X. For G', every vertex has chance Vol(X')/n to
E[N^strong_{[r,2r]}] = Θ(E[N^strong_{≥r}]) and E[N^weak_{[r,2r]}] = Θ(E[N^weak_{≥r}]).
Proof. We first consider the interval [r, 2r]. The number of vertices with
distance in [r, 2r] from v is Θ(r^d). We will only give the calculation under
the simplifying assumption that all those vertices have distance exactly r
from v. Then a vertex u in distance r yields a strong tie with v if and only
if it has weight wu ≥ r^d/wv. Moreover, we have the probability density
Pr[wu = w] = Θ(w^{−τ}), and hence
E[N^strong_{[r,2r]}] = Θ(r^d) ∫_{r^d/wv}^{∞} w^{−τ} dw = Θ(r^d · (r^d/wv)^{1−τ}) = Θ(r^{d(2−τ)} wv^{τ−1}).   (5.4)
On the other hand, a vertex u in distance r forms a weak tie with v if
and only if i) it has weight wu < r^d/wv, and ii) it forms an edge with v.
The probability density that it has weight w is Θ(w^{−τ}), and therefore
E[N^weak_{[r,2r]}] = Θ(r^d) ∫_1^{r^d/wv} w^{−τ} · min{1, (w·wv/r^d)^α} dw   (5.5)
             = Θ(r^d) ∫_1^{r^d/wv} w^{−τ} · (w·wv/r^d)^α dw
             = Θ(r^{d−dα} wv^α) ∫_1^{r^d/wv} w^{α−τ} dw.
If α − τ < −1, then we have to evaluate the integral at the lower boundary.
Moreover, µ = α in this case, and hence
E[N^weak_{[r,2r]}] = Θ(r^{d−dα} wv^α) = Θ(r^d · (r^d/wv)^{−µ}).
If instead α − τ > −1, then we have to evaluate the integral at the upper
boundary. Since this is cumbersome, we cleverly observe that the inte-
gral (5.5) is in fact the same integral that we evaluated in (5.4), only with
a different integration range. Since we evaluate the integral both times at
the same value (the lower boundary in (5.4), the upper boundary in (5.5)),
by Excursion 3.1 they must give the same value up to constant factors.
Hence,
E[N^weak_{[r,2r]}] = Θ(E[N^strong_{[r,2r]}]) = Θ(r^d · (r^d/wv)^{1−τ}).
Since µ = τ − 1, this proves the claim for the interval [r, 2r].
For the distances ≥ r, since τ − 1 > 1 and µ > 1, we observe that
E[N^strong_{[r,2r]}] and E[N^weak_{[r,2r]}] are decreasing in r. If we start at r' := r and sum
over the dyadic intervals [2^i r', 2^{i+1} r'), the sum is dominated by its first term,
which proves the claim for distances at least r.
Both E[N^strong_{≥r}]
and E[N^weak_{≥r}] are decreasing in r. Hence, the further away we go from v,
the fewer neighbours we find. So, “most” edges are in or close to the ball
of influence. The term “most” is slightly imprecise because there is a soft
transition as we increase the distance from v. Moreover, within distance
Θ(r), strong neighbours need to have weight at least Ω(r^d/wv) by definition
of strong ties. In the proof, we evaluated the integral in (5.4) at the lower
boundary. This means that “most” of the strong neighbours in distance r
have the minimal possible weight Θ(r^d/wv).
For weak ties, there are two different regimes: for α > τ − 1 there
are few weak ties, and most weak neighbours in distance r have so large
weight that they almost qualify as strong ties, i.e., their weight is only a
constant factor below the threshold rd /wv . In this case, the majority of
strong ties and weak ties look rather similar to each other. For α < τ − 1
there are many weak ties. In particular, there are asymptotically more weak
neighbours than strong neighbours in distance r as r → ∞, and “most” of
the weak neighbours in distance r have weight Θ(1). Rephrased, in the
case α > τ − 1, a random neighbour in distance r ≥ rI(v) typically has
large weight Θ(r^d/wv), while it typically has small weight Θ(1) in the case
α < τ − 1. Note that we can only observe this distinction outside of the
ball of influence, so only for radii r > rI (v). Since v connects to all vertices
inside the ball of influence, picking a random neighbour is the same as
picking a random vertex inside the ball, which will likely have weight Θ(1),
regardless of the values of α and τ.
As an interesting corollary of Lemma 5.4, we obtain that GIRGs have
a large clustering coefficient.
properties that we know about GIRGs. For example, G' has a giant com-
ponent of size Θ(n'). Also, the number of edges within G' is Θ(n'). The
5
The formal statement is a bit more technical, since Lemma 5.4 only makes a statement
Let 1 ≤ R ≤ n^{1/d}/4, and let V' be the set of vertices in X'. Let E(V', V \ V') be the
set of edges between V' and V \ V'. Then
E[|E(V', V \ V')|] = O((R^d)^ν · (log R)^{c1}) and E[|E(V', V \ V')|] = Ω((R^d)^ν · (log R)^{c2}) for
two constants c1, c2 ∈ R.
Proof. The proof involves the most complex calculation in this course, and
we will ignore some borderline cases. We will go over the vertices v ∈ V' and
write their expected number of neighbours in V \ V' in integral form:
∫∫ Pr[∃v : d(xv, ∂X') = r] · Pr[wv = w] · E[#{nbs of v in V \ V'} | r, w] dw dr.
In the outer integral, we integrate over the possible distances r that v may
have from ∂X'. In principle, this distance may be anything between 0
and R/2. We will ignore distances r ∈ [0, 1], since essentially the
same is true for vertices in distance r ∈ [1, 2], and there are about as many
vertices with distance r ∈ [1, 2] as vertices with r ∈ [0, 1], up to a constant
factor. Thus, we will lose at most a constant factor by omitting r ∈ [0, 1].
On the other side, we will also ignore distances r ≥ R/2 from ∂X', i.e., we
ignore the center of X'. This region
contributes only a constant factor to the total volume, and it is not hard
to see that vertices in the center have fewer expected neighbours in V \ V'
than vertices which are closer to the boundary. So, we will only consider
vertices in distance r ∈ [1, R/2] from ∂X'.
In the inner integral, we integrate over the possible weights
wv = w that v may have, and count how many neighbours v has for these
values of r and w.
We will compute the integral in two steps. In the first step, we will
only consider vertices v for which the ball of influence I(v) has non-empty
intersection with X \ X'. This is the case if and only if rI(v) ≥ r, or
equivalently wv ≥ 2r^d, and the expected number of neighbours of v in V \ V'
is Θ(w) in this case.
So, we can finally compute the contribution of vertices v for which I(v)
intersects X \ X' as
I1 := Θ(1) ∫_1^{R/2} R^{d−1} ∫_{2r^d}^{∞} w^{−τ} · w dw dr
    = Θ(R^{d−1}) ∫_1^{R/2} [w^{2−τ}]_{2r^d}^{∞} dr   (5.6)
    = Θ(R^{d−1}) ∫_1^{R/2} r^{d(2−τ)} dr.
Now we need to distinguish two cases. Let us first assume that d(2 − τ) ≠
−1, so that we need to evaluate the function [r^{d(2−τ)+1}]. For d(2 − τ) > −1
we need to evaluate the upper boundary, and for d(2 − τ) < −1 the lower
boundary. For d(2 − τ) = −1, the inner integral simply gives log(R/2),
which we may swallow by a Θ̃(.) notation. Hence,
I1 = Θ(R^{d−1}) · R^{d(2−τ)+1} = Θ(R^{d(3−τ)})  if d(2 − τ) > −1, and
I1 = Θ̃(R^{d−1})  if d(2 − τ) ≤ −1.   (5.7)
This is equivalent to the condition wv < r^d. We split this into two further sub-
cases: strong ties and weak ties. Note that a vertex can have strong ties
outside of its ball of influence, if the neighbour has large weight. For this
case, we will just show an upper bound, and find that this contribution
is negligible. We will use the same integration method as before. Con-
sider a vertex v of weight wv = w in distance r from the boundary. In
order to form a strong tie with a vertex in X \ X', the weight of the neighbour
This is the same integral that we have already evaluated in (5.6). Therefore,
I2^strong = O(I1), and we may ignore this term.
Finally, we come to the last integral I2^weak. This covers weak ties of
vertices v for which I(v) is disjoint from X \ X'. So let v be such a vertex.
It is not hard to argue that a constant portion of those weak neighbours are in
X \ X': the asymptotics do not change if we consider weak neighbours in
distance at least 2r, and those have a constant probability to be in X \ X'.
6
We omit the case α = τ − 1. This case gives another log R-factor, but is otherwise
identical to the other cases.
Since we are in the case α < τ − 1, the inner integral is Θ(1), and the outer
integral has exponent d(1 − α). Similarly to (5.7), we need to make a case
distinction, depending on whether d(1 − α) > −1 or not. Note that we can
also write this condition as d(2 − α) > d − 1. Thus we get
I2^weak = Θ(R^{d−1}) · R^{d(1−α)+1} = Θ(R^{d(2−α)})  if d(2 − α) > d − 1, and
I2^weak = Θ̃(R^{d−1})  if d(2 − α) ≤ d − 1,   (5.11)
so in both cases I2^weak = Θ̃(R^{max{d−1, d(2−α)}}) = Θ̃((R^d)^{max{1−1/d, 2−α}}),
as required.
Theorem 5.6 says that the number of edges going out of community
V' is Θ̃((n')^ν). Note that ν < 1, so the number of edges is indeed much
smaller (for large n') than the number of edges inside of V', which is Θ(n').
wv does not have neighbours of larger weight. Thus the algorithm has a
constant failure probability in the first few steps, when wv is still small, but
with growing wv the failure probability quickly becomes negligibly small.
Recall that vertices in the heavy core form a single clique, regardless of
their position. Therefore, a vertex v in the heavy core reaches vertices in all
places of X , and thus it is not hard to show that R(v) > D(v). Hence, when
the algorithm reaches the heavy core, it enters the second phase. (It would
not harm the analysis if the algorithm enters the second phase earlier, or if
we only have R(v) ≤ D(v) when the algorithm reaches the heavy core, but
both scenarios are not likely.)
Which neighbour does a vertex v pick in the second phase of the al-
gorithm? This is a bit trickier. First of all, mind that R(v) > D(v) does
not mean that t is a neighbour of v. While v has some neighbours in
distance range D(v), it does not connect to all vertices in this distance
range. In particular, it typically does not connect to t. However, v has
some neighbours which are much closer to the target t, and one of them
will be optimal. More precisely, we will show that the best neighbour u has
weight wu ≥ wopt := ϕ(v)^{−1} and distance D(u) ≤ Dopt := ϕ(v)^{(1−τ)/d} from
t. Thus ϕ(u) = wu/D(u)^d ≥ ϕ(v)^{−1}/ϕ(v)^{1−τ} = ϕ(v)^{τ−2} =: ϕopt. Hence, in
the second phase we increase the potential by an exponent of τ − 2 in each
step, see also Figure 5.1. (We have τ − 2 < 1, so taking the (τ − 2)-th power
brings the potential closer to one. Since the potential is less than one, this
corresponds to an increase.) We start the second phase (and in fact, also
the first phase) with a potential wv/‖xv − xt‖^d ≥ 1/n, and if we raise the
potential to power τ − 2 in each step, then an easy calculation shows that
we need at most (1 + o(1)) · log log n / |log(τ−2)| steps to reach potential Ω(1). Here, the
o(1) term swallows the approximations that we have swept under the rug.
Once the potential is at Ω(1), we are finished, since then the algorithm has
a probability of Ω(1) to hit t in the next step.
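The following short Python computation illustrates the second phase: starting from potential 1/n and repeatedly raising it to the power τ − 2, it counts the steps until the potential is constant and compares with the prediction log log n/|log(τ − 2)|. The threshold 1/2 standing in for “potential Ω(1)” is an arbitrary illustrative choice.

import math

def second_phase_steps(n, tau):
    """Iterate the potential map phi -> phi^(tau-2), starting from phi = 1/n,
    and count the steps until the potential is at least 1/2. Compare with the
    prediction log(log n) / |log(tau - 2)|."""
    phi, steps = 1.0 / n, 0
    while phi < 0.5:
        phi = phi ** (tau - 2)
        steps += 1
    predicted = math.log(math.log(n)) / abs(math.log(tau - 2))
    return steps, round(predicted, 1)

for n in (10**6, 10**9, 10**12):
    print(n, second_phase_steps(n, tau=2.5))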
It remains to show that the best neighbour u indeed satisfies wu ≥
wopt and D(u) ≤ Dopt. Let us first show that such a neighbour indeed
exists. We will use without proof that Dopt ≤ D(v).7 Every vertex u of
weight ≥ wopt and distance at most D(v) from v is a neighbour of v because
wopt · wv/D(v)^d = ϕ(v)^{−1} · wv/D(v)^d = 1. This is not quite the set of vertices
we are looking for. Instead, we want to find a neighbour u of v which has
distance ≤ Dopt from the target. But this does not change much: every such
vertex has distance at most D(v) + Dopt ≤ 2D(v) from v, so all such vertices
of weight ≥ wopt connect to v with probability Ω(1).
On the other hand, there are vertices of weight ≥ wopt and distance
≤ Dopt from the target: the expected number of such vertices is Θ(Dopt^d · wopt^{1−τ}) =
Θ(ϕ(v)^{1−τ} · ϕ(v)^{τ−1}) = Θ(1). In a real proof, we would choose Dopt slightly
larger to make sure that there are many such vertices. Hence, we have
shown that v does have neighbours with weight ≥ wopt and distance ≤ Dopt
7
This step is actually not completely trivial. It involves showing that not all combina-
tions of D(v) and wv are possible, since whp there are no vertices with very high weight
which are very close to t. Alternatively, one can show inductively that throughout the
second phase (except for the very first step) the relation D(v)^d ≤ wv^{τ−1} holds, which
follows from the formulas for Dopt and wopt.
Figure 5.1: A typical trajectory of greedy routing. (Figure taken
from [BKL+ 22], β = τ.) In the first phase the weight is increased by
an exponent of 1/(τ − 2) in each step. In the second phase, the potential
is increased by an exponent τ − 2 in each step.
from t.
We still need to show that there are no neighbours with better potential.
To this end, consider vertices of weight w in distance r from t. Such vertices
exist if w ≤ (r^d)^{1/(τ−1)}, since this is the maximum weight among r^d vertices.
We write equivalently
r^{−d} · w^{τ−1} ≤ 1.   (5.13)
Let us now compute the expected number of neighbours of v which have
weight w and distance r from t. For simplicity, we will restrict our-
selves to the case r ≤ D(v)/2. Mind that those vertices still have distance
Θ(D(v)) from v, not distance r. Hence, the probability to form an edge
Patching
Geometric Routing
(i) u and t should lie in the same direction from v, so the vector xu − xv
should have a similar direction as the vector xt − xv.
For condition (i), all neighbours of v have the same chance to lie in a good
direction, regardless of their weight. But for condition (ii), it depends on
α. If α > τ − 1, then for large r there are more strong neighbours in distance
r than weak neighbours. By Lemma 5.4, the expected number of strong
Erdős-Rényi graphs
In Erdős-Rényi graphs, we need to explore Θ(√n) vertices from both sides.
Then every vertex in the s-BFS has probability Θ(1/√n) to appear in the
t-BFS, so the expected number of vertices which appear both in the s-BFS
and the t-BFS is Θ(1). This is known as the birthday paradox. By standard
probabilistic arguments, we thus only need to explore Θ(√n) vertices in
expectation. This is much fewer than the Θ(n) vertices that we need to
explore by unidirectional search.
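A sketch of bidirectional BFS in Python is given below; it alternately expands the smaller of the two explored sets and stops as soon as the two searches meet, returning the number of explored vertices. The adjacency-list format adj and the rule of always expanding the smaller side are illustrative choices, not part of the analysis above.

from collections import deque

def bidirectional_bfs(adj, s, t):
    """Run BFS from s and from t simultaneously, always expanding the side
    that has explored fewer vertices so far; stop when the searches meet.
    Returns the total number of explored vertices."""
    if s == t:
        return 1
    frontier = {0: deque([s]), 1: deque([t])}
    seen = {0: {s}, 1: {t}}
    while frontier[0] and frontier[1]:
        side = 0 if len(seen[0]) <= len(seen[1]) else 1
        next_frontier = deque()
        while frontier[side]:
            v = frontier[side].popleft()
            for u in adj[v]:
                if u in seen[1 - side]:          # the two searches met
                    return len(seen[0]) + len(seen[1])
                if u not in seen[side]:
                    seen[side].add(u)
                    next_frontier.append(u)
        frontier[side] = next_frontier
    return len(seen[0]) + len(seen[1])           # s and t are not connected

# Example: adj is an adjacency list, e.g. adj = {0: [1], 1: [0, 2], 2: [1]};
# bidirectional_bfs(adj, 0, 2) then returns 3.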
Chung-Lu graphs
For Chung-Lu graphs with power-law exponents τ ∈ (2, 3), we know that
shortest paths run via the inner core. I.e., the two BFS will run until
both have reached the inner core, and then they will find an overlap. The
question is: how many vertices does a BFS explore before it finds the inner
core? The answer will turn out to be surprisingly simple, but in order to
understand the answer, we need to take one step back and return to the
question of how many friends our friends have.
Consider a vertex v of weight wv. The weights in the neighbourhood
of v follow a power-law distribution with exponent τ − 1. Let vmax be the
neighbour of largest weight of v. Then the weight of vmax is roughly wmax :=
wv^{1/(τ−2)}, and vmax has Θ(wmax) neighbours. But how many neighbours have
all neighbours of v combined? Since each neighbour v' contributes Θ(wv')
neighbours, we can estimate this number by the following
integral:
#{nbs of nbs of v} ≈ ∫_1^{wmax} wv · Θ(w^{1−τ} · w) dw = Θ(wv · wmax^{3−τ})
    = Θ(wv^{1+(3−τ)/(τ−2)}) = Θ(wv^{1/(τ−2)}) = Θ(wmax).
So, the neighbour vmax has about as many neighbours as all other neigh-
bours of v combined. Actually, we shouldn’t be surprised because we know
from Section 3.2.2 that in the limit, our friends have infinitely many friends
in expectation. This can only happen if the expectation is dominated by
the friend(s) of largest weight, which means that they must contribute at
least as much to the expectation as all other friends combined.
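A quick numeric sanity check of this claim in Python: sample about wv neighbour weights from a power law with exponent τ − 1 (tail P[W > x] ≈ x^{2−τ}) and compare the maximum with wv^{1/(τ−2)}. The sampling formula and the value τ = 2.5 are illustrative assumptions; the maximum fluctuates a lot, but its order of magnitude matches the prediction.

import random

def max_neighbour_weight(w_v, tau=2.5, seed=0):
    """A vertex of weight w_v has about w_v neighbours, and each neighbour's
    weight has tail P[W > x] = x^(2 - tau) for x >= 1. Sample these weights
    and return the maximum, which is of order w_v^(1/(tau-2))."""
    rng = random.Random(seed)
    samples = [(1.0 - rng.random()) ** (-1.0 / (tau - 2)) for _ in range(int(w_v))]
    return max(samples)

for w_v in (10, 100, 1000):
    print(w_v, round(max_neighbour_weight(w_v)), round(w_v ** (1 / (2.5 - 2))))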
The same insight can also be transferred to sets of vertices. Assume
that S is some layer of the BFS tree, and assume that this set
has total weight wS := Σ_{v∈S} wv. Consider the set S' that we explore in the
next step. Then the total weight of S' (and,
in fact, of the whole BFS tree) is wS^{1/(τ−2)}.
Unfortunately, the above calculation comes with a restriction. The
weights of the neighbours of a vertex v follow a power-law of exponent
τ − 1 up to the cut-off point n/wv . Beyond this cut-off point, the above
computation no longer holds. In particular, if the weight of v is too large
then the heaviest neighbour does not have weight wv^{1/(τ−2)}. Fortunately,
this restriction only applies for very large weights wv . More precisely, it is
only relevant for the last layer, when the BFS finds the inner core.
Assume that S is the last layer before the BFS finds the inner core.
Let vmax be the heaviest vertex in S, and let wmax be its weight. Let S'
be the layer that the BFS explores next, and let w'max be its largest weight;
it can be checked that w'max is below the cut-off point for wmax, so the
above calculations still hold and the BFS only needs to explore O(w'max) =
maximal weight in the whole graph. In this case, we need to explore the
neighbourhood of the whole inner core. The inner core contains Θ(n ·
(n^{1/2})^{1−τ}) vertices, all of which have degree Ω(n^{1/2}), so its neighbourhood
has size Ω(n^{1+(1−τ)/2+1/2}) = Ω(n^{(4−τ)/2}). This bound is indeed tight, both for
the size of the neighbourhood of S' and for the runtime of the BFS. Hence,
(ii) Boundedness: There is C > 0 such that κ(x) ≤ C·‖x‖∞ for all
x ∈ R^d.
(iii) Continuity of volume: For all R > 0, the function Vol^κ_R :
R_{≥0} → R_{≥0}; r ↦ Vol^κ_R(r) is surjective onto [0, R^d].
When κ and R are clear from the context, we also write Vol(r) instead
of Vol^κ_R(r).
For the last condition, note that the volume of B^∞_{R/2}(0) is R^d. Hence, we
have Vol^κ_R(r) ≤ R^d for all r ≥ 0. Moreover, condition (ii) implies that for
r = CR/2, we have that B^∞_{R/2}(0) ⊆ B^κ_r(0), and thus Vol^κ_R(CR/2) = R^d. The
function Vol^κ_R(r) is increasing in r since the set B^κ_r(0) is growing with r,
and so the third condition requires that Vol^κ_R(0) = 0 and that the volume
increases continuously from 0 to R^d as r increases from 0 to CR/2.
For feasible distance functions, we can define a generalized GIRG model.
As for ordinary GIRG, we draw the vertex locations from an axis-parallel
cube X of volume n and radius R = n^{1/d}. To avoid boundary effects, we
will not apply κ directly to xu − xv, but rather to xu − xv mod R, where
the “mod R” operator is applied componentwise for the d components of
the vector xu − xv. Recall that for y ∈ R we define y mod R := y' for the
unique y' ∈ [0, R) for which (y − y')/R ∈ Z. In this way, the shape and
13
We are very slightly cheating here. We would need Vol^κ_R(r) = r^d for κ = ‖.‖∞ to
obtain the GIRG model. This is true for small r, but fails to be true if r > R/2 due to
boundary effects. But the difference is negligible.
xu ∈ X, but that xv is still random. Then the probability that u and
v are connected is
Pr[u ∼ v | wu, wv, xu] = Θ( min{ 1, wu wv / n } ).
Proof. We will show that the marginal probability is the same as in the
GIRG model. Let 0 ≤ y ≤ n, and let us study q_y := Pr[Vol^κ_R(r_uv) ≤ y].
Lemma 5.10 has vast consequences. It implies that all arguments that
are based on the marginal connection probabilities remain true. In partic-
ular, all results from Corollary 5.3 directly transfer to arbitrary κ-GIRGs,
including E[deg(v)] = Θ(wv ), the degree distribution in the neighbourhood,
and the existence of a giant component with typical distances. Moreover,
Lemma 5.4, which counted the number of strong and weak neighbours in
distance at least r (or in [r, 2r]) also remains true. In particular, most
neighbours of v are in or almost in the ball of influence I(v), and in dis-
tance r > rI (v), there are more weak neighbours for α < τ − 1, and more
strong or almost strong neighbours for α > τ − 1. The result for bidirec-
tional search also relies only on the marginal connection probabilities, and
thus carries over as well.
5.5.1 The minimum component distance
One of the most intriguing examples of a distance function κ is the mini-
mum component distance. (Not to be confused with the maximum compo-
nent distance, which is just the good old ∞-norm we know from analysis.)
For d ≥ 2 and a vector x = (x1, . . . , xd) ∈ R^d, we define κ(x) := ‖x‖min :=
min{ |xi| : 1 ≤ i ≤ d }. Note that this is not a norm, and it does not
even satisfy the triangle inequality. For the vectors x := (0, 0), y := (1, 0),
z := (0, 1), the vector x has distance zero from both y and z. But this does
not imply that y and z also have distance zero from each other, or even
“small” distance. There are also some other peculiarities. We have some
non-zero vectors x with ‖x‖min = 0. The scaling is also different from what
we know from norms and metrics. The r-neighbourhood of 0 consists of the
union of d “thickened” hyperplanes, each defined by −r ≤ xi ≤ r for a co-
ordinate i. Since each of these thickened hyperplanes has volume 2r·R^{d−1},
we generally have Vol^κ_R(r) = Θ(R^{d−1}·r). Despite all these quirks, it is easy
to check that the minimum component distance is a feasible distance.
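The following few lines of Python make the minimum component distance concrete (including the componentwise “mod R” torus convention from the previous section) and reproduce the example above in which the triangle inequality fails: x is at distance 0 from both y and z, while y and z are at distance 1.

def kappa_min(x, y, R):
    """Minimum component distance on the d-dimensional torus of side length R:
    take the coordinate-wise torus distance and return the smallest entry.
    Two points are close as soon as they (almost) agree in one coordinate."""
    return min(min(abs(a - b), R - abs(a - b)) for a, b in zip(x, y))

R = 10.0
x, y, z = (0.0, 0.0), (1.0, 0.0), (0.0, 1.0)
print(kappa_min(x, y, R), kappa_min(x, z, R), kappa_min(y, z, R))   # 0.0 0.0 1.0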
For a κ-GIRG, this means that two nodes are considered to be close if
they agree in at least one coordinate. Intuitively, this makes sense for social
networks: most of your acquaintances probably share at least one aspect
with you: you may know colleagues taking the same lectures, comrades
from your sport team or some other hobby, your family members, and your
neighbours. But typically, your acquaintances do not share all of your
aspects with you. Few (if any) of them will be a family member and study
the same subject and play in the same sport team and live in the same
house.
In GIRGs, the triangle inequality was the ultimate reason for the large
clustering coefficient. Let us briefly recall the argument. A typical vertex
u has weight O(1). The typical random neighbour of u also has weight
O(1) and is in distance O(1) from u. Hence, if v1 and v2 are two random
neighbours of u, they typically both have distance O(1) from u, and then
by the triangle inequality they also have distance O(1) from each other.
Thus they have connection probability Ω(1), which leads to a clustering
coefficient of Ω(1).
Since the minimum component distance does not satisfy the triangle
inequality, it may seem that it leads to a low clustering coefficient. But that
is not so. The minimum component distance still satisfies the following
relaxed version of the triangle inequality.
Proof. The proof is similar to the proof for GIRG in Corollary 5.5, so we
only stress the main difference. It still suffices to show that E[CC(v)] =
Ω(1) for vertices v with weight wv ∈ [1, 2] and deg(v) ≥ 2. If we choose
two random neighbours u1, u2 of v, then they have a constant probability
to be both in the ball of influence I(v) of v. Since every vertex in I(v)
connects to v, the positions of u1 and u2 are uniformly at random in I(v). By
the weak stochastic triangle inequality, with probability Ω(1), u1 and u2 have
distance at most 2rI(v), which implies that they connect with probability
Ω(1). This yields E[CC(v)] = Ω(1), as required.
probability u and u' are close to v in the same coordinate, and thus also
close to each other. Let us now focus on a different case, that u and u' are
close to v in different coordinates. Then
the vector u − u' mod R is just a uniformly random vector in [0, R]^d. Now
consider the shortest path between u and u' in G[V \ {v}], i.e. after removing
v from the graph. Since u − u' mod R is random, we are simply left with
two vertices u, u' with random distance vector from each other, and with
high probability the shortest path between them has length (2 ± o(1)) · log log n / |log(τ−2)|,
as for every other random pair of vertices in the graph. Thus in the ‖.‖min-
GIRG model, if we pick two random neighbours u, u' of a vertex v, then
with constant probability they are
close to each other. They may not be direct neighbours, but the shortest
path between them is typically much shorter than (2 ± o(1)) · log log n / |log(τ−2)|.
How does this compare to real social networks? We do not really know
the answer, but it seems to make sense to some degree. Think of a fellow
student v who has come to ETH from abroad. If you pick a family member
u of v, do you expect to find an untypically short path from yourself to u
in the friendship network without going through v? It seems plausible that
the answer is No. (Of course, all paths are pretty short in social networks.
We want paths that are shorter than the typical distance.) One interpreta-
tion is that any person belongs to several social circles or communities, and
there is not necessarily any other connection between these communities.
This corresponds nicely to the ‖.‖min-GIRGs, where each of the d dimensions
of a vertex v defines communities which have nothing in common except
v. On the other hand, ‖.‖min-GIRGs also don’t seem to capture the whole
truth. Your neighbours and your fellow students probably belong to dif-
ferent social circles who have little to do with each other (unless you live
in a student hostel). But they all live in Zürich. So they are maybe as
far from each other as random Zürich citizens, but probably less far than
random people on earth. Relatedly, from a global perspective, there are
certainly small separators. The number of acquaintanceships within Europe
is certainly much higher than the number of acquaintanceships between Eu-
14
It is not perfectly uniform because there is a non-empty overlap between the d hyper-
planes whose union is I(v). But the overlap is negligibly small if wv is small.
ropeans and the rest of the world. So geometry seems to play a stronger
role than in k.kmin -GIRGs, but a weaker role than in GIRGs. There is still
a lot to be learned about real-world networks, and a lot to improve about
our random network models.
Bibliography
[CL02] Fan Chung and Linyuan Lu. The average distances in ran-
dom graphs with given expected degrees. Proceedings of the
National Academy of Sciences, 99(25):15879–15882, 2002.
[EDF+ 16] Sergey Edunov, Carlos Diuk, Ismail Onur Filiz, Smriti Bha-
gat, and Moira Burke. Three and a half degrees of separation.
Research at Facebook, 694, 2016.
[Fel91] Scott L. Feld. Why your friends have more friends than you
do. American Journal of Sociology, 96(6):1464–1477, 1991.