
Complex Network Models

D-INFK, ETH Zürich

Johannes Lengler

As of: January 27, 2023


Contents

1 Introduction
  1.1 Preliminaries

2 Erdős-Rényi random graphs
  2.1 The local perspective and the Galton-Watson branching process
  2.2 The global perspective: sprinkling
  2.3 Shortcomings of Gn,p: components, degrees, clustering, communities, distances
    2.3.1 Component structure
    2.3.2 Degree distribution
    2.3.3 Clustering
    2.3.4 Communities
    2.3.5 Distances

3 Inhomogeneous Degree Distributions
  3.1 Power-laws
    3.1.1 Power-law probability distributions
    3.1.2 Power-law sequences
    3.1.3 Properties of power-laws
  3.2 The Chung-Lu model
    3.2.1 Degree distribution
    3.2.2 Friends of your friends
    3.2.3 Ultra-small worlds
    3.2.4 Variations of the Chung-Lu model
  3.3 Perfect sampling: the configuration model
  3.4 Preferential attachment
  3.5 Strengths and weaknesses of the Chung-Lu model
    3.5.1 Strengths
    3.5.2 Weaknesses

4 Geometric Graphs
  4.1 Weak Ties and the Watts-Strogatz Model
  4.2 Navigability and the Kleinberg model
    4.2.1 Shortcomings of the Kleinberg model

5 Geometric Inhomogeneous Random Graphs (GIRGs)
  5.1 The GIRG model: basic properties
    5.1.1 Variations and extensions
  5.2 Neighbours and Communities
    5.2.1 Most neighbours are close
    5.2.2 Boundaries and communities
  5.3 Greedy routing
  5.4 Bidirectional search
  5.5 Non-Euclidean GIRGs
    5.5.1 The minimum component distance
Chapter 1

Introduction

In this course, we will study some of the most important random net-
work models. Formally, a (finite) random network model is a probability
distribution over all graphs with vertex set V = [n]. Practically, it is a
randomized procedure to generate a graph on n vertices. We will study
these models asymptotically in the limit n → ∞.
There are many reasons to study random network models, from exis-
tence proofs in graph theory to the design of efficient data structures and
algorithms. But in this lecture, we focus on a different goal: to understand
complex real-world networks better.
Complex real-world networks include social networks. In these net-
works, the nodes are usually people, or users. There are online social
networks like the facebook graph, in which edges are friendship links in the
facebook social network; or a collaboration network, where the nodes are
researchers, and two researchers share an edge if they have co-authored a
publication; or a mobile phone graph, in which two phones are linked by
an edge if there was a phone call between the two phones in a particular
month. Or the friendship network, in which two people are connected by
an edge if they know each other on a first-name basis.
Another class of networks are technological networks like the internet
graph, in which the nodes are given by routers and links are given by con-
necting cables1 . Another example is the web graph, in which the nodes are
the pages of the world wide web, and the edges are given by the hyperlinks.
Many other examples of networks can be found in public repositories such
as https://snap.stanford.edu/data/.
Footnote 1: The term internet graph is also used to refer only to the highest layer of the physical
structure of the internet, where the nodes are the autonomous systems of the internet.
There are several reasons why we want to study real-world networks
through models. One is that some of the networks are not directly available.
For example, the data of an online social network may not be available
because the company does not want to share it. In some cases, like mobile
phone data, data protection laws prevent the network from being made
public. In other cases, like the friendship network, the data
does not exist at all in computer-readable form.2 While it is in principle
possible to query links in the network by asking the involved people, this
allows us at best to obtain tiny samples of the network. Nevertheless, it is
possible to run algorithms on the friendship network, and we will learn
about one such algorithm in the lecture.
Another reason to study network models instead of the real networks
is that we are able to create variations of the networks. For example,
assume we want to understand how the efficiency of a routing protocol like
the Border Gateway Protocol (BGP) scales if the internet graph grows over
time. We have a few data points: we know the internet graph right now,
and we know how the internet graph looked in the past. So we can
run experiments on these real instances, find a trend, and extrapolate to
the future. However, network models allow us a different approach: we can
choose a model that generates networks like the internet graph, and then
we can use the model to generate networks which are twice as large, and run
experiments on those networks. Of course, for this approach it is crucial
that the network model captures the properties of the internet graph that
are relevant for routing and the BGP protocol. So we need to find a good
model. This is precisely the goal of this course: to introduce the students to
a collection of network models to choose from, and to discuss some network
properties that may be relevant for such tasks. Beware that the course cannot
and will not be exhaustive on either side: there are many more interesting
network models, and many more important network properties than what
we can cover in this course.
A third reason to study network models is rather fundamental: assume
we want to understand a real-world phenomenon. Let us take the following
empirical fact as an example. We have an unweighted network G and two
vertices u, v, and we want to find a shortest path from u to v. The textbook
solution is to use breadth-first search (BFS), starting from u, until v is found.
Footnote 2: This can also be difficult for technological networks. It is surprisingly unclear how
big the web graph really is, since search engines only give us access to the parts that are
accessible by web crawlers.
However, in practice, a different algorithm, called bi-directional
BFS is more efficient: start two BFS in parallel, one from u and one from
v. As soon as there is a vertex w that is discovered by both BFS, the path
from u to v through w is a shortest path. Why is the second algorithm
more efficient than the first one? Classic worst-case or average-case analysis
does not give a difference between the two variants, so it must have to do
with the networks on which the algorithm is run in practice. What aspects
of the networks are important for the runtime? Is it relevant that there are
nodes of rather high degree? Does it play a role that those networks tend to
be clustered into communities? In such a situation, the best approach is the
following: we develop different levels of abstraction of the phenomenon.
So, we need different network models, some more basic, and some more
complex. In the most basic ones, we will not see a difference between uni-
and bi-directional BFS. But as we add more and more aspects of real-world
networks to the model, at some point the bi-directional variant will start
to have an advantage. By studying when exactly this happens, we get a
much better understanding of why the bi-directional variant is superior in
practice. We will come back to bi-directional search for several network
models in this course.
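To make the two search strategies concrete, here is a small Python sketch (not part of the script; the function names are illustrative, and the graph is assumed to be given as a dictionary mapping each vertex to a list of its neighbours). Both functions return the graph distance between u and v, but the bidirectional variant only explores two small balls around the endpoints instead of one large ball.

```python
from collections import deque

def bfs_distance(adj, u, v):
    """Textbook approach: BFS from u until v is found; returns the distance or None."""
    dist = {u: 0}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        if x == v:
            return dist[x]
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return None

def bidirectional_bfs_distance(adj, u, v):
    """Grow two BFS balls level by level, one around u and one around v.
    The first vertex discovered by both searches closes a shortest u-v path."""
    if u == v:
        return 0
    dist_a, dist_b = {u: 0}, {v: 0}
    frontier_a, frontier_b = [u], [v]
    while frontier_a and frontier_b:
        if len(frontier_a) > len(frontier_b):   # expand the smaller side first
            dist_a, dist_b = dist_b, dist_a
            frontier_a, frontier_b = frontier_b, frontier_a
        next_frontier = []
        for x in frontier_a:
            for y in adj[x]:
                if y in dist_b:                 # seen from the other side: done
                    return dist_a[x] + 1 + dist_b[y]
                if y not in dist_a:
                    dist_a[y] = dist_a[x] + 1
                    next_frontier.append(y)
        frontier_a = next_frontier
    return None                                 # u and v lie in different components
```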
This last aspect is a rather fundamental approach in the study of complex
systems. To understand such a system thoroughly, we need to develop
different levels of abstraction of the system, and choose the appropriate
one to understand different aspects of the system – the level should be as
simple as possible, but as complex as necessary.

Scope and limitations of the course

We take a mathematical approach in this lecture: we study random network
models with n vertices in the limit n → ∞. This is a limitation when
we want to apply our results to real-world networks. If we find out some-
thing for n → ∞, does it also apply for a social network with 100 nodes?
With 1000? 10, 000? Or do we need a million or a billion nodes? These
questions are a natural second step after understanding the limit, because
they are about speed of convergence in the limit. Thus, the methods to
answer these questions are similar to the methods of obtaining the limit
behaviour in the first place. However, we will not discuss these questions
in the course.
We will restrict ourselves to sparse network models in this lecture, i.e.
network models in which the number m of edges is linear, m = O(n).
Equivalently, the average degree is constant, O(1). Very many social and
technological networks are sparse.3 Moreover, we will only cover undirected
networks. Not because directed networks are less interesting, but rather
because they are even more complicated. Also, there is much more data
that a network may be equipped with: edge weights (e.g., in a collaboration
network the number of co-authored papers), time stamps (e.g., in mobile
phone networks), or labels of vertices or edges. We will ignore all these
possible extensions in this course.

Literature

The script and lecture will be self-contained, and it is not necessary to read
further literature for passing the course. However, for students who want
additional material, we want to highlight especially two excellent books on
the topic.
• The book Complex Networks: Principles, Methods and Applications
by Latora, Nicosia and Russo gives a gentle introduction to complex
networks for readers of all domains. It covers a wider range of topics,
in particular more real-world network models and a larger collection
of network properties than we can cover in this lecture. It is available
through ETH library (vpn required).
Compared to our course, the book is less math-heavy. It does work
with some mathematical concepts, but the authors try to make them
accessible for people without a mathematical background.
The book comes with an excellent homepage, which contains all the
networks discussed in the book, and C programs for all the network
properties and algorithms that are discussed. This webpage is a great
place for anyone who wants to play around with some instances of real-
world networks.
Footnote 3: Though beware that there is no clear-cut definition for a fixed network of finite
size. E.g., the collaboration network on high energy physics (based on arXiv papers) has
n ≈ 12,000 vertices and m ≈ 120,000 edges. Whether this is sparse or not is a matter of
interpretation, not math.
• The book Random Graphs and Complex Networks, Volume 1 by
van der Hofstad is more mathematical than our course. The reader
can find all the nasty technical details which we skip in our course in
this book, and many more. The book covers graph-theoretic aspects
(e.g., component structures, typical distances, clustering coefficients)
in much more depth and detail. It mostly does not cover the algorith-
mic aspects that we discuss in this lecture (e.g., routing, bidirectional
search). The book (as well as a preliminary volume 2) is freely avail-
able at https://www.win.tue.nl/~rhofstad/NotesRGCN.html.

Neither book covers the GIRG model (Geometric Inhomogeneous
Random Graphs) that we will discuss in Chapter 5. The results in this
chapter are from the frontline of research and are not yet covered by text-
books. For additional literature outside of this script, the only option is
to look into the research articles that we link in this script. The arti-
cle [BKL19] gives an overview of the model and its basic properties.
There are many other sources on complex network models. Students
who want to experiment hands-on with real-world networks can find
several collections of networks here, here and here, and tools for analyzing
networks here and here.

1.1 Preliminaries
Probabilistic tools

Since we consider random network models, we need to analyse random
processes. We expect students to be familiar with basics of probability
theory like expectations, conditional probabilities, Binomial and Poisson
distributions, with high probability statements, and so on. For full proofs,
one would also frequently need probabilistic bounds like the inequalities
of Markov, Chebyshev, Chernoff, Azuma-Hoeffding, and others. However,
since we want to cover a variety of different models, we will mostly skip over
these technical details. Students with background knowledge on stochastic
processes (e.g., who have attended the lecture Randomized Algorithms
and Probabilistic Methods) will be able to fill in most technical details
by themselves if they wish to. Students without this background will have
to believe some steps (or look them up in the RandAlg script or a
textbook [AS16, MR95]), but will be able to follow the main computations
and gain the intuition.
We will not give a list of all probabilistic bounds that we use. As
mentioned, students who don’t know them have to believe them or look
them up. But since the Poisson distribution will occur quite often, we make
one exception: the following Chernoff-like concentration bound holds for
Poisson-distributed random variables [MU17].

Theorem 1.1. Let X be a Poisson random variable, X ∼ Po(λ). Then

• Pr(X > x) ≤ (eλ)^x e^{−λ} / x^x   for x > λ.

• Pr(X < x) ≤ (eλ)^x e^{−λ} / x^x   for x < λ.

In particular, for every ε > 0 there is η < 1 such that

   Pr(X > (1 + ε)λ) ≤ η^λ   and   Pr(X < (1 − ε)λ) ≤ η^λ.
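As a quick sanity check, the following Python snippet compares the bound from Theorem 1.1 with the exact Poisson upper tail for a few values (λ = 5 and the chosen values of x are illustrative, not taken from the script).

```python
import math

def poisson_tail(lam, x):
    """Exact Pr(X > x) for X ~ Po(lam), via the complementary sum."""
    return 1.0 - sum(math.exp(-lam) * lam**k / math.factorial(k) for k in range(x + 1))

def chernoff_bound(lam, x):
    """The bound (e*lam)^x * e^(-lam) / x^x from Theorem 1.1 (valid for x > lam)."""
    return (math.e * lam) ** x * math.exp(-lam) / x**x

lam = 5.0
for x in (8, 12, 16, 20):
    print(f"x = {x}: exact tail {poisson_tail(lam, x):.2e}, bound {chernoff_bound(lam, x):.2e}")
```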

Notation and conventions

Basic objects like graphs often have two names, one coming from graph the-
ory and one from network theory. We will use the terms “graph/network”,
“vertex/node” and “edge/link” interchangeably in this course.
Throughout the course, we will always consider a graph G = (V, E) with
vertex set V and edge set E. We use the convention n = |V|, m = |E|.
Unless otherwise mentioned, G is undirected, simple and finite. G is usu-
ally obtained from a random graph model where the set of vertices is fixed,
but the set of edges is random. We are interested in the behaviour of G
for n → ∞. The Landau notation O(.), o(.), Θ(.), Ω(.), ω(.) is always with
respect to the limit n → ∞ unless otherwise stated. If D is a probabil-
ity distribution, we will slightly abuse notation by writing Pr[D = x] as
abbreviation for “Pr[X = x] for a random variable X with distribution D”.

Further notation

• N := {1, 2, . . .}.

• [k] := {1, 2, . . . , k}.

• i.i.d. = independent and identically distributed.

• whp = with high probability = with probability 1 − o(1) as n → ∞.4

• almost surely = with probability one.

• C1 = largest connected component, C2 = second largest component, etc.

• G[V'] = induced subgraph with vertex set V'.

• deg(v) = degree of vertex v.

• E(S1, S2) = {{v1, v2} ∈ E | v1 ∈ S1, v2 ∈ S2}.

• u ∼ v = "u is adjacent to v".

Footnote 4: This term is sometimes used differently in the literature and may mean "with
probability 1 − n^{−ω(1)}" or "with probability 1 − n^{−Ω(1)}". In this case, probability 1 − o(1)
is also called "asymptotically almost surely" or "aas".
Chapter 2

Erdős-Rényi random graphs

In this section, we will discuss the most basic random network model,
the Erdős-Rényi random graph Gn,p . In particular, we will study three
aspects of the component structure: the existence of a giant component,
the asymptotic absence of medium-size components and the number and
structure of small components.
We will use Gn,p to introduce two perspectives on a network: the local
and the global perspective. Even though the Erdős-Rényi model is not
a good model for real-world networks, the techniques developed here are
useful to understand more complex models.

Definition 2.1. The Erdős-Rényi random graph Gn,p is the random
graph on n vertices in which every edge exists with probability p,
independently for all edges.

The expected degree of a vertex in Gn,p is (n − 1)p. Since we focus on sparse
models, we will assume that p = µ/n for a constant µ > 0. Therefore, the
expected degree is µ(n − 1)/n → µ for n → ∞. Moreover, the degree of a
vertex is binomially distributed, deg(v) ∼ Bin(n − 1, µ/n). It is a basic
fact about the binomial distribution that this distribution converges to a
Poisson distribution Po(µ) for n → ∞. Therefore, in the limit the degrees
follow a Poisson distribution with parameter µ. Hence, for all k ≥ 0,

   Pr[deg(v) = k] → Pr[Po(µ) = k] = e^{−µ} µ^k / k!   as n → ∞, in Gn,p with p = µ/n.   (2.1)
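The convergence of the degree distribution to Po(µ) is easy to observe empirically. The following Python sketch (all names and parameters are illustrative, not from the script) samples one Gn,p with p = µ/n and compares the empirical degree frequencies with the Poisson probabilities from (2.1).

```python
import math, random
from collections import Counter

def sample_gnp(n, p, rng=random):
    """Sample G(n,p) as adjacency lists (a dict mapping each vertex to its neighbours)."""
    adj = {v: [] for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].append(v)
                adj[v].append(u)
    return adj

n, mu = 2000, 2.0
adj = sample_gnp(n, mu / n)
deg_counts = Counter(len(neigh) for neigh in adj.values())
for k in range(8):
    empirical = deg_counts[k] / n
    poisson = math.exp(-mu) * mu**k / math.factorial(k)
    print(f"deg {k}: empirical {empirical:.3f}  vs  Po(mu) {poisson:.3f}")
```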
2.1 The local perspective and the Galton-Watson branching process
We want to know how many vertices of Gn,p with p = µ/n are contained
in connected components of size s = 1, 2, . . ., or in larger components.
The model Gn,p is so symmetric that we could compute this directly. For
example, the expected number of components of size s = 2 is

   ∑_{u≠v∈V} Pr[{u, v} is a component of size 2]
      = ∑_{u≠v∈V} Pr(uv ∈ E, none of the other 2(n − 2) incident edges exist)
      = (n choose 2) · (µ/n) · (1 − µ/n)^{2n−4} = (1 − o(1)) · (µ e^{−2µ} / 2) · n.

Each component of size s = 2 contains two vertices, so the expected number
of vertices in such components is ≈ µ e^{−2µ} n. It is an easy exercise to show
that the true value is concentrated around its expectation. The calculation
becomes slightly more complicated for components of larger sizes, but can
be done.
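The prediction that roughly a µe^{−2µ} fraction of all vertices lies in components of size 2 can also be checked by simulation. The sketch below (plain Python, with illustrative parameters and function names) samples one graph, extracts all component sizes by repeated BFS, and compares the empirical fraction with the formula.

```python
import math, random
from collections import deque

def sample_gnp(n, p, rng=random):
    adj = {v: [] for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].append(v)
                adj[v].append(u)
    return adj

def component_sizes(adj):
    """Sizes of all connected components, found by repeated BFS."""
    seen, sizes = set(), []
    for s in adj:
        if s in seen:
            continue
        queue, size = deque([s]), 1
        seen.add(s)
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    size += 1
                    queue.append(y)
        sizes.append(size)
    return sizes

n, mu = 3000, 1.5
sizes = component_sizes(sample_gnp(n, mu / n))
frac_in_pairs = sum(s for s in sizes if s == 2) / n
print(f"fraction in size-2 components: {frac_in_pairs:.4f}, "
      f"prediction mu*e^(-2mu) = {mu * math.exp(-2 * mu):.4f}")
```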
However, we want to use a different approach, which generalizes better
to more complex situations. To this end, we fix a vertex v ∈ V and com-
pute the probability that v is in a component of size s. We explore the
graph starting from v in a breadth-first search (BFS). So, we take the local
perspective of vertex v.
In the first step, we uncover the number of neighbours of v. This is
distributed as Bin(n − 1, p) → Po(µ), since we assumed p = µ/n. Assume
v has X1 = o(n) neighbours. In the next step, for every neighbour w we
reveal how many edges go out from w. One edge clearly goes to the parent
v, and we ignore this edge at this point. For the other n−2 potential edges,
each of them has the same probability p to be present, so the number of
outgoing edges is distributed as Bin(n − 2, p) → Po(µ). So the number of
new edges that we find per neighbour w follows essentially again the same
distribution.
Of course, not every new edge necessarily leads to a new vertex. It
could also lead to a vertex that we have already found before. However, if
we process the vertices one by one, as long as the number of vertices that
we have found is x = o(n), the number of new vertices that we find from
w is Bin(n − x, p) → Po(µ). So, this effect only starts to matter when we
have discovered a linear fraction of the graph.
Summarizing, exploring the graph resembles the process of growing a
tree with parent v, where each node has a random number of children,
distributed as Po(µ). This process is known as Galton-Watson branching
process.

Definition 2.2. Let D be a probability distribution over N0. A Galton-
Watson branching process with offspring distribution D is the pro-
cess for generating a random rooted tree in which the number of
children of each vertex (including the root) independently follows the
distribution D.
   The resulting random tree T is called Galton-Watson tree. It
can be either finite or infinite. The size |T| of the Galton-Watson
tree is the number of nodes, and can be finite or infinite. We denote
ps := Pr[|T| = s], where s ∈ N ∪ {∞}.
   We say that the Galton-Watson branching process survives if the
resulting tree is infinite, and that it becomes extinct otherwise. We
also call p∞ the survival probability and 1 − p∞ the extinction prob-
ability.
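A Galton-Watson process is easy to simulate, which gives a direct way to estimate the probabilities ps. The following Python sketch (illustrative, not from the script) uses the node-by-node construction — each node draws its number of children independently — and caps the tree size so that surviving trees do not run forever. For comparison it also prints the known closed form ps = (µs)^{s−1} e^{−µs} / s! for Poisson offspring (the Borel distribution), which is not derived in the script.

```python
import math, random
from collections import Counter

def sample_poisson(mu, rng=random):
    """Draw from Po(mu) by inverting the CDF (fine for small mu)."""
    u, k = rng.random(), 0
    p = math.exp(-mu)
    cdf = p
    while u > cdf and k < 1000:
        k += 1
        p *= mu / k
        cdf += p
    return k

def gw_tree_size(mu, cap=100_000):
    """Total size of a Galton-Watson tree with Po(mu) offspring, capped at `cap`."""
    created, processed = 1, 0          # the root exists but has not been processed yet
    while processed < created:
        if created >= cap:
            return cap                 # treat as survival ("infinite" tree)
        processed += 1
        created += sample_poisson(mu)  # children of the next node
    return created

mu, trials = 0.8, 50_000
counts = Counter(gw_tree_size(mu) for _ in range(trials))
for s in range(1, 7):
    borel = (mu * s) ** (s - 1) * math.exp(-mu * s) / math.factorial(s)
    print(f"p_{s}: simulated {counts[s] / trials:.4f}, closed form {borel:.4f}")
```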

We have informally argued that Galton-Watson trees with Poisson off-
spring distribution are related to how an Erdős-Rényi graph looks from
the perspective of a single vertex. Indeed, the following theorem makes this
precise.

Theorem 2.3 (Small components in Erdős-Rényi graphs). Let G ∼
Gn,p be an Erdős-Rényi graph with p = µ/n, and let T be a Galton-
Watson tree with offspring distribution Po(µ). For s ∈ N, let ns be
the number of vertices of G in components of size s. Then for all
s ∈ N, almost surely,

   lim_{n→∞} ns/n = ps = Pr[|T| = s].

Proof. We will only show that E[ns/n] → ps. For the theorem, one also
needs to show concentration, which can be obtained by probabilistic tools
like Azuma's inequality.
   Fix s ∈ N and v ∈ V. Let E(v) be the event that v is in a component
of size s. By symmetry, we have E[ns] = ∑_{u∈V} Pr[E(u)] = n · Pr[E(v)]. Thus we
need to show that Pr[E(v)] → ps for n → ∞. Let ε > 0.
   Consider a BFS starting in v. We run the BFS until either we have
explored the component, or we have uncovered more than s vertices. When
we uncover the neighbours of a vertex u, then the number of new vertices
that we find from u follows the binomial distribution Bin(n − x, p), where x
is the number of vertices that we have uncovered before that point. By
definition of the process, we have 1 ≤ x ≤ s. Therefore, we can sandwich
the distribution Bin(n − x, p) between the two distributions D1 := Bin(n −
s, p) and D2 := Bin(n − 1, p).1 Thus, we can sandwich the BFS tree between
two Galton-Watson branching processes T1 and T2 with respective offspring
distributions D1 and D2. Note that this coupling only works while the
BFS still runs (i.e., only until we have explored the whole component or
uncovered more than s vertices), but this is fine for our purposes.
   The distributions D1 and D2 both converge to D := Po(µ) for n → ∞.
In particular, let X1 ∼ D1, X2 ∼ D2 and X ∼ D. If n is large enough, then
|Pr[X1 = i] − Pr[X = i]| ≤ ε/(2s) for all i ∈ N0, and similarly for X2.2 This
allows us to couple X1 and X2 to X as follows. We first draw X. Then
we flip a coin with Pr[tail] = ε/(2s). If the coin comes up head, then we
set X1 := X. Otherwise we set X1 to whatever value we need to obtain
the correct distribution of X1. We also perform a second coin flip with
Pr[tail] = ε/(2s) to couple X2 to X.
   Now consider the Galton-Watson trees T1, T2, and a third Galton-Watson
tree T with offspring distribution D. We use the above coupling for the first
s nodes of the branching processes (or until the processes die out): we flip
two coins for each node, and if they both come up head then all three
processes have the same number of children for this node. By a union
bound, the probability that we ever see tail in these s steps is at most
2s · ε/(2s) = ε. If we never see tail, then the three Galton-Watson trees are
identical until this point. Since the BFS tree is sandwiched between the
trees T1 and T2, it must then also be identical. Thus, with probability
at least 1 − ε, all four processes either terminate with the same number of
vertices, or all four processes result in more than s nodes. Since one of them
(the process T) has probability ps to terminate with exactly s vertices, the
other processes have probability ps ± ε to terminate with s vertices, if n
is large enough. For the BFS tree, this event was called E(v), and hence
ps − ε ≤ Pr[E(v)] ≤ ps + ε for n large enough. This implies Pr[E(v)] → ps
for n → ∞ and concludes the proof.

Footnote 1: This is called domination. "Bin(n − x, p) dominates Bin(n − s, p)" means that for two
random variables X ∼ Bin(n − x, p) and X' ∼ Bin(n − s, p) we have Pr[X ≥ i] ≥ Pr[X' ≥ i]
for all i ∈ N0.

Footnote 2: It is obvious that we can achieve this for a constant number of values of i, for example
for all 0 ≤ i ≤ s. This would also suffice for our purposes, since any larger number of
offspring terminates the BFS immediately. The stronger version for all i ∈ N0 needs a
proof, but the proof is not hard.
Remark 2.1 (Structure of small components). In fact, the connection between Galton-
Watson trees and components in Erdős-Rényi graphs goes much further than Theorem 2.3.
This theorem tells us that the number of components is given by Galton-Watson trees.
But the structure of components is also given by such trees. First of all, it is easy to
see that small components are unlikely to contain any cycles, so they are trees. This is
because starting from a fixed vertex v, it is very unlikely to discover the same vertex twice.
By the "birthday paradox", we only find the same vertex twice when we uncover Θ(√n)
vertices.
   Now consider components of size 4. We know that they are trees, but there are two
different types of trees on 4 vertices: the path P3 of length 3 (three edges and four vertices),
and the star S4 of size 4. How often do they occur? Exactly as often as a Galton-Watson
process produces P3 and S4, respectively. These probabilities are not hard to compute.
Denoting µ = pn as before, the probability that a Galton-Watson process gives a star S4
where the root v has degree 3 is

   Pr[S4 with deg(root) = 3] = Pr[Po(µ) = 3] · (Pr[Po(µ) = 0])^3
      = (e^{−µ} µ^3 / 3!) · (e^{−µ} µ^0 / 0!)^3 = e^{−4µ} µ^3 / 6,

and the probability to obtain a path P3 where the root has degree 1 is

   Pr[P3 with deg(root) = 1] = (Pr[Po(µ) = 1])^3 · Pr[Po(µ) = 0]
      = (e^{−µ} µ / 1!)^3 · (e^{−µ} µ^0 / 0!) = e^{−4µ} µ^3.

Note that by this calculation we count every component S4 exactly once (because there
is only one vertex of degree 3), while we count every P3 twice (it has two vertices of
degree one which can serve as the root). On the other hand, every component has four
vertices. Therefore, in Gn,p the fraction of vertices in S4-components is (2/3) e^{−4µ} µ^3,
and the fraction of vertices in P3-components is 2 e^{−4µ} µ^3. Thus there are three times
as many vertices in P3-components as in S4-components.3
Footnote 3: General rules can be obtained here. It can be shown that the fraction of vertices is
inversely proportional to the number of automorphisms of the structure.

Now that we have established Theorem 2.3, we can derive the number
of small (constant-size) components in Erdős-Rényi graphs by analyzing
the properties of branching processes. First we will estimate how the prob-
ability ps scales with s, and in particular, when a Galton-Watson process
has extinction probability one.

Theorem 2.4. Let T be a Galton-Watson tree with offspring distribu-
tion D, and let µ be the mean of D.

(a) If µ < 1, then Pr[|T| = ∞] = 0.

(b) If µ > 1, then Pr[|T| = ∞] > 0.

Moreover, if D = Po(µ) for µ ≠ 1, then there is η < 1 such that
Pr[|T| = s] ≤ η^s for all s ∈ N.

Proof. Let Z1, Z2, Z3, . . . be an infinite sequence of i.i.d. copies of a random
variable with distribution D. We can construct T as follows. We enumerate
all vertices of T in the order in which we create them. The root has number
1. Then we use Zi to determine the number of children of node i. So the
root has Z1 children, its first child (if it exists) has Z2 children, and so
on. Note that it may happen that we don't use all the Zi. For example, if
Z1 = 0, then the root has no children, and we never use Z2.
In general, assume that the Galton-Watson tree has size at least s.
Then it stops with size exactly |T| = s if and only if ∑_{i=1}^s Zi = s − 1: then
after inspecting the first s nodes, we have altogether created s nodes (the
root and s − 1 offspring), so there is no (s + 1)-st node to continue with.
Therefore, the condition ∑_{i=1}^s Zi = s − 1 implies |T| ≤ s. But note that
it does not imply |T| = s; the process may already have stopped at an
earlier stage. Conversely, the condition ∑_{i=1}^s Zi > s − 1 is necessary, but
not sufficient for |T| > s.
Now let us assume µ < 1. The strong law of large numbers states
that the average of s i.i.d. random variables converges almost surely to the
mean of the distribution for s → ∞, as long as this distribution is non-negative
and has finite mean. Hence,

   Pr[|T| > s] ≤ Pr[∑_{i=1}^s Zi > s − 1] = Pr[∑_{i=1}^s Zi ≥ s] = Pr[(1/s) ∑_{i=1}^s Zi ≥ 1] → 0   as s → ∞,   (2.2)

where the last step holds by the law of large numbers. This shows in
particular that Pr[|T| = ∞] = 0 for µ < 1. If the Zi are Poisson distributed
Po(µ), then the sum ∑_{i=1}^s Zi is also Poisson distributed with mean sµ, and
is thus concentrated with exponential tail bounds (Theorem 1.1). Hence
we can strengthen the last step of (2.2) to the exponential bound ≤ η^s.
On the other hand, assume µ > 1. We would like to use the strong law
of large numbers again, but this only holds if µ < ∞, while we also allow
µ = ∞. So in the latter case we use a little trick: we truncate the offspring
distribution (and thus the Galton-Watson tree) at some value k, so we set
Z'i := min{Zi, k}. If we choose k large enough, then the expectation µ'
of Z'i is still larger than one, but it is also finite by construction. If the
truncated Galton-Watson tree is infinite, then so is the original tree. So,
we only need to show that the truncated Galton-Watson tree is infinite
with positive probability. It thus suffices to prove the theorem in the case
µ < ∞.
Let E(s0) be the event "∀s ≥ s0 : (1/s) ∑_{i=1}^s Zi ≥ 1". By the strong law of
large numbers, almost surely (1/s) ∑_{i=1}^s Zi → µ. Hence, there exists s0 such
that Pr[E(s0)] ≥ 1/2. On the other hand, consider the event F(s0) that the
Galton-Watson tree reaches size s0. Then Pr[F(s0)] ≥ Pr[∀i ∈ {1, . . . , s0} :
Zi ≥ 1] ≥ Pr[Z1 > 0]^{s0} > 0. Since the event F(s0) only gives lower bounds
on the Zi, it cannot decrease the probability of E(s0). Thus

   Pr[|T| = ∞] ≥ Pr[E(s0) ∧ F(s0)] ≥ (1/2) Pr[F(s0)] > 0.   (2.3)

It remains to show the tail bound in s for µ > 1. Recall that a necessary
condition for |T| = s is ∑_{i=1}^s Zi = s − 1. The left hand side is a sum
of independent Poisson distributed random variables, so it also follows a
Poisson distribution with mean sµ. Hence,

   Pr[|T| = s] ≤ Pr[∑_{i=1}^s Zi = s − 1] ≤ Pr[∑_{i=1}^s Zi < s] ≤ η^s,   (2.4)

where the last step again follows from concentration of the Poisson distri-
bution, Theorem 1.1. This concludes the proof.
Remark 2.2. It can be shown that Pr[|T | = ∞] = 0 also holds for µ = 1, except for the
trivial case that the distribution is constant with Pr[Zi = 1] = 1. However, the tail bounds
in s are no longer true, not even in the case D = Po(1). We will generally ignore such
threshold cases in this lecture. We are interested in models for real-world networks. If
some properties of the model only hold for a parameter µ which is exactly on the threshold,
then this means that they are not robust against tiny parameter changes. It is then hard
to argue that the model is relevant.
There is an important implication of the different cases (a) and (b) of
Theorem 2.4 for Erdős-Rényi graphs. Obviously, if we consider any finite
n and sum up the number of vertices in components of size s, then we
obtain ∑_{s=1}^∞ ns = n because we count every vertex exactly once. Dividing
by n gives the fraction of vertices in components of size s,

   ∑_{s=1}^∞ ns/n = 1.   (2.5)

What happens if we take the limit for n → ∞ of each summand? (Recall
from your analysis classes that there is no guarantee that equality still
holds afterwards, since we are exchanging the limit with the infinite sum
here.) By Theorem 2.3, the left hand side then gives ∑_{s=1}^∞ ps. In the
subcritical case µ < 1, the Galton-Watson tree is finite almost surely, and
thus ∑_{s=1}^∞ ps = 1. Hence, we may indeed exchange limits if we take n → ∞
in (2.5). However, in the supercritical case µ > 1, we have p∞ = Pr[|T| =
∞] > 0, and thus ∑_{s=1}^∞ ps = 1 − p∞ < 1. So, we may not exchange limits
in (2.5) in the supercritical case!
What happens here? Where do the vertices disappear to when we take
the limit for n → ∞? The answer is that ps is asymptotically the fraction of
vertices in components of size s, but we only count vertices in components
of constant size here. If there is a component of size ≈ n/2, then for every
n it will be counted by a different s, and thus there is no s for which it will
leave a trace in the limit.
   This is exactly what actually happens. As we will show, there is a single
component C1 of linear size, the giant component of the graph. Intuitively,
this contains all vertices for which the Galton-Watson tree has infinite size.
(Recall that the Galton-Watson process stops being a good description
of a BFS exploration when it reaches size Ω(n), because then a relevant
fraction of discovered vertices in the graph are already known.) In fact,
what is true, but what we will not show, is that

   lim_{n→∞} |C1|/n = p∞ = 1 − ∑_{s=1}^∞ ps.   (2.6)

Informally, this says that all vertices (except for an asymptotically neg-
ligible number) are either in the giant component or in components of
constant size. Indeed, it can be shown that for every function f(n) with
lim_{n→∞} f(n) = ∞, whp ∑_{s=f(n)}^∞ ns = |C1| + o(n). In other words,
there are only o(n) vertices outside of the giant component and outside of
constant-size components.
   Equation (2.6) raises the question how we can compute p∞. Of course,
we could compute the first few of p1, p2, . . ., and thus approximate p∞. But
there is a more elegant way, as the following proposition shows.

Proposition 2.5. Consider a Galton-Watson process with offspring dis-
tribution D, and let µ be the mean of D. Then the extinction prob-
ability pext := 1 − p∞ satisfies the following equation:

   pext = ∑_{i=0}^∞ Pr[D = i] · pext^i.   (2.7)

If D = Po(µ) for µ > 1, then the survival probability is the unique
positive solution of

   1 − p∞ = e^{−µ p∞}.   (2.8)

Proof. Equation (2.7) follows directly from the law of total probability,
where we discriminate between the number i of children of the root. The
Galton-Watson tree is finite if and only if the subtrees below all children
of the root are finite. Since all children are roots of independent Galton-
Watson trees, the probability that all of them are finite is pext^i. Hence,

   pext = Pr[GW tree becomes extinct] = ∑_{i=0}^∞ Pr[root has i children] · pext^i,

which implies (2.7).
   For (2.8), we could simply plug Pr[D = i] = e^{−µ} µ^i / i! into (2.7) and sim-
plify, but there is also a way without calculation. The number of children
of the root is Po(µ)-distributed. Each child independently has probability
p∞ to be the root of an infinite tree, and we call such a child surviving.
Hence, we may obtain the number of surviving children by first drawing
the number of children, and then flipping a coin for each of them, keeping
it with probability p∞. This process yields again a Poisson distribution,
Po(p∞ µ).4 The Galton-Watson tree becomes extinct if and only if it has
no surviving children. Hence, the survival probability satisfies

   1 − p∞ = Pr(no surviving child) = Pr(Po(p∞ µ) = 0) = e^{−p∞ µ}.

We already know by Theorem 2.4 that p∞ > 0 for µ > 1, and it is easy
to see that the function f(x) = 1 − x − e^{−µx} has a unique positive root.
(E.g., because f is concave, starts at f(0) = 0 with positive slope f'(0) =
µ − 1 > 0, and becomes negative since f(1) < 0.) Therefore, p∞ must be
the unique positive root of the equation 1 − x = e^{−µx}.

Footnote 4: If this is not clear to you, remember that the Poisson distribution Po(µ) is approx-
imated by the binomial distribution Bin(n, µ/n) for large n. The binomial distribution
is obtained by flipping n coins with probability µ/n, and counting the number of heads.
Now for each head, we flip another coin and keep it with probability p∞. Then the whole
procedure is the same as flipping n coins, each with probability p∞ µ/n. Hence, the result
is Bin(n, p∞ µ/n)-distributed, which approximates Po(p∞ µ).

Note that the solution of (2.7) is not necessarily unique. In general,
we can only say that the extinction probability is one of the solutions, but
it may be non-trivial to decide which one it is. Even in the special case
D = Po(µ) there are really two solutions of 1 − x = e^{−µx}, namely x = 0
and some positive x. In this case, we could rule out the solution x = 0 by
Theorem 2.4.
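Equation (2.8) has no closed-form solution, but the survival probability is easy to compute numerically. The following Python sketch (illustrative, not from the script) iterates the map p ↦ 1 − e^{−µp} starting from 1; for µ > 1 this decreases monotonically to the unique positive fixed point. The iteration is just one simple choice; any root finder for 1 − x − e^{−µx} works equally well.

```python
import math

def survival_probability(mu, tol=1e-12, max_iter=10_000):
    """Positive root of 1 - p = exp(-mu * p), i.e. the survival probability (requires mu > 1)."""
    p = 1.0
    for _ in range(max_iter):
        p_next = 1.0 - math.exp(-mu * p)      # iterate the map p -> 1 - e^(-mu*p)
        if abs(p_next - p) < tol:
            return p_next
        p = p_next
    return p

for mu in (1.1, 1.5, 2.0, 3.0):
    print(f"mu = {mu}: survival probability p_inf ~ {survival_probability(mu):.4f}")
```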

2.2 The global perspective: sprinkling


Surprisingly, Theorem 2.4 tells us that components of constant size s
become quite rare as s gets large. This holds both for the subcritical case
µ < 1 and for the supercritical case µ > 1. Indeed, this is a property that
holds in many network models, though not always as extreme as for Erdős-
Rényi graphs. It also holds in many real-world networks. The Facebook
network with n ≈ 10^9 nodes has many isolated nodes, but its second-largest
component is of size ≈ 2000 ≪ n [UKBM11].
Recall the proof idea of Theorem 2.3: a BFS exploration from a fixed
vertex v resembles closely a Galton-Watson branching process. If the Erdős-
Rényi graph has edge probability p = µ/n, and the BFS tree already has
size x, then the number of new neighbours that we find from an unexplored
vertex u is Bin(n − x, p)-distributed, and this is almost the same as Po(µ).
This connection can be pushed further. In Theorem 2.3 we only considered
components of constant size, but the connection stays tight while x = o(n).
In particular, we have shown in Theorem 2.4 that ps ≤ η^s holds for all
µ ≠ 1. We have only shown this statement for constant s, but let us just
pretend for a moment that we were allowed to plug in larger values of s.
Then plugging in s = C log n, we obtain ps ≤ η^{C log n} = n^{−C |log η|}. (Mind that
η < 1 means log η < 0.) If C is large enough, the number of vertices in
components of size C log n or larger should be ≈ n · n^{−C |log η|} = o(1). In other
words, we should not expect to see any vertices at all in components of size
between C log n and o(n). Indeed, this can be formally proven.

Lemma 2.6 (Medium-size components in Erdős-Rényi graphs). Let
G ∼ Gn,p be an Erdős-Rényi graph with p = µ/n. Then there exist
C, δ > 0 such that with high probability the following holds.

1. If µ < 1, then G does not contain any component of size s ≥ C log n.

2. If µ > 1, then G does not contain any component of size s ∈ [C log n, δn].

Proof. We omit the proof. It follows the same line of argument as The-
orems 2.3 and 2.4, except that the coupling in Theorem 2.3 needs to be
made a bit more carefully. As key insight for the case µ > 1, note that
we want to couple the BFS in G up to size δn with a Galton-Watson tree
with offspring distribution Bin(n − δn, p). If δ is sufficiently small then
this distribution still has expectation (1 − δ)np = (1 − δ)µ > 1, so the
Galton-Watson tree will be very unlikely to have finite size s for any large s.

Lemma 2.6 tells us that there are no medium-size components. More-
over, we understand the constant-size components very well. But what
about linear-size components? Could there be several of them? The an-
swer is no, there is never more than a single linear-size component. The
reason is that any configuration with several large components is unstable.
Imagine a graph in which two components C1, C2 have size at least δn,
and imagine that an additional edge is randomly inserted into the graph.
Then the new edge has a constant probability of at least δ² to have the
first endpoint in C1 and the second endpoint in C2. If this happens, then
the two components are merged into a single, bigger component. If we
insert not just a single edge, but many edges, then it is very unlikely that
C1 and C2 survive this. Hence we obtain the following theorem.

Theorem 2.7 (Uniqueness of the giant). Let G ∼ Gn,p be an Erdős-
Rényi graph with p = µ/n for some constant µ > 1, and let δ > 0
be sufficiently small. Then with high probability G contains a unique
component with more than δn vertices. This is called the giant component.

Proof. We construct G in two steps. In the first step, we insert every edge
with probability p1 := p − n^{−3/2}. In a second step, we insert every edge
with probability p2 such that (1 − p1)(1 − p2) = 1 − p, or equivalently

   p2 := (p − p1) / (1 − p1) = (1 + o(1)) n^{−3/2}.   (2.9)

Note that this gives exactly the correct probability for the Erdős-Rényi
graph, because the probability that an edge is not present after both rounds
is (1 − p1)(1 − p2) = 1 − p, independently for all edges. The second round
is also called sprinkling.
   Now we consider the graph G1 after the first round. Since there are
vertices for which the corresponding Galton-Watson process has infinite
size, there must be super-constant components, and it is easy to see that
there must also be components larger than C log n, where C is the constant
from Lemma 2.6. (Because the coupling from Theorem 2.3 still works after
the BFS has discovered O(log n) vertices.) By the same lemma, there are
no medium-sized components in G1.5 Therefore, there must be components
of size at least δn in G1.
   In the second round, we add at most O(n² p2) = O(√n) edges in expec-
tation, and also with high probability by the Chernoff bound. In particu-
lar, we cannot create a new linear-size component, since r edges can join
at most r + 1 components. Since all components of non-linear size have
size O(log n) in G1, the additional edges could create at most a component
of size O(√n log n). Thus every linear-size component in G must contain a
linear-size component from G1.

Footnote 5: We cheat very slightly here, since Lemma 2.6 assumes a constant µ, while the value
for G1 is µ1 := µ − n^{−1/2}, which depends on n. However, µ1 stays bounded away from one,
and is also bounded from above, and Lemma 2.6 also holds under this weaker condition.

   Assume that there are two linear-size components C1 and C2 in G1.
Since |C1|, |C2| ≥ δn, there are at least δ²n² pairs (v1, v2) with v1 ∈ C1 and
v2 ∈ C2. The probability that none of these pairs is hit by an edge is

   Pr[no pair hit] ≤ (1 − p2)^{δ²n²} ≤ e^{−p2 δ²n²} = e^{−Ω(√n)}.   (2.10)

So whp at least one such pair is hit, and the components C1 and C2 are
joined in G.
   Let k be the number of linear-size components in G1, and call them
C1, . . . , Ck. Then k ≤ 1/δ, since |Ci| ≥ δn for all i, and since the Ci
are disjoint. By a union bound and (2.10), the probability that there is
an i ∈ {2, . . . , k} that is not joined with C1 in the second round is O((k −
1) e^{−Ω(√n)}) = o(1). Hence, with high probability all components are joined
with C1 in the second round, and G only has a single giant component.
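The sprinkling trick rests on the identity (1 − p1)(1 − p2) = 1 − p: inserting each edge in two independent rounds with probabilities p1 and p2 is distributionally the same as inserting it once with probability p. A tiny Python check (with illustrative numbers; in the proof p1 = p − n^{−3/2}):

```python
import random

def two_round_edge(p1, p2, rng=random):
    """An edge is present if it appears in round 1 (prob. p1) or in round 2 (prob. p2)."""
    return rng.random() < p1 or rng.random() < p2

p = 0.3
p1 = p - 0.1                      # illustrative choice; any p1 < p works
p2 = (p - p1) / (1 - p1)          # chosen so that (1 - p1) * (1 - p2) = 1 - p
trials = 200_000
freq = sum(two_round_edge(p1, p2) for _ in range(trials)) / trials
print(f"empirical edge probability {freq:.4f}  vs  target p = {p}")
```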

2.3 Shortcomings of Gn,p: components, degrees, clustering, communities, distances

2.3.1 Component structure
In the previous sections, we have studied the component structures of Gn,p .
We have seen that in the supercritical regime, there is a single giant com-
ponent and a few smaller components, but no medium-size components.
The smaller components are indeed very small: the number of vertices in
components of size s drops exponentially with s. This is a pretty good
match for real-world networks, where we also usually have a giant compo-
nent and otherwise only very few small components. For example, the only
neural network that we know completely is that of C. elegans.6 It has a
system of 282 somatic neurons, with 514 connections between them. It has
a giant component of size 248, two smaller components of size 2 and 3, and
29 isolated nodes [VCP+ 11].
Footnote 6: Caenorhabditis elegans is a tiny worm, and its connectome is hard-coded in its genes.
So all adult worms of the same sex have the same set of neurons, and mostly the same set
of connections between them. For our discussion we only count the (undirected) electrical
connections, not the (directed) chemical ones.

A completely different example is the facebook network, where nodes
consist of (active) profiles, and links are given by facebook friendships. In
an analysis from 2011 [UKBM11], when the network had about n = 7·10^8
nodes (users) and m = 7·10^10 links, it was found that 99.91% of the users
were part of a single giant component.7 Most of the other components were
very small. The second-largest component had size about 2000.
Is this facebook data a good match for the Erdős-Rényi model? Partly.
On the one hand, qualitatively we get the right behaviour: a single giant
component, and most of the other vertices concentrating in very small com-
ponents. On the other hand, quantitatively the match is not so good. In
Erdős-Rényi graphs, the number of components of size s shrinks exponen-
tially in s and is bounded by n · η^s. Note that it hardly matters that the
factor n is pretty big, since s enters in the exponent. It is hard to argue for
a realistic value of η for which η^2000 is not astronomically small. For exam-
ple, n · 0.95^2000 ≈ 10^{−36}. Only for η ≈ 0.99 do we start to get n · 0.99^2000 ≈ 1.
But this would be a pretty extreme value of η. Since 0.99^100 ≈ 1/e ≈ 0.37,
it would mean that we should only see a small difference (a factor of three
or less) between the number of components of size s = 2 and of size s = 102.
Indeed, a closer analysis shows that the number of components of size
s in the facebook network does not decrease exponentially in s, but rather
polynomially. This means that the Erdős-Rényi model is missing something
about the structure of small components in the facebook graph. If we
want to understand the details of the component structures in the facebook
graph, then we will need to look for explanations which are not captured
by Erdős-Rényi graphs.

2.3.2 Degree distribution


The degree of a vertex in Gn,p is Bin(n, p)-distributed, which for p = µ/n is
approximately Po(µ). The Poisson distribution has exponentially decaying
tails, so the degrees are concentrated in a small range. In particular, it is
very unlikely for a vertex to have degree larger than O(log n/ log log n).8
This is not a good match for most real-world networks. One of the most
prominent properties is the existence of nodes with very different degrees.
Usually the degrees follow at least partly a power-law distribution. We will
return to this point in Chapter 3.

Footnote 7: The analysis excluded users without facebook friends, so by definition there were no
isolated nodes. The relative size of the giant component would probably be smaller if
those nodes were counted as well.

Footnote 8: This can be computed by a direct calculation or by using a Chernoff bound. For Chernoff
bounds, mind that there are weaker and stronger versions. The weaker versions may only
give you O(log n).

2.3.3 Clustering
We start with a definition.

Definition 2.8. For a graph G = (V, E), the clustering coefficient of a
vertex v is defined as

   CC(v) := |{ {u, u'} ∈ (V choose 2) : {u, v}, {u', v}, {u, u'} ∈ E }| / |{ {u, u'} ∈ (V choose 2) : {u, v}, {u', v} ∈ E }|   if deg(v) ≥ 2,

and CC(v) := 0 if deg(v) ≤ 1. Note that the denominator in the first case
is simply (deg(v) choose 2). The (local) clustering coefficient of G is then
defined as

   CC(G) := (1/n) ∑_{v∈V} CC(v).

The clustering coefficient can be interpreted in the following way. Choose a
vertex v uniformly at random. Afterwards, select two different neighbours
u, u' of v. Then the clustering coefficient is the probability that u and
u' are adjacent, see also Figure 2.1. (We count the probability as zero if
deg(v) ≤ 1.)
Some authors use an alternative definition, where the clustering coef-
ficient of G is three times the number of triangles in G, divided by the
number of paths of length 2. This is also sometimes called the global clus-
tering coefficient. The global clustering coefficient puts more weight on
large-degree vertices than the local clustering coefficient. In fact, the global
clustering coefficient can be written as the weighted average of all CC(v),
where vertex v is weighted by (deg(v) choose 2).
   For Gn,p with constant µ = pn > 1, the clustering coefficient is easy to
compute. If we pick a vertex v, then with constant probability it has degree
at least two. (In the limit this probability is Pr(Po(µ) ≥ 2) = 1 − e^{−µ}(1 + µ).)
Conditional on that, after picking two random neighbours u, u' of v, the
probability that they are connected is exactly p = µ/n. Thus the local
clustering coefficient is Θ(1/n).
Figure 2.1: The clustering coefficient CC(v) is the probability that two
random neighbours of v are adjacent.
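Both clustering coefficients are straightforward to compute from an adjacency structure. Here is a small Python sketch (not part of the script; the adjacency dictionary, function names and toy graph are illustrative) implementing Definition 2.8 and the wedge-based global variant.

```python
from itertools import combinations

def local_clustering(adj, v):
    """CC(v): fraction of pairs of neighbours of v that are adjacent (0 if deg(v) <= 1)."""
    neigh = adj[v]
    if len(neigh) <= 1:
        return 0.0
    links = sum(1 for u, w in combinations(neigh, 2) if w in adj[u])
    return links / (len(neigh) * (len(neigh) - 1) / 2)

def clustering_coefficient(adj):
    """The (local) clustering coefficient CC(G): the plain average of CC(v) over all vertices."""
    return sum(local_clustering(adj, v) for v in adj) / len(adj)

def global_clustering_coefficient(adj):
    """3 * (#triangles) / (#paths of length 2), counted as closed wedges over all wedges."""
    closed = wedges = 0
    for v in adj:
        for u, w in combinations(adj[v], 2):   # each wedge (path of length 2) centred at v
            wedges += 1
            if w in adj[u]:
                closed += 1
    return closed / wedges if wedges else 0.0

# Toy example: a triangle {0, 1, 2} with a pendant vertex 3 attached to 0.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
print(clustering_coefficient(adj))          # (1/3 + 1 + 1 + 0) / 4 = 0.5833...
print(global_clustering_coefficient(adj))   # 3 closed wedges / 5 wedges = 0.6
```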

For many real-world networks, this is a poor match. They have a rather
high clustering coefficient. This is not surprising. For example, in the
friendship network, if you pick two random friends of yourself, then what
is the probability that they know each other? It is not one, but it is not
Θ(1/n) either, where n ≈ 8·10^9 is the number of people on earth. In
fact, taking Gn,p as a model for the friendship network, this would predict
that the probability that two random friends of yours know each other is
exactly the same as the probability that two random people on earth know
each other. This is clearly a mismatch between Erdős-Rényi networks and
reality.

2.3.4 Communities
There are several definitions of communities, none of them very formal.
One is that communities are subgraphs that are much denser than the to-
tal graph. Another definition is that a community is a subgraph which
has much more internal edges (within the subgraph) than external edges
(from the subgraph to the remainder). It can be tricky to find an appro-
priate definition
 of “much” more. If we pick the densest subset S of size k,
then we have nk options for S. This is a huge number of options to pick
from, especially if k is large. Even random fluctuations can create a large
difference in such a case.
Let us look at a few examples to see how tricky the intuitive definition
can be. The model Gn,p is considered a graph without communities simply
due to its definition: the edges are all completely independent of each other.
Thus it is sometimes used as a baseline for the definition of communities,
i.e., communities are subgraphs that are denser than anything that we
typically find in Gn,p . But what do we find in Gn,p ?
Start in any vertex v in the giant component of Gn,p and let S be the
set of the first k vertices that we find via a BFS. For small k the induced
subgraph will typically be a tree, so it has k − 1 edges and average degree
roughly two. For µ = 1 + ε, this is almost twice as dense as the whole graph!
Superficially, it looks like a community. Even more extreme, if we choose S
to be the giant component itself, then it has average degree > 2, but zero
external edges! Again, this can easily be mistaken for a community. Or
take S as the union of all components of size exactly 2. Then S still has
a linear number of internal edges, but zero external edges. Nevertheless,
we would hardly want to classify S as a community.
While Gn,p does have sets which superficially look like communities, the
picture changes a bit if we restrict to connected subgraphs of Gn,p . This
restricts the number of options that we can choose from. For example,
for small k (e.g., constant), almost all connected subgraphs of k vertices
are trees, so they only have k − 1 edges, which is the minimal density
possible among all connected graphs. Globally, we would consider the
giant component C1 of Gn,p. This graph does not have subgraphs which are
much denser than C1 itself, and no subgraphs with many more internal than
external edges. For example, it is impossible to split the giant component
into two subsets of linear size such that the number of edges between both
sides is o(n) [LM01].9 On the other hand, real-world networks usually have
communities of all sizes, from small ones to linear-size ones, even within
the giant component. Thus Erdős-Rényi graphs are not a good model for
community aspects of graphs.

Footnote 9: Another application of the sprinkling technique.

2.3.5 Distances
If we pick two nodes u, v uniformly at random, what is the graph distance
between them, i.e., what is the length of a shortest path from u to v? This
is also called the typical distance of the graph.10 To answer this question
for Gn,p , we can use the following algorithm for finding a shortest path.
Assume we start a BFS from u and at the same time a BFS from v. Then
in depth d we find all vertices which have distance exactly d from u and v
respectively. Let us call the set of these vertices the d-th level. We continue
with the two BFS level by level (alternating between the two) until we find
a vertex w that belongs to both BFS trees. Then a shortest path from u
to v runs through w, and its length is given by the sum of the depths of w
in the two BFS trees, see Figure 2.2.


Figure 2.2: Shortest paths can be found by bidirectional breadth-first
search (BFS) from both endpoints.

Now let us analyze the process for Gn,p . We will only give an informal
argument, but it can be turned into a formal one with standard probabilistic
tools like the Chernoff bound. For each BFS, we know that it branches like
a Galton-Watson tree T . We have two possibilities for T . Either it dies out
quickly, and then the vertex is in a small component. Or it grows to infinite
size. In this latter case, it is not hard to see that it grows rather reliably.
Every vertex has offspring distribution Po(µ), which has expectation µ. If
we have x nodes in depth d, then in expectation the next level has size
µx. Since each node draws the number of its children independently, the
actual size of the next level is sharply concentrated around µx if x is large.
So after some initial phase (where x is still small), once x becomes large, it
will reliably grow by a factor of µ in each level.

Footnote 10: We do not give a formal definition of typical distances, as the term is used in different
ways. The most common situation is that one can show that for two random vertices u, v,
their distance is whp in some small interval. Then this interval is called the set of typical
distances.
One can indeed show that, if the process survives, whp the trees grow
like Θ(µ^d), up to some small fluctuations in the beginning that essentially
add or subtract a constant number of rounds. In other words, the number
of vertices at distance d from u is roughly Θ(µ^d), and similarly for v.
   When is the first time that we find a shared vertex w in both search
trees? You may have encountered a variation of this question as the birthday
paradox. If we have two random subsets11 of V of size s, then the expected
number of collisions (elements which appear in both subsets) is s²/|V|. So
if s = o(√n), then by Markov's inequality whp we will not see a collision,
but if s = ω(√n) then whp we do see collisions. In other words, the first
time when we encounter a collision is when s = Θ(√n). This happens
after d = log_µ(Θ(√n)) = (1/2) log_µ n ± O(1) rounds. Since d is the distance
from u to w and from w to v, the distance from u to v is 2d ≈ log_µ n =
Θ(log n). Summarizing, for two vertices u and v, conditional on being in
the giant component, the distance between them is typically log_µ n ± O(1) =
Θ(log n).

Footnote 11: In a formal proof one would need to argue why the nodes in the search trees are
random subsets, but this follows from the symmetry of Gn,p.
Is this a good match for real-world networks? As for the component
structures, the answer is mixed. On the one hand, log n is a pretty small
function, and we do see that real-world networks tend to have very small
typical distances. Graphs with typical distance O(log n) are called small-
world graphs. On the other hand, many real-world networks have ex-
tremely small typical distances, so small that log n might still be consid-
ered large. For example, the Facebook graph was studied in 2016 with
n = 1.6 · 10^9 nodes [EDF+ 16]. It has an average distance (in the giant compo-
nent) of 4.57, which is a surprisingly small number for more than a billion
users. It might still be compatible with average distances of ≈ log_µ n (espe-
cially because the Facebook graph has a large average degree of µ ≈ 200),
but we will learn later about models which have even smaller typical dis-
tances.
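
As a quick plausibility check of the claim that the measured value might still be
compatible with ≈ log_µ n, one can plug in the numbers from the study (a
back-of-the-envelope computation, not taken from [EDF+ 16]):

    import math
    n, mu = 1.6e9, 200                   # nodes and (rough) average degree of the Facebook graph
    print(math.log(n) / math.log(mu))    # log_mu(n) is roughly 4.0

The result, roughly 4.0, is indeed of the same order of magnitude as the measured
average distance of 4.57.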
More importantly, Erdős-Rényi graphs completely miss two aspects
which are crucial for average distances: on the one hand, they lack clustering
and communities, which increase typical distances. To see this, consider
the BFS search from a vertex v. For Erdős-Rényi graphs, we could assume
that all found vertices are new vertices, which we have not encountered
before. But in a graph with large clustering coefficient, this already goes
wrong in the second step. When we explore the first neighbour u of v, then
a large clustering coefficient means that many of the neighbours of u were
already neighbours of v, and thus have already been revealed in the first
step. Thus the BFS tree is much smaller than a corresponding Galton-
Watson tree. The existence of communities causes similar problems later
in the BFS search. Since the BFS trees grow slower with clustering and
communities, these phenomena increase typical distances.
On the other hand, Gn,p has a very homogeneous degree distribution.
As we will see for other models, the degree distribution plays an important
role for typical distances, and heterogeneous distributions can massively
decrease them. Thus Erdős-Rényi graphs lack two aspects that are both
important for typical distances, one which increases distances, and another
one which decreases them. So we should be careful when drawing conclusions
about graph distances from this model.
Chapter 3

Inhomogeneous Degree Distributions

For Erdős-Rényi random graphs, the degrees are concentrated in a small


interval, and it is unlikely to have degrees larger than O(log n/ log log n).
Many real-world networks are not like this. In a train network, hubs like
Zürich HB have many more connections than the average Swiss station. In
social networks, the number of friends or followers can span many orders
of magnitude. In the web graph, some pages have a tremendous number of
incoming links. In many of these cases, the degree distribution of the
network follows a power-law. This observation was first made by
Barabási and Albert [BA99]. By now there are many good and thorough
discussions of this phenomenon, for example Chapter 1.6 in the book by van
der Hofstad [VDH09] or the classical accounts in [BB03] and [New03].1 In
this chapter we will study models which have power-law degree distribu-
tions built into them.
1
There are exceptions, of course. For example, the street network of a city, where roads
are edges and road intersections are vertices, does not have a power-law degree distribution.
Also, there is ongoing debate about how well power-laws approximate the real degree distribu-
tion for various networks, see for example the discussion in [VDH09, Chapter 1.6]. This
issue can be circumvented by working with more general classes of degree distributions,
but we will ignore them for this lecture.

3.1 Power-laws
3.1.1 Power-law probability distributions
We start with a general introduction to power-laws. They are also some-
times called scale-free.2 Throughout the section, D will denote a proba-
bility distribution either on N0 or on [1, ∞). Let X ∼ D, and let f(x) be
the probability density function of D (if it exists). By abuse of notation,
we will usually write Pr[X = x] instead of f(x). This notation is not prob-
lematic if the distribution lives on N0, but for continuous distributions on
[1, ∞), the term Pr[X = x] does not literally mean “the probability of the
event X = x”, which has probability zero for decent continuous distribu-
tions. However, there should never be room for confusion, since we will
never consider zero-probability events of this type in this lecture. This
convention allows us to treat distributions over N0 and over [1, ∞) at the
same time. For the cumulative distribution function Pr[X ≤ x] of D, the
notation is not problematic in either case.
2
Some authors reserve the phrase scale-free for power-laws with exponent τ ∈ (2, 3).

Definition 3.1. Let D be a probability distribution on [1, ∞), and let
X ∼ D. We say that

(i) D follows a (strict density) power law with exponent τ > 1 if
Pr[X = k] = Θ(k^{−τ}) for k → ∞.

(ii) D follows a weak density power law with exponent τ > 1 if
Pr[X = k] = k^{−τ ± o(1)} for k → ∞.

(iii) D follows a cumulative power-law with exponent τ > 1 if
Pr[X ≥ k] = Θ(k^{1−τ}) for k → ∞ (strict power law);
Pr[X ≥ k] = k^{1−τ ± o(1)} for k → ∞ (weak power law).

In this definition, Θ(·) and o(·) are taken with respect to k → ∞.

For our purposes, the differences between the four different versions of
power-laws do not matter much. A strong/weak density power-law implies
a strong/weak cumulative power-law, but not vice versa. In this lecture, we
will usually assume strict density power-laws, i.e., we make the strongest
possible assumption. We do this to simplify calculations, not because it is
necessary. Without further specification, “power-law” means “strict density
power-law”.

3.1.2 Power-law sequences


We have defined power-laws for probability distributions, but what does
it mean that the degrees of a random network model G = Gn follow a
power-law? This question is a bit more tricky. The first attempt would
be to draw a random vertex v from Gn , compute the limiting probability
pk := limn→∞ PrGn [deg(v) = k], and require that the pk converge to the
density function of a power-law probability distribution. For example, for
Erdős-Rényi graphs Gn,p with µ = np = Θ(1), we know that pk = e−µ µk /k!.
This is indeed the density of a probability distribution (not a power-law),
namely of Po(µ). However, this approach does not work well in general,
for two reasons.
First, we require that the pk form a probability distribution. This does
not have to be the case in general. For example, consider Gn,p with any
p = ω(1/n). Then pk := limn→∞ PrGn [deg(v) = k] = 0 for every fixed k.
Of course, the function with pk = 0 for all k does not correspond to a probability
distribution. The second issue is that the pk only carry information about
constant k. Remember the fraction ps of vertices in components of size
s? They do not sum up to one if there is a giant component. That is
exactly the same problem as here. The ps do not account for components
of growing sizes, and the pk do not account for growing degrees. But the
whole point of power-law distributions is that we find vertices of very large
degree in the graph, e.g., vertices of degree Θ(√n) if τ < 3 (more on that
below). We cannot account for that by the limits pk .
Instead, we need to take a more direct approach. We require that the
number of vertices of degree k (at least k) is roughly the same as predicted
by a power law, up to some point which depends on n.

Definition 3.2. Let G(n) be a sequence of random graphs, let Nd,n be
the number of vertices of degree d in G(n), and let D = D(n) ∈ [1, n]
be a function of n. We say that G(n) follows a (strict, density) power-
law with exponent τ up to D if there are constants c1 , c2 > 0 such
that with high probability

∀ 1 ≤ d ≤ D : Nd,n = Θ(d^{−τ} n),    (3.1)

where the hidden constants are uniform over all d. In other words,
we require that there are c1 , c2 > 0 such that with high probability
the following holds.

∀ 1 ≤ d ≤ D : c1 d^{−τ} n ≤ Nd,n ≤ c2 d^{−τ} n.    (3.2)

We can analogously define weak power-laws, and (strict or weak)
cumulative power-laws.
We say that the power-law has negligible cut-off error if with
high probability Σ_{d=D+1}^{n} Nd,n = o(n).

There are some differences between power-laws for probability distribu-


tions and for degree sequences that are worth pointing out. Most obviously,
Definition 3.2 only requires that degrees up to D are well-behaved. For a
finite graph, we can not expect (3.1) to hold without restriction on D. If
we plugged in d = ω(n^{1/τ}) then d^{−τ} n = o(1), so the integer Nd,n can
not be in Θ(d^{−τ} n). Many models have (density) power-law degrees up to
D = n^{1/τ − ε} for any fixed ε. We remark that the requirement for cumulative
power-law degrees is N_{≥d,n} = Θ(d^{−τ+1} n), where N_{≥d,n} is the number of ver-
tices of degree at least d in G(n). This requirement can hold a bit longer,
and in many models it holds until D = n^{1/(τ−1) − ε}. This is why one can
sometimes prove stronger results by working with cumulative power-laws.
A second difference between power-laws of distributions and degree
sequences is that (3.1) can actually be checked for a concrete network.
In practice, power-laws (of degrees, but also of any other quantity) can
be checked by plotting Nd,n on a log-log-plot. On such plots, a power-
law corresponds to a straight line: if Nd,n = c · d^{−τ} n, then log(Nd,n ) =
log(cn) − τ log d, so there is a linear relation between the quantities log d
and log(Nd,n ) that are used in the axes of a log-log-plot. Moreover, the
slope of the line is −τ, so we can recover the power-law parameter by esti-
mating the slope of the line in the log-log-plot. In practice one indeed often
finds a linear relationship in log-log-plots, though often with a cut-off point
that is earlier than the theoretically achievable value. We refer the reader
to Figure 5.5 in [LNR17] and Chapter 1.6 in [VDH09] for log-log-plots of
the degree distributions of various real-world networks.
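
As an illustration of this procedure, the following Python sketch estimates the
exponent of a given degree sequence by fitting a straight line in log-log space
(a crude estimator; in practice one should restrict the fit to degrees below the
cut-off and use more robust methods, but it conveys the idea).

    import numpy as np

    def estimate_exponent(degrees, d_max=None):
        """Estimate tau by a least-squares fit of log(N_{d,n}) against log(d)."""
        counts = np.bincount(np.asarray(degrees))    # counts[d] = N_{d,n}
        d = np.arange(1, len(counts))
        N = counts[1:]
        mask = N > 0
        if d_max is not None:                        # ignore degrees beyond the cut-off
            mask &= d <= d_max
        slope, _ = np.polyfit(np.log(d[mask]), np.log(N[mask]), 1)
        return -slope                                # N_{d,n} ~ d^(-tau) * n, so the slope is -tau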

3.1.3 Properties of power-laws


The most important case for us will be power-law exponents τ ∈ (2, 3). For
most real-world networks, the degree sequences have power-law exponents
in this range (though there are exceptions). During this course we will
understand why this is the interesting case. As the next lemma shows,
this is the range where the distribution has finite expectation, but infinite
second moment.

Lemma 3.3. Let D be a distribution which follows a power-law with


exponent τ > 1 for any of the four variants in Definition 3.1, and let
X ∼ D.

(i) If τ < 2 then E[X] = ∞.

(ii) If τ > 2 then E[X] < ∞.

(iii) If τ < 3 then E[X2 ] = ∞.

(iv) If τ > 3 then E[X2 ] < ∞.

Proof. We only give the calculation in the simplest case of a strong power-
law for densities. Then

E[X] = Σ_{k=0}^{∞} k · Pr[X = k] = Θ(Σ_{k=1}^{∞} k · k^{−τ}) = Θ(Σ_{k=1}^{∞} k^{−τ+1}),

and the sum is finite for τ > 2 and infinite for τ < 2. Note that Θ(·) was
taken with respect to the limit k → ∞ and thus the hidden factors are
independent of k (and not, as usual, independent of n). This is why we
could take Θ(·) out of the sum.
For the second moment, the calculation is almost identical:

E[X^2] = Σ_{k=0}^{∞} k^2 · Pr[X = k] = Θ(Σ_{k=1}^{∞} k^2 · k^{−τ}) = Θ(Σ_{k=1}^{∞} k^{−τ+2}),

which is finite for τ > 3 and infinite for τ < 3.
For power-laws of the cumulative distribution, a sum like Σ_{k=0}^{∞} k^2 ·
Pr[X = k] can be linked to the sum Σ_{k=0}^{∞} k · Pr[X ≥ k] via Abel summation,
the discrete version of integration by parts. We do not give the details.

We remark that for the threshold cases τ = 2 and τ = 3 in Lemma 3.3,


it depends on the exact distribution whether the corresponding moments
are finite or infinite. As usual for this course, we do not go into details for
such threshold cases.
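
The dichotomy in Lemma 3.3 can also be observed experimentally. The following
sketch samples from a Pareto-type distribution (density proportional to x^{−τ} on
[1, ∞), obtained by inverse transform sampling) and prints empirical first and
second moments; the sample size and parameter values are purely illustrative.

    import numpy as np
    rng = np.random.default_rng(0)

    def power_law_samples(tau, size):
        # if U is uniform on (0,1), then U^(-1/(tau-1)) has density ~ x^(-tau) on [1, inf)
        return rng.random(size) ** (-1.0 / (tau - 1.0))

    for tau in [2.5, 3.5]:
        x = power_law_samples(tau, 10**6)
        print(tau, x.mean(), (x**2).mean())

For τ = 3.5 both empirical moments stabilize as the sample grows, while for
τ = 2.5 the empirical second moment is dominated by the few largest samples and
does not converge, in line with Lemma 3.3.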
Lemma 3.3 has an important implication. Let us forget momentarily
about the cutoff in Definition 3.2. If X ∼ D is the distribution of degrees
in a network model, then E[X] corresponds to the expected degree, and
E[X] · n/2 is the expected number of edges in the graph. More formally, let
us consider the case E[X] = c < ∞ and let us assume that the number of
edges is concentrated around its mean and that the power-law has negligible
cut-off error. Then whp the number of edges is c/2 · n ± o(n), and the average
degree is c ± o(1). On the other hand, if E[X] = ∞, then we can not ignore
the cutoff in Definition 3.2, since the number of edges in an n-vertex graph
can not be infinite. However, Corollary 3.4 below tells us that this corresponds
to an average degree of ω(1), or equivalently to ω(n) edges.

Corollary 3.4. Let G(n) be a sequence of random graphs, D = D(n) =


ω(1) and assume that the degrees in G(n) follow a power-law with
exponent τ up to D as in Definition 3.2.

(i) If τ < 2, then whp the number of edges in G is ω(n).

(ii) If τ > 2, let m_{≤D} be the number of edges which have at least
one endpoint of degree at most D. Then E[m_{≤D}] = O(n).
Moreover, if the power-law has negligible cut-off error then whp
m = m_{≤D} ± o(n).


Proof. (i). Let Nd,n be the number of vertices of degree d in G(n). Let
C > 0 be arbitrary. We will show that whp m > Cn if n is sufficiently
large.
By (3.2), the number of edges is at least

m = (1/2) Σ_{v∈V} deg(v) = (1/2) Σ_{d=1}^{n} d · Nd,n  ≥whp  (1/2) Σ_{d=1}^{D} d · c1 d^{−τ} n = (c1 n / 2) Σ_{d=1}^{D} d^{1−τ}.    (3.3)

Since 1 − τ > −1, the sum over d^{1−τ} diverges. Hence there is a constant
k such that Σ_{d=1}^{k} d^{1−τ} > 2C/c1. If we make n large enough, then D =
D(n) ≥ k, and (3.3) implies m > Cn.
(ii). We make a similar computation to (3.3). Since we want to estimate
m_{≤D} instead of m, we may use the following upper bound. (It is not an
equality because we count edges twice if both their endpoints have degrees
at most D.)

m_{≤D} ≤ Σ_{d=1}^{D} d · Nd,n  ≤whp  Σ_{d=1}^{D} d · c2 d^{−τ} n = c2 n Σ_{d=1}^{D} d^{1−τ}.    (3.4)

Unlike in (i), the sum in (3.4) converges to a constant C since 1 − τ <
−1. Hence whp m_{≤D} ≤ Cc2 n. The second statement in (ii) follows directly from
the definition of negligible cut-off error, since m − m_{≤D} ≤ Σ_{d=D+1}^{n} Nd,n =whp
o(n).

We have found out that τ > 2 leads to sparse graphs (constant degrees),
while τ < 2 does not. Since we are interested in sparse graphs in this course,
we will from now on restrict to τ > 2. Let us compute how many edges
come from vertices of degree between K and D, for some large K. By a
similar calculation as above, the number of incident edges to such vertices
is at most

Σ_{d=K}^{D} d · (c2 d^{−τ} n) ≤ Σ_{d=K}^{∞} c2 d^{1−τ} n = Θ(∫_K^∞ c2 x^{1−τ} n dx) = Θ(K^{2−τ} n).

Since τ > 2, the exponent of K is negative. So if K is large, then this is only


a small fraction of all edges. In other words, most edges will not have an
endpoint of degree larger than K. A common way to phrase this is: “Most
edges run between vertices of small degrees”. Yet another formulation of
the same fact is that if we pick a random edge, and then a random endpoint
of this edge, then the endpoint will likely have small degree. It is easy to
see that the opposite is true for τ < 2. There, a random endpoint of a
random edge likely has very large degree.
3.2 The Chung-Lu model
In the following sections we will discuss several random graph models that
have power-law degree distributions. The easiest one is the Chung-Lu
model, named after Fan Chung and Linyuan Lu, who analyzed the model
in detail [CL02].3 It is also known as the weighted Erdős-Rényi model or
simply as weighted random graphs or generalized random graphs.
3
The original model of Chung and Lu differed in some details, see Section 3.2.4.

Definition 3.5. Let D be a power-law distribution on [1, ∞) with ex-


ponent τ > 2. A Chung-Lu random graph on n vertices is obtained
by the following two-step procedure.

1. Every vertex v ∈ V draws i.i.d. a weight wv ∼ D.

2. For any two distinct vertices u, v ∈ V, we insert the edge {u, v}
with probability

puv := min{1, wu wv / n},    (3.5)

independently for all {u, v}.
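
A direct (and deliberately naive) Python sketch of this two-step procedure could
look as follows; it samples the weights by inverse transform (U^{-1/(τ-1)} has
density proportional to w^{−τ} on [1, ∞)) and loops over all pairs, which costs
Θ(n²) time and is therefore only suitable for small experiments.

    import numpy as np

    def chung_lu_graph(n, tau, rng=None):
        """Sample weights with density ~ w^(-tau) on [1, inf) and insert each
        edge {u, v} independently with probability min(1, w_u * w_v / n)."""
        rng = rng or np.random.default_rng()
        w = rng.random(n) ** (-1.0 / (tau - 1.0))
        edges = []
        for u in range(n):
            for v in range(u + 1, n):
                if rng.random() < min(1.0, w[u] * w[v] / n):
                    edges.append((u, v))
        return w, edges

There are faster samplers that avoid inspecting all Θ(n²) pairs, but they are not
needed for the small experiments discussed in this chapter.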

3.2.1 Degree distribution


The minimum in (3.5) needs to be present so that puv is a probability.
A fundamental fact about this formula is that it converts weights into
expected degrees, as the following lemma shows.

Lemma 3.6. Let v be a vertex of weight wv ≤ n in the Chung-Lu
model. Then

E[deg(v)] = Θ(wv).    (3.6)

The hidden constants are independent of v and wv.

Proof. Let u ∈ V \ {v}. Assume first that wv ≤ n/2, or rephrased that
n/wv ≥ 2. We can calculate the expected degree of v as follows. (See also
Excursion 3.1 below.)

E[deg(v)] = (n − 1) · Pr[v ∼ u]
          = (n − 1) · ∫_1^∞ Pr[wu = w] · Pr[v ∼ u | wu = w] dw
          = Θ(n) · ∫_1^∞ w^{−τ} · min{1, w wv / n} dw
          = Θ(n) · ( ∫_1^{n/wv} w^{−τ} · (w wv / n) dw + ∫_{n/wv}^∞ w^{−τ} · 1 dw )
          = Θ(wv) · ∫_1^{n/wv} w^{1−τ} dw + Θ(n) · [w^{1−τ}]_{n/wv}^{∞}
          = Θ(wv) + Θ(n · (n/wv)^{1−τ})            (using τ > 2)
          = Θ(wv) + Θ((n/wv)^{2−τ} · wv) = Θ(wv),

where the last step uses that n/wv ≥ 1 and τ > 2. One checks that the
same end result also holds for n/2 < wv ≤ n.

The assumption wv ≤ n in Lemma 3.6 is not problematic. Let us call
n_{≥w} the number of vertices of weight at least w. Then E[n_{≥w}] = Θ(n w^{1−τ}).
In particular, for wmax := n^{1/(τ−1)} this expectation is Θ(1). This means that
the largest weight among the n vertices is roughly wmax. For w = ω(wmax)
we have E[n_{≥w}] = o(1), so such vertices are unlikely to exist by Markov’s
inequality. On the other hand, if w = o(wmax) then E[n_{≥w}] = ω(1), and
whp n_{≥w} > 0 by the Chernoff-Hoeffding bound. For τ ∈ (2, 3), we can
bound n^{1/2} ≤ n^{1/(τ−1)} ≤ n, so whp the largest weight is larger than √n,
but smaller than n.
We can actually say a bit more about the exact distribution of degrees.
Consider a vertex of weight wv , i.e., we fix the weight of this vertex, but
consider the other weights still as random. Then every other vertex has
the same probability pv = Θ(wv /n) to connect to v, so the degree of v
is Bin(n − 1, pv )-distributed. If pv = o(1) (or equivalently wv = o(n),
which whp holds for all vertices), then this converges to Po((n − 1)pv ).
We say that deg(v) converges to a mixed Poisson distribution, which is
a Poisson distribution Po(X), where X is again a random variable. In this
case, X = E[deg(v) | wv ] = Θ(wv ).
Recall that the Poisson distribution is highly concentrated around its
expectation if the expectation is large. This means that the degree of a
vertex of large weight is concentrated around its expectation, which is (up
to a constant factor) the same as its weight. For example, it can be shown
that whp all vertices of weight at least log n have degrees which match their
expectations up to a constant factor. As a consequence, the degree sequence
follows a power-law with the same exponent τ as the weight distribution.
This is the main purpose of the model: to generate graphs with a power-law
degree distribution of exponent τ.
Excursion 3.1 (Smart with integrals). We will see computations as in the proof of Lemma 3.6
over and over again, so let us make a detour to discuss the structure of the calculation in
a bit more detail. It looks messy at first glance, but it becomes much easier if one knows
what to look for.
We obtained a sum of two integrals,

I1 := ∫_1^{n/wv} w^{−τ} · (w wv / n) dw    and    I2 := ∫_{n/wv}^∞ w^{−τ} · 1 dw.

In the proof we evaluated both of them, and it was a bit tricky to see that the second one
is negligible. However, none of this was actually necessary. With the right perspective,
it was a priori clear that the second integral was negligible.
Let us first summarize what we have already used in the computation. Both integrals
are over polynomials in w. Integrating over polynomials is easy. Since we do not consider
threshold cases, we only have exponents s ≠ −1, and the antiderivative of w^s is
(1/(1+s)) w^{1+s}. The integral ∫_{w0}^{w1} w^s dw can thus be easily evaluated and is
[(1/(1+s)) w^{1+s}]_{w0}^{w1}. If s > −1, then w^{1+s} is increasing, and we get Θ(w1^{1+s}). (We assume here that w1 is at
least by a constant factor larger than w0.) If s < −1, then w^{1+s} is decreasing, so we
obtain Θ(w0^{1+s}). Note that the signs also work out in the second case, since there are
two canceling minus signs: from 1/(1 + s) < 0 and from evaluating the lower boundary.
So the integral is always dominated either by the upper boundary term or by the lower
boundary term.
Let us call I1^low and I1^upp the two terms needed to evaluate I1, i.e., I1^low and I1^upp
are obtained by plugging the lower (the upper) boundary into the antiderivative
(1/(2−τ)) w^{2−τ} · wv/n, and similarly for I2. In the proof of Lemma 3.6, the dominating term
was given by the lower boundary for both integrals, and we evaluated both I1^low and I2^low.
However, a more clever argument uses the observation that |I1^upp| = Θ(|I2^low|). Before we
argue why this is generally true, let us check this by hand. For |I1^upp|, we need to plug
n/wv into the antiderivative w^{2−τ} · wv/n, and obtain up to constant factors

|I1^upp| = Θ((n/wv)^{2−τ} · wv/n) = Θ((n/wv)^{1−τ}).

For |I2^low|, we plug n/wv into w^{1−τ}, and obtain the same term.
This is an incredibly helpful observation. Knowing this relation, we know that
• I1^low dominates I1^upp.
• I1^upp is of the same order as I2^low.
• I2^low dominates I2^upp.
From these observations it is obvious that I1^low dominates everything else. Even more, in
the proof of Lemma 3.6 we had to give special treatment to the case n/2 < wv ≤ n,
because our estimation of I1 may fail if wv is very close to n. This is because if the upper
and lower boundary (1 and n/wv) are too close to each other, the terms I1^low and I1^upp
are very similar to each other and may cancel. (They have opposite signs.) But in fact,
we have nothing to fear: if I1^low and I1^upp are of the same order, then we have a third
term I2^low which is also of the same order and which can not be canceled out. So by our
meta-argument it becomes obvious that the asymptotics do not change in this case either.
Why is it generally true that |I1^upp| = Θ(|I2^low|)? This is because the two integrals I1
and I2 were obtained by splitting a single integral I = ∫_1^∞ f(w) dw into two parts. But
I was taken over a continuous function f. (The min function is not smooth, but it is
continuous.) This means that the functions in I1 and I2 take the same values when the
splitting point wsplit = n/wv is plugged in; it is just the same as plugging wsplit into f.
How do we get I1^upp and I2^low from f(wsplit)? Since we integrate over polynomials in both
cases, we just increase the exponent by one and plug in wsplit, so in both cases we obtain
wsplit · f(wsplit) up to constant factors.
The argument may seem a little magic, so let us rephrase it in terms of the quantities
that we compute. We compute how many neighbours a vertex v of weight wv has in
expectation. The integral I has a natural interpretation: the range from w0 to w1 gives
us the number of neighbours with weight between w0 and w1 (all in expectation). The fact
that the first integral has exponent < −1 tells us that there are more neighbours of constant
weight than of larger weight, for example of weight in the interval [wsplit /2, wsplit ]. The
key insight is that the latter number is essentially the same as the number of neighbours
with weight in [wsplit , 2  wsplit ], due to three ingredients:
1. Both intervals have the same length, up to a constant factor of 2.
2. The probability density is the same up to a constant factor: increasing w by a
constant factor κ decreases the probability density Pr[wu = w] by the constant
factor κ^{−τ}.
3. The connection probability Pr[u ∼ v | wu = w] only changes by a constant factor
if we vary w within [wsplit/2, 2 · wsplit]. This is because the connection probability
is continuous and piecewise smooth in wu .
So let us summarize how we should actually think about Lemma 3.6. (All statements
about expectations.)
(a) Since I1 has exponent < −1, there are more neighbours of constant weight than of
larger weights, in particular than weights in [wsplit /2, wsplit ].
(b) There are the same number of neighbours with weights in [wsplit/2, wsplit] and with
weights in [wsplit, 2 · wsplit], up to constant factors.
(c) Since I2 has exponent < −1, there are more neighbours with weights in [wsplit, 2 · wsplit]
than of larger weights.
Thus, most neighbours have constant weight, and we can neglect any term that comes
from neighbours of larger weight. It thus suffices to compute how many neighbours of
constant weight there are. This is easy to compute. There are Θ(n) vertices of constant
weight, and each of them has probability Θ(wv/n) to connect to v. Hence, v has Θ(wv)
neighbours.
As a final exercise, let us try to apply the same reasoning for 1 < τ < 2. How many
neighbours does v have in this case? We still obtain the same integrals I1 and I2, but
now the exponent 1 − τ of I1 is larger than −1. This means that v has more neighbours of
weight in [wsplit/2, wsplit] than of smaller weights. On the other hand, the exponent −τ
in I2 is still smaller than −1, so there are more neighbours with weight in [wsplit, 2 · wsplit]
than neighbours with larger weight. So we only need to evaluate either I1^upp or I2^low (both
automatically give the same value up to constant factors). The term I2^low looks a bit
simpler, so we plug wsplit = n/wv into the antiderivative w^{1−τ}, multiply by the prefactor
Θ(n) from the proof of Lemma 3.6, and obtain that v has
Θ(n · (n/wv)^{1−τ}) neighbours in expectation.4
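
The structure of this computation can also be verified numerically. Using the
closed-form antiderivatives, the following sketch evaluates I1 and I2 for a few
weights and multiplies by the prefactor Θ(n) from the proof of Lemma 3.6
(constants and the normalization of the density are ignored, so only the
proportionality to wv matters here).

    def I1(wv, n, tau):
        # integral of (wv/n) * w^(1-tau) over [1, n/wv]
        b = n / wv
        return (wv / n) * (b ** (2 - tau) - 1.0) / (2 - tau)

    def I2(wv, n, tau):
        # integral of w^(-tau) over [n/wv, infinity)
        return (n / wv) ** (1 - tau) / (tau - 1)

    n, tau = 10**6, 2.5
    for wv in [1, 10, 100, 1000]:
        print(wv, n * (I1(wv, n, tau) + I2(wv, n, tau)))

The printed values grow linearly in wv, and I2 contributes only a lower-order
term, as predicted by the discussion above.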

3.2.2 Friends of your friends


We have seen that the degree sequence follows a power-law with exponent
τ, because degrees are tightly coupled to the weights. In this section we
will ask a perhaps surprising question. If we pick a random vertex v,
and then a random neighbour u of v (assuming deg(v)  1), what is the
weight (or degree) distribution of u? It is not the same distribution as the
unconditional distribution Pr[wu = w], since the event “u ∼ v” is positively
correlated to large weights, so the distribution Pr[wu = w | u ∼ v] will also
be skewed towards larger weights. This has the psychologically surprising
consequence that the neighbours of a typical node v tend to have higher
degrees than v. It is also known as the friendship paradox: your friends
have more friends than you! [Fel91]
How many more friends do your friends have compared to yourself? In
an Erdős-Rényi random graph, there is no correlation between edges, so
the number of neighbours of u is 1 + Bin(n − 2, p): they have neighbour
v, all other neighbours are as likely as before. So the asymptotic answer
is 1 + Po(µ) versus Po(µ), and neighbours of a random node have their
degree increased by exactly one in the limit. But the degree distribution in
real-world network is a power-law, and there the difference is much more
dramatic.
4
This calculation tells us a lot: most of the neighbours have weight roughly wsplit.
Note that the connection probability is one if wu ≥ wsplit, and Θ(n · wsplit^{1−τ}) is simply the
number of vertices of weight at least wsplit. So we have found out that v connects to all
vertices of weight larger than wsplit, there are Θ(n · wsplit^{1−τ}) such vertices, and this is more
than the number of neighbours of smaller weight.
Theorem 3.7. Let G be a Chung-Lu random graph with power law
exponent τ > 2. Let v be a vertex of weight wv , and let u be a
uniformly random neighbour of v, conditioned on deg(v) ≥ 1. We
denote W2 = wu . Then W2 follows a power-law distribution with
exponent τ − 1 up to cut-off D = n/wv .

Proof. We need to compute Pr[wu = w | u ∼ v] for w ≤ D = n/wv. Note
that this is a density, not a probability. Nevertheless, the usual formula for
conditional probabilities still applies:

Pr[wu = w | u ∼ v] = Pr[wu = w and u ∼ v] / Pr[u ∼ v]
                   = Θ(1) · w^{−τ} · min{1, w wv/n} / (wv/n) = Θ(w^{1−τ}),

since the minimum is taken by the second term for w ≤ D.
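
The friendship paradox can be observed directly in a sampled Chung-Lu graph,
for example with the chung_lu_graph sketch from earlier in this section (assumed
to be in scope): compare the average degree of a random vertex with the average
degree of a random neighbour of a random vertex. All parameter values below are
illustrative.

    import numpy as np
    rng = np.random.default_rng(1)

    n = 5000
    w, edges = chung_lu_graph(n, tau=2.5, rng=rng)
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    deg = np.array([len(a) for a in adj])

    with_neighbour = [v for v in range(n) if deg[v] >= 1]
    avg_degree = deg[with_neighbour].mean()
    avg_neighbour_degree = np.mean([deg[rng.choice(adj[v])] for v in with_neighbour])
    print(avg_degree, avg_neighbour_degree)    # the second value is much larger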

Of course, since degrees and weights are tightly coupled, a similar state-
ment would be true for the distribution of the degree of u instead of the
weight of u. Theorem 3.7 is rather remarkable since it says that the distri-
bution of wu does not depend on wv . This property is also called neutral
assortativity (or no assortativity). Assortativity is a measure for how much
the distribution of deg(u) depends on the value of deg(v), and in which di-
rection this connection goes. We do not give a formal definition (since there
are several competing ones), but informally speaking a graph has positive
assortativity if the distribution of deg(u) is more skewed towards larger
values if deg(v) is large, and more skewed towards smaller values if deg(v)
is small. If the connection goes into the opposite direction, we speak of
negative assortativity. As a rule of thumb, social networks tend to have
positive assortativity (nodes of large degree connect especially well to other
nodes of large degree), while many technological networks have negative as-
sortativity (nodes of large degree connect well to nodes of small degree).
Note that in the most common case τ ∈ (2, 3), the random variable W2
in Theorem 3.7 has infinite expectation. A bit more precisely, the limiting
distribution for n → ∞ has infinite expectation, while for finite n there
is a cut-off point that goes to infinity. Pointedly speaking, while you have
a constant number of friends, your friends have in expectation an infinite
number of friends.5
5
This is true for all social networks with power-law exponent τ ∈ (2, 3). A Swedish
study found that the network of sexual contacts has a power-law exponent of 2.6 [LEA+ 01].
Draw your own conclusions.
Of course, it is prevented by the cut-off point and the finite size of
the universe that your friends actually have infinitely many friends. But
the cut-off point goes to infinity as a polynomial in n, so it grows rather
fast. The expectation of wu is really large even for finite networks. On the
other hand, this is one of the cases where the expectation is dominated by
low-probability events (the rare event that you have a superstar as friend:
for most people it does not happen, but the slim chance that it happens
dominates the expectation). So for most people the situation looks a bit
less depressing. A more accurate estimate of the weight of your most popular
friend is given by Theorem 3.8.

Theorem 3.8. Let G = (V, E) be a Chung-Lu random graph with
power-law exponent τ > 2, and let v ∈ V be a vertex of weight
wv = w with 2 ≤ w ≤ n^{(τ−2)/(τ−1)}.
Let ε > 0 be a constant, and let wmax be the highest weight among
the neighbours of v. Then for sufficiently large w,

(i) wmax ≥ w^{1/(τ−2)−ε} with probability 1 − e^{−w^{Ω(1)}}.

(ii) wmax ≤ w^{1/(τ−2)+ε} with probability 1 − w^{−Ω(1)}.

The hidden constants are uniform over all w.

Proof. We know by Theorem 3.7 that the neighbours of v follow a power-
law with exponent τ − 1. A density power-law implies a cumulative power-
law. Therefore, if u is a neighbour of v then px := Pr[wu ≥ x | u ∼
v] = Θ(x^{2−τ}). Hence, each of the n − 1 other vertices has probability
qx := Pr[u ∼ v] · Pr[wu ≥ x | u ∼ v] = w/n · px to be a neighbour of
v of weight at least x. This is independent for all u since we have fixed
the weight wv = w. Hence, the number of such vertices is Bin(n − 1, qx )
distributed.
Now we compute the expectation of this binomial distribution. For
(i), we plug in x := w^{1/(τ−2)−ε}. The condition wv ≤ n^{(τ−2)/(τ−1)} ensures
that x is smaller than the cut-off point D = n/w in Theorem 3.7, since
x · w = w^{(τ−1)/(τ−2)−ε} ≤ n. We obtain an expectation of

(n − 1)qx = Θ(w px) = Θ(w) · w^{(1/(τ−2)−ε)(2−τ)} = Θ(w) · w^{−1+Ω(1)} = Θ(w^{Ω(1)}).

Since we assumed w ≥ 2, we can omit the Θ in the last expression:6 for
any constant C > 0, we can simplify Cw^{Ω(1)} into w^{Ω(1)}. Hence the number
of neighbours of v of weight at least x is binomially distributed with ex-
pectation w^{Ω(1)}. By the Chernoff bounds, the probability to have no such
neighbour is exponentially small in w^{Ω(1)}.
For (ii), we plug in x := w^{1/(τ−2)+ε}, and perform an analogous calculation.
We obtain that the expected number of neighbours of v with weight at least
x is w^{−Ω(1)}. By Markov’s inequality, the probability that there exists such
a neighbour is also at most w^{−Ω(1)}.
6
This is the only reason for assuming w ≥ 2, to get rid of some notational ballast. This
simplification would not be true for w = 1.

In Theorem 3.8, note that the exponent 1/(τ − 2) is larger than one
for τ ∈ (2, 3). Hence, v has a neighbour of much larger weight than v
itself. On the other hand, if τ > 3 then for a large-weight vertex v typically
all its neighbours have much smaller weight than v. However, the model
is generally less interesting for τ > 3. The large-degree vertices are then
negligible for most questions since there are too few of them, for example for
the small-world properties discussed in the next section. In such respects,
the Chung-Lu model for τ > 3 behaves just like the Erdős-Rényi model.
For the probabilities in Theorem 3.8, both terms approach 1 as w in-
creases. However, the probability in (i) approaches 1 very rapidly as w in-
creases (“stretched exponentially”), while the probability in (ii) approaches
1 more slowly (polynomially fast in w).

3.2.3 Ultra-small worlds


A remarkable consequence of Theorem 3.8 is that the typical distances in
Chung-Lu graphs are even smaller than in Erdős-Rényi random graphs if
τ < 3. They are O(log log n). Graphs like this are also called ultra-small
worlds.

Theorem 3.9. Let G be a Chung-Lu random graph on n vertices with
power-law exponent τ ∈ (2, 3), and let x, y be two vertices drawn
uniformly at random from the giant component. Then the graph
distance d(x, y) satisfies

d(x, y) = (2 ± o(1)) / |log(τ − 2)| · log log n    (3.7)

in expectation and with high probability.

Proof (sketch). We will only show the upper bound, and only sketch the
main idea. Let ε > 0, and let us write η := 1/(τ − 2) − ε for brevity. Since
τ < 3, we may choose ε so small that η > 1.
Consider a vertex v0 of large constant weight w. Then by Theorem 3.8
v0 has a neighbour v1 of weight w1 = w^η. Applying the same theorem
again, v1 has a neighbour v2 of weight w2 = w1^η = w^{η^2}. Iterating, we find
a vertex vi in distance i of v0 of weight wi = w^{η^i}. We proceed this way
until we find a vertex vk of weight wk ≥ n^{1/2}, which happens if w^{η^k} ≥ n^{1/2},
which gives the condition k ≥ log_η((1/2) log_w n) = log log n / log η − O(log log w).
We apply this reasoning for both x and y. From each of them, we
find paths of length k to vertices x′ and y′ of weight at least n^{1/2}. By
definition of the connection probability in (3.5), x′ and y′ are connected
with probability one. Thus we have found a path of length 2k + 1 from x to
y. Since k ≈ log log n / |log(τ − 2 − ε)|, and since we can find this bound for any constant
ε, this gives the upper bound of the theorem, see also Figure 3.1.
For a complete proof, one would need to analyze the failure probabilities
in each step. Moreover, one would need to adapt the argument for the first
steps when the weights are still small, since the failure probabilities are
large during this phase. This is where the condition enters that x and y
are from the giant component. Finally, the lower bound can be achieved
by a careful first-moment argument showing that the expected number of
shorter paths is o(1).
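
The mechanism in the proof sketch can be made tangible with a few lines of
Python: starting from a constant weight, repeatedly raise the weight to the
power η and count the steps until √n is exceeded (all concrete numbers are
purely illustrative).

    import math

    def steps_to_core(n, tau, w0=10.0, eps=0.01):
        eta = 1.0 / (tau - 2.0) - eps    # each hop raises the weight to the power eta
        w, steps = w0, 0
        while w < math.sqrt(n):
            w, steps = w ** eta, steps + 1
        return steps

    for n in [10**6, 10**9, 10**12, 10**15]:
        print(n, steps_to_core(n, tau=2.5))

Even for astronomically large n the number of hops stays tiny, reflecting the
doubly logarithmic growth in Theorem 3.9.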

3.2.4 Variations of the Chung-Lu model


Constant factor deviations

It is easy to see that all computations go through if the formula (3.5) is


relaxed a bit. In particular, constant factor deviations do not play a role,
Figure 3.1: In Chung-Lu graphs with τ ∈ (2, 3), it only takes (1 ±
o(1)) log log n / log η steps to reach a vertex in the inner core, and all vertices in
the inner core are connected to each other.

so all our calculations remain valid if we only require


c1 · min{1, wu wv / n} ≤ puv ≤ c2 · min{1, wu wv / n}    (3.8)
for two universal constants c1 and c2 . For most of the arguments presented
here, constant factors do not matter. An exception is the last step in the
proof of Theorem 3.9, where we used that any two vertices of weight at
least n1/2 are connected with probability one, so they form a single, very
large clique. Equation (3.8) only ensures a connection probability of Ω(1)
among them. But this is enough to make this set of vertices so densely
connected that any two vertices share a common neighbour. So instead of
a clique, we obtain a dense subset of diameter 2.
There are many formulas in the literature that look different but sat-
isfy (3.8), for example
puv = wu wv / (n + wu wv),    (3.9)
puv = 1 − e^{−wu wv / n},    (3.10)
puv = min{1, wu wv / W},  where W = Σ_{z∈V} wz.    (3.11)

The last formula falls essentially into this category since there are constants
c1 , c2 such that whp c1 n ≤ W ≤ c2 n. There are some subtleties: it is no
longer true that the events “u ∼ v” and “u′ ∼ v′” are independent of each
other if we condition on the four weights wu , wv , wu′ , wv′. They only be-
come independent after conditioning on all weights, since the connection
probability also depends on W. In practice, these subtleties do not make
a difference.

Other distributions

It is possible to define the Chung-Lu model with a distribution D which is


not a power-law distribution. For example, we can retain the Erdős-Rényi
model as the special case where all weights are one. However, it is important
that D has finite expectation, since otherwise the expected number of edges
is no longer linear, and it is no longer true that E[deg(v)] = Θ(wv ). In
this case, formula (3.11) is often used, but that solves neither of the two
problems.

Deterministic Weight Sequences

Instead of drawing the weights randomly in step 1 of Definition 3.5, it is


also possible to use a fixed weight sequence. For example, a popular power-
law sequence with exponent τ is given by wi := (n/i)1/(τ−1) . This is also
possible for sequences which are not necessarily power-law, and this is the
framework in the original work of Chung and Lu. However, the caveats
from the last paragraph still apply.

Multigraph Variation

It is sometimes helpful to compare the Chung-Lu model with the following


multigraph variation. Recall that in a multigraph, we may have several
edges between the same pair of vertices. In a multigraph version, we can
get rid of the minimum; we simply require that the expected number of
edges between u and v is wu wv /n, and we don’t need to cap this formula
at one. Since we have only specified the expectation, there are many ways
of realizing this variation. The most natural one puts randomly either
⌊wu wv /n⌋ or ⌈wu wv /n⌉ edges between u and v, where the two probabilities
are chosen such that the expectation is wu wv /n. If wu wv /n ≤ 1 (which
is true for the vast majority of vertices), then the number of edges is the
same Bernoulli random variable as in the Chung-Lu model. Of course, the
multigraph variation has more edges (counted with multiplicities), but it
can be computed that the number of multi-edges (even with multiplicities)
is rather small. Since the formula for the multigraph variation is simpler,
it is sometimes easier to understand. For example, the computation in
Lemma 3.6 becomes much easier in the multigraph variation:

E[deg(v)] = (n − 1) · E[# edges between u and v]
          = (n − 1) · ∫_1^∞ Pr[wu = w] · E[# edges between u and v | wu = w] dw
          = Θ(n) · ∫_1^∞ w^{−τ} · (w wv / n) dw
          = Θ(wv) · ∫_1^∞ w^{1−τ} dw = Θ(wv).

3.3 Perfect sampling: the configuration model


There is another very popular option to obtain random graphs with any
prescribed degree distribution. Given a sequence d1 , . . . , dn , we may ab-
stractly consider the set of all graphs on n vertices with deg(vi ) = di for
1  i  n, and draw a graph uniformly at random from this set. Of course,
this abstract description by itself is not very helpful. However, there is a
simple way to generate such graphs, which can be surprisingly efficient in
some situations.
We call a degree sequence d1 , . . . , dn valid if there is a graph with this
degree sequence.7 Then Algorithm 1 is a way to draw a graph uniformly
at random from that distribution, see also Figure 3.2.
7
Obviously, a necessary condition is that 0 ≤ di ≤ n − 1 and that Σ_{i=1}^{n} di is even.
However, this alone is not sufficient. For example, there is no graph which contains at
the same time degrees 0 and n − 1. It is an exercise in graph theory to show that a
degree sequence d1 ≥ . . . ≥ dn is valid if and only if the sum is even and Σ_{i=1}^{k} di ≤
k(k − 1) + Σ_{i=k+1}^{n} min{di , k} for all 1 ≤ k ≤ n (Erdős-Gallai theorem).

Algorithm 1: Drawing Configurations


Input: valid degree sequence d1 , d2 , . . . , dn
repeat
Let G be the empty multigraph on n vertices.
For i = 1, . . . , n, create di half-edges (“stubs”) si,1 , . . . , si,di .
Pick a random perfect matching among the stubs.
For every matched pair (si,x , sj,y ), insert an edge between i and
j in G.
if G does not have loops or multiple edges then
return G
else
discard G and restart
until;
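
A direct Python transcription of Algorithm 1 might look as follows (a sketch,
not a reference implementation: it retries only a bounded number of times and
reports failure otherwise, which is the likely outcome when the second moment
of the degrees is large).

    import random

    def configuration_model(degrees, max_tries=1000):
        """Try to sample a simple graph with the given (valid) degree sequence."""
        assert sum(degrees) % 2 == 0
        for _ in range(max_tries):
            stubs = [v for v, d in enumerate(degrees) for _ in range(d)]
            random.shuffle(stubs)                        # a uniformly random perfect matching:
            pairs = list(zip(stubs[0::2], stubs[1::2]))  # match consecutive stubs
            edges = {tuple(sorted(p)) for p in pairs}
            has_loop = any(x == y for x, y in pairs)
            has_multi = len(edges) < len(pairs)
            if not has_loop and not has_multi:
                return sorted(edges)
        return None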

Obviously, the efficiency of this method depends on the probability that


the produced multi-graph G is a simple graph, i.e., that G does not have
loops or multiple edges. It turns out that this probability is pretty big if the
second moment of the degree distribution is bounded. More precisely, let D
be the degree of a random element of the degree sequence (i.e., the degree of
a random vertex). If the second moment E[D2 ] is bounded by a constant,
then Pr[G simple] = Ω(1). Intuitively, this makes sense: each stub of
vertex i has probability # stubs−1
di −1
to choose another stub of i as partner,
Pn di 
so the expected number of loops is given by ν := # stubs−1 1
i=1 2 . If
E[D ] = O(1), then also ν = O(1). It can be shown (under some technical
2

conditions) that the number of loops is Po(ν)-distributed in the limit, so


there is a constant probability of having zero loops. A similar argument
holds for multiple edges.
While this is a really nice result, for power-laws it excludes the most
interesting case. Power-laws have finite second moment if τ > 3, but the
more interesting case is 2 < τ < 3. In this case, it is an open problem
whether a graph with this degree distribution can be sampled efficiently
uniformly at random. However, it is possible to just ignore loops and
multiple edges (i.e., just delete them from the multigraph). Then we do
not get the exact degree sequence that we desire, but one can show that
the number of loops and multiple edges is very small, so we are still close
to the target degree sequence. Moreover, by the same argument as above,
loops and multiple edges mostly affect vertices of large degree, where one
edge fewer might be tolerable.

Figure 3.2: In the configuration model, every vertex vi gets di stubs (half-
edges), and the stubs are connected via a random perfect matching. It may
happen that loops or multi-edges are created.
One reason why the configuration model is popular is that in the case
of finite second moments E[D^2], it can be analyzed with amazing precision.
For example, let us call D_2 the degree of a random neighbour u of a random
vertex v. It is not hard to see that D_2 only depends on the degree sequence,
and to compute its expectation:

E[D_2] = E[D^2] / E[D].

This means that we can employ all the machinery that we have learned for
Erdős-Rényi graphs in this more general case. In particular, we can couple
a local exploration of the configuration model with a Galton-Watson tree
with offspring distribution D_2 − 1. The “−1” accounts for the fact that
the parent is also a neighbour that must not be counted as offspring. In
particular, the Galton-Watson tree has positive survival probability if and
only if µ = E[D_2] − 1 > 1, or equivalently

E[D^2] > 2 E[D],    (3.12)


which is known as the Molloy-Reed criterion. In particular, it can be
shown that the graph has a giant component if and only if (3.12) is satis-
fied [Dur07]. Moreover, the expected number of vertices in distance k grows
by a factor of µ in each step, starting with E[D] at the first step. Thus
the number of vertices in distance k is in expectation E[D] · µ^{k−1} = Θ(µ^k).
From this equation, it can be derived that the typical distance in the graph
model is (1 ± o(1)) log_µ n if 1 < µ = O(1) [vdHHVM05].8 Other quantities
like the component structure can also be derived just as for Erdős-Rényi
graphs.
8
The reference uses a slight variation of our model here.
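
For a concrete degree sequence, checking the Molloy-Reed criterion (3.12) and
the corresponding offspring mean µ takes only a few lines of Python (the example
sequence below is purely illustrative).

    import numpy as np

    def molloy_reed(degrees):
        """Return (whether E[D^2] > 2 E[D], the offspring mean mu = E[D^2]/E[D] - 1)."""
        d = np.asarray(degrees, dtype=float)
        ED, ED2 = d.mean(), (d ** 2).mean()
        return ED2 > 2 * ED, ED2 / ED - 1.0

    # a 3-regular degree sequence: E[D^2] = 9 > 6 = 2 E[D], and mu = 2
    print(molloy_reed([3] * 1000))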
However, as mentioned, all this is restricted to the case E[D2 ] = O(1),
and many phenomena are specific to this situation. For example, for finite
E[D2 ] the Molloy-Reed criterion (3.12) is a condition about constant de-
grees. If E[D^2] > 2E[D], then there exists a constant C > 0 such that even
the truncated distribution D_{≤C} := min{D, C} satisfies E[D_{≤C}^2] > 2E[D].
Hence, the giant component would also form in the truncated graph where
we reduce all large degrees to C (and even if we just remove them from the
graph). Likewise, since we can approximate µ to arbitrary precision by a
truncated distribution D_{≤C}, the typical distance log n/ log µ is a property
that arises from vertices of degree at most C. Vertices of larger degree play
essentially no role for shortest paths in the graph. Note how different this
is from Chung-Lu graphs with exponent τ ∈ (2, 3), where shortest paths
were obtained by reaching vertices of very large weight in O(log log n) steps.
A particular case for which E[D2 ] = O(1) are power-law networks with
exponent τ > 3 (Chung-Lu or configuration model). In both models, the
high-weight vertices do not even help to cut the lengths of shortest paths.
Moreover, they do not even play a role for the formation of a giant com-
ponent: a giant exists if the low-weight vertices are dense enough to form
it on their own, and otherwise it does not exist. For most purposes, it is
adequate to consider power-law networks with τ > 3 as networks in which
there are too few large-degree vertices to affect the global structure. Of
course, some things do change. For example, power-law networks with
τ > 3 still contain vertices of polynomial degree ≈ n^{1/(τ−1)}. As a triv-
ial consequence, in the subcritical regime without a giant component, they
still contain components of size at least n1/(τ−1) , even if they are little more
than a star around a central vertex of that degree. Recall that such com-
ponents do not exist in Erdős-Rényi networks by Lemma 2.6. On the other
hand, the supercritical regime (both for τ ∈ (2, 3) and for larger τ if the
Molloy-Reed criterion (3.12) is satisfied) is similar to Erdős-Rényi graphs
with µ > 1: the fraction of components of size s decays exponentially in
s as in Theorem 2.4. The reason is the same as for the Erdős-Rényi case:
in the corresponding Galton-Watson process, if we have k vertices in some
layer, then each of them has the same positive probability to become the
root of an infinite subtree, and this is independent for all k vertices. So the
probability of staying finite is exponentially small in k. (For the configura-
tion model, it is still true that a BFS can be coupled to a Galton-Watson
process, though the proof is a bit harder than for the Chung-Lu model.)

3.4 Preferential attachment


There is yet another way to obtain power-law random graphs. This model
goes back to Barabási and Albert [BA99] and is known as preferential
attachment (or simply as Barabási-Albert model). Let M ≥ 1 be an
integer, and δ > −M. We start with a complete graph GM+1 on M + 1
vertices v1 , . . . , vM+1 . Then the remaining vertices are added one by one to
the graph, and we denote by Gk the graph of k vertices. In order to go from
Gk to Gk+1 , we add the (k + 1)-st vertex vk+1 , and add exactly M edges
from vk+1 to the previous vertices v1 , . . . , vk in Gk . Those M neighbours are
chosen randomly, and the probability to choose vi as a neighbour of vk+1 is
proportional to degGk (vi ) + δ. I.e., we assign to vi a probability

(deg_{Gk}(vi) + δ) / Σ_{j=1}^{k} (deg_{Gk}(vj) + δ),

and we draw M neighbours from this distribution. If we draw the same


vertex twice, then we repeat the second drawing until we find a new vertex.9
The idea behind this model is that a new node is more likely to join the
network as a neighbour of popular nodes. The principle is also called Rich-get-Richer
or the Matthew effect.10 Preferential attachment has become extremely
popular since it gives an intuitive explanation for where heterogeneous
degrees may come from. (Though it is not the only possible explanation.)
Intriguingly, the above rule yields a power-law degree distribution.
9
Some variants also allow multiple edges in this case, which does not make a big
difference.
10
From the bible verse “For to every one who has will more be given, and he will have
abundance; but from him who has not, even what he has will be taken away.” Matthew
25:29.

Theorem 3.10. Let Gn be given by the preferential attachment model


with parameters M and δ. Then with high probability the degree
sequence of Gn follows a power-law with exponent τ = 3 + δ/M.

Note that we allow any δ > −M, so we can achieve any power-law
exponent τ > 2.
We will not give a proof of Theorem 3.10, but in the following we will
try to make it plausible. Firstly, it may seem that the random process is
rather unpredictable, but it is not. Rather the opposite, it can be shown
that the time of “birth” (i.e., the index k of a vertex) plays almost the same
role as the weight for Chung-Lu random graphs, where the “weight” of vk is
(n/k)1/(τ−1) . I.e., whp the number of neighbours of vertex vk is proportional
to (n/k)1/(τ−1) if n/k is large. This is the same deterministic formula that
is sometimes used as a fixed weight sequence in the Chung-Lu model to
generate power-law graphs of exponent τ. However, there are also some
systematic deviations from the Chung-Lu model. In particular, recall that
every vertex receives M edges at birth, so there are no isolated vertices.
Moreover, by induction the graph is connected at all times.11
11
There are also variants where small components can form, similar to Erdős-Rényi
graphs. For example, we may not equip a new vertex with exactly M edges, but only with
M edges in expectation, where the exact number may be zero with positive probability.
It is mostly a matter of taste whether generating only connected graphs is a bug or a
feature of the model.
So where does the power-law come from? Let us denote the degree of
vertex vi at time t (i.e., when the graph has t vertices) by Di,t , and assume
that Di,t is large for some t. At this time, the total number of edges is tM.
Therefore, when an edge chooses a random endpoint, then the probability
pi,t that it chooses vi is

pi,t = (deg_{Gt}(vi) + δ) / Σ_{j=1}^{t} (deg_{Gt}(vj) + δ) = (Di,t + δ) / ((2M + δ)t).    (3.13)

Now consider the next εt rounds. A total of εtM edges will be added
during this time. This means that the denominator of (3.13) changes little
during this time. Let us momentarily assume that pi,t also stays almost
constant in this period. (We can achieve this by choosing ε small enough.)
Then the expected number of edges that hit vi during this time is roughly
εtM · (Di,t + δ)/((2M + δ)t) ≈ εtM · Di,t/((2M + δ)t) = ε Di,t/(2 + δ/M).    (3.14)

So during this phase, the total number of vertices and edges in the graph
increases by a factor of (1 + ε), while the degree increases by a factor of
(1 + ε/(2 + δ/M)) = (1 + ε/(τ − 1)) ≈ (1 + ε)^{1/(τ−1)}. By iterating the argument, we see
that this is true for any factor: when the number of vertices increases by
a factor C, then the degrees increase by a factor C1/(τ−1) . In particular, if
vertex vk is inserted at time k, then its degree may grow in the time interval
from k to n. In this time interval, the number of vertices grows by a factor
of n/k (it grows from k to n), and therefore the degree grows from Θ(1) to
Θ((n/k)1/(τ−1) ). In particular, how many vertices are there with degree at
least x? Assuming concentration and ignoring the hidden constant factors,
it is exactly those vk for which (n/k)^{1/(τ−1)} ≥ x, or equivalently those vk for
which k ≤ x^{1−τ} n. There are roughly x^{1−τ} n such integers k, so the fraction
of vertices of degree at least x is x^{1−τ}. This is the cumulative power-law
condition with exponent τ.
Of course, a full proof needs a lot of concentration bounds to make the
argument precise, but not much more than that.
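
A minimal simulation of the process is easy to write down; the following Python
sketch uses the “redraw until a new vertex is found” rule from the definition
and naive weighted sampling, which is quadratic overall but sufficient for small
experiments (all parameter values are illustrative).

    import random

    def preferential_attachment(n, M, delta):
        """Generate the edge list of G_n in the preferential attachment model."""
        assert delta > -M
        edges = [(i, j) for i in range(M + 1) for j in range(i + 1, M + 1)]  # complete graph on M+1 vertices
        deg = [M] * (M + 1)
        for k in range(M + 1, n):                    # insert the next vertex (0-indexed: k)
            weights = [deg[i] + delta for i in range(k)]
            targets = set()
            while len(targets) < M:                  # redraw if the same vertex is hit twice
                targets.add(random.choices(range(k), weights=weights)[0])
            for t in targets:
                edges.append((t, k))
                deg[t] += 1
            deg.append(M)
        return edges

    edges = preferential_attachment(10**4, M=2, delta=0.0)
    # with delta = 0, Theorem 3.10 predicts a power-law with exponent tau = 3

Note that, as in the definition above, the M neighbours of a new vertex are drawn
from the same fixed distribution; the degrees are only updated once all M edges
have been inserted.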
The preferential attachment model is compelling for two reasons. Firstly,
it gives a possible explanation for the origin of the power law. Secondly,
it is dynamic and models how the graph changes over time. For some
questions, we would like to work with such models. For example, for some
networks there is birth time data available, i.e., data about the time when
nodes joined a network. If we want to study questions related to this,
then dynamic models like preferential attachment are the models of choice.
If one is only interested in the static network of size n that is generated
in the end, and not in the history of the process, then the preferential at-
tachment model is less attractive. The process introduces dependencies and
thus makes it very technical to analyze rigorously. In fact, most analyses
of the model show that the resulting graph is very similar to a Chung-Lu
model or configuration model, and use this connection to prove that the
corresponding statements transfer to the preferential attachment model.
(Just mind the obvious differences in component structure and low-degree
vertices since the minimum degree is M.) As usual, there is no “best”
model, and it depends on the question of interest which model to choose.
Preferential attachment networks have a rather rigid dynamics. On the
one hand, this is very helpful for analyzing them. On the other hand, the
dynamics are too rigid to match real evolving networks well, like the web
graph (webpages and hyperlinks) or citation networks (scientific papers and
citations). To understand the problem, let us assume in the preferential
attachment model that the degrees of two vertices u, v differ by a constant
factor at time t, e.g. degt (u)  2degt (v). If both u and v are large12 then
this will stay true throughout the whole process: it is very unlikely that v
will ever overtake u. In particular, the final degree is mostly determined
by the age of the vertex, and the highest degrees are obtained by the oldest
vertices. In real-world networks, the age of a node (a webpage, a paper,
. . .) is certainly correlated with its degree, but the correlation is much
weaker than in preferential attachment networks. This limitation can be
overcome by combining the ideas of Chung-Lu random graphs and pref-
erential attachment. In this model, the graph is still generated vertex by
vertex, but each vertex also obtains a weight. The connection probability
to vertex v is then a function of the degree of v (at time t) and of the weight
of v. The resulting graphs are similar to ordinary preferential attachment
graphs, but the dynamics are more realistic. We do not go into further
detail. A discussion can be found in [LNR17, Chapter 6.5].

3.5 Strengths and weaknesses of the Chung-Lu model


3.5.1 Strengths
The obvious advantage of the Chung-Lu model and the configuration model
is that they can model the power-law degree distributions that we see in
real-world networks. We have seen that there are strong structural dif-
ferences between the case τ ∈ (2, 3) and τ > 3. In the latter case, the
large-weight vertices are too sparse to change the global structure of the
network. Most real-world networks seem to have exponents in the range
τ ∈ (2, 3).
12
More precisely, the error probability of the following statement is exponentially small
in degt (u).
The models can help us understand some effects that are closely related
to the degrees. For example, we understand better why our friends are
much more popular than we are. While this effect is hardly existent in
Erdős-Rényi graphs, it becomes extreme for power-law exponents τ ∈ (2, 3):
while we have a constant number of friends, our friends have in expectation
an infinite number of friends, but this expectation is dominated by low-
probability events. This also helps us understand some confusing effects:
if we try to empirically investigate a distribution with infinite mean, then
averages from finite samples tend to be dominated by a few outliers (the
largest-degree nodes in the sample), and are often rather inconsistent with
each other.
We have also seen that a power-law with exponent τ ∈ (2, 3) leads
to ultra-small worlds. It is up for debate whether typical distances are
rather like log n or like log log n, but we have also seen that the structural
properties of shortest paths change dramatically, since all shortest paths go
through a small set of large-degree vertices. These properties are important.
For example, routing protocols which use shortest paths will put a very high
load on those high-degree vertices in Chung-Lu random graphs, while the
load is rather uniformly distributed in Erdős-Rényi random graphs.

3.5.2 Weaknesses
The biggest weakness of Chung-Lu and configuration models is that they
have no clustering or community structure. The clustering coefficient is
easily seen to be o(1). It is slightly larger than for Erdős-Rényi graphs
because of the skewed neighbourhood degree distribution: the neighbour-
hood of a vertex v has an increased probability to contain vertices of large
weight, and those are more likely to connect to each other. However, the
effect is not large: most neighbours of v are still of constant weight, and
those still have probability Θ(1/n) to connect to each other.
This lack of triangles extends to larger cycles, to cliques, and to other
small and dense subgraphs. The networks look locally tree-like (which
is essentially the same as saying that they are well described by Galton-
Watson processes). In particular, if we start from a random vertex and
explore the graph, usually we obtain in the first rounds a subgraph of k
vertices which is a tree. Thus it has only k − 1 edges and has minimal
density. However, it can be tricky to find the proper statistics here. For
example, if we try to simply count the number of connected subgraphs of
size k which have more than α · k edges for some α > 1, then this may
yield a large number. The vertices of weight at least √n form a gigantic
clique, and this contains very many subcliques of size k, all of which are
counted in the statistics. Still, the networks are considered not to have
community structures, even though measurement can be tricky.13 This
lack of communities is the weakest point of the models.

13
In fact, the configuration model is sometimes used as baseline, and other networks are
said to have community structure if they have more densely connected subgraphs than
the corresponding configuration model with the same degree distribution.
Chapter 4

Geometric Graphs

We have seen models that can create a power-law degree distribution, or


even an arbitrary degree distribution. However, the models so far could
not provide high clustering or community structures. The most classi-
cal example of graphs with strong community structures are grids. The
simplest one is Γs,d := {1, . . . , s}d , where each vertex is adjacent to the 2d
vertices of Euclidean distance 1. (It is exactly 2d neighbours if we use the
torus topology by treating the numbers 1 and s of {1, . . . , s} as neighbours.
Otherwise boundary vertices have fewer neighbours.) The dimension d is
usually considered a constant, most often a small constant like d = 1, 2, 3.
In a grid it is easy to find “communities” of K vertices such that there are
only O(√K) edges out of the community: just pick a ball or cube of volume
K, and declare the grid points inside the box as a community.
The grid defined above leads to a bipartite graph. This is awkward
since the most prominent measure for locality is the clustering coefficient,
which is zero for bipartite graphs. (In bipartite graphs, two neighbours
of the same vertex are never adjacent.) To circumvent this, two options
are commonly used: either we use a hexagonal grid instead of a square
grid, or we use an ordinary square grid Γs,d but connect any two points in
Manhattan distance1 at most r for some constant r > 1. The latter is also called
the r-th power of the grid.
Beware that such patches are rather a proof-of-concept that should not
be over-interpreted. The clustering coefficient is one way to measure a much
1
The Manhattan distance between two vertices x and y, also called L1 -distance, is the
graph distance between x and y in the grid graph without random edges. If we don’t use
torus topology, then it can be computed as ∑_{i=1}^{d} |xi − yi|.

more fundamental property of real-world networks, locality. The problem
is that we do not really know what exactly we mean by locality, so we
measure it by auxiliary measures. The clustering coefficient is one of them,
and the hexagonal lattice happens to behave much better with respect to
this measure than the square lattice. However, this does not necessarily
mean that the hexagonal lattice is a better representative of real-world
networks than the square lattice. We should be careful not to overfit to a
single auxiliary measure. In this case, we can easily see this by switching
to other auxiliary measures. For example, real-world networks have also
many K4 as subgraphs, and those exist neither in the square lattice nor
in the hexagonal lattices. Still, we will see some applications where either
type of lattice models locality just fine.
As an alternative, for d ≥ 2 it is possible to use Random Geometric
Graphs instead of grids. In this setting, n vertices are randomly placed
in a d-dimensional cube of volume n (with or without torus topology).
Then two vertices are connected if and only if they have distance at most
r, where r is a parameter of the model. If d ≥ 2 and r is a sufficiently large
constant, then one can show that the graph has a giant component, and
the remaining components show a stretched exponential tail bound as for
Erdős-Rényi graphs.2

4.1 Weak Ties and the Watts-Strogatz Model


The Watts-Strogatz model was designed to demonstrate that even a mini-
mal change to a rigid geometric model can lead to small worlds.

Definition 4.1. The Watts-Strogatz random graph model starts with


a d-dimensional grid with torus topology for some constant d ≥ 1,
i.e., the vertex set is {1, . . . , s}d and the edges wrap around from s to
1 in each coordinate. Then for every pair u, v of different vertices,

2
Stretched exponential means that the fraction of vertices in components of size s is at
most η^(s^ε) for some constants η < 1 and ε > 0. This is still a very fast decaying function
in s, though not quite as fast as an exponential function. Arguably, this type of tail
bound is even more plausible than a proper exponential, since the non-giant components
in real-world networks are small, but not quite as small as an exponential tail would
suggest.
with probability p we add the edge from u to v.

The original construction by Watts and Strogatz [WS98] differed in some


detail. It started with the r-th power of a grid and redirected edges instead
of adding them. This makes the analysis a bit more complicated, but does
not change the result. Also, the original model was for dimension d = 1.
We refer to the additional edges as random edges and to the original edges
as grid edges. For symmetry reasons, it is convenient to also make a coin
flip for grid edges, even though “adding” an already existing grid edge does
not change the graph. So, an edge can be both a random edge and a grid
edge at the same time.
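For concreteness, the following Python sketch samples the variant defined above: grid edges on a d-dimensional torus plus independent random edges between all pairs. It is only an illustration, and the parameter values in the example call are chosen arbitrarily.

import itertools, random

def watts_strogatz_variant(s, d, p, seed=0):
    """Vertices are the tuples in {0,...,s-1}^d.  Grid edges wrap around
    (torus topology); in addition, every pair of vertices independently
    receives a random edge with probability p."""
    rng = random.Random(seed)
    vertices = list(itertools.product(range(s), repeat=d))
    grid_edges = set()
    for v in vertices:
        for i in range(d):                       # one grid edge per coordinate direction
            u = list(v)
            u[i] = (u[i] + 1) % s
            grid_edges.add(frozenset((v, tuple(u))))
    random_edges = set()
    for v, u in itertools.combinations(vertices, 2):
        if rng.random() < p:
            random_edges.add(frozenset((v, u)))
    return vertices, grid_edges, random_edges

V, Eg, Er = watts_strogatz_variant(s=30, d=2, p=1e-4)
print(len(V), "vertices,", len(Eg), "grid edges,", len(Er), "random edges")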
The general idea of the Watts-Strogatz model (and of similar models
that we will see later) is very popular because it matches an important
concept in sociology: weak ties.3 This concept describes edges which exist
without obvious reasons: the endpoints seem rather unrelated, and they are
not part of the same clusters and communities. Sociological experiments
have shown that such weak ties are very important for the connectivity
of social networks, for work careers and for processes like the spread of
behaviour through the networks [Cen10]. The distinction into strong and
weak ties matches nicely with the grid edges and random edges in the
Watts-Strogatz model.
We will be interested in typical distances in the Watts-Strogatz model.
Obviously, for p = 0, without random edges, typical distances are of order
Θ(n1/d ).4 The surprising insight is that even for very small values of p, the
distance decreases dramatically.

Theorem 4.2. Let G be a Watts-Strogatz random graph on n vertices


with parameter p = p(n) satisfying p = ω(1/n2 ) and p = O(1/n).
Then the typical distances in G are of order Θ(log(n2 p)/(np)1/d ).

Proof. We will only prove the upper bound. Partition the grid into n′ :=
n²p/2 cubes of volume U := 2/(np), i.e., the side length of the cubes is
U^{1/d}. We will ignore rounding issues and assume for simplicity that U^{1/d}
3
The paper The strength of weak ties by Mark Granovetter [Gra73] is the most cited
paper in sociology of all times.
4
In the sense that for every ε > 0 there are c1 , c2 > 0 such that two randomly chosen
vertices u, v have distance c1 n^{1/d} ≤ d(u, v) ≤ c2 n^{1/d} with probability at least 1 − ε.
is an integer. By construction, U is the number of vertices in each block,
and by the assumption on p we have U = o(n) and U = Ω(1).
Consider the graph G′ = (V′, E′) where the vertex set is the set of cubes,
and there is an edge between two cubes C1, C2 ∈ V′ if in G at least one
random edge was added between a vertex u ∈ C1 and a vertex v ∈ C2, see
also Figure 4.1. Note that the edges in G′ are formed independently of each
other, since the random edges in G are placed independently. Thus G′ is
an Erdős-Rényi graph G_{n′,p′}, where n′ = n/U is the number of vertices in
G′. We will next show that p′ ≥ 1.5/n′. Note that this is plausible since
the expected number of random edges in G that are added to vertices in a
cube C is ≈ U · (n − 1) · p ≈ 2, and these edges target random cubes. If you
are already convinced by this argument, you can skip the next paragraph.


Figure 4.1: Partitioning into blocks for d = 1. The induced graph between
blocks is a supercritical Erdős-Rényi graph on n′ vertices. Thus it has
typical distance O(log n′).

Otherwise, for two blocks C1, C2 ∈ V′, there are U² pairs of vertices
(v1, v2) with v1 ∈ C1 and v2 ∈ C2. Hence, the probability that no random
edge is added between C1 and C2 is

    Pr[C1C2 ∉ E′] = (1 − p)^{U²} ≤ e^{−pU²} = e^{−4/(pn²)} ≤^{(*)} 1 − 3/(pn²) = 1 − 1.5/n′,

where in step (*) we have used that pn² = ω(1). Hence, for large n we have
p′ ≥ 1.5/n′ and the Erdős-Rényi graph G′ is supercritical. In particular,
this means that the typical distances in the giant component of G′ are
Θ(log n′).
Now we consider two random vertices v1, v2 ∈ V and bound their dis-
tance. For simplicity, let us assume that their corresponding cubes C1, C2
are in the giant component in G′. (Otherwise we can walk along the grid
to cubes which are in the giant.) Then we need to use O(log n′) edges in
G′ to connect C1, C2 via a path π′. This is not yet a path in G, since we
typically enter and leave a cube C on π′ in two different vertices u1, u2.
But since all the cubes have side-length O(U^{1/d}), we can walk from u1 to
u2 along the grid in O(U^{1/d}) steps. Thus we can walk from v1 to v2 in G
in O(U^{1/d} · log n′) steps.

Let us take a moment to appreciate how quickly the distances in The-


orem 4.2 decrease even for small p. For example, if p = n−2+ε then we
only add O(nε ) random edges to the graph, a ridiculously small number
compared to the Θ(n) grid edges. But the typical distance is already sig-
nificantly reduced, from Θ(n1/d ) to Θ(n(1−ε)/d log n). As a second example,
for p = 1/(n log n) we add only O(n/ log n) random edges to the graph, so only
O(1/ log n) = o(1) edges per vertex. So most vertices do not receive an
extra edge. Nevertheless, these few extra random edges are sufficient to
bring down the typical distance to Θ((log n)1+1/d ). In general, we see that
even if we start with a very rigid graph with high distances (for d = 1 we
have the cycle and distances of Θ(n)!), we only need to add a tiny number
of random edges to decrease the distances dramatically.
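One can observe this effect numerically with the sketch from above (again only an illustration; it assumes the function watts_strogatz_variant from the earlier sketch is in scope and uses BFS to measure distances between random pairs of vertices).

from collections import deque, defaultdict
import random

def bfs_distance(adj, src, dst):
    """Breadth-first search distance from src to dst (None if unreachable)."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            return dist[u]
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return None

# assumes watts_strogatz_variant from the sketch above is in scope
V, Eg, Er = watts_strogatz_variant(s=2000, d=1, p=1e-4, seed=3)
adj = defaultdict(list)
for e in Eg | Er:
    a, b = tuple(e)
    adj[a].append(b)
    adj[b].append(a)

rng = random.Random(7)
pairs = [(rng.choice(V), rng.choice(V)) for _ in range(30)]
dists = [bfs_distance(adj, a, b) for a, b in pairs]
print("average graph distance over 30 random pairs:", sum(dists) / len(dists))

On the cycle with 2000 vertices the typical distance without random edges is around 500; already the few hundred random edges added here bring the average down dramatically.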

4.2 Navigability and the Kleinberg model


We have seen several network models which are small or even ultra-small
worlds, in particular the Erdős-Rényi, Chung-Lu and Watts-Strogatz model.
However, a series of famous experiments by psychologist Stanley Milgram5
in the 60s and 70s showed that social networks are not only small worlds,
it is also possible to navigate efficiently with local knowledge in these
The same Stanley Milgram who performed the Milgram experiment in which the
5

majority of test subjects continued to seemingly torture a fellow test person to death,
simply because a scientist in a lab told them that this is how the protocol goes. If you
don’t know about this experiment, you should read about it: https://en.wikipedia.
org/wiki/Milgram_experiment.
networks. In the most famous experiment, Milgram gave a letter to some
person A in the US Midwest that was addressed to some person B at the
US East Coast. However, A was not allowed to send the letter directly to
B. Instead, A was only allowed to send the letter to a personally known
contact A , defined as someone whom A knew on a first-name basis. A was
0

supposed to pick a neighbour A who was more likely to know the target
0

B. Then A continues the process in the same manner, i.e., A sends the
0 0

letter to a personal contact whom she knew on a first-name basis, and so on


until the chain fails or the letter reaches a person who knows B personally.
Thus the participants were only allowed to navigate within the friendship
graph, and they had to make their decision without global knowledge of
the network.
The experiment showed that a portion of letters reached their targets,6
and reached them within just a few steps. Since one of the experiments
found an average of less than 6 intermediate steps, the result of the exper-
iment became widely known as six degrees of separation.
So, it is possible to navigate the friendship network using only local
information. I.e., every node v only knows the location of the target7 and
some basic information about the direct neighbours of v (what you know
about your friends; address, popularity, profession, . . .). This information
suffices to navigate the network efficiently. In abstract terms, we would
assume that a vertex knows the geometric location and possibly some in-
trinsic properties of its neighbours, but nothing else about the network.
This type of efficient navigation is not possible in any of the networks
that we have seen so far. In Erdős-Rényi graphs, none of your neighbours
is better than any other, so by symmetry it is clear that Ω(n) nodes need
to be visited. In Chung-Lu graphs, the shortest path runs through vertices
of very high weight, and it is possible to find those efficiently: simply go
to the neighbour of largest weight. But for the second half of the path, one
would need to navigate from large-degree vertices down to small-degree
vertices, and this can not be done efficiently. Finally, the shortest paths
in the Watts-Strogatz model use that the random edges turn the graph
G′ of supernodes into a supercritical Erdős-Rényi graph, which has small
diameter. However, the following theorem shows that these shortcuts are
The success rate in the first experiments was only 5-30%, but could be increased to
6

up to 85% in later variations where letters were replaced by phone and email.
7
In the original experiment, the participants were also told the job of the target.
not too helpful for navigation.

Theorem 4.3. Consider the Watts-Strogatz model on n vertices with


dimension d and parameter p, where p = ω(1/n2 ) and p = Ω(1/n).
Then any local navigation algorithm that routes from a random ver-
tex s to a random vertex t takes at least Ω(p−1/(d+1) ) steps in ex-
pectation, which is in particular Ω(n1/(d+1) ). Moreover, the greedy
routing algorithm which always proceeds to the/a neighbour closest
to t needs O(p−1/(d+1) ) steps in expectation.

Proof. For the lower bound, we may assume that s and t have Manhattan
distance at least ∆ := (1/2) p^{−1/(d+1)}, since this is asymptotically smaller than
the side length of the grid. Consider the event E that during the first ∆
steps, the algorithm does not uncover a random edge whose endpoint has
Manhattan distance at most ∆ from t. First we show that conditional on
E , it is impossible for the algorithm to reach t in ∆ steps. Consider the last

random edge e that the algorithm takes during those ∆ steps. Then after
taking this edge, the algorithm has Manhattan distance more than ∆ from
t. By definition of e, it only uses grid edges afterwards, so it does not reach
t in ∆ steps. The same applies if the algorithm does not take any random
edges at all during the first ∆ steps.
Next we show Pr[E] ≥ 1/2. Note that this will conclude the proof of
the lower bound, since it implies that the expected number of steps is at
least ∆/2, as required. The crucial insight is that since the random edges
are uniformly at random, it does not matter which path the algorithm
takes during the first ∆ steps. By symmetry, the probability of E does not
depend on the set of explored vertices, but only on the number of explored
vertices. So let us compute Pr[E ].
Let us consider the ball around t of radius ∆ with respect to the Man-
hattan distance. It contains less than ∆d vertices. Thus, when we explore a
new vertex v, the probability to find a random edge into this Manhattan ball
is at most p · ∆^d ≤ ∆^d/n by a union bound. By another union bound over
the first ∆ steps, the probability that this happens in any of those steps is
Pr[¬E] ≤ p∆^{d+1} = 2^{−d−1} ≤ 1/2. This concludes the proof.
For the upper bound, we compute the time until we reach Manhattan
distance ∆ from t. We pessimistically assume that we need to wait for a
random edge into that region. Since the Manhattan ball has size Ω(∆d ),
in each step we have probability of Ω(p∆d ) of discovering such an edge.
Hence, the expected time until we find such an edge is O(p−1 ∆−d ) = O(∆).
Afterwards, we need at most ∆ more steps to proceed to the target. Thus
the expected number of steps is O(∆).

To make Theorem 4.3 more concrete, for d = 1 and p = 1/n the lower
bound is Ω(√n) steps, even though the typical distances are only O(log n).
Random edges are not completely useless. Without them, the typical dis-
tance would be Ω(n). However, navigation is much less efficient than
shortest paths. In general, for any dimension d and any p in the specified
range, the time for local navigation is always polynomial in n, while the
typical distance may be polylogarithmic for some values of p.
In 2000, Jon Kleinberg proposed a model in which routing is possible in
poly-logarithmic time [Kle00]. The main difference to the Watts-Strogatz
model is that edges are no longer placed uniformly at random, but rather
the probability for placing an edge depends on the distance of the two
endpoints.

Definition 4.4. The Kleinberg random graph model starts with a d-


dimensional grid with torus topology for some constant d ≥ 1. Then
for every pair u, v of different vertices, with probability

    puv := (1/ log n) · 1/d1(u, v)^d        (4.1)

we add the edge from u to v, where d1 (u, v) denotes the Manhattan


distance between u and v.

As for the Watts-Strogatz model, we call the additional edges random


edges, and the original edges grid edges.
To understand the model a bit better, let us first compute how many
edges we add to a fixed vertex v. We fix an exponentially growing set of
radii R = {1, 2, 4, 8, . . . , 2^s} and sort the other vertices into annuli Ar(v) for
r ∈ R, where Ar(v) := {u ∈ V | r ≤ d1(u, v) < 2r}. Since the diameter of
the grid is d·n^{1/d}/2, we need s = ⌊log2(d·n^{1/d}/2)⌋ = Θ(log n) radii.
The annulus Ar (v) contains Θ(rd ) vertices. Moreover, the vertices in
Ar (v) have distance Θ(r) from v. Therefore, the expected number of neigh-
bours of v in Ar (v) is
    E[#{u ∈ Ar(v) | u ∼ v}] = |Ar(v)| · (1/ log n) · 1/Θ(d1(u, v)^d) = Θ(1/ log n).        (4.2)
So v has the same expected number of neighbours “in each distance”, namely
1/ log n. (Plus the 2d neighbours from the grid.) Since we have Θ(log n)
different distances r, the total number of random edges incident to v is
Θ(1). In particular, the expected degree is still O(1).
The neighbours of v are not uniformly at random over all vertices, but
they are (approximately) uniform over distances. Every distance range
has the same probability Θ(1/ log n) to provide a neighbour. Moreover, if
we fix an annulus Ar (v), then all vertices in Ar (v) have roughly the same
probability to connect to v, up to a factor of 2d = Θ(1). Rephrased, if v
has a neighbour in Ar (v), then this neighbour is uniformly distributed in
Ar (v) up to constant factors, meaning that for each potential neighbour,
the probability to be the actual neighbour is the same as in the uniform
distribution up to a constant factor. This double form of equidistribution
allows for efficient greedy routing.

Theorem 4.5. Consider the Kleinberg model on n vertices with dimen-


sion d. Consider the greedy routing algorithm with start vertex s and
target vertex t, which always proceeds to the/a neighbour closest to
t. Then for random s and t the greedy routing algorithm takes
O(log2 n) steps in expectation.

Proof sketch. Assume that the algorithm is at a vertex with Manhattan


distance ∆ from t. We will show that after an expected O(log n) steps, we
find a random edge that reduces the distance from t to (3/4)∆. We call such an
edge a shortcut. Repeating the same argument O(log n) times, we reduce
the Manhattan distance to a constant, which yields the runtime O(log2 n).
So we need to show that we quickly find a random edge that reduces
the distance to (3/4)∆. We will ignore some technical details, in particular that
the distance to t can change while we search for the shortcut. Moreover, we
assume that ∆ is a power of 2. Let r := ∆/2. Then by (4.2), each explored
vertex v has probability Θ(1/ log n) to have a neighbour u ∈ Ar(v). Note
that a constant portion of Ar (v) lies in the Manhattan ball B3∆/4 (t) around
t, as illustrated in Figure 4.2. Since the location of u within Ar (v) is
uniform up to a constant factor 2d , the probability that u lies in B3∆/4 (t) is
Ω(1). Summarizing, each explored vertex v has probability Θ(1/ log n) to
have a neighbour u ∈ Ar(v), and this neighbour has constant probability
to be in B3∆/4 (t). Therefore, we need to wait O(log n) steps in expectation to
find a neighbour in B3∆/4 (t), as required.


Figure 4.2: The distance between v and t is ∆. The annulus Ar (v) contains
all points in distance [∆/2, ∆) from v, and the ball B3∆/4 (t) contains all
vertices in distance at most 3∆/4 from t. Both have volume Θ(∆d ), and
their intersection has also volume Θ(∆d ). (Balls in Manhattan distance
look like diamonds, but the conclusion would also hold for any other norm.)

It can be shown that the bound in Theorem 4.5 is tight, so navigation


in the Kleinberg model takes time Θ(log2 n).
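To see greedy routing in action, here is an illustrative Python sketch (not the construction used in the proofs). Because greedy routing strictly decreases the distance to t, every vertex is visited at most once, so we may sample the long-range contacts of the current vertex lazily with the probability from (4.1).

import math, random

def torus_dist(u, v, s):
    """Manhattan distance on the d-dimensional torus of side length s."""
    return sum(min(abs(a - b), s - abs(a - b)) for a, b in zip(u, v))

def greedy_route(s, d, src, dst, rng):
    """Greedy routing in the Kleinberg model on a d-dimensional torus of side s."""
    n = s ** d
    all_vertices = [tuple((i // s ** j) % s for j in range(d)) for i in range(n)]
    cur, steps = src, 0
    while cur != dst:
        candidates = []
        for i in range(d):                       # the 2d grid neighbours
            for delta in (-1, 1):
                w = list(cur)
                w[i] = (w[i] + delta) % s
                candidates.append(tuple(w))
        for v in all_vertices:                   # lazily sampled long-range contacts
            if v != cur:
                if rng.random() < 1.0 / (math.log(n) * torus_dist(cur, v, s) ** d):
                    candidates.append(v)
        cur = min(candidates, key=lambda v: torus_dist(v, dst, s))
        steps += 1
    return steps

rng = random.Random(1)
hops = [greedy_route(50, 2, (0, 0), (25, 25), rng) for _ in range(10)]
print("average number of greedy steps:", sum(hops) / len(hops))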

4.2.1 Shortcomings of the Kleinberg model


The Kleinberg model gives a nice and simple model for a network that can
be efficiently navigated. Moreover, it can be navigated by the greedy algo-
rithm, which is rather similar to the description of Milgram’s small-world
experiments. However, there are also some shortcomings of the model.
Firstly, as Kleinberg himself pointed out, the model is rather vulnerable
with respect to the exponent d. Any other exponent than d will lead to a
highly imbalanced distribution of neighbours over different distances. An
exponent d + ε leads to a much smaller number of long edges, so that the
algorithm can not cover the long distances at the beginning of routing in
reasonable time. An exponent of d − ε gives too many long edges; in this
case the normalization factor 1/ log n in (4.1) needs to be replaced by a
polynomial factor, or the degrees would become polynomial in n. But then
the number of short random edges is too small, so that greedy routing
does not accelerate the second part of the routing process, as in the Watts-
Strogatz model. For both d + ε and d − ε (and every other choice except
d), we end up with a polynomial runtime for greedy routing. Thus the
model is rather brittle with respect to the scaling in (4.1).
Secondly, the model relies quite heavily on the underlying grid struc-
ture. With a grid, it can never happen that the greedy algorithm gets
stuck. If we replace the grid with a Random Geometric Graph, or if some
grid edges are missing, then it may happen that the algorithm enters a
vertex v that has no neighbour which is closer to the target than v. If v
still forwards the message to the best neighbour, then this could result in
v sending the message back to its predecessor, and the algorithm might
enter an infinite loop. It can be seen that such problems occur in each step
with constant probability, so the probability that they happen in at least
one of the Θ(log2 n) steps is very large. So once we replace the grid with a
Random Geometric Graph, the success rate of the algorithm becomes o(1).
(In fact, it goes to zero very fast, polynomially in n.) On the other hand, it
could be argued that this problem might not show in real social networks
due to the large average degree. Either way, the theoretical model and
analysis relies rather heavily on the perfect grid structure.
Thirdly, while the runtime O(log2 n) is poly-logarithmic, it still
seems rather large compared to the very small paths found in real net-
works. The famous six degrees of separations correspond to a path length
of seven, and other experiments found even shorter paths. One possible
explanation is that the degrees in real social networks are much larger than
the degrees in Kleinberg’s model, so that the leading constants are rather
small. Alternatively, in the final part of the lecture we will see a model in
which greedy routing succeeds in even shorter time.
Finally, there is a non-trivial asymptotic stretch, which is defined as
the ratio between the length of the path that greedy routing finds and the
length of a graph-theoretic shortest path. The reason is that greedy routing only finds short-
cuts by accidentally discovering them. We have argued that it needs to
explore Θ(log n) vertices to find a shortcut. However, for constructing
shortest paths, with global knowledge of the network, we can simply go
to the nearest shortcut. The nearest shortcut is typically in Manhattan
distance Θ((log n)1/d ), because there are Θ(log n) vertices in this Man-
hattan distance, and each has probability Ω(1/ log n) to provide a short-
cut. So, instead of blindly exploring Θ(log n) vertices, global knowledge
enables us to immediately go to a shortcut within Θ((log n)^{1/d}) steps. We
still need to repeat this O(log n) times, so shortest paths are of length
O((log n)^{1+1/d}). Thus the stretch is at least Ω((log n)^2/(log n)^{1+1/d}) =
Ω((log n)1−1/d ). Rephrased, this means that if the shortest path has length
ℓ = O((log n)1+1/d ), then greedy routing needs Ω(ℓ2d/(d+1) ) steps. For ex-
ample, for d = 2 greedy routing needs ℓ4/3 steps, where ℓ is the length of
a shortest path. The stretch is hard to measure for real-world networks
since we can not take a limit n → ∞, and constants are hard to distinguish
from logarithmic factors or even from nε . Still, it seems that the Kleinberg
model has a rather large stretch, especially when phrased in terms of the
distance ℓ.
Chapter 5

Geometric Inhomogeneous Random


Graphs (GIRGs)

5.1 The GIRG model: basic properties


In this chapter we will introduce a model which combines the Chung-Lu
model with geometry. This allows us to get the best aspects from both
models. However, as we will see, we actually also get additional features
from the combination that are not explicitly built into the model.

Definition 5.1. Let α > 1, d ∈ N and let D be a power-law distribution
on [1, ∞) with exponent τ ∈ (2, 3). Let X be a d-dimensional cube
of volume n ∈ N with torus topology. A Geometric Inhomogeneous
Random Graph (GIRG) G = (V, E) on n vertices is obtained by the
following three-step procedure.

(a) Every vertex v draws independently a weight wv from distribution D.

(b) Every vertex v draws independently a uniformly random position xv ∈ X.

(c) Every two different vertices u, v ∈ V are independently connected
by an edge with probability

    puv := min{1, (wu wv / ‖xu − xv‖∞^d)^α}.        (5.1)

The idea behind the model is that the geometric position captures prop-
erties and categories of the nodes. In social networks, this might be pro-
fession, place of living, or hobbies. The weight captures the popularity of
a node. The connection probability increases with the popularity of the
nodes, and is larger for nodes which are geometrically close to each other.
There are some choices in (5.1) which do not immediately have an ob-
vious reason. One is the appearance of the exponent α. Another is why
the distance ‖·‖ should have exactly exponent d. We will return to both
of these questions later in more detail (Sections 5.1.1 and 5.5). We will also
see that we could have used any other norm. The only reason to use the
∞-norm is that formulas look a little bit nicer, since any two points in X
have distance at most n^{1/d}. In the following, we will omit the index ∞
from the norm and simply write ‖xu − xv‖.
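For intuition, the following Python sketch samples a GIRG directly from Definition 5.1 by checking every pair of vertices. The chosen parameter values are only illustrative, and the quadratic pair loop is of course not how one would generate large GIRGs efficiently.

import random

def sample_girg(n, d=2, tau=2.5, alpha=1.5, seed=0):
    """Minimal GIRG sampler following Definition 5.1 (illustrative sketch).
    Positions are uniform in a d-dimensional torus cube of volume n."""
    rng = random.Random(seed)
    side = n ** (1.0 / d)
    # power-law weights on [1, infinity) with exponent tau
    weights = [(1.0 - rng.random()) ** (-1.0 / (tau - 1)) for _ in range(n)]
    positions = [tuple(rng.uniform(0, side) for _ in range(d)) for _ in range(n)]

    def torus_inf_dist(x, y):
        return max(min(abs(a - b), side - abs(a - b)) for a, b in zip(x, y))

    edges = []
    for u in range(n):
        for v in range(u + 1, n):
            dist = torus_inf_dist(positions[u], positions[v])
            p = min(1.0, (weights[u] * weights[v] / dist ** d) ** alpha)
            if rng.random() < p:
                edges.append((u, v))
    return weights, positions, edges

w, x, E = sample_girg(n=2000)
print(len(E), "edges; average degree", 2 * len(E) / 2000)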
But first we will show that formula (5.1) yields the same marginal prob-
abilities as the Chung-Lu model.

Lemma 5.2. Let u, v ∈ V for a GIRG. Assume we have drawn wu, wv
and xu ∈ X, but that xv is still random. Then the probability that u
and v are connected is

    Pr[u ∼ v | wu, wv, xu] = Θ(min{1, wu wv/n}).

This probability is also known as the marginal connection probabil-
ity of u and v.

Proof. First note that if wu wv ≥ n, then wu wv ≥ ‖xu − xv‖^d for all xv ∈ X
since the diameter of X is n^{1/d}. Hence, no matter which xv we draw, u
and v are always connected, and Pr[u ∼ v | wu, wv, xu] = 1 as required. So,
in the following we may assume wu wv < n and the minimum on the right
hand side is taken by the second term.
The probability density of the event “‖xu − xv‖ = r” is Pr[‖xu − xv‖ = r] =
Θ(r^{d−1}/n), since the ball of radius r around xu has surface area Θ(r^{d−1}) and
xv is uniform in a region of volume n. (This is also true for the ∞-norm.1 )
1
We are slightly imprecise when r approaches the diameter n^{1/d} of X, since there are
wraparound effects. We will ignore this complication here.
probability as
    Pr[u ∼ v | wu, wv, xu] = ∫_0^{n^{1/d}} Pr[‖xu − xv‖ = r] · Pr[u ∼ v | wu, wv, r] dr
        = ∫_0^{n^{1/d}} Θ(r^{d−1}/n) · min{1, (wu wv/r^d)^α} dr
        = ∫_0^{(wu wv)^{1/d}} Θ(r^{d−1}/n) · min{1, (wu wv/r^d)^α} dr
          + ∫_{(wu wv)^{1/d}}^{n^{1/d}} Θ(r^{d−1}/n) · min{1, (wu wv/r^d)^α} dr.        (5.2)

Note that the splitting point (wu wv)^{1/d} lies in the integration range [0, n^{1/d}]
because wu wv ≤ n. In the first integral, the minimum is taken by 1, in the
second integral the minimum is taken by (wu wv/r^d)^α. Hence, (5.2) simplifies
to

    ∫_0^{(wu wv)^{1/d}} Θ(r^{d−1}/n) · 1 dr + ∫_{(wu wv)^{1/d}}^{n^{1/d}} Θ(r^{d−1−dα}/n) · (wu wv)^α dr.

The integration variable is r, and it has exponent d − 1 > −1 in the first


integral and exponent d(1 − α) − 1 < −1 in the second integral. By Ex-
cursion 3.1, this means that we only need to evaluate the first integral at
the upper boundary and the second integral at the lower boundary. More-
over, from the excursion we also know a priori that both evaluations will
give the same term up to constant factors.2 Since the first integral looks
simpler, we only evaluate that and obtain
    Pr[u ∼ v | wu, wv, xu] = Θ((1/n) · [r^d]_{r=(wu wv)^{1/d}}) = Θ(wu wv/n),

as required.

Lemma 5.2 has important consequences. It means that when we fix


all information about a vertex u (its weight and position), then all other
vertices v have the same probability of connecting to u as in the Chung-Lu
model, up to constant factors. And of course, once the weight wu is fixed,
any two different vertices v1 and v2 have independent chances of connecting
to u. In particular, this means that the degree of u in the GIRG model
follows exactly the same distribution as in the Chung-Lu model (all up to
2
If you don’t believe it, compute both and see the magic happen.
constant factors, which we ignore for now), namely a Binomial distribution
with expectation Θ(wu ). In the limit for n → ∞, this converges to a
Poisson distribution.
Moreover, the weight distribution of a random neighbour of u is also the
same as in the Chung-Lu model, so it follows a power-law with exponent
τ − 1. Since this is independent of wu , the GIRG model has, as the Chung-
Lu model, neutral assortativity with respect to vertex weights. However,
this does not directly translate into neutral assortativity with respect to
degrees, especially for low-degree vertices. Consider two neighbours u and v
of (constant) weights wu and wv . This yields expected degrees Eu = Θ(wu )
and Ev = Θ(wv ), respectively. If someone tells us deg(u), and this happens
to be untypically large compared to Eu , what can we infer about deg(v)?
Of course, nothing is certain. But one potential reason for the large degree
of u is that it has many strong ties, i.e., that an untypical number of
vertices have positions close to xu . In this case, if xv is also close to xu
then xv also has untypically many vertices nearby, and thus xv might also
have an untypically high degree. There are many if’s in this reasoning,
but we will see later that indeed most neighbours are geometrically close
to each other, so that this does give a notable positive correlation between
deg(u) and deg(v), at least for small degrees. So while GIRGs have neutral
assortativity with respect to weights, they have positive assortativity with
respect to degrees.
Of course, for large weights the degree is highly concentrated around
its expectation, because a Poisson distribution with large expectation is
concentrated. Hence, the heaviest neighbour of u has the same typical
weight wu^{1/(τ−2) ± o(1)} as in the Chung-Lu model. The discussed properties
are so important that we collect them in a corollary.

Corollary 5.3. In a GIRG G with parameters α > 1, d ∈ N and τ ∈
(2, 3), let u be a vertex of weight wu.
(2, 3), let u be a vertex of weight wu .

(a) The degree of u follows a Binomial distribution with E[deg(u)] =


Θ(wu ).

(b) The degree distribution in the neighbourhood of u follows a power


law with exponent τ − 1.
(c) For every ε > 0, if wu is sufficiently large then the heaviest neigh-
bour v of u has weight wv ∈ [wu^{1/(τ−2)−ε}, wu^{1/(τ−2)+ε}], with the same
error probabilities as in Theorem 3.8.

(d) There are nΩ(1) vertices of weight at least n1/2 , which form a
single clique. We call this set of vertices the inner core.

(e) With high probability, G has a giant component with typical
distance (2 ± o(1)) · log log n / |log(τ − 2)|.

Proof. We have already argued that (a)-(c) are immediate consequences of


Lemma 5.2. For the number of vertices of weight at least w = n1/2 , this has
nothing to do with connection probability, so the number of such vertices
is Θ(w1/(τ−1) ) = nΩ(1) just as in every power law with exponent τ 2 (2, 3).
It follows immediately from (5.1) that any two such vertices are connected
since the maximal distance in X is at most n1/d .
For (e), we just need to recall how we obtained a short path from u to v
in a Chung-Lu random graph: starting in u, we greedily go to the heaviest
vertex in the neighbourhood, until we reach the inner core. This has con-
stant success probability, which already shows that a constant fraction of
the vertices is connected to the inner core.3 Then we do the same from v.
By (c) the lengths of these greedy paths are the same in Chung-Lu graphs
and in GIRGs. Finally we use that any two vertices in the inner core are
connected by (d). We have not discussed how to obtain the lower bounds
for typical distances in Chung-Lu graphs, but these arguments also make
only use of the marginal probabilities (they compute the expected number
of shorter paths), so they also transfer to GIRGs.

5.1.1 Variations and extensions


In this section we will discuss the formula (5.1) for the connection probabil-
ity in GIRGs. We will also mention some variations of the GIRG model, and
show that the definition of GIRGs is rather robust against small changes.
3
Strictly speaking, it shows that the number of such vertices is Ω(n) in expectation,
and an (easy) extra argument is required to obtain the whp statement.
Constant factors

In the connection probability (5.1) we could allow any Θ(1)-factors. Con-


sequently, it does not matter which norm we take in (5.1), since all norms
on Rd only differ by constant factors. However, this also includes models
where puv never exceeds some constant c < 1 even if wu wv /kxu − xv kd > 1.
This allows for less rigid models. One of the main differences is that then
the inner core does not form a single huge clique, but rather a dense Erdős-
Rényi graph with constant connection probability. Such a graph does not
have a large clique – the largest clique has size O(log n). But it is still
extremely well-connected. For example, any two vertices have a common
neighbour, i.e., the diameter of the inner core is 2. So the typical distances
remain the same.
Let us make a short historic excursion. The first version of a GIRG
were Hyperbolic Random Graphs [KPK+ 10]. Those had a very different
description: place n points uniformly at random in a two-dimensional disc
of radius R = 2 log n + O(1) in hyperbolic space, and connect any two
points of distance at most 1. This is a very simple description if one
knows what the hyperbolic plane is. Analyzing these graphs is also not so
simple because one needs to understand hyperbolic distances. However, it
turns out that Hyperbolic Random Graphs are just a special case of the
GIRG model, where one of the two hyperbolic dimensions corresponds to
the weight, and the other hyperbolic dimension corresponds to a Euclidean
circle. Thus (two-dimensional) Hyperbolic Random Graphs correspond to
one-dimensional GIRGs. The only difference is that the former have a very
complicated Θ(1)-factor in the connection probability that comes from the
hyperbolic geometry. In fact, GIRGs were developed as simplification and
generalization of Hyperbolic Random Graphs.

The role of the exponent d

The GIRG model is less fragile than the Kleinberg model with respect to
the exponent d in the term ‖xu − xv‖^d in (5.1). If instead we choose any
different exponent d′ > d/α, then the resulting graph still has a power-law
degree distribution. If additionally d′ < 2d/(τ − 1), then it is still an ultra-
small world. Thus we do not rely on d′ being one specific value. There is a
whole range [d/α, 2d/(τ − 1)] of possible exponents, and one easily checks
that this interval is non-empty. However, the case d′ = d has a convenient
parametrization: for other values of d′ we do not have E[deg(v)] = Θ(wv),
but instead we have E[deg(v)] = Θ(wv^{d/d′}). Since weights and degrees fall
apart, the power-law exponent τ′ of the degree distribution no longer co-
incides with the power-law exponent τ of the weight distribution. Instead
we have τ′ = (d′/d)(τ − 1) + 1. So other values of d′ would give us models
which are less convenient to work with.

The role of α

The terms weak ties and strong ties also make sense for GIRG. However,
other than for the Watts-Strogatz model, in GIRGs there is a continuous
spectrum between strong and weak ties. For an edge uv, if wu wv/‖xu −
xv‖^d ≥ 1 then the edge is a strong tie, and if wu wv/‖xu − xv‖^d is “much”
smaller than one, then the edge is a weak tie. It is a matter of taste where to
draw the line. In order to have a clear distinction, we define an edge uv to be
a strong tie if and only if wu wv/‖xu − xv‖^d ≥ 1. Informally, a more natural
convention might be that uv is a strong tie if wu wv /kxu − xv kd = Ω(1), and
that it is a small tie if wu wv /kxu − xv kd = o(1). However, this would not
give us a clear distinction between strong and weak ties for a fixed edge in
a fixed graph for some concrete value of n, which is why we do not use this
convention.
The exponent α ensures that “most” edges are strong ties.4 For illustra-
tion, let us focus on vertices of constant weight. Recall that those form the
majority of vertices. Fix a vertex v of weight wv = O(1). We want to study
the number of neighbours of weight O(1) of v, and we want to understand
how this is affected by the exponent α. As in the analysis of the Kleinberg
model, Theorem 4.5, we want to understand the number Nr of neighbours
of constant weight in distance [r, 2r] from v. There are Θ(rd ) vertices in
this distance range. If we would omit the exponent α, then the connection
probability would be wu wv /rd = Θ(r−d ), so E[Nr ] would be Θ(1). Thus
we would have exactly the same situation as in the Kleinberg model, and a
vertex would have the same number of neighbours (of constant weight) in
every distance range. In total, this would lead to a degree of Θ(log n). (Or
we could put a factor 1/ log n in front of the probability as in the Kleinberg
4
This is to be taken with a grain of salt. With our strict definition of strong ties, still
a Θ(1)-fraction of all edges are weak ties, and it could even be more than half. With the
informal alternative, it would be a o(1)-fraction.
model, to get constant degrees.) However, with the exponent α we obtain
E[Nr] = Θ(r^d · (r^{−d})^α). Thus, for α > 1 the number of neighbours per
distance range decreases with r, and most neighbours are close to v. In
fact, there is nothing special about the function xα that we applied here.
It can be shown that we could take any non-negative increasing function
f(x) with f(1) = 1 and ∫_1^∞ f(1/x) dx < ∞, and define

    puv := min{1, f(wu wv / ‖xu − xv‖^d)}.

The resulting graph model would work just as well as the GIRG model.
The function f determines how quickly the number of weak ties decays with
increasing distance.
An important special case is known as the threshold GIRG model or
as α = ∞. In this case, we set f(x) := 0 for all 0 ≤ x < 1, and f(x) := 1 for
x ≥ 1. So, we connect two vertices if and only if wu wv/‖xu − xv‖^d ≥ 1. This is an

important extreme case because it is a model which behaves as GIRGs in


many aspects, but without any weak ties.

Poisson point processes and grids

An interesting property of the GIRG model is self-similarity. Pick any
cube X′ of radius R, and let V′ be the set of vertices in X′. Then the induced
subgraph G′ := G[V′] is itself again almost a GIRG with |V′| vertices. The
vertices v ∈ V which draw a location in X′ land in a uniform location within
X′, and the distribution of weights and connection probabilities remain the
same. The only minor difference to a GIRG is that there we place exactly
n = Vol(X) points in X. For G′, every vertex has chance Vol(X′)/n to
land in X′, so E[|V′|] = n′ := Vol(X′). So the expected number of vertices
in G′ is n′, but in a GIRG we would place exactly n′ vertices. However,
|V′| is binomially distributed Bin(n, n′/n) and thus highly concentrated
when its expectation n′ is large, so the difference to a GIRG is small.
There is a way to remove even this small difference: if in the definition
of GIRG we place Po(n) vertices instead of exactly n vertices, then the
model becomes perfectly self-similar, and the induced subgraph G′ is again
a GIRG. The method of placing Po(n) points in a space of volume n is also
known as Poisson point process and has a nice mathematical consequence:
for two disjoint regions A, B ⊆ X, the number of vertices in A and B is
independent of each other.
Yet another possibility is to use a grid instead of placing the points ran-
domly. This does not yield a perfectly self-similar model (though it is still
approximatively self-similar). But it gives a connected graph, since all grid
edges are present. This is sometimes mathematically convenient, though
one should be aware that such perfectly regular structures are usually not
present in real-world networks.

5.2 Neighbours and Communities


In this section we will study the community structure of GIRGs. As we will
see, despite the high-degree vertices and the small-world property, GIRGs
are strongly governed by geometry.

5.2.1 Most neighbours are close


In this section we will study how far away the neighbours of a vertex v are
from v. To this end, let us define the ball of influence I(v) of v as the
ball around xv of radius rI (v) := w1/d v , with respect to k.k∞ . The ball of
influence has volume Θ(wv ). Every other vertex u in I(v) has connection
probability puv  min{1, wv wu /rI (v)d } = min{1, wu } = 1. Hence, v connects
to all vertices in its ball of influence.
Recall that in the GIRG model, we throw n vertices into a box of
volume n. Therefore, for any region R of volume x, the expected number
of vertices that land in R is exactly n · x/n = x. For R = I(v), this means
that in expectation Θ(wv ) vertices land in I(v). Hence, v has in expectation
Θ(wv ) neighbours in its ball of influence. On the other hand, we know that
the total number of neighbours of v is also Θ(wv ). Therefore, at least a
constant fraction of the neighbours of v are in I(v). Note that all of these
are strong neighbours of v.
The next lemma quantifies how many weak and strong neighbours v has
outside of the ball of influence. Recall our definition:
    u is a strong neighbour of v ⟺ wu wv / ‖xu − xv‖^d ≥ 1.        (5.3)
Lemma 5.4. In the GIRG model, consider a vertex v of weight wv ≥ 1
and position xv ∈ X, and let rI(v) ≤ r ≤ n^{1/d}. Let N^strong_[r,2r] and N^weak_[r,2r]
be respectively the number of strong and weak neighbours u of v with
distance ‖xu − xv‖ ∈ [r, 2r], and let analogously N^strong_≥r and N^weak_≥r for
neighbours in distance ‖xu − xv‖ ≥ r. Then

(a) E[N^strong_[r,2r]] = Θ(E[N^strong_≥r]) = Θ(r^d · (r^d/wv)^{1−τ}).

(b) Assume that α ≠ τ − 1, and let µ := min{α, τ − 1}. Then
    E[N^weak_[r,2r]] = Θ(E[N^weak_≥r]) = Θ(r^d · (r^d/wv)^{−µ}).

In particular, if α > τ − 1 then

    E[N^strong_[r,2r]] = Θ(E[N^strong_≥r]) = Θ(E[N^weak_[r,2r]]) = Θ(E[N^weak_≥r]).

Proof. We first consider the interval [r, 2r]. The number of vertices with
distance in [r, 2r] from v is Θ(rd ). We will only give the calculation under
the simplifying assumption that all those vertices have distance exactly r
from v. Then a vertex u in distance r yields a strong tie with v if and only
if it has weight wu ≥ r^d/wv. Moreover, we have the probability density
Pr[wu = w] = Θ(w^{−τ}), and hence

    E[N^strong_[r,2r]] = Θ(r^d) · ∫_{r^d/wv}^∞ w^{−τ} dw = Θ(r^d · (r^d/wv)^{1−τ}) = Θ(r^{d(2−τ)} wv^{τ−1}).        (5.4)
On the other hand, a vertex u in distance r forms a weak tie with v if
and only if i) it has weight wu < rd /wv , and ii) it forms an edge with v.
The probability density that it has weight w is Θ(w−τ ), and therefore
    E[N^weak_[r,2r]] = Θ(r^d) · ∫_1^{r^d/wv} w^{−τ} · min{1, (w wv/r^d)^α} dw        (5.5)
        = Θ(r^d) · ∫_1^{r^d/wv} w^{−τ} · (w wv/r^d)^α dw
        = Θ(r^{d−dα} wv^α) · ∫_1^{r^d/wv} w^{α−τ} dw.

If α − τ < −1, then we have to evaluate the integral at the lower boundary.
Moreover, µ = α in this case, and hence

    E[N^weak_[r,2r]] = Θ(r^{d−dα} wv^α) = Θ(r^d · (r^d/wv)^{−µ}).
If instead α − τ > −1, then we have to evaluate the integral at the upper
boundary. Since this is cumbersome, we cleverly observe that the inte-
gral (5.5) is in fact the same integral that we evaluated in (5.4), only with
a different integration range. Since we evaluate the integral both times at
the same value (the lower boundary in (5.4), the upper boundary in (5.5)),
by Excursion 3.1 they must give the same value up to constant factors.
Hence,
    E[N^weak_[r,2r]] = Θ(E[N^strong_[r,2r]]) = Θ(r^d · (r^d/wv)^{1−τ}).

Since µ = τ − 1, this proves the claim for the interval [r, 2r].
For the distances ≥ r, since τ − 1 > 1 and µ > 1, we observe that
E[N^strong_[r,2r]] and E[N^weak_[r,2r]] are decreasing in r. If we start at r′ := r and sum
over all values of E[N^strong_[r′,2r′]] or E[N^weak_[r′,2r′]] for r′ := r, 2r, 4r, 8r, . . ., then the
summands form a geometric series, which is dominated by the first term.
This proves the statements about “≥ r”.

Let us take a moment to understand Lemma 5.4. Firstly, E[N^strong_≥r]
and E[N^weak_≥r] are decreasing in r. Hence, the further away we go from v,
the fewer neighbours we find. So, “most” edges are in or close to the ball
of influence. The term “most” is slightly imprecise because there is a soft
transition as we increase the distance from v. Moreover, within distance
Θ(r), strong neighbours need to have weight at least Ω(rd /wv ) by definition
of strong ties. In the proof, we evaluated the integral in (5.4) at the lower
boundary. This means that “most” of the strong neighbours in distance r
have the minimal possible weight Θ(rd /wv ).
For weak ties, there are two different regimes: for α > τ − 1 there
are few weak ties, and most weak neighbours in distance r have so large
weight that they almost qualify as strong ties, i.e., their weight is only a
constant factor below the threshold rd /wv . In this case, the majority of
strong ties and weak ties look rather similar to each other. For α < τ − 1
there are many weak ties. In particular, there are asymptotically more weak
neighbours than strong neighbours in distance r as r → ∞, and “most” of
the weak neighbours in distance r have weight Θ(1). Rephrased, in the
case α > τ − 1, a random neighbour in distance r  rI (v) typically has
large weight Θ(rd /wv ), while it typically has small weight Θ(1) in the case
α < τ − 1. Note that we can only observe this distinction outside of the
ball of influence, so only for radii r > rI (v). Since v connects to all vertices
inside the ball of influence, picking a random neighbour is the same as
picking a random vertex inside the ball, which will likely have weight Θ(1),
regardless of the values of α and τ.
As an interesting corollary of Lemma 5.4, we obtain that GIRGs have
a large clustering coefficient.

Corollary 5.5. Let G be a GIRG. Then the clustering coefficient of G


satisfies CC[G] = Ω(1) in expectation and with high probability.

Proof. We only sketch the argument. Let v be a random vertex. With


probability Ω(1), the vertex has constant weight, say for concreteness wv ∈
[1, 2]. Assume that deg(v) ≥ 2, and pick two random neighbours u1, u2 of
v. We need to show that u1 and u2 have probability Ω(1) to be adjacent.
Since wv = O(1), the ball of influence of v has radius rI (v) = O(1).
One consequence of Lemma 5.4 is that a constant fraction of the neigh-
bourhood of v lies in its ball of influence.5 Hence, with probability Ω(1),
both u1 and u2 lie in the ball of influence, and thus have distance at most
rI (v) from v. By the triangle inequality, then u1 and u2 have distance
at most 2rI (v) from each other. In this case, their connection proba-
bility is puv ≥ min{1, (wu1 wu2/(2rI(v))^d)^α} = Ω(1). So, with constant probabil-
ity, the vertices u1 and u2 are “close by”, i.e., they have locations such
that puv = Ω(1). This means that overall Pr[u1 ∼ u2 ] = Ω(1). Hence,
E[CC(v)] = Ω(1) for vertices v of constant weight and degree at least 2,
which implies E[CC(G)] = Ω(1). The whp statement can be obtained by
standard concentration inequalities.
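As a quick empirical sanity check of Corollary 5.5 (a sketch only; it assumes the sample_girg function from the sketch in Section 5.1 is in scope and that the networkx package is available), one can compare the average local clustering coefficient of a sampled GIRG with an Erdős-Rényi graph of the same size and edge count.

import networkx as nx

# assumes sample_girg from the sketch in Section 5.1 is in scope
w, x, E = sample_girg(n=2000)
G = nx.Graph()
G.add_nodes_from(range(2000))
G.add_edges_from(E)

ER = nx.gnm_random_graph(G.number_of_nodes(), G.number_of_edges(), seed=0)

print("GIRG clustering coefficient:", round(nx.average_clustering(G), 3))
print("ER   clustering coefficient:", round(nx.average_clustering(ER), 3))

The GIRG value stays bounded away from zero as n grows, while the Erdős-Rényi value tends to zero, in line with Corollary 5.5.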

5.2.2 Boundaries and communities


Equipped with Lemma 5.4, we will now prove that there are strong com-
munities in GIRGs. Pick any cube X′ of radius R, and let V′ be the set
of vertices in X′. Then |V′| has expectation n′ := Vol(X′) = Θ(R^d). Since
the induced subgraph G′ is again (almost) a GIRG, G′ also inherits the
properties that we know about GIRGs. For example, G′ has a giant com-
ponent of size Θ(n′). Also, the number of edges within G′ is Θ(n′). The
5
The formal statement is a bit more technical, since Lemma 5.4 only makes a statement
about the expected number of neighbours in distance ≥ r.


next theorem shows that this is much larger than the number of edges in
G that connect V′ to the rest of the graph. Thus V′ forms a community.

Theorem 5.6. Let G be a GIRG. Let X′ ⊆ X be a cube of radius
1 ≤ R ≤ n^{1/d}/4, and let V′ be the set of vertices in X′. Let E(V′, V \ V′)
be the set of edges from V′ to V \ V′. Let ν := max{1 − 1/d, 3 − τ, 2 − α}.
Then

    E[|E(V′, V \ V′)|] = Θ̃((R^d)^ν),

where the notation Θ̃(.) hides polylogarithmic factors in R, i.e., E[|E(V′, V \
V′)|] = O((R^d)^ν (log R)^{c1}) and E[|E(V′, V \ V′)|] = Ω((R^d)^ν (log R)^{c2}) for
two constants c1, c2 ∈ R.

Proof. The proof involves the most complex calculation in this course, and
we will ignore some borderline cases. We will go over the vertices v ∈ V′
and compute how many neighbours in V \ V′ they have in expectation. To
do this, we let ∂X′ be the boundary of X′, and we write d(x, ∂X′) for the
distance of x from ∂X′. Then we will use a double integral of the following
form:

    ∫∫ Pr[∃v : d(xv, ∂X′) = r] · Pr[wv = w] · E[#{nbs of v in V \ V′} | r, w] dw dr.

In the outer integral, we integrate over the possible distances r that v may
have from ∂X′. In principle, this distance may be anything between 0
and R. However, we will ignore distances r ∈ [0, 1]. Vertices in distance
r ∈ [0, 1] have a constant fraction of their neighbours in V \ V′. But the
same is true for vertices in distance r ∈ [1, 2], and there are about as many
vertices with distance r ∈ [1, 2] as vertices with r ∈ [0, 1], up to a constant
factor. Thus, we will lose at most a constant factor by omitting r ∈ [0, 1].
On the other side, we will also ignore distances r ≥ R/2 from ∂X′, i.e., we
ignore the central subcube of radius R/2 of X′. This part is negligible: it
contributes only a constant factor to the total volume, and it is not hard
to see that vertices in the center have less expected neighbours in V \ V′
than vertices which are closer to the boundary. So, we will only consider
vertices in distance r ∈ [1, R/2] from ∂X′.
For 1 ≤ r ≤ R/2, the probability density of having a vertex at distance
exactly r from ∂X′ corresponds to the surface area of a ball of radius R − r,
which is Θ((R − r)^{d−1}) = Θ(R^{d−1}). Conveniently, this is independent of r.
For the inner of the two integrals, we may now assume that we have a
vertex v with d(xv, ∂X′) = r. Then we integrate over the possible weights
wv = w that v may have, and count how many neighbours v has for these
values of r and w.
We will compute the integral in two steps. In the first step, we will
only consider vertices v for which the ball of influence I(v) has non-empty
intersection with X \ X′. This is the case if and only if rI(v) ≥ r, or
equivalently w ≥ r^d. Hence, we integrate w over the range [r^d, ∞). By a
similar argument as before, we may leave out weights in the range [r^d, 2r^d],
because the range [2r^d, 4r^d] contributes the same amount. For weights w ≥
2r^d we have rI(v) ≥ cr for a constant factor c = 2^{1/d} > 1. Hence, a constant
portion of I(v) lies in X \ X′, so the intersection I(v) ∩ (X \ X′) has volume
Θ(Vol(I(v))) = Θ(w). Since every vertex in the intersection is a neighbour
of v, we have E[#{nbs of v in V \ V′} | r, w] ≥ Vol(I(v) ∩ (X \ X′)) = Θ(w)
in this case.
So, we can finally compute the contribution of vertices v for which I(v)
intersects X \ X′ as
So, we can finally compute the contribution of vertices v for which I(v)
intersects X \ X as
0

    I1 := Θ(1) ∫_1^{R/2} R^{d−1} ∫_{2r^d}^∞ w^{−τ} · w dw dr
        = Θ(R^{d−1}) ∫_1^{R/2} [w^{2−τ}]_{w=2r^d} dr        (5.6)
        = Θ(R^{d−1}) ∫_1^{R/2} r^{d(2−τ)} dr.

Now we need to distinguish two cases. Let us first assume that d(2 − τ) ≠
−1, so that we need to evaluate the function [r^{d(2−τ)+1}]. For d(2 − τ) > −1
we need to evaluate the upper boundary, and for d(2 − τ) < −1 the lower
boundary. For d(2 − τ) = −1, the remaining integral simply gives log(R/2),
which we may swallow by a Θ̃(.) notation. Hence,

    I1 = Θ(R^{d−1}) · R^{d(2−τ)+1} = Θ(R^{d(3−τ)})   if d(2 − τ) > −1,
    I1 = Θ̃(R^{d−1})                                 if d(2 − τ) ≤ −1.        (5.7)

The condition d(2 − τ) > −1 is equivalent to d(3 − τ) > d − 1, so we can
summarize (5.7) as follows:

    I1 = Θ̃(R^{max{d−1, d(3−τ)}}) = Θ̃((R^d)^{max{1−1/d, 3−τ}}).        (5.8)


The second step is to compute the contribution of vertices v for which
I(v) is disjoint from X \ X′, so assume from now on that v is such a vertex.
This is equivalent to the condition wv < r^d. We split this in yet two sub-
cases: strong ties and weak ties. Note that a vertex can have strong ties
outside of its ball of influence, if the neighbour has large weight. For this
case, we will just show an upper bound, and find that this contribution
is negligible. We will use the same integration method as before. Con-
sider a vertex v of weight wv = w in distance r from the boundary. In
order to form a strong tie with a vertex in X \ X′, the weight of the neigh-
bour must be at least w′ := r^d/w. We can easily compute the expected
number of neighbours of weight at least w′ of v: the total number of neigh-
bours of v is Θ(w), and the degree distribution in its neighbourhood is a
power-law with exponent τ − 1. Therefore, the expected number of neigh-
bours of weight at least w′ is Θ(w · (w′)^{2−τ}) = Θ(w · (r^d/w)^{2−τ}). Hence,
E[#{strong nbs of v in V \ V′} | r, w] = O(w · (r^d/w)^{2−τ}). The contribution
of strong neighbours is thus upper bounded by


    I2^strong := Θ(1) ∫_1^{R/2} R^{d−1} ∫_1^{r^d} w^{−τ} · w · (r^d/w)^{2−τ} dw dr
              = Θ(R^{d−1}) ∫_1^{R/2} r^{d(2−τ)} ∫_1^{r^d} w^{−1} dw dr        (5.9)
              = Θ̃(R^{d−1}) ∫_1^{R/2} r^{d(2−τ)} dr.

This is the same integral that we have already evaluated in (5.6). Therefore,
I2^strong = O(I1), and we may ignore this term.
Finally, we come to the last integral I_2^weak. This covers weak ties of vertices v for which I(v) is disjoint from X \ X_0. So let v be such a vertex. Then in particular w_v < r^d. If α > τ − 1, then by Lemma 5.4 the number of weak and strong neighbours of v in any distance range is asymptotically the same. Hence, in this case we have I_2^weak = Θ(I_2^strong), and we can also ignore this term. Thus we may restrict to the case α < τ − 1.⁶ By Lemma 5.4, for a vertex v in distance r from the boundary, the number of all weak neighbours in distance at least r is E[N_r^weak] = Θ(r^d · (r^d/w_v)^{−α}). Again, it is not hard to argue that a constant portion of those weak neighbours are in X \ X_0: the asymptotics does not change if we consider weak neighbours in distance at least 2r, and those have a constant probability to be in X \ X_0. Hence, we have E[#{weak nbs of v in V \ V_0} | r, w_v] = Θ(r^d · (r^d/w_v)^{−α}).

⁶ We omit the case α = τ − 1. This case gives another log R factor, but is otherwise identical to the other cases.
Thus the last integral is


Z R/2 Z rd
Iweak
2 := Θ(1) R d−1
w−τ  rd  (rd /w)−α dwdr
1
Z R/2
1
Z rd (5.10)
d−1 d(1−α) α−τ
= Θ(R ) r w dwdr
1 1

Since we are in the case α < τ − 1, the inner integral is Θ(1), and the outer
integral has exponent d(1 − α). Similar as in (5.7), we need to make a case
distinction, depending on whether d(1 − α) > −1 or not. Note that we can
also write this condition as d(2 − α) > d − 1. Thus we get

Θ(Rd−1 )  Rd(1−α)+1 = Θ(Rd(2−α) ) , if d(2 − α) > d − 1,
Iweak
2 =
Θ̃(Rd−1 ) , if d(2 − α)  d − 1 (5.11)
= Θ̃(Rmax{d−1,d(2−α)} ) = Θ̃((Rd )max{1−1/d,2−α} ).

Now we just need to collect the results. For α ≥ τ − 1, we may ignore the terms I_2^strong and I_2^weak, since they are asymptotically dominated by I_1. Thus we obtain

    E[|E(V_0, V \ V_0)|] = Θ̃(I_1) = Θ̃((R^d)^{max{1−1/d, 3−τ}}).

Since 3 − τ ≥ 2 − α in this case, this term agrees with Θ̃((R^d)^ν). If α < τ − 1, then we may still ignore I_2^strong, but we need to sum I_1 + I_2^weak, and obtain

    E[|E(V_0, V \ V_0)|] = Θ̃(I_1 + I_2^weak) = Θ̃((R^d)^{max{1−1/d, 3−τ}} + (R^d)^{max{1−1/d, 2−α}})
                         = Θ̃((R^d)^{max{1−1/d, 3−τ, 2−α}}),

as required.
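To make the exponent bookkeeping above more tangible, here is a small numerical sanity check of (5.8) and (5.11): we evaluate the integrals I_1 and I_2^weak, with all hidden constants set to 1, for two values of R and compare the observed growth exponent with the predicted one. This is only an illustrative sketch, assuming NumPy and SciPy are available; the parameter values d, τ, α below are arbitrary choices within the allowed ranges.

    import numpy as np
    from scipy import integrate

    d, tau, alpha = 2, 2.2, 1.1   # tau in (2,3); here alpha < tau - 1

    def I1(R):
        # inner integral: int_{2 r^d}^infty w^{1-tau} dw = (2 r^d)^{2-tau} / (tau - 2)
        f = lambda r: R**(d - 1) * (2 * r**d) ** (2 - tau) / (tau - 2)
        return integrate.quad(f, 1, R / 2)[0]

    def I2weak(R):
        # inner integral: int_1^{r^d} w^{alpha-tau} dw, which is Theta(1) for alpha < tau - 1
        inner = lambda r: ((r**d) ** (alpha - tau + 1) - 1) / (alpha - tau + 1)
        f = lambda r: R**(d - 1) * r ** (d * (1 - alpha)) * inner(r)
        return integrate.quad(f, 1, R / 2)[0]

    for name, I, predicted in [("I1", I1, max(d - 1, d * (3 - tau))),
                               ("I2weak", I2weak, max(d - 1, d * (2 - alpha)))]:
        observed = np.log(I(1e4) / I(1e3)) / np.log(10)   # slope on a log-log scale
        print(f"{name}: observed exponent {observed:.2f}, predicted {predicted:.2f}")

The observed and predicted exponents agree up to lower-order corrections coming from the finite integration boundaries.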

Theorem 5.6 says that the number of edges going out of the community V_0 is Θ̃((n_0)^ν). Note that ν < 1, so the number of edges leaving V_0 is indeed much smaller (for large n_0) than the number of edges inside of V_0, which is Θ(n_0). The community structure is very rich: we find communities of all sizes, and communities can be heavily overlapping. This reflects the complex community structures that we find in real social networks.
Separators and algorithmic implications

An important special case of Theorem 5.6 is R_0 = Ω(n^{1/d}). In this case, we split V into two parts of linear size, with only Θ̃(n^ν) = o(n) edges between them. Moreover, we thereby also split the giant component into two parts of linear size. Rephrased, we find a sublinear separator of the giant component. Recall that these do not exist for Erdős-Rényi graphs: for every partitioning of the giant component of an Erdős-Rényi graph into two linear parts, there are Ω(n) edges between those parts.

We have formulated Theorem 5.6 for a cube and its complement, but of course, any hyperplane that cuts X into two halves yields a separator of size Θ̃(n^ν). (Or rather, two hyperplanes if we use the torus topology.) We can also iterate this process, i.e., if we have cut a GIRG into two parts of equal size, we can cut those two parts again by another hyperplane, and so on. This gives us a separator hierarchy. Separator hierarchies also exist for some other graph classes, for example for planar graphs. They can often be used to make branch-and-bound algorithms more efficient. For example, for every ε > 0 it is possible to find a (1 + ε)-approximation for the INDEPENDENT SET problem on GIRGs (and on planar graphs, too) in polynomial time, while even an n^{1−ε}-approximation is NP-hard on general graphs. A similar technique allows us to find approximate vertex covers. Also, it is possible to compute a maximum matching on GIRGs in time O(n^{(5−τ)/2}), which is faster than the best known algorithm with runtime O(√(nm)) for general graphs [BFK16].
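To illustrate the idea of a separator hierarchy, here is a small sketch (not an implementation from the lecture) that recursively bisects a geometric point set by axis-parallel median hyperplanes, cycling through the coordinates, and reports how many edges cross each cut. The data format, a coordinate array plus an edge list, and the random geometric graph used in the example are hypothetical choices for the illustration.

    import numpy as np
    from scipy.spatial import cKDTree

    def separator_hierarchy(points, edges, part=None, axis=0, depth=0, max_depth=3):
        # Recursively split `part` by a median hyperplane orthogonal to `axis`,
        # cycling through the coordinates, and print the size of each cut.
        if part is None:
            part = set(range(len(points)))
        if depth == max_depth or len(part) <= 1:
            return
        median = np.median([points[v][axis] for v in part])
        left = {v for v in part if points[v][axis] <= median}
        cut = sum(1 for u, v in edges
                  if u in part and v in part and (u in left) != (v in left))
        print(f"{'  ' * depth}|V| = {len(part)}, cut along axis {axis}: {cut} edges")
        nxt = (axis + 1) % points.shape[1]
        separator_hierarchy(points, edges, left, nxt, depth + 1, max_depth)
        separator_hierarchy(points, edges, part - left, nxt, depth + 1, max_depth)

    # Example: a random geometric graph on the unit square with connection radius 0.05.
    rng = np.random.default_rng(0)
    pts = rng.random((2000, 2))
    edges = list(cKDTree(pts).query_pairs(0.05))
    separator_hierarchy(pts, edges)

For a geometric graph the printed cut sizes shrink with the part sizes, which is exactly the behaviour that branch-and-bound and divide-and-conquer algorithms exploit.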

5.3 Greedy routing


We have earlier encountered the Kleinberg model for navigability. That model was constructed in a way that makes navigation possible. As we will see, greedy routing works automatically in GIRGs. Let us briefly return to Milgram's small world experiment. The test subjects were told to send the message "to a personal acquaintance who is more likely than you to know the target person". This has a very natural analogue in GIRGs, namely the connection probabilities p_uv. Note that, if a vertex v knows the weight w_t and position x_t of the target t, and it knows the weight w_u and position x_u of all its neighbours u, then it can compute the connection probability p_ut for all its neighbours. Hence, it can decide which neighbour maximizes the connection probability and forward the message to that neighbour. In fact, in order to determine which neighbour u maximizes the connection probability p_ut, it is not even necessary to know the weight w_t of the target. Recall that p_ut = min{1, w_u w_t/‖x_u − x_t‖^d}^α. We may ignore the minimum with one, since we are finished in one more step when we reach p_ut = 1. Thus we must find the u which maximizes (w_u w_t/‖x_u − x_t‖^d)^α. But this is the same u that maximizes w_u w_t/‖x_u − x_t‖^d, and the same u that maximizes w_u/‖x_u − x_t‖^d. Let us define this last expression as the potential function:

    ϕ(u) := ϕ_t(u) = w_u / ‖x_u − x_t‖^d.                                        (5.12)

Then we obtain a simple algorithm, Algorithm 2.

Algorithm 2: Greedy Routing

Assumption: Every vertex knows the position and weight of itself and of all its neighbours. Vertex s has a message which contains the position of target t.
Initialization: v := s
repeat
    u := argmax {ϕ_t(u') | u' ∈ V, u' ∼ v}
    if ϕ_t(u) ≤ ϕ_t(v) then
        return failure
    else
        v ← u
until v = t

Obviously, Algorithm 2 is not very smart. It just gives up when it reaches a local maximum. At least it avoids infinite loops, since it guarantees that p_vt increases in each step. Despite this naive handling of local optima, the next theorem, taken from [BKL+22], shows that the algorithm is stunningly successful.
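For concreteness, here is Algorithm 2 transcribed into Python as a sketch. The data structures (an adjacency dict nbrs, dicts pos and weight, and a distance function dist) are illustrative assumptions for this example, not part of the model definition.

    def greedy_route(s, t, nbrs, pos, weight, dist, d):
        """Greedy routing as in Algorithm 2: repeatedly forward the message to the
        neighbour with the largest potential, and give up at a local maximum."""
        def phi(u):                          # potential (5.12): w_u / ||x_u - x_t||^d
            if u == t:
                return float("inf")          # reaching the target ends the routing
            return weight[u] / dist(pos[u], pos[t]) ** d
        path, v = [s], s
        while v != t:
            u = max(nbrs[v], key=phi)        # neighbour maximizing the potential
            if phi(u) <= phi(v):             # local maximum reached: failure
                return None
            path.append(u)
            v = u
        return path

Here dist should implement the same (torus) distance that was used to generate the graph, so that the potential matches (5.12).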

Theorem 5.7. Let G = (V, E) be a GIRG, and let s, t ∈ V be chosen uniformly at random from V. Then with probability Ω(1) the greedy routing algorithm, starting in s, finds t in at most ((2+o(1))/|log(τ − 2)|) · log log n steps.
Proof Sketch. We will only give the main calculation. Let v be the current vertex, and let R(v) be the maximal length of an edge of v, i.e., R(v) := max{‖x_u − x_v‖ | u ∼ v}. So all neighbours of v lie in a ball of radius R(v) around v. Moreover, for brevity we write D(v) := ‖x_v − x_t‖ for the distance to the target. One can show that the algorithm goes through two phases, where in the first phase R(v) < D(v), and in the second phase R(v) > D(v).
For random start points s and targets t, we typically have w_s, w_t = O(1). In this case, all neighbours of s are in distance O(1), and thus R(s) = O(1). On the other hand, typically D(s) = Θ(n^{1/d}), so we start in the first phase with R(v) = o(D(v)).
So assume that the algorithm is currently in some vertex v in the first phase. For the sake of this intuition, we even assume R(v) = o(D(v)), not just R(v) < D(v). Which neighbour does v select in the first phase? By the triangle inequality, all neighbours u satisfy D(u) = ‖x_u − x_t‖ ∈ [D(v) − R(v), D(v) + R(v)]. Since R(v) = o(D(v)), this means that for any two neighbours u, u' we have D(u) = (1 ± o(1))D(v) = (1 ± o(1))D(u'). Therefore, all neighbours u of v have essentially the same denominator in the potential ϕ(u) = w_u/D(u)^d, up to (1 ± o(1))-factors. On the other hand, the weight of the heaviest neighbour of v is roughly w_v^{1/(τ−2)} by Corollary 5.3, so the factor w_u covers the whole range [1, w_v^{1/(τ−2)}]. Thus in order to maximize the potential ϕ(u) = w_u/D(u)^d, we must essentially maximize w_u, i.e., select the neighbour of v of largest weight. Therefore, in the first phase the algorithm iteratively goes to the (approximately) heaviest neighbour. We already know that this procedure leads to doubly exponentially growing weights, namely to weight w_s^{1/(τ−2)^i} after i steps. It reaches the heavy core in ((1+o(1))/|log(τ − 2)|) · log log n steps. Moreover, for large w_v it is very unlikely that v does not have neighbours of larger weight. Thus the algorithm has a constant failure probability in the first few steps, when w_v is still small, but with growing w_v the failure probability quickly becomes negligibly small.
Recall that vertices in the heavy core form a single clique, regardless of their position. Therefore, a vertex v in the heavy core reaches vertices in all places of X, and thus it is not hard to show that R(v) > D(v). Hence, when the algorithm reaches the heavy core, it enters the second phase. (It would not harm the analysis if the algorithm entered the second phase earlier, or if we only had R(v) ≥ D(v) when the algorithm reaches the heavy core, but both scenarios are unlikely.)
Which neighbour does a vertex v pick in the second phase of the algorithm? This is a bit trickier. First of all, mind that R(v) > D(v) does not mean that t is a neighbour of v. While v has some neighbours in distance range ≈ D(v), it does not connect to all vertices in this distance range. In particular, it typically does not connect to t. However, v has some neighbours which are much closer to the target t, and one of them will be optimal. More precisely, we will show that the best neighbour u has weight w_u ≈ w_opt := ϕ(v)^{−1} and distance D(u) ≈ D_opt := ϕ(v)^{(1−τ)/d} from t. Thus ϕ(u) ≈ w_u/D(u)^d = ϕ(v)^{−1}/ϕ(v)^{1−τ} = ϕ(v)^{τ−2} =: ϕ_opt. Hence, in the second phase we increase the potential by an exponent of τ − 2 in each step, see also Figure 5.1. (We have τ − 2 < 1, so taking the (τ − 2)-th power brings the potential closer to one. Since the potential is less than one, this corresponds to an increase.) We start the second phase (and in fact, also the first phase) with a potential w_v/‖x_v − x_t‖^d ≥ 1/n, and if we raise the potential to the power τ − 2 in each step, then an easy calculation shows that we need at most ((1+o(1))/|log(τ − 2)|) · log log n steps to reach potential Ω(1). Here, the o(1) term swallows the approximations that we have swept under the rug. Once the potential is Ω(1), we are finished, since then the algorithm has probability Ω(1) of hitting t in the next step.
It remains to show that the best neighbour u indeed satisfies w_u ≈ w_opt and D(u) ≈ D_opt. Let us first show that such a neighbour indeed exists. We will use without proof that D_opt ≤ D(v).⁷ Every vertex u of weight ≈ w_opt and distance at most D(v) from v is a neighbour of v, because w_opt · w_v/D(v)^d = ϕ(v)^{−1} · w_v/D(v)^d = 1. This is not quite the set of vertices we are looking for. Instead, we want to find a neighbour u of v which has distance ≈ D_opt from the target. But this does not change much: every such vertex has distance at most D(v) + D_opt ≤ 2D(v) from v, so all such vertices of weight ≈ w_opt connect to v with probability Ω(1).
On the other hand, there are vertices of weight ≈ w_opt and distance ≈ D_opt from the target: the expected number of such vertices is Θ(D_opt^d · w_opt^{1−τ}) = Θ(ϕ(v)^{1−τ} · ϕ(v)^{τ−1}) = Θ(1). In a real proof, we would choose D_opt slightly larger to make sure that there are many such vertices. Hence, we have shown that v does have neighbours with weight ≈ w_opt and distance ≈ D_opt from t.

⁷ This step is actually not completely trivial. It involves showing that not all combinations of D(v) and w_v are possible, since whp there are no vertices with very high weight which are very close to t. Alternatively, one can show inductively that throughout the second phase (except for the very first step) the relation D(v)^d ≤ w_v^{τ−1} holds, which follows from the formulas for D_opt and w_opt.

Figure 5.1: A typical trajectory of greedy routing. (Figure taken from [BKL+22], where β = τ.) In the first phase, the weight is increased by an exponent of 1/(τ − 2) in each step. In the second phase, the potential is increased by an exponent of τ − 2 in each step.
We still need to show that there are no neighbours with better potential. To this end, consider vertices of weight w in distance r from t. Such vertices exist if w ≤ (r^d)^{1/(τ−1)}, since this is the maximum weight among r^d vertices. We write this equivalently as

    r^{−d} w^{τ−1} ≤ 1.                                                          (5.13)
Let us now compute the expected number of neighbours of v which have weight ≈ w and distance ≈ r from t. For simplicity, we will restrict ourselves to the case r ≤ D(v)/2. Mind that those vertices still have distance ≈ D(v) from v, not distance r. Hence, the probability to form an edge with v is ≈ w_v w/D(v)^d, and the expected number of such neighbours is

    Θ(r^d · w^{1−τ} · w_v w/D(v)^d) = Θ(r^d w^{2−τ} · w_v/D(v)^d) = Θ(r^d w^{2−τ} ϕ(v)).   (5.14)

In order for such neighbours to exist, we require that the above expectation is at least one. (In a full proof, we would allow some slack.) So, ignoring constant factors, we obtain the condition

    r^{−d} w^{τ−2} ≤ ϕ(v).                                                       (5.15)
We want to maximize the potential w/r^d under the side constraints (5.13) and (5.15). After taking logarithms, this is a linear optimization problem, which can be solved by standard methods. Here we just give the solution. We take the (3 − τ)-th power of (5.13), multiply it with the (τ − 2)-th power of (5.15), and obtain

    w/r^d = (r^{−d} w^{τ−1})^{3−τ} · (r^{−d} w^{τ−2})^{τ−2} ≤ 1^{3−τ} · ϕ(v)^{τ−2} = ϕ_opt.   (5.16)

Since the left hand side is the potential of the neighbour, we have shown that all neighbours of v have potential at most ϕ_opt.
This concludes the intuition that we give here. Of course, a full proof would be much more technical. For example, we would need to show that once the algorithm reaches the second phase, it stays in the second phase, and we would need to formally argue that in the second phase we really only need to investigate distances r ≤ D(v).
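The exponent bookkeeping in (5.16) is easy to get wrong, so here is a quick symbolic sanity check (a sketch assuming SymPy is available) that the (3 − τ)-th power of the left-hand side of (5.13), multiplied with the (τ − 2)-th power of the left-hand side of (5.15), indeed combines to the potential w/r^d.

    import sympy as sp

    r, w, d, tau = sp.symbols('r w d tau', positive=True)
    # left-hand sides of (5.13) and (5.15), raised to the powers used in (5.16)
    lhs = (r**(-d) * w**(tau - 1))**(3 - tau) * (r**(-d) * w**(tau - 2))**(tau - 2)
    ratio = sp.powsimp(sp.expand_power_base(lhs / (w / r**d), force=True), force=True)
    print(sp.simplify(ratio))   # expected output: 1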
Theorem 5.7 is remarkable in several aspects. Firstly, it is not clear a priori why such a stupid algorithm should have constant success probability. Recall that the Kleinberg model only succeeded because of the underlying grid; without the grid, routing in the Kleinberg model fails with high probability.⁸ Note that we cannot expect the success probability for GIRGs to be arbitrarily close to 1, since there is a constant probability that s or t is not in the giant component. Thus success probability Ω(1) is the best we can hope for.

⁸ For GIRGs, a grid would not guarantee success of greedy routing, since a vertex v may have higher potential ϕ(v) than all its neighbours. This cannot easily be fixed by removing the check ϕ(u) ≤ ϕ(v) from Algorithm 2: then, from a local maximum v, the algorithm might enter an infinite loop between v and its best neighbour.
Secondly, the number of steps is optimal. Typical distances in GIRGs are ((2+o(1))/|log(τ − 2)|) · log log n by Corollary 5.3, so shortest paths in G have the same length as the paths that greedy routing finds, up to a factor 1 + o(1). Indeed, it is even possible to make this connection for individual pairs s, t: conditioned on the algorithm finding t, with high probability the greedy routing algorithm finds paths of stretch 1 + o(1), i.e., which are only by a factor 1 + o(1) longer than the shortest s-t-path in G. Moreover, even if we select s and t with given weights w_s and w_t and given distance D(s), then the stretch is still 1 + o(1), even if the typical shortest path under such conditions is shorter than ((2−o(1))/|log(τ − 2)|) · log log n, as long as the path length is ω(1).
Unfortunately, it is usually not possible to use Algorithm 2 directly for real-world networks, because we do not know the underlying vertex positions x_v. (The weights can be approximately deduced from the degrees.) Recall that vertex positions do not just encode GPS coordinates (if these are available), but also intrinsic properties like age, profession and hobbies in social networks. However, there is active research on reconstructing the vertex positions from the abstract graph, which can then be used for greedy routing.⁹

⁹ Or rather, on finding vertex positions that are compatible with the abstract graph structure. The abstract graph does not contain enough information to reconstruct the original coordinates, but this is not necessary for such a task.

Patching

We have already mentioned that Algorithm 2 has a pretty stupid policy (or rather, no policy) for handling local optima. It is possible to patch the algorithm with smarter policies which ensure that the target is always found if s and t are in the same component. This can be done in two ways: either the routing path is added to the message itself, similarly to emails; or every vertex v remembers whether the message has already visited v, and remembers the last neighbour to which v has forwarded the message. With such patches, the algorithm can be modified such that it always finds t if t is in the same connected component as s. Importantly, these patches do not destroy the performance guarantee: with high probability, the stretch of the greedy routing algorithm remains 1 + o(1) [BKL+22].

Geometric Routing

A natural variant of greedy routing is geometric routing. In this variant, we do not greedily optimize the connection probability p_ut (or equivalently, the potential ϕ(u)), but rather we optimize the geometric distance from the target. So v sends the message to the neighbour u which minimizes the distance ‖x_u − x_t‖ from the target. Otherwise, we follow the same scheme as in Algorithm 2.
Interestingly, geometric routing may also work in some situations. More precisely, it works if and only if α > τ − 1. Note that this is precisely the regime in which most neighbours in distance class r > r_I(v) are strong neighbours or almost strong neighbours by Lemma 5.4. This is no coincidence, and we will now sketch the reason.
Geometric routing goes to the neighbour u which is closest to the target. Let us first consider the first phase. Then all neighbours u have similar distance from the target t, but we can still study which of them is closest to t. To make progress in the direction of t, we need two properties:

(i) u and t should lie in the same direction from v, i.e., the vector from x_v to x_u should point in a similar direction as the vector from x_v to x_t.

(ii) The distance ‖x_u − x_v‖ should be as large as possible.

For condition (i), all neighbours of v have the same chance to lie in a good direction, regardless of their weight. But for condition (ii), it depends on α. If α > τ − 1, then for large r there are more strong neighbours in distance ≈ r than weak neighbours. By Lemma 5.4, the expected number of strong neighbours in distance ≥ r from v is Θ((r^d)^{2−τ} w_v^{τ−1}). This expectation is Θ(1) for r = r_max, where r_max^d := w_v^{(τ−1)/(τ−2)}. Hence, the farthest neighbours have distance ≈ r_max from v. Moreover, "most" of the neighbours in this distance are strong (or almost strong) neighbours, so they have weight w ≈ r_max^d/w_v = w_v^{1/(τ−2)}. Since there are more strong than weak neighbours in distance ≈ r, it is not hard to check that for α > τ − 1 there are no neighbours of lower weight in distance ≈ r_max. Therefore, geometric routing will pick among neighbours of weight w ≈ w_v^{1/(τ−2)}, and it will select among them whichever neighbour is aligned best with the direction from x_v to x_t. But that means that geometric routing also increases the weight in the first phase by an exponent of 1/(τ − 2) in each step, just like greedy routing. Hence, it will also reach the inner core in ((1+o(1))/|log(τ − 2)|) · log log n steps.
Note that this only works if α > τ − 1. If α < τ − 1, then a similar argument shows that the farthest neighbours of v are weak neighbours with weight O(1). Hence, for α < τ − 1 the algorithm will stay within vertices of weight O(1), and will only cover a distance of r_max = O(1) per step. Thus it would need Ω(n^{1/d}) steps to overcome the distance Ω(n^{1/d}) from s to the target t. It is even worse: for vertices of weight O(1) there is a constant failure probability per step, and with high probability the algorithm will get stuck quickly.
For the second phase, we can make a similar argument. Let us assume we have a vertex v which has distance D(v) from the target. We may assume D(v) > r_I(v), where r_I(v) is the radius of the ball of influence of v. Since v connects to all vertices in its ball of influence, t would be adjacent to v if D(v) ≤ r_I(v), and the algorithm would terminate. By definition of the second phase, we may also assume D(v) ≤ R(v), where R(v) is the length of the longest edge incident to v. Hence, v has neighbours in distance ≥ D(v), but does not connect to all vertices in this distance. In particular, let us consider a ball B with radius r < D(v)/2 around t. Then the whole ball B is in distance ≈ D(v) from v. It depends on the volume Θ(r^d) of B whether v has a neighbour in the ball B. Let us choose r such that the expected number of neighbours of v in B is one. Which weight does a neighbour u of v in B have? Since α > τ − 1, there are more strong (or almost strong) than weak neighbours in distance ≈ D(v). Therefore, with high probability the neighbour u is a strong neighbour and thus has large weight. Calculating r and w_u yields exactly the same values D_opt and w_opt as in the proof of Theorem 5.7.¹⁰ Hence, geometric routing takes the same trajectory as greedy routing for α > τ − 1, also in the second phase.

¹⁰ We did not show it explicitly in the proof of Theorem 5.7, but D_opt can be described as the smallest radius for which the ball B contains a strong neighbour of v.

5.4 Bidirectional search


Assume that we are given two vertices s and t, and we want to find a shortest path from s to t. One of the first algorithms that you have learned is breadth-first search (BFS), which needs time O(n). However, it turns out that this is not the fastest algorithm in practice. Instead, it is more efficient to use bidirectional search. In bidirectional search, we simultaneously start a BFS from both s and t. We call the two processes s-BFS and t-BFS. We alternate between the two processes, so whenever we have explored a layer of one BFS, we switch to the other. As soon as a vertex appears in both search trees, we have found a shortest path between s and t.
There are some variants. In practice, one should not just alternate, but one should always continue with the process which has the smaller BFS tree so far. This avoids the possibility that one BFS tree becomes much larger than the other. As an alternative, one could also swap the process after every explored vertex.
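The following sketch implements bidirectional BFS with the "grow the smaller tree" rule just described; the adjacency-dict interface is an illustrative assumption. It expands one complete layer at a time, collects all meetings of the two search trees within that layer, and returns the length of a shortest s-t path.

    def bidirectional_bfs(adj, s, t):
        """Layer-by-layer bidirectional BFS; returns dist(s, t) or None."""
        if s == t:
            return 0
        dist = {s: {s: 0}, t: {t: 0}}                 # BFS distances from s and from t
        frontier = {s: [s], t: [t]}
        while frontier[s] and frontier[t]:
            side = s if len(dist[s]) <= len(dist[t]) else t   # grow the smaller tree
            other = t if side == s else s
            best, nxt = None, []
            for v in frontier[side]:                  # expand one complete layer
                for u in adj[v]:
                    if u in dist[other]:              # the two search trees meet at u
                        cand = dist[side][v] + 1 + dist[other][u]
                        best = cand if best is None else min(best, cand)
                    elif u not in dist[side]:
                        dist[side][u] = dist[side][v] + 1
                        nxt.append(u)
            if best is not None:
                return best
            frontier[side] = nxt
        return None                                   # s and t lie in different components

Counting the entries inserted into dist gives exactly the exploration cost that we compare against unidirectional BFS in the following discussion.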
According to practitioners, bidirectional search is generally faster than the classical unidirectional BFS. How much faster it is depends on the network. As a rule of thumb, in networks with homogeneous degrees the speed-up is modest, typically a factor of 2, but it can be much larger in heterogeneous networks.
Traditional worst-case analysis does not show a difference between unidirectional and bidirectional search. Without further assumptions on the graph, both have the same worst-case performance. It is even possible to construct examples where bidirectional search performs worse than unidirectional search. So the differences in practice seem to come from the structure of the underlying networks. For unidirectional search, the performance does not depend on the underlying network: a BFS starting in s explores the set of vertices in some linear order. Since t is uniformly at random, it will appear in a random position in this ordering. So in expectation, we need to explore n/2 vertices to find t (for a connected network of n vertices). This is independent of the graph structure. So in order to understand the difference, we should study bidirectional search on different network models. In the following we will give heuristic arguments, but not full proofs. Analyses for Chung-Lu random graphs and hyperbolic random graphs (i.e., one-dimensional GIRGs with α = ∞) can be found in [BN19] and [BFF+22], respectively.
Since the speed-up depends on the degree distribution, our first attempt might be to analyze and compare Erdős-Rényi with Chung-Lu networks. We will restrict ourselves to the giant component of both models, and always assume that the vertices s and t are drawn uniformly at random from the giant component.

Erdős-Rényi graphs
In Erdős-Rényi graphs, we need to explore Θ(√n) vertices from both sides. Then every vertex in the s-BFS has probability Θ(1/√n) to appear in the t-BFS, so the expected number of vertices which appear both in the s-BFS and in the t-BFS is Θ(1). This is known as the birthday paradox. By standard probabilistic arguments, we thus only need to explore Θ(√n) vertices in expectation. This is much fewer than the Θ(n) vertices that we need to explore by unidirectional search.

Chung-Lu graphs

For Chung-Lu graphs with power-law exponent τ ∈ (2, 3), we know that shortest paths run via the inner core. That is, the two BFS processes will run until both have reached the inner core, and then they will find an overlap. The question is: how many vertices does a BFS explore before it finds the inner core? The answer will turn out to be surprisingly simple, but in order to understand the answer, we need to take one step back and return to the question of how many friends our friends have.
Consider a vertex v of weight w_v. The weights in the neighbourhood of v follow a power-law distribution with exponent τ − 1. Let v_max be the neighbour of largest weight of v. Then the weight of v_max is roughly w_max := w_v^{1/(τ−2)}, and v_max has Θ(w_max) neighbours. But how many neighbours have all neighbours of v combined? Since each neighbour v' contributes Θ(w_{v'}) neighbours, and Pr[w_{v'} = w] = Θ(w^{1−τ}), we can compute this with an integral:

    #{nbs of nbs of v} ≈ ∫_1^{w_max} w_v · Θ(w^{1−τ} · w) dw = Θ(w_v · w_max^{3−τ})
                       = Θ(w_v^{1+(3−τ)/(τ−2)}) = Θ(w_v^{1/(τ−2)}) = Θ(w_max).

So, the neighbour v_max has about as many neighbours as all other neighbours of v combined. Actually, we shouldn't be surprised: we know from Section 3.2.2 that in the limit, our friends have infinitely many friends in expectation. This can only happen if the expectation is dominated by the friend(s) of largest weight, which means that they must contribute as much (or more) to the expectation as all other friends combined.
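A tiny simulation illustrates this dominance of the heaviest neighbour (a sketch assuming NumPy; the inverse-transform sampler and the parameter values are ad-hoc choices). We sample Θ(w_v) neighbour weights from a power law with exponent τ − 1 and compare the heaviest weight, and its share of the second neighbourhood, with the heuristic above.

    import numpy as np

    rng = np.random.default_rng(0)
    tau, wv = 2.5, 1000          # a vertex of weight w_v = 1000 has ~1000 neighbours

    # Neighbour weights follow a power law with exponent tau - 1,
    # i.e. Pr[W > w] = w^{-(tau-2)} for w >= 1; sample by inverse transform.
    nbr_weights = (1 - rng.random(wv)) ** (-1 / (tau - 2))

    print("heaviest neighbour weight  :", nbr_weights.max())
    print("prediction w_v^{1/(tau-2)} :", wv ** (1 / (tau - 2)))
    # Each neighbour of weight w contributes Theta(w) second neighbours,
    # so the heaviest neighbour's share of the second neighbourhood is roughly:
    print("share of heaviest neighbour:", nbr_weights.max() / nbr_weights.sum())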
The same insight can also be transferred to sets of vertices. Assume that S is some layer of the BFS tree, i.e., a set of vertices, and assume that this set has total weight w_S := Σ_{v∈S} w_v. Consider the set S' that we explore in the next step. We have |S'| = Θ(w_S), and the weights of vertices in S' follow a power-law with exponent τ − 1. Hence, the vertex v_max of largest weight in S' will have ≈ w_S^{1/(τ−2)} neighbours, about as many neighbours as all other vertices in S' combined. In particular, the size of the next layer in the BFS tree (and in fact, of the whole BFS tree) is ≈ w_S^{1/(τ−2)}.
Unfortunately, the above calculation comes with a restriction. The weights of the neighbours of a vertex v follow a power-law of exponent τ − 1 only up to the cut-off point n/w_v. Beyond this cut-off point, the above computation no longer holds. In particular, if the weight of v is too large then the heaviest neighbour does not have weight w_v^{1/(τ−2)}. Fortunately, this restriction only applies for very large weights w_v. More precisely, it is only relevant for the last layer, when the BFS finds the inner core.
Assume that S is the last layer before the BFS finds the inner core. Let v_max be the heaviest vertex in S, and let w_max be its weight. Let S' be the neighbourhood of S, and let v'_max be the heaviest vertex in S', with weight w'_max. Since we proceed layer by layer, there can be non-trivial fluctuations in w'_max. If we are lucky, then w'_max ≈ n^{1/2+ε}. In this case, it can be checked that w'_max is below the cut-off point for w_max, so the above calculations still hold and the BFS only needs to explore O(w'_max) = O(n^{1/2+ε}) vertices. If we are unlucky, then w'_max ≈ n^{1/(τ−1)}, which is the maximal weight in the whole graph. In this case, we need to explore the neighbourhood of the whole inner core. The inner core contains Θ(n · (n^{1/2})^{1−τ}) vertices, all of which have degree Ω(n^{1/2}), so its neighbourhood has size Ω(n^{1+(1−τ)/2+1/2}) = Ω(n^{(4−τ)/2}). This bound is indeed tight, both for the size of the neighbourhood S' of S and for the runtime of the BFS. Hence, the runtime of bidirectional search varies in the interval [n^{1/2}, n^{(4−τ)/2}].

Note that, for different values τ ∈ (2, 3), the upper bound may take any value between n^{1/2} and n. Thus bidirectional search is asymptotically faster than unidirectional search, but the speed-up is smaller than for Erdős-Rényi graphs.
Geometric networks

The previous analyses have shown several things: bidirectional search is indeed asymptotically faster than unidirectional search. This matches practical findings. However, the speed-up for Erdős-Rényi graphs is larger than for Chung-Lu graphs. This is strange, since the practical experience is the opposite: the speed-up of bidirectional search is stronger for inhomogeneous degrees, and modest for homogeneous graphs.
To understand this, let us turn to geometric models. Applications are often routing and navigation problems, in which geometry plays an important role. As a homogeneous network model, let us simply consider a grid. To find a connection between two vertices s and t, a unidirectional BFS will explore a ball around s of radius ‖x_s − x_t‖, which has volume ≈ c · ‖x_s − x_t‖^d, where c is the volume of the unit ball.¹¹ A bidirectional search needs to explore the two balls around s and t of radius ‖x_s − x_t‖/2, at which point they touch. Thus the runtime is ≈ 2c · (‖x_s − x_t‖/2)^d = 2^{1−d} c ‖x_s − x_t‖^d. Hence, in a grid graph the bidirectional search is faster by a factor of 2^{d−1}. This matches quite well the reported speed-up factor of 2, assuming that many applications have a two-dimensional underlying geometry. The same result also holds for more flexible models like Geometric Random Graphs, but it is more difficult to prove there.
For GIRGs, the analysis of Chung-Lu graphs still applies. The main difference is that due to clustering and community structure, the BFS will quite often encounter vertices that it has found before. However, the size of the layers is determined by the weights that we obtain by iteratively picking the heaviest vertex, so it grows doubly exponentially. Hence, the decelerating effects of clustering and communities are negligible, and the runtime varies again in the interval [n^{1/2}, n^{(4−τ)/2}]. With these models, we find only a modest improvement for homogeneous graphs, and an asymptotic improvement for heterogeneous graphs. This indicates that we are on track for understanding bidirectional search in practice.¹²

¹¹ The unit ball in the right norm; for example, one needs to take the ‖.‖_1-norm for square grids. For our purpose it is just important that c is a universal constant that does not depend on s and t.
¹² Of course, this is so far only a hypothesis, and one should not stop at this point. But further research has indeed confirmed that these models are a good match. For example, the improvements predicted by these models match well the actual speed-ups on various real networks.
5.5 Non-Euclidean GIRGs
We have already discussed some minor variations of GIRGs in Section 5.1.1. In this section, we will see a more fundamental way of generalizing GIRGs. In GIRGs, we measured "closeness" of vertices by their Euclidean distance. However, we could equip the underlying space X with other distance functions as well. The idea of the following definition is that the distance between x, y ∈ R^d is measured by κ(x − y) for some function κ : R^d → R_{≥0}.

Definition 5.8. Let d ∈ N. Consider a measurable function κ : R^d → R_{≥0}. For x ∈ R^d and r, R ≥ 0, we call B^κ_r(x) := {y ∈ R^d | κ(x − y) ≤ r} the κ-ball of radius r around x, and we call B^∞_{R/2}(x) := {y ∈ R^d | ‖x − y‖_∞ ≤ R/2} the ∞-ball of radius R/2 and volume R^d around x. For R, r > 0, let Vol^κ_R(r) := Vol(B^κ_r(0) ∩ B^∞_{R/2}(0)).
We call κ a feasible distance function if it satisfies the following properties:

(i) Symmetry: κ(x) = κ(−x) for all x ∈ R^d.

(ii) Boundedness: There is C > 0 such that κ(x) ≤ C · ‖x‖_∞ for all x ∈ R^d.

(iii) Continuity of volume: For all R > 0, the function Vol^κ_R : R_{≥0} → R_{≥0}; r ↦ Vol^κ_R(r) is surjective onto [0, R^d].

When κ and R are clear from the context, we also write Vol(r) instead of Vol^κ_R(r).

For the last condition, note that the volume of B^∞_{R/2}(0) is R^d. Hence, we have Vol^κ_R(r) ≤ R^d for all r ≥ 0. Moreover, condition (ii) implies that for r = CR/2, we have B^∞_{R/2}(0) ⊆ B^κ_r(0), and thus Vol^κ_R(CR/2) = R^d. The function Vol^κ_R(r) is increasing in r since the set B^κ_r(0) grows with r, and so the third condition requires that Vol^κ_R(0) = 0 and that the volume increases continuously from 0 to R^d as r increases from 0 to CR/2.
For feasible distance functions, we can define a generalized GIRG model. As for ordinary GIRGs, we draw the vertex locations from an axis-parallel cube X of volume n and side length R = n^{1/d}. To avoid boundary effects, we will not apply κ directly to x_u − x_v, but rather to x_u − x_v mod R, where the "mod R" operator is applied componentwise to the d components of the vector x_u − x_v. Recall that for y ∈ R we define y mod R := y' for the unique y' ∈ [0, R) for which (y − y')/R ∈ Z. In this way, the shape and volume of the "ball" B^mod_r(x) := {y ∈ X | κ(x − y mod R) ≤ r} around x ∈ X is independent of x.

Definition 5.9. Let α > 1, d ∈ N, let D be a power-law distribution on [1, ∞) with exponent τ ∈ (2, 3), and let κ : R^d → R_{≥0} be a feasible distance function. Let X ⊆ R^d be the d-dimensional cube around zero of volume n ∈ N, and let R := n^{1/d}, so that R/2 is its radius. A κ-Geometric Inhomogeneous Random Graph (κ-GIRG) G = (V, E) on n vertices is obtained by the following three-step procedure.

(a) Every vertex v draws independently a weight w_v from the distribution D.

(b) Every vertex v draws independently a uniformly random position x_v ∈ X.

(c) Every two different vertices u, v ∈ V are independently connected by an edge with probability

    p_uv := min{1, w_u w_v / Vol^κ_R(r_uv)}^α,                                   (5.17)

    where r_uv := κ(x_u − x_v mod R).

The only difference to the ordinary GIRG definition is that we have replaced the factor ‖x_u − x_v‖^d in the definition of p_uv by the factor Vol^κ_R(r_uv). The crucial insight is that the marginal connection probability stays the same as for GIRGs and Chung-Lu graphs. Note that we obtain the GIRG model as a special case for κ(x) := ‖x‖_∞.¹³

¹³ We are very slightly cheating here. We would need Vol^κ_R(r) = r^d for κ = ‖.‖_∞ to obtain the GIRG model. This is true for small r, but fails to be true if r > R/2 due to boundary effects. But the difference is negligible.

Lemma 5.10. For any feasible distance function κ, consider a κ-GIRG G = (V, E) and let u, v ∈ V. Assume we have drawn w_u, w_v and x_u ∈ X, but that x_v is still random. Then the probability that u and v are connected is

    Pr[u ∼ v | w_u, w_v, x_u] = Θ(min{1, w_u w_v / n}).

Proof. We will show that the marginal probability is the same as in the GIRG model. Let 0 ≤ y ≤ n, and let us study q_y := Pr[Vol^κ_R(r_uv) ≤ y | w_u, w_v, x_u]. Since the function Vol(r) = Vol^κ_R(r) is surjective by Definition 5.8, there exists r_y such that Vol(r_y) = y. Let us momentarily assume that Vol(r) is even bijective. Then we may compute

    Pr[Vol^κ_R(r_uv) ≤ y] = Pr[r_uv ≤ r_y] = Pr[x_v ∈ B^mod_{r_y}(x_u)]           (5.18)
                          = Vol(B^mod_{r_y}(x_u)) / Vol(X) = Vol(B^mod_{r_y}(0)) / Vol(X) = y/n.

Note that this expression is independent of κ. Hence, for any fixed w_u, w_v, x_u, the denominator in (5.17) is a random variable whose cumulative distribution function is independent of κ. Thus, the distribution of p_uv is independent of κ, and in particular Pr[u ∼ v | w_u, w_v, x_u] = E[p_uv | w_u, w_v, x_u] is independent of κ. Thus it must be identical to the GIRG model, which we obtain as the special case κ(x) := ‖x‖_∞.
We have used the assumption that Vol(r) is bijective. That was not necessary, but it avoids technical difficulties. Without it, one would need to define r_y := sup{r | Vol(r) = y}, since otherwise the first step in (5.18) could fail. We omit the details.
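As a quick illustration of (5.18), the following Monte Carlo sketch (assuming NumPy; all parameter values are arbitrary) checks the special case κ = ‖.‖_∞ on the torus: for a uniformly random position x_v, the random variable Vol^κ_R(r_uv) is indeed approximately uniform on [0, n].

    import numpy as np

    rng = np.random.default_rng(1)
    d, n = 2, 10_000
    R = n ** (1 / d)
    x_u = rng.random(d) * R
    vols = []
    for _ in range(20_000):
        x_v = rng.random(d) * R
        diff = np.abs(x_u - x_v)
        diff = np.minimum(diff, R - diff)     # componentwise torus distance
        r = diff.max()                        # kappa(x_u - x_v) for kappa = ||.||_inf
        vols.append((2 * r) ** d)             # Vol_R^kappa(r): a cube of side length 2r
    vols = np.array(vols)
    for frac in (0.25, 0.5, 0.75):
        print(f"Pr[Vol <= {frac}*n] = {np.mean(vols <= frac * n):.3f}")

The three printed probabilities come out close to 0.25, 0.5 and 0.75, as predicted by the proof.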

Lemma 5.10 has vast consequences. It implies that all arguments that are based on the marginal connection probabilities remain true. In particular, all results from Corollary 5.3 directly transfer to arbitrary κ-GIRGs, including E[deg(v)] = Θ(w_v), the degree distribution in the neighbourhood, and the existence of a giant component with the same typical distances. Moreover, Lemma 5.4, which counted the number of strong and weak neighbours in distance at least r (or in [r, 2r]), also remains true. In particular, most neighbours of v are in or almost in the ball of influence I(v), and in distance r > r_I(v) there are more weak neighbours for α < τ − 1, and more strong or almost strong neighbours for α > τ − 1. The result for bidirectional search also relies only on the marginal connection probabilities, and thus carries over as well.
5.5.1 The minimum component distance
One of the most intriguing examples of a distance function κ is the minimum component distance. (Not to be confused with the maximum component distance, which is just the good old ∞-norm we know from analysis.) For d ≥ 2 and a vector x = (x_1, ..., x_d) ∈ R^d, we define κ(x) := ‖x‖_min := min{|x_i| | 1 ≤ i ≤ d}. Note that this is not a norm, and it does not even satisfy the triangle inequality. For the vectors x := (0, 0), y := (1, 0), z := (0, 1), the vector x has distance zero from both y and z. But this does not imply that y and z also have distance zero from each other, or even "small" distance. There are also some other peculiarities. We have some non-zero vectors x with ‖x‖_min = 0. The scaling is also different from what we know from norms and metrics. The r-neighbourhood of 0 consists of the union of d "thickened" hyperplanes, each defined by −r ≤ x_i ≤ r for one coordinate i. Since each of these thickened hyperplanes has volume 2r · R^{d−1}, we generally have Vol^κ_R(r) = Θ(R^{d−1} · r). Despite all these quirks, it is easy to check that the minimum component distance is a feasible distance function.
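For this particular κ the truncated ball volume can even be written in closed form: the complement of the r-neighbourhood inside the cube is the box in which every coordinate exceeds r in absolute value, so Vol^κ_R(r) = R^d − (R − 2r)^d for r ≤ R/2. The following naive O(n²) sampler (a sketch assuming NumPy; the Pareto weight sampler and all names are illustrative choices, not prescribed by the text) generates a ‖.‖_min-GIRG according to Definition 5.9.

    import numpy as np

    def sample_min_girg(n, d=2, tau=2.5, alpha=1.5, seed=0):
        rng = np.random.default_rng(seed)
        R = n ** (1 / d)
        w = (1 - rng.random(n)) ** (-1 / (tau - 1))   # power-law weights with exponent tau
        x = rng.random((n, d)) * R                    # uniform positions in the cube
        edges = []
        for u in range(n):
            for v in range(u + 1, n):
                diff = np.abs(x[u] - x[v])
                diff = np.minimum(diff, R - diff)     # componentwise torus distance
                r = diff.min()                        # minimum component distance
                vol = R**d - (R - 2 * r) ** d         # Vol_R^kappa(r) for kappa = ||.||_min
                p = 1.0 if vol == 0 else min(1.0, w[u] * w[v] / vol) ** alpha
                if rng.random() < p:
                    edges.append((u, v))
        return w, x, edges

    w, x, edges = sample_min_girg(500)
    print(len(edges), "edges, average degree", 2 * len(edges) / 500)

The quadratic loop is of course only suitable for small instances; it is meant to make the connection rule (5.17) explicit, not to be efficient.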
For a κ-GIRG, this means that two nodes are considered to be close if
they agree in at least one coordinate. Intuitively, this makes sense for social
networks: most of your acquaintances probably share at least one aspect
with you: you may know colleagues taking the same lectures, comrades
from your sport team or some other hobby, your family members, and your
neighbours. But typically, your acquaintances do not share all of your
aspects with you. Few (if any) of them will be a family member and study
the same subject and play in the same sport team and live in the same
house.

Clustering and the probabilistic triangle inequality

In GIRGs, the triangle inequality was the ultimate reason for the large clustering coefficient. Let us briefly recall the argument. A typical vertex u has weight O(1). A typical random neighbour of u also has weight O(1) and is in distance O(1) from u. Hence, if v_1 and v_2 are two random neighbours of u, they typically both have distance O(1) from u, and then by the triangle inequality they also have distance O(1) from each other. Thus they have connection probability Ω(1), which leads to a clustering coefficient of Ω(1).
Since the minimum component distance does not satisfy the triangle inequality, it may seem that it leads to a low clustering coefficient. But that is not so. The minimum component distance still satisfies the following relaxed version of the triangle inequality.

Definition 5.11. Let κ : R^d → R_{≥0} be a feasible distance function. We say that κ satisfies a stochastic triangle inequality if there are c, ε_0, R_0 > 0 such that the following holds for all 0 < ε ≤ ε_0 and all R ≥ R_0.
Let X := B^∞_{R/2}(0), and let x, y be drawn independently and uniformly at random from B^κ_ε(0) ∩ X. Then Pr[κ(x − y) ≤ 2ε] ≥ c.

For the usual triangle inequality, we would require that κ(x − y) ≤ 2ε holds for all x, y ∈ B^κ_ε(0). Here, we only require that this holds with probability Ω(1) when we draw x and y uniformly at random from B^κ_ε(0) ∩ X. We need to restrict ourselves to the compact set X, because in general we cannot draw uniformly at random from unbounded sets. Let us show that the minimum component distance satisfies this condition. For κ := ‖.‖_min, if x ∈ B^κ_ε(0) then we must have |x_i| ≤ ε for at least one coordinate i ∈ [d]. Any y ∈ B^κ_ε(0) must also satisfy |y_j| ≤ ε for at least one coordinate j ∈ [d]. Since κ is symmetric with respect to permutations of the axes, for a random y we have p_y := Pr[|y_1| ≤ ε] = Pr[|y_2| ≤ ε] = ... = Pr[|y_d| ≤ ε]. By a union bound, we have 1 = Pr[∃j : |y_j| ≤ ε] ≤ d · p_y. Hence, p_y ≥ 1/d. In particular, Pr[|y_i| ≤ ε] = p_y ≥ 1/d for the coordinate i in which x is small. Hence, x and y are close to zero in the same coordinate i with probability at least 1/d. If this happens, then |x_i − y_i| ≤ 2ε by the ordinary triangle inequality on R, and hence κ(x − y) ≤ 2ε. Thus κ = ‖.‖_min satisfies the condition of Definition 5.11 with c = 1/d (and arbitrary ε_0 and R_0).
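The bound c = 1/d can also be checked empirically. The following Monte Carlo sketch (assuming NumPy; all parameter values are arbitrary choices) samples x, y uniformly from B^κ_ε(0) ∩ X by rejection and estimates Pr[κ(x − y) ≤ 2ε] for κ = ‖.‖_min.

    import numpy as np

    rng = np.random.default_rng(2)
    d, R, eps, trials = 2, 100.0, 2.0, 20_000

    def sample_in_ball():
        # rejection sampling: uniform on the cube [-R/2, R/2]^d, conditioned on ||.||_min <= eps
        while True:
            z = (rng.random(d) - 0.5) * R
            if np.min(np.abs(z)) <= eps:
                return z

    close = 0
    for _ in range(trials):
        x, y = sample_in_ball(), sample_in_ball()
        if np.min(np.abs(x - y)) <= 2 * eps:
            close += 1
    print("estimate:", close / trials, " lower bound 1/d:", 1 / d)

The estimate comes out slightly above 1/d, because the two points may also happen to be close in a coordinate in which neither of them is close to zero.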
The stochastic triangle inequality implies a large clustering coefficient,
as the following theorem shows.

Theorem 5.12. Let κ be a feasible distance function that satisfies a stochastic triangle inequality. Then the clustering coefficient of a κ-GIRG G satisfies CC(G) = Ω(1) in expectation and with high probability.

Proof. The proof is similar to the proof for GIRGs in Corollary 5.5, so we only stress the main difference. It still suffices to show that E[CC(v)] = Ω(1) for vertices v with weight w_v ∈ [1, 2] and deg(v) ≥ 2. If we choose two random neighbours u_1, u_2 of v, then with constant probability they both lie in the ball of influence I(v) of v. Since every vertex in I(v) connects to v, the positions of u_1 and u_2 are uniformly at random in I(v). By the stochastic triangle inequality, with probability Ω(1), u_1 and u_2 have distance at most 2r_I(v), which implies that they connect with probability Ω(1). This yields E[CC(v)] = Ω(1), as required.

So quite a lot of the properties of GIRGs carry over to κ-GIRGs, especially if κ satisfies a stochastic triangle inequality. One thing that does change is the community structure. It is still true that we can find relatively dense subgraphs. For example, consider the thickened hyperplane H of width 2r, defined by |x_1| ≤ r. For small r, vertices in H are all relatively close to each other, so there are relatively many edges and cycles between them. If we take a connected component, then we find by a constant factor more edges than in a connected subgraph of an Erdős-Rényi graph, so we may still call it a community. However, every vertex in H has only about a 1/d-fraction of its ball of influence in H, and therefore the number of edges going out of the community is comparable to or larger than the number of edges inside the community. Recall that the situation for GIRGs was very different: in Theorem 5.6 we found communities which had asymptotically fewer outgoing edges than internal edges.
In fact, it can be shown that this does not just hold for the hyperplane H, but that there are in general no small separators in ‖.‖_min-GIRGs. For example, as for Erdős-Rényi graphs, whenever we partition the giant component of a ‖.‖_min-GIRG into two sets of size at least εn each, then there are at least δn edges between the partite sets, for some δ that depends on ε [LT17]. Again, this is different for GIRGs by Theorem 5.6.
Let us conclude this chapter with a final qualitative difference between GIRGs and ‖.‖_min-GIRGs. Consider a vertex v in a ‖.‖_min-GIRG, and let us pick two different neighbours u, u' of v. We have learned that with constant probability u and u' are close to v in the same coordinate, and thus also close to each other. Let us now focus on a different case, namely that u and u' are close to v (say, in its ball of influence I(v)), but in different coordinates. For simplicity, let us assume that those are the first and the second coordinate, i.e., |v_1 − u_1| ≤ r_I(v) and |v_2 − u'_2| ≤ r_I(v). Then it is not hard to see that the other coordinates u_2, ..., u_d of u are essentially distributed uniformly at random in the possible range [−R/2, R/2] for R := n^{1/d}.¹⁴ Likewise, the coordinates u'_1, u'_3, u'_4, ..., u'_d of u' are uniformly at random. Therefore, the vector u − u' mod R is essentially a uniformly random vector in [0, R]^d. Now consider the shortest path between u and u' in G[V \ {v}], i.e., after removing v from the graph. Since u − u' mod R is random, we are simply left with two vertices u, u' with a random distance vector from each other, and with high probability the shortest path between them has length ((2 ± o(1))/|log(τ − 2)|) · log log n, as for every other random pair of vertices in the graph. Thus in the ‖.‖_min-GIRG model, if we pick two random neighbours u, u' of a vertex v, then they have probability Ω(1) of having a large graph distance in G[V \ {v}].

¹⁴ The distribution is not perfectly uniform because there is a non-empty overlap between the d hyperplanes whose union is I(v). But the overlap is negligibly small if w_v is small.


Note that this is very different from GIRGs. A typical vertex v may have a few weak outgoing edges, but even those tend to be short. It is very unlikely that v has a weak neighbour on the other side of X. Hence, in GIRGs typically any two neighbours u and u' of v are geometrically rather close to each other. They may not be direct neighbours, but the shortest path between them is typically much shorter than ((2 − o(1))/|log(τ − 2)|) · log log n.
How does this compare to real social networks? We do not really know the answer, but it seems to make sense to some degree. Think of a fellow student v who has come to ETH from abroad. If you pick a family member u of v, do you expect to find an untypically short path from yourself to u in the friendship network without going through v? It seems plausible that the answer is no. (Of course, all paths are pretty short in social networks. We mean paths that are shorter than the typical distance.) One interpretation is that any person belongs to several social circles or communities, and there is not necessarily any other connection between these communities. This corresponds nicely to ‖.‖_min-GIRGs, where the d dimensions of a vertex v define communities which have nothing in common except v. On the other hand, ‖.‖_min-GIRGs also don't seem to capture the whole truth. Your neighbours and your fellow students probably belong to different social circles which have little to do with each other (unless you live in a student hostel). But they all live in Zürich. So they are maybe as far from each other as random Zürich citizens, but probably less far than random people on earth. Relatedly, from a global perspective, there are certainly small separators: the number of acquaintanceships within Europe is certainly much higher than the number of acquaintanceships between Europeans and the rest of the world. So geometry seems to play a stronger role than in ‖.‖_min-GIRGs, but a weaker role than in GIRGs. There is still a lot to be learned about real-world networks, and a lot to improve about our random network models.
Bibliography

[AS16] Noga Alon and Joel H Spencer. The probabilistic method.


John Wiley & Sons, 2016.

[BA99] Albert-László Barabási and Réka Albert. Emergence of scal-


ing in random networks. science, 286(5439):509–512, 1999.

[BB03] Albert-László Barabási and Eric Bonabeau. Scale-free net-


works. Scientific american, 288(5):60–69, 2003.

[BFF+ 22] Thomas Bläsius, Cedric Freiberger, Tobias Friedrich, Maxi-


milian Katzmann, Felix Montenegro-Retana, and Marianne
Thieffry. Efficient shortest paths in scale-free networks with
underlying hyperbolic geometry. ACM Transactions on Al-
gorithms (TALG), 18(2):1–32, 2022.

[BFK16] Thomas Bläsius, Tobias Friedrich, and Anton Krohmer. Hy-


perbolic random graphs: Separators and treewidth. In
24th Annual European Symposium on Algorithms (ESA
2016). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik,
2016.

[BKL19] Karl Bringmann, Ralph Keusch, and Johannes Lengler. Ge-


ometric inhomogeneous random graphs. Theoretical Com-
puter Science, 760:35–54, 2019.

[BKL+ 22] Karl Bringmann, Ralph Keusch, Johannes Lengler, Yannic


Maus, and Anisur R Molla. Greedy routing and the algorith-
mic small-world phenomenon. Journal of Computer and
System Sciences, 125:59–105, 2022.

[BN19] Michele Borassi and Emanuele Natale. Kadabra is an adap-


tive algorithm for betweenness via random approximation.


Journal of Experimental Algorithmics (JEA), 24:1–35,


2019.

[Cen10] Damon Centola. The spread of behavior in an online social


network experiment. science, 329(5996):1194–1197, 2010.

[CL02] Fan Chung and Linyuan Lu. The average distances in ran-
dom graphs with given expected degrees. Proceedings of the
National Academy of Sciences, 99(25):15879–15882, 2002.

[Dur07] Richard Durrett. Random graph dynamics, volume 200.


Citeseer, 2007.

[EDF+ 16] Sergey Edunov, Carlos Diuk, Ismail Onur Filiz, Smriti Bha-
gat, and Moira Burke. Three and a half degrees of separation.
Research at Facebook, 694, 2016.

[Fel91] Scott L Feld. Why your friends have more friends than you
do. American journal of sociology, 96(6):1464–1477, 1991.

[Gra73] Mark S Granovetter. The strength of weak ties. American


journal of sociology, 78(6):1360–1380, 1973.

[Kle00] Jon M Kleinberg. Navigation in a small world. Nature,


406(6798):845–845, 2000.

[KPK+ 10] Dmitri Krioukov, Fragkiskos Papadopoulos, Maksim Kitsak,


Amin Vahdat, and Marián Boguná. Hyperbolic geometry of
complex networks. Physical Review E, 82(3):036106, 2010.

[LEA+ 01] Fredrik Liljeros, Christofer R Edling, Luis A Nunes Amaral,


H Eugene Stanley, and Yvonne Åberg. The web of human
sexual contacts. Nature, 411(6840):907–908, 2001.

[LM01] Malwina J Luczak and Colin McDiarmid. Bisecting sparse


random graphs. Random Structures & Algorithms,
18(1):31–38, 2001.

[LNR17] Vito Latora, Vincenzo Nicosia, and Giovanni Russo. Com-


plex networks: principles, methods and applications.
Cambridge University Press, 2017.

[LT17] Johannes Lengler and Lazar Todorovic. Existence of small


separators depends on geometry for geometric inhomoge-
neous random graphs. arXiv preprint arXiv:1711.03814,
2017.

[MR95] Rajeev Motwani and Prabhakar Raghavan. Randomized al-


gorithms. Cambridge university press, 1995.

[MU17] Michael Mitzenmacher and Eli Upfal. Probability and com-


puting: Randomization and probabilistic techniques in al-
gorithms and data analysis. Cambridge university press,
2017.

[New03] Mark EJ Newman. The structure and function of complex


networks. SIAM review, 45(2):167–256, 2003.

[UKBM11] Johan Ugander, Brian Karrer, Lars Backstrom, and Cameron


Marlow. The anatomy of the facebook social graph. arXiv
preprint arXiv:1111.4503, 2011.

[VCP+ 11] Lav R Varshney, Beth L Chen, Eric Paniagua, David H


Hall, and Dmitri B Chklovskii. Structural properties of the
caenorhabditis elegans neuronal network. PLoS computa-
tional biology, 7(2):e1001066, 2011.

[VDH09] Remco van der Hofstad. Random graphs and complex networks. Available at http://www.win.tue.nl/~rhofstad/NotesRGCN.pdf, 11:60, 2009.

[vdHHVM05] Remco van der Hofstad, Gerard Hooghiemstra, and Piet


Van Mieghem. Distances in random graphs with finite vari-
ance degrees. Random Structures & Algorithms, 27(1):76–
123, 2005.

[WS98] Duncan J Watts and Steven H Strogatz. Collective dynamics


of ‘small-world’ networks. nature, 393(6684):440–442,
1998.
