Large Networks and Graph Limits
László Lovász
Institute of Mathematics, Eötvös Loránd University, Budapest,
Hungary
2010 Mathematics Subject Classification. Primary 05C99, Secondary 05C25,
05C35, 05C80, 05C82, 05C85, 90B15
Key words and phrases. graph homomorphism, graph algebra, graph limit,
graphon, graphing, property testing, regularity lemma
To Kati
as all my books
Preface
Another important connection that was soon discovered was the theory of prop-
erty testing in computer science, initiated by Goldreich, Goldwasser and Ron sev-
eral years earlier. This can be viewed as statistics done on graphs rather than on
numbers, and probability and statistics became a major tool for us.
One of the most important application areas of these results is extremal graph
theory. A fundamental tool in the extremal theory of dense graphs is Szemerédi’s
Regularity Lemma, and this lemma turned out to be crucial for us as well. Graph
limit theory, we hope, repaid some of this debt, by providing the shortest and
most general formulation of the Regularity Lemma (“compactness of the graphon
space”). Perhaps the most exciting consequence of the new theory is that it allows
the precise formulation of, and often the exact answer to, some very general ques-
tions concerning algorithms on large graphs and extremal graph theory. Indepen-
dently and about the same time as we did, Razborov developed the closely related
theory of flag algebras, which has led to the solution of several long-standing open
problems in extremal graph theory.
Speaking about limits means, of course, analysis, and for some of us graph the-
orists, it meant hard work learning the necessary analytical tools (mostly measure
theory and functional analysis, but even a bit of differential equations). Involving
analysis has advantages even for some of the results that can be stated and proved
purely graph-theoretically: many definitions and proofs are shorter, more trans-
parent in the analytic language. Of course, combinatorial difficulties don’t just
disappear: sometimes they are replaced by analytic difficulties. Several of these
are of a technical nature: Are the sets we consider Lebesgue/Borel measurable? In
a definition involving an infimum, is it attained? Often this is not really relevant
for the development of the theory. Quite often, on the other hand, measurability
carries combinatorial meaning, which makes this relationship truly exciting.
There were some interesting connections with algebra too. Balázs Szegedy
solved a problem that arose as a dual to the characterization of homomorphism
functions, and through his proof he established, among others, a deep connection
with the representation theory of algebras. This connection was later further de-
veloped by Schrijver and others. Another one of these generalizations has led to
a combinatorial theory of categories, which, apart from some sporadic results, has
not been studied before. The limit theory of bounded degree graphs also found very
strong connections to algebra: finitely generated infinite groups yield, through their
Cayley graphs, infinite bounded degree graphs, and representing these as limits of
finite graphs has been studied in group theory (under the name of sofic groups)
earlier.
These connections with very different parts of mathematics made it quite diffi-
cult to write this book in a readable form. One way out could have been to focus on
graph theory, not to talk about issues whose motivation comes from outside graph
theory, and sketch or omit proofs that rely on substantial mathematical tools from
other parts. I felt that such an approach would hide what I found the most exciting
feature of this theory, namely its rich connections with other parts of mathematics
(classical and non-classical). So I decided to explain as many of these connections
as I could fit in the book; the reader will probably skip several parts if he/she does
not like them or does not have the appropriate background, but perhaps the flavor
of these parts can be remembered.
The book has five main parts. First, an informal introduction to the math-
ematical challenges provided by large networks. We ask the “general questions”
mentioned above, and try to give an informal answer, using relatively elementary
mathematics, and motivating the need for those more advanced methods that are
developed in the rest of the book.
The second part contains an algebraic treatment of homomorphism functions
and other graph parameters. The two main algebraic constructions (connection
matrices and graph algebras) will play an important role later as well, but they
also shed some light on the seemingly completely heterogeneous set of “graph pa-
rameters”.
In the third part, which is the longest and perhaps most complete within its
own scope, the theory of convergent sequences of dense graphs is developed, and
applications to extremal graph theory and graph algorithms are given.
The fourth part contains an analogous theory of convergent sequences of graphs
with bounded degree. This theory is more difficult and less well developed than
the dense case, but it has even more important applications, not only because
most networks arising in real life applications have low density, but also because
of connections with the theory of finitely generated groups. Research on this topic
has been perhaps the most active during the last months of my work, so the topic
was a “moving target”, and it was here where I had the hardest time drawing the
line where to stop with understanding and explaining new results.
The fifth part deals with extensions. One could try to develop a limit theory
for almost any kind of finite structures. Making a somewhat arbitrary selection,
we only discuss extensions to edge-coloring models and categories, and say a few
words about hypergraphs, to much less depth than graphs are discussed in parts
III and IV.
I included an Appendix about several diverse topics that are standard mathe-
matics, but due to the broad nature of the connections of this material in mathe-
matics, few readers would be familiar with all of them.
One of the factors that contributed to the (perhaps too large) size of this book
was that I tried to work out many examples of graph parameters, graph sequences,
limit objects, etc. Some of these may be trivial for some of the readers, others may
be tough, depending on one’s background. Since this is the first monograph on the
subject, I felt that such examples would help the reader to digest this quite diverse
material.
In addition, I included quite a few exercises. It is a good trick to squeeze a
lot of material into a book through this, but (honestly) I did try to find exercises
about which I expected that, say, a graduate student of mathematics could solve
them with not too much effort.
I gratefully acknowledge the support of ERC grant No. 227701, OTKA grant
No. CNK 77780, and the hospitality of the Institute for Advanced Study in
Princeton while writing most of this book.
My wife Kati Vesztergombi has not only contributed to the content, but has
provided invaluable professional, technical and personal help all the time.
Many other colleagues have very unselfishly offered their expertise and advice
during various phases of our research and while writing this book. I am particularly
grateful to Miklós Abért, Noga Alon, Endre Csóka, Gábor Elek, Guus Regts, Svante
Janson, Dávid Kunszenti-Kovács, Gábor Lippner, Russell Lyons, Jarik Nešetřil,
Yuval Peres, Oleg Pikhurko, the late Oded Schramm, Miki Simonovits, Vera Sós,
Kevin Walker, and Dominic Welsh. Without their interest, encouragement and
help, I would not have been able to finish my work.
Part 1

1. Very Large Networks
the exact time they will need to perform some computation, are difficult to
determine from their design, due to their huge size.
• To be pretentious, we can say that the whole universe is a single (really huge,
possibly infinite) network, where the nodes are events (interactions between
elementary particles), and the edges are the particles themselves. This is a
network with perhaps 10^80 nodes. It is an ongoing debate in physics how
much additional structure the universe has, but perhaps understanding the
graph-theoretical structure of this graph can help with understanding the global
structure of the universe.
These huge networks pose exciting challenges for the mathematician. Graph
Theory (the mathematical theory of networks) has been one of the fastest develop-
ing areas of mathematics in the last decades; with the appearance of the Internet,
however, it faces fairly novel, unconventional problems. In traditional graph theo-
retical problems the whole graph is exactly given, and we are looking for relation-
ships between its parameters or efficient algorithms for computing its parameters.
On the other hand, very large networks (like the Internet) are never completely
known, in most cases they are not even well defined. Data about them can be
collected only by indirect means like random local sampling or by monitoring the
behavior of various global processes.
Dense networks (in which a node is adjacent to a positive percentage of the other nodes)
and very sparse networks (in which a node has a bounded number of neighbors)
show a very different behavior. From a practical point of view, sparse networks are
more important, but at present we have more complete theoretical results for dense
networks. In this introduction, most of the discussion will focus on dense graphs;
we will survey the additional challenges posed by sparse networks in Section 1.7.
internet “disconnected” if, say, an earthquake combined with a solar flare severs all
connections between the Old and New Worlds. So we want to ignore small com-
ponents that are negligible in comparison with the whole graph, and consider the
graph “disconnected” only if it decomposes into two parts which are commensurable
with the whole. On the other hand, we may want to allow that the two large parts
be connected by a very few edges, and still consider the graph “disconnected”.
Question 4. Where is the largest cut in the graph?
(This means to find the partition of the nodes into two classes so as to maximize
the number of edges connecting the two classes.) This example shows that even if
the question is meaningful, it is not clear in what form we can expect the answer.
We can ask for the fraction of edges contained in the largest cut (depending on the
model, this can be determined relatively easily, with an error that is small with
high probability, although it is not easy to prove that the algorithm works). But
suppose we want to “compute” the largest cut itself; how do we return the result, i.e.,
how do we specify the largest cut (or even an approximate version of it)? We cannot
just list all nodes and say on which side each belongs: this would take too much
time and memory. Is there a better way to answer the question?
Figure 1.1. Sampling from a dense graph and from a graph with
bounded degree.
• Homomorphism numbers are better behaved algebraically, and they have been
used before to study various algebraic questions concerning direct product of
graphs, like cancellation laws (see Section 5.4.2). Furthermore, a lot is known
about other issues concerning homomorphisms: existence, structure, etc.
• When looking at a (large) graph G, we may try to study its local structure by
counting homomorphisms from various “small” graphs F into G; we can also
study its global structure by counting homomorphisms from G into various
small graphs H. The first type of information is closely related (in many cases,
equivalent) to sampling, while the second is related to global observables. This
way homomorphisms are pointing at a certain duality between sampling and
global observation. We can sum up our framework for studying large graphs in
the following formula:
F −→ G −→ H.
We will informally talk about “left-homomorphisms” and “right-
homomorphisms” to refer to these two kinds of mappings.
• We will characterize which distributions come from sampling k nodes from a
(large) graph G, and we will characterize homomorphism densities as well. It
turns out that the characterization of sample distributions is simpler and more
natural; the characterization of homomorphism densities, on the other hand, is
more surprising, and therefore has more interesting applications.
others: in this case, the third would be as easy as the verification of the first, but
the second and fourth would be quite difficult: How would you count the number
of copies of, say, the Petersen graph in a Paley graph? How would you count the
number of those differences that are quadratic residues between, say, square-free
integers in {0, . . . , q − 1}? When posed directly, these questions sound formidable;
but the equivalence of the above conditions provides answers to them.
We should emphasize that in this setting, quasirandomness is a property of a
sequence of graphs, not of a single graph. Of course, one could introduce a measure
of deviation from the “ideal” quasirandomness in each of the conditions (QR1)–
(QR5), and prove explicit relationships between them. Since our interest is the
limit theory, we will not go in this direction.
Sometimes we need to consider quasirandom bipartite graphs, which can be
defined, mutatis mutandis, by any of the properties above. More generally, just
as there are multitype random graphs, there are also multitype quasirandom graph
sequences. Similarly as for random graphs, a multitype quasirandom graph sequence
(Gn ) is defined by a “template” weighted graph H on q nodes, with nodeweights
αi > 0 and edgeweights βij . The sequence is multitype quasirandom with template
H, if the node set V (Gn ) can be partitioned into q sets V1 , . . . , Vq such that |Vi | ∼
αi v(Gn ), the subgraphs Gn [Vi ] induced by Vi form a quasirandom sequence for
every i ∈ [q], and the bipartite subgraphs Gn [Vi , Vj ] between Vi and Vj form a
quasirandom bipartite graph sequence for each pair i ̸= j ∈ [q].
The same remark applies as for multitype random graphs: they play an ex-
tremely important role by serving as simple objects approximating arbitrarily large
graphs. The equivalence of conditions (QR1)–(QR5) can be generalized appropriately
(with a larger, but finite set of graphs in (QR3) instead of just 2), as will be
discussed in Section 16.7.1.
The main topic of the book, the theory of convergent graph sequences, can
be considered as a further, rather far-reaching generalization of quasirandom se-
quences.
1.4.3. Randomly growing graphs. Random graph models on a fixed set of
nodes, discussed above, fail to reproduce important properties of real-life networks.
For example, the degrees of Erdős–Rényi random graphs follow a binomial distribu-
tion, and so they are asymptotically normal if the edge probability p is a constant,
and asymptotically Poisson if the expected degree is constant (i.e., p = p(n) ∼ c/n).
In either case, the degrees are highly concentrated around the mean, while the de-
grees of real life networks tend to obey the “Zipf phenomenon”, which means that
the tail of the distribution decreases according to a power law (unlike the most
familiar distributions like Gaussian, geometric or Poisson, whose tail probability
drops exponentially; Figure 1.2).
In 1999 Albert and Barabási [1999, 2002] created a new random network
model. Perhaps the main new feature compared with the Erdős–Rényi graph evo-
lution model is that not only edges, but also nodes are added by natural rules of
growing. When a new node is added, it connects itself to a given number d of old
nodes, where each neighbor is selected randomly, with probability proportional to
its degree. (This random selection is called preferential attachment.) The Albert–
Barabási graphs reproduce the “heavy tail” behavior of the degree sequences of
real-life graphs. Since then a great variety of growing networks were introduced,
reproducing this and other empirical properties of real-life networks.
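The preferential attachment rule is easy to simulate. The sketch below is an illustration, not the exact model of any particular paper: the seed graph (a small clique) and the tie-breaking are choices of ours, and all names are hypothetical. The trick is to keep a list of edge endpoints, so that a uniform draw from the list selects a node with probability proportional to its degree.

```python
import random
from collections import Counter

def preferential_attachment(n, d, seed=0):
    """Grow a graph: each new node attaches to d distinct old nodes,
    chosen with probability proportional to their current degree."""
    rng = random.Random(seed)
    # start from a clique on d + 1 nodes so every node has positive degree
    edges = [(i, j) for j in range(d + 1) for i in range(j)]
    # each node occurs in this list once per incident edge, so a uniform
    # draw from it is exactly degree-proportional selection
    endpoints = [v for e in edges for v in e]
    for v in range(d + 1, n):
        targets = set()
        while len(targets) < d:
            targets.add(rng.choice(endpoints))
        for u in targets:
            edges.append((u, v))
            endpoints += [u, v]
    return edges

degrees = Counter(v for e in preferential_attachment(2000, 3, seed=1) for v in e)
```

The resulting degree sequence is heavily skewed: a few early nodes collect a large share of the edges, in contrast with the concentrated binomial degrees of the Erdős–Rényi model.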
1.5. How to approximate them?
This is perhaps the first point which suggests one of our main tools, namely
assigning limits to sequences of graphs. Just as the Law of Large Numbers tells us
that adding up more and more independent random variables we get an increasingly
deterministically behaving number, these growing graph sequences tend to have
a well-defined structure, for almost all of the possible random choices along the
way. In the limit, the randomness disappears, and the asymptotic behavior of the
sequence can be described by a well-defined limit object. We will return to this in
this Introduction in Sections 1.5.3 and 11.3.
Exercise 1.2. Prove that the sequence of Paley graphs is quasirandom.
1.5.1. The distance of two graphs. There are many ways of defining the
distance of two graphs G and G′ . Suppose that the two graphs have a common
node set [n]. Then a natural notion of distance is the edit distance, defined as the
number of edges to be changed to get from one graph to the other. This could also
be viewed as the Hamming distance |E(G)△E(G′ )| of the edge sets (△ denotes
symmetric difference). Since our graphs are very large, we want to normalize this.
If the graphs are dense, then a natural normalization is

d1 (G, G′ ) = |E(G)△E(G′ )| / n².
While this distance plays an important role in the study of testable graph properties,
it does not reflect structural similarity well. To raise one objection, consider two
random graphs on [n] with edge density 1/2. As mentioned in the introduction,
these graphs are very similar from almost every aspect, but their normalized edit
distance is large (about 1/4 with high probability). One might try to decrease this
by relabeling one of them to get the best overlay minimizing the edit distance; but
the improvement would be marginal (tending to 0 if n tends to infinity).
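This objection is easy to check numerically (a sketch; the helper names are ours). For two independent random graphs with edge density 1/2, each of the n(n−1)/2 node pairs lies in the symmetric difference with probability 1/2, so under the n² normalization d1 concentrates near 1/4.

```python
import random

def random_edge_set(n, p, seed):
    """Erdős–Rényi G(n, p) as a set of pairs (i, j), i < j."""
    rng = random.Random(seed)
    return {(i, j) for j in range(n) for i in range(j) if rng.random() < p}

def d1(E1, E2, n):
    """Normalized edit distance |E(G) △ E(G')| / n^2."""
    return len(E1 ^ E2) / n ** 2

n = 500
E1 = random_edge_set(n, 0.5, seed=1)
E2 = random_edge_set(n, 0.5, seed=2)
# each of the n(n-1)/2 pairs differs with probability 1/2,
# so d1 comes out close to (1/2) * (n(n-1)/2) / n^2, i.e. about 1/4
```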
Another trouble with the notion of edit distance is that it is defined only when
the two graphs have the same set of nodes. We want to define a notion of distance
for two graphs that are so large that we don’t even know the number of their nodes,
and these numbers might be very different. For example, we want to find that two
large random graphs are “close” even if they have a different number of nodes.
One useful way to overcome these difficulties is to base the measurement of
distance on sampling. Recall that for a graph G, σG,k is the probability distribution
on graphs on [k] = {1, 2, . . . , k} obtained by selecting a random ordered k-subset
of nodes and taking the subgraph induced by them. Strictly speaking, this is only
defined when k ≤ v(G); but we are interested in taking a small sample from a large
graph, not the other way around. To make the definition precise, let us say that
the sampling returns the edgeless k-node graph if k > v(G). (In this case it would
be a better solution to sample with repetition, but sampling without repetition is
better in other cases, so let us stick to it.)
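In code, one draw from σG,k looks like the following sketch (function names are ours). The sample is an ordered k-subset without repetition, the induced subgraph is relabeled to {0, . . . , k−1}, and, following the convention just introduced, an edgeless k-node graph is returned when k exceeds the number of nodes.

```python
import random

def sample_sigma(edges, n, k, rng):
    """One draw from the sample distribution sigma_{G,k}."""
    if k > n:
        return frozenset()  # convention: edgeless graph on k nodes
    order = rng.sample(range(n), k)        # random ordered k-subset
    pos = {v: i for i, v in enumerate(order)}
    return frozenset(
        (min(pos[a], pos[b]), max(pos[a], pos[b]))
        for a, b in edges
        if a in pos and b in pos
    )

# from a complete graph, every 3-node sample induces a triangle
K5 = {(i, j) for j in range(5) for i in range(j)}
rng = random.Random(0)
```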
Now if we have two graphs G and G′ , we can compare the distributions of k-
node samples for any fixed k. We use the variation distance between distributions
α and β on the same set, defined by

dvar (α, β) = sup_X |α(X) − β(X)|,

where the supremum is taken over all measurable subsets X (observable events). If
we want to measure the distance of two graphs by a single number, we use a simple
trick known from analysis: We define the sampling distance of two dense graphs G
and G′ by
(1.2) δsamp (G, G′ ) = ∑_{k=1}^∞ 2^{−k} dvar (σG,k , σG′ ,k ).

(Here the coefficients 1/2^k are quite arbitrary; they are there only to make the sum
convergent, but the above is a convenient choice.) This distance notion is very
suitable for our general goals, since two graphs are close in this distance if and
only if random sampling of “small” induced subgraphs does not distinguish them
reliably. However, sampling distance has one drawback: it does not directly reflect
any structural similarity.
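A truncated version of (1.2) can be estimated empirically (a sketch under assumptions of ours: for small k the distribution σG,k over labeled k-node graphs is approximated by repeated sampling, and dvar of two finitely supported distributions equals half their ℓ1-distance).

```python
import random
from collections import Counter

def induced(edges, order):
    """Induced subgraph on an ordered node sample, relabeled to 0..k-1."""
    pos = {v: i for i, v in enumerate(order)}
    return frozenset((min(pos[a], pos[b]), max(pos[a], pos[b]))
                     for a, b in edges if a in pos and b in pos)

def empirical_sigma(edges, n, k, trials, rng):
    """Empirical approximation of sigma_{G,k}."""
    counts = Counter(induced(edges, rng.sample(range(n), k))
                     for _ in range(trials))
    return {g: c / trials for g, c in counts.items()}

def dvar(alpha, beta):
    # total variation = half the l1 distance for finitely supported distributions
    return 0.5 * sum(abs(alpha.get(g, 0.0) - beta.get(g, 0.0))
                     for g in set(alpha) | set(beta))

def sampling_distance(E1, n1, E2, n2, kmax=3, trials=2000, seed=0):
    """Estimate of (1.2), truncated at k = kmax."""
    rng = random.Random(seed)
    return sum(2.0 ** -k * dvar(empirical_sigma(E1, n1, k, trials, rng),
                                empirical_sigma(E2, n2, k, trials, rng))
               for k in range(1, kmax + 1))
```

For a complete versus an edgeless graph, the k = 2 and k = 3 terms each contribute their full weight, giving 1/4 + 1/8 = 0.375; two large random graphs of the same density come out close to 0.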
In Chapter 8 we will define a notion of distance, called cut distance, between
graphs, which will be satisfactory from all these points of view: it will be defined
for two graphs with possibly different number of nodes, the distance of two random
graphs with the same edge density will be very small, and it will reflect global
structural similarity. The definition involves too many technical details to be given
here, unfortunately. But it will turn out (and this is one of the main results in this
book) that the cut distance is equivalent to the sampling distance in a topological
sense.
0 1 0 0 1 1 0 0 0 0
1 0 1 0 0 0 1 0 0 0
0 1 0 1 0 0 0 1 0 0
0 0 1 0 1 0 0 0 1 0
1 0 0 1 0 0 0 0 0 1
1 0 0 0 0 0 0 1 1 0
0 1 0 0 0 0 0 0 1 1
0 0 1 0 0 1 0 0 0 1
0 0 0 1 0 1 1 0 0 0
0 0 0 0 1 0 1 1 0 0
Figure 1.3. The Petersen graph, its adjacency matrix, and its
pixel picture
It is not clear that this pixel picture reveals more about small graphs than
the usual way of drawing them (probably less), but it can be suggestive for large
graphs. Figure 1.4 shows the usual drawing and the pixel picture of a half-graph,
a bipartite graph defined on the set {1, . . . , n, 1′ , . . . , n′ }, where the edges are the
pairs (i, j ′ ) with i ≤ j. For large n, the pixel picture of a half-graph may be more
informative, as we will see in the next section.
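Pixel pictures are easy to generate (a sketch; the function names are ours). For the half-graph, listing the unprimed nodes first and the primed ones after them produces the staircase pattern of Figure 1.4:

```python
def half_graph(n):
    """Adjacency matrix of the half-graph on {1,...,n} + {1',...,n'}:
    node i is joined to node j' iff i <= j.  Indices 0..n-1 stand for
    the unprimed nodes, n..2n-1 for the primed ones."""
    A = [[0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(i, n):          # i <= j
            A[i][n + j] = A[n + j][i] = 1
    return A

def pixel_picture(A):
    """Render a 0-1 matrix with one character per pixel."""
    return "\n".join("".join("#" if a else "." for a in row) for row in A)

print(pixel_picture(half_graph(4)))
```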
The left square in Figure 1.5 is the pixel picture of a (reasonably large) random
graph. We don’t see much structure—and we shouldn’t. From a distance, this
picture is more-or-less uniformly grey, similar to the second square. The 100 × 100
chessboard in the third picture is also uniformly grey, or at least it would become so
if we increased the number of pixels sufficiently. One might think that it represents
a graph that is close to the random graph. But rearranging the rows and columns
so that odd indexed columns come first, we get the 2 × 2 chessboard on the right!
So we see that both the middle and the right side pictures represent a complete
bipartite graph. The pixel picture of a graph depends on the ordering of the nodes.
We can be reassured, however, that a random graph remains random, no matter
how we order the nodes, and so the picture on the left remains uniformly grey, no
matter how the nodes are ordered.
Figure 1.5. A random graph with 100 nodes and edge density
1/2, a random graph with very many nodes and edge density 1/2,
a chessboard, and the pixel picture obtained by rearranging the
rows and columns.
is quite random-like, and further rearrangement would not reveal any additional
structure.
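The rearrangement trick is worth seeing in code (a minimal sketch): permuting rows and columns so that the odd-indexed ones come first turns the fine chessboard into the 2 × 2 block picture of a complete bipartite graph.

```python
n = 100
# fine chessboard: pixel (i, j) is black iff i + j is odd,
# i.e. the complete bipartite graph between odd and even nodes
chess = [[(i + j) % 2 for j in range(n)] for i in range(n)]

# put the odd-indexed rows/columns first, then the even-indexed ones
perm = [i for i in range(n) if i % 2 == 1] + [i for i in range(n) if i % 2 == 0]
rearranged = [[chess[perm[i]][perm[j]] for j in range(n)] for i in range(n)]
```

Nodes of equal parity are never adjacent, so the two diagonal blocks come out all white and the two off-diagonal blocks all black: the same graph, two very different pixel pictures.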
Still informally, the Regularity Lemma says the following:
The nodes of every graph can be partitioned into a “small” number of “almost
equal” parts in such a way that for “almost all” pairs of partition classes, the
bipartite graph between them is “quasirandom”.
Some of the expressions in quotation marks are easy to explain. For the whole
theorem, we have an error bound 0 < ε < 1 specified in advance. The condition
that the parts are “almost equal” means that their sizes differ by at most one:
if the graph has n nodes partitioned into k classes, then the size of each class is
either ⌊n/k⌋ or ⌈n/k⌉. The condition that the number of classes is “small” means
that it can be bounded by an explicit function f (ε) of ε; to exclude trivialities,
we also assume that k ≥ 1/ε. “Almost all” pairs of classes means that we allow
ε·k(k − 1)/2 exceptional pairs about which we don’t claim anything (we can include the
subgraphs induced by the classes among these exceptions). Finally, we need to
define what it means to be “random-like”: one way to put it is that this bipartite
graph is quasirandom with some density pij (which may be different for different
pairs of classes) and with error ε, in the sense introduced (informally) in Section
1.4.2.
Regularity partitions and quasirandomness have a lot to do with each other.
Not only is quasirandomness part of the statement of the Regularity Lemma, but
the regularity lemma can be used to characterize quasirandomness: Simonovits and
Sós [1991] proved that a graph sequence is quasirandom with density p if and only
if the graphs have regularity partitions for arbitrarily small ε > 0 such that the
densities pij between the partition classes tend to p.
I have to come back to the “small” number of partition classes. The proof gives
a tower of twos 2^{2^{···}} of height 1/ε^5 , which is a very large number, and which unfortunately
cannot be improved too much, since Gowers [1997] constructed graphs for which
the smallest number of classes in a Szemerédi partition was at least a tower of
height log(1/ε). So the tower behavior is a sad fact of life. There are related
partitions with a more decent number of classes, as we shall see in Chapter 9,
where regularity partitions will be defined formally. We will also discuss situations
when the regularity partitions have a very decent size, like 1/ε^const (Sections 13.4
and 16.7). Implicitly or explicitly, regularity partitions will be used throughout this
book.
identically 1/2 function (have a look at the two squares on the left of Figure 1.5).
Figure 1.7 illustrates that the sequence of half-graphs (discussed in Section 1.5.2)
converges to a limit, the function W (x, y) = 1(y ≥ x + 1/2 or x ≥ y + 1/2). It
has been observed and used before (see e.g. Sidorenko [1991]) that such functions
can be used as generalizations of graphs, and this gives certain arguments a greater
analytic flexibility.
Figure 1.7. A half-graph, its pixel picture, and the limit function
Let us describe another example here (more to follow in Section 11.4.2). The
picture on the left side of Figure 1.8 is the adjacency matrix of a graph G with 100
nodes, where the 1’s are represented by black squares and the 0’s, by white squares.
The graph itself is constructed by a simple randomized growing rule: Starting with
a single node, we create a new node, and connect every pair of nonadjacent nodes
with probability 1/n, where n is the current number of nodes. (This construction
will be discussed in detail in Section 11.4.2.)
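The growing rule just described can be sketched in a few lines (the naming is ours):

```python
import random

def grow_graph(n_final, seed=0):
    """Start with one node; repeatedly add a node, then connect every
    currently nonadjacent pair with probability 1/n (n = current size)."""
    rng = random.Random(seed)
    edges = set()
    for n in range(2, n_final + 1):    # node n-1 joins; graph now has n nodes
        for j in range(n):
            for i in range(j):
                if (i, j) not in edges and rng.random() < 1.0 / n:
                    edges.add((i, j))
    return edges

E = grow_graph(100, seed=1)
```

Pairs that appear early get many chances to become adjacent, which is why the density profile of the adjacency matrix resembles the function U (x, y) = 1 − max(x, y) shown on the right of Figure 1.8.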
The picture on the right side is a grayscale image of the function U (x, y) =
1 − max(x, y). (Recall that the origin is in the upper left corner!) The similarity
with the picture on the left is apparent, and suggests that the limit of the graph
sequence on the left is this function. This turns out to be the case in a well defined
sense. It follows that to approximately compute various parameters of the graph
on the left side, we can compute related parameters of the function on the right
side. For example, the triangle density of the graph on the left tends (as n → ∞)
to the integral

(1.3) ∫_{[0,1]^3} U (x, y) U (y, z) U (z, x) dx dy dz
(the evaluation of this integral is a boring but easy task). It is easy to see how to
generalize this formula to express the limiting density of any fixed graph F .
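For this particular U the integral can be checked numerically (a sketch; the function name is ours). Splitting the cube into the six orderings of x, y, z, the exact value works out to 1/15, which the Monte Carlo estimate approaches.

```python
import random

def triangle_integral(samples=100000, seed=0):
    """Monte Carlo estimate of (1.3) for U(x, y) = 1 - max(x, y)."""
    rng = random.Random(seed)
    U = lambda s, t: 1.0 - max(s, t)
    total = 0.0
    for _ in range(samples):
        x, y, z = rng.random(), rng.random(), rng.random()
        total += U(x, y) * U(y, z) * U(z, x)
    return total / samples
```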
We hope that the examples above provide motivation for the following fact,
which is one of the key results to be discussed in the book (Theorem 11.21):
The limit of any convergent graph sequence can be represented by a graphon,
in the sense that the limiting density of any fixed simple graph F is given by an
integral of the type (1.3).
Of course, a graphon can be infinitely complicated; but in many cases, limits
of growing graph sequences have a limit graphon that is a continuous function
described by a simple formula (see some further examples in Section 11.4.2). Such
a limit graphon provides a very useful approximation of a large dense graph.
Graphons can be considered as generalizations of graphs, and this way of look-
ing at them is very fruitful. In fact, many results can be stated and proved for
graphons in a more natural and cleaner way. In particular, regularity lemmas can
be extended to graphons, where we will see that they are statements about approx-
imating general measurable functions by stepfunctions. Approximating graphs by
multitype quasirandom graphs is as basic a tool in graph theory as approximating
functions by stepfunctions is in analysis.
Remark 1.4. Much of this book is about finite, countable and uncountable graphs
and connections between them. There are two technical limitations of measure
theory that we have to work our way around. (a) One cannot construct more than
countably many independent random variables (in a nontrivial way, none of them
concentrated on a single value). This is the reason why we cannot define a random
graph on an uncountable set like [0, 1], only on finite and countable subsets of it. (b)
There is no uniform distribution on a countable set (while there is one on every finite
set and then again on sets with continuum cardinality like [0, 1]). This limitation
is connected to the fact that the limit objects for convergent graph sequences will
be graphons (which could be considered as graphs defined on a continuum) rather
than graphs on a countable set as one would first expect.
I want to emphasize that these difficulties are not just annoying technicalities:
they reflect the fact, for example, that the limit object of a convergent graph se-
quence carries a lot more information than what could be squeezed into a countable
graph. Both measure theory and combinatorics force us into the same realm.
and count how many of them form triangles in the graph. Elementary statistics
tells us that if we sample O(ε^{−2} |log ε|) triples, then with probability at least 1 − ε,
our estimate will be closer than ε to the truth.
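The sampling estimator amounts to a few lines of code (a sketch; `adj` is an adjacency-set list, a representation of our own choosing):

```python
import random

def triangle_density_estimate(adj, n, trials, rng):
    """Fraction of sampled node triples that span a triangle."""
    hits = 0
    for _ in range(trials):
        a, b, c = rng.sample(range(n), 3)
        if b in adj[a] and c in adj[a] and c in adj[b]:
            hits += 1
    return hits / trials

# sanity check on a complete graph, where every triple is a triangle
adj = [set(range(10)) - {i} for i in range(10)]
```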
A much more interesting and difficult example is that of estimating the density
a of the maximum cut (its size divided by n(n − 1)/2) in a graph G. One thing we can
try is to choose N random nodes (where N depends on the error bound ε), and
compute the density X of the maximum cut in the subgraph H they induce. Is X
a good estimate for a?
The inequality X ≥ a − ε (for every ε > 0 if N is large enough, with high
probability) is relatively easy to prove. The graph G has a cut C with density a,
and this cut provides a cut C ′ in the random induced subgraph H. It is easy to
see that the density of C ′ is the same as the density a of C in expectation, and
it takes some routine computation in probability theory to show that it is highly
concentrated around this value. The density X of the largest cut in H is at least
the density of C ′ , and so with high probability it is at least a − ε (Figure 1.9).
Figure 1.9. A dense cut in the large graph gives a dense cut in the sample.
The reverse inequality is much more difficult to prove, at least from scratch,
and in fact it is rather surprising. We can phrase the question like this: Suppose
that most random induced subgraphs H on N nodes have a cut that is denser than
b. Does it follow that G has a cut that is denser than b − ε? It is not clear why this
should be so: why should these cuts in these small subgraphs “line up” to give a
dense cut in G?
We will see that it does follow that the estimate is correct, once N is large
enough (about ε^{−4} |log ε|). In fact, one can give general necessary and sufficient
conditions under which parameters can be estimated by sampling, as we will see in
Section 15.1.
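The estimator described above can be sketched as follows (all names are ours): take a random N-node induced subgraph and compute its max-cut density X by brute force, which is feasible only because N depends on ε and not on the size of G.

```python
import random
from itertools import combinations

def maxcut_density(edges, nodes):
    """Exact max-cut density of the subgraph induced by `nodes`
    (brute force over bipartitions; only for small node sets)."""
    nodes = list(nodes)
    k = len(nodes)
    sub = [(a, b) for a, b in combinations(nodes, 2)
           if (min(a, b), max(a, b)) in edges]
    best = 0
    for mask in range(1 << (k - 1)):          # fix the last node's side
        side = {nodes[i]: (mask >> i) & 1 for i in range(k - 1)}
        side[nodes[k - 1]] = 0
        best = max(best, sum(1 for a, b in sub if side[a] != side[b]))
    return best / (k * (k - 1) // 2)

def estimate_maxcut(edges, n, N, seed=0):
    """The estimator X: max-cut density of a random N-node induced subgraph."""
    rng = random.Random(seed)
    return maxcut_density(edges, rng.sample(range(n), N))
```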
1.6.2. Property testing. A more complicated issue is property testing: we
want to determine whether the graph has some given property, for example, can
it be decomposed into two connected components of equal size, is it planar, or
does it contain any triangle. We could consider this as a 0-1 valued parameter, but
computing this parameter approximately would not make sense (or rather, it would
be requiring too much, since this would be equivalent to exact computation).
A good way of posing this problem was developed by Rubinfeld and Sudan
[1996] and Goldreich, Goldwasser and Ron [1998]. In the slightly different context of
“additive approximation”, closely related problems were studied by Arora, Karger
and Karpinski [1995] (see e.g. Fischer [2001] for a survey and the volume edited by
Goldreich [2010] for a collection of more recent surveys).
This approach acknowledges that any answer is only approximate. Suppose that
we want to test for a property P, and we get information about the graph by taking
a bounded size random sample of the nodes, and inspecting the subgraph induced
by them. We interpret the answer of the algorithm as follows: If it concludes that
the graph has property P, this means that we can change εn² edges so that we get
a graph with property P; if it concludes that the graph does not have property P,
this means that we can change εn² edges so that we get a graph without property
P.
Again, we have to specify an error parameter ε > 0 in advance, and will have to
accept an answer which may be wrong with probability ε, and even if it is “right”, it
only means that we can change εn² edges in the graph so that the answer becomes
correct.
Sometimes we can do better and eliminate either false positives or false neg-
atives. As an example, let us try to test whether a given (dense) graph contains
a triangle. We take a sample of size f (ε) (the best function f which is known to
work is outrageously large, but let’s not worry about this), and check whether they
contain a triangle. If they do, then we know that the graph has a triangle. If they
don’t, then one can prove (see Section 15.3) that with high probability, we can
delete εn² edges from the graph so that no triangle remains.
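The triangle tester just described can be sketched in a few lines of Python (the function names and the graph representation, a dictionary of neighbor sets, are ours; the sample size f(ε) is left as an explicit parameter). Note the one-sided error: a positive answer is always correct.

```python
import itertools
import random

def has_triangle(adj, nodes):
    """Does the subgraph induced by `nodes` contain a triangle?"""
    return any(v in adj[u] and w in adj[u] and w in adj[v]
               for u, v, w in itertools.combinations(list(nodes), 3))

def triangle_tester(adj, sample_size, rng=random):
    """One-sided tester: True is always correct (a triangle was found);
    False only means the sample contained no triangle, which w.h.p. implies
    the graph can be made triangle-free by deleting eps*n^2 edges."""
    sample = rng.sample(list(adj), min(sample_size, len(adj)))
    return has_triangle(adj, sample)
```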
Remark 1.5. We will not be concerned with the sample size as a function of the
error bound ε. Sometimes it is polynomial (as in the examples above), but in other
cases one uses the Regularity Lemma, which forces tower-size samples, making the
algorithms of theoretical interest only. Goldreich [2010], in his survey of property
testing, emphasizes the importance of testing with samples of manageable size, and
I could not agree more; but this book, being about limit theory, does not address
this issue.
Another caveat: Many extensions deal with testing models where we are allowed
to sample more than a constant number of nodes of the large graph G. For this,
we have to take the number of nodes into account, but usually it is enough to know
the order of magnitude of the number of nodes, which in practical situations is easy
to do. We do not discuss these important methods in our book.
1.6.3. Computation of a structure. Perhaps the most complex algorithmic
task is the computation of a structure, where the structure is of size comparable
with the graph itself. For example, we want to find a perfect matching in the
graph, or a maximum cut (not just its density, but the cut itself), or a regularity
partition in a huge dense graph. The conceptual difficulty is that the output of
the algorithm is too large to be explicitly produced. What we can do is to carry
out some preprocessing whose result can be stored (e.g., label a bounded number
of nodes or edges), and give an algorithm which, for given input node or nodes,
determines the local part of the structure we are looking for. Usually, this algorithm
returns the “status” of a node or edge in the output structure (for example, whether
the given edge belongs to the matching, or which side of the cut the given node belongs
to).
As an example, we will describe in Section 15.4.3 how to compute a maximum
cut. We can access the graph by taking a bounded size sample of the nodes,
and inspect the subgraph induced by them. For a given ε > 0, we precompute a
1.6. HOW TO RUN ALGORITHMS ON THEM? 21
“representative set” (see next section) together with a bipartition of this set. In
addition, we describe a “Placing Algorithm” which has an arbitrary node v as its
input, and tells us on which side of the cut v is located. This Placing Algorithm
can be called any number of times with different nodes v, and the answers it gives
should be consistent with an approximately maximum cut. For example, calling
this algorithm many times, we can estimate the density of the maximum cut (but
this can be done in an easier way, as we have seen).
The parameter ε is an error bound: the cut computed may be off the true
maximum cut by εn2 edges, the precomputation may be wrong with probability at
most ε, and for each query, the answer may be in error with probability at most ε.
cannot compute all the Voronoi cells; but if we want to know which class (cell) does
a given node belong to, all we need to do is to compute its distance to the nodes
in R.
Sampling. In the case of graphs with bounded degree, the subgraph sampling
method gives a trivial result: the sampled subgraph will almost certainly be edge-
less. Probably the most natural way to fix this is to consider neighborhood sampling
(Figure 1.1). Let G_D denote the class of finite graphs with all degrees bounded by
D. For G ∈ G_D, select a random node and explore its neighborhood to a given
depth r. This provides a probability distribution ρ_{G,r} on graphs in G_D, with a
specified root node, such that all nodes are at distance at most r from the root.
We will briefly refer to these rooted graphs as r-balls. Note that the number of
possible r-balls is finite if D and r are fixed.
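Neighborhood sampling is easy to implement. The following Python sketch (our names; a graph is a dictionary of neighbor sets) returns the r-ball around a uniformly random root by breadth-first search:

```python
import random
from collections import deque

def r_ball(adj, root, r):
    """BFS out to depth r from `root`; return (root, nodes, edges) of the
    rooted ball, with the edges induced on the nodes within distance r."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        if dist[u] == r:
            continue  # do not expand beyond depth r
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    nodes = set(dist)
    edges = {frozenset((u, v)) for u in nodes for v in adj[u] if v in nodes}
    return root, nodes, edges

def sample_r_ball(adj, r, rng=random):
    """Neighborhood sampling: the r-ball around a uniformly chosen root."""
    return r_ball(adj, rng.choice(list(adj)), r)
```

Repeating `sample_r_ball` many times gives an empirical version of the distribution ρ_{G,r}.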
The situation for bounded degree graphs is, however, less satisfactory than for
dense graphs, for two reasons. First, a full characterization of what distributions
of r-balls the neighborhood sampling procedure can result in is not known (cf.
Conjecture 19.8). Second, neighborhood sampling misses some important global
properties of the graph, like expansion. In Section 19.2 we will introduce a notion
of convergence, called local-global, which is better from this point of view, but it is
not based on any implementable sampling procedure.
This suggests looking at further possibilities. Suppose, for example, that in-
stead of exploring the neighborhood of a single random node, we could select two
(or more) random nodes and determine simple quantities associated with pairs of
nodes, like pairwise distances, maximum flow, electrical resistance, hitting times
of random walks (studies of this nature have been performed, for example, on the
internet, see e.g. Kallus, Hága, Mátray, Vattay and Laki [2011]). What information
can be gained by such tests? Is there a “complete” set of tests that would give
enough information to determine the global structure of the graph to a reasonable
accuracy? Such questions could lead to different theories of large graphs and their
limit objects; at this time, however, they are unexplored.
Remark 1.6. It is interesting to note that our two sampling methods correspond
to the two basic data structures for graph algorithms, adjacency matrix and neigh-
borhood lists. To be more specific, both methods assume that we can choose a
uniformly distributed random node, and repeat this a constant number of times.
In subgraph sampling, we must be able to determine whether two given nodes are
adjacent or not. For a graph that is explicitly given, this is easy if the graph is given
by its adjacency matrix. For neighborhood sampling, we have to be able to find all
neighbors of a given node. This is easy if the graph is given by neighborhood lists.
It would be very time consuming to perform these sampling operations on a graph
given by the wrong data structure.
1.7. BOUNDED DEGREE GRAPHS 23
Sampling distance. The construction of the sampling distance can be carried over
to graphs with bounded degree, by replacing in (1.2) the sampling distributions σ_{G,k}
by the neighborhood distributions ρ_{G,k}. We must point out, however, that it seems
to be difficult to define a notion of distance between two graphs with bounded
degree (in analogy with the cut distance) that would reflect global similarity.
Regularity Lemma. This is one of the big unsolved problems for graphs with
bounded degree. If we consider regularity lemmas as providing “approximation by
the smaller”, then there is a simple non-constructive result (Proposition 19.10),
which should be proved in a constructive way to be really useful. One can start
from many other facets of the Regularity Lemma, but a satisfactory version for
bounded degree graphs has turned out most elusive.
Limit objects. For bounded degree graphs, Benjamini and Schramm provide a
notion of a limit object (see Section 18). The Benjamini–Schramm limit object can
be described as a distribution on rooted countable graphs with a special property
called “involution invariance”.
Another way of describing a limit object is a “graphing”. In a sense, this latter
object is what we expect: a bounded degree graph on an infinite (typically un-
countable) set, with appropriate measurability and measure preserving conditions.
This construction was folklore in an informal way; the first exact statements were
published by Aldous and Lyons [2007] and Elek [2007a].
Graphings were invented by group theorists. The idea is to consider a finitely
generated group acting on a probability space (for example, rotations of a circle by
integer multiples of a given angle). One can construct a graph on the underlying
space, by connecting each point to its images under the generators of the group.
This construction gives a graph with bounded degree (the set of points is typically
of continuum cardinality). It is a beautiful fact that
graphings, representing groups this way, are just right to describe the limit
objects of convergent graph sequences with bounded degree.
Depending on personal taste, a graphing may be considered more complicated
or less complicated than an involution-invariant random countable rooted graph.
But graphings have an important advantage: they can express a richer structure,
the limits of graph sequences convergent in the local-global sense.
Algorithms. Here is finally an area where the study of bounded degree graphs
can be considered at least as advanced as the study of dense graphs. Let us discuss
the task of computing a structure.
Selecting random nodes and exploring their neighborhoods, we see (with high
probability) disjoint parts of the graph, and so there is no method to build up a
global structure. Still, very nontrivial algorithms can be designed in this model.
For example, in Section 22.3.1 we describe an algorithm due to Nguyen and Onak
[2008], that constructs an almost maximum matching. The way the output can be
described is similar to how the output of a maximum cut algorithm was described
in the dense setting: for any node we can tell which other node it is matched
to, inspecting a bounded neighborhood only; these assignments will be consistent
throughout the graph; and the difference in size from the true maximum matching
is only εn, where ε > 0 is an error bound and n is the number of nodes.
There is an equivalent way to describe such algorithms, which may be easier
to follow, and this is the model of distributed computing (going back to the 1980’s).
In this case, an agent (or processor) is sitting at each node of the graph, and they
cooperate in exploring various properties of it. They can only communicate along
the edges. In the case we are interested in (which is in a sense extreme), they are
restricted to exchange a bounded number of bits (where the bound may depend on
the degree D, on an error bound ε, and of course on the task they are performing,
but not on the number of nodes). In some other versions of the model (cellular
automata), the amount of communication is not restricted, but the computing
power of the agents is. Note that in our model communication between the agents
is restricted to a bounded number of bits, and hence they may be assumed to be
very stupid, even finite automata.
There is a large literature on distributed computing, both from the practical
and theoretical aspect. We will not be able to cover this; we will restrict ourselves
to the discussion of the strong connection of this computation model with our
approach to large graphs and graph limits.
CHAPTER 2

Large Graphs in Mathematics and Physics
The algorithmic treatment of very large networks is not the only area where the
notions of very large graphs and their limits can be applied successfully. Many of
the problems and methods in graph limit theory come from extremal graph theory
or from statistical physics. Let us give a very brief introduction to these theories.
2.1.1. Edges vs. triangles. Perhaps the first result in extremal graph theory
was found by Mantel [1907]. This says that if a graph on n nodes has more than
n²/4 edges, then it contains a triangle. Another way of saying this is that if we
want to squeeze in the largest number of edges without creating a triangle, then we
should split the nodes into two equal classes (if n is odd, then their sizes differ by
1) and insert all edges between the two classes. As another early example, Erdős
[1938] proved a bound on the number of edges in a C_4-free bipartite graph (see
(2.9) below), as a lemma in a paper about number theory.
Mantel’s result is a special case of Turán’s Theorem [1941], which is often
considered as the work that started the systematic development of extremal graph
theory. Turán solved the generalization of Mantel’s problem for any complete graph
in place of the triangle. We define the Turán graph T (n, r) (1 ≤ r ≤ n) as follows:
we partition [n] into r classes as equitably as possible, and connect two nodes if and
only if they belong to different classes. Since we are interested in large n and fixed
r, the complication that the classes cannot be exactly equal in size (which causes
the formula for the number of edges of T (n, r) to be a bit ugly) should not worry
us. It will be enough to know that the number of edges in a Turán graph is
e\big(T(n, r)\big) \sim \binom{r}{2}\Big(\frac{n}{r}\Big)^2,
and in terms of the homomorphism densities defined in the previous chapter in
(1.1), we have t(K_2, T(n, r)) ∼ 1 − 1/r. For the triangle density we have the similar
formula t(K_3, T(n, r)) ∼ (1 − 1/r)(1 − 2/r).
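These asymptotic densities can be checked by brute force on a small Turán graph. In the Python sketch below (our helper names, an illustration only), the asymptotic formulas are in fact exact because the number of classes divides the number of nodes:

```python
from itertools import product

def turan_adj(n, r):
    """Turán graph T(n, r): n nodes in r near-equal classes, with an edge
    between two nodes iff they lie in different classes."""
    cls = [i % r for i in range(n)]
    return {u: {v for v in range(n) if v != u and cls[v] != cls[u]}
            for u in range(n)}

def hom_density(F_edges, k, adj):
    """t(F, G): fraction of all maps [k] -> V(G) preserving every edge of F."""
    n = len(adj)
    homs = sum(1 for phi in product(range(n), repeat=k)
               if all(phi[j] in adj[phi[i]] for i, j in F_edges))
    return homs / n ** k

adj = turan_adj(12, 3)
edge = hom_density([(0, 1)], 2, adj)                 # = 1 - 1/3 exactly (3 | 12)
tri = hom_density([(0, 1), (0, 2), (1, 2)], 3, adj)  # = (1 - 1/3)(1 - 2/3)
```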
26 2. LARGE GRAPHS IN MATHEMATICS AND PHYSICS
Figure 2.1. (a) The closure D_{2,3} of the set of pairs of edge density
and triangle density. (b) Goodman’s bound. (c) Bollobás’s bound.
The picture is a little distorted in order to show its special features
better.
Some features of this picture are easy to explain. The lower edge means that
there are triangle-free graphs with edge density up to 1/2, and the Mantel–Turán
Theorem says that for larger edge density, the triangle density must be positive. A
lower bound for the triangle density was proved by Goodman [1959],
(2.2) t(K_3, G) ≥ t(K_2, G)(2t(K_2, G) − 1),
which corresponds to the parabola shown in Figure 2.1(b).
The upper boundary curve turns out to be given by the equation y = x3/2 ,
which is a very special case of the Kruskal–Katona Theorem in extremal hypergraph
theory (the full theorem gives the precise value, not just asymptotics, and concerns
uniform hypergraphs, not just graphs). In other words, this says that
(2.3) t(K_3, G) ≤ t(K_2, G)^{3/2}.
2.1. EXTREMAL GRAPH THEORY 27
Both (2.2) and (2.3) are sharp in a sense: Goodman’s Theorem is sharp if the
edge density is of the form 1/2, 2/3, 3/4, . . . (Turán graphs give equality). In this
form of the Kruskal–Katona Theorem equality is not attained except at the points
(0, 0) and (1, 1), but for every point (x, x^{3/2}) of the upper boundary curve there
are points representing a graph arbitrarily close (just use graphs consisting of a
complete graph and isolated nodes).
From our perspective, there is nothing to improve on the upper bound, but
can we get arbitrarily close to the lower bound between two special edge density
values 1 − 1/k? Surprisingly, the answer is no. Bollobás [1976] proved that the
triangle density for a graph with edge density x ∈ (1 − 1/(k−1), 1 − 1/k) is not only
above the parabola, but also above the chord of the parabola connecting the special
points corresponding to T(n, k − 1) and T(n, k).
Lovász and Simonovits [1976, 1983] formulated a conjecture about the exact
bounding curve, and proved it in very small neighborhoods of the special edge
density values above. One way to state this is that the minimum number of triangles
is attained by a complete k-partite graph with unequal color classes. The sizes of
the color classes can be determined by solving an optimization problem, which
leads to a cubic concave curve connecting the two special points. This conjecture
turned out quite hard. Lovász and Simonovits proved it in the special case when
the edge density x was close to one of the endpoints of the interval. Fisher [1989]
proved the conjecture for the first interval (1/2, 2/3). After quite a while, Razborov
[2007, 2008] proved the general conjecture. His work was extended by Nikiforov
[2011] to bounding the number of complete graphs on 4 nodes, and by Reiher [2012]
to all complete graphs.
So we know what the lower and upper bounding curves are. Luckily, math
plays no further tricks on us: it is easy to see that for every point between the two
curves there are points representing graphs arbitrarily close.
I dwelt quite long on this very simple special problem not only to show how
complicated it gets (and yet solvable), but also because Razborov’s methods for the
solution fit quite well in the framework developed in this book, and they will be
presented in Chapter 16.
2.1.2. A sampler of classical results. Let us start with some remarks to
simplify and to some degree unify the statements of these results. Every algebraic
inequality between subgraph densities can be “linearized”, using the following mul-
tiplicativity of t(., G):
(2.4) t(F_1F_2, G) = t(F_1, G) t(F_2, G),
where F1 F2 denotes the disjoint union of F1 and F2 . (This property will play a very
important role in the sequel, but right now it is just a convenient simplification.)
For example, we can replace (2.2) by
(2.5) t(K_3, G) ≥ 2t(K_2K_2, G) − t(K_2, G).
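The multiplicativity (2.4) and the linearized Goodman inequality (2.5) are easy to verify numerically on a small graph. A Python sketch (our names; densities are computed by brute force over all maps):

```python
from itertools import product

def hom_density(F_edges, k, adj):
    """t(F, G) for a graph F on nodes 0..k-1 given by its edge list."""
    n = len(adj)
    homs = sum(1 for phi in product(range(n), repeat=k)
               if all(phi[j] in adj[phi[i]] for i, j in F_edges))
    return homs / n ** k

# G: the 5-cycle C5.
adj = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}

t_K2 = hom_density([(0, 1)], 2, adj)
t_K2K2 = hom_density([(0, 1), (2, 3)], 4, adj)  # disjoint union K2 K2 on nodes 0..3
assert abs(t_K2K2 - t_K2 ** 2) < 1e-12          # multiplicativity (2.4)

t_K3 = hom_density([(0, 1), (0, 2), (1, 2)], 3, adj)  # C5 is triangle-free: 0
assert t_K3 >= 2 * t_K2K2 - t_K2                # linearized Goodman (2.5)
```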
We can make the statements (and their proofs, as we will see below) more
transparent by two further tricks: first, if a linear inequality between the densities
of certain subgraphs F1 , . . . , Fk holds for all graphs, then we write it as an inequal-
ity between F1 , . . . , Fk ; and for specific small graphs Fi , we use little pictograms.
Goodman’s Inequality (2.2) can be expressed as follows:
(2.6) K_3 ≥ 2K_2^2 − K_2,
or, in pictograms,
(2.7) [triangle] ≥ 2 [two disjoint edges] − [edge].
The Kruskal–Katona Theorem for triangles is:
(2.8) [triangle] ≤ [edge]^{3/2}.
Let us describe some further classical results. Instead of counting complete
graphs, we can consider the density of some other graph F in G. Erdős proved the
inequality
(2.9) t(C_4, G) ≥ t(K_2, G)^4,
or in pictograms
(2.10) [4-cycle] ≥ [edge]^4.
Graphs with asymptotic equality here are quasirandom graphs (Section 1.4.2).
Bounding from below the homomorphism density of paths is a more difficult
question, but it turns out to be equivalent to theorems of Mulholland and Smith
[1959], Blakley and Roy [1965], and London [1966] in matrix theory (applied to the
adjacency matrix). If P_k denotes the path with k nodes, then for all k ≥ 2,
(2.11) t(P_k, G) ≥ t(K_2, G)^{k−1}.
Regular graphs give equality here. The first nontrivial case of inequality (2.11) is
(2.12) [path on 3 nodes] ≥ [edge]².
Translating to homomorphisms, this means that
v(G) · hom(P_3, G) ≥ hom(K_2, G)².
If we count the homomorphisms on the left side by the image of the middle node,
we see that it is the sum of the squared degrees of G. Since hom(K2 , G) = 2e(G)
is the sum of the degrees, this inequality is just the inequality between arithmetic
and quadratic means, applied to the sequence of degrees.
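This computation is short enough to check directly. In the Python sketch below (our names), hom(P_3, G) is the sum of squared degrees and hom(K_2, G) is the sum of degrees, so the inequality is exactly the arithmetic–quadratic mean inequality for the degree sequence:

```python
def p3_vs_edges(adj):
    """Return (v(G) * hom(P3, G), hom(K2, G)^2); the first is >= the second."""
    degrees = [len(adj[u]) for u in adj]
    n = len(degrees)
    hom_P3 = sum(d * d for d in degrees)  # middle node w gives d_w^2 maps
    hom_K2 = sum(degrees)                 # ordered adjacent pairs = 2 e(G)
    return n * hom_P3, hom_K2 ** 2

# Path on 4 nodes: degrees 1, 2, 2, 1.
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
lhs, rhs = p3_vs_edges(adj)
assert lhs >= rhs
```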
Bounding the P_3-density from above in terms of the edge density is more difficult,
but it was solved by Ahlswede and Katona [1978]; we formulate this as Exercise
2.4 below.
The next case of inequality (2.11) is
(2.13) [path on 4 nodes] ≥ [edge]³,
and this is already quite hard, although short proofs with a tricky application of
the Cauchy–Schwarz inequality are known.
In Chapter 16 we will return to the question of how far the application of such
elementary inequalities takes us in proving inequalities between subgraph densities.
starting from the lower left corner, and going counterclockwise. (It does not really
matter.)
The role of the labels is that when taking a “product” of two graphs, we take
the disjoint union, but identify nodes with the same label. With this convention,
it is easy to check two pictogram identities: the square of one combination of
labeled graphs equals the combination itself (this combination is “idempotent”),
and the square of another simple combination can be expanded explicitly.
Forgetting the labels, adding up, and deleting isolated nodes, we get that the
right side, K_3 − 2K_2^2 + K_2, equals a sum of squares, which implies that it is
nonnegative:
K_3 − 2K_2^2 + K_2 ≥ 0,
which is just (2.7).
Is this a valid argument? It turns out that it is, and the method can be
formalized using the notion of graph algebras. These will be very useful tools in
the proofs of characterization theorems of homomorphism functions, and also in
some other studies of graph parameters.
2.1.4. General results. Moving from special extremal graph problems to
the more general, let us describe some quite general results about extremal graphs,
which were obtained quite a long time ago in several papers of Erdős, Stone and
Simonovits [1946, 1966, 1968]. We exclude an arbitrary graph L as subgraph of a
simple graph G, and want to determine the maximum number of edges of G, given
the number of nodes n. Turán’s Theorem 2.1 is a special case when L is a complete
graph. It turns out that the key quantity that governs the answer is the chromatic
number r = χ(L) of the excluded graph.
The Turán graph T (n, r − 1) is certainly one of the candidates for the extremal
graph, since it cannot contain any graph as a subgraph that has chromatic number
r. For certain excluded graphs L it is easy to construct examples that have slightly
more edges than this Turán graph; however, the gain is negligible: for every graph
G on n nodes that does not contain L as a subgraph, we have
(2.14) e(G) ≤ (1 + o(1)) e\big(T(n, r − 1)\big) = \Big(1 − \frac{1}{r−1} + o(1)\Big)\binom{n}{2}.
There is also a “stability” result: For every ε > 0 there is an ε′ > 0 (depending
on L and ε, but not on G) such that if G is a graph not containing L with at least
(1 − 1/(r−1) − ε′)\binom{n}{2} edges, then we can change at most ε\binom{n}{2} edges of G to get a
Turán graph T(n, r − 1).
We will see that graph limit theory gives very short and elegant proofs for
these facts. The idea that extremal graph problems have “continuous versions” (in
a sense quite similar to our use of graphons), which are often cleaner and easier to
handle, goes back to around 1980, when Katona [1978, 1980, 1985] and Sidorenko
[1980, 1982] used this method to generalize graph and hypergraph problems, and
also to give applications in probability theory.
Remark 2.2. If r = 2 (which means that L is bipartite), then the main term in
(2.14) disappears, and all we get is that the number of edges is o(n2 ). Of course,
one would like to know the precise order of magnitude of the best upper bound.
This is known in several cases (e.g., small complete bipartite graphs and cycles),
but in general it seems to be a difficult unsolved problem. The extremal graphs in
this case are sparse, and quite complex: for example, C4 -free graphs with maximum
edge density are constructed from finite projective planes. Extremal problems for
graphs with excluded bipartite graphs do not seem to fit in with the framework
developed in this book, but perhaps they can serve as motivation for extending it
to sparser graphs.
2.1.5. General questions. We have brought up the idea of introducing
graphons (graph limits) in Section 1.5.3 motivated by the goal to approximate
very large networks by simpler analytic objects. We have seen that graphons pro-
vide cleaner formulations, with no error terms, of some results in graph theory (for
example, about quasirandom graphs). We will see in Section 16.7 that extremal
graph theory provides another, also quite compelling motivation: Graphons pro-
vide a way to state, in an exact way, general questions about the nature of extremal
graphs, and also help answering them, at least in some cases. (They have similar
uses in the theory of computing; cf. Chapter 15).
Which inequalities between subgraph densities are valid? Given a linear
inequality between subgraph densities (like (2.7) above), is it valid for all graphs
G? Hatami and Norine [2011] proved recently that this question is algorithmically
undecidable. We will describe the proof of this fundamental result in Section 16.6.1.
On the other hand, it follows from the results of Lovász and Szegedy [2012a] that if
we allow an arbitrarily small “slack”, then it becomes decidable (see Section 16.6.2).
Can all linear inequalities between subgraph densities be proved using
just Cauchy–Schwarz? We described above a proof of the simple inequality
(2.12) using the inequality between arithmetic and quadratic means, or equivalently,
the Cauchy–Schwarz Inequality. Many other extremal problems can be proved by
using the Cauchy–Schwarz Inequality (often repeatedly and in nontrivial ways).
Exercise 2.5 shows that Goodman’s Inequality can also be proved by this method.
How general a tool is the Cauchy–Schwarz Inequality in this context?
Using the notions of graphons and graph algebras we will be able to give an
exact formulation of this question. It will turn out that the answer is negative
(Hatami and Norine [2011], Section 16.6.1), but it becomes positive if we allow an
arbitrarily small error (Lovász and Szegedy [2012a], Section 16.6.2).
Is there always an extremal graph? Let us consider extremal problems of
the form “maximize a linear combination of subgraph densities, subject to fixing
other such combinations”. For example, “maximize the triangle density subject
to a given edge density” (the answer is given by the first nontrivial case of the
Kruskal–Katona Theorem (2.8)).
To motivate our approach, consider the following two optimization problems.
Classical optimization problem. Find the minimum of x3 − 6x over all
numbers x ≥ 0.
Graph optimization problem. Find the minimum of t(C4 , G) over all
graphs G with t(K2 , G) ≥ 1/2.
The solution of the classical optimization problem is of course x = √2. This
means that it has no solution in rationals, but we can find rational numbers that
are arbitrarily close to being optimal. If we want an exact solution, we have to go
to the completion of the rationals, i.e., to the reals.
The graph optimization problem may take a bit more effort to solve, but (2.9)
shows that if the edge density is 1/2, then the 4-cycle density is at least 1/16. With
a little effort one can show that equality is never attained here. Furthermore, the
4-cycle-density gets arbitrarily close to 1/16 for appropriate families of graphs: the
simplest example is a random graph with edge density 1/2 (cf. also Section 1.4.2).
The analogy with the classical optimization problem above suggests that we
should try to enlarge the set of (finite) graphs with new objects so that the appro-
priate extension of our optimization problem has a solution among the new objects.
Furthermore, we want that these new objects should be approximable by graphs,
just like real numbers are approximable by rationals. As it turns out, graphons are
just the right objects for this.
One can prove that there is always an extremal graphon, which then gives a
“template” for asymptotically extremal graphs. This follows from another fact that
can be considered one of the basic results treated in this book:
The space of graphons is compact in the cut-distance metric.
(This notion of distance was mentioned in Section 1.5.1, and will be defined in
Chapter 8; the compactness of the graphon space will be proved in Section 9.3).
Which graphs are extremal? This is not a good question (every graph is
extremal for some sufficiently complicated extremal graph problem), but replac-
ing “graph” by “graphon” makes it mathematically meaningful. Every extremal
graphon gives a “template” for asymptotically extremal graphs.
In classical extremal graph results, these templates are quite simple (Figure
2.2). A natural guess would be that all templates have the form of a stepfunction,
like the rightmost square in Figure 1.6. All of these are indeed templates for
appropriate extremal problems, but they are not all the templates: we will see that
the limit of half-graphs (the rightmost square in Figure 1.7) is also the template for
the extremal graph of a quite simple extremal problem, and there are many other,
more complicated, templates. We will prove several results about the structure of
these extremal templates (Section 16.7), but no full characterization is known.
Two atoms that are adjacent in the grid have an “interaction energy”, which
depends on their states. In the simplest version of the basic Ising model, the
interaction energy is some number −J if the atoms are in the same state, and J
if they are not. The states of an atom can be described by the integers 1 and −1,
and so a configuration is a mapping σ : V (G) → {1, −1}. If σu denotes the state
of atom u, then the total energy of a given configuration is
H(σ) = −\sum_{uv∈E(G)} J σ_u σ_v .
Basic physics (going back to Boltzmann) tells us that the system is more likely to
be in states with low energy. In formula, the probability of a given configuration
2.2. STATISTICAL PHYSICS 33
but since the number of terms is enormous, partition functions can be very hard to
compute or analyze.
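For a tiny graph the partition function can still be computed by brute force. The Python sketch below (our names; an illustration only) uses the standard Boltzmann weight e^{−H(σ)/T}, with Boltzmann's constant set to 1, which is the weighting alluded to above:

```python
import math
from itertools import product

def ising_energy(adj, sigma, J=1.0):
    """H(sigma) = -J * sum of sigma_u * sigma_v over edges uv (states +1/-1)."""
    return -J * sum(sigma[u] * sigma[v]
                    for u in adj for v in adj[u] if u < v)

def partition_function(adj, J=1.0, T=1.0):
    """Brute-force Z: sum of exp(-H(sigma)/T) over all 2^n configurations."""
    nodes = sorted(adj)
    return sum(math.exp(-ising_energy(adj, dict(zip(nodes, s)), J) / T)
               for s in product((1, -1), repeat=len(nodes)))
```

For a single edge with J = T = 1, the four configurations give Z = 2e + 2e^{−1}; already for a modest grid the 2^n terms make this enumeration hopeless, which is the point made above.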
The behavior of the system depends very much on the sign of J. If J > 0, then
adjacent pairs that are in the same state contribute less to the total energy than
those that are in different states, and so the configuration with the lowest energy
is attained when all atoms are in the same state. The typical configuration of the
system will be close to this, at least as long as the temperature T is small. This is
called the ferromagnetic Ising model, because it gives an explanation of how materials
like iron get magnetized. If J < 0 (the antiferromagnetic case), then the behavior is
different: the chessboard-like pattern minimizes the energy, and no magnetization
occurs at any temperature.
One may notice that the temperature T emphasizes the difference between the
energy of different configurations when T → 0 (and de-emphasizes it when T → ∞).
In the limit when T → 0, all the probability will be concentrated on the states with
minimum energy, which are called ground states. In the simplest ferromagnetic
Ising model, there are two ground states: either all atoms are in state UP, or all of
them are in state DOWN. If the temperature increases, disordered states like the
left picture in Figure 2.3 become more likely. The transition from the ordered state
to the disordered may be gradual (in dimension 1), or it may happen suddenly at a
given temperature (in dimensions 2 and higher, for large graphs G); this is called a
phase transition. This leads us to one of the central problems in statistical physics;
alas, we cannot go deeper into the discussion of this issue in our book.
To make the connection to graph homomorphisms, we generalize the Ising
model a little. First, we replace the grid by an arbitrary graph G. (From the point
of view of physics, other lattices, corresponding to crystals with other structure, are
certainly natural. Other materials don’t have a simple periodic crystal structure.)
Second, we introduce a “magnetic field”, which prefers one state over the other: in
the simplest case it adds −\sum_u h σ_u to the energy function, with some parameter h.
Third, we consider not two, but q possible states for every atom, which we label by
1, 2, . . . , q (unlike 1 and −1 before, these should not be considered as numbers: they
are just labels). We have to specify an interaction energy Jij for any two states i
and j, and a magnetic field energy hi for every state i. A configuration is now a
map σ : V(G) → [q], and the energy of it is
H(σ) = −\sum_{v∈V(G)} h_{σ(v)} − \sum_{uv∈E(G)} J_{σ(u),σ(v)} .
Consider the case when αi = 1 for all i, and βij is 0 or 1 (in the Ising model βij
cannot be zero, but (2.15) allows this substitution). Then every term in (2.15) is
either 0 or 1, and a term is 1 if and only if βσ(u)σ(v) = 1 for every uv ∈ E(G). Let
us build a graph H with node set V (H) = [q], in which i, j ∈ [q] are adjacent if
and only if βij = 1. Then a term in (2.15) is 1 if and only if σ is a homomorphism
G → H, and so the sum simply counts these homomorphisms, and gives the value
Z = hom(G, H).
In the case of general values for the α and β, we can define a weighted graph
H with nodeweights αi and edgeweights βij . Formula (2.15) can then serve as the
definition of hom(G, H), which will be very important for us.
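As a sanity check of this definition, a brute-force evaluation of the sum (2.15) is straightforward. In the Python sketch below the encoding is ours (H has nodes 0, …, q−1, a nodeweight list alpha and a symmetric edgeweight matrix beta); with all α_i = 1 and β the 0–1 adjacency matrix of H, it returns exactly the number of homomorphisms G → H:

```python
from itertools import product

def hom(G_nodes, G_edges, alpha, beta):
    """hom(G, H) = sum over all maps sigma: V(G) -> V(H) of
    prod_v alpha[sigma(v)] * prod_{uv in E(G)} beta[sigma(u)][sigma(v)]."""
    q = len(alpha)  # V(H) = {0, ..., q-1}
    total = 0
    for values in product(range(q), repeat=len(G_nodes)):
        sigma = dict(zip(G_nodes, values))
        weight = 1
        for v in G_nodes:
            weight *= alpha[sigma[v]]
        for u, v in G_edges:
            weight *= beta[sigma[u]][sigma[v]]
        total += weight
    return total
```

For example, with H = K3 (unit nodeweights, 0–1 edgeweights) and G a triangle, the sum counts the 3! = 6 proper 3-colorings of the triangle.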
We don’t discuss the connections between statistical physics and graph theory (homomorphisms and limits) any further; for an introduction to this topic, with more examples, see de la Harpe and Jones [1993].
Exercise 2.6. Define a model in statistical physics in which the ground state
corresponds to the maximum cut of a graph.
Part 2
In this book, different areas of mathematics come together (graph theory, prob-
ability, algebra, functional analysis), and this makes it difficult to find good nota-
tion, and impossible in some cases to stick to standard notation. I tried to find
notation that helps readability. For example, when doing computations with small
graphs, I often use pictograms instead of introducing dozens of notations for them.
When labeling one or more nodes of a graph G, I use G• or G••, and when adding some loops at the nodes, I use G◦. These graphs must still be defined, but perhaps the meaning of the notation is easier to remember with this convention in mind.
The natural logarithm will be denoted by ln; logarithm of base 2, by log. (There
is a recurring dilemma about which logarithm to use. Base 2 is used in information
theory, and it is often better suited for combinatorial problems; natural logarithm
has simpler analytical formulas. Luckily, the two differ only in a constant factor, so the difference is usually irrelevant.) We denote by log∗ x the least n for which the
n-times iterated logarithm of x is less than 1. The Lebesgue measure on R will be
denoted by λ.
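For concreteness, log∗ as just defined can be computed directly (a small Python sketch; the function name is ours):

```python
from math import log2

def log_star(x, log=log2):
    """Least n for which the n-times iterated logarithm of x is less than 1."""
    n = 0
    while x >= 1:
        x = log(x)
        n += 1
    return n
```

Note that with this strict "less than 1" convention, log_star(2) == 2, since log2(2) = 1 is not yet below 1.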
We will consider partitions of both finite sets and the interval [0, 1]. A partition
of [0, 1] will be called an equipartition, if it has a finite number of measurable classes
with the same measure. A partition of a finite set V will be called equitable, if ||S| − |T|| ≤ 1 for any two partition classes S and T.
38 3. NOTATION AND TERMINOLOGY
Signed graphs. Suppose that the edges of a graph F are partitioned into two
sets E+ and E− . The triple F = (V, E+ , E− ) will be called a signed graph. (We
don’t consider this as a weighted graph with edge weights ±1, because these signs
will play a quite different role!)
Partially labeled graphs. This less standard type of graphs will play a
crucial role in this book. A simply k-labeled graph is a graph in which k of the
nodes are labeled by 1, . . . , k (there may be any number of unlabeled nodes). A
k-multilabeled graph is a graph in which labels 1, . . . , k are attached to some nodes;
the same node may carry more than one label (but a label occurs only once). So
a k-multilabeled graph is a graph F together with a map [k] → V (F ), and this is
k-labeled if this map is injective. We omit “simply” from k-labeled, unless we want
to emphasize that it is simply k-labeled. The set of isomorphism types of k-labeled
multigraphs will be denoted by Fk• .
More generally, for every finite set S ⊆ N of labels we can talk about S-labeled
and S-multilabeled graphs. A partially labeled graph is an S-labeled graph for
some finite set S. A 0-labeled graph (or equivalently an ∅-labeled graph) is just
an unlabeled graph. The set of S-labeled multigraphs will be denoted by FS• . A
partially labeled graph in which all nodes are labeled will be called fully labeled or flat.
For every partially labeled graph G and S ⊆ N, let [[G]]S denote the partially
labeled graph obtained by removing the labels not in S . For S = ∅, we denote
[[G]]∅ simply by [[G]]; this is the unlabeled version of the graph G.
We need some notation for differently labeled versions of some basic graphs (Figure 3.1). We denote by Kn, Kn•, Kn••, . . . the complete graph with 0, 1, 2, . . . nodes labeled. We denote by Pn, Pn•, Pn•• the path on n nodes with 0, 1, 2 endnodes labeled. The m-bond labeled at both nodes will be denoted by Bm••. We denote by Ka,b, Ka,b•, Ka,b•• the complete bipartite graph with a nodes in the “first” bipartition class and b nodes in the “second”, with no node labeled, the first bipartition class labeled, and all nodes labeled, respectively. In figures, the labeled nodes are denoted by black circles, the labels ordered left-to-right or up-down. The 2-multilabeled graph consisting of a single node will be denoted by K1••.
The adjacency matrix of a multigraph G is the V (G) × V (G) matrix AG where
(AG )ij is the number of edges connecting node i and j. In the case of a simple
graph, this is a 0-1 matrix. For a weighted graph, we let (AG )ij denote the weight
of the edge ij (the nodeweights can be encoded in a separate vector in RV (G) ).
Colored graphs. We will use graphs in which all the edges and all the nodes
are colored (so they are colorful objects indeed). To be precise, a colored graph of
type (b, c) (where b and c are positive integers) is a multigraph (possibly with loops)
G = (V, E), which is node-colored with b colors and edge-colored with c colors.
case, but not necessarily in the weighted case, since their nodeweights may be
different.
Let H ′ be obtained by identifying two twin nodes i and j, which means that
we delete j, and add αj (H) to αi (H). We can repeat this operation until we get a
weighted graph with no twins. The construction leading to this twin-free weighted
graph is called twin reduction. It is not hard to see that the twin-free graph obtained
from a given graph by twin reduction is uniquely determined.
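The twin reduction procedure can be sketched in a few lines of Python. The encoding (nodeweight dict, edgeweight matrix as a dict of dicts) is ours; the precise definition of twins falls just outside this excerpt, so the sketch takes the usual convention that i and j are twins when their rows in the edgeweight matrix agree on every node:

```python
def twin_reduction(alpha, beta):
    """Repeatedly merge twin nodes, adding the deleted node's nodeweight
    to the node that is kept.

    alpha: dict node -> nodeweight
    beta:  dict node -> dict node -> edgeweight (a symmetric matrix)
    Twins here: nodes i, j whose rows in beta agree on every node.
    """
    alpha = dict(alpha)
    beta = {u: dict(row) for u, row in beta.items()}
    merged = True
    while merged:
        merged = False
        nodes = sorted(alpha)
        for i in nodes:
            for j in nodes:
                if i < j and all(beta[i].get(k, 0) == beta[j].get(k, 0)
                                 for k in nodes):
                    alpha[i] += alpha.pop(j)   # delete j, credit its weight to i
                    beta.pop(j)
                    for row in beta.values():
                        row.pop(j, None)
                    merged = True
                    break
            if merged:
                break
    return alpha, beta
```

For instance, the 2-blowup of a single edge reduces back to one edge whose endpoints carry nodeweight 2, and the result no longer changes: the output is twin-free.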
Quotient. Let P be a partition of V (G). We denote by G/P the graph obtained
by merging each class of P into a single node. This definition is not precise; in
different parts of the book, we need it with edge multiplicities summed, averaged,
or maximized. If G is a simple graph (or a looped-simple graph), then one natural interpretation is that G/P is a looped-simple graph, in which two nodes are adjacent if and only if they have adjacent pre-images. We will call this the simple quotient.
For other versions of the quotient construction, instead of introducing a differ-
ent notation for each of these versions, we will define how the edges are mapped
whenever we use this notation.
Blow-up. The m-blowup G(m) of a graph G is obtained by replacing each node of G by m twin copies (m ≥ 1). Sometimes we need a blow-up of G with a given number of nodes, and so we need a slightly more general notion: we say that a graph G′ is a near-blowup of G if it is obtained by replacing each node of G by m or m + 1 twin copies for some m ≥ 1.
Product of graphs. For two looped-simple graphs G1 and G2, their categorical (weak) product G1 × G2 is defined by V(G1 × G2) = V(G1) × V(G2), and E(G1 × G2) = {{(u1, u2), (v1, v2)} : u1v1 ∈ E(G1), u2v2 ∈ E(G2)}. We denote by G×k the k-fold categorical product of G with itself.
If G1 and G2 are simple, then so is G1 × G2. The strong product G1 ⊠ G2 of two simple graphs can be defined by adding a loop at every node, taking the categorical product, and then removing the loops.
A further operation on graphs is the Cartesian sum G1 □ G2, defined by V(G1 □ G2) = V(G1) × V(G2) and E(G1 □ G2) = {{(u1, u2), (v1, v2)} : u1v1 ∈ E(G1) and u2 = v2, or u1 = v1 and u2v2 ∈ E(G2)}.
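The product constructions can be sketched directly from the definitions (Python; the edge encoding — edges as frozensets, a loop at v as the singleton frozenset({v}) — is our own convention):

```python
from itertools import product as cart

def categorical_product(V1, E1, V2, E2):
    """Categorical (weak) product: (u1, u2) ~ (v1, v2) iff u1v1 is an edge
    of G1 and u2v2 is an edge of G2.  Edges are frozensets; a loop at v is
    frozenset({v})."""
    adj1 = lambda u, v: frozenset({u, v}) in E1
    adj2 = lambda u, v: frozenset({u, v}) in E2
    V = set(cart(V1, V2))
    E = {frozenset({p, q}) for p in V for q in V
         if adj1(p[0], q[0]) and adj2(p[1], q[1])}
    return V, E

def strong_product(V1, E1, V2, E2):
    """Strong product of simple graphs: add a loop at every node, take the
    categorical product, then remove the loops."""
    L1 = E1 | {frozenset({v}) for v in V1}
    L2 = E2 | {frozenset({v}) for v in V2}
    V, E = categorical_product(V1, L1, V2, L2)
    return V, {e for e in E if len(e) == 2}
```

For example, the categorical product K2 × K2 is two disjoint edges, while the strong product K2 ⊠ K2 is the complete graph K4.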
CHAPTER 4
Graph parameters and connection matrices
where P ranges over all partitions of V (F ), and µ is the Möbius function of the
partition lattice, given by (A.2) (the actual value of µP will not be important to
us, just that such integers exist). This is in fact the “lower” Möbius inverse on the
partition lattice, but thankfully we don’t need the upper one in this book. By the
general properties of Möbius inversion, we have the relations
(4.4)    f(F) = ∑_{F′⊇F, V(F′)=V(F)} f↑(F′),    f(F) = ∑_{F′⊆F, V(F′)=V(F)} f↓(F′),    f(F) = ∑_P f⇓(F/P).
F2 , in which case u, v and v ′ must be identified (Figure 4.1). (If F1 and F2 are
simply labeled, then this does not happen, and F1 F2 is also simply labeled. If
F1 and F2 are k-labeled, then F1 F2 is also k-labeled.) Another way to describe
this construction: form the disjoint union of F1 and F2 , add edges between nodes
with the same label, and contract the new edges. So the new labeled nodes will
correspond to the connected components of the graph on the original labeled nodes,
formed by the new edges.
Even if F1 and F2 are simple graphs, which are k-multilabeled, their product
may have loops and parallel edges. If F1 and F2 are simply k-labeled and have no
loops, then F1 F2 has no loops, but may have multiple edges. For two 0-labeled
(i.e., unlabeled) graphs, F1 F2 is their disjoint union. Clearly this multiplication is
associative and commutative.
Example 4.3. Consider edgeless fully k-multilabeled graphs. Such a graph is given
by a partition of the label set [k]. The product of two such graphs is also edgeless,
and it corresponds to the join of the partitions in the partition lattice (see Appendix
A.1). The (unique) simply labeled graph in this class corresponds to the discrete
partition; the graph with one node corresponds to the indiscrete partition.
Our basic tool to study a graph parameter will be the sequence of its con-
nection matrices: These are infinite matrices, one for every integer k ≥ 0, whose
linear algebraic properties are closely related to graph-theoretic properties of graph
parameters.
Let f be any multigraph parameter and fix an integer k ≥ 0. We define the
k-th multilabeled connection matrix of the graph parameter f as the (infinite) sym-
metric matrix M mult (f, k), whose rows and columns are indexed by (isomorphism
types of) k-multilabeled multigraphs, and the entry in the intersection of the row
corresponding to F1 and the column corresponding to F2 is f ([[F1 F2 ]]). The sub-
matrix corresponding to the simply k-labeled graphs is denoted by M simp (f, k) or
just M (f, k), and will be called simply the k-th connection matrix (Figure 4.2).
The submatrix of M (f, k) formed by rows and columns that are fully labeled (flat)
will be called the flat connection matrix and denoted by M flat (f, k). If the graph
parameter f is a simple graph parameter, then in M (f, k) those rows that corre-
spond to rows indexed by graphs with loops and/or multiple edges are just copies
of rows indexed by simple graphs, and similarly for the columns.
44 4. GRAPH PARAMETERS AND CONNECTION MATRICES
0 1 0 1 2 ⋯
1 2 1 2 3 ⋯
0 1 0 1 2 ⋯
1 2 1 2 3 ⋯
2 3 2 3 4 ⋯
⋮ ⋮ ⋮ ⋮ ⋮ ⋱
Example 4.10 (Simple subgraphs). Let sg′(G) = 2^{e′(G)} denote the number of simple subgraphs of G. Then

    sg′(G1G2) = sg′(G1) sg′(G2) / sg′(G1 ∩ G2).

The first two factors don’t change the rank, and the rows of the matrix given by the third factor are determined by the edges induced by the labeled nodes, so the corresponding matrix has at most 2^{k(k−1)/2} different rows. Hence r(sg′, k) ≤ 2^{k(k−1)/2}. Again one can check that this is the exact value.
Next we look at some of the less trivial but still common graph parameters,
which make more complicated examples.
Example 4.11 (Stability number). The maximum size α(G) of a stable set of
nodes is additive, and has finite connection rank (Godlin, Kotek and Makowsky [2008]). This is more difficult to prove. First, we split the rows of the matrix M(α, k) into 2^{k(k−1)/2} classes, according to the subgraph Hi of Fi induced by the labeled nodes. This splits the matrix M(α, k) into 2^{k(k−1)} submatrices, and it suffices to show that each of these has finite rank. So let us fix H1 and H2. Let I denote the set of stable sets of nodes in H1 ∪ H2, and let Fi′ = Fi \ [k] and Fi^S = Fi′ \ N_{Fi}(S).
For two k-labeled graphs F1 and F2 with Fi[k] = Hi, we have α(F1F2) = max_{S∈I} αS(F1F2), where

    αS(F1F2) = |S| + α(F1^S) + α(F2^S)

is the maximum size of a stable set in F1F2 intersecting [k] in S. The rank of the matrix (αS(F1F2)) is at most 3. Unfortunately, we cannot apply Lemma 4.6 directly, since αS(F1F2) is not bounded. But we can use that α(F1F2) ≥ α∅(F1F2) = α(F1′) + α(F2′), and hence those sets S for which α(F1^S) < α(F1′) − k or α(F2^S) < α(F2′) − k play no role in the maximum. In other words, we can replace αS by

    α′S(F1F2) = |S| + max{α(F1^S), α(F1′) − k} + max{α(F2^S), α(F2′) − k},

and still have that α(F1F2) = max_S α′S(F1F2). The matrices (α′S(F1F2)) have rank at most 3, and for different sets S the corresponding entries differ by at most 3k, so the same argument as in the proof of Lemma 4.6 implies that α has finite connection rank.
Example 4.12 (Node cover number). The minimum number of nodes covering
all edges, τ (G) = v(G) − α(G), has finite connection rank as well, since every
connection matrix of τ is the difference of the corresponding connection matrices
of the parameters v and α, which both have finite rank.
Example 4.13 (Number of stable sets). Let stab(G) denote the number of
stable sets in G. This parameter is multiplicative, and has finite connection rank:
this can be verified easily by distinguishing stable sets according to their intersection
with the set of labeled nodes.
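The multiplicativity of stab is easy to check by brute force (a Python sketch; the node/edge-list encoding is ours):

```python
from itertools import combinations

def stab(nodes, edges):
    """Number of stable (independent) sets of the graph, empty set included."""
    nodes = list(nodes)
    count = 0
    for r in range(len(nodes) + 1):
        for subset in combinations(nodes, r):
            chosen = set(subset)
            # a set is stable if it induces no edge
            if not any(u in chosen and v in chosen for u, v in edges):
                count += 1
    return count
```

A stable set of a disjoint union is a stable set in each part chosen independently, so stab of a disjoint union is the product of the stab values of the parts.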
Example 4.14 (Number of perfect matchings). Let pm(G) denote the number
of perfect matchings in the multigraph G. It is trivial that pm is multiplicative.
Let G be a k-labeled multigraph, let X ⊆ [k], and let pm(G, X) denote the
number of matchings in G that match all the unlabeled nodes and the nodes with
4.3. FINITE CONNECTION RANK 47
label in X, but not any of the other labeled nodes. Then we have for any two
k-labeled multigraphs G1 and G2
    pm(G1G2) = ∑_{X⊆[k]} pm(G1, X) pm(G2, [k] \ X).
Hence the matrix M (pm, k) can be written as the sum of 2k matrices of rank 1,
and its rank is at most 2k (it is not hard to see that in fact equality holds).
If we consider the matching number as a simple graph parameter (in terms of
multigraphs, this means that we don’t care which edge in a parallel class matches
a given pair of nodes), then the above argument has to be modified, to arrive at a
similar conclusion. The details of this are left to the reader as an exercise.
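For small cases, pm itself can be computed by a simple recursion (a Python sketch; the multigraph is our own encoding, an edge list in which parallel edges appear repeatedly):

```python
def pm(nodes, edges):
    """Number of perfect matchings of a multigraph given as an edge list
    (parallel edges repeat in the list; parallel edges give distinct matchings)."""
    nodes = sorted(nodes)
    if not nodes:
        return 1  # the empty graph has one (empty) perfect matching
    v = nodes[0]
    total = 0
    for a, b in edges:
        if v in (a, b):
            u = b if a == v else a
            if u == v:
                continue  # a loop cannot occur in a perfect matching
            rest = [n for n in nodes if n not in (v, u)]
            rest_edges = [(x, y) for x, y in edges
                          if x not in (v, u) and y not in (v, u)]
            total += pm(rest, rest_edges)
    return total
```

Note that pm counts matchings in the multigraph sense: a doubled edge between two nodes yields two perfect matchings, in line with the remark above about parallel classes.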
Example 4.15 (Number of Hamilton cycles). Let ham(G) be the number of
Hamilton cycles in G. For two k-labeled multigraphs G1 and G2 , every Hamilton
cycle H in G1 G2 defines a cyclic ordering (i1 , . . . , ik ) of the nodes in [k], and for
any two consecutive nodes ir and ir+1 , it defines an index jr ∈ [2] which tells us
whether the arc of H between ir and ir+1 uses G1 or G2 . Let us call the cyclic
ordering (i1 , . . . , ik ), together with the indices (j1 , . . . , jk ), the trace of H on the
labeled nodes. (If you are living in the set of labeled nodes, and cannot see farther
than a small neighborhood of the nodes, then the trace is all that you can see from
a Hamilton cycle.)
Given a possible trace T = (i1 , . . . , ik ; j1 , . . . , jk ), we denote by ham(Gj ; T ) the
number of systems of edge-disjoint paths in Gj which connect ir and ir+1 for all r
with jr = j, and which cover all unlabeled nodes in Gj . Then
    ham(G1G2) = ∑_T ham(G1; T) ham(G2; T),
showing that the rank of M (ham, k) is bounded by the number of possible traces
(which is 2k−1 (k − 1)! by standard combinatorial calculation).
Example 4.16 (Chromatic polynomial). Every substitution into the chromatic
polynomial chr(G, x) gives a graph parameter (see Appendix A.2). If we substitute
a nonnegative integer q for the variable x, we get the number of q-colorings, which
is a special case of homomorphism functions (to be discussed in the next Chapter).
What about evaluations at other values? The rank of the connection matrices
for the general case was determined by Freedman, Lovász and Welsh (see Lovász
[2006a]).
Let Bk denote the number of partitions of a k-set (the k-th Bell number), and
let Bk,q denote the number of its partitions into at most q parts.
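Both quantities are easy to tabulate with the standard recurrences (a Python sketch; the Bell numbers via B_{n+1} = ∑_i C(n, i) B_i, the restricted counts via Stirling numbers of the second kind — not from the text):

```python
from math import comb

def bell(k):
    """B_k, the number of partitions of a k-element set."""
    B = [1]  # B_0 = 1
    for n in range(k):
        B.append(sum(comb(n, i) * B[i] for i in range(n + 1)))
    return B[k]

def bell_restricted(k, q):
    """B_{k,q}, the number of partitions of a k-set into at most q parts."""
    # S[n][j] = Stirling number of the second kind: partitions into exactly j parts
    S = [[0] * (k + 1) for _ in range(k + 1)]
    S[0][0] = 1
    for n in range(1, k + 1):
        for j in range(1, n + 1):
            S[n][j] = j * S[n - 1][j] + S[n - 1][j - 1]
    return sum(S[k][j] for j in range(min(k, q) + 1))
```

For q ≥ k the restriction is vacuous and bell_restricted(k, q) equals bell(k).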
Proposition 4.17. For every fixed x, chr(., x) is a multiplicative graph parameter.
For every k ≥ 0,
    r(chr(., x), k) = Bk,x if x is a nonnegative integer, and Bk otherwise.
Furthermore, M (chr(., x), k) is positive semidefinite if and only if x is a nonnegative
integer or x ≥ k − 1.
Proof. We prove that the right hand side is an upper bound even for the
rank of the multi-connection matrix, and a lower bound for the rank of the simple
Note the nontrivial fact that the rank is always finite. If x is a nonnegative integer, then the connection rank is bounded by x^k, but otherwise, as a function of k, it grows faster than c^k for every c.
Example 4.18 (Tutte polynomial). The cluster expansion version cep(G; u, v)
of the Tutte polynomial generalizes the chromatic polynomial (see again Appendix
A.2), and it behaves similarly. It is not hard to show that for v ̸= 0,
    r(cep, k) = Bk,u if u is a nonnegative integer, and Bk otherwise
(the case v = 0 is trivial). Furthermore, cep(G; u, v) is reflection positive if and only
if u is a nonnegative integer. For other versions of the Tutte polynomial (e.g., tut)
similar conclusions hold, since they are related to cep by scaling and substitution
in the variables (except when the expressions we scale with are 0).
Example 4.19 (Number of spanning trees). The number of spanning trees
tree(G) of a graph G is obtained by substitution into the Tutte polynomial tut with
x = y = 1. Since u = (x − 1)(y − 1) = 0, this falls under the exception at the end of
the last example. Nevertheless, the arguments can be adjusted appropriately, and
we get that r(tree, k) = Bk .
We conclude with a couple of examples of parameters whose connection matri-
ces have infinite rank, but they are still “interesting”.
Example 4.20 (Maximum clique). The size of a maximum clique, ω(G), is
maxing. It does not have finite connection rank. In fact, consider the connection
matrix M (ω, 0), and its submatrix M whose rows and columns are indexed by
M (1P , k) indexed by G1 and G′1 are equal. This means that M (1P , k) has only a
finite number of different rows, and hence its rank is finite.
Corollary 4.23. Every nonnegative integer valued bounded minor-monotone multi-
graph parameter has finite connection rank.
Proof. Let f be such a parameter, and assume that f ≤ K. Then

    f(G) = 1(f(G) ≥ 1) + 1(f(G) ≥ 2) + · · · + 1(f(G) ≥ K).

Since the graph property that f(.) ≤ i is minor-closed, each parameter 1(f(G) ≤ i) has finite connection rank by Theorem 4.22; hence so does each 1(f(G) ≥ i) = 1 − 1(f(G) ≤ i − 1), and so does f.
4.3.3. Monadic second order formulas. To describe a very rich class of
graph properties with finite connection rank (at least for looped-simple graphs),
we consider properties defined by certain logical formulas. A first order formula
in graph theory is composed of primitives “x = y” and “x ∼ y”, using logical
operations “∧” (AND), “∨” (OR) and “¬” (NEGATION), and logical quantifiers
“∀” and “∃”. Every such formula, properly composed, with all variables quantified,
defines a property of looped simple graphs, if we interpret the quantified variables
as nodes, and the relation x ∼ y as x and y being adjacent. For example, the
property of being a 2-regular loopless graph can be expressed as
    (∀x)(∀y)(x = y ⇒ x ̸∼ y)
    ∧ (∀x)(∃y)(∃z)(y ̸= z ∧ x ∼ y ∧ x ∼ z ∧ (∀u)(x ∼ u ⇒ (u = y ∨ u = z)))
(to facilitate reading these formulas, we will use some standard conventions like
writing A ⇒ B instead of ¬A ∨ B and x ̸= y instead of ¬(x = y)).
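As a quick illustration, the property this formula defines can be checked directly on a concrete graph (a Python sketch; the node/edge-list encoding is ours):

```python
def is_2_regular_loopless(nodes, edges):
    """Evaluate the first-order property above: x = y implies x not~ y
    (no loops), and every node has exactly two distinct neighbors."""
    neighbors = {x: set() for x in nodes}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    no_loops = all(x not in neighbors[x] for x in nodes)
    two_regular = all(len(neighbors[x]) == 2 for x in nodes)
    return no_loops and two_regular
```

A cycle satisfies the formula; a path or a graph with a loop does not.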
First order formulas can define rather simple graph properties only, but we get
a real jump in generality if we allow quantifying over subsets of nodes and edges.
A monadic second order formula has three types of variables, which we distinguish
using different fonts. Lower case letters denote nodes, upper case letters denote
subsets of nodes, and upper case boldface letters denote subsets of the edges. The
primitives then also include x ∈ X and xy ∈ Y. We call the formula node-monadic
if quantifying over subsets of edges is not allowed.
This way we get a quite powerful language to express graph formulas, as the
following examples show (see also the exercises at the end of the section).
with b colors, and edge-colored with c colors. In other words, a gaudy graph is a
5-tuple (V, E, α, β, γ), where (V, E) is an underlying multigraph and α : [a] → V ,
β : V → [b] and γ : E → [c] are maps. The labels of a gaudy graph will play a
role different from the role labels play in most of our discussions, and we will call
them badges.
Isomorphism of two gaudy graphs of the same type is defined in the natural
way, as an isomorphism between the underlying graphs that preserves the badge as-
signment and coloring maps. A gaudy graph parameter of type (a, b, c) is a complex
valued function defined on gaudy graphs of type (a, b, c), invariant under isomor-
phism.
To extend the notion of connection matrices to gaudy graphs is a bit tedious
but necessary. Let f be a gaudy graph parameter of type (a, b, c). We will define
infinitely many connection matrices (just like in the case of ordinary graph parame-
ters), but these will not be indexed by just a number k, as for ordinary graphs, but
we will have a connection matrix M (f ; H, a1 , a2 ) for every gaudy graph H of type
(a0 , b, c), where a0 +a1 +a2 = a. Its rows and columns are indexed by gaudy graphs
G of type (a0 + a1 , b, c) and (a0 + a2 , b, c), respectively, with a fixed embedding of H
into G which preserves badges, node colors and edge colors. The product G1 G2 of
two such structures is obtained by taking their disjoint union and then identifying
the copies of H in them. The badges and colors are defined in G1 G2 in a natural
way. The entry of M (f ; H, a1 , a2 ) in row G1 and column G2 is f (G1 G2 ).
A gaudy graph parameter f of type (a, b, c) has finite connection rank if all
connection matrices M (f ; H, a1 , a2 ) have finite rank.
The two observations in Lemmas 4.5 and 4.6 and their proofs remain valid
for gaudy graphs. We now formulate more involved operations, manipulating the
badges and colors. Let f be a gaudy graph parameter of type (a + 1, b, c), and
define the gaudy graph parameter f ∗ of type (a, b, c) by
(4.7)    f∗(G; α, β, γ) = max_{α′: [a+1]→V(G), α′|[a] = α} f(G; α′, β, γ).
Let f be a gaudy graph parameter of type (a, 2b, c), and let φ : [2b] → [b]. Define
the gaudy graph parameter f ∗∗ of type (a, b, c) by
(4.8)    f∗∗(G; α, β, γ) = max_{β′: φ◦β′ = β} f(G; α, β′, γ).
The following lemma asserts that these operations preserve finite rank:
Lemma 4.28. If f is a gaudy graph parameter with finite connection rank and
finite range, then f ∗ , f ∗∗ and f ∗∗∗ have finite connection rank.
Proof. We describe the proof for f ∗ ; the proof for f ∗∗ and f ∗∗∗ is similar.
Consider a connection matrix M = M (f ∗ ; H, a1 , a2 ), where H is a gaudy graph
of type (a0 , b, c), and a0 + a1 + a2 = a − 1. Consider a general entry of M , defined
by a row index G1 and column index G2 , where G1 G2 = (G, α, β, γ):
    M_{G1,G2} = f∗(G1G2) = max_{α′|[a−1] = α} f(G; α′, β, γ).
We split this maximum into v(H) + 2 parts, according to α′(a) = v ∈ V(H), α′(a) ∈ V(G1) \ V(H) and α′(a) ∈ V(G2) \ V(H). This defines M = M(f∗; H, a1, a2) as the maximum of v(H) + 2 matrices Av (v ∈ V(H)), B̂ and Ĉ. By the same argument
as in the proof of Lemma 4.6, it suffices to prove that these matrices have finite
rank. We show that these matrices can be expressed by the connection matrices of
f in a way that finite rank is preserved.
First, each of the matrices Av (v ∈ V (H)) is a connection matrix for f itself
(with a new badge added to v), and so it has finite rank.
Second, each entry B̂_{G1G2} is obtained as the maximum of the entries N_{G′1G2} of the matrix N = M(f; H, a1 + 1, a2), where G′1 is obtained from G1 by attaching badge a to one additional node. Let L(G1) denote the set of these rows. This is a finite set but there is no common bound on its size. However, we may notice that for a fixed G1, the columns have a basis with at most rk(N) elements, and if two rows agree on these basis columns then they agree everywhere. The range of f is finite, say it consists of r elements; hence there are at most K = r^{rk(N)} different rows in N. Let L1(G1) ⊆ L(G1) be a maximal set of different rows.
Now we create K matrices B̂1, . . . , B̂K, which are of the same shape as B̂; the row of B̂i corresponding to G1 is the i-th row in L1(G1) (we repeat the last row if we run out of rows). These are all submatrices of N, so they have rank at most rk(N). Furthermore, B̂ is obtained by taking their maximum entry-by-entry, and hence it has finite rank by the argument used in the proof of Lemma 4.6.
It follows by a similar argument that C has finite rank.
Proof of Theorem 4.27. We prove the theorem more generally for gaudy
graph properties definable by a monadic second order formula F, by induction on
the number of quantifiers. If there are no quantifiers in F, then the assertion is
easy. Else, F can be written in one of the following forms:
    (∀x)Fx, (∃x)Fx, (∀S)FS, (∃S)FS, (∀Ψ)FΨ, (∃Ψ)FΨ.
Here Fx is a monadic second order formula in the language of gaudy graphs with
an additional badge x; FS is obtained from F using twice as many node colors
{1, . . . , 2b}, where color i means “colored i and not in S”, and color i + b means
“colored i and in S”. FΨ is defined analogously, splitting edge colors according
to containment in the edge subset Ψ. Replacing the property by its negation if
necessary, we can forget about the three versions starting with ∀ (see Exercise
4.37).
First, we consider the case when F = (∃x)Fx . The indicator function 1Fx of
the gaudy graph property defined by Fx has finite connection rank by the induction
hypothesis. For the indicator functions, we have 1F = 1∗Fx , so Lemma 4.28 implies
that 1F has finite connection rank. The two remaining cases follow similarly.
Exercise 4.30. Prove that for every isolate-indifferent graph parameter f , the
connection rank r(f, k) is a monotone non-decreasing function of k.
Exercise 4.31. Let χ̄(G) denote the minimum number of cliques in the graph G covering all nodes (the chromatic number of the complement). Prove that χ̄ has finite connection rank.
Exercise 4.32. Prove that the graph parameters 2^{α(G)} and 2^{τ(G)} are multiplicative and have finite connection rank.
Exercise 4.33. Prove that the multigraph parameter eul⃗(G) has infinite connection rank for k ≥ 2. Hint: for k = 2, consider the submatrix of M(eul⃗, k) formed by rows and columns indexed by (2i − 1)-bonds, i = 1, 2, . . . , n. Verify that this can be written as 2AA^T, where A is the n × n matrix with entries A_{ij} = (2i−1 choose i+j−1), a lower triangular matrix with 1-s in the diagonal.
Exercise 4.34. (a) Prove that no unbounded maxing graph parameter has finite
connection rank. (b) Show by an example that a bounded maxing parameter can
have finite connection rank.
Exercise 4.35. Prove that if P is a minor-closed property, then the property P̄ defined by P̄(G) = P(Ḡ) has finite connection rank.
Exercise 4.36. The Hadwiger number had(G) of a graph G is the largest n for which Kn is a minor of G. Prove that the parameter had is minor-monotone, but its connection rank is not finite.
Exercise 4.37. Prove that if a graph property has finite connection rank, then
so does its negation.
Exercise 4.38. Show that the following graph properties can be expressed by
monadic second order formulas: (a) G is connected; (b) G is a tree; (c) G is 3-
degenerate (i.e., its nodes can be ordered so that every node is connected to no
more than 3 earlier nodes); (d) G is Hamiltonian.
Exercise 4.39. Show that every minor-monotone graph property can be ex-
pressed by a node-monadic second-order formula.
Exercise 4.40. Show that the property that the number of nodes is even cannot be expressed by a monadic second order formula.
Exercise 4.41. Prove that (a) the property that G is Hamiltonian has finite
connection rank; (b) the property that the complement G is Hamiltonian does
not have finite connection rank. (c) The property that G is Hamiltonian can not
be expressed by a node-monadic second order formula.
Exercise 4.42. Show that the following graph properties cannot be expressed by
monadic second order formulas: (a) G has a nontrivial automorphism; (b) G has
a node-transitive automorphism group.
Exercise 4.43. Let f be an integer valued graph parameter with finite connection
rank, and let m be a positive integer. Prove that f mod m has finite connection
rank.
Exercise 4.44. Let f be a bounded graph parameter with finite connection rank. Let g : R → R be an arbitrary function. Prove that g(f(.)) has finite connection rank.
Exercise 4.45. Let f (G; x) be a graph parameter whose values are analytic
functions of a variable x (defined for |x| ≤ 1). Suppose that for every real x,
f (., x) has finite connection rank.
(a) Prove that the k-th connection rank of f (.; x) is uniformly bounded in x.
(b) Prove that (d/dx) f(.; x) has finite connection rank for all x.
Exercise 4.46. From a gaudy graph parameter f , define the new parameters f ′ ,
f ′′ and f ′′′ by replacing the “max” by “sum” in (4.7), (4.8) and (4.9). Prove that
if f has finite connection rank, then so do f ′ , f ′′ and f ′′′ (no finiteness assumption
for the range is needed).
CHAPTER 5
Graph homomorphisms
Exercise 5.1. Verify that allowing looped-simple graphs would not give any
interesting new cases of the homomorphism existence problem.
Exercise 5.2. Let C⃗n denote the directed n-cycle, P⃗n, the directed path on n nodes, and K⃗n, the transitive tournament on n nodes. For any cycle or path in a digraph G, define its gain as the difference between the numbers of forward and backward edges (if the cycle or path is traversed in the opposite direction, this number changes sign). Prove that
(a) G → C⃗n if and only if the gain of every cycle is a multiple of n;
(b) G → P⃗n if and only if the gain of every cycle is 0 and the gain of every path is bounded by n − 1;
(c) G → K⃗n if and only if P⃗n+1 ̸→ G.
Exercise 5.3. Prove that G → C⃗n, G → P⃗n and G → K⃗n are polynomial time decidable.
56 5. GRAPH HOMOMORPHISMS
and

(5.2)    homφ(F, G) = ∏_{uv∈E(F)} βφ(u)φ(v)(G).

We define

(5.3)    hom(F, G) = ∑_{φ: V(F)→V(G)} αφ homφ(F, G),
5.2. HOMOMORPHISM NUMBERS 57
and

(5.4)    inj(F, G) = ∑_{φ: V(F)→V(G), φ injective} αφ homφ(F, G).
For these definitions to make sense, αv (G) and βuv (G) can be from any com-
mutative ring; we will, however, never need any field other than R and C, and most
of the time αv (G) will be positive and βuv (G) real, and often itself positive.
This definition of hom(F, G) makes sense if F is a multigraph and G is a
weighted graph. If G is an (unweighted) multigraph, then we can consider the
weighted simple graph G′ in which each edge is weighted by its multiplicity in G.
Then hom(F, G) = hom(F, G′ ) (in the node-and-edge sense).
One can also define hom(F, G) when both F and G are weighted, provided
these weights satisfy some reasonable conditions. Let us give the formula first. To
every map φ : V(F) → V(G), we define the weights

(5.5)    αφ = ∏_{u∈V(F)} α_{φ(u)}(G)^{α_u(F)},

and

(5.6)    homφ(F, G) = ∏_{uv∈E(F)} β_{φ(u)φ(v)}(G)^{β_{uv}(F)}.

We then define

(5.7)    hom(F, G) = ∑_{φ: V(F)→V(G)} αφ homφ(F, G).
The exponential β_{φ(u)φ(v)}(G)^{β_{uv}(F)} may not be well defined. Mostly (and even this
is not very often), we will need this definition when the nodeweights and edgeweights
of F are nonnegative integers; then (with the usual convention that 00 = 1) the
definition is meaningful. Another case when the homomorphism number is well
defined is when all the edgeweights in G are positive.
Note that in the case when F is an unweighted multigraph, we can replace it
with a weighted graph on the same set of nodes where the nodeweights are 1 and
the edgeweights are equal to the corresponding multiplicities in F . This does not
change the homomorphism numbers hom(F, G).
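The unweighted definition of hom(F, G) is easy to check computationally. Below is a minimal brute-force sketch (not from the book; the helper names are ours) that counts homomorphisms of a simple graph F into a simple graph G by enumerating all maps V(F) → V(G):

```python
from itertools import product

def hom(F_edges, nF, G_edges, nG):
    """Number of adjacency-preserving maps V(F) -> V(G) between simple graphs."""
    adj = [[False] * nG for _ in range(nG)]
    for u, v in G_edges:
        adj[u][v] = adj[v][u] = True
    return sum(
        all(adj[phi[u]][phi[v]] for u, v in F_edges)
        for phi in product(range(nG), repeat=nF)
    )

K2, K3 = [(0, 1)], [(0, 1), (1, 2), (0, 2)]
print(hom(K2, 2, K3, 3))  # 6: every ordered pair of distinct nodes of K3
print(hom(K3, 3, K3, 3))  # 6: the six bijections
```

Of course, the running time is v(G)^{v(F)}, so this is only a tool for experimenting with very small graphs.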
Signed graphs. There is a convenient way to treat conditions on preservation of
edges and also preservation of non-edges together. Let F and G be simple graphs,
where F = (V, E+ , E− ) is signed. We define hom(F, G) as the number of maps
V (F ) → V (G) where edges in E+ must be mapped onto adjacent pairs, and edges
in E− must be mapped onto nonadjacent pairs. The quantity inj(F, G) is defined
analogously. If F is an unsigned simple graph, we construct the signed graph Fb
from the complete graph on V (F ) by signing the edges of F positive and the edges
not in F negative. Then for every simple graph G, ind(F, G) = hom(F̂, G). Extension of this formula to the case when F is itself weighted or signed is left to the reader.
and
t(F, H) = hom(F, H) / αH^{v(F)}.
Note that t(F, H) = hom(F, H 0 ), where H 0 is obtained from H by dividing all
node weights by αH , so that αH 0 = 1. The nodeweights in H 0 form a probability
distribution, and t(F, H) = hom(F, H 0 ) is the expectation of homφ (F, H), where φ
is the random map V (F ) → V (H) in which the image of each v ∈ V (F ) is chosen
independently from the distribution α(H0 ).
(5.14) t*(F, G) = hom(F, G) / v(G),
which we consider for connected graphs F . We call this the homomorphism fre-
quency of F in G, to distinguish it from the homomorphism densities that are used
in the dense case.
We can interpret the homomorphism frequencies as follows. Let us label any
node of F by 1, to get a 1-labeled graph F1 . For v ∈ V (G), the quantity homv (F1 , G)
denotes the number of homomorphisms φ of F1 into G with φ(1) = v. Now we select
a uniform random node v of G. Then t∗ (F, G) is the expectation of homv (F1 , G).
We can interpret the injective and induced homomorphism frequencies
t*inj(F, G) = inj(F, G)/v(G),   t*ind(F, G) = ind(F, G)/v(G)
similarly.
For general (not necessarily connected) bounded degree graphs, the order of magnitude of hom(F, G) (where F is fixed and v(G) tends to infinity) is v(G)^{c(F)}, where c(F) is the number of connected components of F. But since hom(F, G) is multiplicative over the connected components of F, we don't lose any information if we restrict the definition of t*(F, G) to connected graphs F.
Remark 5.4. Normalizing homomorphism densities as above is not the only rea-
sonable choice. For example, if F is a bipartite graph, then hom(F, G) will be
positive for graphs G with at least one edge, and we may be interested in the order
of magnitude of, say, hom(C4 , G), given hom(K2 , G). Hence we might look at quo-
tients log hom(F, G)/ log v(G), or more generally, log hom(F1 , G)/ log hom(F2 , G).
Such quantities were studied by Kopparty and Rossman [2011] and Nešetřil and
Ossona de Mendez [2011]. However, it is fair to say that this interesting area is
largely unexplored.
where F ′ ranges over all simple graphs obtained from F by adding edges, and
(5.16) hom(F, G) = ∑_P inj(F/P, G),
where P ranges over all partitions of V (F ), and F/P is the simple quotient graph.
Conversely, ind can be expressed by inj using inclusion-exclusion:
(5.17) ind(F, G) = ∑_{F′⊇F, V(F′)=V(F)} (−1)^{e(F′)−e(F)} inj(F′, G).
We can also express this using the Möbius inverse (recall the definitions (4.1)–(4.3)):
ind(., G) = inj↑ (., G).
The inj function, in turn, can be expressed by hom, by considering the values
inj(F ′ , G) in the equations (5.16) as unknowns and solving the system. To give an
explicit expression, we use the Möbius inverse of the partition lattice:
(5.18) inj(F, G) = ∑_P µP hom(F/P, G),
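Identity (5.17) is easy to verify numerically on small graphs. The sketch below (brute force; the helper names are ours, not the book's) computes ind(F, G) both directly and via the inclusion-exclusion sum over supergraphs F′ of F on the same node set:

```python
from itertools import combinations, permutations

def inj(F_edges, nF, adjG):
    # injective adjacency-preserving maps
    return sum(all(adjG[p[u]][p[v]] for u, v in F_edges)
               for p in permutations(range(len(adjG)), nF))

def ind(F_edges, nF, adjG):
    # injective maps whose image induces exactly F
    E = {frozenset(e) for e in F_edges}
    return sum(all(adjG[p[u]][p[v]] == (frozenset((u, v)) in E)
                   for u, v in combinations(range(nF), 2))
               for p in permutations(range(len(adjG)), nF))

def ind_by_inclusion_exclusion(F_edges, nF, adjG):
    # right-hand side of (5.17): alternating sum over supergraphs of F
    E = {frozenset(e) for e in F_edges}
    non = [e for e in combinations(range(nF), 2) if frozenset(e) not in E]
    return sum((-1) ** r * inj(list(F_edges) + list(extra), nF, adjG)
               for r in range(len(non) + 1)
               for extra in combinations(non, r))

# the path P3 counted in the 5-cycle C5
C5 = [[False] * 5 for _ in range(5)]
for u, v in [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]:
    C5[u][v] = C5[v][u] = True
P3 = [(0, 1), (1, 2)]
print(ind(P3, 3, C5), ind_by_inclusion_exclusion(P3, 3, C5))  # 10 10
```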
For t and tinj the relationship is not so simple, due to the different normalizations
in their definitions, but recalling that mostly we are interested in large graphs G,
the following inequality is usually enough to relate them:
(5.21) |tinj(F, G) − t(F, G)| ≤ (1/v(G)) \binom{v(F)}{2}.
(The proof of this inequality is left to the reader as an exercise.) It follows that (for
large graphs G, when the error in (5.21) is negligible) subgraph sampling provides
the same information as any of the homomorphism densities t, tinj or tind .
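As a sanity check (a brute-force sketch with our own helper names), the bound (5.21) is attained with equality for F = K2 and G = K3, where t = 2/3, tinj = 1, and the bound is 1/3:

```python
from itertools import product, permutations
from math import comb, perm

def densities(F_edges, nF, G_edges, nG):
    adj = [[False] * nG for _ in range(nG)]
    for u, v in G_edges:
        adj[u][v] = adj[v][u] = True
    ok = lambda phi: all(adj[phi[u]][phi[v]] for u, v in F_edges)
    t = sum(map(ok, product(range(nG), repeat=nF))) / nG ** nF
    tinj = sum(map(ok, permutations(range(nG), nF))) / perm(nG, nF)
    return t, tinj

t, tinj = densities([(0, 1)], 2, [(0, 1), (1, 2), (0, 2)], 3)
print(t, tinj, abs(tinj - t) <= comb(2, 2) / 3)  # 0.666... 1.0 True
```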
Complementation. Taking induced subgraphs commutes with complementation,
which implies
(5.22) ind(F, G) = ind(F̄, Ḡ).
Counting maps into the complement of a graph can be expressed, via inclusion-
exclusion, by numbers of maps into the graph itself. Applying this idea to injective
homomorphisms, we get the following identities for every simple graph F :
(5.23) hom(F, Ḡ) = ∑_{F′⊆F, V(F′)=V(F)} (−1)^{e(F′)} hom(F′, G)
and
(5.24) inj(F, Ḡ) = ∑_{F′⊆F, V(F′)=V(F)} (−1)^{e(F′)} inj(F′, G).
We get similar expressions for the induced homomorphism numbers ind. We can
also express this in terms of homomorphism densities:
(5.27) tinj(F, G) = (1 / \binom{n}{t}) ∑_{S∈\binom{V(G)}{t}} tinj(F, G[S]).
Graph operations. If F1 and F2 are node-disjoint, then
(5.28) hom(F1 ∪ F2 , G) = hom(F1 , G)hom(F2 , G).
If F is connected and G1 and G2 are node-disjoint, then
(5.29) hom(F, G1 ∪ G2 ) = hom(F, G1 ) + hom(F, G2 ).
About homomorphisms into a product, we have
(5.30) hom(F, G1 × G2 ) = hom(F, G1 )hom(F, G2 ).
All these identities are straightforward to verify.
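Identities (5.28) and (5.30) can be confirmed by brute force (a sketch; the helper names are ours). In the categorical product G1 × G2 the nodes are pairs, with (u1, u2)(v1, v2) an edge iff u1v1 ∈ E(G1) and u2v2 ∈ E(G2):

```python
from itertools import product

def hom(F_edges, nF, adjG):
    n = len(adjG)
    return sum(all(adjG[p[u]][p[v]] for u, v in F_edges)
               for p in product(range(n), repeat=nF))

def adj(edges, n):
    A = [[False] * n for _ in range(n)]
    for u, v in edges:
        A[u][v] = A[v][u] = True
    return A

def cat_product(A1, A2):
    # node (i1, i2) of G1 x G2 is encoded as i1*len(A2) + i2
    return [[A1[i1][j1] and A2[i2][j2]
             for j1 in range(len(A1)) for j2 in range(len(A2))]
            for i1 in range(len(A1)) for i2 in range(len(A2))]

P3, K3, C4 = [(0, 1), (1, 2)], [(0, 1), (1, 2), (0, 2)], [(0, 1), (1, 2), (2, 3), (0, 3)]
A1, A2 = adj(K3, 3), adj(C4, 4)
# (5.30): hom into the categorical product multiplies
print(hom(P3, 3, cat_product(A1, A2)) == hom(P3, 3, A1) * hom(P3, 3, A2))  # True
# (5.28): hom from a disjoint union multiplies (shift C4's labels past K3's)
F_union = K3 + [(u + 3, v + 3) for u, v in C4]
print(hom(F_union, 7, A1) == hom(K3, 3, A1) * hom(C4, 4, A1))  # True
```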
5.2.4. Homomorphism numbers and sampling. Our basic way of obtain-
ing information about a graph is sampling (Section 1.3.1): subgraph sampling in
the dense case and neighborhood sampling in the bounded degree case. Homo-
morphism densities and frequencies carry the same information as the appropriate
sample distributions. For the dense case, the connection is straightforward:
Proposition 5.5. For two simple graphs F and G, tind(F, G) is the probability that v(F) nodes of G, sampled ordered and without repetition, induce the graph F (with a fixed labeling of the nodes).
In the bounded degree case, homomorphism frequencies contain the same in-
formation as the distribution of neighborhood samples, but the proof of this equiv-
alence is a bit trickier. Let us recall that ρG,r is a probability distribution on rooted
r-balls: ρG,r (B) denotes the probability that selecting a uniform random node of
the graph G, its neighborhood of radius r is isomorphic with the ball B.
Proposition 5.6. Let us fix an upper bound D for the degrees of the graphs we
consider.
(a) Each density t∗ (F, G) can be expressed as a linear combination (with co-
efficients independent of G) of the neighborhood sample densities ρG,r with r =
v(F ) − 1.
(b) For every r ≥ 0 there are a finite number of connected simple graphs
F1 , . . . , Fm such that ρG,r can be expressed as a linear combination (with coeffi-
cients independent of G) of the densities t∗ (Fi , G).
Proof. (a) From the interpretation of t*(F, G) given above, we see that it can be obtained as the expectation of homu→v(F, B), where u is any fixed node of F, and B is a random ball from the neighborhood sample distribution ρG,r, with center v and radius r = v(F) − 1. This gives the formula
t*(F, G) = ∑_B ρG,r(B) homu→v(F, B),
where the summation extends over all possible r-balls.
(b) To compute the neighborhood sample distributions from the quantities
t∗ (F, G), we first express the quantities t∗inj (F, G) via inclusion-exclusion. By a
similar argument, we can express the induced densities t∗ind (F, G). (Since we are
normalizing by v(G) in all cases, we avoid here the difficulty we had in the dense
case with expressing tinj by t.)
Next, we count copies of F in G where we also prescribe the degree of each
node of F in the whole graph G. To be precise, we consider graphs F together with
maps δ : V (F ) → {0, . . . , D}, and we determine the numbers
t*ind(F, δ, G) = ind(F, δ, G) / v(G),
where ind(F, δ, G) is the number of injections φ : V(F) → V(G) which embed F in G as an induced subgraph, so that the degree of φ(v) is δ(v). This is again done by an inclusion-exclusion argument.
For a ball B of radius r, we have
ρG,r(B) = ∑_δ t*ind(B, δ, G) / aut(B),
where the summation extends over all functions δ which assign the degree in B to
each node of B at distance less than r from the root, and an arbitrary integer from
[D] to those nodes at distance r. This proves that homomorphism densities and
neighborhood sampling are equivalent.
Exercise 5.7. Find formulas similar to (5.16) and (5.18), relating hom and surj.
Exercise 5.8. Which of the relations (5.15)–(5.30) generalize to weighted graphs?
Example 5.11 (Cycles and spectrum). If Ck denotes the cycle on k nodes, then
hom(Ck , G) is the trace of the k-th power of the adjacency matrix of the graph G.
In other words,
(5.31) hom(Ck, G) = tr(A^k) = ∑_{i=1}^n λi^k,
where λ1, . . . , λn are the eigenvalues of the adjacency matrix of G. Knowing this homomorphism number for sufficiently many values of k, the eigenvalues of G can be recovered; eigenvalues with large absolute value are easier to express. For example, hom(C2k, G)^{1/(2k)} tends to the largest eigenvalue of G as k → ∞.
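A quick numerical illustration (a sketch using numpy, not from the book): for K3 the eigenvalues are 2, −1, −1, so hom(C3, K3) = 2³ + (−1)³ + (−1)³ = 6, and hom(C2k, K3)^{1/(2k)} approaches the largest eigenvalue 2:

```python
import numpy as np

A = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])  # adjacency matrix of K3

def hom_cycle(k, A):
    # (5.31): hom(C_k, G) = tr(A^k)
    return int(round(np.trace(np.linalg.matrix_power(A, k))))

print(hom_cycle(3, A))               # 6
print(hom_cycle(20, A) ** (1 / 20))  # ~2.0, the largest eigenvalue
```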
Several important graph parameters can be expressed in terms of homomor-
phisms into fixed “small” graphs.
Example 5.12 (Colorings). If Kq denotes the complete graph with q nodes (no
loops), then hom(G, Kq ) is the number of colorings of the graph G with q colors,
satisfying the usual condition that adjacent nodes must get different colors.
Example 5.13 (Stable sets). Let H be obtained from K2 by adding a loop at one of the nodes. Then hom(G, H) is the number stab(G) of stable sets of nodes in G.
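Both examples are easy to check by enumeration (a sketch; the helper names are ours). For the triangle K3 there are 3! = 6 proper 3-colorings and 4 stable sets (the empty set and the three singletons):

```python
from itertools import product

triangle = [(0, 1), (1, 2), (0, 2)]

def colorings(G_edges, n, q):
    # hom(G, K_q): proper q-colorings (Example 5.12)
    return sum(all(c[u] != c[v] for u, v in G_edges)
               for c in product(range(q), repeat=n))

def stab(G_edges, n):
    # hom(G, H) with H = K2 plus a loop at node 0 (Example 5.13): a map is a
    # homomorphism iff no edge has both ends on the loopless node 1,
    # i.e. the preimage of 1 is a stable set.
    return sum(all(not (s[u] and s[v]) for u, v in G_edges)
               for s in product((0, 1), repeat=n))

print(colorings(triangle, 3, 3), stab(triangle, 3))  # 6 4
```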
Example 5.14 (Eulerian property). Recall that a graph is eulerian if all degrees
are even. For every loopless graph G, let Eul(G) = 1(G is eulerian). This 0-1 valued
graph parameter can be represented as a homomorphism function hom(., H), where
H = (a, B) is a weighted graph with two nodes, given by
a = (1/2, 1/2),   B = ( 1 −1 ; −1 1 ).
This was first noted by de la Harpe and Jones [1993]. (By Theorem 5.54 it will
follow that this function is reflection positive, and r(Eul, k) ≤ 2k .)
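This weighted homomorphism number can be evaluated directly from definition (5.7), with the left graph unweighted (a sketch; the helper names are ours). The triangle is eulerian and gets value 1, while a single edge is not and gets 0:

```python
from itertools import product

def weighted_hom(G_edges, n, alpha, B):
    # definition (5.7): sum over all maps of nodeweight and edgeweight products
    total = 0.0
    for phi in product(range(len(alpha)), repeat=n):
        w = 1.0
        for v in range(n):
            w *= alpha[phi[v]]
        for u, v in G_edges:
            w *= B[phi[u]][phi[v]]
        total += w
    return total

a, B = [0.5, 0.5], [[1, -1], [-1, 1]]
print(weighted_hom([(0, 1), (1, 2), (0, 2)], 3, a, B))  # 1.0 (eulerian)
print(weighted_hom([(0, 1)], 2, a, B))                  # 0.0 (not eulerian)
```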
Example 5.15 (Nowhere-zero flows). The number of nowhere-zero q-flows is denoted by flo(G, q) (see Appendix A.2). The choice q = 2 gives the special case in Example 5.14 (indicator function of eulerian graphs). The parameter flo can be described as a homomorphism function, as will be demonstrated in greater generality in the next example.
Example 5.16 (S-Flows). Let Γ be a finite abelian group (written additively),
and let S ⊆ Γ be a subset such that −S = S, and let G be a graph. An S-flow is an assignment of an element f(uv) ∈ S to each edge uv with a specified orientation such that f(uv) = −f(vu) for each edge, and ∑_{u∈N(v)} f(uv) = 0 for each node v.
Let flo(G; Γ, S) denote the number of S-flows. The special case when Γ = Zq and
S = Γ \ {0} gives the number of nowhere zero q-flows in Example 5.15.
For a fixed Γ and S, the graph parameter flo(G; Γ, S) is defined in terms of
mappings from the edge set; it is therefore surprising that it can be described as a
homomorphism number (which is defined via a function on the node set). Let Γ∗
be the character group of Γ, and let H be the complete looped directed graph on
Γ∗ . Let αχ = 1/|Γ| for each χ ∈ Γ∗ , and let
βχ,χ′ = ∑_{s∈S} χ(−s) χ′(s),
Note that stab(G, 1, . . . , 1) = stab(G) = hom(G, H), where H is the graph on two
adjacent nodes, with a loop at one of them (all weights being 1).
We have seen that both the chromatic polynomial and the stable set polynomial
can be expressed, at least for special substitutions, as homomorphism numbers. We
show that, conversely, homomorphism numbers between graphs can be expressed in
terms of the stable set polynomial and also in terms of the chromatic polynomials
of related graphs. Our first lemma expresses the logarithm of the stable set poly-
nomial in terms of the coefficient of the linear term of the chromatic polynomial
(see Appendix A.2 for the relevant definitions). We need a natural extension of
the notion for an induced subgraph: For a multiset Z of nodes, let G[Z] denote
the graph whose nodes are the elements of the multiset Z, and two of them are
adjacent iff the corresponding nodes of G are adjacent.
Lemma 5.20. Assuming that the series below is absolutely convergent, we have
(5.35) ln stab(G, x) = ∑_{m=1}^∞ ((−1)^m / m!) ∑_{v1,...,vm∈V(G)} cri(G[v1, . . . , vm]) xv1 · · · xvm.
Proof. Let I+ denote the set of non-empty stable subsets of G. Writing stab(G, x) = 1 + ∑_{A∈I+} xA, we get
(5.36) stab(G, x)^y = (1 + ∑_{A∈I+} xA)^y = 1 + ∑_{k=1}^∞ \binom{y}{k} (∑_{A∈I+} xA)^k = 1 + ∑_{k=1}^∞ \binom{y}{k} ∑_{A1,...,Ak∈I+} xA1 · · · xAk.
(5.36) is chr0 (G[Z], k)/(m1 ! . . . mr !). We get a nicer formula if instead of multi-
sets, we sum over sequences (v1 , . . . , vm ) of nodes. Then our multiset Z is counted
m!/(m1 ! . . . mr !) times, so we have to divide by this, to get that the contribution
of a sequence (v1 , . . . , vm ) is
(1/m!) chr0(G[v1, . . . , vm], k) xv1 · · · xvm,
and summing over k, we get that the contribution of (v1, . . . , vm) is
∑_{k=1}^∞ \binom{y}{k} (1/m!) chr0(G[v1, . . . , vm], k) xv1 · · · xvm = (1/m!) chr(G[v1, . . . , vm], y) xv1 · · · xvm.
where the product extends over all connected components of G′ with at least one
edge. These components form a stable set in L(Conn(G)), and vice versa, every
stable set in L(Conn(G)) corresponds to a subgraph G′. Hence the last sum is just stab(L(Conn(G)), t).
Combining this lemma with the previous one, we get a very useful relationship
between homomorphism densities and the chromatic invariant.
Corollary 5.22. Assuming the series below is absolutely convergent, we have
ln t(G, H) = ∑_{m=1}^∞ ((−1)^m / m!) ∑_{F1,...,Fm∈Conn(G)} cri(L(F1, . . . , Fm)) ∏_{j=1}^m t(Fj, H).
5.3. WHAT HOM FUNCTIONS CAN EXPRESS 67
What about all these assumptions about convergence? In fact, can we take
the logarithm in Lemma 5.20 at all? A fundamental result about the roots of the
stable set polynomial, Dobrushin’s Theorem [1996], gives us a sufficient condition
for this. Dobrushin’s Theorem has many formulations in the literature, which are more-or-less equivalent (but not quite), and we choose one that is convenient for our purposes; see e.g. Scott and Sokal [2006] and Borgs [2006].
Theorem 5.23 (Dobrushin’s Theorem). Let G = (V, E) be a simple graph, and
let z ∈ C^V and b ∈ R_+^V satisfy
(5.39) ∑_{j∈{i}∪N(i)} |zj| e^{bj} ≤ bi
and
(5.41) ln tree(G) = (n − 1) ln D − ln n − ∑_{r=1}^∞ (hom(Cr, G) − D^r) / (r D^r).
The formula for the number of trees extends easily to non-regular graphs, since
we can add loops to the nodes to make the graph regular, and adding loops does
not change the number of spanning trees (adding a loop to a node increases its
degree by 1 in this case). This expression seems to have been first formulated by
Lyons [2005].
Using (5.40),
ln det(yI + DI − A) = n ln(y + D) − ∑_{r=1}^∞ hom(Cr, G) / (r(y + D)^r)
  = n ln(y + D) − ∑_{r=1}^∞ (hom(Cr, G) − D^r) / (r(y + D)^r) + ln (y / (y + D))
  = (n − 1) ln(y + D) − ∑_{r=1}^∞ (hom(Cr, G) − D^r) / (r(y + D)^r) + ln y.
Substituting this in the formula for ln tree(G) and letting y → 0, we get (5.41).
We see from the proof of Theorem 5.29 that in fact G is determined by the
values hom(F, G) where v(F ) ≤ v(G), as well as by the values hom(G, F ) where
v(F ) ≤ v(G). It is a long-standing open problem whether, up to trivial exceptions,
strictly smaller graphs F are enough:
Conjecture 5.30 (Reconstruction Conjecture). If G is a simple graph with
v(G) ≥ 3, then the numbers hom(F, G) with v(F ) < v(G) determine G.
There is a weaker version, which is also unsolved:
Conjecture 5.31 (Edge Reconstruction Conjecture). If G is a simple graph
with e(G) ≥ 4, then the numbers hom(F, G) with e(F ) < e(G) determine G.
It is known that the Edge Reconstruction Conjecture holds for graphs G with e(G) ≥ v(G) log v(G) (Müller [1977]). We will prove an “approximate” version of the Reconstruction Conjecture (Theorem 10.32): for an arbitrarily large graph G, the numbers hom(F, G) with v(F) ≤ k determine G up to an error of O(1/√(log k)) (measured in the cut distance, which was mentioned in the Introduction but will be formally defined in Chapter 8). Unfortunately, this does not seem to bring us closer to the resolution of the Reconstruction Conjecture.
The normalized homomorphism density function t(., G) does not determine a simple graph G: if G(p) is obtained from G by replacing every node by p twin nodes, then t(F, G(p)) = t(F, G). But this is all that can go wrong:
Theorem 5.32. If G1 and G2 are simple graphs such that t(F, G1) = t(F, G2) for every simple graph F, then there is a third simple graph G and positive integers p1, p2 such that G1 ≅ G(p1) and G2 ≅ G(p2).
Proof. Let ni = v(Gi), and consider the blowups G′1 = G1(n2) and G′2 = G2(n1). These have the same number of nodes, and hence t(F, G′1) = t(F, G1) = t(F, G2) = t(F, G′2) implies that hom(F, G′1) = hom(F, G′2). So by Theorem 5.29, we have G′1 ≅ G′2. It follows that the number of elements in every class of twin nodes of G′1 ≅ G′2 is divisible by both n1 and n2, and so it is also divisible by m = lcm(n1, n2). So G′1 ≅ G′2 ≅ G(m) for some simple graph G, and hence pi = m/ni satisfies the requirements of the theorem.
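The blow-up invariance t(F, G(p)) = t(F, G) used above is easy to confirm numerically (a sketch; the helper names are ours). Each node v of G becomes p twin copies, with copies of u and v adjacent iff uv ∈ E(G):

```python
from itertools import product

def t(F_edges, nF, G_edges, nG):
    adj = [[False] * nG for _ in range(nG)]
    for u, v in G_edges:
        adj[u][v] = adj[v][u] = True
    h = sum(all(adj[p[u]][p[v]] for u, v in F_edges)
            for p in product(range(nG), repeat=nF))
    return h / nG ** nF

def blowup(G_edges, nG, p):
    # copy i of node v is encoded as v*p + i; twins stay non-adjacent
    return [(u * p + i, v * p + j)
            for u, v in G_edges for i in range(p) for j in range(p)], nG * p

P3, C4 = [(0, 1), (1, 2)], [(0, 1), (1, 2), (2, 3), (0, 3)]
G2, n2 = blowup(C4, 4, 3)
print(t(P3, 3, C4, 4), t(P3, 3, G2, n2))  # 0.25 0.25
```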
For weighted graphs, one must be a little careful. Let H be a weighted graph
and let H ′ be obtained from H by twin reduction. Then hom(F, H ′ ) = hom(F, H)
for every multigraph F , even though H and H ′ are not isomorphic. Restricting
our attention to twin-free graphs, we have an analogue of Theorem 5.29 (Lovász
[2006b]):
where we sum over all H-colored graphs F′ on at most v(F) nodes, with appropriate
coefficients µ(F, F′ ). Let us add the easy identity
(5.47) (F × G)H ≅ FH × GH.
From G1 × H ≅ G2 × H it follows that (G1 × H)H ≅ (G2 × H)H (as H-colored graphs). By (5.47), this implies that (G1)H × HH ≅ (G2)H × HH, and hence by (5.45),
hom(F, (G1 )H )hom(F, HH ) = hom(F, (G2 )H )hom(F, HH ).
But notice that hom(F, HH ) > 0: if F = (F, σ), then (σ, σ) is a homomorphism
F → HH . Thus we can divide by hom(F, HH ) to get
hom(F, (G1 )H ) = hom(F, (G2 )H )
for every H-colored graph F. From here (G1)H ≅ (G2)H follows just like in the proof of Theorem 5.29.
Proof of Theorem 5.37. By Proposition 5.35, we may assume that H is an
odd cycle with V (H) = [2r + 1] and E(H) = {ij : j ≡ i + 1 (mod 2r + 1)}. By
Lemma 5.38 there exist bijections φ1 , . . . , φ2r+1 : V (G1 ) → V (G2 ) such that for
every ij ∈ E(H), φi (u)φj (v) ∈ E(G2 ) if and only if uv ∈ E(G1 ). (Note that this
means a different condition if we interchange i and j.)
We show that φ1 is an isomorphism between G1 and G2 . Indeed, we have
φ1(u)φ1(v) ∈ E(G2) ⟺ φ2^{−1}(φ1(u)) v ∈ E(G1) ⟺ φ1(u)φ3(v) ∈ E(G2)
⟺ φ4^{−1}(φ1(u)) v ∈ E(G1) ⟺ φ1(u)φ5(v) ∈ E(G2)
⟺ . . . ⟺ φ1(u)φ2r+1(v) ∈ E(G2) ⟺ uv ∈ E(G1).
This completes the proof.
Remark 5.39. You may have noticed that the proof of Lemma 5.38 followed the
lines of the proof of Proposition 5.35, only restricting the notion of homomorphisms
to those respecting the H-coloring. This suggests that there is a more general
formulation for categories. This is indeed the case, as we will see in Section 23.4.
A further natural question about multiplication is whether prime factorization
is unique. This is clearly a stronger property than the Cancellation Law, so let us
restrict our attention to the strong product, which satisfies the Cancellation Law.
The following example shows that prime factorization is not unique in general. We
start with an algebraic identity:
(5.48) (1 + x + x2 )(1 + x3 ) = (1 + x)(1 + x2 + x4 ).
If we substitute any connected graph G for x, and interpret “+” as disjoint union,
we get a counterexample. For example,
(5.49) (K1 ∪ K2 ∪ K4) ⊠ (K1 ∪ K8) = (K1 ∪ K2) ⊠ (K1 ∪ K4 ∪ K16).
But there is a very nice positive result of Dörfler and Imrich [1970] and McKen-
zie [1971]. (The proof uses different techniques, and we don’t reproduce it here.)
Theorem 5.40. Prime factorization is unique for the strong product of connected
graphs.
Exercise 5.41. (a) Prove that the strong product of two graphs is connected if
and only if both graphs are connected. (b) Show by an example that the categor-
ical product of two connected graphs is not always connected. (c) Characterize
all counterexamples in (b).
Exercise 5.42. Given two looped-simple digraphs F and G, we define the digraph G^F as follows: V(G^F) = V(G)^{V(F)}, E(G^F) = {(φ, ψ) : φ, ψ ∈ V(G)^{V(F)}, (φ(u), ψ(v)) ∈ E(G) ∀(u, v) ∈ E(F)}. (a) Prove the following identities:
(G1 × G2)^F ≅ G1^F × G2^F,   G^{F1×F2} ≅ (G^{F1})^{F2},   G^{F1 F2} ≅ G^{F1} × G^{F2}.
(b) Show that hom(F, G) is the number of loops in G^F. (c) Prove that if adjacency is symmetric both in G and in F, then it is also symmetric in G^F (Lovász [1967]).
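A small numerical check of part (b) (a sketch; the helper names are ours): we build the edge set of G^F explicitly and compare the number of loops with a direct count of homomorphisms, for F a single directed edge and G the directed 3-cycle:

```python
from itertools import product

def power_graph_edges(F_edges, nF, G_edges, nG):
    # edges of G^F: pairs (phi, psi) with (phi(u), psi(v)) in E(G) for all (u, v) in E(F)
    E = set(G_edges)
    maps = list(product(range(nG), repeat=nF))
    return [(phi, psi) for phi in maps for psi in maps
            if all((phi[u], psi[v]) in E for u, v in F_edges)]

F = [(0, 1)]                  # one directed edge
G = [(0, 1), (1, 2), (2, 0)]  # directed 3-cycle
edges = power_graph_edges(F, 2, G, 3)
loops = sum(phi == psi for phi, psi in edges)   # loops of G^F
homs = sum(all((phi[u], phi[v]) in set(G) for u, v in F)
           for phi in product(range(3), repeat=2))
print(loops, homs)  # 3 3
```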
The matrices M^A_inj and M^A_surj are defined analogously. Finally, we also define M^A_aut as the matrix with aut(Fi) = surj(Fi, Fi) = inj(Fi, Fi) in the i-th entry of the diagonal and 0 outside the diagonal.
Clearly, M^A_aut is a diagonal matrix; if we order the graphs Fi according to increasing number of edges (and arbitrarily for graphs with the same number of edges), then the matrices M^A_inj and M^A_surj become triangular. All diagonal entries are positive in each case. Hence the matrices M^A_aut, M^A_inj and M^A_surj are nonsingular. With M^A_hom the situation is more complicated: it may be singular (Exercise 5.46). However, we have the following simple but useful fact, observed by Borgs, Chayes, Kahn and Lovász [2012]:
Proof. Under the conditions of the Proposition, the matrices introduced above are related by the following identity:
(5.50) M^A_hom = M^A_surj (M^A_aut)^{−1} M^A_inj.
Indeed, every homomorphism can be decomposed as a surjective homomorphism followed by an (injective) embedding. By our assumption, the image F of the surjective homomorphism is in A. The decomposition is uniquely determined except for the automorphisms of F. This gives the equation
hom(Fi, Fj) = ∑_{k=1}^m surj(Fi, Fk) inj(Fk, Fj) / aut(Fk),
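Identity (5.50) can be verified numerically on a small family closed under surjective homomorphic images, e.g. A = {K1, K2, P3} (a sketch using numpy; the brute-force helpers are ours, and here "surjective" means surjective on both nodes and edges):

```python
from itertools import product
import numpy as np

def count(F, G, kind):
    (Fe, nF), (Ge, nG) = F, G
    Gset = {frozenset(e) for e in Ge}
    c = 0
    for phi in product(range(nG), repeat=nF):
        img = {frozenset((phi[u], phi[v])) for u, v in Fe}
        if not img <= Gset:          # not a homomorphism
            continue
        if kind == 'inj' and len(set(phi)) < nF:
            continue
        if kind == 'surj' and (set(phi) != set(range(nG)) or img != Gset):
            continue
        c += 1
    return c

K1, K2, P3 = ([], 1), ([(0, 1)], 2), ([(0, 1), (1, 2)], 3)
A = [K1, K2, P3]  # ordered by number of edges; closed under surjective images
M = {k: np.array([[count(F, G, k) for G in A] for F in A], float)
     for k in ('hom', 'inj', 'surj')}
Maut = np.diag([count(F, F, 'inj') for F in A])  # aut(F) = inj(F, F)
print(np.allclose(M['hom'], M['surj'] @ np.linalg.inv(Maut) @ M['inj']))  # True
```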
We could use nodeweights in (a) chosen randomly and independently from the uniform distribution on [0, 1] (or from any other atomfree distribution); the matrix will be nonsingular with probability 1.
Proof. (a) Considering the node weights as variables, the determinant of the matrix [hom(Fi, Hj)]^k_{i,j=1} is a polynomial p with integral coefficients. The multilinear part of p is just the determinant of [inj(Fi, Hj)]^k_{i,j=1}, which is non-zero, since this matrix is upper triangular and the diagonal entries are nonzero polynomials. Hence p is not the zero polynomial, which shows that for an algebraically independent substitution it does not vanish.
(b) Instead of algebraically independent weights, we can also substitute appropriate positive integers in p to get a nonsingular matrix [hom(Fi, Hj)]^k_{i,j=1}, since a nonzero polynomial cannot vanish for all positive integer substitutions. For a graph Hj and a node v ∈ V(Hj) with weight mv, we replace v by mv twin copies of weight 1. Let Gj be the graph obtained this way; then hom(Fi, Gj) = hom(Fi, Hj) for all i, and hence [hom(Fi, Gj)]^k_{i,j=1} = [hom(Fi, Hj)]^k_{i,j=1} is nonsingular.
(c) Let n = maxi v(Fi), and let us add n − v(Fi) isolated nodes to every Fi. The resulting graphs Fi′ are non-isomorphic, and hence there are simple graphs G1, . . . , Gk such that the matrix [hom(Fi′, Gj)]^k_{i,j=1} is nonsingular. Since hom(Fi′, Gj) = v(Gj)^n t(Fi′, Gj), we can scale the columns and get that the matrix [t(Fi′, Gj)]^k_{i,j=1} is nonsingular. Since clearly t(Fi, Gj) = t(Fi′, Gj), this proves the proposition.
The following corollary of these constructions goes back to Whitney [1932]. We
have seen that the homomorphism functions satisfy the multiplicativity relations
hom(F1 F2 , G) = hom(F1 , G)hom(F2 , G) (where F1 F2 denotes disjoint union). Is
there any other algebraic relation between them? Using multiplicativity, we can
turn any algebraic relation into a linear relation, so the question is: are the graph parameters hom(F, .) linearly independent (in the sense that any finite number of them are)? Thus (b) above implies:
Corollary 5.45. The simple graph parameters hom(F, .) (where F ranges over
simple graphs) are linearly independent. Equivalently, the simple graph parameters
hom(F, .) (where F ranges over connected simple graphs) are algebraically indepen-
dent.
What about non-algebraic relations? Such relations sound unlikely, and in fact it can be proved (Erdős, Lovász and Spencer [1979]) that they don't exist. To be more precise, for any finite set of distinct connected graphs A = {F1, . . . , Fk}, if we construct the set T(A) of points (t(F1, G), . . . , t(Fk, G)) ∈ R^k, where G ranges over all finite graphs, then the closure of T(A) has an internal point. We will talk more about these sets T(A) in Chapter 16.
Exercise 5.46. Show by an example that M^A_hom may be singular.
Exercise 5.47. Prove a version of part (a) of Proposition 5.44 in which the edges
are weighted (instead of the nodes).
Exercise 5.48. Find an upper bound on the number of nodes in the graphs Gi in parts (b) and (c) of Proposition 5.44.
Exercise 5.49. For every m ≥ 1, construct a family A of m simple graphs such that the matrix M^A_hom is the identity matrix.
Exercise 5.50. For every m ≥ 1 there exist simple graphs F1 , . . . , Fm such that
for every integer vector a ∈ Nm there is a simple graph G such that hom(Fi , G) =
ai for all i ∈ [m].
5.6. CHARACTERIZING HOMOMORPHISM NUMBERS 75
From this, we see that M(flo, k) is positive semidefinite and has rank at most q^k.
Schrijver [2009] gave the following characterization of graph parameters repre-
sentable as homomorphism functions into weighted graphs with node weights 1 and
complex edgeweights. Recalling the Möbius inverse on the partition lattice (4.3),
we can state the result as follows:
Using this theorem, Schrijver gave a real-valued version, which is more similar
to Theorem 5.54.
Theorem 5.57. Let f be a real valued graph parameter defined on looped multigraphs. Then f = hom(., H) for some edge-weighted graph H with real edgeweights if and only if f is multiplicative and, for every integer k ≥ 0, the multilabeled connection matrix M^mult(f, k) is positive semidefinite.
Note that in Theorem 5.57 and Corollary 5.58 no bound on the connection rank is assumed; in fact (somewhat surprisingly), it follows from the multiplicativity and reflection positivity conditions that f has finite connection rank, and r(f, k) ≤ f(K1)^k for all k. Furthermore, in Corollary 5.58 it also follows from the conditions that the values of f are integers.
Next, we state an analogous (dual) characterization of graph parameters of the
form hom(F, .), defined on looped-simple graphs, where F is also a looped-simple
graph (Lovász and Schrijver [2010]). To state the result, we need some definitions.
Recall the notion of H-colored graphs and their products from the proof of Lemma
5.38; we need only the rather trivial version where H = Kq◦ is a fully looped
complete graph. We define dual connection matrices N (f, q) of a graph parameter
f : the rows and columns are indexed by Kq◦ -colored graphs, and the entry in row
G1 and column G2 is f (G1 × G2 ).
It follows that alternative (ii) obtains iff f = hom(., H) for some weighted graph
H, and alternative (iii) obtains when f = hom(., H) for some randomly weighted
graph H in which at least one edgeweight has a proper distribution.
It is possible to give a more precise description of the asymptotic behavior
of log r(hom(., H), k), but we have to refer to the paper of Lovász and Szegedy
[2012c] for details. Let us note that no such conclusion can be drawn without
assuming reflection positivity. For example, the chromatic polynomial chr(., x)
satisfies log r(chr(., x), k) = Θ(k log k).
5.6.2. About the proofs. The proofs of the theorems above follow at least
three different lines. To be more precise, the necessity of the conditions is easy to
prove; below we prove the “easy” direction of Theorem 5.54, and the others follow
by essentially the same argument. The sufficiency parts will be postponed until some further techniques have been developed:
—The completion of the proof of Theorem 5.54 will be given in Section 6.2.2,
after the development of graph algebras. (These algebras will be useful to study
other related properties of homomorphism functions, and the technique will also be
applied in extremal graph theory.) Corollary 5.58 and its dual, Theorem 5.59, can
be proved by a similar technique. This technique extends to a much more general
setting, to categories, as we will sketch in Section 23.4.
—The proofs of Theorems 5.56 and 5.57 will be described in Section 6.6, where
a general connection to the Nullstellensatz and invariant theory will be developed.
This method extends to edge coloring models (see Section 23.2).
—The proof of Theorem 5.61 will use a lot of the analytic machinery to be
developed in Part 3 of the book, and will be sketched at the end of that part
(Section 17.1.4). For the details of this proof, and for the proof of Supplement 5.62,
we refer to Lovász and Szegedy [2012c].
We conclude this section by proving the "easy" direction in Theorem 5.54:
5.7. THE STRUCTURE OF THE HOMOMORPHISM SET 79
Proposition 5.64. For every weighted graph H, the graph parameter hom(., H) is reflection positive and r(hom(., H), k) ≤ v(H)^k.
Proof. For any two k-labeled graphs F1 and F2 and φ : [k] → V(H), we have
(5.53) homφ (F1 F2 , H) = homφ (F1 , H)homφ (F2 , H)
(recall the definition of homφ from (5.10)). Let F = [[F1 F2]]; then the decomposition
hom(F, H) = ∑_{φ: [k]→V(H)} αφ homφ(F, H)
writes the matrix M(hom(., H), k) as the sum of v(H)^k matrices, one for each mapping φ : [k] → V(H); (5.53) shows that these matrices are positive semidefinite and have rank 1. This proves Proposition 5.64.
and similarly for t(x, y). (Most of the time we will use linearity in the first argument
only.)
Quantum graphs are useful in expressing various combinatorial situations. For
example, for any signed graph F , we consider the quantum graph
(6.1) x = ∑_{F′} (−1)^{e(F′)−|E+|} F′,
where the summation extends over all simple graphs F ′ such that V (F ′ ) = V (F )
and E+ ⊆ E(F ′ ) ⊆ E+ ∪ E− . By inclusion-exclusion we see that for any simple
graph G, hom(F, G) = hom(x, G) is the number of maps V (F ) → V (G) that
map positive edges onto edges and negative edges onto non-edges. The equation
hom(F, G) = hom(x, G) remains valid if G is a weighted graph (one way to see it is
to expand the parentheses in definition (5.9)). Due to these nice formulas, we will
denote the quantum graph x by F ; this will not cause any confusion.
The relationships between homomorphism numbers and injective homomorphism numbers, equations (5.16) and (5.18), can be expressed as follows: For every graph G, let ZG = ∑_P G/P and MG = ∑_P µP G/P, where P ranges over all partitions of V(G). Here the quotient graph G/P is defined by merging every class into a single node, and adding up the multiplicities of pre-images of an edge to get its multiplicity in G/P. Then
(6.2) hom(F, G) = inj(ZF, G), and inj(F, G) = hom(M F, G).
More generally, for any graph parameter f , we have f (M G) = f ⇓ (G). The op-
erators Z and M extend to linear operators Z, M : Q0 → Q0 . Clearly, they are
83
84 6. GRAPH ALGEBRAS AND HOMOMORPHISM FUNCTIONS
inverses of each other: ZM = M Z = idQ0 . (In Appendix A.1 these operators are
discussed for general lattices.)
We will see that other important facts, like the contraction/deletion relation
of the chromatic polynomial (4.5) can also be conveniently expressed by quantum
graphs (cf. Section 6.3).
For any k ≥ 0, a k-labeled quantum graph is a formal linear combination of
k-labeled graphs. We say that a k-labeled quantum graph is simple [loopless] if all
its constituents are simple [loopless].
6.1.1. The gluing algebra. Let Qk denote the (infinite dimensional) vector
space of k-labeled quantum graphs. We can turn Qk into an algebra by using the
gluing product F1 F2 introduced in Section 4.2 as the product of two generators, and
then extending this multiplication to the other elements of the algebra by linearity.
Clearly Qk is associative and commutative. The fully labeled graph Ok on [k] with
no edges is the multiplicative unit in Qk .
Every graph parameter f can be extended linearly to quantum graphs, and
defines an inner product on Qk by
(6.3) ⟨x, y⟩ = f (xy).
This inner product has nice properties, for example it satisfies the Frobenius identity
(6.4) ⟨x, yz⟩ = ⟨xy, z⟩.
Let Nk (f ) denote the kernel (annihilator) of this inner product, i.e.,
Nk (f ) = {x ∈ Qk : f (xy) = 0 ∀y ∈ Qk }.
Note that it would be equivalent to require this condition for (ordinary) k-labeled
graphs only in place of y. Sometimes we write this condition as x ≡ 0 (mod f ),
and then use x ≡ y (mod f ) if x − y ≡ 0 (mod f ). We define the factor algebra
Qk /f = Qk /Nk (f ).
Formula (6.3) still defines an inner product on Qk /f , and identity (6.4) remains
valid. While the algebra Qk is infinite dimensional, the factor algebra Qk /f is finite
dimensional for many interesting graph parameters f .
Proposition 6.1. The dimension of Qk /f is equal to the rank of the connection
matrix M (f, k). The inner product (6.3) is positive semidefinite on Qk if and only
if M (f, k) is positive semidefinite.
So if the parameter f is reflection positive, then the inner product is positive
semidefinite on every Qk ; equivalently, it is positive definite on Qk /f . It follows
that the examples in section 4.3 provide several graph parameters for which the
algebras Qk /f have finite dimension. This means that in these cases our graph
algebra Qk /f is a Frobenius algebra (see Kock [2003]). For a reflection positive
parameter, the inner product is positive definite on Qk /f , so it turns Qk /f into an
inner product space.
Example 6.2 (Number of perfect matchings). Consider the number pm(G)
of perfect matchings in the graph G. It is a basic property of this value that
subdividing an edge by two nodes does not change it. This can be expressed as
K2•• ≡ P4•• (mod pm),
where the labeled nodes (drawn black in the original figure) are the two endpoints.
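This invariance is easy to confirm by brute force. A minimal sketch (illustrative, not from the book): subdividing one edge of C4 by two nodes yields C6, and both graphs have the same number of perfect matchings.

```python
from itertools import combinations

def pm(n, edges):
    """Number of perfect matchings of a graph on n nodes (brute force)."""
    if n % 2:
        return 0
    return sum(
        len({v for i in sub for v in edges[i]}) == n
        for sub in combinations(range(len(edges)), n // 2)
    )

# C4, and the graph obtained by subdividing one of its edges by two nodes (C6).
c4 = (4, [(0, 1), (1, 2), (2, 3), (3, 0)])
c6 = (6, [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)])
print(pm(*c4), pm(*c6))  # 2 2
```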
6.1. ALGEBRAS OF QUANTUM GRAPHS 85
Sometimes it will be convenient to put all k-labeled graphs into a single struc-
ture as follows. Recall the notion of partially labeled graphs from Section 4.2, and
also the notion of their gluing product. Let QN denote the (infinite dimensional)
vector space of formal linear combinations (with real coefficients) of partially labeled
graphs. We can turn QN into an algebra by using the product G1 G2 introduced
above (gluing along the labeled nodes) as the product of two generators, and then
extending this multiplication to the other elements linearly. Clearly QN is associa-
tive and commutative, and the empty graph is a unit element.
A graph parameter f defines an inner product on the whole space QN by (6.3),
and we can consider the kernel N (f ) = {x ∈ QN : ⟨x, y⟩ = 0 ∀y ∈ QN } of this
inner product. It is not hard to see that Nk (f ) = Qk ∩ N (f ).
For every finite set S ⊆ N, the set of all formal linear combinations of S-labeled
graphs forms a subalgebra QS of QN . We set QS /f = {x/f : x ∈ QS }. Clearly
QS /f is a subalgebra of QN /f , and it is not hard to see that QS /f ≅ Q|S| /f . The
graph with |S| nodes labeled by the elements of S and no edges, which we denote
by OS , is a unit in the algebra QS .
Analogous constructions work for graphs labeled on two sides; consider the space Qk,k of (k, k)-labeled quantum graphs, i.e., formal linear combinations of graphs in Fk,k . The graph Ok on [k] with no
edges, with its nodes labeled 1, . . . , k from both sides, is a unit in the algebra.
This algebra is associative, but not commutative. It has a “conjugate” opera-
tion, which we denote by ∗, of interchanging “left” and “right”. This is related to
multiplication through the identity (A ◦ B)∗ = B ∗ ◦ A∗ .
Given a graph parameter f defined on looped-multigraphs, we can define the
inner product of two (k, k)-labeled graphs as before: we consider them as multil-
abeled graphs (where left label i is different from right label i), form their gluing
product, and evaluate the parameter on the resulting multigraphs. (We have to
work with multilabeled graphs, since a node is allowed to have two labels. As a
consequence, the gluing product can have loops.)
A further natural generalization involves graphs with possibly different numbers
of labeled nodes on the left and on the right. Let Fk,m denote the set of multigraphs
with k labeled nodes on the left and m labeled nodes on the right. We cannot
form the product of any two graphs, but we can multiply a graph F ∈ Fk,m
with a graph G ∈ Fm,n to get a graph F ◦ G ∈ Fk,n . So bi-labeled graphs form
the morphisms of a category, in which the objects are the natural numbers. The
star operation (interchanging left and right) maps Fk,m onto Fm,k . Any graph
parameter f defines a scalar product on every Fk,m by ⟨F, G⟩ = f (F G), where F G
is defined by identifying nodes with the same left-label as well as nodes with the
same right-label in the disjoint union of F and G.
Just as above, the operations ◦, ∗, and ⟨., .⟩ extend linearly to the linear spaces
Qk,m of formal linear combinations of graphs in Fk,m . This leads us to semisimple
categories and topological quantum field theory (see Witten [1988]), which topics
are beyond the limits of this book.
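The concatenation product ◦ interacts with homomorphism numbers in a matrix-like way: for an unweighted target H, the matrix of "two-rooted" homomorphism numbers of F ◦ G is the product of the matrices for F and G. The sketch below (an illustration under these conventions, not code from the book) checks this for (1,1)-bi-labeled paths into K3, where the matrix of the single edge is the adjacency matrix A and the path of length 2 gives A².

```python
from itertools import product

def hom_2rooted(edges, n, i, j, G_adj, n_G):
    """Homomorphisms into G fixing the left label (node 0) at i
    and the right label (node n-1) at j."""
    return sum(all(G_adj[phi[u]][phi[v]] for u, v in edges)
               for phi in product(range(n_G), repeat=n)
               if phi[0] == i and phi[n - 1] == j)

def path(k):
    """(1,1)-bi-labeled path with k edges; node 0 is the left label,
    node k the right label."""
    return k + 1, [(v, v + 1) for v in range(k)]

G_adj = [[int(a != b) for b in range(3)] for a in range(3)]  # K3

def hom_matrix(F):
    n, e = F
    return [[hom_2rooted(e, n, i, j, G_adj, 3) for j in range(3)]
            for i in range(3)]

A, A2 = hom_matrix(path(1)), hom_matrix(path(2))
# Concatenation of bi-labeled graphs corresponds to matrix multiplication:
prod = [[sum(A[i][k] * A[k][j] for k in range(3)) for j in range(3)]
        for i in range(3)]
print(A2 == prod)  # True
```

For a weighted target, the inner sum would carry the node weights α_k, matching formula (6.30) below.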
6.1.3. Unlabeling. Having defined the graph algebras we need, we are going
to describe the relationship between algebras of labeled graphs using different label
sets. There is nothing terribly deep or surprising here; but it might serve as a
warm-up, illustrating how combinatorial and algebraic constructions correspond to
each other.
The unlabeling operator G 7→ [[G]]S extends to Q by linearity. We note that for
any two partially labeled graphs G and H, [[ [[G]]S H ]] ≅ [[ [[G]]S [[H]]S ]] ≅ [[ G [[H]]S ]],
and hence we get the identity
(6.6) ⟨[[x]]S , y⟩ = ⟨[[x]]S , [[y]]S ⟩ = ⟨x, [[y]]S ⟩ (x, y ∈ Q).
By a similar argument we get that if S, T ⊂ N are finite sets, then
(6.7) ⟨x, y⟩ = ⟨[[x]]S∩T , [[y]]S∩T ⟩ (x ∈ QS , y ∈ QT )
One consequence of identity (6.6) is that if some x ∈ Q is congruent modulo f
to some S-labeled quantum graph y ∈ QS , then such a y can be obtained by simply
removing the labels outside S:
(6.8) x − y ∈ N (f ) =⇒ x − [[x]]S ∈ N (f ).
Indeed, for any z ∈ Q, we have
⟨x − [[x]]S , z⟩ = ⟨x, z⟩ − ⟨[[x]]S , z⟩ = ⟨y, z⟩ − ⟨[[x]]S , z⟩
= ⟨y, [[z]]S ⟩ − ⟨x, [[z]]S ⟩ = ⟨y − x, [[z]]S ⟩ = 0.
As a special case, we get that [[x]]S ∈ N (f ) for all x ∈ N (f ). This implies that
the operator x 7→ [[x]]S is defined on the factor algebra Q/f , and in fact it gives the
orthogonal projection of Q/f to the subalgebra QS /f . Indeed, by (6.6)
⟨[[x]]S , x − [[x]]S ⟩ = ⟨x, [[x − [[x]]S ]]S ⟩ = ⟨x, [[x]]S − [[x]]S ⟩ = 0.
Another consequence of (6.8) is that for every x ∈ Q there is a unique smallest
set S ⊂ N such that x ≡ [[x]]S (mod f ).
For the rest of this section, we assume that f is multiplicative and normalized
so that f (K1 ) = 1. (This latter condition is usually easily achieved by replacing
f (G) by f (G)/f (K1 )v(G) .)
One important consequence of this assumption is that deleting isolated nodes
(labeled or unlabeled) from a graph G does not change f (G). This implies that it
does not change G/f either. Indeed, let F denote the graph obtained from G by
deleting some isolated nodes, then for every partially labeled graph H, the products
F H and GH differ only in isolated nodes, and hence f (F H) = f (GH), showing
that F/f = G/f . In particular, every graph with no edges has the same image in
Q/f , which is the unit element of Q/f .
Lemma 6.3. Let f be a multiplicative and normalized graph parameter.
(a) If S ⊆ T are finite subsets of N, then QS /f has a natural embedding into QT /f .
(b) For any two finite S, T ⊆ N, we have QS /f ∩ QT /f = QS∩T /f .
(c) If S ∩ T = ∅, then QS QT ≅ QS ⊗ QT and (QS /f )(QT /f ) ≅ QS /f ⊗ QT /f .
Proof. (a) Every S-labeled graph G can be turned into a T -labeled graph G′
by adding |T \ S| new isolated nodes and labeling them by the elements of T \ S.
(This is equivalent to multiplying it by UT ). As remarked above, G − G′ ∈ N (f ),
and so G/f = G′ /f .
(b) The containment ⊇ follows from (a). To prove the other direction, we
consider any z ∈ QS /f ∩ QT /f . Then we have an x ∈ QS with x/f = z, and a
y ∈ QT with y/f = z. So x−y = x−[[y]]T ∈ N (f ), and so by (6.8), x−[[x]]T ∈ N (f ).
But we can write this as [[x]]T − [[x]]S ∈ N (f ), and then by the same reasoning
[[x]]T − [[[[x]]T ]]S = [[x]]T − [[x]]T ∩S ∈ N (f ), showing that x − [[x]]T ∩S ∈ N (f ), and
so z = x/f ∈ QS∩T /f .
(c) The first relation is trivial, since the partially labeled graphs F G, F ∈ FS• ,
G ∈ FT• are different generators of QS∪T . To prove the second, let a1 , a2 , . . . be any
basis of QS /f and b1 , b2 , . . . , any basis of QT /f . Consider the map ai ⊗ bj 7→ ai bj
(which is defined on a basis of QS /f ⊗ QT /f ), and extend it linearly to a map
Φ : QS /f ⊗ QT /f → (QS /f )(QT /f ). We show that Φ is an isomorphism between
QS /f ⊗ QT /f and (QS /f )(QT /f ).
It is straightforward to check that Φ preserves product in the algebra and also
the unit element. It is also clear that (QS /f )(QT /f ) is generated by the elements
ai bj , so Φ is surjective. To prove that Φ is injective, suppose that there are real
numbers cij , a finite but positive number of which are nonzero, such that
∑_{i,j} cij ai bj = 0. Then for every x ∈ QS /f and y ∈ QT /f , we have by multiplicativity
∑_{i,j} cij f (xai )f (ybj ) = ∑_{i,j} cij f (xyai bj ) = ∑_{i,j} cij ⟨xy, ai bj ⟩ = ⟨xy, ∑_{i,j} cij ai bj ⟩ = 0.
Writing this equation as ⟨y, ∑_{i,j} cij f (xai )bj ⟩ = 0, we see that ∑_{i,j} cij f (xai )bj = 0.
Since the bj are linearly independent, this means that for every j,
⟨x, ∑_i cij ai ⟩ = ∑_i cij f (xai ) = 0.
This implies that ∑_i cij ai = 0, and since the ai are linearly independent, it follows
that cij = 0 for all i and j.
Corollary 6.4. For every multiplicative graph parameter f with finite rank, r(f, k)
is a supermultiplicative function of k in the sense that
r(f, k + l) ≥ r(f, k)r(f, l).
Proof. It follows from Lemma 6.3(c) that for any two disjoint finite sets S and
T there is an embedding
(6.9) QS /f ⊗ QT /f ,→ QS∪T /f.
Considering the dimensions, the assertion follows.
Exercise 6.5. Prove that if all nodes of a simple graph F are labeled, then both
F and the quantum graph F̂ introduced above are idempotent in the algebra of
simple partially labeled graphs: F ² = F and F̂ ² = F̂ .
Exercise 6.6. Let f be a graph parameter for which r(f, 2) = r is finite.
(a) Prove that every path labeled at its endpoints can be expressed, modulo f , as
a linear combination of paths of length at most r.
(b) Prove that a 2-labeled m-bond Bm•• can be expressed, modulo f , as a linear
combination of 2-labeled k-bonds with k ≤ r − 1.
(c) A series-parallel graph is a 2-labeled graph obtained from K2•• by repeated
application of the gluing and concatenation operations. Prove that every series-
parallel graph can be expressed, modulo f , as a linear combination of series-
parallel graphs with at most 2r−1 edges.
The set Odd(G) can be expressed in this basis by discrete Fourier inversion:
Odd(G) = (1/2^{k−1}) ∑_{X⊆[k−1]} (−1)^{|S∩X|} pX .
It follows that
G 7→ ((−1)^{|S∩Odd(G)|} : S ⊆ [k − 1])
defines an algebra isomorphism between Qk /Eul and R^{2^{k−1}} .
For two idempotents p and q in Q/f , we say that q resolves p, if pq = q. It is
clear that this relation is transitive.
Lemma 6.8. Let r be any idempotent element of QS /f . Then r is the sum of
those idempotents in BS that resolve it.
Proof. Indeed, we can write r = ∑_{p∈BS} µp p with some scalars µp . Using that
r is idempotent, we get that
r = r² = ∑_{p,p′∈BS} µp µp′ pp′ = ∑_{p∈BS} µp² p,
which shows that µp² = µp for every p, and so µp ∈ {0, 1}. So r is the sum of some
subset X ⊆ BS . It is clear that rp = p for p ∈ X and rp = 0 for p ∈ BS \ X, so X
consists of exactly those elements of BS that resolve r.
As a special case, we see that
(6.10) u = ∑_{p∈BS} p
is the unit element of QS /f (this is the image of the edgeless graph US ), and also the
unit element of the whole algebra Q/f .
Lemma 6.9. Let S ⊂ T be two finite sets. Then every q ∈ BT resolves exactly one
element of BS .
Proof. We have by (6.10) that
u = ∑_{p∈BS} p = ∑_{p∈BS} ∑_{q∈BT : q resolves p} q,
and also
u = ∑_{q∈BT} q,
so by the uniqueness of the representation we get that every q must resolve exactly
one p.
Lemma 6.10. If p ∈ BS and q resolves p, then [[q]]S = (f (q)/f (p)) p.
To show that the weighted graph H obtained this way satisfies f (G) =
hom(G, H) for any multigraph G, we may assume that V (G) = [k] and all nodes of
G are labeled. Then we can write
(6.14) G = ∏_{uv∈E(G)} Kuv ,
where Kuv is the graph on k labeled nodes, with a single edge connecting u and v.
Defining pφ = pφ(1) ⊗ · · · ⊗ pφ(k) for φ : [k] → [q], the k-labeled quantum graphs
pφ form a basis of Qk /f consisting of idempotents, and hence
(6.15) Ok = ∑_{φ: [k]→[q]} pφ .
The general (and degenerate) case takes more work, but we are ready to give
it now.
Proof of Theorem 5.54. The idea is that we find a basic idempotent p ∈ BS
for a sufficiently large finite set S ⊆ N, with the property that the subalgebra pQ/f
behaves like the whole algebra did in the generic case. The idempotent bases in it,
and from these the weighted graph H, can then be constructed explicitly.
Bounding the expansion. If a basic idempotent p ∈ BS has degree D, then by
Lemma 6.14, there are D basic idempotents in BT with |T | = |S| + 1 with degree
≥ D that resolve p. Hence if |T | ≥ |S|, then the dimension of QT /f is at least D^{|T \S|} .
It follows that the degrees of basic idempotents are bounded by q. Let us choose
S and p ∈ BS so that D = deg(p) is maximum. Then it follows by Lemma
6.14 that all basic idempotents resolving p have degree exactly D.
Describing the idempotents. Let us fix a set S and a basic idempotent p ∈ BS
with maximum degree D. For u ∈ N \ S, let q_1^u , . . . , q_D^u denote the elements of
BS∪{u} resolving p.
We can describe, for a finite set T ⊃ S, all basic idempotents in BT that resolve
p. Let V = T \ S, and for every map φ : V → {1, . . . , D}, let
(6.17) qφ = ∏_{v∈V} q_{φ(v)}^v .
Note that by Lemma 6.11,
(6.18) f (qφ ) = f (∏_{v∈V} q_{φ(v)}^v) = f (p) ∏_{v∈V} (f (q_{φ(v)}^v)/f (p)) ≠ 0,
and so qφ ≠ 0.
6.2. REFLECTION POSITIVITY 93
Claim 6.15. The basic idempotents in QT /f resolving p are exactly the algebra
elements of the form qφ , φ ∈ {1, . . . , D}V .
Constructing the target graph. Now we can define H as follows. Let H be the
looped complete graph on V (H) = {1, . . . , D}. We have to define the node weights
and edge weights.
Fix any u ∈ N \ S. For every i ∈ V (H), let αi = f (q_i^u )/f (p) be the weight of
the node i. Clearly αi > 0.
Let u, v ∈ N\S, v ̸= u, and let W = S ∪{u, v}. Let Kuv denote the graph on W
which has only one edge connecting u and v, and let kuv denote the corresponding
element of QW . We can express pkuv as a linear combination of elements of BW,p
(since for any r ∈ BW \ BW,p one has rp = 0 and hence rpkuv = 0):
pkuv = ∑_{i,j} βij q_i^u q_j^v .
This defines the weight βij of the edge ij. Note that βij = βji , since pkuv = pkvu .
Verifying the target graph. We prove that this weighted graph H gives the
right homomorphism function: f (G) = hom(G, H) for every multigraph G. By
(6.19), we have for each pair u, v of distinct elements of V (G)
pkuv = ∑_{i,j∈V (H)} βij q_i^u q_j^v = ∑_{i,j∈V (H)} βij ∑_{φ: φ(u)=i, φ(v)=j} qφ = ∑_{φ∈V (H)^V} β_{φ(u),φ(v)} qφ .
The factor f (p) > 0 can be cancelled from both sides, completing the proof of the
theorem.
Note that in every constituent of x ◦ y the labeled nodes are nonadjacent for
all x, y ∈ Q2 . It follows that if the algebra (Q2 /f, ◦) has a multiplicative identity
(in particular, if it has a contractor), then every y ∈ Q2 /f can be represented by a
2-labeled quantum graph with nonadjacent labeled nodes.
While the existence of a contractor, the existence of a connector, and con-
tractibility are three different properties of a graph parameter, there is some con-
nection, as expressed in the following propositions.
Proposition 6.18. If a graph parameter has a contractor, then it is contractible.
Proof. Let w be a contractor for f . Suppose that x ∈ Q2 satisfies x ≡ 0
(mod f ), and let y ∈ Q1 . Choose a z ∈ Q2 such that z′ = y. Then
f (x′ y) = f (x′ z′ ) = f ((xz)′ ) = f ((xz)w) = f (x(zw)) = 0,
showing that x′ ≡ 0 (mod f ).
(mod f ), and thus z ◦ P2•• is a connector. The second assertion is trivial by the
same construction.
Proposition 6.20. If f is contractible, has a connector, and r(f, 2) is finite, then
f has a contractor.
Proof. Since ⟨x, y⟩ = f (xy) is a symmetric (possibly indefinite) bilinear form
that is not singular on Q2 /f , there is a basis p1 , . . . , pN in Q2 /f such that f (pi pj ) =
0 if i ̸= j and f (pi pi ) ̸= 0. By the assumption that f has a connector, we may
represent this basis by quantum graphs with nonadjacent labeled nodes; then the
contracted quantum graphs p′i have no loops. Let
z = ∑_{i=1}^{N} (f (p_i′ )/f (p_i ²)) p_i .
We claim that z is a contractor. Indeed, let x ∈ Q2 be a quantum graph with
nonadjacent labeled nodes, and write x ≡ ∑_{i=1}^{N} a_i p_i (mod f ). Then we have
f (xz) = ∑_{i=1}^{N} a_i (f (p_i′ )/f (p_i ²)) f (p_i ²) = ∑_{i=1}^{N} a_i f (p_i′ ).
On the other hand, contractibility implies that x′ ≡ ∑_{i=1}^{N} a_i p_i′ (mod f ), and so
f (x′ ) = ∑_{i=1}^{N} a_i f (p_i′ ) = f (xz).
Proposition 6.21. If M (f, 2) is positive semidefinite and has finite rank r, and
f is contractible, then f has a connector whose constituents are paths of length at
most r + 1.
Proof. Since Q2 /f is finite dimensional, there is a linear dependence between
P2•• , P3•• , . . . , P_{r+2}•• in Q2 /f . Hence there is a (smallest) k ≥ 2 such that Pk•• can
be expressed as
(6.24) Pk•• ≡ ∑_{i=1}^{r} a_i P_{k+i}•• (mod f )
with some real numbers a1 , . . . , ar . The assertion is equivalent to saying that k = 2.
Let x = P2•• − ∑_{i=1}^{r} a_i P_{2+i}•• . Then (6.24) can be written as x ◦ P_{k−1}•• ≡ 0
(mod f ). If k = 3, then this implies that x ◦ P_{2+i}•• ≡ 0 (mod f ) for all i ≥ 0, and
hence x ◦ x ≡ 0 (mod f ). Using contractibility we obtain that 0 = f ((x ◦ x)′ ) =
f (x² ). Now semidefiniteness of M (f, 2) shows that x ≡ 0 (mod f ). So (6.24) holds
with k = 2 as well, a contradiction.
Suppose that k > 3, then
(x ◦ P_{k−2}•• )² = (x ◦ P_{k−1}•• )(x ◦ P_{k−3}•• ) ≡ 0 (mod f ),
and so by the assumption that M (f, 2) is positive semidefinite, we get that
x ◦ P_{k−2}•• ≡ 0 (mod f ), which contradicts the minimality of k again.
The following statement is a corollary of Propositions 6.20 and 6.21.
6.3. CONTRACTORS AND CONNECTORS 97
Corollary 6.22. If M (f, 2) is positive semidefinite and has finite rank, and f is
contractible, then f has a contractor.
We conclude this section with a number of examples of connectors and con-
tractors.
Example 6.23 (Perfect matchings). Recall that pm(G) denotes the number of
perfect matchings in the graph G. We have seen that r(pm, k) = 2^k is exponentially
bounded, but pm is not reflection positive, and thus pm(G) cannot be represented
as a homomorphism function.
On the other hand, pm has a contractor, a path of length 2, and also a connector,
a path of length 3.
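These two claims can be tested numerically on small 2-labeled graphs. The sketch below is an illustration under the conventions stated in the text (labels at nodes 0 and 1; a contractor w satisfies pm(x′) = pm(xw), a connector z satisfies pm(zy) = pm(K2•• y)); the encodings are ours, not the book's.

```python
from itertools import combinations

def pm(n, edges):
    """Perfect matchings of a multigraph, brute force over edge subsets."""
    if n % 2:
        return 0
    return sum(
        len({v for i in sub for v in edges[i]}) == n
        for sub in combinations(range(len(edges)), n // 2)
    )

def glue(F1, F2):
    """Glue 2-labeled graphs (labels = nodes 0 and 1) along their labels."""
    (n1, e1), (n2, e2) = F1, F2
    sh = lambda v: v if v < 2 else v + n1 - 2
    return n1 + n2 - 2, e1 + [(sh(u), sh(v)) for u, v in e2]

def contract(F):
    """Identify the two labeled nodes 0 and 1 (may create a loop)."""
    n, e = F
    m = lambda v: 0 if v == 1 else (v - 1 if v > 1 else v)
    return n - 1, [(m(u), m(v)) for u, v in e]

k2 = (2, [(0, 1)])                  # K2**
p3 = (3, [(0, 2), (2, 1)])          # path of length 2, labels at the ends
p4 = (4, [(0, 2), (2, 3), (3, 1)])  # path of length 3, labels at the ends

# Contractor check: pm(x') = pm(x * P3**); connector check: P4** behaves as K2**.
ok_contract = all(pm(*contract(x)) == pm(*glue(x, p3)) for x in (k2, p3, p4))
ok_connect = all(pm(*glue(p4, y)) == pm(*glue(k2, y)) for y in (k2, p3, p4))
print(ok_contract, ok_connect)  # True True
```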
Example 6.24 (Number of triangles). The graph parameter hom(K3 , .) has no
connector. Indeed, suppose that x ∈ Q2 is a connector, then we must have
hom(K3 , xP3•• ) = hom(K3 , K2•• P3•• ) = hom(K3 , K3 ) = 6,
and also
hom(K3 , xP4•• ) = hom(K3 , K2•• P4•• ) = hom(K3 , C4 ) = 0.
On the other hand,
hom(K3 , xP3•• ) = hom(K3 , xP4•• ),
since x has nonadjacent labeled nodes, so no homomorphism from K3 touches
the edges of the P3•• factor. This contradiction shows that hom(K3 , .) has no
connector.
A similar argument shows that hom(K3 , .) is not contractible (and so it has no
contractor).
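The two homomorphism counts used in this argument are easy to confirm: K2•• glued with P3•• is a triangle, and glued with P4•• it is a 4-cycle. A small sketch (our encoding, not the book's):

```python
from itertools import product

def hom(F_edges, n_F, G_adj, n_G):
    return sum(all(G_adj[p[u]][p[v]] for u, v in F_edges)
               for p in product(range(n_G), repeat=n_F))

k3_edges = [(0, 1), (1, 2), (0, 2)]
k3_adj = [[int(i != j) for j in range(3)] for i in range(3)]
c4_adj = [[int(abs(i - j) in (1, 3)) for j in range(4)] for i in range(4)]

t_k3 = hom(k3_edges, 3, k3_adj, 3)  # hom(K3, K3)
t_c4 = hom(k3_edges, 3, c4_adj, 4)  # hom(K3, C4): C4 is triangle-free
print(t_k3, t_c4)  # 6 0
```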
Example 6.25 (S-Flows). The number flo(G) of flows on G with values from a
given subset S of a finite abelian group can be described as a homomorphism function
(Example 5.16). It has a trivial connector, a path of length 2 (which is an algebraic
way of saying that if we subdivide an edge, then the flows don't change essentially).
In the case of nowhere-0 flows, K2•• + U2•• is a contractor (which amounts to the
contraction-deletion identity for the flow polynomial). In general, it is more difficult
to describe the contractors, but it is possible (Garijo, Goodall and Nešetřil [2011]).
Example 6.26 (Density in a random graph). Recall the multigraph parameter
from Example 5.60: f (G) = p^{e(G^simp)} (0 < p < 1), which is reflection positive, mul-
tiplicative, and has finite connection rank. This parameter has neither a contractor
nor a connector; it is not even contractible. We have
[two 2-labeled graphs, pictured in the original] ≡ (mod f ),
but identifying the labeled nodes produces a pair of parallel edges in the first graph
but not in the second, so they don't remain congruent.
Example 6.27 (Eulerian orientations). Recall that eul⃗(G) denotes the number
of eulerian orientations of the graph G. We have seen that the graph parameter eul⃗
is reflection positive, but has infinite connection rank (so it is not a homomorphism
function). Similarly as in Example 6.25, a path of length 2 is a connector. Fur-
thermore, this graph parameter is contractible, but has no contractor (see Exercise
6.33).
(6.27) B = ∑_{s=2}^{t} a_s (BD2 )^{s−1} B.
Since different γ’s are algebraically independent over K (which contains the coef-
ficients αr ), the two sides must be equal formally. In particular, every product
γjr γkr must occur on the left side, which implies that γjr = γkr . By the definition
of γ, this means that βjr = βkr for every r, and so nodes j and k are twins, which
was excluded.
It follows that we can find a polynomial h of degree at most q² − 1 such that
h(γii ) = 1 but h(γij ) = 0 if i ≠ j. Hence we get a quantum graph w such that
(6.29) homij (w, H) = 1(i = j).
Every constituent of w is the (gluing) product of at most q² − 1 graphs in K(q² − 1, 2),
so it is in class K(q² − 1, 2, q² − 1).
Consider the quantum graph w′ = w ◦ w. The constituents of w′ are graphs in
class K(q² − 1, 2, q² − 1, 2). Using (6.29), we get
(6.30) homij (w′ , H) = ∑_k αk homik (w, H)homkj (w, H) = αi 1(i = j).
Expressing the function i(t) = (1/t)·1(t ≠ 0) on the values of homij (w′ , H) by a poly-
nomial of degree at most q as above, we can construct a quantum graph z in class
K(q² − 1, 2, q² − 1, 2, q) satisfying (6.25).
f (x² ) = ∑_{i=1}^{N} ∑_{j=1}^{N} ⟨qi ⊗ qi , qj ⊗ qj ⟩ − 2 ∑_{i=1}^{N} ⟨qi ⊗ qi , h⟩ + ⟨h, h⟩.
∑_{i=1}^{N} ∑_{j=1}^{N} ⟨qi ⊗ qi , qj ⊗ qj ⟩ = N.
6.4. ALGEBRAS FOR HOMOMORPHISM FUNCTIONS 101
Corollary 6.37. Let H be a weighted graph that has no twins and no proper au-
tomorphisms. Then r(hom(., H), k) = v(H)^k for every k.
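The k = 1 case can be checked numerically for a tiny twin-free, automorphism-free target: a sketch (our encoding and choice of H, not from the book) with H an edge carrying a loop at one node, so v(H) = 2 and the connection matrix of a few 1-labeled graphs should have rank exactly 2.

```python
from itertools import product
from fractions import Fraction

H = [[1, 1], [1, 0]]  # K2 with a loop at node 0: twin-free, no proper automorphisms
q = 2

def hom_rooted(edges, n, v, H_adj):
    return sum(all(H_adj[p[a]][p[b]] for a, b in edges)
               for p in product(range(q), repeat=n) if p[0] == v)

graphs = [(1, []), (2, [(0, 1)]), (3, [(0, 1), (1, 2)])]  # O1, K2*, P3*
B = [[hom_rooted(e, n, v, H) for v in range(q)] for n, e in graphs]
# Connection matrix via the rank-one decomposition M = B B^T.
M = [[sum(B[i][v] * B[j][v] for v in range(q)) for j in range(3)]
     for i in range(3)]

def rank(mat):
    """Rank over the rationals, by Gaussian elimination with exact fractions."""
    m = [[Fraction(x) for x in row] for row in mat]
    r = 0
    for c in range(len(m[0])):
        piv = next((i for i in range(r, len(m)) if m[i][c]), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(len(m)):
            if i != r and m[i][c]:
                fct = m[i][c] / m[r][c]
                m[i] = [a - fct * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

print(rank(M))  # 2 = v(H)^1
```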
Theorem 6.36 has a number of essentially equivalent formulations, which are
interesting in their own right. One of these characterizes homomorphism functions
of the form homφ (F, H).
Theorem 6.38. Let H be a twin-free weighted graph and h : V (H)k → R. Then
there exists a k-labeled quantum graph z such that homφ (z, H) = h(φ) for every
φ ∈ V (H)k if and only if h is invariant under the automorphisms of H: for every
φ ∈ V (H)k and every automorphism σ of H, h(σ ◦ φ) = h(φ).
Another variant of these theorems gives a combinatorial description of the basic
idempotents p1 , . . . , pn in the algebra Qk /H, which played an important role in the
proof of the characterization theorem. For every φ ∈ V (H)k , we have
homφ (pi , H) = homφ (p2i , H) = homφ (pi , H)2 ,
and hence homφ (pi , H) ∈ {0, 1}. Furthermore, for i ̸= j, we have
homφ (pi , H)homφ (pj , H) = homφ (pi pj , H) = 0,
and hence the sets Φi = {φ ∈ V (H)k : homφ (pi , H) = 1}, which we call idempotent
supports, are disjoint. Since
∑_φ αφ (H)homφ (pi , H) = hom(pi , H) = hom(p_i², H) > 0,
the idempotent supports are nonempty. We have ∑_i pi = Ok , and hence
∑_{i=1}^{n} homφ (pi , H) = homφ (Ok , H) = 1,
Indeed, for any (k − 1)-labeled graph F and the graph F1 obtained from F
by adding a new isolated node labeled k, we have homφ′ (F, H) = homφ (F1 , H) =
homψ (F1 , H) = homψ′ (F, H).
Claim 6.42. Suppose that φ, ψ ∈ [q]k are equivalent. Then for every µ ∈ [q]k+1
such that φ = µ′ there exists a ν ∈ [q]k+1 such that ψ = ν ′ and µ and ν are
equivalent.
Indeed, let µ belong to the support of the basic idempotent p ∈ Qk+1 /H, then
for every ν ∈ V (H)^{k+1} we have homν (p, H) = 1(ν ∼ µ). Let p′ be obtained by
unlabeling k + 1 in p. Then
(6.31) homφ (p′ , H) = ∑_{η: η′=φ} α_{η(k+1)} (H)homη (p, H) = ∑_{η: η′=φ, η∼µ} α_{η(k+1)} (H),
and similarly
(6.32) homψ (p′ , H) = ∑_{η: η′=ψ, η∼µ} α_{η(k+1)} (H).
These two numbers are equal since φ ∼ ψ. Since the right side of (6.31) is positive,
this implies that the sum in (6.32) is nonempty, and hence there is a map ν such
that ν ′ = ψ and ν ∼ µ.
The next observation makes use of the twin-free assumption.
Claim 6.43. Every map σ : [q] → [q] such that βσ(i)σ(j) = βij for every i, j ∈ [q]
is bijective.
To prove this, note that the mapping σ has some power γ = σ^s that is idem-
potent. Then for all i, j ∈ [q], we have β_{ij} = β_{γ(i)γ(j)} = β_{γ²(i)γ(j)} = β_{γ(i)j} , which
shows that i and γ(i) are twins for all i ∈ [q]. Since H is twin-free, this implies
that γ is the identity, and so σ must be bijective.
After this preparation, we prove the theorem for larger and larger classes of
mappings.
Case 1: φ is bijective. Then k = q. We may assume that the nodes of H are
labeled so that φ is the identity, and then we want to prove that ψ (viewed as a
map of V (H) into itself) is an automorphism of H. First, we show that
(6.33) βij = βψ(i)ψ(j)
for every i, j ∈ [k]. Indeed, let kij be the k-labeled graph consisting of k nodes and
a single edge connecting nodes i and j. Then βij = homφ (kij , H) = homψ (kij , H) =
βψ(i)ψ(j) . It follows by Claim 6.43 that ψ is also bijective.
Second, we show that for every j ∈ [k],
(6.34) αj = αψ(j) .
It suffices to prove this for the case j = k. For the graph Ok−1 consisting of k − 1
isolated labeled nodes,
homφ′ (Ok−1 , H) = ∏_{j=1}^{k−1} αj ,
Proof of Theorems 6.36, 6.38, and 6.39. Theorem 6.39 is trivially equiv-
alent to Theorem 6.40 by the description of idempotent supports. The “only if” part
of Theorem 6.38 is also trivial. To prove the “if” part, notice that every function
h : V (H)k → R invariant under automorphisms can be written as a linear combi-
nation of indicator functions of the orbits of the automorphism group. By Theorem
6.39, this means that it is a linear combination of the functions homφ (pi , H), and
hence it is of the form homφ (z, H) with some z ∈ Qk .
Finally, it follows that the number of orbits of the automorphism group of H on
V (H)k is the number of the idempotents pi , which is r(f, k), which proves Theorem
6.36.
loops at v1 and v2 , which get some weight β different from all other edgeweights.
This last trick is needed to make sure that the graph H is twin-free.
We claim that for every 1-labeled graph F
(6.35) homv1 (F, H) = homv2 (F, H).
Indeed, if F is not connected, then those components not containing the labeled
node contribute the same factors to both sides. So it suffices to prove (6.35) when
F is connected. Then we have
homv1 (F, H) = ∑_{v1∈S⊆V (F )} β^{e_F(S)} hom(F \ S, H1 ).
a map into V (H1 ) (else, the contribution of the map to hom(F, H) is 0), and the
contribution of φ to homv1 (F, H) is the product of contributions from the edges
induced by S and the contribution of φ′ to hom(F \ S, H).
Since homv2 (F, H) can be expressed by a similar formula, and the sums on the
right hand sides are equal by hypothesis, this proves (6.35).
Now (6.35) can be phrased as the maps 1 7→ v1 and 1 7→ v2 are equivalent, and
so Theorem 6.40 implies that there is an automorphism of H mapping v1 to v2 .
This automorphism gives an isomorphism between H1 and H2 .
6.4.2. The size of basis graphs. Every element of the factor algebra Qk /H
has many representations as a quantum graph in Qk . The following theorem asserts
that it has a representation whose constituents are (in a sense) small.
Theorem 6.45. Let H be a weighted graph with V (H) = [q]. The algebra Qk /H
is generated by simple k-labeled graphs with at most 2(k + q²)q⁶ nodes, in which the
labeled nodes form a stable set.
Proof. Let F = (V, E) be any k-labeled graph; we construct a simple k-labeled
quantum graph x, where each constituent has no more than 2(k + q²)q⁶ nodes, and
F ≡ x (mod H).
Let z be a 2-labeled quantum graph such that homφ (z, H) = 1(φ(1) = φ(2))
for all φ : {1, 2} → [q]. (So z is very similar to a contractor. We have z² = z,
but z ◦ z ≠ z.) We may assume that every constituent of z has at most 2q⁶ nodes
(Supplement 6.29 and the Remark after it). Let z̄ = O2 − z. Let w be a simple
connector; we can assume that w is a linear combination of paths of length at least
3 and at most q + 3 by Exercise 6.46.
Let us glue a copy of O2 on every pair of distinct nodes of V ; this does not
change F . But we can expand every O2 as O2 = z + z̄, and obtain a representation
of F as a sum of quantum graphs xℓ (ℓ = 1, . . . , 2^{|V|(|V|−1)/2} ), each of which is obtained
from F by gluing either z or z̄ on every pair of nodes in V .
Many of these terms will be 0. For any term xℓ , let Gℓ denote the graph on
V in which two nodes are connected if and only if they have a copy of z glued on.
If (i, j) and (j, k) have z glued on, but (i, k) has z̄, then the union of these three
is 0 as a quantum graph in Q3 (this is easy to check; cf. Exercise 6.32). Hence if
xℓ is a nonzero term, then adjacency must be transitive in Gℓ , and so Gℓ consists
of disjoint complete graphs. If Gℓ has more than q components, then any map
V → V (H) will collapse two nodes of V on which a z̄ is glued, and hence xℓ = 0.
So we are left with only those terms in which Gℓ consists of at most q disjoint
complete graphs. Let V = V1 ∪ · · · ∪ Vr be the partition into the node sets of these
components (r ≤ q).
Let us select a representative node vi from every Vi . It is easy to see that
deleting the copies of z except those which are attached to a vi , and also the copies
of z̄ except those connecting two nodes vi , does not change xℓ .
If uv ∈ E with u ∈ Vi and v ∈ Vj (i ̸= j), then we can “slide” this edge to
vi vj without changing xℓ (cf. Exercise 6.31). If u, v ∈ Vi , then we replace the edge
uv by a simple connector w in which the labeled nodes are at a distance at least 3
(cf. Exercise 6.46), and then slide both attachment nodes to vi , to get a copy of w′
hanging from vi .
Each constituent of the resulting quantum graph consists of a “core”, the set
of the nodes vi and the set of labeled nodes, at most k + q nodes altogether. What
is not bounded is the sets of edges connecting a vi and a vj , the sets of copies of w′
hanging from a vi , and the copies of z connecting vi to other nodes in Vi .
However, we can get rid of these unbounded multiplicities. First, a set of q² or
more parallel edges can be replaced by a linear combination of sets of parallel edges
with multiplicity at most q² − 1, by Exercise 6.6(b). By a similar argument, a set of
q or more copies of w′ hanging from the same node vi can be expressed as a linear
combination of sets of at most q − 1 copies. Finally, again by the same argument,
a set of q or more copies of z connecting vi to unlabeled nodes can be expressed as
a linear combination of sets of at most q − 1 copies. So we are left with at most
(q(q − 1)/2)(q² − 1) edges that may be parallel to others, at most q(q − 1) hanging copies of
w′ , and at most k + q(q − 1) copies of z.
We get rid of the edge multiplicities by replacing each edge between core nodes
by a simple connector w.
After that, each constituent will be a simple graph. By the choice of z and q,
the number of nodes in each constituent will be bounded by
k + q + (q + 2)\binom{q}{2}(q^2 − 1) + (q + 2)q(q − 1) + (k + q(q − 1))(2q + 6) < 2(k + q^2)q^6.
As an application of the previous theorem, we prove Theorem 5.33 in its full
strength, including the bounds on the sizes of the graphs needed.
Proof of Theorem 5.33. Following the proof of Corollary 6.44, we have to
show that (6.35) holds. We do know that it holds for every F with at most 2(v(H1 )+
v(H2 ) + 3)8 nodes. Since this includes all basis graphs of Q1 /H by Theorem 6.45,
it follows that (6.35) holds for all simple 1-labeled graphs F . From here, the proof
is unchanged.
Exercise 6.46. Prove that for every weighted graph H with q nodes and every
t ≥ 2, hom(., H) has a connector whose constituents are P_t^{••}, P_{t+1}^{••}, . . . , P_{t+q}^{••}.
Exercise 6.47. Prove that for every weighted graph H, hom(., H) has a contrac-
tor whose constituents are series-parallel graphs.
Suppose that such an expression has been computed for every proper descen-
dant of i. The ki -labeled graph Fi is obtained from Gi by attaching different
branches Fj at the sets Sj . We already know how to express each Fj in the basis
Bkj ; let us substitute this expression for Fj , to get a representation of Fi as a linear
combination of graphs, each of which consists of Gi with some number of basis
graphs attached at various subsets S ⊆ V (Gi ) with |S| ≤ k. If two or more basis
graphs are attached at the same set S, we can replace them by one, since we have
precomputed products of basis graphs. But then we have a linear combination of
ki -labeled graphs of the type we have already expressed in the basis Bki .
When we get to the root, we consider it 0-labeled, and get an expression for G
in the basis B0 , which yields the value f (G).
Lemma 6.49. The polynomials hom(g, X), where g ∈ Q0, form the space C[X]^{S_q}.
Proof. Let X^a = ∏_{i≤j} x_{ij}^{a_{ij}} be any monomial, and let G denote the multigraph
on [q] in which nodes i and j are connected by a_{ij} edges. Then inj(G, X) =
6.6. THE POLYNOMIAL METHOD 109
∑_{σ∈S_q} (X^σ)^a. Since every polynomial in C[X]^{S_q} can be written as a linear
combination (with constant coefficients) of such special polynomials, it follows that
every polynomial in C[X]^{S_q} can be written as inj(g, X) for some quantum graph g.
identity (5.18) (which remains valid if the graph G is replaced by the matrix X),
this implies the Lemma.
Next, we describe quantum graphs g with hom(g, X) = 0 (identically 0 as a
polynomial in the entries of X). Note that if we remove an isolated node from a
constituent of any quantum graph g, and multiply its coefficient by q, then we get
a quantum graph g′ such that hom(g, X) = hom(g′, X). Let us call the repeated
application of this operation isolate removal.
Lemma 6.50. A quantum graph g satisfies hom(g, X) = 0 if and only if there
is a quantum graph h in which all constituents have more than q nodes such that
removing isolates from g we obtain M h.
Proof. If g = M h, where all constituents of h have more than q nodes, then
hom(g, X) = inj(h, X) = 0. Isolate removal does not change the value of hom(g, X).
Conversely, suppose that hom(g, X) = 0. We may assume that the constituents
of g have no isolated nodes. We have inj(Zg, X) = hom(g, X) = 0. If Zg has a
constituent with at most q nodes, then this produces in inj(Zg, X) a term which
does not cancel (here we use that the constituent has no isolated nodes). So all
constituents of Zg have more than q nodes, and we can take h = Zg.
Now we are ready to prove Theorems 5.56 and 5.57.
Proof of Theorem 5.56. Multiplicativity implies that f(K_0) = 1 (since f is
not identically 0), and f(GK_1) = q f(G), where q := f(K_1).
We want to prove that f = hom(., A) for an appropriate symmetric complex
matrix A; in other words, we want to show that the polynomial equations
(6.36) hom(G, X) − f (G) = 0 (for all looped multigraphs G)
are solvable for the variables xij (1 ≤ i, j ≤ q) over the complex numbers. We are
going to use Hilbert’s Nullstellensatz for this, but we need some preparation. We
begin with relating the kernel of the map hom(., X) to the kernel of f .
Claim 6.51. If hom(g, X) = c (a constant polynomial) for some quantum graph g,
then f (g) = c.
First we consider the case when c = 0. We may assume that g has no isolated
nodes, since isolate removal does not change the values hom(g, X) and f (g). By
Lemma 6.50, g = M h for some quantum graph h in which all constituents have
more than q nodes. But then f (M h) = 0 by the hypothesis of the Theorem.
The case of general constant c follows easily: we have hom(g − cK0 , X) =
hom(g, X) − c = 0, and hence f (g) = f (g − cK0 ) + c = c.
Claim 6.52. The ideal generated by the polynomials hom(g, X) with f (g) = 0 does
not contain the constant polynomial 1.
Suppose that we have a representation
1 = ∑_{i=1}^N p_i(X) hom(g_i, X),
where f (gi ) = 0, and the pi are arbitrary polynomials in C[X]. Let us apply a
permutation σ ∈ Sq to the variables, and sum over all σ. We get:
q! = ∑_{i=1}^N ∑_{σ∈S_q} p_i(X^σ) hom(g_i, X^σ) = ∑_{i=1}^N ( ∑_{σ∈S_q} p_i(X^σ) ) hom(g_i, X).
Since f is reflection positive, this value must be nonnegative for every k, which
implies that q is a nonnegative integer.
Claim 6.54. If G is a multigraph with k = v(G) > q, then f (M G) = 0.
Let us label the nodes of G by [k], to get a k-labeled graph. Then M G = [[hk G]],
and so f (M G) = ⟨hk , G⟩. Equation (6.37) implies that ⟨hk , hk ⟩ = 0, which (using
reflection positivity again) implies that ⟨hk , G⟩ = 0, which proves the Claim.
So Theorem 5.56 applies, and we get that there exists a symmetric matrix
A ∈ Cq×q for which f = hom(., A). To complete the proof, we have to show:
Exercise 6.56. Show by an example that hom(G, Z) can be real for every multigraph
G even though the matrix Z is not real.
Part 3
The aim of this Chapter is to introduce certain analytic objects, which will
serve as limit objects for graph sequences in the dense case. In the Introduction
(Section 1.5.3) we already gave an informal description of how these graphons enter
the picture as limit objects; however, for the next few chapters we will not talk
about graph sequences, but we treat graphons as generalizations of graphs, to
which many graph-theoretic definitions and results can be extended. Quite often,
the formulation and even the proof of these more general facts are easier in this
analytic setting. We will define the cut norm and cut-distance of these objects,
state and prove regularity lemmas for them, and prove basic properties of sampling
from them. These results will enable us to show that these are just the right objects
to represent the limits of convergent dense graph sequences.
(7.1) d_W(x) = ∫_0^1 W(x, y) dy.
(If the graphon is associated with a simple graph G, this corresponds to the scaled
degree dG (x)/v(G).) We will see more such quantities in the next sections.
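As a quick numerical sketch of our own (not from the book): for the graphon W(x, y) = xy, formula (7.1) gives d_W(x) = x/2. The snippet below checks this with a midpoint Riemann sum; the function name `degree_function` is ours.

```python
import numpy as np

def degree_function(W, x, n=20000):
    """Approximate d_W(x) = integral_0^1 W(x, y) dy by a midpoint Riemann sum."""
    y = (np.arange(n) + 0.5) / n      # midpoints of n equal subintervals of [0,1]
    return W(x, y).mean()

W = lambda x, y: x * y                # the graphon W(x, y) = xy
for x in [0.0, 0.3, 1.0]:
    assert abs(degree_function(W, x) - x / 2) < 1e-6
```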
Instead of the interval [0, 1], we can consider any probability space (Ω, A, π)
with a symmetric measurable function W : Ω × Ω → [0, 1]. This would not provide
substantially greater generality, but it is sometimes useful to represent graphons by
probability spaces other than [0, 1]. We’ll discuss this in detail in Chapter 13, but
will use this different way of representing a graphon throughout.
Graphons will come up in several quite different forms in our discussions. In
Theorem 11.52 we will collect the many disguises in which they occur.
We can think of the interval [0, 1] as the set of nodes, and of the value W (x, y)
as the weight of the edge xy. Then the formula above is an infinite analogue of
weighted homomorphism numbers. We get weighted graph homomorphisms as a
special case when W is a stepfunction: for every unweighted multigraph F and
weighted graph G,
(7.2) t(F, W_G) = t(F, G).
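As a concrete elementary illustration of our own: the homomorphism density t(F, G) = hom(F, G)/v(G)^{v(F)} is a finite average over all maps V(F) → V(G), and for the stepfunction W_G the defining integral reduces to exactly this average, which is why passing from G to W_G loses no homomorphism information. The function name below is ours.

```python
from itertools import product

def hom_density(F_edges, nF, A):
    """t(F, G) = hom(F, G) / v(G)^v(F): average over all maps V(F) -> V(G)
    of the product of edge indicators (or edge weights) of G."""
    q = len(A)
    total = 0
    for phi in product(range(q), repeat=nF):   # all maps V(F) -> V(G)
        p = 1
        for i, j in F_edges:
            p *= A[phi[i]][phi[j]]
        total += p
    return total / q ** nF

# G = K3 (triangle): t(K2, K3) = 6/9 = 2/3 and t(K3, K3) = 6/27 = 2/9
K3 = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
assert abs(hom_density([(0, 1)], 2, K3) - 2 / 3) < 1e-12
assert abs(hom_density([(0, 1), (1, 2), (0, 2)], 3, K3) - 2 / 9) < 1e-12
```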
Of the two modified versions of homomorphism densities (5.12) and (5.13), the
notion of the injective density tinj has no significance in this context, since a random
assignment i 7→ xi (i ∈ V (F ), xi ∈ [0, 1]) is injective with probability 1. In other
words, tinj (F, W ) = t(F, W ) for any kernel W and any graph F . But the induced
subgraph density is worth defining, and in fact it can be expressed by a rather
7.2. GENERALIZING HOMOMORPHISMS 117
simple integral:
(7.3) t_ind(F, W) = ∫_{[0,1]^V} ∏_{ij∈E} W(x_i, x_j) ∏_{ij∈\binom{V}{2}∖E} (1 − W(x_i, x_j)) ∏_{i∈V} dx_i.
We should point out that tinj (F, WH ) ̸= tinj (F, H) and tind (F, WH ) ̸= tind (F, H) in
general. We have seen that tinj (F, WH ) = t(F, WH ) = t(F, H). For the induced
density, tind (F, WH ) has a combinatorial meaning if H is a looped-simple graph:
it is the probability that a random map V (F ) → V (H) (not necessarily injective)
preserves both adjacency and nonadjacency.
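This probabilistic meaning can be checked directly for a looped-simple H: the integral (7.3) applied to the stepfunction W_H becomes an average over all maps V(F) → V(H). A sketch of ours (function name hypothetical):

```python
from itertools import combinations, product

def tind_density(F_edges, nF, A):
    """t_ind(F, W_H): probability that a uniform random map V(F) -> V(H)
    preserves both adjacency and nonadjacency (H looped-simple, A its 0/1 matrix)."""
    q = len(A)
    E = set(map(tuple, map(sorted, F_edges)))
    good = 0
    for phi in product(range(q), repeat=nF):
        ok = all((A[phi[i]][phi[j]] == 1) == ((i, j) in E)
                 for i, j in combinations(range(nF), 2))
        good += ok
    return good / q ** nF

K3 = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
# F = K2: the two images must be distinct nodes of K3, giving 6/9 = 2/3
assert abs(tind_density([(0, 1)], 2, K3) - 2 / 3) < 1e-12
# F = empty graph on 2 nodes: the two images must coincide, giving 3/9 = 1/3
assert abs(tind_density([], 2, K3) - 1 / 3) < 1e-12
```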
Many other basic properties of homomorphism numbers extend to graphons,
often to kernels, in a straightforward way; for example, (5.19) generalizes to
(7.5) t(F, W) = ∑_{F′⊇F} t_ind(F′, W).
This shows that we can still identify a signed graph F = (V, E^+, E^−) with the
quantum graph ∑_{Y⊆E^−} (−1)^{|Y|} (V, E^+ ∪ Y).
If all edges are signed “+”, then t(F, W ) is the same as for unsigned graphs.
If Fb is the signed complete graph, obtained from an unsigned simple graph F on
the same node set, in which the edges of F are signed positive and the edges of the
complement are signed negative, then we get the following identity, equivalent to
(7.4):
Proposition 7.1. The graph parameter t(., W ) is multiplicative and reflection pos-
itive for every kernel W ∈ W. The corresponding simple graph parameter is also
multiplicative, and it is reflection positive if W ∈ W0 .
Proof. The second assertion is more difficult to prove, and we describe the
proof in this case only. Multiplicativity is trivial. To prove that t(., W ) is reflection
positive, consider any finite set F1 , . . . , Fm of k-labeled graphs, and real numbers
y1 , . . . , ym . We want to prove that
∑_{p,q=1}^m t([[F_p F_q]], W) y_p y_q ≥ 0.
For every k-labeled graph F with node set [n], let F ′ denote the subgraph of F
induced by the labeled nodes, and F ′′ denote the graph obtained from F by deleting
the edges spanned by the labeled nodes. Then we have
(7.13) ∑_{p,q=1}^m y_p y_q t([[F_p F_q]], W)
= ∫_{[0,1]^k} ∑_{p,q=1}^m y_p y_q t_x(F_p′′, W) t_x(F_q′′, W) t_x(F_p′ ∪ F_q′, W) dx.
We substitute t_x(F_p′ ∪ F_q′, W) = ∑_H t_ind,x(H, W), where the summation extends
over all graphs H on [k] containing F_p′ ∪ F_q′ as a subgraph. Interchanging summation,
we get
(7.14) ∑_{p,q=1}^m y_p y_q t([[F_p F_q]], W)
= ∫_{[0,1]^k} ∑_H ∑_{p,q: F_p′, F_q′ ⊆ H} y_p y_q t_x(F_p′′, W) t_x(F_q′′, W) t_ind,x(H, W) dx.
If we integrate over all the x_u, every term cancels in which the orientation is not
eulerian, i.e., where some node u has d^+_{\vec F}(u) − d^−_{\vec F}(u) ≠ 0. The terms
corresponding to eulerian orientations contribute 1. So the sum counts eulerian
orientations.
120 7. KERNELS AND GRAPHONS
We can generalize the functional t(F, W ) further (believe me, not for the sake
of generality). Let A be a set of kernels. An A-decorated graph is a finite simple
graph F = (V, E) in which every edge e ∈ E is labeled by a function We ∈ A. We
write w = (W_e : e ∈ E). For every W-decorated graph (F, w) we define
(7.16) t(F, w) = ∫_{[0,1]^V} ∏_{ij∈E} W_{ij}(x_i, x_j) ∏_{i∈V} dx_i.
For a fixed graph F , the functional t(F, w) is linear in every edge decoration We .
So it may be considered as linear functional on the tensor product W ⊗ · · · ⊗ W
(one factor for every edge of F ), or equivalently, as a tensor on W with e(F ) slots.
This definition contains some of the previous variations on homomorphism
numbers, and it can be used to express homomorphism densities in sums of kernels.
Exercise 7.6. Let F and G be two simple graphs, and let W be a graphon such
that t(F, G) > 0 and t(G, W ) > 0. Prove that t(F, W ) > 0. [Hint: Use the
Lebesgue Density Theorem.]
Exercise 7.7. Prove that for any two simple graphs F and G with v(F) ≤ v(G)
we have
|t_ind(F, G) − t_ind(F, W_G)| ≤ \binom{v(F)}{2} / v(G).
Exercise 7.8. Let us generalize the construction of graph integrals by adding
“nodeweights”: for every graph F and bounded measurable functions α : [0, 1] →
R and W : [0, 1]2 → R (where W is symmetric), we define
t(F, α, W) = ∫_{[0,1]^{V(F)}} ∏_{i∈V(F)} α(x_i) ∏_{ij∈E(F)} W(x_i, x_j) dx.
by (A.16) in the Appendix. Applying this equation to the function
f(x_1, . . . , x_n) = ∏_{ij∈E} W(x_i, x_j), we get the assertion.
We want to say that W and W φ are “weakly isomorphic”. One has to be a little
careful though, because measure preserving maps are not necessarily invertible,
and so the relationship between W and W φ in Proposition 7.10 is not symmetric
(see Example 7.11). For the time being, we take the easy way out, and call two
kernels U and W weakly isomorphic if t(F, U ) = t(F, W ) for every simple graph F .
We will come back to a characterization of weakly isomorphic kernels in terms of
measure preserving maps (in other words, proving a certain converse of Proposition
7.10) in Sections 10.7 and 13.2. It will also follow that in this case the equation
t(F, U ) = t(F, W ) holds for all multigraphs F (see Exercise 7.18 for a direct proof).
Weak isomorphism of kernels is clearly an equivalence relation, and we can iden-
tify kernels that are weakly isomorphic. This identification will play an important
role in our discussions.
W        W^{φ2}        W^{φ3}
Figure 7.1. Gray-scale images of three graphons that are weakly
isomorphic, but not isomorphic up to a null set. Recall that the
origin is in the upper left corner.
This example illustrates that weak isomorphism is not a very easy notion. We
will return to it and develop more and more information about it when we introduce
distances between graphons, sampling, twin reduction, and other tools in the theory
of graphons.
Exercise 7.12. Suppose that two kernels U and W are weakly isomorphic. Prove
that so are the kernels aU + b and aW + b (a, b ∈ R).
Exercise 7.13. Prove that the kernels W, W^{φ2} and W^{φ3} in Example 7.11 are
weakly isomorphic, but not isomorphic up to a null set.
Every kernel can be written as the direct sum of connected kernels and perhaps the
0 kernel. (We have to allow the 0 kernel, which cannot be written as the sum of
7.4. SUMS AND PRODUCTS 123
connected kernels.) This decomposition is unique (up to zero sets); see Bollobás,
Janson and Riordan [2007] and Janson [2008] for more.
Somewhat confusingly, we can introduce three “product” operations on kernels,
and we will need all three of them. Let U, W ∈ W. We denote by U W their
(pointwise) product as functions, i.e.,
(U W )(x, y) = U (x, y)W (x, y).
We denote by U ◦ W their operator product (the name refers to the fact that this is
the product of U and W as kernel operators, see Section 7.5)
(U ◦ W)(x, y) = ∫_0^1 U(x, z)W(z, y) dz.
We note that U ◦ W is not symmetric in general, but it will be in the cases we use
this operation (for example, when U = W ).
Finally, we denote by U ⊗ W their tensor product; this is defined as a function
[0, 1]2 × [0, 1]2 → [0, 1] by
(U ⊗ W )(x1 , x2 , y1 , y2 ) = U (x1 , y1 )W (x2 , y2 ).
This function is not defined on [0, 1]2 and hence it is not in W; however, we can
consider any measure preserving map φ : [0, 1] → [0, 1]2 , and define the kernel
(U ⊗ W)^φ(x, y) = (U ⊗ W)(φ(x), φ(y)).
It does not really matter which particular measure preserving map we use here:
these kernels obtained from different maps φ are weakly isomorphic by the same
computation as used in the proof of Proposition 7.10, and so we can call any of
them the tensor product of U and W .
We note that the tensor product has the nice property that
(7.17) t(F, U ⊗ W ) = t(F, U )t(F, W )
for every multigraph F .
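For stepfunctions, U ⊗ W is again a stepfunction whose step matrix is the Kronecker product of the two step matrices, so (7.17) can be verified by a finite computation. A sketch of ours (helper name hypothetical):

```python
import numpy as np
from itertools import product

def t_density(F_edges, nF, B):
    """Homomorphism density t(F, W_B) of the stepfunction with step matrix B."""
    q = len(B)
    return sum(np.prod([B[phi[i], phi[j]] for i, j in F_edges])
               for phi in product(range(q), repeat=nF)) / q ** nF

rng = np.random.default_rng(0)
U = rng.random((2, 2)); U = (U + U.T) / 2        # two random symmetric step matrices
W = rng.random((3, 3)); W = (W + W.T) / 2
C3 = [(0, 1), (1, 2), (2, 0)]                    # the triangle
lhs = t_density(C3, 3, np.kron(U, W))            # t(C3, U tensor W)
rhs = t_density(C3, 3, U) * t_density(C3, 3, W)  # t(C3, U) t(C3, W)
assert abs(lhs - rhs) < 1e-9
```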
We denote the n-th power of a kernel according to these three multiplications
by W n (pointwise power), W ◦n (operator power), and W ⊗n (tensor power).
There are many other properties and constructions for graphs that can be
generalized to graphons in a natural way. For example, we call a graphon W
bipartite, if there is a partition [0, 1] = V1 ∪ V2 into measurable sets such that
W(x_1, x_2) = 0 for almost all (x_1, x_2) ∈ V1 × V1 and for almost all (x_1, x_2) ∈
V2 × V2. We can define k-colorable kernels similarly. We call a
graphon triangle-free, if t(K3 , W ) = 0. Simple facts like “every bipartite graphon is
triangle-free” can be proved easily. Often one faces minor complications because of
exceptional nullsets; a rather general remedy for this problem, called pure graphons,
will be introduced in Section 13.3.
Exercise 7.14. Show that for every simple graph F , t(F, W ◦n ) = t(F ′ , W ), where
F ′ is obtained from F by subdividing each edge by n − 1 new nodes.
Exercise 7.15. Prove that connectivity of a graphon is invariant under weak
isomorphism.
Exercise 7.16. Prove that a graphon W is bipartite if and only if t(C2k+1 , W ) = 0
for all k ≥ 1.
Some subgraph densities have nice expressions in terms of this spectrum. Gen-
eralizing (5.31), we have
(7.22) t(C_n, W) = ∫_{[0,1]^n} W(x_1, x_2) · · · W(x_{n−1}, x_n) W(x_n, x_1) dx_1 . . . dx_n = ∑_k λ_k^n,
where
(7.26) M_χ(v) = ∫_0^1 ∏_{u: uv∈E} f_{χ(uv)}(x) dx.
(One has to be careful, since (7.19) only converges in L2 , not necessarily almost
everywhere. But using (7.21) we can substitute for the values W (xi , xj ) one by one.)
This representation expresses t(F, W ) in an infinite “edge-coloring model”, which is
analogous to homomorphism numbers with the role of nodes and edges interchanged
(see Section 23.2 for a discussion of finite edge-coloring models): we sum over all
colorings of the edges with N; for every coloring, we take the product of nodeweights
and the product of edgeweights; the edgeweights are just the eigenvalues, and the
weight of a node is computed from the colors of the edges incident with it.
One consequence of (7.22) is that the cycle densities in W determine the spec-
trum of TW and vice versa. In fact, we don’t have to know all cycle densities: any
“tail” (t(Ck , W ) : k ≥ k0 ) is enough. This follows from Proposition A.21 in the
Appendix. In particular, we see that t(C2 , W ) = ∥W ∥22 is determined by the cycle
densities t(Ck , W ), k ≥ 3.
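For a stepfunction with q equal steps and symmetric step matrix B, the operator T_W has the same nonzero eigenvalues as B/q, so (7.22) can be verified by a finite computation: the integral becomes a sum over index maps, and the right side a sum of eigenvalue powers. A sketch of ours:

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(1)
q, n = 4, 3
B = rng.random((q, q)); B = (B + B.T) / 2     # symmetric step matrix of W

# direct evaluation of the integral in (7.22) as a finite sum over index maps
t_direct = sum(np.prod([B[phi[k], phi[(k + 1) % n]] for k in range(n)])
               for phi in product(range(q), repeat=n)) / q ** n

# for q equal steps, T_W has the eigenvalues of B/q (plus zeros)
eigs = np.linalg.eigvalsh(B / q)
assert abs(t_direct - np.sum(eigs ** n)) < 1e-10
```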
Exercise 7.18. (a) Let F = (V, E) be a multigraph without loops, and let us
subdivide each edge e ∈ E by m(e) ≥ 0 new nodes, to get a multigraph F ′ . Show
that using (7.24) the density of F ′ in W can be expressed by a formula similar
to (7.25). (b) Show that the densities of simple graphs in a kernel determine the
densities of multigraphs.
Exercise 7.19. Let W be a graphon. Prove that (a) all eigenvalues of TW
are contained in the interval [−1, 1]; (b) the largest eigenvalue is also largest
in absolute value; (c) at least one of the eigenvectors belonging to the largest
eigenvalue is nonnegative almost everywhere.
CHAPTER 8
We have announced in the Introduction that we are going to define the distance
of two arbitrary graphs, so that this distance will reflect structural similarity. The
definition is quite involved, and we will approach the problem in several steps:
starting with two graphs on the same node set, then moving to graphs with the
same number of nodes (but on unrelated sets of nodes), then moving to the general
case. Finally, we extend the definition to kernels, where it will turn out simpler (at
least in words) than in the finite case.
In this section we consider dense graphs. The definitions are of course valid for
all graphs, but they give a distance of o(1) between two graphs with edge density
o(1), so they are not useful in that setting.
(Note the normalization for the ℓ1 and ℓ2 norms: when A is an adjacency matrix, all
these norms are between 0 and 1.)
Our main tool will be a less standard norm, called the cut norm, which was
introduced by Frieze and Kannan [1999]. This is defined by
(8.4) ∥A∥_□ = (1/n^2) max_{S,T⊆[n]} |∑_{i∈S, j∈T} A_{ij}|.
It is clear that
(8.5) ∥A∥_□ ≤ ∥A∥_1 ≤ ∥A∥_2 ≤ ∥A∥_∞.
Example 8.1. Let A be an n × n matrix whose entries are independent random
±1's (with expectation 0). Then ∥A∥_1 = ∥A∥_2 = ∥A∥_∞ = 1. On the other
hand, for fixed S and T the expectation of ∑_{i∈S,j∈T} A_{ij} is 0 and its variance
is Θ(n^2), and so the expectation of |∑_{i∈S,j∈T} A_{ij}| is Θ(n). The expectation
of the maximum in (8.4)
128 8. THE CUT DISTANCE
is more difficult to compute, but using the Chernoff–Hoeffding inequality, one gets
that ∥A∥_□ < 4n^{−1/2} with high probability.
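For small n the maximum in (8.4) can be found by brute force: for each S, the best T consists of exactly the columns with positive (or exactly those with negative) restricted column sums, so 2^n subsets suffice. A sketch of ours confirming that a random ±1 matrix has small cut norm compared with its normalized ℓ1 norm:

```python
import numpy as np
from itertools import product

def cut_norm(A):
    """Brute-force cut norm (8.4): max over S, T of |sum_{i in S, j in T} A_ij| / n^2."""
    A = np.asarray(A, dtype=float)
    n = len(A)
    best = 0.0
    for S in product([False, True], repeat=n):
        col = A[np.array(S)].sum(axis=0)          # column sums over the rows in S
        # for fixed S, the optimal T takes exactly the positive
        # (or exactly the negative) column sums
        best = max(best, col[col > 0].sum(), -col[col < 0].sum())
    return best / n ** 2

rng = np.random.default_rng(2)
A = rng.choice([-1.0, 1.0], size=(10, 10))
assert cut_norm(np.ones((5, 5))) == 1.0           # the full S, T is optimal here
assert cut_norm(A) <= np.abs(A).mean() + 1e-12    # (8.5): cut norm <= l1 norm
```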
Alon and Naor [2006] relate the cut norm of a symmetric matrix to its
Grothendieck norm (well known in functional analysis). It follows by the results
of Grothendieck that the cut norm is between two absolute constant multiples of
the Grothendieck norm. The Grothendieck norm can be viewed as a semidefinite
relaxation of the cut norm, and it is polynomial time computable to an arbitrary
precision. So we can compute, in polynomial time, an approximation of the cut
norm with a multiplicative error less than 2. We don’t go into the details of these
results here; in our setting it will be more important to approximate the cut norm
by a randomized sampling algorithm, to be described in Section 10.3.
We’ll say more about approximation of the cut norm in the more general setting
of graphons in Section 14.1.
8.1.2. Two graphs on the same set of nodes. Let G and G′ be two graphs
with a common node set [n]. From any of the matrix norms introduced above, the
norm of the difference of their adjacency matrices defines a distance between two
graphs. Two of these distances have special significance.
The ℓ1 distance
d_1(G, G′) = |E(G) △ E(G′)| / n^2 = ∥A_G − A_{G′}∥_1
is also called the edit distance (usually without the normalization). It can be
thought of as the fraction of pairs of nodes whose adjacency we have to toggle
to get from one graph to the other.
The cut metric derived from the cut norm can be described combinatorially
as follows. For an unweighted graph G = (V, E) and sets S, T ⊆ V , let eG (S, T )
denote the number of edges in G with one endnode in S and the other in T (the
endnodes may also belong to S ∩ T ; so eG (S, S) = 2eG (S) is twice the number of
edges spanned by S). For two graphs G and G′ on the same node set [n], we define
their cut distance (as labeled graphs) by
d_□(G, G′) = max_{S,T⊆V(G)} |e_G(S, T) − e_{G′}(S, T)| / n^2 = ∥A_G − A_{G′}∥_□.
In this setting dividing by |S|×|T | instead of n2 might look more natural. However,
dividing by |S|×|T | would emphasize small sets too much, and the maximum would
be attained when |S| = |T | = 1. With our definition, the contribution of a pair
S, T is at most |T ||S|/n2 (for simple graphs).
It is easy to see that d_□(G, G′) ≤ d_1(G, G′), and in general the two distances
are quite different. For example, if G and G′ are two independent random graphs
on [n] with edge probability 1/2, then with high probability d_1(G, G′) ≈ 1/2 but
d_□(G, G′) = O(1/√n).
We will have to define the distance of two weighted graphs G and G′ on the
same node set V , but with possibly different nodeweights. In this case, we have
to add a term accounting for the difference in their node weighting. To simplify
notation, let α_i = α_i(G)/α_G, α′_i = α_i(G′)/α_{G′}, β_ij = β_ij(G) and β′_ij = β_ij(G′).
Then we define
(8.6) d_1(G, G′) = ∑_{i∈V} |α_i − α′_i| + ∑_{i,j∈V} |α_i α_j β_ij − α′_i α′_j β′_ij|
8.1. THE CUT DISTANCE OF GRAPHS 129
and
(8.7) d_□(G, G′) = ∑_{i∈V} |α_i − α′_i| + max_{S,T⊆V} |∑_{i∈S,j∈T} (α_i α_j β_ij − α′_i α′_j β′_ij)|.
It is easy to check that these formulas define metrics, and they specialize to the “old”
definitions when the nodeweights are 1 and the edgeweights are 0 or 1. Another
special case worth mentioning is when the nodeweights of the two graphs are the
same: in this case, the first term in both definitions disappears, and inside the
second term, we get the slightly simpler expression α_i α_j (β_ij − β′_ij). We note,
furthermore, that since G and G′ can be represented as points in the same finite
dimensional space, all usual distance functions on the set of weighted graphs on the
same set of nodes would give the same topology.
Example 8.2. Let Hn denote the complete graph on [n], where all nodes have
weight 1 and all edges have weight 1/2. Then for a random graph G = G(n, 1/2)
on the same node set, we have d_□(G, H_n) = o(1) with high probability.
8.1.3. Two graphs with the same number of nodes. If G and G′ are
unlabeled unweighted graphs on possibly different node sets but of the same cardi-
nality n, then we define their distance by
(8.8) δ̂_□(G, G′) = min_{Ĝ, Ĝ′} d_□(Ĝ, Ĝ′),
The distance of the two graphs can be described by optimizing over fractional
overlays:
(8.10) δ_□(G, G′) = min_{X∈X(G,G′)} d_□(G, G′, X),
and then δ_□(G, G′) can be defined by the same formula (8.10). This formula can
be rephrased as follows, using two more V × V′ matrices Y and Z:
(8.12) δ_□(G, G′) = min_{X∈X(G,G′)} max_{0≤Y,Z≤X} |∑_{i,j∈V, u,v∈V′} Y_iu Z_jv (β_ij − β′_uv)|.
Indeed, the absolute value on the right is a convex function of the entries of Y
and Z, and so it is maximized when every entry is equal to either 0 or to the
corresponding entry of X.
To illuminate definition (8.10) a little, we can think of a fractional overlay as a
probability distribution χ on V × V ′ whose marginals are uniform. In other words,
it is a coupling of the uniform distribution on V with the uniform distribution on
V ′ . Select two pairs (i, u) and (j, v) from the distribution χ. Then (8.9) expresses
some form of correlation between ij being an edge and uv being an edge.
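Concretely, when both graphs have uniform nodeweights, a fractional overlay can be read off from two interval partitions of [0, 1], in the spirit of the graphon picture: X_iu is the length of the overlap of the i-th interval of one partition with the u-th interval of the other. A small sketch of ours (function name hypothetical):

```python
import numpy as np

def overlay_from_partitions(a, b):
    """Fractional overlay X_iu = length of overlap of the i-th interval of the
    first partition of [0,1] with the u-th interval of the second one."""
    ca = np.concatenate([[0.0], np.cumsum(a)])   # breakpoints of the partitions
    cb = np.concatenate([[0.0], np.cumsum(b)])
    X = np.zeros((len(a), len(b)))
    for i in range(len(a)):
        for u in range(len(b)):
            X[i, u] = max(0.0, min(ca[i + 1], cb[u + 1]) - max(ca[i], cb[u]))
    return X

# uniform node weights: V has 2 nodes, V' has 3 nodes
X = overlay_from_partitions([0.5, 0.5], [1 / 3, 1 / 3, 1 / 3])
assert np.allclose(X.sum(axis=1), [0.5, 0.5])        # marginal uniform on V
assert np.allclose(X.sum(axis=0), [1 / 3, 1 / 3, 1 / 3])  # marginal uniform on V'
```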
One word of warning: δ_□ is only a pseudometric, not a true metric, because
δ_□(G, G′) may be zero for different graphs G and G′. This is the case e.g. if
G′ = G(k) for some k (cf. Exercise 8.6).
We have to discuss a technical problem, for which only partial results are avail-
able (but these will be enough for our purposes). If G and G′ have the same number
of nodes, then the definition of δ_□ may give a value different from their δ̂_□ distance.
It is trivial that
δ_□(G, G′) ≤ δ̂_□(G, G′),
but how much larger can the right side be? It may be larger (see Exercise 8.8).
Perhaps the increase is never larger than a factor of 2, but this is open. To prove
anything nontrivial requires tools to be developed later; in Section 9.4 we are going
to prove, among others, the (rather weak) inequality
δ̂_□(G, G′) ≤ 45 / √(− log δ_□(G, G′)).
(One important consequence of this weak inequality will be that any Cauchy
sequence of graphs in the δ_□ distance is also a Cauchy sequence in the δ̂_□ distance.)
Example 8.3. Let K denote the graph with a single node of weight 1, endowed
with a loop with weight 1/2. Then for a random graph G = G(n, 1/2), we have
δ_□(G, K) = o(1) with high probability.
8.2. CUT NORM AND CUT DISTANCE OF KERNELS 131
Exercise 8.4. Let A be a symmetric matrix. Show that restricting the pairs
(S, T ) in the definition (8.4) of the cut norm in any of the following ways will
decrease it by a small factor only: (a) T = S, by at most 2; (b) T ∩ S = ∅, by at
most 4; (c) T = [n] \ S, by at most 6; (d) |S|, |T | ≥ n/2, by at most 4.
Exercise 8.5. Prove that the definitions of δ_□(G, G′) through blow-ups and
through fractional overlays lead to the same value.
Exercise 8.6. Let G1 and G2 be two simple graphs with δ_□(G1, G2) = 0. Prove
that there is a simple graph G and n1, n2 ≥ 1 such that G_i ≅ G(n_i).
Exercise 8.7. Let A be a symmetric n × n matrix with all entries in [−1, 1]. Let
A′ be obtained from A by deleting a row and the corresponding column. Prove
that
|∥A∥_□ − ∥A′∥_□| ≤ 2/n.
Exercise 8.8. (a) Let H denote the graph on two nonadjacent nodes, with a loop
at each of them. Prove that δ̂_□(H, K2) = 1/4 but δ_□(H, K2) = 1/8. (b) Prove
that if n is odd, then δ̂_□(K_{n,n}, K̄_{n,n}) > δ_□(K_{n,n}, K̄_{n,n}).
8.2.1. Cut norm. We define the cut norm on the linear space W of kernels
by
(8.13) ∥W∥_□ = sup_{S,T⊆[0,1]} |∫_{S×T} W(x, y) dx dy|,
where the supremum is taken over all measurable subsets S and T. It is sometimes
convenient to use the corresponding metric d_□(U, W) = ∥U − W∥_□.
The cut norm is a norm; this is easy to prove using standard analysis. Simi-
larly as in the case of matrices, we have the trivial inequalities between the most
important norms of a kernel in W1 :
(8.14) ∥W∥_□ ≤ ∥W∥_1 ≤ ∥W∥_2 ≤ ∥W∥_∞ ≤ 1.
In the opposite direction, we have trivially ∥W∥_2 ≤ ∥W∥_1^{1/2} (showing that ∥.∥_1
and ∥.∥2 define the same topology on W1 ), but the other two norms in the formula
above define different topologies. However, for a stepfunction U with k steps we
have the trivial inequality
(8.15) ∥U ∥1 ≤ k 2 ∥U ∥ .
√
It can be shown, in fact, that the coefficient k 2 can be replaced by 2k (see Janson
[2010], Remark 9.8, and also our Exercise 8.18); but the inequality above will be
enough for us.
There is some natural notation that goes with this norm. For every set R ⊆ W0 ,
we define its ε-neighborhood in the cut-norm
B_□(R, ε) = {W ∈ W_0 : d_□(W, R) < ε} = {W ∈ W_0 : (∃U ∈ R) d_□(W, U) < ε}.
8.2.2. Cut distance of unlabeled kernels. Kernels, defined on the fixed set
[0, 1], correspond to labeled graphs. Just as for graphs, we introduce an “unlabeled”
version of the cut norm, by finding the best overlay of the underlying sets. Let S̄_{[0,1]}
denote the set of measure preserving maps [0, 1] → [0, 1], and let S_{[0,1]} denote the
set of all invertible measure preserving maps [0, 1] → [0, 1] (the inverse of such a
map is known to be measure preserving as well, so S_{[0,1]} is a group; see Appendix
A.3.2). We define the cut distance of two kernels by
(8.16) δ_□(U, W) = inf_{φ∈S_{[0,1]}} d_□(U, W^φ)
(where W^φ(x, y) = W(φ(x), φ(y))). It is easy to see that either one of the following
expressions could be used to define the cut distance:
(8.17) δ_□(U, W) = inf_{φ∈S_{[0,1]}} d_□(U^φ, W) = inf_{φ∈S̄_{[0,1]}} d_□(U, W^φ)
= inf_{φ,ψ∈S̄_{[0,1]}} d_□(U^ψ, W^φ).
We will prove the much less trivial fact that in the last expression the infimum is
attained: Theorem 8.13 below establishes this in larger generality, for all norms
satisfying some natural conditions.
The distance δ_□ of kernels is only a pseudometric, since different kernels can
have distance zero. (Such pairs of kernels will turn out to be exactly the weakly
isomorphic pairs, but this will take more work to prove.) We can identify two kernels
whose cut distance is 0, to get the set W̃ of unlabeled kernels. We define the sets
W̃_0 and W̃_1 analogously.
Going into all the complications with using the cut norm and then minimizing
over measure preserving transformations is justified by the important fact that the
metric δ_□ defines a compact metric space on graphons. We will state and prove
this fact in Section 9.3.
One main advantage in using graphons instead of graphs is that many formu-
las and proofs become much simpler and more transparent. (Just compare the
definition (8.16) of the distance of two graphons with the definition (8.12) of the
analogous quantity for two weighted graphs!) When going from graphs to graphons
via the correspondence G 7→ WG , we may pay a price by having to estimate how
much error we make by this. This will indeed require extra work in some cases, but
in other cases we will be lucky, and no error will be made. For example, equation
(7.2) shows that homomorphism numbers “from the left” don’t change when we
replace G by WG . The next lemma shows that the situation is similar with the
δ_□ distance. (We will not always be so lucky; Section 12.4.4 will be devoted to
estimating this kind of error for multicuts.)
Lemma 8.9. For any two weighted graphs H and H′,
δ_□(H, H′) = δ_□(W_H, W_{H′}).
Proof. Let φ : [0, 1] → [0, 1] be a measure preserving map. Let (S_i : i ∈
V(H)) and (T_u : u ∈ V(H′)) be the partitions of [0, 1] into the steps of W_H and
W_{H′}. Define X_iu = λ(S_i ∩ φ^{−1}(T_u)); then the matrix (X_iu) is a fractional overlay
of H and H ′ . Conversely, every fractional overlay can be obtained from a measure
preserving map this way.
We claim that for this measure preserving map and the corresponding fractional
overlay we have
∑ ∫
′ φ
(8.18) max ′ Xiu Xjv (βij − βuv ) = sup (WH − WH ′ ).
Q,R⊆V ×V Y,Z⊆[0,1]
iu∈Q, jv∈R Y ×Z
On the other hand, if Ziu = λ(Z ∩ Si ∩ φ(Tu )) and Yiu = λ(Y ∩ Si ∩ φ(Tu )), then
0 ≤ Yiu , Ziu ≤ Xiu , and
∫ ∑
φ ′
(WH − WH ′) = Yiu Zjv (βij − βuv ).
Y ×Z i,j∈V,u,v∈V ′
So the definition (8.10) of δ (H, H ′ ) implies the direction ≤ in (8.18), while formula
(8.12) implies reverse direction. This proves (8.18), from which the Lemma follows.
8.2.3. Maxima versus suprema: cut norm. One price we have to pay for
working with infinite objects like graphons is that when maximizing a function over
an infinite set of objects (e.g. subsets), we don’t necessarily have a maximum, only
a supremum; hence we have to work with approximate optima. With two impor-
tant definitions, the cut norm and the cut distance, we don’t have this difficulty.
(The Compactness Theorem 9.23 will provide another powerful tool to avoid such
problems in many cases.) Next we prove this for the cut norm, and at the end of
this chapter, for the cut distance. This would not be absolutely necessary: in most
cases, we could just carry along an arbitrarily small error term. Nevertheless, it
makes sense to include these facts in this book: if you want to work with these
notions, you might as well work with them as conveniently as possible. The next
lemma also provides a useful expression for the cut norm.
Lemma 8.10. For any kernel W ∈ W, the suprema

(8.19)  sup_{S,T⊆[0,1]} ∫_{S×T} W(x, y) dx dy

and

(8.20)  sup_{f,g: [0,1]→[0,1]} ∫_{[0,1]²} f(x)g(y)W(x, y) dx dy

are attained, and they are equal.
Proof. Let D = sup_{f,g} ⟨f, TW g⟩. We start with proving that this supremum is
attained by appropriate functions f and g. Let fn , gn : [0, 1] → [0, 1] (n = 1, 2, . . . )
be functions such that ⟨fn, TW gn⟩ → D. The set of functions [0,1] → [0,1] is
weak*-compact, which means that by selecting a subsequence, we may assume that
it tends to a limit f : [0, 1] → [0, 1] in the sense that ⟨fn , h⟩ → ⟨f, h⟩ for every
h ∈ L1 [0, 1]. Similarly, we can go to a further subsequence to assume that gn
converges to a function g in the same sense. It is easy to see that f and g are
bounded (perhaps after changing them on a null set). Now we claim that
∫_{[0,1]²} fn(x)gn(y)W(x, y) dx dy −→ ∫_{[0,1]²} f(x)g(y)W(x, y) dx dy.
This convergence is trivial when W = 1S×T for two measurable sets S, T ⊆ [0, 1].
Hence it follows when W is a stepfunction, since stepfunctions are linear combinations
of a finite number of functions of the type 1S×T . Hence it follows for every kernel,
since every kernel can be approximated by stepfunctions in L1 ([0, 1]2 ), and the
factors fn , gn , f, g are bounded. This implies that ⟨f, TW g⟩ = D.
Next we show that the maximizing functions f and g can be chosen to be 0-1
valued. Let S = {x : 0 < f (x) < 1}, and suppose that λ(S) > 0. Define
fs(x) = f(x) + s·min(f(x), 1 − f(x)).
Then for −1 ≤ s ≤ 1, the function fs satisfies 0 ≤ fs ≤ 1, and hence, by the
maximality property of f, we have ⟨fs, TW g⟩ ≤ ⟨f, TW g⟩. Since ⟨fs, TW g⟩ is a linear
function of s and equality holds for s = 0, we must have equality for all values of
s, in particular for s = 1, and so we can replace f by f1(x) = min(1, 2f(x)). Repeating
this construction, we get a sequence of optimizing functions that converges
monotonically to the 0-1 valued function f̄ = 1(f(x) > 0). So we can replace f by f̄,
and similarly we can replace g by a 0-1 valued function ḡ.
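The argument above is finite-dimensional at heart: for a matrix kernel the form ⟨f, TW g⟩ is linear in each coordinate of f and g, so its maximum over [0,1]-valued vectors is attained at a 0-1 valued pair, exactly as in Lemma 8.10. A small brute-force sketch of this (the helper names are ours, chosen for illustration):

```python
from itertools import product
import random

def bilinear(A, f, g):
    """The form <f, T_A g> for a matrix kernel: sum_ij f_i * A[i][j] * g_j."""
    n = len(A)
    return sum(f[i] * A[i][j] * g[j] for i in range(n) for j in range(n))

def max_over_01(A):
    """Maximum of the form over 0-1 valued f, g; this is the one-sided version
    of (8.19), a maximum over pairs of subsets S, T."""
    n = len(A)
    return max(bilinear(A, f, g)
               for f in product([0, 1], repeat=n)
               for g in product([0, 1], repeat=n))

random.seed(0)
n = 5
A = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
best01 = max_over_01(A)

# Fractional f, g in [0,1]^n never beat the best 0-1 pair (Lemma 8.10).
for _ in range(2000):
    f = [random.random() for _ in range(n)]
    g = [random.random() for _ in range(n)]
    assert bilinear(A, f, g) <= best01 + 1e-12
```

The exhaustive search over 0-1 pairs is exponential in n; it is meant only to make the attainment statement concrete, not to be an algorithm.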
8.2.4. Operator norms and cut norm. While the cut norm is best suited
for combinatorial purposes, it is equivalent to more traditional norms, such as the
operator norm of TW as an operator L∞ → L1 , as the following simple lemma
shows:
Lemma 8.11. For every kernel W, we have

∥W∥□ ≤ ∥TW∥∞→1 ≤ 4∥W∥□.
Proof. By definition,
∥TW∥∞→1 = sup_{−1≤g≤1} ∥TW g∥1 = sup_{−1≤f,g≤1} ⟨f, TW g⟩.
Comparing this expression with (8.20), we get the first inequality. For the second,
we write
∥TW∥∞→1 = sup_{0≤f,f′,g,g′≤1} ⟨f − f′, TW(g − g′)⟩.

Here

⟨f − f′, TW(g − g′)⟩ = ⟨f, TW g⟩ − ⟨f′, TW g⟩ − ⟨f, TW g′⟩ + ⟨f′, TW g′⟩ ≤ 4∥W∥□.
There are many other variations on the definition which give norms that are
some constant factor away from the cut norm; these are useful since in some proofs
they come up more directly than the cut norm. Some of these are stated as exercises
at the end of this section.
There are other well-studied operator norms that are topologically equivalent
to the cut norm (even though they are not equivalent up to a constant factor). The
Schatten p-norm Sp(TW) of a kernel operator TW is defined as the ℓp-norm of the
sequence of its eigenvalues. For an even integer p, these can be expressed in terms
of homomorphism densities:

Sp(TW) = t(Cp, W)^{1/p}.

(It is not trivial that t(C2r, U)^{1/(2r)} is a norm, i.e., that it is subadditive (the other
defining properties of a norm are easy). In Proposition 14.2 we’ll describe a method
to prove that Schatten norms are indeed norms, along with certain more general
norms defined by graphs.)
These norms define the same topology on W1 as the cut norm. We prove the
explicit relationship for the case p = 4, which we need.
Lemma 8.12. For every graphon U ∈ W1, ∥U∥□⁴ ≤ t(C4, U) ≤ 4∥U∥□.
Proof. The second inequality is a special case of Lemma 10.23. To prove the
first inequality, we use

∥U∥□ = sup_{0≤f,g≤1} ⟨f, TU g⟩,

where

⟨f, TU g⟩ ≤ ∥f∥2 ∥TU g∥2 ≤ ∥TU g∥2 = ⟨TU g, TU g⟩^{1/2} = ⟨g, TU² g⟩^{1/2}
= ⟨g, T_{U∘U} g⟩^{1/2} ≤ ∥g∥2 ∥T_{U∘U}∥_{2→2}^{1/2} ≤ ∥T_{U∘U}∥_{2→2}^{1/2} ≤ ∥U∘U∥2^{1/2} = t(C4, U)^{1/4}.
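For a symmetric matrix A with entries in [−1, 1], viewed as a step kernel, t(C4, ·) reduces to trace(A⁴)/n⁴, and the normalized cut norm can be found by brute force, so the two inequalities of Lemma 8.12 can be checked numerically. A sketch under these conventions (our own helpers, not code from the book):

```python
from itertools import product
import random

def cut_norm(A):
    """Normalized matrix cut norm max_{S,T} |sum_{S x T} A[i][j]| / n^2.
    For a fixed S, the best T keeps exactly the columns of one sign."""
    n = len(A)
    best = 0.0
    for smask in product([0, 1], repeat=n):
        col = [sum(A[i][j] for i in range(n) if smask[i]) for j in range(n)]
        best = max(best, sum(v for v in col if v > 0), -sum(v for v in col if v < 0))
    return best / n ** 2

def t_c4(A):
    """Homomorphism density of C4 in the step kernel of A: trace(A^4)/n^4,
    computed as the squared Frobenius norm of A^2 (A is symmetric)."""
    n = len(A)
    A2 = [[sum(A[i][k] * A[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
    return sum(A2[i][j] ** 2 for i in range(n) for j in range(n)) / n ** 4

random.seed(2)
n = 6
A = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(i, n):
        A[i][j] = A[j][i] = random.uniform(-1, 1)

c, t4 = cut_norm(A), t_c4(A)
assert c ** 4 <= t4 + 1e-12   # first inequality of Lemma 8.12
assert t4 <= 4 * c + 1e-12    # second inequality of Lemma 8.12
```

Both assertions hold for every symmetric matrix with entries in [−1, 1], so the random instance is only a convenience.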
8.2.5. Minima versus infima: cut distance. The last result in this section
is of a similar nature as Lemma 8.10: we prove that the “inf” in the last quantity
in formula (8.17) above is in fact a “min”. This was proved by Bollobás and Riordan
[2009]. An analogous result for the L1 -norm was proved by Pikhurko [2010]. With
later applications in mind, we prove it in greater generality.
The construction that gives the cut distance δ□ from the cut norm can be
applied to any other norm on W that is invariant under the maps W ↦ W^φ for all
φ ∈ S[0,1]. We will call such a norm invariant. For an invariant norm N on the
linear space W, we define
δN(U, W) = inf_{φ∈S[0,1]} N(U − W^φ).
We call this function the distance derived from N. The distances δN will be interesting
for us mainly in the cases when N = ∥·∥□, N = ∥·∥1 and N = ∥·∥2. The
corresponding unlabeled distances are δ□, δ1 and δ2.
Since the norm is invariant under measure preserving bijections, we have
N(U − W^φ) = N(U^{φ⁻¹} − W), implying that δN(U, W) = δN(W, U). It is trivial that the
triangle inequality holds for δN, so it is a semimetric (and clearly it is not a true
metric, since δN(U, U^φ) = 0 for every measure preserving map φ ∈ S[0,1]).
We call a norm N smooth, if it is continuous in the topology of pointwise
convergence in W. In other words, for every sequence of kernels (Wn ) such that
= inf_{ψ∈S[0,1]} N(U^ψ − W) = inf_{ψ∈S̄[0,1]} N(U^ψ − W)
= inf_{φ,ψ∈S[0,1]} N(U^ψ − W^φ) = min_{φ,ψ∈S̄[0,1]} N(U^ψ − W^φ),

and

(8.23)  δN(U, W) = min_{µ} Nµ(U^π − W^ρ),

which implies that in each line of (8.22), the two expressions are equal. Equation
(8.23) follows similarly easily in this case.
Second, we consider arbitrary functions U, W ∈ W, and prove the formulas with
the two occurrences of “min” replaced by “inf”. Let (Un ) and (Wn ) be sequences
of stepfunctions converging almost everywhere to U and W , respectively. Then
N(Un − U) → 0 by the smoothness of N, and similarly for W. Since N(Un^φ − U^φ) =
N(Un − U) for every measure preserving map φ, this implies that

inf_{φ∈S̄[0,1]} N(Un − Wn^φ) = inf_{φ∈S[0,1]} N(Un − Wn^φ) → inf_{φ∈S[0,1]} N(U − W^φ) = δN(U, W),
which proves the equality in the first line of (8.22). The other equations follow
similarly.
However, this argument only gives an “inf” in the last two expressions for δN .
To prove that it is in fact a minimum, we begin with (8.23). The space of coupling
measures is compact in the weak topology, so it suffices to show that Nµ (U π − W ρ ),
as a function of µ, is lower semicontinuous. This means that if µn → µ weakly
(where µ and µn are coupling measures), then for every two kernels U and W , we
have
(8.24)  lim inf_n Nµn(U^π − W^ρ) ≥ Nµ(U^π − W^ρ).
[0,1]². By our assumption on the norm N, this implies that Nµn(V) → Nµ(V). As
a special case, we get (8.24) for continuous kernels U and W.
Let U, W : [0,1] × [0,1] → R be arbitrary kernels, and fix any ε > 0. There
are continuous kernels Uk and Wk (k = 1, 2, . . . ) such that Uk → U and Wk → W
almost everywhere. By the smoothness of N, we can fix k large enough so that
N(Uk − U) ≤ ε and N(Wk − W) ≤ ε.
By the special case proved above, we know that

Nµn(Uk^π − Wk^ρ) → Nµ(Uk^π − Wk^ρ)   (n → ∞),

and we can fix n so that |Nµn(Uk^π − Wk^ρ) − Nµ(Uk^π − Wk^ρ)| ≤ ε. Then, using (8.21),

Nµ(U^π − W^ρ) ≤ Nµ(Uk^π − Wk^ρ) + Nµ(Uk^π − U^π) + Nµ(Wk^ρ − W^ρ)
= Nµ(Uk^π − Wk^ρ) + N(Uk − U) + N(Wk − W)
≤ Nµ(Uk^π − Wk^ρ) + 2ε.

Here, by the choice of n,

Nµ(Uk^π − Wk^ρ) ≤ Nµn(Uk^π − Wk^ρ) + ε
≤ Nµn(U^π − W^ρ) + Nµn(Uk^π − U^π) + Nµn(Wk^ρ − W^ρ) + ε
= Nµn(U^π − W^ρ) + N(Uk − U) + N(Wk − W) + ε
≤ Nµn(U^π − W^ρ) + 3ε.
The key to relating the cut norm to other topologies is the following lemma.
Lemma 8.22. Suppose that ∥Wn∥□ → 0 as n → ∞ (Wn ∈ W1). Then for every
function Z ∈ L1([0,1]²), ∥ZWn∥□ → 0. In particular, ⟨Z, Wn⟩ → 0 and ∫_S Wn → 0
for every measurable set S ⊆ [0,1]².
Proof. If Z is the indicator function of a rectangle, these conclusions follow
from the definition of the norm ∥·∥□. Hence the conclusion follows for stepfunctions,
since they are linear combinations of a finite number of indicator functions of
rectangles. Then it follows for all integrable functions, since they are approximable
in L1([0,1]²) by stepfunctions.
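As a concrete illustration of Lemma 8.22, consider the oscillating kernels Wn(x, y) = sin(2πnx)·sin(2πny) (our choice, not from the text): their cut norm is O(1/n), and their inner product with any rectangle indicator decays at the same rate, since the double integral factorizes. A quick check using the explicit antiderivative:

```python
import math

def inner_rect(n, a, b):
    """<1_{[0,a] x [0,b]}, W_n> for W_n(x, y) = sin(2*pi*n*x) * sin(2*pi*n*y).
    The double integral factorizes into two one-dimensional integrals."""
    I = lambda c: (1 - math.cos(2 * math.pi * n * c)) / (2 * math.pi * n)
    return I(a) * I(b)

# |<1_{S x T}, W_n>| <= (1/(pi*n))^2 for every rectangle, so it tends to 0.
for n in (1, 10, 100, 1000):
    v = abs(inner_rect(n, 0.37, 0.81))
    assert v <= (1 / (math.pi * n)) ** 2 + 1e-15
```

The same kernels have L1 norm bounded away from 0, which is why the conclusion of the lemma is stated for the cut norm and not implied by L1 smallness.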
Szemerédi partitions
One of the most important tools in understanding large dense graphs is the
Regularity Lemma of Szemerédi [1975, 1978] and its extensions. This lemma has
many interesting connections to other areas of mathematics, including analysis and
information theory (see Lovász and Szegedy [2007], Bollobás and Nikiforov [2008],
Tao [2006a]). It also has weaker (but more effective) and stronger versions. Here
we survey as much as we need from this rich theory, extend it to graphons (as
happens quite often, this leads to simpler, more elegant formulations), and prove a
very general version of it using the space of graphons.
and the corresponding weighted bipartite subgraph of GP, is at most ε for all but
εk² pairs (i, j), and at most 1 for the remaining εk² pairs. This implies that the
cut distance between G and GP is at most 2ε. So the partition in Lemma 9.3 has
indeed weaker properties than the partition in Lemma 9.2. This is compensated
for by the relatively decent number of partition classes.
The Weak Regularity Lemma implies that there is a partition P such that the
template graph satisfies

(9.3)  δ□(G, G/P) ≤ d□(G, GP) ≤ 2/√(log k).
9.1.3. Strong Regularity Lemma. Other versions of the Regularity Lemma
strengthen, rather than weaken, the conclusion (of course, at the cost of replacing
the tower function by an even more formidable value). Such a “super-strong”
Regularity Lemma was proved by Alon, Fischer, Krivelevich and Szegedy [2000].
To state this lemma, we need a further definition. Let P be an equitable partition
of V(G), and let Q be an equitable refinement of it. Following Conlon and Fox
[2011], we say that Q is ε-close to P if for almost every pair S ≠ T ∈ P (with at
most ε|P|² exceptions) and almost every pair X, Y ∈ Q with X ⊆ S and Y ⊆ T (with
at most ε(|Q|/|P|)² exceptions), we have

|eG(X, Y)/(|X||Y|) − eG(S, T)/(|S||T|)| ≤ ε.
Lemma 9.4 (Very Strong Regularity Lemma). For every sequence ε =
(ε0, ε1, . . . ) of positive numbers there is a positive integer S(ε) such that for every
graph G = (V, E), the node set V has an equitable partition P and an equitable
refinement Q of P such that |Q| ≤ S(ε), P is ε0-regular, Q is ε_{|P|}-regular, and Q
is ε0-close to P.
While this Very Strong Regularity Lemma has many important applications,
it is not easy to explain its significance at this point. One important feature is
that through the second partition Q, it carries information about the inside of the
partition classes of P.
A somewhat weaker (but essentially equivalent) version, which is simpler to
state but more difficult to apply, was proved by Tao [2006b] and by Lovász and
Szegedy [2007].
Lemma 9.5 (Strong Regularity Lemma). For every sequence ε = (ε0 , ε1 , ...)
of positive numbers there is a positive integer S(ε) such that for every graph G =
(V, E), there is a graph G′ on V , and V has a partition P into k ≤ S(ε) classes
such that
(9.4)  d1(G, G′) ≤ ε0 and d□(G′, (G′)P) ≤ εk.
Note that the first inequality involves the normalized edit distance, and so it is
stronger than a similar condition with the cut distance would be. The second error
bound εk in (9.4) can be thought of as being very small. If we choose εk = ε/2
for all k, we get the Weak Regularity Lemma 9.3 (without an explicit bound on
the number of classes). Choosing εk = ε0²/k², the partition obtained satisfies the
requirements of the Original Regularity Lemma 9.2.
We can replace εk by the much smaller number εk/(k² S(εk)²), where S is
the bound in the Original Regularity Lemma. Then we can apply the Original
Regularity Lemma to each of the partition classes obtained in Lemma 9.5, to get
the Very Strong Regularity Lemma 9.4. (The details of this derivation are left to
the reader as an exercise.)
We will formulate the Strong Regularity Lemma for kernels, and prove it in
that version, in Section 9.3.
Exercise 9.6. Show that (a) if a bipartite graph is ε-regular, then it is ε-
homogeneous; (b) if a bipartite graph is ε³-homogeneous, then it is ε-regular.
Exercise 9.7. Prove that for every k ≥ 1 and every graph G = (V, E), V has an
equitable partition P into k classes such that d□(G, GP) ≤ 4/√(log k).
It is not hard to see that the stepping operator is also contractive with respect to
the cut norm (Exercise 9.17). In fact, we will see in Section 14.2.1 that stepping is
contractive with respect to any other reasonable norm on W.
9.2.2. Weak Regularity Lemma. It is a basic fact from analysis that every
kernel W can be approximated arbitrarily well by stepfunctions in the L1 norm.
The approximating stepfunctions can be obtained by averaging over “steps”:
Proposition 9.8. Let (Pn ) be a sequence of measurable partitions of [0, 1] such
that every pair of points is separated by all but a finite number of partitions Pn .
Then WPn → W almost everywhere for every W ∈ W.
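For a kernel given as a matrix, the stepping operator of Proposition 9.8 simply replaces each block determined by the partition with its average. A minimal sketch of this averaging (our own helper; classes are given as lists of row/column indices):

```python
def stepping(A, P):
    """The stepping A_P: replace A on each block class(i) x class(j) by its average."""
    n = len(A)
    cls = {}
    for c, members in enumerate(P):
        for i in members:
            cls[i] = c
    avg = [[sum(A[i][j] for i in S for j in T) / (len(S) * len(T)) for T in P]
           for S in P]
    return [[avg[cls[i]][cls[j]] for j in range(n)] for i in range(n)]

A = [[0.0, 1.0, 1.0, 0.0],
     [1.0, 0.0, 0.0, 1.0],
     [1.0, 0.0, 0.0, 1.0],
     [0.0, 1.0, 1.0, 0.0]]
P = [[0, 3], [1, 2]]
AP = stepping(A, P)
# Averaging is idempotent: stepping a stepfunction over its own partition fixes it.
assert stepping(AP, P) == AP
```

Refining the partition and stepping again recovers A itself here, matching the intuition that finer and finer partitions approximate the kernel.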
The Weak Regularity Lemma for kernels, proved by Frieze and Kannan [1999]
(and in particular its Corollary 9.13 below), is a related statement about approxima-
tion by stepfunctions in the cut norm (instead of in the sense of almost everywhere
convergence).
Lemma 9.9 (Weak Regularity Lemma for Kernels). For every function W ∈
W and k ≥ 1 there is a stepfunction U with k steps such that

∥W − U∥□ < (2/√(log k)) ∥W∥2.
Roughly speaking, this Lemma says that every kernel can be approximated well
in the cut norm by stepfunctions (in fact, by its steppings). Proposition 9.8 asserts
something similar about approximating in the L1-norm. Since ∥W∥□ ≤ ∥W∥1,
approximating in the L1 norm seems to be a stronger result. However, the error in
the L1 -norm approximation depends not only on the number of steps, but on W
as well. The crucial fact about Lemma 9.9 is that the error tends to 0 as k → ∞,
uniformly in W .
The error bound in Lemma 9.9 is only attractive when compared with the error
bound in the stronger versions; for a prescribed error ε, the number of partition
classes we need is still exponential in 1/ε2 . Frieze and Kannan give a stronger
form of this result that provides a polynomial size description of the approximating
stepfunction.
Lemma 9.10. For every kernel U ∈ W1 and k ≥ 1 there are k pairs of subsets
Si, Ti ⊆ [0,1] and k real numbers ai such that

∥U − ∑_{i=1}^{k} ai 1_{Si×Ti}∥□ < 1/√k.
It is clear that the function ∑_i ai 1_{Si×Ti} is a stepfunction; we can make it
symmetric by taking the average with ∑_i ai 1_{Ti×Si}, getting 2k terms. This symmetric
stepfunction has at most 2^{2k} steps, so Lemma 9.9 follows from Lemma 9.10
(replacing k by 2^{2k}).
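The proof idea behind Lemma 9.10 suggests a greedy procedure: repeatedly locate a rectangle S × T on which the current remainder has large sum, and subtract the multiple a·1_{S×T} given by the block average. The toy sketch below (our own simplification, with an exhaustive rectangle search standing in for the existential step, so it is exponential in n) reproduces a small block matrix exactly in a few rounds:

```python
from itertools import product

def best_rectangle(R):
    """A rectangle S x T maximizing |sum_{S x T} R[i][j]| (brute force, small n)."""
    n = len(R)
    best, bS, bT = 0.0, [], []
    for smask in product([0, 1], repeat=n):
        S = [i for i in range(n) if smask[i]]
        for tmask in product([0, 1], repeat=n):
            T = [j for j in range(n) if tmask[j]]
            s = abs(sum(R[i][j] for i in S for j in T))
            if s > best:
                best, bS, bT = s, S, T
    return best, bS, bT

def weak_approx(A, k):
    """k greedy steps; returns the terms (a_i, S_i, T_i) and the remainder."""
    R = [row[:] for row in A]
    terms = []
    for _ in range(k):
        _, S, T = best_rectangle(R)
        if not S or not T:
            break
        a = sum(R[i][j] for i in S for j in T) / (len(S) * len(T))  # block average
        for i in S:
            for j in T:
                R[i][j] -= a
        terms.append((a, S, T))
    return terms, R

# A 6 x 6 two-step matrix: ones on two diagonal blocks, zeros elsewhere.
A = [[1.0 if (i < 3) == (j < 3) else 0.0 for j in range(6)] for i in range(6)]
terms, R = weak_approx(A, 5)
residual = best_rectangle(R)[0]
assert residual <= 1e-9  # five rectangles reproduce this A exactly
```

Each step decreases the squared L2 norm by at least the square of the remainder's cut norm, which is the quantitative heart of the weak regularity argument.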
We have mentioned the significance of the interplay between the cut norm and
other kernel norms. The proof of the Regularity Lemma is the first point where
this is apparent. For later reference, we state the key observation in the proof of
the Weak Regularity Lemma separately, in two versions.
Lemma 9.11. (a) For every U ∈ W there are two sets S, T ⊆ [0, 1] and a real
number 0 ≤ a ≤ ∥U ∥∞ such that
∥U − a·1_{S×T}∥2² ≤ ∥U∥2² − ∥U∥□².
(b) Let U ∈ W and let P be a measurable k-partition of [0,1]. Then there is a
partition Q refining P with at most 4k classes such that

∥U − UP∥□ = ∥UQ − UP∥□.
Proof. Let S and T be measurable subsets of [0,1] such that

∥U∥□ = |∫_{S×T} U| = |⟨U, 1_{S×T}⟩|,
Proof. Statement (a) follows by the same argument as Lemma 9.9, just starting
with Q instead of the indiscrete partition. To prove (b), we partition each class
of Q into classes of measure 1/k, with at most one exceptional class of size less than
1/k. Keeping all classes of size 1/k, let us take the union of the exceptional classes,
and repartition it into classes of size 1/k, to get a partition P.
To analyze this construction, let us also consider the common refinement R =
P ∧ Q. Then WR and WP differ on a set of measure less than 2(m/k), and so
∥W − WP∥□ ≤ ∥W − WR∥□ + 2m/k.
Lemma 9.12 implies that ∥W − WR∥□ ≤ 2∥W − WQ∥□, which completes the
proof.
9.2.3. Strong Regularity Lemma. The Strong Regularity Lemma too has
a “continuous” version:
Lemma 9.16 (Strong Regularity Lemma for Kernels). For every sequence
ε = (ε0 , ε1 , ...) of positive numbers there is a positive integer S(ε) such that for
every graphon W , there is another graphon W ′ , and a stepfunction U ∈ W0 with
k ≤ S(ε) steps such that
(9.9)  ∥W − W′∥1 ≤ ε0 and ∥W′ − U∥□ ≤ εk.
We will give a proof of this Lemma, deriving it from an even more general
theorem, in the next section. Here we sketch how to derive the graph version 9.5
from the kernel version. Let (ε0 , ε1 , ...) be a sequence of positive numbers, which
we may assume is monotone decreasing. Let G be a simple graph on [n]. We apply
Lemma 9.16 with εk/2 to WG, to get a threshold S′ (depending only on (ε0, ε1, . . . )),
a kernel W′ and a partition P of [0,1] such that |P| ≤ S′, ∥WG − W′∥1 ≤ ε0/2 and
∥W′ − (W′)P∥□ ≤ εk/2.
First, we have to turn W′ into a graph G′. This can be done by randomization.
Let Ii = ((i − 1)/n, i/n] and Rij = Ii × Ij. We connect i and j with probability
n² ∫_{Rij} W′. The probability that this edge will be in the symmetric difference of
E(G) and E(G′) is at most n² ∫_{Rij} |WG − W′|, and hence the expected (normalized)
edit distance between G and G′ is at most ∥WG − W′∥1 ≤ ε0/2. Markov’s inequality
gives that with probability at least 1/2, the distance d1(G, G′) ≤ ε0.
Next, we have to turn the partition P of [0,1] into a partition Q of [n]. We do
this randomly again, by selecting a uniform random point Xi ∈ Ii (i = 1, . . . , n),
and putting i into the m-th class of Q if Xi is in the m-th class of P. A bit
trickier computation with second moments (which is similar to the proof of Proposition
12.19, but simpler, and is not given here) shows that with high probability,
d□(G′, (G′)Q) ≤ εk/2 + 10/√n.
Now we choose k0 = max(k0′, 400/ε²_{k0′}). If n ≤ k0, then we can take G = G′
and partition [n] into singletons. If n > k0, then with positive probability the
partition Q constructed above satisfies |Q| = k ≤ k0′ ≤ k0, d1(G, G′) ≤ ε0 and
d□(G′, (G′)Q) ≤ εk/2 + 10/√n ≤ εk.
Exercise 9.17. Prove that the stepping operator is contractive with respect to
the L1 norm and the cut norm.
Exercise 9.18. Show by an example that the best approximation in the cut
norm of a function W ∈ W1 by a stepfunction with a given number of steps is not
necessarily a stepping of W . Is stepping the best approximation in the L2 or in
the L1 norm?
Exercise 9.19. Are analogues of Lemma 9.12 valid for the L1, L2 and L∞ norms?
Exercise 9.20. Formulate and prove the original Regularity Lemma for kernels.
Exercise 9.21. Give a proof of the Strong Regularity Lemma 9.16 along the lines
of the proof of Lemma 9.9.
Exercise 9.22. Let K1, K2, . . . be arbitrary nonempty subsets of a Hilbert space
H. Prove that for every ε > 0 and f ∈ H there is an integer 1 ≤ m ≤ ⌈1/ε²⌉ and a
vector f0 = α1 f1 + · · · + αm fm (αi ∈ R, fi ∈ Ki) such that for every g ∈ Km+1
we have |⟨g, f − f0⟩| ≤ ε∥g∥∥f∥. Derive the weak, original, and strong lemmas by
choosing the sets Ki appropriately.
Let Pk denote the partition of [0, 1] into the steps of Uk . For every k < l, the
partition Pn,l is a refinement of the partition Pn,k , and hence Wn,k = (Wn,l )Pn,k .
It is easy to see that this kind of relation is inherited by the limiting stepfunctions:
(9.10) Uk = (Ul )Pk .
Let (X, Y ) be a random point in [0, 1]2 chosen uniformly, then (9.10) implies
that the sequence (U1 (X, Y ), U2 (X, Y ), . . . ) is a martingale. Since the random
variables Ui (X, Y ) remain bounded, the Martingale Convergence Theorem A.12
implies that this sequence is convergent with probability 1. In other words, the
sequence of functions (U1 , U2 , . . . ) is convergent almost everywhere. Let U be its
limit; we show that ∥U − Wn ∥ → 0.
Fix any ε > 0. Then there is a k > 3/ε such that ∥U − Uk ∥1 < ε/3. Fixing this
k, there is an n0 such that ∥Uk − Wn,k ∥1 < ε/3 for all n ≥ n0 . Then
δ□(U, Wn) ≤ δ□(U, Uk) + δ□(Uk, Wn,k) + δ□(Wn,k, Wn)
≤ ∥U − Uk∥1 + ∥Uk − Wn,k∥1 + δ□(Wn,k, Wn) ≤ ε/3 + ε/3 + ε/3 = ε.
This completes the proof of Theorem 9.23.
The nodes left in the two graphs are matched with each other arbitrarily.
The bijection φ between V (H1 ) = V ((H1 )P ) and V (H2 ) = V ((H2 )Q ) defines a
fractional overlay Y between H1 /P and H2 /Q, such that
d□(φ((H1)P), (H2)Q) = d□(H1/P, H2/Q, Y).
The fractional overlays X and Y are very close: |Xij − Yij| ≤ 1/n for every 1 ≤
i, j ≤ k. Hence it follows that

d□(H1/P, H2/Q, Y) ≤ d□(H1/P, H2/Q, X) + k²/n = δ□(H1/P, H2/Q) + k²/n.
Combining, we get

δ̂□(H1, H2) ≤ d□(φ(H1), H2) ≤ d□(φ((H1)P), (H2)Q) + 8/√(log k)
= d□(H1/P, H2/Q, Y) + 8/√(log k) ≤ δ□(H1/P, H2/Q) + k²/n + 8/√(log k)
≤ δ□(H1, H2) + k²/n + 16/√(log k).
Recalling the choice of k, we get (9.12).
Finally, the third inequality (9.13) follows easily from the first two. If n <
δ□(H1, H2)^{−1/7}, then (9.11) implies that
The first and third inequalities in the theorem are very weak (at least, in
comparison with the conjectured bound of δ̂□ ≤ 2δ□). Nevertheless, (9.13) will be
important for us, since it implies that δ□ and δ̂□ define the same Cauchy sequences
of graphs. Borgs, Chayes, Lovász, Sós and Vesztergombi [2008] prove a stronger
inequality of this nature:

(9.14)  δ̂□(H1, H2) ≤ 32 δ□(H1, H2)^{1/67}.
Since this is still far from Conjecture 9.28, we don’t reproduce the proof here.
We can ask the same question about any of the unlabeled distances: if two
graphs have the same number of nodes, is the distance between them defined
through optimal overlay essentially the same as the distance defined by going to
the associated graphons and considering their distance? For the edit distance, an
affirmative answer was proved by Pikhurko [2010]; this bound is much stronger
than 9.14: it is optimal except for the constant 3. We prove it in a bit more general
form, for weighted graphs, since we will need it.
Theorem 9.30. For any two edge-weighted graphs H1 and H2 on [n] we have the
following inequalities:
δ1(H1, H2) ≤ δ̂1(H1, H2) ≤ 3δ1(H1, H2).
Proof. The first inequality is trivial. Let A and B be the adjacency matrices
of H1 and H2 , respectively. Then we have
δ̂1(H1, H2) = min_P ∥A − P B P^T∥1,
where P ranges over all permutation matrices. Expressing the distance δ1 is a bit
more complicated:
δ1(H1, H2) = min_X d1(H1, H2, X),
where

d1(H1, H2, X) = ∑_{i,j,u,v∈[n]} Xiu Xjv |Aij − Buv|,
where the Pk are n × n permutation matrices, because matrices of this form are
dense among all fractional overlays by the Birkhoff–von Neumann Theorem. Then

d1(H1, H2, X) = (1/m²) ∑_{k,l=1}^{m} (1/n²) ∑_{i,j=1}^{n} |Aij − B_{Pk(i),Pl(j)}| = (1/m²) ∑_{k,l=1}^{m} ∥A − Pk B Pl^T∥1.
Sampling
We turn to the analysis of sampling from a graph, our basic method of gathering
information about very large dense graphs. In fact, most of the time we prove our
results in the framework of sampling from a graphon. We start with describing
what it means to sample from a graphon.
small measure, where there is a random bipartite graph Gx,y between Sx and Sy
with density W (x, y). These random bipartite graphs must be independent as
random variables, which makes this impossible to construct in standard measure
theory (one can construct such an object in non-standard analysis, cf. Section
11.3.2). But often this is a useful informal way of thinking of a graphon. The two
random samples H(n, W ) and G(n, W ) correspond to these two ways of looking at
graphons.
The definition of the sampling distance can also be extended from simple graphs
to graphons (recall (1.2) for graphs):
(10.2)  δsamp(U, W) = ∑_{k=1}^{∞} (1/2^k) dvar(G(k, U), G(k, W)).
Using the fact that for any graphon U and simple graph F on node set [k], the
probability that G(k, U ) = F is just tind (F, U ), we have for all U, W ∈ W0
(10.3)  dvar(G(k, U), G(k, W)) = (1/2) ∑_{F∈F_k^{simp}} |tind(F, U) − tind(F, W)|.
Hence
(10.4)  δsamp(U, W) = ∑_F 2^{−v(F)−1} |tind(F, U) − tind(F, W)|,
where F ranges through all finite graphs with V (F ) = {1, . . . , v(F )}. By (10.1) the
distributions of G(k, G) and G(k, WG ) are almost the same if v(G) is large, and
hence
(10.5)  δsamp(F, G) − δsamp(WF, WG) ≤ 4/v(G).
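For k = 2 the variational distance in (10.3) is explicit: G(2, W) is a single edge with probability tind(K2, W) = t(K2, W), so dvar(G(2, U), G(2, W)) = |t(K2, U) − t(K2, W)|, and the k = 2 term of (10.2) can be estimated by straightforward Monte Carlo (the kernels below are our illustrative choices):

```python
import random

def edge_prob_mc(W, trials, rng):
    """Monte Carlo estimate of P(G(2, W) is an edge) = t(K2, W)."""
    hits = 0
    for _ in range(trials):
        x, y = rng.random(), rng.random()
        hits += rng.random() < W(x, y)
    return hits / trials

U = lambda x, y: x * y   # t(K2, U) = 1/4
W = lambda x, y: 0.5     # t(K2, W) = 1/2
rng = random.Random(7)
d_var_2 = abs(edge_prob_mc(U, 20000, rng) - edge_prob_mc(W, 20000, rng))
# Exact value of dvar(G(2, U), G(2, W)) is |1/4 - 1/2| = 1/4.
assert abs(d_var_2 - 0.25) < 0.02
```

Larger k would require comparing the full distributions over graphs on [k], which is exactly what the sum over F in (10.4) expresses.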
While the sampling procedure described above is the most natural and most
often used, we sometimes need to sample in other ways. In Lemma 10.18 we
will describe a sampling method where the random selection of the nodes is more
restricted, but which is still good enough to get the same information about W
(however, we need much larger samples).
There are other uses of graphons and kernels in generating random graphs.
Bollobás, Borgs, Chayes and Riordan [2010] and Bollobás, Janson and Riordan
[2007] study sparse random graphs generated from a nonnegative kernel W by con-
structing a (W/n)-random graph on n nodes. Bollobás, Janson and Riordan [2010]
and Bollobás and Riordan [2009] study random trees generated from a graphon.
Palla, Lovász and Vicsek [2010] construct sparse random graphs as (W ⊗n )-random
graphs with n′ nodes, where n and n′ are chosen so as to keep the average degree
constant. We will not go into the details of these constructions.
for a kernel W. We note that ∥A∥□ = max{∥A∥□⁺, ∥−A∥□⁺}, and similarly for the
cut norm of kernels. In terms of this norm, we are going to prove the following
similar bounds:
Lemma 10.7. Let U ∈ W1 and let X be a random ordered k-subset of [0,1].
Then with probability at least 1 − 2e^{−√k/10},

−3/k ≤ ∥U[X]∥□ − ∥U∥□ ≤ 8/k^{1/4}.
Let B = U[X]. For any set Q1 of rows and any set Q2 of columns, we set
B(Q1, Q2) = ∑_{i∈Q1, j∈Q2} Bij. We denote by Q1⁺ the set of columns j ∈ [k] for
which B(Q1, {j}) > 0. We define the set of columns Q1⁻ and the sets of rows
Q2⁺, Q2⁻ analogously. Note that B(Q1, Q1⁺), B(Q2⁺, Q2) ≥ 0 by this definition.
We start with proving an inequality for the case when only a random subset Q
of columns is selected.
Lemma 10.8. Let S1, S2 ⊆ [k], and let Q be a random q-subset of [k] (1 ≤ q ≤ k).
Then

B(S1, S2) ≤ E_Q(B((Q ∩ S2)⁺, S2)) + k²/√q.
Proof. The inequality is clearly equivalent to the following:

(10.6)  E_Q(B((Q ∩ S2)⁻, S2)) ≤ k²/√q.

Note that there is no absolute value on the left side: the expectation of B((Q ∩
S2)⁻, S2) can be very negative, but not very positive. The lemma says that the set
(Q ∩ S2)⁻ tends to pick out those rows whose sum is small.
Consider row i of B. Let m = |S2|, bi = ∑_{j∈S2} Bij, ci = ∑_{j∈S2} Bij², and
Ai = ∑_{j∈Q∩S2} Bij. The contribution of row i to the left side is bi if Ai ≤ 0
(i.e., i ∈ (Q ∩ S2)⁻), and 0 otherwise. So the expected contribution of row i is
P(Ai ≤ 0)·bi.
If bi ≤ 0, then this contribution is nonpositive. Else, we use Chebyshev’s
inequality to estimate the probability of Ai ≤ 0. We have E(Ai) = qbi/k and
Var(Ai) < qci/k. Hence

P(Ai ≤ 0) ≤ P(|Ai − qbi/k| ≥ qbi/k) ≤ k² Var(Ai)/(q²bi²) < kci/(q·bi²).

The probability on the left is at most 1, and so we can bound it from above by its
square root:

P(Ai ≤ 0) ≤ √P(Ai ≤ 0) ≤ √(kci)/(√q·bi).

So the contribution of row i to E_Q(B((Q ∩ S2)⁻, S2)) is P(Ai ≤ 0)·bi ≤ √(kci/q) ≤
k/√q. Summing over all i ∈ S1, inequality (10.6) follows.
The following lemma gives an upper bound on the one-sided cut norm, using
the sampling procedure from the previous lemma.
Lemma 10.9. Let S1, S2 ⊆ [k], and let Q1 and Q2 be random q-subsets of [k]
(1 ≤ q ≤ k). Then

∥B∥□⁺ ≤ (1/k²) E_{Q1,Q2}(max_{Ri⊆Qi} B(R2⁺, R1⁺)) + 2/√q.

The Lemma estimates the (one-sided) cut norm by maximizing only over certain
rectangles (at the cost of averaging these estimates). The main point for our
purposes will be that (for a fixed Q1 and Q2), the number of rectangles to consider
is only 4^q, as opposed to 4^k in the definition of the cut norm.
We apply Lemma 10.8 again, interchanging the roles of rows and columns:

B((Q2 ∩ S2)⁺, S2) ≤ E_{Q1}(B((Q2 ∩ S2)⁺, (Q1 ∩ (Q2 ∩ S2)⁺)⁺)) + k²/√q
≤ E_{Q1}(max_{Ri⊆Qi} B(R1⁺, R2⁺)) + k²/√q.

Substituting in (10.7), the Lemma follows.
Proof of Lemma 10.7. To bound the difference ∥B∥□⁺ − ∥U∥□⁺, we first bound
its expectation. For any two measurable subsets S1, S2 ⊂ [0,1], we have

∥B∥□⁺ ≥ (1/k²) U(S1 ∩ X, S2 ∩ X)

(where U(Z1, Z2) = ∑_{x∈Z1, y∈Z2} U(x, y) for finite subsets Z1, Z2 ⊂ [0,1]). Choosing
the set X randomly, we get

E_X(∥B∥□⁺) ≥ (1/k²) E_X(U(S1 ∩ X, S2 ∩ X))
= ((k−1)/k) ∫_{S1×S2} U(x, y) dx dy + (1/k) ∫_{S1∩S2} U(x, x) dx
≥ ∫_{S1×S2} U(x, y) dx dy − 2/k.

Taking the supremum of the right side over all measurable sets S1, S2 we get

E_X(∥B∥□⁺) ≥ ∥U∥□⁺ − 2/k.

From here, the bound follows by sample concentration (Theorem 10.3).
To prove an upper bound on the difference ∥B∥+ − ∥U ∥ , let Q1 and Q2 be
+
√
random q-subsets of [k], where q = ⌊ k/4⌋. Lemma 10.9 say that for every X,
1 ( ) 2
∥B∥+ ≤ 2
E Q 1 ,Q2 max B(R2
+
, R +
1 ) +√ .
k Ri ⊆Qi q
Next we take expectation over the choice of X. More precisely, we fix the sets
Ri ⊆ Qi ⊆ [k], and also those∑ points Xi ∈ [0, 1] for which i ∈ Q = Q1 ∪ Q2 .
Define Y1 = {y ∈ [0, 1] : i∈R1 U (Xi , y) > 0}, and define Y2 analogously. Let
X ′ = (Xi : i ∈ [k] \ Q), then for every i ∈ S∫1 \ Q and j ∈ S2 \ Q, the contribution
of the term U (Xi , Xj ) to EX ′ B(R2+ , R1+ ) is Y1 ×Y2 U ≤ ∥U ∥+
. The contribution of
the remaining terms U (Xi , Xj ) with either i ∈ Q or j ∈ Q is at most 2k|Q| ≤ 4kq
in absolute value. Hence
(10.8) EX ′ B(R2+ , R1+ ) ≤ k 2 ∥U ∥+
+ 4kq.
Next we show that the value of B(R2+ , R1+ ) is highly concentrated around its
expectation. This is a function of the independent random variables Xi , i ∈ [k] \ Q,
and if we change the value of one of these Xi , the sum B(R2+ , R1+ ) changes by at
most 4k (there are fewer than 2k entries that may change, and each of them by at
10.3. ESTIMATING THE DISTANCE BY SAMPLING 163
most 2). We can apply Corollary A.15 of Azuma’s Inequality, and conclude that
with probability at least 1 − e−1.9q , we have
B(R2^+, R1^+) ≤ E_{X′} B(R2^+, R1^+) + 7.9k√(kq) ≤ k²∥U∥_+ + 4kq + 7.9k√(kq).
The number of possible pairs of sets R1 and R2 is 4^q, and hence with probability
at least 1 − 4^q e^{−1.9q} > 1 − e^{−q/2}, this holds for all R1 ⊆ Q1 and R2 ⊆ Q2, and so it
holds for the maximum. Taking expectation over Q1 and Q2 does not change this,
so we get that with probability (over X) at least 1 − e^{−q/2}, we have

∥B∥_+ ≤ ∥U∥_+ + 2/√q + 4q/k + 7.9√q/√k.

This implies the upper bound in the lemma by a simple computation (if k is large
enough).
10.3.2. First applications. We can apply the First Sampling Lemma when
U = W1 − W2 is the difference of two graphons. Considering Wi[X] as the edge-
weighted graph H(X, Wi), Lemma 10.6 implies the following:
Corollary 10.10. Let W1, W2 ∈ W0 and let X be a sequence of k ≥ 1 random
points of [0, 1] chosen independently from the uniform distribution. Then with
probability at least 1 − 4e^{−√k/10},

|d_□(H(X, W1), H(X, W2)) − ∥W1 − W2∥_□| ≤ 8/k^{1/4}.
In terms of the random weighted graphs H(k, W1) and H(k, W2), this means
that they can be coupled so that d_□(H(k, W1), H(k, W2)) ≈ δ_□(W1, W2) with high
probability. We will see that more is true: H(k, W) will be close to W itself in the
cut distance with high probability. (However, quantitatively this "closeness" will be
much weaker.)
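The two sampling procedures are easy to simulate. Below is a minimal sketch (our own illustration, not from the book): `sample_H` returns the edge-weight matrix of H(k, W) and `sample_G` the adjacency matrix of G(k, W), for a graphon given as a symmetric Python function on [0, 1]².

```python
import random

def sample_H(k, W, rng=random):
    """Weighted sample H(k, W): draw X_1, ..., X_k uniformly from [0, 1];
    edge ij receives weight W(X_i, X_j).  W is assumed symmetric."""
    xs = [rng.random() for _ in range(k)]
    return [[W(xs[i], xs[j]) if i != j else 0.0 for j in range(k)]
            for i in range(k)]

def sample_G(k, W, rng=random):
    """Simple W-random graph G(k, W): draw H(k, W), then keep edge ij
    independently with probability equal to its weight."""
    H = sample_H(k, W, rng)
    A = [[0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            A[i][j] = A[j][i] = 1 if rng.random() < H[i][j] else 0
    return A
```

Coupling G(k, W1) and G(k, W2) as in the text amounts to reusing the same points xs (and the same edge coin flips) for both graphons.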
We have seen that the cut distance of two samples H(k, W1 ) and H(k, W2 ) is
close to the distance of W1 and W2 (if coupled appropriately). How about the
simple graphs G(k, W1 ) and G(k, W2 )? The following simple lemma shows that if
k is large enough, then G(k, W ) is close to H(k, W ), so similar conclusions hold.
Lemma 10.11. For every edge-weighted graph H on q nodes with edgeweights
in [0, 1], and for every ε ≥ 10/√q,

P( d_□(G(H), H) > ε ) ≤ e^{−ε²q²/100}.

Applying this inequality with ε = 10/√q and bounding the distance by 1 in
the exceptional cases, we get the inequality

(10.9)  E( d_□(G(H), H) ) ≤ 11/√q.

Note that no similar assertion would hold for the distances d1 or d2. For example,
if all edgeweights of H are 1/2, then d1(G(H), H) = d2(G(H), H) = 1/2 for any
instance of G(H).
Proof. For i, j ∈ [q], define the random variable Xij = 1(ij ∈ E(G(H))). Let
S and T be two disjoint subsets of [q]. Then the Xij (i ∈ S, j ∈ T) are independent,
and E(Xij) = βij(H), which gives that

e_{G(H)}(S, T) − e_H(S, T) = ∑_{i∈S, j∈T} (Xij − E(Xij)).

Let us call the pair (S, T) bad if |e_{G(H)}(S, T) − e_H(S, T)| > εq²/4. The probability
of this can be estimated by the Chernoff–Hoeffding Inequality:

P( |∑_{i∈S, j∈T} (Xij − E(Xij))| > εq²/4 ) ≤ 2 exp(−ε²q⁴/(32|S||T|)) ≤ 2 exp(−ε²q²/32).

The number of disjoint pairs (S, T) is 3^q, and so the probability that there is a bad
pair is bounded by 2·3^q e^{−ε²q²/32} < e^{−ε²q²/100}. If there is no bad pair, then it is
easy to see that d_□(G(H), H) ≤ ε (cf. Exercise 8.4). This completes the proof.
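For tiny graphs, the maximum over subset pairs behind d_□ can be evaluated by brute force. The sketch below (ours; exponential in q, for illustration only) uses one common normalization, max over S, T of |e_A(S, T) − e_B(S, T)|/q², for two weight matrices on the same node set:

```python
from itertools import chain, combinations

def d_box(A, B):
    """Brute-force cut discrepancy of two symmetric weight matrices on [q]:
    max over node subsets S, T of |A(S,T) - B(S,T)| / q^2."""
    q = len(A)
    subsets = list(chain.from_iterable(combinations(range(q), r)
                                       for r in range(q + 1)))
    best = 0.0
    for S in subsets:
        for T in subsets:
            disc = sum(A[i][j] - B[i][j] for i in S for j in T)
            best = max(best, abs(disc) / q ** 2)
    return best
```

On the all-1/2 example above, rounding to any simple graph leaves the cut discrepancy small even though the edit-type distances d1, d2 stay at 1/2.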
This lemma implies that the weighted sample in the First Sampling Lemma
can be replaced by a simple graph at little cost. We state one corollary:
Corollary 10.12. Let W1, W2 ∈ W0 and k ≥ 1. Then the random graphs G(k, W1)
and G(k, W2) can be coupled so that with probability at least 1 − 5e^{−√k/10},

|d_□(G(k, W1), G(k, W2)) − ∥W1 − W2∥_□| ≤ 10/k^{1/4}.
Exercise 10.13. Derive the First Sampling Lemma for graphs (Lemma 10.5)
from the graphon version (Lemma 10.6). Attention: sampling from a graph G
and sampling from WG do not quite give the same distribution!
Exercise 10.14. Prove the (much easier) analogue of the First Sampling Lemma
for the edit distance: Let G and H be simple graphs with V(G) = V(H), and let
n = v(G). Let k ≤ n be a positive integer, and let S be chosen uniformly from all
ordered subsets of V(G) of size k. Then

E( d1(G[S], H[S]) ) = ((k − 1)n / (k(n − 1))) d1(G, H),

and for every ε > 0, with probability at least 1 − 2e^{−kε²/2},

|d1(G[S], H[S]) − d1(G, H)| ≤ ε.
The Second Sampling Lemma also extends to graphons, and can be stated in
terms of the W-random graphs H(k, W) and G(k, W).

Lemma 10.16 (Second Sampling Lemma for Graphons). Let k ≥ 1, and let
W ∈ W0 be a graphon. Then with probability at least 1 − exp(−k/(2 log k)),

δ_□(H(k, W), W) ≤ 20/√(log k),

and

δ_□(G(k, W), W) ≤ 22/√(log k).
Proof. First we prove that these inequalities hold in expectation. Let m =
⌈k^{1/4}⌉. By Lemma 9.15, there is an equipartition P = {V1, ..., Vm} of [0, 1] into m
classes such that

d_□(W, WP) ≤ 8/√(log k).

Let S be a random k-subset of [0, 1]; then by the First Sampling Lemma 10.6, we
have

|d_□(W[S], WP[S]) − d_□(W, WP)| ≤ 8/k^{1/4}

with high probability. This implies that

E| d_□(W[S], WP[S]) − d_□(W, WP) | ≤ 10/k^{1/4}

(k is large enough for this, else the bound in the lemma is trivial), and so

E( d_□(W[S], WP[S]) ) ≤ E| d_□(W[S], WP[S]) − d_□(W, WP) | + d_□(W, WP) ≤ 9/√(log k).
So it suffices to prove that δ_□(WP, WP[S]) is small on average.
Let H = WP[S]. The graphons WP and WH are almost the same: both are
stepfunctions with m steps, with the same function values on corresponding steps.
The only difference is that the measure of the i-th step Vi in WP is 1/m, while the
measure of the i-th step in WH is |Vi ∩ S|/k, which is expected to be close to 1/m
if k is large enough.
Write |Vi ∩ S|/k = 1/m + ri; then it is easy to see that δ_□(WP, WH) ≤ ∑_i |ri|.
Hence it is easy to estimate the expectation of this distance, using elementary
probability theory:
E( δ_□(WP, WH) ) ≤ ∑_i E(|ri|) = m E(|r1|) ≤ m √(E(r1²)) = √((m − 1)/k) < 1/k^{3/8}.
Hence

E( δ_□(W, W[S]) ) ≤ δ_□(W, WP) + E( δ_□(WP, WP[S]) ) + E( δ_□(WP[S], W[S]) )
  ≤ 8/√(log k) + 1/k^{3/8} + 9/√(log k) ≤ 18/√(log k).
A similar estimate for δ_□(W, G(k, W)) follows if we invoke inequality (10.9):

E( δ_□(W, G(k, W)) ) ≤ E( δ_□(W, H(k, W)) ) + E( δ_□(H(k, W), G(k, W)) )
  ≤ 18/√(log k) + 11/√k < 20/√(log k).
Now the Lemma follows by the Sample Concentration Theorem 10.3 applied
to the graph parameter f(G) = v(G) δ_□(G, W).
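In the proof, WP is obtained from W by averaging over the classes of P. For a kernel discretized as an n×n matrix, this averaging is a few lines (a sketch of ours; `step_approx` is a hypothetical helper name):

```python
def step_approx(M, m):
    """Average the n-by-n matrix M (a discretized kernel) over an
    equipartition of the index set into m consecutive classes, giving
    a stepfunction approximation with m steps."""
    n = len(M)
    bounds = [round(t * n / m) for t in range(m + 1)]
    P = [range(bounds[t], bounds[t + 1]) for t in range(m)]
    out = [[0.0] * n for _ in range(n)]
    for s in range(m):
        for t in range(m):
            size = len(P[s]) * len(P[t])
            if size == 0:
                continue
            avg = sum(M[i][j] for i in P[s] for j in P[t]) / size
            for i in P[s]:
                for j in P[t]:
                    out[i][j] = avg
    return out
```

Averaging over blocks preserves all block sums, so the total weight of M is unchanged — the discrete analogue of WP having the same integral as W.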
We have seen that the (weak) Regularity Lemma implies that we can approximate
every simple graph G by a weighted graph on k nodes with error O(1/√(log k)).
As an application of the Second Sampling Lemma, we get that we can also approximate
every simple graph G by an (unweighted) simple graph H on k nodes, at the
cost of a constant factor in the error:
Corollary 10.17. For every k ≥ 1 and simple graph G, there is a simple graph H
with k nodes such that

δ_□(G, H) ≤ 10/√(log k).
We need a version of the Second Sampling Lemma for the modified sampling
procedure mentioned above in Remark 10.1. Let W be a graphon and n ≥ 1.
Let S = (s1 , . . . , sn ), where si is a random uniform point from the interval [(i −
1)/n, i/n]. We denote the random graph G(S, W ) by G′ (n, W ). The following
bound was proved by Lovász and Szegedy [2010a].
Lemma 10.18. For every graphon W and positive integer k, we have with probability
at least 1 − 5/√k,

δ_□(G′(k, W), W) < 176/√(log k).
Proof. The trick is to do a second sampling: we choose a random r-tuple T of
nodes of G′ = G′(k, W), where r = ⌈k^{1/4}⌉. We may assume that k > 25, else there
is nothing to prove. The Second Sampling Lemma implies that with probability at
least 1 − 2 exp(−r/(2 log r)), we have

δ_□(G′[T], G′) ≤ 22/√(log r).

Now we can generate G′[T] = G(r, G′) in the following way: we choose a random
sequence X of r independent uniform points in [0, 1]; if they belong to different
intervals Ji = [(i − 1)/k, i/k], then we return G(X, W); else, we try again. This
gives us a coupling between G(r, W) and G(r, G′) such that

P( G(r, W) ≠ G(r, G′) ) ≤ P( ∃i : |X ∩ Ji| ≥ 2 ) ≤ r(r − 1)/k.

Invoking the Second Sampling Lemma again, with probability at least
1 − 2 exp(−r/(2 log r)) we have

δ_□(G(r, W), W) ≤ 22/√(log r),

and hence with probability at least

1 − 4 exp(−r/(2 log r)) − r(r − 1)/k ≥ 1 − 5/√k

we have

δ_□(G′, W) ≤ δ_□(G′, G′[T]) + δ_□(G(r, W), W) ≤ 44/√(log r) ≤ 176/√(log k).
Proof. It suffices to prove this bound for the case when We = W′e for all edges
but one. Let F = (V, E), and let uv be the edge with Wuv ≠ W′uv. Then

t(F, w) − t(F, w′) = ∫_{[0,1]^V} ∏_{ij∈E(F)\{uv}} Wij(xi, xj) · (Wuv(xu, xv) − W′uv(xu, xv)) dx
  = ∫_{[0,1]^V} f(x) g(x) (Wuv(xu, xv) − W′uv(xu, xv)) dx,

where

f(x) = ∏_{ij∈∇(u)\{uv}} Wij(xi, xj)

does not depend on xv, and the product g(x) of the remaining factors does not
depend on xu; both satisfy 0 ≤ f, g ≤ 1. Fixing all variables except xu and xv, we
get the following estimate by Lemma 8.10:

| ∫_{[0,1]²} f(x) g(x) (Wuv(xu, xv) − W′uv(xu, xv)) dxu dxv | ≤ ∥Wuv − W′uv∥_□.
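Since t(F, W) is simply the expectation of a product over the edges of F, it can be estimated by Monte Carlo; a sketch of ours (not from the book), with F given as an edge list on nodes 0, ..., k−1. For W(x, y) = xy and F = K3 the integral factorizes as (E X²)³ = (1/3)³ = 1/27, which the estimate should approach.

```python
import random

def t_density(edges, k, W, samples=100_000, rng=random):
    """Monte Carlo estimate of t(F, W) = E prod_{ij in E(F)} W(X_i, X_j)
    for independent uniform X_1, ..., X_k on [0, 1]."""
    total = 0.0
    for _ in range(samples):
        x = [rng.random() for _ in range(k)]
        p = 1.0
        for (i, j) in edges:
            p *= W(x[i], x[j])
        total += p
    return total / samples
```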
From this lemma, along with (10.3) and (7.4), it is easy to derive a relationship
between the variation distance of the distributions of the random graphs G(k, U )
and G(k, W ), and the cut distance of U and W .
Corollary 10.25. Let U and W be two graphons; then for every k ≥ 2, we have

dvar( G(k, U), G(k, W) ) ≤ 2k² δ_□(U, W).
Exercise 10.26. Show that the Counting Lemma does not hold for multigraphs,
not even for F = C2 .
Exercise 10.27. Let F be a simple graph with m edges and let W, W′ ∈ W1.
Then

|t(F, W) − t(F, W′)| ≤ 4m δ_□(W, W′).

Exercise 10.28. Let F be a simple graph with m edges and let W ∈ W1. Then

|t(F, W)| ≤ 4m ∥W∥_□.

Exercise 10.29. For every W1-decorated graph (F, w),

|t(F, w)| ≤ 4 min_{e∈E(F)} ∥We∥_□.

Exercise 10.30. Prove the following "induced" version of the Counting Lemma:
If F is a simple graph on k nodes and U, W ∈ W0, then

|tind(F, U) − tind(F, W)| ≤ 4 (k choose 2) ∥U − W∥_□.

Use this to improve the coefficient 2k² in Corollary 10.25.
Proof. The assumption implies that we can couple G(k, U) and G(k, W)
so that G(k, U) = G(k, W) with probability larger than 2 exp(−k/(2 log k)).
The Second Sampling Lemma 10.16 implies that with probability at least
1 − exp(−k/(2 log k)), we have

δ_□( U, G(k, U) ) ≤ 22/√(log k),

and a similar assertion holds for W. It follows that with positive probability all
three events happen, and then we get

δ_□(U, W) ≤ δ_□( U, G(k, U) ) + δ_□( W, G(k, W) ) ≤ 50/√(log k).
Proof. Assume that U, W ∈ W0 satisfy

|t(F, U) − t(F, W)| ≤ 2^{−k²}

for every graph F with k nodes. This implies (by inclusion–exclusion) that

|tind(F, U) − tind(F, W)| ≤ 2^{(k choose 2)} 2^{−k²} = 2^{−(k+1 choose 2)}.

Hence

dvar( G(k, U), G(k, W) ) = ∑_F | P(G(k, U) = F) − P(G(k, W) = F) |
  ≤ 2^{(k choose 2)} 2^{−(k+1 choose 2)} = 2^{−k} < 1 − 2 exp(−k/(2 log k)).

An application of Lemma 10.31 completes the proof.
Exercise 10.37. Construct the coupling measures in Theorem 8.13 for the cut
distance of the three weakly isomorphic graphons in Example 7.11.
Exercise 10.38. Show by an example that the sampling distance and the cut
distance do not define the same topology on the set of finite graphs.
CHAPTER 11

Convergence of Dense Graph Sequences
Finally we have come to the central topic of this book: convergent graph se-
quences and their limits. The two key elements, namely sampling and graphons,
have been introduced in the Introduction. Here we take our time to look at them
from various aspects.
Remark 11.4. This proof builds on a fairly long chain of previous results, some of
which, like the First Sampling Lemma 10.6, were quite involved. The advantage of
this proof is that it gives a quantitative form of the equivalence of two convergence
notions. As pointed out by Schrijver, a weaker qualitative form is easier to prove,
inasmuch as we can replace the use of the Inverse Counting Lemma by the characterization
of weak isomorphism (for which a simple direct proof is sketched in Exercise
11.27). Indeed, consider the two metric spaces (W̃0, δ_□) (the graphon space) and
[0, 1]^F (the space of graph parameters with values in [0, 1]). Both of these are compact
(one by the Compactness Theorem 9.23, the other by Tychonoff's Theorem).
The map W ↦ t(·, W) is continuous by the Counting Lemma, and injective by
Corollary 10.34, and hence its inverse is also continuous. For a convergent sequence
of graphs, this means precisely that the graphons WGn form a convergent sequence
in (W̃0, δ_□).
where F ′ denotes the graph obtained by deleting node k from F . We say that the
model is local, if for two disjoint subsets S, T ⊆ [k], the subgraphs of Gk induced
by S and T are independent as random variables.
We note that consistency, together with the invariance under reordering the
nodes, implies that for every simple graph F on k nodes, the expectation
E( tind(F, Gn) ) = σk(F) is independent of n once n ≥ k.
Example 11.6. For every graphon W , the random graph model Gk = G(k, W )
is both consistent and local, which is trivial to check. (It will turn out that this
example represents all such models.)
Theorem 11.7. If a graph sequence (G1 , G2 , . . . ) is convergent, then the distribu-
tions σk = limn→∞ σk,Gn form a consistent and local random graph model. Con-
versely, every consistent and local random graph model arises this way.
Before proving this theorem, we need some preparation. Let G be a graph and
k ≤ v(G). The sequence of distributions (σG,1 , σG,2 , . . . ) is not quite consistent,
because it breaks down for k > v(G); but it is consistent for the values of k for
which it is defined. There is a more serious problem with locality: selecting i
distinct random nodes of G will bias the selection of the remaining k − i, if we insist
on selecting distinct points. So locality will be only approximately true.
We can fix both problems if we consider the slightly modified distributions
σ′G,k(F) = tind(F, WG). The sequence (σ′G,1, σ′G,2, ...) is consistent. The random
graphs corresponding to this sequence of distributions are G(k, WG). (We could
also generate this from G by selecting the k random nodes with replacement.) The
difference between σG,k and σ′G,k is very small if G is large: if we sample G(k, WG)
and keep it iff the sampled points correspond to different nodes of G, and otherwise
resample, then we get a sample from the distribution of G(k, G). This shows that

(11.2)  dvar(σG,k, σ′G,k) ≤ 1 − n(n − 1)···(n − k + 1)/n^k < (1/n)(k choose 2).
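Inequality (11.2) is the elementary birthday bound 1 − ∏_{i&lt;k}(1 − i/n) ≤ ∑_{i&lt;k} i/n = (k choose 2)/n; a quick numeric check of ours (note that for k = 2 the two sides coincide, so the inequality is strict only from k = 3 on):

```python
from math import comb, perm

def birthday_bound(n, k):
    """Both sides of (11.2): the collision probability for k ordered draws
    from [n] with replacement, and the union bound binom(k, 2)/n."""
    lhs = 1.0 - perm(n, k) / n ** k
    rhs = comb(k, 2) / n
    return lhs, rhs
```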
As discussed in the introduction, random graphs satisfy quite strong laws of
large numbers in the sense that two large random graphs are very much alike; this
translates to the fact that a sequence of independently generated random graphs
G(n, p) is convergent with probability 1. The next lemma shows that all local and
consistent random graph models have a similar property.
Lemma 11.8. Let (σ1 , σ2 , . . . ) be a local consistent random graph model, and gen-
erate a graph Gn from every σn , independently for different values of n. Then the
sequence (G1 , G2 , . . . ) is convergent with probability 1.
Proof. First we note that for every simple graph F on [k] and n ≥ k, we have

(11.3)  E( tind(F, Gn) ) = σk(F).
Indeed, consider any injective map φ : V (F ) → V (Gn ). It follows from the
isomorphism invariance of σn that the probability that φ is an induced embedding
is the same for every map φ, so it suffices to compute this probability when φ
is the identity map on [k]. By the consistency of the model, this probability is
P(Gk = F ) = σk (F ).
Next we show that tind (F, Gn ) is concentrated around its expectation σk (F ).
We could compute second moments, but this would not give a sufficiently good
bound. So (sigh!) we compute the fourth moment.
Let S1 , S2 , S3 , S4 be independent random ordered k-subsets of [n] (we assume
that n > k 2 ). Define Xi = 1(Gn [Si ] = F ) − σk (F ). Note that E(Xi ) = 0 by (11.3),
even if we condition on the choice of the Si , since the distribution of Gn [S] is the
same for every ordered k-set S ⊆ [n]. Furthermore,
(11.4)  E(X1 X2 X3 X4) = E( (tind(F, Gn) − σk(F))⁴ ),

since for a fixed Gn the variables Xi are independent, and E(Xi | Gn) =
tind(F, Gn) − σk(F).
Let A denote the event that every Si meets at least one other Sj . The key
observation is that
E(X1 X2 X3 X4 | A) = 0.
This follows since if the Si are fixed so that (say) S4 does not meet the others, then
X4 is independent of {X1 , X2 , X3 }, and its expectation is 0. (This is where we use
the assumption that our random graph model is local!) Thus
11.2.2. Countable random graph models. We can arrange all labeled sim-
ple graphs in a locally finite rooted tree, where the empty graph is the root, and
F ′ is the parent of F . If (σ1 , σ2 , . . . ) is a consistent sequence of distributions, then
σk is a probability distribution on the k-th level of the tree, and the probability of
each node is the sum of probabilities of its children.
From this setup, we can combine all the distributions σk into a single probability
distribution on all infinite paths starting at the root. To be more precise, let Ω
denote the set of such paths, and let ΩF denote the set of paths passing through
the node F . Then the sets ΩF generate a sigma-algebra A on Ω. The Kolmogorov
Extension Theorem implies that there is a (unique) probability measure σ on (Ω, A)
such that σ(ΩF ) = σk (F ) for every F .
This is so far an abstract construction. We can, however, make explicit sense of
the elements of Ω. A path in the tree starting at the root is a sequence (F0, F1, ...)
of graphs such that Fk = F′_{k+1}. Hence the path gives rise to the countable graph
F = ∪n Fn on the set of positive integers N*. Conversely, every graph on N*
corresponds to a path in the tree starting at the root.
Thus the points of Ω can be identified with the graphs on N∗ . The sets ΩF
are obtained by fixing adjacency between a finite number of nodes. Thus σ can be
thought of as a probability distribution on graphs on N∗ .
A countable random graph model is a probability distribution σ on (Ω, A),
invariant under permutations of N∗ . Such a random graph can also be considered
as a symmetric exchangeable array of 0-1 valued random variables (we will come
back to this way of looking at them in Section 11.3.3). The countable random
graph model is local if for any two finite disjoint subsets S1 , S2 ⊆ N∗ , the subgraphs
induced by S1 and S2 are independent (as random variables). The discussion above
shows that every consistent random graph model defines a countable random graph
model.
Proof. For every fixed simple graph F, the sequence (tinj(F, G[n]) : n =
1, 2, ...) is a reverse martingale for n ≥ v(F), in the sense that
E( tinj(F, G[n − 1]) | G[n] ) = tinj(F, G[n]) (this follows by the simple averaging
principle (5.27)). By the Reverse Martingale Convergence Theorem A.17, it follows
that this sequence is convergent with probability 1. Hence with probability 1,
(tinj(F, G[n]) : n = 1, 2, ...) is convergent for every F.
This last proposition may sound similar to the construction in Lemma 11.8, but
there is a significant difference: in this construction, locality is not needed. Unlike
in Lemma 11.8, G[n] and G[m] are not independently generated. If we apply the
construction in Lemma 11.8 twice, and then pick the even-indexed graphs from
one sequence and interlace them with the odd-indexed graphs from the other, we
get a sequence that is constructed in the same way, and so it is convergent with
probability 1. This means that almost all sequences generated by Lemma 11.8
(for a fixed consistent and local random graph model) have the same limit. In
contrast to this, running the construction in Proposition 11.14 twice we could not
necessarily interlace the resulting sequences into a single convergent sequence: in
Example 11.10, we get a sequence of growing cliques with probability 1/2 and a
sequence of growing edgeless graphs with probability 1/2. Both of these sequences
are convergent, but they don’t have the same limit. Sequences constructed from
one and the same countable random graph model are almost always convergent,
but they may converge to different limits.
In view of Examples 11.6 and 11.12, we also get:
Exercise 11.16. (a) Prove that the Rado graph almost surely has the extension
property: for any two disjoint finite subsets S, T ⊆ N∗ there is a node connected
to all nodes in S but to no node in T .
(b) Prove that every countable graph with the extension property is isomorphic
to the Rado graph.
(c) Prove that if you generate two Rado graphs independently, they will be iso-
morphic with probability 1.
Exercise 11.17. Construct a universal triangle-free graph: a countable graph
containing every finite triangle-free graph as an induced subgraph.
Exercise 11.18. (a) We can define a random graph G(N∗ , p) for all 0 < p < 1.
Prove that with probability 1, this random graph will be isomorphic to the Rado
graph for any p.
(b) More generally, if W is a graphon with 0 < W (x, y) < 1 for all x, y ∈ [0, 1],
then G(n, W ) is almost always isomorphic to the Rado graph.
(c) Construct a graphon W such that two independent countable W -random
graphs are almost surely non-isomorphic [Gábor Kun].
Exercise 11.19. Show that without the assumption of locality, Lemma 11.8 does
not remain valid.
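The extension property of Exercise 11.16 can be explored on finite graphs; a sketch of ours checks, for all small disjoint S and T, whether some node is joined to everything in S and to nothing in T. On G(n, 1/2) with n moderately large, witnesses exist for all small S, T with overwhelming probability — the finite shadow of the Rado phenomenon — while on a complete graph the property already fails for S = ∅, T = {v}.

```python
import random
from itertools import combinations

def has_witness(adj, S, T):
    """Is some node outside S and T adjacent to all of S and none of T?"""
    n = len(adj)
    return any(all(adj[v][s] for s in S) and not any(adj[v][t] for t in T)
               for v in range(n) if v not in S and v not in T)

def extension_property(adj, max_size):
    """Check the extension property for all disjoint S, T with
    1 <= |S| + |T| and |S|, |T| <= max_size."""
    n = len(adj)
    for a in range(max_size + 1):
        for b in range(max_size + 1):
            if a + b == 0:
                continue
            for S in combinations(range(n), a):
                rest = [v for v in range(n) if v not in S]
                for T in combinations(rest, b):
                    if not has_witness(adj, S, T):
                        return False
    return True
```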
We show that W can serve as the limit graphon (at least as long as we ignore
the ugliness of the underlying sigma-algebra):
Proposition 11.24. For every simple graph F and any sequence (Gn : n =
1, 2, ...) of graphs,

lim_ω t(F, Gn) = t(F, W).

If the graph sequence (G1, G2, ...) is convergent, then the ultralimit on the left
side is equal to lim_{n→∞} t(F, Gn), independently of ω.
Proof. Let V(F) = [k]. As a first step, we express the left-hand side in
the ultraproduct space. We have to introduce more sigma-algebras for this. For
every set U ⊆ [k], we take the set Vn^U of all maps U → Vn. The set ∏_ω Vn^U can
be identified with V^U, just like (11.6) identifies ∏_ω Vn × Vn with V × V. The
ultraproduct of the Boolean algebra of all subsets of Vn^U gives a sigma-algebra A_U
on V^U. The ultraproduct of the uniform measures on the sets Vn^U gives a measure
τ_U on the product sigma-algebra on V^U. We abbreviate τ_{[k]} by τ_k.
The set Hom(F, Gn) of homomorphisms from F into Gn is a subset of Vn^k,
and so ∏_ω Hom(F, Gn) is a subset of ∏_ω Vn^k = V^k. The set ∏_ω Hom(F, Gn)
can be identified with the set Hom(F, G) ⊆ V^k of homomorphisms of F into G.
Furthermore,

lim_ω t(F, Gn) = τ_k( ∏_ω Hom(F, Gn) ) = τ_k( Hom(F, G) )

by the definition of the ultraproduct of measures. So it suffices to prove that

(11.8)  τ_k( Hom(F, G) ) = t(F, W).

Let (X1, ..., Xk) ∈ V^k be a random node chosen from the distribution τ_k. Then
we can rephrase the equality to be proved as

(11.9)  E( ∏_{ij∈E(F)} 1_E(Xi, Xj) ) = E( ∏_{ij∈E(F)} W(Xi, Xj) ).
(It is easy to check that the functions 1E (Xi , Xj ) are measurable with respect to
the sigma-algebra A[k] .)
If the random variables 1E (Xi , Xj ) were independent, we could take the ex-
pectation factor-by-factor, and we would be done. But of course they are not. The
trick in the proof is to replace the factors 1E (Xi , Xj ) by W (Xi , Xj ) one by one.
Consider any edge uv of F; we show that

(11.10)  E( ∏_{ij∈E(F)} 1_E(Xi, Xj) ) = E( ∏_{ij∈E(F), ij≠uv} 1_E(Xi, Xj) · W(Xu, Xv) ).

This will show that we can replace 1_E(Xu, Xv) by W(Xu, Xv) without changing
the expectation, and repeating a similar argument for all edges of F, we get (11.9).
For notational convenience, assume that u = 1 and v = 2. The main difficulty
in the rest of the argument is to be careful about measurability, because we have
several sigma-algebras floating around. Using Lemma 11.23, it is not hard to argue
that fixing X1 and X2 , the functions 1E (Xi , Xj ) are measurable with respect to
A_{[k]}, and so the expectation

f(X1, X2) = E_{X3,...,Xk} ∏_{ij∈E(F), {i,j}≠{1,2}} 1_E(Xi, Xj)
Exercise 11.26. Prove that if a graph sequence (Gn ) satisfies v(Gn ) → ∞, then
it is Cauchy in the cut distance if and only if it is Cauchy in the sampling distance.
Exercise 11.27. Prove the following facts: (a) For every stepfunction W,
δ1(W, H(n, W)) → 0 as n → ∞ with probability 1. (b) For every graphon W,
δ1(W, H(n, W)) → 0 as n → ∞ with probability 1. (c) For every graphon W,
δ_□(G(n, W), H(n, W)) → 0 as n → ∞ with probability 1. (d) If U and W are
weakly isomorphic graphons, then G(n, U) and G(n, W) have the same distribution.
(e) If U and W are weakly isomorphic graphons, then δ_□(U, W) = 0
(A. Schrijver).
Exercise 11.28. Show that Theorems 11.21, 11.22 and Corollary 11.15 imply
that the space (W̃0, δ_□) is compact.
Exercise 11.29. Prove that the following properties of graphs are inherited to
their ultraproduct: (a) 3-regular; (b) all degrees bounded by 10; (c) triangle-free;
(d) containing a triangle; (e) bipartite; (f) disconnected.
Exercise 11.30. Prove that the following properties of graphs are not inherited
to their ultraproduct: (a) connected; (b) all degrees are even; (c) non-bipartite.
Exercise 11.31. (a) Prove that every bounded sequence of real numbers has a
unique ultralimit. (b) Prove that lim_ω(ai + bi) = lim_ω ai + lim_ω bi. (c) Prove
that the ultralimit lim_ω ai is independent of the choice of the ultrafilter ω if and
only if the sequence is convergent in the classical sense.
To figure out the limit graphon, note that the probability that nodes i and j
are connected is 1 − max(i, j)/n. If i = xn and j = yn, then this is 1 − max(x, y).
This motivates the following:
One can get a good explicit bound on the convergence rate by estimating the
cut distance of W_{G^ua_n} and 1 − max(x, y), using the Chernoff–Hoeffding bound.
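As a sanity check on the candidate limit (a sketch of ours, not from the book): the edge density of a W-random graph with W(x, y) = 1 − max(x, y) should approach t(K2, W) = ∫∫ (1 − max(x, y)) dx dy = 1/3.

```python
import random

def edge_density_G(n, W, rng):
    """Edge density of one sample of the W-random graph G(n, W)."""
    xs = [rng.random() for _ in range(n)]
    edges = sum(1 for i in range(n) for j in range(i + 1, n)
                if rng.random() < W(xs[i], xs[j]))
    return edges / (n * (n - 1) / 2)
```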
Example 11.41 (Prefix attachment graphs). In this construction, it will be
more convenient to label the nodes starting with 1. At the n-th iteration, a new
node n is born, a node z is selected at random, and node n is connected to nodes
1, ..., z − 1. We denote the n-th graph in the sequence by G^pfx_n, and call this
graph sequence a prefix attachment graph sequence (Figure 11.2).
Again we start with some simple calculations. The probability that nodes
i < j are connected is (j − i)/j (but these events are not independent in this case!).
The expected degree of j is therefore

∑_{i=1}^{j−1} (j − i)/j + ∑_{i=j+1}^{n} (i − j)/i = n − j/2 − j ln(n/j) + o(n).
Does this mean that the graphon U(x, y) = |x − y|/max(x, y) is the limit?
Somewhat surprisingly, the answer is negative, which we can see by computing
triangle densities. The probability that three nodes i < j < k form a triangle is
(1 − j/k)(1 − i/j) (since if k is connected to j, then it is also connected to i). Hence
the expected number of triangles is

∑_{i<j<k} (1 − j/k)(1 − i/j) ≈ (1/6)(n choose 3).

Hence

t(K3, G^pfx_n) ≈ (1/n³)(n choose 3) −→ 1/6.
On the other hand,

t(K3, U) = ∫_{[0,1]³} (|x − y|/max(x, y)) · (|x − z|/max(x, z)) · (|y − z|/max(y, z)) dx dy dz.

Since the integrand is independent of the order of the variables, we can compute
this easily:

t(K3, U) = 6 ∫_{0≤x<y<z≤1} (1 − x/y)(1 − x/z)(1 − y/z) dx dy dz = 5/36.
0≤x<y<z≤1
Let us label a node born in step k, connected to {1, . . . , m}, by (k/n, m/k) ∈
[0, 1] × [0, 1]. Then we can observe that nodes with label (x1 , y1 ) and (x2 , y2 ) are
connected if and only if either x1 < x2 y2 or x2 < x1 y1 .
This suggests a description of the limit in the following form: Consider the
function W^pfx : [0, 1]² × [0, 1]² → [0, 1] given by

W^pfx((x1, y1), (x2, y2)) = 1(x1 < x2 y2 or x2 < x1 y1).
(As remarked before, we can consider 2-variable functions on other probability
spaces, not just [0, 1]; in this case, [0, 1]² is a more convenient representation. In
the proof below, we use an analogue of Lemma 11.33, adapted to this case. For a
general statement containing both, see Exercise 13.8.)
Proposition 11.42. The prefix attachment graphs G^pfx_n tend to W^pfx almost surely.

Proof sketch. Let Sn be the (random) set of points in [0, 1]² of the form
(i/n, zi/i), where i = 1, ..., n and zi is a uniformly chosen random integer in [i].
Then G^pfx_n = G(Sn, W^pfx) = H(Sn, W^pfx).
Furthermore, with probability 1, the sets Sn are well distributed in [0, 1]2 in
the sense that |Sn ∩ A|/|Sn | → λ(A) for every open set A. It suffices to verify
this for the case when A = J1 × J2 , where J1 , J2 are open intervals, and it will
be also convenient to assume that J1 does not start at 0. The assertion is then
easily verified, based on the fact that the first coordinates (i/n) are well distributed
in [0, 1], and the second coordinates are uniformly distributed random points in
{1/i, . . . , i/i}. Thus the generalized version of Lemma 11.33 applies and proves the
Proposition.
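The identity G^pfx_n = H(Sn, W^pfx) in the proof sketch can be checked mechanically (a sketch of ours): grow the graph by the prefix rule, then recompute every adjacency from the labels (i/n, zi/i) via W^pfx; for i < j both say "connected iff i < zj". Exact rationals are used for the labels to avoid floating-point boundary effects.

```python
import random
from fractions import Fraction

def prefix_attachment(n, rng):
    """Grow the prefix attachment graph: node i picks z_i uniformly
    from [i] and is joined to nodes 1, ..., z_i - 1 (1-based labels)."""
    z = [rng.randint(1, i) for i in range(1, n + 1)]      # z[i-1] = z_i
    adj = [[False] * n for _ in range(n)]
    for i in range(1, n + 1):
        for j in range(1, z[i - 1]):
            adj[i - 1][j - 1] = adj[j - 1][i - 1] = True
    return z, adj

def labels_of(n, z):
    """Exact rational labels (i/n, z_i/i) for the nodes."""
    return [(Fraction(i, n), Fraction(z[i - 1], i)) for i in range(1, n + 1)]

def W_pfx(p1, p2):
    """The 0-1 kernel on [0,1]^2 x [0,1]^2 from the text."""
    (x1, y1), (x2, y2) = p1, p2
    return x1 < x2 * y2 or x2 < x1 * y1
```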
Proposition 11.42 gives a nice and simple representation of the limit object with
the underlying probability space [0, 1]² (with the uniform measure). If we want a
representation by a graphon on [0, 1], we can map [0, 1] into [0, 1]² by a measure
preserving map φ; then W^φ_pfx(x, y) = W^pfx(φ(x), φ(y)) gives a representation of
the same graphon as a 2-variable function on [0, 1]. For example, using the map φ
that separates even and odd bits of x, we get the fractal-like picture in Figure 11.3.
It is interesting to note that the graphs G(n, W ) form another (different) se-
quence of random graphs tending to the same limit W with probability 1.
A final remark on this graph sequence. It is not hard to verify that for the
graphon U(x, y) = |x − y|/max(x, y), we have

(11.13)  ∫_{S×T} (W_{G^pfx_n} − U) −→ 0

for every S, T ⊆ [0, 1]. (Indeed, it is enough to prove this for sets S, T from a
generating set of the sigma-algebra of Borel sets, e.g. rational intervals. Since
there are only countably many such intervals, it suffices to prove that (11.13)
holds with probability 1 for any two rational intervals S and T. This is a rather
straightforward computation in probability.)
So W_{G^pfx_n} → U in the weak* topology of L∞([0, 1]²), but not in our sense. We
will see (Lemma 8.22) that our convergence implies weak* convergence, but not
the other way around. This example also shows that had we defined convergence
of a graph sequence by weak* convergence (after appropriate relabeling), the limit
would not be unique. The uniqueness of the limit graphon is a nontrivial fact!
So far, our randomly grown sequences tended to well-defined limit graphons
with probability 1. Now we turn to examples of randomly grown sequences that are
convergent with probability 1, but if we run the process again, they may converge
to a different limit. There is in fact a very simple sequence with this property.
Example 11.43 (Cloning). Given a simple graph G0, we select a uniform random
node v, and create a twin of v (a new node v′ connected to the same nodes as v; v and
v′ are not connected). Repeating this we get a sequence of graphs G0, G1, G2, ....
We claim that this sequence is convergent with probability 1. Let v(G0) = k.
Note that each Gn is determined by the sequence (ni : i ∈ V(G0)), where ni is
the number of clones of i we created, including i itself. Clearly ∑_i ni = n + k,
and the probability that node i will be cloned in the next step is ni/(n + k). So
the development of the sequence (ni : i ∈ V(G0)) follows a Pólya urn model (see
e.g. Grimmett and Stirzaker [1982]), which implies that with probability 1, every
ratio ni/(n + k) tends to some real number xi. Clearly xi ≥ 0 and ∑_i xi = 1. So
Gn → WH, where H is obtained from G0 by weighting node i with weight xi (the
edges remain unweighted).
What are these values xi? They can be anything (as long as they are nonnegative
and sum to 1). In fact, it follows from the theory of the Pólya urn that the vector
(xi) is uniformly distributed over the simplex {x ∈ R^{V(G0)} : xi ≥ 0, ∑_i xi = 1}.
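The cloning dynamics are exactly a Pólya urn on the k original nodes; a simulation sketch of ours. Each run produces a different limiting weight vector, in line with the uniform distribution on the simplex.

```python
import random

def clone_weights(k, steps, rng):
    """Run the cloning process on k initial nodes: at each step pick a
    uniform random existing node and clone it.  Returns the final ratios
    n_i / (steps + k), which converge to a random point of the simplex."""
    counts = [1] * k
    for n in range(steps):
        # before this step there are n + k nodes; node i is cloned
        # with probability counts[i] / (n + k)
        r = rng.randrange(n + k)
        acc = 0
        for i in range(k):
            acc += counts[i]
            if r < acc:
                counts[i] += 1
                break
    return [c / (steps + k) for c in counts]
```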
Let us close this section with a more interesting example with similar property.
Example 11.44 (Growing preferential attachment graphs). This randomly
growing graph sequence Gpa n is generated as follows. We start with a single node.
At the n-th step (when we already have a graph with n nodes), a new node labeled
n + 1 is created. This new node is connected to each old node i with probability
(dn (i) + 1)/(n + 1), independently for different nodes i, where dn (i) is the current
degree of node i. (Adding 1 in the numerator and denominator is needed in order
to generate anything other than empty graphs.)
The behavior of the graph sequence G^pa_n is somewhat unexpected: it is convergent
with probability 1, but the limit is not determined. More precisely:
Proposition 11.45. With probability 1, the sequence G^pa_n is quasirandom, i.e., it
converges to a constant function.
192 11. CONVERGENCE OF DENSE GRAPH SEQUENCES
Proof. Let Xn = e(G^pa_n) denote the number of edges. Then

E(Xn | G^pa_{n−1}) = Xn−1 + ∑_{i=1}^{n−1} (d_{n−1}(i) + 1)/n = Xn−1 + (2/n)Xn−1 + (n − 1)/n.
Hence

E(2Xn + 2n + 1 | Xn−1) / ((n + 2)(n + 1)) = (2Xn−1 + 2n − 1) / ((n + 1)n),
which shows that the values Yn = (2Xn + 2n + 1)/((n + 2)(n + 1)) form a mar-
tingale. Since they are obviously bounded, the Martingale Convergence Theorem
implies that with probability 1 there is a value a such that Yn → a. Clearly
Yn ∼ t(K2, G^pa_n), and so t(K2, G^pa_n) → a.
Given G^pa_{n−1}, the degree of node n when it is born is ∑_{i=1}^{n−1} Xi, where the Xi
are independent 0–1 random variables with E(Xi) = (d_{n−1}(i) + 1)/n. Hence

E(dn(n) | G^pa_{n−1}) = ∑_{i=1}^{n−1} (d_{n−1}(i) + 1)/n = (2/n) e(G^pa_{n−1}) + (n − 1)/n,
and hence (dn (n) + 1)/(n + 1) will be heavily concentrated around a. In particular,
(dn (n) + 1)/(n + 1) → a as n → ∞.
Next, observe that the development of dn (i), for a fixed i, follows a Pólya Urn
model with di(i) + 1 red and i − di(i) green balls, whence (dn(i) + 1)/(n + 1) is
a martingale converging with probability 1 to a random variable with beta distribution
with parameters di(i) + 1 and i − di(i). So for large i, (dn(i) + 1)/(n + 1) will be heavily concentrated around its
expectation (di (i) + 1)/(i + 1), which in turn is heavily concentrated around a. So
for large n, most nodes will have degree around an.
It follows that the process is almost the same as G(n, a), where we can also think
of the nodes created one-by-one and joined to each previous node with probability
a. We can couple the two processes to show that with probability 1, they converge
to the same limit, which is clearly the identically-a function.
Note that by the Martingale Stopping Theorem A.11,

E(a | G^pa_n) = Yn = (2e(G^pa_n) + 2n + 1) / ((n + 2)(n + 1)).
Since G^pa_n can be any simple graph on n nodes with positive probability, it follows
that a is not determined, and with a more careful computation one can see that a
falls into any interval of positive length with positive probability.
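A minimal simulation sketch of this process (the helper name is ours, not from the book): it grows the graph node by node and returns the normalized edge density t(K2, G^pa_n) = 2e/n², whose limit a varies from run to run:

```python
import random

def pa_edge_density(n_final, rng):
    """Grow the preferential attachment graph of Example 11.44 up to n_final
    nodes and return t(K2, G) = 2e/n^2, which converges a.s. to the random
    martingale limit a."""
    deg = [0]                       # degrees; start with a single node
    edges = 0
    while len(deg) < n_final:
        n = len(deg)                # current number of nodes
        new_deg = 0
        for i in range(n):
            # the new node is joined to old node i with prob (d_n(i)+1)/(n+1)
            if rng.random() < (deg[i] + 1) / (n + 1):
                deg[i] += 1
                new_deg += 1
                edges += 1
        deg.append(new_deg)
    return 2 * edges / len(deg) ** 2

# Two independent runs: each run's density settles down, but the two runs
# settle near different values of a.
a1 = pa_edge_density(300, random.Random(1))
a2 = pa_edge_density(300, random.Random(7))
```

Repeating the experiment with more seeds gives a spread of limiting densities, illustrating that a is genuinely random.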
Remark 11.46. In several examples above (11.35, 11.36, 11.41, 11.43), the limit
graphon is 0-1 valued. Consider, for example, the case of prefix attachment graphs.
It follows by Proposition 8.24 that W_{G^pfx_n} → W^pfx with probability 1 in the edit
distance, not just in the cut distance. This means that while the graphs G^pfx_n
are random, they are very highly concentrated: two instances of G^pfx_n differ in
o(n²) edges only, if overlaid properly (not in the original ordering of the nodes!).
Informally, they have a relatively small amount of randomness in them, which
11.5. MANY DISGUISES OF GRAPH LIMITS 193
Proof. (a)→(b): Every graphon W gives rise to the simple graph parameter
t(., W ), which is, as we have seen, multiplicative, normalized, and nonnegative on
signed graphs.
(b)→(c): Let f be a multiplicative, normalized simple graph parameter that
is nonnegative on signed graphs. The conditions imply that the value of f does
not change if isolated nodes are added or deleted. We consider the signed graph F̂
obtained from a simple graph F by signing its edges with +, and the edges of its
complement with −. It is easy to check that ∑_{F∈F^simp_k} F̂ = Ok (the graph with no
edges), and hence ∑_{F∈F^simp_k} f(F̂) = f(Ok) = f(O1)^k = 1. So the values f(F̂) form
a probability distribution σk on F^simp_k. It is clear that this distribution is invariant
under isomorphism, so we get a random graph model (σk).
It is also easy to check that ∑_{H: H′=F} Ĥ = F̂K1 (that is, F̂ with a new isolated node), and so

f(F̂) = f(F̂K1) = ∑_{H: H′=F} f(Ĥ).
This means that generating a random graph from σk+1 , and deleting its last node,
we get a random graph from σk . So the model is consistent.
To show that the model is local, let S and T be disjoint subsets of [k], and let
FS and FT be two simple graphs on S and T , respectively. Let G be a random
graph from σk . Then
P( G[S] = FS, G[T] = FT ) = ∑_{V(H)=S∪T, H[S]=FS, H[T]=FT} P( G[S ∪ T] = H )

   = ∑_{V(H)=S∪T, H[S]=FS, H[T]=FT} f(Ĥ) = f(F̂S F̂T) = f(F̂S) f(F̂T)

   = P( G[S] = FS ) P( G[T] = FT ).
Thus the model is local.
(c)↔(d): This follows by Proposition 11.9 and the discussion before it.
(c)→(a): Generate a random graph Gn from the consistent local random graph
model. By Lemma 11.8, we get a convergent graph sequence with probability 1.
(d)↔(e): We have seen that a graph sequence is convergent if and only if it is
Cauchy in the cut distance. Every point in the completion is defined by a Cauchy
sequence, which tends to a graphon W . Two Cauchy sequences define the same
point of the completion if and only if merging them we get a Cauchy sequence,
which implies that they have the same limit graphon (up to weak isomorphism).
Conversely, every graphon is the limit of a Cauchy sequence (for example, the
sequence of W -random graphs), and so it corresponds to a point in the completion.
For the sum of powers of the eigenvalues we have the graph-theoretic expressions

∑_{λ∈Spec(Wn)} λ^k = t(Ck, Wn)   and   ∑_{λ∈Spec(W)} λ^k = t(Ck, W).
The functionals λi(W) and λ′i(W) are invariant under measure preserving trans-
formations, and so they can be considered as functionals on the space (W̃, δ□).
The theorem shows that these functionals are continuous. Of course, a similar con-
clusion holds for the eigenvalues of kernels in W̃1. By compactness, these maps are
uniformly continuous, which can be stated as follows:
Corollary 11.55. For every ε > 0 and every i ≥ 1, there is a δi > 0 such that if
U, W ∈ W1 and δ□(U, W) ≤ δi, then

|λi(U) − λi(W)| ≤ ε   and   |λ′i(U) − λ′i(W)| ≤ ε.
Example 11.56. If (Gn ) is a quasirandom sequence with density p, then the largest
normalized eigenvalue of Gn tends to p, while the others tend to 0. The limiting
graphon, the identically-p function, has one nonzero eigenvalue (namely p).
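This can be observed numerically. The sketch below (a pure-Python power iteration; the helper is ours, and only the dominant eigenvalue is computed) builds one instance of G(n, 1/2) and checks that its largest normalized eigenvalue is near p = 1/2:

```python
import random

def largest_eigenvalue(A, iters=50):
    """Largest eigenvalue of a symmetric nonnegative matrix by power iteration."""
    n = len(A)
    v = [1.0] * n
    lam = 0.0
    for _ in range(iters):
        w = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = max(abs(x) for x in w)
        v = [x / lam for x in w]
    return lam

# A sample of G(n, 1/2): the largest normalized eigenvalue is close to p,
# matching Example 11.56 (the limit graphon has the single nonzero eigenvalue p).
rng = random.Random(0)
n, p = 120, 0.5
A = [[0] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        if rng.random() < p:
            A[i][j] = A[j][i] = 1

lam1 = largest_eigenvalue(A) / n   # eigenvalue of the kernel W_G
```

The remaining eigenvalues of such a sample are O(1/√n) after normalization, consistent with the claim that they tend to 0.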
The last example suggests that perhaps convergent graph sequences can be
characterized through the convergence of their spectra, since if (Gn ) is a sequence
of graphs such that the edge density on Gn tends to p, the largest normalized
eigenvalue of Gn tends to p, and all the other eigenvalues tend to 0, then (Gn ) is
quasirandom. There is no real hope for this, however:
Example 11.57. Consider two non-isomorphic graphs G1 and G2 with the same
spectrum (for example, the incidence graphs of two non-isomorphic finite projective
planes of the same order). Consider the blow ups G1 (n) and G2 (n), n = 1, 2, . . . ,
and merge them into a single sequence. This sequence is not convergent, but all
graphs in it have the same spectra except for the multiplicity of 0.
Exercise 11.58. Prove that for any finite simple graph G, λi (G) ≥ −(i −
1)/(v(G) − i + 1).
Theorem 11.59. Let (Gn) be a sequence of graphs such that Gn → W. Then the
graphs Gn can be labeled so that ∥WGn − W∥□ → 0.

Proof. Let Pn be a partition of [0, 1] into consecutive intervals of length
1/v(Gn). By Proposition 9.8, we have that ∥W − WPn∥□ → 0, so combined
with the assumption that δ□(W, WGn) → 0 we see that δ□(WPn, WGn) → 0.
Here δ□(WPn, WGn) = δ□(W/Pn, Gn) can be thought of as the distance of two
weighted graphs on the same number of nodes, so by Theorem 9.29, we get that
δ̂□(W/Pn, Gn) → 0. This means that the graphs in the sequence (Gn) can be
relabeled to get a graph sequence (G′n) such that

∥WPn − WG′n∥□ = d□(W/Pn, G′n) → 0.

Since ∥W − WPn∥□ → 0, this proves the theorem.
Exercise 11.60. Let us extend the definition of the distance δ̂□ to the case when
one of the arguments is a graphon U: δ̂□(U, G) = min_{G′} ∥U − WG′∥□, where
G′ ranges over all relabeled versions of G. (a) Prove that if Gn → U, then
δ̂□(U, Gn) → 0. (b) Show by an appropriate construction that the following
stronger statement is not true: there exists a function f : [0, 1] × ℕ* → [0, 1] such
that f(x, n) → 0 if x → 0 and n → ∞, and δ̂□(U, G) ≤ f(δ□(U, G), v(G)).
Exercise 11.61. Prove that a sequence of graphs is convergent if and only if they
have weak regularity partitions with convergent templates. More precisely, (Gn)
is convergent if and only if for every k ∈ ℕ*, V(Gn) has a k-partition P_{k,n} such
that (a) for every n, we have d□(Gn, (Gn)_{P_{k,n}}) ≤ 10/√(log k), and (b) for every k,
the template graphs Gn/P_{k,n} converge to some weighted graph Hk on k nodes as
n → ∞.
11.8. First applications
As a first illustration of how graph limits help in proving theorems about finite
graphs, we describe graph-limit proofs of two important results: a characterization
of quasirandom graphs (see Section 1.4.2) and the Removal Lemma (see Section
11.8.2). This should illustrate not only that graph limits are useful, but also the
method of obtaining a limit graphon from a sequence of counterexamples.
11.8.1. Quasirandom graphs. We start with quasirandom graphs. Let (Gn )
be a quasirandom sequence with density p (see Examples 11.37 and 11.56). This
means that t(F, Gn ) → pe(F ) for every simple graph F , or even simpler, Gn → p
(the identically-p graphon). We mentioned in the introduction the surprising fact,
due to Chung, Graham and Wilson [1989], that it is enough to require this relation
for F = K2 and F = C4 :
Theorem 11.62. If (Gn ) is a sequence of simple graphs such that v(Gn ) → ∞,
t(K2 , Gn ) → p, and t(C4 , Gn ) → p4 , then (Gn ) is quasirandom with density p.
Proof. Suppose that (Gn ) is not quasirandom, i.e., there is a simple graph F
such that t(F, Gn ) ̸→ pe(F ) . We can select a subsequence for which t(F, Gn ) →
c ̸= pe(F ) , and then we can select a convergent subsequence. Let W be its limit
graphon; then t(K2 , W ) = p, t(C4 , W ) = p4 , and W ̸= pJ as t(F, W ) = c ̸= pe(F ) .
To get a contradiction, it suffices to prove:
Claim 11.63. If W is a kernel such that t(K2 , W ) = p and t(C4 , W ) = p4 for
some real number p, then W = pJ.
Since we have equality, the function (W ∘ W)(x, y) = ∫ W(x, z)W(z, y) dz must be
constant, and by integration we see that its value is p². This means that
W ∘ W = p²J. Hence the operator T_{W∘W} = T_W² has a single nonzero eigenvalue p²
with eigenfunction ≡ 1. But then trivially TW has a single nonzero eigenvalue ±p
with the same eigenfunction, i.e., W ≡ p or W ≡ −p. The condition t(K2, W) = p
rules out the second alternative.
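The role of the C4 condition can be checked by direct computation. The sketch below (helper names ours) evaluates t(K2, G) and t(C4, G) = tr(A⁴)/n⁴ from the adjacency matrix and confirms that the complete bipartite graph K_{10,10} has the right edge density 1/2 but C4-density 1/8 > (1/2)⁴, so the edge density alone does not force quasirandomness:

```python
def hom_density_k2(A):
    """t(K2, G): number of adjacent ordered pairs divided by n^2."""
    n = len(A)
    return sum(sum(row) for row in A) / n ** 2

def hom_density_c4(A):
    """t(C4, G) = hom(C4, G)/n^4 = trace(A^4)/n^4."""
    n = len(A)
    A2 = [[sum(A[i][k] * A[k][j] for k in range(n)) for j in range(n)]
          for i in range(n)]
    tr = sum(A2[i][j] * A2[j][i] for i in range(n) for j in range(n))
    return tr / n ** 4

# K_{m,m}: edge density 1/2, but C4-density 1/8 instead of (1/2)^4 = 1/16.
m = 10
n = 2 * m
A = [[1 if (i < m) != (j < m) else 0 for j in range(n)] for i in range(n)]
p = hom_density_k2(A)    # 0.5
c4 = hom_density_c4(A)   # 0.125
```

By Theorem 11.62, a sequence with t(K2) → p and t(C4) → p⁴ is forced to be quasirandom; the example above shows the second condition cannot be dropped.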
11.8.2. Removal Lemma. One of the first, most famous, and in a sense in-
famous consequences of the Regularity Lemma was proved by Ruzsa and Szemerédi
[1976].
Lemma 11.64 (Removal Lemma). For every ε > 0 there is an ε′ > 0 such that
if a simple graph G with n nodes has at most ε′n³ triangles, then we can delete εn²
edges from G so that the remaining graph has no triangles.
This lemma sounds innocent, almost like a trivial average computation. This
is far from the truth! No simple proof is known, and (worse) all the known proofs
give a terrible dependence of ε′ on ε. The best bound, due to Fox [2011], gives an
ε′ such that 1/ε′ is a tower of exponentials 2^{2^{···}} of height about log(1/ε). The
original proof gives a tower of height about 1/ε². Perhaps this looks friendlier (?)
if we write it as ε ≈ 1/√(log*(1/ε′)). The proof given below does not give any
explicit bound, but it
illustrates the way graph limit theory can be used.
Proof. Suppose that the lemma is false. This means that there is an ε > 0
and a sequence of graphs (Gn) such that t(K3, Gn) → 0 but deleting any set of εn²
edges, the remaining graph will contain a triangle. By selecting a subsequence, we
may assume that t(F, Gn ) is convergent for every simple graph F , and then there is a
graphon W such that Gn → W . We have then t(K3 , W ) = limn→∞ t(K3 , Gn ) = 0.
The condition on the deletion of edges is harder to deal with, because it does
not translate directly to any property of the limit graphon W . What we can do
is to “pull back” information from W to the graphs Gn . By Theorem 11.59, we
may assume that ∥WGn − W∥□ → 0. (This step is not absolutely necessary, but
convenient.)
Let S = {(x, y) ∈ [0, 1]² : W(x, y) > 0}. By Lemma 8.22, we have

∫_{[0,1]²} (1 − 1S)WGn → ∫_{[0,1]²} (1 − 1S)W = 0,

so we can choose n large enough that ∫ (1 − 1S)WGn < ε/4. Let V(Gn) = [N],
Ji = [(i − 1)/N, i/N], and Rij = Ji × Jj.
We modify Gn by deleting the edge ij if λ(S ∩ Rij) < 3/(4N²).
Claim 11.65. The remaining graph G′n is triangle-free.
Indeed, suppose that i, j, k are three nodes such that λ(S ∩ Rij) ≥ 3/(4N²),
λ(S ∩ Rjk) ≥ 3/(4N²) and λ(S ∩ Rik) ≥ 3/(4N²). Observe that t(K3, W) = 0
Vesztergombi [2012] for a treatment of the case when G is also weighted, and also
for consequences of these results in statistical physics.)
Recall that for a fixed weighted graph H, the value hom(G, H) grows expo-
nentially with n², and so a reasonable normalization is to consider the (dense)
homomorphism entropy

ent(G, H) = (log hom(G, H)) / v(G)².
The first, “naive” notion of right-convergence would be to postulate that these
homomorphism entropies converge for all weighted graphs H with (say) positive
edgeweights. This is at least a necessary condition for convergence:
Proposition 12.1. Let (Gn ) be a convergent graph sequence. Then for every
weighted graph H with positive edgeweights, the sequence ent(Gn , H) is convergent.
To prove this proposition, let us recall from Example 5.19 that the homomor-
phism entropy can be approximated by the maximum weighted multicut density:
Let G be a simple graph on [n], and H a weighted graph on [q] with positive
edgeweights and αH = 1. Define Bij = log βij(H); then

(12.1)   cut(G, B) ≤ (log hom(G, H))/n² ≤ cut(G, B) + (log q)/n,

where cut(G, B) is the maximum weighted multicut density

cut(G, B) = max_{(S1,...,Sq)∈Πn} (1/n²) ∑_{i,j∈[q]} Bij eG(Si, Sj).
We need a couple of facts about weighted multicut densities. First, they are
invariant under blow-ups:
Lemma 12.2. For a simple graph G, symmetric matrix B ∈ ℝ^{q×q} and integer
k ≥ 1, we have
cut(G(k), B) = cut(G, B).
Proof. The inequality cut(G(k), B) ≥ cut(G, B) is clear, since every q-
partition of V (G) can be lifted to a q-partition of V (G(k)), contributing the same
value to the maximization in the definition of cut(G(k), B). To prove the reverse
inequality, let (S1 , . . . , Sq ) be the q-partition of V (G(k)) attaining the maximum in
the definition of cut(G(k), B). For every node v ∈ V (G), we pick a random element
v ′ ∈ V (G(k)) uniformly from the set of twins of v created when blowing it up, and
let T = {v′ : v ∈ V(G)}. Let G′ = G(k)[T]. Then G′ ≅ G, and
E( (1/n²) ∑_{i,j∈[q]} Bij eG′(Si ∩ T, Sj ∩ T) ) = (1/(nk)²) ∑_{i,j∈[q]} Bij e_{G(k)}(Si, Sj)

   = cut(G(k), B).
It follows that for at least one choice of the nodes v′, we have

cut(G′, B) ≥ (1/n²) ∑_{i,j∈[q]} Bij eG′(Si ∩ T, Sj ∩ T) ≥ cut(G(k), B).
Lemma 12.3. For two simple graphs G and G′ and a symmetric matrix B ∈ ℝ^{q×q},
we have

|cut(G, B) − cut(G′, B)| ≤ q²δ□(G, G′).
Proof. We start with proving the weaker inequality

(12.2)   |cut(G, B) − cut(G′, B)| ≤ q²δ̂□(G, G′)

in the case when v(G) = v(G′) = n. We may assume that G and G′ are optimally
overlaid, so that V(G) = V(G′) = [n] and δ̂□(G, G′) = d□(G, G′). Then for every
partition (S1, . . . , Sq) ∈ Πn, we have

| (1/n²) ∑_{i,j∈[q]} Bij eG(Si, Sj) − (1/n²) ∑_{i,j∈[q]} Bij eG′(Si, Sj) |

   ≤ (1/n²) ∑_{i,j∈[q]} |Bij| · |eG(Si, Sj) − eG′(Si, Sj)| ≤ q² d□(G, G′).
This proves (12.2). To get the more general inequality in the lemma, we apply
(12.2) to the graphs G(n′ k) and G′ (nk), where n = v(G), n′ = v(G′ ), and k is a
positive integer. The left side equals |cut(G, B) − cut(G′ , B)| for any k by Lemma
12.2, while the right side tends to q²δ□(G, G′) as k → ∞ by the definition of δ□.
Proof of Proposition 12.1. By Theorem 11.3, we have δ□(Gn, Gm) → 0
as n, m → ∞; by Lemma 12.3, this implies that the sequence of numbers cut(Gn , B)
is a Cauchy sequence; by (12.1), it follows that the values ent(Gn , H) form a Cauchy
sequence.
It would be a natural idea here to define convergence of a graph sequence
in terms of the convergence of the homomorphism entropies ent(Gn , H). How-
ever, this notion of convergence would not be equivalent to left-convergence, and it
would allow sequences that we would not like to consider “convergent”, as Example
12.4 below shows. (Some suspicion could have been raised by (12.1) already: the
nodeweights of H disappeared, which indicated loss of information.)
Example 12.4. Let (Fn ) be a quasirandom graph sequence with edge density p,
and let (Gn ) be a quasirandom graph sequence of density 2p, where (to keep the
notation simple), we assume that v(Fn ) = v(Gn ) = n. Then we have, for every
weighted graph H with positive edgeweights,
ent(Fn, H) = (1/n²) max_{(S1,...,Sq)∈Πn} ∑_{i,j∈[q]} Bij eFn(Si, Sj) + O(1/n)

   = max_{(S1,...,Sq)∈Πn} ∑_{i,j∈[q]} Bij ( p (|Si|/n)(|Sj|/n) + o(1) ) + O(1/n)

   = p max{xᵀBx : x ∈ ℝ₊^q, xᵀ1 = 1} + o(1).
Applying the same computation to Gn , we get that for the graphs G2n (disjoint
union of two copies of Gn ), we have
ent(G2n, H) = log hom(G2n, H)/(2n)² = log(hom(Gn, H)²)/(4n²) = log hom(Gn, H)/(2n²)

   = (1/2) ent(Gn, H) = p max{xᵀBx : x ∈ ℝ₊^q, xᵀ1 = 1} + o(1).
204 12. CONVERGENCE FROM THE RIGHT
So merging the sequences (Fn ) and (G2n ) we get a graph sequence for which the
quantities ent(Gn , H) converge for every H, but which is clearly not convergent
(check the triangle density!).
12.1.2. Typical homomorphisms. Let us try to take the nodeweights of H
into account. The values αφ = ∏_{v∈V(G)} α_{φ(v)} form a probability distribution on the
maps φ : [n] → [q], where (by the Law of Large Numbers) we have |φ⁻¹(i)| ≈ αi n
with high probability, if n is large. However, this information becomes irrelevant
as n → ∞, and only the largest term will count rather than the “typical”. It turns
out that it is often advantageous to restrict ourselves to maps that are “typical”,
by forcing φ to divide the nodes in the given proportions. Let Π(n, α) denote the
set of partitions (V1, . . . , Vq) of [n] into q parts with ⌊αi n⌋ ≤ |Vi| ≤ ⌈αi n⌉, and
consider the set of maps

Φ(n, α) = { φ ∈ [q]ⁿ : ⌊αi n⌋ ≤ |φ⁻¹(i)| ≤ ⌈αi n⌉ for all i ∈ [q] }.
(We could be less restrictive and allow, say, |φ⁻¹(i)| ∈ [αi n − √n, αi n + √n]. This
would not change the considerations below in any significant way.)
We define a modified homomorphism number by summing only over the “typ-
ical” homomorphisms:

hom*(G, H) = ∑_{φ∈Φ(n,α)} αφ ∏_{uv∈E(G)} β_{φ(u)φ(v)}.
Exercise 12.5. Show that if αi = 1/q for all i and n < q, then hom∗ (G, H) =
tinj (G, H).
Exercise 12.6. Let (Gn) be a quasirandom sequence with edge density p, and
let F be a simple graph such that 2e(F[S]) ≤ q|S|² for every subset S ⊆ V(F).
Prove that cut(Gn, F) ≤ pq + o(1) (n → ∞).
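For very small graphs, hom* can be computed directly from the definition. The following brute-force sketch (helper names ours; feasible only for tiny n and q, since all maps are enumerated) restricts the sum to the typical maps Φ(n, α):

```python
from itertools import product
from math import floor, ceil

def hom_star(A, alpha, beta):
    """hom*(G, H): sum of alpha_phi * prod_{uv in E} beta_{phi(u)phi(v)},
    restricted to maps phi with floor(alpha_i n) <= |phi^-1(i)| <= ceil(alpha_i n)."""
    n, q = len(A), len(alpha)
    total = 0.0
    for phi in product(range(q), repeat=n):
        sizes = [phi.count(i) for i in range(q)]
        if not all(floor(alpha[i] * n) <= sizes[i] <= ceil(alpha[i] * n)
                   for i in range(q)):
            continue
        w = 1.0
        for i in range(q):
            w *= alpha[i] ** sizes[i]       # alpha_phi = prod_v alpha_{phi(v)}
        for u in range(n):
            for v in range(u + 1, n):
                if A[u][v]:
                    w *= beta[phi[u]][phi[v]]
        total += w
    return total

# Triangle G = K3 mapped into a weighted graph H on two nodes.
A = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
alpha = [0.5, 0.5]
beta = [[2.0, 0.5], [0.5, 1.0]]
value = hom_star(A, alpha, beta)   # only the 6 maps with class sizes (2,1) or (1,2) count
```

Here the two constant maps are excluded as atypical; in the full hom(G, H) they would dominate, which is exactly the distortion the restriction is designed to remove.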
This notion does not quite extend the maximum restricted weighted multicut of
graphs G; the reason is that in a graph, we cannot partition the set of nodes in
exactly the desired proportions. But the difference is small; we will come back to
this question in Section 12.4.1.
We can generalize even further and define, for two kernels U and W,

C(U, W) = sup_{φ∈S[0,1]} ⟨U, W^φ⟩ = sup_{φ∈S[0,1]} ∫_{[0,1]²} U(x, y)W(φ(x), φ(y)) dx dy.
It is easy to see that this extends the definition of maximum restricted weighted
multicuts in the sense that if U is any graphon and H is a weighted graph, then
(12.6) C(U, H) = C(U, WH ).
The functional C(U, W), which we call the overlay functional, has many good
properties. It follows just like the similar statement for norms in Theorem 8.13 that

(12.7)   C(U, W) = sup_{φ∈S[0,1]} ⟨U, W^φ⟩ = sup_{φ∈S[0,1]} ⟨U^φ, W⟩ = sup_{φ,ψ∈S[0,1]} ⟨U^φ, W^ψ⟩

   = sup{ ⟨U0, W0⟩ : (∃φ, ψ ∈ S[0,1]) U = U0^φ, W = W0^ψ }.

Hence it follows that the overlay functional is invariant under measure preserving
transformations of the kernels, i.e., it is a functional on the space W̃0 × W̃0. It is also
immediate from the definition that this quantity has the (somewhat unexpected)
symmetry property C(U, W ) = C(W, U ), and satisfies the inequalities
(12.8) ⟨U, W ⟩ ≤ C(U, W ) ≤ ∥U ∥2 ∥W ∥2 , C(U, W ) ≤ ∥U ∥∞ ∥W ∥1 .
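On stepfunction kernels with finitely many equal steps, the supremum over measure preserving maps can be imitated by maximizing over relabelings of the steps. The sketch below (our own discrete stand-in, not the functional itself) exhibits the symmetry C(U, W) = C(W, U) and the bounds in (12.8):

```python
from itertools import permutations

def overlay(U, W):
    """Discrete stand-in for C(U, W): both kernels are n x n step functions
    and we maximize <U, W^phi> over the n! relabelings of the steps."""
    n = len(U)
    return max(
        sum(U[i][j] * W[s[i]][s[j]] for i in range(n) for j in range(n)) / n ** 2
        for s in permutations(range(n))
    )

U = [[0.9, 0.1, 0.2], [0.1, 0.8, 0.3], [0.2, 0.3, 0.1]]
W = [[0.2, 0.5, 0.4], [0.5, 0.7, 0.6], [0.4, 0.6, 0.9]]

c = overlay(U, W)
inner = sum(U[i][j] * W[i][j] for i in range(3) for j in range(3)) / 9  # <U, W>
norm2 = lambda M: (sum(x * x for row in M for x in row) / 9) ** 0.5     # L2 norm
```

The identity relabeling gives ⟨U, W⟩ ≤ C(U, W), and Cauchy–Schwarz applied to each relabeling gives the upper bound, mirroring (12.8).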
This suggests that C(., .) behaves like some kind of inner product. This analogy
is further supported by the following identity, reminiscent of the cosine theorem,
relating it to the distance δ2 derived from the L2 -norm:
(12.9)   C(U, W) = (1/2)( ∥U∥₂² + ∥W∥₂² − δ₂(U, W)² )

   = (1/2)( δ₂(U, 0)² + δ₂(W, 0)² − δ₂(U, W)² ).
Indeed,

δ₂(U, W)² = inf_{φ∈S[0,1]} ∥U − W^φ∥₂² = ∥U∥₂² + ∥W∥₂² − 2 sup_{φ∈S[0,1]} ⟨U, W^φ⟩.
Now if W is an arbitrary kernel, then for every ε > 0 we can find a stepfunction W ′
such that ∥W −W ′ ∥1 ≤ ε/2. We know that C(Un , W ′ ) → 0, and hence C(Un , W ′ ) ≤
ε/2 if n is large enough. But then
C(Un , W ) ≤ C(Un , W − W ′ ) + C(Un , W ′ ) ≤ ∥Un ∥∞ ∥W − W ′ ∥1 + ε/2 ≤ ε.
This shows that lim supn C(Un , W ) ≤ 0, and completes the proof.
where φ is measurable, but not necessarily measure preserving. Prove the formulas

maxcut(G) = C*(WG, WK₂),

and

∥U∥□ ≈ C*(U, 1(x, y ≤ 1/2)),

where the ≈ sign means equality up to a factor of 2.
Here

∫_{Si×Sj} W − ∫_{Si′×Sj′} W ≤ λ( (Si × Sj) △ (Si′ × Sj′) )
that d□(U/P, W/P) ≤ ∥U − W∥□ for any q-partition P of [0, 1]. By the definition
of Hausdorff distance, this implies that

d□^Haus( Qq(U), Qq(W) ) ≤ ∥U − W∥□ = δ□(U, W).
Lemma 12.11. For any two graphons U and W and any integer q ≥ 1, we have

d□^Haus( Qq(U), Qq(W) ) ≤ sup_a d□^Haus( Qa(U), Qa(W) ) ≤ 4 d□^Haus( Qq(U), Qq(W) )

(where a ranges over all probability distributions on [q]).
Proof. The first inequality is easy: let H ∈ Qq(U); then H ∈ Qb(U) for the
distribution b = α(H). Hence

d□( H, Qq(W) ) ≤ d□( H, Qb(W) ) ≤ d□^Haus( Qb(U), Qb(W) ) ≤ sup_a d□^Haus( Qa(U), Qa(W) ).

Since this holds for every H ∈ Qq(U), and analogously for every graph in Qq(W),
the inequality follows by the definition of the Hausdorff distance.
To prove the second inequality, let a be any probability distribution on [q]
and H ∈ Qa(U). For every ε > 0, there is a quotient L ∈ Qq(W) such that
d□(H, L) ≤ d□^Haus( Qq(U), Qq(W) ) + ε. Let L ∈ Qb(W); then ∥a − b∥₁ ≤ d□(H, L)
by the definition of d□(H, L). By Lemma 12.9, there is a quotient L′ ∈ Qa(W)
such that d□(L, L′) ≤ d₁(L, L′) ≤ 3∥a − b∥₁ + ε ≤ 3d□(H, L) + ε. Thus

d□(H, L′) ≤ d□(H, L) + d□(L, L′) ≤ 4d□(H, L) + ε ≤ 4d□^Haus( Qq(U), Qq(W) ) + 5ε.
Since ε was arbitrary, this proves the lemma.
12.3.2. Graphon convergence from the right. After this preparation, we
are ready to characterize convergence of a graphon sequence in terms of homomor-
phisms into fixed weighted graphs.
Theorem 12.12. For any sequence (Wn) of graphons, the following are equivalent:
(i) the sequence (Wn) is convergent in the cut distance δ□;
(ii) the overlay functional values C(Wn, U) are convergent for every kernel U;
(iii) the restricted multicut densities C(Wn, H) are convergent for every simple
graph H;
(iv) the quotient sets Qq(Wn) form a Cauchy sequence in the Hausdorff metric
d□^Haus for every q ≥ 1.
It follows from conditions (ii) and (iii) that it would be equivalent to assume
the convergence of the sequence C(Wn , H) for every weighted graph H. Lemma
12.11 implies that we could require in (iv) the convergence of Qa (Wn ) for every
q ≥ 1 and probability distribution a on [q]. In fact, it would be enough to require
this for the uniform distribution (see Exercise 12.24). In (iv), we could use the
d₁^Haus Hausdorff metric as well.
Proof. (i)⇒(ii) by Lemma 12.7. (ii)⇒(iii) is trivial. (i)⇒(iv) by Lemma 12.10.
(iii)⇒(i): Let (Wn ) be a sequence of graphons that is not convergent in the
cut distance. By the compactness of the graphon space, it has two subsequences
(Wni ) and (Wmi ) converging to different unlabeled graphons W and W ′ . There is
a graphon U such that C(W, U) ≠ C(W′, U); in fact, (12.9) implies

( C(W′, W′) − C(W′, W) ) + ( C(W, W) − C(W′, W) ) = δ₂(W′, W)² > 0,
≤ (1/q²) ∑_{i,j} |βij(Ln) − βij(L′)| = d₁(Ln, L′) ≤ q² d□(Ln, L′)

   ≤ q² d□^Haus( Qa(Wn), Qa(Wm) ).

By Lemma 12.11, we have

d□^Haus( Qa(Wn), Qa(Wm) ) ≤ 4 d□^Haus( Qq(Wn), Qq(Wm) ),

which tends to 0 as n, m → ∞ by hypothesis. This implies that

lim sup_{n,m→∞} ( C(Wn, H) − C(Wm, H) ) ≤ 0.

Since a similar conclusion holds with n and m interchanged, we get that
( C(Wn, H) : n = 1, 2, . . . ) is a Cauchy sequence.
Some of the arguments in the proof of Theorem 12.12, most notably the proof
of (iii)⇒(i), were not effective. One can in fact prove explicit inequalities between
the different distance measures that occur. We refer to Borgs, Chayes, Lovász, Sós
and Vesztergombi [2012] for the details.
Exercise 12.13. Show by an example that the set Qq (W ) is not closed in general,
but it is closed if W is a stepfunction.
Exercise 12.14. Show by an example that Qa (W ) is not convex in general, even
if W is a stepfunction.
Exercise 12.15. A fractional partition of [0, 1] into q parts is an ordered q-tuple
of measurable functions ρ1, . . . , ρq : [0, 1] → [0, 1] such that for all x ∈ [0, 1], we
have ρ1(x) + · · · + ρq(x) = 1. For a fractional partition ρ of [0, 1] and a kernel
W ∈ W, we define the fractional quotient graph W/ρ as a weighted graph on [q]
with αi(W/ρ) = ∥ρi∥₁ and

βij(W/ρ) = (1/(∥ρi∥₁ ∥ρj∥₁)) ∫_{[0,1]²} ρi(x)ρj(y)W(x, y) dx dy.
Exercise 12.16. Let ρ be a fractional q-partition of [0, 1]. Prove that W/ρ ∈
Qq(W). Also prove that every weighted graph in Qq(W) can be represented this
way.
12.4.1. Restricted quotients. Let G be a simple graph with nodeset [n] and
let P = (S1 , . . . , Sq ) be a partition of [n]. We consider the quotient graph G/P as
a weighted graph on [q], with node weights αi (G/P) = |Si |/n (i ∈ [q]), and edge
weights βij (G/P) = eG (Si , Sj )/|Si ||Sj | (i, j ∈ [q]).
The set of all weighted graphs G/P, where P ranges over all q-partitions of [n],
will be called the quotient set of G (of size q), and will be denoted by Qq (G). For
a graph G and a probability distribution a on [q], to define the restricted quotient
set Qa (G), we have to allow the relative sizes of the partition classes to deviate a
little from the prescribed values a: we consider the set of quotients G/P, where
P ∈ Π(n, a).
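Computing a quotient graph is straightforward; a small sketch (helper name ours; we read eG(Si, Sj) as the number of adjacent ordered pairs, which for i ≠ j is just the number of edges between the classes):

```python
def quotient(A, parts):
    """Quotient graph G/P: nodeweights alpha_i = |S_i|/n and edgeweights
    beta_ij = e_G(S_i, S_j) / (|S_i| |S_j|), with e_G counted over adjacent
    ordered pairs (so edges inside a class contribute twice)."""
    n = len(A)
    alpha = [len(S) / n for S in parts]
    beta = [[sum(A[u][v] for u in Si for v in Sj) / (len(Si) * len(Sj))
             for Sj in parts] for Si in parts]
    return alpha, beta

# 4-cycle 0-1-2-3-0, split into the two sides of its bipartition.
A = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]]
alpha, beta = quotient(A, [[0, 2], [1, 3]])
# alpha == [0.5, 0.5]; beta[0][1] == 1.0 (all pairs adjacent); beta[0][0] == 0.0
```

The quotient of the 4-cycle by its bipartition is thus (up to nodeweights) a single edge with density 1 between the two classes, the small "template" that the cut distance compares.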
Quotient sets can be used to express multicut functions. For every weighted
graph H,

(12.15)   cut(G, H) = max_{L∈Qa(G)} ∑_{i,j∈[q]} αi(L)αj(L)βij(L)βij(H).

Note that the nodeweights of H and L are not the same in general, but almost:
|αi(L) − αi(H)| ≤ 1/n.
Remark 12.17. The quotient sets are in a sense dual to the (multi)sets of induced
subgraphs of a given size, which was one of the equivalent ways of describing what
we could see by sampling. Instead of gaining information about a large graph by
taking a small subgraph, we take a small quotient.
However, there are substantial differences. On the set of induced subgraphs of a
given size, we had a probability distribution, which carried the relevant information.
We can also introduce a probability distribution on quotients of a given size of a
graph G, by taking a random partition. This would be quite relevant to statistical
physics, but we would run into difficulties when tending to infinity with the size
of G. The probability distributions would concentrate more and more on boring
average quotients, while the real information would be contained in the outliers.
To be more specific, a random induced subgraph (of a fixed, but sufficiently large
size) approximates the original graph well, but a random quotient does not carry
this information. In other words, it is the set of quotients that characterizes the
convergence of a graph sequence, and not the distribution on it.
In the special case when every value ρi(u) is 0 or 1, the supports of the
functions ρi form a partition P, and G/P = G/ρ.
We also introduce fractional quotient sets, replacing partitions by fractional
partitions. The set of all fractional q-quotients of a graph G is denoted by Q∗q (G),
and the set of all fractional q-quotients G/ρ for which α(G/ρ) is a fixed distribution
a on [q], by Q∗a (G).
12.4.3. Relations between quotient sets. Our goal is to use quotient sets
to characterize convergence of a graph sequence. But before doing so, we have to
formulate and prove a number of rather technical relationships between different
quotient sets.
For any simple graph G and positive integer q, we have two quotient sets: the
set Qq (G) of quotients G/P, and the set Q∗q (G) of fractional quotients G/ρ. In
addition, we have the restricted versions Qa of both of these. The quotient sets
Qq (WG ) and Qa (WG ) will also come up; but it is easy to see that these are just
the same as Q∗q (G) and Q∗a (G).
Turning to the quotient set Qq (G) (which is of course the most relevant from
the combinatorial point of view), it follows immediately from the definition that
Qq (G) ⊆ Q∗q (G). Note, however, that Qa (G) and Q∗a (G) are in general not com-
parable. The first set is finite, the second is typically infinite. On the other hand,
Qa (G) contains graphs whose nodeweight vector is only approximately equal to a,
and so it is not contained in Q∗a (G).
In the rest of this section we are going to prove that the “true” quotient sets
and their fractional versions are not too different, at least if the graph is large. We
will need the following version of Lemma 12.9, which can be proved along the same
lines.
Lemma 12.18. For any simple graph G and any two probability distributions a, a′
on [q], we have d₁^Haus( Qa(G), Qa′(G) ) ≤ 3∥a − a′∥₁.
The two kinds of quotient sets of the same graph are related by the following
proposition.
12.4. RIGHT-CONVERGENT GRAPH SEQUENCES 213
Proposition 12.19. For every simple graph G on [n], integer q ≥ 1, and probability
distribution a on [q],
d₁^Haus( Qq(G), Q∗q(G) ) ≤ 4q/√n   and   d₁^Haus( Qa(G), Q∗a(G) ) ≤ 16q/√n.
Proof. We start with the first inequality, whose proof gets somewhat technical.
Since Qq(G) ⊆ Q∗q(G), it suffices to prove that if H is a fractional q-quotient of G,
then there exists a q-quotient L of G such that d₁(H, L) ≤ 4q/√n. We may assume
that q ≥ 2 and n > 9q² ≥ 36 (else, the assertion is trivial).
Let ρ ∈ Π∗(n, α) be a fractional partition such that G/ρ = H. We want to
“round” the values ρi(u) to ri(u) ∈ {0, 1} so that we get an integer partition in
Π(n, α) with “almost” the same quotient. Let A denote the adjacency matrix of G,
and define Fij(r) = ∑_{u,v∈[n]} Auv ri(u)rj(v); then we want

(12.16)   ∑_i ri(u) = 1,   ⌊αi n⌋ ≤ ∑_u ri(u) ≤ ⌈αi n⌉,   Fij(r) ≈ Fij(ρ)
and hence

∑_i E(Xi²) = ∑_i Var(Xi) < n.
Furthermore,

(12.17)   Var(Yij) = Var( Fij(R) ) = ∑_{u,v,u′,v′∈[n]} Auv Au′v′ cov( Ri(u)Rj(v), Ri(u′)Rj(v′) ).
Each covariance in this sum depends on which of u, v, u′, v′ and also which of i and
j are equal, but each case is easy to treat. The covariance term is 0 if the edges uv
and u′v′ are disjoint. If i ≠ j, we get:

ρi(u)ρj(v) − ρi(u)²ρj(v)² < ρi(u)ρj(v),   if u = u′ and v = v′,
−ρi(u)ρi(v)ρj(u)ρj(v) < 0,   if u = v′, v = u′,
ρj(v)ρj(v′)( ρi(u) − ρi(u)² ) < ρi(u)ρj(v)ρj(v′),   if u = u′ and v ≠ v′,
−ρi(u)ρj(v)ρi(u′)ρj(u) < 0,   if u = v′, u′ ≠ v.
(The other possibilities are covered by symmetry.) Summing over u, v, u′, v′, we get
that the sum in (12.17) is at most (αi n)(αj n) + 0 + (αi n)(αj n)² + (αi n)²(αj n) + 0.
The case when i = j can be treated similarly, and we get that the sum in (12.17)
is at most 2(αi n)² + 4(αi n)³. Hence, summing over all i and j, we get

∑_{i,j} E(Yij²) = ∑_{i,j} Var(Yij) ≤ n² + (n² + 2n³) ∑_i αi² + 4n³ ∑_i αi³ ≤ 6n³ + 2n².
By Cauchy–Schwarz,

d₁(H, L)² = ( ∑_i |Xi|/n + ∑_{i,j} |Yij|/n² )² ≤ (q² + q)( (1/n²) ∑_i Xi² + (1/n⁴) ∑_{i,j} Yij² ),

and so

E( d₁(H, L)² ) ≤ (q² + q)( 1/n + 6/n + 2/n² ) < 16q²/n.

Hence with positive probability, d₁(H, L) ≤ 4q/√n.
The second inequality in the proposition is quite easy to prove now, except
that we cannot use containment in either direction, and so we have to prove two
“almost containments”. Let H ∈ Qa (G), and let b = α(H); then ∥a − b∥1 ≤ q/n
and H ∈ Q∗b (G). By Lemma 12.18, there is an L ∈ Q∗a (G) such that d1 (H, L) ≤
3∥a − b∥1 ≤ 3q/n < 16q/√n.
Conversely, let H ∈ Q∗a (G); then by part (a), there exists a q-quotient H′ ∈
Qq (G) such that d1 (H, H′) ≤ 4q/√n. Lemma 12.18 implies that there exists an
L ∈ Qa (G) such that

    d1 (L, H′) ≤ 3∥a − α(H′)∥1 = 3∥α(H) − α(H′)∥1 ≤ 3d1 (H, H′),

and so d1 (L, H) ≤ d1 (L, H′) + d1 (H′, H) ≤ 4d1 (H, H′) ≤ 16q/√n.
12.4.4. Right-convergent graph sequences. In a sense, right-convergence
of a graph sequence is a special case of right-convergence of a graphon sequence.
However, quantities like multicuts associated with a graphon of the form WG are
only approximations of the analogous combinatorial quantities associated with the
corresponding graph G. (This is in contrast with the homomorphism densities
from the left, recall e.g. (7.2).) In this section we prove that these approximations
are good enough for the equivalent characterizations of convergence of graphon
sequences to carry over to graph sequences.
We prove the following characterization of convergence of a dense graph se-
quence, analogous to the characterization of convergence of a graphon sequence
given in Theorem 12.12.
Theorem 12.20. Let (Gn ) be a sequence of simple graphs such that v(Gn ) → ∞
as n → ∞. Then the following are equivalent:
(i) the sequence (Gn ) is convergent;
(ii) the overlay functional values C(WGn , U ) are convergent for every kernel U ;
(iii) the restricted multicut densities cut(Gn , H) are convergent for every simple
graph H;
(iv) the quotient sets Qq (Gn ) are Cauchy in the Hausdorff metric for every
q ≥ 1.
Clearly, conditions (ii) and (iii) are also equivalent to the convergence of
cut(Gn , H) for every weighted graph H. By (12.5), this is equivalent to the
convergence of typical homomorphism entropies ent∗ (Gn , J) for every weighted
graph J with positive edgeweights. By our discussion in Section 2.2, we could talk about
Exercise 12.23. Prove that maxH |C(U, H) − C(W, H)|, where the maximum is
taken over all weighted graphs H on [q] with nodeweight vector a and edgeweights
in [−1, 1], is equal to the Hausdorff distance d1^Haus(conv(Qa (U)), conv(Qa (W))).
Exercise 12.24. Prove that a sequence (Wn ) of graphons is convergent if and
only if the quotient sets Qu (Wn ) are convergent in the Hausdorff metric for every
q ≥ 1, where u is the uniform distribution on [q].
Exercise 12.25. Let (Gn ) be a quasirandom graph sequence with edge density
1/2, such that v(Gn ) = kn is a sufficiently fast increasing sequence of integers.
Define W (x, y) = 1 + ∑_{n=1}^∞ WG2n (k2n x, k2n y) (where WGn (x, y) = 0 if x ∉ [0, 1]
or y ∉ [0, 1]). Prove that the kernel W is 1-2 valued, and

    log t(G2n , W ) / k2n² = 1/2 + o(1),    log t(G2n+1 , W ) / k2n+1² = 1/4 + o(1).
CHAPTER 13

On the Structure of Graphons
At this point, there are some rather technical questions to address. Do we want
to assume that J is a standard probability space (see Appendix A.3)? Do we want
to consider A as a complete sigma-algebra with respect to the probability measure
(like Lebesgue measurable sets in [0, 1]), or to the contrary, do we want to assume
that it is countably generated (like Borel sets in [0, 1])?
It was shown by Borgs, Chayes and Lovász [2010] that a kernel on an arbitrary
probability space can be transformed, by very simple steps, into a kernel on a stan-
dard probability space, which is equivalent for all practical purposes (in particular,
weakly isomorphic). The steps of such a transformation are described in Exer-
cises 13.11–13.13 below. This implies that we can work with standard probability
spaces whenever necessary (or just convenient). We call a kernel standard, if the
underlying probability space is standard.
From the point of view of subgraph densities, multicuts, etc. the underlying
space does not matter much, as we shall see; but choosing the underlying probability
space appropriately may lead to a simpler form for the function W and to simpler
computations. If the space J is finite, we get just a weighted graph with normalized
nodeweights. Let us see a number of further examples where allowing this more
general form is very useful (and so Example 11.41 was not an isolated occurrence).
Example 13.2. Fix some d ≥ 2, and let Vn be a set of n unit vectors in Rd , chosen
independently from the uniform distribution on the unit sphere. Connect two
elements x, y ∈ Vn by an edge if and only if xT y ≥ 0, to get a graph Gn = (Vn , En ).
The sequence (Gn : n = 1, 2, . . . ) is convergent, and its limit is the graphon whose
underlying set is S d−1 , with the uniform distribution, and W (x, y) = 1(xT y ≥ 0).
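This example is easy to probe numerically (a sketch; the sampling helper is mine, not from the book): for independent uniform points on the sphere, P(xᵀy ≥ 0) = 1/2, so the edge density of Gn should concentrate around 1/2.

```python
import math, random

def random_unit_vector(d, rng):
    # Normalized Gaussian vectors are uniform on the sphere S^{d-1}.
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def sphere_graph_edge_density(n, d, seed=0):
    """Sample G_n of Example 13.2 and return its edge density."""
    rng = random.Random(seed)
    pts = [random_unit_vector(d, rng) for _ in range(n)]
    edges = sum(1 for i in range(n) for j in range(i + 1, n)
                if sum(a * b for a, b in zip(pts[i], pts[j])) >= 0)
    return edges / (n * (n - 1) / 2)
```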
13.1.1. Atomfree and twin-free kernels. There are still ways to further
simplify a kernel (J, W ) on a standard probability space (Ω, A, π). One possibility
is to get rid of the atoms by a procedure generalizing the construction of the kernel
WH from a weighted graph H, by assigning to each atom a an interval Ia of length
π(a), and an interval I to the atom-free part of Ω, so that these intervals partition
[0, 1]. This defines a measure preserving map ψ : [0, 1] → Ω, and the pullback W ψ
will define a kernel on [0, 1] that is weakly isomorphic to (J, W ).
This procedure takes us to a very familiar domain (two-variable real functions),
but the kernel on [0, 1] is still not uniquely determined by its weak isomorphism
class, as we have seen in Example 7.11. To really standardize a kernel, we go the
opposite way, by creating and merging atoms as much as we can. To be more
precise, we need some definitions.
for all A, B ∈ A. (Note that this holds for A, B ∈ A′ by the definition of conditional
probability.) Consider the functions

    UA = ∫_A W (., y) dπ(y),    gA = E(1A | A′),    VA = ∫_Ω W (., y)gA (y) dπ(y).

These functions are A′-measurable by the definition of twins. Furthermore, gA
is the orthogonal projection of 1A into the space of A′-measurable functions,
gAB (x, y) = gA (x)gB (y) is the orthogonal projection of 1A×B into the space of
A′ × A′-measurable functions, and by definition, W ′ is the orthogonal projection
of W into this space. Using these observations, we have

    ∫_{A×B} W dπ × dπ = ⟨1B , UA ⟩ = ⟨gB , UA ⟩ = ⟨VB , 1A ⟩ = ⟨VB , gA ⟩
        = ⟨W, gAB ⟩ = ⟨W ′, gAB ⟩ = ⟨W ′, 1A×B ⟩ = ∫_{A×B} W ′ dπ × dπ.
Clearly SR,r ∈ A′. Furthermore, if x and x′ are not twins, then W (x, .) and
W (x′, .) differ on a set of positive measure, and so there is a set R ∈ R such that
∫_R W (x, y) dπ(y) ≠ ∫_R W (x′, y) dπ(y). Assume that (say) ∫_R W (x, .) > ∫_R W (x′, .);
then for any rational number r with ∫_R W (x, .) > r > ∫_R W (x′, .) we have x ∈ SR,r
but x′ ∉ SR,r . So the countable family of sets SR,r separates any two points of Ω
that are not twins. It follows that the sets φ(SR,r ) ∈ A1 separate any two points
of Ω1 .
Claim 13.5 and Proposition A.4 in the Appendix complete the proof.
Exercise 13.6. Show that for the set J and function W in Example 13.1, several
different measures on J can yield the same—isomorphic—graphons.
Exercise 13.7. Consider two graphons U = (Ω, A, π, W ) and U ′ = (Ω, A, π ′ , W )
which only differ in their probability measures. Prove that δ1 (U, U ′ ) ≤
2dvar (π, π ′ ). [Hint: use Exercise 8.15.]
Exercise 13.8. Suppose that a graphon (Ω, A, π, W ) is defined on a metric space
(Ω, d), where A is the set of Borel sets, π is atom-free, and W is almost everywhere
continuous. Suppose that the sequence Sn ⊆ J is well distributed in the sense
that |Sn ∩ U |/|Sn | → π(U ) for every open set U . Then t(F, G(Sn , W )) → t(F, W )
for every simple graph F with probability 1.
This function is defined for almost all pairs x, y; we can delete those points from
J where W (x, .) ∉ L1 (J) (a set of measure 0), to have rW defined on all pairs.
It is clear that rW is a pseudometric (it is symmetric and satisfies the triangle
inequality). We call rW the neighborhood distance on W .
Example 13.15 (Stepfunctions). For stepfunctions, the underlying metric space
is finite.
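Since rW (x, y) = ∫_J |W (x, z) − W (y, z)| dπ(z), for a stepfunction the distance depends only on the steps containing x and y, so the metric space collapses to finitely many points. A minimal sketch (my own toy 2-step example, not from the book):

```python
def step_neighborhood_distance(M, pi, i, j):
    """Neighborhood distance between a point in step i and a point in
    step j of a stepfunction with step values M[a][b] and step measures
    pi[a]:  rW = sum_k |M[i][k] - M[j][k]| * pi[k]."""
    return sum(abs(M[i][k] - M[j][k]) * pi[k] for k in range(len(pi)))
```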
Example 13.16 (Spherical distance). Let S d denote the unit sphere in Rd+1 ,
consider the uniform probability measure on it, and let W (x, y) = 1 if x · y ≥ 0
and W (x, y) = 0 otherwise. Then (S d , W ) is a graphon, in which the neighborhood
distance of two points a, b ∈ S d is just their spherical distance (normalized by
dividing by π).
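This can be checked by discretization on the circle S¹ (a numerical sketch of mine, with W as in the example): the neighborhoods of a and b are half-circles, so the measure of the set where the two indicators differ is the angle between a and b divided by π.

```python
import math

def circle_neighborhood_distance(alpha, beta, N=100000):
    """Approximate rW(a, b) on S^1 for W(x, y) = 1[x . y >= 0], where
    a, b sit at angles alpha, beta: the fraction of the circle on which
    exactly one of the two half-circle neighborhoods contains z."""
    diff = 0
    for k in range(N):
        theta = 2 * math.pi * (k + 0.5) / N   # uniform grid on the circle
        in_a = math.cos(theta - alpha) >= 0
        in_b = math.cos(theta - beta) >= 0
        diff += in_a != in_b
    return diff / N
```

The result agrees with the normalized angle ∠(a, b)/π up to the grid error.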
Example 13.17. Let (M, d) be a metric space, and let π be a Borel probability
measure on M . Then d can be viewed as a kernel on (M, d). For x, y ∈ M , we have

    rd (x, y) = ∫_M |d(x, z) − d(y, z)| dπ(z) ≤ ∫_M d(x, y) dπ(z) = d(x, y),

so the identity map (M, d) → (M, rd ) is contractive. This implies that if (M, d)
is compact, and/or finite dimensional (in many senses of dimension), then so is
(M, rd ). For most “everyday” metric spaces (like segments, spheres, or balls)
rd (x, y) can be bounded from below by Ω(d(x, y)), in which case (M, d) and (M, rd )
are homeomorphic.
More generally, if F : [0, 1] → R is a continuous function, then W (x, y) =
F (d(x, y)) defines a kernel, and the identity map (M, d) → (M, rW ) is continuous.
A kernel (J, W ) is pure if (J, rW ) is a complete separable metric space and the
probability measure has full support (i.e., every open set has positive measure).
13.3. PURE KERNELS 223
This definition includes that rW (x, y) is defined for all x, y ∈ J and rW (x, y) > 0 if
x ≠ y, i.e., the kernel has no twins.
Theorem 13.18. Every twin-free kernel is isomorphic, up to a null set, to a pure
kernel.
Proof. Let (J, W ) be a twin-free kernel. Let T be the set of functions f ∈
L1 (J) such that for every L1 -neighborhood U of f , the set {x ∈ J : W (x, .) ∈ U }
has positive measure. Clearly T is a closed subset of L1 (J), and it is complete and
separable in the L1 -metric. Let J ′ be the set of points in J for which W (x, .) ∈ T ,
and let T ′ = {W (x, .) : x ∈ J ′ }. The map φ : J ′ → T ′ defined by x ↦ W (x, .)
is bijective, since (J, W ) is twin-free. The set T inherits a probability measure
π ′ = π ◦ φ−1 from J. It is easy to see from the construction that J ′ and T ′ are
measurable.
We claim that
(13.3) π(J \ J ′ ) = 0.
It is clear that for almost all x ∈ J, W (x, .) ∈ L1 (J). Every function g ∈ L1 (J) \ T
has an open neighborhood Ug in L1 (J) such that π{x ∈ J : W (x, .) ∈ Ug } = 0.
Let U = ∪_{g∉T} Ug . Since L1 (J) is separable, U equals the union of some countable
subfamily {Ugi : i ∈ N}, and thus π{x ∈ J : W (x, .) ∈ U } = 0. Since J \ J ′ ⊆ U ,
this proves (13.3).
The functions W (x, .) (x ∈ J ′ ) are everywhere dense in T and have measure
1. So T is a complete separable metric space with a probability measure on its
Borel sets. It also follows from the definition of T that every open set has positive
measure, and (13.3) implies that π ′ (T \ T ′ ) = 0.
We define a kernel W ′ : T × T → [0, 1] as follows. Let f, g ∈ T . If f ∈ T ′, then
f = W (x, .) for some x ∈ J, and we define W ′(f, g) = g(x). Similarly, if g ∈ T ′,
then g = W (., y) and we define W ′(f, g) = f (y). Note that if both f, g ∈ T ′, then
this definition is consistent: W ′(f, g) = f (y) = g(x) = W (x, y). If f, g ∉ T ′, then
we define W ′(f, g) = 0. We note that f and g are determined up to a zero set only;
we can choose any function representing them, and since we are changing W on a
set of measure 0 only, it remains measurable.
The kernel (T, W ′) is pure; indeed, we just have to check that rW ′ coincides
with the L1 metric on T ; then T will have all the right properties. For f, g ∈ T , we
have

    rW ′ (f, g) = ∫_T |W ′(f, y) − W ′(g, y)| dπ′(y) = ∫_{T ′} |W ′(f, y) − W ′(g, y)| dπ′(y)
After all these changes, where W is left undefined is the set of “essential dis-
continuities” of W (of measure 0). It would be interesting to relate this set to
combinatorial properties of W .
for all x, x′ ∈ J^k, and hence

    |tx (F, W ) − tx′ (F, W )| ≤ ∑_{j=1}^m ∫_{J^{V \[k]}} |W (xuj , xvj ) − W (x′uj , x′vj )| dy.
By the assumption that vi is unlabeled, we have xvj = x′vj for every j, and so

    |tx (F, W ) − tx′ (F, W )| ≤ ∑_{j=1}^m ∫_{J^{V \[k]}} |W (xuj , xvj ) − W (x′uj , xvj )| dy
        ≤ ∑_{j=1}^m rW (xuj , x′uj ) ≤ e(F ) max_{u∈[k]} rW (xu , x′u ),
Corollary 13.20. Let (J, W ) be a pure kernel, and let F = (V, E) be a k-labeled
graph with nonadjacent labeled nodes. Then tx (F, W ) is a continuous function of
x ∈ J^k with respect to the metric rW .
In the case when F is a path of length 2, we get a corollary that will be
important in the next section.
Corollary 13.21. For every pure kernel (J, W ), W ◦ W is a continuous function
(in two variables) on the metric space (J, rW ).
Most applications of Corollary 13.20 use the following consequence:
Going to pure graphons is a good proof method, which can lead to nontrivial
results. We illustrate this by discussing properties of t(., W ) (W ∈ W) from the
point of view of Section 6.3.2. This parameter is multiplicative and reflection pos-
itive. If W is not a stepfunction, then t(., W ) has no contractor (else, 6.30 would
imply that it is a homomorphism function). On the other hand, it is contractible.
We prove this in a more general form.
Let F be a k-labeled multigraph, and let P = {S1 , . . . , Sm } be a partition of
[k]. We say that P is legitimate for F , if each set Si is stable in F . If this is the
case, then the m-labeled multigraph F/P (obtained by identifying the nodes in each
Si , and labeling the obtained node with i) has no loops. For a k-labeled quantum
graph g, we say that the partition P of [k] is legitimate for g if it is legitimate for
every constituent. Then we can define g/P by linear extension.
Proposition 13.23. Let g be a k-labeled quantum graph and P a legitimate
partition for g. Let W ∈ W, and suppose that tx (g, W ) = 0 almost everywhere on
[0, 1]k . Then ty (g/P, W ) = 0 for almost all y ∈ [0, 1]|P| .
Proof. We may assume that W is pure. If tx (g, W ) = 0 almost everywhere,
then it holds everywhere by Corollary 13.20. In particular, it holds for every sub-
stitution where the variables corresponding to the same class of P are identified,
which means that ty (g/P, W ) is identically 0 on [0, 1]|P| .
Corollary 13.24. The multigraph parameter t(., W ) is contractible for every kernel
W.
Exercise 13.25. Show that using properties of pure kernels, one gets a very short
proof of the statement of Exercise 7.6.
13.4.1. The similarity distance. It was noted by Lovász and Szegedy [2007,
2010b] that for a pure graphon (J, W ), the distance function rW = rW ◦W defined
by the operator square of W is also closely related to combinatorial properties of a
graphon. We call this the similarity distance. In the special case of finite graphs,
this notion was defined in the Introduction, where the motivation for its name was
also explained. We will use the graph version to design algorithms in Section 15.4.1.
In explicit terms, we have

    rW (a, b) = rW ◦W (a, b) = ∫_J | ∫_J W (a, y)W (y, x) dy − ∫_J W (b, y)W (y, x) dy | dx

(13.5)        = ∫_J | ∫_J W (x, y)(W (a, y) − W (b, y)) dy | dx.
(We write here and in the sequel dx instead of dπ(x), where π is the probability
measure of the graphon.)
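For a finite graph with adjacency matrix A (nodes carrying uniform weight 1/n), formula (13.5) becomes a double sum; a small sketch with my own example (a 4-cycle, where opposite nodes are twins and hence at similarity distance 0):

```python
def similarity_distance(A, a, b):
    """Similarity distance of nodes a, b in a graph with adjacency
    matrix A, following (13.5):
    (1/n) sum_x | (1/n) sum_y A[x][y] * (A[a][y] - A[b][y]) |."""
    n = len(A)
    total = 0.0
    for x in range(n):
        inner = sum(A[x][y] * (A[a][y] - A[b][y]) for y in range(n)) / n
        total += abs(inner)
    return total / n
```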
Lemma 13.26. If (J, W ) is a pure graphon, then the similarity distance rW is a
metric.
Proof. The only nontrivial part of this lemma is that rW (a, b) = 0 implies
that a = b. The condition rW (a, b) = 0 implies that for almost all x ∈ J we have

    ∫_J W (x, y)(W (a, y) − W (b, y)) dy = 0.
Using that (J, W ) is pure, Corollary 13.22 implies that this holds for every x ∈ J.
In particular, it holds for x = a and x = b. Substituting these values and taking
the difference, we get that

    ∫_J (W (a, y) − W (b, y))² dy = 0,

and hence W (a, y) = W (b, y) for almost all y. Using again that (J, W ) is pure, we
conclude that a = b.
for every measurable set A ⊆ J. We call this the weak topology on J. (We need this
name only temporarily, since we are going to show that rW gives a metrization of
the weak topology.) It is well known that this topology is metrizable.
Let J̄ denote the completion of J in the weak topology. The map x ↦ W (x, .)
embeds J into L1 (J), and weak convergence corresponds to weak* convergence of
functions in L1 (J). Hence J̄ corresponds to the weak* closure of J. It follows
in particular that J̄ is a compact separable metric space (compactness follows by
Alaoglu’s Theorem, since J̄ is a closed subset of the unit ball of L1 (J)).
Here the inner integral tends to 0 for every z, by the weak convergence xn → x.
Since it also remains bounded, it follows that the outer integral tends to 0. This
implies that xn → x in (J, rW ). (Let us note that since π(J̄ \ J) = 0, it does not
matter whether we integrate over J̄ or over J.)
From here, the equality of the two topologies follows by general arguments: the
weak topology on J is compact, and the coarser topology of rW is Hausdorff, which
implies that they are the same.
Corollary 13.28. For every pure graphon (J, W ), the space (J, rW ) is compact.
Another useful corollary of these considerations concerns continuity in the sim-
ilarity metric. We have seen that W ◦ W is continuous in the metric rW ; one might
hope that the function W is continuous as a function on (J, rW ), but this would be
too much to ask for (the half-graphon is an easy example). However, integrating
out one of the variables we get a continuous function. To be more precise (and
more general):
Corollary 13.29. For every pure graphon (J, W ), and every function g ∈ L1 (J),
the function

    (TW g)(.) = ∫_J W (., y)g(y) dy
is continuous on (J, rW ).
In particular, it follows that every eigenfunction of TW is continuous on (J, rW ).
Proof. Let xn → x in the rW metric. Then W (xn , .) → W (x, .) in the weak
topology by Theorem 13.27, and hence

    ∫_J W (xn , y)g(y) dy → ∫_J W (x, y)g(y) dy.
Example 13.30. For y ∈ [0, 1), let y = 0.y1 y2 . . . be the binary expansion of y. Let
us decompose [0, 1) into the intervals Ik = [1 − 2−k , 1 − 2−k−1 ). Define U (x, y) = yk
for 0 ≤ y ≤ 1 and x ∈ Ik . Define U (1, y) = 1/2 for all y. This function is not
symmetric, so we put it together with a reflected copy to get a graphon:

    W (x, y) = U (2x, 2y − 1)   if x ≤ 1/2 and y ≥ 1/2;
               U (2y, 2x − 1)   if x ≥ 1/2 and y ≤ 1/2;
               0                otherwise.
(This is rather difficult to parse, but it is an important example. Perhaps Figure
13.1 helps.) Selecting one point from each interval [1 − 2−k , 1 − 2−k−1 ), we get an
infinite number of points in [0, 1) mutually at rW -distance 1/4; so this sequence is
not convergent even in the completion of this graphon. (In particular, (J, rW ) is
not compact.) On the other hand, this same sequence converges in (J, rW ◦W ). So
the two topologies are different.
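The 1/4-separation in Example 13.30 can be checked numerically (a sketch of mine; which binary digit goes with which interval Ik is my reading of the example, and the claim is insensitive to that indexing, since any two distinct digits of a uniform y disagree with probability 1/2):

```python
def digit(y, m):
    """m-th binary digit of y in [0,1): y = 0.y1 y2 ... (m >= 1)."""
    return int(y * 2 ** m) % 2

def W(x, y):
    """The graphon of Example 13.30 (digit indexing is an assumption)."""
    def U(x, y):
        if not (0 <= x < 1 and 0 <= y < 1):
            return 0
        k = 0
        while x >= 1 - 2 ** -(k + 1):   # find k with x in I_k
            k += 1
        return digit(y, k + 1)
    if x < 0.5 <= y:
        return U(2 * x, 2 * y - 1)
    if y < 0.5 <= x:
        return U(2 * y, 2 * x - 1)
    return 0

def rW(a, b, N=4096):
    """Neighborhood distance approximated on a uniform midpoint grid."""
    ys = [(i + 0.5) / N for i in range(N)]
    return sum(abs(W(a, y) - W(b, y)) for y in ys) / N
```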
and hence

(13.9)    ∫_J F (x) dx ≤ 2ε.
By the Weak Regularity Lemma, we get that every graphon has an average
ε-net of size 2^{O(1/ε²)}. How about a “true” ε-net? By a standard trick, if we take a
maximal set R of points such that any two are at a distance at least ε, then every
point is at a distance of at most ε from R. But is such a set necessarily finite? And
if so, how can we bound its size?
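The “standard trick” is greedy packing: scan the points and keep any point at distance at least ε from everything kept so far; by maximality, every point is then within ε of the kept set, so the packing is also an ε-net. A sketch on a finite metric space (the example data is mine):

```python
def greedy_eps_net(points, dist, eps):
    """Greedily build a maximal eps-separated subset of `points`;
    by maximality it is also an eps-net."""
    net = []
    for p in points:
        if all(dist(p, q) >= eps for q in net):
            net.append(p)
    return net
```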
It turns out that one can give a bound that is similar to the bound on the size
of an average ε-net derived from Theorem 13.31. The following result is due to
Alon [unpublished].
Proposition 13.32. Let (J, W ) be a graphon and let R ⊆ J be a set such that
rW (s, t) ≥ ε for all s, t ∈ R (s ≠ t). Then |R| ≤ (16/ε²)^{257/ε²}.
The bound on the size of R is somewhat worse than for the average ε-net, but
the main point is that it depends on ε only. There are examples showing that an
exponential dependence on 1/ε is unavoidable (Exercise 13.41).
Here the last term is small by Lemma 8.10 and the choice of U :

    ∫_{J×J} σst (z)(W (s, y) − W (t, y))U (y, z) dy dz ≤ 2∥U ∥ ≤ ε/2.
    ∑_{i=1}^k ai ∫_{J×J} σst (z)(W (s, y) − W (t, y))1Si ×Ti (y, z) dy dz < ∑_{i=1}^k |ai| · ε/(4√k) ≤ ε/2.

(In the last step we used that ∑_i ai² ≤ 4 and the inequality between arithmetic and
quadratic means.) By (13.12) this implies that rW (s, t) < ε, a contradiction.
Proposition 13.34. A pure graphon (J, W ) misses some signed bipartite graph
with k nodes in the smaller bipartition class if and only if W is 0-1 valued almost
everywhere and the VC-dimension of the family of neighborhoods RW =
{supp(W (x, .)) : x ∈ J} is less than k.
Proof. First, suppose that (J, W ) misses a signed bipartite graph F with bi-
partition V1 ∪ V2 , where V1 = [k] and V2 = {1′, . . . , m′}. We start with showing
that W is 0-1 valued almost everywhere. Let F • be obtained by labeling all nodes
of V1 . Then for almost all x ∈ J k , we have tx (F •, W ) = 0. By Corollary 13.22, it
follows that tx (F •, W ) = 0 for every x ∈ J k . In particular, tz...z (F •, W ) = 0 for all
z ∈ J. But for this substitution,

    tz...z (F •, W ) = ∫_{J^m} ∏_{j=1}^m W (z, yj )^{d+(j)} (1 − W (z, yj ))^{d−(j)} dy1 . . . dym

(where d+(j) and d−(j) are the numbers of positive and negative edges of F incident
with j, respectively). If there is a z ∈ J such that 0 < W (y, z) < 1 for all y ∈ Y ,
where Y has positive measure, then the part of the integral over Y^m is already
positive, so tz...z (F •, W ) > 0, a contradiction.
Next, we show that the VC-dimension of RW is less than k. Suppose not;
then there is a set S = {x1 , . . . , xk } ⊆ J with |S| = k such that the family H =
{supp(W (x, .)) : x ∈ S} is qualitatively independent (this means that for every
H′ ⊆ H there is a point contained in all sets of H′ but in no set of H \ H′). This
implies that tx1 ...xk (F •, W ) > 0. By the purity of (J, W ) and Corollary 13.22,
the set of points (y1 , . . . , yk ) ∈ [0, 1]k for which ty1 ...yk (F •, W ) > 0 has positive
measure. Hence t(F, W ) > 0.
Conversely, suppose that W is 0-1 valued (we may assume everywhere), and
dimVC (RW ) < k. Let F denote the signed complete bipartite graph with k nodes
in one class U and 2^k nodes in the other class U ′, in which each node in U ′ is
connected to a different set of nodes in U by positive edges. Let F • be obtained by
labeling the nodes in U . Then any choice of x1 , . . . , xk for which tx1 ...xk (F •, W ) > 0
gives k points with qualitatively independent neighborhoods, which is impossible.
So we must have t(F, W ) = 0.
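The shattering arguments here can be made concrete by brute force on a finite family (a sketch with my own toy families: the nested “half-graph” neighborhoods have VC-dimension 1, while the full power set of a 3-element set has VC-dimension 3):

```python
from itertools import combinations

def vc_dimension(sets, universe):
    """Largest size of a subset of `universe` shattered by `sets`,
    computed by exhaustively checking traces."""
    best = 0
    for k in range(1, len(universe) + 1):
        found = False
        for S in combinations(universe, k):
            traces = {tuple(x in A for x in S) for A in sets}
            if len(traces) == 2 ** k:   # S is shattered
                found, best = True, k
                break
        if not found:   # shattering is monotone, so we can stop
            break
    return best
```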
Our main goal is to connect the VC-dimension of neighborhoods to the dimen-
sion of J. The following theorem was proved (in a slightly more general form) by
Lovász and Szegedy [2010b].
Theorem 13.35. If a pure graphon (J, W ) misses some signed bipartite graph F ,
then
(a) W is 0-1 valued almost everywhere,
(b) (J, rW ) is compact, and
(c) it has Minkowski dimension at most 10v(F ).
Proof. (a) is just repeated from Proposition 13.34. To prove (b), we start with
studying weakly convergent sequences of functions W (x, .). Let (x1 , x2 , . . . ) be a
sequence of points in J and suppose that there is a function f ∈ L1 (J) such that

    ∫_S W (xn , y) dy −→ ∫_S f (y) dy

for every measurable set S ⊆ J.
Claim 13.36. The weak limit function f is almost everywhere 0-1 valued.
Suppose not; then there is an ε > 0 and a set Y ⊆ J with positive measure such
that ε ≤ f (x) ≤ 1 − ε for x ∈ Y . Let Sn = supp(W (xn , .)) ∩ Y . We select, for every
k ≥ 1, k indices n1 , . . . nk so that the Boolean algebra generated by Sn1 , . . . Snk (as
subsets of Y ) has 2^k atoms of positive measure. If we have this for some k, then
for every atom A of the Boolean algebra

    λ(A ∩ Sn ) = ∫_A W (xn , y) dy −→ ∫_A f (y) dy    (n → ∞),

and so if n is large enough, then

    (ε/2) λ(A) ≤ λ(A ∩ Sn ) ≤ (1 − ε/2) λ(A).
But this means that the VC-dimension of the supports of the W (x, .) is infinite,
contradicting Proposition 13.34. This proves Claim 13.36.
Claim 13.37. The convergence W (xn , .) → f also holds in L1 .
Indeed, we know that f (x) ∈ {0, 1} for almost all x, and hence

    ∥f − W (xn , .)∥1 = ∫_{f =1} (1 − W (xn , y)) dy + ∫_{f =0} W (xn , y) dy −→ 0.
Now it is easy to prove that (J, rW ) is compact. Consider any infinite sequence
(x1 , x2 , . . . ) of points of J. By Alaoglu’s Theorem, this has a subsequence for which
the functions W (xn , .) converge weakly to a function f ∈ L1 (J). By Claim 13.37,
they converge to f in L1 . This implies that they form a Cauchy sequence in L1 ,
and so (x1 , x2 , . . . ) is a Cauchy sequence in (J, rW ). Since (J, rW ) is a complete
metric space, this sequence has a limit in J.
To prove (c), let F be a signed bipartite graph such that t(F, W ) = 0, and let
(V1 , V2 ) be a bipartition of F with |V1 | = k, where we may assume that k ≤ v(F )/2.
We may assume that F is complete bipartite, since adding edges (with any signs)
does not change the condition that t(F, W ) = 0. Let F • be obtained from F by
labeling the nodes in V1 .
We want to show that the Minkowski dimension of (J, rW ) is at most 20k. It
suffices to show that every finite set Z ⊆ J such that the rW -distance of any two
elements is at least ε is bounded by |Z| ≤ c(k)ε^{−20k}. Let H = {supp(W (x, .)) :
x ∈ Z}. Since W is 0-1 valued, the condition on Z means that

(13.13)    π(X△Y ) ≥ ε

for any two distinct sets X, Y ∈ H.
We do a little clean-up: Let A be the union of all atoms of the set algebra
generated by H that have measure 0. Clearly A itself has measure 0, and hence the
family H′ = {X \ A : X ∈ H} still has property (13.13).
We claim that H′ has VC-dimension less than k. Indeed, suppose that J \ A
contains a shattered k-set S. To each j ∈ V1 , we assign a point qj ∈ S bijectively.
To each i ∈ V2 , we assign a point pi ∈ Z such that qj ∈ supp(W (pi , .)) if and only if
ij ∈ E + . (This is possible since S is shattered.) Now fixing the pi , for each j there
is a subset of J of positive measure whose points are contained in exactly the same
members of H′ as qj , since qj ∉ A. This means that the function tx1 ...xk (F •, W ) is
positive for xi = pi . Corollary 13.22 implies that tx1 ...xk (F •, W ) > 0 for a positive
fraction of the choices of x1 , . . . xk ∈ J, and hence t(F, W ) > 0, a contradiction.
Applying Proposition A.30 we conclude that |Z| = |H| ≤ (80k)^{10k} ε^{−20k}. This
proves that the Minkowski dimension of (J, rW ) is bounded by 20k.
The results in this section do not remain true if the signed graph we exclude
is nonbipartite. For example, if we exclude any non-bipartite graph, then any
bipartite graph satisfies the condition, but some bipartite graphs are known to
need an exponential (in 1/ε) number of classes in their weak regularity partitions.
Exercise 13.38. Two metrics d1 and d2 on the same set are called uniformly
equivalent, if there is a function f : R+ → R+ such that f (x) ↘ 0 if x ↘ 0,
d1 (x, y) ≤ f (d2 (x, y)) and d2 (x, y) ≤ f (d1 (x, y)). Prove that for a pure kernel
(J, W ), the space (J, rW ) is compact if and only if the metrics rW and rW ◦W are
uniformly equivalent.
Exercise 13.39. Figure out the completions of the spaces ([0, 1], rW ) and
([0, 1], rW ◦W ) for the graphon W in Example 13.30.
Exercise 13.40. For the graphon (S d , W ) defined in Example 13.16, show that
the similarity distance of two points a, b ∈ S d is Ω(∠(a, b)/√d).
Exercise 13.41. Show that the graphon in the previous exercise, with an
appropriate choice of d, contains 2^{Ω(1/ε²)} points mutually at least ε apart in
the similarity distance.
Exercise 13.42. Let W be a graphon such that (J, rW ) can be covered by m
balls of radius ε. Prove that there exists a stepfunction U with m(1/ε)^m steps
such that ∥W − U ∥1 ≤ 2ε.
Exercise 13.43. Let M (ε) denote the minimum number of sets of diameter at
most ε covering a metric space (S, d), and define the covering dimension of (S, d)
by lim supε→0 (log M (ε))/(log(1/ε)). Prove that this is the same as the Minkowski
dimension.
Exercise 13.44. (a) Check that all graphons constructed in Section 11.4.2 are
at most 2-dimensional. (b) Prove that a graphon W on [0, 1] that is a continuous
function is at most 1-dimensional. (c) Find the dimension of the graphon in
Example 13.16. (d) Construct an infinite dimensional graphon.
Exercise 13.45. Let W be a graphon such that t(F, W ) = 0 for a signed bipartite
graph F = (V, E). Prove that for every 0 < ε < 1, there exists a 0-1 valued
stepfunction U with O(ε^{−10v(F )²}) steps such that ∥W − U ∥1 ≤ ε.
isometry of the compact metric space (J, rW ) (this is trivial), and those isometries
that correspond to automorphisms form a closed subgroup (this takes some work
to prove; see Lovász [Notes]).
We will not go into the detailed study of Aut(W ) in this book, even though it
has interesting and nontrivial properties. We restrict our treatment to generalizing
the easy direction of Theorem 6.36, and to an application of the results of this
chapter to characterizing when t(., W ) has finite connection rank.
The group Aut(W ) acts on J k for any k. The number of orbits of this action
can be estimated from below as follows.
Proposition 13.46. The number of orbits of the automorphism group of W on
[0, 1]k is at least r(t(., W ), k).
Proof. Suppose that Aut(W ) has a finite set of orbits O1 , . . . Om on [0, 1]k .
Let F and F ′ be two k-labeled graphs. Then
    t([[F F ′]], W ) = ∫_{[0,1]^k} tx1 ...xk (F, W ) tx1 ...xk (F ′, W ) dx1 . . . dxk .
The functions tx1 ...xk (F, W ) and tx1 ...xk (F ′ , W ) are constant on every orbit, and
hence

    t([[F F ′]], W ) = ∑_{j=1}^m λ(Oj ) txj,1 ...xj,k (F, W ) txj,1 ...xj,k (F ′, W ),
where (xj,1 . . . xj,k ) is any representative point of Oj . This shows that M (t(., W ), k)
is the sum of m matrices of rank 1, and so it has rank at most m.
almost everywhere (so that all eigenvalues of W will be roots of the polynomial
∑_k ak x^{k+2}). We claim that this is equivalent to requiring that

(13.15)    ∑_{k=0}^m ak ⟨W ◦(k+2), W ◦(l+2)⟩ = 0    (l = 0, . . . , m).

Indeed, (13.14) clearly implies (13.15) for every l; on the other hand, (13.15) implies
that

(13.16)    ⟨ ∑_{k=0}^m ak W ◦(k+2), ∑_{k=0}^m ak W ◦(k+2) ⟩ = 0,
Since this matrix is a submatrix of M (f, 2), this determinant will certainly vanish
if m ≥ r(f, 2). It follows that the number of distinct nonzero eigenvalues of the
operator TW is at most r(f, 2). Since every eigenvalue has finite multiplicity, it
follows that TW has finite rank.
Next, we show that the range of W ◦ W is finite (up to a set of measure
0). Consider its moments as a single variable function on the probability space
[0, 1]²: Mk (W ◦ W ) = ∫_{[0,1]²} (W ◦ W )^k, and the corresponding moment matrix
M (W ◦ W ) = (Mk+l (W ◦ W ))_{k,l=0}^∞. Note that Mk (W ◦ W ) = t(K••2,k , W ), and
so Mk+l (W ◦ W ) = t(K••2,k K••2,l , W ). It follows that M (W ◦ W ) is a submatrix of
M (f, 2), and hence its rank is finite. By Theorem A.22, the range of W ◦ W is finite
(up to a set of measure 0).
The fact that TW has finite rank implies that TW ◦W = TW ² has finite rank.
This, together with the fact that the range of W is finite, implies that W is a
stepfunction. Indeed, the row space of W is finite dimensional, so we can select
a finite set of points x1 , . . . , xr so that every row W (x, .) is a linear combination
of the functions W (xi , .). Since W has finite range, the functions W (xi , .) are
stepfunctions. There is a finite partition [0, 1] = S1 ∪ · · · ∪ Sp such that every
function W (xi , .) is constant on every Si , and hence every row is constant on every
Si . By symmetry, this implies that W is constant on every rectangle Si × Sj , i.e.,
it is a stepfunction.
The space of graphons is the stage where many acts of interaction between
graph theory and analysis take place. This chapter collects a number of questions
about the structure of this space that arise naturally and that have at least partial
answers.
There are several classes of graphs F with norming properties. Besides cycles,
complete bipartite graphs with an even number of nodes in each bipartition class are
norming, and all complete bipartite graphs are weakly norming. For more properties
and examples of graphs with norming properties, see Exercises 14.5–14.8.
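Inequality (14.1) below can be checked numerically for small stepfunctions. The following Python sketch is our illustration (not part of the text): it verifies the weak Hölder property for F = C4 = K2,2, which is weakly norming, using a random W0-decoration by 4 × 4 stepfunction graphons.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)
n = 4
def sym(M): return (M + M.T) / 2
Ws = [sym(rng.uniform(0, 1, (n, n))) for _ in range(4)]  # a W0-decoration of C4

def t_c4(W12, W23, W34, W41):
    # homomorphism density of a decorated 4-cycle in stepfunction kernels,
    # computed by brute force over all maps of the 4 nodes into the n steps
    return sum(W12[a, b] * W23[b, c] * W34[c, d] * W41[d, a]
               for a, b, c, d in product(range(n), repeat=4)) / n**4

lhs = t_c4(*Ws) ** 4                              # t(F, w)^{e(F)}
rhs = np.prod([t_c4(W, W, W, W) for W in Ws])     # prod over edges of t(F, w_e)
assert lhs <= rhs + 1e-12   # the weak Hölder inequality for F = C4
```

Since C4 is weakly norming, the inequality is guaranteed for every nonnegative decoration; the random trial just makes this concrete.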
Norming properties are closely related to Hölder-type inequalities for homo-
morphism densities, which can be stated using the notion of W-decorated graphs
introduced in Section 7.2. A simple graph F = (V, E) has the Hölder property, if for every W-decoration w = (w_e : e ∈ E(F)) of F,
\[
(14.1)\qquad t(F, w)^{e(F)} \le \prod_{e\in E} t(F, w_e).
\]
It has the weak Hölder property, if this inequality holds for every W0 -decoration of
F (equivalently, for every W-decoration with nonnegative functions).
Hatami [2010] gives the following characterizations of seminorming and weakly
norming graphs in terms of Hölder properties.
Theorem 14.1. A simple graph is seminorming if and only if it has the Hölder
property. It is weakly norming if and only if it has the weak Hölder property.
Proof. We prove the second assertion; the proof of the first is similar. In the
“if” direction, suppose that a simple graph F = (V, E) with m edges has the weak
Hölder property. Let W1 , W2 ∈ W0 . We have
\[
t(F, W_1 + W_2) = \sum_{w} t(F, w),
\]
Using this theorem, one can prove that certain specific graphs are norming (Hatami [2010]). A characterization of such graphs is open.
Proposition 14.2. (a) Even cycles are norming.
(b) Hypercubes are weakly norming.
(c) Deleting a perfect matching from a complete bipartite graph Kn,n , we get a
weakly norming graph.
Proof. We describe the proof of (b); the proofs of (a) and (c) are similar
(in fact, simpler). Consider the d-dimensional hypercube graph Q_d. We consider its node set as V = {0, 1}^d, and its edge set as E = {xy : x, y ∈ V, x_i = y_i for all but one i}.
By Theorem 14.1, it is enough to prove that (14.1) holds for F = Qd and
any decoration with graphons. Let A be the set of graphons that occur. We may assume that A does not contain the graphon that is almost everywhere 0 (else, the inequality is trivial). We say that an A-decoration W of Q_d is pessimal, if
\[
t(Q_d, W)^{e(Q_d)} \Big/ \prod_{e\in E} t(Q_d, W_e)
\]
is maximal among all A-decorations. Since there are only a finite number of such decorations, at least one of them is pessimal. In these terms, inequality (14.1) means that there is a pessimal A-decoration with all decorating graphons equal.
Let S1 denote the set of nodes x of Qd with x1 = 1, x2 = 0; let S2 be the set
of nodes x with x1 = 0, x2 = 1; and let T = V \ S1 \ S2 . Note that T separates S1
and S2 . Let Ei be the set of edges incident with any node in Si , and let E0 be the
set of edges spanned by T .
We can write
\[
t(F, W) = \int_{[0,1]^V} \prod_{ij\in E} W_{ij}(x_i, x_j)\, dx
= \int_{[0,1]^V} \prod_{ij\in E_0} W_{ij}(x_i, x_j) \prod_{ij\in E_1} W_{ij}(x_i, x_j) \prod_{ij\in E_2} W_{ij}(x_i, x_j)\, dx.
\]
Considering the first factor as a weight function (here we use that W ≥ 0), we can apply the Cauchy–Schwarz Inequality to get
\[
t(Q_d, W) \le \biggl( \int_{[0,1]^V} \prod_{ij\in E_0} W_{ij}(x_i, x_j) \Bigl(\prod_{ij\in E_1} W_{ij}(x_i, x_j)\Bigr)^{2} dx \biggr)^{1/2}
\biggl( \int_{[0,1]^V} \prod_{ij\in E_0} W_{ij}(x_i, x_j) \Bigl(\prod_{ij\in E_2} W_{ij}(x_i, x_j)\Bigr)^{2} dx \biggr)^{1/2},
\]
which is invariant under interchanging the first two entries in every x ∈ V . We call
this symmetrization with respect to the hyperplane x1 = x2 . We can symmetrize
similarly with respect to the hyperplane x1 + x2 = 1.
Now consider a pessimal decoration W and a face Z of the cube such that all edges of Z are decorated by the same graphon U. Suppose that Z is not the whole cube. We may assume that Z is the face defined by x1 = x2 = · · · = xk = 0,
where 0 < k < d. Let Z ′ be the face obtained by reflecting Z in the hyperplane
xk = xk+1 . The intersection of Z and Z ′ is the face defined by x1 = x2 = · · · =
xk = xk+1 = 0. The smallest face Z ′′ containing both Z and Z ′ is defined by
x1 = x2 = · · · = xk−1 = 0.
Let us symmetrize with respect to the hyperplane xk = xk+1 . The decoration of
the edges of Z does not change, but the decoration of the edges of Z ′ also becomes
U . Symmetrizing with respect to xk + xk+1 = 1, we get a pessimal decoration in
which all edges of Z ′′ have the same decoration. Repeating this procedure, we get
a pessimal decoration with all edges decorated by the same graphon and we are
done.
Almost all norms on W (in short, norms for this section) that we need have
some natural properties. Recall from Section 8.2 that a norm N is called invariant,
if N (W φ ) = N (W ) for every measure preserving transformation φ ∈ S[0,1] , and
smooth, if for every sequence Wn ∈ W1 of kernels such that Wn → 0 almost
everywhere, we have N (Wn ) → 0. The norms L1 , L2 , the cut norm, and the graph
norms from the previous section share these properties. (But the L∞ -norm is not
smooth!)
Recall the obvious inequalities
\[
(14.3)\qquad \|W\|_\square \le \|W\|_1 \le \|W\|_2.
\]
For W ∈ W1, we have ∥W∥_2 ≤ ∥W∥_1^{1/2}, and hence these two norms define the same topology on W1. Trivially, the cut norm is continuous in this topology. How about the other way around? There are easy examples showing that ∥Wn∥_□ → 0 does not imply that ∥Wn∥_1 → 0 or ∥Wn∥_2 → 0: let (Gn) be a quasirandom graph sequence with edge density 1/2, and Wn = 2WGn − 1. Then ∥Wn∥_□ → 0, but ∥Wn∥_1 = ∥Wn∥_2 = 1.
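For small n, the cut norm of a stepfunction can be computed by brute force: for a fixed row set S, the optimal column set is determined by the signs of the column sums, so it suffices to enumerate the 2^n choices of S. The following sketch is our illustration (not from the text), with a random ±1 matrix standing in for 2WGn − 1; it shows the gap between the cut norm and the L1 norm already at n = 14.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 14
# W_n = 2 W_{G_n} - 1 for a random graph with edge density 1/2:
# a symmetric matrix with entries +-1 and zero diagonal
A = np.where(rng.random((n, n)) < 0.5, 1.0, -1.0)
A = np.triu(A, 1); A = A + A.T

# cut norm of the stepfunction: max over S, T of |sum_{S x T} A_ij| / n^2.
# For fixed S, the best T keeps the columns whose partial sums share a sign,
# so enumerating the 2^n subsets S suffices for small n.
best = 0.0
for mask in range(1 << n):
    rows = [i for i in range(n) if mask >> i & 1]
    col = A[rows].sum(axis=0) if rows else np.zeros(n)
    best = max(best, col[col > 0].sum(), -col[col < 0].sum())
cut = best / n**2
l1 = np.abs(A).sum() / n**2

assert cut < l1  # cut norm is visibly smaller than the L1 norm already here
```

For a genuinely quasirandom sequence the cut norm tends to 0 while the L1 norm stays at 1, exactly as in the example above.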
The main goal in this section is to establish the following picture about smooth
invariant norms.
Theorem 14.10. (a) Every smooth invariant norm (as a function on W̃1) is continuous with respect to the L1 norm, and the cut norm is continuous with respect to any smooth invariant norm.
(b) Any smooth invariant norm is lower semicontinuous with respect to any other smooth invariant norm.
We also prove an analogous (but not equivalent!) theorem about the distances δ_N on W̃ defined by smooth invariant norms N. Let us call these, for brevity, delta-metrics.
Theorem 14.11. (a) Every delta-metric is continuous (as a function on W̃1 × W̃1) with respect to δ1, and δ_□ is continuous with respect to any delta-metric.
(b) Any delta-metric is lower semicontinuous with respect to any other delta-metric.
The fact that we prove continuity (or lower semicontinuity) as a function in two
variables, and not just separately in each variable, is significant. As an example of
a different nature, recall that the overlay functional C(U, W ) is continuous in each
variable, but not as a 2-variable function (Section 12.2).
Some of the above statements are trivial, and some follow easily from each other. Along the way, we are going to prove a couple of facts that will be useful in other contexts too.
14.2.1. Smooth and invariant norms. As a technical preparation, we have
to prove some simple facts about smooth and invariant norms.
Lemma 14.12. Every smooth norm N is uniformly continuous with respect to the
L1 norm on W1 .
Proof. Suppose not; then there exist an ε > 0 and a sequence of kernels Wn ∈ W1 such that ∥Wn∥_1 → 0 but N(Wn) > ε. By selecting a subsequence, we may assume that Wn → 0 almost everywhere, contradicting the assumption that N is smooth.
244 14. THE SPACE OF GRAPHONS
On the other hand, the Ergodic Theorem implies that Un → WP almost everywhere
as n → ∞. Since, trivially, Un ∈ W1 and N is smooth, this implies that N (WP ) =
limn→∞ N (Un ) ≤ N (W ).
Next we give a useful representation of smooth invariant norms. By the Hahn–
Banach Theorem, we can represent any norm on W that is continuous in the L∞
norm as
\[
(14.4)\qquad N(W) = \sup_{\ell\in\mathcal{L}} \ell(W),
\]
is 0-1 valued, then G(n, W ) = H(n, W ), and so to generate G(n, W ), we don’t need
randomness to get the edges (of course, we still need randomness to generate the
nodes).
Among hereditary properties, it is quite easy to characterize random-free prop-
erties.
Lemma 14.22. A hereditary graph property P is random-free if and only if there is a signed bipartite graph F such that t(F, W) = 0 for all W ∈ P̄.
The proof will show that it would be enough to assume that for every graphon W ∈ P̄ there is a signed bipartite graph F with t(F, W) = 0.
Proof. Suppose that for every signed bipartite graph F there is a graphon W ∈ P̄ such that t(F, W) > 0. Let (Fn) be a quasirandom sequence of bipartite graphs with bipartition V(Fn) = Vn′ ∪ Vn′′, with edge density 1/2, and with |Vn′| = |Vn′′|. Consider the signed bipartite graphs F̂n, obtained from Kn,n by signing the edges of Fn with +, the other edges with −. Let Wn ∈ P̄ be a graphon such that t(F̂n, Wn) > 0. It is easy to see that this means that there is a simple
graph Gn obtained from Fn by adding edges within the color classes such that
tind (Gn , Wn ) > 0. By Proposition 14.21, this implies that Gn ∈ P.
By selecting a subsequence we may assume that the graph sequences G′n =
(Gn [Vn′ ]) and G′′n = (Gn [Vn′′ ]) are convergent. By Theorem 11.59, we can order the
nodes in Vn′ and in Vn′′ so that WG′n converges to a graphon W ′ on [0, 1] in the
cut norm, and similarly WG′′n converges to a graphon W ′′ on [0, 1]. If we order the
nodes of Gn so that the nodes in Vn′ precede the nodes in Vn′′ , and keep the above
ordering inside Vn′ and Vn′′ , then WGn converges to the graphon
\[
U(x, y) =
\begin{cases}
W'(2x, 2y) & \text{if } x, y < 1/2,\\
W''(2x - 1, 2y - 1) & \text{if } x, y > 1/2,\\
1/2 & \text{otherwise.}
\end{cases}
\]
Corollary 14.23. If a hereditary property of bipartite graphs does not contain all
bipartite graphs, then it is random-free.
Using Theorem 13.35(c), we can associate a finite dimension with every nontriv-
ial hereditary property of bipartite graphs. It would be interesting to find further
combinatorial properties of this dimension.
The natural analogue of this corollary for properties of nonbipartite graphs fails
to hold.
Example 14.24. Let P be the property of a graph that it is triangle-free. Then
every bipartite graphon is in its closure, but such graphons need not be 0-1 valued.
For more characterizations of hereditary and random-free properties, and for
more on their connection, see Janson [2011c].
14.3. CLOSURES OF GRAPH PROPERTIES 249
Example 14.35 (Hadamard kernels). Our next examples show that graphon
varieties can encode quite substantial combinatorial complications. A symmetric
n × n Hadamard matrix B gives rise to a kernel WB , which we alter a little to get
a graphon UB = (WB + 1)/2. We call UB an Hadamard graphon.
14.4. GRAPHON VARIETIES 251
Hadamard graphons, together with the graphon J1/2 , form a simple graphon
variety. Indeed, the condition
\[
(14.6)\qquad t\bigl(K_3 - 2K_2 K_2 + K_2,\ 1 - (2U - 1)\circ(2U - 1)\bigr) = 0
\]
implies that 1 − (2U − 1) ◦ (2U − 1) is a kernel that is either identically 1 or it
corresponds to a complete graph (Example 14.33). Let W = 2U − 1, then either
W ◦ W = 0 or W ◦ W = WI where I is an identity matrix of some size n. In the first
case, W = 0 and so U = J1/2. In the second case, we note that every eigenvector of T_{W∘W} = T_W^2 is a stepfunction with steps [0, 1/n), . . . , [(n − 1)/n, 1), and hence so are the eigenvectors of TW. It follows that W is a stepfunction with these steps,
and so W = WB for some n × n matrix B. Furthermore, W ◦ W = WI implies
that B^2 = nI. Since U is a graphon, we have −1 ≤ W ≤ 1, and so every entry of B is in [−1, 1]. The condition B^2 = nI implies that ∑_i B_{ij}^2 = n for every j, which
implies that every entry of B is either 1 or −1, and so B is an Hadamard matrix.
We must add that (14.6) can be expanded into a subgraph density condition
on U , using the fact that t(F, U ◦ U ) = t(F ′ , U ) (where F ′ is the subdivision of F ).
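The algebraic steps of this example are easy to check numerically. The following sketch is our illustration (not part of the text); the Sylvester construction of a symmetric Hadamard matrix is an assumption of the sketch.

```python
import numpy as np

# Sylvester's construction gives a symmetric n x n Hadamard matrix for n = 2^k
B = np.array([[1.0]])
for _ in range(2):
    B = np.block([[B, B], [B, -B]])
n = B.shape[0]                                   # n = 4

assert np.array_equal(B, B.T)                    # symmetric, so W_B is a kernel
assert np.all(np.abs(B) == 1)                    # every entry is +1 or -1
assert np.allclose(B @ B, n * np.eye(n))         # B^2 = nI, i.e. W_B o W_B = W_I

U = (B + 1) / 2                                  # the Hadamard graphon U_B
assert set(np.unique(U).tolist()) <= {0.0, 1.0}  # U_B is 0-1 valued
```

The three assertions mirror exactly the three facts used in the example: symmetry, ±1 entries, and B^2 = nI.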
Example 14.36 (Zero-one valued graphons). It is not hard to see that W ∈ W is 0-1 valued almost everywhere if and only if t(B4 − 2B3 + B2, W) = 0 (one approach is to note that W is 0-1 valued iff t_{xy}(B2^{••}, W) = t_{xy}(B1^{••}, W), and use Lemma 14.37 below). Hence 0-1 valued graphons form a kernel variety. However, the variety of 0-1 valued graphons is not simple, because it is not closed in the cut distance: for a quasirandom graph sequence (Gn) the associated graphons WGn are 0-1 valued, but WGn → J1/2 in the ∥·∥_□ norm.
14.4.1. Unlabeling. Before describing more complicated graphon varieties,
we introduce a tool that is very useful in constructing varieties. Instead of pre-
scribing subgraph densities, we can try to define graphon or kernel varieties by a
(seemingly) more general condition on the density function of a k-labeled graph or
quantum graph: such conditions can be written as tx (g, W ) = 0 (for all x ∈ [0, 1]k )
for some k-labeled quantum graph g. However, there is a way to translate labeled
constraints to unlabeled constraints. This fact will be convenient in constructions,
since it is often easier to describe a property by the density of a labeled quantum
graph.
Lemma 14.37. For every k-labeled quantum graph f there is an unlabeled quantum
graph g such that for any W ∈ W, t(g, W ) = 0 if and only if tx1 ...xk (f, W ) = 0
almost everywhere. If f is simple, then we can require that g is simple, and the
labeled nodes form a stable set in every constituent of g.
Proof. The first assertion is trivial: t_{x1...xk}(f, W) = 0 almost everywhere if and only if t([[f^2]], W) = 0. This construction works for the second statement as well, provided the labeled nodes form a stable set in every constituent of f.
To prove the second statement for every simple k-labeled quantum graph f, define Lb(f) as the disjoint union of the subgraphs of the constituents induced by the labeled nodes (note that these subgraphs all have the same node set [k]). We use induction on the chromatic number χ(Lb(f)). If χ(Lb(f)) = 1, then the labeled nodes are nonadjacent in every constituent, and the trivial construction above works.
Suppose that χ(Lb(f)) = r > 1, let [k] = S1 ∪ · · · ∪ Sr be an r-coloring of Lb(f), and let q = |Sr|. We may suppose that Sr = {k − q + 1, . . . , k}. We glue together
two copies of f along Sr . Formally, let f1 be obtained from f by increasing the
labels in Sr by k − q (the labels not in Sr are not changed). Let f2 be obtained
from f by increasing all labels by k − q. So the product f1 f2 is a (2k − q)-labeled
quantum graph, in which the nodes of Sr are labeled 2k − 2q + 1, . . . , 2k − q. Let
h be obtained from f1 f2 by unlabeling the nodes in Sr .
Claim 14.38. For every W ∈ W, tx1 ...xk (f, W ) = 0 almost everywhere if and only
if tx1 ...x2k−2q (h, W ) = 0 almost everywhere.
The “only if” part is obvious, since
\[
t_{x_1\dots x_k}(f, W) = 0 \ \Rightarrow\ t_{x_1\dots x_{2k-q}}(f_1, W) = t_{x_1\dots x_{2k-q}}(f_2, W) = 0
\ \Rightarrow\ t_{x_1\dots x_{2k-q}}(f_1 f_2, W) = 0 \ \Rightarrow\ t_{x_1\dots x_{2k-2q}}(h, W) = 0.
\]
To prove the “if” part, note that two labeled nodes whose labels correspond to the same label in f are never adjacent, so we can identify these labels in h to get f^2 (with the labels in Sr removed). So t_{x1...x_{2k−2q}}(h, W) = 0 almost everywhere implies by Proposition 13.23 that t([[f^2]], W) = 0, and hence we get that t_{x1...xk}(f, W) = 0 almost everywhere. This proves the Claim.
Thus it suffices to express the constraint t_{x1...x_{2k−2q}}(h, W) = 0 by an appropriate unlabeled constraint. This can be done by induction, since χ(Lb(h)) ≤ r − 1.
In some cases, the following simple observation suffices to go between labeled
and unlabeled conditions.
Lemma 14.39. Let F be a k-labeled signed graph. Then in W0 , the constraints
tx1 ...xk (F, W ) = 0 and t([[F ]], W ) = 0 define the same graphon variety.
Proof. Clearly tx1 ...xk (F, W ) = 0 implies that t([[F ]], W ) = 0. Conversely, in
the constraint
\[
t([[F]], W) = \int_{[0,1]^{V(F)}} \prod_{ij\in E_+} W(x_i, x_j) \prod_{ij\in E_-} \bigl(1 - W(x_i, x_j)\bigr)\, dx = 0
\]
Proof. By Proposition 13.23, (14.7) implies that for the 2-labeled signed bond B^{••} obtained by identifying each color class of F, we have t_{xy}(B^{••}, W) = 0. This clearly implies that W is 0-1 valued almost everywhere.
for almost all choices of the variables xi (1 ≤ i ≤ k + 1) and xij (1 ≤ i < j ≤ k + 1).
Indeed, there are always two variables with one index, say xi and xj , which belong
to the same step, and then the corresponding factor in (14.9) is 0 for any choice of
xij .
Conversely, suppose that U ∈ W is not a stepfunction with k steps. Then there
is a set of (k + 1)-tuples (x1 , . . . , xk+1 ) with positive measure such that no two of
the xi are twins. For every (k + 1)-tuple in this set, there is a positive measure of
choices for xij such that U (xij , xi ) ̸= U (xij , xj ). So (14.9) fails to hold on a set of
positive measure.
Now (14.9) can be written as t_x(g, U) = 0 for an appropriate simple m-labeled quantum graph g, where m = (k + 1) + \binom{k+1}{2} = \binom{k+2}{2}. By Lemma 14.37, this can be
expressed as t(g, U ) = 0 with a simple unlabeled quantum graph g. This implies
that Sk is a simple kernel variety. Let us note in addition that every constituent of
g is bipartite and fully labeled, hence the construction in the proof of Lemma 14.37
is finished in two steps, and it only doubles the number of nodes.
We define the rank of a kernel W as the rank of the corresponding kernel
operator TW . This is usually infinite, but we will be interested in the cases when it
is finite. Since every nonzero eigenvalue of TW has finite multiplicity, we know that
a kernel has finite rank if and only if it has a finite number of distinct eigenvalues.
Every stepfunction has finite rank. It is easy to see that the sum and product of
two kernels with finite rank have finite rank (see Exercise 14.56).
If the rank r of W is finite, then some of the formulas in Section 7.5 become
simpler: the spectral decomposition (7.19) of W will be finite (and therefore, it will
hold almost everywhere, not just in L2 ):
\[
(14.10)\qquad W(x, y) = \sum_{k=1}^{r} \lambda_k f_k(x) f_k(y).
\]
The expression (7.25) for the density of a graph F in W also becomes finite:
\[
(14.11)\qquad t(F, W) = \sum_{\chi : E \to [r]} \prod_{e\in E} \lambda_{\chi(e)} \prod_{v\in V} M_{\chi}(f).
\]
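For cycles, the spectral decomposition (14.10) reduces the density to a power sum of eigenvalues, t(C_k, W) = ∑_i λ_i^k (cf. Section 7.5). A quick numerical sanity check for a random stepfunction kernel, our illustration (not from the text):

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(2)
n = 6
A = rng.uniform(-1, 1, (n, n)); A = (A + A.T) / 2  # stepfunction kernel, rank <= n

# eigenvalues of the operator T_W for a stepfunction with n uniform steps
lam = np.linalg.eigvalsh(A) / n

# brute-force triangle density t(C3, W) versus the spectral formula sum lam_i^3
t_c3 = sum(A[i, j] * A[j, k] * A[k, i]
           for i, j, k in product(range(n), repeat=3)) / n**3
assert abs(t_c3 - np.sum(lam**3)) < 1e-10
```

The same check works for any C_k, since tr((A/n)^k) = ∑_i (λ_i(A)/n)^k.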
Example 14.46 shows that not every kernel with finite rank is a stepfunction.
The following theorem asserts that, at least from the point of view of varieties, the
two classes are not very far.
Theorem 14.48. If a kernel variety contains a kernel with finite rank, then it
contains a stepfunction.
Proof. Let the variety V be defined by the equations
t(gi , X) = 0 (i = 1, . . . , m),
and let W ∈ V have finite rank r. Let H be the set of all constituents of the gi .
The equations
t(F, X) = t(F, W ) (F ∈ H)
define a kernel variety V ′ ⊆ V.
Let f1 , . . . , fr be the eigenfunctions of TW . By (14.11), if (u1 , . . . , ur ) is another
set of bounded measurable functions that satisfy
(14.12) Ma (u) = Ma (f )
for every vector of exponents a ∈ [M]^r (where M = max{e(F) : F ∈ H}), then the kernel U = ∑_{t=1}^{r} λ_t u_t(x) u_t(y) satisfies t(F, U) = t(F, W) for all F ∈ H and so U ∈ V′. By Proposition A.25 in the Appendix there is a system of functions u
satisfying (14.12) which are stepfunctions, and then U is also a stepfunction.
We will see that every stepfunction in W forms a simple variety in itself (Corol-
lary 16.47). Hence the family of stepfunctions in Theorem 14.48 could not be
replaced by any other family of finite rank kernels (e.g., polynomials).
Exercise 14.49. Show how the facts that regular graphons form simple varieties
and 0-1 valued kernels form a variety (Examples 14.32, 14.34 and 14.36) follow
immediately from the unlabeling method.
Exercise 14.50. Prove that the union and intersection of two graphon varieties
are graphon varieties.
Exercise 14.51. Graphons with values in a fixed finite set S ⊆ R form a variety,
but this variety is not simple unless |S| = 1.
Exercise 14.52. Show that W0 is not a variety in W.
Exercise 14.53. Prove that every monotone decreasing 0-1 valued graphon is
almost everywhere equal to a threshold graphon.
Exercise 14.54. We say that a kernel W on [0, 1] is an equivalence graphon if there is a partition [0, 1] = ∪_{i∈I} S_i into measurable parts such that for almost all pairs x, y ∈ [0, 1], W(x, y) = 1(i = j) if x ∈ S_i and y ∈ S_j. Prove that equivalence graphons (up to isomorphism modulo a 0 set) form a simple variety in W0.
Exercise 14.55. Assume that a graphon W satisfies (14.8). Prove the following consequences: (a) W is 0-1 valued almost everywhere. (b) Defining N(x) = {y ∈ [0, 1] : W(x, y) = 1}, either λ(N(x) \ N(y)) = 0 or λ(N(y) \ N(x)) = 0 for all x, y ∈ [0, 1]. (c) Defining α(x) = (1/2)(1 + λ({z ∈ [0, 1] : λ(N(z) \ N(x)) = 0}) − λ(N(x))), we have α ≥ 0; (d) W(x, y) = 1(α(x) + α(y) ≤ 1) almost everywhere.
Exercise 14.56. Let U and W be two kernels with finite rank. Prove that the
kernels U + W , U W , and U ⊗ W have finite rank. If U has finite rank and W is
arbitrary, then the kernel U ◦ W + W ◦ U has finite rank.
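The rank bounds behind Exercise 14.56 are easy to observe numerically for stepfunction kernels, where T_W is just a matrix; a sketch (our illustration, not part of the text):

```python
import numpy as np

rng = np.random.default_rng(3)
n, r = 40, 3

def low_rank_kernel(r):
    # a symmetric stepfunction kernel of rank r, in the spirit of (14.10)
    X = rng.standard_normal((n, r))
    return X @ X.T

U, W = low_rank_kernel(r), low_rank_kernel(r)
assert np.linalg.matrix_rank(U + W) <= 2 * r   # rank of the sum U + W
assert np.linalg.matrix_rank(U * W) <= r * r   # pointwise product kernel UW
```

The bound rank(UW) ≤ rank(U)·rank(W) for the pointwise (Hadamard) product is exactly what makes the product of two finite-rank kernels finite rank.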
(d)→(a). Let W be a random graphon from any random graphon model. This
defines a graph parameter f by
\[
f(F) = \mathbb{E}\bigl( t(F, W) \bigr).
\]
For every fixed graphon W , the graph parameter f (·) = t(·, W ) is normalized,
isolate-indifferent (since it is multiplicative), and has nonnegative Möbius inverse
(by Theorem 11.52). Trivially, these properties are inherited by the expectation.
Among graph parameters described above, the multiplicative ones have differ-
ent characterizations as well.
Proposition 14.61. Let f be a graph parameter satisfying the conditions of Propo-
sition 14.60. Then the following are equivalent:
(a) f is multiplicative;
(b) f = t(., W ), where W is a graphon;
(c) f is the limit of homomorphism density functions t(., G), where G is a
simple graph.
Proof. Clearly both (b) and (c) imply (a). Conversely, if f is multiplicative, then by Theorem 11.52, there is a graphon W such that f = t(·, W). By Corollary 11.15, the function t(·, W) is the limit of homomorphism density functions for every W.
14.6. EXPONENTIAL RANDOM GRAPH MODELS 259
metric space is crucial. Chatterjee and Varadhan [2011] applied the theory of
graph limits to the theory of large deviations for Erdős–Rényi random graphs.
This was extended by Chatterjee and Diaconis [2012] to more general distributions
on graphs, which they call exponential random graph models. We summarize their
ideas without going into the details of the proofs.
Let f be a bounded graph parameter such that for every convergent graph sequence (Gn), the numerical sequence f(Gn) is convergent. Such parameters are called estimable. The canonical examples of such parameters are subgraph densities t(F, ·), but there are many others. We will return to them in Section 15.1 to study
their estimation through sampling and other characterizations. Right now we only
need the fact that every such parameter can be extended to the graphon space W̃0 so that if Gn → W then f(Gn) → f(W), and the extension is continuous in the distance δ_□ (in particular, the extension is invariant under weak isomorphism).
These facts are immediate consequences of the definition.
Suppose that we want to understand the structure of a random graph, but
under the condition that f (G) is small. For example, Chatterjee and Varadhan
were interested in random graphs G(n, 1/2) in which the triangle density is much
less than 1/8 (the expectation). To this end, we consider a weighting of all simple graphs on n nodes by e^{−f(G)n^2}; this will emphasize those graphs for which f(G) is small. The factor n^2 in the exponent is needed to make the logarithms of the weights have the same order of magnitude as the logarithm of the total number of simple graphs on [n] (which is just \binom{n}{2}, if you take binary logarithm). We introduce the probability distribution φn on F_n^{simp} by
\[
\varphi_n(G) = \frac{e^{-f(G)n^2}}{\sum_{G'} e^{-f(G')n^2}}.
\]
Let ψn denote the normalizing factor in the denominator. It looks quite hairy, but
Chatterjee and Diaconis derived an asymptotic formula for it. To state their result,
we need some notation. For W ∈ W0, consider the entropy-like functional
\[
I(W) = \frac{1}{2} \int_{[0,1]^2} W(x, y)\log W(x, y) + \bigl(1 - W(x, y)\bigr)\log\bigl(1 - W(x, y)\bigr).
\]
Chatterjee and Varadhan proved that this functional is invariant under weak isomorphism and lower semicontinuous on the space (W̃0, δ_□) (this fact is quite similar to Lemma 14.16).
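For a stepfunction graphon the integral defining I(W) is a finite average; the following sketch (our illustration, not from the text) evaluates it on the constant graphon J_p and on a 0-1 valued graphon, where the integrand vanishes:

```python
import numpy as np

def I(W):
    # the entropy-like functional on a stepfunction graphon given by a matrix;
    # the convention 0 log 0 = 0 is enforced by clipping away from {0, 1}
    W = np.clip(W, 1e-12, 1 - 1e-12)
    return (W * np.log(W) + (1 - W) * np.log(1 - W)).mean() / 2

p = 0.3
expected = (p * np.log(p) + (1 - p) * np.log(1 - p)) / 2
assert abs(I(np.full((4, 4), p)) - expected) < 1e-9  # constant graphon J_p
assert abs(I(np.eye(4))) < 1e-9                      # I vanishes on 0-1 graphons
```

With n uniform steps the integral over [0, 1]^2 is exactly the mean of the entrywise values, which is what the `.mean()` computes.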
The formula of Chatterjee and Diaconis can be stated as follows:
Theorem 14.64. If f is an estimable graph parameter, then
\[
\lim_{n\to\infty} \psi_n = \sup_{W\in\mathcal{W}_0} \bigl( f(W) - I(W) \bigr).
\]
Using this, they prove the following result about the behaviour of a random graph drawn from the distribution φn. Since f(W) − I(W) is upper semicontinuous on the compact space (W̃0, δ_□), the supremum in the above formula is in fact a maximum, and it is attained on a compact set Kf ⊆ W0.
Theorem 14.65. Let f be an estimable graph parameter, and let Gn be a random graph from the distribution φn. Then for every η > 0 there are C, ε > 0 such that
\[
\mathbb{P}\bigl( \delta_\square(W_{G_n}, K_f) > \eta \bigr) \le C e^{-\varepsilon n^2}.
\]
We call the parameter g a test parameter for f. However, we don't really need this notion: we can always use g = f (cf. Goldreich and Trevisan [2003]). Indeed, (15.1) implies that P(|f(G[X]) − g(G[X])| > ε) < ε, and so we can choose the threshold k belonging to ε/2 in the original definition to get the condition obtained by replacing g by f.
It is easy to see that estimability is equivalent to saying that for every convergent graph sequence (Gn), the sequence of numbers (f(Gn)) is convergent. (So graph parameters of the form t(F, ·) are estimable by the definition of convergence.)
Using this, for any estimable parameter f we can define a functional f̂ on W0, where f̂(W) is the limit of f(Gn) for any sequence of simple graphs Gn → W. It is also immediate that this functional f̂ is continuous on (W̃0, δ_□). The functional f̂ does not determine the graph parameter f: defining f0(G) = f̂(WG) we get a graph parameter with f̂0 = f̂, but f could be any parameter of the form f0 + h, where h(G) → 0 if v(G) → ∞.
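As a concrete instance of estimability, the edge density of a graph can be recovered, up to small error, from bounded-size samples G(k, G). The sketch below is our illustration; the helper name `sampled_edge_density` is hypothetical, not from the text.

```python
import numpy as np

rng = np.random.default_rng(4)

def sampled_edge_density(A, k, trials=2000):
    # estimate the (estimable) edge-density parameter from random k-node
    # induced subgraphs, averaging the densities of the samples
    n = A.shape[0]
    vals = []
    for _ in range(trials):
        S = rng.choice(n, size=k, replace=False)
        B = A[np.ix_(S, S)]
        vals.append(B.sum() / (k * (k - 1)))
    return float(np.mean(vals))

n = 300
A = (rng.random((n, n)) < 0.4).astype(float)
A = np.triu(A, 1); A = A + A.T                 # a random graph on n nodes

true = A.sum() / (n * (n - 1))
est = sampled_edge_density(A, k=10)
assert abs(est - true) < 0.05                  # the parameter is estimable
```

The expected density of a random induced subgraph equals the density of the graph itself, so averaging over samples concentrates around the true value.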
264 15. ALGORITHMS FOR LARGE GRAPHS AND GRAPHONS
All this is, however, more-or-less just a reformulation of the definition. Borgs,
Chayes, Lovász, Sós and Vesztergombi [2008] gave a number of more useful condi-
tions characterizing testability of a graph parameter. We formulate one, which is
perhaps easiest to verify for concrete parameters.
Theorem 15.1. A graph parameter f is estimable if and only if the following three
conditions hold:
(i) If Gn and G′n are simple graphs on the same node set (n = 1, 2, . . . ) and d_□(Gn, G′n) → 0, then f(Gn) − f(G′n) → 0.
(ii) For every simple graph G, f(G(m)) has a limit as m → ∞ (recall that G(m) denotes the graph obtained from G by blowing up each node into m twins).
(iii) f(GK1) − f(G) → 0 if v(G) → ∞ (recall that GK1 is obtained from G by adding a single isolated node).
Note that all three conditions are special cases of the statement that
(iv) if |V(Gn)|, |V(G′n)| → ∞ and δ_□(Gn, G′n) → 0, then f(Gn) − f(G′n) → 0.
This condition is also necessary, so it is equivalent to its own three special cases
(i)–(iii) in the Theorem.
Proof. The necessity of condition (iv) (which implies (i)–(iii)) is easy: Suppose that there are two sequences of graphs (Gn) and (G′n) such that |V(Gn)|, |V(G′n)| → ∞ and δ_□(Gn, G′n) → 0, but f(Gn) − f(G′n) ̸→ 0. By selecting a subsequence, we
may assume that |f (Gn ) − f (G′n )| > ε for all n for some ε > 0. Going to a further
subsequence, we may assume that the sequences (G1 , G2 , . . . ) and (G′1 , G′2 , . . . )
are convergent. But then δ_□(Gn, G′n) → 0 implies that the interlaced graph sequence (G1, G′1, G2, G′2, . . . ) is convergent as well. However, the numerical sequence
(f (G1 ), f (G′1 ), f (G2 ), f (G′2 ), . . . ) is not convergent, a contradiction.
To prove the sufficiency of (i)–(iii), we start with proving the following stronger
form of (i):
(i’) If Gn and G′n are simple graphs with the same number of nodes (n = 1, 2, . . . ) and δ_□(Gn, G′n) → 0, then f(Gn) − f(G′n) → 0.
This follows by Theorem 9.29, which implies that one can overlay the graphs Gn and G′n so that d_□(Gn, G′n) → 0.
Consider a convergent graph sequence (G1, G2, . . . ); we prove that the sequence (f(G1), f(G2), . . . ) is convergent. Let ε > 0. Using (i’), we can choose an ε1 > 0 so that if δ_□(G, G′) ≤ ε1, then |f(G) − f(G′)| ≤ ε. Since the graph sequence is convergent, we can choose and fix an integer n ≥ 1 so that δ_□(Gn, Gm) ≤ ε1/2 for m ≥ n. By (ii), the sequence (f(Gn(p)) : p = 1, 2, . . . ) is convergent. Let a be its limit; then we can choose a threshold p0 ≥ 1 so that |f(Gn(p)) − a| ≤ ε for every integer p ≥ p0. We may assume that p0 ≥ 4/ε1. Finally, based on (iii), we can choose a threshold q ≥ 1 such that |f(GK1) − f(G)| ≤ ε/v(Gn) whenever v(G) ≥ q.
Now consider a member Gm of the sequence for which m ≥ n and v(Gm) ≥ max(q, p0 v(Gn), 4v(Gn)/ε1). We can write v(Gm) = p v(Gn) + r, where p ≥ p0 and 0 ≤ r < v(Gn). Then Gm and G′ = Gn(p)K1^r have the same number of nodes. Furthermore,
\[
\delta_\square(G_m, G') \le \delta_\square\bigl(G_m, G_n(p)\bigr) + \delta_\square\bigl(G_n(p), G'\bigr)
\le \delta_\square(G_m, G_n) + \frac{2r}{p\,v(G_n)} \le \varepsilon_1.
\]
15.1. PARAMETER ESTIMATION 265
Exercise 15.4. Show that none of the three conditions in Theorem 15.1 can be dropped.
Exercise 15.5. Use Theorem 15.1 to prove that the i-th largest eigenvalue of
a graph is an estimable parameter for every fixed i ≥ 1. Give a new proof of
Theorem 11.54 based on this argument.
Exercise 15.6. Fix a graphon W , then cut(W, F ), as a function of F , defines a
simple graph parameter. Prove that it is estimable.
Then for every positive integer k′ that is large enough there is a test property Q′ such that for every graph G with at least k′ nodes
\[
\mathbb{P}\bigl( \mathbb{G}(k', G) \in \mathcal{Q}' \bigr)
\begin{cases}
\ge d & \text{if } G \in \mathcal{P}_1,\\
\le c & \text{if } G \in \mathcal{P}_2.
\end{cases}
\]
\[
f(F) = \mathbb{P}\bigl( \mathbb{G}(k, F) \in \mathcal{Q} \bigr),
\]
and define the property Q′ as the set of graphs F on k′ nodes such that f(F) ≥ (a + b)/2.
Let G ∈ P1, v(G) ≥ k′. Since G(k, G(k′, G)) is a random k-node subgraph of G, we have
\[
f_0 = \mathbb{E}\bigl( f(\mathbb{G}(k', G)) \bigr) = \mathbb{P}\bigl( \mathbb{G}(k, G) \in \mathcal{Q} \bigr) \ge b.
\]
Furthermore, the graph parameter f has the property that if we change edges in F
incident with a given node v, then the validity of the event G(k, F ) ∈ Q changes
only if the random k-subset contains v, which happens with probability k/k ′ . So
the value f (F ) changes by at most k/k ′ . We can apply the Sample Concentration
Theorem 10.2 to the parameter (k′/k)f, and get that
\[
\mathbb{P}\bigl( \mathbb{G}(k', G) \notin \mathcal{Q}' \bigr)
= \mathbb{P}\Bigl( f(\mathbb{G}(k', G)) \le \frac{a + b}{2} \Bigr)
\le \mathbb{P}\Bigl( f(\mathbb{G}(k', G)) \le f_0 - \frac{b - a}{2} \Bigr) \le e^{-t},
\]
where t = (b − a)^2 k′/(8k^2). Choosing k′ large enough, this will be less than 1 − d, proving that Q′ and k′ satisfy the first condition in the lemma. The second condition follows similarly.
Theorem 15.8. For two graph properties P1 and P2 , the following are equivalent:
(a) P1 and P2 are distinguishable by sampling;
(b) there exists a positive integer k such that for any Gi ∈ Pi with v(Gi) ≥ k, we have δ_□(G1, G2) ≥ 1/k;
(c) there exists a positive integer k such that for any Gi ∈ Pi with v(Gi) ≥ k, we have
\[
d_{\mathrm{var}}\bigl( \mathbb{G}(k, G_1), \mathbb{G}(k, G_2) \bigr) \ge \frac{1}{3}.
\]
Note that (b) could be phrased as P̄1 ∩ P̄2 = ∅.
Proof. (a)⇒(c): Let P1 and P2 be distinguishable with sample size k and test property Q. Then for any two graphs Gi ∈ Pi, we have
\[
\mathbb{P}\bigl( \mathbb{G}(k, G_1) \in \mathcal{Q} \bigr) - \mathbb{P}\bigl( \mathbb{G}(k, G_2) \in \mathcal{Q} \bigr) \ge \frac{1}{3}.
\]
This holds for all U ∈ R, so δ_□(W, R) ≥ (1/3)·2^{−k^2}. Let n be large enough (depending on k), and consider the W-random graph G(n, W), and the corresponding graphon Wn = W_{G(n,W)}. Then with high probability
\[
\delta_\square(W_n, \mathcal{R}) \ge \frac{1}{3} 2^{-k^2} - \delta_\square(W_n, W) \ge \frac{1}{3} 2^{-k^2} - \frac{20}{\sqrt{\log n}} > \frac{20}{\sqrt{\log n}}.
\]
by Lemma 8.22. This implies that δ_□(Wn, Wn′) → 0, and hence δ_□(Wn′, U) → 0. The hypothesis of the lemma then implies that d1(Wn′, R) → 0. Hence
\[
d_1(W_n, \mathcal{R}) \le d_1(W'_n, \mathcal{R}) + \|W_n - W'_n\|_1 \to 0.
\]
Since the condition of the last lemma is trivially fulfilled if R is flexible, we get
a useful corollary:
Corollary 15.15. Every closed flexible graphon property is testable. In particular,
the closure of every hereditary property is testable.
The following result can be viewed as the graphon analogue of the theorem of
Fischer and Newman [2005] (from which the finite theorem can be derived).
Theorem 15.16. A closed graphon property R is testable if and only if the func-
tional d1 (., R) is continuous in the cut norm.
Proof. If d1 (., R) is continuous, then Corollary 15.13 implies that R is testable.
Suppose that R is testable. The functional d1(., R) is lower semicontinuous in
the cut norm by Lemma 14.15. To prove upper semicontinuity, let W, Wn ∈ W0
with ∥Wn − W∥□ → 0. We claim that lim supn d1(Wn, R) ≤ d1(W, R).
Let ε > 0, and let U ∈ R be such that ∥W − U∥1 ≤ d1(W, R) + ε. By
Proposition 8.25, there is a sequence of graphons Un such that ∥Un − U∥□ → 0 and
∥Un − Wn∥1 → ∥U − W∥1. By Corollary 15.13 (in the form in the remark after its
statement), it follows that d1(Un, R) → 0, and so
d1(Wn, R) ≤ ∥Wn − Un∥1 + d1(Un, R) → ∥U − W∥1.
Hence
lim sup_{n→∞} d1(Wn, R) ≤ ∥U − W∥1 ≤ d1(W, R) + ε.
Since ε > 0 is arbitrary, this implies that d1 (., R) is upper semicontinuous.
Example 15.17 (Neighborhood of a property). Let S ⊆ W0 be an arbitrary
graphon property and let a > 0 be an arbitrary number. Then the property R =
{U ∈ W0 : δ□(U, S) ≤ a} is testable.
To show this, we use Corollary 15.13. For ε > 0 define ε′ = aε/2. Let W ∈
B□(R, ε′). Then W ∈ B□(S, a + ε′), and so there is a U ∈ S such that ∥U − W∥□ ≤
a + 2ε′. Consider Y = (1 − ε)W + εU. Then ∥Y − U∥□ = ∥(1 − ε)(U − W)∥□ ≤
(1 − ε)(a + 2ε′) < a, so Y ∈ R. Furthermore, ∥W − Y∥1 = ∥ε(W − U)∥1 ≤ ε, and
so W ∈ B1(R, ε). Since W was an arbitrary element of B□(R, ε′), this implies that
δ□(R, Rεc) ≥ ε′.
Example 15.18 (Subgraph density). For every fixed graph F and 0 < c < 1,
the property R of a graphon W that t(F, W) = c is testable. Let us verify that for
every ε > 0 there is an ε′ > 0 such that d1(W, R) ≥ ε implies that δ□(W, R) ≥ ε′.
Assume that d1(W, R) ≥ ε; then t(F, W) ≠ c, say t(F, W) > c. The graphons
Us = (1 − s)W, 0 ≤ s ≤ ε, are all in B1(W, ε), and hence not in R. It follows that
t(F, Us) > c for all 0 ≤ s ≤ ε. Since t(F, Uε) = (1 − ε)^{e(F)} t(F, W), this implies
that t(F, W) > (1 − ε)^{−e(F)} c. Thus for every U ∈ R we have
t(F, W) − t(F, U) ≥ ((1 − ε)^{−e(F)} − 1)c.
By the Counting Lemma 10.23 this implies that
δ□(U, W) ≥ ((1 − ε)^{−e(F)} − 1)c/e(F).
Choosing the right hand side of this inequality as ε′, we get that δ□(W, R) ≥ ε′,
which is what we wanted to verify.
Fixing two subgraph densities, however, may yield a non-testable property: for
example, t(K2 , W ) = 1/2 and t(C4 , W ) = 1/16 imply that W ≡ 1/2 (see Section
1.4.2), and we have seen that this graphon property is not testable.
15.3.2. Testable graph properties. We call a graph property P testable if
for every ε > 0, graphs in P can be distinguished from graphs farther than ε from
P in the edit distance. To be more precise, recall that d1(F, G) is defined for two
graphs on the same node set, and it denotes their normalized edit distance. So for
a graph G on n nodes, d1(G, P) is the minimum number of edges to be changed in
order to get a graph with property P, divided by n². If there is no graph in P on
n nodes, then we define d1(G, P) = 1. Let Pεc denote the set of simple graphs F
such that d1(F, P) ≥ ε; then we want to distinguish P from Pεc by sampling.
This notion of testability is usually called oblivious testing, which refers to the
fact that no information about the size of G is assumed.
Using our analytic language, we can give several reformulations of the definition
of testability of a graph property, which are often more convenient to use.
(T1) if (Gn) is a sequence of graphs such that v(Gn) → ∞ and δ□(Gn, P) → 0,
then d1(Gn, P) → 0;
(T2) for every ε > 0 there is an ε′ > 0 such that if G and G′ are simple
graphs such that v(G), v(G′) ≥ 1/ε′, G ∈ P and δ□(G, G′) < ε′, then
d1(G′, P) < ε;
(T3) the closures of P and Pεc (in the graphon space) are disjoint for every ε > 0.
The equivalence of these with testability follows by Theorem 15.8.
From this characterization of testability it follows that if P is a testable property
such that infinitely many graphs have property P, then for every n that is large
enough, it contains a graph on n nodes. Indeed, suppose that for infinitely many
n, P contains a graph Gn on n nodes but none on n + 1 nodes. We may assume
that Gn → W ∈ W0. Then GnK1 ∈ P1/2c (no graph on n + 1 nodes has property P,
so d1(GnK1, P) = 1), and GnK1 → W, so W lies in the closure of both P and P1/2c,
contradicting (T3).
It is surprising that this rather restrictive definition allows many testable graph
properties: for example, bipartiteness, triangle-freeness, every property definable
by a first order formula (Alon, Fischer, Krivelevich and Szegedy [2000]). Let us
begin with some simple examples.
Example 15.19 (Nonempty). Let P be the graph property that “G has at least
one edge”. This is testable with the identically true test property.
This example sounds like playing in a trivial way with the definition. The
following examples are more substantial.
Example 15.20 (Large clique). Let P be the graph property ω(G) ≥ v(G)/2.
Then P is testable. This can be verified using (T2): we show that for every ε > 0
there is an ε′ > 0 such that δ□(G, P) ≤ ε′ implies that d1(G, P) ≤ ε. We show that
ε′ = exp(−10000/ε⁶) does the job.
Indeed, if δ□(G, P) ≤ ε′, then there is a graph H ∈ P such that δ□(G, H) ≤ 2ε′.
Let V(H) = [q] and V(G) = [p]; then δ□(G(q), H(p)) ≤ 2ε′, and since v(G(q)) =
v(H(p)) = pq, Theorem 9.29 implies that δ̂(G(q), H(p)) ≤ 45/√(−log(2ε′)) < ε³/2.
15.3. PROPERTY TESTING 273
This means that G(q) and H(p) can be overlaid so that d□(G(q), H(p)) ≤
ε³/2. Since H ∈ P, it contains a complete graph of size at least v(H)/2, and so
H(p) contains a complete graph K of size at least pq/2. Now the definition of
the cut distance implies that G(q)[V(K)] is almost complete: it misses at most
ε³(pq)²/2 edges. Let qi denote the number of nodes in K that come from node i of
G, so that 0 ≤ qi ≤ q and Σi qi = |V(K)| ≥ pq/2. For simplicity of presentation, assume that
p = 2r is even and ε ≤ 1/2. Also assume that q1 ≥ q2 ≥ · · · ≥ qp.
Let k denote the number of indices i ≤ r with qi ≤ εq. We claim that k ≤ 2rε.
We may assume that k > 0; then qr+1, . . . , qp ≤ εq, and so (r − k)q + (r + k)εq ≥
Σi qi ≥ rq. This implies the bound on k. Let G1 denote the complement of
G[{1, . . . , r}]; then
(1/2)ε³p²q² ≥ Σ_{ij∈E(G1)} qi qj ≥ (e(G1) − kr)(εq)²,
whence
e(G1) ≤ kr + (ε/2)p² ≤ (ε/2)p² + (ε/2)p² = εp².
So adding at most εp² edges to G, we can create a complete subgraph with p/2
nodes, showing that d1(G, P) ≤ ε.
Example 15.21 (Triangle-free). Let P be the property of being triangle-free.
Then P is a valid test property for itself. It is trivial that if G is triangle-free, then
any sample G(k, G) is also triangle-free. The other condition is, however, far from
being trivial. If G(k, G) is triangle-free with probability at least 2/3, and k is large
enough, then G has very few triangles. Hence by the Removal Lemma 11.64, we
get that we can change (in this case, delete) a small number of edges so that we
get rid of all triangles. In fact, the Removal Lemma is equivalent to the testability
of triangle-freeness.
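A minimal sketch of such a sampling tester, in Python (the adjacency representation, the trial count, and the 2/3 acceptance threshold are illustrative choices, not the book's notation):

```python
import itertools
import random

def has_triangle(adj, nodes):
    """Does the induced subgraph on `nodes` contain a triangle?"""
    for u, v, w in itertools.combinations(nodes, 3):
        if v in adj[u] and w in adj[u] and w in adj[v]:
            return True
    return False

def triangle_free_tester(adj, k, trials, rng):
    """One-sided tester: accept iff the random sample G(k, G) is
    triangle-free in at least 2/3 of the trials."""
    n = len(adj)
    accepted = sum(1 for _ in range(trials)
                   if not has_triangle(adj, rng.sample(range(n), k)))
    return accepted >= 2 * trials / 3
```

A triangle-free graph is always accepted, while a graph with many triangles (such as a complete graph) is rejected with high probability; it is exactly the Removal Lemma that converts this sampling behavior into a distance guarantee.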
Theorem 15.24 below will give a general sufficient condition for testability,
which will imply the Removal Lemma.
There is a tight connection between testability of graph properties and the
testability of their closures. To formulate it, we need a further definition. A graph
property P is robust if for every ε > 0 there is an ε0 > 0 such that if G is a
graph with v(G) ≥ 1/ε0 and d1(WG, P̄) ≤ ε0, then d1(G, P) ≤ ε. Another way
of stating this is that if (Gn) is a sequence of graphs such that d1(WGn, P̄) → 0,
then d1(Gn, P) → 0. (For a more combinatorial formulation of this property, see
Exercise 15.30.)
Theorem 15.22. (a) A graphon property is testable if and only if it is the closure
of a testable graph property.
(b) A graph property is testable if and only if it is robust and its closure is
testable.
It is not the first time in this book that the results about graphons are nice and
easy, while describing the connection between the notions for graphons and the
corresponding notions for graphs is the hard part. This is true in this case too, and
we will omit some details of the rather long proof of this theorem; see Lovász and
Szegedy [2010a] for these details.
Before going into the proof, let us look at an example.
The proof of the converse in (a) (which we will not use in this book) is omitted.
Next we show that every testable graph property P is robust. Let (Gn) be a
sequence of graphs such that d1(WGn, P̄) → 0; then δ□(WGn, P̄) → 0. This implies
that there are graphons Un ∈ P̄ such that δ□(WGn, Un) → 0. By the definition of P̄,
there are then simple graphs Hn ∈ P such that δ□(Gn, Hn) → 0. This
means that δ□(Gn, P) → 0. Since P is testable, this implies that d1(Gn, P) → 0.
Finally, we show that if a graph property P is robust and its closure is testable,
then P is testable. Let (Gn) be a sequence of graphs such that δ□(Gn, P) → 0.
Then δ□(WGn, P̄) → 0. Since P̄ is testable, this implies that d1(WGn, P̄) → 0. By
robustness, we get that d1(Gn, P) → 0, which proves that P is testable.
have this property.) Clearly this theorem implies Theorem 15.24. The proof will
be quite involved, using much of the material developed earlier in this chapter.
Theorem 15.25. A graph property P is testable if and only if for every ε > 0
there is an ε′ > 0 such that if H ∈ P and G is an induced subgraph of H with
v(G) ≥ 1/ε′ and δ□(G, H) < ε′, then d1(G, P) < ε.
Another way to state the condition: if Hn ∈ P (n = 1, 2, . . .) is a
sequence of simple graphs, Gn is an induced subgraph of Hn, and δ□(Gn, Hn) → 0,
then d1(Gn, P) → 0. Informally, induced subgraphs inherit the property of the
big guy, but they have to pay an inheritance tax; the tax is however small if the
descendants are also big and they are close to the big guy.
Proof. The “only if” part is trivial, since the condition is a special case of the
reformulation (T2) of testability. By Theorem 15.22, it suffices to prove that P̄ is
testable and P is robust.
We start with proving a graphon version of the condition in the theorem.
Claim 15.26. Let U ∈ P̄ and let (Gn) be a sequence of simple graphs with Gn → U.
Also assume that tind(Gn, U) > 0. Then d1(Gn, P) → 0.
Since U ∈ P̄, there is a sequence of simple graphs Hm ∈ P (m = 1, 2, . . .)
such that Hm → U. The condition tind(Gn, U) > 0 implies that tind(Gn, Hm) > 0 for
every n if m is large enough. Furthermore, both Gn → U and Hm → U, and hence
δ□(Gn, Hm) → 0 as n, m → ∞. So if n is large enough, we can select an m(n) such
that Gn is an induced subgraph of Hm(n) and δ□(Gn, Hm(n)) → 0. The condition
in the theorem then implies the claim.
To prove that P̄ is testable, we use Lemma 15.14. So let us consider a graphon
U ∈ P̄ and a sequence of graphons Wn → U where every Wn is a flexing of U. We
want to prove that d1(Wn, P̄) → 0. For each n, we choose a simple graph Gn such
that
(15.7) v(Gn) ≥ n,
(15.8) tind(Gn, Wn) > 0,
(15.9) δ□(Gn, U) ≤ δ□(Wn, U) + 1/n,
(15.10) d1(Gn, P) ≥ d1(Wn, P̄) − 1/n.
This is not difficult: Gn = Gnk = G(k, Wn) will satisfy these conditions with high
probability if k is sufficiently large. Indeed, (15.7) and (15.8) are essentially trivial,
and (15.9) follows by Lemma 10.16. To verify (15.10), we select a graph Hnk ∈ P
with V(Hnk) = [k] such that
d1(Gnk, P) = d1(Gnk, Hnk) ≥ δ1(WGnk, WHnk).
Let k → ∞; then WGnk → Wn in the δ□-distance with probability 1, and (by
selecting an appropriate subsequence) WHnk → Un ∈ P̄. By Lemma 14.16, we get
that
lim inf_{k→∞} δ1(WGnk, WHnk) ≥ δ1(Wn, Un) ≥ δ1(Wn, P̄).
Claim 15.26 implies that d1 (Gn , P) → 0. Indeed, condition (15.8) implies that
tind (Gn , U ) > 0 (here we use that Wn is a flexing of U ). Furthermore, Gn → U by
(15.9), so Claim 15.26 applies.
From here, the testability of P̄ follows easily:
d1(Wn, P̄) ≤ δ1(Wn, P̄) ≤ δ1(Gn, P) + 1/n → 0.
Our second task is to prove that P is robust: if (Gn) is a sequence of simple
graphs such that d1(WGn, P̄) → 0, then d1(Gn, P) → 0. Let Wn ∈ P̄ be such that
∥WGn − Wn∥1 → 0. By selecting an appropriate subsequence, we may assume that
δ□(Wn, U) → 0 for some graphon U. Clearly U ∈ P̄ and Gn → U.
Consider the random graph G′n = G(v(Gn), U). We have tind(G′n, U) > 0
with probability 1, and by Lemma 10.18, with probability tending to 1, G′n → U.
Furthermore, an easy computation (cf. Exercise 10.14) gives that E(d1(Gn, G′n)) =
E(∥WGn − WG′n∥1) = ∥WGn − Wn∥1 → 0, and so with high probability we have
d1(Gn, G′n) → 0. By Claim 15.26, this implies that d1(G′n, P) → 0. Hence
d1(Gn, P) ≤ d1(Gn, G′n) + d1(G′n, P) → 0, which proves that P is robust.
Other characterizations of testable graph properties are known. Alon, Fischer,
Newman and Shapira [2006] characterized testable graph properties in terms of
Szemerédi partitions (we refer to their paper for the formulation). Fischer and
Newman [2005] connected testability to estimability. We already stated a version
of this result for graphons (Theorem 15.16), from which it can be derived (we don’t
go into the details):
Theorem 15.27. A graph property is testable if and only if the normalized edit
distance from the property is an estimable parameter.
Exercise 15.28. Prove that the graph property ω(G) ≥ v(G)/2 satisfies the
condition given in Theorem 15.25.
Exercise 15.29. Prove the following analogue of Proposition 15.11 for finite
graphs: If P is a testable graph property, then
P′ = {F : v(F) = 1 or δ□(F, P) ≤ 20/√(log v(F))}
is a valid test property for P.
Exercise 15.30. Prove that graph property P is robust if and only if for every
ε > 0 there is an ε0 > 0 such that if G is a graph with v(G) ≥ 1/ε0 and G has
infinitely many near-blowups G′ with d1 (G′ , P) ≤ ε0 , then d1 (G, P) ≤ ε.
and let
(15.11) dsim(s, t) = (1/n) Σ_{w∈V(G)} a(s, t; w).
We can think of a(s, t; w) as a measure of how different s and t are from the point of
view of w; then dsim (s, t) is an average measure of this difference. Of course, w could
be more myopic and not look for neighbors of s and t among its own neighbors,
but look only for s and t; then dsim (s, t) would measure the size of the symmetric
difference of the neighborhoods of s and t. As explained in the Introduction, this
is a perfectly reasonable definition, but it would not measure what we want. The
node w could also look for second or third neighbors of s and t, but this would not
give anything more useful than this definition, at least for dense graphs.
There are many ways to rephrase this definition. We can pick three random
nodes w, v, u ∈ V(G), and define
(15.12) dsim(s, t) = E_w |E_v(a_sv a_vw) − E_u(a_tu a_uw)|,
where (a_ij) is the adjacency matrix of G. We could use v = u here, I just used
different variables to make the correspondence with the definition clearer. We can
also notice that dsim(s, t) is the L1 distance of rows s and t of the square of the
adjacency matrix, normalized by n². Finally, the similarity distance is quite closely
related to the distance rWG discussed in Section 13.4: dsim(s, t) = rWG(x, y), where
x and y are arbitrary points of the intervals representing s and t in WG .
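Since dsim(s, t) is the L1 distance of rows s and t of A² normalized by n², it can be computed exactly for a small graph; the following is an illustrative sketch (A is a 0–1 adjacency matrix given as a list of lists):

```python
def dsim_exact(A, s, t):
    """Similarity distance: L1 distance of rows s and t of A^2,
    normalized by n^2.  (A^2)_{sw} counts walks of length 2 from s to w."""
    n = len(A)
    row_s = [sum(A[s][v] * A[v][w] for v in range(n)) for w in range(n)]
    row_t = [sum(A[t][v] * A[v][w] for v in range(n)) for w in range(n)]
    return sum(abs(x - y) for x, y in zip(row_s, row_t)) / n**2
```

On the 4-cycle, for instance, opposite nodes have identical neighborhoods and similarity distance 0, while adjacent nodes are at distance 1/2.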
There is an easy algorithm to compute (approximately) the similarity distance
of two nodes.
Algorithm 15.31.
Input: A graph G given by a sampling oracle, two nodes s, t ∈ V , and an error
bound ε > 0.
Output: A number D(s, t) ≥ 0 such that with probability at least 1 − ε,
D(s, t) − ε ≤ dsim (s, t) ≤ D(s, t) + ε.
The algorithm is based on (15.12). Select a random node w and fix it temporarily.
Select O(1/ε²) random nodes v and compute the average of a_sv a_vw, to
get a number that is within an additive error of ε/4 of E_v(a_sv a_vw). Estimate
E_v(a_tv a_vw) similarly. This gives an estimate of |E_v(a_vw(a_sv − a_tv))| with error at
most ε/2 with high probability. Repeat this O(1/ε²) times and take the average to
get D(s, t).
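A sketch of this estimator (the sample sizes of roughly 16/ε² and the use of one shared sample of nodes v for both s and t are our illustrative choices, not prescribed by the text):

```python
import random

def dsim_estimate(A, s, t, eps, rng):
    """Estimate dsim(s, t) following (15.12): for each random node w,
    estimate |E_v(a_sv a_vw) - E_v(a_tv a_vw)| from random nodes v,
    then average over the choices of w."""
    n = len(A)
    outer = inner = max(1, int(16 / eps**2))
    total = 0.0
    for _ in range(outer):
        w = rng.randrange(n)
        vs = [rng.randrange(n) for _ in range(inner)]
        est_s = sum(A[s][v] * A[v][w] for v in vs) / inner
        est_t = sum(A[t][v] * A[v][w] for v in vs) / inner
        total += abs(est_s - est_t)
    return total / outer
```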
Next, we specialize Theorem 13.31 to graphs.
Theorem 15.32. Let G = (V, E) be a graph.
(a) If P = {S1, . . . , Sk} is a partition of V(G) such that d□(G, GP) = ε, then
we can select a node vi ∈ Si from each partition class such that the average dsim-distance
of nodes from S = {v1, . . . , vk} is at most 4ε.
(b) If S = {v1, . . . , vk} ⊆ V is a subset such that the average dsim-distance of
nodes from S is ε, then the Voronoi cells of S form a partition P such that
d□(G, GP) ≤ 8√ε.
We define a representative set with error ε > 0 as a subset R ⊆ V (G) such that
any two elements of R are at a (similarity) distance at least ε/2, and the average
distance of nodes from R is at most 2ε. (The first condition is not crucial for the
applications we want to give, but it guarantees that the set is chosen economically.)
Theorem 13.31 implies that such a set R exists with |R| ≤ 2^{32/ε²}. Furthermore,
such a set can be constructed in our model.
Algorithm 15.33.
Input: A graph G given by a sampling oracle, and an error bound ε.
Output: A random set R ⊆ V(G) such that |R| ≤ (64/ε²)^{1028/ε²}, and with
probability at least 1 − ε, R is a representative set with error ε.
The set R is grown step by step, starting with the empty set. At each step,
a new uniform random node w of G is generated, and the approximate distances
D(w, v) are computed for all v ∈ R with error less than ε/4 with high probability.
If all of these are larger than 3ε/4, then w is added to R. Else, w is discarded and
a new random node is generated. If R is not increased in k = ⌈(2000/ε) log(1/ε)⌉ steps,
the algorithm halts.
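The growth loop can be sketched as follows; here `dist` is a distance oracle standing in for the approximate similarity distance D(u, v), and `patience` plays the role of the cutoff k (both are illustrative simplifications of the algorithm above):

```python
import random

def grow_representative_set(n, dist, eps, rng, patience):
    """Greedy sketch: sample a uniform random node; add it to R if its
    distance to every current member exceeds 3*eps/4; stop after
    `patience` consecutive failures to grow R."""
    R = []
    fails = 0
    while fails < patience:
        w = rng.randrange(n)
        if all(dist(w, v) > 3 * eps / 4 for v in R):
            R.append(w)
            fails = 0
        else:
            fails += 1
    return R
```

On the 4-cycle with the exact similarity distance, the loop ends up with one representative from each of the two twin classes.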
We have to make sure that we don’t make the mistake of stopping too early.
It is clear that as long as the average distance from R is larger than 2ε, then the
probability that a sample has distance at least ε is at least ε, and so the probability
that in k iterations we don’t pick a node whose distance from R is less than ε is less
than e−kε . If we find a good node u, then with high probability the approximate
distance satisfies D(u, R) > 3ε/4, and so we add u to R. Hence the probability
that we stop prematurely is less than e−kε E(|R|) ≤ ε.
The size of R can be bounded using Proposition 13.32, which gives the bound
on the output size. We can say more. Suppose that there exists a representative
set R with error ε. Then only a fraction of 2√ε of the nodes of G (call these nodes
“remote”) are at a distance more than √ε from R. Let us run the above algorithm
with ε replaced by 2√ε, to get a representative set R′ with error 2√ε. The set R′
will contain at most one non-remote node from every Voronoi cell of R. We have
little control over how many remote nodes we selected, but we can post-process the
result. The √ε-balls around the non-remote nodes in R′ cover all the non-remote
nodes, and so leave out only a fraction of 2√ε of all nodes. By sampling and brute
force, we can select the smallest subset R′′ ⊆ R′ with this property. This way we
have constructed a representative set R′′ with |R′′| ≤ |R| and error at most 3√ε.
As a special case, if there is a representative set whose size is polynomially
bounded in the error ε, our algorithm will find one with a somewhat worse polynomial
bound.
Remark 15.34. One could try to work with a stronger notion: define a strong
representative set with distance ε > 0 as a subset R ⊆ V (G) such that any two
elements of R are at a (similarity) distance at least ε, and any other node of G is
at a distance at most ε from R.
It is trivial that every graph contains a strong representative set: just take a
maximal set of nodes any two of which are at least ε apart. Furthermore, Proposition
13.32 shows that the size of such a set can be bounded by a function of
ε. There are, however, several problems with the idea of computing and using it.
First, in our very large graph model, the similarity distance cannot be computed
exactly; second (and more importantly) the graph can have a tiny remote part
which no sampling will discover but a representative of which should be included
in the strong representative set.
15.4. COMPUTABLE STRUCTURES 279
for every pair of nodes u, v ∈ V(G), we compute whether they should be adjacent,
knowing only the induced subgraph G[A ∪ {u, v}]. We could simply decide their
adjacency in G, but then we would not do any repair. Taking the subgraph
G[A] and its connections to u and v also into account, our algorithm will define
a modified graph G′. This graph G′ should have property P, and its edit distance
from G should be arbitrarily small if N and k are large enough.
It may or may not be possible to do so. We say that P is locally reparable if
it is always possible. Austin and Tao prove, among other results, that every hereditary
property is locally reparable. For the exact definitions, formulations, proofs, generalizations
to hypergraphs and other results we refer to their paper.
2. There is a natural nondeterministic version of testability, introduced by
Lovász and Vesztergombi [2012]. A property of finite graphs is called nondeterministically
testable if it has a “certificate” in the form of a coloring of the nodes
and edges with a bounded number of colors, adding new edges with other colors,
and orienting the edges, such that once the certificate is specified, its correctness
can be verified by random local testing.
can be verified by random local testing. Here are a few examples of properties that
are nondeterministically testable in a natural way: “the graph is 3-colorable;” “the
graph contains a clique on half of its nodes;” “the graph is transitively orientable”;
“one can add at most v(G)²/100 new edges to make the graph perfect.”
Using the theory of graph limits, it is proved that every nondeterministically
testable property is deterministically testable. In a way, this means that P = NP in
the world of property testing for dense graphs. (Many, but not all, of the properties
described above are also covered by Theorem 15.24.)
We will see that for bounded-degree graphs, the analogous statement does not
hold. In fact, the study of nondeterministic certificates will lead to a new interesting
notion of convergence (Section 19.2).
CHAPTER 16
Extremal Theory of Dense Graphs
Extremal graph theory was one of the motivating fields for graph limit theory, as
described in the Introduction. It is also one of the most fertile fields of applications
of graph limits. In this chapter we give an exposition of some of the main directions.
We start with two sections developing some technical tools, reflection positivity
and variational calculus. Then we discuss extremal problems for complete graphs
and some other specific problems. We re-prove some classical general results in
extremal graph theory, and finally, we treat some very general questions (formulated
in the introduction) about decidability of extremal graph problems and the possible
structure of extremal graphs.
Σ_{i,j=1}^{m} ai aj t([[Fi Fj]], W) ≥ 0.
282 16. EXTREMAL THEORY OF DENSE GRAPHS
This is equivalent to
(16.1) [[(Σ_{i=1}^{m} ai Fi)²]] ≥ 0.
So we get the fact, used implicitly in the Introduction (Section 2.1.3), that unlabel-
ing the square of a k-labeled quantum graph, we get a nonnegative quantum graph.
Let us also recall the trivial fact that adding or deleting isolated nodes to a graph
F does not change the homomorphism densities t(F, .). We call a quantum graph
g a square-sum if there are k-labeled quantum graphs y1, . . . , ym for some k such
that g can be obtained from Σ_i yi² by unlabeling and adding or deleting isolated
nodes. We have just shown that every square-sum satisfies g ≥ 0.
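As a small numeric illustration (our own example, not from the text): take Fi to be the 1-labeled star with i edges. Gluing Fi and Fj at the labeled node gives a star with i + j edges, whose density in a graph G is a normalized degree moment, so the quadratic form Σ ai aj t([[Fi Fj]], G) must be nonnegative for every real vector a, as in the displayed inequality:

```python
from fractions import Fraction

def star_density_matrix(degrees, n, m):
    """M[i][j] = t([[Fi Fj]], G) for Fi = the 1-labeled star with i edges
    (i = 0, ..., m-1): the density of the star with L edges in G is the
    normalized degree moment sum_v deg(v)^L / n^(L+1)."""
    moments = [Fraction(sum(d**L for d in degrees), n**(L + 1))
               for L in range(2 * m - 1)]
    return [[moments[i + j] for j in range(m)] for i in range(m)]

def quadratic_form(M, a):
    """sum_{i,j} a_i a_j M[i][j] -- nonnegative by reflection positivity."""
    return sum(ai * aj * M[i][j]
               for i, ai in enumerate(a) for j, aj in enumerate(a))
```

The resulting matrix is a moment (Hankel) matrix, which makes the positive semidefiniteness transparent in this special case.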
Another important property of semidefinite matrices is that their determinant
is nonnegative: for any set {F1, . . . , Fm} of graphs and any graphon W, we have
(16.2) det( t([[Fi Fj]], W) )_{i,j=1}^{m} ≥ 0.
This is still a rather complicated inequality, but one special case will be useful:
(16.3) [[F1 F1]] [[F2 F2]] ≥ [[F1 F2]]².
Another consequence of reflection positivity is the following: let (aij)_{i,j=1}^{m} be a
symmetric positive semidefinite matrix; then
(16.4) Σ_{i,j=1}^{m} aij [[Fi Fj]] ≥ 0.
We can add two relations related to subgraphs. First, adding an isolated node
to a graph F does not change its density in any graph or graphon, and so
F K1 − F ≥ 0, but also F − F K1 ≥ 0.
Furthermore, if F ′ is a subgraph of F , then t(F ′ , W ) ≥ t(F, W ) for every graphon
W , and hence
(16.5) F ′ − F ≥ 0.
Exercise 16.1. Show that inequalities (16.2), (16.4) and (16.5) can be derived
from inequality (16.1).
Exercise 16.2. Prove the following “supermodularity” inequality: if F1 and F2
are two simple graphs on the same node set, then F1 ∪ F2 + F1 ∩ F2 ≥ F1 + F2 .
16.2. VARIATIONAL CALCULUS OF GRAPHONS 283
Example 16.3. Clearly Cn‡ = nPn••, where Pn•• denotes the path on n nodes with
its endpoints labeled. So txy(Cn‡, W) = n txy(Pn••, W) = nW^{◦(n−1)}.
The Edge Reconstruction Conjecture 5.31 says in this language that if F and
G are simple graphs that are large enough (both have at least four non-isolated
nodes), then [[F‡]] = [[G‡]] implies that F ≅ G.
We study two kinds of variations of a kernel W: in the more general version,
we change the values of W at every point. However, it is often easier to construct
variations in which the measure on [0, 1] is rescaled. This simpler kind of variation
has the advantage that if we start with a graphon, then we don't have to worry
about the values of W running out of the interval [0, 1]. We start with describing
the variation of the measure.
Consider a family αs : [0, 1] → R+ (s ∈ [0, 1]) of weight functions such that
∫₀¹ αs(x) dx = 1 for every s. Every such function defines a probability measure µs
on [0, 1] by
µs(A) = ∫_A αs(x) dx.
We say that the family (αs) has uniformly bounded derivative if for every x ∈ [0, 1]
the derivative α̇s(x) = (d/ds) αs(x) exists, and there is a constant M > 0 such that
|α̇s(x)| ≤ M for all x and s. If αs is a family of weight functions with uniformly
bounded derivative, then by elementary analysis it follows that the function
Proof. (a) Let φ : [0, 1] → [−1, 1] be a measurable function such that ∫ φ = 0.
For s ∈ [−1, 1], we re-weight the points of [0, 1] by αs(x) = 1 + sφ(x) to get the
graphon Ws. Using (16.6), we get
(d/ds)|_{s=0} Φ(t(F1, Ws), . . . , t(Fm, Ws)) = Σ_{i=1}^{m} ai ∫₀¹ φ(x) tx(Fi†, W) dx
= ∫₀¹ φ(x) Σ_{i=1}^{m} ai tx(Fi†, W) dx.
(d/ds)|_{s=0} Φ(t(F1, W + sU), . . . , t(Fm, W + sU))
= ∫_{[0,1]²} U(x, y) Σ_{i=1}^{m} ai′ txy(Fi‡, W) dx dy.
This must hold for all functions U ∈ W1 such that U (x, y) ≥ 0 if W (x, y) = 0 and
U (x, y) ≤ 0 if W (x, y) = 1, which implies (b).
Proof. The “only if” direction is trivial. To prove the “if” direction, suppose
that t(g, Kn ) ≥ 0 for all n; we want to prove that t(g, W ) ≥ 0 for every W ≥ 0.
It suffices to prove this for a dense set of graphons W, and we choose the set of
graphons WH, where H is a node-weighted simple graph (all edgeweights are 0 or 1).
Let V(H) = [q], and let α1, . . . , αq ≥ 0 be the nodeweights (we allow 0 nodeweights
for this argument). We may assume that Σ_i αi = 1. Supposing that there is an
H with t(g, H) < 0, choose one with a minimum number of nodes, and choose the
nodeweights so as to minimize t(g, H). Then all the nodeweights must be positive,
since a node with weight 0 could be deleted without changing any subgraph density,
contradicting the minimality of q. Clearly t(g, H) is a polynomial in the nodeweights
αi . Furthermore, the assumptions that the constituents of g are complete and H
has no loops imply that every homomorphism contributing to t(g, H) is injective,
and so t(g, H) is multilinear.
Next we prove that H must be complete. Indeed, if (say) nodes 1, 2 ∈ V (H) are
nonadjacent, then t(g, H) has no term containing the product α1 α2 , i.e., fixing the
remaining variables, t(g, H) is a linear function of α1 and α2 . Since only the sum
of α1 and α2 is fixed, we can shift them keeping the sum fixed and not increasing
the value of t(g, H) until one of them becomes 0. This is a contradiction, since we
know that all weights must be positive.
To show that all weights are equal, let us push the argument above a bit further.
Fixing all variables but α1 and α2 , we can write t(g, H) = a + b1 α1 + b2 α2 + cα1 α2 .
Since H is complete, we know that t(g, H) is a symmetric multilinear polynomial
in α1 , . . . , αq , and so b1 = b2 . Since α1 + α2 is fixed, we get t(g, H) = a′ + cα1 α2 ,
where a′ does not depend on α1 or α2 . If c ≥ 0, then this is minimized when α1 = 0
or α2 = 0, which is a contradiction as above. Hence we must have c < 0, and in
this case t(g, H) is minimized when α1 = α2 . Since this holds for any two variables,
all the αi are equal.
16.3. DENSITIES OF COMPLETE GRAPHS 287
But this means that t(g, H) = t(g, Kq ), which is impossible since t(g, Kq ) ≥ 0
by hypothesis. This completes the proof.
As a corollary (which is in fact equivalent to the theorem) we get the following.
Fix an integer m ≥ 1, and associate with every graphon W the vector tW =
(t(K2, W), . . . , t(Km, W)). Let Tm denote the set of the vectors tW. It follows
from Theorem 11.21 and Corollary 11.15 that Tm is the closure of the set of points tG,
where G is a simple graph (we write tG for tWG).
Corollary 16.9. The extreme points of the convex hull of Tm are the vectors tKn
(n = 1, 2, . . . ) and (1, . . . , 1).
The following corollary is interesting to state in view of the undecidability result
of Hatami and Norine [2011] already mentioned in the Introduction (which will be
proved as Theorem 16.34 a little later). It is easy to design an algorithm to check
whether (16.9) holds for every n, and hence:
Corollary 16.10. For quantum graphs g with rational coefficients whose con-
stituents are complete graphs, the property g ≥ 0 is algorithmically decidable.
As a further corollary, we derive Turán's Theorem for graphons.
Corollary 16.11. For every r ≥ 2, we have
max{t(K2, W) : W ∈ W0, t(Kr, W) = 0} = 1 − 1/(r − 1),
and the unique optimizer is W = WKr−1.
One could prove this result along the lines of several well-known proofs of
Turán’s Theorem; we could also prove a generalization of Goodman’s inequality
2.2. Specializing the proof of Theorem 16.8 above just to this case, we get the
proof by “symmetrization”, due to Zykov [1949].
Proof. Let us prove the inequality
(16.10) r^r t(Kr, W) − (r − 1) t(K2, W) + r − 2 ≥ 0.
By Theorem 16.8, it suffices to verify this inequality when W = WKn for some
n ≥ 1. This is straightforward, and we also see that equality holds for n = r − 1
only. Corollary 16.9 implies that equality holds in (16.10) only if W = WKr−1. In
the special case with t(Kr, W) = 0, we get Corollary 16.11.
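Inequality (16.10) is easy to spot-check on the graphons WKn with exact rational arithmetic, using t(Kr, Kn) = (n)r/n^r (a verification sketch, our own code):

```python
from fractions import Fraction
from math import prod

def t_complete(r, n):
    """Homomorphism density of K_r in K_n: (n)_r / n^r."""
    return Fraction(prod(n - i for i in range(r)), n**r)

def turan_lhs(r, n):
    """Left-hand side of (16.10) evaluated at W = W_{K_n}."""
    return r**r * t_complete(r, n) - (r - 1) * t_complete(2, n) + (r - 2)
```

For each r the value is nonnegative and vanishes exactly at n = r − 1, in line with the uniqueness of the optimizer WKr−1.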
16.3.2. Edges vs. triangles. In the Introduction (Section 2.1.1) we mentioned
several results about the number of triangles in a graph whose number of
edges is given: Goodman's bound and its improvements, and the Kruskal–Katona
Theorem. In this section we describe the exact relationship between the edge density
and the triangle density in a graph, i.e., we describe the set D2,3; for convenience,
we recall how it looks (Figure 16.1; also recall that the figure is distorted so that
some features are easier to see).
As a special case of Corollary 16.9, we get the result of Bollobás [1976] mentioned
in the Introduction:
Corollary 16.12. The set D2,3 is contained in the convex hull of the point (1, 1)
and the points
tn = (t(K2, Kn), t(K3, Kn)) = ((n − 1)/n, (n − 1)(n − 2)/n²) (n = 1, 2, . . .).
Figure 16.1.
However, a quick look at Figure 2.1 shows that Corollary 16.12 does not tell the
whole story: between any two special points (including the endpoints of the upper
boundary), the domain D2,3 is bounded by a curve that appears to be concave. It
turns out that these curves are indeed concave, which can be proved by the same
kind of argument as used in the proof of Theorem 16.8 (see Exercise 16.18). The
formula for these curves (cubic equations) is more difficult to obtain, and this will
be our main concern in the rest of this section.
The Kruskal–Katona bound. Let us start with the curve bounding the
domain D_{2,3} from above, which is not hard to determine: its equation is y = x^{3/2}.
As we mentioned in the introduction, this follows from (a very special case of) the
Kruskal–Katona Theorem in extremal hypergraph theory. Here we give a short
direct proof using the formalism of graph algebras. Applying (16.3) with F_1 = P_3^{••}
and F_2 = P_2^{••}, we get

    t(K_3, W)^2 = [[P_3^{••} P_2^{••}]]^2 ≤ [[(P_3^{••})^2]] [[(P_2^{••})^2]] = t(C_4, W) t(K_2^{(2)}, W) ≤ t(K_2, W)^3

(the last step uses the trivial monotonicity (16.5)). This shows that t(K_3, W) ≤
t(K_2, W)^{3/2} for every graphon W, which is what we wanted to prove.
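The inequality t(K_3, W) ≤ t(K_2, W)^{3/2} can be sanity-checked on random stepfunctions, i.e. symmetric matrices with entries in [0, 1]; a minimal sketch:

```python
import random

def edge_triangle(W):
    """Edge and triangle homomorphism densities of a symmetric matrix kernel."""
    n = len(W)
    t2 = sum(W[i][j] for i in range(n) for j in range(n)) / n**2
    t3 = sum(W[i][j] * W[j][k] * W[i][k]
             for i in range(n) for j in range(n) for k in range(n)) / n**3
    return t2, t3

random.seed(0)
for _ in range(20):
    n = 8
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            W[i][j] = W[j][i] = random.random()
    t2, t3 = edge_triangle(W)
    assert t3 <= t2 ** 1.5 + 1e-9   # the Kruskal-Katona bound
```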
We also want to prove that this upper bound on the triangle density is sharp.
For n ≥ k ≥ 1, let G consist of a complete graph on k nodes and n − k isolated nodes.
Then t(K_2, G) = (k)_2/n^2 and t(K_3, G) = (k)_3/n^3. Clearly, points of the form
( (k)_2/n^2 , (k)_3/n^3 ) get arbitrarily close to any point on the curve y = x^{3/2}.
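That these points fill out the curve is also easy to see numerically: with k ≈ cn, the point ( (k)_2/n^2 , (k)_3/n^3 ) is within O(1/n) of the curve y = x^{3/2}. A sketch:

```python
from math import perm  # perm(k, r) = (k)_r, the falling factorial

# as n grows with k/n roughly fixed, the points approach the curve y = x^(3/2)
for c in (0.3, 0.6, 0.9):
    for n in (100, 1000):
        k = int(c * n)
        x, y = perm(k, 2) / n**2, perm(k, 3) / n**3
        assert abs(y - x ** 1.5) < 5 / n   # the gap shrinks like O(1/n)
```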
Razborov’s Theorem. To determine the lower bounding curve of D2,3 is
much harder (Razborov [2008]); even the result is somewhat lengthy to state. Per-
haps the best way to remember it is to describe a family of extremal graphons
(which are all node-weighted complete graphs).
Theorem 16.13. For all 0 ≤ d ≤ 1, the minimum of t(K_3, W) subject to W ∈ W_0
and t(K_2, W) = d is attained by the stepfunction W = W_H, where H is a weighted
complete graph on k = ⌈1/(1 − d)⌉ nodes with edgeweights 1 and appropriate nodeweights:
k − 1 of the nodeweights are equal and the last one is at most as large as these.
One indication of the difficulty of the proof is that the extremal graphon is not
unique, except for the special values d = 1 − 1/k. Let us consider the interval I
representing the smallest weighted node and the interval J representing any other
node. Restricted to I ∪ J, the graphon is bipartite and hence triangle-free. If we
replace the function WH on (I ∪J)×(I ∪J) by any other triangle-free function with
the same integral, then neither the edge density nor the triangle density changes,
but we get a different extremal graphon.
The nodeweights can be determined by simple computation. With a con-
venient parametrization suggested by Nikiforov [2011], they can be written as
(1 + u)/k, . . . , (1 + u)/k, (1 − (k − 1)u)/k. The edge density in the extremal graph
is

(16.11)    t(K_2, H) = d = ((k − 1)/k)(1 − u^2) = ((k − 1)/k)(1 + u)(1 − u),

and the triangle density is

(16.12)    t(K_3, H) = ((k − 1)(k − 2)/k^2)(1 − 3u^2 − 2u^3) = ((k − 1)(k − 2)/k^2)(1 + u)^2(1 − 2u).
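Formulas (16.11) and (16.12) are straightforward to confirm by direct summation over distinct nodes of the weighted complete graph; a quick sketch (helper name ad hoc):

```python
from itertools import permutations
from math import prod

def t_complete(r, alpha):
    """Density of K_r in a node-weighted complete graph with edgeweights 1.
    Only injective maps contribute, since the graph has no loops."""
    return sum(prod(alpha[i] for i in idx)
               for idx in permutations(range(len(alpha)), r))

k, u = 5, 0.1   # any k >= 3 and 0 < u < 1/(k-1)
alpha = [(1 + u) / k] * (k - 1) + [(1 - (k - 1) * u) / k]
d = t_complete(2, alpha)
t3 = t_complete(3, alpha)
assert abs(d - (k - 1) / k * (1 - u**2)) < 1e-12                             # (16.11)
assert abs(t3 - (k - 1) * (k - 2) / k**2 * (1 + u)**2 * (1 - 2 * u)) < 1e-12  # (16.12)
```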
This gives a parametric equation for the cubic curve bordering the domain in Figure
2.1 in the interval [(k − 2)/(k − 1), (k − 1)/k]. We can solve (16.11) for u as a function of d, and then
substitute this in (16.12) to get an explicit expression for t(K_3, H) as a function of
d (this hairy formula is not the way to understand or remember the result; but we
need it in the proof):

(16.13)    t(K_3, H) = f(d) = ((k − 1)(k − 2)/k^2)(1 + u)^2(1 − 2u)
                     = ((k − 1)(k − 2)/k^2) (1 + √(1 − kd/(k − 1)))^2 (1 − 2√(1 − kd/(k − 1)))

(where k = ⌈1/(1 − d)⌉). This function is rather complicated, and I would not even
bother to write it out, except that we need its explicit form in the proof below.
Perhaps the following form says more:
(16.14)    ( 1 − 3kd/(2(k − 1)) + k^2 f(d)/(2(k − 1)(k − 2)) )^2 = ( 1 − kd/(k − 1) )^3.
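Both the inversion of (16.11) used in (16.13) and the identity (16.14) can be verified numerically from the parametric formulas; a small sketch:

```python
from math import sqrt

for k in (3, 5, 8):
    for u in (0.05, 0.2, 0.4):
        d = (k - 1) / k * (1 - u**2)                             # (16.11)
        f = (k - 1) * (k - 2) / k**2 * (1 + u)**2 * (1 - 2 * u)  # (16.12)
        assert abs(u - sqrt(1 - k * d / (k - 1))) < 1e-12        # inverting (16.11)
        lhs = (1 - 3*k*d / (2*(k - 1)) + k**2 * f / (2*(k - 1)*(k - 2)))**2
        assert abs(lhs - (1 - k * d / (k - 1))**3) < 1e-12       # (16.14)
```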
This shows that after an appropriate affine transformation, every concave piece
of the boundary of the region D2,3 looks alike (including the curve bounding the
region from above). Perhaps this is trying to tell us something—I don’t know.
These considerations allow us to reformulate Theorem 16.13 in a more direct
form:
Theorem 16.14. If G is a graph with t(K2 , G) = d, then t(K3 , G) ≥ f (d).
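Theorem 16.14 is easy to test on small graphs by brute force; the following sketch implements f via (16.13), with f(d) = 0 for d ≤ 1/2 (the case k = 2), and checks random graphs on 7 nodes (helper names are ad hoc):

```python
import random
from math import sqrt, ceil

def f_lower(d):
    """Razborov's lower bound f(d) from (16.13); zero for d <= 1/2."""
    if d <= 0.5:
        return 0.0
    k = ceil(1 / (1 - d))
    s = sqrt(max(0.0, 1 - k * d / (k - 1)))
    return (k - 1) * (k - 2) / k**2 * (1 + s)**2 * (1 - 2 * s)

def edge_triangle(A):
    n = len(A)
    t2 = sum(A[i][j] for i in range(n) for j in range(n)) / n**2
    t3 = sum(A[i][j] * A[j][k] * A[i][k]
             for i in range(n) for j in range(n) for k in range(n)) / n**3
    return t2, t3

random.seed(1)
for _ in range(50):
    n = 7
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            A[i][j] = A[j][i] = random.randint(0, 1)
    t2, t3 = edge_triangle(A)
    assert t3 >= f_lower(t2) - 1e-9
```

The complete graph attains the bound: for K_7 one gets t(K_2) = 6/7 and t(K_3) = 30/49 = f(6/7).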
The original proof of this theorem uses Razborov’s flag algebra technique, which
is basically equivalent to the methods developed in this book. Since then, the result
has been extended by Nikiforov [2011] to the number of K4 ’s and by Reiher [2012] to
all complete graphs. We describe the proof of Razborov’s Theorem in our language.
(Reiher’s proof can be viewed as a generalization of this argument to all complete
graphs; the generalization is highly nontrivial.)
Proof. Let W ∈ W_0 minimize t(K_3, W) − f(t(K_2, W)) subject to (k − 2)/(k − 1) ≤
t(K_2, W) ≤ (k − 1)/k, and suppose (by way of contradiction) that the minimum value
is negative. (The representation in terms of the parameter u will be useful if you want to follow
some of the computations below.) Since the objective function is 0 at the endpoints
of the interval, we must have (k − 2)/(k − 1) < d < (k − 1)/k, and so W is a local minimizer in W_0.
This is possible, since the left side, as a function of w_0, ranges from 0 to λ^2/(3µ) on
this interval, and the target value (k − 3)/(k − 2) is in this range by (16.26). Then

    g(w_0) = f( ((k − 3)/(k − 2)) w_0 ) = ((k − 3)(k − 4)/(k − 2)^2) w_0^3.
The linear function we will use goes through the point (w_0, g(w_0)) and has appropriate slope:
Claim 16.15. Let λ and µ be real numbers satisfying (16.25) and (16.26), and let
w_0 satisfy (16.28). Then for every w ∈ [µ/(2λ), µ/λ], we have

    g(w) − g(w_0) ≥ (1/3)(2λ^2 − 3µ)(w − w_0).
All functions in this claim are explicit, which makes it a (hard and tedious)
exercise in first year calculus; we do not reproduce the details of its proof. Using
this claim, we have
    t(K_4, W) ≥ ∫_0^1 g(w(z)) dz ≥ g(w_0) − (1/3)(2λ^2 − 3µ) w_0 + (1/3)(2λ^2 − 3µ) ∫_0^1 w(z) dz
              = ((k − 3)(k − 4)/(k − 2)^2) w_0^3 + (1/3)(2λ^2 − 3µ)(d − w_0).
Hence, returning to (16.22),

(16.29)    (λ + 3d − 2) t(K_3, W) ≥ λ(2d^2 − d) + ((k − 3)(k − 4)/(k − 2)^2) w_0^3 + (1/3)(2λ^2 − 3µ)(d − w_0).
This is another messy formula, but we can express the variables in terms of u and
y = w_0/(1 + u) (the latter is, of course, chosen with hindsight): we already have
expressions for λ and d; we have w_0 = y(1 + u), and then µ can be expressed using
(16.28). With these substitutions, the difference of the two sides looks like

(16.30)    (1 + u)^2 ( y − (k − 2)/k )
               × ( ((k − 1)/k)(1 − u) − ( ((k − 3)/(k − 2)) u + ((3k − 7)/(k − 2)) y ) + (2(k − 1)(k − 3)/(k(k − 2)^2)) (1 + u) y^2 ).

We know that 0 < u < 1/(k − 1), and (16.28) and (16.26) imply that
(k − 2)/k ≤ y ≤ (k − 2)^2/(k(k − 3)).
Then we face another exercise in calculus, to show that in this range (16.30) is
negative (we don't describe the details). This contradicts (16.29), and completes
the proof.
It would of course be important to find a more "conceptual" proof of Theorem
16.13. As a couple of examples of the kind of general questions that arise: Is an
algebraic inequality between densities of complete graphs decidable? Does such an
inequality hold true whenever it holds for all node-weighted complete graphs?
Exercise 16.16. Let g be a quantum graph such that every constituent with
negative coefficient is complete. Prove that tinj (g, W ) ≥ 0 for every graphon W if
and only if tinj (g, Kn ) ≥ 0 for all n ≥ 1 (Schelp and Thomason [1998]).
Exercise 16.17. Prove that for quantum graphs g with rational coefficients whose
constituents are complete graphs, the property g ≥ 0 is in P . (The input length is
the total number of digits in the numerators and denominators of the coefficients,
and complete k-graphs (0 ≤ k ≤ m) contribute 1 even if their coefficient is 0.)
16.4. THE CLASSICAL THEORY OF EXTREMAL GRAPHS 293
Exercise 16.18. Adapt the proof of Theorem 16.8 to prove that the boundary
of the domain D_{2,3} (Fig. 2.1) is concave between any two special points t_n and
t_{n+1}.
Exercise 16.19. (a) Let K_r′ denote the graph obtained by deleting an edge from
K_r. Prove that

    t(K_{r+1}′, G) ≥ t(K_r, G)^2 / t(K_{r−1}, G).

(b) Prove that

    t(K_{r+1}, G) − t(K_r, G) ≤ r ( t(K_{r+1}′, G) − t(K_{r+1}, G) ).

(c) Prove that

    r t(K_r, G)/t(K_{r−1}, G) ≤ (r − 1) t(K_{r+1}, G)/t(K_r, G) + 1.

(d) Prove the following result of Moon and Moser [1962]: If N_r denotes the number
of complete r-graphs in G, then

    N_{r+1}/N_r ≥ (1/(r^2 − 1)) ( r^2 N_r/N_{r−1} − N_1 ).
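Part (d) can be checked directly by counting complete subgraphs; a brute-force sketch (helper names ad hoc):

```python
import random
from itertools import combinations

def clique_counts(n, edges, rmax):
    """N[r] = number of complete r-node subgraphs; N[0] = 1 by convention."""
    E = {frozenset(e) for e in edges}
    N = [1] + [0] * rmax
    for r in range(1, rmax + 1):
        N[r] = sum(
            all(frozenset(p) in E for p in combinations(S, 2))
            for S in combinations(range(n), r)
        )
    return N

random.seed(2)
for _ in range(20):
    n = 8
    edges = [(i, j) for i in range(n) for j in range(i + 1, n)
             if random.random() < 0.7]
    N = clique_counts(n, edges, 4)
    assert N[1] == n
    for r in (2, 3):
        if N[r] and N[r - 1]:
            lhs = N[r + 1] / N[r]
            rhs = (r * r * N[r] / N[r - 1] - N[1]) / (r * r - 1)
            assert lhs >= rhs - 1e-9
```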
Exercise 16.20. Prove the following generalization of Goodman's Theorem (2.2):
if d = t(K_2, G) is the edge density of a graph G, then

    t(K_r, G) ≥ d(2d − 1)(3d − 2) · · · ((r − 1)d − (r − 2)).
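A brute-force check of the bound is easy. Note that the product is a meaningful lower bound when every factor is nonnegative, i.e. d ≥ (r − 2)/(r − 1) (for r = 3 the inequality holds literally for all d); the sketch below reads the bound as vacuous otherwise:

```python
import random
from itertools import product as maps

def t_clique(r, A):
    n = len(A)
    return sum(
        all(A[phi[a]][phi[b]] for a in range(r) for b in range(a + 1, r))
        for phi in maps(range(n), repeat=r)
    ) / n**r

random.seed(3)
for _ in range(10):
    n = 6
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            A[i][j] = A[j][i] = int(random.random() < 0.85)
    d = t_clique(2, A)
    for r in (3, 4):
        factors = [i * d - (i - 1) for i in range(1, r)]
        bound = 0.0
        if all(x >= 0 for x in factors):   # only use the bound when nonvacuous
            bound = 1.0
            for x in factors:
                bound *= x
        assert t_clique(r, A) >= bound - 1e-9
```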
Exercise 16.21. Prove that

    t(K_4′, G) ≥ t(K_3, G)^2 log*( 1/t(K_3, G) ).

[T. Tao; hint: use the Removal Lemma.]
Theorem 16.23. Let L_1, . . . , L_k be simple graphs and let r = min_i χ(L_i). Then

    max{ t(K_2, W) : W ∈ W_0, t(L_1, W) = · · · = t(L_k, W) = 0 } = 1 − 1/(r − 1),

and the unique optimizer (up to weak isomorphism) is W = W_{K_{r−1}}.
Proof. Let, say, χ(L_1) = r. Then t(L_1, K_r) > 0, and hence it follows easily
that t(K_r, W) = 0 (Exercises 7.6, 13.25). An application of Turán's Theorem for
graphons (Corollary 16.11) completes the proof.
It is not quite trivial, but not very hard either, to derive the classical results
mentioned above from Theorem 16.23. Let us illustrate this by the derivation
of stability. If stability fails, then there exists a sequence Gn of simple graphs
such that tinj (L1 , Gn ) = · · · = tinj (Lk , Gn ) = 0 and t(K2 , Gn ) → 1 − 1/(r − 1), but
δb1 (Gn , T (n, r−1)) ̸→ 0. By Theorem 9.30, this implies that δ1 (Gn , T (n, r−1)) ̸→ 0.
In graphon language, this means that δ_1(W_{G_n}, W_{K_{r−1}}) ̸→ 0. By choosing a subse-
quence and then a subsequence of that, we may assume that δ_1(W_{G_n}, W_{K_{r−1}}) > a
for some a > 0 for all n, and Gn → U for some graphon U . Then t(L1 , U ) =
· · · = t(Lk , U ) = 0 and t(K2 , U ) = 1 − 1/(r − 1), so by Theorem 16.23, U is weakly
isomorphic to WKr−1 . So Gn → WKr−1 . Since WKr−1 is 0-1 valued, it follows by
Proposition 8.24 that δ1 (WGn , WKr−1 ) → 0, a contradiction.
Proof. This proof illustrates the power of extending graph problems to a con-
tinuum. By Proposition 14.26, the set W0 \ R is convex. Hence it follows that the
d1 distance from R is a concave function on W0 \ R. We also know by Proposition
15.15 and Theorem 15.16(b) that d_1(·, R) is a continuous function on (W̃_0, δ_□), and
hence it assumes its maximum. Let M be the set of maximizing graphons in W0 ;
this is a convex, closed subset of W0 . Since W0 \ R is invariant under the group
of invertible measure preserving transformations of [0, 1], so is M , and hence M is
also compact in the pseudometric δ_□.
One can derive in many ways that M contains a constant function; here is a fast
argument. Let W ∈ M have minimum L2 -norm (such a graphon exists, since by
Lemma 14.15 the L2 -norm is lower semicontinuous with respect to the cut norm).
For every measure preserving transformation φ, we have (W + W φ )/2 ∈ M . Fur-
thermore,
    ‖ (W + W^φ)/2 ‖_2 ≤ (1/2)( ‖W‖_2 + ‖W^φ‖_2 ) = ‖W‖_2.
By the choice of W , we must have equality here, which implies that W and W φ
are proportional, but since they have the same integral, they must be equal almost
everywhere. Since this holds for any φ, W must be constant almost everywhere.
16.5.2. The Sidorenko Conjecture. Sidorenko [1991, 1993] conjectured
that the inequality

(16.33)    t(F, W) ≥ t(K_2, W)^{e(F)}

holds for all bipartite graphs F and all W ∈ W, W ≥ 0. Several special cases of
this inequality were mentioned in the Introduction, Section 2.1.2. Sidorenko in fact
formulated this not only for graphs but for graphons, being perhaps the first to
use the integral expression for t(., .) as a generalization of subgraph counting. (The
conjecture extends to non-symmetric functions W , but we restrict our attention to
the symmetric case here.) A closely related conjecture in extremal graph theory
was raised earlier by Simonovits [1984]. In spite of its very simple form and a lot
of effort, this conjecture is unproven in general.
It is easy to see that every graph satisfying Sidorenko’s Conjecture must be
bipartite. Indeed, if W = WK2 , then the right side of (16.33) is positive, but the
left side is positive only if F is bipartite.
We can view this as an extremal problem in two ways: (1) for every nonnegative
W ∈ W, matchings minimize t(F, W ) among all bipartite graphs with a given
number of edges; (2) for every bipartite graph F , constant functions W minimize
t(F, W ) among all nonnegative kernels W with a given integral. Since both sides
of (16.33) are homogeneous in W of the same degree, we can scale W and assume
that t(K2 , W ) = 1. Then we want to conclude that t(F, W ) ≥ 1 for every bipartite
graph F .
There are partial results in the direction of the conjecture. Sidorenko proved
it for a fairly large class of graphs, including trees, complete bipartite graphs, and
all bipartite graphs with at most 4 nodes in one of the color classes. After a long
period of little progress, several new (but unfortunately still partial) results were
obtained recently. Each of these is in one way or other related to the material in
this book, so we discuss them in some detail.
We have defined weakly norming graphs in Section 14.1. Hatami [2010] gives a
proof of the following (easy) fact, attributing it to B. Szegedy: If a bipartite graph
F is weakly norming, then it satisfies the Sidorenko conjecture. Combined with the
result of Hatami (Proposition 14.2) that all cubes are weakly norming, it follows
that all cubes satisfy Sidorenko’s conjecture.
In another direction, Conlon, Fox and Sudakov [2010] proved that the con-
jecture is satisfied by every bipartite graph that contains a node connected to all
nodes on the other side. Their proof uses a sophisticated probabilistic argument.
Li and Szegedy [2012] give a shorter analytic proof, which extends to a larger class
of graphs. Szegedy [unpublished] uses entropy arguments to prove the conjecture
for an even larger class, which includes all previously settled special cases. The
smallest graph for which the conjecture is not known is the Möbius ladder of length
5 (equivalently, a 10-cycle with the longest diagonals added).
Proof. Let us start with the “if” part. Replacing W by 1+(W −1)/∥W −1∥∞ ,
we may assume that 0 ≤ W ≤ 2. Let U = W −1, then U ∈ W1 . The homomorphism
density t(F, W ) = t(F, 1 + U ) can be expanded in terms of the subgraphs of F , and
so what we want to prove is
    Σ_{F′⊆F} t(F′, εU) ≥ 1
for a sufficiently small ε > 0. (Let us agree that, for the rest of this section, F ′ ⊆ F
means that F ′ is a subgraph of F without isolated nodes.) The term with F ′ = ∅
is 1, and so (pulling out the ε factors) we want to prove that

(16.34)    Σ_{∅≠F′⊆F} t(F′, U) ε^{e(F′)} ≥ 0.
It follows from the definition of U that t(K2 , U ) = 0, and so every term in (16.34)
where F ′ is a matching cancels. If F itself is a matching, we have nothing to
prove. Otherwise, the next smallest term is t(P3 , U ), which is nonnegative, since
P3 = [[(K2• )2 ]] is a square. If t(P3 , U ) > 0, then for every sufficiently small ε > 0 it
dominates the sum (16.34), and we are done. So suppose that t(P3 , U ) = 0, then
t_x(K_2^•, U) = 0 for almost all x. This implies that t(F′, U) = 0 whenever F′ has a
node of degree 1. In particular, if F is a forest, we are done.
Suppose that F is not a forest; then the nonzero term in (16.34) with the smallest
number of edges is t(C_{2r}, U), where C_{2r} is the shortest cycle in F. Since
t(C_{2r}, U)^{1/(2r)} is a norm by Proposition 14.2, this term is nonzero if U ≠ 0, and so
for a sufficiently small ε > 0, it dominates the remaining terms.
To prove the "only if" part, suppose that the girth of F is odd. Let U be
the kernel defined by the matrix

    ( −1    1 )
    (  1   −1 ).

Then t_x(K_2^•, U) = 0 for every x, and
hence all those terms in (16.34) are 0 in which F′ has a node with degree 1. So
the nonzero terms with the smallest exponent of ε correspond to the shortest (odd)
cycles. Trivially t(F′, U) = −1 for such a term, and so for a sufficiently small ε
the whole expression (16.34) will be negative.
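The computation with this ±1 kernel is easy to replay numerically: as a 2 × 2 step kernel with two parts of measure 1/2, its rooted edge densities vanish, odd cycles have density −1, and even cycles density +1. A sketch:

```python
from itertools import product

def cycle_density(U, g):
    """t(C_g, U) for a 2x2 step kernel with two parts of measure 1/2."""
    total = 0.0
    for phi in product(range(2), repeat=g):
        p = 1.0
        for a in range(g):
            p *= U[phi[a]][phi[(a + 1) % g]]
        total += p
    return total / 2**g

U = [[-1, 1], [1, -1]]
assert all(sum(row) == 0 for row in U)            # t_x(K2., U) = 0 for every x
for g in (3, 5, 7):
    assert abs(cycle_density(U, g) + 1) < 1e-12   # odd cycles: density -1
for g in (4, 6):
    assert abs(cycle_density(U, g) - 1) < 1e-12   # even cycles: density +1
```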
The proof of Theorem 16.26 expands the idea of the proof above, but one has
to do much more careful estimations. Many steps in the proof can be viewed as
using a version of the calculus “for W0 ” developed before, but this time “for W1 ”.
Some steps are described in Exercises 16.29-16.31 to illustrate this potentially useful
technique.
16.5.4. Common graphs. The following inequality is closely related to
Goodman's Theorem (2.2), and it can be proved along the same lines:

(16.35)    t(K_3, G) + t(K_3, G̅) ≥ 1/4,

and equality holds asymptotically if G is a random graph with edge density 1/2.
Erdős conjectured that a similar inequality will hold for K4 in place of K3 , but this
was disproved by Thomason [1998]. More generally, one can ask which graphs F
satisfy

    t_inj(F, G) + t_inj(F, G̅) ≥ (1 + o(1)) 2^{1−e(F)},

where the o(1) refers to v(G) → ∞. Going to the limit, we get a formulation free
of remainder terms: Which simple graphs F satisfy

(16.36)    t(F, W) + t(F, 1 − W) ≥ 2^{1−e(F)} = 2 t(F, 1/2)
for every graphon W ? Such graphs F are called common graphs. So the triangle is
common, but K4 is not. Are there any other common graphs?
Sidorenko [1996] studied graphs with this and other "convexity" properties.
Let F be a graph satisfying Sidorenko's conjecture. Then

    t(F, W) + t(F, 1 − W) ≥ t(K_2, W)^{e(F)} + t(K_2, 1 − W)^{e(F)}
                          ≥ 2 ( (t(K_2, W) + t(K_2, 1 − W))/2 )^{e(F)} = 2^{1−e(F)},
so F is common. Sidorenko’s conjecture would imply that all bipartite graphs are
common, and all bipartite graphs mentioned above for which Sidorenko’s conjecture
is verified are common. Among non-bipartite graphs, not many common graphs are
known. Jagger, Šťovíček and Thomason [1996] showed that no graph containing
K4 is common.
Franek and Rödl [1992] showed that if we delete an edge from K4 , the obtained
graph K_4′ is common. Recently Hatami, Hladký, Král', Norine and Razborov [2011]
proved that the 5-wheel is common, using computers to find appropriate nonnega-
tive expressions in the flag algebra. We cannot reproduce their proof here; instead,
let us give the proof of the fact that K4′ is common, which should give a feeling for
this technique. We do our computations in the graph algebra, instead of the flag
algebra.
We start with rewriting (16.36) as follows. Let U = 2W − 1; then substituting
U in (16.36) and multiplying by 2^{e(F)}, we get
(16.37) t(F, 1 + U ) + t(F, 1 − U ) ≥ 2,
which should hold for every U ∈ W1 . In other words, F is common iff the left side
is minimized by U = 0.
The subgraph densities on the left side of (16.37) can be expanded as before,
and we get

    t(F, 1 + U) + t(F, 1 − U) = 2 Σ_{F′⊆F, e(F′) even} t(F′, U).
The term with e(F ′ ) = 0 gives 2, the value on the right side of (16.37), so F is
common if and only if
(16.38)    Σ_{F′⊆F, e(F′)>0 even} t(F′, U) ≥ 0
for every U ∈ W1 . Note that inequality (16.38) has to be true for all U ∈ W1 , not
just for U ∈ W0 , so the fact that all terms on the left have nonnegative coefficient
does not make this relation trivial.
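For F = K_3 the expansion can be written out completely: the even subgraphs are the empty graph and the three two-edge paths, so t(K_3, 1 + U) + t(K_3, 1 − U) = 2 + 6 t(P_3, U), and t(P_3, U) ≥ 0 gives commonness of the triangle (cf. Exercise 16.32). A numerical sketch with a random U ∈ W_1:

```python
import random
from itertools import product

def t(edges, k, W):
    n = len(W)
    total = 0.0
    for phi in product(range(n), repeat=k):
        p = 1.0
        for u, v in edges:
            p *= W[phi[u]][phi[v]]
        total += p
    return total / n**k

random.seed(4)
n = 5
U = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(i, n):
        U[i][j] = U[j][i] = random.uniform(-1, 1)
Wp = [[1 + x for x in row] for row in U]
Wm = [[1 - x for x in row] for row in U]
K3, P3 = [(0, 1), (1, 2), (0, 2)], [(0, 1), (1, 2)]
lhs = t(K3, 3, Wp) + t(K3, 3, Wm)
assert abs(lhs - (2 + 6 * t(P3, 3, U))) < 1e-9   # the even-subgraph expansion
assert lhs >= 2 - 1e-9                            # so K3 is common
```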
As an example, in the case F = K_4′ we get from (16.38) that its commonness
follows if we can show that, for all U ∈ W_1,

(16.39)    2 t(2K_2, U) + 8 t(P_3, U) + t(C_4, U) + 4 t(F_0, U) ≥ 0,

where F_0 denotes the triangle with a pendant edge; the four terms correspond to
the subgraphs of K_4′ with a positive even number of edges (two pairs of disjoint
edges, eight two-edge paths, the 4-cycle, and four copies of F_0). The left side can
be written as a sum of squares in the graph algebra together with a term of the
form 2([[F_1^2]] − [[F_1^2 F_2]]). It is easy to see (see Exercise 16.30) that the last
term is nonnegative. The other terms are squares, which proves (16.39).
Locally common graphs. We say that a graph F is locally common if for every
U ∈ W_1 there is a 0 < ε_U ≤ 1 such that t(F, 1 + εU) + t(F, 1 − εU) ≥ 2 whenever
0 < ε < ε_U.
Franek and Rödl [1992] proved that K4 is locally common. In fact, the follow-
ing more general result holds, and can be proved along the lines of the proof of
Proposition 16.27, using formula (16.38).
Proposition 16.28. Let G be a graph in which, among the subgraphs with all degrees
at least 2 and an even number of edges, one with the minimum number of edges is
an even cycle. Then G is locally common.
In particular, every bipartite graph is locally common (this follows by Propo-
sition 16.27 as well), and so is every simple graph containing a 4-cycle. Combining
with the theorem of Jagger, Šťovíček and Thomason [1996] mentioned above, it
follows that every graph that contains a K4 is not common but locally common.
Not all graphs are locally common (see Exercise 16.33).
Exercise 16.29. Prove that C2 ≥ C4 ≥ C6 ≥ . . . (for W1 ).
16.6. DECIDING INEQUALITIES BETWEEN SUBGRAPH DENSITIES 299
Exercise 16.30. Prove that [[F_1^2 F_2]] ≤ [[F_1^2]] (for W_1) for any two k-labeled
multigraphs F_1 and F_2.
Exercise 16.31. Suppose that a bipartite graph F contains a 4-cycle. Prove that
F ≤ C_4 (for W_1). More generally, if F is not a forest, then F ≤ C_{2r} (for W_1),
where C_{2r} is the shortest cycle in F.
Exercise 16.32. Prove that triangles are common (16.35).
Exercise 16.33. Prove that [[C_7^• C_{11}^•]] is not locally common.
in 3k variables:

    p**(u_1, v_1, w_1, . . . , u_k, v_k, w_k) = (u_1 · · · u_k)^N p*( v_1/u_1^2, w_1/u_1^3, . . . , v_k/u_k^2, w_k/u_k^3 )

(where N = 5 deg(p*) is large enough to cancel all denominators).
Claim 16.36. The following are equivalent: (i) p ≥ 0 on A^k; (ii) p* ≥ 0 on D_{2,3}^k;
(iii) p* ≥ 0 on D^k.
    ≥ − (M/2) Σ_{i=1}^{k} (1 − x_i)^2 |x_i − z_i|.
We show that each term is compensated for by the corresponding term in the other
part of p*, i.e.,

(16.41)    (1/2)(1 − x_i)^2 |x_i − z_i| ≤ y_i − 2x_i^2 + x_i.

Let us assume e.g. that x_i ≤ z_i (the other case is similar). Let w_i ∈ A be the
closest point to x_i with w_i < x_i.
By Corollary 16.12, y_i is above the chord between z_i and w_i of the parabola
2x^2 − x. On the other hand, 2x_i^2 − x_i is below the chord between z_i and (z_i + w_i)/2.
The slope of the first chord is 2z_i + 2w_i − 1; the slope of the second, 3z_i + w_i − 1.
The difference in slopes is z_i − w_i, and so y_i − 2x_i^2 + x_i ≥ (z_i − w_i)(z_i − x_i). Simple
computation shows that

    z_i − w_i = (1 − w_i)^2/(2 − w_i) ≥ (1/2)(1 − x_i)^2.
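The ingredients of this chord computation are elementary to confirm: the special points t_n lie on the parabola y = 2x^2 − x, the chords have the stated slopes, and the gap between consecutive special x-coordinates w = 1 − 1/n and z = 1 − 1/(n + 1) equals (1 − w)^2/(2 − w). A sketch:

```python
def p(x):
    """The parabola y = 2x^2 - x through the special points t_n."""
    return 2 * x * x - x

def chord_slope(a, b):
    return (p(b) - p(a)) / (b - a)

for n in (2, 5, 9):
    w, z = 1 - 1 / n, 1 - 1 / (n + 1)   # consecutive special x-coordinates
    assert abs(p(w) - (n - 1) * (n - 2) / n**2) < 1e-12      # t_n lies on the parabola
    assert abs(chord_slope(w, z) - (2*z + 2*w - 1)) < 1e-12  # first chord
    assert abs(chord_slope((z + w) / 2, z) - (3*z + w - 1)) < 1e-12  # second chord
    assert abs((z - w) - (1 - w)**2 / (2 - w)) < 1e-12
```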
This proves (16.41) and thereby also Claim 16.36.
As a preparation for the rest of the proof, we need to construct some special
graphs. We fix a simple graph F with node set [k] that has no automorphisms.
For any set of positive integers n1 , . . . , nk , let F (n1 , . . . , nk ) denote the (unlabeled)
graph obtained from F by replacing every node i by a set of ni twins. We will call
these ni nodes the clones of i.
As a further step, we add all missing edges to F_i^r with negative sign, to get the
k-labeled signed graph F̂_i^r (as we have seen, this can be considered as a k-labeled
quantum graph). Recall that F̂ is defined analogously.
Claim 16.37. Every homomorphism of F̂ into any graph G is an induced embedding.
Indeed, every homomorphism preserves both edges and non-edges by the def-
inition of signed graphs. Suppose that two nodes u, v ∈ V (F ) are mapped onto
the same node of G. Then every further node of F must be connected to u and
v in the same way, and so interchanging u and v is an automorphism of F , which
contradicts the choice of F .
Claim 16.38. For every homomorphism of F̂ into any of the special graphs
F(n_1, . . . , n_k), each node i ∈ V(F) is mapped onto a clone of i.
The proof is similar to the previous one. We already know that the map is
injective. For u ∈ V (F ), let σ(u) be defined as the node of F whose clones in
F (n1 , . . . , nk ) contain the image of u. No two nodes u, v ∈ V (F ) have σ(u) = σ(v):
similarly as before, interchanging two such nodes would be an automorphism of F .
Hence σ is an automorphism of F , and hence σ must be the identity. This proves
the Claim.
Our next observation is that homomorphism densities from the signed graphs
F̂_i^r into any simple graph G can be expressed quite simply. Let G be any simple
graph, and let φ : [k] → V (G), and let S = φ([k]) be its range. Let Uφ,i be the set
of nodes in V (G) \ S which are connected to φ(i) and all the neighbors of φ(i) in
S, but to no other node in S.
We claim that

(16.42)    hom_φ(F̂_i^r, G) = hom(K_r, G[U_{φ,i}]) if φ is an induced embedding of F, and 0 otherwise.
Assume first that φ is an induced embedding of F into G. It is clear that if ψ is any
homomorphism of F̂_i^r into G extending φ, then all the clones of i in F̂_i^r must be
mapped onto nodes in U_{φ,i}. Since these twins form a complete graph K_r, the num-
ber of ways to map these twins into G[U_{φ,i}] homomorphically is hom(K_r, G[U_{φ,i}]),
and every such map, together with φ, forms a homomorphism of F̂_i^r into G.
This proves Claim 16.39, and together with Claim 16.35, it completes the proof
of the theorem.
As mentioned in the introduction, an inequality g ≥ 0, where g is a quantum
graph, is decidable with an arbitrarily small error:
Proposition 16.40. There is an algorithm that, given a quantum graph g with
rational coefficients and an error bound ε > 0, decides either that g ̸≥ 0 or that
g + εK_1 ≥ 0 (if both inequalities are true, then it may return either answer).
Proof. This will follow from Theorem 16.41 in the next section, but let us
describe a simple direct proof suggested by Pikhurko. Let g = Σ_F a_F F be the
given quantum graph, let a = Σ_F |a_F| e(F), and let ε_1 = ε/a. By Corollary 9.25, there
is an integer k ≥ 1 such that all simple graphs with k nodes form an ε_1-net in
(W_0, δ_□). Let us check the inequality Σ_F a_F t(F, G) ≥ 0 for all simple graphs G
with at most k nodes. If we find a graph that violates it, we know that g ̸≥ 0. Else,
let G be any simple graph. By the definition of k, there is a simple graph G′ on k
nodes such that δ_□(G, G′) ≤ ε_1, and hence by the Counting Lemma 10.22, we have

    Σ_F a_F t(F, G) ≥ Σ_F a_F t(F, G′) − Σ_F |a_F| e(F) ε_1 ≥ Σ_F a_F t(F, G′) − ε ≥ −ε = −ε t(K_1, G),

so we can conclude that g + εK_1 ≥ 0.
16.6.2. Positivstellensatz for graphs. Is there a quantum graph g ≥ 0
which is not a square sum? Hatami and Norine [2011] constructed such a quantum
graph. In fact, the existence of such a quantum graph follows from Theorem 16.34,
stating that it is algorithmically undecidable whether a quantum graph with ra-
tional coefficients is nonnegative. To see this, consider two Turing machines, both
working on an input which is a quantum graph g with rational coefficients. We may
assume that the constituents of g have no isolated nodes. One of them will look for
a graph G with t(g, G) < 0; the other, for a representation of g as a square-sum.
If for every input one of them halts, then we know whether or not g ≥ 0. So there
must be an input g on which both Turing machines run forever; then we have g ≥ 0,
and g is not a square sum.
(To be precise, we must add that if g is a square-sum, then it is a square-sum
where the coefficients in the quantum graphs yi in the definition are algebraic real
numbers; then there are only a countable number of possibilities, and the second
Turing machine can check them in an appropriate order. One needs a method to
check, for given k-labeled quantum graphs y_1, . . . , y_k with algebraic coefficients,
whether g is obtained from Σ_i y_i^2 by unlabeling and deleting isolated nodes. Such
an algorithm follows from Tarski’s Theorem on the decidability of the first order
theory of real numbers.)
But not all is lost: the following weaker result was proved by Lovász and
Szegedy [2012a].
Theorem 16.41. Let f be a quantum graph. Then f ≥ 0 if and only if for every
ε > 0 there is a square-sum g such that ∥f − g∥1 < ε.
An analogous theorem for nonnegative polynomials was proved by Lasserre
[2007].
Proof. The “if” part is trivial. The idea of the proof of the “only if” part is
the following. Consider the (unlabeled) quantum graph g = Σ_F a_F F (where only a
finite number of the a_F are nonzero). We may assume that no graph F with a_F ≠ 0
contains an isolated node, since removing isolated nodes does not change t(g, W).
The condition g ≥ 0 means that h(g) ≥ 0 for every graph parameter h of the form
h = t(·, W) with W ∈ W_0. This constraint is linear, so we can equivalently require
the inequality for every graph parameter of the form h = E( t(·, W) ), where the
expectation is over some probability distribution on graphons (see Section 14.5).
By Proposition 14.60, this is equivalent to requiring that h is normalized, isolate-
indifferent and reflection positive. We can forget about the normalization, since the
condition Σ_F a_F h(F) ≥ 0 is homogeneous. So the question is: Does the inequality

(16.44)    Σ_F a_F h(F) ≥ 0

hold for every isolate-indifferent and reflection positive graph parameter h?
This problem can be rephrased in terms of the connection matrix X = M(h, N),
whose entries we consider as unknowns. These unknowns are not all different: if
(for this proof only) F′ denotes the graph obtained from F by removing its isolated
nodes, then we have X_{F_1,F_2} = X_{G_1,G_2} whenever [[F_1F_2]]′ ≅ [[G_1G_2]]′. The reflection
positivity conditions mean that X ⪰ 0. The question is: Do these constraints imply
the inequality

(16.45)    Σ_{F ∈ F^simp} a_F X_{F,K_0} ≥ 0?

We can rewrite this as

    g = Σ_F a_F F = Σ_{F_1,F_2} Y_{F_1,F_2} [[F_1F_2]]′

(where the summation extends over all partially labeled simple graphs F_1 and F_2).
Let us write Y = ZZ^T with some matrix Z; this takes care of the semidefiniteness
condition (remember, we are ignoring the problem that these matrices are infinite).
Then

    g = Σ_{F_1,F_2} Σ_m Z_{F_1,m} Z_{F_2,m} [[F_1F_2]]′ = Σ_m [[ (Σ_F Z_{F,m} F)^2 ]]′,

showing that g is a square-sum.
Now we have to make this argument precise. Let Fk′ denote the set of fully
labeled graphs on [k]. Let M denote the linear space of all symmetric matrices
indexed by partially labeled simple graphs, let P be the subset of M consisting of
positive semidefinite matrices, and let L denote the subspace of matrices satisfying
X_{F_1,F_2} = X_{G_1,G_2} whenever [[F_1F_2]]′ ≅ [[G_1G_2]]′. Clearly, P is a convex cone. Let
Φk denote the operator mapping a matrix in M to its restriction to Fk′ × Fk′ (this
is a finite matrix!). Then Mk = Φk M is the space of all symmetric Fk′ × Fk′
matrices, and Pk = Φk P is the positive semidefinite cone in Mk . It is also clear
that L_k = Φ_k L consists of those matrices X ∈ M_k for which X_{F_1,F_2} = X_{G_1,G_2}
whenever [[F_1F_2]]′ ≅ [[G_1G_2]]′. Clearly,
(16.47) Φk (P ∩ L) ⊆ Pk ∩ Lk ,
but equality may not hold in general.
Indeed, let A be a matrix that is contained in the right hand side. Then for every
m ≥ k we have a matrix Bm ∈ Pm ∩ Lm such that A is a restriction of Bm .
Now let m → ∞; by selecting a subsequence, we may assume that all entries of
Bm tend to a limit. This limit defines a graph parameter f , which is normalized,
isolate-indifferent and flatly reflection positive. By Proposition 14.60, f is reflection
positive, and so the matrix M (f ) is in P ∩ L and Φk M (f ) = A.
We may assume that |V(F)| = k whenever a_F ≠ 0. Let A ∈ M_k denote the matrix
with A_{F,G} = a_F if F = G, and A_{F,G} = 0 otherwise.
Then g ≥ 0 means that A · Z ≥ 0 for all Z ∈ Φ_k(P ∩ L) (where the inner product
A · Z of two matrices is defined as Σ_{i,j} A_{ij} Z_{ij}). In other words, A is in the dual cone
of Φk (P ∩ L). From (16.48) it follows that there are diagonal matrices Am ∈ Mk
such that Am → A and Am · Y ≥ 0 for all Y ∈ Φm,k (Pm ∩ Lm ). In other words,
Am · Φm,k Z ≥ 0 for all Z ∈ Pm ∩ Lm , which can also be written as Φ∗m,k Am · Z ≥ 0,
where Φ∗m,k : Mk → Mm is the adjoint of the linear map Φm,k : Mm → Mk .
(This adjoint acts by adding 0-s.) So Φ∗m,k Am is in the polar cone of Pm ∩ Lm ,
which is P_m^* + L_m^*. The positive semidefinite cone is self-polar. The linear space
L_m^* consists of those matrices B ∈ M_m for which Σ_{F_1,F_2} B_{F_1,F_2} = 0, where the
summation extends over all pairs F_1, F_2 ∈ F_m′ for which F_1F_2 ≃ F_0, for every
fixed graph F_0. Thus we have Φ*_{m,k} A_m = P + L, where P is positive semidefinite
and L ∈ L_m^*. Since P is positive semidefinite, we can write it as P = Σ_{k=1}^{N} v_k v_k^T,
where v_k ∈ R^{F_m′}. We can write this as
    Σ_{k=1}^{N} Σ_{F_1,F_2 : F_1F_2 ≃ F_0} v_{k,F_1} v_{k,F_2} = (A_m)_{F_0,F_0} if F_0 ∈ F_k′, and 0 otherwise.
In other words,

    Σ_{k=1}^{N} [[ (Σ_F v_{k,F} F)^2 ]] = Σ_{F_0} (A_m)_{F_0,F_0} F_0.
So the quantum graph on the right side is a sum of squares. Furthermore, if m → ∞,
then A_m → A and so

    Σ_{F_0} (A_m)_{F_0,F_0} F_0 → Σ_{F_0} A_{F_0,F_0} F_0 = g.
is equivalent to saying that the upper Möbius inverse f ↑ (see (4.1)) is nonnega-
tive, which is easy to check. Since f is isolate-indifferent, this proves that f is
nonnegative on every quantum graph in S_k. It follows that g ∉ S_k for any k.
But g can be approximated by members of S_k for large k. To construct such an approximation, let H_ij (1 ≤ i < j ≤ k) be the graph with node set [k] and a single edge connecting i to j. Expanding and unlabeling the quantum graph
    h_k² = ( (1/(k−1)) Σ_{2≤i≤k} H_{1i} − (1/(k−1 choose 2)) Σ_{2≤i<j≤k} H_{ij} )² ,

we get

    − [graph] + [graph] + (1/((k−1)(k−2))) ( k·[graph] − (k−6)·[graph] + (4k−10)·[graph] )

(each [graph] stands for a small graph whose drawing did not survive extraction). The last term tends to 0 as k → ∞, so g is arbitrarily well approximated by h_k².
Razborov and his collaborators found several concrete applications of the
method of semidefinite programming in extremal graph theory. Let us mention
one. Paul Erdős [1984] conjectured in 1984 that the number of pentagons in a
triangle-free graph with a given number of nodes is maximized by blow-ups of a
pentagon. In our language, if G is a triangle-free simple graph, then
(16.49)    t(C5, G) ≤ t(C5, C5) = 2/625.
In spite of its naturality and simple form, this conjecture remained unproven until recently, when Hatami, Hladký, Král’, Norine and Razborov [2011] and independently Grzesik [2012] found a proof, using flag algebras and computer-assisted solutions to semidefinite programs.
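The value 2/625 in (16.49) is easy to verify by brute force: hom(C5, C5) = 10 (five rotations times two orientations), so t(C5, C5) = 10/5^5 = 2/625. A small self-contained Python check (the function names are ours, not from the text):

```python
from fractions import Fraction
from itertools import product

def hom(F_edges, nF, G_edges, nG):
    """Count homomorphisms F -> G by brute force over all vertex maps."""
    Gset = set(G_edges) | {(v, u) for (u, v) in G_edges}
    count = 0
    for phi in product(range(nG), repeat=nF):
        if all((phi[u], phi[v]) in Gset for (u, v) in F_edges):
            count += 1
    return count

C5 = [(i, (i + 1) % 5) for i in range(5)]
h = hom(C5, 5, C5, 5)     # 10: five rotations times two orientations
t = Fraction(h, 5 ** 5)   # homomorphism density t(C5, C5)
assert h == 10 and t == Fraction(2, 625)
```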
Exercise 16.43. Prove that the problem whether Σ_{i=1}^{m} a_i hom(F_i, G) ≥ 0 holds for every simple graph G (for given simple graphs F_1, . . . , F_m and integer coefficients a_1, . . . , a_m) is undecidable. [Hint: Use the result of Exercise 5.50 and Matiyasevich’s Theorem.]
Exercise 16.44. Use the method of Example 16.42 to show that if a quantum
graph g can be written as a square sum (unlabeled and isolates removed), then it
can be approximated up to an arbitrarily small error by square-sums of quantum
graphs with fully labeled constituents.
In this discussion we stay with simple graphs, but we consider a much more
general type of graph theoretic extremal problem:
(16.50)    maximize   t(f, W)
           subject to t(g_1, W) = a_1,
                      . . .
                      t(g_k, W) = a_k,
where f, g1 , . . . , gk are given simple quantum graphs. Most of the graphon versions
of extremal problems discussed so far fit this scheme. Is there a special family of
graphons such that every extremal graph problem has a solution from this family?
We define an interesting class of graphons that are all needed, and we conjecture
that they are also sufficient.
Split this integral according to which xi and which yj is the largest. Restricting the
integral to, say, the domain where x1 and y1 are the largest, we have that whenever
W (x1 , y1 ) = 1 then also W (xi , yj ) = 1 for all i and j, and hence
    ∫_{x1∈[0,1]} ∫_{x2,...,xa≤x1} ∫_{y1∈[0,1]} ∫_{y2,...,yb≤y1} ∏_{i=1}^{a} ∏_{j=1}^{b} W(x_i, y_j) dy dx
        = ∫_{x1∈[0,1]} ∫_{y1∈[0,1]} W(x_1, y_1) x_1^{a−1} y_1^{b−1} dy_1 dx_1 = ∫_{(x,y)∈S_W} x^{a−1} y^{b−1} dy dx .
Since there are a choices for the largest xi and b choices for the largest yj , this
implies that
(16.53)    t(K_{a,b}, W) = ab ∫_{(x,y)∈S_W} x^{a−1} y^{b−1} dy dx .
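Formula (16.53) can be sanity-checked numerically. For the threshold graphon W(x, y) = 1[x + y ≤ 1] (our choice of example; here S_W is the triangle below the anti-diagonal), a short computation shows the right-hand side evaluates exactly to a! b!/(a + b)!, and a Monte Carlo estimate of t(K_{a,b}, W) agrees:

```python
import random
from math import factorial

random.seed(7)

# Threshold graphon W(x, y) = 1 if x + y <= 1, so W is 1 on a "down-set",
# as in the argument above, with S_W = {(x, y): x + y <= 1}.
def W(x, y):
    return 1.0 if x + y <= 1 else 0.0

a, b = 2, 2
# Left side: t(K_{a,b}, W), estimated by Monte Carlo over (x_1..x_a, y_1..y_b).
N = 200_000
hits = 0
for _ in range(N):
    xs = [random.random() for _ in range(a)]
    ys = [random.random() for _ in range(b)]
    if all(W(x, y) == 1.0 for x in xs for y in ys):
        hits += 1
left = hits / N

# Right side of (16.53): ab * integral of x^(a-1) y^(b-1) over S_W,
# which for this particular W evaluates exactly to a! b! / (a+b)!.
right = factorial(a) * factorial(b) / factorial(a + b)

assert abs(left - right) < 0.01
```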
16.7. WHICH GRAPHS ARE EXTREMAL? 311

    I(V) = Σ_{a,b=1}^{2 deg(p)+4} c_{ab} t(K_{a,b}, V) ≈ Σ_{a,b=1}^{2 deg(p)+4} c_{ab} t(K_{a,b}, U) = I(U) = 0,

where the error at the ≈ sign is bounded by Cε for some C that depends only on a, b and p.
On the other hand, the integrand in I(V) is nonnegative everywhere. Letting ε → 0, we see that x(1 − x)y(1 − y) p(x, y)² = 0 on ∂(S_W). Since p is monotone decreasing, this implies that ∂(S_W) = ∂(S_U), and hence W = U except perhaps on the boundary (which is of measure 0).
16.7.3. Not too many finitely forcible graphons. Are there any graphons
that are not finitely forcible? A natural extension of the class of stepfunctions is
the class of kernels with finite rank. However, we don’t get any new finitely forcible
graphons in this class. In fact, Theorem 14.48 implies:
Corollary 16.51. Every finitely forcible kernel with finite rank is a stepfunction.
In view of Proposition 16.49, the following further corollary may be surprising:
Corollary 16.52. Assume that W ∈ W0 can be expressed as a non-constant poly-
nomial in x and y. Then W is not finitely forcible.
We want to derive more general necessary conditions for being finitely forcible.
We start with a rather strong property of finitely forcible functions. Recalling the
definition of the 2-labeled graph F ‡ from Section 16.2, let L(W ) be the linear space
generated by all 2-variable functions txy (F ‡ , W ), where F ranges over all simple
graphs.
Lemma 16.53. Suppose that W ∈ W is forced (in W) by the simple graphs F_1, . . . , F_m. Then either the functions t_xy(F_1^‡, W), . . . , t_xy(F_m^‡, W) are linearly dependent, or they generate L(W) (or both).
Proof. Suppose not; then there is a simple graph F_{m+1} such that the functions t_xy(F_i^‡, W) (i = 1, . . . , m + 1) are linearly independent. For U ∈ W, set
note that from any constituent of Fi‡ we can recover Fi by connecting its labeled
nodes, so no cancellation will occur.
As an application of Corollary 16.55, we prove:
Theorem 16.56. The set of finitely forcible graphons is of first category in (W̃_0, δ_□).
Proof. For a fixed set {F1 , . . . , Fk } of simple 2-labeled graphs with no floating
components, let T (F1 , . . . , Fk ) denote the set of graphons W for which there is a
nonzero quantum graph of the form f = Σ_{i=1}^{k} a_i F_i satisfying t_xy(f, W) = 0 for
all x, y ∈ [0, 1]. Corollary 16.55 implies that every finitely forcible graphon belongs
to one of the sets T (F1 , . . . , Fk ), so it suffices to prove that these sets are nowhere
dense. We do so in two steps.
Claim 16.57. Let F1 , . . . , Fk be simple 2-labeled graphs with no floating compo-
nents, and let W ∈ W0 . Then every neighborhood of W contains a graphon W ′
such that txy (F1 , W ′ ), . . . , txy (Fk , W ′ ) are linearly independent.
It is easy to see that there is a simple 2-labeled graph G such that
[[GF1 ]], . . . , [[GFk ]] are mutually non-isomorphic (a large complete graph Kn•• , with
an edge incident with the node labeled 1 removed, suffices). Proposition 5.44 implies that there are graphons U_1, . . . , U_k such that the matrix ( t([[GF_i]], U_j) )_{i,j=1}^{k} is nonsingular. For 0 < ε < 1/k, define
W ε = (1 − kε)W ⊕ (ε)U1 ⊕ · · · ⊕ (ε)Uk
(so the components of W ε are W, U1 , . . . , Uk , scaled by 1 − kε, ε, . . . , ε).
First we show that W^ε → W in (W̃_0, δ_□) as ε → 0. Indeed, for every connected
simple graph F , we have
    t(F, W^ε) = (1 − kε)^{v(F)} t(F, W) + ε^{v(F)} ( t(F, U_1) + · · · + t(F, U_k) ),
and hence t(F, W ε ) → t(F, W ) as ε → 0.
Next, we show that txy (F1 , W ε ), . . . , txy (Fk , W ε ) are linearly independent for
all ε > 0. If not, then there are real numbers ai such that
    Σ_{i=1}^{k} a_i t_xy(F_i, W^ε) = 0
for all x, y ∈ [0, 1]. Suppose 1 − kε + (j − 1)ε ≤ x, y ≤ 1 − kε + jε, then every choice
of the variables for which one of the unlabeled nodes has value outside the interval
[1 − kε + (j − 1)ε, 1 − kε + jε] contributes 0 to txy (Fi , W ε ). Hence
    Σ_{i=1}^{k} a_i ε^{v(F_i)} t_xy(F_i, U_j) = 0    (j = 1, . . . , k)
for all x, y ∈ [0, 1]. Multiplying by txy (G, Uj ) and integrating, we get
Multiplying by t_xy(G, U_j) and integrating, we get

    Σ_{i=1}^{k} a_i ε^{v(F_i)} t([[GF_i]], U_j) = 0    (j = 1, . . . , k).
But this contradicts the nonsingularity of the matrix ( t([[GF_i]], U_j) ), and proves the claim.
Indeed, Claim 16.57 implies that every nonempty open set in (W̃_0, δ_□) contains a graphon U such that t_xy(F_1, U), . . . , t_xy(F_k, U) are linearly independent. Then their Gram determinant det( t([[F_i F_j]], U) )_{i,j=1}^{k} is positive. But this determinant is continuous in U, and so there is a neighborhood of U in which it does not vanish, and hence t_xy(F_1, U′), . . . , t_xy(F_k, U′) are linearly independent for every U′ in this neighborhood.
This proves the claim and thereby the theorem.
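The Gram-determinant step is the usual finite-dimensional fact: a family of vectors is linearly independent exactly when its Gram determinant is positive. A toy illustration in Python, with explicit vectors standing in for the functions t_xy(F_i, U):

```python
# Gram matrix test: vectors are linearly independent iff the determinant of
# their Gram matrix (pairwise inner products) is positive; it vanishes for
# dependent families.
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def det3(M):
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def gram_det(vecs):
    return det3([[dot(u, v) for v in vecs] for u in vecs])

independent = [(1, 0, 0, 2), (0, 1, 0, 0), (0, 0, 1, 1)]
dependent = [(1, 0, 0, 2), (0, 1, 0, 0), (1, 1, 0, 2)]  # third = first + second

assert gram_det(independent) > 0
assert gram_det(dependent) == 0   # exact, since the arithmetic is integral
```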
The following corollary shows that the (false) conjecture mentioned above that
only stepfunctions are finitely forcible is true in a weaker sense.
Corollary 16.60. Graphons that are both finitely forcible and infinitesimally
finitely forcible are exactly the stepfunctions.
This corollary implies that our examples of finitely forcible non-step-functions
(e.g., the simple threshold graphon) are finitely forcible but not infinitesimally
finitely forcible. We don’t know any examples for the converse.
Remark 16.62. 1. All the examples of finitely forcible graphons discussed above,
indeed all the examples we know of, have dimension at most 1 (in the sense of
the topology of the graphon discussed in Section 13.4). Most likely this is just due to the lack of more involved constructions; but it is not too far-fetched to ask: does every finitely forcible graphon have finite dimension? Together with
Proposition 13.34 this would imply that finitely forcible graphons have polynomial-
size weak regularity partitions. Together with Conjecture 16.45 and the properties
of finitely forcible graphons proved above, this would provide nontrivial “templates”
for extremal graphs, and possibly provide some help in finding the extremal graphs
for specific extremal graph problems by imposing limitations on them.
2. We have seen a number of “finiteness” conditions on a graphon W : (a) W is a
stepfunction; (b) W has finite rank; (c) W is finitely forcible; (d) W is infinitesimally
finitely forcible; (e) the graph parameter t(., W ) has finite connection rank, or
equivalently, the corresponding gluing algebras Qk /W have finite dimension; (f)
the spaces (J, rW ) and/or (J, rW ) are finite dimensional. We could add further
such conditions, like (g) the algebras Qk /W are finitely generated (this is true
not only for stepfunctions, but also for a simple threshold function, for example).
Several implications between these finiteness properties have been proved in this
book, but several others are only conjectured.
Exercise 16.63. Prove that the simple threshold graphon (Example 11.36) is
forced by the conditions (16.51) and t(P3 , W ) − t(K2 , W ) + 1/6 = 0.
Exercise 16.64. Show that for every kernel W there is a nonzero simple 2-
labeled quantum graph g with nonadjacent labeled nodes (which may have floating
components) such that txy (g, W ) = 0.
Exercise 16.65. Which implications among the finiteness conditions (a)-(g) in
Remark 16.62 are proved in this book? Which others are trivial/easy/possible?
CHAPTER 17
Multigraphs and Decorated Graphs
Limit objects can be defined for multigraphs, directed graphs, colored graphs,
hypergraphs etc. In many cases, like directed graphs without parallel edges, or
graphs with nodes colored with a fixed number of colors, this can be done along
the same lines as for simple graphs.
Turning to multigraphs, even the definition of homomorphisms is not unique,
as we have discussed in Chapter 5. In one version, a homomorphism F → G is a
map V (F ) → V (G) where the image of any edge has at least as large multiplicity
as the edge itself (node-homomorphism); in another version, to specify a homomor-
phism between multigraphs, we have to tell the image of every node as well as the
image of every edge (node-and-edge homomorphism). We also mentioned homo-
morphisms that preserve edge-multiplicities (induced homomorphisms). But this is
not the main complication. To illuminate the content of this chapter, let us discuss
informally convergence of multigraphs. We get to the most general question in sev-
eral steps. We want to define convergence of a multigraph sequence (G1 , G2 , . . . )
in terms of the convergence of the homomorphism densities t(F, Gn ) for every F ,
and want to construct a limit object that appropriately reflects the limiting values.
(1) In the previous chapters, this program was carried out in detail (maybe
even in more detail than you wished to see) in the case when the graphs Gn as well
as the graphs F were simple.
(2) Suppose that the graphs Gn are multigraphs, but we care about the densities
of simple graphs F only. In this case, node-homomorphisms mean nothing new,
but node-and-edge homomorphisms do. Let us assume for the time being that the
edge multiplicities in the graphs Gn remain uniformly bounded by a fixed constant
d. This case is quite easy, and it has been settled (even in greater generality)
by Borgs, Chayes, Lovász, Sós and Vesztergombi [2008]: the limit object can be
described by a kernel with values in [0, d], and the proofs are rather straightforward
generalizations of the proofs from case (1).
(3) Let the graphs Gn be multigraphs with bounded edge multiplicities as
before, but we want the limit object to correctly reflect densities of multigraphs
F . This case is more interesting. It turns out that whether we consider node-homomorphisms or node-and-edge homomorphisms does not matter much (this is not obvious at first sight). Nor do the numerical values of the edge multiplicities matter: we can think of them just as decorations of the edges from the set K = {0, 1, . . . , d},
and the only relevant property of this set is that it is finite. Here comes the first
surprise: the limit object can again be defined as a function on [0, 1]2 , but its values
are not numbers, but probability distributions on K (in other words, d-tuples of
numbers). The second surprise is that one can generalize the results to decorations
from a set K that is any compact Hausdorff space. Once the right statement of
317
318 17. MULTIGRAPHS AND DECORATED GRAPHS
the results is found, the proofs can be obtained by essentially the same techniques
as before. These results of B. Szegedy and the author will be discussed in Section
17.1.
(4) Let us backtrack and generalize in another direction: we allow unlimited
edge multiplicities for the graphs Gn , but are only interested in densities of simple
graphs F . The limit object is, not too surprisingly, an unbounded kernel. But the
treatment becomes more technical; one needs appropriate bounds on the growth
of edge multiplicities, and even then, one has to modify the definition of the cut
norm and strengthen the Regularity Lemma to get the proofs. Some preliminary
results of L. Szakács and the author [unpublished] are described in the internet
notes [Notes].
(5) Finally, if we have sequences of graphs with unbounded edge-multiplicities
and we want a limit object that correctly reflects densities of multigraphs, then
we have to combine the ideas of questions (3) and (4). Here the cases of
node-homomorphism densities and node-and-edge homomorphism densities diverge:
there will be graph sequences that are convergent in the node-homomorphism sense
but not in the node-and-edge homomorphism sense. Kolossváry and Ráth [2011]
showed how to assign limit objects if we work with node-homomorphisms; these
results can also be derived from the results mentioned in point (3) above, by com-
pactifying the set of integers. The limit object is a function defined on [0, 1]2 ,
whose values are probability distributions on N. One expects that under appro-
priate bounds on the growth of edge multiplicities, these limit objects will also be
valid for the node-and-edge homomorphism densities. However, as far as I know,
no details have been worked out here.
(Recall that β^F_{ij} is a function on K while β^G_{φ(i)φ(j)} is an element of K, so β^F_{ij}(β^G_{φ(i)φ(j)}) is well defined.)
is well defined.) The homomorphism number hom(F, G) is defined, as earlier, by
    hom(F, G) = Σ_{φ: V(F)→V(G)} hom_φ(F, G),
and we define inj(F, G), as before, by summing over injective maps. We also define
the homomorphism density by
    t(F, G) = hom(F, G) / v(G)^{v(F)} .
These subgraph densities have some new features relative to those used so far.
First of all, they are not necessarily in [0, 1]. Second, while for simple graphs
homomorphism numbers and sample distributions were easily expressed in terms
of each other (recall Proposition 5.5), in this more general setting the situation
is different. For a fixed (large) K-decorated graph G, sampling from G assigns
probabilities to K-decorated graphs, while the homomorphism numbers into G
assign real numbers to C-decorated graphs.
We are going to characterize convergence of a graph sequence in terms of homo-
morphism numbers from C-decorated graphs. This seems to be quite wasteful, since
in the case of simple graphs, only a countable number of convergence conditions
had to be assumed. But we can restrict ourselves to graphs decorated by elements
from an appropriate subset of C. We say that a set B ⊆ C is a generating system if
the linear space generated by the elements of B is dense in C in the L∞ norm. If C is finite dimensional, then it is most economical to choose a basis of C for B.
It turns out that the choice of the family B has combinatorial significance, as the
following examples show.
Example 17.1 (Simple graphs). Let K be the discrete space with two elements called “edge” and “non-edge” (or shortly, 1 and 0). The set C consists of all maps {0, 1} → R, i.e., of all pairs (f(0), f(1)) of real numbers. A natural generating subset (in fact, a basis) in C consists of the pairs f_0 = (1, 1) and f_1 = (0, 1).
Sampling, convergence, and homomorphism densities correspond to these notions
introduced for simple graphs.
One may, however, take another basis in C, namely the pair g0 = (0, 1) and
g1 = (1, 0). Then again B-decorated graphs can be thought of as simple graphs,
and hom(F, G) counts the number of maps that preserve both adjacency and non-
adjacency.
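The two bases can be compared directly on a small example. The sketch below (our own encoding, not notation from the text) implements the decorated homomorphism number and shows that the basis {f0, f1} recovers ordinary homomorphisms, while {g0, g1} counts maps preserving both adjacency and non-adjacency:

```python
from itertools import product

def decorated_hom(nF, beta, G_adj):
    """Decorated homomorphism number: beta maps each pair i < j of V(F) to a
    function on K = {0, 1}, given as the pair (f(0), f(1)); hom_phi is the
    product of these functions evaluated at the adjacencies of the images."""
    n = len(G_adj)
    total = 0
    for phi in product(range(n), repeat=nF):
        w = 1
        for (i, j), f in beta.items():
            w *= f[G_adj[phi[i]][phi[j]]]
        total += w
    return total

K3 = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
f0, f1 = (1, 1), (0, 1)   # the basis of Example 17.1
g0, g1 = (0, 1), (1, 0)   # the alternative basis

# F = P3 (path 0-1-2).  With f-decorations (non-edges get f0, which ignores
# the image), we recover the ordinary homomorphism number: walks of length 2.
beta_f = {(0, 1): f1, (1, 2): f1, (0, 2): f0}
assert decorated_hom(3, beta_f, K3) == 12

# With g-decorations, a map contributes 1 only if edges go to edges AND the
# non-edge (0, 2) goes to a non-adjacent pair; in K3 this forces phi(0) = phi(2).
beta_g = {(0, 1): g0, (1, 2): g0, (0, 2): g1}
assert decorated_hom(3, beta_g, K3) == 6
```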
Example 17.2 (Colored graphs). Let K be a finite set of “colors” with the
discrete topology. Continuous functions on K can be thought of as vectors in
RK . The standard basis B in this space corresponds to elements of K, and so
B-decorated graphs are just the same as K-decorated graphs. The homomorphism
density t(F, G) is the probability that a random map V (F ) → V (G) preserves edge
colors.
Example 17.3 (Multigraphs). Let G be a multigraph with edge multiplicities at
most d. Then G can be thought of as a K-decorated graph, where K = {0, 1, . . . , d}.
However, there are several meaningful ways of picking a basis in C, giving rise to different notions of homomorphisms.
• Taking the standard basis in C = RK means that we think of the edge multi-
plicities just as different labels (colors). The graph F will be decorated with edge
multiplicities too, and then a homomorphism must preserve edge multiplicities.
This is equivalent to Example 17.2; it can be thought of as the induced version of
homomorphisms between multigraphs.
• Take the basis B = {(1, 0, 0, . . . , 0), (1, 1, 0, . . . , 0), . . . , (1, 1, 1, . . . , 1)} in C. Again,
we can think of a B-decorated graph as a multigraph with edge multiplicities at
most d. Then a map φ : V (F ) → V (G) counts as a homomorphism if and only
if the multiplicity of each target edge φ(i)φ(j) ∈ E(G) is at least as large as the
multiplicity of ij ∈ E(F ); in other words, it counts node-homomorphisms.
• Take the functions B = {1, x, . . . , xd } in C. Again, we can think of a B-decorated
graph as a multigraph with edge multiplicities at most d, where an edge decorated
by xi is represented by i parallel edges. With this choice, hom(F, G) is the number
of node-and-edge homomorphisms of F into G as multigraphs.
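The third choice of basis can be made concrete: decorating an edge of multiplicity m with x^m turns the decorated product into ∏ (multiplicity of the image pair)^m, which counts node-and-edge homomorphisms, since each of the m parallel copies of an edge of F maps independently to one of the parallel copies of its image edge. A small sketch, with multigraphs given by multiplicity matrices (our own encoding):

```python
from itertools import product

# An edge of multiplicity m in F is decorated by the function x -> x^m
# (the basis {1, x, ..., x^d}), so hom_phi is the product of
# (multiplicity of the image pair)^(multiplicity in F).
def hom_node_and_edge(F_mult, G_mult):
    nF, nG = len(F_mult), len(G_mult)
    total = 0
    for phi in product(range(nG), repeat=nF):
        w = 1
        for i in range(nF):
            for j in range(i + 1, nF):
                w *= G_mult[phi[i]][phi[j]] ** F_mult[i][j]  # 0**0 == 1
        total += w
    return total

# F: a double edge between two nodes; G: a triangle whose edges have
# multiplicities 1, 2, 3.
F = [[0, 2], [2, 0]]
G = [[0, 1, 2], [1, 0, 3], [2, 3, 0]]
# Each ordered pair of distinct nodes of G contributes (multiplicity)^2.
assert hom_node_and_edge(F, G) == 2 * (1**2 + 2**2 + 3**2)  # = 28
```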
Example 17.4 (Weighted graphs). Let K ⊆ R be a bounded closed interval.
Let B be the collection of functions x 7→ xj for j = 0, 1, 2, . . . on K; then B is a
generating system. It is natural to consider a B-decorated graph F as a multigraph,
and then hom(F, G) is just our usual homomorphism into a weighted graph. Note
that Example 17.3 for the third choice of the basis of C is a special case.
Much of the theory of graph homomorphisms can be built up for compact
decorated graphs without much difficulty, but with some care. For example, the
relationships between hom and inj (equations (5.16) and (5.24)) can be extended
for any C-labeled graph F and K-labeled graph G:
(17.1)    hom(F, G) = Σ_P inj(F/P, G),
and
(17.2)    inj(F, G) = Σ_P μ_P hom(F/P, G),
where we have to re-define F/P for a partition P = {V_1, . . . , V_q} of V(F): the nodes are the partition classes, and an edge (V_i, V_j) is decorated by ∏_{u∈V_i, v∈V_j} β^F_{uv}.
(It may seem strange at first sight that we decorate the edges of F/P by the product of their inverse images, and not by the sum, say. Looking at Example 17.1 gives an explanation: a missing edge is decorated by the function (1, 1), an edge present, by the function (0, 1), so the product is (0, 1) if and only if at least one edge is present.)
17.1. COMPACT DECORATED GRAPHS 321
From the results in Part 2, we state a generalization of one result (the first
statement of Theorem 5.29), which we will need later.
Proof. The “only if” part is obvious. For the converse, suppose that
hom(F, G1 ) = hom(F, G2 ) for every B-decorated graph F . In particular, this holds
for F = K1 , which implies that v(G1 ) = v(G2 ) = n (say). Furthermore,
• hom(F, G1 ) = hom(F, G2 ) for every F with v(F ) ≤ n whose edges are deco-
rated by linear combinations of functions from B. Indeed, expanding the product in
the definition of the homomorphism number, we see that hom(F, G1 ) can be written
as a linear combination of values hom(F ′ , G1 ), where every F ′ is B-decorated. We
get a similar expression for hom(F, G2 ) in terms of the values hom(F ′ , G2 ). Since
hom(F ′ , G1 ) = hom(F ′ , G2 ) for all these graphs F ′ by hypothesis, it follows that
hom(F, G1 ) = hom(F, G2 ).
• hom(F, G1 ) = hom(F, G2 ) for every C-decorated F with v(F ) ≤ n. This
follows since linear combinations of functions in B are dense in C.
• inj(F, G1 ) = inj(F, G2 ) for every C-decorated F with v(F ) ≤ n. This follows
from (17.2).
Now let S be the set of all elements of K occurring as edge-decorations in G1 or
G2 . Let F = Kn◦ , and let us decorate the edges of F with functions fe ∈ C such that
the values fe (s) (e ∈ E(F ), s ∈ S) are algebraically independent transcendentals
(such functions clearly exist). In the equation inj(F, G1 ) = inj(F, G2 ), every term
is a product of these transcendentals, so for the equation to hold, we need that
every term on the left cancels a term on the right, and vice versa. But if the
term corresponding to an (injective) map φ : V (F ) → V (G1 ) cancels the term
corresponding to ψ : V (F ) → V (G2 ), then φ−1 ◦ ψ is an isomorphism between G1
and G2 .
It is easy to see that for every K-decorated graph G and C-decorated graph F ,
Example 17.7. It may be worthwhile to revisit our examples from Section 17.1.1.
Simple graphs could be thought of as K-decorated graphs with K = {0, 1}.
Every probability distribution on K can be represented by a number between 0 and
1, which is the probability of being adjacent (i.e., the probability of the element 1 ∈
K). So a K-graphon is described by a symmetric measurable function W : [0, 1]² → [0, 1], i.e., by a graphon.
Proof. The necessity of the conditions is easy, along the lines of the proofs of
Propositions 5.64 and 7.1.
The sufficiency takes more work, but the proof can be put together from arguments that we have established before. Let f be a multiplicative, reflection positive multigraph parameter for which r(f, 2) is finite. Replacing f by f(G)/( f(K_1)^{v(G)} f(K_2)^{e(G)} ), we may assume that f(K_1) = f(K_2) = 1. We establish several representations of f.
1. First, we represent f as an expectation of homomorphism-like quantities.
For every n ≥ 1, there is a distribution on [−1, 1]-weighted graphs on [n] such that if H_n is chosen from this distribution, then f(G) = E( inj(G, H_n) ) for every multigraph G on [n]. This can be proved using Proposition A.24 in the Appendix.
17.2. MULTIGRAPHS WITH UNBOUNDED EDGE MULTIPLICITIES 325
2. If the weighted graph Zn on [n] is chosen from the distribution in Step 1,
then f (G) = limn→∞ t(G, Zn2 ) with probability 1 for every multigraph G. This can
be proved by modifying the proof of Lemma 11.8 appropriately (in fact, it suffices
to compute the second moments only).
3. There is a [−1, 1]-graphon W (in the sense of Section 17.1.3) such that
f = t(., W ). This can be deduced from Theorem 17.8.
4. If W = (W_0, W_1, . . . ) is the moment sequence representation of W, then
every Wn is a stepfunction with the same steps. This is proved using the assumption
that r(f, 2) is finite in a way similar to the proof of Theorem 13.47.
Knowing this, it is easy to construct the randomly weighted graph H. Its nodes
correspond to the steps S1 , . . . , Sq of the Wn , and the measures of the steps also
give their nodeweights. The random variable W (x, y) (x ∈ Si , y ∈ Sj ) gives the
decoration of the edge ij. It is not hard to check that the construction gives the
right graph parameter.
Graphings
This next part of the book treats convergence and limit objects of bounded
degree graphs. We fix a positive integer D, and consider graphs with all degrees
bounded by D. Unless explicitly said otherwise, this degree bound will be tacitly
assumed.
In this chapter we introduce infinite graphs that generalize finite bounded de-
gree graphs. Their main role will be to serve as limit objects for sequences of
bounded degree graphs, analogous to the role of graphons in the previous part.
Graphons (symmetric functions in two variables) are very common objects and of
course they have been studied for many reasons since the dawn of analysis. Graph-
ings are less common; however, they are interesting in their own right, and in fact,
they too have been studied in other contexts, mainly in connection with group
theory.
The situation will be more complex than in the dense case, and there will
be no single “true” limit object. But the connection between these objects is
quite interesting. As a further warning, it is not known whether the objects to be
discussed in this chapter are all limit objects of sequences of finite graphs. This
makes it even more justified to treat them separately from convergent finite graph
sequences.
In this part we will consider finite graphs, countably infinite graphs, and even
larger graphs (typically of continuum cardinality). To keep notation in check, we
will denote finite graphs by F, F ′ , G, G′ , G1 . . . , and families of finite graphs by
calligraphic letters. In particular, we denote by G the family of all finite graphs
(with all degrees bounded by D). We denote countable graphs by H, H ′ , H1 , . . . ,
and their families by Gothic letters. In particular, G denotes the family of all
countable graphs (with all degrees bounded by D). Graphs with larger cardinality
will be denoted by boldface letters like G, G′ , . . . ; we will not talk about families
of them.
but we will be interested in graphs with all degrees bounded by D, and will tacitly
assume this condition for these infinite graphs too.
Example 18.1. For a fixed a ∈ (0, 1), we define a graph Pa on [0, 1] by connecting
two points x and y if |x − y| = a. This defines a Borel graph. The graph structure
of this Borel graph is quite simple: it is the union of finite paths. If a > 1/2, then
it is just a matching together with isolated nodes. Of course, the Borel structure carries additional information.
We can make the example more interesting, if we wrap the interval [0, 1] around,
and consider the graph Ca on [0, 1) in which a node x is connected to x+a (mod 1)
and x−a (mod 1). If a is irrational, we get a graph that consists of two-way infinite
paths; if a is rational the graph will consist of cycles.
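The rational case is easy to check exactly. Using exact fractions to avoid floating-point drift, the following sketch traces the component of a point of C_a for a = 2/5 and confirms that it is a 5-cycle:

```python
from fractions import Fraction

# The graph C_a on [0, 1): x is adjacent to x + a (mod 1) and x - a (mod 1).
# For rational a = p/q in lowest terms, each component is a q-cycle; we trace
# one orbit of the rotation x -> x + a (mod 1) exactly.
def orbit(x0, a):
    xs, x = [x0], (x0 + a) % 1
    while x != x0:
        xs.append(x)
        x = (x + a) % 1
    return xs

a = Fraction(2, 5)
orb = orbit(Fraction(1, 3), a)
assert len(orb) == 5   # a 5-cycle, since a = 2/5 has denominator 5
assert sorted(orb) == sorted((Fraction(1, 3) + k * a) % 1 for k in range(5))
```

For irrational a the orbit never returns, giving the two-way infinite paths mentioned above.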
The following lemma is very useful and it also motivates some of the definitions
in the sequel.
Lemma 18.2. A graph G on a Borel space (Ω, B) is a Borel graph if and only if
for every Borel set B ∈ B, the neighborhood NG (B) is Borel.
Proof. Suppose that G is a Borel graph, and let B ∈ B. Then B ′ = E(G) ∩
(B × Ω) is a Borel set. Furthermore, if we project B ′ to the second coordinate, then
the inverse image of any point is finite. A classical theorem of Lusin [1930] implies
that the projection is also Borel; but this projection is just NG (B).
Conversely, assume that G has the property that the neighborhood of any Borel
set is also Borel. Let Pi (i = 1, 2, . . . ) range over all partitions of Ω into a finite
number of sets in J . We claim that
(18.1)    E(G) = ∩_i ∪_{J∈P_i} ( J × N_G(J) );
Proposition 18.6. Let G be a bounded degree graph (of any cardinality), and
suppose that G has no automorphism. Then E(G) is closed in the local topology,
and hence G is a Borel graph with respect to the Borel space defined by the local
topology.
Proof. Let xy ∉ E(G), and let y_1, . . . , y_d be the neighbors of x. Since G has no automorphism, there is an r ≥ 1 such that B_{G,r}(y) is isomorphic to none of B_{G,r}(y_1), . . . , B_{G,r}(y_d). We claim that if u, v ∈ V(G) are such that d◦(u, x) < 2^{−r} and d◦(v, y) < 2^{−r}, then uv ∉ E(G).
Assume that uv ∈ E(G). By the definition of the distance function, d◦(u, x) < 2^{−r} implies that B_{G,r+1}(x) ≅ B_{G,r+1}(u). Let, say, y_1 correspond to v under this isomorphism; then B_{G,r}(y_1) ≅ B_{G,r}(v). But d◦(v, y) < 2^{−r} implies that B_{G,r}(v) ≅ B_{G,r}(y), so B_{G,r}(y_1) ≅ B_{G,r}(y), a contradiction.
What to do if G has automorphisms? One possibility is to decorate the nodes with elements of some set K of “colors”, in order to break all automorphisms. A similar construction will be described in Section 18.3.4, and here we don’t go into the details.
Exercise 18.7. Let G be a Borel graph, and let us add all edges that connect
nodes at distance 2. Prove that the resulting graph G2 is Borel.
Exercise 18.8. Let G be a Borel graph. Prove that for every 1-labeled simple
graph F , the quantity homu (F, G) is well-defined, and it is a Borel function of
u ∈ V (G).
Exercise 18.9. Let G be a Borel graph and let Vk denote the set of nodes with
degree k. Prove that Vk is a Borel set.
Exercise 18.10. Let G be a Borel graph and let Vk denote the union of its finite
components with k nodes. Prove that Vk is a Borel set.
Exercise 18.11. Prove that every Borel graph has a maximal stable set of nodes
that is Borel.
Exercise 18.12. Prove that if a graph with bounded degree has no automor-
phism, then its cardinality is at most continuum.
“graphons” in the dense case (whose name comes from the contraction of graph-
function), I like the parallel “graphon–graphing”, and will adopt the above meaning.
Besides the probability measure λ on the points, there are two (related) mea-
sures that often play a role. The integral measure of the degree function is often
called the volume:
(18.3)    Vol(A) = ∫_A deg(x) dλ(x).
The volume of the whole underlying set is the average degree:
(18.4)    d_0 = Vol(Ω) = ∫_Ω deg(x) dλ(x).
for product sets (A, B ∈ B). It is not hard to see that Carathéodory’s Theorem applies and we can extend η to the sigma-algebra B × B. If we want a probability
measure on the edges, we can normalize by the average degree: the measure η/d0
can be considered as the uniform probability measure on E(G). Equation (18.2)
implies that η is invariant under interchanging the coordinates. Both marginals of
η give the volume measure.
Lemma 18.14. The measure η is concentrated on E(G).
Proof. Let J = {J1 , J2 , . . . }. We claim that
(18.6)    E(G) = (Ω × Ω) \ ∪_{i=1}^{∞} ( J_i × (Ω \ N_G(J_i)) ).

It is clear that E(G) is contained in the right hand side. Conversely, if (x, y) ∉ ∪_i ( J_i × (Ω \ N_G(J_i)) ), then for each i for which J_i ∋ x, we have y ∈ N_G(J_i). So there is a z_i ∈ J_i adjacent to y. Since y has finite degree, this can hold for each J_i only if x is adjacent to y. This proves (18.6).
Since η( J_i × (Ω \ N_G(J_i)) ) = 0 by the definition of η, equation (18.6) implies that η(Ω × Ω \ E(G)) = 0.
Assuming that the average degree is positive, one way to generate a random
edge from the distribution η/d0 is to select a point x from the distribution λ∗ , and
then select an edge incident with x uniformly at random. Conversely, selecting a
random edge from the distribution η/d0 , and then selecting randomly one of its
endpoints, we get a point from the distribution λ∗ . To describe the connection
between λ and λ∗ in this language, we can generate a point from λ∗ by generating
a random point x according to λ, and keeping it with probability deg(x)/D (else,
rejecting it and generating a new one). If there are no isolated nodes, we can
generate a point from λ by generating a random point x according to λ∗ , and
keeping it with probability 1/ deg(x).
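On a finite toy example the rejection step is straightforward to simulate; the sketch below (a made-up degree table, not a construction from the text) empirically recovers the size-biased distribution λ*(x) ∝ deg(x):

```python
import random
from collections import Counter

random.seed(0)

# Finite sketch: lambda = uniform on nodes, D = degree bound.  A point from
# lambda* is generated by sampling x ~ lambda and keeping it with
# probability deg(x)/D, else rejecting and retrying.
deg = {0: 1, 1: 2, 2: 3}   # a toy degree table; degree bound D = 3
D = 3
nodes = list(deg)

def sample_lambda_star():
    while True:
        x = random.choice(nodes)          # x ~ lambda (uniform here)
        if random.random() < deg[x] / D:  # accept with probability deg(x)/D
            return x

counts = Counter(sample_lambda_star() for _ in range(60_000))
# lambda*(x) is proportional to deg(x): expected frequencies 1/6, 2/6, 3/6.
total = sum(deg.values())
for x in nodes:
    assert abs(counts[x] / 60_000 - deg[x] / total) < 0.02
```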
334 18. GRAPHINGS
Remark 18.18. It may be useful to allow graphs measurable with respect to the completion A of B with respect to the probability measure λ (in the case of the interval [0, 1], this means allowing Lebesgue measurable sets instead of Borel sets). We call a graph Lebesgue measurable if for every set A ∈ A, its neighborhood NG(A) ∈ A. The correspondence between graphings and Lebesgue measurable graphs is described in Exercises 18.30–18.32.
18.2.1. Verifying measure preservation. Suppose that we have a measurable
graph G with a probability measure λ on the node set. How can we verify that
this graph is measure preserving? Let us describe some methods to do so.
18.2. MEASURE PRESERVING GRAPHS 335
Edge measure. The simplest method, which often works well, is to specify a
measure η on the edge set satisfying (18.5). To be more precise, we consider a
probability space (Ω, B, λ) and a Borel graph G on it. Suppose that there exists a
finite measure η on the Borel sets in Ω × Ω which is invariant under interchanging
the coordinates, is concentrated on E, and whose marginals equal the volume
measure Vol. This trivially implies that (18.2) holds.
Borel subgraphs. Every subgraph of a graphing that is in a sense explicitly
definable is itself a graphing: there is no constructive way to violate (18.2). The
following lemma makes this precise.
Lemma 18.19. Let G = (Ω, B, λ, E) be a graphing, and let L ⊆ E be a symmetric
Borel set. Then G′ = (Ω, B, λ, L) is a graphing.
Proof. Let A, B ⊆ Ω be Borel sets. We want to show that
(18.7) ∫_A d_B^{G′}(x) dλ(x) = ∫_B d_A^{G′}(x) dλ(x).
First we prove that this equation holds when L = E ∩ [(S × T) ∪ (T × S)] with
two disjoint Borel sets S, T. Indeed, for any two Borel sets A and B,
∫_A d_B^{G′}(x) dλ(x) = ∫_{A∩S} d_{B∩T}^{G}(x) dλ(x) + ∫_{A∩T} d_{B∩S}^{G}(x) dλ(x)
= ∫_{B∩T} d_{A∩S}^{G}(x) dλ(x) + ∫_{B∩S} d_{A∩T}^{G}(x) dλ(x) = ∫_B d_A^{G′}(x) dλ(x).
A similar computation shows that (18.7) holds if L = E ∩ (S × S) for any Borel
set S.
To prove the lemma in general, we use induction on the degree bound D. For
D = 1 the assertion is trivial.
Let ε > 0, let J = {J1, J2, . . . } be a countable generator set of B, and let Pn
be a partition of V(G) into the atoms generated by J1, . . . , Jn. Let Xn be the set
of points of degree D all of whose neighbors belong to the same class of Pn. Since
any two points are separated by Pn if n is large enough, we have ∩_n Xn = ∅, and
hence λ(Xn) ≤ ε if n is large enough. Let us fix such an n.
For S ∈ Pn, let S′ = S \ Xn. For every S, T ∈ Pn, the graph G(S, T) obtained
by restricting the edge set of G to (S′ × T′) ∪ (T′ × S′) is measure preserving by
the special case proved above. In G(S, T), each point has degree at most D − 1,
by the definition of S′ and T′. Hence by the induction hypothesis, restricting the
edge set to E(G(S, T)) ∩ L we get a graphing G′(S, T), which means that
∫_A d_B^{G′(S,T)}(x) dλ(x) = ∫_B d_A^{G′(S,T)}(x) dλ(x).
Since the graphings G′(S, T) are edge-disjoint, it follows that G1 = ∪_{S,T} G′(S, T)
is a graphing. Here
∫_A d_{B∩Xn}^{G′}(x) dλ(x) ≤ ∫_A d_{B∩Xn}^{G}(x) dλ(x) = ∫_{B∩Xn} d_A^{G}(x) dλ(x) ≤ Dλ(Xn) ≤ Dε,
and
∫_{A∩Xn} d_{B\Xn}^{G′}(x) dλ(x) ≤ Dλ(Xn) ≤ Dε.
Hence
|∫_A d_B^{G′}(x) dλ(x) − ∫_B d_A^{G′}(x) dλ(x)| ≤ |∫_A d_B^{G1}(x) dλ(x) − ∫_B d_A^{G1}(x) dλ(x)| + 2Dε = 2Dε.
Since ε was arbitrarily small, this proves (18.7). □
Corollary 18.20. The intersection and union of two graphings on the same prob-
ability space are graphings.
Proof. Let G1 and G2 be the two graphings, and consider G1 ∩ G2 (we keep
the underlying point set and do the set operation on the edge set). This is a Borel
subgraph of G1, and hence it is a graphing.
The assertion about the union is trivial if the graphings are edge-disjoint. In
the general case, consider the graphs G1 \ G2, G2 \ G1 and G1 ∩ G2. Each of these
three graphs is a Borel subgraph of G1 or of G2, and hence they are all graphings.
But then so is their union, which is just G1 ∪ G2. □
known whether (for an appropriately rich family of translations) it has a Borel per-
fect matching. (Exercise 18.29 shows that a graphing can have a perfect matching,
but no Borel perfect matching.)
Exercise 18.24. Let G be a graphing, and let us add all edges that connect
nodes at distance 2. Prove that the resulting graph G^2 is a graphing.
Exercise 18.25. Let G be a graphing in which every connected component has at
most k nodes. Let S ⊆ V (G) be a measurable set that intersects every connected
component. Prove that λ(S) ≥ 1/k.
Exercise 18.26. Let G be a graphing on [0, 1], let E ′ ⊆ E(G) be a symmetric
Borel set, and E ′′ = E(G)\E ′ . Consider the graphings G′ and G′′ on [0, 1] defined
by the edge sets E ′ and E ′′ (cf. Lemma 18.19). Prove that ηG = ηG′ + ηG′′ .
Exercise 18.27. Let G be a graphing, and let S ⊆ E(G) be a (not necessarily
symmetric) Borel set. For x ∈ V(G), let d+_S(x) denote the number of pairs
(x, y) ∈ S, and let d−_S(x) denote the number of pairs (y, x) ∈ S. Prove that
∫_{V(G)} d+_S dλ = ∫_{V(G)} d−_S dλ.
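In the finite setting, the identity of Exercise 18.27 is plain double counting: both sums run over the pairs in S once each. A toy verification (graph and set S chosen ad hoc, our own sketch):

```python
# Finite analogue of Exercise 18.27: for any set S of oriented edges (not
# necessarily symmetric), summing d+_S over the nodes and summing d-_S over
# the nodes both count every pair in S exactly once.
edges = {(0, 1), (1, 0), (1, 2), (2, 1), (2, 3), (3, 2)}  # symmetric E(G)
S = {(0, 1), (1, 2), (2, 1)}                              # S subset of E(G)
nodes = range(4)

d_plus = {x: sum(1 for (a, b) in S if a == x) for x in nodes}
d_minus = {x: sum(1 for (a, b) in S if b == x) for x in nodes}

assert sum(d_plus.values()) == sum(d_minus.values()) == len(S)
```

For graphings, the same count must be expressed through the integrals above, since individual nodes carry no mass.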
18.3.1. The graph of graphs. Let G• denote the set of connected countable
graphs (with all degrees bounded by D) that also have a specified node called their
root. We denote the root of a graph H ∈ G• by root(H). (We could consider these
graphs as 1-labeled, but in this context calling the single labeled node the “root” is
common.) Sometimes we will also write H = (H′, v), where v = root(H),
and H ′ = [[H]] is the unrooted graph underlying H. For every rooted graph H,
we denote by deg(H) the degree of its root. The set of finite graphs in G• will be
denoted by G • .
We consider two graphs in G• the same if there is an isomorphism between
them that preserves the root. Let Br ⊆ G • denote the set of r-balls, i.e., the set
of finite rooted graphs in which every node is at a distance at most r from the
root. (Since we keep the degree bound D fixed, the set Br is finite.) For a rooted
countable graph (H, v) ∈ G• , let BH,r = BH,r (v) ∈ Br denote the neighborhood of
the root with radius r. For every r-ball F , let G•F denote the set of “extensions”
of F, i.e., the set of those graphs H ∈ G• for which BH,r ≅ F (as rooted graphs).
With all this notation, we can define something more interesting. First we
define a graph 𝐇 on the set G• (the boldface distinguishes it from the rooted
graphs H ∈ G•). Let (H, v) ∈ G•. For every edge e = vv′ ∈ E(H), connect (H, v)
by an edge of 𝐇 to the rooted graph (H, v′) ∈ G•. So every edge of H incident
with v gives rise to an edge of 𝐇 incident with (H, v). In particular, all degrees
in 𝐇 are bounded by D. We call 𝐇 the "Graph of Graphs".
The r-neighborhood of a rooted graph H in 𝐇 is almost the same as the r-
neighborhood of the root in H. To be precise, if [[H]] has no automorphism, then
B𝐇,r(H) ≅ BH,r(root(H)). The image of v ∈ V(H) under this isomorphism is
obtained by moving the root of H to v. However, if there is an automorphism of
H moving root(H) to v, then the "curse of symmetry" strikes again, and this map
is not one-to-one.
We endow the set G• with a metric: for two graphs H1, H2 ∈ G•, define their
ball distance by
d•(H1, H2) = inf{2^{−r} : BH1,r ≅ BH2,r}.
(This is reminiscent of the semimetric defining the local topology of a graph, but
it is defined on a different set.) This turns G• into a metric space. It is easy to see
(Exercise 18.43) that the sets G•F are both closed and open, they form an open basis,
and the space (G•, d•) is compact and totally disconnected. The sigma-algebra of
Borel sets of (G•, d•) will be denoted by A.
As usual, every subset of G• you will ever need, and every function G• → R
you will ever define, will be Borel. The graph 𝐇 is Borel with respect to the
sigma-algebra A. This follows by the same argument as Proposition 18.6.
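For small finite examples the ball distance can be computed mechanically. The following sketch (our own code; graphs are assumed given as adjacency dicts, no loops, and the rooted-isomorphism test is brute force over permutations, so it is feasible only for tiny balls) extracts r-balls by BFS and evaluates d•:

```python
from itertools import permutations

def ball(adj, root, r):
    """Distances to all nodes within r of root (BFS); adj is a dict of lists."""
    dist = {root: 0}
    queue = [root]
    for u in queue:
        if dist[u] < r:
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
    return dist

def induced_edges(adj, nodes):
    """Edges of the subgraph induced on a node set (no loops assumed)."""
    return {frozenset((u, v)) for u in nodes for v in adj[u] if v in nodes}

def rooted_isomorphic(adj1, root1, adj2, root2, r):
    """Brute-force test whether the r-balls around the roots are isomorphic
    as rooted graphs; feasible only for tiny balls."""
    b1, b2 = ball(adj1, root1, r), ball(adj2, root2, r)
    if len(b1) != len(b2):
        return False
    e1, e2 = induced_edges(adj1, b1), induced_edges(adj2, b2)
    for image in permutations(sorted(b2)):
        phi = dict(zip(sorted(b1), image))
        if phi[root1] != root2:
            continue
        if {frozenset(phi[x] for x in e) for e in e1} == e2:
            return True
    return False

def ball_distance(adj1, root1, adj2, root2, max_r):
    """d_bullet = 2^-r for the largest r with isomorphic r-balls; if they
    agree for every radius up to max_r, 2^-max_r is returned as an upper
    bound."""
    for r in range(max_r + 1):
        if not rooted_isomorphic(adj1, root1, adj2, root2, r):
            return 1.0 if r == 0 else 2.0 ** -(r - 1)
    return 2.0 ** -max_r

cycle8 = {i: [(i - 1) % 8, (i + 1) % 8] for i in range(8)}
path21 = {i: [j for j in (i - 1, i + 1) if 0 <= j < 21] for i in range(21)}
assert ball_distance(cycle8, 0, path21, 10, 6) == 0.125
```

For instance, a rooted C8 and a long path rooted in the middle have isomorphic balls up to radius 3 but not 4, so their ball distance is 2^{−3}.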
18.3.2. Invariant measures. You may have noticed that we have not defined
any measure on the set G•. We will in fact consider many probability measures on
it; these measures will carry the real information. Let σ be any probability measure
on (G•, A). It is easy to see that the degree deg of the root is a measurable function
on G•. Nodes with different degrees cause some complication here, and it will be
best to introduce right away another probability measure on G•: we define
σ∗(A) = ∫_A deg dσ / ∫_{G•} deg dσ.
Clearly these integrals are finite (at most D). If the denominator is 0, then σ is
concentrated on the graph consisting of a single node (the only connected graph
with average degree 0). In this trivial case, we set σ ∗ = σ.
Next, we introduce a very important condition on the distribution, which ex-
presses that all possible roots of a graph are taken into account judiciously. (The
meaning of this condition will be clearer when we get to limits of graph sequences.)
Select a rooted graph H according to the distribution σ ∗ and then select a uniform
random edge e from the root. We consider e as oriented away from the root. This
way we get a probability distribution σ → on the set G→ of graphs in G• with an
oriented edge (the “root edge”) from the root also specified. We say that σ is involu-
tion invariant (another name commonly used is unimodular) if the map G→ → G→
obtained by reversing the orientation of the root edge is measure preserving with
respect to σ → . By an involution invariant random graph we mean a random rooted
connected graph drawn from an involution invariant probability measure on G• .
Example 18.33. Let G ∈ G be a connected finite graph. Selecting a root from
V (G) uniformly at random defines a probability distribution σG on G• (concen-
trated on rooted copies of G). If we select the root v with probability proportional
to the degree of v, and a root edge e incident with v uniformly, then a simple
computation shows that the edge is uniformly distributed among all oriented edges, and
so the distribution σG is involution invariant.
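The computation behind Example 18.33 can be checked exactly on any small graph, because the degree factors cancel: every oriented edge receives probability 1/(2e(G)). A sketch with exact rational arithmetic (graph chosen ad hoc, our own code):

```python
from fractions import Fraction

# A small connected graph with unequal degrees: a triangle with a pendant node.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
total_deg = sum(len(nbrs) for nbrs in adj.values())  # = 2 * e(G) = 8

# Root v chosen with probability deg(v)/total_deg, then a uniform edge at v,
# oriented away from the root; the deg(v) factors cancel.
prob = {}
for v, nbrs in adj.items():
    for u in nbrs:
        prob[(v, u)] = Fraction(len(nbrs), total_deg) * Fraction(1, len(nbrs))

# Every oriented edge has the same probability, so reversing the root edge
# preserves the distribution: sigma_G is involution invariant.
assert all(p == Fraction(1, total_deg) for p in prob.values())
assert len(prob) == total_deg
```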
Example 18.34 (Path). Let P denote the two-way infinite path with any node
chosen as a root. The distribution on G• concentrated on P is involution invariant,
since selecting any root edge we still get a distribution concentrated on a single
graph, so reversing the edge preserves this distribution.
Example 18.35 (Triangular Ribbon). Let P be the 2-way infinite path and let
R be the “ribbon” obtained from P by connecting every pair of nodes at distance
2 (Figure 18.2(a)). If we specify any node as its root, we get a connected countable
4-regular rooted graph R• . The distribution on G• (where D = 4) concentrated on
R• is involution invariant. To see this, note that if we select an oriented edge as a
root, we get only two edge-rooted graphs H ′ and H ′′ (up to isomorphism): either
an edge of P is selected, or an edge not on P . Furthermore, reversing the edge
yields an isomorphic edge-rooted graph, so the distribution on {H ′ , H ′′ } remains
involution invariant.
x ∈ [0, 1]. (Let us ignore the ambiguity that one can write rational numbers whose
denominator is a power of two in two different ways; this involves a set of measure 0
anyway.) Similarly, the sequence of colors to the left of the root (this time excluding
the root) gives a number y ∈ [0, 1]. So every point of the unit square corresponds to
a 2-colored 2-way infinite path (with a root), and this correspondence is bijective.
To shift the root to the right by one step corresponds to replacing x by 2x (mod 1)
and y by y/2 if x < 1/2 and by y/2+1/2 if x ≥ 1/2. (This map, as a transformation
of {0, 1}Z , is called a Bernoulli shift. In its other incarnation as a transformation
of the unit square, it is sometimes called the dough folding map.) The graphing
will be defined on [0, 1]2 (with the Lebesgue measure), and every point (x, y) will
be connected to its image and to its inverse image under the dough folding map.
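The dough folding map and its inverse are easy to implement, and one can check directly that they are mutually inverse bijections of the square (away from the measure-zero dyadic ambiguities mentioned above). A sketch (our own code):

```python
import random

def fold(x, y):
    """The dough folding map: shifts the root of the 2-colored path one step
    to the right. x loses its leading binary digit (x -> 2x mod 1), and that
    digit is pushed onto the front of y."""
    return (2 * x % 1, y / 2 if x < 0.5 else y / 2 + 0.5)

def unfold(x, y):
    """Inverse of fold: shifts the root one step to the left."""
    return (x / 2 if y < 0.5 else x / 2 + 0.5, 2 * y % 1)

# fold and unfold are mutually inverse, so in the graphing each point of the
# square is joined to exactly two others (its image and its preimage), and
# the components are two-way infinite paths.
rng = random.Random(1)
for _ in range(1000):
    x, y = rng.random(), rng.random()
    fx, fy = fold(x, y)
    ux, uy = unfold(fx, fy)
    assert abs(ux - x) < 1e-12 and abs(uy - y) < 1e-12
```

Each branch of fold has Jacobian of absolute value 1 (stretch by 2 in x, squeeze by 2 in y), which is the geometric reason the Lebesgue measure is preserved.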
For D > 2 a geometric construction of a representing graphing is more com-
plicated. We can start from the fact that a D-regular tree is the Cayley graph
of a group freely generated by D involutions. This group can be represented, for
example, by reflections in D generic hyperplanes through the origin in D-space.
If we take the surface of the unit sphere in RD with the uniform probability dis-
tribution, and connect every point to its images and inverse images under these
reflections, we get a graphing representing the infinite D-regular tree. (Points on
the D hyperplanes in which we reflect will have lower degree than D; but we can
delete these points and all their images under the group, which is a set of measure
0, and then we get a graphing in which every connected component is a D-regular
tree.)
Proof. The proof is essentially the same as the proof of Lemma 18.40, since
assigning the random weights to the nodes of a graph G chosen from σ almost
surely destroys all automorphisms.
Exercise 18.42. Prove that for r > 2, an r-ball has at most D^r nodes, and the
number of non-isomorphic r-balls is bounded by D^{D^r}.
Exercise 18.43. Prove that the sets G•F are closed and open in the metric space
(G• , d• ), they form an open basis, and the space is homeomorphic to a Cantor
set.
Exercise 18.44. Show that a function f : G• → R is continuous if and only if
for every ε > 0 there is an r such that for all graphs H ∈ G• we have
|f(H) − f(BH,r)| < ε.
18.4.1. Mass Transport Principle. Let us consider the set G•• of 2-labeled
connected countable graphs (again, graphs that are isomorphic as 2-labeled graphs
are identified). We can endow this set with a compact topology just like we did for
G• , and then Borel functions are defined.
The following very useful characterization of involution invariance was proved
by Aldous and Lyons [2007] (it was in fact in this form that Benjamini and Schramm
first defined involution-invariant measures).
Proposition 18.48 (Mass Transport Principle). Let σ be a probability distri-
bution on G•. Then σ is involution invariant if and only if for every Borel function
f : G•• → R+ the following identity holds:
(18.8) E( Σ_u f(H, v, u) ) = E( Σ_u f(H, u, v) ),
where (H, v) is a random rooted graph drawn from σ and the sums extend over all
nodes u of H.
One can formulate a related identity for graphings, which shows that the Mass
Transport Principle is in a sense a form of Fubini's Theorem. To illustrate that we
can vary the conditions, let us say that a function f : S × S → R (where S is a set
of any cardinality) is locally finite if the sums Σ_{x∈S} f(x, y) and Σ_{y∈S} f(x, y) are
absolutely convergent (this includes that they have a countable number of nonzero
terms).
Proposition 18.49. Let G be a graphing, and let f : V(G) × V(G) → R be a
locally finite Borel function. Assume that f(x, y) = 0 unless y ∈ V(Gx). Then
∫_{V(G)} Σ_y f(x, y) dx = ∫_{V(G)} Σ_x f(x, y) dy.
If f is the indicator function of edges between two Borel sets A and B, then this
identity gives the basic measure preservation identity (18.2). The Mass Transport
Principle can be used to prove properties of “typical” graphs from an involution
invariant distribution; see Exercises 18.52 and 18.53.
We describe the proof of the graphing version; Proposition 18.48 can be proved
along the same lines.
Proof. It suffices to prove this identity for nonnegative Borel functions, since
we can write a general f as the difference of two such functions, which will also be
locally finite. It suffices to prove it for bounded Borel functions, since we can obtain
an unbounded nonnegative f as the limit of an increasing sequence of bounded Borel
functions. By scaling, we may assume that the range of f is contained in [0, 1]. We
may assume that there is an r ∈ N such that f (x, y) = 0 unless y ∈ BG,r (x), since
we can obtain f as the limit of an increasing sequence of such functions. Finally,
it suffices to consider 0-1 valued Borel functions, since we can write f as
f(x, y) = ∫_0^1 1(f(x, y) ≥ t) dt,
and here the function 1(f (x, y) ≥ t) is a 0-1 valued Borel function for every t.
A 0-1 valued Borel function corresponds to a Borel subset S ⊆ V(G) × V(G).
Consider the graphing Gr obtained from G by connecting any two nodes at distance
at most r. (This is indeed a graphing by Exercise 18.7.) The set S is a Borel subset
of E(Gr), and hence by Exercise 18.27 we have ∫ d+_S dλ = ∫ d−_S dλ. But this
is just the identity to be proved. □
18.4.2. Homomorphism frequencies. Recall that in a finite graph G,
t∗ (F, G) can be interpreted as the expectation of homu (F ′ , G), where F ′ is ob-
tained from F by labeling one of its nodes, and u is a random node of G. This
can be generalized to homomorphisms into an involution-invariant random graph.
Indeed, let σ be an involution-invariant distribution, and let (H, v) denote a ran-
dom rooted graph from σ. Then homv(F′, H) is a bounded nonnegative integer,
and since it depends only on a bounded neighborhood of the root v, it is a Borel
function of (H, v). So t∗(F′, σ) = E( homv(F′, H) ) is well defined. Based on the
finite case, we expect that t∗(F′, σ) is independent of the node labeled in F, and
so we can define t∗(F, σ) = t∗(F′, σ). This is correct, but not obvious.
Proposition 18.50. Let F′ and F′′ be two 1-labeled graphs obtained from the
same unlabeled connected graph F, and let σ be an involution-invariant distribution.
Then t∗(F′, σ) = t∗(F′′, σ).
Proof. Let F∗ be the 2-labeled graph obtained by labeling both nodes that are
labeled in F′ or F′′. Then for every rooted graph (H, u) generated according to σ,
we have homu(F′, H) = Σ_v homuv(F∗, H) and homu(F′′, H) = Σ_v homvu(F∗, H).
Applying the Mass Transport Principle to the function f(H, u, v) = homuv(F∗, H),
we get the assertion. □
be the node with φi(αi(v)) = v. (Note that φi^{−1}(v) is not necessarily a singleton,
but exactly one of its elements belongs to Hi.)
Let α(v) = (α1(v), α2(v)) ∈ Ω; then α(u) and α(v) are adjacent in G if and
only if u and v are adjacent in H0, by the definition of the product graph. So α
is an embedding of H0 into H as an induced subgraph. We want to argue that
α(H0) = H. Indeed, if not, then there is a node α(u) that is connected by an
edge of H to a node (w1, w2) ∈ V(H) \ α(V(H0)). Now α1(u) is connected to w1
in H1 by the definition of the product graph, and hence z = φ1(w1) is connected
to u in H0, and so it is a node in H0. Similarly, φ2(w2) is in H0. Furthermore,
(w1, w2) ∈ V(H) ⊆ Ω, and hence φ1(w1) = φ2(w2) = z. But then (w1, w2) = α(z),
a contradiction.
It follows that with probability 1, G(x1,x2) ≅ (Gi)xi, and ψi provides this
isomorphism, which proves that ψi is a local isomorphism. Thus G1 and G2 are
bi-locally isomorphic. □
Corollary 18.57. Bi-local isomorphism is a transitive relation.
Proof. Figure 18.3 tells the whole story: composing two bi-local isomorphisms,
the middle part can be “flipped up” by Lemma 18.56 to get a single bi-local iso-
morphism.
Lemma 18.58. The maps φ : V (G+ ) → V (G) and ψ : V (G+ ) → V (Bσ ) defined
above are local isomorphisms.
Now we are able to prove the main result in this section.
Theorem 18.59. Two graphings are locally equivalent if and only if they are bi-
locally isomorphic.
Proof. The "if" part is trivial by the discussion above. To prove the "only if"
part, let G1 and G2 be two locally equivalent graphings; we want to prove that they
are bi-locally isomorphic. They define the same involution invariant distribution
σ on G•, and so they are both locally equivalent to the Bernoulli graphing Bσ.
Lemma 18.58 implies that they are both bi-locally isomorphic to Bσ. Corollary
18.57 implies that they are bi-locally isomorphic. □
Convergence of a graph sequence with bounded degree was perhaps the first
notion of graph convergence to be formally defined (Benjamini and Schramm
[2001]), but it is a more complex notion than convergence in the dense case. There
are several non-equivalent reasonable definitions, which capture different aspects
of the notion that graphs in a sequence are becoming "more and more similar" to
each other. We treat two such notions in this chapter.
dvar(ρG,r, φr) ≤ ε/6 and dvar(ρG′,r, φ′r) ≤ ε/6 with high probability. We claim
that A = Σ_{r=0}^k 2^{−r} dvar(φr, φ′r) is a good estimate of δ⊙(G, G′). Indeed, with
high probability,
|δ⊙(G, G′) − A| ≤ Σ_{r=0}^k 2^{−r} dvar(ρG,r, φr) + Σ_{r=0}^k 2^{−r} dvar(ρG′,r, φ′r)
+ Σ_{r=k+1}^∞ 2^{−r} dvar(ρG,r, ρG′,r).
Here the first term is bounded by (1 + 1/2 + · · · + 1/2^k)ε/6 < ε/3, and a similar
bound applies for the second term. The last term is bounded by 1/2^{k+1} + 1/2^{k+2} +
· · · = 1/2^k ≤ ε/3. So the total error is less than ε.
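The estimate A is straightforward to compute once the empirical r-ball distributions are in hand. A small sketch (our own code; distributions are modeled as dicts mapping ball types to frequencies, and the ball-type names are hypothetical):

```python
def d_var(p, q):
    """Total variation distance of two distributions given as dicts
    mapping r-ball types to probabilities."""
    keys = set(p) | set(q)
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys) / 2

def sampling_distance_estimate(phi, phi_prime):
    """A = sum_{r=0}^{k} 2^{-r} d_var(phi_r, phi'_r), where phi and
    phi_prime are lists of empirical r-ball distributions."""
    return sum(d_var(p, q) / 2 ** r
               for r, (p, q) in enumerate(zip(phi, phi_prime)))

phi = [{'path1': 1.0}, {'path3': 0.5, 'star3': 0.5}]
phi_prime = [{'path1': 1.0}, {'path3': 1.0}]
assert sampling_distance_estimate(phi, phi_prime) == 0.25
```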
Often we have to compare two graphings G1 , G2 that are defined on the same
Borel graph G, and only differ in the invariant distributions π1 , π2 on them. In this
case the sampling distance can be bounded by the variational distance of π1 and
π2; it is easy to see that for every r ≥ 1, we have
(19.4) δ⊙^r(G1, G2) ≤ dvar(π1, π2), and δ⊙(G1, G2) ≤ dvar(π1, π2).
We will also need the edit distance of graphs/graphings on the same node set.
For two graphs G, G′ ∈ G with V(G) = V(G′) = [n], this is defined as
d1(G, G′) = (1/n)|E(G) △ E(G′)|.
The difference from the dense case is in the normalization. (We will not need the
“best overlay” version δ1 .) To extend the edit distance to two graphings G, G′ with
V(G) = V(G′) = [0, 1], there is a little subtlety. To "count" the edges to be edited,
we use the edge measure defined by (18.5); but the two graphings have different
edge measures, so which one should we use? After a little thought, the solution is
natural:
d1 (G, G′ ) = ηG (E(G) \ E(G′ )) + ηG′ (E(G′ ) \ E(G)).
We note that ηG (E(G) ∩ E(G′ )) = ηG′ (E(G′ ) ∩ E(G)) (Exercise 18.26).
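For finite graphs the definition is just a normalized symmetric difference; here it is spelled out to emphasize the normalization by n rather than n² (our own sketch; edges are stored as frozensets):

```python
def d1(edges_G, edges_H, n):
    """Edit distance of two graphs on the same node set [n]: the size of the
    symmetric difference of the edge sets, normalized by n (not by n^2 as in
    the dense case)."""
    return len(edges_G ^ edges_H) / n

# C_4 versus the path 0-1-2-3 on the same four nodes: the edge sets differ
# in the single edge {3, 0}.
C4 = {frozenset(e) for e in [(0, 1), (1, 2), (2, 3), (3, 0)]}
P4 = {frozenset(e) for e in [(0, 1), (1, 2), (2, 3)]}
assert d1(C4, P4, 4) == 0.25
```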
An easy inequality between the edit distance and sampling distances is stated
in the following proposition.
Proposition 19.1. For any two graphings G and G′ on the same underlying prob-
ability space and r ∈ N, we have
δ⊙^r(G, G′) ≤ 2D^r d1(G, G′),
and
δ⊙(G, G′) ≤ 3 d1(G, G′)^{1/log(2D)}.
In particular, these bounds hold for finite graphs.
Proof. Let S = E(G) \ E(G′ ) and S ′ = E(G′ ) \ E(G).
Claim 19.2. Let x be a random point in G; then the probability that the ball
BG,r(x) contains an edge in S is bounded by 2D^r ηG(S).
The number of points x for which BG,r(x) contains a given edge is bounded by
2D^r (this follows by an elementary computation). In the finite case, this implies
the Claim by an easy double counting. For graphings, this double counting can be
justified using the Mass Transport Principle for graphings, Proposition 18.49. For
19.1. LOCAL CONVERGENCE AND LIMIT 353
two nodes x, y of G, let f(x, y) = degS(y) 1(x ∈ BG,r(y)) (where degS(y) denotes
the number of edges in S incident with y). Let x be a random point of V(G). Then
λ{x : E(BG,r(x)) ∩ S ≠ ∅} ≤ E( |E(BG,r(x)) ∩ S| ) ≤ E( Σ_{y∈BG,r(x)} degS(y) )
= E( Σ_y f(x, y) ) = E( Σ_y f(y, x) ) ≤ 2D^r E( degS(x) ) = 2D^r η(S).
nodes at distance more than r − 1 from the root u; two, we delete all the nodes at
distance more than r − 1 from v, and consider v the root and vu the root edge. If we
get the same distribution on (r − 1)-balls with a root edge with both constructions,
and this holds for every r ≥ 1, we say that the sequence (σ1, σ2, . . . ) is involution
invariant. To sum up, every convergent graph sequence gives rise to an involution
invariant and consistent probability measure on Br.
We have defined involution invariance for measures on the “graph of graphs”,
and of course the two notions are closely related. From every probability distribu-
tion σ on (G• , A), we get a probability distribution σr on Br by selecting a random
countable graph from σ and taking the r-ball about its root. It is trivial that this
sequence (σ1 , σ2 , . . . ) is consistent.
Conversely, from every consistent sequence (σ1 , σ2 , . . . ) we get a distribution σ
on (G• , A), by defining σ(G•F ) = σr (F ) for every r-ball F . It is also straightforward
to check that (σ1 , σ2 , . . . ) is involution invariant if and only if σ is.
So there is a bijective correspondence between consistent involution invariant
sequences (σ1 , σ2 , . . . ), where σr is a distribution on Br , and involution invariant
probability distributions on (G• , A). Through this correspondence, every locally
convergent graph sequence gives rise to an involution invariant distribution σ on
the sigma-algebra (G• , A). This is the Benjamini–Schramm limit or local limit of
the sequence.
By Theorem 18.37, it follows that there is a graphing G such that ρGn ,r → ρG,r
for every r ≥ 1. We write Gn → G, and say that this graphing “represents” the
limit; but one should be careful not to call it “the” limit; all locally equivalent
graphings represent the same limit object.
Example 19.3 (Cycles III). Consider the sequence of cycles (Cn ). It is easy
to see that the Benjamini–Schramm limit is the involution invariant distribution
concentrated on the two-way infinite path (with any node specified as the root).
The graphing Ca constructed in Example 18.1 represents the limit of this sequence
for any irrational number a. All connected components of this graphing are two-way
infinite paths, so generating a random point x ∈ [0, 1], its connected component
(Ca)x has the Benjamini–Schramm limit distribution.
Every graphing locally equivalent to Ca (i.e., in which almost all connected
components are two-way infinite paths) provides a representation of the limit object.
Example 18.54 shows two different graphings representing this limit.
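The claim that every r-ball of a long cycle is a rooted path on 2r + 1 nodes is easy to check by BFS (our own sketch; the cycle is given as an adjacency dict):

```python
def ball_nodes(adj, root, r):
    """Distances of all nodes within r of root, by BFS."""
    dist = {root: 0}
    queue = [root]
    for u in queue:
        if dist[u] < r:
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
    return dist

n, r = 100, 3
cycle = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}
# For n > 2r + 1, the r-ball around every node of C_n is a path on 2r + 1
# nodes rooted at its middle, so rho_{C_n, r} is concentrated on one ball,
# matching the r-ball of the two-way infinite path.
for v in range(n):
    dist = ball_nodes(cycle, v, r)
    assert len(dist) == 2 * r + 1
    assert sorted(dist.values()) == [0, 1, 1, 2, 2, 3, 3]
```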
Example 19.4 (Grids). Let Gn be the n×n grid in the plane. The r-neighborhood
of a node v is a (2r + 1) × (2r + 1) grid (rooted in the middle), provided v is
farther than r − 1 from the boundary. This holds for (n − 2r)2 of the nodes, which
means almost all nodes if n → ∞. So in the weak limit, every r-neighborhood is
a (2r + 1) × (2r + 1) grid. Hence the Benjamini–Schramm limit of this sequence
is concentrated on the infinite square grid (with a root). We have seen (Example
18.38) how to represent this involution invariant distribution as a graphing.
Example 19.5 (Penrose tilings). This is a more elaborate example, but interest-
ing in many respects. We can tile the plane with the two rhomboids of the left side
of Figure 19.1. This is no big deal, if we can use them periodically (for example,
as in the middle of Figure 19.1); but we put decorations on the edges, and impose
the restriction that these decorations must match along every common edge (as on
the right side of Figure 19.1); in particular, we are not allowed to combine two of
the same kind into a single parallelogram. It turns out that you can tile the whole
plane this way (in fact, in continuum many ways), but there is no periodic tiling.
Figure 19.2 shows the graph obtained from a Penrose tiling of the plane. There
is a related (in fact, equivalent) version, in which we use two deltoids instead of
two rhomboids; such a tiling is also shown in Figure 19.2. A deltoid tiling can be
obtained from a rhomboid tiling by cutting up the rhomboids into a few pieces and
recombining these to form deltoids. (To figure out the details is left as a challenge
to the reader.)
One of the interesting (and nontrivial) features of such tilings is that every
one of them contains each of the two rhomboids with the same frequency. A similar
property holds for every configuration of rhomboids: if a finite configuration F of
tiles can be completed to a tiling at all, then this configuration occurs in every
Penrose tiling with the same frequency. To be precise, if we take a K × K square
about the origin in the plane, and count how many copies of F it contains, then
this number, divided by K 2 , tends to a limit if K → ∞. Moreover, this limit is
independent of the Penrose tiling that we are studying.
We are not going to dive into the fascinating theory of Penrose tilings, but
point out that their basic properties can be translated into graph limits. Let Gn
be the graph obtained by restricting the graph of a Penrose rhomboid tiling to the
n×n square about the origin. The above properties of the Penrose tiling imply that
this sequence is convergent, and in fact it remains convergent if we interlace it with
a sequence obtained from a different Penrose tiling. In other words, these finite
pieces of any Penrose tiling converge to the same limit. The Benjamini–Schramm
limit will not be the original Penrose tiling, but a probability distribution on all
Penrose tilings. (This illuminates that in Example 19.4 of grids we end up with a
single limiting grid only because grids are periodic.)
356 19. CONVERGENCE OF BOUNDED DEGREE GRAPHS
By computations similar to the above, we can see that random D-regular bipartite
graphs tend to the same local limit as random D-regular graphs, namely the infinite
D-regular rooted tree.
19.1.3. Which distributions are limits? A big difference from the dense
case is that there is no easy way to construct a sequence of finite graphs that
converges to a given graphing (or involution invariant distribution). In fact, we
don’t know whether all involution invariant distributions arise as limit objects:
Conjecture 19.8 (Aldous–Lyons [2007]). Every involution invariant distribution
on (G• , A) is the limit of a locally convergent bounded-degree graph sequence.
Since every involution invariant distribution can be represented by a graphing
(Theorem 18.37), this is equivalent to asking whether every graphing is the local
limit of a locally convergent sequence of bounded-degree graphs. This conjecture,
which is a central unsolved problem in the limit theory of bounded-degree graphs,
generalizes a long-standing open problem about sofic groups. It is known in some
special cases: when the distribution is concentrated on trees (Bowen [2004], Elek
[2010b]; see Exercise 19.12), and also when the graphing is “hyperfinite” (to be
discussed in Section 21.1).
The following is an interesting reformulation of this conjecture. Let Ar ⊆ R^{Br}
denote the set of all probability distributions ρG,r, where G ranges through all finite
graphs. Let A′r ⊆ R^{Br} denote the set of probability distributions ρG,r, where G
ranges through all graphings. Equivalently, A′r consists of probability distributions
on Br induced by an involution invariant probability distribution on G•. Clearly
Ar ⊆ A′r.
Proposition 19.9. (a) The closure of Ar is a compact convex set. (b) A′r is a
compact convex set.
While most of the time the limit theory of graphs with bounded degree is more
complicated than the dense theory, Proposition 19.9 represents an opposite case: in
the dense case, even the set D2,3 discussed in Section 16.3.2 was non-convex with
a complicated structure.
Proof. (a) Let G1 and G2 be two finite graphs, and consider the disjoint union
G consisting of v(G2) copies of G1 and v(G1) copies of G2. The two parts have
the same number of nodes, and hence
ρG,r(B) = (1/2)( ρG1,r(B) + ρG2,r(B) )
for every r-ball B. This implies that the closure of Ar is convex. Since it is a
bounded closed set in a finite dimensional space, it is compact.
(b) The fact that A′r is closed follows from general considerations: the set M of
involution-invariant measures, as a subset of the set of all probability measures on
the compact metric space G• , is closed in the weak topology, and so it is compact.
Using that each of the cylinders G•F is open-closed, the projection of M onto R^{Br} is
continuous, and hence the image, which is just A′r, is compact. The convexity of
A′r follows by a construction similar to that in (a).
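The balancing trick in part (a) can be checked on a toy example. We use the root degree, the simplest local statistic, in place of full r-balls (our own sketch with exact rational arithmetic):

```python
from collections import Counter
from fractions import Fraction

def degree_distribution(adj):
    """Distribution of the root degree for a uniformly random root."""
    n = len(adj)
    return {d: Fraction(k, n)
            for d, k in Counter(len(nbrs) for nbrs in adj.values()).items()}

def disjoint_union(copies):
    """Disjoint union of (graph, multiplicity) pairs, relabeling nodes."""
    out, offset = {}, 0
    for adj, mult in copies:
        for _ in range(mult):
            for v, nbrs in adj.items():
                out[offset + v] = [offset + u for u in nbrs]
            offset += len(adj)
    return out

K3 = {0: [1, 2], 1: [0, 2], 2: [0, 1]}  # v(K3) = 3, every degree 2
P2 = {0: [1], 1: [0]}                   # v(P2) = 2, every degree 1

# v(P2) = 2 copies of K3 and v(K3) = 3 copies of P2 contribute 6 nodes each,
# so the local statistics mix with weight exactly 1/2 each.
G = disjoint_union([(K3, 2), (P2, 3)])
assert degree_distribution(G) == {2: Fraction(1, 2), 1: Fraction(1, 2)}
```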
The Aldous–Lyons Conjecture is equivalent to saying that Ar = A′r for every r.
So if the conjecture fails to hold, then there is an r ∈ N and a linear inequality on
R^{Br} that is valid for Ar but not for A′r. This would be a linear inequality between
r-neighborhood densities that holds for every finite graph, but fails for some
graphing, a "positive" consequence of a "negative" fact.
There is a finite version of the Aldous–Lyons conjecture, which was raised by
this author at a conference, and was proved, at least in a non-effective sense, quickly
by Alon [unpublished]:
Proposition 19.10. For every ε > 0 there is a positive integer n such that for
every graph G ∈ G there is a graph G′ ∈ G such that v(G′ ) ≤ n and δ⊙ (G, G′ ) ≤ ε.
Proof. Let r = ⌈log(2/ε)⌉, and let G1 , . . . , Gm be any maximal family of
graphs in G such that δ⊙^r(Gi, Gj) > ε/2 for all 1 ≤ i < j ≤ m. Such a family is
finite, since every graph is represented by a point in Ar, which is a bounded set
in a finite dimensional space, and these points are at least ε/2 apart in the total
variation distance. It follows that n = maxi v(Gi) is finite. By the maximality of
the family, for every graph G there is an i ≤ m such that δ⊙^r(G, Gi) ≤ ε/2. We
have v(Gi) ≤ n, and by (19.3)

δ⊙(G, Gi) ≤ 1/2^r + d⊙^r(G, Gi) ≤ 1/2^r + ε/2 ≤ ε.
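The maximal family in this proof can be produced by a greedy pass; the following sketch (illustrative names, a toy metric on the real line standing in for δ⊙^r) shows the packing-and-covering step: any maximal ε/2-separated family automatically covers everything within ε/2.

```python
def greedy_separated_family(points, dist, eps):
    """Greedily pick a maximal family of points at pairwise distance
    > eps/2; every point is then within eps/2 of some family member."""
    reps = []
    for p in points:
        if all(dist(p, q) > eps / 2 for q in reps):
            reps.append(p)
    return reps

# toy check on the real line with |x - y| as the distance
pts = [0.0, 0.1, 0.5, 0.55, 2.0]
reps = greedy_separated_family(pts, lambda a, b: abs(a - b), eps=0.4)
# maximality implies the covering property
assert all(min(abs(p - q) for q in reps) <= 0.2 for p in pts)
```

The non-effectiveness remarked on below is unaffected: the greedy pass bounds the number of representatives, not their size.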
Unfortunately, no effective bound on n follows from the proof (one can easily
get an explicit bound on m, the number of graphs in the representative family, but
not on the size of these graphs). It would be very interesting to give any explicit
bound (as a function of D and ε), or to give an algorithm to construct H from
G. Ideally, one would like to design an algorithm that would work locally, in the
sampling framework, similarly to the algorithm in Section 15.4.2 in the dense case.
Proposition 19.10 is related to the Aldous–Lyons Conjecture 19.8. Indeed, the
Aldous–Lyons Conjecture implies that for any graphing G there is a finite graph
G whose neighborhood distribution is arbitrarily close; Proposition 19.10 says that
for any finite graph G there is a finite graph H of bounded size whose neighborhood
distribution is arbitrarily close. Suppose that we have a constructive way of finding,
for an arbitrarily large graph G with bounded degree, a graph H of size bounded
by a function of r and ε that approximates the distribution of r-neighborhoods in
G with error ε. With luck, the same construction could also work with a graphing
in place of G, proving the Aldous–Lyons Conjecture.
One route to disproving the Aldous–Lyons Conjecture could be to explicitly find
the sets Ar and A′r for some r, and see that they are different. Since the dimension
of Ar grows very fast with r, it seems useful to consider even simpler questions.
Instead of looking at Ar and A′r , we could fix a finite set {F1 , . . . , Fm } of simple
graphs, assign the vector (t∗ (F1 , G), . . . , t∗ (Fm , G)) to every graph G ∈ G, and
consider the set T (F1 , . . . , Fm ) of all such vectors. We define the set T ′ (F1 , . . . , Fm )
analogously, replacing graphs by graphings. By the same argument as above, the
sets T (F1 , . . . , Fm ) and T ′(F1 , . . . , Fm ) are convex. The Aldous–Lyons Conjecture
is equivalent to saying that T (F1 , . . . , Fm ) = T ′(F1 , . . . , Fm ) for every F1 , . . . , Fm .
This leads us to the problem, very interesting in its own right, of determining the
sets T (F1 , . . . , Fm ) and T ′(F1 , . . . , Fm ), and more generally, to extremal problems
for bounded degree graphs. This should be the title of a chapter, but very little
has been done in this direction. There are, of course, many results in extremal
graph theory that concern graphs with bounded degree; but the limit theory of
bounded degree graphs has not been applied to extremal graph theory in a sense
in which the limit theory of dense graphs has been. One notable exception is the
result of Harangi [2012], who determined the sets T (K3 , K4 ) and T ′(K3 , K4 ) for
D-regular graphs. He found the same answer in both cases (so this did not give a
counterexample to the conjecture).
Exercise 19.13. Prove that after merging two node colors or two edge colors,
every convergent colored graph sequence remains convergent.
This example suggests that in the limit object, the underlying σ-algebra carries
combinatorial information. This is in stark contrast with the dense case (cf. Remark
10.1 and the discussion in that section).
In this section we define a notion of convergence for graphs with bounded
degree that is stronger than the local convergence (Hatami, Lovász and Szegedy
[2012]). Among others, if a sequence of graphs is convergent in this stronger sense,
then we can read off from the limit whether the graphs are expanders (up to a
non-expanding part of negligible size).
instead of a minimum:

(19.5) δ⊙^{(r,k)}(G1, G2) = inf{ c : ∀α1 ∃α2 δ⊙^r((G1, α1), (G2, α2)) ≤ c, and ∀α2 ∃α1 δ⊙^r((G1, α1), (G2, α2)) ≤ c }.
The quantity δ⊙^{nd}(G, G′) is defined from this just as in the case of graphs.
We say that two graphings G and G′ are locally-globally equivalent if
δ⊙^{nd}(G, G′) = 0. A sequence of graphs (Gn) is locally-globally convergent if it is
a Cauchy sequence in the distance δ⊙^{(r,k)} for every r, k ≥ 1.
δ⊙^r((G, β), (G, αi)) ≤ δ⊙^r((G, β), Fi) + δ⊙^r(Fi, (G, αi)) ≤ ε/2 + ε/2 = ε.
Proof of Theorem 19.16. We apply Lemma 19.17 with ε = 2−r , and denote
M (k, r, 2−r ) by M (k, r). We fix a set of M (k, r) k-colorings as in Lemma 19.17 for
every graph G ∈ G, and call them its representative k-colorings.
Consider the product space K = ∏_{k,r=1}^∞ [k]^{M(k,r)}; this is compact and totally
disconnected. We start with constructing a decoration χ = χG : V (G) → K for
every G ∈ G. Given a node v ∈ V (G), we consider the representative k-colorings
α1 , . . . , αM (k,r) of G, and concatenate the sequences (α1 (v), . . . , αM (k,r) (v)) for
k, r = 1, 2, . . . to get χ(v).
Using the decoration χG and the projection map φk,r : K → [k]M (k,r) , we can
manufacture many k-colorings of G as β = ψ ◦ φk,r ◦ χ, where ψ : [k]M (k,r) → [k]
is any map. We call these k-colorings “special”. It follows from the construction
of χ that the representative k-colorings of G are special. Hence for every graph G,
every k, r ≥ 1, and every k-coloring α of V (G), there is a special k-coloring β close
to α, in the sense that δ⊙((G, α), (G, β)) ≤ 2^{−r}.
The graphing HK we construct is similar to the “Graph of Weighted Graphs”
H^+ introduced in Section 18.3.3, but instead of [0, 1], we use weights from K. We
construct probability measures on HK to get representations of finite graphs and
then, representations of the limit. With the decoration χG , and any choice of a root
v ∈ V (G), the triple (G, v, χG ) is a point of HK . The map τG : v 7→ (G, v, χG )
defines an embedding G → HK onto a connected component of HK (the fact that
this map is injective is clear, since for any two nodes u, v ∈ V (G) one of the k-
colorings in Lemma 19.18 must distinguish them once r is large enough). Let ζG
be the uniform distribution on τG(V (G)). Since G is finite, this distribution is
involution-invariant on HK .
Let (Gn ) be a locally-globally convergent graph sequence. By Prokhorov’s
Theorem (see Appendix A.3.3), we can replace our graph sequence by a subsequence
such that the distributions ζGn converge weakly to a distribution ζ on HK . Since
every ζGn is involution-invariant, so is ζ, and hence G = (HK , ζ) is a graphing.
We claim that Gn → G in the local-global sense. To prove this convergence,
we need the following auxiliary fact.
The main observation we need is that the integrand is continuous. Indeed, suppose
that Hn → H in the topology of HK , where Hn , H ∈ HK are rooted K-decorated
countable graphs. Then for a sufficiently large n, the balls BHn ,r and BH,r are
isomorphic, and moreover, there is an isomorphism σn : BH,r → BHn,r such
that χHn(σn(x)) → χH(x) for every x ∈ V (BH,r). This means that χH(x) and
χHn(σn(x)) agree in more and more coordinates as n grows, which implies that
β(σn(x)) → β(x), since β is continuous. Since β has finite range, this implies
that β(σn(x)) = β(x) if n is large enough. But then 1(BG,β,r(Hn) ≅ B0) =
1(BG,β,r(H) ≅ B0) if n is large enough, which proves that the integrand is contin-
uous.
Hence it follows by the weak convergence ζGn → ζ that
∫_{HK} 1(BG,β,r ≅ B0) dζGn −→ ∫_{HK} 1(BG,β,r ≅ B0) dζ,
which proves that ρGn ,βn ,r (B0 ) → ρG,β,r (B0 ) for every k-colored r-ball B0 . This
proves the claim.
Let us return to the proof of the local-global convergence Gn → G. By the
definition of the nondeterministic sampling distance, we have to verify two things
for every r, k ≥ 1: every k-coloring of Gn can be “matched” by a k-coloring of G so
that the distributions of r-neighborhoods are close, and vice versa. Let ε > 0; we
may assume that ε ≥ 2^{−r}, since larger neighborhoods are more difficult to match.
First, let α be a Borel k-coloring of G. Then by Lemma 19.17, there is another
Borel k-coloring β such that β is continuous in the topology of HK and α = β on
a set of measure at least 1 − ε(2D)^{−r}. Then δ⊙^r((G, α), (G, β)) ≤ ε/2 by (19.4).
For every n, the k-coloring β gives a k-coloring βn of the nodes of Gn, under the
embedding τGn. By Claim 19.19, we have δ⊙^r((Gn, βn), (G, β)) ≤ ε/2 if n is large
enough. This implies that δ⊙^r((Gn, βn), (G, α)) ≤ ε.
Second, let n be large enough so that for all m ≥ n, we have δ⊙^{(r,k)}(Gn, Gm) ≤
ε/3, and let αn be a k-coloring of Gn. Then for every m ≥ n there is a k-coloring
αm of Gm such that δ⊙^r((Gn, αn), (Gm, αm)) ≤ ε/3. Furthermore, there is a special
k-coloring βm = ψm ◦ φk,r ◦ χGm of Gm (with an appropriate ψm : [k]^{M(k,r)} → [k])
such that δ⊙^r((Gm, αm), (Gm, βm)) ≤ ε/3. It follows that δ⊙^r((Gn, αn), (Gm, βm)) ≤
2ε/3. The map β = ψm ◦ φk,r ◦ χ is continuous, and on τGm(v) it coincides
with βm(v). Claim 19.19 implies that δ⊙^r((Gm, βm), (G, β)) → 0. Hence
δ⊙^r((Gn, αn), (G, β)) ≤ ε if n is large enough.
Exercise 19.20. Let F1 and F2 be two finite graphs and let GF1 and GF2 denote
the associated graphings. Prove that δ⊙^{nd}(F1, F2) = δ⊙^{nd}(GF1, GF2).
Exercise 19.21. For an (uncolored) graph G, let Qr,k(G) denote the set of all
neighborhood distributions ρG∗,r, where G∗ is a k-colored version of G. Prove
that

δ⊙^{(r,k)}(G, G′) = d^{Haus}_{var}(Qr,k(G), Qr,k(G′)).
Exercise 19.22. (a) Let (Gn ) be a locally-globally convergent graph sequence.
Prove that the numerical sequences α(Gn )/v(Gn ) and Maxcut(Gn )/v(Gn ) are con-
vergent. (b) Show by an example that this does not hold for every locally conver-
gent sequence.
CHAPTER 20

Right Convergence of Bounded Degree Graphs
graphon. Let G = (V, E) be a simple graph and let H be a weighted graph with
nonnegative edgeweights.
We have considered random maps φ : V (H) → V where the probability of φ
was proportional to αφ . It is also quite natural to bias these with the product of
the edgeweights. In other words, let the probability of φ be
(20.1) πG,H(φ) = αφ homφ(G, H) / hom(G, H).
In the special case when H is a looped-simple unweighted graph, this is the uniform
distribution on the set Hom(G, H).
Example 20.1 (Ising model). Recall the example from the Introduction (Section
2.2). There is a very large graph G (most often, a grid) whose nodes are the atoms
and whose edges are bonds between these atoms. There is a small graph H, whose
nodes represent the possible states of an atom. (In the case of the Ising model, H
has two nodes only, representing the spins “UP” and “DOWN”.) The nodeweights
αi = e−hi represent the influence of an external field on an atom in state i, and
the edgeweights βij = e−Jij represent the interaction energy between two adjacent
atoms in states i and j (we ignore the dependence on the temperature for this
discussion). A possible configuration is a map σ : V (G) → V (H), and its energy is
H(σ) = −∑_{u∈V(G)} h_{σ(u)} − ∑_{uv∈E(G)} J_{σ(u),σ(v)}.
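As a toy illustration (not from the text), one can compute such a Gibbs measure P(σ) ∝ e^{−H(σ)} by brute force on a small graph; the field and coupling values below are made up, and the sign convention is the standard physics one.

```python
import math
from itertools import product

def gibbs(nodes, edges, h, J, states):
    """Brute-force Gibbs distribution: P(sigma) is proportional to
    exp(-H(sigma)), with H(sigma) = -sum_u h[sigma(u)] - sum_{uv} J[...]."""
    weights = {}
    for sigma in product(states, repeat=len(nodes)):
        s = dict(zip(nodes, sigma))
        energy = -sum(h[s[u]] for u in nodes) \
                 - sum(J[s[u]][s[v]] for u, v in edges)
        weights[sigma] = math.exp(-energy)
    Z = sum(weights.values())   # partition function
    return {sigma: w / Z for sigma, w in weights.items()}

# Ising model on a 4-cycle: two states, ferromagnetic coupling (toy values)
nodes = [0, 1, 2, 3]
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
h = {"UP": 0.2, "DOWN": -0.2}                 # external field (illustrative)
J = {"UP": {"UP": 0.5, "DOWN": -0.5},
     "DOWN": {"UP": -0.5, "DOWN": 0.5}}       # interaction energies
pi = gibbs(nodes, edges, h, J, ["UP", "DOWN"])
# aligned configurations are the most likely ones for these values
assert max(pi, key=pi.get) in {("UP",) * 4, ("DOWN",) * 4}
```

The exponential state space (here 2^4 configurations) is exactly why the Markov-chain samplers discussed below matter for large G.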
(The condition x|S = y may have probability 0, but the formula works.) In the
special case when S = V \ {v}, the distribution πy can be identified with a distri-
bution on [0, 1], which we denote by πy,v. It will be important to notice that in this
case the distribution πy,v is determined by the restriction of y to NG(v).
Is there a more tangible way of defining this distribution? A general tech-
nique of generating random elements of complicated distributions and studying
their properties is to construct a Markov chain with the given stationary distribu-
tion. In this case, there is a rather simple Markov chain M on weightings in [0, 1]V
with this property. (In the special case when W = WKq , this will specialize to
the “heat-bath” chain, or “Glauber dynamics”, on q-colorings of G.) One step of
this Markov chain is described as follows: Given a weighting x, we select a uniform
random node v ∈ V (which we call the pivot node) and reweight it from the distri-
bution πx,v . All other nodeweights remain unchanged. It is not hard to check that
πG,W is a stationary distribution of this Markov chain.
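Specializing the chain M to W = WKq gives the heat-bath step on proper q-colorings described above; a minimal sketch (the graph representation is ad hoc):

```python
import random

def glauber_step(g, coloring, q, rng=random):
    """One heat-bath step: pick a uniform pivot node and resample its
    color uniformly from the colors not used by its neighbours."""
    v = rng.choice(list(g))
    forbidden = {coloring[u] for u in g[v]}
    allowed = [c for c in range(q) if c not in forbidden]
    coloring = dict(coloring)          # keep the step functional
    coloring[v] = rng.choice(allowed)
    return coloring

# 4-cycle, q = 5 colors (so q > 2D with D = 2, and 'allowed' is never empty)
g = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
col = {0: 0, 1: 1, 2: 0, 3: 1}
rng = random.Random(1)
for _ in range(100):
    col = glauber_step(g, col, 5, rng)
    assert all(col[u] != col[v] for u in g for v in g[u])  # stays proper
```

Resampling from the conditional distribution at the pivot is what makes πG,W stationary.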
Let us fix a set U ⊆ V and its complement Z = V \ U . We can modify
the Markov chain M by selecting the pivot node v from Z only. This modified
Markov chain preserves the weighting of U ; if we restrict it to the extensions of a
partial weighting a ∈ [0, 1]U , then we get a Markov chain Ma , whose stationary
distribution is πa .
Next, we define a Markov chain M2 on pairs (x, y) ∈ [0, 1]V × [0, 1]V . Given
(x, y), we generate a random pivot node v ∈ Z and modify both x and y according
to M, separately but not independently: using the same pivot node, we generate
a random weight x′ from the distribution πx,v, and a random weight y′ from the
distribution πy,v, and couple x′ and y′ optimally, so that P(x′ ≠ y′) = dtv(πx,v, πy,v).
We change the weight of v in x to x′, and in y to y′. Note that for fixed a, b ∈ [0, 1]^U,
the set of pairs of weightings (x, y) with x|U = a and y|U = b is invariant. Let Ma,b
denote the Markov chain restricted to such pairs.
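The coupled update in M2 needs an optimal (maximal) coupling realizing P(x′ ≠ y′) = dtv. For finite distributions this coupling is explicit; a sketch (the dictionary representation and helper names are an implementation choice):

```python
import random

def maximal_coupling(p, q, rng=random):
    """Sample (x, y) with x ~ p, y ~ q and P(x != y) = d_tv(p, q).
    p, q: dicts value -> probability over a common finite set."""
    support = set(p) | set(q)
    common = sum(min(p.get(v, 0), q.get(v, 0)) for v in support)
    if rng.random() < common:
        # draw from the (normalized) overlap: both coordinates agree
        x = _draw({v: min(p.get(v, 0), q.get(v, 0)) / common
                   for v in support}, rng)
        return x, x
    # draw from the normalized residuals: coordinates disagree
    x = _draw({v: (p.get(v, 0) - q.get(v, 0)) / (1 - common)
               for v in p if p.get(v, 0) > q.get(v, 0)}, rng)
    y = _draw({v: (q.get(v, 0) - p.get(v, 0)) / (1 - common)
               for v in q if q.get(v, 0) > p.get(v, 0)}, rng)
    return x, y

def _draw(dist, rng):
    r, acc = rng.random(), 0.0
    for v, pr in dist.items():
        acc += pr
        if r < acc:
            return v
    return v  # guard against floating-point rounding

# the disagreement frequency should match d_tv(p, q) = 0.3
p = {"a": 0.5, "b": 0.5}
q = {"a": 0.8, "b": 0.2}
rng = random.Random(0)
diff = sum(x != y for x, y in (maximal_coupling(p, q, rng)
                               for _ in range(20000))) / 20000
assert abs(diff - 0.3) < 0.03
```

This is the one-coordinate coupling used at the pivot node; the chain M2 applies it repeatedly with a shared pivot.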
The stationary distribution of this Markov chain is difficult to construct di-
rectly, but at least it exists:
Lemma 20.2. The Markov chain Ma,b has a stationary distribution with marginals
πa and πb .
This is trivial if Ma,b has a finite number of states (which happens if W is a
stepfunction, i.e., we are studying homomorphisms into a finite weighted graph).
For the general case, the proof follows by more advanced arguments in probability
theory, and is not given here (see Lovász [Notes]).
These Markov chains (especially the simplest chain M) are quite important in
simulations in statistical physics and also in theoretical studies. A lot of work has
been done on their mixing times and other properties. For us, however, the main
consequence of their introduction will be the existence of the stationary distribution
of Ma,b .
20.1.2. Correlation decay. Our next goal is to state and prove the fact,
mentioned above, that πG,W has no long-range interaction: under appropriate conditions, the weights of two distant nodes in a random W -weighting from πG,W are
essentially independent. We start with an easy observation (the verification is left
to the reader as an exercise).
Proposition 20.3. Let G = (V, E) be a simple graph, and let W be a graphon
of rank 1. Then πG,W is a product measure on [0, 1]V . In other words, if x is a
where x, y ∈ [0, 1]^{[r]} range through weightings of the leaves of Sr+1 that differ only
for a single node u ∈ [r]. If dob(W ) is small, then changing the weight of a neighbor
of a node has little influence on the weight of the node. Changing the weight of
neighbors one by one, we get by induction that for any graph G ∈ G, node v ∈ V (G)
and x, y ∈ [0, 1]V , we have
(20.5) dtv(πx,v, πy,v) ≤ dob(W) |{u ∈ N(v) : xu ≠ yu}|.
Theorem 20.4. Let G = (V, E) be a (finite) graph with all degrees bounded by D,
and let W be a graphon. Then for any partition V = Z ∪ U and any two maps
a, b ∈ [0, 1]U , the distributions πa and πb have a coupling κ such that for every node
v ∈ Z and every pair (x, y) of random W -weightings from the distribution κ, we
have
P(xv ≠ yv) ≤ (dob(W)D)^{d(v,U)},
where d(v, U ) denotes the distance of v from U in G.
What is important in this theorem is that it gives an exponentially decaying
correlation between the weight of v and the weights of nodes far away, provided
dob(W ) < 1/D.
Proof. We assume that dob(W ) < 1/D (else, there is nothing to prove). Let
κ be the stationary distribution of the Markov chain Ma,b with marginals πa and
πb . So κ is a coupling of these distributions.
Let x, y ∈ [0, 1]V , and let (x′ , y ′ ) be obtained from (x, y) by making one step of
Ma,b , using a random pivot node v ∈ Z. Let n = |Z|. Then for any node w ∈ Z,
(20.6) P(x′w ≠ y′w) = ((n − 1)/n) P(x′w ≠ y′w | v ≠ w) + (1/n) P(x′w ≠ y′w | v = w).
Here P(x′w ≠ y′w | v ≠ w) = 1(xw ≠ yw) (since nothing changes at w under this
condition), and, by the optimal coupling at the pivot,

P(x′w ≠ y′w | v = w) = dtv(πx,w, πy,w).
Substituting in (20.6), we get
(20.7) P(x′w ≠ y′w) = ((n − 1)/n) 1(xw ≠ yw) + (1/n) dtv(πx,w, πy,w).
Now let (x, y) be a random pair from κ, and average (20.7) over x and y, to
get
(20.8) P(x′w ≠ y′w) = ((n − 1)/n) P(xw ≠ yw) + (1/n) E(dtv(πx,w, πy,w)).
By the definition of stationary distribution, (x′ , y′ ) has the same distribution as
(x, y), and hence P(x′w ≠ y′w) = P(xw ≠ yw). Substituting in (20.8), we get

(20.9) P(xw ≠ yw) = E(dtv(πx,w, πy,w)).
So far, we have not used the Dobrushin parameter dob(W ). By (20.5), we get
(20.10) E(dtv(πx,w, πy,w)) ≤ dob(W) ∑_{u∈N(w)} P(xu ≠ yu).
Define f (u) = P(xu ̸= yu ). We have f (u) ∈ {0, 1} if u ∈ U , and (20.9) and (20.10)
imply that
(20.11) f(w) ≤ dob(W) ∑_{u∈N(w)} f(u)
holds for all w ∈ Z. Inequality (20.11) says that the function f is strictly subhar-
monic at the nodes of Z. It is easy to derive from this fact an estimate on f . Let us
start a random walk (v 0 = v, v 1 , . . . ) on G from v ∈ Z, and let T be the (random)
time when this random walk hits U (if the connected component of v does not
intersect U , then f = 0 on this connected component and the conclusion below is
trivial). Consider the random variables Xt = f(v^t)(dob(W)D)^t. It follows from
(20.11) that these form a submartingale, and hence by the Martingale Stopping
Theorem A.11, we get
f(v) = X0 ≤ E(XT) = E((dob(W)D)^T f(v^T)) ≤ E((dob(W)D)^T).

Since trivially T ≥ d(v, U), this completes the proof.
It is important that the coupling κ constructed above is independent of the
node v. This means that if we want to estimate the probability that x|S ̸= y|S for
some subset S ⊆ Z, then we get the same coupling distribution κ, and so we can
use the union bound:
Corollary 20.5. Under the conditions of Theorem 20.4, every S ⊆ Z satisfies
P(x|S ≠ y|S) ≤ (dob(W)D)^{d(S,U)} |S|.
Let us formulate some other consequences. First, consider proper q-colorings
of G, i.e., homomorphisms G → Kq . For Sr+1 in the definition of the Dobrushin
parameter, let φ and ψ be two q-colorings of the leaves that differ at node 1 only.
Then πφ,0 is the uniform distribution on the set [q]\φ([r]), and πψ,0 has an analogous
description. These sets have symmetric difference at most 2, and hence their total
variation distance is at most 1/(q − D). So dob(WKq ) < 1/D is satisfied if q > 2D,
and we get:
Corollary 20.6. Let G = (V, E) be a graph with all degrees bounded by D, and let
q > 2D. Then for any U ⊆ V , any two proper q-colorings α and β of G[U ], and
any v ∈ V \ U , the random extensions φ and ψ of α and β to proper q-colorings of
G satisfy
dtv(φ(v), ψ(v)) ≤ (D/(q − D))^{d(v,U)}.
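As a brute-force sanity check of this kind of decay (a toy instance, not from the text): fix two different colors at one end of a path (so D = 2) and compare the induced color distributions at the other end.

```python
from itertools import product

def extension_dist(path_len, q, boundary_color):
    """Distribution of the color of the last node of a path with path_len
    edges, over uniform proper q-colorings with the first node fixed."""
    counts = [0] * q
    for cols in product(range(q), repeat=path_len):
        cols = (boundary_color,) + cols
        if all(cols[i] != cols[i + 1] for i in range(path_len)):
            counts[cols[-1]] += 1
    total = sum(counts)
    return [c / total for c in counts]

q, D = 5, 2   # q > 2D on a path (maximum degree 2)
for d in (2, 3, 4):
    p0 = extension_dist(d, q, boundary_color=0)
    p1 = extension_dist(d, q, boundary_color=1)
    tv = sum(abs(a - b) for a, b in zip(p0, p1)) / 2
    assert tv <= (D / (q - D)) ** d   # the (D/(q-D))^{d(v,U)} bound
```

On the path the actual decay is even faster than the bound, which is what the corollary guarantees in general.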
Corollary 20.7. Let G = (V, E) be a simple graph with all degrees bounded by
D, and let H be a looped-simple graph with 2D∆(H) < v(H). Then for any subset
U ⊆ V , any two homomorphisms α, β : G[U ] → H, and any v ∈ V \U , the uniform
random extensions φ and ψ of α and β to homomorphisms G → H, restricted to
the node v, satisfy
dtv(φ(v), ψ(v)) ≤ (D∆(H)/(v(H) − D∆(H)))^{d(v,U)}.
(20.12) ∆(W) = sup_{x∈[0,1]} ∫_0^1 W(x, y) dy.
dob(W) ≤ ∆(W)/(1 − D∆(W)).
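For a stepfunction graphon, ∆(W) and the bound above are finite computations; a small sketch with illustrative weights (only meaningful when D∆(W) < 1):

```python
def delta_stepfunction(B, a):
    """Delta(W) for the stepfunction graphon of a weighted graph with
    edge weights B[i][j] and step measures a[i] (sum(a) = 1):
    sup_x int W(x, y) dy = max_i sum_j B[i][j] a[j]."""
    return max(sum(bij * aj for bij, aj in zip(row, a)) for row in B)

def dob_upper_bound(B, a, D):
    """The bound dob(W) <= Delta(W) / (1 - D * Delta(W))."""
    d = delta_stepfunction(B, a)
    assert D * d < 1, "bound only meaningful when D * Delta(W) < 1"
    return d / (1 - D * d)

# two equal steps, small edge weights (illustrative values)
B = [[0.1, 0.2], [0.2, 0.1]]
a = [0.5, 0.5]
assert abs(delta_stepfunction(B, a) - 0.15) < 1e-12
print(dob_upper_bound(B, a, D=3))   # Delta = 0.15, bound = 0.15/0.55
```
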
g(x) = ∏_{i=2}^r W(x, zi) = ∏_{i=2}^r W(x, wi) and s(x) = ∫_0^1 g(y)W(x, y) dy.
The density functions of the distributions πz,0 and πw,0 are g(x)W (x, z1 )/s(z1 ) and
g(x)W (x, w1 )/s(w1 ), respectively, and hence
(20.13) dtv(πz, πw) = (1/2) ∫_0^1 g(x) |W(x, z1)/s(z1) − W(x, w1)/s(w1)| dx.
(πS ) of distributions satisfying these consistency relations, then the Extension The-
orem of Kolmogorov gives us a probability distribution π on all W -weightings such
that π|S = πS for all finite sets S ⊆ V (G).
So far, this is quite simple. There are many ways to specify such a family (πS )
of distributions. However, we would like other conditions to be satisfied. Let us
formulate two:
• Markov property. If G1 and G2 are two finite k-labeled graphs, φ is
a random W -weighting of G1 G2 , and we condition on φ|[k] , then φ|V (G1 ) and
φ|V (G2 ) become independent. This is just another way to express the product
formula (5.53). This property can be generalized to infinite graphs. Of course, we
have to exercise some care, since G1 and G2 may be infinite. Let S ⊆ V (G) be a
finite set and suppose that G \ S is the disjoint union of two graphs G1 and G2 . Let
z be a W -weighting of S, and let x denote a random W -weighting of V \ S from
the distribution obtained by conditioning on x|S = z. We require that the random
weightings x|V (G1 ) and x|V (G2 ) be independent. We say that the distribution of x
has the Markov property, if this condition holds for every finite subset S ⊆ V (G)
and every W -weighting z of S.
• Locality. For a finite set S ⊆ V (G), we would like to get a good idea
of the distribution πS by looking at a sufficiently large neighborhood of S. Let
B(S, r) = {v ∈ V (G) : d(v, S) ≤ r} be the r-neighborhood of S, and let xr
denote a random W -weighting of G[B(S, r)]. Then we want that xr |S → x|S in
distribution as r → ∞. We call the distribution of x local if this holds.
These conditions are not too strong, as the following classical theorem shows
(see [1988] and [1993] for slightly different statements of this fact).
Theorem 20.12. Let G be a countable graph with degrees bounded by D, and let
W be a graphon such that dob(W ) < 1/D. Then there is a unique local probability
distribution πG,W on W -weightings of G with the Markov property.
Proof. Let S ⊆ V (G) be a finite set, and let xr be a random W -weighting of
G[B(S, r)].
Claim 20.13. The distribution of xr |S tends to a limit as r → ∞.
We show that these distributions form a Cauchy-sequence in the total variation
distance. Let ε > 0. Since dob(W) < 1/D, we can choose r large enough so that
(D dob(W))^r ≤ ε/|S|. For m, n > r, we claim that the distributions of xm|S and
xn |S are ε-close in total variation distance. Let zn be the restriction of xn to
B(S, n) \ B(S, r), and let x′n be the random weighting of G[B(S, n)], obtained by
conditioning on zn . By the Markov property (we are using it for finite graphs here!),
x′n has the same distribution as xn . We define zm and x′m analogously.
Now we fix any two weightings zn of B(S, n) \ B(S, r) and zm of B(S, m) \
B(S, r), and let yn and ym be obtained by conditioning xn and xm on these partial
weightings. By Theorem 20.4, yn and ym can be coupled so that P(yn (v) ̸=
ym (v)) ≤ ε/|S| for every v ∈ S. This implies that
dtv (yn |S , ym |S ) ≤ ε.
Since this holds for fixed zn and zm , it also holds if they are random restrictions
of xn and xm , so it holds for x′n and x′m . Since these weightings have the same
distribution as xn and xm , the claim follows.
Now we are able to define the distribution on W -weightings. For a finite set
S ⊆ V (G), let πS be the limit of the distributions of xr |S as r → ∞. It is easy
to check (using similar arguments as in the proof of Claim 20.13 above), that the
family (πS ) of distributions is consistent, and the distribution πG,W they define has
the Markov property. Uniqueness follows immediately from locality.
A probability distribution on W -weightings of G is called a Gibbs state if it is
invariant under the Markov chain M of local re-weightings (as used in the proof
of Theorem 20.4 in the finite case). It can be proved that under the condition that
dob(W ) < 1/(2D), the Gibbs state is unique.
Remark 20.14. In a sense, the construction of a random homomorphism can be
extended to graphings. The method is similar to the Bernoulli lift of a graphing
(Section 18.5). Given a graphing G and a graphon W on [0, 1] such that dob(W ) <
1/(2D), we define a graphing G[W ] on the Graph of Weighted Graphs H^+. To
describe its probability distribution, we generate a random element from it
as follows: pick a point x ∈ V (G) and generate a random W -weighting of Gx as
described above. If W ≡ 1, we get the Bernoulli lift.
We cannot randomly map all points of a graphing into [0, 1] in any reasonable
way; this is impossible even if the graphing has no edges. But if we select any
countable subset, this can be mapped, and the graphing G[W ] contains the neces-
sary information. I don’t know of any applications of this construction, but I like
the fact that our two basic limit objects, graphings and graphons, can be combined
this way.
Theorem 20.15. For any sequence (Gn ) of graphs in G, the following are equiva-
lent:
(i) (Gn ) is locally convergent;
(ii) for every graphon W with dob(W ) ≤ 1/D, the sequence ent∗ (Gn , W ) is
convergent;
(iii) there is an ε > 0 such that for every looped-simple graph H with ∆(H) ≤
εv(H) the sequence ent∗ (Gn , H) is convergent.
The equivalence of conditions (ii) and (iii) is analogous to the equivalence of
conditions (ii) and (iii) in Theorem 12.20, and similarly as there, we could replace
them by any condition “inbetween”, like weighted graphs satisfying the Dobrushin
condition.
In the special case when H = Kq , we have ∆(Kq ) = 1, and hom(G, Kq ) is the
number of q-colorings of G. So it follows that if (Gn ) is convergent and q > 2D,
then the number of q-colorings grows as cv(Gn ) for some c > 1. It is easy to see that
some condition on q is needed: for example, if Gn is the n-cycle and q = 2, then
ent∗ (Gn , K2 ) oscillates between −∞ and ≈ 0 as a function of n.
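The oscillation for q = 2 can be verified directly: hom(Cn, K2) counts proper 2-colorings of the n-cycle, which is 0 for odd n and 2 for even n. A brute-force check:

```python
from itertools import product

def hom_cycle_K2(n):
    """Number of homomorphisms C_n -> K_2, i.e. proper 2-colorings of C_n."""
    count = 0
    for cols in product((0, 1), repeat=n):
        if all(cols[i] != cols[(i + 1) % n] for i in range(n)):
            count += 1
    return count

assert [hom_cycle_K2(n) for n in range(3, 9)] == [0, 2, 0, 2, 0, 2]
```

With hom = 0 for odd n the normalized log-entropy is −∞, and with hom = 2 for even n it is (log 2)/n ≈ 0, giving the oscillation in the text.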
Lemma 20.8 says that ∆(W ) < 1/(2D) is sufficient for (ii) to apply. This
condition could not be relaxed by more than a constant factor, as the following
example shows.
Example 20.16. Let Gn be a random D-regular graph on 2n nodes, and G′n
be a random bipartite D-regular graph on 2n nodes. The interlaced sequence
(G1 , G′1 , G2 , G′2 , . . . ) is locally convergent with high probability (almost all r-
neighborhoods are D-regular trees if r is fixed and n is large enough). Let H
be obtained from K2◦ by weighting the non-loop edge by 1 and the loops by 2^c.
Inequality (5.33) can be generalized to give the bounds
c maxcut′ (G) ≤ ent∗ (G, H) ≤ c maxcut′ (G) + 1.
(Here maxcut′(G) = Maxcut(G)/v(G) is normalized differently from the normal-
ization in (5.33).) The maximum cut in G′n contains all Dn edges, but the maximum
cut in Gn has at most 2Dn/3 edges with high probability (see Bertoni, Campadelli,
Posenato [1997] for a sharp estimate). Hence

ent∗(G′n, H) ≥ cD/2, but ent∗(Gn, H) ≤ 1 + cD/3.
If cD/2 − cD/3 = cD/6 > 1, then the sequence (ent∗(G1, H), ent∗(G′1, H), ent∗(G2, H),
ent∗(G′2, H), . . . ) cannot be convergent with high probability. So assuming ∆(W) ≤
7/D would not be enough in (ii).
While Theorem 20.15 sounds similar to the results in Chapter 12 (in particular
Theorem 12.20), it is both more and less than that theorem. We get a characteriza-
tion of convergence in terms of left and right homomorphisms, but no analogue of
the characterization as a Cauchy sequence in the cut metric. Also, convergence is
not established for all soft-core graphs H, just for those close to a complete graph.
On the other hand, the proof below says more, since it provides explicit formu-
las relating left and right homomorphism numbers. Furthermore, homomorphism
densities into graphons are considered, not just weighted graphs; recall that the
corresponding extension of Theorem 12.20 to graphons is false (Remark 12.22).
Here y is a random coloring with colors from [q], and x is a random color. Whatever
Vτ (v) is, y assigns uniform and independent colors to the nodes in Nτ (v), since our
graph is a tree. Hence for every x,
Ey(∏_{u∈Nτ(v)} 1(x ≠ yu)) = ((q − 1)/q)^{|Nτ(v)|},
and hence
ent∗(σ, W) = Eτ log(((q − 1)/q)^{|Nτ(v)|}) = Eτ(|Nτ(v)|) log(1 − 1/q) = (D/2) log(1 − 1/q).
So we get the theorem of Bandyopadhyay and Gamarnik (20.14).
Proof of Theorem 20.15. (i)⇒(ii) Let G = (V, E) be a simple graph with
degrees bounded by D. We may assume that αH = 1. We use the formula (20.15)
derived above, and concentrate on the innermost expression
s(v, τ, y) = Ex(∏_{u∈Nτ(v)} W(x, yu)).
The Dobrushin Uniqueness Theorem 20.4 implies that we don’t change the expres-
sion by much if we restrict everything to the r-neighborhood Nr (v). To be precise,
let c = D dob(H) < 1, and define Gr = G[Nr(v)], Vτ^r(v) = Nr(v) ∩ Vτ(v), and let
sr denote the function s defined for the graph Gr. Let z be a random W -weighting
of G[Vτ^r(v)]; then Theorem 20.4 implies that the distributions of y and z, when
restricted to v and its neighbors, are closer than (D + 1)c^{r−1} in total variation
distance. This implies that
|Ez s^r(v, τ, z) − Ey s(v, τ, y)| ≤ (D + 1)c^{r−1},

and hence

(20.17) |ent∗(G, H) − Ev Fr(v)| ≤ (D + 1)c^{r−1},
where
(20.18) Fr(v) = Eτ log(Ez s^r(v, τ, z)).
(We can take expectation over the same τ , since it induces a uniform random
permutation of V (Gr ) as well as of V (G).)
Let us note that in (20.18) Fr (v) depends only on the r-ball B = Nr (v), and
we can denote it by F (B). This allows us to express Ev Fr (v) in terms of the
distribution σG,r of r-neighborhoods in G. Thus (20.17) implies
(20.19) |ent∗(G, H) − ∑_{B∈Br} σG,r(B)F(B)| ≤ (D + 1)c^{r−1},

and hence

lim sup_n ent∗(Gn, H) ≤ lim inf_r ∑_{B∈Br} σr(B)F(B).
A similar argument proves that lim inf n ≥ lim supr , which implies that both limits
exist.
(ii)⇒(iii) is trivial.
(iii)⇒(i) We switch to the natural logarithm, since we are going to use analytic
formulas (this only means that all formulas are multiplied by ln 2). We express the
logarithm of t(G, H) as
(20.20) ln t(G, H) = ∑_{S⊆V(G)} ℓ(G[S], H),
Using that ln t(., H) is an additive graph parameter for any fixed H, it is easy to
see that ℓ(F, H) = 0 unless F is a connected graph together with isolated nodes
(cf. Exercise 4.2). The term corresponding to the edgeless graph is 0, and so we
can modify (20.20) so that the summation runs over connected induced subgraphs
of G. Collecting terms with isomorphic graphs, we get
(20.22) ent∗(G, H) = ∑_F (ind(F, G)/v(G)) · (ℓ(F, H)/aut(F)),
where the summation ranges over all isomorphism types of connected graphs F ;
but of course, only a finite number of terms are non-zero for any fixed G.
So we can express the homomorphism entropies ent∗ (Gn , H) as linear combina-
tions of the induced subgraph densities ind(F, Gn )/v(Gn ). This suggests a heuristic
for the proof: We show that the system of equations (20.22) can be inverted, to ex-
press the induced subgraph densities as linear combinations of the homomorphism
entropies. It follows then that if the homomorphism entropy into any given graph
converges to some value, then so does the frequency of each induced subgraph.
This heuristic is of course very naive: (20.22) is an infinite system of equations,
and so to do anything with it we need tail bounds; furthermore, the coefficient
ℓ(F, H) is defined by the hairy formula (20.21), which has all the unpleasant features
one can think of: it has an exponential number of terms, these terms alternate in
sign, and the terms themselves are logarithms of simpler functions.
The identities developed in Section 5.3.1 come to the rescue. We can get rid of
the logarithms using Corollary 5.22. Substituting the formula for ln t(G, H) in the
definition of ℓ(G, H), we get a lot of cancellation, which leads to the formula
(20.23) ℓ(F, H) = ∑_{m=1}^∞ ((−1)^m/m!) ∑_{J1,...,Jm∈Conn(F), ∪iV(Ji)=V(F)} (−1)^{∑i e(Ji)} cri(L(J1, . . . , Jm)) ∏_{r=1}^m t(Jr, H).
(It is not clear at this point that this is any better than (20.21), but be patient.)
Next we turn to inverting the expression (20.22). Let m ≥ 1 and let
{F1 , . . . , FN } be the set of all connected simple graphs with 2 ≤ v(Fi ) ≤ m. Let
q > m/ε, add q − v(Fi ) ≥ 1 new isolated nodes to Fi , and take the complement to
get a looped-simple graph Hi on [q] with loops added at all nodes. We weight each
node of Hi by 1/q. Every node in Hi has degree at least q − m, so ∆(Hi ) ≤ εq.
Consider any graph G with all degrees at most D. We write (20.22) in the form
(20.24)    ent∗ (G, Hj ) = ∑_{i=1}^N (ind(Fi , G)/v(G)) · (ℓ(Fi , Hj )/aut(Fi )) + R(G, Hj ),
where
(20.25)    R(G, Hj ) = ∑_{v(F )>m} ind(F, G) ℓ(F, Hj ) / (aut(F ) v(G))
is a remainder term.
We can view (20.24) as a system of N equations in the N unknowns xi =
ind(Fi , G)/v(G). Let A = (ℓ(Fi , Hj )/aut(Fi ))_{i,j=1}^N be the matrix of this system,
and let s, R ∈ R^N be defined by sj = ent∗ (G, Hj ) and Rj = R(G, Hj ); then we
have A^T x = s − R. Assuming that A is invertible (which we will prove momentarily),
let B = (A^T )^{−1} . Then the system can be solved: x = B(s − R), or
(20.26)    ind(Fi , G)/v(G) = ∑_{j=1}^N Bij ent∗ (G, Hj ) + ri (G),
where
ri = ri (G) = ∑_{j=1}^N Bij R(G, Hj )
is a remainder term.
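The inversion step can be sketched numerically. The matrix below is a hypothetical stand-in for A (we do not compute the true entries ℓ(Fi , Hj )/aut(Fi )); the sketch only illustrates recovering x from A^T x = s − R.

```python
import numpy as np

# Sketch of the inversion step behind (20.26). The matrix A below is a
# hypothetical stand-in: its true entries would be l(F_i, H_j)/aut(F_i).
rng = np.random.default_rng(0)
N = 4
A = np.eye(N) + 0.1 * rng.standard_normal((N, N))  # near-identity, hence invertible
x_true = rng.random(N)        # stand-in for the densities ind(F_i, G)/v(G)
R = 1e-3 * rng.random(N)      # stand-in for the remainders R(G, H_j)
s = A.T @ x_true + R          # the entropies ent*(G, H_j), per (20.24)

B = np.linalg.inv(A.T)        # B = (A^T)^{-1}
x = B @ (s - R)               # x = B(s - R), recovering the densities
assert np.allclose(x, x_true)
```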
We have to show that the matrix A is invertible (at least if q is large enough)
and estimate the remainder terms. We use (20.23):
(20.27)    ℓ(F, Hi ) = ∑_{k=1}^∞ ((−1)^k /k!) ∑_{J1 ,...,Jk ∈Conn(F ), ∪i V (Ji )=V (F )} (−1)^{∑i e(Ji )} cri(L(J1 , . . . , Jk )) ∏_{r=1}^k t(Jr , Hi ).
20.2. CONVERGENCE FROM THE RIGHT 381
Note that for a nonzero term the exponent of q is less than −v(F ) except for k = 1
and V (J1 ) = V (F ), and that the last product does not depend on q. Hence for any
simple graph F ,
(20.28)    ℓ(F, Hi ) = q^{−v(F )} ∑_{J∈Csp(F )} (−1)^{e(J)−1} t(J, Fi ) + O(q^{−v(F )−1} ).
(Here and in what follows, the constants implied in the big-O notation may depend
on m, but not on q and G.) By Proposition 5.43, the matrix M = (t(Fi , Fj ))_{i,j=1}^N
is nonsingular. Let L be the N × N matrix with entries Lij = 1(Fi ∈ Csp(Fj )),
and let P and Q denote the diagonal matrices with entries Pii = (−1)^{e(Fi )−1} and
Qii = q^{v(Fi )} aut(Fi ), respectively. Clearly L, P and Q are nonsingular. By (20.28),
we have
QA^T = L^T P M + O(q^{−1} ),
which implies that A is nonsingular if q is large enough. Furthermore,
Bij = q^{v(Fi )} aut(Fi )((M^T P L)^{−1} )ij + O(q^{v(Fj )−1} ),
and so
(20.29)    |Bij | = O(q^{v(Fi )} ) = O(q^m ).
Using this, the remainder terms can be estimated as follows:
|R(G, Hj )| ≤ ∑_{r=m+1}^∞ ∑_{v(F )=r} (ind(F, G)/(aut(F )v(G))) |ℓ(F, Hj )|
= ∑_{r=m+1}^∞ ∑_{v(F )=r} (ind(F, G)/(aut(F )v(G))) O(q^{−r} )
(20.30)    = ∑_{r=m+1}^∞ 2^{Dr} O(q^{−r} ) = O(q^{−m−1} ),
and
(20.31)    ri (G) = ∑_{j=1}^N Bij R(G, Hj ) = O(q^m )O(q^{−m−1} ) = O(q^{−1} ).
So we have proved that in (20.26), for fixed m, the error term ri tends to 0 as
q → ∞.
The rest of the proof is standard analysis: Assume that ent∗ (Gn , H) converges as
n → ∞ for every looped-simple graph H with ∆(H) ≤ ε, and let Sj denote the limit
of ent∗ (Gn , Hj ). Consider any simple graph Fi on m nodes. Equation (20.26) implies that
(20.32)    | ind(Fi , Gn )/v(Gn ) − ∑_{j=1}^N Bij Sj | ≤ ∑_{j=1}^N |Bij | |ent∗ (Gn , Hj ) − Sj | + |ri (Gn )|.
Let δ > 0 be given, and choose q large enough so that |ri (Gn )| ≤ δ/2 for every n
(recall that the big-O in (20.31) does not depend on G). Since ent∗ (Gn , Hj ) → Sj ,
the first term on the right side of (20.32) is at most δ/2 if n is large enough. It
follows that ind(Fi , Gn )/v(Gn ) is a Cauchy sequence, which means that the sequence
(Gn ) is locally convergent.
The proof of the Supplement is based on similar arguments and not given here
in detail. The proof method used above for (iii)⇒(i) can also be used to prove a
somewhat weaker version of (ii), replacing the Dobrushin condition dob(W ) < 1/D
by 8D∆(W ) < 1. In fact, the expression (20.22) lends itself more directly to a
proof of (i)⇒(ii) than to a proof of (iii)⇒(i): naively, if the frequency of any induced
subgraph converges to some value, then so do the homomorphism entropies. The
main issue is to obtain good tail bounds, which can be done similarly as in the
proof above, as long as we are satisfied with proving the convergence for very small
∆(W ); but if we want a bound that is sharp up to a constant, then we need more
technical computations. We refer to the paper of Borgs, Chayes, Kahn and Lovász
[2012] for these details.
Remark 20.19. It is a natural question to ask which sequences of bounded de-
gree graphs are right-convergent in the sense that their homomorphism entropies
converge for all soft-core target graphs. Gamarnik [2012] studies this problem for
sparse random graphs, but the general question is unsettled. It is also natural to
ask whether local-global convergence can be characterized by any right-convergence
condition.
Exercise 20.20. Let G and G′ be two graphs on the same set of nodes [n], such
that |E(G)△E(G′ )| ≤ εn. Prove that δ⊙^{nd}(G, G′ ) ≤ 2ε.
Exercise 20.21. Let H be a weighted graph with positive edgeweights and (Gn ),
a bounded degree graph sequence for which the sequence (ent∗ (Gn , H)) is conver-
gent. Let G′n be obtained from Gn by deleting o(v(Gn )) nodes and edges. Prove
that (ent∗ (G′n , H)) is convergent.
Exercise 20.22. Let H be a weighted graph with at least one positive edgeweight.
Prove that the sequence ent∗ (Pn □ Pm , H) is convergent as n, m → ∞, and the
same holds for the sequence ent∗ (Cn □ Pm , H), provided n is restricted to even
numbers.
Exercise 20.23. Let H be a weighted graph whose edges with positive
weight form a connected and nonbipartite graph. Prove that the sequence
ent∗ (Cn □ Pm , H) is convergent as n, m → ∞.
Exercise 20.24. Let σ be an involution-invariant measure. Show how to express
s(σ) in terms of the associated Bernoulli graphing.
CHAPTER 21
On the Structure of Graphings
21.1. Hyperfiniteness
A notion related to Følner sequences in the theory of amenable groups is “hyper-
finiteness” for general graph families with bounded degree, which can be extended
to graphings in a natural way. This notion was introduced (in different settings) by
Kechris and Miller [2004], Elek [2007b] and Schramm [2008]. Hyperfiniteness of a
graph family has a number of important consequences, like testability of many graph
properties. Quoting an informal remark by Elek, hyperfinite bounded-degree graph
families and graphings behave as nicely as dense graph sequences and graphons do.
384 21. ON THE STRUCTURE OF GRAPHINGS
Example 21.3 (Planar graphs). More generally, the family of planar graphs
with degree bounded by D is hyperfinite. Indeed, let G be such a graph on n
nodes. The Lipton–Tarjan Planar Separation Theorem [1979] says that G has a
set S of at most 3√n nodes such that every connected component of G − S has at
most 2n/3 nodes. We repeat this with every connected component of the remaining
graph until all components have at most K nodes.
The estimation of the number of deleted nodes is somewhat tricky and it is left
to the reader as Exercise 21.21.
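The repeated-separator procedure can be sketched as follows. The `separator` argument is a stand-in for a Lipton–Tarjan-type routine (which we do not implement); the toy run uses a path, where the middle node of a component is a valid separator.

```python
from collections import deque

def components(nodes, adj):
    """Connected components of the subgraph induced by `nodes`."""
    nodes = set(nodes)
    seen, comps = set(), []
    for s in nodes:
        if s in seen:
            continue
        comp, queue = [], deque([s])
        seen.add(s)
        while queue:
            v = queue.popleft()
            comp.append(v)
            for w in adj[v]:
                if w in nodes and w not in seen:
                    seen.add(w)
                    queue.append(w)
        comps.append(comp)
    return comps

def decompose(nodes, adj, K, separator):
    """Delete separators recursively until every component has at most K nodes."""
    deleted = set()
    stack = components(nodes, adj)
    while stack:
        comp = stack.pop()
        if len(comp) <= K:
            continue
        S = separator(comp, adj)          # hypothetical separator oracle
        deleted.update(S)
        stack.extend(components(set(comp) - set(S), adj))
    return deleted

# Toy run on a path (a planar graph); the middle node splits a component in half.
n, K = 100, 10
adj = {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}
mid_separator = lambda comp, adj: [sorted(comp)[len(comp) // 2]]
deleted = decompose(range(n), adj, K, mid_separator)
assert all(len(c) <= K for c in components(set(range(n)) - deleted, adj))
```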
Example 21.4 (Random regular graphs). Let us generate a random D-regular
graph Gn on n nodes, for every even n, by choosing one of the D-regular graphs
uniformly at random. The family of graphs obtained is not hyperfinite with prob-
ability 1 if D ≥ 3. The following heuristic argument to prove this can be made
precise rather easily. If Gn is (ε, k)-hyperfinite, then V (G) can be split into two sets
of size between n/2 −k and n/2+ k in such a way that the number of edges between
the two classes is at most εn. On the other hand, let {S1 , S2 } be a partition of
[n] into two classes of size about n/2, and let Z denote the number of edges in
Gn connecting the two classes. The expected number of such edges is about Dn/4.
Furthermore, Z is highly concentrated around its mean, and so the probability that
Z ≤ εn is o(2^{−n} ) if ε is small enough. There are fewer than 2^n such partitions of
[n], so with high probability all of these have more than εn edges connecting the
two classes.
Example 21.5 (Expanders). We call a family E ⊆ G of graphs an expander
family if there is a c > 0 such that for every graph G ∈ E and every S ⊆ V (G)
with |S| ≤ v(G)/2, we have eG (S, V (G) \ S) ≥ c|S|. An infinite family of expander
graphs is not hyperfinite. Indeed, if T ⊆ E(G) and G − T has components G1 , . . . , Gr ,
and all of these have fewer than v(G)/2 nodes, then
|T | = (1/2) ∑_{i=1}^r eG (V (Gi ), V (G) \ V (Gi )) ≥ (1/2) ∑_{i=1}^r c·v(Gi ) = (c/2) v(G).
It can be shown that the family of random D-regular graphs in Example 21.4 is an
expander family with probability 1.
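For tiny graphs the expansion constant c can be checked by brute force over all small node sets; the function below and the two test graphs are illustrative choices, not from the text.

```python
from itertools import combinations

def expansion(nodes, edges):
    """min of e(S, V \\ S)/|S| over nonempty S with |S| <= n/2 (brute force)."""
    nodes = list(nodes)
    best = float("inf")
    for r in range(1, len(nodes) // 2 + 1):
        for S in combinations(nodes, r):
            S = set(S)
            cut = sum(1 for u, v in edges if (u in S) != (v in S))
            best = min(best, cut / len(S))
    return best

# K5: a set S is cut by |S|*(5-|S|) edges, so the minimum ratio is 2*3/2 = 3.
K5_edges = [(u, v) for u in range(5) for v in range(u + 1, 5)]
assert expansion(range(5), K5_edges) == 3.0

# C6: half of the cycle is cut by only 2 edges, giving ratio 2/3 -- poor expansion.
C6_edges = [(i, (i + 1) % 6) for i in range(6)]
assert expansion(range(6), C6_edges) == 2 / 3
```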
The following explicit construction for an expander graphing was given (in a
different context) by Margulis [1973]. Consider the space R2 /Z2 , a.k.a. the torus.
Let us connect every point (x, y) to the points (x ± y, y) and (x, y ± x) (additions
modulo 1; we can leave out the axes if we don’t want loops). This graph is the
support of the measure preserving family consisting of the two maps (x, y) 7→
(x + y, y) and (x, y) 7→ (x, x + y), and hence it is a graphing. Furthermore, this
graphing is an expander, and hence not hyperfinite. This is not easy to prove; for
a proof based on Fourier analysis, see Gabber and Galil [1981].
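A discrete analogue of the two Margulis maps, on (Z_n)² instead of the torus (our own simplification), illustrates the measure-preserving property: each shear map is a bijection.

```python
# Discrete analogue of the Margulis maps on (Z_n)^2 (an illustrative
# simplification of the torus construction). Each shear is a bijection --
# the discrete counterpart of being measure preserving.
n = 7
torus = {(x, y) for x in range(n) for y in range(n)}
T1 = lambda p: ((p[0] + p[1]) % n, p[1])   # (x, y) -> (x + y, y)
T2 = lambda p: (p[0], (p[0] + p[1]) % n)   # (x, y) -> (x, x + y)
assert {T1(p) for p in torus} == torus
assert {T2(p) for p in torus} == torus
```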
Example 21.6. A special case of a hyperfinite family is a family of graphs with
subexponential growth, familiar from group theory. To be precise, for a function
f : N → N we say that a family H of graphs has f -bounded growth, if for any
graph G ∈ H, any v ∈ V (G) and any m ∈ N, the number of nodes in the m-
neighborhood of v is at most f (m). We say that H has subexponential growth, if it
has f -bounded growth for some function f such that ln(f (m))/m → 0 (m → ∞).
It was asked by Elek and proved by Fox and Pach [unpublished] that this property
implies hyperfiniteness (Exercise 21.22).
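The definition of f -bounded growth can be checked by BFS; as a sketch, a grid stands in here for a graph of polynomial (hence subexponential) growth.

```python
from collections import deque

def ball_size(adj, v, m):
    """Number of nodes within graph distance m of v, by BFS."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        if dist[u] == m:
            continue
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return len(dist)

# The 2-dimensional grid has polynomial growth: around an interior node the
# m-ball has exactly 2m^2 + 2m + 1 nodes, so f(m) = 2m^2 + 2m + 1 works
# and ln(f(m))/m -> 0.
N = 21
adj = {(i, j): [(i + di, j + dj)
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1))
                if 0 <= i + di < N and 0 <= j + dj < N]
       for i in range(N) for j in range(N)}
center = (N // 2, N // 2)
for m in range(6):
    assert ball_size(adj, center, m) == 2 * m * m + 2 * m + 1
```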
component of G \ S has at most k nodes. For x ∈ V , let Cx denote the node set of
the connected component of G \ S containing x. The sets Cx partition V . Let x be
a random point of G, then Cx is a random member of R, which has the following
two properties:
(1) If we select Cx first, and then select a uniform random point y ∈ Cx (note
that Cx is finite!), then y is distributed according to λ.
(2) If ∂(X) denotes the number of edges of G connecting X to V (G)\X (where
X is a finite subset of V (G)), then
E( ∂(Cx )/|Cx | ) = η(S) ≤ ε.
Both of these properties can be easily verified using the Mass Transport Prin-
ciple (similarly to the proof of Proposition 18.50).
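For a finite graph, both properties can be checked directly for the distribution that picks a class with probability proportional to its size; the cycle C6 and its partition below are illustrative choices.

```python
# A genuine partition of a finite graph yields a fractional partition: pick a
# class Y with probability |Y|/n, then a uniform point y of Y. Illustrative
# example: the cycle C6 split into two paths.
n = 6
edges = [(i, (i + 1) % n) for i in range(n)]   # the cycle C6
classes = [{0, 1, 2}, {3, 4, 5}]               # two paths with 3 nodes each

def boundary(Y):
    """Number of edges connecting Y to its complement."""
    return sum(1 for u, v in edges if (u in Y) != (v in Y))

# Property (1): class-then-uniform-point gives the uniform distribution.
prob = {v: sum((len(Y) / n) * (1 / len(Y)) for Y in classes if v in Y)
        for v in range(n)}
assert all(abs(p - 1 / n) < 1e-12 for p in prob.values())

# Property (2): the boundary value equals 2*(#edges between classes)/n.
cut = boundary(classes[0])
d_tau = sum((len(Y) / n) * boundary(Y) / len(Y) for Y in classes)
assert cut == 2 and abs(d_tau - 2 * cut / n) < 1e-12
```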
This motivates the next definition: we call a probability distribution τ on R a
fractional partition (into parts in R), if selecting Y ∈ R according to τ , and then
a point y ∈ Y uniformly, we get a point distributed according to λ; and we define
the boundary value of τ as
∂(τ ) = E( ∂(Y)/|Y| ).
We say that G is fractionally (ε, k)-hyperfinite, if there is a fractional partition τ
such that ∂(τ ) ≤ ε. It follows from the discussion above that every (ε, k)-hyperfinite
graphing is fractionally (ε, k)-hyperfinite. The converse is not true (cf. Example
21.12), but we have the following weak converse:
Lemma 21.10. If a graphing is fractionally (ε, k)-hyperfinite, then it is
(ε log(8D/ε), k)-hyperfinite.
Proof. We use the Greedy Algorithm to construct a partition from a frac-
tional partition τ that establishes that G is fractionally (ε, k)-hyperfinite. Similar
algorithms are well known in combinatorial optimization, but here we have to be
careful, since we are going to construct an uncountable family of sets, and have
to make sure that the partition we obtain has the property that the set of edges
connecting different classes is Borel (and of course has small measure). We do our
construction in r = ⌊log(2D/ε)⌋ phases.
We start with R0 = U0 = ∅. In the j-th phase, let Uj,0 = Uj−1 be the
union of previously selected sets. Let Rj,1 be the set of sets Y ∈ R such that
∂(Y ) < ε2^{j−1} |Y \ Uj,0 |. Let Qj,1 be a maximal Borel family of sets in Rj,1 such that
the sets Y \ Uj,0 are disjoint. Such a family exists by the following construction. Let
H0 be the intersection graph of Rj,1 . It is easy to see that H0 is a Borel graph
with bounded degree, and so it contains a maximal stable set that is Borel (this is
implicit in the proof of Theorem 18.3, see Exercise 18.11). Let Uj,1 be the union of
Uj,0 and the sets in Qj,1 .
The phase is not over; we select a maximal Borel family Qj,2 of sets Y ∈ R
such that the sets Y \ Uj,1 are disjoint and ∂(Y ) < ε2^{j−1} |Y \ Uj,1 |. We let Uj,2 be
the union of Uj,1 and the sets in Qj,2 . We repeat this k + 1 times, to finish the j-th
phase (after a while, we may not be adding anything). Let Qj be the family of sets
Y ∈ R selected in the j-th phase, and let Uj = Uj,k+1 be their union.
We repeat this for j = 1, . . . , r. Let Q = Q1 ∪ · · · ∪ Qr be the set of all sets
Y ∈ R selected. For every Y ∈ Qj , let Y^0 = Y \ Uj−1 (this is the set of nodes first
covered by Y ). Let T0 be the set of all edges incident with any node of V \ Ur ,
and let T1 denote the set of all edges connecting any set Y ∈ Q to its complement.
Clearly every connected component of G − (T0 ∪ T1 ) has at most k nodes.
Next we show that
(21.1)    ∂(Y ) ≥ ε2^{j−1} |Y \ Uj |
for every Y ∈ R (selected or not) and 1 ≤ j ≤ r. Suppose (by way of contradiction)
that ∂(Y ) < ε2^{j−1} |Y \ Uj |. Then Y ⊈ Uj , and hence it was not selected in the j-th
phase or before. But Y was eligible for selection throughout the j-th phase, and if
it was not selected, then (by the maximality of the family selected) Y must contain
a point of Uj,i \ Uj,i−1 for i = 1, . . . , k + 1, which is impossible since |Y | ≤ k. This
proves (21.1).
We want to bound the measure of T0 ∪T1 . We start with T0 . Select a random set
Y ∈ R from the distribution τ , and a random point y ∈ Y . Then y is distributed
according to λ, and hence by (21.1) and the definition of the fractional partition τ
we have
(21.2)    λ(V \ Uj ) = P(y ∉ Uj ) = E( |Y \ Uj |/|Y| ) ≤ E( ∂(Y)/(ε2^{j−1} |Y|) ) ≤ 2^{1−j}
for every 1 ≤ j ≤ r. In particular, we have λ(V \ Ur ) ≤ 2^{1−r} ≤ 2ε/D by the choice
of r, and hence
η(T0 ) ≤ ∫_{V \Ur} deg(x) dx ≤ Dλ(V \ Ur ) ≤ 2ε.
For T1 , every selected set Y ∈ Qj satisfies ∂(Y ) < ε2^{j−1} |Y^0 |, and summing these
bounds over the r ≤ log(2D/ε) phases yields η(T1 ) ≤ ε log(2D/ε). Hence
η(T0 ∪ T1 ) ≤ 2ε + ε log(2D/ε) = ε log(8D/ε).
Theorem 21.9 follows from our characterization of local equivalence (Theorem
18.59), Lemma 21.10 and the following rather simple couple of facts.
Proposition 21.11. Let φ : G1 → G2 be a local isomorphism between graphings
G1 and G2 .
easy to check that for every edge of G induced by A or induced by V (G) \ A, there
is at least one node of degree at most one among the corresponding four nodes of
G△ \ S. The remaining nodes of G△ \ S have degree at most 2. Hence
|E(G△ ) \ S| ≤ (1/2)( |E(G)| − eG (A, V \ A) + 2(|V (G△ )| − |E(G)| + eG (A, V \ A)) )
≤ 9n/2 + Maxcut(G),
and so
|S| ≥ 6n − Maxcut(G) > 9n/2,
since G is nonbipartite. So G△ is not (3/4, 3)-hyperfinite. For the appropriate choice
of G, we can prove more: let Gn be a random D-regular graph and G′n , a random
D-regular bipartite graph on n nodes, then (G′n )△ is (3/4, 3)-hyperfinite. On the
other hand, Maxcut(Gn ) < 1.41n with high probability (McKay [1982], see also
Hladky [2006]), and we get that (Gn )△ is not even (4/5, 3)-hyperfinite.
Let G and G′ be local-global limit graphings of the sequences ((Gn )△ ) and
((G′n )△ ), respectively (or of appropriate subsequences); then G and G′ are locally
by a colored graphing (G′ , S). It follows from the definition of convergence that
|Sn |/v(Gn ) → λ(S) (where λ is the node measure in (G′ , S)), and also that almost
all connected components of G′ − S have at most k nodes. Hence the uncolored
graphing G′ is hyperfinite. Since G′ and G are locally equivalent, it follows by
Theorem 21.9 that G is hyperfinite.
To prove the “only if” part, we invoke Theorem 19.16. By selecting an ap-
propriate subsequence, we may assume that the sequence (Gn ) is locally-globally
convergent, and so it has a limit graphing G′ such that δ⊙^{nd}(Gn , G′ ) → 0 for
every k ≥ 0. Clearly G and G′ are locally equivalent, and hence G′ is hyperfinite
by Theorem 21.9. This means that for every ε > 0 there is an m ≥ 1 such that
V (G′ ) has a Borel 2-coloring with red and blue such that every connected m-node
subgraph contains a red point, and λ′ {red points} ≤ ε. These properties can be
read off from the 1-balls and the m-balls, respectively. It follows by the assump-
tion that Gn → G′ in the local-global sense that for a large enough n, Gn has
a 2-coloring such that the set Rn of red nodes satisfies |Rn | ≤ 2εv(Gn ), and the
number of m-neighborhoods that contain a connected blue subgraph with m + 1
nodes is at most εv(Gn ). Adding the roots of these m-neighborhoods to Rn , we
get a set Rn′ ⊆ V (Gn ) with |Rn′ | ≤ 3εv(Gn ) such that every connected component of
Gn − Rn′ has at most m nodes.
We state another result, in a sense dual to Theorem 21.13:
Theorem 21.14. A graphing is hyperfinite if and only if it is the limit of a hyper-
finite graph sequence.
Proof. In view of Theorem 21.13, it suffices to prove that every hyperfinite
graphing is the limit of a locally convergent graph sequence. (So the Aldous–Lyons
conjecture holds for hyperfinite graphings.) Let G be a hyperfinite graphing, and
let ε > 0. Let S be a subset of edges with η(S) = ε such that every connected
component of G − S is finite. Proposition 19.1 implies that
(21.3)    δ⊙ (G, G − S) ≤ 4ε^{1/ log(2D)} .
For every graph F ∈ G, let aF be the measure of points in G − S whose
connected component is isomorphic to F . Since ∑_F aF = 1, we can choose a finite
set H of graphs such that ∑_{F ∉H} aF ≤ ε/D. Let n > (D/ε) ∑_{F ∈H} v(F ), and
nF = ⌊aF n/v(F )⌋ (so that the rationals nF v(F )/n approximate the real numbers
aF with common denominator). For every F ∈ / H, let us delete the edges of all
connected components of G − S isomorphic to F . For every F ∈ H, let us delete
the edges of a set of connected components of G − S isomorphic to F so that the
remaining connected components cover a set of measure nF v(F )/n; it is not hard
to see that this can be done so that a Borel graph remains. The measure of the set
T of deleted edges can be bounded as follows:
η(T ) ≤ (D/2) ∑_{F ∉H} aF + (D/2) ∑_{F ∈H} ( aF − nF v(F )/n ) ≤ ε/2 + (D/2) ∑_{F ∈H} v(F )/n ≤ ε.
Trivially, 0 ≤ f (G) ≤ 1.
Let, say, G1 , . . . , Gm be those components of G that are not (ε, δ)-homogeneous,
and suppose that ∑_{i=1}^m v(Gi ) = p > (ε/2D)n. Let Vi′ ⊆ V (Gi ) be an (ε, δ)-island,
and let Vi′′ = V (Gi ) \ Vi′ . Let Ci be the set of edges connecting Vi′ and Vi′′ ; then
|Ci | ≤ δ|Vi′ |. Finally, let G′ be obtained from G by removing the edges in the sets
Ci . We want to show that if many of the parts are not ε-homogeneous, then f (G′ )
is substantially larger than f (G). To keep the notation in check, set G′i = G[Vi′ ],
G′′i = G[Vi′′ ], n = v(G), ni = v(Gi ), n′i = v(G′i ) etc. Since the radius r is fixed, we
don’t have to show it in notation, and write ρi = ρGi ,r , ρ′i = ρG′i ,r etc.
Fix any i ∈ [k] and any B ∈ Br , and consider the difference of their contribu-
tions to f (G′ ) and f (G):
(21.5)    (ni′/n) ρi′(B)² + (ni′′/n) ρi′′(B)² − (ni /n) ρi (B)²
= (ni′/n)( ρi′(B) − ρi (B) )² + (ni′′/n)( ρi′′(B) − ρi (B) )²
+ (2/n) ρi (B)( ni′ ρi′(B) + ni′′ ρi′′(B) − ni ρi (B) ).
n
Here the first term will provide the gain, the second is nonnegative, while the third
is an error term. To estimate the “gain” term, first we sum over all balls B:
∑_B ( ρi′(B) − ρi (B) )² ≥ (1/b)( ∑_B |ρi′(B) − ρi (B)| )² = (4/b) dvar (ρi′, ρi )² ≥ ε²/b.
Summing over i and using that ni′ ≥ εni by the definition of an island, we get
∑_{i,B} (ni′/n)( ρi′(B) − ρi (B) )² ≥ ε ∑_i (ni /n) · (ε²/b) ≥ ε⁴/(Db).
To estimate the error term, we argue that the quantity |ni′ ρi′(B) + ni′′ ρi′′(B) −
ni ρi (B)| is the increase or decrease in the number of neighborhoods isomorphic to
B when the edges in Ci are deleted; since deletion of an edge can change at most
2D^r balls with radius r, we have
∑_B | ni′ ρi′(B) + ni′′ ρi′′(B) − ni ρi (B) | ≤ 2D^r |Ci |,
• It gives a partition of the nodes such that most bipartite graphs between
different classes are homogeneous (random-like). Several extensions of the Regular-
ity Lemma to sparse graphs in this sense are known (see e.g. Kohayakawa [1997],
Gerke and Steger [2005], Scott [2011]), but they are more-or-less meaningless, or
very weak, for graphs that have bounded degree.
• It gives a decomposition of the graph into simpler, homogeneous subgraphs.
Theorem 21.25 describes such a decomposition. However, this result is clearly not
the ultimate word: the (ε, δ)-homogeneous pieces it produces can still have a very
complicated structure.
• It implies that an arbitrarily large (simple, dense) graph can be “scaled
down” to a graph whose size depends on the error bound only, and which is almost
indistinguishable from the original by sampling. Proposition 19.10 shows that such
a “downscaling” is also valid for bounded degree graphs; unfortunately, it is non-
effective, and provides no algorithm for the construction of the smaller graph.
• It provides an approximate code for the graph, which has bounded size (de-
pending on the error we allow), from which basic parameters of the graph can be
reconstructed, and from which graphs can be generated on an arbitrary number of
nodes that are almost indistinguishable from the original graph by sampling. In
this sense, a Regularity Lemma may exist, and should be very useful once we learn
how to work with it. While not quite satisfactory, I feel that the results mentioned
above justify cautious optimism.
CHAPTER 22
Algorithms for Bounded Degree Graphs
The algorithmic theory of large graphs with bounded degree is quite extensive.
Similarly as in the case of dense graphs, we can formulate the problems of parameter
estimation, property distinction, property testing, and computing a structure.
However, it seems that the theory in the bounded degree case is lacking the
same sort of general treatment as dense graphs had, in the form of useful general
conditions for parameter estimations (like Theorem 15.1), treatment of property
distinction in the limit space (Section 15.3), and the use of regularity partitions
and representative sets in the design of algorithms (Section 15.4). The most im-
portant tools that are missing are analogues of the Regularity Lemma and of the
cut distance.
Our discussions in this chapter, accordingly, will be more an illustration of
several interesting and nontrivial results than a development of a unifying theory.
But even so, graph limit theory provides a useful point of view for these results.
take, and what function of them you compute, this single bit of information (degree
2 or degree 3) will not distinguish three possibilities (G, G′ and GG′ ).
On the other hand, some other facts extend from the dense case with more or
less difficulty. The following theorem of Elek [2010a] connects parameter estimation
with convergence (recall that the analogous result for dense graphs was trivial).
Theorem 22.3. A bounded graph parameter f is estimable if and only if for ev-
ery locally convergent graph sequence (Gn ), the sequence of numbers (f (Gn )) is
convergent.
Proof. The “only if” part is easy: from similar graphs we get similar sam-
ples and so we compute similar estimates. Let us make this precise. Suppose
that f is estimable, and let (Gn ) be a locally convergent graph sequence. Let
0 < ε < 1/8, we want to show that |f (Gn ) − f (Gm )| < ε if n, m are large enough.
By the definition of estimability, we have a positive integer k and an estimator
function g : (Bk )^k → R such that (22.1) holds. If n, m are large enough, then
δ⊙ (Gn , Gm ) ≤ 1/(4k·2^k ), and hence dvar (ρk,Gn , ρk,Gm ) ≤ 1/(4k). This means that
we can couple a random node v ∈ V (Gn ) with a random node u ∈ V (Gm ) so that
BGn,k (v) ≅ BGm,k (u) with probability at least 1 − 1/(4k). If we sample k indepen-
dent nodes v1 , . . . , vk from Gn and k independent nodes u1 , . . . , uk from Gm , then
with probability more than 3/4, we have BGm,k (u1 ) ≅ BGn,k (v1 ), . . . , BGm,k (uk ) ≅
BGn,k (vk ). With positive probability, we have simultaneously BGm,k (u1 ) ≅
BGn,k (v1 ), . . . , BGm,k (uk ) ≅ BGn,k (vk ), |f (Gn ) − g(BGn,k (v1 ), . . . , BGn,k (vk ))| ≤ ε
and |f (Gm ) − g(BGm,k (u1 ), . . . , BGm,k (uk ))| ≤ ε. But in this case we have
|f (Gn ) − f (Gm )| ≤ 2ε, which we wanted to prove.
The converse is a bit trickier. Suppose that (f (Gn )) is convergent for every
locally convergent graph sequence (Gn ). Given ε > 0, we want to find a suitable
positive integer k and construct an estimator g : (Bk )^k → R. The condition on
f implies that for every ε > 0 there is an ε′ > 0 such that if δ⊙ (G, G′ ) ≤ ε′ then
|f (G) − f (G′ )| ≤ ε. Let r be chosen so that 2^{1−r} < ε′ , and let k > 2r/(εε′ ).
The estimator we construct will only depend on the r-balls around the roots
of the k-balls. So we will construct a function g : (Br )^k → R. For every sequence
b = (B1 , . . . , Bk ) ∈ (Br )^k , let ρb denote the distribution of a randomly chosen
element of the sequence. We define the estimator as follows:
g(b) = f (G), where G is any graph with dvar (ρG,r , ρb ) ≤ ε′/4, if such a graph
exists, and g(b) = 0 otherwise.
To show that this is a good estimator, let G ∈ G be any graph, let v1 , . . . , vk ∈
V (G) be uniformly chosen random nodes, and let b = (BG,r (v1 ), . . . , BG,r (vk )).
By the choice of k, elementary probability theory gives that with probability at
least 1 − ε, we have dvar (ρb , ρG,r ) ≤ ε′ /4. If this happens, then in the definition
of g(b) the first alternative applies, and so g(b) = f (G′ ) for some graph G′ that
satisfies dvar (ρG′ ,r , ρb ) ≤ ε′ /4. This implies that dvar (ρG′ ,r , ρG,r ) ≤ ε′ /2. Then we
have by (19.3)
δ⊙ (G, G′ ) ≤ 1/2^r + ε′/2 ≤ ε′ .
By the definition of ε′ , this implies that |f (G) − f (G′ )| ≤ ε.
22.1. ESTIMABLE PARAMETERS 399
Corollary 22.4. For every estimable graph parameter f there exists a graphing pa-
rameter f̂ that is continuous in the δ⊙ distance such that f (Gn ) → f̂(G) whenever
Gn → G.
Notice that continuity in the δ⊙ distance implies invariance under local equivalence.
Proof. It is easy to see (using Theorem 22.3) that the parameter f̂ is uniquely
determined for graphings that represent limits of convergent graph sequences, and
it is continuous in the δ⊙ distance. However, we don’t know if all graphings are like
that (cf. Conjecture 19.8). To complete the proof, we can use Tietze’s Extension
Theorem to extend the definition of f̂ to all graphings.
This possible non-uniqueness of the extension may be connected with the fact
that it is typically not easy to see the “meaning” of the extension of quite natural
graph parameters (cf. Supplement 20.17).
Our discussion in Section 20.2 shows that parameters of the type G ↦
ent∗ (G, H) = log t(G, H)/v(G) are estimable provided the weighted graph H is
sufficiently dense. In the next two sections we describe a couple of further interest-
ing examples of estimable graph parameters.
Not all natural parameters of graphs are estimable (see Examples 20.16 and
22.5). However, it was shown by Elek [2010a] that if we restrict ourselves to testing
properties on hyperfinite graphs, then many of these become testable. The method
is similar to property testing for hyperfinite graphs, which will be discussed later.
Example 22.5 (Independence ratio). Recall that α(G) denotes the maximum
size of a stable set in graph G. The independence ratio α(G)/v(G) is not estimable.
Let Gn be a random D-regular graph on 2n nodes, and G′n be a random bipartite
D-regular graph on 2n nodes. It is clear that α(G′n ) = n. In contrast, α(Gn ) ≤
(1 − 2cD )n with high probability, where cD > 0 depends only on D (Bollobás [1980]).
The interlaced sequence (G1 , G′1 , G2 , G′2 , . . . ) is locally convergent (as discussed in
Example 19.7), but the independence ratios oscillate between 1/2 and something
less than 1/2 − cD , so they don’t converge.
However, if we restrict ourselves to the sequence Gn , then the independence
ratios form a convergent sequence; this is a recent highly nontrivial result of Bayati,
Gamarnik and Tetali [2011].
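The independence number itself can of course be computed by brute force on tiny graphs (a sketch, with illustrative graphs):

```python
from itertools import combinations

def alpha(n, edges):
    """Independence number by brute force (only for tiny graphs)."""
    for size in range(n, 0, -1):
        for S in combinations(range(n), size):
            S = set(S)
            if not any(u in S and v in S for u, v in edges):
                return size
    return 0

# C5 has independence number 2, while K_{3,3} -- bipartite, like the graphs
# G'_n above -- has independence number 3, i.e. half of its nodes.
C5 = [(i, (i + 1) % 5) for i in range(5)]
assert alpha(5, C5) == 2
K33 = [(u, v) for u in range(3) for v in range(3, 6)]
assert alpha(6, K33) == 3
```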
22.1.1. Number of spanning trees. Lyons [2005] proved that the number of
spanning trees tree(G), suitably normalized, is an estimable parameter of bounded
degree graphs. He in fact proved a more general result, allowing the degrees to be
unbounded, as long as the average degree remains bounded and the degrees don’t
vary too much; we treat the bounded case only, and refer for the exact statement
of the more general result to the paper.
Let G be a connected graph with n nodes and m edges, whose degrees are
bounded by D as always in this part of the book. It is easy to see that tree(G) ≤
D^{v(G)} ; a bit sharper,
tree(G) ≤ ∏_{v∈V (G)} deg(v),
whence
(1/n) log tree(G) ≤ (1/n) ∑_{v∈V (G)} log deg(v).
400 22. ALGORITHMS FOR BOUNDED DEGREE GRAPHS
The right hand side is clearly bounded and estimable, which reassures us that we
have the right normalization.
Theorem 22.6. The graph parameter log tree(G)/v(G) is estimable for connected
bounded degree graphs.
Proof. Let G be a connected graph with all degrees bounded by D. It will
be convenient to choose D generously so that all degrees are in fact at most D/2.
We add D − deg(v) loops to each node v, to make the graph regular (here, a loop
adds only 1 to the degree) and also to make sure that its adjacency matrix A is
positive semidefinite. This does not change the number of spanning trees, and from
the samples of the original graph the samples of this augmented graph are easily
generated just by adding loops.
We start with developing formulas for tree(G) and its logarithm. Here we face
an embarrassment of riches: there are many formulas for tree(G) in the literature,
and possibly others would also work. We use (5.41), which we write as
(22.2)    (1/n) log tree(G) = ((n − 1)/n) log D − (log n)/n − (log e) ∑_{r=1}^∞ (1/r)( t∗ (Cr , G)/D^r − 1/n ).
For every fixed r, the quantity t∗ (Cr , G)/D^r − 1/n is estimable. Since the other terms
in (22.2) are trivially estimable, we are almost done. But the problem is that we
have an infinite sum, and we need a convergent majorant. (This is where it becomes
important that we have subtracted 1/(nr) in every term!)
Lemma 22.7. For any r ≥ 0 and v ∈ V (G),
1/n ≤ homv (Cr• , G)/D^r ≤ 1/n + 2D^{1/3} /(r + 1)^{1/3} .
(It may help with the digestion of this formula that D^r = homv (Pr• , G), and so
the ratio in the middle expresses the probability that a random walk started at v
returns to v after r steps. Since the endpoint of a random walk becomes more and
more independent of the starting point, this probability tends to 1/n by elementary
properties of random walks. The main point is that the upper bound gives a uniform
bound on the rate of this convergence.)
Averaging over all nodes v, the lemma implies that
(22.3)    0 ≤ t∗ (Cr , G)/D^r − 1/n ≤ 2D^{1/3} /(r + 1)^{1/3} .
This gives a convergent majorant, independent of G, for the infinite sum in (22.2),
which proves that (1/n) log tree(G) is estimable.
Proof of Lemma 22.7. Let P = (1/D)A (this is the transition matrix of the
random walk on G), and yr = P^r 1v (this is the distribution of a random walk after
r steps). Clearly homv (Cr• , G)/D^r = 1v^T P^r 1v = yr (v). Since P is positive semidefinite,
we see from here that the values yr (v) are monotone decreasing, and since P^r → (1/n)J
(where J is the all-1 matrix), yr (v) → 1/n as r → ∞. This implies the lower bound
in the lemma.
To get the upper bound, we note that
(22.4)    ∑_{t=0}^∞ yt^T (I − P )yt + ∑_{t=0}^∞ yt^T (P − P ²)yt = 1 − 1/n.
Indeed, the matrices I − P and P − P ² are positive semidefinite, hence all terms
here are nonnegative. Furthermore, P yt = yt+1 , so if we stop the sums at m steps,
then the middle terms telescope out, and we are left with y0^T y0 − ym+1^T ym+1 , where
y0^T y0 = 1 and ym+1^T ym+1 → 1/n.
From (22.4) it follows that there is a t ≤ r such that yt^T (I − P )yt ≤ 1/(r + 1).
Let x = (1/2)(yt (v) + 1/n), and let u be the closest node to v with yt (u) ≤ x. Consider a
shortest path v0 v1 . . . vk , where v0 = v and vk = u. Since yt (v0 ), . . . , yt (vk−1 ) ≥ x,
we must have k ≤ 1/x. On the other hand,
( yt (v) − x )² ≤ ( yt (v0 ) − yt (vk ) )²
= ( (yt (v0 ) − yt (v1 )) + · · · + (yt (vk−1 ) − yt (vk )) )²
≤ k( (yt (v0 ) − yt (v1 ))² + · · · + (yt (vk−1 ) − yt (vk ))² )
≤ Dk yt^T (I − P )yt ≤ Dk/(r + 1) ≤ D/(x(r + 1)).
Hence

(22.5)   $\bigl(y_t(v) - x\bigr)^2 x \le \dfrac{D}{r+1}.$

Substituting the definition of $x$, we get
$$\Bigl(y_t(v) - \frac{1}{n}\Bigr)^3 \le 8\bigl(y_t(v) - x\bigr)^2 x \le \frac{8D}{r+1}.$$
Since we know that yr (v) ≤ yt (v), this proves the lemma.
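The two facts used for the lower bound, namely that $y_r(v)$ is monotone decreasing and tends to $1/n$, are easy to observe numerically. A sketch (our own illustration, not from the book): we take the 5-cycle and make the walk $D$-regular with $D = 4$ by adding $D - \deg(v)$ loops at each node, so that $D/2$ bounds the degrees and $P = A/D$ is positive semidefinite.

```python
import numpy as np

# Return probability y_r(v) of the walk in Lemma 22.7, on the 5-cycle
# made D-regular (D = 4) by adding D - deg(v) loops at every node.
n, D = 5, 4
A = np.zeros((n, n))
for v in range(n):
    A[v, (v + 1) % n] = A[v, (v - 1) % n] = 1
    A[v, v] = D - 2                  # D - deg(v) loops at node v
P = A / D                            # transition matrix of the walk

y = np.eye(n)[0]                     # walk started at node 0
returns = []
for r in range(60):
    returns.append(y[0])             # y_r(0), the return probability
    y = P @ y

# y_r(v) decreases monotonically and tends to 1/n, as used in the proof
assert all(a >= b - 1e-12 for a, b in zip(returns, returns[1:]))
assert abs(returns[-1] - 1 / n) < 1e-3
```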
We note that Lyons gets a better estimate, with 1/2 in the exponent of r + 1
rather than 1/3, but the simpler bound above was good enough for our purposes.
Once we know that the graph parameter $\log \operatorname{tree}(G)/v(G)$ is estimable, we also know that if $G_n$ is a locally convergent graph sequence, then $\log \operatorname{tree}(G_n)/v(G_n)$ tends to a limit. From the proof, it is not difficult to figure out what the limiting graphing parameter (or involution-invariant-distribution-parameter) is. We formulate the answer for a graphing, but it is easy to translate
this to the Benjamini–Schramm model. Given a graphing G and a number D such
that D/2 is an upper bound on the degrees, pick a random node x, and start a
random walk from x, where you have to add D − deg(y) loops to node y as you
go along. Let Xr be the indicator that the random walk returns to x after r steps
(not necessarily the first time). With this notation, we have
(22.6)   $\dfrac{\log \operatorname{tree}(G_n)}{v(G_n)} \longrightarrow \log D - \displaystyle\sum_{r=1}^{\infty} \frac{1}{r}\, E(X_r).$
The expression on the right describes the limit as a function of the limiting graphing.
(Note that its value may be −∞.)
Exercise 22.8. Suppose that we want to estimate the variance of the degrees, i.e., $\sum_{v \in V(G)} (\deg(v) - d_0)^2 / v(G)$, where $d_0$ is the average degree. (a) Show that this parameter is estimable. (b) Prove that we cannot estimate it using an estimator of the form $g\bigl(B_{G,k}(v_1), \dots, B_{G,k}(v_k)\bigr) = \bigl(h(B_{G,k}(v_1)) + \cdots + h(B_{G,k}(v_k))\bigr)/k$ with any function $h : \mathcal{B}_k \to \mathbb{R}$.
402 22. ALGORITHMS FOR BOUNDED DEGREE GRAPHS
Example 22.10 (Forests). Let us look at a simple example that illustrates some of
the difficulties in designing algorithms for property testing for graphs with bounded
degree, even for monotone properties. Suppose that we want to test whether a graph
G is a forest. Our first thought might be to test whether a random ball contains a
cycle. Certainly, if it does, then the graph is not a forest. But drawing a conclusion
in the other direction is not justified: if the graph G has large girth, then every ball
will be a tree, yet G may be very far from being a forest. This shows that (unlike
in the dense case) P is not a good test property for itself. If in addition to this
22.2. TESTABLE PROPERTIES 403
we estimate the average degree and eliminate small components, we can design a
test for being a forest (Goldreich and Ron [2008]). To fill in the details makes an
interesting exercise.
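The naive ball test discussed above is easy to implement for neighborhood samples; a minimal sketch (our own code and helper names), which of course only certifies the negative direction:

```python
# Reject if the radius-r ball around a sampled node contains a cycle.
def ball(adj, v, r):
    seen, frontier = {v}, {v}
    for _ in range(r):
        frontier = {u for w in frontier for u in adj[w]} - seen
        seen |= frontier
    return seen

def ball_has_cycle(adj, v, r):
    B = ball(adj, v, r)
    edges = sum(1 for w in B for u in adj[w] if u in B) // 2
    # a connected subgraph is a tree iff it has exactly |nodes| - 1 edges
    return edges >= len(B)

adj = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4], 4: [3]}  # triangle + an edge
assert ball_has_cycle(adj, 0, 2)       # the triangle is detected
assert not ball_has_cycle(adj, 3, 2)   # the tree component passes
```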
We can use limit objects to give the following condition for testability (the
proof is immediate).
Proposition 22.11. A graph property P is not testable if and only if there exists
an ε > 0 and two convergent sequences of graphs (Gn ) and (Hn ) with Gn ∈ P and
d1 (Hn , P) > ε that have a common local limit.
22.2.3. Hyperfinite properties. Hyperfiniteness is particularly important
in property testing. Just as in the dense case, property testing is about the interplay
between the sampling distance and the edit distance, and these two distances are
intimately related for hyperfinite graphs.
Benjamini, Schramm and Shapira show that hyperfiniteness is in a sense
testable. This does not make sense as said, since hyperfiniteness is a property
of a family of graphs, not of a single graph. But if we quantify hyperfiniteness,
then we can turn it into a meaningful statement.
Proposition 22.12. For every ε there is an ε′ such that for any positive integer
k, the properties P1 = {(ε′, k)-hyperfinite} and P2 = {not (ε, k)-hyperfinite} are
distinguishable.
Proof. Suppose that this is false; then by Theorem 22.9 there exist an ε > 0, a sequence εn → 0, graphs Gn, G′n ∈ G and positive integers kn such that Gn is (εn, kn)-hyperfinite, G′n is not (ε, kn)-hyperfinite, and δ⊙(Gn, G′n) → 0. Then the sequence (Gn) is hyperfinite. Let us select a convergent subsequence; then the limit graphing G of this subsequence is hyperfinite by Theorem 21.13. But the sequence (G′n) has the same limit, so it must be hyperfinite, by the same theorem. This implies that there is a positive integer k such that all members of the sequence are (ε, k)-hyperfinite. Since G′n is not (ε, kn)-hyperfinite, it follows that kn < k for all n. This implies that almost all connected components of G have at most k elements; but then it follows from Gn → G that all but an o(1) fraction of the connected components of G′n have at most k elements, which implies that (G′n) is an (ε, k)-hyperfinite sequence, a contradiction.
Benjamini, Schramm and Shapira [2010] proved an important analogue of The-
orem 15.24: every minor-closed property of bounded degree graphs is testable. As
noted by Elek, the theorem can be extended to any monotone hyperfinite graph
property.
Theorem 22.13. Every monotone hyperfinite property of graphs with bounded de-
gree is testable.
The property of being a forest is certainly minor-closed, so the example discussed
at the end of the introduction above is a special case. As another special case,
planarity of bounded degree graphs is testable.
Proof. Let P be a monotone hyperfinite graph property, and suppose that it is
not testable. Then there exist an ε > 0 and two sequences of graphs (Gn ) and (Fn )
such that Gn ∈ P, d1 (Fn , P) > ε and δ⊙ (Gn , Fn ) → 0. We may assume that both
sequences are locally convergent, and so they have a common weak limit graphing
Monotonicity of the property P was used in the proof above only in a somewhat
annoying technical way, and one would like to extend the argument to all hyperfinite
properties P. One must be careful though: the property that “G is a planar graph
with an even number of nodes” is hyperfinite, but not testable (a large grid with
an even number of nodes cannot be distinguished from a large grid with an odd
number of nodes by neighborhood sampling). But the method works with a little
twist: suppose that two graphs F and G have the same number of nodes, and we
know that G ∈ P, and the sampling distance of G and F is small; then it follows
that the edit distance of F from P is small. This implies that P is testable in a
non-uniform sense. For an exact formulation and details, see Newman and Sohler
[2011].
22.3. COMPUTABLE STRUCTURES 405
Exercise 22.15. Let G and G′ be (ε, k)-hyperfinite graphs with the same number of nodes n. Prove that they can be overlayed so that
$$\frac{1}{n}\,|E(G)\,\triangle\,E(G')| \le 2(1 + D^k)\varepsilon + D^k\, \delta_\odot(G, G').$$
same answer: that they want to remain unmatched (which gives an empty match-
ing, very far from being optimal). Symmetry does not allow them to give any other
answer, at least deterministically.
We can break the symmetry and find a matching close to the optimum, if we
allow the agents to flip coins. We can consider the coinflips generated by any agent
as a real number between 0 and 1, and call this the local random seed of the agent.
This takes an infinite number of coinflips, but only a finite and bounded number of
them have to be generated during the run of the algorithm.
Preprocessing. In our examples for computing a structure in the dense case,
preprocessing (computing a representative set) played a large role. In the bounded
degree case, there is less room for preprocessing. We can think of two kinds of
preprocessing:
• We can do some preliminary computation (perhaps randomized) indepen-
dently of the graph, and inform the agents about the result. If we are lazy, we can
just let the agents do this computation for themselves. The only information they
need for this is the random seed we use during the computation. So it suffices to
generate a random number in [0, 1] and tell it to all the agents. We call this number
the global random seed. (Note: they could generate a random number themselves,
but this would not be the same for all agents!)
• We can do preliminary computation using information about the graph. This
could be based on the distribution of r-balls in G for some fixed r (which is the
realistic possibility for us to obtain information about the graph), but perhaps we
have some other information about the graph (like somebody tells us that it is
connected). Again, we can let the agents do the work; we just have to pass on to them the
information about the graph they need. In the strongest form, we let the agents
know what the graph is (up to isomorphism).
The task. Assume that our agents have to compute a decoration f : V (G) → C,
where C is a finite set. Not all decorations will be feasible, but we assume that
the feasibility criterion is local, i.e., there is an r ∈ N and a set of feasible C-
decorated r-neighborhoods such that a decoration is feasible if and only if every
r-neighborhood is feasible.
The goal is to find an “optimal” decoration. The decoration is evaluated locally in the following sense: we associate a value ω(B) ∈ [0, 1] with every C-decorated r-ball B, and we want to minimize the average value of the r-balls. Setting $\omega(v) = \omega\bigl(B_{G,r}(v), f|_{B_{G,r}(v)}\bigr)$, the cost of the decoration is defined by
$$w(f) = \frac{1}{v(G)} \sum_{v \in V(G)} \omega(v).$$
The agents want to compute a decoration f for which w(f ) is as small as possible.
Example 22.16 (Proper coloring). Suppose that we want to compute a proper
k-coloring of G. Then we choose C = [K] for some very large K, the feasibility
criterion is that the coloring should be proper (clearly this can be verified from the
1-neighborhoods), and we evaluate the coloring by imposing a penalty of 1 on every
node with color larger than k.
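A sketch of Example 22.16 in the local-cost framework (our own encoding, not from the book): feasibility is properness, checkable on 1-neighborhoods, and the cost is the average penalty over nodes whose color exceeds k.

```python
# Colors are C = [K]; feasibility = proper coloring; penalty for color > k.
def feasible(adj, f):
    # local criterion with r = 1: no edge joins equal colors
    return all(f[v] != f[u] for v in adj for u in adj[v])

def cost(adj, f, k):
    # w(f): the average penalty over all nodes
    return sum(1 for v in adj if f[v] > k) / len(adj)

adj = {0: [1, 2], 1: [0, 2], 2: [0, 1]}    # a triangle
f = {0: 1, 1: 2, 2: 5}                     # proper, but uses color 5
assert feasible(adj, f)
assert cost(adj, f, 3) == 1 / 3            # one node pays the penalty
```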
Example 22.17 (Maximum matching). Suppose that we want to compute a
maximum matching in G. Then we can take C = [D + 1]. Decoration with i,
i ≤ D, means that the node is matched with its i-th largest neighbor (in the order
of their local seeds); decoration with D + 1 means that the node is unmatched.
The feasibility criterion is clearly local. We impose a penalty of 1 for decoration by
D + 1.
Example 22.18 (Max-flow-min-cut). Suppose that we are given a 3-coloring
of the nodes of a graph G by red, white and green. We consider all edges to
have capacity 1, and would like to find a maximum flow from the red nodes to
the green nodes. This means a decoration of every node v by a rational vector
f (v) = (f1 , . . . , fD ), where fi is the flow it is sending to its i-th highest weighted
neighbor (this can be negative or positive). The sum of entries of f (v) is the gain
γ(v) of the node. Feasibility means that the flow on any edge uv, indicated in the
decoration of u, is the negative of the flow on this edge indicated in the decoration
of v; furthermore, the gain is 0 at every white node, nonnegative at every red node,
and nonpositive at every green node. The objective function is the sum of γ(v)
over the red nodes.
Computing the minimum cut fits in the framework quite easily too: We decorate
every node by either “LEFT” or “RIGHT”. Feasibility means that all red nodes
are decorated by “LEFT” and all green nodes are decorated by “RIGHT”. The
objective value is half of the average number of neighbors of a node on the other
side.
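The min-cut decoration can be sketched in a few lines (our own encoding and helper names, not from the book): feasibility pins red nodes to “LEFT” and green nodes to “RIGHT”, and the objective is half the average number of neighbors on the other side, i.e. the number of cut edges per node.

```python
# Decorations are sides; red/green nodes are forced, white nodes are free.
def cut_feasible(color, side):
    return all((color[v] != 'red' or side[v] == 'LEFT') and
               (color[v] != 'green' or side[v] == 'RIGHT') for v in color)

def cut_value(adj, side):
    # half the average number of neighbors on the other side
    # (each cut edge is seen from both ends, hence the factor 2)
    cross = sum(1 for v in adj for u in adj[v] if side[u] != side[v])
    return cross / (2 * len(adj))

adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}   # a path: red-white-white-green
color = {0: 'red', 1: 'white', 2: 'white', 3: 'green'}
side = {0: 'LEFT', 1: 'LEFT', 2: 'RIGHT', 3: 'RIGHT'}
assert cut_feasible(color, side)
assert cut_value(adj, side) == 0.25            # one cut edge on 4 nodes
```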
Algorithm 22.19.
Input: A graph G with maximum degree D and no isolated nodes in the agent
model, and an error bound ε.
Output: A random matching M such that with probability at least 1 − ε, |M | ≥
(1 − ε)ν(G).
As in most matching algorithms, we start with the empty matching, and aug-
ment it using augmenting paths: these are paths that start and end at unmatched
nodes, and every second edge of them belongs to the current matching M. Augmenting along such a path (interchanging the matching edges and non-matching edges) increases the size of the matching by 1. Of course, in our setting we will
have to augment simultaneously along many disjoint augmenting paths, to make
measurable progress. We will augment along augmenting paths of length at most
k = ⌈3/ε⌉, which we call short augmenting paths.
This is again done in rounds. It will be convenient to assume that in each
round, a new local seed is generated for every agent v. (They could get this from
a single random real number in [0, 1], by using all the bits in even position in the
first round, half of the remaining bits in the second, etc.) This way the rounds
will be independent of each other in the probabilistic sense. Augmentation along
many disjoint augmenting paths will be carried out simultaneously by our agents.
It is clear that agents looking at their neighborhoods with radius k will discover all
short augmenting paths. The problem is that there will be conflicts: these short
augmenting paths are not disjoint. To resolve these conflicts, we define when a path is better
than another, and we will augment only along those paths that are better than any
path intersecting them.
To be precise, we define that path P is better than path Q if walking along
both paths, starting from their endnodes with higher local seed, the first node that
is different has higher local seed in P than in Q. We will augment along paths
that are better than any path intersecting them; we call such a path locally best. If
we allow agents to explore their neighborhoods with radius 2k, then every locally
best short augmenting path will be discovered by at least one agent, who will carry
out the augmentation (i.e., send a message to the agents along the path how their
mates are to be changed). Several agents may do so for a given path, but there will
be no conflict between their messages.
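The comparison used to select locally best paths can be sketched as follows (function names and the tie-break for a shared prefix are our own choices):

```python
# A path is stored as its list of nodes; seed maps nodes to local seeds.
def canonical(path, seed):
    # orient the path so that it starts at its endnode with the higher seed
    return path if seed[path[0]] > seed[path[-1]] else path[::-1]

def better(P, Q, seed):
    """P is better than Q if, walking along both canonical orientations,
    the first differing node has the higher local seed in P."""
    P, Q = canonical(P, seed), canonical(Q, seed)
    for u, v in zip(P, Q):
        if u != v:
            return seed[u] > seed[v]
    return len(P) > len(Q)   # tie-break for a shared prefix (our choice)

seed = {1: 0.9, 2: 0.5, 3: 0.8, 4: 0.1}
P, Q = [1, 2, 4], [3, 2, 4]   # two paths intersecting at nodes 2 and 4
assert better(P, Q, seed) and not better(Q, P, seed)
```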
The above is repeated $q = 4D^{2k} \lceil \log(1/\varepsilon) \rceil$ times, then we stop and output the
current matching.
The idea in the analysis is that in a particular phase, we either find many
good short paths, and hence make substantial progress, or the number of all short
augmenting paths is small, in which case we have an almost maximum matching.
Let us call a node eligible (at a certain phase) if at least one short augmenting path starts at it (such a node is of course unmatched). Let Mi be the matching after the i-th round, and let Xi denote the number of eligible nodes. Let M′ be a maximum matching, and consider the set Mi ∪ M′. This set of edges consists of the common edges of Mi and M′, and of cycles and paths whose edges alternate between Mi and M′. Every cycle contains the same number of edges from Mi and M′. Paths that
contain more edges from M ′ than from Mi have to end with edges in M ′ at both
ends, and so they are augmenting paths. Thus the number of augmenting paths
is at least |M ′ | − |Mi | = ν(G) − |Mi |. The number of augmenting paths among
these that have length more than k is less than 2|M ′ |/k, so there are at least
$(1 - 2/k)\nu(G) - |M_i|$ short augmenting paths, and $X_i \ge (2 - 4/k)\nu(G) - 2|M_i|$ eligible nodes.
Let u be an eligible node after phase i. All the short augmenting paths inter-
secting any of the short augmenting paths starting at u stay within BG,2k (u). Since
$|B_{G,2k}(u)| \le D^{2k}$, there is a chance of at least $p = 1/D^{2k}$ that u is the node with
highest local seed among them. Then the best path starting at u will be augmented
upon, and hence u has a chance of at least p to become matched in that round.
This means that
$$E\bigl(|M_{i+1}| \;\big|\; M_i\bigr) \ge |M_i| + \frac{1}{2}\,p X_i \ge |M_i| + p\Bigl(\frac{k-2}{k}\,\nu(G) - |M_i|\Bigr)$$
(here expectation is taken over random choices in the (i + 1)-st round), which we
can write as
$$E\Bigl(\frac{k-2}{k}\,\nu(G) - |M_{i+1}| \;\Big|\; M_i\Bigr) \le (1-p)\Bigl(\frac{k-2}{k}\,\nu(G) - |M_i|\Bigr).$$
Taking expectation over Mi , we get
$$E\Bigl(\frac{k-2}{k}\,\nu(G) - |M_{i+1}|\Bigr) \le (1-p)\,E\Bigl(\frac{k-2}{k}\,\nu(G) - |M_i|\Bigr).$$
Hence
$$E\Bigl(\frac{k-2}{k}\,\nu(G) - |M_q|\Bigr) \le (1-p)^q\, E\Bigl(\frac{k-2}{k}\,\nu(G) - |M_0|\Bigr) = (1-p)^q\, \frac{k-2}{k}\,\nu(G).$$
By Markov’s Inequality, this implies that
$$P\Bigl(\frac{k-2}{k}\,\nu(G) - |M_q| > \frac{k-2}{k^2}\,\nu(G)\Bigr) \le k(1-p)^q \le k e^{-pq} \le \varepsilon.$$
So with probability at least 1 − ε, we have
$$|M_q| \ge \frac{k-2}{k}\,\nu(G) - \frac{k-2}{k^2}\,\nu(G) = \frac{(k-1)(k-2)}{k^2}\,\nu(G) \ge (1-\varepsilon)\,\nu(G).$$
This proves that the algorithm works as claimed.
We have seen that without local seeds, there is no way to approximately com-
pute a maximum matching. So the matching problem can be solved in model (B)
but not in (A).
Indeed, suppose that (ρ, y) is a point contained in the left side, then (ρ, y) ∈ K,
and (22.8) implies that there is an F ∈ G such that y ≥ u(ρ) > LF (ρ) − ε. On the
other hand, (ρ, y) ∈ HF implies that y ≤ LF (ρ) − ε, a contradiction.
Hence by Helly’s Theorem, there is a finite set of graphs $F_1, \dots, F_m \in A_{2r}$, where $m \le |B_r| + 1$, such that $K \cap H = \emptyset$, where $H = \bigcap_{i \le m} H_{F_i}$. Since K and H are convex, there is a halfspace defined by a linear inequality $y - L(\rho) \le b$ containing H but disjoint from K. This means two things:
(a) The inequality y ≥ u(ρ) (ρ ∈ A2r ) implies that y − L(ρ) > b. This last
condition means that u(ρ) > L(ρ) + b for every ρ ∈ A2r .
(b) The linear inequalities $y - L_{F_i}(\rho) \le -\varepsilon$ imply the inequality $y - L(\rho) \le b$. By the Farkas Lemma, there are nonnegative numbers $\alpha_i$ such that $\sum_i \alpha_i = 1$, $\sum_i \alpha_i L_{F_i} = L$, and $b \ge -\varepsilon$. The numbers $\alpha_i$ form a probability distribution α on [m].
Now we can give the following instruction to our agents: Use the even bits of
the public random number g0 to pick an i ∈ [m] from the distribution α. (All
agents will pick the same i.) Then pretend that you are working on the graph Fi ,
and compute the decoration fFi according to algorithm (C) (using the remaining
bits of g0 as the public random number). Then (using (22.7) in the last step) the
agents achieve a cost that is almost as good as the cost they could achieve knowing
the graph:
$$E\bigl(w(f_{F_i})\bigr) = \sum_j \alpha_j\, E\bigl(w(f_{F_j})\bigr) = \sum_j \alpha_j\, L_{F_j}(\rho_{G,2r}) = L(\rho_{G,2r}) \le u(\rho_{G,2r}) - b \le u(\rho_{G,2r}) + \varepsilon \le c(G) + \varepsilon.$$
22.3.4. Computable structures and Borel sets. We conclude this chapter by sketching a connection between algorithmic problems and measure theory.
Elek and Lippner [2010] give another algorithm for computing an almost maximum
matching in a large bounded degree graph. Their approach is based on the connec-
tions with Borel graphs, which were discussed in Section 18.1. Instead of describing
a second matching algorithm, we only illustrate the idea on a simpler example, by
showing how the proof that every Borel graph with degrees at most D has a Borel
coloring with D + 1 colors (Theorem 18.3) can be turned into an algorithm.
In the proof of Theorem 18.3, we start with constructing a countable Borel
coloring. This part of the argument can be translated easily. We have to select
an explicit countable basis for the Borel sets in [0, 1); for example, we can choose
intervals of the form [a/b, (a + 1)/b), where 0 ≤ a < b are integers. We have to
assign a positive integer index to each of these intervals, say $(2a+1)2^b$. Then every
agent picks the interval with smallest index that contains his local seed but not the
local seed of any of his neighbors. Now the agents have indices (they can forget the
seeds from now on). Trivially, adjacent agents have different indices.
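The seed-to-index step can be sketched as follows (our own brute-force implementation, for illustration only): each agent picks the interval $[a/b, (a+1)/b)$ of smallest index $(2a+1)2^b$ that contains its own seed but no neighbor's seed.

```python
# Brute force: enumerate intervals in increasing index order.
def pick_index(v, adj, seed, max_b=20):
    cands = sorted(((2 * a + 1) * 2 ** b, a, b)
                   for b in range(1, max_b) for a in range(b))
    for idx, a, b in cands:
        lo, hi = a / b, (a + 1) / b
        if lo <= seed[v] < hi and all(not (lo <= seed[u] < hi) for u in adj[v]):
            return idx
    raise ValueError("increase max_b")

adj = {0: [1], 1: [0, 2], 2: [1]}       # a path on three agents
seed = {0: 0.1, 1: 0.6, 2: 0.15}        # the agents' local seeds
idx = {v: pick_index(v, adj, seed) for v in adj}
# adjacent agents always receive different indices
assert all(idx[v] != idx[u] for v in adj for u in adj[v])
```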
Next, every agent whose index is smaller than the indices of his neighbors
changes his index to 1, and labels himself FINISHED. (In the proof of Theorem
18.3, only those with index 2 not adjacent to any node with index 1 did so in the first
round; but it is easy to see that eventually all nodes with a locally minimal index
will change to 1). Next, all those agents whose index is smaller than the indices of
all their unfinished neighbors change their indices to the smallest possible, etc.
At this point comes an important difference: for Borel coloring, we could repeat
this infinitely many times, but here we have a time bound. Those nodes that
managed to change their indices are now properly colored with D+1 colors; however,
there will be some who are stuck with their large original indices. Down-to-earth
work starts here to show that their number is a small fraction of v(G). We don’t
go into the details (see Exercises 22.22, 22.23).
Exercise 22.21. Describe how Algorithm 22.19 can be simplified for two simpler versions of the problem: (a) we only want to find a maximal (non-extendable)
matching (of course, with an error); (b) somebody marks a matching for us, and
we have to test whether it is maximum (again, with some error).
Exercise 22.22. Prove that any constant time distributed algorithm that con-
structs a legitimate coloring of a cycle will use, with high probability, more than
100 colors.
Exercise 22.23. Prove that for every ε > 0 there is a k ≥ 1 such that the
algorithm (with k rounds) as described above will produce a coloring in which
fewer than εv(G) nodes have color larger than D.
Part 5
types of k-broken graphs. The entry in the intersection of the row corresponding
to G1 and the column corresponding to G2 is f (G1 ∗ G2 ). Note that for k = 0, we
have M (f, 0) = M ′ (f, 0), but for other values of k, connection and edge-connection
matrices are different.
Let G be a finite graph. An edge-coloring model is determined by a mapping $h : \mathbb{N}^q \to \mathbb{R}$, where q is a positive integer. We call h the node evaluation function. Here we think of [q] as the set of possible edge colors; for any coloring of the edges and $d \in \mathbb{N}^q$, we think of h(d) as the “value” of a node incident with $d_c$ edges with color c (c ∈ [q]). In statistical physics this is called a vertex model: the
edges can be in one of several states, which are represented by the color; an edge-
coloring represents a state of the system, and (assuming that h > 0) ln h(d) is the
contribution of a node (incident with dc edges with color c) to the energy of the
state.
There are many interesting and important questions to be investigated in con-
nection with edge-coloring models; we will only consider what in statistical physics
terms would be called its “partition function”. To be more precise, for an edge-
coloring φ : E(G) → [q] and node v, let degc (φ, v) denote the number of edges e
incident with node v with color φ(e) = c. So the vector deg(φ, v) ∈ Nq is the “local
view” of node v. The edge-coloring function of the model is defined by
$$\operatorname{col}(G, h) = \sum_{\varphi:\,E(G)\to[q]}\ \prod_{v \in V(G)} h\bigl(\deg(\varphi, v)\bigr).$$
Recall that we allow the graph ⃝ consisting of a single edge with no endpoints; by
definition, col(⃝, h) = q. We also allow that q = 0, in which case col(G, h) = 1 if G
has no edges, and col(G, h) = 0 otherwise. We could of course allow complex valued
node evaluation functions, in which case the value of the edge-coloring function can
be complex.
Example 23.1 (Number of perfect matchings). The number of perfect match-
ings can be defined by coloring the edges by two colors, say black and white, and
requiring that the number of black edges incident with a given node be exactly
one. This means that this number is $\operatorname{col}(\cdot, h)$, where $h : \mathbb{N}^2 \to \mathbb{R}$ is defined by $h(d_1, d_2) = \mathbf{1}(d_1 = 1)$. The number of all matchings could be expressed similarly.
Example 23.2 (Number of 3-edge-colorings). This number is $\operatorname{col}(\cdot, h)$, where $h : \mathbb{N}^3 \to \mathbb{R}$ is defined by $h(d_1, d_2, d_3) = \mathbf{1}(d_1, d_2, d_3 \le 1)$.
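Both of these examples can be checked by brute-force enumeration of edge colorings; a sketch (our own code, feasible only for tiny graphs):

```python
from itertools import product

def col(edges, nodes, q, h):
    """Partition function col(G, h): sum over all edge colorings
    phi: E(G) -> [q] of the product over nodes of h(deg(phi, v))."""
    total = 0
    for phi in product(range(q), repeat=len(edges)):
        val = 1
        for v in nodes:
            d = [0] * q                       # local view deg(phi, v)
            for e, c in zip(edges, phi):
                if v in e:
                    d[c] += 1
            val *= h(d)
        total += val
    return total

C4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
# Example 23.1: h(d) = 1(d_black = 1) counts perfect matchings; C4 has 2
assert col(C4, range(4), 2, lambda d: 1 if d[0] == 1 else 0) == 2
# Example 23.2: h(d) = 1(all d_c <= 1) counts proper 3-edge-colorings
assert col(C4, range(4), 3, lambda d: 1 if max(d) <= 1 else 0) == 18
```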
Example 23.3 (Spectral decomposition of a graphon). Recall the definition
(7.18) and expression (7.25) for t(F, W ) in terms of the spectrum of TW . We
can consider χ as a coloring of E(F) with colors 1, 2, .... Then $M_\chi(v)$ depends only on the numbers of edges with the different colors, and so we can write $M_\chi(v) = h\bigl(\deg(\chi, v)\bigr)$, and we get
$$t(F, W) = \operatorname{col}(F, \lambda, h) = \sum_{\chi:\,E(F)\to\{1,2,\dots\}}\ \prod_{e \in E(F)} \lambda_{\chi(e)} \prod_{v \in V(F)} h\bigl(\deg(\chi, v)\bigr).$$
However, this is not a proper edge-coloring model, since the value of the circle,
which is the number of colors, is infinite in general.
The following facts about the edge-connection matrices of edge-coloring func-
tions are easy to prove along the same lines as Proposition 5.64:
418 23. OTHER COMBINATORIAL STRUCTURES
product of these tensors, and then summing over all choices of the indices. Note
that every index occurs twice, so we could call this “tracing out” every index.
These tensor networks play an important role in several areas of physics, but
we can’t go into this topic in this book.
This setup allows for a more general construction. If we have a tensor network with k broken edges, then the value associated with the graph will depend on the colors of these edges; in other words, it will be described by an array $(A_{i_1,\dots,i_k} : i_r \in [q])$. So the graph with k broken edges can be considered as a gadget itself.
We can break down the procedure of assembling a tensor network from the
gadgets (with or without broken edges) into two very simple steps:
(a) We can take the disjoint union of two gadgets; if the gadgets have k and
l legs, respectively, the union has k + l legs. In terms of multilinear algebra, this
means to form the tensor product of two tensors.
(b) We can fuse two legs of a gadget. If $(A_{i_1,\dots,i_k} : i_r \in [q])$ is the tensor describing the gadget, and (say) we fuse legs k − 1 and k, then we get the tensor
$$B_{i_1,\dots,i_{k-2}} = \sum_{j \in [q]} A_{i_1,\dots,i_{k-2},j,j}.$$
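The two assembly steps correspond to the tensor product and a partial trace in multilinear algebra; a short NumPy sketch (our own illustration, not from the book):

```python
import numpy as np

q = 3
rng = np.random.default_rng(0)
A = rng.random((q, q))          # a gadget with 2 legs
B = rng.random((q, q, q))       # a gadget with 3 legs

# (a) disjoint union of gadgets = tensor product: k + l = 5 legs
union = np.tensordot(A, B, axes=0)

# (b) fusing the last two legs = summing the diagonal over those slots
def fuse_last_two(T):
    return np.trace(T, axis1=-2, axis2=-1)

assert union.shape == (q,) * 5
assert np.allclose(fuse_last_two(union), np.einsum('abcjj->abc', union))
# fusing the two legs of the identity gadget closes it into the circle,
# whose value is q (one free choice of color)
assert fuse_last_two(np.eye(q)) == q
```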
If we replace every edge by this path of length 3, then the value of the graph does not change. However, we can group together every original gadget B with the orthogonal matrices next to it, to get a gadget $B^A$, which—in multilinear algebra terms—is obtained from B by applying the linear transformation A to every slot. If we replace every gadget B in the kit by $B^A$, then the value of the tensor network does not change.
Figure 23.1. Replacing every edge by a path with the same or-
thogonal transformation at both inner nodes (just facing the op-
posite direction), and regrouping does not change the value.
Now consider a tensor network with broken edges. If we replace every tensor B in the kit by $B^A$, then the matrices A and $A^T$ along the unbroken edges still cancel
each other, but on the broken edges, one copy still remains. In other words, if we
apply the same orthogonal transformation to every slot of every tensor in the kit,
then the tensor defined by a tensor network with broken edges undergoes the same
transformation.
In particular, if all tensors in the kit have the property that a particular or-
thogonal transformation applied to all their slots leaves them invariant, then the
23.3. HYPERGRAPHS 421
same holds for every assembled tensor. The theorem of Schrijver [2008a] asserts
that this is the only obstruction to assembling a given tensor.
Theorem 23.7. Let T be a traced tensor algebra generated by a set S of tensors,
including the identity tensor 1(i = j) (i, j ∈ [q]). Then a tensor T is in T if
and only if it is invariant under every orthogonal transformation that leaves every
tensor in S invariant.
The special case when the generating tensors are symmetric describes edge-
coloring models. This can be viewed as an analogue of Theorem 6.38, with the role
of the edges and nodes interchanged. Regts [2012] showed how Theorem 23.7 yields
an exact formula for the edge-connection rank of edge-coloring models.
Example 23.8 (Number of perfect matchings revisited). The tensor model
for this graph parameter is a bit more complicated than in Example 23.1. We have 2 edge colors (which it will be convenient to call 0 and 1), so we work over $\mathbb{R}^2$; but we need to specify a tensor for every degree d, expressing that exactly one edge is black:
$$T_{i_1,\dots,i_d} = \mathbf{1}(i_1 + \cdots + i_d = 1).$$
It is easy to see that no orthogonal transformation other than the identity, applied to all slots, leaves all these tensors invariant, so it follows from Theorem 23.7 that every tensor can be assembled from this kit. (We note that the tensor is invariant under permuting
the slots; however, this symmetry is not preserved under composition of tensor
networks.)
Example 23.9 (Number of 3-edge-colorings revisited). To construct a tensor
model for the number of 3-edge-colorings, we work over R3 . We again need to specify
a tensor for every degree expressing that the edges have different colors:
$$T_{i_1,\dots,i_d} = \mathbf{1}(i_1, \dots, i_d \text{ are different})$$
(for d > 3, we get the 0 tensor). Permuting the colors (i.e., the coordinates in
the underlying vector space R3 ) leaves this tensor invariant, and these are the only
orthogonal transformations of R3 with this property. Theorem 23.7 implies that a
tensor is invariant under the permutations of the coordinates of R3 if and only if it
can be assembled from this kit.
23.3. Hypergraphs
When talking about generalizing results on graphs, the first class of structures
that comes to mind is hypergraphs (at least to a combinatorialist). So it is per-
haps surprising that to extend the main concepts and methods developed in this
book (quasirandomness, limit objects, Regularity Lemma, and Counting Lemma)
to hypergraphs is highly nontrivial. Even the “right” formulation of the Regularity
Lemma took a long time to find, and in the end both the Regularity Lemma and
the limit object turned out quite different from what one would expect as a naive
generalization. Nevertheless, the issue is essentially solved now, thanks to the work
of Chung, Elek, Graham, Gowers, Rödl, Schacht, Skokan, Szegedy, Tao and others.
A full account of this work would go way beyond the possibilities of this book, but
we will give a glimpse of the results.
By an r-uniform hypergraph, or briefly r-graph, we mean a pair H = (V, E), where V = V(H) is a finite set and $E = E(H) \subseteq \binom{V}{r}$ is a collection of r-element
subsets. The elements of V are called nodes, the elements of E are called edges.
So 2-graphs are equivalent to simple graphs. We can define the homomorphism
number hom(G, H) of an r-graph G into an r-graph H in the natural way, as the
number of maps φ : V (G) → V (H) for which φ(A) ∈ E(H) for every A ∈ E(G).
The homomorphism density of G in H is defined as one expects, by the formula
$$t(G, H) = \frac{\hom(G, H)}{|V(H)|^{|V(G)|}}.$$
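These definitions are straightforward to implement by brute force; a sketch (our own code, exponential in $v(G)$ and so only for tiny examples):

```python
from itertools import product

def t_density(G_nodes, G_edges, H_nodes, H_edges):
    """Homomorphism density t(G, H) of r-graphs: the fraction of all maps
    V(G) -> V(H) that send every edge of G to an edge of H."""
    H_set = {frozenset(e) for e in H_edges}
    hom = sum(
        all(frozenset(phi[v] for v in e) in H_set for e in G_edges)
        for phi in (dict(zip(G_nodes, img))
                    for img in product(H_nodes, repeat=len(G_nodes)))
    )
    return hom / len(H_nodes) ** len(G_nodes)

# 2-graphs: the density of an edge in the triangle K3 is 6/9 = 2/3
K3 = [(0, 1), (0, 2), (1, 2)]
assert abs(t_density([0, 1], [(0, 1)], [0, 1, 2], K3) - 2 / 3) < 1e-12
# 3-graphs: a single 3-edge in the complete 3-graph on 4 nodes: 24/64
K4_3 = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
assert abs(t_density([0, 1, 2], [(0, 1, 2)], range(4), K4_3) - 24 / 64) < 1e-12
```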
Quasirandomness can be defined by generalizing the condition on the density
of quadrilaterals. We need to define a couple of special hypergraph classes. Let $K_n^r$ denote the complete r-uniform hypergraph on [n] (i.e., $E(K_n^r) = \binom{[n]}{r}$). Let $L_k^r$ be the “complete r-partite hypergraph” defined on the node set $V_1 \cup \dots \cup V_r$, where the $V_i$ are disjoint k-sets, and the edges are all r-sets containing exactly one element from each $V_i$. Clearly $t(K_r^r, H) = t(L_1^r, H)$ is the edge density of H. It is not hard to prove that $t(L_k^r, H) \ge t(K_r^r, H)^{k^r}$ for every H (this generalizes inequality (2.9) from the Introduction). We define the quasirandomness of H as the difference
$$\operatorname{qr}(H) = t(L_2^r, H) - t(K_r^r, H)^{2^r}.$$
A sequence (Hn) of hypergraphs is called quasirandom with density p if $t(K_r^r, H_n) \to p$ and $\operatorname{qr}(H_n) \to 0$, or equivalently, $t(L_2^r, H_n) \to p^{2^r}$. It was proved by Chung and Graham [1989] that this implies that $t(G, H_n) \to p^{e(G)}$ for every r-graph G, so the equivalence of conditions (QR2) and (QR3) for quasirandomness in the Introduction (Section 1.4.2) generalizes nicely.
As a first warning that not everything extends in a straightforward way, let
us try to generalize (QR5). A first guess would be to consider disjoint sets
X1 , . . . , Xr ⊆ V , and then stipulate that the number of edges with one endpoint in
each of them is p|X1 | . . . |Xr | + o(nr ). (For simplicity of presentation, we assume
that v(Hn ) = n.) This property is indeed valid for every quasirandom sequence,
but it is strictly weaker than quasirandomness. It is not well-defined what the
“right” generalization is; we state one below, which is a version of a generalization
found by Gowers. Several other equivalent conditions are given by Kohayakawa,
Rödl and Skokan [2002].
Proposition 23.10. A sequence (Hn) of hypergraphs is quasirandom with density p if and only if for every (r−1)-graph $G_n$ on $V(H_n)$, the number of edges of $H_n$ that induce a complete subhypergraph in $G_n$ is $t(K_r^{r-1}, G_n)\, t(K_r^r, H_n) \binom{n}{r} + o(n^r)$.
In the case of simple graphs (r = 2), let Hn be a simple graph with edge density
p. The 1-graph Gn means simply a subset of V (Hn ), and K21 is just a 2-element
set. So the condition says that the number of edges of the graph Hn induced by
the set Gn is asymptotically
$$t(K_2^1, G_n)\, t(K_2^2, H_n) \binom{n}{2} = \Bigl(\frac{|G_n|}{n}\Bigr)^2\, \frac{2e(H_n)}{n^2}\, \binom{n}{2} \sim p \binom{|G_n|}{2},$$
and so we get condition (QR4). For general r, the condition can be rephrased as
follows: for a random r-set X ⊆ V , the events that X is complete in Gn and X is
an edge in Hn are asymptotically independent.
The last remark takes us to another complication.
Example 23.11. Let G(n, 1/2) be a random graph and let Tn denote the 3-graph
formed by the triangles in G(n, 1/2). Then Tn is a 3-graph with density 1/8, which
is random in some sense, but it is very different from the random 3-graph Hn on
[n] obtained by selecting every edge independently with probability 1/8. In fact,
the sequence (Hn ) is quasirandom with probability 1 (this is not hard to see), while
Tn has a very small intersection with every quasirandom 3-graph by Proposition
23.10. Also, $T_n$ has some special features: for example, no 4-set of nodes contains exactly 3 edges of $T_n$.
On the other hand, Tn is totally homogeneous. It has no special global struc-
ture; more concretely: on any two disjoint k-sets we see independent copies of the
same random hypergraph. If we want to generalize the Regularity Lemma, it has to
reflect the difference between Tn and Hn , and similarly for the generalization of the
notion of graphons. Which of these sequences should tend to a constant function?
We show how to overcome this difficulty, starting with the construction of the
limit object. We say that a sequence of r-graphs (Hn ) is convergent, if v(Hn ) → ∞
and t(F, Hn ) has a limit as n → ∞ for every r-graph F . Let t(F ) denote this
limit. How to represent this limit function, in other words, what is the hypergraph
analogue of a graphon? The natural guess would be a symmetric $r$-variable function $W : [0,1]^r \to [0,1]$, which would represent the limit by
$t(F, W) = \int_{[0,1]^{V(F)}} \prod_{\{i_1, \dots, i_r\} \in E(F)} W(x_{i_1}, \dots, x_{i_r})\, dx.$
The example of the hypergraphs $H_n$ and $T_n$ above shows that this cannot be right.
The only reasonable candidate for their limit object would be the function W ≡ 1/8,
which represents correctly the limiting densities for the sequence Hn , but not for
the sequence Tn . We could make life even more complicated, and consider the
intersection Hn ∩ Tn , which is a random 3-graph with expected density 1/64, and
the limiting densities are even more complicated.
For r > 3, one could construct a whole zoo of homogeneous random hyper-
graphs, generalizing the construction of Hn and Tn . After several steps of general-
ization, one arrives at the following: we generate a random coloring of $K_n^j$ for every $0 \le j \le r$ (with any number of colors). To decide whether an $r$-subset $X \subseteq [n]$
should be an edge, we look at the colors of its subsets, and see if this coloring be-
longs to some prescribed family of colorings of 2X . (We assume that the prescribed
family is invariant under permutations of X.)
While this example warns us of complications, it also suggests a way out: we
describe the limit not in the $r$-dimensional but in the $2^r$-dimensional space. In fact,
the limit object turns out to be a subset, rather than a function, which is a gain
(it is of course very little relative to the increase in the number of coordinates).
Consider the set $[0,1]^{2^{[r]}}$ (so we have a coordinate $x_I$ for every $I \subseteq [r]$; the coordinate for $\emptyset$ will play no role, we can think of it as 0). Let us note that the symmetric group $S_r$ acts on the power set $2^{[r]}$, and hence also on $[0,1]^{2^{[r]}}$. Let $U \subseteq [0,1]^{2^{[r]}}$ be a measurable set that is invariant under the action of $S_r$. We call such a set a hypergraphon.
For every hypergraphon U , we define the density of an r-graph F as follows.
We assign independent random variables $X_S$, uniform in $[0,1]$, to every subset $S \subseteq V(F)$ with $|S| \le r$. For every edge $A = \{a_1, \dots, a_r\} \in E(F)$ and every $I \subseteq [r]$, we denote by $A_I$ the subset $\{a_i : i \in I\}$, and we consider the point $X(A) \in [0,1]^{2^{[r]}}$ defined by $(X(A))_I = X_{A_I}$ (this depends on the ordering of $A$, but
this will not matter thanks to our symmetry assumption about $U$). Now we define
$t(F, U) = \mathbf{P}\big(X(A) \in U \text{ for all } A \in E(F)\big).$
To illuminate the meaning of this formula a little, consider the case r = 2.
Then we have $U \subseteq [0,1]^3$, where the three coordinates correspond to the sets $\{1\}$, $\{2\}$ and $\{1,2\}$ (as we remarked above, the empty set plays no role). For a graphon $W$, we define the set $U_W = \{(x_1, x_2, x_{12}) \in [0,1]^3 : x_{12} \le W(x_1, x_2)\}$. Then it is
easy to see that t(F, UW ) = t(F, W ) for any simple graph F .
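To make this construction concrete, here is a small Monte Carlo sanity check (not from the book) for the case $r = 2$: sampling one uniform variable per node and per edge and testing membership in $U_W$ recovers $t(F, W)$. The graphon $W(x, y) = xy$ and the sample size are arbitrary choices for illustration; for this $W$ and $F = K_3$, direct integration gives $t(K_3, W) = (1/3)^3 = 1/27$.

```python
import random

def t_triangle_via_hypergraphon(W, n_samples=200_000, seed=1):
    """Monte Carlo estimate of t(K3, U_W): sample one uniform variable
    per vertex and one per edge, and count how often the edge variable
    x_ij falls below W(x_i, x_j) for all three edges of the triangle."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        x1, x2, x3 = rng.random(), rng.random(), rng.random()
        if (rng.random() <= W(x1, x2) and
                rng.random() <= W(x1, x3) and
                rng.random() <= W(x2, x3)):
            hits += 1
    return hits / n_samples

# For W(x, y) = xy, the true value is 1/27 (about 0.037).
est = t_triangle_via_hypergraphon(lambda x, y: x * y)
```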
Elek and Szegedy [2012] prove the following.
Theorem 23.12. For every convergent sequence (Hn ) of r-graphs there is a hy-
pergraphon U such that t(F, Hn ) → t(F, U ) for every r-graph F .
The limit hypergraphon is essentially unique up to certain "structure preserving transformations", which are more difficult to define than in the case of graphs, so we do not go into the details. Elek and Szegedy [2012] give several applications of
Theorem 23.12. For a given hypergraphon U , they define U -random hypergraphs
and prove that they converge to U . They derive from it the Hypergraph Removal
Lemma due to Frankl and Rödl [2002], Gowers [2006], Ishigami [2006], Nagle, Rödl
and Schacht [2006] and Tao [2006a]. As a refreshing exception, the statement of
this lemma is a straightforward generalization of the Removal Lemma for graphs
(Lemma 11.64); the proof of Elek and Szegedy is similar to our second proof in
Section 11.8. They also derive the Hypergraph Regularity Lemma using Theorem
23.12, using a stepfunction approximation of hypergraphons.
This brings us to the Hypergraph Regularity Lemma, a very important but
also quite complicated statement. There are several essentially equivalent, but not
trivially equivalent forms, due to Frankl and Rödl [1992], Gowers [2006, 2007],
Rödl and Skokan [2004], Rödl and Schacht [2007a, 2007b]. Proving the appropriate
Counting Lemma for these versions is a further difficult issue, and I will not go into
it. But I must not leave this topic without stating at least one form, based on the
formulation of Elek and Szegedy [2012], which in fact generalizes the strong form
of the Regularity Lemma (Lemma 9.5).
We have to define what we mean by "regularizing" a hypergraph. For $\alpha, \beta > 0$ and $k \in \mathbb{N}$, we define an $(\alpha, \beta, k)$-regularization of an $r$-graph $H$ on $[n]$ as follows. For every $i \in [r]$, we partition the complete hypergraph $K_n^i$ into $i$-graphs $G_{i,1}, \dots, G_{i,k}$.
Let us think of the edges in Gi,j as colored with color j. This defines a partition
P of the edges of Knr , where two r-sets are in the same class if the colorings of
their subsets are isomorphic. The family {Gi,j : i ∈ [r], j ∈ [k]}, together with an
$r$-graph $G$ on $[n]$ will be called an $(\alpha, \beta, k)$-regularization of $H$ if
(a) every $i$-graph $G_{i,j}$ has quasirandomness at most $\alpha$,
(b) $G$ is the union of some of the classes of $\mathcal{P}$, and
(c) $|E(H) \triangle E(G)| \le \beta \binom{n}{r}$.
Now we can state one version of the Hypergraph Regularity Lemma.
Lemma 23.13 (Strong Hypergraph Regularity Lemma). For every $r \ge 2$ and every sequence $\varepsilon = (\varepsilon_0, \varepsilon_1, \dots)$ of positive numbers there is a positive integer $k_\varepsilon$ such that for every $r$-graph $H$ there is an integer $k \le k_\varepsilon$ such that $H$ has an $(\varepsilon_k, \varepsilon_0, k)$-regularization.
The main point is that to regularize H, we have to partition not only its node
set, but also the set of i-tuples for all i ≤ r. Just like in the graph case, we could
demand that the i-graphs Gi,j have almost the same number of edges for every
fixed i. Of course, the price we have to pay for stating a relatively compact version
is that it takes more work to apply it; but we don’t go in that direction.
The extension of the theory exposed in this book to hypergraphs is not com-
plete, and there is space for a lot of additional work. Just to mention a few loose ends, it seems that no good extension of the cut distance $\delta_\square$ to hypergraphs has been found (just as in the case of limit objects or the regularity lemma, the first natural guesses are not really useful). Another open question is to extend these re-
sults to nonuniform hypergraphs, with unbounded edge-size. The semidefiniteness
conditions for homomorphism functions can be extended to hypergraphs (see e.g.
Lovász and Schrijver [2008]), but perhaps this is just the first, “naive” extension.
One area of applications of these conditions is extremal graph theory. The work of
Razborov [2010] shows that generalizations of graph algebras and of the semidefi-
niteness conditions can be useful in extremal hypergraph theory. However, we have
seen that graph algebras can be defined in the setting of gluing along nodes and
also along edges, and this indicates that for hypergraphs a more general concept of
graph algebras may be useful.
23.4. Categories
The categorial way of looking at mathematical structures is quite prevalent
in many branches of mathematics. In graph theory, the use of categories (as a language and also as a guide for asking questions in a certain way) has been practiced mainly by the Prague school, and has led to many valuable results; see e.g. the
book by Hell and Nešetřil [2004].
One can go a step further and consider categories (with appropriate finiteness assumptions) as objects of combinatorial study in their own right. After all, categories are rather natural generalizations of posets, and there is a huge literature
on the combinatorics of posets. However, surprisingly little has happened in the
direction of a combinatorial theory of categories; some early work of Isbell [1991],
Lovász [1972] and Pultr [1973], and the more recent work of Kimoto [2003a, 2003b]
can be cited.
Working with graph homomorphisms, we have found not only that the cate-
gorial language suggests very good questions and a very fruitful way of looking at
our problems, but also that several of the basic results about graph homomorphisms
and regularity can be extended to categories in a very natural way. The goal of this
section is to describe these generalizations, and thereby encourage a combinatorial
study of categories. (Appendix A.8 summarizes some background.)
(a) If both a and b have at least one morphism into c, then a and b are isomor-
phic.
(b) There exists an isomorphism from a × c to b × c that commutes with the
projections of a × c and b × c to c.
So if there is any isomorphism σ in Figure 23.3, then there is one for which the
diagram commutes.
Figure 23.3.
The conditions are very similar to those in Theorem 5.54, except that there
the graphs cannot have loops and the matrices are indexed by monomorphisms
only. As a consequence, the characterization concerns homomorphism numbers
into weighted graphs, which has not been extended to categories so far.
The proof of Theorem 23.16 is built on similar ideas as the proof of Theorem
5.54 in Chapter 6, using algebras associated with the category. Since it is instructive
how such algebras can be defined, we describe their construction below; for the
details of the proof, we refer to the paper of Lovász and Schrijver [2010].
For two objects a and b in a locally finite category K, a formal linear com-
bination (with real coefficients) of morphisms in K(a, b) will be called a quantum
morphism. Quantum morphisms between a and b form a finite dimensional linear
space $Q(a, b)$. Let
$x = \sum_{\varphi \in K(a,b)} x_\varphi \varphi \in Q(a, b)$ and $y = \sum_{\psi \in K(b,c)} y_\psi \psi \in Q(b, c);$
then we define
$xy = \sum_{\varphi \in K(a,b)} \sum_{\psi \in K(b,c)} x_\varphi y_\psi\, \varphi\psi \in Q(a, c).$
With this definition, quantum morphisms form a category Q on the same set of
objects as K. (Of course, Q is not locally finite any more, but it is locally finite
dimensional.)
We can be more ambitious and take formal linear combinations of morphisms in $K_a^{\mathrm{out}}$ (for a fixed object $a$) to get a linear space $Q_a^{\mathrm{out}}$. This space will be infinite dimensional in general, but it has interesting finite dimensional factors. For each object $a$, the pushout operation $\wedge$ defines a semigroup on $K_a^{\mathrm{out}}$. Let $Q_a^{\mathrm{out}}$ denote its semigroup algebra of all formal finite linear combinations of morphisms in $K_a^{\mathrm{out}}$. So $Q_a^{\mathrm{out}} = \bigoplus_b Q(a, b)$.
This operation extends linearly to define $xy^*$ for $x \in Q(a, b)$ and $y \in Q(c, b)$. It is not hard to check that $x(zy)^* = (xy^*)z^*$, and $\langle x, yz^* \rangle = \langle xz, y \rangle$.
For every quantum morphism $x = \sum_\varphi x_\varphi \varphi \in Q(a, b)$ and every object $c$, we
define the $c$-norm of $x$ by
$\|x\|_c = \max_{\beta \in K(b,c)} \frac{\|x\beta\|_\infty}{|K(a, b)|}.$
This norm generalizes the cut norm: if $a = K_2$ and $c = K_2^\circ$, then a symmetric quantum morphism $x \in Q(a, b)$ is a weighting of the edges of $b$, and it is not hard to see that $\|x\|_\square / 2 \le \|x\|_c \le \|x\|_\square$.
Let $c^m$ denote the $m$-th direct power of the object $c$. The first inequality in the following lemma generalizes the Frieze–Kannan Weak Regularity Lemma 9.3, while the second implies the Original Regularity Lemma of Szemerédi 9.2.
Lemma 23.20. Let $K$ be a locally finite category having finite direct products. Let $a$, $b$ and $c$ be three objects in $K$, and let $m \ge 1$. Then for every $x \in Q(a, b)$ there exists a morphism $\varphi \in K(b, c^m)$ and a quantum morphism $y \in Q(a, c^m)$ such that
$\|x - y\varphi^*\|_c \le \frac{1}{\sqrt{m}}\, \|x\|_2$
and
$\|x - y\varphi^*\|_{c^{2m}} \le \frac{1}{\sqrt{\log^* m}}\, \|x\|_2.$
The Weak Regularity Lemma is obtained, as described above, by taking $a = K_2$ and $c = K_2^\circ$ and applying the first bound. Note that a morphism in $K(b, c^m)$ corresponds to a partition of $V(G)$ into $2^m$ classes. The Original Regularity Lemma
can be derived from the second bound similarly. Strong versions can be generalized
as well, but for the details we refer to Lovász [Notes].
There are many unsolved questions here: can the Counting Lemma be generalized to categories? Can the notions of convergence and limit objects be formulated in an interesting way? Could these results shed new light on hypergraph limits and regularity lemmas? Or perhaps even on sparse regularity lemmas?
Exercise 23.21. Let K be a locally finite category, and let c be an object. Prove
that every monomorphism in K(c, c) is an isomorphism.
Exercise 23.22. Let K be a locally finite category, and let c, d be two objects.
Suppose that there are monomorphisms in K(c, d) and in K(d, c). Prove that c
and d are isomorphic.
Exercise 23.23. Let K be a locally finite category, and let c and d be two objects.
For any two morphisms α ∈ K(a, a′ ) and β ∈ K(b, b′ ), let Nα,β denote the number
of 4-tuples of morphisms (φ, ψ, µ, ν) (φ ∈ K(c, a), ψ ∈ K(c, b), µ ∈ K(a′ , d), ν ∈
K(b′ , d)) such that φαµ = ψβν. Prove that the matrix N = (Nα,β ), where α and
β range over all morphisms of the category, is positive semidefinite.
Exercise 23.24. Let a and b be two objects in a locally finite category. Suppose
that the direct powers a × a and b × b exist and are isomorphic. Prove that a and
b are isomorphic.
Exercise 23.25. Let a, b, c, d be four objects in a locally finite category K such
that the direct products a × c, b × c, a × d and b × d exist, a × c and b × c are
isomorphic, and d has at least one morphism into c. Prove that a × d and b × d
are isomorphic.
$\{a_1, \dots, a_k\} \subseteq [n]$, we can define a permutation $\pi[A] \in S_k$ by letting $\pi[A]_i < \pi[A]_j$ iff $\pi_{a_i} < \pi_{a_j}$. For a permutation $\tau \in S_k$, let $\Lambda(\tau, \pi)$ denote the number of sets $A$ with $\pi[A] = \tau$, and define the density of $\tau$ in $\pi$ by $t(\tau, \pi) = \Lambda(\tau, \pi)/\binom{n}{k}$. A sequence
of permutations π1 , π2 , . . . (on larger and larger sets) is convergent, if for every
permutation τ , the number t(τ, πn ) tends to a limit as n → ∞. Every convergent
permutation sequence has a limit object in the form of a coupling measure on [0, 1]2 ,
which is uniquely determined. Král and Pikhurko [2012] have used this machinery
of limit objects to prove a conjecture of Graham on permutations.
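As a small computational illustration (not from the book), the pattern density $t(\tau, \pi)$ can be evaluated by brute force over all $k$-subsets of positions; the function names below are my own.

```python
from itertools import combinations
from math import comb

def pattern(values):
    """Relative order of a sequence of distinct values:
    pattern((5, 2, 8)) == (2, 1, 3)."""
    ranks = sorted(values)
    return tuple(ranks.index(v) + 1 for v in values)

def density(tau, pi):
    """t(tau, pi) = Lambda(tau, pi) / C(n, k): the fraction of k-subsets
    of positions of pi inducing the pattern tau."""
    n, k = len(pi), len(tau)
    hits = sum(1 for A in combinations(range(n), k)
               if pattern(tuple(pi[i] for i in A)) == tau)
    return hits / comb(n, k)
```

For example, in $\pi = 2143$ exactly 2 of the 6 pairs of positions form an inversion, so the density of the pattern $21$ is $1/3$.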
I have already mentioned the limit theory of metric spaces due to Gromov
[1999]. While developed with quite different applications in mind, this turns out to
be closely related to our theory of graph limits. Gromov considers metric spaces
endowed with a probability measure, and defines distance, convergence and limit
notions for them. A simple graph G can be considered as a special case, where
the distance of two adjacent nodes is 1/2, the distance of two nonadjacent nodes
is 1, and the probability distribution on the nodes is uniform. Under this corre-
spondence, our notion of graph convergence is a special case of Gromov’s “sample
convergence” of metric spaces. Vershik [2002, 2004] considers random metric spaces
on countable sets, and defines and proves their universality. He also characterizes
isomorphism of metric spaces with measures in terms of sampling, analogously to
Theorem 13.10. In a recent paper, Elek [2012b] explores this connection and shows
how Gromov’s notions imply results about graph convergence, and also how results
about graph limits inspire answers to some questions about metric spaces. Perhaps
Gromov’s theory can be applied to graph sequences that are not dense, using the
standard distance between nodes in the graph.
One of the earliest limit theories is John von Neumann’s theory of continuous
geometries. The idea here is that if we look at higher and higher dimensional vector
spaces over (say) the real field, then the obvious notion of their limit is the Hilbert
space. But, say, we are interested in the behavior of subspaces whose dimension
is proportional to the dimension of the whole space. Going to the Hilbert space,
this condition becomes meaningless. Von Neumann constructed a limit object, called
a continuous geometry, in which the “dimensions” of subspaces are real numbers
between 0 and 1. This construction can be extended to certain geometric lattices
(Björner and Lovász [1987]), but its connection with the theory in this book has
not been explored.
Perhaps most interesting from the point of view of quasirandomness and limits
are sequences of integers, due to their role in number theory. (After all, Szemerédi’s
Regularity Lemma was inspired by his solution of the Erdős–Turán problem on
arithmetic progressions in dense sequences of integers.) Often sequences are con-
sidered modulo n; this gives a finite group structure to work with, while one does
not lose much in generality. Ever since the solution of the Erdős–Turán problem
for 3-term arithmetic progressions by Roth [1952], through the general solution by
Szemerédi [1975], through the work of Gowers [2001] on “Gowers norms”, to the
celebrated result of Green and Tao [2008] on arithmetic progressions of primes, a
central issue has been to define and measure how random-like a set of integers is.
I will not go into this large literature; Tao [2006c] and Kra [2005] give accessible
accounts of it. What I want to point out is the exciting asymptotic theory of struc-
tures consisting of an abelian group together with a subset of its elements, and
more generally, abelian groups with a function defined on them. There has been a
lot of parallel developments in this area, most notably the work of Green, Tao and
Ziegler [2011] and of Szegedy [2012a]. Not surprisingly, the latter is closer to the
point of view taken in this book, and develops a theory of limit objects of functions
on abelian groups, which is full of surprises and also of powerful results. (For
example, to describe the limits of abelian groups, non-abelian groups are needed!)
The theory has connections with number theory, ergodic theory, and higher-order
Fourier analysis. This explains why I cannot go into the details, and can only refer
to the papers.
APPENDIX A
Appendix
This is perhaps easier to understand in a matrix algebra setting. Let $\mathcal{M}(L)$ denote the set of $L \times L$ matrices $A$ in which $A_{xy} = 0$ for any two lattice elements $x \not\le y$. It is easy to see that $\mathcal{M}(L)$ is closed under addition, matrix multiplication and matrix
inverse (if an inverse exists), and so it is a matrix algebra. One special matrix of
importance is the zeta matrix Z ∈ M(L) defined by Zxy = 1(x ≤ y). Clearly Z is
invertible, and M = Z −1 is a matrix with integer entries, called the Möbius matrix.
The entries of M give the Möbius function: Mxy = µ(x, y).
For every function $f : L \to \mathbb{C}$, we define its (upper) summation function $g(x) = \sum_{y \ge x} f(y)$. From $g$, we can recover the function $f$ by the formula $f(x) = \sum_{y \ge x} \mu(x, y) g(y)$. This is again better seen in matrix form: we consider $f$ and $g$ as vectors in $\mathbb{C}^L$; then $g = Zf$, which is equivalent to $f = Z^{-1}g = Mg$. Of course, we can turn the lattice upside down, and derive similar formulas for the lower summation.
The following simple but very useful matrix identity is due to Lindström [1969] and Wilf [1968]. Let $f : L \to \mathbb{R}$ be any function, and let $A_f$ be the $L \times L$ matrix with $(A_f)_{xy} = f(x \vee y)$. Then
(A.1) $A_f = Z\,\mathrm{diag}(Mf)\,Z^T.$
An important consequence of this identity states that Af is positive semidefinite if
and only if the Möbius inverse of f is nonnegative.
Example A.1. If L is the lattice of subsets of a finite set S, then µ(X, Y ) =
(−1)|Y \X| for all X ⊆ Y ⊆ S. Möbius inversion is equivalent to the inclusion-
exclusion formula in this case.
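A quick computational sanity check (not from the book): on the subset lattice of a 3-element set, computing the Möbius function from its defining recursion reproduces $\mu(X, Y) = (-1)^{|Y \setminus X|}$. Subsets are encoded as bitmasks, a choice made only for this illustration.

```python
from functools import lru_cache

def subsets():
    """Subsets of a 3-element set, encoded as bitmasks 0..7."""
    return range(8)

def leq(x, y):
    """x <= y in the subset lattice means x is a subset of y."""
    return (x & y) == x

@lru_cache(maxsize=None)
def mobius(x, y):
    """Moebius function via the defining recursion:
    mu(x, x) = 1 and sum_{x <= z <= y} mu(x, z) = 0 for x < y."""
    if x == y:
        return 1
    return -sum(mobius(x, z) for z in subsets()
                if leq(x, z) and leq(z, y) and z != y)

def popcount(x):
    return bin(x).count("1")
```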
Example A.2. Consider the lattice $\Pi_n$ of partitions of the finite set $[n]$, where the bottom element is the discrete partition $P_0$ (with $n$ classes), the top element is the indiscrete partition $P_1$ (with one class), and $P \le Q$ means that $P$ refines $Q$. The Möbius function of this lattice is given by the Frucht–Rota–Schützenberger Formula
(A.2) $\mu_P = \mu(0, P) = (-1)^{n - |P|} \prod_{S \in P} (|S| - 1)!,$
where |P | denotes the number of classes in the partition P. (This easily implies a
formula for µ(Q, P ), but we won’t need it.)
For the partition lattice, we need some simple identities: for every $P \in \Pi_n$,
(A.3) $\sum_{R \ge P} (x)_{|R|} = x^{|P|}.$
By Möbius inversion,
(A.4) $\sum_P \mu_P\, x^{|P|} = (x)_n,$
and from the Lindström–Wilf Formula,
(A.5) $\sum_{P,Q} \mu_P \mu_Q\, x^{|P \vee Q|} = (x)_n.$
See Van Lint and Wilson [1992] for more on the Möbius function of a lattice.
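Identity (A.4) is easy to verify by machine for small $n$; the following sketch (not from the book, with hypothetical helper names) enumerates all set partitions of $[4]$, evaluates $\mu_P$ by formula (A.2), and checks $\sum_P \mu_P\, x^{|P|} = (x)_4$ for several integer values of $x$.

```python
from math import factorial, prod

def partitions(elems):
    """All set partitions of a list, as lists of blocks."""
    if not elems:
        yield []
        return
    first, rest = elems[0], elems[1:]
    for p in partitions(rest):
        for i in range(len(p)):
            yield p[:i] + [[first] + p[i]] + p[i + 1:]
        yield [[first]] + p

def mu_P(p, n):
    """Frucht-Rota-Schutzenberger formula (A.2): mu(0, P)."""
    return (-1) ** (n - len(p)) * prod(factorial(len(S) - 1) for S in p)

def falling(x, n):
    """Falling factorial (x)_n = x(x-1)...(x-n+1)."""
    return prod(x - i for i in range(n))

n = 4
def lhs(x):
    """Left-hand side of (A.4) for the partition lattice of [4]."""
    return sum(mu_P(p, n) * x ** len(p) for p in partitions(list(range(n))))
```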
This definition does not in any way indicate the many uses this polynomial has.
The recurrence relation
(A.7) $\mathrm{tut}(G; x, y) = \mathrm{tut}(G - e; x, y) + \mathrm{tut}(G/e; x, y),$
where $e \in E(G)$ is any edge that is not a cut-edge or a loop, says much more (here $G/e$ denotes the graph obtained from $G$ by contracting $e$, i.e., deleting one copy of $e$ and identifying its endpoints). If $G$ has $i$ loops and $j$ cut-edges, and no other edges, then $\mathrm{tut}(G; x, y) = x^j y^i$. The Tutte polynomial is multiplicative over connected components. There are many graph invariants that satisfy recurrence (A.7) (or some very similar recurrence), and these can be expressed as substitutions into the Tutte polynomial (or some slight modification of it).
One often uses the following version of the Tutte polynomial, sometimes called the cluster expansion polynomial:
(A.8) $\mathrm{cep}(G; u, v) = \sum_{A \subseteq E(G)} u^{c(A)} v^{|A|}.$
This differs from the usual Tutte polynomial $T(x, y)$ on two counts: first, instead of the variables $x$ and $y$, we use $u = (x-1)(y-1)$ and $v = y - 1$; second, we scale by $u^{c(E)} v^{|V|}$.
The cluster expansion polynomial satisfies the following identities: (a) $\mathrm{cep}(G; u, v) = v\,\mathrm{cep}(G/e; u, v) + \mathrm{cep}(G - e; u, v)$ for all edges $e$ that are not loops; (b) $\mathrm{cep}(G; u, v) = u\,\mathrm{cep}(G - i; u, v)$ if $i$ is an isolated node; (c) $\mathrm{cep}(G; u, v) = u(1+v)^{e(G)}$ if $G$ is a graph consisting of a single node. These relations determine the value of the polynomial for any substitution. (See e.g. Welsh [1993] for more on the Tutte polynomial.)
Chromatic polynomial. Let $G = (V, E)$ be a multigraph with $n$ nodes. For every nonnegative integer $q$, we denote by $\mathrm{chr}(G, q)$ the number of $q$-colorings of $G$ (in the usual sense, where adjacent nodes must be colored differently). Clearly $\mathrm{chr}(G, q)$ does not depend on the multiplicities of edges (as long as these multiplicities are positive), and $\mathrm{chr}(G, q) = 0$ if $G$ has a loop.
Let $\mathrm{chr}_0(G, k)$ denote the number of $k$-colorings of $G$ in which all colors occur. Then clearly
(A.9) $\mathrm{chr}(G, q) = \sum_{k=0}^{v(G)} \mathrm{chr}_0(G, k) \binom{q}{k}.$
This implies that $\mathrm{chr}(G, q)$ is a polynomial in $q$ with leading term $q^n$ and constant term 0, which is called the chromatic polynomial of $G$. One can evaluate this polynomial for non-integral values of $q$, when it has no direct combinatorial meaning. We define $\mathrm{chr}(K_0, q) = 1$.
It is easy to see that if $q$ is a positive integer, then for every $e \in E(G)$,
(A.10) $\mathrm{chr}(G, q) = \mathrm{chr}(G - e, q) - \mathrm{chr}(G/e, q).$
Since this equation for polynomials holds for infinitely many values of $q$, it holds identically. If $i$ is an isolated node of $G$, then we have $\mathrm{chr}(G, q) = q\,\mathrm{chr}(G - i, q)$.
From these recurrence relations a number of properties of the chromatic polyno-
mial are easily proved, for example, that its coefficients alternate in sign. Most
importantly, they imply that the chromatic polynomial is a special substitution of
the cluster expansion polynomial: chr(G, q) = cep(G; q, −1). From formula (A.8)
we get
(A.11) $\mathrm{chr}(G, q) = \sum_{A \subseteq E(G)} (-1)^{|A|}\, q^{c(A)}.$
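A direct, exponential-time evaluation of the subset expansion (A.8) makes the substitution $\mathrm{chr}(G, q) = \mathrm{cep}(G; q, -1)$ easy to test on small graphs; this sketch (not from the book) checks it against $\mathrm{chr}(K_3, q) = q(q-1)(q-2)$.

```python
from itertools import combinations

def components(n, edges):
    """Number of connected components of ([n], edges), via union-find."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
    return len({find(x) for x in range(n)})

def cep(n, edges, u, v):
    """Cluster expansion polynomial (A.8): sum over edge subsets A of
    u^{c(A)} * v^{|A|}, where c(A) counts components of ([n], A)."""
    total = 0
    for k in range(len(edges) + 1):
        for A in combinations(edges, k):
            total += u ** components(n, list(A)) * v ** k
    return total

# Triangle K3: chr(K3, q) = q(q-1)(q-2) should equal cep(K3; q, -1).
triangle = [(0, 1), (1, 2), (0, 2)]
```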
The coefficient of the linear term in the chromatic polynomial is called the chromatic
invariant of the graph. It will be convenient to consider this quantity with an
adjusted sign:
$\mathrm{cri}(G) = \sum_{G'} (-1)^{e(G') - v(G) + 1},$
where $G'$ ranges through all connected spanning subgraphs of $G$. It follows from
(A.10) that if G is a simple graph, then for every e ∈ E(G),
(A.12) cri(G) = cri(G − e) + cri(G/e).
This implies by induction that cri(G) > 0 if G is connected and cri(G) = 0 if G is
disconnected.
Spanning trees. Let tree(G) denote the number of spanning trees in the graph
G. This parameter has played an important role in the development of algebraic
graph theory; formulas for its computation go back to the work of Kirchhoff in the
mid-19th century. The number of spanning trees satisfies the recurrence relation
(A.13) tree(G) = tree(G − e) + tree(G/e)
for every edge that is not a loop. It is best to define tree(K1 ) = 1 and tree(K0 ) =
0. One gets by direct substitution in (A.6) that for every connected graph G,
tree(G) = tut(G; 1, 1).
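The recurrence (A.13) translates directly into a (very inefficient, but instructive) program; the sketch below (not from the book) counts spanning trees of a multigraph by deletion and contraction.

```python
def tree_count(n, edges):
    """Number of spanning trees via the deletion-contraction
    recurrence (A.13): tree(G) = tree(G - e) + tree(G / e).
    Nodes are 0..n-1; edges is a list of pairs (multigraph allowed)."""
    edges = [e for e in edges if e[0] != e[1]]   # loops lie in no tree
    if n == 1:
        return 1                                  # tree(K1) = 1
    if not edges:
        return 0                                  # >= 2 nodes, disconnected
    (a, b), rest = edges[0], edges[1:]
    deleted = tree_count(n, rest)
    # contract e = (a, b): merge node b into node a, relabel nodes above b
    a2 = a - 1 if a > b else a
    contracted = []
    for (x, y) in rest:
        x = a2 if x == b else (x - 1 if x > b else x)
        y = a2 if y == b else (y - 1 if y > b else y)
        contracted.append((x, y))
    return deleted + tree_count(n - 1, contracted)
```

For example, the triangle has 3 spanning trees and $K_4$ has $4^2 = 16$, in accordance with Cayley's formula.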
There are many other expressions in the literature for $\mathrm{tree}(G)$. Perhaps the best known is Kirchhoff's Formula (also called the Matrix Tree Theorem), saying that $\mathrm{tree}(G)$ is equal to any cofactor of the Laplacian $L_G = D_G - A_G$ (here $A_G$ is the adjacency matrix of $G$ and $D_G$ is the diagonal matrix composed of the degrees).
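Kirchhoff's Formula is also easy to test by machine; this sketch (not from the book) builds $L_G = D_G - A_G$, deletes the first row and column, and evaluates the cofactor with exact rational arithmetic.

```python
from fractions import Fraction

def det(M):
    """Determinant by fraction-exact Gaussian elimination."""
    M = [[Fraction(x) for x in row] for row in M]
    n, sign, d = len(M), 1, Fraction(1)
    for i in range(n):
        piv = next((r for r in range(i, n) if M[r][i] != 0), None)
        if piv is None:
            return Fraction(0)
        if piv != i:
            M[i], M[piv] = M[piv], M[i]
            sign = -sign
        d *= M[i][i]
        for r in range(i + 1, n):
            f = M[r][i] / M[i][i]
            for c in range(i, n):
                M[r][c] -= f * M[i][c]
    return sign * d

def tree_count_kirchhoff(n, edges):
    """Matrix Tree Theorem: tree(G) is any cofactor of L = D - A."""
    L = [[0] * n for _ in range(n)]
    for a, b in edges:
        L[a][a] += 1
        L[b][b] += 1
        L[a][b] -= 1
        L[b][a] -= 1
    minor = [row[1:] for row in L[1:]]   # delete row 0 and column 0
    return int(det(minor))
```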
There are many useful inequalities for $\mathrm{tree}(G)$, of which we mention two: the trivial bound
(A.14) $\mathrm{tree}(G) \le \prod_u d_G(u),$
and the relation with the chromatic invariant, which follows easily by induction from the recurrences (A.13) and (A.12):
(A.15) $0 \le \mathrm{cri}(G) \le \mathrm{tree}(G).$
Nowhere zero flows. Let $\mathrm{flo}(G, q)$ denote the number of nowhere-zero $q$-flows. To be precise, we fix an orientation of the edges of $G$, and count maps $f : \vec{E}(G) \to \mathbb{Z}_q \setminus \{0\}$ such that at every node, the sum of flow values on the entering edges is equal to the sum of flow values on the leaving edges. This number is given by $|\mathrm{tut}(G; 0, 1 - q)|$.
is Borel. A standard probability space (with small variations, also called a Lusin,
Lebesgue or Rokhlin space) is the completion of a Borel probability space (i.e., we
add all subsets of sets of measure 0 to the sigma-algebra). Standard probability
spaces have many useful properties, some of which will be mentioned below; in a
sense, they behave as you would expect them to behave.
In this sense Borel (or standard) spaces are quite special. On the other hand,
they are general enough so that we can restrict our attention to them; this is due
to the following fact:
Proposition A.3. Every probability space on a countably generated separating
sigma-algebra can be embedded into a Borel space in the sense that it is isomor-
phic up to nullsets to the restriction of a Borel space to a subset with outer measure
1.
A.3.2. Measure preserving maps. Let $(\Omega_i, \mathcal{A}_i, \pi_i)$ ($i = 1, 2$) be probability spaces. A map $\varphi : (\Omega_1, \mathcal{A}_1, \pi_1) \to (\Omega_2, \mathcal{A}_2, \pi_2)$ is measure preserving if $\varphi^{-1}(A) \in \mathcal{A}_1$ for every $A \in \mathcal{A}_2$, and $\pi_1\big(\varphi^{-1}(A)\big) = \pi_2(A)$. (So the name is a bit misleading, because it is $\varphi^{-1}$ rather than $\varphi$ that preserves measure.) A measure preserving map is not necessarily bijective; for example, the map $[0,1] \to [0,1]$ defined by $x \mapsto 2x \bmod 1$ is measure preserving. We say that a measure preserving map $\varphi$ is invertible if it is bijective and $\varphi^{-1}$ is also measure preserving.
If $\varphi : (\Omega_1, \mathcal{A}_1, \pi_1) \to (\Omega_2, \mathcal{A}_2, \pi_2)$ is measure preserving, then for every integrable function $f : (\Omega_2, \mathcal{A}_2, \pi_2) \to \mathbb{R}$ we have
(A.16) $\int_{\Omega_1} f\big(\varphi(x)\big)\, d\pi_1(x) = \int_{\Omega_2} f(x)\, d\pi_2(x).$
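A quick numerical illustration of (A.16) (not from the book): take $\varphi(x) = 2x \bmod 1$ and $f(x) = x^2$; both sides of (A.16) then equal $\int_0^1 x^2\,dx = 1/3$. The midpoint-rule integrator is an arbitrary choice for the sketch.

```python
def doubling(x):
    """The measure preserving map x -> 2x mod 1 on [0, 1)."""
    return (2.0 * x) % 1.0

def integrate(g, n=100_000):
    """Midpoint-rule approximation of the integral of g over [0, 1)."""
    return sum(g((i + 0.5) / n) for i in range(n)) / n

f = lambda x: x * x
lhs = integrate(lambda x: f(doubling(x)))   # integral of f(phi(x))
rhs = integrate(f)                          # integral of f; true value 1/3
```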
Let S [0,1] denote the semigroup of measure preserving maps [0, 1] → [0, 1], and
let S[0,1] be the group of invertible measure preserving maps [0, 1] → [0, 1].
One of the most important properties of standard probability spaces is that
under mild conditions, their measure preserving images are also standard.
Proposition A.4. Let (Ω1 , A1 , π1 ) be a standard probability space and let
(Ω2 , A2 , π2 ) be another probability space where A2 has a countable subset sepa-
rating any two points of Ω2 . Let φ : Ω1 → Ω2 be a measure preserving map.
Then (Ω2 , A2 , π2 ) is standard, and Ω′2 = Ω2 \ φ(Ω1 ) has measure 0. Furthermore,
if φ is bijective, then φ−1 is an isomorphism (Ω′2 , A2 |Ω′2 , π2 |Ω′2 ) → (Ω1 , A1 , π1 ). In
particular, φ−1 is also measure preserving.
Remark A.5. It is usually a matter of taste or convenience whether we decide
to work on a complete space or on a countably generated space. One tends to be
sloppy about this, and just say, for example, that the underlying probability space
is [0, 1], without specifying whether we mean the sigma-algebra of Borel sets or of
Lebesgue measurable sets.
Often, one implicitly assumes that the Borel sigma algebra is defined as the
set of Borel sets in a Polish space, and uses topological notions like open sets or
continuous functions to define measure theoretic notions. This is sometimes un-
avoidable (see e.g. the definition of weak convergence below), but the same Borel
sigma-algebra can be defined by very different topological spaces, and this is im-
portant in some cases even in this book. I will use this topological representation
only where it is necessary.
438 A. APPENDIX
A.3.3. The space of measures. Let T be a topological space, and let P(T )
denote the set of probability measures on the Borel subsets of T . We say that a
sequence of measures $\mu_1, \mu_2, \dots \in \mathcal{P}(T)$ converges weakly to a probability measure $\mu \in \mathcal{P}(T)$ if
$\int_T f\, d\mu_n \to \int_T f\, d\mu \qquad (n \to \infty)$
for every continuous bounded function $f : T \to \mathbb{R}$. Most often we need this notion in the case when $T$ is a compact metric space, so we don't have to assume the boundedness of $f$. This notion of convergence defines a topology on $\mathcal{P}(T)$, which
we call the topology of weak convergence.
By Prokhorov's Theorem (see e.g. Billingsley [1999]; this is not the most general form), for a compact metric space $K$, the space $\mathcal{P}(K)$ is compact in the topology of weak convergence, and also metrizable. (One can describe explicit metrizations, like the Lévy–Prokhorov metric, but we don't need them.)
There is an important warning about weak convergence: it is not a purely
measure theoretic notion, but topological. In other words, we can have a sequence
of measures on a Borel sigma-algebra (Ω, B) that is weakly convergent if we put one
topology on Ω with the given Borel sets, but not convergent if we put another such
topology on Ω. Sometimes we play with this, and change the topology (without
changing its Borel sets) to suit our needs.
applied to the partial sums of any sequence of independent random variables with
finite expectations.
Many applications in combinatorics use martingales through the following con-
struction.
Example A.9 (Doob's Martingale). Let $(\Omega, \mathcal{A}, \pi)$ be a probability space and let $f : \Omega^n \to \mathbb{R}$ be an integrable function. Let $Y_1, \dots, Y_n$ be independent random elements of $\Omega$ from the distribution $\pi$, and let $X_k = \mathbf{E}\big(f(Y_1, \dots, Y_n) \mid Y_1, \dots, Y_k\big)$. Then $(X_1, \dots, X_n)$ is a martingale.
Example A.10. Let f : [0, 1] → R be an integrable function, and let P1 , P2 , . . .
be a sequence of partitions of [0, 1] into a finite number of measurable parts such
that Pn+1 is a refinement of Pn . Let Y ∈ [0, 1] be a uniform random point, and
consider the sequence Xk = fPk (Y ). Then (X1 , X2 , . . . ) is a martingale. Instead of
[0, 1], we could of course consider any probability space, for example, [0, 1]2 , which
shows the connection of martingales with the stepping operator.
There are (at least) three theorems on martingales that are relevant for com-
binatorial applications; these play an important role in our book as well.
Let (X0 , X1 , . . . ) be a sequence of random variables. A random variable T with
nonnegative integral values is called a stopping time (for the sequence (X0 , X1 , . . . ))
if for every k ≥ 0, the event T = k, conditioned on X0 , . . . , Xk , is independent of
the variables Xk+1 , Xk+2 , . . . (In computer science, this is often called a stopping
rule: we decide whether we want to stop after k steps depending on the values of
the variables we have seen before, possibly using some new independent coin flips.)
The Martingale Stopping Theorem (a.k.a. Optional Stopping Theorem) has
many versions, of which we state one:
Theorem A.11. Let (X0 , X1 , . . . ) be a supermartingale for which |Xm+1 − Xm |
is bounded (uniformly for all m), and let T be a stopping time for which E(T ) is
finite. Then E(XT ) ≤ X0 .
For a martingale, we have equality in the conclusion, and for a submartingale,
we have the reverse inequality in the conclusion.
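As a concrete check (a sketch of our own; the barriers are arbitrary), consider a
symmetric ±1 random walk started at X0 = 0 and stopped at the first hit of −a or
+b. The increments are bounded and E(T ) = ab is finite, so the theorem (with
equality, since the walk is a martingale) predicts E(XT ) = 0:

```python
from fractions import Fraction

def expected_stopped_value(a, b):
    """E(X_T) for a symmetric +/-1 walk with X_0 = 0, stopped at
    T = first hitting time of -a or +b.  h(x) = P(hit +b before -a)
    is harmonic: h(x) = (h(x-1) + h(x+1))/2 with h(-a) = 0, h(b) = 1,
    which forces h(-a+k) = k*t; h(b) = 1 then gives t = 1/(a+b)."""
    p = Fraction(a, a + b)          # h(0) = a/(a+b)
    return b * p + (-a) * (1 - p)   # exact expectation of X_T

# Bounded increments (|X_{m+1} - X_m| = 1) and finite E(T) = a*b,
# so the Martingale Stopping Theorem gives E(X_T) = X_0 = 0.
assert expected_stopped_value(3, 5) == 0
assert expected_stopped_value(1, 7) == 0
```

The hitting probability is computed exactly with rational arithmetic, so the
identity E(XT ) = X0 holds on the nose rather than up to simulation error.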
The following fact is called the Martingale Convergence Theorem (again, we
don’t state it in its most general form).
Theorem A.12. Let (X1 , X2 , . . . ) be a martingale such that supn E(|Xn |) < ∞.
Then (X1 , X2 , . . . ) is convergent with probability 1.
Applying this theorem to the martingale in Example A.10, we get that if f is
integrable, then the functions fPk tend to a limit almost everywhere. This limit
may not be the function f itself, but it is equal to f almost everywhere if any two
points of [0, 1] are separated by one of the partitions Pn (cf. also Proposition 9.8).
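For a concrete instance of Example A.10 and Theorem A.12 (our own
illustration), take f (x) = x² and the dyadic partitions of [0, 1]; both the
martingale property and the convergence fPk (Y ) → f (Y ) can be checked exactly:

```python
from fractions import Fraction

def step_avg(j, k):
    """Value of f_{P_k} on the dyadic interval [j/2^k, (j+1)/2^k],
    i.e. the average of f(x) = x^2 over that interval."""
    A, B = Fraction(j, 2**k), Fraction(j + 1, 2**k)
    return (A*A + A*B + B*B) / 3   # exact mean of x^2 on [A, B]

# Martingale property: the value on a parent interval equals the mean
# of the values on its two (equally likely) children.
for k in range(5):
    for j in range(2**k):
        children = (step_avg(2*j, k + 1) + step_avg(2*j + 1, k + 1)) / 2
        assert step_avg(j, k) == children

# Convergence: at y = 1/3, f_{P_k}(y) -> f(y) = 1/9 as k grows.
y = Fraction(1, 3)
errs = [abs(step_avg(int(y * 2**k), k) - y*y) for k in range(1, 12)]
assert errs[-1] < errs[0] and errs[-1] < Fraction(1, 1000)
```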
If we want to prove that a random variable is highly concentrated around its
average, most of the time we use Azuma’s Inequality (or one of its corollaries).
Theorem A.13. Let (X0 , X1 , . . . ) be a martingale such that |Xm+1 − Xm | ≤ 1 for
every m ≥ 0. Then

P(Xm > X0 + λ) < e^{−λ²/(2m)} .
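The bound can be compared with exact tail probabilities in the simplest case, a
sum of m independent ±1 steps (a sketch of our own; the pairs (m, λ) are
arbitrary choices):

```python
from math import comb, exp

def tail(m, lam):
    """Exact P(X_m > lam) for X_m a sum of m independent +/-1 steps
    (a martingale with X_0 = 0 and increments bounded by 1)."""
    # X_m = 2*(#heads) - m, so X_m > lam  iff  2*(#heads) - m > lam.
    total = sum(comb(m, h) for h in range(m + 1) if 2*h - m > lam)
    return total / 2**m

# Azuma's bound: P(X_m > X_0 + lam) < exp(-lam^2 / (2m)).
for m, lam in [(20, 4), (50, 10), (100, 20)]:
    assert tail(m, lam) < exp(-lam**2 / (2 * m))
```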
Applied to the partial sums of bounded independent random variables, Azuma’s
Inequality yields the inequality which (up to minor variations) is called Bernstein’s,
Chernoff’s or Hoeffding’s Inequality.
For us, it will be most convenient to use the following corollary of Azuma’s
Inequality, obtained by applying it to the martingale in Example A.9:
Example A.16. Let (Y1 , Y2 , . . . ) be i.i.d. real valued random variables with
E(|Yi |) < ∞, and let Xk = (Y1 + · · · + Yk )/k. Then (X1 , X2 , . . . ) is a reverse
martingale. So partial sums Y1 + · · · + Yk form a martingale, but dividing by the
number of terms, we get a reverse martingale. (The latter is a bit trickier to verify.)
Applying this theorem to Example A.16, we can derive the Strong Law of Large
Numbers. We refer to the book of Williams [1991] for more on martingales.
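The reverse martingale property of Example A.16 can be verified exactly for fair
coin flips Yi ∈ {0, 1} (our own brute-force illustration): conditioned on the value
of Xk+1 = Sk+1 /(k + 1), the expectation of Xk = Sk /k is Xk+1 itself.

```python
from itertools import product
from fractions import Fraction

n = 6  # an arbitrary small horizon for exhaustive enumeration
seqs = list(product([0, 1], repeat=n))

def cond_mean(k, s):
    """E(X_k | S_{k+1} = s) with X_k = S_k/k, by exact enumeration
    over all 2^n equally likely sequences of coin flips."""
    group = [ys for ys in seqs if sum(ys[:k+1]) == s]
    return sum(Fraction(sum(ys[:k]), k) for ys in group) / len(group)

# Reverse martingale property: E(X_k | X_{k+1}) = X_{k+1},
# i.e. E(X_k | S_{k+1} = s) = s/(k+1) for every attainable s.
for k in range(1, n):
    for s in range(k + 2):
        assert cond_mean(k, s) == Fraction(s, k + 1)
```

The identity follows from exchangeability: given Sk+1 = s, each of the first k + 1
positions is equally likely to carry each of the s ones.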
where the Si are the steps and xi ∈ Si . Conversely, every exponential sum
s(k) = ∑_{i=1}^{n} a_i b_i^k

with ai > 0 and ∑i ai = 1 can be thought of as the moment sequence of a stepfunc-
tion. An infinite sum of this type can also be represented as the moment sequence
of a function (with countably many “steps”). Proposition A.18 implies that the
values s(k) of such an exponential sum uniquely determine the numbers ai and bi .
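For finitely many terms this uniqueness is effective: the ai and bi can be
recovered from a few values s(k) by a Prony-type computation (a sketch with
made-up numbers, not from the text):

```python
from math import sqrt, isclose

# Recovering the a_i and b_i from s(k) = sum_i a_i*b_i^k for two terms.
a1, a2, b1, b2 = 0.3, 0.7, 0.5, 0.2   # hypothetical example values
s = [a1 * b1**k + a2 * b2**k for k in range(4)]

# s satisfies the recurrence s(k+2) = c1*s(k+1) + c0*s(k), where
# x^2 - c1*x - c0 = (x - b1)(x - b2); solve for c1, c0 from s(0..3):
c1 = (s[2]*s[1] - s[3]*s[0]) / (s[1]*s[1] - s[2]*s[0])
c0 = (s[2] - c1*s[1]) / s[0]

# The b_i are the roots of x^2 - c1*x - c0 ...
d = sqrt(c1*c1 + 4*c0)
r1, r2 = (c1 + d) / 2, (c1 - d) / 2
# ... and the a_i follow from s(0) = a1 + a2, s(1) = a1*b1 + a2*b2:
w2 = (s[1] - r1*s[0]) / (r2 - r1)
w1 = s[0] - w2

assert isclose(r1, b1) and isclose(r2, b2)
assert isclose(w1, a1) and isclose(w2, a2)
```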
A.4. MOMENTS AND THE MOMENT PROBLEM 443
This fact is “self-refining” in the sense that the following seemingly stronger
statement easily follows from it.
Proposition A.21. Let ai , bi , ci , di be nonzero real numbers (i = 1, 2, . . . ), such
that bi ≠ bj and di ≠ dj for i ≠ j. Assume that there is a k0 ≥ 0 such that for all
k ≥ k0 , the sums ∑_{i=1}^∞ a_i b_i^k and ∑_{i=1}^∞ c_i d_i^k are convergent and
equal. Then the two sums are formally equal, i.e., there is a permutation π of N
such that ai = c_{π(i)} and bi = d_{π(i)} .
Returning to stepfunctions with a finite number of steps, we note that they can
be characterized in terms of their moment matrices:
Proposition A.22. A function is a stepfunction if and only if its moment matrix
has finite rank. In this case, the rank of the moment matrix is the number of steps.
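Proposition A.22 is easy to test numerically (our own sketch; the step sizes and
values are arbitrary): the moment matrix with entries Mj,k = Mj+k of a
stepfunction with 3 steps has rank exactly 3.

```python
def moments(weights, values, N):
    """Moments M_k = sum_i weights[i]*values[i]**k, k = 0..N-1, of a
    stepfunction with the given step sizes and step values."""
    return [sum(w * v**k for w, v in zip(weights, values)) for k in range(N)]

def rank(mat, tol=1e-9):
    """Numerical rank via Gaussian elimination with partial pivoting."""
    m = [row[:] for row in mat]
    r = 0
    for col in range(len(m[0])):
        if r == len(m):
            break
        piv = max(range(r, len(m)), key=lambda i: abs(m[i][col]))
        if abs(m[piv][col]) < tol:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][col] / m[r][col]
            for j in range(col, len(m[0])):
                m[i][j] -= f * m[r][j]
        r += 1
    return r

# Moment matrix M[j][k] = M_{j+k} of a stepfunction with 3 steps:
M = moments([0.2, 0.5, 0.3], [0.1, 0.6, 0.9], 12)
mat = [[M[j + k] for k in range(6)] for j in range(6)]
assert rank(mat) == 3
```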
Stepfunctions are determined by a finite number of moments, and this fact
characterizes them. To be more precise,
Proposition A.23. (a) Let f ∈ L∞ [0, 1] be a stepfunction with m steps, and let
g ∈ L∞ [0, 1] be another function such that Mk (f ) = Mk (g) for k = 0, . . . , m. Then
f and g have the same moments.
(b) For every function g ∈ L∞ [0, 1] and every m ≥ 0 there is a stepfunction
f ∈ L∞ [0, 1] with m steps so that Mk (f ) = Mk (g) for k = 0, . . . , m − 1.
These results can be extended to functions f : [0, 1] → [0, 1]d quite easily; we
only formulate those that we are using in the book. Such a function is called a
stepfunction if its range is finite. Moments don’t form a sequence, but an array
with d indices (a d-array for short). For a = (a1 , . . . , ad ) ∈ Nd , the corresponding
moment of f = (f1 , . . . , fd ) is defined by

Ma (f ) = ∫_0^1 f1 (x)^{a1} · · · fd (x)^{ad} dx.
infinite symmetric matrix whose rows and columns are indexed by vectors in N^d ,
and Mu,v = Au+v . Semidefiniteness of the moment matrix does not characterize
moment arrays if d ≥ 2, but it does at least when the function values are
bounded by 1 (Berg, Christensen and Ressel [1976], Berg and Maserick [1984]):
Proposition A.24. A d-array A is the moment array of some measurable function
f : [0, 1] → [−1, 1]d if and only if A0...0 = 1, M (A) is positive semidefinite and
|Av | ≤ 1 for all v ∈ Nd . Furthermore, f is a stepfunction if and only if M (A) has
finite rank, and the rank of M (A) is equal to the number of steps of f .
Again, stepfunctions are determined by their moments:
Proposition A.25. (a) Let f : [0, 1] → [0, 1]d be a stepfunction with m steps,
and let g : [0, 1] → [0, 1]d be another function such that Ma (f ) = Ma (g) for
a ∈ {0, . . . , m}d . Then there are measure preserving maps φ, ψ ∈ S_{[0,1]} such that
f ◦ φ = g ◦ ψ almost everywhere. In particular, g is a stepfunction, and Ma (f ) =
Ma (g) for a ∈ Nd .
(b) For every function f : [0, 1] → [0, 1]d and every finite set S ⊆ Nd , there is
a stepfunction g : [0, 1] → [0, 1]d with at most |S| + 1 steps so that Ma (f ) = Ma (g)
for all a ∈ S.
The next question would be to define moments for functions f : [0, 1]d → [0, 1].
Sequences or arrays are not enough here. For d = 2, the right amount of
information is contained in a graph parameter, and the subgraph densities t(F, f )
show many properties analogous to the classical results described above. Theorem
13.10, Theorem 11.52 together with Proposition 14.61, Theorem 5.54 and Theorem
16.46 are analogues of Theorems A.18, A.20, A.22, and A.23(a). Other results
(e.g., the Monotone Reordering Theorem A.19 or Theorem A.23(b)) do not seem to
generalize to d = 2 in any natural way. The case d ≥ 3 clearly corresponds to
hypergraphs, where, as discussed in Chapter 23.3, new difficulties arise, and many
of the interesting questions are open.
if (xi1 , . . . , xirj ) ∈ Rij for a large set of indices i. It is easy to see that this definition
is correct in the sense that ([x1 ], . . . , [xrj ]) ∈ Rj depends only on the equivalence
classes [x1 ], . . . , [xrj ] and not on which representative xi is chosen from [xi ].
A very important property of ultraproduct of structures is stated in the follow-
ing theorem:
Proposition A.27 (Łoś’s Theorem). If every structure Ai (i = 1, 2, . . . ) satisfies
a first order sentence Φ, then their ultraproduct also satisfies Φ.
As a special case, we can look at a sequence of finite simple graphs Gi =
(Vi , Ei ), i.e., finite sets Vi with a symmetric irreflexive binary relation Ei . The
ultraproduct of them is also a simple graph: the symmetry and irreflexivity of the
relation on the ultraproduct is easy to check (or it follows from Łoś’s Theorem,
since these properties of the relation can be expressed by a first-order sentence:
∀x∀y(xy ∈ E ↔ yx ∈ E), and ∀x(xx ∉ E)). If all the graphs have degrees bounded
by D, then so does their ultraproduct, since this property can be expressed by a
first-order sentence.
Ultralimit of a numerical sequence. As a nice application of an ultrafilter
ω we can associate a “limit” to every bounded sequence of numbers. (This is a
special construction for a Banach limit of bounded sequences.) Let (a1 , a2 , . . . )
(ai ∈ [u, v]) be a bounded sequence of real numbers. We say that a real number
a is the ultralimit of the sequence (in notation limω ai = a) if for every ε > 0, the
set {i : |ai − a| > ε} is small. (Note: ordinary convergence to a would require
that this set is finite.) It is not hard to prove that every bounded sequence of real
numbers has a unique ultralimit. Furthermore, if limω ai = a and ai ∈ [u, v] for
every i, then a ∈ [u, v].
Ultraproduct of measures. Let (Vi , Ai ) be a sigma-algebra for i = 1, 2, . . . .
The sets of the form ∏_ω Ai (Ai ∈ Ai ), considered as subsets of V = ∏_ω Vi , form
a Boolean algebra B (they are closed under finite union, intersection, and com-
plementation). The Boolean algebra B generates a sigma-algebra on V = ∏_ω Vi ,
which we denote by A = ∏_ω Ai .
Next, suppose that there is a probability measure πi on (Vi , Ai ); then we define
a setfunction on B by

π(∏_ω Ai ) = lim_ω πi (Ai ).
It is not hard to see that π is finitely additive, and a bit harder to see that it is
a measure on B, i.e., if B1 ⊇ B2 ⊇ · · · are sets in B with ∩_{n=1}^∞ Bn = ∅, then
limn π(Bn ) = 0. Trivially π(V ) = 1. It follows by Carathéodory’s Measure
Extension Theorem (see e.g. Halmos [1950]) that π extends to a probability
measure on A (which we also denote by π). Thus (V, A, π) is a probability space,
which we call the ultraproduct of the probability spaces (Vi , Ai , πi ). We write
π = ∏_ω πi . (This is a special case of
a Loeb space; see Loeb [1979].)
A.8. Categories
As in other sections of this Appendix, I only summarize some basic notation,
definitions and examples that are necessary to understand certain parts of the book.
For more definitions and facts in category theory, see e.g. Adámek, Herrlich and
Strecker [2006].
A category K consists of a set of objects Ob(K), and, for any two objects
a, b ∈ Ob(K), a set K(a, b) of morphisms. Two morphisms α ∈ K(a, b) and
β ∈ K(b, c) have a product αβ ∈ K(a, c), and this multiplication is associative
(whenever defined). For α ∈ K(a, b), we set t(α) = a (tail of α) and h(α) = b (head
of α). Let K_a^in [K_a^out ] denote the set of morphisms with h(α) = a [t(α) = a]. (If you
want to think about morphisms as maps, then please note that I am writing the
product so that the maps should be applied in the order going from left to right.)
For every object a, we have the identity morphism ida ∈ K(a, a), which has
the property that ida α = α for every α ∈ K_a^out and αida = α for every α ∈ K_a^in .
An isomorphism between objects a and b is a morphism ξ ∈ K(a, b) which has an
inverse ζ ∈ K(b, a) such that ξζ = ida and ζξ = idb . Two objects are isomorphic if
there is an isomorphism between them.
For every object a, we introduce an equivalence relation on K_a^in by α ≃ β if and
only if β = αγ for some isomorphism γ. We say that α and β are left-isomorphic. It
is also clear that if α1 , α2 ∈ K(a, b), φ ∈ K(b, c), and α1 and α2 are left-isomorphic,
then so are α1 φ and α2 φ.
We can delete any object from a category and still have a category, so to really
work in a category (in particular, to prove the existence of a particular object), we
must assume the existence of certain objects and morphisms. The existence of these
is usually easily verified when we want to apply our results to specific categories,
in particular, to the category of graphs.
Terminal and zero objects. An object is terminal, if every object has a
unique morphism into it. Any two terminal objects are isomorphic, and we will
assume that the terminal object, if it exists, is unique.
Dually, an object is a zero object, if it has a unique morphism into any object.
Product and coproduct. A set of morphisms πi ∈ K(c, ai ) (i ∈ I) is called
a product, if for every set of morphisms φi ∈ K(d, ai ) (i ∈ I) there is a unique
morphism ξ ∈ K(d, c) such that φi = ξπi for all i ∈ I. We also say that the
object c is the product of objects ai . It is easy to see that the product is uniquely
determined up to isomorphism. For two objects a and b, we denote by a × b their
product. We write a^{×k} for the k-fold product a × · · · × a. We say that the category
has products, if every finite set of objects has a product.
Coproducts are defined by turning the arrows around: A set of morphisms
σi ∈ K(ai , c) (i ∈ I) is called a coproduct, if for every set of morphisms φi ∈ K(ai , d)
(i ∈ I) there is a unique morphism ξ ∈ K(c, d) such that φi = σi ξ. For two objects
groups etc.) are locally finite. These categories have many pleasant properties,
for example, if two objects have monomorphism into each other, then they are
isomorphic.
Let us continue with some examples.
Example A.34. The category of finite simple graphs with loops (where morphisms
are homomorphisms, i.e., adjacency-preserving maps) has all the nice properties
defined above.
It is trivial that this category is locally finite and has epi-mono decompositions.
The terminal object is the single node with a loop, and the zero object is the empty
graph. The looped complete graph on 2 nodes can serve as a right-generator object,
and the single node is a left-generator.
To construct the pullback of two homomorphisms α : a → c and β : b → c,
take the direct (categorical) product d of the two graphs a and b, together with
its projections πa and πb onto a and b, respectively, and take the subgraph d′
of d induced by those nodes v for which (πa α)(v) = (πb β)(v), together with the
restrictions of πa and πb onto d′ . (This is essentially the same construction as used
in the proof of Lemma 5.38 and in the statement of Theorem 5.59.)
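The pullback construction just described can be sketched in a few lines (our own
illustration on hypothetical toy graphs; edges are stored as sets of ordered pairs):

```python
from itertools import product

def direct_product(Va, Ea, Vb, Eb):
    """Categorical (direct) product: a pair of nodes is adjacent
    iff both coordinates are adjacent."""
    V = list(product(Va, Vb))
    E = {((u1, v1), (u2, v2)) for (u1, u2) in Ea for (v1, v2) in Eb}
    return V, E

def pullback(Va, Ea, alpha, Vb, Eb, beta):
    """Subgraph of the product induced by the nodes (u, v) with
    alpha(u) == beta(v); the projections are the coordinate maps."""
    V, E = direct_product(Va, Ea, Vb, Eb)
    Vp = [(u, v) for (u, v) in V if alpha[u] == beta[v]]
    Ep = {(x, y) for (x, y) in E if x in Vp and y in Vp}
    return Vp, Ep

# Two single edges mapping homomorphically onto one edge c = {0, 1}:
Va, Ea = ['x', 'y'], {('x', 'y'), ('y', 'x')}
Vb, Eb = ['u', 'v'], {('u', 'v'), ('v', 'u')}
alpha = {'x': 0, 'y': 1}
beta = {'u': 0, 'v': 1}

Vp, Ep = pullback(Va, Ea, alpha, Vb, Eb, beta)
assert Vp == [('x', 'u'), ('y', 'v')]   # the square commutes on Vp
assert len(Ep) == 2                     # one undirected edge survives
```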
To construct the pushout of two homomorphisms α : c → a and β : c → b,
take the disjoint union d of the two graphs a and b, and identify the nodes α(x) and
β(x) for every node x of c. Note that when c is the edgeless graph on [k], this is
just the product of two k-multilabeled graphs, as defined in Chapter 4.
Coproduct in this category means disjoint union. Product means the direct
product of two graphs as defined in Section 3.2.
Example A.35. Reversing the arrows in the category of finite simple graphs with
loops (Example A.34) gives another category with the above properties, since the
collection of these properties is invariant under reversing arrows.
These examples can be extended to simplicial maps between finite simplicial
complexes, homomorphisms between directed graphs, hypergraphs, etc.
Example A.36 (Partially ordered sets). Let (P, ≤) be a partially ordered set.
For every pair x, y ∈ P such that x ≤ y, we define a unique morphism φx,y .
There is only one way to define the composition: φx,y φy,z = φx,z , which makes
sense because of the transitivity of the relation. This category is locally finite, and
every morphism is both a monomorphism and an epimorphism, so it has (trivial)
epi-mono decompositions.
If the poset is a lattice with lowest element 0 and highest element 1, then 0
is a zero object and 1 is a terminal object. Furthermore, for any two morphisms
φc,a and φc,b , the morphisms (φa,a∨b , φb,a∨b ) form their pushout. Every element is
a left and right generator (in a vacuous way).
Bibliography
M. Abért and T. Hubai: Benjamini-Schramm convergence and the distribution of chromatic roots
for sparse graphs, http://arxiv.org/abs/1201.3861
J. Adámek, H. Herrlich and G.E. Strecker: Abstract and Concrete Categories: The Joy of Cats,
Reprints in Theory and Applications of Categories 17 (2006), 1–507.
S. Adams: Trees and amenable equivalence relations, Ergodic Theory Dynam. Systems 10 (1990),
1–14.
R. Ahlswede and G.O.H. Katona: Graphs with maximal number of adjacent pairs of edges, Acta
Math. Hung. 32 (1978), 97–120.
R. Albert and A.-L. Barabási: Statistical mechanics of complex networks, Rev. Modern Phys. 74
(2002), 47–97.
D.J. Aldous: Representations for partially exchangeable arrays of random variables, J. Multivar.
Anal. 11 (1981), 581–598.
D.J. Aldous: Tree-valued Markov chains and Poisson-Galton-Watson distributions, in: Microsur-
veys in Discrete Probability (D. Aldous and J. Propp, editors), DIMACS Ser. Discrete Math.
Theoret. Comput. Sci. 41, Amer. Math. Soc., Providence, RI. (1998), 1–20.
D. Aldous and R. Lyons: Processes on Unimodular Random Networks, Electron. J. Probab. 12,
Paper 54 (2007), 1454–1508.
D.J. Aldous and M. Steele: The Objective Method: Probabilistic Combinatorial Optimization
and Local Weak Convergence, in: Discrete and Combinatorial Probability (H. Kesten, ed.),
Springer (2003) 1–72.
N. Alon (unpublished)
N. Alon, R.A. Duke, H. Lefmann, V. Rödl and R. Yuster: The algorithmic aspects of the regularity
lemma, J. Algorithms 16 (1994), 80–109.
N. Alon, E. Fischer, M. Krivelevich and M. Szegedy: Efficient testing of large graphs, Combina-
torica 20 (2000) 451–476.
N. Alon, W. Fernandez de la Vega, R. Kannan and M. Karpinski: Random sampling and approx-
imation of MAX-CSPs, J. Comput. System Sci. 67 (2003) 212–243.
N. Alon, E. Fischer, I. Newman and A. Shapira: A Combinatorial Characterization of the Testable
Graph Properties: It’s All About Regularity, Proc. 38th ACM Symp. on Theory of Comput.
(2006), 251–260.
N. Alon and A. Naor: Approximating the Cut-Norm via Grothendieck’s Inequality, SIAM J. Com-
put. 35 (2006), 787–803.
N. Alon, P.D. Seymour and R. Thomas: A separator theorem for non-planar graphs, J. Amer.
Math. Soc. 3 (1990), 801–808.
N. Alon and A. Shapira: A Characterization of the (natural) Graph Properties Testable with
One-Sided Error, SIAM J. Comput. 37 (2008), 1703–1727.
N. Alon, A. Shapira and U. Stav: Can a Graph Have Distinct Regular Partitions? SIAM J. Discr.
Math. 23 (2009), 278–287.
N. Alon and J. Spencer: The Probabilistic Method, Wiley–Interscience, 2000.
N. Alon and U. Stav: What is the furthest graph from a hereditary property? Random Struc.
Alg. 33 (2008), 87–104.
B. Bollobás: A probabilistic proof of an asymptotic formula for the number of labelled regular
graphs, Europ. J. Combin. 1 (1980), 311–316.
B. Bollobás: Random Graphs, Second Edition, Cambridge University Press, 2001.
B. Bollobás, C. Borgs, J. Chayes and O. Riordan: Percolation on dense graph sequences, Ann.
Prob. 38 (2010), 150–183.
B. Bollobás, S. Janson and O. Riordan: The phase transition in inhomogeneous random graphs,
Random Struc. Alg. 31 (2007) 3–122.
B. Bollobás, S. Janson and O. Riordan: The cut metric, random graphs, and branching processes,
J. Stat. Phys. 140 (2010) 289–335.
B. Bollobás, S. Janson and O. Riordan: Monotone graph limits and quasimonotone graphs, In-
ternet Mathematics 8 (2012), 187–231.
B. Bollobás and V. Nikiforov: An abstract Szemerédi regularity lemma, in: Building bridges,
Bolyai Soc. Math. Stud. 19, Springer, Berlin (2008), 219–240.
B. Bollobás and O. Riordan: A Tutte Polynomial for Coloured Graphs, Combin. Prob. Comput.
8 (1999), 45–93.
B. Bollobás and O. Riordan: Metrics for sparse graphs, in: Surveys in combinatorics 2009,
Cambridge Univ. Press, Cambridge (2009) 211–287.
B. Bollobás and O. Riordan: Random graphs and branching processes, in: Handbook of large-scale
random networks, Bolyai Soc. Math. Stud. 18, Springer, Berlin (2009), 15–115.
C. Borgs: Absence of Zeros for the Chromatic Polynomial on Bounded Degree Graphs, Combin.
Prob. Comput. 15 (2006), 63–74.
C. Borgs, J. Chayes, J. Kahn and L. Lovász: Left and right convergence of graphs with bounded
degree, http://arxiv.org/abs/1002.0115
C. Borgs, J. Chayes and L. Lovász: Moments of Two-Variable Functions and the Uniqueness of
Graph Limits, Geom. Func. Anal. 19 (2010), 1597–1619.
C. Borgs, J. Chayes, L. Lovász, V.T. Sós and K. Vesztergombi: Counting graph homomorphisms,
in: Topics in Discrete Mathematics (ed. M. Klazar, J. Kratochvil, M. Loebl, J. Matoušek,
R. Thomas, P. Valtr), Springer (2006), 315–371.
C. Borgs, J.T. Chayes, L. Lovász, V.T. Sós and K. Vesztergombi: Convergent Graph Sequences
I: Subgraph frequencies, metric properties, and testing, Advances in Math. 219 (2008), 1801–
1851.
C. Borgs, J.T. Chayes, L. Lovász, V.T. Sós and K. Vesztergombi: Convergent Graph Sequences
II: Multiway Cuts and Statistical Physics, Annals of Math. 176 (2012), 151–219.
C. Borgs, J.T. Chayes, L. Lovász, V.T. Sós and K. Vesztergombi: Limits of randomly grown graph
sequences, Europ. J. Combin. 32 (2011), 985–999.
C. Borgs, J.T. Chayes, L. Lovász, V.T. Sós, B. Szegedy and K. Vesztergombi: Graph Limits and
Parameter Testing, STOC38 (2006), 261–270.
L. Bowen: Couplings of uniform spanning forests, Proc. Amer. Math. Soc. 132 (2004), 2151–2158.
G. Brightwell and P. Winkler: Graph homomorphisms and long range action, in Graphs, Mor-
phisms and Statistical Physics, DIMACS Ser. Disc. Math. Theor. CS, American Mathematical
Society (2004), 29–48.
W.G. Brown, P. Erdős and M. Simonovits: Extremal problems for directed graphs, J. Combin.
Theory B 15 (1973), 77–93.
W.G. Brown, P. Erdős and M. Simonovits: On multigraph extremal problems, Problèmes combi-
natoires et théorie des graphes, Colloq. Internat. CNRS 260 (1978), 63–66.
O.A. Camarena, E. Csóka, T. Hubai, G. Lippner and L. Lovász: Positive graphs,
http://arxiv.org/abs/1205.6510
S. Chatterjee and P. Diaconis: Estimating and understanding exponential random graph models,
http://arxiv.org/abs/1102.2650
S. Chatterjee and S.R.S Varadhan: The large deviation principle for the Erdős–Rényi random
graph, Europ. J. Combin. 32 (2011), 1000–1017.
G. Elek and G. Lippner: An analogue of the Szemerédi Regularity Lemma for bounded degree
graphs, http://arxiv.org/abs/0809.2879
G. Elek and B. Szegedy: A measure-theory approach to the theory of dense hypergraphs, Advances
in Math. 231 (2012), 1731–1772.
P. Erdős: On sequences of integers no one of which divides the product of two others and on some
related problems, Mitt. Forsch.-Inst. Math. Mech. Univ. Tomsk 2 (1938), 74–82.
P. Erdős: On some problems in graph theory, combinatorial analysis and combinatorial number
theory, in: Graph Theory and Combinatorics, Academic Press, London (1984), 1–17.
P. Erdős, L. Lovász and J. Spencer: Strong independence of graphcopy functions, in: Graph
Theory and Related Topics, Academic Press (1979), 165-172.
P. Erdős and A. Rényi: On random graphs I, Publ. Math. Debrecen 6 (1959), 290–297.
P. Erdős and A. Rényi: On the evolution of random graphs, MTA Mat. Kut. Int. Közl. 5 (1960),
17–61.
P. Erdős and M. Simonovits: A limit theorem in graph theory, Studia Sci. Math. Hungar. 1
(1966), 51–57.
P. Erdős and A.H. Stone: On the structure of linear graphs, Bull. Amer. Math. Soc. 52 (1946),
1087–1091.
W. Feller: An Introduction to Probability Theory and its Applications, Second edition, Wiley, New
York (1971).
E. Fischer: The art of uninformed decisions: A primer to property testing, The Computational
Complexity Column of the Bulletin of the European Association for Theoretical Computer
Science 75 (2001), 97-126.
E. Fischer and I. Newman: Testing versus Estimation of Graph Properties, Proc. 37th ACM Symp.
on Theory of Comput. (2005), 138–146.
D.C. Fisher: Lower bounds on the number of triangles in a graph, J. Graph Theory 13 (1989),
505–512.
J. Fox: A new proof of the graph removal lemma, Annals of Math. 174 (2011), 561–579.
J. Fox and J. Pach (unpublished)
F. Franek and V. Rödl: Ramsey Problem on Multiplicities of Complete Subgraphs in Nearly
Quasirandom Graphs, Graphs and Combin. 8 (1992), 299–308.
P. Frankl and V. Rödl: The uniformity lemma for hypergraphs, Graphs and Combin. 8 (1992),
309–312.
P. Frankl and V. Rödl: Extremal problems on set systems, Random Struc. Alg. 20 (2002),
131–164.
M. Freedman, L. Lovász and A. Schrijver: Reflection positivity, rank connectivity, and homomor-
phisms of graphs, J. Amer. Math. Soc. 20 (2007), 37–51.
A. Frieze and R. Kannan: Quick approximation to matrices and applications, Combinatorica 19
(1999), 175–220.
O. Gabber and Z. Galil: Explicit Constructions of Linear-Sized Superconcentrators, J. Comput.
Syst. Sci. 22 (1981), 407–420.
D. Gaboriau: Invariants l2 de relations d’équivalence et de groupes, Publ. Math. Inst. Hautes
Études Sci. 95 (2002), 93–150.
D. Gamarnik: Right-convergence of sparse random graphs, http://arxiv.org/abs/1202.3123
D. Garijo, A. Goodall and J. Nešetřil: Graph homomorphisms, the Tutte polynomial and “q-state
Potts uniqueness”, Elect. Notes in Discr. Math. 34 (2009), 231–236.
D. Garijo, A. Goodall and J. Nešetřil: Contractors for flows, Electronic Notes in Discr. Math. 38
(2011), 389–394.
H.-O. Georgii: Gibbs Measures and Phase Transitions, de Gruyter, Berlin, 1988.
S. Gerke and A. Steger: The sparse regularity lemma and its applications, Surveys in Combina-
torics (2005), 227–258.
A. Kechris, S. Solecki and S. Todorcevic: Borel chromatic numbers, Advances in Math. 141
(1999) 1–44.
H.G. Kellerer: Duality Theorems for Marginal Problems, Z. Wahrscheinlichkeitstheorie verw.
Gebiete 67 (1984), 399–432.
K. Kimoto: Laplacians and spectral zeta functions of totally ordered categories, J. Ramanujan
Math. Soc. 18 (2003), 53–76.
K. Kimoto: Vandermonde-type determinants and Laplacians of categories, preprint No. 2003-24,
Graduate School of Mathematics, Kyushu University (2003).
J. Kock: Frobenius Algebras and 2D Topological Quantum Field Theories, London Math. Soc.
Student Texts, Cambridge University Press (2003).
Y. Kohayakawa: Szemerédi’s regularity lemma for sparse graphs, in: Sel. Papers Conf. Found. of
Comp. Math., Springer (1997), 216–230.
Y. Kohayakawa, B. Nagle, V. Rödl and M. Schacht: Weak hypergraph regularity and linear
hypergraphs, J. Combin. Theory B 100 (2010), 151–160.
Y. Kohayakawa and V. Rödl: Szemerédi’s regularity lemma and quasi-randomness, in: Recent
Advances in Algorithms and Combinatorics, CMS Books Math./Ouvrages Math. SMC 11,
Springer, New York (2003), 289–351.
Y. Kohayakawa, V. Rödl and J. Skokan: Hypergraphs, quasi-randomness, and conditions for
regularity, J. Combin. Theory A 97 (2002), 307–352.
I. Kolossváry and B. Ráth: Multigraph limits and exchangeability, Acta Math. Hung. 130 (2011),
1–34.
J. Komlós, J. Pach and G. Woeginger: Almost Tight Bounds for epsilon-Nets, Discr. Comput.
Geom. 7 (1992), 163–173.
J. Komlós and M. Simonovits: Szemerédi’s Regularity Lemma and its applications in graph the-
ory, in: Combinatorics, Paul Erdős is Eighty (D. Miklós et al., eds.), Bolyai Society
Mathematical Studies 2 (1996), pp. 295–352.
S. Kopparty and B. Rossman: The Homomorphism Domination Exponent, Europ. J. Combin.
32 (2011), 1097–1114.
S. Kopparty: Local Structure: Subgraph Counts II, Rutgers University Lecture Notes,
http://www.math.rutgers.edu/~sk1233/courses/graphtheory-F11/hom-inequalities.pdf
D. Kozlov: Combinatorial Algebraic Topology, Springer, Berlin, 2008.
B. Kra: The Green-Tao Theorem on arithmetic progressions in the primes: an ergodic point of
view, Bull. of the AMS 43 (2005), 3–23.
D. Král and O. Pikhurko: Quasirandom permutations are characterized by 4-point densities,
http://arxiv.org/abs/1205.3074
D. Kunszenti-Kovács (unpublished)
M. Laczkovich: Closed sets without measurable matchings, Proc. of the Amer. Math. Soc. 103
(1988), 894–896.
M. Laczkovich: Equidecomposability and discrepancy: a solution to Tarski’s circle squaring prob-
lem, J. Reine und Angew. Math. 404 (1990), 77–117.
M. Laczkovich: Continuous max-flow min-cut theorems, Report 19. Summer Symp. in Real Anal-
ysis, Real Analysis Exchange 21 (1995–96), 39.
J.B. Lasserre: A sum of squares approximation of nonnegative polynomials, SIAM Review 49
(2007), 651–669.
J.L.X. Li and B. Szegedy: On the logarithimic calculus and Sidorenko’s conjecture,
http://arxiv.org/abs/math/1107.1153v1
B. Lindström: Determinants on semilattices, Proc. Amer. Math. Soc. 20 (1969), 207–208.
R. Lipton and R.E. Tarjan: A separator theorem for planar graphs, SIAM Journal on Applied
Mathematics 36 (1979), 177–189.
P.A. Loeb: An introduction to non-standard analysis and hyperfinite probability theory, Prob-
abilistic Analysis and Related Topics 2 (A.T. Bharucha-Reid, editor), Academic Press, New
York (1979), 105–142.
D. London: Inequalities in quadratic forms, Duke Mathematical Journal, 33 (1966), 511–522.
L. Lovász: Operations with structures, Acta Math. Hung. 18 (1967), 321–328.
L. Lovász: On the cancellation law among finite relational structures, Periodica Math. Hung. 1
(1971), 145–156.
L. Lovász: Direct product in locally finite categories, Acta Sci. Math. Szeged 23 (1972), 319–322.
L. Lovász: Kneser’s conjecture, chromatic number, and homotopy, J. Combin. Theory A 25
(1978), 319-324.
L. Lovász: Combinatorial Problems and Exercises, Akadémiai Kiadó - North Holland, Budapest,
1979; reprinted by AMS Chelsea Publishing (2007).
L. Lovász: Connection matrices, in: Combinatorics, Complexity and Chance, A Tribute to Do-
minic Welsh Oxford Univ. Press (2007), 179–190.
L. Lovász: The rank of connection matrices and the dimension of graph algebras, Europ. J. Com-
bin. 27 (2006), 962–970.
L. Lovász: Very large graphs, in: Current Developments in Mathematics 2008 (eds. D. Jerison,
B. Mazur, T. Mrowka, W. Schmid, R. Stanley, and S. T. Yau), International Press, Somerville,
MA (2009), 67–128.
L. Lovász: Subgraph densities in signed graphons and the local Sidorenko conjecture, Electr.
J. Combin. 18 (2011), P127 (21pp).
L. Lovász: Notes on the book: Large networks, graph homomorphisms and graph limits,
http://www.cs.elte.hu/~lovasz/book/homnotes.pdf
L. Lovász and A. Schrijver: Graph parameters and semigroup functions, Europ. J. Combin. 29
(2008), 987–1002.
L. Lovász and A. Schrijver: Semidefinite functions on categories, Electr. J. Combin. 16 (2009),
no. 2, Special volume in honor of Anders Björner, Research Paper 14, 16 pp.
L. Lovász and A. Schrijver: Dual graph homomorphism functions, J. Combin. Theory A , 117
(2010), 216–222.
L. Lovász and M. Simonovits: On the number of complete subgraphs of a graph, in: Combina-
torics, Proc. 5th British Comb. Conf. (ed. C.St.J.A.Nash-Williams, J.Sheehan), Utilitas Math.
(1976), 439–441.
L. Lovász and M. Simonovits: On the number of complete subgraphs of a graph II, in: Studies in
Pure Math., To the memory of P. Turán (ed. P. Erdös), Akadémiai Kiadó (1983), 459-495.
L. Lovász and V.T. Sós: Generalized quasirandom graphs, J. Combin. Theory B 98 (2008),
146–163.
L. Lovász and L. Szakács (unpublished)
L. Lovász and B. Szegedy: Limits of dense graph sequences, J. Combin. Theory B 96 (2006),
933–957.
L. Lovász and B. Szegedy: Szemerédi’s Lemma for the analyst, Geom. Func. Anal. 17 (2007),
252–270.
L. Lovász and B. Szegedy: Contractors and connectors in graph algebras, J. Graph Theory 60
(2009), 11–31.
L. Lovász and B. Szegedy: Testing properties of graphs and functions, Israel J. Math. 178 (2010),
113–156.
L. Lovász and B. Szegedy: Regularity partitions and the topology of graphons, in: An Irregular
Mind, Szemerédi is 70, J. Bolyai Math. Soc. and Springer-Verlag (2010), 415–446.
L. Lovász and B. Szegedy: Finitely forcible graphons, J. Combin. Theory B 101 (2011), 269–301.
L. Lovász and B. Szegedy: Random Graphons and a Weak Positivstellensatz for Graphs, J. Graph
Theory 70 (2012) 214–225.
B. Ráth and L. Szakács: Multigraph limit of the dense configuration model and the preferential
attachment graph, Acta Math. Hung. 136 (2012), 196–221.
A.A. Razborov: Flag Algebras, J. Symbolic Logic, 72 (2007), 1239–1282.
A.A. Razborov: On the minimal density of triangles in graphs, Combin. Prob. Comput. 17 (2008),
603–618.
A.A. Razborov: On 3-hypergraphs with forbidden 4-vertex configurations, SIAM J. Discr. Math.
24 (2010), 946–963.
G. Regts: The rank of edge connection matrices and the dimension of algebras of invariant tensors,
Europ. J. Combin. 33 (2012), 1167–1173.
C. Reiher: Minimizing the number of cliques in graphs of given order and edge density (manuscript).
N. Robertson and P. Seymour: Graph Minors. XX. Wagner’s conjecture, J. Combin. Theory B
92 (2004), 325–357.
V. A. Rohlin: On the fundamental ideas of measure theory, Translations Amer. Math. Soc., Series
1, 10 (1962), 1–54. Russian original: Mat. Sb. 25 (1949), 107–150.
K. Roth: Sur quelques ensembles d’entiers, C. R. Acad. Sci. Paris 234 (1952), 388–390.
V. Rödl and M. Schacht: Regular partitions of hypergraphs: Regularity Lemmas, Combin. Prob.
Comput. 16 (2007), 833–885.
V. Rödl and M. Schacht: Regular partitions of hypergraphs: Counting Lemmas, Combin. Prob.
Comput. 16 (2007), 887–901.
V. Rödl and J. Skokan: Regularity lemma for k-uniform hypergraphs, Random Struc. Alg. 25
(2004), 1–42.
R. Rubinfeld and M. Sudan: Robust characterization of polynomials with applications to program
testing, SIAM J. Comput. 25 (1996), 252–271.
I.Z. Ruzsa and E. Szemerédi: Triple systems with no six points carrying three triangles, in: Com-
binatorics, Proc. Fifth Hungarian Colloq., Keszthely, Bolyai Society–North Holland (1976),
939–945.
R.H. Schelp and A. Thomason: Remark on the number of complete and empty subgraphs, Combin.
Prob. Comput. 7 (1998), 217–219.
O. Schramm: Hyperfinite graph limits, Elect. Res. Announce. Math. Sci. 15 (2008), 17–23.
A. Schrijver: Tensor subalgebras and first fundamental theorems in invariant theory, J. of Algebra
319 (2008), 1305–1319.
A. Schrijver: Graph invariants in the edge model, in: Building Bridges Between Mathematics and
Computer Science (eds. M. Grötschel, G.O.H. Katona), Springer (2008), 487–498.
A. Schrijver: Graph invariants in the spin model, J. Combin. Theory B 99 (2009), 502–511.
A. Schrijver: Characterizing partition functions of the vertex model by rank growth (manuscript).
A. Scott and A. Sokal: On Dependency Graphs and the Lattice Gas, Combin. Prob. Comput. 15
(2006), 253–279.
A. Scott: Szemerédi’s regularity lemma for matrices and sparse graphs, Combin. Prob. Comput.
20 (2011), 455–466.
A.F. Sidorenko: Classes of hypergraphs and probabilistic inequalities, Dokl. Akad. Nauk SSSR
254 (1980), 540–543.
A.F. Sidorenko: Extremal estimates of probability measures and their combinatorial nature (Russian), Izv. Akad. Nauk SSSR 46 (1982), 535–568.
A.F. Sidorenko: Inequalities for functionals generated by bipartite graphs (Russian) Diskret. Mat.
3 (1991), 50–65; translation in Discrete Math. Appl. 2 (1992), 489–504.
A.F. Sidorenko: A correlation inequality for bipartite graphs, Graphs and Combin. 9 (1993),
201–204.
A.F. Sidorenko: Randomness friendly graphs, Random Struc. Alg. 8 (1996), 229–241.
B. Simon: The Statistical Mechanics of Lattice Gases, Princeton University Press (1993).
M. Simonovits: A method for solving extremal problems in graph theory, stability problems, in:
Theory of Graphs, Proc. Colloq. Tihany 1966, Academic Press (1968), 279–319.
M. Simonovits: Extremal graph problems, degenerate extremal problems, and supersaturated
graphs, in: Progress in Graph Theory, NY Academy Press (1984), 419–437.
M. Simonovits and V.T. Sós: Szemerédi’s partition and quasirandomness, Random Struc. Alg. 2
(1991), 1–10.
M. Simonovits and V.T. Sós: Hereditarily extended properties, quasi-random graphs and not
necessarily induced subgraphs. Combinatorica 17 (1997), 577–596.
M. Simonovits and V.T. Sós: Hereditary extended properties, quasi-random graphs and induced
subgraphs, Combin. Prob. Comput. 12 (2003), 319–344.
Ya.G. Sinai: Introduction to ergodic theory, Princeton Univ. Press (1976).
B. Szegedy: Edge coloring models and reflection positivity, J. Amer. Math. Soc. 20 (2007),
969–988.
B. Szegedy: Edge coloring models as singular vertex coloring models, in: Fete of Combinatorics
(eds. G.O.H. Katona, A. Schrijver, T. Szőnyi), Springer (2010), 327–336.
B. Szegedy: Gowers norms, regularization and limits of functions on abelian groups,
http://arxiv.org/abs/1010.6211
B. Szegedy (unpublished).
E. Szemerédi: On sets of integers containing no k elements in arithmetic progression, Acta Arithmetica 27 (1975), 199–245.
E. Szemerédi: Regular partitions of graphs, Colloque Inter. CNRS (J.-C. Bermond, J.-C. Fournier,
M. Las Vergnas and D. Sotteau, eds.) (1978), 399–401.
T. Tao: A variant of the hypergraph removal lemma, J. Combin. Theory A 113 (2006), 1257–1280.
T. Tao: Szemerédi’s regularity lemma revisited, Contrib. Discrete Math. 1 (2006), 8–28.
T. Tao: The dichotomy between structure and randomness, arithmetic progressions, and the primes, in: Proc. Intern. Congress of Math. I (Eur. Math. Soc., Zürich, 2006), 581–608.
A. Thomason: Pseudorandom graphs, in: Random Graphs ’85, North-Holland Math. Stud. 144, North-Holland, Amsterdam (1987), 307–331.
A. Thomason: A disproof of a conjecture of Erdős in Ramsey theory, J. London Math. Soc. 39
(1989), 246–255.
P. Turán: Egy gráfelméleti szélsőértékfeladatról [On an extremal problem in graph theory] (in Hungarian), Mat. Fiz. Lapok 48 (1941), 436–453.
J.H. van Lint and R.M. Wilson: A Course in Combinatorics, Cambridge University Press (1992).
V. Vapnik and A. Chervonenkis: On the uniform convergence of relative frequencies of events to
their probabilities, Theor. Prob. Appl. 16 (1971), 264–280.
A.M. Vershik: Classification of measurable functions of several arguments, and invariantly dis-
tributed random matrices, Funkts. Anal. Prilozh. 36 (2002), 12–28; English translation: Funct.
Anal. Appl. 36 (2002), 93–105.
A.M. Vershik: Random metric spaces and universality, Uspekhi Mat. Nauk 59 (2004), 65–104;
English translation: Russian Math. Surveys 59 (2004), 259–295.
D. Welsh: Complexity: Knots, Colourings and Counting, London Mathematical Society Lecture
Notes 186. Cambridge Univ. Press, Cambridge, 1993.
D. Welsh and C. Merino: The Potts model and the Tutte polynomial, J. Math. Phys. 41 (2000), 1127–1152.
H. Whitney: The coloring of graphs, Ann. of Math. 33 (1932), 688–718.
H.S. Wilf: Hadamard determinants, Möbius functions, and the chromatic number of a graph,
Bull. Amer. Math. Soc. 74 (1968), 960–964.
D. Williams: Probability with Martingales, Cambridge Univ. Press, 1991.
E. Witten: Topological quantum field theory, Comm. Math. Phys. 117 (1988), 353–386.
E. Zeidler: Nonlinear functional analysis and its applications. Part I, Springer Verlag, New York,
1985.
A.A. Zykov: On Some Properties of Linear Complexes, Mat. Sbornik 24 (1949), 163–188; Amer.
Math. Soc. Transl. 79 (1952), 1–33.
Author Index
Subject Index
stability number, 46
stable set polynomial, 65
stationary walk, 439
stepfunction, 115, 442
stepping operator, 144
subgraph sampling, 5
support graph, 336
ultrafilter, 444
principal, 444
ultralimit, 445
ultrametric, 331
ultraproduct, 444
uniform attachment graph, 188
unlabeling, 86
Vapnik–Chervonenkis dimension, see VC-dimension
VC-dimension, 446
variation distance, 12
Voronoi cell, 228
Notation Index
Fk•, FS• k-labeled and S-labeled graphs, 39
Fkmult multigraphs on [k], 38
Fkstab k-labeled graphs with stable labeled set, 95
F∗ conjugate in concatenation algebra, 85
flo(G, q) nowhere-zero q-flows, 436
G1 ⊕ G2 Cartesian sum, 40
G1 × G2 categorical product, 40
G1 ⊠ G2 strong product, 40
G1 ∗ G2 gluing k-broken graphs, 416
G→ edge-rooted graphs, 340
G•, G• rooted graphs, 339
G•F extensions of F, 339
I(W) induced subgraphs of graphon, 247
[k] = {1, 2, . . . , k}, 12
K(a, b) morphisms, 447
K1•• 2-multilabeled graph on one node, 39
Ka,b•, Ka,b•• partially labeled complete bipartite graphs, 39
Kain, Kaout morphisms into and out of a, 447
Kn complete graph, 38
Kn◦ looped complete graph, 38
Kn•, Kn•• partially labeled complete graphs, 39
Knr complete hypergraph, 422
χ(G) chromatic number, 41