
THE PROBABILISTIC PARADIGM: A COMBINATORIAL PERSPECTIVE
Niranjan Balachandran
Dept. of Mathematics, IIT Bombay.

Contents

1 Preface
2 Notation with asymptotics
3 The Basic Idea
 3.1 Lower bounds on the Ramsey number R(n, n)
 3.2 Tournaments and the S_k Property
 3.3 Sum-Free Sets of Integers
 3.4 The Distinguishing chromatic number of Levi graphs
 3.5 Colored hats and a guessing game
 3.6 The 1-2-3 theorem
4 Managing Expectations and the Markov bound
 4.1 Revisiting the Ramsey Number R(n, n)
 4.2 An approximate form of Caratheodory's theorem
 4.3 Graphs with many edges and high girth
 4.4 List Chromatic Number and minimum degree
 4.5 The Varnavides averaging argument
 4.6 A conjecture of Daykin-Erdős and its resolution
 4.7 Graphs with high girth and large chromatic number
5 Dependent Random Choice
 5.1 A graph embedding lemma
 5.2 An old problem of Erdős
 5.3 A special case of Sidorenko's conjecture
 5.4 The Balog-Szemerédi-Gowers Theorem
6 The Second Moment Method
 6.1 Variance of a Random Variable and Chebyshev's theorem
 6.2 The Erdős-Ginzburg-Ziv theorem: When do we need long sequences?
 6.3 Distinct subset sums
 6.4 The space complexity of approximating frequency moments
 6.5 Uniform Dilations
 6.6 Resolution of the Erdős-Hanani Conjecture: The Rödl 'Nibble'
7 Basic Concentration Inequalities - The Chernoff Bound and Hoeffding's inequality
 7.1 The Chernoff Bound
 7.2 First applications of the Chernoff bound
 7.3 Discrepancy in hypergraphs
 7.4 Projective Planes and Property B
 7.5 Graph Coloring and Hadwiger's Conjecture
 7.6 Why the Regularity lemma needs many parts
8 Property B: Lower and Upper bounds
 8.1 Introduction
 8.2 Beck's result
 8.3 The Radhakrishnan-Srinivasan (R-S) improvement
 8.4 And then came Cherkashin and Kozik...
9 More sophisticated concentration: Talagrand's Inequality
 9.1 Talagrand's Inequality in metric probability spaces
 9.2 First examples
 9.3 An Improvement of Brooks' Theorem
 9.4 Almost Steiner Designs
 9.5 Chromatic number of graph powers
10 Martingales and Concentration Inequalities
 10.1 Martingales
 10.2 Examples
 10.3 Azuma's Inequality
 10.4 The Shamir-Spencer Theorem for Sparse Graphs
 10.5 The Pippenger-Spencer Theorem
 10.6 A Conjecture of Erdős-Faber-Lovász (EFL) and a theorem of Kahn

Bibliography

1 Preface

One of the most popular motifs in mathematics in recent times has been the study of the complementarity between the notions of 'Structure' and 'Randomness', in the sense that most mathematical structures seem to admit this broad dichotomy of characterisation. It is little wonder then that probability theory has become a ubiquitous tool in areas of mathematics as varied as Differential equations, Number theory, and Combinatorics, to name a few, as it brings forth the language to describe 'randomness'.

Combinatorics is one of the areas where this dichotomy has played a very critical role in the resolution of several very interesting problems. Combinatorics has long been held as an area which, unlike many other areas of mathematics, does not involve a great deal of theory building. However, it is of course not true that there is no general sense of 'theory' in combinatorics; to quote Gowers from his iconic essay 'The Two Cultures of Mathematics': "The important ideas of combinatorics do not usually appear in the form of precisely stated theorems, but more often as general principles of wide applicability." What plays the role of 'theory' in Combinatorics would be something akin to 'general principles' which shape the manner in which the combinatorist generally forms his/her view. And one of the principal principles at work in combinatorics is the Probabilistic paradigm.

The word 'paradigm', as listed on dictionary.com, describes 'a framework containing the basic assumptions, ways of thinking, and methodology that are commonly accepted by members of a scientific community'. The probabilistic paradigm in combinatorics was initiated largely by the seminal work of Paul Erdős, who ushered in the language, the viewpoint, and the perspectives it offers to problems in combinatorics, and today, probabilistic tools are an indispensable part of the combinatorist's arsenal.

One of the main reasons for the ubiquity and all-pervasive nature of the method is that it provides a tool to deal with the 'local-global' problem. More specifically, many problems of a combinatorial nature ask for the existence/construction/enumeration of a finite structure that satisfies a certain combinatorial property locally at every element. The difficulty in many a combinatorial problem is to construct structures that are 'locally good' everywhere. A significant part of this difficulty arises from the fact that, often, there seem to be several possible choices for picking a local structure, but no canonical ones; consequently, it is not clear which local choices are preferable. The probabilistic paradigm enables one to consider all these 'local' patches simultaneously and provides what one could call 'weak' conditions for building a global patch from the local data. In recent times, many impressive results settling several long-standing open problems via the probabilistic method rely essentially on this principle. But one thing that stands out in all these results is: the techniques involved are often quite subtle, ergo, one needs to understand how to use these tools, and think probabilistically.

Coming to the existing literature on this subject, there are some truly wonderful monographs - the ones by Alon-Spencer [5], Spencer [26], Bollobás [6], Janson et al [16], Molloy-Reed [21] spring readily to mind - on the probabilistic method in combinatorics specifically. In addition, one can find several compilations of lecture notes on the probabilistic method on the internet. So, what would we seek to find in another book? What ought it to offer that is, say, missing in all the plethora of material that is already available?

Most of the available material attempts to keep the proofs easy to follow and simple to verify. But that invariably makes the proofs appear rather magical, and almost always obscures the thought processes behind them. Indeed, many interesting (probabilistic) arguments appear in situations that do not seem to involve any probability at all, so a certain sprezzatura is distinctly conveyed. So a new book could certainly do with a deconstructionist perspective.

This book arose as a result of lectures for a graduate course - first at Caltech, and later at IIT Bombay - with the goal of providing that sense of perspective. Tim Gowers has on more than one occasion written about 'The Exposition Problem': "Solving an open exposition problem means explaining a mathematical subject in a way that renders it totally perspicuous. Every step should be motivated and clear; ideally, students should feel that they could have arrived at the results themselves."¹ An entire book in this spirit has not appeared before, and that is what this book really attempts to do.

This book could be criticised as 'a bunch of deconstructions of some specific results that arise from the idiosyncrasies of the author's choices'. Indeed, while some of the results that appear are the best known, there are others that are not best possible - even within the material that appears in the book. My counter to that would be - Yes... and No. The choice of material that has been included here is indeed a reflection of my own tastes and preferences. But the theorems and results that appear here also reflect an aspect of each of the techniques that are discussed, in a very specific sense; if, for instance, a result appears in the chapter on the Second moment method, then the second moment computation there is key to the eventual result. In that sense, the book is tightly structured.
¹ This is from his blog, where he in turn was quoting Tim Chow.

I deliberately do not include proofs or detailed discussions of many important results from probability, although I do state the ones I need in their full form for their utility within the confines of this book. The reason is twofold: firstly, this is basically a book on combinatorics, which forms and informs the topics of interest in the first place. Secondly, the imperative is to provide a perspective into probabilistic heuristics and reasoning and not get into the details and technicalities of probabilistic results in themselves. I list some sources as references throughout the text for related reading.

As mentioned earlier, principles in combinatorics play the role that theory does in most other areas of mathematics. Most experts are well acquainted with these principles, and have some other principles of their own². These never see an explicit mention in books (though some blogs, like those of Tao or Gowers, do a fabulous job there), and for a well-founded reason: these principles are more akin to rules of thumb, and a formal statement attempting to put them in words will inevitably be an oversimplification that amounts to an incorrect statement. But in my opinion, these simplistic heuristics go a great way in laying a pathway, not just towards solving open conjectures, but also towards posing newer interesting questions. Towards that end, I put down, as an encapsulation, one core principle from each chapter; each chapter's title includes an epigram that attempts a heuristic description of the underlying principle.

I thank all my students who very actively and enthusiastically acted as scribes for the lectures over the years; those scribed notes formed the skeleton for this book.

² à la Groucho Marx, perhaps.

2 Notation with asymptotics

This is a brief primer on the Landau asymptotic notation. Given functions f, g, we write f ≫ g (resp. f ≪ g) if f(n)/g(n) → ∞ (resp. → 0) as n → ∞. We also write f = o(g) to denote that f ≪ g. We write f = O(g) (resp. f = Ω(g)) if there exists an absolute constant C > 0 and n₀ such that for all n ≥ n₀, |f(n)| ≤ C|g(n)| (resp. |f(n)| ≥ C|g(n)|), and finally, when we write f = Θ(g), we mean f = O(g) and g = O(f). If lim_{n→∞} f(n)/g(n) = 1, then we write f ∼ g.

Here is a simple proposition that is left as an exercise.

Proposition 1. Suppose f, g, h are functions defined on the integers.

1. (Transitivity) f = o(g) and g = o(h) implies f = o(h). A similar statement holds for O and Θ as well.

2. If g = o(f) then f + g = Θ(f).

The main advantage of using this notation (besides making several statements look a lot neater than they would if written in their exact analytic form) is that it allows us to 'invert' some functions in an asymptotic sense when an exact inversion is not feasible. To make this precise, consider the following example. Suppose n ≤ Ck log k. How do we then get a lower bound for k in terms of n?

Here is a simple trick (and this will be used repeatedly in the book). The given inequality gives us

k ≥ cn/log k,

and since clearly k ≤ n (otherwise there is nothing to discuss further), this gives us

k ≥ cn/log n,

which is best possible asymptotically, since if k = n/log n then

k log k = (n/log n)(log n − log log n) = n(1 − (log log n)/(log n)) = n(1 − o(1)).

One can use this idea iteratively as well, as we shall see in the book.
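As a quick sanity check, here is a small numeric sketch in Python (our own illustration, not part of the original notes) of how sharp the inversion is in the extreme case n = k log k:

import math

# If n <= C * k * log(k), the trick yields k >= c * n / log(n).
# Here we take the extreme case n = k * log(k) (so C = 1) and test
# how close the recovered lower bound n / log(n) comes to the true k.
for k in (10 ** 3, 10 ** 5, 10 ** 7):
    n = k * math.log(k)
    recovered = n / math.log(n)
    print(k, round(recovered), round(recovered / k, 4))

The printed ratio tends to 1 from below, in line with the computation k log k = n(1 − o(1)) above.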

Computations with asymptotics are an art, and need a bit of practice before one can get comfortable with them. We illustrate one case in a little more detail; this is a calculation used in a later chapter (where it is omitted). Suppose for a fixed k, we wish to maximize n − \binom{n}{k} 2^{−\binom{k}{2}+1}. Note that as n increases, this quantity eventually becomes negative, so one needs to find the optimal n for which it is large. Unfortunately, the usual maxima/minima methods of calculus are not directly applicable here, so the perspective is motivated more by an eye on the asymptotics.

Since the given quantity cannot exceed n, from an asymptotic perspective, we would be happy if this quantity is at least as large as n/2. To see if we can achieve that, since k! ≥ (k/e)^k, we have \binom{n}{k} ≤ (en/k)^k, so

\binom{n}{k} 2^{−\binom{k}{2}} ≤ (en/(k 2^{(k−1)/2}))^k.

Now set

4n = (en/(k 2^{(k−1)/2}))^k.

This gives

n = (4^{1/k} (k/e) 2^{(k−1)/2})^{k/(k−1)} = 4^{1/(k−1)} (k/e)^{k/(k−1)} 2^{k/2} = (1 + o(1)) k 2^{k/2}/e

for k large.
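The asymptotic answer can also be compared against a brute-force maximization; the following short Python sketch (ours, purely illustrative) does exactly that:

import math

# Maximize f(n) = n - C(n,k) * 2^(1 - C(k,2)) by brute force over n,
# and compare the maximizer with the asymptotic answer k * 2^(k/2) / e.
def f(n, k):
    return n - math.comb(n, k) * 2.0 ** (1 - math.comb(k, 2))

for k in (10, 14, 18):
    prediction = k * 2 ** (k / 2) / math.e
    n_best = max(range(1, int(4 * prediction)), key=lambda n: f(n, k))
    print(k, n_best, round(prediction))

Already for these modest values of k, the brute-force maximizer sits close to the prediction k 2^{k/2}/e.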

Here is a second example, again from a later chapter. The question is the following: Given d, maximize s such that d > 4 \binom{s^4}{s} log(2\binom{s^4}{s}). Again, since we shall not be too concerned by constants, let us set d = 8 \binom{s^4}{s} log(2\binom{s^4}{s}). Write Y = 2\binom{s^4}{s}; then this gives us d = 4Y log Y, and so by our discussion from earlier, this gives us Y = Ω(d/log d). As for the asymptotics of s, since \binom{s^4}{s} ≤ (es³)^s ≤ s^{4s}, we have s^{4s} ≥ Ω(d/log d), so taking logs on both sides gives us 4s log s ≥ (1 − o(1)) log d. Once again, by the same argument as earlier, this gives us s = Ω(log d/log log d).
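Again, a tiny numeric check (our own sketch) of how the recovered bound log d/log log d compares with the true s:

import math

# Given s, set d = 4 * Y * log(Y) with Y = 2 * C(s^4, s), and compare s
# with the recovered lower bound log(d) / log(log(d)).
for s in (3, 5, 8):
    Y = 2 * math.comb(s ** 4, s)
    d = 4 * Y * math.log(Y)
    print(s, round(math.log(d) / math.log(math.log(d)), 1))

The two agree up to the constant factor hidden in the Ω(·).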

For the uninitiated, we close this chapter by recommending Asymptopia by J. Spencer [?], a fascinating introduction to the world of asymptotics.

3 The Basic Idea

If you cannot think of anything clever, or worse, cannot think of anything, roll the die and take your chances.

The probabilistic paradigm, or more simplistically, the probabilistic method, is based on the following premise: Given a (combinatorial) problem, one may set up a probability space on the underlying set, and one may then compute (or estimate) the probability of an event not occurring. If this probability is less than one, then the corresponding set that describes the occurrence of the event is nonempty. This is a rather simple description of the method, and yet it hardly explains how one might go about this. Our goal in this book is to attempt a deconstruction of this process.

We shall assume a basic familiarity with the notions of probability theory and graph
theory. As a good reference, we recommend [28] for probability theory, and [29] for graph
theory.

3.1 Lower bounds on the Ramsey number R(n, n)


We start with the instance that 'started it all' - the famous Erdős lower bound on the Ramsey numbers. Ramsey theory, roughly stated, is the study of how "order" grows in systems as their size increases. In the language of graph theory, the first result that founded the basics of Ramsey theory is the following:

Theorem 2. (Ramsey, Erdős-Szekeres) Given a pair of integers s, t, there is an integer R(s, t) such that if n ≥ R(s, t), any 2-coloring of the edges of K_n using colors red and blue must yield either a red K_s or a blue K_t.

A fairly simple recursive upper bound on R(s, t) (proved inductively, and a good exercise if you haven't seen it before) is given by

R(s, t) ≤ R(s, t − 1) + R(s − 1, t),

which gives us

R(s, t) ≤ \binom{s + t − 2}{s − 1},

and thus, asymptotically, that

R(s, s) ≤ C · 4^s/√s

for some constant C and for s sufficiently large.


A constructive lower bound on R(s, s), discovered by Nagy, is the following:

R(s, s) ≥ \binom{s}{3}.

Explicitly, his construction goes as follows: take any set S, and turn the collection of all 3-element subsets of S into a graph by connecting subsets iff their intersection is odd. The graph represents the red edges, and the non-edges are the blue edges. It is a not-entirely-trivial exercise to show that this coloring admits no monochromatic clique of size s.

There is a rather large gap between these two bounds; one natural question to ask, then, is which of these two results is "closest" to the truth? Turán believed that the correct answer was of the order of s². Erdős, in 1947, in a tour-de-force paper, disproved Turán's conjecture in a rather strong form:

Theorem 3. (Erdős) For s ≥ 3,

R(s, s) > ⌊2^{s/2}⌋.

Proof. A lower bound entails a coloring of the edges of K_n using colors red and blue with no monochromatic complete subgraph on s vertices. If one looks at Nagy's example, one is tempted to think of a 'global' recipe for coloring the edges in some manner that witnesses the lack of sufficiently large monochromatic complete subgraphs. The reason for this 'global' outlook is that if one starts with an ad-hoc coloring of the edges, there seems to be plenty of leeway to color each edge one way or the other before our color choices force our hand, and even on the occasions that they do, it is hard to see if earlier choices could have been altered to improve upon the number we have so far. And lastly, it is hard to see how this pattern (if one could use such a word here!) generalises for large s. And in this conundrum of a situation, where local choices for edge colorings do not seem clear, Erdős appealed to the principle explicated in the slogan of the chapter:

If you cannot think of anything clever, or worse, cannot think of anything, roll the die and take your chances.

Fix n and consider a random 2-coloring of the edges of K_n. In other words, let us work in the probability space (Ω, P) = (all 2-colorings of K_n's edges, P(ω) = 1/2^{\binom{n}{2}}). An alternate way of describing this would be to consider a random 2-coloring to be one where each edge of K_n is independently colored red or blue with probability 1/2, since there is no reason to prefer one color over another.

For some fixed set R of s vertices in V(K_n), let A_R be the event that the induced subgraph on R is monochromatic. Then, we have that

P(A_R) = 2 · 2^{\binom{n}{2} − \binom{s}{2}}/2^{\binom{n}{2}} = 2^{1−\binom{s}{2}}.

Thus, we have that the probability of at least one of the A_R's occurring is bounded by

P(⋃_{|R|=s} A_R) ≤ Σ_{R ⊂ V(K_n), |R|=s} P(A_R) = \binom{n}{s} 2^{1−\binom{s}{2}}.

If we can show that \binom{n}{s} 2^{1−\binom{s}{2}} is less than 1, then we know that with nonzero probability there is a 2-coloring ω ∈ Ω in which none of the bad events A_R occur! In other words, we know that there is a 2-coloring of K_n that avoids both a red and a blue K_s, even though we do not have such a coloring explicitly!
Solving, we see that

\binom{n}{s} 2^{1−\binom{s}{2}} < (n^s/s!) · 2^{1+(s/2)−(s²/2)} = (2^{1+s/2}/s!) · (n^s/2^{s²/2}) < 1

whenever n = ⌊2^{s/2}⌋ and s ≥ 3.
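It is instructive (and reassuring) to verify the final inequality numerically; here is a short Python check (our illustration, not part of the original text):

import math

# Check that C(n, s) * 2^(1 - C(s, 2)) < 1 for n = floor(2^(s/2)) and s >= 3.
# (Capped at s = 40 to keep the powers within floating-point range.)
for s in range(3, 41):
    n = int(2 ** (s / 2))
    assert math.comb(n, s) * 2.0 ** (1 - math.comb(s, 2)) < 1
print("union bound < 1 for all 3 <= s <= 40")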

3.2 Tournaments and the Sk Property


A tournament is simply an oriented K_n; in other words, it is a directed graph on n vertices where for every pair (i, j), there is either an edge from i to j or from j to i, but not both. A tournament T is said to have property S_k if for any set of k vertices in the tournament, there is some vertex that has a directed edge to each of those k vertices. One way to think of this is to imagine a tournament of some game where each pair of players play each other - there are no draws - and for players i, j we shall indicate by the directed edge (i, j) the outcome of the game between these players, with i beating j. In these terms, the property S_k indicates that in this tournament, for every set of k players, there is always some player who beat them all.

One natural question to ask about the S_k property is the following:

Question 4. For a given arbitrary k, is there always a tournament with property S_k? If yes, how small can such a tournament be?

This problem again seeks, for starters, an orientation of the edges that achieves S_k, and when that is done, to see if this property has a universality to it. Note that unlike the Ramsey problem, it is not true that all sufficiently large tournaments have property S_k. Indeed, a transitive tournament - a tournament where the players come seeded, and all the games between them respect their rankings - clearly does not possess S_k, since no one beats the top ranked player.

For small k - 1, 2, 3 - one can answer these questions to some degree of satisfaction; indeed, we can calculate the minimum tournament sizes through ad-hoc arguments:

• If k = 1, a tournament will need at least 3 vertices to satisfy S_k (take a directed 3-cycle).

• If k = 2, a tournament will need at least 5 vertices to satisfy S_k.

• If k = 3, a tournament will need at least 7 vertices to satisfy S_k (related to the Fano plane).

For k = 4, constructive methods have yet to find an exact answer. Indeed, constructive methods have been fairly bad at finding asymptotics for how these values grow. And again, anyone who takes a stab at this problem realises very quickly that the fundamental problem here is one of choice; there does not seem to be a canonical choice for orienting each edge one way or another, and again, as with the Ramsey problem, it is hard to unravel which choices lead to what outcomes. And so, we bring out the maxim once more.

Proposition 5. (Erdős) There are tournaments that satisfy property S_k on O(k² 2^k)-many vertices.

Proof. Consider a random tournament: in other words, for every edge (i, j) of K_n, direct the edge i → j with probability 1/2 and from j → i with probability 1/2. Again, this uniformity in choosing the edge orientation reflects our lack of preference for either direction.

Fix a set S of k vertices and some vertex v ∉ S. What is the probability that v has an edge to every element of S? Relatively simple: in this case, it is just 1/2^k, so the probability that v fails to have a directed edge to each member of S is 1 − 1/2^k. We shall denote this event by v ↛ S.

For different vertices v ∉ S, the events v ↛ S are all independent, since these events are determined by the edge orientations of disjoint sets of edges, so we know in fact that

P(v ↛ S for all v ∉ S) = (1 − 1/2^k)^{n−k}.

There are \binom{n}{k}-many such possible sets S; so, by using the union bound again, we have

P(there exists S such that v ↛ S for all v ∉ S) ≤ \binom{n}{k} · (1 − 1/2^k)^{n−k}.

As before, it suffices to force the right-hand side to be less than 1, as this means that there is at least one orientation of the edges of K_n in which no such subset S exists - i.e., that there is a tournament that satisfies S_k.

This takes us into a world of approximations. Using the approximations \binom{n}{k} ≤ (en/k)^k and 1 − x ≤ e^{−x}, we calculate:

(en/k)^k · e^{−(n−k)/2^k} < 1
⇔ (en/k)^k < e^{(n−k)/2^k}
⇔ k(1 + log(n/k)) · 2^k + k < n.

Motivated by the above, take n = k² · 2^k, so that log(n/k) = log(k · 2^k); then

k(1 + log(n/k)) · 2^k + k = k(1 + log k + k log 2) · 2^k + k
= 2^k · k² · log 2 · (1 + (1 + log k)/(k log 2) + 1/(k 2^k log 2))
= k² 2^k log 2 · (1 + o(1)) < n

for all large k; so, for every n ≥ k² 2^k (and k large), we know that a tournament on n vertices with property S_k exists.
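For small k one can also locate the exact point where the union bound dips below 1, and compare it with k² 2^k log 2; the following Python sketch (ours, purely illustrative) does so:

import math

# Smallest n at which C(n, k) * (1 - 2^(-k))^(n - k) drops below 1,
# compared against the asymptotic answer k^2 * 2^k * log(2).
def union_bound(n, k):
    return math.comb(n, k) * (1 - 2.0 ** (-k)) ** (n - k)

for k in (2, 3, 4, 5):
    n = k + 1
    while union_bound(n, k) >= 1:
        n += 1
    print(k, n, round(k * k * 2 ** k * math.log(2)))

For such small k the two values are within a modest factor of each other; the asymptotic formula only becomes accurate for larger k.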
Remark: The asymptotics of this problem are still not known exactly. However, it is known (as was shown by Szekeres) that a tournament satisfying S_k needs at least c·k·2^k vertices for some absolute constant c > 0.

As we move to our next few instances of the basic method, we introduce the basic tool that gets the probabilistic method going. For a real-valued random variable X on a finite probability space, the Expectation of X (denoted E(X)) is defined as

E(X) := Σ_{x∈R} x · P(X = x).

Note that the sum is finite since the probability space is finite.

The expectation is the first important tool that one plays with in this process, and one of the reasons it is a useful and simple quantity to play with is that for random variables X, Y on the same space, E(X + Y) = E(X) + E(Y).¹ The main handle the expectation gives us is the following: If E(X) ≥ α, then with positive probability, X ≥ α. A similar statement holds for other inequalities as well.

¹ In more fanciful terms, the expectation, as an operator on the space of random variables, is a linear operator with operator norm 1.

A philosophical point before we get into applications of the idea. Why is the expectation a useful tool? Here is a heuristic: an expectation computation, in some sense, enumerates ordered pairs; the formal definition of the expectation fixes one of the parameters of the ordered pair and enumerates over the other. But one of the oldest combinatorial insights is that one might interchange the order of enumeration, and this principle allows us to reinterpret the same computation from another perspective. This has the advantage of turning what seems a 'global' computation into a sum of 'local' computations.

3.3 Sum-Free Sets of Integers


This is another gem originally due to Erdős. A set B ⊂ R is called sum-free if the sum
of any two elements in B does not lie in B.

Theorem 6. Every set of n nonzero integers contains a sum-free subset of size ≥ n/3.

Proof. For ease of notation, let us write B = {b₁, . . . , bₙ}. Firstly (and this is now a standard idea in Additive Combinatorics), we note that it is easier to work over finite groups than the integers, so we may take p large so that all arithmetic in the set B (in Z) may be assumed to be arithmetic in Z/pZ. Furthermore, if we assume that p is prime, we have the additional advantage that Z/pZ is now a field, which means we have access to the other field operations as well. Thus we pick some prime p = 3k + 2 that is (for instance) larger than twice the maximum absolute value of the elements in B, and look at B modulo p - i.e., look at B in Z/pZ. Because of our choice of p, all of the elements of B are distinct mod p.
Now, look at the sets

xB := {xb : b ∈ B} in Z/pZ,

and let

N (x) = |[k + 1, 2k + 1] ∩ xB| .

We are then looking for an element x such that N(x) is at least n/3. Why? Well, if this happens, then at least a third of xB's elements will lie between p/3 and 2p/3; take those elements, and add any two of them to each other. This yields an element between 2p/3 and p (or, upon reduction mod p, one below p/3), and thus one that is not in our original middle third; consequently, this subset, comprising over a third of xB, is sum-free. But this means that the corresponding subset of B is sum-free, because p was a prime; so we would be done.

So, the question again is: Is there a clever way of choosing an x that would optimally bring a big chunk of xB into the middle? Not really. So, let's just roll the dice - pick x uniformly at random from {1, 2, . . . , p − 1} (there is no reason to prefer one element over another), and examine the expectation of N(x):

E(N(x)) = Σ_{b∈B} E(1_{xb ∈ [k+1, 2k+1]}) = n · (k + 1)/(3k + 1) > n/3.

Thus, some value of x must make N(x) exceed n/3, and thus ensure that a sum-free subset of size at least n/3 exists.
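The proof is completely effective and translates into a tiny search. Here is a Python sketch (our own; the helper name sum_free_subset is hypothetical) that, given B, tries every dilation x and extracts the sum-free subset:

def sum_free_subset(B):
    # Pick a prime p = 3k + 2 exceeding twice the largest |b| in B.
    p = 3 * (2 * max(abs(b) for b in B) // 3 + 1) + 2
    while any(p % q == 0 for q in range(2, int(p ** 0.5) + 1)):
        p += 3                       # stays in the residue class 2 mod 3
    k = (p - 2) // 3
    best = []
    for x in range(1, p):            # try every dilation, as in the proof
        S = [b for b in B if k + 1 <= (x * b) % p <= 2 * k + 1]
        if len(S) > len(best):
            best = S
    return best

B = list(range(1, 11))
S = sum_free_subset(B)
assert 3 * len(S) >= len(B)          # the guaranteed one-third fraction
assert all(a + b not in S for a in S for b in S)
print(S)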
Remark: One can ask the same question more generally in an arbitrary abelian group, and there, the corresponding constant is 2/7 (see [2]). For the integers, it remained a hard problem to determine if the constant 1/3 could be improved, and as it turns out, 1/3 is indeed the best possible constant (see [12]).

3.4 The Distinguishing chromatic number of Levi graphs


For a graph G = (V, E), let us denote by Aut(G) its full automorphism group. A labelling of the vertices using the labels {1, . . . , r} is said to be distinguishing (or r-distinguishing) if no nontrivial automorphism of the graph preserves all of the vertex labels. The Distinguishing Chromatic Number, a variant of the usual chromatic number introduced by Collins and Trenk, is defined as the minimum number of colors r needed to color the vertices of the graph so that the coloring is both proper (adjacent vertices receive different colors) and distinguishing. In other words, the distinguishing chromatic number of a graph G (denoted χ_D(G)) is the least integer r such that the vertex set V(G) can be partitioned into sets V₁, V₂, . . . , V_r such that each V_i is independent in G, and for every π ∈ Aut(G) with π ≠ I, there exists some color class V_i such that π(V_i) ≠ V_i. It is not hard to see that this is well defined, and in recent times, this variant has attracted a lot of attention.

Here, we shall consider the following specific problem. Let F_q denote the finite field of order q, and let us denote the vector space F_q³ over F_q by V. Let P be the set of 1-dimensional subspaces of V and L the set of 2-dimensional subspaces of V. We shall refer to the members of these sets as points and lines, respectively. The Levi graph of order q, denoted by LG_q, is a bipartite graph defined as follows: V(LG_q) = P ∪ L, where this describes the partition of the vertex set; a point p is adjacent to a line ℓ if and only if p ∈ ℓ.

The choice of the terminology of 'lines' and 'points' is because the pair (P, L) is a Projective plane of order q, so every pair of points lies on a unique line, and every pair of lines has a unique common point. For more, we refer the reader to [15], for instance.

The fundamental theorem of projective geometry [15] states that the full group of automorphisms of the projective plane PG(2, F_q) is induced by the group of all non-singular semi-linear transformations PΓL(F_q³). If q = pⁿ for a prime number p, then PΓL(F_q³) ≅ PGL(F_q³) ⋊ Gal(F_q/F_p). In particular, if q is a prime, we have PΓL(F_q³) ≅ PGL(F_q³), so PΓL(F_q³) is a subgroup of the full automorphism group of LG_q. The full group is larger, since it also includes maps induced by isomorphisms of the projective plane with its dual.

Proposition 7. χ_D(LG_q) = 3 for all prime powers q ≥ 7.

Proof. First, let us see why 2 colors will not do. It is easy to see that LG_q is connected, so the only proper 2-colorings correspond to the vertex partition (P, L). But every non-trivial map A ∈ PGL(F_q³) induces an automorphism of LG_q which keeps the two color classes intact, and that establishes that χ_D(LG_q) > 2.

To get a proper distinguishing 3-coloring of LG_q, one may imagine partitioning L = L₁ ∪ L₂ and taking as the three color classes P, L₁, L₂. To see what makes this a distinguishing coloring (it is clearly proper), consider for starters an automorphism φ induced by PΓL(F_q³). If φ preserves the other two color classes, then for every line ℓ ∈ L, φ(ℓ) is in the same color class as ℓ. In particular, the orbit

Orb_φ(ℓ) := {ℓ, φ(ℓ), . . .}

is contained within the same color class. And this happens for every such automorphism.

Thus, we seek a partition of L such that for every nontrivial automorphism induced by PΓL(F_q³), this property of all its orbits lying within the same part is violated. This suggests a random partition of L, i.e., for each ℓ ∈ L, put it in L₁ or L₂ independently and uniformly. A bad event in this context would be the presence of a nontrivial automorphism that maps both these parts into themselves; in terms of the observation above, a bad event E_φ is the event that for each ℓ ∈ L, the orbit of ℓ is contained entirely in the part L_i containing it. This idea can be captured more generally as follows.

Suppose a graph G is given a vertex coloring using χ(G) colors, and suppose C₁ is one of its color classes. Let 𝒢 be the subgroup of Aut(G) consisting of all automorphisms that fix C₁ as a set. For each A ∈ 𝒢, let θ_A denote the total number of distinct orbits induced by the automorphism A in C₁. Fix an integer t ≥ 2, and partition C₁ randomly into t parts, i.e., for each v ∈ C₁, pick uniformly and independently an element of {1, 2, . . . , t} and assign v to the corresponding part.

For φ ∈ 𝒢, let E_φ denote the event that φ fixes each of the t parts. Observe that if φ fixes the part containing a vertex v, then all other vertices in Orb_φ(v) are also in the same part. Moreover, the probability that Orb_φ(v) is contained in the part containing v equals t^{1−|Orb_φ(v)|}. Then

P(E_φ) = ∏_v t^{1−|Orb_φ(v)|} = t^{θ_φ−|C₁|},

where the product runs over a set of representatives of the θ_φ orbits.

Let 𝒩 ⊂ 𝒢 denote the set of all automorphisms which fix each of the t parts of the partition, and let N = |𝒩|. Then note that

E(N) ≤ Σ_{φ∈𝒢} 1/t^{|C₁|−θ_φ}.   (3.1)

If E(N) ≤ f(𝒢) := Σ_{A∈𝒢} t^{θ_A−|C₁|} < r, where r is the least prime dividing |𝒢|, then with positive probability N < r. Since 𝒩 is in fact a subgroup of 𝒢, N divides |𝒢|, so it follows that with positive probability N = 1, which means we have a distinguishing proper coloring using χ(G) + t − 1 colors.

Define

Fix_φ(S) := {v ∈ S : φ(v) = v},   (3.2)
F_φ(S) := |Fix_φ(S)|,   (3.3)
F(S) := max_{φ∈𝒢, φ≠I} F_φ(S).   (3.4)

Since every orbit other than a fixed point has size at least 2, we have θ_φ ≤ F(C₁) + (|C₁| − F(C₁))/2 for every φ ≠ I, and hence

f(𝒢) ≤ 1 + Σ_{A∈𝒢, A≠I} t^{(F(C₁)−|C₁|)/2} ≤ 1 + |𝒢| t^{(F(C₁)−|C₁|)/2}.

Thus, if F(C₁) < |C₁| − 2 log_t |𝒢|, then f(𝒢) < 2 and there exists a distinguishing proper coloring using χ(G) + t − 1 colors.

Let us return to our setting. Set 𝒢 = PGL(F_q³). It is a simple exercise to check that every A ∈ PGL(F_q³) which is not the identity fixes at most q + 2 points of LG_q. Hence

θ_A ≤ q + 2 + ((q² + q + 1) − (q + 2))/2 = (q² + 2q + 3)/2.
Consequently,

f(𝒢) < (q⁸ − q⁶ − q⁵ + q³)/t^{(q²+1)/2} + 1,   (3.5)

where q⁸ − q⁶ − q⁵ + q³ = |PGL(F_q³)|. For q = 7, t = 2, the right-hand side of (3.5) is approximately 1.17. Since the right-hand side of (3.5) is monotonically decreasing in q, it follows that f(𝒢) < 2 for all q ≥ 7, hence χ_D(LG_q) ≤ 3.
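The right-hand side of (3.5) is a one-line computation; the following snippet (ours, for illustration) reproduces the value quoted for q = 7, t = 2:

q, t = 7, 2
group_order = q ** 8 - q ** 6 - q ** 5 + q ** 3   # |PGL(3, q)|
print(group_order / t ** ((q * q + 1) // 2) + 1)  # ~1.168, comfortably below 2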

Remark: It turns out that χ_D(LG₅) = 3 as well, and this again follows from the same argument. The only difference is that the estimate above is not strong enough, and one needs to compute f(𝒢) explicitly; in this case, for t = 2, one calculates f(𝒢) (using a computer program) to obtain f(𝒢) ≈ 1.2, which again gives χ_D(LG₅) = 3. For more results on the distinguishing chromatic number, see [7].

3.5 Colored hats and a guessing game


There are n friends standing in a circle so that everyone can see everybody else. On each person's head a randomly chosen hat - either black or white - is placed. After they have had a look at each other, they must each make a claim about their hat color, or declare that they are unable to determine the color from what they have seen. They cannot hear the answers of their friends, and cannot communicate with each other in any manner, but they may fix a strategy prior to the placement of the hats. They are awarded a grand prize if at least one person gets the color of her hat correct and no one gets her hat color wrong; if someone gets her color wrong, they are all punished.

One easy strategy achieving a 50% success rate is for one of the friends to take a random guess while the others pass ("I don't know the color of my hat"). The friends wish to devise a strategy that increases the probability of their getting the reward. The question is, do they have a better strategy? Again, the question is to be viewed as one that deals with n large but fixed.

First, let us formalize the problem. Denoting white and black by 1 and 0 respectively, any configuration of hats (on the friends' heads) is a point in {0, 1}ⁿ. Thus the ith member witnesses a vector x_i = (x_{i,1}, . . . , x_{i,i−1}, ∗, x_{i,i+1}, . . . , x_{i,n}), and these vectors are compatible in the sense that for all distinct i, j and k ≠ i, j, we have x_{i,k} = x_{j,k}.

Suppose there exists a set L ⊂ {0, 1}ⁿ such that for every element (x₁, x₂, . . . , xₙ) ∈ W = {0, 1}ⁿ \ L there is an element (y₁, y₂, . . . , yₙ) ∈ L such that the set {i | x_i ≠ y_i} has size 1; call such a set L desirable. The upshot is the rather crucial observation that a desirable set allows us to strategize as follows. Person i knows x_j for all j ≠ i, so if there is a unique value of x_i such that (x₁, x₂, . . . , xₙ) ∈ W, then person i declares that her hat color is x_i.

This allows the friends to argue as follows. If they have a desirable set on their hands, then the strategy outlined above works unless the color profile of the hats corresponds to a point of L. Consequently, the probability that the friends get the award following this strategy equals 1 − |L|/2ⁿ. Thus, to maximize this probability, they need a 'small' desirable set L. And since there is no canonical choice here, we shall pick it randomly.

22
Pick a set X by choosing each element of {0, 1}n independently. But this time, the
probability distribution is not clear. Picking each element with probability 1/2 as in
the preceding examples would result in a very large set with high probability - this will
become more formal in later chapters. So for the moment, let us pick each such element
with probability p, where p is a parameter that is to be determined later.

For a fixed x ∈ {0, 1}ⁿ, let 1_{x∈X} denote the random variable that takes value 1 if x ∈ X and 0 otherwise. Note that E(1_{x∈X}) = P(x ∈ X). By linearity of expectation,

E(|X|) = E(Σ_{x∈{0,1}ⁿ} 1_{x∈X}) = Σ_{x∈{0,1}ⁿ} E(1_{x∈X}) = Σ_{x∈{0,1}ⁿ} P(x ∈ X) = 2ⁿ p.

Let Y be the set of elements which differ from every chosen element in at least two coordinates. For a fixed x ∈ {0, 1}ⁿ, it is easy to see that x ∈ Y is equivalent to saying that no element of the 'ball'² B(x), consisting of all the elements which differ from x in at most one coordinate, is chosen into X. Since |B(x)| = n + 1 (the element x and the n elements that differ from x in exactly one coordinate), we have

E(|Y|) = Σ_{x∈{0,1}ⁿ} P(x ∈ Y) = 2ⁿ(1 − p)^{n+1}.

But now consider the set L = X ∪ Y; this is indeed a desirable set! Furthermore, E(|L|) = E(|X| + |Y|) = 2ⁿ(p + (1 − p)^{n+1}). Minimizing this over p ∈ [0, 1] (basic calculus) gives us p = 1 − 1/(n + 1)^{1/n}.

Plugging this back into the preceding expression gives us

E(|L|) = 2ⁿ · (1 − nx/(n + 1)),

where x = (n + 1)^{−1/n}. Since this expression looks cumbersome, we backtrack and work differently. Note that p + (1 − p)^{n+1} ≤ p + e^{−(n+1)p}, and we minimise the latter function. That gives us p = log(n + 1)/(n + 1), and plugging this in gives

E(|L|) ≤ 2ⁿ · (1 + log(n + 1))/(n + 1) ≤ 2ⁿ · (2 log n)/n

for all n ≥ 3 (the last inequality is a simple exercise). By the probabilistic maxim, there then exists a desirable set L whose size is at most an O(log n/n) fraction of the total size, which means that the friends can achieve a success rate of 1 − (2 log n)/n.
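To get a feel for the numbers, here is a small Monte Carlo sketch in Python (our own; the choice n = 10 is arbitrary) that draws X, closes it up to a desirable set L = X ∪ Y, and reports the success probability of the induced strategy:

import math, random

random.seed(0)
n = 10
p = math.log(n + 1) / (n + 1)
best = 0.0
for _ in range(200):
    X = {w for w in range(2 ** n) if random.random() < p}
    # Y = words whose entire ball (itself and its n one-bit flips) misses X.
    Y = {w for w in range(2 ** n)
         if w not in X and all(w ^ (1 << i) not in X for i in range(n))}
    L = X | Y                        # a desirable set by construction
    best = max(best, 1 - len(L) / 2 ** n)
print(best)                          # noticeably better than 1/2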

Remark: A concise (and low complexity) description of desirable sets of optimal size is possible for n of the form 2^k − 1, via what are known as Hamming codes. One can also prove similar results when the friends are assigned hats that may take any one of q different colors.
² This term is not a loose one: the Hamming distance d(x, y), which counts the number of coordinates where x and y differ, is indeed a legitimate metric.

3.6 The 1-2-3 theorem
The following question was first posed by Margulis: Given i.i.d. random variables X, Y distributed according to some distribution F, is there a constant C (independent of F; that is the important thing) such that

P(|X − Y| ≤ 2) ≤ C · P(|X − Y| ≤ 1)?

Note that it is far from obvious that such a C < ∞ must even exist. However, it is easy to see that such a C must be at least 3. Indeed, if X, Y are uniformly distributed on the even integers {2, 4, . . . , 2n}, then it is easy to check that P(|X − Y| ≤ 1) = 1/n and P(|X − Y| ≤ 2) = 3/n − 2/n². It was finally proved by Kozlov in the early 90s that the constant C = 3 actually works. Alon and Yuster shortly thereafter gave another proof which was simpler and had the advantage that it actually established

P(|X − Y| ≤ r) < (2r − 1) P(|X − Y| ≤ 1)

for any positive integer r ≥ 2, where 2r − 1 is also the best possible constant one can have in this inequality. We shall only show the weaker inequality, with ≤ instead of the strict inequality. We shall later mention briefly how one can improve this to the strict inequality, though we will not go over all the details.

Proof. The starting point for this investigation is one of the main tenets of Statistics: one can estimate (well enough) parametric information about a distribution from (large) finite samples of the same. In other words, if we wish to get more information about the unknown F, we could instead draw a large i.i.d. sample X₁, X₂, . . . , X_m for a suitably large m, and then the sample percentiles give information about F with high probability. This is in fact the basic premise of Non-parametric inference theory.

So, suppose we did draw such a large sample. Then a 'good' estimate for P(|X − Y| ≤ 1) would be the ratio

|{(i, j) : |X_i − X_j| ≤ 1}|/m².

A similar ratio, namely,

|{(i, j) : |X_i − X_j| ≤ r}|/m²,

should give a 'good' estimate for P(|X − Y| ≤ r). This suggests the following question.
Question 8. Suppose T = (x1 , x2 , . . . , xm ) is a sequence of (not necessarily distinct)
reals, and Tr := {(i, j) : |xi − xj | ≤ r}. Is it true that |Tr | ≤ (2r − 1)|T1 |?
If this were false for some real sequence, one could consider an F supported appropriately on the numbers in this sequence and perhaps force a contradiction to the stated theorem. Thus, it behooves us to consider the (combinatorial) question posed above.

Let us try to prove the above by induction on m. For m = 1 there is nothing to prove; in fact, for m = 1 one has strict inequality. So suppose we have (strict) inequality for m − 1 and we wish to prove the same for m.

Fix an i and let T′ = T \ {x_i}. Consider the interval I := [x_i − 1, x_i + 1], let S_I = {j : x_j ∈ I}, and let |S_I| = s. Then it is easy to see that

|T₁| = |T′₁| + (2s − 1).

Now, in order to estimate |T_r|, note that we need to estimate the number of pairs (j, i) such that |x_i − x_j| ≤ r. Suppose i was chosen such that |S_I| is maximum among all choices for x_i. Then observe that if we partition

[x_i − r, x_i + r] = [x_i − r, x_i − (r−1)) ∪ · · · ∪ [x_i − 2, x_i − 1) ∪ [x_i − 1, x_i + 1] ∪ (x_i + 1, x_i + 2] ∪ · · · ∪ (x_i + (r−1), x_i + r],

then each of the intervals in this partition contains at most s values of j such that x_j is in that interval. This follows from the maximality assumption about x_i.

In fact, a moment's thought suggests a way in which this estimate can be improved. Indeed, if we also choose x_i to be the largest among all x_k that satisfy the previous criterion, then each of the intervals (x_i + l, x_i + (l+1)] can in fact contain at most s − 1 of the x_j's. Thus it follows (by the induction hypothesis applied to T′) that

|T_r| ≤ |T′_r| + 2(r−1)s + (2s−1) + 2(r−1)(s−1) < (2r−1)|T′₁| + (2r−1)(2s−1) = (2r−1)|T₁|.

This completes the induction and answers the question above in the affirmative, with strict inequality.
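One can also confirm the lemma by brute force on random inputs; a quick randomized test in Python (our own sketch):

import random

# Randomized check of Question 8: |T_r| < (2r - 1) * |T_1|, counting
# ordered pairs, for random finite sequences of reals.
random.seed(1)
for _ in range(1000):
    m, r = random.randint(1, 12), random.randint(2, 4)
    xs = [random.uniform(0, 10) for _ in range(m)]
    T1 = sum(abs(a - b) <= 1 for a in xs for b in xs)
    Tr = sum(abs(a - b) <= r for a in xs for b in xs)
    assert Tr < (2 * r - 1) * T1
print("strict inequality held in every trial")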
Now, we are almost through. Suppose we do sample i.i.d. observations X₁, X₂, . . . , X_m from the distribution F, and define the random variables T₁ := |{(i, j) : |X_i − X_j| ≤ 1}| and T_r := |{(i, j) : |X_i − X_j| ≤ r}|. Then note that

E(T₁) = Σ_{i≠j} P(|X_i − X_j| ≤ 1) + m = (m² − m)p₁ + m,

where p₁ = P(|X_i − X_j| ≤ 1). Similarly, we have

E(T_r) = (m² − m)p_r + m,

with p_r = P(|X_i − X_j| ≤ r). By the inequality

T_r < (2r − 1)T₁

we have

(m² − m)p_r + m = E(T_r) < (2r − 1)E(T₁) = (2r − 1)((m² − m)p₁ + m).

This simplifies to p_r < (2r − 1)p₁ + (2r − 2)/(m − 1). As m → ∞, the desired result follows.

As mentioned at the beginning, Alon and Yuster in fact obtain strict inequality. We shall briefly describe how they go about achieving that. They first prove that if p_r = (2r − 1)p₁, then, defining p_r(a) = P(|X − a| ≤ r), there exists some a ∈ R such that p_r(a) > (2r − 1)p₁(a). Once this is achieved, one can tweak the distribution F as follows.

Let X be a random variable that draws from the distribution F with probability 1 − α and picks the number a (the one satisfying the inequality p_r(a) > (2r − 1)p₁(a)) with probability α, for a suitable α. Let us call this distribution G. Then, from what we just proved above, it follows that p_r^{(G)} ≤ (2r − 1)p₁^{(G)}. Here p_r^{(G)} denotes the probability P(|X − Y| ≤ r) when X, Y are picked i.i.d. from the distribution G instead. However, if we calculate these terms, we see that p_r^{(G)} = p_r(1 − α)² + 2α(1 − α)p_r(a) + α², so the above inequality reads

p_r(1 − α)² + 2α(1 − α)p_r(a) + α² ≤ (2r − 1)(p₁(1 − α)² + 2α(1 − α)p₁(a) + α²),

which holds if and only if

α ≥ β/(r − 1 + β), where β = p_r(a) − (2r − 1)p₁(a) > 0.

But since choosing α is our prerogative, picking α smaller than this bound yields a contradiction and completes the proof.

As we complete this chapter, we leave the reader with an important caveat. The power of the probabilistic method becomes more evident only when one runs out of ideas, so to speak. Where one has a deterministic argument that seems to work, the probabilistic method is not to be unsheathed, as it will be suboptimal. To illustrate this point, let [n] := {1, 2, . . . , n}, and suppose we wish to show that for n ≤ m there is an injective function from [n] to [m]. We pick a random function φ which maps each x ∈ [n] to a uniformly random member of [m], independently for each x ∈ [n]. For a fixed pair x, y ∈ [n], the probability that x and y are mapped to the same element of [m] by φ is 1/m. Hence the union bound tells us that if \binom{n}{2}/m < 1, then with positive probability the random function φ is injective. In other words, the methods of this chapter only testify to the existence of an injective function for m ≥ cn², and the suboptimality of this conclusion is evident to all. One may ask if this suboptimality is due to the union bound, or if the reason is something subtler. We shall return to this point in a later chapter.

4 Managing Expectations and the Markov bound

If the expected value of a non-negative random variable is small, then the random variable is not very likely to take large values.

As we saw in the latter half of the preceding chapter, the expectation of a random variable is one of the most useful tools within the Probabilistic paradigm. One of the reasons the expectation appears a very natural tool is that most combinatorially relevant functions can be regarded as random variables that tend to become robust over a larger population, so the expected value gives an idea of where a 'typical' observation of the random variable lies, and that is often a very useful start. For instance, suppose our random variable in question counts the number of undesirable events. Having its expected value less than one is good for our cause. But even if that is not the case, a low value of its expectation has useful consequences - it instantiates the existence of a configuration with very few undesirable events.

There is another important feature that the expectation computation provides. While the expectation tells us that the random variable in question can take small/large values relative to its expected value, it does not tell us how likely such an outcome might be. Here is a concrete instantiation of the same. Consider a random variable X that takes the value n² with probability 1/n and 0 with probability 1 − 1/n (for n large). The expected value is n, and yet the random variable itself is non-zero with very low probability. Of course, this example also illustrates that one needs a relative perspective on what large/small ought to mean. But if the expected value of a non-negative random variable is small, then the random variable is not very likely to take large values.

In this chapter, we shall expound upon these two principles.

4.1 Revisiting the Ramsey Number R(n, n)
Let us revisit the problem of the lower bounds for R(n, n). As usual, color the edges of the complete graph K_n red or blue with equal probability, and independently for distinct edges. Then the expected number of monochrome copies of K_k is m := \binom{n}{k} 2^{−\binom{k}{2}+1}. Thus there is a coloring of the edges in which there are at most m monochrome copies of K_k. Now, from each such monochrome copy, delete a vertex; then the resulting graph on at least n − m vertices has no monochrome K_k! Thus we get R(k, k) > n − \binom{n}{k} 2^{−\binom{k}{2}+1}.
Now, to see if this improves upon our earlier bound, we need to do some calculus. If m = n/2, then we get R(k, k) > n/2. Some routine asymptotics (see the chapter on asymptotics for the details) give us

R(k, k) > (1 + o(1)) (k/e) 2^{k/2}

for large k.

4.2 An approximate form of Caratheodory’s theorem


One of the most powerful consequences of the probabilistic paradigm is that it yields 'approximate' versions of classical statements, which in turn allow for more efficient (albeit randomized) algorithms, and here we present one such instance.

The Caratheodory theorem is a well known and fundamental theorem in discrete geometry. It states: Every point in the convex hull of a set T ⊂ Rⁿ can be written as a convex combination¹ of at most n + 1 points of T, and the number n + 1 is best possible. Now, consider an approximate version of this statement: Suppose we wish to approximate a point in the convex hull of T. Do we still need close to n points to be able to make the approximation? More precisely, suppose x is a point in the convex hull. How small a number t of points of T is needed so that some convex combination of those points is close to x?

Theorem 9. Suppose T is a subset of Rⁿ of bounded diameter² d, suppose ε > 0, and let x be a point in the convex hull of T. Then one can find points x₁, . . . , x_t ∈ T with t ≤ ⌈d²/ε²⌉ such that some convex combination y of the x_i satisfies

‖x − y‖ ≤ ε,

where the norm is the usual L²-norm.

The interesting feature of this theorem is that the ambient dimension n does not feature at all! The only thing that matters is how good an approximation we are hoping to find. This is again a feature that appears on more than one occasion when one encounters randomized methods.

¹ A convex combination of x₁, . . . , x_t is a sum of the form λ₁x₁ + · · · + λ_t x_t with λ_i ≥ 0 for all i and Σ_i λ_i = 1.
² The diameter of a set A is defined as the supremum of the distances between pairs of points of A.
Proof. Without loss of generality, by translating and rescaling T, we assume that the radius of T is at most 1, i.e., for all x ∈ T we assume that ‖x‖ ≤ 1. Let x be a point in the convex hull of T. By Caratheodory's theorem, there exist x₁, . . . , x_m with m ≤ n + 1 such that x = Σ_i λ_i x_i, where λ_i > 0 and Σ_i λ_i = 1. The λ_i summing to one suggests a natural random variable. Let y be the (vector-valued) random variable with P(y = x_i) = λ_i for i = 1, . . . , m. Then E(y) = Σ_i λ_i x_i = x. Here the expectation is taken coordinate-wise.

Picking up on the general principle that an average of i.i.d. (independent and identically distributed) random variables converges to its expected value, it is somewhat natural to consider y₁, . . . , y_k independent and distributed as y above, and look at the average z := (1/k)(y₁ + · · · + y_k). Clearly, E(z) = x. To see how well this fares, let us compute (or estimate) E(‖z − x‖²). For the random variable y,
E(‖y − x‖²) = E(‖y‖² + ‖x‖² − 2⟨y, x⟩)
= E(‖y‖²) − ‖x‖²
= Σ_{i=1}^{m} λ_i ‖x_i‖² − ‖x‖²
≤ 1.
Set z_i = y_i − x, so that z − x = (1/k) Σ_{i=1}^{k} z_i. Then

E‖z − x‖² = E‖Σ_i z_i‖²/k² = (1/k²)(Σ_{i=1}^{k} E‖z_i‖² + 2 Σ_{i<j} E⟨z_i, z_j⟩) ≤ 1/k,

where the last inequality comes from the preceding calculation and the fact that z_i, z_j are independent with mean zero, which consequently gives E⟨z_i, z_j⟩ = 0 for all i < j. In particular, this computation tells us that there exist k vectors y_i for which ‖z − x‖² ≤ 1/k, with z as above. The rest is a routine consequence.
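The dimension-free nature of the bound is easy to observe empirically; here is a short Python sketch (our own illustration) of the sampling argument:

import random

random.seed(0)
n, m = 50, 30                        # ambient dimension, size of the support
pts = [[random.gauss(0, 1) for _ in range(n)] for _ in range(m)]
pts = [[c / sum(v * v for v in q) ** 0.5 for c in q] for q in pts]  # ||x_i|| = 1
lam = [1.0 / m] * m                  # convex weights
x = [sum(lam[i] * pts[i][j] for i in range(m)) for j in range(n)]
for k in (10, 100, 1000):
    err = 0.0
    for _ in range(100):
        draws = random.choices(pts, weights=lam, k=k)
        z = [sum(d[j] for d in draws) / k for j in range(n)]
        err += sum((z[j] - x[j]) ** 2 for j in range(n)) / 100
    print(k, err)                    # the mean squared error decays like 1/k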
Remark: The theme of approximations is closely tied with the probabilistic paradigm,
and we will encounter the motif several times in the book.

4.3 Graphs with many edges and high girth


The girth of a graph G is the size of its smallest cycle (should a cycle exist); if the graph is acyclic, then its girth is infinite. It is both intuitively and mathematically clear that as the number of edges in a graph increases (in proportion to the total possible number of edges), its girth can go down dramatically. So, for a fixed parameter k, the following extremal problem is both natural and of great interest to the extremal combinatorist: What is the maximum possible number of edges in a graph on n vertices with girth at least k?

The following simple argument gives an upper bound. Suppose the graph G has minimum degree at least d. Set ℓ = ⌊(k − 2)/2⌋. If the girth of G is at least k, then for any vertex v, the subgraph induced on the ℓ-fold neighborhood centered at v, i.e., the set of vertices at a distance of at most ℓ from v, is a tree. This follows since, if this were not a tree, there would be a cycle contained in this subgraph; however, since any vertex w of it is at a distance of at most ℓ from v, the size of such a cycle would be at most 2ℓ + 1 < k, contrary to the assumption. Since each vertex has degree at least d, this induced subgraph has at least

1 + d + d(d − 1) + · · · + d(d − 1)^{ℓ−1} = 1 + d((d − 1)^ℓ − 1)/(d − 2)

vertices.

Now for a given graph G with average degree d, note that if we remove a vertex of degree at most t (for some t - we shall see what t to set), then the modified graph has average degree at least (nd − 2t)/(n − 1). If we set t = d/2, then this last expression is at least d. In other words, if we delete a vertex of degree at most d/2, then the average degree of the graph does not decrease. Consequently, this process must eventually terminate, and when it does, every vertex has degree at least d/2. It is a simple exercise to show that if the average degree of a graph is at least Cn^{2/(k−2)} for a suitably large C, then G must have a cycle of size at most k − 1. In other words, the maximum number of edges in a graph with girth at least k is O(n^{1+2/(k−2)}).

A lower bound was established by Erdős.

Theorem 10. For a given integer k ≥ 4 and n sufficiently large, there exist graphs on n vertices with girth at least k and Ω(n^{1+1/(k−2)}) edges.

Proof. To establish a lower bound, one needs to construct a graph with girth at least k and as many edges as possible. As in the Ramsey problem, the small cases of k (k = 3, 4 for starters) are well studied; indeed, one knows the best possible bounds in these cases. But in the general case, the specificity of the examples in the smaller cases makes it harder to generalize. And so Erdős did what came to him naturally; he picked the graph at random.

Let us construct a random graph in which each edge is chosen independently with probability p, where p (as in a previous example) will be determined later. The 'bad instances' here are occurrences of small cycles. Indeed, for 3 ≤ t ≤ k − 1, let N_t denote the number of t-cycles in G. Then

E(N_t) = (n(n − 1) · · · (n − t + 1)/(2t)) p^t < (np)^t/(2t),

since every cyclic sequence of t vertices counts a particular t-cycle exactly 2t times - the first vertex is picked in one of t ways, and the orientation in one of two possible ways. This gives

E(Σ_{3≤t≤k−1} N_t) ≤ ((np)³ + · · · + (np)^{k−1})/6 ≤ (np)^{k−1}/3.

On the other hand, the expected number of edges of G is E(e(G)) = \binom{n}{2} p ≥ pn²/3 for n large enough.

The key insight again of Erdős was: If the number of small cycles is not that large - say, at most half the total number of edges - then one may throw away one edge from each of the small cycles, thus eliminating all small cycles and yet retaining at least half the total number of edges. This suggests that if

(np)^{k−1} ≤ n²p/2,

then by linearity of expectation³

E(e(G) − Σ_{t<k} N_t) ≥ E(e(G))/2,

so that there is an instance with this inequality holding. That gives us a lower bound on the number of edges of a graph with girth at least k.

The computation now is straightforward. We leave it to the reader to see that the
c
inequality we have forced gives us p = n(k−3)/(k−2) (for a small constant c > 0) which in
1
turn gives us a bound of e(G) = Ω(n1+ k−2 ).

Remark: As it turns out, the random construction here is not best possible, and the
considered opinion of the experts in extremal combinatorics, is that the exponent that
appears in the upper bound is the truth. However that remains an open problem. The
best known bound is a remarkable algebraic construction by Lazebnik, Ustimenko and
4
Woldar [19] which gives a lower bound of Ω(n1+ 3(k−2)−+ε ) which, in terms of the exponent
of n is ‘halfway’ between the randomized construction and the simple upper bound. This
again reinforces the caveat: If one can, one should strive for non-random constructions.
3
Again, the linearity is key here.

31
4.4 List Chromatic Number and minimum degree
The list chromatic number is a notion introduced by Erdős, Rubin and Taylor in their
seminal paper that sought to address what was called the ‘Dinitz problem’. This variant
of the usual chromatic number goes as follows. For a graph G, let L = {Lv |v ∈ V (G)}
be a collection of subsets of some set C indexed by the vertices of G. These are to be
interpreted as lists of colors assigned to each vertex. An L-coloring of G is an assignment
of an element χ(v) ∈ Lv for each v ∈ V such that if u and v are adjacent vertices in
G, then χ(u) 6= χ(v). In the parlance of colorings, this is a choice of color assignments
to each vertex such that no two adjacent vertices are assigned the same color. The list
chromatic number of G, denoted χl (G), is the smallest k such that for any family L with
|Lv | ≥ k for all v, G is L-colorable. It is not hard to see that this is well defined, and in
fact, the usual chromatic number of G corresponds to the case where all the vertex lists
are identical. The next result shows that the reverse inequality need not hold.

The list chromatic number is a very interesting invariant for a host of reasons. One
natural way to motivate this notion is the following. Suppose we attempt to properly
color the vertices of a graph using colors, say, 1, . . . , k and to suppose we are given a
partial coloring. For each uncolored vertex v, let Dv denote the set of colors that appear
among any of its neighbors from the partial coloring. Then the partial coloring extends
to a proper k coloring of the graph if and only if the induced graph on the remaining
uncolored vertices is L-colorable where Lv := [k] \ Dv . So in that sense, list colorings
arise quite naturally in connection with proper colorings.

As was observed in the seminal paper of Erdős, Rubin, and Taylor, there are bipartite
graphs with arbitrarily large list chromatic number.

2k−1

Theorem 11. (Erdős, Rubin, Taylor) χl (Kn,n ) > k if n ≥ k
.

Proof. We wish to show there is some L = {Lv |v ∈ V (G)} with |Lv | = k for each
v ∈ V (G) such that Kn,n is not L-colorable. Let A and B denote the two partition
classes of Kn,n , i.e., the two sets of vertices determined by the natural division of the
complete bipartite graph Kn,n into two independent subgraphs.
Now we construct L. Take the set of all colors from which we can construct Lv ’s
to be {1, 2, ..., 2k − 1}. Since n ≥ 2k−1 k
, which is the number of possible k-subsets
of {1, 2, ..., 2k − 1}, we can choose our Lv ’s for the v’s in B so that each k-subset of
{1, 2, ..., 2k − 1} is Lv for some v ∈ B, and similarly we choose lists for vertices of A.
If S is the set of all colors that appear in some Lv with v ∈ B, then S intersects every
k-element subset of {1, 2, ..., 2k − 1}. Then we must have that |S| ≥ k (since otherwise
its complement has size ≥ k and thus contains a subset of size k disjoint from S). But
then since |S| ≥ k, by choice of lists there exists a ∈ A with La ⊂ S. Since a is adjacent
to every vertex in B, so no L-coloring is possible.

32
Another interesting feature of the list chromatic number is the following result due to
Alon.

Theorem 12. (Alon) Suppose d denotes the minimum degree of G. Then


 
log d
χl (G) = Ω .
log log d

This is quite at variance with the usual chromatic number since one has bipartite
graphs with arbitrarily large minimum degree.

Proof. If the result holds, then it also holds in the case the chromatic number is two
(that is the first nontrivial case), so let us first assume that the graph is bipartite with
partition classes A and B, and |A| ≥ |B|.

We shall assume that the minimum degree is sufficiently large (for asymptotic rea-
sons). In order to establish a lower bound of the form χl (G) > s, we need to assign lists
of size s to each vertex and ensure that from these lists, a list coloring is not possible.
Suppose C is a set of colors from which we shall allocate lists to each of the vertices of G.
Without loss of generality, let C := {1, 2, ...., L} for some L to be fixed/determined later.

How do we show that a vertex a ∈ A cannot be colored from its list? Let us take a
cue from the previous result: Suppose a vertex a ∈ A has, among the lists assigned to
its neighbors in B, all the possible s-subsets of C. Consider a choice of colors assigned
to the vertices of B from their respective lists, and let W be the set of colors that are
witnessed by this choice. Note that since the neighbors of a witness all possible s-subsets
of C, W ∩ S 6= for all S ⊂ C of size s, so that in particular, |W | ≥ L − s + 1. If this choice
extends successfully to a choice for a, then La must contain an element from a very small
set, viz., C \ W , which has at most s − 1 colors of C. Now, if there are several such vertices
a ∈ A (i.e., that witness every s-subset as the list of one of its neighbors) then this same
criterion must be met by each of these vertices. And that is not very likely to happen if
we were to allot random lists to the vertices of A! This potentially sets up a contradiction.

Let us set this in motion. Call a vertex a ∈ A critical if among its neighbors, all
possible s-subsets of C appear. To achieve this, assign for each b ∈ B, the set Lb to be
an s-subset of C uniformly at random and independently over different vertices. Then
the probability that a is not critical is equal to the probability that there exists some
s-subset T of C such that no neighbor of a is assigned T as its list. Since there are Ls
possible T ’s it follows by the union bound that

  !d  
L 1 L −d/(Ls)
P (a is not critical) ≤ 1 − L ≤ e .
s s
s

33
Now assume that d  Ls . Then by the above, P (a is not critical) < 12 . So if N


denotes the number of critical vertices of A,


X |A|
E(N ) = P(a is critical) > .
a∈A
2

Thus there exists an assignment of lists for vertices in B, {Lv |v ∈ B}, such that the num-
ber of critical vertices is greater than |A|
2
. Fix these choices for the lists for the vertices
of B.

Fix a color palette w from these assigned lists, i.e., a choice of an element each from
the collection {Lv |v ∈ B}. Denote as W = W (w) the set of colors that appear among
the vertices on B from the palette w.

Since there exists critical a ∈ A, W has nonempty intersection with all s-subsets of
[L], so |W | ≥ L − s + 1. If an extension of w to a coloring to a exists for a critical vertex
a, then as we observed earlier, exists, La ∩ W 6= ∅.

Since we haven’t yet dealt with the color lists for A, let us pick color lists for the
vertices of A uniformly at random from the s-subsets of C. Then for a critical a ∈ A

(s − 1) L−1

s−1 s2
P (w extends to a exists) ≤ L
 < .
s
L

For an extension of w to G to exist, we need an extension of w to all critical vertices


of A. Since there are s|B| possible w’s and the number of critical vertices is greater than
|A|
2
, we have(since the color lists for the vertices of a are picked independently)

|A|/2  12 !|B|
s2 s2
 
P (an extension to a coloring of G exists) ≤ s|B| ≤ s
L L
q
s2
which is less than 1 if s < 1, or equivalently, if L > s4 , so set L = 2s4 .. Recall the
L
L
assumption made earlier that d  Ls . We needed this to make Ls e−d/( s ) < 21 , which is
 

equivalent to d > Ls log(2 Ls ).


 

In summary, if  4   4 
2s 2s
d≥4 log 2 ,
s s
then there is a collection of lists L = {Lv |v ∈ G} with |{Lv }| = s for all v ∈ G such that
no L-coloring of G exists, i.e., χl (G) > s. Again, arriving at the lower bound as stated
in the theorem is a good exercise in asymptotics. For the precise details, see the first

34
chapter on asymptotics.

This is all under the assumption that G was bipartite. But it is a very simple fact
that every graph contains a ‘large’ bipartite subgraph:
Lemma 13. For any graph G, there exists a subgraph H of G with V (H) = V (G) such
that H is bipartite and dH (v) ≥ 21 dG (v) for all v ∈ V (G).
To see why, consider a partition of the vertex set into two parts so that the number of
crossing edges (across the partition) is maximized. It is now a straightforward observation
to see that for every vertex at least half of its neighbors must be in the other part since
otherwise we can move that vertex to the other part. This partition in particular produces
a bipartite subgraph H as stated. This completes the proof of the lemma and the theorem
as well.

Alon later improved his bound to χl (G) > ( 12 − o(1)) log d with d = δ(G). We shall
show a proof of a slightly weaker form of this result (χl (G) ≥ c log d for some constant
c > 0 in a later chapter by a different probabilistic paradigm. Alon also conjectured that
χl (G) ≤ O(∆(G)) where ∆(G) denotes the maximum degree of G. That remains an open
problem.

We now move on to the next principle outlined in the introduction of this chapter,
and that is this fairly easy inequality.

Theorem 14. (The Markov Inequality) IF X is a non-negative value random variable


then
E(X)
P(X ≥ a) ≤ .
a
As seen in the example earlier, the expectation of a random variable being ‘large’ does
not guarantee that its takes large values with high probability. But if the random variable
is bounded, then it must take ‘somewhat large’ values with ‘ not-too-little’ probability:

Proposition 15. Suppose X is a non-negative values random variable and suppose X ≤


M with probability one. Then for a < E(X),

E(X) − a
P(X ≥ a) ≤ .
M −a
The idea of proof is straightforward. Write
Z Z
E(X) = X dP + X dP
x<a x≥a
≤ aP(X ≤ a) + M (1 − P(X ≥ a))

and now the conclusion follows by a straightforward computation.

35
4.5 The Varnavides’ averaging argument
An old conjecture due to Erdős and Turán was the following: Given ε > 0 and an in-
teger k ≥ 3, there exists N0 := N0 (ε, k) such that the following holds: If N ≥ N0 and
A ⊂ {1, . . . , N } with |A| ≥ εN then A contains a k-term arithmetic progression (k-AP
for short). This conjecture came on the heels of the theorem due to van der Waerden
that states that given positive integers, r, k there exists an integer W (r, k) such that if
N ≥ W , then any r-coloring of the integers in [N ] := {1, . . . , N } necessarily contains a
monochromatic k-AP. The Erdős-Turán conjecture basically captures the intuitive idea
that van der Waerden’s theorem holds because it does so on the most popular color class.
This conjecture was settled in the affirmative for k = 3 by Roth, and then later in its full
generality by Szemerédi.

What we are after (following Varnavides) is a generalization of this result. The state-
ment we aim to prove is that, if N is sufficiently large, and |A| ≥ εN then A in fact
contains as many k-APs as there can be (upto a constant multiple):

Theorem 16. (Varnavides) Given ε > 0 and k ≥ 3, there exists δ := δ(ε, k) such that
the following holds. Any subset A ⊂ [N ] of size at least εN contains at least δN 2 APs of
length k.

Note that the total number of possible k term APs in [N ] is determined by specifying
the first term, and the common difference, so there are at most N 2 k-APs in all. Thus,
this theorem is best possible up to the constant.

Proof. By Szemerédi’s result, we know that there is an N0 := N0 (ε, k) such that any
subset A ⊂ [N0 ] of size at least εN0 contains at least one k-AP. The simplest thing one
can imagine doing is, cutting the set [N ] into linear chunks of length N0 ; clearly at least
one of these chunks must meet A in at least an ε proportion of its size. That unfortu-
nately does not give us anything new. But one thing is does suggest is, the number of such
chunks that have a reasonable proportion of A in them will each give us one distinct k-AP.

But this breaking into chunks is a little too wasteful. Every AP of size N0 is again a
model for the set [N0 ] so one might want to consider all possible APs of length N0 and
see how many among them meet A in a significantly large portion. Of course, the same
k-AP might be a part of several such N0 -APs, so there is a double counting issue to sort
out anyway. But that immediately suggests the following.

Pick an N0 -AP at random, i.e., pick x0 , d uniformly and independently from [N ] with
d 6= 0 and let K := K(x0 , d) = {x0 , x0 + d, . . . , x0 + (N0 − 1)d}. One problem this imme-
diately poses is that not all possible such pairs give rise to K ⊂ [N ]. To overcome this
nuisance, let us work in Z/N Z. The relevant random variable now is |A∩K|. To compute
the expectation using the linearity of expectation, we need to compute P(a ∈ K) for an
arbitrary a ∈ [N ]. To count the number of pairs (x0 , d) such that K(x0 , d) contains a,

36
observe that if a = x0 is the first term, then all possible choices for d count such valid
K, otherwise, there are N − 1 choices for x0 , and for each of those, if a is the ith term of
K(x0 , d) (with N0 − 1 choices for i) this determines d uniquely, provided we can solve the
corresponding linear equation a = x0 + id for d. Again, this makes things messy, but as
we have observed earlier, this works provided N is prime.

So, let us start again. Instead of working in Z/N Z, pick a prime p ∈ (2N, 4N ) and
let us then work in Z/pZ. We choose p > 2N since that ensures that addition of two
elements in {1, . . . , N } is the same as addition in Z/pZ. We also don’t want p to be too
large because we need to estimate p in terms of N . Now pick x0 , d uniformly from Z/pZ
as outlined before, and let K := K(x0 , d). Then
X |A| ε
E(|K ∩ A| = P(a ∈ K) = N0 ≥ N0
a∈A
p 4

by the assumption on the size of A. We would ideally like it to be the case that |K ∩ A|
takes somewhat large values with not-too-small probability, and by Proposition 15
 ε  ε
P |K ∩ A| ≥ N0 >
8 8
since |K ∩ A| ≤ N0 . This suggests the following: Let N0 = N0 (ε/8, k) from Szemerédi’s
theorem. Then it follows by everything seen above that with probability at least ε/8,
K ∩ A contains a k-AP. But on the other hand, if A is the set of all k-APS contained in
A, then
X
P(K ∩ A contains a member of A) ≤ P(P ⊂ K)
P ∈A
N0 (N0 − 1)
≤ |A|
N (N − 1)
since to determine if K(x0 , d) contains P , we have at most N0 (N0 − 1) choices for deter-
mining the first and second elements of P in K. Rearranging terms, this gives
ε
|A| ≥ N2
16N0 (N0 − 1)
and that completes the proof.
Remark: The constants N0 (ε, k) that Szemerédi’s proof offer are extremely large and
are of a tower type.

4.6 A conjecture of Daykin-Erdős and its resolution


Suppose H ⊂ P([n]) is a hypergraph. One can construct a graph GH with vertex set
E(H), and E 6= F are adjacent in GH iff E ∩ F = ∅.

37
Why would one define this particular graph? This is actually one of the most well-
studied instances of a graph arising naturally from hypergraphs. The Kneser graphs are
instances of this when the underlying hypergraph is the complete uniform hypergraph of
order r, i.e., V (H) = [n] and E(H) = [n]

r
for some r < n/2. One of the other motiva-
tions for studying this graph comes from the problem of explicit constructions for Ramsey
graphs: We say that a graph on n vertices is k-Ramsey (if the k is implicitly clear, we
simply say Ramsey graph) if it neither contains an independent set4 nor a clique5 of order
k. Note that if the edges of a k-Ramsey G are colored red, and the remaining edges of
Kn are colored blue then by the definition of G being k-Ramsey, this coloring does not
contain a monochromatic Kk .

As seen earlier, the first explicit construction of a k-Ramsey graph due to Nágy was
by considering the graph GH where H was the complete uniform hypergraph of order 3.
As seen before, Erdős proved that R(k, k) > Ω(k2k/2 ) but his proof did not provide an
explicit deterministic construction. This also suggests the following question: Suppose
1
e(H) = 2( 2 +δ)n for some δ > 0. Is there a hypergraph H on n vertices such that the
graph GH is n-Ramsey? If true, this would in one stroke improve upon the probabilistic
lower bound, and also provide an explicit construction for n-Ramsey graphs.

While this would indeed be nice, it seems like asking for too much. And to see why
that would be the case, suppose G is a Ramsey graph on n vertices, i.e., suppose G
contains neither an independent set, nor a clique of order 2 log2 n (Why?!). We claim
that e(Gn ) must be rather large. Indeed, a celebrated theorem in extremal graph theory
due to Turán (in an alternate formulation) states that a graph on n vertices admits an
n
independent subset of size at least d+1 , where d is the average degree of the vertices of
the graph. If the Ramsey graph Gn satisfies e(Gn ) ≤ cn2−δ for some constants c, δ > 0
n
then by Turán’s theorem, 6 α(Gn ) ≥ d+1 = Ω(nδ ), so such graphs could not be Ramsey.
1
Since examples of Ramsey graphs of size 2( 2 +δ)n seemed difficult to contruct (they
still are!) this line of argument possibly convinced Daykin and Erdős to conjecture the
following:
1
Conjecture 17 (Daykin-Erdős). If |H| = m = 2( 2 +δ)n , then

d(H) := #{{E, F } ∈ H | E ∩ F = ∅} = o(m2 ).

Note that if m = 2n/2 then in fact there do exist hypergraphs H for which the graph
GH are dense (though not Ramsey graphs). For instance, take the set [n] and partition
it into two sets A, B of size n/2 each, and consider H to consist of all subsets of A along
with all subsets of B. Since A, B are disjoint, GH has all edges of the type (E, F ) where
4
A subset S of vertices is called independent if no two vertices of S are adjacent.
5
A clique in a graph is a set of pairwise adjacent vertices.
6
α(G) denotes the size of a largest independent subset of vertices in G,

38
A ⊂ A, F ⊂ F . The conjecture of Daykin and Erdős says that this cannot be improved
upon if the exponent were strictly greater than 1/2.

The Daykin-Erdős conjecture was settled by Alon and Füredi in 1985.

Theorem 18. (Alon-Füredi) Suppose 0 < δ < 1 is fixed, and suppose n is sufficiently
2
large. If |H| = m = 2(1/2+δ)n then d(H) < cm2−δ /2 for some positive constant c.

Proof. Let us see how we could go about this. If the graph GH is dense, one should
expect to find two large disjoint subsets S, T of V (GH ) which constitute a dense pair,
i.e., one ought to expect to see lots of edges between S and T . If this pair witnesses all
possible edges, one has ! !
[ \ [
S T = ∅.
S∈S T ∈T
[
So if the pair S, T constitute a dense pair, it would not be a stretch to expect that S
[ S∈S
and T are almost disjoint from each other. But if these sets are also ‘large’ subsets
T ∈T
of [n], this appears unlikely and would probably give a contradiction.

To see if we can pull


[this off, let us try something simpler first: Suppose there exists an
S such that A(S) := S is large, and further, that the number of sets E ∈ H satisfying
S∈S
E ∩ A(S) = ∅ is also large. For the sake of simplicity, suppose |A(S)| > n2 . Then

#{E ⊂ H | E ∩ A(S) = ∅} < 2n/2

so if there exists S such that


[
n
• A(S) = S satisfies |A(S)| > 2
(which is likely), and
S∈S

• #{E ∈ H | E ∩ A(S) = ∅} ≥ 2n/2 ,

then we have a contradiction.

Let us begin formally now. We seek a collection S with A(S) being very large. Since
we are bereft of specific choices, pick S1 , S2 , · · · , St ∈ H uniformly and independently
for some t that shall be determined later. If A(S) has at most n/2 elements, then there
exists T ⊂ [n] such that |T | = n/2 and each Si ⊂ T . Fix such a choice for T .

n
#{E ∈ H | E ⊂ T } #{E ∈ H | E ⊂ T } 22 1
P(S1 ⊂ T ) = = 1 ≤ 1 = .
|H| 2( 2 +δ)n 2( 2 +δ)n 2δn

39
Therefore by the union bound,
  t
2n

n 1 1
P(|A(S)| ≤ n/2) ≤ δn
< tδn
= (tδ−1)n .
n/2 2 2 2
Thus, to ensure that this is a low probability event, we need tδ − 1 > 0, or equivalently,
t > 1δ .

n/2
X part, we want X := #{E ∈ H | E ∩ A(S) = ∅} to be at least 2 .
For the second
Writing X = 1{E∩A(S)=∅} we have
E∈H
X
E(X) = P(E ∩ A(S) = ∅).
E∈H

Fix E ∈ H.
 t
d(E)
P(E ∩ A(S) = ∅) = P(E ∩ Si = ∅ for all i = 1, . . . , t) =
m
where d(E) is the degree of E in GH . Denoting e(GH ) = M , we have
X X  d(E) t
E(X) = P(E ∩ A(S) = ∅) =
E∈H E∈H
m
!
1 1 X
= t−1 (d(E))t
m m E∈H
2t M t
≥ .
m2t−1
E[X]
By proposition 15 we have (setting a = 2M
)
 
1 2t M t−1
1 E[X] m2t−1
− 2
 
M
E[X] 2M
P X ≥ aM ≥ E[X]
=  
1− 2M 1− 2t−1 M t−1
m2t−1

which gives us
n 1
P(|A(S)| > ) ≥ 1 − (tδ−1)n .
2 2
If
2t M t−1
m2t−1 1
t−1 M t−1 > ,
1 − 2 m2t−1 2(tδ−1)n
then both events as outlined in the sketch happen simultaneously and our contradiction
is achieved. Choose t = 2δ . If M = cm2 , then this forced inequality is feasible for a
suitable t that depends only on δ and c. To determine the upper bound for M as in the
statement of the theorem is a straightforward exercise.

40
4.7 Graphs with high girth and large chromatic number
While it is easy to ensure that a graph constructed has a high chromatic number (make
a clique of that size as a subgraph), it became a considerably harder task of ensuring
that the same holds if we forbid large cliques. The first such question that arose was the
following:

Question 19. Do there exist graphs with chromatic number k (for any given k) and
which are also triangle free?

This was settled with the ‘Mycielski construction’ in the affirmative. This led to
the next natural question: What if we also forbid 4 cycles? Tutte produced a sequence
of graphs with girth 6 and arbitrarily large chromatic number, but the bigger question
loomed large: Do there exist graphs with arbitrarily large chromatic number and also
arbitrarily large girth? It took the ingenuity of Erdős to settle this in the affirmative.

To see why this is a little surprising, note that insisting on large girth g, simply implies
that for each vertex v, the induced subgraph on the set of vertices at a distance at most
g/2 is a tree, which can be 2-colored. Yet, it is indeed conceivable that the chromatic
number of the entire graph varies vastly from the chromatic number of small induced
subgraphs.

This again fits the general template we have discussed. We need a graph G in which
locally small induced subgraphs are trees, and yet, the graph itself has large chromatic
number. A random graph appears a sound candidate for such a possibility.

Theorem 20. (Erdős) There are graphs with arbitrarily high girth and chromatic number.

Proof. Let Gn,p denote a random graph on n vertices, where each pair of vertices {x, y}
is added independently with probability p. Let n be sufficiently large. For the random
graph to give us what we seek we want:

• Gn,p will have relatively few small cycles with reasonably high probability.

• G has large chromatic number with reasonably high probability.

Fix a number `, and let N` denote the number of cycles of length at most ` in Gn,p . As
seen before,
`
X nj pj (np)3 (np)`−2 − 1 (np)`
E(N` ) ≤ ≤ · ≤ .
j=3
2j 6 (np) − 1 2

Hence by the Markov inequality,

P r(|N` | ≥ (np)` ) ≤ 1/2

41
in other words, with probability at least 1/2 Gn,p has at most (np)` cycles of size at most
`. This is the first step.
To show that the chromatic number of our random graphs is large, we need to un-
derstand how one might bound the chromatic number from below. Doing this directly,
by working with the chromatic number itself, would be rather ponderous. But a simple
observation based on the definition tells us that since each color class is an independent
set, we have
n
χ(G) ≥
α(G)
where α(G) is the independence number of the graph. How does this help?
Let us examine α(G) in Gn,p . We need this to be small since we want the chromatic
number to be large. Then
P(α(G) ≥ m) = P(there exists an independent subset of G of size m)
X
≤ P(there are no edges inside S)
S⊂V,|S|=m
 
n m
= (1 − p)( 2 ) < (n exp(−p(m − 1)/2))m .
m
To get a handle on these parameters we have freely introduced, note that the term in-
m−1

side the parenthesis
l m in the last inequality above can be expressed as exp log n − ( 2
p) ,
so if m = 3 log
p
n
, then the probability that Gn,p contains an independent set of size at
least m goes to zero as n → ∞.

So, what
l domwe have on our hands now? If p is chosen in some manner, and then we
set m = 3 logp
n
then with positive probability (certainly) we have that G has at most
9np)` cycles of size at most ` AND that its independence number is at most m. Pick such
a G.
As before, we shall perform some deletions to G to rid it of all small cycles. But unlike
the earlier instance, if we deleted edges, we run the risk of pumping up its independence
number, so this time let us delete vertices instead. The advantage is that vertex deletions
result in an induced subgraph of the original graph, so its independence number remains
the same.
1/g
This suggests that we set (np)` ≤ n/2, or equivalently, p < Cnn for some constant C.
λ
So, set p = nn for some λ ∈ (0, 1/`), and m as suggested. Then remove an arbitrary vertex
from each small cycle from G, and call the resulting graph G0 . Then G0 has girth ≥ ` and
at least n/2 vertices. Finally, since deleting vertices doesn’t decrease the independence
number of a graph,
|V (G0 )| n/2 np nλ
χ(G0 ) ≥ ≥ ≥ = ,
α(G0 ) α(G) 6 log n 6 log n

42
which goes to infinity as n grows large.
Remark: There have been subsequently many constructive forms of this result, with
the first one by Lovász, and then subsequently by many others. Many of those construc-
tions actually construct hypergraphs with the same property. The nicest description of
such graphs however are the Ramanujan graphs constructed by Lubotzky-Phillips-Sarnak
([20]). But the proof involves some sophisticated number theory.

43
5 Dependent Random Choice

Sometimes, the desired object


is not the random object itself,
but an associate of it.

In this chapter, we consider another aspect of tweaking a randomized construction:


Sometimes it pays off to pick the object of desire not be picking it directly as a random
object, but rather pick another object randomly and then pick a relevant associated object
to the randomly picked object, to be our desired object. This sounds a bit roundabout but
on quite a few occasions, it turns out to be the correct thing to do.
The premise for some of the investigations in this chapter is motivated by the following
question: Given a ’small’ graph H, how many edges must a graph G have in order that
H ⊂ G? We denote by ex(H; n) the maximum number of edges in an n vertex graph
G which is H-free. If H is not bipartite then theorem of Erdős-Stone-Simonovits settles
this upto a multiplicative factor of 1 + o(1). But if H is bipartite then the Erdős-Stone-
Simonovits theorem only tells us that ex(n; H) = o(n2 ). This begs the following question:

Question 21. For H bipartite, what is the correct value of α with 1 ≤ α < 2 such that
ex(n; H) = Θ(nα(H) )?

Suppose H = (A∪B, E) with |A| = a, |B| = b. One constructive way to find a copy of
H in a large graph G is to try and embed H into G, one vertex at a time. Suppose there
is a large subset A0 of G into which the vertices of A have already been embedded in
some fashion. Let B = {v1 , v2 , · · · vb } and suppose that we have embedded v1 , · · · vi−1 into
V (G). The new idea that provides a scheme by which this inductive procedure extends
to embedding vi as well is the following: Suppose vi has degree r, and suppose that every
r-subset of A0 has many common neighbors in G. One elementary bound here is that the
number of common neighbors is at least a + b. Since A has been embedded into A0 , this
gives a set U ⊂ A0 of size ≤ r which should be the neighbor set for vi . Since |U | ≤ r and
it has at least a + b common neighbors in G there is some available choice for vi in V (G)
which is not a vertex that has already been taken! In short, we have the following

Proposition 22. Let H be bipartite, H = (A ∪ B, E) with |A| = a, |B| = b, any vertex


in B has degree at most r. Suppose there is a set A0 ⊂ V (G) of size at least a such that

45
every r-subset of A0 has at least a + b common neighbors in G. Then H can be embedded
in G.

The main idea here which basically inverts the usual way of approaching embeddings
presents a scenario which might allow one to establish other conditions that ensure that
a graph H can be embedded into a bigger graph G. And the technical condition that the
idea proposes leads to the following question: Given a graph G, under what conditions
can one ensure that there exists a subset of vertices A0 of size at least a such that every
r-subset of A has at least a + b common neighbors?

Before we address this question, let us see why picking the subset A0 randomly is not
a good idea. Suppose G is bipartite with both parts of considerable size. Then a random
set is very likely to pick at least one vertex from each of the parts and then the condition
cannot be satisfied. Since we do not choose our actual objects of interest by the random
method but rather in this dependent manner, this method is referred to as the method
of Dependent Random Choice.

5.1 A graph embedding lemma

Let V (H) = A ∪ B, |A| = a, |B| = b, let A0 be subset of V (G) containing all the vertices
of A. We seek to embed the graph H in G as described in the preceding section, and this
brings us concretely to the following question: How do we determine a set A0 such that
every r-subset of A0 has many common neighbors in G?

The main idea here is to invert this search: instead of picking the set A0 (say, ran-
domly) and hoping to have found one with many common neighbors, pick a set T and let
A0 be the set of those vertices which contain T among their neighbors. This is a healthy
heuristic since by fiat, we know that all the chosen vertices have the vertices of T among
their neighbors.

Indeed, over t rounds, pick a vertex vi uniformly at random and independently across
the rounds. Call this set T and consider the set of common neighbors of T - we shall

46
denote that by N ∗ (T ). Then
X
E(|N ∗ (T )|) = P(v ∈ N ∗ (T ))
v∈V
X  d(v) t
=
v∈V
n
!t
1 1X
≥ d(v)
nt−1 n v∈V
¯t
(d)
=
nt−1
where d denotes the average degree of the vertices of G. The inequality above follows
from Jensen’s inequality for convex functions.
Let Y denote the number of r-subsets U of N ∗ (T ) such that U has fewer than m
common neighbors. Then
X
E(Y ) ≤ P(U ⊂ N ∗ (T )).
U ⊂V (G),|U |=r
|N ∗ (T )|<m

If U ⊂ N ∗ (T ), it means that every choice for T was picked from among the common
t
neighbors of U , so P(U ⊂ N ∗ (T )) ≤ mn
. Consequently,
 
n m t
E(Y ) ≤ ( )
r n
which implies
¯ t n  m t
(d)

E(|N (T )| − Y ) ≥ t−1 −
n r n
so that there exists (by the method of alterations as seen in the preceding chapter)
¯t  m t
A0 ⊂ N ∗ (T ) of size at least n(d) n
t−1 − r n
such that every r-subset of A0 has at least m
common neighbors. This gives the following
Theorem 23. (Alon, Krivelevich, Sudakov) H is bipartite with vertex partition (A, B),
1
and if every vertex of B has degree ≤ r, then ex(n; H) = OH (n2− r ).
Proof. We only need to fill in the gaps now. Note that
1 1
e(G) ≥ Cn2− r =⇒ d¯ ≥ 2Cn1− r

where C = CH is a constant depending on H. To complete the proof, we need


¯ t n  a + b t
(d)
− ≥ a.
nt−1 r n

47
Now plugging in the lower bound for d from before, we have
t t
¯ t n a + b
(d) ((2C)t nt− r )t nr a + b

t
− ( ) ≥ − .
nt−1 r n nt−1 r! n

Now, setting r = t gives that the last expression is at least

(a + b)r
(2C)r − >a
r!
 1/r
1 (a+b)r
with C > 2
a+ r!
and that completes the proof.

Before we move on, we highlight the inequality obtained earlier to the status of an
observation.

Observation 24. Suppose the average m tdegree of a graph G is d. Then there exists a
dt n
subset A0 of size at least nt−1 − r n such that every r-subset of A0 has at least m
common neighbors.

The peculiar aspect of this observation is that the parameter t which appears is not
present in the consequence, so it is more of a driving parameter that gives a condition to
make a conclusion.

5.2 An old problem of Erdős


We now take a look at another old problem of Erdős that was settled in the affirmative
following the Dependent Random Choice line. But first, we need a definition.

Definition 25. A topological copy of a graph H is formed by replacing every edge of H


by a path such that paths corresponding to distinct edges are internally disjoint, i.e., have
no common internal vertices.

Erdős conjectured that if e(Gn ) ≥ cp2 n, then there is a topological copy of Kp in G.


This was proved in 1998 by Bollobás and Hind. Erdős’ conjecture implies that there is a
topological copy of K√n in Gn if e(Gn ) ≥ cn2 .

Definition 26. A t-subdivision where each edge is replaced by a path with ≤ t internal
vertices.

Erdős also asked if ε-dense graphs, i.e., graphs Gn with e(Gn ) ≥ εn2 admit a 1-
subdivision of KΩ(√n) . More formally, is there a 1-subdivision of Kδ√n in an ε-dense
graph for some absolute δ = δ(ε) > 0? Note that the Bollobás-Hind result does not
establish a 1-subdivision, since the paths in the topological copy could involve some long
paths.

48
The following perspective is key:

If one seeks to embed a fixed bipartite graph into another graph, and if all the vertices
on one side of the bipartite graph have somewhat small degree, then the Dependent Choice
method gives a handle - effectively reducing the problem to a calculation - on proving suf-
ficiency results.

In the Erdős problem above, note that a strict 1-subdivision of the complete graph
Ka , i.e., one where each edge of Ka is subdivided to get a path of length two, corresponds
a

to a bipartite graph with parts of size a, 2 , respectively. Observe that every vertex in
a
the part of size 2 has degree 2 since each of these vertices is placed in an edge of the
original Ka , and hence has degree 2. Thus, the Dependent Random Choice technique
appears a likely tool.
Theorem 27. (Alon, Krivelevich, Sudakov) If e(Gn ) ≥ εn2 , then G has a 1-subdivision
of Kε3/2 √n .
Proof. If we think along the lines of the embedding procedure that we discussed in the
previous sections, then as remarked above, we have a sufficiency condition provided we
make a back calculation. Indeed, we would have the result we seek if
¯ t n  m t
(d)
− ≥ a.
nt−1 r n
Here r = 2, m = a + a2 < 2 a2 < a2 , and d¯ ≥ 2εn.
 

Consequently,
n2 a2t
LHS > (2ε)t n − .
2 nt

For a = δ n and δ = ε3/2 , we have

n2 2t
 
t t
LHS > ε 2 n − ε
2

so if the second term in the square bracket equals n then we may factor out n from both
log n
these terms. This basically boils down to setting t = 2 log(1/ε) so that
√ √ √
n t+1 n t n 2 log(1/ε)
log 2
LHS > (2 − 1) > 2 = n
2 2 2

As n goes large, this beats a = δ n and settles the conjecture.

5.3 A special case of Sidorenko’s conjecture


One of the most beautiful conjectures in extremal graph theory is Sidorenko’s conjecture.
To get to it, we need a definition first:

49
Definition 28. A graph homomorphism between graphs H, G is a map φ : V (H) → V (G)
such that whenever uv is an edge in H, φ(u)φ(v) is an edge in G.

Homomorphism capture graph adjacencies at the local level, i.e., for each vertex u the
neighbors of u are mapped to the neighbors of the image of u. The map φ is not required
to be injective, so for instance, there is a homomorphism from K3 to any odd cycle. It is
usually of greater interest to consider isomorphisms between graphs, i.e., injective maps
φ such that φ also preserves non-adjacencies. But if H is small compared to G, then
the non-injective maps are asymptotically far fewer than the injective ones, so homomor-
phisms are easier to study, since to count the number of homomorphisms, one can think
of it in terms of embeddings where for each vertex u ∈ H, the neighbors of u in H are
mapped into neighbors of its image in G.

Let hH (G) denote the number of homomorphisms from H to G. The homomorphism


hH (G)
density of H in G denoted tH (G) is defined as tH (G) := |V (G)|V (H)| .

Sidorenko’s conjecture (also attributed to Erdős and Simonovits) states that for any
bipartite graph H, among all graphs G with edge density p, the random graph G(n, p)
has asymptotically the least number of copies of H. More formally,

Conjecture 29. (Sidorenko): Suppose H = (A, B, E) is bipartite and suppose G is a


graph with edge density p. Then tH (G) ≥ pe(H) .

One way to intuit this is that a random graph tends to ‘spread out’ all the copies
of H so that no conglomeration of the copies of H is possible. While there have been
several attacks on this problem with beautiful results by several researchers, the problem
still remains open. In this section we shall see a beautiful result due to Conlon, Fox and
Sudakov (2010). But before we get to that result, let us quickly see how Sidorenko’s con-
jecture may also be stated in terms of counting homomorphisms instead of dealing with
homomorphism densities. Let |V (H)| = n, e(H) = m, and suppose hH (G) ≥ cH pm N n
holds for all graphs G on N vertices with pN 2 /2 edges. Here cH is a constant that de-
pends only on H. We first observe that this establishes the Sidorenko conjecture for H.

Indeed, suppose tH (G) < pm for some graph G with edge density p. The idea is
to ‘boost’ up the edge density of H in another related graph, by what is called the
‘tensoring trick’. For graphs G1 , G2 , the (weak) product G1 × G2 is the graph on the
vertex set V (G1 ) × V (G2 ) and (u, v) is adjacent to (u0 , v 0 ) if and only if uu0 ∈ E(G1 )
and vv 0 ∈ E(G2 ). The simplicity of this definition leads to many things, but here, the
relevant point is that for any H, tH (G1 × G2 ) = tH (G1 )tH (G2 ). We shall denote by G⊗r
the r-fold product G × · · · × G.
tH (G)
Consider 0 ≤ c = pm
< 1, if possible. Then for any integer r ≥ 1

cH prm ≤ tH (G⊗r ) = tH (G)r = cr pmr = cr (pr )m

50
since pr is the edge density of G⊗r and the assumption about the number of homomor-
phisms of H in any graph. But since c < 1, this yields a contradiction if r is sufficiently
large.

Theorem 30. (Conlon, Fox, Sudakov) Suppose H = (A, B, E) is bipartite with n = a + b


vertices and m edges, and suppose there is a vertex in A (which shall be referred to as a
special vertex) which is adjacent to all the vertices of B. Then for any graph GN with at
2
least pN 2 /2 edges, the number of homomorphisms from H → G is at least (2n)−n pm N n .
Consequently, Sidorenko’s conjecture holds for H.

Proof. Let A = {u1 , . . . , ua } and B = {w1 , . . . , wb }. Let us start with a scheme for
constructing homomorphisms from H to G. One curious aspect of the hypothesis of the
theorem is the assumption about the existence of a special vertex in A. But if we were
to approach this from a constructive perspective, it makes certain things more rigid and
natural: Suppose u1 ∈ A is special. To construct a homomorphism φ one vertex at a
time, we shall first fix the image φ(u1 ) = x1 in G, and then (since we are interested
in homomorphisms which need not be injective) fix a sequence B = (y1 , . . . , yb ) with
yi ∈ N (x1 ) which would act as the image of B under φ, and then finally, choose the
images of the other ui ∈ A.

Suppose we have chosen yi ∈ N (x1 ) that act as φ(wi ). To see if this sequence B is a
good extension to the choice for x1 , let us examine how well this extends to the other ui
as a homomorphism. Each ui picks a subsequence B0 := (yi1 , . . . , yik ) that corresponds
to the neighbors of ui , so one is guaranteed many homomorphism extensions for defining
φ(ui ) if N ∗ (B0 ) is large. Since the neighbors of the ui may be arbitrary subsets of B, a
naturally good choice B is one for which for every subsequence B0 := (yi1 , . . . , yik ), the
set N ∗ (B0 ) is ‘large’. This begs the question: How large is ‘large’ ? Since Sidorenko’s
conjecture posits that the random graph generates the least number of homomorphisms,
it is reasonable to compare the size of the N ∗ (B) with pd(ui ) N since in a random graph
of edge density p, the expected number of common neighbors for a set of size k is pk N .

So, we formally postulate: A sequence B = (y1 , . . . , yb ) is desirable if for each


1 ≤ k ≤ b the subsequence B0 = (yi1 , . . . , yik ) has |N ∗ (B0 )| ≥ αpk N for some small
α that we will pin down later. We define a vertex x ∈ V (G) to be good if the number of
b
desirable sequences B = (y1 , . . . , yb ) with yi ∈ N (x) is at least d(x)
2
. Again, this is not
out of the blue; if x were to act as φ(u1 ) the number of possible sequences of neighbors
of x is at most d(x)b , and we require that at least half of those are desirable sequences.
We denote by Good, the set of good vertices in G.

This gives us a back-of-the-envelope estimate on the number of homomorphisms from


H into G to be at least !
X d(x1 )b Y a
αpd(ui ) N .
x ∈ Good
2 i=2
1

51
To explain this, we first pick x1 to be a good vertex, and fix x1 as the image of the special
u1 . For each desirable sequence (y1 , . . . , yb ) which will serve as (φ(w1 ), . . . , φ(wb )), we
pick choices for the remaining ui . Since B is desirable, there are at least αpd(ui ) N choices
for each ui .

Hence
a
!
X d(x1 )b Y αa−1 pm−b N a−1 X
αpd(ui ) N = d(x1 )b
x1 ∈ Good
2 i=2
2 x1 ∈ Good
a−1 m−b a−1
P b
α p N x1 ∈ Good d(x1 )
≥ N
2 N
!b
αn−1 pm−b N a X
= d(x1 )
2N b x1 ∈ Good

so if we can prove a lower bound of the form


X
d(x1 ) ≥ Ω(pN 2 )
x1 ∈ Good

then we are through.

This sets us a goal, but there is still a lot packed tightly into the notion of what it
means for a vertex to be good. To unspool this a bit, suppose that a vertex x ∈ / Good.
An alternate way to state this is: Suppose B is picked uniformly at random from (N (x))b .
Then the probability that B is not desirable is at most 1/2.

Let B = (y1 , . . . , yb ), and let us fix 1 ≤ i1 < · · · < ik ≤ b, and set B0 = (yi1 , . . . , yik ).
If there exists β > 0 such that P(|N ∗ (B0 )| < αpk N ) < β for all k and choices of (i1 , . . . , ik )
the
P(B is not desirable) < 2b β ≤ 2n−1 β
and if we choose β = 2−n then this contradicts that x ∈
/ Good.

This motivates the following definition. For 1 ≤ k ≤ b, we say that a vertex x is prob-
lematic for k (which we shall denote by x ∼ k) for a positive integer k ≤ b if the number
of subsequences B0 = (y1 , . . . , yk ) ∈ N (x)k with |N ∗ (B0 )| < αpk N is somewhat large, say
at least βd(x)k . By our terminology, x ∈ Good if and only if x is not problematic for k
for each 1 ≤ k ≤ b.

We now invoke the Dependent Random Choice principle: Small subsets of the common
neighborhood of small random sets admit many common neighbors. Let B0 be a random
k-sequence of vertices from V , and let COUNTk be the number of vertices x ∈ V such

52
that B0 ∈ N (x)k and |N ∗ (B0 )| < αpk N . Then
X
E(COUNTk ) = P(B0 ∈ N (x)k and |N ∗ (B0 )| < αpk N ) (5.1)
x∈V
X X  d(x) k
0 k
≥ P(B ∈ N (x) ) ≥ β (5.2)
x∼k x∼k
N
!k
X
≥ βN 1−2k d(x) (5.3)
x∼k

by the convexity of the function f (x) = xk and the definition of x being problematic for
k. On the other hand,

E(COUNTk ) < αpk N (5.4)

since any such x that is counted in this which admits B0 ∈ N (x)k with |N ∗ (B0 )| < αpk N
must necessarily satisfy x ∈ N ∗ (B0 ). Thus we have
 1/k
X α
d(x) < pN 2 .
x∼k
β

Now, if α < β, then summing this over all k ≤ b gives


 1/b
X α
d(x) < b pN 2
β
x∈
/ Good

1 1
X pN 2
so, if we take β = 2n
,α = 4n nn
we have d(x) ≥ and the proof is complete.
x∈ Good
2

5.4 The Balog-Szemerédi-Gowers Theorem


The last section of this chapter deals with a deep result in Additive Combinatorics,
originally due to Balog and Szemerédi, and then was reproved by Gowers with stronger
estimates than in the original. To motivate the statement, we set up some terminology.
For sets A, B ⊂ Z for an ambient abelian group, we mean by A + B (called the sum
set) the set of all elements of the form a + b with a ∈ A, b ∈ B. One of the principal
questions in Additive Combinatorics studies the size of sum sets in interesting abelian
groups. One of the foundational theorems in this direction is Freiman’s theorem which
states that if |A + A| ≤ K|A| for some bounded constant K, then A is a large subset of a
well-structured set, called a Generalized Arithmetic progression, so in that sense the size
of |A + A| measures how structured the set A is.

53
But suppose we only have access to consider pairs of sums for a restricted number of
pairs of A. More precisely, consider a bipartite graph G = GA with both vertex partitions
corresponding to the set A and we only have access to the pairs of sums a + b whenever
ab ∈ E(G). We shall denote by A +G A the set {a + b : a, b ∈ A, ab ∈ E(G)}. How much
information does A +G A capture about |A + A|?

Let |A| = n . If the graph G is sparse, then there is not much hope of gathering
much information about A, so suppose that G is dense, i.e., e(G) ≥ αn2 for some fixed
0 < α < 1. If |A +G A| ≤ cn for some absolute constant c then could we conclude that
|A + A| ≤ c1 n for some c1 ?

A moment’s thought tells us that such a conclusion is too good to be true. Indeed,
suppose R is a random subset of [1, n2 ] where each x ∈ [1, n2 ] is picked uniformly and
independently with probability n1 , say, and let A = [1, n] ∪ R . Let G be the graph
corresponding to the pairs ab with a, b ∈ [1, n]. Then |A +G A| = 2n − 1 andt it is not
hard to see that P(|R + R| ≥ Ω(n2 )) = Ω(1).

However, in this example the set A had a large subset A0 = [1, n] for which |A0 + A0 | =
2|A0 |. This motivates the following modification: If |A +G A| = O(n), then is there
a large structured subset of A? The answer in the affirmative is the substance of the
Balog-Szemerédi-Gowers theorem:

Theorem 31. Suppose 0 < α < 1, c > 0 are reals then there exist c0 , c00 (depending on
c, α) such that the following holds and a constant n0 (c, k) such that the following holds.
Suppose n ≥ n0 and A is a subset of the integers of size n, and suppose that for a graph
G = GA with e(G) ≥ αn2 we have |A+G A| ≤ cn then there exists A0 ⊆ A with |A0 | ≥ c0 |A|
with |A0 + A0 | ≤ c00 n.

We will prove a slight generalization of this for sets A, B and A +G B.

There is a natural injection from paths of length 3 in G to elements in A+B: Suppose


x, x0 , x00 ∈ A +G B and x = a + b0 , x0 = a0 + b, x00 = a0 + b0 . Then a + b = x − x0 + x00 , so
one can associate to the path ab0 a0 b in G, the element a + b ∈ A + B. Hence every path
of length 3 in G corresponds to a unique pair (a, b) which corresponds to an element in
A + B. If we can lower bound the number of paths of length 3 corresponding to each
element in A+B then we have an upper bound on the size of A+B. Since G is dense it is
reasonable to expect many paths of length 3 between most pairs of vertices (but perhaps
not all pairs) so we might need to restrict to subsets A0 and B 0 so that this property holds.

More precisely, suppose we can find A0 ⊆ A, B 0 ⊆ B such that |A0 | ≥ c0 n, |B 0 | ≥ c00 n


and between every pair (a, b) ∈∈ A0 × B 0 there are Ω(n2 ) paths of length 3. Then there

54
are at least Ω(n2 ) triples (x, x0 , x00 ) ∈ (A +G B)3 such that

x − x0 + x00 = a + b,
x = a + b0
x00 = a0 + b

holds for some (a0 , b0 ) ∈ A × B. Since by assumption |A +G B| ≤ cn the number of triples


(x, x0 , x00 ) ∈ (A +G B)3 ≤ c3 n3 , so

X
c3 n 3 ≥ #{(x, x0 , x00 |y = x − x0 + x00 )} (5.5)
y∈A0 +B 0

≥ Ω(n2 )|A0 + B 0 | (5.6)

which gives |A0 + B 0 | = O(n).

So we would be done if we can answer the following question affirmatively:


Question 32. If G = G[A, B] is bipartite, |A| = |B| = n and e(G) ≥ κn2 can we find
subsets A0 ⊆ A, B 0 ⊆ B with |A0 | ≥ c0 n, |B 0 | ≥ c00 n such that for all a ∈ A0 , b ∈ B 0 there
are Ω(n2 ) paths ab0 a0 b ?
κn κn2
Let A1 = {a ∈ A : d(a) ≥ 2
}. Then e(A \ A1 , B) ≤ 2
, so

κn2
n|A1 | = |A1 | · |B| ≥ e(A1 , B) ≥
2
which gives
κn
|A1 | ≥ .
2
Now let us speculate a bit.
Conjecture 33. Suppose e(A, B) ≥ κn2 . There exist absolute constants α, δ > 0 de-
pending only on κ such that the following holds: There exists A0 ⊆ A with |A0 | ≥ α|A|
such that every pair {a, a0 } in A0 has at least δ|B| common neighbors in B.
This conjecture is a natural first-line-of-attack. If conjecture 33 holds, then get
U ⊆ A1 , |U | ≥ α|A1 | such that every pair {a, a0 } in U have at least δ|B| common neigh-
bors; by the arguments outlined earlier, this gives us that |U | ≥ ακn/2.

Now we choose B1 ⊂ B to consist of those vertices with large degree into U - that
would provide many choices for the 3rd edge. If |B1 | = Ω(n), we are through.

Let µ > 0 be a parameter that we shall fix later. Set

B1 := {b ∈ B : d(b, U ) ≥ µ|U |} (5.7)

55
and again exactly as before, since e(B \ B1 , U ) ≤ µ|U |n and e(U, B) ≥ |U |( κn
2
), we have
κ
|U | · |B1 | ≥ e(U, B1 ) ≥ ( − µ)|U |n (5.8)
2
so that
κn
|B1 | ≥ . (5.9)
4
|
We now claim that (U, B1 ) will do the job. Indeed, since each b ∈ B1 has ≥ κ|U 4
|
neighbors in U , b has at least κ|U
4
− 1 neighbors in U \ {a}. For each a0 ∈ N (b) \ {a},
there exist at least δn − 1 common neighbors for a, a0 in B \ {b}, so that there are at least
κ κδ ακδ 2
( |U | − 1)(δn − 1) ≥ |U |n ≥ n (5.10)
4 16 32
paths of length 3 from a to b.

Unfortunately, here is the bombshell; Conjecture 33 is FALSE! For an explicit counter


example, see [18].

How does one salvage this? If this line of argument can still be exploited, the next
natural question in place of Conjecture 33 would be
Question 34. Does conjecture 33 hold if the words ’every pair’ are replaced by ‘most
pairs’? More precisely, suppose κ > 0, and G = G[A, B] is bipartite with edge density
κ. Does there exist subset A0 ⊆ A such that |A0 | ≥ α|A| such that there are at least
(1 − ε)|A0 |2 ordered pairs of A0 , each of which have at least δ|B| common neighbours in
B, for some suitable α, δ, ε depending only on κ?
First, let us see if this weakening still yields the desired outcome. Let U, B be chosen
as before, but with U being the set guaranteed by an affirmative answer to question 34
rather than the erroneous Conjecture 33. To be precise, call (a, a0 ) a pair bad if they have
fewer than δ|B| common neighbors in B. Let U ⊂ A1 such that |U | ≥ α|A1 | and with
at most ε|U |2 bad pairs. Then again as before, b ∈ B1 implies that d(b, U ) ≥ κ4 |U |. But
this time, instead of using U , we refine it further. Let
κ
A0 := {a ∈ U : a is in at most |U | bad pairs}. (5.11)
8
Since the total number of bad pairs in U is at most ε|U |2 , the total number of bad
pairs featuring a vertex in U \ A0 is at least κ8 |U |(|U | − |A0 |), So
κ
ε|U |2 ≥ |U |(|U | − |A0 |)
8
which gives

|A0 | ≥ (1 − )|U | (5.12)
κ
56
So for instance if ε = κ
16
then |A0 | ≥ 21 |U |.

So for (a, b) ∈ A0 × B1 , the number of paths of length 3 from a to b is at least

κ κ κδ ακδ 2
( |U | − 1 − |U |)(δn − 1) ≥ |U |n ≥ n = Ω(n2 ) (5.13)
4 8 32 64
as before with only slightly worse constants. Thus we are just left with proving settling
question (34) in the affirmative. And the good news is

Theorem 35. Suppose 0 < ε < 1 and G = G[A, B] is bipartite with e(G) = κ|A||B|.
There exist constants α, δ > 0 that depend only on κ, ε such that the following holds:
There exists A0 ⊆ A satisfying

• |A0 | ≥ α|A|,

• All pairs (a, a0 ) in A0 except for at most ε|A0 |2 admit at least δ|B| common neighbors
in B.

Proof. We go back to the Dependent Random Choice heuristic: Small subsets of the
common neighborhood of small random sets admit many common neighbors. Pick b ∈ B
at random and let A0 = N (b). Then
1 X
E[|A0 |] = d(B) = κ|A|. (5.14)
|B| b∈B

As before, call a pair (a, a0 ) bad if the number of common neighbors is at most δ|B|, and
let BAD denote the number of bad pairs (a, a0 ) in A. Then
X
E[|BAD|] = P (b is chosen from a set of size at most δ|B|) (5.15)
(a,a0 )∈BAD

≤ δ|A|2 (5.16)

Our goal is to find a b such that |N (b)| = Ω(|A|) and |BAD| ≤ ε|N (b)|2 . Note that by
Cauchy-Schwarz,
1 δ
E(|A0 |2 − |BAD|) ≥ κ2 |A|2 − |A|2 (5.17)
ε ε
εκ2
Setting δ = 2
, we have

1 κ2 |A|2
E[|A0 |2 − |BAD|] ≥
ε 2
so that there exists b ∈ B such that
1 κ2 |A|2
|N (b)|2 − |BAD| ≥
ε 2
57
Thus we have

|BAD| ≤ ε|N (b)|2 = ε|A0 |2 (5.18)


κ|A|
|A0 | = |N (b)| ≥ √ (5.19)
2
which proves theorem(35) and consequently, the Balog-Szemerédi-Gowers theorem.

58
6 The Second Moment Method

When you have a random


variable of interest, and you
can compute its expectation,
you should try to compute the
variance next.

The method of using expectation of random variables is a very useful and powerful
tool, and its strength lies in its ease of computing the expectation. However, in order to
prove stronger results, one needs to obtain results which prove that the random variable
in concern takes values close to its expected value, with sufficient (high) probability. The
method of the second moment, as we shall study here gives one such result which is due
to Chebyshev. We shall outline the method, and illustrate a couple of examples. The
last section covers one of the most impressive applications of the second moment method
- Pippenger and Spencer’s theorem on coverings in uniform almost regular hypergraphs.

6.1 Variance of a Random Variable and Chebyshev’s theorem


For a real random variable X, we define Var(X) := E(X − E(X))2 whenever it exists. It
is easy to see that if Var(X) exists, then Var(X) = E(X 2 ) − (E(X))2 .

Theorem 36 (Chebyshev’s Inequality). Suppose X is a random variable, and suppose


E(X 2 ) < ∞. The for any positive λ,

Var(X)
P(|X − E(X)| ≥ λ) ≤ .
λ2
Proof. Var(X) = E[(X − E(X))2 ] ≥ λ2 P(|X − E(X)| ≥ λ).

The use of Chebyshev’s inequality, also called the Second Moment Method, applies
in a very wide context, and it provides a very basic kind of ‘concentration about the
mean’ inequality. The applicability of the method is most pronounced when the variance
is of the order of the mean, or smaller. We shall see in some forthcoming chapters that

59
concentration about the mean can be achieved with much greater precision in many sit-
uations. What, however still makes Chebyshev’s inequality useful is the universality of
its applicability.

If X = X1 + · · · + Xn , then the following simple formula calculates Var(X) in terms


of the Var(Xi ). For random variables X,Y, define the Covariance of X and Y as
Cov(X, Y ) := E(XY ) − E(X)E(Y ).
For X = X1 + · · · + Xn , we have
X X
Var(X) = Var(Xi ) + Cov(Xi , Xj ).
i i6=j

This is a simple consequence of the definition of Variance


P and Covariance. In particular,
if the Xi ’s are pairwise independent, then Var(X) = i Var(Xi ).

The (usually) difficult part of using the second moment method arises from the diffi-
culty of calculating/estimating Cov(X, Y ) for random variables X, Y . One particularly
pleasing aspect of the second moment method is that this calculation becomes much sim-
pler if for instance we have pairwise independence of the random variables which is much
weaker than the joint independence of all the random variables.

The preceding example illustrates one important aspect of the applicability of the second
moment method: If Var(Xn ) = O(E(Xn )) and E(Xn ) → ∞ then Chebyshev’s inequality
gives
P(|Xn − E(Xn )| > εE(Xn )) = o(1).
In particular, Xn is ‘close to’ E(X) with high probability.

6.2 The Erdős-Ginzburg-Ziv theorem: When do we need long


sequences?
Our first application in this section that arises more as an outcome of curiosity, and is in
fact a probabilistic statement.

The Erdős-Ginzburg-Ziv theorem states that every sequence of length 2n − 1 of ele-


ments of Zn contains a subsequence of size n whose sum equals zero. This is best possible
in the sense that the sequence (0n−1 1n−1 ) admits no such zero-sum subsequence. But a
more natural question that arises is: How necessary is the length 2n − 1? In other words,
are there other sequences that look nothing like these and yet need to be significantly long
to witness a zero-sum subsequence of length n? If you had a typical sequence of elements
from Zn , then how long does it need to be to contain a zero-sum subsequence of length n?

The answer, perhaps surprisingly, is that one typically needs much shorter sequences.

60
Theorem 37. Suppose X := (X1 , . . . , Xn+2 ) be a random Zn -sequence, i.e., suppose Xi
are chosen uniformly and independently from Zn . Then with high probability, X contains
a zero-sum subsequence of length n.

set up some notation. For a subset I ⊂ [1, n + 2] we shall denote by XI


Proof. We first P
the sum XI := i∈I Xi . Consider the indicator random variables I(XI ) := 1 if XI = 0
and zero otherwise. Let

H := {I ⊂ [n + 2] : |I| = n},
X
N := I(XI ).
I∈H

Then  
X 1 n+2 (n + 2)(n + 1)
E(N ) = P(XI = 0) = = ,
I∈H
n n 2n
and, X X
V ar(N ) = V ar(I(XI )) + Cov(I(XI ), I(XJ )).
I∈H I6=J
I,J∈H

The main observation is that since Xi ’s are i.i.d, it follows that the XI are pairwise inde-
pendent. Indeed pick i ∈ I \ J and j ∈ J \ I and condition on the values of the random
variables {X` }`6=i,j ; this determines Xi , Xj uniquely, so the conditional (and hence also
the unconditional probability) of XI = XJ = 0 is 1/n2 = P(XI = 0) · P(XJ = 0).

Consequently, Cov(I(XI ), I(XJ )) = 0 for I 6= J ∈ H. Also, Var(I(XI )) = n1 (1 − n1 ), so


 
X 1 1 (n + 2)(n + 1)
V ar(N ) = Var(I(XI )) = 1− .
I∈
n n 2
H

Therefore, by Chebyshev’s inequality we have,


1
(1 − n1 )
 
Var(N ) 2 1
P(N = 0) ≤ P(|N − E(N )| ≥ E(N )) ≤ = 1 2 =O
(E(N ))2 4
(1 + n
)(n + 1) n

which implies that P(N > 0) → 1. This completes the proof.


Remark: The theory of zero-sum problems considers various instances where one is
interested in sequences of elements of an abelian group admitting zero-sum subsequences
with other characteristics. Interestingly, many of these group invariants behave marked
differently for random sequences. For instance, the Davenport constant of a group is the
minimum m such that every sequence of m elements from the group admits a non-trivial
zero sum subsequence. The Davenport constant of the cyclic group Z/nZ is n (easy to see)
whereas its random analogue (as described above) is of the order (1 + o(1)) log2 n. One
also has analogues of these invariants which admit weights, and the random analogues

61
of these analogues are typically much smaller. The only interesting instance where this
is not the case is when we allow weighted sums for the subsequences with weights in
{−1, 1}. For Z/nZ the corresponding weighted Davenport constant is log2 n whereas the
random analogue is (1/2 + o(1)) log2 n. For more such results, see [8].

6.3 Distinct subset sums


For the next application, we need a definition.

P 38. We say a set of positive integers {x1 , x2 , . . . , xk } is said to have distinct


Definition
sums if xi ∈S xi are all distinct for all subsets S ⊆ [k].

For instance, if xk = 2k , then we see that {x1 , x2 , . . . , xk } has distinct sums. Erdős
posed the question of estimating the maximum size f (n) of a set {x1 , x2 , . . . , xk } with
distinct sums and xk ≤ n for a given integer n. The preceding example shows that
f (n) ≥ blog2 nc + 1.

Erdős conjectured that f (n) ≤ blog2 nc + C for some absolute constant C. He was able
to prove that f (n) ≤ log2 n + log2 log2 n + O(1) by a simple counting argument. Indeed,
there are 2f (n) distinct sums from a maximal set {x1 , x2 , . . . , xk }. On the other hand,
since each xi is at most n, the maximum such sum is at most nf (n). Hence 2f (n) < nf (n).
Taking logarithms and simplifying gives us the aforementioned result.
As before, here is a probabilistic spin. Suppose {x1 , x2 , . . . , xk } has distinct sums. Pick
a random subset S of [k] by picking each element of [k] with equal P probability and
independently. This random subset gives the random sum XS := xi ∈S xi . Now
2
E(XS ) = 2 (x1 + x2 + · · · + xk ). Similarly, Var(XS ) = 4 (x1 + x2 + · · · + xk ) ≤ n4k ,
1 1 2 2 2

so by Chebyshev we have

n2 k
P(|XS − E(XS )| < λ) ≥ 1 − .
4λ2

Now the key point is this: since the set has distinct sums and there are 2k distinct subsets
of {x1 , x2 , . . . , xk }, for any integer r we have that P(XS = r) ≤ 21k ; in fact it is either 0
or 21k . This observation coupled with Chebyshev’s inequality gives us

n2 k 2λ + 1
1− 2
≤ P(|XS − E(XS )| < λ) ≤ .
4λ 2k

Optimizing for λ we get

Proposition 39. f (n) ≤ log2 n + 21 log2 log2 n + O(1).

62
6.4 The space complexity of approximating frequency moments
One of the paradigmatic features of the probabilistic method is that it suggests different
perspectives to many problems, and one of the features of probabilistic thinking is to be
more accepting of approximate solutions, provided we have a control on the errors that
accrue. This section features one such result due to Alon, Matias, and Szegedy.

One of the features of the Theory of Complexity is to study efficient handling of re-
sources in various algorithmic computational problems (see [30] for a fantastic overview
of the subject). Usually, the resource that is optimized is run time of an algorithm. In
this section, we look at an optimization for space constraints.

Suppose A = {a1 , . . . , am } is a sequence of elements from [N ] := {1, . . . , N }, and for


each 1 ≤ i ≤ m let mi denote the number of occurrences of the element i in A. Define
for each k ≥ 0
N
X
Fk := mki
i=1
which are referred to as the frequency moments of the sequence. In particular, F0 denotes
the number of distinct members of the sequence, F1 is the number of elements of the
sequence (which is always m) and F2 is called the repeat rate of the sequence and so on.
We also define

F∞ := max mi
1≤i≤n

the most popular element of the sequence A1 . For various reasons, one wishes to com-
pute/estimate these statistics of a given sequence as the provide useful information about
A.

It is straightforward to see that the frequency moments can be efficiently computed if


we maintain a full histogram - keep a counter for each mi as we scan over the data once)
of the data - which requires memory space of size Ω(N ). But the problem of interest here
is to if it can be done efficiently with lesser memory space (i.e. with o(N )) for storage
and processing. More precisely, suppose we are allowed to scan the data once and we
have limited memory. One of the first interesting results is that for accurate computation
of the frequency moments, one cannot improve upon the memory allocation (see [3]). So,
we relax our requirements a little bit. We allow for a relative error in computing the Fi
up to a factor of 1 − λ for some fixed 0 < λ < 1, provided we have control on the error
probability. The feature of this section is the following theorem of Alon, Matias, and
Szegedy [3].
Theorem 40. Suppose k > 0 and 0 <, λ, ε < 1. There is a randomized algorithm that,
given a sequence A = (a1 , . . . , am ) of elements from [N ] computes after scanning the
1
The reason for the ∗ in the definition is that if we adopt the usual `p notation, then F∞ =
limk→∞ (Fk )1/k whereas the Fi are not defined with a k th power.

63
sequence in one pass, a number Y such that the probability that Y deviates from Fk by
more than λFk is at most ε. Most importantly, the algorithm only uses
 
k log(1/ε) 1−1/k
O N (log N + log m)
λ2

memory bits.

Proof. First, note that the statement does not require that we know the size of the
sequence A in advance. But for starters, let us assume that m is known. Since we seek a
randomized algorithm, the key first step is to identify a random variable whose expected
value is the parameter of interest, viz., Fk for each k. A first natural guess is to do the
following. Pick p uniformly from [m] and consider R := |{q ≥ p : aq = ap }|. Since we seek
to estimate Fk , the first natural choice is the random variable X := mRk . But a quick
check reveals why it is not good enough, and also how one can fix it. Indeed, suppose
the element i ∈ [N ] occurs in positions i1 < · · · < iu for some u. The contribution
from the element i towards Fk is uk . However, in computing the expected value of X,
the contribution instead is uk + (u − 1)k + · · · + 1k , so a fix for this would be to let
X = m(Rk − (R − 1)k ). Then

E(X) = (mk1 − (m1 − 1)k ) + ((m1 − 1)k − (m1 − 2)k ) + · · · + (2k − 1k )


+ ((m2 )k − (m2 − 1)k ) + · · · + (2k − 1k ) + · · ·
+ (mkN − (mN − 1)k ) + ((mN − 1)k − (mN − 2)k ) + · · · + (2k − 1k )
= Fk

as desired. Also, if we pre-process the random choice prior to the one pass, then the
number of storage bits needed is at most O(log N + log m) bits that are needed to keep
track of the element ap and the number of occurrences of ap starting from position p in A.

To see how good an estimate X is, we follow the maxim in the epigraph of this chapter:
After you have computed the expectation of a random variable, you should try to compute
the variance. Towards that end, we see

N
!
2 m2 X
E(X ) = (mki − (mi − 1)k )2 + · · · + (2k − 1k )2 + 12k
m i=1
N
!
X
≤ m kmk−1
i (mki − (mi − 1)k ) + · · · + k2k−1 (2k − 1k ) + k12k−1
i=1
XN
≤ km m2k−1
i
i=1
= kF1 F2k−1

64
where we basically use the fact that
k−1
X
k k
a − b = (a − b) ak−i bi ≤ (a − b)kak−1
i=0

for positive reals a > b > 0. To bound this further, let M = max1≤i≤N mi . Then

F1 F2k−1 ≤ F1 M k−1 Fk
N
!(k−1)/k
X
≤ F1 mki Fk
i=1
1−1/k 1/k 2−1/k
≤ N Fk Fk
1−1/k 2
= N Fk

where in the penultimate line we use the power mean inequality:


!1/k
1 X 1 X k
mi ≤ mi .
N i N i

Hence if we were to sample X1 , . . . , Xs independently as above (for some s to be deter-


mined) and then take X to be their mean, then by Chebyshev,
Var(X) N
P(|X − Fk | ≥ λFk ) ≤ 2 2
≤ 2 1/k
λ Fk λ sN

so that if s = N 1−1/(2k) we already have a saving in the number of memory bits since
we still only require O(s(log N + log m)) bits of memory. But one can do better; let
(X1 , . . . , Xs ) be independently sampled according to X as above, and let Y be their
1−1/k
mean - only this time, we take s = Cnλ2 for some constant C, which does not quite
give the high probability estimate we want but rather P(|Y − Fk | > λFk ) ≤ 1/C. But
repeat this process r times (for some r to be determined), and then report the value
Z = Median(Y1 , . . . , Yr ).

random variable Ỹi to equal 1 if Yi ∈ [Fk −λFk , Fk +λFk ] and zero otherwise
Define the P
and let Z̃ := ri=1 Ỹi , so that we can bound tail probabilities of Z̃ by the distribution
of the Binomial variable Bin(r, 1 − 1/C). If Z lies outside [Fk − λFk , Fk + λFk ] then Z̃
is less than r/2. If C = 8, say, one√can again use the Chebyshev bound (we omit these
details) to show that for r = O(1/ ε) one has P(|Z − Fk | > λFk ) ≤ ε.

But again, this is not optimal; the Binomial distribution approximates the Gaussian
random variable for sufficiently large r, so deviation from the expectation is an expo-
nentially unlikely event. A more precise form of this appears in the next chapter which
establishes exponential decay away from the expected value for the Binomial distribu-
tion. Thus, (and these details will be more clear in the next chapter) one can take

65
r = O(log(1/ε)) and the proof is complete.

The last point is to deal with this when m is not known apriori. In that case, start
with m = 1, and choose ap as in the randomized algorithm stated above. If m is not one,
we update m = 2, and replace p = 1 with p = 2 with probability 1/2. More generally,
having reached m0 , if m > m0 then we replace p with m0 + 1 with probability m01+1 . It
is not hard to see that this keeps the argument intact and the implementation still only
needs O(log m + log N ) bits.

Remark: We have throughout assumed implicitly, that m is not much larger than a
polynomial in N , but if m grows, say exponentially with N , then there are older results
that give a similar saving in memory.
The paper [3] includes several other interesting results. For instance, they show that the
space complexity results here are almost best possible: for k ≥ 6, randomized algorithms

need at least Ω(n1−5/k ) memory bits, and that the estimation of F∞ requires Ω(N ) bits.
Another beautiful result is that the estimation of F2 can actually be done with only
O(log N ) memory bits. This uses the fact that there is a simple deterministic construction
of a set (using what are known as BCH codes) of O(N 2 ) {-1,1}-valued vectors of length
N which are four-wise independent, i.e., any 4 of the coordinates are independently
distributed amongst the possible 4-tuples of {−1, P 1}. If v := (v1 , . . . , vN ) is randomly
picked from the above set, define X := hv, Ai = N i=1 vi ai . For the details and other
interesting results, we refer the reader to the paper [3].

6.5 Uniform Dilations


We start with some definitions. By T we mean the 1-dimensional torus, i.e., T := R/Z.
For 0 < ε < 1, a subset X ⊂ T is called ε-dense if it meets every interval in T of length ε.
A dilation of X ⊂ T is nX := {nx : x ∈ X} for some integer n where the multiplication
is performed modulo one, i.e., in T. For any ε > 0, Berend and Peres defined the integer
k(ε) to be the least integer k such that for any X ⊂ T of size at least k, there is some
dilation nX which is ε-dense. Moreover, they showed that

Ω(1/ε2 ) ≤ k(ε) ≤ O(1/ε)O(1/ε)

and posed the question of determining the correct order of k(ε). This was achieved by
Alon and Peres, who proved the following theorem.

Theorem 41. Given γ > 0, there exists ε0 = ε0 (γ) such that for ε < ε0 , every set T ⊂ T
of cardinality at least ε−(2+γ) has an ε-dense dilation nX. In other words,

Ω(ε−2 ) ≤ k(ε) ≤ ε−(2+γ)

for all γ > 0.

66
We will see a proof of this in the special case when X consists entirely of rationals
with the same prime denominator p so that every element in X is of the form x/p. Under
this assumption, we first observe that the problem reduces to considering dilations of
subsets in the finite field Fp . In this case, the lower bound is of th eright order upto a
multiplicative constant.

To state the precise form of the stronger result, suppose p is a prime and ε > 0. De-
fine k(ε, p) to be the minimum integer k such that the following holds. For every subset
X ⊂ Fp of size at least k, there exists n such that nX intersects every interval of size at
least εp. Here, by an interval of length r we mean a set of the form {a, a+1, . . . , a+r −1}.
In words, this states that when a set is of size then some dilate of this set is fairly well-
spread out, and so touches all the intervals of length εp.

Here is the theorem of Alon and Peres [4] in its exact form.
Theorem 42. For every prime p and 0 < ε < 1 for which εp is an integer,
4
k(ε, p) ≤ .
ε2
Proof. We shall omit floors and ceilings in our presentation below for convenience. Let
X := {x1 , . . . , xk } ⊂ F∗p . At first glance, the main issue is that there are p different
intervals of length εp while any dilate aX contains only O(ε−2 ) elements, so it seems
quite a task for the dilate to meet every interval of size εp. But a simple trick makes this
task realistically feasible. Set s = 2/ε and let I1 , . . . , Is be disjoint intervals of length εp
2
that partition Fp . Since each interval I of length εp necessarily contains one of the Ii , it
suffices to show that there exists a ∈ Fp such that |aX ∩ Ii | > 0 for each of these intervals.

As with our previous rules of thumb, it is quite natural to try a random dilate of X.
Let a ∈ Fp be a random element and for a fixed interval Ii in the partition above, observe
that
X εk
E(|aX ∩ Ii |) = P(a ∈ {x/x1 , . . . , x/xk }) = .
x∈I
2
i

As before, to compute the variance, we have


X εk X
E(|aX ∩ Ii |2 ) = P(x, y ∈ aX) = + P(x, y ∈ aX)
x,y∈Ii
2 x6=y
x,y∈Ii

and the last double sum poses a bit of a problem. The brilliant idea of Alon and Peres
was to modify the random process so as to make this sum computable. And to do that,
they consider affine translates of the set as well.

Indeed, pick a, b ∈ Fp independently and consider the set aX + b. The key point is
that if there are a, b such that aX + b meets every interval of size εp, then since translates

67
of intervals are again intervals, the same holds for aX as well! Furthermore, for x 6= y
the events x ∈ aX + b and y ∈ aX + b are pairwise independent, so we have
X εk
E(|(aX + b) ∩ Ii |) = P(x, y ∈ aX + b) =
x∈Ii
2
X
Var(|(aX + b) ∩ Ii |) = Var(1x∈aX+b )
x∈Ii
εk
≤ .
2
Hence by Chebyshev,
2
P((aX + b) ∩ Ii = ∅) <
εk
which implies that
4
P(aX + b ∩ Ii = ∅ for some 1 ≤ i ≤ s) ≤ <1
ε2 k
for k as in the theorem.
Remark: To move from this special case to all intervals in T as in the main theorem
stated at the beginning of this section, the idea is to pick a large enough A, and pick
a ∈ {1, . . . , A} randomly, and b ∈ T uniformly, and consider the affine translate aX + b
as before. As before, we shall fix a partition of T into intervals of size ε/2 and fix such
an interval I. Again, E(|(aX + b) ∩ I| = εk/2, but now computing the variance is a
little more complicated. If we write X = {x1 , . . . , xk } then it is not hard to show that
computing the variance of the aforementioned random variable can be estimated in terms
of the differences xi −xj . The technical ideas deal with how one can control the covariance
terms, and this involves a few more subtleties than the simple result above.
The paper [4] contains several other general results on when one can find dilations that
are ε-dense. For instance, they also show that one can pick a dilate n which is prime for
which nX is ε-dense, and further, if k ≥ ε−(4+γ) then there is an ε-dense dilation of the
form n2 X as well. We direct the interested reader to the paper [4] for other results.

6.6 Resolution of the Erdős-Hanani Conjecture: The Rödl ‘Nibble’


One of the most effective proof techniques in Combinatorics is the method of induction.
How would the method of induction blend with the probabilistic method? How does one
effectively carry out an ‘inductive probabilistic method proof’ ?

The Rödl ‘Nibble’ refers to a probabilistic paradigm (pioneered by Vojtech Rödl) for
a host of applications in which a desirable combinatorial object is constructed via a ran-
dom process, through a series of several small steps, with a certain amount of control over
each step. Subsequently, researchers realized that Rödl’s method can be extended as a

68
paradigm to a host of other constructions, particularly for coloring problems in graphs,
and matchings/coverings problems in hypergraphs. Indeed, the proof of Erdős-Hanani
conjecture - the result that launched the Rödl Nibble - is an instance of a covering prob-
lem of a specific hypergraph. In this section, we shall see a resolution of the Erdős-Hanani
conjecture following a latter simplification by Pippenger and Spencer [22].

We start with a definition. As  always, [n] denotes the set {1, . . . , n}. Suppose r, t ∈ N.
An r-uniform covering for [n] t
is a collection A of r-subsets of [n] such that for each
[n]

t-subset T ∈ t , there exists an A ∈ A such that T ⊂ A. An r-uniform packing for
[n] [n]
 
t
is a collection A of r-subsets of [n] such that for each t-subset T ∈ t
, there exists
at most one A ∈ A such that T ⊂ A.

When t = 1, if r divides n, then there obviously exists a collection A of r-subsets of


[n], |A| = n/r, such that A is both an r-uniform covering and packing for [n]

1
= [n]. In
general, there exists a covering of size dn/re and a packing of size bn/rc.

Let M (n, r, t) be the size of a minimum covering, and m(n, r, t) be the size of a
maximum packing. A simple combinatorial counting argument shows that
n

t
m(n, r, t) ≤ r ≤ M (n, r, t).
t

Indeed, if one were to consider a covering, then for each t-subset, there is at least one
r-subset containing it; conversely, each r-subset contains rt t-subsets, so the first inequal-
ity is obtained. The argument is similar for maximal packings. It then  rseems natural to
n

ask if there exists a collection A of r-subsets of [n] with size |A| = t / t such that A is
both an r-uniform covering and packing for [n]

t
. This is called a t − (n, r, 1) design and
is also referred to as a Steiner t-design.

In the 60s, Erdős and Hanani proved that

M (n, r, 2) m(n, r, 2)
lim n
 r = lim n r = 1.
n→∞
2
/ 2 n→∞
2
/ 2

and further conjectured that this is true for all positive integers r ≥ t. In a sense, the
conjecture posits that as n grows large, one gets more room to attempt to fit these r-
subsets to cover all t-subsets, so as n gets larger, one ought to be able to get closer to
as tight a packing (or covering) as one can. This conjecture was settled affirmatively by
Vojtech Rödl in 1985.

Here, we consider a more general problem. Suppose r ≥ 2 is a fixed integer. By an


r-uniform hypergraph we mean a hypergraph H = (V, E) on a set V of size n such that

69
each e ∈ E has size r. The degree of a vertex in a hypergraph is the same as the one we
have encountered in the graph case, i.e., d(x) = |{E ∈ E : E 3 x}|. Given an r-uniform
hypergraph H on n vertices which is D-regular for some D, i.e. d(x) = D for all x ∈ V ,
we seek a covering (resp. a packing) of H which is as tight as possible, i.e., a covering
(resp. packing) of size approximately n/r. This more general question subsumes the
[n]

Erdős-Hanani question: consider the hypergraph H = (V, E) where V = t and the
edges of H correspond to r-subsets of [n] with each such r-subset E containing  all the
r
vertices x that correspond to t-subsets of E. It is easy to see that this is an t -uniform
regular hypergraph with degree D = n−t r−t
.

Let ε > 0. Note that in this new formulation, if we can find a packing of size (1−ε)n
r
,
then there are at most εn vertices uncovered. Hence, we can find a covering of size
(1−ε)n
r
+ εn = (1 − (r − 1)ε) nr . On the other hand, if we can find a covering A of size
(1+ε)n
r
, then for every x which is covered by d(x) hyperedges, we delete d(x) − 1 of them.
The number of deleted edges is at most
 
X X X (1 + ε)n
(d(x) − 1) = d(x) − 1 = |{(x, E) : E ∈ A}| − n = · r − n = εn
x∈V x∈V x∈V
r

(1+ε)n
so there is a packing of size at least r
− εn = (1 − (r − 1)ε) nr . The upshot is:

n
Finding a covering of size approximately r
is equivalent to finding a packing of size
approximately nr .

We shall try to obtain a covering A of size ≤ (1 + ε) nr for n sufficiently large. Since


we seek a covering, we do not have to worry if some vertex is covered more than once.

Let us try a simple probabilistic idea first to see what its shortcomings are. Suppose
we decide to pick each edge of the hypergraph H independently with probability p. We
seek a collection E ∗ with |E ∗ | ≈ nr ; if we start with an almost regular graph of degree
D, then r|E| ≈ nD, so that implies that we need p ≈ D1 . Let us see how many vertices
get left behind by this probabilistic scheme. A vertex x is uncovered only if every edge
containing it is discarded. In other words, the probability that a vertex x gets left behind
is approximately (1 − 1/D)D ≈ 1/e. This is now a problem because this implies that the
expected number of vertices that go uncovered is approximately n/e which is a constant
proportion of the total number of vertices.

Rödl’s idea was to, as we described in the beginning of this section, attempt an inductive
procedure: We pick a small number of edges, so that the rest of H is as ‘close as possible’ to
the original one. If the inductive procedure were to take over for the modified hypergraph,
then after several “nibbles” into the hypergraph we are left with a very small proportion
of the vertex set that is yet uncovered. But for these, we pick an arbitrary edge for each
of these vertices to get a covering for the entire vertex set.

70
However, note that after each step, some of the regularity conditions of the hyper-
graph are bound to be violated, so for the inductive procedure to apply to the smaller
hypergraph the hypotheses would have to be milder. We will get to this point momen-
tarily.

As H is D-regular, r|E| = |{(x, E) : x ∈ E ∈ E}| = nD ⇒ |E| = nD r


. Since our
modified hypergraph ought to be as close as possible to the original hypergraph, we need
to pick a few - but not too few! - edges in the first step. If we want to pick about εnr
edges, we will need P(E is picked) = Dε .

This sets our paradigm into motion. In the first ‘step’, each edge E ∈ E is picked
independently with probability p = Dε . If E ∗ is the set of chosen edges then we have

εn
E[|E ∗ |] = .
r

Also, the probability that a vertex x is not covered in this process is (1 − ε/D)d(x) ≈ e−ε .

In the rest of this section, and also in subsequent chapters, we shall adopt Pip-
penger and Spencer’s wonderfully terse notation. We shall write x = a ± b to mean
x ∈ (a − b, a + b). Also, since there will be many constants that keep popping up, we
shall throw in new variables to denote various small quantities which can all be tied down
eventually, if need be.

Getting back to our first step, after a ‘nibble’, the rest of the hypergraph is no longer
regular, so as mentioned earlier, we need to make the hypotheses milder, so we propose:
Given an r-uniform hypergraph H on n vertices such that d(x) = D(1 ± δ) for all x ∈ V
for some small δ > 0, we want to find a covering of size ≈ nr . For this to work, our first
step necessarily has to reduce the degrees of all the vertices that are not covered during
round one, and that is still a little too strong. So, the hypothesis needs to be milder:

Given an r-uniform hypergraph H on n vertices such that except at most δn vertices,


d(x) = D(1 ± δ) for other vertices x ∈ V .

So under the milder hypothesis we wish to find a collection of edges E ∗ such that

1. |E ∗ | = εn
r
(1 + δ 0 ),
 S 
2. |V ∗ | = ne−ε (1 ± δ 00 ) where V ∗ := V \ E ,
E∈E ∗

3. For all x ∈ V ∗ except at most δ 000 |V ∗ | of the vertices, if d∗ (x) denotes its degree in
the residual hypergraph then d∗ (x) = D(1 ± µ).

71
To explain this requirement, let 1x = 1{x∈/ any edge of E ∗ } . If each edge is picked indepen-
dently with probability ε/D then
X ε d(x)  ε D(1±δ)
E(|V ∗ |) = 1− ≈ n(1 − δ) 1 − ≈ n(1 − δ)e−ε(1±δ) ≈ ne−ε (1 ± δ 00 ).
x∈V
D D

Furthermore,
X 
Var(|V ∗ |) = Var 1x
x∈V
X X
= Var(1x ) + Cov(1x , 1y )
x∈V x6=y

If d(x) = D(1 ± δ) and d(y) = D(1 ± δ), then

Cov(1x , 1y ) = E[1x,y ] − E[1x ]E[1y ]


 ε d(x)+d(y)−d(x,y)  ε d(x)+d(y)
= 1− − 1−
D  D 
 ε d(x)+d(y)  ε −d(x,y)
= 1− 1− −1
D D
ε
≈ e−2ε(1±δ) (e− D d(x,y) − 1)

where d(x, y) denotes the codegree of x and y, i.e., d(x, y) = |{E : x, y ∈ E}|.
ε
Note that e− D d(x,y) − 1 is very small provided that d(x, y)  D.  This r−t
is true in
[n] n−t
the original Erdős-Hanani problem, where V = t , since D = r−t = O(n ), while
1 ∪T2 |
d(x, y) = n−|T ≤ n−t−1
 
r−|T1 ∪T2 | r−t−1
= O(nr−t−1 )  D, where x and y corresponds to t-subsets
T1 and T2 respectively.

Before we make our speculation outright, there is one more aspect that suggests that
the hypothesis should be made milder. Suppose that for some vertex x the degree of
x is super large, i.e., D = o(d(x)). Since we wish to retain the r-uniformity of the
hypergraphs, our process would entail throwing away all edges that intersect some vertex
of V \ V ∗ to get to the modified hypergraph. But if d(x) is very large, then it is somewhat
likely that some edge containing x is chosen, and since x would get picked, all edges that
contain x will have to be discarded to get to the residual hypergraph, and we may lose
too many edges in this process. So, to prevent this, we may want d(x) = O(D) for all x.
This motivates the following tentative statement:
Lemma 43. (‘Nibble’ lemma) Suppose r ≥ 2 is a positive integer, and k, ε, δ ∗ > 0 are
given. Then there exist δ0 (r, k, ε, δ ∗ ) > 0 and D0 (r, k, ε, δ ∗ ) such that for all n ≥ D ≥ D0
and 0 < δ ≤ δ0 , if H is an r-uniform hypergraph on n vertices satisfying
(i) except at most δn vertices, d(x) = D(1 ± δ) for other vertices x ∈ V ,

72
(ii) d(x) < kD for all x ∈ V ,

(iii) d(x, y) < δD,

then there exists E ∗ ⊂ E such that

(a) |E ∗ | = εn
r
(1 ± δ ∗ );

 S 
(b) |V ∗ | = ne−ε (1 ± δ ∗ ), where V ∗ = V \ E ;
E∈E ∗

(c) Except at most δ ∗ |V ∗ | vertices, d∗ (x) = De−ε(r−1) (1 ± δ ∗ ), where d∗ is the degree on


the induced hypergraph on V ∗ .

We say that H is an (n, k, D, δ)-hypergraph when (i), (ii) and (iii) hold for H.
This lemma says that if H is an (n, k, D, δ)-hypergraph then it contains an induced
(n∗ , k ∗ , D∗ , δ ∗ ) -hypergraph H∗ where

δ∗ = δeε(r−1) ,
n∗ = ne−ε (1 ± δ ∗ ),
k∗ = keε(r−1) ,
D∗ = De−ε(r−1) (1 ± δ ∗ )

To see why these are the relevant new parameters, consider for instance, the parameter
δ;
δ ∗ De−ε(r−1)
d∗ (x, y) ≤ d∗ (x, y) < δD = δ ∗ D∗ forces δ = = δ ∗ e−ε(r−1)
D
and that gives δ ∗ = δeε(r−1) . Similarly for the parameter k, d∗ (x) ≤ d(x) < kD = k ∗ D∗
forces k ∗ = De−ε(r−1)
kD
= keε(r−1) .

Let us see if this lemma is good enough to take us through. If we repeat the nibble
t times (where t shall shortly be determined) then we have δ = δ0 < δ1 < · · · < δt with
δi = δi−1 eε(r−1) , and H = H0 ⊃ H1 ⊃ · · · ⊃ Ht . Note that this establishes a cover of size
t−1
P
|Ei | + |Vt | where
i=1
i
Y
−ε −εi
|Vi | = |Vi−1 |e (1 ± δi ) ≤ ne (1 + δj )
j=1

and
i
ε|Vi−1 | εne−ε(i−1) Y
|Ei | = (1 ± δi ) ≤ (1 + δj ),
r r j=1

73
so the size of the cover is
t−1 t−1
! t t
X X εne−ε(i−1) Y
−εt
Y
|Ei | + |Vt | ≤ (1 + δi ) + ne (1 + δi )
i=1 i=1
r i=1 i=1
t
! t
!
Y n X
= (1 + δi ) εe−ε(i−1) + re−εt
i=1
r i=1
t
!  
Y n ε −εt
≤ (1 + δi ) + re .
i=1
r 1 − e−ε

Pick t such that e−εt < ε - for instance take t = 2ε−1 log(1/ε). For this t, pick δ small
Qt ε
enough such that (1 + δi ) ≤ 1 + ε. Since lim = 1, the limit of this expression
i=1 ε→0 1 − e−ε
goes to nr as ε → 0. Therefore, all that remains is to prove the ‘Nibble’ Lemma 43.

Proof. (Proof of Lemma 43) We will use subscripts δ(i) to denote various small constants.
Keeping with the probabilistic paradigm, we pick each edge of H independently with
probability Dε . Let E ∗ be the set of picked edges.

We say x ∈ V is good if d(x) = (1 ± δ)D, else we say that x is bad. Note that

|{(x, E) : x ∈ E}| ≥ |{(x, E) : x good}| > (1 − δ)D · (1 − δ)n = (1 − δ)2 Dn.

On the other hand,

|{(x, E) : x ∈ E}| = |{(x, E) : x good}| + |{(x, E) : x bad}|


≤ (1 + δ)D · n + kD · δn.

So
(1 − δ)2 Dn Dn
≤ |E| ≤ (1 + (k + 1)δ) which gives
r r
Dn
|E| = (1 ± δ(1) ).
r
Hence X ε Dn εn
E[|E ∗ |] = P(E is picked) = (1 ± δ(1) ) = (1 ± δ(1) ).
E∈E
D r r
Let 1E = 1{E is picked} . By independence, Var(|E ∗ |) = Var(1E ) ≤ E[|E ∗ |]. By Cheby-
P
 E∈E
 ∗ |)
shev’s inequality, we get P |E | − E[|E |] > δ(2) E[|E |] ≤ δVar(|E
∗ ∗ ∗
2 E[|E ∗ |]2 . So if n  0, then
(2)

εn εn
|E ∗ | = (1 ± δ(1) )(1 ± δ(2) ) = (1 ± δ ∗ )
r r
74
with high probability, yielding (a).

Let 1x = 1{x∈/ any edge of E ∗ } . Note that


X ε d(x) X  ε D(1+δ)
E[|V |] = 1− ≥ 1− ≥ e−ε (1 − δ(3) ) · (1 − δ)n.
x∈V
D x good
D

On the other hand,


X  ε d(x) X  ε d(x)
E[|V ∗ |] = 1− + 1−
x good
D x bad
D
X  ε D(1−δ)
≤ 1− + δn
x good
D
≤ e−ε (1 + δ(4) ) · n + δn

So
ne−ε (1 − δ(3) )(1 − δ) ≤ E[|V ∗ |] ≤ ne−ε (1 + δ(4) + δeε )
implying
E[|V ∗ |] = ne−ε (1 ± δ(5) ).

Again, as we compute the variance, we see


X X X
Var(|V ∗ |) = Var(1x ) + Cor(1x , 1y ) ≤ E[|V ∗ |] + Cor(1x , 1y )
x∈V x6=y x6=y

where

Cov(1x , 1y ) = E[1x,y ] − E[1x ]E[1y ]


 ε d(x)+d(y)−d(x,y)  ε d(x)+d(y)
= 1− − 1−
D  D 
 ε d(x)+d(y)  ε −d(x,y)
= 1− 1− −1
D D
 
ε −δD
≤1· 1− − 1 ≤ eεδ − 1 which is small.
D

This implies Var(|V ∗ |) = o(E[|V ∗ |]2 ). By Chebyshev’s inequality, we get


  Var(|V ∗ |)
P |V ∗ | − E[|V ∗ |] > δ(6) E[|V ∗ |] ≤ 2 .
δ(6) E[|V ∗ |]2

So if n  0, |V ∗ | = ne−ε (1 ± δ(5) )(1 ± δ(6) ) = ne−ε (1 ± δ ∗ ) with high probability, yielding


(b).

75
To prove (c), suppose x survives after the removal of E ∗ . Fix an E ∈ E such that
E 3 x. We wish to estimate the probability that E also survives conditioned on the
assumption that x survives. Let FE = {F ∈ E : x ∈/ F, F ∩ E 6= ∅}. Then E survives if
and only if FE ∩ E ∗ = ∅.

Call E ∈ E bad if E contains at least one bad vertex. Suppose x is good, and E is
good. Then
r−1
 ε (r−1)(1±δ)D−( 2 )δD
P(E survives | x survives) = 1 − (1 ± δ(7) )
D
 ε (r−1)D
= 1− (1 ± δ(8) ).
D
Let Bad(x) := {E : E is bad and does not contain x}. If |Bad(x)| < δ(9) D, then
E[d∗ (x)] = De−ε(r−1) (1 ± δ(10) ).

Now, the question is: how many x have |Bad(x)| ≥ δ(9) D? Call x Incorrigible if x
is good but |Bad(x)| ≥ δ(9) D. We now want to bound the size of VINCOR := {x ∈ V :
x is incorrigible}. Note that
|{(x, E) : |Bad(x)| ≥ δ(9) D}| ≥ δ(9) D · |VINCOR |.
On the other hand,
|{(x, E) : |Bad(x)| ≥ δ(9) D}| ≤ |{(x, E) : E is bad}|
≤ r|{(x, E) : x is bad}|
≤ r(kD)(δn).
r(δn)k
Hence, |VINCOR | := δ ∗ n = δ(9)
. Therefore, except at most δ ∗ n vertices, the remaining
vertices x satisfy E[d∗ (x)] = De−ε(r−1) (1 ± δ(10) ).

Let 1E = 1{E survives} . For those x that are neither incorrigible nor bad,
X X
Var(d∗ (x)) = Var(1E ) + Cov(1E , 1F )
E∈E E6=F
X

≤ E[d (x)] + Cov(1E , 1F ) + δ(9) D · (1 + δ)D · 1
E6=F good
X X

≤ E[d (x)] + Cov(1E , 1F ) + Cov(1E , 1F )
E6=F good E6=F good
E∩F ={x} |E∩F |>1

+ δ(9) (1 + δ)D2
X
≤ E[d∗ (x)] + Cov(1E , 1F ) + (r − 1)δD · (1 + δ)D · 1
E6=F good
E∩F ={x}

+ δ(9) (1 + δ)D2 .

76
Now, denote by FE the collection of those edges that intersect E non-trivially. Then,

Cov(1E , 1F ) = E[1E,F ] − E[1E ]E[1F ]


 ε |FE ∪FF |  ε |FE |+|FF |
= 1− − 1−
D  D 
 ε  |F E |+|F F | ε −|FE ∩FF |
= 1− 1− −1
D D
 ε −|FE ∩FF |
≤ 1− −1
D
 ε −(r−1)2 δD 2
≤ 1− − 1 ≤ eε(r−1) δ which is small.
D
All these together imply Var(d∗ (x)) = o(E[d∗ (x)]2 ). By Chebyshev’s inequality, d∗ (x) =
De−ε(r−1) (1 ± δ ∗ ) with high probability.

Let N = |{x good : d∗ (x) 6= e−ε(r−1) D(1 ± δ ∗ )}|. Now Markov’s inequality gives
E[N ] < δ(11) n, so all except δ ∗ n vertices satisfy (c). This completes the proof of the
Nibble lemma, and hence of the proof of the Erdős-Hanani conjecture as well.
Remark: The theory of Steiner designs is one of the oldest problems in Design theory.
The existence and explicit constructions of Steiner 2-designs for all feasible parameters
(parameters (n, r) for which the corresponding numbers are integers) and very large set
sizes (n  0) is the pioneering work of R. Wilson [31, 32, 33] beginning in the early
70s. The problem of existence of Steiner t-designs for t ≥ 6 was open completely until
P. Keevash [17] in 2014 settled this by a tour-de-force algebraic probabilistic argument.
Keevash’s proof is a little too involved for us to include in this book, but we will see some
relevant ideas in later chapters.

77
7 Basic Concentration Inequalities - The
Chernoff Bound and Hoeffding’s inequality

It is often the case that the random variable of interest is a sum of independent ran-
dom variables. In many of those cases, the theorem of Chebyshev is much weaker than
what can be proven. Under reasonably mild conditions, one can prove that the random
variable is tightly concentrated about its mean, i.e., the probability that the random vari-
able is ‘far’ from the mean decays exponentially, and this exponential decay is crucial in
several probabilistic applications.

The distribution of the sum of i.i.d random variables, suitably normalized, behaves
like the Standard Gaussian; that is the import of the Central Limit Theorem (CLT for
short) in Probability, so in that sense, the Chernoff bound has its antecedents from much
earlier - indeed this goes back to Laplace. But the CLT is a limiting theorem, whereas
the Chernoff bounds are not. This qualitative difference is also very useful from an algo-
rithmic point of view.

In this chapter, we consider a few prototypes1 of such results along with some combi-
natorial applications.

7.1 The Chernoff Bound


One of the first such results is the following version of what we shall call a Chernoff type
bound or simply, a Chernoff bound:

Proposition 44 (Chernoff Bound). Let Xi ∈ {−1,P 1} be independent random variables,


with P[Xi = −1] = P[Xi = 1] = 12 , and let Sn = ni=1 Xi . For any a > 0 and any n,
2
P[Sn > a] < e−a /2n .
1
Actually it is a misnomer to call them Chernoff bounds because these also date back to Chebyshev.
But they were independently discovered by Chernoff, and the name has stuck since.

79
Proof. Consider eλXi , with λ to be optimized. Then E[eλXi ] = (eλ + e−λ )/2 = cosh(λ).
Taking the Taylor expansion, we see that
∞ ∞
 X λ2k X (λ2 /2)k 2
E eλXi = = eλ /2

<
k=0
(2k)! k=0 k!
Since the Xi are independent,
 P  Y 2
E eλSn = E e λXi = E[eλXi ] = cosh(λ)n < eλ n/2
 
i

By Markov’s Inequality,
 E[eλSn ] 2
P eλSn > eλa ≤ < eλ n/2−λa

eλa
2
Since P[Sn > a] = P[eλSn > eλa ], we see that P[Sn > a] < eλ n/2−λa . Optimizing this
2
bound by setting λ = a/n, we see that P[Sn > a] < eλ n/2 , as desired.
Proposition 44 can be generalized and specialized in various ways. We state two such
modifications here.
Proposition 45 (Chernoff Bound (Generalized Version)). Let p1 , . . . , pn ∈ [0, 1], and let
Xi be independent random variables such that PnP[Xi = 1 − pi ] = pi and P[Xi = −pi ] =
1 − pi , so that E[Xi ] = 0 for all i. Let Sn = i=1 Xi . Then
2 /n 2 /n
P[Sn > a] < e−2a and P[Sn < −a] < 2e−2a
Letting p = n1 (p1 + . . . + pn ), this can be improved to
2 /pn+a3 /2(pn)2
P[Sn > a] < e−a
Proposition 46 (Chernoff Bound (Binomial Version), see [16]). Let X ∼ Binomial(n, p),
and let 0 ≤ t ≤ np. Then
−t2
 
P[|X − np| ≥ t] ≤
2(np + t/3)
2 /3np
and the last expression is at most 2e−t if t ≤ np.
In all three cases, the independence assumption can be removed while preserving the
exponential decay (although with a worse constant).

Before we move on to some applications, we make a quick remark. While the afore-
mentioned version of the Chernoff bound holds always, its efficacy, especially when we
wish to establish that some event occurs with high probability only works when np → ∞.
If p = O(1/n) so that np = O(1) then this bound does not work as well. And this is again
an observation that goes back to Poisson; the Binomial distribution, suitably normalized,
can be well approximated by the standard Gaussian when the expected value goes to
infinity with n, and if the expected value is bounded by a constant, then for large n, the
behavior is more like the Poisson. We will return to this point in a later chapter.

80
7.2 First applications of the Chernoff bound
We start with a return to a result from a previous chapter. Recall the randomized
algorithm to determine the frequency moments using a sub-linear number of bits. We had
a sequence of random variables (Y1 , . . . , Yr ) with E(Yi ) = Fk and P(|Y −Fk | > λFk ) ≤ 1/8.
Our final report was the random variable Z = Median(Y1 , . . . , Yr ) and our interest was
in obtaining a bound for r such that P(|Z − Fk | > λFk ) ≤ ε. Towards that end, we
had defined the random variable Ỹi to equal 1 if Yi ∈ [Fk − λFkP , Fk + λFk ] and zero
r
otherwise, so that Ỹi is distributed as Ber(7/8). Setting Z̃ := i=1 Ỹi allows us to
estimate this probability by the bounds of a Binomial random variable Bin(r, 7/8). If
Z ∈/ [Fk − λFk , Fk + λFk ], then Z̃ is√ less than r/2. The second moment method gives
P(|Z − Fk | > λFk ) ≤ ε for r = O(1/ ε). Instead, if we use the Chernoff bound, then we
have  
−9r
P(|Z − Fk | > λFk ) ≤ P(Z̃ < r/2) ≤ exp
126
1

so that we can, as claimed earlier, take r = O log( ε ) to get the same probability bound
as stated.

7.3 Discrepancy in hypergraphs


The notion of Discrepancy is the topic of very deep study and there are even a few mono-
graphs dedicated to this (see []). Here, we shall contend ourselves with a very basic result.
Given a hypergraph H = (V, E) and a 2-coloring c : V (P H) → {−1, 1}, the discrepancy
of an edge E for the coloring c is simply discc (E) := | v∈E c(v)|. The discrepancy of
the coloring c is the maximum discrepancy that c produces amongst the edges of H,
i.e., disc(c)) := maxE∈E discc (E)}. The discrepancy of the hypergraph is the minimum
discrepancy amongst all two colorings of H, i.e., disc(H) := minc disc(c). In words, the
discrepancy of a hypergraph attempts to measure how equitably one can partition the
vertex set into two parts in the sense that that every edge of the hypergraph gets parti-
tioned as equitably as possible. One of the most celebrated results in Discrepancy theory
is the theorem of Spencer that √ states that a hypergraph on n vertices and O(n) edges
has a discrepancy of order O( n). This was subsequently given an algorithmic proof by
N. Bansal [9] and several others, with the current ‘book proof’2 due to T. Rothvoß [25].

But here, we shall prove a much more modest statement, which is also the starting
point for many of the improved results in this direction.
Proposition
√ 47. Suppose H is a hypergraph on n vertices and m edges. Then disc(H) =
O( n log m).
Proof. To show an upper bound on the discrepancy, we need to exhibit a coloring c for
which the discrepancy is small. Pick a random coloring, i.e., for each v assign it the color
2
This was Erdős’ notion of the best/cleanest possible proof of any result. For more on this, see [1].

81
P
1 or −1 independently. For an edge E with |E| = k, let XE := v ∈ Ec(v). Then
2
E(XE ) = 0, so √ using the Chernoff bound to decree P(|XE | > t) ≤ 2 exp−t /2k < 1/m
suggests t = O( n log m) since k ≤ n. By choice, this implies that the expected number
of edges with discrepancy greater than t is less than one, so again by the method of
expectations(!), there is a coloring c such that all the edges discrepancies are at most t.
This completes the proof.

Remark: It is not hard√ to show that there are hypergraphs with n vertices and n edges
with a discrepancy of Ω( n), so the result of Spencer is asymptotically tight. But as we
have already seen with the Rödl nibble method, some probabilistic constructions cannot
achieve the desired goal in a single step process, and the proofs for the sharp discrepancy
follow a random process. We will address random processes in a later chapter.

7.4 Projective Planes and Property B


Given a hypergraph H = (V, E), we say that H has property B if there exists S ⊆ V such
that for all E ∈ E both S and S intersect E. This is an extension of the notion of graph
colorings to hypergraphs, where the notion of proper coloring now (considering that for
hypergraphs one has more than one interpretation of what a proper coloring ought to be)
is that there is no monochromatic edge.

Unlike the graph case where there is a very simple algorithmic characterization of
2-colorability, the problem of deciding when a hypergraph has property B is far from well
understood. Indeed, one of Erdős/ oldest and returning motifs was to determine m(n)
the minimum number of edges in an n-uniform hypergraph that does not have property B.

One of the earliest observations regarding property B was the following due to Lovász,
which effectively comes from an algorithmic procedure to attempt to 2-color the vertices:
If H is such that |E1 ∩ E2 | =
6 1 for all pairs of distinct edges E1 , E2 , then H is 2-colorable,
and therefore has property B. Indeed, number the vertices 1, . . . , n. Color each vertex,
in order, avoiding monochromatic edges. It is easily seen that by the assumptions on
H, this must yield a valid coloring. So for now, let us work with a situation where the
hypergraph violates this condition in the extreme, i.e., suppose that H has the property
that every pair of edges meet at exactly 1 vertex. Examples of such hypergraphs arise
from the projective planes which we have encountered in Chapter 1. The Fano Plane,
shown here with each edge represented as a line, shows that such hypergraphs are not
necessarily 2-colorable. Following Erdős, we now define a stronger version of property B,
which we will refer to as Property B(s).

Definition 48 (Property B(s)). A hypergraph H = (V, E) has property B(s) if there


exists S ⊆ V such that for every edge E, 0 < |E ∩ S| ≤ s.

82
If we set s = n − 1, then for n-uniform hypergraphs, property B(s) is the same as the
usual property B.

Recall the notion of a Projective Plane of order n, denoted by ¶n : It is an (n + 1)-


uniform hypergraph ¶n := (P, L) with |P| = n2 + n + 1 (called points), |L| = n2 + n + 1
(called lines) such that every pair of points determines a unique edge, and every pair of
lines intersect in a unique point.
Theorem 49 (Erdős, Silverman, Steinberg). There exist constants k, K such that for
all n there exists a subset S of the points of ¶n with k log n ≤ |L ∩ S| ≤ K log n for all
L ∈ L.
f (n)
Proof. Choose S at random, with each point x placed in S with probability p = n+1
, for
some f (n) to be determined later.

Fix a line L, and let SL = |S ∩ L|. Note that E[SL ] = (n + 1)p = f (n). By the
Chernoff Bound, P[|SL − f (n)| > f (n)/2] < 2e−f (n)/12 . Since ¶n contains n2 + n + 1 lines,
P[There exists L such that |SL − f (n)| > f (n)/2] < 4e−Cf (n) n2
for some absolute constant C. Therefore, if eCf (n) > Ω(n2 ), a set S with the desired
property exists. This in turn tells us that setting f (n) = O(log n) gives us the stated
result for sufficiently large n.
Remark: Erdős conjectured that for the projective planes, a much stronger state-
ment holds: There exists an absolute constant s such that for all sufficiently large n, the
projective plane of order n has property B(s).

The problem of determining m(n) which was alluded to earlier remains one of the
most elusive problems in extremal combinatorics. We will, later in this book, see a proof
of the statement r 
n n
Ω 2 ≤ m(n) ≤ O(n2 2n )
log n
which still marks the best known result to date.

83
7.5 Graph Coloring and Hadwiger’s Conjecture
IN this section we see a counterexample to a conjecture of Hájos in an attempt to solve
the famous Hadwiger conjecture. To get there, we first need a couple of definitions. For
an edge e = uv in a graph G, the contraction of e is a graph denoted G/e obtained by
deleting the vertices u, v and replacing it with a new vertex ve which is adjacent to all
the neighbors of u and v counting with multiplicity. In other words, if a vertex w was
adjacent to both u, v in G, then ve has two edges to w in G/e.

Definition 50 (Graph Minor). Given a graph G, H is a minor of G if H can be obtained


from G by a sequence of edge deletions edges and edge contractions and deletions of
isolated vertices.

Definition 51 (Subdivision). A graph H is a subdivision of G if H can be made isomor-


phic to a subgraph of G by inserting vertices of degree 2 along the edges of H.

One can think of H as a subgraph of G in which disjoint paths are allowed to act as
edges. Note that if H is a subdivision of G, then H is also a minor of G; however, the
converse is false in general.

The deep conjecture of Hadwiger is the following

Conjecture 52 (Hadwiger’s Conjecture). Let G be a graph with χ(G) ≥ p. Then G


contains Kp as a minor.

Paraphrasing, Hadwiger’s conjecture states that for a graph G to be p-colorable, the


clique on p vertices is forbidden as a minor. The best known result towards settling
Hadwiger’s conjecture is the celebrated Robertson-Seymour Theorem on graph minors
[24]which shows that p-colorability is characterized by a finite set of forbidden minors.

Hadwiger’s Conjecture is notoriously diffcult. Indeed, the special case of p = 5is


equivalent to the four color theorem for planar graphs3 . One way is straightforward: if
χ(G) ≥ 5 then by the conjecture G contains K5 as a minor and is therefore nonplanar as
a consequence of Kuratowski’s theorem (see [29]). But the other way needs more work.
The conjecture is currently open for p > 6. In fact, all known proofs of Hadwiger’s con-
jecture for p = 5, 6 use the four-color theorem.

Due to the apparent difficulty of Hadwiger’s Conjecture, Hajós strengthened the con-
jecture to state that if χ(G) ≥ p, then G contains Kp as a subdivision. But as it is usually
the case, this strengthened conjecture turned out to be false as was shown by Catlin via
an explicit counterexample. However, the motivating question really is: How good a
conjecture is the strengthened version? If the counterexamples were freak instances, then
3
The four color theorem states that ever planar graph, i.e., graph that can be embedded on the plane,
is 4-colorable.

84
maybe one at least had an asymptotically strong statement since subdivisions are eas-
ier to understand from a verification perspective unlike minors. But later, Erdős and
Fajtlowicz put this possibility to rest showing that the conjecture is almost never true.
n
Theorem 53 (Erdős, Fajtlowicz). There exist graphs G such that χ(G) ≥ 3 log n
and G
has no K3√n subdivision.
Proof. Let G = (V, E) be a random graph on n vertices, with each edge placed in the
graph with probability 1/2. We first show that with high probability, G has large chro-
matic number, and then also that G has no large Kp subdivision.

Since χ(G) ≥ n/α(G), let us examine an upper bound for α(G). We have
P[α(G) ≥ x] = P[there exists a set of x vertices which form an independent set]
   x
n −(x2) n
≤ 2 ≤ x−1
x 2 2
Set x = 2 log n + 3 so that 2(x−1)/2 = 2n; then
 2 lg n+3
1 1
P[α(G) ≥ x] ≤ = 2
2 8n
so with high probability, α(G) ≤ 2 log n + 3 < 3 log n.

Now suppose that G contains Kt as a subdivision. Since Kt contains 2t edges, G




must contain as many disjoint paths. Now, each vertex of G must either be a vertex of
the Kt subdivision, or else be contained in at most one of the paths. Since there are n
t

vertices in G, √
we end up forcing many of these paths to be single edges if 2 = Ω(n).
Setting t = 3 n the argument outlined gives us that at least 3n of the paths in the
subdivision of Kt must be single edges of G.

Fix a set U ⊂ V , |U | = 3 n. If U forms the vertices of a K3√n subdivision, then
e(U ) ≥ 3n. By the Chernoff Bound we have
1
P[|e(U ) − E[e(U )]| ≥ E[e(U )]] ≤ 2e−E[e(U )]/48
4
so that √
P[e(U ) ≥ 3n] ≤ 2e−(9n−3 n)/192
< e−n/25
which implies that
P[U forms the vertices of a K3√n subdivision] < e−n/25
Hence by the union bound
   √ 3√n
n −n/25 e n
P[G has a K3√n subdivision] < √ e < e−n/25 = o(1)
3 n 3

85
as n → ∞. So with high probability, G does not contain a K3√n subdivision.
n
Thus, it follows that with high probability, χ(G) ≥ 3 log n
and G has no K3√n subdi-
vision, as desired.

Remark: This result shows that the chromatic number of a graph is a more esoteric
global feature of the graph. In fact, the determination of the chromatic number of the
random graph Gn,p is an interesting problem which still has many unresolved facets, and
we will examine some related results in the forthcoming chapters.
The fact that the chromatic number of a graph is a somewhat enigmatic invariant is
further evidenced by the following theorem due to Erdős: Given ε > 0, and an integer
k, there exist graphs G = Gn (for n sufficiently large) such that χ(G) > k, while every
induced subgraph H on εn vertices satisfies χ(H) ≤ 3. This was again based on a random
graph construction, and the interested reader can see this result in [5].

7.6 Why the Regularity lemma needs many parts


One of the most fundamental results in extremal graph theory is the celebrated Regularity
lemma of Szemerédi, and the result is essentially a probabilistic paradigmatic statement
for dense graphs4 : Every graph can be decomposed into a bounded number of parts such
that the graph between these parts is basically ‘random-like’.

To make this more precise, we need the notion of an ε regular partition. For a pair
of sets (not necessarily disjoint) U, W of vertices of a graph G, we denote by e(U, W )
the number of pairs (u, w) ∈ U × W such that uw ∈ E(G)and by teh density of the
pair (U, W ) we means d(U, W ) := e(U,W )
|U ||W |
. A pair (u, W ) is called ε-regular if whenever
A ⊂ U, B ⊂ W with |A| ≥ ε|U | and |B| ≥ ε|W | then the densities of the pairs (A, B)
and (U, W ) differ by at most ε, i.e., |d(A, B) − d(U, W )| ≤ ε. A partition of the vertex
set V = ∪ki=1 Vi is called an ε-regular partition if the number of irregular pairs (Vi , Vj )
from among these sets is at most εk 2 . The regularity lemma then states the following:
Given ε > 0, there exists M = Oε (1) such that every graph admits an ε-regular partition
into at most M parts. We will see this in greater detail in a subsequent chapter (Chapter ).

The regularity lemma has been found to be of deep consequence in extremal graph
theory, and since the proof procedure has an algorithmic flow to it, the regularity lemma
also finds applications in Theoretical Computer Science (Property Testing). However,
one drawback in the algorithmic application of the Regularity lemma is that the number
M that is obtained through the proof is a tower of 2’s with the height of the tower is
Ω(1/ε5 ). This makes the result immensely interesting and useful theoretically, but com-
pletely useless from a practical point of view. The natural question that arises is: Do we
really need such a large M ? Gowers [13]) settled this question in the affirmative. More
precisely, there exist graphs G for which every ε-regular partition has partition size a
4
If the graph is not dense, then the statement in the usual version is completely tautological.

86
tower of 2’s with height 1/εc where 0 < c < 1 is an absolute constant.

While we shall not prove Gowers’ result here, we shall give an easier and a weaker
version of his result which also appears in the same paper.

Theorem 54. Given 1/1024 > ε > 0, there exist graphs such that every ε-regular parti-
tion has size a tower of 2’s with the height of the tower being of the order log2 (1/ε).

87
8 Property B: Lower and Upper bounds

We have seen a brief glimpse of Property B in hypergraphs in the preceding chapter.


As with many other problems and notions in this book, the main study into this notion
was pioneered by Erdős and some of the problems introduced then are still open. In this
brief digression of a chapter, we shall look at the current bounds (lower and upper) in
the corresponding problem.

8.1 Introduction
For an integer n ≥ 2, an n−uniform hypergraph H is an ordered pair H = (V, E), where
V is a finite non-empty set of vertices and E is a family of distinct n−subsets of V. A
2-coloring of H is a partition of its vertex set hv into two color classes, R and B (for red,
blue), so that no edge in E is monochromatic. A hypergraph is 2-colorable if it admits a
2-coloring. For an n−uniform hypergraph, we define

m(n) := arg min {H = (V, E) is 2-colorable} (8.1)


|E|

2-colorability of finite hypergraphs is also known as “Property B”. In [?], Erdős showed
that 2n−1 < m(n) < O(n2 2n ).

Let us start with a brief look at these results. The first of these, namely, that any
n-uniform hypergraph with at most 2n−1 edges is 2-colorable is an immediate consequence
of considering a random coloring and computing the expected number of monochromatic
edges. The upper bound for m(n) too follows from a simple randomized construction,
and here is the gist.
 q 
1 n
n n
In [10], Beck proved that m(n) = Ω(n 3 2 ) and this was improved to m(n) = Ω 2 log n

by Radhakrishnan et al in [23]. In fact, Erdős-Lovaśz conjecture that m(n) = Θ(n2n ).


Here, we outline the proofs of both Beck’s and Radhakrishnan’s results.

89
We will begin with some notation, if an edge S ∈ H is monochromatic, we will denote
it as S ∈ M,, and in addition, if it is red (blue), we write S ∈ RED (S ∈ BLU E). Also
for a vertex v ∈ V, v ∈ RED and v ∈ BLU E have a similar meaning. We shall freely
abuse notation and denote by RED (resp. BLU E) both, the set of points colored RED
as well as the set of edges of H that are colored RED and this should not create any
confusion, hopefully.

8.2 Beck’s result


Theorem 55 ([10]).
1
m(n) = Ω(n 3 2n )
1
Proof. We will show that m(n) > cn 3 −o(1) 2n , getting rid of o(1) will need some asymp-
totic analysis which is not relevant to the class and hence is not presented here. Let
1
m := |E| = k2n−1 , we will show that k > cn 3 −o(1) . The hypergraph will be colored in two
steps.
Step 1: Randomly color all vertices with red or blue with probability 1/2 and indepen-
dently.
Step 2: Randomly re-color vertices that belong to monochromatic edges independently
with probability p.
For an edge S, S(1) denotes its status after step 1 and S(2) its status after step 2. For
a vertex v ∈ V, v(1) and v(2) have similar meanings. Let N1 denote the number of
monochromatic edges after step 1, then note that E(N1 ) = k. Also let N denote the
number of monochromatic edges after step 2. For an appropriately chosen p, we will
show that E(N ) < 1.
X X
E(N ) = P (S(2) ∈ M) = (P (S(2) ∈ RED) + P (S(2) ∈ BLU E))
S∈E S∈E
X
=2 P (S(2) ∈ RED)
S∈E
P (S(2) ∈ RED) = P (S(1) ∈ M, S(2) ∈ RED) + P (S(1) ∈
/ M, S(2) ∈ RED)
| {z } | {z }
P1 P2

It is easy to bound P1
pn + (1 − p)n
P1 = P (S(1) ∈ RED, S(2) ∈ RED) + P (S(1) ∈ BLU E, S(2) ∈ RED) =
2n
2(1 − p)n

2n
(8.2)
In (8.2), we used the fact that p is small, in particular p < 0.5, this will be validated in
the following analysis. Towards analyzing P2 , note that, for the vertices that were blue

90
after step 1 to have turned red, they must belong to blue monochromatic edges, i.e., for
each v ∈ S that is blue, there is an edge T such that T ∩ S 6= Φ and T ∈ BLU E. Define
EST := event S(1) ∈
/ M, T (1) ∈ BLU E, S ∩ T 6= Φ and S(2) ∈ RED
Then we have
X
P2 ≤ P (EST ) (8.3)
T 6=S

Let U := {v ∈ S \ T | v(1) ∈ BLU E} and EST U := event S ∩ T 6= Φ, T (1) ∈ BLU E,


U ∈ BLU E and S(2) ∈ RED, then
 
_ X
P (EST ) = P  EST U  ≤ P (EST U )
U ⊆S\T U ⊆S\T

For a fixed triple (S, T, U ), for U to even flip it must belong to some other edge which is
blue after step 1. But for an upper bound, let is just flip to red.
1 p
P(EST U ) ≤ p|S∩T |+|U | = (2p)|S∩T |−1 p|U |
22n−|S∩T | 22n−1
p
≤ p|U |
22n−1
Using this in (8.3), we have
n−1  
X p |U | p X n − 1 |U |
P(EST ) ≤ p ≤ p
22n−1 22n−1 |U |
U ⊆S\T |U |=0
n−1 n
(1 + p) p 2p(1 + p) 2p exp(np)
= ≤ ≤
22n−1 22n 22n
X 2mp exp(np)
=⇒ P(EST ) ≤ (8.4)
S6=T
22n

Using (8.2),(8.3),(8.4), we get (recall that m = k2n )


X  m2 p exp(np) (1 − p)n 
E(N ) ≤ 2 2n
+
S
2 2n
= 2 k 2 p exp(np) + k(1 − p)n

(8.5)

For an arbitrary  > 0, let p = (1+)nlog k , then k(1 − p)n ≤ k exp(−np) = k − and
3+
k 2 p exp(np) = k (1+)
n
log k
. So, (8.5) gives

2k 3+ (1 + ) log k
E(N ) ≤ 2k − + (8.6)
n
So, if k ∼ n1/3−2/3 , then (8.6) will be less than 1, so that P(N = 0) > 0.

91
8.3 The Radhakrishnan-Srinivasan (R-S) improvement
Theorem 56 ([23]).
 r 
n n
m(n) = Ω 2 (8.7)
log n

(R-S) take Beck’s recoloring idea and improve it. Their technique is motivated by the
following observation

Observation 57. Suppose S is monochrome after step 1, then it suffices to re-color just
one vertex in S, the rest can stay as is. So, after the first vertex in S changes color, the
remaining vertices can stay put unless they belong to other monochromatic edges.

This motivates the following modification, do not re-color all vertices simultaneously,
put them in an ordered list and re-color one vertex at a time. Here is the modified step
2.
Step 2: For a given ordering, if the first vertex lies in a monochromatic edge, flip its
color with probability p. After having colored vertices 1, . . . , i − 1, if vertex i is in a
monochromatic edge after having modified the first i − 1 vertices, then flip its color with
probability p.

The analysis proceeds along similar to that in the previous section until (8.2). Con-
sider P2 . The last blue vertex v of S changes color to red because there is some T 6= S
such that T was blue after step 1 and |S ∩ T | = 1. We shall say that S blames T (which
we shall denote by S 7−→ T ) if this happens. Also, none of the vertices in T that were
considered before v change their color to red. To summarize,

Lemma 58. S 7−→ T iff

1. |S ∩ T | = 1, call this vertex v.

2. T (1) ∈ BLU E and v is the last blue vertex in S.

3. All vertices before v in S change color to red.

4. No vertex of T before v changes color to red.

Then,
!
_ X
P2 ≤ P S 7−→ T ≤ P(S 7−→ T ) (8.8)
T 6=S T 6=S

Fix an ordering π on the vertices. With respect to this ordering, let v be the (iπ + 1)th
vertex in S and the (jπ + 1)th vertex in T . If the index of w is less than that of v, we

92
write is as π(w) < π(v). Also define,

Sv− := {w ∈ S | π(w) < π(v)}


Sv+ := {w ∈ S | π(w) > π(v)}

Tv− and Tv+ have similar meanings. To compute P(S 7−→ T ), we will need to list some
probabilities
p
1. P(v(1) ∈ BLU E, v(2) ∈ RED) =
2
1
2. P ((T \ v)(1) ∈ BLU E) =
2n−1
1
3. P(Sv+ (1) ∈ RED) =
2n−iπ −1

4. P(Tv− (2) ∈
/ RED | T (1) ∈ BLU E) = (1 − p)jπ

5. For w ∈ S with π(w) < π(v),

1+p
P((w(1) ∈ RED) or (w(1) ∈ BLU E, w(2) ∈ RED) | S ∈
/ M) =
2

So, subject to this ordering π,


 iπ
p 1 1 1+p
P(S 7−→ T ) ≤ · n−1 · n−iπ −1 · (1 − p)jπ ·
2 2 2 2
p
= 2n−1 (1 − p)jπ (1 + p)iπ (8.9)
2

Let the ordering π be random. Then P(S 7−→ T ) = Eπ P(S 7−→ T | π). A random ordering
is determined as follows. Each vertex picks a real number uniformly at random from the
interval (0, 1), this real number is called its delay. Then the ordering is determined by
the increasing order of the delays.

Lemma 59.
p
P(S 7−→ T ) = E (P(S 7−→ T | π)) ≤ (8.10)
22n−1

Proof. Let the delay of a vertex w be denoted by `(w). Let U := {w ∈ S \ v|w(1) ∈


BLU E}, then `(w) ≤ `(v), since v, by definition, is the last blue vertex in S. Also for
each w ∈ T , either `(w) > `(v) or w did not flip its color in step 2. So, for w ∈ T
P (`(w) ≤ `(v), w flips color) = px, so P(`(w) > `(v) or w did not flip) = (1 − px). Now,

93
conditioning on `(v) ∈ (x, x + dx) and with some abuse of notation, we can write

1
P (S 7−→ T, |U | = u | `(v) = x) = xu p1+u (1 − px)n−1
22n−1 |{z} |{z}
| {z } `(U )≤x U ∪ {v} flip to red
coloring after step 1
n−1  Z 1
X n−1 1
=⇒ P(S 7−→ T ) ≤ 2n−1
p1+u xu (1 − px)n−1 dx
u=0
u 0 2
n−1 
!
1 
n−1
Z
p X
= (px)u (1 − px)n−1 dx
22n−1 0 u=0
u
Z 1
p
= (1 − p2 x2 )n−1 dx
22n−1 0
p
≤ (8.11)
22n−1

mp 2(1−p)n
Proof of theorem 56. Using (8.11) in (8.8), we get P2 ≤ 22n−1
. Recall that P1 ≤ 2n
,
summing over all edges S, we get

k(1 − p)n k 2 p
E(N ) ≤ + (8.12)
2 2

Compare (8.12) with (8.5) and note that exp(np) is not present in (8.12). For an arbitrary
 > 0, setting p = (1+)nlog k and approximating (1 − p)n ≈ exp(−np), we get

k 2 log k
 
−
E(N ) ≤ 0.5 k + (1 + ) (8.13)
n

q
n
Clearly k ∼ log n
makes E(N ) < 1 giving the result.

Spencer’s proof of lemma 59. Aided by hindsight, Spencer gives an elegant combinatorial
argument to arrive at (8.11). Given the pair of edges S, T with |S ∩ T | = 1, fix a matching
between the vertices S \ {v} and T \ {v}. Call the matching µ := {µ(1), . . . , µ(n − 1)},
where each µ(i) is an ordered pair (a, b), a ∈ S \ {v} and b ∈ \{v}, define µs (i) := a and
µt (i) := b. We condition on whether none, one or both vertices of µ(i) appear in Sv− ∪ Tv− ,
for each 1 ≤ i ≤ n − 1. Let Xi = |µ(i) ∩ (Sv− ∪ Tv− )|. Since the ordering is uniformly

94
random, Xi and Xj are independent for i 6= j. From (8.9), consider E ((1 − p)jπ (1 + p)iπ ).
 Pn−1 − Pn−1 −

E (1 − p)jπ (1 + p)iπ | µ ∩ Sv− ∪ Tv− = E (1 − p) i=1 I(µ(i)∩Sv 6=Φ) (1 + p) i=1 I(µ(i)∩Tv 6=Φ)


n−1
!
Y − −
=E (1 − p)I(µs (i)∈Sv ) (1 + P )I(µt (i)∈Tv )
i=1
n−1  
I(µs (i)∈Sv− ) I(µt (i)∈Tv− )
Y
= E (1 − p) (1 + P )
i=1
n−1
Y 
1
= (1 − p + 1 + p + 1 + 1 − p2 )
i=1
4
n−1
Y p2

= 1− <1
i=1
4

which implies that

E (1 − p)jπ (1 + p)iπ = E E (1 − p)jπ (1 + p)iπ | µ ∩ Sv− ∪ Tv− < 1


 

and that implies


p
P(S 7−→ T ) <
22n−1
and completes the proof.

8.4 And then came Cherkashin and Kozik...


A nice coda to this chapter is a beautiful argument due to Cherkashin-Kozik (2015)[11]
(a ‘Book Proof’) which greatly simplifies the R-S argument (though it gives the same
bound) to essentially ridding the argument of the recoloring as well. But, it builds upon
the ideas from the previous results, and all the previous hard work now gives payoff in a
very satisfactory manner.

As before, suppose e(H) = k2n−1 . The coloring algorithm puts all the vertices in a
(random) order, and processes one vertex at a time. A vertex is give a default color of
BLU E unless it ends up coloring some edge BLU E in which case, we color the vertex
RED. Note that the only monochromatic edges are all RED at the end of this procedure.
The ordering of the vertices is decided in the same manner as in the R−S algorithm. Each
vertex v picks independently and uniformly, Xv ∈ [0, 1] at random. As observed in the R-
S algorithm, if an edge is colored RED at the end of this procedure, there is some edge T
such that |S ∩T | = 1, and the common vertex of these edges is the last vertex of S and the
first vertex of T . We shall, following Cherkashin and Kozik sat that in this case (S, T ) is
a conflicting pair. We shall estimate the probability that the coloring produces no RED

95
edges, and to do that we shall estimate the probability that there exists a conflicting pair.

Let 0 < p < 1 be a parameter. Call an edge S an p-extreme edge if for each v ∈ S,
Xv ≤ 1−p 2
or Xv ≥ 1+p
2
. To estimate the probability that there is a conflicting pair, we
consider the two possibilities: One of the pair of edges is an extreme edge, and the other
case, when neither n of the edges is extreme. The probability of the former is at most
2 · (k2n−1 ) · 1−p
2
= k(1 − p)n
. In the other case, note that if S ∩ T = {v} then we must
1−p 1+p
have Xv ∈ ( 2 , 2 ) and for all the other u ∈ S, w ∈ T we have Xu < Xv and Xv < Xw
n−1
and the probability of this is (k2n−1 )2 · pXvn−1 (1 − Xv )n−1 < k 2 4n−1 · p 41 = pk 2 .

Hence, if pk 2 + k(1 − p)n < 1 then we are done, and the asymptotics for this are the
same as seen in the discussion following the R-S algorithm.

96
9 More sophisticated concentration: Talagrand’s
Inequality

A relatively recent, extremely powerful, and by now well utilized technique in prob-
abilistic methods, was discovered by Michel Talagrand and was published around 1996.
Talagrand’s inequality is an instance of what is refered to as the phenomenon of ‘Concen-
tration of Measure in Product Spaces’ (his paper was titled almost exactly this). Roughly
speaking, if we have several probability spaces, we many consider the product measure on
the product space. Talagrand showed a very sharp concentration of measure phenomenon
when the probability spaces were also metrics with some other properties. One of the
main reasons this inequality is so powerful is its relatively wide applicability. In this
chapter, we briefly study the inequality, and a couple of simple applications.

9.1 Talagrand’s Inequality in metric probability spaces


Let (Ω, P, ρ) be a metric probability space, and let A ⊆ Ω with P(A) ≥ 1/2. For fixed
t, let At = {ω ∈ Ω | ρ(ω, A) ≤ t}. What is P[At ]? That is, what can we say about the
probability of an outcome close to one in A?
Definition 60. Suppose Ω = Ω1 × · · · × Ωn , the product of n (not necessarily metric)
probability spaces (Ωi , Pi ). Then we can define a measure ρ on Ω by
X
ρ(x, A) := sup inf αi
kαk=1 y ∈ A x 6=y
i i

Here α can be thought of as a cost (set by an adversary) for changing each coordinate
of x to get to some event y ∈ A. Then we can intuitively think of ρ as the worst-case
cost necessary to get from x to some element in A by changing coordinates.
Now for any probability space we can define At = {x ∈ Ω | ρ(x, A) ≤ t}, as above.
Theorem 61. (Talagrand’s Inequality)
2 /4
P[A](1 − P[At ]) ≤ e−t

97
For the proof see p. 55 of Talagrand’s paper “Concentration of Measure and Isoperi-
metric Inequalities in Product Spaces.” For a very readable exposition, we refer the reader
to [28].

We can also define the measure ρ in another, perhaps more intuitive way. For a given
x ∈ Ω and A ⊆ Ω let

Path(x, A) = {s ∈ {0, 1}n | ∃y ∈ A with xi 6= yi ⇔ si = 1}

and let V (x, A) be the convex hull of Path(x, A) (in [0, 1]n ). We can think of Path(x, A)
as the set of all possible paths from x to some element y ∈ A, that is, the set of choices
given some cost vector.

Theorem 62. ρ(x, A) = min kvk.


v∈V (x,A)

Note that it is now clear that we can use min instead of sup and inf, since the convex
hull is closed. It is also clear now that ρ(x, A) = 0 iff (0, 0) ∈ V (x, A) iff x ∈ A.

Concentration of Measure about the Mean


Recall the definition of a Lipschitz function from earlier:

Definition 63. A random variable X : Ω → RED is c-Lipschitz if for any ω1 , ω2 ∈ Ω


differing in one coordinate |X(ω1 ) − X(ω2 )| ≤ c.

We will also need to define another similar notion.

Definition 64. A random variable X : Ω → RED is f -certifiable for f : RED → RED


if the following holds: If some ω ∈ Ω satisfies X(ω) ≥ s, then there is a set I ∈ [n] of
size ≤ f (s) such that X(ω 0 ) ≥ s for any ω 0 ∈ Ω with ωi0 = ωi for all i ∈ I.

We can now state a useful consequence of Talagrand’s Inequality:

Corollary 65. If X is 1-Lipschitz and f -certifiable then


2
p
P[X ≥ b]P[X ≤ b − t f (b)] ≤ e−t /4 .

In particular, if b is the median of X, i.e. b = inf {t ∈ RED | P[X ≥ t] ≤ 1/2}, we have


2
p
P[X ≤ b − t f (b)] ≤ 2e−t /4 .
n p o
Proof. Let A = ω | X(ω) < b − f (b) . We want to show that {ω | X(ω) < b} ⊇ At so
that P[X ≥ b] ≤ 1 − P[At ]. That is, we want to show that for any ω 0 with X(ω) ≥ b,
ω 0 6∈ At . Suppose otherwise. Since X is f -certifiable, there is a set I ⊆ [n] of size no more

98
than f (b) such that if x agrees with ω 0 on I then X(x) ≥ b. Now consider the penalty
−1/2
function
P αi = 1{i∈I} (|I|) . By our assumption that ω 0 ∈ At , there exists y ∈ A such
that yi 6=ω0 αi ≤ t. Then the number of coordinates in which y and ω 0 disagree is no
ip p
more than t |I| ≤ t f (b). Now pick z ∈ Ω suchp that zi = yi for all i 6∈ I and zi = ωi0 for
i ∈ I. Since z disagrees withpy on no more than t f (b) coordinates and X p is 1-Lipschitz
we have |X(z) − X(y)| ≤ t f (b). But since y ∈ A, we have X(y) < b − t f (b), so by
the closeness of X(y) and X(z) we have |X(z)| < b. But since z agrees with ω 0 on the
coordinates of I, f -certifiability guarantees that X(z) ≥ b, and we have a contradiction.

This phenomenon is known as concentration of measure about the median. The


median tends to be difficult to compute, but fortunately it is often close to the mean. The
conversion from median to mean is responsible for the constant factors in the following
corollary.
Corollary 66. (Talagrand’s Inequality About the Mean) Suppose X is c-Lipschitz and
r-certifiable (i.e. f -certifiable with f (s) = rs). Then
2 2
p
P[|X − E(X)| > t + 60c rE(X)] ≤ e−t /8c rE(X) .

p
Here we tend to think of t as some large multiple of E(X), so that we can rewrite
this as p
P[|X − E(X)| > k E(X)] ≤ e−Ω(1)
or
1 ε
P[|X − E(X)| > E(X) 2 +ε ] ≤ e−E(X) .

9.2 First examples


1. Non-isolated vertices in random graphs
Suppose G is a d-regular graph on n vertices. Let H be a random subgraph of G with
each edge of G being retained in H with probability p. Let
X
X = |{v | dH (v) > 0}| = 1dH (v)>0
v∈V

the number of non-isolated vertices in H. By linearity of expectation,


X
E[X] = P[dH (v) > 0] = n(1 − (1 − p)d ).
v∈V

The probability space in question is a product of the nd/2 binary probability spaces
corresponding to retaining each edge, so that the events are tuples representing the out-
comes for each edge. Changing the outcome of a single edge can isolate or un-isolate at

99
most two vertices, so X is 2-Lipschitz. Furthermore, for any value of H with X(H) ≥ s,
we can choose one edge adjacent to each of s non-isolated vertices whose existence in
another subgraph H 0 of G will ensure that the same s vertices are not isolated in H 0 , i.e.
X(H 0 ) ≥ s. Thus X is also 1-certifiable, and Talagrand gives us
h p i 2
P |X − E[X]| > (60 + k) E[X] ≤ e−k /32

so with
p high probability
√ the number of non-isolated vertices is within an interval of length
O( E[X]) = O( n) about the mean. Compare this to the result usingq Azuma on the
 
n
edge-exposure martingale, which would only give an interval of size O 2
= O(n)
about the mean.

2. Longest increasing subsequence


Suppose x1 , . . . , xn ∈ [0, 1] are picked uniformly and independently at random, and put
them in increasing order to generate a permutation of [n]. Let X be the length of the
longest increasing subsequence, and note that X is 1-Lipschitz (as changing a certain
value could only either add it to a long increasing subsequence or remove it from one)
and 1-certifiable (as any choice of the xi with a particular increasing subsequence of length
s always has X ≥ s). √
It is also easy to show that X ≤ 3 n with high probability. For any i1 < · · · < ik ,
P[xi1 ≤ · · · ≤ xik ] = k!1 so

en k ek
   
n 1
P[X ≥ k] ≤ ≤
k k! k kk
6√n
and thus P[X ≥ 3n] ≤ 3e → 0. On the other hand, there is always an increasing or

decreasing subsequence of length n − 1, so we actually find that with high probability
1√ √
n≤X≤3 n
3

so E[X] = O( n).
Talagrand’s
p inequality now tells us that X is with high probability in an interval of
1/4
√ O( E[X]) = O(n ). Note that Azuma would only give an interval of length
length
O( n), since the corresponding martingale would be of length n. The strength of Tala-
grand is that unlike Azuma it does not depend on the dimension of the product space.

9.3 An Improvement of Brook’s Theorem


a
Let us recall Brook’s Theorem: If a graph G is not Kn or C2k+1 then χ(G) ≤ (G).
Here are two improvements:
Kim (2002): For G with girth ≥ 5, χ(G) ≤ (1 + o(1)) logDD .

100
Johansson (2004): For M-free G, χ(G) ≤ O( logDD ).

T heorem: If G is M-free with max. degree D, then χ(G) ≤ (1 − α)D for some α > 0.

P roof : Without loss of generality, let G be D-regular.


Scheme - We shall color the vertices uniformly at random from [c]. If two adjacent ver-
tices are colored the same, uncolor both.
WTS - With positive probability, each vertex v has ≥ αD + 1 colors that are retained on
≥ 2 neighbors of v. If this is done, color each vertex greedily. The greedy algorithm will
complete the proof.

Let Av be the event that vertex v has ≤ αD colors retained on ≥ 2 neighbors of v.


Av ↔ Aw are dependent for < D4 choices of w. Therefore, if P(Av ) = O( D15 ), then we
are through.
Let Xv be the number of colors retained on ≥ 2 neighbors of v,
Xv0 be the number of colors retained on exactly 2 neighbors of v, and
Xv00 be the number of colors assigned on 2 neighbors of v and retained from the start.
Note that Xv ≥ Xv0 ≥ Xv00 .
E(Xv00 ) ≥ D2 1c (1 − 1c )3D−3
U . If u, w ∈ N (v)
S are assigned
S RED, then no vertext in V is
assigned RED, where V (N (v) \ {u, w} N (u) N (w)).
3
Now let C = βD =⇒ E(Xv00 ) ≥ D(D−1) 2
1
βD
1 D−1 3
[(1 − βD ) ] ≥ D−1
2
e− β , D  0.

Let us note that Xv is 1-Lipschitz and certifiable for Xv ≥ s.


Let us write Xv00 = Assv −Delv where Assv is the number of colors assigned to 2 neighbors
of v and Delv is the number of colors asssigned to 2 neighbors but deleted from at least
one of these two. We can see that Assv is 1-Lipschitz. If Delv ≥ s, then ∃ 2s vertices
making color choices in pairs picking the same color and another ≤s neighbors of at least
one of each of these pairs that witnesses G discoloration. Therefore, Delv ≥ s and Delv
is s-certifiable.

Lets us recall the following inequalities:


If X is 1-Lipschitz and determined by independent trials {T1 , . . . , Tm }, then P(|X −EX| >
t2
t) ≤ e− 2m . If X is also r-certifiable, then Talagrand tells us that P(|X − EX| >
√ t2
t + 60 rEX) ≤ e− 8rEX
√ t2 C 2 log D
This implies that t = C D log D then P(|Assv − E(Assv )| > t) ≤ 2e− D = 2e− 2 =
2
2
DC /2
.
t2
Also, P(|Delv − E(Delv )| > t + 60 3E(Delv )) ≤ 2e− 24E(Delv ) =
p 2
C 2 D2
.
D 24E(Delv )
We may now take β = 1
2
so that α = 2e−6 .

101
9.4 Almost Steiner Designs
In this section, we shall look now at a result due to Hod, Ferber, Krivelevich, and Su-
dakov [14], which achieves something very close to a Steiner design. Recall that a Steiner
t-design with parameters (k, n) (and denoted S(t, k, n)) is a k-uniform hypergraph on n
vertices such that every t-subset of the vertices is contained in exactly of the edges of the
hypergraph. A simple counting argument shows that the number of edges of a Steiner
(n)
t-design S(t, k, n) is kt .
(t)
We shall following [14] prove that, for n sufficiently large, there exists a k-uniform
hypergraph such that every t-subset of the vertex set is in at least one edge, and at most
2 edges, and also, the number of edges is asymptotically close to the correct number.
More precisely,

Theorem 67. For n sufficiently large, and given fixed integers k > t ≥ 2 there exist
k-uniform hypergraphs H on the vertex set V satisfying

(n)
• e(H) = (1 + o(1)) kt .
() t

• Every t-subset of V is contained in at least one E ∈ E(H) and is contained in at


most two edges.

This rather neat looking theorem has a relatively short proof.

Proof. For starters, one might want to start with an almost tight packing H and then
for each t-subset T that was not covered by the packing, we would like to pick another
k-subset that accounts for covering T . This motivates the following

Definition 68. For a k-uniform hypergraph H on [n] the Leave hypergraph associated
with H is the t-uniform hypergraph
LH := {T ⊂ [n] : |T | = t, T 6⊂ E for any E ∈ H}.

Thus for every T in the Leave Hypergraph we wish to choose another k edge from the
complete k-uniform hypergraph in order to cover every t-subset of [n]. In particular, one
would like that the size of LH is small in comparison to the size of H. This was already
achieved by Grable; in fact he proved

Theorem 69. (Grable, 1999) Let k > t ≥ 2 be integers. There exists a constant ε =
ε(k, t) > 0 such that for sufficiently large n there exists a partial Steiner design H =
([n], E) satisfying the following:
For every 0 ≤ l < t every set S ⊂ [n] with |S| = l is contained in O(nt−l−ε ) edges of
the leave hypergraph LH .

102
In particular, the size of LH is at most O(nt−ε ). But by picking one edge arbitrarily
to cover each T ∈ LH we run the risk of having some t subset covered more than twice
- something we do not want. Thus we need to be a bit choosy in picking edges to cover
the edges of the leave hypergraph.

For each A ∈ LH define TA := {E : |C| = k, A ⊂ C}. Firstly, note that we can form
a refinement of TA as follows:
 [ 
SA := TA \ TB .
B∈LH ,B6=A

In other words, SA consists of all E ∈ TA such that no other t-subset (other than A) of
the leave hypergraph is also in E. Suppose B ∈ LH and |A ∩B| = i. Then the number
of sets E ∈ TA that are not in SA (on account of B) is n−2t+i
k−2t+i
. Let ni (A); = {B ∈ LH :
B 6= A, |B ∩ A| = i} . If we fix S = |A ∩ B| is a subset of size i, it follows by the result
of Grable that there are at most O(nt−i−ε ) distinct B ∈ LH such that A ∩ B = S. Since
there are ti choices for S, it follows that ni (A) ≤ ti nt−i−ε . Thus,


  X t−1  
n−t n − 2t + i
|SA | ≥ − ni (A) = Θ(nk−t ) − O(nk−t−ε ) = Θ(nk−t ).
k−t i=0
k − 2t + i
So, the sets SA are all quite large.

Note also that by definition, the collections SA are pairwise disjoint for different
A ∈ LH . Thus we have plenty of choice for picking E ∈ SA for distinct A ∈ LH . This
however, will not be good enough. This distillation does ensure that different t-subsets of
the leave hypergraph are covered exactly once, but it may happen that t-subsets that were
initially covered by H may now be covered more than twice. Thus we need to be choosier.

One idea to deal with this is the following. If we can choose the edges E covering the
t-subsets of the leave hypergraph in such a way that for distinct A, B ∈ LH if we have
picked E ∈ SA , F ∈ SB and we also have |E ∩ F | < t then the issue addressed above will
not happen and then we can be sure that our second sub collection along with H will
satisfy the conditions of the theorem. Thus the sense of choosiness that we want may be
stated exactly as this:

For each t-subset A of the leave hypergraph we need to pick EA ∈ SA such that for
A 6= B we have |EA ∩ EB | < t.

One of the interesting new perspectives of the probabilistic method that this proof
suggests is the following principle:

The notion of being choosy can be interpreted probabilistically.

103
In other words, let us pick a random collection RA ⊂ SA as follows. For each E ∈ SA ,
pick it as a member of RA independently and with probability p (for some suitably small
p).

Now, if for each A, we decide to make the pick EA ∈ RA , we wish to show that
|EA ∩ EB | < t for all A 6= B in the leave hypergraph. Showing that |EA ∩ E| < t for all
E ∈ R where [
R= RB
B6=A,B∈LH

is more uniform, so let us aim to do that.

Fix A ∈ LH , and suppose RA has been determined but suppose RB for the other sets
of LH are not yet made. Knowing RB for all B ∈ LH \ {A} amounts to independent
trials made by the members of [
S= SB .
B6=A,B∈LH

To say that we can make a choice EA ∈ RA , we need good bounds on how many elements
of RA are poor choices, i.e., we need an estimate on

NA := {E ∈ RA : |E ∩ F | ≥ t for some F ∈ R} .

Note that if we assume that RA has already been chosen, then NA is determined by
the outcome of |S| independent Bernoulli trials. Moreover, it is clear from the definition
that NA is 1-certifiable. Indeed, if NA ≥ s, then there are E1 , E2 , . . . , Es ∈ RA and at
most s sets F1 , F2 , . . . , Fs ∈ S such that |Ei ∩ Fi | ≥ t. In order to obtain good concen-
tration, it would help if NA were also Lipschitz.

But unfortunately, that may not be the case. Suppose B ∈ LH and |A ∩ B| = t − 1.


Then for any F ∈ RB and E ∈ RA , we would have |E ∩ F | ≥ t − 1, so the only way the
intersection has size strictly less than t is if these sets are disjoint. Thus, it is conceivable
that a single trial F ∈ SB can affect NA substantially.

But now, we use an old trick of Bollobas, which ‘Lipschitzises’ this random variable,
i.e., considers another related random variable which is Lipschitz, and in addition is very
close to the random variable in question.

More precisely, suppose for each A, we pick a large enough sub collection QA ⊂ RA
by adding an element of RA into QA as long as it does not intersect any of the members
already picked outside of A. Thus, QA is a subfamily of RA in which any two sets are
pairwise disjoint outside of A itself. If RA is large enough, then perhaps one can imagine
obtaining a large enough QA ⊂ RA by this process.
If we set
NQ (A) := {E ∈ QA : |E ∩ F | ≥ t for some F ∈ R}

104
then note that the same argument for NA also works here, so NQ (A) is 1-certifiable. But
now, this is also Lipschitz. Indeed, if a certain choice F ∈ R is altered, then since the
sets in QA are pairwise disjoint outside of A, it follows that NQ (A) changes by at most
k − t, so NQ (A) is k − t-Lipschitz. Hence by Talagrand, we have
2
p
P(NQ (A) > t) < 2e−t/16k where t ≥ 2E(NQ (A)) + 80k E(NQ (A)).

Let us estimate E(NQ (A)) first. Note that (recall that we are assuming that RA , and
QA are fixed)
1E
X
NQ (A) =
E∈QA

where 1E counts the set E if there exists F ∈ S such that |E ∩ F | ≥ t. Let us first fix
E ∈ QA . Write
t−1
[
LH \ {A} = Bl
l=0

where
Bl := {B ∈ LH : |B ∩ E| = l}.
We wish to count the number of F ∈ S that trigger E and count in among NQ (A).

If B ∈ Bl we have

{F ∈ SB : |E ∩ F | ≥ t} ≤ {F ∈ TB : |E ∩ F | ≥ t}
= {F ∈ TB : |(E ∩ F ) \ B| ≥ t − l}

Consequently,
k−t   
X k−l n−k−t+l
{F ∈ S : B ⊂ F for some B ∈ Bl , |E∩F | ≥ t} ≤ = O(nk−2t+l ).
i=t−l
i k−t−i

Indeed, pick a subset of E \ B of size i, where t − l ≤ i ≤ k − t, then to get a choice


for F ∈ SB , we need to pick the remaining k − (t + i) elements from the set [n] \ (E ∪ B).
Now, for fixed l with 0 ≤ l ≤ t − 1, we have |Bl | ≤ kl O(nt−l−ε ) = O(nt−l−ε ). This is seen
by first fixing a set of E of size l and then by the result of Grable stated earlier, there
are at most O(nk−l−ε ) elements B ∈ LH that contains a set of size l. Hence, by a very
generous amount, we have

E(1E ) = P(E leads to increment of NQ (A)) ≤ pO(nk−2t+l )O(nt−l−ε ) = pO(nk−t−ε )

so
E(NQ (A)) ≤ |QA |pO(nk−t−ε ).

105
Now suppose we had p = nt−k+ε/2 ; then the estimate above gives us that

E(NQ (A)) ≤ |QA |O(n−ε/2 ).

Note that for this value of p we have with high probability |RA | ≈ Θ(nε/2 ) for all
A (standard Chernoff bounds). We shall now argue that the greedy process produces
|QA | ≥ (nε/3 ) for all A with high probability. We can then choose to stop at around this
stage while constructing QA , so that we indeed do have |QA | = Θ(nε/3 ) This completes
the proof.

Suppose that the greedy process stops after m steps, with m < nε/3S . Then there exist
sets E1 , E2 , . . . , Em such that every set in SA that is ‘disjoint’ from Ei (i.e., disjoint
outside of A) is not picked into RA . Now, if we set X = Ei then |X| < knε/3 . We now
S
need to ensure that the number of sets of SA that do not intersect X outside of A is of
the right order. In other words, the number of sets of TA that meet X non-trivially is at
most
k−t    X k−t
X |X| − t n − |X|
= Θ(niε/3 )nk−t−i = o(nk−t )
i=1
i k − t − i i=1

which implies that the number of sets in SA that are disjoint from X is M = Θ(nk−t ).
Thus, the probability that there exists some set X of size at most knε/3 that satisfies this
condition above is at most
 
n ε/3

ε/3
(1−p)M < O(nn )exp(−nt−k+ε/2 Θ(nk−t )) = exp(nε/3 log n−Θ(nε/2 )) < exp(−nε/7 )
kn
for n sufficiently large, so the result follows.

9.5 Chromatic number of graph powers


Recall, for k ≥ 1, the k th Graph Power Gk is defined as follows:
• V (Gk ) = V (G).
• For u 6= v, u ↔ v iff dist(u, v)G ≤ k.
In other words, two vertices are adjacent in Gk if they are at most a distance k apart
in G. Let ∆(G) = d. One would like bounds on χ(Gk ). The greedy algorithm tells us
χ(Gk ) ≤ dk + 1.

Johansson improved Brooks’ theorem for triangle free graphs by showing that χ(G) =
O( log∆∆ ). The following theorem below is a generalization of this extending to graphs
where the neighborhood of any vertex is sparse.

106
d2
Theorem 70. (Alon-Krivelevich-Sudakov, 2002): If G has at most t
edges in the in-
d
duced subgraph on N (v) for each v ∈ V (G) then χ(G) ≤ log(t) .
 
k dk
This implies (follows easily) that for G with girth at least 3k + 1, χ(G ) ≤ O log d
.

In particular one is interested to see if the above result is asymptotically best possible.
The following result of Alon and Mohar settles this in the affirmative.
Theorem 71. (Alon-Mohar 2001): For large d and any  fixed
k
 g ≥ 3 there exist graphs
d
with max degree ∆ ≤ d, girth at least g, and χ(Gk ) ≥ Ω log d
.

Proof : First, we shall bound ∆ and Γ. We want to pick G = Gn,p such that for all
d
v ∈ V (G), E[deg(v)] = (n − 1)p < np. Let p = 2n . Because this process is a binomial
distribution, we can bound the number of vertices with degree at least d using Chernoff.
d −(d/2)2
P[deg(v) ≥ d] < P[(deg(v) − E(d(v)) > )] ≤ e 3(d/2) = e−d/6
2
Now, let Nbad = |{v ∈ V |deg(v) > d}| =⇒

E[Nbad ] < ne−d/6

By the Markov inequality


P[Nbad > 10ne−d/6 ] < .1
Similarly, let N<g = |{Ck ⊆ G|k < g}| =⇒
g−1  i
X (n)i d
E[N<g ] = < dg
i=3
2i 2n

Again, Markov tells us that


P[N<g > 10dg ] < .1
This implies that with probability at least .8, G satisfies Nbad ≤ 10ne−d/6 and N<g ≤ 10dg .
We shall assume n  dg + ne−d/6 so that we can remove an arbitrary vertex from all
small cycles and remove all vertices of degree more than d. If we want to ensure ∆ = d,
it is simple enough to add some cycles of length g. Thus in order to get a condition on
the maximum degree and girth, all we need to do is delete a small number of vertices
from such a G.

To complete the proof we wish to show that a maximum independent set is not too
large. More precisely, we wish to show that α(G) = O( n log
dk
d
. This amounts to saying
that whp, every set U of this size is NOT independent in Gk .

IN order to achieve this, what we shall do is this. If we could show that for any such
set U , there are several paths of length kbetween some two vertices u, v in U , then in

107
order to make the pair {u, v} a non-edge in Gk , we should have deleted a vertex from
each of those paths between u, v. But if the number of such paths is way more, then
u, v is an edge in Gk giving us what we want. But showing that the number of paths is
concentrated is a difficult task, so we shall try to show that there are several internally
disjoint paths between two such vertices. This is again another instance of the same trick
that was mentioned in the previous section.

Let us get to the details. Let the path P be a U-path if the end vertices of P lie in U
and the internal vertices lie outside of U .Set U ⊆ V (G) such that

ck n log(d)
|U | = =x
dk
 k 
Now, to show χ(Gk ) ≥ Ω log(d)d
, we will show that α(Gk ) ≤ ck n log(d)
dk
for some ck (as
outlined above) To do this, we will show that with high probability, for every U , Π(G),
the number of internally disjoint U-paths of length k, is large. Specifically, we will show
that there are still many of these paths after we make vertex deletions for girth and
maximum degree considerations. This will bound independent sets in Gk .
Let µ be the number of U-paths of length k. It is easy to show that

c2 n2 log 2 (d) nk−1 dk c2k n log 2 (d)


 
x
E[µ] = (n − X)k−1 pk > k =
2 2d2k 2 2k nk 2k+2 dk

Now, we need to say that E[ν], the expected number of non-internally disjoint U-paths,
is much smaller than E[µ]. For n  d  k, the expected number of U-paths which share
one endpoint and the unique neighbor is at most

µck log d
µnk−2 xpk−1 = µ
2k−1 d
It is easy to see that the number of other types of intersecting U-paths is smaller, implying
that
c2k n log2 (d)
E[Π] =
2k+2 dk
Let us note that, because Π(G) counts the number internally disjoint U-paths, removing
one edge can change Π(G) by at most one. Therefore, Π(G) is a 1-Lipschitz function.
Let us also note that Π(G) is f -certifiable. That is, for f (s) = ks, when Π(G) ≥ s,
G contains a set of at most ks edges so that ∀G0 which agree with G on these edges,
Π(G0 ) ≥ s. We can now use Talagrand’s inequality to bound the number of graphs with
insufficiently many U-paths.
For any b and t, Talagrand’s tells us that
βt2
P[|X − E[X]| > t] ≤ e− E[X]

108
for some β > 0. This implies that for t = εE[Π], ε > 0,

(1 − ε)c2k n log2 (d) c2 n log2 (d)


−βε2 k k+2 k
P[Π < ] ≤ e 2 d = o(1)
2k+2 dk
Now, because the maximum number of sets U is at most
ck nk log d
edk
    
n en x d  n 
≤ ≤ ≤ exp ck k k log2 d
x x ck log d d

So, if
βε2 c2k
> 2kck
2k+2
2
then, with probability 1−o(1), for every set U , there are at least εn log d
2k+2 dk
pairwise internally
disjoint U-paths.
Now, for n  d  k
εn log2 d
10n2−d/10 + 10dg < k+2 k
2 d
so we can remove all small cycles and high-degree vertices without destroying all U-paths
and therefore  k 
k n log(d) k d
α(G ) ≤ ck k
=⇒ χ(G ) ≥ Ω
d log(d)
as desired, and this completes our proof.

109
10 Martingales and Concentration Inequalities

The theory of Martingales and concentration inequalities were first used spectacularly by
Janson, and then later by Bollobás in the determination of the chromatic number of a ran-
dom graph. Ever since, concentration inequalities Azuma’s inequality and its corollaries
in particular, have become a very important aspect of the theory of probabilistic tech-
niques. What makes these such an integral component is the relatively mild conditions
under which they apply and the surprisingly strong results they can prove which might be
near impossible to achieve otherwise. In this chapter, we shall review Azuma’s inequality
and as a consequence prove the Spencer-Shamir theorem for the chromatic number for
sparse graphs and later, study the Pippenger-Spencer theorem for the chromatic index of
uniform hypergraphs. Kahn extended some of these ideas to give an asymptotic version
of the yet-open Erdős-faber-Lovász conjecture for nearly disjoint hypergraphs.

10.1 Martingales
Suppose Ω, B, P is underlying probability space. F0 ⊆ F1 ⊆ ...Fn ⊆ ... where Fi is
σ-algebra in B.
[
F= Fi
i

Xi is a martingale if Xi is Fi measurable and E(Xi+1 |Fi ) = Xi .


In general, if X is F-measurable and E(X) < ∞, then Xi = E(X|Fi ) always gives a
martingale. This is called Doobs’ Martingale Process.

10.2 Examples
• Edge Exposure Martingale
Let the random graph G(n, p) be the underlying probability space. Label the po-
tential edges {i, j} ⊆ [n] by e1 , e2 , ..em where m = n2 . Let f be any graph theoretic


function. Then we can define martingale X0 , X1 , X2 , ...Xm where:

Xi = E(f (G)|ej is revealed ∀1 ≤ j ≤ i)

111
In other words to find Xi we first expose e1 , e2 , ..., ei and see if they are in G. Then
Xi will be expectation of f (G) with this information. Note that X0 is constant.
• Vertex Exposure Martingale
Again G(n, p) is underlying probability space and f is any function of G. Define
X1 , X2 , ..., Xn by:

Xi = E(f (G)|∀x, y ≤ i ex,y is exposed)

In words, to find Xi , we expose all edges between first i vertices (i.e. expose
subgraph induced by v1 , v2 , ..., vi ) and look at the conditional expectation given
this information.

10.3 Azuma’s Inequality


Definition 72 (Lipshitz). A function f is K −Lipschitz if ∀x, y |f (x)−f (y)| ≤ K|x−y|.
A martingale X0 , X1 , ... is K − Lipschitz if ∀i |Xi − Xi+1 | ≤ K

Theorem 73 (Azuma’s Inequality). Let 0 = X0 , X1 , ...Xm be a martingale with

|Xi+1 − Xi | ≤ 1 (i.e.1 − Lipschitz)

∀0 ≤ i < m. Let λ > 0 be arbitrary. Then


√ 2
P(Xm > λ m) < e−λ /2

Proof. Set α = λ/ m. Set Yi = Xi+1 − Xi so that |Yi | ≤ 1 and E(Yi |Xi−1 ) = 0. Then
similar to argument used for proving Chernoff bound, we have:
2 /2
E(eαYi |Xi−1 ) ≤ cosh(α) ≤ eα

Hence: m
Y
αXm
E(e ) = E( eαYi )
i=1
m−1
Y
= E(( eαYi )E(eαYm |Xm−1 ))
i=1
m−1
2 /2 2 m/2
Y
≤ E( eαYi )eα ≤ eα (by induction)
i=1

and using this result we get:

√ √
P(Xm > λ m) = P(eαXm > eαλ m )

112

≤ E(eαXm )e−αλ m

2 m/2−αλ√m
≤ eα
2 /2 √
= e−λ (since α = λ/ m)

Corollary 74. Let c = X0 , X1 , ...Xm be a martingale with


|Xi+1 − Xi | ≤ 1
∀0 ≤ i < m. Let λ > 0 be arbitrary. Then
√ 2
P(|Xm − c| > λ m) < 2e−λ /2

10.4 The Shamir-Spencer Theorem for Sparse Graphs


Theorem 75 (Theorem of Shamir-Spencer). If G = G(n, p) with p = n−α for some α
then there exists an integer µ = µ(n) such that
P(µ ≤ χ(G) ≤ µ + 3) → 1 as n → 1
(i.e., χ(G) get concentrated over only 4 values.)
(Almost every graph parameter has a behavior similar to chromatic number.)

Proof. Let  > 0 be arbitrarily small and let µ be defined as follows:


µ = inf{v | P(χ(G) > v) < 1 − }
i.e. with probability ≥ , χ(G) ≤ µ however P(χ(G) ≤ µ − 1) < .
Let Y be the vertex set of largest subgraph of G which is µ − colorable. Let R = V \ Y
where V = vertex(G), consider |R|. Consider a vertex exposure martingale i.e., we know
if the vertex is in R or Y one at a time.
Xi = E(|R| | exposed till i0 th vertex); clearly |Xi+1 − Xi | ≤ 1
By Azuma’s inequality we have:
√ 2
P(||R| − E(|R|)| > λ n − 1) ≤ 2e−λ /2 ∀λ > 0
2 /2
Pick λ s.t. 2e−λ < . R = 0 =⇒ G is µ colorable and this happens with prob ≥  i.e.
√ √ √
0 ∈ (E(R) − λ n, E(R) + λ n) =⇒ |R| ≈ c n w.p. ≥ 1 − 

But any induced subgraph on c n vertices can be 3-colored with high probability, i.e.
P(χ(G(R)) > 3) <  if n is large enough. Here G(R) is the graph induced by R.

Claim: Let S be s.t. |S| ≤ c n, w.h.p. S is 3-colorable.

113

Proof. Suppose not. Then ∃ a smallest subgraph of size ≤ c n that is not 3-colorable.
Let T be smallest such set. Note that every vertex in T has degree ≥√3 =⇒ e(T ) ≥ 23 |T |.
But in a graph Gn,p the probability that ∃ some set T of size ≤ c n which has ≥ 3t2
edges is o(1). Because:
 t 
3t
P(∃T of size t and with edges) ≈ 3t2 p3t/2 where p ∼ n−α
2 2

Because:

t=c n   t

√ 3t X n
P(∃T of size ≤ c n and with edges) ≤ 2
3t
p3t/2 → o(1)
2 t=0
t 2

if p ∼ n−α and α > 5/6.

This concludes the proof of Shamir-Spencer as µ ≤ χ(G) ≤ µ+3 with high probability.

10.5 The Pippenger-Spencer Theorem


Let H be a hypergraph. We say that E(H) can be properly N − colored if E(H) can
be partitioned into N matchings in H. By a matching, we mean a set of mutually non-
intersecting hyper-edges.
The smallest N for which E(H) can be N −colored is called chromatic index of H, denoted
by χ0 (H).
If G is a graph, we know that ∆(G) ≤ χ(G) where ∆(G) is max vertex degree.
Also from Vizing-Gupta Theorem we have χ0 (G) ≤ ∆(G) + 1. Overall we know:

∆(G) ≤ χ0 (G) ≤ ∆(G) + 1

for graphs.
However it is computationally hard to figure out if χ0 (G) = ∆(G) or ∆(G) + 1.

For H note that χ0 (H) ≥ ∆(H) where ∆ still denotes max degree in H i.e.:
∆(H) = max{d(x)|x ∈ V (H)}, d(x) = # of hyperedges containing x

Theorem 76 (The Pippenger-Spencer Theorem). Given  > 0, ∃ a δ > 0 and D0 () s.t.
the following holds if n ≥ D ≥ D0 and:

• D > d(x) > (1 − δ)D

• d(x, y) < δD∀x, y ∈ V where D = ∆(H) ?

114
Then χ0 (H) < (1 + )D

Note: d(x, y) is codegree of x, y i.e. d(x, y) = |{E ∈ E(H) s.t. {x, y} ⊆ E}|

The proof of this theorem due to Pippenger-Spencer follows the paradigm of the
‘pseudo-random method’ pioneered by Vojtech Rödl and the ‘Nibble’.

Proof of the P-S theorem:


Idea: Pick each edge of E with probability D independent of each other. Form the sub-
collection that is obtained, E1 , throw away these edges and other incident edges to E1 .
The resulting hypergraph is H1 . Then with high probability H1 also satisfies the same
the same 2 conditions ? of Pippenger-Spencer for a different D.
From E1 extract a matching M1 , i.e. pick those edges of E1 that do not intersect any
other edges of E1 . By repeating this procedure we have:
1 E
2 E t E
H = H0 −
→ H1 −
→ H2 . . . −
→ Ht

D1 ≈ De−k (where H is k-uniform) since


h  ik
P(edge surviving) ≈ (1 − )D = e−k
D
asymptotically. Now let:
t
[
(1)
M = Mi (Mi are disjoint by construction)
i=1

For an edge A:
t
X
(1)
P(A ∈ M ) = P (A ∈ Mi ) and
i=1

    −k
P(A ∈ M1 ) ≈ , P(A ∈ M2 ) ≈ (1 − )k(D−1) ≈ e in general :
D D1 D D1


P A ∈ Mi ≈ e−k+(i−1)

D
t
  X     1 − et 
(1) −k (i−1) −k α
=⇒ P(A ∈ M ) = e e =e ≈
D i=1 D 1 − e D
t
where α = α(, t, k) = e−k (1−e
1−e
)
. Now, we can generate a second independent matching
(2)
M by repeating the same process and so on.

115
Just like the Rödl’s nibble start by picking a ‘small’ number of ‘independent’ match-
ings from H. Let 0 < θ < 1 and µ = bθDc and generate independent matchings
M(1) , M(2) , M(3) . . . M(µ) with each M(i) having:
α
P(A ∈ M(i) ) ≈
D
Let P (1) = M(1) ∪ M(2) ∪ M(3) ∪ · · · ∪ M(µ) .
P (1) P (2) P (s)
H = H(0) −−→ H(1) −−→ H(2) . . . −−→ H(s)

Here first ‘packing’ P (1) is µ = θD-colorable since we can assign each matching M(i)
a separate color. Note that χ0 (H(0) ) ≤ µ + χ0 (H(1) ) (since chromatic number is subaddi-
tive). Similarly P (2) is θD(1) − colorable and so on.

Hence so far we need θD + θD(1) + · · · + θD(s−1) colors. After removing colored edges
(i.e. edges ∈ some P (i) ), very few edges will be left in H(s) .

Bounding χ0 (H(s) ): For any k − unif orm hypergraph H with max degree D, we have:
χ0 (H) ≤ k(D − 1) + 1 =⇒ χ0 (H(s) ) ≤ k(D(s) − 1) + 1

Hence:
s−1
X
total # of colors we used = θ D(i) + θD + k(D(s) − 1) + 1 ≈ D
i=1

s will be chosen as large as possible. Here we need to make sure that H(i) is similar to
H(i−1) (i.e. all degrees are almost equal and the co-degree is small). (In particular we’ll
be interested in i = 1 case).

Fix any x ∈ H, what is the E(d(1) (x))?

1A6 ∈P (1)
X
d(1) (x) =
A:x∈A∈H(0)

X α µ α α
=⇒ E(d(1) (x)) = (1 − ) ≈ D(1 − )µ ≈ D(1 − )θD ≈ De−αθ = D(1)
D D D
A:x∈A∈H(0)

Hence E(d(1) (x)) ≈ D(1) = De−αθ


Use Azuma’s inequality to get a concentration inequality for d(1) (x). The art is to pick
the right filtration.

(We will consider the following martingale Xi = E[d(1) (x) | M(1) , M(2) , . . . , M(i) ])

116
Let Fi = {M(1) , M(2) , . . . , M(i) } since M(i) is a matching =⇒ at most one edge contain-
ing x is exposed.

Then E[d(1) (x)|Fi ] := Xi is a 1 − Lipschitz martingale. So by Azuma’s inequality:


√ 2
P(|d(1) (x) − D(1) | > λ µ) ≤ e−λ /2 (Here x is fixed and µ ≈ θD = o(1)D)

Now question is: ”How to guarantee this for all vertices?”. Use Lovasz Local Lemma
(LLL): q
(1) (1)
Ax := |d (x) − D | > λ o(1)D(1)
Want to show that: !
^
P Ax >0
x∈V
2
We know: P(Ax ) ≤ 2e−λ /2 . To compute the dependence degree among {Ax |x ∈
V (H)}:
(i) (i) (i)
M(i) = M1 ∪ M2 ∪ . . . Mt

(Distance between two vertices is the shortest number of edges one needs to go from
x to y.)
Note that each matching M(i) is generated by atoms 1E where each E ∈ H(0) and whose
’distance’ from x ≤ t. So if distance between x and y ≥ 2t+1, Ax and Ay are independent.

=⇒ Dependence degree

≤ (k − 1)D(0) + 2(k − 1)2 (D − 1)D + · · · + r(k − 1)r (D − 1)r + · · · + 2t(k − 1)2t (D − 1)2t
≤ (2t+1)(kD(0) )2t+1
So for LLL, we need:
2 /2
e2e−λ (2t + 1)(kD(0) )2t+1 < 1

e(2t+1)(kD(0) )2t+1
p
Put λ = o(1)D(1) to get: ⇐⇒ (1) /2 < 1.
eo(1)D

Asymptotically D(1) beats t (big time), so condition for LLL will hold hence we are
in business.

Finally repeating the previous argument:


χ0 (H) ≤ µ(0) + µ(0) + · · · + µ(s−1) + χ0 (H(s) )
where µ(i) = θD(i) and D(i) = e−αθi D and χ0 (H(s) ) bounded above as before. Then we
get:

χ0 (H) ≤ θD(1+e−αθ +e−2αθ +· · ·+e−(s−1)αθ )+kθDe−sαθ

117
θD
≤ −αθ
+kθDe−sαθ → D(1+o(1))
1−e
as t → ∞, s → ∞,  → ∞, etc. Thus we’ll have the desired result.

When we do the calculations, everything works out nicely.

10.6 A Conjecture of Erdős-Faber-Lovász (EFL) and a theorem of Kahn


Definition 77. A hypergraph H is nearly-disjoint or linear if

∀A 6= B ∈ E(H), |A ∩ B| ≤ 1

.
Conjecture 78. If H is nearly-disjoint on n vertices, then χ0 (H) ≤ n
Theorem 79 (Erdos-de Bruijn Theorem). If H is a hypergraph on n vertices with

|A ∩ B| = 1 ∀A 6= B

then |E(H)| ≤ n.
As an aside, |E(H)| ≤ n =⇒ χ0 (H) ≤ n. This theorem is tight in the sense that if it
is a projective plane of order n, then n2 + n + 1 colors are needed =⇒ χ0 (H) = |E(H)|.
(¶n = projective plane of order n)
Theorem 80 (Theorem - Jeff Kahn (1992)). The EFL conjecture is asymptotically true,
i.e. χ0 (H) ≤ n(1 + o(1)) for H nearly-disjoint on n-vertices.

Note that in this general situation, the edge sizes need not be the same; in fact they
need not even be absolutely bounded, and as we shall see, that causes some of the trouble.

Firstly, we start with a simple observation. If there is an integer k such that for each
edge E in a nearly disjoint hypergraph H we have |E| ≤ k, then we can ‘uniformize’
the edge sizes. This is a standard trick, so we will not describe it in detail. One may
form a bipartite graph G whose vertex sets are the vertices and edges of H, and (v, E)
is an incident pair iff v ∈ E. Then the uniformization described earlier is equivalent to
embedding G into a bipartite graph with uniform degree over all the vertices E ∈ E such
that the graph is C4 -free. This is a fairly standard exercise in graph theory.

If all the edges are of bounded size, i.e., if 3 ≤ b ≤ |E| ≤ a for all edges E then
the Pippenger-Spencer theorem of the preceding section proves the result claimed by the
aforementioned theorem. Indeed, for any x count the number of pairs (y, E) where y 6= x,

118
and x, y ∈ E. Since H is nearly disjoint, any two vertices of P
H are in at most one edge
so this is at most n − 1. On the other hand, this is precisely x∈E (|E| − 1), so we have
(b − 1)d(x) ≤ n − 1 ⇒ d(x) ≤ n−1
b−1
< n2 .

Here is a general algorithm for trying to color the edges of H using C colors: Arrange
the edges of H in decreasing order of size and color them greedily. If the edges are
E1 , E2 , . . . , Em with |Ei | ≥ |Ei+1 | for all i then when Ei is considered for coloring, we
may do so provided there is a color not already assigned to one of the edges Ej , j < i
for which Ei ∩ Ej 6= ∅. To estimate |{1 ≤ j < i|Ej ∩ Ei 6= ∅}|, let us count the number
of triples (x, y, j) where x ∈ Ei ∩ Ej , y ∈ Ej \ Ei . Write |Ei | = k for simplicity. Again,
since H is nearly disjoint, any two vertices of H are in at most one edge, hence the
number of such triples is at most the number of pairs (x, y) with x ∈ Ei , y 6∈ Ei , which
is k(n − k). On the other hand, for each fixed Ej such that 1 ≤ j < i, Ej ∩ Ei 6= ∅,
Ei ∩ Ej is uniquely determined, so the number of such triples is |Ej | − 1. Hence denoting
I = {1 ≤ j < i|Ej ∩ Ei 6= ∅} and noting that for each j ∈ I |Ej | ≥ k, we get
X k(n − k)
(k − 1)|I| ≤ (|Ej | − 1) ≤ k(n − k) ⇒ |I| ≤ .
j∈I
k − 1

|E|(n−|E|)
In particular, if C > |E|−1
for every edge E, the greedy algorithm properly colors H.

Upshot: For any nearly disjoint hypergraph H on n vertices χ0 (H) ≤ 2n − 3.

The previous argument actually shows a little more. Since k(n−k) k−1
is decreasing in k
1
if |E| > a for some (large) constant a, then |I| < (1 + a )n. So, for a given  > 0 if we
a > 1/, say, then for C = (1 + 2)n, following the same greedy algorithm will properly
color all edges of size greater than a. This motivates us to consider

• Es := {E ∈ E : |E| ≤ b}.

• Em := {E ∈ E : b < |E| ≤ a}.

• El := {E ∈ E : |E| > a}

for some absolute constants a, b which we shall define later. We have seen that χ0 (Hl ) ≤
(1 + 2)n; also by a preceding remark, if we pick b > O(1)/ we have χ0 (Hm ) ≤ n. Thus,
let us do the following.

Let C = b(1 + 4)nc; we shall color the edges of H using the colors {1, 2 . . . , C}. Let
C1 = {1, 2 . . . , b(1 + 3)mc}; C2 := C \ C1 . Fix a coloring f1 of Hl using the colors of
C1 , and a coloring f2 of Hm using the colors of C2 . We now wish to color Hs . We shall
attempt to do that using the colors of C1 . For each E ∈ Hs let

Forb(E) := {c ∈ C1 |E ∩ A 6= ∅ for some A ∈ Hl , f1 (A) = c}.

119
Then as before, |Forb(E)| ≤ |{A ∈ Hl |A ∩ E 6= ∅}| ≤ a(n−a)
b
< ηD for η = a/b, D = n.
In other words, every edge of Hs also has a (small) list of forbidden colors for it. If we
can prove a theorem that guarantees a proper coloring of the edges with no edge given a
forbidden color, we have an asymptotic version of the EFL.

At this point, we are motivated enough (as was Kahn) to state the following

Conjecture 81. Let k ≥ 2, ν > 0, 0 ≤ η < 1. Let C be a set of colors of size at least
(1 + ν)D. There exists β > 0 such that if H is a k-uniform hypergraph satisfying

• (1 − β)D < d(x) ≤ D for all vertices x of H,

• d(x, y) < βD for all distinct pairs of vertices x, y,

• For each A ∈ E, there is a subset Forb(A) ⊂ C with |Forb(A)| < ηD.

then there is a proper coloring f of E such that for every edge A, f (A) 6∈ Forb(A).

Note that the first two conditions are identical to those of the PS theorem. Also, it is
important to note that there might be some additional constraints on η, ν which indeed
is the case. We will see what those are as we proceed with the proof.

To prove this conjecture, let us again recall the idea of the proof of the PS theorem.
The ith step/iteration in the proof of the PS theorem does the following: Fix 0 < θ < 1,
and let t, s be large integers. Starting with the hypergraph H(i) (1 ≤ i ≤ s) which satisfies
t )
conditions (1), (2) above with D(i) := e−αθi D with α = α(, t, k) = e−k (1−e
1−e
, with pos-
(1) (2) (µ )
itive probability there is a random packing P (i+1) := Mi+1 ∪ Mi+1 ∪ · · · ∪ Mi+1i ∈ H(i)
with µi = bθD(i) c, such that
α
• P(A ∈ P (i+1) ) ≈ D(i)
.

• For all A ∈ H(i) the event “A ∈ P (i+1) ” is independent of all events “B ∈ P (i+1) ” if
distance between A, B is at least 2t. Here, the distance is in the hypergraph H(i) .

The idea is to try to give every edge its ‘default color’ as and when we form the packings
P (i) . Since each such packing consists of up to µi different matchings, P (i) can be P(by
default) colored using µi colors, so that when we complete s iterations we have used i µi
different colors to color all the edges except those of H(s) . The PS theorem finishes off by
coloring these edges greedily using a fresh set and colors by observing that the number
of edges in H(s) is ‘small’.

To keep track of these let us write


[
C := Cij ∪ C ∗ , with Cij := {ci1 , ci2 , . . . , ciµi },
1≤j≤µi ,1≤i≤s

120
(j)
where these sets Cij are mutually disjoint and the matching Mi+1 is by default allocated
color cij .

In our present situation, the default colors allocated to some of the edges may be forbidden
at those edges. More specifically, define
(j)
B (i) := {A ∈ H(i) |A ∈ Mi+1 for some j and cij ∈ Forb(A)}.
(i)
For each vertex v, let Bv := |{A ∈ B (i) |v ∈ A}|.

At each stage, remove the ‘bad edges’ from the packings, i.e., the ones assigned a for-
bidden color.
S After s iterations the edges that need to be (re)colored are the ones in
H0 := H(s) si=1 B (i) and the colors that are left to be used are those in C ∗ . Note that
(s)
for each vertex v we have dH0 (v) ≤ Dv + Bv . The first term is o(D); if the second term
is also o(D) then we may finish the coloring greedily. Thus, if we can show that we can
pick our random packing at stage i in such a way that apart from the criteria in the
(i)
PS-theorem, we can also ensure that Bv is ‘small’ (compared to the order of D(i) ) then
we are through (there is still some technicality but we will come to that later).

Hence to start with, we need to show that at each step i of the iteration, we can get
a random packing P (i+1) such that
• |d(i) (v) − D(i) | < o(D(i) ) for all v.
(i) (i)
• Bv < E(Bv ) + o(D)
The proof of this part is identical to that of the PS theorem; use the same martingale,
the same filtration, and use Azuma’s inequality.
(i)
To complete the proof, we need to get an (over)estimate of E(Bv ). For each A ∈ H(i) ,
(j)
A is not in B (i) if and only if for each cij ∈ Forb(A) we have A 6∈ Mi=1 . Denoting
Forb(i) (A) := {j|cij ∈ Forb(A)} we have
|Forb(i) (A)|
α|Forb(i) (A)|

(i) α
P(A ∈ B ) = 1 − 1 − (i) < .
D D(i)
Hence,
X
E(Bv(i) ) = P(A ∈ B (i) )
v∈A∈H(i)
α X
. |Forb(i) (A)|
D(i)
v∈A∈H(i)

Let i(A) := max{0 ≤ i ≤ s|A ∈ H(i) }. Note that for any fixed i,
|{A ∈ H|v ∈ A, i(A) = i}| ≤ θe−αθi D.

121
Hence we have
s s
X X 1 X
E(Bv(i) ) .α |Forb(i) (A)|
i=0 i=0
D(i)
v∈A∈H(i)
X X |Forb (A)|  (i) 
=α 1A∈H(i)
v∈A i
D(i)
X 1 X 
(i)
≤α |Forb (A)|
v∈A
D(i(A)) i
X |Forb(A)|
≤α eαθi(A)
v∈A
D
s
X
< αo(1) eαθi |{A|v ∈ A, i(A) = i}|
i=0

The last term in the above expression can be made ‘small’. This completes the proof
of Kahn’s theorem.

122
Bibliography

[1] M Aigner, G. M. Ziegler, Proofs from The Book, Springer.

[2] N. Alon, D. J. Kleitman, Sum-free subsets, A tribute to Paul Erdős (A. Baker, B.
Bollobás, A. Hajnál eds), Cambridge University Press, pp. 13-26.

[3] N. Alon, Y. Matias, M. Szegedy, The space complexity of approximating the fre-
quency moments, J. Comp. Sys. Sci. 58 (1999) (1), 137-147

[4] N. Alon, Y. Peres, Uniform Dilations, Geom. Func. Anal. 2 (1992), No. 1, 1-28.

[5] N. Alon, J. H. Spencer, The Probabilistic Method, 4th ed., Wiley International, 2016.

[6] B. Bollobás. Random Graphs,

[7] N. Balachandran, S. Padinhatteeri, χD (G), |Aut(G)|, and a variant of the motion


lemma, Ars Math. Contemp. 12 (2017), no. 1, 89-109.

[8] N. Balachandran, E. Mazumdar, Zero sums in restricted sequences.

[9] N. Bansal,

[10] J. Beck,

[11] D. D. Cherkashin, J. Kozik, A note on random greedy coloring of uniform hyper-


graphs, Random Struct. Algorithms.

[12] S. Eberhart, B. Green, F. Manners, Sets of integers with no large sum-free subset,
Ann. Math. 180(2014), 621-652.

[13] W. T. Gowers, Lower Bounds of Tower Type for Szemerédi’s Uniformity Lemma,
GAFA, Geom. Funct. Anal., 7(1997), 322-337.

[14] Hod, A. Ferber, M. Krivelevich, B Sudakov,

[15] D.R. Hughes and F. C. Piper, Projective Planes, Graduate Texts in Mathematics
6, Springer-Verlag, New York, 1973.

[16] S. Janson, Luczak, D. Rucinski, Random Graphs

123
[17] P. Keevash, The existence of Steiner designs.

[18] A. Kostockha, B. Sudakov,

[19] F. Lazebnik, V.A. Ustimenko and A.J. Woldar, A New Series of Dense Graphs of
High Girth’, Bulletin of the AMS, 32(1995), Number 1, 73–79.

[20] A. Lubotzky, Phillips, P. Sarnak,

[21] Molloy, B. Reed, Graph Colouring and the Probabilistic Method,

[22] N. Pippenger, J.H. Spencer,

[23] J. Radhakrishnan, A. Srinivasan,

[24] N. Robertson, P. Seymour,

[25] T. Rothvoß,

[26] J. H. Spencer, Ten lectures on the Probabilistic Method

[27] J. H. Spencer (with L. Florescu), Asymptopia, Student Mathematical Library, Vol-


ume 71, American Mathematical Society, 2014.

[28] S. Venkatesh, The Theory of Probability, Cambridge University Press, 2013.

[29] D. B. West, Introduction to Graph Theory,

[30] A. Wigderson, Mathematics and Computation, Princeton University Press, 2019.

[31] R. M. Wilson,

[32] R. M. Wilson,

[33] R. M. Wilson,

124

You might also like