THE PROBABILISTIC PARADIGM: A COMBINATORIAL PERSPECTIVE
Niranjan Balachandran
Dept. of Mathematics, IIT Bombay.
1 Preface
One of the most popular motifs in mathematics in recent times has been the study of the complementarity between the notions of 'Structure' and 'Randomness', in the sense that most mathematical structures seem to admit this broad dichotomy of characterisation. It is little wonder, then, that probability theory has become a ubiquitous tool in areas of mathematics as varied as differential equations, number theory, and combinatorics, to name a few, as it provides the language to describe 'randomness'.
Combinatorics is one of the areas where this dichotomy has played a very critical role in the resolution of several very interesting problems. Combinatorics has long been held to be an area which, unlike many other areas of mathematics, does not involve a great deal of theory building. It is of course not true that there is no general sense of 'theory' in combinatorics; to quote Gowers from his iconic essay 'The Two Cultures of Mathematics': "The important ideas of combinatorics do not usually appear in the form of precisely stated theorems, but more often as general principles of wide applicability." What plays the role of 'theory' in combinatorics is something akin to general principles, which shape the manner in which the combinatorist forms his or her view. And one of the principal principles at work in combinatorics is the probabilistic paradigm.
One of the main reasons for the ubiquity and all-pervasive nature of the method is that it provides a tool to deal with the 'local-global' problem. More specifically, many problems of a combinatorial nature ask for the existence/construction/enumeration of a finite structure that satisfies a certain combinatorial condition locally at every element. The difficulty in many a combinatorial problem is to construct structures that are 'locally good' everywhere. A significant part of this difficulty arises from the fact that, often, there seem to be several possible choices for picking a local structure, but no canonical ones; consequently, it is not clear which local choices are preferable. The probabilistic paradigm enables one to consider all these 'local' patches simultaneously and provides what one could call 'weak' conditions for building a global patch from the local data. In recent times, many impressive results settling long-standing open problems have been obtained via the probabilistic method, essentially relying on this principle. But one thing stands out in all these results: the techniques involved are often quite subtle; ergo, one needs to understand how to use these tools, and how to think probabilistically.
Coming to the existing literature on this subject, there are some truly wonderful monographs on the probabilistic method in combinatorics specifically - the ones by Alon-Spencer [5], Spencer [26], Bollobás [6], Janson et al. [16], and Molloy-Reed [21] spring readily to mind. In addition, one can find several compilations of lecture notes on the probabilistic method on the internet. So, what would we seek in another book? What ought it to offer that is, say, missing from the plethora of material already available?
Most of the available material attempts to keep the proofs easy to follow and simple to verify. But that invariably makes the proofs appear rather magical, and almost always obscures the thought processes behind them. Indeed, many interesting (probabilistic) arguments appear in situations that do not seem to involve any probability at all, so a certain sprezzatura is distinctly conveyed. So a new book could certainly do with a deconstructionist perspective.
This book arose as a result of lectures for a graduate course - first at Caltech, and later at IIT Bombay - with the goal of providing that sense of perspective. Tim Gowers has on more than one occasion written about 'The Exposition Problem': "Solving an open exposition problem means explaining a mathematical subject in a way that renders it totally perspicuous. Every step should be motivated and clear; ideally, students should feel that they could have arrived at the results themselves." An entire book in this spirit has not appeared before, and that is what this book really attempts to do.
I deliberately do not include proofs or detailed discussions of many important results from probability theory, although I do state, in full form, the ones that are of utility within the confines of this book. The reason is twofold: first, this is basically a book on combinatorics, which forms and informs the topics of interest in the first place; secondly, the imperative is to provide a perspective into probabilistic heuristics and reasoning, and not to get into the details and technicalities of the probabilistic results in themselves. I list some sources as references throughout the text for related reading.
As mentioned earlier, principles in combinatorics play the role that theory does in most other areas of mathematics. Most experts are well acquainted with these principles, and have some other principles of their own²; these never see an explicit mention in books (though some blogs, like those of Tao or Gowers, do a fabulous job there), and for a well-founded reason: these principles are more akin to rules of thumb, and a formal statement attempting to put them into words will inevitably be an oversimplification that amounts to an incorrect statement. But in my opinion, these simple heuristics go a great way towards laying a pathway, not just towards solving open conjectures, but also towards posing new and interesting questions. Towards that end, I put forth as an encapsulation one core principle from each chapter; each chapter's title includes an epigram that attempts a heuristic description of the underlying principle.

I thank all my students who very actively and enthusiastically acted as scribes for the lectures over the years; those scribed notes formed the skeleton of this book.
²à la Groucho Marx, perhaps.
2 Notation with asymptotics
This is a brief primer on the Landau asymptotic notation. Given functions $f, g$, we write $f \gg g$ (resp. $f \ll g$) if $\lim_{n \to \infty} \frac{f(n)}{g(n)} = \infty$ (resp. $= 0$). We also write $f = o(g)$ to denote that $f \ll g$. We write $f = O(g)$ (resp. $f = \Omega(g)$) if there exist an absolute constant $C > 0$ and $n_0$ such that for all $n \ge n_0$, $|f(n)| \le C|g(n)|$ (resp. $|f(n)| \ge C|g(n)|$); finally, when we write $f = \Theta(g)$ we mean $f = O(g)$ and $g = O(f)$. If $\lim_{n \to \infty} \frac{f(n)}{g(n)} = 1$, then we write $f \sim g$.
The main advantage of using this notation (besides making several statements look a lot neater than they would if written in their exact analytic form) is that it allows us to 'invert' some functions in an asymptotic sense when an exact inversion is not feasible. To make this precise, consider the following example. Suppose $n \le Ck \log k$. How do we then get a lower bound for $k$ in terms of $n$?
Here is a simple trick (and this will be used repeatedly in the book). The given inequality gives us
$$k \ge \frac{cn}{\log k}$$
with $c = 1/C$, and since clearly $k \le n$ (otherwise there is nothing to discuss further), this gives us
$$k \ge \frac{cn}{\log n},$$
which is best possible asymptotically, since if $k = \frac{n}{\log n}$ then
$$k \log k = \frac{n}{\log n}\left(\log n - \log\log n\right) = n\left(1 - \frac{\log\log n}{\log n}\right) = n(1 - o(1)).$$
One can use this idea iteratively as well, as we shall see in the book.
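For the reader who likes to experiment, here is a minimal numerical sketch of the inversion (the constant $C$ below is purely illustrative):

```python
# With n = C * k * log(k), the inverted bound k >= n / (C * log n) should
# recover k up to a factor of 1 - o(1).
import math

C = 2.0
for k in (10**3, 10**5, 10**7):
    n = C * k * math.log(k)
    k_lower = n / (C * math.log(n))
    print(k, round(k_lower), round(k_lower / k, 4))   # the ratio tends to 1
```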
Computations with asymptotics are an art, and they need a bit of practice before one can get comfortable. We illustrate one case in a little more detail; this is the calculation from the second chapter (where it is omitted). Suppose, for a fixed $k$, we wish to maximize $n - \binom{n}{k}2^{-\binom{k}{2}+1}$. Note that as $n$ increases, this quantity eventually becomes negative, so one needs to find the optimal $n$ for which it is large. Unfortunately, the usual maxima/minima methods of calculus are not directly applicable here, so the perspective is motivated more by an eye on the asymptotics.

Since the given quantity cannot exceed $n$, from an asymptotic perspective we would be happy if it were at least as large as $n/2$. To see if we can achieve that: since $k! \ge (k/e)^k$, we have $\binom{n}{k} \le \left(\frac{en}{k}\right)^k$, so
$$\binom{n}{k} 2^{-\binom{k}{2}} \le \frac{(en/k)^k}{2^{k(k-1)/2}}.$$
Now set
$$\frac{n}{4} = \frac{(en/k)^k}{2^{k(k-1)/2}},$$
so that the subtracted term is at most $n/2$. This gives
$$n = \left(\frac{1}{4}\left(\frac{k}{e}\right)^k 2^{k(k-1)/2}\right)^{\frac{1}{k-1}} = 4^{-1/(k-1)}\left(\frac{k}{e}\right)^{k/(k-1)} 2^{k/2} = (1 + o(1))\,\frac{k\,2^{k/2}}{e}$$
for $k$ large.
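As a sanity check on this computation, one can evaluate the quantity directly; in the illustrative sketch below (with $k = 20$), the quantity is essentially $n$ well below $n^* = k2^{k/2}/e$, still above $n/2$ near $n^*$, and collapses soon after:

```python
# Evaluate f(n) = n - C(n, k) * 2^(1 - C(k, 2)) around the predicted optimum.
import math

k = 20
n_star = round(k * 2 ** (k / 2) / math.e)
for n in (n_star // 2, n_star, 2 * n_star):
    f = n - math.comb(n, k) * 2 ** (1 - k * (k - 1) // 2)
    print(n, round(f), round(f / n, 3))
```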
As an illustration of how these inversions appear in practice, consider the following computation (it will appear again in a later chapter). Without being too concerned by constants, let us set $d = 8\binom{s^4}{s}\log\left(2\binom{s^4}{s}\right)$. Write $Y = 2\binom{s^4}{s}$; then this gives us $d = 4Y \log Y$, and so, by our discussion from earlier, $Y = \Omega\left(\frac{d}{\log d}\right)$. As for the asymptotics in terms of $s$: since $\binom{s^4}{s} \le (es^3)^s \le s^{4s}$, we have $s^{4s} \ge \Omega\left(\frac{d}{\log d}\right)$, so taking logarithms on both sides gives us $4s \log s \ge \log d\,(1 - o(1))$. Once again, by the same argument as before, this gives us $s \ge \Omega\left(\frac{\log d}{\log\log d}\right)$.
3 The Basic Idea
We shall assume a basic familiarity with the notions of probability theory and graph
theory. As a good reference, we recommend [28] for probability theory, and [29] for graph
theory.
A fairly simple recursive upper bound on $R(s, t)$ (proved inductively, and a good exercise if you haven't seen it before) is given by
$$R(s, t) \le R(s-1, t) + R(s, t-1),$$
which gives us
$$R(s, t) \le \binom{s+t-2}{s-1}.$$
Explicitly, Nágy's construction goes as follows: take any set S, and turn the collection of all 3-element subsets of S into a graph by connecting two subsets iff their intersection is odd. The edges of this graph represent the red edges, and the non-edges are the blue edges. It is a not-entirely-trivial exercise to show that this coloring admits no monochromatic clique of size s.
There is a rather large gap between these two bounds; one natural question to ask, then, is which of these two results is "closest" to the truth? Turán believed that the correct order of $R(s, s)$ was $s^2$. Erdős, in 1947, in a tour-de-force paper, disproved Turán's conjecture in a rather strong form:

Theorem (Erdős, 1947). For $s \ge 3$, $R(s, s) > \lfloor 2^{s/2} \rfloor$.
Proof. A lower bound entails a coloring of the edges of Kn using colors red and blue
with no monochromatic complete subgraph on s vertices. If one looks at Nágy's example, one is tempted to think of a 'global' recipe for coloring the edges in some manner
that witnesses the lack of sufficiently large monochromatic complete subgraphs. The
reason for this ‘global’ outlook is, if one starts with an ad-hoc coloring of the edges,
there seems to be plenty of leeway to color each edge one way or the other before our
color choices force our hand, and even on the occasions that they do, it is hard to see if
earlier choices could have been altered to improve upon the number we have so far. And
lastly, it is hard to see how this pattern (if one could use such a word here!) generalises
for large s. And in this conundrum of a situation, where local choices for edge colorings
do not seem clear, Erdős appealed to the principle explicated as the slogan of the chapter:
If you cannot think of anything clever, or worse, cannot think of anything, roll the die
and take your chances.
Fix $n$ and consider a random 2-coloring of the edges of $K_n$. In other words, let us work in the probability space $(\Omega, \Pr)$, where $\Omega$ is the set of all 2-colorings of the edges of $K_n$ and $\Pr(\omega) = 1/2^{\binom{n}{2}}$ for each $\omega \in \Omega$. An alternate way of describing this is to consider each edge of $K_n$ independently colored red or blue with probability $1/2$, since there is no reason to prefer one color over another.
For some fixed set $R$ of $s$ vertices in $V(K_n)$, let $A_R$ be the event that the induced subgraph on $R$ is monochromatic. Then we have that
$$P(A_R) = 2 \cdot 2^{\binom{n}{2} - \binom{s}{2}} \Big/ 2^{\binom{n}{2}} = 2^{1 - \binom{s}{2}}.$$
Thus, the probability that at least one of the events $A_R$ occurs is bounded by
$$P\Big(\bigcup_{|R| = s} A_R\Big) \le \sum_{R \subset V(K_n),\, |R| = s} P(A_R) = \binom{n}{s} 2^{1 - \binom{s}{2}}.$$
If we can show that $\binom{n}{s} 2^{1 - \binom{s}{2}}$ is less than 1, then we know that with nonzero probability there is a 2-coloring $\omega \in \Omega$ in which none of the bad events $A_R$ occur! In other words, we know that there is a 2-coloring of $K_n$ that avoids both a red and a blue $K_s$, even though we do not have such a coloring explicitly!
Solving, we see that
$$\binom{n}{s} 2^{1 - \binom{s}{2}} < \frac{n^s}{s!} \cdot 2^{1 + (s/2) - (s^2/2)} = \frac{2^{1 + s/2}}{s!} \cdot \frac{n^s}{2^{s^2/2}} < 1$$
whenever $n = \lfloor 2^{s/2} \rfloor$ and $s \ge 3$.
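For the skeptical reader, the bound is easy to check numerically (an illustrative sketch):

```python
# Evaluate C(n, s) * 2^(1 - C(s, 2)) at n = floor(2^(s/2)).
import math

for s in (3, 5, 10, 20, 40):
    n = int(2 ** (s / 2))
    val = math.comb(n, s) * 2 ** (1 - s * (s - 1) // 2)
    print(s, n, val)   # the value is < 1 in every case, as the proof requires
```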
Recall that a tournament has property $S_k$ if for every set of $k$ players there is some player who beats them all. This problem again seeks an orientation of the edges that achieves $S_k$ for starters, and once that is done, asks whether this property has a universality to it. Note that, unlike the Ramsey problem, it is not true that all sufficiently large tournaments have property $S_k$. Indeed, a transitive tournament - a tournament where the players come seeded, and all the games between them respect their rankings - clearly does not possess $S_k$, since no one beats the top-ranked player.
For small k - 1, 2, 3 - one can answer these questions to some degree of satisfaction; indeed, one can pin down the smallest tournaments with property $S_k$ through ad-hoc arguments.

For k = 4, constructive methods have yet to find an exact answer. Indeed, constructive methods have been fairly bad at finding asymptotics for how these values grow. And again, anyone who takes a stab at this problem realises very quickly that the fundamental problem here is one of choice: there does not seem to be a canonical way of orienting each edge one way or the other, and again, as with the Ramsey problem, it is hard to unravel which choices lead to what outcomes. And so, we bring out the maxim once more.
Proof. Consider a random tournament: in other words, for every edge $\{i, j\}$ of $K_n$, direct the edge $i \to j$ with probability $1/2$ and $j \to i$ with probability $1/2$, independently across the edges. Again, this uniformity in choosing the edge orientations reflects our lack of reason to prefer either direction.

Fix a set $S$ of $k$ vertices and some vertex $v \notin S$. What is the probability that $v$ has an edge directed to every element of $S$? Relatively simple: in this case, it is just $1/2^k$, so the probability that $v$ fails to have a directed edge to each member of $S$ is $1 - 1/2^k$. We shall denote this event by $v \not\to S$.
There are $\binom{n}{k}$ many such possible sets $S$; so, by using the union bound again, we have
$$P(\text{there exists } S \text{ such that for all } v \notin S,\ v \not\to S) \le \binom{n}{k}\left(1 - \frac{1}{2^k}\right)^{n - k}.$$
k
As before, it suffices to force the right-hand side to be less than 1 as this means that
there is at least one orientation of the edges of Kn on which no such subsets S exist – i.e.
that there is a tournament that satisfies Sk . k
This takes us into a world of approximations. Using the approximations nk ≤ en
k
and 1 − x ≤ e−x , we calculate:
n−k
−1/2k
e <1
en k k
⇔ < e(n−k)/2
k
⇔k(1 + log(n/k)) · 2k + k < n
Motivated by the above, take n > 2k · k; this allows us to make the upper bound
so, if n > k 2 2k log(2) · (1 + O(1)) we know that a tournament on n vertices with property
Sk exists.
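One can also watch the proof happen in simulation. The sketch below (the helper names are ours, and the parameters merely illustrative) draws random tournaments and tests property $S_k$ by brute force; for $n$ comfortably above $k^2 2^k \log 2$, a sizable fraction of the draws succeeds:

```python
import itertools, random

def has_Sk(n, k, beats):
    # beats[i][j] == True iff player i beats player j
    for S in itertools.combinations(range(n), k):
        if not any(all(beats[v][u] for u in S) for v in range(n) if v not in S):
            return False
    return True

def random_tournament(n):
    beats = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            w = random.random() < 0.5
            beats[i][j], beats[j][i] = w, not w
    return beats

k, n = 2, 21   # k^2 * 2^k * log 2 is about 11 here, so n = 21 is comfortable
print(sum(has_Sk(n, k, random_tournament(n)) for _ in range(20)), "successes out of 20")
```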
Remark: The asymptotics of this problem are still not known exactly. However, it is known (as was shown by Szekeres) that a tournament satisfying $S_k$ needs at least $c\,k2^k$ players for some absolute constant $c > 0$.
As we move to our next few instances of the basic method, we introduce the basic tool that gets the probabilistic method going. For a real-valued random variable $X$ on a finite probability space, the expectation of $X$ (denoted $E(X)$) is defined as
$$E(X) := \sum_{x \in \mathbb{R}} x\, P(X = x).$$
Note that the sum is finite since the probability space is finite.
The expectation is the first important tool that one plays with in this process, and one of the reasons it is a useful and simple quantity is that for random variables $X, Y$ on the same space, $E(X + Y) = E(X) + E(Y)$.¹ The main handle the expectation gives us is the following: If $E(X) \ge \alpha$, then with positive probability, $X \ge \alpha$.
A similar statement holds for other inequalities as well.
A philosophical point before we get into applications of the idea: why is the expectation a useful tool? Here is a heuristic. An expectation computation, in some sense, enumerates ordered pairs; the formal definition of the expectation fixes one parameter of the ordered pair and enumerates over the other. But one of the oldest combinatorial insights is that one may interchange the order of enumeration, and this principle allows us to reinterpret the same computation from another perspective. This has the advantage of turning what seems a 'global' computation into a sum of 'local' computations.
Theorem 6. Every set of $n$ nonzero integers contains a sum-free subset (one containing no elements $x, y, z$ with $x + y = z$) of size $\ge n/3$.
Proof. For ease of notation, let us write $B = \{b_1, \dots, b_n\}$. Firstly (and this is by now a standard idea in additive combinatorics), we note that it is easier to work over finite groups than over the integers, so we may take $p$ large enough that all the arithmetic in $B$ relevant to us (in $\mathbb{Z}$) may be assumed to be arithmetic in $\mathbb{Z}/p\mathbb{Z}$. Furthermore, if we assume that $p$ is prime, we have the additional advantage that $\mathbb{Z}/p\mathbb{Z}$ is a field, which means we have access to the other field operations as well. Thus we pick some prime $p = 3k + 2$ that is (for instance) larger than twice the maximum absolute value of the elements of $B$, and look at $B$ modulo $p$ - i.e., look at $B$ in $\mathbb{Z}/p\mathbb{Z}$. Because of our choice of $p$, all the elements of $B$ are distinct mod $p$.
Now, look at the sets
$$xB := \{xb : b \in B\} \subset \mathbb{Z}/p\mathbb{Z},$$
and let $N(x) := \#\{b \in B : xb \bmod p \in \{k+1, \dots, 2k+1\}\}$, the number of elements of $xB$ landing in the 'middle third' of $\mathbb{Z}/p\mathbb{Z}$.
We are then looking for an element $x$ such that $N(x)$ is at least $n/3$. Why? Well, if this happens, then at least a third of the elements of $xB$ lie in the middle third $\{k+1, \dots, 2k+1\}$; take those elements, and add any two of them to each other. This yields an element that lies outside the middle third; consequently, this subset of over a third of $xB$ is sum-free. But then the corresponding subset of $B$ is a sum-free subset of $B$, because $p$ is prime (so multiplication by $x$ is invertible); so we would be done.

¹In more fanciful terms, the expectation, as an operator on the space of random variables, is a linear operator with operator norm 1.
So, the question again is: Is there a clever way of choosing an $x$ that would optimally bring a big chunk of $xB$ into the middle? Not really. So, let's just roll the dice - pick $x$ uniformly at random from $\{1, \dots, p-1\}$ (there is no reason to prefer one element over another), and examine the expectation of $N(x)$:
$$E(N(x)) = \sum_{b \in B} P\bigl(xb \bmod p \in \{k+1, \dots, 2k+1\}\bigr) = n \cdot \frac{k+1}{3k+1} > \frac{n}{3}.$$
Thus, some value of $x$ must make $N(x)$ at least $n/3$, and thus ensure that a sum-free subset of size at least $n/3$ exists.
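The argument above is easily turned into a short program; the sketch below (with an illustrative choice of $B$ and $p$) tries every dilation $x$ rather than sampling, since the expectation computation guarantees that some $x$ works:

```python
def sumfree_subset(B, p):
    # Assumes p = 3k + 2 is prime and p > 2 * max(|b| for b in B).
    k = (p - 2) // 3
    middle = range(k + 1, 2 * k + 2)        # the 'middle third' {k+1, ..., 2k+1}
    best = []
    for x in range(1, p):                   # some dilation works, by the expectation bound
        S = [b for b in B if (x * b) % p in middle]
        if len(S) > len(best):
            best = S
    return best

B = list(range(1, 11))
S = sumfree_subset(B, 23)                   # 23 = 3*7 + 2 is prime and > 2*10
assert 3 * len(S) > len(B)                  # size greater than n/3
assert all(a + b != c for a in S for b in S for c in S)
print(S)
```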
Remark: One can ask the same question more generally in an arbitrary abelian group, and there, the corresponding constant is 2/7 (see [2]). For the integers, it remained a hard problem to determine whether the constant 1/3 could be improved; as it turns out, 1/3 is indeed the best possible constant (see [12]).
Here, we shall consider the following specific problem. Let $\mathbb{F}_q$ denote the finite field of order $q$, and let us denote the vector space $\mathbb{F}_q^3$ over $\mathbb{F}_q$ by $V$. Let $P$ be the set of 1-dimensional subspaces of $V$ and $L$ the set of 2-dimensional subspaces of $V$. We shall refer to the members of these sets as points and lines, respectively. The Levi graph of order $q$, denoted $LG_q$, is the bipartite graph defined as follows: $V(LG_q) = P \cup L$, where this describes the partition of the vertex set, and a point $p$ is adjacent to a line $\ell$ if and only if $p \in \ell$. The choice of the terminology 'points' and 'lines' is because the pair $(P, L)$ is a projective plane of order $q$, so every pair of points lies on a unique line, and every pair of lines meets in a unique point. For more, we refer the reader to [15], for instance.
The fundamental theorem of projective geometry [15] states that the full group of automorphisms of the projective plane $PG(2, q)$ is induced by the group $P\Gamma L(\mathbb{F}_q^3)$ of all non-singular semi-linear transformations. If $q = p^n$ for a prime number $p$, then $P\Gamma L(\mathbb{F}_q^3) \cong PGL(\mathbb{F}_q^3) \rtimes \mathrm{Gal}(\mathbb{F}_q/\mathbb{F}_p)$. In particular, if $q$ is prime, we have $P\Gamma L(\mathbb{F}_q^3) \cong PGL(\mathbb{F}_q^3)$. In any case, $P\Gamma L(\mathbb{F}_q^3)$ is a subgroup of the full automorphism group of $LG_q$; the full group is larger, since it also includes the maps induced by an isomorphism of the projective plane with its dual.
Proof. First, let us see why 2 colors will not do. It is easy to see that $LG_q$ is connected, so the only proper 2-colorings correspond to the vertex partition $(P, L)$. But every non-trivial map $A \in PGL(\mathbb{F}_q^3)$ induces an automorphism of $LG_q$ which keeps the two color classes intact, and that establishes $\chi_D(LG_q) > 2$.
Next, observe that if an automorphism fixes each color class, then the orbit of every vertex under it is contained within the same color class. And this happens for every such automorphism. Thus, we seek a partition of $L$ such that for every nontrivial automorphism induced by $P\Gamma L(\mathbb{F}_q^3)$, this property of all its orbits being within the same part is violated. This suggests a random partition of $L$: for each $\ell \in L$, place it in $L_1$ or $L_2$ independently and uniformly at random. A bad event in this context would be the presence of a nontrivial automorphism that maps both these parts into themselves; in terms of the observation above, the bad event $E_\varphi$ is the event that for each $\ell \in L$, the orbit of $\ell$ under $\varphi$ is contained entirely in the part $L_i$ containing it. This idea can be captured more generally as follows.
Suppose a graph $G$ is given a proper vertex coloring using $\chi(G)$ colors, and suppose $C_1$ is one of its color classes. Let $\mathcal{G}$ be the subgroup of $\mathrm{Aut}(G)$ consisting of all automorphisms that fix $C_1$ as a set. For each $A \in \mathcal{G}$, let $\theta_A$ denote the total number of distinct orbits induced by the automorphism $A$ on $C_1$. Fix an integer $t \ge 2$, and partition $C_1$ randomly into $t$ parts, i.e., for each $v \in C_1$, pick uniformly and independently an element of $\{1, 2, \dots, t\}$ and assign $v$ to the corresponding part.

For $\varphi \in \mathcal{G}$, let $E_\varphi$ denote the event that $\varphi$ fixes each of the $t$ parts. Observe that if $\varphi$ fixes the part containing a vertex $v$, then all other vertices in the set $\mathrm{Orb}_\varphi(v)$ are also in the same part. Moreover, the probability that $\mathrm{Orb}_\varphi(v)$ is contained in the part containing $v$ equals $t^{1 - |\mathrm{Orb}_\varphi(v)|}$. Then
$$P(E_\varphi) = \prod_{O} t^{1 - |O|} = t^{\theta_\varphi - |C_1|},$$
where the product runs over the $\theta_\varphi$ distinct orbits $O$ of $\varphi$ on $C_1$.
Let $\mathcal{N} \subset \mathcal{G}$ denote the set of all automorphisms which fix each of the $t$ parts of the partition, and let $N = |\mathcal{N}|$. Then note that
$$E(N) \le \sum_{\varphi \in \mathcal{G}} \frac{1}{t^{|C_1| - \theta_\varphi}}. \qquad (3.1)$$
If $E(N) \le f(\mathcal{G}) := \sum_{A \in \mathcal{G}} t^{\theta_A - |C_1|} < r$, where $r$ is the least prime dividing $|\mathcal{G}|$, then with positive probability $N < r$. Since $\mathcal{N}$ is in fact a subgroup of $\mathcal{G}$, $N$ divides $|\mathcal{G}|$, so it follows that with positive probability, $N = 1$, which means we have a distinguishing proper coloring using $\chi(G) + t - 1$ colors.
Define $F(C_1) := \max\{\theta_A : A \in \mathcal{G}, A \ne \mathrm{id}\}$. Thus, if $F(C_1) < |C_1| - 2\log_t |\mathcal{G}|$, then there exists a distinguishing proper $(\chi(G) + t - 1)$-coloring of the graph.
Let us return to our setting. Set $\mathcal{G} = PGL(\mathbb{F}_q^3)$. It is a simple exercise to check that every $A \in PGL(\mathbb{F}_q^3)$ which is not the identity fixes at most $q + 2$ points of $LG_q$. Hence
$$\theta_A \le q + 2 + \frac{(q^2 + q + 1) - (q + 2)}{2} = \frac{q^2 + 2q + 3}{2}.$$
Consequently,
$$f(\mathcal{G}) < \frac{q^8 - q^6 - q^5 + q^3}{t^{(q^2 + 1)/2}} + 1. \qquad (3.5)$$
For $q = 7$ and $t = 2$, the right-hand side of (3.5) is approximately 1.17. Since the right-hand side of (3.5) is monotonically decreasing in $q$, it follows that $f(\mathcal{G}) < 2$ for all $q \ge 7$; hence $\chi_D(LG_q) \le 3$.
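Since all the quantities in (3.5) are explicit, the estimate is easy to check numerically (a small illustrative sketch; the formula for the order of $PGL(3, q)$ is standard):

```python
# Evaluate the right-hand side of (3.5) for q = 7, t = 2.
q, t = 7, 2
pgl_order = q**8 - q**6 - q**5 + q**3        # |PGL(3, q)| = q^3 (q^3 - 1)(q^2 - 1)
rhs = pgl_order / t ** ((q * q + 1) / 2) + 1
print(rhs)                                   # ~1.168, in particular less than 2
```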
Remark: It turns out that $\chi_D(LG_5) = 3$ as well, via the same argument; the only difference is that the estimate above is no longer sufficient, and one needs to compute $f(\mathcal{G})$ explicitly. In this case, for $t = 2$, one calculates $f(\mathcal{G})$ explicitly (using a computer program) to obtain $f(\mathcal{G}) \approx 1.2$, whence $\chi_D(LG_5) = 3$. For more results on the distinguishing chromatic number, see [7].
Recall the setup: each of $n$ friends is given a hat, colored white or black uniformly and independently at random; each friend sees all hats but her own, and all must simultaneously either guess their own hat color or pass. They receive a reward if at least one friend guesses and every guess made is correct. One easy strategy achieving a 50% success rate is for one designated friend to take a random guess while the others pass ("I don't know the color of my hat"). The friends wish to devise a strategy that increases the probability of getting the reward. The question is: do they have a better strategy? Again, the question is to be viewed as one that deals with $n$ large but fixed.
First, let us formalize the problem. Denoting white and black by 1 and 0 respectively, any configuration of hats (on the friends' heads) is a point in $\{0,1\}^n$. Thus the $i$th member witnesses a vector $x_i = (x_{i,1}, \dots, x_{i,i-1}, *, x_{i,i+1}, \dots, x_{i,n})$, and these vectors are compatible in the sense that for all distinct $i, j$ and $k \ne i, j$, we have $x_{i,k} = x_{j,k}$.

Suppose there exists a set $L \subset \{0,1\}^n$ such that for every element $(x_1, x_2, \dots, x_n) \in W = \{0,1\}^n \setminus L$ there is an element $(y_1, y_2, \dots, y_n) \in L$ such that the set $\{i \mid x_i \ne y_i\}$ has size 1; call such a set $L$ desirable. The upshot is the rather crucial observation that a desirable set allows the friends to strategize as follows. Person $i$ knows $x_j$ for all $j \ne i$, so if there is a unique value of $x_i$ for which $(x_1, x_2, \dots, x_n) \in W$, then person $i$ declares that her hat color is $x_i$; otherwise she passes.
This allows the friends to argue as follows. If they have a desirable set on their hands, then the strategy outlined above works unless the color profile of the hats corresponds to a point of $L$. Consequently, the probability that the friends get the reward following this strategy equals $1 - \frac{|L|}{2^n}$. Thus, to maximize this probability, they need a 'small' desirable set $L$. And since there is no canonical choice here, we shall pick it randomly.
Pick a set $X$ by choosing each element of $\{0,1\}^n$ independently at random. But this time, the probability distribution is not clear. Picking each element with probability $1/2$, as in the preceding examples, would result in a very large set with high probability - this will be made more formal in later chapters. So for the moment, let us pick each element with probability $p$, where $p$ is a parameter that is to be determined later.

For a fixed $x \in \{0,1\}^n$, let $1_{x \in X}$ denote the random variable that takes the value 1 if $x \in X$ and 0 otherwise. Note that $E(1_{x \in X}) = P(x \in X)$. By linearity of expectation,
$$E(|X|) = E\Big(\sum_{x \in \{0,1\}^n} 1_{x \in X}\Big) = \sum_{x \in \{0,1\}^n} E(1_{x \in X}) = \sum_{x \in \{0,1\}^n} P(x \in X) = 2^n p.$$
Let $Y$ be the set of elements which differ from every chosen element in at least two coordinates. For a fixed $x \in \{0,1\}^n$, it is easy to see that $x \in Y$ is equivalent to saying that no element of the 'ball'² $B(x)$, consisting of all the elements which differ from $x$ in at most one coordinate, is chosen into $X$. Since $|B(x)| = n + 1$ (the element $x$ and the $n$ elements that differ from $x$ in exactly one coordinate), we have
$$E(|Y|) = \sum_{x \in \{0,1\}^n} P(x \in Y) = 2^n (1 - p)^{n+1}.$$
But now consider the set $L = X \cup Y$; this is indeed a desirable set! Furthermore, $E(|L|) = E(|X| + |Y|) = 2^n\left(p + (1-p)^{n+1}\right)$. Minimizing this over $p \in [0,1]$ (basic calculus) gives us $p = 1 - \frac{1}{(n+1)^{1/n}}$.
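Since the optimization is elementary, we can simply tabulate it (an illustrative sketch); note that the expected density $E(|L|)/2^n$ decays like $(1 + \log(n+1))/(n+1)$, so the winning probability tends to 1:

```python
# Optimal p and the expected density of the desirable set L = X ∪ Y.
for n in (3, 7, 15, 63, 255):
    p = 1 - 1 / (n + 1) ** (1 / n)
    density = p + (1 - p) ** (n + 1)     # E(|L|) / 2^n at the optimal p
    print(n, round(p, 4), round(density, 4), "win prob ~", round(1 - density, 4))
```

For comparison, the Hamming codes mentioned in the remark below achieve density exactly $1/(n+1)$.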
Remark: A concise (and low-complexity) description of optimally small desirable sets is possible for $n$ of the form $2^k - 1$, and this is through what are known as Hamming codes. One can also prove similar results when the friends are assigned hats that may take any one of $q$ different colors.

²This term is not a loose one. The Hamming distance $d(x, y)$, which counts the number of coordinates where $x$ and $y$ differ, is indeed a legitimate metric.
3.6 The 1-2-3 theorem
The following question was first posed by Margulis: Given i.i.d. random variables $X, Y$ distributed according to some law $F$, is there a constant $C$ (independent of $F$; that is the important thing) such that
$$P(|X - Y| \le 2) \le C \cdot P(|X - Y| \le 1)?$$
Note that it is far from obvious that such a $C < \infty$ must even exist. However, it is easy to see that such a $C$ must be at least 3. Indeed, if $X, Y$ are uniformly distributed on the even integers $\{2, 4, \dots, 2n\}$, then it is easy to check that $P(|X - Y| \le 1) = 1/n$ and $P(|X - Y| \le 2) = \frac{3}{n} - \frac{2}{n^2}$. It was finally proved by Kozlov in the early 90s that the constant $C = 3$ actually works. Alon and Yuster shortly thereafter gave another proof which was simpler and had the advantage that it actually established
$$P(|X - Y| \le r) < (2r - 1)\, P(|X - Y| \le 1)$$
for every positive integer $r \ge 2$, which is also the best possible constant one can have for this inequality. We shall only show the weaker inequality, with $\le$ in place of the strict inequality. We shall later mention briefly how one can improve this to the strict inequality, though we will not go over all the details.
Proof. The starting point for this investigation is one of the main tenets of statistics: one can estimate (well enough) parametric information about a distribution from (large) finite samples of the same. In other words, if we wish to get more information about the unknown $F$, we could instead draw a large i.i.d. sample $X_1, X_2, \dots, X_m$ for a suitably large $m$; the sample percentiles then give information about $F$ with high probability. This is in fact the basic premise of non-parametric inference.

So, suppose we did draw such a large sample. Then a 'good' estimate for $P(|X - Y| \le 1)$ would be the ratio
$$\frac{\#\{(i,j) : |X_i - X_j| \le 1\}}{m^2}.$$
A similar ratio, namely,
$$\frac{\#\{(i,j) : |X_i - X_j| \le r\}}{m^2},$$
should give a 'good' estimate for $P(|X - Y| \le r)$. This suggests the following question.
Question 8. Suppose T = (x1 , x2 , . . . , xm ) is a sequence of (not necessarily distinct)
reals, and Tr := {(i, j) : |xi − xj | ≤ r}. Is it true that |Tr | ≤ (2r − 1)|T1 |?
If this were false for some real sequence, one could concentrate $F$ appropriately on the numbers of this sequence and perhaps force a contradiction to the stated theorem. Thus, it behooves us to consider the (combinatorial) question posed above.
Let us try to prove the above by induction on $m$. For $m = 1$ there is nothing to prove; in fact, for $m = 1$ one has strict inequality. So suppose we have (strict) inequality for sequences of length $m - 1$, and we wish to prove the same for length $m$.
Fix an $i$ and let $T' = T \setminus \{x_i\}$. Consider the interval $I := [x_i - 1, x_i + 1]$, let $S_I = \{j \mid x_j \in I\}$, and let $|S_I| = s$. Then it is easy to see that
$$|T_1| = |T_1'| + (2s - 1).$$
Now, in order to estimate $|T_r|$, note that we need to estimate the number of pairs $(i, j)$ such that $|x_i - x_j| \le r$. Suppose $i$ was chosen such that $|S_I|$ is maximum among all choices for $x_i$. Then observe that if we partition
$$[x_i - r, x_i + r] = [x_i - r, x_i - (r-1)) \cup \cdots \cup [x_i - 2, x_i - 1) \cup [x_i - 1, x_i + 1] \cup (x_i + 1, x_i + 2] \cup \cdots \cup (x_i + (r-1), x_i + r]$$
as indicated above, then each of the intervals in this partition contains at most $s$ values of $j$ such that $x_j$ lies in that corresponding interval. This follows from the maximality assumption about $x_i$.
In fact, a moment's thought suggests a way in which this estimate can be improved. Indeed, if we also choose $x_i$ to be the largest among all $x_k$ that satisfy the previous criterion, then note that each of the intervals $(x_i + l, x_i + (l+1)]$ can in fact contain at most $s - 1$ of the $x_j$'s. Thus it follows (by induction) that
$$|T_r| \le |T_r'| + 2(r-1)s + (2s-1) + 2(r-1)(s-1) < (2r-1)|T_1'| + (2r-1)(2s-1) = (2r-1)|T_1|.$$
This completes the induction and answers the question above in the affirmative, with strict inequality.
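Question 8 is also easy to test empirically; the following sketch checks the strict inequality on random inputs:

```python
# Count ordered pairs (i, j), including i = j, with |x_i - x_j| <= r.
import random

def pairs(xs, r):
    return sum(abs(a - b) <= r for a in xs for b in xs)

random.seed(0)
for _ in range(200):
    xs = [random.uniform(0, 20) for _ in range(30)]
    for r in (2, 3, 5):
        assert pairs(xs, r) < (2 * r - 1) * pairs(xs, 1)
print("strict inequality held in every trial")
```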
Now, we are almost through. Suppose we do sample i.i.d. observations $X_1, X_2, \dots, X_m$ from the distribution $F$, and define the random variables $T_1 := \#\{(i, j) : |X_i - X_j| \le 1\}$ and $T_r := \#\{(i, j) : |X_i - X_j| \le r\}$; then note that
$$E(T_1) = \sum_{i \ne j} P(|X_i - X_j| \le 1) + m = (m^2 - m)p_1 + m,$$
where $p_1 := P(|X - Y| \le 1)$, and similarly $E(T_r) = (m^2 - m)p_r + m$ with $p_r := P(|X - Y| \le r)$. Since $T_r < (2r - 1)T_1$ holds pointwise by the above, taking expectations gives $(m^2 - m)p_r + m \le (2r - 1)\left((m^2 - m)p_1 + m\right)$; dividing by $m^2$ and letting $m \to \infty$ yields $p_r \le (2r - 1)p_1$.
As mentioned at the beginning, Alon and Yuster in fact obtain strict inequality. We shall briefly describe how they go about achieving that. They first prove that if $p_r = (2r - 1)p_1$, then, defining $p_r(a) := P(|X - a| \le r)$, there exists some $a \in \mathbb{R}$ such that $p_r(a) > (2r - 1)p_1(a)$. Once this is achieved, one can tweak the distribution $F$ as follows.

Let $X$ be a random variable that draws from the distribution $F$ with probability $1 - \alpha$ and picks the number $a$ (the one satisfying the inequality $p_r(a) > (2r - 1)p_1(a)$) with probability $\alpha$, for a suitable $\alpha$. Let us call this distribution $G$. Then, from what we just proved above, it follows that $p_r^{(G)} \le (2r - 1)p_1^{(G)}$, where $p_r^{(G)}$ denotes the probability $P(|X - Y| \le r)$ when $X, Y$ are picked i.i.d. from the distribution $G$ instead. However, if we calculate these terms, we see that $p_r^{(G)} = p_r(1 - \alpha)^2 + 2\alpha(1 - \alpha)p_r(a) + \alpha^2$, so the above inequality reads
$$(1 - \alpha)^2 p_r + 2\alpha(1 - \alpha)p_r(a) + \alpha^2 \le (2r - 1)\left[(1 - \alpha)^2 p_1 + 2\alpha(1 - \alpha)p_1(a) + \alpha^2\right].$$
Cancelling the equal terms $(1 - \alpha)^2 p_r = (2r - 1)(1 - \alpha)^2 p_1$ and letting $\alpha \to 0$ forces $p_r(a) \le (2r - 1)p_1(a)$, a contradiction.
As we complete this chapter, we leave the reader with an important caveat. The power of the probabilistic method becomes most evident when one runs out of ideas, so to speak. Where one has a deterministic argument that seems to work, the probabilistic method is not to be unsheathed, as it may well be suboptimal. To illustrate this point, let $[n] := \{1, 2, \dots, n\}$, and suppose we wish to show that for $n \le m$ there is an injective function from $[n]$ to $[m]$. We pick a random function $\varphi$ which assigns to each $x \in [n]$ a uniformly random member of $[m]$ as its image, independently over $x \in [n]$. For a fixed pair $x, y \in [n]$, the probability that $x$ and $y$ are mapped to the same element of $[m]$ by $\varphi$ is $1/m$. Hence the union bound tells us that if $\binom{n}{2}/m < 1$, then with positive probability the random function $\varphi$ is injective. In other words, the methods of this chapter only testify to the existence of an injective function for $m \ge cn^2$, and the suboptimality of this conclusion is evident to all. One may ask if this suboptimality is due to the union bound, or if the reason is something subtler. We shall return to this point in a later chapter.
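A tiny simulation (illustrative parameters) makes this vivid - a random map $[n] \to [m]$ becomes reliably injective only once $m$ is on the order of $n^2$:

```python
import random

def injective_rate(n, m, trials=2000):
    hits = 0
    for _ in range(trials):
        image = [random.randrange(m) for _ in range(n)]
        hits += len(set(image)) == n
    return hits / trials

n = 20
for m in (n, 2 * n, n * n, 10 * n * n):
    print(m, injective_rate(n, m))   # the rate approaches 1 only for m >> n^2
```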
4 Managing Expectations and the Markov bound
As we saw in the latter half of the preceding chapter, the expectation of a random variable is one of the most useful tools within the probabilistic paradigm. One of the reasons the expectation appears a very natural tool is that most combinatorially relevant functions can be regarded as random variables that tend to become robust over a larger population, so the expected value gives an idea of where a 'typical' observation of the random variable lies, and that is often a very useful start. For instance, suppose the random variable in question counts the number of undesirable events. Having its expected value less than one is good for our cause. But even if that is not the case, a low expected value has useful consequences - it instantiates the existence of a configuration with very few undesirable events.
There is another important issue that the expectation computation cannot address. While the expectation tells us that the random variable in question takes values as small/large as its expected value with positive probability, it does not tell us how likely such an outcome might be. Here is a concrete instantiation of the same. Consider a random variable $X$ that takes the value $n^2$ with probability $1/n$ and $0$ with probability $1 - 1/n$ (for $n$ large). The expected value is $n$, and yet the random variable is non-zero only with very low probability. Of course, this example also illustrates that one needs a relative perspective on what large/small ought to mean. But if the expected value of a non-negative random variable is small, then the random variable is not very likely to take large values.
4.1 Revisiting the Ramsey Number R(n, n)
Let us revisit the problem of lower bounds for the Ramsey numbers $R(k, k)$. As usual, color the edges of the complete graph $K_n$ red or blue with equal probability, and independently for distinct edges. Then the expected number of monochromatic copies of $K_k$ is $m := \binom{n}{k}2^{-\binom{k}{2}+1}$. Thus there is a coloring of the edges in which there are at most $m$ monochromatic copies of $K_k$. Now, from each such monochromatic copy, delete a vertex; then the resulting graph on $n - m$ vertices has no monochromatic $K_k$! Thus we get $R(k, k) > n - \binom{n}{k}2^{-\binom{k}{2}+1}$.

Now, to see if this improves upon our earlier bound, we need to do some calculus. If $m \le n/2$, then we get $R(k, k) > n/2$; in fact, just below the optimal value of $n$ the subtracted term becomes negligible, and some routine asymptotics (see the chapter on asymptotics for the details) give us
$$R(k, k) > (1 + o(1))\,\frac{k}{e}\,2^{k/2}$$
for large $k$.
Theorem 9. Let $T$ be a set of points in $\mathbb{R}^n$ of diameter² at most 1, and let $x$ be a point in the convex hull¹ of $T$. Then for every positive integer $k$ there exist points $y_1, \dots, y_k \in T$ (not necessarily distinct) such that
$$\left\|x - \frac{1}{k}(y_1 + \cdots + y_k)\right\| \le \frac{1}{\sqrt{k}}.$$

The interesting feature of this theorem is that the ambient dimension $n$ does not feature at all! The only thing that matters is how good an approximation we are hoping to find. This is again a feature that appears on more than one occasion when one encounters randomized methods.

¹A convex combination of $x_1, \dots, x_t$ is a sum of the form $\lambda_1 x_1 + \cdots + \lambda_t x_t$ with $\lambda_i \ge 0$ for all $i$ and $\sum_i \lambda_i = 1$.
²The diameter of a set $A$ is defined as the supremum of the distances between pairs of points of $A$.
Proof. Without loss of generality, by translating $T$, we may assume that $\|x\| \le 1$ for all $x \in T$ (translate a point of $T$ to the origin; since $T$ has diameter at most 1, all of $T$ then lies in the unit ball). Let $x$ be a point in the convex hull of $T$. By Carathéodory's theorem, there exist $x_1, \dots, x_m \in T$ with $m \le n + 1$ such that $x = \sum_i \lambda_i x_i$, where $\lambda_i > 0$ and $\sum_i \lambda_i = 1$. The $\lambda_i$ summing to one suggests a natural random variable. Let $y$ be the (vector-valued) random variable with $P(y = x_i) = \lambda_i$ for $i = 1, \dots, m$. Then $E(y) = \sum_i \lambda_i x_i = x$; here the expectation is taken coordinate-wise.

Picking up on the general principle that an average of i.i.d. (independent and identically distributed) random variables converges to its expected value, it is somewhat natural to consider $y_1, \dots, y_k$, independent and each distributed as $y$ above, and to look at the average $z := \frac{1}{k}(y_1 + \cdots + y_k)$. Clearly, $E(z) = x$. To see how well this fares, let us compute (or estimate) $E(\|z - x\|^2)$. For the random variable $y$,
$$E\|y - x\|^2 = E\|y\|^2 - \|x\|^2 = \sum_{i=1}^m \lambda_i \|x_i\|^2 - \|x\|^2 \le 1.$$
Set $z_i = y_i - x$, so that $z - x = \frac{1}{k}\sum_{i=1}^k z_i$. Then
$$E\|z - x\|^2 = \frac{E\left\|\sum_i z_i\right\|^2}{k^2} = \frac{1}{k^2}\left(\sum_{i=1}^k E\|z_i\|^2 + 2\sum_{i < j} E\langle z_i, z_j\rangle\right) \le \frac{1}{k},$$
where the last inequality comes from the preceding calculation and the fact that $z_i, z_j$ are independent with mean zero, which gives $E\langle z_i, z_j\rangle = 0$ for all $i < j$. In particular, this computation tells us that there exist $k$ vectors $y_i$ for which, with $z$ as above, $\|z - x\|^2 \le 1/k$. The rest is a routine consequence.
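The proof is effectively an algorithm, and it is instructive to run it. The sketch below (with an illustrative set $T$ in the plane and illustrative weights $\lambda_i$) samples $y_1, \dots, y_k$ and compares the squared error against the $1/k$ bound:

```python
import random

T = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]   # points in the unit ball
lam = [0.4, 0.3, 0.2, 0.1]                               # convex weights
x = tuple(sum(l * p[j] for l, p in zip(lam, T)) for j in range(2))

for k in (10, 100, 1000):
    pts = random.choices(T, weights=lam, k=k)            # y_i i.i.d., P(y = x_i) = lam_i
    z = tuple(sum(p[j] for p in pts) / k for j in range(2))
    err2 = sum((z[j] - x[j]) ** 2 for j in range(2))
    print(k, round(err2, 5), "bound:", 1 / k)            # typically err2 <= 1/k
```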
Remark: The theme of approximations is closely tied with the probabilistic paradigm,
and we will encounter the motif several times in the book.
It is intuitively clear that as the number of edges in a graph increases (in proportion to the total possible number of edges), its girth - the length of its shortest cycle - can go down dramatically. So, for a fixed parameter $k$, the following extremal problem is both natural and of great interest to the extremal combinatorist: What is the maximum possible number of edges in a graph on $n$ vertices with girth at least $k$?
The following simple argument gives an upper bound. Suppose the graph $G$ has minimum degree at least $d$. Set $\ell = \lfloor \frac{k-2}{2} \rfloor$. If the girth of $G$ is at least $k$, then for any vertex $v$, the subgraph induced on the $\ell$-fold neighborhood centered at $v$, i.e., the set of vertices at distance at most $\ell$ from $v$, is a tree. This follows since if it were not a tree, there would be a cycle contained in this subgraph; however, since any vertex $w$ in it is at distance at most $\ell$ from $v$, the size of such a cycle would be at most $2\ell + 1 < k$, contrary to the assumption. Since each vertex has degree at least $d$, this induced subgraph has at least
$$1 + d + d(d-1) + \cdots + d(d-1)^{\ell - 1} = 1 + \frac{d\left((d-1)^{\ell} - 1\right)}{d - 2}$$
vertices.
Now, for a given graph $G$ on $n$ vertices with average degree $d$, note that if we remove a vertex of degree at most $t$ (for some $t$ - we shall see presently what $t$ to set), then the modified graph has average degree at least $\frac{nd - 2t}{n - 1}$. If we set $t = d/2$, then this last expression is at least $d$. In other words, if we delete a vertex of degree at most $d/2$, then the average degree of the graph does not decrease in this process. Consequently, this process must eventually terminate, and when it does, every remaining vertex has degree at least $d/2$. It is a simple exercise to put these observations together and show that if the average degree of a graph is at least $Cn^{2/(k-2)}$ for a suitably large $C$, then $G$ must have a cycle of length at most $k - 1$. In other words, the maximum number of edges in a graph with girth at least $k$ is $O(n^{1 + \frac{2}{k-2}})$.
Theorem 10. For a given integer $k \ge 4$ and $n$ sufficiently large, there exist graphs on $n$ vertices with girth at least $k$ and $\Omega(n^{1 + \frac{1}{k-2}})$ edges.
Proof. To establish a lower bound, one needs to construct a graph with girth at least $k$ and as many edges as possible. As in the Ramsey problem, the small cases of $k$ ($k = 3, 4$ for starters) are well studied; indeed, one knows the best possible bounds in these cases. But in the general case, the specificity of the examples in the smaller cases makes it harder to generalize. And so Erdős did what came to him naturally: he picked the graph at random.

Let us construct a random graph where each edge is chosen independently with probability $p$, where $p$ (as in a previous example) will be determined later. The 'bad instances' here are incidences of small cycles. Indeed, for $3 \le t \le k - 1$, let $N_t$ denote the number
of $t$-cycles in $G$. Then
$$E(N_t) = \frac{n(n-1)\cdots(n-t+1)}{2t}\,p^t < \frac{(np)^t}{2t},$$
since every cyclic ordering of $t$ vertices counts a particular $t$-cycle exactly $2t$ times - the first vertex may be picked in one of $t$ ways, and the orientation in one of two possible ways. This gives
$$E\Big(\sum_{3 \le t \le k-1} N_t\Big) \le \frac{(np)^3 + \cdots + (np)^{k-1}}{6} \le \frac{(np)^{k-1}}{3}.$$
On the other hand, the expected number of edges of $G$ is $E(e(G)) = \binom{n}{2}p \ge \frac{pn^2}{3}$ for $n$ large enough.³
The key insight of Erdős, again, was this: if the number of small cycles is not that large - say, at most half the total number of edges - then one may throw away one edge from each of the small cycles, thus eliminating all small cycles, and yet retain at least half the total number of edges. This suggests that we force
$$(np)^{k-1} \le \frac{n^2 p}{2},$$
so that there is an instance in which the number of short cycles is at most half the number of edges. That gives us a lower bound on the number of edges of a graph with girth at least $k$.

The computation now is straightforward. We leave it to the reader to see that the inequality we have forced gives us $p = \frac{c}{n^{(k-3)/(k-2)}}$ (for a small constant $c > 0$), which in turn gives us a bound of $e(G) = \Omega(n^{1 + \frac{1}{k-2}})$.
Remark: As it turns out, the random construction here is not best possible, and the considered opinion of the experts in extremal combinatorics is that the exponent appearing in the upper bound is the truth. However, that remains an open problem. The best known lower bound comes from a remarkable algebraic construction of Lazebnik, Ustimenko and Woldar [19], which gives $\Omega\left(n^{1 + \frac{4}{3(k-2)} - \varepsilon}\right)$ edges and which, in terms of the exponent of $n$, sits between the randomized construction and the simple upper bound. This again reinforces the caveat: if one can, one should strive for non-random constructions.

³Again, the linearity is key here.
4.4 List Chromatic Number and minimum degree
The list chromatic number is a notion introduced by Erdős, Rubin and Taylor in their seminal paper that sought to address what was called the 'Dinitz problem'. This variant of the usual chromatic number goes as follows. For a graph $G$, let $\mathcal{L} = \{L_v \mid v \in V(G)\}$ be a collection of subsets of some set $C$, indexed by the vertices of $G$. These are to be interpreted as lists of colors assigned to each vertex. An $\mathcal{L}$-coloring of $G$ is an assignment of an element $\chi(v) \in L_v$ to each $v \in V(G)$ such that if $u$ and $v$ are adjacent vertices in $G$, then $\chi(u) \ne \chi(v)$. In the parlance of colorings, this is a choice of color assignments to the vertices such that no two adjacent vertices are assigned the same color. The list chromatic number of $G$, denoted $\chi_l(G)$, is the smallest $k$ such that for any family $\mathcal{L}$ with $|L_v| \ge k$ for all $v$, $G$ is $\mathcal{L}$-colorable. It is not hard to see that this is well defined; moreover, $\chi(G) \le \chi_l(G)$, since the usual chromatic number of $G$ corresponds to the case where all the vertex lists are identical. The next result shows that the reverse inequality need not hold.
The list chromatic number is a very interesting invariant for a host of reasons. One natural way to motivate this notion is the following. Suppose we attempt to properly color the vertices of a graph using colors, say, $1, \dots, k$, and suppose we are given a partial coloring. For each uncolored vertex $v$, let $D_v$ denote the set of colors that appear among its neighbors in the partial coloring. Then the partial coloring extends to a proper $k$-coloring of the graph if and only if the induced graph on the remaining uncolored vertices is list-colorable from the lists $L_v := [k] \setminus D_v$. So in that sense, list colorings arise quite naturally in connection with proper colorings.
As was observed in the seminal paper of Erdős, Rubin, and Taylor, there are bipartite graphs with arbitrarily large list chromatic number.

Theorem 11. (Erdős, Rubin, Taylor) $\chi_l(K_{n,n}) > k$ if $n \ge \binom{2k-1}{k}$.
Proof. We wish to show there is some $\mathcal{L} = \{L_v \mid v \in V(G)\}$ with $|L_v| = k$ for each $v \in V(G)$ such that $K_{n,n}$ is not $\mathcal{L}$-colorable. Let $A$ and $B$ denote the two partition classes of $K_{n,n}$, i.e., the two sets of vertices determined by the natural division of the complete bipartite graph $K_{n,n}$ into two independent subgraphs.

Now we construct $\mathcal{L}$. Take the set of all colors from which we construct the $L_v$'s to be $\{1, 2, \dots, 2k-1\}$. Since $n \ge \binom{2k-1}{k}$, which is the number of possible $k$-subsets of $\{1, 2, \dots, 2k-1\}$, we can choose the $L_v$'s for the $v$'s in $B$ so that each $k$-subset of $\{1, 2, \dots, 2k-1\}$ is $L_v$ for some $v \in B$, and similarly we choose lists for the vertices of $A$.

Now suppose an $\mathcal{L}$-coloring exists, and let $S$ be the set of all colors that this coloring assigns to the vertices of $B$. Then $S$ intersects every $k$-element subset of $\{1, 2, \dots, 2k-1\}$, since every such subset is the list of some $v \in B$. So we must have $|S| \ge k$ (since otherwise its complement has size $\ge k$ and thus contains a subset of size $k$ disjoint from $S$). But then, since $|S| \ge k$, by the choice of the lists there exists $a \in A$ with $L_a \subset S$. Since $a$ is adjacent to every vertex of $B$, no color of $L_a$ is available at $a$, so no $\mathcal{L}$-coloring is possible.
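For $k = 2$ the construction is small enough to check exhaustively; the sketch below verifies that $K_{3,3}$, with the three 2-subsets of $\{1, 2, 3\}$ as the lists on each side, admits no proper coloring from the lists:

```python
from itertools import combinations, product

lists = list(combinations((1, 2, 3), 2))      # lists for the 3 vertices on each side
colorable = any(
    set(colA).isdisjoint(colB)                # proper iff no color is used on both sides
    for colA in product(*lists)               # one color from each list on side A
    for colB in product(*lists)               # ... and on side B
)
print("L-colorable:", colorable)              # False, so chi_l(K_{3,3}) > 2
```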
Another interesting feature of the list chromatic number is the following result, due to Alon.

Theorem 12. (Alon) There is an absolute constant $c > 0$ such that every graph $G$ with minimum degree $d$ satisfies $\chi_l(G) \ge c\,\frac{\log d}{\log\log d}$.

This is quite at variance with the usual chromatic number, since there are bipartite graphs - of chromatic number two - with arbitrarily large minimum degree.
Proof. The case of chromatic number two is the first nontrivial case, and, as we shall see at the end, the general case reduces to it; so let us first assume that the graph is bipartite, with partition classes $A$ and $B$, and $|A| \ge |B|$.

We shall assume that the minimum degree is sufficiently large (for asymptotic reasons). In order to establish a lower bound of the form $\chi_l(G) > s$, we need to assign lists of size $s$ to each vertex and ensure that from these lists, a list coloring is not possible. Suppose $C$ is a set of colors from which we shall allocate lists to the vertices of $G$. Without loss of generality, let $C := \{1, 2, \dots, L\}$ for some $L$ to be fixed/determined later.
How do we show that a vertex $a \in A$ cannot be colored from its list? Let us take a cue from the previous result: suppose a vertex $a \in A$ has, among the lists assigned to its neighbors in $B$, all the possible $s$-subsets of $C$. Consider a choice of colors assigned to the vertices of $B$ from their respective lists, and let $W$ be the set of colors witnessed by this choice. Since the neighbors of $a$ witness all possible $s$-subsets of $C$, we have $W \cap S \ne \emptyset$ for all $S \subset C$ of size $s$, so that in particular, $|W| \ge L - s + 1$. If this choice extends to a valid choice for $a$, then $L_a$ must contain an element from a very small set, viz., $C \setminus W$, which has at most $s - 1$ colors of $C$. Now, if there are several such vertices $a \in A$ (i.e., vertices that witness every $s$-subset as the list of one of their neighbors), then this same criterion must be met by each of them. And that is not very likely to happen if we allot random lists to the vertices of $A$! This potentially sets up a contradiction.
Let us set this in motion. Call a vertex $a \in A$ critical if among its neighbors, all possible $s$-subsets of $C$ appear as lists. To arrange for many critical vertices, assign to each $b \in B$ a set $L_b$ which is an $s$-subset of $C$ chosen uniformly at random, independently over the different vertices. Then the probability that a given $a \in A$ (of degree $d$, say) is not critical equals the probability that there exists some $s$-subset $T$ of $C$ such that no neighbor of $a$ is assigned $T$ as its list. Since there are $\binom{L}{s}$ possible $T$'s, it follows by the union bound that
$$P(a \text{ is not critical}) \le \binom{L}{s}\left(1 - \frac{1}{\binom{L}{s}}\right)^{d} \le \binom{L}{s}\, e^{-d/\binom{L}{s}}.$$
Now assume that $d \gg \binom{L}{s}\log\binom{L}{s}$. Then, by the above, $P(a \text{ is not critical}) < \frac{1}{2}$. So if $N$ denotes the number of critical vertices of $A$, then $E(N) > |A|/2$. Thus there exists an assignment of lists for the vertices of $B$, $\{L_v \mid v \in B\}$, such that the number of critical vertices is greater than $\frac{|A|}{2}$. Fix such a choice of lists for the vertices of $B$.
Fix a color palette $w$ from these assigned lists, i.e., a choice of one element from each list in $\{L_v \mid v \in B\}$. Denote by $W = W(w)$ the set of colors that appear among the vertices of $B$ under the palette $w$. Since there exists a critical $a \in A$, $W$ has nonempty intersection with every $s$-subset of $[L]$, so $|W| \ge L - s + 1$. If $w$ extends to a valid color choice at a critical vertex $a$, then, as we observed earlier, $L_a \cap (C \setminus W) \ne \emptyset$.
Since we haven’t yet dealt with the color lists for A, let us pick color lists for the
vertices of A uniformly at random from the s-subsets of C. Then for a critical a ∈ A
(s − 1) L−1
s−1 s2
P (w extends to a exists) ≤ L
< .
s
L
|A|/2 12 !|B|
s2 s2
P (an extension to a coloring of G exists) ≤ s|B| ≤ s
L L
The last expression is less than 1 if $s\sqrt{\frac{s^2}{L}} < 1$, or equivalently, if $L > s^4$; so set $L = 2s^4$. Recall the assumption made earlier that $d \gg \binom{L}{s}\log\binom{L}{s}$; we needed this to make $\binom{L}{s} e^{-d/\binom{L}{s}} < \frac{1}{2}$, which holds once $d \ge \binom{L}{s}\log\left(2\binom{L}{s}\right)$. In summary, if
$$d \ge 4\binom{2s^4}{s}\log\left(2\binom{2s^4}{s}\right),$$
then there is a collection of lists $\{L_v \mid v \in V(G)\}$ with $|L_v| = s$ for all $v$ such that no coloring of $G$ from these lists exists, i.e., $\chi_l(G) > s$. Again, arriving at the lower bound as stated in the theorem is a good exercise in asymptotics; for the precise details, see the first chapter on asymptotics.
This is all under the assumption that $G$ is bipartite. But it is a very simple fact that every graph contains a 'large' bipartite subgraph:

Lemma 13. For any graph $G$, there exists a subgraph $H$ of $G$ with $V(H) = V(G)$ such that $H$ is bipartite and $d_H(v) \ge \frac{1}{2} d_G(v)$ for all $v \in V(G)$.

To see why, consider a partition of the vertex set into two parts such that the number of crossing edges (edges across the partition) is maximized. It is now a straightforward observation that every vertex has at least half of its neighbors in the other part, for otherwise we could move that vertex to the other part and increase the number of crossing edges. This partition in particular produces a bipartite subgraph $H$ as claimed. This completes the proof of the lemma, and of the theorem as well.
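The proof of the lemma is an algorithm in disguise - a local search for a maximum cut. Here is a minimal sketch (the function name is ours):

```python
def bipartite_half(adj):
    # adj: dict mapping each vertex to the set of its neighbours
    side = {v: 0 for v in adj}
    improved = True
    while improved:                  # terminates: every move increases the cut size
        improved = False
        for v in adj:
            across = sum(side[u] != side[v] for u in adj[v])
            if 2 * across < len(adj[v]):
                side[v] ^= 1         # moving v strictly increases the crossing edges
                improved = True
    return side

adj = {v: {u for u in range(6) if u != v} for v in range(6)}   # K_6 as an example
side = bipartite_half(adj)
print(all(2 * sum(side[u] != side[v] for u in adj[v]) >= len(adj[v]) for v in adj))
```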
Alon later improved his bound to $\chi_l(G) \ge (\frac{1}{2} - o(1)) \log d$, with $d = \delta(G)$. We shall give a proof of a slightly weaker form of this result ($\chi_l(G) \ge c \log d$ for some constant $c > 0$) in a later chapter, by a different probabilistic paradigm. Alon and Krivelevich have also conjectured that $\chi_l(G) = O(\log \Delta(G))$ for every bipartite graph $G$, where $\Delta(G)$ denotes the maximum degree of $G$. That remains an open problem.
We now move on to the next principle outlined in the introduction of this chapter, and that is the following fairly easy inequality.

Proposition 15. Suppose $X$ is a random variable satisfying $X \le M$ almost surely. Then for any $a < M$,
$$P(X \ge a) \ge \frac{E(X) - a}{M - a}.$$

The idea of the proof is straightforward. Write
$$E(X) = \int_{X < a} X \, dP + \int_{X \ge a} X \, dP \le a\,P(X < a) + M\,P(X \ge a) = a + (M - a)P(X \ge a),$$
and rearrange.
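The inequality is tight on two-point distributions, and a quick numerical check (purely illustrative) confirms it:

```python
# Sanity check of Proposition 15: take X = M with probability q and X = 0
# otherwise, so E(X) = qM and P(X >= a) = q for every 0 < a < M.
M = 10.0
for q in (0.1, 0.5, 0.9):
    for a in (1.0, 5.0, 9.0):
        assert q >= (q * M - a) / (M - a)
print("Proposition 15 verified on the two-point examples")
```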
4.5 The Varnavides averaging argument
An old conjecture of Erdős and Turán was the following: given $\varepsilon > 0$ and an integer $k \ge 3$, there exists $N_0 := N_0(\varepsilon, k)$ such that the following holds: if $N \ge N_0$ and $A \subset \{1, \dots, N\}$ with $|A| \ge \varepsilon N$, then $A$ contains a $k$-term arithmetic progression ($k$-AP for short). This conjecture came on the heels of the theorem of van der Waerden which states that for positive integers $r, k$ there exists an integer $W(r, k)$ such that if $N \ge W(r, k)$, then any $r$-coloring of the integers in $[N] := \{1, \dots, N\}$ necessarily contains a monochromatic $k$-AP. The Erdős-Turán conjecture basically captures the intuitive idea that van der Waerden's theorem holds because it does so on the most popular color class. This conjecture was settled in the affirmative for $k = 3$ by Roth, and then later in its full generality by Szemerédi.
What we are after (following Varnavides) is a strengthening of this result. The statement we aim to prove is that if $N$ is sufficiently large and $|A| \ge \varepsilon N$, then $A$ in fact contains as many $k$-APs as there can be (up to a constant multiple):

Theorem 16. (Varnavides) Given $\varepsilon > 0$ and $k \ge 3$, there exists $\delta := \delta(\varepsilon, k) > 0$ such that the following holds. Any subset $A \subset [N]$ of size at least $\varepsilon N$ contains at least $\delta N^2$ APs of length $k$.

Note that a $k$-term AP in $[N]$ is determined by specifying its first term and its common difference, so there are at most $N^2$ $k$-APs in all. Thus, this theorem is best possible up to the constant.
Proof. By Szemerédi’s result, we know that there is an N0 := N0 (ε, k) such that any
subset A ⊂ [N0 ] of size at least εN0 contains at least one k-AP. The simplest thing one
can imagine doing is, cutting the set [N ] into linear chunks of length N0 ; clearly at least
one of these chunks must meet A in at least an ε proportion of its size. That unfortu-
nately does not give us anything new. But one thing is does suggest is, the number of such
chunks that have a reasonable proportion of A in them will each give us one distinct k-AP.
But this breaking into chunks is a little too wasteful. Every AP of size N0 is again a
model for the set [N0 ] so one might want to consider all possible APs of length N0 and
see how many among them meet A in a significantly large portion. Of course, the same
k-AP might be a part of several such N0 -APs, so there is a double counting issue to sort
out anyway. But that immediately suggests the following.
Pick an $N_0$-AP at random, i.e., pick $x_0, d$ uniformly and independently from $[N]$ with $d \ne 0$, and let $K := K(x_0, d) = \{x_0, x_0 + d, \dots, x_0 + (N_0 - 1)d\}$. One problem this immediately poses is that not all such pairs give rise to $K \subset [N]$. To overcome this nuisance, let us work in $\mathbb{Z}/N\mathbb{Z}$. The relevant random variable now is $|A \cap K|$. To compute its expectation using linearity, we need to compute $P(a \in K)$ for an arbitrary $a \in [N]$. To count the number of pairs $(x_0, d)$ such that $K(x_0, d)$ contains $a$, observe that if $a = x_0$ is the first term, then all choices of $d$ give valid $K$'s; otherwise, there are $N - 1$ choices for $x_0$, and for each of those, if $a$ is to be the $i$th term of $K(x_0, d)$ (with $N_0 - 1$ choices for $i$), this determines $d$ uniquely, provided we can solve the corresponding linear equation $a = x_0 + id$ for $d$. Again, this makes things messy, but as we have observed earlier, this works provided the modulus is prime.
So, let us start again. Instead of working in $\mathbb{Z}/N\mathbb{Z}$, pick a prime $p \in (2N, 4N)$ and let us then work in $\mathbb{Z}/p\mathbb{Z}$. We choose $p > 2N$ since that ensures that the addition of two elements of $\{1, \dots, N\}$ is the same in $\mathbb{Z}$ as in $\mathbb{Z}/p\mathbb{Z}$. We also don't want $p$ to be too large, because we need to compare $p$ with $N$. Now pick $x_0, d$ uniformly and independently from $\mathbb{Z}/p\mathbb{Z}$ as outlined before, and let $K := K(x_0, d)$. Then
$$E(|K \cap A|) = \sum_{a \in A} P(a \in K) = N_0 \frac{|A|}{p} \ge \frac{\varepsilon}{4} N_0$$
by the assumption on the size of $A$. We would ideally like $|K \cap A|$ to take somewhat large values with not-too-small probability, and indeed, by Proposition 15,
$$P\left(|K \cap A| \ge \frac{\varepsilon}{8} N_0\right) > \frac{\varepsilon}{8},$$
since $|K \cap A| \le N_0$. This suggests the following: let $N_0 = N_0(\varepsilon/8, k)$ be as given by Szemerédi's theorem. Then it follows from everything seen above that with probability at least $\varepsilon/8$, $K \cap A$ contains a $k$-AP. But on the other hand, if $\mathcal{A}$ denotes the set of all $k$-APs contained in $A$, then
$$P(K \cap A \text{ contains a member of } \mathcal{A}) \le \sum_{P \in \mathcal{A}} P(P \subset K) \le |\mathcal{A}|\,\frac{N_0(N_0 - 1)}{N(N - 1)},$$
since, to determine whether $K(x_0, d)$ contains $P$, we have at most $N_0(N_0 - 1)$ choices for the positions of the first and second elements of $P$ within $K$, and each such choice determines the pair $(x_0, d)$. Rearranging terms, this gives
$$|\mathcal{A}| \ge \frac{\varepsilon}{16 N_0 (N_0 - 1)}\, N^2,$$
and that completes the proof.
Remark: The constants $N_0(\varepsilon, k)$ that Szemerédi's proof offers are extremely large - of tower type.
Given a hypergraph $H$ (which we identify with its edge set), let $G_H$ denote the graph whose vertex set is $H$, with $E, F \in H$ adjacent in $G_H$ if and only if $E \cap F = \emptyset$. Why would one define this particular graph? This is actually one of the most well-studied instances of a graph arising naturally from a hypergraph. The Kneser graphs are instances of this, where the underlying hypergraph is the complete uniform hypergraph of order $r$, i.e., $V(H) = [n]$ and $E(H) = \binom{[n]}{r}$ for some $r < n/2$. One of the other motivations for studying this graph comes from the problem of explicit constructions of Ramsey graphs: we say that a graph on $n$ vertices is $k$-Ramsey (if the $k$ is implicitly clear, we simply say Ramsey graph) if it contains neither an independent set⁴ nor a clique⁵ of order $k$. Note that if the edges of a $k$-Ramsey graph $G$ are colored red, and the remaining edges of $K_n$ are colored blue, then by the definition of $G$ being $k$-Ramsey, this coloring does not contain a monochromatic $K_k$.
As seen earlier, the first explicit construction of a $k$-Ramsey graph, due to Nágy, was of this flavor: there $H$ was the complete uniform hypergraph of order 3, with adjacency given by the parity of intersections. As seen before, Erdős proved that $R(k, k) = \Omega(k2^{k/2})$, but his proof did not provide an explicit deterministic construction. This also suggests the following question: suppose $|H| = 2^{(\frac{1}{2} + \delta)n}$ for some $\delta > 0$; is there such a hypergraph $H$ on $n$ vertices for which the graph $G_H$ is $n$-Ramsey? If true, this would in one stroke improve upon the probabilistic lower bound, and also provide an explicit construction for Ramsey graphs.
While this would indeed be nice, it seems like asking for too much. To see why, suppose $G_n$ is a Ramsey graph on $n$ vertices, i.e., suppose $G_n$ contains neither an independent set nor a clique of order $2\log_2 n$ (Why?!). We claim that $e(G_n)$ must be rather large. Indeed, a celebrated theorem in extremal graph theory due to Turán (in an alternate formulation) states that a graph on $n$ vertices admits an independent set of size at least $\frac{n}{d+1}$, where $d$ is the average degree of the vertices of the graph. If the Ramsey graph $G_n$ were to satisfy $e(G_n) \le cn^{2-\delta}$ for some constants $c, \delta > 0$, then by Turán's theorem, $\alpha(G_n)$⁶ $\ge \frac{n}{d+1} = \Omega(n^{\delta})$, so such graphs could not be Ramsey.
Since examples of Ramsey graphs of size 2^{(1/2+δ)n} seemed difficult to construct (they still are!), this line of argument possibly convinced Daykin and Erdős to conjecture the following:

Conjecture 17 (Daykin-Erdős). If |H| = m = 2^{(1/2+δ)n}, then e(GH) = o(m²).
Note that if m = 2^{n/2} then in fact there do exist hypergraphs H for which the graph GH is dense (though not a Ramsey graph). For instance, take the set [n] and partition it into two sets A, B of size n/2 each, and consider H to consist of all subsets of A along with all subsets of B. Since A, B are disjoint, GH has all edges of the type (E, F) where E ⊆ A, F ⊆ B. The conjecture of Daykin and Erdős says that this cannot be improved upon if the exponent were strictly greater than 1/2.
Theorem 18. (Alon-Füredi) Suppose 0 < δ < 1 is fixed, and suppose n is sufficiently large. If |H| = m = 2^{(1/2+δ)n}, then d(H) < cm^{2−δ²/2} for some positive constant c.
Proof. Let us see how we could go about this. If the graph GH is dense, one should expect to find two large disjoint subsets 𝒮, 𝒯 of V(GH) which constitute a dense pair, i.e., one ought to expect to see lots of edges between 𝒮 and 𝒯. If this pair witnesses all possible edges, one has

(⋃_{S∈𝒮} S) ∩ (⋃_{T∈𝒯} T) = ∅.

So if the pair 𝒮, 𝒯 constitutes a dense pair, it would not be a stretch to expect that ⋃_{S∈𝒮} S and ⋃_{T∈𝒯} T are almost disjoint from each other. But if these sets are also 'large' subsets of [n], this appears unlikely and would probably give a contradiction.
Let us begin formally now. We seek a collection 𝒮 = (S1, S2, . . . , St) with A(𝒮) := S1 ∪ · · · ∪ St very large. Since we are bereft of specific choices, pick S1, S2, . . . , St ∈ H uniformly and independently, for some t that shall be determined later. If A(𝒮) has at most n/2 elements, then there exists T ⊂ [n] such that |T| = n/2 and each Si ⊂ T. Fix such a choice for T. Then
P(S1 ⊂ T) = #{E ∈ H | E ⊂ T}/|H| ≤ 2^{n/2}/2^{(1/2+δ)n} = 1/2^{δn}.
Therefore by the union bound,

P(|A(𝒮)| ≤ n/2) ≤ (n choose n/2) · (1/2^{δn})^t < 2^n/2^{tδn} = 1/2^{(tδ−1)n}.

Thus, to ensure that this is a low probability event, we need tδ − 1 > 0, or equivalently, t > 1/δ.
For the second part, we want X := #{E ∈ H | E ∩ A(𝒮) = ∅} to be at least 2^{n/2}. Writing X = Σ_{E∈H} 1_{{E ∩ A(𝒮) = ∅}}, we have

E(X) = Σ_{E∈H} P(E ∩ A(𝒮) = ∅).
Fix E ∈ H. Then

P(E ∩ A(𝒮) = ∅) = P(E ∩ Si = ∅ for all i = 1, . . . , t) = (d(E)/m)^t,

where d(E) is the degree of E in GH. Denoting e(GH) = M, we have

E(X) = Σ_{E∈H} P(E ∩ A(𝒮) = ∅) = Σ_{E∈H} (d(E)/m)^t = (1/m^{t−1}) ((1/m) Σ_{E∈H} d(E)^t) ≥ (1/m^{t−1}) ((1/m) Σ_{E∈H} d(E))^t = 2^t M^t/m^{2t−1},

where the last inequality is Jensen's (convexity of z ↦ z^t) together with Σ_{E∈H} d(E) = 2M.
By Proposition 15 (setting a = E[X]/(2M)) we have

P(X ≥ aM) ≥ (E[X]/(2M))/(1 − E[X]/(2M)) ≥ (2^{t−1}M^{t−1}/m^{2t−1})/(1 − 2^{t−1}M^{t−1}/m^{2t−1}),
which gives us, together with the union bound above,

P(|A(𝒮)| > n/2) ≥ 1 − 1/2^{(tδ−1)n}.
If

(2^{t−1}M^{t−1}/m^{2t−1})/(1 − 2^{t−1}M^{t−1}/m^{2t−1}) > 1/2^{(tδ−1)n},

then both events as outlined in the sketch happen simultaneously, and our contradiction is achieved. Choose t = 2/δ. If M = cm², then this forced inequality is feasible for a suitable t that depends only on δ and c. Determining the upper bound for M as in the statement of the theorem is then a straightforward exercise.
4.7 Graphs with high girth and large chromatic number
While it is easy to ensure that a constructed graph has a high chromatic number (make it contain a clique of that size as a subgraph), it is a considerably harder task to ensure that the same holds if we forbid large cliques. The first such question that arose was the following:
Question 19. Do there exist graphs with chromatic number k (for any given k) and
which are also triangle free?
This was settled in the affirmative by the 'Mycielski construction'. This led to the next natural question: what if we also forbid 4-cycles? Tutte produced a sequence of graphs with girth 6 and arbitrarily large chromatic number, but the bigger question loomed large: do there exist graphs with arbitrarily large chromatic number and also arbitrarily large girth? It took the ingenuity of Erdős to settle this in the affirmative. To see why this is a little surprising, note that insisting on large girth g simply implies that for each vertex v, the subgraph induced on the set of vertices at distance at most g/2 from v is a tree, which can be 2-colored. Yet, it is indeed conceivable that the chromatic number of the entire graph differs vastly from the chromatic number of these small induced subgraphs.
This again fits the general template we have discussed. We need a graph G in which
locally small induced subgraphs are trees, and yet, the graph itself has large chromatic
number. A random graph appears a sound candidate for such a possibility.
Theorem 20. (Erdős) There are graphs with arbitrarily high girth and chromatic number.
Proof. Let Gn,p denote a random graph on n vertices, where each pair of vertices {x, y}
is added independently with probability p. Let n be sufficiently large. For the random graph to give us what we seek, we want:

• Gn,p has relatively few small cycles, with reasonably high probability;

• Gn,p has small independence number, with reasonably high probability.
Fix a number ℓ, and let Nℓ denote the number of cycles of length at most ℓ in Gn,p. As seen before,

E(Nℓ) ≤ Σ_{j=3}^{ℓ} n^j p^j/(2j) ≤ ((np)³/6) · ((np)^{ℓ−2} − 1)/((np) − 1) ≤ (np)^ℓ/2;

in other words, by Markov's inequality, with probability at least 1/2, Gn,p has at most (np)^ℓ cycles of size at most ℓ. This is the first step.
To show that the chromatic number of our random graphs is large, we need to un-
derstand how one might bound the chromatic number from below. Doing this directly,
by working with the chromatic number itself, would be rather ponderous. But a simple
observation based on the definition tells us that since each color class is an independent
set, we have
χ(G) ≥ n/α(G),
where α(G) is the independence number of the graph. How does this help?
Let us examine α(G) in Gn,p. We need this to be small since we want the chromatic number to be large. Then

P(α(G) ≥ m) = P(there exists an independent subset of G of size m) ≤ Σ_{S⊂V,|S|=m} P(there are no edges inside S) = (n choose m)(1 − p)^{(m choose 2)} < (n exp(−p(m − 1)/2))^m.
m
To get a handle on these parameters we have freely introduced, note that the term inside the parenthesis in the last inequality above can be expressed as exp(log n − ((m−1)/2)p), so if m = ⌈3 log n/p⌉, then the probability that Gn,p contains an independent set of size at least m goes to zero as n → ∞.
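Since the proof turns on this bound actually vanishing for the chosen m, here is a quick numeric sanity check (a sketch with illustrative parameters; the value of λ and the range of n are hypothetical choices, not from the text):

```python
import math

# Evaluate the log of the bound (n * exp(-p*(m-1)/2))**m at m = ceil(3*log(n)/p),
# with p = n^(lambda - 1) as chosen later in the proof.
lam = 0.5                                  # hypothetical lambda in (0, 1/l)
for n in (10**3, 10**4, 10**5, 10**6):
    p = n ** (lam - 1.0)
    m = math.ceil(3 * math.log(n) / p)
    log_bound = m * (math.log(n) - p * (m - 1) / 2)
    print(f"n={n:>8}  m={m:>7}  log P(alpha >= m) < {log_bound:,.0f}")
```

The logarithm of the bound is roughly −(m log n)/2, so it plunges to −∞ as n grows.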
So, what do we have on our hands now? If p is chosen in some manner, and we then set m = ⌈3 log n/p⌉, then with positive probability (for large n, certainly) we have that G has at most (np)^ℓ cycles of size at most ℓ AND that its independence number is at most m. Pick such a G.
As before, we shall perform some deletions in G to rid it of all small cycles. But unlike the earlier instance, if we deleted edges, we would run the risk of pumping up the independence number, so this time let us delete vertices instead. The advantage is that vertex deletions result in an induced subgraph of the original graph, so the independence number does not increase.
This suggests that we set (np)^ℓ ≤ n/2, or equivalently, p ≤ Cn^{1/ℓ}/n for some constant C. So, set p = n^λ/n for some λ ∈ (0, 1/ℓ), and m as suggested. Then remove an arbitrary vertex from each small cycle of G, and call the resulting graph G′. Then G′ has girth greater than ℓ and at least n/2 vertices. Finally, since deleting vertices does not increase the independence number of a graph,

χ(G′) ≥ |V(G′)|/α(G′) ≥ (n/2)/α(G) ≥ np/(6 log n) = n^λ/(6 log n),
which goes to infinity as n grows large.
Remark: Many constructive forms of this result have subsequently appeared, the first one due to Lovász, followed by several others; many of those constructions actually produce hypergraphs with the same property. The nicest description of such graphs, however, is given by the Ramanujan graphs constructed by Lubotzky-Phillips-Sarnak ([20]), but the proof involves some sophisticated number theory.
5 Dependent Random Choice
Question 21. For H bipartite, what is the correct value of α(H) with 1 ≤ α(H) < 2 such that ex(n; H) = Θ(n^{α(H)})?
Suppose H = (A ∪ B, E) with |A| = a, |B| = b. One constructive way to find a copy of H in a large graph G is to try and embed H into G, one vertex at a time. Suppose there is a large subset A′ of V(G) into which the vertices of A have already been embedded in some fashion. Let B = {v1, v2, · · · , vb} and suppose that we have embedded v1, · · · , vi−1 into V(G). The new idea that provides a scheme by which this inductive procedure extends to embedding vi as well is the following: suppose vi has degree r, and suppose that every r-subset of A′ has at least a + b common neighbors in G. Since A has been embedded into A′, the images of the neighbors of vi form a set U ⊂ A′ of size at most r, which should be the neighbor set for vi. Since |U| ≤ r and U has at least a + b common neighbors in G, there is some available choice for the image of vi in V(G) which is not a vertex that has already been taken! In short, we have the following
Lemma 22. Suppose H = (A ∪ B, E) is bipartite with |A| = a, |B| = b, and every vertex of B has degree at most r. Suppose G is a graph containing a subset A′ of size at least a such that every r-subset of A′ has at least a + b common neighbors in G. Then H can be embedded in G.
The main idea here, which basically inverts the usual way of approaching embeddings, presents a scenario which might allow one to establish other conditions that ensure that a graph H can be embedded into a bigger graph G. And the technical condition that the idea proposes leads to the following question: Given a graph G, under what conditions can one ensure that there exists a subset of vertices A′ of size at least a such that every r-subset of A′ has at least a + b common neighbors?

Before we address this question, let us see why picking the subset A′ randomly is not a good idea. Suppose G is bipartite with both parts of considerable size. Then a random set is very likely to pick at least one vertex from each of the parts, and then the condition cannot be satisfied. Since we do not choose our actual objects of interest by the random method, but rather in this dependent manner, this method is referred to as the method of Dependent Random Choice.
Let V(H) = A ∪ B, |A| = a, |B| = b, and let A′ be a subset of V(G) meant to contain the images of all the vertices of A. We seek to embed the graph H in G as described in the preceding section, and this brings us concretely to the following question: How do we determine a set A′ such that every r-subset of A′ has many common neighbors in G?

The main idea here is to invert this search: instead of picking the set A′ (say, randomly) and hoping to have found one with many common neighbors, pick a set T and let A′ be the set of those vertices which contain T among their neighbors. This is a healthy heuristic since, by fiat, we know that all the chosen vertices have the vertices of T among their neighbors.
Indeed, over t rounds, pick a vertex uniformly at random, independently across the rounds. Call this set T and consider the set of common neighbors of T, which we shall denote by N∗(T). Then

E(|N∗(T)|) = Σ_{v∈V} P(v ∈ N∗(T)) = Σ_{v∈V} (d(v)/n)^t ≥ (1/n^{t−1}) ((1/n) Σ_{v∈V} d(v))^t = (d̄)^t/n^{t−1},

where d̄ denotes the average degree of the vertices of G. The inequality above follows from Jensen's inequality for convex functions.
Let Y denote the number of r-subsets U of N∗(T) such that U has fewer than m common neighbors. Then

E(Y) ≤ Σ_{U⊂V(G), |U|=r, |N∗(U)|<m} P(U ⊂ N∗(T)).

If U ⊂ N∗(T), it means that every choice for T was picked from among the common neighbors of U, so P(U ⊂ N∗(T)) ≤ (m/n)^t. Consequently,

E(Y) ≤ (n choose r)(m/n)^t,
which implies

E(|N∗(T)| − Y) ≥ (d̄)^t/n^{t−1} − (n choose r)(m/n)^t,

so that there exists (by the method of alterations as seen in the preceding chapter) a set A′ ⊆ N∗(T) of size at least (d̄)^t/n^{t−1} − (n choose r)(m/n)^t such that every r-subset of A′ has at least m common neighbors. This gives the following
Theorem 23. (Alon, Krivelevich, Sudakov) If H is bipartite with vertex partition (A, B) such that every vertex of B has degree at most r, then ex(n; H) = O_H(n^{2−1/r}).
Proof. We only need to fill in the gaps now. Note that

e(G) ≥ Cn^{2−1/r} ⟹ d̄ ≥ 2Cn^{1−1/r}.

Now plugging in this lower bound for d̄, and taking t = r and m = a + b in the inequality obtained above, we have

(d̄)^r/n^{r−1} − (n choose r)((a + b)/n)^r ≥ ((2C)^r n^{r−1})/n^{r−1} − (n^r/r!)·((a + b)^r/n^r) = (2C)^r − (a + b)^r/r!,

so the resulting set A′ has size at least a, and every r-subset of A′ has at least a + b common neighbors, provided

(2C)^r − (a + b)^r/r! > a,

which holds with C > (1/2)(a + (a + b)^r/r!)^{1/r}, and that completes the proof.
Before we move on, we elevate the inequality obtained earlier to the status of an observation.

Observation 24. Suppose the average degree of a graph G is d̄. Then there exists a subset A′ of size at least (d̄)^t/n^{t−1} − (n choose r)(m/n)^t such that every r-subset of A′ has at least m common neighbors.
The peculiar aspect of this observation is that the parameter t which appears is not
present in the consequence, so it is more of a driving parameter that gives a condition to
make a conclusion.
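A minimal computational sketch of Observation 24 may help fix ideas; the following is illustrative (all parameter choices are hypothetical, and the brute-force enumeration of r-subsets is only sensible for small graphs):

```python
import random
from itertools import combinations

def dependent_random_choice(adj, t, r, m, rng=random.Random(0)):
    """Pick a random t-set T (with repetition), set A' = N*(T), then delete one
    vertex from each r-subset of A' having fewer than m common neighbors.
    adj: dict mapping each vertex to its set of neighbors."""
    vertices = list(adj)
    T = [rng.choice(vertices) for _ in range(t)]
    A = set.intersection(*(adj[v] for v in T))      # N*(T)
    for U in combinations(sorted(A), r):            # alteration step
        if set(U) <= A and len(set.intersection(*(adj[u] for u in U))) < m:
            A.discard(U[0])
    return A  # every r-subset of the returned set has >= m common neighbors
```

The two phases mirror the argument exactly: the expectation bound guarantees that, for suitable t, the returned set is large on average, while the deletion phase enforces the common-neighborhood property.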
Definition 26. A t-subdivision of a graph is the graph obtained by replacing each edge with a path having at most t internal vertices.
Erdős also asked whether ε-dense graphs, i.e., graphs Gn with e(Gn) ≥ εn², admit a 1-subdivision of K_{Ω(√n)}. More formally, is there a 1-subdivision of K_{δ√n} in every ε-dense graph, for some absolute δ = δ(ε) > 0? Note that the Bollobás-Hind result does not establish a 1-subdivision, since the paths in the topological copy could involve some long paths.
The following perspective is key:
If one seeks to embed a fixed bipartite graph into another graph, and if all the vertices
on one side of the bipartite graph have somewhat small degree, then the Dependent Choice
method gives a handle - effectively reducing the problem to a calculation - on proving suf-
ficiency results.
In the Erdős problem above, note that a strict 1-subdivision of the complete graph Ka, i.e., one where each edge of Ka is subdivided to get a path of length two, corresponds to a bipartite graph with parts of size a and (a choose 2), respectively. Observe that every vertex in the part of size (a choose 2) has degree 2, since each of these vertices is placed in an edge of the original Ka. Thus, the Dependent Random Choice technique appears a likely tool.
Theorem 27. (Alon, Krivelevich, Sudakov) If e(Gn) ≥ εn², then Gn has a 1-subdivision of K_{ε^{3/2}√n}.
Proof. If we think along the lines of the embedding procedure that we discussed in the previous sections, then as remarked above, we have a sufficiency condition provided we make a back calculation. Indeed, we would have the result we seek if

(d̄)^t/n^{t−1} − (n choose r)(m/n)^t ≥ a.

Here r = 2, m = a + (a choose 2) < 2(a choose 2) < a², and d̄ ≥ 2εn. Consequently,

LHS > (2ε)^t n − (n²/2)·(a^{2t}/n^t).
For a = δ√n and δ = ε^{3/2}, we have a^{2t}/n^t = δ^{2t} = ε^{3t}, so

LHS > 2^t ε^t n − ε^{3t} n²/2.

Choosing t = log n/(2 log(1/ε)) makes ε^t = n^{−1/2}; the first term then equals 2^t √n while the second equals √n/2, whence

LHS > (√n/2)(2^{t+1} − 1) > (√n/2)·2^t = (√n/2)·n^{log 2/(2 log(1/ε))}.

As n goes large, this beats a = δ√n and settles the conjecture.
Definition 28. A graph homomorphism between graphs H, G is a map φ : V (H) → V (G)
such that whenever uv is an edge in H, φ(u)φ(v) is an edge in G.
Homomorphisms capture graph adjacencies at the local level, i.e., for each vertex u the neighbors of u are mapped to the neighbors of the image of u. The map φ is not required to be injective; for instance, there is a homomorphism from any odd cycle to K3 (a proper 3-coloring of the cycle is precisely such a map). It is usually of greater interest to consider isomorphic embeddings between graphs, i.e., injective maps φ that also preserve non-adjacencies. But if H is small compared to G, then the non-injective maps are asymptotically far fewer than the injective ones, so homomorphisms are easier to study; to count the number of homomorphisms, one can think of it in terms of embeddings where for each vertex u ∈ H, the neighbors of u in H are mapped into neighbors of its image in G.
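Counting homomorphisms is straightforward to do by brute force on small examples; here is a short illustrative sketch (exponential in |V(H)|, so only for tiny H and G):

```python
from itertools import product

def count_homomorphisms(H_vertices, H_edges, G_adj):
    """Number of maps V(H) -> V(G) sending every edge of H to an edge of G."""
    count = 0
    for phi in product(list(G_adj), repeat=len(H_vertices)):
        images = dict(zip(H_vertices, phi))
        if all(images[v] in G_adj[images[u]] for u, v in H_edges):
            count += 1
    return count

# Every odd cycle maps homomorphically to K3: hom(C5, K3) = 30, which is
# exactly the number of proper 3-colourings of C5.
K3 = {i: {j for j in range(3) if j != i} for i in range(3)}
C5_edges = [(i, (i + 1) % 5) for i in range(5)]
print(count_homomorphisms(list(range(5)), C5_edges, K3))  # prints 30
```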
Sidorenko's conjecture (also attributed to Erdős and Simonovits) states that for any bipartite graph H, among all graphs G with edge density p, the random graph G(n, p) has asymptotically the least number of copies of H. More formally, it posits that t_H(G) ≥ p^{e(H)} for every graph G of edge density p, where t_H(·) denotes the homomorphism density of H.
One way to intuit this is that a random graph tends to ‘spread out’ all the copies
of H so that no conglomeration of the copies of H is possible. While there have been
several attacks on this problem with beautiful results by several researchers, the problem
still remains open. In this section we shall see a beautiful result due to Conlon, Fox and
Sudakov (2010). But before we get to that result, let us quickly see how Sidorenko’s con-
jecture may also be stated in terms of counting homomorphisms instead of dealing with
homomorphism densities. Let |V (H)| = n, e(H) = m, and suppose hH (G) ≥ cH pm N n
holds for all graphs G on N vertices with pN 2 /2 edges. Here cH is a constant that de-
pends only on H. We first observe that this establishes the Sidorenko conjecture for H.
Indeed, suppose tH (G) < pm for some graph G with edge density p. The idea is
to ‘boost’ up the edge density of H in another related graph, by what is called the
‘tensoring trick’. For graphs G1 , G2 , the (weak) product G1 × G2 is the graph on the
vertex set V (G1 ) × V (G2 ) and (u, v) is adjacent to (u0 , v 0 ) if and only if uu0 ∈ E(G1 )
and vv 0 ∈ E(G2 ). The simplicity of this definition leads to many things, but here, the
relevant point is that for any H, tH (G1 × G2 ) = tH (G1 )tH (G2 ). We shall denote by G⊗r
the r-fold product G × · · · × G.
Consider c := t_H(G)/p^m < 1, if possible. Then for any integer r ≥ 1,

c^r p^{rm} N^{rn} = (t_H(G))^r (N^n)^r = t_H(G^{⊗r})·(N^r)^n = h_H(G^{⊗r}) ≥ c_H p^{rm} N^{rn},

since p^r is the edge density of G^{⊗r}, by the multiplicativity of t_H over products, and by the assumption about the number of homomorphisms of H in any graph. This forces c^r ≥ c_H, and since c < 1, this yields a contradiction if r is sufficiently large.
Proof. Let A = {u1, . . . , ua} and B = {w1, . . . , wb}. Let us start with a scheme for constructing homomorphisms from H to G. One curious aspect of the hypothesis of the theorem is the assumption about the existence of a special vertex in A. But if we approach this from a constructive perspective, it makes certain things more rigid and natural: suppose u1 ∈ A is special. To construct a homomorphism φ one vertex at a time, we shall first fix the image φ(u1) = x1 in G, then (since we are interested in homomorphisms, which need not be injective) fix a sequence Y = (y1, . . . , yb) with yi ∈ N(x1) which will act as the image of B under φ, and then finally choose the images of the other ui ∈ A.

Suppose we have chosen yi ∈ N(x1) to act as φ(wi). To see whether this sequence Y is a good extension of the choice for x1, let us examine how well it extends to the other ui as a homomorphism. Each ui picks out a subsequence Y′ := (y_{i1}, . . . , y_{ik}) corresponding to the neighbors of ui, so one is guaranteed many homomorphism extensions for defining φ(ui) if N∗(Y′) is large. Since the neighbors of the ui may be arbitrary subsets of B, a naturally good choice of Y is one for which every subsequence Y′ := (y_{i1}, . . . , y_{ik}) has N∗(Y′) 'large'. This begs the question: how large is 'large'? Since Sidorenko's conjecture posits that the random graph generates the least number of homomorphisms, it is reasonable to compare |N∗(Y′)| with p^{d(ui)}N, since in a random graph of edge density p, the expected number of common neighbors of a set of size k is p^k N. Accordingly, call a sequence Y ∈ (N(x1))^b desirable if |N∗(Y′)| ≥ αp^{|Y′|}N for every subsequence Y′ of Y, and call a vertex x1 good if at least half the sequences in (N(x1))^b are desirable.
To explain this, we first pick x1 to be a good vertex, and fix x1 as the image of the special u1. For each desirable sequence (y1, . . . , yb), which will serve as (φ(w1), . . . , φ(wb)), we pick choices for the remaining ui. Since Y is desirable, there are at least αp^{d(ui)}N choices for each ui. Hence the number of homomorphisms produced is at least

Σ_{x1∈Good} (d(x1)^b/2) Π_{i=2}^{a} αp^{d(ui)}N = (α^{a−1}p^{m−b}N^{a−1}/2) Σ_{x1∈Good} d(x1)^b
≥ (α^{a−1}p^{m−b}N^{a−1}/2) · N · ((Σ_{x1∈Good} d(x1))/N)^b
= (α^{a−1}p^{m−b}N^{a}/(2N^b)) (Σ_{x1∈Good} d(x1))^b,

using Σ_{i=2}^{a} d(ui) = m − b and, in the middle step, the convexity of z ↦ z^b.
This sets us a goal, but there is still a lot packed tightly into the notion of what it means for a vertex to be good. To unspool this a bit, suppose that a vertex x ∉ Good; in other words, if Y is picked uniformly at random from (N(x))^b, then the probability that Y is not desirable exceeds 1/2. Let Y = (y1, . . . , yb), fix 1 ≤ i1 < · · · < ik ≤ b, and set Y′ = (y_{i1}, . . . , y_{ik}). If there were a β > 0 such that P(|N∗(Y′)| < αp^k N) < β for all k and all choices of (i1, . . . , ik), then

P(Y is not desirable) < 2^b β ≤ 2^{n−1} β,

and choosing β = 2^{−n}, this would contradict x ∉ Good.
This motivates the following definition. For 1 ≤ k ≤ b, we say that a vertex x is problematic for k (which we shall denote by x ∼ k) if the number of subsequences Y′ = (y1, . . . , yk) ∈ N(x)^k with |N∗(Y′)| < αp^k N is somewhat large, say at least βd(x)^k. By the discussion above, if x ∉ Good, then x is problematic for some 1 ≤ k ≤ b.
We now invoke the Dependent Random Choice principle: small subsets of the common neighborhood of small random sets admit many common neighbors. Let Y′ be a random k-sequence of vertices from V, and let COUNT_k be the number of vertices x ∈ V such that Y′ ∈ N(x)^k and |N∗(Y′)| < αp^k N. Then
E(COUNT_k) = Σ_{x∈V} P(Y′ ∈ N(x)^k and |N∗(Y′)| < αp^k N)   (5.1)
 ≥ Σ_{x∼k} P(Y′ ∈ N(x)^k and |N∗(Y′)| < αp^k N) ≥ Σ_{x∼k} β(d(x)/N)^k   (5.2)
 ≥ βN^{1−2k} (Σ_{x∼k} d(x))^k   (5.3)

by the convexity of the function f(z) = z^k and the definition of x being problematic for k. On the other hand,
E(COUNT_k) < αp^k N,

since any such x that is counted - one admitting Y′ ∈ N(x)^k with |N∗(Y′)| < αp^k N - must necessarily satisfy x ∈ N∗(Y′). Thus we have

Σ_{x∼k} d(x) < (α/β)^{1/k} pN².
So, if we take β = 1/2^n and α = 1/(4^n n^n), then summing over 1 ≤ k ≤ b and using Σ_{x∈V} d(x) = pN², we get

Σ_{x∈Good} d(x) ≥ pN²/2,

and the proof is complete.
But suppose we only have access to consider pairs of sums for a restricted number of
pairs of A. More precisely, consider a bipartite graph G = GA with both vertex partitions
corresponding to the set A and we only have access to the pairs of sums a + b whenever
ab ∈ E(G). We shall denote by A +G A the set {a + b : a, b ∈ A, ab ∈ E(G)}. How much
information does A +G A capture about |A + A|?
Let |A| = n . If the graph G is sparse, then there is not much hope of gathering
much information about A, so suppose that G is dense, i.e., e(G) ≥ αn2 for some fixed
0 < α < 1. If |A +G A| ≤ cn for some absolute constant c then could we conclude that
|A + A| ≤ c1 n for some c1 ?
A moment's thought tells us that such a conclusion is too good to be true. Indeed, suppose R is a random subset of [1, n²] where each x ∈ [1, n²] is picked independently with probability 1/n, say, and let A = [1, n] ∪ R. Let G be the graph corresponding to the pairs ab with a, b ∈ [1, n]. Then |A +G A| = 2n − 1, and it is not hard to see that P(|R + R| ≥ Ω(n²)) = Ω(1).

However, in this example the set A had a large subset A′ = [1, n] for which |A′ + A′| ≤ 2|A′|. This motivates the following modification: if |A +G A| = O(n), then is there a large structured subset of A? The answer in the affirmative is the substance of the Balog-Szemerédi-Gowers theorem:
Theorem 31. Suppose 0 < α < 1 and c > 0 are reals. Then there exist constants c′, c″ (depending on c, α) and n0 = n0(c, α) such that the following holds. Suppose n ≥ n0 and A is a set of integers of size n, and suppose that for a graph G = GA with e(G) ≥ αn² we have |A +G A| ≤ cn. Then there exists A′ ⊆ A with |A′| ≥ c′|A| and |A′ + A′| ≤ c″n.
are at least Ω(n²) triples (x, x′, x″) ∈ (A +G B)³ such that

x − x′ + x″ = a + b, with x = a + b′, x′ = a′ + b′, x″ = a′ + b.

Hence

c³n³ ≥ Σ_{y∈A′+B′} #{(x, x′, x″) : y = x − x′ + x″} ≥ |A′ + B′| · Ω(n²),   (5.5)

so that |A′ + B′| = O(n). Also,

n|A1| = |A1| · |B| ≥ e(A1, B) ≥ κn²/2,

which gives

|A1| ≥ κn/2.
Now let us speculate a bit.

Conjecture 33. Suppose e(A, B) ≥ κn². Then there exist constants α, δ > 0 depending only on κ such that the following holds: there exists A′ ⊆ A with |A′| ≥ α|A| such that every pair {a, a′} in A′ has at least δ|B| common neighbors in B.

This conjecture is a natural first line of attack. If Conjecture 33 holds, then we get U ⊆ A1 with |U| ≥ α|A1| such that every pair {a, a′} in U has at least δ|B| common neighbors; by the arguments outlined earlier, this gives us |U| ≥ ακn/2.

Now we choose B1 ⊂ B to consist of those vertices with large degree into U; that would provide many choices for the third edge. If |B1| = Ω(n), we are through.
and again exactly as before, since e(B \ B1, U) ≤ µ|U|n and e(U, B) ≥ |U|(κn/2), we have

|U| · |B1| ≥ e(U, B1) ≥ ((κ/2) − µ)|U|n,   (5.8)

so that (taking µ = κ/4)

|B1| ≥ κn/4.   (5.9)
We now claim that (U, B1) will do the job. Indeed, since each b ∈ B1 has at least (κ/4)|U| neighbors in U, b has at least (κ/4)|U| − 1 neighbors in U \ {a}. For each a′ ∈ N(b) \ {a}, there exist at least δn − 1 common neighbors of a, a′ in B \ {b}, so that there are at least

((κ/4)|U| − 1)(δn − 1) ≥ (κδ/16)|U|n ≥ (ακ²δ/32) n²   (5.10)

paths of length 3 from a to b.
How does one salvage this? If this line of argument can still be exploited, the next natural question in place of Conjecture 33 would be

Question 34. Does Conjecture 33 hold if the words 'every pair' are replaced by 'most pairs'? More precisely, suppose κ > 0, and G = G[A, B] is bipartite with edge density κ. Does there exist a subset A′ ⊆ A with |A′| ≥ α|A| such that at least (1 − ε)|A′|² ordered pairs of A′ each have at least δ|B| common neighbours in B, for some suitable α, δ, ε depending only on κ?
First, let us see if this weakening still yields the desired outcome. Let U, B1 be chosen as before, but with U being the set guaranteed by an affirmative answer to Question 34 rather than the erroneous Conjecture 33. To be precise, call a pair (a, a′) bad if they have fewer than δ|B| common neighbors in B. Let U ⊂ A1 be such that |U| ≥ α|A1| and with at most ε|U|² bad pairs. Then again as before, b ∈ B1 implies that d(b, U) ≥ (κ/4)|U|. But this time, instead of using U, we refine it further. Let

A′ := {a ∈ U : a is in at most (κ/8)|U| bad pairs}.   (5.11)
Since the total number of bad pairs in U is at most ε|U|², while the number of bad pairs featuring a vertex in U \ A′ is at least (κ/8)|U|(|U| − |A′|), we get

ε|U|² ≥ (κ/8)|U|(|U| − |A′|),

which gives

|A′| ≥ (1 − 8ε/κ)|U|.   (5.12)
So, for instance, if ε = κ/16 then |A′| ≥ |U|/2. Counting paths from a ∈ A′ to b ∈ B1 as in (5.10), but now discarding the at most (κ/8)|U| bad partners of a, we get at least

((κ/4)|U| − 1 − (κ/8)|U|)(δn − 1) ≥ (κδ/32)|U|n ≥ (ακ²δ/64) n² = Ω(n²)   (5.13)

paths of length 3, as before, with only slightly worse constants. Thus we are just left with settling Question 34 in the affirmative. And the good news is
Theorem 35. Suppose 0 < ε < 1 and G = G[A, B] is bipartite with e(G) = κ|A||B|. There exist constants α, δ > 0 that depend only on κ, ε such that the following holds: there exists A′ ⊆ A satisfying

• |A′| ≥ α|A|,

• all pairs (a, a′) in A′, except for at most ε|A′|² of them, admit at least δ|B| common neighbors in B.
Proof. We go back to the Dependent Random Choice heuristic: small subsets of the common neighborhood of small random sets admit many common neighbors. Pick b ∈ B uniformly at random and let A′ = N(b). Then

E[|A′|] = (1/|B|) Σ_{b∈B} d(b) = κ|A|.   (5.14)

As before, call a pair (a, a′) bad if the number of its common neighbors is at most δ|B|, and let |BAD| denote the number of bad pairs (a, a′) inside A′. Then

E[|BAD|] = Σ_{(a,a′) bad} P(b is chosen from a set of size at most δ|B|)   (5.15)
 ≤ δ|A|².   (5.16)
Our goal is to find a b such that |N(b)| = Ω(|A|) and |BAD| ≤ ε|N(b)|². Note that by Cauchy-Schwarz, E[|A′|²] ≥ (E[|A′|])², so

E(|A′|² − (1/ε)|BAD|) ≥ κ²|A|² − (δ/ε)|A|².   (5.17)

Setting δ = εκ²/2, we have

E[|A′|² − (1/ε)|BAD|] ≥ κ²|A|²/2,

so that there exists b ∈ B such that

|N(b)|² − (1/ε)|BAD| ≥ κ²|A|²/2.
Thus we have |N(b)| ≥ κ|A|/√2 and |BAD| ≤ ε|N(b)|², so A′ = N(b) satisfies both requirements with α = κ/√2, completing the proof.
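The proof translates into a simple procedure; the following is a minimal sketch (sampling a few candidate vertices b instead of maximizing over all of B is an illustrative shortcut, and all names are hypothetical):

```python
import random
from itertools import combinations

def drc_subset(A, B, adj, eps, rng=random.Random(0)):
    """Pick A' = N(b) for a b in B maximizing |N(b)|^2 - (2/eps)*BAD, where BAD
    counts unordered pairs in N(b) with few common neighbors.
    adj maps each a in A to its set of neighbors in B."""
    kappa = sum(len(adj[a]) for a in A) / (len(A) * len(B))   # edge density
    delta = eps * kappa ** 2 / 2                              # as in the proof

    def score(b):
        Nb = [a for a in A if b in adj[a]]
        bad = sum(1 for a1, a2 in combinations(Nb, 2)
                  if len(adj[a1] & adj[a2]) < delta * len(B))
        # the factor 2 converts unordered bad pairs to the ordered count
        return len(Nb) ** 2 - (2 / eps) * bad, Nb

    _, best = max(score(b) for b in rng.sample(sorted(B), min(50, len(B))))
    return best
```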
6 The Second Moment Method
The method of using expectations of random variables is a very useful and powerful tool, and its strength lies in the ease of computing expectations. However, in order to prove stronger results, one needs to show that the random variable in question takes values close to its expected value with sufficiently high probability. The second moment method, which we study here, hinges on one such result, due to Chebyshev. We shall outline the method and illustrate a couple of examples. The last section covers one of the most impressive applications of the second moment method: Pippenger and Spencer's theorem on coverings in uniform, almost-regular hypergraphs.
Theorem 36 (Chebyshev's inequality). Let X be a random variable with finite variance. Then for any λ > 0,

P(|X − E(X)| ≥ λ) ≤ Var(X)/λ².

Proof. Var(X) = E[(X − E(X))²] ≥ λ² P(|X − E(X)| ≥ λ).
The use of Chebyshev's inequality, also called the Second Moment Method, applies in a very wide context, and it provides a very basic kind of 'concentration about the mean' inequality. The applicability of the method is most pronounced when the variance is of the order of the mean, or smaller. We shall see in some forthcoming chapters that concentration about the mean can be achieved with much greater precision in many situations. What, however, still makes Chebyshev's inequality useful is the universality of its applicability.
The (usually) difficult part of using the second moment method is calculating/estimating Cov(X, Y) for random variables X, Y. One particularly pleasing aspect of the method is that this calculation becomes much simpler if, for instance, we have pairwise independence of the random variables, which is much weaker than joint independence of all of them.
The preceding example illustrates one important aspect of the applicability of the second moment method: if Var(Xn) = O(E(Xn)) and E(Xn) → ∞, then Chebyshev's inequality gives

P(|Xn − E(Xn)| > εE(Xn)) = o(1).

In particular, Xn is 'close to' E(Xn) with high probability.
The answer, perhaps surprisingly, is that one typically needs much shorter sequences.
Theorem 37. Let X := (X1, . . . , Xn+2) be a random Zn-sequence, i.e., suppose the Xi are chosen uniformly and independently from Zn. Then with high probability, X contains a zero-sum subsequence of length n.

Proof. For I ∈ H := {I ⊂ [n + 2] : |I| = n}, write X_I := Σ_{i∈I} Xi, let I(X_I) denote the indicator of the event X_I = 0, and set

N := Σ_{I∈H} I(X_I).

Then

E(N) = Σ_{I∈H} P(X_I = 0) = (n+2 choose n) · (1/n) = (n + 2)(n + 1)/(2n),

and

Var(N) = Σ_{I∈H} Var(I(X_I)) + Σ_{I≠J, I,J∈H} Cov(I(X_I), I(X_J)).
The main observation is that since the Xi's are i.i.d., the X_I are pairwise independent, so the covariance terms vanish. Indeed, pick i ∈ I \ J and j ∈ J \ I and condition on the values of the random variables {X_ℓ}_{ℓ≠i,j}; for X_I = X_J = 0 to hold, this determines Xi, Xj uniquely, so the conditional (and hence also the unconditional) probability of X_I = X_J = 0 is 1/n² = P(X_I = 0) · P(X_J = 0).
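Theorem 37 is easy to test empirically, since checking for a zero-sum subsequence of length n among n + 2 terms amounts to finding two indices to drop; here is a small illustrative simulation:

```python
import random
from itertools import combinations

def has_zero_sum(xs, n):
    # dropping indices i < j leaves a zero-sum length-n subsequence
    # exactly when x_i + x_j equals the total sum, mod n
    total = sum(xs) % n
    return any((xs[i] + xs[j]) % n == total
               for i, j in combinations(range(len(xs)), 2))

n, trials = 97, 2000
rng = random.Random(0)
hits = sum(has_zero_sum([rng.randrange(n) for _ in range(n + 2)], n)
           for _ in range(trials))
print(f"fraction with a zero-sum subsequence of length {n}: {hits/trials:.3f}")
```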
of these analogues are typically much smaller. The only interesting instance where this
is not the case is when we allow weighted sums for the subsequences with weights in
{−1, 1}. For Z/nZ the corresponding weighted Davenport constant is log2 n whereas the
random analogue is (1/2 + o(1)) log2 n. For more such results, see [8].
For instance, if x_i = 2^{i−1}, then the set {x1, x2, . . . , xk} has distinct subset sums. Erdős posed the question of estimating the maximum size f(n) of a set {x1, x2, . . . , xk} with distinct subset sums and xk ≤ n, for a given integer n. The preceding example shows that f(n) ≥ ⌊log2 n⌋ + 1.
Erdős conjectured that f (n) ≤ blog2 nc + C for some absolute constant C. He was able
to prove that f (n) ≤ log2 n + log2 log2 n + O(1) by a simple counting argument. Indeed,
there are 2f (n) distinct sums from a maximal set {x1 , x2 , . . . , xk }. On the other hand,
since each xi is at most n, the maximum such sum is at most nf (n). Hence 2f (n) < nf (n).
Taking logarithms and simplifying gives us the aforementioned result.
As before, here is a probabilistic spin. Suppose {x1, x2, . . . , xk} has distinct subset sums. Pick a random subset S of [k] by picking each element of [k] independently with probability 1/2. This random subset gives the random sum X_S := Σ_{i∈S} x_i. Now E(X_S) = (1/2)(x1 + x2 + · · · + xk). Similarly, Var(X_S) = (1/4)(x1² + x2² + · · · + xk²) ≤ n²k/4, so by Chebyshev we have

P(|X_S − E(X_S)| < λ) ≥ 1 − n²k/(4λ²).
Now the key point is this: since the set has distinct subset sums and there are 2^k distinct subsets of {x1, x2, . . . , xk}, for any integer r we have P(X_S = r) ≤ 1/2^k; in fact it is either 0 or exactly 1/2^k. Since the interval (E(X_S) − λ, E(X_S) + λ) contains at most 2λ + 1 integers, this observation coupled with Chebyshev's inequality gives us

1 − n²k/(4λ²) ≤ P(|X_S − E(X_S)| < λ) ≤ (2λ + 1)/2^k.
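For small n, the quantity f(n) can be computed exactly by brute force, which makes for a concrete check of the bounds above (an illustrative sketch; feasible only for modest n):

```python
import math

def f(n):
    """Largest size of a subset of {1,...,n} with pairwise distinct subset sums."""
    best = 0
    def extend(start, sums, size):
        nonlocal best
        best = max(best, size)
        for x in range(start, n + 1):
            shifted = {s + x for s in sums}
            if shifted.isdisjoint(sums):     # adding x keeps all sums distinct
                extend(x + 1, sums | shifted, size + 1)
    extend(1, {0}, 0)
    return best

for n in (4, 8, 12, 16, 20):
    print(n, f(n), math.floor(math.log2(n)) + 1)  # f(n) vs. the powers-of-2 bound
```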
6.4 The space complexity of approximating frequency moments
One of the paradigmatic features of the probabilistic method is that it suggests different
perspectives to many problems, and one of the features of probabilistic thinking is to be
more accepting of approximate solutions, provided we have a control on the errors that
accrue. This section features one such result due to Alon, Matias, and Szegedy.
One of the features of the Theory of Complexity is to study efficient handling of re-
sources in various algorithmic computational problems (see [30] for a fantastic overview
of the subject). Usually, the resource that is optimized is run time of an algorithm. In
this section, we look at an optimization for space constraints.
Given a sequence A = (a1, . . . , am) with each aj ∈ [N], let mi denote the number of occurrences of i in A, and define the frequency moments F_k := Σ_{i=1}^{N} m_i^k. Thus F_1 = m, F_0 counts the number of distinct elements appearing in A, and F_∞ := max_i m_i corresponds to the most popular element of the sequence A. For various reasons, one wishes to compute/estimate these statistics of a given sequence, as they provide useful information about A.
Theorem. (Alon, Matias, Szegedy) There is a randomized algorithm that, given a sequence A = (a1, . . . , am) of members of [N], computes by reading the sequence in one pass a number Y such that the probability that Y deviates from F_k by more than λF_k is at most ε. Most importantly, the algorithm only uses

O((k log(1/ε)/λ²) · N^{1−1/k} (log N + log m))

memory bits.
Proof. First, note that the statement does not require that we know the size of the sequence A in advance. But for starters, let us assume that m is known. Since we seek a randomized algorithm, the key first step is to identify a random variable whose expected value is the parameter of interest, viz., F_k for each k. A first natural guess is the following. Pick p uniformly from [m] and consider R := |{q ≥ p : a_q = a_p}|. Since we seek to estimate F_k, the first natural choice is the random variable X := mR^k. But a quick check reveals why it is not good enough, and also how one can fix it. Indeed, suppose the element i ∈ [N] occurs at m_i positions; the contribution from the element i towards F_k is m_i^k, but in computing E(mR^k), the contribution is instead m_i^k + (m_i − 1)^k + · · · + 1^k. A fix for this is to let X = m(R^k − (R − 1)^k). Then

E(X) = Σ_{i=1}^{N} Σ_{j=1}^{m_i} (j^k − (j − 1)^k) = Σ_{i=1}^{N} m_i^k = F_k,

as desired, since the inner sum telescopes. Also, if we pre-process the random choice prior to the one pass, then the number of storage bits needed is at most the O(log N + log m) bits required to keep track of the element a_p and the number of occurrences of a_p from position p onwards in A.
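A sketch of this basic estimator in code may be useful (illustrative only: it keeps the whole stream in memory and assumes m is known, whereas the actual algorithm stores just a_p, a running count, and a few words):

```python
import random

def ams_estimate(stream, k, rng):
    """One sample of X = m*(R^k - (R-1)^k): pick a uniform position p and
    count the occurrences of a_p from position p onwards."""
    m = len(stream)
    p = rng.randrange(m)
    r = sum(1 for q in range(p, m) if stream[q] == stream[p])
    return m * (r ** k - (r - 1) ** k)

rng = random.Random(0)
stream = [rng.randrange(10) for _ in range(10_000)]
k = 2
exact = sum(stream.count(i) ** k for i in set(stream))
avg = sum(ams_estimate(stream, k, rng) for _ in range(500)) / 500
print(exact, round(avg))   # the estimator is unbiased: E(X) = F_k
```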
To see how good an estimate X is, we follow the maxim in the epigraph of this chapter:
After you have computed the expectation of a random variable, you should try to compute
the variance. Towards that end, we see
E(X²) = (m²/m) Σ_{i=1}^{N} ((m_i^k − (m_i − 1)^k)² + ((m_i − 1)^k − (m_i − 2)^k)² + · · · + (2^k − 1^k)² + 1^{2k})
 ≤ m Σ_{i=1}^{N} (k m_i^{k−1}(m_i^k − (m_i − 1)^k) + · · · + k·2^{k−1}(2^k − 1^k) + k·1^{2k−1})
 ≤ km Σ_{i=1}^{N} m_i^{2k−1}
 = kF_1 F_{2k−1},
where we basically use the fact that

a^k − b^k = (a − b) Σ_{i=0}^{k−1} a^{k−1−i} b^i ≤ (a − b) k a^{k−1}

for reals a > b ≥ 0. To bound this further, let M = max_{1≤i≤N} m_i. Then

F_1 F_{2k−1} ≤ F_1 M^{k−1} F_k ≤ F_1 (Σ_{i=1}^{N} m_i^k)^{(k−1)/k} F_k ≤ N^{1−1/k} F_k^{1/k} F_k^{2−1/k} = N^{1−1/k} F_k²,

where the last inequality uses F_1 = Σ_i m_i ≤ N^{1−1/k}(Σ_i m_i^k)^{1/k},
so that, averaging s independent copies of X, Chebyshev's inequality gives P(|Y − F_k| > λF_k) ≤ kN^{1−1/k}/(sλ²). Thus if s = N^{1−1/(2k)} we already have a saving in the number of memory bits, since we still only require O(s(log N + log m)) bits of memory. But one can do better; let (X1, . . . , Xs) be independently sampled according to X as above, and let Y be their mean - only this time, we take s = CkN^{1−1/k}/λ² for some constant C. This does not quite give the high probability estimate we want, but rather P(|Y − F_k| > λF_k) ≤ 1/C. But repeat this process r times (for some r to be determined), and then report the value Z = Median(Y1, . . . , Yr).
Define the random variable Ỹi to equal 1 if Yi ∈ [F_k − λF_k, F_k + λF_k] and zero otherwise, and let Z̃ := Σ_{i=1}^{r} Ỹi, so that we can bound tail probabilities of Z̃ by the distribution of the Binomial variable Bin(r, 1 − 1/C). If Z lies outside [F_k − λF_k, F_k + λF_k], then Z̃ is less than r/2. If C = 8, say, one can again use the Chebyshev bound (we omit these details) to show that for r = O(1/ε) one has P(|Z − F_k| > λF_k) ≤ ε.
But again, this is not optimal; the Binomial distribution approximates the Gaussian for sufficiently large r, so deviation from the expectation is an exponentially unlikely event. A more precise form of this appears in the next chapter, which establishes exponential decay away from the expected value for the Binomial distribution. Thus (and these details will become clearer in the next chapter) one can take r = O(log(1/ε)), and the proof is complete.
The last point is to deal with the case when m is not known a priori. In that case, start with m = 1, and choose a_p as in the randomized algorithm stated above. If the sequence does not end there, we update m = 2, and replace p = 1 with p = 2 with probability 1/2. More generally, having reached m′, if m > m′ then we replace p with m′ + 1 with probability 1/(m′ + 1). It is not hard to see that this keeps the argument intact (p remains uniform at every stage) and the implementation still only needs O(log m + log N) bits.
Remark: We have throughout assumed implicitly, that m is not much larger than a
polynomial in N , but if m grows, say exponentially with N , then there are older results
that give a similar saving in memory.
The paper [3] includes several other interesting results. For instance, they show that the space complexity results here are almost best possible: for k ≥ 6, randomized algorithms need at least Ω(N^{1−5/k}) memory bits, and the estimation of F_∞ requires Ω(N) bits. Another beautiful result is that the estimation of F_2 can actually be done with only O(log N) memory bits. This uses the fact that there is a simple deterministic construction (using what are known as BCH codes) of a set of O(N²) {−1, 1}-valued vectors of length N which are four-wise independent, i.e., any 4 of the coordinates are uniformly distributed amongst the possible 4-tuples of {−1, 1}. If v := (v1, . . . , vN) is randomly picked from the above set, one defines X := ⟨v, A⟩ = Σ_{i=1}^{N} v_i a_i. For the details and other interesting results, we refer the reader to the paper [3].
and posed the question of determining the correct order of k(ε). This was achieved by
Alon and Peres, who proved the following theorem.
Theorem 41. Given γ > 0, there exists ε0 = ε0(γ) such that for ε < ε0, every set X ⊂ T of cardinality at least ε^{−(2+γ)} has an ε-dense dilation nX; in other words, there is a positive integer n such that nX meets every interval of length ε in the torus T.
We will see a proof of this in the special case when X consists entirely of rationals with the same prime denominator p, so that every element of X is of the form x/p. Under this assumption, we first observe that the problem reduces to considering dilations of subsets of the finite field Fp. In this case, the lower bound is of the right order up to a multiplicative constant.
To state the precise form of the stronger result, suppose p is a prime and ε > 0. Define k(ε, p) to be the minimum integer k such that the following holds: for every subset X ⊂ Fp of size at least k, there exists n such that nX intersects every interval of size at least εp. Here, by an interval of length r we mean a set of the form {a, a + 1, . . . , a + r − 1}. In words, this states that when a set has size at least k(ε, p), some dilate of the set is fairly well spread out, and so touches all the intervals of length εp.
Here is the theorem of Alon and Peres [4] in its exact form.

Theorem 42. For every prime p and 0 < ε < 1 for which εp is an integer,

k(ε, p) ≤ 4/ε².
Proof. We shall omit floors and ceilings in our presentation below, for convenience. Let X := {x1, . . . , xk} ⊂ F*_p. At first glance, the main issue is that there are p different intervals of length εp while any dilate aX contains only O(ε^{−2}) elements, so it seems quite a task for the dilate to meet every interval of size εp. But a simple trick makes this task realistically feasible. Set s = 2/ε and let I1, . . . , Is be disjoint intervals of length εp/2 that partition Fp. Since each interval I of length εp necessarily contains one of the Ii, it suffices to show that there exists a ∈ Fp such that |aX ∩ Ii| > 0 for each of these intervals.
As with our previous rules of thumb, it is quite natural to try a random dilate of X. Let a ∈ Fp be a random element, and for a fixed interval Ii in the partition above, observe that

E(|aX ∩ Ii|) = Σ_{x∈Ii} P(a ∈ {x/x1, . . . , x/xk}) = εk/2.

One would now like to compute the variance of |aX ∩ Ii|, and expanding it brings in the covariances of the events x ∈ aX, y ∈ aX over pairs x, y ∈ Ii; this last double sum poses a bit of a problem. The brilliant idea of Alon and Peres was to modify the random process so as to make this sum computable. And to do that, they consider affine translates of the set as well.
Indeed, pick a, b ∈ Fp uniformly and independently and consider the set aX + b. The key point is that if there are a, b such that aX + b meets every interval of size εp, then since translates of intervals are again intervals, the same holds for aX as well! Furthermore, for x ≠ y the events x ∈ aX + b and y ∈ aX + b are pairwise independent, so we have

E(|(aX + b) ∩ Ii|) = Σ_{x∈Ii} P(x ∈ aX + b) = εk/2,

Var(|(aX + b) ∩ Ii|) = Σ_{x∈Ii} Var(1_{x∈aX+b}) ≤ εk/2.
Hence by Chebyshev,

P((aX + b) ∩ Ii = ∅) < (εk/2)/(εk/2)² = 2/(εk),

which implies that

P((aX + b) ∩ Ii = ∅ for some 1 ≤ i ≤ s) ≤ (2/ε)·(2/(εk)) = 4/(ε²k) < 1

for k as in the theorem.
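A quick illustrative simulation of the random-dilate argument over Fp (all parameters hypothetical; the blocks play the role of the intervals Ii):

```python
import random

def random_dilate_covers(p, X, eps, trials=100, rng=random.Random(0)):
    """Fraction of random affine images aX+b meeting all s = 2/eps blocks."""
    s = int(2 / eps)
    block = p // s
    good = 0
    for _ in range(trials):
        a, b = rng.randrange(1, p), rng.randrange(p)
        hit = {min(((a * x + b) % p) // block, s - 1) for x in X}
        good += len(hit) == s
    return good / trials

p, eps = 10007, 0.1
X = random.Random(1).sample(range(1, p), int(4 / eps ** 2))  # k = 4/eps^2
print(random_dilate_covers(p, X, eps))
```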
Remark: To move from this special case to all intervals in T as in the main theorem stated at the beginning of this section, the idea is to pick A large enough, pick a ∈ {1, . . . , A} at random and b ∈ T uniformly, and consider the affine translate aX + b as before. As before, we fix a partition of T into intervals of length ε/2 and fix such an interval I. Again, E(|(aX + b) ∩ I|) = εk/2, but now computing the variance is a little more complicated. If we write X = {x1, . . . , xk}, then the variance of the aforementioned random variable can be estimated in terms of the differences xi − xj. The technical ideas deal with how one can control the covariance terms, and this involves a few more subtleties than the simple result above.
The paper [4] contains several other general results on when one can find dilations that are ε-dense. For instance, they also show that one can pick a prime dilate n for which nX is ε-dense, and further, if k ≥ ε^{−(4+γ)} then there is an ε-dense dilation of the form n²X as well. We direct the interested reader to the paper [4] for other results.
6.6 Resolution of the Erdős-Hanani Conjecture: The Rödl ‘Nibble’

The Rödl 'Nibble' refers to a probabilistic paradigm (pioneered by Vojtech Rödl) in which a desirable combinatorial object is constructed via a random process, through a series of several small steps, with a certain amount of control over each step. Subsequently, researchers realized that Rödl's method extends as a paradigm to a host of other constructions, particularly for coloring problems in graphs, and matching/covering problems in hypergraphs. Indeed, the proof of the Erdős-Hanani conjecture - the result that launched the Rödl Nibble - is an instance of a covering problem for a specific hypergraph. In this section, we shall see a resolution of the Erdős-Hanani conjecture following a later simplification by Pippenger and Spencer [22].
We start with a definition. As always, [n] denotes the set {1, . . . , n}. Suppose r, t ∈ N. An r-uniform covering for ([n] choose t) is a collection 𝒜 of r-subsets of [n] such that for each t-subset T ∈ ([n] choose t), there exists an A ∈ 𝒜 such that T ⊂ A. An r-uniform packing for ([n] choose t) is a collection 𝒜 of r-subsets of [n] such that for each t-subset T ∈ ([n] choose t), there exists at most one A ∈ 𝒜 such that T ⊂ A.
Let M(n, r, t) be the size of a minimum covering, and m(n, r, t) the size of a maximum packing. A simple combinatorial counting argument shows that

m(n, r, t) ≤ (n choose t)/(r choose t) ≤ M(n, r, t).

Indeed, if one were to consider a covering, then each t-subset is contained in at least one chosen r-subset, while each r-subset contains (r choose t) t-subsets; double counting gives the inequality for M(n, r, t), and the argument is similar for maximum packings. It then seems natural to ask if there exists a collection 𝒜 of r-subsets of [n] of size |𝒜| = (n choose t)/(r choose t) such that 𝒜 is both an r-uniform covering and packing for ([n] choose t). This is called a t-(n, r, 1) design and is also referred to as a Steiner t-design.
Erdős and Hanani proved that

lim_{n→∞} M(n, r, 2)/((n choose 2)/(r choose 2)) = lim_{n→∞} m(n, r, 2)/((n choose 2)/(r choose 2)) = 1,
and further conjectured that this is true for all positive integers r ≥ t. In a sense, the
conjecture posits that as n grows large, one gets more room to attempt to fit these r-
subsets to cover all t-subsets, so as n gets larger, one ought to be able to get closer to
as tight a packing (or covering) as one can. This conjecture was settled affirmatively by
Vojtech Rödl in 1985.
Recall that a hypergraph H = (V, E) is said to be r-uniform if each e ∈ E has size r. The degree of a vertex in a hypergraph is the same notion we have encountered in the graph case, i.e., d(x) = |{E ∈ E : E ∋ x}|. Given an r-uniform hypergraph H on n vertices which is D-regular for some D, i.e., d(x) = D for all x ∈ V, we seek a covering (resp. a packing) of H which is as tight as possible, i.e., a covering (resp. packing) of size approximately n/r. This more general question subsumes the Erdős-Hanani question: consider the hypergraph H = (V, E) where V = ([n] choose t) and the edges of H correspond to r-subsets of [n], with each such r-subset E containing all the vertices x that correspond to t-subsets of E. It is easy to see that this is an (r choose t)-uniform regular hypergraph with degree D = (n−t choose r−t).
Let ε > 0. Note that in this new formulation, if we can find a packing of size (1 − ε)n/r, then there are at most εn uncovered vertices; covering each of those by an arbitrary edge, we obtain a covering of size at most (1 − ε)n/r + εn = (1 + (r − 1)ε)n/r. On the other hand, if we can find a covering 𝒜 of size (1 + ε)n/r, then for every x which is covered by d(x) hyperedges of 𝒜, we delete d(x) − 1 of them. The number of deleted edges is at most

Σ_{x∈V} (d(x) − 1) = Σ_{x∈V} d(x) − n = |{(x, E) : E ∈ 𝒜, x ∈ E}| − n ≤ ((1 + ε)n/r)·r − n = εn,

so there is a packing of size at least (1 + ε)n/r − εn = (1 − (r − 1)ε)n/r. The upshot is: finding a covering of size approximately n/r is equivalent to finding a packing of size approximately n/r.
Let us try a simple probabilistic idea first, to see what its shortcomings are. Suppose we decide to pick each edge of the hypergraph H independently with probability p. We seek a collection E∗ with |E∗| ≈ n/r; if we start with an almost regular hypergraph of degree D, then r|E| ≈ nD, which implies that we need p ≈ 1/D. Let us see how many vertices get left behind by this probabilistic scheme. A vertex x is uncovered only if every edge containing it is discarded; in other words, the probability that a vertex x gets left behind is approximately (1 − 1/D)^D ≈ 1/e. This is now a problem, because it implies that the expected number of vertices that go uncovered is approximately n/e, a constant proportion of the total number of vertices.
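This failure is easy to witness on the Erdős-Hanani hypergraph itself; the following illustrative one-shot experiment (small parameters, chosen arbitrarily) shows the uncovered fraction hovering near 1/e:

```python
import random
from itertools import combinations
from math import comb, exp

n, r, t = 14, 4, 2
D = comb(n - t, r - t)                  # degree of each t-subset
rng = random.Random(0)
chosen = [R for R in combinations(range(n), r) if rng.random() < 1 / D]
covered = {T for R in chosen for T in combinations(R, t)}
frac_uncovered = 1 - len(covered) / comb(n, t)
print(f"uncovered fraction ~ {frac_uncovered:.3f}; 1/e ~ {exp(-1):.3f}")
```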
Rödl’s idea was to, as we described in the beginning of this section, attempt an inductive
procedure: We pick a small number of edges, so that the rest of H is as ‘close as possible’ to
the original one. If the inductive procedure were to take over for the modified hypergraph,
then after several “nibbles” into the hypergraph we are left with a very small proportion
of the vertex set that is yet uncovered. But for these, we pick an arbitrary edge for each
of these vertices to get a covering for the entire vertex set.
However, note that after each step, some of the regularity conditions of the hyper-
graph are bound to be violated, so for the inductive procedure to apply to the smaller
hypergraph the hypotheses would have to be milder. We will get to this point momen-
tarily.
This sets our paradigm into motion. In the first 'step', each edge E ∈ E is picked independently with probability p = ε/D. If E∗ is the set of chosen edges, then we have

E[|E∗|] = εn/r.

Also, the probability that a vertex x is not covered in this process is (1 − ε/D)^{d(x)} ≈ e^{−ε}.
In the rest of this section, and also in subsequent chapters, we shall adopt Pip-
penger and Spencer’s wonderfully terse notation. We shall write x = a ± b to mean
x ∈ (a − b, a + b). Also, since there will be many constants that keep popping up, we
shall throw in new variables to denote various small quantities which can all be tied down
eventually, if need be.
Getting back to our first step: after a 'nibble', the rest of the hypergraph is no longer regular, so as mentioned earlier, we need to make the hypotheses milder, so we propose: given an r-uniform hypergraph H on n vertices such that d(x) = D(1 ± δ) for all x ∈ V, for some small δ > 0, we want to find a covering of size ≈ n/r. For this to work, our first step necessarily has to reduce, in a controlled way, the degrees of all the vertices that are not covered during round one, and requiring this of every vertex is still a little too strong, so the hypothesis needs to be made milder still (the precise form appears in Lemma 43 below). Under the milder hypothesis, we wish to find a collection of edges E∗ such that
1. |E∗| = (εn/r)(1 ± δ′);

2. |V∗| = ne^{−ε}(1 ± δ″), where V∗ := V \ ⋃_{E∈E∗} E;

3. for all x ∈ V∗ except at most δ‴|V∗| of the vertices, if d∗(x) denotes the degree of x in the residual hypergraph, then d∗(x) = D(1 ± µ).
To explain this requirement, let 1_x = 1_{{x ∉ any edge of E∗}}. If each edge is picked independently with probability ε/D, then

E(|V∗|) = Σ_{x∈V} (1 − ε/D)^{d(x)} ≈ n(1 − δ)(1 − ε/D)^{D(1±δ)} ≈ n(1 − δ)e^{−ε(1±δ)} ≈ ne^{−ε}(1 ± δ″).
Furthermore,

Var(|V∗|) = Var(Σ_{x∈V} 1_x) = Σ_{x∈V} Var(1_x) + Σ_{x≠y} Cov(1_x, 1_y),

and for x ≠ y,

Cov(1_x, 1_y) = (1 − ε/D)^{d(x)+d(y)−d(x,y)} − (1 − ε/D)^{d(x)}(1 − ε/D)^{d(y)},

where d(x, y) denotes the codegree of x and y, i.e., d(x, y) = |{E : x, y ∈ E}|. Note that the covariance is roughly a constant times (1 − ε/D)^{−d(x,y)} − 1 ≈ e^{εd(x,y)/D} − 1, which is very small provided that d(x, y) ≪ D. This is true in the original Erdős-Hanani problem, where V = ([n] choose t), since D = (n−t choose r−t) = O(n^{r−t}), while d(x, y) = (n−|T1∪T2| choose r−|T1∪T2|) ≤ (n−t−1 choose r−t−1) = O(n^{r−t−1}) ≪ D, where x and y correspond to t-subsets T1 and T2 respectively.
Before we make our speculation outright, there is one more aspect that suggests that
the hypothesis should be made milder. Suppose that for some vertex x the degree of
x is super large, i.e., D = o(d(x)). Since we wish to retain the r-uniformity of the
hypergraphs, our process would entail throwing away all edges that intersect some vertex
of V \ V ∗ to get to the modified hypergraph. But if d(x) is very large, then it is somewhat
likely that some edge containing x is chosen, and since x would get picked, all edges that
contain x will have to be discarded to get to the residual hypergraph, and we may lose
too many edges in this process. So, to prevent this, we may want d(x) = O(D) for all x.
This motivates the following tentative statement:
Lemma 43. ('Nibble' lemma) Suppose r ≥ 2 is a positive integer, and k, ε, δ∗ > 0 are given. Then there exist δ0(r, k, ε, δ∗) > 0 and D0(r, k, ε, δ∗) such that for all n ≥ D ≥ D0 and 0 < δ ≤ δ0, if H is an r-uniform hypergraph on n vertices satisfying

(i) except for at most δn vertices, d(x) = D(1 ± δ) for all other vertices x ∈ V,

(ii) d(x) < kD for all x ∈ V,

(iii) d(x, y) < δD for all distinct x, y ∈ V,

then there exists a collection of edges E∗ ⊆ E such that

(a) |E∗| = (εn/r)(1 ± δ∗);

(b) |V∗| = ne^{−ε}(1 ± δ∗), where V∗ = V \ ⋃_{E∈E∗} E;

(c) for all but at most δ∗|V∗| vertices x ∈ V∗, d∗(x) = De^{−ε(r−1)}(1 ± δ∗), where d∗ denotes the degree in the hypergraph induced on V∗.

We say that H is an (n, k, D, δ)-hypergraph when (i), (ii) and (iii) hold for H.
This lemma says that if H is an (n, k, D, δ)-hypergraph, then it contains an induced (n∗, k∗, D∗, δ∗)-hypergraph H∗, where

δ∗ = δe^{ε(r−1)}, n∗ = ne^{−ε}(1 ± δ∗), k∗ = ke^{ε(r−1)}, D∗ = De^{−ε(r−1)}(1 ± δ∗).

To see why these are the relevant new parameters, consider for instance the parameter δ: since d∗(x, y) ≤ d(x, y) < δD, writing δD = δ∗D∗ forces δ∗ = δD/D∗ = δe^{ε(r−1)} (up to the (1 ± δ∗) factor). Similarly for the parameter k: d∗(x) ≤ d(x) < kD = k∗D∗ forces k∗ = kD/(De^{−ε(r−1)}) = ke^{ε(r−1)}.
Let us see if this lemma is good enough to take us through. If we repeat the nibble t times (where t shall shortly be determined), then we have δ = δ0 < δ1 < · · · < δt with δi = δ_{i−1} e^{ε(r−1)}, and H = H0 ⊃ H1 ⊃ · · · ⊃ Ht. Note that this establishes a cover of size Σ_{i=1}^{t−1} |Ei| + |Vt|, where

|Vi| = |V_{i−1}|e^{−ε}(1 ± δi) ≤ ne^{−εi} Π_{j=1}^{i} (1 + δj)

and

|Ei| = (ε|V_{i−1}|/r)(1 ± δi) ≤ (εne^{−ε(i−1)}/r) Π_{j=1}^{i} (1 + δj),
so the size of the cover is

Σ_{i=1}^{t−1} |Ei| + |Vt| ≤ Σ_{i=1}^{t−1} (εne^{−ε(i−1)}/r) Π_{i=1}^{t} (1 + δi) + ne^{−εt} Π_{i=1}^{t} (1 + δi)
 = Π_{i=1}^{t} (1 + δi) · (n/r) (Σ_{i=1}^{t} εe^{−ε(i−1)} + re^{−εt})
 ≤ Π_{i=1}^{t} (1 + δi) · (n/r) (ε/(1 − e^{−ε}) + re^{−εt}).
Pick t such that e^{−εt} < ε; for instance, take t = 2ε^{−1} log(1/ε). For this t, pick δ small enough such that Π_{i=1}^{t} (1 + δi) ≤ 1 + ε. Since lim_{ε→0} ε/(1 − e^{−ε}) = 1, the limit of this expression goes to n/r as ε → 0. Therefore, all that remains is to prove the 'Nibble' Lemma 43.
Proof. (Proof of Lemma 43) We will use subscripted constants δ(i) to denote various small quantities. Keeping with the probabilistic paradigm, we pick each edge of H independently with probability ε/D. Let E∗ be the set of picked edges.
We say x ∈ V is good if d(x) = (1 ± δ)D; else we say that x is bad. Note that

(1 − δ)n · (1 − δ)D ≤ r|E| = Σ_{x∈V} d(x) ≤ n(1 + δ)D + δn · kD,

so

(1 − δ)²Dn/r ≤ |E| ≤ (1 + (k + 1)δ)Dn/r, which gives |E| = (Dn/r)(1 ± δ(1)).

Hence

E[|E∗|] = Σ_{E∈E} P(E is picked) = (ε/D)·(Dn/r)(1 ± δ(1)) = (εn/r)(1 ± δ(1)).
Let 1_E = 1_{{E is picked}}. By independence, Var(|E∗|) = Σ_{E∈E} Var(1_E) ≤ E[|E∗|]. By Chebyshev's inequality, we get

P(||E∗| − E[|E∗|]| > δ(2) E[|E∗|]) ≤ Var(|E∗|)/(δ(2)² E[|E∗|]²),

so if n ≫ 0, then

|E∗| = (εn/r)(1 ± δ(1))(1 ± δ(2)) = (εn/r)(1 ± δ∗)

with high probability, yielding (a).
Next,

E[|V∗|] = Σ_{x∈V} (1 − ε/D)^{d(x)} ≥ Σ_{x good} (1 − ε/D)^{D(1+δ)} ≥ e^{−ε}(1 − δ(3))·(1 − δ)n.

So

ne^{−ε}(1 − δ(3))(1 − δ) ≤ E[|V∗|] ≤ ne^{−ε}(1 + δ(4) + δe^{ε}),

implying

E[|V∗|] = ne^{−ε}(1 ± δ(5)).

A variance computation exactly as sketched before the statement of the lemma (using the codegree hypothesis (iii)), followed by Chebyshev's inequality, then yields (b).
To prove (c), suppose x survives after the removal of E∗. Fix an E ∈ E such that E ∋ x. We wish to estimate the probability that E also survives, conditioned on the assumption that x survives. Let F_E = {F ∈ E : x ∉ F, F ∩ E ≠ ∅}. Then E survives if and only if F_E ∩ E∗ = ∅.

Call E ∈ E bad if E contains at least one bad vertex. Suppose x is good, and E is good. Then

P(E survives | x survives) = (1 − ε/D)^{(r−1)(1±δ)D − (r−1 choose 2)δD} (1 ± δ(7)) = (1 − ε/D)^{(r−1)D} (1 ± δ(8)).
Let Bad(x) := {E : E is bad and does not contain x}. If |Bad(x)| < δ(9) D, then

E[d∗(x)] = De^{−ε(r−1)}(1 ± δ(10)).

Now, the question is: how many x have |Bad(x)| ≥ δ(9) D? Call x incorrigible if x is good but |Bad(x)| ≥ δ(9) D. We now want to bound the size of V_INCOR := {x ∈ V : x is incorrigible}. Note that

|{(x, E) : E ∈ Bad(x)}| ≥ δ(9) D · |V_INCOR|.

On the other hand,

|{(x, E) : E ∈ Bad(x)}| ≤ |{(x, E) : E is bad}| ≤ r|{(x, E) : x is bad}| ≤ r(kD)(δn).

Hence, |V_INCOR| ≤ r(δn)k/δ(9) =: δ(12) n. Therefore, except for at most δ(12) n vertices, the remaining vertices x satisfy E[d∗(x)] = De^{−ε(r−1)}(1 ± δ(10)).
Let 1_E = 1_{{E survives}}. For those x that are neither incorrigible nor bad,

Var(d∗(x)) = Σ_{E∋x} Var(1_E) + Σ_{E≠F} Cov(1_E, 1_F)
 ≤ E[d∗(x)] + Σ_{E≠F good} Cov(1_E, 1_F) + δ(9) D·(1 + δ)D·1
 ≤ E[d∗(x)] + Σ_{E≠F good, E∩F={x}} Cov(1_E, 1_F) + Σ_{E≠F good, |E∩F|>1} Cov(1_E, 1_F) + δ(9)(1 + δ)D²
 ≤ E[d∗(x)] + Σ_{E≠F good, E∩F={x}} Cov(1_E, 1_F) + (r − 1)δD·(1 + δ)D·1 + δ(9)(1 + δ)D².
Now, denote by F_E the collection of those edges that intersect E non-trivially. Then the remaining covariance terms can be bounded just as in the heuristic computation for |V∗|, using the codegree hypothesis, and together these estimates show that each such x fails d∗(x) = e^{−ε(r−1)}D(1 ± δ∗) with small probability. Let N = |{x good : d∗(x) ≠ e^{−ε(r−1)}D(1 ± δ∗)}|. Now Markov's inequality gives E[N] < δ(11) n, so all except at most δ∗n vertices satisfy (c). This completes the proof of the Nibble lemma, and hence the proof of the Erdős-Hanani conjecture as well.
Remark: The theory of Steiner designs is one of the oldest topics in design theory. The existence (and explicit construction) of Steiner 2-designs for all feasible parameters (parameters (n, r) for which the corresponding numbers are integers) and all sufficiently large set sizes is the pioneering work of R. Wilson [31, 32, 33], beginning in the early 70s. The problem of the existence of Steiner t-designs for t ≥ 6 was completely open until P. Keevash [17] settled it in 2014 by a tour-de-force algebraic-probabilistic argument. Keevash's proof is a little too involved for us to include in this book, but we will see some relevant ideas in later chapters.
7 Basic Concentration Inequalities - The Chernoff Bound and Hoeffding's inequality
It is often the case that the random variable of interest is a sum of independent ran-
dom variables. In many of those cases, the theorem of Chebyshev is much weaker than
what can be proven. Under reasonably mild conditions, one can prove that the random
variable is tightly concentrated about its mean, i.e., the probability that the random vari-
able is ‘far’ from the mean decays exponentially, and this exponential decay is crucial in
several probabilistic applications.
The distribution of the sum of i.i.d random variables, suitably normalized, behaves
like the Standard Gaussian; that is the import of the Central Limit Theorem (CLT for
short) in Probability, so in that sense, the Chernoff bound has its antecedents from much
earlier - indeed this goes back to Laplace. But the CLT is a limiting theorem, whereas
the Chernoff bounds are not. This qualitative difference is also very useful from an algo-
rithmic point of view.
In this chapter, we consider a few prototypes1 of such results along with some combi-
natorial applications.
79
Proof. Consider eλXi , with λ to be optimized. Then E[eλXi ] = (eλ + e−λ )/2 = cosh(λ).
Taking the Taylor expansion, we see that
∞ ∞
X λ2k X (λ2 /2)k 2
E eλXi = = eλ /2
<
k=0
(2k)! k=0 k!
Since the Xi are independent,
P Y 2
E eλSn = E e λXi = E[eλXi ] = cosh(λ)n < eλ n/2
i
By Markov’s Inequality,
E[eλSn ] 2
P eλSn > eλa ≤ < eλ n/2−λa
eλa
2
Since P[Sn > a] = P[eλSn > eλa ], we see that P[Sn > a] < eλ n/2−λa . Optimizing this
2
bound by setting λ = a/n, we see that P[Sn > a] < eλ n/2 , as desired.
Proposition 44 can be generalized and specialized in various ways. We state two such
modifications here.
Proposition 45 (Chernoff Bound (Generalized Version)). Let p1 , . . . , pn ∈ [0, 1], and let
Xi be independent random variables such that PnP[Xi = 1 − pi ] = pi and P[Xi = −pi ] =
1 − pi , so that E[Xi ] = 0 for all i. Let Sn = i=1 Xi . Then
2 /n 2 /n
P[Sn > a] < e−2a and P[Sn < −a] < 2e−2a
Letting p = n1 (p1 + . . . + pn ), this can be improved to
2 /pn+a3 /2(pn)2
P[Sn > a] < e−a
Proposition 46 (Chernoff Bound (Binomial Version), see [16]). Let X ∼ Binomial(n, p),
and let 0 ≤ t ≤ np. Then
−t2
P[|X − np| ≥ t] ≤
2(np + t/3)
2 /3np
and the last expression is at most 2e−t if t ≤ np.
In all three cases, the independence assumption can be removed while preserving the
exponential decay (although with a worse constant).
Before we move on to some applications, we make a quick remark. While the afore-
mentioned version of the Chernoff bound holds always, its efficacy, especially when we
wish to establish that some event occurs with high probability only works when np → ∞.
If p = O(1/n) so that np = O(1) then this bound does not work as well. And this is again
an observation that goes back to Poisson; the Binomial distribution, suitably normalized,
can be well approximated by the standard Gaussian when the expected value goes to
infinity with n, and if the expected value is bounded by a constant, then for large n, the
behavior is more like the Poisson. We will return to this point in a later chapter.
80
7.2 First applications of the Chernoff bound
We start with a return to a result from a previous chapter. Recall the randomized
algorithm to determine the frequency moments using a sub-linear number of bits. We had
a sequence of random variables (Y1 , . . . , Yr ) with E(Yi ) = Fk and P(|Y −Fk | > λFk ) ≤ 1/8.
Our final report was the random variable Z = Median(Y1 , . . . , Yr ) and our interest was
in obtaining a bound for r such that P(|Z − Fk | > λFk ) ≤ ε. Towards that end, we
had defined the random variable Ỹi to equal 1 if Yi ∈ [Fk − λFkP , Fk + λFk ] and zero
r
otherwise, so that Ỹi is distributed as Ber(7/8). Setting Z̃ := i=1 Ỹi allows us to
estimate this probability by the bounds of a Binomial random variable Bin(r, 7/8). If
Z ∈/ [Fk − λFk , Fk + λFk ], then Z̃ is√ less than r/2. The second moment method gives
P(|Z − Fk | > λFk ) ≤ ε for r = O(1/ ε). Instead, if we use the Chernoff bound, then we
have
−9r
P(|Z − Fk | > λFk ) ≤ P(Z̃ < r/2) ≤ exp
126
1
so that we can, as claimed earlier, take r = O log( ε ) to get the same probability bound
as stated.
But here, we shall prove a much more modest statement, which is also the starting
point for many of the improved results in this direction.
Proposition
√ 47. Suppose H is a hypergraph on n vertices and m edges. Then disc(H) =
O( n log m).
Proof. To show an upper bound on the discrepancy, we need to exhibit a coloring c for
which the discrepancy is small. Pick a random coloring, i.e., for each v assign it the color
2
This was Erdős’ notion of the best/cleanest possible proof of any result. For more on this, see [1].
81
P
1 or −1 independently. For an edge E with |E| = k, let XE := v ∈ Ec(v). Then
2
E(XE ) = 0, so √ using the Chernoff bound to decree P(|XE | > t) ≤ 2 exp−t /2k < 1/m
suggests t = O( n log m) since k ≤ n. By choice, this implies that the expected number
of edges with discrepancy greater than t is less than one, so again by the method of
expectations(!), there is a coloring c such that all the edges discrepancies are at most t.
This completes the proof.
Remark: It is not hard√ to show that there are hypergraphs with n vertices and n edges
with a discrepancy of Ω( n), so the result of Spencer is asymptotically tight. But as we
have already seen with the Rödl nibble method, some probabilistic constructions cannot
achieve the desired goal in a single step process, and the proofs for the sharp discrepancy
follow a random process. We will address random processes in a later chapter.
Unlike the graph case where there is a very simple algorithmic characterization of
2-colorability, the problem of deciding when a hypergraph has property B is far from well
understood. Indeed, one of Erdős/ oldest and returning motifs was to determine m(n)
the minimum number of edges in an n-uniform hypergraph that does not have property B.
One of the earliest observations regarding property B was the following due to Lovász,
which effectively comes from an algorithmic procedure to attempt to 2-color the vertices:
If H is such that |E1 ∩ E2 | =
6 1 for all pairs of distinct edges E1 , E2 , then H is 2-colorable,
and therefore has property B. Indeed, number the vertices 1, . . . , n. Color each vertex,
in order, avoiding monochromatic edges. It is easily seen that by the assumptions on
H, this must yield a valid coloring. So for now, let us work with a situation where the
hypergraph violates this condition in the extreme, i.e., suppose that H has the property
that every pair of edges meet at exactly 1 vertex. Examples of such hypergraphs arise
from the projective planes which we have encountered in Chapter 1. The Fano Plane,
shown here with each edge represented as a line, shows that such hypergraphs are not
necessarily 2-colorable. Following Erdős, we now define a stronger version of property B,
which we will refer to as Property B(s).
82
If we set s = n − 1, then for n-uniform hypergraphs, property B(s) is the same as the
usual property B.
Fix a line L, and let SL = |S ∩ L|. Note that E[SL ] = (n + 1)p = f (n). By the
Chernoff Bound, P[|SL − f (n)| > f (n)/2] < 2e−f (n)/12 . Since ¶n contains n2 + n + 1 lines,
P[There exists L such that |SL − f (n)| > f (n)/2] < 4e−Cf (n) n2
for some absolute constant C. Therefore, if eCf (n) > Ω(n2 ), a set S with the desired
property exists. This in turn tells us that setting f (n) = O(log n) gives us the stated
result for sufficiently large n.
Remark: Erdős conjectured that for the projective planes, a much stronger state-
ment holds: There exists an absolute constant s such that for all sufficiently large n, the
projective plane of order n has property B(s).
The problem of determining m(n) which was alluded to earlier remains one of the
most elusive problems in extremal combinatorics. We will, later in this book, see a proof
of the statement r
n n
Ω 2 ≤ m(n) ≤ O(n2 2n )
log n
which still marks the best known result to date.
83
7.5 Graph Coloring and Hadwiger’s Conjecture
IN this section we see a counterexample to a conjecture of Hájos in an attempt to solve
the famous Hadwiger conjecture. To get there, we first need a couple of definitions. For
an edge e = uv in a graph G, the contraction of e is a graph denoted G/e obtained by
deleting the vertices u, v and replacing it with a new vertex ve which is adjacent to all
the neighbors of u and v counting with multiplicity. In other words, if a vertex w was
adjacent to both u, v in G, then ve has two edges to w in G/e.
One can think of H as a subgraph of G in which disjoint paths are allowed to act as
edges. Note that if H is a subdivision of G, then H is also a minor of G; however, the
converse is false in general.
Due to the apparent difficulty of Hadwiger’s Conjecture, Hajós strengthened the con-
jecture to state that if χ(G) ≥ p, then G contains Kp as a subdivision. But as it is usually
the case, this strengthened conjecture turned out to be false as was shown by Catlin via
an explicit counterexample. However, the motivating question really is: How good a
conjecture is the strengthened version? If the counterexamples were freak instances, then
3
The four color theorem states that ever planar graph, i.e., graph that can be embedded on the plane,
is 4-colorable.
84
maybe one at least had an asymptotically strong statement since subdivisions are eas-
ier to understand from a verification perspective unlike minors. But later, Erdős and
Fajtlowicz put this possibility to rest showing that the conjecture is almost never true.
n
Theorem 53 (Erdős, Fajtlowicz). There exist graphs G such that χ(G) ≥ 3 log n
and G
has no K3√n subdivision.
Proof. Let G = (V, E) be a random graph on n vertices, with each edge placed in the
graph with probability 1/2. We first show that with high probability, G has large chro-
matic number, and then also that G has no large Kp subdivision.
Since χ(G) ≥ n/α(G), let us examine an upper bound for α(G). We have
P[α(G) ≥ x] = P[there exists a set of x vertices which form an independent set]
x
n −(x2) n
≤ 2 ≤ x−1
x 2 2
Set x = 2 log n + 3 so that 2(x−1)/2 = 2n; then
2 lg n+3
1 1
P[α(G) ≥ x] ≤ = 2
2 8n
so with high probability, α(G) ≤ 2 log n + 3 < 3 log n.
must contain as many disjoint paths. Now, each vertex of G must either be a vertex of
the Kt subdivision, or else be contained in at most one of the paths. Since there are n
t
vertices in G, √
we end up forcing many of these paths to be single edges if 2 = Ω(n).
Setting t = 3 n the argument outlined gives us that at least 3n of the paths in the
subdivision of Kt must be single edges of G.
√
Fix a set U ⊂ V , |U | = 3 n. If U forms the vertices of a K3√n subdivision, then
e(U ) ≥ 3n. By the Chernoff Bound we have
1
P[|e(U ) − E[e(U )]| ≥ E[e(U )]] ≤ 2e−E[e(U )]/48
4
so that √
P[e(U ) ≥ 3n] ≤ 2e−(9n−3 n)/192
< e−n/25
which implies that
P[U forms the vertices of a K3√n subdivision] < e−n/25
Hence by the union bound
√ 3√n
n −n/25 e n
P[G has a K3√n subdivision] < √ e < e−n/25 = o(1)
3 n 3
85
as n → ∞. So with high probability, G does not contain a K3√n subdivision.
n
Thus, it follows that with high probability, χ(G) ≥ 3 log n
and G has no K3√n subdi-
vision, as desired.
Remark: This result shows that the chromatic number of a graph is a more esoteric
global feature of the graph. In fact, the determination of the chromatic number of the
random graph Gn,p is an interesting problem which still has many unresolved facets, and
we will examine some related results in the forthcoming chapters.
The fact that the chromatic number of a graph is a somewhat enigmatic invariant is
further evidenced by the following theorem due to Erdős: Given ε > 0, and an integer
k, there exist graphs G = Gn (for n sufficiently large) such that χ(G) > k, while every
induced subgraph H on εn vertices satisfies χ(H) ≤ 3. This was again based on a random
graph construction, and the interested reader can see this result in [5].
To make this more precise, we need the notion of an ε regular partition. For a pair
of sets (not necessarily disjoint) U, W of vertices of a graph G, we denote by e(U, W )
the number of pairs (u, w) ∈ U × W such that uw ∈ E(G)and by teh density of the
pair (U, W ) we means d(U, W ) := e(U,W )
|U ||W |
. A pair (u, W ) is called ε-regular if whenever
A ⊂ U, B ⊂ W with |A| ≥ ε|U | and |B| ≥ ε|W | then the densities of the pairs (A, B)
and (U, W ) differ by at most ε, i.e., |d(A, B) − d(U, W )| ≤ ε. A partition of the vertex
set V = ∪ki=1 Vi is called an ε-regular partition if the number of irregular pairs (Vi , Vj )
from among these sets is at most εk 2 . The regularity lemma then states the following:
Given ε > 0, there exists M = Oε (1) such that every graph admits an ε-regular partition
into at most M parts. We will see this in greater detail in a subsequent chapter (Chapter ).
The regularity lemma has been found to be of deep consequence in extremal graph
theory, and since the proof procedure has an algorithmic flow to it, the regularity lemma
also finds applications in Theoretical Computer Science (Property Testing). However,
one drawback in the algorithmic application of the Regularity lemma is that the number
M that is obtained through the proof is a tower of 2’s with the height of the tower is
Ω(1/ε5 ). This makes the result immensely interesting and useful theoretically, but com-
pletely useless from a practical point of view. The natural question that arises is: Do we
really need such a large M ? Gowers [13]) settled this question in the affirmative. More
precisely, there exist graphs G for which every ε-regular partition has partition size a
4
If the graph is not dense, then the statement in the usual version is completely tautological.
86
tower of 2’s with height 1/εc where 0 < c < 1 is an absolute constant.
While we shall not prove Gowers’ result here, we shall give an easier and a weaker
version of his result which also appears in the same paper.
Theorem 54. Given 1/1024 > ε > 0, there exist graphs such that every ε-regular parti-
tion has size a tower of 2’s with the height of the tower being of the order log2 (1/ε).
87
8 Property B: Lower and Upper bounds
8.1 Introduction
For an integer n ≥ 2, an n−uniform hypergraph H is an ordered pair H = (V, E), where
V is a finite non-empty set of vertices and E is a family of distinct n−subsets of V. A
2-coloring of H is a partition of its vertex set hv into two color classes, R and B (for red,
blue), so that no edge in E is monochromatic. A hypergraph is 2-colorable if it admits a
2-coloring. For an n−uniform hypergraph, we define
2-colorability of finite hypergraphs is also known as “Property B”. In [?], Erdős showed
that 2n−1 < m(n) < O(n2 2n ).
Let us start with a brief look at these results. The first of these, namely, that any
n-uniform hypergraph with at most 2n−1 edges is 2-colorable is an immediate consequence
of considering a random coloring and computing the expected number of monochromatic
edges. The upper bound for m(n) too follows from a simple randomized construction,
and here is the gist.
q
1 n
n n
In [10], Beck proved that m(n) = Ω(n 3 2 ) and this was improved to m(n) = Ω 2 log n
89
We will begin with some notation, if an edge S ∈ H is monochromatic, we will denote
it as S ∈ M,, and in addition, if it is red (blue), we write S ∈ RED (S ∈ BLU E). Also
for a vertex v ∈ V, v ∈ RED and v ∈ BLU E have a similar meaning. We shall freely
abuse notation and denote by RED (resp. BLU E) both, the set of points colored RED
as well as the set of edges of H that are colored RED and this should not create any
confusion, hopefully.
It is easy to bound P1
pn + (1 − p)n
P1 = P (S(1) ∈ RED, S(2) ∈ RED) + P (S(1) ∈ BLU E, S(2) ∈ RED) =
2n
2(1 − p)n
≤
2n
(8.2)
In (8.2), we used the fact that p is small, in particular p < 0.5, this will be validated in
the following analysis. Towards analyzing P2 , note that, for the vertices that were blue
90
after step 1 to have turned red, they must belong to blue monochromatic edges, i.e., for
each v ∈ S that is blue, there is an edge T such that T ∩ S 6= Φ and T ∈ BLU E. Define
EST := event S(1) ∈
/ M, T (1) ∈ BLU E, S ∩ T 6= Φ and S(2) ∈ RED
Then we have
X
P2 ≤ P (EST ) (8.3)
T 6=S
For a fixed triple (S, T, U ), for U to even flip it must belong to some other edge which is
blue after step 1. But for an upper bound, let is just flip to red.
1 p
P(EST U ) ≤ p|S∩T |+|U | = (2p)|S∩T |−1 p|U |
22n−|S∩T | 22n−1
p
≤ p|U |
22n−1
Using this in (8.3), we have
n−1
X p |U | p X n − 1 |U |
P(EST ) ≤ p ≤ p
22n−1 22n−1 |U |
U ⊆S\T |U |=0
n−1 n
(1 + p) p 2p(1 + p) 2p exp(np)
= ≤ ≤
22n−1 22n 22n
X 2mp exp(np)
=⇒ P(EST ) ≤ (8.4)
S6=T
22n
For an arbitrary > 0, let p = (1+)nlog k , then k(1 − p)n ≤ k exp(−np) = k − and
3+
k 2 p exp(np) = k (1+)
n
log k
. So, (8.5) gives
2k 3+ (1 + ) log k
E(N ) ≤ 2k − + (8.6)
n
So, if k ∼ n1/3−2/3 , then (8.6) will be less than 1, so that P(N = 0) > 0.
91
8.3 The Radhakrishnan-Srinivasan (R-S) improvement
Theorem 56 ([23]).
r
n n
m(n) = Ω 2 (8.7)
log n
(R-S) take Beck’s recoloring idea and improve it. Their technique is motivated by the
following observation
Observation 57. Suppose S is monochrome after step 1, then it suffices to re-color just
one vertex in S, the rest can stay as is. So, after the first vertex in S changes color, the
remaining vertices can stay put unless they belong to other monochromatic edges.
This motivates the following modification, do not re-color all vertices simultaneously,
put them in an ordered list and re-color one vertex at a time. Here is the modified step
2.
Step 2: For a given ordering, if the first vertex lies in a monochromatic edge, flip its
color with probability p. After having colored vertices 1, . . . , i − 1, if vertex i is in a
monochromatic edge after having modified the first i − 1 vertices, then flip its color with
probability p.
The analysis proceeds along similar to that in the previous section until (8.2). Con-
sider P2 . The last blue vertex v of S changes color to red because there is some T 6= S
such that T was blue after step 1 and |S ∩ T | = 1. We shall say that S blames T (which
we shall denote by S 7−→ T ) if this happens. Also, none of the vertices in T that were
considered before v change their color to red. To summarize,
Then,
!
_ X
P2 ≤ P S 7−→ T ≤ P(S 7−→ T ) (8.8)
T 6=S T 6=S
Fix an ordering π on the vertices. With respect to this ordering, let v be the (iπ + 1)th
vertex in S and the (jπ + 1)th vertex in T . If the index of w is less than that of v, we
92
write is as π(w) < π(v). Also define,
Tv− and Tv+ have similar meanings. To compute P(S 7−→ T ), we will need to list some
probabilities
p
1. P(v(1) ∈ BLU E, v(2) ∈ RED) =
2
1
2. P ((T \ v)(1) ∈ BLU E) =
2n−1
1
3. P(Sv+ (1) ∈ RED) =
2n−iπ −1
4. P(Tv− (2) ∈
/ RED | T (1) ∈ BLU E) = (1 − p)jπ
1+p
P((w(1) ∈ RED) or (w(1) ∈ BLU E, w(2) ∈ RED) | S ∈
/ M) =
2
Let the ordering π be random. Then P(S 7−→ T ) = Eπ P(S 7−→ T | π). A random ordering
is determined as follows. Each vertex picks a real number uniformly at random from the
interval (0, 1), this real number is called its delay. Then the ordering is determined by
the increasing order of the delays.
Lemma 59.
p
P(S 7−→ T ) = E (P(S 7−→ T | π)) ≤ (8.10)
22n−1
93
conditioning on `(v) ∈ (x, x + dx) and with some abuse of notation, we can write
1
P (S 7−→ T, |U | = u | `(v) = x) = xu p1+u (1 − px)n−1
22n−1 |{z} |{z}
| {z } `(U )≤x U ∪ {v} flip to red
coloring after step 1
n−1 Z 1
X n−1 1
=⇒ P(S 7−→ T ) ≤ 2n−1
p1+u xu (1 − px)n−1 dx
u=0
u 0 2
n−1
!
1
n−1
Z
p X
= (px)u (1 − px)n−1 dx
22n−1 0 u=0
u
Z 1
p
= (1 − p2 x2 )n−1 dx
22n−1 0
p
≤ (8.11)
22n−1
mp 2(1−p)n
Proof of theorem 56. Using (8.11) in (8.8), we get P2 ≤ 22n−1
. Recall that P1 ≤ 2n
,
summing over all edges S, we get
k(1 − p)n k 2 p
E(N ) ≤ + (8.12)
2 2
Compare (8.12) with (8.5) and note that exp(np) is not present in (8.12). For an arbitrary
> 0, setting p = (1+)nlog k and approximating (1 − p)n ≈ exp(−np), we get
k 2 log k
−
E(N ) ≤ 0.5 k + (1 + ) (8.13)
n
q
n
Clearly k ∼ log n
makes E(N ) < 1 giving the result.
Spencer’s proof of lemma 59. Aided by hindsight, Spencer gives an elegant combinatorial
argument to arrive at (8.11). Given the pair of edges S, T with |S ∩ T | = 1, fix a matching
between the vertices S \ {v} and T \ {v}. Call the matching µ := {µ(1), . . . , µ(n − 1)},
where each µ(i) is an ordered pair (a, b), a ∈ S \ {v} and b ∈ \{v}, define µs (i) := a and
µt (i) := b. We condition on whether none, one or both vertices of µ(i) appear in Sv− ∪ Tv− ,
for each 1 ≤ i ≤ n − 1. Let Xi = |µ(i) ∩ (Sv− ∪ Tv− )|. Since the ordering is uniformly
94
random, Xi and Xj are independent for i 6= j. From (8.9), consider E ((1 − p)jπ (1 + p)iπ ).
Pn−1 − Pn−1 −
E (1 − p)jπ (1 + p)iπ | µ ∩ Sv− ∪ Tv− = E (1 − p) i=1 I(µ(i)∩Sv 6=Φ) (1 + p) i=1 I(µ(i)∩Tv 6=Φ)
n−1
!
Y − −
=E (1 − p)I(µs (i)∈Sv ) (1 + P )I(µt (i)∈Tv )
i=1
n−1
I(µs (i)∈Sv− ) I(µt (i)∈Tv− )
Y
= E (1 − p) (1 + P )
i=1
n−1
Y
1
= (1 − p + 1 + p + 1 + 1 − p2 )
i=1
4
n−1
Y p2
= 1− <1
i=1
4
As before, suppose e(H) = k2n−1 . The coloring algorithm puts all the vertices in a
(random) order, and processes one vertex at a time. A vertex is give a default color of
BLU E unless it ends up coloring some edge BLU E in which case, we color the vertex
RED. Note that the only monochromatic edges are all RED at the end of this procedure.
The ordering of the vertices is decided in the same manner as in the R−S algorithm. Each
vertex v picks independently and uniformly, Xv ∈ [0, 1] at random. As observed in the R-
S algorithm, if an edge is colored RED at the end of this procedure, there is some edge T
such that |S ∩T | = 1, and the common vertex of these edges is the last vertex of S and the
first vertex of T . We shall, following Cherkashin and Kozik sat that in this case (S, T ) is
a conflicting pair. We shall estimate the probability that the coloring produces no RED
95
edges, and to do that we shall estimate the probability that there exists a conflicting pair.
Let 0 < p < 1 be a parameter. Call an edge S an p-extreme edge if for each v ∈ S,
Xv ≤ 1−p 2
or Xv ≥ 1+p
2
. To estimate the probability that there is a conflicting pair, we
consider the two possibilities: One of the pair of edges is an extreme edge, and the other
case, when neither n of the edges is extreme. The probability of the former is at most
2 · (k2n−1 ) · 1−p
2
= k(1 − p)n
. In the other case, note that if S ∩ T = {v} then we must
1−p 1+p
have Xv ∈ ( 2 , 2 ) and for all the other u ∈ S, w ∈ T we have Xu < Xv and Xv < Xw
n−1
and the probability of this is (k2n−1 )2 · pXvn−1 (1 − Xv )n−1 < k 2 4n−1 · p 41 = pk 2 .
Hence, if pk 2 + k(1 − p)n < 1 then we are done, and the asymptotics for this are the
same as seen in the discussion following the R-S algorithm.
96
9 More sophisticated concentration: Talagrand’s
Inequality
A relatively recent, extremely powerful, and by now well utilized technique in prob-
abilistic methods, was discovered by Michel Talagrand and was published around 1996.
Talagrand’s inequality is an instance of what is refered to as the phenomenon of ‘Concen-
tration of Measure in Product Spaces’ (his paper was titled almost exactly this). Roughly
speaking, if we have several probability spaces, we many consider the product measure on
the product space. Talagrand showed a very sharp concentration of measure phenomenon
when the probability spaces were also metrics with some other properties. One of the
main reasons this inequality is so powerful is its relatively wide applicability. In this
chapter, we briefly study the inequality, and a couple of simple applications.
Here α can be thought of as a cost (set by an adversary) for changing each coordinate
of x to get to some event y ∈ A. Then we can intuitively think of ρ as the worst-case
cost necessary to get from x to some element in A by changing coordinates.
Now for any probability space we can define At = {x ∈ Ω | ρ(x, A) ≤ t}, as above.
Theorem 61. (Talagrand’s Inequality)
2 /4
P[A](1 − P[At ]) ≤ e−t
97
For the proof see p. 55 of Talagrand’s paper “Concentration of Measure and Isoperi-
metric Inequalities in Product Spaces.” For a very readable exposition, we refer the reader
to [28].
We can also define the measure ρ in another, perhaps more intuitive way. For a given
x ∈ Ω and A ⊆ Ω let
and let V (x, A) be the convex hull of Path(x, A) (in [0, 1]n ). We can think of Path(x, A)
as the set of all possible paths from x to some element y ∈ A, that is, the set of choices
given some cost vector.
Note that it is now clear that we can use min instead of sup and inf, since the convex
hull is closed. It is also clear now that ρ(x, A) = 0 iff (0, 0) ∈ V (x, A) iff x ∈ A.
98
than f (b) such that if x agrees with ω 0 on I then X(x) ≥ b. Now consider the penalty
−1/2
function
P αi = 1{i∈I} (|I|) . By our assumption that ω 0 ∈ At , there exists y ∈ A such
that yi 6=ω0 αi ≤ t. Then the number of coordinates in which y and ω 0 disagree is no
ip p
more than t |I| ≤ t f (b). Now pick z ∈ Ω suchp that zi = yi for all i 6∈ I and zi = ωi0 for
i ∈ I. Since z disagrees withpy on no more than t f (b) coordinates and X p is 1-Lipschitz
we have |X(z) − X(y)| ≤ t f (b). But since y ∈ A, we have X(y) < b − t f (b), so by
the closeness of X(y) and X(z) we have |X(z)| < b. But since z agrees with ω 0 on the
coordinates of I, f -certifiability guarantees that X(z) ≥ b, and we have a contradiction.
p
Here we tend to think of t as some large multiple of E(X), so that we can rewrite
this as p
P[|X − E(X)| > k E(X)] ≤ e−Ω(1)
or
1 ε
P[|X − E(X)| > E(X) 2 +ε ] ≤ e−E(X) .
The probability space in question is a product of the nd/2 binary probability spaces
corresponding to retaining each edge, so that the events are tuples representing the out-
comes for each edge. Changing the outcome of a single edge can isolate or un-isolate at
99
most two vertices, so X is 2-Lipschitz. Furthermore, for any value of H with X(H) ≥ s,
we can choose one edge adjacent to each of s non-isolated vertices whose existence in
another subgraph H 0 of G will ensure that the same s vertices are not isolated in H 0 , i.e.
X(H 0 ) ≥ s. Thus X is also 1-certifiable, and Talagrand gives us
h p i 2
P |X − E[X]| > (60 + k) E[X] ≤ e−k /32
so with
p high probability
√ the number of non-isolated vertices is within an interval of length
O( E[X]) = O( n) about the mean. Compare this to the result usingq Azuma on the
n
edge-exposure martingale, which would only give an interval of size O 2
= O(n)
about the mean.
en k ek
n 1
P[X ≥ k] ≤ ≤
k k! k kk
6√n
and thus P[X ≥ 3n] ≤ 3e → 0. On the other hand, there is always an increasing or
√
decreasing subsequence of length n − 1, so we actually find that with high probability
1√ √
n≤X≤3 n
3
√
so E[X] = O( n).
Talagrand’s
p inequality now tells us that X is with high probability in an interval of
1/4
√ O( E[X]) = O(n ). Note that Azuma would only give an interval of length
length
O( n), since the corresponding martingale would be of length n. The strength of Tala-
grand is that unlike Azuma it does not depend on the dimension of the product space.
100
Johansson (2004): For M-free G, χ(G) ≤ O( logDD ).
T heorem: If G is M-free with max. degree D, then χ(G) ≤ (1 − α)D for some α > 0.
101
9.4 Almost Steiner Designs
In this section, we shall look now at a result due to Hod, Ferber, Krivelevich, and Su-
dakov [14], which achieves something very close to a Steiner design. Recall that a Steiner
t-design with parameters (k, n) (and denoted S(t, k, n)) is a k-uniform hypergraph on n
vertices such that every t-subset of the vertices is contained in exactly of the edges of the
hypergraph. A simple counting argument shows that the number of edges of a Steiner
(n)
t-design S(t, k, n) is kt .
(t)
We shall following [14] prove that, for n sufficiently large, there exists a k-uniform
hypergraph such that every t-subset of the vertex set is in at least one edge, and at most
2 edges, and also, the number of edges is asymptotically close to the correct number.
More precisely,
Theorem 67. For n sufficiently large, and given fixed integers k > t ≥ 2 there exist
k-uniform hypergraphs H on the vertex set V satisfying
(n)
• e(H) = (1 + o(1)) kt .
() t
Proof. For starters, one might want to start with an almost tight packing H and then
for each t-subset T that was not covered by the packing, we would like to pick another
k-subset that accounts for covering T . This motivates the following
Definition 68. For a k-uniform hypergraph H on [n] the Leave hypergraph associated
with H is the t-uniform hypergraph
LH := {T ⊂ [n] : |T | = t, T 6⊂ E for any E ∈ H}.
Thus for every T in the Leave Hypergraph we wish to choose another k edge from the
complete k-uniform hypergraph in order to cover every t-subset of [n]. In particular, one
would like that the size of LH is small in comparison to the size of H. This was already
achieved by Grable; in fact he proved
Theorem 69. (Grable, 1999) Let k > t ≥ 2 be integers. There exists a constant ε =
ε(k, t) > 0 such that for sufficiently large n there exists a partial Steiner design H =
([n], E) satisfying the following:
For every 0 ≤ l < t every set S ⊂ [n] with |S| = l is contained in O(nt−l−ε ) edges of
the leave hypergraph LH .
102
In particular, the size of LH is at most O(nt−ε ). But by picking one edge arbitrarily
to cover each T ∈ LH we run the risk of having some t subset covered more than twice
- something we do not want. Thus we need to be a bit choosy in picking edges to cover
the edges of the leave hypergraph.
For each A ∈ LH define TA := {E : |C| = k, A ⊂ C}. Firstly, note that we can form
a refinement of TA as follows:
[
SA := TA \ TB .
B∈LH ,B6=A
In other words, SA consists of all E ∈ TA such that no other t-subset (other than A) of
the leave hypergraph is also in E. Suppose B ∈ LH and |A ∩B| = i. Then the number
of sets E ∈ TA that are not in SA (on account of B) is n−2t+i
k−2t+i
. Let ni (A); = {B ∈ LH :
B 6= A, |B ∩ A| = i} . If we fix S = |A ∩ B| is a subset of size i, it follows by the result
of Grable that there are at most O(nt−i−ε ) distinct B ∈ LH such that A ∩ B = S. Since
there are ti choices for S, it follows that ni (A) ≤ ti nt−i−ε . Thus,
X t−1
n−t n − 2t + i
|SA | ≥ − ni (A) = Θ(nk−t ) − O(nk−t−ε ) = Θ(nk−t ).
k−t i=0
k − 2t + i
So, the sets SA are all quite large.
Note also that by definition, the collections SA are pairwise disjoint for different
A ∈ LH . Thus we have plenty of choice for picking E ∈ SA for distinct A ∈ LH . This
however, will not be good enough. This distillation does ensure that different t-subsets of
the leave hypergraph are covered exactly once, but it may happen that t-subsets that were
initially covered by H may now be covered more than twice. Thus we need to be choosier.
One idea to deal with this is the following. If we can choose the edges E covering the
t-subsets of the leave hypergraph in such a way that for distinct A, B ∈ LH if we have
picked E ∈ SA , F ∈ SB and we also have |E ∩ F | < t then the issue addressed above will
not happen and then we can be sure that our second sub collection along with H will
satisfy the conditions of the theorem. Thus the sense of choosiness that we want may be
stated exactly as this:
For each t-subset A of the leave hypergraph we need to pick EA ∈ SA such that for
A 6= B we have |EA ∩ EB | < t.
One of the interesting new perspectives of the probabilistic method that this proof
suggests is the following principle:
103
In other words, let us pick a random collection RA ⊂ SA as follows. For each E ∈ SA ,
pick it as a member of RA independently and with probability p (for some suitably small
p).
Now, if for each A, we decide to make the pick EA ∈ RA , we wish to show that
|EA ∩ EB | < t for all A 6= B in the leave hypergraph. Showing that |EA ∩ E| < t for all
E ∈ R where [
R= RB
B6=A,B∈LH
Fix A ∈ LH , and suppose RA has been determined but suppose RB for the other sets
of LH are not yet made. Knowing RB for all B ∈ LH \ {A} amounts to independent
trials made by the members of [
S= SB .
B6=A,B∈LH
To say that we can make a choice EA ∈ RA , we need good bounds on how many elements
of RA are poor choices, i.e., we need an estimate on
NA := {E ∈ RA : |E ∩ F | ≥ t for some F ∈ R} .
Note that if we assume that RA has already been chosen, then NA is determined by
the outcome of |S| independent Bernoulli trials. Moreover, it is clear from the definition
that NA is 1-certifiable. Indeed, if NA ≥ s, then there are E1 , E2 , . . . , Es ∈ RA and at
most s sets F1 , F2 , . . . , Fs ∈ S such that |Ei ∩ Fi | ≥ t. In order to obtain good concen-
tration, it would help if NA were also Lipschitz.
But now, we use an old trick of Bollobas, which ‘Lipschitzises’ this random variable,
i.e., considers another related random variable which is Lipschitz, and in addition is very
close to the random variable in question.
More precisely, suppose for each A, we pick a large enough sub collection QA ⊂ RA
by adding an element of RA into QA as long as it does not intersect any of the members
already picked outside of A. Thus, QA is a subfamily of RA in which any two sets are
pairwise disjoint outside of A itself. If RA is large enough, then perhaps one can imagine
obtaining a large enough QA ⊂ RA by this process.
If we set
NQ (A) := {E ∈ QA : |E ∩ F | ≥ t for some F ∈ R}
104
then note that the same argument for NA also works here, so NQ (A) is 1-certifiable. But
now, this is also Lipschitz. Indeed, if a certain choice F ∈ R is altered, then since the
sets in QA are pairwise disjoint outside of A, it follows that NQ (A) changes by at most
k − t, so NQ (A) is k − t-Lipschitz. Hence by Talagrand, we have
2
p
P(NQ (A) > t) < 2e−t/16k where t ≥ 2E(NQ (A)) + 80k E(NQ (A)).
Let us estimate E(NQ (A)) first. Note that (recall that we are assuming that RA , and
QA are fixed)
1E
X
NQ (A) =
E∈QA
where 1E counts the set E if there exists F ∈ S such that |E ∩ F | ≥ t. Let us first fix
E ∈ QA . Write
t−1
[
LH \ {A} = Bl
l=0
where
Bl := {B ∈ LH : |B ∩ E| = l}.
We wish to count the number of F ∈ S that trigger E and count in among NQ (A).
If B ∈ Bl we have
{F ∈ SB : |E ∩ F | ≥ t} ≤ {F ∈ TB : |E ∩ F | ≥ t}
= {F ∈ TB : |(E ∩ F ) \ B| ≥ t − l}
Consequently,
k−t
X k−l n−k−t+l
{F ∈ S : B ⊂ F for some B ∈ Bl , |E∩F | ≥ t} ≤ = O(nk−2t+l ).
i=t−l
i k−t−i
so
E(NQ (A)) ≤ |QA |pO(nk−t−ε ).
105
Now suppose we had p = nt−k+ε/2 ; then the estimate above gives us that
Note that for this value of p we have with high probability |RA | ≈ Θ(nε/2 ) for all
A (standard Chernoff bounds). We shall now argue that the greedy process produces
|QA | ≥ (nε/3 ) for all A with high probability. We can then choose to stop at around this
stage while constructing QA , so that we indeed do have |QA | = Θ(nε/3 ) This completes
the proof.
Suppose that the greedy process stops after m steps, with m < nε/3S . Then there exist
sets E1 , E2 , . . . , Em such that every set in SA that is ‘disjoint’ from Ei (i.e., disjoint
outside of A) is not picked into RA . Now, if we set X = Ei then |X| < knε/3 . We now
S
need to ensure that the number of sets of SA that do not intersect X outside of A is of
the right order. In other words, the number of sets of TA that meet X non-trivially is at
most
k−t X k−t
X |X| − t n − |X|
= Θ(niε/3 )nk−t−i = o(nk−t )
i=1
i k − t − i i=1
which implies that the number of sets in SA that are disjoint from X is M = Θ(nk−t ).
Thus, the probability that there exists some set X of size at most knε/3 that satisfies this
condition above is at most
n ε/3
ε/3
(1−p)M < O(nn )exp(−nt−k+ε/2 Θ(nk−t )) = exp(nε/3 log n−Θ(nε/2 )) < exp(−nε/7 )
kn
for n sufficiently large, so the result follows.
Johansson improved Brooks’ theorem for triangle free graphs by showing that χ(G) =
O( log∆∆ ). The following theorem below is a generalization of this extending to graphs
where the neighborhood of any vertex is sparse.
106
d2
Theorem 70. (Alon-Krivelevich-Sudakov, 2002): If G has at most t
edges in the in-
d
duced subgraph on N (v) for each v ∈ V (G) then χ(G) ≤ log(t) .
k dk
This implies (follows easily) that for G with girth at least 3k + 1, χ(G ) ≤ O log d
.
In particular one is interested to see if the above result is asymptotically best possible.
The following result of Alon and Mohar settles this in the affirmative.
Theorem 71. (Alon-Mohar 2001): For large d and any fixed
k
g ≥ 3 there exist graphs
d
with max degree ∆ ≤ d, girth at least g, and χ(Gk ) ≥ Ω log d
.
Proof : First, we shall bound ∆ and Γ. We want to pick G = Gn,p such that for all
d
v ∈ V (G), E[deg(v)] = (n − 1)p < np. Let p = 2n . Because this process is a binomial
distribution, we can bound the number of vertices with degree at least d using Chernoff.
d −(d/2)2
P[deg(v) ≥ d] < P[(deg(v) − E(d(v)) > )] ≤ e 3(d/2) = e−d/6
2
Now, let Nbad = |{v ∈ V |deg(v) > d}| =⇒
To complete the proof we wish to show that a maximum independent set is not too
large. More precisely, we wish to show that α(G) = O( n log
dk
d
. This amounts to saying
that whp, every set U of this size is NOT independent in Gk .
IN order to achieve this, what we shall do is this. If we could show that for any such
set U , there are several paths of length kbetween some two vertices u, v in U , then in
107
order to make the pair {u, v} a non-edge in Gk , we should have deleted a vertex from
each of those paths between u, v. But if the number of such paths is way more, then
u, v is an edge in Gk giving us what we want. But showing that the number of paths is
concentrated is a difficult task, so we shall try to show that there are several internally
disjoint paths between two such vertices. This is again another instance of the same trick
that was mentioned in the previous section.
Let us get to the details. Let the path P be a U-path if the end vertices of P lie in U
and the internal vertices lie outside of U .Set U ⊆ V (G) such that
ck n log(d)
|U | = =x
dk
k
Now, to show χ(Gk ) ≥ Ω log(d)d
, we will show that α(Gk ) ≤ ck n log(d)
dk
for some ck (as
outlined above) To do this, we will show that with high probability, for every U , Π(G),
the number of internally disjoint U-paths of length k, is large. Specifically, we will show
that there are still many of these paths after we make vertex deletions for girth and
maximum degree considerations. This will bound independent sets in Gk .
Let µ be the number of U-paths of length k. It is easy to show that
Now, we need to say that E[ν], the expected number of non-internally disjoint U-paths,
is much smaller than E[µ]. For n d k, the expected number of U-paths which share
one endpoint and the unique neighbor is at most
µck log d
µnk−2 xpk−1 = µ
2k−1 d
It is easy to see that the number of other types of intersecting U-paths is smaller, implying
that
c2k n log2 (d)
E[Π] =
2k+2 dk
Let us note that, because Π(G) counts the number internally disjoint U-paths, removing
one edge can change Π(G) by at most one. Therefore, Π(G) is a 1-Lipschitz function.
Let us also note that Π(G) is f -certifiable. That is, for f (s) = ks, when Π(G) ≥ s,
G contains a set of at most ks edges so that ∀G0 which agree with G on these edges,
Π(G0 ) ≥ s. We can now use Talagrand’s inequality to bound the number of graphs with
insufficiently many U-paths.
For any b and t, Talagrand’s tells us that
βt2
P[|X − E[X]| > t] ≤ e− E[X]
108
for some β > 0. This implies that for t = εE[Π], ε > 0,
So, if
βε2 c2k
> 2kck
2k+2
2
then, with probability 1−o(1), for every set U , there are at least εn log d
2k+2 dk
pairwise internally
disjoint U-paths.
Now, for n d k
εn log2 d
10n2−d/10 + 10dg < k+2 k
2 d
so we can remove all small cycles and high-degree vertices without destroying all U-paths
and therefore k
k n log(d) k d
α(G ) ≤ ck k
=⇒ χ(G ) ≥ Ω
d log(d)
as desired, and this completes our proof.
109
10 Martingales and Concentration Inequalities
The theory of Martingales and concentration inequalities were first used spectacularly by
Janson, and then later by Bollobás in the determination of the chromatic number of a ran-
dom graph. Ever since, concentration inequalities Azuma’s inequality and its corollaries
in particular, have become a very important aspect of the theory of probabilistic tech-
niques. What makes these such an integral component is the relatively mild conditions
under which they apply and the surprisingly strong results they can prove which might be
near impossible to achieve otherwise. In this chapter, we shall review Azuma’s inequality
and as a consequence prove the Spencer-Shamir theorem for the chromatic number for
sparse graphs and later, study the Pippenger-Spencer theorem for the chromatic index of
uniform hypergraphs. Kahn extended some of these ideas to give an asymptotic version
of the yet-open Erdős-faber-Lovász conjecture for nearly disjoint hypergraphs.
10.1 Martingales
Suppose Ω, B, P is underlying probability space. F0 ⊆ F1 ⊆ ...Fn ⊆ ... where Fi is
σ-algebra in B.
[
F= Fi
i
10.2 Examples
• Edge Exposure Martingale
Let the random graph G(n, p) be the underlying probability space. Label the po-
tential edges {i, j} ⊆ [n] by e1 , e2 , ..em where m = n2 . Let f be any graph theoretic
111
In other words to find Xi we first expose e1 , e2 , ..., ei and see if they are in G. Then
Xi will be expectation of f (G) with this information. Note that X0 is constant.
• Vertex Exposure Martingale
Again G(n, p) is underlying probability space and f is any function of G. Define
X1 , X2 , ..., Xn by:
In words, to find Xi , we expose all edges between first i vertices (i.e. expose
subgraph induced by v1 , v2 , ..., vi ) and look at the conditional expectation given
this information.
Hence: m
Y
αXm
E(e ) = E( eαYi )
i=1
m−1
Y
= E(( eαYi )E(eαYm |Xm−1 ))
i=1
m−1
2 /2 2 m/2
Y
≤ E( eαYi )eα ≤ eα (by induction)
i=1
√ √
P(Xm > λ m) = P(eαXm > eαλ m )
112
√
≤ E(eαXm )e−αλ m
2 m/2−αλ√m
≤ eα
2 /2 √
= e−λ (since α = λ/ m)
113
√
Proof. Suppose not. Then ∃ a smallest subgraph of size ≤ c n that is not 3-colorable.
Let T be smallest such set. Note that every vertex in T has degree ≥√3 =⇒ e(T ) ≥ 23 |T |.
But in a graph Gn,p the probability that ∃ some set T of size ≤ c n which has ≥ 3t2
edges is o(1). Because:
t
3t
P(∃T of size t and with edges) ≈ 3t2 p3t/2 where p ∼ n−α
2 2
Because:
√
t=c n t
√ 3t X n
P(∃T of size ≤ c n and with edges) ≤ 2
3t
p3t/2 → o(1)
2 t=0
t 2
This concludes the proof of Shamir-Spencer as µ ≤ χ(G) ≤ µ+3 with high probability.
for graphs.
However it is computationally hard to figure out if χ0 (G) = ∆(G) or ∆(G) + 1.
For H note that χ0 (H) ≥ ∆(H) where ∆ still denotes max degree in H i.e.:
∆(H) = max{d(x)|x ∈ V (H)}, d(x) = # of hyperedges containing x
Theorem 76 (The Pippenger-Spencer Theorem). Given > 0, ∃ a δ > 0 and D0 () s.t.
the following holds if n ≥ D ≥ D0 and:
114
Then χ0 (H) < (1 + )D
Note: d(x, y) is codegree of x, y i.e. d(x, y) = |{E ∈ E(H) s.t. {x, y} ⊆ E}|
The proof of this theorem due to Pippenger-Spencer follows the paradigm of the
‘pseudo-random method’ pioneered by Vojtech Rödl and the ‘Nibble’.
For an edge A:
t
X
(1)
P(A ∈ M ) = P (A ∈ Mi ) and
i=1
−k
P(A ∈ M1 ) ≈ , P(A ∈ M2 ) ≈ (1 − )k(D−1) ≈ e in general :
D D1 D D1
P A ∈ Mi ≈ e−k+(i−1)
D
t
X 1 − et
(1) −k (i−1) −k α
=⇒ P(A ∈ M ) = e e =e ≈
D i=1 D 1 − e D
t
where α = α(, t, k) = e−k (1−e
1−e
)
. Now, we can generate a second independent matching
(2)
M by repeating the same process and so on.
115
Just like the Rödl’s nibble start by picking a ‘small’ number of ‘independent’ match-
ings from H. Let 0 < θ < 1 and µ = bθDc and generate independent matchings
M(1) , M(2) , M(3) . . . M(µ) with each M(i) having:
α
P(A ∈ M(i) ) ≈
D
Let P (1) = M(1) ∪ M(2) ∪ M(3) ∪ · · · ∪ M(µ) .
P (1) P (2) P (s)
H = H(0) −−→ H(1) −−→ H(2) . . . −−→ H(s)
Here first ‘packing’ P (1) is µ = θD-colorable since we can assign each matching M(i)
a separate color. Note that χ0 (H(0) ) ≤ µ + χ0 (H(1) ) (since chromatic number is subaddi-
tive). Similarly P (2) is θD(1) − colorable and so on.
Hence so far we need θD + θD(1) + · · · + θD(s−1) colors. After removing colored edges
(i.e. edges ∈ some P (i) ), very few edges will be left in H(s) .
Bounding χ0 (H(s) ): For any k − unif orm hypergraph H with max degree D, we have:
χ0 (H) ≤ k(D − 1) + 1 =⇒ χ0 (H(s) ) ≤ k(D(s) − 1) + 1
Hence:
s−1
X
total # of colors we used = θ D(i) + θD + k(D(s) − 1) + 1 ≈ D
i=1
s will be chosen as large as possible. Here we need to make sure that H(i) is similar to
H(i−1) (i.e. all degrees are almost equal and the co-degree is small). (In particular we’ll
be interested in i = 1 case).
1A6 ∈P (1)
X
d(1) (x) =
A:x∈A∈H(0)
X α µ α α
=⇒ E(d(1) (x)) = (1 − ) ≈ D(1 − )µ ≈ D(1 − )θD ≈ De−αθ = D(1)
D D D
A:x∈A∈H(0)
(We will consider the following martingale Xi = E[d(1) (x) | M(1) , M(2) , . . . , M(i) ])
116
Let Fi = {M(1) , M(2) , . . . , M(i) } since M(i) is a matching =⇒ at most one edge contain-
ing x is exposed.
Now question is: ”How to guarantee this for all vertices?”. Use Lovasz Local Lemma
(LLL): q
(1) (1)
Ax := |d (x) − D | > λ o(1)D(1)
Want to show that: !
^
P Ax >0
x∈V
2
We know: P(Ax ) ≤ 2e−λ /2 . To compute the dependence degree among {Ax |x ∈
V (H)}:
(i) (i) (i)
M(i) = M1 ∪ M2 ∪ . . . Mt
(Distance between two vertices is the shortest number of edges one needs to go from
x to y.)
Note that each matching M(i) is generated by atoms 1E where each E ∈ H(0) and whose
’distance’ from x ≤ t. So if distance between x and y ≥ 2t+1, Ax and Ay are independent.
=⇒ Dependence degree
≤ (k − 1)D(0) + 2(k − 1)2 (D − 1)D + · · · + r(k − 1)r (D − 1)r + · · · + 2t(k − 1)2t (D − 1)2t
≤ (2t+1)(kD(0) )2t+1
So for LLL, we need:
2 /2
e2e−λ (2t + 1)(kD(0) )2t+1 < 1
e(2t+1)(kD(0) )2t+1
p
Put λ = o(1)D(1) to get: ⇐⇒ (1) /2 < 1.
eo(1)D
Asymptotically D(1) beats t (big time), so condition for LLL will hold hence we are
in business.
117
θD
≤ −αθ
+kθDe−sαθ → D(1+o(1))
1−e
as t → ∞, s → ∞, → ∞, etc. Thus we’ll have the desired result.
∀A 6= B ∈ E(H), |A ∩ B| ≤ 1
.
Conjecture 78. If H is nearly-disjoint on n vertices, then χ0 (H) ≤ n
Theorem 79 (Erdos-de Bruijn Theorem). If H is a hypergraph on n vertices with
|A ∩ B| = 1 ∀A 6= B
then |E(H)| ≤ n.
As an aside, |E(H)| ≤ n =⇒ χ0 (H) ≤ n. This theorem is tight in the sense that if it
is a projective plane of order n, then n2 + n + 1 colors are needed =⇒ χ0 (H) = |E(H)|.
(¶n = projective plane of order n)
Theorem 80 (Theorem - Jeff Kahn (1992)). The EFL conjecture is asymptotically true,
i.e. χ0 (H) ≤ n(1 + o(1)) for H nearly-disjoint on n-vertices.
Note that in this general situation, the edge sizes need not be the same; in fact they
need not even be absolutely bounded, and as we shall see, that causes some of the trouble.
Firstly, we start with a simple observation. If there is an integer k such that for each
edge E in a nearly disjoint hypergraph H we have |E| ≤ k, then we can ‘uniformize’
the edge sizes. This is a standard trick, so we will not describe it in detail. One may
form a bipartite graph G whose vertex sets are the vertices and edges of H, and (v, E)
is an incident pair iff v ∈ E. Then the uniformization described earlier is equivalent to
embedding G into a bipartite graph with uniform degree over all the vertices E ∈ E such
that the graph is C4 -free. This is a fairly standard exercise in graph theory.
If all the edges are of bounded size, i.e., if 3 ≤ b ≤ |E| ≤ a for all edges E then
the Pippenger-Spencer theorem of the preceding section proves the result claimed by the
aforementioned theorem. Indeed, for any x count the number of pairs (y, E) where y 6= x,
118
and x, y ∈ E. Since H is nearly disjoint, any two vertices of P
H are in at most one edge
so this is at most n − 1. On the other hand, this is precisely x∈E (|E| − 1), so we have
(b − 1)d(x) ≤ n − 1 ⇒ d(x) ≤ n−1
b−1
< n2 .
Here is a general algorithm for trying to color the edges of H using C colors: Arrange
the edges of H in decreasing order of size and color them greedily. If the edges are
E1 , E2 , . . . , Em with |Ei | ≥ |Ei+1 | for all i then when Ei is considered for coloring, we
may do so provided there is a color not already assigned to one of the edges Ej , j < i
for which Ei ∩ Ej 6= ∅. To estimate |{1 ≤ j < i|Ej ∩ Ei 6= ∅}|, let us count the number
of triples (x, y, j) where x ∈ Ei ∩ Ej , y ∈ Ej \ Ei . Write |Ei | = k for simplicity. Again,
since H is nearly disjoint, any two vertices of H are in at most one edge, hence the
number of such triples is at most the number of pairs (x, y) with x ∈ Ei , y 6∈ Ei , which
is k(n − k). On the other hand, for each fixed Ej such that 1 ≤ j < i, Ej ∩ Ei 6= ∅,
Ei ∩ Ej is uniquely determined, so the number of such triples is |Ej | − 1. Hence denoting
I = {1 ≤ j < i|Ej ∩ Ei 6= ∅} and noting that for each j ∈ I |Ej | ≥ k, we get
X k(n − k)
(k − 1)|I| ≤ (|Ej | − 1) ≤ k(n − k) ⇒ |I| ≤ .
j∈I
k − 1
|E|(n−|E|)
In particular, if C > |E|−1
for every edge E, the greedy algorithm properly colors H.
The previous argument actually shows a little more. Since k(n−k) k−1
is decreasing in k
1
if |E| > a for some (large) constant a, then |I| < (1 + a )n. So, for a given > 0 if we
a > 1/, say, then for C = (1 + 2)n, following the same greedy algorithm will properly
color all edges of size greater than a. This motivates us to consider
• Es := {E ∈ E : |E| ≤ b}.
• El := {E ∈ E : |E| > a}
for some absolute constants a, b which we shall define later. We have seen that χ0 (Hl ) ≤
(1 + 2)n; also by a preceding remark, if we pick b > O(1)/ we have χ0 (Hm ) ≤ n. Thus,
let us do the following.
Let C = b(1 + 4)nc; we shall color the edges of H using the colors {1, 2 . . . , C}. Let
C1 = {1, 2 . . . , b(1 + 3)mc}; C2 := C \ C1 . Fix a coloring f1 of Hl using the colors of
C1 , and a coloring f2 of Hm using the colors of C2 . We now wish to color Hs . We shall
attempt to do that using the colors of C1 . For each E ∈ Hs let
119
Then as before, |Forb(E)| ≤ |{A ∈ Hl |A ∩ E 6= ∅}| ≤ a(n−a)
b
< ηD for η = a/b, D = n.
In other words, every edge of Hs also has a (small) list of forbidden colors for it. If we
can prove a theorem that guarantees a proper coloring of the edges with no edge given a
forbidden color, we have an asymptotic version of the EFL.
At this point, we are motivated enough (as was Kahn) to state the following
Conjecture 81. Let k ≥ 2, ν > 0, 0 ≤ η < 1. Let C be a set of colors of size at least
(1 + ν)D. There exists β > 0 such that if H is a k-uniform hypergraph satisfying
then there is a proper coloring f of E such that for every edge A, f (A) 6∈ Forb(A).
Note that the first two conditions are identical to those of the PS theorem. Also, it is
important to note that there might be some additional constraints on η, ν which indeed
is the case. We will see what those are as we proceed with the proof.
To prove this conjecture, let us again recall the idea of the proof of the PS theorem.
The ith step/iteration in the proof of the PS theorem does the following: Fix 0 < θ < 1,
and let t, s be large integers. Starting with the hypergraph H(i) (1 ≤ i ≤ s) which satisfies
t )
conditions (1), (2) above with D(i) := e−αθi D with α = α(, t, k) = e−k (1−e
1−e
, with pos-
(1) (2) (µ )
itive probability there is a random packing P (i+1) := Mi+1 ∪ Mi+1 ∪ · · · ∪ Mi+1i ∈ H(i)
with µi = bθD(i) c, such that
α
• P(A ∈ P (i+1) ) ≈ D(i)
.
• For all A ∈ H(i) the event “A ∈ P (i+1) ” is independent of all events “B ∈ P (i+1) ” if
distance between A, B is at least 2t. Here, the distance is in the hypergraph H(i) .
The idea is to try to give every edge its ‘default color’ as and when we form the packings
P (i) . Since each such packing consists of up to µi different matchings, P (i) can be P(by
default) colored using µi colors, so that when we complete s iterations we have used i µi
different colors to color all the edges except those of H(s) . The PS theorem finishes off by
coloring these edges greedily using a fresh set and colors by observing that the number
of edges in H(s) is ‘small’.
120
(j)
where these sets Cij are mutually disjoint and the matching Mi+1 is by default allocated
color cij .
In our present situation, the default colors allocated to some of the edges may be forbidden
at those edges. More specifically, define
(j)
B (i) := {A ∈ H(i) |A ∈ Mi+1 for some j and cij ∈ Forb(A)}.
(i)
For each vertex v, let Bv := |{A ∈ B (i) |v ∈ A}|.
At each stage, remove the ‘bad edges’ from the packings, i.e., the ones assigned a for-
bidden color.
S After s iterations the edges that need to be (re)colored are the ones in
H0 := H(s) si=1 B (i) and the colors that are left to be used are those in C ∗ . Note that
(s)
for each vertex v we have dH0 (v) ≤ Dv + Bv . The first term is o(D); if the second term
is also o(D) then we may finish the coloring greedily. Thus, if we can show that we can
pick our random packing at stage i in such a way that apart from the criteria in the
(i)
PS-theorem, we can also ensure that Bv is ‘small’ (compared to the order of D(i) ) then
we are through (there is still some technicality but we will come to that later).
Hence to start with, we need to show that at each step i of the iteration, we can get
a random packing P (i+1) such that
• |d(i) (v) − D(i) | < o(D(i) ) for all v.
(i) (i)
• Bv < E(Bv ) + o(D)
The proof of this part is identical to that of the PS theorem; use the same martingale,
the same filtration, and use Azuma’s inequality.
(i)
To complete the proof, we need to get an (over)estimate of E(Bv ). For each A ∈ H(i) ,
(j)
A is not in B (i) if and only if for each cij ∈ Forb(A) we have A 6∈ Mi=1 . Denoting
Forb(i) (A) := {j|cij ∈ Forb(A)} we have
|Forb(i) (A)|
α|Forb(i) (A)|
(i) α
P(A ∈ B ) = 1 − 1 − (i) < .
D D(i)
Hence,
X
E(Bv(i) ) = P(A ∈ B (i) )
v∈A∈H(i)
α X
. |Forb(i) (A)|
D(i)
v∈A∈H(i)
Let i(A) := max{0 ≤ i ≤ s|A ∈ H(i) }. Note that for any fixed i,
|{A ∈ H|v ∈ A, i(A) = i}| ≤ θe−αθi D.
121
Hence we have
s s
X X 1 X
E(Bv(i) ) .α |Forb(i) (A)|
i=0 i=0
D(i)
v∈A∈H(i)
X X |Forb (A)| (i)
=α 1A∈H(i)
v∈A i
D(i)
X 1 X
(i)
≤α |Forb (A)|
v∈A
D(i(A)) i
X |Forb(A)|
≤α eαθi(A)
v∈A
D
s
X
< αo(1) eαθi |{A|v ∈ A, i(A) = i}|
i=0
The last term in the above expression can be made ‘small’. This completes the proof
of Kahn’s theorem.
122
Bibliography
[2] N. Alon, D. J. Kleitman, Sum-free subsets, A tribute to Paul Erdős (A. Baker, B.
Bollobás, A. Hajnál eds), Cambridge University Press, pp. 13-26.
[3] N. Alon, Y. Matias, M. Szegedy, The space complexity of approximating the fre-
quency moments, J. Comp. Sys. Sci. 58 (1999) (1), 137-147
[4] N. Alon, Y. Peres, Uniform Dilations, Geom. Func. Anal. 2 (1992), No. 1, 1-28.
[5] N. Alon, J. H. Spencer, The Probabilistic Method, 4th ed., Wiley International, 2016.
[9] N. Bansal,
[10] J. Beck,
[12] S. Eberhart, B. Green, F. Manners, Sets of integers with no large sum-free subset,
Ann. Math. 180(2014), 621-652.
[13] W. T. Gowers, Lower Bounds of Tower Type for Szemerédi’s Uniformity Lemma,
GAFA, Geom. Funct. Anal., 7(1997), 322-337.
[15] D.R. Hughes and F. C. Piper, Projective Planes, Graduate Texts in Mathematics
6, Springer-Verlag, New York, 1973.
123
[17] P. Keevash, The existence of Steiner designs.
[19] F. Lazebnik, V.A. Ustimenko and A.J. Woldar, A New Series of Dense Graphs of
High Girth’, Bulletin of the AMS, 32(1995), Number 1, 73–79.
[25] T. Rothvoß,
[31] R. M. Wilson,
[32] R. M. Wilson,
[33] R. M. Wilson,
124