Entropy Methods in Combinatorics
Daniel Naylor
Contents
1 The Khinchin (Shannon?) axioms for entropy
2 A special case of Sidorenko’s conjecture
3 Brégman’s Theorem
4 Shearer’s lemma and applications
5 The union-closed conjecture
6 Entropy in additive combinatorics
7 A proof of Marton’s conjecture in F_2^n
Lecture 1
1 The Khinchin (Shannon?) axioms for entropy
Note. In this course, “random variable” will mean “discrete random variable” (unless otherwise
specified).
All logarithms will be base 2 (unless otherwise specified).
Definition (Entropy). The entropy of a discrete random variable X is a quantity H[X] that
takes real values and satisfies the following axioms:
• Normalisation: if X is uniform on {0, 1}, then H[X] = 1.
• Invariance: if Y = f(X) for an injection f (defined on the values taken by X), then H[Y] = H[X].
• Extendability: enlarging the set in which X takes values (without changing its distribution) does not change H[X].
• Additivity: H[X, Y] = H[X] + H[Y | X], where H[Y | X] = Σ_x P[X = x] H[Y | X = x].
• Continuity: H[X] depends continuously on the distribution of X.
• Maximality: if X takes values in a finite set A and U is uniform on A, then H[X] ≤ H[U].
Lemma 1.1. If X and Y are independent random variables, then H[X, Y] = H[X] + H[Y].

Proof. By additivity, H[X, Y] = H[Y] + H[X | Y], and
H[X | Y] = Σ_y P[Y = y] H[X | Y = y].
Since X and Y are independent, the distribution of X is unaffected by knowing the value of Y, so by invariance
H[X | Y = y] = H[X]
for all y, which gives the result.
Corollary 1.2. If X1, . . . , Xn are independent, then
H[X1, . . . , Xn] = H[X1] + · · · + H[Xn].
Lemma. Assuming that:
• X, Y, Z are random variables
• Y = f(X)
Then
H[X, Y ] = H[X].
Also,
H[Z | X, Y ] = H[Z | X].
Proof. The map g : x ↦ (x, f(x)) is a bijection (onto its image), and (X, Y) = g(X). So the first statement follows by
invariance. For the second statement: (X, Y, Z) is an injective image of (X, Z), so by invariance and additivity
H[Z | X, Y] = H[X, Y, Z] − H[X, Y] = H[X, Z] − H[X] = H[Z | X].
Lemma. Assuming that:
• X takes just one value with probability 1
Then H[X] = 0.
Proof. X and X are independent. Therefore, by Lemma 1.1, H[X, X] = 2H[X]. But by invariance,
H[X, X] = H[X]. So H[X] = 0.
Proposition. Assuming that:
• X is uniformly distributed on a set of size 2^n
Then H[X] = n.
Proof. Let X1 , . . . , Xn be independent random variables uniformly distributed on {0, 1}. By Corol-
lary 1.2 and normalisation, H[X1 , . . . , Xn ] = n. But (X1 , . . . , Xn ) is uniformly distributed on {0, 1}n ,
so by invariance, the result follows.
Lecture 2
Reminder: log here is to the base 2 (which is the convention for this course).
Theorem (Shannon entropy formula). Let X be a random variable taking values in a finite set A, and let p_a = P[X = a]. Then
H[X] = Σ_{a∈A} p_a log(1/p_a).
Proof. First we do the case where all pa are rational (and then can finish easily by the continuity
axiom).
Pick n ∈ N such that for all a, there is some m_a ∈ N ∪ {0} such that p_a = m_a/n.
Let Z be uniform on [n]. Let (Ea : a ∈ A) be a partition of [n] into sets with |Ea | = ma . By invariance
we may assume that X = a ⇐⇒ Z ∈ Ea . Then
log n = H[Z]
      = H[Z, X]
      = H[X] + H[Z | X]
      = H[X] + Σ_{a∈A} p_a H[Z | X = a]
      = H[X] + Σ_{a∈A} p_a log(m_a)
      = H[X] + Σ_{a∈A} p_a (log p_a + log n)
Hence
H[X] = −Σ_{a∈A} p_a log p_a = Σ_{a∈A} p_a log(1/p_a).
By continuity, since this holds if all pa are rational, we conclude that the formula holds in general.
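The formula just proved is easy to check numerically. The following short Python sketch (my own illustration, not part of the notes) computes Σ_a p_a log(1/p_a) and confirms the normalisation H = n for the uniform distribution on a set of size 2^n.

```python
# A minimal numerical check of H[X] = sum_a p_a log2(1/p_a):
# the uniform distribution on a set of size 2^n should have entropy n.
from math import log2

def entropy(p):
    """Shannon entropy (base 2) of a probability vector p."""
    return sum(x * log2(1 / x) for x in p if x > 0)

n = 5
uniform = [1 / 2**n] * 2**n
print(entropy(uniform))            # 5.0
print(entropy([0.5, 0.25, 0.25]))  # 1.5
```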
Lemma. Assuming that:
• Y = f(X)
Then H[Y] ≤ H[X].
Proof. H[X] = H[X, Y ] = H[Y ] + H[X | Y ]. But H[X | Y ] ≥ 0.
Next, observe that H[X | Y ] ≤ H[X] if X is uniform on a finite set. That is because
H[X | Y] = Σ_y P[Y = y] H[X | Y = y]
         ≤ Σ_y P[Y = y] H[X]   (by maximality)
         = H[X]
By the equivalence noted above, we also have that H[X | Y ] ≤ H[X] if Y is uniform.
Now let p_{ab} = P[(X, Y) = (a, b)] and assume that all p_{ab} are rational. Pick n such that we can write
p_{ab} = m_{ab}/n with each m_{ab} an integer. Partition [n] into sets E_{ab} of size m_{ab}. Let Z be uniform on [n].
Without loss of generality (by invariance) (X, Y) = (a, b) ⟺ Z ∈ E_{ab}.
Then
Lemma. For any discrete random variable X, H[X] ≥ 0.

Lemma (Submodularity). For any random variables X, Y, Z,
H[X | Y, Z] ≤ H[X | Z].

Proof. Calculate:
H[X | Y, Z] = Σ_z P[Z = z] H[X | Y, Z = z]
            ≤ Σ_z P[Z = z] H[X | Z = z]
            = H[X | Z]
Lecture 3
Lemma. Assuming that:
• X, Y, Z are random variables
• Z = f(Y)
Then
H[X | Y ] ≤ H[X | Z].
Proof. Since Z = f(Y), invariance and additivity give H[X | Y] = H[X | Y, Z], which by submodularity is at most H[X | Z].
Lemma 1.16. Assuming that:
• X, Y, Z random variables
• Z = f (X) = g(Y )
Then
H[X, Y ] + H[Z] ≤ H[X] + H[Y ].
Lemma 1.17. Assuming that:
• X takes values in a finite set A
• Y is uniformly distributed on A
• H[X] = H[Y]
Then X is uniform (that is, X is uniformly distributed on A).
The function x ↦ x log(1/x) is concave on [0, 1]. So, by Jensen’s inequality, this is at most
|A| (E_a p_a) log(1/(E_a p_a)) = log |A| = H[Y].
Corollary 1.18. Assuming that:
• X, Y random variables
• H[X, Y ] = H[X] + H[Y ]
Then X and Y are independent.
Proof. We go through the proof of Subadditivity and check when equality holds.
In the case where X is uniform on a finite set A, equality in
H[X | Y] = Σ_y P[Y = y] H[X | Y = y] ≤ H[X]
holds if and only if the distribution of X given Y = y is uniform on A for all y (by Lemma 1.17), which implies
that X and Y are independent.
where W was uniform. So equality holds only if X and W are independent, which implies (since Y
depends on W) that X and Y are independent.
Definition (Mutual information). Let X and Y be random variables. The mutual information
I[X : Y] is
H[X] + H[Y] − H[X, Y].
Subadditivity is equivalent to the statement that I[X : Y ] ≥ 0 and Corollary 1.18 implies that I[X :
Y ] = 0 if and only if X and Y are independent.
Note that
H[X, Y ] = H[X] + H[Y ] − I[X : Y ].
Definition (Conditional mutual information). Let X, Y and Z be random variables. The
conditional mutual information of X and Y given Z, denoted by I[X : Y |Z] is
Σ_z P[Z = z] I[X | Z = z : Y | Z = z]
 = Σ_z P[Z = z] (H[X | Z = z] + H[Y | Z = z] − H[X, Y | Z = z])
 = H[X | Z] + H[Y | Z] − H[X, Y | Z]
 = H[X, Z] + H[Y, Z] − H[X, Y, Z] − H[Z]
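Since these identities are used repeatedly below, here is a small Python check (my own illustration; the variable ranges are arbitrary) that computes I[X : Y] and I[X : Y | Z] from a random joint distribution via the formulas above.

```python
# Sanity check of I[X:Y] = H[X] + H[Y] - H[X,Y] >= 0 (subadditivity) and
# I[X:Y|Z] = H[X,Z] + H[Y,Z] - H[X,Y,Z] - H[Z] on a random joint distribution.
import random
from math import log2

def H(pmf):
    """Entropy of a dict mapping outcomes to probabilities."""
    return sum(p * log2(1 / p) for p in pmf.values() if p > 0)

def marginal(joint, coords):
    out = {}
    for xyz, p in joint.items():
        key = tuple(xyz[i] for i in coords)
        out[key] = out.get(key, 0.0) + p
    return out

random.seed(0)
outcomes = [(x, y, z) for x in range(2) for y in range(3) for z in range(2)]
weights = [random.random() for _ in outcomes]
total = sum(weights)
joint = {o: w / total for o, w in zip(outcomes, weights)}

HX, HY, HZ = (H(marginal(joint, c)) for c in ([0], [1], [2]))
HXY, HXZ, HYZ = (H(marginal(joint, c)) for c in ([0, 1], [0, 2], [1, 2]))
HXYZ = H(joint)

print("I[X:Y]   =", HX + HY - HXY)          # nonnegative
print("I[X:Y|Z] =", HXZ + HYZ - HXYZ - HZ)  # nonnegative
```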
2 A special case of Sidorenko’s conjecture
Let G be a bipartite graph with (finite) vertex sets X and Y and density α (defined to be |E(G)|/(|X||Y|)). Let
H be another bipartite graph (think of it as ‘small’) with vertex sets U and V and m edges.
Sidorenko’s conjecture asserts that if φ : U → X and ψ : V → Y are independent uniformly random maps, then
P[(φ, ψ) is a homomorphism] ≥ α^m.
Not hard to prove when H is K_{r,s}. Also not hard to prove when H is K_{2,2} (use Cauchy–Schwarz).
Proof. We want to show that if G is a bipartite graph of density α with vertex sets X, Y of sizes m and
n, and we choose x1, x2 ∈ X and y1, y2 ∈ Y independently and uniformly at random, then
P[x1y1, x2y1, x2y2 ∈ E(G)] ≥ α^3.
It would be enough to let P be a P3 chosen uniformly at random and show that H[P ] ≥ log(α3 m2 n2 ).
Instead we shall define a different random variable taking values in the set of all P3s (and then apply
maximality).
Specifically, we shall find (X1, Y1, X2, Y2), always forming a P3, with
H[X1, Y1, X2, Y2] ≥ log(α^3 m^2 n^2).
Lecture 4
Therefore, by maximality,
log(#P3s) ≥ H[X1, Y1, X2, Y2] ≥ log(α^3 m^2 n^2),
and since α = |E(G)|/(|X||Y|) this rearranges to
#P3s × |X| × |Y| ≥ |E(G)|^3.
3 Brégman’s Theorem
Let G be a bipartite graph with vertex sets X, Y of size n. Given (x, y) ∈ X × Y, let
A_{xy} = 1 if xy ∈ E(G), and A_{xy} = 0 if xy ∉ E(G).
Brégman’s theorem concerns how large per(A) can be if A is a 01-matrix and the sum of entries in the
i-th row is d_i.

Theorem (Brégman). If A is an n × n 01-matrix whose i-th row sums to d_i, then per(A) ≤ ∏_{i=1}^{n} (d_i!)^{1/d_i}.
Proof (Radhakrishnan). Each matching corresponds to a bijection σ : X → Y such that xσ(x) ∈ E(G)
for every x. Let σ be chosen uniformly from all such bijections.
Then
H[σ(x1)] ≤ log d(x1),
H[σ(x2) | σ(x1)] ≤ E_σ log d^σ_{x1}(x2),
where
d^σ_{x1}(x2) = |N(x2) \ {σ(x1)}|.
In general,
H[σ(x_i) | σ(x1), . . . , σ(x_{i−1})] ≤ E_σ log d^σ_{x1,...,x_{i−1}}(x_i),
where
d^σ_{x1,...,x_{i−1}}(x_i) = |N(x_i) \ {σ(x1), . . . , σ(x_{i−1})}|.
Lecture 5
Key idea: we now regard x1 , . . . , xn as a random enumeration of X and take the average.
Then one of the y_j will be σ(x), say y_h. Note that d^σ_{x1,...,x_{i−1}}(x_i) (given that x_i = x) is
Definition (1-factor). Let G be a graph with 2n vertices. A 1-factor in G is a collection of n
disjoint edges.
Proof (Alon, Friedman). Let M be the set of 1-factors of G, and let (M1, M2) be a uniform random
element of M^2. For each (M1, M2), the union M1 ∪ M2 is a collection of disjoint edges and even cycles
that covers all the vertices of G.
If we are given such a cover, then the number of pairs (M1 , M2 ) that could give rise to it is 2k , where
k is the number of even cycles.
Now let’s build a bipartite graph G2 out of G. G2 has two vertex sets (call them V1 , V2 ), both copies
of V (G). Join x ∈ V1 to y ∈ V2 if and only if xy ∈ E(G).
By Brégman, the number of perfect matchings in G2 is at most ∏_{x∈V(G)} (d(x)!)^{1/d(x)}. Each matching gives a
permutation σ of V(G) such that xσ(x) ∈ E(G) for every x.
Each such σ has a cycle decomposition, and each cycle gives a cycle in G. So σ gives a cover of V (G)
by isolated vertices, edges and cycles.
Given such a cover with k cycles, each cycle can be directed in two ways, so the number of σ that give
rise to it is 2^k, where k is the number of cycles.
So there is an injection from M^2 to the set of matchings of G2, since every cover by edges and even
cycles is a cover by vertices, edges and cycles.
So
|M|^2 ≤ ∏_{x∈V(G)} (d(x)!)^{1/d(x)}.
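A quick numerical sanity check (my own, not from the notes) of Brégman’s bound used above: compute the permanent of a small random 01-matrix by brute force and compare it with ∏_i (d_i!)^{1/d_i}.

```python
# Brute-force permanent of a random 0/1 matrix vs the Bregman bound.
from itertools import permutations
from math import factorial, prod
import random

random.seed(2)
n = 6
A = [[1 if random.random() < 0.5 else 0 for _ in range(n)] for _ in range(n)]

perm = sum(prod(A[i][sigma[i]] for i in range(n)) for sigma in permutations(range(n)))
# If some row is all zeros the permanent is 0, so the bound holds trivially.
bound = prod(factorial(sum(row)) ** (1 / sum(row)) for row in A if sum(row) > 0)

print("per(A) =", perm, "  Bregman bound =", bound)
assert perm <= bound + 1e-9
```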
4 Shearer’s lemma and applications
Lemma (Shearer). Let X = (X_1, . . . , X_n) be a random variable and let A be a family of subsets of [n] such that every i ∈ [n] belongs to at least r of the sets A ∈ A. Then
Σ_{A∈A} H[X_A] ≥ r H[X],
where X_A denotes (X_a : a ∈ A).

For each A ∈ A, the chain rule and submodularity give H[X_A] ≥ Σ_{a∈A} H[X_a | X_{<a}]. Therefore,
Σ_{A∈A} H[X_A] ≥ Σ_{A∈A} Σ_{a∈A} H[X_a | X_{<a}]
             ≥ r Σ_{a=1}^{n} H[X_a | X_{<a}]
             = r H[X]
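Here is a small numerical check of Shearer’s lemma (my own illustration) in the simplest covering case, where A consists of the three 2-element subsets of [3] and r = 2.

```python
# Every coordinate of (X1, X2, X3) is covered twice by {12, 13, 23}, so
# H[X1,X2] + H[X1,X3] + H[X2,X3] >= 2 H[X1,X2,X3].
import random
from math import log2

def H(pmf):
    return sum(p * log2(1 / p) for p in pmf.values() if p > 0)

def marginal(joint, coords):
    out = {}
    for xyz, p in joint.items():
        key = tuple(xyz[i] for i in coords)
        out[key] = out.get(key, 0.0) + p
    return out

random.seed(8)
outcomes = [(a, b, c) for a in range(2) for b in range(2) for c in range(2)]
w = [random.random() for _ in outcomes]
joint = {o: x / sum(w) for o, x in zip(outcomes, w)}

lhs = sum(H(marginal(joint, c)) for c in ([0, 1], [0, 2], [1, 2]))
print(lhs, ">=", 2 * H(joint))
assert lhs >= 2 * H(joint) - 1e-9
```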
Lecture 6
Alternative version:
Lemma (Shearer, expectation version). Assuming that:
• X = (X_1, . . . , X_n) is a random variable
• A is a random subset of [n] such that P[a ∈ A] ≥ μ for every a ∈ [n]
Then
H[X] ≤ μ^{−1} E_A H[X_A].

Proof. As before,
H[X_A] ≥ Σ_{a∈A} H[X_a | X_{<a}].
So
E_A H[X_A] ≥ E_A Σ_{a∈A} H[X_a | X_{<a}]
           ≥ μ Σ_{a=1}^{n} H[X_a | X_{<a}]
           = μ H[X]
Definition (P_A). Let E ⊂ Z^n and let A ⊂ [n]. Then we write P_A E for the set of all u ∈ Z^A
such that there exists v ∈ Z^{[n]\A} with [u, v] ∈ E, where [u, v] is u and v suitably interleaved
(i.e. u ∪ v as functions).
This case is the discrete Loomis-Whitney theorem.
Corollary. A graph G with m edges contains at most approximately (2m)^{3/2}/6 triangles.
Proof. Let (X1 , X2 , X3 ) be a random ordered triangle (without loss of generality G has a triangle so
that this is possible).
By Shearer,
H[X1, X2, X3] ≤ (1/2)(H[X1, X2] + H[X1, X3] + H[X2, X3]) ≤ (3/2) · log(2m).
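The triangle bound can be checked directly on small graphs. The following sketch (my own illustration, with an arbitrary random graph) counts triangles by brute force and compares with (2m)^{3/2}/6.

```python
# Triangle count of a random graph vs the bound (2m)^(3/2)/6.
from itertools import combinations
import random

random.seed(3)
V = range(12)
E = {frozenset(e) for e in combinations(V, 2) if random.random() < 0.4}

triangles = sum(1 for a, b, c in combinations(V, 3)
                if {frozenset((a, b)), frozenset((a, c)), frozenset((b, c))} <= E)

m = len(E)
print(triangles, "<=", (2 * m) ** 1.5 / 6)
assert triangles <= (2 * m) ** 1.5 / 6
```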
Definition. Let V be a set of size n and let G be a set of graphs with vertex set V. Then
G is ∆-intersecting (read as “triangle-intersecting”) if for all G1, G2 ∈ G, G1 ∩ G2 contains a
triangle.
Proof. Let X be chosen uniformly at random from G. We write V^{(2)} for the set of (unordered) pairs
of elements of V. Think of any G ∈ G as a function from V^{(2)} to {0, 1}. So X = (X_e : e ∈ V^{(2)}).
For each R, we shall look at the projection X_{G_R}, which we can think of as taking values in the set
{G ∩ G_R : G ∈ G} =: G_R.
Thus G_R is an intersecting family, so it has size at most 2^{|E(G_R)|−1}. By Shearer (expectation version),
Lecture 7
Proof. By the discrete Loomis–Whitney inequality,
|A| ≤ ∏_{i=1}^{n} |P_{[n]\{i}} A|^{1/(n−1)}
    = ( ∏_{i=1}^{n} |P_{[n]\{i}} A|^{1/n} )^{n/(n−1)}
    ≤ ( (1/n) Σ_{i=1}^{n} |P_{[n]\{i}} A| )^{n/(n−1)}
(by the AM–GM inequality). Since each line in direction i that meets A contributes at least two edges to ∂_i A, we have |P_{[n]\{i}} A| ≤ |∂_i A|/2. So
|A| ≤ ( (1/(2n)) Σ_{i=1}^{n} |∂_i A| )^{n/(n−1)} = ( |∂A|/(2n) )^{n/(n−1)}.
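For concreteness, here is a brute-force check (my own illustration) of the discrete Loomis–Whitney inequality used above, in the case n = 3: |A|^2 ≤ |P_{12}A| · |P_{13}A| · |P_{23}A|.

```python
# Discrete Loomis-Whitney in three dimensions for a random finite set A.
import random

random.seed(4)
A = {(random.randrange(5), random.randrange(5), random.randrange(5)) for _ in range(40)}

P12 = {(x, y) for x, y, z in A}
P13 = {(x, z) for x, y, z in A}
P23 = {(y, z) for x, y, z in A}

print(len(A) ** 2, "<=", len(P12) * len(P13) * len(P23))
assert len(A) ** 2 <= len(P12) * len(P13) * len(P23)
```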
Proof. Let X be a uniform random element of A and write X = (X1 , . . . , Xn ). Write X\i for
(X1 , . . . , Xi−1 , Xi+1 , . . . , Xn ). By Shearer,
H[X] ≤ (1/(n−1)) Σ_{i=1}^{n} H[X_{\i}]
     = (n/(n−1)) H[X] − (1/(n−1)) Σ_{i=1}^{n} H[X_i | X_{\i}].
Hence
Σ_{i=1}^{n} H[X_i | X_{\i}] ≤ H[X].
Note that
H[X_i | X_{\i} = u] = 1 if |P_{[n]\{i}}^{−1}(u)| = 2, and H[X_i | X_{\i} = u] = 0 if |P_{[n]\{i}}^{−1}(u)| = 1.
The number of points of the second kind is |∂_i A|, so H[X_i | X_{\i}] = 1 − |∂_i A|/|A|.
So
H[X] ≥ Σ_{i=1}^{n} (1 − |∂_i A|/|A|) = n − |∂A|/|A|.
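Since H[X] = log |A| (X being uniform on A), the conclusion rearranges to |∂A| ≥ |A|(n − log |A|). The following sketch (my own illustration with a random subset of {0,1}^n) checks this numerically.

```python
# Edge-isoperimetric check in the cube: |dA| >= |A| (n - log2 |A|),
# where dA is the set of cube edges with exactly one endpoint in A.
import random
from math import log2
from itertools import product

random.seed(5)
n = 8
A = {x for x in product((0, 1), repeat=n) if random.random() < 0.3}

def flip(x, i):
    return x[:i] + (1 - x[i],) + x[i + 1:]

boundary = sum(1 for x in A for i in range(n) if flip(x, i) not in A)

print(boundary, ">=", len(A) * (n - log2(len(A))))
assert boundary >= len(A) * (n - log2(len(A))) - 1e-9
```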
Definition (Lower shadow). Let A be a family of sets of size d. The lower shadow ∂A is
{B : |B| = d − 1, ∃A ∈ A, B ⊂ A}.
Let T be chosen independently of X_1, . . . , X_{k−1}, with T = 0 with probability p and T = 1 with probability 1 − p.
(2^s/(2^s + 1))(log(2^s + 1) − log 2^s) + log(2^s + 1)/(2^s + 1) + s·2^s/(2^s + 1) = log(2^s + 1).
This proves the claim.
so r + d − 1 ≤ t, i.e. r ≤ t + 1 − d.
It follows that
H[X_1, . . . , X_{d−1}] = log( d! \binom{t}{d} ) − log r
                       ≥ log( d! · t!/(d!(t − d)!(t + 1 − d)) )
                       = log( (d − 1)! \binom{t}{d−1} ).
5 The union-closed conjecture
Definition (Union-closed). Let A be a (finite) family of sets. Say that A is union closed if for
any A, B ∈ A, we have A ∪ B ∈ A.
Theorem (Justin Gilmer). There exists c > 0 such that if A is a union-closed family, then
there exists x that belongs to at least c|A| of the sets in A.
His method has a “natural barrier” of (3 − √5)/2.
A reason for this is that if we weaken the property union-closed to “almost union-closed” (if we pick
two elements randomly, then with high probability the union is in the family), then (3 − √5)/2 is the right
bound.
Let A = [n]^{(pn)} ∪ [n]^{(≥(2p−p^2−o(1))n)}. With high probability, if A, B are random elements of [n]^{(pn)},
then A ∪ B has size at least (2p − p^2 − o(1))n, so A ∪ B ∈ A.
One of the roots of the quadratic 1 − 3p + p^2 = 0 is p = (3 − √5)/2.
If we want to prove Justin Gilmer’s Theorem, it is natural to let A, B be independent uniformly random
elements of A and to consider H[A ∪ B]. Since A is union-closed, A ∪ B ∈ A, so H[A ∪ B] ≤ log |A|.
Now we would like to get a lower bound for H[A ∪ B] assuming that no x belongs to more than p|A|
sets in A.
h(xy) ≥ c(x h(y) + y h(x)),    h(x^2) ≥ 2c x h(x).
Lecture 9
Proof. Think of A, B as characteristic functions. Write A<k for (A1 , . . . , Ak−1 ) etc. By the Chain rule
it is enough to prove for every k that
H[(A ∪ B)k | (A ∪ B)<k ] > c(1 − p)(H[Ak | A<k ] + H[Bk | B<k ]).
By Submodularity,
H[(A ∪ B)k | (A ∪ B)<k ] ≥ H[(A ∪ B)k | A<k , B<k ].
For each u, v ∈ {0, 1}k−1 write p(u) = P(Ak = 0 | A<k = u), q(v) = P(Bk = 0 | B<k = v).
Then
H[(A ∪ B)k | A<k = u, B<k = v] = h(p(u)q(v))
which by hypothesis is at least
c(p(u)h(q(v)) + q(v)h(p(u))).
So
H[(A ∪ B)_k | (A ∪ B)_{<k}] ≥ c Σ_{u,v} P(A_{<k} = u) P(B_{<k} = v) (p(u) h(q(v)) + q(v) h(p(u))).
But
Σ_u P(A_{<k} = u) P(A_k = 0 | A_{<k} = u) = P(A_k = 0)
and
Σ_v P(B_{<k} = v) h(q(v)) = Σ_v P(B_{<k} = v) H[B_k | B_{<k} = v] = H[B_k | B_{<k}],
as required.
We shall obtain 1/(√5 − 1).
We start by proving the diagonal case – i.e. when x = y.
h(x^2) ≥ φ x h(x).
Proof. Write ψ for φ^{−1} = (√5 − 1)/2. Then ψ^2 = 1 − ψ, so h(ψ^2) = h(1 − ψ) = h(ψ), and φψ = 1, so
Toolkit:
(ln 2) h(x) = −x ln x − (1 − x) ln(1 − x)
(ln 2) h′(x) = −ln x − 1 + ln(1 − x) + 1 = ln(1 − x) − ln x
(ln 2) h″(x) = −1/x − 1/(1 − x)
(ln 2) h‴(x) = 1/x^2 − 1/(1 − x)^2
So
(ln 2) f‴(x) = −12x/(x^2(1 − x^2)) + 8x^3(1 − 2x^2)/(x^4(1 − x^2)^2) + 3φ/(x(1 − x)) − φx(1 − 2x)/(x^2(1 − x)^2)
             = −12/(x(1 − x^2)) + 8(1 − 2x^2)/(x(1 − x^2)^2) + 3φ/(x(1 − x)) − φ(1 − 2x)/(x(1 − x)^2)
             = (−12(1 − x^2) + 8(1 − 2x^2) + 3φ(1 − x)(1 + x)^2 − φ(1 − 2x)(1 + x)^2) / (x(1 − x)^2(1 + x)^2)
This is zero if and only if
−12(1 − x^2) + 8(1 − 2x^2) + 3φ(1 − x)(1 + x)^2 − φ(1 − 2x)(1 + x)^2 = 0,
which simplifies to
−φx^3 − 4x^2 + 3φx − 4 + 2φ = 0.
Lecture 10
Since this is a cubic with negative leading coefficient and constant term, it has a negative root, so it
has at most two roots in (0, 1). It follows (using Rolle’s theorem) that f has at most five roots in [0, 1],
up to multiplicity.
But
f′(x) = 2x(log(1 − x^2) − log x^2) + φ(x log x + (1 − x) log(1 − x)) − φx(log(1 − x) − log x).
If x is small,
f(x) = x^2 log(1/x^2) + (1 − x^2) log(1/(1 − x^2)) − φx( x log(1/x) + (1 − x) log(1/(1 − x)) )
     = 2x^2 log(1/x) − φx^2 log(1/x) + O(x^2),
so there exists x such that f(x) > 0.
Lemma 5.3. The function f(x, y) = h(xy)/(x h(y) + y h(x)) is minimised on (0, 1)^2 at a point where x = y.
So it tends to 1 again.
y* g′(x*y*) − α g′(x*) = 0
x* g′(x*y*) − α g′(y*) = 0
So x* g′(x*) = y* g′(y*). So it’s enough to prove that x ↦ x g′(x) is an injection. Here g(x) = h(x)/x, so
g′(x) = h′(x)/x − h(x)/x^2, and therefore
x g′(x) = h′(x) − h(x)/x
        = log(1 − x) − log x + (x log x + (1 − x) log(1 − x))/x
        = log(1 − x)/x.
Differentiating gives
−1/(x(1 − x)) − log(1 − x)/x^2 = (−x − (1 − x) log(1 − x))/(x^2(1 − x)).
The numerator differentiates to −1 + 1 + log(1 − x) = log(1 − x), which is negative everywhere on (0, 1). Also, it equals 0 at
x = 0. So it has a constant sign.
h(xy) ≥ (φ/2)(x h(y) + y h(x)).
This allows us to take 1 − 1/φ = 1 − (√5 − 1)/2 = (3 − √5)/2.
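The key analytic fact in this section is the inequality h(xy) ≥ (φ/2)(x h(y) + y h(x)). Below is a quick grid check (my own illustration, not a proof; equality is approached near x = y = 1/φ).

```python
# Grid check of h(xy) >= (phi/2)(x h(y) + y h(x)) on (0,1)^2.
from math import log2, sqrt

def h(t):
    return 0.0 if t in (0.0, 1.0) else -t * log2(t) - (1 - t) * log2(1 - t)

phi = (1 + sqrt(5)) / 2
worst = min(h(x * y) - (phi / 2) * (x * h(y) + y * h(x))
            for i in range(1, 200) for j in range(1, 200)
            for x in [i / 200] for y in [j / 200])
print("minimum slack on the grid:", worst)  # should be >= 0 (up to rounding)
```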
6 Entropy in additive combinatorics
We shall need two “simple” results from additive combinatorics due to Imre Ruzsa.
Definition (Sum set / difference set / etc). Let G be an abelian group and let A, B ⊂ G.
The sumset A + B is the set {x + y : x ∈ A, y ∈ B}.
The difference set A − B is the set {x − y : x ∈ A, y ∈ B}.
We write 2A for A + A, 3A for A + A + A, etc.
Definition (Ruzsa distance). The Ruzsa distance d(A, B) is
log( |A − B| / (|A|^{1/2} |B|^{1/2}) ).
Lemma (Ruzsa covering lemma). Assuming that:
• G an abelian group
• A, B finite subsets of G
Then A can be covered by at most |A + B|/|B| translates of B − B.

Proof. Let {x_1, . . . , x_k} be a maximal subset of A such that the sets x_i + B are disjoint.
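The greedy construction in this proof is easy to simulate. Here is a sketch (my own illustration, in the group Z_m with arbitrary small sets) that picks a maximal disjoint family of translates and checks both the bound k ≤ |A + B|/|B| and the covering of A by the sets x_i + (B − B).

```python
# Ruzsa covering lemma, simulated greedily in Z_m.
import random

random.seed(7)
m = 50
A = set(random.sample(range(m), 12))
B = set(random.sample(range(m), 8))

def translate(x, S):
    return {(x + s) % m for s in S}

X = []
covered = set()
for a in sorted(A):                      # greedy maximal disjoint family
    if translate(a, B).isdisjoint(covered):
        X.append(a)
        covered |= translate(a, B)

sumset = {(a + b) % m for a in A for b in B}
BmB = {(b1 - b2) % m for b1 in B for b2 in B}

assert len(X) <= len(sumset) / len(B)
assert all(any((a - x) % m in BmB for x in X) for a in A)
print("k =", len(X), "  |A+B|/|B| =", len(sumset) / len(B))
```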
Lecture 11
Let X, Y be discrete random variables taking values in an abelian group. What is the distribution of X + Y when X and
Y are independent?
For each z, P(X + Y = z) = Σ_{x+y=z} P(X = x) P(Y = y). Writing p_x and q_y for P(X = x) and
P(Y = y) respectively, this gives Σ_{x+y=z} p_x q_y = p ∗ q(z), where p(x) = p_x and q(y) = q_y.
Definition (Entropic Ruzsa distance). Let G be an abelian group and let X, Y be G-valued
random variables. The entropic Ruzsa distance d[X; Y ] is
d[X; Y] = H[X′ − Y′] − ½H[X] − ½H[Y],
where X′, Y′ are independent copies of X and Y.
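For intuition, the entropic Ruzsa distance is easy to compute for explicit distributions. The sketch below (my own illustration, for distributions on Z_m) evaluates d[X; Y] via the convolution formula above and illustrates that it is symmetric and nonnegative.

```python
# Entropic Ruzsa distance d[X;Y] = H[X'-Y'] - H[X]/2 - H[Y]/2 on Z_m.
import random
from math import log2

def H(p):
    return sum(t * log2(1 / t) for t in p if t > 0)

def diff_dist(p, q):
    """Distribution of X' - Y' (mod m) for independent X' ~ p, Y' ~ q."""
    m = len(p)
    r = [0.0] * m
    for x in range(m):
        for y in range(m):
            r[(x - y) % m] += p[x] * q[y]
    return r

def d(p, q):
    return H(diff_dist(p, q)) - H(p) / 2 - H(q) / 2

random.seed(6)
m = 7
def rand_dist():
    w = [random.random() for _ in range(m)]
    s = sum(w)
    return [t / s for t in w]

p, q = rand_dist(), rand_dist()
print(d(p, q), d(q, p))   # equal (X-Y and Y-X have the same entropy), and >= 0
```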
Proof.
H[X + Y ] ≥ H[X + Y | Y ] (by Subadditivity)
= H[X + Y, Y ] − H[Y ]
= H[X, Y ] − H[Y ]
= H[X] + H[Y ] − H[Y ] − I[X : Y ]
= H[X] − I[X : Y ]
By symmetry we also have
H[X + Y ] ≥ H[Y ] − I[X : Y ].
Proof.
⇒ Suppose that X, Y are independent and H[X − Y] = ½(H[X] + H[Y]).
From the first line of the proof of Lemma 6.4, it follows that H[X − Y | Y ] = H[X − Y ]. Therefore,
X − Y and Y are independent. So for every z ∈ A − B and every y1 , y2 ∈ B,
P(X − Y = z | Y = y1 ) = P(X − Y = z | Y = y2 )
where A = {x : p_x ≠ 0}, B = {y : q_y ≠ 0}, i.e. for all y1, y2 ∈ B,
P(X = y1 + z) = P(X = y2 + z).
So p_x is constant on z + B.
In particular, A ⊃ z + B.
By symmetry, B ⊃ A − z.
So A = B + z for any z ∈ A − B. So for every x ∈ A, y ∈ B, A = B + x − y, so A − x = B − y. So
A − x is the same for every x ∈ A. Therefore, A − x = A − A for every x ∈ A.
It follows that
A − A + A − A = (A − x) − (A − x) = A − A.
So A − A is a subgroup. Also, A = A − A + c, so A is a coset of A − A. B = A + z, so B is also a
coset of A − A.
Proof. We must show (assuming without loss of generality that X, Y and Z are independent) that
H[X − Z] − ½H[X] − ½H[Z] ≤ H[X − Y] − ½H[X] − ½H[Y] + H[Y − Z] − ½H[Y] − ½H[Z],
i.e. that
H[X − Z] + H[Y ] ≤ H[X − Y ] + H[Y − Z]. (∗)
Since X − Z is a function of (X − Y, Y − Z) and is also a function of (X, Z), we get using Lemma 1.16
that
H[X − Y, Y − Z, X, Z] + H[X − Z] ≤ H[X − Y, Y − Z] + H[X, Z].
This is the same as
H[X, Y, Z] + H[X − Z] ≤ H[X, Z] + H[X − Y, Y − Z].
By independence, cancelling common terms and Subadditivity, we get (∗).
Lemma 6.8 (Submodularity for sums). Assuming that:
• X, Y , Z are independent G-valued random variables
Then
H[X + Y + Z] + H[Z] ≤ H[X + Z] + H[Y + Z].
Proof. X + Y + Z is a function of (X + Z, Y) and also a function of (X, Y + Z), so by Lemma 1.16,
H[X + Z, Y, X, Y + Z] + H[X + Y + Z] ≤ H[X + Z, Y] + H[X, Y + Z].
By invariance the first term on the left is H[X, Y, Z], and by Subadditivity the right-hand side is at most H[X + Z] + H[Y] + H[X] + H[Y + Z]. Hence
H[X, Y, Z] + H[X + Y + Z] ≤ H[X + Z] + H[Y ] + H[X] + H[Y + Z].
By independence and cancellation, we get the desired inequality.
Lecture 12
Proof.
d[X; −Y ] ≤ d[X; Y ] + d[Y ; −Y ]
≤ d[X; Y ] + 2d[Y ; Y ]
≤ d[X; Y ] + 2(d[Y ; X] + d[X; Y ])
= 5d[X; Y ]
Conditional Distances
Definition (Conditional distance). Let X, Y, U, V be G-valued random variables (in fact, U and
V don’t have to be G-valued for the definition to make sense). Then the conditional distance is
d[X | U; Y | V] = Σ_{u,v} P[U = u] P[V = v] d[X | U = u; Y | V = v].
• X′ is distributed like X.
• Y′ is distributed like Y.
• For each u ∈ U, X′ | U = u is distributed like X | U = u,
Remark. The last few terms look like 2d[A; −B]. But they aren’t equal to it, because A and
B aren’t (necessarily) independent!
Proof.
d[A; B ‖ A + B] = H[A′ − B′ | A + B] − ½H[A′ | A + B] − ½H[B′ | A + B],
where A′, B′ are conditionally independent trials of A, B given A + B. Now calculate
H[A′ | A + B] = H[A | A + B]
             = H[A, A + B] − H[A + B]
             = H[A, B] − H[A + B]
             = H[A] + H[B] − I[A : B] − H[A + B]
H[A′ − B′ | A + B] ≤ H[A′ − B′].
Let (A1, B1) and (A2, B2) be conditionally independent trials of (A, B) given A + B. Then H[A′ − B′] =
H[A1 − B2]. By Submodularity,
Finally,
H[A1 − B2, A1, B1] = H[A1, B1, A2, B2]
= H[A1, B1, A2, B2 | A + B] + H[A + B]
= 2H[A, B | A + B] + H[A + B]   (by conditional independence of (A1, B1) and (A2, B2) given A + B)
= 2H[A, B] − H[A + B]
= 2H[A] + 2H[B] − 2I[A : B] − H[A + B]
Adding or subtracting as appropriate all these terms gives the required inequality.
Lecture 13
7 A proof of Marton’s conjecture in F_2^n
Theorem 7.1 (Green, Manners, Tao, Gowers). There is a polynomial p with the following
property: if n ∈ N and A ⊂ F_2^n is such that |A + A| ≤ C|A|, then there is a subspace H ⊂ F_2^n
of size at most |A| such that A is contained in the union of at most p(C) translates of H.
(Equivalently, there exists K ⊂ F_2^n, |K| ≤ p(C), such that A ⊂ K + H.)
contradiction.
Proof. Let A ⊂ Fn2 , |A + A| ≤ C|A|. Let X and Y be independent copies of UA . Then by Theorem 7.2,
there exists H (a subgroup) such that
so
d[X; U_H] ≤ (α/2) d[X; Y].
But
d[X; Y] = H[U_A − U_A′] − H[U_A]
        = H[U_A + U_A′] − H[U_A]   (characteristic 2)
        ≤ log(C|A|) − log |A|
        = log C.
So d[X; U_H] ≤ (α log C)/2. Therefore
H[X + U_H] ≤ ½H[X] + ½H[U_H] + (α log C)/2
           = ½ log |A| + ½ log |H| + (α log C)/2.
Therefore, by Lemma 7.3, there exists z such that
P(X + U_H = z) ≥ |A|^{−1/2} |H|^{−1/2} C^{−α/2}.
But
P(X + U_H = z) = |A ∩ (z − H)| / (|A||H|) = |A ∩ (z + H)| / (|A||H|)
(using characteristic 2). So there exists z ∈ G such that
|A ∩ (z + H)| ≥ C^{−α/2} |A|^{1/2} |H|^{1/2}.
Let B = A ∩ (z + H). By the Ruzsa covering lemma, we can cover A by at most |A + B|/|B| translates of
B + B. But B ⊂ z + H, so B + B ⊂ H + H = H, so A can be covered by at most |A + B|/|B| translates of
H.
But using B ⊂ A,
|A + B| ≤ |A + A| ≤ C|A|.
So
|A + B|/|B| ≤ C|A| / (C^{−α/2} |A|^{1/2} |H|^{1/2}) = C^{α/2 + 1} |A|^{1/2} / |H|^{1/2}.
Since B is contained in z + H,
|H| ≥ C^{−α/2} |A|^{1/2} |H|^{1/2},
so |H| ≥ C^{−α} |A|, so
C^{α/2 + 1} |A|^{1/2} / |H|^{1/2} ≤ C^{α+1}.
If |H| ≤ |A| then we are done. Otherwise, since B ⊂ A,
|A| ≥ C^{−α/2} |A|^{1/2} |H|^{1/2},
so |H| ≤ C^α |A|.
Now pick a subgroup H′ ≤ H with |A|/2 < |H′| ≤ |A|. Each translate of H is a union of at most
|H|/|H′| ≤ 2C^α translates of H′, so A is a union of at most 2C^{2α+1} translates of H′.
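The step via Lemma 7.3 used in the proof above appears to be the standard fact that some atom of a distribution has probability at least 2^{−H}. A one-line numerical check of that fact (my own illustration):

```python
# max_z P(Z = z) >= 2^{-H[Z]} for any finite distribution.
import random
from math import log2

random.seed(9)
w = [random.random() for _ in range(10)]
p = [x / sum(w) for x in w]
Hp = sum(t * log2(1 / t) for t in p if t > 0)
print(max(p), ">=", 2 ** -Hp)
assert max(p) >= 2 ** -Hp
```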
Now we reduce further. We shall prove the following statement:
Theorem 7.5 (EPFR′). There is a constant η > 0 such that if X and Y are any two F_2^n-valued
random variables with d[X; Y] > 0, then there exist F_2^n-valued random variables U and V such
that
d[U; V] + η(d[U; X] + d[V; Y]) < d[X; Y].
Lecture 14
is minimised. If d[U; V] ≠ 0 then by EPFR′(η) there exist Z, W such that τ_{U,V}[Z; W] < d[U; V].
But then
Contradiction.
It follows that d[U ; V ] = 0. So there exists H such that U and V are uniform on cosets of H, so
Remark. If we can prove EPFR′ for conditional random variables, then by averaging we get
it for some pair of random variables (e.g. of the form U | Z = z and V | W = w).
d[X; Y ] = d[φ(X); φ(Y )] + d[X|φ(X); Y |φ(Y )] + I[X − Y : φ(X), φ(Y ) | φ(X) − φ(Y )].
Proof.
d[X; Y] = H[X − Y] − ½H[X] − ½H[Y]
        = H[φ(X) − φ(Y)] + H[X − Y | φ(X) − φ(Y)] − ½H[φ(X)] − ½H[X | φ(X)] − ½H[φ(Y)] − ½H[Y | φ(Y)]
        = d[φ(X); φ(Y)] + d[X | φ(X); Y | φ(Y)] + H[X − Y | φ(X) − φ(Y)] − H[X − Y | φ(X), φ(Y)].
But the last line of this expression equals
H[X − Y | φ(X) − φ(Y)] − H[X − Y | φ(X), φ(Y), φ(X) − φ(Y)] = I[X − Y : φ(X), φ(Y) | φ(X) − φ(Y)].
Then
We shall now set W = X1 + X2 + X3 + X4 .
Equivalently,
I[X : Y] ≥ (1/3)(d[X; Y ‖ X + Y] + H[X] + H[Y] − 2H[X + Y]).
Applying this to the information term (∗), we get that it is at least
(1/3)(d[X1 + X3, X2 + X4; X1 + X2, X3 + X4 ‖ X2 + X3, W] + H[X1 + X3, X2 + X4 | W]
     + H[X1 + X2, X3 + X4 | W] − 2H[X2 + X3, X2 + X3 | W])
which simplifies to
(1/3)(d[X1 + X3, X2 + X4; X1 + X2, X3 + X4 ‖ X2 + X3, W] + H[X1 + X3 | W]
     + H[X1 + X2 | W] − 2H[X2 + X3 | W])
where we made heavy use of the observation that if i, j, k, l are some permutation of 1, 2, 3, 4, then
H[Xi + Xj | W ] = H[Xk + Xl | W ].
d[X1 + X2, X3 + X4; X1 + X3, X2 + X4 ‖ X2 + X3, W]
by
d[X1 + X2; X1 + X3 ‖ X2 + X3, W].
Therefore, we get the following inequality:
Lemma 7.9.
Proof. Above.
Now let X1 , X2 be copies of X and Y1 , Y2 copies of Y and apply Lemma 7.9 to (X1 , X2 , Y1 , Y2 ) (all
independent), to get this.
6d[X; Y]
≥ 2d[X1 + X2; Y1 + Y2] + d[X1 + Y2; X2 + Y1]
+ 2d[X1 | X1 + X2; Y1 | Y1 + Y2] + d[X1 | X1 + Y1; X2 | X2 + Y2]
+ (2/3) d[X1 + X2; X1 + Y1 ‖ X2 + Y1, X1 + Y2]
+ (1/3) d[X1 + Y1; X1 + Y2 ‖ X1 + X2, Y1 + Y2]
OR? TODO: figure out which is correct
6d[X; Y]
≥ 2d[X1 + X2; Y1 + Y2] + d[X1 + Y1; X2 + Y2]
+ 2d[X1 | X1 + X2; Y1 | Y1 + Y2] + d[X1 | X1 + Y1; X2 | X2 + Y2]
+ (2/3) d[X1 + X2 | X1 + Y1; X2 + Y1 | X1 + Y2]
+ (1/3) d[X1 + Y1 | X1 + Y2; X1 + X2 | Y1 + Y2]
Recall that we want (U, V ) such that
Lemma 7.10 gives us a collection of distances (some conditioned), at least one of which is at most
(6/7) d[X; Y]. So it will be enough to show that for all of them we get
Proof.
d[U + V; X] = H[U + V + X] − ½H[U + V] − ½H[X]
            = H[U + V + X] − H[U + V] + ½H[U + V] − ½H[X]
            ≤ ½H[U + X] − ½H[U] + ½H[V + X] − ½H[V] + ½H[U + V] − ½H[X]
            = ½(d[U; X] + d[V; X] + d[U; V])
Then (U1 + U2 , V1 + V2 ) is 2C-relevant to (X, Y ).
Proof.
d[U1 + U2; X] + d[V1 + V2; Y]
≤ ½(2d[U; X] + d[U; U] + 2d[V; Y] + d[V; V])   (by Lemma 7.12)
≤ 2(d[U; X] + d[V; Y])   (by the entropic Ruzsa triangle inequality)
≤ 2C d[X; Y]
Proof.
d[U + V; X] ≤ ½(d[U; X] + d[V; X] + d[U; V])
            ≤ ½(d[U; X] + d[V; Y] + d[X; Y] + d[U; X] + d[X; Y] + d[V; Y])
            = d[U; X] + d[V; Y] + d[X; Y]
Then
d[U | U + V; X] ≤ ½(d[U; X] + d[V; X] + d[U; V]).
Proof.
d[U | U + V; X] ≤ H[U + X | U + V] − ½H[U | U + V] − ½H[X]
               ≤ H[U + X] − ½H[U] − ½H[V] + ½H[U + V] − ½H[X].
But d[U | U + V; X] = d[V | U + V; X], so it is also
               ≤ H[V + X] − ½H[U] − ½H[V] + ½H[U + V] − ½H[X].
Averaging the two inequalities gives the result (as earlier).
Proof. Use Lemma 7.16. Then as soon as it is used, we are in exactly the situation we were in when
bounding the relevance of (U1 + U2 , V1 + V2 ) and (U1 + V1 , U2 + V2 ).
It remains to tackle the last two terms in Lemma 7.10. For the fifth term we need to bound
d[X1 + X2 | X2 + Y1 , X1 + Y2 ; X] + d[X1 + Y1 | X2 + Y1 , X1 + Y2 ; Y ].
≤ d[X1 | X1 + Y2 ; X] + d[X2 | X2 + Y1 ; X]
= 2d[X | X + Y ; X]
Now we can use Lemma 7.16, and similarly for the other terms.
In this way, we get that the fifth and sixth terms have relevances bounded above by λC for an absolute
constant λ.