
Séminaire BOURBAKI Novembre 2024

77e année, 2024–2025, no 1230

UPPER BOUNDS ON DIAGONAL RAMSEY NUMBERS


[after Campos, Griffiths, Morris, and Sahasrabudhe]

by Yuval Wigderson
arXiv:2411.09321v1 [math.CO] 14 Nov 2024

1. Introduction

Ramsey theory is a branch of combinatorics that studies order and disorder. The
underlying mantra of the field, as articulated by Theodore Motzkin, is that “com-
plete disorder is impossible”—any sufficiently large system must have a large, highly
structured subsystem. The prototypical example of a Ramsey-theoretic statement is
Ramsey’s theorem, from which the field derives its name.

Theorem 1.1 (Ramsey, 1929). — For every integer k ⩾ 2, there exists some positive
integer N such that any two-coloring of the edges of the complete graph(1) KN contains
a monochromatic Kk .

In other words, no matter how we assign the edges of KN a color, say red or blue,
we can always find k vertices such that all edges between them receive the same color.
That is, any such coloring, no matter how unstructured, contains a highly structured
subcoloring. Even this simple statement has some remarkable consequences. For exam-
ple, Schur (1917) used Theorem 1.1(2) to prove that for all sufficiently large primes p,
there exist non-trivial solutions to the equation x^n + y^n ≡ z^n (mod p), that is, that one
cannot prove Fermat’s last theorem via a local-global argument.
Connections and applications to other fields of mathematics have been an important
feature of Ramsey theory from the very beginning. Ramsey himself had an application
in mathematical logic in mind when he proved Theorem 1.1 (indeed, his paper is titled
“On a problem of formal logic”). The influential paper of Erdős and Szekeres (1935),
which helped establish Ramsey theory as a central branch of combinatorics, is titled
“A combinatorial problem in geometry”; in it, they reproved Theorem 1.1 in order to
deduce a result on convex polygons among sets of points in Euclidean space.

(1)
Recall that the complete graph KN has N vertices, and all of the \binom{N}{2} possible edges are present.
(2)
Alert readers may note that Schur’s result precedes Ramsey’s by more than a decade. In fact, Schur
proved a closely related lemma, which one can now recognize as a consequence of Theorem 1.1, and
derived his theorem from that lemma.

Today, Ramsey-theoretic theorems and techniques are of fundamental importance in


many different fields, including additive number theory, Banach space theory, discrete
geometry, ergodic theory, group theory, and theoretical computer science. These are
deep and rich connections, and are difficult to adequately summarize, so we refer to
the book of Graham, Rothschild, and Spencer (1990), to the survey of Conlon, Fox,
and Sudakov (2015), and to the lecture notes of Wigderson (2024) for more in-depth
introductions to the field.
For many applications, such as those of Schur (1917) in number theory, Ramsey
(1929) in logic, and Erdős and Szekeres (1935) in geometry mentioned above, qualitative
statements such as Theorem 1.1 suffice. However, much of the modern research in
Ramsey theory is concerned with quantitative statements: how large is the integer N
in Theorem 1.1 as a function of k? Formally, we make the following definition.
Definition 1.2. — The Ramsey number r(k) is the least integer N such that every
two-coloring of the edges of KN contains a monochromatic Kk .
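As a concrete computational illustration of Definition 1.2 (my own addition, not part of the exposé; function names are hypothetical), one can verify by exhaustive search that r(3) = 6: some two-coloring of E(K5) avoids a monochromatic triangle, while every two-coloring of E(K6) contains one.

```python
from itertools import combinations, product

def has_mono_clique(n, coloring, k):
    """Does a 2-coloring of E(K_n) contain a monochromatic K_k?
    `coloring` maps each edge (i, j), i < j, to 0 or 1."""
    for verts in combinations(range(n), k):
        if len({coloring[e] for e in combinations(verts, 2)}) == 1:
            return True
    return False

def ramsey_holds(n, k):
    """True iff every 2-coloring of E(K_n) contains a monochromatic K_k."""
    edges = list(combinations(range(n), 2))
    return all(
        has_mono_clique(n, dict(zip(edges, bits)), k)
        for bits in product((0, 1), repeat=len(edges))
    )

# r(3) = 6: K_5 admits a triangle-free 2-coloring (the pentagon and its
# complement), while every 2-coloring of K_6 has a monochromatic triangle.
assert not ramsey_holds(5, 3)
assert ramsey_holds(6, 3)
```

Such brute force is hopeless beyond tiny cases: the number of colorings of E(KN) grows as 2^(N choose 2), which is exactly why the asymptotic questions below are studied by other means.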
Before continuing with the discussion of what is known about the function r(k), let
us pause and ask why we should study such quantitative questions, when qualitative
statements like Theorem 1.1 are elegant and already suffice for many applications.
There are several answers to this question. One answer is that for certain applications,
especially in fields such as theoretical computer science (e.g. the lower bound of Razborov
(1985) on monotone circuit complexity), qualitative statements are not sufficient, as
the application itself is quantitative. A second answer is that a better quantitative
understanding of Ramsey-theoretic results can yield new insights and new proofs of
existing theorems. For example, recent breakthroughs on the quantitative aspects of the
Ramsey-theoretic theorem of Roth (1953), due to Bloom and Sisask (2020) and Kelley
and Meka (2023) (see also the exposé of Peluse (2022)), imply that the primes contain
infinitely many three-term arithmetic progressions. This result was first proved by van
der Corput (1939), and is a special case of the landmark result of Green and Tao (2008).
However, in contrast to these earlier proofs, we now know that the primes contain
infinitely many three-term arithmetic progressions simply because there are many prime
numbers. That is, the quantitative improvements yielded a new proof of this theorem,
using essentially no properties of the primes other than their density. Finally, and no
less importantly, a third reason for studying such quantitative questions is that doing
so can reveal a world of deep and beautiful mathematics.
With that said, let us turn to the quantitative aspects of Theorem 1.1, that is, to
the determination of the function r(k) from Definition 1.2. The exact value of r(k)
is only known for k ⩽ 4, and it currently seems completely hopeless(3) to obtain an
(3)
The following famous anecdote was reported by Spencer (1994): “Erdős asks us to imagine an
alien force, vastly more powerful than us, landing on Earth and demanding the value of r(5) or they
will destroy our planet. In that case, he claims, we should marshal all our computers and all our
mathematicians and attempt to find the value. But suppose, instead, that they ask for r(6). In
that case, he believes, we should attempt to destroy the aliens.” Indeed, results of Exoo (1989) and

exact formula for r(k), so let us content ourselves with asymptotic bounds as k → ∞.
Essentially every proof of Theorem 1.1 yields (at least implicitly) an upper bound on
r(k), by proving the existence of some integer N . The original proof of Ramsey (1929)
gave a bound of r(k) ⩽ k!, but Ramsey wrote “I have little doubt that [this upper
bound is] far larger than is necessary”. Indeed, a few years later, Erdős and Szekeres
(1935) proved the following stronger bound.
Theorem 1.3 (Erdős and Szekeres, 1935). — r(k) ⩽ 4^k for every k ⩾ 2.
For about a decade, it was believed that this bound was also far larger than is
necessary, namely that r(k) should grow subexponentially as a function of k. However,
Erdős (1947) dispelled this belief by proving(4) an exponential lower bound.
Theorem 1.4 (Erdős, 1947). — r(k) ⩾ (√2)^k for every k ⩾ 2.
After this breakthrough, progress stalled for 75 years. There were a number of
improvements to these bounds over the years, including important results of Spencer
(1975), Graham and Rödl (1987), Thomason (1988), Conlon (2009), and Sah (2023), but
all of these improvements only affected the lower-order terms, and did not improve either
of the exponential constants √2 and 4. This impasse finally ended with a breakthrough
of Campos, Griffiths, Morris, and Sahasrabudhe (2023).
Theorem 1.5 (Campos, Griffiths, Morris, and Sahasrabudhe, 2023)
There exists a constant δ > 0 such that r(k) ⩽ (4 − δ)^k for all k ⩾ 2. Concretely,
r(k) ⩽ 3.993^k for all sufficiently large k.
The exact constant 3.993 is not particularly important, and a more careful analysis of
the same proof yields a slightly better bound(5) . The important thing about this result
is that it is the first result, after almost 90 years of intense study, to break the barrier
of 4^k.
The new tool introduced by Campos, Griffiths, Morris, and Sahasrabudhe (2023) is the
so-called book algorithm, an elementary but ingenious technique for finding monochro-
matic book graphs in colorings of KN . As we will shortly discuss, a book graph is a
basic graph-theoretic object, whose study turns out to be closely connected to the study
of Ramsey numbers. Every known proof of Theorem 1.1 uses, implicitly or explicitly,
monochromatic book graphs.

Angeltveit and McKay (2024) show that r(5) takes on one of the four values {43, 44, 45, 46}, but
we remain very far from knowing the value of r(6).
(4)
Lower bounds on Ramsey numbers are somewhat beyond the scope of this exposé, so we will not
discuss the proof of Theorem 1.4 in detail. However, it would be remiss not to mention that this
beautiful proof is extraordinarily influential, and is the origin of the probabilistic method, an extremely
powerful technique in modern combinatorics.
(5)
Very recently, Gupta, Ndiaye, Norin, and Wei (2024) recast the proof of Theorem 1.5 in a different
language, which allowed them to optimize the technique and obtain a much stronger bound of r(k) ⩽
3.8^k for sufficiently large k.

The rest of this exposé is dedicated to sketching the proof of Theorem 1.5, and is
organized as follows. We begin in Section 2 with a proof of Theorem 1.3, in the course
of which we introduce book graphs as well as several of the key ideas that go into the
proof of Theorem 1.5. In Section 3, we introduce and analyze the book algorithm, and
will then fail to prove Theorem 1.5. Luckily, we will rescue the argument and complete
the proof in Section 4 by introducing two additional ingredients. We end in Section 5
with an epilogue, discussing the use of book graphs in the original proof of Ramsey
(1929) of Theorem 1.1, as well as how our understanding of book graphs and Ramsey
theory has developed over the subsequent 95 years.
Acknowledgments. — An early version of this exposé was written for the lecture notes
of a Ramsey theory course that I taught at ETH in Spring 2024; I am grateful to all
of the students in the course for their interest and insights. I would also like to thank
Nicolas Bourbaki, Marcelo Campos, Xiaoyu He, Greg Kuperberg, Vivian Kuperberg,
and Wojciech Samotij for many helpful discussions and comments on earlier drafts. I
am supported by Dr. Max Rössler, the Walter Haefner Foundation, and the ETH Zürich
Foundation.

2. The Erdős–Szekeres theorem and algorithm

In this section, we prove Theorem 1.3 (and thus Theorem 1.1). This proof is elegant
and interesting in its own right, and additionally it contains within it several of the
important ideas used in the proof of Theorem 1.5. We will actually see three different
proofs (or, more precisely, three different ways of viewing the same proof) of Theorem 1.3,
in each of the next three subsections. Each proof will help introduce some of the key
ideas that go into the proof of Theorem 1.5.

2.1. Off-diagonal Ramsey numbers


We begin with the original proof of Erdős and Szekeres (1935). Before proceeding with
the proof, we generalize the notion of Ramsey numbers from Definition 1.2. Here and
throughout, we denote by V (KN ) and E(KN ) the vertex set and edge set, respectively,
of the complete graph KN .

Definition 2.1. — Given integers k, ℓ ⩾ 2, the off-diagonal Ramsey number r(k, ℓ) is


the least integer N such that every two-coloring of E(KN ) with colors red and blue
contains a red Kk or a blue Kℓ .

Note that r(k, ℓ) = r(ℓ, k) as the colors play symmetric roles, and that r(k) = r(k, k).
The quantity r(k) is often called the diagonal Ramsey number.
With this terminology, we can prove Theorem 1.3. In fact, we will prove the following
more precise result.

Theorem 2.2 (Erdős and Szekeres, 1935). — For all integers k, ℓ ⩾ 2, we have

    r(k, ℓ) ⩽ \binom{k + ℓ − 2}{k − 1}.

In particular,

    r(k) ⩽ \binom{2k − 2}{k − 1} < 4^k .
Proof. — We proceed by induction on k + ℓ, with the base case min{k, ℓ} = 2 being
trivial. For the inductive step, the key claim is that the following inequality holds:
(2.1) r(k, ℓ) ⩽ r(k − 1, ℓ) + r(k, ℓ − 1).
To prove (2.1), fix a red/blue coloring of E(KN ), where N = r(k − 1, ℓ) + r(k, ℓ − 1),
and fix some vertex v ∈ V (KN ). Suppose for the moment that v is incident to at least
r(k − 1, ℓ) red edges, and let R denote the set of endpoints of these red edges. By
definition, as |R| ⩾ r(k − 1, ℓ), we know that R contains a red Kk−1 or a blue Kℓ . In
the latter case we have found a blue Kℓ (so we are done), and in the former case we can
add v to this red Kk−1 to obtain a red Kk (and we are again done).
So we may assume that v is incident to fewer than r(k − 1, ℓ) red edges. By the
exact same argument, just interchanging the roles of the colors, we may assume that
v is incident to fewer than r(k, ℓ − 1) blue edges. But then the total number of edges
incident to v is at most
(r(k − 1, ℓ) − 1) + (r(k, ℓ − 1) − 1) = N − 2,
which is impossible, as v is adjacent to all N − 1 other vertices. This is a contradiction,
proving (2.1).
We can now complete the induction. By (2.1) and the inductive hypothesis, we find
that
    r(k, ℓ) ⩽ r(k − 1, ℓ) + r(k, ℓ − 1)
           ⩽ \binom{(k − 1) + ℓ − 2}{(k − 1) − 1} + \binom{k + (ℓ − 1) − 2}{k − 1}
           = \binom{k + ℓ − 2}{k − 1},

where the final equality is Pascal’s identity for binomial coefficients.
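As a quick computational sanity check (my own addition, not part of the exposé; the function name is hypothetical), the recurrence (2.1), iterated from the trivial base case r(2, ℓ) = ℓ, closes up to exactly the binomial coefficient of Theorem 2.2, by repeated use of Pascal’s identity.

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def es_bound(k, l):
    """Upper bound on r(k, l) obtained by iterating the recurrence (2.1),
    r(k, l) <= r(k-1, l) + r(k, l-1), from the base case r(2, l) = l."""
    if k == 2:
        return l
    if l == 2:
        return k
    return es_bound(k - 1, l) + es_bound(k, l - 1)

# By Pascal's identity, the recurrence closes up to exactly the
# binomial coefficient of Theorem 2.2.
for k in range(2, 10):
    for l in range(2, 10):
        assert es_bound(k, l) == comb(k + l - 2, k - 1)
```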

2.2. Enter the book


Definition 2.3. — Let t, m be positive integers. The book graph Bt,m consists of a copy
of Kt , plus m additional vertices which are adjacent to all vertices of the Kt , but not
adjacent to one another. Equivalently, Bt,m is obtained from the complete bipartite
graph Kt,m by adding in all the \binom{t}{2} possible edges in the side of size t. Equivalently,
Bt,m consists of m copies of Kt+1 which are glued along a common Kt .

Note that two important special cases are m = 1, where Bt,1 is simply the complete
graph Kt+1 , and t = 1, where B1,m is simply the star graph K1,m , consisting of one
vertex joined to m others (and no other edges). The “book” terminology comes from
the case t = 2, in which case B2,m consists of m triangles sharing an edge, which looks,
to some extent, like a book with m triangular pages. Continuing this analogy, the Kt in
Bt,m is called the spine, and the m additional vertices of Bt,m are called the pages. We
will often denote a book as a pair of sets (A, Y ), where A is the spine and Y comprises
the pages.
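Definition 2.3 is easy to instantiate explicitly. The following sketch (my own illustration, not part of the exposé; the function name is hypothetical) builds the edge set of Bt,m and checks its edge count \binom{t}{2} + tm, along with the two special cases just mentioned.

```python
from itertools import combinations
from math import comb

def book_edges(t, m):
    """Edges of the book graph B_{t,m}: vertices 0..t-1 form the spine K_t,
    and each page vertex t..t+m-1 is joined to every spine vertex."""
    spine, pages = range(t), range(t, t + m)
    edges = list(combinations(spine, 2))             # the spine K_t
    edges += [(s, p) for p in pages for s in spine]  # spine-page edges
    return edges

# B_{t,m} has C(t,2) + t*m edges; B_{t,1} is K_{t+1} and B_{1,m} is the star K_{1,m}.
assert len(book_edges(3, 4)) == comb(3, 2) + 3 * 4
assert len(book_edges(4, 1)) == comb(5, 2)  # K_5
assert len(book_edges(1, 6)) == 6           # star K_{1,6}
```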
The reason book graphs are important in the study of Ramsey numbers comes down
to the following simple observation.

Lemma 2.4. — Suppose that a two-coloring of E(KN ) contains a monochromatic red


copy of Bt,m , where m ⩾ r(k − t, ℓ). Then this coloring contains a red Kk or a blue Kℓ .

Proof. — Let A be the spine of the book, and let Y be its pages. By assumption,
|Y | = m ⩾ r(k − t, ℓ), so Y contains a blue Kℓ or a red Kk−t . In the former case we are
done, and in the latter case, we may add A to the red Kk−t to obtain a red Kk .
This proof should look familiar—we have already encountered the same idea in the
proof of Theorem 2.2, where we implicitly used the t = 1 case of Lemma 2.4. Indeed, in
that proof, we showed that if a coloring contains a red star with r(k − 1, ℓ) leaves, then
it contains a red Kk or a blue Kℓ . The only new idea in Lemma 2.4 is that we don’t
need to consider a single vertex (i.e. the case t = 1), but may take an arbitrary book.
Although the idea of Lemma 2.4 basically goes back to the work of Erdős and Szekeres
(1935), it was first formulated in essentially this language by Thomason (1982), who
used Lemma 2.4 to propose a natural approach to improving the upper bounds on r(k).
Namely, if one can show that every two-coloring of E(KN ) contains a monochromatic
Bt,m , for some appropriate parameters t and m ⩾ r(k − t, k), then one can plug this
into Lemma 2.4 and conclude that r(k) ⩽ N . Again, this is essentially the approach
we used in the proof of Theorem 2.2, where a simple argument based on the pigeonhole
principle showed that any coloring of E(KN ) contains a large monochromatic star, that
is, a monochromatic book with many pages and a spine of size t = 1. The idea behind
Thomason’s program is that perhaps for larger values of t, more sophisticated arguments
than the pigeonhole principle could yield stronger results, and improve the upper bounds
on r(k).
Thomason’s idea has been quite successful. The three prior asymptotic improvements
to Theorem 2.2, due to Thomason (1988), Conlon (2009), and Sah (2023), all used
this idea, roughly showing that if some two-coloring of E(KN ) does not contain a
monochromatic Bt,m (for some fixed t, m), then its structure must be such that the
proof of Theorem 2.2 can be made more efficient. A more precise structural result along
these lines is given by Conlon, Fox, and Wigderson (2022). However, for fundamental
technical reasons, none of these techniques seems capable of finding books with spine
larger than t = O(log k), whereas in order to prove a result like Theorem 1.5 in this

way, one would want to take (say) t = k/1000. This was where the matter stood for a few
years, until the work of Campos, Griffiths, Morris, and Sahasrabudhe (2023).

2.3. The Erdős–Szekeres algorithm


One of the many new ingredients introduced by Campos, Griffiths, Morris, and
Sahasrabudhe (2023) is the following simple idea: rather than searching for some specific
book Bt,m , they define an exploration algorithm for finding some book, and then prove
that regardless of which book is found, the parameters involved are good enough to plug
into Lemma 2.4. Although this idea is almost a triviality, this change of perspective is
crucial for the proof of Theorem 1.5.
Before we define this exploration algorithm—which they termed the book algorithm—
let us first rephrase the proof of Theorem 1.3 as an exploration algorithm, the Erdős–
Szekeres algorithm. Let us fix a two-coloring of E(KN ). We assume that this coloring
has no monochromatic Kk , and our goal is to eventually obtain a contradiction if N is
sufficiently large. For the moment we only seek to get a contradiction if N > 4k , and
thus reprove Theorem 1.3.
For a vertex v ∈ V (KN ), we write NR (v) for the red neighborhood of v, that is, the set
of vertices w ∈ V (KN ) such that the edge vw is colored red. Similarly, NB (v) denotes
the blue neighborhood of v.
In the Erdős–Szekeres algorithm, we maintain three disjoint sets A, B, X ⊆ V (KN );
the sets A and B will grow throughout the process, whereas X will shrink. The key
property we maintain is that (A, X) is a red book, and (B, X) is a blue book; that is,
A is completely red, B is completely blue, all edges between A and X are red, and all
edges between B and X are blue. To initialize the process, we set A = B = ∅ and
X = V (KN ). We now repeatedly run the following steps.

Algorithm 2.5 (Erdős–Szekeres algorithm)


1. If |X| ⩽ 1, |A| ⩾ k, or |B| ⩾ k, stop the process.
2. Pick a vertex v ∈ X, and check whether v has at least (|X| − 1)/2 red neighbors
in X.
3. If yes, move v to A and shrink X to the red neighborhood of v. That is, update
A → A ∪ {v} and X → X ∩ NR (v), and keep B the same. Call this a red step.
4. If not, then v has at least (|X| − 1)/2 blue neighbors in X. We now move v to B,
and shrink X to the blue neighborhood of v. That is, we update B → B ∪ {v}
and X → X ∩ NB (v), and keep A the same. Call this a blue step.
5. Return to step 1.

By the way we update the sets, we certainly maintain the key property that (A, X)
and (B, X) are red and blue books, respectively, throughout the entire process, since
every time we add a vertex v to A (resp. B), we shrink X to the red (resp. blue)
neighborhood of v.
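Algorithm 2.5 is simple enough to implement directly. The following sketch (my own addition, not part of the exposé; the function name and the coloring interface are assumptions for illustration) runs the algorithm on a random coloring and checks the key property that (A, X) is a red book, (B, X) is a blue book, and A and B are monochromatic.

```python
import random
from itertools import combinations

def erdos_szekeres(n, red, k):
    """Algorithm 2.5: repeatedly move a vertex v of X to A (red step) or to B
    (blue step), shrinking X to the corresponding neighborhood of v, until
    |X| <= 1 or one of A, B reaches size k.  `red(u, v)` says whether the
    edge uv is red.  Returns the final sets (A, B, X)."""
    A, B, X = set(), set(), set(range(n))
    while len(X) > 1 and len(A) < k and len(B) < k:
        v = X.pop()  # pick any vertex of X
        red_nbrs = {w for w in X if red(v, w)}
        if len(red_nbrs) >= len(X) / 2:  # red step: many red neighbors
            A.add(v)
            X = red_nbrs
        else:                            # blue step
            B.add(v)
            X = X - red_nbrs
    return A, B, X

# Check the maintained invariants on a uniformly random coloring of K_64.
random.seed(0)
color = {e: random.random() < 0.5 for e in combinations(range(64), 2)}
red = lambda u, v: color[(min(u, v), max(u, v))]
A, B, X = erdos_szekeres(64, red, 5)
assert all(red(a, x) for a in A for x in X)          # (A, X) is a red book
assert all(not red(b, x) for b in B for x in X)      # (B, X) is a blue book
assert all(red(u, v) for u, v in combinations(sorted(A), 2))      # A all red
assert all(not red(u, v) for u, v in combinations(sorted(B), 2))  # B all blue
```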
Using Algorithm 2.5, we can give an alternative proof of Theorem 1.3.

“Algorithmic” proof of Theorem 1.3. — Let N = 4^k , and fix a two-coloring of E(KN ).


Assume for contradiction that this coloring contains no monochromatic Kk . We now
run Algorithm 2.5 until it terminates.
Suppose first that the algorithm terminated because |A| ⩾ k. Throughout the process,
we maintain the property that all edges inside A are red. Therefore, if |A| ⩾ k at the
end of the process, we have found a monochromatic red Kk , a contradiction. Similarly,
if |B| ⩾ k at the end of the process, we have found a blue Kk , another contradiction.
We may thus assume that at the end of the process, we have |A| < k and |B| < k.
Therefore, the process can only end when |X| ⩽ 1. The key observation now is that
at every step of the process, we have
(2.2) |X| ⩾ 2^{−|A|−|B|} N.
Indeed, this certainly holds when the process begins, for then we have |A| = |B| = 0
and |X| = N . We can now check that it holds by induction: every time we do a red
step, we increase |A| by 1, and decrease |X| to at least(†) |X|/2, thus preserving the
validity of (2.2). Similarly, in a blue step, we increase |B| by 1 and decrease |X| to at
least |X|/2, again preserving (2.2). By induction, we conclude that (2.2) also holds at
the end of the process.
At the end of the process, we thus have
N ⩽ 2^{|A|+|B|} |X| < 2^{k+k} · 1 = 4^k ,
where we plug in our upper bounds |A| < k, |B| < k, |X| ⩽ 1. This contradiction
completes the proof.
It is worth noting that, as presented, this argument only proves Theorem 1.3—that
is, the bound r(k) ⩽ 4^k —rather than the sharper estimate given in Theorem 2.2. It
is an interesting and instructive exercise to figure out how to modify Algorithm 2.5 to
obtain the stronger bound proved in Theorem 2.2 via a similar “algorithmic” proof.
For future reference (and as a hint to solving the exercise above), it is good to observe
that the off-diagonal Erdős–Szekeres bound r(k, ℓ) ⩽ \binom{k + ℓ}{ℓ} can also be obtained in this
way. To do so, set γ = ℓ/(k + ℓ). Then we can modify the Erdős–Szekeres algorithm as
follows.

Algorithm 2.6 (Off-diagonal Erdős–Szekeres algorithm)


1. If |X| ⩽ 1, |A| ⩾ k, or |B| ⩾ ℓ, stop the process.
2. Pick a vertex v ∈ X, and check whether v has at least (1 − γ)|X| red neighbors
in X.

(†)
Strictly speaking, we should write here (|X| − 1)/2, although the claimed bound (2.2) can also be

proved inductively by judicious use of ceiling signs. However, from now on, we will start ignoring such
additive ±1 terms. Of course they need to be carefully dealt with to obtain a correct proof, but they
will always contribute a negligible error, which we will ignore. We will add the symbol (†) to mark the
places where we omit such additive errors.

3. If yes, move v to A and shrink X to the red neighborhood of v. That is, update
A → A ∪ {v} and X → X ∩ NR (v), and keep B the same. Call this a red step.
4. If not, then v has at least(†) γ|X| blue neighbors in X. We now move v to B,
and shrink X to the blue neighborhood of v. That is, we update B → B ∪ {v}
and X → X ∩ NB (v), and keep A the same. Call this a blue step.
5. Return to step 1.

The point now is that we obtain the red Kk or blue Kℓ if |A| ⩾ k or |B| ⩾ ℓ, and
thus we may assume that we do fewer than k red steps and fewer than ℓ blue steps. X
shrinks by a factor of 1 − γ at every red step, and by a factor of γ at every blue step,
so throughout the process we have

|X| ⩾ (1 − γ)^{|A|} γ^{|B|} N.

On the other hand, the process only terminates if |X| ⩽ 1, so this implies N <
(1 − γ)^{−k} γ^{−ℓ} . One can check, by Stirling’s approximation, that

    \binom{k + ℓ}{ℓ} = 2^{o(k)} γ^{−ℓ} (1 − γ)^{−k}

for all ℓ ⩽ k, and hence this gives a contradiction if we choose N of the form 2^{o(k)} \binom{k + ℓ}{ℓ}.
This recovers Theorem 2.2 up to the subexponential error term.
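This Stirling estimate can be sanity-checked numerically (my own addition, not part of the exposé; the function name is hypothetical). With γ = ℓ/(k + ℓ), the quantity γ^{−ℓ}(1 − γ)^{−k} equals 2^{(k+ℓ)H(γ)} for the binary entropy H, and its ratio to \binom{k+ℓ}{ℓ} grows only polynomially in k, i.e. is 2^{o(k)}.

```python
from math import comb, log2

def entropy_gap(k, l):
    """With gamma = l/(k+l), compare gamma^{-l} (1-gamma)^{-k}, which equals
    2^{(k+l) H(gamma)} for the binary entropy H, against C(k+l, l).
    Returns the difference of their base-2 logarithms."""
    gamma = l / (k + l)
    upper = -l * log2(gamma) - k * log2(1 - gamma)
    return upper - log2(comb(k + l, l))

# The gap is nonnegative and grows only logarithmically in k, so the two
# quantities agree up to a 2^{o(k)} factor, as claimed in the text.
for k in (10, 100, 1000):
    g = entropy_gap(k, k)
    assert 0 <= g <= 2 * log2(k)
```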

3. The book algorithm

We are now ready to describe the book algorithm of Campos, Griffiths, Morris, and
Sahasrabudhe (2023). As before, we fix a two-coloring of E(KN ), and assume that there
is no monochromatic Kk ; our goal is to obtain a contradiction if N is sufficiently large.
Throughout the process, we maintain four disjoint sets A, B, X, Y , with the following
properties: (A, X) is a red book, (B, X) is a blue book, and (A, Y ) is another red
book(6) . Thus, the only difference from the Erdős–Szekeres algorithm is the presence
of the new set Y . At the end of the process, our goal is to output the pair (A, Y ),
and to prove that t := |A| and m := |Y | satisfy m ⩾ r(k − t, k), so that we can apply
Lemma 2.4 to obtain a contradiction. We initialize the process with A = B = ∅, and
X ⊔ Y an arbitrary partition of V (KN ) with(†) |X| = |Y |. By permuting the colors if
necessary, we may assume that at the beginning of the process, at least half the edges
between X and Y are red.

(6)
Equivalently, we could say that (B, X) is a blue book and (A, X ∪ Y ) is a red book.

[Figure: the four sets A, B, X, Y , with (A, X) and (A, Y ) red books and (B, X) a blue book.]

As in the Erdős–Szekeres algorithm, we will iteratively build this picture by moving


vertices from X to A or B, and then shrinking X and Y . A move from X to A will be
called a red step, and a move from X to B will be called a blue step.
What is the advantage of maintaining such a picture? Recall that in the Erdős–
Szekeres algorithm, |X| shrinks by a factor of two whenever we do a red or a blue
step, hence we end up with |X| ⩾ 2^{−|A|−|B|} N as in (2.2), yielding the bound r(k) < 4^k .
However, it is reasonable to hope that since we are imposing “half as many constraints”
on Y as on X—that is, we are only maintaining that the edges between A and Y are
red, and not that any edges incident to Y are blue—we may be able to obtain better
control on |Y |. Indeed, we might hope that every blue step does not shrink Y at all,
while every red step shrinks Y by only a factor of two, as before, yielding(7) a bound of
|Y | ≳ 2^{−|A|} N .
In other words, our goal will be to “sacrifice” the vertices in X, and use them as the
fuel we use to build the large red book (A, Y ). This approach comes with a fundamental
asymmetry between the colors, in marked contrast to the Erdős–Szekeres proof. We
will really insist on finding a red book (A, Y ), and will do our best to build it. Only
when doing so is really impossible will we take blue steps.
Because of this, our preferred move would be taking a red step. That is, we would like
to pick a vertex v ∈ X, move v to A, and update X → X ∩ NR (v). Moreover, since we
need to maintain that (A, Y ) is a red book, we will also need to update Y → Y ∩ NR (v).
In particular, when deciding whether to add a vertex v ∈ X to A, we need to check not
only that v has many red neighbors in X—so that X doesn’t shrink too much—but also
that v has many red neighbors in Y , so that Y doesn’t shrink too much. In particular,
we see that in addition to tracking the sizes of A, B, X, and Y , we will also need to
track a fifth parameter, the red edge density between X and Y . We denote this density
by
    p := dR (X, Y ) = eR (X, Y )/(|X||Y |),

(7)
If we could really obtain such strong control on |Y |, we would show that r(k) ≲ 2^k , a dramatic

improvement over Theorem 1.3. Unfortunately, and unsurprisingly, the devil is in the details, and a lot
of work is needed to make such an approach work, and the extra complications yield a substantially
weaker bound.

where eR (X, Y ) denotes the number of red edges with one endpoint in X and the other
in Y . Recall that, by assumption, we have p ⩾ 1/2 at the beginning of the process.
Note that every time we add a vertex to A or to B (and thus have to shrink X and
potentially Y ), this red density p might change. For our simplified exposition of the
proof of Theorem 1.5, we will make the following (completely unjustified) assumption.
Assumption 3.1. — At every step of the process, every vertex in X has exactly p|Y |
red neighbors in Y , and every vertex in Y has exactly p|X| red neighbors in X. In other
words, the bipartite graph of red edges between X and Y is bi-regular.
We stress again that X, Y, and p change throughout the process, but Assumption 3.1
asserts that whenever such a change happens, we magically end up back with the same
bi-regularity.
While Assumption 3.1 is clearly a bogus assumption, it is actually possible to (essen-
tially) make it rigorous. Indeed, the definition of p implies that the vertices in X have,
on average, p|Y | red neighbors in Y . A basic but important observation, used frequently
in extremal combinatorics, is that one can often convert such average degree conditions
to minimum or maximum degree conditions, by deleting a few “outlier” vertices. In
the rigorous proof of Theorem 1.5, one must repeatedly “clean” X by removing such
outliers, and thus one can indeed maintain an approximate version of Assumption 3.1,
at least ensuring that all vertices in X have roughly the same red degree(8) . However, for
our exposition, we ignore these important technicalities, and stick with Assumption 3.1.

3.1. The steps of the book algorithm


The two basic steps in the book algorithm will again be red steps and blue steps,
as in the Erdős–Szekeres algorithm. Note that when we perform a blue step (moving
v ∈ X to B and updating X → X ∩ NB (v)), we do not need to update Y at all, since
these changes do not affect the fact that (A, Y ) is a red book. In particular, thanks
to Assumption 3.1, the red density between X and Y remains unchanged during a
blue step, since all the remaining vertices in X still have exactly p|Y | red neighbors
in Y . However, as discussed above, red steps can affect p, since in a red step we update
X → X ∩ NR (v) and Y → Y ∩ NR (v), and thus our value of p is updated to
p′ := dR (X ∩ NR (v), Y ∩ NR (v)).
Let us call a vertex prosperous if p′ ⩾ p − α, for some parameter α we will shortly choose.
We will then perform a red step only if there is a vertex v ∈ X which is prosperous,
and which has at least |X|/2 red neighbors in X. In such a step, we increase |A| by 1,
decrease |X| by a factor of 2, decrease Y by a factor of p (since v has p|Y | red neighbors
in Y , by Assumption 3.1), and update p to at least p − α.

(8)
It is much harder to ensure degree-regularity in both X and Y simultaneously. Luckily, it turns out

that degree-regularity in Y is substantially less important in the argument, and in the formal proof
one doesn’t even ensure an approximate version of it. In its place, one uses a judicious choice of the
vertex v.

In Algorithm 2.5, we were always able to do either a red or a blue step, since every
vertex in X has at least |X|/2 neighbors in X in one of the colors(†) . However, if we
require that our red vertex v be prosperous, then we may be in a position where neither
a red nor a blue step is possible. Namely, we get stuck if all vertices in X have at least
|X|/2 red neighbors in X, but none of them is prosperous.
In this case, we implement a density-boost step, which is one of the other main
innovations of Campos, Griffiths, Morris, and Sahasrabudhe (2023). Pick a vertex
v ∈ X, and consider the following picture.

[Figure: a vertex v ∈ X, with T = NR (v) ∩ X and U = NR (v) ∩ Y ; the red edge density
between T and U is less than p − α.]
Since v is not prosperous, the red edge density between T := NR (v) ∩ X and U :=
NR (v) ∩ Y must be less than p − α. However, by Assumption 3.1, every vertex in U has
p|X| red neighbors in X. Therefore, setting S := NB (v) ∩ X, we find that(†)
    p|X||U | = eR (X, U ) = eR (T, U ) + eR (S, U ) < (p − α)|T ||U | + eR (S, U ).

Rearranging, we conclude that

    eR (S, U ) > |U | (p|X| − (p − α)|T |) .

Let β := |S|/|X|, so that β records what fraction of the edges from v to the rest of X are
blue. Then |S| = β|X| and(†) |T | = (1 − β)|X|, and the above can be rewritten as

    eR (S, U ) > |S||U | (p/β − (p − α)(1 − β)/β) = |S||U | (p + α(1 − β)/β),

which implies

(3.1)    dR (S, U ) > p + α(1 − β)/β.

Note too that since we cannot do a blue step, we must have β ⩽ 1/2, implying that
dR (S, U ) > p + α. In other words, in the bad situation where we cannot perform
a red or a blue step, we can perform a density-boost step, where we replace X by
S = NB (v) ∩ X, replace Y by U = NR (v) ∩ Y , and thus boost the density from p to at
least p + α(1 − β)/β ⩾ p + α.
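The rewriting leading to (3.1) is pure algebra, and easy to sanity-check numerically. The following snippet (ours, with made-up parameter ranges) verifies that the two forms of the lower bound on eR(S, U) agree:

```python
import random

# Numerical check (ours) of the algebra behind (3.1): with |S| = beta*|X|
# and |T| = (1 - beta)*|X|, the quantity |U|*(p*|X| - (p - alpha)*|T|)
# equals |S|*|U|*(p + alpha*(1 - beta)/beta).
random.seed(0)
for _ in range(1000):
    X, U = random.randint(1, 100), random.randint(1, 100)
    p = random.uniform(0.3, 1.0)
    alpha = random.uniform(0.0, 0.1)
    beta = random.uniform(0.05, 0.5)
    S, T = beta * X, (1 - beta) * X
    lhs = U * (p * X - (p - alpha) * T)
    rhs = S * U * (p + alpha * (1 - beta) / beta)
    assert abs(lhs - rhs) < 1e-7 * max(1.0, abs(lhs))
```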

Note that density-boost steps are expensive, in that they shrink X and Y , but don’t
actually make progress by increasing |A| or |B|. In particular, we don’t a priori have
any control on how many density-boost steps we perform. Luckily, there is a simple fix
to this problem: since we are in any case updating X → X ∩ NB (v) in a density-boost
step, we may add v to B for free, while maintaining the property that (B, X) is a blue
book. That is, a density-boost step can also be made a type of blue step, and thus we
necessarily perform at most k density-boost steps without creating a blue Kk .
The final piece we need before formally defining the book algorithm is to choose α,
which determines the threshold above which a vertex is considered prosperous. Note
that every red step may decrease p by α, so if we end up doing up to k red steps, we
may decrease p from its initial value of 1/2 to 1/2 − αk. Moreover, whenever we do a red
step, we also shrink Y by a factor of (the current value of) p. In particular, if p ever
drops below (say) 1/4, we are in big trouble: then Y shrinks by a factor of 4 at every
step, and we have no real hope of proving a bound stronger than r(k) ⩽ 4^k. As such,
we want to pick α ⩽ ε/k, so that even after doing k red steps, we have not meaningfully
decreased p below its initial value. Here, one can think of ε as a tiny absolute constant,
although in the final analysis we will actually pick ε to tend to 0 slowly with k.
Unfortunately, there is a trade-off. Recall that we have very little control over the
effect of the density-boost steps, because these are the steps we do as a last resort.
In fact, essentially our only way of bounding their total effect is the observation that
p ⩽ 1 throughout the entire process, which should imply that we cannot do too many
density-boost steps, as that would drive the red density up too high. The problem is
that a density-boost step only increases p by roughly α, so if we pick α ⩽ ε/k, then even
if we do k density-boost steps (the maximum possible number), we will only increase
the density by ε. In particular, we have no hope of reaching the threshold of p = 1,
where we finally gain some control over the density-boost steps.
The way to resolve this apparent contradiction is to pick α adaptively. Indeed, suppose
that at some point in the process, we have reached a red density of, say, p = 0.51. At
this point, it doesn't make sense to have the cutoff be α = ε/k; we wouldn't even mind
losing an absolute constant of 1/100 in the density, since that will only bring us back to
our original value of p! So we will instead pick α to be dependent on our current value
of p; namely, we set

(3.2)  α(p) := ε/k if p ⩽ 1/2 + 1/k, and α(p) := ε·(p − 1/2) otherwise.

Again, the point of this is that, if we are at some step of the process where p > 1/2, then
we can afford to lose more in the density without ever dropping p into the "danger zone"
of being substantially smaller than 1/2. The advantage of this is that the amount we win
in a density-boost step is itself proportional to α = α(p). So if we have already done
some number of density-boost steps, so that p > 1/2, each subsequent density-boost step
boosts the density even further, at an exponential rate, thus rapidly bringing us closer
to the threshold p = 1.

3.2. Formal definition of the book algorithm


With all of these preliminaries, we are finally able to define the book algorithm.

Algorithm 3.2 (Book algorithm)


1. If |X| ⩽ 1, |A| ⩾ k, or |B| ⩾ k, stop the process.
2. Let p = dR (X, Y ) be the current red density between X and Y . Define α = α(p)
as in (3.2), where ε is some fixed parameter throughout the process.
3. Check whether some vertex v ∈ X has at least |X|/2 blue neighbors in X. If
yes, perform a blue step, by updating
A → A, B → B ∪ {v}, X → X ∩ NB (v), Y → Y,
and return to step 1.
4. Check whether some vertex v ∈ X is prosperous, meaning that dR (NR (v) ∩
X, NR (v) ∩ Y ) ⩾ p − α. If yes, perform a red step, by updating
A → A ∪ {v}, B → B, X → X ∩ NR (v), Y → Y ∩ NR (v),
and return to step 1.
5. In the remaining case, pick some vertex v ∈ X. It is not prosperous, and has
β|X| blue neighbors in X, for some β ⩽ 1/2. We now perform a density-boost
step, by updating
A → A, B → B ∪ {v}, X → X ∩ NB (v), Y → Y ∩ NR (v),
and return to step 1.

For future reference, the following table records how the key parameters change during
the execution of the book algorithm, following the discussion above.

                      |A|    |B|    |X|       |Y|    p
blue step             –      +1     ×(1/2)    –      –
red step              +1     –      ×(1/2)    ×p     −α
density-boost step    –      +1     ×β        ×p     +α(1−β)/β

Table 3.1. How the various parameters evolve during Algorithm 3.2. Dashes
denote quantities that are unchanged. In general, the entries in the table are
lower bounds, e.g. a density-boost step may increase p by more than α(1−β)/β, and
a red step may shrink X to more than half of its previous size.
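To make the loop structure concrete, here is a toy Python implementation of Algorithm 3.2; all function and variable names are ours, and this is an illustrative sketch rather than the formal proof (in particular, no attempt is made to maintain Assumption 3.1). It runs on an explicit red/blue coloring given as a boolean adjacency matrix, and by construction (A, Y) is always a red book and B is always a blue clique.

```python
import random

def red_density(red, X, Y):
    """Density of red pairs between the disjoint vertex sets X and Y."""
    if not X or not Y:
        return 0.0
    return sum(red[u][w] for u in X for w in Y) / (len(X) * len(Y))

def book_algorithm(red, k, eps):
    """Toy version of Algorithm 3.2. `red[u][w]` is True iff edge uw is red.

    Returns (A, B, Y): by construction (A, Y) is a red book and B is a
    blue clique, since every vertex added to A restricts X and Y to its
    red neighborhood, and every vertex added to B restricts X to its
    blue neighborhood."""
    n = len(red)
    X, Y = set(range(n // 2)), set(range(n // 2, n))
    A, B = set(), set()
    while len(X) > 1 and len(A) < k and len(B) < k:
        p = red_density(red, X, Y)
        alpha = eps / k if p <= 0.5 + 1 / k else eps * (p - 0.5)
        # Step 3: blue step, if some vertex has >= |X|/2 blue neighbors in X.
        v = next((u for u in X
                  if sum(1 for w in X if w != u and not red[u][w]) >= len(X) / 2),
                 None)
        if v is not None:
            B.add(v)
            X = {w for w in X if w != v and not red[v][w]}
            continue
        # Step 4: red step, if some vertex is prosperous.
        v = next((u for u in X
                  if red_density(red,
                                 {w for w in X if w != u and red[u][w]},
                                 {w for w in Y if red[u][w]}) >= p - alpha),
                 None)
        if v is not None:
            A.add(v)
            X = {w for w in X if w != v and red[v][w]}
            Y = {w for w in Y if red[v][w]}
            continue
        # Step 5: density-boost step with an arbitrary (non-prosperous) vertex.
        v = next(iter(X))
        B.add(v)
        X = {w for w in X if w != v and not red[v][w]}
        Y = {w for w in Y if red[v][w]}
    return A, B, Y
```

The process always terminates, since the chosen vertex v is removed from X at every step, so |X| strictly decreases.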

3.3. Analysis of the book algorithm


Suppose that, when the book algorithm ends, we have done t red steps, s density-
boost steps, and b blue steps. We may assume that t < k and that s + b < k, since
otherwise we have found a monochromatic Kk . We now collect a number of estimates
on the various parameters associated with the process.

Lemma 3.3. — We have p ⩾ 1/2 − ε throughout the entire process.

Proof. — As discussed above, every blue step keeps p constant (by Assumption 3.1),
every density-boost step can only increase p, and every red step decreases p by at most
α(p). Additionally, the choice of α(p) shows that p − α(p) ⩾ 1/2 whenever p > 1/2 + 1/k,
whereas α(p) = ε/k whenever p ⩽ 1/2 + 1/k. Since we do t ⩽ k red steps, p can never drop
below 1/2 − t·(ε/k) ⩾ 1/2 − ε.

It will now be convenient to pick ε = k^{−1/4}, although we note that this choice is not
particularly important; many functions of k tending to 0 neither too slowly nor too
quickly would work.

Lemma 3.4. — At the end of the process, we have |Y| ⩾ 2^{−t−s−o(k)}·N.

Proof. — Y is unchanged by every blue step. On the other hand, during each red or
density-boost step, we decrease Y by a factor of p, by Assumption 3.1. By Lemma 3.3,
we have that p ⩾ 1/2 − ε at every such step, hence

|Y| ⩾ (1/2 − ε)^{t+s} · N/2 = 2^{−t−s−o(k)}·N,

where we plug in our choice of ε and recall that we start the process with(†) |Y| = N/2.

We next turn to bounding |X| at the end of the process. Just as in the Erdős–Szekeres
algorithm, the main point of this is to estimate how many steps we do, since we recall
that the process terminates when |X| ⩽ 1.
Recall that at each density-boost step, we shrink X by a factor of β, where β is defined
as the fraction |NB(v) ∩ X|/|X| of blue neighbors of the currently chosen vertex v. Let
β1, . . . , βs be the sequence of values of β over the s density-boost steps. Let β be the
harmonic mean of β1, . . . , βs, that is, define β by

1/β = (1/s)·Σ_{i=1}^{s} 1/βi.

Lemma 3.5. — At the end of the process, we have

|X| ⩾ 2^{−t−b−o(k)}·β^s·N.

Proof. — Every red or blue step shrinks X by at most a factor(†) of 2, hence the factor
of 2^{−t−b}. On the other hand, the ith density-boost step decreases |X| by a factor of βi.
The inequality of arithmetic and geometric means implies that

1/β = (1/s)·Σ_{i=1}^{s} 1/βi ⩾ (Π_{i=1}^{s} 1/βi)^{1/s},

hence the contribution of the density-boost steps is

Π_{i=1}^{s} βi ⩾ β^s.

Together with the fact that we begin the process with(†) |X| = N/2, this yields the claimed
bound.
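The AM–GM step above can be sanity-checked numerically; the following snippet (ours) verifies, for random choices of the βi, that their product is at least the sth power of their harmonic mean:

```python
import math
import random

# Numerical check (ours): for random beta_i in (0, 1/2], the product of
# the beta_i is at least the s-th power of their harmonic mean, which is
# exactly the inequality used in the proof of Lemma 3.5.
random.seed(1)
for _ in range(1000):
    s = random.randint(1, 10)
    betas = [random.uniform(0.01, 0.5) for _ in range(s)]
    harm = s / sum(1 / b for b in betas)   # harmonic mean of the beta_i
    prod = math.prod(betas)                # total shrink factor of X
    assert prod >= harm ** s - 1e-12
```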
The final, and perhaps most important, result we need is an estimate on the number
of density-boost steps. As discussed above, we can get a good estimate on this quantity
because of the “dynamic” choice of α; this is the content of the next lemma, which is
called the zig-zag lemma by Campos, Griffiths, Morris, and Sahasrabudhe (2023).

Lemma 3.6 (Zig-zag lemma). — We have

Σ_{i=1}^{s} (1 − βi)/βi ⩽ t + o(k).
We won’t give a full proof of Lemma 3.6, but the following sketch captures the main
ideas.
Proof sketch for Lemma 3.6. — For the moment, let us assume that we stay in the
regime p ⩾ 1/2 + 1/k. It will be more convenient to reparametrize p, by defining q := p − 1/2.
By our choice of α in (3.2), we have that α(p) = εq.

Suppose we do one step of the book algorithm, and thus update p to some new
value p′ (and update q to q′ = p′ − 1/2). If the step we do is a blue step, then by
Assumption 3.1, the density p does not change, hence p′ = p and q′ = q. If, instead,
we do a red step, then v is prosperous, and hence p′ ⩾ p − α(p). This implies that
q′ ⩾ q − α(p) = q − εq = (1 − ε)q. Finally, if this step is the ith density-boost step,
then by (3.1) we have that

p′ ⩾ p + α(p)·(1 − βi)/βi

and thus

q′ ⩾ q + α(p)·(1 − βi)/βi = q·(1 + ε·(1 − βi)/βi).

Putting this all together, we conclude that at each step of the algorithm, we have

(3.3)  q′/q ⩾ 1 when we do a blue step; q′/q ⩾ 1 − ε when we do a red step; and q′/q ⩾ 1 + ε·(1 − βi)/βi when we do the ith density-boost step.

Let qfinal denote the value of q at the end of the algorithm, and let qinitial be the value
of q at the beginning of the algorithm. Multiplying (3.3) over all steps of the algorithm,
we find that

qfinal/qinitial ⩾ (1 − ε)^t · Π_{i=1}^{s} (1 + ε·(1 − βi)/βi),

since we get a contribution of 1 − ε from each of the t red steps and a contribution
of 1 + ε·(1 − βi)/βi from the ith density-boost step. Combining this inequality with the
approximation 1 + x ≈ e^x, an approximation that is valid for sufficiently small(9) |x|,
we find that

(3.4)  qfinal/qinitial ≳ e^{−εt} · exp( ε·Σ_{i=1}^{s} (1 − βi)/βi ) = exp( ε·(−t + Σ_{i=1}^{s} (1 − βi)/βi) ).

We have that qfinal ⩽ 1/2, since p ⩽ 1 throughout the whole process. On the other hand,
since we are assuming that p ⩾ 1/2 + 1/k throughout, we have that qinitial ⩾ 1/k. Therefore,
qfinal/qinitial ⩽ k/2 ⩽ k. Plugging this into (3.4) and taking logarithms, we find that

log k ⩾ log(qfinal/qinitial) ≳ ε·(−t + Σ_{i=1}^{s} (1 − βi)/βi),

implying that

Σ_{i=1}^{s} (1 − βi)/βi ≲ t + (log k)/ε = t + o(k),

where we plug in our choice of ε = k^{−1/4} in the final equality.
This proof worked under the assumption that we remain throughout in the range
p ⩾ 1/2 + 1/k. Let us now work in the complementary regime, where p < 1/2 + 1/k throughout
the whole process. In this case, recalling the definition of α(p) from (3.2), we have

(3.5)  p′ − p ⩾ 0 when we do a blue step; p′ − p ⩾ −ε/k when we do a red step; and p′ − p ⩾ +(ε/k)·(1 − βi)/βi when we do the ith density-boost step.

Adding up (3.5) over all steps of the process, we conclude that

(3.6)  pfinal − pinitial ⩾ −(ε/k)·t + (ε/k)·Σ_{i=1}^{s} (1 − βi)/βi = (ε/k)·(−t + Σ_{i=1}^{s} (1 − βi)/βi).

Recall that we had pinitial ⩾ 1/2, and we are now assuming that we remain in the regime
p < 1/2 + 1/k throughout, hence in particular pfinal < 1/2 + 1/k. Therefore pfinal − pinitial ⩽ 1/k.

(9) This approximation can be made rigorous, but we're still cheating in this derivation of (3.4). We have no guarantee that ε·(1 − βi)/βi is small, since we have no control over βi, and thus no guarantee that the approximation is valid. A correct proof of this lemma would need to separate out the contribution from the steps where βi is very small, and thus where such an approximation is not accurate.

Plugging this into (3.6) and rearranging, we conclude that

Σ_{i=1}^{s} (1 − βi)/βi ⩽ t + 1/ε = t + o(k),

again by our choice of ε.


We have thus proved the desired inequality in the two extreme cases, namely when
p ⩾ 12 + k1 throughout the process, and when p < 21 + k1 throughout the process. Of
course, in reality, we may move between these two regimes multiple times during the
execution of Algorithm 3.2. However, by breaking the execution of the algorithm into
intervals in which we remain in one regime or the other, it is not too difficult to combine
the arguments above and conclude that the claimed inequality always holds.

As an immediate consequence of Lemma 3.6, we obtain an upper bound on the number
s of density-boost steps.

Lemma 3.7. — We have

s ⩽ (β/(1 − β))·t + o(k).

Equivalently,

(3.7)  β ⩾ (1 + o(1))·s/(s + t).

Proof. — By the definition of β, we have that

(1/s)·Σ_{i=1}^{s} (1 − βi)/βi = (1 − β)/β.

Plugging this into Lemma 3.6 shows that

s = (β/(1 − β))·Σ_{i=1}^{s} (1 − βi)/βi ⩽ (β/(1 − β))·(t + o(k)).

Moreover, since each βi is at most 1/2, we find that β/(1 − β) ⩽ 1, yielding the first claimed
bound. The second bound follows by solving for β.(10)
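The rewriting in this proof hides a clean identity: since Σ 1/βi = Σ (1 − βi)/βi + s, the harmonic mean β satisfies β = s/(s + Σ (1 − βi)/βi) exactly, which is where the shape of (3.7) comes from. A quick numerical check (ours):

```python
import random

# Check (ours) of the identity behind Lemma 3.7: writing t for
# sum((1 - b)/b) over the density-boost fractions b_i, the harmonic
# mean of the b_i equals exactly s/(s + t).
random.seed(3)
for _ in range(200):
    s = random.randint(1, 20)
    betas = [random.uniform(0.01, 0.5) for _ in range(s)]
    t = sum((1 - b) / b for b in betas)
    harm = s / sum(1 / b for b in betas)
    assert abs(harm - s / (s + t)) < 1e-9
```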

(10) Again, there is some cheating going on here: one can only obtain the claimed estimate if s is not too small as a function of k, in order to absorb the error terms. In the formal proof, one has to separate into cases: the bound (3.7) is valid if s is not too small, whereas if s is very small one can complete the proof of Theorem 1.5 via a simpler analysis.

3.4. Proof attempt for Theorem 1.5


We are now ready to put everything together. Let C be a constant, which we will
optimize later, and let N = 2^{(1+C)k}. Our plan is to show that if C is chosen appropriately,
then r(k) ⩽ N = 2^{(1+C)k}. Since our goal is to prove Theorem 1.5, we thus hope to
be able to prove this for some fixed C < 1, but for the moment, let us leave C as an
unspecified constant.
We proceed by contradiction, so let us assume that there is a two-coloring of E(KN)
with no monochromatic Kk. We now run Algorithm 3.2 on this coloring. We certainly
obtain the desired contradiction if the algorithm finds a red or blue Kk, so let us assume
that this does not happen. Therefore, the process only ends when |X| ⩽ 1, which by
Lemma 3.5 implies that

N ⩽ β^{−s}·2^{t+b+o(k)} ⩽ β^{−s}·2^{t+(k−s)+o(k)},

where we plug in the bound b + s ⩽ k, arising from the fact that B never becomes a
blue Kk. We now plug in the lower bound on β from Lemma 3.7 to find that

(3.8)  N ⩽ ((t + s)/s)^s · 2^{k+t−s+o(k)}.

At this point everything is in terms of the parameters s and t, which we expect to
scale linearly in k, so it is more convenient to reparametrize everything in terms of
x := t/k and y := s/k. In terms of these parameters, and recalling that N = 2^{(1+C)k}, we can
rewrite (3.8) as

C − o(1) ⩽ (x − y) + y·log2((x + y)/y) =: G(x, y).
Recall that our goal is to obtain a contradiction, and the only thing we have not
yet specified is the value of C we choose. In particular, if we pick C to be larger
than the maximum value of G(x, y) over the square [0, 1]^2, then we certainly obtain
a contradiction. As our goal is to eventually obtain a contradiction with some fixed
C < 1, we would be happy if this maximum value were less than 1. However, this is not
true, as shown on the following contour plot; the maximum value of G is roughly 1.33.

[Contour plot of G(x, y) on the square [0, 1]^2.]
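The quoted maximum is easy to approximate by brute force; the grid search below (ours) reproduces the value ≈ 1.33:

```python
import math

def G(x, y):
    # G(x, y) = (x - y) + y * log2((x + y)/y), with the natural
    # continuous extension G(x, 0) = x.
    if y == 0:
        return x
    return (x - y) + y * math.log2((x + y) / y)

# Grid search over the square [0, 1]^2.
n = 400
best = max(G(i / n, j / n) for i in range(n + 1) for j in range(n + 1))
print(round(best, 2))  # roughly 1.33, attained near (x, y) = (1, 0.3)
```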

Of course, if we recall our original strategy, it is way too much to hope for that the
maximum of G is less than 1. Indeed, the whole point of the book algorithm was to
output the book (A, Y ), and to ensure that its parameters are good enough to apply
Lemma 2.4.
What are the parameters of this book? Well, we have that |A| = t by definition, and

m := |Y| ⩾ 2^{−t−s−o(k)}·N

by Lemma 3.4. By Lemma 2.4, we know that if m ⩾ r(k − t, k), we find a monochromatic
Kk, yielding our desired contradiction. Thus, we may assume that m < r(k − t, k),
implying that

(3.9)  N ⩽ 2^{t+s+o(k)}·m < 2^{t+s+o(k)}·r(k − t, k).
By Theorem 2.2, we know that

r(k − t, k) ⩽ binom(2k − t, k − t).

A useful upper bound on binomial coefficients is that binom(a, b) ⩽ 2^{a·H(b/a)}, where H(z) :=
−z·log2 z − (1 − z)·log2(1 − z) is the binary entropy function. Plugging this in, we find
that

log2 r(k − t, k) ⩽ log2 binom(2k − t, k − t) ⩽ (2k − t)·H((k − t)/(2k − t)) = k·(2 − x)·H((1 − x)/(2 − x)).

Taking logarithms of (3.9) and dividing by k shows that

C − o(1) ⩽ −1 + (x + y) + (2 − x)·H((1 − x)/(2 − x)) =: F(x, y).
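This entropy bound on binomial coefficients is standard, and the following check (ours) confirms it on small cases:

```python
import math

def H(z):
    # binary entropy function H(z) = -z log2 z - (1 - z) log2 (1 - z)
    if z in (0, 1):
        return 0.0
    return -z * math.log2(z) - (1 - z) * math.log2(1 - z)

# Verify binom(a, b) <= 2^(a * H(b/a)) for a range of small cases.
for a in range(1, 60):
    for b in range(a + 1):
        assert math.comb(a, b) <= 2 ** (a * H(b / a)) + 1e-6
```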
Putting all of this together, we have shown that either we derive the claimed contradiction,
or C − o(1) ⩽ min{F(x, y), G(x, y)}. Again, we have the freedom to choose C, so
we can obtain the desired contradiction if we set C to be larger than the maximum of
min{F(x, y), G(x, y)} on the square [0, 1]^2. In particular, as our goal is to pick C < 1,
we are done if min{F(x, y), G(x, y)} < 1 for all x, y ∈ [0, 1]. Here is a contour plot of F:

[Contour plot of F(x, y) on the square [0, 1]^2.]

This looks great! The areas where F is large seem to be different from the areas where
G is large, so there should be no problem to show that min{F, G} is always strictly
less than 1. In fact, here are the regions where F > 1 and G > 1.

[Plot of the regions where F > 1 and where G > 1; they overlap.]
Bad news! There’s a big red region where both functions are greater than 1, and our
whole proof strategy fails. In fact, one can check that min{F (x, y), G(x, y)} attains
a maximum value of roughly 1.054. That is, in order to obtain a contradiction, the
smallest value of C we could pick is 1.054, and thus this whole complex proof is only
able to show that r(k) ⩽ 22.054k ≈ 4.15k , which is worse than the bound of Theorem 1.3.
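One can confirm this numerically; the grid search below (ours) approximates the maximum of min{F, G} over the square and lands near the quoted value:

```python
import math

def H(z):
    # binary entropy function
    return 0.0 if z in (0, 1) else -z * math.log2(z) - (1 - z) * math.log2(1 - z)

def G(x, y):
    return x if y == 0 else (x - y) + y * math.log2((x + y) / y)

def F(x, y):
    return -1 + (x + y) + (2 - x) * H((1 - x) / (2 - x))

n = 500
best = max(min(F(i / n, j / n), G(i / n, j / n))
           for i in range(n + 1) for j in range(n + 1))
print(round(best, 2))  # close to the quoted maximum of roughly 1.054
```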

4. Rescuing the argument

The fact that min{F (x, y), G(x, y)} > 1 for some (x, y) ∈ [0, 1]2 is a fundamental
obstruction to this approach. In order to overcome it, we will use two tricks, both of
which involve tweaking the book algorithm.

4.1. Changing the cutoff


The first new idea is to examine our criterion for deciding whether to do red or blue
steps. Recall that, as in Algorithm 2.5, we do a blue step if some vertex in X has at
least |X|/2 blue neighbors in X, and otherwise we do a red or density-boost step. In the
Erdős–Szekeres setting, this is the optimal choice: since the argument is symmetric in
the two colors, it would be strictly worse to use any other cutoff.

However, the book algorithm is highly asymmetric, so we should re-examine this
assumption. Recall that at the end of the process, we output the red book (A, Y),
where |A| = t and |Y| ⩾ 2^{−t−s−o(k)}·N by Lemma 3.4. The fact that |Y| decays like 2^{−t}·N
is unavoidable, but the fact that |Y| also decays exponentially in s shows that density-boost
steps are very expensive. As such, we should try to minimize the number s of
density-boost steps we do, in terms of t. Since Lemma 3.7 tells us that s ⩽ (β/(1 − β))·t + o(k),
the natural way to decrease s is to decrease β.

To achieve this, we do the following. We pick a number µ ∈ [0, 1], which will be fixed
throughout the argument. In step 3 of Algorithm 3.2, we now perform a blue step if
some vertex in X has at least µ|X| blue neighbors in X; otherwise, we proceed to the
subsequent steps of the algorithm unchanged. An important effect of this choice is that
now, when we perform the ith density-boost step, the parameter βi is constrained to
be at most µ, and thus also β ⩽ µ at the end of the process. In particular, if we pick
µ < 1/2, we will have accomplished our goal of decreasing s relative to t. This suggests
we should pick µ very small, but of course there is a trade-off—if µ is very small then
every blue step decreases |X| by a lot, and thus the process will terminate quickly. To
balance these two effects, we want to pick µ to be neither too large nor too small. For
completeness, here is the description of our modified book algorithm.

Algorithm 4.1 (Book algorithm with cutoff µ)


1. If |X| ⩽ 1, |A| ⩾ k, or |B| ⩾ k, stop the process.
2. Let p = dR (X, Y ) be the current red density between X and Y . Define α = α(p)
as in (3.2), where ε is some fixed parameter throughout the process.
3. Check whether some vertex v ∈ X has at least µ|X| blue neighbors in X. If
yes, perform a blue step, by updating
A → A, B → B ∪ {v}, X → X ∩ NB (v), Y → Y,
and return to step 1.
4. Check whether some vertex v ∈ X is prosperous, meaning that dR (NR (v) ∩
X, NR (v) ∩ Y ) ⩾ p − α. If yes, perform a red step, by updating
A → A ∪ {v}, B → B, X → X ∩ NR (v), Y → Y ∩ NR (v),
and return to step 1.
5. In the remaining case, pick some vertex v ∈ X. It is not prosperous, and has
β|X| blue neighbors in X, for some β ⩽ µ. We now perform a density-boost
step, by updating
A → A, B → B ∪ {v}, X → X ∩ NB (v), Y → Y ∩ NR (v),
and return to step 1.

                      |A|    |B|    |X|        |Y|    p
blue step             –      +1     ×µ         –      –
red step              +1     –      ×(1 − µ)   ×p     −α
density-boost step    –      +1     ×β         ×p     +α(1−β)/β

Table 4.1. How the various parameters evolve during Algorithm 4.1. The
only difference from Table 3.1 is that blue and red steps shrink X by factors
of µ and 1 − µ, respectively.

In this modified book algorithm, Lemmas 3.3, 3.4, 3.6 and 3.7 remain true; the only
change is that Lemma 3.5 needs to be modified to the following statement, reflecting
the fact that each blue (resp. red) step shrinks X by a factor(†) of µ (resp. 1 − µ) in the
worst case. The proof is otherwise identical to that of Lemma 3.5.

Lemma 4.2 (Modified Lemma 3.5). — At the end of the process, we have

|X| ⩾ 2^{−o(k)}·(1 − µ)^t·µ^b·β^s·N.

In particular, since b + s ⩽ k, we have

|X| ⩾ 2^{−o(k)}·(1 − µ)^t·µ^{k−s}·β^s·N.

Since the process terminates when |X| ⩽ 1, we conclude from Lemma 4.2 that

(4.1)  N ⩽ 2^{o(k)}·(1 − µ)^{−t}·µ^{−(k−s)}·β^{−s} ⩽ 2^{o(k)}·(1 − µ)^{−t}·µ^{−(k−s)}·((s + t)/s)^s,

where the final inequality follows from the lower bound on β in Lemma 3.7. Taking
logarithms and dividing by k, we conclude that

C − o(1) ⩽ −1 + x·log2(1/(1 − µ)) + (1 − y)·log2(1/µ) + y·log2((x + y)/y) =: Gµ(x, y).
Note that in the case µ = 1/2, we precisely recover the previous function G, which of
course makes sense as we are then recovering Algorithm 3.2. Here are contour plots of
Gµ for µ ∈ {1/10, 2/10, 3/10, 4/10}.

[Contour plots of Gµ for the four values of µ.]

And here are pictures of the regions where F > 1 and Gµ > 1, for µ ∈ {1/10, 2/10, 3/10, 4/10}.

[Plots of the regions where F > 1 and where Gµ > 1, for the four values of µ.]

It looks like we're already done at µ = 4/10 = 2/5, but unfortunately we're not: one can
check that min{F(x, y), G_{2/5}(x, y)} attains a maximum value of 1.0017, hence we only
obtain a bound of r(k) ⩽ 4.006^k. Here is a closer view of what happens at µ = 2/5:

[Zoomed plot of the regions where F > 1 and where G_{2/5} > 1.]

But we’re definitely making progress! The bad red region is extremely small now, and
our maximum value of min{F, Gµ } is extraordinarily close to 1. Unfortunately, one
can check that no choice of µ will actually decrease this value below 1—which would
complete the proof—so another idea is needed.

4.2. Off-diagonal Ramsey numbers


So far, we have played with the parameter µ in order to vary the region where Gµ > 1,
and have almost succeeded in making it disjoint from the region where F > 1. We
will now try to tweak F, in order to move this latter region. Recall that the way we
defined F was in terms of an upper bound on r(k − t, k). If we can obtain a better
upper bound on r(k − t, k), then F will decrease, and we may be in business. In fact, we
don't need to improve the upper bound on r(k − t, k) in all cases; it suffices to improve
this upper bound for pairs (k − t, k) near the problematic region where both F and G_{2/5}
are greater than 1. Since this problematic region is near x ≈ 0.75, we could hope to
improve the upper bound on r(k − t, k) where t ≈ 0.75k, or equivalently on r(k, ℓ)
where ℓ ≈ k/4.
There is actually a good reason to expect this to work. Recall Algorithm 2.6, which
yields an upper bound on such off-diagonal Ramsey numbers. In that algorithm, we
choose whether to do red or blue steps based on the cutoff γ = ℓ/(k + ℓ). If we just blindly
import the same idea into the book algorithm, it makes sense to set µ ≈ ℓ/(k + ℓ) in order
to upper-bound r(k, ℓ). In case ℓ ≈ k/4, we have µ ≈ 1/5. In our argument above, we
saw that it is good to take µ small, except for the trade-off that now X shrinks by a
factor of µ for every blue step. However, in this regime, we will do at most ℓ blue steps,
and µ^ℓ ≈ (1/5)^{k/4} ≈ 0.67^k; in contrast, in the argument above, the blue steps shrink X
by (2/5)^k = 0.4^k, which is a much more significant decrease. Hence we may expect the
trade-offs to work well for us.

For completeness, here is our final book algorithm, suited for upper-bounding r(k, ℓ).
We set µ = ℓ/(k + ℓ) and ε = k^{−1/4}. We initiate A = B = ∅, and X ⊔ Y an arbitrary
partition of V(KN) into two equally-sized parts. Let pinitial = dR(X, Y) be the density
of red edges between X and Y at the beginning of the process, and define

(4.2)  α(p) := ε/k if p ⩽ pinitial + 1/k, and α(p) := ε·(p − pinitial) otherwise.

The algorithm is then as follows.

Algorithm 4.3 (Off-diagonal book algorithm)


1. If |X| ⩽ 1, |A| ⩾ k, or |B| ⩾ ℓ, stop the process.
2. Let p = dR (X, Y ) be the current red density between X and Y . Define α = α(p)
as in (4.2).
3. Check whether some vertex v ∈ X has at least µ|X| blue neighbors in X. If
yes, perform a blue step, by updating
A → A, B → B ∪ {v}, X → X ∩ NB (v), Y → Y,
and return to step 1.
4. Check whether some vertex v ∈ X is prosperous, meaning that dR (NR (v) ∩
X, NR (v) ∩ Y ) ⩾ p − α. If yes, perform a red step, by updating
A → A ∪ {v}, B → B, X → X ∩ NR (v), Y → Y ∩ NR (v),
and return to step 1.
5. In the remaining case, pick some vertex v ∈ X. It is not prosperous, and has
β|X| blue neighbors in X, for some β ⩽ µ. We now perform a density-boost
step, by updating
A → A, B → B ∪ {v}, X → X ∩ NB (v), Y → Y ∩ NR (v),
and return to step 1.


Apart from the choice of µ = ℓ/(k + ℓ), this algorithm is identical to Algorithm 4.1, except
that we now stop when |B| ⩾ ℓ, rather than when |B| ⩾ k as before. In particular,
Table 4.1 still gives the relevant changes in the parameters. Unfortunately, there is an
additional complication introduced by moving to the off-diagonal setting. Before, when
we sought to upper-bound r(k), we could assume that the initial red density pinitial was
at least 1/2, by simply swapping the roles of the two colors if necessary. However, once
we are in the off-diagonal setting, this is no longer allowed, and we may have no control
on pinitial . Let us make another completely unjustified assumption.
Assumption 4.4. — At the beginning of the process, we have pinitial ⩾ k/(k + ℓ) = 1 − µ.

Note that this is a natural assumption, since Algorithm 2.6 "predicts" that k/(k + ℓ) is
roughly the correct red density to expect, in the sense that in the analysis of Algorithm 2.6,
this red density is the worst-case occurrence. That is, if Assumption 4.4

is false "robustly", then Algorithm 2.6 should already prove a stronger bound than
r(k, ℓ) ⩽ binom(k + ℓ, ℓ). In fact, one can essentially force Assumption 4.4 to hold because of
such an argument; if we start with pinitial < k/(k + ℓ), we can run a number of steps of the
Erdős–Szekeres algorithm, until we end up with p ⩾ k/(k + ℓ). If this never happens, then
Algorithm 2.6 itself will prove that r(k, ℓ) ≪ binom(k + ℓ, ℓ).
Given Assumption 4.4, we can conclude that Lemmas 3.6, 3.7 and 4.2 remain true for
Algorithm 4.3. Moreover, we can prove the following modified version of Lemmas 3.3
and 3.4 (combined into a single statement), whose proof is essentially unchanged.

Lemma 4.5 (Modified Lemmas 3.3 and 3.4). — We have p ⩾ pinitial − ε ⩾ (1 − µ) − ε
throughout the entire process. Therefore, at the end of the process, we have

|Y| ⩾ (1 − µ)^{t+s+o(k)}·N.

With all of this setup, we are finally able to prove(11) an exponentially-improved upper
bound on r(k, ℓ).

Theorem 4.6. — We have r(k, ℓ) ⩽ 2^{−(2/9)ℓ+o(k)}·binom(k + ℓ, ℓ) for all ℓ ⩽ k/4.

Note that in this theorem, the gain over Theorem 2.2 is exponential in ℓ, and not
in k. This is natural, and the best we could hope for. Indeed, if ℓ = o(k), then the
bound in Theorem 2.2 is already subexponential in k, so it is impossible to improve it
by a factor of 2^{−δk+o(k)} for any fixed δ > 0.
Proof of Theorem 4.6. — Let C be a constant that we will optimize later, and let
N = 2^{(1+C)k}. We fix a two-coloring of E(KN), and assume for contradiction that there
is no red Kk or blue Kℓ in this coloring. We apply Algorithm 4.3, with µ = ℓ/(k + ℓ) ⩽ 1/5.
Note that this choice of µ implies that ℓ/k = µ/(1 − µ). If we output that A is a red Kk or B
is a blue Kℓ, then we have obtained a contradiction, hence we can assume this does not
happen. Therefore, the process only terminates when |X| ⩽ 1, and we also have that
b + s ⩽ ℓ. Plugging this into Lemma 4.2, we find that

N ⩽ 2^{o(k)}·(1 − µ)^{−t}·µ^{−(ℓ−s)}·β^{−s} ⩽ 2^{o(k)}·(1 − µ)^{−t}·µ^{−(ℓ−s)}·((s + t)/s)^s.
Note that we obtain a better exponent on µ than we had in (4.1), because the assumption
b + s ⩽ ℓ is stronger than what we had before; this is precisely the extra strength gained
by moving to the off-diagonal setting. Taking logarithms and dividing by k shows that

C − o(1) ⩽ −1 + x·log2(1/(1 − µ)) + (µ/(1 − µ) − y)·log2(1/µ) + y·log2((x + y)/y) =: G̃µ(x, y),

(11) One should really write "prove", since everything here is dependent on the unjustified Assumptions 3.1 and 4.4, as well as on the key Lemma 3.6, which we did not rigorously prove. Additionally, the bound in Theorem 4.6 is stronger than any result proved by Campos, Griffiths, Morris, and Sahasrabudhe (2023), and this too is a consequence of the fact that we are being unrigorous, especially with the verification of certain numerical inequalities. However, Theorem 4.6 is true; a rigorous proof of a stronger statement is given by Gupta, Ndiaye, Norin, and Wei (2024, Corollary 6).

where the only difference between Gµ and G̃µ is the term µ/(1 − µ) in the latter, which
is simply 1 in the former. It comes from the ℓ in the exponent; upon dividing by k we
obtain ℓ/k = µ/(1 − µ).
Additionally, by Lemma 4.5, we have

|Y| ⩾ (1 − µ)^{t+s+o(k)}·N.

If |Y| ⩾ r(k − t, ℓ), then we obtain a contradiction by Lemma 2.4, so we may assume
that |Y| < r(k − t, ℓ). Taking logarithms and dividing by k again shows that

(4.3)  C − o(1) ⩽ −1 + (x + y)·log2(1/(1 − µ)) + (1/k)·log2 r(k − t, ℓ).
By Theorem 1.3, we have

log2 r(k − t, ℓ) ⩽ log2 binom(k − t + ℓ, ℓ) ⩽ (k − t + ℓ)·H(ℓ/(k − t + ℓ)) = k·(1 − x + µ/(1 − µ))·H( (µ/(1 − µ)) / (1 − x + µ/(1 − µ)) ).
Plugging this into (4.3) shows that

C − o(1) ⩽ −1 + (x + y)·log2(1/(1 − µ)) + (1 − x + µ/(1 − µ))·H( (µ/(1 − µ)) / (1 − x + µ/(1 − µ)) ) =: F̃µ(x, y).
We are no longer trying to beat the bound r(k) ⩽ 4^k, so our goal is no longer to obtain
a contradiction for some C < 1. Instead, we are comparing to (1/k)·log2 binom(k + ℓ, ℓ), which equals
(1 + µ/(1 − µ))·H(µ) + o(1), and thus our goal is to obtain a contradiction for some fixed
C < (1 + µ/(1 − µ))·H(µ) − 1. That is, what we would like to show is that for all µ ⩽ 1/5, we have
min{F̃µ(x, y), G̃µ(x, y)} < (1 + µ/(1 − µ))·H(µ) − 1 for all x, y ∈ [0, 1]. In fact, we hope to
prove this inequality with some slack, so that we gain an improvement in the exponent.
Recall that our goal is to prove a gain over Theorem 2.2 that is exponential in ℓ. As
such, the slack we get in this inequality should scale like ℓ/k = µ/(1 − µ). That is, we would
like to prove an inequality of the form

min{F̃µ(x, y), G̃µ(x, y)} < (1 + µ/(1 − µ))·H(µ) − 1 − δ·µ/(1 − µ),

where δ > 0 is some absolute constant; such a bound would prove that r(k, ℓ) ⩽
2^{−δℓ+o(k)}·binom(k + ℓ, ℓ).
Such an inequality holds! In fact, one can check that for µ ⩽ 1/5, we may take δ as
large as 2/9. Indeed, here is a plot of the regions where F̃µ > (1 + µ/(1 − µ))·H(µ) − 1 − (2/9)·µ/(1 − µ)
and G̃µ > (1 + µ/(1 − µ))·H(µ) − 1 − (2/9)·µ/(1 − µ), respectively, for µ = 1/5. One can verify that the
regions only move further apart as µ decreases, so µ = 1/5 is the worst case.

[Plot of the two regions, which do not intersect.]

This shows that we do indeed get a contradiction if we set C = (1 + µ/(1 − µ))·H(µ) − 1 − (2/9)·µ/(1 − µ),
proving the bound

r(k, ℓ) ⩽ 2^{(1+C)k} = 2^{((1+µ/(1−µ))·H(µ) − (2/9)·µ/(1−µ) + o(1))·k} = 2^{−(2/9)ℓ+o(k)}·binom(k + ℓ, ℓ)

for all ℓ ⩽ k/4.
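For intuition on the exponents in the last display, note that (1/k)·log2 binom(k + ℓ, ℓ) tends to (1 + µ/(1 − µ))·H(µ) = H(µ)/(1 − µ) as k grows with µ = ℓ/(k + ℓ) fixed, while (2/9)·(µ/(1 − µ))·k = (2/9)ℓ. A numerical check of the first fact (ours, with arbitrary test values of k):

```python
import math

def H(z):
    # binary entropy function
    return 0.0 if z in (0, 1) else -z * math.log2(z) - (1 - z) * math.log2(1 - z)

# Check that (1/k) * log2(binom(k + l, l)) approaches H(mu)/(1 - mu)
# as k grows, where mu = l/(k + l); here mu = 1/5, i.e. l = k/4.
mu = 0.2
for k in (100, 1000, 10000):
    l = k // 4
    exact = math.log2(math.comb(k + l, l)) / k
    limit = H(mu) / (1 - mu)
    assert abs(exact - limit) < 0.1  # converges slowly, like log(k)/k
print(round(H(0.2) / 0.8, 3))  # ≈ 0.902
```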

4.3. Back to the diagonal


Now that we have an upper bound on r(k, ℓ) for ℓ ⩽ k4 , we can finally complete the
proof of Theorem 1.5. We will actually prove the following bound, which is a little bit
stronger. The only reason we obtain a stronger bound than Campos, Griffiths, Morris,
and Sahasrabudhe (2023) is the numerical computations: the authors rigorously justify
every numerical inequality, which is often substantially simpler to do if one proves
a slightly weaker bound, whereas we content ourselves with “proving” the numerical
bounds through pictures. As mentioned in Section 1, much stronger bounds were
recently proved by Gupta, Ndiaye, Norin, and Wei (2024) via an alternative analysis.
It follows from their results that Theorem 4.7 is true, even though we do not provide a
rigorous proof.

Theorem 4.7. — We have r(k) ⩽ 2^{(2 − 3/200 + o(1))k} ≈ 3.96^k.

Proof. — Let C be a constant that we will optimize later. Let N = 2^{(1+C)k}, and fix a
two-coloring of E(KN), which we assume for contradiction has no monochromatic Kk.
We run Algorithm 4.1 with µ = 2/5. Thanks to Theorem 4.6 (plus Theorem 2.2), we

know that  !
2k − t
if t < 43 k,



k−t


r(k − t, k) ⩽  !
2k − t
− 2 (k−t)+o(k)
2 9 if t ⩾ 43 k.


k−t

Recall that we obtain a contradiction if |Y| ⩾ r(k − t, k) at the end of the process, hence
we may assume that |Y| < r(k − t, k). Combining this with Lemma 3.4^(12), we see that
we obtain a contradiction unless

    C − o(1) ⩽ −1 + (x + y) + (1/k)·log₂ r(k − t, k)
             ⩽ { −1 + (x + y) + (2 − x)·H((1−x)/(2−x))                   if x < 3/4,
                 −1 + (x + y) − (2/9)(1 − x) + (2 − x)·H((1−x)/(2−x))    if x ⩾ 3/4 }
             = F(x, y) − (2/9)(1 − x)·1_{x⩾3/4}
             =: F̂(x, y),

where 1_{x⩾3/4} denotes the indicator function of the event x ⩾ 3/4. In particular, it
suffices for us to prove that min{F̂(x, y), G_{2/5}(x, y)} ⩽ 1 − δ for all x, y ∈ [0, 1], where
δ > 0 is a constant that will end up in the exponent of N.
This indeed works! Here are the plots of where F̂ and G_{2/5} are greater than 1; the
second plot is just zoomed in to show the "dangerous area", where the two regions no
longer intersect.

[Figure: the regions where F̂ > 1 and G_{2/5} > 1, with a zoomed-in view of the dangerous area.]

In fact, one can check that max_{x,y∈[0,1]} min{F̂(x, y), G_{2/5}(x, y)} < 0.985. Therefore, we
obtain a contradiction if we set C = 0.985 = 1 − 3/200, proving that r(k) ⩽ 2^{(2−3/200+o(1))k},
as claimed.
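As a quick check on the arithmetic (my own, not part of the argument): C = 0.985 is exactly 1 − 3/200, so N = 2^{(1+C)k} = 2^{(2−3/200)k}, and the base 2^{2−3/200} is about 3.96, matching the statement of Theorem 4.7.

```python
# Arithmetic behind Theorem 4.7: C = 0.985 equals 1 - 3/200, so the exponent
# is 1 + C = 2 - 3/200, and the resulting base 2^(1+C) is roughly 3.96.
C = 1 - 3 / 200
base = 2 ** (1 + C)
print(C, base)
```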

(12) We are back to the diagonal setting, so we may assume that p_initial ⩾ 1/2. Therefore
Lemmas 3.3 and 3.4 are again valid.



5. Epilogue: Ramsey’s original proof of Theorem 1.1

As mentioned in Section 1, there is no known proof of Theorem 1.1 that does not
use book graphs in some way. As a hopefully fitting end to this exposé, let us see the
original proof of Ramsey (1929), which uses book graphs in a rather different way from
Lemma 2.4, yet whose proof shares certain ideas with the ones we have already seen.
Let us denote by r(Bt,m ) the least integer N such that every two-coloring of E(KN )
contains a monochromatic copy of Bt,m . Ramsey (1929) proved the following upper
bound on r(Bt,m ).

Theorem 5.1 (Ramsey, 1929). — r(Bt,m ) ⩽ (t + 1)! · m for all t, m ⩾ 1.

Note that since Kk = Bk−1,1 , this immediately implies the bound r(k) ⩽ k!, and
hence yields a proof of Theorem 1.1.
Proof of Theorem 5.1. — We proceed by induction on t. For the base case t = 1, we
wish to prove that r(B1,m ) ⩽ 2m, which is immediate: in any two-coloring of E(K2m ),
any vertex is incident to 2m − 1 edges, at least m of which must have the same color
by the pigeonhole principle. This yields a monochromatic copy of B1,m .
For the inductive step, suppose the result has been proved for t − 1, and fix a coloring
of E(KN ) where N = (t + 1)! · m = t! · ((t + 1)m). By the inductive hypothesis, this
coloring contains a monochromatic copy of Bt−1,(t+1)m , which we may assume to be
red without loss of generality. That is, there exist disjoint sets A, X ⊆ V (KN ) with
|A| = t − 1 and |X| = (t + 1)m, such that all edges inside A and between A and X are
red. If there is a vertex v ∈ X with at least m red neighbors in X, we may perform a
“red step” by updating A → A ∪ {v} and X → X ∩ NR (v), yielding a red Bt,m . So we
may assume that every vertex in X has at most m − 1 red neighbors in X. Let v1 be an
arbitrary vertex of X, and let X1 = NB (v1 ) ∩ X, so that |X1 | ⩾ |X| − m. Let v2 be an
arbitrary vertex of X1 , and let X2 = NB (v2 ) ∩ X1 , so that |X2 | ⩾ |X1 | − m ⩾ |X| − 2m.
Continuing in this way, we may build a sequence of vertices v1 , . . . , vt as well as a set
Xt with |Xt | ⩾ |X| − tm = m, such that each vi is adjacent in blue to all vj with j > i,
as well as to all vertices in Xt . But this precisely means that we have constructed a
blue Bt,m , completing the inductive step.
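For a concrete instance of Theorem 5.1 (my own illustration, not from the paper): taking t = 2 and m = 1 gives r(K_3) = r(B_{2,1}) ⩽ 3! = 6, which happens to be sharp. A brute-force search confirms that every two-coloring of E(K_6) contains a monochromatic triangle, while K_5 admits a coloring with none (color the edges of a pentagon red and its diagonals blue).

```python
from itertools import combinations, product

def every_coloring_has_mono_triangle(n):
    """True iff every red/blue coloring of E(K_n) contains a monochromatic K_3."""
    edges = list(combinations(range(n), 2))
    triangles = list(combinations(range(n), 3))
    for bits in product((0, 1), repeat=len(edges)):
        color = dict(zip(edges, bits))  # edge -> 0 (red) or 1 (blue)
        if not any(color[(a, b)] == color[(a, c)] == color[(b, c)]
                   for a, b, c in triangles):
            return False  # found a coloring with no monochromatic triangle
    return True

print(every_coloring_has_mono_triangle(6))  # True:  r(3) <= 6 = (2+1)! * 1
print(every_coloring_has_mono_triangle(5))  # False: the pentagon coloring avoids it
```

The check over K_6 runs through all 2^15 = 32768 colorings, so it finishes in well under a second.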
Given Theorem 5.1, it is natural to wonder what the true value of r(B_{t,m}) is. This
question was first explicitly raised by Erdős, Faudree, Rousseau, and Schelp (1978) and
Thomason (1982), who independently proved the bounds (2^t − o(1))m ⩽ r(B_{t,m}) ⩽ 4^t m.
Thomason in particular was motivated by Lemma 2.4, as discussed in Section 2.2, and
made the following bold conjecture.

Conjecture 5.2 (Thomason, 1982). — r(B_{t,m}) ⩽ 2^t (m + t − 2) + 2 for all m, t ⩾ 1.

This conjecture is known to be true (and optimal) for t ∈ {1, 2}, but it is wide open
for t ⩾ 3 (and may well be false). Moreover, Conjecture 5.2 is likely to be very difficult:
even the m = 1 case would yield r(k) ⩽ 2^{k+o(k)}, a bound far stronger than anything

currently known. However, a beautiful result of Conlon (2019) confirms this conjecture
asymptotically for any fixed t.
Theorem 5.3 (Conlon, 2019). — r(B_{t,m}) = (2^t + o(1))m as m → ∞, for any fixed t.
Conlon’s result addresses a question that arguably goes back 90 years to the original
work of Ramsey (1929), yet uses highly sophisticated tools developed in the interim,
such as the regularity lemma of Szemerédi (1978), and this question is also closely
related to a famous conjecture of Burr and Erdős (1975), which was recently resolved
by Lee (2017). In fact, Ramsey theory has seen a number of recent breakthroughs on
old, seemingly intractable problems by the introduction of remarkable new techniques:
three other examples from the past two years are the works of Li (2023) on explicit
constructions, of Mattheus and Verstraete (2024) on off-diagonal Ramsey numbers, and
of Reiher and Rödl (2023) on restricted Ramsey graphs. There is every reason to hope
and expect this trend to continue.

References

Vigleik Angeltveit and Brendan D. McKay (2024). R(5, 5) ⩽ 46. arXiv: 2409.15709
[math.CO].
Thomas F. Bloom and Olof Sisask (2020). Breaking the logarithmic barrier in Roth’s
theorem on arithmetic progressions. arXiv: 2007.03528 [math.NT].
Stefan A. Burr and Paul Erdős (1975). “On the magnitude of generalized Ramsey
numbers for graphs”, in: Infinite and finite sets (Colloq., Keszthely, 1973; dedicated
to P. Erdős on his 60th birthday), Vols. I, II, III. Vol. 10. Colloq. Math. Soc. János
Bolyai. North-Holland, Amsterdam-London, pp. 215–240.
Marcelo Campos, Simon Griffiths, Robert Morris, and Julian Sahasrabudhe (2023). An
exponential improvement for diagonal Ramsey. arXiv: 2303.09521 [math.CO].
David Conlon (2009). “A new upper bound for diagonal Ramsey numbers”, Ann. of
Math. (2) 170 (2), pp. 941–960.
(2019). “The Ramsey number of books”, Adv. Comb., Paper No. 3, 12pp.
David Conlon, Jacob Fox, and Benny Sudakov (2015). “Recent developments in graph
Ramsey theory”, in: Surveys in combinatorics 2015. Vol. 424. London Math. Soc.
Lecture Note Ser. Cambridge Univ. Press, Cambridge, pp. 49–118.
David Conlon, Jacob Fox, and Yuval Wigderson (2022). “Ramsey numbers of books
and quasirandomness”, Combinatorica 42 (3), pp. 309–363.
Paul Erdős (1947). “Some remarks on the theory of graphs”, Bull. Amer. Math. Soc.
53, pp. 292–294.
Paul Erdős, Ralph J. Faudree, Cecil C. Rousseau, and Richard H. Schelp (1978). “The
size Ramsey number”, Period. Math. Hungar. 9 (1-2), pp. 145–161.
Paul Erdős and George Szekeres (1935). “A combinatorial problem in geometry”, Com-
positio Math. 2, pp. 463–470.

Geoffrey Exoo (1989). “A lower bound for R(5, 5)”, J. Graph Theory 13 (1), pp. 97–98.
Ronald L. Graham and Vojtěch Rödl (1987). “Numbers in Ramsey theory”, in: Surveys
in combinatorics 1987 (New Cross, 1987). Vol. 123. London Math. Soc. Lecture Note
Ser. Cambridge Univ. Press, Cambridge, pp. 111–153.
Ronald L. Graham, Bruce L. Rothschild, and Joel H. Spencer (1990). Ramsey theory.
2nd ed. Wiley-Interscience Series in Discrete Mathematics and Optimization. A Wiley-
Interscience Publication. John Wiley & Sons, Inc., New York, pp. xii+196.
Ben Green and Terence Tao (2008). “The primes contain arbitrarily long arithmetic
progressions”, Ann. of Math. (2) 167 (2), pp. 481–547.
Parth Gupta, Ndiame Ndiaye, Sergey Norin, and Louis Wei (2024). Optimizing the
CGMS upper bound on Ramsey numbers. arXiv: 2407.19026 [math.CO].
Zander Kelley and Raghu Meka (2023). “Strong bounds for 3-progressions”, in: 2023
IEEE 64th Annual Symposium on Foundations of Computer Science (FOCS). IEEE
Computer Soc., Los Alamitos, CA, pp. 933–973.
Choongbum Lee (2017). “Ramsey numbers of degenerate graphs”, Ann. of Math. (2)
185 (3), pp. 791–829.
Xin Li (2023). “Two source extractors for asymptotically optimal entropy, and (many)
more”, in: 2023 IEEE 64th Annual Symposium on Foundations of Computer Science—
FOCS 2023. IEEE Computer Soc., Los Alamitos, CA, pp. 1271–1281.
Sam Mattheus and Jacques Verstraete (2024). “The asymptotics of r(4, t)”, Ann. of
Math. (2) 199 (2), pp. 919–941.
Sarah Peluse (2022). “Recent progress on bounds for sets with no three terms in arith-
metic progression”, in: Séminaire Bourbaki, Volume 2021/2022, Astérisque, no. 438
(2022), Exposé no. 1196, 547–581.
Frank P. Ramsey (1929). “On a problem of formal logic”, Proc. London Math. Soc. (2)
30 (4), pp. 264–286.
Alexander A. Razborov (1985). “Lower bounds on the monotone complexity of some
Boolean functions”, Dokl. Akad. Nauk SSSR 281 (4), pp. 798–801.
Christian Reiher and Vojtěch Rödl (2023). The girth Ramsey theorem. arXiv: 2308.15589
[math.CO].
Klaus F. Roth (1953). “On certain sets of integers”, J. London Math. Soc. 28, pp. 104–
109.
Ashwin Sah (2023). “Diagonal Ramsey via effective quasirandomness”, Duke Math. J.
172 (3), pp. 545–567.
Issai Schur (1917). “Über die Kongruenz x^m + y^m ≡ z^m (mod p)”, Jahresber. Dtsch.
Math.-Ver. 25, pp. 114–117.
Joel Spencer (1975). “Ramsey’s theorem—a new lower bound”, J. Combin. Theory Ser.
A 18, pp. 108–115.
(1994). Ten lectures on the probabilistic method. 2nd ed. Vol. 64. CBMS-NSF Re-
gional Conference Series in Applied Mathematics. Society for Industrial and Applied
Mathematics (SIAM), Philadelphia, PA, pp. vi+88.

Endre Szemerédi (1978). “Regular partitions of graphs”, in: Problèmes combinatoires
et théorie des graphes (Colloq. Internat. CNRS, Univ. Orsay, Orsay, 1976). Vol. 260.
Colloq. Internat. CNRS. CNRS, Paris, pp. 399–401.
Andrew Thomason (1982). “On finite Ramsey numbers”, European J. Combin. 3 (3),
pp. 263–273.
(1988). “An upper bound for some Ramsey numbers”, J. Graph Theory 12 (4),
pp. 509–517.
Johannes G. van der Corput (1939). “Über Summen von Primzahlen und Primzahl-
quadraten”, Math. Ann. 116 (1), pp. 1–50.
Yuval Wigderson (2024). “Ramsey theory—lecture notes”. url: https://n.ethz.ch/~ywigderson/math/static/RamseyTheory2024LectureNotes.pdf.

Yuval Wigderson
Institute for Theoretical Studies,
ETH Zürich
E-mail : [email protected]
