Upper Bounds On Diagonal Ramsey Numbers (After Campos, Griffiths, Morris, and Sahasrabudhe)
by Yuval Wigderson
arXiv:2411.09321v1 [math.CO] 14 Nov 2024
1. Introduction
Ramsey theory is a branch of combinatorics that studies order and disorder. The
underlying mantra of the field, as articulated by Theodore Motzkin, is that “com-
plete disorder is impossible”—any sufficiently large system must have a large, highly
structured subsystem. The prototypical example of a Ramsey-theoretic statement is
Ramsey’s theorem, from which the field derives its name.
Theorem 1.1 (Ramsey, 1929). — For every integer k ⩾ 2, there exists some positive
integer N such that any two-coloring of the edges of the complete graph(1) KN contains
a monochromatic Kk .
In other words, no matter how we assign the edges of KN a color, say red or blue,
we can always find k vertices such that all edges between them receive the same color.
That is, any such coloring, no matter how unstructured, contains a highly structured
subcoloring. Even this simple statement has some remarkable consequences. For exam-
ple, Schur (1917) used Theorem 1.1(2) to prove that for all sufficiently large primes p,
there exist non-trivial solutions to the equation x^n + y^n ≡ z^n (mod p), that is, that one
cannot prove Fermat’s last theorem via a local-global argument.
Connections and applications to other fields of mathematics have been an important
feature of Ramsey theory from the very beginning. Ramsey himself had an application
in mathematical logic in mind when he proved Theorem 1.1 (indeed, his paper is titled
“On a problem of formal logic”). The influential paper of Erdős and Szekeres (1935),
which helped establish Ramsey theory as a central branch of combinatorics, is titled
“A combinatorial problem in geometry”; in it, they reproved Theorem 1.1 in order to
deduce a result on convex polygons among sets of points in Euclidean space.
(1) Recall that the complete graph KN has N vertices, and all of the \binom{N}{2} possible edges are present.
(2) Alert readers may note that Schur's result precedes Ramsey's by more than a decade. In fact, Schur proved a closely related lemma, which one can now recognize as a consequence of Theorem 1.1, and derived his theorem from that lemma.
1230–02
exact formula for r(k), so let us content ourselves with asymptotic bounds as k → ∞.
Essentially every proof of Theorem 1.1 yields (at least implicitly) an upper bound on
r(k), by proving the existence of some integer N . The original proof of Ramsey (1929)
gave a bound of r(k) ⩽ k!, but Ramsey wrote “I have little doubt that [this upper
bound is] far larger than is necessary”. Indeed, a few years later, Erdős and Szekeres
(1935) proved the following stronger bound.
Theorem 1.3 (Erdős and Szekeres, 1935). — r(k) ⩽ 4^k for every k ⩾ 2.
For about a decade, it was believed that this bound was also far larger than is
necessary, namely that r(k) should grow subexponentially as a function of k. However,
Erdős (1947) dispelled this belief by proving(4) an exponential lower bound.
Theorem 1.4 (Erdős, 1947). — r(k) ⩾ (√2)^k for every k ⩾ 2.
After this breakthrough, progress stalled for 75 years. There were a number of
improvements to these bounds over the years, including important results of Spencer
(1975), Graham and Rödl (1987), Thomason (1988), Conlon (2009), and Sah (2023), but
all of these improvements only affected the lower-order terms, and did not improve either
of the exponential constants √2 and 4. This impasse finally ended with a breakthrough
of Campos, Griffiths, Morris, and Sahasrabudhe (2023).
Theorem 1.5 (Campos, Griffiths, Morris, and Sahasrabudhe, 2023)
There exists a constant δ > 0 such that r(k) ⩽ (4 − δ)^k for all k ⩾ 2. Concretely,
r(k) ⩽ 3.993^k for all sufficiently large k.
The exact constant 3.993 is not particularly important, and a more careful analysis of
the same proof yields a slightly better bound(5) . The important thing about this result
is that it is the first result, after almost 90 years of intense study, to break the barrier
of 4^k.
The new tool introduced by Campos, Griffiths, Morris, and Sahasrabudhe (2023) is the
so-called book algorithm, an elementary but ingenious technique for finding monochro-
matic book graphs in colorings of KN . As we will shortly discuss, a book graph is a
basic graph-theoretic object, whose study turns out to be closely connected to the study
of Ramsey numbers. Every known proof of Theorem 1.1 uses, implicitly or explicitly,
monochromatic book graphs.
(3) Angeltveit and McKay (2024) show that r(5) takes on one of the four values {43, 44, 45, 46}, but we remain very far from knowing the value of r(6).
(4) Lower bounds on Ramsey numbers are somewhat beyond the scope of this exposé, so we will not discuss the proof of Theorem 1.4 in detail. However, it would be remiss not to mention that this beautiful proof is extraordinarily influential, and is the origin of the probabilistic method, an extremely powerful technique in modern combinatorics.
(5) Very recently, Gupta, Ndiaye, Norin, and Wei (2024) recast the proof of Theorem 1.5 in a different language, which allowed them to optimize the technique and obtain a much stronger bound of r(k) ⩽ 3.8^k for sufficiently large k.
The rest of this exposé is dedicated to sketching the proof of Theorem 1.5, and is
organized as follows. We begin in Section 2 with a proof of Theorem 1.3, in the course
of which we introduce book graphs as well as several of the key ideas that go into the
proof of Theorem 1.5. In Section 3, we introduce and analyze the book algorithm, and
will then fail to prove Theorem 1.5. Luckily, we will rescue the argument and complete
the proof in Section 4 by introducing two additional ingredients. We end in Section 5
with an epilogue, discussing the use of book graphs in the original proof of Ramsey
(1929) of Theorem 1.1, as well as how our understanding of book graphs and Ramsey
theory has developed over the subsequent 95 years.
Acknowledgments. — An early version of this exposé was written for the lecture notes
of a Ramsey theory course that I taught at ETH in Spring 2024; I am grateful to all
of the students in the course for their interest and insights. I would also like to thank
Nicolas Bourbaki, Marcelo Campos, Xiaoyu He, Greg Kuperberg, Vivian Kuperberg,
and Wojciech Samotij for many helpful discussions and comments on earlier drafts. I
am supported by Dr. Max Rössler, the Walter Haefner Foundation, and the ETH Zürich
Foundation.
In this section, we prove Theorem 1.3 (and thus Theorem 1.1). This proof is elegant
and interesting in its own right, and additionally it contains within it several of the
important ideas used in the proof of Theorem 1.5. We will actually see three different
proofs (or, more precisely, three different ways of viewing the same proof) of Theorem 1.3,
in each of the next three subsections. Each proof will help introduce some of the key
ideas that go into the proof of Theorem 1.5.
Note that r(k, ℓ) = r(ℓ, k) as the colors play symmetric roles, and that r(k) = r(k, k).
The quantity r(k) is often called the diagonal Ramsey number.
With this terminology, we can prove Theorem 1.3. In fact, we will prove the following
more precise result.
Theorem 2.2 (Erdős and Szekeres, 1935). — For all integers k, ℓ ⩾ 2, we have
r(k, ℓ) ⩽ \binom{k+ℓ−2}{k−1}.
In particular,
r(k) ⩽ \binom{2k−2}{k−1} < 4^k.
Proof. — We proceed by induction on k + ℓ, with the base case min{k, ℓ} = 2 being
trivial. For the inductive step, the key claim is that the following inequality holds:
(2.1) r(k, ℓ) ⩽ r(k − 1, ℓ) + r(k, ℓ − 1).
To prove (2.1), fix a red/blue coloring of E(KN ), where N = r(k − 1, ℓ) + r(k, ℓ − 1),
and fix some vertex v ∈ V (KN ). Suppose for the moment that v is incident to at least
r(k − 1, ℓ) red edges, and let R denote the set of endpoints of these red edges. By
definition, as |R| ⩾ r(k − 1, ℓ), we know that R contains a red Kk−1 or a blue Kℓ . In
the latter case we have found a blue Kℓ (so we are done), and in the former case we can
add v to this red Kk−1 to obtain a red Kk (and we are again done).
So we may assume that v is incident to fewer than r(k − 1, ℓ) red edges. By the
exact same argument, just interchanging the roles of the colors, we may assume that
v is incident to fewer than r(k, ℓ − 1) blue edges. But then the total number of edges
incident to v is at most
(r(k − 1, ℓ) − 1) + (r(k, ℓ − 1) − 1) = N − 2,
which is impossible, as v is adjacent to all N − 1 other vertices. This is a contradiction,
proving (2.1).
We can now complete the induction. By (2.1) and the inductive hypothesis, we find that
r(k, ℓ) ⩽ r(k − 1, ℓ) + r(k, ℓ − 1) ⩽ \binom{(k−1)+ℓ−2}{(k−1)−1} + \binom{k+(ℓ−1)−2}{k−1} = \binom{k+ℓ−2}{k−1},
where the final equality is Pascal's identity for binomial coefficients.
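The double induction above can be checked mechanically. Here is a short Python sketch (illustrative, not from the source) that iterates the recurrence r(k, ℓ) ⩽ r(k − 1, ℓ) + r(k, ℓ − 1) with the base values r(2, ℓ) = ℓ and r(k, 2) = k, and confirms that the resulting upper bound agrees with the binomial coefficient \binom{k+ℓ−2}{k−1}; of course, the true Ramsey numbers may be much smaller than these bounds.

```python
from math import comb

def es_upper_bound(max_k):
    """Upper bounds on r(k, l) via the recurrence R[k][l] = R[k-1][l] + R[k][l-1],
    with base cases R[2][l] = l and R[k][2] = k."""
    R = [[0] * (max_k + 1) for _ in range(max_k + 1)]
    for n in range(2, max_k + 1):
        R[2][n] = R[n][2] = n
    for k in range(3, max_k + 1):
        for l in range(3, max_k + 1):
            R[k][l] = R[k - 1][l] + R[k][l - 1]
    return R

R = es_upper_bound(12)
# Pascal's identity turns the recurrence into the closed form of Theorem 2.2.
for k in range(2, 13):
    for l in range(2, 13):
        assert R[k][l] == comb(k + l - 2, k - 1)
# Diagonal case: the bound stays below 4^k.
assert R[10][10] == comb(18, 9) < 4**10
```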
Note that two important special cases are m = 1, where Bt,1 is simply the complete
graph Kt+1 , and t = 1, where B1,m is simply the star graph K1,m , consisting of one
vertex joined to m others (and no other edges). The “book” terminology comes from
the case t = 2, in which case B2,m consists of m triangles sharing an edge, which looks,
to some extent, like a book with m triangular pages. Continuing this analogy, the Kt in
Bt,m is called the spine, and the m additional vertices of Bt,m are called the pages. We
will often denote a book as a pair of sets (A, Y ), where A is the spine and Y comprises
the pages.
The reason book graphs are important in the study of Ramsey numbers comes down
to the following simple observation.
Lemma 2.4. — If a two-coloring of E(KN) contains a red book Bt,m with m ⩾ r(k − t, ℓ), then it contains a red Kk or a blue Kℓ.
Proof. — Let A be the spine of the book, and let Y be its pages. By assumption,
|Y | = m ⩾ r(k − t, ℓ), so Y contains a blue Kℓ or a red Kk−t . In the former case we are
done, and in the latter case, we may add A to the red Kk−t to obtain a red Kk .
This proof should look familiar—we have already encountered the same idea in the
proof of Theorem 2.2, where we implicitly used the t = 1 case of Lemma 2.4. Indeed, in
that proof, we showed that if a coloring contains a red star with r(k − 1, ℓ) leaves, then
it contains a red Kk or a blue Kℓ . The only new idea in Lemma 2.4 is that we don’t
need to consider a single vertex (i.e. the case t = 1), but may take an arbitrary book.
Although the idea of Lemma 2.4 basically goes back to the work of Erdős and Szekeres
(1935), it was first formulated in essentially this language by Thomason (1982), who
used Lemma 2.4 to propose a natural approach to improving the upper bounds on r(k).
Namely, if one can show that every two-coloring of E(KN ) contains a monochromatic
Bt,m , for some appropriate parameters t and m ⩾ r(k − t, k), then one can plug this
into Lemma 2.4 and conclude that r(k) ⩽ N . Again, this is essentially the approach
we used in the proof of Theorem 2.2, where a simple argument based on the pigeonhole
principle showed that any coloring of E(KN ) contains a large monochromatic star, that
is, a monochromatic book with many pages and a spine of size t = 1. The idea behind
Thomason’s program is that perhaps for larger values of t, more sophisticated arguments
than the pigeonhole principle could yield stronger results, and improve the upper bounds
on r(k).
Thomason’s idea has been quite successful. The three prior asymptotic improvements
to Theorem 2.2, due to Thomason (1988), Conlon (2009), and Sah (2023), all used
this idea, roughly showing that if some two-coloring of E(KN ) does not contain a
monochromatic Bt,m (for some fixed t, m), then its structure must be such that the
proof of Theorem 2.2 can be made more efficient. A more precise structural result along
these lines is given by Conlon, Fox, and Wigderson (2022). However, for fundamental
technical reasons, none of these techniques seems capable of finding books with spine
larger than t = O(log k), whereas in order to prove a result like Theorem 1.5 in this
way, one would want to take (say) t = k/1000. This was where the matter stood for a few
years, until the work of Campos, Griffiths, Morris, and Sahasrabudhe (2023).
years, until the work of Campos, Griffiths, Morris, and Sahasrabudhe (2023).
By the way we update the sets, we certainly maintain the key property that (A, X)
and (B, X) are red and blue books, respectively, throughout the entire process, since
every time we add a vertex v to A (resp. B), we shrink X to the red (resp. blue)
neighborhood of v.
Using Algorithm 2.5, we can give an alternative proof of Theorem 1.3.
(†) Strictly speaking, we should write here (|X| − 1)/2, although the claimed bound (2.2) can also be proved inductively by judicious use of ceiling signs. However, from now on, we will start ignoring such additive ±1 terms. Of course they need to be carefully dealt with to obtain a correct proof, but they will always contribute a negligible error, which we will ignore. We will add the symbol (†) to mark the places where we omit such additive errors.
3. If yes, move v to A and shrink X to the red neighborhood of v. That is, update
A → A ∪ {v} and X → X ∩ NR (v), and keep B the same. Call this a red step.
4. If not, then v has at least(†) γ|X| blue neighbors in X. We now move v to B,
and shrink X to the blue neighborhood of v. That is, we update B → B ∪ {v}
and X → X ∩ NB (v), and keep A the same. Call this a blue step.
5. Return to step 1.
The point now is that we obtain the red Kk or blue Kℓ if |A| ⩾ k or |B| ⩾ ℓ, and
thus we may assume that we do fewer than k red steps and fewer than ℓ blue steps. X
shrinks by a factor of 1 − γ at every red step, and by a factor of γ at every blue step,
so throughout the process we have
|X| > (1 − γ)^k γ^ℓ N.
On the other hand, the process only terminates if |X| ⩽ 1, so this implies N < (1 − γ)^{−k} γ^{−ℓ}. One can check, by Stirling's approximation, that
\binom{k+ℓ}{ℓ} = 2^{o(k)} γ^{−ℓ} (1 − γ)^{−k}
for all ℓ ⩽ k, and hence this gives a contradiction if we choose N of the form 2^{o(k)} \binom{k+ℓ}{ℓ}.
This recovers Theorem 2.2 up to the subexponential error term.
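The Stirling estimate is easy to sanity-check numerically. The following Python snippet (illustrative, not from the source) compares log2 \binom{k+ℓ}{ℓ} with ℓ log2(1/γ) + k log2(1/(1 − γ)), taking γ = ℓ/(k + ℓ), the value that makes the estimate tight; the gap between the two sides is only logarithmic in k, i.e. it is absorbed by the 2^{o(k)} factor.

```python
import math

def log2_binom(n, k):
    """Exact log2 of a binomial coefficient, via lgamma."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1)
            - math.lgamma(n - k + 1)) / math.log(2)

for k, ell in [(500, 500), (2000, 1000), (10000, 10000)]:
    gamma = ell / (k + ell)
    lhs = log2_binom(k + ell, ell)
    rhs = ell * math.log2(1 / gamma) + k * math.log2(1 / (1 - gamma))
    gap = rhs - lhs
    # The gap is O(log k): negligible compared to k, hence 2^{o(k)}.
    assert 0 <= gap <= 2 * math.log2(k + ell)
```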
We are now ready to describe the book algorithm of Campos, Griffiths, Morris, and
Sahasrabudhe (2023). As before, we fix a two-coloring of E(KN ), and assume that there
is no monochromatic Kk ; our goal is to obtain a contradiction if N is sufficiently large.
Throughout the process, we maintain four disjoint sets A, B, X, Y , with the following
properties: (A, X) is a red book, (B, X) is a blue book, and (A, Y ) is another red
book(6) . Thus, the only difference from the Erdős–Szekeres algorithm is the presence
of the new set Y . At the end of the process, our goal is to output the pair (A, Y ),
and to prove that t := |A| and m := |Y | satisfy m ⩾ r(k − t, k), so that we can apply
Lemma 2.4 to obtain a contradiction. We initialize the process with A = B = ∅, and
X ⊔ Y an arbitrary partition of V (KN ) with(†) |X| = |Y |. By permuting the colors if
necessary, we may assume that at the beginning of the process, at least half the edges
between X and Y are red.
(6) Equivalently, we could say that (B, X) is a blue book and (A, X ∪ Y) is a red book.
(7) If we could really obtain such strong control on |Y|, we would show that r(k) ≲ 2^k, a dramatic improvement over Theorem 1.3. Unfortunately, and unsurprisingly, the devil is in the details, and a lot of work is needed to make such an approach work, and the extra complications yield a substantially weaker bound.
where eR (X, Y ) denotes the number of red edges with one endpoint in X and the other
in Y. Recall that, by assumption, we have p ⩾ 1/2 at the beginning of the process.
Note that every time we add a vertex to A or to B (and thus have to shrink X and
potentially Y ), this red density p might change. For our simplified exposition of the
proof of Theorem 1.5, we will make the following (completely unjustified) assumption.
Assumption 3.1. — At every step of the process, every vertex in X has exactly p|Y |
red neighbors in Y , and every vertex in Y has exactly p|X| red neighbors in X. In other
words, the bipartite graph of red edges between X and Y is bi-regular.
We stress again that X, Y, and p change throughout the process, but Assumption 3.1
asserts that whenever such a change happens, we magically end up back with the same
bi-regularity.
While Assumption 3.1 is clearly a bogus assumption, it is actually possible to (essen-
tially) make it rigorous. Indeed, the definition of p implies that the vertices in X have,
on average, p|Y | red neighbors in Y . A basic but important observation, used frequently
in extremal combinatorics, is that one can often convert such average degree conditions
to minimum or maximum degree conditions, by deleting a few “outlier” vertices. In
the rigorous proof of Theorem 1.5, one must repeatedly “clean” X by removing such
outliers, and thus one can indeed maintain an approximate version of Assumption 3.1,
at least ensuring that all vertices in X have roughly the same red degree(8) . However, for
our exposition, we ignore these important technicalities, and stick with Assumption 3.1.
(8) It is much harder to ensure degree-regularity in both X and Y simultaneously. Luckily, it turns out that degree-regularity in Y is substantially less important in the argument, and in the formal proof one doesn't even ensure an approximate version of it. In its place, one uses a judicious choice of the vertex v.
In Algorithm 2.5, we were always able to do either a red or a blue step, since every
vertex in X has at least |X|/2 neighbors in X in one of the colors(†). However, if we
require that our red vertex v be prosperous, then we may be in a position where neither
a red nor a blue step is possible. Namely, we get stuck if all vertices in X have at least
|X|/2 red neighbors in X, but none of them is prosperous.
In this case, we implement a density-boost step, which is one of the other main
innovations of Campos, Griffiths, Morris, and Sahasrabudhe (2023). Pick a vertex
v ∈ X, and consider the following picture.
[Figure: the chosen vertex v, the set T := NR(v) ∩ X, and the set U := NR(v) ∩ Y; the red density between T and U is less than p − α.]
Since v is not prosperous, the red edge density between T := NR (v) ∩ X and U :=
NR (v) ∩ Y must be less than p − α. However, by Assumption 3.1, every vertex in U has
p|X| red neighbors in X. Therefore, setting S := NB (v) ∩ X, we find that(†)
p|X||U | = eR (X, U ) = eR (T, U ) + eR (S, U ) < (p − α)|T ||U | + eR (S, U ).
Rearranging, we conclude that
eR(S, U) > |U| (p|X| − (p − α)|T|).
Let β := |S|/|X|, so that β records what fraction of the edges from v to the rest of X are
blue. Then |S| = β|X| and(†) |T| = (1 − β)|X|, and the above can be rewritten as
eR(S, U) > |S||U| (p/β − (p − α)(1 − β)/β) = |S||U| (p + α · (1 − β)/β),
which implies
(3.1) dR(S, U) > p + α · (1 − β)/β.
Note too that since we cannot do a blue step, we must have β ⩽ 1/2, implying that
dR(S, U) > p + α. In other words, in the bad situation where we cannot perform
a red or a blue step, we can perform a density-boost step, where we replace X by
S = NB(v) ∩ X, replace Y by U = NR(v) ∩ Y, and thus boost the density from p to at
least p + α · (1 − β)/β ⩾ p + α.
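The rearrangement behind (3.1) is a one-line algebraic identity, which we can sanity-check numerically; the snippet below (illustrative, with hypothetical parameter values) verifies that (p − (p − α)(1 − β))/β equals p + α(1 − β)/β, and that β ⩽ 1/2 forces the boosted density to exceed p + α.

```python
# Numeric check of the density-boost identity
#   (p - (p - alpha)*(1 - beta)) / beta == p + alpha*(1 - beta)/beta
# for a few illustrative parameter values (p = red density, alpha = the
# prosperity threshold, beta = fraction of blue neighbors of v).
for p in (0.5, 0.55, 0.7):
    for alpha in (0.01, 0.05):
        for beta in (0.1, 0.3, 0.5):
            lhs = (p - (p - alpha) * (1 - beta)) / beta
            rhs = p + alpha * (1 - beta) / beta
            assert abs(lhs - rhs) < 1e-12
            # Since beta <= 1/2, the boosted density is at least p + alpha.
            assert rhs >= p + alpha
```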
Note that density-boost steps are expensive, in that they shrink X and Y , but don’t
actually make progress by increasing |A| or |B|. In particular, we don’t a priori have
any control on how many density-boost steps we perform. Luckily, there is a simple fix
to this problem: since we are in any case updating X → X ∩ NB (v) in a density-boost
step, we may add v to B for free, while maintaining the property that (B, X) is a blue
book. That is, a density-boost step can also be made a type of blue step, and thus we
necessarily perform at most k density-boost steps without creating a blue Kk .
The final piece we need before formally defining the book algorithm is to choose α,
which determines the threshold above which a vertex is considered prosperous. Note
that every red step may decrease p by α, so if we end up doing up to k red steps, we
may decrease p from its initial value of 1/2 to 1/2 − αk. Moreover, whenever we do a red
step, we also shrink Y by a factor of (the current value of) p. In particular, if p ever
drops below (say) 1/4, we are in big trouble: then Y shrinks by a factor of 4 at every
step, and we have no real hope of proving a bound stronger than r(k) ⩽ 4^k. As such,
we want to pick α ⩽ ε/k, so that even after doing k red steps, we have not meaningfully
decreased p below its initial value. Here, one can think of ε as a tiny absolute constant,
although in the final analysis we will actually pick ε to tend to 0 slowly with k.
Unfortunately, there is a trade-off. Recall that we have very little control over the
effect of the density-boost steps, because these are the steps we do as a last resort.
In fact, essentially our only way of bounding their total effect is the observation that
p ⩽ 1 throughout the entire process, which should imply that we cannot do too many
density-boost steps, as that would drive the red density up too high. The problem is
that a density-boost step only increases p by roughly α, so if we pick α ⩽ ε/k, then even
if we do k density-boost steps (the maximum possible number), we will only increase
the density by ε. In particular, we have no hope of reaching the threshold of p = 1,
where we finally gain some control over the density-boost steps.
The way to resolve this apparent contradiction is to pick α adaptively. Indeed, suppose
that at some point in the process, we have reached a red density of, say, p = 0.51. At
this point, it doesn't make sense to have the cutoff be α = ε/k; we wouldn't even mind
losing an absolute constant of 1/100 in the density, since that will only bring us back to
our original value of p! So we will instead pick α to be dependent on our current value
of p; namely, we set
(3.2) α(p) := ε/k if p ⩽ 1/2 + 1/k, and α(p) := ε(p − 1/2) otherwise.
Again, the point of this is that, if we are at some step of the process where p > 1/2, then
we can afford to lose more in the density without ever dropping p into the "danger zone"
of being substantially smaller than 1/2. The advantage of this is that the amount we win
in a density-boost step is itself proportional to α = α(p). So if we have already done
some number of density-boost steps, such that p > 1/2, each subsequent density-boost
boosts the density even further, at an exponential rate, thus rapidly bringing us closer
to the threshold p = 1.
For future reference, the following table records how the key parameters change during
the execution of the book algorithm, following the discussion above.
Table 3.1. How the various parameters evolve during Algorithm 3.2. Dashes
denote quantities that are unchanged. In general, the entries in the table are
lower bounds, e.g. a density-boost step may increase p by more than α · (1 − β)/β, and
a red step may shrink X to more than half of its previous size.
Lemma 3.3. — We have p ⩾ 1/2 − ε throughout the entire process.
Proof. — As discussed above, every blue step keeps p constant (by Assumption 3.1),
every density-boost step can only increase p, and every red step decreases p by at most
α(p). Additionally, the choice of α(p) shows that p − α(p) ⩾ 1/2 whenever p > 1/2 + 1/k,
whereas α(p) = ε/k whenever p ⩽ 1/2 + 1/k. Since we do t ⩽ k red steps, p can never drop
below 1/2 − t · ε/k ⩾ 1/2 − ε.
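To see the adaptive choice (3.2) in action, here is a small simulation (illustrative only, with hypothetical parameter values): starting from p = 1/2 and repeatedly applying the worst-case red-step update p → p − α(p), the density never falls below 1/2 − ε, exactly as Lemma 3.3 asserts.

```python
def alpha(p, eps, k):
    """The adaptive prosperity threshold (3.2)."""
    return eps / k if p <= 0.5 + 1 / k else eps * (p - 0.5)

k = 10_000
eps = k ** (-0.25)  # the choice eps = k^{-1/4} from the text

# Worst case for Lemma 3.3: k consecutive red steps, each costing alpha(p).
p = 0.5
for _ in range(k):
    p -= alpha(p, eps, k)
assert p >= 0.5 - eps - 1e-9

# And once p exceeds 1/2 + 1/k, a single red step cannot drag it below 1/2.
for q in (0.51, 0.6, 0.9):
    assert q - alpha(q, eps, k) >= 0.5
```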
It will now be convenient to pick ε = k^{−1/4}, although we note that this choice is not
particularly important; many functions of k which tend to 0 neither too slowly nor too
quickly would work.
Lemma 3.4. — At the end of the process, we have |Y| ⩾ 2^{−t−s−o(k)} N.
Proof. — Y is unchanged by every blue step. On the other hand, during each red or
density-boost step, we decrease Y by a factor of p, by Assumption 3.1. By Lemma 3.3,
we have that p ⩾ 1/2 − ε at every such step, hence
|Y| ⩾ (1/2 − ε)^{t+s} · N/2 = 2^{−t−s−o(k)} N,
where we plug in our choice of ε and recall that we start the process with(†) |Y| = N/2.
We next turn to bounding |X| at the end of the process. Just as in the Erdős–Szekeres
algorithm, the main point of this is to estimate how many steps we do, since we recall
that the process terminates when |X| ⩽ 1.
Recall that at each density-boost step, we shrink X by a factor of β, where β is defined
as the fraction |NB(v) ∩ X|/|X| of blue neighbors of the currently chosen vertex v. Let
β1, . . . , βs be the sequence of values of β for each of the s density-boost steps. Let β be
the harmonic mean of β1, . . . , βs, that is, define β by
1/β = (1/s) · Σ_{i=1}^s 1/βi.
Lemma 3.5. — At the end of the process, we have
|X| ⩾ 2^{−t−b−o(k)} β^s N.
Proof. — Every red or blue step shrinks X by at most a factor(†) of 2, hence the factor
of 2^{−t−b}. On the other hand, the ith density-boost step decreases |X| by a factor of βi.
The inequality of arithmetic and geometric means implies that
1/β = (1/s) · Σ_{i=1}^s 1/βi ⩾ (Π_{i=1}^s 1/βi)^{1/s},
and hence Π_{i=1}^s βi ⩾ β^s. Together with the fact that we begin the process with(†)
|X| = N/2, this yields the claimed bound.
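The AM–GM step above can be illustrated numerically (an illustrative check, not from the source): for any positive β1, . . . , βs, the product Π βi is at least the sth power of the harmonic mean β.

```python
import random

def harmonic_mean(betas):
    """The harmonic mean: 1/beta = (1/s) * sum of 1/beta_i."""
    s = len(betas)
    return s / sum(1 / b for b in betas)

random.seed(0)
for _ in range(1000):
    s = random.randint(1, 20)
    betas = [random.uniform(0.01, 0.5) for _ in range(s)]
    hm = harmonic_mean(betas)
    prod = 1.0
    for b in betas:
        prod *= b
    # AM-GM applied to the reciprocals 1/beta_i gives prod >= hm**s.
    assert prod >= hm ** s * (1 - 1e-9)
```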
The final, and perhaps most important, result we need is an estimate on the number
of density-boost steps. As discussed above, we can get a good estimate on this quantity
because of the “dynamic” choice of α; this is the content of the next lemma, which is
called the zig-zag lemma by Campos, Griffiths, Morris, and Sahasrabudhe (2023).
Let qfinal denote the value of q at the end of the algorithm, and let qinitial be the value
of q at the beginning of the algorithm. Multiplying (3.3) over all steps of the algorithm,
we find that
qfinal/qinitial ⩾ (1 − ε)^t · Π_{i=1}^s (1 + ε · (1 − βi)/βi),
since we get a contribution of 1 − ε from each of the t red steps and a contribution
of 1 + ε · (1 − βi)/βi from the ith density-boost step. Combining this inequality with the
approximation 1 + x ≈ e^x, an approximation that is valid for sufficiently small(9) |x|,
we find that
(3.4) qfinal/qinitial ≳ e^{−εt} · exp(ε · Σ_{i=1}^s (1 − βi)/βi) = exp(ε · (−t + Σ_{i=1}^s (1 − βi)/βi)).
We have that qfinal ⩽ 1/2, since p ⩽ 1 throughout the whole process. On the other hand,
since we are assuming that p ⩾ 1/2 + 1/k throughout, we have that qinitial ⩾ 1/k. Therefore,
qfinal/qinitial ⩽ k/2 ⩽ k. Plugging this into (3.4) and taking logarithms, we find that
log k ⩾ log(qfinal/qinitial) ≳ ε · (−t + Σ_{i=1}^s (1 − βi)/βi),
implying that
Σ_{i=1}^s (1 − βi)/βi ≲ t + (log k)/ε = t + o(k),
where we plug in our choice of ε = k^{−1/4} in the final equality.
This proof worked under the assumption that we remain throughout in the range
p ⩾ 1/2 + 1/k. Let us now work in the complementary regime, where p < 1/2 + 1/k throughout
the whole process. In this case, recalling the definition of α(p) from (3.2), we have
(3.5) p′ − p ⩾ 0 when we do a blue step; p′ − p ⩾ −ε/k when we do a red step; and p′ − p ⩾ (ε/k) · (1 − βi)/βi when we do the ith density-boost step.
(9) This approximation can be made rigorous, but we're still cheating in this derivation of (3.4), since we are implicitly assuming that at every step the approximation is valid. A correct proof of this lemma would need to separate out the contribution from the steps where βi is very small, and thus where such an approximation is not accurate.
(10) Again, there is some cheating going on here—one can only obtain the claimed estimate if s is not too small as a function of k, in order to absorb the error terms. In the formal proof, one has to separate into cases: the bound (3.7) is valid if s is not too small, whereas if s is very small one can complete the proof of Theorem 1.5 via a simpler analysis.
Of course, if we recall our original strategy, it is way too much to hope for that the
maximum of G is less than 1. Indeed, the whole point of the book algorithm was to
output the book (A, Y ), and to ensure that its parameters are good enough to apply
Lemma 2.4.
What are the parameters of this book? Well, we have that |A| = t by definition, and
m := |Y| ⩾ 2^{−t−s−o(k)} N
by Lemma 3.4. By Lemma 2.4, we know that if m ⩾ r(k−t, k), we find a monochromatic
Kk , yielding our desired contradiction. Thus, we may assume that m < r(k − t, k),
implying that
(3.9) N ⩽ 2^{t+s+o(k)} m < 2^{t+s+o(k)} r(k − t, k).
By Theorem 2.2, we know that
r(k − t, k) ⩽ \binom{2k−t}{k−t}.
A useful upper bound on binomial coefficients is that \binom{a}{b} ⩽ 2^{aH(b/a)}, where H(z) :=
−z log2 z − (1 − z) log2(1 − z) is the binary entropy function. Plugging this in, we find
that
log2 r(k − t, k) ⩽ log2 \binom{2k−t}{k−t} ⩽ (2k − t) H((k − t)/(2k − t)) = k (2 − x) H((1 − x)/(2 − x)).
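The entropy bound \binom{a}{b} ⩽ 2^{aH(b/a)} used here is easy to verify exhaustively for small parameters; the following Python snippet (illustrative, not from the source) does exactly that.

```python
from math import comb, log2

def H(z):
    """Binary entropy function, with H(0) = H(1) = 0."""
    if z <= 0 or z >= 1:
        return 0.0
    return -z * log2(z) - (1 - z) * log2(1 - z)

# Check binom(a, b) <= 2^(a * H(b/a)) for all 0 <= b <= a <= 200.
# (The tiny multiplicative slack guards against floating-point rounding.)
for a in range(1, 201):
    for b in range(a + 1):
        assert comb(a, b) <= 2 ** (a * H(b / a)) * (1 + 1e-9)
```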
Taking logarithms of (3.9) and dividing by k shows that
C − o(1) ⩽ −1 + (x + y) + (2 − x) H((1 − x)/(2 − x)) =: F(x, y).
2−x
Putting all of this together, we have shown that either we derive the claimed contradic-
tion, or C − o(1) ⩽ min{F (x, y), G(x, y)}. Again, we have the freedom to choose C, so
we can obtain the desired contradiction if we set C to be larger than the maximum of
min{F(x, y), G(x, y)} on the square [0, 1]^2. In particular, as our goal is to pick C < 1,
we are done if min{F (x, y), G(x, y)} < 1 for all x, y ∈ [0, 1]. Here is a contour plot of F :
This looks great! The areas where F is large seem to be different from the areas where
G is large, so there should be no problem to show that their maximum is always strictly
less than 1. In fact, here are the regions where F > 1 and G > 1.
Bad news! There’s a big red region where both functions are greater than 1, and our
whole proof strategy fails. In fact, one can check that min{F (x, y), G(x, y)} attains
a maximum value of roughly 1.054. That is, in order to obtain a contradiction, the
smallest value of C we could pick is 1.054, and thus this whole complex proof is only
able to show that r(k) ⩽ 2^{2.054k} ≈ 4.15^k, which is worse than the bound of Theorem 1.3.
The fact that min{F(x, y), G(x, y)} > 1 for some (x, y) ∈ [0, 1]^2 is a fundamental
obstruction to this approach. In order to overcome it, we will use two tricks, both of
which involve tweaking the book algorithm.
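The failure can be reproduced numerically. The following Python snippet (illustrative, not from the source) grid-searches the maximum of min{F(x, y), G(x, y)} over the open unit square, where G is the µ = 1/2 specialization of the function Gµ that appears later in the analysis, namely G(x, y) = −1 + x + (1 − y) + y log2((x + y)/y); the maximum indeed exceeds 1, consistent with the value of roughly 1.054 quoted above.

```python
from math import log2

def H(z):
    """Binary entropy, with H(0) = H(1) = 0."""
    if z <= 0 or z >= 1:
        return 0.0
    return -z * log2(z) - (1 - z) * log2(1 - z)

def F(x, y):
    return -1 + (x + y) + (2 - x) * H((1 - x) / (2 - x))

def G(x, y, mu=0.5):
    # G_mu(x, y); mu = 1/2 recovers the function G of this section.
    return (-1 + x * log2(1 / (1 - mu)) + (1 - y) * log2(1 / mu)
            + y * log2((x + y) / y))

# Grid search for the maximum of min{F, G} over the open square (0, 1)^2.
n = 400
best = max(min(F(i / n, j / n), G(i / n, j / n))
           for i in range(1, n) for j in range(1, n))
# The bad region where both F and G exceed 1 really exists.
assert 1.0 < best < 1.06
```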
To achieve this, we do the following. We pick a number µ ∈ [0, 1], which will be fixed
throughout the argument. In step 3 of Algorithm 3.2, we now perform a blue step if
some vertex in X has at least µ|X| blue neighbors in X; otherwise, we proceed to the
subsequent steps of the algorithm unchanged. An important effect of this choice is that
now, when we perform the ith density-boost step, the parameter βi is constrained to
be at most µ, and thus also β ⩽ µ at the end of the process. In particular, if we pick
µ < 1/2, we will have accomplished our goal of decreasing s relative to t. This suggests
we should pick µ very small, but of course there is a trade-off—if µ is very small then
every blue step decreases |X| by a lot, and thus the process will terminate quickly. To
balance these two effects, we want to pick µ to be neither too large nor too small. For
completeness, here is the description of our modified book algorithm.
Table 4.1. How the various parameters evolve during Algorithm 4.1. The
only difference from Table 3.1 is that blue and red steps shrink X by factors
of µ and 1 − µ, respectively.
In this modified book algorithm, Lemmas 3.3, 3.4, 3.6 and 3.7 remain true; the only
change is that Lemma 3.5 needs to be modified to the following statement, reflecting
the fact that each blue (resp. red) step shrinks X by a factor(†) of µ (resp. 1 − µ) in the
worst case. The proof is otherwise identical to that of Lemma 3.5.
Lemma 4.2 (Modified Lemma 3.5). — At the end of the process, we have
|X| ⩾ 2^{−o(k)} (1 − µ)^t µ^b β^s N.
In particular, since b + s ⩽ k, we have
|X| ⩾ 2^{−o(k)} (1 − µ)^t µ^{k−s} β^s N.
Since the process terminates when |X| ⩽ 1, we conclude from Lemma 4.2 that
(4.1) N ⩽ 2^{o(k)} (1 − µ)^{−t} µ^{−(k−s)} β^{−s} ⩽ 2^{o(k)} (1 − µ)^{−t} µ^{−(k−s)} ((s + t)/s)^s,
where the final inequality follows from the lower bound on β in Lemma 3.7. Taking
logarithms and dividing by k, we conclude that
C − o(1) ⩽ −1 + x log2(1/(1 − µ)) + (1 − y) log2(1/µ) + y log2((x + y)/y) =: Gµ(x, y).
Note that in the case µ = 1/2, we precisely recover the previous function G, which of
course makes sense as we are then recovering Algorithm 3.2. Here are contour plots of
Gµ for µ ∈ {1/10, 2/10, 3/10, 4/10}.
And here are pictures of the regions where F > 1 and Gµ > 1, for µ ∈ {1/10, 2/10, 3/10, 4/10}.
It looks like we're already done at µ = 4/10 = 2/5, but unfortunately we're not: one can
check that min{F(x, y), G_{2/5}(x, y)} attains a maximum value of 1.0017, hence we only
obtain a bound of r(k) ⩽ 4.006^k. Here is a closer view of what happens at µ = 2/5:
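This numerical claim can be reproduced (non-rigorously) with a grid search. The sketch below is our own code, not part of the original analysis: it implements G_µ exactly as defined above, together with F(x, y) = −1 + (x + y) + (2 − x)H((1 − x)/(2 − x)) from the diagonal analysis, and maximizes min{F, G_{2/5}} over [0, 1]².

```python
import numpy as np

def H(p):
    """Binary entropy; clipping makes H(0) = H(1) = 0 up to negligible error."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def F(x, y):
    # F(x, y) = -1 + (x + y) + (2 - x) H((1 - x)/(2 - x))
    return -1 + x + y + (2 - x) * H((1 - x) / (2 - x))

def G(x, y, mu):
    # G_mu(x, y) = -1 + x log2(1/(1-mu)) + (1-y) log2(1/mu) + y log2((x+y)/y),
    # with the convention y log2((x+y)/y) = 0 at y = 0.
    ratio = np.where(y > 0, (x + y) / np.where(y > 0, y, 1.0), 1.0)
    return (-1 + x * np.log2(1 / (1 - mu))
            + (1 - y) * np.log2(1 / mu) + y * np.log2(ratio))

def grid_max(f, x0, x1, y0, y1, n=801):
    X, Y = np.meshgrid(np.linspace(x0, x1, n), np.linspace(y0, y1, n))
    V = f(X, Y)
    i = np.unravel_index(np.argmax(V), V.shape)
    return float(V[i]), float(X[i]), float(Y[i])

f = lambda X, Y: np.minimum(F(X, Y), G(X, Y, 2 / 5))
m, ax, ay = grid_max(f, 0, 1, 0, 1)
for _ in range(3):  # refine the search around the current maximizer
    m, ax, ay = grid_max(f, max(ax - 0.01, 0), min(ax + 0.01, 1),
                         max(ay - 0.01, 0), min(ay + 0.01, 1))
print(m)  # slightly above 1, consistent with the value 1.0017 reported above
```

Of course, a grid search is not a proof; the point is only to make the failure at µ = 2/5 easy to see.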
But we’re definitely making progress! The bad red region is extremely small now, and
our maximum value of min{F, Gµ } is extraordinarily close to 1. Unfortunately, one
can check that no choice of µ will actually decrease this value below 1—which would
complete the proof—so another idea is needed.
by (2/5)^k = 0.4^k, which is a much more significant decrease. Hence we may expect the
trade-offs to work well for us.
For completeness, here is our final book algorithm, suited for upper-bounding r(k, ℓ).
We set µ = ℓ/(k + ℓ) and ε = k^{−1/4}. We initialize A = B = ∅, and let X ⊔ Y be an
arbitrary partition of V(K_N) into two equally-sized parts. Let p_initial = d_R(X, Y) be the density
of red edges between X and Y at the beginning of the process, and define

(4.2) α(p) := ε/k if p ⩽ p_initial + 1/k, and α(p) := ε(p − p_initial) otherwise.
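To make (4.2) concrete, here is a small Python sketch of the step-size function α; the name `make_alpha` is our own, hypothetical choice, and the parameters mirror the definitions above.

```python
def make_alpha(k, p_initial):
    """Return the step-size function alpha(p) of (4.2) for given k and p_initial."""
    eps = k ** (-0.25)  # epsilon = k^{-1/4}
    def alpha(p):
        # Tiny fixed increment while the red density stays near its starting
        # value; an increment proportional to the gain once it has grown.
        if p <= p_initial + 1.0 / k:
            return eps / k
        return eps * (p - p_initial)
    return alpha
```

For instance, with k = 10000 and p_initial = 0.5, we get ε = 0.1, so α(0.5) = 10⁻⁵ while α(0.6) = 0.01.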
The algorithm is then as follows.
Apart from the choice of µ = ℓ/(k + ℓ), this algorithm is identical to Algorithm 4.1, except
that we now stop when |B| ⩾ ℓ, rather than when |B| ⩾ k as before. In particular,
Table 4.1 still gives the relevant changes in the parameters. Unfortunately, there is an
additional complication introduced by moving to the off-diagonal setting. Before, when
we sought to upper-bound r(k), we could assume that the initial red density p_initial was
at least 1/2, by simply swapping the roles of the two colors if necessary. However, once
we are in the off-diagonal setting, this is no longer allowed, and we may have no control
over p_initial. Let us make another completely unjustified assumption.
Assumption 4.4. — At the beginning of the process, we have p_initial ⩾ k/(k + ℓ) = 1 − µ.
Note that this is a natural assumption, since Algorithm 2.6 "predicts" that k/(k + ℓ) is
roughly the correct red density to expect, in the sense that in the analysis of Algo-
rithm 2.6, this red density is the worst-case occurrence. That is, if Assumption 4.4
is false "robustly", then Algorithm 2.6 should already prove a stronger bound than
r(k, ℓ) ⩽ \binom{k+ℓ}{ℓ}. In fact, one can essentially force Assumption 4.4 to hold because of
such an argument; if we start with p_initial < k/(k + ℓ), we can run a number of steps of the
Erdős–Szekeres algorithm, until we end up with p ⩾ k/(k + ℓ). If this never happens, then
Algorithm 2.6 itself will prove that r(k, ℓ) ≪ \binom{k+ℓ}{ℓ}.
Given Assumption 4.4, we can conclude that Lemmas 3.6, 3.7 and 4.2 remain true for
Algorithm 4.3. Moreover, we can prove the following modified version of Lemmas 3.3
and 3.4 (combined into a single statement), whose proof is essentially unchanged.
Lemma 4.5 (Modified Lemmas 3.3 and 3.4). — At the end of the process, we have
|Y| ⩾ (1 − µ)^{t+s+o(k)} N.
With all of this setup, we are finally able to prove(11) an exponentially-improved upper
bound on r(k, ℓ).
Theorem 4.6. — We have r(k, ℓ) ⩽ 2^{−(2/9)ℓ+o(k)} \binom{k+ℓ}{ℓ} for all ℓ ⩽ k/4.
Note that in this theorem, the gain over Theorem 2.2 is exponential in ℓ, and not
in k. This is natural, and the best we could hope for. Indeed, if ℓ = o(k), then the
bound in Theorem 2.2 is already subexponential in k, so it is impossible to improve it
by a factor of 2^{−δk+o(k)} for any fixed δ > 0.
Proof of Theorem 4.6. — Let C be a constant that we will optimize later, and let
N = 2^{(1+C)k}. We fix a two-coloring of E(K_N), and assume for contradiction that there
is no red K_k or blue K_ℓ in this coloring. We apply Algorithm 4.3, with µ = ℓ/(k + ℓ) ⩽ 1/5.
Note that this choice of µ implies that ℓ/k = µ/(1 − µ). If we output that A is a red K_k or B
is a blue Kℓ , then we have obtained a contradiction, hence we can assume this does not
happen. Therefore, the process only terminates when |X| ⩽ 1, and we also have that
b + s ⩽ ℓ. Plugging this into Lemma 4.2, we find that
N ⩽ 2^{o(k)} (1 − µ)^{−t} µ^{−(ℓ−s)} β^{−s} ⩽ 2^{o(k)} (1 − µ)^{−t} µ^{−(ℓ−s)} \binom{s+t}{s}.
Note that we obtain a better exponent on µ than we had in (4.1), because the assumption
b + s ⩽ ℓ is stronger than what we had before; this is precisely the extra strength gained
by moving to the off-diagonal setting. Taking logarithms and dividing by k shows that
C − o(1) ⩽ −1 + x log₂(1/(1 − µ)) + (µ/(1 − µ) − y) log₂(1/µ) + y log₂((x + y)/y) =: G̃_µ(x, y),
(11) One should really write "prove", since everything here is dependent on the unjustified Assumptions 3.1 and 4.4, as well as on the key Lemma 3.6, which we did not rigorously prove. Additionally,
the bound in Theorem 4.6 is stronger than any result proved by Campos, Griffiths, Morris, and Sahasrabudhe (2023), and this too is a consequence of the fact that we are being unrigorous, especially
with the verification of certain numerical inequalities. However, Theorem 4.6 is true; a rigorous proof
of a stronger statement is given by Gupta, Ndiaye, Norin, and Wei (2024, Corollary 6).
where the only difference between G_µ and G̃_µ is the term µ/(1 − µ) in the latter, which
is simply 1 in the former. It comes from the ℓ in the exponent; upon dividing by k we
obtain ℓ/k = µ/(1 − µ).
Additionally, by Lemma 4.5, we have
|Y | ⩾ (1 − µ)t+s+o(k) N.
If |Y | ⩾ r(k − t, ℓ), then we obtain a contradiction by Lemma 2.4, so we may assume
that |Y | < r(k − t, ℓ). Taking logarithms and dividing by k again shows that
(4.3) C − o(1) ⩽ −1 + (x + y) log₂(1/(1 − µ)) + (1/k) log₂ r(k − t, ℓ).
By Theorem 1.3, we have
log₂ r(k − t, ℓ) ⩽ log₂ \binom{k−t+ℓ}{ℓ}
⩽ (k − t + ℓ) H(ℓ/(k − t + ℓ))
= k · (1 − x + µ/(1 − µ)) H((µ/(1 − µ))/(1 − x + µ/(1 − µ))).
Plugging this into (4.3) shows that
C − o(1) ⩽ −1 + (x + y) log₂(1/(1 − µ)) + (1 − x + µ/(1 − µ)) H((µ/(1 − µ))/(1 − x + µ/(1 − µ))) =: F̃_µ(x, y).
We are no longer trying to beat the bound r(k) ⩽ 4^k, so our goal is no longer to obtain
a contradiction for some C < 1. Instead, we are comparing to (1/k) log₂ \binom{k+ℓ}{ℓ}, which equals
(1 + µ/(1 − µ))H(µ) + o(1), and thus our goal is to obtain a contradiction for some fixed
C < (1 + µ/(1 − µ))H(µ) − 1. That is, what we would like to show is that for all µ ⩽ 1/5, we have
min{F̃_µ(x, y), G̃_µ(x, y)} < (1 + µ/(1 − µ))H(µ) − 1 for all x, y ∈ [0, 1]. In fact, we hope to
prove this inequality with some slack, so that we gain an improvement in the exponent.
Recall that our goal is to prove a gain over Theorem 2.2 that is exponential in ℓ. As
such, the slack we get in this inequality should scale like ℓ/k = µ/(1 − µ). That is, we would
like to prove an inequality of the form

min{F̃_µ(x, y), G̃_µ(x, y)} < (1 + µ/(1 − µ)) H(µ) − 1 − δ · µ/(1 − µ),

where δ > 0 is some absolute constant; such a bound would prove that r(k, ℓ) ⩽
2^{−δℓ+o(k)} \binom{k+ℓ}{ℓ}.
Such an inequality holds! In fact, one can check that for µ ⩽ 1/5, we may take δ as
large as 2/9. Indeed, here is a plot of the regions where F̃_µ > (1 + µ/(1 − µ))H(µ) − 1 − (2/9) · µ/(1 − µ)
and G̃_µ > (1 + µ/(1 − µ))H(µ) − 1 − (2/9) · µ/(1 − µ), respectively, for µ = 1/5. One can verify that the
regions only move further apart as µ decreases, so µ = 1/5 is the worst case.
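This check, too, can be explored numerically. The sketch below is again our own non-rigorous code, with F̃_µ and G̃_µ exactly as defined above: a grid search at the worst case µ = 1/5 confirms that min{F̃_µ, G̃_µ} stays below (1 + µ/(1 − µ))H(µ) − 1 − (2/9) · µ/(1 − µ) on all of [0, 1]².

```python
import numpy as np

def H(p):
    """Binary entropy, clipped so that H(0) = H(1) = 0 numerically."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

mu = 1 / 5
r = mu / (1 - mu)  # r = l/k = 1/4 at the worst case mu = 1/5

def F_tilde(x, y):
    # F~_mu(x, y) = -1 + (x+y) log2(1/(1-mu)) + (1-x+r) H(r/(1-x+r))
    return -1 + (x + y) * np.log2(1 / (1 - mu)) + (1 - x + r) * H(r / (1 - x + r))

def G_tilde(x, y):
    # G~_mu(x, y) = -1 + x log2(1/(1-mu)) + (r-y) log2(1/mu) + y log2((x+y)/y),
    # with the convention y log2((x+y)/y) = 0 at y = 0.
    ratio = np.where(y > 0, (x + y) / np.where(y > 0, y, 1.0), 1.0)
    return (-1 + x * np.log2(1 / (1 - mu))
            + (r - y) * np.log2(1 / mu) + y * np.log2(ratio))

X, Y = np.meshgrid(np.linspace(0, 1, 1501), np.linspace(0, 1, 1501))
m = float(np.max(np.minimum(F_tilde(X, Y), G_tilde(X, Y))))
target = (1 + r) * H(mu) - 1 - (2 / 9) * r
print(m < target)  # expect True: the inequality holds with delta = 2/9 at mu = 1/5
```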
This shows that we do indeed get a contradiction if we set C = (1 + µ/(1 − µ))H(µ) − 1 − (2/9) · µ/(1 − µ),
proving the bound

r(k, ℓ) ⩽ 2^{(1+C)k} = 2^{((1 + µ/(1 − µ))H(µ) − (2/9) · µ/(1 − µ) + o(1))k} = 2^{−(2/9)ℓ+o(k)} \binom{k+ℓ}{ℓ}

for all ℓ ⩽ k/4.
Proof. — Let C be a constant that we will optimize later. Let N = 2^{(1+C)k}, and fix a
two-coloring of E(KN ), which we assume for contradiction has no monochromatic Kk .
We run Algorithm 4.1 with µ = 2/5. Thanks to Theorem 4.6 (plus Theorem 2.2), we
know that

r(k − t, k) ⩽ \binom{2k−t}{k−t} if t < (3/4)k, and r(k − t, k) ⩽ 2^{−(2/9)(k−t)+o(k)} \binom{2k−t}{k−t} if t ⩾ (3/4)k.
Recall that we obtain a contradiction if |Y | ⩾ r(k − t, k) at the end of the process, hence
we may assume that |Y | < r(k − t, k). Combining this with Lemma 3.4(12) , we see that
we get a contradiction if
C − o(1) ⩽ −1 + (x + y) + (1/k) log₂ r(k − t, k)
⩽ −1 + (x + y) + (2 − x) H((1 − x)/(2 − x)) if x < 3/4, and
⩽ −1 + (x + y) − (2/9)(1 − x) + (2 − x) H((1 − x)/(2 − x)) if x ⩾ 3/4;
that is,
C − o(1) ⩽ F(x, y) − (2/9)(1 − x) · 1_{x⩾3/4} =: F̂(x, y),
where 1_{x⩾3/4} denotes the indicator function for the event x ⩾ 3/4. In particular, it suffices
for us to prove that min{F̂(x, y), G_{2/5}(x, y)} ⩽ 1 − δ for all x, y ∈ [0, 1], where δ > 0 is a
constant that will end up in the exponent in N.
This indeed works! Here are the plots of where F̂ and G_{2/5} are greater than 1; the
second plot is just zoomed in to show the "dangerous area", where the two regions no
longer intersect.
In fact, one can check that max_{x,y∈[0,1]} min{F̂(x, y), G_{2/5}(x, y)} < 0.985. Therefore, we
obtain a contradiction if we set C = 0.985 = 1 − 3/200, proving that r(k) ⩽ 2^{(2−3/200+o(1))k},
as claimed.
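As before, one can explore this claim with a grid search; the sketch below is our own code, with F̂ and G_{2/5} as defined above. Pinning down the exact constant 0.985 is part of the numerical verification that, as footnote (11) admits, we are treating informally; the sketch only checks the qualitative point that matters, namely that the maximum of min{F̂, G_{2/5}} is strictly below 1.

```python
import numpy as np

def H(p):
    """Binary entropy, clipped so that H(0) = H(1) = 0 numerically."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def F(x, y):
    # F(x, y) = -1 + (x + y) + (2 - x) H((1 - x)/(2 - x))
    return -1 + x + y + (2 - x) * H((1 - x) / (2 - x))

def F_hat(x, y):
    # F^(x, y) = F(x, y) - (2/9)(1 - x) * 1_{x >= 3/4}
    return F(x, y) - (2 / 9) * (1 - x) * (x >= 0.75)

def G25(x, y):
    # G_{2/5}(x, y), with the convention y log2((x+y)/y) = 0 at y = 0
    mu = 2 / 5
    ratio = np.where(y > 0, (x + y) / np.where(y > 0, y, 1.0), 1.0)
    return (-1 + x * np.log2(1 / (1 - mu))
            + (1 - y) * np.log2(1 / mu) + y * np.log2(ratio))

X, Y = np.meshgrid(np.linspace(0, 1, 1501), np.linspace(0, 1, 1501))
m = float(np.max(np.minimum(F_hat(X, Y), G25(X, Y))))
print(m < 1)  # expect True: strictly below 1, which is what gives the exponential gain
```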
(12) We are back to the diagonal setting, so we may assume that p_initial ⩾ 1/2. Therefore Lemmas 3.3
As mentioned in Section 1, there is no known proof of Theorem 1.1 that does not
use book graphs in some way. As a hopefully fitting end to this exposé, let us see the
original proof of Ramsey (1929), which uses book graphs in a rather different way from
Lemma 2.4, yet whose proof shares certain ideas with the ones we have already seen.
Let us denote by r(Bt,m ) the least integer N such that every two-coloring of E(KN )
contains a monochromatic copy of Bt,m . Ramsey (1929) proved the following upper
bound on r(Bt,m ).
Theorem 5.1 (Ramsey, 1929). — For all integers t, m ⩾ 1, we have r(B_{t,m}) ⩽ (t + 1)! · m.
Note that since K_k = B_{k−1,1}, this immediately implies the bound r(k) ⩽ k!, and
hence yields a proof of Theorem 1.1.
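Since K₃ = B_{2,1}, the smallest interesting instance can be checked exhaustively. The following sketch (our own code) verifies by brute force that every two-coloring of E(K₆) contains a monochromatic triangle, while K₅ admits a coloring with none, so that r(3) = 6, matching the bound r(3) ⩽ 3! = 6 exactly.

```python
from itertools import combinations, product

def has_mono_triangle(n, coloring):
    """coloring maps each edge (i, j) with i < j to a color 0 (red) or 1 (blue)."""
    return any(coloring[(a, b)] == coloring[(a, c)] == coloring[(b, c)]
               for a, b, c in combinations(range(n), 3))

def every_coloring_has_mono_triangle(n):
    # Enumerate all 2^C(n,2) two-colorings of the edges of K_n.
    edges = list(combinations(range(n), 2))
    return all(has_mono_triangle(n, dict(zip(edges, colors)))
               for colors in product((0, 1), repeat=len(edges)))

print(every_coloring_has_mono_triangle(6))  # True: every 2-coloring of K_6 works
print(every_coloring_has_mono_triangle(5))  # False: K_5 has a triangle-free 2-coloring
```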
Proof of Theorem 5.1. — We proceed by induction on t. For the base case t = 1, we
wish to prove that r(B1,m ) ⩽ 2m, which is immediate: in any two-coloring of E(K2m ),
any vertex is incident to 2m − 1 edges, at least m of which must have the same color
by the pigeonhole principle. This yields a monochromatic copy of B1,m .
For the inductive step, suppose the result has been proved for t − 1, and fix a coloring
of E(KN ) where N = (t + 1)! · m = t! · ((t + 1)m). By the inductive hypothesis, this
coloring contains a monochromatic copy of Bt−1,(t+1)m , which we may assume to be
red without loss of generality. That is, there exist disjoint sets A, X ⊆ V (KN ) with
|A| = t − 1 and |X| = (t + 1)m, such that all edges inside A and between A and X are
red. If there is a vertex v ∈ X with at least m red neighbors in X, we may perform a
“red step” by updating A → A ∪ {v} and X → X ∩ NR (v), yielding a red Bt,m . So we
may assume that every vertex in X has at most m − 1 red neighbors in X. Let v1 be an
arbitrary vertex of X, and let X1 = NB (v1 ) ∩ X, so that |X1 | ⩾ |X| − m. Let v2 be an
arbitrary vertex of X1 , and let X2 = NB (v2 ) ∩ X1 , so that |X2 | ⩾ |X1 | − m ⩾ |X| − 2m.
Continuing in this way, we may build a sequence of vertices v1 , . . . , vt as well as a set
Xt with |Xt | ⩾ |X| − tm = m, such that each vi is adjacent in blue to all vj with j > i,
as well as to all vertices in Xt . But this precisely means that we have constructed a
blue Bt,m , completing the inductive step.
Given Theorem 5.1, it is natural to wonder what the true value of r(Bt,m ) is. This
question was first explicitly raised by Erdős, Faudree, Rousseau, and Schelp (1978) and
Thomason (1982), who independently proved the bounds (2^t − o(1))m ⩽ r(B_{t,m}) ⩽ 4^t m.
Thomason in particular was motivated by Lemma 2.4, as discussed in Section 2.2, and
made the following bold conjecture.
This conjecture is known to be true (and optimal) for t ∈ {1, 2}, but it is wide open
for t ⩾ 3 (and may well be false). Moreover, Conjecture 5.2 is likely to be very difficult:
even the m = 1 case would yield r(k) ⩽ 2^{k+o(k)}, a bound far stronger than anything
currently known. However, a beautiful result of Conlon (2019) confirms this conjecture
asymptotically for any fixed t.
Theorem 5.3 (Conlon, 2019). — r(B_{t,m}) = (2^t + o(1))m as m → ∞, for any fixed t.
Conlon’s result addresses a question that arguably goes back 90 years to the original
work of Ramsey (1929), yet uses highly sophisticated tools developed in the interim,
such as the regularity lemma of Szemerédi (1978), and this question is also closely
related to a famous conjecture of Burr and Erdős (1975), which was recently resolved
by Lee (2017). In fact, Ramsey theory has seen a number of recent breakthroughs on
old, seemingly intractable problems by the introduction of remarkable new techniques:
three other examples from the past two years are the works of Li (2023) on explicit
constructions, of Mattheus and Verstraete (2024) on off-diagonal Ramsey numbers, and
of Reiher and Rödl (2023) on restricted Ramsey graphs. There is every reason to hope
and expect this trend to continue.
References
Vigleik Angeltveit and Brendan D. McKay (2024). R(5, 5) ⩽ 46. arXiv: 2409.15709
[math.CO].
Thomas F. Bloom and Olof Sisask (2020). Breaking the logarithmic barrier in Roth’s
theorem on arithmetic progressions. arXiv: 2007.03528 [math.NT].
Stefan A. Burr and Paul Erdős (1975). “On the magnitude of generalized Ramsey
numbers for graphs”, in: Infinite and finite sets (Colloq., Keszthely, 1973; dedicated
to P. Erdős on his 60th birthday), Vols. I, II, III. Vol. 10. Colloq. Math. Soc. János
Bolyai. North-Holland, Amsterdam-London, pp. 215–240.
Marcelo Campos, Simon Griffiths, Robert Morris, and Julian Sahasrabudhe (2023). An
exponential improvement for diagonal Ramsey. arXiv: 2303.09521 [math.CO].
David Conlon (2009). “A new upper bound for diagonal Ramsey numbers”, Ann. of
Math. (2) 170 (2), pp. 941–960.
(2019). “The Ramsey number of books”, Adv. Comb., Paper No. 3, 12pp.
David Conlon, Jacob Fox, and Benny Sudakov (2015). “Recent developments in graph
Ramsey theory”, in: Surveys in combinatorics 2015. Vol. 424. London Math. Soc.
Lecture Note Ser. Cambridge Univ. Press, Cambridge, pp. 49–118.
David Conlon, Jacob Fox, and Yuval Wigderson (2022). “Ramsey numbers of books
and quasirandomness”, Combinatorica 42 (3), pp. 309–363.
Paul Erdős (1947). “Some remarks on the theory of graphs”, Bull. Amer. Math. Soc.
53, pp. 292–294.
Paul Erdős, Ralph J. Faudree, Cecil C. Rousseau, and Richard H. Schelp (1978). “The
size Ramsey number”, Period. Math. Hungar. 9 (1-2), pp. 145–161.
Paul Erdős and George Szekeres (1935). “A combinatorial problem in geometry”, Com-
positio Math. 2, pp. 463–470.
Geoffrey Exoo (1989). “A lower bound for R(5, 5)”, J. Graph Theory 13 (1), pp. 97–98.
Ronald L. Graham and Vojtěch Rödl (1987). “Numbers in Ramsey theory”, in: Surveys
in combinatorics 1987 (New Cross, 1987). Vol. 123. London Math. Soc. Lecture Note
Ser. Cambridge Univ. Press, Cambridge, pp. 111–153.
Ronald L. Graham, Bruce L. Rothschild, and Joel H. Spencer (1990). Ramsey theory.
2nd ed. Wiley-Interscience Series in Discrete Mathematics and Optimization. A Wiley-
Interscience Publication. John Wiley & Sons, Inc., New York, pp. xii+196.
Ben Green and Terence Tao (2008). “The primes contain arbitrarily long arithmetic
progressions”, Ann. of Math. (2) 167 (2), pp. 481–547.
Parth Gupta, Ndiame Ndiaye, Sergey Norin, and Louis Wei (2024). Optimizing the
CGMS upper bound on Ramsey numbers. arXiv: 2407.19026 [math.CO].
Zander Kelley and Raghu Meka (2023). “Strong bounds for 3-progressions”, in: 2023
IEEE 64th Annual Symposium on Foundations of Computer Science (FOCS). IEEE
Computer Soc., Los Alamitos, CA, pp. 933–973.
Choongbum Lee (2017). “Ramsey numbers of degenerate graphs”, Ann. of Math. (2)
185 (3), pp. 791–829.
Xin Li (2023). “Two source extractors for asymptotically optimal entropy, and (many)
more”, in: 2023 IEEE 64th Annual Symposium on Foundations of Computer Science—
FOCS 2023. IEEE Computer Soc., Los Alamitos, CA, pp. 1271–1281.
Sam Mattheus and Jacques Verstraete (2024). “The asymptotics of r(4, t)”, Ann. of
Math. (2) 199 (2), pp. 919–941.
Sarah Peluse (2022). “Recent progress on bounds for sets with no three terms in arith-
metic progression”, in: Séminaire Bourbaki, Volume 2021/2022, Astérisque, no. 438
(2022), Exposé no. 1196, 547–581.
Frank P. Ramsey (1929). “On a problem of formal logic”, Proc. London Math. Soc. (2)
30 (4), pp. 264–286.
Alexander A. Razborov (1985). “Lower bounds on the monotone complexity of some
Boolean functions”, Dokl. Akad. Nauk SSSR 281 (4), pp. 798–801.
Christian Reiher and Vojtěch Rödl (2023). The girth Ramsey theorem. arXiv: 2308.
15589 [math.CO].
Klaus F. Roth (1953). “On certain sets of integers”, J. London Math. Soc. 28, pp. 104–
109.
Ashwin Sah (2023). “Diagonal Ramsey via effective quasirandomness”, Duke Math. J.
172 (3), pp. 545–567.
Issai Schur (1917). "Über die Kongruenz x^m + y^m ≡ z^m (mod p)", Jahresber. Dtsch.
Math.-Ver. 25, pp. 114–117.
Joel Spencer (1975). “Ramsey’s theorem—a new lower bound”, J. Combin. Theory Ser.
A 18, pp. 108–115.
(1994). Ten lectures on the probabilistic method. 2nd ed. Vol. 64. CBMS-NSF Re-
gional Conference Series in Applied Mathematics. Society for Industrial and Applied
Mathematics (SIAM), Philadelphia, PA, pp. vi+88.
Yuval Wigderson
Institute for Theoretical Studies,
ETH Zürich
E-mail : [email protected]