On Sets of Integers Containing No Four Elements in Arithmetic Progression
E. SZEMERÉDI (Budapest)
In what follows we use capital letters to denote sequences of integers, $A + B$ to denote the sum of two sets of integers formed elementwise, and $A - B$ to denote the complement of the set $B$ with respect to the set $A$. Let us for convenience call an arithmetic progression of $k$ (distinct) terms a $k$-progression.
If a set $A$ contains no $k$-progression we say that $A$ is $k$-free. The maximal number of elements a $k$-free set $A \subseteq [0, n)$ can have is denoted by $r_k(n)$. Furthermore we set
$$\gamma_k = \overline{\lim}_{n \to \infty} \frac{r_k(n)}{n}.$$
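The definition of $r_k(n)$ can be checked directly for very small $n$. The following is a minimal brute-force sketch and not part of the original argument; the function names `has_k_progression` and `r` are illustrative assumptions, and the exhaustive search is feasible only for small $n$.

```python
from itertools import combinations


def has_k_progression(s, k):
    """Return True if s contains k distinct terms in arithmetic progression."""
    elems = set(s)
    if k <= 1:
        return len(elems) >= k
    top = max(elems, default=0)
    for a in elems:
        d = 1
        # Try every positive common difference that keeps the last term in range.
        while a + (k - 1) * d <= top:
            if all(a + j * d in elems for j in range(1, k)):
                return True
            d += 1
    return False


def r(k, n):
    """Largest size of a k-free subset of [0, n), found by exhaustive search."""
    for size in range(n, -1, -1):
        for cand in combinations(range(n), size):
            if not has_k_progression(cand, k):
                return size
    return 0
```

Tabulating `r(4, n) / n` for small `n` shows the ratio whose limit defines $\gamma_4$; since the search is exponential in `n`, it gives no information about the limit itself.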
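Actually we can replace $\overline{\lim}$ on the right-hand side by $\lim$. For, given $\varepsilon > 0$ and $n$, we can find arbitrarily large $m$ so that $r_k(m) \ge (\gamma_k - \varepsilon) m$; in particular we may assume that $qn < m \le (q + 1) n$ holds for a positive integer $q$. In other words there is a $k$-free set $A \subseteq [0, m)$ with cardinality $|A| \ge (\gamma_k - \varepsilon) m$. Now $[0, m)$ can be split into $(q + 1)$ subintervals of length at most $n$. One of these must contain at least $(\gamma_k - \varepsilon) \frac{m}{q + 1}$ elements of $A$; as $m > qn$, this gives
$$r_k(n) \ge (\gamma_k - \varepsilon) \frac{q}{q + 1} n,$$
whence, since $q$ can be taken arbitrarily large and $\varepsilon$ arbitrarily small,
$$r_k(n) \ge \gamma_k n, \qquad \gamma_k = \lim_{n \to \infty} \frac{r_k(n)}{n}.$$
Clearly $\gamma_k \le 1 - \frac{1}{k}$, and $\gamma_3 \le \gamma_4 \le \dots$. It has been proved by F. BEHREND* that either all $\gamma_k$ are zero, or $\gamma_k \to 1$ as $k \to \infty$.
* On sequences of integers containing no arithmetic progression, Časopis Mat. Fis. Praha, 67 (1938), pp. 235–239.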
Acta Mathematica Academiae Scientiarum Hungaricae 20, 1969
In 1953 ROTH* proved that $\gamma_3 = 0$. In fact he proved more than that, namely
$$r_3(n) \ll \frac{n}{\log \log n}.$$
Roth's proof uses estimates of exponential sums. In this paper we shall prove the following
THEOREM. $\gamma_4 = 0$, i.e. $r_4(n) = o(n)$.
The proof is elementary. The problem of $\gamma_5, \gamma_6, \dots$ is left open.
The proof is indirect, so from now on we assume that $\gamma_4 > 0$. For convenience we write $\gamma = \gamma_4$.
We shall formulate in this section the two main lemmas and deduce the theorem from them. We write $Q(b, c, d, e)$ for the system
$$b - 2c + d = c - 2d + e = 0,$$
which means that either $b$, $c$, $d$, $e$ form an arithmetic progression, or they are identical. Throughout the paper $n_4(\varepsilon)$ shall mean a number (for example the smallest one) with the property that for $n \ge n_4(\varepsilon)$ a 4-free set $A \subseteq [0, n)$ cannot contain more than $(\gamma + \varepsilon) n$ elements. Occasionally we use the analogous meaning for $n_3(\varepsilon)$ as well. Let $B, C, D \subseteq [0, q)$. We regard $B$ and $C$ as fixed while $D$ varies. We then define
$$D^* = \{e;\ e \in [0, q) \text{ and there are } b \in B,\ c \in C,\ d \in D \text{ with } Q(b, c, d, e)\}.$$
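With this notation we shall prove
LEMMA $(H_0, \dots, H_k)$.** There are absolute constants $\varepsilon_0 > 0$, $\gamma' > 0$, $k_0$ and $q_0$ with the following property: If $q \ge q_0$, $3 \mid q$, and if $B$, $C$ are 4-free sets contained in $[0, q)$, $|B| \ge (\gamma - \varepsilon_0) q$, $|C| \ge (\gamma - \varepsilon_0) q$, then there are disjoint sets $H_0, \dots, H_k$, $k \le k_0$, such that
$$\bigcup_{\kappa = 0}^{k} H_\kappa = \left[\tfrac{1}{3} q, \tfrac{2}{3} q\right],$$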
1 IHol-<--ff7q;
K=l,2,...,k,
* On certain sets of integers. I; II, J. Lond. Math. Soc., 28 (1953), pp. 104–109; 29 (1953), pp. 20–26.
** The full force of the hypothesis that (say) $C$ is 4-free is not needed for the proof of this lemma: see the footnote on page 95.
are sets
$$B_0, C_0, D_1, \dots, D_u, E_1, \dots, E_u \subseteq [0, q),$$
all 4-free, all with at least $(\gamma - \varepsilon_1) q$ elements, such that $Q(b, c, d, e)$ with $b \in B_0$, $c \in C_0$, $d \in D_i$, $e \in E_i$ is insolvable for all $i = 1, \dots, u$, and such that for each $x \in [0, q)$ the set of all $i$'s for which $x \in E_i$ holds is 4-free.
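The insolvability condition on $Q(b, c, d, e)$ can be tested mechanically for small explicit sets. The following is an illustrative sketch only; the helper names `Q` and `q_solution` are assumptions introduced here and do not occur in the paper.

```python
def Q(b, c, d, e):
    """The system b - 2c + d = c - 2d + e = 0."""
    return b - 2 * c + d == 0 and c - 2 * d + e == 0


def q_solution(B, C, D, E):
    """Return some (b, c, d, e) in B x C x D x E satisfying Q, or None.

    Once b and c are fixed, d and e are forced, so only pairs are scanned.
    """
    D, E = set(D), set(E)
    for b in sorted(B):
        for c in sorted(C):
            d = 2 * c - b  # forced by b - 2c + d = 0
            e = 2 * d - c  # forced by c - 2d + e = 0
            if d in D and e in E:
                return (b, c, d, e)
    return None
```

A return value of `None` corresponds to "$Q(b, c, d, e)$ is insolvable" for the four given sets. Note that a constant quadruple such as `(5, 5, 5, 5)` also satisfies `Q`, in line with the remark that the four numbers may be identical.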
We now prove the theorem using these two lemmas. Let $\varepsilon_0$, $\gamma'$ and $k_0$ have the meaning of lemma $(H_0, \dots, H_k)$. Put
e~ = rain eo, 20 ' "
$$u = N(k_0, t)$$
such that in any partition of $[0, u)$ into at most $k_0$ classes there is at least one class which contains a $t$-progression. We apply lemma BCDE with this $\varepsilon_1$ and $u$, and with $q_0 = 3 n_4(\varepsilon_1)$. From $|D_i| \ge (\gamma - \varepsilon_1) q$, $\tfrac{1}{3} q \ge n_4(\varepsilon_1)$ we see that
1
q'
= ~
j=O
[Ho[+I~s~= ,
since e~ <=~ ?.
1
Attaching such a j(i) to each i, it gives a partition of the i's into k classes. Since u = N(ko, t) and k <=ko one of these classes contains a t-progression. In other words, there is a Jo and an arithmetic progression il ..... it such that ID~ N//So I for
i = i,,
it.
I-T7
IHfol
where the * is taken with respect to $B_0$ and $C_0$. With the trivial relation $(U \cap V)^* \subseteq U^* \cap V^*$ this implies that
{'/
IE~nH~ol < 1
for i = i l , ..., it. Put IHfol=~.q, [0, q ) - H * o = M . We notice that M is not empty, since otherwise the last inequality would imply that [Eil<_-897 q, in contradiction with the fact that = [?'2~'1 [Ei[ >= (7 -- 81) q > q.
[MI
q - IH*o]
]
, ]
1 - c~
,
1 - c,
We conclude that there is at least one $x \in M$ which occurs in not less than $(\gamma + 2\varepsilon_1) t$ of the sets $E_i$. By lemma BCDE those $i_\tau$'s for which $x \in E_{i_\tau}$ form a 4-free set. They are contained in an arithmetic progression of $t$ terms and by the choice of $t = n_4(\varepsilon_1)$, there cannot be more than $(\gamma + \varepsilon_1) t$ numbers $i_\tau$ for which $x \in E_{i_\tau}$. Thus we have reached a contradiction and the theorem is proved.

In this section we shall prove lemma $(H_0, \dots, H_k)$. For this we need three other lemmas. The first is almost obvious. We call it therefore
THE SIMPLE LEMMA. Let $A \subseteq [0, n)$ be 4-free and $|A| \ge (\gamma - \varepsilon) n$. Let $M \subseteq [0, n)$ have a complement that is the union of disjoint arithmetic progressions $P_\varrho$, $\varrho = 1, \dots, r$, each of length $|P_\varrho| \ge n_4(\varepsilon')$. Then we have
$$|A \cap M| \ge \gamma |M| - (\varepsilon + \varepsilon') n.$$
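PROOF. Each $A \cap P_\varrho$, as a 4-free subset of a progression of length at least $n_4(\varepsilon')$, fulfils $|A \cap P_\varrho| \le (\gamma + \varepsilon') |P_\varrho|$. Summing over $\varrho$ gives $|A - M| \le (\gamma + \varepsilon')(n - |M|)$, hence $|A \cap M| \ge (\gamma - \varepsilon) n - (\gamma + \varepsilon')(n - |M|) \ge \gamma |M| - (\varepsilon + \varepsilon') n$.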
LEMMA $p(\delta, l)$. If
$$u \ge p(\delta, l), \qquad G \subseteq [0, u), \qquad |G| \ge \delta u,$$
then $G$ contains a set $S_l$ of the form
$$S_l = \{y\} + \{0, x_1\} + \dots + \{0, x_l\}$$
with natural numbers $x_1, \dots, x_l$.
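Sets of the form $\{y\} + \{0, x_1\} + \dots + \{0, x_l\}$ are easy to generate and, for small examples, to search for. The sketch below is an illustration under stated assumptions: `cube` and `find_cube` are hypothetical helper names, and the search is plain backtracking, not the induction with the box principle used in the proof.

```python
def cube(y, xs):
    """Build {y} + {0, x_1} + ... + {0, x_l} as a Python set."""
    pts = {y}
    for x in xs:
        pts |= {p + x for p in pts}
    return pts


def find_cube(G, l):
    """Search G for a set of the above form; return (y, [x_1..x_l]) or None."""
    G = set(G)

    def extend(base, xs, depth):
        if depth == 0:
            return xs
        for x in range(1, max(G) + 1):
            shifted = {p + x for p in base}
            if shifted <= G:  # the translate by x must stay inside G
                found = extend(base | shifted, xs + [x], depth - 1)
                if found is not None:
                    return found
        return None

    for y in sorted(G):
        xs = extend({y}, [], l)
        if xs is not None:
            return y, xs
    return None
```

Because the $x_i$ need not be distinct or independent, a set built by `cube` may have fewer than $2^l$ elements; it never has more.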
PROOF. The proof goes by complete induction and uses the box principle. The case l = 1 is trivial, since it states only that there is a pair of elements of G. A suitable choice of p(6, 1) is 1 + ~ concerning G shows that
[1]
, l-- 11 .
u = kq+r,
0 <=r < q .
1)t_ ,
p(6,1)=max[[l+~--~-]q,
Let R be the number of those sets
[1 + 2 ] q ~ ) .
GK=GO[(K--1)q, Kq],
for which lUll ~ - q . Then R _ ->
k
K=I
K=I ..... k
k, otherwise "
6
then
REMARK. An analogous lemma can be similarly proved with $\gamma = \gamma_3$ (instead of $\gamma = \gamma_4$) on the assumption that $\gamma_3 > 0$. We then easily arrive at a contradiction, which proves Roth's theorem $\gamma_3 = 0$. For this purpose choose a $q \ge 3 n_3(\varepsilon)$. Next choose a 3-free set $A \subseteq [0, 3q)$ with $|A| \ge 3\gamma q$ and represent it as
$$A = B \cup (C + q) \cup (D + 2q)$$
with B, C, D ~ [0, q); and finally set G = D f q ~-q, ~-q . One easily obtains the inequalities ]BI =>(7--2e)g, IC] =>(7-2~)q, ]GI =>(7-8s) q If we take e ~ ~- eo, e <=-j~ 7 and q large enough, we can apply the lemma with 6 = 1 ? and get ]G*I->7'q>0 which means that there is a triplet (b, c, d) with
b-2c+d = O.
1 1
But (b, c + q, d + 2q) is then a 3-progression in A, a set that was supposed to be 3-free.
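The last step of the remark, that a solution of $b - 2c + d = 0$ with $b, c, d \in [0, q)$ lifts to a genuine 3-progression $(b, c + q, d + 2q)$, can be confirmed by a small exhaustive check. This is only an illustrative sketch; `lift_triplet` and `is_3_progression` are names introduced here.

```python
def lift_triplet(b, c, d, q):
    """Map a triplet from [0, q)^3 to (b, c + q, d + 2q) in [0, 3q)."""
    return (b, c + q, d + 2 * q)


def is_3_progression(t):
    """True if t = (x, y, z) has equal, nonzero consecutive differences."""
    x, y, z = t
    return y - x == z - y and y != x


def check_lift(q):
    """Verify the lifting claim for every solution of b - 2c + d = 0 in [0, q)."""
    for b in range(q):
        for c in range(q):
            d = 2 * c - b
            if 0 <= d < q and not is_3_progression(lift_triplet(b, c, d, q)):
                return False
    return True
```

The common difference of the lifted triplet is $c - b + q$, which is positive because $b, c \in [0, q)$; this is why even the degenerate solutions $b = c = d$ give three distinct terms after lifting.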
PROOF OF LEMMA $|G^*|$. Set
$$\varepsilon_0 = \frac{1}{100} \gamma^2, \qquad m = n_4(\varepsilon_0), \qquad \gamma' = \frac{\gamma^2}{50 \cdot 2^l}.$$
With these choices we have $q \ge p(\delta, l)$ and can therefore find a set of type $S_l$ in $G$. We consider
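Since $S_l \subseteq \left[\tfrac{1}{3} q, \tfrac{2}{3} q\right]$ one has $L_i \subseteq [0, q)$. With $|C| \ge (\gamma - \varepsilon_0) q$ and $\tfrac{1}{3} q > m = n_4(\varepsilon_0)$ we obtain
$$|L_0| \ge (\gamma - 5\varepsilon_0) \tfrac{1}{3} q \ge \tfrac{1}{4} \gamma q,^*$$
since $5\varepsilon_0 < \tfrac{1}{4} \gamma$.
* The derivation of this inequality is the only extent to which we use the hypothesis that $C$ is 4-free.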
From the fact that $|L_l| \le q$ and $L_0 \subseteq L_1 \subseteq \dots$ we infer that there is some $i \le l$ such that
$$|L_i| - |L_{i-1}| \le \frac{q}{l}.$$
We decompose this $L_{i-1}$ into maximal progressions (mod $x_i$). We shall denote by $L'$ the union of those of these progressions which have $3m$ or more elements, and by $L''$ the union of the remaining ones. From
IEl -->
q~-8~q
since by our choice of $l$ we have $l \ge \frac{24m}{\gamma}$. Now let us drop $m$ elements from each end of each of the progressions (mod $x_i$) composing $L'$, and denote the remaining set by $M$. Since every progression in $L'$ has a length of at least $3m$ we have
1 [MI => ~-ILl > ~
q.
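The decomposition into maximal progressions with a fixed difference, followed by trimming $m$ elements from each end of the long ones, can be sketched as follows. This is a hedged illustration: the function names are hypothetical, and the variable names `long_part` and `trimmed` merely stand for the union of long progressions and the trimmed set described above.

```python
def maximal_progressions(L, x):
    """Split L into maximal arithmetic progressions with common difference x."""
    L = set(L)
    runs = []
    for a in sorted(L):
        if a - x in L:
            continue  # a is not the first term of a maximal progression
        run = [a]
        while run[-1] + x in L:
            run.append(run[-1] + x)
        runs.append(run)
    return runs


def trim_long_progressions(L, x, m):
    """Keep progressions with at least 3m terms and drop m terms from each end.

    Returns (long_part, trimmed): the union of the long progressions and the
    set left after trimming.
    """
    long_part, trimmed = set(), set()
    for run in maximal_progressions(L, x):
        if len(run) >= 3 * m:
            long_part.update(run)
            trimmed.update(run[m:len(run) - m])
    return long_part, trimmed
```

Each long progression has at least $3m$ terms and loses exactly $2m$ of them, so at least one third of it survives; this is the source of the factor $\tfrac{1}{3}$ in the estimate for $|M|$.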
By construction $[0, q) - M$ can be represented as the union of disjoint progressions (mod $x_i$) each of length at least $m$. Thus we can apply the Simple Lemma with $\varepsilon = \varepsilon' = \varepsilon_0$ and obtain
$$|B \cap M| \ge \frac{\gamma^2}{24} q - 2\varepsilon_0 q \ge \frac{\gamma^2}{50} q,$$
since $\varepsilon_0$ has been chosen suitably. By definition, $L_i \cap B$ is the set of those $b$ in $B$ which have a representation
In $S_l$ there are at most $2^l$ elements. Therefore at least one $y$ contained in $S_l$ has the property that the equation
$$b - 2c + y = 0$$
has at least $\frac{\gamma^2}{50 \cdot 2^l} q$ solutions $(b, c)$
< ?',
1 2 ) 1 We now start from some G 0 ~ ~-q, ~-q with [Go[-->]~7q and put go = [G*[. Next we define by recursion for i = 1, ..., h
r~ =
{~
, G c G ~ - I , IGI --> ~ - [ a ~ - i
-~-
',}
g~
=m,nO*
G6rl
and fixe one Gi in Fi for which IG*[ : g ~ . From G~E F~ we see that
IGI > ~ I a i-1 I > = ~..... >
o
> =
[2)
12
'
1 (~h+l [G~I~ [ ~ J q,
Thus, if we take 6 = ~ and obtain and qo =
?'q~=gh< 1-7
G~H,
and
j H = Gs_ ~. From the meaning of gs IGI~ 2 ]HI, then GCFj and therefore = gJ -gJ-
and
gs- ~ it
follows that if
At first we apply this process to G o = ~- q, ~ q and call the set H obtained H 1 . Then we take Go = ~-q, ~-q 7 H, and if this set contains at least 7q elements
we obtain a s e t / / 2 from it. Next we take Go = 3 q' 3- q q (H~ U Hz) to get a set Hs, and so on. As soon as we are left with
IHKI_-> [ j
k<=..~
By
for
K=l,2,...,k
this occurs certainly after a finite number of steps. To be precise, we see that {~_1[72_/h+lq}-1=2[2/h+l.
constructionHoUH1U...UHk=
for all 2, ...,
q,~q
and if
GC=HK, IG[=~IH~I
then
and
PROOF OF LEMMA BCDE. Let us take $n$ and $q$ to be integers so that $nq \ge 6 n_4(\varepsilon)$; let $A$ be a 4-free set contained in $[0, 4nq)$ which satisfies $|A| \ge \gamma \cdot 4nq$. Write
$$A = B \cup (C + nq) \cup (D + 2nq) \cup (E + 3nq)$$
with $B, C, D, E \subseteq [0, nq)$ and (in an obvious notation)
$$B = \bigcup_{x < n} (B_x + xq) \quad \text{with } B_x \subseteq [0, q),$$
similarly for $C$, $D$, $E$. For their respective cardinalities we get easily the estimates
R<= /32 q- e3
/31
PROOF.
$$(\gamma - \varepsilon_3) n \le (\gamma - \varepsilon_1) R + (\gamma + \varepsilon_2)(n - R),$$
We list now the parameters used in the proof, in the order of their dependence. The reader may check them as they occur. $\varepsilon_1$, $u$ and $q_0$ are supposed to be given,
/32e1 16U'
l = 75m- 2 q, e4600.2q+2 z,
/33 -
150.2q'
We can safely dispense with specifying e and n since there is no feedback to the other parameters. A small ~ only demands a large n. By an already repeatedly used argument we get ~'lBx] =
X<-6
BA[O,lnq I
: cn[7'
>(V-e)~
Z [C l n 11 ~-~y<~-
7)-->
~t provided only that n is large enough. We set ez = ~ , we then have for all x, y, z, w,
>=n~(~z), so
that
[B I, It, I, IDA, IE [
number of poor Cy, --<-6 -y<-3
<=
(Y+ez)q.
enough. Consequently more than half of the Bx are full. There are only 2 q subsets of (0, so there is a full B(o) = (0, q) such that
q),
Bb=B(o )
for
bE~
0,
with
1 ~ t = > 1 2 . 2 q.
Since not more than 1 of the Cj are poor, not more than 1 of the r-tuples contain more than 1 poor sets. There are only 2 qr different r-tuples, so we find C(o), .... C(r- x), not more than ~- of them being poor, and ~c n = 6 ' oE[O,r), so that
t/
Cc+~=C(e) for
With the sets we form
cE~
and
By lemma p(3,/) we see that E contains a subset of type S, = {y} + {0, xi} + . . . + {0, x,}.
Then we have
ti
rl
= 12.2q '
>"
IZ,l-[r,_~[
-<_ 7 "
We decompose $L_{i-1}$ into maximal progressions (mod $3x_i$), collect those progressions which are longer than $3m$ into $L'$, and the remaining ones into $L''$; as in the proof of lemma $|G^*|$ we get
3mn
l '
]L'I-> I L ~
(Here we have taken $l \ge 72m \cdot 2^q$.) Dropping the first $m$ and the last $m$ elements of each of the progressions collected into $L'$, we obtain a set we shall call $\mathcal{E}$. Then
1 . n
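$$M = \bigcup_{e \in \mathcal{E}} [eq, (e + 1)q)$$
has the property of the set $M$ in the Simple Lemma. (The progressions have the modulus $3q x_i$ and are each of length at least $m$; $\varepsilon' = \varepsilon_3$.) Therefore
$$\sum_{e \in \mathcal{E}} |E_e| = |E \cap M| \ge \gamma |M| - (\varepsilon + \varepsilon_3) qn = \gamma q |\mathcal{E}| - (\varepsilon + \varepsilon_3) qn \ge \gamma q |\mathcal{E}| - 2\varepsilon_3 qn \ge (\gamma - 150 \cdot 2^q \varepsilon_3) q |\mathcal{E}| = (\gamma - \varepsilon_2) q |\mathcal{E}|.$$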
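Since $|E_e| \le (\gamma + \varepsilon_2) q$ for all $e$, the 'counting argument' applies, showing that the number of poor $E_e$, $e \in \mathcal{E}$, is at most
$$\frac{2\varepsilon_2}{\varepsilon_1} |\mathcal{E}| = \frac{1}{8u} |\mathcal{E}|.$$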
Ee+30, e E&
Each $e \in \mathcal{E}$ by construction occurs in at least one quadruple $(b, s, d, e)$ with $b \in \mathcal{B}$ and $s \in S_l$. To each $e \in \mathcal{E}$ we attach one such quadruple, making the $d$, as well as the $b$ and the $s$, a function of $e$, $d = \varphi(e)$. Let $\mathcal{D} = \{\varphi(e);\ e \in \mathcal{E}\}$. Since $S_l$ has at most $2^l$ elements any particular $d$ in $\mathcal{D}$ can arise as a value $\varphi(e)$ at most $2^l$ times. We consider the quadruples
$$(b,\ s + \varrho,\ \varphi(e) + 2\varrho,\ e + 3\varrho), \qquad e \in \mathcal{E},\ \varrho \in [0, r).$$
We want now to show that for at least one $\varrho$, $C_{s+\varrho}$ is full (independent of $e$ since $C_{s+\varrho} = C_{(\varrho)}$), and almost all $D_{\varphi(e)+2\varrho}$ are full (counted with multiplicity). We do this by considering all the $\varrho$ together. The basic tool is again the Simple Lemma. Before applying it, however, we have to remove the multiplicities with which the $D_{\varphi(e)+2\varrho}$ occur. There are two sources of multiplicity: the mapping $\varphi(e) = d$, and the forming of the sum $d + 2\varrho$.
We deal first with the case when $\varphi$ is one to one, where only one of these sources is present. Set
'
We construct a subset $\mathcal{D}'' \subseteq \mathcal{D}$, with the property that consecutive elements have a difference of at least $4r$, but
$$|\mathcal{D}''| \ge \frac{1}{4r} |\mathcal{D}|.$$
For this purpose we may go from left to right retaining for our set $\mathcal{D}''$ the first element not ruled out by the restriction upon the differences. Since we exclude at most $4r - 1$ elements for each one which we keep we obtain the stated inequality. Now, each element in $\overline{\mathcal{D}}'' = \mathcal{D}'' + \{0, 2, 4, \dots, 2(r - 1)\}$ is uniquely represented. Therefore we have
$$\Bigl| \bigcup_{x \in \overline{\mathcal{D}}''} D_x \Bigr| = \sum_{d \in \mathcal{D}''} \sum_{\varrho = 0}^{r - 1} |D_{d + 2\varrho}|.$$
By construction the complement of ~ " consists of progressions (mod 2), each of length at least r. (No difficulty arises when considering elements to the left of the first and to the right of the last elements in N " , respectively, since N + 2~ c_ 6- n, n . Therefore the left hand side can be estimated by the Simple Lemma.
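The left-to-right selection with gaps of at least $4r$, and the uniqueness of the representation $d + 2\varrho$, can be illustrated by a short sketch. The names `thin_out` and `shifted_sums` are assumptions made for this example only.

```python
def thin_out(D, r):
    """Greedy left-to-right selection: keep an element only if it lies at
    least 4r beyond the previously kept one."""
    kept = []
    for d in sorted(set(D)):
        if not kept or d - kept[-1] >= 4 * r:
            kept.append(d)
    return kept


def shifted_sums(kept, r):
    """All sums d + 2*rho with d in kept and rho in [0, r), as a list."""
    return [d + 2 * rho for d in kept for rho in range(r)]
```

Each kept element rules out at most $4r - 1$ later ones, so `thin_out` retains at least a $\frac{1}{4r}$ fraction of the input. Since kept elements differ by at least $4r$ while the shifts $2\varrho$ stay below $2r$, the list returned by `shifted_sums` has no repeated entries.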
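We take
$$M = \bigcup_{x \in \overline{\mathcal{D}}''} [xq, (x + 1)q), \qquad r = n_4(\varepsilon_4), \qquad \varepsilon_4 \le \frac{\varepsilon_2^2}{600 \cdot 2^q},$$
and obtain
$$\Bigl| \bigcup_{x \in \overline{\mathcal{D}}''} D_x \Bigr| = |D \cap M| \ge \gamma q |\overline{\mathcal{D}}''| - (\varepsilon + \varepsilon_4) qn = \gamma q r |\mathcal{D}''| - (\varepsilon + \varepsilon_4) qn \ge \gamma q r |\mathcal{D}''| - 2\varepsilon_4 qn.$$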
Putting these estimates together gives
(*)
Z Z IDd+2~l~ Z
--> ( l ~ l - [ ~ ' l ) ( ~ - ~ 2 ) r q ->
[Dd+2ol ->
(:'-~2)[ l~l-854nlrq'e2 )
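In the present special case we have $|\mathcal{D}| = |\mathcal{E}| \ge \frac{n}{75 \cdot 2^q}$. We therefore get the further inequality
$$\sum_{d \in \mathcal{D}} \sum_{\varrho = 0}^{r - 1} |D_{d + 2\varrho}| \ge (\gamma - \varepsilon_2) \left[ 1 - 8 \cdot 75 \cdot 2^q \frac{\varepsilon_4}{\varepsilon_2} \right] rq |\mathcal{D}| \ge (\gamma - \varepsilon_2)(1 - \varepsilon_2) rq |\mathcal{D}| \ge (\gamma - 2\varepsilon_2) rq |\mathcal{D}|.$$
By the 'counting argument' we infer that not more than
$$\frac{3\varepsilon_2}{\varepsilon_1} r |\mathcal{D}| = \frac{3}{16u} r |\mathcal{E}|$$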
sets Da+2o, taken with their multiplicity, are poor. For at most one half of the O's 3 can we have more than [E[ p o o r sets Dd+2o. If we drop these numbers 0, of which there at most ~ r, and also those 0 for
which
C(.o) is poor, there being no more than 1 r of them, some of the numbers e
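remain. So far we have proved:
There is a number $\varrho \in [0, r)$ such that $C_{s+\varrho} = C_{(\varrho)}$ is full, at most $\frac{3}{8u} |\mathcal{E}|$ of the sets $D_{\varphi(e)+2\varrho}$, $e \in \mathcal{E}$, are poor, and at most $\frac{1}{8u} |\mathcal{E}|$ of the sets $E_{e+3\varrho}$ are poor.
Hence for at most $\frac{1}{2u} |\mathcal{E}|$ elements $e \in \mathcal{E}$ we have either $E_{e+3\varrho}$ or $D_{\varphi(e)+2\varrho}$ poor. We call these $e \in \mathcal{E}$ 'bad'. The density of the bad elements in $\mathcal{E}$ is at most $\frac{1}{2u}$.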
We can take $m \ge 2u$. If one of every $u$ consecutive elements of such a progression were a bad one, the density of bad elements in any particular progression in $\mathcal{E}$ would be at least $\frac{2}{3u - 1} > \frac{2}{3u}$, and so therefore would be the density of bad elements in the whole of $\mathcal{E}$. Since we have disproved this there exists an arithmetic progression of at least $u$ good elements in $\mathcal{E}$, q.e.d.
Rather little has to be changed in the general case when the elements $d \in \mathcal{D}$ are taken with the multiplicities of $d = \varphi(e)$ not necessarily all equal to one. Set
$$\mathcal{D}_i = \{d;\ d = \varphi(e) \text{ for exactly } i \text{ elements } e \in \mathcal{E}\}.$$
Each $\mathcal{D}_i$ can be treated in exactly the same way that $\mathcal{D}$ was until we reach the formula (*). However, in order to make the formula useful this time we must take a smaller $\varepsilon_4$ (and therefore a larger $r$):
~2
r = n4(e4).
,._1
[[Nil- S e 4 n I rq.
~2 J
ID~(e)+2el=> ( 7 - e z )
Ig[ =8/34n
/32
i rq e
The counting argument again shows that there is a $\varrho \in [0, r)$ such that for at most $\frac{3}{8u} |\mathcal{E}|$ elements $e \in \mathcal{E}$ the sets $D_{\varphi(e)+2\varrho}$ are poor, and the proof is finished as above.
We have now completed the proof of lemma BCDE and with it the proof of the theorem.
The author wishes to express his thanks to E. WIRSING and P. D. T. A. ELLIOTT, who helped considerably in the final formulation of the proof.
(Received 20 October 1967)
MTA MATEMATIKAI KUTATÓ INTÉZETE, BUDAPEST, V., REÁLTANODA U. 13–15