Ergodic Roth
A. MAGYAR
1. The Furstenberg correspondence principle.
The basic object of study here is that of a measure preserving system. This consists of a probability measure space $(X, \mathcal{A}, \mu)$ (with $\mu(X) = 1$) and a measure preserving transformation $T : X \to X$, that is,
$$\mu(T^{-1}(A)) = \mu(A) \quad \text{for all } A \in \mathcal{A},$$
where $T^{-1}(A)$ denotes the inverse image of the set $A$. It was a brilliant observation of Furstenberg that problems in density Ramsey theory (i.e. statements about sets of integers or integer points of positive upper density) can be translated to the setting of measure preserving systems, giving rise to a new and rich field called ergodic Ramsey theory. The basic tool for this translation is the so-called Furstenberg correspondence principle, which we describe below.
First, let us mention a simple observation: Szemerédi's theorem implies a multiple recurrence type result for measurable subsets $A \in \mathcal{A}$ with $\mu(A) = \delta > 0$. For given $k \in \mathbb{N}$, let $N = N(k, \delta)$ be such that if $E \subseteq [1, N]$ with $|E| \ge \delta N$ then $E$ contains an AP (arithmetic progression) of length $k$. Now let $A_i = T^{-i}(A)$ for $1 \le i \le N$. Then, if $1_B$ stands for the indicator function of a set $B$, it is clear that
$$\delta N = \sum_{i=1}^{N} \int_X 1_{A_i}(x)\, d\mu(x) = \int_X \sum_{i=1}^{N} 1_{A_i}(x)\, d\mu(x),$$
hence there must be an $x \in X$ such that $\sum_i 1_{A_i}(x) \ge \delta N$, that is $|E(x)| \ge \delta N$ where $E(x) = \{i \in [1, N] : x \in A_i = T^{-i}(A)\}$. Thus $E(x)$ contains a $k$-term AP: $m, m+d, \dots, m+(k-1)d$, and it follows that
$$T^{-m}(A) \cap T^{-m-d}(A) \cap \dots \cap T^{-m-(k-1)d}(A) \neq \emptyset \tag{1.1}$$
as the intersection contains the point $x$. Note that the argument in fact also gives that the set of $x$ for which $|E(x)| \ge \delta N$ must have positive measure, and since there are only finitely many possible sets $E(x)$, there must be a set $E \subseteq [1, N]$ with $|E| \ge \delta N$ such that $\{x \in X : E(x) = E\}$ has positive measure. Choosing again a $k$-term AP from $E$ one obtains that for every set $A$ of positive measure
$$\mu\big(A \cap T^{-d}(A) \cap \dots \cap T^{-(k-1)d}(A)\big) = \mu\big(T^{-m}(A) \cap T^{-m-d}(A) \cap \dots \cap T^{-m-(k-1)d}(A)\big) > 0 \tag{1.2}$$
for some $d \in \mathbb{N}$. This is essentially the so-called multiple recurrence theorem of Furstenberg.
It may be viewed as a generalization of Poincaré's recurrence theorem (one of the first results in ergodic theory), which states that if $\mu(A) > 0$ then $\mu(A \cap T^{-d}(A)) > 0$ for some $d \in \mathbb{N}$. To see this, assume indirectly that $\mu(A \cap T^{-n}(A)) = 0$ for all $1 \le n \le M = [\mu(A)^{-1}] + 1$. But then for all $1 \le n < m \le M$ one has $\mu(T^{-m}(A) \cap T^{-n}(A)) = 0$, as $T$ is measure preserving, hence
$$\mu\Big(\bigcup_{n \le M} T^{-n}(A)\Big) = M\mu(A) > 1 = \mu(X),$$
which is not possible.
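The pigeonhole bound in this argument can be checked on a toy system. The sketch below is only an illustration under assumptions that are not in the text: $X = \mathbb{Z}_{12}$ with the uniform measure, $T(x) = x + 1$, and a hypothetical set $A = \{0, 5\}$. It searches for the first $d$ with $\mu(A \cap T^{-d}(A)) > 0$ and compares it with $M = [\mu(A)^{-1}] + 1$.

```python
from fractions import Fraction
from math import floor

def preimage(T, A, times=1):
    """Return T^{-times}(A) for a map T given as a list on {0, ..., len(T)-1}."""
    B = set(A)
    for _ in range(times):
        B = {x for x in range(len(T)) if T[x] in B}
    return B

def first_return(T, A):
    """Smallest d in [1, M] with A ∩ T^{-d}(A) nonempty, together with M."""
    mu_A = Fraction(len(A), len(T))      # uniform probability measure
    M = floor(1 / mu_A) + 1              # the bound used in the proof
    for d in range(1, M + 1):
        if A & preimage(T, A, d):
            return d, M
    return None, M

n = 12
T = [(x + 1) % n for x in range(n)]      # a permutation, hence measure preserving
A = {0, 5}                               # hypothetical example set
d, M = first_return(T, A)
print(d, M)
```

Here the set returns to itself well before the bound $M$; the bound is only what the counting argument guarantees.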
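We turn now to the construction of a measure preserving system associated to a set $E \subseteq \mathbb{Z}$ of positive upper (Banach) density. This will require some basic facts from measure theory and topology, whose proof we will outline at the end of this note.
Let $X = \{0, 1\}^{\mathbb{Z}}$, that is the set of maps $x : \mathbb{Z} \to \{0, 1\}$, and define the metric $d(x, y) = 2^{-n}$ if $n = \min\{|m| : x(m) \neq y(m)\}$. It is easy to see that a basis of the topology is given by the cylindrical sets
$$C^{\varepsilon_1,\dots,\varepsilon_k}_{m_1,\dots,m_k} = \{x \in X : x(m_i) = \varepsilon_i,\ 1 \le i \le k\},$$
where $\{m_i\}$ is a finite set of integers and $\varepsilon_i \in \{0, 1\}$. We write $\mathcal{C}$ for the family of all cylindrical sets. The upper (Banach) density of $E$ is
$$d^*(E) = \limsup_{N \to \infty} \frac{|E \cap [M, M+N-1]|}{N}, \tag{1.3}$$
where the limsup is taken over all intervals $[M, M+N-1]$ whose length $N$ tends to infinity;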
that is, there exists a sequence of intervals $I_j = [M_j, M_j + N_j - 1]$ such that
$$d^*(E) = \lim_{j \to \infty} \frac{|E \cap [M_j, M_j + N_j - 1]|}{N_j} \tag{1.4}$$
and $d^*(E)$ is an upper bound for the limit on the right side of (1.4), for any sequence of intervals.
Let $x_E \in X$ be the indicator function of the set $E$, let $T : X \to X$ denote the shift operator, i.e. $Tx(n) = x(n+1)$, and let $A = \{x \in X : x(0) = 1\}$. Note that we're doing a typical duality construction: sets become points of $X$, and points (i.e. integers) become subsets of $X$, e.g. $A$ corresponds to the point $0$. Notice that
$$n \in E \iff T^n(x_E) \in A.$$
Thus
$$d^*(E) = \lim_{j \to \infty} \frac{1}{|I_j|} \sum_{n \in I_j} 1_A(T^n(x_E)). \tag{1.5}$$
For every $j \in \mathbb{N}$ define $\mu_j : \mathcal{C} \to [0, 1]$ by
$$\mu_j(C) = \frac{1}{|I_j|} \sum_{n \in I_j} 1_C(T^n(x_E)). \tag{1.6}$$
ROTH THEOREM - THE ERGODIC APPROACH. 3
Since the family $\mathcal{C}$ is countable, it is easy to see that
Proposition 1.1. There exists a sequence $(j_s) \subseteq \mathbb{N}$ such that for every cylindrical set $C \in \mathcal{C}$ the limit
$$\mu(C) := \lim_{s \to \infty} \mu_{j_s}(C) \tag{1.7}$$
exists.
Proof. Write $\mathcal{C} = \{C_1, C_2, \dots\}$, as $\mathcal{C}$ is countable. Choose a sequence $(j_{1s}) = j_{11}, j_{12}, \dots, j_{1s}, \dots$ such that $\mu_{j_{1s}}(C_1)$ converges as $s \to \infty$. Then choose a subsequence $(j_{2s})$ of $(j_{1s})$ such that $\mu_{j_{2s}}(C_2)$ converges as $s \to \infty$. Similarly, construct the nested family of sequences, and finally choose $j_s = j_{ss}$ for $s \in \mathbb{N}$. It is clear that for any $n \in \mathbb{N}$ the sequence $\mu_{j_s}(C_n)$ converges, as for $s \ge n$ the numbers $j_s = j_{ss}$ form a subsequence of the sequence $(j_{ns})$.
The function $\mu : \mathcal{C} \to [0, 1]$ is defined via (1.7), and note that by (1.5) and (1.6):
$$\mu(A) = d^*(E). \tag{1.8}$$
Clearly $\mu$ is an additive set function on the family $\mathcal{C}$, which in fact implies that $\mu$ is also $\sigma$-additive on $\mathcal{C}$. Indeed, let $C = \bigcup_{i=1}^{\infty} C_i$ be a partition of a cylindrical set into cylindrical sets. As $C$ is closed and the space $(X, d)$ is compact (which is proved by the same diagonalization process as above, or follows from Tychonoff's theorem), $C$ is in fact compact, and hence, the sets $C_i$ being open, it is a union of finitely many sets $C_i$. It is a well-known extension result in measure theory (due to Carathéodory):
Theorem: There exists a unique probability measure $\tilde{\mu} : \mathcal{A} \to [0, 1]$ on the $\sigma$-algebra $\mathcal{A}$ generated by $\mathcal{C}$ such that $\tilde{\mu}(C) = \mu(C)$ for all $C \in \mathcal{C}$.
From now on (out of sloppiness) we'll denote the extended measure also by $\mu$. Now it is easy to show
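Theorem 1.1. (Furstenberg Correspondence Principle) Let $E \subseteq \mathbb{Z}$ with $d^*(E) > 0$. Then there exist a measure preserving system $(X, \mathcal{A}, \mu, T)$ and a set $A \in \mathcal{A}$ with $\mu(A) = d^*(E)$ such that for all $k \in \mathbb{N}$ and $m_1, \dots, m_k \in \mathbb{Z}$
$$\mu\big(T^{-m_1}(A) \cap \dots \cap T^{-m_k}(A)\big) \le d^*\big((E - m_1) \cap \dots \cap (E - m_k)\big). \tag{1.9}$$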
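Proof. Let $X$, $\mathcal{A}$, $\mu$, $T$ and the set $A$ be as constructed above. Then
$$x \in T^{-m_1}(A) \cap \dots \cap T^{-m_k}(A) \iff x(m_1) = \dots = x(m_k) = 1,$$
hence by (1.6) one has for a fixed $j \in \mathbb{N}$:
$$\mu_j\big(T^{-m_1}(A) \cap \dots \cap T^{-m_k}(A)\big) = \frac{|\{n \in I_j : T^n(x_E)(m_1) = \dots = T^n(x_E)(m_k) = 1\}|}{|I_j|}$$
$$= \frac{|\{n \in I_j : n + m_1 \in E, \dots, n + m_k \in E\}|}{|I_j|} = \frac{|I_j \cap (E - m_1) \cap \dots \cap (E - m_k)|}{|I_j|}.$$
Finally, by definition:
$$\lim_{s \to \infty} \frac{|I_{j_s} \cap (E - m_1) \cap \dots \cap (E - m_k)|}{|I_{j_s}|} \le d^*\big((E - m_1) \cap \dots \cap (E - m_k)\big),$$
and the left side equals $\mu(T^{-m_1}(A) \cap \dots \cap T^{-m_k}(A))$ by (1.7).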
In particular, when $m_i = id$, $0 \le i \le k-1$, the multiple recurrence theorem described in (1.2) implies Szemerédi's theorem, although with no quantitative version.
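The identity behind the correspondence principle can be sketched numerically. In the block below everything concrete is an assumption made for illustration: the set $E$ (integers congruent to 0 or 1 mod 5), the window $I = [0, 1000)$, and the helper names. It evaluates the empirical measure (1.6) of a cylinder $T^{-m_1}(A) \cap \dots \cap T^{-m_k}(A)$ along the orbit of $x_E$ and compares it with the relative size of $I \cap (E - m_1) \cap \dots \cap (E - m_k)$.

```python
from fractions import Fraction

def in_E(n):
    # Hypothetical example set E: integers congruent to 0 or 1 mod 5.
    return n % 5 in (0, 1)

def shifted_point(n):
    """The point T^n(x_E) of {0,1}^Z, represented as the map m -> x_E(m + n)."""
    return lambda m: 1 if in_E(m + n) else 0

def mu_j(shifts, interval):
    """Empirical measure (1.6) of the cylinder {x : x(m) = 1 for all m in shifts}."""
    hits = sum(1 for n in interval
               if all(shifted_point(n)(m) == 1 for m in shifts))
    return Fraction(hits, len(interval))

def density_of_intersection(shifts, interval):
    """|I ∩ (E - m_1) ∩ ... ∩ (E - m_k)| / |I|, computed directly on integers."""
    hits = sum(1 for n in interval if all(in_E(n + m) for m in shifts))
    return Fraction(hits, len(interval))

I = range(0, 1000)
ap = (0, 1)                      # m_i = i*d with d = 1 and k = 2
lhs = mu_j(ap, I)                # measure side
rhs = density_of_intersection(ap, I)   # density side
print(lhs, rhs, mu_j((0,), I))
```

Both sides agree for every finite window, which is exactly the chain of equalities in the proof; the inequality in (1.9) only appears when passing to the limit.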
2. Basics - Ergodic Theory.
We introduce here some preliminary notions and basic results, especially ergodicity and weak-mixing, which will be needed later in the proof of Furstenberg's double recurrence theorem.
To motivate our discussion, let's start with the (very) classical result due to H. Weyl. Let $\alpha$ be an irrational number; then the sequence $\{n\alpha\}_{n \in \mathbb{N}}$ is uniformly distributed mod 1, that is on the torus $\mathbb{R}/\mathbb{Z}$. This means that for any interval $I \subseteq [0, 1]$ and $x \in [0, 1]$, one has
$$\lim_{N \to \infty} \frac{|\{0 \le n \le N-1 : x + n\alpha \in I\}|}{N} = |I|,$$
where $|I|$ denotes the length of the interval $I$ and $x + n\alpha$ is taken mod 1. If one writes T
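$c_* = \sup_{\mu(U_c) = 0} c$. Clearly, $\mu\{x : f(x) \in (c_* - \varepsilon, c_* + \varepsilon]\} = 1$ for all $\varepsilon > 0$, hence $\mu(\{x : f(x) = c_*\}) = 1$.
2. $\Rightarrow$ 3. Let $f = 1_A$, and note that $f \circ T = 1_{T^{-1}(A)}$.
3. $\Rightarrow$ 1. This is obvious.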
$$\frac{1}{N} \sum_{n=0}^{N-1} U_T^n f \to P_T f \tag{2.1}$$
where the convergence is understood in the $L^2$-norm (i.e. $f_n \to f$ if $\|f_n - f\|_2 \to 0$). Moreover, if $T$ is ergodic, then the function $P_T f$ is constant and is equal to $\int_X f\, d\mu$.
Proof. Let $M = \{g - Ug : g \in H\}$ where $U = U_T$. Note that $M$ is a linear subspace, and let $\overline{M}$ denote its closure. Now let $h$ be in its orthogonal complement $\overline{M}^{\perp}$, that is
$$(h, Ug - g) = 0 \quad \text{for all } g \in H.$$
This means
$$(h, Ug) = (U^{-1}h, g) = (h, g) \quad \text{for all } g \in H.$$
Thus $U^{-1}h = h$ or equivalently $h \in H_T$. Thus $\overline{M}^{\perp} = H_T$, hence $H = \overline{M} \oplus H_T$.
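Write $f = k + h$ with $k \in \overline{M}$ and $h \in H_T$, and note that $h = P_T f$. If $S_N = \frac{1}{N} \sum_{n=0}^{N-1} U^n$, then clearly $S_N h = h$ for all $N$. On the other hand
$$S_N(Ug - g) = \frac{1}{N}(U^N g - g),$$
thus $\|S_N(Ug - g)\|_2 \le 2\|g\|_2 / N \to 0$ as $N \to \infty$. For any $\varepsilon > 0$ one may write $k = (Ug - g) + e$ where $\|e\| < \varepsilon$, as $k \in \overline{M}$. Then, since $\|S_N e\|_2 \le \|e\|_2$,
$$\|S_N k\|_2 \le \|S_N(Ug - g)\|_2 + \|S_N e\|_2 \le \frac{2\|g\|_2}{N} + \varepsilon \le 2\varepsilon$$
for all sufficiently large $N$.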
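$$\lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \int_X g\, (U_T^n f)\, d\mu = \Big(\int_X f\, d\mu\Big)\Big(\int_X g\, d\mu\Big) \tag{2.2}$$
4. For all $A, B \in \mathcal{A}$
$$\lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \mu(A \cap T^{-n}B) = \mu(A)\mu(B) \tag{2.3}$$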
Proof. 1. $\Leftrightarrow$ 2. This is just a reformulation of Prop. 2.1, as "1 is a simple eigenvalue of $U_T$" means that the only functions $f \in L^2(X, \mathcal{A}, \mu)$ for which $U_T f = f$ (that is $f \circ T = f$) are the constant functions $f = c\,1_X$.
1. $\Rightarrow$ 3. Using the notation of Theorem 2.1, the left side of (2.2) is $\lim_N (S_N f, g)$, where $(\cdot\,, \cdot)$ denotes the inner product. Thus, using the notation $c_f = \int_X f\, d\mu$, one has
$$\lim_{N \to \infty} (S_N f, g) = c_f\, (1_X, g) = c_f c_g,$$
which is the content of (2.2).
3. $\Rightarrow$ 4. Let $f = 1_A$, $g = 1_B$ be the indicator functions of the sets $A, B \in \mathcal{A}$. Then (2.2) translates to (2.3).
4. $\Rightarrow$ 1. Let $A$ be an invariant measurable set and let $B = X \setminus A$. Then $\mu(A \cap T^{-n}(B)) = \mu(A \cap B) = 0$ for all $n \in \mathbb{N}$, thus by (2.3) it follows that $\mu(A)\mu(B) = 0$ and hence $\mu(A) = 0$ or $\mu(A) = 1$.
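Criterion (2.3) can be tested exactly on finite rotations. The following sketch is an illustration under assumptions that do not come from the text: $X = \mathbb{Z}_6$ with the uniform measure, $T(x) = x + \text{step}$, and hypothetical sets $A = \{0,1,2\}$, $B = \{0,4\}$. Rotation by 1 has a single orbit and is ergodic; rotation by 2 leaves the even residues invariant and is not.

```python
from fractions import Fraction

def cesaro_average(n_points, step, A, B, N):
    """(1/N) * sum_{n=0}^{N-1} mu(A ∩ T^{-n}B) for T(x) = x + step on Z_{n_points}."""
    total = Fraction(0)
    for n in range(N):
        # T^{-n}(B) = {x : x + n*step lies in B}
        pre = {(b - n * step) % n_points for b in B}
        total += Fraction(len(A & pre), n_points)
    return total / N

A, B = {0, 1, 2}, {0, 4}                    # hypothetical example sets
product = Fraction(len(A), 6) * Fraction(len(B), 6)   # mu(A) * mu(B)

ergodic = cesaro_average(6, 1, A, B, 600)      # rotation by 1: one orbit
non_ergodic = cesaro_average(6, 2, A, B, 600)  # rotation by 2: evens are invariant
print(ergodic, non_ergodic, product)
```

Since 600 is a multiple of the period, the averages are exact: the ergodic rotation reproduces $\mu(A)\mu(B)$, while the non-ergodic one does not.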
3. Weak Mixing.
Let $(X, \mathcal{A}, \mu, T)$ be a measure preserving system and let $A, B \in \mathcal{A}$. It is plausible to call $T$ mixing if $\mu(T^{-n}A \cap B) \to \mu(A)\mu(B)$ as $n \to \infty$, as it mixes the sets $A$ and $B$ after sufficiently many applications (think of a bartender making a cocktail). Notice that (2.3) in Prop. 2.2 says that ergodic transformations are mixing on average. There is an intermediate notion between ergodicity and mixing, which turns out to be most useful for our purposes.
Definition 3.1. A measure preserving transformation is called weak mixing if for all $A, B \in \mathcal{A}$
$$\lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} |\mu(A \cap T^{-n}B) - \mu(A)\mu(B)| = 0. \tag{3.1}$$
Clearly if T is weak mixing then it is ergodic. Using the notation 1 for the constant 1 function,
one has
Proposition 3.1. The following are equivalent:
1. $T$ is weak mixing.
2. For any pair of functions $f, g \in L^2(X, \mathcal{A}, \mu)$
$$\lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} |(U_T^n f, g) - (f, 1)(g, 1)| = 0 \tag{3.2}$$
3. For any $f \in L^2(X, \mathcal{A}, \mu)$ such that $(f, 1) = 0$:
$$\lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} |(U_T^n f, f)| = 0 \tag{3.3}$$
Proof. 1. $\Leftrightarrow$ 2. If $f = 1_A$, $g = 1_B$ then (3.2) translates to (3.1). Taking linear combinations, (3.2) holds for simple functions. Using the Lebesgue dominated convergence theorem for $f_n \to f$, $f_n$ simple, and the Cauchy-Schwarz inequality, (3.2) follows. The other direction is obvious.
2. $\Leftrightarrow$ 3. Use the polarization identity
$$4(U_T^n f, g) = (U_T^n(f+g), f+g) + i(U_T^n(f+ig), f+ig) - (U_T^n(f-g), f-g) - i(U_T^n(f-ig), f-ig).$$
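The gap between ergodicity and weak mixing already shows up on a finite rotation. The sketch below assumes a toy system that is not in the text: $X = \mathbb{Z}_4$ with the uniform measure, $T(x) = x + 1$, and hypothetical sets $A = B = \{0, 1\}$. It computes both the signed averages from (2.3) and the absolute averages from (3.1).

```python
from fractions import Fraction

def correlation(n_points, A, B, n):
    """mu(A ∩ T^{-n}B) for the rotation T(x) = x + 1 on Z_{n_points}."""
    pre = {(b - n) % n_points for b in B}
    return Fraction(len(A & pre), n_points)

def averages(n_points, A, B, N):
    """Cesaro averages of mu(A ∩ T^{-n}B) - mu(A)mu(B), signed and in absolute value."""
    target = Fraction(len(A), n_points) * Fraction(len(B), n_points)
    devs = [correlation(n_points, A, B, n) - target for n in range(N)]
    signed = sum(devs, Fraction(0)) / N
    absolute = sum((abs(d) for d in devs), Fraction(0)) / N
    return signed, absolute

signed, absolute = averages(4, {0, 1}, {0, 1}, 400)
print(signed, absolute)
```

The signed average vanishes, as ergodicity requires, but the absolute average stays at a positive constant, so this rotation is ergodic without being weak mixing.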
The aim of this section is to show that weak mixing implies multiple recurrence. The key is a Hilbert space version of a lemma due to van der Corput, originally invented to estimate exponential sums.
Lemma 3.1. (i) Let $1 \le H \le N$ and let $x_1, \dots, x_N$ be elements of a Hilbert space, such that $\|x_n\| \le 1$ for all $n$. Then one has
$$\Big\| \frac{1}{N} \sum_{n=0}^{N-1} x_n \Big\|^2 \le \frac{4}{H} \sum_{h=0}^{H-1} \Big| \frac{1}{N} \sum_{n=0}^{N-1} \langle x_n, x_{n+h} \rangle \Big| + \frac{2H}{N} \tag{3.4}$$
where $\langle x, y \rangle$ denotes the inner product of the vectors $x$ and $y$.
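(ii) Let $(x_n)$ be a bounded sequence of elements of a Hilbert space. For $h \in \mathbb{N}$ define:
$$y_h = \limsup_{N \to \infty} \Big| \frac{1}{N} \sum_{n=0}^{N-1} \langle x_n, x_{n+h} \rangle \Big|.$$
If $\lim_{H \to \infty} \frac{1}{H} \sum_{h=0}^{H-1} y_h = 0$, then
$$\lim_{N \to \infty} \Big\| \frac{1}{N} \sum_{n=0}^{N-1} x_n \Big\| = 0. \tag{3.5}$$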
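Proof. Let $z_n = x_n$ if $0 \le n < N$ and let $z_n = 0$ for $n < 0$ or $n \ge N$. Then
$$S_N := \frac{1}{N} \sum_{n=0}^{N-1} x_n = \frac{1}{N} \cdot \frac{1}{H} \sum_{l=0}^{H-1} \sum_{n \in \mathbb{Z}} z_{n+l} = \frac{1}{N} \sum_{n \in \mathbb{Z}} \frac{1}{H} \sum_{l=0}^{H-1} z_{n+l}. \tag{3.6}$$
Note that $w_n := \frac{1}{H} \sum_{l=0}^{H-1} z_{n+l} = 0$ unless $-H < n < N$, thus by the Cauchy-Schwarz inequality
$$\|S_N\|^2 \le \frac{N+H}{N^2} \sum_{n \in \mathbb{Z}} \frac{1}{H^2} \Big\| \sum_{l=0}^{H-1} z_{n+l} \Big\|^2 \le \frac{2}{N} \sum_{n \in \mathbb{Z}} \frac{1}{H^2} \sum_{l,k=0}^{H-1} \langle z_{n+l}, z_{n+k} \rangle. \tag{3.7}$$
Note that the inner sum on the right side of (3.7) only involves pairs with $|l - k| < H$, and for a fixed $h \in \mathbb{Z}$ the number of pairs $(l, k) \in [0, H-1]^2$ such that $l - k = h$ is equal to $H - |h|$. Thus interchanging the order of summation, one obtains
$$\|S_N\|^2 \le \frac{2}{H} \sum_{|h| \le H-1} \frac{H - |h|}{H} \Big| \frac{1}{N} \sum_{n \in \mathbb{Z}} \langle z_n, z_{n+h} \rangle \Big|. \tag{3.8}$$
Observe that $\langle z_n, z_{n+h} \rangle = \langle x_n, x_{n+h} \rangle$ if $0 \le n \le N - H$, thus
$$\Big| \sum_{n \in \mathbb{Z}} \langle z_n, z_{n+h} \rangle - \sum_{n=0}^{N-1} \langle x_n, x_{n+h} \rangle \Big| \le H.$$
Finally, since the inner sum on the right side of (3.8) has the same absolute value for $h$ and $-h$, estimate (3.4) follows.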
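For part (ii), clearly one may assume $\|x_n\| \le 1$ for all $n$. Let $\varepsilon > 0$ and let $H$ be so large that one has
$$\frac{1}{H} \sum_{h=0}^{H-1} y_h \le \varepsilon.$$
Fix such an $H$, and let $N_{H,\varepsilon}$ be such that for $N \ge N_{H,\varepsilon}$
$$\Big| \frac{1}{N} \sum_{n=0}^{N-1} \langle x_n, x_{n+h} \rangle \Big| \le y_h + \varepsilon$$
holds for all $0 \le h < H$. If moreover $N > 2H/\varepsilon$ then the expression on the right side of (3.4) is bounded by $10\varepsilon$, and hence $\|S_N\|^2 \le 10\varepsilon$ for all such $N \ge N_{H,\varepsilon}$, and this proves the Lemma.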
We need to make one more observation before proving the main result of this section.
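Definition 3.2. Let $A \subseteq \mathbb{N}$ be a set of natural numbers, and let $\delta \ge 0$. We say that $A$ has natural density $\delta$ if
$$\lim_{N \to \infty} \frac{|A \cap [1, N]|}{N} = \delta. \tag{3.9}$$
Proposition 3.2. Let $(x_n)$ be a bounded sequence of elements of a Hilbert space $H$. Then
$$\lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \|x_n\| = 0 \tag{3.10}$$
if and only if for every $\varepsilon > 0$ the set $X_\varepsilon = \{n : \|x_n\| \ge \varepsilon\}$ has natural density 0.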
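Proof. For fixed $\varepsilon > 0$ one has
$$\frac{1}{N} \sum_{n=0}^{N-1} \|x_n\| \ge \varepsilon\, \frac{|X_\varepsilon \cap [0, N)|}{N},$$
thus $X_\varepsilon$ has natural density 0 whenever (3.10) holds. Conversely, assuming as we may that $\|x_n\| \le 1$ for all $n$, if $\frac{1}{N} \sum_{n=0}^{N-1} \|x_n\| \ge \varepsilon$, then
$$|\{0 \le n < N : \|x_n\| \ge \varepsilon/2\}| \ge \frac{\varepsilon}{2} N.$$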
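Theorem 3.1. (Multiple Recurrence for Weak Mixing Transformations)
Let $(X, \mathcal{A}, \mu, T)$ be a measure preserving system, and assume that $T$ is weak mixing. Then for $k \in \mathbb{N}$ and for $f_1, f_2, \dots, f_k \in L^\infty(X, \mathcal{A}, \mu)$
$$\lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \int_X f_1(T^n x)\, f_2(T^{2n} x) \cdots f_k(T^{kn} x)\, d\mu(x) = \Big(\int_X f_1\, d\mu\Big)\Big(\int_X f_2\, d\mu\Big) \cdots \Big(\int_X f_k\, d\mu\Big). \tag{3.11}$$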
Proof. Writing $U = U_T$, we'll prove the stronger statement
$$\lim_{N \to \infty} \Big\| \frac{1}{N} \sum_{n=0}^{N-1} (U^n f_1) \cdots (U^{kn} f_k) - \Big(\int_X f_1\, d\mu\Big) \cdots \Big(\int_X f_k\, d\mu\Big) \Big\| = 0 \tag{3.12}$$
by induction on $k$.
First, observe that writing $c_i = \int_X f_i\, d\mu$ and $f_i = g_i + c_i$, one has that $\prod_i f_i - \prod_i c_i$ is a sum of $(2^k - 1)$ terms of products $\prod_i h_i$ where $h_i = g_i$ for at least one value of $i$. Thus it is enough to prove (3.12) in the case when $c_i = 0$ for at least one $1 \le i \le k$.
For $k = 1$, (3.12) is just the von Neumann ergodic theorem.
By induction, assume that (3.12) holds for $k - 1$. We apply Lemma 3.1 to the sequence $x_n = \prod_{i=1}^{k} U^{in} f_i$:
$$\lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \langle x_n, x_{n+h} \rangle = \lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \int_X \Big(\prod_{i=1}^{k} U^{in} f_i\Big)\Big(\prod_{i=1}^{k} U^{i(n+h)} f_i\Big)\, d\mu$$
$$= \lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \int_X (f_1\, U^h f_1) \prod_{i=2}^{k} U^{(i-1)n}(f_i\, U^{ih} f_i)\, d\mu = \prod_{i=1}^{k} \int_X f_i\, U^{ih} f_i\, d\mu,$$
where the last equality follows from the induction hypothesis. Since $f_i$ is bounded for each $1 \le i \le k$, it is enough to show that
$$\lim_{H \to \infty} \frac{1}{H} \sum_{h=0}^{H-1} \Big| \int_X (U^{ih} f_i)\, f_i\, d\mu \Big| = 0$$
for at least one value of $i$, but this follows from the weak mixing property of $T^i$, if $i$ is chosen such that $\int_X f_i\, d\mu = 0$.
Corollary 3.2. Let $f_i = 1_{A_i}$ be the indicator functions of the sets $A_i$ $(1 \le i \le k)$ of positive measure. Then for every $\varepsilon > 0$ the set of natural numbers $n$ for which
$$\mu(T^{-n}A_1 \cap T^{-2n}A_2 \cap \dots \cap T^{-kn}A_k) \ge \mu(A_1)\mu(A_2) \cdots \mu(A_k) - \varepsilon \tag{3.13}$$
is of positive upper density.
Finally we'll need the fact that the product of weak mixing transformations is also weak mixing. This fact will also sharpen Corollary 3.2.
If $T$ and $S$ are measure preserving transformations on measure spaces $(X, \mathcal{A}, \mu)$ and $(Y, \mathcal{B}, \nu)$ then define $T \times S$ by $(T \times S)(x, y) = (T(x), S(y))$, which is measure preserving on the product space $(X \times Y, \mathcal{A} \otimes \mathcal{B}, \mu \times \nu)$.
If $T$ is ergodic then $T \times T$ is not necessarily ergodic; a simple example is to take $X = \{0, 1, 2\}$, $\mu(\{x\}) = 1/3$, $T(x) = x + 1 \pmod 3$. Then $D = \{(x, x) : x = 0, 1, 2\}$ is a non-trivial invariant set w.r.t. $T \times T$.
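This three-point example is small enough to verify by brute force. The sketch below enumerates all subsets of $X$ and of $X \times X$ and keeps those invariant under $T$ and $T \times T$ respectively; the helper `invariant_subsets` is a name introduced here for illustration only.

```python
from fractions import Fraction
from itertools import combinations, product

def T(x):
    return (x + 1) % 3

def TT(p):
    """The product map T x T on pairs."""
    return (T(p[0]), T(p[1]))

def invariant_subsets(space, f):
    """All subsets S of the finite space with f^{-1}(S) = S."""
    result = []
    for r in range(len(space) + 1):
        for combo in combinations(space, r):
            S = set(combo)
            if {p for p in space if f(p) in S} == S:
                result.append(S)
    return result

points = [0, 1, 2]
pairs = list(product(points, points))

D = {(x, x) for x in points}                  # the diagonal
preimage_D = {p for p in pairs if TT(p) in D}
mu_D = Fraction(len(D), len(pairs))           # product of uniform measures

inv_T = invariant_subsets(points, T)          # only the trivial sets
inv_TT = invariant_subsets(pairs, TT)         # unions of the three orbits
print(len(inv_T), len(inv_TT), mu_D)
```

The rotation itself has only the two trivial invariant sets, while the product has eight, one for each union of the orbits $\{(x, x + c)\}$, $c = 0, 1, 2$; the diagonal is the orbit with $c = 0$.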
Proposition 3.3. $T$ is weak mixing on the space $(X, \mathcal{A}, \mu)$ if and only if $T \times T$ is weak mixing on the product space $(X \times X, \mathcal{A} \otimes \mathcal{A}, \mu \times \mu)$.
Proof. Suppose $T$ is weak mixing. If $A = C \times D$, $B = E \times F$ and $S = T \times T$, then $(\mu \times \mu)(A \cap S^{-n}B) = \mu(C \cap T^{-n}E)\, \mu(D \cap T^{-n}F)$. For fixed $\varepsilon > 0$ the set of natural numbers $n$ for which $|\mu(C \cap T^{-n}E) - \mu(C)\mu(E)| > \varepsilon$, as well as the set of those for which $|\mu(D \cap T^{-n}F) - \mu(D)\mu(F)| > \varepsilon$, has natural density 0. Since the union of two sets of natural density 0 is also of natural density 0, (3.1) holds for the above sets $A$ and $B$. This extends immediately to the case when $A$ and $B$ are finite disjoint unions of rectangular sets, and finally by approximation to any pair $A$ and $B$ in the product $\sigma$-algebra.
The other direction follows by taking $A' = A \times X$, $B' = B \times X$.
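$$\lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \Big| \int_X f_1(T^n x)\, f_2(T^{2n} x) \cdots f_k(T^{kn} x)\, d\mu(x) - \Big(\int_X f_1\, d\mu\Big)\Big(\int_X f_2\, d\mu\Big) \cdots \Big(\int_X f_k\, d\mu\Big) \Big| = 0 \tag{3.14}$$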
Proof. It follows from Proposition 3.2 that, for a bounded sequence $(a_n)$, $\frac{1}{N} \sum_{0 \le n < N} |a_n| \to 0$ if and only if $\frac{1}{N} \sum_{0 \le n < N} |a_n|^2 \to 0$. Thus writing $c_i = \int f_i\, d\mu$ and doing the decomposition $f_i = g_i + c_i$ as before, it is enough to show that
$$\frac{1}{N} \sum_{n=0}^{N-1} \Big| \int_X f_1(T^n x)\, f_2(T^{2n} x) \cdots f_k(T^{kn} x)\, d\mu(x) \Big|^2 \to 0 \tag{3.15}$$
as $N \to \infty$ when at least one $c_i = 0$. Expanding the square of the expression in (3.15) one obtains
$$\frac{1}{N} \sum_{n=0}^{N-1} \int_{X \times X} \big(f_1(T^n x) f_1(T^n y)\big)\big(f_2(T^{2n} x) f_2(T^{2n} y)\big) \cdots \big(f_k(T^{kn} x) f_k(T^{kn} y)\big)\, d\mu(x)\, d\mu(y) \to 0,$$
which follows from Theorem 3.1 and the facts that $T \times T$ is weak mixing and
$$\int_{X \times X} f_i(x) f_i(y)\, d\mu(x)\, d\mu(y) = c_i^2 = 0$$
for at least one $1 \le i \le k$.
This is an amazing property of weak mixing transformations; in fact it implies the following strong form of multiple recurrence.
Corollary 3.3. Let $f_i = 1_{A_i}$ be the indicator functions of the sets $A_i$ $(1 \le i \le k)$ of positive measure. Then for every $\varepsilon > 0$ the set of natural numbers $n$ for which
$$|\mu(T^{-n}A_1 \cap T^{-2n}A_2 \cap \dots \cap T^{-kn}A_k) - \mu(A_1)\mu(A_2) \cdots \mu(A_k)| \le \varepsilon \tag{3.16}$$
is of natural density 1.
4. Roth Theorem.
We prove double recurrence for ergodic systems below. This extends to arbitrary measure preserving systems, by using a general structural theorem which states that any system can be decomposed into ergodic components. Roth's theorem then follows from the Furstenberg correspondence principle discussed earlier.
We'll give the full proofs for the ergodic case, and only sketch the extension as it is quite standard (but long and technical). The key idea, due to von Neumann and Koopman, is to decompose the system into a weak mixing and a compact part, and establish multiple recurrence in both cases.
Definition 4.1. An (invertible) measure preserving system $(X, \mathcal{A}, \mu, T)$ is called compact if $\{U_T^n f : n \in \mathbb{Z}\}$ is pre-compact in $H = L^2(X, \mathcal{A}, \mu)$ for every $f \in H$.
Recall that a set $S \subseteq H$ is pre-compact or totally bounded if $S$ can be covered by finitely many balls of radius $\varepsilon$, for every $\varepsilon > 0$.
Proposition 4.1. Suppose the system $(X, \mathcal{A}, \mu, T)$ is compact. Then for every $k \in \mathbb{N}$, and every $A \in \mathcal{A}$ such that $\mu(A) > 0$, one has
$$\liminf_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} \mu(A \cap T^{-n}A \cap \dots \cap T^{-kn}A) > 0. \tag{4.1}$$
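Proof. Let $\varepsilon = \sqrt{\mu(A)}/(2k)$ and cover the set $\{U_T^n 1_A : n \in \mathbb{Z}\}$ by balls $B_1, \dots, B_m$ of radius $\varepsilon$. We can think of coloring the integer $n$ with color $i$ if $U_T^n 1_A \in B_i$. If $N > 2m$ then there is a monochromatic set $S_N \subseteq [1, N]$ such that $|S_N| \ge N/m$. Let $h > l$ both be in $S_N$. Then letting $n = h - l$, and using that two points of a ball of radius $\varepsilon$ are at distance less than $2\varepsilon$, one has
$$\mu(A \setminus T^{-n}A) = \mu(A \,\triangle\, T^{-n}A)/2 = \|U_T^h 1_A - U_T^l 1_A\|^2 / 2 < 2\varepsilon^2 = \frac{\mu(A)}{2k^2}.$$
By induction on $i = 1, \dots, k$ one has $\mu(A \setminus T^{-in}A) < \frac{i\mu(A)}{2k^2}$, thus
$$\mu(A \cap T^{-n}A \cap \dots \cap T^{-kn}A) = \mu\Big(A \setminus \bigcup_{i=1}^{k} (A \setminus T^{-in}A)\Big) > \mu(A) - \sum_{i=1}^{k} \frac{i\mu(A)}{2k^2} = \mu(A)\Big(1 - \frac{k+1}{4k}\Big) \ge \frac{\mu(A)}{2}.$$
Since $|S_N| \ge N/m$, the number of such $n$'s is at least $N/m - 1 > N/2m$, thus the expression on the left side of (4.1) is at least $\mu(A)/4m > 0$ for all $N > 2m$. This proves the Proposition.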
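A finite rotation is the simplest compact system, since every orbit $\{U_T^n f\}$ is finite. The sketch below assumes such a toy system, which is not taken from the text: $X = \mathbb{Z}_6$ with the uniform measure, $T(x) = x + 1$, $k = 2$, and a hypothetical set $A = \{0, 1\}$. It evaluates the averages in (4.1) exactly.

```python
from fractions import Fraction

def triple_recurrence_average(n_points, A, N):
    """(1/N) * sum_{n=1}^{N} mu(A ∩ T^{-n}A ∩ T^{-2n}A) for T(x) = x + 1 on Z_{n_points}."""
    total = Fraction(0)
    for n in range(1, N + 1):
        shifted_once = {(a - n) % n_points for a in A}        # T^{-n}(A)
        shifted_twice = {(a - 2 * n) % n_points for a in A}   # T^{-2n}(A)
        total += Fraction(len(A & shifted_once & shifted_twice), n_points)
    return total / N

avg = triple_recurrence_average(6, {0, 1}, 600)   # 600 = 100 full periods
print(avg)
```

In this example only the return times $n$ divisible by 6 contribute, yet the average stays bounded away from zero, which is the content of (4.1) for this system.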
A typical system is neither weak mixing nor compact. However $L^2(X, \mathcal{A}, \mu)$ has a compact portion spanned by the eigenfunctions of $U_T$ and a weak mixing portion corresponding to the continuous spectrum of the operator $U_T$. In fact this decomposition is most transparent using the spectral theorem (and the above spectral characterizations), but we will obtain it by mostly elementary means. The only tool from functional analysis we use is the compactness of integral operators with $L^2$ kernels, to be described below.
Let $K \in L^2(X \times X, \mathcal{A} \otimes \mathcal{A}, \mu \times \mu)$ and define the operator $\mathcal{K} : L^2(X, \mathcal{A}, \mu) \to L^2(X, \mathcal{A}, \mu)$ by
$$\mathcal{K}f(x) = \int_X K(x, y) f(y)\, d\mu(y).$$
In fact one defines this operator first for simple functions, and extends it to a bounded linear operator by continuity, using the Cauchy-Schwarz inequality. By approximating the kernel $K(x, y)$ by simple functions $K_\varepsilon(x, y) = \sum_{i,j=1}^{p,q} a_{ij}\, 1_{A_i}(x) 1_{B_j}(y)$, one approximates the operator $\mathcal{K}$ by ones which have finite dimensional range, thus $\mathcal{K}$ maps the unit ball into a totally bounded set. This shows that $\mathcal{K}$ is a compact operator (see the details in the exercises).
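Theorem 4.1. (Koopman - von Neumann) Let $(X, \mathcal{A}, \mu, T)$ be an invertible measure preserving system. Put
$$H_c = \{f \in L^2(X, \mathcal{A}, \mu) : \{U_T^n f : n \in \mathbb{Z}\} \text{ is pre-compact}\} \tag{4.2}$$
and let
$$H_{wm} = \Big\{g \in L^2(X, \mathcal{A}, \mu) : \lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \Big| \int_X f\, (U_T^n g)\, d\mu \Big| = 0 \text{ for all } f \in L^2(X, \mathcal{A}, \mu)\Big\}. \tag{4.3}$$
Then $L^2(X, \mathcal{A}, \mu) = H_c \oplus H_{wm}$; in fact $H_{wm} = H_c^{\perp}$.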
To start we show
Proposition 4.2. Let $K \in L^2(X \times X, \mathcal{A} \otimes \mathcal{A}, \mu \times \mu)$ be a $T \times T$ invariant kernel (i.e. $K(Tx, Ty) = K(x, y)$ for a.e. $x, y$). Then $\mathcal{K}f \in H_c$ for every $f \in L^2(X, \mathcal{A}, \mu)$.
Proof. One may assume $\|f\|_2 \le 1$, and let $U = U_T$. Then for $n \in \mathbb{Z}$
$$U^n(\mathcal{K}f)(x) = \mathcal{K}f(T^n x) = \int_X K(T^n x, y) f(y)\, d\mu(y) = \int_X K(T^n x, T^n z) f(T^n z)\, d\mu(z)$$
$$= \int_X K(x, z) f(T^n z)\, d\mu(z) = \mathcal{K}(U^n f)(x).$$
This shows that the orbit $\{U^n(\mathcal{K}f) : n \in \mathbb{Z}\}$ is contained in the image of the unit ball under the map $\mathcal{K}$ and hence is totally bounded.
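Proof. (Koopman - von Neumann) First we show that $H_c \cap H_{wm} = \{0\}$. Indeed, assume that $f \neq 0$ and $f \in H_c$. Let $0 < \varepsilon < \|f\|_2 / 2$. By the pre-compactness of the orbit $\{U^n f : n \in \mathbb{Z}\}$, there exist $g_1, \dots, g_m$ such that for every $n$ there is $1 \le i \le m$ for which
$$\|U^n f - g_i\| \le \varepsilon, \quad \text{hence} \quad |\langle U^n f, g_i \rangle| \ge \frac{1}{2} \|U^n f\|^2 = \frac{1}{2} \|f\|^2.$$
Thus for all $n$
$$\sum_{i=1}^{m} |\langle U^n f, g_i \rangle| \ge \|f\|_2^2 / 2,$$
which implies that
$$\lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} |\langle U^n f, g_i \rangle| = 0$$
cannot happen for all $1 \le i \le m$, so $f \notin H_{wm}$.
Next, we prove that $H_c^{\perp} \subseteq H_{wm}$. Let $f \in H_c^{\perp}$; then $\langle f, \mathcal{K}g \rangle = 0$ if $K \in L^2(\mu \times \mu)$ is $T \times T$-invariant and $g \in L^2(\mu)$, by Proposition 4.2. Choose $g = f$, then
$$\langle \mathcal{K}f, f \rangle = \langle K, f \otimes \bar{f} \rangle = 0 \quad \text{for all } T \times T\text{-invariant } K \in L^2(\mu \times \mu).$$
Thus by the von Neumann Ergodic Theorem, one has for every $g \in L^2(\mu)$
$$0 = \lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} \langle (U \times U)^n (f \otimes \bar{f}),\ g \otimes \bar{g} \rangle = \lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} |\langle U^n f, g \rangle|^2,$$
which shows that $f$ satisfies (4.3), so $f \in H_{wm}$, and this finishes the proof of the theorem.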
This decomposition result, combined with the multiple recurrence properties of weakly mixing transformations, quickly yields a proof of Roth's theorem for ergodic transformations.
Theorem 4.2. Let $(X, \mathcal{A}, \mu, T)$ be an ergodic invertible measure preserving system, and let $A \in \mathcal{A}$ be such that $\mu(A) > 0$. Then
$$\liminf_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} \mu(A \cap T^{-n}A \cap T^{-2n}A) > 0.$$
Proof. Write $1_A = f + g$, where $f \in H_c$ and $g \in H_{wm}$. One has
$$\frac{1}{N} \sum_{n=1}^{N} \mu(A \cap T^{-n}A \cap T^{-2n}A) = \frac{1}{N} \sum_{n=1}^{N} \int (f + g)\, U^n(f + g)\, U^{2n}(f + g)\, d\mu. \tag{4.4}$$
Expanding the product we get 8 terms. A slight modification of the proof of Proposition 4.1 shows that
$$\liminf_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} \int_X f\, (U^n f)\, (U^{2n} f)\, d\mu > 0 \tag{4.5}$$
if $f \in H_c$, $f \ge 0$ point-wise and $\int_X f\, d\mu > 0$. It is clear that $1_A \notin H_{wm}$, as (4.3) cannot hold for the functions $1_A$ and $1_X$, hence $f \neq 0$. Since $\|f - h\| \ge \|f^+ - h^+\|$, where $f^+$ denotes the positive part of the function $f$, it is easy to see that $f \ge 0$ point-wise (and similarly $f \le 1$), $f$ being the closest function to $1_A$ in $L^2$-norm among the elements of $H_c$.
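We argue now that all the 7 other terms in (4.4) contribute zero in the limit. First we show that
$$\frac{1}{N} \sum_{n=1}^{N} (U^n f)(U^{2n} g), \qquad \frac{1}{N} \sum_{n=1}^{N} (U^n g)(U^{2n} f), \qquad \frac{1}{N} \sum_{n=1}^{N} (U^n g)(U^{2n} g)$$
all converge to zero in norm. This will eliminate 6 of the 7 remaining terms. Since the proofs are similar we will handle just the first. It is again based on van der Corput's lemma. Indeed, let $x_n = U^n f\, U^{2n} g$. Then
$$\lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} \langle x_n, x_{n+h} \rangle = \lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} \int_X (U^n f)(U^{2n} g)(U^{n+h} f)(U^{2n+2h} g)\, d\mu$$
$$= \lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} \int_X (f\, U^h f)\, U^n(g\, U^{2h} g)\, d\mu = \Big(\int f\, (U^h f)\, d\mu\Big)\Big(\int g\, (U^{2h} g)\, d\mu\Big),$$
where in the last line we used the ergodicity of $T$. Since $g \in H_{wm}$ and $|f| \le 1$:
$$\lim_{H \to \infty} \frac{1}{H} \sum_{h=0}^{H-1} \Big| \Big(\int f\, U^h f\, d\mu\Big)\Big(\int g\, U^{2h} g\, d\mu\Big) \Big| \le \lim_{H \to \infty} \frac{1}{H} \sum_{h=0}^{H-1} \Big| \int g\, U^{2h} g\, d\mu \Big| = 0.$$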
For the last remaining term we note that
$$\int g\, (U^n f)\, (U^{2n} f)\, d\mu = \int f\, (U^{-n} f)\, (U^{-2n} g)\, d\mu$$
and that $\frac{1}{N} \sum_{n=1}^{N} (U^{-n} f)(U^{-2n} g) \to 0$ in norm by the same argument as above, applied to $T^{-1}$, and the theorem is proved.