Ergodic Roth

The document discusses the Furstenberg correspondence principle, which translates problems in density Ramsey theory to the setting of measure-preserving systems, giving rise to the field of ergodic Ramsey theory. It introduces measure-preserving systems and defines the upper Banach density of a set. It then constructs a measure-preserving system (X,A,μ,T) associated with a set E of positive upper density, where X is the space of maps from integers to {0,1}, A is the Borel σ-algebra, and μ is a measure on X defined using the upper density of E. This system satisfies the correspondence principle, relating the measure of intersections of inverse images of a set A to the upper

ROTH THEOREM - THE ERGODIC APPROACH.

A. MAGYAR
1. The Furstenberg correspondence principle.
The basic object of study here is a measure preserving system. This consists of a probability measure space $(X, \mathcal{A}, \mu)$ (so $\mu(X) = 1$) and a measure preserving transformation $T : X \to X$, that is,
\[ \mu(T^{-1}(A)) = \mu(A) \quad \text{for all } A \in \mathcal{A}, \]
where $T^{-1}(A)$ denotes the inverse image of the set $A$. It was a brilliant observation of Furstenberg that problems in density Ramsey theory (i.e. statements about sets of integers, or integer points, of positive upper density) can be translated to the setting of measure preserving systems, giving rise to a new and rich field called ergodic Ramsey theory. The basic tool for this translation is the so-called Furstenberg correspondence principle, which we describe below.
First, let us mention a simple observation: Szemerédi's theorem implies a multiple recurrence type result for measurable subsets $A \in \mathcal{A}$ with $\mu(A) = \delta > 0$. For given $k \in \mathbb{N}$, let $N = N(k, \delta)$ be such that if $E \subseteq [1, N]$ with $|E| \ge \delta N$ then $E$ contains an AP (arithmetic progression) of length $k$. Now let $A_i = T^{-i}(A)$ for $1 \le i \le N$. If $1_B$ stands for the indicator function of a set $B$, then it is clear that
\[ \delta N = \sum_{i=1}^{N} \int_X 1_{A_i}(x)\, d\mu(x) = \int_X \sum_{i=1}^{N} 1_{A_i}(x)\, d\mu(x), \]
hence there must be an $x \in X$ such that $\sum_i 1_{A_i}(x) \ge \delta N$, that is $|E(x)| \ge \delta N$, where $E(x) = \{ i \in [1, N] : x \in A_i = T^{-i}(A) \}$. Thus $E(x)$ contains a $k$-term AP $m, m+d, \ldots, m+(k-1)d$, and it follows that
\[ T^{-m}(A) \cap T^{-m-d}(A) \cap \ldots \cap T^{-m-(k-1)d}(A) \neq \emptyset \tag{1.1} \]
as the intersection contains the point $x$. Note that the argument in fact also shows that the set of $x$ for which $|E(x)| \ge \delta N$ must have positive measure, and since there are only finitely many possible sets $E(x)$, there must be a set $E \subseteq [1, N]$ with $|E| \ge \delta N$ such that $\{x \in X : E(x) = E\}$ has positive measure. Choosing again a $k$-term AP from $E$, one obtains that for every set $A$ of positive measure
\[ \mu(A \cap T^{-d}(A) \cap \ldots \cap T^{-(k-1)d}(A)) = \mu(T^{-m}(A) \cap T^{-m-d}(A) \cap \ldots \cap T^{-m-(k-1)d}(A)) > 0 \tag{1.2} \]
for some $d \in \mathbb{N}$. This is essentially the so-called multiple recurrence theorem of Furstenberg.
It may be viewed as a generalization of Poincaré's recurrence theorem (one of the first results in ergodic theory), which states that if $\mu(A) > 0$ then $\mu(A \cap T^{-d}(A)) > 0$ for some $d \in \mathbb{N}$. To see this, assume indirectly that $\mu(A \cap T^{-n}(A)) = 0$ for all $1 \le n \le M = [\mu(A)^{-1}] + 1$. But then for all $1 \le n < m \le M$ one has $\mu(T^{-m}(A) \cap T^{-n}(A)) = 0$, as $T$ is measure preserving, hence
\[ \mu\Big( \bigcup_{n \le M} T^{-n}(A) \Big) = M \mu(A) > 1 = \mu(X), \]
which is not possible.
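The pigeonhole bound in this argument can be tried out on a toy system. The sketch below is only an illustration under assumptions that are not in the text: it takes the rotation $T(x) = x + 1$ on $\mathbb{Z}/m\mathbb{Z}$ with the uniform measure, and the helper name `first_return_shift` is hypothetical. It searches for the smallest $d \le M = [\mu(A)^{-1}] + 1$ with $\mu(A \cap T^{-d}(A)) > 0$.

```python
from fractions import Fraction
from math import floor


def first_return_shift(A, m):
    """Smallest d >= 1 with A ∩ T^{-d}(A) nonempty, for T(x) = x + 1 on Z/mZ.

    Also returns the bound M = [mu(A)^{-1}] + 1 used in the argument above
    (uniform measure, so mu(A) = |A| / m).
    """
    A = set(A)
    mu_A = Fraction(len(A), m)
    bound = floor(1 / mu_A) + 1
    for d in range(1, bound + 1):
        preimage = {(x - d) % m for x in A}  # T^{-d}(A) = {x : x + d in A}
        if A & preimage:
            return d, bound
    raise AssertionError("the pigeonhole argument would be violated")


d, M = first_return_shift({0, 5}, 12)
print(d, M)  # 5 7
```

Here $\mu(A) = 1/6$, so $M = 7$, and the first return already happens at $d = 5$, within the bound the argument guarantees.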
We turn now to the construction of a measure preserving system associated to a set $E \subseteq \mathbb{Z}$ of positive upper (Banach) density. This will require some basic facts from measure theory and topology, whose proofs we will outline at the end of this note.
Let $X = \{0, 1\}^{\mathbb{Z}}$, that is the set of maps $x : \mathbb{Z} \to \{0, 1\}$, and define the metric $d(x, y) = 2^{-n}$ if $n = \min\{|m| : x(m) \neq y(m)\}$. It is easy to see that a basis of the topology is given by the cylindrical sets
\[ C^{\epsilon_1, \ldots, \epsilon_k}_{m_1, \ldots, m_k} = \{ x \in X : x(m_i) = \epsilon_i,\ 1 \le i \le k \}, \]
where $\{m_i\}$ is a finite set of integers and $\epsilon_i \in \{0, 1\}$ (by convention $C^{\emptyset}_{\emptyset} = X$). Note that the family of finite unions of cylindrical sets, denoted by $\mathcal{C}$, forms an algebra (i.e. it is closed under taking complements, unions and intersections) and hence consists of sets that are both open and closed. The $\sigma$-algebra $\mathcal{A}$ generated by them is called the Borel $\sigma$-algebra on $X$ (every family of sets is contained in a minimal $\sigma$-algebra, namely the intersection of all $\sigma$-algebras containing all the sets). The key point is to construct a measure on $X$, given a set $E \subseteq \mathbb{Z}$ of positive upper density.
Recall that the upper Banach density of $E$ is defined to be
\[ d^*(E) = \limsup_{N \to \infty} \frac{|E \cap [M, M+N-1]|}{N}, \tag{1.3} \]
that is, there exists a sequence of intervals $I_j = [M_j, M_j + N_j - 1]$ with $N_j \to \infty$ such that
\[ d^*(E) = \lim_{j \to \infty} \frac{|E \cap [M_j, M_j + N_j - 1]|}{N_j}, \tag{1.4} \]
and $d^*(E)$ is an upper bound for the limit on the right side of (1.4), for any sequence of intervals whose lengths tend to infinity.
Let $x_E \in X$ be the indicator function of the set $E$, let $T : X \to X$ denote the shift operator, i.e. $Tx(n) = x(n+1)$, and let $A = \{x \in X : x(0) = 1\}$. Note that we're doing a typical duality construction: sets become points of $X$, and points (i.e. integers) become subsets of $X$; for example, $A$ corresponds to the point 0. Notice that
\[ n \in E \iff T^n(x_E) \in A. \]
Thus
\[ d^*(E) = \lim_{j \to \infty} \frac{1}{|I_j|} \sum_{n \in I_j} 1_A(T^n(x_E)). \tag{1.5} \]
For every $j \in \mathbb{N}$ define $\mu_j : \mathcal{C} \to [0, 1]$ by
\[ \mu_j(C) = \frac{1}{|I_j|} \sum_{n \in I_j} 1_C(T^n(x_E)). \tag{1.6} \]
Since the family $\mathcal{C}$ is countable, it is easy to see the following.
Proposition 1.1. There exists a sequence $\{j_s\} \subseteq \mathbb{N}$ such that for every set $C \in \mathcal{C}$ the limit
\[ \mu(C) := \lim_{s \to \infty} \mu_{j_s}(C) \tag{1.7} \]
exists.
Proof. Write $\mathcal{C} = \{C_1, C_2, \ldots\}$, as $\mathcal{C}$ is countable. Choose a sequence $\{j_{1s}\} = j_{11}, j_{12}, \ldots, j_{1s}, \ldots$ such that $\mu_{j_{1s}}(C_1)$ converges as $s \to \infty$. Then choose a subsequence $\{j_{2s}\}$ of $\{j_{1s}\}$ such that $\mu_{j_{2s}}(C_2)$ converges as $s \to \infty$. Similarly, construct the nested family of sequences, and finally choose $j_s = j_{ss}$ for $s \in \mathbb{N}$. It is clear that for any $n \in \mathbb{N}$ the sequence $\mu_{j_s}(C_n)$ converges, as for $s \ge n$ the numbers $j_s = j_{ss}$ form a subsequence of the sequence $\{j_{ns}\}$.
The function $\mu : \mathcal{C} \to [0, 1]$ is defined via (1.7), and note that by (1.5) and (1.6):
\[ \mu(A) = d^*(E). \tag{1.8} \]
Clearly $\mu$ is an additive set function on the family $\mathcal{C}$, which in fact implies that $\mu$ is also $\sigma$-additive on $\mathcal{C}$. Indeed, let $C = \bigcup_{i=1}^{\infty} C_i$ be a partition of a set $C \in \mathcal{C}$ into sets $C_i \in \mathcal{C}$. As $C$ is closed and the space $(X, d)$ is compact (which is proved by the same diagonalization process as above, or follows from Tychonoff's theorem), $C$ is in fact compact, and hence, the sets $C_i$ being open, it is a union of finitely many of the sets $C_i$. It is a well-known extension result in measure theory (due to Carathéodory):
Theorem: There exists a unique probability measure $\tilde{\mu} : \mathcal{A} \to [0, 1]$ such that $\tilde{\mu}(C) = \mu(C)$ for all $C \in \mathcal{C}$.
From now on (out of sloppiness) we'll denote the extended measure also by $\mu$. Now it is easy to show
Theorem 1.1. (Furstenberg Correspondence Principle) Let $E \subseteq \mathbb{Z}$ with $d^*(E) > 0$. Then there exists a measure preserving system $(X, \mathcal{A}, \mu, T)$ and a set $A \in \mathcal{A}$ such that $\mu(A) = d^*(E)$ and, for any finite set of integers $m_1, \ldots, m_k$:
\[ \mu(T^{-m_1}(A) \cap \ldots \cap T^{-m_k}(A)) \le d^*((E - m_1) \cap \ldots \cap (E - m_k)). \tag{1.9} \]
Proof. Let $X$, $\mathcal{A}$, $\mu$, $T$ and the set $A$ be as constructed above. Then
\[ x \in T^{-m_1}(A) \cap \ldots \cap T^{-m_k}(A) \iff x(m_1) = \ldots = x(m_k) = 1, \]
hence by (1.6) one has for a fixed $j \in \mathbb{N}$:
\[ \mu_j(T^{-m_1}(A) \cap \ldots \cap T^{-m_k}(A)) = \frac{|\{n \in I_j : T^n(x_E)(m_1) = \ldots = T^n(x_E)(m_k) = 1\}|}{|I_j|} = \frac{|\{n \in I_j : n + m_1 \in E, \ldots, n + m_k \in E\}|}{|I_j|} = \frac{|I_j \cap (E - m_1) \cap \ldots \cap (E - m_k)|}{|I_j|}. \]
Finally, by definition:
\[ \lim_{s \to \infty} \frac{|I_{j_s} \cap (E - m_1) \cap \ldots \cap (E - m_k)|}{|I_{j_s}|} \le d^*((E - m_1) \cap \ldots \cap (E - m_k)). \]
In particular, when $m_i = id$, $0 \le i \le k-1$, the multiple recurrence theorem described in (1.2) implies Szemerédi's theorem, although with no quantitative version.
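The counting identity in the middle of the proof can be checked on a finite window. The following is a minimal sketch, not part of the original notes: the set $E$ (a finite truncation of the multiples of 3), the window $I$, the shifts, and both function names are illustrative assumptions.

```python
def count_pattern(E, I, shifts):
    """|{n in I : n + m in E for every m in shifts}|."""
    return sum(1 for n in I if all(n + m in E for m in shifts))


def count_intersection(E, I, shifts):
    """|I ∩ (E - m_1) ∩ ... ∩ (E - m_k)|, where E - m = {e - m : e in E}."""
    common = set(I)
    for m in shifts:
        common &= {e - m for e in E}
    return len(common)


E = set(range(0, 200, 3))  # multiples of 3, truncated (assumed example set)
I = range(0, 30)           # a window playing the role of I_j
shifts = (0, 3, 6)         # m_i = i * d with d = 3, k = 3

print(count_pattern(E, I, shifts), count_intersection(E, I, shifts))  # 10 10
```

Both counts agree, and dividing by $|I| = 30$ gives $1/3$, the density of the multiples of 3, as one would expect for a progression with common difference 3.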
2. Basics - Ergodic Theory.
We introduce here some preliminary notions and basic results, especially ergodicity and weak mixing, which will be needed later in the proof of Furstenberg's double recurrence theorem.
To motivate our discussion, let's start with the (very) classical result due to H. Weyl. Let $\alpha$ be an irrational number; then the sequence $\{n\alpha\}_{n \in \mathbb{N}}$ is uniformly distributed mod 1, that is on the torus $\mathbb{R}/\mathbb{Z}$. This means that for any interval $I \subseteq [0, 1]$ and $x \in [0, 1]$, one has
\[ \lim_{N \to \infty} \frac{|\{0 \le n \le N-1 : x + n\alpha \in I\}|}{N} = |I|, \]
where $|I|$ denotes the length of the interval $I$ and $x + n\alpha$ is understood mod 1. If one writes $T_\alpha(x) = x + \alpha$ (mod 1), then $T_\alpha$ is a measure preserving transformation and $T_\alpha^n(x) = x + n\alpha$, hence the above statement translates to the fact that the orbits $O(x) = \{T_\alpha^n(x)\}$ are equi-distributed with respect to intervals $I$. This is a special case of a general result to be described below.
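Weyl's statement is easy to observe numerically. The sketch below is illustrative only: the choices $\alpha = \sqrt{2}$, $x = 0.1$, $I = [0.2, 0.5)$ and the function name `visit_frequency` are assumptions made for the example, and floating point arithmetic stands in for exact arithmetic on the torus.

```python
import math


def visit_frequency(alpha, x, a, b, N):
    """Fraction of n in [0, N) for which (x + n*alpha) mod 1 lies in [a, b)."""
    hits = sum(1 for n in range(N) if a <= (x + n * alpha) % 1.0 < b)
    return hits / N


# Irrational rotation: the frequency approaches the length |I| = 0.3.
freq = visit_frequency(math.sqrt(2), 0.1, 0.2, 0.5, 100_000)
print(abs(freq - 0.3) < 0.01)

# Rational rotation by 1/2: the orbit of 0.1 only visits 0.1 and 0.6,
# so it never enters [0.2, 0.5) and is not equi-distributed.
print(visit_frequency(0.5, 0.1, 0.2, 0.5, 1000))
```

The second call shows why irrationality matters: for rational $\alpha$ the orbit is finite and the limit depends on the starting point.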
Let $(X, \mathcal{A}, \mu, T)$ (with $\mu(X) = 1$) be a measure preserving system, where for our purposes we can assume that $T$ is invertible. Our first object of study will be the distribution of the orbits $O(x) = \{x, Tx, T^2x, T^3x, \ldots\}$ of the points $x \in X$, where $T^n x = T(T^{n-1}x)$ is the image of the point $x$ after applying the transformation $n$ times. Notice that if there exists a so-called invariant set $A \in \mathcal{A}$, for which $T(A) = A$ (or equivalently $T^{-1}(A) = A$), then $O(x) \subseteq A$ for every $x \in A$. Thus, if $0 < \mu(A) < 1$, the orbits of points of $A$ cannot be equi-distributed on the space $X$. This motivates the following.
Definition 2.1. The transformation $T$ is called ergodic if $T^{-1}(A) = A$ implies that $\mu(A) = 0$ or $\mu(A) = 1$.
One can make this definition more flexible, via the following
Proposition 2.1. The following are equivalent:
1. $T$ is ergodic.
2. For any measurable function $f : X \to \mathbb{C}$, if $f(x) = f(Tx)$ for a.e. $x \in X$ then $f(x) = c$ for some constant $c \in \mathbb{C}$ for a.e. $x \in X$. In short, the only invariant functions are the constants.
3. For $A \in \mathcal{A}$, if $\mu(A \triangle T^{-1}(A)) = 0$ then $\mu(A) = 0$ or $\mu(A) = 1$.
Proof. 1. $\Rightarrow$ 2. Indeed, suppose $f(Tx) = f(x)$ almost everywhere, that is except for $x \in B$ with $\mu(B) = 0$. Then consider the $T$-invariant set $C = \bigcup_{n \in \mathbb{Z}} T^n(B)$, and note that $\mu(C) = 0$ as well. Since $f(x) = f(Tx)$ for all $x \in X \setminus C$, by redefining $f(x) = 0$ for all $x \in C$ we have that $f(x) = f(Tx)$ for all $x \in X$. Considering real and imaginary parts separately, we may also assume that $f$ is real valued.
Then the sets $U_c = \{x \in X : f(x) < c\}$ are all invariant under $T$, thus each has measure either 1 or 0. Since $\bigcap_{c \in \mathbb{R}} U_c = \emptyset$ and $\bigcup_{c \in \mathbb{R}} U_c = X$, there must be $c_1 < c_2$ such that $\mu(U_{c_1}) = 0$ and $\mu(U_{c_2}) = 1$. Thus there exists $c_* = \sup\{c : \mu(U_c) = 0\}$. Clearly $\mu(\{x : f(x) \in [c_*, c_* + \epsilon)\}) = 1$ for all $\epsilon > 0$, hence $\mu(\{x : f(x) = c_*\}) = 1$.
2. $\Rightarrow$ 3. Let $f = 1_A$, and note that $f \circ T = 1_{T^{-1}(A)}$.
3. $\Rightarrow$ 1. This is obvious.
An example of an ergodic transformation is the irrational rotation $T_\alpha$. The so-called pointwise ergodic theorem, due to Birkhoff, states that for an ergodic transformation the orbits $O(x)$ (of almost every point $x$) are equi-distributed, in the sense that for a measurable set $A$ one has
\[ \lim_{N \to \infty} \frac{|\{0 \le n \le N-1 : T^n x \in A\}|}{N} = \mu(A). \]
However, we will only need a weaker version, called the mean ergodic theorem, due to von Neumann. The starting point (observed by Koopman) is that one may assign a unitary operator $U_T$ on the Hilbert space of square integrable functions $H = L^2(X, \mathcal{A}, \mu)$, simply by defining
\[ (U_T f)(x) = f(Tx). \]
Indeed, then
\[ (U_T f, U_T g) = \int_X f(Tx)\, \overline{g(Tx)}\, d\mu(x) = \int_X f(y)\, \overline{g(y)}\, d\mu(y) = (f, g) \]
by making a change of variables $y = Tx$, and using the fact that $T$ is measure preserving.
Now let $H_T = \{f \in H : U_T f = f\}$ be the subspace of functions invariant under $U_T$, and let $P_T$ denote the orthogonal projection onto $H_T$. We state the so-called Mean (or $L^2$) Ergodic Theorem.
Theorem 2.1. (von Neumann) Let $T$ be an invertible measure preserving transformation. Then for every $f \in H$ one has
\[ \lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} U_T^n f = P_T f, \tag{2.1} \]
where the convergence is understood in the $L^2$-norm (i.e. $f_n \to f$ if $\|f_n - f\|_2 \to 0$). Moreover, if $T$ is ergodic, then the function $P_T f$ is constant and is equal to $\int_X f \, d\mu$.
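Proof. Let $M = \{g - Ug : g \in H\}$ where $U = U_T$. Note that $M$ is a linear subspace, and let $\overline{M}$ denote its closure. Now let $h$ be in its orthogonal complement $\overline{M}^{\perp}$, that is
\[ (h, Ug - g) = 0 \quad \text{for all } g \in H. \]
This means
\[ (h, Ug) = (U^{-1}h, g) = (h, g) \quad \text{for all } g \in H. \]
Thus $U^{-1}h = h$, or equivalently $h \in H_T$. Reading the same computation backwards shows that every $h \in H_T$ is orthogonal to $M$. Thus $\overline{M}^{\perp} = H_T$, hence $H = \overline{M} \oplus H_T$.
Write $f = k + h$ with $k \in \overline{M}$ and $h \in H_T$, and note that $h = P_T f$. If $S_N = \frac{1}{N} \sum_{n=0}^{N-1} U^n$, then clearly $S_N h = h$ for all $N$. On the other hand
\[ S_N(Ug - g) = \frac{1}{N}(U^N g - g), \]
thus $\|S_N(Ug - g)\|_2 \le 2\|g\|_2 / N \to 0$ as $N \to \infty$. For any $\epsilon > 0$ one may write $k = (Ug - g) + e$ where $\|e\|_2 < \epsilon$, as $k \in \overline{M}$. Then
\[ \|S_N k\|_2 \le \|S_N(Ug - g)\|_2 + \|S_N e\|_2 \le \frac{2\|g\|_2}{N} + \epsilon \le 2\epsilon \]
for $N \ge N_\epsilon$. Here we used the fact that $\|U^n e\|_2 = \|e\|_2$ and hence $\|S_N e\|_2 \le \|e\|_2$. This implies that $S_N k \to 0$ as $N \to \infty$, thus $S_N f \to P_T f$.
For the second part, we remark that if $T$ is ergodic then $H_T$ consists only of constant functions. Also if $P_T f = c$, then, since $\int_X S_N f \, d\mu = \int_X f \, d\mu$ for every $N$ and $S_N f \to c$ in $L^2$, one has $c = \int_X f \, d\mu$.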
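We will also need different characterizations of ergodicity.
Proposition 2.2. Let $(X, \mathcal{A}, \mu, T)$ be an invertible measure preserving system. The following are equivalent:
1. $T$ is ergodic.
2. 1 is a simple eigenvalue of the unitary operator $U_T$.
3. For all $f, g \in L^2(X, \mathcal{A}, \mu)$
\[ \lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \int_X g \, (U_T^n f) \, d\mu = \Big( \int_X f \, d\mu \Big) \Big( \int_X g \, d\mu \Big). \tag{2.2} \]
4. For all $A, B \in \mathcal{A}$
\[ \lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \mu(A \cap T^{-n}B) = \mu(A)\mu(B). \tag{2.3} \]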
Proof. 1. $\Leftrightarrow$ 2. This is just a reformulation of Prop. 2.1, as 1 being a simple eigenvalue of $U_T$ means that the only functions $f \in L^2(X, \mathcal{A}, \mu)$ for which $U_T f = f$ (that is $f \circ T = f$) are the constant functions $f = c 1_X$.
1. $\Rightarrow$ 3. Using the notation of Theorem 2.1, the left side of (2.2) is the limit of $(S_N f, \bar{g})$, where $( \cdot , \cdot )$ denotes the inner product. Thus using the notation $c_f = \int_X f \, d\mu$, one has
\[ \lim_{N \to \infty} (S_N f, \bar{g}) = c_f (1_X, \bar{g}) = c_f c_g, \]
which is the content of (2.2).
3. $\Rightarrow$ 4. Let $f = 1_B$, $g = 1_A$ be the indicator functions of the sets $A, B \in \mathcal{A}$. Then (2.2) translates to (2.3).
4. $\Rightarrow$ 1. Let $A$ be an invariant measurable set and let $B = X \setminus A$. Then $\mu(A \cap T^{-n}(B)) = \mu(A \cap B) = 0$ for all $n \in \mathbb{N}$, thus by (2.3) it follows that $\mu(A)\mu(B) = 0$ and hence $\mu(A) = 0$ or $\mu(A) = 1$.
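Condition (2.3) can be verified exactly on a small ergodic system. The sketch below is an illustration under assumptions not made in the text: it uses the cyclic rotation $T(x) = x + 1$ on $\mathbb{Z}/m\mathbb{Z}$ with the uniform measure, for which the Cesàro average over one full period $N = m$ already equals $\mu(A)\mu(B)$; the function name and the sets are hypothetical.

```python
from fractions import Fraction


def cesaro_correlation(A, B, m):
    """(1/m) * sum_{n=0}^{m-1} mu(A ∩ T^{-n}B) for T(x) = x + 1 on Z/mZ."""
    A, B = set(A), set(B)
    total = Fraction(0)
    for n in range(m):
        preimage = {(x - n) % m for x in B}  # T^{-n}(B) = {x : x + n in B}
        total += Fraction(len(A & preimage), m)
    return total / m


m = 7
A, B = {0, 1, 4}, {2, 6}
lhs = cesaro_correlation(A, B, m)
rhs = Fraction(len(A), m) * Fraction(len(B), m)
print(lhs == rhs)  # True
```

Note that the individual terms $\mu(A \cap T^{-n}B)$ are periodic in $n$ here and do not converge, so this system is ergodic without being mixing; only the averages behave well.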
3. Weak Mixing.
Let $(X, \mathcal{A}, \mu, T)$ be a measure preserving system and let $A, B \in \mathcal{A}$. It is plausible to call $T$ mixing if $\mu(T^{-n}A \cap B) \to \mu(A)\mu(B)$ as $n \to \infty$, as it mixes the sets $A$ and $B$ after sufficiently many applications (think of a bartender making a cocktail). Notice that one may read (2.3) in Prop. 2.2 as saying that ergodic transformations are mixing on average. There is an intermediate notion between ergodicity and mixing, which turns out to be most useful for our purposes.
Definition 3.1. A measure preserving transformation is called weak mixing if for all $A, B \in \mathcal{A}$
\[ \lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} |\mu(A \cap T^{-n}B) - \mu(A)\mu(B)| = 0. \tag{3.1} \]
Clearly if $T$ is weak mixing then it is ergodic. Using the notation 1 for the constant 1 function, one has
Proposition 3.1. The following are equivalent:
1. $T$ is weak mixing.
2. For any pair of functions $f, g \in L^2(X, \mathcal{A}, \mu)$
\[ \lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} |(U_T^n f, g) - (f, 1)(1, g)| = 0. \tag{3.2} \]
3. For any $f \in L^2(X, \mathcal{A}, \mu)$ such that $(f, 1) = 0$:
\[ \lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} |(U_T^n f, f)| = 0. \tag{3.3} \]
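Proof. 1. $\Leftrightarrow$ 2. If $f = 1_B$, $g = 1_A$ then (3.2) translates to (3.1). Taking linear combinations, (3.2) holds for simple functions. Using the Lebesgue dominated convergence theorem for $f_n \to f$, $f_n$ simple, and the Cauchy-Schwarz inequality, (3.2) follows for all $f, g \in L^2(X, \mathcal{A}, \mu)$. The other direction is obvious.
2. $\Leftrightarrow$ 3. Taking $g = f$ in (3.2) gives (3.3). Conversely, replacing $f$ by $f - (f, 1)$ and $g$ by $g - (g, 1)$ does not change the expression inside the absolute value in (3.2), so one may assume $(f, 1) = (g, 1) = 0$. Then use the polarization identity
\[ 4(U_T^n f, g) = (U_T^n(f+g), f+g) + i(U_T^n(f+ig), f+ig) - (U_T^n(f-g), f-g) - i(U_T^n(f-ig), f-ig). \]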
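As a sanity check, the polarization identity can be tested numerically in a finite dimensional model. The sketch below is an assumption-laden illustration: the Hilbert space is $\mathbb{C}^4$ with the inner product linear in the first slot, $U$ is the cyclic shift (a unitary operator standing in for $U_T^n$), and the vectors and function names are made up for the example.

```python
def inner(u, v):
    """<u, v> = sum_j u_j * conj(v_j): linear in u, conjugate-linear in v."""
    return sum(a * b.conjugate() for a, b in zip(u, v))


def shift(u):
    """Cyclic shift (U u)(j) = u(j + 1), a unitary operator on C^d."""
    return u[1:] + u[:1]


def polarization_rhs(f, g):
    """(1/4) * sum over c in {1, i, -1, -i} of c * (U(f + c g), f + c g)."""
    total = 0
    for c in (1, 1j, -1, -1j):
        v = [a + c * b for a, b in zip(f, g)]
        total += c * inner(shift(v), v)
    return total / 4


f = [1 + 2j, -1j, 3, 0.5]
g = [2, 1 - 1j, -1, 1j]
print(abs(inner(shift(f), g) - polarization_rhs(f, g)) < 1e-12)  # True
```

The four terms in the loop correspond, in order, to the four terms on the right side of the identity.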

The aim of this section is to show that weak mixing implies multiple recurrence. The key is a Hilbert space version of a lemma due to van der Corput, originally invented to estimate exponential sums.
Lemma 3.1. (i) Let $1 \le H \le N$ and let $x_0, x_1, \ldots, x_{N+H-1}$ be elements of a Hilbert space such that $\|x_n\| \le 1$ for all $n$. Then one has
\[ \Big\| \frac{1}{N} \sum_{n=0}^{N-1} x_n \Big\|^2 \le \frac{4}{H} \sum_{h=0}^{H-1} \Big| \frac{1}{N} \sum_{n=0}^{N-1} \langle x_n, x_{n+h} \rangle \Big| + \frac{2H}{N}, \tag{3.4} \]
where $\langle x, y \rangle$ denotes the inner product of the vectors $x$ and $y$.
(ii) Let $\{x_n\}$ be a bounded sequence of elements of a Hilbert space. For $h \in \mathbb{N}$ define:
\[ y_h = \limsup_{N \to \infty} \Big| \frac{1}{N} \sum_{n=0}^{N-1} \langle x_n, x_{n+h} \rangle \Big|. \]
If $\lim_{H \to \infty} \frac{1}{H} \sum_{h=0}^{H-1} y_h = 0$, then
\[ \lim_{N \to \infty} \Big\| \frac{1}{N} \sum_{n=0}^{N-1} x_n \Big\| = 0. \tag{3.5} \]
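Proof. Let $z_n = x_n$ if $0 \le n < N$ and let $z_n = 0$ for $n < 0$ or $n \ge N$. Then
\[ S_N := \frac{1}{N} \sum_{n=0}^{N-1} x_n = \frac{1}{H} \sum_{l=0}^{H-1} \frac{1}{N} \sum_{n \in \mathbb{Z}} z_{n+l} = \frac{1}{N} \sum_{n \in \mathbb{Z}} \frac{1}{H} \sum_{l=0}^{H-1} z_{n+l}. \tag{3.6} \]
Note that $w_n := \frac{1}{H} \sum_{l=0}^{H-1} z_{n+l} = 0$ unless $-H < n < N$, thus by the Cauchy-Schwarz inequality
\[ \|S_N\|^2 \le \frac{N+H}{N^2} \sum_{n \in \mathbb{Z}} \frac{1}{H^2} \Big\| \sum_{l=0}^{H-1} z_{n+l} \Big\|^2 \le \frac{2}{N} \sum_{n \in \mathbb{Z}} \frac{1}{H^2} \sum_{l,k=0}^{H-1} \langle z_{n+l}, z_{n+k} \rangle. \tag{3.7} \]
Note that in the inner sum on the right side of (3.7) one always has $|l - k| < H$, and for a fixed $h \in \mathbb{Z}$ with $|h| < H$ the number of pairs $(l, k) \in [0, H-1]^2$ such that $k - l = h$ is equal to $H - |h|$. Thus interchanging the order of summation, one obtains
\[ \|S_N\|^2 \le \frac{2}{H} \sum_{|h| \le H-1} \frac{H - |h|}{H} \Big| \frac{1}{N} \sum_{n \in \mathbb{Z}} \langle z_n, z_{n+h} \rangle \Big|. \tag{3.8} \]
Observe that $\langle z_n, z_{n+h} \rangle = \langle x_n, x_{n+h} \rangle$ if $0 \le n < N - H$ and $0 \le h < H$, thus
\[ \Big| \sum_{n \in \mathbb{Z}} \langle z_n, z_{n+h} \rangle - \sum_{n=0}^{N-1} \langle x_n, x_{n+h} \rangle \Big| \le h \le H. \]
Finally, since the inner sums on the right side of (3.8) for $h$ and $-h$ are complex conjugates of each other, estimate (3.4) follows.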
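For part (ii), clearly one may assume $\|x_n\| \le 1$ for all $n$. Let $\epsilon > 0$ and let $H_\epsilon$ be such that for $H \ge H_\epsilon$ one has
\[ \frac{1}{H} \sum_{h=0}^{H-1} y_h \le \epsilon. \]
Fix such an $H$, and let $N_{H,\epsilon}$ be such that for $N \ge N_{H,\epsilon}$
\[ \Big| \frac{1}{N} \sum_{n=0}^{N-1} \langle x_n, x_{n+h} \rangle \Big| \le y_h + \epsilon \]
holds for all $0 \le h < H$. If moreover $N > 2H/\epsilon$ then the expression on the right side of (3.4) is bounded by $10\epsilon$, and hence $\|S_N\|^2 \le 10\epsilon$ for all $N \ge \max(N_{H,\epsilon}, 2H/\epsilon)$, and this proves the Lemma.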
We need to make one more observation before proving the main result of this section.
Definition 3.2. Let $A \subseteq \mathbb{N}$ be a set of natural numbers, and let $0 \le \delta \le 1$. We say that $A$ has natural density $\delta$ if
\[ \lim_{N \to \infty} \frac{|A \cap [1, N]|}{N} = \delta. \tag{3.9} \]
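Proposition 3.2. Let $\{x_n\}$ be a bounded sequence of elements of a Hilbert space $H$. Then
\[ \lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \|x_n\| = 0 \tag{3.10} \]
if and only if for every $\epsilon > 0$ the set $X_\epsilon = \{n : \|x_n\| \ge \epsilon\}$ has natural density 0.
Proof. For fixed $\epsilon > 0$ one has
\[ \frac{1}{N} \sum_{n=0}^{N-1} \|x_n\| \ge \epsilon \, \frac{|X_\epsilon \cap [0, N)|}{N}, \]
thus if (3.10) holds then $X_\epsilon$ has natural density 0. For the other direction, normalize so that $\|x_n\| \le 1$ for all $n$, and note that if $\frac{1}{N} \sum_{n=0}^{N-1} \|x_n\| \ge \epsilon$, then
\[ |\{0 \le n < N : \|x_n\| \ge \epsilon/2\}| \ge \frac{\epsilon}{2} N. \]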
Corollary 3.1. If $T$ is weak mixing then so is $T^i$ for every $i \in \mathbb{N}$.
Proof. If $T^i$ is not weak mixing then there is a pair of sets $A, B$ such that $|\mu(T^{-in}A \cap B) - \mu(A)\mu(B)| > \epsilon$ for a set of $n \in X$ of positive upper density, for some $\epsilon > 0$. Then the set $iX = \{in : n \in X\}$ is also of positive upper density, which contradicts the fact that $T$ is weak mixing.
Theorem 3.1. (Multiple Recurrence for Weak Mixing Transformations)
Let $(X, \mathcal{A}, \mu, T)$ be a measure preserving system, and assume that $T$ is weak mixing. Then for $k \in \mathbb{N}$ and for $f_1, f_2, \ldots, f_k \in L^\infty(X, \mathcal{A}, \mu)$ one has
\[ \lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \int_X f_1(T^n x) f_2(T^{2n} x) \ldots f_k(T^{kn} x) \, d\mu(x) = \Big( \int_X f_1 \, d\mu \Big) \Big( \int_X f_2 \, d\mu \Big) \ldots \Big( \int_X f_k \, d\mu \Big). \tag{3.11} \]
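Proof. Writing $U = U_T$ for the shift, we'll prove the stronger statement
\[ \lim_{N \to \infty} \Big\| \frac{1}{N} \sum_{n=0}^{N-1} (U^n f_1) \ldots (U^{kn} f_k) - \Big( \int_X f_1 \, d\mu \Big) \ldots \Big( \int_X f_k \, d\mu \Big) \Big\|_2 = 0 \tag{3.12} \]
by induction on $k$. For notational simplicity we take the functions $f_i$ to be real valued.
First, observe that writing $c_i = \int_X f_i \, d\mu$ and $f_i = g_i + c_i$ one has that $\prod_i f_i - \prod_i c_i$ is a sum of $(2^k - 1)$ terms of products $\prod_i h_i$, where each $h_i$ is either $g_i$ or $c_i$ and $h_i = g_i$ for at least one value of $i$. Thus it is enough to prove (3.12) in the case when $c_i = 0$ for at least one $1 \le i \le k$.
For $k = 1$, (3.12) is just the von Neumann ergodic theorem.
By induction, assume that (3.12) holds for $k - 1$. We apply Lemma 3.1 to the sequence $x_n = \prod_{i=1}^{k} U^{in} f_i$. One has
\[ \lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \langle x_n, x_{n+h} \rangle = \lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \int_X \Big( \prod_{i=1}^{k} U^{in} f_i \Big) \Big( \prod_{i=1}^{k} U^{i(n+h)} f_i \Big) d\mu \]
\[ = \lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \int_X (f_1 \, U^h f_1) \prod_{i=2}^{k} U^{(i-1)n} (f_i \, U^{ih} f_i) \, d\mu = \prod_{i=1}^{k} \int_X f_i \, U^{ih} f_i \, d\mu, \]
where the last equality follows from the induction hypothesis. Since $f_i$ is bounded for each $1 \le i \le k$, it is enough to show that
\[ \lim_{H \to \infty} \frac{1}{H} \sum_{h=0}^{H-1} \Big| \int_X (U^{ih} f_i) \, f_i \, d\mu \Big| = 0 \]
for at least one value of $i$, but this follows from the weak mixing of $T^i$ (Corollary 3.1 and (3.3)), if $i$ is chosen such that $\int_X f_i \, d\mu = 0$.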
Corollary 3.2. Let $f_i = 1_{A_i}$ be the indicator functions of the sets $A_i$ ($1 \le i \le k$) of positive measure. Then for every $\epsilon > 0$ the set of natural numbers $n$ for which
\[ \mu(T^{-n}A_1 \cap T^{-2n}A_2 \cap \ldots \cap T^{-kn}A_k) \ge \mu(A_1)\mu(A_2) \ldots \mu(A_k) - \epsilon \tag{3.13} \]
is of positive upper density.
Finally we'll need the fact that the product of weak mixing transformations is also weak mixing. This fact will also sharpen Corollary 3.2.
If $T$ and $S$ are measure preserving transformations on measure spaces $(X, \mathcal{A}, \mu)$ and $(Y, \mathcal{B}, \nu)$ then define $T \times S$ by $(T \times S)(x, y) = (T(x), S(y))$, which is measure preserving on the product space $(X \times Y, \mathcal{A} \otimes \mathcal{B}, \mu \times \nu)$.
If $T$ is ergodic then $T \times T$ is not necessarily ergodic. A simple example is to take $X = \{0, 1, 2\}$, $\mu(\{x\}) = 1/3$, $T(x) = x + 1$ (mod 3). Then $D = \{(x, x) : x = 0, 1, 2\}$ is a non-trivial invariant set w.r.t. $T \times T$.
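This three point example is small enough to check by direct enumeration. The sketch below is only an illustration; the variable and function names are chosen for the example and are not from the text.

```python
from fractions import Fraction

X = (0, 1, 2)


def T(x):
    """Rotation by one step on Z/3Z."""
    return (x + 1) % 3


def TxT(p):
    """The product map (T x T)(x, y) = (T x, T y)."""
    return (T(p[0]), T(p[1]))


D = {(x, x) for x in X}                   # the diagonal
image = {TxT(p) for p in D}
measure = Fraction(len(D), len(X) ** 2)   # product of uniform measures

print(image == D, measure)  # True 1/3
```

Since $T \times T$ is a bijection of a finite set, the diagonal being mapped onto itself is the same as $(T \times T)^{-1}(D) = D$, and its measure $1/3$ is neither 0 nor 1.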
Proposition 3.3. $T$ is weak mixing on the space $(X, \mathcal{A}, \mu)$ if and only if $T \times T$ is weak mixing on the product space $(X \times X, \mathcal{A} \otimes \mathcal{A}, \mu \times \mu)$.
Proof. Suppose $T$ is weak mixing. If $A = C \times D$, $B = E \times F$ and $S = T \times T$, then $(\mu \times \mu)(A \cap S^{-n}B) = \mu(C \cap T^{-n}E) \, \mu(D \cap T^{-n}F)$. For fixed $\epsilon > 0$ the set of natural numbers $n$ for which $|\mu(C \cap T^{-n}E) - \mu(C)\mu(E)| > \epsilon$, as well as the set for which $|\mu(D \cap T^{-n}F) - \mu(D)\mu(F)| > \epsilon$, has natural density 0. Since the union of two sets of natural density 0 is also of natural density 0, (3.1) holds for the above sets $A$ and $B$. This extends immediately to the case when $A$ and $B$ are finite disjoint unions of rectangular sets, and finally by approximation to any pair $A$ and $B$ in the product $\sigma$-algebra.
The other direction follows by taking $A' = A \times X$, $B' = B \times X$.
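Theorem 3.2. (Weak Mixing of order k)
Let $(X, \mathcal{A}, \mu, T)$ be a measure preserving system, and assume that $T$ is weak mixing. Then for $k \in \mathbb{N}$ and for $f_1, f_2, \ldots, f_k \in L^\infty(X, \mathcal{A}, \mu)$ one has
\[ \lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \Big| \int_X f_1(T^n x) f_2(T^{2n} x) \ldots f_k(T^{kn} x) \, d\mu(x) - \Big( \int_X f_1 \, d\mu \Big) \Big( \int_X f_2 \, d\mu \Big) \ldots \Big( \int_X f_k \, d\mu \Big) \Big| = 0. \tag{3.14} \]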
Proof. It follows from Proposition 3.2 that, for a bounded sequence of numbers $a_n$, $\frac{1}{N} \sum_{0 \le n < N} |a_n| \to 0$ if and only if $\frac{1}{N} \sum_{0 \le n < N} |a_n|^2 \to 0$. Thus writing $c_i = \int f_i \, d\mu$ and doing the decomposition $f_i = g_i + c_i$ as before, it is enough to show that
\[ \frac{1}{N} \sum_{n=0}^{N-1} \Big| \int_X f_1(T^n x) f_2(T^{2n} x) \ldots f_k(T^{kn} x) \, d\mu(x) \Big|^2 \to 0 \tag{3.15} \]
as $N \to \infty$ when at least one $c_i = 0$. Expanding the square of the expression in (3.15) one obtains
\[ \frac{1}{N} \sum_{n=0}^{N-1} \int_{X \times X} \big( f_1(T^n x) f_1(T^n y) \big) \big( f_2(T^{2n} x) f_2(T^{2n} y) \big) \ldots \big( f_k(T^{kn} x) f_k(T^{kn} y) \big) \, d\mu(x) \, d\mu(y) \to 0, \]
which follows from Theorem 3.1 and the facts that $T \times T$ is weak mixing and
\[ \int_{X \times X} f_i(x) f_i(y) \, d\mu(x) \, d\mu(y) = c_i^2 = 0 \]
for at least one $1 \le i \le k$.
This is an amazing property of weak mixing transformations; in fact it implies the following strong form of multiple recurrence.
Corollary 3.3. Let $f_i = 1_{A_i}$ be the indicator functions of the sets $A_i$ ($1 \le i \le k$) of positive measure. Then for every $\epsilon > 0$ the set of natural numbers $n$ for which
\[ \big| \mu(T^{-n}A_1 \cap T^{-2n}A_2 \cap \ldots \cap T^{-kn}A_k) - \mu(A_1)\mu(A_2) \ldots \mu(A_k) \big| < \epsilon \tag{3.16} \]
is of natural density 1.
4. Roth Theorem.
We prove double recurrence for ergodic systems below. This extends to arbitrary measure preserving systems by using a general structural theorem which states that any system can be decomposed into ergodic components. Roth's theorem then follows from the Furstenberg correspondence principle discussed earlier.
We'll give the full proofs for the ergodic case, and only sketch the extension as it is quite standard (but long and technical). The key idea, due to von Neumann and Koopman, is to decompose the system into a weak mixing and a compact part, and establish multiple recurrence in both cases.
Definition 4.1. An (invertible) measure preserving system $(X, \mathcal{A}, \mu, T)$ is called compact if $\{U_T^n f : n \in \mathbb{Z}\}$ is pre-compact in $H = L^2(X, \mathcal{A}, \mu)$ for every $f \in H$.
Recall that a set $S \subseteq H$ is pre-compact or totally bounded if $S$ can be covered by finitely many balls of radius $\epsilon$, for every $\epsilon > 0$.
Proposition 4.1. Suppose the system $(X, \mathcal{A}, \mu, T)$ is compact. Then for every $k \in \mathbb{N}$, and every $A \in \mathcal{A}$ such that $\mu(A) > 0$, one has
\[ \liminf_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} \mu(A \cap T^{-n}A \cap \ldots \cap T^{-kn}A) > 0. \tag{4.1} \]
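Proof. Write $U = U_T$, let $\epsilon = \sqrt{\mu(A)}/(2k)$ and cover the set $\{U^n 1_A : n \in \mathbb{Z}\}$ by balls $B_1, \ldots, B_m$ of radius $\epsilon$. We can think of coloring the integer $n$ with color $i$ if $U^n 1_A \in B_i$. If $N > 2m$ then there is a monochromatic set $S_N \subseteq [1, N]$ such that $|S_N| \ge N/m$. Let $h > l$ both be in $S_N$. Then letting $n = h - l$, and using that $U$ is unitary and $\mu(T^{-n}A) = \mu(A)$, one has
\[ \mu(A \setminus T^{-n}A) = \mu(A \triangle T^{-n}A)/2 = \|U^h 1_A - U^l 1_A\|_2^2 / 2 \le 2\epsilon^2 = \frac{\mu(A)}{2k^2}. \]
Since $A \setminus T^{-in}A \subseteq \bigcup_{j=0}^{i-1} T^{-jn}(A \setminus T^{-n}A)$, by induction on $i = 1, \ldots, k$ one has $\mu(A \setminus T^{-in}A) \le \frac{i \mu(A)}{2k^2}$, thus
\[ \mu(A \cap T^{-n}A \cap \ldots \cap T^{-kn}A) = \mu\Big( A \setminus \bigcup_{i=1}^{k} (A \setminus T^{-in}A) \Big) \ge \mu(A) - \sum_{i=1}^{k} \frac{i \mu(A)}{2k^2} = \mu(A) \Big( 1 - \frac{k+1}{4k} \Big) \ge \frac{\mu(A)}{2}. \]
Taking $l = \min S_N$ and letting $h$ range over the other elements of $S_N$, and since $|S_N| \ge N/m$, the number of such $n$'s in $[1, N]$ is at least $N/m - 1 > N/2m$. Thus the expression on the left side of (4.1) is at least $\mu(A)/4m > 0$ for all $N > 2m$. This proves the Proposition.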
A typical system is neither weak mixing nor compact. However $L^2(X, \mathcal{A}, \mu)$ has a compact portion spanned by the eigenfunctions of $U_T$ and a weak mixing portion corresponding to the continuous spectrum of the operator $U_T$. In fact this decomposition is most transparent using the spectral theorem (and the above spectral characterizations), but we will obtain it by mostly elementary means. The only tool from functional analysis we use is the compactness of integral operators with $L^2$ kernels, to be described below.
Let $K \in L^2(X \times X, \mathcal{A} \otimes \mathcal{A}, \mu \times \mu)$ and define the operator $\mathcal{K} : L^2(X, \mathcal{A}, \mu) \to L^2(X, \mathcal{A}, \mu)$ by
\[ \mathcal{K}f(x) = \int_X K(x, y) f(y) \, d\mu(y). \]
In fact one defines this operator first for simple functions, and extends it to a bounded linear operator by continuity, using the Cauchy-Schwarz inequality. By approximating the kernel $K(x, y)$ by simple functions $K_\epsilon(x, y) = \sum_{i,j=1}^{p,q} a_{ij} 1_{A_i}(x) 1_{B_j}(y)$, one approximates the operator $\mathcal{K}$ by ones which have finite dimensional range, thus $\mathcal{K}$ maps the unit ball into a totally bounded set. This shows that $\mathcal{K}$ is a compact operator (see the details in the exercises).
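Theorem 4.1. (Koopman - von Neumann) Let $(X, \mathcal{A}, \mu, T)$ be an invertible measure preserving system. Put
\[ H_c = \{ f \in L^2(X, \mathcal{A}, \mu) : \{U_T^n f : n \in \mathbb{Z}\} \text{ is pre-compact} \} \tag{4.2} \]
and let
\[ H_{wm} = \Big\{ g \in L^2(X, \mathcal{A}, \mu) : \lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \Big| \int_X f \, (U_T^n g) \, d\mu \Big| = 0 \text{ for all } f \in L^2(X, \mathcal{A}, \mu) \Big\}. \tag{4.3} \]
Then $L^2(X, \mathcal{A}, \mu) = H_c \oplus H_{wm}$, in fact $H_{wm} = H_c^{\perp}$.
To start we show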
Proposition 4.2. Let $K \in L^2(X \times X, \mathcal{A} \otimes \mathcal{A}, \mu \times \mu)$ be a $T \times T$ invariant kernel (i.e. $K(Tx, Ty) = K(x, y)$ for a.e. $x, y$). Then $\mathcal{K}f \in H_c$ for every $f \in L^2(X, \mathcal{A}, \mu)$.
Proof. One may assume $\|f\|_2 \le 1$, and let $U = U_T$. Then for $n \in \mathbb{Z}$
\[ U^n(\mathcal{K}f)(x) = \mathcal{K}f(T^n x) = \int_X K(T^n x, y) f(y) \, d\mu(y) = \int_X K(T^n x, T^n z) f(T^n z) \, d\mu(z) = \int_X K(x, z) f(T^n z) \, d\mu(z) = \mathcal{K}(U^n f)(x). \]
This shows that the orbit $\{U^n(\mathcal{K}f) : n \in \mathbb{Z}\}$ is contained in the image of the unit ball under the map $\mathcal{K}$ and hence is totally bounded.
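Proof. (Koopman - von Neumann) First we show that $H_c \cap H_{wm} = \{0\}$. Indeed, assume that $f \neq 0$ and $f \in H_c$. Let $0 < \epsilon < \frac{1}{2}\|f\|_2$. By the pre-compactness of the orbit $\{U^n f : n \in \mathbb{Z}\}$, there exist $g_1, \ldots, g_m$ such that for every $n$ there is an $1 \le i \le m$ for which
\[ \|U^n f - g_i\|_2 \le \epsilon, \quad \text{hence} \quad |\langle U^n f, g_i \rangle| \ge \frac{1}{2}\|U^n f\|_2^2 = \frac{1}{2}\|f\|_2^2. \]
Thus for all $n$
\[ \sum_{i=1}^{m} |\langle U^n f, g_i \rangle| \ge \|f\|_2^2 / 2, \]
which implies that
\[ \lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} |\langle U^n f, g_i \rangle| = 0 \]
cannot happen for all $1 \le i \le m$, so $f \notin H_{wm}$.
Next, we prove that $H_c^{\perp} \subseteq H_{wm}$. Let $f \in H_c^{\perp}$; then $\langle f, \mathcal{K}g \rangle = 0$ if $K \in L^2(\mu \times \mu)$ is $T \times T$-invariant and $g \in L^2(\mu)$, by Proposition 4.2. Choose $g = f$; then, writing $(f \otimes \bar{f})(x, y) = f(x)\overline{f(y)}$,
\[ \langle \mathcal{K}f, f \rangle = \langle K, f \otimes \bar{f} \rangle = 0 \quad \text{for all } T \times T\text{-invariant } K \in L^2(\mu \times \mu). \]
Thus by the von Neumann Ergodic Theorem, applied to $T \times T$, one has for every $g \in L^2(\mu)$
\[ 0 = \lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} \langle (U \otimes U)^n (f \otimes \bar{f}), g \otimes \bar{g} \rangle = \lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} |\langle U^n f, g \rangle|^2, \]
which shows that $f$ satisfies (4.3), so $f \in H_{wm}$, and this finishes the proof of the theorem.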
This decomposition result, combined with the multiple recurrence properties of weakly mixing transformations, quickly yields a proof of Roth's theorem for ergodic transformations.
Theorem 4.2. Let $(X, \mathcal{A}, \mu, T)$ be an ergodic invertible measure preserving system, and let $A \in \mathcal{A}$ be such that $\mu(A) > 0$. Then
\[ \liminf_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} \mu(A \cap T^{-n}A \cap T^{-2n}A) > 0. \]
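Proof. Write $1_A = f + g$, where $f \in H_c$ and $g \in H_{wm}$. One has
\[ \frac{1}{N} \sum_{n=1}^{N} \mu(A \cap T^{-n}A \cap T^{-2n}A) = \frac{1}{N} \sum_{n=1}^{N} \int (f+g) \, U^n(f+g) \, U^{2n}(f+g) \, d\mu. \tag{4.4} \]
Expanding the product we get 8 terms. A slight modification of the proof of Proposition 4.1 shows that
\[ \liminf_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} \int_X f \, (U^n f) \, (U^{2n} f) \, d\mu > 0 \tag{4.5} \]
if $f \in H_c$, $f \ge 0$ pointwise and $\int_X f \, d\mu > 0$. It is clear that $1_A \notin H_{wm}$, as (4.3) cannot hold for the functions $1_A$ and $1_X$, hence $f \neq 0$. Moreover $\int_X f \, d\mu = \mu(A) > 0$, since the constant functions lie in $H_c$ and $g \perp H_c$. Since $\|f - h\| \ge \|f^+ - h^+\|$, where $f^+$ denotes the positive part of the function $f$, it is easy to see that $f \ge 0$ pointwise (and similarly $f \le 1$), $f$ being the closest function to $1_A$ in $H_c$ in the $L^2$-norm.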
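We argue now that the contributions of all the 7 other terms in (4.4) tend to zero. First we show that
\[ \frac{1}{N} \sum_{n=1}^{N} (U^n f)(U^{2n} g), \quad \frac{1}{N} \sum_{n=1}^{N} (U^n g)(U^{2n} f), \quad \frac{1}{N} \sum_{n=1}^{N} (U^n g)(U^{2n} g) \]
all converge to zero in norm. This will eliminate 6 of the 7 remaining terms. Since the proofs are similar we will handle just the first. It is again based on van der Corput's lemma. Indeed, let $x_n = U^n f \, U^{2n} g$. Then
\[ \lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} \langle x_n, x_{n+h} \rangle = \lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} \int_X (U^n f)(U^{2n} g)(U^{n+h} f)(U^{2n+2h} g) \, d\mu \]
\[ = \lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} \int_X (f \, U^h f) \, U^n(g \, U^{2h} g) \, d\mu = \Big( \int f \, (U^h f) \, d\mu \Big) \Big( \int g \, (U^{2h} g) \, d\mu \Big), \]
where in the last line we used the ergodicity of $T$. Since $g \in H_{wm}$ and $|f| \le 1$:
\[ \lim_{H \to \infty} \frac{1}{H} \sum_{h=0}^{H-1} \Big| \Big( \int f \, U^h f \, d\mu \Big) \Big( \int g \, U^{2h} g \, d\mu \Big) \Big| \le \lim_{H \to \infty} \frac{1}{H} \sum_{h=0}^{H-1} \Big| \int g \, U^{2h} g \, d\mu \Big| = 0. \]
For the last remaining term we note that, as $U$ preserves integrals,
\[ \int g \, (U^n f)(U^{2n} f) \, d\mu = \int f \, (U^{-n} f)(U^{-2n} g) \, d\mu, \]
and that $\frac{1}{N} \sum_{n=1}^{N} (U^{-n} f)(U^{-2n} g) \to 0$ in norm by the same argument as above applied to $T^{-1}$, and the theorem is proved.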