
Introduction to Smooth Ergodic Theory

Stefano Luzzatto
Abdus Salam International Centre for Theoretical Physics,
[email protected] http://www.ictp.it/~luzzatto
DRAFT: version May 15, 2020

Contents
1 Introduction 3
1.1 Dynamical Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Topological structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3 Fundamental examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.4 Probabilistic structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

2 Invariant Measures 12
2.1 Poincaré Recurrence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.2 Convergence of time averages . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.3 Existence of invariant measures . . . . . . . . . . . . . . . . . . . . . . . . 15
2.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

3 Ergodic Measures 19
3.1 Birkhoff’s Ergodic Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.2 Basins of attraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.3 Ergodic decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

4 Unique Ergodicity 23
4.1 Uniform convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
4.2 Circle rotations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
4.3 Benford’s distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

5 Full Branch Maps 28


5.1 Invariance and Ergodicity of Lebesgue measure . . . . . . . . . . . . . . . 28
5.2 Normal numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
5.3 Uncountably many non-atomic ergodic measures . . . . . . . . . . . . . . . 31

6 Distortion 35
6.1 The Gauss map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
6.2 Bounded distortion implies ergodicity . . . . . . . . . . . . . . . . . . . . . 36
6.3 Sufficient conditions for bounded distortion . . . . . . . . . . . . . . . . . . 38

7 Physical measures 40

8 Inducing 43
8.1 Spreading the measure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
8.2 Intermittency maps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
8.3 Lyapunov exponents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

9 The quadratic family 51


9.1 The Ulam-Von Neumann map . . . . . . . . . . . . . . . . . . . . . . . . . 51
9.2 The Quadratic Family . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54

10 Mixing Measures 55
10.1 Mixing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
10.2 Decay of correlations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57

A Review of measure theory 59


A.1 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
A.2 Basic motivation: Positive measure Cantor sets . . . . . . . . . . . . . . . 59
A.3 Non-measurable sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
A.4 Algebras and sigma-algebras . . . . . . . . . . . . . . . . . . . . . . . . . . 60
A.5 Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
A.6 Integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
A.7 Lebesgue density theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
A.8 Absolutely continuous and singular measures . . . . . . . . . . . . . . . . . 63

1 Introduction
1.1 Dynamical Systems
Let M be a set. A transformation of the space M is simply a map f : M → M . The
transformation f is said to be invertible if f is a bijection, in which case its inverse f −1 is
defined and satisfies f ◦ f −1 = f −1 ◦ f = Identity. Any two transformations f, g : M → M can
be applied in sequence to give the composition g ◦ f : M → M defined by g ◦ f (x) = g(f (x)).

Definition 1. An invertible Dynamical System on M is a group of transformations of
the space M under composition. A non-invertible Dynamical System is a semi-group of
transformations of the space M under composition.

Example 1. The simplest abstract example of a Dynamical System is given by an arbitrary
map f : M → M . We can then define the family of maps {f n }n∈N , where f 0 = Identity
and f n is the n-th iterate of the map f , defined inductively by the relation f n = f ◦ f n−1 .
It is easy to see that this family forms a semi-group of transformations of M since it is
closed under composition, f n+m = f n ◦ f m , and f 0 is the identity or neutral element of
the semi-group. If f is invertible, then the inverse f −1 is defined and so are its iterates
f −n = f −1 ◦ f −(n−1) , and therefore we can define the family {f n }n∈Z , which is easily seen
to be a group under composition. These dynamical systems are sometimes referred to as
dynamical systems with discrete time because they are parametrised by the sets N and Z,
which are discrete.
Remark 1. Other semi-groups and groups of transformations arise naturally in different
settings. For example, the flow defined by an Ordinary Differential Equation is a family
{ϕt }t∈R which forms a group under composition and is therefore an example of a family of
transformations in continuous time. Even more general families can be studied, for example
with complex time, parametrised by C, or with time parametrised by more abstract groups.
In these notes we will focus on dynamical systems in discrete time, since there are some
very explicit examples and they already contain an extremely rich variety of structures.
In the study of Dynamical Systems we often think of M as a “phase space” of possible
states of the system, and of the dynamical system as the “evolution” of the system in time.
The most basic and fundamental notion in the theory of dynamical systems is that of the
orbit or trajectory of a point, or initial condition, x0 under the action of the system.

Definition 2. For any x ∈ M we define the forward orbit or forward trajectory of x by

O+ (x) := {f n (x)}n∈N

If the dynamical system is invertible we define the full orbit or full trajectory by

O(x) := {f n (x)}n∈Z .

The notion of an orbit formalises the evolution of an “initial condition” x ∈ M . Notice
that the orbit of x is just the sequence of points x0 = x, x1 = f (x), x2 := f 2 (x) =
f (f (x)) and, in general, xn := f n (x) = f ◦ · · · ◦ f (x), given by the n-th composition
of the map f with itself.
Remark 2. If f is not invertible, the inverse f −1 is not well defined as a map, though for
any n ≥ 1 we can still define the sets

f −n (x) := {y ∈ M : f n (y) = x}

and call this set the n-th preimage of the point x.
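These definitions can be made concrete in a short numerical sketch (the helper names are illustrative, not from the text): forward orbits are computed by iterating the map, while for a non-invertible map such as the doubling map f (x) = 2x mod 1 (an instance of the expanding maps of Section 1.3.3) the n-th preimage of a point is a finite set.

```python
# Numerical sketch (illustrative names, not from the text): forward orbits
# and preimage sets for the doubling map f(x) = 2x mod 1 on [0, 1).

def f(x):
    """The doubling map on the circle R/Z."""
    return (2 * x) % 1.0

def forward_orbit(g, x, n):
    """The finite orbit segment {x, g(x), ..., g^(n-1)(x)}."""
    orbit = [x]
    for _ in range(n - 1):
        orbit.append(g(orbit[-1]))
    return orbit

def preimage(x, n=1):
    """The n-th preimage set f^(-n)(x) of the doubling map: 2^n points."""
    return sorted((x + k) / 2 ** n for k in range(2 ** n))

print(forward_orbit(f, 0.2, 5))  # the orbit of 0.2 is periodic with period 4
print(preimage(0.0, 2))          # the 4 points y with f^2(y) = 0
```

Note that each point here has two preimages under f , so f is not invertible and only forward orbits are well defined as sequences.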

1.2 Topological structures


One of the main goals of the theory of Dynamical Systems can be formulated as the
description and classification of the structures associated to dynamical systems and in
particular the study of the orbits of dynamical systems. The very simplest, and perhaps
most important, kind of orbit is the following.

Definition 3. Let f : M → M be a map. x ∈ M is a fixed point for f if

f (x) = x.

If x is a fixed point for f , then it is easy to see that it is a fixed point for every forward
iterate of f and therefore a fixed point for the dynamical system generated by f . In this
case then the forward orbit reduces to the point x, i.e. O+ (x) = {x}.

Definition 4. x ∈ M is a periodic point for f if there exists k > 0 such that

f k (x) = x.

The minimal k > 0 for which the above condition holds is called the minimal period of x.

If x is a periodic point with minimal period k then the forward orbit of the point x is
just the finite set O+ (x) = {x, f (x), ..., f k−1 (x)}.
Remark 3. Notice that a fixed point is just a special case of a periodic orbit with k = 1
and that any periodic orbit with period k is also a periodic orbit with period any multiple
of k.
Fixed and periodic orbits are very natural structures and a first approach to the study
of dynamical systems is to study the existence of fixed and periodic orbits. Such orbits
however generally do not exhaust all the possible structures in the system and we need
some more sophisticated tools and concepts. If the orbit of x is not periodic, then O+ (x)
is a countably infinite set and we need some additional structure on M to describe it. If
M is a topological space then we can talk about the accumulation points of the orbit of x,
which describe in some sense the “asymptotic” behaviour of the orbit of x.

Definition 5. The omega-limit set of a point x ∈ M is

ω(x) := {y : f nj (x) → y for some sequence nj → ∞}.

If f is invertible, the alpha-limit set of a point x ∈ M is

α(x) := {y : f −nj (x) → y for some sequence nj → ∞}.

The case in which ω(x) is the whole space is a special and quite important situation.
Thus we make the following definitions.

Definition 6. Let M be a topological space and f : M → M a map. We say that the
orbit of x is dense in M if ω(x) = M . We say that f is transitive if there exists a point
x ∈ M with a dense orbit. We say that f is minimal if every point x ∈ M has a dense
orbit.

1.3 Fundamental examples


We give here three basic examples which we will study in more detail during this course.

1.3.1 Contracting maps


Let M be a metric space with metric d(·, ·).

Definition 7. A map f : M → M is a contraction if there exists λ ∈ [0, 1) such that

d(f (x), f (y)) ≤ λd(x, y)

for all x, y ∈ M .

For contractions we have the following well known result which we formulate here using
the notions introduced above.

Proposition 1.1 (Contraction Mapping Theorem). Let M be a complete metric space and
f : M → M a contraction. Then there exists a unique fixed point p ∈ M and ω(x) = {p}
for all x ∈ M .

Sketch of proof. Let x ∈ M be an arbitrary point.

1. The forward orbit O+ (x) = {f n (x)}n∈N of x forms a Cauchy sequence.

2. Any accumulation point of O+ (x) is a fixed point.

3. A contraction can have at most one fixed point.

Therefore every orbit converges to a fixed point and since this point is unique all orbits
converge to the same fixed point.
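The convergence asserted by the theorem can be watched numerically. The sketch below (helper names are assumptions, not from the text) iterates the map f (x) = cos x, which is a contraction on a suitable invariant interval of R, from two different initial conditions; both orbits converge to the same fixed point.

```python
# Numerical illustration of the Contraction Mapping Theorem: iterating
# f(x) = cos(x), a contraction on an invariant interval around its fixed
# point, from two initial conditions. Names are illustrative assumptions.
import math

def iterate_to_fixed_point(g, x, tol=1e-12, max_iter=10_000):
    """Follow the forward orbit of x until successive iterates differ by < tol."""
    for _ in range(max_iter):
        gx = g(x)
        if abs(gx - x) < tol:
            return gx
        x = gx
    return x

p1 = iterate_to_fixed_point(math.cos, 0.0)
p2 = iterate_to_fixed_point(math.cos, 1.4)
print(p1, p2)  # both orbits reach the same fixed point p, with cos(p) = p
```

The contraction rate λ controls the speed of convergence: the distance to the fixed point shrinks at least geometrically, by a factor λ per iterate.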

This result therefore completely describes the asymptotic behaviour of all initial condi-
tions from a topological point of view. The fixed point p has the property that it attracts
all points in the space. In more general situations we may have the existence of fixed
points which do not attract every point but only some points in the space. We formalise
this notion as follows.
Definition 8. Let p be a fixed point. The (topological) basin of attraction of p is Bp := {x :
ω(x) = {p}}. The point p is a locally attracting fixed point if Bp contains a neighbourhood
of p and a globally attracting fixed point if Bp is the whole space.
Remark 4. The notion of an attracting fixed point can be generalized to periodic points.
If p = p0 is a periodic point of prime period n with periodic orbit P = {p0 , p1 , ..., pn−1 } we
can define the (topological) basin of attraction as BP = {x : ω(x) = P } and say that the
orbit is attracting if BP contains a neighbourhood U of P . Notice that in this case, the
basin will contain neighbourhoods Ui of each point pi ∈ P made of points whose forward
orbits converge to the forward orbit of pi .

1.3.2 Circle rotations


Let S1 = R/Z denote the unit circle. For any α ∈ R we define the map fα : S1 → S1 by
fα (x) = x + α.
Then the iterates of fα have the form
fαn (x) = x + nα
and the dynamics of fα depends very much on the parameter α.
Proposition 1.2. The dynamics of fα exhibits the following dichotomy.
1. α is rational if and only if every orbit is periodic;
2. α is irrational if and only if every orbit is dense.
Sketch of proof. 1. If α = p/q is rational we have fαq (x) = x + qα = x + p = x mod 1
and so every point is periodic (of the same period q). Conversely, if x is periodic,
there exists some q ≥ 1 such that fαq (x) = x + qα = x mod 1, so qα = 0 mod 1 and
α must be of the form p/q for some integer p.
2. If there exists even one point x with dense orbit, then that orbit is not periodic, so
α cannot be rational and so must be irrational. Conversely, suppose α is irrational
and let x be an arbitrary point. To show that the orbit of x is dense, carry out the
following steps.
(a) Let ε > 0 and cover the circle S1 with a finite number of arcs of length ≤ ε. Since
α is irrational, the orbit of x cannot be periodic and so is infinite. Therefore there
must be at least one such arc which contains at least two points fαm (x), fαn (x) of
the orbit of x.

(b) The map fαn−m is a circle rotation fδ for some δ ≤ ε.
(c) The orbit of x under fδ is therefore ε-dense.
(d) Every point on the orbit of x under fδ is also on the orbit of x under fα and
therefore the orbit of x under fα is also ε-dense.
(e) ε is arbitrary and so the orbit of x is dense.
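Steps (a)–(e) can also be checked numerically. The following sketch (an illustration under the stated assumptions, not part of the proof) computes a long orbit segment of the rotation by α = √2 mod 1 and measures the largest gap the segment leaves on the circle; the gap is small, so the segment is already ε-dense for a small ε.

```python
# Numerical check that an irrational rotation orbit fills the circle: the
# largest gap left by a finite orbit segment is small. Illustrative only.
import math

alpha = math.sqrt(2) % 1.0  # an irrational rotation number

def rotation_orbit(x, n):
    """The first n points of the orbit of x under f(y) = y + alpha mod 1."""
    return [(x + k * alpha) % 1.0 for k in range(n)]

orbit = sorted(rotation_orbit(0.0, 2000))
# gaps between consecutive orbit points, wrapping around the circle
gaps = [b - a for a, b in zip(orbit, orbit[1:])] + [1.0 - orbit[-1] + orbit[0]]
print(max(gaps))  # the orbit segment is max(gaps)-dense in S^1
```

Taking longer and longer orbit segments drives the largest gap to zero, which is exactly the statement that the full orbit is dense.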

1.3.3 Expanding maps


Let κ ∈ N, κ ≥ 2 and define fκ : S1 → S1 by

fκ (x) = κx.

For simplicity we consider the case κ = 10 and let f = f10 because the dynamics can then
be studied very explicitly using the decimal representation of real numbers. The other
cases all have very similar dynamics as can be seen by using the base κ representation of
real numbers. Several properties of these maps are established in the exercises.
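For f = f10 , the dynamics is the shift on decimal digits: multiplying by 10 moves the decimal point one place to the right, and taking mod 1 deletes the leading digit. A minimal sketch (the helper names are assumptions, and floating point limits the computation to the first several digits):

```python
# Sketch: f(x) = 10x mod 1 acts as the shift on the decimal digits of x.
# Floating point limits this to the first several digits; illustrative only.

def f(x):
    return (10 * x) % 1.0

def digits(x, n):
    """The first n decimal digits of x in [0, 1), read off by iterating f."""
    out = []
    for _ in range(n):
        out.append(int(10 * x))  # the current leading digit
        x = f(x)
    return out

x = 0.1234567
print(digits(x, 5))     # [1, 2, 3, 4, 5]
print(digits(f(x), 4))  # [2, 3, 4, 5]: applying f deletes the first digit
```

This digit-shift picture is what makes fixed, periodic and dense orbits of f10 so explicit in the exercises below.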

1.4 Probabilistic structures


The topological description of the dynamics described above provides a lot of useful and
interesting information but in some cases also misses certain key features of the systems
under consideration. More specifically, except for the special case of fixed or periodic
orbits, it does not contain any information about the frequency with which a given orbit
visits specific regions of the space. We introduce here some concepts and terminology to
formalize this notion.

1.4.1 Probability measures


If M is a metric space then there is also a well defined Borel sigma-algebra and we let

M := {µ : µ is a Borel probability measure on M }.

Example 2. The simplest example of a Borel measure is the Dirac-δ measure δx at a point
x ∈ M , defined for any Borel measurable set A ⊆ M by

δx (A) = 1 if x ∈ A,   and   δx (A) = 0 if x ∉ A.

In particular M ≠ ∅ since it contains for example all
Dirac-δ measures. If M has some additional structure, such as that of a Riemannian man-
ifold, then it also contains the normalised volume which we generally refer to as Lebesgue
measure.

1.4.2 Time averages
If M is a metric space and f : M → M is a map, then for any x ∈ M we can define the
sequence of probability measures
µn (x) := (1/n) ∑_{i=0}^{n−1} δ_{f^i(x)} .   (1)

The measure µn (x) ∈ M is then just the uniform distribution of mass on the first n points
of the orbit of x. A natural question is whether this sequence converges, to what it converges,
and what the dynamical meaning of this convergence is. To study this question, recall
that by definition of the weak-star topology, µn → µ if and only if ∫ ϕ dµn → ∫ ϕ dµ for all
ϕ ∈ C 0 (M, R). In the particular case in which the sequence µn is given by the form above,
we have
∫ ϕ d( (1/n) ∑_{i=0}^{n−1} δ_{f^i(x)} ) = (1/n) ∑_{i=0}^{n−1} ∫ ϕ dδ_{f^i(x)} = (1/n) ∑_{i=0}^{n−1} ϕ(f^i(x)) = (1/n) ∑_{i=0}^{n−1} ϕ ◦ f^i(x).

Thus the convergence of the probability measures µn to a probability measure µ is equivalent
to the convergence of the terms

(1/n) ∑_{i=0}^{n−1} ϕ ◦ f^i(x)   (2)

to the average ∫ ϕ dµ for every continuous function ϕ. The terms (2) are sometimes
called the time averages of the function ϕ along the orbit of the point x, and the integral
∫ ϕ dµ is sometimes called the space average of ϕ with respect to µ; the convergence
of one to the other is therefore referred to as the time averages converging to the space
average.
The behaviour of time averages constitutes the main object of these notes and one of
the main objectives is to give conditions which guarantee their convergence.
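As a concrete numerical sketch of (2) (the rotation is chosen purely because it is numerically stable; the definitions apply to any map): for the circle rotation f (x) = x + α mod 1 with α irrational and the observable ϕ = indicator of [0, 1/2), the time averages approach the space average ∫ ϕ dLeb = 1/2.

```python
# Time averages (2) for the circle rotation f(x) = x + alpha mod 1 and the
# observable phi = indicator of [0, 1/2): they approach the space average
# 1/2. An illustrative sketch; the function names are not from the text.
import math

alpha = math.sqrt(2) % 1.0

def time_average(phi, x, n):
    """The n-th time average (1/n) sum_{i<n} phi(f^i(x)) along the orbit of x."""
    total = 0.0
    for _ in range(n):
        total += phi(x)
        x = (x + alpha) % 1.0
    return total / n

phi = lambda x: 1.0 if x < 0.5 else 0.0
for n in (100, 10_000, 100_000):
    print(n, time_average(phi, 0.1, n))  # tends to the space average 0.5
```

For this observable the time average is just the frequency with which the orbit visits [0, 1/2), which is exactly the kind of statistical information the topological description misses.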

1.4.3 Physical measures


If the time averages µn (x) along the orbit of a particular point x converge to some measure
µ, it is natural to ask whether there exists any other point y, not belonging to the orbit of
x, whose time averages also converge to the same measure µ. Thus we define the “basin
of attraction” of a probability measure.

Definition 9. For µ ∈ M let

Bµ := { x : (1/n) ∑_{i=0}^{n−1} δ_{f^i(x)} → µ } = { x : (1/n) ∑_{i=0}^{n−1} ϕ ◦ f^i(x) → ∫ ϕ dµ ∀ ϕ ∈ C 0 }.

It is easy to check that distinct measures must have disjoint basins (Exercise 13).
Moreover, any point for which the time averages converge clearly belongs to the basin of
some measure, thus one approach to the problem of studying time averages is to consider
measures in M and their basins. In general the basin of most measures in M will be
empty. In some cases, the basin of a measure may be just a single point (Example 4) and
in other cases it may be the whole space (Example 3). It is natural to be interested in
measures with “large” basins since these measures describe the asymptotic distribution of
orbits for a large set of points.

Definition 10. A probability measure µ ∈ M is called a physical measure if

Leb(Bµ ) > 0.

We can then ask the following questions.

Which dynamical systems admit a physical measure?

and

How many physical measures do they have?

It turns out that these are extremely challenging problems that have not yet been solved
in general. There are examples of systems which do not admit any physical measures
(Example 4) and examples of systems which admit countably many physical measures.
However it is generally believed that these are somehow “exceptional” cases.

Conjecture 1 (Palis conjecture). Most systems have a finite number of physical measures
such that the union of their basins has full Lebesgue measure.

In these notes we give an introduction to some of the results and techniques which have
been developed in this direction.

1.5 Exercises
1.5.1 Fixed and periodic points
Exercise 1. Let f be invertible and suppose that x is a fixed point. Show that x is a
fixed point for f −1 and conclude that O(x) = {x}. Show that if f is not invertible then
x ∈ f −1 (x) but f −1 (x) may also contain a point y ≠ x.
Exercise 2. Show that the forward orbit O+ (x) of a point x is a finite set if and only if x
is a periodic or pre-periodic point.

1.5.2 Omega limits


Exercise 3. Show that if x is a fixed point, then ω(x) = O+ (x) = {x} and, more generally,
if x is a periodic point of period k then ω(x) = O+ (x).
Exercise 4. If f is continuous and ω(x) = {p} then p is a fixed point.
Exercise 5. Show that if M is compact then ω(x) ≠ ∅ for all x ∈ M . Moreover ω(x) is
compact and forward invariant (y ∈ ω(x) implies f (y) ∈ ω(x)).

1.5.3 Piecewise expanding maps


Exercise 6. Show that f has periodic orbits of any given period. Show that f is
transitive.
Exercise 7. Consider the map f (x) = 10x mod 1. Characterize all fixed, periodic and
pre-periodic points of f in terms of their decimal representation. Show in particular that
periodic points are dense in [0, 1]. Show that f is transitive, i.e. it has dense orbit. Show
that it has an infinite number of distinct dense orbits. Show that it has orbits which are
neither (pre-)periodic nor dense.
Exercise 8. Consider the questions in the previous exercise for the map f (x) = 2x mod 1,
but this time using binary instead of decimal representation.
Exercise 9. Further generalize the previous two questions, establishing similar properties for
the map f (x) = κx mod 1 for an arbitrary positive integer κ ≥ 2, using the representation
of real numbers in base κ.

1.5.4 Convergence of time averages


Exercise 10. If x is a periodic point, then µn (x) converges to the uniform distribution of
Dirac delta measures on the points of the periodic orbit. In particular if x is a fixed point,
then µn (x) → δx .
Exercise 11. If ω(x) = {p} then µn (x) → δp . In particular, if f : M → M is a contraction
mapping on a complete metric space, then µn (x) → δp for every x ∈ M .
Exercise 12. Let f (x) = 10x mod 1.
1. Find a point x such that ω(x) = [0, 1] and µn (x) → δ0 .

2. Find a point x such that ω(x) = [0, 1] and µn (x) → (δ0 + δ1/3 )/2.

3. Find a point x such that µn (x) does not converge.

Exercise 13. Show that if µ, ν ∈ M and µ ≠ ν then Bµ ∩ Bν = ∅.


Example 3. If f : M → M is a contraction on a complete metric space with fixed point p,
then δp is the only measure which has non-empty basin and Bδp = M .
Example 4. If f (x) = x is the identity map, there are an uncountable number of measures
with non-empty basin: every Dirac-delta measure δx has non-empty basin Bδx = {x}.
In particular the identity map has no physical measure.

1.5.5 Harder questions


Exercise 14. * Show that the conclusions above in fact hold for any piecewise affine interval
map with a finite number of branches.

2 Invariant Measures
The purpose of this section is to introduce a subset of the space M of probability measures
on M consisting of measures with particular properties which have some non-trivial
implications for the dynamics. For many definitions and results we just need the set M to
be equipped with a sigma-algebra B and the map f : M → M to be measurable. However,
in some cases we will assume some additional structure and properties. We always let M
denote the space of probability measures defined on the sigma-algebra B and assume that
µ ∈ M.

Definition 11. µ is invariant if µ(f −1 (A)) = µ(A) for all A ∈ B.

Exercise 15. Show that if f is invertible then µ is invariant if and only if µ(f (A)) =
µ(A) for all A ∈ B. Find an example of a non-invertible map and a measure µ for which
the two conditions are not equivalent.
We give a few simple examples of invariant measures and then prove a result on the
existence of invariant measures and their dynamical implications.
Example 5. Let X be a measure space and f : X → X a measurable map. Suppose
f (p) = p. Then the Dirac measure δp is invariant (Exercise 16).
Example 6. An immediate generalization is the case of a measure concentrated on a periodic
orbit {p1 , . . . , pn } each of which carries some proportion ρ1 , . . . , ρn of the total mass, with
ρ1 + · · · + ρn = 1. Then, we can define a measure δP by letting
δP (A) := ∑_{i : pi ∈ A} ρi .   (3)

Then δP is invariant if and only if ρi = 1/n for every i = 1, . . . , n (Exercise 17).


Example 7. Let f (x) = x be the identity map. Then every probability measure is invariant.
Example 8. Let f (x) = x + α mod 1 be a circle rotation. Then Lebesgue measure is invariant,
since a circle rotation is essentially a translation and Lebesgue measure is translation-
invariant. If α is rational then every point is periodic and thus f also admits other invariant
measures, given by the Dirac-delta measures defined in (3) above. This example shows that
in general a map might admit many invariant measures.
Example 9. Let I = [0, 1], κ ≥ 2 an integer, and let f (x) = κx mod 1. Then Lebesgue
measure is invariant (Exercise 18). Notice that f also has an infinite number of periodic
orbits and thus also has an infinite number of invariant measures.
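The invariance of Lebesgue measure in Example 9 can be checked directly on intervals, which generate the Borel sigma-algebra: the preimage of A = [a, b) under f (x) = κx mod 1 consists of κ disjoint intervals, one per branch, each of length (b − a)/κ. A small sketch of this computation (the names are illustrative):

```python
# Check of Definition 11 on intervals for f(x) = kx mod 1: the preimage of
# A = [a, b) is k disjoint intervals of total length b - a, so
# Leb(f^{-1}(A)) = Leb(A). Illustrative sketch, not from the text.

def preimage_length(a, b, k):
    """Total Lebesgue measure of f^{-1}([a, b)) for f(x) = kx mod 1."""
    # the j-th branch inverse is x -> (x + j)/k, for j = 0, ..., k-1
    branches = [((a + j) / k, (b + j) / k) for j in range(k)]
    return sum(v - u for u, v in branches)

a, b = 0.2, 0.7
for k in (2, 3, 10):
    print(k, preimage_length(a, b, k))  # always b - a = 0.5
```

Notice that f expands each branch by the factor κ, and this expansion exactly compensates the fact that A has κ preimage intervals.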

2.1 Poincaré Recurrence


Historically, the first use of the notion of an invariant measure is due to Poincaré who
noticed the remarkable fact that it implies recurrence.

Theorem (Poincaré Recurrence Theorem, 1890). Let µ be an invariant probability measure
and A a measurable set with µ(A) > 0. Then for µ-a.e. point x ∈ A there exists τ > 0
such that f τ (x) ∈ A.
Proof. Let
A0 = {x ∈ A : f n (x) ∉ A for all n ≥ 1}.
Then it is sufficient to show that µ(A0 ) = 0. For every n ≥ 0, let An = f −n (A0 ) denote
the preimages of A0 . We claim that all these preimages are disjoint, i.e. An ∩ Am = ∅ for
all m, n ≥ 0 with m ≠ n. Indeed, suppose by contradiction that there exist n > m ≥ 0
and x ∈ An ∩ Am . This implies
f n (x) ∈ f n (An ∩ Am ) = f n (f −n (A0 ) ∩ f −m (A0 )) ⊆ A0 ∩ f n−m (A0 ).
But this implies A0 ∩ f n−m (A0 ) ≠ ∅, which contradicts the definition of A0 and proves
the disjointness of the sets An . From the invariance of the measure µ we have µ(An ) = µ(A0 )
for every n ≥ 1 and therefore

1 = µ(M ) ≥ µ( ∪_{n=1}^{∞} An ) = ∑_{n=1}^{∞} µ(An ) = ∑_{n=1}^{∞} µ(A0 ).

If we had µ(A0 ) > 0 the sum on the right hand side would be infinite, a contradiction,
and therefore we conclude that µ(A0 ) = 0.
Remark 5. It does not follow immediately from the theorem that every point of A returns
to A infinitely often. To show that almost every point of A returns to A infinitely often,
let A′′ = {x ∈ A : there exists n ≥ 1 such that f k (x) ∉ A for all k > n} denote the set
of points in A which return to A at most finitely many times. Again, we will show that
µ(A′′ ) = 0. First of all let A′′n = {x ∈ A : f n (x) ∈ A and f k (x) ∉ A for all k > n} denote
the set of points which return to A for the last time after exactly n iterations. Notice that
the A′′n are defined very differently from the An above. Then A′′ = A′′1 ∪ A′′2 ∪ A′′3 ∪ · · · = ∪_{n=1}^{∞} A′′n . It is
therefore sufficient to show that for each n ≥ 1 we have µ(A′′n ) = 0. To see this consider the
set f n (A′′n ). By definition this set is contained in A and consists of points which never return
to A. Therefore µ(f n (A′′n )) = 0. Moreover we clearly have A′′n ⊆ f −n (f n (A′′n ))
and therefore, using the invariance of the measure, we have µ(A′′n ) ≤ µ(f −n (f n (A′′n ))) =
µ(f n (A′′n )) = 0.
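Recurrence can also be observed directly in a simple simulation (an illustration, not part of the proof): take the rotation f (x) = x + α mod 1, which preserves Lebesgue measure, a small set A = [0, 0.05), and a point of A; its orbit keeps coming back to A.

```python
# Poincare recurrence observed for the Lebesgue-preserving rotation
# f(x) = x + alpha mod 1: the orbit of a point of A = [0, 0.05) returns to
# A again and again. The set A and the constants are illustrative choices.
import math

alpha = math.sqrt(2) % 1.0
x, t = 0.01, 0          # the initial condition x lies in A = [0, 0.05)
returns = []            # the first few return times tau with f^tau(x) in A
while len(returns) < 5:
    t += 1
    x = (x + alpha) % 1.0
    if x < 0.05:
        returns.append(t)
print(returns)
```

The theorem guarantees such returns for µ-almost every starting point in A, for any invariant probability measure µ with µ(A) > 0.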

2.2 Convergence of time averages


We now come to one of the fundamental results of the theory which also constitutes the
main motivation for the notion of an invariant measure.
Theorem 1 (Birkhoff, 1931). Let µ be an f -invariant measure and ϕ ∈ L1 (µ). Then

ϕf (x) := lim_{n→∞} (1/n) ∑_{i=0}^{n−1} ϕ ◦ f^i(x)

exists for µ-almost every x.

For ψ ∈ L1 (µ) let ψµ ≪ µ denote the measure which has density ψ with respect to µ,
i.e. ψ = d(ψµ)/dµ is the Radon-Nikodym derivative of ψµ with respect to µ. Now let

I := {A ∈ B : f −1 (A) = A}

be the collection of fully invariant sets in B and notice that I is a sub-σ-algebra. Let
ψµ|I and µ|I denote the restrictions of these measures to I. Then clearly ψµ|I ≪ µ|I and
therefore the Radon-Nikodym derivative

ψI := d(ψµ|I )/d(µ|I )

exists. This is also called the conditional expectation of ψ with respect to I. The proof of
Theorem 1 follows easily from the following key technical statement.
Lemma 2.1. Suppose ψI < 0 (resp. ψI > 0). Then

lim sup_{n→∞} (1/n) ∑_{i=0}^{n−1} ψ ◦ f^i(x) ≤ 0   (resp. lim inf_{n→∞} (1/n) ∑_{i=0}^{n−1} ψ ◦ f^i(x) ≥ 0)

for µ-almost every x.


Remark 6. The definition of ψI is somewhat abstract and it is not easy to have an intuition
for the difference between ψ and ψI . The main property which we will use is that ψI must
be constant along orbits, i.e. ψI ◦ f = ψI . This follows from the fact that the full orbit of every
point, O(x) = ∪_{n≥0} ∪_{k≥0} f −n (f k (x)), is an “indecomposable” element of the sigma-algebra
I and thus ψI cannot take different values at distinct points of O(x).
Proof of Theorem 1. For any ε > 0 let ψ ± := ϕ − ϕI ± ε. Since (ϕI )I = ϕI we have
ψI± = ±ε. Thus, by Lemma 2.1 we have

lim sup_{n→∞} (1/n) ∑_{i=0}^{n−1} ϕ ◦ f^i(x) − ϕI − ε = lim sup_{n→∞} (1/n) ∑_{i=0}^{n−1} ψ − ◦ f^i(x) ≤ 0

and

lim inf_{n→∞} (1/n) ∑_{i=0}^{n−1} ϕ ◦ f^i(x) − ϕI + ε = lim inf_{n→∞} (1/n) ∑_{i=0}^{n−1} ψ + ◦ f^i(x) ≥ 0

which imply, respectively,

lim sup_{n→∞} (1/n) ∑_{i=0}^{n−1} ϕ ◦ f^i(x) ≤ ϕI + ε   and   lim inf_{n→∞} (1/n) ∑_{i=0}^{n−1} ϕ ◦ f^i(x) ≥ ϕI − ε

for µ-almost every x. Since ε > 0 is arbitrary we get that the limit exists and

ϕf := lim_{n→∞} (1/n) ∑_{i=0}^{n−1} ϕ ◦ f^i = ϕI   (4)

for µ-almost every x.

Proof of Lemma 2.1. Let

Ψn := max_{k≤n} ∑_{i=0}^{k−1} ψ ◦ f^i   and   A := {x : Ψn (x) → ∞}.

Then, for x ∉ A, Ψn (x) is bounded above and therefore

lim sup_{n→∞} (1/n) ∑_{i=0}^{n−1} ψ ◦ f^i(x) ≤ lim sup_{n→∞} Ψn (x)/n ≤ 0.

So it is sufficient to show that µ(A) = 0. To see this, first compare the quantities

Ψn+1 = max_{1≤k≤n+1} ∑_{i=0}^{k−1} ψ ◦ f^i   and   Ψn ◦ f = max_{1≤k≤n} ∑_{i=0}^{k−1} ψ ◦ f^{i+1} = max_{1≤k≤n} ∑_{i=1}^{k} ψ ◦ f^i .

The two maxima are almost exactly the same except for the fact that Ψn+1 includes the
term ψ(x), and therefore we have

Ψn+1 = ψ + Ψn ◦ f if Ψn ◦ f > 0,   and   Ψn+1 = ψ if Ψn ◦ f ≤ 0.

We can write this as

Ψn+1 − Ψn ◦ f = ψ − min{0, Ψn ◦ f }.

Notice that A is forward and backward invariant, so in particular A ∈ I; moreover
Ψn ◦ f → ∞ on A and therefore Ψn+1 − Ψn ◦ f ↓ ψ on A. Therefore, using the
invariance of µ and the Dominated Convergence Theorem, we have

∫_A (Ψn+1 − Ψn ) dµ = ∫_A (Ψn+1 − Ψn ◦ f ) dµ → ∫_A ψ dµ = ∫_A ψI d(µ|I ).

By definition Ψn+1 ≥ Ψn , so the left hand side is ≥ 0 for every n, and therefore the limit
∫_A ψI d(µ|I ) ≥ 0. Thus if ψI < 0 this implies that µ(A) = µ|I (A) = 0. Replacing ψ by −ψ
and repeating the argument completes the proof.

2.3 Existence of invariant measures


We now prove a general result which gives conditions to guarantee that at least some
invariant measure exists. In fact we will take advantage of the topological structure on M
given by the weak-star topology and describe, for a certain class of systems, the structure
of the subset of invariant measures. Let

Mf := {µ ∈ M : µ is f -invariant}.

Then we have the following

Theorem 2 (Krylov-Bogolyubov Theorem). Suppose M is a compact metric space and
f : M → M is continuous. Then Mf is non-empty, convex1 and compact.

We start with a key definition and some related results.

Definition 12 (Push-forward of measures). Let f∗ : M → M be the map from the
space of probability measures to itself, defined by

f∗ µ(A) := µ(f −1 (A)). (5)

We call f∗ µ the push-forward of µ by f .

It can be checked that this map is well defined (Exercise 20). It follows immediately
from the definition that µ is invariant if and only if f∗ µ = µ, i.e. if µ is a fixed point of f∗ .
We cannot, however, apply any general fixed point result; rather, we will consider a sequence
in M and show that any limit point is invariant. For any µ ∈ M and any i ≥ 1 we also let

f∗i µ(A) := µ(f −i (A)).

We now prove some simple properties of the map f∗ .


Lemma 2.2. For all ϕ ∈ L1 (µ) we have ∫ ϕ d(f∗ µ) = ∫ ϕ ◦ f dµ.

Proof. First let ϕ = 1A be the characteristic function of some set A ⊆ X. Then


Z Z Z
−1
1A d(f∗ µ) = f∗ µ(A) = µ(f (A)) = 1f −1 (A) dµ = 1A ◦ f dµ.

The statement is therefore true for characteristic functions and thus follows for general
integrable functions by standard approximation arguments. More specifically, it follows
immediately that the result also holds if ϕ is a simple function (linear combination of
characteristic functions). For ϕ a non-negative integrable function, we use the fact that
every measurable function ϕ is the pointwise limit of a sequence ϕn of simple functions; if
ϕ is non-negative then the ϕn may be taken non-negative and the sequence {ϕn} may be taken
increasing. Then the sequence {ϕn ◦ f} is clearly also an increasing sequence of simple
functions, converging in this case to ϕ ◦ f. Therefore, by the definition of the Lebesgue integral,
we have ∫ ϕn d(f∗µ) → ∫ ϕ d(f∗µ) and ∫ ϕn ◦ f dµ → ∫ ϕ ◦ f dµ. Since we have already proved
the statement for simple functions we know that ∫ ϕn d(f∗µ) = ∫ ϕn ◦ f dµ for every n,
and therefore this gives the statement. For the general case we repeat the argument for the positive
and negative parts of ϕ as usual.
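On a finite probability space the change-of-variables identity of Lemma 2.2 can be checked exactly, with no approximation argument needed. The following sketch uses arbitrary illustrative choices of space, map, measure and observable, not anything from the text:

```python
from fractions import Fraction

# A probability measure on the finite space X = {0, 1, 2, 3}.
X = [0, 1, 2, 3]
mu = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 8), 3: Fraction(1, 8)}

f = {0: 1, 1: 2, 2: 0, 3: 0}        # a (non-invertible) map f : X -> X
phi = {0: 5, 1: -3, 2: 2, 3: 7}     # an observable phi : X -> R

# Push-forward: (f_* mu)({y}) = mu(f^{-1}({y})).
f_star_mu = {y: sum(mu[x] for x in X if f[x] == y) for y in X}

lhs = sum(phi[y] * f_star_mu[y] for y in X)   # int phi d(f_* mu)
rhs = sum(phi[f[x]] * mu[x] for x in X)       # int (phi o f) dmu
assert lhs == rhs                             # Lemma 2.2, exactly
```

On a finite space every function is simple, so the identity holds exactly for any other choice of f, µ and ϕ as well.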

Corollary 2.1. f∗ : M → M is continuous.


1
Recall that Mf is convex if given any µ0 , µ1 ∈ Mf , letting µt := tµ0 + (1 − t)µ1 for t ∈ [0, 1], then
µt ∈ Mf .

Proof. Consider a sequence µn → µ in M. Then, by Lemma 2.2, for any continuous
function ϕ : X → R we have
∫ ϕ d(f∗µn) = ∫ ϕ ◦ f dµn → ∫ ϕ ◦ f dµ = ∫ ϕ d(f∗µ),

which means exactly that f∗ µn → f∗ µ which is the definition of continuity.


Corollary 2.2. µ is invariant if and only if ∫ ϕ ◦ f dµ = ∫ ϕ dµ for every continuous ϕ : X → R.
Proof. Suppose first that µ is invariant; then the implication follows directly from Lemma
2.2. For the converse implication, we have that

∫ ϕ dµ = ∫ ϕ ◦ f dµ = ∫ ϕ d(f∗µ)

for every continuous function ϕ : X → R. By the Riesz Representation Theorem, measures


correspond to linear functionals and therefore this can be restated as saying that µ(ϕ) =
f∗ µ(ϕ) for all continuous functions ϕ : X → R, and therefore µ and f∗ µ must coincide,
which is the definition of µ being invariant.
Proof of Theorem 2. Recall first of all that the space M of probability measures can be
identified with the unit ball of the space of functionals C ∗ (M ) dual to the space C 0 (M, R)
of continuous functions on M . The weak-star topology is exactly the weak topology on
the dual space and therefore, by the Banach-Alaoglu Theorem, M is weak-star compact
if M is compact. Our strategy therefore is to use the dynamics to define a sequence of
probability measures in M and show that any limit measure of this sequence is necessarily
invariant. For an arbitrary µ0 ∈ M we define, for every n ≥ 1,
µn = (1/n) Σ_{i=0}^{n−1} f∗^i µ0. (6)

Since each f∗i µ0 is a probability measure, the same is also true for µn . By compactness
of M there exists a measure µ ∈ M and a subsequence nj → ∞ with µnj → µ. By
the continuity of f∗ we have f∗µnj → f∗µ, and therefore it is sufficient to show that also
f∗µnj → µ. We write

f∗µnj = f∗((1/nj) Σ_{i=0}^{nj−1} f∗^i µ0) = (1/nj) Σ_{i=0}^{nj−1} f∗^{i+1} µ0 = (1/nj) (Σ_{i=0}^{nj−1} f∗^i µ0 − µ0 + f∗^{nj} µ0) = µnj − µ0/nj + f∗^{nj} µ0/nj.
Since the last two terms tend to 0 as j → ∞ this implies that f∗ µnj → µ and thus f∗ µ = µ
which implies that µ ∈ Mf. The convexity is an easy exercise. To show compactness, suppose
that µn is a sequence in Mf converging to some µ ∈ M. Then, by Lemma 2.2 we have,
for any continuous function ϕ, that ∫ ϕ ◦ f dµ = lim_{n→∞} ∫ ϕ ◦ f dµn = lim_{n→∞} ∫ ϕ dµn =
∫ ϕ dµ. Therefore, by Corollary 2.2, µ is invariant and so µ ∈ Mf.
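The telescoping step in the proof is easy to see numerically: for the averaged push-forwards µn of (6) starting from a point mass δ_{x0}, the "invariance defect" |∫ ϕ ◦ f dµn − ∫ ϕ dµn| is at most 2 sup|ϕ|/n. A sketch in Python, with an irrational circle rotation and an observable chosen purely for illustration:

```python
import math

def f(x):                 # a continuous map of the circle [0, 1)
    return (x + math.sqrt(2)) % 1.0

def phi(x):               # a continuous observable with sup|phi| <= 1
    return math.cos(2 * math.pi * x)

x0, n = 0.1, 10_000
orbit, x = [], x0
for _ in range(n):
    orbit.append(x)
    x = f(x)

mu_n_phi = sum(phi(y) for y in orbit) / n         # int phi   dmu_n
mu_n_phi_f = sum(phi(f(y)) for y in orbit) / n    # int phi o f dmu_n

# The two sums share all but the terms i = 0 and i = n, so the
# defect is |phi(f^n x0) - phi(x0)| / n <= 2 sup|phi| / n.
defect = abs(mu_n_phi_f - mu_n_phi)
assert defect <= 2.0 / n + 1e-12
```

Any weak-star limit point of the µn therefore has zero defect against every continuous observable, which is exactly the invariance criterion of Corollary 2.2.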

2.4 Exercises
Exercise 16. Show that the Dirac-delta measure δp on a fixed point is invariant.
Exercise 17. Show that the measure defined in (3) on a periodic orbit is invariant if and only if ρi = 1/n
for every i = 1, . . . , n.
Exercise 18. Let I = [0, 1], κ ≥ 2 an integer, and let f (x) = κx mod 1. Show that
Lebesgue measure is invariant.
Exercise 19. Let I = [0, 1] and let f : I → I be a piecewise affine full branch map with an arbitrary finite or
countable number of branches. Show that Lebesgue measure is invariant.
Exercise 20. Show that f∗µ is a probability measure, so that the map f∗ is well defined.
Exercise 21. Find an example of an infinite measure space (X̂, B̂, µ̂) and a measure-
preserving map f : X̂ → X̂ for which the conclusions of Poincaré's Recurrence Theorem
do not hold.

3 Ergodic Measures
We now introduce the second fundamental definition.

Definition 13. µ is ergodic if, for all A ∈ B, f −1 (A) = A and µ(A) > 0 implies µ(A) = 1.

The intuitive meaning of this definition is that the dynamics is “indecomposable”, at


least as far as the measure µ is concerned. The condition f −1 (A) = A is sometimes referred
to by saying that the set A ⊆ M is fully invariant. In non-invertible maps this is much
stronger than assuming forward invariance (Exercise 22).
In the rest of this section we discuss some dynamical consequences of ergodicity. In
later sections we address the problem of the existence of ergodic measures and the highly
non-trivial and important problems of verifying ergodicity for specific measures.

3.1 Birkhoff ’s Ergodic Theorem


The definitions of ergodicity and invariance are independent of each other but they both
come into their own when they are used together.

Theorem 3 (Birkhoff, 1931). Let M be a measure space, f : M → M a measurable map,


and µ an f -invariant ergodic probability measure. Then, for every ϕ ∈ L1 (µ) the limit
ϕf := lim_{n→∞} (1/n) Σ_{i=0}^{n−1} ϕ ◦ f^i(x) = ∫ ϕ dµ

for µ almost every x.

Remark 7. An immediate application of Birkhoff’s Ergodic Theorem gives that for any
ergodic invariant measure µ, and any Borel measurable set A, letting ϕ = 1A be the
characteristic function of A, we have
lim_{n→∞} (1/n) #{1 ≤ j ≤ n : f^j(x) ∈ A} = lim_{n→∞} (1/n) Σ_{i=1}^{n} 1_A(f^i(x)) = ∫ 1_A dµ = µ(A).

This means that µ almost every point has the same asymptotic frequency of visits to the
set A and this frequency is exactly the probability of A.
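This frequency-of-visits statement is easy to observe numerically. The sketch below uses a piecewise affine full branch map (for which Lebesgue measure is invariant and ergodic; see the section on full branch maps below), with an arbitrary branch point, starting point and target set:

```python
# A piecewise affine full branch map with two branches and branch point p.
p = 0.37

def f(x):
    return x / p if x < p else (x - p) / (1 - p)

n = 200_000
x, visits = 0.123, 0
for _ in range(n):
    if x < p:              # visit to A = [0, p), which has m(A) = p
        visits += 1
    x = f(x)

freq = visits / n
assert abs(freq - p) < 0.02    # frequency of visits approaches m(A)
```

This is only an empirical sketch: floating-point round-off acts as a small perturbation of the orbit, but the observed frequency still settles near m(A) = p.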
The proof uses the following simple result.

Lemma 3.1. The following two conditions are equivalent:

1. µ is ergodic;

2. if ϕ ∈ L1 (µ) satisfies ϕ ◦ f = ϕ for µ almost every x then ϕ is constant a.e.

Proof. Suppose first that µ is ergodic and let ϕ ∈ L1 satisfy ϕ ◦ f = ϕ. Let

Xk,n := ϕ−1 ([k2−n , (k + 1)2−n )).

Since ϕ is measurable, the sets Xk,n are measurable. Moreover, since ϕ is constant along
orbits, the sets Xk,n are fully invariant a.e. and thus by ergodicity they have either zero or
full measure. Moreover, for each fixed n the sets Xk,n, k ∈ Z, are pairwise disjoint and their
union is the whole space, so for each n there exists a unique kn such that µ(Xkn,n) = 1. Thus,
letting Y = ∩n Xkn,n we have that µ(Y) = 1 and ϕ is constant on Y. Thus ϕ is constant a.e.
Conversely, suppose that (2) holds and suppose that f −1 (A) = A. Let 1A denote the
characteristic function of A. Then clearly 1A ∈ L1 and 1A ◦ f = 1A and so we either have
1A = 0 a.e. or 1A = 1 a.e. which proves that µ(A) = 0 or 1.
Proof of Theorem 3. We use the notation used in the proof of Theorem 1. By definition
ϕf is invariant along orbits and so, by ergodicity of µ and Lemma 3.1, it follows that it
is constant a.e. Moreover, by (4) in the proof of Theorem 1 we have ϕf = ϕI a.e. and
therefore ϕf = ∫ ϕf dµ = ∫ ϕI dµI = ∫ ϕ dµ.

3.2 Basins of attraction


Birkhoff’s Ergodic Theorem allows us to give a first answer to the question of the existence
of non-empty basins of attraction for probability measures.

Corollary 3.1. If M is a compact Hausdorff space and µ an f -invariant ergodic probability


measure, then
µ(Bµ ) = 1.

Notice that this does not follow immediately from the previous statements since the set
of full measure for which the time averages converge to the space averages depends on the
function.
Proof. Since M is compact and Hausdorff we can choose a countable dense subset {ϕm}
of continuous functions and let A be the set of full measure such that the time averages
converge for every ϕm. Then, for any arbitrary continuous function ϕ, every x ∈ A, and
arbitrary ε > 0, choose ϕm such that sup_{x∈M} |ϕ(x) − ϕm(x)| < ε. Then we have

(1/n) Σ_{i=0}^{n−1} ϕ ◦ f^i(x) = (1/n) Σ_{i=0}^{n−1} ϕm ◦ f^i(x) + (1/n) Σ_{i=0}^{n−1} (ϕ ◦ f^i(x) − ϕm ◦ f^i(x)).

The first sum converges as n → ∞ and the second sum is bounded by ε and therefore all
the limit points of the sequence on the left are within ε of each other. Since ε is arbitrary,
this implies that they converge.

3.3 Ergodic decomposition
We complete this section with a discussion on the existence and structure of the set of
ergodic measures. Since we will be mainly interested in measures that are both ergodic
and invariant, we consider this set of measures. Let

Ef := {µ ∈ Mf : µ is ergodic}.

Theorem 4. µ ∈ Ef if and only if µ is an extremal² point of Mf .

Corollary 4.1. Let M be compact and f : M → M continuous. Then Ef ≠ ∅ and there
exists a unique probability measure µ̂ on Mf such that µ̂(Ef) = 1 and such that for all
µ ∈ Mf and for all continuous functions ϕ : M → R we have

∫_M ϕ dµ = ∫_{Ef} (∫_M ϕ dν) dµ̂(ν).

The Corollary follows from standard abstract functional analytic results. More precisely,
the fact that Ef ≠ ∅ follows immediately from the Krein-Milman Theorem, which
says that every compact convex subset (i.e. Mf) of a locally convex topological vector
space (i.e. the space of all Borel measures on M) is the closed convex hull of its extreme
elements and thus, in particular, the set Ef of extreme elements is non-empty. The decomposition
follows from Choquet's Theorem, which states exactly this decomposition result
for general non-empty compact convex sets.
Proof. Suppose first that µ is not ergodic, we will show that it cannot be an extremal
point. By the definition of ergodicity, if µ is not ergodic, then there exists a set A with

f −1 (A) = A, f −1 (Ac ) = Ac and µ(A) ∈ (0, 1).

Define two measures µ1, µ2 by

µ1(B) = µ(B ∩ A)/µ(A) and µ2(B) = µ(B ∩ Ac)/µ(Ac).

µ1 and µ2 are probability measures with µ1 (A) = 1, µ2 (Ac ) = 1, and µ can be written as

µ = µ(A)µ1 + µ(Ac )µ2

which is a linear combination of µ1 , µ2 :

µ = tµ1 + (1 − t)µ2 with t = µ(A) and 1 − t = µ(Ac ).


2
Recall that an extremal point of a convex set A is a point µ such that if µ = tµ0 + (1 − t)µ1 for
µ0, µ1 ∈ A with µ0 ≠ µ1 then t = 0 or t = 1.

It just remains to show that µ1 , µ2 ∈ Mf , i.e. that they are invariant. Let B be an
arbitrary measurable set. Then, using the fact that µ is invariant by assumption and that
f −1 (A) = A we have

µ1(f⁻¹(B)) := µ(f⁻¹(B) ∩ A)/µ(A) = µ(f⁻¹(B) ∩ f⁻¹(A))/µ(A) = µ(f⁻¹(B ∩ A))/µ(A) = µ(B ∩ A)/µ(A) = µ1(B).

This shows that µ1 is invariant. The same calculation works for µ2 and so this completes
the proof in one direction.
Now suppose that µ is ergodic and suppose by contradiction that µ is not extremal so
that µ = tµ1 + (1 − t)µ2 for two invariant probability measures µ1 , µ2 and some t ∈ (0, 1).
We will show that µ1 = µ2 = µ, thus implying that µ is extremal. We will show that
µ1 = µ; the argument for µ2 is identical. Notice first of all that µ1 ≪ µ and therefore, by
the Radon-Nikodym Theorem, it has a density h1 := dµ1/dµ such that for any measurable
set A we have µ1(A) = ∫_A h1 dµ. The statement that µ1 = µ is equivalent to the statement
that h1 = 1 µ-almost everywhere. To show this we define the sets

B := {x : h1 (x) < 1} and C := {x : h1 (x) > 1}

and will show that µ(B) = 0 and µ(C) = 0 implying the desired statement. We give the
details of the proof of µ(B) = 0, the argument to show that µ(C) = 0 is analogous. Firstly
µ1(B) = ∫_B h1 dµ = ∫_{B∩f⁻¹(B)} h1 dµ + ∫_{B\f⁻¹B} h1 dµ

and

µ1(f⁻¹B) = ∫_{f⁻¹B} h1 dµ = ∫_{B∩f⁻¹(B)} h1 dµ + ∫_{f⁻¹B\B} h1 dµ.

Since µ1 is invariant, µ1 (B) = µ1 (f −1 B) and therefore,


∫_{B\f⁻¹B} h1 dµ = ∫_{f⁻¹B\B} h1 dµ.

Notice that

µ(f −1 B \ B) = µ(f −1 (B)) − µ(f −1 B ∩ B) = µ(B) − µ(f −1 B ∩ B) = µ(B \ f −1 B).

Since h1 < 1 on B \ f⁻¹B and h1 ≥ 1 on f⁻¹B \ B and the value of the two integrals is
the same, we must have µ(B \ f⁻¹B) = µ(f⁻¹B \ B) = 0, which implies that f⁻¹B = B (up
to a set of measure zero). Since µ is ergodic we have µ(B) = 0 or µ(B) = 1. If µ(B) = 1
we would get

1 = µ1(M) = ∫_M h1 dµ = ∫_B h1 dµ < µ(B) = 1,

which is a contradiction. It follows that µ(B) = 0 and this concludes the proof.

3.4 Exercises
Exercise 22. Show that if A is fully invariant, letting Ac := M \ A denote the complement
of A, then f −1 (Ac ) = Ac and that both f (A) = A and f (Ac ) = Ac .
Exercise 23. Show that the Dirac-delta measure δp on a fixed point is ergodic.
Exercise 24. Show that the Dirac-delta measure δP on a periodic orbit is ergodic.
Example 10. Let f be the identity map. The only ergodic measures are the Dirac-delta
measures.
Example 11. Let f : [0, 1] → [0, 1] be given by

f(x) = 0.5 − 2x if 0 ≤ x < 0.25,  f(x) = 2x − 0.5 if 0.25 ≤ x < 0.75,  f(x) = −2x + 2.5 if 0.75 ≤ x ≤ 1.

Show that Lebesgue measure is invariant but not ergodic.
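A quick numerical check of why ergodicity fails in Example 11: the two halves [0, 1/2] and [1/2, 1] are forward invariant, so orbits never cross between them. The starting points below are arbitrary choices:

```python
def f(x):                      # the map of Example 11
    if x < 0.25:
        return 0.5 - 2 * x
    elif x < 0.75:
        return 2 * x - 0.5
    else:
        return -2 * x + 2.5

for x0 in (0.3, 0.1234):       # starting points in the lower half
    x = x0
    for _ in range(1000):
        x = f(x)
        assert x <= 0.5 + 1e-12    # the orbit never leaves [0, 1/2]

for x0 in (0.8, 0.6789):       # starting points in the upper half
    x = x0
    for _ in range(1000):
        x = f(x)
        assert x >= 0.5 - 1e-12    # the orbit never leaves [1/2, 1]
```

Each half is a fully invariant set (mod 0) of Lebesgue measure 1/2, which is exactly the failure of the condition in Definition 13.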

4 Unique Ergodicity
In general a given map may have many invariant measures. However there are certain
special, but important, examples of maps which have a unique invariant, and thus er-
godic, measure. A trivial example is constituted by contraction maps in which every orbit
converges to a unique fixed point p, and therefore δp is the unique ergodic invariant prob-
ability measure. However there are several other less trivial examples and in this section
we discuss some aspects of these examples.

Definition 14. We say that a map f : X → X is uniquely ergodic if it admits a unique


(ergodic) invariant probability measure.

4.1 Uniform convergence


Theorem 5. Let f : X → X be a continuous map of a compact metric space. Then f is
uniquely ergodic if and only if for every continuous function ϕ, the limit
ϕf := lim_{n→∞} (1/n) Σ_{j=0}^{n−1} ϕ ◦ f^j (7)

exists for every x ∈ X and is independent of x.

Proof of Theorem 5. Suppose first that for any continuous function ϕ the limit ϕf exists
for every x and is independent of x. By Birkhoff's Ergodic Theorem, for every ergodic
invariant probability measure µ we have ϕf = ∫ ϕ dµ for µ a.e. x. But then if µ1, µ2 are
ergodic invariant probability measures this implies ∫ ϕ dµ1 = ∫ ϕ dµ2 for every continuous

function ϕ and this implies µ1 = µ2 . Thus f has only one ergodic invariant probability
measure and so is uniquely ergodic.
Conversely, suppose that f is uniquely ergodic and µ is the unique ergodic invariant
probability measure. By Birkhoff's Ergodic Theorem, ϕf(x) = ∫ ϕ dµ for µ-a.e. x. We
need to show that this actually holds for every x, i.e. that the sequence

(1/n) Σ_{j=0}^{n−1} ϕ ◦ f^j → ϕf (8)

as continuous functions, and thus uniformly. Suppose by contradiction that (8) does not
hold. Then, by the negation of the definition of uniform convergence, there exist a continuous
function ϕ, an ε > 0, and sequences xk ∈ X and nk → ∞ for which

|(1/nk) Σ_{i=0}^{nk−1} ϕ(f^i(xk)) − ϕf| ≥ ε.

Define a sequence of measures


νk := (1/nk) Σ_{i=0}^{nk−1} f∗^i δ_{xk} = (1/nk) Σ_{i=0}^{nk−1} δ_{f^i xk}.

Notice that for any x we have f∗^i δx = δ_{f^i(x)}. Then, for every k we have

∫ ϕ dνk = ∫ ϕ d((1/nk) Σ_{i=0}^{nk−1} δ_{f^i xk}) = (1/nk) Σ_{i=0}^{nk−1} ∫ ϕ dδ_{f^i xk} = (1/nk) Σ_{i=0}^{nk−1} ϕ(f^i(xk))

and therefore

|∫ ϕ dνk − ϕf| ≥ ε.

for every k. By the weak-star compactness of the space M of probability measures, there
exists a subsequence kj → ∞ and a probability measure ν ∈ M such that νkj → ν and
|∫ ϕ dν − ϕf| ≥ ε. (9)

Moreover, arguing as in the proof of the Krylov-Bogolyubov Theorem (Theorem 2), we get that
ν ∈ Mf.³ But by unique ergodicity we must have ν = µ and so ∫ ϕ dν = ∫ ϕ dµ = ϕf,
contradicting (9) and therefore (8), and this completes the proof.
³ Indeed,

f∗νkj = f∗((1/nkj) Σ_{i=0}^{nkj−1} f∗^i δ_{xkj}) = (1/nkj) Σ_{i=0}^{nkj−1} f∗^{i+1} δ_{xkj} = (1/nkj) Σ_{i=0}^{nkj−1} f∗^i δ_{xkj} + (1/nkj)(f∗^{nkj} δ_{xkj} − δ_{xkj})

and therefore f∗νkj → ν as j → ∞. Since νkj → ν by definition of ν and f∗νkj → f∗ν by continuity of f∗,
this implies f∗ν = ν and thus ν ∈ Mf.

For future reference we remark that one direction of the Theorem above only needs to
be verified for a dense subset of continuous functions since it implies the statement for all
continuous functions.

Lemma 4.1. Let f : X → X be a continuous map of a compact metric space. Suppose


there exists a dense set Φ of continuous functions such that for every ϕ ∈ Φ the limit
ϕf exists for every x and is independent of x. Then the same holds for every continuous
function ϕ.

Proof. To simplify the notation we let


Bn(x, ϕ) := (1/n) Σ_{i=0}^{n−1} ϕ ◦ f^i(x).

By assumption, if ϕ ∈ Φ, there exists a constant ϕ̄ = ϕ̄(ϕ) such that Bn(x, ϕ) → ϕ̄
uniformly in x. Now let ψ : X → R be an arbitrary continuous function. Since Φ is dense,
for any ε > 0 there exists ϕ ∈ Φ such that sup_{x∈X} |ϕ(x) − ψ(x)| < ε. This implies

|Bn(x, ϕ) − Bn(x, ψ)| < ε

for every x, n and therefore there exists n0 ≥ 1 such that

sup_{x, n≥n0} |Bn(x, ψ) − ϕ̄| < 2ε and inf_{x, n≥n0} |Bn(x, ψ) − ϕ̄| < 2ε,

and so in particular

sup_{x, n≥n0} Bn(x, ψ) − inf_{x, n≥n0} Bn(x, ψ) < 4ε.

Since ε is arbitrary, this implies that Bn(x, ψ) converges uniformly to some constant ψ̄.
Notice that the function ϕ, and therefore the constant ϕ̄, depends on ε; what we have
shown is simply that the inf and the sup of the averages are eventually within 4ε of each other
for arbitrary ε and therefore the limit points must coincide. This shows that (8) holds for every
continuous function.

4.2 Circle rotations


Proposition 4.1. Let f : S1 → S1 be the circle rotation f (x) = x + α with α irrational.
Then f is uniquely ergodic.

Proof. For any m ≥ 1, consider the functions

ϕm(x) := e^{2πimx} = cos 2πmx + i sin 2πmx

and let Φ denote the space of all linear combinations of functions of the form ϕm . By a
classical Theorem of Weierstrass, Φ is dense in the space of all continuous functions, thus

it is sufficient to show uniform convergence for functions in Φ. Moreover, notice that for
any two continuous functions ϕ, ψ we have
Bn(ϕ + ψ, x) := (1/n) Σ_{i=0}^{n−1} (ϕ + ψ) ◦ f^i(x) = (1/n) Σ_{i=0}^{n−1} (ϕ ◦ f^i(x) + ψ ◦ f^i(x)) = Bn(ϕ, x) + Bn(ψ, x).
Thus, the Birkhoff averaging operator is linear in the observable and therefore to show the
statement for all functions in Φ it is sufficient to show it for each ϕm . To see this, notice
first of all that
ϕm ◦ f(x) = e^{2πim(x+α)} = e^{2πimα} e^{2πimx} = e^{2πimα} ϕm(x)

and therefore, using |ϕm(x)| = 1 and the sum Σ_{j=0}^{n} x^j = (1 − x^{n+1})/(1 − x), we get

|(1/n) Σ_{j=0}^{n−1} ϕm ◦ f^j(x)| = (1/n) |Σ_{j=0}^{n−1} e^{2πimjα}| = (1/n) · |1 − e^{2πimnα}|/|1 − e^{2πimα}| ≤ (1/n) · 2/|1 − e^{2πimα}| → 0.

The convergence is uniform because the upper bound does not depend on x. Notice that
we have used here the fact that α is irrational in an essential way to guarantee that the
denominator does not vanish for any m. Notice also that the convergence is of course not
uniform (and does not need to be uniform) in m.
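The geometric-sum estimate in the proof can be verified directly; the rotation number α and the frequencies m below are arbitrary choices:

```python
import cmath, math

alpha = math.sqrt(3) % 1.0     # an irrational rotation number
n = 10_000

for m in (1, 2, 5):
    rot = cmath.exp(2j * math.pi * m * alpha)    # e^{2 pi i m alpha}
    s = sum(rot ** j for j in range(n)) / n      # Birkhoff average of phi_m at x = 0
    bound = 2.0 / (n * abs(1 - rot))             # the bound from the proof
    assert abs(s) <= bound + 1e-9                # small slack for round-off
```

Note how the bound degrades as |1 − e^{2πimα}| gets small, which is why the convergence is uniform in x but not in m.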
It follows immediately from Birkhoff's ergodic theorem that the orbit O+(x) = {xn}_{n=0}^∞
of Lebesgue almost every point is uniformly distributed in S¹ (with respect to Lebesgue)
in the sense that for any arc (a, b) ⊂ S¹ we have

#{0 ≤ i ≤ n − 1 : xi ∈ (a, b)}/n → m(a, b).
As a consequence of the uniqueness of the invariant measure, in the case of irrational circle
rotations we get the stronger statement that this property holds for every x ∈ S1 .
Proposition 4.2. Let f : S1 → S1 be the circle rotation f (x) = x + α with α irrational.
Then every orbit is uniformly distributed in S1 .
Proof. Consider an arbitrary arc [a, b] ⊂ S¹. Then, for any ε > 0 there exist continuous
functions ϕ, ψ : S¹ → R such that ϕ ≤ 1_{[a,b]} ≤ ψ and such that ∫ (ψ − ϕ) dm ≤ ε. We then
have that

lim inf_{n→∞} (1/n) Σ_{j=0}^{n−1} 1_{[a,b]}(xj) ≥ lim inf_{n→∞} (1/n) Σ_{j=0}^{n−1} ϕ(xj) = ∫ ϕ dm ≥ ∫ ψ dm − ε ≥ ∫ 1_{[a,b]} dm − ε

and

lim sup_{n→∞} (1/n) Σ_{j=0}^{n−1} 1_{[a,b]}(xj) ≤ lim sup_{n→∞} (1/n) Σ_{j=0}^{n−1} ψ(xj) = ∫ ψ dm ≤ ∫ ϕ dm + ε ≤ ∫ 1_{[a,b]} dm + ε.

Since ε is arbitrary, the limit exists and equals ∫ 1_{[a,b]} dm = |b − a|, and thus the sequence
is uniformly distributed.
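A numerical sketch of Proposition 4.2, with the starting point and arc chosen arbitrarily; note that the starting point x0 = 0 is a single specific point, not a Lebesgue-generic one, yet its orbit still equidistributes:

```python
import math

alpha = math.sqrt(2) % 1.0
a, b = 0.2, 0.5                 # an arbitrary arc

x, n, visits = 0.0, 100_000, 0  # orbit of the specific point x0 = 0
for _ in range(n):
    if a <= x < b:
        visits += 1
    x = (x + alpha) % 1.0

freq = visits / n
assert abs(freq - (b - a)) < 0.005    # frequency approaches the arc length 0.3
```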

4.3 Benford’s distribution
We give an interesting application of the uniform distribution result above. First of all we
define the concept of a leading digit of a number a ∈ R. We define the leading digit of a
as the first non-zero digit in the decimal expansion of a. Thus, if |a| ≥ 1 this is just the
first digit of a. If |a| < 1 this is the first non-zero digit after the decimal point. We shall
use the notation
D(a) = leading digit of a.

Definition 15. We say that the sequence {ai}_{i=0}^∞ has a Benford distribution if for every
d = 1, . . . , 9 we have

B(d) := lim_{n→∞} #{0 ≤ i ≤ n − 1 : D(ai) = d}/n = log10(1 + 1/d).

This gives the following approximate values:


B(1) = 0.301... ≈ 30%
B(2) = 0.176... ≈ 17%
B(3) = 0.124... ≈ 12%
B(4) = 0.096... ≈ 9%
B(5) = 0.079... ≈ 8%
B(6) = 0.066... ≈ 7%
B(7) = 0.057... ≈ 6%
B(8) = 0.051... ≈ 5%
B(9) = 0.045... ≈ 4%
Notice that

Σ_{d=1}^{9} log10(1 + 1/d) = 1,

so that the B(d) are the probabilities of each digit d occurring as a leading digit.
Remark 8. Remarkably, this distribution is observed in a variety of real-life data, mostly
in cases in which there is a large amount of data across several orders of magnitude. It was
first observed by the American astronomer Simon Newcomb in 1881 when he noticed that
the earlier pages of logarithm tables, containing numbers starting with 1, were much more
worn than other pages. This was rediscovered by the physicist Frank Benford, who found
that a wide range of data followed this principle.

Proposition 4.3. Let k be any integer that is not a power of ten. Then the
sequence {k^n}_{n=1}^∞ satisfies Benford's distribution.

We prove the Proposition in the following two lemmas.

Lemma 4.2. Let k be any integer that is not a power of ten. Then the sequence
{log10 k^n mod 1}_{n=1}^∞ is uniformly distributed in S¹.

Proof. Since k is not a power of 10 the number log10 k is irrational and this sequence
can be seen as the sequence of iterates of x0 = 0 under the irrational circle rotation
f (x) = x + log10 k and therefore is uniformly distributed.

Lemma 4.3. Let {ai}_{i=1}^∞ be a sequence of real numbers and suppose that the sequence
{log10 ai mod 1}_{i=1}^∞ is uniformly distributed in S¹. Then {ai}_{i=1}^∞ satisfies Benford's
distribution.

Proof. Notice first of all that for each ai we have

D(ai) = d ⟺ d·10^j ≤ ai < (d + 1)·10^j for some j ∈ Z.

Therefore

D(ai) = d ⟺ log10 d + j ≤ log10 ai < log10(d + 1) + j,

or

D(ai) = d ⟺ log10 d ≤ log10 ai mod 1 < log10(d + 1).
By assumption, {log10 ai } is uniformly distributed and therefore

lim_{n→∞} #{1 ≤ i ≤ n : D(ai) = d}/n = lim_{n→∞} #{1 ≤ i ≤ n : log10 ai mod 1 ∈ [log10 d, log10(d + 1))}/n = log10((d + 1)/d) = log10(1 + 1/d).

Proof of Proposition 4.3. Notice that log10 k^n = n log10 k and therefore it is sufficient to
show that the sequence {n log10 k mod 1}_{n=1}^∞ is uniformly distributed in S¹, which is
exactly Lemma 4.2.

5 Full Branch Maps


Definition 16. Let I ⊂ R be an interval. A map f : I → I is a full branch map if there
exists a finite or countable partition P of I (mod 0) into subintervals such that for each
ω ∈ P the map f |int(ω) : int(ω) → int(I) is a bijection. f is a piecewise continuous (resp.
C 1 , C 2 , affine) full branch map if for each ω ∈ P the map f |int(ω) : int(ω) → int(I) is a
homeomorphism (resp. C 1 diffeomorphism, C 2 diffeomorphism, affine).

5.1 Invariance and Ergodicity of Lebesgue measure


The full branch property is extremely important and useful. It is a fairly strong property
but it turns out that the study of many maps which do not have this property can be
reduced to maps with the full branch property. In this section we start by studying the
case of piecewise affine full branch maps. We will prove the following.

Proposition 5.1. Let f : I → I be a piecewise affine full branch map. Then Lebesgue
measure is invariant and ergodic.
Example 12. The simplest examples of full branch maps are the maps f : [0, 1] → [0, 1]
defined by f(x) = κx mod 1 for some integer κ ≥ 2. In this case it is almost trivial to
check that Lebesgue measure is invariant. The general case, in which the branches have
different derivatives and there may be an infinite number of branches, is a simple exercise.
Exercise 25. Let f : I → I be a piecewise affine full branch map. Then Lebesgue measure
is invariant. Indeed, write f′ω for the (constant) derivative of f on int(ω). Since each branch
maps ω affinely onto I, we have |ω| = 1/|f′ω|, even with an infinite number of branches.
Thus, for any interval A ⊂ I we have

|f⁻¹(A)| = Σ_{ω∈P} |f⁻¹(A) ∩ ω| = Σ_{ω∈P} |A|/|f′ω| = |A| Σ_{ω∈P} 1/|f′ω| = |A| Σ_{ω∈P} |ω| = |A|.

Thus Lebesgue measure is invariant.
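The computation in Exercise 25 can be reproduced with exact rational arithmetic for a specific, arbitrarily chosen partition: the preimage of an interval (a, b) has one piece inside each branch domain ω, of length (b − a)|ω|, and these lengths sum back to b − a.

```python
from fractions import Fraction

# Branch domains of a piecewise affine full branch map on [0, 1):
# cut points chosen arbitrarily; each branch maps [c_i, c_{i+1}) onto [0, 1).
cuts = [Fraction(0), Fraction(3, 10), Fraction(7, 10), Fraction(1)]
a, b = Fraction(1, 5), Fraction(4, 5)

total = Fraction(0)
for left, right in zip(cuts[:-1], cuts[1:]):
    # On omega = [left, right) the branch is x -> (x - left)/(right - left),
    # so the preimage of (a, b) inside omega is (left + a|omega|, left + b|omega|).
    lo = left + a * (right - left)
    hi = left + b * (right - left)
    total += hi - lo

assert total == b - a      # |f^{-1}(a, b)| = |(a, b)|
```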


Lemma 5.1. Let f : I → I be a continuous (resp. C¹, C², affine) full branch map.
Then there exists a family of partitions {P(n)}_{n=1}^∞ of I (mod 0) into subintervals such that
P(1) = P, each P(n+1) is a refinement of P(n), and such that for each n ≥ 1 and each
ω(n) ∈ P(n) the map f^n|_{int(ω(n))} : int(ω(n)) → int(I) is a homeomorphism (resp. a C¹
diffeomorphism, C² diffeomorphism, affine map).
Proof. For n = 1 we let P (1) := P where P is the partition in the definition of a full branch
map. Proceeding inductively, suppose that there exists a partition P (n−1) satisfying the
required conditions. Then each ω (n−1) is mapped by f n−1 bijectively to the entire interval
I and therefore ω (n−1) can be subdivided into disjoint subintervals each of which maps
bijectively to one of the elements of the original partition P. Thus each of these subintervals
will then be mapped under one further iteration bijectively to the entire interval I. These
are therefore the elements of the partition P (n) .
Proof of Proposition 5.1. Let A ⊂ [0, 1) satisfy f⁻¹(A) = A and suppose that |A| > 0.
We shall show that |A| = 1. Notice first of all that since f is piecewise affine, each element
ω ∈ P is mapped affinely and bijectively to I and therefore the derivative is strictly
larger than 1 in absolute value, uniformly in ω. Thus the iterates f^n have derivatives which grow
exponentially in n and thus, by the Mean Value Theorem, |ω(n)| → 0 exponentially (and
uniformly). By Lebesgue's density Theorem, for any ε > 0 we can find n = nε sufficiently
large so that the elements of P(n) are sufficiently small that there exists some ω(n) ∈ P(n)
with |ω(n) ∩ A| ≥ (1 − ε)|ω(n)| or, equivalently, |ω(n) ∩ Ac| ≤ ε|ω(n)|, i.e.

|ω(n) ∩ Ac|/|ω(n)| ≤ ε.
Since f^n : ω(n) → I is an affine bijection we have

|ω(n) ∩ Ac|/|ω(n)| = |f^n(ω(n) ∩ Ac)|/|f^n(ω(n))|.

Moreover, f^n(ω(n)) = I and, since f⁻¹(A) = A implies f⁻¹(Ac) = Ac and hence
f⁻ⁿ(Ac) = Ac, we have

f^n(ω(n) ∩ Ac) = f^n(ω(n) ∩ f⁻ⁿ(Ac)) = Ac.

We conclude that

|Ac|/|I| = |f^n(ω(n) ∩ Ac)|/|f^n(ω(n))| = |ω(n) ∩ Ac|/|ω(n)| ≤ ε. (10)

This gives |Ac| ≤ ε and since ε is arbitrary this implies |Ac| = 0, which implies |A| = 1 as
required.
Remark 9. Notice that the "affine" property of f has been used in only two places: to
show that the map is expanding in the sense of Lemma ??, and in the second equality of (10).
In the first place it would have been quite sufficient to replace the affine assumption
with a uniform expansivity assumption; in the second place it would be sufficient to have an
inequality rather than an equality. We will show below that we can indeed obtain similar
results for full branch maps by relaxing the affine assumption.

5.2 Normal numbers


The relatively simple result on the invariance and ergodicity of Lebesgue measure for
piecewise affine full branch maps has a remarkable application on the theory of numbers.
For any number x ∈ [0, 1] and any integer k ≥ 2 we can write
x = x1/k + x2/k² + x3/k³ + · · ·
where each xi ∈ {0, . . . , k − 1}. This is sometimes called the expansion of x in base k and
is (apart from some exceptional cases) unique. Sometimes we just write

x = 0.x1 x2 x3 . . .

when it is understood that the expansion is with respect to a particular base k. For the
case k = 10 this is of course just the well known decimal expansion of x.

Definition 17. A number x ∈ [0, 1] is called normal (in base k) if its expansion x =
0.x1 x2 x3 . . . in base k contains asymptotically equal proportions of all digits, i.e. if for
every j = 0, . . . , k − 1 we have that

#{1 ≤ i ≤ n : xi = j}/n → 1/k
as n → ∞.

Exercise 26. Give examples of normal and non-normal numbers in a given base k.

It is not however immediately obvious what proportion of numbers are normal in any
given base nor if there even might exist a number that is normal in every base. We will
show that in fact Lebesgue almost every x is normal in every base.
Theorem 6. There exists a set N ⊂ [0, 1] with |N| = 1 such that every x ∈ N is normal in
every base k ≥ 2.
Proof. It is enough to show that for any given k ≥ 2 there exists a set Nk with m(Nk ) = 1
such that every x ∈ Nk is normal in base k. Indeed, this implies that for each k ≥ 2 the
set I \ Nk of points which are not normal in base k satisfies m(I \ Nk) = 0. Thus the set
I \ N of points which fail to be normal in some base is contained in the union of all the I \ Nk
and, since a countable union of sets of measure zero has measure zero, we have

m(I \ N) ≤ m(∪_{k=2}^∞ (I \ Nk)) ≤ Σ_{k=2}^∞ m(I \ Nk) = 0.

We therefore fix some k ≥ 2 and consider the set Nk of points which are normal in
base k. The crucial observation is that the base k expansion of the number x is closely
related to its orbit under the map fk(x) = kx mod 1. Indeed, consider the intervals Aj = [j/k, (j + 1)/k)
for j = 0, . . . , k − 1. Then, the base k expansion x = 0.x1 x2 x3 . . . of the point x clearly
satisfies
x ∈ Aj ⇐⇒ x1 = j.
Moreover, for any i ≥ 0 we have

f i (x) ∈ Aj ⇐⇒ xi+1 = j.

Therefore the frequency of occurrences of the digit j in the expansion of x is exactly the
same as the frequency of visits of the orbit of the point x to Aj under iterations of the map
fk. Birkhoff's ergodic theorem and the ergodicity of Lebesgue measure for fk imply that
Lebesgue almost every orbit spends asymptotically m(Aj) = 1/k of its iterations in each of
the intervals Aj. Therefore Lebesgue almost every point has an asymptotic frequency 1/k
of each digit j in its base k expansion, and so Lebesgue almost every point is normal
in base k.
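The crucial observation in the proof, namely that the (i + 1)-st base-k digit of x records which interval Aj the orbit of x under fk visits at time i, can be checked exactly with rational arithmetic; k and x below are arbitrary choices:

```python
from fractions import Fraction

k = 3
x = Fraction(5, 7)

digits_from_orbit, y = [], x
for _ in range(20):
    digits_from_orbit.append(int(k * y))   # the index j with y in A_j = [j/k, (j+1)/k)
    y = (k * y) % 1                        # one step of f_k(y) = ky mod 1

# Direct base-k expansion: x_{i+1} = floor(k^{i+1} x) mod k.
digits_direct = [int(k ** (i + 1) * x) % k for i in range(20)]
assert digits_from_orbit == digits_direct
```

For a rational x the orbit, and hence the digit sequence, is eventually periodic; here the base-3 digits of 5/7 cycle with period 6.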

5.3 Uncountably many non-atomic ergodic measures


We now use the pull-back method to construct an uncountable family of ergodic
invariant measures. We recall that a measure is called non-atomic if no individual
point has positive measure.
Proposition 5.2. The interval map f (x) = 2x mod 1 admits an uncountable family of
non-atomic, mutually singular, ergodic measures.
We shall construct these measures quite explicitly and thus obtain some additional
information about their properties; the method of construction is of intrinsic interest.

Definition 18. Let X, Y be two metric spaces and f : X → X and g : Y → Y be two
maps. We say that f and g are conjugate if there exists a bijection h : X → Y such that
h ◦ f = g ◦ h or, equivalently, f = h⁻¹ ◦ g ◦ h.

A conjugacy h maps orbits of f to orbits of g.


Exercise 27. Show that if f, g are conjugate, then f n (x) = h−1 ◦ g n ◦ h(x) for every n ≥ 1.
In particular a conjugacy maps fixed points to fixed points and periodic points to corresponding
periodic points. However, without additional assumptions on the regularity of
h it may not preserve additional structure. We say that f, g are (Borel) measurably conjugate
if h, h⁻¹ are (Borel) measurable, topologically conjugate if h is a homeomorphism, and C^r
conjugate, r ≥ 1, if h is a C^r diffeomorphism.
Exercise 28. Show that conjugacy defines an equivalence relation on the space of all dy-
namical systems. Show that measurable, topological, and C r conjugacy, each defines an
equivalence relation on the space of dynamical systems.
Measurable conjugacies map sigma-algebras to sigma-algebras and therefore we can
define a map

h∗ : M(X) → M(Y)

from the space M(X) of all probability measures on X to the space M(Y) of all probability
measures on Y, by

h∗µ(A) = µ(h⁻¹(A)).

Lemma 5.2. Suppose f, g are measurably conjugate. Then

1. h∗ µ is invariant under g if and only if µ is invariant under f .

2. h∗ µ is ergodic for g if and only if µ is ergodic for f .

Proof. Exercise. (Hint: for any measurable set A ⊆ Y we have h∗µ(g⁻¹(A)) =
µ(h⁻¹(g⁻¹(A))) = µ((g ◦ h)⁻¹(A)) = µ((h ◦ f)⁻¹(A)) = µ(f⁻¹(h⁻¹(A))), which by the
invariance of µ equals µ(h⁻¹(A)) = h∗µ(A). For ergodicity, let A ⊂ Y satisfy g⁻¹(A) = A.
Then its preimage under the conjugacy satisfies the same property, i.e. f⁻¹(h⁻¹(A)) = h⁻¹(A).
Thus, by the ergodicity of µ we have either µ(h⁻¹(A)) = 0 or µ(h⁻¹(A)) = 1.)
Example 13. Define the Ulam-von Neumann map f : [−2, 2] → [−2, 2] by

f (x) = x2 − 2.

Consider the piecewise affine tent map T : [0, 1] → [0, 1] defined by

T(z) = 2z if 0 ≤ z < 1/2,    T(z) = 2 − 2z if 1/2 ≤ z ≤ 1.

Define h : [0, 1] → [−2, 2] by h(z) := 2 cos πz. Notice that h is a bijection and both h and h^{-1} are smooth in the interior of their domains of
definition. Moreover, if y = h(z) = 2 cos πz, then z = h^{-1}(y) = π^{-1} cos^{-1}(y/2). Therefore

h^{-1}(f(h(x))) = (1/π) cos^{-1}(f(h(x))/2) = (1/π) cos^{-1}( ((2 cos πx)^2 − 2)/2 )

= (1/π) cos^{-1}(2 cos^2 πx − 1) = (1/π) cos^{-1}(cos 2πx) = T(x).

For the last equality, notice that for x ∈ [0, 1/2] we have 2πx ∈ [0, π] and so π^{-1} cos^{-1}(cos 2πx) =
2x. On the other hand, for x ∈ [1/2, 1] we have 2πx ∈ [π, 2π] and, since cos 2πx =
cos 2π(x − 1) with 2π(x − 1) ∈ [−π, 0], we get cos^{-1}(cos 2πx) = −2π(x − 1), and therefore
π^{-1} cos^{-1}(cos 2πx) = −2(x − 1) = 2 − 2x = T(x).
Thus, any ergodic invariant measure for T can be “pulled back” to an ergodic invariant
measure for f using the conjugacy h. Using the explicit form of h−1 and differentiating,
we have
(h^{-1})'(x) = −(1/π) · (1/2) / √(1 − x^2/4) = −1/(π √(4 − x^2))

and therefore, for an interval A = (a, b) ⊂ [−2, 2] we have, using the fundamental theorem of calculus,

h_* m(A) = m(h^{-1}(A)) = ∫_a^b |(h^{-1})'(x)| dx = (1/π) ∫_a^b dx/√(4 − x^2).
Thus µ = h∗ m is invariant and ergodic for f .
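This pushforward construction is easy to test numerically. The sketch below is a minimal illustration, not part of the text: the helper names f, T, h are ours, the test interval (−1, 1) and sample sizes are arbitrary choices. It checks the conjugacy relation h ◦ T = f ◦ h at random points, and then verifies that pushing Lebesgue measure on [0, 1] forward through h gives intervals the mass predicted by the integral above.

```python
import math
import random

def f(x):
    """Ulam-von Neumann map on [-2, 2]."""
    return x * x - 2

def T(z):
    """Tent map on [0, 1]."""
    return 2 * z if z < 0.5 else 2 - 2 * z

def h(z):
    """The conjugacy h(z) = 2 cos(pi z)."""
    return 2 * math.cos(math.pi * z)

random.seed(0)

# Conjugacy check: h(T(z)) = f(h(z)) up to floating-point rounding.
for _ in range(1000):
    z = random.random()
    assert abs(h(T(z)) - f(h(z))) < 1e-9

# Pushforward check: the h-image of uniform samples on [0, 1] should give
# the interval (a, b) the mass (1/pi)(arcsin(b/2) - arcsin(a/2)).
a, b = -1.0, 1.0
n = 200_000
hits = sum(a < h(random.random()) < b for _ in range(n))
exact = (math.asin(b / 2) - math.asin(a / 2)) / math.pi
print(abs(hits / n - exact) < 0.01)  # prints True
```

For (a, b) = (−1, 1) the exact mass is 1/3, matching the fact that h maps (1/3, 2/3) onto (−1, 1).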
We now apply the method of defining measures via conjugacy to piecewise affine maps.
For each p ∈ (0, 1) let I (p) = [0, 1) and define the map fp : I (p) → I (p) by
f_p(x) = x/p for 0 ≤ x < p,    f_p(x) = (x − p)/(1 − p) for p ≤ x < 1.

Lemma 5.3. For any p ∈ (0, 1) the maps f and fp are topologically conjugate.
Proof. This is a standard proof in topological dynamics and we just give a sketch of the
argument here because the actual way in which the conjugacy h is constructed plays a
crucial role in what follows. We use the symbolic dynamics of the maps f and fp . Let
I_0^(p) = [0, p) and I_1^(p) = [p, 1).

Then, for each x we define the symbol sequence (x_0^(p) x_1^(p) x_2^(p) . . .) ∈ Σ_2^+ by letting

x_i^(p) = 0 if f_p^i(x) ∈ I_0^(p),    x_i^(p) = 1 if f_p^i(x) ∈ I_1^(p).

This sequence is well defined for all points which are not preimages of the point p. Moreover
it is unique since every interval [x, y] is expanded at least by a factor min{1/p, 1/(1 − p)} > 1 at each iteration

and therefore f n ([x, y]) grows exponentially fast so that eventually the images of f n (x) and
f n (y) must lie on opposite sides of p and therefore give rise to different sequences. The
map f : I → I is of course just a special case of fp : I (p) → I (p) with p = 1/2. We can
therefore define a bijection
hp : I (p) → I
which maps points with the same associated symbolic sequence to each other and points
which are preimages of p to corresponding preimages of 1/2.
Exercise 29. Show that hp is a conjugacy between f and fp .
Exercise 30. Show that hp is a homeomorphism. Hint: if x does not lie in the pre-image
of the discontinuity (1/2 or p depending on which map we consider) then sufficiently close
points y will have a symbolic sequence which coincides with that of x for a large number of
terms, where the number of terms can be made arbitrarily large by choosing y sufficiently
close to x. The corresponding points therefore also have symbolic sequences which coincide
for a large number of terms and this implies that they must be close to each other.
From the previous two exercises it follows that h_p is a topological conjugacy.
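The construction of h_p from symbolic sequences can be made concrete numerically: reading off the first N symbols of x under f_p and interpreting them as binary digits approximates h_p(x), since for p = 1/2 the coding is exactly the binary expansion. The sketch below is ours (helper names f_p, h_p and the truncation N = 40 are arbitrary choices); it checks the conjugacy relation h_p ◦ f_p = f ◦ h_p up to truncation error.

```python
def f_p(x, p):
    """Piecewise affine full branch map on [0, 1)."""
    return x / p if x < p else (x - p) / (1 - p)

def h_p(x, p, nbits=40):
    """Approximate the conjugacy h_p: the i-th symbol of x under f_p
    becomes the i-th binary digit of h_p(x)."""
    h = 0.0
    for i in range(nbits):
        if x >= p:              # symbol 1 -> binary digit 1
            h += 2.0 ** -(i + 1)
        x = f_p(x, p)
    return h

p, x = 0.3, 0.123456
lhs = h_p(f_p(x, p), p)
rhs = (2.0 * h_p(x, p)) % 1.0   # doubling map f = f_{1/2} applied to h_p(x)
print(abs(lhs - rhs) < 1e-9)    # prints True: agreement up to ~2^-40
```

Both sides are computed from the same numerical orbit of x, so the only discrepancy is the truncation of the symbol sequence.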
Since hp : I (p) → I is a topological conjugacy, it is also in particular measurable
conjugacy and so, letting m denote Lebesgue measure, we define the measure
µ_p := (h_p)_* m.
By Proposition 5.1 Lebesgue measure is ergodic and invariant for fp and so it follows from
Lemma 5.2 that µp is ergodic and invariant for f .
Exercise 31. Show that µp is non-atomic.
Thus it just remains to show that the µp are mutually singular.
Lemma 5.4. The measures in the family {µp }p∈(0,1) are all mutually singular.
Proof. The proof is a straightforward, if somewhat subtle, application of Birkhoff’s Ergodic
Theorem. Let
A_p = {x ∈ I whose symbolic coding asymptotically contains a proportion p of 0's}

and

A_p^(p) = {x ∈ I^(p) whose symbolic coding asymptotically contains a proportion p of 0's}.
Notice that by the way the coding has been defined, the asymptotic proportion of 0's in
the symbolic coding of a point x is exactly the asymptotic relative frequency of visits of
the orbit of the point x to the interval I_0 or I_0^(p) under the maps f and f_p respectively.
Since Lebesgue measure is invariant and ergodic for f_p, Birkhoff's Ergodic Theorem implies that the relative
frequency of visits of Lebesgue almost every point to I_0^(p) is asymptotically equal to the
Lebesgue measure of I_0^(p), which is exactly p. Thus we have that

m(A_p^(p)) = 1.

Moreover, since the conjugacy preserves the symbolic coding we have

A_p = h_p(A_p^(p)).

Thus, by the definition of the pushforward measure,

µ_p(A_p) = m(h_p^{-1}(A_p)) = m(h_p^{-1}(h_p(A_p^(p)))) = m(A_p^(p)) = 1.

Since the sets A_p are clearly pairwise disjoint for distinct values of p, it follows that the
measures µ_p are mutually singular.
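The key step of the proof, m(A_p^(p)) = 1, is easy to observe in a simulation: for a Lebesgue-typical starting point, the orbit under f_p visits [0, p) with asymptotic frequency p. A minimal sketch (p = 0.3 and the iteration count are arbitrary choices; the reseeding guard is only a defence against rare floating-point degeneracies):

```python
import random

def f_p(x, p):
    """Piecewise affine full branch map on [0, 1)."""
    return x / p if x < p else (x - p) / (1 - p)

random.seed(4)
p, n = 0.3, 200_000
x, hits = random.random(), 0
for _ in range(n):
    hits += x < p               # visit to I_0^(p) = [0, p): symbol 0
    x = f_p(x, p)
    if x <= 0.0 or x >= 1.0:    # guard: restart if rounding leaves (0, 1)
        x = random.random()
print(abs(hits / n - p) < 0.02)
```

Running the same experiment with several values of p shows why the µ_p are mutually singular: each value of p selects a different full-measure set of coding frequencies.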

Remark 10. This example shows that the conjugacies in question, even though they are
homeomorphisms, are singular with respect to Lebesgue measure, i.e. they map sets of
full measure to sets of zero measure.

6 Distortion
6.1 The Gauss map
Let I = [0, 1] and define the Gauss map f : I → I by f (0) = 0 and
f(x) = 1/x mod 1

if x ≠ 0. Notice that for every n ∈ N the map

f : (1/(n + 1), 1/n) → (0, 1)

is a diffeomorphism. In particular the Gauss map is a full branch map, though it is not
piecewise affine. Define the Gauss measure µ_G by setting, for every measurable set A,

µ_G(A) = (1/log 2) ∫_A 1/(1 + x) dx.
Theorem 7. Let f : I → I be the Gauss map. Then µG is invariant and ergodic.

Lemma 6.1. µG is invariant.

Proof. It is sufficient to prove invariance on intervals A = (a, b). In this case we have

µ_G(A) = (1/log 2) ∫_a^b 1/(1 + x) dx = (1/log 2) log( (1 + b)/(1 + a) ).

Each interval A = (a, b) has countably many preimages, one inside each interval of
the form (1/(n + 1), 1/n), and this preimage is given explicitly as the interval (1/(n + b), 1/(n + a)).
Therefore

µ_G(f^{-1}(a, b)) = µ_G( ⋃_{n=1}^∞ (1/(n + b), 1/(n + a)) ) = (1/log 2) Σ_{n=1}^∞ log [ (1 + 1/(n + a)) / (1 + 1/(n + b)) ]

= (1/log 2) log Π_{n=1}^∞ [ ((n + a + 1)/(n + a)) · ((n + b)/(n + b + 1)) ]

= (1/log 2) log [ ((1 + a + 1)/(1 + a)) · ((1 + b)/(1 + b + 1)) · ((2 + a + 1)/(2 + a)) · ((2 + b)/(2 + b + 1)) · · · ]

= (1/log 2) log( (1 + b)/(1 + a) ) = µ_G(a, b).

We now want to relax the assumption that f is piecewise affine.

Definition 19. A full branch map has bounded distortion if

sup_{n≥1} sup_{ω^(n) ∈ P^(n)} sup_{x,y ∈ ω^(n)} log |Df^n(x)/Df^n(y)| < ∞.    (11)

Notice that the distortion is 0 if f is piecewise affine so that the bounded distortion
property is automatically satisfied in that case.

Theorem 8. Let f : I → I be a full branch map with bounded distortion. Then Lebesgue
measure is ergodic.

Lemma 6.2. Let f : I → I be a measurable map and let µ_1, µ_2 be two probability measures
with µ_1 ≪ µ_2. Suppose µ_2 is ergodic for f. Then µ_1 is also ergodic for f.

Proof. Suppose A ⊆ I satisfies f^{-1}(A) = A and µ_1(A) > 0. Then by the absolute continuity this implies
µ2 (A) > 0; by ergodicity of µ2 this implies µ2 (A) = 1 and therefore µ2 (I \ A) = 0; and so
by absolute continuity, also µ1 (I \ A) = 0 and so µ1 (A) = 1. Thus µ1 is ergodic.

6.2 Bounded distortion implies ergodicity


We now prove Theorem 8. For any subinterval J and any n ≥ 1 we define the distortion
of f^n on J as

D(f^n, J) := sup_{x,y ∈ J} log |Df^n(x)/Df^n(y)|.

The bounded distortion condition says that D(f^n, ω^(n)) is uniformly bounded. The distortion
has an immediate geometrical interpretation in terms of the way that ratios of lengths
of intervals are (or are not) preserved under f^n.

Lemma 6.3. Let D = D(f^n, J) be the distortion of f^n on some interval J. Then, for any
subinterval J' ⊂ J we have

e^{-D} |J'|/|J| ≤ |f^n(J')|/|f^n(J)| ≤ e^D |J'|/|J|.

Proof. By the Mean Value Theorem there exist x ∈ J' and y ∈ J such that |Df^n(x)| =
|f^n(J')|/|J'| and |Df^n(y)| = |f^n(J)|/|J|. Therefore

( |f^n(J')|/|f^n(J)| ) / ( |J'|/|J| ) = ( |f^n(J')|/|J'| ) / ( |f^n(J)|/|J| ) = |Df^n(x)|/|Df^n(y)|.    (12)

From the definition of distortion we have e^{-D} ≤ |Df^n(x)|/|Df^n(y)| ≤ e^D and so substituting this into (12) gives

e^{-D} ≤ ( |f^n(J')|/|f^n(J)| ) · ( |J|/|J'| ) ≤ e^D

and rearranging gives the result.

Lemma 6.4. Let f : I → I be a full branch map with the bounded distortion property.
Then max{|ω^(n)| : ω^(n) ∈ P^(n)} → 0 as n → ∞.

Proof. First of all let δ := sup_{ω ∈ P} |ω| < |I|. From the combinatorial structure of
full branch maps described in Lemma 5.1 and its proof, we have that for each n ≥ 1,
f^n(ω^(n)) = I and f^{n-1}(ω^(n)) ∈ P, and therefore |f^{n-1}(ω^(n))| ≤ δ and |f^{n-1}(ω^(n-1) \ ω^(n))| ≥ |I| − δ > 0. Thus, using Lemma 6.3, we have

|ω^(n-1) \ ω^(n)| / |ω^(n-1)| ≥ e^{-D} |f^{n-1}(ω^(n-1) \ ω^(n))| / |f^{n-1}(ω^(n-1))| ≥ e^{-D} (|I| − δ)/|I| =: 1 − τ.

Then

1 − |ω^(n)|/|ω^(n-1)| = ( |ω^(n-1)| − |ω^(n)| ) / |ω^(n-1)| = |ω^(n-1) \ ω^(n)| / |ω^(n-1)| ≥ 1 − τ.

Thus for every n ≥ 1 and every ω^(n) ⊂ ω^(n-1) we have |ω^(n)|/|ω^(n-1)| ≤ τ. Applying
this inequality recursively then implies |ω^(n)| ≤ τ |ω^(n-1)| ≤ τ^2 |ω^(n-2)| ≤ · · · ≤ τ^n |ω^(0)| ≤ τ^n |I|.
Proof of Theorem 8. The proof is almost identical to the piecewise affine case. The only
difference is when we get to equation (10), where we now use the bounded distortion to get

|I \ A|/|I| = |f^n(ω_n \ A)|/|f^n(ω_n)| ≤ e^D |ω_n \ A|/|ω_n| ≤ e^D ε.    (13)

Since ε is arbitrary this implies m(A^c) = 0 and thus m(A) = 1.

6.3 Sufficient conditions for bounded distortion
In other cases, the bounded distortion property is not immediately checkable, but we give
here some sufficient conditions.
Definition 20. A full branch map f is uniformly expanding if there exist constants C, λ > 0
such that for all x ∈ I and all n ≥ 1 such that x, f(x), . . . , f^{n-1}(x) ∉ ∂P we have

|(f^n)'(x)| ≥ C e^{λn}.
Theorem 9. Let f be a full branch map. Suppose that f is uniformly expanding and that
there exists a constant K > 0 such that

sup_{ω ∈ P} sup_{x,y ∈ ω} |f''(x)|/|f'(y)|^2 ≤ K.    (14)

Then there exists K̃ > 0 such that for every n ≥ 1, ω^(n) ∈ P^(n) and x, y ∈ ω^(n) we have

log ( |Df^n(x)|/|Df^n(y)| ) ≤ K̃ |f^n(x) − f^n(y)| ≤ K̃.    (15)
In particular f satisfies the bounded distortion property.
Lemma 6.5. The Gauss map is uniformly expanding and satisfies (14).
Proof. We leave the verification that the Gauss map is uniformly expanding as an exercise.
Since f(x) = 1/x mod 1 we have f'(x) = −x^{-2} and f''(x) = 2x^{-3}. Notice that both first
and second derivatives are monotone in absolute value, i.e. take on larger values close to 0.
Thus, for a generic interval ω = (1/(n + 1), 1/n) of the partition P we have |f''(x)| ≤
|f''(1/(n + 1))| = 2(n + 1)^3 and |f'(y)| ≥ |f'(1/n)| = n^2. Therefore, for any x, y ∈ ω we
have |f''(x)|/|f'(y)|^2 ≤ 2(n + 1)^3/n^4 ≤ 2((n + 1)/n)^3 (1/n). This upper bound is monotone
decreasing in n, and thus the worst case is n = 1, which gives |f''(x)|/|f'(y)|^2 ≤ 16 as
required.
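The bound just computed can be probed by brute force: sampling x, y inside many partition elements of the Gauss map never produces a ratio above 16. This is only a rough numerical sanity check of the lemma; the sampling parameters are arbitrary choices.

```python
import random

# Gauss map on a branch (1/(n+1), 1/n): f(x) = 1/x - n, so the derivatives
# f'(x) = -1/x^2 and f''(x) = 2/x^3 do not depend on the branch.
def df(x):
    return -1.0 / x ** 2

def d2f(x):
    return 2.0 / x ** 3

random.seed(1)
worst = 0.0
for n in range(1, 200):                  # sample many partition elements
    lo, hi = 1.0 / (n + 1), 1.0 / n
    for _ in range(200):                 # random x, y in the same element
        x, y = random.uniform(lo, hi), random.uniform(lo, hi)
        worst = max(worst, abs(d2f(x)) / df(y) ** 2)
print(worst <= 16.0)  # prints True, matching the bound in Lemma 6.5
```

The supremum 16 is approached only at the endpoints of the first branch (x → 1/2, y → 1), so random interior samples always stay strictly below it.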
The proof of Theorem 9 consists of three simple steps, which we formulate in the following three
lemmas.
Lemma 6.6. Let f be a full branch map satisfying (14). Then, for all ω ∈ P and x, y ∈ ω we
have

| f'(x)/f'(y) − 1 | ≤ K |f(x) − f(y)|.    (16)

Proof. By the Mean Value Theorem we have |f(x) − f(y)| = |f'(ξ_1)||x − y| and |f'(x) −
f'(y)| = |f''(ξ_2)||x − y| for some ξ_1, ξ_2 ∈ [x, y] ⊂ ω. Therefore

|f'(x) − f'(y)| = ( |f''(ξ_2)|/|f'(ξ_1)| ) |f(x) − f(y)|.    (17)

Assumption (14) implies that |f''(ξ_2)|/|f'(ξ_1)| ≤ K |f'(ξ)| for all ξ ∈ ω. Choosing ξ = y
and substituting this into (17) gives |f'(x) − f'(y)| ≤ K |f'(y)| |f(x) − f(y)|, and
dividing through by |f'(y)| gives the result.

Lemma 6.7. Let f be a full branch map satisfying (16). Then, for any n ≥ 1, ω^(n) ∈ P^(n)
and x, y ∈ ω^(n) we have

log ( |Df^n(x)| / |Df^n(y)| ) ≤ K Σ_{i=1}^{n} |f^i(x) − f^i(y)|.    (18)

Proof. By the chain rule (f^n)'(x) = f'(x) · f'(f(x)) · · · f'(f^{n-1}(x)) and so

log ( |Df^n(x)| / |Df^n(y)| ) = log Π_{i=0}^{n-1} |f'(f^i(x))| / |f'(f^i(y))| = Σ_{i=0}^{n-1} log ( f'(f^i(x)) / f'(f^i(y)) )

= Σ_{i=0}^{n-1} log ( 1 + ( f'(f^i(x)) − f'(f^i(y)) ) / f'(f^i(y)) )

≤ Σ_{i=0}^{n-1} | f'(f^i(x)) − f'(f^i(y)) | / | f'(f^i(y)) |    (using log(1 + t) ≤ t)

= Σ_{i=0}^{n-1} | f'(f^i(x))/f'(f^i(y)) − 1 | ≤ Σ_{i=0}^{n-1} K |f(f^i(x)) − f(f^i(y))| = K Σ_{i=1}^{n} |f^i(x) − f^i(y)|,

where the last inequality uses (16).

Lemma 6.8. Let f be a uniformly expanding full branch map. Then there exists a constant
K̃ depending only on C, λ, such that for all n ≥ 1, ω^(n) ∈ P^(n) and x, y ∈ ω^(n) we have

Σ_{i=1}^{n} |f^i(x) − f^i(y)| ≤ K̃ |f^n(x) − f^n(y)|.

Proof. For simplicity, let ω̃ := (x, y) ⊂ ω^(n). By definition the map f^n|_ω̃ : ω̃ → f^n(ω̃) is
a diffeomorphism onto its image. In particular this is also true for each map f^{n-i}|_{f^i(ω̃)} :
f^i(ω̃) → f^n(ω̃). By the Mean Value Theorem we have that

|f^n(x) − f^n(y)| = |f^n(ω̃)| = |f^{n-i}(f^i(ω̃))| = |(f^{n-i})'(ξ_i)| |f^i(ω̃)| ≥ C e^{λ(n-i)} |f^i(ω̃)|

for some ξ_i ∈ f^i(ω̃). Therefore

Σ_{i=1}^{n} |f^i(x) − f^i(y)| = Σ_{i=1}^{n} |f^i(ω̃)| ≤ Σ_{i=1}^{n} (1/C) e^{-λ(n-i)} |f^n(ω̃)| ≤ (1/C) Σ_{i=0}^{∞} e^{-λi} |f^n(x) − f^n(y)|.

7 Physical measures
We have proved above that Lebesgue measure is ergodic for every full branch map with
bounded distortion. In particular this implies ergodicity of any absolutely continuous
probability measure, and thus of the Gauss measure, which is invariant for the Gauss map.
In general most maps do not have explicit formulae for the invariant measure, but we show
here that any full branch map with bounded distortion does have an ergodic invariant
probability measure which is absolutely continuous with respect to Lebesgue and is thus a
physical measure.
Theorem 10. Let f : I → I be a full branch map satisfying (15). Then f admits a unique
ergodic absolutely continuous invariant probability measure µ. Moreover, the density dµ/dm
of µ is Lipschitz continuous and bounded above and below.

We begin in exactly the same way as for the proof of the existence of invariant measures
for general continuous maps and define the sequence

µ_n := (1/n) Σ_{i=0}^{n-1} f_*^i m

where m denotes Lebesgue measure.


Exercise 32. For each n ≥ 1 we have µ_n ≪ m. Hint: by definition f is a C^2 diffeomorphism
on (the interior of) each element of the partition P and thus in particular it is non-singular
in the sense that m(A) = 0 implies m(f^{-1}(A)) = 0 for any measurable set A.
Since µ_n ≪ m we can let

H_n := dµ_n/dm

denote the density of µ_n with respect to m.
Remark 11. The fact that µ_n ≪ m for every n does not imply that µ ≪ m. Indeed,
consider the following example. Suppose f : [0, 1] → [0, 1] is given by f(x) = x/2. We
already know that in this case the only physical measure is the Dirac measure at the
unique attracting fixed point at 0. In this simple setting we can see directly that µ_n → δ_0,
where the µ_n are the averages defined above. In fact we shall show the stronger statement
that f_*^n m → δ_0 as n → ∞. Indeed, let µ_0 = m and consider the measure µ_1 = f_* m, which
is given by definition by µ_1(A) = µ_0(f^{-1}(A)). Then it is easy to see that µ_1([0, 1/2]) =
µ_0(f^{-1}([0, 1/2])) = µ_0([0, 1]) = 1. Thus the measure µ_1 is completely concentrated on the
interval [0, 1/2]. Similarly, it is easy to see that µ_n([0, 1/2^n]) = µ_0([0, 1]) = 1 and thus the
measure µ_n is completely concentrated on the interval [0, 1/2^n]. Thus the measures µ_n are
concentrated on increasingly small neighbourhoods of the origin. This clearly implies
that they converge in the weak star topology to the Dirac measure at 0.

This counter-example shows that a sequence of absolutely continuous measures does
not necessarily converge to an absolutely continuous measure. This is essentially related
to the fact that a sequence of L^1 functions (the densities of the absolutely continuous
measures µ_n) may not converge to an L^1 function even if they are all uniformly bounded
in the L^1 norm.
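The concentration described in the remark is immediate to observe numerically. A trivial sketch (sample size and number of iterates are arbitrary choices):

```python
import random

random.seed(3)
n = 20
pts = [random.random() for _ in range(10_000)]   # sample of Lebesgue measure
for _ in range(n):
    pts = [x / 2 for x in pts]                   # push forward under f(x) = x/2
# f_*^n m gives full mass to [0, 2^-n], so every sample point lies there:
print(all(x <= 2.0 ** -n for x in pts))  # prints True
```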
The proof of the Theorem then relies on the following crucial proposition.

Proposition 7.1. There exists a constant K > 0 such that

0 < inf_{n,x} H_n(x) ≤ sup_{n,x} H_n(x) ≤ K    (19)

and for every n ≥ 1 and every x, y ∈ I we have

|H_n(x) − H_n(y)| ≤ K |H_n(x)| d(x, y) ≤ K^2 d(x, y).    (20)
Proof of Theorem 10 assuming Proposition 7.1. The Proposition says that the family {H_n}
is uniformly bounded and equicontinuous and therefore, by the Ascoli-Arzela Theorem, there exists a
subsequence H_{n_j} converging uniformly to a function H satisfying (19) and (20). We define
the measure µ by setting, for every measurable set A,

µ(A) := ∫_A H dm.

Then µ is absolutely continuous with respect to Lebesgue by definition, its density is


Lipschitz continuous and bounded above and below, and it is ergodic by the ergodicity of
Lebesgue measure and the absolute continuity. It just remains to prove that it is invariant.
Notice first of all that for any measurable set A we have

µ(A) = ∫_A H dm = ∫_A lim_{n_j→∞} H_{n_j} dm = lim_{n_j→∞} ∫_A H_{n_j} dm

= lim_{n_j→∞} µ_{n_j}(A) = lim_{n_j→∞} (1/n_j) Σ_{i=0}^{n_j−1} f_*^i m(A) = lim_{n_j→∞} (1/n_j) Σ_{i=0}^{n_j−1} m(f^{-i}(A)).

For the third equality we have used the dominated convergence theorem to allow us to pull
the limit outside the integral. From this we can then write

µ(f^{-1}(A)) = lim_{n_j→∞} (1/n_j) Σ_{i=0}^{n_j−1} m(f^{-i}(f^{-1}(A)))

= lim_{n_j→∞} (1/n_j) Σ_{i=1}^{n_j} m(f^{-i}(A))

= lim_{n_j→∞} ( (1/n_j) Σ_{i=0}^{n_j−1} m(f^{-i}(A)) + (1/n_j) m(f^{-n_j}(A)) − (1/n_j) m(A) )

= lim_{n_j→∞} (1/n_j) Σ_{i=0}^{n_j−1} m(f^{-i}(A))

= µ(A).

This shows that µ is invariant and completes the proof.

It just remains to prove Proposition 7.1. We start by finding an explicit formula for
the functions H_n.

Lemma 7.1. For every n ≥ 1 and every x ∈ I we have

H_n(x) = (1/n) Σ_{i=0}^{n-1} S_i(x), where S_i(x) := Σ_{y ∈ f^{-i}(x)} 1/|Df^i(y)|.

Proof. It is sufficient to show that S_n is the density of the measure f_*^n m with respect to m,
i.e. that f_*^n m(A) = ∫_A S_n dm. By the definition of full branch map, each point has exactly
one preimage in each element of P^(n). Since f^n : ω → I is a diffeomorphism for each ω ∈ P^(n),
by standard calculus we have

m(A) = ∫_{f^{-n}(A) ∩ ω} |Df^n| dm   and   m(f^{-n}(A) ∩ ω) = ∫_A 1/|Df^n(f^{-n}(x) ∩ ω)| dm,

where f^{-n}(x) ∩ ω denotes the unique preimage of x in ω. Therefore

f_*^n m(A) = m(f^{-n}(A)) = Σ_{ω ∈ P^(n)} m(f^{-n}(A) ∩ ω) = Σ_{ω ∈ P^(n)} ∫_A 1/|Df^n(f^{-n}(x) ∩ ω)| dm

= ∫_A Σ_{ω ∈ P^(n)} 1/|Df^n(f^{-n}(x) ∩ ω)| dm = ∫_A Σ_{y ∈ f^{-n}(x)} 1/|Df^n(y)| dm = ∫_A S_n dm.
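For the Gauss map the formula for S_n can be evaluated explicitly, since the preimages of x are the points 1/(x + n) with |Df(1/(x + n))| = (x + n)^2. One can then check numerically that the resulting transfer operator fixes the (unnormalised) Gauss density 1/(1 + x), in line with Theorem 7. This is a sketch of ours, not part of the text; the helper names and truncation level are arbitrary choices.

```python
def transfer(h, x, branches=10_000):
    """One step of the sum in Lemma 7.1 for the Gauss map:
    (Lh)(x) = sum over preimages y = 1/(x+n) of h(y)/|Df(y)|,
    where |Df(y)| = (x + n)^2."""
    return sum(h(1.0 / (x + n)) / (x + n) ** 2 for n in range(1, branches + 1))

gauss_density = lambda x: 1.0 / (1.0 + x)   # Gauss density, up to normalisation

# L fixes 1/(1+x) exactly (the sum telescopes); the small error is truncation.
for x in (0.0, 0.3, 0.9):
    assert abs(transfer(gauss_density, x) - gauss_density(x)) < 1e-3

# Applying L once to the constant density 1 gives S_1(0) = sum 1/n^2 = pi^2/6:
s1 = transfer(lambda y: 1.0, 0.0)
print(round(s1, 2))  # prints 1.64
```

Iterating L on the constant density moves it steadily towards a multiple of the Gauss density, which is the content of Theorem 10 in this example.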

Lemma 7.2. There exists a constant K > 0 such that

0 < inf_{n,x} S_n(x) ≤ sup_{n,x} S_n(x) ≤ K

and for every n ≥ 1 and every x, y ∈ I we have

|S_n(x) − S_n(y)| ≤ K |S_n(x)| d(x, y) ≤ K^2 d(x, y).

Proof. The proof uses in a fundamental way the bounded distortion property (15). Recall
that for each ω ∈ P^(n) the map f^n : ω → I is a diffeomorphism with uniformly bounded
distortion. This means that |Df^n(x)/Df^n(y)| ≤ D for any x, y ∈ ω and for any ω ∈ P^(n)
(uniformly in n). Informally this says that the derivative Df^n is essentially the same
at all points of each ω ∈ P^(n) (although it can in principle be wildly different between
different ω's). By the Mean Value Theorem, for each ω ∈ P^(n) there exists ξ ∈ ω such
that |I| = |Df^n(ξ)||ω| and therefore |Df^n(ξ)| = 1/|ω| (assuming the length of the entire
interval I is normalized to 1). But since the derivative at every point of ω is comparable
to that at ξ, we have in particular |Df^n(y)| ≈ 1/|ω| for every y ∈ ω, and therefore

S_n(x) = Σ_{y ∈ f^{-n}(x)} 1/|Df^n(y)| ≈ Σ_{ω ∈ P^(n)} |ω| = 1.

To prove the uniform Lipschitz continuity recall that the bounded distortion property (15)
gives

Df^n(x)/Df^n(y) ≤ e^{K̃ d(f^n(x), f^n(y))} ≤ 1 + K̂ d(f^n(x), f^n(y))

for a suitable constant K̂ (using that e^{K̃ t} ≤ 1 + K̃ e^{K̃} t for 0 ≤ t ≤ 1). Inverting the roles of x and y we also have

Df^n(y)/Df^n(x) ≥ 1/(1 + K̂ d(f^n(x), f^n(y))) ≥ 1 − K̂ d(f^n(x), f^n(y)).

Combining these two bounds we get

| Df^n(x)/Df^n(y) − 1 | ≤ K̂ d(f^n(x), f^n(y)).

For x, y ∈ I, enumerate the preimages x̃_i ∈ f^{-n}(x) and ỹ_i ∈ f^{-n}(y) so that x̃_i and ỹ_i
lie in the same element of P^(n), i.e. f^n(x̃_i) = x and f^n(ỹ_i) = y. Then

|S_n(x) − S_n(y)| = | Σ_i 1/|Df^n(x̃_i)| − Σ_i 1/|Df^n(ỹ_i)| | ≤ Σ_i (1/|Df^n(x̃_i)|) | 1 − Df^n(x̃_i)/Df^n(ỹ_i) |

≤ K̂ Σ_i (1/|Df^n(x̃_i)|) d(f^n(x̃_i), f^n(ỹ_i)) = K̂ Σ_i (1/|Df^n(x̃_i)|) d(x, y) = K̂ S_n(x) d(x, y).

Proof of Proposition 7.1. Lemma 7.2 clearly implies the Proposition since

|H_n(x) − H_n(y)| = | (1/n) Σ_i S_i(x) − (1/n) Σ_i S_i(y) | ≤ (1/n) Σ_i |S_i(x) − S_i(y)|

≤ (1/n) Σ_i K S_i(x) d(x, y) = K H_n(x) d(x, y) ≤ K^2 d(x, y).

8 Inducing
Most maps, even in one dimension, are of course not full branch and/or do not satisfy the
bounded distortion property. Thus the results we have proved above apply in principle
only to a small class of systems. It turns out however that such full branch maps are often
embedded in much more general systems. In this section we explain this idea and give an example
of its application.

Definition 21. Let f : X → X be a map, ∆ ⊆ X and τ : ∆ → N a function such that
f^{τ(x)}(x) ∈ ∆ for all x ∈ ∆. Then the map F : ∆ → ∆ defined by

F(x) := f^{τ(x)}(x)

is called the induced map of f on ∆ for the return time function τ.
If ∆ = X then any function τ can be used to define an induced map, on the other hand,
if ∆ is a proper subset of X then the requirement f τ (x) (x) ∈ ∆ is a non-trivial restriction.
For convenience, we introduce the notation
∆n := {x ∈ ∆ : τ (x) = n}.

8.1 Spreading the measure


The map F : ∆ → ∆ can be considered as a dynamical system in its own right and
therefore has its own dynamical properties which might be, at least a priori, completely
different from those of the original map f . However it turns out that there is a close
relation between certain dynamical properties of F , in particular invariant measures, and
dynamical properties of the original map f. More specifically, if f : X → X is a map,
F := f^τ : ∆ → ∆ is the induced map on some subset ∆ ⊆ X corresponding to the return
time function τ : ∆ → N, and µ̂ is a probability measure on ∆, we can define a measure

ν := Σ_{n=1}^∞ Σ_{i=0}^{n-1} f_*^i (µ̂|∆_n).

By the measurability of τ each ∆_n is a measurable set. Observe first of all that for a
measurable set B ⊆ X we have f_*^i(µ̂|∆_n)(B) = µ̂|∆_n(f^{-i}(B)) = µ̂(f^{-i}(B) ∩ ∆_n) and
therefore

ν(B) = Σ_{n=1}^∞ Σ_{i=0}^{n-1} µ̂(f^{-i}(B) ∩ ∆_n).

This shows that ν is a well defined measure and also shows the way the measure is con-
structed by "spreading" the measure µ̂ around using the dynamics. Notice that ν is not a
probability measure in general. Indeed, we have

ν(X) = Σ_{n=1}^∞ Σ_{i=0}^{n-1} µ̂(f^{-i}(X) ∩ ∆_n) = Σ_{n=1}^∞ n µ̂(∆_n) = ∫ τ dµ̂.
If τ̂ := ∫ τ dµ̂ < ∞, i.e. if the inducing time is integrable with respect to µ̂, then the total
measure is finite and we can normalize it by defining

µ := (1/τ̂) Σ_{n=1}^∞ Σ_{i=0}^{n-1} f_*^i (µ̂|∆_n)    (21)

which is clearly a probability measure. The natural question then concerns the relation
between the measure µ and the dynamics associated to the map f .

Proposition 8.1. Let X be a measure space, f : X → X a measurable map, F = f^τ : ∆ → ∆
an induced map, and µ̂ a probability measure on ∆ with ∫ τ dµ̂ < ∞. Then the following holds.

1. If µ̂ is invariant for F then µ is invariant for f .

2. If µ̂ is ergodic for F then µ is ergodic for f .

Suppose additionally that there exists a reference measure m on X and that f is non-
singular4 with respect to m.

3. If µ̂ ≪ m then µ ≪ m.

As an immediate consequence we have the following

Theorem 11. Suppose M is a Riemannian manifold with Lebesgue measure m, that f :
M → M is non-singular with respect to Lebesgue measure, and that there exist a subset ∆ ⊆ M
and an induced map F : ∆ → ∆ which admits an invariant, ergodic, absolutely continuous
probability measure µ̂ for which the return time τ is integrable. Then f admits an invariant
ergodic absolutely continuous probability measure.

Remark 12. Notice that Proposition 8.1 and Theorem 11 are quite general and in particular
apply to maps in arbitrary dimension.
Proof of Proposition 8.1. To prove (1), suppose that µ̂ is F-invariant, so that µ̂(B) =
µ̂(F^{-1}(B)) for any measurable set B. We will show first that

Σ_{n=1}^∞ µ̂(B ∩ ∆_n) = Σ_{n=1}^∞ µ̂(f^{-n}(B) ∩ ∆_n).    (22)

Since the sets ∆_n are disjoint and their union is ∆, the sum on the left hand side is
exactly µ̂(B). So we just need to show that the sum on the right hand side is µ̂(F^{-1}(B)).
By the definition of F we have

F^{-1}(B) = {x ∈ ∆ : F(x) ∈ B} = ⋃_{n=1}^∞ {x ∈ ∆_n : f^n(x) ∈ B} = ⋃_{n=1}^∞ (f^{-n}(B) ∩ ∆_n).

Since the ∆_n are disjoint, for any measure µ̂ we have

µ̂(F^{-1}(B)) = µ̂( ⋃_{n=1}^∞ (f^{-n}(B) ∩ ∆_n) ) = Σ_{n=1}^∞ µ̂(f^{-n}(B) ∩ ∆_n).

4. We recall that a map f : X → X is non-singular with respect to a measure m if m(A) = 0 implies
m(f^{-1}(A)) = 0; equivalently, m(f^{-1}(A)) > 0 implies m(A) > 0.

This proves (22) and therefore implies

µ(f^{-1}(B)) = Σ_{n=1}^∞ Σ_{i=0}^{n-1} µ̂(f^{-i}(f^{-1}(B)) ∩ ∆_n)

= Σ_{n=1}^∞ [ µ̂(f^{-1}(B) ∩ ∆_n) + µ̂(f^{-2}(B) ∩ ∆_n) + · · · + µ̂(f^{-n}(B) ∩ ∆_n) ]

= Σ_{n=1}^∞ Σ_{i=1}^{n-1} µ̂(f^{-i}(B) ∩ ∆_n) + Σ_{n=1}^∞ µ̂(f^{-n}(B) ∩ ∆_n)

= Σ_{n=1}^∞ Σ_{i=1}^{n-1} µ̂(f^{-i}(B) ∩ ∆_n) + Σ_{n=1}^∞ µ̂(B ∩ ∆_n)    (by (22))

= Σ_{n=1}^∞ Σ_{i=0}^{n-1} µ̂(f^{-i}(B) ∩ ∆_n)

= µ(B),

where the normalizing factor 1/τ̂, which appears on both sides, has been omitted.
This shows that µ is invariant and thus completes the proof of (1). To prove (2), assume
that µ̂ is ergodic. Let B ⊆ X satisfy f^{-1}(B) = B and µ(B) > 0. We will show that
µ(B) = 1, thus implying that µ is ergodic. Let B̂ = B ∩ ∆. We first show that

F^{-1}(B̂) = B̂ and µ̂(B̂) = 1.    (23)

Indeed, f^{-1}(B) = B implies f^{-n}(B̂) = f^{-n}(B) ∩ f^{-n}(∆) = B ∩ f^{-n}(∆) and therefore

F^{-1}(B̂) = ⋃_{n=1}^∞ (f^{-n}(B̂) ∩ ∆_n) = ⋃_{n=1}^∞ (B ∩ f^{-n}(∆) ∩ ∆_n) = ⋃_{n=1}^∞ (B ∩ ∆_n) = B ∩ ∆ = B̂,

where the third equality follows from the fact that ∆_n := {x : τ(x) = n} ⊆ {x : f^n(x) ∈
∆} = f^{-n}(∆). Now, from the definition of µ we have that f^{-1}(B) = B and µ(B) > 0 imply
µ̂(B ∩ ∆_n) = µ̂(f^{-i}(B) ∩ ∆_n) > 0 for some n > i ≥ 0, and therefore µ̂(B̂) = µ̂(B ∩ ∆) > 0.
Thus, by the ergodicity of µ̂, we have that µ̂(B ∩ ∆) = µ̂(B̂) = 1, and this proves (23).
In particular, letting B^c := X \ B denote the complement of B, we have that
µ̂(B^c ∩ ∆) = 0 and therefore

µ(B^c) = Σ_{n=1}^∞ Σ_{i=0}^{n-1} µ̂(f^{-i}(B^c) ∩ ∆_n) = Σ_{n=1}^∞ Σ_{i=0}^{n-1} µ̂(B^c ∩ ∆_n) = 0.

This implies that µ(B) = 1 and thus completes the proof of (2). Finally, (3) follows directly
from the definition of µ.
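A concrete instance of this setup, and of the identity ν(X) = ∫ τ dµ̂, is the first return map of the doubling map f(x) = 2x mod 1 to ∆ = [1/2, 1): Lebesgue measure is f-invariant, so the normalised restriction µ̂ of Lebesgue measure to ∆ is invariant for the return map, and the mean return time should be 1/m(∆) = 2 (Kac's formula). A Monte Carlo sketch (sample size arbitrary; the x == 0 guard only protects against the measure-zero dyadic orbits that floating point can produce):

```python
import random

def doubling(x):
    """Doubling map f(x) = 2x mod 1."""
    return (2.0 * x) % 1.0

random.seed(5)
samples, total = 100_000, 0
for _ in range(samples):
    x, t = random.uniform(0.5, 1.0), 0
    while True:
        x, t = doubling(x), t + 1
        if x >= 0.5:            # returned to Delta = [1/2, 1)
            break
        if x == 0.0:            # dyadic degenerate orbit: give up on this sample
            break
    total += t
print(abs(total / samples - 2.0) < 0.05)  # prints True: mean return time ~ 2
```

Here τ is geometrically distributed with mean 2, so its integrability (and hence the hypothesis of Theorem 11) is immediate.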

8.2 Intermittency maps


We give a relatively simple but non-trivial application of the method of inducing. Let
γ ≥ 0 and consider the map fγ : [0, 1] → [0, 1] given by
f_γ(x) = x + x^{1+γ} mod 1.

For γ > 0 this can be thought of as a perturbation of the map f(x) = 2x mod 1 (which corresponds to γ = 0),
though it is a C^0 perturbation and not a C^1 perturbation. It is a full branch map, but
it fails to satisfy both the uniform expansivity and the bounded distortion conditions since

f_γ'(x) = 1 + (1 + γ) x^γ

and so in particular for the fixed point at the origin we have f_γ'(0) = 1 and thus (f_γ^n)'(0) = 1
for all n ≥ 1. Nevertheless we will still be able to prove the following:

Theorem 12. For any γ ∈ [0, 1) the map fγ admits a unique ergodic absolutely continuous
invariant probability measure.

We first construct the full branch induced map, then show that it satisfies the uniform
expansivity and distortion conditions, and finally check the integrability of the return times.
Let x_1 := 1, let x_2 denote the point in the interior of [0, 1] at the boundary between the
two domains on which f_γ is smooth, and let {x_n}_{n=3}^∞ denote the branch of preimages of
x_2 converging to the fixed point at the origin, so that x_n → 0 monotonically and
f(x_{n+1}) = x_n. For each n ≥ 1 we let ∆_n := (x_{n+1}, x_n]. Then the intervals ∆_n form a
partition of ∆ := (0, 1] and there is a natural induced map F : ∆ → ∆ given by F|∆_n = f^n,
such that F : ∆_n → ∆ is a C^1 diffeomorphism.

Lemma 8.1. F is uniformly expanding.

Proof. Exercise.
It remains to show therefore that F has bounded distortion and that the inducing times
are integrable. For both of these results we need some estimates on the size of the partition
elements ∆n . To simplify the exposition, we shall use the following notation. Given two
sequences {an } and {bn } we use the notation an ≈ bn to mean that there exists a constant
C such that C −1 bn ≤ an ≤ Cbn for all n and an . bn to mean that an ≤ Cbn for all n.
Lemma 8.2. x_n ≈ 1/n^{1/γ} and |∆_n| ≈ 1/n^{1/γ+1}.

Proof. First of all notice that since x_n = f(x_{n+1}) = x_{n+1} + x_{n+1}^{1+γ} we have

|∆_n| = |x_n − x_{n+1}| = x_{n+1}^{1+γ}.

Also, the ratio between x_n and x_{n+1} is bounded since

x_n/x_{n+1} = (x_{n+1} + x_{n+1}^{1+γ})/x_{n+1} = 1 + x_{n+1}^γ → 1

as n → ∞. So in fact, up to a uniform constant independent of n, we have

|∆_n| ≈ x_(n)^{1+γ} for any x_(n) ∈ ∆_n.    (24)

Now consider the sequence y_k = 1/k^{1/γ} and let J_k = [y_{k+1}, y_k]. Then, considering the
function g(x) = 1/x^{1/γ}, we have |g'(x)| ≈ 1/(γ x^{1/γ+1}) and a straightforward application of the
Mean Value Theorem gives

|J_k| = |y_k − y_{k+1}| = 1/k^{1/γ} − 1/(k + 1)^{1/γ} = |g(k) − g(k + 1)| ≈ 1/k^{1/γ+1} = (1/k^{1/γ})^{1+γ} = y_k^{1+γ}.

Similarly as above we have

y_k/y_{k+1} = ((k + 1)/k)^{1/γ} → 1

as k → ∞, and therefore, up to a constant independent of k, we have

|J_k| ≈ y_(k)^{1+γ} for any y_(k) ∈ J_k.    (25)

Combining (24) and (25) we see that if ∆_n ∩ J_k ≠ ∅ then |∆_n| ≈ |J_k|. This means that
there is a uniform bound on the number of intervals that can overlap each other, which
means that the sequences x_n, y_n have the same asymptotics, and so x_n ≈ y_n = 1/n^{1/γ} and in
particular |∆_n| ≈ x_n^{1+γ} = 1/n^{1/γ+1}.
Lemma 8.3. There exists a constant D > 0 such that for all n ≥ 1 and all x, y ∈ ∆_n,

log ( Df^n(x)/Df^n(y) ) ≤ D |f^n(x) − f^n(y)|.
Proof. We start with the standard inequality

log ( Df^k(x)/Df^k(y) ) ≤ Σ_{i=0}^{k-1} | log Df(f^i(x)) − log Df(f^i(y)) | ≤ Σ_{i=0}^{k-1} ( D^2f(ξ_i)/Df(ξ_i) ) |f^i(x) − f^i(y)|

for some ξ_i ∈ (f^i(x), f^i(y)), where we have used the Mean Value Theorem and the
fact that D(log Df) = D^2f/Df. Since x, y ∈ ∆_n, we have f^i(x), f^i(y) ∈ ∆_{n-i} and so, by the previous
Lemma,

|f^i(x) − f^i(y)| ≤ |∆_{n-i}| ≲ 1/(n − i)^{1/γ+1}.

Moreover, by the definition of f we have

Df(x) = 1 + (1 + γ) x^γ and D^2f(x) = γ(1 + γ) x^{γ-1},

and therefore, from the fact that ξ_i ∈ ∆_{n-i}, we have

ξ_i ≈ 1/(n − i)^{1/γ}, Df(ξ_i) ≈ 1 + 1/(n − i), D^2f(ξ_i) ≈ 1/(n − i)^{1-1/γ},

so we get

log ( Df^k(x)/Df^k(y) ) ≤ Σ_{i=0}^{k-1} ( D^2f(ξ_i)/Df(ξ_i) ) |f^i(x) − f^i(y)| ≲ Σ_i (n − i)^{1/γ-1}/(n − i)^{1/γ+1} ≤ Σ_{i=1}^∞ 1/i^2.

This gives a uniform bound for the distortion, but not yet in terms of the distance as
required in the Lemma. For this we now take advantage of the distortion bound just
obtained to get

|x − y|/|∆_n| ≈ |f^i(x) − f^i(y)|/|∆_{n-i}| ≈ |f^n(x) − f^n(y)|/|∆|,

and in particular

|f^i(x) − f^i(y)| ≈ |∆_{n-i}| |f^n(x) − f^n(y)|.

Repeating the calculation above with this new estimate we get

log ( Df^k(x)/Df^k(y) ) ≤ Σ_{i=0}^{k-1} ( D^2f(ξ_i)/Df(ξ_i) ) |∆_{n-i}| |f^n(x) − f^n(y)| ≲ |f^n(x) − f^n(y)|.

Lemmas 8.1 and 8.3 imply that the map F : ∆ → ∆ has a unique ergodic absolutely
continuous invariant probability measure µ̂. To obtain the corresponding measure µ for f it only
remains to show that ∫ τ dµ̂ < ∞. We know, however, that the density dµ̂/dm of µ̂
with respect to Lebesgue measure m is Lipschitz and in particular bounded, and therefore
it is sufficient to show that ∫ τ dm < ∞.

Lemma 8.4. For γ ∈ (0, 1), the induced map F has integrable inducing times. Moreover,
for every n ≥ 1 we have

m({x : τ(x) ≥ n}) = Σ_{j=n}^∞ m(∆_j) ≲ 1/n^{1/γ}.

Proof. From the estimates obtained above we have that |∆_n| ≈ 1/n^{1/γ+1}. Therefore

∫ τ dx ≲ Σ_n n |∆_n| ≈ Σ_n 1/n^{1/γ}.

The sum on the right converges whenever γ ∈ (0, 1), and this gives the integrability. The
estimate for the tail follows by standard methods such as the following:

Σ_{j=n}^∞ |∆_j| ≲ Σ_{j=n}^∞ 1/j^{1/γ+1} ≤ ∫_{n-1}^∞ dx/x^{1/γ+1} ≈ [ 1/x^{1/γ} ]_{x=n-1} = 1/(n − 1)^{1/γ} ≈ 1/n^{1/γ}.

8.3 Lyapunov exponents
Let f : [0, 1] → [0, 1] be a piecewise C^1 map and µ an f-invariant ergodic probability
measure. Suppose that ln |f'| ∈ L^1_µ. Then, by Birkhoff's Ergodic Theorem, for µ-a.e. x we
have

(1/n) Σ_{i=0}^{n-1} ln |f'(f^i(x))| → ∫ ln |f'| dµ =: λ_µ.

We call λ = λ_µ the Lyapunov exponent associated to the measure µ. It gives the "asymptotic growth rate" of the derivative. Indeed, notice that by the chain rule we have

(1/n) Σ_{i=0}^{n-1} ln |f'(f^i(x))| = (1/n) ln Π_{i=0}^{n-1} |f'(f^i(x))| = (1/n) ln |(f^n)'(x)|

and therefore, by the definition of limit, for any  > 0 there exists N > 0 such that for all
n ≥ N we have
1
λ −  ≤ ln |(f n )0 (x)| ≤ λ + 
n
and so, taking exponentials,
en(λ−) ≤ |(f n )0 (x)| ≤ en(λ+) .
This means that that essentially the derivative has a well-defined asymptotic growth rate
and that this growth rate is the same for µ almost every point and corresponds, as can be
expected, to the average growth rate with respect to the measure µ.
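For a concrete illustration (a numerical sketch, not part of the notes), one can estimate the Lyapunov exponent of the Ulam-von Neumann map f(x) = x² − 2 of Example 13 by computing the Birkhoff average of ln |f′| along an orbit; for Lebesgue-typical initial points this converges to ln 2. The starting point and iteration count below are arbitrary choices:

```python
import math

# Birkhoff average (1/n) sum_i ln|f'(f^i(x))| along an orbit of the
# Ulam-von Neumann map f(x) = x^2 - 2, where f'(x) = 2x.
# For Lebesgue-typical x0 this converges to the Lyapunov exponent ln 2.

def lyapunov_estimate(x0, n):
    x, acc = x0, 0.0
    for _ in range(n):
        acc += math.log(abs(2.0 * x))
        x = x * x - 2.0
    return acc / n

lam = lyapunov_estimate(0.1234, 10**5)   # converges to ln 2 for typical x0
```

In double precision the orbit is only a pseudo-orbit, but the rounding noise does not destroy the statistics and the average settles close to ln 2 ≈ 0.693.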
Example 14. If f is a C¹ contraction with unique fixed point p, then the Lyapunov exponent with respect to the unique invariant ergodic measure δp is

    λ := ∫ ln |f′(x)| dδp = ln |f′(p)| < 0.

Moreover, since ln |f′| is continuous and f^n(x) → p for every x, it is easy to verify that

    (1/n) ln |(f^n)′(x)| → λ.
Example 15. If f is a piecewise C² full branch, uniformly expanding map with bounded distortion, and µ is its unique ergodic f-invariant absolutely continuous probability measure, then

    λ := ∫ ln |f′(x)| dµ > 0.
There is a natural relation between the Lyapunov exponent of a measure and the induced map from which such a measure was obtained. Indeed, suppose F = f^τ : ∆ → ∆ is an induced map with an invariant ergodic probability measure µ̂ such that τ̂ := ∫ τ dµ̂ < ∞ and such that ln |F′| ∈ L¹(µ̂). Let λ̂ := ∫ ln |F′| dµ̂ be the Lyapunov exponent associated to the measure µ̂ and let µ be the f-invariant ergodic probability measure corresponding to µ̂. Then we have the following.

Proposition 8.2. The Lyapunov exponent associated to µ is λ = λ̂/τ̂ .

Proof. Let x ∈ ∆ and

    Rn(x) := τ(x) + τ(F(x)) + · · · + τ(F^{n−1}(x)).

Then, by Birkhoff's Ergodic Theorem applied to the integrable function τ, we have

    (1/n) Rn(x) = (1/n) Σ_{i=0}^{n−1} τ(F^i(x)) → ∫ τ dµ̂ = τ̂.

Therefore

    (1/n) ln |(F^n)′(x)| = (1/n) ln |(f^{Rn(x)})′(x)| = (Rn/n) · (1/Rn) ln |(f^{Rn(x)})′(x)|.

Since the left hand side converges to λ̂ and Rn/n → τ̂, it follows that

    (1/Rn) ln |(f^{Rn(x)})′(x)| → λ̂/τ̂.

Combining this with Theorem 11 we immediately get the following extension, which we state in the one-dimensional setting but which also admits a higher-dimensional version.

Theorem 13. Suppose f : [0, 1] → [0, 1] is non-singular with respect to Lebesgue measure, and that there exist a subset ∆ ⊆ [0, 1] and an induced map F : ∆ → ∆ which admits an invariant, ergodic, absolutely continuous probability measure µ̂ with positive Lyapunov exponent and for which the return time τ is integrable. Then f admits an invariant ergodic absolutely continuous probability measure with a positive Lyapunov exponent.

9 The quadratic family


9.1 The Ulam-Von Neumann map
In example 13 we studied the Ulam-von Neumann map f : [−2, 2] → [−2, 2] given by

f (x) = x2 − 2.

We showed there the existence of an absolutely continuous invariant measure by using a special property of the map, namely the differentiable conjugacy with the tent map. Here we sketch the construction of an induced map as an alternative method for proving the same result. Of course this method does not give an explicit formula for the invariant measure, but it is much more general and indeed can also be applied to maps of the form fa(x) = x² − a for other values of the parameter a.

The key idea is to induce on the interval [−p, p], where −p is the fixed point in (−2, 2).
To be continued....
We conclude this chapter by introducing an important family of maps fa : R → R, the so-called quadratic family, defined by

    fa(x) = x² + a.

Notice that for large negative values of the parameter a there exist an interval I and two closed disjoint subintervals I0, I1 ⊂ I on which f is expanding and such that f(I0) = f(I1) = I. Thus for these parameters the maps have invariant Cantor sets as described in the previous chapter. For parameters a < −2 but close to −2 we still have two closed disjoint subintervals but the map is no longer expanding. Nevertheless it is possible to show that in this case the weaker expansivity condition (??) holds, and thus we continue to have invariant Cantor sets. In this section we will focus on the parameter a = −2, for which we have a full branch map with two intervals whose closures are not disjoint and which also clearly cannot satisfy the expansivity condition (??), since it has a point where f′(x) = 0. Notice in particular that the symbolic coding argument cannot be applied, at least not directly, in this case, since we used in an essential way the expansivity properties of the map to show that the symbolic coding was injective.
Proposition 9.1. The maps f : [−2, 2] → [−2, 2] and g : [0, 1] → [0, 1] defined by

    f(x) = x² − 2   and   g(z) = 2z for 0 ≤ z < 1/2,   g(z) = 2 − 2z for 1/2 ≤ z ≤ 1

are topologically conjugate.

The map f(x) = x² − 2 is sometimes called the Ulam-von Neumann map.
Proof. This is one of the very exceptional situations in which we can find a conjugacy completely explicitly. Define the map h : [0, 1] → [−2, 2] by

    h(z) = 2 cos πz.

Then h is clearly an (orientation-reversing) homeomorphism and so we just need to show that it is a conjugacy, i.e. that it satisfies the conjugacy equation f ◦ h = h ◦ g. On one hand we have

    f(h(z)) = f(2 cos πz) = (2 cos πz)² − 2 = 4 cos² πz − 2 = 2(2 cos² πz − 1) = 2 cos 2πz.

On the other hand we have, for z ∈ [0, 1/2),

    h(g(z)) = h(2z) = 2 cos 2πz

and, for z ∈ [1/2, 1],

    h(g(z)) = h(2 − 2z) = 2 cos π(2 − 2z) = 2 cos(2π − 2πz) = 2 cos(−2πz) = 2 cos 2πz.

This proves the conjugacy.
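The conjugacy equation f ◦ h = h ◦ g can also be checked numerically on a grid of points (a quick sanity check, not part of the notes; the grid size is arbitrary):

```python
import math

# Numerically verify the conjugacy equation f(h(z)) = h(g(z)) for the
# Ulam-von Neumann map f(x) = x^2 - 2, the tent map g, and h(z) = 2 cos(pi z).

f = lambda x: x * x - 2.0
g = lambda z: 2.0 * z if z < 0.5 else 2.0 - 2.0 * z
h = lambda z: 2.0 * math.cos(math.pi * z)

# maximum discrepancy over a uniform grid in [0, 1]
max_err = max(abs(f(h(z)) - h(g(z))) for z in [k / 1000.0 for k in range(1001)])
```

The discrepancy `max_err` is at the level of floating-point rounding, as the algebraic identity predicts.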

The conjugacy gives

    f^n = h ◦ g^n ◦ h^{−1}

and therefore, by the chain rule, letting z = h^{−1}(x), we have

    (f^n)′(x) = h′(g^n(h^{−1}(x))) · (g^n)′(h^{−1}(x)) · (h^{−1})′(x) = [h′(g^n(z)) / h′(z)] · (g^n)′(z).

Since h′(z) = −2π sin πz and |g′(z)| ≡ 2, this gives

    |(f^n)′(x)| = 2^n |sin π g^n(z)| / |sin πz|,

which shows that the derivative along every orbit grows at the rate 2^n, with a constant that depends only on the initial and final points of the orbit and which in particular can get arbitrarily small. In particular we have

    (1/n) ln |(f^n)′(x)| = (1/n) ln ( |sin π g^n(z)| / |sin πz| ) + ln 2

and so

    lim sup_{n→∞} (1/n) ln |(f^n)′(x)| = ln 2.
Proposition 9.2. For every periodic point p ∈ (−2, 2) of period n we have |(f^n)′(p)| = 2^n.

Remark 13. Notice that by the topological conjugacy with the tent map, f has a dense set of periodic points, which means that there are periodic points arbitrarily close to the critical point. For a periodic point p of period n we have

    |(f^n)′(p)| = |f′(p) f′(f(p)) · · · f′(f^{n−1}(p))|,

thus the result says that for any periodic orbit the derivatives compensate each other exactly. In particular, any orbit for which some point lies very close to the critical point must have a very high period in order to compensate the small derivative near the critical point.

Proof. The assumption that p ∈ (−2, 2) implies that the entire orbit lies in (−2, 0) ∪ (0, 2). Indeed, if some iterate of p fell on the critical point at 0 or on one of the endpoints ±2, it would then fall onto the fixed point at 2, contradicting the assumption that p is a periodic point in (−2, 2). Since the entire orbit lies in (−2, 2), we can use the fact that the conjugacy h is C¹ with non-vanishing derivative there and that p is a fixed point for f^n, and is therefore mapped to a fixed point q of g^n, to conclude that the derivative of f^n at p is the same as the derivative of g^n at q, which in absolute value is necessarily 2^n.
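Proposition 9.2 can be checked on an explicit orbit (a numerical sketch, not part of the notes). The point z = 2/5 has period 2 for the tent map (2/5 → 4/5 → 2/5), so x = h(2/5) = 2 cos(2π/5) has period 2 for f, and the absolute derivative over the orbit should equal 2² = 4:

```python
import math

# Periodic orbit check for f(x) = x^2 - 2: z = 2/5 is period-2 for the tent
# map, so x0 = 2 cos(2 pi / 5) is period-2 for f, and the product of the
# derivatives |f'(x0) f'(x1)| along the orbit should equal 2^2 = 4.

f = lambda x: x * x - 2.0
x0 = 2.0 * math.cos(2.0 * math.pi / 5.0)
x1 = f(x0)

deriv_product = abs((2.0 * x0) * (2.0 * x1))   # |f'(x0) f'(x1)|, here f'(x) = 2x
period_check = abs(f(x1) - x0)                 # should vanish: the orbit has period 2
```

Here x0 ≈ 0.618 is close to the critical point, so its small derivative is compensated exactly by the large derivative at x1 ≈ −1.618, as the remark predicts.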

9.2 The Quadratic Family
We have seen that, at least in theory, inducing is a very powerful method for constructing invariant measures and studying their statistical properties. The question of course is whether the method is really applicable and, more generally, whether there are many maps with acip's. Let C²(I) denote the family of C² maps of the interval. We say that c ∈ I is a critical point if f′(c) = 0. In principle, critical points constitute the main obstruction to the construction and estimates we have carried out above, since they provide the biggest possible contraction. If a critical point is periodic of period k, then c is a fixed point for f^k with (f^k)′(c) = 0, and so c belongs to an attracting periodic orbit. On the other hand, we have already seen that maps with critical points can have acip's, as in the case of the map f(x) = x² − 2, which is smoothly conjugate to a piecewise affine "tent map". This map belongs to the very well studied quadratic family

fa (x) = x2 + a.

It turns out that any interesting dynamics in this family only happens for a bounded
interval
Ω = [−2, a∗ ]
of parameter values. For this parameter interval we define

Ω+ := {a ∈ Ω : fa admits an ergodic acip µ}

and
Ω− := {a ∈ Ω : fa admits an attracting periodic orbit}.
Over the last 20 years or so, there have been some quite remarkable results on the structure of these sets. First of all, if a ∈ Ω+ then µ is the unique physical measure and m(Bµ) = 1, i.e. for Lebesgue almost every x the time averages of an observable ϕ converge to ∫ ϕ dµ. On the other hand, if a ∈ Ω− then the Dirac measure δ_{O+(p)} on the attracting periodic orbit is the unique physical measure and m(B_{δ_{O+(p)}}) = 1, i.e. Lebesgue almost every x converges to the orbit of p. Thus in particular we have

    Ω+ ∩ Ω− = ∅.
Moreover, we also have the following results:


Theorem 14. 1. Lebesgue almost every a ∈ Ω belongs to either Ω+ or Ω− ;
2. Ω− is open and dense in Ω;
3. m(Ω+ ) > 0.
The last of these statements is actually the one that was proved first, by Jakobson in 1981. He used precisely the method of inducing to show that there is a positive Lebesgue measure set of parameters for which there exists a full branch induced map with exponential tails (and thus exponential decay of correlations).

10 Mixing Measures
Our emphasis so far has been on the existence of physical measures, through which we obtain a description of the statistical distribution of typical orbits in phase space. However, many different dynamical systems may share the same physical measure; for example, irrational circle rotations and the maps f(x) = κx mod 1 for integers κ ≥ 2 all admit Lebesgue measure as an invariant ergodic measure. Thus almost all orbits are uniformly distributed with respect to Lebesgue measure. Nevertheless these are all different maps and the dynamics of each one has some characteristic features. We have already seen that irrational circle rotations exhibit some extra rigidity in that every orbit is uniformly distributed. Moreover, intuitively the dynamics of circle rotations is more "regular" than that of expanding maps, which is quite "chaotic". In this section we introduce the notion of mixing, which is a formal way to make a finer distinction between different kinds of dynamical behaviour, and in particular will enable us to formally distinguish between irrational circle rotations and expanding maps. A further, in some sense even finer, distinction can be achieved through the concept of entropy, which distinguishes for example expanding maps with different numbers of branches. However, the treatment of this concept is beyond the scope of these notes.

10.1 Mixing
Let M be a measure space and f : M → M a measurable map. Suppose that µ is an
f -invariant probability measure.

Definition 22. µ is mixing if, for all measurable sets A, B ⊆ M ,

µ(A ∩ f −n (B)) − µ(A)µ(B) → 0

as n → ∞.

A good way to understand this definition is by dividing through by µ(B) and writing

    µ(A ∩ f^{−n}(B)) / µ(B) → µ(A)

or, using the fact that µ is f-invariant,

    µ(A ∩ f^{−n}(B)) / µ(f^{−n}(B)) → µ(A).

What this says, therefore, is that the proportion of f^{−n}(B) which intersects A converges simply to the measure of A, i.e. f^{−n}(B) becomes increasingly uniformly distributed over the whole space (as seen through the measure µ).
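As an illustration (a numerical sketch, not part of the notes), take the doubling map f(x) = 2x mod 1 with µ = Lebesgue measure, B = [0, 1/2) and A = [0, a) for a non-dyadic a. The preimage f^{−n}(B) is a union of 2^n short intervals, and its overlap with A converges to m(A)m(B):

```python
# m(A ∩ f^{-n}(B)) for the doubling map f(x) = 2x mod 1 with Lebesgue
# measure, A = [0, a) and B = [0, 1/2). Here f^{-n}(B) is exactly the union
# of the 2^n intervals [k/2^n, k/2^n + 1/2^{n+1}).

def overlap(n, a):
    total, step = 0.0, 1.0 / 2**n
    for k in range(2**n):
        lo, hi = k * step, k * step + step / 2.0   # one branch of f^{-n}(B)
        total += max(0.0, min(hi, a) - lo)
    return total

a = 1.0 / 3.0
vals = [overlap(n, a) for n in range(1, 11)]
# vals approaches m(A) * m(B) = (1/3) * (1/2) = 1/6 as n grows
```

The overlap converges to 1/6 at speed 2^{−n}, matching the picture of f^{−n}(B) becoming uniformly distributed.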

Lemma 10.1. Suppose µ is mixing. Then it is ergodic.

Proof. Exercise.
As we shall see, however, the converse is false and there are many systems which are ergodic but not mixing.
Example 16. Let {p1, . . . , pk} be a periodic orbit with k ≥ 2 and consider the Dirac measure µ uniformly distributed on the points of the orbit. Let A = {pj} and B = {pj′} for some j ≠ j′. Then f^{−n}(A) ∩ B = B for infinitely many n, so that µ(f^{−n}(A) ∩ B) = 1/k does not converge to µ(A)µ(B) = 1/k², and thus the measure cannot be mixing.
Remark 14. Notice that this argument does not work if k = 1, i.e. for a fixed point. Indeed, if p is a fixed point and µ = δp is the Dirac measure on p, then for any two sets A, B we have one of three possibilities. If they both contain p, then f^{−n}(A) ∩ B also contains p and therefore µ(f^{−n}(A) ∩ B) ≡ 1 and µ(A)µ(B) = 1, as required. If neither contains p, then we just have zero measure all round, which also works. If only one contains p, then we again have zero measure all round, which also formally satisfies the definition. This is a kind of "anomaly" in the sense that δp formally satisfies the definition of mixing even though it does not really satisfy the "spirit" of the definition. However, this case essentially means that we are living in a one-point space and is therefore naturally degenerate.
Example 17. For an irrational circle rotation, consider two small intervals A, B. Then f^{−n}(A) ∩ B = ∅ for infinitely many n and therefore again the rotation clearly cannot be mixing.
Example 18. Consider the map f(x) = 10x mod 1. We will not give a formal argument here, but notice that given any set B of positive Lebesgue measure, the pre-image f^{−n}(B) consists of exactly 10^n scaled down "copies" of B uniformly distributed in the unit interval. Thus it is intuitively clear that any given set A will intersect a proportion of these pre-images which converges to the measure of A, i.e. to the proportion of A in the unit interval. The same argument works for f(x) = κx mod 1 for any other integer κ ≥ 2. Indeed, although we will not prove it here, the unique ergodic absolutely continuous invariant probability measure of any full branch uniformly expanding map with bounded distortion is always mixing.
Mixing is also easily seen to be preserved under conjugacies. More precisely, let f : X → X and g : Y → Y be two measurable maps and let h : X → Y be a measurable conjugacy. Suppose that ν is a measure on X and define µ = h∗ν, so that µ(A) = ν(h^{−1}(A)). Then, as we have shown above, µ is invariant and ergodic for g as long as ν is invariant and ergodic for f.
Lemma 10.2. Suppose ν is mixing for f . Then µ is mixing for g.
Proof. By the definition of conjugacy we have

    h^{−1}(g^{−n}(A)) = {x : g^n(h(x)) ∈ A} = {x : h(f^n(x)) ∈ A} = f^{−n}(h^{−1}(A)).

Therefore, using the mixing of ν, we have, for any measurable sets A, B ⊆ Y,

    µ(g^{−n}(A) ∩ B) = ν(h^{−1}(g^{−n}(A) ∩ B)) = ν(h^{−1}(g^{−n}(A)) ∩ h^{−1}(B))
                     = ν(f^{−n}(h^{−1}(A)) ∩ h^{−1}(B)) → ν(h^{−1}(A)) ν(h^{−1}(B)) = µ(A)µ(B).

10.2 Decay of correlations
A natural question concerns the speed of mixing. If a measure µ is mixing for some map
f is there a particular speed at which the mixing occurs, e.g. is it exponential? A priori
there is no reason that such a rate should exist independently of the choice of sets A, B
and indeed it is known that in general, it is possible to find subsets A, B such that the
convergence in the definition of mixing is arbitrarily slow. However it turns out that we can
still talk meaningfully about rates of mixing by simultaneously generalising and restricting
the notion of mixing through the notion of correlations.
Definition 23. For measurable functions ϕ, ψ : M → R we define the correlation function

    Cn(ϕ, ψ) = | ∫ ϕ (ψ ◦ f^n) dµ − ∫ ϕ dµ ∫ ψ dµ |.

We say that the correlations decay if Cn(ϕ, ψ) → 0 as n → ∞.

If ϕ, ψ are characteristic functions of sets A, B we recover the expression used in the definition of mixing:

    Cn(1A, 1B) = | ∫ 1A (1B ◦ f^n) dµ − ∫ 1A dµ ∫ 1B dµ |
               = | ∫ 1_{A ∩ f^{−n}(B)} dµ − ∫ 1A dµ ∫ 1B dµ |
               = |µ(A ∩ f^{−n}(B)) − µ(A)µ(B)|.
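For a concrete case (a sketch, not in the notes), take the doubling map f(x) = 2x mod 1 with Lebesgue measure and ϕ = ψ = identity. A direct computation gives ∫ x (2^n x mod 1) dx = 1/4 + 1/(12·2^n), so Cn = 2^{−n}/12 and the correlations decay exponentially. This is easy to confirm numerically; midpoint quadrature with the grid size a power of two is essentially exact here because the discontinuities of x → 2^n x mod 1 fall on cell boundaries:

```python
# Correlations C_n(phi, psi) for the doubling map f(x) = 2x mod 1 with
# Lebesgue measure and phi = psi = identity. A direct computation gives
# C_n = 2^(-n)/12, i.e. exponential decay of correlations.

N = 2**16   # number of quadrature cells (a power of two)

def corr(n):
    s = 0.0
    for i in range(N):
        x = (i + 0.5) / N                 # midpoint of the i-th cell
        s += x * ((2**n * x) % 1.0)       # phi(x) * psi(f^n(x))
    return abs(s / N - 0.25)              # |int - (int phi)(int psi)|, each integral = 1/2

cs = [corr(n) for n in range(1, 7)]
# cs[n-1] is approximately 2^(-n)/12, halving with each step
```

The halving of `cs` at each step is exactly the exponential rate γn = 2^{−n} in the sense of the definition below.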

It turns out that it is sometimes possible to give precise estimates for the rate of decay of the correlation function as long as we restrict our attention to specific classes of functions.
Definition 24. Given classes B1, B2 of functions and a sequence {γn} of positive numbers with γn → 0 as n → ∞, we say that the correlation function Cn(ϕ, ψ) decays for functions ϕ ∈ B1, ψ ∈ B2 at the rate given by the sequence {γn} if, for any ϕ ∈ B1 and ψ ∈ B2, there exists a constant C = C(ϕ, ψ) > 0 such that

    Cn(ϕ, ψ) ≤ Cγn

for all n ≥ 1.
For example, if γn = e^{−cn} for some c > 0 we say that the correlations decay exponentially, and if γn = n^{−c} we say that the correlations decay polynomially. The key point here is that the rate, i.e. the sequence {γn}, is not allowed to depend on the observables but only on the class of observables. Thus the rate of decay becomes in some sense an intrinsic property of the system (and of the class of observables). As mentioned above, we cannot hope to obtain decay of correlations for classes of functions which are too big, e.g. classes that contain all characteristic functions. Most results that obtain rates of decay of correlations do so for Hölder continuous functions or functions of bounded variation, but other results also exist

for functions with weaker continuity properties or, in higher dimensions, even with quite
strange regularity conditions.
Suppose that f : M → M admits an induced uniformly expanding full branch map F = f^τ : ∆ → ∆ satisfying the bounded distortion property. We have seen above that F admits a unique ergodic acip µ̂ with bounded density. If the return times are Lebesgue integrable, i.e. ∫ τ dm < ∞, then there exists an ergodic acip µ for f. The rate of decay of correlations of µ is captured by the rate of decay of the tail of the return time function. More precisely, we recall that

    ∆n := {x ∈ ∆ : τ(x) = n}.

Theorem 15. The rate of decay of correlations with respect to µ is determined by the regularity of the observables and the rate of decay of |∆n|. For Hölder continuous observables, if |∆n| → 0 exponentially then the rate of decay of correlations is exponential, and if |∆n| → 0 polynomially then the rate of decay is polynomial. For non-Hölder continuous observables the rate slows down by factors related to the modulus of continuity.

The precise formulation of this statement is contained in several papers.

References
[1] Lynch, Vincent, Decay of correlations for non-Hölder observables. Discrete Contin. Dyn. Syst. 16 (2006),
no. 1, 19–46.
[2] Young, Lai-Sang Statistical properties of dynamical systems with some hyperbolicity. Ann. of Math.
(2) 147 (1998), no. 3, 585–650.
[3] Young, Lai-Sang Recurrence times and rates of mixing. Israel J. Math. 110 (1999), 153–188.


A Review of measure theory
In this section we introduce only the very minimal requirements of Measure Theory which will be needed later. For a more extensive introduction see any introductory book on Measure Theory or Ergodic Theory, for example [Kol33, Bil79, KF60, Fal97]. For simplicity, we shall restrict ourselves to measures on the unit interval I = [0, 1], although most of the definitions apply in much more general situations.

A.1 Definitions
A.2 Basic motivation: Positive measure Cantor sets
The notion of measure is, in the first instance, a generalization of the standard idea of length. Indeed, while we know how to define the length of an interval, we do not a priori know how to measure the size of sets which contain no intervals but which, logically, should have positive "measure". Let {ri}_{i=0}^{∞} be a sequence of positive numbers with Σ ri < 1. We define a set C ⊂ [0, 1] by recursively removing open subintervals from [0, 1] in the following way. Start by removing an open subinterval I0 of length r0 from the interior of [0, 1]. Then [0, 1] \ I0 has two connected components. Remove intervals I1, I2 of lengths r1, r2 respectively from the interiors of these components. Then [0, 1] \ (I0 ∪ I1 ∪ I2) has 4 connected components. Now remove intervals I3, . . . , I7 from the interiors of these components and continue in this way. Let

    C = [0, 1] \ ∪_{i=0}^{∞} Ii.

Then C does not contain any intervals, since every interval is eventually subdivided by the removal of one of the subintervals Ik from its interior, and therefore it does not make sense to talk about C as having any length. However, the total length of the intervals removed is Σ ri < 1, and therefore it would make sense to say that the size of C is 1 − Σ ri. The Theory of Measures formalizes this notion in a rigorous way and makes it possible to assign a size to sets such as C.
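The bookkeeping of the construction can be simulated numerically. In the following sketch (not part of the original text; the particular lengths are an illustrative choice) we remove at step k an interval of length 4^{−(k+1)} from each of the 2^k components, so the total length removed is Σ_k 2^k · 4^{−(k+1)} = 1/2 and the resulting Cantor set should have "size" 1/2:

```python
# Fat Cantor set: at step k remove an open middle interval of length
# 4^(-(k+1)) from each of the 2^k remaining components. The total length
# removed is sum_k 2^k * 4^(-(k+1)) = 1/2, so the remaining "size" is 1/2.

def remaining_length(depth):
    comp_len, n_comps = 1.0, 1                   # start from [0, 1]
    for k in range(depth):
        removed = 4.0 ** (-(k + 1))
        comp_len = (comp_len - removed) / 2.0    # each component splits in two
        n_comps *= 2
    return comp_len * n_comps

size = remaining_length(15)   # decreases to 1/2 from above as depth grows
```

After d steps the remaining length is exactly 1/2 + 2^{−(d+1)}, so the simulation converges to the value 1/2 that the measure of C "ought" to have.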

A.3 Non-measurable sets

The example above shows that it is desirable to generalize the notion of "length" to a notion of "measure" which can apply to more complicated subsets which are not intervals, and which can formalize what we mean by saying, for example, that the Cantor set defined above has positive measure. It turns out, however, that in general it is not possible to define a measure in a consistent way on all possible subsets. In 1924 Banach and Tarski showed that it is possible to divide the unit ball in 3-dimensional space into 5 parts and re-assemble these parts to form two unit balls, thus apparently doubling the volume of the original set. This implies that it is impossible to consistently assign a well-defined volume in an additive way to every subset. See the very interesting discussion of this point on Wikipedia.
A simpler example is the following. Consider the unit circle S¹ and an irrational circle rotation fα : S¹ → S¹. Then every orbit is dense in S¹. Let A ⊂ S¹ be a set containing exactly one point from each orbit. Assume that we have defined a general notion of measure for which the measure m(A) has meaning and which generalizes the length of intervals, so that the measure of any interval coincides with its length. In particular, such a measure will be translation invariant, in the sense that the measure of a set cannot be changed by simply translating the set. Therefore, since a circle rotation fα is just a translation, we have m(fα^n(A)) = m(A) for every n ∈ Z. Moreover, since A contains only one single point from each orbit and all points on a given orbit are distinct, we have fα^n(A) ∩ fα^m(A) = ∅ for all m, n ∈ Z with m ≠ n, and therefore

    1 = m(S¹) = m( ∪_{n=−∞}^{+∞} fα^n(A) ) = Σ_{n=−∞}^{+∞} m(fα^n(A)) = Σ_{n=−∞}^{+∞} m(A).

This is clearly impossible, as the right hand side is zero if m(A) = 0 and infinite if m(A) > 0.
Remark 15. This counterexample depends on the Axiom of Choice to ensure that it is possible to define such a set, constructed by choosing a single point from each of an uncountable family of subsets.

A.4 Algebras and sigma-algebras

Let X be a set and A a collection of (not necessarily disjoint) subsets of X.
Definition 25. We say that A is an algebra (of subsets of X) if
1. ∅ ∈ A and X ∈ A;
2. A ∈ A implies Aᶜ ∈ A;
3. for any finite collection A1, . . . , An of subsets in A we have ∪_{i=1}^{n} Ai ∈ A.

We say that A is a σ-algebra (sigma-algebra) if moreover

(3') for any countable collection A1, A2, . . . of subsets in A, we have ∪_{i=1}^{∞} Ai ∈ A.

Given an algebra A of subsets of a set X we define the sigma-algebra σ(A) as the smallest σ-algebra containing A. This is always well defined and is in general smaller than the sigma-algebra of all subsets of X.

A.5 Measures
Let X be a set and A a σ-algebra of subsets of X.

Definition 26. A measure is a function µ : A → [0, ∞] which is countably additive, i.e.

    µ( ∪_{i=1}^{∞} Ai ) = Σ_{i=1}^{∞} µ(Ai)

for any countable collection {Ai}_{i=1}^{∞} of disjoint sets in A.

This definition shows that the σ-algebra is as intrinsic to the definition of a measure as the space itself. In general, therefore, we talk of a Measure Space as a triple (X, A, µ), although the space and the σ-algebra are often omitted if they are regarded as fixed.
Remark 16. We say that µ is a finite measure if µ(X) < ∞ and that it is a probability measure if µ(X) = 1. Notice that if µ̂ is a finite measure we can easily define a probability measure µ by simply letting

    µ = µ̂ / µ̂(X).
The fact that such a countably additive function exists is non-trivial. It is usually easier to find finitely additive functions on algebras; for example, the standard length is a finitely additive function on the algebra of finite unions of intervals. The fact that this extends to a countably additive function on the corresponding σ-algebra is guaranteed by the following fundamental result.

Theorem (Extension Theorem). Let µ̃ be a finitely additive function defined on an algebra Ã of subsets. Then µ̃ can be extended in a unique way to a countably additive function µ on the σ-algebra A = σ(Ã).

In the case in which X is an interval I ⊆ R (or the unit circle S¹, which we think of as the unit interval with its endpoints identified) there is a very natural sigma-algebra.

Definition 27. Let B̃ denote the algebra of all finite unions of subintervals of I. Then the generated σ-algebra B = σ(B̃) is called the Borel σ-algebra. Any measure defined on B is called a Borel measure.

Remark 17. Notice that a Cantor set C ⊂ I is the complement of a countable union of open intervals and therefore belongs to the Borel σ-algebra B.

A.6 Integration
The abstract notion of measure leads to a powerful generalization of the standard definition of the Riemann integral. For A ∈ B we define the characteristic function

    χA(x) = 1 if x ∈ A,   χA(x) = 0 if x ∉ A.

A simple function is one which can be written in the form

    ζ = Σ_{i=1}^{N} ci χ_{Ai}

where the ci ∈ R⁺ are constants and the Ai are disjoint Borel measurable sets. These are functions which are "piecewise constant" on a finite partition {Ai} of X.

Definition 28 (Integrals of non-negative functions). For simple functions let

    ∫_X ζ dµ = Σ_{i=1}^{N} ci µ(Ai).

Then, for a general measurable non-negative f, we can define

    ∫_X f dµ = sup { ∫_X ζ dµ : ζ simple and ζ ≤ f }.

This integral is called the Lebesgue integral of the function f with respect to the measure µ (even if µ is not Lebesgue measure).

Remark 18. Notice that, in contrast to the case of Riemann integration, in which the integral is given by a limiting process which may or may not converge, this supremum is always well defined, though it may not always be finite.
More generally, for any measurable f we can write f = f⁺ − f⁻, where f⁺(x) = max{f(x), 0} and f⁻(x) = −min{0, f(x)}, both of which are clearly non-negative.

Definition 29 (Integral of any measurable function). Let f be a measurable function. If

    ∫ f⁺ dµ < ∞   and   ∫ f⁻ dµ < ∞

then we say that f is µ-integrable and let

    ∫ f dµ = ∫ f⁺ dµ − ∫ f⁻ dµ.

We let L¹(µ) denote the set of all µ-integrable functions.

Example 19. Let f : [0, 1] → R be given by

    f(x) = 0 if x ∈ Q,   f(x) = 1 otherwise.

Notice that this function is not Riemann integrable, in the sense that the required limit does not converge. From the point of view discussed above, however, it is just a simple function which takes the value 0 on the measurable set Q and the value 1 on the measurable set R \ Q. For m = Lebesgue measure we have m(Q ∩ [0, 1]) = 0, since Q is countable, and therefore m((R \ Q) ∩ [0, 1]) = 1, and so

    ∫_{[0,1]} f dm = m((R \ Q) ∩ [0, 1]) = 1.
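The supremum over simple functions in Definition 28 can also be approximated numerically for a concrete function. The following sketch (not part of the original text; the function, the number of levels M and the grid size N are illustrative choices) approximates ∫_0^1 x² dm = 1/3 by the integral of a simple function ζ ≤ f which is constant on the level sets of f:

```python
# Approximate the Lebesgue integral of f(x) = x^2 on [0, 1] by the integral
# of the simple function zeta(x) = floor(M f(x)) / M <= f(x), which is
# constant on the level sets A_i = { x : i/M <= f(x) < (i+1)/M }.

def lebesgue_integral(f, M=1000, N=100000):
    meas = [0.0] * (M + 1)               # estimated measure of each level set
    for j in range(N):
        x = (j + 0.5) / N                # uniform grid of midpoints in [0, 1]
        i = min(int(M * f(x)), M)
        meas[i] += 1.0 / N
    return sum((i / M) * meas[i] for i in range(M + 1))

approx = lebesgue_integral(lambda x: x * x)   # close to (and just below) 1/3
```

Since ζ ≤ f ≤ ζ + 1/M, the answer lies within 1/M of 1/3 from below, illustrating how the supremum in Definition 28 is approached.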

A.7 Lebesgue density theorem

Theorem 16 (Lebesgue Density Theorem). Let m be Lebesgue measure on I and let A be a measurable set with m(A) > 0. Then for m almost every point x ∈ A we have

    m(A ∩ (x − ε, x + ε)) / 2ε → 1    (26)

as ε → 0.
Points x satisfying (26) are called (Lebesgue) density points of A. This result says that, in some very subtle way, the measure of the set A is "bunched up". A priori one could expect that if m(A) = 1/2 then for any subinterval J the ratio between m(A ∩ J) and m(J) might be 1/2, i.e. that the ratio between the measure of the whole interval and the measure of the set A is constant at every scale. This theorem shows that this is not the case. We shall not prove this result here.

A.8 Absolutely continuous and singular measures

Definition 30. Let µ1, µ2 be probability measures.
1. µ1 is absolutely continuous with respect to µ2 if µ2(A) = 0 ⇒ µ1(A) = 0 for every measurable set A.
2. µ1, µ2 are mutually singular if there exists a measurable set A such that µ1(A) = 1 and µ2(A) = 0.
If µ1 is absolutely continuous with respect to µ2 we write µ1 ≪ µ2. If µ1 ≪ µ2 and µ2 ≪ µ1 then we say that µ1 and µ2 are equivalent.
Example 20. Let X = [0, 1] and let m denote Lebesgue measure. Let ϕ ∈ L¹(m) be a non-negative function with ∫ ϕ dm = 1, and define the measure µϕ by

    µϕ(A) = ∫_A ϕ dm.

Then it is easy to see that µϕ ≪ m. In fact the Radon-Nikodym Theorem says that all absolutely continuous measures are of this form: if µ1 ≪ µ2 then there exists a non-negative function ϕ ∈ L¹(µ2) such that

    µ1(A) = ∫_A ϕ dµ2    (27)

for any measurable set A. The function ϕ is called the density or the Radon-Nikodym derivative of µ1 with respect to µ2, and is sometimes written as ϕ = dµ1/dµ2.
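A density as in (27) is easy to experiment with numerically. For instance (an illustrative sketch, not part of the original text; the density and the function names are choices made here), take ϕ(x) = 2x on [0, 1], so that µϕ([0, a]) = a² while m([0, a]) = a:

```python
# The measure mu_phi(A) = int_A phi dm for the density phi(x) = 2x on [0, 1].
# For A = [0, a] we have mu_phi([0, a]) = a^2, while m([0, a]) = a, so
# mu_phi << m with Radon-Nikodym derivative d(mu_phi)/dm = phi.

def mu_phi(a, N=100000):
    # midpoint Riemann sum of phi(x) = 2x over [0, a]
    return sum(2.0 * (a * (j + 0.5) / N) for j in range(N)) * (a / N)

vals = [(a, mu_phi(a)) for a in (0.25, 0.5, 0.9)]
# each mu_phi(a) is a^2 up to floating-point rounding
```

Note that m-null sets are also µϕ-null (absolute continuity), even though the two measures assign very different sizes to the same intervals.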
Example 21. Let X = [0, 1] and let m denote Lebesgue measure. Then, for any x ∈ [0, 1], the Dirac delta measure δx and Lebesgue measure are mutually singular.
Exercise 33. Suppose µ1 ≪ µ2. Show that for any measurable set A, µ1(A) > 0 ⇒ µ2(A) > 0 and µ2(A) = 1 ⇒ µ1(A) = 1.
It is not the case that any two distinct measures need to be either absolutely continuous or mutually singular. For example, let µ1 = (δp + µ2)/2, where δp is a Dirac measure on some point p with µ2({p}) = 0. Then µ1 is not absolutely continuous with respect to µ2, and µ1, µ2 are not mutually singular.
