Hochman - Lectures On Fractal Geometry

This document outlines a course on fractal geometry and geometric measure theory. It introduces fractal geometry and discusses irregular sets like Cantor sets that are studied. It provides an overview of topics that will be covered in the course like dimension, measures, self-similar sets and measures, projections, and connections to dynamics.

Lectures on fractal geometry

Michael Hochman∗

November 6, 2023

Contents

1 Introduction
1.1 What is fractal geometry?
1.2 What is this course about?
1.3 Prerequisites, conventions and notation

2 Dimension
2.1 A family of examples: Middle-α Cantor sets
2.2 Minkowski dimension
2.3 Hausdorff dimension
2.4 Trees and partitions
2.5 Examples

3 Using measures to compute dimension
3.1 The mass distribution principle
3.2 Billingsley’s lemma
3.3 A metric on symbolic space
3.4 Measure on symbolic space
3.5 Frostman’s lemma

4 Product sets

5 Differentiation of measures in $\mathbb{R}^d$
5.1 The Besicovitch covering theorem
5.2 Density and differentiation theorems

©2023. This is a draft! Send comments to [email protected]

6 Pointwise dimension of measures
6.1 Dimension of a measure at a point
6.2 Upper and lower dimension of measures

7 Hausdorff measures
7.1 Hausdorff measure
7.2 Properties of Hausdorff measures

8 Projections (Marstrand’s theorem)
8.1 Dimension of projections
8.2 Absolute continuity of projections

9 Iterated function systems
9.1 Iterated function systems
9.2 Existence of the attractor
9.3 Cylinder sets
9.4 Symbolic coding
9.5 Stationary measures

10 Self-similar sets and measures

11 Entropy
11.1 The entropy function
11.2 Conditional entropy
11.3 Commensurable partitions and geometric operations
11.4 Entropy and dimension
11.5 Entropy of self-similar measures

12 Components and multiscale formula for entropy
12.1 Component measures
12.2 Computing entropy from component entropies

13 Additive combinatorics
13.1 Sumsets and inverse theorems
13.2 Trivial bounds
13.3 Small doubling and Freiman’s theorem
13.4 Power growth, the “fractal” regime
13.5 Convolution
13.6 Entropy growth under convolution
13.7 Application to self-similar measures
13.8 The Kaimanovich-Vershik lemma

14 Appendix
14.1 Integration of measures
14.2 The weak-* topology
14.3 Lifting measures

1 Introduction
1.1 What is fractal geometry?
Fractal geometry and its sibling, geometric measure theory, are branches of analysis
which study the structure of “irregular” sets and measures in metric spaces, primarily
$\mathbb{R}^d$. The distinction between regular and irregular sets is not a precise one but informally,
regular sets might be understood as smooth sub-manifolds of $\mathbb{R}^k$, or perhaps Lipschitz
graphs, or countable unions of the above; whereas irregular sets include just about
everything else, from the middle-$\frac{1}{3}$ Cantor set (still highly structured) to arbitrary
Cantor sets (irregular, but topologically the same) to truly arbitrary subsets of $\mathbb{R}^d$.
For concreteness, let us compare smooth sub-manifolds and Cantor subsets of $\mathbb{R}^d$.
These two classes differ in many aspects besides the obvious topological one. Manifolds
possess many smooth symmetries; they carry a natural measure (the volume) which has
good analytic properties; and in most natural examples, we have a good understanding
of their intersections with hyperplanes or with each other, and of their images under
linear or smooth maps. On the other hand, Cantor sets typically have few or no smooth
symmetries; they may not carry a “natural” measure, and even if they do, its analytical
properties are likely to be bad; and even for very simple and concrete examples we do
not completely understand their intersections with hyperplanes, or their images under
linear maps.
The motivation to study the structure of irregular sets, besides the obvious theoretical
one, is that many sets arising in analysis, number theory, dynamics and many
other mathematical fields are irregular to one degree or another, and the metric and
geometric properties of these objects often provide meaningful information about the
context in which they arose. At the simplest level, the theories of dimension provide a
means to compare the size of sets which coarser notions fail to distinguish. Thus the
set of badly approximable numbers $x \in \mathbb{R}$ (those with bounded partial quotients) and the
set of Liouvillian numbers both have Lebesgue measure 0, but the set of badly approximable
numbers has Hausdorff dimension 1, hence it is relatively large, whereas the Liouvillian
numbers form a set of Hausdorff dimension 0, and so are “rare”. Going deeper, however,
it turns out that many problems in dynamics and number theory can be formulated in
terms of bounds on the dimension of the intersection of certain very simple Cantor sets
with lines, or linear images of products of Cantor sets. Another connection to dynamics
arises from the fact that there is often an intimate relation between the dimension of an
invariant set or measure and its entropy (topological or measure-theoretic). Geometric
properties may allow us to single out physically significant invariant measures among the
many invariant measures of a system. Finer information encoded in an invariant mea-
sure may actually encode the dynamics which generated it, leading to rigidity results.
The list goes on.

1.2 What is this course about?

Our goal in this course is primarily to develop the foundations of geometric measure
theory, and we cover in detail a variety of classical subjects. A second goal is to present
recent advances in the theory of self-similar sets and measures, and the connection
with additive combinatorics. We also hope to present applications and interactions
with dynamics and metric number theory, and we shall accomplish this mainly by our
choices of methods, examples, and open problems which we discuss.

1.3 Prerequisites, conventions and notation

We assume familiarity with the basic results on metric spaces, measure theory and
Lebesgue integration. We work in $\mathbb{R}^d$ or sometimes a complete metric space, and denote
by $B_r(x)$ the closed ball of radius $r$ around $x$:

$$B_r(x) = \{y : d(x, y) \le r\}$$

The open ball is denoted $B_r^\circ(x)$; as our considerations are rarely topological it will
appear less often. We denote the indicator function of a set $A$ by $1_A$.
All sets and functions we encounter will be Borel measurable, unless otherwise
stated. Also, all measures are Radon measures unless otherwise stated: recall that
µ is Radon if it is a Borel measure taking finite values on compact sets. Such measures
are regular, i.e.

µ(E) = inf{µ(U ) : U is open and E ⊆ U }


= sup{µ(K) : K is compact and K ⊆ E}

We denote Lebesgue measure on $\mathbb{R}$ and $\mathbb{R}^d$ by $\mathrm{Leb}(\cdot)$.

A measure $\mu$ on $X$ is supported on a set $A \subseteq X$ if $\mu(X \setminus A) = 0$. If $X$ is a separable
metric space, then the support of $\mu$ is the smallest closed set of full measure, i.e.

$$\operatorname{supp} \mu = \bigcap \{C \subseteq \mathbb{R}^d \mid C \text{ closed},\ \mu(\mathbb{R}^d \setminus C) = 0\}
= \mathbb{R}^d \setminus \bigcup \{U \subseteq \mathbb{R}^d \mid U \text{ open},\ \mu(U) = 0\}$$

In the second representation we may take the union over balls with rational radii and
centers in a countable dense set; the union then is a countable union of nullsets, and
we conclude that $\mu(\mathbb{R}^d \setminus \operatorname{supp} \mu) = 0$. Thus, $\operatorname{supp} \mu$ is a set of full measure, and any open set
intersecting it by definition has positive measure. In particular if $x \in \operatorname{supp} \mu$ then $\mu(B_r(x)) > 0$
for all $r > 0$.

We use standard Big-O and little-o notation. Thus $O(f(t))$ denotes a quantity
bounded by $C \cdot f(t)$ for some $C > 0$ and for all sufficiently large or small $t$ (depending
on the context), and $o(f(t))$ denotes a quantity which, for every $c > 0$, is bounded by
$c \cdot f(t)$ for all sufficiently large or small $t$. For example, if $g(t) = t + o(t)$ as $t \to 0$ then
$g(t)/t \to 1$.

We set N = {1, 2, 3 . . .}.

2 Dimension

In much of mathematics, the dimension of a set describes, roughly speaking, the number
of degrees of freedom one needs to parametrize the set. This is the case in linear algebra
and also in the theory of smooth manifolds.

In the theory of metric spaces, however, one generally does not have a natural
notion of parametrization. Nevertheless one would like to have a number describing
the “size” of a set in a metric space. It turns out that one can define reasonable notions of
dimension in this more general setting which capture the intuitive meaning of dimension
and coincide with the more classical ones in the cases mentioned above. Nearly all
these notions measure how many balls one needs to cover the set at different scales,
and often, with the right combinatorial or probabilistic interpretation, they do in fact
describe the number of degrees of freedom one has.

In this course we focus on the two main notions of dimension, the Minkowski (box)
dimension and the Hausdorff dimension. We give the definitions in general for metric
spaces, but most of our applications and some of the results in these sections will already
be special to Rd .

2.1 A family of examples: Middle-α Cantor sets

Before discussing dimension, we present one of the simplest families of “fractal” sets,
which will serve to demonstrate the definitions that follow.
Let $0 < \alpha < 1$. The middle-$\alpha$ Cantor set $C_\alpha \subseteq [0, 1]$ is defined by a recursive
procedure: For $n = 0, 1, 2, \ldots$ we construct a set $C_{\alpha,n}$ which is a union of $2^n$ closed
intervals $I_{i_1 \ldots i_n}$, indexed by sequences $i = i_1 \ldots i_n \in \{0, 1\}^n$, each of length $((1-\alpha)/2)^n$.
To begin let $C_{\alpha,0} = [0, 1]$ and $I_\emptyset = [0, 1]$ (indexed by the unique empty sequence).
Assume that $C_{\alpha,n}$ has been defined and is the disjoint union of the $2^n$ closed intervals
$I_{i_1 \ldots i_n}$, $i_1 \ldots i_n \in \{0, 1\}^n$. From each one of the intervals $I_{i_1 \ldots i_n}$, remove the open
subinterval with the same center as $I_{i_1 \ldots i_n}$ and length $\alpha$ times the length of $I_{i_1 \ldots i_n}$, leaving two closed
sub-intervals, one on the left, which we denote $I_{i_1 \ldots i_n 0}$, and one on the right, which we
denote $I_{i_1 \ldots i_n 1}$. We thus have defined $I_{j_1 \ldots j_{n+1}}$ for all $j_1 \ldots j_{n+1} \in \{0, 1\}^{n+1}$, and we
define

$$C_{\alpha,n+1} = \bigcup_{i \in \{0,1\}^{n+1}} I_i$$

Clearly $C_{\alpha,0} \supseteq C_{\alpha,1} \supseteq \ldots$, and since the sets are compact,

$$C_\alpha = \bigcap_{n=0}^{\infty} C_{\alpha,n}$$

is compact and nonempty.


All of the sets $C_\alpha$, $0 < \alpha < 1$, are mutually homeomorphic, since all are topologically
Cantor sets (i.e. compact and totally disconnected without isolated points). They all
are of first Baire category. And they all have Lebesgue measure 0, since one may verify
that $\mathrm{Leb}(C_{\alpha,n}) = (1 - \alpha)^n \to 0$. Hence none of these theories can distinguish between
them.
Nevertheless qualitatively it is clear that Cα becomes “larger” as α → 0, since
decreasing α results in removing shorter intervals at each step. In order to quantify this
one uses dimension.
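
The recursive construction of $C_{\alpha,n}$ can be sketched in a few lines of Python. This is only an illustration and not part of the notes: the function name `cantor_intervals` and the use of exact rationals (`fractions.Fraction`) are our own choices. The sketch builds the $2^n$ intervals of $C_{\alpha,n}$, so that one can check the interval length $((1-\alpha)/2)^n$ and the total length $(1-\alpha)^n$ directly.

```python
from fractions import Fraction


def cantor_intervals(alpha, n):
    """Return the 2**n closed intervals of C_{alpha,n} as (left, right) pairs, left to right.

    alpha should be a Fraction in (0, 1) so that all arithmetic is exact.
    """
    intervals = [(Fraction(0), Fraction(1))]  # C_{alpha,0} = [0, 1]
    for _ in range(n):
        next_level = []
        for a, b in intervals:
            keep = (b - a) * (1 - alpha) / 2   # length of each surviving sub-interval
            next_level.append((a, a + keep))   # left child  I_{i_1...i_n 0}
            next_level.append((b - keep, b))   # right child I_{i_1...i_n 1}
        intervals = next_level
    return intervals


alpha = Fraction(1, 3)
level3 = cantor_intervals(alpha, 3)
total_length = sum(b - a for a, b in level3)   # Lebesgue measure of C_{alpha,3}
```

With $\alpha = 1/3$ this reproduces the classical middle-third construction; the total length after three steps is $(2/3)^3$.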

2.2 Minkowski dimension

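Let $(X, d)$ be a metric space, for $A \subseteq X$ let

$$|A| = \operatorname{diam} A = \sup_{x,y \in A} d(x, y)$$

A cover of $A$ is a collection of sets $\mathcal{E}$ such that $A \subseteq \bigcup_{E \in \mathcal{E}} E$. A $\delta$-cover is a cover $\mathcal{E}$
such that $|E| \le \delta$ for all $E \in \mathcal{E}$.
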
The simplest notion of dimension measures how many sets are needed to cover a set
as the scale tends to zero.

Definition 2.1. Let $(X, d)$ be a metric space. For a set $A$ and $\delta > 0$, let $N(A, \delta)$ denote
the minimal size of a $\delta$-cover of $A$, i.e.

$$N(A, \delta) = \min\Big\{k : \exists A_1, \ldots, A_k \subseteq X \text{ such that } A \subseteq \bigcup_{i=1}^{k} A_i \text{ and } |A_i| \le \delta\Big\}$$

and set $N(A, \delta) = \infty$ if $A$ does not admit a finite $\delta$-cover. The Minkowski dimension of $A$ is

$$\dim_M(A) = \lim_{\delta \to 0} \frac{\log N(A, \delta)}{\log(1/\delta)}$$

provided the limit exists. We also define the upper and lower dimensions

$$\overline{\dim}_M(A) = \limsup_{\delta \to 0} \frac{\log N(A, \delta)}{\log(1/\delta)}$$

$$\underline{\dim}_M(A) = \liminf_{\delta \to 0} \frac{\log N(A, \delta)}{\log(1/\delta)}$$

(these always exist, though in general they may be $\infty$).

First properties

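1. The $\delta$-covering number $N(A, \delta)$ of $A$ is finite for all $\delta > 0$ if $A$ is compact
(and, in a complete metric space, only if the closure of $A$ is compact). Even in this
case the Minkowski dimension may be infinite.
2. Clearly

$$\underline{\dim}_M A \le \overline{\dim}_M A$$

and $\dim_M A$ exists if and only if the two are equal.

3. $\dim_M A = \alpha \in \mathbb{R}$ means that $N(A, \delta)$ grows approximately as $\delta^{-\alpha}$ as $\delta \to 0$;
more precisely, $\dim_M A = \alpha$ if and only if for every $\varepsilon > 0$,

$$\delta^{-(\alpha-\varepsilon)} \le N(A, \delta) \le \delta^{-(\alpha+\varepsilon)} \quad \text{for sufficiently small } \delta > 0$$

equivalently,

$$N(A, \delta) = \delta^{-\alpha+o(1)} \quad \text{as } \delta \to 0$$

4. Clearly $N(A, \delta) \le N(B, \delta)$ when $A \subseteq B$. Consequently,

$$A \subseteq B \implies \dim_M A \le \dim_M B$$

and similarly for the upper and lower versions.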

Example 2.2.

1. A point has Minkowski dimension 0, since $N(\{x_0\}, \delta) = 1$ for all $\delta$. More generally
$N(\{x_1, \ldots, x_n\}, \delta) \le n$, so finite sets have Minkowski dimension 0.

2. A box $B$ in $\mathbb{R}^d$ can be covered by $c \cdot \delta^{-d}$ boxes of side $\delta$, i.e. $N(B, \delta) \le c\delta^{-d}$.
Hence $\overline{\dim}_M B \le d$.

3. If $A \subseteq \mathbb{R}^d$ and $\underline{\dim}_M A < d$, then $\mathrm{Leb}(A) = 0$. Indeed, given $\delta > 0$, let
$A_1, \ldots, A_{N(A,\delta)}$ be a minimal $\delta$-cover of $A$. Then $\mathrm{Leb}(A_i) \le c \cdot |A_i|^d \le c \cdot \delta^d$
(where $c > 0$ is a constant depending on $d$), and $A \subseteq \bigcup A_i$, so

$$\mathrm{Leb}(A) \le \sum_{i=1}^{N(A,\delta)} \mathrm{Leb}(A_i) \le \sum_{i=1}^{N(A,\delta)} c \cdot |A_i|^d \le c \cdot N(A, \delta) \cdot \delta^d$$

Writing $\alpha = \underline{\dim}_M A$, there are arbitrarily small $\delta > 0$ such that $N(A, \delta) < \delta^{-\alpha+o(1)}$.
We thus have shown that $\mathrm{Leb}(A) \le c \cdot \delta^{d-\alpha+o(1)}$ for $\delta$ arbitrarily close
to 0, and since $d - \alpha > 0$ this implies $\mathrm{Leb}(A) = 0$.

4. A line segment in $\mathbb{R}^d$ has Minkowski dimension 1. A relatively open bounded
subset of a plane in $\mathbb{R}^3$ has Minkowski dimension 2. More generally, any compact
$k$-dimensional $C^1$-sub-manifold of $\mathbb{R}^d$ has box dimension $k$.

5. For $C_\alpha$ as before, $\dim_M C_\alpha = \log 2 / \log(2/(1 - \alpha))$. Let us demonstrate this.

To get an upper bound, notice that for $\delta_n = ((1 - \alpha)/2)^n$ the sets $C_{\alpha,n}$ are covers
of $C_\alpha$ by $2^n$ intervals of length $\delta_n$, hence $N(C_\alpha, \delta_n) \le 2^n$.

If $\delta_{n+1} \le \delta < \delta_n$ then clearly

$$N(C_\alpha, \delta) \le N(C_\alpha, \delta_{n+1}) \le 2^{n+1}$$

On the other hand every set of diameter $\le \delta$ can intersect at most three maximal
intervals in $C_{\alpha,n+1}$, hence

$$N(C_\alpha, \delta) \ge \frac{1}{3} \cdot 2^n \ge 2^{n-2}$$

so for $\delta_{n+1} \le \delta < \delta_n$

$$\frac{(n - 2) \log 2}{(n + 1) \log(2/(1 - \alpha))} \le \frac{\log N(C_\alpha, \delta)}{\log(1/\delta)} \le \frac{(n + 1) \log 2}{n \log(2/(1 - \alpha))}$$

and so, taking $\delta \to 0$,

$$\dim_M C_\alpha = \log 2 / \log(2/(1 - \alpha))$$
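Remark 2.3. In the last example we analyzed $\dim_M A$ by examining $N(A, \varepsilon_k)$ for a
certain sequence $\varepsilon_k \to 0$ (specifically $\varepsilon_k = \rho^k$ for $\rho = (1 - \alpha)/2$). The fact that this
gives the right dimension is not a coincidence, and we can formulate it in general as
follows.
First note that from the definition, if $\delta < \delta'$ then $N(A, \delta) \ge N(A, \delta')$. Now let
$\varepsilon_k \searrow 0$ and suppose $\varepsilon_k/\varepsilon_{k+1} \le C < \infty$. For every $\delta > 0$ there is a $k = k(\delta)$ such that
$\varepsilon_{k+1} < \delta \le \varepsilon_k$. This implies

$$N(A, \varepsilon_k) \le N(A, \delta) \le N(A, \varepsilon_{k+1})$$

The assumption implies that $\log(1/\delta)/\log(1/\varepsilon_{k(\delta)}) \to 1$ as $\delta \to 0$, so the inequality
above implies the claim, namely that the Minkowski dimension (and its upper and lower
versions) may be computed along the sequence $(\varepsilon_k)$, after taking logarithms and dividing
by $\log(1/\delta)$, $\log(1/\varepsilon_k)$, $\log(1/\varepsilon_{k+1})$.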

Proposition 2.4. Properties of Minkowski dimension

1. $\dim_M \overline{A} = \dim_M A$

2. $\dim_M A$ depends only on the metric space $(A, d|_{A \times A})$.

3. If $f : X \to Y$ is Lipschitz then $\dim_M fA \le \dim_M A$, and if $f$ is bi-Lipschitz then
$\dim_M fA = \dim_M A$. The same holds for upper and lower Minkowski dimensions.

Proof. By inclusion $\dim_M A \le \dim_M \overline{A}$, so for the first claim we can assume that
$\dim_M A < \infty$. Then $N(A, \varepsilon) = N(\overline{A}, \varepsilon)$ for every $\varepsilon > 0$, because in general if
$A \subseteq \bigcup_{i=1}^{n} A_i$ then $\overline{A} \subseteq \bigcup_{i=1}^{n} \overline{A_i}$, and if $\{A_i\}$ is a $\delta$-cover then so is $\{\overline{A_i}\}$. This implies the
claim.
For the second claim, note that if $A \subseteq \bigcup A_i$ for $A_i \subseteq X$ then $A \subseteq \bigcup (A_i \cap A)$ and
$|A_i \cap A| \le |A_i|$, so $N(A, \varepsilon)$ is unchanged if we consider only covers by subsets of $A$. In
particular the Minkowski dimension does not change if we restrict to the metric space
$(A, d|_{A \times A})$.
Finally if $A \subseteq \bigcup A_i$ then $f(A) \subseteq \bigcup f(A_i)$, and if $c$ is the Lipschitz constant of $f$
then $|f(E)| \le c|E|$. Thus $N(fA, c\varepsilon) \le N(A, \varepsilon)$ and the claim follows, since

$$\begin{aligned}
\dim_M fA &= \lim_{\varepsilon \to 0} \frac{\log N(fA, \varepsilon)}{\log(1/\varepsilon)} \\
&\le \lim_{\varepsilon \to 0} \frac{\log N(A, \varepsilon/c)}{\log(1/\varepsilon)} \\
&= \lim_{\varepsilon \to 0} \frac{\log N(A, \varepsilon/c)}{\log(1/\varepsilon) + \log c} \\
&= \lim_{\varepsilon \to 0} \frac{\log N(A, \varepsilon/c)}{\log(c/\varepsilon)} \\
&= \dim_M A
\end{aligned}$$

and similarly for the upper and lower dimensions.

The example of the middle-$\alpha$ Cantor sets demonstrates that Minkowski dimension is not
a topological notion, since the sets $C_\alpha$ all have different dimensions, but for $0 < \alpha < 1$
they are all topologically a Cantor set and therefore homeomorphic. On the other hand
the last part of the proposition shows that dimension is an invariant in the bi-Lipschitz
category. Thus,

Corollary 2.5. For $0 < \alpha < \beta < 1$, the sets $C_\alpha, C_\beta$ are not bi-Lipschitz equivalent, i.e. there
is no bi-Lipschitz map of $C_\alpha$ onto $C_\beta$; in particular they are not $C^1$-diffeomorphic.

Finally, let us discuss the role of the metric $d$. One often defines two metrics on the
same space to be equivalent if they define the same topology, i.e., the same notion of
convergence. This equivalence, however, may change dimension radically (we shall see
examples later).
Nevertheless, in $\mathbb{R}^d$ every two norms $\|\cdot\|$ and $\|\cdot\|'$ not only define equivalent metrics,
but satisfy the stronger property that $C^{-1} \|v\|' \le \|v\| \le C \|v\|'$ for some constant $C$. It
follows that the identity map from $(\mathbb{R}^d, \|\cdot\|)$ to $(\mathbb{R}^d, \|\cdot\|')$ is bi-Lipschitz. We conclude
that

Lemma 2.6. If $A \subseteq \mathbb{R}^d$ then every choice of norm on $\mathbb{R}^d$ gives the same values of
$\dim_M A$ (if it exists) and of $\overline{\dim}_M A$, $\underline{\dim}_M A$.

Exercises

1. Let $H$ be an infinite-dimensional Hilbert space and let $B \subseteq H$ denote the
unit ball. Show that $\dim_M B = \infty$.
2. Given $\alpha > 0$, compute the Minkowski dimension of $\{0\} \cup \{1/n^\alpha : n \in \mathbb{N}\}$.
3. Show that if $f : [0, 1] \to \mathbb{R}$ is differentiable then its graph has Minkowski
dimension 1.
4. Let $\widetilde{N}(A, \delta)$ denote the size of the smallest cover of $A$ by balls of radius $\delta$
centered in $A$. Show that

$$\lim_{\delta \to 0} \frac{\log \widetilde{N}(A, \delta)}{\log(1/\delta)}$$

exists if and only if $\dim_M A$ exists and in that case the limit and the dimension
are equal.
5. Let $\varepsilon_k \searrow 0$ and suppose $\varepsilon_k/\varepsilon_{k+1} \le C$ for some $C \in \mathbb{R}$. Show that

$$\lim_{k \to \infty} \frac{\log N(A, \varepsilon_k)}{\log(1/\varepsilon_k)}$$

exists if and only if $\dim_M A$ exists and in that case the limit and the dimension
are equal.
6. (a) Give an example of $\varepsilon_k \searrow 0$ for which the conclusion of the previous
exercise fails.
(b) Does it always fail when $\sup\{\varepsilon_k/\varepsilon_{k+1} : k \in \mathbb{N}\} = \infty$?
7. Show that if $f : X \to Y$ is a non-Lipschitz map between metric spaces and
$A \subseteq X$ then it may happen that $\dim_M f(A) > \dim_M A$.
8. Let $f : X \to Y$ be an $\alpha$-Hölder map between metric spaces, i.e. there is a
constant $C > 0$ such that $d(f(x), f(x')) \le C \cdot d(x, x')^\alpha$. For $A \subseteq X$, give a
bound for $\dim_M f(A)$ in terms of $\alpha$ and $\dim_M A$.
Use the sets $C_\alpha$ to show that the bound is tight.

2.3 Hausdorff dimension


Minkowski dimension has some serious shortcomings. One would want the dimension
of a “small” set to be 0, and in particular that a countable set should satisfy this.
Minkowski dimension does not have this property. For example,

$$\dim_M(\mathbb{Q} \cap [0, 1]) = \dim_M \overline{\mathbb{Q} \cap [0, 1]} = \dim_M [0, 1] = 1$$

One can also find examples which are closed, for instance

$$A = \{0\} \cup \Big\{\frac{1}{n} : n \in \mathbb{N}\Big\}$$

Indeed, in order to cover $A$ with balls of radius $\varepsilon$, we will need precisely one ball for
each point $1/k$ such that $|1/k - 1/(k + 1)| > 2\varepsilon$. This is equivalent to $1/(k(k + 1)) > 2\varepsilon$,
or, approximately, $k < 1/\sqrt{2\varepsilon}$. On the other hand all other points of $A$ lie in the interval $[0, \sqrt{2\varepsilon}]$,
which can be covered by $O(1/\sqrt{2\varepsilon})$ $\varepsilon$-balls. Thus $N(A, \varepsilon) \approx 1/\sqrt{2\varepsilon}$, so $\dim_M A = 1/2$.
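
The estimate $N(A, \varepsilon) \approx \varepsilon^{-1/2}$ for $A = \{0\} \cup \{1/n\}$ can be illustrated numerically. The following is a sketch under our own assumptions and not part of the notes: we truncate $A$ to $\{0\} \cup \{1/n : n \le M\}$ (which has the same covering number as $A$ at scales $\delta \ge 1/M$, since the omitted points lie in $[0, \delta]$), we count sets of diameter $\delta$ as in Definition 2.1 rather than balls of radius $\varepsilon$, and we estimate the growth exponent from two scales. The names `covering_number` and `truncated_set` are hypothetical.

```python
from math import log


def covering_number(points, delta):
    """Greedy count of intervals of length delta needed to cover a sorted list of reals."""
    count, i = 0, 0
    while i < len(points):
        right = points[i] + delta   # the interval [points[i], right] has diameter delta
        count += 1
        while i < len(points) and points[i] <= right:
            i += 1
    return count


def truncated_set(M):
    """Sorted list of the points of {0} together with {1/n : 1 <= n <= M}."""
    return [0.0] + [1.0 / n for n in range(M, 0, -1)]


M = 10 ** 5
A = truncated_set(M)
d1, d2 = 1e-3, 1e-5                      # two scales, both >= 1/M
N1, N2 = covering_number(A, d1), covering_number(A, d2)
slope = log(N2 / N1) / log(d1 / d2)      # empirical growth exponent of N(A, delta)
```

The two-scale slope is used instead of the raw ratio $\log N / \log(1/\delta)$ because the latter converges slowly: the multiplicative constant in $N(A, \delta) \approx c\,\delta^{-1/2}$ contributes an error of order $1/\log(1/\delta)$.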

These examples, being countable, also demonstrate that Minkowski dimension behaves
badly under countable unions: letting $A_n = \{1, 1/2, \ldots, 1/n\} \cup \{0\}$, we see that
$A_1 \subseteq A_2 \subseteq \ldots$ but

$$\dim_M A_n = 0 \not\to 1/2 = \dim_M \bigcup_{n=1}^{\infty} A_n$$

An alternative notion of dimension is the Hausdorff dimension. It also measures
how many balls are needed to cover a set, but, unlike Minkowski dimension, in which
all balls contribute equally to the count, the Hausdorff dimension gives smaller balls a
smaller weight. This makes the definition more complicated, and also makes computing
the Hausdorff dimension more difficult. But, in exchange, one gets a better behaved
quantity that has become the main notion of dimension in fractal geometry.
To motivate the definition, recall that a set $A \subseteq \mathbb{R}^d$ is small in the sense of a
nullset with respect to Lebesgue measure if for every $\varepsilon > 0$ there is a cover of $A$ by
balls $B_1, B_2, \ldots$ such that $\sum \operatorname{vol}(B_i) < \varepsilon$. The volume of a ball $B$ is $c \cdot |B|^d$, so this is
equivalent to

$$A \text{ is Lebesgue-null} \iff \inf\Big\{\sum_{E \in \mathcal{E}} |E|^d : \mathcal{E} \text{ is a cover of } A \text{ by balls}\Big\} = 0 \tag{1}$$

Since every set of diameter $t$ is contained in a ball of diameter $2t$, one may consider
general covers on the right hand side.
Now we pretend that there is a notion of $\alpha$-dimensional volume. The “volume” of
a ball $B$ would be of order $|B|^\alpha$, and we can define when a set is small with respect to
this “volume”:
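Definition 2.7. Let $(X, d)$ be a metric space and $A \subseteq X$. The $\alpha$-dimensional Hausdorff
content $\mathcal{H}^\alpha_\infty$ is

$$\mathcal{H}^\alpha_\infty(A) = \inf\Big\{\sum_{E \in \mathcal{E}} |E|^\alpha : \mathcal{E} \text{ is a cover of } A\Big\}$$

We say that $A$ is $\alpha$-null if $\mathcal{H}^\alpha_\infty(A) = 0$.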

Note that $\mathcal{H}^\alpha_\infty(A) \le |A|^\alpha$ so $\mathcal{H}^\alpha_\infty(A) < \infty$ when $A$ is bounded. For unbounded sets
$\mathcal{H}^\alpha_\infty$ may be finite or infinite.
One can do more than define $\alpha$-null sets: a modification of $\mathcal{H}^\alpha_\infty$ leads to an
“$\alpha$-dimensional” measure on Borel sets in much the same way that the infimum in (1)
defines Lebesgue measure ($\mathcal{H}^\alpha_\infty$ itself is not a measure when $0 < \alpha < d$, since for
example on the line we have $\mathcal{H}^\alpha_\infty([0, 1)) + \mathcal{H}^\alpha_\infty([1, 2)) \ne \mathcal{H}^\alpha_\infty([0, 2))$ for $\alpha < 1$). These
measures, called Hausdorff measures, will be discussed in section 7.1, at which point
the reason for the “$\infty$” in the notation will be explained. At this point the notion of
$\alpha$-null sets is sufficient for our needs.

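Lemma 2.8. If $\mathcal{H}^\alpha_\infty(A) = 0$ then $\mathcal{H}^\beta_\infty(A) = 0$ for $\beta > \alpha$.

Proof. Let $0 < \varepsilon < 1$. Then there is a cover $\{A_i\}$ of $A$ with $\sum |A_i|^\alpha < \varepsilon$. Since $\varepsilon < 1$,
we know $|A_i| \le 1$ for all $i$. Hence

$$\sum |A_i|^\beta = \sum |A_i|^\alpha |A_i|^{\beta-\alpha} \le \sum |A_i|^\alpha < \varepsilon$$

so, since $\varepsilon$ was arbitrary, $\mathcal{H}^\beta_\infty(A) = 0$.

Consequently, for any $A \ne \emptyset$ there is a unique $\alpha_0$ such that $\mathcal{H}^\alpha_\infty(A) = 0$ for $\alpha > \alpha_0$
and $\mathcal{H}^\alpha_\infty(A) > 0$ for $0 \le \alpha < \alpha_0$ (the value at $\alpha = \alpha_0$ can be 0, positive or $\infty$).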

Definition 2.9. The Hausdorff dimension $\dim A$ of $A$ is

$$\dim A = \inf\{\alpha : \mathcal{H}^\alpha_\infty(A) = 0\} = \sup\{\alpha : \mathcal{H}^\alpha_\infty(A) > 0\}$$

Proposition 2.10. Properties:

1. $A \subseteq B \implies \dim A \le \dim B$.

2. $A = \bigcup_i A_i$ (a countable union) $\implies \dim A = \sup_i \dim A_i$.

3. $\dim A \le \dim_M A$.

4. $\dim A$ depends only on the induced metric on $A$.

5. If $f$ is a Lipschitz map $X \to Y$ then $\dim fX \le \dim X$, and bi-Lipschitz maps
preserve dimension.

Proof. 1. Clearly if $B$ is $\alpha$-null and $A \subseteq B$ then $A$ is $\alpha$-null, and the claim follows.

2. Since $A_i \subseteq A$, $\dim A \ge \sup_i \dim A_i$ by property 1.
To show $\dim A \le \sup_i \dim A_i$, it suffices to prove for $\alpha > \sup_i \dim A_i$ that $A$
is $\alpha$-null. This follows from the fact that each $A_i$ is $\alpha$-null in the same way
that Lebesgue-nullity is shown to be stable under countable unions: for $\varepsilon > 0$
choose a cover $A_i \subseteq \bigcup_j A_{i,j}$ with $\sum_j |A_{i,j}|^\alpha < \varepsilon/2^i$. Then $A \subseteq \bigcup_{i,j} A_{i,j}$ and
$\sum_{i,j} |A_{i,j}|^\alpha < \varepsilon$. Since $\varepsilon$ was arbitrary, $\mathcal{H}^\alpha_\infty(A) = 0$.

3. Let $\beta > \alpha > \dim_M A$. For sufficiently small $\delta > 0$, there is an $N < \delta^{-\alpha}$ and a cover
$A \subseteq \bigcup_{i=1}^{N} A_i$ with $\operatorname{diam} A_i \le \delta$. Hence

$$\sum_{i=1}^{N} (\operatorname{diam} A_i)^\beta \le \sum_{i=1}^{N} \delta^\beta \le \delta^{-\alpha} \delta^\beta = \delta^{\beta-\alpha}$$

Since $\delta$ can be taken arbitrarily close to 0, we have $\mathcal{H}^\beta_\infty(A) = 0$. Since $\beta > \dim_M A$
was arbitrary (for any such $\beta$ we can find suitable $\alpha$), $\dim A \le \dim_M A$.

4. This is clear since if $A \subseteq \bigcup A_i$ then $A \subseteq \bigcup (A_i \cap A)$ and $|A_i \cap A| \le |A_i|$. Hence
the infimum in the definition of $\mathcal{H}^\alpha_\infty$ is unchanged if we consider only covers by
subsets of $A$.

5. If $c$ is the Lipschitz constant of $f$ then $|f(E)| \le c|E|$. Thus if $A \subseteq \bigcup A_i$ then
$f(A) \subseteq \bigcup f(A_i)$ and $\sum |f(A_i)|^\alpha \le c^\alpha \sum |A_i|^\alpha$. Thus $\mathcal{H}^\alpha_\infty(f(A)) \le c^\alpha \mathcal{H}^\alpha_\infty(A)$ and
the claim follows.

It is often convenient to restrict the sets in the definition of Hausdorff content to
specific families of sets, such as balls or cubes. The following easy result allows us to
do this. Let $\mathcal{E}$ be a family of sets and for $A \subseteq X$ define

$$\mathcal{H}^\alpha_\infty(A, \mathcal{E}) = \inf\Big\{\sum |E_i|^\alpha : \{E_i\}_{i=1}^{\infty} \subseteq \mathcal{E} \text{ is a cover of } A\Big\}$$

Lemma 2.11. Let $\mathcal{E}$ be a family of subsets of $X$ and suppose that there is a constant
$C$ such that every bounded set $A \subseteq X$ can be covered by $\le C$ elements of $\mathcal{E}$, each of
diameter $\le C|A|$. Then for every set $A \subseteq X$ and every $\alpha > 0$,

$$\mathcal{H}^\alpha_\infty(A) \le \mathcal{H}^\alpha_\infty(A, \mathcal{E}) \le C^{1+\alpha} \mathcal{H}^\alpha_\infty(A) \tag{2}$$

In particular $\mathcal{H}^\alpha_\infty(A) = 0$ if and only if $\mathcal{H}^\alpha_\infty(A, \mathcal{E}) = 0$, hence

$$\dim A = \inf\{\alpha : \mathcal{H}^\alpha_\infty(A, \mathcal{E}) = 0\} = \sup\{\alpha : \mathcal{H}^\alpha_\infty(A, \mathcal{E}) > 0\}$$

Proof. The left inequality in (2) is immediate from the definition, since the infimum in
the definition of $\mathcal{H}^\alpha_\infty(A, \mathcal{E})$ is over fewer covers than in the definition of $\mathcal{H}^\alpha_\infty(A)$. On
the other hand if $\mathcal{F}$ is a cover of $A$ then we can cover each $F \in \mathcal{F}$ by $\le C$ sets $E \in \mathcal{E}$
with $|E| \le C|F|$. Taking the collection $\mathcal{F}' \subseteq \mathcal{E}$ of these sets we have
$\sum_{F \in \mathcal{F}'} |F|^\alpha \le C^{1+\alpha} \sum_{F \in \mathcal{F}} |F|^\alpha$, giving the other inequality. The other conclusions are immediate.

In particular, the family of open balls, and the family of closed balls, both satisfy
the hypothesis, and we shall freely use them in our arguments.

Example 2.12.

1. A point has dimension 0, so by stability under countable unions,
countable sets have dimension 0. This shows that the inequality $\dim \le \dim_M$ can
be strict.

2. Any $A \subseteq \mathbb{R}^d$ has $\dim A \le d$. It suffices to prove this for bounded $A$ since we can
write $A = \bigcup_{D \in \mathcal{D}_1} A \cap D$ (here $\mathcal{D}_1$ is a partition of $\mathbb{R}^d$ into unit cubes), and by
countable stability it is enough to deal with each
$A \cap D$ separately. For bounded $A$, let $A \subseteq [-r, r]^d$ for some $r$. Then

$$\dim A \le \dim [-r, r]^d \le \dim_M [-r, r]^d = d$$

3. $[0, 1]^d$ has dimension at least $d$, and more generally any set in $\mathbb{R}^d$ of positive
Lebesgue measure has dimension at least $d$. This follows since $\mathcal{H}^d_\infty(A) = 0$ if and
only if $\mathrm{Leb}(A) = 0$.

4. Combining the last two examples, any set in $\mathbb{R}^d$ of positive Lebesgue measure has
dimension $d$.

5. A set $A \subseteq \mathbb{R}^d$ can have dimension $d$ even when its Lebesgue measure is 0. Indeed,
we shall later show that $C_\alpha$ has the same Hausdorff and Minkowski dimensions.
Let $A = \bigcup_{n \ge 2} C_{1/n}$. Then $\dim A \le 1$ because $A \subseteq [0, 1]$, but
$\dim A \ge \sup_n \dim C_{1/n} = 1$. Hence $\dim A = 1$. On the other hand $\mathrm{Leb}(C_{1/n}) = 0$
for all $n$, so $\mathrm{Leb}(A) = 0$.

6. By a similar argument, we can show that a $k$-dimensional $C^1$ sub-manifold $M$ of $\mathbb{R}^d$
has Hausdorff dimension $k$. We get an upper bound by estimating the Minkowski
dimension (e.g. thinking of $M$ locally as a Lipschitz graph); for the lower bound
one can use a volume form given by the local coordinates to argue as we did in
the last example.

7. A real number $x$ is Liouvillian if for every $n$ there are arbitrarily large integers
$p, q$ such that

$$\Big|x - \frac{p}{q}\Big| < \frac{1}{|q|^n}$$

These numbers are extremely well approximable by rationals and have various
interesting properties, for example, irrational Liouville numbers are transcendental.
Let $L \subseteq [0, 1]$ denote the set of Liouville numbers. We claim that $\dim L = 0$. Let

$$L_n = \Big\{x \in [0, 1] : \Big|x - \frac{p}{q}\Big| < \frac{1}{q^n} \text{ for arbitrarily large } q \text{ and } p\Big\}$$

Since $L \subseteq L_n$, it suffices to show that $\dim L_n \to 0$ as $n \to \infty$.
In fact we will show that for any $\alpha > 0$, if $n > 2/\alpha$ then $\mathcal{H}^\alpha_\infty(L_n) = 0$, which is
enough.
Fix $\alpha > 0$ and $n > 2/\alpha$. Write

$$L_{n,k} = \Big\{x \in [0, 1] : \Big|x - \frac{p}{q}\Big| < \frac{1}{q^n} \text{ for some } q > k \text{ and some } 0 \le p \le q\Big\}$$

Evidently $L_n = \bigcap_{k \in \mathbb{N}} L_{n,k}$ and $L_n \subseteq L_{n,k}$ for all $k$. Therefore it suffices that we
prove that $\mathcal{H}^\alpha_\infty(L_{n,k}) \to 0$ as $k \to \infty$.
For $k$ fixed and $q > k$, the set

$$L_{n,k,q} = \Big\{x \in [0, 1] : \Big|x - \frac{p}{q}\Big| < \frac{1}{q^n} \text{ for some } 0 \le p \le q\Big\}$$

consists of $q + 1$ open intervals $I_{q,0}, \ldots, I_{q,q}$ of length $2 \cdot q^{-n}$, centered at points of
the form $p/q$. Therefore the collection $\{I_{q,p} : q > k,\ 0 \le p \le q\}$ is a cover of $L_{n,k}$.
It follows that

$$\begin{aligned}
\mathcal{H}^\alpha_\infty(L_{n,k}) &\le \sum_{q>k} \sum_{0 \le p \le q} |I_{q,p}|^\alpha \\
&= \sum_{q>k} \sum_{0 \le p \le q} (2q^{-n})^\alpha \\
&= \sum_{q>k} (q + 1)(2q^{-n})^\alpha
\end{aligned}$$

Since $n > 2/\alpha$ there is an $\varepsilon > 0$ such that $\alpha > (1 + \varepsilon)2/n$, hence

$$\begin{aligned}
\mathcal{H}^\alpha_\infty(L_{n,k}) &\le 2^\alpha \sum_{q>k} (q + 1) q^{-2-\varepsilon} \\
&= \sum_{q>k} O(q^{-1-\varepsilon})
\end{aligned}$$

This is the tail of a convergent series, so it tends to 0 as $k \to \infty$, as desired.

Remark: Since

$$L = \bigcap_{n \in \mathbb{N}} \bigcap_{k \in \mathbb{N}} \bigcup_{q>k} L_{n,k,q}$$

and $L_{n,k,q}$ is open, we see that $L$ is a $G_\delta$ subset (countable intersection of open
sets) and it is dense. Thus, from the point of view of Baire category theory, $L$ is
a very large subset of $[0, 1]$. Nevertheless $\dim L = 0$. This shows that topological
largeness does not imply large Hausdorff dimension (of course, density implies
that the Minkowski dimension is the same as that of the whole space).
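
To see how fast the covering sum in the Liouville example decays, one can compare a partial sum with an explicit tail bound. The following is a sketch under our own assumptions and not part of the notes: we use $q + 1 \le 2q$ and the integral comparison $\sum_{q>k} q^{1-n\alpha} \le \int_k^\infty x^{1-n\alpha}\,dx$ to get the closed-form bound $2^{\alpha+1} k^{2-n\alpha}/(n\alpha - 2)$, valid when $n\alpha > 2$. The function names are hypothetical.

```python
def partial_cover_sum(alpha, n, k, Q):
    """Partial sum over k < q <= Q of (q + 1) * (2 * q**(-n))**alpha."""
    return sum((q + 1) * (2.0 * q ** (-n)) ** alpha for q in range(k + 1, Q + 1))


def tail_bound(alpha, n, k):
    """Upper bound 2**(alpha + 1) * k**(2 - n*alpha) / (n*alpha - 2) for the full tail sum.

    Uses q + 1 <= 2q and an integral comparison; requires n * alpha > 2.
    """
    s = n * alpha
    if s <= 2:
        raise ValueError("the bound requires n > 2 / alpha")
    return 2 ** (alpha + 1) * k ** (2 - s) / (s - 2)


alpha, n = 0.5, 6                      # n > 2 / alpha = 4
approx = partial_cover_sum(alpha, n, k=10, Q=10_000)
bound_small_k = tail_bound(alpha, n, k=10)
bound_large_k = tail_bound(alpha, n, k=10 ** 4)
```

With these parameters the bound decays like $1/k$, which makes the convergence $\mathcal{H}^\alpha_\infty(L_{n,k}) \to 0$ quantitative for this choice of $\alpha$ and $n$.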

Exercises

1. Show that $\mathcal{H}^\alpha_\infty([0, 1)) + \mathcal{H}^\alpha_\infty([1, 2)) \ne \mathcal{H}^\alpha_\infty([0, 2))$ for $0 < \alpha < 1$.
2. Is it true that $\dim A = \dim \overline{A}$ for every $A \subseteq \mathbb{R}$?
3. Show that if $A_1 \subseteq A_2 \subseteq \ldots$ then $\dim A_i \to \dim(\bigcup A_i)$ as $i \to \infty$. Show that
the analogous statement for decreasing chains and intersections is false.
4. Let $A \subseteq \mathbb{R}^2$ be the graph of a differentiable function $f : [a, b] \to \mathbb{R}$ for some
$a < b$. Show that $\dim A = 1$.

5. Show that if in the definition of $\mathcal{H}^\alpha_\infty$ we allow only balls of radius $2^{-n}$, $n \in \mathbb{N}$,
and define dimension in terms of this new quantity, then we obtain Hausdorff
dimension again.

6. Let $f : X \to Y$ be an $\alpha$-Hölder map between metric spaces, i.e. there is a
constant $C > 0$ such that $d(f(x), f(x')) \le C \cdot d(x, x')^\alpha$. For $A \subseteq X$, give a
bound for $\dim_M f(A)$ in terms of $\alpha$ and $\dim_M A$.

2.4 Trees and partitions

A useful and powerful tool in fractal geometry is to model metric spaces using trees.
This idea, which takes many forms, not only provides a convenient heuristic but also,
when formalized, strong analytical tools. In this section we consider the simplest case,
but we shall return to similar ideas often.
Consider the interval [0, 1]. We can identify it with the space of infinite binary
sequences using the binary expansion: Let

π : {0, 1}N → [0, 1]

denote the map



\[
\pi(\omega) = \sum_{n=1}^{\infty} \omega_n 2^{-n}
\]

This is not a bijection because rationals of the form $k/2^n$ have two binary expansions,
one ending in a constant string of 0s and the other in 1s. However this rarely poses a
problem, as we shall see.
The set $\{0,1\}^{\mathbb{N}}$ can be viewed as the space of maximal infinite paths in the full
binary tree. If we write $\{0,1\}^*$ for the set of finite binary sequences (including the empty
sequence $\emptyset$), then the elements of $\{0,1\}^*$ form the nodes of a tree, with edges between
each word $w = w_1 \ldots w_n \in \{0,1\}^*$ and its extensions, $w_1 \ldots w_n 0$ and $w_1 \ldots w_n 1$. An infinite
sequence $w = w_1 w_2 w_3 \ldots$ corresponds uniquely to the infinite path $(w|_n)_{n=0}^{\infty}$ starting at the
root, where $w|_n = w_1 \ldots w_n$ is the initial segment of length $n$ of $w$.
Each vertex $w \in \{0,1\}^*$ defines a cylinder set, denoted $[w]$, consisting of all paths
from the root passing through $w$:

\[
[w_1 \ldots w_n] = \{v \in \{0,1\}^{\mathbb{N}} : v_1 \ldots v_n = w_1 \ldots w_n\}
\]

Then
\[
\pi[w_1 \ldots w_n] = \{x \in [0,1] : x \text{ has a binary expansion starting with } w_1 w_2 \ldots w_n\}
\]
This is the closed interval of length $2^{-n}$ whose left endpoint is $k/2^n$, where $k = \sum_{i=1}^{n} w_i 2^{n-i}$
is the integer with binary expansion $w_1 w_2 \ldots w_n$.
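
The correspondence between finite words and closed dyadic intervals can be checked with a few lines of code. The following is only an illustrative sketch: the function names `cylinder_interval` and `pi_truncated` are hypothetical, words are assumed to be given as Python lists of 0s and 1s, and exact rational arithmetic is used to avoid rounding.

```python
from fractions import Fraction

def cylinder_interval(word):
    """Endpoints (left, right) of the closed interval pi[w] for a finite binary word w."""
    n = len(word)
    # k is the integer whose binary expansion is w_1 w_2 ... w_n,
    # i.e. k = sum_i w_i * 2^(n-i).
    k = sum(bit << (n - i) for i, bit in enumerate(word, start=1))
    return Fraction(k, 2**n), Fraction(k + 1, 2**n)

def pi_truncated(bits):
    """Partial sum of pi(omega) = sum_n omega_n 2^(-n) over the digits supplied."""
    return sum(Fraction(bit, 2**i) for i, bit in enumerate(bits, start=1))

left, right = cylinder_interval([1, 0, 1])
# Any sequence starting with 1, 0, 1 is mapped by pi into [left, right].
```

For the word 101 the integer is k = 5, so the interval is [5/8, 6/8], and the truncated expansion 0.101 is its left endpoint.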

The family of sets

\[
\mathcal{C}_n = \{[w] : w \in \{0,1\}^n\}
\]

forms a partition of $\{0,1\}^{\mathbb{N}}$ into $2^n$ sets, but the intervals $\pi[w]$ corresponding to $[w]$
in $[0,1]$ do not form a partition, because adjacent pairs intersect at their endpoints.
Nevertheless, for many purposes, one wants a comparable partition of $[0,1)$ or of
$\mathbb{R}$. We therefore introduce for each $n$ the partition $\mathcal{D}_{2^n}$ of $\mathbb{R}$ into half-open intervals,
\[
\mathcal{D}_{2^n} = \left\{ \left[ \frac{k}{2^n}, \frac{k+1}{2^n} \right) : k \in \mathbb{Z} \right\}
\]
This induces a partition of $[0,1)$ which we denote in the same manner.

Observe that if $I = [k/2^n, (k+1)/2^n) \in \mathcal{D}_{2^n}$ with $0 \le k < 2^n$, then $\pi^{-1}(I)$ consists of the cylinder set
$[w_1 \ldots w_n]$, where $w_1 w_2 \ldots w_n$ is the binary representation of $k$ (so that $0.w_1 \ldots w_n 000\ldots$
is the representation of $k/2^n$ terminating in 0's), with the single sequence $w_1 \ldots w_n 111\ldots$ removed since it maps to the right endpoint $(k+1)/2^n$, together with the other preimage
$\eta_1 \ldots \eta_n 111\ldots$ of $k/2^n$; unless $k = 0$, in which case there is no such additional preimage. Either way, we have
\[
\pi(\pi^{-1}(I)) = I
\]
Also note that in both cases $\pi^{-1}(I)$ can be covered by at most two elements of $\mathcal{C}_n$.

We return now to dimension. Note that if we wish to cover a set by the elements
of a partition, we have only one way to do it, namely, to take the partition elements
that intersect the set non-trivially. This makes such covers easier to work with than covers
by balls, where there may be many choices.

Definition 2.13. For a partition E of a set X, the covering number of A ⊆ X is

\[
N(A, \mathcal{E}) = \#\{E \in \mathcal{E} : E \cap A \ne \emptyset\}
\]
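
For a finite set of reals and the dyadic partition into intervals $[k/2^n, (k+1)/2^n)$, the covering number can be computed by recording which intervals are hit. This is a minimal sketch, assuming the set is given as a list of numbers; the function name `covering_number` is hypothetical, and floating-point inputs close to a dyadic endpoint may be assigned to the wrong interval (exact `Fraction` inputs avoid this).

```python
import math

def covering_number(points, n):
    """N(A, D_{2^n}) for a finite set A: the number of dyadic intervals
    [k/2^n, (k+1)/2^n) that contain at least one point of A."""
    # A point x lies in the interval with index k = floor(x * 2^n).
    return len({math.floor(x * 2**n) for x in points})

sample = [0, 0.25, 0.3, 0.75]
counts = [covering_number(sample, n) for n in (1, 2)]
```

Here the four sample points meet two intervals of length 1/2 and three intervals of length 1/4.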

The following lemma is the reason that Minkowski dimension is sometimes called
box dimension.

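Lemma 2.14. 1. For $A \subseteq [0,1]$,
\[
\dim_M A = \lim_{n\to\infty} \frac{\log N(A, \mathcal{D}_{2^n})}{n \log 2}
= \lim_{n\to\infty} \frac{\log N(\pi^{-1}A, \mathcal{C}_n)}{n \log 2}
\]
provided one side (equivalently the other side) exists, and similarly for $\overline{\dim}_M$ and
$\underline{\dim}_M$.

2. For $E \subseteq \{0,1\}^{\mathbb{N}}$ we have
\[
\dim_M \pi E = \lim_{n\to\infty} \frac{\log N(E, \mathcal{C}_n)}{n \log 2}
\]
provided one side (equivalently the other side) exists, and similarly for $\overline{\dim}_M$ and
$\underline{\dim}_M$.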

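Proof. Since every $D \in \mathcal{D}_{2^n}$ satisfies $|D| = 2^{-n}$, and since every set $B$ with $|B| \le 2^{-n}$
can be covered by at most 2 intervals $I \in \mathcal{D}_{2^n}$, we find that
\[
N(A, 2^{-n}) \le N(A, \mathcal{D}_{2^n}) \le 2 \cdot N(A, 2^{-n})
\]
Upon taking logarithms and dividing by $n \log 2$ (and interpolating for scales between $2^{-n}$ and $2^{-n-1}$), this
proves the first equality.
For the second equality, note that for $I \in \mathcal{D}_{2^n}$, the set $\pi^{-1}I$ is covered by either
one or two generation-$n$ cylinder sets. Thus, for $A \subseteq [0,1)$,
\[
N(A, \mathcal{D}_{2^n}) \le N(\pi^{-1}A, \mathcal{C}_n) \le 2 \cdot N(A, \mathcal{D}_{2^n})
\]
and the second equality follows as before.

Finally, since a non-empty subset of $\pi[w_1 \ldots w_n]$ is covered by one or two elements
of $\mathcal{D}_{2^n}$, and each element of $\mathcal{D}_{2^n}$ meets the images of at most two cylinders from $\mathcal{C}_n$, for $E \subseteq \{0,1\}^{\mathbb{N}}$ we have
\[
\tfrac{1}{2} N(E, \mathcal{C}_n) \le N(\pi E, \mathcal{D}_{2^n}) \le 2 \cdot N(E, \mathcal{C}_n)
\]
and the second statement follows.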

The analogous statement for Hausdorff dimension follows from Lemma 2.11. The
proof is left to the reader.
Finally, everything we have done here can be generalized to Rd and to expansions
in other bases.

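Definition 2.15. Let $b \ge 2$ be an integer. The partition of $\mathbb{R}$ into $b$-adic intervals is
\[
\mathcal{D}_b = \left\{ \left[ \frac{k}{b}, \frac{k+1}{b} \right) : k \in \mathbb{Z} \right\}
\]
The corresponding partition of $\mathbb{R}^d$ into $b$-adic cubes is
\[
\mathcal{D}_b^d = \{I_1 \times \ldots \times I_d : I_i \in \mathcal{D}_b\}
\]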

(We suppress the superscript d when it is clear from the context).

2.5 Examples

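Example 2.16. Let $E \subseteq \mathbb{N}$. The upper and lower densities of $E$ are
\[
\overline{d}(E) = \limsup_{n\to\infty} \frac{1}{n} |E \cap \{1,\ldots,n\}|, \qquad
\underline{d}(E) = \liminf_{n\to\infty} \frac{1}{n} |E \cap \{1,\ldots,n\}|
\]
(here $|\cdot|$ denotes cardinality). Let
\[
\Omega_E = \{\omega \in \{0,1\}^{\mathbb{N}} : \omega_n = 0 \text{ for all } n \in E\}
\]
and
\[
X_E = \pi\Omega_E = \{x \in [0,1] : x \text{ has a binary expansion with 0's at all positions } n \in E\}
\]
Then
\[
\overline{\dim}_M X_E = \overline{d}(\mathbb{N} \setminus E), \qquad \underline{\dim}_M X_E = \underline{d}(\mathbb{N} \setminus E)
\]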

Remark 2.17. 1. The Hausdorff dimension of $X_E$ is harder to compute directly. We
have the general bound $\dim X_E \le \underline{\dim}_M X_E$. In fact, we have equality, but we
postpone the proof to the next section.

2. One can produce sets $E \subseteq \mathbb{N}$ with $\underline{d}(E) < \overline{d}(E)$. This shows that the lower and
upper Minkowski dimension need not coincide. There are even sets with $\underline{d}(E) = 0$
and $\overline{d}(E) = 1$, so we can have $\underline{\dim}_M X = 0$ and $\overline{\dim}_M X = 1$.

Proof. We calculate the covering numbers in the symbolic model. Clearly
\[
N(\Omega_E, \mathcal{C}_n) = 2^{|\{1,\ldots,n\} \setminus E|}
\]
(this is just the number of binary sequences of length $n$ with 0's in the positions in $E$).
Hence
\[
\frac{\log N(\Omega_E, \mathcal{C}_n)}{n \log 2} = \frac{|\{1,\ldots,n\} \setminus E|}{n} = \frac{|\{1,\ldots,n\} \cap (\mathbb{N} \setminus E)|}{n}
\]
and taking lim sup or lim inf, together with Lemma 2.14, gives the claim.
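
The counting formula in the proof can be sanity-checked by brute force for small $n$. The sketch below is illustrative only: the function names are hypothetical, $E$ is assumed to be given as a Python set of 1-based positions, and the enumeration is exponential in $n$, so it is meant for small cases.

```python
from itertools import product

def count_cylinders_bruteforce(E, n):
    """Count binary words of length n having 0 at every position i in E (1-based)."""
    return sum(
        1
        for w in product((0, 1), repeat=n)
        if all(w[i - 1] == 0 for i in E if i <= n)
    )

def count_cylinders_formula(E, n):
    """The closed formula 2^{|{1..n} \\ E|} from the proof."""
    free_positions = sum(1 for i in range(1, n + 1) if i not in E)
    return 2 ** free_positions

E = {2, 3, 5, 7, 11}
agree = all(
    count_cylinders_bruteforce(E, n) == count_cylinders_formula(E, n)
    for n in range(0, 11)
)
```

With this choice of $E$ and $n = 8$ the free positions are 1, 4, 6, 8, so both functions return 16.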

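Example 2.18. Let $N_k = k!$, and define two sets:
\[
X_{\mathrm{even}} = \{0.x_1 x_2 x_3 \ldots : x_n = 0 \text{ if } \exists k \;\; N_{2k} \le n < N_{2k+1}\}
\]
\[
X_{\mathrm{odd}} = \{0.x_1 x_2 x_3 \ldots : x_n = 0 \text{ if } \exists k \;\; N_{2k-1} \le n < N_{2k}\}
\]
and
\[
\Omega_{\mathrm{even}} = \{\omega \in \{0,1\}^{\mathbb{N}} : \omega_n = 0 \text{ if } \exists k \;\; N_{2k} \le n < N_{2k+1}\}
\]
\[
\Omega_{\mathrm{odd}} = \{\omega \in \{0,1\}^{\mathbb{N}} : \omega_n = 0 \text{ if } \exists k \;\; N_{2k-1} \le n < N_{2k}\}
\]
Finally let
\[
X = X_{\mathrm{even}} \cup X_{\mathrm{odd}}, \qquad \Omega = \Omega_{\mathrm{even}} \cup \Omega_{\mathrm{odd}}
\]
Note that $\Omega$ does not contain sequences ending in all 1's, so in this example, $\pi : \Omega \to X$
is a bijection.
We claim that $\overline{\dim}_M X = 1$, $\underline{\dim}_M X = 1/2$ and $\dim X = 0$.
We shall do all computations in the symbolic model.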
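First consider $N(\Omega_{\mathrm{even}}, \mathcal{C}_{N_{2k}})$. Since the symbols at coordinates $N_{2k-1}, \ldots, N_{2k} - 1$
are not constrained, we see that
\[
N(\Omega, \mathcal{C}_{N_{2k}}) \ge N(\Omega_{\mathrm{even}}, \mathcal{C}_{N_{2k}}) \ge 2^{N_{2k} - N_{2k-1}} = 2^{(2k)! - (2k-1)!}
\]
so
\[
\frac{\log N(\Omega, \mathcal{C}_{N_{2k}})}{N_{2k} \log 2} \ge \frac{(2k)! - (2k-1)!}{(2k)!} = 1 - \frac{(2k-1)!}{(2k)!} \to 1
\]
Thus $\overline{\dim}_M X \ge 1$ by Lemma 2.14, and of course there is equality (since $X \subseteq [0,1]$).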
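Next, consider $N(\Omega, \mathcal{C}_{2N_{2k}})$. Clearly
\[
N(\Omega, \mathcal{C}_{2N_{2k}}) \le N(\Omega_{\mathrm{even}}, \mathcal{C}_{2N_{2k}}) + N(\Omega_{\mathrm{odd}}, \mathcal{C}_{2N_{2k}})
\]
Since points in $\Omega_{\mathrm{even}}$ have all coordinates from $N_{2k}$ to $2N_{2k}$ equal to zero, we have
\[
N(\Omega_{\mathrm{even}}, \mathcal{C}_{2N_{2k}}) = N(\Omega_{\mathrm{even}}, \mathcal{C}_{N_{2k}}) \le 2^{N_{2k}}
\]
On the other hand, since points in $\Omega_{\mathrm{odd}}$ have all coordinates from $N_{2k-1}$ to $N_{2k} - 1$ equal to
0, and there are no restrictions on the $N_{2k} + 1$ coordinates from $N_{2k}$ to $2N_{2k}$, we have
\[
N(\Omega_{\mathrm{odd}}, \mathcal{C}_{2N_{2k}}) = N(\Omega_{\mathrm{odd}}, \mathcal{C}_{N_{2k-1}}) \cdot 2^{N_{2k}+1} \le 2^{N_{2k} + N_{2k-1}}
\]
(the last inequality uses $N(\Omega_{\mathrm{odd}}, \mathcal{C}_{N_{2k-1}}) \le 2^{N_{2k-1}-1}$, which holds because coordinate $N_{2k-1}$ is itself constrained).
Thus
\[
2^{N_{2k}} \le N(\Omega, \mathcal{C}_{2N_{2k}}) \le 2^{N_{2k}} + 2^{N_{2k} + N_{2k-1}} \le 2 \cdot 2^{N_{2k} + N_{2k-1}}
\]
so
\[
\frac{\log N(\Omega, \mathcal{C}_{2N_{2k}})}{2N_{2k} \log 2} \le \frac{1 + N_{2k} + N_{2k-1}}{2N_{2k}} \to \frac{1}{2}
\]
Hence $\underline{\dim}_M X \le 1/2$. One can show that this is an equality, by considering scales between
$N_\ell$ and $2N_\ell$ and separately between $2N_\ell$ and $N_{\ell+1}$, and noting that in both cases
the relative number of levels of the tree at which nodes have two children goes down
compared to the case analyzed above. We leave the details to the reader.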
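Finally, for $\delta > 0$ and $k \in \mathbb{N}$ consider an optimal cover $\mathcal{E}_k \subseteq \mathcal{C}_{N_{2k}}$ of $\Omega_{\mathrm{odd}}$, and an
optimal cover $\mathcal{F}_k \subseteq \mathcal{C}_{N_{2k+1}}$ of $\Omega_{\mathrm{even}}$. Since only the last coordinate is unconstrained in each case,
\[
N(\Omega_{\mathrm{odd}}, \mathcal{C}_{N_{2k}}) = 2 \cdot N(\Omega_{\mathrm{odd}}, \mathcal{C}_{N_{2k-1}}) \le 2^{N_{2k-1}}
\]
\[
N(\Omega_{\mathrm{even}}, \mathcal{C}_{N_{2k+1}}) = 2 \cdot N(\Omega_{\mathrm{even}}, \mathcal{C}_{N_{2k}}) \le 2^{N_{2k}}
\]
we conclude that
\[
\begin{aligned}
\sum_{I \in \mathcal{E}_k \cup \mathcal{F}_k} |I|^\delta &= \sum_{I \in \mathcal{E}_k} |I|^\delta + \sum_{I \in \mathcal{F}_k} |I|^\delta \\
&\le |\mathcal{E}_k| \, 2^{-N_{2k}\delta} + |\mathcal{F}_k| \, 2^{-N_{2k+1}\delta} \\
&\le 2^{N_{2k-1}} \cdot 2^{-N_{2k}\delta} + 2^{N_{2k}} \cdot 2^{-N_{2k+1}\delta} \\
&= 2^{N_{2k-1}(1 - 2k\delta)} + 2^{N_{2k}(1 - (2k+1)\delta)} \\
&\to 0 \quad \text{as } k \to \infty
\end{aligned}
\]
It follows that $\mathcal{H}^\delta_\infty(X) = 0$ for every $\delta > 0$, so $\dim X = 0$.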
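
The oscillation of the covering exponents in this example can be observed numerically. The following sketch is an illustration under stated assumptions, not part of the original argument: it uses the formula $N(\Omega_E, \mathcal{C}_n) = 2^{|\{1,\ldots,n\} \setminus E|}$ from Example 2.16 for each of the two pieces, and the elementary bounds $\max(a,b) \le a + b \le 2\max(a,b)$ for the union. The function names are hypothetical.

```python
from math import factorial

def constrained_count(n, parity):
    """Number of positions in 1..n forced to be 0 in Omega_even (parity=0)
    or Omega_odd (parity=1). The forced blocks are [j!, (j+1)!) for
    j = parity, parity + 2, parity + 4, ..."""
    total, j = 0, parity
    while factorial(j) <= n:
        total += min(n + 1, factorial(j + 1)) - factorial(j)
        j += 2
    return total

def log2_count_bounds(n):
    """Lower and upper bounds for log2 N(Omega, C_n), where Omega is the union."""
    free_even = n - constrained_count(n, 0)
    free_odd = n - constrained_count(n, 1)
    m = max(free_even, free_odd)
    return m, m + 1

for k in range(2, 6):
    N2k = factorial(2 * k)
    lo_a, _ = log2_count_bounds(N2k)
    _, hi_b = log2_count_bounds(2 * N2k)
    # First ratio approaches 1, second approaches 1/2 as k grows.
    print(k, lo_a / N2k, hi_b / (2 * N2k))
```

The first printed ratio is a lower bound for the exponent at scale $N_{2k}$ and the second an upper bound at scale $2N_{2k}$, matching the two regimes analyzed above.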

Exercises

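1. Construct a set $E \subseteq \mathbb{N}$ with $\underline{d}(E) < \overline{d}(E)$, completing the proof that there exist
sets with $\underline{\dim}_M A < \overline{\dim}_M A$.

2. Show that for any $0 \le \alpha < \beta \le 1$ there is a set $A \subseteq [0,1]$ with $\underline{\dim}_M A = \alpha$ and
$\overline{\dim}_M A = \beta$.
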
3 Using measures to compute dimension
The Minkowski dimension of a set is often straightforward to compute, and gives an
upper bound on the Hausdorff dimension. Lower bounds on the Hausdorff dimension
are trickier to come by. The main method to do so is to introduce an appropriate
measure on the set. In this section we discuss some relations between the dimension of
sets and the measures supported on them.

3.1 The mass distribution principle

Definition 3.1. A measure $\mu$ is $\alpha$-regular if there is a constant $C > 0$ such that $\mu(B_r(x)) \le C \cdot r^\alpha$ for every $x, r$.

For example, Lebesgue measure on $\mathbb{R}^d$ is $d$-regular. The length measure on
a line in $\mathbb{R}^d$ is 1-regular.

Proposition 3.2. Let µ be an α-regular measure and µ(A) > 0. Then dim A ≥ α.

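Proof. We shall show that $\mathcal{H}^\alpha_\infty(A) \ge C' \cdot \mu(A) > 0$ for a constant $C' > 0$, from which the result follows. Note
that every bounded $E \subseteq X$ is contained in a ball of radius $|E|$, so $\mu(E) \le C \cdot |E|^\alpha$.
Therefore, if $A \subseteq \bigcup_{i=1}^{\infty} A_i$ then
\[
\sum |A_i|^\alpha \ge C^{-1} \sum \mu(A_i) \ge C^{-1} \mu(A)
\]
This shows that $\mathcal{H}^\alpha_\infty(A) \ge C^{-1} \mu(A) > 0$, as claimed.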

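We can now complete the calculation of the dimension of $C_\alpha$. Write
\[
\beta = \frac{\log 2}{\log(2/(1-\alpha))}
\]
We already saw that $\dim_M C_\alpha \le \beta$ so, since $\dim C_\alpha \le \dim_M C_\alpha$, we have an upper
bound of $\beta$ on $\dim C_\alpha$.
Let $\mu = \mu_\alpha$ on $C_\alpha$ denote the measure which gives equal mass to each of the $2^n$
intervals in the set $C_\alpha^n$ introduced in the construction of $C_\alpha$. Let $\delta_n = ((1-\alpha)/2)^n$ be
the length of these intervals. Then for every $x \in C_\alpha$, one sees that $B_{\delta_n}(x)$ contains one
of these intervals and at most a part of one other interval, so
\[
\mu(B_{\delta_n}(x)) \le 2 \cdot 2^{-n} = C \cdot \delta_n^\beta
\]
with $C = 2$, because $\delta_n^\beta = 2^{-n}$ by the choice of $\beta$. Using the fact that $B_{\delta_{n+1}}(x) \subseteq B_r(x) \subseteq B_{\delta_n}(x)$ whenever $\delta_{n+1} \le r < \delta_n$, for $x \in C_\alpha$ we
have
\[
\mu(B_r(x)) \le \mu(B_{\delta_n}(x)) \le C \cdot \delta_n^\beta = C \cdot \left(\frac{2}{1-\alpha}\right)^\beta \cdot \delta_{n+1}^\beta \le C' r^\beta
\]
Hence by the mass distribution principle, $\dim C_\alpha \ge \beta$. Since this is the same as the
upper bound, we conclude $\dim C_\alpha = \beta$.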
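
The identity $\delta_n^\beta = 2^{-n}$, which makes the measure $\mu_\alpha$ exactly $\beta$-regular along the scales $\delta_n$, is easy to confirm numerically. A minimal sketch, with hypothetical function names and floating-point arithmetic:

```python
import math

def cantor_dimension(alpha):
    """beta = log 2 / log(2 / (1 - alpha)) for the middle-alpha Cantor set, 0 < alpha < 1."""
    return math.log(2) / math.log(2 / (1 - alpha))

def interval_length(alpha, n):
    """delta_n = ((1 - alpha) / 2)^n, the length of the stage-n intervals."""
    return ((1 - alpha) / 2) ** n

alpha = 1 / 3                      # the classical middle-thirds Cantor set
beta = cantor_dimension(alpha)     # close to log 2 / log 3
gap = abs(interval_length(alpha, 5) ** beta - 2 ** -5)
```

For $\alpha = 1/3$ this recovers the familiar value $\log 2/\log 3$, and `gap` is zero up to rounding error.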
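Specializing to $\mathbb{R}^d$, the analogous results are true if we define regularity in terms of
the mass of $b$-adic cubes rather than balls:

Definition 3.3. $\mu$ is $\alpha$-regular in base $b$ if there is a constant $C$ such that $\mu(D) \le C \cdot b^{-\alpha n}$ for every $n$ and every $D \in \mathcal{D}_{b^n}$.

Proposition 3.4. If $\mu$ is $\alpha$-regular in base $b$ and $\mu(A) > 0$ then $\dim A \ge \alpha$.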

We leave the proof as an exercise.

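Example 3.5. Let $E \subseteq \mathbb{N}$ and let
\[
X_E = \left\{ \sum_{n=1}^{\infty} 2^{-n} x_n : x_n = 0 \text{ if } n \in E \text{ and } x_n \in \{0,1\} \text{ otherwise} \right\}
\]
In Example 2.16 we saw that $\underline{\dim}_M X_E = \underline{d}(\mathbb{N} \setminus E) = \liminf \frac{1}{n} |\{1,\ldots,n\} \setminus E|$. We claim
that this is also the Hausdorff dimension. Since $\dim X_E \le \underline{\dim}_M X_E = \underline{d}(\mathbb{N} \setminus E)$, we need
to show the lower bound.
We may assume $\mathbb{N} \setminus E$ is infinite, since if not then $X_E$ is finite and the claim is
trivial. Let $\xi_n$ be independent random variables where $\xi_n \equiv 0$ if $n \in E$ and $\xi_n \in \{0,1\}$
with equal probabilities if $n \in \mathbb{N} \setminus E$. The random real number $\xi = 0.\xi_1 \xi_2 \ldots$ belongs
to $X_E$ so, since $X_E$ is closed, the distribution measure $\mu$ of $\xi$ is supported on $X_E$
(that is, the measure $\mu(A) = P(\xi \in A)$). Hence $\mu$ gives positive mass only to those
$D \in \mathcal{D}_{2^n}$ whose interiors intersect $X_E$, and all such intervals are given equal mass,
namely $\mu(D) = 2^{-|\{1,\ldots,n\} \setminus E|}$. If $\alpha < \underline{d}(\mathbb{N} \setminus E)$ then by definition $n\alpha < |\{1,\ldots,n\} \setminus E|$ for
all large enough $n$, and hence there is a constant $C_\alpha$ such that
\[
\mu(D) \le C_\alpha \cdot 2^{-\alpha n} = C_\alpha \cdot |D|^\alpha \quad \text{for all } n \text{ and all } D \in \mathcal{D}_{2^n}
\]
so $\mu$ is $\alpha$-regular in the dyadic sense. Since $\mu(X_E) = 1$, by the mass distribution
principle, $\dim X_E \ge \alpha$. Since this is true for all $\alpha < \underline{d}(\mathbb{N} \setminus E)$, we have $\dim X_E \ge \underline{d}(\mathbb{N} \setminus E)$, as
required.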

Exercises

1. Prove Proposition 3.4.

2. For E ⊆ N and XE as defined at the end of Section 2.2, compute dim XE .

3.2 Billingsley’s lemma

In $\mathbb{R}^d$ there is a very useful generalization of the mass distribution principle due to
Billingsley, which also gives an upper bound on the dimension. We formulate it using
$b$-adic cubes, although the formulation using balls holds as well.
We write $\mathcal{D}_{b^n}(x)$ for the unique element $D \in \mathcal{D}_{b^n}$ containing $x$, so that $\mathcal{D}_{b^n}(x)$,
$n = 1, 2, \ldots$, is a sequence of $b$-adic cubes decreasing to $x$. We also need the following
lemma, which is one of the reasons that working with $b$-adic cubes rather than balls is
so useful:

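Lemma 3.6. Let $\mathcal{E} \subseteq \bigcup_{n=0}^{\infty} \mathcal{D}_{b^n}$ be a collection of $b$-adic cubes. Then there is a sub-collection
$\mathcal{F} \subseteq \mathcal{E}$ whose elements are pairwise disjoint and $\bigcup \mathcal{F} = \bigcup \mathcal{E}$.

Proof. Let $\mathcal{F}$ consist of the maximal elements of $\mathcal{E}$, that is, all $E \in \mathcal{E}$ such that $E \not\subseteq E'$ for every $E' \in \mathcal{E}$
with $E' \ne E$. Since every two $b$-adic cubes are either disjoint or one is contained in
the other, $\mathcal{F}$ is a pairwise disjoint collection, and for the same reason, every $x \in \bigcup \mathcal{E}$ is
contained in a maximal cube from $\mathcal{E}$, hence $\bigcup \mathcal{F} = \bigcup \mathcal{E}$.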
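
For a finite collection of $b$-adic intervals in $\mathbb{R}$, the selection of maximal elements in the proof can be carried out directly. The sketch below is an assumption-laden illustration: an interval $[k/b^n, (k+1)/b^n)$ is encoded as the pair `(n, k)`, the function names are hypothetical, and the quadratic-time search is chosen for clarity rather than efficiency.

```python
def contains(outer, inner, b=2):
    """True if the b-adic interval `inner` = (n, k) is contained in `outer` = (m, j).
    This happens exactly when m <= n and the level-m ancestor of k is j."""
    m, j = outer
    n, k = inner
    return m <= n and k // b ** (n - m) == j

def maximal_subcollection(cubes, b=2):
    """The elements of a finite collection not strictly contained in another element."""
    cubes = set(cubes)
    return {
        c for c in cubes
        if not any(o != c and contains(o, c, b) for o in cubes)
    }

# [0,1/2), [0,1/4), [1/4,1/2), [3/4,1), [7/8,1)
collection = {(1, 0), (2, 0), (2, 1), (2, 3), (3, 7)}
disjoint_cover = maximal_subcollection(collection)
```

In this example the maximal intervals are $[0, 1/2)$ and $[3/4, 1)$; they are disjoint and have the same union as the whole collection.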

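Proposition 3.7 (Billingsley’s lemma). Let $\mu$ be a finite measure on $\mathbb{R}^d$ and $A \subseteq \mathbb{R}^d$ with
$\mu(A) > 0$, and suppose that for some integer base $b \ge 2$,
\[
\alpha_1 \le \liminf_{n\to\infty} \frac{\log \mu(\mathcal{D}_{b^n}(x))}{-n \log b} \le \alpha_2 \quad \text{for every } x \in A \tag{3}
\]
Then $\alpha_1 \le \dim A \le \alpha_2$.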

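Proof. We first prove $\dim A \ge \alpha_1$. Let $\varepsilon > 0$. For any $x \in A$ there is an $n_0 = n_0(x)$
depending on $x$ such that for $n > n_0$,
\[
\mu(\mathcal{D}_{b^n}(x)) \le (b^{-n})^{\alpha_1 - \varepsilon}
\]
Thus we can find an $n_0$ and a set $A_\varepsilon \subseteq A$ with $\mu(A_\varepsilon) > 0$ such that the above holds for
every $x \in A_\varepsilon$ and every $n > n_0$. It follows that $\mu|_{A_\varepsilon}$ is $(\alpha_1 - \varepsilon)$-regular with respect to
$b$-adic partitions, and hence $\dim A_\varepsilon \ge \alpha_1 - \varepsilon$. Since $\dim A \ge \dim A_\varepsilon$ and $\varepsilon$ was arbitrary,
$\dim A \ge \alpha_1$.
Next we prove $\dim A \le \alpha_2$. Let $\varepsilon > 0$ and fix $n_0$. Then for every $x \in A$ we can find
an $n = n(x) > n_0$ such that the cube $D_x = \mathcal{D}_{b^n}(x)$ satisfies $\mu(D_x) \ge (b^{-n})^{\alpha_2 + \varepsilon}$. Apply
Lemma 3.6 to choose a disjoint sub-collection $\{D_{x_i}\}_{i \in I} \subseteq \{D_x\}_{x \in A}$ of maximal cubes, which is also
a cover of $A$. Using the fact that $|D_{x_i}| = C \cdot b^{-n(x_i)}$, and writing $C' = C^{\alpha_2 + 2\varepsilon}$, we have
\[
\begin{aligned}
\mathcal{H}^{\alpha_2 + 2\varepsilon}_\infty(A) &\le \sum_{i \in I} |D_{x_i}|^{\alpha_2 + 2\varepsilon} \\
&= \sum_{i \in I} (C \cdot b^{-n(x_i)})^{\alpha_2 + 2\varepsilon} \\
&= C' \sum_{i \in I} (b^{-n(x_i)})^{\varepsilon} (b^{-n(x_i)})^{\alpha_2 + \varepsilon} \\
&\le C' b^{-\varepsilon n_0} \sum_{i \in I} \mu(D_{x_i}) \\
&\le b^{-\varepsilon n_0} \cdot C' \mu(\mathbb{R}^d)
\end{aligned}
\]
Since $\mu$ is finite and $n_0$ was arbitrary, we find that $\mathcal{H}^{\alpha_2 + 2\varepsilon}_\infty(A) = 0$. Hence $\dim A \le
\alpha_2 + 2\varepsilon$ and since $\varepsilon$ was arbitrary, $\dim A \le \alpha_2$.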

Remark 3.8. The condition that the left inequality in (3) hold for every x ∈ A can be
relaxed: if it holds on a set A′ ⊆ A of positive measure, then the proposition implies
that dim A′ ≥ α1 , so the same is true of A.
In order to conclude dim A ≤ α2 , however, it is essential that (3) hold at every point.
Indeed every non-empty set supports point masses, for which the inequality holds with
α2 = 0, and this of course implies nothing about the set.

As an application we shall compute the dimension of sets of real numbers with


prescribed frequencies of digits. For concreteness we work in base 10. Given a digit
0 ≤ u ≤ 9 and a point x ∈ [0, 1], let x = 0.x1 x2 x3 . . . be the decimal expansion of x and
write
\[
f_u(x) = \lim_{n\to\infty} \frac{1}{n} \#\{1 \le i \le n : x_i = u\}
\]

for the asymptotic frequency with which the digit u appears in the expansion, assuming
that the limit exists.
A number $x$ is called simply normal if $f_u(x) = 1/10$ for all $u = 0, \ldots, 9$. Such
numbers may be viewed as having the statistically most random decimal expansion
(“simple” because we are only considering statistics of single digits rather than blocks
of digits; we will discuss the stronger version later). It is a classical theorem of Borel
that Lebesgue-a.e. $x \in [0,1]$ is simply normal; this is a consequence of the law
of large numbers, since when the digit functions $x_i : [0,1] \to \{0, \ldots, 9\}$ are viewed as
random variables, they are independent and uniform on $\{0, \ldots, 9\}$.
However, there are of course many numbers with other frequencies of digits, and it
is natural to ask how common this is, i.e. how large these sets are. Given a probability

vector p = (p0 , . . . , p9 ) let

N (p) = {x ∈ [0, 1] : fu (x) = pu for u = 0, . . . , 9}

Also, the Shannon entropy of $p$ is
\[
H(p) = -\sum_{i=0}^{9} p_i \log p_i
\]
where $0 \log 0 = 0$ and the logarithm by convention is in base 2.

Proposition 3.9. dim N (p) = H(p)/ log 10.

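Proof. Let $\tilde{\mu}$ denote the product measure on $\{0, \ldots, 9\}^{\mathbb{N}}$ with marginal $p$, and let $\mu$
denote the push-forward of $\tilde{\mu}$ by $(u_1, u_2, \ldots) \mapsto \sum_{i=1}^{\infty} u_i 10^{-i}$. In other words, $\mu$ is the
distribution of a random number whose decimal digits are chosen i.i.d. with marginal
$p$. By the law of large numbers, $\mu(N(p)) = 1$.
For $x = 0.x_1 x_2 \ldots$ it is clear that $\mu(\mathcal{D}_{10^n}(x)) = p_{x_1} p_{x_2} \ldots p_{x_n}$, so if $x \in N(p)$ then
\[
\begin{aligned}
\frac{\log \mu(\mathcal{D}_{10^n}(x))}{-n \log 10} &= -\frac{1}{\log 10} \cdot \frac{1}{n} \sum_{i=1}^{n} \log p_{x_i} \\
&= -\frac{1}{\log 10} \sum_{u=0}^{9} \left( \frac{1}{n} \#\{1 \le i \le n : x_i = u\} \right) \cdot \log p_u \\
&\xrightarrow[n \to \infty]{} -\frac{1}{\log 10} \sum_{u=0}^{9} f_u(x) \cdot \log p_u \\
&= \frac{1}{\log 10} \left( -\sum_{u=0}^{9} p_u \log p_u \right) \\
&= \frac{1}{\log 10} H(p)
\end{aligned}
\]
The claim now follows from Billingsley’s lemma.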
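
The dimension formula $H(p)/\log 10$ is straightforward to evaluate for a given probability vector. The following is a small sketch with hypothetical function names; both logarithms are taken in base 2, in line with the convention above (any common base gives the same ratio).

```python
import math

def entropy_bits(p):
    """Shannon entropy -sum p_i log2 p_i, with the convention 0 log 0 = 0."""
    return -sum(q * math.log2(q) for q in p if q > 0)

def dim_digit_frequency_set(p, base=10):
    """The value H(p) / log(base) from Proposition 3.9, for a probability vector p
    of length `base`."""
    assert len(p) == base and abs(sum(p) - 1) < 1e-12
    return entropy_bits(p) / math.log2(base)

uniform = [0.1] * 10                     # simply normal numbers: value close to 1
two_digits = [0.5, 0.5] + [0.0] * 8      # only digits 0 and 1, equally often
values = (dim_digit_frequency_set(uniform), dim_digit_frequency_set(two_digits))
```

The uniform vector gives a value of 1 up to rounding, and the vector concentrated equally on two digits gives $\log 2 / \log 10$.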

Corollary 3.10. The dimension of the non-simply-normal numbers is 1.

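Proof. Let $p_\varepsilon = (1/10 - \varepsilon, \ldots, 1/10 - \varepsilon, 1/10 + 9\varepsilon)$. Then $H(p_\varepsilon) \to \log 10$ as $\varepsilon \to 0$, and so
$\dim N(p_\varepsilon) \to 1$. Since $N(p_\varepsilon)$ is contained in the set of non-simply-normal numbers for every $\varepsilon > 0$, the
conclusion follows.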

Exercises

1. Show that the dimension of the set of numbers for which the digit frequencies do not exist is 1.

3.3 A metric on symbolic space
For a finite set Λ, the space ΛN can be given the metric

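\[
d(\omega, \eta) = 2^{-n} \quad \text{for } n = \min\{k \ge 0 : \omega_{k+1} \ne \eta_{k+1}\}
\]

(and $d(\omega, \omega) = 0$). This metric is compatible with the product topology, which is compact. In this metric,
a sequence $w^{(k)} \in \Lambda^{\mathbb{N}}$ converges to $w$ if and only if for every $\ell$, $w^{(k)}|_\ell = w|_\ell$ for all large
enough $k$.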
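
The metric can be illustrated on finite prefixes. This is only a sketch under the assumption that the two sequences are represented by tuples of equal length and are treated as equal if no difference is found within that length; the function name `symbolic_distance` is hypothetical. The loop at the end checks the strong triangle (ultrametric) inequality on all binary words of length 4.

```python
from itertools import product

def symbolic_distance(omega, eta):
    """d(omega, eta) = 2^(-n), where n is the number of leading coordinates on
    which the two sequences agree; 0.0 if they agree on the whole prefix."""
    for n, (a, b) in enumerate(zip(omega, eta)):
        if a != b:
            return 2.0 ** (-n)
    return 0.0

words = list(product((0, 1), repeat=4))
ultrametric_ok = all(
    symbolic_distance(x, z) <= max(symbolic_distance(x, y), symbolic_distance(y, z))
    for x in words for y in words for z in words
)
```

Two sequences that first differ in the third coordinate are at distance $2^{-2}$, and sequences that differ already in the first coordinate are at distance 1.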
Lemma 3.11. $\Lambda^{\mathbb{N}}$ is compact in the metric $d$.
Proof. Let $(w^{(n)})_{n=1}^{\infty} \subseteq \Lambda^{\mathbb{N}}$ be a sequence. We must show that it has a convergent
subsequence.
Write $w^{(0,n)} = w^{(n)}$. Since $\Lambda$ is finite, some element $u_1 \in \Lambda$ appears in infinitely many of the
sequences $w^{(0,n)}$ as the first symbol; so we can choose a subsequence $(w^{(1,n)})_{n=1}^{\infty}$ of
$(w^{(0,n)})_{n=1}^{\infty}$ whose members all start with $u_1$.
Next, define $u_2$ to be a symbol appearing as the second symbol in infinitely
many of the elements $w^{(1,n)}$, and let $(w^{(2,n)})_{n=1}^{\infty}$ be a subsequence of $(w^{(1,n)})_{n=1}^{\infty}$ whose
elements all have $u_2$ in the second coordinate.
Continue in this way inductively: given $(w^{(k,n)})_{n=1}^{\infty}$ we define a subsequence $(w^{(k+1,n)})_{n=1}^{\infty}$
of $(w^{(k,n)})_{n=1}^{\infty}$ consisting of elements that all have some fixed $u_{k+1} \in \Lambda$ in their $(k+1)$-th
coordinate.
Finally, the sequence $(w^{(n,n)})_{n=1}^{\infty}$ is a subsequence of the original sequence $(w^{(n)})$,
and for all $n$ we have $w^{(n,n)}|_n = u_1 \ldots u_n$, so $w^{(n,n)} \to u = u_1 u_2 \ldots \in \Lambda^{\mathbb{N}}$.

The cylinder sets

[ω1 . . . ωn ] = {η ∈ ΛN : η1 . . . ηn = ω1 . . . ωn }

have diameter $2^{-n}$ and they are both open and closed; they are closed because every
point of $[w]$ begins with $w$, and so every limit point must also begin with
$w$. They are open because, for $w \in \Lambda^n$,
\[
[w] = \Lambda^{\mathbb{N}} \setminus \bigcup_{\eta \in \Lambda^n \setminus \{w\}} [\eta]
\]
so $[w]$ is the complement of a finite union of closed sets, and is hence open. Furthermore,
it is not hard to see that every ball in the metric $d$ is a cylinder set: if $\omega \in \Lambda^{\mathbb{N}}$
then
\[
B_{2^{-n}}(\omega) = \{\tau \in \Lambda^{\mathbb{N}} : d(\tau, \omega) \le 2^{-n}\} = [\omega_1 \ldots \omega_n]
\]
and since all distances in $\Lambda^{\mathbb{N}}$ are of the form $2^{-n}$ or 0, for every $2^{-(n+1)} < r < 2^{-n}$ we
have $B_r(\omega) = B_r^\circ(\omega) = B_{2^{-(n+1)}}(\omega)$.

3.4 Measure on symbolic space


We write M(ΛN ) for the set of positive finite Borel measures on ΛN .
Let An denote the algebra generated by the cylinders [a] for a ∈ Λn . Since for k ≤ n
every C ∈ Ck is the disjoint union of the cylinders C ′ ∈ Cn intersecting C, it follows
easily that An is the family of finite unions of elements of Cn . In particular all elements
of An are open and compact.
Each $\mathcal{A}_n$ is a finite algebra and hence a $\sigma$-algebra. Since $\mathcal{A}_n \subseteq \mathcal{A}_{n+1}$, the family
$\mathcal{A} = \bigcup_{n=1}^{\infty} \mathcal{A}_n$ is a countable algebra that is not a $\sigma$-algebra. However,

Lemma 3.12. Every finitely additive measure µ on A extends to a σ-additive measure


on the Borel sets of ΛN .

Proof. Since $\mathcal{A}$ consists of open sets and contains all cylinder sets (i.e. all balls) it
generates the Borel $\sigma$-algebra. Since $\mu$ is finitely additive, the statement will follow if
we show that $(\Lambda^{\mathbb{N}}, \mathcal{A}, \mu)$ satisfies the conditions of the Caratheodory extension theorem,
namely, that if $A_1, A_2, \ldots \in \mathcal{A}$ are pairwise disjoint and $A \in \mathcal{A}$, and if $A = \bigcup A_i$, then
$\mu(A) = \sum \mu(A_i)$.
Indeed, $A$ is a finite union of (closed) cylinder sets, hence is itself closed, and therefore
compact; and the $A_i$ are unions of (open) cylinder sets, so they are open; combining
these observations, by compactness there exists a finite sub-cover $\{A_i\}_{i \in I}$ of $A$; but since
the $A_i$ are disjoint and $A = \bigcup A_i$, we conclude that $A_j = \emptyset$ for $j \in \mathbb{N} \setminus I$; finally, by
disjointness and finite additivity of $\mu$,
\[
\mu(A) = \mu\Big(\bigcup_{i \in I} A_i\Big) = \sum_{i \in I} \mu(A_i) = \sum_{i=1}^{\infty} \mu(A_i) \quad \text{because } \mu(A_i) = 0 \text{ for } i \notin I
\]

as desired.

The previous lemma is the reason that working in $\Lambda^{\mathbb{N}}$ is more convenient than
working in $[0,1]^d$. In the latter space the sets in $\bigcup_n \mathcal{D}_{2^n}$ also generate a countable algebra, but
the extension theorem doesn’t automatically hold.

Definition 3.13. For µn , µ ∈ M(ΛN ), we write µn → µ if µn (C) → µ(C) for every


cylinder set C.

Lemma 3.14. For n ∈ N, let µn ∈ M(ΛN ) with µn (ΛN ) ≤ 1. Then there is a subse-
quence nk → ∞ and µ ∈ M(ΛN ) such that µnk → µ.
Proof. Since $\mathcal{A} = \bigcup \mathcal{A}_n$ is countable, a diagonal argument similar to the one in the
proof of Lemma 3.11 lets us define a subsequence $(\mu_{n_k})_{k=1}^{\infty}$ of $(\mu_n)$ such that $\mu_{n_k}(A)$ converges
for all $A \in \mathcal{A}$.
Define $\mu : \mathcal{A} \to [0,1]$ by
\[
\mu(A) = \lim_{k\to\infty} \mu_{n_k}(A) \quad \text{for } A \in \mathcal{A}
\]
For any two disjoint sets $A', A'' \in \mathcal{A}$ we have
$\mu_{n_k}(A' \cup A'') = \mu_{n_k}(A') + \mu_{n_k}(A'')$ for all $k$. Taking the limit as $k \to \infty$
the same holds for $\mu$, so $\mu$ is finitely additive, and by Lemma 3.12 it extends to
a countably additive Borel measure. Since every cylinder set belongs to $\mathcal{A}$, we have $\mu_{n_k} \to \mu$.

Lemma 3.15. If Y ⊆ ΛN is closed, µn ∈ M(ΛN ) are supported on Y , and µn → µ,


then µ is supported on Y .

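Proof. $\Lambda^{\mathbb{N}} \setminus Y$ is open, so it is a countable union of cylinder sets $C_1, C_2, \ldots$.
For every $k$, since each $\mu_n$ is supported on $Y$, we have $\mu_n(C_k) = 0$ for all $n$, so also $\mu(C_k) =
\lim \mu_n(C_k) = 0$.
Thus
\[
\mu(\Lambda^{\mathbb{N}} \setminus Y) = \mu\Big(\bigcup C_k\Big) \le \sum_k \mu(C_k) = 0
\]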

so µ is supported on Y .

With the exception of Lemma 3.12, everything we did here can be done in a general
compact metric space $(X, d)$. Then convergence of measures $\mu_n \to \mu$ is defined by the
condition that $\int f \, d\mu_n \to \int f \, d\mu$ for all $f \in C(X)$; this definition is equivalent to ours
for $\Lambda^{\mathbb{N}}$, and is called weak-* convergence. Using separability of $C(X)$, one can prove
sequential compactness for this notion of convergence. Using separability of $X$, one can
also establish the analog of Lemma 3.15.

Definition 3.16. Let (X, A), (Y, B) be measurable spaces and f : X → Y a measurable
map. The push-forward of a measure µ on (X, A) through f is the measure f∗ µ on (Y, B)
defined by
(f∗ µ)(B) = µ(f −1 (B))

Exercises

1. If $\mu$ is an $\alpha$-regular measure on a metric space $(X, d)$ and $f : X \to Y$, what can one
say about the regularity of $f_*\mu$ assuming that $f$ is Lipschitz or that it is $\gamma$-Hölder?

2. For $\mu_n, \mu \in M(\Lambda^{\mathbb{N}})$, show that $\mu_n \to \mu$ if and only if $\int f \, d\mu_n \to \int f \, d\mu$ for every
$f \in C(\Lambda^{\mathbb{N}})$.

3. Show that if $\omega^{(n)} \in \Lambda^{\mathbb{N}}$ and $\omega^{(n)} \to \omega$ then $\delta_{\omega^{(n)}} \to \delta_\omega$.

4. A measure is atomic if it is a linear combination of delta masses. Is the limit of
atomic measures on $\Lambda^{\mathbb{N}}$ also atomic?

5. If Y ⊆ ΛN is closed and µn → µ in ΛN , is it true that µn (Y ) → µ(Y )?

3.5 Frostman’s lemma


In the examples above we were fortunate enough to find measures which gave optimal
lower bounds on the dimension of the sets we were investigating, allowing us to compute
their dimension. It turns out that this is not entirely a matter of luck.

Theorem 3.17 (Frostman’s “lemma”). If X ⊆ Rd is closed and Hα∞ (X) > 0, then
there is an α-regular probability measure supported on X.

Corollary 3.18. If dim X = α then for every 0 ≤ β < α there is a β-regular probability
measure µ on X.

Proof of the Corollary. If $\beta < \alpha$ and $\dim X = \alpha$ then by definition, $\mathcal{H}^\beta_\infty(X) > 0$, and
the claim follows from the theorem.

The corollary is not true for $\beta = \alpha$. Indeed, if $X = \bigcup X_n$ and $\dim X_n = \alpha - 1/n$
then $\dim X = \alpha$, but any $\alpha$-regular measure $\mu$ must satisfy $\mu(X_n) = 0$ for all $n$ (since
if $\mu(X_n) > 0$ then $\dim X_n \ge \alpha$ by the mass distribution principle), and hence $\mu(X) \le
\sum \mu(X_n) = 0$.
In order to prove the theorem we may assume without loss of generality that $X \subseteq
[0,1]^d$. Indeed we can intersect $X$ with each of the level-0 dyadic cubes, writing $X =
\bigcup_{D \in \mathcal{D}_0} X \cap D$, and we saw in the proof of Proposition 2.10 that if $\mathcal{H}^\alpha_\infty(X \cap D) = 0$ for
each $D$ in the union then $\mathcal{H}^\alpha_\infty(X) = 0$. Thus there is a $D \in \mathcal{D}_0$ for which $\mathcal{H}^\alpha_\infty(X \cap D) > 0$,
and by translating $X$ we may assume that $D = [0,1]^d$.
For the proof, it is convenient to transfer the problem to the symbolic setting. Let
$\Lambda = \{0,1\}^d$. Then $\Lambda^{\mathbb{N}}$ can be identified with $(\{0,1\}^{\mathbb{N}})^d$, where $\omega \in \Lambda^{\mathbb{N}}$ is identified with
the $d$-tuple of sequences obtained by projecting $\omega$ to each coordinate of the space $\Lambda$.
Define

\[
\pi^d : \Lambda^{\mathbb{N}} \to [0,1]^d
\]

by
\[
(\omega^{(1)}, \ldots, \omega^{(d)}) \mapsto (\pi(\omega^{(1)}), \ldots, \pi(\omega^{(d)}))
\]

Then $\pi^d$ maps $\Lambda^{\mathbb{N}}$ onto $[0,1]^d$. One may verify that

• For $D \in \mathcal{D}_n$, the set $(\pi^d)^{-1}(D)$ can be covered by $2^d$ cylinder sets from $\mathcal{C}_n$.

• For $C \in \mathcal{C}_n$, the set $\pi^d(C)$ can be covered by $2^d$ sets from $\mathcal{D}_n$.

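Lemma 3.19. 1. If $Y \subseteq \Lambda^{\mathbb{N}}$ is closed and $X = \pi^d Y$ (in particular, if $Y = (\pi^d)^{-1}(X)$),
then
\[
c_1 \cdot \mathcal{H}^\alpha_\infty(Y) \le \mathcal{H}^\alpha_\infty(X) \le c_2 \cdot \mathcal{H}^\alpha_\infty(Y)
\]
for constants $0 < c_1, c_2 < \infty$ depending only on $d$.

2. If $\mu$ is a probability measure on $\Lambda^{\mathbb{N}}$ and $\nu = \pi^d_* \mu$, then $\mu$ is $\alpha$-regular if and only
if $\nu$ is $\alpha$-regular.

Proof. Briefly, for (1), note that by the two properties stated before the lemma, $\mathcal{H}^\alpha_\infty(Y)$ and $\mathcal{H}^\alpha_\infty(X, \bigcup \mathcal{D}_n)$
are comparable up to multiplicative constants, and that $\mathcal{H}^\alpha_\infty(X)$ and $\mathcal{H}^\alpha_\infty(X, \bigcup \mathcal{D}_n)$ are similarly
comparable. Part (2) follows from the same properties and the fact that every ball
in $[0,1]^d$ is contained in the union of boundedly many elements of $\bigcup \mathcal{D}_n$ of comparable diameter.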

Thus, Theorem 3.17 is equivalent to the analogous statement in ΛN . It is the latter


statement that we will prove:

Theorem 3.20. Let Y ⊆ ΛN be a closed set with Hα∞ (Y ) > 0. Then there is an
α-regular probability measure supported on Y .

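Proof. Let $Y \subseteq \Lambda^{\mathbb{N}}$ be closed with $\mathcal{H}^\alpha_\infty(Y) > 0$. We will produce the desired measure
as a limit of suitable “finite” approximations.
For $n \in \mathbb{N}$, we say that a measure $\mu$ on $\Lambda^{\mathbb{N}}$ is $n$-admissible if it is supported on $Y$ and for every $k \le n$ and
$C \in \mathcal{C}_k$,
\[
\mu(C) \le \begin{cases} 2^{-\alpha k} & \text{if } C \cap Y \ne \emptyset \\ 0 & \text{otherwise} \end{cases} \tag{4}
\]
Note that such a measure takes values in $[0,1]$.
Let
\[
\mathcal{M}_n = \{\mu \in M(\Lambda^{\mathbb{N}}) : \mu \text{ is } n\text{-admissible}\}
\]
This set is not empty because it contains the zero measure.

Choose $\mu_n \in \mathcal{M}_n$ that maximizes the function $f : \nu \mapsto \nu(\Lambda^{\mathbb{N}})$. Such a maximizer
exists because $f$ is continuous and $\mathcal{M}_n$ is compact. In detail, choose $\mu_{n,k} \in \mathcal{M}_n$ such
that
\[
\lim_{k\to\infty} \mu_{n,k}(\Lambda^{\mathbb{N}}) = \sup\{\nu(\Lambda^{\mathbb{N}}) : \nu \in \mathcal{M}_n\}
\]
Let $\mu_n$ be a subsequential limit of $(\mu_{n,k})_{k=1}^{\infty}$, which exists by Lemma 3.14. Since $\Lambda^{\mathbb{N}} = [\emptyset]$ is a cylinder set, $\mu_n(\Lambda^{\mathbb{N}})$
is the limit of $\mu_{n,k}(\Lambda^{\mathbb{N}})$ along the subsequence, and so is equal to the right hand side above. Also, since (4) is
defined by weak inequalities on the masses of cylinder sets, it holds for $\mu_n$, and $\mu_n$ is
supported on $Y$ because $Y$ is closed (Lemma 3.15). Thus $\mu_n$ is $n$-admissible.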
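Next, let $\mu$ be a measure on $(\Lambda^{\mathbb{N}}, \text{Borel})$ which arises as a sub-sequential limit
$\mu = \lim \mu_{n_k}$. It is immediate that
\[
\mu([a_1 \ldots a_m]) = \lim_{k\to\infty} \mu_{n_k}([a_1 \ldots a_m]) \le \begin{cases} 2^{-\alpha m} & \text{if } [a_1 \ldots a_m] \cap Y \ne \emptyset \\ 0 & \text{otherwise} \end{cases}
\]
Hence $\mu$ is $\alpha$-regular, since every ball in $\Lambda^{\mathbb{N}}$ is a cylinder set. It is also supported on $Y$, since each $\mu_{n_k}$ is.

To complete the proof we must show that $\mu(Y) > 0$, which by the above is the same
as $\mu \not\equiv 0$; normalizing $\mu$ then gives the desired probability measure. To this end we shall prove

Lemma 3.21. $\mu_n(\Lambda^{\mathbb{N}}) \ge \mathcal{H}^\alpha_\infty(Y)$ for each $n = 1, 2, \ldots$.

Once proved it will follow that $\mu(\Lambda^{\mathbb{N}}) = \lim \mu_{n_k}(\Lambda^{\mathbb{N}}) \ge \mathcal{H}^\alpha_\infty(Y) > 0$, so $\mu \ne 0$.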

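Proof. Fix $n$. First we claim that for every $\omega \in Y$ there is some $0 \le k \le n$ such
that equality holds in (4) for $C = [\omega_1 \ldots \omega_k]$. For suppose not; then there is a point
$\omega = \omega_1 \omega_2 \ldots \in Y$ such that $\mu_n([\omega_1 \ldots \omega_k]) < 2^{-\alpha k}$ for all $0 \le k \le n$. Define
\[
c = \min\{2^{-\alpha k} - \mu_n([\omega_1 \ldots \omega_k]) : 0 \le k \le n\}
\]
so that $c > 0$, and let $\mu_n' = \mu_n + c \cdot \delta_\omega$. Then $\mu_n'$ is $n$-admissible, since (4) holds for
$C = [\omega_1 \ldots \omega_k]$ by choice of $c$, and for any other cylinder set $C'$ it holds because $\omega \notin C'$
and therefore $\mu_n'(C') = \mu_n(C')$. But now $\mu_n'(\Lambda^{\mathbb{N}}) = \mu_n(\Lambda^{\mathbb{N}}) + c$, contradicting maximality
of $\mu_n$.
Thus for every $\omega = \omega_1 \omega_2 \ldots \in Y$ we have at least one cylinder set $C_\omega = [\omega_1 \ldots \omega_k]$
with $0 \le k \le n$ and such that $\mu_n(C_\omega) = 2^{-\alpha k}$.
Let $\mathcal{E} = \{C_\omega\}_{\omega \in Y}$ be the cover of $Y$ thus obtained. The argument of Lemma 3.6 (any two cylinder sets are either disjoint or nested) provides us with a
disjoint subcover $\mathcal{F} \subseteq \mathcal{E}$ of $Y$.
Finally, for $F \in \mathcal{F}$ with $F \in \mathcal{C}_k$ we have $\mu_n(F) = 2^{-\alpha k} = |F|^\alpha$, hence
\[
\mathcal{H}^\alpha_\infty(Y) \le \sum_{F \in \mathcal{F}} |F|^\alpha = \sum_{F \in \mathcal{F}} \mu_n(F) = \mu_n(Y) = \mu_n(\Lambda^{\mathbb{N}})
\]
as claimed.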

It may be of interest to note that the argument in the proof above is a variant of the
max flow/min cut theorem from graph theory. To see this, consider $\Lambda^{\le n}$, the tree
of height $n + 1$ in $\Lambda^{\mathbb{N}}$. The lemma shows that the maximal flow from the root $[\emptyset] = \Lambda^{\mathbb{N}}$
to the set of leaves $a \in \Lambda^n$ is equal to the weight of a minimal cut, and that the weight of
any cutset is bounded below by $\mathcal{H}^\alpha_\infty(Y)$. See ??.
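
On a finite tree the maximal total mass of an $n$-admissible measure can be computed by a simple recursion, which may help in visualizing the max flow interpretation. The sketch below is not taken from the text: it assumes that $Y$ is described by the finite set of its length-$n$ prefixes, it uses the standard fact that on a tree with vertex capacities the maximal flow through a vertex is the minimum of its capacity and the total flow through its children, and the function name `max_admissible_mass` is hypothetical.

```python
def max_admissible_mass(words, alpha, alphabet=(0, 1)):
    """Largest total mass of a measure with mu(C) <= 2^(-alpha*k) on every
    cylinder C of length k <= n meeting Y, and mu(C) = 0 on the others.
    `words` is the non-empty set of length-n prefixes of points of Y."""
    words = {tuple(w) for w in words}
    n = len(next(iter(words)))
    # All prefixes of the given words: these are the cylinders that meet Y.
    prefixes = {w[:k] for w in words for k in range(n + 1)}

    def flow(node):
        if node not in prefixes:
            return 0.0                       # cylinder disjoint from Y
        cap = 2.0 ** (-alpha * len(node))    # capacity 2^(-alpha*k) at depth k
        if len(node) == n:
            return cap                       # leaf: use the full capacity
        return min(cap, sum(flow(node + (a,)) for a in alphabet))

    return flow(())

full = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
mass_full = max_admissible_mass(full, alpha=1.0)       # whole space, alpha = 1
mass_point = max_admissible_mass([(0, 0)], alpha=1.0)  # a single branch of depth 2
```

For the full binary tree with $\alpha = 1$ every capacity is attained and the total mass is 1, while for a single branch of depth 2 the bottleneck is the leaf, with capacity $2^{-2}$.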
We have proved Frostman’s lemma for closed sets in Rd but the result is known
far more generally for Borel sets in complete metric spaces. See Mattila ?? for further
discussion.

4 Product sets
In this section we consider product sets. For simplicity, we restrict the discussion to
$\mathbb{R}^d$, although the results hold in general metric spaces. It is convenient to work with
the sup-norm $\|\cdot\|_\infty$, because under this norm if $A \subseteq \mathbb{R}^d$ and $B \subseteq \mathbb{R}^k$ are bounded, then
$A \times B \subseteq \mathbb{R}^{d+k}$ and $|A \times B| = \max\{|A|, |B|\}$.

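Proposition 4.1. If $X \subseteq \mathbb{R}^d$ and $Y \subseteq \mathbb{R}^k$ are bounded and if $\dim_M X$, $\dim_M Y$ exist, then
\[
\dim_M X \times Y = \dim_M X + \dim_M Y
\]
In general, we have
\[
\overline{\dim}_M X \times Y \le \overline{\dim}_M X + \overline{\dim}_M Y
\]
\[
\underline{\dim}_M X \times Y \ge \underline{\dim}_M X + \underline{\dim}_M Y
\]
and if one of $\dim_M X$, $\dim_M Y$ exists, then the inequalities above are equalities.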
Proof. A $b$-adic cell in $\mathbb{R}^d \times \mathbb{R}^k$ is the product of two $b$-adic cells from $\mathbb{R}^d$ and $\mathbb{R}^k$, and it is
simple to verify that
\[
N(X \times Y, \mathcal{D}_b) = N(X, \mathcal{D}_b) \cdot N(Y, \mathcal{D}_b)
\]
Taking logarithms and inserting this into the definition of $\dim_M$, the claim follows from
properties of lim sup and lim inf.
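
The product formula for covering numbers can be observed on finite point sets. This is a small sketch under the assumption that points are given as tuples of floats and cells are indexed by the integer parts of the rescaled coordinates; the function names `cells` and `product_points` are hypothetical.

```python
import math

def cells(points, n, b=2):
    """Indices of the b^n-adic cubes that contain at least one of the given points."""
    return {tuple(math.floor(c * b**n) for c in p) for p in points}

def product_points(X, Y):
    """The Cartesian product of two finite point sets, as concatenated tuples."""
    return [x + y for x in X for y in Y]

X = [(0.1,), (0.3,), (0.8,)]
Y = [(0.2,), (0.7,)]
n = 2
lhs = len(cells(product_points(X, Y), n))
rhs = len(cells(X, n)) * len(cells(Y, n))
```

At scale $2^{-2}$ the set `X` meets three dyadic intervals and `Y` meets two, and the product set meets exactly six dyadic squares.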

Turning to Hausdorff dimension, the situation is more subtle.

Proposition 4.2. For $X, Y \subseteq \mathbb{R}^d$,
\[
\dim X + \dim Y \le \dim(X \times Y) \le \dim X + \overline{\dim}_M Y
\]

Proof. Write α = dim X and β = dim Y .


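We first prove $\dim(X \times Y) \ge \alpha + \beta$. Let $\varepsilon > 0$ and apply Frostman’s lemma to
obtain an $(\alpha - \varepsilon)$-regular probability measure $\mu_\varepsilon$ supported on $X$ and a $(\beta - \varepsilon)$-regular
probability measure $\nu_\varepsilon$ supported on $Y$. Then $\theta_\varepsilon = \mu_\varepsilon \times \nu_\varepsilon$ is a probability measure
supported on $X \times Y$. We claim that it is $(\alpha + \beta - 2\varepsilon)$-regular. Indeed, assuming without
loss of generality that we are using the $\ell^\infty$ norm on all spaces involved, for $(x, y) \in X \times Y$
we have $B_r(x, y) = B_r(x) \times B_r(y)$ so
\[
\theta_\varepsilon(B_r(x, y)) = \mu_\varepsilon(B_r(x)) \cdot \nu_\varepsilon(B_r(y)) \le C_1 r^{\alpha - \varepsilon} \cdot C_2 r^{\beta - \varepsilon} = C r^{\alpha + \beta - 2\varepsilon}
\]
Hence by the mass distribution principle, $\dim X \times Y \ge \alpha + \beta - 2\varepsilon$, and since $\varepsilon$ was
arbitrary, $\dim X \times Y \ge \alpha + \beta$.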

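For the other inequality write $\gamma = \overline{\dim}_M Y$ and let $0 < \varepsilon < 1$. Since $\mathcal{H}^{\alpha+\varepsilon}_\infty(X) = 0$
we can find a cover $X \subseteq \bigcup_{i=1}^{\infty} A_i$ with $\sum |A_i|^{\alpha+\varepsilon} < \varepsilon$, and in particular $|A_i| < \varepsilon^{1/(\alpha+1)}$
for each $i$.

Next, for each $i$, there is a cover $A_{i,1}, \ldots, A_{i,N(Y,|A_i|)}$ of $Y$ by $N(Y, |A_i|)$ sets of
diameter $|A_i|$.

Assuming $\varepsilon$ is small enough, using $|A_i| < \varepsilon^{1/(\alpha+1)}$ and the definition of $\gamma$, we have
that $N(Y, |A_i|) < |A_i|^{-(\gamma+\varepsilon)}$ for each $i$. Thus $\{A_i \times A_{i,j}\}$ is a cover of $X \times Y$ satisfying
\[
\begin{aligned}
\sum_{i=1}^{\infty} \sum_{j=1}^{N(Y,|A_i|)} |A_i \times A_{i,j}|^{\alpha+\gamma+2\varepsilon} &= \sum_{i=1}^{\infty} |A_i|^{\alpha+\gamma+2\varepsilon} N(Y, |A_i|) \\
&\le \sum_{i=1}^{\infty} |A_i|^{\alpha+\gamma+2\varepsilon} |A_i|^{-(\gamma+\varepsilon)} \\
&= \sum_{i=1}^{\infty} |A_i|^{\alpha+\varepsilon} < \varepsilon
\end{aligned}
\]
This shows that $\mathcal{H}^{\alpha+\gamma+2\varepsilon}_\infty(X \times Y) < \varepsilon$. Since the sets in these covers have diameter less than 1, the same covers show that $\mathcal{H}^{s}_\infty(X \times Y) = 0$ for every $s > \alpha + \gamma$, so $\dim X \times Y \le \alpha + \gamma$, as desired.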

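Corollary 4.3. If $\dim X = \overline{\dim}_M X$ or $\dim Y = \overline{\dim}_M Y$ then
\[
\dim X \times Y = \dim X + \dim Y
\]
If both $\dim X = \overline{\dim}_M X$ and $\dim Y = \overline{\dim}_M Y$ then
\[
\dim X \times Y = \dim_M X \times Y
\]

Proof. Suppose e.g. that $\dim Y = \overline{\dim}_M Y$. Then
\[
\dim X \times Y \ge \dim X + \dim Y = \dim X + \overline{\dim}_M Y \ge \dim X \times Y
\]
so we have equalities throughout.

Now suppose that $\dim X = \overline{\dim}_M X$ and $\dim Y = \overline{\dim}_M Y$. Then
\[
\begin{aligned}
\dim X \times Y &\le \underline{\dim}_M (X \times Y) \le \overline{\dim}_M (X \times Y) \\
&\le \overline{\dim}_M X + \overline{\dim}_M Y \\
&= \dim X + \dim Y \\
&= \dim X \times Y
\end{aligned}
\]
so all are equalities.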

The following example shows that one cannot do much better than this: although
we always have dim X × Y ≥ dim X + dim Y , the ineuqality may be strict. In fact, we
show that it may happen that dim X = dim Y = 0 but dim X × Y = 1.
Recall that for E ⊆ N the set XE is the set of x ∈ [0, 1] whose n-th binary digit
is 0 if n ∈ E, and otherwise may be 0 or 1. We saw in Example 3.5 that dim XE =
d(N \ E) = lim inf n1 |{1, . . . , n} \ E|. Now let E, F ⊆ N be the sets


[
E = N∩ [(2n)!, (2n + 1)!)
n=1
[∞
F = N∩ [(2n + 1)!, (2n)!)
n=1

These sets are complementary, and it is clear that d(E) = d(F ) = 0, so dim XE =
dim XF = 0.
On the other hand observe that for every x ∈ [0, 1] there are x1 ∈ XE and
x2 ∈ XF such that x1 + x2 = x, since for x1 we can take the number whose binary
expansion is the same as that of x at coordinates outside E but 0 elsewhere, and
similarly for x2 using F . Writing π(x, y) = x + y, we have shown that π(XE × XF ) ⊇ [0, 1]
(in fact there is equality). But π is a Lipschitz map R × R → R, so dim XE × XF ≥
dim π(XE × XF ) ≥ dim[0, 1] = 1.
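
The digit-splitting argument can be checked on truncated binary expansions. The following is a minimal sketch and not part of the original notes: the helper names `in_E`, `split_digits` and `value` are hypothetical, and only finitely many digits are used. It sends the digits of x at positions outside E to x1 and the digits at positions in E to x2, so that x1 vanishes on E, x2 vanishes off E (in particular on F), and x1 + x2 = x.

```python
from fractions import Fraction
from math import factorial


def in_E(n: int) -> bool:
    """True if n lies in [(2m)!, (2m+1)!) for some m >= 1."""
    m = 1
    while factorial(2 * m) <= n:
        if n < factorial(2 * m + 1):
            return True
        m += 1
    return False


def split_digits(bits):
    """bits[i] is the (i+1)-th binary digit of x; returns the digits of x1 and x2."""
    x1 = [0 if in_E(i + 1) else b for i, b in enumerate(bits)]  # zero on E
    x2 = [b if in_E(i + 1) else 0 for i, b in enumerate(bits)]  # zero off E
    return x1, x2


def value(bits):
    """Exact value of the binary expansion 0.b1 b2 b3 ..."""
    return sum(Fraction(b, 2 ** (i + 1)) for i, b in enumerate(bits))


bits = [1, 0, 1, 1, 0, 1, 1, 1]
x1, x2 = split_digits(bits)
print(value(x1), value(x2), value(bits))
```

Exact rational arithmetic is used so that the identity x1 + x2 = x can be compared without rounding error.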

Remark 4.4. There is a slight generalization of Proposition 4.2 using the notion of
packing dimension, which is defined by

\[ \operatorname{pdim} X = \inf\Big\{ \sup_i \dim_M X_i : \{X_i\}_{i=1}^\infty \text{ is a partition of } X \Big\} \]

This notion is designed to fix the deficiency of box dimension with regard to countable unions, since it is easy to verify that $\operatorname{pdim} \bigcup_n A_n = \sup_n \operatorname{pdim} A_n$. We will not discuss it much but note that pdim is a natural notion of dimension in certain contexts, and can also be defined intrinsically in a manner similar to the definition of Hausdorff dimension, which is the definition that is usually given. In particular, note that if $Y = \bigcup_{n=1}^\infty Y_n$ then by the previous theorem,

\[ \dim X \times Y = \dim \bigcup_{n=1}^\infty (X \times Y_n) \le \sup_n (\dim X + \dim_M Y_n) = \dim X + \sup_n \dim_M Y_n \]

Optimizing over partitions $Y = \bigcup_n Y_n$ and using the definition of pdim, we find that

dim X × Y ≤ dim X + pdim Y

Exercises

1. Prove that in Proposition 4.1, a strict inequality is possible for upper and lower
Minkowski dimensions.

2. Prove the conclusion of Proposition 4.1 for general metric spaces. For this purpose
define the metric in X × Y by d((x, y), (x′ , y ′ )) = max{d(x, x′ ), d(y, y ′ )}.

3. For every 0 ≤ α, β < 1 with α + β < 1, show that there are sets X, Y ⊆ [0, 1] such
that dim X = α, dim Y = β and dim X × Y = 1.

5 Differentiation of measures in Rd
We have seen that measures can play an important auxiliary role in computing the
dimension of sets. In this section we establish some general results on the local structure
of measures in Rd , which, roughly speaking, show that the local structure of a measure
µ on a set A ⊆ Rd is independent of its structure on Rd \ A. We also obtain local criteria for absolute
continuity of one measure with respect to another.

5.1 The Besicovitch covering theorem


In this section we develop some combinatorial machinery related to collections of balls
in Rd . Recall our convention that balls are closed. (Some of the results in this section are not valid if one uses open balls.)
Parts of the discussion below are valid in any metric space but the main results are
special for Rd . The choice of norm on Rd is not very significant, but may affect the
constants. For concreteness we fix the Euclidean norm.

Definition 5.1. Given r > 0, a set A ⊆ Rd is r-separated if every x, y ∈ A satisfy


d(x, y) ≥ r.

By Zorn’s lemma, given r > 0, every set in Rd contains r-separated sets which are
maximal with respect to inclusion. By separability, any r-separated set in Rd is at
most countable.

Lemma 5.2. Let r > 0 and let A ⊆ Rd be r-separated. Then |B2r (z) ∩ A| ≤ C for
every z ∈ Rd , where C = C(d).

Proof. If this were false then we could find sequences rn > 0, points xn ∈ Rd and
rn -separated En ⊆ Rd such that

|B2rn (xn ) ∩ En | ≥ n

By re-scaling and translating xn to the origin we find that B2 (0) contains 1-separated
sets of arbitrarily large size. This contradicts the compactness of B2 (0).

Definition 5.3. Let E be a family of subsets of a set.

1. We say that E has bounded diameters if supE∈E |E| < ∞.

2. We say that E has multiplicity C if no point is contained in more than C elements


of E.

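Thus, if a cover $\mathcal{E}$ of $A$ has multiplicity $C$, then

\[ 1_A \le \sum_{E \in \mathcal{E}} 1_E \le C \]

Restricting the right inequality to $A$ gives $1_A \ge \frac{1}{C} \sum_{E \in \mathcal{E}} 1_{E \cap A}$, so for any measure $\mu$,

\begin{align*}
\mu(A) &= \int 1_A \, d\mu \\
&\ge \int \frac{1}{C} \sum_{E \in \mathcal{E}} 1_{E \cap A} \, d\mu \\
&= \frac{1}{C} \sum_{E \in \mathcal{E}} \mu(A \cap E)
\end{align*}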

Thus, a measure is “almost” super-additive on families of sets with bounded multiplicity.
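
The inequality can be illustrated on a finite example. The following sketch is not from the notes: it uses a weighted counting measure on the integers and a cover by overlapping blocks of three consecutive integers, and the function names are hypothetical.

```python
from collections import Counter


def multiplicity(cover):
    """Largest number of sets of the cover that contain a single point."""
    counts = Counter(p for E in cover for p in E)
    return max(counts.values())


def almost_superadditive(A, cover, weight):
    """Return (mu(A), sum over E of mu(A & E), C) for the measure with the given point weights."""
    def mu(S):
        return sum(weight(p) for p in S)

    return mu(A), sum(mu(A & E) for E in cover), multiplicity(cover)


A = set(range(20))
cover = [set(range(i, i + 3)) for i in range(-1, 20)]  # blocks {i, i+1, i+2}
mu_A, total, C = almost_superadditive(A, cover, weight=lambda p: 1.0 / (1 + p))
print(mu_A, total / C, C)  # mu(A) is at least total / C
```

The choice of weights is arbitrary; the inequality only uses the bound on how many sets of the cover can contain a given point.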

Lemma 5.4. Let E be a collection of balls in Rd with multiplicity C and such that each
B ∈ E has radius ≥ R. Then any ball Br (x) of radius r ≤ 2R intersects at most $4^d C$ of
the balls in E.

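Proof. Let $E_1, \dots, E_k \in \mathcal{E}$ be balls intersecting $B_r(x)$. Choose $x_i \in E_i \cap B_r(x)$ and let $E_i' \subseteq E_i$ be a ball of radius $R$ containing $x_i$. Then $E_i' \subseteq B_{4R}(x)$, since $r \le 2R$ and $E_i'$ has diameter at most $2R$. The collection $\{E_1', \dots, E_k'\}$ has multiplicity $C$, so, writing $c = \operatorname{vol} B_1(0)$, by the discussion above

\begin{align*}
c \cdot (4R)^d &= \operatorname{vol}(B_{4R}(x)) \\
&\ge \operatorname{vol}\Big( \bigcup_{i=1}^k E_i' \Big) \\
&\ge \frac{1}{C} \sum_{i=1}^k \operatorname{vol}(E_i') \\
&= \frac{k}{C} \cdot c \cdot R^d
\end{align*}

Therefore $k \le 4^d C$, as claimed.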

Lemma 5.5. Let $r, s > 0$, $x, y \in \mathbb{R}^d$, and suppose that $y \notin B_r(x)$ and $x \notin B_s(y)$. If $z \in B_r(x) \cap B_s(y)$ then $\angle(x - z, y - z) \ge \pi/3$.

Proof. Clearly $z \ne x, y$ and the hypothesis remains unchanged if we replace the smaller of the radii by the larger, so we can assume $s = r$. Since the metric is induced by a norm, by translating and re-scaling we may assume $z = 0$ and $r = 1$. Thus the problem is equivalent to the following: given $x, y$ with $\|x\| = \|y\| = 1$ and $d(x, y) > 1$, give a positive lower bound for $\angle(x, y)$. This follows from the cosine law:

\begin{align*}
1 &< \|x - y\|^2 \\
&= \|x\|^2 + \|y\|^2 - 2 \|x\| \|y\| \cos \angle(x, y) \\
&= 2 - 2 \cos \angle(x, y)
\end{align*}

hence $\cos \angle(x, y) < 1/2$, and so $\angle(x, y) > \pi/3$.
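
The bound on the cosine can be sanity-checked numerically in the plane. The sketch below is an illustration only, not part of the notes. It fixes x = (0, 0) and y = (1, 0), draws radii r, s < 1 = d(x, y) so that neither center lies in the other ball, and samples z from the intersection of the two balls by a construction chosen here for convenience. The function names are hypothetical.

```python
import math
import random


def angle_at_z(x, y, z):
    """Angle between the vectors x - z and y - z in the plane."""
    ux, uy = x[0] - z[0], x[1] - z[1]
    vx, vy = y[0] - z[0], y[1] - z[1]
    cos = (ux * vx + uy * vy) / (math.hypot(ux, uy) * math.hypot(vx, vy))
    return math.acos(max(-1.0, min(1.0, cos)))


def smallest_angle(trials=10000, seed=0):
    """Smallest angle seen over random admissible configurations (x, y, z, r, s)."""
    rng = random.Random(seed)
    x, y = (0.0, 0.0), (1.0, 0.0)  # d(x, y) = 1
    best = math.pi
    for _ in range(trials):
        r = rng.uniform(0.5, 0.99)  # r < 1, so y is not in B_r(x)
        s = rng.uniform(0.5, 0.99)  # s < 1, so x is not in B_s(y)
        t = rng.uniform(1.0 - s, r)  # first coordinate of z
        # largest |h| keeping z = (t, h) inside both balls
        h_max = math.sqrt(max(0.0, min(r * r - t * t, s * s - (1 - t) ** 2)))
        z = (t, rng.uniform(-h_max, h_max))
        best = min(best, angle_at_z(x, y, z))
    return best


print(smallest_angle(), math.pi / 3)
```

A random search of this kind cannot prove the lemma; it only shows that no sampled configuration violates the bound.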

Definition 5.6. A Besicovitch cover of A ⊆ Rd is a cover of A by closed balls such


that every x ∈ A is the center of one of the balls.

Proposition 5.7 (Besicovitch covering lemma). There are constants $C = C(d)$, $C' = C'(d)$, such that every bounded Besicovitch cover $\mathcal{E}$ of a set $A \subseteq \mathbb{R}^d$ has a sub-cover $\mathcal{F} \subseteq \mathcal{E}$ of $A$ with multiplicity $C$. Furthermore, there are $C'$ sub-collections $\mathcal{F}_1, \dots, \mathcal{F}_{C'} \subseteq \mathcal{E}$ such that $\mathcal{F} = \bigcup_{i=1}^{C'} \mathcal{F}_i$ and each $\mathcal{F}_i$ is a disjoint collection of balls.

Proof. We may write E = {Br(x) (x)}x∈A , discarding redundant balls if necessary. Let
R0 = supx∈A r(x), so by assumption R0 < ∞, and let Rn = 2−n R0 . Also write

An = {x ∈ A : Rn+1 < r(x) ≤ Rn }

Note that $A_0, A_1, \dots$ is a partition of $A$.

Define disjoint sets $A'_{-1}, A'_0, \dots \subseteq A$ inductively, writing $S_n = \bigcup_{k<n} A'_k$ for the union of what was defined before stage $n$. Begin with $A'_{-1} = \emptyset$, and at stage $n \ge 0$ let $A'_n$ be a maximal $R_n/2$-separated subset of $A_n \setminus \bigcup_{x \in S_n} B_{r(x)}(x)$. Set $A' = \bigcup_n A'_n$, and $\mathcal{F} = \{B_{r(x)}(x)\}_{x \in A'}$.

We first claim that $\mathcal{F}$ is a cover of $A$. Otherwise, let $x \in A \setminus \bigcup_{E \in \mathcal{F}} E$. There is a unique $n$ such that $x \in A_n$, i.e. such that $R_{n+1} < r(x) \le R_n$. Since $x$ is not covered by the balls centered in $S_n$, and $A'_n$ is a maximal $R_n/2$-separated subset of $A_n \setminus \bigcup_{y \in S_n} B_{r(y)}(y)$, we must have $d(x, y) < R_n/2$ for some $y \in A'_n$. But $A'_n \subseteq A_n$ so $r(y) > R_{n+1} = R_n/2$, and therefore $x \in B_{r(y)}(y) \subseteq \bigcup_{E \in \mathcal{F}} E$, contrary to the choice of $x$.
We next show that F has bounded multiplicity. Fix z ∈ Rd . For each n the set
A′n is Rn /2 separated and r(x) ≤ Rn for x ∈ A′n , so by Lemma 5.2, z can belong to
at most C1 = C1 (d) of the balls Br(x) (x), x ∈ A′n . Thus it suffices for us to show
that there are at most C2 = C2 (d) distinct n such that z ∈ Br(x) (x) for some x ∈ A′n ,
because then $z$ belongs to no more than $C = C_1 \cdot C_2$ elements of $\mathcal{F}$. Suppose, then, that $n_1 < n_2 < \dots < n_k$ and $x_i \in A'_{n_i}$ are such that $z \in B_{r(x_i)}(x_i)$. By construction, if $i < j$ then $x_j \notin B_{r(x_i)}(x_i)$, and also $r(x_j) \le R_{n_j} \le R_{n_i}/2 < r(x_i)$ so $x_i \notin B_{r(x_j)}(x_j)$. Thus, by
Lemma 5.5, ∠(xi − z, xj − z) ≥ C3 > 0 for all 1 ≤ i < j ≤ k. Since the unit sphere in
Rd is compact and the angle between vectors is proportional to the distance between
them, this shows that k ≤ C2 = C2 (d), as required.
For the last part, we shall define a function $f : A' \to \{1, \dots, 4^d C + 1\}$ such that $B_{r(x)}(x) \cap B_{r(y)}(y) \ne \emptyset$ implies $f(x) \ne f(y)$ for $x \ne y$, where $C$ is the constant found earlier. Then the collections $\mathcal{F}_i = \{B_{r(x)}(x) : x \in A', f(x) = i\}$ have the desired properties.
We define $f$ using a double induction. We first induct on $n$ and at each stage define it on $A'_n$. Thus suppose we have already defined $f$ on $\bigcup_{i<n} A'_i$. Note that $A'_n$ is countable, since its points are $R_n/2$-separated, so we may write $A'_n = \{a_1, a_2, \dots\}$ and set $A'_{n,k} = \bigcup_{i<n} A'_i \cup \{a_1, a_2, \dots, a_k\}$. We have already defined $f$ on $A'_{n,0}$, and we proceed inductively; assuming it has been defined on $A'_{n,k}$, the collection $\{B_{r(x)}(x)\}_{x \in A'_{n,k}}$ has multiplicity $C$, all its elements have radius $\ge R_n/2$, and $r(a_{k+1}) \le R_n$, so by Lemma 5.4, $B_{r(a_{k+1})}(a_{k+1})$ can intersect at most $4^d C$ of the balls; hence, there is a value $u \in \{1, \dots, 4^d C + 1\}$ which is not assigned by $f$ to any of the centers of these balls, and we define $f(a_{k+1}) = u$. This completes the proof.
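
For a finite family of balls the two steps of the proof (select a sub-cover of the centers, then split it into disjoint sub-families) can be imitated greedily. The sketch below is a simplified finite variant and not the construction used above: instead of working scale by scale with maximal separated sets, it processes the balls in order of decreasing radius, and it colours the selected balls one at a time. All function names are hypothetical, and no bound on the number of families is computed or claimed.

```python
import math
import random


def greedy_subcover(balls):
    """balls: list of (center, radius). Returns the indices of the selected balls.

    Balls are visited by decreasing radius; a ball is kept only if its center
    is not already inside a previously kept ball.
    """
    order = sorted(range(len(balls)), key=lambda i: -balls[i][1])
    kept = []
    for i in order:
        center = balls[i][0]
        if not any(math.dist(center, balls[j][0]) <= balls[j][1] for j in kept):
            kept.append(i)
    return kept


def split_into_disjoint_families(balls, kept):
    """Greedy colouring: balls placed in the same family are pairwise disjoint."""
    families = []
    for i in kept:
        center, radius = balls[i]
        for family in families:
            # closed balls are disjoint iff the centers are further apart than r1 + r2
            if all(math.dist(center, balls[j][0]) > radius + balls[j][1] for j in family):
                family.append(i)
                break
        else:
            families.append([i])
    return families


def random_balls(n, seed=0):
    """Random closed balls in the unit square, used only for illustration."""
    rng = random.Random(seed)
    return [((rng.random(), rng.random()), rng.uniform(0.05, 0.3)) for _ in range(n)]


balls = random_balls(200)
kept = greedy_subcover(balls)
families = split_into_disjoint_families(balls, kept)
print(len(kept), "balls kept, split into", len(families), "disjoint families")
```

Every center is covered by construction: a ball is either kept, in which case it covers its own center, or skipped because its center already lies in a kept ball.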

In the proof of Billingsley’s lemma (Proposition 3.7), we used the fact that any cover
of A by b-adic cubes contains a disjoint sub-cover of A (Lemma 3.6). Covers by balls do
not have this property, but the proposition above and the calculation before Lemma 5.4
are often a good substitute and can be used, for example, to prove Billingsley’s lemma
for balls.

Corollary 5.8. Let $\mu$ be a finite measure on a Borel set $A \subseteq \mathbb{R}^d$, and let $\mathcal{E}$ be a Besicovitch cover of $A$. Then there is a finite, disjoint sub-collection $\mathcal{F} \subseteq \mathcal{E}$ with $\mu(\bigcup_{F \in \mathcal{F}} F) > \frac{1}{C} \mu(A)$, where $C = C(d)$.

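Proof. By the previous proposition there are disjoint sub-collections $\mathcal{E}_1, \dots, \mathcal{E}_k \subseteq \mathcal{E}$ such that $\bigcup_{i=1}^k \mathcal{E}_i$ is a cover of $A$, and $k \le C' = C'(d)$. Thus

\[ \mu(A) \le \mu\Big( \bigcup_{i=1}^k \bigcup_{E \in \mathcal{E}_i} E \Big) \le \sum_{i=1}^k \mu\Big( \bigcup_{E \in \mathcal{E}_i} E \Big) \]

so there is some $i$ with $\mu(\bigcup_{E \in \mathcal{E}_i} E) \ge \frac{1}{k} \mu(A) \ge \frac{1}{C'} \mu(A)$. Since $\mathcal{E}_i$ is countable, we can find a finite sub-collection $\mathcal{F} \subseteq \mathcal{E}_i$ such that $\mu(\bigcup_{F \in \mathcal{F}} F) > \frac{1}{2C'} \mu(A)$. This proves the claim with the constant $C = 2C'$.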

Theorem 5.9 (Besicovitch covering theorem). Let $\mu$ be a Radon measure on $\mathbb{R}^d$, let $A$ be a Borel set and let $\mathcal{E}$ be a collection of balls such that each $x \in A$ belongs to balls $E \in \mathcal{E}$ of arbitrarily small radius centered at $x$. Then there is a disjoint sub-collection $\mathcal{F} \subseteq \mathcal{E}$ that covers $A$ up to $\mu$-measure 0, i.e. $\mu(A \setminus \bigcup_{F \in \mathcal{F}} F) = 0$.

Proof. We clearly may assume that $\mathcal{E}$ has bounded diameter, that $\mu$ is supported on $A$ (i.e. $\mu(\mathbb{R}^d \setminus A) = 0$, since we can always replace $\mu$ by $\mu|_A$), and that $\mu(A) > 0$. Assume also that $\mu(A) < \infty$; we will remove this assumption later.
We will define by induction an increasing sequence $\mathcal{F}_1 \subseteq \mathcal{F}_2 \subseteq \dots$ of disjoint, finite sub-collections of $\mathcal{E}$; we do so by applying the previous corollary at each step to the part of $A$ that has not yet been covered. We will then show that $\mathcal{F} = \bigcup_{k=1}^\infty \mathcal{F}_k$ has the desired properties.
Let $C$ be the constant from the previous corollary. To begin, let $\mathcal{F}_1$ be the family obtained by applying the previous corollary to $\mathcal{E}$, so

\[ \mu\Big( \bigcup_{F \in \mathcal{F}_1} F \Big) > \frac{1}{C} \mu(A) \]

Assuming $\mathcal{F}_k$ has been defined, write $F_k = \bigcup_{F \in \mathcal{F}_k} F$. This is a closed set (it is a finite union of closed balls). By assumption, for every $x \in A \setminus F_k$ there are balls $B_r(x) \in \mathcal{E}$ with arbitrarily small radius, and when $r$ is small enough, $B_r(x) \cap F_k = \emptyset$ (we use here the fact that $F_k$ is closed), so the collection

\[ \mathcal{E}_k = \{ B_r(x) \in \mathcal{E} \mid x \in A \setminus F_k,\ B_r(x) \cap F_k = \emptyset \} \]

is a Besicovitch cover of $A \setminus F_k$. We apply the previous corollary and obtain a finite, disjoint collection of balls $\mathcal{F}'_k \subseteq \mathcal{E}_k$ such that

\[ \mu\Big( \bigcup_{F \in \mathcal{F}'_k} F \Big) > \frac{1}{C} \mu(A \setminus F_k) \]

Then $\mathcal{F}_{k+1} = \mathcal{F}_k \cup \mathcal{F}'_k$ is finite and disjoint.


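Now let $\mathcal{F} = \bigcup_{k=1}^\infty \mathcal{F}_k$ and $F = \bigcup_{E \in \mathcal{F}} E$. We claim that $\mu(A \setminus F) = 0$. For otherwise, we have $A \setminus F \subseteq A \setminus F_k$ for all $k$ and hence

\[ \mu(A \setminus F_k) \ge \mu(A \setminus F) > 0 \]

and consequently, since the collections $\mathcal{F}'_k$ consist of pairwise disjoint balls and $\mu$ is supported on $A$,

\begin{align*}
\mu(A) &\ge \sum_{k=1}^\infty \mu\Big( \bigcup_{E \in \mathcal{F}'_k} E \Big) \\
&\ge \sum_{k=1}^\infty \frac{1}{C} \mu(A \setminus F_k) \\
&\ge \frac{1}{C} \sum_{k=1}^\infty \mu(A \setminus F) \\
&= \infty
\end{align*}

contradicting finiteness of the measure of $A$.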


Now suppose that $\mu(A) = \infty$. It is not hard to see that we can partition $\mathbb{R}^d$ into countably many bounded sets $K_i$ whose boundaries have $\mu$-measure zero (e.g. use Lebesgue-randomly placed hyperplanes to form the division). If $U_i$ is the interior of $K_i$ then we can apply the case of finite measure to each $U_i$ with the sub-family $\mathcal{E}_i = \{E \in \mathcal{E} \mid E \subseteq U_i\}$, which again satisfies the hypothesis. We obtain a disjoint sub-family $\mathcal{E}'_i \subseteq \mathcal{E}_i$ for each $i$ that covers $U_i$ up to $\mu$-measure zero. Also the elements of $\mathcal{E}'_i$ and $\mathcal{E}'_j$ are disjoint for $i \ne j$. Thus $\bigcup_i \mathcal{E}'_i$ has the desired properties.

Proof like the one in class. We clearly may assume that $\mathcal{E}$ has bounded diameter, that $\mu$ is supported on $A$ (i.e. $\mu(\mathbb{R}^d \setminus A) = 0$, since we can always replace $\mu$ by $\mu|_A$), and that $\mu(A) > 0$. Assume also that $\mu(A) < \infty$; the general case then follows by partitioning $\mathbb{R}^d$ into countably many bounded pieces.
We will define by induction an increasing sequence $\mathcal{F}_1 \subseteq \mathcal{F}_2 \subseteq \dots$ of disjoint, finite sub-collections of $\mathcal{E}$ such that, at each step, the part of $A$ not covered by $\mathcal{F}_k$ has measure less than $(1 - \frac{1}{C})^k \mu(A)$; we do so by applying the previous corollary at each step to a large subset of the set that has not yet been covered. We will then show that $\mathcal{F} = \bigcup_{k=1}^\infty \mathcal{F}_k$ has the desired properties.

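Let $C$ be the constant from the previous corollary. To begin, let $\mathcal{F}_1$ be the result of applying the previous corollary to $\mathcal{E}$, so

\[ \mu\Big( \bigcup_{F \in \mathcal{F}_1} F \Big) > \frac{1}{C} \mu(A) \]

so that

\[ \mu\Big( A \setminus \bigcup_{F \in \mathcal{F}_1} F \Big) < \Big(1 - \frac{1}{C}\Big) \mu(A) \]

Assuming $\mathcal{F}_k$ has been defined and writing $F_k = \bigcup_{F \in \mathcal{F}_k} F$, fix a parameter $\delta > 0$ with the property that

\[ \Big(1 - \frac{1 - \delta}{C}\Big) \mu(A \setminus F_k) < \Big(1 - \frac{1}{C}\Big)^{k+1} \mu(A) \]

this is possible since $\mu(A \setminus F_k) < (1 - \frac{1}{C})^k \mu(A)$.

Since $\mu$ is Radon and $F_k$ is closed (it is a finite union of closed balls), there exists an $\varepsilon > 0$ such that

\[ \mu(A \setminus F_k^{(\varepsilon)}) > (1 - \delta) \mu(A \setminus F_k) \]

where $F_k^{(\varepsilon)}$ denotes the $\varepsilon$-neighborhood of $F_k$. By assumption, the collection of balls in $\mathcal{E}$ whose radius is $< \varepsilon$ and center is in $A \setminus F_k^{(\varepsilon)}$ is a Besicovitch cover of $A \setminus F_k^{(\varepsilon)}$. Apply the previous corollary to this collection and the set $A \setminus F_k^{(\varepsilon)}$. We obtain a finite, disjoint collection of balls $\mathcal{F}'_k \subseteq \mathcal{E}$ such that

\[ \mu\Big( \bigcup_{F \in \mathcal{F}'_k} F \Big) > \frac{1}{C} \mu(A \setminus F_k^{(\varepsilon)}) > \frac{1}{C} (1 - \delta) \mu(A \setminus F_k) \]

As the elements of $\mathcal{F}'_k$ are of radius $< \varepsilon$ and have centers in $A \setminus F_k^{(\varepsilon)}$, they are disjoint from $F_k$. It follows that $\mathcal{F}_{k+1} = \mathcal{F}_k \cup \mathcal{F}'_k$ is finite and disjoint, and

\begin{align*}
\mu\Big( A \setminus \bigcup_{F \in \mathcal{F}_{k+1}} F \Big) &\le \mu(A \setminus F_k) - \mu\Big( \bigcup_{F \in \mathcal{F}'_k} F \Big) \\
&\le \Big(1 - \frac{1 - \delta}{C}\Big) \mu(A \setminus F_k) \\
&< \Big(1 - \frac{1}{C}\Big)^{k+1} \mu(A)
\end{align*}

by choice of $\delta$. This completes the construction. Since $\mu(A \setminus F_k) < (1 - \frac{1}{C})^k \mu(A) \to 0$, the disjoint collection $\mathcal{F} = \bigcup_k \mathcal{F}_k$ covers $A$ up to $\mu$-measure 0.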

Remark 5.10. To see that the Besicovitch theorem is not valid for families of open balls, consider the measure on $[0, 1]$ given by $\mu = \frac{1}{2} \delta_0 + \sum_{n=1}^\infty 2^{-n-1} \delta_{1/n}$, and consider the collection of open balls $\mathcal{E} = \{B^\circ_{1/n}(0)\}_{n \ge 1} \cup \bigcup_{n=1}^\infty \{B^\circ_{1/k}(1/n)\}_{k > n}$. Any sub-collection $\mathcal{F}$ whose union has full $\mu$-measure must contain $B^\circ_{1/n}(0)$ for some $n$, since it must cover 0, but it also must cover $1/n$, and every ball in $\mathcal{E}$ that contains $1/n$ intersects $B^\circ_{1/n}(0)$; hence $\mathcal{F}$ is not disjoint.
The results of this section should be compared to the Vitali covering lemma:

Lemma 5.11 (Vitali covering lemma). Let $A$ be a subset of a metric space, and $\{B_{r(x)}(x)\}_{x \in A}$ a collection of balls with centers in $A$ such that $\sup_{x \in A} r(x) < \infty$. Then one can find a subset $A' \subseteq A$ such that the balls $\{B_{r(x)}(x)\}_{x \in A'}$ are pairwise disjoint and $\bigcup_{x \in A} B_{r(x)}(x) \subseteq \bigcup_{x \in A'} B_{5r(x)}(x)$.

This lemma is enough to derive an analog of Theorem 5.9 when the measure of a
ball varies fairly regularly with the radius. Specifically,

Theorem 5.12 (Vitali covering theorem). Let $\mu$ be a measure such that $\mu(B_{3r}(x)) \le c \mu(B_r(x))$ for some constant $c$. Let $\{B_{r(x)}(x)\}_{x \in A}$ be as in the Vitali lemma, with $A$ a Borel set. Then there is a set of centers $A' \subseteq A$ such that $\{B_{r(x)}(x)\}_{x \in A'}$ is disjoint, and $\mu(\bigcup_{x \in A'} B_{r(x)}(x)) > c^{-1} \mu(\bigcup_{x \in A} B_{r(x)}(x))$.

Lebesgue measure on Rd has this “doubling” property, as do the Hausdorff measures,


which we will discuss later on. For general measures, even on Rd , there is no reason
this should hold.

5.2 Density and differentiation theorems

For a general set A ⊆ Rd and x ∈ A, small balls Br (x) may intersect both A and its
complement. So, no matter how “close” you get to x, you will not be able to avoid
seeing some of the complement. For example if A is a half plane and x is a point on the
boundary of A then Br (x) ∩ A is exactly “half” of Br (x); “half” is exactly true if we
measure it with respect to Lebesgue measure. For another example, consider A = Q in
the line. Then for x ∈ A, both A and R \ A are dense in every ball Br (x).
Nevertheless, for Lebesgue measure λ there is a weaker form of separation between
A and Rd \ A that holds at a.e. point. Let µ = λ|A and write c for the volume of the
unit ball. Then the Lebesgue density theorem states that

\[ \lim_{r \to 0} \frac{\mu(B_r(x))}{c r^d} = \lim_{r \to 0} \frac{\lambda(B_r(x) \cap A)}{c r^d} = 1 \]

for $\lambda$-a.e. $x \in A$, or, equivalently, for $\mu$-a.e. $x$. This implies that $\lambda(B_r(x) \setminus A)/c r^d \to 0$ as $r \to 0$ for $\mu$-a.e. $x$. Thus, if we look at small balls around a $\mu$-typical point, we see measures which have an asymptotically negligible contribution from $\mathbb{R}^d \setminus A$.
In this section we establish similar results for general Radon measures in $\mathbb{R}^d$. Note that in the limits above, $c r^d = \lambda(B_r(x))$, so we can re-state the Lebesgue density theorem as

\[ \lim_{r \to 0} \frac{\lambda(B_r(x) \cap A)}{\lambda(B_r(x))} = 1 \qquad \lambda\text{-a.e. } x \in A \]
This is the form that our results for general measures will take.
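Let $\mu$ be a finite measure on $\mathbb{R}^d$ and $f \in L^1(\mu)$. Define

\begin{align*}
f^+(x) &= \limsup_{r \to 0} \frac{1}{\mu(B_r(x))} \int_{B_r(x)} f \, d\mu \\
f^-(x) &= \liminf_{r \to 0} \frac{1}{\mu(B_r(x))} \int_{B_r(x)} f \, d\mu
\end{align*}

It will be convenient to write

\[ f_r(x) = \int_{B_r(x)} f \, d\mu \]

(we have suppressed $\mu$ in this notation).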


Note that, although our balls are closed, the values of $f^+, f^-$ do not change if we define them using open balls. To see this we just need to note that, by dominated convergence, $\int_{B_s(x)} f \, d\mu \to \int_{B_r(x)} f \, d\mu$ as $s \searrow r$ and $\int_{B_s(x)} f \, d\mu \to \int_{B_r^\circ(x)} f \, d\mu$ as $s \nearrow r$, and similarly for the mass of balls (since these are integrals of the function $f \equiv 1$). The same considerations show that $f^+$ and $f^-$ may be defined taking the lim sup and lim inf as $r \to 0$ along the rationals.

Lemma 5.13. f + , f − are measurable.

Proof. First, for each $r > 0$, we claim that $f_r$ is measurable. It suffices to prove this for $f \ge 0$, since a general function can be decomposed into positive and negative parts. We claim that, in fact, if $f \ge 0$ then $f_r$ is upper semi-continuous (i.e. $f_r^{-1}((-\infty, t))$ is open for all $t$), which implies measurability. To see this note that if $x_n \to x$ and $s > r$, then $B_r(x_n) \subseteq B_s(x)$ for large enough $n$, which implies $f_r(x_n) \le f_s(x)$. Thus

\[ \limsup_{n \to \infty} f_r(x_n) \le f_s(x) \]

But by dominated convergence again, $f_s(x) = \int_{B_s(x)} f \, d\mu \to f_r(x)$ as $s \searrow r$, so

\[ \limsup_{n \to \infty} f_r(x_n) \le f_r(x) \]

This holds whenever $x_n \to x$, which is equivalent to upper semi-continuity.

Since $\int_{B_r(x)} f \, d\mu / \mu(B_r(x)) = f_r(x)/g_r(x)$, where $g \equiv 1$, we see that $f^\pm$ are upper and lower limits of measurable functions $f_r/g_r$ as $r \to 0$ along the rationals. Hence $f^\pm$ are measurable.

Theorem 5.14 (Differentiation theorems for measures). Let $\mu$ be a Radon measure on $\mathbb{R}^d$ and $f \in L^1(\mu)$. Then for $\mu$-a.e. $x$ we have

\[ \lim_{r \to 0} \frac{1}{\mu(B_r(x))} \int_{B_r(x)} f \, d\mu = f(x) \]

Proof. We may assume that f ≥ 0. For a < b let

Aa,b = {x : f − (x) < a < b < f (x)}

It is easy to verify that f − (x) ≥ f (x) holds µ-a.e. if and only if µ(Aa,b ) = 0 for all
0 < a < b.
Suppose then that $\mu(A_{a,b}) > 0$ for some $a < b$ and let $U$ be an open set containing $A_{a,b}$. By definition of $A_{a,b}$, for every $x \in A_{a,b}$ there are arbitrarily small radii $r$ such that $B_r(x) \subseteq U$ and $f_r(x) < a \mu(B_r(x))$. Applying the Besicovitch covering theorem to the collection of these balls, we obtain a disjoint sequence of balls $\{B_{r_i}(x_i)\}_{i=1}^\infty$ such that $A_{a,b} \subseteq \bigcup_{i=1}^\infty B_{r_i}(x_i) \subseteq U$ up to a $\mu$-null-set, and $\int_{B_{r_i}(x_i)} f \, d\mu = f_{r_i}(x_i) < a \mu(B_{r_i}(x_i))$ for each $i$. Now,

\begin{align*}
b \cdot \mu(A_{a,b}) &< \int_{A_{a,b}} f \, d\mu \\
&\le \sum_{i=1}^\infty \int_{B_{r_i}(x_i)} f \, d\mu \\
&< \sum_{i=1}^\infty a \cdot \mu(B_{r_i}(x_i)) \\
&\le a \cdot \mu(U)
\end{align*}

Since µ is regular, we can find open neighborhoods U of Aa,b with µ(U ) arbitrarily close
to µ(Aa,b ). Hence, the inequality above shows that b · µ(Aa,b ) ≤ a · µ(Aa,b ), which is
impossible. Therefore µ(Aa,b ) = 0, and we have proved that f − ≥ f µ-a.e.
Similarly for a < b define

A′a,b = {x ∈ Rd : f (x) < a < b < f + (x)}

Then $f^+(x) \le f(x)$ $\mu$-a.e. unless $\mu(A'_{a,b}) > 0$ for some $a < b$. Suppose such $a, b$ exist and let $U$ and $\{B_{r_i}(x_i)\}_{i=1}^\infty$ be defined analogously for $A'_{a,b}$. Then

\begin{align*}
\int_U f \, d\mu &\ge \sum_{i=1}^\infty \int_{B_{r_i}(x_i)} f \, d\mu \\
&> \sum_{i=1}^\infty b \cdot \mu(B_{r_i}(x_i)) \\
&\ge b \cdot \mu(A'_{a,b})
\end{align*}

On the other hand, by regularity and the dominated convergence theorem, we can find $U$ as above such that $\int_U f \, d\mu$ is arbitrarily close to $\int_{A'_{a,b}} f \, d\mu < a \cdot \mu(A'_{a,b})$, and we again obtain a contradiction. Thus $f^+ \le f$ $\mu$-a.e.
We have shown that f − (x) ≥ f (x) ≥ f + (x) µ-a.e. On the other hand, f − ≤ f +
everywhere. Thus µ-a.e. we have f − ≤ f + ≤ f ≤ f − , so we have equality throughout.

The formulation of the theorem makes sense in any metric space but it does not
hold in such generality. The main cases in which it holds are Euclidean spaces and
ultrametric spaces, in which balls of a fixed radius form a partition of the space, for
which the Besicovitch theorem holds trivially.

Corollary 5.15 (Besicovitch density theorem). If $\mu$ is a probability measure on $\mathbb{R}^d$ and $\mu(A) > 0$, then for $\mu$-a.e. $x \in A$,

\[ \lim_{r \to 0} \frac{\mu(B_r(x) \cap A)}{\mu(B_r(x))} = 1 \]

and for $\mu$-a.e. $x \notin A$ the limit is 0.

Proof. Apply the differentiation theorem to f = 1A .

Applying the corollary to $A^c = \mathbb{R}^d \setminus A$ we see that the limit is $\mu$-a.s. 0 if $x \notin A$.
Thus, at small scales, most balls are almost completely contained in A or in Ac . So
although the sets may be topologically intertwined, from the point of view of µ, they
are quite well separated. This is especially useful when studying local properties of the
measure, since often these do not change if we restrict the measure to a subset. We will
see examples of this later.
Another useful consequence is the following:

Proposition 5.16. Let $\nu, \mu$ be Radon measures on $\mathbb{R}^d$. Then $\nu \ll \mu$ if and only if

\[ \lim_{r \to 0} \frac{\nu(B_r(x))}{\mu(B_r(x))} \]

exists and is positive and finite for $\nu$-a.e. $x$, and in this case,

\[ \lim_{r \to 0} \frac{\nu(B_r(x))}{\mu(B_r(x))} = \frac{d\nu}{d\mu}(x) \]

In particular, if $\lambda$ is Lebesgue measure, then $\nu \ll \lambda$ if and only if

\[ \lim_{r \to 0} \frac{\nu(B_r(x))}{r^d} \]

exists and is positive and finite for $\nu$-a.e. $x$.

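Proof. Suppose that $\nu \ll \mu$ and set $f = d\nu/d\mu$. Then by Theorem 5.14 we have

\[ \lim_{r \to 0} \frac{\nu(B_r(x))}{\mu(B_r(x))} = \lim_{r \to 0} \frac{\int_{B_r(x)} f \, d\mu}{\mu(B_r(x))} = f(x) \qquad \mu\text{-a.e.} \]

and hence also $\nu$-a.e. The set where the limit exists and $f$ is positive has full $\nu$-measure, proving the claim.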
Now suppose that $\nu \not\ll \mu$. Then there is a set $A$ with $\mu(A) = 0$ and $\nu(A) > 0$. Since $\nu(B \cap A) = (\mu + \nu)(B \cap A)$ for every set $B$, by the density theorem we have, for $(\mu + \nu)$-a.e. $x \in A$ (equivalently $\nu$-a.e. $x \in A$),

\[ \lim_{r \to 0} \frac{\nu(B_r(x) \cap A)}{(\mu + \nu)(B_r(x))} = \lim_{r \to 0} \frac{(\mu + \nu)(B_r(x) \cap A)}{(\mu + \nu)(B_r(x))} = 1 \]

Also

\[ \lim_{r \to 0} \frac{\nu(B_r(x) \cap A)}{\nu(B_r(x))} = 1 \]

for $\nu$-a.e. $x \in A$, so for such $x$,

\[ \lim_{r \to 0} \frac{\nu(B_r(x))}{(\mu + \nu)(B_r(x))} = 1 \]

This implies that $\mu(B_r(x))/\nu(B_r(x)) \to 0$ for $\nu$-a.e. $x \in A$, or equivalently, $\nu(B_r(x))/\mu(B_r(x)) \to \infty$, so the conclusion fails.
The last statement follows from the first using the fact that λ(Br (x)) = c · rd .

We note two extensions of our results.


First, up to this point we have considered balls in the Euclidean metric, but an
examination of the arguments will show that they are valid in any norm.
Second, we have the analogous results for b-adic cubes:

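Theorem 5.17. Let $\mu$ be a Radon measure on $\mathbb{R}^d$ and $f \in L^1(\mu)$. Let $b \ge 2$ be an integer base. Then for $\mu$-a.e. $x$ we have

\[ \lim_{n \to \infty} \frac{1}{\mu(\mathcal{D}_{b^n}(x))} \int_{\mathcal{D}_{b^n}(x)} f \, d\mu = f(x) \]

In particular if $\mu(A) > 0$ then for $\mu$-a.e. $x \in A$,

\[ \lim_{n \to \infty} \frac{\mu(\mathcal{D}_{b^n}(x) \cap A)}{\mu(\mathcal{D}_{b^n}(x))} = 1 \]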

Similarly the other corollary and proposition above hold along b-adic cubes. The
proofs are identical to the one above, using Lemma 3.6 instead of the Besicovitch covering
lemma. Alternatively, this is a consequence of the martingale convergence theorem.

6 Pointwise dimension of measures


In this section we introduce a notion of dimension for Radon measures on Rd .

6.1 Dimension of a measure at a point


We restrict the discussion to sets and measures on Euclidean space. As usual, balls are
closed, and we fix the Euclidean norm on Rd (but one could use any other norm with
no change to the results).
Recall that the support of a measure is the smallest closed set of full measure (see
Section 1.3).

Definition 6.1. The (lower) pointwise dimension of a Radon measure $\mu$ at $x \in \operatorname{supp} \mu$ is

\[ \dim(\mu, x) = \liminf_{r \to 0} \frac{\log \mu(B_r(x))}{\log r} \qquad (5) \]

$\mu$ is exact dimensional at $x$ if the limit (not just lim inf) exists.

Thus $\dim(\mu, x) = \alpha$ means that the decay of $\mu$-mass of balls around $x$ scales no slower than $r^\alpha$, i.e. for every $\varepsilon > 0$, we have $\mu(B_r(x)) \le r^{\alpha - \varepsilon}$ for all sufficiently small $r$, but $\mu(B_r(x)) \ge r^{\alpha + \varepsilon}$ for arbitrarily small $r$.
Remark 6.2. 1. One can also define the upper pointwise dimension using limsup, but
we shall not have use for it.

2. In many of the cases we consider the limit (5) exists, and there is no need for lim sup
or lim inf.

Example 6.3. 1. If µ = δu is the point mass at u, then µ(Br (u)) = 1 for all r, hence
dim(µ, u) = 0.

2. If $\mu$ is Lebesgue measure on $\mathbb{R}^d$ then for any $x$, $\mu(B_r(x)) = c r^d$, so $\dim(\mu, x) = d$.

3. Let $\mu = \lambda + \delta_0$ where $\lambda$ is the Lebesgue measure on the unit ball. Then if $x \ne 0$ is in the unit ball, $\mu(B_r(x)) = \lambda(B_r(x))$ for small enough $r$, so $\dim(\mu, x) = \dim(\lambda, x) = d$. On the other hand $\mu(B_r(0)) = \lambda(B_r(0)) + 1$, so again $\dim(\mu, 0) = 0$.
This example shows that in general the pointwise dimension can depend on the
point.
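
The dependence on the point can be seen numerically in dimension d = 1. The following sketch is an illustration that is not part of the notes: it takes µ to be Lebesgue measure on [-1, 1] plus a unit point mass at 0, and evaluates the ratio log µ(B_r(x)) / log r at a small radius. The function names are hypothetical, and a single small radius only approximates the limit.

```python
import math


def mu_ball(x, r):
    """mu(B_r(x)) for mu = Lebesgue measure on [-1, 1] plus a unit point mass at 0."""
    lebesgue = max(0.0, min(x + r, 1.0) - max(x - r, -1.0))
    atom = 1.0 if abs(x) <= r else 0.0
    return lebesgue + atom


def log_ratio(x, r):
    """The quotient whose lim inf as r -> 0 defines the pointwise dimension."""
    return math.log(mu_ball(x, r)) / math.log(r)


for x in (0.0, 0.5):
    print(x, [round(log_ratio(x, 10.0 ** -k), 4) for k in (2, 4, 8)])
```

At x = 0 the ratio tends to 0 because the atom dominates, while at x = 0.5 it approaches 1 slowly, with an error of order 1/|log r| coming from the constant in µ(B_r(x)) = 2r.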

The dimension at a point is truly a local property:

Lemma 6.4. If $\nu, \mu$ are Radon measures and $\nu \ll \mu$ then $\dim(\nu, x) = \dim(\mu, x)$ for $\nu$-a.e. $x$.
In particular, if $\mu(A) > 0$ and $\nu = \mu|_A$, then $\dim(\mu, x) = \dim(\nu, x)$ for $\mu$-a.e. $x \in A$.

Proof. Let $f = d\nu/d\mu$. By Proposition 5.16, $\lim_{r \to 0} \nu(B_r(x))/\mu(B_r(x)) = f(x) \in (0, \infty)$ for $\nu$-a.e. $x$. Taking logarithms and dividing by $\log r$, we have

\[ \lim_{r \to 0} \Big( \frac{\log \nu(B_r(x))}{\log r} - \frac{\log \mu(B_r(x))}{\log r} \Big) = \lim_{r \to 0} \frac{\log f(x) + o(1)}{\log r} = 0 \qquad \nu\text{-a.e. } x \]

Thus the limits inferior of the two terms are equal, giving $\dim(\nu, x) = \dim(\mu, x)$, as claimed.

We saw that Hausdorff dimension of sets may be defined using b-adic cells rather
than arbitrary sets. We now show that pointwise dimension can similarly be defined
using decay of mass along b-adic cells rather than balls.

Definition 6.5. The b-adic pointwise dimension of a Radon measure $\mu$ at $x$ is

\[ \dim_b(\mu, x) = \liminf_{n \to \infty} \frac{-\log \mu(\mathcal{D}_{b^n}(x))}{n \log b} \]

Note that we may have $x \in \operatorname{supp} \mu$ and $\mu(\mathcal{D}_{b^n}(x)) = 0$ for some $b, n$, so $\dim_b(\mu, x)$ may not be defined on all of $\operatorname{supp} \mu$. However, it is defined $\mu$-a.e., since there are countably many b-adic cubes $D$ with measure zero, so $\mu$-a.e. $x$ belongs only to cells of positive measure.
In general $\dim(\mu, x) \ne \dim_b(\mu, x)$. Nevertheless, at most points the notions agree:

Proposition 6.6. For µ-a.e. x we have dim(µ, x) = dimb (µ, x).

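Proof. We have $\mathcal{D}_{b^n}(x) \subseteq B_{c \cdot b^{-n}}(x)$ for a constant $c = c(d)$. Therefore $\mu(\mathcal{D}_{b^n}(x)) \le \mu(B_{c \cdot b^{-n}}(x))$, hence

\[ \dim_b(\mu, x) \ge \dim(\mu, x) \qquad \text{for } \mu\text{-a.e. } x \]

We want to prove that equality holds a.e., hence suppose it does not. Then we can find an $\alpha$ and $\varepsilon > 0$, and a set $A$ with $\mu(A) > 0$, such that $\dim_b(\mu, x) > \alpha + 3\varepsilon$ and $\dim(\mu, x) < \alpha + \varepsilon$ for $x \in A$.
Applying Egorov’s theorem to the limits in the definition of $\dim_b$, and replacing $A$ by a set of slightly smaller but still positive measure, we may assume that there is an $r_0 > 0$ such that $\mu(\mathcal{D}_{b^n}(x)) < b^{-n(\alpha + 2\varepsilon)}$ for every $x \in A$ and $n$ satisfying $b^{-n} < r_0$.
Let $\nu = \mu|_A$ and let $x$ be $\nu$-typical. By Lemma 6.4, $\dim(\nu, x) = \dim(\mu, x) < \alpha + \varepsilon$, so there are arbitrarily large $k$ for which

\[ \nu(B_{b^{-k}}(x)) \ge b^{-k(\alpha + \varepsilon)} \]

On the other hand, for every $k$ such that $b^{-k} < r_0$,

\begin{align*}
\nu(B_{b^{-k}}(x)) &\le \sum \{ \nu(D) : D \in \mathcal{D}_{b^k} \text{ and } \nu(D \cap B_{b^{-k}}(x)) > 0 \} \\
&< b^{-k(\alpha + 2\varepsilon)} \cdot \#\{ D \in \mathcal{D}_{b^k} : \nu(D \cap B_{b^{-k}}(x)) > 0 \}
\end{align*}

The number of cells on the last line is at most $3^d$, since a ball of radius $b^{-k}$ is contained in a cube of side $2 b^{-k}$. So we have found that if $b^{-k} < r_0$ then

\[ \nu(B_{b^{-k}}(x)) < 3^d b^{-k(\alpha + 2\varepsilon)} \]

Combining the two bounds, for arbitrarily large $k$ we have $b^{-k(\alpha + \varepsilon)} \le 3^d \cdot b^{-k(\alpha + 2\varepsilon)}$, which is impossible.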

As a consequence,

1. The analog of Lemma 6.4 holds for dimb (this could also be derived directly from
the differentiation theorem along b-adic cells).

2. The pointwise dimension of µ is a.s. independent of the norm used in the definition.
This follows since the equivalence with dimb is valid in any norm.

Exercises

1. Construct an example of a Radon measure µ on R and x ∈ R such that dim(µ, x) = ∞.

This shows that the pointwise dimension of a measure on Rd does not have to be ≤ d
at every point (but it does at a.e. point, as will be shown in the next section).

2. Construct an example of a probability measure on [0, 1] that has a different dimension at every point.
6.2 Upper and lower dimension of measures

Having defined dimension at a point, we now turn to global notions of dimension for
measures. These are defined as the largest and smallest pointwise dimension, after
ignoring a measure-zero set of points.
Recall that if f is a measurable function on a measure space (X, B, µ) then the
essential supremum of f is

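\begin{align*}
\operatorname*{esssup}_{x \sim \mu} f(x) &= \sup\{ t \in \mathbb{R} \mid \mu(\{x : f(x) > t\}) > 0 \} \\
&= \inf\{ t \in \mathbb{R} \mid \mu(\{x : f(x) > t\}) = 0 \}
\end{align*}

and the essential infimum of $f$ is

\begin{align*}
\operatorname*{essinf}_{x \sim \mu} f(x) &= \inf\{ t \in \mathbb{R} \mid \mu(\{x : f(x) < t\}) > 0 \} \\
&= \sup\{ t \in \mathbb{R} \mid \mu(\{x : f(x) < t\}) = 0 \}
\end{align*}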

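Definition 6.7. The upper and lower Hausdorff dimension of a Radon measure $\mu$ are defined by

\begin{align*}
\overline{\dim}\, \mu &= \operatorname*{esssup}_{x \sim \mu} \dim(\mu, x) \\
\underline{\dim}\, \mu &= \operatorname*{essinf}_{x \sim \mu} \dim(\mu, x)
\end{align*}

If $\underline{\dim}\, \mu = \overline{\dim}\, \mu$, then their common value is called the pointwise dimension of $\mu$ and is denoted $\dim \mu$.
To see that these two quantities need not agree, take $\mu = \lambda + \delta_0$, where $\lambda$ is Lebesgue measure. Then $\underline{\dim}\, \mu = 0$ (because $\dim(\mu, 0) = 0$ and $\mu(\{0\}) > 0$), and $\overline{\dim}\, \mu = d$ because for any $x \in \mathbb{R}^d \setminus \{0\}$, $\dim(\mu, x) = d$.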

Lemma 6.8. If $\mu$ is an $\alpha$-regular measure supported on $A \subseteq \mathbb{R}^d$, then $\dim(\mu, x) \ge \alpha$ for every $x \in \mathbb{R}^d$, and in particular $\underline{\dim}\, \mu \ge \alpha$.

The proof is immediate from the definitions.

The next proposition establishes the fundamental connection between the dimension of sets and measures.

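Proposition 6.9. For any Borel set $A \subseteq \mathbb{R}^d$,

\begin{align*}
\dim A &= \sup\{ \underline{\dim}\, \mu : \mu \text{ supported on } A \} \\
&= \sup\{ \overline{\dim}\, \mu : \mu \text{ supported on } A \}
\end{align*}

and for any $\mu \in \mathcal{P}(\mathbb{R}^d)$,

\begin{align*}
\overline{\dim}\, \mu &= \inf\{ \dim A : A \text{ Borel}, \mu(\mathbb{R}^d \setminus A) = 0 \} \\
\underline{\dim}\, \mu &= \inf\{ \dim A : A \text{ Borel}, \mu(A) > 0 \}
\end{align*}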

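Proof. For the first part, note that trivially we have $\underline{\dim}\, \mu \le \overline{\dim}\, \mu$, so

\[ \sup\{ \underline{\dim}\, \mu : \mu \text{ supported on } A \} \le \sup\{ \overline{\dim}\, \mu : \mu \text{ supported on } A \} \]

If the measure $\mu$ is supported on $A$, then by definition of $\overline{\dim}\, \mu$, for every $\varepsilon > 0$ there is a subset $A_\varepsilon \subseteq A$ of positive measure with $\dim(\mu, x) > \overline{\dim}\, \mu - \varepsilon$ for all $x \in A_\varepsilon$.
By Billingsley’s lemma (Proposition 3.7), this implies that $\dim A_\varepsilon \ge \overline{\dim}\, \mu - \varepsilon$, hence (since $A_\varepsilon \subseteq A$) also $\dim A \ge \overline{\dim}\, \mu - \varepsilon$. Since $\varepsilon$ was arbitrary, $\dim A \ge \overline{\dim}\, \mu$. This proves

\[ \sup\{ \overline{\dim}\, \mu : \mu \text{ supported on } A \} \le \dim A \]

On the other hand, by Frostman’s lemma, for every $\varepsilon > 0$ there is a $(\dim A - \varepsilon)$-regular measure $\mu$ supported on $A$ (we only proved this for closed $A$, but it is true for Borel sets as well). Thus $\underline{\dim}\, \mu \ge \dim A - \varepsilon$. Since $\varepsilon$ was arbitrary, we have shown that

\[ \dim A \le \sup\{ \underline{\dim}\, \mu : \mu \text{ supported on } A \} \]

Combining the inequalities in the last three equations, we have proved the first part of the proposition.
For the second part write $\alpha = \overline{\dim}\, \mu$. We begin with the first identity. Let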

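\[ A_0 = \{ x \in \mathbb{R}^d : \dim(\mu, x) \le \alpha \} \]

By the definition of $\overline{\dim}$ we have $\mu(\mathbb{R}^d \setminus A_0) = 0$. Therefore the upper bound in Billingsley’s lemma applies to $A_0$ and $\mu$, giving $\dim A_0 \le \alpha$. Hence

\[ \alpha \ge \inf\{ \dim A : \mu(\mathbb{R}^d \setminus A) = 0 \} \]

On the other hand if $A$ is a set such that $\mu(\mathbb{R}^d \setminus A) = 0$, then the essential supremum of $\dim(\mu, x)$ for $x \in A$ is $\alpha$, so for every $\varepsilon > 0$ there is a subset $A_\varepsilon \subseteq A$ of positive measure such that $\dim(\mu, x) \ge \alpha - \varepsilon$ for $x \in A_\varepsilon$. By the lower bound in Billingsley’s lemma, $\dim A_\varepsilon \ge \alpha - \varepsilon$, and since $\dim A \ge \dim A_\varepsilon$, we have $\dim A \ge \alpha - \varepsilon$. Since $\varepsilon$ was arbitrary, $\dim A \ge \alpha$. This shows that

\[ \alpha \le \inf\{ \dim A : \mu(\mathbb{R}^d \setminus A) = 0 \} \]

proving the first identity.
For the second identity write $\beta = \underline{\dim}\, \mu$. If $\mu(A) > 0$ then after removing a set of measure 0 from $A$, we have $\dim(\mu, x) \ge \underline{\dim}\, \mu$ for $x \in A$, so by Billingsley’s lemma, $\dim A \ge \underline{\dim}\, \mu$. This shows that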

β ≤ inf{dim A : µ(A) > 0}

Given ε > 0 we can find a set Aε of positive measure such that dim(µ, x) ≤ β + ε for
x ∈ Aε , and then by Billingsley’s lemma dim Aε ≤ β + ε. Since ε was arbitrary this
shows that
β ≥ inf{dim A : µ(A) > 0}

and gives the second identity.

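Corollary 6.10. If $\mu$ is a Radon measure on $\mathbb{R}^d$ then $\dim(\mu, x) \le d$ a.e.

Proof. Otherwise, for some $\varepsilon > 0$, we would have $\dim(\mu, x) > d + \varepsilon$ on a set of positive $\mu$-measure. Then

\[ \overline{\dim}\, \mu = \operatorname*{esssup}_{x \sim \mu} \dim(\mu, x) \ge d + \varepsilon \]

Since $\mu$ is supported on $\mathbb{R}^d$ we conclude from Proposition 6.9 that

\[ \dim \mathbb{R}^d \ge \overline{\dim}\, \mu \ge d + \varepsilon \]

a contradiction.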

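Corollary 6.11. If $\mu = \nu_0 + \nu_1$ then

\begin{align*}
\overline{\dim}\, \mu &= \max\{ \overline{\dim}\, \nu_0, \overline{\dim}\, \nu_1 \} \\
\underline{\dim}\, \mu &= \min\{ \underline{\dim}\, \nu_0, \underline{\dim}\, \nu_1 \}
\end{align*}

and similarly if $\mu = \sum_{i=1}^\infty \nu_i$. If $\mu = \int \nu_\omega \, dP(\omega)$ is Radon, then

\begin{align*}
\overline{\dim}\, \mu &\ge \operatorname*{esssup}_{\omega \sim P} \overline{\dim}\, \nu_\omega \\
\underline{\dim}\, \mu &\ge \operatorname*{essinf}_{\omega \sim P} \underline{\dim}\, \nu_\omega
\end{align*}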

Proof. We can find pairwise disjoint sets A, A0 , A1 such that µ|A ∼ ν0 |A ∼ ν1 |A , and
µ|A1 ⊥ ν0 and µ|A0 ⊥ ν1 . By the previous corollaries, for µ-a.e. x ∈ A we have
dim(µ, x) = dim(ν0 , x) = dim(ν1 , x), while for µ-a.e. x ∈ A0 we have dim(µ, x) =
dim(ν0 , x) and for µ-a.e. x ∈ A1 we have dim(µ, x) = dim(ν1 , x). The claim follows
from the definitions.

The proof for countable sums is similar.
If $\mu = \int \nu_\omega \, dP(\omega)$, we use Proposition 6.9. If µ(A) > 0 then νω (A) > 0 for a set of ω
with positive P -measure. For each such ω, we have $\dim A \ge \underline{\dim}\,\nu_\omega$ and it follows that

\[ \mu(A) > 0 \implies \dim A \ge \operatorname*{essinf}_{\omega\sim P} \underline{\dim}\,\nu_\omega , \]

and $\underline{\dim}\,\mu \ge \operatorname*{essinf}_{\omega\sim P} \underline{\dim}\,\nu_\omega$ follows from Proposition 6.9. The other inequality
is proved similarly by considering sets A with µ(Rd \ A) = 0.

The inequality in the corollary is not generally an equality: every measure µ can
be written as $\mu = \int \delta_x \, d\mu(x)$, but $\operatorname*{essinf}_{x\sim\mu} \dim \delta_x = 0$ can be strictly less than dim µ.

Exercises

1. ?

7 Hausdorff measures
7.1 Hausdorff measure

We return temporarily to the metric space setting. The definition of Hα∞ was closely
modeled after the definition of Lebesgue measure, but as we noted, it is not a measure
on the Borel sets. A slight modification of the definition yields a true measure which is
often viewed as the α-dimensional analog of Lebesgue measure. For δ > 0 let
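\[ \mathcal{H}^\alpha_\delta(A) = \inf\Big\{ \sum_{E\in\mathcal{E}} |E|^\alpha \;:\; \mathcal{E} \text{ is a cover of } A \text{ by sets of diameter} \le \delta \Big\} \]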

This is an outer measure for every δ > 0, but the Borel sets are not necessarily measur-
able with respect to Hαδ .
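Decreasing δ means that the infimum in the definition of $\mathcal{H}^\alpha_\delta$ is taken over a smaller
family of covers, so $\mathcal{H}^\alpha_\delta(A)$ is non-decreasing as δ ↘ 0. Thus

\[ \mathcal{H}^\alpha(A) = \lim_{\delta\searrow 0} \mathcal{H}^\alpha_\delta(A) = \sup_{\delta>0} \mathcal{H}^\alpha_\delta(A) \]

is well defined (possibly infinite).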


It is easy to show that Hα is an outer measure on Rd , and with some more work that
the Borel sets in Rd are Hα -measurable (for a proof see ??). Thus, by Caratheodory’s
theorem, Hα is a σ-additive measure on the Borel sets.

Definition 7.1. The measure Hα on the Borel σ-algebra is called the α-dimensional
Hausdorff measure.

Before discussing the properties of Hα , let us see their relation to dimension.

Lemma 7.2. If α < β then Hα (A) ≥ Hβ (A), and furthermore

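\[ \mathcal{H}^\beta(A) > 0 \implies \mathcal{H}^\alpha(A) = \infty, \qquad
   \mathcal{H}^\alpha(A) < \infty \implies \mathcal{H}^\beta(A) = 0 . \]

In particular,

\[ \dim A = \inf\{\alpha > 0 : \mathcal{H}^\alpha(A) = 0\} = \sup\{\alpha > 0 : \mathcal{H}^\alpha(A) = \infty\} . \tag{6} \]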

Proof. A calculation like the one in Lemma 2.8 shows that for δ ≤ 1,

\[ \mathcal{H}^\beta_\delta(A) \le \delta^{\beta-\alpha}\, \mathcal{H}^\alpha_\delta(A) . \]

The first inequality and the two implications follow from this, since $\delta^{\beta-\alpha} \to 0$ as δ → 0.
The second part follows from the first and the trivial inequalities Hα (A) ≥ Hα∞ (A),
Hβ (A) ≥ Hβ∞ (A).
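
The dichotomy in Lemma 7.2 can be seen numerically on the middle-α Cantor sets. The sketch below is an illustration and not part of the notes: it evaluates the sum of |E|^α over the cover by the 2^n stage-n intervals (each of length ρ^n), which is an upper bound for the pre-measure at scale δ = ρ^n. The function name `stage_cover_sum` is hypothetical. Since these are only upper bounds, the output shows why the measure vanishes above the critical exponent log 2 / log(1/ρ), but it does not by itself prove positivity at or below it.

```python
import math

def stage_cover_sum(rho: float, alpha: float, n: int) -> float:
    """Sum of |E|^alpha over the 2^n stage-n intervals, each of length rho^n."""
    return (2 * rho ** alpha) ** n

rho = 1 / 3                                   # middle-third Cantor set
critical = math.log(2) / math.log(1 / rho)    # the exponent where 2 * rho^alpha = 1

for alpha in (critical - 0.1, critical, critical + 0.1):
    sums = [stage_cover_sum(rho, alpha, n) for n in (5, 10, 20)]
    print(f"alpha = {alpha:.3f}: {sums}")
```

Below the critical exponent the sums blow up, at the critical exponent they stay equal to 1, and above it they tend to 0.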

The lemma implies that Hα is α-dimensional in the sense that every set of
dimension < α has Hα -measure 0. We will discuss its dimension more below. We note
a slight sharpening of (6):

Lemma 7.3. A is an α-null-set if and only if Hα (A) = 0.

We leave the easy proof to the reader.

Proposition 7.4. H0 is the counting measure, Hd is equivalent to Lebesgue measure,
and Hα is non-atomic and non-σ-finite for 0 < α < d.

Proof. The first statement is immediate since H0δ (A) = N (A, δ). It is clear from the
definition that Hα is translation invariant, and it is well known that up to normalization,
Lebesgue measure is the only σ-finite non-zero translation-invariant Borel measure on
Rd . It is easily shown that Hd (Br (0)) < ∞ for every r > 0, so Hd is σ-finite. Also, by
definition, Hdδ ≥ λ for every δ > 0, so Hd ≥ λ, and in particular Hd ≠ 0. Hence Hd
is equal to a multiple of Lebesgue measure. Finally, Lemma 7.2 implies that Hα is not
equivalent to Hd for α < d, so it cannot be σ-finite, and one may verify directly that
Hα ({x}) = 0 for α > 0.

Exercises

1. Prove Lemma 7.2 in detail.

7.2 Properties of Hausdorff measures

We turn to the local properties of Hα . More precisely, since Hα is not Radon, we
consider its restriction to sets of finite measure. We will see that, in some respects, the
Hausdorff measures are closer to Lebesgue measure than to arbitrary measures.

Definition 7.5. Given α > 0, a measure µ and x ∈ supp µ, the upper and lower
α-dimensional densities of µ at x are

\[ D^+_\alpha(\mu,x) = \limsup_{r\to 0} \frac{\mu(B_r(x))}{(2r)^\alpha}, \qquad
   D^-_\alpha(\mu,x) = \liminf_{r\to 0} \frac{\mu(B_r(x))}{(2r)^\alpha} . \]

Note that (2r)α = |Br (x)|α .

Lemma 7.6. If Dα+ (µ, x) < ∞ then dim(µ, x) ≥ α and if Dα+ (µ, x) > 0 then dim(µ, x) ≤
α.

Proof. If Dα+ (µ, x) < t < ∞ then for small enough r we have µ(Br (x)) < t(2r)α . Taking
logarithms and dividing by log r we have

\[ \frac{\log \mu(B_r(x))}{\log r} > \frac{\log (2^\alpha t)}{\log r} + \alpha \]

for all small enough r, so dim(µ, x) ≥ α. The other inequality follows similarly.

The quantity Dα− is similarly related to the upper pointwise dimension. Of the two
quantities, Dα+ is more meaningful, as demonstrated in the next two theorems, which
essentially characterize measures for which Dα+ is positive and finite a.e..

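Theorem 7.7. Let µ be a finite measure on Rd and A ⊆ Rd . Then

\[ D^+_\alpha(\mu,x) > s \ \text{ for all } x\in A \implies \mathcal{H}^\alpha(A) \le \frac{C}{s}\cdot\mu(A), \]

where C = C(d), and

\[ D^+_\alpha(\mu,x) < t \ \text{ for all } x\in A \implies \mathcal{H}^\alpha(A) \ge \frac{1}{2^\alpha t}\cdot\mu(A) . \]

In particular, if

\[ 0 < \inf_{x\in A} D^+_\alpha(\mu,x) \le \sup_{x\in A} D^+_\alpha(\mu,x) < \infty \]

then µ ∼ Hα |A .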

Proof. The proof is similar to that of Billingsley’s lemma, combined with an appropriate
covering lemma.
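For the first statement fix an open neighborhood U of A, and for δ > 0 let

\[ \mathcal{E}_\delta = \{ B_r(x) \subseteq U \;:\; x\in A,\ 0<r<\delta,\ \mu(B_r(x)) > s\,|B_r(x)|^\alpha \} . \]

By hypothesis Eδ is a Besicovitch cover of A. Apply the Besicovitch covering lemma to
obtain a sub-cover B1 , B2 , . . . of A with multiplicity C = C(d). Hence

\[ \mu(U) \ge \mu\Big(\bigcup_i B_i\Big) \ge \frac{1}{C}\sum_i \mu(B_i) \ge \frac{s}{C}\sum_i |B_i|^\alpha \ge \frac{s}{C}\,\mathcal{H}^\alpha_{2\delta}(A), \]

where the last step uses that each Bi has diameter at most 2δ. This holds for all δ > 0,
so $\mathcal{H}^\alpha(A) \le \frac{C}{s}\,\mu(U)$. Since U is any open neighborhood of A
and µ is Radon, we obtain the desired inequality.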
For the second implication, for ε > 0 write

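\[ A_\varepsilon = \{ x\in A \;:\; \mu(B_r(x)) < t\cdot|B_r(x)|^\alpha \ \text{ for all } r<\varepsilon \} \]

and note that the sets $A_{1/n}$ increase to $A = \bigcup_n A_{1/n}$, so it suffices to show that
$\sup_n \mathcal{H}^\alpha(A_{1/n}) \ge 2^{-\alpha} t^{-1} \mu(A)$.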
Fix n and δ < 1/2n and consider any cover E of A1/n by sets of diameter ≤ δ.
Replace each set E ∈ E that intersects A1/n with a ball centered in A1/n of radius |E|,
and hence of diameter 2|E| ≤ 2δ < 1/n. The resulting collection F of balls covers A1/n
and µ(F ) < t|F |α for F ∈ F , by definition of A1/n . Thus

\[ \sum_{E\in\mathcal{E}} |E|^\alpha \ \ge\ \frac{1}{2^\alpha}\sum_{F\in\mathcal{F}} |F|^\alpha \ >\ \frac{1}{2^\alpha t}\sum_{F\in\mathcal{F}} \mu(F) \ \ge\ \frac{1}{2^\alpha t}\,\mu(A_{1/n}) . \]

Taking the infimum over such covers E we have $\mathcal{H}^\alpha_\delta(A_{1/n}) \ge 2^{-\alpha}t^{-1}\mu(A_{1/n})$. Since this
holds for all δ < 1/2n we have $\mathcal{H}^\alpha(A_{1/n}) \ge 2^{-\alpha}t^{-1}\mu(A_{1/n})$. Letting n → ∞ gives the
conclusion.
For the last statement, note that the previous parts apply to any Borel subset
A′ ⊆ A. Thus µ(A′ ) = 0 if and only if Hα (A′ ) = 0, that is, µ ∼ Hα |A .

We will use the theorem later to prove absolute continuity of certain measures with
respect to Lebesgue measure.

Theorem 7.8. Let A ⊆ Rd , α = dim A and suppose that 0 < Hα (A) < ∞. Let
µ = Hα |A . Then
\[ 2^{-\alpha} \le D^+_\alpha(\mu,x) \le C \]

for µ-a.e. x, where C = C(d).

Proof. Let
At = {x ∈ A : Dα+ (µ, x) > t}

Then by the previous theorem there is a constant C = C(d) such that

\[ \mu(A_t) \le \frac{C}{t}\,\mathcal{H}^\alpha(A_t) = \frac{C}{t}\,\mu(A_t) . \]

Since µ is finite, for t > C this is possible only if µ(At ) = 0. Thus

\[ \mu(\{x : D^+_\alpha(\mu,x) > C\}) = \lim_{n\to\infty} \mu(A_{C+1/n}) = 0 . \]

The proof of the other inequality is analogous.

We remark that the constant C in Theorem 7.8 can be taken to be 1, but this
requires a more careful analysis, see ??. Any lower bound must be strictly less than 1
by Theorem 7.10 below. The optimal lower bound is not known.??

Corollary 7.9. If 0 < Hα (A) < ∞ then dim Hα |A = α.

Since Hd is just Lebesgue measure, when α = d the Lebesgue density theorem


tells us that a stronger form of Theorem 7.8 is true. Namely, for µ = Hd |A we have
Dd+ (µ, x) = Dd− (µ, x) = c · 1A (x) Hd -a.e. (the constant arises because of the way we
normalized the denominator in the definition of Dd± ). It is natural to ask whether the
same is true for Hausdorff measures, or perhaps even for more general measures. The
following remarkable and deep theorem provides a negative answer.

Theorem 7.10 (Preiss). If µ is a measure on Rd and limr→0 µ(Br (x))/rα exists µ-a.e.
then α is an integer and µ is Hausdorff measure on the graph of a Lipschitz function.

We will discuss a special case of this theorem later on.


We already saw that Hα is not σ-finite, and this makes it awkward to work with.
Nevertheless it is often considered the most “natural” fractal measure and much effort
has gone into analyzing it in various examples. The simplest of these are, as usual,
self-similar sets satisfying the open set condition. For these the appropriate Hausdorff
measure is positive and finite. There is a remarkable converse: if a self-similar set has
finite and positive Hausdorff measure in its dimension then it is the attractor of an IFS
satisfying the open set condition; see ??. There are also simple examples with infinite

Hausdorff measure; this is the case for the self-affine sets discussed in Section ??, see
??.
Another interesting result is that any Borel set of positive Hα measure contains a
Borel subset of positive finite Hα measure; see ??. Thus the measure in the conclusion
of Frostman’s lemma can always be taken to be the restriction of Hα to a finite measure
set. This lends some further support to the idea that Hα is the canonical α-dimensional
measure on Rd .
We end the discussion of Hausdorff measures with an interesting fact that is purely
measure-theoretic. Recall that measure spaces (Ω, F, µ) and (Ω′ , F ′ , µ′ ) are isomorphic
if there is a bijection f : Ω → Ω′ such that f, f −1 are measurable, f induces a bijection
of F → F ′ , and f µ = µ′ .

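Theorem 7.11. Let B denote the Borel σ-algebra of R and Bα its completion with
respect to Hα . If 0 ≤ α < β ≤ 1 then $(\mathbb{R}, \mathcal{B}, \mathcal{H}^\alpha) \not\cong (\mathbb{R}, \mathcal{B}, \mathcal{H}^\beta)$, but
$(\mathbb{R}, \mathcal{B}_\alpha, \mathcal{H}^\alpha) \cong (\mathbb{R}, \mathcal{B}_\beta, \mathcal{H}^\beta)$ are isomorphic for all 0 < α, β < 1.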

8 Projections (Marstrand’s theorem)

Up until now we have viewed Rd primarily as a metric space with special combinatorial
properties (e.g. Besicovitch lemma). We now change perspective, and turn to questions
which involve, directly or indirectly, the group or vector structure of Rd .
In this section we examine the behavior of sets and measures under linear maps. For
simplicity we consider the case of linear maps R2 → R, although many of the results
extend to general linear maps Rd → Rk , and we shall sometimes state them this way.
The basic heuristic at play here is that when one projects a set or measure via a
linear map, the image should be “as large as possible”. We will see a number of such
statements.
We parametrize linear maps in various ways as is convenient, but in all the parame-
terizations measures on the space of linear maps will be equivalent, so statements that
hold for a.e. linear maps will be independent of the parametrization.

8.1 Dimension of projections

Denote the set of unit vectors in R2 by S 1 , and for u ∈ S 1 let πu : R2 → R denote the
linear functional
πu (x) = x · u

Up to a linear change of coordinates this is the orthogonal projection of x to the line Ru.

Lemma 8.1. Let f : X → Y be a Lipschitz map between compact metric spaces. Let
A ⊆ X and µ ∈ P(X). Then
1. dim f A ≤ min{dim Y, dim A}.

2. $\underline{\dim}\, f\mu \le \min\{\dim Y,\ \underline{\dim}\,\mu\}$.

3. $\overline{\dim}\, f\mu \le \min\{\dim Y,\ \overline{\dim}\,\mu\}$.


Proof. The bound dim f A ≤ dim A was proved in Lemma 2.10, and since f A ⊆ Y we
obviously have dim f A ≤ dim Y , hence dim f A ≤ min{dim Y, dim A}. This proves (1).
If µ ∈ P(X) and ν = f µ, then the relation f Br (x) ⊆ BCr (f x), where C is the Lipschitz
constant of f , implies that µ(Br (x)) ≤ ν(BCr (f x)). It follows that dim(ν, f x) ≤ dim(µ, x),
so dim f µ ≤ dim µ. On the other hand, ν is supported on Y , so dim ν ≤ dim Y . This proves (2).
The proof of (3) is similar to (2).

Thus, if we take the image of a set A or measure µ under a linear map, the
image will not be larger than the original object. The content of the following theorem
is that, typically, there is no other constraint.
Identify the set of unit vectors S 1 with angles [0, 2π), and denote the corresponding
length measure by λ.
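Theorem 8.2 (Marstrand). If µ ∈ P(R2 ), then

\[ \underline{\dim}\,\pi_u\mu = \min\{1,\ \underline{\dim}\,\mu\} \quad \text{for a.e. } u\in S^1, \]

and similarly for $\overline{\dim}$. In particular for any Borel set X ⊆ R2 ,

\[ \dim \pi_u X = \min\{1,\ \dim X\} \quad \text{for a.e. } u\in S^1 . \]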


An analogous result holds for linear maps π : Rd → Rk and sets and measures in Rd , but we
will not prove it.
We emphasize that the theorem does not give any description of the directions u ∈ S 1
for which the conclusions hold, and neither does the proof give any hint how to identify
them. It may be that there are no “bad” u, or that this zero-measure set is actually
quite large (it may be dense, or have positive dimension). Identifying whether there are
any “bad” u and, if so, who they are, is often a very challenging problem.
The result for sets follows from the measure result using Frostman’s lemma, so it
suffices to prove the result for measures. For this we require the following definition.
Definition 8.3. For a compact metric space X and µ ∈ P(X), the t-energy of µ is

\[ I_t(\mu) = \int\!\!\int \frac{1}{d(x,y)^t}\, d\mu(x)\, d\mu(y) . \]

In Rd this reduces to

\[ I_t(\mu) = \int\!\!\int \frac{1}{\|x-y\|^t}\, d\mu(x)\, d\mu(y), \]

where for concreteness we fix the Euclidean norm.


This integral may be infinite. Note that

1. It (µ) < ∞ implies that µ is non-atomic.

2. The property that It (µ) is finite or infinite depends only on {(x, y) : d(x, y) ≤ 1}.
On this set, the integrand is increasing in t. Therefore, if It (µ) < ∞ then Is (µ) < ∞
for all s < t.

3. In Rd , finiteness of It (µ) is independent of the norm.

Although dim µ is not quite characterized by the behavior of the function t 7→ It (µ), it
nearly is:

Proposition 8.4. Let µ ∈ P(X).

1. If It (µ) < ∞ then dim µ ≥ t.

2. If µ(Br (x)) ≤ c·rt for every x (with c independent of x) then Is (µ) < ∞ for s < t.

Proof. (1) Suppose dim µ < t. We wish to show that It (µ) = ∞. We may assume that
µ is non-atomic since otherwise this certainly holds.
Fix s > 0 such that dim µ < s < t.
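Fix a µ-typical x. For any sequence 1 = r0 > q0 ≥ r1 > q1 ≥ . . . with rn > qn and
rn → 0 we have

\[
\int \frac{1}{d(x,y)^t}\, d\mu(y)
 \ \ge\ \int_{B_1(x)} \frac{1}{d(x,y)^t}\, d\mu(y)
 \ \ge\ \sum_{n=0}^{\infty} \int_{B_{r_n}(x)\setminus B_{q_n}(x)} \frac{1}{d(x,y)^t}\, d\mu(y)
 \ \ge\ \sum_{n=0}^{\infty} \frac{1}{(2r_n)^t}\, \mu\big(B_{r_n}(x)\setminus B_{q_n}(x)\big) .
\]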

Since dim µ < s, there is a set A of positive µ-measure so that for every x ∈ A there
is a c = c(x) > 0 such that

\[ \mu(B_r(x)) > c\, r^s \quad \text{for arbitrarily small } r > 0 . \]

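Fixing such an x ∈ A, we can choose a sequence of rn , qn satisfying

\[ \mu\big(B_{r_n}(x)\setminus B_{q_n}(x)\big) \ \ge\ \tfrac{1}{2}\,\mu(B_{r_n}(x)) \ >\ c\, r_n^s \]

(this uses that µ is non-atomic, and we have replaced c by c/2). Thus

\[ \int \frac{1}{d(x,y)^t}\, d\mu(y) \ \ge\ \frac{1}{2^t} \sum_{n=0}^{\infty} \frac{1}{r_n^t}\, c\, r_n^s \ =\ \frac{c}{2^t} \sum_{n=0}^{\infty} r_n^{s-t} \ =\ \infty . \]

Thus the inner integral $\int d(x,y)^{-t}\, d\mu(y)$ in the definition of It (µ) is infinite on the
positive-measure set A, so It (µ) = ∞.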
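(2) We perform essentially the same calculation. Let c, t be given. Let $q_{n-1} = r_n = 2^{-n}$
(so $r_0 = 1$ and $q_n = 2^{-(n+1)}$) and s < t. Then, given x,

\[
\begin{aligned}
\int \frac{1}{d(x,y)^s}\, d\mu(y)
 &\le 1 + \int_{B_1(x)} \frac{1}{d(x,y)^s}\, d\mu(y) \\
 &= 1 + \sum_{n=0}^{\infty} \int_{B_{r_n}(x)\setminus B_{q_n}(x)} \frac{1}{d(x,y)^s}\, d\mu(y) \\
 &\le 1 + \sum_{n=0}^{\infty} \frac{1}{q_n^s}\, \mu\big(B_{r_n}(x)\setminus B_{q_n}(x)\big) \\
 &\le 1 + \sum_{n=0}^{\infty} \frac{1}{q_n^s}\, \mu(B_{r_n}(x)) \\
 &\le 1 + c \sum_{n=0}^{\infty} 2^{s(n+1)} \cdot 2^{-tn} \\
 &= 1 + 2^s c \sum_{n=0}^{\infty} 2^{-(t-s)n} .
\end{aligned}
\]

The last expression is finite and independent of x, hence
$I_s(\mu) = \int\!\!\int d(x,y)^{-s}\, d\mu(y)\, d\mu(x) < \infty$.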
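
As a numerical illustration of Proposition 8.4 (a sketch, not taken from the notes), one can look at a discrete proxy for the t-energy of the natural measure on the middle-third Cantor set: put mass 2^(-n) on each left endpoint of a stage-n interval and sum |x - y|^(-t) over pairs of distinct points. Omitting the diagonal is an ad hoc choice made here, and the names `cantor_points` and `discrete_energy` are hypothetical. Splitting the sum by first digit gives E_n = (ρ^(-t)/2) E_(n-1) + (bounded cross term), so the proxy stays bounded when t < log 2 / log 3 and grows geometrically when t is larger.

```python
from itertools import product

def cantor_points(rho, n):
    """Left endpoints of the 2^n stage-n intervals: (1 - rho) * sum_k i_k * rho^(k-1)."""
    return [(1 - rho) * sum(d * rho ** k for k, d in enumerate(digits))
            for digits in product((0, 1), repeat=n)]

def discrete_energy(points, t):
    """Energy of the uniform measure on `points`, with the diagonal x = y omitted."""
    w = 1.0 / len(points)
    total = 0.0
    for i, x in enumerate(points):
        for j, y in enumerate(points):
            if i != j:
                total += w * w * abs(x - y) ** (-t)
    return total

rho = 1 / 3   # dimension log 2 / log 3, roughly 0.63
for t in (0.3, 0.9):
    values = [round(discrete_energy(cantor_points(rho, n), t), 3) for n in (4, 6, 8)]
    print(f"t = {t}: {values}")
```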

Corollary 8.5. For every Borel set A ⊆ Rd ,

dim A = sup{t ≥ 0 | ∃µ ∈ P(A) It (µ) < ∞}

Proof. Let E denote the set of t in the statement.


If t ∈ E then there exists µ ∈ P(A) with It (µ) < ∞. Then, by Proposition 8.4,
t ≤ dim µ. But µ is supported on A so dim µ ≤ dim A, hence t ≤ dim A. Thus also
sup E ≤ dim A.
On the other hand, by Frostman’s lemma, for every s < t < dim A we can find a t-regular
µ ∈ P(A). Thus, by Proposition 8.4, Is (µ) < ∞. This shows that s ∈ E, and since s < dim A
was arbitrary, sup E ≥ dim A.

Proof of the projection theorem. Let µ ∈ P(R2 ) and dim µ > t for some t < 1. Our aim
is to show that dim πu µ ≥ t for a.e. u ∈ S 1 .

We first claim that we can assume without loss of generality that It (µ) < ∞.

Indeed, dim µ > t means dim(µ, x) > t for µ-a.e. x, and this means that for µ-a.e. x
there exists c = c(x) such that µ(Br (x)) ≤ crt for all r > 0.

By (repeated application of) Egorov’s theorem, we can choose pairwise disjoint sets
An ⊆ R2 with $\mu\big(\bigcup_{n=1}^{\infty} A_n\big) = 1$, and such that the function c is bounded on each An .

The measures µ|An are t-regular by definition, hence It (µ|An ) < ∞ by Proposition
8.4.

On the other hand, if we knew for each n that dim πu (µ|An ) ≥ t for a.e. u, then
for a.e. u the inequality would hold for all n, and, using the identity $\mu = \sum_{n=1}^{\infty} \mu|_{A_n}$,
for a.e. u we would have

\[
\dim(\pi_u\mu) = \dim \pi_u\Big(\sum_n \mu|_{A_n}\Big) = \dim \sum_n \pi_u(\mu|_{A_n}) = \inf_n \dim \pi_u(\mu|_{A_n}) \ge t,
\]

which is what we want.

Thus, we have reduced the theorem to the case It (µ) < ∞, which we now assume.

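Write µu = πu µ. Note that

\[
I_t(\mu_u) = \int\!\!\int \frac{1}{|w-z|^t}\, d\mu_u(w)\, d\mu_u(z)
 = \int\!\!\int \frac{1}{|\pi_u x - \pi_u y|^t}\, d\mu(x)\, d\mu(y)
 = \int\!\!\int \frac{1}{|(x-y)\cdot u|^t}\, d\mu(x)\, d\mu(y) .
\]

Integrating this with respect to the uniform measure λ on S 1 , we have

\[ \int I_t(\mu_u)\, d\lambda(u) = \int \Big( \int\!\!\int \frac{1}{|(x-y)\cdot u|^t}\, d\mu(x)\, d\mu(y) \Big)\, du . \]

Using Fubini,

\[ = \int\!\!\int \Big( \int \frac{1}{|(x-y)\cdot u|^t}\, du \Big)\, d\mu(x)\, d\mu(y) . \]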

Now since t < 1, for every v ≠ 0 we have (using kuk = 1),

\[ \int \frac{1}{|u\cdot v|^t}\, du = \frac{1}{\|v\|^t} \int_0^{2\pi} |\cos\theta|^{-t}\, d\theta = \frac{c'}{\|v\|^t} \]

for a constant c′ < ∞. Note that c′ does not depend on v. Continuing the
previous integration,

\[
\int\!\!\int \Big( \int \frac{1}{|(x-y)\cdot u|^t}\, du \Big)\, d\mu(x)\, d\mu(y)
 = c' \int\!\!\int \frac{1}{|x-y|^t}\, d\mu(x)\, d\mu(y)
 = c' \cdot I_t(\mu) < \infty .
\]

By Fubini It (µu ) < ∞ for λ-a.e. u, so by Proposition 8.4, dim µu ≥ t, as
desired.
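
The finiteness of the constant c′ for t < 1 can be checked numerically. The sketch below is an aside and not part of the notes: it compares a midpoint-rule estimate of the integral of |cos θ|^(-t) over [0, 2π] with the value 2√π Γ((1-t)/2) / Γ(1 - t/2) obtained from the standard Beta-function identity for integrals of powers of cosine. The function names are hypothetical. The midpoint rule converges slowly here because the integrand is singular at θ = π/2 and 3π/2, so only rough agreement should be expected.

```python
import math

def c_prime_closed_form(t):
    """2*sqrt(pi)*Gamma((1-t)/2)/Gamma(1-t/2); only meaningful for t < 1."""
    return 2 * math.sqrt(math.pi) * math.gamma((1 - t) / 2) / math.gamma(1 - t / 2)

def c_prime_midpoint(t, cells=200_000):
    """Midpoint-rule estimate of the integral of |cos(theta)|^(-t) over [0, 2*pi]."""
    h = (math.pi / 2) / cells
    # By symmetry the full integral is four times the integral over [0, pi/2].
    quarter = sum(math.cos((k + 0.5) * h) ** (-t) for k in range(cells)) * h
    return 4 * quarter

for t in (0.0, 0.5, 0.9):
    print(t, c_prime_closed_form(t), c_prime_midpoint(t))
```

As t approaches 1 the closed form blows up, which matches the role of the assumption t < 1 in the proof.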

8.2 Absolute continuity of projections

Let A ⊆ R2 and π : R2 → R linear. Besides the dimension of πA, one may also be
interested in its topology (does it contain intervals?) or Lebesgue measure.
When dim A < 1 we have dim πA < 1, so Leb(πA) = 0, and of course πA cannot
contain an interval.
It turns out that when dim A ≥ 1 there are two cases, depending on whether dim A = 1
or dim A > 1. In the latter regime there is an elegant answer to the measure question.

Theorem 8.6 (Marstrand). If A ⊆ R2 and dim A > 1 then Leb(πu A) > 0 for a.e.
u ∈ S 1 . Moreover if µ ∈ P(R2 ) and dim µ > 1 then πu µ ≪ λ for a.e. u ∈ S 1 .

Proof. Let µ be an α-regular measure on A with α > 1. Write µu = πu µ. Recall that a
probability measure ν on R is absolutely continuous with respect to Lebesgue measure
if and only if

\[ \liminf_{r\to 0} \frac{\nu((x-r,\,x+r))}{r} < \infty \quad \text{for } \nu\text{-a.e. } x . \]

Thus, absolute continuity of µu will follow from the (stronger) condition

\[ \int \liminf_{r\to 0} \frac{\mu_u(\pi_u(x)-r,\ \pi_u(x)+r)}{2r}\, d\mu(x) < \infty . \]

Now,

\[ \mu_u(\pi_u(x)-r,\ \pi_u(x)+r) = \int 1_{[\pi_u(x)-r,\ \pi_u(x)+r]}(\pi_u(y))\, d\mu(y) \]

and applying Fatou’s lemma, it is enough to prove that

\[ \liminf_{r\to 0} \frac{1}{2r} \int\!\!\int 1_{[\pi_u(x)-r,\ \pi_u(x)+r]}(\pi_u(y))\, d\mu(y)\, d\mu(x) < \infty, \]

or:

\[ \liminf_{r\to 0} \frac{1}{2r} \int\!\!\int 1_{\{|\pi_u(x)-\pi_u(y)|\le r\}}\, d\mu(y)\, d\mu(x) < \infty . \]

This analysis gives a condition for absolute continuity of µu for fixed u ∈ S 1 . In order
to prove absolute continuity for a.e. u, it is enough to prove

\[ \int_{S^1} \Big( \liminf_{r\to 0} \frac{1}{2r} \int\!\!\int 1_{\{|\pi_u(x)-\pi_u(y)|\le r\}}\, d\mu(y)\, d\mu(x) \Big)\, du < \infty . \]

Applying Fatou again, followed by Fubini, we must show that

\[ \liminf_{r\to 0} \frac{1}{2r} \int\!\!\int \Big( \int_{S^1} 1_{\{|\pi_u(x)-\pi_u(y)|\le r\}}\, du \Big)\, d\mu(y)\, d\mu(x) < \infty . \]

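But the inner integral is now easy to compute: for x, y fixed let v = x − y. Then

\[ \int_{S^1} 1_{\{|\pi_u(x)-\pi_u(y)|\le r\}}\, du = \int_{S^1} 1_{\{|\pi_u(v)|\le r\}}\, du . \]

But

\[ \pi_u v = \|v\| \cos\angle(u,v), \]

and $|\pi_u v| \le r$ only if the angle between u and the line orthogonal to v is O(r/ kvk), so

\[ \int_{S^1} 1_{\{|\pi_u(x)-\pi_u(y)|\le r\}}\, du \le c\,\frac{r}{\|x-y\|} \]

and hence

\[
\liminf_{r\to 0} \frac{1}{2r} \int\!\!\int \Big( \int_{S^1} 1_{\{|\pi_u(x)-\pi_u(y)|\le r\}}\, du \Big)\, d\mu(y)\, d\mu(x)
 \ \le\ \int\!\!\int \frac{c}{\|x-y\|}\, d\mu(y)\, d\mu(x)
 \ =\ c\cdot I_1(\mu) \ <\ \infty
\]

by the assumption that µ is α-regular for α > 1 (see Proposition 8.4). This completes the proof.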
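
The bound on the inner integral can be made explicit. As a side computation that is not in the notes: writing s = r/‖v‖, the set of angles θ in [0, 2π) with |cos θ| ≤ s consists of two arcs of length 2 arcsin(s) centred at π/2 and 3π/2, so its length is 4 arcsin(min(1, s)), and since arcsin(s) ≤ (π/2)s on [0, 1] one may take c = 2π. The sketch below checks this against a grid count; the function names are hypothetical.

```python
import math

def arc_measure(r, norm_v):
    """Length of {theta in [0, 2*pi) : |norm_v * cos(theta)| <= r}."""
    s = min(1.0, r / norm_v)
    return 4 * math.asin(s)

def arc_measure_grid(r, norm_v, cells=200_000):
    """The same length, estimated by counting midpoints of a uniform grid."""
    h = 2 * math.pi / cells
    hits = sum(1 for k in range(cells) if abs(norm_v * math.cos((k + 0.5) * h)) <= r)
    return hits * h

for r in (0.5, 0.1, 0.01):
    print(r, arc_measure(r, 1.0), arc_measure_grid(r, 1.0), 2 * math.pi * r)
```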

In the regime dim A = 1 there is more to be said. For a set A ⊆ R2 , we say that
A is rectifiable if it is contained in a countable union of Lipschitz curves, and that it is
purely unrectifiable if H1 (A ∩ Γ) = 0 for every Lipschitz curve Γ. Every set A with
H1 (A) < ∞ may be decomposed as a union A = A′ ∪ A′′ , where A′ is contained in a
countable union of Lipschitz curves, and A′′ is purely unrectifiable.

Theorem 8.7 (Besicovitch). Let A ⊆ R2 be a set with 0 < H1 (A) < ∞.

1. If A is not purely unrectifiable, then Leb(πu A) > 0 for all u ∈ S 1 except at most one
u.

2. If A is purely unrectifiable then Leb(πu A) = 0 for a.e. u ∈ S 1 .

(1) is not difficult but (2) is harder. For a proof, see ??.

9 Iterated function systems

The middle-α Cantor sets and some other examples we have discussed have the common
feature that they are composed of scaled copies of themselves. In this section we will
consider such examples in greater generality.

9.1 Iterated function systems

Let (X, d) be a complete metric space. A contraction is a map f : X → X such that

d(f (x), f (y)) ≤ ρ · d(x, y)

for some 0 ≤ ρ < 1. In this case we say that f has contraction ρ. In general there is
no optimal value which can be called “the” contraction ratio, but if there is a minimal
such ρ, we call it the contraction ratio of f .
Here we shall consider systems with more than one contraction:

Definition 9.1. An iterated function system (IFS) on (X, d) is a finite family
Φ = {φi }i∈Λ of strict contractions. We say that Φ has contraction ρ if each φi has
contraction ρ.
A compact non-empty set K ⊆ X satisfying $K = \bigcup_{i\in\Lambda} \varphi_i K$ is called the attractor
of the IFS Φ = {φi }.

We study IFSs (and their attractors) with two goals in mind. First, it is natural
to ask about the dynamics of repeatedly applying maps from Φ to a point. When
multiple maps are present such a sequence of iterates need not converge, but we will
see that there is an “invariant” compact set, the attractor, on which all such sequences
accumulate. Second, we will study the fractal geometry of the attractor. Such sets are
among the simplest fractals but already exhibit nontrivial behavior.

Example: Contraction mapping theorem

For a map f : X → X, write

\[ f^k = \underbrace{f \circ \cdots \circ f}_{k} \]

for the k-fold composition of f with itself. Recall the contraction mapping theorem:

Theorem 9.2 (Contraction mapping theorem). If (X, d) is a complete metric space
and f : X → X has contraction ρ < 1, then there is a unique fixed point x = f (x), and
for every y ∈ X we have $d(x, f^k(y)) \le \rho^k d(x,y)$ and in particular $f^k y \to x$.

If we think of the contraction f as an IFS Φ = {f } with one map, then the fixed
point x is an attractor because

\[ \{x\} = \bigcup_{\varphi\in\Phi} \varphi(\{x\}) . \]

Furthermore, every point y ∈ X converges to the attractor under iteration.

Example: Cα

It will be instructive to re-examine the middle-α Cantor sets Cα from Section 2.1, where
one can find many of the features present in the general case. Write ρ = (1 − α)/2 and
consider the IFS Φ = {φ0 , φ1 } with contraction ρ given by

\[ \varphi_0(x) = \rho x, \qquad \varphi_1(x) = \rho x + (1-\rho) . \]

Write I = [0, 1] and notice that φi I ⊆ I for i = 0, 1. Furthermore, the intervals I0 , I1 at
stage 1 of the construction of Cα are just φ0 I and φ1 I, respectively, and it follows
that the intervals $I_{i,j}$ at stage 2 are just $\varphi_i\varphi_j I$, and so on. For $i_1\ldots i_n \in \{0,1\}^n$ write

\[ \varphi_{i_1\ldots i_n} = \varphi_{i_1}\circ\cdots\circ\varphi_{i_n} \]

(note the order of application: the first function $\varphi_{i_1}$ is the “outer” function). Then the
intervals $I_{i_1\ldots i_n}$ at stage n of the construction are just the images $\varphi_{i_1\ldots i_n} I$. Writing $C_{\alpha,n}$
for the union of the stage-n intervals, it follows that $C_{\alpha,n+1} = \varphi_0 C_{\alpha,n} \cup \varphi_1 C_{\alpha,n}$, and
since $C_\alpha = \bigcap_{n=1}^{\infty} C_{\alpha,n}$, we have

\[ C_\alpha = \varphi_0 C_\alpha \cup \varphi_1 C_\alpha , \]

i.e. Cα is “invariant” under Φ.

Let us now examine the points x ∈ Cα . Each such point may be identified by the
sequence I n (x) of stage-n intervals to which it belongs. These intervals, which decrease
to x, are of the form
I n (x) = Ii1 ...in = φi1 φi2 . . . φin ([0, 1])

for some infinite sequence i1 i2 . . . ∈ {0, 1}N depending on x. If we fix any y ∈ [0, 1] then
φi1 ...in (y) ∈ φi1 ...in [0, 1] = I n (x), so φi1 ...in (y) → x as n → ∞.
The last calculation shows us two things. First, it shows that Cα is not just invariant
under application of φ0 , φ1 , but it actually “attracts” all points y in [0, 1] under repeated
application. Second, we have found a “symbolic coding” of points x ∈ Cα by sequences
i1 i2 . . . ∈ {0, 1}N . In this example, we can be more explicit:

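\[
\begin{aligned}
\varphi_{i_1\ldots i_n}(y) &= \rho\cdot\varphi_{i_2\ldots i_n}(y) + i_1(1-\rho) \\
 &= \rho\cdot\big(\rho\cdot\varphi_{i_3\ldots i_n}(y) + i_2(1-\rho)\big) + i_1(1-\rho) \\
 &= \rho^2\varphi_{i_3\ldots i_n}(y) + (\rho i_2 + i_1)(1-\rho) \\
 &\ \ \vdots \\
 &= \rho^n y + (1-\rho)\sum_{k=1}^{n} i_k\rho^{k-1} .
\end{aligned}
\]

Since $\rho^n y \to 0$ it follows that $x = (1-\rho)\sum_{k=1}^{\infty} i_k\rho^{k-1}$, and we may thus identify Cα
with the set of such sums:

\[ C_\alpha = \Big\{ (1-\rho)\sum_{k=1}^{\infty} i_k\rho^{k-1} \;:\; i_1 i_2\ldots \in \{0,1\}^{\mathbb{N}} \Big\} . \]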

(For example, for α = 0 we have ρ = 1/2, and we have just described the fact that every
x ∈ [0, 1] has a binary representation; and if α = 1/3 then ρ = 1/3, and this is the well-known
fact that x ∈ C1/3 if and only if $x = \sum a_n 3^{-n}$ for an ∈ {0, 2}, that is, C1/3 is the set
of numbers in [0, 1] that can be represented in base 3 using only the digits 0 and 2).
Incidentally, the calculation above shows that $\varphi_{i_1\ldots i_n}(y) \to x$ for all
y ∈ R, not only y ∈ [0, 1].
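
The closed formula for the composed maps is easy to sanity-check by machine. The following is a minimal sketch, with hypothetical helper names, that applies the maps in the order used above (the last symbol of the word is applied first) and compares the result with ρ^n y + (1 − ρ) Σ i_k ρ^(k−1).

```python
def phi(i, x, rho):
    """The map phi_i(x) = rho * x + i * (1 - rho) for i in {0, 1}."""
    return rho * x + i * (1 - rho)

def compose(word, y, rho):
    """phi_{i1...in}(y): phi_{in} is applied first and phi_{i1} last."""
    for i in reversed(word):
        y = phi(i, y, rho)
    return y

def closed_form(word, y, rho):
    """rho^n * y + (1 - rho) * sum_{k=1}^n i_k * rho^(k-1)."""
    n = len(word)
    return rho ** n * y + (1 - rho) * sum(i * rho ** k for k, i in enumerate(word))

word, y, rho = (1, 0, 1, 1, 0), 0.37, 0.3
print(compose(word, y, rho), closed_form(word, y, rho))
```

The two printed numbers should agree up to rounding error, for any word, any real y and any 0 < ρ < 1.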

9.2 Existence of the attractor

In the general setting, let Φ = {φi }i∈Λ be an IFS with contraction ρ on a complete metric
space (X, d). In this section we will show that an attractor exists. Our strategy is as
follows. Let 2X denote the space of compact, non-empty subsets of X. We introduce
the map $\widetilde{\Phi} : 2^X \to 2^X$ given by

\[ \widetilde{\Phi}(A) = \bigcup_{i\in\Lambda} \varphi_i A . \]

Then an attractor is precisely a fixed point of $\widetilde{\Phi}$. We will show that $\widetilde{\Phi}$ is a contraction
in an appropriately chosen complete metric on 2X ; then the existence and uniqueness of
the fixed point of $\widetilde{\Phi}$ (respectively, attractor of Φ) follows from the contraction mapping
theorem.
The proof requires some preparation. Let (X, d) be a metric space. For ε > 0 write

A(ε) = {x ∈ X : d(x, a) < ε for some a ∈ A}

If A, B ⊆ X, we say that A is ε-dense in B if B ⊆ A(ε) , or equivalently, if for every


b ∈ B there is an a ∈ A with d(a, b) < ε.
The Hausdorff distance dH on 2X is defined by

dH (A, B) = inf{ε > 0 : A ⊆ B (ε) and B ⊆ A(ε) }

Thus, dH (A, B) < ε if A is ε-dense in B and B is ε-dense in A. Heuristically this means


that A, B look the same “at resolution ε”. This distance should not be confused with
the distance of a point from a set, defined as usual by

d(x, A) = inf{d(x, a) : a ∈ A}

In general, d(x, A) ≠ dH ({x}, A); for example if x ∈ A and |A| ≥ 2 then d(x, A) = 0 but
dH ({x}, A) > 0.
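
For finite subsets of the line the Hausdorff distance can be computed directly, which makes the difference between the two notions of distance concrete. The sketch below is illustrative only and uses hypothetical function names; for finite sets the infimum in the definition equals the larger of the two one-sided distances computed here.

```python
def dist_point_set(x, A):
    """d(x, A): distance from the point x to the nearest point of the finite set A."""
    return min(abs(x - a) for a in A)

def hausdorff(A, B):
    """Hausdorff distance between finite non-empty subsets A, B of the real line."""
    return max(max(dist_point_set(a, B) for a in A),
               max(dist_point_set(b, A) for b in B))

A = [0.0, 1.0]
print(dist_point_set(0.0, A))   # distance from the point 0 to the set A
print(hausdorff([0.0], A))      # Hausdorff distance between {0} and A
```

Here the point 0 lies in A, so its distance to A is 0, while {0} is far from being dense in A, so the Hausdorff distance is 1.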
If (X, d) is complete, then a closed set A is compact if and only if it is totally
bounded, i.e. for every ε > 0 there is a cover of A by finitely many sets of diameter ε.
The proof is left as an exercise.

Proposition 9.3. Let (X, d) be a metric space and dH as above.

1. dH is a metric on 2X .
2. If An ∈ 2X and A1 ⊇ A2 ⊇ . . . then $A_n \to \bigcap_{n=1}^{\infty} A_n$.

3. If (X, d) is complete then dH is complete.

4. If (An ) ⊆ 2X converges to A then A is the set of accumulation points of sequences (an )
with an ∈ An .

5. If (X, d) is compact, then (2X , dH ) is compact.

Proof. (1) Clearly dH (A, B) ≥ 0. If x ∈ A \ B then, since B is closed, d(x, B) = δ > 0,
and hence A ⊄ B (δ) , so dH (A, B) > 0; this establishes positivity. Symmetry is trivial
from the definition. Finally note that (A(ε) )(δ) ⊆ A(ε+δ) , so A ⊆ B (ε) and B ⊆ C (δ)
implies A ⊆ C (ε+δ) . This leads to the triangle inequality.

(2) Suppose An are decreasing non-empty compact sets and let $A = \bigcap A_n \neq \emptyset$.
Obviously A ⊆ An so for every ε > 0 we must show that An ⊆ A(ε) for all large
enough n. Otherwise, for some ε > 0, infinitely many of the sets A′n = An \ A(ε) would
be non-empty. Re-numbering we can assume all are non-empty. This is a decreasing
sequence of compact sets so $A' = \bigcap_{n=1}^{\infty} A'_n \neq \emptyset$. But then A′ ⊆ X \ A(ε) and also
$A' = \bigcap_{n=1}^{\infty} A'_n \subseteq \bigcap_{n=1}^{\infty} A_n = A$, which is a contradiction.
(3) Suppose now that (X, d) is complete and An ∈ 2X is a Cauchy sequence. Let

\[ A_{n,\infty} = \overline{\bigcup_{k\ge n} A_k} . \]

We claim that An,∞ are compact. Since An,∞ is closed and X is complete, we need
only show that it is totally bounded, i.e. that for every ε > 0 there is a cover of An,∞
by finitely many ε-balls. To see this note that, since {Ai } is Cauchy, there is a k such
that $A_j \subseteq A_k^{(\varepsilon/4)}$ for every j ≥ k. We may assume k ≥ n. Now by compactness we
can cover $\bigcup_{j=n}^{k} A_j$ by finitely many ε/2-balls. Taking the cover by balls with the same
centers but radius ε, we have covered $A_k^{(\varepsilon/2)}$ as well, and therefore all the Aj , j > k.
Thus An,∞ is totally bounded, and so compact.
The sequence An,∞ is decreasing so $A_{n,\infty} \to A = \bigcap_{n=1}^{\infty} A_{n,\infty}$. Since An is Cauchy,
it is not hard to see from the definition of An,∞ that dH (An , An,∞ ) → 0. Hence An → A.
(4) Suppose An → A. If A′ denotes the set of accumulation points of sequences
an ∈ An , then $A_{n,\infty} = A' \cup \bigcup_{k\ge n} A_k$ so A′ ⊆ A. The reverse inclusion is also clear, so
A = A′ .
(5) Suppose that X is compact. Let ε > 0 and let Xε ⊆ X be a finite ε-dense set
of points. One may then verify without difficulty that 2Xε is ε-dense in 2X , so 2X is
totally bounded. Being complete, this shows that it is compact.

Theorem 9.4. Let Φ = {φi }i∈Λ be an iterated function system on a complete metric
space X. Then there exists a unique compact non-empty set K ⊆ X such that

\[ K = \bigcup_{i\in\Lambda} \varphi_i K . \]

Furthermore, for every compact ∅ ≠ E ⊆ X,

1. $\widetilde{\Phi}^n E \to K$ exponentially fast in the metric dH .

2. If φi E ⊆ E for every i ∈ Λ, then $K = \bigcap_{n=1}^{\infty} \widetilde{\Phi}^n E$.

Proof. Let $\widetilde{\Phi}$ be as at the beginning of the section.

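Let us first show that $\widetilde{\Phi}$ is a contraction. Indeed, if dH (A, B) < ε then A ⊆ B (ε)
and B ⊆ A(ε) . Let φi have contraction ρi . Then

\[ \varphi_i(A) \subseteq \varphi_i(B^{(\varepsilon)}) \subseteq \varphi_i(B)^{(\rho_i\varepsilon)} \]

and similarly $\varphi_i(B) \subseteq \varphi_i(A)^{(\rho_i\varepsilon)}$. Hence, writing ρ = max ρi ,

\[ \widetilde{\Phi}(A) = \bigcup_{i\in\Lambda} \varphi_i(A) \subseteq \Big(\bigcup_{i\in\Lambda} \varphi_i(B)\Big)^{(\rho\varepsilon)} = \widetilde{\Phi}(B)^{(\rho\varepsilon)} \]

and similarly $\widetilde{\Phi}(B) \subseteq \widetilde{\Phi}(A)^{(\rho\varepsilon)}$. Thus by definition, $d_H(\widetilde{\Phi}(A), \widetilde{\Phi}(B)) \le \rho\varepsilon$. Since ρ < 1,
we have shown that $\widetilde{\Phi}$ has contraction ρ.

Existence and uniqueness of a fixed point now follow from the contraction mapping
theorem, using the fact that $\widetilde{\Phi} : 2^X \to 2^X$ is a contraction and that dH is complete
(Proposition 9.3). This proves existence and uniqueness of the attractor.

The fact that $\widetilde{\Phi}^n(E) \to K$ exponentially is also a consequence of the contraction
mapping theorem.

For the last part, note that by assumption $E \supseteq \widetilde{\Phi}E \supseteq \ldots \supseteq \widetilde{\Phi}^n E \supseteq \ldots$ is a decreasing
sequence, hence by the above and Proposition 9.3, $\bigcap_{n=1}^{\infty} \widetilde{\Phi}^n E = \lim \widetilde{\Phi}^n E = K$.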
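
The contraction property of the set map can be observed on the Cantor IFS φ0(x) = ρx, φ1(x) = ρx + (1 − ρ). The sketch below is an illustration with hypothetical names, restricted to finite subsets of the line: it iterates the set map starting from E = {0} and from E = {1} and prints the Hausdorff distance between the two n-th iterates next to ρ^n. For this particular pair the distance is exactly ρ^n, because the iterates are the left and right endpoints of the stage-n intervals. The value ρ = 1/4 is chosen only so that the floating-point arithmetic is exact.

```python
def hutchinson(points, rho):
    """One application of the set map A -> phi_0(A) union phi_1(A) to a finite set."""
    return sorted({rho * x for x in points} | {rho * x + (1 - rho) for x in points})

def iterate(points, rho, n):
    """Apply the set map n times."""
    for _ in range(n):
        points = hutchinson(points, rho)
    return points

def hausdorff(A, B):
    """Hausdorff distance between finite non-empty subsets of the real line."""
    d = lambda x, S: min(abs(x - s) for s in S)
    return max(max(d(a, B) for a in A), max(d(b, A) for b in B))

rho = 0.25
for n in range(1, 7):
    print(n, hausdorff(iterate([0.0], rho, n), iterate([1.0], rho, n)), rho ** n)
```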

9.3 Cylinder sets

Let Φ = {φi }i∈Λ be an iterated function system. We can describe the points x ∈ K by
associating to them a (possibly non-unique) “name” consisting of a sequence of symbols
from Λ. For i = i1 i2 . . . in ∈ Λn it is convenient to write

\[ \varphi_i = \varphi_{i_1}\circ\cdots\circ\varphi_{i_n} . \]

Given i ∈ ΛN , since for each n we have φin K ⊆ K, it follows by induction

φi1 ...in K = φi1 ...in−1 (φin K) ⊆ φi1 ...in−1 K

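and so the sequence $\varphi_{i_1\ldots i_n}K$ is decreasing. In fact,

\[
\begin{aligned}
K &= \bigcup_{i\in\Lambda} \varphi_i(K) \\
  &= \bigcup_{i_1\in\Lambda} \varphi_{i_1}\Big( \bigcup_{i_2\in\Lambda} \varphi_{i_2}(K) \Big) \\
  &= \bigcup_{i_1,i_2\in\Lambda} \varphi_{i_1}\circ\varphi_{i_2}(K) \\
  &= \bigcup_{i\in\Lambda^2} \varphi_i(K)
\end{aligned}
\]

and in general for every n

\[ K = \bigcup_{i\in\Lambda^n} \varphi_i(K) . \]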

Definition 9.5. Fix n ∈ N. Then the sets φi (K) for i ∈ Λn are called the n-th
generation cylinders of K; they are compact and their union is K.

9.4 Symbolic coding


We now develop the symbolic coding of the attractor of an IFS in general, similarly to
the example of Cα given above.
Since $\varphi_{i_1\ldots i_n}$ has contraction $\rho^n$ we also have $\operatorname{diam} \varphi_{i_1\ldots i_n}K \le \rho^n \operatorname{diam} K$, so, using
completeness of (X, d), the intersection $\bigcap_{n=1}^{\infty} \varphi_{i_1\ldots i_n}K$ is nonempty and consists of a
single point, which we denote Φ(i). It also follows that for any x ∈ K,

\[ \Phi(i) = \lim_{n\to\infty} \varphi_{i_1\ldots i_n}(x) \]

and, in fact, this holds for any y ∈ X since $d(\varphi_{i_1\ldots i_n}x, \varphi_{i_1\ldots i_n}y) \le \rho^n d(x,y)$.
The order in which we apply the maps $\varphi_{i_1}, \varphi_{i_2}, \ldots$ is important for the conclusion
that $\lim \varphi_{i_1\ldots i_n}(y)$ exists. If we were to define $y_n = \varphi_{i_n}\circ\cdots\circ\varphi_{i_1}(x)$ instead, then in
general yn would not converge. For example, in Cα with the maps φ0 , φ1 , note that
$\varphi_{i_n i_{n-1}\ldots i_1}(0)$ belongs to [0, ρ] or [1 − ρ, 1] depending on whether in = 0 or 1, so if we
take the sequence (in ) = (0, 1, 0, 1, 0, 1, . . .) then $\varphi_{i_n\ldots i_1}(0)$ will alternately be in [0, ρ]
and [1 − ρ, 1], and will not converge.
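
The effect of the order of composition is easy to see numerically for the Cantor IFS. In the sketch below (illustrative only, with hypothetical function names), `outer_first` computes φ_{i1} ∘ ... ∘ φ_{in}(x), which converges as n grows, while `inner_first` computes φ_{in} ∘ ... ∘ φ_{i1}(x), which keeps jumping between [0, ρ] and [1 − ρ, 1] for the alternating sequence 0, 1, 0, 1, ...

```python
def phi(i, x, rho):
    """The map phi_i(x) = rho * x + i * (1 - rho) for i in {0, 1}."""
    return rho * x + i * (1 - rho)

def outer_first(word, x, rho):
    """phi_{i1} o ... o phi_{in}(x): the first symbol is the outermost map."""
    for i in reversed(word):
        x = phi(i, x, rho)
    return x

def inner_first(word, x, rho):
    """phi_{in} o ... o phi_{i1}(x): the first symbol is applied first."""
    for i in word:
        x = phi(i, x, rho)
    return x

rho = 1 / 3
seq = [0, 1] * 10   # the alternating sequence 0, 1, 0, 1, ...
for n in (5, 6, 19, 20):
    print(n, outer_first(seq[:n], 0.0, rho), inner_first(seq[:n], 0.0, rho))
```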
Having defined the map Φ : ΛN → K we now study some of its properties. Recall
that for i, j ∈ ΛN ,

d(i, j) = 2−N +1 where N ∈ N is the largest integer with i1 . . . iN = j1 . . . jN

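Lemma 9.6. Suppose that Φ has contraction ρ. If i, j ∈ ΛN and i1 . . . iN = j1 . . . jN ,
then $d(\Phi(i), \Phi(j)) \le \rho^N \cdot \operatorname{diam} K$. In particular Φ : ΛN → K is (Hölder) continuous.

Proof. Fix x ∈ K. For n > N ,

\[
\begin{aligned}
d(\varphi_{i_1\ldots i_n}x,\ \varphi_{j_1\ldots j_n}x) &= d\big(\varphi_{i_1\ldots i_N}(\varphi_{i_{N+1}\ldots i_n}x),\ \varphi_{i_1\ldots i_N}(\varphi_{j_{N+1}\ldots j_n}x)\big) \\
 &\le \rho^N \cdot d(\varphi_{i_{N+1}\ldots i_n}x,\ \varphi_{j_{N+1}\ldots j_n}x) \\
 &\le \rho^N \cdot \operatorname{diam} K
\end{aligned}
\]

since $\varphi_{i_{N+1}\ldots i_n}x \in K$ and similarly for j. Taking n → ∞ we have

\[ d(\Phi(i), \Phi(j)) \le \rho^N \cdot \operatorname{diam} K \]

as claimed.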

Recall that given i = i1 . . . ik ∈ Λk , the cylinder set [i] ⊆ ΛN is the set of infinite
sequences extending i, that is,

[i1 . . . ik ] = {j ∈ ΛN : j1 . . . jk = i1 . . . ik }

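Lemma 9.7. The $n$-th generation cylinders of $K$ are the $\Phi$-images of the $n$-cylinders in $\Lambda^{\mathbb{N}}$.

Proof. The $n$-th generation cylinders of $K$ are the sets $\varphi_i(K)$ for $i \in \Lambda^n$. Now, for a fixed $y \in X$,

\begin{align*}
\varphi_{i_1 \ldots i_n}(K) &= \varphi_{i_1 \ldots i_n}(\Phi(\Lambda^{\mathbb{N}})) \\
&= \bigcup_{j\in\Lambda^{\mathbb{N}}} \{\varphi_{i_1} \ldots \varphi_{i_n}(\lim_{k\to\infty} \varphi_{j_1} \ldots \varphi_{j_k}(y))\} \\
&= \bigcup_{j\in\Lambda^{\mathbb{N}}} \{\lim_{k\to\infty} \varphi_{i_1} \ldots \varphi_{i_n} \varphi_{j_1} \ldots \varphi_{j_k}(y)\} \\
&= \Phi([i])
\end{align*}

as claimed.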

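Let $\tilde\varphi_j : \Lambda^{\mathbb{N}} \to \Lambda^{\mathbb{N}}$ denote the map $(i_1 i_2 \ldots) \mapsto (j i_1 i_2 \ldots)$. It is clear that this map is continuous (in fact it has contraction 1/2).

Lemma 9.8. $\Phi(\tilde\varphi_j(i)) = \varphi_j(\Phi(i))$ for any $j \in \Lambda$ and $i \in \Lambda^{\mathbb{N}}$.

Proof. Fix $x \in K$. Since $\Phi(i) = \lim_{n\to\infty} \varphi_{i_1} \circ \ldots \circ \varphi_{i_n} x$, by continuity of $\varphi_j$,

\begin{align*}
\varphi_j(\Phi(i)) &= \varphi_j(\lim_{n\to\infty} \varphi_{i_1} \circ \ldots \circ \varphi_{i_n} x) \\
&= \lim_{n\to\infty} \varphi_j \circ \varphi_{i_1} \circ \ldots \circ \varphi_{i_n} x \\
&= \Phi(j i_1 i_2 i_3 \ldots)
\end{align*}

as claimed.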

The following observation may be of interest. Given IFSs $\Phi = \{\varphi_i\}_{i\in\Lambda}$ and $\Psi = \{\psi_i\}_{i\in\Lambda}$ on spaces $(X, d)$ and $(Y, d)$ and with attractors $K_X, K_Y$, respectively, define a morphism to be a continuous onto map $f : K_X \to K_Y$ such that $f \varphi_i = \psi_i f$. Then what we have shown is that there is a unique morphism from the IFS $\tilde\Phi = \{\tilde\varphi_i\}_{i\in\Lambda}$ on $\Lambda^{\mathbb{N}}$ to any other IFS.

9.5 Stationary measures


Recall that the support of a Borel measure $\mu$ on $X$ is

\[ \operatorname{supp} \mu = X \setminus \bigcup \{U : U \text{ is open and } \mu(U) = 0\} \]

This is a closed set supporting the measure in the sense that $\mu(X \setminus \operatorname{supp} \mu) = 0$, and is the smallest closed set with this property (in the sense of inclusion).

Theorem 9.9. Let $p = (p_i)_{i\in\Lambda}$ be a probability vector. Then there exists a unique Borel probability measure $\mu$ on $K$ satisfying

\[ \mu = \sum_{i\in\Lambda} p_i \cdot \varphi_i \mu \]

If $p$ is positive then $\operatorname{supp} \mu = K$.

Proof. Let $\tilde\mu$ denote the product measure on $\Lambda^{\mathbb{N}}$ with marginal $p$. Note that

\[ \tilde\mu = \sum_{i\in\Lambda} p_i \cdot \tilde\varphi_i \tilde\mu \]

because on the right hand side, all summands give mass zero to sequences beginning with $i_0$ except for the term $p_{i_0} \cdot \tilde\varphi_{i_0} \tilde\mu$, whose weight is $p_{i_0}$, and all terms agree on the later coordinates and are equal to the product measure.
Let $\mu = \Phi\tilde\mu$ be the projection to $K$. Applying $\Phi$ to the identity above and using the relation $\Phi\tilde\varphi_i = \varphi_i \Phi$ gives the desired identity for $\mu$.
For uniqueness, suppose that $\mu$ satisfies the desired relation on $K$. Then we can lift $\mu$ to a measure $\tilde\mu_0$ on $\Lambda^{\mathbb{N}}$ such that $\Phi\tilde\mu_0 = \mu$ (see the Appendix). Now $\tilde\mu_0$ need not satisfy the analogous relation, but we may define $\tilde\mu_1 = \sum_{i\in\Lambda} p_i \cdot \tilde\varphi_i \tilde\mu_0$, and note that $\Phi\tilde\mu_1 = \mu$. Continue to define $\tilde\mu_2 = \sum_{i\in\Lambda} p_i \cdot \tilde\varphi_i \tilde\mu_1$, etc. Each of these measures satisfies $\Phi\tilde\mu_n = \mu$, but $\tilde\mu_n \to \tilde\mu$ in the weak sense, where $\tilde\mu$ is the product measure with marginal $p$. Since $\Phi$ is continuous the relation $\Phi\tilde\mu_n = \mu$ passes to the limit, so $\mu = \Phi\tilde\mu$. This establishes uniqueness.
Finally, note that for a compactly supported measure $\nu$ and continuous function $f$ we have $\operatorname{supp} f\nu = f(\operatorname{supp} \nu)$. Thus the relation $\mu = \sum p_i \cdot \varphi_i \mu$ and positivity of $p$ imply that

\[ \operatorname{supp} \mu = \bigcup_{i\in\Lambda} \operatorname{supp} \varphi_i \mu = \bigcup_{i\in\Lambda} \varphi_i \operatorname{supp} \mu \]

and $\operatorname{supp} \mu = K$ follows by uniqueness of the attractor.


Definition 9.10. The probability measure $\mu$ satisfying $\mu = \sum_{i\in\Lambda} p_i \cdot \varphi_i \mu$ is called the $p$-stationary measure for $\Phi$.

Theorem 9.11. Let $\mu$ be the $p$-stationary measure for $\Phi$. Let $\omega_1, \omega_2, \ldots$ be an i.i.d. sequence of random variables with distribution $p$. Then for every $x \in X$, with probability 1, we have

\[ \frac{1}{N} \sum_{n=1}^{N} \delta_{\varphi_{\omega_n} \varphi_{\omega_{n-1}} \ldots \varphi_{\omega_1} x} \xrightarrow[N\to\infty]{} \mu \quad \text{in the weak-* topology} \]

The proof uses the fact that every accumulation point of the averages above is a $p$-stationary measure, which, by uniqueness, must be $\mu$. We do not prove this in this course.
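The random iteration in Theorem 9.11 is easy to simulate. The sketch below is only an illustration under stated assumptions: it uses the middle-third Cantor IFS $x \mapsto x/3$, $x \mapsto x/3 + 2/3$ with $p = (1/2, 1/2)$, and the helper name `chaos_game`, the starting point and the seed are arbitrary choices, not part of the notes.

```python
import random

def chaos_game(maps, probs, x0, n_steps, seed=0):
    """Return the orbit x_n = phi_{w_n}(x_{n-1}) for i.i.d. indices w_n ~ probs."""
    rng = random.Random(seed)
    x = x0
    orbit = []
    for _ in range(n_steps):
        f = rng.choices(maps, weights=probs, k=1)[0]  # draw one map at random
        x = f(x)
        orbit.append(x)
    return orbit

# Assumed example: middle-third Cantor IFS with equal weights.
maps = [lambda x: x / 3.0, lambda x: x / 3.0 + 2.0 / 3.0]
orbit = chaos_game(maps, [0.5, 0.5], x0=5.0, n_steps=20000)

tail = orbit[100:]  # discard a short transient; x0 was chosen outside K
mean = sum(tail) / len(tail)
mass_left = sum(1 for x in tail if x < 0.5) / len(tail)
```

By symmetry the stationary measure here has mean 1/2 and gives mass 1/2 to each first-generation cylinder, so `mean` and `mass_left` should both be close to 0.5, and after the transient no orbit point should fall in the removed middle third.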

10 Self-similar sets and measures


In this section we shall bound the dimension of the attractor of an IFS, and compute it exactly in some cases. We will obtain the upper bound quite generally, for any system of contractions with specified contraction ratios. For a more precise result, however, we will have to specialize to $\mathbb{R}^d$.
Our first result, however, holds very generally.

Definition 10.1. Let $\Phi = \{\varphi_i\}_{i\in\Lambda}$ be an IFS and let $r_i$ denote the contraction ratio of $\varphi_i$. The similarity dimension of $\Phi$, denoted $\dim_s \Phi$, is the unique solution $s$ of the equation

\[ \sum_{i\in\Lambda} r_i^s = 1 \]
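Since the left-hand side of the defining equation is strictly decreasing in $s$, the solution can be found numerically. The following is a minimal sketch using bisection; the function name and the tolerance are assumptions made for illustration, and the ratios are assumed to lie strictly between 0 and 1.

```python
import math

def similarity_dimension(ratios, tol=1e-12):
    """Solve sum(r**s for r in ratios) == 1 for s >= 0 by bisection.

    Assumes every contraction ratio r satisfies 0 < r < 1, so the
    left-hand side is strictly decreasing in s.
    """
    if not ratios or any(not (0.0 < r < 1.0) for r in ratios):
        raise ValueError("need at least one ratio, each strictly between 0 and 1")

    def f(s):
        return sum(r ** s for r in ratios) - 1.0

    lo, hi = 0.0, 1.0
    while f(hi) > 0.0:        # grow the bracket until the root is inside
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For two maps of ratio 1/3 this returns approximately $\log 2/\log 3$, the dimension of the middle-third Cantor set; for ratios that are all equal to $r$ the equation can of course be solved in closed form as $s = \log|\Lambda| / \log(1/r)$.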

When $K$ is the attractor of an IFS $\Phi$, we shall often write $\dim_s K$ instead of $\dim_s \Phi$. This is ambiguous because there can be multiple IFSs with the same attractor, but this should not cause confusion.
In order to study the dimension of a set one needs to construct efficient covers of it. Since the attractor $K$ of an IFS can be written as a union of the sets $\varphi_{i_1 \ldots i_n} K$, these sets are natural candidates. Recall that the cylinder $\varphi_i K$ for $i \in \Lambda^n$ is the image of the cylinder $[i_1, \ldots, i_n] \subseteq \Lambda^{\mathbb{N}}$ via the symbolic coding map $\Phi$. But note that, while the level-$n$ cylinder sets in $\Lambda^{\mathbb{N}}$ are disjoint, this is not generally true for cylinders of $K$.
Let $\Lambda^* = \bigcup_{n=0}^{\infty} \Lambda^n$ denote the set of finite sequences over $\Lambda$ (including the empty sequence $\emptyset$, whose associated cylinder set is $[\emptyset] = \Lambda^{\mathbb{N}}$). A section of $\Lambda^*$ is a subset $S \subseteq \Lambda^*$ such that every $i \in \Lambda^{\mathbb{N}}$ has a unique prefix in $S$. It is clear that, if $S$ is a section, then the family of cylinders $\{[s] : s \in S\}$ is a pairwise disjoint cover of $\Lambda^{\mathbb{N}}$, and conversely any such cover corresponds to a section.

Theorem 10.2. Let K be the attractor for an IFS Φ with contraction ρ on a complete
metric space (X, d). Then dimM K ≤ dims K.

Proof. Let $D = \operatorname{diam} K$. For $r > 0$ let $S_r \subseteq \Lambda^*$ denote the set of the finite sequences $i = i_1 \ldots i_k$ such that

\[ r_i = r_{i_1} \cdot \ldots \cdot r_{i_k} < \frac{r}{D} \le r_{i_1} \cdot \ldots \cdot r_{i_{k-1}} \]

Clearly $S_r$ is a section of $\Lambda^*$, so $\{[a] : a \in S_r\}$ is a cover of $\Lambda^{\mathbb{N}}$ and hence $\{\varphi_a K : a \in S_r\}$ is a cover of $K$ by cylinder sets. Furthermore, if $a \in S_r$ then $\varphi_a K$ has diameter

\[ \operatorname{diam} \varphi_a K \le r_a \operatorname{diam} K < r \]

In order to get an upper bound on $N(K, r)$, we need to estimate $|S_r|$. We do so by associating to each $a \in S_r$ a weight $w(a)$ such that $\sum_{a\in S_r} w(a) = 1$, giving the trivial bound $|S_r| \le (\min_{a\in S_r} w(a))^{-1}$. This combinatorial idea is best carried out by introducing a probability measure on $\Lambda^{\mathbb{N}}$ and defining $w(a) = \tilde\mu([a])$; then the condition $\sum_{a\in S_r} w(a) = 1$ follows automatically from the fact that $\{[a] : a \in S_r\}$ is a partition of $\Lambda^{\mathbb{N}}$.
We want to choose the measure so that $[a]$, $a \in S_r$, are all of approximately equal mass. The defining property of $S_r$ implies that $r_a = r_{a_1} \cdot \ldots \cdot r_{a_k}$, $k = |a|$, is nearly independent of $a \in S_r$. This looks like the mass of $[a]$ under a product measure but it is not normalized. To normalize it let $s$ be such that $\sum_{i\in\Lambda} r_i^s = 1$, and let $\tilde\mu$ be the product measure on $\Lambda^{\mathbb{N}}$ with marginal $(r_i^s)_{i\in\Lambda}$. Then for $a = a_1 \ldots a_k \in S_r$,

\[ \tilde\mu([a]) = r_{a_1}^s \ldots r_{a_k}^s = (r_{a_1} \ldots r_{a_k})^s \]

so by definition of $S_r$, writing $\rho = \min_{i\in\Lambda} r_i$,

\[ \rho^s \cdot (r/D)^s \le \tilde\mu([a]) < (r/D)^s \]

It follows that

\[ N(K, r) \le |S_r| \le \Big(\min_{a\in S_r} \tilde\mu([a])\Big)^{-1} \le \frac{D^s}{\rho^s} \cdot r^{-s} \]

Thus

\[ \dim_M K = \limsup_{r\to 0} \frac{\log N(K, r)}{\log(1/r)} \le s \]

as claimed.
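The counting argument in this proof can be traced on a concrete example. The sketch below is an illustration only: the ratios $(1/2, 1/4)$, the value $D = 1$, the scale $r = 0.01$ and all function names are assumptions chosen for the demonstration. It enumerates the section $S_r$, checks that the weights $r_a^s$ sum to 1, and compares $|S_r|$ with the bound $D^s \rho^{-s} r^{-s}$.

```python
def similarity_dimension(ratios, tol=1e-12):
    """Bisection for the s >= 0 solving sum(r**s) == 1 (assumes all 0 < r < 1)."""
    f = lambda s: sum(r ** s for r in ratios) - 1.0
    lo, hi = 0.0, 1.0
    while f(hi) > 0.0:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) > 0.0 else (lo, mid)
    return 0.5 * (lo + hi)

def section(ratios, threshold):
    """Words a with r_a < threshold <= r_(a without its last letter).

    Words are tuples of indices; the empty word has ratio 1.
    """
    found, stack = [], [((), 1.0)]
    while stack:
        word, r = stack.pop()
        if r < threshold:
            found.append((word, r))   # first time the ratio drops below the threshold
        else:
            stack.extend((word + (i,), r * ri) for i, ri in enumerate(ratios))
    return found

ratios = [0.5, 0.25]   # assumed contraction ratios r_i
D, r = 1.0, 0.01       # assumed diam K and covering scale
s = similarity_dimension(ratios)
S_r = section(ratios, r / D)
total_weight = sum(ra ** s for _, ra in S_r)   # mass of the partition {[a] : a in S_r}
bound = (D / (min(ratios) * r)) ** s           # D^s * rho^(-s) * r^(-s)
```

Here `total_weight` should equal 1 up to rounding, and `len(S_r)` should not exceed `bound`, mirroring the inequality $|S_r| \le D^s \rho^{-s} r^{-s}$.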

The theorem gives an upper bound $\dim_M K \le \dim_s K$. In general the inequality may be strict, but there is one important case where equality holds, namely when the IFS consists of similarities (and satisfies a suitable separation condition). Recall that a similarity is a map that satisfies $d(f(x), f(y)) = r \cdot d(x, y)$ for a constant $r > 0$. One can show that every similarity of $\mathbb{R}^d$ is an affine map of the form $f : x \mapsto rUx + a$, where $r > 0$, $U$ is an orthogonal matrix, and $a \in \mathbb{R}^d$. If we assume that $0 < r < 1$ then $f$ is a contraction and $r$ is its contraction ratio.

Definition 10.3. A self-similar set on $\mathbb{R}^d$ is the attractor of an IFS $\Phi = \{\varphi_i\}$ where the $\varphi_i$ are contracting similarities.

Examples of self-similar sets include the middle-α Cantor set which we saw above, and also the famous Sierpinski gasket and sponge and the Koch curve.
It is also necessary to impose some assumptions on the global properties of Φ. We
mention two such conditions.

Definition 10.4. Let Φ = {φi }i∈Λ be an IFS.

1. Φ satisfies the strong separation condition if φi (K) ∩ φj (K) = ∅ for distinct


i, j ∈ Λ.

2. Φ satisfies the open set condition if there is a non-empty open set U such that
φi U ⊆ U and φi U ∩ φj U = ∅ for distinct i, j ∈ Λ.

Strong separation implies the open set condition, since one can take $U$ to be any sufficiently small neighborhood of the attractor. The IFS given above for the middle-α Cantor set satisfies strong separation when $\alpha > 0$. The IFS $\Phi = \{x \mapsto \frac{1}{2}x,\ x \mapsto \frac{1}{2} + \frac{1}{2}x\}$ satisfies the open set condition with $U = (0, 1)$, but not strong separation, since the attractor is $[0, 1]$ and its images intersect at the point $\frac{1}{2}$. This example shows that the open set condition is a property of the IFS rather than the attractor, since $[0, 1]$ is also the attractor of $\Phi' = \{x \mapsto \frac{2}{3}x,\ x \mapsto \frac{1}{3} + \frac{2}{3}x\}$, which does not satisfy the open set condition.

Theorem 10.5. If $K$ is a self-similar set generated by $\Phi = \{\varphi_i\}_{i\in\Lambda}$ and if $\Phi$ satisfies the open set condition, then $\dim K = \dim_M K = \dim_s \Phi$.

Proof. Let $r_i$ be the contraction ratio of $\varphi_i$ and $s = \dim_s \Phi$. For $r > 0$ define the section $S_r \subseteq \Lambda^*$ and the measure $\tilde\mu$ on $\Lambda^{\mathbb{N}}$ as in the proof of Theorem 10.2. These were chosen so that $\tilde\mu[a] = O(r^s)$ and $\operatorname{diam} \varphi_a K = O(r)$ for $a \in S_r$. We shall prove the following claim:
Claim 10.6. For each $r > 0$ and $x \in \mathbb{R}^d$ the ball $B_r(x)$ intersects at most $O(1)$ cylinder sets $\varphi_a K$, $a \in S_r$.

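Once this is proved the theorem follows from the mass distribution principle for the measure $\mu = \Phi\tilde\mu$, since then for any $x \in \mathbb{R}^d$,

\begin{align*}
\mu(B_r(x)) &= \tilde\mu(\Phi^{-1} B_r(x)) \\
&\le \sum_{a\in S_r :\ \varphi_a K \cap B_r(x) \neq \emptyset} \tilde\mu[a] \\
&= O(1) \cdot r^s
\end{align*}

To prove the claim, let $U \neq \emptyset$ be the open set provided by the open set condition, and note that $\varphi_a U \cap \varphi_b U = \emptyset$ for distinct $a, b \in S_r$ (we leave the verification as an exercise). Fix some non-empty ball $B_0 = B_{r_0}(y_0) \subseteq U$ and a point $x_0 \in K$ and write

\begin{align*}
\delta &= d(x_0, y_0) \\
D &= \operatorname{diam} K
\end{align*}

We also write $D_a = \varphi_a B_0$, $y_a = \varphi_a y_0$ and $x_a = \varphi_a x_0$.
Fix a ball $B_r(x)$ and consider the disjoint collection of balls

\[ \mathcal{D} = \{D_a : a \in S_r \text{ and } \varphi_a K \cap B_r(x) \neq \emptyset\} \]

We must bound $|\mathcal{D}|$ from above. The ball $D_a = \varphi_a B_0 \in \mathcal{D}$ has radius $r_a r_0$, and by definition of $S_r$ this radius lies between two constant multiples of $r$ (the constants depending only on $\rho$, $r_0$ and $D$). In particular $D_a$ has volume at least $c \cdot r^d$ for a constant $c > 0$. The center $y_a$ of $D_a$ is $\varphi_a y_0$, so

\[ d(y_a, x_a) = d(\varphi_a y_0, \varphi_a x_0) = r_a \, d(y_0, x_0) = r_a \delta = O(r) \]

Finally, $\operatorname{diam} \varphi_a K = O(r)$. Since $B_r(x)$ and $\varphi_a K$ intersect and $x_a \in \varphi_a K$, we conclude that

\[ d(x, y_a) \le r + \operatorname{diam} \varphi_a K + d(x_a, y_a) = O(r) \]

so

\[ D_a = B_{r_a r_0}(y_a) \subseteq B_{Cr}(x) \]

for a constant $C$ depending only on $D$, $\delta$ and $r_0$. The ball $B_{Cr}(x)$ has volume $O(1) r^d$, each $D_a \in \mathcal{D}$ has volume at least $c \cdot r^d$, and the balls $D_a \in \mathcal{D}$ are pairwise disjoint; thus $|\mathcal{D}| = O(1)$, as desired.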

To what extent is the theorem true without the open set condition? We can point to two cases where the inequality $\dim K \le \dim_s K$ is strict. First, it may happen that $\dim_s K > d$, whereas we always have $\dim_M K \le d$, since $K \subseteq \mathbb{R}^d$. Such an example is, for instance, the system $x \mapsto 2x/3$, $x \mapsto 1 + 2x/3$. The second trivial case of a strict inequality is when there are “redundant” maps in the IFS. For example, let $\varphi : x \mapsto x/2$ and $\Phi = \{\varphi, \varphi^2\}$. Then $K = \{0\}$ is the common fixed point of $\varphi$ and $\varphi^2$, so $\dim_M K = 0$, whereas $\dim_s K > 0$. More generally,

Definition 10.7. An IFS Φ = {φi }i∈Λ has exact overlaps if there are distinct se-
quences i, j ∈ Λ∗ such that φi = φj .

If i, j are as in the definition, then by considering the contraction ratios of φi , φj


it is clear that neither of the sequences i, j is a prefix of the other. Therefore one can
choose a section S ⊆ Λ∗ which includes both i and j. It is not hard to verify that
Ψ = {φu }u∈S is an IFS with the same attractor and the same similarity dimension as
Φ. But then K is also the attractor of Ψ′ = {φu }u∈S\{i} , which has smaller similarity
dimension. Therefore dimM K ≤ dims Ψ′ < dims Φ.

Conjecture 10.8. If an IFS on R does not have exact overlaps then its attractor K
satisfies dim K = min{1, dims Φ}.

This conjecture is still not resolved, but some things are known; we will return to
them later in the course. In dimensions d ≥ 2 it is false as stated, but an analogous
conjecture is open.

Exercises

1. Show that if $\{\varphi_i\}_{i\in\Lambda}$ is an IFS in a complete metric space, then there is a closed ball $B \neq \emptyset$ such that $\varphi_i B \subseteq B$ for all $i \in \Lambda$.

2. Let $B$ be a ball as in (1). Show that

\[ K_n = \bigcup_{i\in\Lambda^n} \varphi_{i_1} \ldots \varphi_{i_n}(B) \]

is a decreasing sequence and that

\[ K = \bigcap_{n=1}^{\infty} K_n \]

is the attractor of the IFS by showing that it is non-empty (use completeness) and satisfies the identity $K = \bigcup_{i\in\Lambda} \varphi_i K$ (this gives another proof of existence of the attractor).

3. Let $K$ be the attractor of an IFS $\{\varphi_i\}_{i\in\Lambda}$ and let $S$ be a section of the tree $\Lambda^*$.

(a) Show that

\[ K = \bigcup_{i_1 \ldots i_\ell \in S} \varphi_{i_1 \ldots i_\ell}(K) \]

and that an analogous formula holds for self-similar measures.

(b) Show that $K$ is the attractor of $\{\varphi_i\}_{i\in S}$.
This shows that the attractor can be generated by many IFSs (although often, they are closely related to each other).

4. Consider the IFS on $\mathbb{R}$ given by the maps

\begin{align*}
\varphi_1(x) &= \frac{1}{10} x \\
\varphi_2(x) &= \frac{1}{10} x + \frac{9}{10} \\
\varphi_3(x) &= \frac{1}{10} x + \frac{9}{100}
\end{align*}

(a) What is the similarity dimension of this system?

(b) Show that $\varphi_1 \varphi_2 = \varphi_3 \varphi_1$.

(c) Show that the attractor can be generated by a set of 8 maps with contraction ratio 1/100. Use this to get a new bound on its dimension that is smaller than the similarity dimension you found in (a).

11 Entropy
11.1 The entropy function
Let (X, B, µ) be a probability space. A partition of X is a countable collection A of
pairwise disjoint measurable sets whose union has full measure (this really should be
called a partition modulo µ, but we omit this by convention).
Given a partition A, how can we quantify how spread out a measure µ is among the
atoms (or, conversely, how concentrated it is on a small number of atoms)? We could
count the number of sets A ∈ A of positive mass, but this is very crude, since it ignores
how mass is distributed. For example, in a partition with two sets the sets might both
have mass 1/2, or one could have mass 0.9999 and the other mass 0.0001. The first of
these is spread evenly among the elements of the partition; the second, much less. The
purpose of entropy is to quantify this distinction.

Definition 11.1. The entropy of $\mu$ with respect to a partition $\mathcal{A}$ is the non-negative number

\[ H_\mu(\mathcal{A}) = -\sum_{A\in\mathcal{A}} \mu(A) \log \mu(A) \]

By convention the logarithm is taken in base 2 and $0 \log 0 = 0$. For infinite partitions $H_\mu(\mathcal{A})$ may be infinite.

Observe that $H_\mu(\mathcal{A})$ depends only on the probability vector $(\mu(A))_{A\in\mathcal{A}}$. For a probability vector $p = (p_i)$ it is convenient to introduce the notation

\[ H(p) = H(p_1, p_2, \ldots) = -\sum_i p_i \log p_i \]

Examples

1. For $p = (t, 1 - t)$ the entropy

\[ h(t) = H(p) = -t \log t - (1 - t) \log(1 - t) \]

depends on the single variable $t$. It is an exercise in calculus to verify that $h(\cdot)$ is strictly concave on $[0, 1]$, increasing on $[0, 1/2]$ and decreasing on $[1/2, 1]$, with a unique maximum value $h(1/2) = 1$ and minimal values $h(0) = h(1) = 0$. Thus, the entropy is minimal when all the mass is on one atom of $\mathcal{A}$, and maximal when it is uniformly distributed.

2. Let $\mu$ be Lebesgue measure on $[0, 1]$. Then

\begin{align*}
H(\mu, \mathcal{D}_n) &= -\sum_{D\in\mathcal{D}_n} \mu(D) \log \mu(D) \\
&= -\sum_{D\in\mathcal{D}_n,\ D\subseteq[0,1]} 2^{-n} \log 2^{-n} \\
&= -2^n \cdot 2^{-n} \cdot (-n) \log 2 \\
&= n
\end{align*}

3. Let $\nu$ be normalized Lebesgue measure on a closed interval $I$ of length $2^{-n}$. Then, up to $\nu$-measure 0, $I$ intersects at most two dyadic cells in $\mathcal{D}_n$, say $D'$ and $D''$. Write $p = \nu(D')$, so $\nu(D'') = 1 - p$. Then

\begin{align*}
H(\nu, \mathcal{D}_n) &= -\sum_{D\in\mathcal{D}_n} \nu(D) \log \nu(D) \\
&= -\nu(D') \log \nu(D') - \nu(D'') \log \nu(D'') \\
&= h(p)
\end{align*}

As we have seen, this value is between 0 and 1.
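The examples above are easy to reproduce numerically. The following is a small sketch, with the function name `entropy` chosen here for illustration; it uses base-2 logarithms and the convention $0 \log 0 = 0$, as in Definition 11.1.

```python
import math

def entropy(p):
    """Shannon entropy (in bits) of a probability vector, with 0 log 0 = 0."""
    return -sum(q * math.log2(q) for q in p if q > 0)

# Example 1: the binary entropy function h(t) = H(t, 1 - t).
h_half = entropy([0.5, 0.5])            # evenly spread mass
h_skewed = entropy([0.9999, 0.0001])    # mass concentrated on one atom

# Example 2: Lebesgue measure on [0, 1] and the 2^n dyadic cells of length 2^-n.
n = 5
h_lebesgue = entropy([2.0 ** -n] * 2 ** n)
```

Here `h_half` is the maximal value 1, `h_skewed` is close to 0, and `h_lebesgue` equals $n$, in agreement with the computation in Example 2.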

We begin to develop the formal properties of entropy.

Proposition 11.2 (Properties of entropy).

(E1) $0 \le H(\mu, \mathcal{A}) \le \log |\mathcal{A}|$, and

(a) $H(\mu, \mathcal{A}) = 0$ if and only if $\mu(A) = 1$ for some $A \in \mathcal{A}$.

(b) $H(\mu, \mathcal{A}) = \log |\mathcal{A}|$ if and only if $\mu$ is uniform on $\mathcal{A}$, that is, $\mu(A) = 1/|\mathcal{A}|$ for $A \in \mathcal{A}$.

(E2) $H(\cdot, \mathcal{A})$ is concave: for probability measures $\mu, \nu$ on $X$ and $0 < \alpha < 1$,

\[ H(\alpha\mu + (1 - \alpha)\nu, \mathcal{A}) \ge \alpha H(\mu, \mathcal{A}) + (1 - \alpha) H(\nu, \mathcal{A}) \]

with equality if and only if $\mu(A) = \nu(A)$ for all $A \in \mathcal{A}$.

Proof. We first prove (E2). Since $f(t) = -t \log t$ is strictly concave, by Jensen’s inequality,

\begin{align*}
H(\alpha\mu + (1 - \alpha)\nu, \mathcal{A}) &= \sum_{A\in\mathcal{A}} f(\alpha\mu(A) + (1 - \alpha)\nu(A)) \\
&\ge \sum_{A\in\mathcal{A}} (\alpha f(\mu(A)) + (1 - \alpha) f(\nu(A))) \\
&= \alpha H(\mu, \mathcal{A}) + (1 - \alpha) H(\nu, \mathcal{A})
\end{align*}

with equality if and only if $\mu(A) = \nu(A)$ for all $A \in \mathcal{A}$.
The left inequality of (E1) is trivial. For the right one consider the function $F(p) = -\sum_{A\in\mathcal{A}} p_A \log p_A$ on the simplex $\Delta$ of probability vectors $p = (p_A)_{A\in\mathcal{A}}$. It suffices to show that the unique maximum is attained at $p^* = (1/|\mathcal{A}|, \ldots, 1/|\mathcal{A}|)$, since $F(p^*) = \log |\mathcal{A}|$. The simplex $\Delta$ is compact and convex and by (E2), $F(\cdot)$ is strictly concave, so there is a unique maximizing point $p^*$. Since $F(\cdot)$ is invariant under permutation of its variables, the maximizing point $p^*$ must be similarly invariant, and hence all its coordinates are equal. Since it is a probability vector they are equal to $1/|\mathcal{A}|$.

11.2 Conditional entropy

For a set $B$ of positive measure, let $\mu_B$ denote the conditional probability measure $\mu_B(C) = \mu(B \cap C)/\mu(B)$. Note that for a partition $\mathcal{B}$ we have the identity

\[ \mu = \sum_{B\in\mathcal{B}} \mu(B) \cdot \mu_B \tag{7} \]

The conditional entropy of $\mu$ and $\mathcal{A}$ given another partition $\mathcal{B} = \{B_i\}$ is defined by

\[ H(\mu, \mathcal{A}|\mathcal{B}) = \sum_{B\in\mathcal{B}} \mu(B) H(\mu_B, \mathcal{A}) \]

This is just the average over $B \in \mathcal{B}$ of the entropy of $\mathcal{A}$ with respect to the conditional measure on $B$.

Definition 11.3. Let A, B be partitions of the same space.

1. The join of A, B is the partition

A ∨ B = {A ∩ B : A ∈ A , B ∈ B}

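2. $\mathcal{A}$ refines $\mathcal{B}$ (up to measure 0) if every $A \in \mathcal{A}$ is contained in some $B \in \mathcal{B}$ (up to measure 0).

3. $\mathcal{A}, \mathcal{B}$ are independent if $\mu(A \cap B) = \mu(A)\mu(B)$ for $A \in \mathcal{A}$, $B \in \mathcal{B}$.

Proposition 11.4 (Properties of conditional entropy).

(E2’) $H(\cdot, \mathcal{A}|\mathcal{B})$ is concave.

(E3) $H(\mu, \mathcal{A} \vee \mathcal{B}) = H(\mu, \mathcal{A}) + H(\mu, \mathcal{B}|\mathcal{A})$

(E4) $H(\mu, \mathcal{A} \vee \mathcal{B}) \ge H(\mu, \mathcal{A})$ with equality if and only if $\mathcal{A}$ refines $\mathcal{B}$ up to $\mu$-measure 0.

(E5) $H(\mu, \mathcal{A} \vee \mathcal{B}) \le H(\mu, \mathcal{A}) + H(\mu, \mathcal{B})$ with equality if and only if $\mathcal{A}, \mathcal{B}$ are independent. Equivalently, $H(\mu, \mathcal{B}|\mathcal{A}) \le H(\mu, \mathcal{B})$ with equality if and only if $\mathcal{A}, \mathcal{B}$ are independent.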

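Proof. For (E3), by algebraic manipulation,

\begin{align*}
H(\mu, \mathcal{A} \vee \mathcal{B}) &= -\sum_{A\in\mathcal{A}, B\in\mathcal{B}} \mu(A \cap B) \log \mu(A \cap B) \\
&= \sum_{A\in\mathcal{A}} \mu(A) \sum_{B\in\mathcal{B}} \frac{\mu(A \cap B)}{\mu(A)} \left( -\log \frac{\mu(A \cap B)}{\mu(A)} - \log \mu(A) \right) \\
&= -\sum_{A\in\mathcal{A}} \mu(A) \log \mu(A) \sum_{B\in\mathcal{B}} \mu_A(B) - \sum_{A\in\mathcal{A}} \mu(A) \sum_{B\in\mathcal{B}} \mu_A(B) \log \mu_A(B) \\
&= H(\mu, \mathcal{A}) + H(\mu, \mathcal{B}|\mathcal{A})
\end{align*}

The inequality in (E4) follows from (E3) since $H(\mu, \mathcal{B}|\mathcal{A}) \ge 0$; there is equality if and only if $H(\mu_A, \mathcal{B}) = 0$ for all $A \in \mathcal{A}$ with $\mu(A) > 0$. By (E1), this occurs precisely when, on each $A \in \mathcal{A}$ with $\mu(A) \neq 0$, the measure $\mu_A$ is supported on a single atom of $\mathcal{B}$, which means that $\mathcal{A}$ refines $\mathcal{B}$ up to measure 0.
For (E2’), let $\mu = \alpha\eta + (1 - \alpha)\theta$. For $B \in \mathcal{B}$ let $\beta_B = \frac{\alpha\eta(B)}{\mu(B)}$. Then $(1 - \beta_B) = \frac{(1-\alpha)\theta(B)}{\mu(B)}$ and

\[ \mu_B = \beta_B \eta_B + (1 - \beta_B)\theta_B \]

hence

\begin{align*}
H(\mu, \mathcal{A}|\mathcal{B}) &= \sum_{B\in\mathcal{B}} \mu(B) H(\mu_B, \mathcal{A}) && \text{by definition} \\
&\ge \sum_{B\in\mathcal{B}} \mu(B) \left( \beta_B H(\eta_B, \mathcal{A}) + (1 - \beta_B) H(\theta_B, \mathcal{A}) \right) && \text{by concavity (E2)} \\
&= \sum_{B\in\mathcal{B}} \left( \alpha\eta(B) \cdot H(\eta_B, \mathcal{A}) + (1 - \alpha)\theta(B) \cdot H(\theta_B, \mathcal{A}) \right) \\
&= \alpha H(\eta, \mathcal{A}|\mathcal{B}) + (1 - \alpha) H(\theta, \mathcal{A}|\mathcal{B})
\end{align*}

Finally, (E5) follows from (E2) and (E3). First, by (7) and concavity,

\[ H(\mu, \mathcal{A}|\mathcal{B}) = \sum_{B\in\mathcal{B}} \mu(B) H(\mu_B, \mathcal{A}) \le H\Big(\sum_{B\in\mathcal{B}} \mu(B)\mu_B, \mathcal{A}\Big) = H(\mu, \mathcal{A}) \]

and the inequality for the join then follows from (E3), with the roles of $\mathcal{A}$ and $\mathcal{B}$ exchanged. It is clear that if $\mathcal{A}, \mathcal{B}$ are independent there is equality. To see this is the only way it occurs, one again uses strict concavity of $H(p)$, which shows that the independent case is the unique maximizer.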
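Properties (E3), (E4) and (E5) can be sanity-checked on a small example. The sketch below is only an illustration: the joint distribution is made up for the purpose, with `joint[a][b]` standing for $\mu(A_a \cap B_b)$, and the helper name `H` is an assumption.

```python
import math

def H(p):
    """Shannon entropy (in bits) of a probability vector, with 0 log 0 = 0."""
    return -sum(q * math.log2(q) for q in p if q > 0)

# Hypothetical joint masses mu(A_a ∩ B_b) for partitions A (rows) and B (columns).
joint = [[0.10, 0.20, 0.05],
         [0.25, 0.10, 0.30]]

pA = [sum(row) for row in joint]            # masses mu(A_a)
pB = [sum(col) for col in zip(*joint)]      # masses mu(B_b)

H_join = H([x for row in joint for x in row])            # H(mu, A v B)
H_B_given_A = sum(pA[a] * H([x / pA[a] for x in joint[a]])
                  for a in range(len(joint)))            # H(mu, B | A)

chain_rule_gap = H_join - (H(pA) + H_B_given_A)          # (E3): should vanish
subadditivity_slack = H(pA) + H(pB) - H_join             # (E5): should be >= 0
```

Since the two partitions in this example are not independent, `subadditivity_slack` is strictly positive, while `chain_rule_gap` is zero up to rounding.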

There are a few generalizations of these properties which are useful:

Proposition 11.5 (More properties of conditional entropy).

1. $H(\mathcal{A} \vee \mathcal{B}|\mathcal{C}) = H(\mathcal{B}|\mathcal{C}) + H(\mathcal{A}|\mathcal{B} \vee \mathcal{C})$.

2. If $\mathcal{C}$ refines $\mathcal{B}$ then $H(\mathcal{A}|\mathcal{C}) \le H(\mathcal{A}|\mathcal{B})$.

Proof. For (1) expand both sides using (E3). For (2) use (1), noting that $\mathcal{C} = \mathcal{C} \vee \mathcal{B}$ since $\mathcal{C}$ refines $\mathcal{B}$.

The definition of entropy may seem somewhat arbitrary. However, up to normalization, it is essentially the only possible definition if we wish (E1)–(E6) to hold. A proof of this can be found in Shannon’s original paper on information theory and entropy, [?].

11.3 Commensurable partitions and geometric operations


Definition 11.6. Given an (implicit) measure µ and partitions A, B of the underlying
space, we say that

1. A <k B if every atom of A can be covered by at most k atoms of B up to µ-measure 0.

2. A, B are k-commensurable if A <k B and B <k A. We then write A =k B.

Observe that A refines B precisely when A <1 B.
The following lemma will be used extensively later in calculations to replace parti-
tions with more convenient ones.

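Lemma 11.7. If $\mathcal{A} <_k \mathcal{B}$ then

\[ H(\mu, \mathcal{B}|\mathcal{A}) = O(\log k) \]

Furthermore, if $\mathcal{A} =_k \mathcal{B}$ and $\mathcal{C} =_k \mathcal{D}$, then

\begin{align*}
H(\mu, \mathcal{A}) &= H(\mu, \mathcal{B}) + O(\log k) \\
H(\mu, \mathcal{C}|\mathcal{A}) &= H(\mu, \mathcal{D}|\mathcal{B}) + O(\log k)
\end{align*}

Proof. If $\mathcal{A} <_k \mathcal{B}$ then for every $A \in \mathcal{A}$ the partition $\mathcal{B}$ has at most $k$ atoms mod $\mu_A$, so $H(\mu_A, \mathcal{B}) \le \log k$ by (E1). Then the first bound follows from the definition of conditional entropy.
Assuming $\mathcal{A} =_k \mathcal{B}$, by the chain rule for entropy and the first part of the lemma,

\begin{align*}
H(\mu, \mathcal{A} \vee \mathcal{B}) &= H(\mu, \mathcal{A}) + H(\mu, \mathcal{B}|\mathcal{A}) \\
&= H(\mu, \mathcal{A}) + O(\log k)
\end{align*}

Reversing the roles of $\mathcal{A}, \mathcal{B}$ we get

\[ H(\mu, \mathcal{A} \vee \mathcal{B}) = H(\mu, \mathcal{B}) + O(\log k) \]

Combining these two equations gives the second identity.
Assuming also $\mathcal{C} =_k \mathcal{D}$, and noting that then $\mathcal{A} \vee \mathcal{C}$ and $\mathcal{B} \vee \mathcal{D}$ are $k^2$-commensurable, we get

\begin{align*}
H(\mu, \mathcal{C}|\mathcal{A}) &= H(\mu, \mathcal{C} \vee \mathcal{A}) - H(\mu, \mathcal{A}) \\
&= H(\mu, \mathcal{D} \vee \mathcal{B}) - H(\mu, \mathcal{B}) + O(\log k) \\
&= H(\mu, \mathcal{D}|\mathcal{B}) + O(\log k)
\end{align*}

as claimed.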

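We apply the notion of commensurability to explain the effect of geometric operations on the entropy of a measure in $\mathbb{R}^d$.
Let

\begin{align*}
T_a x &= x + a \\
S_t(x) &= tx
\end{align*}

denote the operations of translation and scaling.
First note that for any measure $\mu$ on $\mathbb{R}^d$, any partition $\mathcal{A}$ of $\mathbb{R}^d$ and any map $f : \mathbb{R}^d \to \mathbb{R}^d$, writing $f^{-1}\mathcal{A} = \{f^{-1}A\}_{A\in\mathcal{A}}$ for the pull-back of a partition, we have

\begin{align*}
H(f\mu, \mathcal{A}) &= -\sum_{A\in\mathcal{A}} \mu \circ f^{-1}(A) \log \mu \circ f^{-1}(A) \\
&= -\sum_{A\in\mathcal{A}} \mu(f^{-1}A) \log \mu(f^{-1}A) \\
&= H(\mu, f^{-1}\mathcal{A})
\end{align*}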

Lemma 11.8. Let $\mu \in \mathcal{P}(\mathbb{R}^d)$.

1. For every isometry $f$ (and in particular, if $f$ is a translation or an orthogonal map),

\[ H(f\mu, \mathcal{D}_n) = H(\mu, \mathcal{D}_n) + O(1) \]

2. For $t > 0$,

\begin{align*}
H(S_t\mu, \mathcal{D}_n) &= H(\mu, \mathcal{D}_n) + O(|\log t|) \\
&= H(\mu, \mathcal{D}_{[tn]}) + O(1)
\end{align*}

Note that for simplicity of notation we work with partitions $\mathcal{D}_n$ instead of $\mathcal{D}_{2^n}$ but of course the former includes the latter as a special case.

Proof. For (1), let $f$ be an isometry, and note that $\mathcal{D}_n$ and $f^{-1}\mathcal{D}_n$ are $O_d(1)$-commensurable, giving the first statement.
For (2) we note that each $D \in \mathcal{D}_n$ intersects at most $O_d(\max\{t, t^{-1}\}^d)$ atoms of $S_t^{-1}\mathcal{D}_n$ and vice versa, so they are commensurable with this constant; hence

\[ H(S_t\mu, \mathcal{D}_n) = H(\mu, \mathcal{D}_n) + O(|\log t|) \]

Similarly, we may note that $S_t^{-1}\mathcal{D}_n$ and $\mathcal{D}_{[tn]}$ are $O(1)$-commensurable, with analogous result.

Lemma 11.9. Let $\mu$ be a measure on $\mathbb{R}^d$ supported on a ball of radius $\le 2^{-m}$ and $n \ge m$. Then

\[ H(\mu, 2^{-n}|2^{-m}) = H(\mu, 2^{-n}) + O(1) \]

Proof. This follows since, modulo $\mu$, the partition $\mathcal{D}_{2^m}$ and the trivial partition are $O(1)$-commensurable.

11.4 Entropy and dimension

Definition 11.10. If $\mu$ is a measure on $\mathbb{R}^d$ let

\[ H(\mu, 2^{-n}) = H(\mu, \mathcal{D}_{2^n}) \]

We call this the scale-n entropy of µ. We also write

H(µ, 2−n |2−m ) = H(µ, D2n |D2m )

Note that if $\mu$ is a measure on $[0, 1)^d$, then

\[ 0 \le H(\mu, 2^{-n}) \le \log \#\{D \in \mathcal{D}_{2^n} \mid D \cap [0, 1)^d \neq \emptyset\} \le \log 2^{dn} = dn \]

so

\[ 0 \le \frac{1}{n} H(\mu, 2^{-n}) \le d \]

The same bound holds if $\mu$ is supported on any dyadic cell of side length 1. More generally, if $\mu$ is compactly supported then it gives mass to a finite number $L$ of dyadic cells in $\mathcal{D}_{2^0}$, so

\begin{align*}
\frac{1}{n} H(\mu, 2^{-n}) &= \frac{1}{n} H(\mu, 2^{-n}|2^0) + \frac{1}{n} H(\mu, 2^0) \\
&\le \sum_{D\in\mathcal{D}_{2^0}} \mu(D) \frac{1}{n} H(\mu_D, 2^{-n}) + \frac{1}{n} \log L
\end{align*}

so asymptotically $\frac{1}{n} H(\mu, 2^{-n})$ is in the range $[0, d]$. In this and many other ways, $\frac{1}{n} H(\mu, 2^{-n})$ behaves asymptotically like dimension. In fact, we give it a name:

Definition 11.11. The entropy dimension of a measure $\mu$ in $\mathbb{R}^d$ is

\[ \dim_e \mu = \lim_{n\to\infty} \frac{1}{n} H(\mu, 2^{-n}) \]

assuming the limit exists. We define the upper and lower entropy dimensions using lim sup and lim inf, respectively; these are always defined and the entropy dimension is defined when they are equal, in which case all three are the same.

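Theorem 11.12. Let $\mu \in \mathcal{P}([0, 1]^d)$ be a measure. Then

\[ \dim \mu \le \liminf_{n\to\infty} \frac{1}{n} H(\mu, 2^{-n}) \]

Furthermore, if for some $\alpha \ge 0$ we have

\[ \lim_{r\to 0} \frac{\log \mu(B_r(x))}{\log r} = \alpha \qquad \mu\text{-a.e. } x \tag{8} \]

then

\[ \lim_{n\to\infty} \frac{1}{n} H(\mu, 2^{-n}) = \alpha \]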

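Proof. Assume $\dim \mu = \alpha$, so that in particular

\[ \dim(\mu, x) \ge \alpha \]

$\mu$-a.e.
As usual let $\mathcal{D}_{2^n}$ denote the dyadic partition and recall that $\mathcal{D}_{2^n}(x)$ is the unique element of $\mathcal{D}_{2^n}$ containing $x$. Then by Proposition 6.6

\[ \liminf_{n\to\infty} -\frac{1}{n} \log \mu(\mathcal{D}_{2^n}(x)) \ge \alpha \qquad \mu\text{-a.e.} \]

Integrating and applying Fatou’s lemma,

\[ \liminf_{n\to\infty} -\frac{1}{n} \int \log \mu(\mathcal{D}_{2^n}(x)) \, d\mu(x) \ge \alpha \]

But

\begin{align*}
-\frac{1}{n} \int \log \mu(\mathcal{D}_{2^n}(x)) \, d\mu(x) &= -\frac{1}{n} \sum_{D\in\mathcal{D}_{2^n}} \mu(D) \log \mu(D) \\
&= \frac{1}{n} H(\mu, 2^{-n})
\end{align*}

This proves the first part of the theorem.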


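For the second statement, we only need to prove ≤ since ≥ follows from the first part.
The analog of Proposition 6.6 holds for the limit of $\log \mu(B_r(x))/\log r$ and not just the lim inf; the proof is similar to the proof of that proposition.
Let $\varepsilon > 0$. Since

\[ \lim_{n\to\infty} -\frac{\log \mu(\mathcal{D}_{2^n}(x))}{n} = \alpha \qquad \mu\text{-a.e.} \]

for all large enough $n$ we can find a set $\mathcal{D}_n^\varepsilon \subseteq \mathcal{D}_{2^n}$ such that $\mu(\cup\mathcal{D}_n^\varepsilon) > 1 - \varepsilon$ and

\[ -\frac{\log \mu(D)}{n} < \alpha + \varepsilon \qquad \text{for } D \in \mathcal{D}_n^\varepsilon \]

Write $E_n = \cup\mathcal{D}_n^\varepsilon$. Then

\[ \mu_{E_n}(D) = \begin{cases} \frac{1}{\mu(E_n)} \mu(D) & D \in \mathcal{D}_n^\varepsilon \\ 0 & \text{otherwise} \end{cases} \]

so

\begin{align*}
\frac{1}{n} H(\mu_{E_n}, 2^{-n}) &= -\frac{1}{n} \sum_{D\in\mathcal{D}_n^\varepsilon} \mu_{E_n}(D) \log \mu_{E_n}(D) \\
&= -\sum_{D\in\mathcal{D}_n^\varepsilon} \mu_{E_n}(D) \frac{\log(\mu(D)/\mu(E_n))}{n} \\
&< \sum_{D\in\mathcal{D}_n^\varepsilon} \mu_{E_n}(D) \left( (\alpha + \varepsilon) + \frac{\log \mu(E_n)}{n} \right) \\
&\le \alpha + \varepsilon
\end{align*}

where the last step uses $\log \mu(E_n) \le 0$. On the other hand

\[ H(\mu_{E_n^c}, \mathcal{D}_{2^n}) \le dn \]

since this is true for any measure on $[0, 1)^d$.
Finally, writing $\mathcal{E}_n = \{E_n, E_n^c\}$, we have

\begin{align*}
H(\mu, \mathcal{D}_{2^n}) &\le H(\mu, \mathcal{D}_{2^n} \vee \mathcal{E}_n) \\
&= H(\mu, \mathcal{E}_n) + H(\mu, \mathcal{D}_{2^n}|\mathcal{E}_n) \\
&\le 1 + \mu(E_n) \cdot H(\mu_{E_n}, \mathcal{D}_{2^n}) + \mu(E_n^c) \cdot H(\mu_{E_n^c}, \mathcal{D}_{2^n}) \\
&\le 1 + H(\mu_{E_n}, 2^{-n}) + \varepsilon \cdot nd
\end{align*}

Dividing by $n$, sending $n \to \infty$ and using our previous bounds we get

\[ \limsup_{n\to\infty} \frac{1}{n} H(\mu, 2^{-n}) \le \alpha + O(\varepsilon) \]

Since $\varepsilon$ was arbitrary this completes the proof.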

11.5 Entropy of self-similar measures


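Let $\mu = \sum_{i\in\Lambda} p_i \cdot f_i\mu$ be a self-similar measure in $\mathbb{R}^d$. Write $\|f\|$ for the contraction ratio of a similarity map $f$ and let

\[ \rho = \min_{i\in\Lambda} \|f_i\| \]

so $0 < \rho < 1$.
Fix $m \in \mathbb{N}$. For each infinite sequence $\omega \in \Lambda^{\mathbb{N}}$ there is a minimal $k = k(\omega)$ such that

\[ \|f_{\omega_1 \ldots \omega_k}\| < 2^{-m} \]

Note that this implies

\[ \|f_{\omega_1 \ldots \omega_k}\| \ge \rho 2^{-m} \]

If $\eta \in \Lambda^{\mathbb{N}}$ and $\eta_1 \ldots \eta_k = \omega_1 \ldots \omega_k$ for $k = k(\omega)$, then $k(\eta) = k(\omega)$. Setting

\[ \Lambda_m = \{\omega_1 \ldots \omega_{k(\omega)} \mid \omega \in \Lambda^{\mathbb{N}}\} \]

we find that this is a section of the tree $\Lambda^*$.
Recall that $f_{i_1 \ldots i_k} = f_{i_1} \circ \ldots \circ f_{i_k}$. Similarly let

\[ p_{i_1 \ldots i_k} = p_{i_1} \cdot \ldots \cdot p_{i_k} \]

We have the following general result which may be applied to $\Lambda_m$:

Lemma 11.13. If $\Sigma \subseteq \Lambda^*$ is a section of the tree $\Lambda^*$, then

\[ \sum_{i\in\Sigma} p_i = 1 \]

and

\[ \mu = \sum_{i\in\Sigma} p_i \cdot f_i\mu \]

The proof is by induction on the height of the section (the maximal length of a word in $\Sigma$). We leave it as an exercise.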

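Lemma 11.14 (Fekete’s lemma). Let $(a_n)_{n=1}^{\infty}$ be a sequence satisfying

\[ a_{m+n} \le a_m + a_n \]

Then $a_n/n$ converges and $\lim \frac{1}{n} a_n = \inf_n \frac{1}{n} a_n$ (interpreted in the obvious way if $\frac{1}{n} a_n$ is not bounded below).

Proof. We prove this in the case that the sequence $\frac{1}{n} a_n$ is bounded below (this is the case in our application to entropy). When it is not bounded the proof is similar. Then we can define the real number

\[ \alpha = \inf_{n\in\mathbb{N}} \frac{1}{n} a_n \]

Let $\varepsilon > 0$ and let $n_0$ be such that $a_{n_0}/n_0 < \alpha + \varepsilon$. For any $n \ge n_0$ write $n = kn_0 + r$ with $0 \le r < n_0$. Then, with the convention $a_0 = 0$,

\begin{align*}
a_n &\le a_{n-n_0} + a_{n_0} \\
&\le a_{n-2n_0} + 2a_{n_0} \\
&\ \ \vdots \\
&\le a_r + ka_{n_0} \\
&= a_r + kn_0 \cdot \frac{1}{n_0} a_{n_0}
\end{align*}

Writing $c = \max\{a_0, \ldots, a_{n_0-1}\} + n_0|\alpha + \varepsilon|$, noting that $n - n_0 < kn_0 \le n$, and using $a_{n_0}/n_0 < \alpha + \varepsilon$ we conclude that

\[ a_n < c + n(\alpha + \varepsilon) \]

Dividing by $n$ we have

\[ \frac{1}{n} a_n \le \alpha + \varepsilon + \frac{c}{n} \]

so $\limsup \frac{1}{n} a_n \le \alpha + \varepsilon$ and since $\varepsilon > 0$ is arbitrary, $\limsup \frac{1}{n} a_n \le \alpha$. Of course $\liminf \frac{1}{n} a_n \ge \alpha$ since $\alpha$ is the infimum of the sequence, and we conclude that $\lim \frac{1}{n} a_n = \alpha$.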

We return to self-similar measures.

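Theorem 11.15. Let $\mu$ be a self-similar measure on $\mathbb{R}^d$. Then the entropy dimension of $\mu$ exists.

Proof. Let $\mu = \sum_{i\in\Lambda} p_i \cdot f_i\mu$ and write

\[ \alpha_n = H(\mu, 2^{-n}) \]

Given $m, n$ note that

\begin{align*}
\alpha_{m+n} &= H(\mu, 2^{-(m+n)}) \\
&= H(\mu, 2^{-m}) + H(\mu, 2^{-(m+n)}|2^{-m}) \\
&= \alpha_m + H(\mu, 2^{-(m+n)}|2^{-m})
\end{align*}

Now,

\begin{align*}
H(\mu, 2^{-(m+n)}|2^{-m}) &\ge \sum_{i\in\Lambda_m} p_i \cdot H(f_i\mu, 2^{-(m+n)}|2^{-m}) \\
&\ge \sum_{i\in\Lambda_m} p_i \cdot \left( H(f_i\mu, 2^{-(m+n)}) + O(1) \right) \\
&= \left( \sum_{i\in\Lambda_m} p_i \cdot H(f_i\mu, 2^{-(m+n)}) \right) + O(1)
\end{align*}

where in the first inequality we used concavity, and in the second we used Lemma ??. Next, observe that for $i \in \Lambda_m$ we have $\rho 2^{-m} \le \|f_i\| \le 2^{-m}$, so by Lemma ??,

\begin{align*}
H(f_i\mu, 2^{-(m+n)}) &= H(\mu, 2^{-n}) + O(1) \\
&= \alpha_n + O(1)
\end{align*}

Plugging this back into our previous estimate,

\begin{align*}
\alpha_{m+n} &\ge \alpha_m + \sum_{i\in\Lambda_m} p_i \cdot (\alpha_n + O(1)) \\
&= \alpha_m + \alpha_n + O(1)
\end{align*}

Let $C > 0$ denote a constant bounding the term $O(1)$ above in absolute value, so that $\alpha_{m+n} \ge \alpha_m + \alpha_n - C$. Then $\beta_n = \alpha_n - C$ satisfies

\begin{align*}
\beta_{m+n} &= \alpha_{m+n} - C \\
&\ge (\alpha_m + \alpha_n - C) - C \\
&= (\alpha_m - C) + (\alpha_n - C) \\
&= \beta_m + \beta_n
\end{align*}

Thus $\beta_n$ is a super-additive sequence with $\frac{1}{n}\beta_n$ bounded, so, applying Fekete’s lemma to the subadditive sequence $-\beta_n$ we find that $\lim \frac{1}{n}\beta_n$ exists. Since $\frac{1}{n}\beta_n = \frac{1}{n}\alpha_n + O(\frac{1}{n})$ the same holds for $\alpha_n$.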

We finish this section with an important estimate for the entropy dimension of a
self-similar measure. We first need a definition.
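Definition 11.16. Let $\mu = \sum_{i\in\Lambda} p_i \cdot f_i\mu$ be a self-similar measure with $f_i(x) = r_i U_i x + a_i$. Then the Lyapunov exponent of $\mu$ is

\[ \lambda(\mu) = \sum_{i\in\Lambda} p_i \log r_i \]

Note that $\lambda(\mu)$ is negative. The Lyapunov exponent describes the average contraction of the system: indeed, letting $\tilde\mu = p^{\mathbb{N}}$ denote the product measure on symbolic space (so $\mu = \pi\tilde\mu$),

\begin{align*}
\frac{1}{n} \log \|f_{\omega_1 \ldots \omega_n}\| &= \frac{1}{n} \log r_{\omega_1} r_{\omega_2} \ldots r_{\omega_n} \\
&= \frac{1}{n} (\log r_{\omega_1} + \log r_{\omega_2} + \ldots + \log r_{\omega_n}) \\
&\to \lambda(\mu) \qquad \tilde\mu\text{-a.e. } \omega
\end{align*}

by the strong law of large numbers. This means that

\[ \|f_{\omega_1 \ldots \omega_n}\| = 2^{n(\lambda(\mu)+o(1))} \]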

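Proposition 11.17. Let $\mu = \sum_{i\in\Lambda} p_i \cdot f_i\mu$ be a self-similar measure. Then

\[ \dim \mu \le \dim_e \mu \le \frac{H(p)}{-\lambda(\mu)} \]

Proof. We only need to prove the right-hand inequality since the left one holds in general. For $n \in \mathbb{N}$ let $k(n) = [n/(-\lambda(\mu))]$, so that $\|f_{\omega_1 \ldots \omega_{k(n)}}\| = 2^{-n(1+o(1))}$ for $\tilde\mu$-a.e. $\omega$. Let $\mathcal{E}_k$ denote the partition of $\Lambda^{\mathbb{N}}$ into $k$-cylinders. We have

\begin{align*}
H(\mu, 2^{-n}) &= H(\tilde\mu, \pi^{-1}\mathcal{D}_{2^n}) \\
&\le H(\tilde\mu, \pi^{-1}\mathcal{D}_{2^n} \vee \mathcal{E}_{k(n)}) \\
&= H(\tilde\mu, \mathcal{E}_{k(n)}) + H(\tilde\mu, \pi^{-1}\mathcal{D}_{2^n}|\mathcal{E}_{k(n)})
\end{align*}

Now, since $\tilde\mu$ is a product measure, a simple calculation shows that

\[ H(\tilde\mu, \mathcal{E}_{k(n)}) = k(n) H(p) \]

On the other hand,

\begin{align*}
H(\tilde\mu, \pi^{-1}\mathcal{D}_{2^n}|\mathcal{E}_{k(n)}) &= \sum_{i\in\Lambda^{k(n)}} p_i \cdot H(\tilde\mu_{[i]}, \pi^{-1}\mathcal{D}_{2^n}) \\
&= \sum_{i\in\Lambda^{k(n)}} p_i \cdot H(\pi\tilde\mu_{[i]}, \mathcal{D}_{2^n}) \\
&= \int H(\pi\tilde\mu_{[\omega_1 \ldots \omega_{k(n)}]}, 2^{-n}) \, d\tilde\mu(\omega) \\
&= \int H(f_{\omega_1 \ldots \omega_{k(n)}}\mu, 2^{-n}) \, d\tilde\mu(\omega)
\end{align*}

Since

\[ \|f_{\omega_1 \ldots \omega_{k(n)}}\| = 2^{k(n)(\lambda(\mu)+o(1))} = 2^{-n(1+o(1))} \]

we have

\[ H(f_{\omega_1 \ldots \omega_{k(n)}}\mu, 2^{-n}) = o(n) \]

Hence

\begin{align*}
\frac{1}{n} H(\tilde\mu, \pi^{-1}\mathcal{D}_{2^n}|\mathcal{E}_{k(n)}) &= \int \frac{1}{n} H(f_{\omega_1 \ldots \omega_{k(n)}}\mu, 2^{-n}) \, d\tilde\mu(\omega) \\
&= \int o(1) \, d\tilde\mu
\end{align*}

Also it is easy to see the integrand is bounded, so by bounded convergence the last integral is $o(1)$. Putting everything together we have

\begin{align*}
\frac{1}{n} H(\mu, 2^{-n}) &\le \frac{1}{n} k(n) H(p) + o(1) \\
&= \frac{H(p)}{-\lambda(\mu)} + o(1)
\end{align*}

as required.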

Recall that we defined the similarity dimension of $\{f_i\}$, $f_i(x) = r_i x + a_i$, to be the solution $s$ of $\sum_{i\in\Lambda} r_i^s = 1$. Let $\mu$ be the self-similar measure given by the probability vector $(p_i)$, $p_i = r_i^s$. Then the bound above is

\begin{align*}
\frac{H(p)}{-\lambda(\mu)} &= \frac{-\sum p_i \log p_i}{-\sum p_i \log r_i} \\
&= \frac{-\sum p_i \log r_i^s}{-\sum p_i \log r_i} \\
&= \frac{-\sum s p_i \log r_i}{-\sum p_i \log r_i} \\
&= s
\end{align*}

So the proposition above says that $\dim_e \mu \le s$. This is the same upper bound we got for the dimension of the attractor, and one can show (e.g. using Lagrange multipliers) that this probability vector maximizes $-H(p)/\lambda(p)$ over all probability vectors $p$. Thus, if for this measure we show that $\dim_e \mu = -H(p)/\lambda(p)$ then we will have proved that

\[ s \ge \dim K \ge \dim \mu \ge \dim_e \mu = s \]

and so all are equalities.
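The identity $H(p)/(-\lambda) = s$ for $p_i = r_i^s$ can be confirmed numerically. The sketch below is an illustration under assumptions: the ratios $(1/2, 1/4)$ are an arbitrary choice, and the function names are hypothetical. It also compares the natural weights with the uniform vector, which gives a smaller value of the ratio.

```python
import math

def similarity_dimension(ratios, tol=1e-12):
    """Bisection for the s >= 0 solving sum(r**s) == 1 (assumes all 0 < r < 1)."""
    f = lambda s: sum(r ** s for r in ratios) - 1.0
    lo, hi = 0.0, 1.0
    while f(hi) > 0.0:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) > 0.0 else (lo, mid)
    return 0.5 * (lo + hi)

def entropy_over_lyapunov(p, ratios):
    """H(p) / (-lambda), where lambda = sum_i p_i log r_i (logs in base 2)."""
    H = -sum(pi * math.log2(pi) for pi in p if pi > 0)
    lam = sum(pi * math.log2(ri) for pi, ri in zip(p, ratios))
    return H / -lam

ratios = [0.5, 0.25]                      # assumed contraction ratios r_i
s = similarity_dimension(ratios)
natural = [r ** s for r in ratios]        # the weights p_i = r_i^s
at_natural = entropy_over_lyapunov(natural, ratios)
at_uniform = entropy_over_lyapunov([0.5, 0.5], ratios)
```

With these choices `at_natural` agrees with `s` up to rounding, and `at_uniform` is strictly smaller, consistent with the claim that the weights $p_i = r_i^s$ maximize the ratio.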

95
12 Components and multiscale formula for entropy

12.1 Component measures

Given a probability measure $\mu$ and set $A$ with $\mu(A) > 0$, recall that the conditional measure on $A$ is $\mu_A = \frac{1}{\mu(A)} \mu|_A$.

Definition 12.1. The component measure of $\mu$ of level $n$ is the measure

\[ \mu_{x,n} = \mu_{\mathcal{D}_{2^n}(x)} \]

Note that $\mu_{x,n}$ is supported on $\mathcal{D}_{2^n}(x)$. One can identify $\mu_{x,n}$ with the measure on a sub-tree of the weighted dyadic tree representing $\mu$. The node corresponds to the first $n$ binary digits of $x$.

Definition 12.2. For a probability measure µ and a finite set U ⊆ N of “levels”, the
component distribution is the probability distribution on components µx,n given by
choosing n ∈ U uniformly, and independently choosing x according to µ.

One should think of this as choosing a random node in the tree representation of
µ. Note that it is not the uniform distribution on nodes; the uniform dustribution is
skewed very strongly towards the leaves. The component distribution is uniform on (a
set of) levels, and in each level it chooses nodes according to µ.
Whenever µx,n (or similar symbols) appear inside the symbols E(. . .) or P(. . .), they
represent random variables chosen according to the component distribution. The set U
is indicated as necessary; if it is not indicated then the index n in µx,n is fixed. For
example, if A ⊆ P([0, 1]) is a set of measures (e.g. the set of purely atomic measures)
then

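\[
\mathbb{P}(\mu_{x,n} \in A) = \mu(x : \mu_{x,n} \in A) = \int 1_A(\mu_{x,n})\, d\mu(x)
\]

and

\[
\mathbb{P}_{0 \le n \le N}(\mu_{x,n} \in A)
= \frac{1}{N+1} \sum_{n=0}^{N} \mu(x : \mu_{x,n} \in A)
= \frac{1}{N+1} \sum_{n=0}^{N} \int 1_A(\mu_{x,n})\, d\mu(x)
\]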

Similarly, for a function $f : \mathcal{P}([0, 1]) \to \mathbb{R}$,
\[
\mathbb{E}(f(\mu_{x,n})) = \int f(\mu_{x,n})\, d\mu(x)
\]
and
\[
\mathbb{E}_{n \in U}(f(\mu_{x,n})) = \frac{1}{|U|} \sum_{n \in U} \int f(\mu_{x,n})\, d\mu(x)
\]
etc. Lastly, if two random variables $\mu_{x,n}$, $\nu_{y,n}$ appear in the same expression, it is assumed that $x, y$ are chosen independently.
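
The conventions above can be made concrete for a finitely supported measure. The following is a minimal sketch, not part of the original notes: it assumes a hypothetical representation of a measure as a dictionary `{k: weight}` with atoms at the points k / 2**level, and the helper names `component` and `expect_over_components` are illustrative only.

```python
from fractions import Fraction

def component(mu, level, n, x):
    """Level-n component mu_{x,n}: mu conditioned on the dyadic cell of level n containing x.

    mu is a dict {k: weight} with atoms at k / 2**level (assumed representation); n <= level.
    """
    cell = x >> (level - n)  # index of the level-n dyadic cell containing x / 2**level
    inside = {k: w for k, w in mu.items() if k >> (level - n) == cell}
    total = sum(inside.values())
    return {k: w / total for k, w in inside.items()}

def expect_over_components(mu, level, levels, f):
    """E_{n in U} f(mu_{x,n}): n uniform in `levels`, x chosen independently according to mu."""
    acc = Fraction(0)
    for n in levels:
        for x, w in mu.items():
            acc += w * f(component(mu, level, n, x))
    return acc / len(levels)

# Atoms at 0, 1/8 and 6/8 with weights 1/2, 1/4, 1/4.
mu = {0: Fraction(1, 2), 1: Fraction(1, 4), 6: Fraction(1, 4)}
is_atomic = lambda nu: Fraction(int(len(nu) == 1))  # indicator of "purely one atom"
prob_atomic = expect_over_components(mu, 3, [0, 1, 2, 3], is_atomic)
```

Here `prob_atomic` plays the role of $\mathbb{P}_{0\le n\le 3}(\mu_{x,n} \in A)$ with $A$ the set of point masses; the levels are weighted uniformly and the nodes within a level are weighted by $\mu$, as in Definition 12.2.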

12.2 Computing entropy from component entropies


Lemma 12.3. For any probability measure µ and any n ∈ N,

\[
\mu = \mathbb{E}(\mu_{x,n})
\]
Indeed this is just another way of writing $\mu = \sum_{I \in \mathcal{D}_{2^n}} \mu(I) \cdot \mu_I$, which in turn follows from the trivial decomposition $\mu = \sum_{I \in \mathcal{D}_{2^n}} \mu|_I$. Second,

Lemma 12.4. For any probability measure µ, any n ∈ N, and any partition A of R,

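\[
H(\mu, \mathcal{A} \mid \mathcal{D}_{2^n}) = \mathbb{E}(H(\mu_{x,n}, \mathcal{A})) \tag{9}
\]

Indeed, both are just another way of writing $\sum_{I \in \mathcal{D}_{2^n}} \mu(I) \cdot H(\mu_I, \mathcal{A})$.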

Proposition 12.5. Let $\mu \in \mathcal{P}([0, 1))$. For every $m, n \in \mathbb{N}$,

\[
\frac{1}{n} H(\mu, 2^{-n}) = \mathbb{E}_{0 \le i \le n}\left( \frac{1}{m} H(\mu_{x,i}, 2^{-(i+m)}) \right) + O\left(\frac{m}{n}\right)
\]

Remark 12.6. The entropies $\frac{1}{m} H(\mu_{x,i}, 2^{-(i+m)})$ appearing in the statement are the entropy of the scale-$2^{-i}$ component at a scale that is a constant amount finer, i.e. scale $2^{-m} \cdot 2^{-i}$. In this sense, it views the component at finite resolution (relative to the scale of the component).

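Proof. Recall that for each $j$, by Equation (9),

\[
\mathbb{E}(H(\mu_{x,j}, 2^{-(j+m)})) = H(\mu, \mathcal{D}_{2^{j+m}} \mid \mathcal{D}_{2^j})
\]

Thus we must show

\[
\frac{1}{n} H(\mu, 2^{-n}) = \frac{1}{n} \sum_{0 \le i \le n} \frac{1}{m} H(\mu, \mathcal{D}_{2^{i+m}} \mid \mathcal{D}_{2^i}) + O\left(\frac{m}{n}\right)
\]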

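Let $k = [n/m]$. For every $0 \le u < m$,

\[
H(\mu, \mathcal{D}_{2^{u+mk}})
= H\Big(\mu, \bigvee_{i=0}^{k} \mathcal{D}_{2^{u+im}}\Big)
= H(\mu, \mathcal{D}_{2^u}) + \sum_{i=0}^{k-1} H(\mu, \mathcal{D}_{2^{u+(i+1)m}} \mid \mathcal{D}_{2^{u+im}}) \tag{10}
\]

Since $\mu \in \mathcal{P}([0, 1))$ and $0 \le u < m$,

\[
H(\mu, \mathcal{D}_{2^u}) = O(m)
\]

Also, since $|n - (u + mk)| < m$,

\[
H(\mu, \mathcal{D}_{2^{u+mk}}) = H(\mu, \mathcal{D}_{2^n}) + O(m)
\]

Averaging over $0 \le u < m$ and combining these estimates with the identity (10), we get

\[
\begin{aligned}
\frac{1}{n} H(\mu, 2^{-n})
&= \frac{1}{m} \sum_{u=0}^{m-1} \frac{1}{n} H(\mu, \mathcal{D}_{2^{u+mk}}) + O\left(\frac{m}{n}\right) \\
&= \frac{1}{mn} \sum_{u=0}^{m-1} H(\mu, \mathcal{D}_{2^u}) + \frac{1}{mn} \sum_{u=0}^{m-1} \sum_{i=0}^{k-1} H(\mu, \mathcal{D}_{2^{u+(i+1)m}} \mid \mathcal{D}_{2^{u+im}}) + O\left(\frac{m}{n}\right) \\
&= \frac{1}{n} \sum_{0 \le i \le n} \frac{1}{m} H(\mu, \mathcal{D}_{2^{i+m}} \mid \mathcal{D}_{2^i}) + O\left(\frac{m}{n}\right)
\end{aligned}
\]

In the last step we used that the indices $u + im$ with $0 \le u < m$ and $0 \le i < k$ run over $0, \ldots, mk - 1$ exactly once, that the first sum is $O(m/n)$ after normalization, and that the at most $m + 1$ remaining terms with $mk \le i \le n$ are each $O(m)$ and so contribute $O(m/n)$ in total.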

as claimed.
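
The multiscale formula can be checked numerically on a finitely supported measure. The sketch below is an illustration under stated assumptions, not part of the original notes: the measure is a dictionary `{k: weight}` with atoms at k / 2**level, entropies are in bits, and the average is taken over $0 \le i < n$ (rather than $0 \le i \le n$), which changes the value by $O(1/n)$. The conditional entropies are computed from $H(\mathcal{A} \mid \mathcal{B}) = H(\mathcal{A} \vee \mathcal{B}) - H(\mathcal{B})$.

```python
from math import log2

def H(mu, level, j):
    """Shannon entropy (bits) of mu with respect to the level-j dyadic partition (j <= level)."""
    cells = {}
    for k, w in mu.items():
        c = k >> (level - j)  # level-j dyadic cell containing k / 2**level
        cells[c] = cells.get(c, 0.0) + w
    return -sum(w * log2(w) for w in cells.values() if w > 0)

def multiscale_average(mu, level, n, m):
    """(1/n) * sum_{0 <= i < n} (1/m) * H(mu, D_{i+m} | D_i); requires n + m - 1 <= level."""
    total = sum(H(mu, level, i + m) - H(mu, level, i) for i in range(n))
    return total / (m * n)

# An arbitrary test measure on the grid of level 12 (hypothetical example data).
level, n, m = 12, 9, 3
raw = {k: 1 + (k % 3) for k in range(0, 2 ** level, 5)}
total_mass = sum(raw.values())
mu = {k: w / total_mass for k, w in raw.items()}

lhs = H(mu, level, n) / n
rhs = multiscale_average(mu, level, n, m)
```

For this restricted average the telescoping argument in the proof gives $|{\tt lhs} - {\tt rhs}| \le (m-1)/n$, so the two quantities agree up to the $O(m/n)$ error in Proposition 12.5.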

13 Additive combinatorics

We shift focus temporarily to describe results from the field of additive combinatorics.

13.1 Sumsets and inverse theorems

The sum (or sumset, or Minkowski sum) of non-empty sets A, B ⊆ Rd is

A + B = {a + b : a ∈ A , b ∈ B}

Equivalently, for π(x, y) = x + y , we have

A + B = π(A × B)

Additive combinatorics, or at least an important chapter of it, is devoted to the study
of sumsets and the relation between the structure of A, B and A + B.
The so-called inverse problem asks what structure we can deduce for sets A, B
such that A + B is “small” relative to the sizes of the original sets. The general flavor
of results of this kind is that, if the sumset is small, there must be an algebraic reason
for it. It will become evident later that this question comes up naturally in the study
of self-similar sets.

13.2 Trivial bounds

Assume that A, B are finite and non-empty. Then

max{|A|, |B|} ≤ |A + B| ≤ |A| · |B| (11)

The first inequality is an equality if and only if at least one of the sets is a singleton. The
right-hand inequality is an equality precisely when each c ∈ A + B has a unique representation
as a + b for a ∈ A, b ∈ B (equivalently, π|A×B is injective).
The equality |A + B| = |A||B| can occur. For example, for any b, n consider

A = {0, b, 2b, 3b, . . . , nb}


B = {0, 1, . . . , b − 1}
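
The trivial bounds (11) and the extremal example above are easy to check by brute force. The following is a small sketch; the function name `sumset` and the particular values of b and n are illustrative choices, not taken from the notes.

```python
def sumset(A, B):
    """Minkowski sum A + B of two finite sets of integers."""
    return {a + b for a in A for b in B}

# The extremal example: A = {0, b, 2b, ..., nb}, B = {0, 1, ..., b - 1}.
b, n = 5, 7
A = {i * b for i in range(n + 1)}
B = set(range(b))
S = sumset(A, B)

# The other extreme: a singleton only translates the other set.
T = sumset(A, {42})
```

In the first case every element of A + B has a unique representation (division with remainder by b), so |S| = |A||B|; in the second case |T| = |A| = max{|A|, |B|}.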

As another example, for “generic” pairs of sets one has |A + B| ∼ |A||B|. For
instance, when A, B ⊆ {1, . . . , n} are chosen randomly by including each 1 ≤ i ≤ n in
A with probability p and similarly for B, with all choices independent, there is high
probability that |A + B| ≥ c|A||B|. The question becomes, what can be said between
these two extremes.
This discussion motivates us to consider A + B to be “small” if |A + B| ≪ |A||B|.

13.3 Small doubling and Freiman’s theorem

The classically studied case is when A = B ⊆ Zd and we assume that

|A + A| ≤ C|A| (12)

Here C is a constant, and we think of A as large relative to C. Such sets are said
to have small doubling.
There are a number of simple examples in which small doubling occurs.

1. Consider $A = \{1, \ldots, n\}^d \subseteq \mathbb{Z}^d$. Then

\[
|A + A| = |\{2, \ldots, 2n\}^d| \le 2^d |A|
\]

2. Example (1) can be pushed down from dimension $d$ to any lower dimension as follows. For $i = 1, \ldots, k$, take intervals of integers $I_i = \{1, 2, \ldots, n_i\}$, and let $T : \mathbb{Z}^k \to \mathbb{Z}^d$ be an affine map given by integer parameters, that is $Tx = Ax + b$ for an integer matrix $A$ and integer vector $b$. Suppose that $T$ is injective on $I = I_1 \times \ldots \times I_k$. Then $A = T(I) \subseteq \mathbb{Z}^d$ satisfies

\[
|A + A| = |T(I) + T(I)| = |T(I + I)| \le |I + I| \le 2^k |I| = 2^k |A|
\]

(injectivity of $T$ on $I$ was used in the last equality). A set $A$ as above is called a (proper) generalized arithmetic progression (GAP) of rank $k$.

3. Finally, for any set with small doubling one can pass to large subsets. Begin with a set $A$ satisfying $|A + A| \le C|A|$ (e.g. a GAP) and choose any $A' \subseteq A$ with cardinality $|A'| \ge D^{-1} |A|$ for some $D > 1$. Then

\[
|A' + A'| \le |A + A| \le C|A| \le CD|A'|
\]

One of the central results of additive combinatorics is Freiman’s theorem, which says
that, remarkably, these three procedures give all sets with small doubling.
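
The three constructions can be tried out on small instances. This is an illustrative sketch only: the helper `doubling` and the specific sets (an interval, the image of a 10 × 10 box under the injective map (x, y) ↦ x + 1000y, and a subset of the interval) are hypothetical examples chosen here, not taken from the notes.

```python
from itertools import product

def doubling(A):
    """Doubling constant |A + A| / |A| of a finite set of integers."""
    A = set(A)
    return len({a + b for a in A for b in A}) / len(A)

# Example 1 with d = 1: an interval of integers.
interval = list(range(1, 51))

# Example 2: a rank-2 GAP in Z, the image of {1..10}^2 under (x, y) -> x + 1000*y,
# which is injective on the box and on the sums of box elements.
box = [x + 1000 * y for x, y in product(range(1, 11), repeat=2)]

# Example 3: a large subset of the interval (here D = 50/34).
subset = [x for x in interval if x % 3 != 0]

C_interval = doubling(interval)
C_box = doubling(box)
C_subset = doubling(subset)
```

The bounds from the text read: `C_interval` ≤ 2, `C_box` ≤ 2² = 4, and `C_subset` ≤ C·D with C = `C_interval` and D = |interval| / |subset|.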

Theorem 13.1 (Freiman). If $A \subseteq \mathbb{Z}^d$ and $|A + A| \le C|A|$, then $A \subseteq P$ for a GAP $P$ of rank $C'$ and satisfying $|P| \le C''|A|$, with $C' = O(C(1 + \log C))$ and $C'' = C^{O(1)}$.

For more information see [?, Theorem 5.32 and Theorem 5.33].
Combined with some standard arguments (e.g. the Plünnecke-Ruzsa inequality), the
symmetric version leads to an asymmetric version: assuming A, B ⊆ Zd and C −1 ≤
|A|/|B| ≤ C, if |A + B| ≤ C|A| then A, B are contained in a GAP P of rank ≤ C ′
and size |P | ≤ C ′ |A|, with similar bounds on the constants.

13.4 Power growth, the “fractal” regime


We shall be interested in a weaker growth condition, namely we consider finite sets $A \subseteq \mathbb{Z}$ (or $A \subseteq \mathbb{R}$) such that
\[
|A + A| \le |A|^{1+\delta} \tag{13}
\]

This is the discrete analog of the condition

\[
\dim_M(X + X) \le (1 + \delta) \dim_M X \tag{14}
\]

for $X \subseteq \mathbb{R}$. Indeed, given $X \subseteq \mathbb{R}$ and $n \in \mathbb{N}$ let $X_n$ denote the set obtained by replacing each $x \in X$ with the closest point $k/2^n$, $k \in \mathbb{Z}$. Then $|X_n| \sim 2^{n(\dim_M X + o(1))}$ and $|X_n + X_n| \sim 2^{n(\dim_M(X+X) + o(1))}$ for large $n$, so (14) is equivalent to $|X_n + X_n| \lesssim |X_n|^{1+\delta+o(1)}$.
Here is a representative example of a set satisfying (13). Write $I_n = \{0, \ldots, n - 1\}$ and let

\[
A_n = \sum_{i=1}^{n} \frac{1}{2^{i^2}} I_{2^i}
= \Big\{ \sum_{i=1}^{n} a_i 2^{-i^2} : 0 \le a_i < 2^i \Big\}
\]

Each term in the sum $\sum_{i=1}^{n} a_i 2^{-i^2}$ determines uniquely a distinct block of binary digits (the $i$-th term determines the digits at positions $i^2 - i + 1$ to $i^2$). Thus every element in $A_n$ has a unique representation as such a sum, so $A_n$ is a GAP, being the injective image of $I_2 \times I_4 \times \ldots \times I_{2^n}$ by the map $(x_1, \ldots, x_n) \mapsto \sum_i \frac{1}{2^{i^2}} x_i$. The rank is $n$, so

\[
|A_n + A_n| \le 2^n |A_n|
\]

Since
\[
|A_n| = \prod_{i=1}^{n} |I_{2^i}| = 2^{\sum_{i=1}^{n} i} = 2^{n(n+1)/2}
\]

we conclude
\[
|A_n + A_n| = |A_n|^{1+o(1)} \quad \text{as } n \to \infty
\]
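
For small $n$ the example can be verified directly. The sketch below is illustrative and rests on two assumptions: the digits are taken in $I_{2^i} = \{0, \ldots, 2^i - 1\}$, following the definition of $I_n$, and every element is multiplied by $2^{n^2}$ so that the computation uses exact integers. The function name `scaled_A` is hypothetical.

```python
def scaled_A(n):
    """The set 2**(n*n) * A_n, with A_n = { sum_i a_i * 2**(-i*i) : 0 <= a_i < 2**i }."""
    elems = {0}
    for i in range(1, n + 1):
        step = 2 ** (n * n - i * i)  # the factor 2**(-i*i) after scaling by 2**(n*n)
        elems = {x + a * step for x in elems for a in range(2 ** i)}
    return elems

n = 4
A = scaled_A(n)
AA = {x + y for x in A for y in A}
size_A, size_AA = len(A), len(AA)
```

Scaling by a constant does not change cardinalities of sumsets, so `size_A` should equal $2^{n(n+1)/2}$ and `size_AA` should be at most $2^n$ times `size_A`.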

Do all examples of (14) look essentially like this one? One could try to answer this using Freiman’s theorem, which applies with $C = |A|^{\delta}$. But all that one gets is that $A$ is a $|A|^{O(\delta)}$-fraction of a GAP of rank $|A|^{O(\delta)}$, and this gives rather coarse information about $A$ (note that, trivially, every set is a GAP of rank $|A|$).
Instead, it is possible to apply a multi-scale analysis, showing that at some scales
the set looks quite “dense” and at others quite “sparse”. See Theorem 13.6 below.

13.5 Convolution

The inverse theorem that we soon present is stated in the language of measures, instead
of sets. The measure-theoretic analog of the sumset operation is convolution.

Definition 13.2. For $\mu, \nu \in \mathcal{P}(\mathbb{R})$, the convolution $\mu * \nu \in \mathcal{P}(\mathbb{R})$ is the image of $\mu \times \nu$ under the map $(x, y) \mapsto x + y$.

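Thus, $\mu * \nu$ is characterized by the property that
\[
\int f\, d(\mu * \nu) = \int\!\!\int f(x + y)\, d\mu(x)\, d\nu(y) \quad \text{for } f \in C_0(\mathbb{R})
\]

Also, $\mu * \nu$ is the distribution of $Z = X + Y$ where $X, Y$ are independent random variables with distributions $\mu, \nu$ respectively.
For point masses $\mu = \delta_x$ and $\nu = \delta_y$ we have $\mu * \nu = \delta_{x+y}$, from which for atomic measures $\mu = \sum_{x \in A} \mu(x) \delta_x$ and $\nu = \sum_{y \in B} \nu(y) \delta_y$ we derive
\[
\mu * \nu = \sum_{x \in A,\, y \in B} \mu(x) \nu(y)\, \delta_{x+y}
= \sum_{z \in A + B} \Big( \sum_{x \in A,\, y \in B,\, x + y = z} \mu(x) \nu(y) \Big) \delta_z \tag{15}
\]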
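
Formula (15) translates directly into code for atomic measures. The following is a minimal sketch, assuming a measure is stored as a dictionary mapping atoms to weights; the function name `convolve` is an illustrative choice.

```python
from collections import defaultdict
from fractions import Fraction

def convolve(mu, nu):
    """Convolution of two atomic measures given as {atom: weight} dictionaries, following (15)."""
    out = defaultdict(Fraction)
    for x, p in mu.items():
        for y, q in nu.items():
            out[x + y] += p * q  # mass of the pair (x, y) is pushed to the point x + y
    return dict(out)

half = Fraction(1, 2)
coin = {0: half, 1: half}            # fair coin on {0, 1}
two_coins = convolve(coin, coin)     # distribution of the sum of two independent coins
point = convolve(coin, {5: Fraction(1)})  # convolving with a point mass translates
```

The support of the result is the sumset of the supports, and convolving with a point mass $\delta_5$ simply translates the measure by 5.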

Let $\mu_y$ denote the translate of $\mu$ by $y$,

\[
\mu_y(A) = \mu(A - y)
\]

Lemma 13.3 (Properties of convolution). Let µ, ν, τ ∈ P(Rd ). Then

1. (µ, ν) 7→ µ ∗ ν is multilinear.

2. µ ∗ ν = ν ∗ µ .

3. µ ∗ (ν ∗ τ ) = (µ ∗ ν) ∗ τ .
4. $\mu * \nu = \int \mu_y\, d\nu(y)$, and in particular, $\mu * \delta_b = \mu_b$.

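Proof. (1)-(3) may be verified easily from the definition. For (4),

\[
\begin{aligned}
\mu * \nu(A) &= \mu \times \nu\left(\{(x, y) \mid x + y \in A\}\right) \\
&= \int\!\!\int 1_A(x + y)\, d\mu(x)\, d\nu(y) \\
&= \int\!\!\int 1_{A - y}(x)\, d\mu(x)\, d\nu(y) \\
&= \int \mu(A - y)\, d\nu(y) \\
&= \Big( \int \mu_y\, d\nu(y) \Big)(A)
\end{aligned}
\]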

The case ν = δy follows.

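Proposition 13.4. Let $\nu, \mu \in \mathcal{P}([0, 1])$ and let $m, n \in \mathbb{N}$. Then

\[
\frac{1}{n} H(\mu * \nu, 2^{-n}) \ge \mathbb{E}_{0 \le i \le n}\left( \frac{1}{m} H(\mu_{x,i} * \nu_{y,i}, 2^{-(i+m)}) \right) + O\left(\frac{1}{m} + \frac{m}{n}\right)
\]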

Note that we only have an inequality, not an equality as we had in the corresponding
expression for the entropy of one measure.

Proof. Arguing as in Proposition 12.5,

\[
\frac{1}{n} H(\mu * \nu, 2^{-n}) = \mathbb{E}_{0 \le i \le n}\left( \frac{1}{m} H(\mu * \nu, 2^{-(i+m)} \mid 2^{-i}) \right) + O\left(\frac{m}{n}\right)
\]

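Decomposing each of $\mu$ and $\nu$ into its level-$k$ components, we have for each $k$,

\[
\mu \times \nu = \mathbb{E}_{i=k}(\mu_{x,i} \times \nu_{y,i})
\]

Hence, writing $\pi(x, y) = x + y$,

\[
\begin{aligned}
\mu * \nu &= \pi(\mu \times \nu) \\
&= \mathbb{E}_{i=k}(\pi(\mu_{x,i} \times \nu_{y,i})) \\
&= \mathbb{E}_{i=k}(\mu_{x,i} * \nu_{y,i})
\end{aligned}
\]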

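By concavity of entropy,

\[
\frac{1}{m} H(\mu * \nu, 2^{-(k+m)} \mid 2^{-k}) \ge \mathbb{E}_{i=k}\left( \frac{1}{m} H(\mu_{x,i} * \nu_{y,i}, 2^{-(i+m)} \mid 2^{-i}) \right)
\]

The measure $\mu_{x,i} * \nu_{y,i} = \pi(\mu_{x,i} \times \nu_{y,i})$ is supported on a set of diameter $O(2^{-i})$, so we can remove conditioning at scale $2^{-i}$ with an $O(1)$ error term, which after normalization is $O(1/m)$:

\[
\frac{1}{m} H(\mu * \nu, 2^{-(k+m)} \mid 2^{-k}) \ge \mathbb{E}_{i=k}\left( \frac{1}{m} H(\mu_{x,i} * \nu_{y,i}, 2^{-(i+m)}) \right) + O\left(\frac{1}{m}\right)
\]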

Inserting this into the first equation gives the claim.

13.6 Entropy growth under convolution


The analog of (11) for entropy is

Lemma 13.5. Let µ, ν ∈ P(R). Then

\[
\max\{H(\mu, 2^{-n}), H(\nu, 2^{-n})\} - O(1) \le H(\mu * \nu, 2^{-n}) \le H(\mu, 2^{-n}) + H(\nu, 2^{-n}) + O(1)
\]

When normalized by 1/n, the error terms become o(1) as n → ∞.


Proof. Using $\mu * \nu = \int \mu_y\, d\nu(y)$, by concavity of entropy,
\[
H(\mu * \nu, 2^{-n}) \ge \int H(\mu_y, 2^{-n})\, d\nu(y)
\]

For any y ∈ R we have

H(µy , 2−n ) = H(Ty µ, 2−n ) = H(µ, 2−n ) + O(1)

Inserting this into the previous equation gives

H(µ ∗ ν, 2−n ) ≥ H(µ, 2−n ) − O(1)

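Reversing the roles of $\mu, \nu$ gives the left inequality in the lemma.

For the right-hand inequality, note that

\[
\begin{aligned}
H(\mu * \nu, 2^{-n}) &= H(\pi(\mu \times \nu), \mathcal{D}_{2^n}) \\
&= H(\mu \times \nu, \pi^{-1}(\mathcal{D}_{2^n})) \\
&\le H(\mu \times \nu, \pi^{-1}(\mathcal{D}_{2^n}) \vee \mathcal{D}^2_{2^n}) \\
&\le H(\mu \times \nu, \mathcal{D}^2_{2^n}) + H(\mu \times \nu, \pi^{-1}(\mathcal{D}_{2^n}) \mid \mathcal{D}^2_{2^n})
\end{aligned}
\]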

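Now, every atom of $\mathcal{D}^2_{2^n}$ intersects at most $O(1)$ elements of $\pi^{-1}(\mathcal{D}_{2^n})$, so

\[
H(\mu \times \nu, \pi^{-1}(\mathcal{D}_{2^n}) \mid \mathcal{D}^2_{2^n}) = O(1)
\]

On the other hand, writing $\pi_1, \pi_2$ for the coordinate projections, we have $\mathcal{D}^2_{2^n} = \pi_1^{-1} \mathcal{D}_{2^n} \vee \pi_2^{-1} \mathcal{D}_{2^n}$ and these partitions are independent for the product measure $\mu \times \nu$, hence

\[
\begin{aligned}
H(\mu \times \nu, \mathcal{D}^2_{2^n}) &= H(\mu \times \nu, \pi_1^{-1} \mathcal{D}_{2^n} \vee \pi_2^{-1} \mathcal{D}_{2^n}) \\
&= H(\mu \times \nu, \pi_1^{-1} \mathcal{D}_{2^n}) + H(\mu \times \nu, \pi_2^{-1} \mathcal{D}_{2^n}) \\
&= H(\pi_1(\mu \times \nu), \mathcal{D}_{2^n}) + H(\pi_2(\mu \times \nu), \mathcal{D}_{2^n}) \\
&= H(\mu, 2^{-n}) + H(\nu, 2^{-n})
\end{aligned}
\]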

Inserting the last two bounds into the equation preceding them gives the second part
of the lemma.
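
A quick numerical sanity check of Lemma 13.5 is possible in the special case of measures supported on the grid $\{k/2^n\}$. This sketch makes that simplifying assumption explicitly: for such measures the scale-$2^{-n}$ entropy is just the Shannon entropy of the atoms, sums of grid points stay on the grid, and the $O(1)$ terms can be taken to be zero. Atoms are stored by their integer index k; all names and the random test data are illustrative.

```python
import random
from math import log2

def entropy(mu):
    """Shannon entropy in bits of an atomic measure {atom: weight}."""
    return -sum(p * log2(p) for p in mu.values() if p > 0)

def convolve(mu, nu):
    """Convolution of atomic measures on the integers."""
    out = {}
    for x, p in mu.items():
        for y, q in nu.items():
            out[x + y] = out.get(x + y, 0.0) + p * q
    return out

def random_measure(rng, size, support):
    """A random probability measure with `support` atoms in {0, ..., size - 1}."""
    points = rng.sample(range(size), support)
    weights = [rng.random() for _ in points]
    total = sum(weights)
    return {k: w / total for k, w in zip(points, weights)}

rng = random.Random(0)
checks = []
for _ in range(20):
    mu = random_measure(rng, 256, 12)
    nu = random_measure(rng, 256, 7)
    h_mu, h_nu, h_conv = entropy(mu), entropy(nu), entropy(convolve(mu, nu))
    checks.append((max(h_mu, h_nu), h_conv, h_mu + h_nu))
```

Each triple in `checks` is (lower bound, entropy of the convolution, upper bound); for general measures on $\mathbb{R}$ the same inequalities hold only up to the additive $O(1)$ in the lemma.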

For $\mu \in \mathcal{P}([0, 1])$, recall that the maximal value of $\frac{1}{n} H(\mu, 2^{-n})$ is $\approx 1$, and that it is achieved (or nearly achieved) when $\mu$ is uniformly distributed (or nearly so) on the atoms of $\mathcal{D}_{2^n}$ that meet $[0, 1]$.
Similarly, $\frac{1}{n} H(\mu, 2^{-n}) \approx 0$ if $\mu$ is “mostly concentrated on a small number of atoms”.
Observe that if $\mu$ is of one of the two types above then $\mu * \nu$ will have essentially the same scale-$n$ entropy as $\mu$ for every measure $\nu \in \mathcal{P}([0, 1])$. The following theorem says that if $\mu * \nu$ is “not much bigger” than $\mu$ (in entropy terms), then a converse holds “with high probability on the component measures”: one can split the scales into two kinds, those at which the components of $\mu$ are with high probability close to uniform, and those at which the components of $\nu$ are with high probability close to atomic, and these two types of scales cover almost all scales between $0$ and $n$.

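Theorem 13.6. For every $\varepsilon > 0$ and $m > 0$ there is a $\delta > 0$ such that for all large enough $n$ the following holds. For any measures $\mu, \nu \in \mathcal{P}([0, 1])$, if

\[
\frac{1}{n} H(\mu * \nu, 2^{-n}) \le \frac{1}{n} H(\mu, 2^{-n}) + \delta
\]

then there are disjoint sets $I, J \subseteq \{0, \ldots, n\}$ with $|I \cup J| \ge (1 - \varepsilon) n$, and

\[
\mathbb{P}_{i \in I}\left( \frac{1}{m} H(\mu_{x,i}, 2^{-(i+m)}) > 1 - \varepsilon \right) > 1 - \varepsilon
\]
\[
\mathbb{P}_{j \in J}\left( \frac{1}{m} H(\nu_{x,j}, 2^{-(j+m)}) < \varepsilon \right) > 1 - \varepsilon
\]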

Corollary 13.7. Let $\tau > 0$ be fixed and suppose $\varepsilon < \frac{1}{4} \tau$. If, in the inverse theorem, $m, n$ are large relative to $\tau$ and if we know in addition that

\[
\frac{1}{n} H(\nu, 2^{-n}) > \tau
\]

then the set $I$ in the conclusion satisfies $|I| > \frac{1}{4} \tau n$.

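Proof. By the multiscale formula for entropy,

\[
\begin{aligned}
\tau &< \frac{1}{n} H(\nu, 2^{-n}) \\
&= \mathbb{E}_{1 \le i \le n}\left( \frac{1}{m} H(\nu_{x,i}, 2^{-(i+m)}) \right) + O\left(\frac{m}{n}\right) \\
&= \frac{|J|}{n}\, \mathbb{E}_{i \in J}\left( \frac{1}{m} H(\nu_{x,i}, 2^{-(i+m)}) \right) + \frac{n - |J|}{n}\, \mathbb{E}_{i \in \{1 \ldots n\} \setminus J}\left( \frac{1}{m} H(\nu_{x,i}, 2^{-(i+m)}) \right) + O\left(\frac{m}{n}\right) \\
&< \varepsilon \frac{|J|}{n} + \frac{n - |J|}{n} + O\left(\frac{1}{m} + \frac{m}{n}\right)
\end{aligned}
\]

Using $\varepsilon < \frac{1}{4} \tau$, and assuming as we may that the error term is $< \frac{1}{4} \tau$, we rearrange and get

\[
\frac{1}{n} |J| < 1 + \frac{1}{4} \tau - \tau + \frac{1}{4} \tau = 1 - \frac{\tau}{2}
\]

Thus

\[
\frac{1}{n} |I| \ge 1 - \frac{1}{n} |J| - \varepsilon > \frac{1}{4} \tau
\]

that is, $|I| > \frac{1}{4} \tau n$.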

We will discuss this inverse theorem later in more detail.

13.7 Application to self-similar measures

Fix an IFS $\Phi = \{f_i\}_{i \in \Lambda}$ with $f_i(x) = rx + a_i$ (for simplicity we are assuming that all maps contract by the same amount $r$).
Let $X = \bigcup_{i \in \Lambda} f_i X$ be a self-similar set and $\mu = \sum_{i \in \Lambda} p_i \cdot f_i \mu$ a self-similar measure for $p = (p_i)_{i \in \Lambda}$.
We return now to the conjecture that we stated earlier, that

\[
\dim X = \min\{1, \dim_s \Phi\}
\]

and
\[
\dim \mu = \min\Big\{1, \frac{H(p)}{-\lambda(p)}\Big\}
\]
unless there are exact overlaps, i.e. $f_i = f_j$ for some distinct $i, j \in \Lambda^*$. As we saw, if we choose $p_i = r_i^{\dim_s \Phi}$ (in our case, $p_i = 1/|\Lambda|$ is uniform), then $\dim_s \mu = \dim_s X$ and so the statement for $X$ follows from the statement for $\mu$.
We introduce a measure quantifying how far exact overlaps are from occurring:

Definition 13.8. For n ∈ N,

\[
\Delta_n = \min\{|f_i(0) - f_j(0)| : i, j \in \Lambda^n,\ i \ne j\}
\]

Lemma 13.9. $\Delta_n \to 0$. Furthermore, exact overlaps occur if and only if $\Delta_n = 0$ for all large enough $n$.

Proof. The first statement is obvious since the points $f_i(0)$, $i \in \Lambda^n$, all lie in a bounded set, while their number $|\Lambda|^n$ grows without bound.
For the second statement note that $f_i$ contracts by $r^n$ for all $i \in \Lambda^n$, so $f_i(0)$ determines $f_i$. Thus, if $\Delta_n = 0$ then exact overlaps occur. Conversely, if $f_i = f_j$ for distinct $i \in \Lambda^k$ and $j \in \Lambda^\ell$, then neither of $i, j$ extends the other, for then $f_i, f_j$ would have different contraction ratios. Then $ij \ne ji$ and $ij, ji \in \Lambda^{k+\ell}$ show that $\Delta_{k+\ell} = 0$.
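
For small $n$ the quantity $\Delta_n$ can be computed by brute force with exact rational arithmetic. The sketch below is illustrative: the function name `delta_n` and the two example systems are choices made here, and a word is encoded directly by its sequence of translations, using $f_{i_1} \circ \cdots \circ f_{i_n}(0) = \sum_k r^{k-1} a_{i_k}$.

```python
from fractions import Fraction
from itertools import product

def delta_n(r, translations, n):
    """Delta_n for the IFS f_i(x) = r*x + a_i: the minimal |f_i(0) - f_j(0)| over
    distinct words i, j of length n (zero if two distinct words give the same point)."""
    points = sorted(
        sum(a * r ** k for k, a in enumerate(word))
        for word in product(translations, repeat=n)
    )
    return min(q - p for p, q in zip(points, points[1:]))

# Middle-third Cantor system: no overlaps, Delta_n decays like 3**(-n).
cantor = delta_n(Fraction(1, 3), [Fraction(0), Fraction(2, 3)], 3)

# r = 1/2 with translations 0, 1/2, 1: the words (0, 1) and (1/2, 0) give the same map.
overlapping = delta_n(Fraction(1, 2), [Fraction(0), Fraction(1, 2), Fraction(1)], 2)
```

The first system has $\Delta_n = 2 \cdot 3^{-n}$, so it satisfies $\Delta_n > \rho^n$ for a suitable $\rho > 0$, while the second has an exact overlap already at level 2.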

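Definition 13.10. We say that $\Phi$ satisfies exponential separation (ES) if there exists $\rho > 0$ with
\[
\Delta_n > \rho^n \quad \text{for all } n
\]

Equivalently, if $\Delta_n > r^{kn}$ for all $n$, for some $k \in \mathbb{N}$.

Theorem 13.11. If $\Phi$ has exponential separation, then $\dim_M X = \min\{1, \dim_s \Phi\}$ and $\dim_e \mu = \min\{1, \frac{H(p)}{-\lambda(p)}\}$.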

The result for sets follows from the result for measures, as explained above.

The theorem holds also for IFSs with non-uniform contraction, but we focus on the
simpler case above. We will show that this theorem follows from the
inverse theorem presented above. This involves showing how the assumption dime µ <
min{1, dims Φ} implies that there are convolutions of µ with measures of substantial
entropy for which no entropy growth occurs; and then showing that the fact that µ is
self-similar rules out the possibility that this can happen.

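Let
\[
c = -\log_2 r
\]

Then for $i \in \Lambda^\ell$ we have

\[
\|f_i\| = r^\ell = 2^{-c\ell}
\]

Define the approximations of $\mu$ at scale $r^n$ by

\[
\mu^{(n)} = \sum_{i \in \Lambda^n} p_i \cdot \delta_{f_i(0)}
\]

Lemma 13.12. For any $m \in \mathbb{N}$,

\[
\mu = \mu^{(m)} * S_{r^m} \mu \tag{16}
\]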

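Proof. Notice that for $i \in \Lambda^m$,

\[
f_i(x) = r^m x + f_i(0)
\]

so
\[
f_i \mu = T_{f_i(0)} S_{r^m} \mu = S_{r^m} \mu * \delta_{f_i(0)}
\]

Hence
\[
\begin{aligned}
\mu &= \sum_{i \in \Lambda^m} p_i \cdot f_i \mu \\
&= \sum_{i \in \Lambda^m} p_i \cdot S_{r^m} \mu * \delta_{f_i(0)} \\
&= S_{r^m} \mu * \sum_{i \in \Lambda^m} p_i \cdot \delta_{f_i(0)} \\
&= S_{r^m} \mu * \mu^{(m)}
\end{aligned}
\]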

as claimed.
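
The identity (16) involves the (typically non-atomic) measure $\mu$, so it cannot be checked literally on a computer. The sketch below instead verifies a finite analog, $\mu^{(m+n)} = \mu^{(m)} * S_{r^m} \mu^{(n)}$, which follows from the same computation since $f_i(f_j(0)) = r^m f_j(0) + f_i(0)$ for $i \in \Lambda^m$. This analog, the function names, and the example IFS are assumptions made for illustration.

```python
from fractions import Fraction

def approx(r, maps, n):
    """mu^(n): the sum over words i of length n of p_i * delta_{f_i(0)}.
    `maps` is a list of pairs (a_i, p_i) describing f_i(x) = r*x + a_i with weight p_i."""
    mu = {Fraction(0): Fraction(1)}
    for _ in range(n):
        new = {}
        for x, w in mu.items():
            for a, p in maps:
                y = r * x + a  # apply one more map of the IFS
                new[y] = new.get(y, Fraction(0)) + w * p
        mu = new
    return mu

def scale(mu, s):
    """Push-forward of mu under x -> s*x (the map S_s)."""
    return {s * x: w for x, w in mu.items()}

def convolve(mu, nu):
    out = {}
    for x, p in mu.items():
        for y, q in nu.items():
            out[x + y] = out.get(x + y, Fraction(0)) + p * q
    return out

r = Fraction(1, 3)
maps = [(Fraction(0), Fraction(1, 3)), (Fraction(2, 3), Fraction(2, 3))]
m, n = 2, 3
lhs = approx(r, maps, m + n)
rhs = convolve(approx(r, maps, m), scale(approx(r, maps, n), r ** m))
```

With exact rational arithmetic the two dictionaries `lhs` and `rhs` should coincide atom by atom.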

It will be convenient to define entropy at “scales” that are not powers of 2. Thus we define for all $t > 0$,
\[
H(\mu, t) = H(\mu, 2^{[\log t]})
\]

Next, we study the effect of convolving two measures “of different scales”.

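Lemma 13.13. Let $\theta \in \mathcal{P}([0, 1])$ and $\nu \in \mathcal{P}(\mathbb{R})$. Let $t < s$ and assume that $\nu$ is supported on a set of diameter $O(s)$. Then

\[
H(\theta * \nu, t) = H(\theta, s) + \mathbb{E}_{i=[\log s]}\left( H(\theta_{x,i} * \nu, t) \right) + O(1)
\]

In particular,

\[
\begin{aligned}
H(\theta * \nu, s) &= H(\theta, s) + O(1) \\
H(\theta * \nu, t) &\ge H(\theta, s) + H(\nu, t) - O(1)
\end{aligned}
\]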

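Proof. We prove the first identity. Write $\sigma(x, y) = x + y$. Then

\[
\begin{aligned}
H(\theta * \nu, t) &= H(\theta \times \nu, \sigma^{-1}(\mathcal{D}_{2^{-[\log t]}})) \\
&= H(\theta \times \nu, \sigma^{-1}(\mathcal{D}_{2^{-[\log s]}})) + H(\theta \times \nu, \sigma^{-1}(\mathcal{D}_{2^{-[\log t]}}) \mid \sigma^{-1}(\mathcal{D}_{2^{-[\log s]}})) \\
&= H(\theta \times \nu, \pi_1^{-1}(\mathcal{D}_{2^{-[\log s]}})) + H(\theta \times \nu, \sigma^{-1}(\mathcal{D}_{2^{-[\log t]}}) \mid \sigma^{-1}(\mathcal{D}_{2^{-[\log s]}})) + O(1) \\
&= H(\theta, s) + H(\theta \times \nu, \sigma^{-1}(\mathcal{D}_{2^{-[\log t]}}) \mid \sigma^{-1}(\mathcal{D}_{2^{-[\log s]}})) + O(1)
\end{aligned} \tag{17}
\]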

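Here we used the following. The support of $\theta \times \nu$ is contained in a rectangle of dimensions $1 \times O(s)$, and on this rectangle the partitions $\sigma^{-1}(\mathcal{D}_{2^{-[\log s]}})$ and $\pi_1^{-1}(\mathcal{D}_{2^{-[\log s]}})$ are $O(1)$-commensurable. Therefore, using Lemma ??,

\[
\begin{aligned}
H(\theta \times \nu, \sigma^{-1}(\mathcal{D}_{2^{-[\log s]}})) &= H(\theta \times \nu, \pi_1^{-1}(\mathcal{D}_{2^{-[\log s]}})) + O(1) \\
&= H(\pi_1(\theta \times \nu), \mathcal{D}_{2^{-[\log s]}}) + O(1) \\
&= H(\theta, s) + O(1)
\end{aligned} \tag{18}
\]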

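Similarly,

\[
\begin{aligned}
H(\theta \times \nu, \sigma^{-1}(\mathcal{D}_{2^{-[\log t]}}) \mid \sigma^{-1}(\mathcal{D}_{2^{-[\log s]}}))
&= H(\theta \times \nu, \sigma^{-1}(\mathcal{D}_{2^{-[\log t]}}) \mid \pi_1^{-1}(\mathcal{D}_{2^{-[\log s]}})) + O(1) \\
&= \sum_{I \in \pi_1^{-1}(\mathcal{D}_{2^{-[\log s]}})} (\theta \times \nu)(I) \cdot H((\theta \times \nu)_I, \sigma^{-1}(\mathcal{D}_{2^{-[\log t]}})) + O(1) \\
&= \sum_{I \in \mathcal{D}_{2^{-[\log s]}}} \theta(I) \cdot H(\theta_I \times \nu, \sigma^{-1}(\mathcal{D}_{2^{-[\log t]}})) + O(1) \\
&= \mathbb{E}_{i=[\log s]}\left( H(\theta_{x,i} \times \nu, \sigma^{-1}(\mathcal{D}_{2^{-[\log t]}})) \right) + O(1) \\
&= \mathbb{E}_{i=[\log s]}\left( H(\theta_{x,i} * \nu, t) \right) + O(1)
\end{aligned}
\]

Inserting this into the equation (17) gives the desired equality.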
For the second identity apply the first with $t = s$ to get

\[
\begin{aligned}
H(\theta * \nu, s) &= H(\theta, s) + \mathbb{E}_{i=[\log s]}\left( H(\theta_{x,i} * \nu, s) \right) + O(1) \\
&= H(\theta, s) + O(1)
\end{aligned}
\]

where we used the fact that both $\theta_{x,i}$ and $\nu$ are supported on sets of diameter $O(s)$, so the same holds for $\theta_{x,i} * \nu$ and hence $H(\theta_{x,i} * \nu, s) = O(1)$.
For the third identity, note that for every $x, i$ we have $H(\theta_{x,i} * \nu, t) \ge H(\nu, t) - O(1)$. Inserting this into the first identity in the lemma gives the third.

Corollary 13.14. $\lim_{m \to \infty} \frac{1}{\log(1/r^m)} H(\mu^{(m)}, r^m) = \dim_e \mu$

Proof. Using the first part of the lemma and the identity $\mu = \mu^{(m)} * S_{r^m} \mu$ and the fact that $S_{r^m} \mu$ is supported on a set of diameter $O(r^m)$,

\[
\frac{1}{\log(1/r^m)} H(\mu^{(m)}, r^m) = \frac{1}{\log(1/r^m)} H(\mu, r^m) + O\left(\frac{1}{\log(1/r^m)}\right) \to \dim_e \mu
\]

as required.
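
As an illustration of Corollary 13.14, the normalized entropies of the atomic approximations can be computed exactly for the middle-third Cantor measure ($r = 1/3$, translations $0$ and $2/3$, $p = (1/2, 1/2)$), whose entropy dimension is the known value $\log 2 / \log 3$. The sketch below reads the scale $r^m$ as the dyadic level $\lceil m \log_2(1/r) \rceil$, which is one possible interpretation of the bracket convention; that reading and all function names are assumptions.

```python
from fractions import Fraction
from math import ceil, floor, log, log2

def approx(r, maps, n):
    """mu^(n) as a dict {f_i(0): p_i}, summed over words i of length n."""
    mu = {Fraction(0): Fraction(1)}
    for _ in range(n):
        new = {}
        for x, w in mu.items():
            for a, p in maps:
                y = r * x + a
                new[y] = new.get(y, Fraction(0)) + w * p
        mu = new
    return mu

def dyadic_entropy(mu, level):
    """Entropy in bits of mu with respect to dyadic intervals of length 2**(-level)."""
    cells = {}
    for x, w in mu.items():
        c = floor(x * 2 ** level)  # index of the dyadic cell containing x
        cells[c] = cells.get(c, 0.0) + float(w)
    return -sum(w * log2(w) for w in cells.values() if w > 0)

r = Fraction(1, 3)
maps = [(Fraction(0), Fraction(1, 2)), (Fraction(2, 3), Fraction(1, 2))]
m = 8
level = ceil(m * log2(1 / float(r)))  # dyadic level matching the scale r**m
ratio = dyadic_entropy(approx(r, maps, m), level) / (m * log2(1 / float(r)))
```

In this example the atoms of $\mu^{(m)}$ are separated by more than the dyadic cell length, so the entropy is exactly $m$ bits and `ratio` equals $\log 2 / \log 3 \approx 0.6309$ up to rounding.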

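Corollary 13.15. For every $k \in \mathbb{N}$,

\[
\lim_{m \to \infty} \frac{1}{c(k-1)m}\, \mathbb{E}_{i=[cm]}\left( H((\mu^{(m)})_{x,i} * S_{r^m} \mu, r^{km}) \right) = \dim_e \mu
\]

In particular, given $\delta > 0$, for large enough $m$ we have

\[
\mathbb{P}_{i=[cm]}\left( \frac{1}{c(k-1)m} H((\mu^{(m)})_{x,i} * S_{r^m} \mu, r^{km}) < (1 + \delta) \dim_e \mu \right) > 1 - \delta
\]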

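Proof. Apply the previous lemma with $\theta = \mu^{(m)}$, $\nu = S_{r^m} \mu$ and with $s = r^m$, $t = r^{km}$. We get

\[
\begin{aligned}
ckm \dim_e \mu + o(m) &= H(\mu, t) \\
&= H(\mu^{(m)}, s) + \mathbb{E}_{i=[\log s]}\left( H((\mu^{(m)})_{x,i} * S_{r^m} \mu, r^{km}) \right) + O(1) \\
&= cm \dim_e \mu + \mathbb{E}_{i=[\log s]}\left( H((\mu^{(m)})_{x,i} * S_{r^m} \mu, r^{km}) \right) + o(m)
\end{aligned}
\]

Subtracting and dividing by $c(k - 1)m$ gives the first statement.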


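For the second statement, recall that convolution cannot significantly decrease entropy, so for every component $(\mu^{(m)})_{x,i}$ we have

\[
\begin{aligned}
H((\mu^{(m)})_{x,i} * S_{r^m} \mu, r^{km}) &\ge H(S_{r^m} \mu, r^{km}) + O(1) \\
&\ge H(\mu, r^{(k-1)m}) + O(1) \\
&= c(k - 1)m \dim_e \mu + o(m)
\end{aligned}
\]

Thus in the first part of the corollary, we have an average over components, each of whose normalized values is bounded below by the mean up to an $o(1)$ error. Hence only a small proportion of the components can exceed the mean substantially, and for large $m$ the second statement follows.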

Lemma 13.16. If $\dim_e \mu < \dim_s \Phi$ and if $\Phi$ satisfies exponential separation, then there is a constant $\tau > 0$ and $k \in \mathbb{N}$ such that for all $m \in \mathbb{N}$,
\[
\mathbb{P}_{i=[cm]}\left( \frac{1}{c(k-1)m} H((\mu^{(m)})_{x,i}, r^{km}) > \tau \right) > \tau
\]

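Proof. Since all maps in the IFS contract by $r$, we have

\[
\dim_s \mu = \frac{H(p)}{\log(1/r)}
\]

Let $\varepsilon > 0$ be such that

\[
\dim_e \mu < \dim_s \mu - \varepsilon = \frac{H(p)}{\log(1/r)} - \varepsilon
\]

Then

\[
\begin{aligned}
H(\mu^{(m)}, r^m) &= cm \dim_e \mu + O(1) \\
&< cm \Big( \frac{H(p)}{\log(1/r)} - \varepsilon \Big) + O(1) \\
&= m(H(p) - c\varepsilon) + O(1)
\end{aligned}
\]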

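Let $k$ be such that $\Delta_m > r^{km}$ for all $m$ (this $k$ depends on $\Phi$ but not on $\mu$ or $m$). Then every partition into $r^{km}$-intervals separates the atoms of $\mu^{(m)}$, and hence

\[
H(\mu^{(m)}, r^{km}) = mH(p)
\]

Therefore
\[
\begin{aligned}
\mathbb{E}_{i=[cm]}\left( H((\mu^{(m)})_{x,i}, r^{km}) \right) &= H(\mu^{(m)}, r^{km} \mid r^m) \\
&= H(\mu^{(m)}, r^{km}) - H(\mu^{(m)}, r^m) \\
&> mH(p) - m(H(p) - c\varepsilon) + O(1) \\
&= c\varepsilon m + O(1)
\end{aligned}
\]

Since each term $H((\mu^{(m)})_{x,i}, r^{km})$ is at most $c(k-1)m + O(1)$, it follows that there exists $\tau = \tau(\varepsilon)$ such that $\frac{1}{c(k-1)m} H((\mu^{(m)})_{x,i}, r^{km}) > \tau$ with probability $> \tau$, as required.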

We return to Theorem 13.11. Suppose that $\Phi$ satisfies exponential separation and

\[
\dim_e \mu < \min\Big\{1, \frac{H(p)}{\log(1/r)}\Big\}
\]

Let $k, \tau$ be as in the previous lemma. Then we know from the lemma and the previous corollary that for any $\delta > 0$, as soon as $m$ is large enough,
\[
\mathbb{P}_{i=[cm]}\left( \frac{1}{c(k-1)m} H((\mu^{(m)})_{x,i}, r^{km}) > \tau \right) > \tau
\]
and
\[
\mathbb{P}_{i=[cm]}\left( \frac{1}{c(k-1)m} H((\mu^{(m)})_{x,i} * S_{r^m} \mu, r^{km}) < (1 + \delta) \dim_e \mu \right) > 1 - \delta
\]
Taking $\delta < \tau$ we can find a component $\nu' = (\mu^{(m)})_{x,i}$ belonging to both events above; i.e.

\[
\begin{aligned}
\frac{1}{c(k-1)m} H(\nu', r^{km}) &> \tau \\
\frac{1}{c(k-1)m} H(\nu' * S_{r^m} \mu, r^{km}) &< (1 + \delta) \dim_e \mu
\end{aligned}
\]

Applying $S_{1/r^m}$ to all measures above, and writing $\nu = S_{1/r^m} \nu'$ and $n = m(k - 1)$, we have derived the following conclusion:

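Corollary 13.17. Suppose that $\dim_e \mu < \dim_s \Phi$ and that $\Phi$ satisfies exponential separation. Then there exists $\ell \in \mathbb{N}$ and $\tau > 0$ such that, for every $\delta > 0$, for all sufficiently large $n$, there exists $\nu = \nu_n \in \mathcal{P}([0, 1])$ such that

\[
\begin{aligned}
\frac{1}{cn} H(\nu, r^n) &> \tau \\
\frac{1}{cn} H(\mu * \nu, r^n) &< \frac{1}{cn} H(\mu, r^n) + \delta
\end{aligned}
\]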

By Theorem 13.6 and the corollary following it, this can only happen if for all $m$ and all sufficiently large $n$ there exists $I \subseteq \{1, \ldots, n\}$ with $|I| > \frac{1}{4} \tau n$ and such that

\[
\mathbb{P}_{i \in I}\left( \frac{1}{m} H(\mu_{x,i}, 2^{-(i+m)}) > 1 - \varepsilon \right) > 1 - \varepsilon
\]

The proof of Theorem 13.11 is completed by showing that

Proposition 13.18. For large m and n, such a set I cannot exist.

13.8 The Kaimanovich-Vershik lemma

Lemma 13.19. Let $\Gamma$ be a countable abelian group and let $\mu, \nu \in \mathcal{P}(\Gamma)$ be probability measures with $H(\mu) < \infty$, $H(\nu) < \infty$. Let

\[
\delta_k = H(\mu * (\nu^{*(k+1)})) - H(\mu * (\nu^{*k})).
\]

Then $\delta_k$ is non-increasing in $k$. In particular,

\[
H(\mu * (\nu^{*k})) \le H(\mu) + k \cdot (H(\mu * \nu) - H(\mu)).
\]

The lemma above first appears in a study of random walks on groups by Kaimanovich
and Vershik [?]. It was more recently rediscovered and applied in additive combinatorics
by Madiman and his co-authors [?, ?] and, in a weaker form, by Tao [?], who later made
the connection to additive combinatorics. For completeness we give the short proof here.

Proof. Let $X_0$ be a random variable distributed according to $\mu$, let $Z_1, Z_2, \ldots$ be distributed according to $\nu$, and let all variables be independent. Set $X_n = X_0 + Z_1 + \ldots + Z_n$, so the distribution of $X_n$ is just $\mu * \nu^{*n}$. Furthermore, since $\Gamma$ is abelian, given $Z_1 = g$, the distribution of $X_n$ is the same as the distribution of $X_{n-1} + g$ and hence $H(X_n | Z_1) = H(X_{n-1})$. We now compute:

\[
\begin{aligned}
H(Z_1 | X_n) &= H(Z_1, X_n) - H(X_n) \\
&= H(Z_1) + H(X_n | Z_1) - H(X_n) \\
&= H(\nu) + H(\mu * \nu^{*(n-1)}) - H(\mu * \nu^{*n}).
\end{aligned} \tag{19}
\]

Since $X_n$ is a Markov process, given $X_n$, $Z_1 = X_1 - X_0$ is independent of $X_{n+1}$, so

\[
H(Z_1 | X_n) = H(Z_1 | X_n, X_{n+1}) \le H(Z_1 | X_{n+1}).
\]

Using (19) in both sides of the inequality above, we find that

\[
H(\mu * \nu^{*(n-1)}) - H(\mu * \nu^{*n}) \le H(\mu * \nu^{*n}) - H(\mu * \nu^{*(n+1)}),
\]

that is, $\delta_n \le \delta_{n-1}$, which is what we claimed.
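
The monotonicity of the entropy increments is easy to observe numerically in the group $\Gamma = \mathbb{Z}$. The following sketch uses two arbitrary finitely supported measures chosen here for illustration; the function names are likewise hypothetical, and a small tolerance accounts for floating-point rounding.

```python
from math import log2

def entropy(mu):
    """Shannon entropy in bits of a measure {atom: weight} on the integers."""
    return -sum(p * log2(p) for p in mu.values() if p > 0)

def convolve(mu, nu):
    out = {}
    for x, p in mu.items():
        for y, q in nu.items():
            out[x + y] = out.get(x + y, 0.0) + p * q
    return out

def entropy_increments(mu, nu, K):
    """Return ([H(mu * nu^{*k})] for k = 0..K, [delta_k] for k = 0..K-1)."""
    current, entropies = mu, [entropy(mu)]
    for _ in range(K):
        current = convolve(current, nu)
        entropies.append(entropy(current))
    deltas = [b - a for a, b in zip(entropies, entropies[1:])]
    return entropies, deltas

mu = {0: 0.7, 3: 0.2, 10: 0.1}
nu = {0: 0.5, 1: 0.3, 7: 0.2}
entropies, deltas = entropy_increments(mu, nu, 6)
```

The list `deltas` should be non-increasing, and consequently `entropies[k]` is at most `entropies[0] + k * deltas[0]`, i.e. $H(\mu * \nu^{*k}) \le H(\mu) + k\,(H(\mu * \nu) - H(\mu))$.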

For the analogous statement for the scale-$n$ entropy of measures on $\mathbb{R}$ we use a discretization argument. For $m \in \mathbb{N}$ let

\[
M_m = \Big\{ \frac{k}{2^m} : k \in \mathbb{Z} \Big\}
\]

denote the group of $2^m$-adic rationals. Each $D \in \mathcal{D}_m$ contains exactly one $x \in M_m$. Define the $m$-discretization map $\sigma_m : \mathbb{R} \to M_m$ by $\sigma_m(x) = v$ if $\mathcal{D}_m(x) = \mathcal{D}_m(v)$, so that $\sigma_m(x) \in \mathcal{D}_m(x)$.
We say that a measure $\mu \in \mathcal{P}(\mathbb{R}^d)$ is $m$-discrete if it is supported on $M_m^d$. For arbitrary $\mu$ its $m$-discretization is its push-forward $\sigma_m \mu$ through $\sigma_m$, given explicitly by:
\[
\sigma_m \mu = \sum_{v \in M_m^d} \mu(\mathcal{D}_m(v)) \cdot \delta_v.
\]

Clearly $H_m(\mu) = H_m(\sigma_m \mu)$.

Lemma 13.20. Given $\mu_1, \ldots, \mu_k \in \mathcal{P}(\mathbb{R})$ with $H(\mu_i) < \infty$ and $m \in \mathbb{N}$,

\[
|H_m(\mu_1 * \mu_2 * \ldots * \mu_k) - H_m(\sigma_m \mu_1 * \ldots * \sigma_m \mu_k)| = O(k/m).
\]

Proof. Let $\pi : \mathbb{R}^k \to \mathbb{R}$ denote the map $(x_1, \ldots, x_k) \mapsto \sum_{i=1}^{k} x_i$. Then $\mu_1 * \ldots * \mu_k = \pi(\mu_1 \times \ldots \times \mu_k)$ and $\sigma_m \mu_1 * \ldots * \sigma_m \mu_k = \pi \circ \sigma_m^k(\mu_1 \times \ldots \times \mu_k)$ (here $\sigma_m^k : (x_1, \ldots, x_k) \mapsto (\sigma_m x_1, \ldots, \sigma_m x_k)$). Now, it is easy to check that

\[
|\pi(x_1, \ldots, x_k) - \pi \circ \sigma_m^k(x_1, \ldots, x_k)| = O(k \cdot 2^{-m})
\]

so the desired entropy bound follows from Lemma ?? (??).

Proposition 13.21. Let $\mu, \nu \in \mathcal{P}(\mathbb{R})$ with $H_n(\mu), H_n(\nu) < \infty$. Then

\[
H_n(\mu * (\nu^{*k})) \le H_n(\mu) + k \cdot (H_n(\mu * \nu) - H_n(\mu)) + O\left(\frac{k}{n}\right). \tag{20}
\]

Proof. Writing $\tilde{\mu} = \sigma_n(\mu)$ and $\tilde{\nu} = \sigma_n(\nu)$, Lemma 13.19 implies

\[
H(\tilde{\mu} * (\tilde{\nu}^{*k})) \le H(\tilde{\mu}) + k \cdot (H(\tilde{\mu} * \tilde{\nu}) - H(\tilde{\mu})).
\]

For $n$-discrete measures the entropy of the measure coincides with its entropy with respect to $\mathcal{D}_n$, so dividing this inequality by $n$ gives (20) for $\tilde{\mu}, \tilde{\nu}$ instead of $\mu, \nu$, and without the error term. The desired inequality follows from Lemma 13.20.

We also will later need the following simple fact:


Corollary 13.22. For $m \in \mathbb{N}$ and $\mu, \nu \in \mathcal{P}([-r, r]^d)$ with $H_n(\mu), H_n(\nu) < \infty$,

\[
H_m(\mu * \nu) \ge H_m(\mu) - O\left(\frac{1}{m}\right).
\]

Proof. This is immediate from the identity $\mu * \nu = \int \mu * \delta_y\, d\nu(y)$, concavity of entropy, and Lemma ?? (??) (note that $\mu * \delta_y$ is a translate of $\mu$).

14 Appendix
14.1 Integration of measures
Let (X, B), (Y, C) be measurable spaces.
Let µ : Y → P(X, B) be a function mapping y ∈ Y to a measure µy ∈ P(X, B).
We say that µ is measurable if for every A ∈ B,

y 7→ µy (A)

is measurable as a function Y → R.
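Given a measure $\nu$ on $(Y, \mathcal{C})$, we define a function $\mu : \mathcal{B} \to [0, \infty]$ by
\[
\mu(A) = \int \mu_y(A)\, d\nu(y) \quad \text{for } A \in \mathcal{B}
\]
The integral is well-defined because the integrand is non-negative and measurable. This is a measure since
\[
\mu(\emptyset) = \int \mu_y(\emptyset)\, d\nu(y) = \int 0\, d\nu(y) = 0
\]
and if $A_1, A_2, \ldots \in \mathcal{B}$ are pairwise disjoint,

\[
\begin{aligned}
\mu\Big(\bigcup A_n\Big) &= \int \mu_y\Big(\bigcup A_n\Big)\, d\nu(y) \\
&= \int \sum \mu_y(A_n)\, d\nu(y) \\
&= \sum \int \mu_y(A_n)\, d\nu(y) \\
&= \sum \mu(A_n)
\end{aligned}
\]

using monotone convergence to exchange integration and summation.

Note that if $\mu_y$ is $\nu$-a.s. a probability measure and $\nu$ is a probability measure, then so is $\mu$.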

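Examples

1. If $\mu_1, \mu_2, \ldots$ are measures on $(X, \mathcal{B})$ then $\sum \mu_n$ is a measure; it arises as above by taking $Y = \mathbb{N}$, $\nu =$ counting measure, and the map $n \mapsto \mu_n$.

2. Every measure $\mu$ on $(X, \mathcal{B})$ can be written as
\[
\mu = \int \delta_x\, d\mu(x)
\]
Indeed, the function $x \mapsto \delta_x$ is measurable because $\delta_x(A) = 1_A(x)$, so $x \mapsto \delta_x(A)$ is just the indicator function $1_A$, which is measurable for $A \in \mathcal{B}$. Then we have
\[
\mu(A) = \int 1_A(x)\, d\mu(x) = \int \delta_x(A)\, d\mu(x) = \Big( \int \delta_x\, d\mu(x) \Big)(A)
\]

3. Let $X = [0, 1]^2$ and let $\mu_x$ denote Lebesgue measure $\lambda^1$ on the interval $\{x\} \times [0, 1]$ (i.e. the push-forward of Lebesgue measure on $[0, 1]$ to $\mathbb{R}^2$ via $t \mapsto (x, t)$). Let $Y = [0, 1]$ with Lebesgue measure $\lambda^1$. Then $\mu = \int \mu_x\, d\lambda^1(x)$ is 2-dimensional Lebesgue measure $\lambda^2$ on $X$, since for $A \subseteq X$,
\[
\begin{aligned}
\lambda^2(A) &= \int\!\!\int 1_A(x, y)\, d\lambda^1(y)\, d\lambda^1(x) \quad \text{by Fubini} \\
&= \int \mu_x(A)\, d\lambda^1(x)
\end{aligned}
\]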
14.2 The weak-* topology
We defined convergence of measures on symbolic spaces. Below we summarize the
general case.
Definition 14.1. Let $X$ be a compact metric space and $\mathcal{P}(X)$ the space of Borel probability measures on $X$. The weak-* topology on $\mathcal{P}(X)$ is the weakest topology with respect to which $\mu \mapsto \int f\, d\mu$ is continuous for every $f \in C(X)$.
Proposition 14.2. Let $X$ be a compact metric space. Then $\mathcal{P}(X)$ is metrizable and compact in the weak-* topology.
Proof. Using the Stone-Weierstrass theorem fix a countable dense subset $\{f_i\}_{i=1}^{\infty}$ of the unit ball in $C(X)$. Define a metric on $\mathcal{P}(X)$ by

\[
d(\mu, \nu) = \sum_{i=1}^{\infty} 2^{-i} \Big| \int f_i\, d\mu - \int f_i\, d\nu \Big|
\]

It is easy to check that this is a metric. We must show that the topology induced by this metric is the weak-* topology.
If $\mu_n \to \mu$ weak-* then $\int f_i\, d\mu_n - \int f_i\, d\mu \to 0$ as $n \to \infty$ for every $i$; since the terms of the series are bounded by $2 \cdot 2^{-i}$, it follows that $d(\mu_n, \mu) \to 0$. Conversely, if $d(\mu_n, \mu) \to 0$, then $\int f_i\, d\mu_n \to \int f_i\, d\mu$ for every $i$ and therefore for every linear combination of the $f_i$. Given $f \in C(X)$ and $\varepsilon > 0$ there is a linear combination $g$ of the $f_i$ such that $\|f - g\|_\infty < \varepsilon$. Then
\[
\begin{aligned}
\Big| \int f\, d\mu_n - \int f\, d\mu \Big| &\le \Big| \int f\, d\mu_n - \int g\, d\mu_n \Big| + \Big| \int g\, d\mu_n - \int g\, d\mu \Big| + \Big| \int g\, d\mu - \int f\, d\mu \Big| \\
&< \varepsilon + \Big| \int g\, d\mu_n - \int g\, d\mu \Big| + \varepsilon
\end{aligned}
\]
and the right hand side is $< 3\varepsilon$ when $n$ is large enough. Hence $\mu_n \to \mu$ weak-*.
Since the space is metrizable, to prove compactness it is enough to prove sequential compactness, i.e. that every sequence $\mu_n \in \mathcal{P}(X)$ has a convergent subsequence. Let $V = \mathrm{span}_{\mathbb{Q}}\{f_i\}$, which is a countable dense $\mathbb{Q}$-linear subspace of $C(X)$. The range of each $g \in V$ is a compact subset of $\mathbb{R}$ (since $X$ is compact and $g$ continuous), so for each $g \in V$ the sequence $\int g\, d\mu_n$ is bounded and we can choose a convergent subsequence. Using a diagonal argument we may select a single subsequence $\mu_{n(j)}$ such that $\int g\, d\mu_{n(j)} \to \Lambda(g)$ as $j \to \infty$ for every $g \in V$. Now, $\Lambda$ is a $\mathbb{Q}$-linear functional because
\[
\begin{aligned}
\Lambda(a f_i + b f_j) &= \lim_{k \to \infty} \int (a f_i + b f_j)\, d\mu_{n(k)} \\
&= \lim_{k \to \infty} \Big( a \int f_i\, d\mu_{n(k)} + b \int f_j\, d\mu_{n(k)} \Big) \\
&= a \Lambda(f_i) + b \Lambda(f_j)
\end{aligned}
\]

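$\Lambda$ is also uniformly continuous because, if $\|f_i - f_j\|_\infty < \varepsilon$ then
\[
\begin{aligned}
|\Lambda(f_i - f_j)| &= \lim_{k \to \infty} \Big| \int (f_i - f_j)\, d\mu_{n(k)} \Big| \\
&\le \lim_{k \to \infty} \int |f_i - f_j|\, d\mu_{n(k)} \\
&\le \varepsilon
\end{aligned}
\]

Thus $\Lambda$ extends to a continuous linear functional on $C(X)$. Since $\Lambda$ is positive (i.e. non-negative on non-negative functions), so is its extension, so by the Riesz representation theorem there exists $\mu \in \mathcal{P}(X)$ with $\Lambda(f) = \int f\, d\mu$. By definition $\int g\, d\mu - \int g\, d\mu_{n(k)} \to 0$ as $k \to \infty$ for $g \in V$, hence this is true for the $f_i$, so $d(\mu_{n(k)}, \mu) \to 0$. Hence $\mu_{n(k)} \to \mu$ weak-*.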

14.3 Lifting measures


Let $\pi : X \to Y$ be a continuous map between compact metric spaces. If $\mu$ is a measure on $X$ then $\pi\mu$ is the measure on $Y$ satisfying $\pi\mu(E) = \mu(\pi^{-1}(E))$ for measurable $E \subseteq Y$ (this definition works also when $X, Y$ are measurable spaces and $\pi$ is measurable). Equivalently,
\[
\forall g \in C(Y) \qquad \int g\, d\pi\mu = \int g \circ \pi\, d\mu
\]
(in the measurable case one requires this for measurable bounded functions, say). The measure $\pi\mu$ is called the push-forward of $\mu$ and is sometimes denoted $\pi_* \mu$ or $\pi_\# \mu$.

Proposition 14.3. Let $\nu$ be a Borel probability measure on $Y$. Then there exists a Borel probability measure $\mu$ on $X$ such that $\pi\mu = \nu$, i.e. $\mu(\pi^{-1} E) = \nu(E)$ for all Borel sets $E \subseteq Y$.

Remark 14.4. µ need not be unique if π is not 1-1.


Remark 14.5. One can replace compactness by completeness, but then the theorem
becomes much more technical (requires descriptive set theory).

Proof No. 1 (almost elementary). Start by constructing a sequence $\nu_n$ of atomic measures on $Y$ with $\nu_n \to \nu$ weakly, i.e. $\int g\, d\nu_n \to \int g\, d\nu$ for all $g \in C(Y)$. To get such a sequence, given $n$ choose a finite partition $\mathcal{E}_n$ of $Y$ into measurable sets of diameter $< 1/n$ (for instance cover $Y$ by balls $B_i$ of radius $< 1/n$ and set $E_i = B_i \setminus \bigcup_{j<i} B_j$). For each $E \in \mathcal{E}_n$ choose $x_E \in E$ and set $\nu_n = \sum_{E \in \mathcal{E}_n} \nu(E) \cdot \delta_{x_E}$. One may verify that $\nu_n \to \nu$.
Now, each $\nu_n$ can be lifted to a probability measure $\mu_n$ on $X$ such that $\pi\mu_n = \nu_n$: to see this, if $\nu_n = \sum w_i \cdot \delta_{y_i}$ choose $x_i \in \pi^{-1}(y_i)$ (there may be many choices, choose one), and set $\mu_n = \sum w_i \cdot \delta_{x_i}$.
Since the space of Borel probability measures on $X$ is compact in the weak-* topology, by passing to a subsequence we can assume $\mu_n \to \mu$. Clearly $\mu$ is a probability measure; we claim $\pi\mu = \nu$. It is enough to show that $\int g\, d(\pi\mu) = \int g\, d\nu$ for every $g \in C(Y)$. Using the identity $\int g\, d\nu_n = \int g \circ \pi\, d\mu_n$ (which is equivalent to $\nu_n = \pi\mu_n$) we have
\[
\int g\, d\nu = \lim \int g\, d\nu_n = \lim \int g \circ \pi\, d\mu_n = \int g \circ \pi\, d\mu = \int g\, d(\pi\mu)
\]

as claimed.

Proof No. 2 (functional-analytic). First a few general remarks. A linear functional $\mu^*$ on $C(X)$ is positive if it takes non-negative values on non-negative functions. This property implies boundedness: to see this note that for any $f \in C(X)$ we have $\|f\|_\infty - f \ge 0$, hence by linearity and positivity $\mu^*(\|f\|_\infty) - \mu^*(f) \ge 0$, giving

\[
\mu^*(f) \le \mu^*(\|f\|_\infty) = \|f\|_\infty \cdot \mu^*(1)
\]

Similarly, using $f + \|f\|_\infty \ge 0$ we get $\mu^*(f) \ge -\|f\|_\infty \cdot \mu^*(1)$. Combining the two we have $|\mu^*(f)| \le C \|f\|_\infty$, where $C = \mu^*(1)$.
Since a positive functional $\mu^*$ is bounded it corresponds to integration against a regular signed Borel measure $\mu$, and since $\int f\, d\mu = \mu^*(f) \ge 0$ for continuous $f \ge 0$, regularity implies that $\mu$ is a positive measure. Hence a linear functional $\mu^* \in C(X)^*$ corresponds to a probability measure if and only if it is positive and $\mu^*(1) = 1$ (this is the normalization condition $\int 1\, d\mu = 1$).
We now begin the proof. Let $\nu^* : C(Y) \to \mathbb{R}$ be the bounded positive linear functional $g \mapsto \int g\, d\nu$. The map $\pi^* : C(Y) \to C(X)$, $g \mapsto g \circ \pi$, embeds $C(Y)$ isometrically as a subspace $V = \pi^*(C(Y)) < C(X)$, and lifts $\nu^*$ to a bounded linear functional $\mu_0^* : V \to \mathbb{R}$ (given by $\mu_0^*(g \circ \pi) = \nu^*(g)$).
Consider the positive cone $P = \{f \in C(X) : f \ge 0\}$, and let $s : C(X) \to \mathbb{R}$ be the function
\[
s(f) = \sup\{0, -f(x) : x \in X\}
\]

It is easy to check that $s$ is sublinear (positively homogeneous and subadditive), that $s|_P \equiv 0$ and that $-\mu_0^*(f) \le s(f)$ on $V$. Hence by Hahn-Banach we can extend $-\mu_0^*$ to a functional $-\mu^*$ on $C(X)$ satisfying $-\mu^* \le s$, which for $f \in P$ implies $\mu^*(f) \ge -s(f) = 0$, so $\mu^*$ is positive. By the previous discussion there is a Borel probability measure $\mu$ such that $\int f\, d\mu = \mu^*(f)$; for $f = g \circ \pi$ this means that
\[
\int g\, d\pi\mu = \int g \circ \pi\, d\mu = \mu^*(g \circ \pi) = \mu_0^*(g \circ \pi) = \nu^*(g) = \int g\, d\nu
\]

so $\mu$ is the desired measure.
