Hochman - Lectures On Fractal Geometry
Michael Hochman∗
November 6, 2023
∗ ©2023. This is a draft! Send comments to [email protected]
Contents
1 Introduction
1.1 What is fractal geometry?
1.2 What is this course about?
1.3 Prerequisites, conventions and notation
2 Dimension
2.1 A family of examples: Middle-α Cantor sets
2.2 Minkowski dimension
2.3 Hausdorff dimension
2.4 Trees and partitions
2.5 Examples
4 Product sets
5 Differentiation of measures in $\mathbb{R}^d$
5.1 The Besicovitch covering theorem
5.2 Density and differentiation theorems
6 Pointwise dimension of measures
6.1 Dimension of a measure at a point
6.2 Upper and lower dimension of measures
7 Hausdorff measures
7.1 Hausdorff measure
7.2 Properties of Hausdorff measures
11 Entropy
11.1 The entropy function
11.2 Conditional entropy
11.3 Commensurable partitions and geometric operations
11.4 Entropy and dimension
11.5 Entropy of self-similar measures
13 Additive combinatorics
13.1 Sumsets and inverse theorems
13.2 Trivial bounds
13.3 Small doubling and Freiman’s theorem
13.4 Power growth, the “fractal” regime
13.5 Convolution
13.6 Entropy growth under convolution
13.7 Application to self-similar measures
13.8 The Kaimanovich-Vershik lemma
14 Appendix
14.1 Integration of measures
14.2 The weak-* topology
14.3 Lifting measures
1 Introduction
1.1 What is fractal geometry?
Fractal geometry and its sibling, geometric measure theory, are branches of analysis which study the structure of “irregular” sets and measures in metric spaces, primarily $\mathbb{R}^d$. The distinction between regular and irregular sets is not a precise one but informally, regular sets might be understood as smooth sub-manifolds of $\mathbb{R}^k$, or perhaps Lipschitz graphs, or countable unions of the above; whereas irregular sets include just about everything else, from the middle-$\frac{1}{3}$ Cantor set (still highly structured) to arbitrary Cantor sets (irregular, but topologically the same) to truly arbitrary subsets of $\mathbb{R}^d$.
For concreteness, let us compare smooth sub-manifolds and Cantor subsets of $\mathbb{R}^d$.
These two classes differ in many aspects besides the obvious topological one. Manifolds
possess many smooth symmetries; they carry a natural measure (the volume) which has
good analytic properties; and in most natural examples, we have a good understanding
of their intersections with hyperplanes or with each other, and of their images under
linear or smooth maps. On the other hand, Cantor sets typically have few or no smooth
symmetries; they may not carry a “natural” measure, and even if they do, its analytical
properties are likely to be bad; and even for very simple and concrete examples we do
not completely understand their intersections with hyperplanes, or their images under
linear maps.
The motivation to study the structure of irregular sets, besides the obvious theoretical one, is that many sets arising in analysis, number theory, dynamics and many other mathematical fields are irregular to one degree or another, and the metric and geometric properties of these objects often provide meaningful information about the context in which they arose. At the simplest level, the theories of dimension provide a means to compare the size of sets which coarser notions fail to distinguish. Thus the set of badly approximable numbers $x \in \mathbb{R}$ (those with bounded partial quotients) and the set of Liouvillian numbers both have Lebesgue measure 0, but the set of badly approximable numbers has Hausdorff dimension 1, hence it is relatively large, whereas the Liouvillian numbers form a set of Hausdorff dimension 0, and so are “rare”. Going deeper, however, it turns out that many problems in dynamics and number theory can be formulated in terms of bounds on the dimension of the intersection of certain very simple Cantor sets with lines, or of linear images of products of Cantor sets. Another connection to dynamics arises from the fact that there is often an intimate relation between the dimension of an invariant set or measure and its entropy (topological or measure-theoretic). Geometric properties may allow us to single out physically significant invariant measures among the many invariant measures of a system. Finer information carried by an invariant measure may actually encode the dynamics which generated it, leading to rigidity results. The list goes on.
Our goal in this course is primarily to develop the foundations of geometric measure
theory, and we cover in detail a variety of classical subjects. A second goal is to present
recent advances in the theory of self-similar sets and measures, and the connection
with additive combinatorics. We also hope to present applications and interactions
with dynamics and metric number theory, and we shall accomplish this mainly by our
choices of methods, examples, and open problems which we discuss.
We assume familiarity with the basic results on metric spaces, measure theory and Lebesgue integration. We work in $\mathbb{R}^d$ or sometimes a complete metric space, and denote by $B_r(x)$ the closed ball of radius $r$ around $x$:
$$B_r(x) = \{y : d(x,y) \le r\}$$
The open ball is denoted $B_r^\circ(x)$; as our considerations are rarely topological, it will appear less often. We denote the indicator function of a set $A$ by $1_A$.
All sets and functions we encounter will be Borel measurable, unless otherwise
stated. Also, all measures are Radon measures unless otherwise stated: recall that
µ is Radon if it is a Borel measure taking finite values on compact sets. Such measures
are regular, i.e.
metric space, then the support of $\mu$ is the smallest closed set of full measure, i.e.
$$\operatorname{supp}\mu = \bigcap\{C \subseteq \mathbb{R}^d \mid C \text{ closed},\ \mu(\mathbb{R}^d \setminus C) = 0\} = \mathbb{R}^d \setminus \bigcup\{U \subseteq \mathbb{R}^d \mid U \text{ open},\ \mu(U) = 0\}$$
In the second representation we may take the union over balls with rational radii and centers in a countable dense set; the union then is a countable union of nullsets, and we conclude that $\mu(\mathbb{R}^d \setminus \operatorname{supp}\mu) = 0$. Thus, $\operatorname{supp}\mu$ is a set of full measure, and any open set intersecting it has positive measure by definition. In particular if $x \in \operatorname{supp}\mu$ then $\mu(B_r(x)) > 0$ for all $r > 0$.
We use standard Big-O and little-o notation. Thus $O(f(t))$ denotes a quantity bounded by $C \cdot f(t)$ for some $C > 0$ and for all sufficiently large or small $t$ (depending on the context), and $o(f(t))$ denotes a quantity which, for every $c > 0$, is bounded by $c \cdot f(t)$ for all sufficiently large or small $t$. For example, if $g(t) = t + o(t)$ as $t \to 0$ then $g(t)/t \to 1$.
2 Dimension
In much of mathematics, the dimension of a set describes, roughly speaking, the number
of degrees of freedom one needs to parametrize the set. This is the case in linear algebra
and also in the theory of smooth manifolds.
In the theory of metric spaces, however, one generally does not have a natural
notion of parametrizations. Nevertheless one would like to have a number describing
the “size” of a set in a metric space. It turns out that one can define reasonable notions of dimension in this more general setting which capture the intuitive meaning of dimension and coincide with the more classical ones in the cases mentioned above. Nearly all of these notions measure how many balls one needs to cover the set at different scales, and often, with the right combinatorial or probabilistic interpretation, they do in fact describe the number of degrees of freedom one has.
In this course we focus on the two main notions of dimension, the Minkowski (box) dimension and the Hausdorff dimension. We give the definitions in general for metric spaces, but most of our applications and some of the results in these sections will already be specific to $\mathbb{R}^d$.
2.1 A family of examples: Middle-α Cantor sets
Before discussing dimension, we present one of the simplest families of “fractal” sets,
which will serve to demonstrate the definitions that follow.
Let $0 < \alpha < 1$. The middle-$\alpha$ Cantor set $C_\alpha \subseteq [0,1]$ is defined by a recursive procedure: for $n = 0, 1, 2, \ldots$ we construct a set $C_{\alpha,n}$ which is a union of $2^n$ closed intervals $I_{i_1\ldots i_n}$, indexed by sequences $i = i_1 \ldots i_n \in \{0,1\}^n$, each of length $((1-\alpha)/2)^n$.
To begin let $C_{\alpha,0} = [0,1]$ and $I_\emptyset = [0,1]$ (indexed by the unique empty sequence).
Assume that $C_{\alpha,n}$ has been defined and is the disjoint union of the $2^n$ closed intervals $I_{i_1\ldots i_n}$, $i_1 \ldots i_n \in \{0,1\}^n$. From each one of the intervals $I_{i_1\ldots i_n}$, remove the open subinterval with the same center as $I_{i_1\ldots i_n}$ and $\alpha$ times its length, leaving two closed sub-intervals, one on the left, which we denote $I_{i_1\ldots i_n 0}$, and one on the right, which we denote $I_{i_1\ldots i_n 1}$. We have thus defined $I_{j_1\ldots j_{n+1}}$ for all $j_1 \ldots j_{n+1} \in \{0,1\}^{n+1}$, and we define
$$C_{\alpha,n+1} = \bigcup_{i \in \{0,1\}^{n+1}} I_i$$
and
$$C_\alpha = \bigcap_{n=0}^\infty C_{\alpha,n}$$
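The recursive construction above is easy to carry out by machine. The following Python sketch is illustrative and not part of the notes: the function name `cantor_intervals` is our own, and exact rational arithmetic (`fractions.Fraction`) is used so that for rational $\alpha$ the endpoints of $C_{\alpha,n}$ are computed without rounding.

```python
from fractions import Fraction


def cantor_intervals(alpha, n):
    """Return the 2**n closed intervals (a, b) of C_{alpha,n}, left to right.

    alpha should be a Fraction with 0 < alpha < 1.  At each step the open
    middle piece of length alpha * (b - a) is removed from every interval.
    """
    intervals = [(Fraction(0), Fraction(1))]  # C_{alpha,0} = [0, 1]
    for _ in range(n):
        refined = []
        for a, b in intervals:
            # Each surviving piece has length (1 - alpha)/2 times the parent's.
            side = (1 - alpha) * (b - a) / 2
            refined.append((a, a + side))   # left child,  index ...0
            refined.append((b - side, b))   # right child, index ...1
        intervals = refined
    return intervals
```

For instance, `cantor_intervals(Fraction(1, 3), 2)` returns the intervals $[0,\frac19]$, $[\frac29,\frac13]$, $[\frac23,\frac79]$, $[\frac89,1]$ of $C_{1/3,2}$, in left-to-right order.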
A cover of $A$ is a collection $\mathcal{E}$ of sets such that $A \subseteq \bigcup_{E \in \mathcal{E}} E$. A $\delta$-cover is a cover $\mathcal{E}$ such that $|E| \le \delta$ for all $E \in \mathcal{E}$.
The simplest notion of dimension measures how many sets are needed to cover a set
as the scale tends to zero.
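Definition 2.1. Let $(X,d)$ be a metric space. For a set $A$ and $\delta > 0$, let $N(A,\delta)$ denote the minimal size of a $\delta$-cover of $A$, i.e.
$$N(A,\delta) = \min\Big\{k : \exists A_1, \ldots, A_k \subseteq X \text{ such that } A \subseteq \bigcup_{i=1}^k A_i \text{ and } |A_i| \le \delta\Big\}$$
and set $N(A,\delta) = \infty$ if $A$ does not admit a finite $\delta$-cover. The Minkowski dimension of $A$ is
$$\dim_M(A) = \lim_{\delta \to 0} \frac{\log N(A,\delta)}{\log(1/\delta)}$$
provided the limit exists. We also define the upper and lower dimensions
$$\overline{\dim}_M(A) = \limsup_{\delta \to 0} \frac{\log N(A,\delta)}{\log(1/\delta)}, \qquad \underline{\dim}_M(A) = \liminf_{\delta \to 0} \frac{\log N(A,\delta)}{\log(1/\delta)}$$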
First properties
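1. The $\delta$-covering number $N(A,\delta)$ of $A$ is finite for all $\delta > 0$ if $A$ is compact; in a complete metric space it is finite for all $\delta > 0$ if and only if $\overline{A}$ is compact (i.e. $A$ is totally bounded). Even in this case the Minkowski dimension may be infinite.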
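2. Clearly
$$\underline{\dim}_M A \le \overline{\dim}_M A$$
3. $\dim_M A = \alpha$ if and only if $\log N(A,\delta)/\log(1/\delta) \to \alpha$ as $\delta \to 0$, equivalently,
$$N(A,\delta) = \delta^{-\alpha+o(1)} \quad \text{as } \delta \to 0$$
and similarly for the upper and lower versions.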
Example 2.2.
1. A point has Minkowski dimension 0, since $N(\{x_0\},\delta) = 1$ for all $\delta$. More generally $N(\{x_1,\ldots,x_n\},\delta) \le n$, so finite sets have Minkowski dimension 0.
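$$\operatorname{Leb}(A) \le \sum_{i=1}^{N(A,\delta)} \operatorname{Leb}(A_i) \le \sum_{i=1}^{N(A,\delta)} c \cdot |A_i|^d \le c \cdot N(A,\delta) \cdot \delta^d$$
Writing $\alpha = \underline{\dim}_M A$, there are arbitrarily small $\delta > 0$ such that $N(A,\delta) \le \delta^{-\alpha+o(1)}$. We have thus shown that $\operatorname{Leb}(A) \le c \cdot \delta^{d-\alpha+o(1)}$ for $\delta$ arbitrarily close to 0, and since $d - \alpha > 0$ this implies $\operatorname{Leb}(A) = 0$.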
To get an upper bound, notice that for $\delta_n = ((1-\alpha)/2)^n$ the sets $C_{\alpha,n}$ are covers of $C_\alpha$ by $2^n$ intervals of length $\delta_n$, hence $N(C_\alpha, \delta_n) \le 2^n$.
On the other hand every set of diameter $\le \delta$ can intersect at most three maximal intervals in $C_{\alpha,n+1}$, hence
$$N(C_\alpha, \delta) \ge \frac{1}{3} \cdot 2^n \ge 2^{n-2}$$
so for $\delta_{n+1} \le \delta < \delta_n$
Remark 2.3. In the last example we analyzed $\dim_M A$ by examining $N(A,\varepsilon_k)$ for a certain sequence $\varepsilon_k \to 0$ (specifically $\varepsilon_k = \rho^k$ for $\rho = (1-\alpha)/2$). The fact that this gives the right dimension is not a coincidence, and we can formulate it in general as follows.
First note that from the definition, if $\delta < \delta'$ then $N(A,\delta) \ge N(A,\delta')$. Now let $\varepsilon_k \searrow 0$ and suppose $\varepsilon_k/\varepsilon_{k+1} \le C < \infty$. For every $\delta > 0$ there is a $k = k(\delta)$ such that $\varepsilon_{k+1} < \delta \le \varepsilon_k$. This implies
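1. $\dim_M \overline{A} = \dim_M A$
Proof. By inclusion $\dim_M A \le \dim_M \overline{A}$, so for the first claim we can assume that $\dim_M A < \infty$. Then $N(\overline{A},\varepsilon) = N(A,\varepsilon)$ for every $\varepsilon > 0$, because in general if $A \subseteq \bigcup_{i=1}^n A_i$ then $\overline{A} \subseteq \bigcup_{i=1}^n \overline{A_i}$, and if $\{A_i\}$ is a $\delta$-cover then so is $\{\overline{A_i}\}$ (taking closures does not increase diameters). This implies the claim.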
For the second claim, note that if $A \subseteq \bigcup A_i$ for $A_i \subseteq X$ then $A \subseteq \bigcup (A_i \cap A)$ and $|A_i \cap A| \le |A_i|$, so $N(A,\varepsilon)$ is unchanged if we consider only covers by subsets of $A$. In particular the Minkowski dimension does not change if we restrict to the metric space $(A, d|_{A \times A})$.
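Finally if $A \subseteq \bigcup A_i$ then $f(A) \subseteq \bigcup f(A_i)$, and if $c$ is the Lipschitz constant of $f$ then $|f(E)| \le c|E|$. Thus $N(fA, c\varepsilon) \le N(A,\varepsilon)$ and the claim follows, since
$$\dim_M fA = \lim_{\varepsilon\to0} \frac{\log N(fA,\varepsilon)}{\log(1/\varepsilon)} \le \lim_{\varepsilon\to0} \frac{\log N(A,\varepsilon/c)}{\log(1/\varepsilon)} = \lim_{\varepsilon\to0} \frac{\log N(A,\varepsilon/c)}{\log(1/\varepsilon) + \log c} = \lim_{\varepsilon\to0} \frac{\log N(A,\varepsilon/c)}{\log(c/\varepsilon)} = \dim_M A$$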
The example of the middle-$\alpha$ Cantor sets demonstrates that Minkowski dimension is not a topological notion, since the sets $C_\alpha$ all have different dimensions, but for $0 < \alpha < 1$ they are all topologically a Cantor set and therefore homeomorphic. On the other hand the last part of the proposition shows that dimension is an invariant in the bi-Lipschitz category. Thus,
Corollary 2.5. For $0 < \alpha < \beta < 1$, the sets $C_\alpha, C_\beta$ are not bi-Lipschitz equivalent, i.e. there is no bi-Lipschitz bijection $f : C_\alpha \to C_\beta$; in particular they are not $C^1$-diffeomorphic.
Finally, let us discuss the role of the metric $d$. One often defines two metrics on the same space to be equivalent if they define the same topology, i.e., the same notion of convergence. This equivalence, however, may change dimension radically (we shall see examples later).
Nevertheless, in $\mathbb{R}^d$ every two norms $\|\cdot\|$ and $\|\cdot\|'$ not only define equivalent metrics, but satisfy the stronger property that $C^{-1}\|v\|' \le \|v\| \le C\|v\|'$ for some constant $C$. It follows that the identity map from $(\mathbb{R}^d, \|\cdot\|)$ to $(\mathbb{R}^d, \|\cdot\|')$ is bi-Lipschitz. We conclude that
Lemma 2.6. If $A \subseteq \mathbb{R}^d$ then every choice of norm on $\mathbb{R}^d$ gives the same values of $\dim_M A$ (if it exists) and of $\overline{\dim}_M A$, $\underline{\dim}_M A$.
Exercises
4. Let $\widetilde{N}(A,\delta)$ denote the size of the smallest cover of $A$ by balls of radius $\delta$ centered in $A$. Show that
$$\lim_{\delta \to 0} \frac{\log \widetilde{N}(A,\delta)}{\log(1/\delta)}$$
exists if and only if $\dim_M A$ exists and in that case the limit and the dimension are equal.
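5. Let $\varepsilon_k \searrow 0$ and suppose $\varepsilon_k/\varepsilon_{k+1} \le C$ for some $C \in \mathbb{R}$. Show that
$$\lim_{k\to\infty} \frac{\log N(A,\varepsilon_k)}{\log(1/\varepsilon_k)}$$
exists if and only if $\dim_M A$ exists and in that case the limit and the dimension are equal.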
6. (a) Give an example of $\varepsilon_k \searrow 0$ for which the conclusion of the previous exercise fails.
(b) Does it always fail when $\sup\{\varepsilon_k/\varepsilon_{k+1} : k \in \mathbb{N}\} = \infty$?
7. Show that if f : X → Y is a non-Lipschitz map between metric spaces and
A ⊆ X then it may happen that dimM f (A) > dimM A.
8. Let f : X → Y be an α-Hölder map between metric spaces, i.e. there is a
constant C > 0 such that d(f (x), f (x′ )) ≤ C · d(x, x′ )α . For A ⊆ X, give a
bound for dimM f (A) in terms of α and dimM A.
Use the sets Cα to show that the bound is tight.
One can also find examples which are closed, for instance
$$A = \{0\} \cup \Big\{\frac{1}{n} : n \in \mathbb{N}\Big\}$$
Indeed, in order to cover $A$ with balls of radius $\varepsilon$, we will need precisely one ball for each point $1/k$ such that $|1/k - 1/(k+1)| > 2\varepsilon$. This is equivalent to $1/(k(k+1)) > 2\varepsilon$, that is, up to a bounded error in $k$, to $k < 1/\sqrt{2\varepsilon}$. On the other hand all other points of $A$ lie in the interval $[0, \sqrt{2\varepsilon}]$, which can be covered by $O(1/\sqrt{2\varepsilon})$ $\varepsilon$-balls. Thus $N(A,\varepsilon) \approx 1/\sqrt{2\varepsilon}$, so $\dim_M A = 1/2$.
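A rough numerical check of this example is sketched below in Python (our own illustration; the function name is hypothetical). Instead of optimal covers by $\varepsilon$-balls it counts the dyadic intervals $[j2^{-k},(j+1)2^{-k})$ that meet $A$. Such grid counts differ from $N(A, 2^{-k})$ by at most a constant factor (compare Lemma 2.14 below), which does not affect the limit.

```python
import math


def dyadic_box_count(k):
    """Count the dyadic intervals [j/2**k, (j+1)/2**k) that meet
    A = {0} ∪ {1/n : n >= 1}.

    The point 1/n lies in the interval with index floor(2**k / n), computed
    exactly with integer division.  For n > 2**k that index is 0, i.e. the
    interval already containing the point 0, so n <= 2**k suffices.
    """
    scale = 2 ** k
    indices = {scale // n for n in range(1, scale + 1)}
    indices.add(0)  # the interval containing the point 0
    return len(indices)


for k in (10, 16, 20):
    count = dyadic_box_count(k)
    # log N / log(1/eps) with eps = 2**-k
    print(k, count, math.log2(count) / k)
```

For even $k$ the count is exactly $2^{k/2+1}$, so the printed ratio is $1/2 + 1/k$: it converges to $\dim_M A = 1/2$, but slowly, which is typical of numerical box counting.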
These examples, being countable, also demonstrate that Minkowski dimension behaves badly under countable unions: letting $A_n = \{1, 1/2, \ldots, 1/n\} \cup \{0\}$, we see that $A_1 \subseteq A_2 \subseteq \ldots$ but
$$\dim_M A_n = 0 \not\to 1/2 = \dim_M \bigcup_{n=1}^\infty A_n$$
Since every set of diameter t is contained in a ball of diameter 2t, one may consider
general covers on the right hand side.
Now we pretend that there is a notion of $\alpha$-dimensional volume. The “volume” of a ball $B$ would be of order $|B|^\alpha$, and we can define when a set is small with respect to this “volume”:
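Definition 2.7. Let $(X,d)$ be a metric space and $A \subseteq X$. The $\alpha$-dimensional Hausdorff content $H^\alpha_\infty$ is
$$H^\alpha_\infty(A) = \inf\Big\{\sum_{E \in \mathcal{E}} |E|^\alpha : \mathcal{E} \text{ is a countable cover of } A\Big\}$$
Note that $H^\alpha_\infty(A) \le |A|^\alpha$, so $H^\alpha_\infty(A) < \infty$ when $A$ is bounded. For unbounded sets $H^\alpha_\infty$ may be finite or infinite.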
One can do more than define $\alpha$-null sets: a modification of $H^\alpha_\infty$ leads to an “$\alpha$-dimensional” measure on Borel sets in much the same way that the infimum in (1) defines Lebesgue measure ($H^\alpha_\infty$ itself is not a measure when $0 < \alpha < d$, since for example on the line we have $H^\alpha_\infty([0,1)) + H^\alpha_\infty([1,2)) \ne H^\alpha_\infty([0,2))$ for $\alpha < 1$). These measures, called Hausdorff measures, will be discussed in Section 7.1, at which point the reason for the “$\infty$” in the notation will be explained. At this point the notion of $\alpha$-null sets is sufficient for our needs.
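Lemma 2.8. If $H^\alpha_\infty(A) = 0$ then $H^\beta_\infty(A) = 0$ for $\beta > \alpha$.
Proof. Let $0 < \varepsilon < 1$. Then there is a cover $\{A_i\}$ of $A$ with $\sum |A_i|^\alpha < \varepsilon$. Since $\varepsilon < 1$, we know $|A_i| \le 1$ for all $i$. Hence
$$\sum |A_i|^\beta = \sum |A_i|^\alpha |A_i|^{\beta-\alpha} \le \sum |A_i|^\alpha < \varepsilon$$
Since $\varepsilon$ was arbitrary, $H^\beta_\infty(A) = 0$.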
Consequently, for any $A \ne \emptyset$ there is a unique $\alpha_0$ such that $H^\alpha_\infty(A) = 0$ for $\alpha > \alpha_0$ and $H^\alpha_\infty(A) > 0$ for $0 \le \alpha < \alpha_0$ (the value at $\alpha = \alpha_0$ can be 0, positive or $\infty$).
1. $A \subseteq B \implies \dim A \le \dim B$.
3. $\dim A \le \overline{\dim}_M A$.
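3. Let $\beta > \alpha > \overline{\dim}_M A$. For sufficiently small $\delta > 0$, there is an $N < \delta^{-\alpha}$ and a cover $A \subseteq \bigcup_{i=1}^N A_i$ with $\operatorname{diam} A_i \le \delta$. Hence
$$\sum_{i=1}^N (\operatorname{diam} A_i)^\beta \le \sum_{i=1}^N \delta^\beta \le \delta^{-\alpha}\delta^\beta = \delta^{\beta-\alpha}$$
Since $\delta$ can be taken arbitrarily close to 0, we have $H^\beta_\infty(A) = 0$. Since $\beta > \overline{\dim}_M A$ was arbitrary (for any such $\beta$ we can find a suitable $\alpha$), $\dim A \le \overline{\dim}_M A$.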
4. This is clear since if $A \subseteq \bigcup A_i$ then $A \subseteq \bigcup (A_i \cap A)$ and $|A_i \cap A| \le |A_i|$. Hence the infimum in the definition of $H^\alpha_\infty$ is unchanged if we consider only covers by subsets of $A$.
5. If $c$ is the Lipschitz constant of $f$ then $|f(E)| \le c|E|$. Thus if $A \subseteq \bigcup A_i$ then $f(A) \subseteq \bigcup f(A_i)$ and $\sum |f(A_i)|^\alpha \le c^\alpha \sum |A_i|^\alpha$. Thus $H^\alpha_\infty(f(A)) \le c^\alpha H^\alpha_\infty(A)$ and the claim follows.
Lemma 2.11. Let E be a family of subsets of X and suppose that there is a constant
C such that every bounded set A ⊆ X can be covered by ≤ C elements of E, each of
diameter ≤ C|A|. Then for every set A ⊆ X and every α > 0,
Proof. The left inequality in (2) is immediate from the definition, since the infimum in the definition of $H^\alpha_\infty(A, \mathcal{E})$ is over fewer covers than in the definition of $H^\alpha_\infty(A)$. On the other hand if $\mathcal{F}$ is a cover of $A$ then we can cover each $F \in \mathcal{F}$ by $\le C$ sets $E \in \mathcal{E}$ with $|E| \le C|F|$. Taking the collection $\mathcal{F}' \subseteq \mathcal{E}$ of these sets we have $\sum_{F \in \mathcal{F}'} |F|^\alpha \le C^{1+\alpha} \sum_{F \in \mathcal{F}} |F|^\alpha$, giving the other inequality. The other conclusions are immediate.
In particular, the family of open balls, and the family of closed balls, both satisfy
the hypothesis, and we shall freely use them in our arguments.
2. Any $A \subseteq \mathbb{R}^d$ has $\dim A \le d$. It suffices to prove this for bounded $A$ since we can write $A = \bigcup_{D \in \mathcal{D}_1} A \cap D$, and by countable stability it is enough to deal with each $A \cap D$ separately. For bounded $A$, let $A \subseteq [-r,r]^d$ for some $r$. Then
3. $[0,1]^d$ has dimension at least $d$, and more generally any set in $\mathbb{R}^d$ of positive Lebesgue measure has dimension at least $d$. This follows since $H^d_\infty(A) = 0$ if and only if $\operatorname{Leb}(A) = 0$.
4. Combining the last two examples, any set in Rd of positive Lebesgue measure has
dimension d.
5. A set $A \subseteq \mathbb{R}^d$ can have dimension $d$ even when its Lebesgue measure is 0. Indeed, we shall later show that $C_\alpha$ has the same Hausdorff and Minkowski dimensions. Let $A = \bigcup_{n \ge 2} C_{1/n} \subseteq \mathbb{R}$. Then $\dim A \le 1$ because $A \subseteq [0,1]$, but $\dim A \ge \sup_n \dim C_{1/n} = 1$. Hence $\dim A = 1$. On the other hand $\operatorname{Leb}(C_{1/n}) = 0$ for all $n$, so $\operatorname{Leb}(A) = 0$.
7. A real number $x$ is Liouvillian if for every $n$ there are arbitrarily large integers $p, q$ such that
$$\Big|x - \frac{p}{q}\Big| < \frac{1}{|q|^n}$$
These numbers are extremely well approximable by rationals and have various interesting properties; for example, irrational Liouville numbers are transcendental.
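Let $L \subseteq [0,1]$ denote the set of Liouville numbers. We claim that $\dim L = 0$. Let
$$L_n = \Big\{x \in [0,1] : \Big|x - \frac{p}{q}\Big| < \frac{1}{q^n} \text{ for arbitrarily large } q \text{ and } p\Big\}$$
and
$$L_{n,k} = \Big\{x \in [0,1] : \Big|x - \frac{p}{q}\Big| < \frac{1}{q^n} \text{ for some } q > k \text{ and some } 0 \le p \le q\Big\}$$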
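Evidently $L_n = \bigcap_{k \in \mathbb{N}} L_{n,k}$ and $L_n \subseteq L_{n,k}$ for all $k$. Since $L \subseteq L_n$ for every $n$, it suffices, given $\alpha > 0$ and $n > 2/\alpha$, to prove that $H^\alpha_\infty(L_{n,k}) \to 0$ as $k \to \infty$.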
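For $k$ fixed and $q > k$, the set
$$L_{n,k,q} = \Big\{x \in [0,1] : \Big|x - \frac{p}{q}\Big| < \frac{1}{q^n} \text{ for some } 0 \le p \le q\Big\}$$
is covered by the $q+1$ intervals of length $2q^{-n}$ centered at the points $p/q$, $0 \le p \le q$, and $L_{n,k} = \bigcup_{q>k} L_{n,k,q}$. Since $n > 2/\alpha$ there is an $\varepsilon > 0$ such that $\alpha > (1+\varepsilon)2/n$, hence
$$H^\alpha_\infty(L_{n,k}) \le \sum_{q>k} (q+1)(2q^{-n})^\alpha \le \sum_{q>k} 2^\alpha (q+1) q^{-2-\varepsilon} = \sum_{q>k} O(q^{-1-\varepsilon})$$
which tends to 0 as $k \to \infty$, being the tail of a convergent series.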
Exercises
1. Show that $H^\alpha_\infty([0,1)) + H^\alpha_\infty([1,2)) \ne H^\alpha_\infty([0,2))$ for $0 < \alpha < 1$.
2. Is it true that $\dim \overline{A} = \dim A$ for every $A \subseteq \mathbb{R}$?
3. Show that if $A_1 \subseteq A_2 \subseteq \ldots$ then $\dim A_i \to \dim(\bigcup A_i)$ as $i \to \infty$. Show that the analogous statement for decreasing chains and intersections is false.
4. Let A ⊆ R2 be the graph of a differentiable function f : [a, b] → R for some
a < b. Show that dim A = 1.
5. Show that if in the definition of $H^\alpha_\infty$ we allow only balls of radius $2^{-n}$, $n \in \mathbb{N}$,
and define dimension in terms of this new quantity, then we obtain Hausdorff
dimension again.
A useful and powerful tool in fractal geometry is to model metric spaces using trees.
This idea, which takes many forms, not only provides a convenient heuristic but also,
when formalized, strong analytical tools. In this section we consider the simplest case,
but we shall return to similar ideas often.
Consider the interval [0, 1]. We can identify it with the space of infinite binary
sequences using the binary expansion: Let
This is not a bijection because rationals of the form $k/2^n$ have two binary expansions, one ending in a constant string of 0s and the other in 1s. However this rarely poses a problem, as we shall see.
The set $\{0,1\}^{\mathbb{N}}$ can be viewed as the space of maximal infinite paths in the full binary tree. If we write $\{0,1\}^*$ for the set of finite binary sequences (including the empty sequence $\emptyset$), then the elements of $\{0,1\}^*$ form the nodes of a tree, with edges between each word $w = w_1 \ldots w_n \in \{0,1\}^*$ and its extensions $w_1 \ldots w_n 0$ and $w_1 \ldots w_n 1$. An infinite sequence $w = w_1 w_2 w_3 \ldots$ corresponds uniquely to the infinite path $(w|_n)_{n=0}^\infty$ starting at the root, where $w|_n = w_1 \ldots w_n$ is the initial segment of length $n$ of $w$.
Each vertex $w \in \{0,1\}^*$ defines a cylinder set, denoted $[w]$, consisting of all paths from the root passing through $w$:
Then
$$\pi[w_1 \ldots w_n] = \{x \in [0,1] : x \text{ has a binary expansion starting with } w_1 w_2 \ldots w_n\}$$
This is the closed interval of length $2^{-n}$ whose left endpoint is $k/2^n$, where $k = \sum_{i=1}^n w_i 2^{n-i}$ is the integer with binary expansion $w_1 w_2 \ldots w_n$.
forms a partition of $\{0,1\}^{\mathbb{N}}$ into $2^n$ sets, but the intervals $\pi[w]$ corresponding to $[w]$ in $[0,1]$ do not form a partition, because adjacent pairs intersect at their endpoints. Nevertheless, for many purposes, one wants a comparable partition of $[0,1)$ or of $\mathbb{R}$. We therefore introduce for each $n$ the partition $\mathcal{D}_{2^n}$ of $\mathbb{R}$ into half-open intervals,
$$\mathcal{D}_{2^n} = \Big\{\Big[\frac{k}{2^n}, \frac{k+1}{2^n}\Big) : k \in \mathbb{Z}\Big\}$$
Also note that in both cases $\pi^{-1}(I)$ can be covered by at most two elements of $\mathcal{C}_n$.
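To make the correspondence $w \mapsto \pi[w]$ concrete, here is a small Python sketch (the helper name `cylinder_interval` is hypothetical, not from the notes). It computes the left endpoint $k/2^n$ with $k = \sum_{i=1}^n w_i 2^{n-i}$ via Horner's rule, using exact fractions.

```python
from fractions import Fraction


def cylinder_interval(word):
    """Return the endpoints (left, right) of the closed interval pi[w]
    for a finite binary word w = (w_1, ..., w_n)."""
    n = len(word)
    k = 0
    for bit in word:
        if bit not in (0, 1):
            raise ValueError("word must consist of 0s and 1s")
        k = 2 * k + bit  # Horner's rule: k = sum_i w_i * 2**(n - i)
    left = Fraction(k, 2 ** n)
    return left, left + Fraction(1, 2 ** n)
```

For example, `cylinder_interval((0, 1))` and `cylinder_interval((1, 0))` are $[\frac14,\frac12]$ and $[\frac12,\frac34]$, which share the endpoint $\frac12$; this is exactly why the closed intervals $\pi[w]$ do not form a partition, while the half-open intervals of $\mathcal{D}_{2^n}$ do.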
We return now to dimension. Note that if we wish to cover a set by the elements of a partition, we have only one way to do it, namely, to take the partition elements that intersect the set non-trivially. This makes such covers easier to work with than covers by balls, where there may be many choices. For a partition $\mathcal{E}$ we therefore write
$$N(A, \mathcal{E}) = \#\{E \in \mathcal{E} : E \cap A \ne \emptyset\}$$
The following lemma is the reason that Minkowski dimension is sometimes called
box dimension.
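Lemma 2.14. 1. For $A \subseteq [0,1]$,
provided one side (equivalently the other side) exists, and similarly for $\overline{\dim}_M$ and $\underline{\dim}_M$.
2. For $E \subseteq \{0,1\}^{\mathbb{N}}$,
$$\dim_M \pi E = \lim_{n\to\infty} \frac{\log N(E, \mathcal{C}_n)}{n \log 2}$$
provided one side (equivalently the other side) exists, and similarly for $\overline{\dim}_M$ and $\underline{\dim}_M$.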
Proof. Since every $D \in \mathcal{D}_{2^n}$ satisfies $|D| = 2^{-n}$, and since every set $B$ with $|B| \le 2^{-n}$ can be covered by at most 2 intervals $I \in \mathcal{D}_{2^n}$, we find that
Upon dividing by $n \log 2$ (and interpolating for scales between $2^{-n}$ and $2^{-n-1}$), this proves the first equality.
For the second equality, note that for $I \in \mathcal{D}_{2^n}$, the set $\pi^{-1} I$ is covered by either one or two generation-$n$ cylinder sets. Thus, for $A \subseteq [0,1)$,
The analogous statement for Hausdorff dimension follows from Lemma 2.11. The
proof is left to the reader.
Finally, everything we have done here can be generalized to Rd and to expansions
in other bases.
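Definition 2.15. Let $b \ge 2$ be an integer. The partition of $\mathbb{R}$ into $b$-adic intervals is
$$\mathcal{D}_b = \Big\{\Big[\frac{k}{b}, \frac{k+1}{b}\Big) : k \in \mathbb{Z}\Big\}$$
and
$$\mathcal{D}_b^d = \{I_1 \times \ldots \times I_d : I_i \in \mathcal{D}_b\}$$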
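With these partitions, covering numbers become bookkeeping: $N(A, \mathcal{D}_{b^n})$ is the number of cells $[k/b^n,(k+1)/b^n)$ meeting $A$. The hedged Python sketch below (hypothetical helper names, not from the notes) applies this to the finite set of left endpoints of the intervals of $C_{1/3,8}$ in base $b = 3$. That finite set is only a stand-in for the Cantor set itself, chosen so the count can be done exactly with fractions.

```python
import math
from fractions import Fraction
from itertools import product


def count_b_adic_cells(points, b, n):
    """N(A, D_{b^n}) for a finite set A of rationals: the number of
    intervals [k/b**n, (k+1)/b**n) containing at least one point of A."""
    scale = b ** n
    return len({math.floor(Fraction(x) * scale) for x in points})


def cantor_left_endpoints(n):
    """Left endpoints sum_{i=1}^n 2*a_i*3**(-i), a_i in {0, 1}, of the 2**n
    intervals of the middle-1/3 construction at level n."""
    return [
        sum(Fraction(2 * a, 3 ** (i + 1)) for i, a in enumerate(digits))
        for digits in product((0, 1), repeat=n)
    ]


points = cantor_left_endpoints(8)
for m in range(1, 9):
    # With b = 3 the count at generation m is 2**m, so the ratio is log 2 / log 3.
    cells = count_b_adic_cells(points, 3, m)
    print(m, cells, math.log(cells) / (m * math.log(3)))
```

The cells at generation $m$ are distinguished by the first $m$ ternary digits (all 0 or 2), so exactly $2^m$ cells are hit, and the normalized count equals $\log 2/\log 3$ at every generation.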
2.5 Examples
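$$\overline{d}(E) = \limsup_{n\to\infty} \frac{1}{n}|E \cap \{1,\ldots,n\}|, \qquad \underline{d}(E) = \liminf_{n\to\infty} \frac{1}{n}|E \cap \{1,\ldots,n\}|$$
$$\Omega_E = \{\omega \in \{0,1\}^{\mathbb{N}} : \omega_n = 0 \text{ for all } n \in E\}$$
and
Then
$$\overline{\dim}_M X_E = \overline{d}(\mathbb{N} \setminus E), \qquad \underline{\dim}_M X_E = \underline{d}(\mathbb{N} \setminus E)$$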
2. One can produce sets $E \subseteq \mathbb{N}$ with $\underline{d}(E) < \overline{d}(E)$. This shows that the lower and upper Minkowski dimensions need not coincide. There are even sets with $\underline{d}(E) = 0$ and $\overline{d}(E) = 1$, so we can have $\underline{\dim}_M X = 0$ and $\overline{\dim}_M X = 1$.
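$$N(\Omega_E, \mathcal{C}_n) = 2^{|\{1,\ldots,n\} \setminus E|}$$
(this is just the number of binary sequences of length $n$ with 0's in the positions in $E$). Hence
$$\frac{\log N(\Omega_E, \mathcal{C}_n)}{n \log 2} = \frac{|\{1,\ldots,n\} \setminus E|}{n} = \frac{|\{1,\ldots,n\} \cap (\mathbb{N} \setminus E)|}{n}$$
and taking $\limsup$ or $\liminf$ gives the claim (by Lemma 2.14).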
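The counting identity $N(\Omega_E, \mathcal{C}_n) = 2^{|\{1,\ldots,n\} \setminus E|}$ can be confirmed by brute force for small $n$. The Python sketch below is illustrative only (hypothetical function names); it enumerates all $2^n$ words, so it is feasible only for small $n$. With $E$ the even numbers, the normalized count $\log N/(n\log 2)$ is $1/2$ for even $n$, matching $\overline{d}(\mathbb{N}\setminus E) = \underline{d}(\mathbb{N}\setminus E) = 1/2$.

```python
import math
from itertools import product


def count_cylinders(E, n):
    """Brute-force N(Omega_E, C_n): the number of binary words of length n
    with a 0 in every position of E (positions are numbered from 1)."""
    return sum(
        1
        for word in product((0, 1), repeat=n)
        if all(word[i - 1] == 0 for i in E if i <= n)
    )


def free_positions(E, n):
    """The size of {1, ..., n} minus E, i.e. the exponent in the formula."""
    return sum(1 for i in range(1, n + 1) if i not in E)


evens = set(range(2, 100, 2))
for n in (4, 8, 12):
    N = count_cylinders(evens, n)
    assert N == 2 ** free_positions(evens, n)
    print(n, N, math.log(N) / (n * math.log(2)))  # equals 1/2 for even n
```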
and
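Finally let
$$X = X_{\mathrm{even}} \cup X_{\mathrm{odd}}, \qquad \Omega = \Omega_{\mathrm{even}} \cup \Omega_{\mathrm{odd}}$$
Note that $\Omega$ does not contain sequences ending in all 1's, so in this example, $\pi : \Omega \to X$ is a bijection.
We claim that $\overline{\dim}_M X = 1$, $\underline{\dim}_M X = 1/2$ and $\dim X = 0$.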
We shall do all computations in the symbolic model.
First consider $N(\Omega_{\mathrm{even}}, \mathcal{C}_{N_{2k}})$. Since the symbols at coordinates $N_{2k-1}, \ldots, N_{2k}-1$ are not constrained, we see that
so
$$\frac{\log N(X, 2^{-N_{2k}})}{N_{2k} \log 2} \ge \frac{(2k)! - (2k-1)!}{(2k)!} = 1 - \frac{(2k-1)!}{(2k)!} = 1 - \frac{1}{2k} \to 1$$
Thus $\overline{\dim}_M X \ge 1$, and of course there is equality (since $X \subseteq [0,1]$).
Next, consider $N(\Omega, \mathcal{C}_{2N_{2k}})$. Clearly
Since points in $\Omega_{\mathrm{even}}$ have all coordinates from $N_{2k}+1$ to $2N_{2k}$ equal to zero, we have
On the other hand, points in $\Omega_{\mathrm{odd}}$ have all coordinates from $N_{2k-1}$ to $N_{2k}-1$ equal to 0, and there are no restrictions on coordinates from $N_{2k}$ to $2N_{2k}$, so we have
Thus
$$2^{N_{2k}} \le N(X, 2^{-2N_{2k}}) \le 2^{N_{2k}} + 2^{N_{2k}+N_{2k-1}} \le 2 \cdot 2^{N_{2k}+N_{2k-1}}$$
so
$$\frac{\log N(X, 2^{-2N_{2k}})}{2N_{2k} \log 2} \le \frac{1 + N_{2k} + N_{2k-1}}{2N_{2k}} \to \frac{1}{2}$$
Hence $\underline{\dim}_M X \le 1/2$. One can show that this is an equality, by considering scales between $N_\ell$ and $2N_\ell$ and separately between $2N_\ell$ and $N_{\ell+1}$, and noting that in both cases the relative number of levels of the tree at which nodes have two children goes down compared to the case analyzed above. We leave the details to the reader.
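Finally, for $\delta > 0$ and $k \in \mathbb{N}$ consider an optimal cover $\mathcal{E}_k \subseteq \mathcal{C}_{N_{2k}}$ of $\Omega_{\mathrm{odd}}$, and an optimal cover $\mathcal{F}_k \subseteq \mathcal{C}_{N_{2k+1}}$ of $X_{\mathrm{even}}$. Since
we conclude that
$$\sum_{I \in \mathcal{E}_k \cup \mathcal{F}_k} |I|^\delta = \sum_{I \in \mathcal{E}_k} |I|^\delta + \sum_{I \in \mathcal{F}_k} |I|^\delta$$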
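Exercises
1. Construct a set $E \subseteq \mathbb{N}$ with $\underline{d}(E) < \overline{d}(E)$, completing the proof that there exist sets with $\underline{\dim}_M A < \overline{\dim}_M A$.
2. Show that for any $0 \le \alpha < \beta \le 1$ there is a set $A \subseteq [0,1]$ with $\underline{\dim}_M A = \alpha$ and $\overline{\dim}_M A = \beta$.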
3 Using measures to compute dimension
The Minkowski dimension of a set is often straightforward to compute, and gives an upper bound on the Hausdorff dimension. Lower bounds on the Hausdorff dimension are trickier to come by. The main method to obtain them is to introduce an appropriate measure on the set. In this section we discuss some relations between the dimension of sets and the measures supported on them.
Proposition 3.2. Let µ be an α-regular measure and µ(A) > 0. Then dim A ≥ α.
Proof. We shall show that $H^\alpha_\infty(A) \ge C' \cdot \mu(A) > 0$, from which the result follows. Note that every bounded $E \subseteq X$ is contained in a ball of radius $|E|$, so $\mu(E) \le C \cdot |E|^\alpha$. Therefore, if $A \subseteq \bigcup_{i=1}^\infty A_i$ then
$$\sum |A_i|^\alpha \ge C^{-1} \sum \mu(A_i) \ge C^{-1} \mu(A)$$
$$\beta = \frac{\log 2}{\log(2/(1-\alpha))}$$
We already saw that $\overline{\dim}_M C_\alpha \le \beta$ so, since $\dim C_\alpha \le \overline{\dim}_M C_\alpha$, we have an upper bound of $\beta$ on $\dim C_\alpha$.
Let $\mu = \mu_\alpha$ on $C_\alpha$ denote the measure which, for every $n$, gives equal mass to each of the $2^n$ intervals in the set $C_{\alpha,n}$ introduced in the construction of $C_\alpha$. Let $\delta_n = ((1-\alpha)/2)^n$ be the length of these intervals. Then for every $x \in C_\alpha$, one sees that $B_{\delta_n}(x)$ contains one of these intervals and at most a part of one other interval, so
Using the fact that $B_{\delta_{n+1}}(x) \subseteq B_r(x) \subseteq B_{\delta_n}(x)$ whenever $\delta_{n+1} \le r < \delta_n$ for $x \in C_\alpha$, we have
$$\mu(B_r(x)) \le \mu(B_{\delta_n}(x)) \le C \cdot \delta_n^\beta \le C \cdot \Big(\frac{2}{1-\alpha}\Big)^\beta \cdot \delta_{n+1}^\beta \le C' r^\beta$$
Hence by the mass distribution principle, dim Cα ≥ β. Since this is the same as the
upper bound, we conclude dim Cα = β.
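The numbers in this example are easy to tabulate. The following Python sketch (hypothetical helper names, floating-point arithmetic) computes $\beta = \log 2/\log(2/(1-\alpha))$ and checks the identity $\mu(I) = 2^{-n} = \delta_n^\beta$ for the generation-$n$ intervals $I$, which is the computation behind the bound $\mu(B_{\delta_n}(x)) \le C\delta_n^\beta$. It also shows that $\beta$ is strictly decreasing in $\alpha$, in line with Corollary 2.5.

```python
import math


def cantor_dimension(alpha):
    """beta = log 2 / log(2 / (1 - alpha)) for the middle-alpha Cantor set."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie in (0, 1)")
    return math.log(2) / math.log(2 / (1 - alpha))


def check_interval_masses(alpha, n_max=30):
    """Check mu(I) = 2**-n = delta_n**beta for generations n = 0..n_max,
    where delta_n = ((1 - alpha)/2)**n is the length of each interval I."""
    beta = cantor_dimension(alpha)
    for n in range(n_max + 1):
        delta_n = ((1 - alpha) / 2) ** n
        if not math.isclose(delta_n ** beta, 2.0 ** -n, rel_tol=1e-9):
            return False
    return True


for alpha in (0.1, 1 / 3, 0.5, 0.9):
    print(alpha, cantor_dimension(alpha), check_interval_masses(alpha))
```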
Specializing to $\mathbb{R}^d$, the analogous results are true if we define regularity in terms of the mass of $b$-adic cubes rather than balls:
In Example 2.16 we saw that $\underline{\dim}_M X_E = \underline{d}(\mathbb{N} \setminus E) = \liminf \frac{1}{n}|\{1,\ldots,n\} \setminus E|$. We claim that this is also the Hausdorff dimension. Since $\dim X_E \le \underline{\dim}_M X_E = \underline{d}(\mathbb{N} \setminus E)$, we need to show the lower bound.
We may assume $\mathbb{N} \setminus E$ is infinite, since if not then $X_E$ is finite and the claim is trivial. Let $\xi_n$ be independent random variables where $\xi_n \equiv 0$ if $n \in E$ and $\xi_n \in \{0,1\}$ with equal probabilities if $n \in \mathbb{N} \setminus E$. The random real number $\xi = 0.\xi_1\xi_2\ldots$ (in binary) belongs to $X_E$ so, since $X_E$ is closed, the distribution measure $\mu$ of $\xi$ (that is, the measure $\mu(A) = \mathbb{P}(\xi \in A)$) is supported on $X_E$. Hence $\mu$ gives positive mass only to those $D \in \mathcal{D}_{2^n}$ whose interiors intersect $X_E$, and all such intervals are given equal mass, namely $\mu(D) = 2^{-|\{1,\ldots,n\} \setminus E|}$. If $\alpha < \underline{d}(\mathbb{N} \setminus E)$ then by definition $n\alpha < |\{1,\ldots,n\} \setminus E|$ for all large enough $n$, and hence there is a constant $C_\alpha$ such that
Exercises
3.2 Billingsley’s lemma
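Lemma 3.6. Let $\mathcal{E} \subseteq \bigcup_{n=0}^\infty \mathcal{D}_{b^n}$ be a collection of $b$-adic cubes. Then there is a sub-collection $\mathcal{F} \subseteq \mathcal{E}$ whose elements are pairwise disjoint and $\bigcup \mathcal{F} = \bigcup \mathcal{E}$.
Proof. Let $\mathcal{F}$ consist of the maximal elements of $\mathcal{E}$, that is, all $E \in \mathcal{E}$ such that $E \not\subseteq E'$ for every $E' \in \mathcal{E}$ with $E' \ne E$. Since every two $b$-adic cubes are either disjoint or one is contained in the other, $\mathcal{F}$ is a pairwise disjoint collection, and for the same reason, every $x \in \bigcup \mathcal{E}$ is contained in a maximal cube from $\mathcal{E}$ (the cubes of $\mathcal{E}$ containing $x$ are nested and have generation $n \ge 0$, so there is a largest one), hence $\bigcup \mathcal{F} = \bigcup \mathcal{E}$.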
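For a finite collection of $b$-adic intervals (the case $d = 1$), the selection of maximal elements in the proof can be sketched as follows. The encoding is our own assumption, not from the notes: the pair `(n, k)` stands for $[k/b^n, (k+1)/b^n) \in \mathcal{D}_{b^n}$, so that `(n, k)` is contained in `(m, j)` exactly when $m \le n$ and $\lfloor k/b^{n-m}\rfloor = j$. Function names are hypothetical.

```python
def maximal_intervals(cells, b):
    """Return the maximal elements of a finite collection of b-adic intervals.

    A cell (n, k) with n >= 0 encodes [k / b**n, (k + 1) / b**n).  Because two
    b-adic intervals are either disjoint or nested, the maximal cells are
    pairwise disjoint and have the same union as the whole collection.
    """
    cells = set(cells)

    def has_proper_ancestor(cell):
        n, k = cell
        # The generation-m ancestor of (n, k) is (m, k // b**(n - m)).
        return any((m, k // b ** (n - m)) in cells for m in range(n))

    return {cell for cell in cells if not has_proper_ancestor(cell)}


def covered_points(cells, b, depth):
    """Indices of the generation-`depth` intervals lying inside the union of
    `cells` (all cells must have generation <= depth)."""
    points = set()
    for n, k in cells:
        width = b ** (depth - n)
        points.update(range(k * width, (k + 1) * width))
    return points


collection = {(1, 0), (2, 1), (3, 7), (2, 3), (3, 2)}
maximal = maximal_intervals(collection, 2)
print(sorted(maximal))  # [(1, 0), (2, 3)]
```

Checking only ancestors mirrors the proof: since $b$-adic intervals are nested or disjoint, a cell is maximal exactly when none of its finitely many ancestors belongs to $\mathcal{E}$. The lemma itself allows infinite collections; the sketch only handles finite ones.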
Then α1 ≤ dim A ≤ α2 .
Proof. We first prove dim A ≥ α1 . Let ε > 0. For any x ∈ A there is an n0 = n0 (x)
depending on x such that for n > n0 ,
Thus we can find an n0 and a set Aε ⊆ A with µ(Aε ) > 0 such that the above holds for
every x ∈ Aε and every n > n0 . It follows that µ|Aε is (α1 − ε)-regular with respect to
b-adic partitions, and hence dim Aε ≥ α1 −ε. Since dim A ≥ dim Aε and ε was arbitrary,
dim A ≥ α1 .
Next we prove dim A ≤ α2 . Let ε > 0 and fix n0 . Then for every x ∈ A we can find
an n = n(x) > n0 and a cube Dx ∈ Dbn (x) such that µ(Dx ) ≥ (b−n )α2 +ε . Apply the
lemma to choose a maximal disjoint sub-collection {Dxi }i∈I ⊆ {Dx }x∈A , which is also
25
a cover of A. Using the fact that |Dxi | = C · b−n(xi ) , and writing C ′ = C α2 +2ε , we have
X
H∞
α2 +2ε
(A) ≤ |Dxi |α2 +2ε
i∈I
X
= (C · b−n(xi ) )α2 +2ε
i∈I
X
≤ C′ (b−n(xi ) )ε (b−n(xi ) )α2 +ε
i∈I
X
≤ C ′ b−εn0 µ(Dxi )
i∈I
≤ b−εn0 · C µ(Rd ) ′
Remark 3.8. The condition that the left inequality in (3) hold for every x ∈ A can be
relaxed: if it holds on a set A′ ⊆ A of positive measure, then the proposition implies
that dim A′ ≥ α1 , so the same is true of A.
In order to conclude dim A ≤ α2 , however, it is essential that (3) hold at every point.
Indeed every non-empty set supports point masses, for which the inequality holds with
α2 = 0, and this of course implies nothing about the set.
for the asymptotic frequency with which the digit u appears in the expansion, assuming
that the limit exists.
A number $x$ is called simply normal if $f_u(x) = 1/10$ for all $u = 0, \ldots, 9$. Such numbers may be viewed as having the statistically most random decimal expansion (“simple” because we are only considering statistics of single digits rather than blocks of digits; we will discuss the stronger version later). It is a classical theorem of Borel that Lebesgue-a.e. $x \in [0,1]$ is simply normal; this is a consequence of the law of large numbers, since when the digit functions $x_i : [0,1] \to \{0,\ldots,9\}$ are viewed as random variables, they are independent and uniform on $\{0,\ldots,9\}$.
However, there are of course many numbers with other frequencies of digits, and it is natural to ask how common this is, i.e. how large these sets are. Given a probability vector $p = (p_0, \ldots, p_9)$ let
$$H(p) = -\sum_{i=0}^{9} p_i\log p_i.$$
Proof. Let $\tilde{\mu}$ denote the product measure on $\{0,\ldots,9\}^{\mathbb{N}}$ with marginal p, and let µ denote the push-forward of $\tilde{\mu}$ by $(u_1,u_2,\ldots)\mapsto\sum_{i=1}^{\infty}u_i10^{-i}$. In other words, µ is the distribution of a random number whose decimal digits are chosen i.i.d. with marginal p.
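For $x = 0.x_1x_2\ldots$ it is clear that $\mu(D_{10^n}(x)) = p_{x_1}p_{x_2}\cdots p_{x_n}$, so if x ∈ N(p) then
$$\begin{aligned}
\frac{\log\mu(D_{10^n}(x))}{-n\log 10} &= -\frac{1}{\log 10}\cdot\frac{1}{n}\sum_{i=1}^{n}\log p_{x_i}\\
&= -\frac{1}{\log 10}\sum_{u=0}^{9}\frac{1}{n}\#\{1\le i\le n : x_i = u\}\cdot\log p_u\\
&\xrightarrow[n\to\infty]{} -\frac{1}{\log 10}\sum_{u=0}^{9}f_u(x)\cdot\log p_u\\
&= \frac{1}{\log 10}\Big(-\sum_{u=0}^{9}p_u\log p_u\Big)\\
&= \frac{H(p)}{\log 10}.
\end{aligned}$$
Since µ(N(p)) = 1 by the law of large numbers, Billingsley's lemma (with b = 10) gives dim N(p) = H(p)/log 10.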
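The computation above can be checked numerically. In this hypothetical sketch, the vector `p`, the sample size and the helper names are our own choices. We sample n i.i.d. digits with distribution p and evaluate $\log\mu(D_{10^n}(x))/(-n\log 10)$ as a sum of logarithms. We then compare the result with H(p)/log 10, using natural logarithms throughout.

```python
import math
import random

def entropy(p):
    """H(p) = -sum_i p_i log p_i with natural logarithms (terms with p_i = 0 omitted)."""
    return -sum(q * math.log(q) for q in p if q > 0)

def cell_dim_ratio(digits, p):
    """log mu(D_{10^n}(x)) / (-n log 10), where mu(D_{10^n}(x)) = p_{x_1} ... p_{x_n}.
    The logarithm of the product is computed as a sum to avoid underflow."""
    n = len(digits)
    log_mass = sum(math.log(p[u]) for u in digits)
    return log_mass / (-n * math.log(10))

# an arbitrary (hypothetical) digit distribution
p = [0.3, 0.2, 0.1, 0.1, 0.1, 0.05, 0.05, 0.05, 0.03, 0.02]
random.seed(1)
digits = random.choices(range(10), weights=p, k=200_000)  # i.i.d. digits with marginal p
print("empirical ratio:", cell_dim_ratio(digits, p))
print("H(p) / log 10:  ", entropy(p) / math.log(10))
```

For the uniform vector p = (1/10, . . . , 1/10) the ratio is exactly 1 for every digit string, matching H(p) = log 10.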
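Proof. For 0 < ε < 1/10 let $p_\varepsilon = (1/10 - \varepsilon, \ldots, 1/10 - \varepsilon, 1/10 + 9\varepsilon)$, so that the entries sum to 1. Then $H(p_\varepsilon) \to \log 10$ as ε → 0, and so $\dim N(p_\varepsilon) \to 1$. Since for ε > 0 the set $N(p_\varepsilon)$ is contained in the set of non-simply-normal numbers, the conclusion follows.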
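A quick check, assuming natural logarithms as in the definition of H. It confirms that the vectors $p_\varepsilon$ are probability vectors and that $H(p_\varepsilon)/\log 10$, which equals $\dim N(p_\varepsilon)$, increases towards 1 as ε ↓ 0. The function names are hypothetical.

```python
import math

def entropy(p):
    """H(p) = -sum_i p_i log p_i (natural logarithm, 0 log 0 = 0)."""
    return -sum(q * math.log(q) for q in p if q > 0)

def p_eps(eps):
    """The vector (1/10 - eps, ..., 1/10 - eps, 1/10 + 9 eps); valid for 0 <= eps <= 1/10."""
    return [0.1 - eps] * 9 + [0.1 + 9 * eps]

for eps in (0.05, 0.01, 0.001, 0.0001):
    p = p_eps(eps)
    assert abs(sum(p) - 1) < 1e-12 and min(p) >= 0  # p_eps is a probability vector
    print(eps, entropy(p) / math.log(10))  # dim N(p_eps), by the proposition
```

The deficit behaves like $450\varepsilon^2/\log 10$ for small ε, so the convergence to 1 is quite fast.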
Exercises
1. Show that the set of numbers for which the digit frequencies do not exist has dimension 1.
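3.3 A metric on symbolic space
For a finite set Λ, the space $\Lambda^{\mathbb{N}}$ can be given the metric
$$d(w,w') = 2^{-n(w,w')},\qquad n(w,w') = \max\{n\ge 0 : w_1\ldots w_n = w'_1\ldots w'_n\},$$
with d(w, w) = 0. This metric is compatible with the product topology, which is compact. In this metric, a sequence $w^{(k)} \in \Lambda^{\mathbb{N}}$ converges to w if and only if for every ℓ, $w^{(k)}|_\ell = w|_\ell$ for all large enough k.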
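The following Python sketch implements the metric d on words truncated to a fixed length L. The truncation, the value L = 5 and the function names are assumptions made for illustration. The sketch verifies by brute force over $\{0,1\}^5$ two facts. First, it checks the ultrametric inequality d(u, w) ≤ max{d(u, v), d(v, w)}, which is stronger than the triangle inequality. Second, it checks that closed balls of radius $2^{-n}$ are exactly the cylinders of length n.

```python
import itertools

def prefix_len(w, v):
    """Length of the longest common prefix of two words."""
    n = 0
    for a, b in zip(w, v):
        if a != b:
            break
        n += 1
    return n

def d(w, v):
    """2^{-n(w, v)}; identical truncated words get distance 0
    (for infinite sequences this only means the true distance is <= 2^{-len(w)})."""
    return 0.0 if w == v else 2.0 ** (-prefix_len(w, v))

L = 5
words = list(itertools.product((0, 1), repeat=L))

# ultrametric inequality, which implies the triangle inequality
ultrametric = all(d(u, w) <= max(d(u, v), d(v, w)) for u in words for v in words for w in words)

# closed balls are cylinders: B_{2^{-n}}(w) = [w_1 ... w_n]
balls_are_cylinders = all(
    {v for v in words if d(w, v) <= 2.0 ** (-n)} == {v for v in words if v[:n] == w[:n]}
    for w in words for n in range(L + 1)
)
print(ultrametric, balls_are_cylinders)
```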
Lemma 3.11. ΛN is compact in the metric d.
Proof. Let $(w^{(n)})_{n=1}^\infty \subseteq \Lambda^{\mathbb{N}}$ be a sequence. We must show that it has a convergent subsequence.
Write $w^{(0,n)} = w^{(n)}$. Some element $u_1 \in \Lambda$ appears as the first symbol in infinitely many of the sequences $w^{(0,n)}$, so we can choose a subsequence $(w^{(1,n)})_{n=1}^\infty$ of $(w^{(0,n)})_{n=1}^\infty$ whose members all start with $u_1$.
Next, let $u_2$ be a symbol appearing as the second symbol of infinitely many of the elements $w^{(1,n)}$, and let $(w^{(2,n)})_{n=1}^\infty$ be a subsequence of $(w^{(1,n)})_{n=1}^\infty$ whose elements all have $u_2$ in the second coordinate.
Continue in this way inductively. Given $(w^{(k,n)})_{n=1}^\infty$, we define a subsequence $(w^{(k+1,n)})_{n=1}^\infty$ of $(w^{(k,n)})_{n=1}^\infty$ consisting of elements that all have some fixed $u_{k+1} \in \Lambda$ in their (k+1)-th coordinate.
Finally, the sequence $(w^{(n,n)})_{n=1}^\infty$ is a subsequence of the original sequence $(w^{(n)})$. It converges to $u = u_1u_2\ldots$, since $w^{(n,n)}|_\ell = u_1\ldots u_\ell$ for all n ≥ ℓ.
The cylinder sets
$$[\omega_1\ldots\omega_n] = \{\eta\in\Lambda^{\mathbb{N}} : \eta_1\ldots\eta_n = \omega_1\ldots\omega_n\}$$
have diameter $2^{-n}$, and they are both open and closed. They are closed because any sequence of points in [w] begins with w, and so every limit point must also begin with w. They are open because, writing n = |w|,
$$[w] = \Lambda^{\mathbb{N}} \setminus \bigcup_{\eta\in\Lambda^n\setminus\{w\}}[\eta],$$
so [w] is the complement of a finite union of closed sets, and is hence open. Furthermore, it is not hard to see that every ball in the metric d is a cylinder set: if $w \in \Lambda^{\mathbb{N}}$ then
$$B_{2^{-n}}(w) = [w_1\ldots w_n],$$
and since all distances in $\Lambda^{\mathbb{N}}$ are of the form $2^{-n}$ or 0, for every $2^{-(n+1)} < r < 2^{-n}$ we have $B_r(w) = B^\circ_r(w) = B_{2^{-(n+1)}}(w)$.
Proof. Since $\mathcal{A}$ consists of open sets and contains all cylinder sets (i.e. all balls), it generates the Borel σ-algebra. Since µ is finitely additive, the statement will follow if we show that $(\Lambda^{\mathbb{N}}, \mathcal{A}, \mu)$ satisfies the conditions of the Carathéodory extension theorem. Namely, we must show that if $A_1, A_2, \ldots \in \mathcal{A}$ are pairwise disjoint, $A \in \mathcal{A}$, and $A = \bigcup A_i$, then $\mu(A) = \sum\mu(A_i)$.
Indeed, A is a finite union of (closed) cylinder sets, hence is itself closed, and therefore compact. The $A_i$ are unions of (open) cylinder sets, so they are open. Combining these observations, by compactness there exists a finite sub-cover $\{A_i\}_{i\in I}$ of A. Since the $A_i$ are disjoint and $A = \bigcup A_i$, we conclude that $A_j = \emptyset$ for $j \in \mathbb{N}\setminus I$. Finally, by disjointness and finite additivity of µ,
$$\mu(A) = \mu\Big(\bigcup_{i\in I}A_i\Big) = \sum_{i\in I}\mu(A_i) = \sum_{i=1}^{\infty}\mu(A_i)\qquad\text{because } \mu(A_i) = 0 \text{ for } i\notin I,$$
as desired.
The previous lemma is the reason that working in $\Lambda^{\mathbb{N}}$ is more convenient than working in $[0,1]^d$. In the latter space, finite unions of cubes from $\bigcup_n\mathcal{D}_{2^n}$ also form a countable algebra, but the extension theorem doesn't automatically hold.
Lemma 3.14. For n ∈ N, let µn ∈ M(ΛN ) with µn (ΛN ) ≤ 1. Then there is a subse-
quence nk → ∞ and µ ∈ M(ΛN ) such that µnk → µ.
Proof. Since $\mathcal{A} = \bigcup_n\mathcal{A}_n$ is countable, a diagonal argument similar to the one in the proof of Lemma 3.11 lets us define a subsequence $(\mu_{n_k})_{k=1}^\infty$ of $(\mu_n)$ such that $\mu_{n_k}(A)$ converges for all $A \in \mathcal{A}$.
Define $\mu : \mathcal{A} \to [0, 1]$ by
$$\mu(A) = \lim_{k\to\infty}\mu_{n_k}(A).$$
For any two disjoint sets $A', A'' \in \mathcal{A}$ we have $A', A'' \in \mathcal{A}_{n_k}$ for all large enough k, hence $\mu_{n_k}(A'\cup A'') = \mu_{n_k}(A') + \mu_{n_k}(A'')$ for all large enough k. Taking the limit as k → ∞, the same holds for µ, so µ is finitely additive. By the previous lemma it extends to a countably additive Borel measure.
so µ is supported on Y .
With the exception of Lemma 3.12, everything we did here can be done in a general compact metric space (X, d). There, convergence of measures $\mu_n \to \mu$ is defined by the condition that $\int f\,d\mu_n \to \int f\,d\mu$ for all f ∈ C(X). This definition is equivalent to ours for $\Lambda^{\mathbb{N}}$, and is called weak-* convergence. Using separability of C(X), one can prove sequential compactness for this notion of convergence. Using separability of X, one can also establish the analog of Lemma 3.15.
Definition 3.16. Let (X, A), (Y, B) be measurable spaces and f : X → Y a measurable
map. The push-forward of a measure µ on (X, A) through f is the measure f∗ µ on (Y, B)
defined by
(f∗ µ)(B) = µ(f −1 (B))
Exercises
2. For $\mu_n, \mu \in M(\Lambda^{\mathbb{N}})$, show that $\mu_n \to \mu$ if and only if $\int f\,d\mu_n \to \int f\,d\mu$ for every $f \in C(\Lambda^{\mathbb{N}})$.
Theorem 3.17 (Frostman’s “lemma”). If X ⊆ Rd is closed and Hα∞ (X) > 0, then
there is an α-regular probability measure supported on X.
Corollary 3.18. If dim X = α then for every 0 ≤ β < α there is a β-regular probability
measure µ on X.
Proof of the Corollary. If β < α and dim X = α then by definition $\mathcal{H}^\beta_\infty(X) > 0$, and the claim follows from the theorem.
The corollary is not true for β = α. Indeed, if $X = \bigcup X_n$ and $\dim X_n = \alpha - 1/n$, then dim X = α. However, any α-regular measure µ must satisfy $\mu(X_n) = 0$ for all n, since if $\mu(X_n) > 0$ then $\dim X_n \ge \alpha$ by the mass distribution principle. Hence $\mu(X) \le \sum\mu(X_n) = 0$.
In order to prove the theorem we may assume without loss of generality that $X \subseteq [0,1]^d$. Indeed, we can intersect X with each of the level-0 dyadic cubes, writing $X = \bigcup_{D\in\mathcal{D}_0}X\cap D$. We saw in the proof of Proposition 2.10 that if $\mathcal{H}^\alpha_\infty(X\cap D) = 0$ for each D in the union, then $\mathcal{H}^\alpha_\infty(X) = 0$. Thus there is a $D \in \mathcal{D}_0$ for which $\mathcal{H}^\alpha_\infty(X\cap D) > 0$, and by translating X we may assume that $D = [0,1]^d$.
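For the proof, it is convenient to transfer the problem to the symbolic setting. Let $\Lambda = \{0,1\}^d$. Then $\Lambda^{\mathbb{N}}$ can be identified with $(\{0,1\}^{\mathbb{N}})^d$, where $\omega \in \Lambda^{\mathbb{N}}$ is identified with the d-tuple of sequences obtained by projecting ω to each coordinate of the space Λ. Let $\pi : \{0,1\}^{\mathbb{N}} \to [0,1]$ denote the binary coding map $\pi(\omega) = \sum_{i=1}^\infty\omega_i2^{-i}$, and define $\pi^d : \Lambda^{\mathbb{N}} \to [0,1]^d$ by
$$(\omega^{(1)},\ldots,\omega^{(d)}) \mapsto (\pi(\omega^{(1)}),\ldots,\pi(\omega^{(d)})).$$
Then $\pi^d$ maps $\Lambda^{\mathbb{N}}$ onto $[0,1]^d$. One may verify that
• For $D \in \mathcal{D}_n$, the set $(\pi^d)^{-1}(D)$ can be covered by $2^d$ cylinder sets from $\mathcal{C}_n$.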
Theorem 3.20. Let Y ⊆ ΛN be a closed set with Hα∞ (Y ) > 0. Then there is an
α-regular probability measure supported on Y .
Proof. Let Y ⊆ ΛN be closed with Hα∞ (Y ) > 0. We will produce the desired measure
as a limit of suitable “finite” approximations.
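For n ∈ N, we say that a measure µ on $\Lambda^{\mathbb{N}}$ is n-admissible if for every k ≤ n and $C \in \mathcal{C}_k$,
$$\mu(C) \le \begin{cases}2^{-\alpha k} & \text{if } C\cap Y\ne\emptyset\\ 0 & \text{otherwise}\end{cases}\qquad(4)$$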
Note that such measures take values in [0, 1] and are supported on Y.
Let
Mn = {µ ∈ M(ΛN ) : µ is n-admissible}
Let $\mu_n$ be a subsequential limit of $(\mu_{n,k})_{k=1}^\infty$. Since $\Lambda^{\mathbb{N}} = [\emptyset]$ is a cylinder set, $\mu_n(\Lambda^{\mathbb{N}}) = \lim_{k\to\infty}\mu_{n,k}(\Lambda^{\mathbb{N}})$ is equal to the right hand side above. Also, since n-admissibility is defined by weak inequalities on the masses of cylinder sets, $\mu_n$ is n-admissible. It is supported on Y because Y is closed (Lemma 3.15).
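Next, let µ be a measure on $(\Lambda^{\mathbb{N}}, \text{Borel})$ which arises as a subsequential limit $\mu = \lim_j\mu_{n_j}$. It is immediate that
$$\mu([a_1\ldots a_k]) = \lim_{j\to\infty}\mu_{n_j}([a_1\ldots a_k]) \le \begin{cases}2^{-\alpha k} & \text{if } [a_1\ldots a_k]\cap Y\ne\emptyset\\ 0 & \text{otherwise.}\end{cases}$$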
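Proof. Fix n. First we claim that for every $\omega \in \Lambda^{\mathbb{N}}$ there is some 0 ≤ k ≤ n such that equality holds in (4) for $C = [\omega_1\ldots\omega_k]$. If some $[\omega_1\ldots\omega_k]$ misses Y, this is automatic, since both sides of (4) vanish. So suppose not; then there is a point $\omega = \omega_1\omega_2\ldots$ such that $\mu_n([\omega_1\ldots\omega_k]) < 2^{-\alpha k}$ for all 0 ≤ k ≤ n. Define
$$c = \min\big\{2^{-\alpha k} - \mu_n([\omega_1\ldots\omega_k]) : 0\le k\le n\big\},$$
so that c > 0, and let $\mu'_n = \mu_n + c\cdot\delta_\omega$. Then $\mu'_n$ is n-admissible. Indeed, (4) holds for $C = [\omega_1\ldots\omega_k]$ by choice of c. For any other cylinder set C′ it holds because ω ∉ C′, and therefore $\mu'_n(C') = \mu_n(C')$. But now $\mu'_n(\Lambda^{\mathbb{N}}) = \mu_n(\Lambda^{\mathbb{N}}) + c$, contradicting maximality of $\mu_n$.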
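Thus for every $\omega = \omega_1\omega_2\ldots \in Y$ we have at least one cylinder set $C_\omega = [\omega_1\ldots\omega_k]$ with 0 ≤ k ≤ n such that $\mu_n([\omega_1\ldots\omega_k]) = 2^{-\alpha k}$.
Let $\mathcal{E} = \{C_\omega\}_{\omega\in Y}$ be the cover of Y thus obtained. Lemma 3.6 provides us with a disjoint subcover $\mathcal{F} \subseteq \mathcal{E}$ of Y.
Finally, for $F = [\omega_1\ldots\omega_k] \in \mathcal{F}$ we have $\mu_n(F) = 2^{-\alpha k} = |F|^\alpha$, hence
$$\mathcal{H}^\alpha_\infty(Y) \le \sum_{F\in\mathcal{F}}|F|^\alpha = \sum_{F\in\mathcal{F}}\mu_n(F) = \mu_n\Big(\bigcup_{F\in\mathcal{F}}F\Big) = \mu_n(\Lambda^{\mathbb{N}}),$$
as claimed.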
It may be of interest to note that the argument in the proof above is a variant of the max flow/min cut theorem from graph theory. To see this, consider $\Lambda^{\le n}$, the tree of height n + 1 formed by the cylinders of length at most n in $\Lambda^{\mathbb{N}}$. The lemma shows that the maximal flow from the root $[\emptyset] = \Lambda^{\mathbb{N}}$ to the set of leaves $a \in \Lambda^n$ is equal to the weight of a minimal cut. It also shows that the weight of any cutset is bounded below by $\mathcal{H}^\alpha_\infty(Y)$. See ??.
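The max-flow/min-cut picture can be made concrete for a finite binary tree. The sketch below is our own illustration, and `Y_n`, `max_admissible`, `optimal_measure` and `min_cut` are hypothetical names. Y is replaced by the set $Y_n$ of length-n words whose cylinders meet Y. The largest mass an n-admissible measure can put on [a] is computed bottom-up via $m(a) = \min(2^{-\alpha|a|}, \sum_{\text{children } c} m(c))$. Pushing $m(\emptyset)$ down the tree proportionally produces an admissible measure of that total mass. A brute-force search then confirms that $m(\emptyset)$ equals the minimal weight $\sum 2^{-\alpha|a|}$ of a cutset.

```python
import itertools
import math

def max_admissible(Y_n, n, alpha):
    """m[a] = largest mass an n-admissible measure can give to [a], for each word a
    (of length <= n) whose cylinder meets Y; Y_n = length-n words meeting Y."""
    m = {a: 2.0 ** (-alpha * n) for a in Y_n}
    for k in range(n - 1, -1, -1):
        for a in {w[:k] for w in Y_n}:
            children = [a + (s,) for s in (0, 1) if a + (s,) in m]
            m[a] = min(2.0 ** (-alpha * k), sum(m[c] for c in children))
    return m

def optimal_measure(m, n):
    """Push the total mass m[()] down the tree, splitting it among the children in
    proportion to m; returns the masses of the level-n cylinders."""
    flow = {(): m[()]}
    for k in range(n):
        for a in [a for a in flow if len(a) == k]:
            children = [a + (s,) for s in (0, 1) if a + (s,) in m]
            total = sum(m[c] for c in children)
            for c in children:
                flow[c] = flow[a] * m[c] / total  # <= m[c] because flow[a] <= m[a] <= total
    return {a: f for a, f in flow.items() if len(a) == n}

def min_cut(Y_n, n, alpha):
    """Brute force: least total weight 2^{-alpha |a|} of a set of words such that
    every leaf in Y_n has a prefix (possibly itself) in the set."""
    nodes = [a for k in range(n + 1) for a in itertools.product((0, 1), repeat=k)]
    best = math.inf
    for mask in range(1, 2 ** len(nodes)):
        cut = [a for i, a in enumerate(nodes) if mask >> i & 1]
        cut_set = set(cut)
        if all(any(y[:k] in cut_set for k in range(n + 1)) for y in Y_n):
            best = min(best, sum(2.0 ** (-alpha * len(a)) for a in cut))
    return best

n, alpha = 3, 0.9
Y_3 = {(0, 0, 0), (0, 0, 1), (0, 1, 1), (1, 1, 0)}  # a hypothetical level-3 approximation of Y
m = max_admissible(Y_3, n, alpha)
leaves = optimal_measure(m, n)
print("max flow:", m[()], " min cut:", min_cut(Y_3, n, alpha))
```

With these parameters the root capacity is not binding, so the optimal mass is strictly less than 1. The minimal cut then consists of cylinders of different lengths.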
We have proved Frostman’s lemma for closed sets in Rd but the result is known
far more generally for Borel sets in complete metric spaces. See Mattila ?? for further
discussion.
4 Product sets
In this section we consider product sets. For simplicity, we restrict the discussion to $\mathbb{R}^d$, although the results hold in general metric spaces. It is convenient to work with the sup-norm $\|\cdot\|_\infty$. Under this norm, if $A \subseteq \mathbb{R}^d$ and $B \subseteq \mathbb{R}^k$ are bounded, then $A\times B \subseteq \mathbb{R}^{d+k}$ and $|A\times B| = \max\{|A|, |B|\}$.
In general, we have
and if one of $\dim_M X$, $\dim_M Y$ exists, then the inequalities above are equalities.
Proof. A b-adic cell in $\mathbb{R}^d\times\mathbb{R}^{d'}$ is the product of two b-adic cells, one from $\mathbb{R}^d$ and one from $\mathbb{R}^{d'}$. It is simple to verify that for every n,
$$N(X\times Y, \mathcal{D}_{b^n}) = N(X, \mathcal{D}_{b^n})\cdot N(Y, \mathcal{D}_{b^n}).$$
Taking logarithms and inserting this into the definition of $\dim_M$, the claim follows from properties of lim sup and lim inf.
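The counting identity used in the proof is easy to test on finite point sets. For such sets, a point p lies in the level-n b-adic cell indexed coordinatewise by $\lfloor b^n p\rfloor$. The random point clouds and the function names in this sketch are illustrative assumptions. The check only exercises the counting identity, not the limits that define $\dim_M$.

```python
import math
import random

def cells(points, b, n):
    """Level-n b-adic cells meeting a finite set, each indexed by floor(b^n * p) coordinatewise."""
    return {tuple(math.floor(b ** n * t) for t in p) for p in points}

def N(points, b, n):
    """N(points, D_{b^n}): the number of level-n b-adic cells meeting the set."""
    return len(cells(points, b, n))

random.seed(2)
X = [(random.random(),) for _ in range(200)]                # finite subset of R
Y = [(random.random(), random.random()) for _ in range(50)]  # finite subset of R^2
XY = [x + y for x in X for y in Y]                           # X x Y as a subset of R^3
for n in range(1, 8):
    print(n, N(XY, 2, n), N(X, 2, n) * N(Y, 2, n))
```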
probability measure $\nu_\varepsilon$ supported on Y. Then $\theta_\varepsilon = \mu_\varepsilon\times\nu_\varepsilon$ is a probability measure supported on X × Y. We claim that it is (α + β − 2ε)-regular. Assume, without loss of generality, that we are using the $\ell^\infty$ norm on all spaces involved. Then for (x, y) ∈ X × Y we have $B_r(x, y) = B_r(x)\times B_r(y)$, so
$$\theta_\varepsilon(B_r(x,y)) \le \mu_\varepsilon(B_r(x))\cdot\nu_\varepsilon(B_r(y)) \le C_1r^{\alpha-\varepsilon}\cdot C_2r^{\beta-\varepsilon} = Cr^{\alpha+\beta-2\varepsilon}.$$
Hence, by the mass distribution principle, dim X × Y ≥ α + β − 2ε, and since ε was arbitrary, dim X × Y ≥ α + β.
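For the other inequality write $\gamma = \overline{\dim}_M Y$ and let 0 < ε < 1. Since $\mathcal{H}^{\alpha+\varepsilon}_\infty(X) = 0$, we can find a cover $X \subseteq \bigcup_{i=1}^\infty A_i$ with $\sum|A_i|^{\alpha+\varepsilon} < \varepsilon$, and in particular $|A_i| < \varepsilon^{1/(\alpha+\varepsilon)}$ for each i.
Next, for each i, there is a cover $A_{i,1},\ldots,A_{i,N(Y,|A_i|)}$ of Y by $N(Y,|A_i|)$ sets of diameter $|A_i|$.
Assuming ε is small enough, using $|A_i| < \varepsilon^{1/(\alpha+\varepsilon)}$ and the definition of γ, we have that $N(Y,|A_i|) < |A_i|^{-(\gamma+\varepsilon)}$ for each i. Thus $\{A_i\times A_{i,j}\}$ is a cover of X × Y satisfying
$$\begin{aligned}
\sum_{i=1}^\infty\sum_{j=1}^{N(Y,|A_i|)}|A_i\times A_{i,j}|^{\alpha+\gamma+2\varepsilon} &= \sum_{i=1}^\infty|A_i|^{\alpha+\gamma+2\varepsilon}N(Y,|A_i|)\\
&\le \sum_{i=1}^\infty|A_i|^{\alpha+\gamma+2\varepsilon}|A_i|^{-(\gamma+\varepsilon)}\\
&= \sum_{i=1}^\infty|A_i|^{\alpha+\varepsilon}\\
&< \varepsilon.
\end{aligned}$$
All sets in these covers have diameter less than 1, so the same bound holds for every larger exponent. Applying it with ε replaced by any smaller ε′ therefore shows that $\mathcal{H}^{\alpha+\gamma+2\varepsilon}_\infty(X\times Y) < \varepsilon'$ for all such ε′, i.e. $\mathcal{H}^{\alpha+\gamma+2\varepsilon}_\infty(X\times Y) = 0$. Hence dim X × Y ≤ α + γ + 2ε, and since ε was arbitrary, dim X × Y ≤ α + γ, as desired.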
dim X × Y = dimM X × Y
Proof. Suppose e.g. that dim Y = dimM Y . Then
dim X × Y ≤ dimM (X × Y )
= dimM X + dimM Y
= dim X + dim Y
= dim X × Y
The following example shows that one cannot do much better than this. Although we always have dim X × Y ≥ dim X + dim Y, the inequality may be strict. In fact, we show that it may happen that dim X = dim Y = 0 but dim X × Y = 1.
Recall that for E ⊆ N the set $X_E$ is the set of x ∈ [0, 1] whose n-th binary digit is 0 if n ∈ E, and otherwise may be 0 or 1. We saw in Example 3.5 that $\dim X_E = \underline{d}(\mathbb{N}\setminus E) = \liminf_{n\to\infty}\frac{1}{n}|\{1,\ldots,n\}\setminus E|$. Now let E, F ⊆ N be the sets
$$E = \mathbb{N}\cap\bigcup_{n=1}^\infty[(2n)!, (2n+1)!),\qquad F = \mathbb{N}\cap\bigcup_{n=0}^\infty[(2n+1)!, (2n+2)!).$$
These sets are complementary, and it is clear that $\underline{d}(E) = \underline{d}(F) = 0$, so $\dim X_E = \dim X_F = 0$.
On the other hand, observe that for every x ∈ [0, 1] there are $x_1 \in X_E$ and $x_2 \in X_F$ such that $x_1 + x_2 = x$. For $x_1$ we can take the number whose binary expansion agrees with that of x at coordinates outside E and is 0 at coordinates in E. Similarly, we obtain $x_2$ using F. Writing π(x, y) = x + y, we have shown that $\pi(X_E\times X_F) \supseteq [0,1]$ (in fact there is equality). But π is a Lipschitz map R × R → R (with constant 2 for the sup-norm), so $\dim X_E\times X_F \ge \dim\pi(X_E\times X_F) \ge \dim[0,1] = 1$.
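The decomposition $x = x_1 + x_2$ can be checked with exact rational arithmetic on truncated binary expansions. In this sketch, the truncation length M = 200, the random digits and the helper names are our assumptions. Here $x_1$ keeps the digits of x outside E, and $x_2$ keeps the digits outside F. Since E and F partition the positive integers, the two digit strings never overlap, and their values add up to x.

```python
import random
from fractions import Fraction
from math import factorial

def in_E(i):
    """Whether i lies in E = union over n >= 1 of [(2n)!, (2n+1)!)."""
    n = 1
    while factorial(2 * n) <= i:
        if i < factorial(2 * n + 1):
            return True
        n += 1
    return False

def from_digits(digits):
    """Exact value of the binary expansion 0.d_1 d_2 ... d_M."""
    return sum(Fraction(d, 2 ** i) for i, d in enumerate(digits, start=1))

M = 200
random.seed(3)
x = [random.randint(0, 1) for _ in range(M)]
x1 = [0 if in_E(i) else d for i, d in enumerate(x, start=1)]  # digit 0 on E, so x1 is in X_E
x2 = [d if in_E(i) else 0 for i, d in enumerate(x, start=1)]  # digit 0 on F, so x2 is in X_F
print(from_digits(x1) + from_digits(x2) == from_digits(x))
```

At such small scales the lower densities of E and F are still far from 0, because factorials grow slowly at first. The sketch only illustrates the digit-splitting step.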
Remark 4.4. There is a slight generalization of Proposition 4.2 using the notion of packing dimension, which is defined by
$$\operatorname{pdim}X = \inf\Big\{\sup_i\overline{\dim}_M X_i : \{X_i\}_{i=1}^\infty\text{ is a partition of } X\Big\}.$$
This notion is designed to fix the deficiency of box dimension with regard to countable unions, since it is easy to verify that $\operatorname{pdim}\bigcup_n A_n = \sup_n\operatorname{pdim}A_n$. We will not discuss it much. We note, however, that pdim is a natural notion of dimension in certain contexts. It can also be defined intrinsically in a manner similar to the definition of Hausdorff dimension, which is the definition usually given. In particular, note that if $Y = \bigcup_{n=1}^\infty Y_n$ then by the previous theorem,
$$\dim X\times Y = \dim\bigcup_{n=1}^\infty(X\times Y_n) \le \sup_n\big(\dim X + \overline{\dim}_M Y_n\big) = \dim X + \sup_n\overline{\dim}_M Y_n.$$
Now optimizing over partitions $Y = \bigcup Y_n$ and using the definition of pdim, we find that
$$\dim X\times Y \le \dim X + \operatorname{pdim}Y.$$
Exercises
1. Prove that in Proposition 4.1, a strict inequality is possible for upper and lower Minkowski dimensions.
2. Prove the conclusion of Proposition 4.1 for general metric spaces. For this purpose, define the metric on X × Y by d((x, y), (x′, y′)) = max{d(x, x′), d(y, y′)}.
3. For every 0 ≤ α, β < 1 with α + β < 1, show that there are sets X, Y ⊆ [0, 1] such
that dim X = α, dim Y = β and dim X × Y = 1.
5 Differentiation of measures in Rd
We have seen that measures can play an important auxiliary role in computing the dimension of sets. In this section we establish some general results on the local structure of measures in $\mathbb{R}^d$. Roughly speaking, they show that the local structure of a measure µ on a set $A \subseteq \mathbb{R}^d$ is independent of its structure on $\mathbb{R}^d\setminus A$. We also obtain local criteria for absolute continuity of one measure with respect to another.
Parts of the discussion below are valid in any metric space, but the main results are special to $\mathbb{R}^d$. The choice of norm on $\mathbb{R}^d$ is not very significant, but may affect the constants. For concreteness we fix the Euclidean norm.
By Zorn's lemma, given r > 0, every set in $\mathbb{R}^d$ contains r-separated subsets which are maximal with respect to inclusion. By separability, every r-separated set in $\mathbb{R}^d$ is at most countable.
Lemma 5.2. Let r > 0 and let $A \subseteq \mathbb{R}^d$ be r-separated. Then $|B_{2r}(z)\cap A| \le C$ for every $z \in \mathbb{R}^d$, where C = C(d).
Proof. If this were false, we could find numbers $r_n > 0$, points $x_n \in \mathbb{R}^d$ and $r_n$-separated sets $E_n \subseteq \mathbb{R}^d$ such that
$$|B_{2r_n}(x_n)\cap E_n| \ge n.$$
By re-scaling and translating $x_n$ to the origin, we find that $B_2(0)$ contains 1-separated sets of arbitrarily large size. This contradicts the compactness of $B_2(0)$.
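Restricting the right inequality to A gives $1_A \ge \frac{1}{C}\sum_{E\in\mathcal{E}}1_{E\cap A}$, so for any measure µ,
$$\mu(A) = \int 1_A\,d\mu \ge \frac{1}{C}\int\sum_{E\in\mathcal{E}}1_{E\cap A}\,d\mu = \frac{1}{C}\sum_{E\in\mathcal{E}}\mu(A\cap E).$$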
Lemma 5.4. Let $\mathcal{E}$ be a collection of balls in $\mathbb{R}^d$ with multiplicity C such that each $B \in \mathcal{E}$ has radius ≥ R. Then any ball $B_r(x)$ of radius r ≤ 2R intersects at most $4^dC$ of the balls in $\mathcal{E}$.
$$\ge \frac{1}{C}\sum_{i=1}^{k}\operatorname{vol}(E'_i) = \frac{k}{C}\cdot c\cdot R^d.$$
Therefore $k \le 4^dC$, as claimed.
Proof. Clearly z ≠ x, y. The hypothesis remains unchanged if we replace the smaller of the radii by the larger, so we can assume s = r. Since the metric is induced by a norm, by translating and re-scaling we may assume z = 0 and r = 1. Thus the problem is equivalent to the following: given x, y with ‖x‖ = ‖y‖ = 1 and d(x, y) > 1, give a positive lower bound on ∠(x, y). This follows from the cosine law, since
$$1 < \|x-y\|^2 = \|x\|^2 + \|y\|^2 - 2\|x\|\|y\|\cos\angle(x,y) \le 2 - 2\cos\angle(x,y),$$
so cos∠(x, y) < 1/2, that is, ∠(x, y) > π/3.
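The following is a Monte Carlo check of the angle bound in $\mathbb{R}^3$ with the Euclidean norm. The dimension, the sample size and the function names are illustrative choices. Whenever two unit vectors are at distance greater than 1, the angle between them exceeds π/3.

```python
import math
import random

def angle(x, y):
    """Euclidean angle between nonzero vectors x and y."""
    dot = sum(a * b for a, b in zip(x, y))
    cos = dot / (math.hypot(*x) * math.hypot(*y))
    return math.acos(max(-1.0, min(1.0, cos)))  # clamp to [-1, 1] against rounding

def random_unit_vector(d):
    """Uniformly distributed point on the unit sphere in R^d (normalized Gaussian vector)."""
    v = [random.gauss(0.0, 1.0) for _ in range(d)]
    r = math.hypot(*v)
    return [a / r for a in v]

random.seed(4)
checked = 0
for _ in range(10_000):
    x, y = random_unit_vector(3), random_unit_vector(3)
    if math.dist(x, y) > 1:               # the hypothesis d(x, y) > 1
        checked += 1
        assert angle(x, y) > math.pi / 3  # the conclusion of the cosine-law argument
print("pairs checked:", checked)
```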
Proof. We may write $\mathcal{E} = \{B_{r(x)}(x)\}_{x\in A}$, discarding redundant balls if necessary. Let $R_0 = \sup_{x\in A}r(x)$, so by assumption $R_0 < \infty$, and let $R_n = 2^{-n}R_0$. Also write
In the proof of Billingsley's lemma (Proposition 3.7), we used the fact that any cover of A by b-adic cubes contains a disjoint sub-cover of A (Lemma 3.6). Covers by balls do not have this property. However, the proposition above and the calculation before Lemma 5.4 are often a good substitute. They can be used, for example, to prove Billingsley's lemma for balls.
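$$\mu(A) \le \mu\Big(\bigcup_{i=1}^{k}\bigcup_{E\in\mathcal{E}_i}E\Big) \le \sum_{i=1}^{k}\mu\Big(\bigcup_{E\in\mathcal{E}_i}E\Big),$$
so there is some i with $\mu(\bigcup_{E\in\mathcal{E}_i}E) \ge \frac{1}{k}\mu(A) \ge \frac{1}{C'}\mu(A)$. Since $\mathcal{E}_i$ is countable, we can find a finite sub-collection $\mathcal{F} \subseteq \mathcal{E}_i$ such that $\mu(\bigcup_{F\in\mathcal{F}}F) > \frac{1}{2C'}\mu(A)$. This proves the claim with the constant C = 2C′.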
Proof. We clearly may assume that $\mathcal{E}$ has bounded diameter and that µ(A) > 0. We may also assume that µ is supported on A (i.e. $\mu(\mathbb{R}^d\setminus A) = 0$), since otherwise we can replace µ by $\mu|_A$. Assume also that µ(A) < ∞; we will remove this assumption later.
We will define by induction an increasing sequence $\mathcal{F}_1 \subseteq \mathcal{F}_2 \subseteq \ldots$ of finite sub-collections of $\mathcal{E}$, each consisting of pairwise disjoint balls. At each step, we apply the previous corollary to a large subset of the set that has not yet been covered. We will then show that $\mathcal{F} = \bigcup_{k=1}^\infty\mathcal{F}_k$ has the desired properties.
Let C be the constant from the previous corollary. To begin, let $\mathcal{F}_1$ be the family obtained by applying the previous corollary to $\mathcal{E}$, so
$$\mu\Big(\bigcup_{F\in\mathcal{F}_1}F\Big) > \frac{1}{C}\mu(A).$$
Assuming $\mathcal{F}_k$ has been defined, write $F_k = \bigcup_{F\in\mathcal{F}_k}F$. This is a closed set, since it is a finite union of closed balls. By assumption, for every $x \in A\setminus F_k$ there are balls $B_r(x) \in \mathcal{E}$ with arbitrarily small radius. When r is small enough, $B_r(x)\cap F_k = \emptyset$ (we use here the fact that $F_k$ is closed), so the collection
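$$\mu(A\setminus F_k) \ge \mu(A\setminus F)$$
and consequently
$$\mu(A) \ge \sum_{k=1}^\infty\mu\Big(\bigcup_{F\in\mathcal{F}'_k}F\Big) \ge \sum_{k=1}^\infty\frac{1}{C}\mu(A\setminus F_k) \ge \sum_{k=1}^\infty\frac{1}{C}\mu(A\setminus F) = \infty$$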
Proof (the version given in class). We clearly may assume that $\mathcal{E}$ has bounded diameter and that µ(A) > 0. We may also assume that µ is supported on A (i.e. $\mu(\mathbb{R}^d\setminus A) = 0$), since otherwise we can replace µ by $\mu|_A$. Assume also that µ(A) < ∞; we will remove this assumption later.
We will define by induction an increasing sequence $\mathcal{F}_1 \subseteq \mathcal{F}_2 \subseteq \ldots$ of finite sub-collections of $\mathcal{E}$, each consisting of pairwise disjoint balls. At each step, we apply the previous corollary to a large subset of the set that has not yet been covered. We will then show that $\mathcal{F} = \bigcup_{k=1}^\infty\mathcal{F}_k$ has the desired properties.
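Let C be the constant from the previous corollary. To begin, let $\mathcal{F}_1$ be the result of applying the previous corollary to $\mathcal{E}$, so
$$\mu\Big(\bigcup_{F\in\mathcal{F}_1}F\Big) > \frac{1}{C}\mu(A),$$
so that
$$\mu\Big(A\setminus\bigcup_{F\in\mathcal{F}_1}F\Big) < \Big(1-\frac{1}{C}\Big)\mu(A).$$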
Assuming $\mathcal{F}_k$ has been defined and writing $F_k = \bigcup_{F\in\mathcal{F}_k}F$, fix a parameter δ > 0 with the property that
$$\Big(1-\frac{1-\delta}{C}\Big)\mu(A\setminus F_k) < \Big(1-\frac{1}{C}\Big)^{k+1}\mu(A).$$
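Since µ is Radon and $F_k$ is closed (it is a finite union of closed balls), there exists an ε > 0 such that, writing $F_k^{(\varepsilon)}$ for the closed ε-neighborhood of $F_k$,
$$\mu(A\setminus F_k^{(\varepsilon)}) > (1-\delta)\mu(A\setminus F_k).$$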
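By assumption, the collection of balls in $\mathcal{E}$ whose radius is < ε and whose center is in $A\setminus F_k^{(\varepsilon)}$ is a Besicovitch cover of $A\setminus F_k^{(\varepsilon)}$. Apply the previous corollary to this collection and the set $A\setminus F_k^{(\varepsilon)}$. We obtain a finite, disjoint collection of balls $\mathcal{F}'_k \subseteq \mathcal{E}$ such that
$$\mu\Big(\bigcup_{F\in\mathcal{F}'_k}F\Big) > \frac{1}{C}\mu(A\setminus F_k^{(\varepsilon)}) > \frac{1-\delta}{C}\mu(A\setminus F_k).$$
As the elements of $\mathcal{F}'_k$ are of radius < ε and have centers in $A\setminus F_k^{(\varepsilon)}$, they are disjoint from $F_k$.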
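Remark 5.10. To see that the Besicovitch theorem is not valid for families of open balls, consider the measure on [0, 1] given by $\mu = \frac12\delta_0 + \sum_{n=1}^\infty2^{-n-1}\delta_{1/n}$. Consider also the collection of open balls $\mathcal{E} = \{B^\circ_{1/n}(0)\}_{n\ge1}\cup\bigcup_{n=1}^\infty\{B^\circ_{1/k}(1/n)\}_{k>n}$. Any sub-collection $\mathcal{F}$ whose union has full µ-measure must contain $B^\circ_{1/n}(0)$ for some n, since it must cover 0. It must also cover 1/n, which does not lie in $B^\circ_{1/n}(0)$, so it must contain a ball from the second family that contains 1/n. That ball is an open interval containing 1/n, hence it meets $B^\circ_{1/n}(0)$, and so $\mathcal{F}$ is not disjoint.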
The results of this section should be compared to the Vitali covering lemma:
Lemma 5.11 (Vitali covering lemma). Let A be a subset of a metric space, and let $\{B_{r(x)}(x)\}_{x\in A}$ be a collection of balls with centers in A such that $\sup_{x\in A}r(x) < \infty$. Then one can find a subset A′ ⊆ A such that the balls $\{B_{r(x)}(x)\}_{x\in A'}$ are pairwise disjoint and $\bigcup_{x\in A}B_{r(x)}(x) \subseteq \bigcup_{x\in A'}B_{5r(x)}(x)$.
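For a finite family of balls, the selection in the Vitali lemma can be done greedily, and then the factor 3 already suffices. The factor 5 in the statement is what the standard argument gives for infinite families with bounded radii. The sketch below uses intervals in R, random data and hypothetical function names, all of which are our own choices. It picks balls in order of decreasing radius and keeps a ball when it is disjoint from those already kept. It then checks that the 3-fold enlargements of the kept balls cover every ball in the family.

```python
import random

def vitali_select(balls):
    """Greedy selection: scan (center, radius) pairs by decreasing radius and keep a
    closed ball if it is disjoint from every ball kept so far."""
    chosen = []
    for c, r in sorted(balls, key=lambda ball: -ball[1]):
        if all(abs(c - c2) > r + r2 for c2, r2 in chosen):
            chosen.append((c, r))
    return chosen

def covered_by_enlargements(balls, chosen, factor):
    """True if every ball B_r(c) lies inside B_{factor * r'}(c') for some kept (c', r')."""
    return all(
        any(abs(c - c2) + r <= factor * r2 + 1e-12 for c2, r2 in chosen)
        for c, r in balls
    )

random.seed(5)
balls = [(random.uniform(0, 10), random.uniform(0.01, 1)) for _ in range(200)]
chosen = vitali_select(balls)
print(len(chosen), "disjoint balls kept;", covered_by_enlargements(balls, chosen, 3))
```

The key point is that a discarded ball meets a kept ball whose radius is at least as large. This is why the scan goes by decreasing radius.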
This lemma is enough to derive an analog of Theorem 5.9 when the measure of a
ball varies fairly regularly with the radius. Specifically,
Theorem 5.12 (Vitali covering theorem). Let µ be a measure such that $\mu(B_{3r}(x)) \le c\mu(B_r(x))$ for some constant c. Let $\{B_{r(x)}(x)\}_{x\in A}$ be as in the Vitali lemma, with A a Borel set. Then there is a set of centers A′ ⊆ A such that $\{B_{r(x)}(x)\}_{x\in A'}$ is disjoint and $\mu(\bigcup_{x\in A'}B_{r(x)}(x)) > c^{-1}\mu(\bigcup_{x\in A}B_{r(x)}(x))$.
For a general set A ⊆ Rd and x ∈ A, small balls Br (x) may intersect both A and its
complement. So, no matter how “close” you get to x, you will not be able to avoid
seeing some of the complement. For example if A is a half plane and x is a point on the
boundary of A then Br (x) ∩ A is exactly “half” of Br (x); “half” is exactly true if we
measure it with respect to Lebesgue measure. For another example, consider A = Q in
the line. Then for x ∈ A, both A and R \ A are dense in every ball Br (x).
Nevertheless, for Lebesgue measure λ there is a weaker form of separation between A and $\mathbb{R}^d\setminus A$ that holds at a.e. point. Let $\mu = \lambda|_A$ and write c for the volume of the unit ball. Then the Lebesgue density theorem states that
$$\lim_{r\to0}\frac{\mu(B_r(x))}{cr^d} = 1$$
for λ-a.e. x ∈ A, or, equivalently, for µ-a.e. x. This implies that $\lambda(B_r(x)\setminus A)/cr^d \to 0$
as r → 0 for µ-a.e. x. Thus, if we look at small balls around a µ-typical point, we see
measures which have an asymptotically negligible contribution from Rd \ A.
In this section we establish similar results for general Radon measures in $\mathbb{R}^d$. Note that in the limits above, $cr^d = \lambda(B_r(x))$, so we can restate the Lebesgue density theorem as
$$\lim_{r\to0}\frac{\lambda(B_r(x)\cap A)}{\lambda(B_r(x))} = 1\qquad\text{for }\lambda\text{-a.e. } x\in A.$$
This is the form that our results for general measures will take.
Let µ be a finite measure on $\mathbb{R}^d$ and $f \in L^1(\mu)$. Define
$$f^+(x) = \limsup_{r\to0}\frac{1}{\mu(B_r(x))}\int_{B_r(x)}f\,d\mu,\qquad f^-(x) = \liminf_{r\to0}\frac{1}{\mu(B_r(x))}\int_{B_r(x)}f\,d\mu.$$
Proof. First, for each r > 0, we claim that fr is measurable. It suffices to prove this for
f ≥ 0, since a general function can be decomposed into positive and negative parts.
We claim that, in fact, if f ≥ 0 then fr is upper semi-continuous (i.e. fr−1 ((−∞, t))
is open for all t), which implies measurability. To see this note that if xn → x and
s > r, then Br (xn ) ⊆ Bs (x) for large enough n, which implies fr (xn ) ≤ fs (x). Thus
Since $\int_{B_r(x)}f\,d\mu/\mu(B_r(x)) = f_r(x)/g_r(x)$, where g ≡ 1, we see that $f^\pm$ are upper and lower limits of the measurable functions $f_r/g_r$ as r → 0 along the rationals. Hence $f^\pm$ are measurable.
It is easy to verify that f − (x) ≥ f (x) holds µ-a.e. if and only if µ(Aa,b ) = 0 for all
0 < a < b.
Suppose then that $\mu(A_{a,b}) > 0$ for some a < b, and let U be an open set containing $A_{a,b}$. By definition of $A_{a,b}$, for every $x \in A_{a,b}$ there are arbitrarily small radii r such that $B_r(x) \subseteq U$ and $f_r(x) < a\mu(B_r(x))$. Applying the Besicovitch covering theorem to the collection of these balls, we obtain a disjoint sequence of balls $\{B_{r_i}(x_i)\}_{i=1}^\infty$ such that $A_{a,b} \subseteq \bigcup_{i=1}^\infty B_{r_i}(x_i) \subseteq U$ up to a µ-null set, and $\int_{B_{r_i}(x_i)}f\,d\mu = f_{r_i}(x_i) < a\mu(B_{r_i}(x_i))$ for each i. Now,
$$\begin{aligned}
b\cdot\mu(A_{a,b}) &< \int_{A_{a,b}}f\,d\mu\\
&\le \sum_{i=1}^\infty\int_{B_{r_i}(x_i)}f\,d\mu\\
&< a\cdot\sum_{i=1}^\infty\mu(B_{r_i}(x_i))\\
&\le a\cdot\mu(U).
\end{aligned}$$
Since µ is regular, we can find open neighborhoods U of Aa,b with µ(U ) arbitrarily close
to µ(Aa,b ). Hence, the inequality above shows that b · µ(Aa,b ) ≤ a · µ(Aa,b ), which is
impossible. Therefore µ(Aa,b ) = 0, and we have proved that f − ≥ f µ-a.e.
Similarly for a < b define
Then f + (x) = f (x) µ-a.e. unless µ(A′a,b ) > 0 for some a < b. Suppose such a, b exist
and let U and $\{B_{r_i}(x_i)\}_{i=1}^\infty$ be defined analogously for $A'_{a,b}$. Then
$$\int_U f\,d\mu \ge \sum_{i=1}^\infty\int_{B_{r_i}(x_i)}f\,d\mu > b\cdot\sum_{i=1}^\infty\mu(B_{r_i}(x_i)) \ge b\cdot\mu(A'_{a,b}).$$
On the other hand, by regularity and the dominated convergence theorem, we can find U as above such that $\int_U f\,d\mu$ is arbitrarily close to $\int_{A'_{a,b}}f\,d\mu < a\cdot\mu(A'_{a,b})$, and we again obtain a contradiction. Thus $f^+ \le f$ µ-a.e.
We have shown that f − (x) ≥ f (x) ≥ f + (x) µ-a.e. On the other hand, f − ≤ f +
everywhere. Thus µ-a.e. we have f − ≤ f + ≤ f ≤ f − , so we have equality throughout.
The formulation of the theorem makes sense in any metric space, but it does not hold in such generality. The main cases in which it holds are Euclidean spaces and ultrametric spaces. In ultrametric spaces, balls of a fixed radius form a partition of the space, so the Besicovitch theorem holds trivially.
$$\lim_{r\to0}\frac{\mu(B_r(x)\cap A)}{\mu(B_r(x))} = 1$$
$$\lim_{r\to0}\frac{\nu(B_r(x))}{\mu(B_r(x))}$$
exists and is positive and finite for ν-a.e. x, and in this case,
$$\lim_{r\to0}\frac{\nu(B_r(x))}{\mu(B_r(x))} = \frac{d\nu}{d\mu}(x)$$
$$\lim_{r\to0}\frac{\nu(B_r(x))}{r^d}$$
Proof. Suppose that ν ≪ µ and set f = dν/dµ. Then by Theorem 5.14 we have
$$\lim_{r\to0}\frac{\nu(B_r(x))}{\mu(B_r(x))} = \lim_{r\to0}\frac{\int_{B_r(x)}f\,d\mu}{\mu(B_r(x))} = f(x)\qquad\mu\text{-a.e.}$$
The set where the limit exists and f is positive has full ν-measure, proving the claim.
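Now suppose that ν is not absolutely continuous with respect to µ. Then there is a set A with µ(A) = 0 and ν(A) > 0. Since ν(B ∩ A) = (µ + ν)(B ∩ A) for every set B, by the density theorem we have, for (µ + ν)-a.e. x ∈ A (equivalently ν-a.e. x ∈ A),
$$\lim_{r\to0}\frac{\nu(B_r(x)\cap A)}{(\mu+\nu)(B_r(x))} = \lim_{r\to0}\frac{(\mu+\nu)(B_r(x)\cap A)}{(\mu+\nu)(B_r(x))} = 1.$$
Also
$$\lim_{r\to0}\frac{\nu(B_r(x)\cap A)}{\nu(B_r(x))} = 1$$
for ν-a.e. x ∈ A, so for such x,
$$\lim_{r\to0}\frac{\nu(B_r(x))}{(\mu+\nu)(B_r(x))} = 1.$$
This implies that $\mu(B_r(x))/\nu(B_r(x)) \to 0$ for ν-a.e. x ∈ A, or equivalently, $\nu(B_r(x))/\mu(B_r(x)) \to \infty$, so the conclusion fails.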
The last statement follows from the first using the fact that λ(Br (x)) = c · rd .
integer base. Then for µ-a.e. x we have
$$\lim_{n\to\infty}\frac{1}{\mu(D_{b^n}(x))}\int_{D_{b^n}(x)}f\,d\mu = f(x)$$
$$\lim_{n\to\infty}\frac{\mu(D_{b^n}(x)\cap A)}{\mu(D_{b^n}(x))} = 1$$
Similarly, the other corollary and proposition above hold along b-adic cubes. The proofs are identical to the one above, using Lemma 3.6 instead of the Besicovitch covering lemma. Alternatively, this is a consequence of the martingale convergence theorem.
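As an illustration of differentiation along b-adic cells, we make some choices of our own: µ is Lebesgue measure on [0, 1), b = 2 and f(x) = x². The average of f over the dyadic interval [a, c) containing x is (a² + ac + c²)/3. Both this average and x² lie in [a², c²], so they differ by at most $c^2 - a^2 \le 2\cdot2^{-n}$. The sketch computes these averages exactly with Python's `fractions.Fraction`; the function names are hypothetical.

```python
import math
from fractions import Fraction

def dyadic_cell(x, n, b=2):
    """Endpoints a < c of the level-n b-adic interval [a, c) that contains x."""
    k = math.floor(x * b ** n)
    return Fraction(k, b ** n), Fraction(k + 1, b ** n)

def average_of_square(a, c):
    """(1 / (c - a)) * integral_a^c t^2 dt = (a^2 + a c + c^2) / 3."""
    return (a * a + a * c + c * c) / 3

x = Fraction(1, 3)
for n in (1, 5, 10, 20):
    a, c = dyadic_cell(x, n)
    print(n, float(average_of_square(a, c) - x * x))  # error is at most 2 * 2^{-n}
```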
Thus dim(µ, x) = α means that the decay of µ-mass of balls around x scales no
slower than rα , i.e. for every ε > 0, we have µ(Br (x)) ≤ rα−ε for all sufficiently small
r, but µ(Br (x)) ≥ rα+ε for arbitrarily small r.
Remark 6.2. 1. One can also define the upper pointwise dimension using lim sup, but we shall not have use for it.
2. In many of the cases we consider, the limit (5) exists, and there is no need for lim sup or lim inf.
Example 6.3. 1. If $\mu = \delta_u$ is the point mass at u, then $\mu(B_r(u)) = 1$ for all r, hence dim(µ, u) = 0.
2. If µ is Lebesgue measure on $\mathbb{R}^d$ then for any x, $\mu(B_r(x)) = cr^d$, so dim(µ, x) = d.
3. Let $\mu = \lambda + \delta_0$, where λ is the Lebesgue measure on the unit ball. If x ≠ 0 is in the unit ball, then $\mu(B_r(x)) = \lambda(B_r(x))$ for small enough r, so dim(µ, x) = dim(λ, x) = d. On the other hand $\mu(B_r(0)) = \lambda(B_r(0)) + 1$, so dim(µ, 0) = 0.
This example shows that in general the pointwise dimension can depend on the
point.
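The dichotomy in Example 6.3(3) is visible numerically, although the convergence is only logarithmic. In this sketch, d = 1, λ is Lebesgue measure on [−1, 1], and the function names are hypothetical. We evaluate log µ(B_r(x))/log r for decreasing r. At x = 0.5 the ratio equals 1 − log 2/|log r|, which creeps up to 1. At x = 0 it equals log(1 + 2r)/log r, which tends to 0.

```python
import math

def mu_ball(x, r):
    """mu(B_r(x)) for mu = (Lebesgue measure on [-1, 1]) + (unit point mass at 0), d = 1."""
    lebesgue = max(min(x + r, 1.0) - max(x - r, -1.0), 0.0)
    atom = 1.0 if abs(x) <= r else 0.0
    return lebesgue + atom

def dim_ratio(x, r):
    """log mu(B_r(x)) / log r, whose lim inf as r -> 0 is dim(mu, x)."""
    return math.log(mu_ball(x, r)) / math.log(r)

for r in (1e-2, 1e-4, 1e-8, 1e-12):
    print(r, dim_ratio(0.5, r), dim_ratio(0.0, r))
```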
Lemma 6.4. If ν, µ are Radon measures and ν ≪ µ, then dim(ν, x) = dim(µ, x) for ν-a.e. x.
In particular, if µ(A) > 0 and $\nu = \mu|_A$, then dim(µ, x) = dim(ν, x) for µ-a.e. x ∈ A.
Proof. Let f = dν/dµ. By Proposition 5.16, $\lim_{r\to0}\nu(B_r(x))/\mu(B_r(x)) = f(x) \in (0,\infty)$ for ν-a.e. x. Taking logarithms and dividing by log r, we have
$$\lim_{r\to0}\Big(\frac{\log\nu(B_r(x))}{\log r} - \frac{\log\mu(B_r(x))}{\log r}\Big) = \lim_{r\to0}\frac{\log(f(x)+o(1))}{\log r} = 0\qquad\nu\text{-a.e. } x.$$
Thus the limits inferior of the two terms are equal, giving dim(ν, x) = dim(µ, x), as claimed.
We saw that Hausdorff dimension of sets may be defined using b-adic cells rather
than arbitrary sets. We now show that pointwise dimension can similarly be defined
using decay of mass along b-adic cells rather than balls.
Note that we may have x ∈ supp µ and $\mu(D_{b^n}(x)) = 0$ for some b, n, so $\dim_b(\mu, x)$ may not be defined on all of supp µ. However, it is defined µ-a.e. There are only countably many b-adic cubes, so the union of those of measure zero is µ-null. Hence µ-a.e. x belongs only to cells of positive measure.
In general $\dim(\mu, x) \ne \dim_b(\mu, x)$. Nevertheless, at most points the notions agree:
Proof. We have $D_{b^n}(x) \subseteq B_{c\cdot b^{-n}}(x)$. Therefore $\mu(D_{b^n}(x)) \le \mu(B_{c\cdot b^{-n}}(x))$, hence
We want to prove that equality holds a.e., so suppose it does not.
Then we can find an α and an ε > 0, and a set A with µ(A) > 0, such that $\dim_b(\mu, x) > \alpha + 3\varepsilon$ and dim(µ, x) < α + ε for x ∈ A.
Apply Egorov's theorem to the limits in the definition of $\dim_b$, and replace A by a set of slightly smaller but still positive measure. We may then assume that there is an $r_0 > 0$ such that $\mu(D_{b^n}(x)) < b^{-n(\alpha+2\varepsilon)}$ for every x ∈ A and every n satisfying $b^{-n} < r_0$.
Let $\nu = \mu|_A$ and let x be ν-typical.
By Lemma 6.4, dim(ν, x) = dim(µ, x) < α + ε, so there are arbitrarily large k for which
$$\nu(B_{b^{-k}}(x)) \ge b^{-k(\alpha+\varepsilon)}.$$
The number of cells on the last line is at most $2^d$, so we have found that if $b^{-k} < r_0$ then
$$\nu(B_{b^{-k}}(x)) < 2^db^{-k(\alpha+2\varepsilon)}.$$
Combining the two bounds, for arbitrarily large k we have $b^{-k(\alpha+\varepsilon)} < 2^d\cdot b^{-k(\alpha+2\varepsilon)}$, that is, $b^{k\varepsilon} < 2^d$. This is impossible once k is large.
As a consequence,
1. The analog of Lemma 6.4 holds for dimb (this could also be derived directly from
the differentiation theorem along b-adic cells).
2. The pointwise dimension of µ is a.s. independent of the norm used in the definition.
This follows since the equivalence with dimb is valid in any norm.
Exercises
This shows that the pointwise dimension of a measure on $\mathbb{R}^d$ does not have to be ≤ d at every point. It is, however, ≤ d at a.e. point, as will be shown in the next section.
6.2 Upper and lower dimension of measures
Having defined dimension at a point, we now turn to global notions of dimension for measures. These are defined as the largest and smallest pointwise dimension, after ignoring a set of points of measure zero.
Recall that if f is a measurable function on a measure space (X, B, µ), then the essential supremum of f is
$$\operatorname{esssup}_{x\sim\mu}f(x) = \inf\{t : \mu(\{x : f(x) > t\}) = 0\},$$
and the essential infimum is defined analogously.
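Definition 6.7. The upper and lower Hausdorff dimension of a Radon measure µ are defined by
$$\overline{\dim}\,\mu = \operatorname{esssup}_{x\sim\mu}\dim(\mu,x),\qquad\underline{\dim}\,\mu = \operatorname{essinf}_{x\sim\mu}\dim(\mu,x).$$
If $\overline{\dim}\,\mu = \underline{\dim}\,\mu$, then their common value is called the pointwise dimension of µ and is denoted dim µ.
To see that these two quantities need not agree, take $\mu = \lambda + \delta_0$, where λ is Lebesgue measure. Then $\underline{\dim}\,\mu = 0$, because dim(µ, 0) = 0 and µ({0}) > 0. Also $\overline{\dim}\,\mu = d$, because dim(µ, x) = d for every $x \in \mathbb{R}^d\setminus\{0\}$.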
and for any µ ∈ P(Rd ),
Proof. For the first part, note that trivially we have $\underline{\dim}\,\mu \le \overline{\dim}\,\mu$, so
On the other hand, by Frostman’s lemma, for every ε > 0 there is a (dim A − ε)-regular
measure µ supported on A (we only proved this for closed A, but it is true for Borel
sets as well). Thus dim µ ≥ dim A − ε. Since ε was arbitrary, we have shown that
Combining the inequalities in the last three displays, we have proved the first part
of the proposition.
For the second part write $\alpha = \overline{\dim}\,\mu$. We begin with the first identity. Let
A0 = {x ∈ A : dim(µ, x) ≤ α}
α ≥ inf{dim A : µ(Rd \ A) = 0}
On the other hand if A is a set such that µ(Rd \ A) = 0, then the essential supremum
of dim(µ, x) for x ∈ A is α, so for every ε > 0 there is a subset Aε ⊆ A of positive
measure such that dim(µ, x) ≥ α − ε for x ∈ Aε . By the lower bound in Billingsley’s
lemma, dim Aε ≥ α − ε, and since dim A ≥ dim Aε , we have dim A ≥ α − ε. Since ε was
arbitrary, dim A ≥ α. This shows that
α ≤ inf{dim A : µ(Rd \ A) = 0}
proving the first identity.
For the second identity write $\beta = \underline{\dim}\,\mu$. If µ(A) > 0, then after removing a set of measure 0 from A we have dim(µ, x) ≥ β for x ∈ A, so by Billingsley's lemma, dim A ≥ β. This shows that
$$\beta \le \inf\{\dim A : \mu(A) > 0\}.$$
Given ε > 0, we can find a set $A_\varepsilon$ of positive measure such that dim(µ, x) ≤ β + ε for $x \in A_\varepsilon$, and then by Billingsley's lemma $\dim A_\varepsilon \le \beta + \varepsilon$. Since ε was arbitrary, this shows that
$$\beta \ge \inf\{\dim A : \mu(A) > 0\}$$
Proof. Otherwise, for some ε > 0, we would have dim(µ, x) > d + ε for all x in some set A of positive µ-measure. By the lower bound in Billingsley's lemma, dim A ≥ d + ε, which is impossible for $A \subseteq \mathbb{R}^d$.
Proof. We can find pairwise disjoint sets $A, A_0, A_1$ such that $\mu|_A \sim \nu_0|_A \sim \nu_1|_A$, $\mu|_{A_1} \perp \nu_0$ and $\mu|_{A_0} \perp \nu_1$. By the previous corollaries, for µ-a.e. x ∈ A we have $\dim(\mu, x) = \dim(\nu_0, x) = \dim(\nu_1, x)$. For µ-a.e. $x \in A_0$ we have $\dim(\mu, x) = \dim(\nu_0, x)$, and for µ-a.e. $x \in A_1$ we have $\dim(\mu, x) = \dim(\nu_1, x)$. The claim follows from the definitions.
The proof for countable sums is similar.
If $\mu = \int \nu_\omega\,dP(\omega)$, we use Proposition 6.9. If µ(A) > 0 then ν_ω(A) > 0 for a set of ω
with positive P-measure. For each such ω we have dim A ≥ dim ν_ω, and it follows that
$\dim A \ge \operatorname*{ess\,inf}_{\omega\sim P}\dim\nu_\omega$; the inequality $\dim\mu \ge \operatorname*{ess\,inf}_{\omega\sim P}\dim\nu_\omega$ now follows from Proposition 6.9. The other inequality
is proved similarly by considering sets A with µ(R^d ∖ A) = 0.
The inequality in the corollary is not generally an equality: every measure µ can
be written as $\mu = \int \delta_x\,d\mu(x)$, but $\operatorname*{ess\,inf}_{x\sim\mu}\dim\delta_x = 0$ can be strictly less than dim µ.
Exercises
1. ?
7 Hausdorff measures
7.1 Hausdorff measure
We return temporarily to the metric space setting. The definition of $\mathcal{H}^\alpha_\infty$ was closely
modeled after the definition of Lebesgue measure, but as we noted, it is not a measure
on the Borel sets. A slight modification of the definition yields a true measure, which is
often viewed as the α-dimensional analog of Lebesgue measure. For δ > 0 let
$$\mathcal{H}^\alpha_\delta(A) = \inf\Big\{\sum_{E\in\mathcal{E}}|E|^\alpha : \mathcal{E}\text{ is a cover of }A\text{ by sets of diameter}\le\delta\Big\}$$
This is an outer measure for every δ > 0, but the Borel sets are not necessarily measurable with respect to $\mathcal{H}^\alpha_\delta$.
Decreasing δ means that the infimum in the definition of $\mathcal{H}^\alpha_\delta$ is taken over a smaller
family of covers, so $\mathcal{H}^\alpha_\delta$ is non-decreasing as δ ↘ 0. Thus the limit
$$\mathcal{H}^\alpha(A) = \lim_{\delta\searrow 0}\mathcal{H}^\alpha_\delta(A) = \sup_{\delta>0}\mathcal{H}^\alpha_\delta(A)$$
exists in [0, ∞].
Definition 7.1. The measure Hα on the Borel σ-algebra is called the α-dimensional
Hausdorff measure.
In particular,
Proof. A calculation like the one in Lemma 2.8 shows that for δ ≤ 1,
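The first inequality and the two implications follow from this, since $\delta^{\beta-\alpha}\to 0$ as δ → 0.
The second part follows from the first and the trivial inequalities $\mathcal{H}^\alpha(A) \ge \mathcal{H}^\alpha_\infty(A)$ and
$\mathcal{H}^\beta(A) \ge \mathcal{H}^\beta_\infty(A)$.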
The proposition implies that Hα is α-dimensional in the sense that every set of
dimension < α has Hα -measure 0. We will discuss its dimension more below. We note
a slight sharpening of (6):
Proof. The first statement is immediate since $\mathcal{H}^0_\delta(A) = N(A,\delta)$. It is clear from the
definition that $\mathcal{H}^\alpha$ is translation invariant, and it is well known that, up to normalization,
Lebesgue measure is the only σ-finite non-zero translation-invariant Borel measure on
R^d. It is easily shown that $\mathcal{H}^d(B_r(0)) < \infty$ for every r > 0, so $\mathcal{H}^d$ is σ-finite. Also, by
definition, $\mathcal{H}^d_\delta \ge \lambda$ for every δ > 0, so $\mathcal{H}^d \ge \lambda$, and in particular $\mathcal{H}^d \ne 0$. Hence $\mathcal{H}^d$
is equal to a multiple of Lebesgue measure. Finally, Lemma 7.2 implies that $\mathcal{H}^\alpha$ is not
equivalent to $\mathcal{H}^d$ for α < d, so it cannot be σ-finite, and one may verify directly that
$\mathcal{H}^\alpha(\{x\}) = 0$ for α > 0.
Exercises
Definition 7.5. Given α > 0, a measure µ and x ∈ R^d, the upper and lower α-dimensional densities of µ at x are
$$D^+_\alpha(\mu,x) = \limsup_{r\to 0}\frac{\mu(B_r(x))}{(2r)^\alpha}, \qquad D^-_\alpha(\mu,x) = \liminf_{r\to 0}\frac{\mu(B_r(x))}{(2r)^\alpha}.$$
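Lemma 7.6. If $D^+_\alpha(\mu,x) < \infty$ then dim(µ, x) ≥ α, and if $D^+_\alpha(\mu,x) > 0$ then dim(µ, x) ≤ α.
Proof. If $D^+_\alpha(\mu,x) < t < \infty$ then for small enough r we have $\mu(B_r(x)) < t(2r)^\alpha$. Taking
logarithms and dividing by log r < 0 (which reverses the inequality), we have
$$\frac{\log\mu(B_r(x))}{\log r} > \frac{\log t + \alpha\log(2r)}{\log r} = \alpha + \frac{\log t + \alpha\log 2}{\log r}$$
for all small enough r. The last term tends to 0 as r → 0, so dim(µ, x) ≥ α. The other inequality follows similarly.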
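To see Lemma 7.6 in a concrete case, the sketch below assumes µ is the middle-third Cantor measure, whose distribution function is the Cantor function (this choice, and the helper names `cantor_cdf` and `log_ratio_at_zero`, are illustrative and not from the text). Along r = 3^{−n} one has µ(B_r(0)) = 2^{−n}, so µ(B_r(0))/(2r)^α = 2^{−α} for α = log 2/log 3, and the ratio log µ(B_r(0))/log r equals α. Exact rational arithmetic avoids rounding errors in the ternary digits.

```python
from fractions import Fraction
import math


def cantor_cdf(x: Fraction, depth: int = 200) -> Fraction:
    """F(x) = mu([0, x]) for the middle-third Cantor measure mu (the Cantor function).

    Reads the ternary digits of x exactly: digit 0 -> left copy, digit 2 -> right
    copy (adding the mass 2^-k of the left copy), digit 1 -> x lies in a removed gap.
    """
    if x <= 0:
        return Fraction(0)
    if x >= 1:
        return Fraction(1)
    total = Fraction(0)
    for k in range(1, depth + 1):
        x *= 3
        digit = int(x)          # floor, since x >= 0
        x -= digit
        if digit == 1:          # inside a gap, where F is constant
            return total + Fraction(1, 2**k)
        total += Fraction(digit // 2, 2**k)
        if x == 0:              # terminating ternary expansion
            return total
    return total


def log_ratio_at_zero(n: int) -> float:
    """log mu(B_r(0)) / log r at r = 3^-n; mu has no atoms, so mu(B_r(0)) = F(r)."""
    r = Fraction(1, 3**n)
    return math.log(float(cantor_cdf(r))) / math.log(float(r))


alpha = math.log(2) / math.log(3)   # dimension of the middle-third Cantor set
for n in (1, 5, 20):
    print(n, log_ratio_at_zero(n), alpha)
```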
The quantity $D^-_\alpha$ is similarly related to the upper pointwise dimension. Of the two
quantities, $D^+_\alpha$ is more meaningful, as demonstrated in the next two theorems, which
essentially characterize measures for which $D^+_\alpha$ is positive and finite a.e.
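$$D^+_\alpha(\mu,x) > s\ \text{ for all } x\in A \implies \mathcal{H}^\alpha(A) \le \frac{C}{s}\cdot\mu(A)$$
$$D^+_\alpha(\mu,x) < t\ \text{ for all } x\in A \implies \mathcal{H}^\alpha(A) \ge \frac{1}{2^\alpha t}\cdot\mu(A)$$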
In particular, if
$$0 < \inf_{x\in A}D^+_\alpha(\mu,x) \le \sup_{x\in A}D^+_\alpha(\mu,x) < \infty$$
then $\mu \sim \mathcal{H}^\alpha|_A$.
Proof. The proof is similar to that of Billingsley’s lemma, combined with an appropriate
covering lemma.
For the first statement fix an open neighborhood U of A, and for δ > 0 let
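$$\sum_{E\in\mathcal{E}}|E|^\alpha \ge \frac{1}{2^\alpha}\sum_{F\in\mathcal{F}}|F|^\alpha > \frac{1}{2^\alpha t}\sum_{F\in\mathcal{F}}\mu(F) \ge \frac{1}{2^\alpha t}\,\mu(A_{1/n})$$
Taking the infimum over such covers $\mathcal{E}$ we have $\mathcal{H}^\alpha_\delta(A_{1/n}) \ge 2^{-\alpha}t^{-1}\mu(A_{1/n})$. Since this
holds for all δ < 1/2n, we have $\mathcal{H}^\alpha(A_{1/n}) \ge 2^{-\alpha}t^{-1}\mu(A_{1/n})$. Letting n → ∞ gives the
conclusion.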
For the last statement, note that the previous parts apply to any Borel subset
A′ ⊆ A. Thus µ(A′) = 0 if and only if $\mathcal{H}^\alpha(A') = 0$, that is, $\mu \sim \mathcal{H}^\alpha|_A$.
We will use the theorem later to prove absolute continuity of certain measures with
respect to Lebesgue measure.
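Theorem 7.8. Let A ⊆ R^d, α = dim A, and suppose that $0 < \mathcal{H}^\alpha(A) < \infty$. Let
$\mu = \mathcal{H}^\alpha|_A$. Then for µ-a.e. x,
$$2^{-\alpha} \le D^+_\alpha(\mu,x) \le C$$
Proof. Let
$$A_t = \{x\in A : D^+_\alpha(\mu,x) > t\}.$$
By the previous theorem,
$$\mu(A_t) = \mathcal{H}^\alpha(A_t) \le \frac{C}{t}\,\mu(A_t),$$
so µ(A_t) = 0 whenever t > C.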
We remark that the constant C in Theorem 7.8 can be taken to be 1, but this
requires a more careful analysis, see ??. Any lower bound must be strictly less than 1
by Theorem 7.10 below. The optimal lower bound is not known.??
Theorem 7.10 (Preiss). If µ is a measure on R^d and $\lim_{r\to 0}\mu(B_r(x))/r^\alpha$ exists µ-a.e.,
then α is an integer and µ is Hausdorff measure on the graph of a Lipschitz function.
Hausdorff measure; this is the case for the self-affine sets discussed in Section ??, see
??.
Another interesting result is that any Borel set of positive Hα measure contains a
Borel subset of positive finite Hα measure; see ??. Thus the measure in the conclusion
of Frostman’s lemma can always be taken to be the restriction of Hα to a finite measure
set. This lends some further support to the idea that Hα is the canonical α-dimensional
measure on Rd .
We end the discussion of Hausdorff measures with an interesting fact that is purely
measure-theoretic. Recall that measure spaces (Ω, F, µ) and (Ω′, F′, µ′) are isomorphic
if there is a bijection f : Ω → Ω′ such that f and f^{−1} are measurable, f induces a bijection
F → F′, and fµ = µ′.
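Theorem 7.11. Let B denote the Borel σ-algebra of R and B_α its completion with
respect to $\mathcal{H}^\alpha$. If 0 ≤ α < β ≤ 1 then $(\mathbb{R},\mathcal{B},\mathcal{H}^\alpha)\not\cong(\mathbb{R},\mathcal{B},\mathcal{H}^\beta)$, but $(\mathbb{R},\mathcal{B}_\alpha,\mathcal{H}^\alpha)\cong(\mathbb{R},\mathcal{B}_\beta,\mathcal{H}^\beta)$ for all 0 < α, β < 1.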
Up until now we have viewed Rd primarily as a metric space with special combinatorial
properties (e.g. Besicovitch lemma). We now change perspective, and turn to questions
which involve, directly or indirectly, the group or vector structure of Rd .
In this section we examine the behavior of sets and measures under linear maps. For
simplicity we consider the case of linear maps R2 → R, although many of the results
extend to general linear maps Rd → Rk , and we shall sometimes state them this way.
The basic heuristic at play here is that when one projects a set or measure via a
linear map, the image should be “as large as possible”. We will see a number of such
statements.
We parametrize linear maps in various ways as is convenient, but in all the parametrizations measures on the space of linear maps will be equivalent, so statements that
hold for a.e. linear map will be independent of the parametrization.
Denote the set of unit vectors in R^2 by S^1, and for u ∈ S^1 let π_u : R^2 → R denote the
linear functional
π_u(x) = x · u
Up to linear change of coordinates this is the orthogonal projection of x to the line Ru.
Lemma 8.1. Let f : X → Y be a Lipschitz map between compact metric spaces. Let
A ⊆ X and µ ∈ P(X). Then
1. dim fA ≤ min{dim Y, dim A}.
Thus, if we take the linear image of a set A or measure µ under a linear map, the
image will not be larger than the original object. The content of the following theorem
is that, typically, there is no other constraint.
Identify the set of unit vectors S^1 with angles [0, 2π), and denote the corresponding length
measure by λ.
Theorem 8.2 (Marstrand). If µ ∈ P(R2 ), then
An analogous result holds for π : R^d → R^{d′} and sets and measures in R^d, but we
will not prove it.
We emphasize that the theorem does not give any description of the directions u ∈ S 1
for which the conclusions hold, and neither does the proof give any hint how to identify
them. It may be that there are no “bad” u, or that this zero-measure set is actually
quite large (it may be dense, or have positive dimension). Identifying whether there are
any “bad” u and, if so, which ones they are, is often a very challenging problem.
The result for sets follows from the measure result using Frostman’s lemma, so it
suffices to prove the result for measures. For this we require the following definition.
Definition 8.3. For a compact metric space X and µ ∈ P(X), the t-energy of µ is
$$I_t(\mu) = \iint \frac{1}{d(x,y)^t}\,d\mu(x)\,d\mu(y)$$
In R^d this reduces to
$$I_t(\mu) = \iint \frac{1}{\|x-y\|^t}\,d\mu(x)\,d\mu(y)$$
2. The property that It (µ) is finite or infinite depends only on {(x, y) : d(x, y) ≤ 1}.
Although dim µ is not quite characterized by the behavior of the function t 7→ It (µ), it
nearly is:
2. If µ(Br (x)) ≤ c·rt for every x (with c independent of x) then Is (µ) < ∞ for s < t.
Proof. (1) Suppose dim µ < t. We wish to show that I_t(µ) = ∞. We may assume that
µ is non-atomic, since otherwise this certainly holds.
Fix s > 0 such that dim µ < s < t.
Fix a µ-typical x. For any sequence 1 = r_0 > q_0 ≥ r_1 > q_1 ≥ . . . ≥ r_n > q_n → 0 we
have
$$\int\frac{d\mu(y)}{d(x,y)^t} \ge \int_{B_1(x)}\frac{d\mu(y)}{d(x,y)^t} \ge \sum_{n=0}^\infty\int_{B_{r_n}(x)\setminus B_{q_n}(x)}\frac{d\mu(y)}{d(x,y)^t} \ge \sum_{n=0}^\infty\frac{1}{(2r_n)^t}\,\mu(B_{r_n}(x)\setminus B_{q_n}(x))$$
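Since dim µ < s, there is a set A of positive µ-measure such that for every x ∈ A there
is a c = c(x) > 0 with
$$\mu(B_r(x)) > c\,r^s\quad\text{for arbitrarily small } r > 0.$$
Fixing such an x ∈ A, and using that µ has no atoms (so µ(B_q(x)) → 0 as q → 0), we can choose a sequence of r_n, q_n as above satisfying
$$\mu(B_{r_n}(x)\setminus B_{q_n}(x)) \ge \tfrac12\,\mu(B_{r_n}(x)) > \tfrac{c}{2}\,r_n^s$$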
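Thus
$$\int\frac{d\mu(y)}{d(x,y)^t} \ge \sum_{n=0}^\infty\frac{1}{(2r_n)^t}\cdot\frac{c}{2}\,r_n^s = \frac{c}{2^{t+1}}\sum_{n=0}^\infty r_n^{s-t} = \infty,$$
since s < t and r_n → 0. Thus the integrand $\int d(x,y)^{-t}\,d\mu(y)$ in the definition of I_t(µ) is infinite on the positive-measure set A, so I_t(µ) = ∞.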
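(2) We perform essentially the same calculation. Let c, t be given, let s < t, and set r_n = 2^{−n} and q_n = r_{n+1} = 2^{−(n+1)}. Then, given x,
$$\begin{aligned}
\int\frac{d\mu(y)}{d(x,y)^s} &\le 1 + \int_{B_1(x)}\frac{d\mu(y)}{d(x,y)^s} = 1 + \sum_{n=0}^\infty\int_{B_{r_n}(x)\setminus B_{q_n}(x)}\frac{d\mu(y)}{d(x,y)^s}\\
&\le 1 + \sum_{n=0}^\infty\frac{1}{q_n^s}\,\mu(B_{r_n}(x)\setminus B_{q_n}(x)) \le 1 + \sum_{n=0}^\infty\frac{1}{q_n^s}\,\mu(B_{r_n}(x))\\
&\le 1 + c\sum_{n=0}^\infty 2^{s(n+1)}\cdot 2^{-tn} = 1 + 2^s c\sum_{n=0}^\infty 2^{-(t-s)n} < \infty.
\end{aligned}$$
The bound does not depend on x, so integrating over x gives I_s(µ) < ∞.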
Proof of the projection theorem. Let µ ∈ P(R^2) and dim µ > t for some t < 1. Our aim
is to show that dim π_u µ ≥ t for a.e. u ∈ S^1.
We first claim that we can assume without loss of generality that It (µ) < ∞.
Indeed, fix t′ with t < t′ < dim µ. Then dim(µ, x) > t′ for µ-a.e. x, and this means that for µ-a.e. x
there exists c = c(x) such that µ(B_r(x)) ≤ c r^{t′} for all r > 0.
By (repeated application of) Egorov’s theorem, we can choose pairwise disjoint sets
A_n ⊆ R^2 with $\mu(\bigcup_{n=1}^\infty A_n) = 1$, and such that the function c is bounded on each A_n.
The measures µ|_{A_n} are then t′-regular, hence I_t(µ|_{A_n}) < ∞ by Proposition 8.4, since t < t′.
On the other hand, if we knew for each n that dim π_u(µ|_{A_n}) ≥ t for a.e. u, then
for a.e. u the inequality would hold for all n, and, using the identity $\mu = \sum_{n=1}^\infty\mu|_{A_n}$,
for a.e. u we would have
$$\dim(\pi_u\mu) = \dim\pi_u\Big(\sum_n\mu|_{A_n}\Big) = \dim\sum_n\pi_u(\mu|_{A_n}) = \inf_n\dim\pi_u(\mu|_{A_n}) \ge t$$
Thus we have reduced the theorem to the case I_t(µ) < ∞, which we now assume.
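By Proposition 8.4 it suffices to show that I_t(π_u µ) < ∞ for a.e. u. Using Fubini,
$$\int_{S^1} I_t(\pi_u\mu)\,du = \iiint\frac{1}{|(x-y)\cdot u|^t}\,du\,d\mu(x)\,d\mu(y)$$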
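Now since t < 1, we have (using ‖u‖ = 1 and writing θ for the angle between u and v)
$$\int_{S^1}\frac{du}{|u\cdot v|^t} = \frac{1}{\|v\|^t}\int_0^{2\pi}|\cos\theta|^{-t}\,d\theta = \frac{c'}{\|v\|^t}$$
for a constant c′ < ∞ that does not depend on v. Continuing the
previous integration,
$$\iiint\frac{du\,d\mu(x)\,d\mu(y)}{|(x-y)\cdot u|^t} = c'\iint\frac{d\mu(x)\,d\mu(y)}{\|x-y\|^t} = c'\cdot I_t(\mu) < \infty.$$
Hence I_t(π_u µ) < ∞ for a.e. u ∈ S^1, and by Proposition 8.4, dim π_u µ ≥ t for a.e. u.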
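The constant c′ can be made explicit. Assuming the standard Beta-function identity ∫_0^{π/2} cos^{−t}θ dθ = ½B(½, (1−t)/2), one gets c′ = 2√π Γ((1−t)/2)/Γ(1−t/2), which is finite for t < 1 and blows up as t → 1. The sketch below (function names are hypothetical) evaluates this formula and cross-checks it with a crude midpoint rule.

```python
import math


def c_prime_closed_form(t: float) -> float:
    """c'(t) = integral of |cos(theta)|^(-t) over [0, 2*pi), for 0 <= t < 1.

    Each of the four quarter-periods contributes B(1/2, (1-t)/2) / 2, and
    B(1/2, (1-t)/2) = Gamma(1/2) Gamma((1-t)/2) / Gamma(1 - t/2).
    """
    if not 0 <= t < 1:
        raise ValueError("the integral is finite only for t < 1")
    return 2 * math.sqrt(math.pi) * math.gamma((1 - t) / 2) / math.gamma(1 - t / 2)


def c_prime_midpoint(t: float, n: int = 200_000) -> float:
    """Crude midpoint-rule check on one quarter [0, pi/2]; the singularity at pi/2 is integrable."""
    h = (math.pi / 2) / n
    quarter = h * sum(math.cos((k + 0.5) * h) ** (-t) for k in range(n))
    return 4 * quarter


for t in (0.0, 0.5, 0.9, 0.99):
    print(t, c_prime_closed_form(t))   # finite, but growing without bound as t -> 1
```

At t = 0 the formula gives 2π, the total length of S^1, as it should.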
Let A ⊆ R2 and π : R2 → R linear. Besides the dimension of πA, one may also be
interested in its topology (does it contain intervals?) or Lebesgue measure.
When dim A < 1 we have dim πA < 1, so Leb(πA) = 0, and of course πA cannot
contain an interval.
It turns out that when dim A ≥ 1 there are two cases, depending on whether dim A = 1
or dim A > 1. In the latter regime there is an elegant answer to the measure question.
Theorem 8.6 (Marstrand). If A ⊆ R^2 and dim A > 1 then Leb(π_u A) > 0 for a.e.
u ∈ S^1. Moreover if µ ∈ P(R^2) and dim µ > 1 then π_u µ ≪ λ for a.e. u ∈ S^1.
Now,
$$\mu_u\big([\pi_u(x)-r,\ \pi_u(x)+r]\big) = \int 1_{[\pi_u(x)-r,\,\pi_u(x)+r]}(\pi_u(y))\,d\mu(y)$$
and applying Fatou’s lemma, it is enough to prove that
$$\liminf_{r\to 0}\frac{1}{2r}\iint 1_{[\pi_u(x)-r,\,\pi_u(x)+r]}(\pi_u(y))\,d\mu(y)\,d\mu(x) < \infty,$$
or, equivalently,
$$\liminf_{r\to 0}\frac{1}{2r}\iint 1_{\{|\pi_u(x)-\pi_u(y)|\le r\}}\,d\mu(y)\,d\mu(x) < \infty.$$
This analysis gives a condition for absolute continuity of µ_u for fixed u ∈ S^1. In order
to prove absolute continuity for a.e. u, it is enough (again by Fatou’s lemma) to prove
$$\liminf_{r\to 0}\frac{1}{2r}\int_{S^1}\iint 1_{\{|\pi_u(x)-\pi_u(y)|\le r\}}\,d\mu(y)\,d\mu(x)\,du < \infty.$$
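But the inner integral is now easy to compute: for x, y fixed let v = x − y. Then
$$\int_{S^1}1_{\{|\pi_u(x)-\pi_u(y)|\le r\}}\,du = \int_{S^1}1_{\{|\pi_u(v)|\le r\}}\,du.$$
But
$$\pi_u v = \|v\|\cos\angle(u,v),$$
so this integral is the length of the set of angles θ with |cos θ| ≤ r/‖v‖, which is at most 2πr/‖v‖. Hence, with c = π,
$$\liminf_{r\to 0}\frac{1}{2r}\iiint 1_{\{|\pi_u(x)-\pi_u(y)|\le r\}}\,du\,d\mu(y)\,d\mu(x) \le \iint\frac{c}{\|x-y\|}\,d\mu(y)\,d\mu(x) = c\cdot I_1(\mu) < \infty$$
by Proposition 8.4 and the assumption that µ is α-regular for some α > 1. This completes the proof.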
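The estimate on the set of angles can be checked directly. For ε ≤ 1 the set {θ : |cos θ| ≤ ε} consists of two arcs of total length 4 arcsin ε, and since arcsin is convex on [0, 1] this is at most 2πε. The following sketch (helper names are illustrative) compares the exact length, a grid estimate, and the bound 2πε.

```python
import math


def angle_set_length(eps: float) -> float:
    """Length of {theta in [0, 2*pi) : |cos(theta)| <= eps}."""
    if eps >= 1:
        return 2 * math.pi
    # two arcs of half-width arcsin(eps), centred at pi/2 and 3*pi/2
    return 4 * math.asin(eps)


def angle_set_length_grid(eps: float, n: int = 100_000) -> float:
    """Riemann-sum estimate of the same length on a uniform grid of angles."""
    h = 2 * math.pi / n
    return h * sum(1 for k in range(n) if abs(math.cos((k + 0.5) * h)) <= eps)


for eps in (0.01, 0.1, 0.5, 1.0):
    print(eps, angle_set_length(eps), angle_set_length_grid(eps), 2 * math.pi * eps)
```

Taking ε = r/‖v‖ and dividing by 2r gives at most π/‖v‖, consistent with the constant c = π.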
In the regime dim A = 1 there is more to be said. For a set A ⊆ R^2, we say that
A is rectifiable if it is contained in a countable union of Lipschitz curves, and that it is
purely unrectifiable if H^1(A ∩ Γ) = 0 for every Lipschitz curve Γ. Every set A with
H^1(A) < ∞ may be decomposed as a union A = A′ ∪ A′′, where A′ is contained in a
countable union of Lipschitz curves, and A′′ is purely unrectifiable.
Theorem 8.7 (Besicovitch). Let A ⊆ R2 be a set with 0 < H1 (A) < ∞.
1. If A is not purely unrectifiable, then Leb(π_u A) > 0 for all u ∈ S^1 except at most one
u.
(1) is not difficult but (2) is harder. For a proof, see ??.
The middle-α Cantor sets and some other examples we have discussed have the common
feature that they are composed of scaled copies of themselves. In this section we will
consider such examples in greater generality.
for some 0 ≤ ρ < 1. In this case we say that f has contraction ρ. In general there is
no optimal value which can be called “the” contraction ratio, but if there is a minimal
such ρ, we call it the contraction ratio of f.
Here we shall consider systems with more than one contraction:
We study IFSs (and their attractors) with two goals in mind. First, it is natural
to ask about the dynamics of repeatedly applying maps from Φ to a point. When
multiple maps are present such a sequence of iterates need not converge, but we will
see that there is an “invariant” compact set, the attractor, on which all such sequences
accumulate. Second, we will study the fractal geometry of the attractor. Such sets are
among the simplest fractals but already exhibit nontrivial behavior.
Example: Contraction mapping theorem
for the k-fold composition of f with itself. Recall the contraction mapping theorem:
Theorem 9.2 (Contraction mapping theorem). If (X, d) is a complete metric space
and f : X → X has contraction ρ < 1, then there is a unique fixed point x = f(x), and
for every y ∈ X we have d(x, f^k(y)) ≤ ρ^k d(x, y); in particular f^k y → x.
If we think of the contraction f as an IFS Φ = {f} with one map, then the fixed
point x is an attractor because
$$\{x\} = \bigcup_{\varphi\in\Phi}\varphi(\{x\})$$
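A minimal numerical illustration of Theorem 9.2, assuming the contraction f(x) = x/2 + 1 on R (chosen here for illustration; its fixed point is x = 2). The helper `orbit` is hypothetical; the loop prints d(x, f^k(y)) next to the bound ρ^k d(x, y).

```python
def orbit(f, y: float, steps: int) -> list[float]:
    """Return [y, f(y), f(f(y)), ...] with `steps` applications of f."""
    points = [y]
    for _ in range(steps):
        points.append(f(points[-1]))
    return points


RHO = 0.5


def f(x: float) -> float:
    """A contraction of R with ratio RHO; its fixed point solves x = x/2 + 1, i.e. x = 2."""
    return RHO * x + 1.0


FIXED_POINT = 2.0
y = 10.0
pts = orbit(f, y, 30)
for k in range(6):
    # d(x, f^k(y)) next to the bound rho^k d(x, y) from the theorem
    print(k, abs(pts[k] - FIXED_POINT), RHO**k * abs(y - FIXED_POINT))
```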
Example: Cα
It will be instructive to re-examine the middle-α Cantor sets C_α from Section 2.1, where
one can find many of the features present in the general case. Write ρ = (1 − α)/2 and
consider the IFS Φ = {φ_0, φ_1} with contraction ρ given by
φ_0(x) = ρx
φ_1(x) = ρx + (1 − ρ)
For i_1 . . . i_n ∈ {0, 1}^n write φ_{i_1...i_n} = φ_{i_1} ∘ . . . ∘ φ_{i_n} (note the order of application: the first function φ_{i_1} is the “outer” function). Then the
intervals I_{i_1...i_n} at stage n of the construction are just the images φ_{i_1...i_n} I, where I = [0, 1]. Writing C_{α,n}
for the union of the stage-n intervals, it follows that C_{α,n+1} = φ_0 C_{α,n} ∪ φ_1 C_{α,n}, and
since $C_\alpha = \bigcap_{n=1}^\infty C_{\alpha,n}$, we have
$$C_\alpha = \varphi_0 C_\alpha \cup \varphi_1 C_\alpha$$
Let us now examine the points x ∈ Cα . Each such point may be identified by the
sequence I n (x) of stage-n intervals to which it belongs. These intervals, which decrease
to x, are of the form
I n (x) = Ii1 ...in = φi1 φi2 . . . φin ([0, 1])
for some infinite sequence i1 i2 . . . ∈ {0, 1}N depending on x. If we fix any y ∈ [0, 1] then
φi1 ...in (y) ∈ φi1 ...in [0, 1] = I n (x), so φi1 ...in (y) → x as n → ∞.
The last calculation shows us two things. First, it shows that C_α is not just invariant
under application of φ_0, φ_1, but it actually “attracts” all points y in [0, 1] under repeated
application. Second, we have found a “symbolic coding” of points x ∈ C_α by sequences
i_1 i_2 . . . ∈ {0, 1}^N. In this example, we can be more explicit: by induction on n,
$$\varphi_{i_1\ldots i_n}(y) = (1-\rho)\sum_{k=1}^n i_k\rho^{k-1} + \rho^n y.$$
Since ρ^n y → 0 it follows that $x = (1-\rho)\sum_{k=1}^\infty i_k\rho^{k-1}$, and we may thus identify C_α
with the set of such sums:
$$C_\alpha = \Big\{(1-\rho)\sum_{k=1}^\infty i_k\rho^{k-1} : i_1 i_2\ldots\in\{0,1\}^{\mathbb{N}}\Big\}$$
(For example, for α = 0 we have ρ = 1/2, and we have just described the fact that every
x ∈ [0, 1] has a binary representation; and if α = 1/3 then ρ = 1/3, and this is the well-known
fact that x ∈ C_{1/3} if and only if $x = \sum_n a_n 3^{-n}$ with a_n ∈ {0, 2}, that is, C_{1/3} is the set
of numbers in [0, 1] that can be represented in base 3 using only the digits 0 and 2.)
Incidentally, the calculation above shows that φ_{i_1...i_n}(y) → x also for all
y ∈ R, not only y ∈ [0, 1].
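The closed form for φ_{i_1...i_n}(y) can be checked by brute force. The sketch below, with hypothetical helper names, composes the maps in the order described above (the innermost map φ_{i_n} acts first) and compares the result with (1−ρ)Σ_k i_kρ^{k−1} + ρ^n y over all words of length 8 for α = 1/3, and also confirms the base-3 description of C_{1/3}.

```python
from itertools import product


def phi(i: int, x: float, rho: float) -> float:
    """The two maps of the IFS for C_alpha: phi_0(x) = rho*x, phi_1(x) = rho*x + 1 - rho."""
    return rho * x + i * (1 - rho)


def compose(word: tuple[int, ...], y: float, rho: float) -> float:
    """phi_{i_1...i_n}(y) = phi_{i_1}(phi_{i_2}(... phi_{i_n}(y))): the last symbol acts first."""
    for i in reversed(word):
        y = phi(i, y, rho)
    return y


def closed_form(word: tuple[int, ...], y: float, rho: float) -> float:
    """(1 - rho) * sum_k i_k rho^(k-1) + rho^n * y  (k counted from 1)."""
    n = len(word)
    return (1 - rho) * sum(i * rho**k for k, i in enumerate(word)) + rho**n * y


alpha = 1 / 3
rho = (1 - alpha) / 2          # = 1/3
max_err = max(
    abs(compose(w, y, rho) - closed_form(w, y, rho))
    for w in product((0, 1), repeat=8)
    for y in (-3.0, 0.0, 0.5, 1.0)
)
print(max_err)   # only floating-point rounding remains
```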
In the general setting, let Φ = {φ_i}_{i∈Λ} be an IFS with contraction ρ on a complete metric
space (X, d). In this section we will show that an attractor exists. Our strategy is as
follows. Let 2^X denote the space of compact, non-empty subsets of X. We introduce
the map $\widetilde\Phi : 2^X \to 2^X$ given by
$$\widetilde\Phi(A) = \bigcup_{i\in\Lambda}\varphi_i A$$
Then an attractor is precisely a fixed point of $\widetilde\Phi$. We will show that $\widetilde\Phi$ is a contraction
in an appropriately chosen complete metric on 2^X; then the existence and uniqueness of
the fixed point of $\widetilde\Phi$ (respectively, attractor of Φ) follows from the contraction mapping
theorem.
The proof requires some preparation. Let (X, d) be a metric space. For ε > 0 write
d(x, A) = inf{d(x, a) : a ∈ A}
In general, d(x, A) ≠ d_H({x}, A); for example, if x ∈ A and |A| ≥ 2 then d(x, A) = 0 but
d_H({x}, A) > 0.
If (X, d) is complete, then a closed set A is compact if and only if it is totally
bounded, i.e. for every ε > 0 there is a cover of A by finitely many sets of diameter at most ε.
The proof is left as an exercise.
1. d_H is a metric on 2^X.
2. If A_n ∈ 2^X and A_1 ⊇ A_2 ⊇ . . ., then $A_n \to \bigcap_{n=1}^\infty A_n$.
(2) Suppose A_n are decreasing non-empty compact sets and let $A = \bigcap_n A_n \ne \emptyset$.
Obviously A ⊆ A_n, so for every ε > 0 we must show that A_n ⊆ A^{(ε)} for all large
enough n. Otherwise, for some ε > 0, infinitely many of the sets A′_n = A_n ∖ A^{(ε)} would
be non-empty. Re-numbering, we can assume all are non-empty. This is a decreasing
sequence of compact sets, so $A' = \bigcap_{n=1}^\infty A'_n \ne \emptyset$. But then A′ ⊆ X ∖ A^{(ε)} and also
$A' = \bigcap_{n=1}^\infty A'_n \subseteq \bigcap_{n=1}^\infty A_n = A$, which is a contradiction.
(3) Suppose now that (X, d) is complete and A_n ∈ 2^X is a Cauchy sequence. Let
$$A_{n,\infty} = \overline{\bigcup_{k\ge n}A_k}$$
We claim that A_{n,∞} is compact. Since A_{n,∞} is closed and X is complete, we need
only show that it is totally bounded, i.e. that for every ε > 0 there is a cover of A_{n,∞}
by finitely many ε-balls. To see this note that, since {A_i} is Cauchy, there is a k such
that A_j ⊆ A_k^{(ε/4)} for every j ≥ k. We may assume k ≥ n. Now by compactness we
can cover $\bigcup_{j=n}^k A_j$ by finitely many ε/2-balls. Taking the cover by balls with the same
centers but radius ε, we have covered A_k^{(ε/2)} as well, and therefore all the A_j, j > k.
Thus A_{n,∞} is totally bounded, and so compact.
The sequence A_{n,∞} is decreasing, so by (2), $A_{n,\infty} \to A = \bigcap_{n=1}^\infty A_{n,\infty}$. Since A_n is Cauchy,
it is not hard to see from the definition of A_{n,∞} that d_H(A_n, A_{n,∞}) → 0. Hence A_n → A.
(4) Suppose A_n → A. If A′ denotes the set of accumulation points of sequences
a_n ∈ A_n, then $A_{n,\infty} = A' \cup \bigcup_{k\ge n}A_k$, so A′ ⊆ A. The reverse inclusion is also clear, so
A = A′.
(5) Suppose that X is compact. Let ε > 0 and let X_ε ⊆ X be a finite ε-dense set
of points. One may then verify without difficulty that 2^{X_ε} is ε-dense in 2^X, so 2^X is
totally bounded. Since 2^X is also complete by (3), it is compact.
Theorem 9.4. Let Φ = {φ_i}_{i∈Λ} be an iterated function system on a complete metric
space X. Then there exists a unique compact set K ⊆ X such that
$$K = \bigcup_{i\in\Lambda}\varphi_i K$$
2. If φ_i E ⊆ E for every i ∈ Λ, then $K = \bigcap_{n=1}^\infty\widetilde\Phi^n E$.
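Let us first show that $\widetilde\Phi$ is a contraction. Indeed, if d_H(A, B) < ε then A ⊆ B^{(ε)}
and B ⊆ A^{(ε)}. Suppose φ_i has contraction ρ_i, and let ρ = max_{i∈Λ} ρ_i. Then
$$\widetilde\Phi(A) = \bigcup_{i\in\Lambda}\varphi_i A \subseteq \bigcup_{i\in\Lambda}\varphi_i\big(B^{(\varepsilon)}\big) \subseteq \bigcup_{i\in\Lambda}(\varphi_i B)^{(\rho\varepsilon)} = \widetilde\Phi(B)^{(\rho\varepsilon)}$$
and similarly $\widetilde\Phi(B) \subseteq \widetilde\Phi(A)^{(\rho\varepsilon)}$. Thus by definition, $d_H(\widetilde\Phi(A),\widetilde\Phi(B)) \le \rho\varepsilon$. Since ρ < 1,
we have shown that $\widetilde\Phi$ has contraction ρ.
Existence and uniqueness of a fixed point now follow from the contraction mapping
theorem, using the fact that 2^X is complete (Proposition 9.3) and that $\widetilde\Phi : 2^X \to 2^X$ is a contraction. This proves existence
and uniqueness of the attractor.
For the last part, note that by assumption $E \supseteq \widetilde\Phi E \supseteq \ldots \supseteq \widetilde\Phi^n E \supseteq \ldots$ is a decreasing
sequence, hence by the above and Proposition 9.3, $\bigcap_{n=1}^\infty\widetilde\Phi^n E = \lim\widetilde\Phi^n E = K$.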
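The contraction property of $\widetilde\Phi$ can be watched on finite subsets of R, where d_H is easy to compute. The sketch below assumes the middle-third Cantor IFS and the starting set E = {0, 1} (illustrative choices; the helper names are hypothetical). This E does not satisfy φ_i E ⊆ E, so part 2 of the theorem does not apply; instead we check that d_H between consecutive iterates shrinks at least by the factor ρ, so the iterates converge to K.

```python
def hausdorff_distance(A, B) -> float:
    """d_H(A, B) = max(sup_a d(a, B), sup_b d(b, A)) for finite non-empty A, B in R."""
    def dist_to(x, S):
        return min(abs(x - s) for s in S)
    return max(max(dist_to(a, B) for a in A), max(dist_to(b, A) for b in B))


def phi_tilde(A, maps):
    """The set map A -> union_i phi_i(A), applied to a finite set (returned sorted)."""
    return sorted({phi(a) for phi in maps for a in A})


rho = 1 / 3


def phi0(x):
    return rho * x


def phi1(x):
    return rho * x + 1 - rho


maps = [phi0, phi1]            # the middle-third Cantor IFS
iterates = [[0.0, 1.0]]        # E = {0, 1}
for _ in range(6):
    iterates.append(phi_tilde(iterates[-1], maps))

# d_H between consecutive iterates shrinks at least by the factor rho each step
gaps = [hausdorff_distance(iterates[n], iterates[n + 1]) for n in range(6)]
print(gaps)
```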
Let Φ = {φ_i}_{i∈Λ} be an iterated function system. We can describe the points x ∈ K by
associating to them a (possibly non-unique) “name” consisting of a sequence of symbols
from Λ. For i = i_1 i_2 . . . i_n ∈ Λ^n it is convenient to write
φ_i = φ_{i_1} ∘ . . . ∘ φ_{i_n}
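and so the sequence φ_{i_1...i_n} K is decreasing. In fact,
$$K = \bigcup_{i\in\Lambda}\varphi_i(K) = \bigcup_{i_1\in\Lambda}\varphi_{i_1}\Big(\bigcup_{i_2\in\Lambda}\varphi_{i_2}(K)\Big) = \bigcup_{i_1,i_2\in\Lambda}\varphi_{i_1}\circ\varphi_{i_2}(K) = \bigcup_{i\in\Lambda^2}\varphi_i(K)$$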
Definition 9.5. Fix n ∈ N. Then the sets φi (K) for i ∈ Λn are called the n-th
generation cylinders of K; they are compact and their union is K.
and, in fact, this holds for any y ∈ X since d(φi1 ...in x, φi1 ...in y) ≤ ρn d(x, y).
The order in which we apply the maps φ_{i_1}, φ_{i_2}, . . . is important for the conclusion
that lim φ_{i_1...i_n}(y) exists. If we were to define y_n = φ_{i_n} ∘ . . . ∘ φ_{i_1}(x) instead, then in
general y_n would not converge. For example, in C_α with the maps φ_0, φ_1, note that
φ_{i_n i_{n−1}...i_1}(0) belongs to [0, ρ] or [1 − ρ, 1] depending on whether i_n = 0 or 1, so if we
take the sequence (i_n) = (0, 1, 0, 1, 0, 1, . . .) then φ_{i_n...i_1}(0) will alternately be in [0, ρ]
and [1 − ρ, 1], and will not converge (for α > 0).
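The difference between the two orders of composition is easy to see numerically. The sketch below assumes the middle-third IFS (α = 1/3) and the alternating sequence 0, 1, 0, 1, . . . from the text; `forward` and `backward` are illustrative names. The forward compositions converge (to (1−ρ)Σ_k i_kρ^{k−1} = 1/4 for this sequence), while the backward ones keep jumping between the two first-level intervals.

```python
rho = 1 / 3   # middle-third Cantor set: alpha = 1/3


def phi(i: int, x: float) -> float:
    """phi_0(x) = rho*x, phi_1(x) = rho*x + 1 - rho."""
    return rho * x + i * (1 - rho)


def forward(word, y: float = 0.0) -> float:
    """phi_{i_1} o ... o phi_{i_n}(y): the last symbol acts first."""
    for i in reversed(word):
        y = phi(i, y)
    return y


def backward(word, y: float = 0.0) -> float:
    """phi_{i_n} o ... o phi_{i_1}(y): the first symbol acts first."""
    for i in word:
        y = phi(i, y)
    return y


seq = [n % 2 for n in range(40)]                        # 0, 1, 0, 1, ...
fwd = [forward(seq[: n + 1]) for n in range(len(seq))]
bwd = [backward(seq[: n + 1]) for n in range(len(seq))]
print(fwd[-3:])   # converges (to 1/4 for this sequence)
print(bwd[-3:])   # keeps jumping between [0, rho] and [1 - rho, 1]
```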
Having defined the map Φ : ΛN → K we now study some of its properties. Recall
that for i, j ∈ ΛN ,
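Proof. Fix x ∈ K. For n > N (recall that here i_1 . . . i_N = j_1 . . . j_N),
$$\begin{aligned}
d(\varphi_{i_1\ldots i_n}x,\ \varphi_{j_1\ldots j_n}x) &= d\big(\varphi_{i_1\ldots i_N}(\varphi_{i_{N+1}\ldots i_n}x),\ \varphi_{i_1\ldots i_N}(\varphi_{j_{N+1}\ldots j_n}x)\big)\\
&\le \rho^N\cdot d(\varphi_{i_{N+1}\ldots i_n}x,\ \varphi_{j_{N+1}\ldots j_n}x)\\
&\le \rho^N\cdot\operatorname{diam}K
\end{aligned}$$
as claimed.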
Recall that given i = i1 . . . ik ∈ Λk , the cylinder set [i] ⊆ ΛN is the set of infinite
sequences extending i, that is,
[i1 . . . ik ] = {j ∈ ΛN : j1 . . . jk = i1 . . . ik }
Proof. The n-th generation cylinders of K are the sets φ_i(K) for i ∈ Λ^n. Now, for a fixed y ∈ X,
= Φ([i])
as claimed.
Let $\widetilde\varphi_j : \Lambda^{\mathbb{N}} \to \Lambda^{\mathbb{N}}$ denote the map (i_1 i_2 . . .) ↦ (j i_1 i_2 . . .). It is clear that this map
is continuous (in fact it has contraction 1/2).
as claimed.
The following observation may be of interest. Given IFSs Φ = {φi }i∈Λ and Ψ =
{ψi }i∈Λ on spaces (X, d) and (Y, d) and with attractors KX , KY , respectively, define a
morphism to be a continuous onto map f : K_X → K_Y such that f φ_i = ψ_i f. Then what
we have shown is that there is a unique morphism from the IFS $\widetilde\Phi = \{\widetilde\varphi_i\}_{i\in\Lambda}$ on Λ^N to
any other IFS.
This is a closed set supporting the measure in the sense that µ(X ∖ supp µ) = 0, and is
the smallest closed set with this property (in the sense of inclusion).
Theorem 9.9. Let p = (pi )i∈Λ be a probability vector. Then there exists a unique Borel
probability measure µ on K satisfying
$$\mu = \sum_{i\in\Lambda}p_i\cdot\varphi_i\mu$$
because on the right-hand side, all summands give mass zero to sequences beginning
with i_0 except for the term $p_{i_0}\cdot\widetilde\varphi_{i_0}\widetilde\mu$, whose weight is p_{i_0}, and all terms agree on the
later coordinates and are equal to the product measure.
Let $\mu = \Phi\widetilde\mu$ be the projection to K. Applying Φ to the identity above and using the
relation $\Phi\widetilde\varphi_i = \varphi_i\Phi$ gives the desired identity for µ.
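For uniqueness, suppose that µ satisfies the desired relation on K. Then we can lift
µ to a measure $\widetilde\mu_0$ on Λ^N such that $\Phi\widetilde\mu_0 = \mu$ (see the Appendix). Now $\widetilde\mu_0$ need not
satisfy the analogous relation, but we may define $\widetilde\mu_1 = \sum_{i\in\Lambda}p_i\cdot\widetilde\varphi_i\widetilde\mu_0$, and note that
$\Phi\widetilde\mu_1 = \sum_i p_i\cdot\varphi_i\Phi\widetilde\mu_0 = \sum_i p_i\cdot\varphi_i\mu = \mu$. Continue to define $\widetilde\mu_2 = \sum_{i\in\Lambda}p_i\cdot\widetilde\varphi_i\widetilde\mu_1$, etc.; in the same way
each of these measures satisfies $\Phi\widetilde\mu_n = \mu$. On the other hand, $\widetilde\mu_n \to \widetilde\mu$ in the
weak sense, where $\widetilde\mu$ is the product measure with marginal p. Since Φ is continuous, the
relation $\Phi\widetilde\mu_n = \mu$ passes to the limit, so $\mu = \Phi\widetilde\mu$. This establishes uniqueness.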
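Finally, note that for a compactly supported measure ν and continuous function f
we have supp fν = f supp ν. Thus the relation $\mu = \sum_i p_i\cdot\varphi_i\mu$ and positivity of p imply
that
$$\operatorname{supp}\mu = \bigcup_{i\in\Lambda}\operatorname{supp}\varphi_i\mu = \bigcup_{i\in\Lambda}\varphi_i\operatorname{supp}\mu,$$
so supp µ = K by uniqueness of the attractor.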
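For an IFS on [0, 1] whose first-level images are disjoint, the measure of Theorem 9.9 can be computed through its distribution function. The sketch below assumes two maps φ_0(x) = ρx, φ_1(x) = ρx + 1 − ρ with 0 < ρ < 1/2 and weights (p_0, 1 − p_0); `selfsimilar_cdf` is a hypothetical helper that follows the symbolic coding one digit at a time. The usage checks the defining relation µ = p_0·φ_0µ + p_1·φ_1µ on intervals [0, x], which reads F(x) = p_0F(x/ρ) + p_1F((x − (1 − ρ))/ρ). With p_0 = 1/2 and ρ = 1/3 the function reduces to the Cantor function.

```python
def selfsimilar_cdf(x: float, p0: float, rho: float, depth: int = 80) -> float:
    """F(x) = mu([0, x]) for the unique mu with mu = p0*phi0(mu) + (1 - p0)*phi1(mu),
    where phi0(x) = rho*x and phi1(x) = rho*x + 1 - rho, assuming 0 < rho < 1/2
    (so the two images of [0, 1] are disjoint)."""
    p1 = 1 - p0
    total, weight = 0.0, 1.0   # mass already below x / mass of the current cylinder
    for _ in range(depth):
        if x < 0:
            return total
        if x >= 1:
            return total + weight
        if x < rho:            # x in phi0([0,1]): zoom into the left copy
            x, weight = x / rho, weight * p0
        elif x < 1 - rho:      # x in the gap: the whole left copy lies below x
            return total + weight * p0
        else:                  # x in phi1([0,1]): count the left copy, zoom into the right one
            total += weight * p0
            x, weight = (x - (1 - rho)) / rho, weight * p1
    return total


p0, rho = 0.3, 0.25
for x in (0.1, 0.2, 0.5, 0.8, 0.95):
    lhs = selfsimilar_cdf(x, p0, rho)
    rhs = (p0 * selfsimilar_cdf(x / rho, p0, rho)
           + (1 - p0) * selfsimilar_cdf((x - (1 - rho)) / rho, p0, rho))
    print(x, lhs, rhs)
```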
The proof uses the fact that every accumulation point of the averages above is a
p-stationary measure, which, by uniqueness, must be µ. We do not prove
this in this course.
Definition 10.1. Let Φ = {φi }i∈Λ be an IFS and let ri denote the contraction ratio of
φi . The similarity dimension of Φ = {φi }i∈Λ , denoted dims Φ, is the unique solution
of the equation
$$\sum_{i\in\Lambda}r_i^s = 1$$
When K is the attractor of an IFS Φ, we shall often write dims K instead of dims Φ.
This is ambiguous because there can be multiple IFSs with the same attractor, but this
should not cause confusion.
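Since Σ_i r_i^s is strictly decreasing in s when all ratios lie in (0, 1), the similarity dimension can be found by bisection. A minimal sketch with a hypothetical `similarity_dimension` helper:

```python
import math


def similarity_dimension(ratios: list[float], tol: float = 1e-13) -> float:
    """Solve sum(r**s for r in ratios) == 1 for s >= 0 by bisection.

    Assumes 0 < r < 1 for every ratio and at least one map: the left-hand side
    is strictly decreasing in s, equals len(ratios) at s = 0 and tends to 0.
    """
    def excess(s: float) -> float:
        return sum(r**s for r in ratios) - 1.0

    lo, hi = 0.0, 1.0
    while excess(hi) > 0:      # enlarge the bracket until the sum drops below 1
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if excess(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(similarity_dimension([1 / 3, 1 / 3]))   # ≈ 0.6309 = log 2 / log 3, middle-third Cantor set
print(similarity_dimension([1 / 2, 1 / 4]))   # ≈ 0.6942 for {x -> x/2, x -> x/4}
print(similarity_dimension([2 / 3, 2 / 3]))   # ≈ 1.7095, larger than 1
```

The second value is log((1 + √5)/2)/log 2, since u = 2^{−s} solves u + u² = 1.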
In order to study the dimension of a set one needs to construct efficient covers of
it. Since the attractor K of an IFS can be written as a union of the sets φ_{i_1...i_n}K, these
sets are natural candidates. Recall that the cylinder φi K for i ∈ Λn is the image of
the cylinder [i1 , . . . , in ] ⊆ ΛN via the symbolic coding map Φ. But note that, while the
level-n cylinder sets in ΛN are disjoint, this is not generally true for cylinders of K.
Let $\Lambda^* = \bigcup_{n=0}^\infty\Lambda^n$ denote the set of finite sequences over Λ (including the empty
sequence ∅, whose associated cylinder set is [∅] = Λ^N). A section of Λ^* is a subset
S ⊆ Λ∗ such that every i ∈ ΛN has a unique prefix in S. It is clear that, if S is a
section, then the family of cylinders {[s] : s ∈ S} is a pairwise disjoint cover of ΛN , and
conversely any such cover corresponds to a section.
Theorem 10.2. Let K be the attractor for an IFS Φ with contraction ρ on a complete
metric space (X, d). Then dimM K ≤ dims K.
Proof. Let D = diam K. For r > 0 let S_r ⊆ Λ^* denote the set of finite sequences
i = i_1 . . . i_k such that
$$r_{\mathbf i} = r_{i_1}\cdots r_{i_k} < \frac{r}{D} \le r_{i_1}\cdots r_{i_{k-1}}$$
$$\rho^s\cdot(r/D)^s \le \widetilde\mu([a]) < (r/D)^s\qquad(a\in S_r)$$
It follows that
$$N(K,r) \le |S_r| \le \Big(\min_{a\in S_r}\widetilde\mu([a])\Big)^{-1} \le \frac{D^s}{\rho^s}\cdot r^{-s}$$
Thus
$$\dim_M K = \limsup_{r\to 0}\frac{\log N(K,r)}{\log(1/r)} \le s$$
as claimed.
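The section S_r from the proof can be enumerated explicitly. The sketch below (helper names are hypothetical) finds S_r by depth-first search with exact fractions, so that words sitting exactly on the threshold are classified correctly. It assumes, as the bounds suggest, that µ̃ is the product measure with marginal (r_i^s), so µ̃([a]) = r_a^s, and it takes ρ in the lower bound to be the smallest ratio min_i r_i, which is what makes that bound hold for a non-uniform IFS. For the middle-third Cantor set (D = 1, all ratios 1/3) the count |S_r| meets the bound D^sρ^{−s}r^{−s}, and for any IFS the masses r_a^s over a section sum to 1.

```python
from fractions import Fraction


def similarity_dimension(ratios, tol: float = 1e-13) -> float:
    """Bisection for sum_i r_i^s = 1, assuming every ratio lies in (0, 1)."""
    def excess(s: float) -> float:
        return sum(float(r) ** s for r in ratios) - 1.0

    lo, hi = 0.0, 1.0
    while excess(hi) > 0:
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if excess(mid) > 0 else (lo, mid)
    return (lo + hi) / 2


def section(ratios, r, diam=Fraction(1)):
    """S_r: words a = a_1...a_k with r_a < r/D <= r_{a_1...a_{k-1}}, as (word, r_a) pairs."""
    threshold = Fraction(r) / diam
    found, stack = [], [((), Fraction(1))]
    while stack:
        word, prod = stack.pop()
        if prod < threshold:   # the parent's product was >= threshold, so word is in S_r
            found.append((word, prod))
        else:
            stack.extend((word + (i,), prod * ri) for i, ri in enumerate(ratios))
    return found


# Middle-third Cantor set: ratios 1/3, 1/3 and D = diam K = 1.
cantor = [Fraction(1, 3), Fraction(1, 3)]
s = similarity_dimension(cantor)
r = Fraction(1, 3**5)
S = section(cantor, r)
rho_min = float(min(cantor))                  # smallest ratio, used in the lower bound
bound = rho_min ** (-s) * float(r) ** (-s)    # D^s rho^-s r^-s with D = 1
print(len(S), bound)
print(sum(float(p) ** s for _, p in S))       # masses of a section sum to 1
```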
The theorem gives an upper bound dim_M K ≤ dims K. In general the inequality can be
strict, but there is one important case where equality holds, namely when the IFS consists
of similarities. Recall that a similarity is a map that satisfies d(f(x), f(y)) = r ·
d(x, y) for a constant r > 0. One can show that every similarity of R^d is an affine
map of the form f : x ↦ rUx + a, where r > 0, U is an orthogonal matrix, and a ∈ R^d.
If we assume that 0 < r < 1 then f is a contraction and r is its contraction ratio.
Examples of self-similar sets include the middle-α Cantor set which we saw
above, and also the famous Sierpinski gasket and sponge and the Koch curve.
It is also necessary to impose some assumptions on the global properties of Φ. We
mention two such conditions.
2. Φ satisfies the open set condition if there is a non-empty open set U such that
φi U ⊆ U and φi U ∩ φj U = ∅ for distinct i, j ∈ Λ.
Strong separation implies the open set condition, since one can take U to be any
sufficiently small neighborhood of the attractor. The IFS given above for the middle-α
Cantor set satisfies strong separation when α > 0. The IFS Φ = {x ↦ x/2, x ↦ 1/2 + x/2}
satisfies the open set condition with U = (0, 1), but not strong separation, since the
attractor is [0, 1] and its images intersect at the point 1/2. This example shows that
the open set condition is a property of the IFS rather than the attractor, since [0, 1] is
also the attractor of Φ′ = {x ↦ 2x/3, x ↦ 1/3 + 2x/3}, which does not satisfy the open set
condition.
Proof. Let r_i be the contraction ratio of φ_i and s = dims Φ. For r > 0 define the section
S_r ⊆ Λ^* and the measure µ̃ on Λ^N as in the proof of Theorem 10.2. These were chosen
so that µ̃([a]) < (r/D)^s and |φ_a K| < r for a ∈ S_r. We shall prove the following claim:
Claim 10.6. For each r > 0 and x ∈ Rd the ball Br (x) intersects at most O(1) cylinder
sets φa K, a ∈ Sr .
Once this is proved the theorem follows from the mass distribution principle for the
measure µ = Φµ̃, since then for any x ∈ R^d,
$$\mu(B_r(x)) = \widetilde\mu(\Phi^{-1}B_r(x)) \le \sum_{a\in S_r:\ \varphi_aK\cap B_r(x)\ne\emptyset}\widetilde\mu([a]) = O(1)\cdot r^s$$
To prove the claim, let U ≠ ∅ be the open set provided by the open set condition, and
note that φ_a U ∩ φ_b U = ∅ for distinct a, b ∈ S_r (we leave the verification as an exercise). Fix
some non-empty ball D_0 = B_{r_0}(y_0) ⊆ U and a point x_0 ∈ K, and write
δ = d(x_0, y_0)
D = diam K
𝒟 = {D_a : a ∈ S_r and φ_a K ∩ B_r(x) ≠ ∅}, where D_a = φ_a D_0.
We must bound |𝒟| from above. By definition of S_r, the radius r_a of the ball D_a =
φ_a D_0 = B_{r_a}(y_a) ∈ 𝒟 satisfies
ρ r_0 r < r_a ≤ r_0 r
In particular the contraction ratio of φ_a is at most r. If φ_a K meets B_r(x), then, since φ_a x_0 ∈ φ_a K,
d(x, y_a) ≤ r + rD + rδ
so
D_a = B_{r_a}(y_a) ⊆ B_{r(1+D+δ+r_0)}(x)
The balls D_a ∈ 𝒟 are pairwise disjoint, each has volume at least c(ρr_0 r)^d for a constant c = c(d) > 0, and all of them lie in a ball of volume O(1)r^d;
thus |𝒟| = O(1), as desired.
To what extent is the theorem true without the open set condition? We can point
to two cases where the inequality dim K ≤ dims K is strict. First, it may happen that
dims K > d, whereas we always have dim_M K ≤ d, since K ⊆ R^d. Such an example is,
for instance, the system x ↦ 2x/3, x ↦ 1 + 2x/3. The second trivial case of a strict
inequality is when there are “redundant” maps in the IFS. For example, let φ : x ↦ x/2
and Φ = {φ, φ^2}. Then K = {0} is the common fixed point of φ and φ^2, so dim_M K = 0,
whereas dims K > 0. More generally,
Definition 10.7. An IFS Φ = {φi }i∈Λ has exact overlaps if there are distinct se-
quences i, j ∈ Λ∗ such that φi = φj .
Conjecture 10.8. If an IFS on R does not have exact overlaps then its attractor K
satisfies dim K = min{1, dims Φ}.
This conjecture is still not resolved, but some things are known; we will return to
them later in the course. In dimensions d ≥ 2 it is false as stated, but an analogous
conjecture is open.
Exercises
1. Show that if {φ_i}_{i∈Λ} is an IFS in a complete metric space, then there is a closed
ball B ≠ ∅ such that φ_i B ⊆ B for all i ∈ Λ.
3. Let K be the attractor of an IFS {φ_i}_{i∈Λ} and let S be a section of the
tree Λ^*.
(a) Show that
$$K = \bigcup_{i_1\ldots i_\ell\in S}\varphi_{i_1\ldots i_\ell}(K)$$
$$\varphi_1(x) = \frac{1}{10}x,\qquad \varphi_2(x) = \frac{1}{10}x + \frac{9}{10},\qquad \varphi_3(x) = \frac{1}{10}x + \frac{9}{100}$$
11 Entropy
11.1 The entropy function
Let (X, B, µ) be a probability space. A partition of X is a countable collection A of
pairwise disjoint measurable sets whose union has full measure (this really should be
called a partition modulo µ, but we omit this by convention).
Given a partition A, how can we quantify how spread out a measure µ is among the
atoms (or, conversely, how concentrated it is on a small number of atoms?). We could
count the number of sets A ∈ A of positive mass, but this is very crude, since it ignores
how mass is distributed. For example, in a partition with two sets the sets might both
have mass 1/2, or one could have mass 0.9999 and the other mass 0.0001. The first of
these is spread evenly among the elements of the partition; the second, much less. The
purpose of entropy is to quantify this distinction.
By convention the logarithm is taken in base 2 and 0 log 0 = 0. For infinite partitions
Hµ (A) may be infinite.
Observe that Hµ (A) depends only on the probability vector (µ(A))A∈A . For a
probability vector p = (pi ) it is convenient to introduce the notation
$$H(p) = H(p_1, p_2, \ldots) = -\sum_i p_i\log p_i$$
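A direct implementation of H(p), with the conventions above (base-2 logarithm, 0 log 0 = 0), reproduces the two-atom comparison from the start of this section. The helper name `entropy` is hypothetical.

```python
import math


def entropy(p) -> float:
    """H(p) = -sum p_i log2 p_i for a probability vector p, with 0 log 0 = 0."""
    return -sum(x * math.log2(x) for x in p if x > 0)


print(entropy([0.5, 0.5]))         # 1.0: mass spread evenly over two atoms
print(entropy([0.9999, 0.0001]))   # about 0.0015: mass concentrated on one atom
print(entropy([0.25] * 4))         # 2.0 = log2(4), the maximum for four atoms
```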
Examples
Proposition 11.2 (Properties of entropy). (E1) 0 ≤ H(µ, A) ≤ log |A|, and
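Proof. We first prove (E2). Since f(t) = −t log t is strictly concave, by Jensen’s inequality,
$$\begin{aligned}
H(\alpha\mu+(1-\alpha)\nu,\mathcal{A}) &= \sum_{A\in\mathcal{A}}f\big(\alpha\mu(A)+(1-\alpha)\nu(A)\big)\\
&\ge \sum_{A\in\mathcal{A}}\big(\alpha f(\mu(A)) + (1-\alpha)f(\nu(A))\big)\\
&= \alpha H(\mu,\mathcal{A}) + (1-\alpha)H(\nu,\mathcal{A})
\end{aligned}$$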
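A quick randomized sanity check of the concavity inequality (E2) on a 5-atom partition; the seed, the sizes and the helper names are arbitrary choices for illustration.

```python
import math
import random


def entropy(p) -> float:
    """H(p) = -sum p_i log2 p_i, with 0 log 0 = 0."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def random_prob_vector(n: int, rng: random.Random) -> list[float]:
    """A random probability vector with n entries (normalized uniform weights)."""
    w = [rng.random() for _ in range(n)]
    total = sum(w)
    return [x / total for x in w]


rng = random.Random(0)   # fixed seed so the check is reproducible
worst = math.inf
for _ in range(1000):
    mu, nu = random_prob_vector(5, rng), random_prob_vector(5, rng)
    a = rng.random()
    mix = [a * m + (1 - a) * v for m, v in zip(mu, nu)]
    gap = entropy(mix) - (a * entropy(mu) + (1 - a) * entropy(nu))
    worst = min(worst, gap)
print(worst)   # non-negative up to rounding, as (E2) predicts
```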
For a set B of positive measure, let µB denote the conditional probability measure
µB (C) = µ(B ∩ C)/µ(B). Note that for a partition B we have the identity
$$\mu = \sum_{B\in\mathcal{B}}\mu(B)\cdot\mu_B \tag{7}$$
This is just the average over B ∈ B of the entropy of A with respect to the conditional
measure on B.
$$\mathcal{A}\vee\mathcal{B} = \{A\cap B : A\in\mathcal{A},\ B\in\mathcal{B}\}$$
(E5) H(µ, A ∨ B) ≤ H(µ, A) + H(µ, B), with equality if and only if A, B are independent.
Equivalently, H(µ, B|A) ≤ H(µ, B), with equality if and only if A, B are independent.
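$$\begin{aligned}
H(\mu,\mathcal{A}\vee\mathcal{B}) &= -\sum_{A\in\mathcal{A},\,B\in\mathcal{B}} \mu(A\cap B)\log\mu(A\cap B)\\
&= \sum_{A\in\mathcal{A}}\mu(A)\sum_{B\in\mathcal{B}}\frac{\mu(A\cap B)}{\mu(A)}\Big(-\log\frac{\mu(A\cap B)}{\mu(A)} - \log\mu(A)\Big)\\
&= -\sum_{A\in\mathcal{A}}\mu(A)\log\mu(A)\sum_{B\in\mathcal{B}}\mu_A(B) - \sum_{A\in\mathcal{A}}\mu(A)\sum_{B\in\mathcal{B}}\mu_A(B)\log\mu_A(B)\\
&= H(\mu,\mathcal{A}) + H(\mu,\mathcal{B}|\mathcal{A}),
\end{aligned}$$
where in the last step we used $\sum_{B\in\mathcal{B}}\mu_A(B) = 1$.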
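The chain rule just derived, together with (E5), can be checked numerically by storing a measure on $\mathcal{A}\vee\mathcal{B}$ as a matrix `joint[a][b]` $=\mu(A_a\cap B_b)$. Row sums give $\mu(A)$, and $H(\mu,\mathcal{B}|\mathcal{A})$ is computed as the average $\sum_A\mu(A)H(\mu_A,\mathcal{B})$ of the entropies of the conditional measures. This is a sketch with our own helper names and a random test measure.

```python
import math
import random

def entropy(p):
    return -sum(q * math.log2(q) for q in p if q > 0)

def conditional_entropy(joint):
    """H(mu, B | A) = sum_A mu(A) H(mu_A, B), where joint[a][b] = mu(A_a intersect B_b)."""
    total = 0.0
    for row in joint:
        mass = sum(row)                                       # mu(A)
        if mass > 0:
            total += mass * entropy([x / mass for x in row])  # mu(A) H(mu_A, B)
    return total

rng = random.Random(1)
w = [[rng.random() for _ in range(4)] for _ in range(3)]
s = sum(map(sum, w))
joint = [[x / s for x in row] for row in w]   # random measure on A v B, |A| = 3, |B| = 4

H_join = entropy([x for row in joint for x in row])   # H(mu, A v B)
H_A = entropy([sum(row) for row in joint])            # H(mu, A)
H_B = entropy([sum(col) for col in zip(*joint)])      # H(mu, B)
H_B_given_A = conditional_entropy(joint)
print(H_join, H_A + H_B_given_A)   # chain rule (E3): equal up to rounding
print(H_B_given_A <= H_B)          # (E5): conditioning does not increase entropy
```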
The inequality in (E4) follows from (E3) since $H(\mu,\mathcal{B}|\mathcal{A}) \ge 0$; there is equality if and
only if $H(\mu_A,\mathcal{B}) = 0$ for all $A\in\mathcal{A}$ with $\mu(A) > 0$. By (E1), this occurs precisely when,
on each $A\in\mathcal{A}$ with $\mu(A) \ne 0$, the measure $\mu_A$ is supported on a single atom of $\mathcal{B}$,
which means that $\mathcal{A}$ refines $\mathcal{B}$ up to measure 0.
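For (E2'), let $\mu = \alpha\eta + (1-\alpha)\theta$. For $B\in\mathcal{B}$ let $\beta_B = \dfrac{\alpha\eta(B)}{\mu(B)}$. Then $1-\beta_B = \dfrac{(1-\alpha)\theta(B)}{\mu(B)}$
and
$$\mu_B = \beta_B\eta_B + (1-\beta_B)\theta_B$$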
hence
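$$\begin{aligned}
H(\mu,\mathcal{A}|\mathcal{B}) &= \sum_{B\in\mathcal{B}}\mu(B)H(\mu_B,\mathcal{A}) && \text{by definition}\\
&\ge \sum_{B\in\mathcal{B}}\mu(B)\big(\beta_B H(\eta_B,\mathcal{A}) + (1-\beta_B)H(\theta_B,\mathcal{A})\big) && \text{by concavity (E2)}\\
&= \sum_{B\in\mathcal{B}}\big(\alpha\eta(B)H(\eta_B,\mathcal{A}) + (1-\alpha)\theta(B)H(\theta_B,\mathcal{A})\big)\\
&= \alpha H(\eta,\mathcal{A}|\mathcal{B}) + (1-\alpha)H(\theta,\mathcal{A}|\mathcal{B})
\end{aligned}$$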
It is clear that if A, B are independent there is equality. To see that this is the only way
equality occurs, one again uses strict concavity of H(p), which shows that the independent case
is the unique maximizer.
Proof. For (1) expand both sides using (E3). For (2) use (1), noting that C = C ∨ B
since C refines B.
Observe that A refines B precisely when A <1 B.
The following lemma will be used extensively later in calculations to replace parti-
tions with more convenient ones.
Proof. If A <k B then for every B ∈ B the partition A has at most k atoms mod µB, so
H(µB, A) ≤ log k. The first bound then follows from the definition of conditional
entropy.
Assuming A =k B, by the chain rule for entropy and the first part of the lemma,
as claimed.
$$T_a x = x + a, \qquad S_t(x) = tx$$
denote the operations of translation and scaling.
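First note that for any measure $\mu$ on $\mathbb{R}^d$, any partition $\mathcal{A}$ of $\mathbb{R}^d$ and any measurable map
$f:\mathbb{R}^d\to\mathbb{R}^d$, writing $f^{-1}\mathcal{A} = \{f^{-1}A\}_{A\in\mathcal{A}}$ for the pull-back of a partition, we have
$$\begin{aligned}
H(f\mu,\mathcal{A}) &= -\sum_{A\in\mathcal{A}}\mu\circ f^{-1}(A)\log\mu\circ f^{-1}(A)\\
&= -\sum_{A\in\mathcal{A}}\mu(f^{-1}A)\log\mu(f^{-1}A)\\
&= H(\mu, f^{-1}\mathcal{A})
\end{aligned}$$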
2. For t > 0,
Note that for simplicity of notation we work with partitions $\mathcal{D}_n$ instead of $\mathcal{D}_{2^n}$ but
of course the former includes the latter as a special case.
Proof. For (1), let f be an isometry, and note that Dk and f Dk are Od (1)-commensurable,
giving the first statement.
For (2) we note that each $D\in\mathcal{D}_n$ intersects at most $O_d(\max\{t,t^{-1}\})$ atoms of
$S_t^{-1}\mathcal{D}_n$ and vice versa, so they are commensurable with this constant; hence
Similarly, we may note that $S_t^{-1}\mathcal{D}_n$ and $\mathcal{D}_{[tn]}$ are $O(1)$-commensurable, with analogous
result.
Proof. This follows since, modulo µ, the partitions D2m and the trivial partition are
commensurable.
11.4 Entropy and dimension
so
$$0 \le \frac1n H(\mu,2^{-n}) \le d$$
The same bound holds if µ is supported on any dyadic interval of length 1. More
generally, if µ is compactly supported then it gives mass to a finite number L of dyadic
intervals in $\mathcal{D}_{2^0}$, so
$$\begin{aligned}
\frac1n H(\mu,2^{-n}) &= \frac1n H(\mu,2^{-n}|2^0) + \frac1n H(\mu,2^0)\\
&\le \sum_{D\in\mathcal{D}_{2^0}}\mu(D)\frac1n H(\mu_D,2^{-n}) + \frac1n\log L
\end{aligned}$$
so asymptotically $\frac1n H(\mu,2^{-n})$ is in the range $[0,d]$. In this and many other ways,
$\frac1n H(\mu,2^{-n})$ behaves asymptotically like dimension. In fact, we give it a name:
$$\dim_e\mu = \lim_{n\to\infty}\frac1n H(\mu,2^{-n})$$
assuming the limit exists. We define the upper and lower entropy dimensions using
lim sup and lim inf, respectively; these are always defined and the entropy dimension is
defined when they are equal, in which case all three are the same.
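To see the definition in action, here is a sketch for a toy measure that is not from the notes: $\mu_p$ on $[0,1]$, under which the binary digits of $x$ are i.i.d. with $P(\text{digit}=0)=p$ (the parameter `p` and the helper names are ours). The atoms of $\mathcal{D}_{2^n}$ are indexed by binary words $w$ of length $n$, with $\mu_p(I_w)=p^{\#0(w)}(1-p)^{\#1(w)}$, and $\frac1n H(\mu_p,2^{-n})$ equals $H(p,1-p)$ for every $n$, so $\dim_e\mu_p = H(p,1-p)$.

```python
import itertools
import math

def entropy(probs):
    """Shannon entropy in bits, with 0 log 0 = 0."""
    return -sum(q * math.log2(q) for q in probs if q > 0)

def bernoulli_dyadic_masses(p, n):
    """Masses mu_p(I_w) of the level-n dyadic intervals I_w, w a binary word of length n.

    Toy measure: the binary digits of x are i.i.d. with P(digit = 0) = p.
    """
    return [p ** w.count(0) * (1 - p) ** w.count(1)
            for w in itertools.product((0, 1), repeat=n)]

p = 0.3
for n in (1, 4, 8, 12):
    print(n, entropy(bernoulli_dyadic_masses(p, n)) / n)   # each is H(0.3, 0.7), about 0.881
```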
$$\dim\mu \le \liminf_{n\to\infty}\frac1n H(\mu,2^{-n})$$
Furthermore, if for some α ≥ 0 we have
Then
$$\lim_{n\to\infty}\frac1n H(\mu,2^{-n}) = \alpha$$
$$\dim(\mu,x) \ge \alpha \quad \mu\text{-a.e.}$$
As usual let $\mathcal{D}_{2^n}$ denote the dyadic partition and recall that $\mathcal{D}_{2^n}(x)$ is the unique
element of $\mathcal{D}_{2^n}$ containing x. Then by Proposition 6.6
$$\liminf_{n\to\infty} -\frac1n\log\mu(\mathcal{D}_{2^n}(x)) \ge \alpha \quad \mu\text{-a.s.}$$
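But
$$\int -\frac1n\log\mu(\mathcal{D}_{2^n}(x))\,d\mu(x) = -\frac1n\sum_{D\in\mathcal{D}_{2^n}}\mu(D)\log\mu(D) = \frac1n H(\mu,2^{-n}),$$
so by Fatou's lemma (the integrands are non-negative), $\liminf_{n\to\infty}\frac1n H(\mu,2^{-n}) \ge \alpha$.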
$$-\frac{\log\mu(D)}{n} < \alpha+\varepsilon \quad\text{for } D\in\mathcal{D}_n^\varepsilon$$
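Write $E_n = \bigcup\mathcal{D}_n^\varepsilon$. Then
$$\mu_{E_n}(D) = \begin{cases}\dfrac{1}{\mu(E_n)}\mu(D) & D\in\mathcal{D}_n^\varepsilon\\[4pt] 0 & \text{otherwise}\end{cases}$$
so
$$\begin{aligned}
\frac1n H(\mu_{E_n},2^{-n}) &= -\frac1n\sum_{D\in\mathcal{D}_n}\mu_{E_n}(D)\log\mu_{E_n}(D)\\
&= -\sum_{D\in\mathcal{D}_n^\varepsilon}\mu_{E_n}(D)\frac{\log(\mu(D)/\mu(E_n))}{n}\\
&< \sum_{D\in\mathcal{D}_n^\varepsilon}\mu_{E_n}(D)(\alpha+\varepsilon) + \frac{\log\mu(E_n)}{n}\\
&< \alpha+\varepsilon
\end{aligned}$$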
$$\limsup_{n\to\infty}\frac1n H(\mu,2^{-n}) \le \alpha + O(\varepsilon)$$
so 0 < ρ < 1.
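For each infinite sequence $\omega\in\Lambda^{\mathbb{N}}$ there is a minimal $k = k(\omega)$ such that
$$\Lambda_m = \{\omega_1\ldots\omega_{k(\omega)} : \omega\in\Lambda^{\mathbb{N}}\}$$
and
$$\mu = \sum_{i\in\Sigma}p_i\cdot f_i\mu$$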
The proof is by induction on the height of the section (the maximal length of a word
in Σ). We leave it as an exercise.
$$a_{m+n} \le a_m + a_n$$
Proof. We prove this in case the sequence $\frac1n a_n$ is bounded below (this is the case in
our application to entropy). When it is not bounded below the proof is similar. Then we can
define the real number
$$\alpha = \inf_{n\in\mathbb{N}}\frac1n a_n$$
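Let $\varepsilon > 0$ and let $n_0$ be such that $a_{n_0}/n_0 < \alpha+\varepsilon$. For any $n\ge n_0$ write $n = kn_0 + r$
with $0\le r<n_0$. Then
$$\begin{aligned}
a_n &\le a_{n-n_0} + a_{n_0}\\
&\le a_{n-2n_0} + 2a_{n_0}\\
&\;\;\vdots\\
&\le a_r + ka_{n_0}\\
&= a_r + kn_0\cdot\frac{1}{n_0}a_{n_0}
\end{aligned}$$
Writing $c = \max\{a_0,\ldots,a_{n_0-1}\}$, noting that $k\le n/n_0$, and using $a_{n_0}/n_0 < \alpha+\varepsilon$, we
conclude that
$$a_n < c + n(\alpha+\varepsilon);$$
dividing by $n$ we have
$$\frac1n a_n \le \alpha+\varepsilon+\frac{c}{n},$$
so $\limsup\frac1n a_n \le \alpha+\varepsilon$, and since $\varepsilon > 0$ is arbitrary, $\limsup\frac1n a_n \le \alpha$. Of course
$\liminf\frac1n a_n \ge \alpha$ since $\alpha$ is the infimum of the sequence, and we conclude that $\lim\frac1n a_n = \alpha$.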
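A small numerical illustration of the lemma, with a subadditive sequence of our own choosing (not from the notes): $a_n = \log_2 N_n$, where $N_n$ is the number of binary words of length $n$ with no two consecutive 1s. Cutting a word of length $m+n$ into its first $m$ and last $n$ letters gives $N_{m+n}\le N_mN_n$, so $(a_n)$ is subadditive, and the lemma predicts that $a_n/n$ converges to $\inf_n a_n/n$, which here is $\log_2$ of the golden ratio, about 0.694.

```python
import math

def count_no_11(n):
    """Number of binary words of length n containing no two consecutive 1s."""
    end0, end1 = 1, 0                     # the empty word, counted as ending in 0
    for _ in range(n):
        end0, end1 = end0 + end1, end0    # append 0 to anything; append 1 only after a 0
    return end0 + end1

a = [math.log2(count_no_11(n)) for n in range(61)]   # a_0 = 0

# subadditivity a_{m+n} <= a_m + a_n, checked on a finite range
assert all(a[m + n] <= a[m] + a[n] + 1e-12 for m in range(1, 31) for n in range(1, 31))

golden = (1 + math.sqrt(5)) / 2
for n in (1, 10, 30, 60):
    print(n, a[n] / n)          # decreases towards log2(golden), about 0.6942
```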
Proof. Let $\mu = \sum_{i\in\Lambda}p_i\cdot f_i\mu$ and write
$$\alpha_n = H(\mu,2^{-n})$$
Now,
$$\begin{aligned}
H(\mu,2^{-(m+n)}|2^{-m}) &\ge \sum_{i\in\Lambda_m}p_i\cdot H(f_i\mu,2^{-(m+n)}|2^{-m})\\
&\ge \sum_{i\in\Lambda_m}p_i\cdot H(f_i\mu,2^{-(m+n)}) + O(1)
\end{aligned}$$
where in the first inequality we used concavity, and in the second we used Lemma ??.
Next, observe that for $i\in\Lambda_m$ we have $\rho 2^{-m}\le\|f_i\|\le 2^{-m}$, so by Lemma ??,
= αm + αn + O(1)
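Let $C > 0$ be a constant bounding the absolute value of the $O(1)$ term above. Then
$\beta_n = \alpha_n - C$ satisfies
$$\begin{aligned}
\beta_{m+n} &= \alpha_{m+n} - C\\
&\ge \alpha_m + \alpha_n + O(1) - C\\
&= (\alpha_m - C) + (\alpha_n - C) + (O(1) + C)\\
&\ge \beta_m + \beta_n
\end{aligned}$$
Thus $(-\beta_n)$ is subadditive, and since $\frac1n\alpha_n$ is bounded, the previous lemma shows that $\lim\frac1n\beta_n$, and hence $\lim\frac1n\alpha_n = \dim_e\mu$, exists.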
We finish this section with an important estimate for the entropy dimension of a
self-similar measure. We first need a definition.
Definition 11.16. Let $\mu = \sum_{i\in\Lambda}p_i\cdot f_i\mu$ be a self-similar measure with $f_i = r_iU_i + a_i$.
Then the Lyapunov exponent of $\mu$ is
$$\lambda(\mu) = \sum_{i\in\Lambda}p_i\log r_i$$
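Note that $\lambda(\mu)$ is negative, since $0<r_i<1$. The Lyapunov exponent describes the average contraction of the system: indeed, letting $\tilde\mu = p^{\mathbb{N}}$ denote the product measure on symbolic
space (so $\mu = \pi\tilde\mu$), by the strong law of large numbers,
$$\begin{aligned}
\frac1n\log\|f_{\omega_1\ldots\omega_n}\| &= \frac1n\log(r_{\omega_1}r_{\omega_2}\cdots r_{\omega_n})\\
&= \frac1n(\log r_{\omega_1} + \log r_{\omega_2} + \ldots + \log r_{\omega_n})\\
&\to \lambda(\mu) \quad \tilde\mu\text{-a.e. } \omega
\end{aligned}$$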
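The convergence above is the strong law of large numbers for the i.i.d. variables $\log r_{\omega_i}$. A quick simulation, with contraction ratios `r` and weights `p` chosen arbitrarily for illustration (base-2 logarithms, as elsewhere in this section). Summing logarithms instead of multiplying the ratios avoids floating-point underflow for long words.

```python
import math
import random

r = [1 / 2, 1 / 3, 1 / 4]   # contraction ratios r_i (illustrative choice)
p = [0.5, 0.3, 0.2]         # probabilities p_i

lyapunov = sum(pi * math.log2(ri) for pi, ri in zip(p, r))   # lambda = sum_i p_i log r_i

rng = random.Random(0)
n = 200_000
omega = rng.choices(range(len(r)), weights=p, k=n)    # omega_1 ... omega_n drawn i.i.d. from p
empirical = sum(math.log2(r[i]) for i in omega) / n   # (1/n) log(r_{omega_1} ... r_{omega_n})
print(lyapunov, empirical)                            # both close to -1.3755
```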
Proposition 11.17. Let $\mu = \sum_{i\in\Lambda}p_i\cdot f_i\mu$ be a self-similar measure. Then
$$\dim\mu \le \dim_e\mu \le \frac{H(p)}{-\lambda(\mu)}$$
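Proof. We only need to prove the right-hand inequality since the left one holds in
general. For $n\in\mathbb{N}$ let $k(n) = [n/(-\lambda(\mu))]$, so that $\|f_{\omega_1\ldots\omega_{k(n)}}\| = 2^{-n(1+o(1))}$ for $\tilde\mu$-a.e. $\omega$. Let $\mathcal{E}_k$
denote the partition of $\Lambda^{\mathbb{N}}$ into $k$-cylinders. We have
$$H(\tilde\mu,\mathcal{E}_{k(n)}) = k(n)H(p)$$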
Since
$$\|f_{\omega_1\ldots\omega_{k(n)}}\| = 2^{k(n)(\lambda(\mu)+o(1))} = 2^{-n(1+o(1))}$$
we have
$$H(f_{\omega_1\ldots\omega_{k(n)}}\mu,2^{-n}) = o(n)$$
Hence
$$\frac1n H(\tilde\mu,\pi^{-1}\mathcal{D}_{2^n}|\mathcal{E}_{k(n)}) = \int\frac1n H(f_{\omega_1\ldots\omega_{k(n)}}\mu,2^{-n})\,d\tilde\mu(\omega) = \int o(1)\,d\tilde\mu$$
Also it is easy to see that the integrand is bounded, so by bounded convergence the last
integral is $o(1)$. Putting everything together we have
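$$\begin{aligned}
\frac1n H(\mu,2^{-n}) &= \frac1n H(\tilde\mu,\pi^{-1}\mathcal{D}_{2^n})\\
&\le \frac1n H(\tilde\mu,\mathcal{E}_{k(n)}) + \frac1n H(\tilde\mu,\pi^{-1}\mathcal{D}_{2^n}|\mathcal{E}_{k(n)})\\
&= \frac1n k(n)H(p) + o(1)\\
&= \frac{H(p)}{-\lambda(\mu)} + o(1)
\end{aligned}$$
as required.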
So the theorem above says that dime µ ≤ s. This is the same upper bound we got for
the dimension of the attractor, and one can show (e.g. using Lagrange multipliers) that
this probability vector maximizes −H(p)/λ(p) over all product measures. Thus, if for
this measure we show that dime µ = −H(p)/λ(p) then we will have proved that
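The maximization claim can be checked numerically. In the sketch below we assume that $s$ is the similarity dimension, i.e. the solution of $\sum_i r_i^s = 1$, and find it by bisection for illustrative ratios `r`. For $p_i = r_i^s$ one gets $H(p) = -s\sum_i p_i\log r_i = -s\lambda(p)$, so $H(p)/(-\lambda(p)) = s$; random probability vectors give smaller values (this also follows from Gibbs' inequality). Helper names are ours.

```python
import math
import random

def entropy(p):
    return -sum(q * math.log2(q) for q in p if q > 0)

def lyapunov(p, r):
    return sum(pi * math.log2(ri) for pi, ri in zip(p, r))

def similarity_dimension(r, tol=1e-12):
    """Solve sum_i r_i^s = 1 for s by bisection (the left side decreases in s)."""
    lo, hi = 0.0, 1.0
    while sum(ri ** hi for ri in r) > 1:
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if sum(ri ** mid for ri in r) > 1:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

r = [1 / 3, 1 / 4, 1 / 5]                            # illustrative contraction ratios
s = similarity_dimension(r)
p_star = [ri ** s for ri in r]                       # sums to 1 up to rounding
ratio_star = entropy(p_star) / -lyapunov(p_star, r)  # equals s

rng = random.Random(0)
best_random = 0.0
for _ in range(10_000):
    w = [rng.random() for _ in r]
    p = [x / sum(w) for x in w]
    best_random = max(best_random, entropy(p) / -lyapunov(p, r))
print(s, ratio_star, best_random)   # best_random stays below s
```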
12 Components and multiscale formula for entropy
Given a probability measure µ and set A with µ(A) > 0, recall that the conditional
measure on A is $\mu_A = \frac{1}{\mu(A)}\mu|_A$.
Note that $\mu_{x,n}$ is supported on $\mathcal{D}_{2^n}(x)$. One can identify $\mu_{x,n}$ with the measure on a
sub-tree of the weighted dyadic tree representing µ; the root of this sub-tree is the node corresponding
to the first n binary digits of x.
Definition 12.2. For a probability measure µ and a finite set U ⊆ N of “levels”, the
component distribution is the probability distribution on components µx,n given by
choosing n ∈ U uniformly, and independently choosing x according to µ.
One should think of this as choosing a random node in the tree representation of
µ. Note that it is not the uniform distribution on nodes; the uniform distribution is
skewed very strongly towards the leaves. The component distribution is uniform on (a
set of) levels, and in each level it chooses nodes according to µ.
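A sketch of sampling from the component distribution for a toy measure that is not from the notes: $\mu_p$, whose binary digits are i.i.d. with $P(0)=p$. We identify $\mu_{x,n}$ with the tree node given by the first $n$ digits of $x$ (helper names are ours). The sampled level is uniform on $U$, whereas under the uniform distribution on the nodes of the depth-$N$ binary tree the deepest level alone carries $2^N/(2^{N+1}-1)$, just over half, of the mass.

```python
import random
from collections import Counter

def sample_component(p, U, rng):
    """Return (n, word): n is uniform on U, word = first n binary digits of x ~ mu_p."""
    n = rng.choice(U)
    word = tuple(0 if rng.random() < p else 1 for _ in range(n))
    return n, word

N = 10
U = list(range(N + 1))
rng = random.Random(0)
samples = [sample_component(0.3, U, rng) for _ in range(50_000)]
component_levels = Counter(n for n, _ in samples)

# level distribution of a uniformly chosen node in the full binary tree of depth N
uniform_node_levels = {n: 2 ** n / (2 ** (N + 1) - 1) for n in U}
print(component_levels[N] / len(samples), uniform_node_levels[N])  # about 1/11 versus about 1/2
```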
Whenever µx,n (or similar symbols) appear inside the symbols E(. . .) or P(. . .), they
represent random variables chosen according to the component distribution. The set U
is indicated as necessary; if it is not indicated then the index n in µx,n is fixed. For
example, if A ⊆ P([0, 1]) is a set of measures (e.g. the set of purely atomic measures)
then
and
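$$\begin{aligned}
\mathbb{P}_{0\le n\le N}(\mu_{x,n}\in A) &= \frac{1}{N+1}\sum_{n=0}^{N}\mu(x : \mu_{x,n}\in A)\\
&= \frac{1}{N+1}\sum_{n=0}^{N}\int 1_A(\mu_{x,n})\,d\mu(x)
\end{aligned}$$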
Similarly for a function $f:\mathcal{P}([0,1])\to\mathbb{R}$,
$$\mathbb{E}(f(\mu_{x,n})) = \int f(\mu_{x,n})\,d\mu(x)$$
and
$$\mathbb{E}_{n\in U}(f(\mu_{x,n})) = \frac{1}{|U|}\sum_{n\in U}\int f(\mu_{x,n})\,d\mu(x)$$
etc. Lastly, if two random variables $\mu_{x,n}$, $\nu_{y,n}$ appear in the same expression, it is
assumed that x, y are chosen independently.
$$\mu = \mathbb{E}(\mu_{x,n})$$
Indeed this is just another way of writing $\mu = \sum_{I\in\mathcal{D}_{2^n}}\mu(I)\cdot\mu_I$, which in turn follows
from the trivial decomposition $\mu = \sum_{I\in\mathcal{D}_{2^n}}\mu|_I$. Second,
Lemma 12.4. For any probability measure µ, any n ∈ N, and any partition A of R,
$$\frac1n H(\mu,2^{-n}) = \frac1n\sum_{0\le i\le n}\frac1m H(\mu,\mathcal{D}_{2^{i+m}}|\mathcal{D}_{2^i}) + O\Big(\frac mn\Big)$$
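Let $k = [n/m]$. For every $0\le u<m$,
$$\begin{aligned}
H(\mu,\mathcal{D}_{2^{u+mk}}) &= H\Big(\mu,\bigvee_{i=0}^{k}\mathcal{D}_{2^{u+im}}\Big)\\
&= H(\mu,\mathcal{D}_{2^u}) + \sum_{i=0}^{k-1}H(\mu,\mathcal{D}_{2^{u+(i+1)m}}|\mathcal{D}_{2^{u+im}}) \qquad (10)
\end{aligned}$$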
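$$\begin{aligned}
\frac1n H(\mu,2^{-n}) &= \frac1m\sum_{u=1}^{m}\frac1n H(\mu,\mathcal{D}_{2^{u+mk}}) + O\Big(\frac mn\Big)\\
&= \frac1m\sum_{u=1}^{m}\frac1n H(\mu,\mathcal{D}_{2^u}) + \frac1n\cdot\frac1m\sum_{u=1}^{m}\sum_{i=0}^{k-1}H(\mu,\mathcal{D}_{2^{u+(i+1)m}}|\mathcal{D}_{2^{u+im}}) + O\Big(\frac mn\Big)\\
&= \frac1n\sum_{0\le i\le n}\frac1m H(\mu,\mathcal{D}_{2^{i+m}}|\mathcal{D}_{2^i}) + O\Big(\frac mn\Big)
\end{aligned}$$
as claimed.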
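A numerical check of Lemma 12.4 on a random measure on $[0,1]$ (a random multiplicative cascade, chosen only for illustration; helper names are ours). Since $\mathcal{D}_{2^{i+m}}$ refines $\mathcal{D}_{2^i}$, the chain rule gives $H(\mu,\mathcal{D}_{2^{i+m}}|\mathcal{D}_{2^i}) = H(\mu,\mathcal{D}_{2^{i+m}}) - H(\mu,\mathcal{D}_{2^i})$, which is how the conditional entropies are computed. By our own telescoping estimate (using $H(\mu,\mathcal{D}_{2^j})\le j$ for measures on $[0,1]$), the two sides differ by at most $(m+1)/(2n)\le m/n$ when $m\le n$.

```python
import math
import random

def entropy(masses):
    return -sum(q * math.log2(q) for q in masses if q > 0)

def random_cascade(depth, rng):
    """Masses of the level-j dyadic intervals of [0, 1] for j = 0..depth (a random measure)."""
    levels = [[1.0]]
    for _ in range(depth):
        children = []
        for q in levels[-1]:
            w = rng.random()                      # split each interval's mass at random
            children.extend((q * w, q * (1 - w)))
        levels.append(children)
    return levels

def multiscale_check(n, m, seed):
    H = [entropy(level) for level in random_cascade(n + m, random.Random(seed))]  # H[j] = H(mu, D_{2^j})
    lhs = H[n] / n
    # H(mu, D_{2^{i+m}} | D_{2^i}) = H[i+m] - H[i], since D_{2^{i+m}} refines D_{2^i}
    rhs = sum((H[i + m] - H[i]) / m for i in range(n + 1)) / n
    return lhs, rhs

lhs, rhs = multiscale_check(12, 3, seed=0)
print(lhs, rhs, abs(lhs - rhs) <= 3 / 12)
```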
13 Additive combinatorics
We shift focus temporarily to describe results from the field of additive combinatorics.
$$A + B = \{a+b : a\in A,\ b\in B\}$$
$$A + B = \pi(A\times B)$$
Additive combinatorics, or at least an important chapter of it, is devoted to the study
of sumsets and the relation between the structure of A, B and A + B.
The so-called inverse problem asks what structure can be deduced for sets A, B
such that A + B is “small” relative to the sizes of the original sets. The general flavor
of results of this kind is that, if the sumset is small, there must be an algebraic reason
for it. It will become evident later that this question comes up naturally in the study
of self-similar sets.
The first inequality is an equality if and only if at least one of the sets is a singleton. The
right-hand inequality is an equality precisely when each c ∈ A + B has a unique representation
as a + b for a ∈ A, b ∈ B (equivalently, $\pi|_{A\times B}$ is injective).
The equality |A + B| = |A||B| can occur. For example for any b, n consider
As another example, for “generic” pairs of sets one has |A + B| ∼ |A||B|. For
instance, when A, B ⊆ {1, . . . , n} are chosen randomly by including each 1 ≤ i ≤ n in
A with probability p and similarly for B, with all choices independent, and p = p(n) is small
enough that $p^2 n\to 0$, there is high probability that |A + B| ≥ c|A||B|. (Some restriction on p
is necessary, since always |A + B| ≤ 2n − 1.) The question becomes, what can be said between
these two extremes.
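A simulation of the random model (all parameters are our own choices). With $p = n^{-3/4}$, so that $p^2n = n^{-1/2}$ is small, almost all sums $a+b$ are distinct and $|A+B|/(|A||B|)$ is close to 1. With $p = 1/2$ fixed, the trivial bound $|A+B|\le 2n-1$ forces the ratio to be tiny.

```python
import random

def random_subset(n, p, rng):
    """Include each 1 <= i <= n independently with probability p."""
    return [i for i in range(1, n + 1) if rng.random() < p]

def sumset_ratio(A, B):
    """|A + B| / (|A| |B|); equals 1 exactly when all sums a + b are distinct."""
    return len({a + b for a in A for b in B}) / (len(A) * len(B))

rng = random.Random(1)
n_sparse = 100_000
p_sparse = n_sparse ** -0.75            # p^2 n = n^{-1/2} is small
A = random_subset(n_sparse, p_sparse, rng)
B = random_subset(n_sparse, p_sparse, rng)
sparse_ratio = sumset_ratio(A, B)

n_dense = 2_000                         # p fixed: |A + B| <= 2n - 1 forces a tiny ratio
C = random_subset(n_dense, 0.5, rng)
D = random_subset(n_dense, 0.5, rng)
dense_ratio = sumset_ratio(C, D)
print(sparse_ratio, dense_ratio)
```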
This discussion motivates us to consider A + B to be “small” if |A + B| ≪ |A||B|.
$$|A + A| \le C|A| \qquad (12)$$
Here C is a constant, and we think of A as large relative to C. Such sets are said
to have small doubling.
There are a number of simple examples in which small doubling occurs.
1. Consider $A = \{1,\ldots,n\}^d\subseteq\mathbb{Z}^d$. Then
2. Example (1) can be pushed down from dimension d to any lower dimension as
follows. For $i = 1,\ldots,k$, take intervals of integers $I_i = \{1,2,\ldots,n_i\}$, and let
$T:\mathbb{Z}^k\to\mathbb{Z}^d$ be an affine map given by integer parameters, that is $Tx = Ax + b$
for an integer matrix A and integer vector b. Suppose that T is injective on
$I = I_1\times\ldots\times I_k$. Then $A = T(I)\subseteq\mathbb{Z}^d$ satisfies
3. Finally, for any set with small doubling one can pass to large subsets. Begin with
a set A satisfying |A + A| ≤ C|A| (e.g. a GAP) and choose any A′ ⊆ A with
cardinality $|A'|\ge D^{-1}|A|$ for some D > 1. Then
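The three constructions can be compared on small instances. The sketch below (our own helper `doubling`, and an illustrative injective map $T(x_1,x_2) = x_1+100x_2$, neither from the notes) computes $|A+A|/|A|$ for the box $\{1,\ldots,8\}^2$, for its image under $T$, which is a GAP of rank 2 in $\mathbb{Z}$, and for a random half of the box, where $|A'+A'|\le|A+A|\le CD|A'|$ with $D = 2$.

```python
import itertools
import random

def doubling(A):
    """|A + A| / |A| for a finite set A of integer vectors (tuples)."""
    A = list(A)
    sums = {tuple(x + y for x, y in zip(a, b)) for a in A for b in A}
    return len(sums) / len(A)

n, d = 8, 2
box = list(itertools.product(range(1, n + 1), repeat=d))   # example 1: {1,...,n}^d

# example 2: T(x1, x2) = x1 + 100 x2 is injective on {1,...,8}^2 (and on its sumset)
gap = [(x1 + 100 * x2,) for x1, x2 in box]

# example 3: a large subset A' of the box, |A'| = |A| / 2
subset = random.Random(0).sample(box, len(box) // 2)

print(doubling(box), doubling(gap), doubling(subset))
# (2n - 1)^d / n^d = 225/64 for the first two; at most 2 * 225/64 for the third
```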
One of the central results of additive combinatorics is Freiman’s theorem, which says
that, remarkably, these three procedures give all sets with small doubling.
For more information see [?, Theorem 5.32 and Theorem 5.33].
Combined with some standard arguments (e.g. the Plünnecke-Ruzsa inequality), the
symmetric version leads to an asymmetric version: assuming $A, B\subseteq\mathbb{Z}^d$ and $C^{-1}\le|A|/|B|\le C$, if $|A + B|\le C|A|$ then A, B are contained in a GAP P of rank $\le C'$
and size $|P|\le C'|A|$, with similar bounds on the constants.
for X ⊆ R. Indeed, given X ⊆ R and n ∈ N let $X_n$ denote the set obtained by
replacing each x ∈ X with the closest point $k/2^n$, k ∈ Z. Then $|X_n|\sim 2^{n(\dim_M X+o(1))}$
and $|X_n + X_n|\sim 2^{n(\dim_M(X+X)+o(1))}$ for large n, so (14) is equivalent to $|X_n + X_n|\lesssim|X_n|^{1+o(1)}$.
Here is a representative example of a set satisfying (13). Write $I_n = \{0,\ldots,n-1\}$
and let
$$A_n = \sum_{i=1}^{n}\frac{1}{2^{i^2}}I_{2^i} = \Big\{\sum_{i=1}^{n}a_i2^{-i^2} : 0\le a_i<2^i\Big\}$$
Each term in the sum $\sum_{i=1}^n a_i2^{-i^2}$ determines uniquely a distinct block of binary digits
(the i-th term determines the digits at positions $i^2-i+1$ to $i^2$). Thus every element in $A_n$
has a unique representation as such a sum, so $A_n$ is a GAP, being the injective image
of $I_2\times I_4\times\ldots\times I_{2^n}$ by the map $(x_1,\ldots,x_n)\mapsto\sum_i\frac{1}{2^{i^2}}x_i$. The rank is n, so
$$|A_n + A_n| \le 2^n|A_n|$$
Since
$$|A_n| = \prod_{i=1}^{n}|I_{2^i}| = 2^{\sum_{i=1}^n i} = 2^{n(n+1)/2},$$
we have $2^n = |A_n|^{2/(n+1)}$, and we conclude
$$|A_n + A_n| = |A_n|^{1+o(1)} \quad\text{as } n\to\infty$$
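The example can be verified by brute force for small $n$. The sketch below (helper name ours) multiplies $A_n$ by $2^{n^2}$ so that all arithmetic is on integers. Because the sums $a_i+b_i<2^{i+1}$ still occupy disjoint blocks of binary digits, one expects $|A_n+A_n| = \prod_{i=1}^n(2^{i+1}-1)$, which is at most $2^n|A_n|$; the ratio $\log|A_n+A_n|/\log|A_n|$ decreases slowly towards 1.

```python
import itertools
import math

def A_scaled(n):
    """2^{n^2} * A_n, where A_n = {sum_i a_i 2^{-i^2} : 0 <= a_i < 2^i}; exact integers."""
    return {
        sum(a * 2 ** (n * n - i * i) for i, a in enumerate(digits, start=1))
        for digits in itertools.product(*(range(2 ** i) for i in range(1, n + 1)))
    }

for n in range(1, 5):
    A = A_scaled(n)
    AA = {x + y for x in A for y in A}
    # |A_n| = 2^{n(n+1)/2} and |A_n + A_n| <= 2^n |A_n|
    print(n, len(A), len(AA), 2 ** n * len(A), round(math.log(len(AA)) / math.log(len(A)), 3))
```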
Do all examples of (14) look essentially like this one? One could try to answer this
using Freiman’s theorem, which applies with $C = |A|^\delta$. But all that one gets is that A
is a $|A|^{O(\delta)}$-fraction of a GAP of rank $|A|^{O(\delta)}$, and this gives rather coarse information
about A (note that, trivially, every set is a GAP of rank |A|).
Instead, it is possible to apply a multi-scale analysis, showing that at some scales
the set looks quite “dense” and at others quite “sparse”. See Theorem 13.6 below.
13.5 Convolution
The inverse theorem that we soon present is stated in the language of measures, instead
of sets. The measure-theoretic analog of the sumset operation is convolution.
Thus, µ ∗ ν is characterized by the property that
$$\int f\,d(\mu*\nu) = \iint f(x+y)\,d\mu(x)\,d\nu(y) \qquad\text{for } f\in C_0(\mathbb{R})$$
$$\mu_b(A) = \mu(A-b)$$
1. (µ, ν) 7→ µ ∗ ν is multilinear.
2. µ ∗ ν = ν ∗ µ .
3. µ ∗ (ν ∗ τ ) = (µ ∗ ν) ∗ τ .
4. $\mu*\nu = \int\mu_y\,d\nu(y)$, and in particular, $\mu*\delta_b = \mu_b$.
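Properties 1-4 can be spot-checked on finitely supported measures, stored as dictionaries from atoms to masses; using `fractions.Fraction` makes the comparisons exact. A sketch with hypothetical helper names (property 1 is tested through linearity in the first argument):

```python
from collections import Counter
from fractions import Fraction

def convolve(mu, nu):
    """mu * nu for finitely supported measures given as {atom: mass}."""
    out = Counter()
    for x, px in mu.items():
        for y, py in nu.items():
            out[x + y] += px * py      # push-forward of mu x nu under (x, y) -> x + y
    return dict(out)

def translate(mu, b):
    """mu_b(A) = mu(A - b): every atom is moved by b."""
    return {x + b: px for x, px in mu.items()}

def average(mu, nu):
    """The measure (mu + nu) / 2."""
    return {x: (mu.get(x, 0) + nu.get(x, 0)) / 2 for x in set(mu) | set(nu)}

F = Fraction
mu = {F(0): F(1, 2), F(1, 3): F(1, 2)}
nu = {F(0): F(1, 4), F(1, 2): F(3, 4)}
tau = {F(1, 5): F(1)}

assert convolve(average(mu, tau), nu) == average(convolve(mu, nu), convolve(tau, nu))  # property 1
assert convolve(mu, nu) == convolve(nu, mu)                                          # property 2
assert convolve(mu, convolve(nu, tau)) == convolve(convolve(mu, nu), tau)            # property 3
assert convolve(mu, {F(2, 7): F(1)}) == translate(mu, F(2, 7))                       # mu * delta_b = mu_b
```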
Proof. (1)-(3) may be verified easily from the definition. For (4),
Note that we only have an inequality, not an equality as we had in the corresponding
expression for the entropy of one measure.
$$\begin{aligned}
\mu*\nu &= \pi(\mu\times\nu)\\
&= \mathbb{E}_{i=k}(\pi(\mu_{x,i}\times\nu_{y,i}))\\
&= \mathbb{E}_{i=k}(\mu_{x,i}*\nu_{y,i})
\end{aligned}$$
By concavity of entropy,
$$\frac1m H(\mu*\nu,2^{-(k+m)}|2^{-k}) \ge \mathbb{E}_{i=k}\Big(\frac1m H(\mu_{x,i}*\nu_{y,i},2^{-(i+m)}|2^{-i})\Big)$$
The measure $\mu_{x,i}*\nu_{y,i} = \pi(\mu_{x,i}\times\nu_{y,i})$ is supported on a set of diameter $O(2^{-i})$, so we can remove the
conditioning at scale $2^{-i}$ with an $O(1)$ error term, which after normalization is $O(1/m)$:
$$\ge \mathbb{E}_{i=k}\Big(\frac1m H(\mu_{x,i}*\nu_{y,i},2^{-(i+m)})\Big) + O\Big(\frac1m\Big)$$
$$\max\{H(\mu,2^{-n}),H(\nu,2^{-n})\} - O(1) \le H(\mu*\nu,2^{-n}) \le H(\mu,2^{-n}) + H(\nu,2^{-n}) + O(1)$$
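For measures on $[0,1]$ our own computation along the lines of the proof suggests that both $O(1)$ terms can be taken to be one bit: a translate of a dyadic interval meets at most two dyadic intervals of the same level, and $\{x+y : (x,y)\in Q\}$ for a dyadic square $Q$ also meets at most two. The sketch below checks the resulting bounds on random finitely supported measures, storing atoms as integers on the grid $2^{-L}\mathbb{Z}$ so that additions are exact (the grid depth `L` and all helper names are our choices).

```python
import math
import random
from collections import Counter

L = 20   # atoms are integers k standing for the points k * 2^-L in [0, 1)

def random_measure(num_atoms, rng):
    mu = Counter()
    for _ in range(num_atoms):
        mu[rng.randrange(2 ** L)] += rng.random()
    total = sum(mu.values())
    return {x: w / total for x, w in mu.items()}

def convolve(mu, nu):
    out = Counter()
    for x, px in mu.items():
        for y, py in nu.items():
            out[x + y] += px * py   # exact integer addition of atoms
    return out

def dyadic_entropy(mu, n):
    """H(mu, D_{2^n}) in bits; the atom k lies in the level-n dyadic interval k >> (L - n)."""
    masses = Counter()
    for x, px in mu.items():
        masses[x >> (L - n)] += px
    return -sum(q * math.log2(q) for q in masses.values() if q > 0)

rng = random.Random(0)
mu, nu = random_measure(200, rng), random_measure(50, rng)
conv = convolve(mu, nu)
for n in (2, 6, 10, 14):
    h_mu, h_nu, h_conv = (dyadic_entropy(m, n) for m in (mu, nu, conv))
    print(n, round(max(h_mu, h_nu) - 1, 3), round(h_conv, 3), round(h_mu + h_nu + 1, 3))
```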
For any y ∈ R we have
On the other hand, writing $\pi_1,\pi_2$ for the coordinate projections, we have $\mathcal{D}^2_{2^n} = \pi_1^{-1}\mathcal{D}_{2^n}\vee\pi_2^{-1}\mathcal{D}_{2^n}$ and these partitions are independent for the product measure $\mu\times\nu$,
hence
Inserting the last two bounds into the equation preceding them gives the second part
of the lemma.
For µ ∈ P([0, 1]), recall that the maximal value of $\frac1n H(\mu,2^{-n})$ is ≈ 1, and that it
is achieved (or nearly achieved) when µ is uniformly distributed (or nearly so) on the
atoms of $\mathcal{D}_{2^n}$ that meet [0, 1].
Similarly, $\frac1n H(\mu,2^{-n})$ is close to 0 if µ is “mostly concentrated on a small number of atoms”.
Observe that if µ is of the first type above, then µ ∗ ν will have essentially the same
scale-n entropy as µ for every measure ν ∈ P([0, 1]), and the same is true if ν is of the
second type, for every µ ∈ P([0, 1]). The following theorem says that if µ ∗ ν is “not much
bigger” than µ (in entropy terms), then a converse holds “with high probability on the
component measures”: one can split the scales into two kinds, those at which the components
of µ are with high probability close to uniform, and those at which the components of ν are
with high probability close to atomic, and these two types of scales cover almost all scales
between 0 and n.
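Theorem 13.6. For every ε > 0 and m > 0 there is a δ > 0 such that for all large
enough n the following holds. For any measures µ, ν ∈ P([0, 1]), if
$$\frac1n H(\mu*\nu,2^{-n}) \le \frac1n H(\mu,2^{-n}) + \delta$$
then there exist $I, J\subseteq\{1,\ldots,n\}$ with $|I\cup J| > (1-\varepsilon)n$ such that
$$\mathbb{P}_{i\in I}\Big(\frac1m H(\mu_{x,i},2^{-(i+m)}) > 1-\varepsilon\Big) > 1-\varepsilon$$
$$\mathbb{P}_{j\in J}\Big(\frac1m H(\nu_{y,j},2^{-(j+m)}) < \varepsilon\Big) > 1-\varepsilon$$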
Corollary 13.7. Let τ > 0 be fixed and suppose ε < τ/4. If, in the inverse theorem,
m, n are large relative to τ and if we know in addition that
$$\frac1n H(\nu,2^{-n}) > \tau$$
then $|I| > \tau n/4$.
Using ε < τ/4, and assuming as we may that the error term is < τ/4, we rearrange and
get
$$\frac1n|J| < 1 + \frac14\tau - \tau + \frac14\tau = 1 - \frac{\tau}{2}$$
Thus
$$\frac1n|I| \ge 1 - \frac1n|J| - \varepsilon > \frac{\tau}{2} - \frac{\tau}{4} = \frac{\tau}{4},$$
that is, $|I| > \frac14\tau n$.
Fix an IFS $\Phi = \{f_i\}_{i\in\Lambda}$ with $f_i(x) = rx + a_i$ (for simplicity we are assuming that all maps
contract by the same amount r).
Let $X = \bigcup_{i\in\Lambda}f_iX$ be a self-similar set and $\mu = \sum_{i\in\Lambda}p_i\cdot f_i\mu$ a self-similar measure
for $p = (p_i)_{i\in\Lambda}$.
We return now to the conjecture that we stated earlier, that
and
$$\dim\mu = \min\Big\{1,\frac{H(p)}{-\lambda(p)}\Big\}$$
unless there are exact overlaps, i.e. $f_i = f_j$ for some distinct $i,j\in\Lambda^*$. As we saw, if we
choose $p_i = r_i^{\dim_s\Phi}$ (in our case, $p_i = 1/|\Lambda|$ is uniform), then $\dim_s\mu = \dim_s\Phi$, and so
the statement for X follows from the statement for µ.
We introduce a measure quantifying how far exact overlaps are from occurring:
Proof. The first statement is obvious since the points $f_i(0)$, $i\in\Lambda^n$, all lie in the attractor.
For the second statement note that $f_i$ contracts by $r^n$ for all $i\in\Lambda^n$, so $f_i(0)$
determines $f_i$. Thus, if $\Delta_n = 0$ then exact overlaps occur. Conversely, if $f_i = f_j$ for
distinct $i\in\Lambda^k$ and $j\in\Lambda^\ell$, then neither of i, j extends the other, for then $f_i, f_j$ would have
different contraction ratios. Then $ij\ne ji$ and $ij, ji\in\Lambda^{k+\ell}$ show that $\Delta_{k+\ell} = 0$.
Definition 13.10. We say that Φ satisfies exponential separation (ES) if there
exists ρ > 0 with
$$\Delta_n > \rho^n \quad\text{for all } n\in\mathbb{N}.$$
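The definition of $\Delta_n$ is not reproduced in this excerpt; the sketch below assumes the usual one for equicontractive IFSs, $\Delta_n = \min\{|f_i(0)-f_j(0)| : i\ne j\in\Lambda^n\}$, which is consistent with the proof above since all $f_i$, $i\in\Lambda^n$, contract by $r^n$. It compares the middle-third Cantor IFS, where $\Delta_n = 2\cdot 3^{-n}$ decays exactly exponentially, with the IFS $\varphi_1,\varphi_2,\varphi_3$ of contraction $1/10$ at the end of the previous section, where $\varphi_3\varphi_1 = \varphi_1\varphi_2$ is an exact overlap and so $\Delta_n = 0$ for $n\ge 2$.

```python
from fractions import Fraction
from itertools import product

def delta(r, a, n):
    """min |f_i(0) - f_j(0)| over distinct words i != j of length n, for f_k(x) = r x + a_k."""
    points = sorted(
        sum(a[k] * r ** t for t, k in enumerate(word))   # f_{i_1} o ... o f_{i_n}(0)
        for word in product(range(len(a)), repeat=n)
    )
    return min(y - x for x, y in zip(points, points[1:]))

cantor = (Fraction(1, 3), [Fraction(0), Fraction(2, 3)])
overlap = (Fraction(1, 10), [Fraction(0), Fraction(9, 10), Fraction(9, 100)])
for n in range(1, 6):
    print(n, delta(*cantor, n), delta(*overlap, n))
```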
Theorem 13.11. If Φ has exponential separation, then $\dim_M X = \min\{1,\dim_s\Phi\}$ and $\dim_e\mu = \min\Big\{1,\dfrac{H(p)}{-\lambda(p)}\Big\}$.
The result for sets follows from the result for measures, as explained above.
The theorem holds also for IFSs with non-uniform contraction but we focus for
simplicity on the simpler case above. We will show that this theorem follows from the
inverse theorem presented above. This involves showing how the assumption dime µ <
min{1, dims Φ} implies that there are convolutions of µ with measures of substantial
entropy for which no entropy growth occurs; and then showing that the fact that µ is
self-similar rules out the possibility that this can happen.
Let
$$c = -\log_2 r$$
so
$$f_i\mu = T_{f_i(0)}S_{r^m}\mu = S_{r^m}\mu*\delta_{f_i(0)}$$
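Hence
$$\begin{aligned}
\mu &= \sum_{i\in\Lambda^m}p_i\cdot f_i\mu\\
&= \sum_{i\in\Lambda^m}p_i\cdot S_{r^m}\mu*\delta_{f_i(0)}\\
&= S_{r^m}\mu*\sum_{i\in\Lambda^m}p_i\cdot\delta_{f_i(0)}\\
&= S_{r^m}\mu*\mu^{(m)}
\end{aligned}$$
as claimed.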
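In the decomposition $\mu = S_{r^m}\mu*\mu^{(m)}$, the measure $\mu^{(m)} = \sum_{i\in\Lambda^m}p_i\delta_{f_i(0)}$ is finitely supported and can be computed exactly. In the sketch below (helper names ours; exact rational positions), its entropy as a discrete measure is $mH(p)$ when the points $f_i(0)$ are distinct, and strictly smaller when exact overlaps merge atoms, as for the IFS with maps $x/10$, $x/10+9/10$, $x/10+9/100$.

```python
import math
from collections import Counter
from fractions import Fraction
from itertools import product

def mu_m(r, a, p, m):
    """mu^(m) = sum over words i of length m of p_i * delta_{f_i(0)}, for f_k(x) = r x + a_k."""
    atoms = Counter()
    for word in product(range(len(a)), repeat=m):
        point = sum(a[k] * r ** t for t, k in enumerate(word))   # f_i(0)
        atoms[point] += math.prod(p[k] for k in word)            # p_i = p_{i_1} ... p_{i_m}
    return atoms

def discrete_entropy(atoms):
    return -sum(q * math.log2(q) for q in atoms.values() if q > 0)

p3 = [1 / 3] * 3
cantor = mu_m(Fraction(1, 3), [Fraction(0), Fraction(2, 3)], [0.5, 0.5], 6)
overlap = mu_m(Fraction(1, 10), [Fraction(0), Fraction(9, 10), Fraction(9, 100)], p3, 2)
print(discrete_entropy(cantor))                     # 6.0 = 6 H(1/2, 1/2): atoms are distinct
print(discrete_entropy(overlap), 2 * math.log2(3))  # about 2.948 < 3.170: two atoms merged
```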
It will be convenient to define entropy at “scales” that are not powers of 2. Thus
we define for all t > 0,
$$H(\mu,t) = H(\mu,2^{[\log t]})$$
Next, we study the effect of convolving two measures “of different scales”.
Lemma 13.13. Let θ ∈ P([0, 1]) and ν ∈ P(R). Let t < s and assume that ν is
supported on a set of diameter O(s). Then
In particular,
$$H(\theta*\nu,s) = H(\theta,s) + O(1)$$
$$H(\theta*\nu,t) \ge H(\theta,s) + H(\nu,t) - O(1)$$
Lemma ??,
Similarly,
Inserting this into the equation (18) gives the desired equality.
For the second identity apply the first with s = t to get
where we used the fact that both θx,i and ν are supported on sets of diameter O(s), so
the same holds for θx,i ∗ ν and hence H(θx,i ∗ ν, s) = O(1).
For the third identity, note that for every x, i we have $H(\theta_{x,i}*\nu,t) \ge H(\nu,t) - O(1)$.
Inserting this into the first identity in the lemma gives the third.
Corollary 13.14. $\displaystyle\lim_{m\to\infty}\frac{1}{\log(1/r^m)}H(\mu^{(m)},r^m) = \dim_e\mu$
Proof. Using the first part of the lemma and the identity $\mu = \mu^{(m)}*S_{r^m}\mu$ and the fact
that $S_{r^m}\mu$ is supported on a set of diameter $O(r^m)$,
$$\frac{1}{\log(1/r^m)}H(\mu^{(m)},r^m) = \frac{1}{\log(1/r^m)}H(\mu,r^m) + O\Big(\frac{1}{\log(1/r^m)}\Big) \to \dim_e\mu$$
as required.
Corollary 13.15. For every k ∈ N,
$$\lim_{m\to\infty}\frac{1}{c(k-1)m}\mathbb{E}_{i=[cm]}H((\mu^{(m)})_{x,i}*S_{r^m}\mu,r^{km}) = \dim_e\mu$$
Proof. Apply the previous lemma with $\theta = \mu^{(m)}$, $\nu = S_{r^m}\mu$ and with $s = r^m$, $t = r^{km}$.
We get
Thus in the first part of the lemma, we have an average over components whose value
is within o(1) of the mean. Therefore for large m the second statement follows.
Lemma 13.16. If $\dim_e\mu < \dim_s\Phi$ and if $\Delta_n$ satisfies exponential separation, then
there is a constant τ > 0 and k ∈ N such that for all m ∈ N,
$$\mathbb{P}_{i=[cm]}\Big(\frac{1}{c(k-1)m}H((\mu^{(m)})_{x,i},r^{km}) > \tau\Big) > \tau$$
$$\dim_s\mu = \frac{H(p)}{\log(1/r)}$$
$$\dim_e\mu < \dim_s\mu - \varepsilon = \frac{H(p)}{\log(1/r)} - \varepsilon$$
Then
Let k be such that $\Delta_m > r^{km}$ for all m (this k depends on Φ but not on µ or m). Then
every partition into $r^{km}$-intervals separates the atoms of $\mu^{(m)}$, and hence
Therefore
$$\mathbb{E}_{i=[cm]}H(\mu^{(m)}_{x,i},r^{km}) = H(\mu^{(m)},r^{km}|r^m)$$
It follows that there exists τ = τ(ε) such that $\frac{1}{cm}H(\mu^{(m)}_{x,i},r^{km}) > \tau$ with probability
> τ, as required.
$$\dim_e\mu < \min\Big\{1,\frac{H(p)}{\log(1/r)}\Big\}$$
Let k, τ be as in the previous lemma. Then we know from the lemma and the previous
corollary that for any δ > 0, as soon as m is large enough,
$$\mathbb{P}_{i=[cm]}\Big(\frac{1}{c(k-1)m}H((\mu^{(m)})_{x,i},r^{km}) > \tau\Big) > \tau$$
and
$$\mathbb{P}_{i=[cm]}\Big(\frac{1}{c(k-1)m}H((\mu^{(m)})_{x,i}*S_{r^m}\mu,r^{km}) < (1+\delta)\dim_e\mu\Big) > 1-\delta$$
Taking δ < τ we can find a component $\nu' = (\mu^{(m)})_{x,i}$ belonging to both events above;
i.e.
$$\frac{1}{c(k-1)m}H(\nu',r^{km}) > \tau$$
$$\frac{1}{c(k-1)m}H(\nu'*S_{r^m}\mu,r^{km}) < (1+\delta)\dim_e\mu$$
Applying $S_{1/r^m}$ to all measures above, and writing $\nu = S_{1/r^m}\nu'$ and $n = m(k-1)$, we
have derived the following conclusion:
Corollary 13.17. Suppose that $\dim_e\mu < \dim_s\Phi$ and that $\Delta_n$ is exponentially separated.
Then there exist ℓ ∈ N and τ > 0 such that, for every δ > 0, for all sufficiently large
n, there exists $\nu = \nu_n\in\mathcal{P}([0,1])$ such that
$$\frac{1}{cn}H(\nu,r^n) > \tau$$
$$\frac{1}{cn}H(\mu*\nu,r^n) < \frac{1}{cn}H(\mu,r^n) + \delta$$
By Theorem 13.6 and the corollary following it, this can only happen if for all m
and all sufficiently large n there exists I ⊆ {1, . . . , n} with |I| > τn/4 and such that
$$\mathbb{P}_{i\in I}\Big(\frac1m H(\mu_{x,i},2^{-(i+m)}) > 1-\varepsilon\Big) > 1-\varepsilon$$
Lemma 13.19. Let Γ be a countable abelian group and let µ, ν ∈ P(Γ) be probability
measures with H(µ) < ∞, H(ν) < ∞. Let
The lemma above first appears in a study of random walks on groups by Kaı̆manovich
and Vershik [?]. It was more recently rediscovered and applied in additive combinatorics
by Madiman and his co-authors [?, ?] and, in a weaker form, by Tao [?], who later made
the connection to additive combinatorics. For completeness we give the short proof here.
Proof. Let $X_0$ be a random variable distributed according to µ, let $Z_1, Z_2, \ldots$ be distributed
according to ν, and let all variables be independent. Set $X_n = X_0 + Z_1 + \ldots + Z_n$, so
the distribution of $X_n$ is just $\mu*\nu^{*n}$. Furthermore, since Γ is abelian, given $Z_1 = g$, the
distribution of $X_n$ is the same as the distribution of $X_{n-1} + g$, and hence $H(X_n|Z_1) = H(X_{n-1})$. We now compute:
For the analogous statement for the scale-n entropy of measures on R we use a
discretization argument. For m ∈ N let
$$M_m = \Big\{\frac{k}{2^m} : k\in\mathbb{Z}\Big\}$$
Proof. Let $\pi:\mathbb{R}^k\to\mathbb{R}$ denote the map $(x_1,\ldots,x_k)\mapsto\sum_{i=1}^k x_i$. Then $\mu_1*\ldots*\mu_k = \pi(\mu_1\times\ldots\times\mu_k)$ and $\mu_1^{(m)}*\ldots*\mu_k^{(m)} = \pi\circ\sigma_m^k(\mu_1\times\ldots\times\mu_k)$ (here $\sigma_m^k:(x_1,\ldots,x_k)\mapsto(\sigma_mx_1,\ldots,\sigma_mx_k)$). Now, it is easy to check that
$$|\pi(x_1,\ldots,x_k) - \pi\circ\sigma_m^k(x_1,\ldots,x_k)| = O(k)$$
$$H_n(\mu*(\nu^{*k})) \le H_n(\mu) + k\cdot(H_n(\mu*\nu) - H_n(\mu)) + O\Big(\frac kn\Big). \qquad (20)$$
$$H(\tilde\mu*(\tilde\nu^{*k})) \le H(\tilde\mu) + k\cdot(H(\tilde\mu*\tilde\nu) - H(\tilde\mu)).$$
For n-discrete measures the entropy of the measure coincides with its entropy with
respect to $\mathcal{D}_n$, so dividing this inequality by n gives (20) for $\tilde\mu$, $\tilde\nu$ instead of µ, ν, and
without the error term. The desired inequality follows from Lemma 13.20.
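The discrete inequality $H(\tilde\mu*\tilde\nu^{*k}) \le H(\tilde\mu) + k(H(\tilde\mu*\tilde\nu)-H(\tilde\mu))$ displayed above can be spot-checked on random finitely supported measures on $\Gamma = \mathbb{Z}$. This is a sanity check of the statement, not a proof; the supports, trial counts and helper names are our own choices.

```python
import math
import random
from collections import Counter

def entropy(mu):
    return -sum(q * math.log2(q) for q in mu.values() if q > 0)

def convolve(mu, nu):
    out = Counter()
    for x, px in mu.items():
        for y, py in nu.items():
            out[x + y] += px * py
    return out

def random_measure(support_size, spread, rng):
    w = {rng.randrange(spread): rng.random() for _ in range(support_size)}
    total = sum(w.values())
    return {x: v / total for x, v in w.items()}

rng = random.Random(0)
worst_slack = math.inf
for _ in range(200):
    mu = random_measure(rng.randint(1, 8), 40, rng)
    nu = random_measure(rng.randint(1, 5), 40, rng)
    growth = entropy(convolve(mu, nu)) - entropy(mu)
    conv_k = dict(mu)
    for k in range(1, 6):
        conv_k = convolve(conv_k, nu)                       # mu * nu^{*k}
        slack = entropy(mu) + k * growth - entropy(conv_k)  # right side minus left side
        worst_slack = min(worst_slack, slack)
print(worst_slack)   # never negative, up to rounding
```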
$$H_m(\mu*\nu) \ge H_m(\mu) - O\Big(\frac1m\Big).$$
Proof. This is immediate from the identity $\mu*\nu = \int\mu*\delta_y\,d\nu(y)$, concavity of entropy,
and Lemma ?? (??) (note that $\mu*\delta_y$ is a translate of µ).
14 Appendix
14.1 Integration of measures
Let (X, B), (Y, C) be measurable spaces.
Let µ : Y → P(X, B) be a function mapping y ∈ Y to a measure µy ∈ P(X, B).
We say that µ is measurable if for every A ∈ B,
$$y\mapsto\mu_y(A)$$
is measurable as a function Y → R.
Given a measure ν on (Y, C), we define a function µ : B → [0, ∞] by
$$\mu(A) = \int\mu_y(A)\,d\nu(y) \qquad\text{for } A\in\mathcal{B}$$
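The integral is well defined since $y\mapsto\mu_y(A)$ is measurable and non-negative. This is a measure since
$$\mu(\emptyset) = \int\mu_y(\emptyset)\,d\nu(y) = \int 0\,d\nu(y) = 0,$$
and countable additivity follows from the monotone convergence theorem.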
Examples
1. If $\mu_1,\mu_2,\ldots$ are measures on (X, B) then $\sum\mu_n$ is a measure; it arises as above by
taking Y = N, ν = counting measure, and $n\mapsto\mu_n$.
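3. Let $X = [0,1]^2$ and let $\mu_x$ denote Lebesgue measure $\lambda^1$ on the interval $\{x\}\times[0,1]$
(i.e. the push-forward of Lebesgue measure on [0, 1] to $\mathbb{R}^2$ via $t\mapsto(x,t)$).
Let Y = [0, 1] with Lebesgue measure $\lambda^1$. Then $\mu = \int\mu_x\,d\lambda^1(x)$ is 2-dimensional
Lebesgue measure $\lambda^2$ on X, since for Borel $A\subseteq X$,
$$\begin{aligned}
\lambda^2(A) &= \int\Big(\int 1_A(x,y)\,d\lambda^1(y)\Big)d\lambda^1(x) && \text{by Fubini}\\
&= \int\mu_x(A)\,d\lambda^1(x)\\
&= \mu(A)
\end{aligned}$$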
14.2 The weak-* topology
We defined convergence of measures on symbolic spaces. Below we summarize the
general case.
Definition 14.1. Let X be a compact metric space and P(X) the space of Borel
probability measures on X. The weak-* topology on P(X) is the weakest topology with
respect to which $\mu\mapsto\int f\,d\mu$ is continuous for every f ∈ C(X).
Proposition 14.2. Let X be a compact metric space. Then P(X) is metrizable and
compact in the weak-* topology.
Proof. Using the Stone-Weierstrass theorem, fix a countable dense subset $\{f_i\}_{i=1}^\infty$ of the unit ball in C(X). Define a metric on P(X) by
$$d(\mu,\nu) = \sum_{i=1}^{\infty}2^{-i}\Big|\int f_i\,d\mu - \int f_i\,d\nu\Big|$$
It is easy to check that this is a metric. We must show that the topology induced by
this metric is the weak-* topology.
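If $\mu_n\to\mu$ weak-* then $\int f_i\,d\mu_n - \int f_i\,d\mu\to 0$ as $n\to\infty$ for each i, and since the i-th term of the series is at most $2^{1-i}$, $d(\mu_n,\mu)\to 0$.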
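Conversely, if $d(\mu_n,\mu)\to 0$, then $\int f_i\,d\mu_n\to\int f_i\,d\mu$ for every i and therefore for every
linear combination of the $f_i$'s. Given f ∈ C(X) and ε > 0 there is a linear combination
g of the $f_i$ such that $\|f-g\|_\infty < \varepsilon$. Then
$$\begin{aligned}
\Big|\int f\,d\mu_n - \int f\,d\mu\Big| &\le \Big|\int f\,d\mu_n - \int g\,d\mu_n\Big| + \Big|\int g\,d\mu_n - \int g\,d\mu\Big| + \Big|\int g\,d\mu - \int f\,d\mu\Big|\\
&< \varepsilon + \Big|\int g\,d\mu_n - \int g\,d\mu\Big| + \varepsilon
\end{aligned}$$
and the right-hand side is < 3ε when n is large enough. Hence $\mu_n\to\mu$ weak-*.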
Since the space is metrizable, to prove compactness it is enough to prove sequential
compactness, i.e. that every sequence $\mu_n\in\mathcal{P}(X)$ has a convergent subsequence. Let
$V = \mathrm{span}_{\mathbb{Q}}\{f_i\}$, which is a countable dense $\mathbb{Q}$-linear subspace of C(X). The range of
each g ∈ V is a compact subset of R (since X is compact and g continuous), so for each
g ∈ V the sequence $\int g\,d\mu_n$ is bounded and we can choose a convergent subsequence of it. Using a diagonal argument
we may select a single subsequence $\mu_{n(k)}$ such that $\int g\,d\mu_{n(k)}\to\Lambda(g)$ as $k\to\infty$ for every
g ∈ V. Now, Λ is a $\mathbb{Q}$-linear functional because
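$$\begin{aligned}
\Lambda(af_i + bf_j) &= \lim_{k\to\infty}\int(af_i + bf_j)\,d\mu_{n(k)}\\
&= \lim_{k\to\infty}\Big(a\int f_i\,d\mu_{n(k)} + b\int f_j\,d\mu_{n(k)}\Big)\\
&= a\Lambda(f_i) + b\Lambda(f_j)
\end{aligned}$$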
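Λ is also uniformly continuous because, if $\|f_i - f_j\|_\infty < \varepsilon$ then
$$\begin{aligned}
|\Lambda(f_i - f_j)| &= \Big|\lim_{k\to\infty}\int(f_i - f_j)\,d\mu_{n(k)}\Big|\\
&\le \limsup_{k\to\infty}\int|f_i - f_j|\,d\mu_{n(k)}\\
&\le \varepsilon
\end{aligned}$$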
Thus Λ extends to a continuous linear functional on C(X). Since Λ is positive (i.e. non-negative
on non-negative functions), so is its extension, so by the Riesz representation
theorem there exists µ ∈ P(X) with $\Lambda(f) = \int f\,d\mu$. By definition $\int g\,d\mu - \int g\,d\mu_{n(k)}\to 0$
as $k\to\infty$ for g ∈ V, hence this is true for the $f_i$, so $d(\mu_{n(k)},\mu)\to 0$. Hence $\mu_{n(k)}\to\mu$
weak-*.
(in the measurable case one requires this for measurable bounded functions, say). The
measure πµ is called the push-forward of µ and is sometimes denoted $\pi_*\mu$ or $\pi_\#\mu$.
Since the space of Borel probability measures on X is compact in the weak-* topology,
by passing to a subsequence we can assume $\mu_n\to\mu$. Clearly µ is a probability
measure; we claim πµ = ν. It is enough to show that $\int g\,d(\pi\mu) = \int g\,d\nu$ for every
g ∈ C(Y). Using the identity $\int g\,d\nu_n = \int g\circ\pi\,d\mu_n$ (which is equivalent to $\nu_n = \pi\mu_n$)
we have
$$\int g\,d\nu = \lim\int g\,d\nu_n = \lim\int g\circ\pi\,d\mu_n = \int g\circ\pi\,d\mu = \int g\,d(\pi\mu)$$
as claimed.
$$\mu^*(f) \le \mu^*(\|f\|_\infty) = \|f\|_\infty\cdot\mu^*(1)$$
It is easy to check that s is a seminorm, that $s|_P\equiv 0$ and that $-\mu_0^*(f)\le s(f)$ on V.
Hence by Hahn-Banach we can extend $-\mu_0^*$ to a functional $-\mu^*$ on C(X) satisfying
$-\mu^*\le s$, which for f ∈ P implies $\mu^*(f)\ge -s(f) = 0$, so $\mu^*$ is positive. By the previous
discussion there is a Borel probability measure µ such that $\int f\,d\mu = \mu^*(f)$; for $f = g\circ\pi$
this means that
$$\int g\,d\pi\mu = \int g\circ\pi\,d\mu = \mu^*(g\circ\pi) = \mu_0^*(g\circ\pi) = \nu^*(g) = \int g\,d\nu$$
so µ is the desired measure.