AARMS Fractals
HEESUNG YANG

Date: 18 July 2019.
Abstract. This course will present an introduction to one viewpoint (that of IFS) in
the study of “fractal” objects. Although there is not a universally accepted definition of a
“fractal”, for our purposes it is enough to think about objects which have “similar behaviour”
at finer and finer resolutions (smaller and smaller length scales). An IFS is a convenient
encoding of this “similar behaviour” between scales and lets us (to some extent) both control
this relationship and analyze the structure of the resulting object.
We will discuss both geometric fractals (viewed as subsets of Rd ) and fractals which are
functions or probability distributions. After discussing the construction and basic properties
of fractal sets, we will present various notions of “dimension” and discuss relations between
these notions and ways of computing them. However, the precise list of topics will depend
greatly on the interests and background of the students. As an example, some applications
of IFS fractals in digital imaging could be presented. The aim of the course is to develop
intuition about what it means to be self-similar and introduce techniques of analyzing fractal
objects.
The tools we will use include metric geometry and topology, probability and measure
theory, and some aspects of function spaces. We will certainly take the time to make
sure that all students have a chance to understand, filling in any gaps in the background
knowledge as we go.
Contents
1. Cantor sets
1.1. First method: removing the middle
1.2. Second method: concentrating on the complementary intervals
1.3. Third method: ternary strings
2. Cantor ternary function (“Devil’s staircase” function)
3. Contractions
3.1. Metric between functions
4. Iterated function systems
4.1. Hausdorff distance
4.2. Iterated function system
4.3. Examples of iterated function systems
4.4. Base-b decompositions of R and C
4.5. IFS, fixed points, and attractor
5. The chaos game
6. Code space and address maps on an attractor
7. Background material detour: measure theory
7.1. Infinite product measures
8. IFS with probabilities (IFSP)
8.1. Monge-Kantorovich metric
9. Invariant measure, Markov operator, and the IFSP chaos game
10. Moments of µ
11. Construction of measures
11.1. Method I
11.2. Method II
12. Hausdorff measures
13. Open set condition
14. Box dimensions
1. Cantor sets
We shall talk about three different ways to construct the middle-third Cantor set.

1.1. First method: removing the middle

(1) Start with C_0 := [0, 1].
(2) Remove the open middle third of each interval; for instance, C_1 := [0, 1/3] ∪ [2/3, 1].
(3) Repeat this step. Observe that C_n consists of 2^n closed intervals each of whose length is 3^{-n}.
Observe that C_{n+1} ⊆ C_n, and that C_i is compact for any i (i.e., C_i is closed and bounded for any i).
Definition 1.1. The set C := \bigcap_{n} C_n is the (middle-third) Cantor set.
The total length removed is \sum_{n=0}^{\infty} 2^n · 3^{-(n+1)} = 1, so we see that C is indeed of “length” zero (i.e., C has Lebesgue measure zero).
Writing A_n for the copy of S scaled by 3^{-(n-1)} (so A_2 = S/3, A_3 = S/9, and so on), we have
S = (0 + A_2) ∪ (2/3 + A_2) = (0 + A_3) ∪ (2/9 + A_3) ∪ (2/3 + A_3) ∪ (8/9 + A_3) = ···.
3. Contractions
Definition 3.1. A metric space (X, d) is a set X equipped with a function d : X × X → R (a metric) satisfying the following properties:
• (positive-definite) d(x, y) ≥ 0 for any x, y ∈ X; d(x, y) = 0 if and only if x = y
• (symmetry) d(x, y) = d(y, x) for any x, y ∈ X
• (triangle inequality) d(x, y) ≤ d(x, z) + d(z, y) for any x, y, z ∈ X.
Definition 3.2. Let (X, d) be a metric space, and let {x_n} be a sequence in X. If for any ε > 0 there exists sufficiently large N such that d(x_n, x_m) < ε for all n, m ≥ N, then {x_n} is a Cauchy sequence. (X, d) is complete if every Cauchy sequence in X converges.
Example. The space ((0, 1], | · |) is not complete. Note that the sequence {n^{-1}} is Cauchy: for any ε > 0 one can pick N sufficiently large so that N^{-1} < ε, and then |n^{-1} − m^{-1}| < ε for any n, m ≥ N. However, n^{-1} → 0 as n → ∞, but 0 ∉ (0, 1]. Since we displayed a Cauchy sequence that does not converge in the space, the given space is not complete.
Throughout these lecture notes, unless otherwise specified, we shall assume that (X, d) is a complete metric space.
Definition 3.3. f : (X, d) → (Y, ρ) is a contraction with contraction factor c ∈ [0, 1) if
ρ(f (x), f (y)) ≤ cd(x, y)
for all x, y ∈ X.
Theorem 3.1 (Contraction mapping theorem). Let f : X → X be a contraction with factor c < 1. Then f has a unique fixed point x ∈ X (i.e., f(x) = x). Moreover, for any x_0 ∈ X, the sequence {x_n} defined by x_{n+1} := f(x_n) always converges to x, and we also have d(x, x_n) ≤ c^n d(x_0, x).
Proof. We first prove that f can have at most one fixed point. Suppose that there are two distinct fixed points x and y. Then we have
0 < d(x, y) = d(f(x), f(y)) ≤ c d(x, y) < d(x, y).
Indeed, the first inequality follows since x ≠ y. The equality holds since x and y are fixed points. The second inequality uses the fact that f is a contraction, and the last inequality follows from the fact that c ∈ [0, 1). But it is impossible to have d(x, y) < d(x, y), so f can have at most one fixed point.
To show the existence of a fixed point, we construct a Cauchy sequence and show that its limit is the desired fixed point. Take x_0 ∈ X arbitrary, and define {x_n} by x_{i+1} := f(x_i). Then by the triangle inequality, we have
d(x_n, x_0) ≤ d(x_0, x_1) + d(x_1, x_2) + ··· + d(x_{n−1}, x_n)
= d(x_0, x_1) + d(f(x_0), f(x_1)) + d(f^2(x_0), f^2(x_1)) + ··· + d(f^{n−1}(x_0), f^{n−1}(x_1))
≤ d(x_0, x_1) + c d(x_0, x_1) + c^2 d(x_0, x_1) + ··· + c^{n−1} d(x_0, x_1)
= d(x_0, x_1)(1 + c + ··· + c^{n−1}) ≤ \frac{1}{1−c} d(x_0, x_1).   (1)
Now we will show that {x_n} is in fact Cauchy. For any 1 ≤ n < m, we have
d(x_n, x_m) = d(f(x_{n−1}), f(x_{m−1})) ≤ c d(x_{n−1}, x_{m−1})
≤ c^2 d(x_{n−2}, x_{m−2}) ≤ ··· ≤ c^n d(x_0, x_{m−n}) ≤ \frac{c^n}{1−c} d(x_0, x_1),
with the last inequality following from (1). Hence {xn } is Cauchy, and so xn → x for some
x ∈ X by the completeness of X. Now, observe that lim f (xn ) = f (lim xn ) = f (x) since f is
continuous. Furthermore, lim xn = x, so it follows that f (x) = x as desired. Finally, observe
that
d(x_n, x) = d(f(x_{n−1}), f(x)) ≤ c d(x_{n−1}, x) ≤ ··· ≤ c^n d(x_0, x),
so the last claim regarding the estimate follows.
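To see the theorem numerically, here is a minimal Python sketch (our own illustration; the choice f = cos and the tolerance are assumptions, not part of the notes). Iterating any contraction converges to its unique fixed point, with the error shrinking like c^n.

```python
# Fixed-point iteration for a contraction, following the proof above.
# Illustrative example: f(x) = cos(x) is a contraction on [0, 1]
# (|f'(x)| = |sin(x)| <= sin(1) < 1), so the iteration converges to the
# unique fixed point x with cos(x) = x.
import math

def fixed_point(f, x0, tol=1e-12, max_iter=1000):
    x = x0
    for _ in range(max_iter):
        x_next = f(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    return x

x = fixed_point(math.cos, 0.5)
print(x, math.cos(x) - x)  # ~0.7390851332151607, residual ~0
```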
The next example exhibits a map that is not a contraction (but is “almost” a contraction) and has no fixed point.
Example. Let X = [1, ∞) and f(x) = x + x^{-1}. We claim that |f(x) − f(y)| < |x − y| for all x ≠ y, and that f has no fixed point. Suppose 1 ≤ x < y. Then
|f(x) − f(y)| = |(x − y) + (x^{-1} − y^{-1})| = |x − y| · |1 − (xy)^{-1}| < |x − y|,
since xy > 1 implies 0 < (xy)^{-1} < 1. But one cannot find a constant c < 1 so that |f(x) − f(y)| ≤ c|x − y| holds uniformly on the domain (the ratio |1 − (xy)^{-1}| approaches 1 as x, y → ∞), so f is not a contraction. Furthermore, f has no fixed point; otherwise, we would have x such that f(x) = x + x^{-1} = x, i.e., x^{-1} = 0, which no x ∈ [1, ∞) satisfies.
However, with an additional restriction on X, such an example no longer exists.
Proposition 3.1. If (X, d) is a compact metric space and f : X → X satisfies d(f(x), f(y)) < d(x, y) for all x ≠ y, then f has a unique fixed point.
Proof. Assignment problem.
Theorem 3.2 (Collage theorem). Let f : X → X be a contraction with contraction factor c, and let x be its fixed point. Then for any y ∈ X, we have
d(x, y) ≤ \frac{d(y, f(y))}{1 − c}.
Proof. By the triangle inequality, we have
d(y, x) ≤ d(y, f (y)) + d(f (y), x)
= d(y, f (y)) + d(f (y), f (x))
≤ d(y, f (y)) + cd(y, x).
Thus we have
(1 − c)d(y, x) ≤ d(y, f (y)),
and the claim follows upon dividing both sides by 1 − c.
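The point of the bound is that it controls the distance to a possibly unknown fixed point using only the computable “collage error” d(y, f(y)). A quick numerical sanity check, again a sketch with an illustrative choice of f (our own example):

```python
# Sanity check of the collage bound d(x, y) <= d(y, f(y)) / (1 - c)
# for f = cos on [0, 1], where c = sin(1) < 1 and the fixed point is
# x ~ 0.739085 (computed in the previous sketch).
import math

c = math.sin(1.0)
x_fixed = 0.7390851332151607
for y in [0.0, 0.25, 1.0]:
    lhs = abs(x_fixed - y)
    rhs = abs(y - math.cos(y)) / (1 - c)
    print(f"y={y}: d(x,y)={lhs:.4f} <= bound {rhs:.4f}")
```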
3.1. Metric between functions
Definition 3.4. Suppose f, g : X → X, and we define
d_∞(f, g) := \sup_{x∈X} d(f(x), g(x)),
provided this value is finite.
Proposition 3.2. Let f, g : X → X be contractions with contraction factors c_f and c_g and with fixed points x_f and x_g respectively. Then
d(x_f, x_g) ≤ \frac{1}{1 − c_f} d_∞(f, g).
Proof. By the triangle inequality,
d(x_f, x_g) ≤ d(x_f, f(x_g)) + d(f(x_g), x_g)
= d(f(x_f), f(x_g)) + d(f(x_g), g(x_g))
≤ c_f d(x_f, x_g) + d_∞(f, g).
∴ (1 − c_f) d(x_f, x_g) ≤ d_∞(f, g),
and the claim follows upon dividing both sides by 1 − c_f.
Corollary 3.1. Let f_n : X → X be a sequence of contractions with contraction factor c_n and fixed point x_n for each f_n. Suppose that c_n ≤ c < 1 and f_n → f in the sense that d_∞(f_n, f) → 0 as n → ∞. Then x_n → x, where x is the fixed point of f.
4. Iterated function systems
Let C be the Cantor set, and let C_L be the left part of C and C_R the right part of C. Note that C_L = (1/3)C and C_R = (1/3)C + 2/3 (i.e., C_L is just C contracted by a factor of 1/3; the same with C_R, but with a translation by 2/3). Thus if we define w_0(x) := x/3 and w_1(x) := (x + 2)/3, then C_L = w_0(C) and C_R = w_1(C).
Note that starting with A_0 = [0, 1] and using w_0 and w_1 to shrink the previous iterate, we get a sequence of sets A_n, and infinitely repeating this iteration gives us the Cantor set. However, we still need to determine in what sense A_n converges to C.
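As a quick illustration (a sketch of ours, not part of the original notes), one can iterate the induced set map W(B) = w_0(B) ∪ w_1(B) on a finite sample of [0, 1] and watch the iterates settle onto C:

```python
# Iterate the induced set map W(B) = w0(B) ∪ w1(B) for the Cantor IFS,
# starting from a finite sample of A_0 = [0, 1]. After n steps, every
# point of the iterate lies within 3^(-n) of the Cantor set C.
def w0(x): return x / 3
def w1(x): return (x + 2) / 3

B = [i / 10 for i in range(11)]  # crude sample of [0, 1]
for n in range(5):
    B = sorted({w0(x) for x in B} | {w1(x) for x in B})
print(len(B), B[:6])  # 11 * 2^5 = 352 points clustered near C
```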
4.1. Hausdorff distance
Definition 4.1. Given a complete metric space X, we define
H(X) := {A ⊆ X : A is non-empty and compact}.
Furthermore, for any A, B ∈ H(X), we define
d_H(A, B) := \max\{ \sup_{a∈A} \inf_{b∈B} d(a, b), \sup_{b∈B} \inf_{a∈A} d(a, b) \}.
Note that \sup_{a∈A} \inf_{b∈B} d(a, b) denotes the “farthest” of the “closest” distances between a point of A and the set B.
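For finite (hence compact) sets, the suprema and infima in the definition are attained, so d_H can be computed directly from the formula. A small sketch (the function names are ours):

```python
# Hausdorff distance between two finite subsets of a metric space,
# computed straight from the definition above.
def hausdorff(A, B, d):
    forward = max(min(d(a, b) for b in B) for a in A)   # sup_A inf_B
    backward = max(min(d(a, b) for a in A) for b in B)  # sup_B inf_A
    return max(forward, backward)

dist = lambda a, b: abs(a - b)
print(hausdorff([0.0, 1.0], [0.0, 0.5], dist))  # 0.5
```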
Theorem 4.1. Let (X, d) be a complete metric space. Then (H(X), dH ) is also complete.
Proof. Suppose that {A_n} is a Cauchy sequence in (H(X), d_H). Let
A := \bigcap_{n≥1} \overline{\bigcup_{i≥n} A_i}.
Recall that we need to take the closure to ensure that A is a compact set.¹

¹We can view A as the “lim sup” of the sets A_n, since the union can be viewed as “sup” and the intersection as “inf”. Compare with the usual definition of the lim sup of a sequence {a_n} of reals:
lim sup_{n→∞} a_n = lim_{n→∞} sup_{i≥n} a_i = inf_{n≥1} sup_{i≥n} a_i.
First we need to prove that A ∈ H(X). Since X is complete, any subset of X is compact if and only if it is closed and totally bounded. Write
B_n := \overline{\bigcup_{i≥n} A_i}.
By definition B_n is closed; since an arbitrary intersection of closed sets is closed, it follows that A = \bigcap_{n≥1} B_n is also closed. Also, note that
\bigcup_{i≥n+1} A_i ⊆ \bigcup_{i≥n} A_i,
so
B_{n+1} = \overline{\bigcup_{i≥n+1} A_i} ⊆ \overline{\bigcup_{i≥n} A_i} = B_n.
Thus as long as B_1 is compact, so is every B_n. Pick some ε > 0. Since {A_n} is Cauchy, there is some m so that for all n > m we have d_H(A_n, A_m) < ε/2, or equivalently A_n ⊆ (A_m)_{ε/2}. Therefore B_m ⊆ \overline{(A_m)_{ε/2}}. Indeed, since A_m is compact hence totally bounded, there are finitely many balls of radius ε/2 so that we have
A_m ⊆ \bigcup_{i=1}^{k} B_{ε/2}(x_i),
Definition 4.5. Given an IFS {w_i : 1 ≤ i ≤ N} on X, the induced map W : H(X) → H(X) is given by
W(B) = \bigcup_{i=1}^{N} w_i(B).
If each w_i is a contraction with factor c_i, then W is a contraction on (H(X), d_H):
d_H(W(A), W(B)) = d_H\left( \bigcup_{i=1}^{N} w_i(A), \bigcup_{i=1}^{N} w_i(B) \right)
≤ \max_{1≤i≤N} d_H(w_i(A), w_i(B))
≤ \left( \max_{1≤i≤N} c_i \right) d_H(A, B).
4.3. Examples of iterated function systems

Example. Consider the three affine maps
w_0(x, y) = (x/2, y/2),
w_1(x, y) = (x/2 + 1/2, y/2),
w_2(x, y) = (x/2, y/2 + 1/2).
Suppose that the starting point of the iteration is an equilateral triangle. Then infinite iteration of {w_0, w_1, w_2} gives us the Sierpiński triangle.
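Anticipating the chaos game of Section 5, the following sketch (an illustration of ours; the starting point and iteration count are arbitrary choices) samples the Sierpiński triangle by applying a randomly chosen w_i at each step:

```python
# Chaos game for the Sierpinski IFS {w0, w1, w2} above: repeatedly apply
# a uniformly random map to the current point. The orbit rapidly
# approaches the attractor, so all but the first few iterates
# effectively sample the Sierpinski triangle.
import random

maps = [
    lambda x, y: (x / 2,       y / 2),        # w0
    lambda x, y: (x / 2 + 0.5, y / 2),        # w1
    lambda x, y: (x / 2,       y / 2 + 0.5),  # w2
]

point = (0.3, 0.4)  # arbitrary starting point
orbit = []
for n in range(10_000):
    point = random.choice(maps)(*point)
    if n >= 10:  # discard a short transient
        orbit.append(point)
print(len(orbit), orbit[:3])
```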
Example. The following finite collection of self-maps {w_0, w_1, ..., w_6} is defined by
w_0(x, y) = (x/2 − (√3/6) y, (√3/6) x + y/2),
w_1(x, y) = (x/3 + 1/√3, y/3 + 1/3),
w_2(x, y) = (x/3, y/3 + 2/3),
w_3(x, y) = (x/3 − 1/√3, y/3 + 1/3),
w_4(x, y) = (x/3 − 1/√3, y/3 − 1/3),
w_5(x, y) = (x/3, y/3 − 2/3),
w_6(x, y) = (x/3 + 1/√3, y/3 − 1/3).
4.4. Base-b decompositions of R and C

Our assumption of “(almost) unique representation” implies that the “parts” of the IFS decomposition are measure-disjoint. Thus,
λ(T) = λ\left( \bigcup_{e∈D} (b^{-1} e + b^{-1} T) \right) = \sum_{e∈D} \frac{1}{|b|^2} λ(T) = \frac{N}{|b|^2} λ(T),
so we need N = |b|^2.
Recall that for a base-b representation to be well-defined, we need to pick the digit set carefully. One necessary condition is that the digit set must be chosen so that it is a complete set of coset representatives for Z/bZ.
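For an integer base b acting on R, this necessary condition is easy to test; a small sketch (ours, covering only real integer bases):

```python
# Check that a digit set D is a complete set of coset representatives
# modulo an integer base b, i.e. {d mod |b| : d in D} = {0, ..., |b|-1}.
def is_complete_residue_system(b, D):
    return sorted(d % abs(b) for d in D) == list(range(abs(b)))

print(is_complete_residue_system(3, [0, 1, 2]))  # True (standard digits)
print(is_complete_residue_system(3, [0, 1, 5]))  # True (5 = 2 mod 3)
print(is_complete_residue_system(3, [0, 1, 4]))  # False (4 = 1 mod 3)
```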
4.5. IFS, fixed points, and attractor
Take {w1 , . . . , wN } an IFS with attractor A. Then each fixed point xi of wi is in A. To
see why, start with the set S0 = {x1 }. Then S1 = W (S0 ) = {w1 (x1 ), w2 (x1 ), . . . , wN (x1 )} =
{x1 , w2 (x1 ), . . . , wN (x1 )}. We can continue on, i.e.,
S_2 = W(S_1) = {x_1, w_2(x_1), ..., w_N(x_1), w_1(w_2(x_1)), w_2(w_2(x_1)), ..., w_N(w_N(x_1)), ...}.
Therefore x_1 ∈ S_n for all n. In other words, we have S_n → A in d_H, where
A := \bigcap_{n≥1} \overline{\bigcup_{i≥n} S_i}.
It follows that x_1 ∈ A as well. In fact, for any i_1, ..., i_k ∈ {1, 2, ..., N}, we have w_{i_1} ∘ w_{i_2} ∘ ··· ∘ w_{i_k}(x_j) ∈ A. In other words, any finite-level image of any fixed point x_j of w_j is in the attractor. Note that repeated applications of W to the singleton set of a fixed point grow the size of the set, but the initial fixed point persists in each iteration, as we just saw.
In the previous section, we presented the IFS for the Sierpiński triangle, whose maps have fixed points
D := { d_0 = (0, 0), d_1 = (1, 0), d_2 = (0, 1) }.
For A := \begin{pmatrix} 1/2 & 0 \\ 0 & 1/2 \end{pmatrix}, we have
S_0 = {A d_0, A d_1, A d_2} = { (0, 0), (1/2, 0), (0, 1/2) },
S_1 = { A(A d_i) + A d_j : d_i, d_j ∈ D } = { A^2 b_2 + A b_1 : b_1, b_2 ∈ D },
S_2 = { A^3 b_3 + A^2 b_2 + A b_1 : b_1, b_2, b_3 ∈ D }.
Thus in general, we have S_n = { \sum_{i=1}^{n} A^i b_i : b_i ∈ D }, so as n → ∞ we have
S = { \sum_{i=1}^{∞} A^i b_i : b_i ∈ D }.
6. Code space and address maps on an attractor

Definition 6.1. The address map ω : Σ := {1, 2, ..., N}^ℕ → A is given by
ω(σ) := \lim_{n→∞} w_{σ_1} ∘ w_{σ_2} ∘ ··· ∘ w_{σ_n}(x_0)
for some starting point x_0 (the limit does not depend on x_0).

Definition 6.2. Let {w_1, ..., w_N} be an IFS on (X, d), and let ω : Σ → X where Σ = {1, 2, ..., N}^ℕ. Σ is called the code space.
Now we explore a few properties of address maps.
Proposition 6.1. Let {w_1, ..., w_N} be an IFS on (X, d), and let ω : Σ → X be an address map onto the attractor A. Then the following are true.
• The range of ω is A.
• If we equip Σ with the product topology arising from the discrete topology on each factor, then ω is continuous.
• Under the aforementioned product topology, Σ is compact.
• Since Σ is compact and ω is continuous, it follows that ω is uniformly continuous.
We will use the Sierpiński triangle as an example to examine the behaviour of address
maps.
Example. Let x_0 = (0, 0), and let the IFS be as given in Section 4.3. Randomly choose i_1 ∈ {0, 1, 2}, and let x_1 = w_{i_1}(x_0). Suppose that we pick i_1 = 2. Then (0, 0) ↦ (0, 1/2). Note that the address of x_0 is 0000000...; similarly, the address of (0, 1) is 222222..., and the address of (1, 0) is 111111.... In general, the fixed point of w_0 (resp. w_1, w_2) has address 0000... (resp. 1111..., 2222...). So the address of x_1 is 20000.... Suppose we pick i_2 = 2 and i_3 = 1. Then x_3 = w_1(w_2(w_2((0, 0)))) ∼ 1220000.... Furthermore, note that the string that the address map produces indicates which region x_n belongs to, justifying the name “address” map.
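A finite truncation of the address map is easy to compute; the following sketch (ours) applies w_{σ_1} ∘ ··· ∘ w_{σ_n} to a base point and reproduces the addresses above:

```python
# Approximate address map for the Sierpinski IFS: apply
# w_{s1} o w_{s2} o ... o w_{sn} to a base point x0. Since each map
# contracts by 1/2, the result is within 2^(-n) diam(X) of the true
# attractor point with address s1 s2 s3 ...
maps = {
    "0": lambda x, y: (x / 2,       y / 2),
    "1": lambda x, y: (x / 2 + 0.5, y / 2),
    "2": lambda x, y: (x / 2,       y / 2 + 0.5),
}

def point_of_address(address, x0=(0.0, 0.0)):
    p = x0
    for symbol in reversed(address):  # innermost map is applied first
        p = maps[symbol](*p)
    return p

print(point_of_address("000000"))  # (0, 0), fixed point of w0
print(point_of_address("222222"))  # ~(0, 1), fixed point of w2
print(point_of_address("122000"))  # a point in the w1-region of A
```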
7. Background material detour: measure theory
Definition 7.1. A collection A of subsets of X is a σ-algebra if the following are true:
(1) ∅, X ∈ A;
(2) if A ∈ A, then X \ A ∈ A;
(3) if A_n ∈ A for each n ∈ ℕ, then \bigcup_n A_n ∈ A.
Definition 7.2. Suppose that X is a set, and A a σ-algebra over X. Then a measure is a function m : A → R ∪ {∞} satisfying the following conditions:
(1) m(∅) = 0;
(2) (non-negativity) m(A) ≥ 0 for any A ∈ A;
(3) (countable additivity) m\left( \bigcup_{i=1}^{∞} A_i \right) = \sum_{i=1}^{∞} m(A_i) if the A_i are pairwise disjoint.
The canonical measure is the Lebesgue measure λ, which “measures” the length of a set and is defined by λ([a, b]) := b − a. If S = \bigcup_{i=1}^{N} (a_i, b_i) with b_i < a_{i+1}, then λ(S) = \sum_{i=1}^{N} (b_i − a_i). The Lebesgue outer measure of a set A is defined by
λ^*(A) = \inf\left\{ \sum_{i=1}^{∞} (b_i − a_i) : A ⊆ \bigcup_{i=1}^{∞} (a_i, b_i) \right\}.
Proposition 7.1 (Properties of the Lebesgue measure). Let λ be the Lebesgue measure,
and suppose that A and the Ai are Lebesgue measurable sets. Then on top of the properties
satisfied by any measure, the following additional properties hold.
(1) (translation invariant) λ(A + t) = λ(A)
(2) (scaling) λ(tA) = |t|λ(A).
Remark. Note that we need to restrict our attention to measurable sets only for λ to have
all the desirable properties listed above.
Proposition 7.2. The set of Lebesgue measurable sets is a σ-algebra.
Measures can be used to comment on the probability of an event as well, as we will see in
the next definition.
Definition 7.3. We say µ is a probability measure on X if µ is a measure such that µ(X) = 1.
Definition 7.4. We say that a property holds almost everywhere (or almost surely, or with probability 1) if the set S on which it holds satisfies µ(X \ S) = 0.
7.1. Infinite product measures
Suppose that we are only interested in product probability measures on Σ = {1, 2, ..., N}^ℕ. In this case, given probabilities p_1, p_2, ..., p_N (i.e., p_1 + p_2 + ··· + p_N = 1 and p_i ≥ 0), this gives a probability measure on {1, 2, ..., N} given by Prob(i) = p_i.
The probability which is “induced” on Σ is that of an independent sequence of trials
of the repeated experiment: that is, draw σi ∈ {1, 2, . . . , N } according to the probability
distribution {pi }.
Definition 7.5. The cylinder sets are the sets obtained by specifying the outcomes of finitely many coordinates of σ ∈ Σ; these sets generate the σ-algebra on Σ.
Remark. For a general cylinder set, we are allowed to specify only finitely many outcomes; the rest must remain free. Otherwise, we would always get probability 0. The probability of any other (allowable) event is then determined by the probabilities of the cylinder sets.
Example. Let
S := {σ ∈ Σ : σ1 = 2, σ3 = 5, σ100 = 1, σ1000 = 3}
be a cylinder set. Then Prob(S) = p2 · p5 · p1 · p3 .
8. IFS with probabilities (IFSP)
Let (X, d) be a complete metric space, and let {w_1, ..., w_N} be an IFS such that each w_i has probability p_i of being chosen at each stage (p_i ≥ 0, \sum_i p_i = 1), rather than assuming the uniform probability 1/N for each map. If we use the p_i in the chaos game, we will get a sequence of occupation distributions on A which depends on {p_i}. Interesting questions then arise: do they converge (and in what sense)? And if so, to what?
Definition 8.1. Let (X, A) be a measurable space with measure µ, let (Y, A′) be a measurable space, and let f : X → Y be a (measurable) function. In this case, one checks that
{B ⊆ Y : f^{-1}(B) ∈ A}
is a σ-algebra on Y. The push-forward of µ is f_#(µ) : A′ → R_+ ∪ {∞} defined by
f_#(µ)(B) := µ(f^{-1}(B))
for all B ∈ A′.
One example of a push-forward is the following operator.
Definition 8.2. Let (X, d) be a complete metric space, and let µ be a (Borel) probability measure on X (i.e., µ(X) = 1). Then M defined by
(Mµ)(B) := \sum_{i=1}^{N} p_i (w_i)_#(µ)(B) = \sum_{i=1}^{N} p_i µ(w_i^{-1}(B))
is a Markov operator.
Our IFS mapping on probability measures is precisely the Markov operator M : Prob → Prob defined by
(Mµ)(B) = \sum_{i=1}^{N} p_i (µ ∘ w_i^{-1})(B),
acting on the space of probability measures on X. Now, we need a metric of some kind to get a contraction.
8.1. Monge-Kantorovich metric

For probability measures µ and ν on X, the Monge-Kantorovich metric is
d_{MK}(µ, ν) := \sup\left\{ \int_X f \, dµ − \int_X f \, dν : f ∈ Lip_1(X) \right\},
where Lip_1(X) := { f : X → R : |f(x) − f(y)| ≤ d(x, y) for all x, y ∈ X }.
Remark. If X is not compact, then there are some additional technical conditions on µ and ν.
Since the proof that d_{MK} is a metric requires techniques and results from functional analysis, we shall take for granted that d_{MK} is a metric. This metric gives weak convergence of probability measures.
Theorem 8.1. If {w_1, ..., w_N} is an IFS on X with maximum contraction factor c < 1, and p_i ≥ 0 are associated probabilities, then
d_{MK}(Mµ, Mν) ≤ c · d_{MK}(µ, ν).
Proof. For f ∈ Lip_1(X),
\int_X f(x) \, d(Mµ − Mν)(x) = \int_X f(x) \, d\left[ \sum_{i=1}^{N} p_i µ∘w_i^{-1} − \sum_{i=1}^{N} p_i ν∘w_i^{-1} \right](x)
= \sum_{i=1}^{N} p_i \int_{w_i(X)} f(x) \, d(µ∘w_i^{-1} − ν∘w_i^{-1})(x)
\overset{*}{=} \sum_{i=1}^{N} p_i \int_X f(w_i(y)) \, d(µ − ν)(y)
= \int_X \left[ \sum_{i=1}^{N} p_i (f ∘ w_i)(y) \right] d(µ − ν)(y) =: \int_X \hat{f}(y) \, d(µ − ν)(y).
(\overset{*}{=} follows from the change of variables x = w_i(y), i.e., y = w_i^{-1}(x).) We see that \hat{f} is Lipschitz with factor \sum_i p_i c_i. Indeed, we have
|\hat{f}(a) − \hat{f}(b)| = \left| \sum_i p_i f(w_i(a)) − p_i f(w_i(b)) \right|
≤ \sum_{i=1}^{N} p_i |f(w_i(a)) − f(w_i(b))|   (∵ f ∈ Lip_1(X))
≤ \sum_{i=1}^{N} p_i d(w_i(a), w_i(b)) ≤ \left[ \sum_{i=1}^{N} p_i c_i \right] d(a, b).
Now define
\hat{g} := \hat{f} \Big/ \sum_{i=1}^{N} p_i c_i.
9. Invariant measure, Markov operator, and the IFSP chaos game

where µ is the invariant measure of the IFSP; i.e., there exists a set Ω ⊆ {1, 2, ..., N}^ℕ with P(Ω) = 1 so that for all σ ∈ Ω we have
\lim_{n→∞} \frac{1}{n} \sum_{k=1}^{n} f\big( w_{σ_k}(w_{σ_{k−1}}(··· (w_{σ_1}(x_0)) ···)) \big) = \int_X f(x) \, dµ(x).
Proof (sketch). The Markov operator has a unique invariant measure µ (i.e., Mµ = µ), and
\sum_i p_i µ ∘ w_i^{-1} = µ.
So
\int_X f(x) \, d(Mµ)(x) = \int_X f(x) \, dµ(x),
i.e.,
\sum_i p_i \int_{w_i(X)} f(x) \, d(µ∘w_i^{-1})(x) = \int_X f(x) \, dµ(x),
i.e.,
\sum_i p_i \int_X f(w_i(y)) \, dµ(y) = \int_X f(x) \, dµ(x).
Definition 9.2. Let P(X) be the set of Borel probability measures on X, and M : P(X) → P(X) the Markov operator. Then the adjoint operator M^* : C(X) → C(X) is defined by
M^*(f) := \sum_{i=1}^{N} p_i f ∘ w_i.
Suppose that
\lim_{n→∞} \frac{1}{n} \sum_{i=1}^{n} f(x_i)
exists for every f ∈ C(X). Then the map on C(X) given by
f ↦ \lim_{n→∞} \frac{1}{n} \sum_{i=1}^{n} f(x_i)
is a bounded linear functional, so by the Riesz representation theorem it is given by a regular Borel measure ν, i.e.,
\lim_{n→∞} \frac{1}{n} \sum_{i=1}^{n} f(x_i) = \int_X f(t) \, dν(t).
Definition 9.3. Let ℓ^∞(ℕ) = { {x_n} : \sup_n |x_n| < ∞ }. Then a Banach limit π : ℓ^∞(ℕ) → R is a bounded linear functional that is shift-invariant, i.e., π(x_1, x_2, ...) = π(x_2, x_3, ...).
Remark. If xn → x, then π(xn ) = x. Clearly, we also have lim inf xn ≤ π(xn ) ≤ lim sup xn .
Furthermore, if π(xn ) = x for all Banach limits, then lim xn = x exists.
For some fixed Banach limit π, define
π(f) := π\left( \left\{ \frac{1}{n} \sum_{i=1}^{n} f(x_i) \right\}_n \right),
where x_i = w_{σ_i} ∘ w_{σ_{i−1}} ∘ ··· ∘ w_{σ_2} ∘ w_{σ_1}(x_0). Since f ↦ π(f) is a bounded linear functional on C(X), there is a measure ν_π such that
π\left( \left\{ \frac{1}{n} \sum_{i=1}^{n} f(x_i) \right\}_n \right) = \int_X f(t) \, dν_π(t).
For σ = σ_1 σ_2 ... σ_n ..., let
φ_n(σ, x) := (w_{σ_1} ∘ w_{σ_2} ∘ ··· ∘ w_{σ_n})(x).
If ω is the address map, then indeed φ_n → ω as n → ∞; furthermore, we have P_n → P. So if ν is the starting probability measure, and µ the invariant measure of the given IFSP (i.e., Mµ = µ), then we have
\int_X [((M^*)^n)(f)](x) \, dν(x) = \int_X f(x) \, d(M^n ν)(x) → \int_X f(x) \, dµ(x).
Note we can re-write (M^*)^n(f) as follows:
(M^*)^n(f)(x) = \sum_{i_1,...,i_n=1}^{N} p_{i_1} ··· p_{i_n} f\big( (w_{i_1} ∘ ··· ∘ w_{i_n})(x) \big) = \int_{Σ} f(φ_n(σ, x)) \, dP_n(σ),
so
\int_X (M^*)^n(f)(x) \, dν(x) = \int_X \int_Σ f(φ_n(σ, x)) \, dP_n(σ) \, dν(x)
→ \int_X \int_Σ (f ∘ ω)(σ) \, dP(σ) \, dν(x) = \int_Σ (f ∘ ω)(σ) \, dP(σ).
For each i ∈ {1, ..., N}, define s_i : Σ → Σ by (σ_1, σ_2, ...) ↦ (i, σ_1, σ_2, ...). Then the corresponding diagram commutes, with P = \sum_i p_i P ∘ s_i^{-1} and µ = \sum_i p_i µ ∘ w_i^{-1} (the invariant measures).
Theorem 9.3. For µ-almost all x_0 ∈ X and P-almost all σ ∈ Σ, for any f ∈ L^1(µ) we have
\frac{1}{n} \sum_{i=1}^{n} f(x_i) → \int_X f(x) \, dµ(x).

Corollary 9.1. \frac{\#\{i ≤ n : x_i ∈ A\}}{n} → µ(A).
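Corollary 9.1 suggests a direct way to approximate µ(A): run the IFSP chaos game and record occupation frequencies. A sketch (our illustration, using the Cantor IFS with unequal weights; the specific p_i and iteration count are arbitrary choices):

```python
# IFSP chaos game on the Cantor IFS w0(x) = x/3, w1(x) = (x+2)/3 with
# probabilities p0, p1. By Corollary 9.1 the fraction of iterates
# landing in a set A approximates mu(A) for the invariant measure mu.
# Here A = [0, 1/3] = w0([0, 1]), whose mu-measure should be p0.
import random

p0, p1 = 0.3, 0.7
x, hits, n_steps = 0.5, 0, 200_000
for n in range(n_steps):
    if random.random() < p0:
        x = x / 3
    else:
        x = (x + 2) / 3
    if x <= 1 / 3:
        hits += 1
print(hits / n_steps)  # ~0.3 = p0 = mu([0, 1/3])
```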
10. Moments of µ
Let µ be the invariant measure (i.e., Mµ = µ), so that
\int_X f(x) \, dµ(x) = \int_X f(x) \, d(Mµ)(x) = \int_X (M^*f)(x) \, dµ(x) = \sum_{i=1}^{N} p_i \int_X f(w_i(x)) \, dµ(x).
Definition 10.1. Take w_i : R → R of the form w_i(x) = s_i x + b_i, and f(x) = x^n. Then the nth moment of µ is \int_R x^n \, dµ(x).
Compute the nth moments of µ:
\int_R x^n \, dµ(x) = \sum_i p_i \int_R (s_i x + b_i)^n \, dµ(x)
= \sum_{i=1}^{N} p_i \int_R \sum_{j=0}^{n} \binom{n}{j} s_i^j x^j b_i^{n−j} \, dµ(x)
= \sum_{i=1}^{N} p_i \sum_{j=0}^{n} \binom{n}{j} s_i^j b_i^{n−j} \int_R x^j \, dµ(x)
= \sum_{j=0}^{n−1} \binom{n}{j} \left[ \sum_{i=1}^{N} p_i s_i^j b_i^{n−j} \right] \int_R x^j \, dµ(x) + \left[ \sum_{i=1}^{N} p_i s_i^n \right] \int_R x^n \, dµ(x).
Hence,
\int_R x^n \, dµ(x) = \frac{ \sum_{j=0}^{n−1} \binom{n}{j} \sum_{i=1}^{N} p_i s_i^j b_i^{n−j} \int_R x^j \, dµ(x) }{ 1 − \sum_{i=1}^{N} p_i s_i^n },
provided \sum_{i=1}^{N} p_i |s_i| < 1. This recursive formula for the nth moment starts with \int_R x^0 \, dµ(x) = 1.
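The recursion is straightforward to implement. A sketch (ours) for an affine IFSP on R; the Cantor IFSP below is just a test case:

```python
# Moments of the invariant measure of an affine IFSP on R with maps
# w_i(x) = s_i x + b_i and probabilities p_i, via the recursion above.
from math import comb

def moments(p, s, b, n_max):
    g = [1.0]  # g[0] = integral of x^0 dmu = 1
    for n in range(1, n_max + 1):
        num = sum(
            comb(n, j) * sum(p[i] * s[i]**j * b[i]**(n - j)
                             for i in range(len(p))) * g[j]
            for j in range(n)
        )
        den = 1.0 - sum(p[i] * s[i]**n for i in range(len(p)))
        g.append(num / den)
    return g

# Cantor IFSP with p0 = p1 = 1/2: w0(x) = x/3, w1(x) = x/3 + 2/3.
print(moments([0.5, 0.5], [1/3, 1/3], [0.0, 2/3], 2))
# g1 = 1/2 (mean) and g2 = 3/8 for the Cantor measure.
```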
11. Construction of measures
We present two methods of constructing measures.
11.1. Method I
Definition 11.1. For a space X and a class of subsets C, a pre-measure τ : C → R ∪ {+∞}
is a function such that
(1) ∅ ∈ C and τ (∅) = 0.
(2) 0 ≤ τ (A) ≤ +∞ for all A ∈ C
Example. Let C be the set of all intervals in R, and let τ (A) be the “length” of an interval
A. τ is clearly a pre-measure.
Based on this pre-measure τ, we construct or define µ by
µ(A) = \inf\left\{ \sum_{i=1}^{∞} τ(B_i) : B_i ∈ C, A ⊆ \bigcup_{i=1}^{∞} B_i \right\},
with the convention that inf(∅) = +∞. It is relatively straightforward to verify that µ gives a σ-additive measure when restricted to the µ-measurable sets.
11.2. Method II
On a metric space (X, d), start with some pre-measure τ. For any δ > 0, define
µ_δ(A) = \inf\left\{ \sum_{i=1}^{∞} τ(B_i) : B_i ∈ C, A ⊆ \bigcup_{i=1}^{∞} B_i, diam(B_i) < δ \right\},
and set µ(A) := \lim_{δ→0} µ_δ(A), which exists since µ_δ(A) is increasing as δ approaches 0. Then every Borel set is µ-measurable. If µ is finite, then
µ(A) = \sup\{µ(C) : C ⊆ A, C \text{ closed}\} = \inf\{µ(U) : A ⊆ U, U \text{ open}\}.
One can also construct µ(A) by taking the supremum over compact sets, provided X satisfies
additional conditions, which prompts the following definition.
Definition 11.3. A space X is said to be separable if X contains a countable, dense subset
(i.e., there exists a sequence {xn } of elements of the space such that every non-empty open
subset of the space contains at least one element of {xn }).
Proposition 11.1. If X is complete and separable, and µ is finite, then
µ(A) = \sup\{ µ(K) : K ⊆ A, K \text{ compact} \},
and µ is Borel regular.
12. Hausdorff measures

For s ≥ 0 and δ > 0, set
H^s_δ(A) := \inf\left\{ \sum_i diam(B_i)^s : \{B_i\} \text{ is a } δ\text{-covering of } A \right\},
and H^s(A) := \lim_{δ→0} H^s_δ(A). The Hausdorff dimension \dim_H(A) is the critical value of s at which H^s(A) drops from +∞ to 0. If we replace each covering set B_i by a set \hat{B}_i with
diam(B_i) ≤ diam(\hat{B}_i) ≤ k · diam(B_i)
for some uniform bound k, this changes the value we obtain for H^s (but not the dimension).
Proposition 12.2 (Properties of the Hausdorff dimension). Let A_i, A, and B all be Hausdorff measurable.
(1) If A ⊆ B, then \dim_H(A) ≤ \dim_H(B).
(2) (“countable stability”) If A = \bigcup_i A_i, then \dim_H(A) = \sup_i \dim_H(A_i).
(3) If t > \sup_i \dim_H(A_i), then H^t(A_i) = 0 for all i. Hence H^t\left( \bigcup_{i≥1} A_i \right) = 0. Thus, \dim_H(A) ≤ t.
(4) On the other hand, if t < \sup_i \dim_H(A_i), then there exists i so that t < \dim_H(A_i). In this case, H^t(A_i) = +∞, so H^t\left( \bigcup A_i \right) = +∞ as well.
(5) If the d-dimensional Lebesgue measure of A is positive, then \dim_H(A) ≥ d, since H^d ≃ λ_d up to a constant.
(6) If A is countable, then \dim_H(A) = 0.
Example (Computing \dim_H(C)). We start with the upper bound: we exhibit one sequence of coverings which gives finite values for \sum_i diam(B_i)^s. Take δ > 0 and n large enough so that 3^{-n} < δ. Then the 2^n intervals from stage n of the construction form a δ-covering of C. Thus
\sum_{i=1}^{2^n} diam(B_i)^s = 2^n · 3^{-ns} = (2/3^s)^n.
If s = \log 2 / \log 3, then (2/3^s)^n = 1, so H^s_δ(C) = \inf\{ \sum diam(B_i)^s : \{B_i\} \text{ is a } δ\text{-covering} \} ≤ 1. Thus H^s(C) ≤ 1, so \dim_H(C) ≤ \log 2 / \log 3.
Computing the lower bound is trickier, and will need the mass distribution principle, which is formally stated as Theorem 12.1 below. First, we need a measure µ on C. We use the invariant measure µ of the IFSP with p_0 = p_1 = 1/2. So each nth level “part” gets mass 2^{-n} and length 3^{-n}. We want to show that µ(U) ≤ c · diam(U)^s for all sufficiently small U. Take diam(U) < 1 and let k be such that 3^{-k-1} ≤ diam(U) ≤ 3^{-k}. Then U intersects at most one interval of level k. So it follows that
µ(U) ≤ 2^{-k} = (3^{-k})^{\log 2/\log 3} ≤ (3 · diam(U))^{\log 2/\log 3}.
Thus by Theorem 12.1, we have \dim_H(C) ≥ \log 2/\log 3 (and H^{\log 2/\log 3}(C) ≥ 3^{-\log 2/\log 3}). Hence \dim_H(C) = \log 2/\log 3.
Theorem 12.1 (Mass distribution principle). Let µ be a finite, positive Borel measure on A and suppose that there exist c > 0 and δ > 0 such that for some s we have
µ(U) ≤ c · diam(U)^s
for all U with diam(U) ≤ δ. Then
(1) H^s(A) ≥ µ(A)/c, and
(2) s ≤ \dim_H(A).

Proof. If {B_i} is a δ-cover of A, then
0 < µ(A) ≤ µ\left( \bigcup_{i=1}^{∞} B_i \right) ≤ \sum_{i=1}^{∞} µ(B_i) ≤ c \sum_{i=1}^{∞} diam(B_i)^s.
Taking the infimum over all δ-covers gives µ(A) ≤ c · H^s_δ(A), and letting δ → 0 yields (1); (2) follows immediately.
14. Box dimensions

For a bounded set A, let N_δ(A) denote the smallest number of sets of diameter at most δ needed to cover A, and consider
D = \lim_{δ→0} \frac{\log N_δ(A)}{\log(δ^{-1})}.

Definition 14.1. The D defined above is the box dimension of A, provided the limit exists, in which case we write \dim_B(A).

In general, the limit doesn’t exist, so we have to define the upper and lower box dimensions.
Definition 14.2. For any A, the upper box dimension \overline{\dim}_B(A) and lower box dimension \underline{\dim}_B(A) are defined as follows:
\overline{\dim}_B(A) = \limsup_{δ→0} \frac{\log N_δ(A)}{\log(δ^{-1})},
\underline{\dim}_B(A) = \liminf_{δ→0} \frac{\log N_δ(A)}{\log(δ^{-1})}.
Example. For a line segment of length L (call it L, abusing the notation), we have
N_δ(L) ≤ \frac{L}{2δ} + 1,
so
\frac{\log N_δ(L)}{\log(δ^{-1})} ∼ \frac{\log(δ^{-1})}{\log(δ^{-1})} = 1.
Example (Cantor set). For δ = 3^{-n}, we need N_δ(C) = 2^n, so
\frac{\log N_δ(C)}{\log(δ^{-1})} = \frac{n \log 2}{n \log 3} = \frac{\log 2}{\log 3}.
If 3^{-n-1} < δ < 3^{-n}, we have N_δ(C) ≤ 2^{n+1}, and so
\frac{\log N_δ(C)}{\log(δ^{-1})} ≤ \frac{(n + 1) \log 2}{n \log 3} → \frac{\log 2}{\log 3}.
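The box-counting argument can also be checked numerically; the following sketch (ours, with an arbitrary depth of approximation) counts boxes of side 3^{-n} meeting a deep finite approximation of C:

```python
# Box-counting estimate of dim_B for the Cantor set: count the boxes
# of side delta = 3^(-n) that meet a finite approximation of C, then
# compare log N_delta against log(1/delta).
import math

def cantor_points(level):
    # Left endpoints of the 2^level intervals of C_level.
    pts = [0.0]
    for n in range(level):
        pts = [x / 3 for x in pts] + [(x + 2) / 3 for x in pts]
    return pts

pts = cantor_points(12)  # 4096 points, resolution 3^(-12)
for n in range(1, 8):
    delta = 3.0 ** (-n)
    boxes = {math.floor(x / delta) for x in pts}
    print(n, len(boxes), math.log(len(boxes)) / math.log(1 / delta))
# The ratio equals log 2 / log 3 ~ 0.6309 at each scale.
```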
Example. Take x_n = n^{-p} with p > 0 for n = 1, 2, ..., and let A = {n^{-p} : n ∈ ℕ} ∪ {0}. Take δ > 0 and choose m so that x_m − x_{m+1} ≈ 2δ. Then
N_δ(A) ≤ \frac{x_m}{2δ} + m + 2,
since [0, x_m] can be covered by about x_m/(2δ) intervals of length 2δ, and the points x_1, ..., x_m can be covered individually. If f(x) = x^{-p}, then −f′(ξ) = p/ξ^{p+1}. Let −f′(ξ) = 2δ for some m < ξ < m + 1; using this, we get
N_δ(A) ≤ p^{-p/(p+1)} (2δ)^{-1/(p+1)} + p^{1/(p+1)} (2δ)^{-1/(p+1)} ∼ c δ^{-1/(p+1)}.
So
\frac{\log N_δ(A)}{\log(δ^{-1})} ∼ \frac{(p + 1)^{-1} \log(δ^{-1})}{\log(δ^{-1})} = \frac{1}{p + 1}.
So the box dimension of A is (p + 1)^{-1}.
Proposition 14.1. Let A and B be sets.
(1) If A is unbounded, then N_δ(A) = +∞, so \dim_B(A) = +∞.
(2) A ⊆ B implies N_δ(A) ≤ N_δ(B), so \dim_B(A) ≤ \dim_B(B).
(3) \overline{\dim}_B(A ∪ B) = \max\{\overline{\dim}_B(A), \overline{\dim}_B(B)\}.
(4) However, the similar claim does not hold for the lower box dimension: one can find A and B so that \underline{\dim}_B(A ∪ B) > \max\{\underline{\dim}_B(A), \underline{\dim}_B(B)\}.
(5) If f : X → X is Lipschitz with constant k, then N_{kδ}(f(A)) ≤ N_δ(A). Hence \dim_B(f(A)) ≤ \dim_B(A).
(6) \dim_H(A) ≤ \underline{\dim}_B(A).
For the last item: if A is covered by N_{δ/2}(A) balls of radius δ/2, then
H^s_δ(A) ≤ δ^s N_{δ/2}(A).
If 1 ≤ H^s(A) ≤ δ^s N_{δ/2}(A) for small δ, then
s ≤ \frac{\log N_{δ/2}(A)}{\log(2δ^{-1}) − \log 2},
so \dim_H(A) ≤ \underline{\dim}_B(A).
Theorem 14.1. If {w_1, ..., w_N} is an IFS of similarities with ratios c_i on R^d which satisfies the open set condition, then \dim_B(A) = \dim_H(A) = s, where s is the similarity dimension determined by \sum_{i=1}^{N} c_i^s = 1.
Department of Mathematics and Statistics, Dalhousie University, 6316 Coburg Rd, Hal-
ifax, NS, Canada B3H 4R2
E-mail address: [email protected]