
AARMS 5920: FRACTALS: USING ITERATED FUNCTION SYSTEMS (IFS) TO CONSTRUCT, EXPLORE, AND UNDERSTAND FRACTALS

HEESUNG YANG

Abstract. This course will present an introduction to one viewpoint (that of IFS) in
the study of “fractal” objects. Although there is not a universally accepted definition of a
“fractal”, for our purposes it is enough to think about objects which have “similar behaviour”
at finer and finer resolutions (smaller and smaller length scales). An IFS is a convenient
encoding of this “similar behaviour” between scales and lets us (to some extent) both control
this relationship and analyze the structure of the resulting object.
We will discuss both geometric fractals (viewed as subsets of Rd ) and fractals which are
functions or probability distributions. After discussing the construction and basic properties
of fractal sets, we will present various notions of “dimension” and discuss relations between
these notions and ways of computing them. However, the precise list of topics will depend
greatly on the interests and background of the students. As an example, some applications
of IFS fractals in digital imaging could be presented. The aim of the course is to develop
intuition about what it means to be self-similar and introduce techniques of analyzing fractal
objects.
The tools we will use include metric geometry and topology, probability and measure
theory, and some aspects of function spaces. We will certainly take the time to make
sure that all students have a chance to understand, filling in any gaps in the background
knowledge as we go.

Contents
1. Cantor sets
1.1. First method: removing the middle
1.2. Second method: concentrating on the complementary intervals
1.3. Third method: ternary strings
2. Cantor ternary function ("Devil's staircase" function)
3. Contractions
3.1. Metric between functions
4. Iterated function systems
4.1. Hausdorff distance
4.2. Iterated function system
4.3. Examples of iterated function systems
4.4. Base-b decompositions of R and C
4.5. IFS, fixed points, and attractor
5. The chaos game
6. Code space and address maps on an attractor
7. Background material detour: measure theory
7.1. Infinite product measures
8. IFS with probabilities (IFSP)
8.1. Monge-Kantorovich metric
9. Invariant measure, Markov operator, and the IFSP chaos game
10. Moments of µ
11. Construction of measures
11.1. Method I
11.2. Method II
12. Hausdorff measures
13. Open set condition
14. Box dimensions

Date: 18 July 2019.

1. Cantor sets
We shall discuss three different ways to construct the middle-third Cantor set.

1.1. First method: removing the middle


The first method is the standard and most well-known.
(1) Start with C0 := [0, 1], the unit interval, and remove the middle third. Let the new
set be C1 := I0 ∪ I1 where I0 is the left interval ([0, 1/3]) and I1 the right interval
([2/3, 1]).
(2) Remove the middle for each of the intervals; append 0 to the index for the left interval
after the subdivision, and 1 to the index for the right interval after the subdivision.
Thus, C2 = I00 ∪ I01 ∪ I10 ∪ I11 = [0, 1/9] ∪ [2/9, 1/3] ∪ [2/3, 7/9] ∪ [8/9, 1].

(3) Repeat this step. Observe that $C_n$ consists of $2^n$ closed intervals, each of length $3^{-n}$.
Observe that Cn+1 ⊆ Cn , and that Ci is compact for any i (i.e., Ci is closed and bounded
for any i).
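To see the construction concretely, here is a minimal Python sketch (our own illustration, not part of the original notes; the helper name `cantor_stage` is hypothetical) that generates the $2^n$ closed intervals of $C_n$:

```python
from fractions import Fraction

def cantor_stage(n):
    """Return the list of closed intervals (pairs of Fractions) making up C_n."""
    intervals = [(Fraction(0), Fraction(1))]          # C_0 = [0, 1]
    for _ in range(n):
        refined = []
        for a, b in intervals:
            third = (b - a) / 3
            refined.append((a, a + third))            # keep the left third (index 0)
            refined.append((b - third, b))            # keep the right third (index 1)
        intervals = refined
    return intervals

# C_2 = [0,1/9] ∪ [2/9,1/3] ∪ [2/3,7/9] ∪ [8/9,1], matching step (2) above
print(cantor_stage(2))
```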
Definition 1.1. The set $C := \bigcap_{n} C_n$ is the (middle-third) Cantor set.

Proposition 1.1. Let C be the (middle-third) Cantor set.


(1) C is non-empty (since 0 ∈ Cn for any n) and compact.
(2) C has Lebesgue measure zero.
(3) C is totally disconnected.
(4) C has no isolated points.
(5) C is uncountable.
1.2. Second method: concentrating on the complementary intervals
This time, we will concentrate instead on the complementary intervals (in particular, on their lengths and the method of their placement). In this case, the lengths $a_n$ of the removed intervals satisfy
$$a_1 = \frac{1}{3}, \quad a_2 = a_3 = \frac{1}{9}, \quad a_4 = a_5 = a_6 = a_7 = \frac{1}{27}, \quad \cdots.$$
At each step, there are $2^n$ gaps of length $(1/3)^{n+1}$, for each $n = 0, 1, 2, \dots$. So the sum of the lengths of the intervals being removed is
$$\sum_{n=0}^{\infty} 2^n \cdot 3^{-(n+1)} = \frac{1}{3}\sum_{n=0}^{\infty} \left(\frac{2}{3}\right)^n = \frac{1}{3} \cdot \frac{1}{1 - 2/3} = 1.$$
Therefore, we see that $C$ is indeed of "length" zero (i.e., Lebesgue measure zero).

1.3. Third method: ternary strings


For each $n \ge 1$, let $b_n := 2 \cdot 3^{-n}$. Then consider the set
$$S := \left\{ \sum_{n=1}^{\infty} \varepsilon_n b_n : \varepsilon_n \in \{0, 1\} \right\},$$
i.e., the set of all possible sub-sums of the infinite sum $\sum b_n$. (Note that $1/3 = 0.1_3$, but also note that $0.1_3 = 0.0222\ldots_3$. Thus $1/3 \in C_1$.) Observe that the middle-third interval being removed at the first step has the digit "1" in the first ternary place. Thus, the base-3 expansion of the numbers in $I_0$ has 0 as its first digit; similarly, $I_1$ has 2 as its first digit. Similarly, any number in $I_{00}$ (resp. $I_{01}$, $I_{10}$, $I_{11}$) has a base-3 expansion starting with 00 (resp. 02, 20, 22). Thus at the end, we will only have numbers whose base-3 expansion does not contain the digit 1. We can also express $S$ in terms of translates. Let
$$A_i := \left\{ \sum_{n=i}^{\infty} \varepsilon_n b_n : \varepsilon_n \in \{0, 1\} \right\}.$$
Then we have
$$S = (0 + A_2) \cup \left(\tfrac{2}{3} + A_2\right) = (0 + A_3) \cup \left(\tfrac{2}{9} + A_3\right) \cup \left(\tfrac{2}{3} + A_3\right) \cup \left(\tfrac{8}{9} + A_3\right) = \cdots.$$

2. Cantor ternary function (“Devil’s staircase” function)


The Cantor ternary function (also known as the "Devil's staircase" function) is an example of a non-constant function whose derivative is zero almost everywhere. We start with $f_0(x) := x$, and construct the next iterate $f_1$ by keeping the middle third constant with the value $1/2$ (i.e., the mid-point between the minimum and the maximum of the non-constant portion), and drawing straight lines on the remaining intervals. Doing the same to the non-constant portions of $f_1$ yields $f_2$, and so on.
We thus can make the following observations regarding fn :
(1) $f_n$ is constant on the "gaps" from the $n$-th level of the construction of the Cantor set.
(2) $f_n$ has slope $(3/2)^n$ where it is not constant.
(3) $\{f_n\}_n$ is a uniformly Cauchy sequence of functions. Let $f$ be the function such that $f_n \to f$ uniformly as $n \to \infty$. Then since each $f_n$ is continuous, so is $f$.
But since $(3/2)^n \to \infty$ as $n \to \infty$, we have (heuristically) that $f'(x) = +\infty$ whenever $x \in C$, whereas $f'(x) \equiv 0$ for all $x \notin C$. Thus $f$ serves as an example of a function that is continuous but not absolutely continuous, whose definition is provided below.
Definition 2.1. A function $g : I \to \mathbb{R}$ is absolutely continuous if for any $\varepsilon > 0$, there exists $\delta > 0$ such that for any finite sequence of pairwise disjoint sub-intervals $(x_k, y_k)$ of $I$ satisfying
$$\sum_k (y_k - x_k) < \delta,$$
we have
$$\sum_k |g(y_k) - g(x_k)| < \varepsilon.$$
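As a computational aside (a sketch of ours, not from the notes), the Cantor function can be evaluated directly from the ternary expansion of its argument: read ternary digits until the first 1 (beyond which $f$ is constant), and reinterpret the digits 2 as binary digits 1.

```python
def cantor_function(x, depth=40):
    """Evaluate the Cantor ternary function at x in [0, 1]."""
    result, scale = 0.0, 0.5
    for _ in range(depth):
        x *= 3
        digit = int(x)
        x -= digit
        if digit == 1:                  # x lies in a removed "gap": f is constant there
            result += scale
            break
        result += scale * (digit // 2)  # ternary digit 2 becomes binary digit 1
        scale /= 2
    return result

print(cantor_function(0.25))  # 1/4 = 0.020202..._3, so f(1/4) = 0.0101..._2 = 1/3
```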

3. Contractions
Definition 3.1. Let $(X, d)$ be a metric space. Then $d$ is a metric on $X$, meaning that $d$ satisfies the following properties:
• (positive-definiteness) $d(x, y) \ge 0$ for any $x, y \in X$, and $d(x, y) = 0$ if and only if $x = y$;
• (symmetry) $d(x, y) = d(y, x)$ for any $x, y \in X$;
• (triangle inequality) $d(x, y) \le d(x, z) + d(z, y)$ for any $x, y, z \in X$.
Definition 3.2. Let $(X, d)$ be a metric space, and let $\{x_n\}$ be a sequence in $X$. If for any $\varepsilon > 0$ there exists a sufficiently large $N$ such that $d(x_n, x_m) < \varepsilon$ for all $n, m \ge N$, then $\{x_n\}$ is a Cauchy sequence. $(X, d)$ is complete if every Cauchy sequence converges.
Example. The space $((0, 1], |\cdot|)$ is not complete. The sequence $\{n^{-1}\}$ is Cauchy: for any $\varepsilon > 0$ one can pick $N$ so large that $N^{-1} < \varepsilon$, and then $|n^{-1} - m^{-1}| < \varepsilon$ for any $n, m \ge N$. However, $n^{-1} \to 0$ as $n \to \infty$, and $0 \notin (0, 1]$. Since we displayed a Cauchy sequence that does not converge in the space, the given space is not complete.
Throughout these lecture notes, unless otherwise specified, we shall assume that $(X, d)$ is a complete metric space.
Definition 3.3. $f : (X, d) \to (Y, \rho)$ is a contraction with contraction factor $c \in [0, 1)$ if
$$\rho(f(x), f(y)) \le c\, d(x, y)$$
for all $x, y \in X$.
Theorem 3.1 (Contraction mapping theorem). Let $f : X \to X$ be a contraction with factor $c < 1$. Then $f$ has a unique fixed point $x \in X$ (i.e., $f(x) = x$). Moreover, for any $x_0 \in X$, the sequence $\{x_n\}$ defined by $x_{n+1} := f(x_n)$ always converges to $x$, and we also have $d(x, x_n) \le c^n d(x_0, x)$.
Proof. We first prove that $f$ can have at most one fixed point. Suppose that there are two distinct fixed points $x$ and $y$. Then we have
$$0 < d(x, y) = d(f(x), f(y)) \le c\, d(x, y) < d(x, y).$$
Indeed, the first inequality follows since $x \ne y$; the equality holds since $x$ and $y$ are fixed points; the second inequality uses the fact that $f$ is a contraction; and the last inequality follows from the fact that $c \in [0, 1)$. But it is impossible to have $d(x, y) < d(x, y)$, so $f$ can have at most one fixed point.
To show the existence of a fixed point, we construct a Cauchy sequence and show that its limit is the desired fixed point. Take $x_0 \in X$ arbitrary, and define $\{x_n\}$ by $x_{i+1} := f(x_i)$. Then by the triangle inequality, we have
$$d(x_n, x_0) \le d(x_0, x_1) + d(x_1, x_2) + \cdots + d(x_{n-1}, x_n)$$
$$= d(x_0, x_1) + d(f(x_0), f(x_1)) + d(f^2(x_0), f^2(x_1)) + \cdots + d(f^{n-1}(x_0), f^{n-1}(x_1))$$
$$\le d(x_0, x_1) + c\, d(x_0, x_1) + c^2 d(x_0, x_1) + \cdots + c^{n-1} d(x_0, x_1)$$
$$= d(x_0, x_1)(1 + c + \cdots + c^{n-1}) \le \frac{1}{1 - c}\, d(x_0, x_1). \quad (1)$$
Now we show that $\{x_n\}$ is in fact Cauchy. For any $1 \le n < m$, we have
$$d(x_n, x_m) \le d(f(x_{n-1}), f(x_{m-1})) \le c\, d(x_{n-1}, x_{m-1}) \le c^2 d(x_{n-2}, x_{m-2}) \le \cdots \le c^n d(x_0, x_{m-n}) \le \frac{c^n}{1 - c}\, d(x_0, x_1),$$
with the last inequality following from (1). Hence $\{x_n\}$ is Cauchy, and so $x_n \to x$ for some $x \in X$ by the completeness of $X$. Now observe that $\lim f(x_n) = f(\lim x_n) = f(x)$ since $f$ is continuous. Furthermore, $\lim x_{n+1} = x$, so it follows that $f(x) = x$ as desired. Finally, observe that
$$d(x_n, x) = d(f(x_{n-1}), f(x)) \le c\, d(x_{n-1}, x) \le \cdots \le c^n d(x_0, x),$$
so the last claim regarding the estimate follows. □
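The proof is entirely constructive, so it doubles as an algorithm: iterate $f$ until successive terms stabilize. A minimal Python sketch (ours, not from the notes), applied to the Cantor-set map $w_1(x) = (x+2)/3$, whose fixed point is 1:

```python
def fixed_point(f, x0, tol=1e-12, max_iter=10_000):
    """Iterate x_{n+1} = f(x_n); for a contraction this converges geometrically."""
    x = x0
    for _ in range(max_iter):
        x_next = f(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    raise RuntimeError("no convergence; is f really a contraction?")

# w_1(x) = (x + 2)/3 is a contraction on R with factor c = 1/3 and fixed point 1,
# reached from any starting value, as the theorem guarantees.
print(fixed_point(lambda x: (x + 2) / 3, x0=17.0))
```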
The next example exhibits a map that is "almost" a contraction, but not one, and that has no fixed point either.
Example. Let $X = [1, \infty)$ and $f(x) = x + x^{-1}$. We claim that $|f(x) - f(y)| < |x - y|$ for all $x \ne y$, and that $f$ has no fixed point. Suppose $1 \le x < y$, so that $x^{-1} > y^{-1}$. Then $|f(x) - f(y)| = (y - x) - (x^{-1} - y^{-1}) < |x - y|$, since $x^{-1} - y^{-1} > 0$. But one cannot find a constant $c < 1$ so that $|f(x) - f(y)| \le c|x - y|$ holds uniformly on the domain, so $f$ is not a contraction. Furthermore, $f$ has no fixed point; otherwise, we would have $x$ with $f(x) = x + x^{-1} = x$, i.e., $x^{-1} = 0$, which no $x \in X$ satisfies.
However, with an additional restriction on $X$, such an example no longer exists.
Proposition 3.1. If $(X, d)$ is a compact metric space and $f : X \to X$ satisfies $d(f(x), f(y)) < d(x, y)$ for any $x \ne y$, then $f$ has a unique fixed point.
Proof. Assignment problem. 
Theorem 3.2 (Collage theorem). Let $f : X \to X$ be a contraction with contraction factor $c$, and let $x$ be its fixed point. Then for any $y \in X$, we have
$$d(x, y) \le \frac{d(y, f(y))}{1 - c}.$$
Proof. By the triangle inequality, we have
$$d(y, x) \le d(y, f(y)) + d(f(y), x) = d(y, f(y)) + d(f(y), f(x)) \le d(y, f(y)) + c\, d(y, x).$$
Thus $(1 - c)\, d(y, x) \le d(y, f(y))$, and the claim follows upon dividing both sides by $1 - c$. □
3.1. Metric between functions
Definition 3.4. Suppose $f, g : X \to X$. We define
$$d_\infty(f, g) := \sup_{x \in X} d(f(x), g(x)),$$
provided this value is finite.
Proposition 3.2. Let $f, g : X \to X$ be contractions with contraction factors $c_f$ and $c_g$ and with fixed points $x_f$ and $x_g$ respectively. Then
$$d(x_f, x_g) \le \frac{1}{1 - c_f}\, d_\infty(f, g).$$
Proof. By the triangle inequality,
$$d(x_f, x_g) \le d(x_f, f(x_g)) + d(f(x_g), x_g) = d(f(x_f), f(x_g)) + d(f(x_g), g(x_g)) \le c_f\, d(x_f, x_g) + d_\infty(f, g).$$
Hence $(1 - c_f)\, d(x_f, x_g) \le d_\infty(f, g)$, and the claim follows upon dividing both sides by $1 - c_f$. □
Corollary 3.1. Let fn : X → X be a sequence of contractions with contraction factors cn
and fixed point xn for each fn . Suppose that cn ≤ c < 1 and fn → f with d∞ (fn , f ) → 0 as
n → ∞. Then xn → x where x is the fixed point of f .
4. Iterated function systems
Let $C$ be the Cantor set, let $C_L$ be the part of $C$ in $[0, 1/3]$, and let $C_R$ be the part of $C$ in $[2/3, 1]$. Note that $C_L = \frac{1}{3}C$ and $C_R = \frac{1}{3}C + \frac{2}{3}$ (i.e., $C_L$ is just $C$ contracted by a factor of $1/3$; the same with $C_R$, but with a translation by $2/3$). Thus if we define $w_0(x) := x/3$ and $w_1(x) := (x + 2)/3$, then $C_L = w_0(C)$ and $C_R = w_1(C)$.

Starting with $[0, 1]$, we obtain a sequence of iterates $A_n$ by using $w_0$ and $w_1$ to shrink the previous iterate, and infinitely repeating this iteration gives us the Cantor set. However, we still need to make precise in what sense $A_n$ converges to $C$.
4.1. Hausdorff distance
Definition 4.1. Given a complete metric space $X$, we define
$$H(X) := \{A \subseteq X : A \text{ is non-empty and compact}\}.$$
Furthermore, for any $A, B \in H(X)$, we define
$$d_H(A, B) := \max\left\{ \sup_{a \in A} \inf_{b \in B} d(a, b),\ \sup_{b \in B} \inf_{a \in A} d(a, b) \right\}.$$
Note that $\sup_{a \in A} \inf_{b \in B} d(a, b)$ is the "farthest" "closest" distance between a point in $A$ and the set $B$.

Definition 4.2. Given $A \subseteq X$ and $\varepsilon > 0$, define
$$A_\varepsilon := \{y \in X : \exists x \in A \text{ such that } d(x, y) < \varepsilon\} = \bigcup_{a \in A} B_\varepsilon(a).$$
Remark. If $B \subseteq A_\varepsilon$, then $\sup_{b \in B} \inf_{a \in A} d(a, b) \le \varepsilon$. So in this case, we get another characterization of $d_H(A, B)$:
$$d_H(A, B) = \inf\{\varepsilon > 0 : A \subseteq B_\varepsilon \text{ and } B \subseteq A_\varepsilon\}.$$
The function $d_H$ is called the Hausdorff distance (or Hausdorff metric; we shall see that $d_H$ is indeed a metric).
For the sake of notational simplicity, write $d(A, B) := \sup_{a \in A} \inf_{b \in B} d(a, b)$. (Note that $d(A, B)$ is not a metric since it is not symmetric.) If $A \subseteq B$, then $d(A, B) = 0$; in that case $d_H(A, B) = d(B, A)$.
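For finite sets the sup/inf in the definition reduce to max/min, so $d_H$ is directly computable. A small Python sketch (our illustration, with hypothetical helper names) on finite subsets of $\mathbb{R}$:

```python
def hausdorff_distance(A, B):
    """d_H between two finite non-empty subsets of R, straight from the definition."""
    d = lambda S, T: max(min(abs(s - t) for t in T) for s in S)  # sup_{s in S} inf_{t in T}
    return max(d(A, B), d(B, A))

# Endpoints of C_0 = [0,1] versus endpoints of C_1 = [0,1/3] ∪ [2/3,1]:
# the first set is contained in the second, so d_H equals d(B, A) = 1/3.
print(hausdorff_distance([0.0, 1.0], [0.0, 1/3, 2/3, 1.0]))
```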
Proposition 4.1. $d_H$ is a metric.
Proof. That $d_H(A, B) \ge 0$ is evident from the definition. If $d_H(A, B) = 0$, then $d(A, B) = d(B, A) = 0$. Thus $\sup_{b \in B} \inf_{a \in A} d(a, b) = 0$, which implies that for any $b \in B$ we have $\inf_{a \in A} d(a, b) = 0$. But recall that $A$ and $B$ are closed since both are compact. Hence there exists a sequence $\{a_n\}$ with $a_n \in A$ so that $d(b, a_n) \to 0$. So $b \in \overline{A} = A$. This is true for all $b \in B$, so $B \subseteq A$. The symmetric argument shows that indeed $A = B$. Evidently, if $A = B$, then $d_H(A, B) = 0$. Symmetry is again straightforward from the way $d_H$ is defined.
Let $A, B, C \in H(X)$, and write $d_H(A, C) = s$ and $d_H(C, B) = t$. Then for any $\varepsilon > 0$ we have $A \subseteq C_{\varepsilon + s}$ and $C \subseteq A_{\varepsilon + s}$; similarly, $C \subseteq B_{\varepsilon + t}$ and $B \subseteq C_{\varepsilon + t}$. Now we claim that $C_{\varepsilon + s} \subseteq B_{(\varepsilon + t) + (\varepsilon + s)} = B_{s + t + 2\varepsilon}$ and $C_{\varepsilon + t} \subseteq A_{s + t + 2\varepsilon}$. The first claim follows immediately from the fact that $C \subseteq B_{\varepsilon + t}$; the second follows since $C \subseteq A_{\varepsilon + s}$. Hence $A \subseteq B_{s + t + 2\varepsilon}$ and $B \subseteq A_{s + t + 2\varepsilon}$, so $d_H(A, B) \le s + t + 2\varepsilon$ for every $\varepsilon > 0$, i.e., $d_H(A, B) \le d_H(A, C) + d_H(C, B)$: the triangle inequality is satisfied. □
Remark. The compactness of A and B is essential since otherwise, it is possible to find A
and B so that dH (A, B) = +∞: consider A = {n2 : n ∈ N}, B = {πn2 : n ∈ N} ⊆ R.
We need the following definition for the proof of the next theorem.
Definition 4.3. A set $A$ is totally bounded if for any $\varepsilon > 0$, there are finitely many points $x_1, \dots, x_N \in A$ so that
$$A \subseteq \bigcup_{i=1}^{N} B_\varepsilon(x_i).$$

Theorem 4.1. Let (X, d) be a complete metric space. Then (H(X), dH ) is also complete.
Proof. Suppose that $\{A_n\}$ is a Cauchy sequence in $(H(X), d_H)$. Let
$$A := \bigcap_{n \ge 1} \overline{\bigcup_{i \ge n} A_i}.$$
Recall that we need to take the closure to ensure that $A$ is a compact set.¹

¹We can view $A$ as the "lim sup" of the sets $A_n$, since the union can be viewed as "sup" and the intersection as "inf". Compare with the usual definition of the lim sup of a sequence $\{a_n\}$ of reals:
$$\limsup_{n \to \infty} a_n = \lim_{n \to \infty} \sup_{i \ge n} a_i = \inf_{n \ge 1} \sup_{i \ge n} a_i.$$
First we prove that $A \in H(X)$. Since $X$ is complete, any subset of $X$ is compact if and only if it is closed and totally bounded. Write
$$B_n := \overline{\bigcup_{i \ge n} A_i}.$$
By definition $B_n$ is closed; since any arbitrary intersection of closed sets is closed, it follows that $A = \bigcap_{n \ge 1} B_n$ is also closed. Also, note that $\bigcup_{i \ge n+1} A_i \subseteq \bigcup_{i \ge n} A_i$, so
$$B_{n+1} = \overline{\bigcup_{i \ge n+1} A_i} \subseteq \overline{\bigcup_{i \ge n} A_i} = B_n.$$
Thus as long as $B_1$ is compact, so is every $B_n$. Pick some $\varepsilon > 0$. Since $\{A_n\}$ is Cauchy, there is some $m$ so that for all $n > m$ we have $d_H(A_n, A_m) < \varepsilon/2$, which gives $A_n \subseteq (A_m)_{\varepsilon/2}$. Therefore $B_m \subseteq \overline{(A_m)_{\varepsilon/2}}$. Since $A_m$ is compact, hence totally bounded, there are finitely many balls of radius $\varepsilon/2$ such that
$$A_m \subseteq \bigcup_{i=1}^{k} B_{\varepsilon/2}(x_i),$$
from which it follows that
$$(A_m)_{\varepsilon/2} \subseteq \bigcup_{i=1}^{k} B_\varepsilon(x_i).$$
Therefore $B_m$ is totally bounded as well, so $B_m$ is indeed compact. But then notice that
$$B_1 = \overline{A_1 \cup A_2 \cup \cdots \cup A_{m-1} \cup \bigcup_{i \ge m} A_i} = A_1 \cup \cdots \cup A_{m-1} \cup B_m,$$
so $B_1$ is a finite union of compact sets, making $B_1$ compact as well. This in turn proves that $A \subseteq B_1$ is totally bounded and thus compact. Finally, $A \ne \emptyset$ since $A$ is the intersection of a nested family of non-empty compact sets. Hence $A \in H(X)$ as required.
Finally, we need to show that $d_H(A_n, A) \to 0$ as $n \to \infty$. Suppose $\varepsilon > 0$. Since $\{A_n\}$ is Cauchy, there is an $m \in \mathbb{N}$ such that for all $n, k \ge m$ we have $d_H(A_n, A_k) < \varepsilon/3$. Hence $A_i \subseteq (A_m)_{\varepsilon/3} \subseteq (A_n)_{2\varepsilon/3}$ for all $i, n \ge m$, so
$$\bigcup_{i \ge m} A_i \subseteq (A_n)_{2\varepsilon/3}, \quad \text{and hence} \quad B_m = \overline{\bigcup_{i \ge m} A_i} \subseteq (A_n)_\varepsilon.$$
Clearly $A \subseteq B_m$, so $A \subseteq (A_n)_\varepsilon$ as well.

For the reverse inclusion, fix $n \ge m$ and take $x \in A_n$. For all $k \ge m$ we have $d_H(A_n, A_k) < \varepsilon/3$, so there is $x_k \in A_k$ such that $d(x_k, x) < \varepsilon/3$. Since each $x_k \in B_m$, which is a compact set, there must be some $y \in B_m$ which is a limit of some subsequence of $\{x_k\}$.

Recall that $B_1 \supseteq B_2 \supseteq B_3 \supseteq \cdots \supseteq A$; by the way $A$ is defined, it suffices to show that $y \in B_l$ for all sufficiently large $l$ in order to prove that $y \in A$. Indeed, given $l \ge m$, we have
$$\{x_k : k \ge l\} \subseteq \bigcup_{i \ge l} A_i,$$
so the closure of $\{x_k : k \ge l\}$ is contained in the closure of $\bigcup_{i \ge l} A_i$, which is $B_l$. Since $y$ is a limit point of $\{x_k : k \ge l\}$, we get $y \in B_l$ for all $l \ge m$. Therefore $y \in A$ as required. Now, with $\{x_{k_l}\}$ a subsequence converging to $y$, we have $d(x_{k_l}, x) < \varepsilon/3$ and, for large $l$, $d(x_{k_l}, y) < \varepsilon/3$. Putting these inequalities together gives $d(x, y) < \varepsilon/3 + \varepsilon/3 < \varepsilon$, from which we have $A_n \subseteq A_\varepsilon$. Hence $d_H(A_n, A) < \varepsilon$. □
4.2. Iterated function system
Definition 4.4. An iterated function system (IFS) on $(X, d)$ is a finite collection $\{w_1, \dots, w_N\}$ of (contractive) self-maps $w_i : X \to X$.
Example. On $\mathbb{R}$, if $w_0(x) = \frac{x}{3}$ and $w_1(x) = \frac{x+2}{3}$, then $\{w_0, w_1\}$ is the IFS whose "attractor" is the Cantor set $C$. The iteration we want is $B \mapsto w_0(B) \cup w_1(B) = \frac{B}{3} \cup \left( \frac{B}{3} + \frac{2}{3} \right)$.


Definition 4.5. Given the IFS $\{w_i : 1 \le i \le N\}$ on $X$, the induced map $W : H(X) \to H(X)$ is given by
$$W(B) = \bigcup_{i=1}^{N} w_i(B).$$

It is not evident that $W$ is necessarily contractive on $H(X)$. First, we note that if $f : X \to X$ is contractive with contraction factor $c < 1$, then $d_H(f(A), f(B)) \le c\, d_H(A, B)$; indeed,
$$\sup_{a \in A} \inf_{b \in B} d(f(a), f(b)) \le c\, \sup_{a \in A} \inf_{b \in B} d(a, b)$$
(since $d(f(a), f(b)) \le c\, d(a, b)$ for any $a \in A$ and $b \in B$).


Next, consider $d_H(A_1 \cup A_2, B_1 \cup B_2)$. If $A_1, A_2, C \in H(X)$, then we have
$$d(A_1 \cup A_2, C) = \sup_{a \in A_1 \cup A_2} \inf_{c \in C} d(a, c) = \max\left\{ \sup_{a \in A_1} \inf_{c \in C} d(a, c),\ \sup_{a \in A_2} \inf_{c \in C} d(a, c) \right\} = \max\{d(A_1, C), d(A_2, C)\}.$$
In the other direction, for any fixed $c \in C$,
$$\inf_{b \in B_1 \cup B_2} d(c, b) = \min\left\{ \inf_{b \in B_1} d(c, b),\ \inf_{b \in B_2} d(c, b) \right\},$$
so
$$d(C, B_1 \cup B_2) = \sup_{c \in C} \inf_{b \in B_1 \cup B_2} d(c, b) \le \min\{d(C, B_1), d(C, B_2)\}.$$
Hence, it follows that
$$d_H(A_1 \cup A_2, B_1 \cup B_2) = \max\{d(A_1 \cup A_2, B_1 \cup B_2),\ d(B_1 \cup B_2, A_1 \cup A_2)\}$$
$$= \max\{d(A_1, B_1 \cup B_2), d(A_2, B_1 \cup B_2), d(B_1, A_1 \cup A_2), d(B_2, A_1 \cup A_2)\}$$
$$\le \max\{\min\{d(A_1, B_1), d(A_1, B_2)\}, \min\{d(A_2, B_1), d(A_2, B_2)\}, \min\{d(B_1, A_1), d(B_1, A_2)\}, \min\{d(B_2, A_1), d(B_2, A_2)\}\}$$
$$\le \max\{d(A_1, B_1), d(A_2, B_2), d(B_1, A_1), d(B_2, A_2)\} \le \max\{d_H(A_1, B_1), d_H(A_2, B_2)\}.$$

Now let us return to $W$. With $W(A)$ and $W(B)$ we have
$$d_H(W(A), W(B)) = d_H\left( \bigcup_{i=1}^{N} w_i(A),\ \bigcup_{i=1}^{N} w_i(B) \right) \le \max_{1 \le i \le N} d_H(w_i(A), w_i(B)) \le \left( \max_{1 \le i \le N} c_i \right) d_H(A, B).$$
This allows us to conclude that $W$ is indeed a contraction, as desired.
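Since $W$ is a contraction on $(H(X), d_H)$, iterating it from any compact seed converges to the attractor. A short Python sketch (ours, not from the notes) for the Cantor IFS, applied to a finite set of points:

```python
def apply_W(points, maps):
    """One step of the induced map: W(B) = union of the images w_i(B)."""
    return {w(x) for x in points for w in maps}

w0 = lambda x: x / 3
w1 = lambda x: (x + 2) / 3

B = {0.0, 1.0}                        # seed set; any non-empty compact set works
for _ in range(5):
    B = apply_W(B, [w0, w1])          # after n steps: the endpoints of C_n

print(len(B), sorted(B)[:4])          # 64 points approximating C within 3^(-5)
```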

4.3. Examples of iterated function systems


In this section we look at some examples of iterated function systems.

Example. Consider the following finite collection of self-maps:
$$w_0\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 1/2 & 0 \\ 0 & 1/2 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \quad
w_1\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 1/2 & 0 \\ 0 & 1/2 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} 1/2 \\ 0 \end{pmatrix}, \quad
w_2\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 1/2 & 0 \\ 0 & 1/2 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} 0 \\ 1/2 \end{pmatrix}.$$
If the starting set of the iteration is an equilateral triangle, then infinitely iterating $\{w_0, w_1, w_2\}$ gives us the Sierpiński triangle.
Example. The following finite collection of self-maps $\{w_0, w_1, \dots, w_6\}$ defined by
$$w_0\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 1/2 & -\sqrt{3}/6 \\ \sqrt{3}/6 & 1/2 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} 0 \\ 0 \end{pmatrix},$$
$$w_1\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 1/3 & 0 \\ 0 & 1/3 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} 1/\sqrt{3} \\ 1/3 \end{pmatrix}, \quad
w_2\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 1/3 & 0 \\ 0 & 1/3 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} 0 \\ 2/3 \end{pmatrix},$$
$$w_3\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 1/3 & 0 \\ 0 & 1/3 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} -1/\sqrt{3} \\ 1/3 \end{pmatrix}, \quad
w_4\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 1/3 & 0 \\ 0 & 1/3 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} 1/\sqrt{3} \\ -1/3 \end{pmatrix},$$
$$w_5\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 1/3 & 0 \\ 0 & 1/3 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} 0 \\ -2/3 \end{pmatrix}, \quad
w_6\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 1/3 & 0 \\ 0 & 1/3 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} -1/\sqrt{3} \\ -1/3 \end{pmatrix}$$
gives us the Koch snowflake when started with an equilateral triangle.
4.4. Base-b decompositions of R and C
Recall that the Cantor set $C$ is
$$C = \left\{ \sum_{n=1}^{\infty} 3^{-n} d_n : d_n \in \{0, 2\} \right\}.$$
We can also express $C$ iteratively:
$$C = \left\{ 0 \cdot 3^{-1} + \sum_{n=2}^{\infty} 3^{-n} d_n : d_n \in \{0, 2\} \right\} \cup \left\{ 2 \cdot 3^{-1} + \sum_{n=2}^{\infty} 3^{-n} d_n : d_n \in \{0, 2\} \right\}$$
$$= 3^{-1}\left\{ \sum_{m=1}^{\infty} 3^{-m} d_m : d_m \in \{0, 2\} \right\} \cup \left( \frac{2}{3} + 3^{-1}\left\{ \sum_{m=1}^{\infty} 3^{-m} d_m : d_m \in \{0, 2\} \right\} \right)$$
$$= \frac{1}{3} C \cup \left( \frac{2}{3} + \frac{1}{3} C \right).$$
Define (for simplicity of notation)
$$A := \frac{1}{3}\left\{ \sum_{m=1}^{\infty} 3^{-m} d_m : d_m \in \{0, 1, 2\} \right\}.$$
The same type of decomposition holds for $[0, 1]$ in base 3:
$$\left\{ \sum_{n=1}^{\infty} 3^{-n} d_n : d_n \in \{0, 1, 2\} \right\} = \left( \frac{0}{3} + A \right) \cup \left( \frac{1}{3} + A \right) \cup \left( \frac{2}{3} + A \right) = \frac{1}{3}[0, 1] \cup \left( \frac{1}{3} + \frac{1}{3}[0, 1] \right) \cup \left( \frac{2}{3} + \frac{1}{3}[0, 1] \right),$$
and the three "parts" in this case touch at endpoints.
So suppose $b \in \mathbb{C}$ could be used as a base for an expansion with digit set $D = \{d_1, \dots, d_N\}$. Assume that we have unique representation, except possibly on some "just touching" boundary.
Let $T := \left\{ \sum_{n=1}^{\infty} b^{-n} d_n : d_n \in D \right\}$, which we shall call the fundamental tile. As we did with the Cantor set, we can write this $T$ as a union of translations of a scaled $T$:
$$T = \left\{ \sum_{n=1}^{\infty} b^{-n} d_n : d_n \in D \right\} = \bigcup_{e \in D} \left( b^{-1} e + b^{-1} \left\{ \sum_{m=1}^{\infty} b^{-m} d_m : d_m \in D \right\} \right) = \bigcup_{e \in D} (b^{-1} e + b^{-1} T).$$
Our assumption of "(almost) unique representation" implies that the "parts" of the IFS decomposition are measure-disjoint. Thus, with $\lambda$ the planar Lebesgue measure (scaling by $b^{-1}$ scales areas by $|b|^{-2}$),
$$\lambda(T) = \lambda\left( \bigcup_{e \in D} (b^{-1} e + b^{-1} T) \right) = \sum_{e \in D} \frac{1}{|b|^2} \lambda(T) = \frac{N}{|b|^2} \lambda(T),$$
so we need $N = |b|^2$.
Recall that for a base-$b$ representation to be well-defined, we need to pick the digit set carefully. One necessary condition is that the digit set must be a complete set of coset representatives for $\mathbb{Z}/b\mathbb{Z}$.
4.5. IFS, fixed points, and attractor
Take an IFS $\{w_1, \dots, w_N\}$ with attractor $A$. Then the fixed point $x_i$ of each $w_i$ lies in $A$. To see why, start with the set $S_0 = \{x_1\}$. Then $S_1 = W(S_0) = \{w_1(x_1), w_2(x_1), \dots, w_N(x_1)\} = \{x_1, w_2(x_1), \dots, w_N(x_1)\}$. We can continue on:
$$S_2 = W(S_1) = \{x_1, w_2(x_1), \dots, w_N(x_1), w_1(w_2(x_1)), w_2(w_2(x_1)), \dots, w_2(w_N(x_1)), \dots\}.$$
Therefore $x_1 \in S_n$ for all $n$. Moreover $S_n \to A$ in $d_H$, where
$$A := \bigcap_{n=1}^{\infty} \overline{\bigcup_{i \ge n} S_i}.$$
It follows that $x_1 \in A$ as well. In fact, for any $i_1, \dots, i_k \in \{1, 2, \dots, N\}$, we have $w_{i_1} \circ w_{i_2} \circ \cdots \circ w_{i_k}(x_j) \in A$: any finite-level image of any fixed point $x_j$ of $w_j$ is in the attractor. Note that repeated applications of $W$ to the singleton set of a fixed point grow the size of the set, but the initial fixed point persists in each iteration, as we just saw.
In the previous section, we presented the IFS for the Sierpiński triangle, whose fixed points are
$$D := \left\{ d_0 = \begin{pmatrix} 0 \\ 0 \end{pmatrix},\ d_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix},\ d_2 = \begin{pmatrix} 0 \\ 1 \end{pmatrix} \right\}.$$
For $A := \begin{pmatrix} 1/2 & 0 \\ 0 & 1/2 \end{pmatrix}$ (so that $w_i(x) = Ax + Ad_i$), we have
$$S_0 = \{Ad_0, Ad_1, Ad_2\} = \left\{ \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1/2 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1/2 \end{pmatrix} \right\},$$
$$S_1 = \{A(Ad_j) + Ad_i : d_i, d_j \in D\} = \{A^2 b_2 + A b_1 : b_1, b_2 \in D\},$$
$$S_2 = \{A^3 b_3 + A^2 b_2 + A b_1 : b_1, b_2, b_3 \in D\}.$$
Thus in general $S_n = \left\{ \sum_{i=1}^{n+1} A^i b_i : b_i \in D \right\}$, so as $n \to \infty$ we obtain
$$S = \left\{ \sum_{i=1}^{\infty} A^i b_i : b_i \in D \right\}.$$

5. The chaos game


How do we draw a picture of $A$, the attractor of $W$? If we start with one point, say $x_0$, then we have
$$S_0 = \{x_0\}$$
$$S_1 = W(S_0) = \{w_1(x_0), \dots, w_N(x_0)\}$$
$$S_2 = W(S_1) = \{w_i \circ w_j(x_0) : 1 \le i, j \le N\}$$
$$S_3 = W(S_2) = \{w_i \circ w_j \circ w_k(x_0) : 1 \le i, j, k \le N\}$$
$$\vdots$$
$$S_k = W(S_{k-1}) = \{w_{i_1} \circ \cdots \circ w_{i_k}(x_0) : i_j \in \{1, 2, \dots, N\}\}.$$
The following algorithm, which we call the chaos game, draws the attractor of $W$, as counter-intuitive as that may sound at first glance.
Algorithm 5.1 (Chaos game). Start with $x_0 \in X$.
(1) Choose $i_n \in \{1, 2, \dots, N\}$ uniformly at random (each with probability $N^{-1}$).
(2) Let $x_{n+1} = w_{i_n}(x_n)$, and plot $x_{n+1}$.
(3) Go back to the first step until the image is close enough.
What happens is that
$$A = \lim_{n \to \infty} \overline{\{x_m : m \ge n\}}$$
with respect to the Hausdorff metric $d_H$, thereby recovering the attractor $A$.
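A direct Python implementation of Algorithm 5.1 (a sketch of ours, not from the notes) for the Sierpiński IFS of Section 4.3; scatter-plotting the orbit reveals the attractor:

```python
import random

# The three Sierpinski maps from Section 4.3: halve toward each of three corners.
maps = [
    lambda p: (p[0] / 2, p[1] / 2),            # w_0
    lambda p: (p[0] / 2 + 0.5, p[1] / 2),      # w_1
    lambda p: (p[0] / 2, p[1] / 2 + 0.5),      # w_2
]

def chaos_game(n_points=100_000, x0=(0.0, 0.0)):
    """Iterate a uniformly random map and record the orbit (Algorithm 5.1)."""
    x, orbit = x0, []
    for _ in range(n_points):
        x = random.choice(maps)(x)             # each map chosen with probability 1/N
        orbit.append(x)
    return orbit                               # plot these points to see the attractor

orbit = chaos_game()
```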
Example. Let $A = [0, 1]$, $w_0(x) = x/2$, and $w_1(x) = (x + 1)/2$. In binary, $w_0$ shifts the digits one place to the right and prepends the digit 0, while $w_1$ shifts the digits one place to the right and prepends the digit 1. Start with $x_0 = 0 = 0.00000\ldots_2$. Randomly picking 0 or 1 at each step, we see, for instance:
$$x_0 = 0.00000\ldots_2$$
$$w_1(x_0) = x_1 = 0.10000\ldots_2$$
$$w_1(x_1) = x_2 = 0.11000\ldots_2$$
$$w_0(x_2) = x_3 = 0.011000\ldots_2$$
$$w_1(x_3) = x_4 = 0.1011000\ldots_2$$
$$w_0(x_4) = x_5 = 0.01011000\ldots_2$$
$$w_1(x_5) = x_6 = 0.101011000\ldots_2.$$
Suppose that we want to draw the attractor on a screen whose width is 1024 pixels. As we see, the digits contributed in earlier stages are pushed further and further to the right, so the influence of the contraction maps applied earlier becomes negligible: since $1024 = 2^{10}$, the pixel containing $x_n$ is determined by its first ten binary digits, i.e., by the last ten maps applied.

6. Code space and address maps on an attractor


Let $A = [0, 1]$, and let the IFS be
$$w_0(x) = \frac{x}{3}, \quad w_1(x) = \frac{x + 1}{3}, \quad w_2(x) = \frac{x + 2}{3}.$$
Then $w_0(A) = [0, 1/3]$, $w_1(A) = [1/3, 2/3]$, $w_2(A) = [2/3, 1]$, so
$$A = w_0(A) \cup w_1(A) \cup w_2(A).$$
Iterating, we see
$$A = w_0(A) \cup w_1(A) \cup w_2(A) = w_0\left( \bigcup_{i=0}^{2} w_i(A) \right) \cup w_1\left( \bigcup_{i=0}^{2} w_i(A) \right) \cup w_2\left( \bigcup_{i=0}^{2} w_i(A) \right) = \bigcup_{i=0}^{2} \bigcup_{j=0}^{2} w_i(w_j(A)).$$
Clearly $w_j(A) \subseteq A$ for any $j$, so $w_i(w_j(A)) \subseteq w_i(A)$. More generally,
$$w_{\sigma_1}(w_{\sigma_2}(\cdots(w_{\sigma_{n+1}}(A))\cdots)) \subseteq w_{\sigma_1}(w_{\sigma_2}(\cdots(w_{\sigma_n}(A))\cdots))$$
for any fixed sequence $\sigma_1, \sigma_2, \dots, \sigma_{n+1} \in \{0, 1, 2\}$.
If $\sigma_1, \sigma_2, \dots$ is a sequence in $\{0, 1, 2\}^{\mathbb{N}}$, then independently of our initial choice $x$, we always have
$$\lim_{n \to \infty} w_{\sigma_1} \circ w_{\sigma_2} \circ \cdots \circ w_{\sigma_n}(x) = p$$
for some single point $p$ depending only on the sequence $\sigma$. This gives rise to the following definition.

Definition 6.1. The address map $\omega$ from $\Sigma = \{1, 2, \dots, N\}^{\mathbb{N}}$ to $A$ is given by
$$\omega(\sigma) := \lim_{n \to \infty} w_{\sigma_1} \circ w_{\sigma_2} \circ \cdots \circ w_{\sigma_n}(x_0)$$
for any starting point $x_0$ (the limit does not depend on $x_0$).
Definition 6.2. Let $\{w_1, \dots, w_N\}$ be an IFS on $(X, d)$, and let $\omega : \Sigma \to X$ where $\Sigma = \{1, 2, \dots, N\}^{\mathbb{N}}$. $\Sigma$ is called the code space.
Now we explore a few properties of address maps.
Proposition 6.1. Let {w1 , . . . , wN } be an IFS on (X, d), and let ω : Σ → X be an address
map on the attractor A. Then the following are true.
• The range of ω is A.
• If we give $\Sigma$ the product topology, with the discrete topology on each factor, then $\omega$ is continuous.
• Under the aforementioned product topology, Σ is compact.
• Since Σ is compact and ω continuous, it follows that ω is uniformly continuous.
We will use the Sierpiński triangle as an example to examine the behaviour of address
maps.
Example. Let $x_0 = (0, 0)$, and let the IFS be as given in Section 4.3. Randomly choose $i_1 \in \{0, 1, 2\}$, and let $x_1 = w_{i_1}(x_0)$. Suppose that we pick $i_1 = 2$. Then $(0, 0) \mapsto (0, 1/2)$. Note that the address of $x_0$ is $0000000\ldots$; similarly, the address of $(1, 0)$ is $111111\ldots$ and the address of $(0, 1)$ is $222222\ldots$. In general, the fixed point of $w_0$ (resp. $w_1$, $w_2$) has address $0000\ldots$ (resp. $1111\ldots$, $2222\ldots$). So the address of $x_1$ is $20000\ldots$. Suppose we then pick $i_2 = 2$ and $i_3 = 1$. Then $x_3 = w_1(w_2(w_2((0, 0))))$ has address $1220000\ldots$. Note that the string the address map produces indicates which region $x_n$ belongs to, justifying the name "address" map.
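A truncated address determines a point to within $c^n$, so the address map is easy to approximate numerically. A Python sketch of ours (not from the notes) for the triadic IFS above:

```python
w = [lambda x: x / 3, lambda x: (x + 1) / 3, lambda x: (x + 2) / 3]

def address_to_point(sigma, x0=0.0):
    """Approximate ω(σ) by w_{σ1} ∘ w_{σ2} ∘ ... ∘ w_{σn}(x0) for a finite string σ."""
    x = x0
    for digit in reversed(sigma):    # the innermost map w_{σn} is applied first
        x = w[digit](x)
    return x

# The address 0222... codes the point 0.0222..._3 = 1/3; ten digits already
# determine it to within 3^(-10).
print(address_to_point([0, 2, 2, 2, 2, 2, 2, 2, 2, 2]))
```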
7. Background material detour: measure theory
Definition 7.1. A collection $\mathcal{A}$ of subsets of $X$ is a σ-algebra if the following hold:
(1) $\emptyset, X \in \mathcal{A}$;
(2) if $A \in \mathcal{A}$, then $X \setminus A \in \mathcal{A}$;
(3) if $A_n \in \mathcal{A}$ for each $n \in \mathbb{N}$, then $\bigcup_n A_n \in \mathcal{A}$.

Definition 7.2. Suppose that $X$ is a set, and $\mathcal{A}$ a σ-algebra over $X$. Then a measure is a function $m : \mathcal{A} \to \mathbb{R} \cup \{\infty\}$ satisfying the following conditions:
(1) $m(\emptyset) = 0$;
(2) (non-negativity) $m(A) \ge 0$ for any $A \in \mathcal{A}$;
(3) (countable additivity) $m\left( \bigcup_{i=1}^{\infty} A_i \right) = \sum_{i=1}^{\infty} m(A_i)$ if the $A_i$ are pairwise disjoint.
The canonical measure is the Lebesgue measure, which "measures" the length of a set and is determined by $\lambda([a, b]) := b - a$. If $S = \bigcup_{i=1}^{N} (a_i, b_i)$ with $b_i < a_{i+1}$, then $\lambda(S) = \sum_{i=1}^{N} (b_i - a_i)$. The Lebesgue outer measure of a set $A$ is defined by
$$\lambda^*(A) = \inf\left\{ \sum_{i=1}^{\infty} (b_i - a_i) : A \subseteq \bigcup_{i=1}^{\infty} (a_i, b_i) \right\}.$$

Proposition 7.1 (Properties of the Lebesgue measure). Let $\lambda$ be the Lebesgue measure, and suppose that $A$ is a Lebesgue measurable set. Then, on top of the properties satisfied by any measure, the following hold:
(1) (translation invariance) $\lambda(A + t) = \lambda(A)$;
(2) (scaling) $\lambda(tA) = |t|\, \lambda(A)$.

Remark. Note that we need to restrict our attention to measurable sets for $\lambda$ to have all the desirable properties listed above.
Proposition 7.2. The set of Lebesgue measurable sets is a σ-algebra.
Measures can be used to comment on the probability of an event as well, as we will see in
the next definition.
Definition 7.3. We say $\mu$ is a probability measure on $X$ if $\mu$ is a measure such that $\mu(X) = 1$.
Definition 7.4. We say that something happens almost everywhere (or almost surely, or with probability 1) if the set $S$ of outcomes on which it happens satisfies $\mu(X \setminus S) = 0$.
7.1. Infinite product measures
Suppose that we are only interested in product probability measures on $\Sigma = \{1, 2, \dots, N\}^{\mathbb{N}}$. In this case, given probabilities $p_1, p_2, \dots, p_N$ (i.e., $p_1 + p_2 + \cdots + p_N = 1$ and $p_i \ge 0$), we obtain a probability measure on $\{1, 2, \dots, N\}$ given by $\mathrm{Prob}(i) = p_i$.
The probability which is “induced” on Σ is that of an independent sequence of trials
of the repeated experiment: that is, draw σi ∈ {1, 2, . . . , N } according to the probability
distribution {pi }.
Definition 7.5. The sets which generate the σ-algebra on Σ are said to be the cylinder sets.
Remark. For a general cylinder set, we are allowed to specify only finitely many outcomes; the rest must remain free. Otherwise, we would always get probability 0. The probability of any other (allowable) event is then determined by the probabilities of the cylinder sets.
Example. Let
S := {σ ∈ Σ : σ1 = 2, σ3 = 5, σ100 = 1, σ1000 = 3}
be a cylinder set. Then Prob(S) = p2 · p5 · p1 · p3 .
8. IFS with probabilities (IFSP)
Let $(X, d)$ be a complete metric space, and let $\{w_1, \dots, w_N\}$ be an IFS such that each $w_i$ has probability $p_i$ of being chosen at each stage ($p_i \ge 0$, $\sum p_i = 1$), rather than the uniform probability $1/N$. If we use the $p_i$ in the chaos game, we will get a sequence of occupation distributions on $A$ which depends on $\{p_i\}$. Interesting questions then arise: do they converge (and in what sense?), and if so, to what?
Definition 8.1. Let $(X, \mathcal{A})$ be a measure space with measure $\mu$, let $(Y, \mathcal{A}')$ be a measurable space, and let $f : X \to Y$ be a (measurable) function. In this case,
$$\{B \subseteq Y : f^{-1}(B) \in \mathcal{A}\}$$
is a σ-algebra on $Y$. The push-forward of $\mu$ is $f_\#(\mu) : \mathcal{A}' \to \mathbb{R}_+ \cup \{\infty\}$ defined by
$$f_\#(\mu)(B) := \mu(f^{-1}(B))$$
for all $B \in \mathcal{A}'$.
One example of a push-forward appears in the following operator.

Definition 8.2. Let $(X, d)$ be a complete metric space, and let $\mu$ be a (Borel) probability measure on $X$ (i.e., $\mu(X) = 1$). Then $M$ defined by
$$(M\mu)(B) := \sum_{i=1}^{N} p_i\, (w_i)_\#(\mu)(B) = \sum_{i=1}^{N} p_i\, \mu(w_i^{-1}(B))$$
is a Markov operator.

Our IFS mapping on probability measures is precisely the Markov operator $M : \mathrm{Prob} \to \mathrm{Prob}$ defined by
$$(M\mu)(B) = \sum_{i=1}^{N} p_i\, (\mu \circ w_i^{-1})(B),$$
acting on the space of probability measures on $X$. Now we need a metric of some kind to get a contraction.
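For a purely atomic measure, $M$ acts by splitting every atom through each map, which makes the operator easy to simulate. A minimal Python sketch (ours; the helper name `markov_step` is hypothetical):

```python
def markov_step(measure, maps, probs):
    """One application of M to a discrete measure {point: mass}:
    each atom (x, m) splits into atoms (w_i(x), p_i * m)."""
    out = {}
    for x, m in measure.items():
        for w, p in zip(maps, probs):
            y = w(x)
            out[y] = out.get(y, 0.0) + p * m
    return out

w0 = lambda x: x / 3
w1 = lambda x: (x + 2) / 3

mu = {0.0: 1.0}                            # delta measure at the fixed point of w0
for _ in range(3):
    mu = markov_step(mu, [w0, w1], [0.5, 0.5])
print(mu)                                  # M^3 mu: 8 atoms of mass 1/8 on level-3 intervals
```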

8.1. Monge-Kantorovich metric


Definition 8.3. Let $\mathrm{Lip}_1(X) := \{f : X \to \mathbb{R} : |f(x) - f(y)| \le d(x, y) \text{ for all } x, y\}$, and let $X$ be compact. Then the Monge-Kantorovich metric is defined by
$$d_{MK}(\mu, \nu) := \sup\left\{ \int_X f(x)\, d(\mu - \nu)(x) \ \left( = \mathbb{E}_\mu(f) - \mathbb{E}_\nu(f) \right) : f \in \mathrm{Lip}_1(X) \right\}.$$

Remark. If $X$ is not compact, then some additional technical conditions on $\mu$ and $\nu$ are required.

Since the proof that $d_{MK}$ is a metric requires techniques and results from functional analysis, we shall take for granted that $d_{MK}$ is a metric. This metric metrizes weak convergence of probability measures.
Theorem 8.1. If $\{w_1, \dots, w_N\}$ is an IFS on $X$ with maximum contraction factor $c < 1$, and $p_i \ge 0$ are the associated probabilities, then
$$d_{MK}(M\mu, M\nu) \le c\, d_{MK}(\mu, \nu).$$
Proof. For $f \in \mathrm{Lip}_1(X)$,
$$\int_X f(x)\, d(M\mu - M\nu)(x) = \int_X f(x)\, d\left[ \sum_{i=1}^{N} p_i\, \mu \circ w_i^{-1} - \sum_{i=1}^{N} p_i\, \nu \circ w_i^{-1} \right](x)$$
$$= \sum_{i=1}^{N} p_i \int_{w_i(X)} f(x)\, d(\mu \circ w_i^{-1} - \nu \circ w_i^{-1})(x) = \sum_{i=1}^{N} p_i \int_X f(w_i(y))\, d(\mu - \nu)(y)$$
$$= \int_X \underbrace{\left[ \sum_{i=1}^{N} p_i\, f \circ w_i(y) \right]}_{=:\, \hat{f}(y)} d(\mu - \nu)(y).$$
(The change of variables $x = w_i(y)$, i.e., $y = w_i^{-1}(x)$, justifies the third equality.) We see that $\hat{f}$ is Lipschitz with factor $\sum p_i c_i$. Indeed, we have
$$|\hat{f}(a) - \hat{f}(b)| = \left| \sum_i p_i f(w_i(a)) - p_i f(w_i(b)) \right| \le \sum_{i=1}^{N} p_i\, |f(w_i(a)) - f(w_i(b))| \le \sum_{i=1}^{N} p_i\, d(w_i(a), w_i(b)) \le \left[ \sum_{i=1}^{N} p_i c_i \right] d(a, b),$$
where the second inequality uses $f \in \mathrm{Lip}_1(X)$ and the third uses that each $w_i$ is a contraction with factor $c_i$. Now define
$$\hat{g} := \frac{\hat{f}}{\sum_{i=1}^{N} p_i c_i} \in \mathrm{Lip}_1(X).$$
Then, continuing the calculation, we have
$$\int_X f(x)\, d(M\mu - M\nu)(x) = \left( \sum_{i=1}^{N} p_i c_i \right) \int_X \hat{g}(y)\, d(\mu - \nu)(y) \le \left( \sum_{i=1}^{N} p_i c_i \right) d_{MK}(\mu, \nu).$$
Taking the supremum over $f \in \mathrm{Lip}_1(X)$, we conclude
$$d_{MK}(M\mu, M\nu) \le \left( \sum_{i=1}^{N} p_i c_i \right) d_{MK}(\mu, \nu) \le c\, d_{MK}(\mu, \nu). \qquad \square$$

Corollary 8.1. There is a unique invariant probability measure $\mu$ (i.e., $M\mu = \mu$).

Remark. One big advantage of the MK metric is that it relates the distance between $\mu$ and $\nu$ to the underlying distance between points of $X$. In particular, if $x, y \in X$, then $d_{MK}(\delta_x, \delta_y) = d(x, y)$, where $\delta_x$ and $\delta_y$ denote the point masses at $x$ and $y$, respectively.
9. Invariant measure, Markov operator, and the IFSP chaos game
Take an IFS $\{w_0, w_1\}$ on $X$ with probabilities $p_0$ and $p_1$. Let $x_0$ be the fixed point of $w_0$ (so $w_0(x_0) = x_0$), and let $\mu_0 = \delta_{x_0}$, i.e.,
$$\mu_0(A) = \begin{cases} 1 & (x_0 \in A) \\ 0 & (x_0 \notin A). \end{cases}$$
Recall that
$$\mu_1(A) = (M\mu_0)(A) = p_0\, \mu_0(w_0^{-1}(A)) + p_1\, \mu_0(w_1^{-1}(A)) = \left( p_0\, \delta_{w_0(x_0)} + p_1\, \delta_{w_1(x_0)} \right)(A).$$
Here $\mu_0(w_0^{-1}(A))$ is 1 when $x_0 \in w_0^{-1}(A)$, i.e., $w_0(x_0) \in A$ (and 0 otherwise); similarly, $\mu_0(w_1^{-1}(A))$ is 1 if $w_1(x_0) \in A$ and 0 otherwise. If $M$ is applied twice, then
$$M^2\mu_0 = M\mu_1 = p_0 p_0\, \delta_{w_0(w_0(x_0))} + p_0 p_1\, \delta_{w_0(w_1(x_0))} + p_1 p_0\, \delta_{w_1(w_0(x_0))} + p_1 p_1\, \delta_{w_1(w_1(x_0))}.$$
So in general, for any $n$,
$$M^n\mu_0 = \sum_{i_1, i_2, \dots, i_n = 0}^{1} p_{i_1} p_{i_2} \cdots p_{i_n}\, \delta_{w_{i_1}(w_{i_2}(\cdots(w_{i_n}(x_0))\cdots))}.$$
Now let us see what happens when we integrate a continuous $f$ against $M^n\mu_0$:
$$\int f(x)\, dM^n\mu_0(x) = \sum_{i_1, i_2, \dots, i_n = 0}^{1} p_{i_1} p_{i_2} \cdots p_{i_n}\, f(w_{i_1}(w_{i_2}(\cdots(w_{i_n}(x_0))\cdots))).$$
Then $M^n\mu_0 \to \mu$ in $d_{MK}$ (weakly), i.e.,
$$\int f(x)\, dM^n\mu_0(x) \to \int f(x)\, d\mu(x).$$

Definition 9.1. The support of $\mu$ is $\{x : \forall \varepsilon > 0,\ \mu(B_\varepsilon(x)) > 0\}$.

Theorem 9.1. The support of $\mu$ is the attractor $A$.

Remark. One way to show that $B$ is the support of $\mu$ is to prove that $B$ is invariant with respect to the IFS.

The chaos game generates one sequence of points which (somewhat randomly) wanders around "on" $A$. Does $f(x_n)$ in some way give us a way to estimate $\int f(x)\, d\mu(x)$? The ergodic theorem says
$$\frac{1}{n} \sum_{i=1}^{n} f(x_i) \to \int f(x)\, d\mu(x).$$
Theorem 9.2. Let $X$ be a compact metric space, and $\{w_i, p_i\}$ a contractive IFSP. Choose $x_0 \in X$ and generate the sequence $\{x_n\}$ by $x_{n+1} = w_{\sigma_n}(x_n)$, where $\sigma \in \{1, 2, \dots, N\}^{\mathbb{N}}$ is chosen according to the infinite product measure $P$ given by the $p_i$ on each factor. Then for any continuous $f : X \to \mathbb{R}$ and $P$-almost all $\sigma \in \{1, 2, \dots, N\}^{\mathbb{N}}$ we have
$$\lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} f(x_i) = \int_X f(x)\, d\mu(x),$$
where $\mu$ is the invariant measure of the IFSP. That is, there exists a set $\Omega \subseteq \{1, 2, \dots, N\}^{\mathbb{N}}$ with $P(\Omega) = 1$ so that for all $\sigma \in \Omega$ we have
$$\lim_{n \to \infty} \frac{1}{n} \sum_{k=1}^{n} f(w_{\sigma_k}(w_{\sigma_{k-1}}(\cdots(w_{\sigma_2}(w_{\sigma_1}(x_0)))\cdots))) = \int_X f(x)\, d\mu(x).$$

Proof (sketch). The Markov operator has a unique invariant measure $\mu$ (i.e., $M\mu = \mu$), so that $\sum_i p_i\, \mu \circ w_i^{-1} = \mu$. Hence
$$\int_X f(x)\, dM\mu(x) = \int_X f(x)\, d\mu(x),$$
i.e.,
$$\sum_i p_i \int_{w_i(X)} f(x)\, d(\mu \circ w_i^{-1})(x) = \sum_i p_i \int_X f(w_i(y))\, d\mu(y) = \int_X f(x)\, d\mu(x). \qquad \square$$

Definition 9.2. Let $\mathcal{P}(X)$ be the set of Borel probability measures on $X$, and $M : \mathcal{P}(X) \to \mathcal{P}(X)$ the Markov operator. Then the adjoint operator $M^* : C(X) \to C(X)$ is defined by
$$M^*(f) := \sum_{i=1}^{N} p_i\, f \circ w_i.$$

Suppose that
$$\lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} f(x_i)$$
exists for every $f \in C(X)$. Then the map
$$f \mapsto \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} f(x_i)$$
is a bounded linear functional on $C(X)$, so by the Riesz representation theorem it is given by a regular Borel measure $\nu$, i.e.,
$$\lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} f(x_i) = \int_X f(t)\, d\nu(t).$$

Definition 9.3. Let $\ell^\infty(\mathbb{N}) = \{\{x_n\} : \sup_n |x_n| < \infty\}$. A Banach limit $\pi : \ell^\infty(\mathbb{N}) \to \mathbb{R}$ is a bounded linear functional that is shift-invariant, i.e., $\pi(x_1, x_2, \dots) = \pi(x_2, x_3, \dots)$.

Remark. If $x_n \to x$, then $\pi(\{x_n\}) = x$. We also have $\liminf x_n \le \pi(\{x_n\}) \le \limsup x_n$. Furthermore, if $\pi(\{x_n\}) = x$ for all Banach limits $\pi$, then $\lim x_n = x$ exists.
For some fixed Banach limit $\pi$, define
$$\pi(f) := \pi\left( \left\{ \frac{1}{n} \sum_{i=1}^{n} f(x_i) \right\}_n \right),$$
where $x_i = w_{\sigma_i} \circ w_{\sigma_{i-1}} \circ \cdots \circ w_{\sigma_2} \circ w_{\sigma_1}(x_0)$. Since $f \mapsto \pi(f)$ is a bounded linear functional on $C(X)$, there is a measure $\nu_\pi$ such that
$$\pi(f) = \int_X f(t)\, d\nu_\pi(t).$$
Let $\Sigma = \{1, 2, \dots, N\}^{\mathbb{N}}$, and let $P$ be the product measure on $\Sigma$. Let $P_n$ be the projection of $P$ onto the first $n$ coordinates: for a function $\xi$ depending only on the first $n$ coordinates,
$$\int_{\sigma \in \Sigma} \xi(\sigma)\, dP_n = \sum_{i_1, \dots, i_n = 1}^{N} p_{i_1} p_{i_2} \cdots p_{i_n}\, \xi(i_1, \dots, i_n).$$

For $\sigma = \sigma_1 \sigma_2 \ldots \sigma_n$, let
$$\varphi_n(\sigma, x) := (w_{\sigma_1} \circ w_{\sigma_2} \circ \cdots \circ w_{\sigma_n})(x).$$
If $\omega$ is the address map, then $\varphi_n \to \omega$ as $n \to \infty$; furthermore, $P_n \to P$. So if $\nu$ is the starting probability measure, and $\mu$ is the invariant measure of the given IFSP (i.e., $M\mu = \mu$), then we have
$$\int_X [(M^*)^n(f)](x)\, d\nu = \int_X f(x)\, d(M^n\nu)(x) \to \int_X f(x)\, d\mu(x).$$
Note that we can rewrite $(M^*)^n(f)$ as follows:
$$(M^*)^n(f)(x) = \sum_{i_1, \dots, i_n = 1}^{N} p_{i_1} \cdots p_{i_n}\, f\big( (w_{i_1} \circ \cdots \circ w_{i_n})(x) \big) = \int_{\sigma \in \Sigma} f(\varphi_n(\sigma, x))\, dP_n(\sigma),$$
so that
$$\int_X \int_\Sigma f(\varphi_n(\sigma, x))\, dP_n(\sigma)\, d\nu(x) \to \int_X \int_\Sigma (f \circ \omega)(\sigma)\, dP(\sigma)\, d\nu(x) = \int_\Sigma (f \circ \omega)(\sigma)\, dP(\sigma).$$
For $i \in \{1, \dots, N\}$, define $s_i : \Sigma \to \Sigma$ by $(\sigma_1, \sigma_2, \dots) \mapsto (i, \sigma_1, \sigma_2, \dots)$. Then the address map intertwines the $s_i$ with the $w_i$ (i.e., $\omega \circ s_i = w_i \circ \omega$, so the corresponding diagram commutes), with $P = \sum p_i\, P \circ s_i^{-1}$ and $\mu = \sum p_i\, \mu \circ w_i^{-1}$ (the invariant measure).
Theorem 9.3. For $\mu$-almost all $x_0 \in X$ and $P$-almost all $\sigma \in \Sigma$, for any $f \in L^1(\mu)$ we have
$$\frac{1}{n} \sum_{i=1}^{n} f(x_i) \to \int_X f(x)\, d\mu(x).$$

Corollary 9.1. $\dfrac{\#\{i \le n : x_i \in A\}}{n} \to \mu(A)$.
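Theorem 9.2 justifies estimating integrals against $\mu$ by averaging along a single chaos-game orbit. A Python sketch of ours for the Cantor measure ($p_0 = p_1 = 1/2$), where by symmetry $\int x\, d\mu(x) = 1/2$:

```python
import random

w = [lambda x: x / 3, lambda x: (x + 2) / 3]
p = [0.5, 0.5]

def ergodic_average(f, n=200_000, x0=0.0):
    """Estimate ∫ f dµ by the time average (1/n) Σ f(x_i) along one IFSP orbit."""
    x, total = x0, 0.0
    for _ in range(n):
        x = random.choices(w, weights=p)[0](x)
        total += f(x)
    return total / n

print(ergodic_average(lambda x: x))   # ≈ 0.5, the first moment of the Cantor measure
```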
10. Moments of µ
Let $\mu$ be the invariant measure (i.e., $M\mu = \mu$), so that
$$\int_X f(x)\, d\mu(x) = \int_X f(x)\, d(M\mu)(x) = \int_X (M^* f)(x)\, d\mu(x) = \int_X \sum_{i=1}^{N} p_i f(w_i(x))\, d\mu(x).$$

Definition 10.1. Take $w_i : \mathbb{R} \to \mathbb{R}$ of the form $w_i(x) = s_i x + b_i$, and $f(x) = x^n$. Then the $n$th moment of $\mu$ is $g_n := \int_{\mathbb{R}} x^n\, d\mu(x)$.

We compute the $n$th moment of $\mu$:
$$\int_{\mathbb{R}} x^n\, d\mu(x) = \sum_i p_i \int_{\mathbb{R}} (s_i x + b_i)^n\, d\mu(x) = \sum_{i=1}^{N} p_i \int_{\mathbb{R}} \sum_{j=0}^{n} \binom{n}{j} s_i^j x^j b_i^{n-j}\, d\mu(x)$$
$$= \sum_{i=1}^{N} \sum_{j=0}^{n} p_i \binom{n}{j} s_i^j b_i^{n-j} \int_{\mathbb{R}} x^j\, d\mu(x) = \left[ \sum_{j=0}^{n-1} \binom{n}{j} \sum_{i=1}^{N} p_i s_i^j b_i^{n-j} \int_{\mathbb{R}} x^j\, d\mu(x) \right] + \left[ \sum_{i=1}^{N} p_i s_i^n \right] \int_{\mathbb{R}} x^n\, d\mu(x).$$
Hence,
$$\int_{\mathbb{R}} x^n\, d\mu(x) = \frac{\displaystyle \sum_{j=0}^{n-1} \binom{n}{j} \sum_{i=1}^{N} p_i s_i^j b_i^{n-j} \int_{\mathbb{R}} x^j\, d\mu(x)}{\displaystyle 1 - \sum_{i=1}^{N} p_i s_i^n},$$
provided $\sum_{i=1}^{N} p_i |s_i|^n < 1$ (automatic here, since each $|s_i| < 1$). This recursive formula for the $n$th moment starts with $g_0 = \int x^0\, d\mu(x) = 1$.
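The recursion is easy to implement, since the $n$th moment only requires the moments of lower order. A Python sketch of ours (the helper name `moments` is hypothetical), checked against the Cantor measure:

```python
from math import comb

def moments(s, b, p, n_max):
    """Moments g_n = ∫ x^n dµ for the IFSP w_i(x) = s_i x + b_i via the recursion above."""
    g = [1.0]                                          # g_0 = ∫ x^0 dµ = 1
    for n in range(1, n_max + 1):
        num = sum(comb(n, j)
                  * sum(pi * si**j * bi**(n - j) for pi, si, bi in zip(p, s, b))
                  * g[j]
                  for j in range(n))
        den = 1 - sum(pi * si**n for pi, si in zip(p, s))
        g.append(num / den)
    return g

# Cantor measure: w_0(x) = x/3, w_1(x) = (x+2)/3 with p_0 = p_1 = 1/2.
print(moments([1/3, 1/3], [0.0, 2/3], [0.5, 0.5], 3))  # [1.0, 0.5, 0.375, 0.3125]
```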
11. Construction of measures
We present two methods of constructing measures.
11.1. Method I
Definition 11.1. For a space X and a class of subsets C, a pre-measure τ : C → R ∪ {+∞}
is a function such that
(1) ∅ ∈ C and τ (∅) = 0.
(2) 0 ≤ τ (A) ≤ +∞ for all A ∈ C
Example. Let C be the set of all intervals in R, and let τ (A) be the “length” of an interval
A. τ is clearly a pre-measure.
Based on this pre-measure $\tau$, we construct (or define) $\mu$ by
$$\mu(A) = \inf\left\{ \sum_{i=1}^{\infty} \tau(B_i) : B_i \in \mathcal{C},\ A \subseteq \bigcup_{i=1}^{\infty} B_i \right\},$$
with the convention that $\inf \emptyset = +\infty$. It is relatively straightforward to verify that $\mu$ gives a σ-additive measure when restricted to the $\mu$-measurable sets.
11.2. Method II
On a metric space $(X, d)$, start with some pre-measure $\tau$. For any $\delta > 0$, define
$$\mu_\delta(A) = \inf\left\{ \sum_{i=1}^{\infty} \tau(B_i) : B_i \in \mathcal{C},\ A \subseteq \bigcup_{i=1}^{\infty} B_i,\ \mathrm{diam}(B_i) < \delta \right\},$$
with $\mathrm{diam}(B_i)$ defined below.


Definition 11.2. The diameter of a set $S$ is defined as
$$\mathrm{diam}(S) = \sup\{d(x, y) : x, y \in S\}.$$
As $\delta \to 0$, we have
$$\mu(A) = \sup_{\delta > 0} \mu_\delta(A) = \lim_{\delta \to 0} \mu_\delta(A),$$
since $\mu_\delta(A)$ increases as $\delta$ decreases to 0. The resulting measure $\mu$ is Borel. If $\mu$ is finite, then
$$\mu(A) = \sup\{\mu(C) : C \subseteq A,\ C \text{ closed}\} = \inf\{\mu(U) : A \subseteq U,\ U \text{ open}\}.$$
One can also compute $\mu(A)$ by taking the supremum over compact sets, provided $X$ satisfies additional conditions, which prompts the following definition.
Definition 11.3. A space X is said to be separable if X contains a countable, dense subset
(i.e., there exists a sequence {xn } of elements of the space such that every non-empty open
subset of the space contains at least one element of {xn }).
Proposition 11.1. If X is complete and separable, and µ is finite, then
µ(A) = sup{µ(K) : K ⊆ A, K compact},
which is Borel regular.

12. Hausdorff measures


Take $\mathbb{R}^d$ (or a general metric space), and pick $s \in [0, \infty)$. Let $\mathcal{C}$ be the collection of all open sets. We will use Method II with the pre-measure $\tau(B) = \mathrm{diam}(B)^s$. Define
$$H_\delta^s(A) := \inf\left\{ \sum_{i=1}^{\infty} \mathrm{diam}(B_i)^s : \{B_i\} \text{ a } \delta\text{-cover of } A \right\}.$$

Remark. If $s = 1$, then $\sum \mathrm{diam}(B_i)$ recovers the Lebesgue outer measure on $\mathbb{R}$.

Definition 12.1. The Hausdorff measure $H^s(A)$ of a set $A$ is
$$H^s(A) := \lim_{\delta \to 0} H_\delta^s(A).$$
Proposition 12.1 (Properties of $H^s$ on $\mathbb{R}^d$). Let $H^s$ be the Hausdorff measure. Then the following are true:
(1) (translation invariance) $H^s(A + t) = H^s(A)$;
(2) (scaling) since $\mathrm{diam}(t \cdot A)^s = |t|^s\, \mathrm{diam}(A)^s$, it follows that $H^s(tA) = |t|^s H^s(A)$;
(3) if $s$ is an integer, then $H^s$ is a constant times $s$-dimensional Lebesgue measure.

Take $s < t$, and $\mathrm{diam}(B) < \delta$. Then $\mathrm{diam}(B)^t = \mathrm{diam}(B)^{t-s}\, \mathrm{diam}(B)^s \le \delta^{t-s}\, \mathrm{diam}(B)^s$, from which we have $H_\delta^t(A) \le \delta^{t-s} H_\delta^s(A)$. How is this observation helpful? Suppose that $H^t(A) = \alpha > 0$. Since $\lim_{\delta \to 0} H_\delta^t(A) = H^t(A)$, for any small $\delta > 0$ we have
$$\frac{\alpha}{2} \le H_\delta^t(A) \le \delta^{t-s} H_\delta^s(A), \quad \text{or equivalently} \quad \delta^{s-t}\, \frac{\alpha}{2} \le H_\delta^s(A).$$
As $\delta \to 0$ we have $\delta^{s-t} \to +\infty$ since $s - t < 0$, so $H^s(A) = +\infty$. Hence if $H^t(A) > 0$ and $s < t$, then $H^s(A) = +\infty$.
On the other hand, if $H^s(A) < +\infty$ and $s < t$, then we have
$$H_\delta^t(A) \le \delta^{t-s} H_\delta^s(A) \le \delta^{t-s} \sup_{\varepsilon > 0} H_\varepsilon^s(A) < \infty,$$
and $\delta^{t-s} \to 0$ as $\delta \to 0$, so $H^t(A) = 0$.
Definition 12.2. Let $A$ be a Hausdorff measurable set. Then the Hausdorff dimension of $A$, denoted by $\dim_H(A)$, is
$$\dim_H(A) := \sup\{t : H^t(A) = +\infty\} = \inf\{t : H^t(A) = 0\}.$$

Example. Let $C$ be the middle-third Cantor set. Since $C = \frac{1}{3}C \cup \left( \frac{2}{3} + \frac{1}{3}C \right)$, we expect
$$H^s(C) = \left( \frac{1}{3} \right)^s H^s(C) + \left( \frac{1}{3} \right)^s H^s(C) = \frac{2}{3^s} H^s(C).$$
If $2/3^s > 1$, then iterating gives $(2/3^s)^n \to +\infty$; on the other hand, if $2/3^s < 1$, then $(2/3^s)^n \to 0$. The heuristics suggest that $\dim_H(C) = \log 2/\log 3$. While this is not a proof, it turns out that this is indeed the case.

Notice that if we have a $\delta$-cover $\{B_i\}$ and replace each $B_i$ with $\widehat{B_i}$ so that
$$\mathrm{diam}(B_i) \le \mathrm{diam}(\widehat{B_i}) \le k\, \mathrm{diam}(B_i)$$
for some uniform bound $k$, this changes the value we obtain for $H^s$ (but not the dimension).

Proposition 12.2 (Properties of the Hausdorff dimension). Let the $A_i$, $A$, and $B$ all be Hausdorff measurable.
(1) If $A \subseteq B$, then $\dim_H(A) \le \dim_H(B)$.
(2) ("countable stability") If $A = \bigcup_{i \ge 1} A_i$, then $\dim_H(A) = \sup_i \dim_H(A_i)$.
(3) Indeed, if $t > \sup_i \dim_H(A_i)$, then $H^t(A_i) = 0$ for all $i$, hence $H^t\left( \bigcup_{i \ge 1} A_i \right) = 0$ and thus $t \ge \dim_H(A)$.
(4) On the other hand, if $t < \sup_i \dim_H(A_i)$, then there exists $i$ so that $t < \dim_H(A_i)$. In this case, $H^t(A_i) = +\infty$, so $H^t\left( \bigcup A_i \right) = +\infty$ as well.
(5) If the $d$-dimensional Lebesgue measure of $A \subseteq \mathbb{R}^d$ is positive, then $\dim_H(A) \ge d$, since $H^d \simeq \lambda_d$ up to a constant.
(6) If $A$ is countable, then $\dim_H(A) = 0$.
Example (Computing $\dim_H(C)$). We start with the upper bound: we find one sequence of coverings which gives finite values for $\sum_i \mathrm{diam}(B_i)^s$. Take $\delta > 0$ and $n$ large enough so that $3^{-n} < \delta$. Then the $2^n$ intervals from stage $n$ of the construction form a $\delta$-cover of $C$. Thus
$$\sum_{i=1}^{2^n} \mathrm{diam}(B_i)^s = 2^n \cdot 3^{-ns} = \left( \frac{2}{3^s} \right)^n.$$
If $s = \frac{\log 2}{\log 3}$, then $(2/3^s)^n = 1$, so $H_\delta^s(C) = \inf\left\{ \sum \mathrm{diam}(B_i)^s : \{B_i\} \text{ a } \delta\text{-cover} \right\} \le 1$. Thus $H^s(C) \le 1$, so $\dim_H(C) \le \frac{\log 2}{\log 3}$.
Computing the lower bound is trickier, and needs the mass distribution principle, which is formally stated as Theorem 12.1 below. First, we need a measure $\mu$ on $C$: we use the invariant measure $\mu$ of the IFSP with $p_0 = p_1 = \frac{1}{2}$, so each $n$th-level "part" gets mass $2^{-n}$ and has length $3^{-n}$. We want to show that $\mu(U) \le c\, \mathrm{diam}(U)^s$ for all sufficiently small $U$. Take $\mathrm{diam}(U) < 1$ and let $k$ be such that $3^{-k-1} \le \mathrm{diam}(U) < 3^{-k}$. Then $U$ intersects at most one interval of level $k$. So it follows that
$$\mu(U) \le 2^{-k} = 3^{-k \frac{\log 2}{\log 3}} \le (3\, \mathrm{diam}(U))^{\frac{\log 2}{\log 3}}.$$
Thus by Theorem 12.1, we have $\dim_H C \ge \frac{\log 2}{\log 3}$ (and $H^{\log 2/\log 3}(C) \ge 3^{-\log 2/\log 3} = \frac{1}{2}$). Hence $\dim_H(C) = \frac{\log 2}{\log 3}$.
Theorem 12.1 (Mass distribution principle). Let $\mu$ be a finite, positive Borel measure on $A$, and suppose that there exist $c > 0$ and $\delta > 0$ such that for some $s$ we have
$$\mu(U) \le c \cdot \mathrm{diam}(U)^s$$
for all $U$ with $\mathrm{diam}(U) \le \delta$. Then
(1) $H^s(A) \ge \frac{\mu(A)}{c}$, and
(2) $s \le \dim_H(A)$.

Proof. If $\{B_i\}$ is a $\delta$-cover of $A$, then
$$0 < \mu(A) = \mu\left( \bigcup_{i=1}^{\infty} B_i \right) \le \sum_{i=1}^{\infty} \mu(B_i) \le c \sum_{i=1}^{\infty} \mathrm{diam}(B_i)^s.$$
Thus, for all sufficiently small $\delta$, we have $H_\delta^s(A) \ge \mu(A)/c$, so $H^s(A) \ge \mu(A)/c > 0$ as well. The second claim readily follows from the first claim. □
Suppose now that $w_0([0, 1]) \cap w_1([0, 1]) = \emptyset$, and that $w_0([0, 1])$ (resp. $w_1([0, 1])$) has length $1/3$ (resp. $1/4$). Let $s$ be the solution to $(1/3)^s + (1/4)^s = 1$, and set $p_0 = 1/3^s$ and $p_1 = 1/4^s$. Then, for instance,
$$\mathrm{diam}(w_0(w_1(w_0([0, 1])))) = \frac{1}{3} \cdot \frac{1}{4} \cdot \frac{1}{3}, \quad \text{while} \quad p_0 p_1 p_0 = \left( \frac{1}{3} \cdot \frac{1}{4} \cdot \frac{1}{3} \right)^s.$$
Finally, we want some measure $\mu$ so that $\mu(U) \sim c \cdot \mathrm{diam}(U)^s$ for some constant $c$. An argument similar to the one for the Cantor set gives $\dim_H A = s$. More generally, when the IFS consists of similarities on $\mathbb{R}$ (i.e., affine maps) with disjoint parts, the dimension $s$ satisfies $\sum_{i=1}^{N} t_i^s = 1$, where $t_i$ is the scaling factor of $w_i$.
Definition 12.3. A function $f$ is called Lipschitz if $d(f(x), f(y)) \le K\, d(x, y)$ for some $K$ and all $x, y$. $f$ is bi-Lipschitz if $f$ is invertible and both $f$ and $f^{-1}$ are Lipschitz.

If $f : X \to Y$ is Lipschitz with factor $K$, then $\mathrm{diam}(f(B))^s \le K^s\, \mathrm{diam}(B)^s$. Thus $H^s(f(B)) \le K^s H^s(B)$, so $\dim_H(f(B)) \le \dim_H(B)$. If $f$ is bi-Lipschitz, then $\dim_H(f(B)) = \dim_H(B)$.

Let $\{w_i\}$ be an IFS with contraction factors $c_i < 1$ and attractor $A$. Let $s$ satisfy $\sum c_i^s = 1$, and let $c := \max_i c_i < 1$. Given $\delta > 0$, take $n$ sufficiently large that $c^n\, \mathrm{diam}(A) < \delta$; then $\mathrm{diam}(w_{\sigma_1} \circ w_{\sigma_2} \circ \cdots \circ w_{\sigma_n}(A)) < \delta$ for every string $\sigma$ of length $n$, so the collection of all these sets is a $\delta$-cover. Writing $A_\sigma := w_{\sigma_1} \circ \cdots \circ w_{\sigma_n}(A)$ and $c_\sigma := c_{\sigma_1} \cdots c_{\sigma_n}$, we get
$$\sum_\sigma \mathrm{diam}(A_\sigma)^s \le \sum_\sigma \mathrm{diam}(A)^s (c_\sigma)^s = \mathrm{diam}(A)^s \sum_\sigma (c_\sigma)^s = \mathrm{diam}(A)^s,$$
since $\sum_\sigma (c_\sigma)^s = \left( \sum_i c_i^s \right)^n = 1$. Thus $H_\delta^s(A) \le \mathrm{diam}(A)^s$ for all sufficiently small $\delta$, so $H^s(A) \le \mathrm{diam}(A)^s$, from which $\dim_H(A) \le s$ follows.

13. Open set condition


Definition 13.1. The IFS $\{w_i\}$ satisfies the open set condition if there is a bounded, non-empty open set $U$ with
(1) $\bigcup_i w_i(U) \subseteq U$;
(2) $w_i(U) \cap w_j(U) = \emptyset$ for all $i \ne j$.

Theorem 13.1. Suppose that the open set condition holds for $\{w_1, \dots, w_N\}$ on $\mathbb{R}^d$, where the $w_i$ are similarities with scaling factors $c_i$. If $s \ge 0$ satisfies $\sum c_i^s = 1$, then
(1) $\dim_H(A) = s$;
(2) $0 < H^s(A) < \infty$.

14. Box dimensions


Take $A \subseteq \mathbb{R}^d$ and $\delta > 0$. Define
$$N_\delta(A) := \min\left\{ n \in \mathbb{N} : A \subseteq \bigcup_{i=1}^{n} B_\delta(x_i),\ x_i \in A \right\}.$$
Suppose that $D$ satisfies $N_\delta(A) \sim c \cdot \delta^{-D}$; then
$$D = \lim_{\delta \to 0} \frac{\log N_\delta(A)}{\log(\delta^{-1})}.$$

Definition 14.1. The $D$ defined above is the box dimension, written $\dim_B(A)$, provided the limit exists.
In general the limit need not exist, so we also define upper and lower box dimensions.

Definition 14.2. For any $A$, the upper box dimension $\overline{\dim}_B(A)$ and lower box dimension $\underline{\dim}_B(A)$ are defined by
$$\overline{\dim}_B(A) = \limsup_{\delta \to 0} \frac{\log N_\delta(A)}{\log(\delta^{-1})}, \qquad \underline{\dim}_B(A) = \liminf_{\delta \to 0} \frac{\log N_\delta(A)}{\log(\delta^{-1})}.$$
Example. For a line segment of length $L$ (call it $L$, abusing notation), we have
$$N_\delta(L) \le \frac{L}{2\delta} + 1,$$
so
$$\frac{\log N_\delta(L)}{\log(\delta^{-1})} \sim \frac{\log(\delta^{-1})}{\log(\delta^{-1})} = 1.$$
Example (Cantor set). For $\delta = 3^{-n}$, we need $N_\delta(C) = 2^n$, so
$$\frac{\log N_\delta(C)}{\log(\delta^{-1})} = \frac{n \log 2}{n \log 3} = \frac{\log 2}{\log 3}.$$
If $3^{-n-1} < \delta \le 3^{-n}$, we have $N_\delta(C) \le 2^{n+1}$, and so
$$\frac{\log N_\delta(C)}{\log(\delta^{-1})} \le \frac{(n + 1) \log 2}{n \log 3} \to \frac{\log 2}{\log 3}.$$
Example. Take $x_n = n^{-p}$ with $p > 0$ for $n = 1, 2, \dots$, and let $A = \{n^{-p} : n \in \mathbb{N}\} \cup \{0\}$. Take $\delta > 0$. If $m$ is the first index with $x_m - x_{m+1} < 2\delta$, then the points to the right of $x_m$ are separated by gaps larger than $2\delta$, while $[0, x_m]$ can be covered by about $x_m/(2\delta)$ balls; hence
$$N_\delta(A) \le \frac{x_m}{2\delta} + m + 2.$$
With $f(x) = x^{-p}$ we have $-f'(\xi) = p/\xi^{p+1}$; setting $-f'(\xi) = 2\delta$ for some $m < \xi < m + 1$ and using this to estimate $m$ and $x_m$, we get
$$N_\delta(A) \le p^{-\frac{p}{p+1}} (2\delta)^{-\frac{1}{p+1}} + p^{\frac{1}{p+1}} (2\delta)^{-\frac{1}{p+1}} + 2 \sim c\, \delta^{-\frac{1}{p+1}}.$$
So
$$\frac{\log N_\delta(A)}{\log(\delta^{-1})} \sim \frac{(p+1)^{-1} \log(\delta^{-1})}{\log(\delta^{-1})} = \frac{1}{p+1}.$$
So the box dimension of $A$ is $(p + 1)^{-1}$.
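Box dimensions are also the easiest to estimate numerically: count occupied boxes at several scales and regress $\log N_\delta$ against $\log(1/\delta)$. A Python sketch of ours (not from the notes) for the Cantor set, where the expected slope is $\log 2/\log 3 \approx 0.631$:

```python
import numpy as np

def box_dimension(points, deltas):
    """Estimate dim_B as the slope of log N_delta versus log(1/delta),
    counting occupied boxes [k*delta, (k+1)*delta) for points in [0, 1)."""
    counts = [len({int(x / d) for x in points}) for d in deltas]
    return np.polyfit(np.log(1 / np.array(deltas)), np.log(counts), 1)[0]

# Sample the Cantor set: random ternary expansions avoiding the digit 1.
rng = np.random.default_rng(0)
digits = np.where(rng.random((50_000, 25)) < 0.5, 0, 2)
pts = (digits / 3.0 ** np.arange(1, 26)).sum(axis=1)

print(box_dimension(pts, [3.0 ** -k for k in range(2, 9)]))  # ≈ 0.6309
```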
Proposition 14.1. Let $A$ and $B$ be sets.
(1) If $A$ is unbounded, then $N_\delta(A) = +\infty$, so $\dim_B(A) = +\infty$.
(2) $A \subseteq B$ implies $N_\delta(A) \le N_\delta(B)$, so $\dim_B(A) \le \dim_B(B)$.
(3) $\overline{\dim}_B(A \cup B) = \max\{\overline{\dim}_B(A), \overline{\dim}_B(B)\}$.
(4) However, the analogous claim does not hold for $\underline{\dim}_B$: one can find $A$ and $B$ so that $\underline{\dim}_B(A \cup B) > \max\{\underline{\dim}_B(A), \underline{\dim}_B(B)\}$.
(5) If $f : X \to X$ is Lipschitz with factor $K$, then $N_{K\delta}(f(A)) \le N_\delta(A)$. Hence $\dim_B f(A) \le \dim_B(A)$.
(6) $\dim_H(A) \le \dim_B(A)$. Indeed, if $A$ is covered by $N_{\delta/2}(A)$ balls of radius $\delta/2$, then
$$H_\delta^s(A) \le \delta^s N_{\delta/2}(A).$$
If $1 \le H^s(A) \le \delta^s N_{\delta/2}(A)$ for small $\delta$, then
$$s \le \frac{\log N_{\delta/2}(A)}{\log(2\delta^{-1}) - \log 2},$$
so $\dim_H(A) \le \dim_B(A)$.
Theorem 14.1. If {w1 , . . . , wN } is an IFS of similarities on Rd which satisfies the open set
condition, then dimB (A) = dimH (A) = s.

Department of Mathematics and Statistics, Dalhousie University, 6316 Coburg Rd, Hal-
ifax, NS, Canada B3H 4R2
E-mail address: [email protected]

