Elias M. Stein, Rami Shakarchi - Real Analysis - Measure Theory, Integration, and Hilbert Spaces. Vol.3.-Princeton University Press (2005)
II Complex Analysis
III
REAL ANALYSIS
Measure Theory, Integration, and
Hilbert Spaces
Elias M. Stein
&
Rami Shakarchi
The publisher would like to acknowledge the authors of this volume for
providing the camera-ready copy from which this book was printed
press.princeton.edu
To my grandchildren
Carolyn, Alison, Jason
E.M.S.
To my parents
Mohamed & Mireille
and my brother
Karim
R.S.
Foreword
We wish also to record a note of special thanks for the following in-
dividuals: Charles Fefferman, who taught the first week (successfully
launching the whole project!); Paul Hagelstein, who in addition to read-
ing part of the manuscript taught several weeks of one of the courses, and
has since taken over the teaching of the second round of the series; and
Daniel Levine, who gave valuable help in proof-reading. Last but not
least, our thanks go to Gerree Pecht, for her consummate skill in type-
setting and for the time and energy she spent in the preparation of all
aspects of the lectures, such as transparencies, notes, and the manuscript.
We are also happy to acknowledge our indebtedness for the support
we received from the 250th Anniversary Fund of Princeton University,
and the National Science Foundation’s VIGRE program.
Elias M. Stein
Rami Shakarchi
November 2004
Contents
Foreword
Introduction
1 Fourier series: completion
2 Limits of continuous functions
3 Length of curves
4 Differentiation and integration
5 The problem of measure
5 Exercises
6 Problems
Bibliography
Index
Introduction
The earlier view that the relevant functions in analysis were given by
formulas or other “analytic” expressions, that these functions were by
their nature continuous (or nearly so), that by necessity such functions
had derivatives for most points, and moreover these were integrable by
the accepted methods of integration − all of these ideas began to give
way under the weight of various examples and problems that arose in
the subject, which could not be ignored and required new concepts to
be understood. Parallel with these developments came new insights that
were at once both more geometric and more abstract: a clearer under-
standing of the nature of curves, their rectifiability and their extent; also
the beginnings of the theory of sets, starting with subsets of the line, the
plane, etc., and the “measure” that could be assigned to each.
That is not to say that there was not considerable resistance to the
change of point-of-view that these advances required. Paradoxically,
some of the leading mathematicians of the time, those who should have
been best able to appreciate the new departures, were among the ones
who were most skeptical. That the new ideas ultimately won out can
be understood in terms of the many questions that could now be ad-
dressed. We shall describe here, somewhat imprecisely, several of the
most significant such problems.
(i) What are the putative “functions” f that arise when we complete
R? In other words: given an arbitrary sequence $\{a_n\} \in \ell^2(\mathbb{Z})$, what
is the nature of the (presumed) function f corresponding to these
coefficients?
(ii) How do we integrate such functions f (and in particular verify (1))?
3 Length of curves
The study of curves in the plane and the calculation of their lengths
are among the first issues dealt with when one learns calculus. Suppose
we consider a continuous curve Γ in the plane, given parametrically by
Γ = {(x(t), y(t))}, a ≤ t ≤ b, with x and y continuous functions of t. We
define the length of Γ in the usual way: as the supremum of the lengths
of all polygonal lines joining successively finitely many points of Γ, taken
in order of increasing t. We say that Γ is rectifiable if its length L is
finite. When x(t) and y(t) are continuously differentiable we have the
well-known formula,
\[ L = \int_a^b \big( (x'(t))^2 + (y'(t))^2 \big)^{1/2} \, dt. \tag{2} \]
3 The limit f can be highly discontinuous. See, for instance, Exercise 10 in Chapter 1.
There are further issues that arise. Rectifiable curves, because they
are endowed with length, are genuinely one-dimensional in nature. Are
there (non-rectifiable) curves that are two-dimensional? We shall see
that, indeed, there are continuous curves in the plane that fill a square,
or more generally have any dimension between 1 and 2, if the notion of
fractional dimension is appropriately defined.
\[ \frac{d}{dx} \int_0^x f(y) \, dy = f(x). \tag{4} \]
For the first assertion, the existence of continuous functions F that are
nowhere differentiable, or for which $F'(x)$ exists for every x, but $F'$ is
not integrable, leads to the problem of finding a general class of the F for
which (3) is valid. As for (4), the question is to formulate properly and
establish this assertion for the general class of integrable functions f that
arise in the solution of the first two problems considered above. These
questions can be answered with the help of certain “covering” arguments,
and the notion of absolute continuity.
4 There is no such measure on the class of all subsets, since there exist non-measurable
sets. See the construction of such a set at the end of Section 3, Chapter 1.
1 Measure Theory
The sets whose measure we can define by virtue of the
preceding ideas we will call measurable sets; we do
this without intending to imply that it is not possible
to assign a measure to other sets.
E. Borel, 1898
1 Preliminaries
We begin by discussing some elementary concepts which are basic to the
theory developed below.
The main idea in calculating the “volume” or “measure” of a subset
of $\mathbb{R}^d$ consists of approximating this set by unions of other sets whose
geometry is simple and whose volumes are known. It is convenient to
speak of “volume” when referring to sets in $\mathbb{R}^d$; but in reality it means
“area” in the case d = 2 and “length” in the case d = 1. In the approach
given here we shall use rectangles and cubes as the main building blocks
of the theory: in $\mathbb{R}$ we use intervals, while in $\mathbb{R}^d$ we take products of
intervals. In all dimensions rectangles are easy to manipulate and have
a standard notion of volume that is given by taking the product of the
lengths of all sides.
\[ x = (x_1, x_2, \ldots, x_d), \quad x_i \in \mathbb{R}, \text{ for } i = 1, \ldots, d. \]
\[ E^c = \{x \in \mathbb{R}^d : x \notin E\}. \]
\[ E - F = \{x \in \mathbb{R}^d : x \in E \text{ and } x \notin F\}. \]
is open. A similar statement holds for the class of closed sets, if one
interchanges the roles of unions and intersections.
A set E is bounded if it is contained in some ball of finite radius.
A bounded set is compact if it is also closed. Compact sets enjoy the
Heine-Borel covering property:
• Assume E is compact, $E \subset \bigcup_\alpha O_\alpha$, and each $O_\alpha$ is open. Then
there are finitely many of the open sets, $O_{\alpha_1}, O_{\alpha_2}, \ldots, O_{\alpha_N}$, such
that $E \subset \bigcup_{j=1}^{N} O_{\alpha_j}$.
Figure 1. Rectangles in $\mathbb{R}^d$, d = 1, 2, 3
is defined to be
\[ |R| = \sum_{k=1}^{N} |R_k|. \]
\[ R = \bigcup_{j=1}^{M} \tilde{R}_j \quad \text{and} \quad R_k = \bigcup_{j \in J_k} \tilde{R}_j, \quad \text{for } k = 1, \ldots, N \]
For the rectangle R, for example, we see that $|R| = \sum_{j=1}^{M} |\tilde{R}_j|$, since
the grid actually partitions the sides of R and each $\tilde{R}_j$ consists of taking
products of the intervals in these partitions. Thus when adding the
volumes of the $\tilde{R}_j$ we are summing the corresponding products of lengths
of the intervals that arise. Since this also holds for the other rectangles
$R_1, \ldots, R_N$, we conclude that
\[ |R| = \sum_{j=1}^{M} |\tilde{R}_j| = \sum_{k=1}^{N} \sum_{j \in J_k} |\tilde{R}_j| = \sum_{k=1}^{N} |R_k|. \]
Lemma 1.2 If $R, R_1, \ldots, R_N$ are rectangles, and $R \subset \bigcup_{k=1}^{N} R_k$, then
\[ |R| \le \sum_{k=1}^{N} |R_k|. \]
The main idea consists of taking the grid formed by extending all sides
of the rectangles $R, R_1, \ldots, R_N$, and noting that the sets corresponding
to the $J_k$ (in the above proof) need not be disjoint any more.
We now proceed to give a description of the structure of open sets in
terms of cubes. We begin with the case of R.
Proof. For each $x \in O$, let $I_x$ denote the largest open interval containing
x and contained in O. More precisely, since O is open, x is contained
in some small (non-trivial) interval, and therefore if
\[ a_x = \inf\{a < x : (a, x) \subset O\} \quad \text{and} \quad b_x = \sup\{b > x : (x, b) \subset O\} \]
we must have $a_x < x < b_x$ (with possibly infinite values for $a_x$ and $b_x$).
If we now let $I_x = (a_x, b_x)$, then by construction we have $x \in I_x$ as well
as $I_x \subset O$. Hence
\[ O = \bigcup_{x \in O} I_x. \]
Now suppose that two intervals $I_x$ and $I_y$ intersect. Then their union
(which is also an open interval) is contained in O and contains x. Since
$I_x$ is maximal, we must have $(I_x \cup I_y) \subset I_x$, and similarly $(I_x \cup I_y) \subset I_y$.
This can happen only if $I_x = I_y$; therefore, any two distinct intervals in
the collection $\mathcal{I} = \{I_x\}_{x \in O}$ must be disjoint. The proof will be complete
once we have shown that there are only countably many distinct intervals
in the collection $\mathcal{I}$. This, however, is easy to see, since every open interval
$I_x$ contains a rational number. Since different intervals are disjoint, they
must contain distinct rationals, and therefore $\mathcal{I}$ is countable, as desired.
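The maximality argument in the proof can be illustrated in the simplest case, where O is a finite union of open intervals. The following is a minimal sketch under that assumption; the function name `components` and the representation of an interval as a pair `(a, b)` are illustrative choices, not notation from the text.

```python
def components(intervals):
    """Merge finitely many open intervals (a, b) into disjoint maximal ones.

    Two open intervals belong to the same component only if they overlap;
    (0, 1) and (1, 2) stay separate because the point 1 is in neither.
    """
    merged = []
    # Sort by left end-point, dropping empty intervals.
    for a, b in sorted(i for i in intervals if i[0] < i[1]):
        if merged and a < merged[-1][1]:
            # Overlaps the last component: extend it to the right.
            merged[-1] = (merged[-1][0], max(merged[-1][1], b))
        else:
            merged.append((a, b))
    return merged
```

For instance, `components([(0, 2), (1, 3), (3, 4), (5, 6)])` returns three disjoint intervals, and summing their lengths gives the value one would expect for the measure of the union.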
Naturally, if O is open and $O = \bigcup_{j=1}^{\infty} I_j$, where the $I_j$’s are disjoint
open intervals, the measure of O ought to be $\sum_{j=1}^{\infty} |I_j|$. Since this
representation is unique, we could take this as a definition of measure; we
would then note that whenever $O_1$ and $O_2$ are open and disjoint, the
measure of their union is the sum of their measures. Although this provides
\[ C_0 \supset C_1 \supset C_2 \supset \cdots \supset C_k \supset C_{k+1} \supset \cdots. \]
[Figure: the stages $C_0$, $C_1$, $C_2$, $C_3$ in the construction of the Cantor set]
The set C is not empty, since all end-points of the intervals in $C_k$ (all k)
belong to C.
Despite its simple construction, the Cantor set enjoys many interesting
topological and analytical properties. For instance, C is closed and
bounded, hence compact. Also, C is totally disconnected: given any
$x, y \in C$ there exists $z \notin C$ that lies between x and y. Finally, C is
perfect: it has no isolated points (Exercise 1).
Next, we turn our attention to the question of determining the “size”
of C. This is a delicate problem, one that may be approached from
different angles depending on the notion of size we adopt. For instance,
in terms of cardinality the Cantor set is rather large: it is not countable.
Since it can be mapped to the interval [0, 1], the Cantor set has the
cardinality of the continuum (Exercise 2).
However, from the point of view of “length” the size of C is small.
Roughly speaking, the Cantor set has length zero, and this follows from
the following intuitive argument: the set C is covered by sets $C_k$ whose
lengths go to zero. Indeed, $C_k$ is a disjoint union of $2^k$ intervals of length
$3^{-k}$, making the total length of $C_k$ equal to $(2/3)^k$. But $C \subset C_k$ for all
k, and $(2/3)^k \to 0$ as k tends to infinity. We shall define a notion of
measure and make this argument precise in the next section.
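The count of intervals and the total length $(2/3)^k$ can be checked by carrying out the construction with exact rational arithmetic. This is a small sketch, not part of the text; the function names `cantor_stage` and `total_length` are illustrative.

```python
from fractions import Fraction


def cantor_stage(k):
    """Return the closed intervals making up C_k as (left, right) pairs."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(k):
        next_intervals = []
        for a, b in intervals:
            third = (b - a) / 3
            # Keep the two outer closed thirds; drop the open middle third.
            next_intervals.append((a, a + third))
            next_intervals.append((b - third, b))
        intervals = next_intervals
    return intervals


def total_length(intervals):
    """Sum of the lengths of a list of disjoint intervals."""
    return sum(b - a for a, b in intervals)
```

Calling `total_length(cantor_stage(k))` for increasing k produces the sequence 1, 2/3, 4/9, 8/27, and so on, which decreases to 0.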
1 Some authors use the term outer measure instead of exterior measure.
yields the same exterior measure is quite direct. (See Exercise 15.) The
equivalence with the latter is more subtle. (See Exercise 26 in Chapter 3.)
We begin our investigation of this new notion by providing examples
of sets whose exterior measures can be calculated, and we check that
the latter matches our intuitive idea of volume (length in one dimension,
area in two dimensions, etc.)
Example 1. The exterior measure of a point is zero. This is clear once
we observe that a point is a cube with volume zero, and which covers
itself. Of course the exterior measure of the empty set is also zero.
\[ |Q| \le \sum_{j=1}^{\infty} |Q_j|. \tag{2} \]
\[ |Q| \le (1 + \epsilon) \sum_{j=1}^{N} |Q_j| \le (1 + \epsilon) \sum_{j=1}^{\infty} |Q_j|. \]
Since $\epsilon$ is arbitrary, we find that the inequality (2) holds; thus $|Q| \le m_*(Q)$, as desired.
Moreover, there are $O(k^{d-1})$ cubes (see footnote 2) in $\mathcal{Q}'$, and these cubes have volume
$k^{-d}$, so that $\sum_{Q \in \mathcal{Q}'} |Q| = O(1/k)$. Hence
\[ \sum_{Q \in (\mathcal{Q} \cup \mathcal{Q}')} |Q| \le |R| + O(1/k), \]
Example 6. The Cantor set C has exterior measure 0. From the construction
of C, we know that $C \subset C_k$, where each $C_k$ is a disjoint union
of $2^k$ closed intervals, each of length $3^{-k}$. Consequently, $m_*(C) \le (2/3)^k$
for all k, hence $m_*(C) = 0$.
2 We remind the reader of the notation f(x) = O(g(x)), which means that $|f(x)| \le C|g(x)|$
for some constant C and all x in a given range. In this particular example, there
are fewer than $Ck^{d-1}$ cubes in question, as $k \to \infty$.
• For every $\epsilon > 0$, there exists a covering $E \subset \bigcup_{j=1}^{\infty} Q_j$ with
\[ \sum_{j=1}^{\infty} m_*(Q_j) \le m_*(E) + \epsilon. \]
First, we may assume that each $m_*(E_j) < \infty$, for otherwise the inequality
clearly holds. For any $\epsilon > 0$, the definition of the exterior measure
yields for each j a covering $E_j \subset \bigcup_{k=1}^{\infty} Q_{k,j}$ by closed cubes with
\[ \sum_{k=1}^{\infty} |Q_{k,j}| \le m_*(E_j) + \frac{\epsilon}{2^j}. \]
Then, $E \subset \bigcup_{j,k=1}^{\infty} Q_{k,j}$ is a covering of E by closed cubes, and therefore
\begin{align*}
m_*(E) \le \sum_{j,k} |Q_{k,j}| &= \sum_{j=1}^{\infty} \sum_{k=1}^{\infty} |Q_{k,j}| \\
&\le \sum_{j=1}^{\infty} \Big( m_*(E_j) + \frac{\epsilon}{2^j} \Big) \\
&= \sum_{j=1}^{\infty} m_*(E_j) + \epsilon.
\end{align*}
Since this holds true for every $\epsilon > 0$, the second observation is proved.
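The device of giving the j-th set an allowance of $\epsilon/2^j$, so that all the allowances together stay below $\epsilon$, can be seen at work on a finite piece of a countable set. The sketch below covers the rationals in [0, 1] with denominator at most `max_q` by intervals whose total length is less than `eps`; the helper names and the particular enumeration are assumptions made for illustration only.

```python
from fractions import Fraction


def rationals_in_unit_interval(max_q):
    """Enumerate the rationals p/q in [0, 1] with q <= max_q, without repeats."""
    seen, out = set(), []
    for q in range(1, max_q + 1):
        for p in range(q + 1):
            r = Fraction(p, q)
            if r not in seen:
                seen.add(r)
                out.append(r)
    return out


def cover(points, eps):
    """Cover the j-th point (j = 1, 2, ...) by an interval of length eps / 2**j."""
    return [(x - eps / 2 ** (j + 1), x + eps / 2 ** (j + 1))
            for j, x in enumerate(points, start=1)]
```

With n points the total length of the cover is `eps * (1 - 2**-n)`, which remains below `eps` however many points are enumerated; this is the same bookkeeping that produces the single $\epsilon$ in the last line of the display above.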
\[ \sum_{j=1}^{\infty} |Q_j| \le m_*(E) + \frac{\epsilon}{2}. \]
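\begin{align*}
m_*(O) \le \sum_{j=1}^{\infty} m_*(Q_j^0) &= \sum_{j=1}^{\infty} |Q_j^0| \\
&\le \sum_{j=1}^{\infty} \Big( |Q_j| + \frac{\epsilon}{2^{j+1}} \Big) \\
&\le \sum_{j=1}^{\infty} |Q_j| + \frac{\epsilon}{2} \\
&\le m_*(E) + \epsilon.
\end{align*}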
\[ E_1 \subset \bigcup_{j \in J_1} Q_j \quad \text{as well as} \quad E_2 \subset \bigcup_{j \in J_2} Q_j. \]
Therefore,
\begin{align*}
m_*(E_1) + m_*(E_2) &\le \sum_{j \in J_1} |Q_j| + \sum_{j \in J_2} |Q_j| \\
&\le \sum_{j=1}^{\infty} |Q_j| \\
&\le m_*(E) + \epsilon.
\end{align*}
Observation 5 If a set E is the countable union of almost disjoint cubes
$E = \bigcup_{j=1}^{\infty} Q_j$, then
\[ m_*(E) = \sum_{j=1}^{\infty} |Q_j|. \]
Let $\tilde{Q}_j$ denote a cube strictly contained in $Q_j$ such that $|Q_j| \le |\tilde{Q}_j| + \epsilon/2^j$,
where $\epsilon$ is arbitrary but fixed. Then, for every N, the cubes
$\tilde{Q}_1, \tilde{Q}_2, \ldots, \tilde{Q}_N$ are disjoint, hence at a finite distance from one another,
and repeated applications of Observation 4 imply
\[ m_*\Big( \bigcup_{j=1}^{N} \tilde{Q}_j \Big) = \sum_{j=1}^{N} |\tilde{Q}_j| \ge \sum_{j=1}^{N} \big( |Q_j| - \epsilon/2^j \big). \]
Since $\bigcup_{j=1}^{N} \tilde{Q}_j \subset E$, we conclude that for every integer N,
\[ m_*(E) \ge \sum_{j=1}^{N} |Q_j| - \epsilon. \]
In the limit as N tends to infinity we deduce $\sum_{j=1}^{\infty} |Q_j| \le m_*(E) + \epsilon$
for every $\epsilon > 0$, hence $\sum_{j=1}^{\infty} |Q_j| \le m_*(E)$. Therefore, combined with
Observation 2, our result proves that we have equality.
This last property shows that if a set can be decomposed into almost
disjoint cubes, its exterior measure equals the sum of the volumes of the
cubes. In particular, by Theorem 1.4 we see that the exterior measure of
an open set equals the sum of the volumes of the cubes in a decomposi-
tion, and this coincides with our initial guess. Moreover, this also yields
a proof that the sum is independent of the decomposition.
One can see from this that the volumes of simple sets that are cal-
culated by elementary calculus agree with their exterior measure. This
assertion can be proved most easily once we have developed the requisite
tools in integration theory. (See Chapter 2.) In particular, we can then
verify that the exterior measure of a ball (either open or closed) equals
its volume.
Despite Observations 4 and 5, one cannot conclude in general that if
$E_1 \cup E_2$ is a disjoint union of subsets of $\mathbb{R}^d$, then
\[ m_*(E_1 \cup E_2) = m_*(E_1) + m_*(E_2). \tag{3} \]
In fact (3) holds when the sets in question are not highly irregular or
“pathological” but are measurable in the sense described below.
\[ m_*(O - E) \le \epsilon. \]
\[ m(E) = m_*(E). \]
Clearly, the Lebesgue measure inherits all the features contained in Ob-
servations 1 - 5 of the exterior measure.
Immediately from the definition, we find:
First, we observe that it suffices to prove that compact sets are measurable.
Indeed, any closed set F can be written as the union of compact
sets, say $F = \bigcup_{k=1}^{\infty} F \cap B_k$, where $B_k$ denotes the closed ball of radius k
centered at the origin; then Property 3 applies.
So, suppose F is compact (so that in particular $m_*(F) < \infty$), and let
$\epsilon > 0$. By Observation 3 we can select an open set O with $F \subset O$ and
$m_*(O) \le m_*(F) + \epsilon$. Since F is closed, the difference $O - F$ is open,
and by Theorem 1.4 we may write this difference as a countable union
of almost disjoint cubes
\[ O - F = \bigcup_{j=1}^{\infty} Q_j. \]
For a fixed N, the finite union $K = \bigcup_{j=1}^{N} Q_j$ is compact; therefore
$d(K, F) > 0$ (we isolate this little fact in a lemma below). Since $(K \cup F) \subset O$,
Observations 1, 4, and 5 of the exterior measure imply
\begin{align*}
m_*(O) &\ge m_*(F) + m_*(K) \\
&= m_*(F) + \sum_{j=1}^{N} m_*(Q_j).
\end{align*}
Hence $\sum_{j=1}^{N} m_*(Q_j) \le m_*(O) - m_*(F) \le \epsilon$, and this also holds in the
limit as N tends to infinity. Invoking the sub-additivity property of the
exterior measure finally yields
\[ m_*(O - F) \le \sum_{j=1}^{\infty} m_*(Q_j) \le \epsilon, \]
as desired.
We digress briefly to complete the above argument by proving the
following.
Lemma 3.1 If F is closed, K is compact, and these sets are disjoint,
then d(F, K) > 0.
Proof. Since F is closed, for each point $x \in K$, there exists $\delta_x > 0$ so
that $d(x, F) > 3\delta_x$. Since $\bigcup_{x \in K} B_{2\delta_x}(x)$ covers K, and K is compact, we
may find a subcover, which we denote by $\bigcup_{j=1}^{N} B_{2\delta_j}(x_j)$. If we let
$\delta = \min(\delta_1, \ldots, \delta_N)$, then we must have $d(K, F) \ge \delta > 0$. Indeed, if $x \in K$
and $y \in F$, then for some j we have $|x_j - x| \le 2\delta_j$, and by construction
$|y - x_j| \ge 3\delta_j$. Therefore
\[ |y - x| \ge |y - x_j| - |x_j - x| \ge 3\delta_j - 2\delta_j \ge \delta. \]
\[ (E^c - S) \subset (O_n - E), \]
Theorem 3.2 If $E_1, E_2, \ldots$, are disjoint measurable sets, and $E = \bigcup_{j=1}^{\infty} E_j$, then
\[ m(E) = \sum_{j=1}^{\infty} m(E_j). \]
\[ m(E) \ge \sum_{j=1}^{N} m(F_j) \ge \sum_{j=1}^{N} m(E_j) - \epsilon. \]
The union above is disjoint and every $E_{j,k}$ is bounded. Moreover
$E_j = \bigcup_{k=1}^{\infty} E_{j,k}$, and this union is also disjoint. Putting these facts together,
as claimed.
With this, the countable additivity of the Lebesgue measure on mea-
surable sets has been established. This result provides the necessary
connection between the following:
• our primitive notion of volume given by the exterior measure,
• the more refined idea of measurable sets, and
• the countably infinite operations allowed on these sets.
We make two definitions to state succinctly some further consequences.
If $E_1, E_2, \ldots$ is a countable collection of subsets of $\mathbb{R}^d$ that increases
to E in the sense that $E_k \subset E_{k+1}$ for all k, and $E = \bigcup_{k=1}^{\infty} E_k$, then we
write $E_k \nearrow E$.
Similarly, if $E_1, E_2, \ldots$ decreases to E in the sense that $E_k \supset E_{k+1}$ for
all k, and $E = \bigcap_{k=1}^{\infty} E_k$, we write $E_k \searrow E$.
\[ m(E) = \sum_{k=1}^{\infty} m(G_k) = \lim_{N \to \infty} \sum_{k=1}^{N} m(G_k) = \lim_{N \to \infty} m\Big( \bigcup_{k=1}^{N} G_k \Big), \]
and since $\bigcup_{k=1}^{N} G_k = E_N$ we get the desired limit.
For the second part, we may clearly assume that $m(E_1) < \infty$. Let
$G_k = E_k - E_{k+1}$ for each k, so that
\[ E_1 = E \cup \bigcup_{k=1}^{\infty} G_k \]
\begin{align*}
m(E_1) &= m(E) + \lim_{N \to \infty} \sum_{k=1}^{N-1} \big( m(E_k) - m(E_{k+1}) \big) \\
&= m(E) + m(E_1) - \lim_{N \to \infty} m(E_N).
\end{align*}
Hence, since $m(E_1) < \infty$, we see that $m(E) = \lim_{N \to \infty} m(E_N)$, and the
proof is complete.
The reader should note that the second conclusion may fail without
the assumption that $m(E_k) < \infty$ for some k. This is shown by the simple
example when $E_n = (n, \infty) \subset \mathbb{R}$, for all n.
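Written out, the example reads as follows. The sets decrease to the empty set, yet each has infinite measure, so the measure of the limit set is not the limit of the measures:

```latex
\[
E_n = (n, \infty), \qquad E_1 \supset E_2 \supset \cdots, \qquad
E = \bigcap_{n=1}^{\infty} E_n = \emptyset ,
\]
\[
m(E_n) = \infty \ \text{ for every } n,
\qquad \text{but} \qquad
m(E) = 0 \neq \lim_{n \to \infty} m(E_n) = \infty .
\]
```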
What follows provides an important geometric and analytic insight
into the nature of measurable sets, in terms of their relation to open and
closed sets. Its thrust is that, in effect, an arbitrary measurable set can
be well approximated by the open sets that contain it, and alternatively,
by the closed sets it contains.
\[ m(E \triangle F) \le \epsilon. \]
The notation $E \triangle F$ stands for the symmetric difference between the
sets E and F, defined by $E \triangle F = (E - F) \cup (F - E)$, which consists of
those points that belong to only one of the two sets E or F.
Proof. Part (i) is just the definition of measurability. For the second
part, we know that $E^c$ is measurable, so there exists an open set O with
$E^c \subset O$ and $m(O - E^c) \le \epsilon$. If we let $F = O^c$, then F is closed, $F \subset E$,
and $E - F = O - E^c$. Hence $m(E - F) \le \epsilon$ as desired.
For (iii), we first pick a closed set F so that $F \subset E$ and $m(E - F) \le \epsilon/2$.
For each n, we let $B_n$ denote the ball centered at the origin of radius
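\[ E \subset \bigcup_{j=1}^{\infty} Q_j \quad \text{and} \quad \sum_{j=1}^{\infty} |Q_j| \le m(E) + \epsilon/2. \]
Since $m(E) < \infty$, the series converges and there exists N > 0 such that
$\sum_{j=N+1}^{\infty} |Q_j| < \epsilon/2$. If $F = \bigcup_{j=1}^{N} Q_j$, then
$\le \epsilon$.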
3 The terminology $G_\delta$ comes from German “Gebiete” and “Durchschnitt”; $F_\sigma$ comes from
French “fermé” and “somme.”
To see why these sets are disjoint, suppose that the intersection
$N_k \cap N_{k'}$ is non-empty. Then there exist rationals $r_k \ne r_{k'}$ and $\alpha$ and
$\beta$ with $x_\alpha + r_k = x_\beta + r_{k'}$; hence
\[ x_\alpha - x_\beta = r_{k'} - r_k. \]
This is the desired contradiction, since neither m(N ) = 0 nor m(N ) > 0
is possible.
Axiom of choice
That the construction of the set N is possible is based on the following
general proposition.
5 It can be proved that in an appropriate formulation of the axioms of set theory, the
axiom of choice is independent of the other axioms; thus we are free to accept its validity.
4 Measurable functions
With the notion of measurable sets in hand, we now turn our attention
to the objects that lie at the heart of integration theory: measurable
functions.
The starting point is the notion of a characteristic function of a set
E, which is defined by
\[ \chi_E(x) = \begin{cases} 1 & \text{if } x \in E, \\ 0 & \text{if } x \notin E. \end{cases} \]
The next step is to pass to the functions that are the building blocks of
integration theory. For the Riemann integral it is in effect the class of
step functions, with each given as a finite sum
\[ f = \sum_{k=1}^{N} a_k \chi_{R_k}, \tag{5} \]
\[ f = \sum_{k=1}^{N} a_k \chi_{E_k} \tag{6} \]
−∞ ≤ f (x) ≤ ∞.
\[ \sup_n f_n(x), \quad \inf_n f_n(x), \quad \limsup_{n \to \infty} f_n(x) \quad \text{and} \quad \liminf_{n \to \infty} f_n(x) \]
are measurable.
Proving that $\sup_n f_n$ is measurable requires noting that $\{\sup_n f_n > a\} = \bigcup_n \{f_n > a\}$.
This also yields the result for $\inf_n f_n(x)$, since this quantity
equals $-\sup_n(-f_n(x))$.
The result for the limsup and liminf also follows from the two observations
\[ \limsup_{n \to \infty} f_n(x) = \inf_k \{ \sup_{n \ge k} f_n \} \quad \text{and} \quad \liminf_{n \to \infty} f_n(x) = \sup_k \{ \inf_{n \ge k} f_n \}. \]
Property 4 If $\{f_n\}_{n=1}^{\infty}$ is a collection of measurable functions, and
\[ \lim_{n \to \infty} f_n(x) = f(x), \]
then f is measurable.
Since $f(x) = \limsup_{n \to \infty} f_n(x) = \liminf_{n \to \infty} f_n(x)$, this property is a
consequence of property 3.
For (i) we simply note that if k is odd, then $\{f^k > a\} = \{f > a^{1/k}\}$, and
if k is even and $a \ge 0$, then $\{f^k > a\} = \{f > a^{1/k}\} \cup \{f < -a^{1/k}\}$.
For (ii), we first see that f + g is measurable because
\[ \{f + g > a\} = \bigcup_{r \in \mathbb{Q}} \{f > a - r\} \cap \{g > r\}, \]
We shall say that two functions f and g defined on a set E are equal
almost everywhere, and write
then f is measurable.
Note that if f and g are defined almost everywhere on a measurable
subset $E \subset \mathbb{R}^d$, then the functions f + g and f g can only be defined on
the intersection of the domains of f and g. Since the union of two sets of
measure zero has again measure zero, f + g is defined almost everywhere
on E. We summarize this discussion as follows.
In this light, Property 5 (ii) also holds when f and g are finite-valued
almost everywhere.
\[ |\varphi_k(x)| \le |\varphi_{k+1}(x)| \quad \text{and} \quad \lim_{k \to \infty} \varphi_k(x) = f(x), \text{ for all } x. \]
we see that $\varphi_k(x)$ converges to f(x) for all x. Finally, the sequence $\{|\varphi_k|\}$
is increasing because the definition of $f^+$, $f^-$ and the properties of $\varphi_k^{(1)}$
and $\varphi_k^{(2)}$ imply that
\[ |\varphi_k(x)| = \varphi_k^{(1)}(x) + \varphi_k^{(2)}(x). \]
\[ f(x) = \sum_{j=1}^{M} \chi_{R_j}(x), \]
\[ E_k = \{x : f(x) \ne \psi_k(x)\}, \]
then $m(E_k) \le 2^{-k}$. If we let $F_K = \bigcup_{j=K+1}^{\infty} E_j$ and $F = \bigcap_{K=1}^{\infty} F_K$, then
$m(F) = 0$ since $m(F_K) \le 2^{-K}$, and $\psi_k(x) \to f(x)$ for all x in the
complement of F, which is the desired result.
\[ E_k^n = \{x \in E : |f_j(x) - f(x)| < 1/n, \text{ for all } j > k\}. \]
Next, if $\delta > 0$, we choose $n \ge N$ such that $1/n < \delta$, and note that
$x \in \tilde{A}_\epsilon$ implies $x \in E_{k_n}^n$. We see therefore that $|f_j(x) - f(x)| < \delta$ whenever
$j > k_n$. Hence $f_k$ converges uniformly to f on $\tilde{A}_\epsilon$.
Finally, using Theorem 3.4 choose a closed subset $A_\epsilon \subset \tilde{A}_\epsilon$ with
$m(\tilde{A}_\epsilon - A_\epsilon) < \epsilon/2$. As a result, we have $m(E - A_\epsilon) < \epsilon$ and the theorem is
proved.
The next theorem attests to the validity of the second of Littlewood’s
principles.
\[ F_\epsilon \subset E, \quad \text{and} \quad m(E - F_\epsilon) \le \epsilon \]
for N so large that $\sum_{n \ge N} 1/2^n < \epsilon/3$. Now for every $n \ge N$ the function
$f_n$ is continuous on $F'$; thus f (being the uniform limit of $\{f_n\}$) is also
continuous on $F'$. To finish the proof, we merely need to approximate
the set $F'$ by a closed set $F_\epsilon \subset F'$ such that $m(F' - F_\epsilon) < \epsilon/3$.
difficulty does not occur when, for example, A and B are closed sets, or
when one of them is open. (See Exercise 19.)
With the above considerations in mind we can state the main result.
Theorem 5.1 Suppose A and B are measurable sets in $\mathbb{R}^d$ and their
sum A + B is also measurable. Then the inequality (8) holds.
Let us first check (8) when A and B are rectangles with side lengths
$\{a_j\}_{j=1}^{d}$ and $\{b_j\}_{j=1}^{d}$, respectively. Then (8) becomes
\[ \Big( \prod_{j=1}^{d} (a_j + b_j) \Big)^{1/d} \ge \Big( \prod_{j=1}^{d} a_j \Big)^{1/d} + \Big( \prod_{j=1}^{d} b_j \Big)^{1/d}, \tag{9} \]
\[ \frac{1}{d} \sum_{j=1}^{d} x_j \ge \Big( \prod_{j=1}^{d} x_j \Big)^{1/d}, \quad \text{for all } x_j \ge 0: \]
which gives the desired inequality (8) when A and B are both finite
unions of rectangles with disjoint interiors.
Next, this quickly implies the result when A and B are open sets of
finite measure. Indeed, by Theorem 1.4, for any $\epsilon > 0$ we can find unions
of almost disjoint rectangles $A_\epsilon$ and $B_\epsilon$, such that $A_\epsilon \subset A$, $B_\epsilon \subset B$, with
$m(A) \le m(A_\epsilon) + \epsilon$ and $m(B) \le m(B_\epsilon) + \epsilon$. Since $A + B \supset A_\epsilon + B_\epsilon$, the
inequality (8) for $A_\epsilon$ and $B_\epsilon$ and a passage to a limit gives the desired
result. From this, we can pass to the case where A and B are arbitrary
compact sets, by noting first that A + B is then compact, and that if
we define $A^\epsilon = \{x : d(x, A) < \epsilon\}$, then $A^\epsilon$ are open, and $A^\epsilon \searrow A$ as $\epsilon \to 0$.
With similar definitions for $B^\epsilon$ and $(A + B)^\epsilon$, we observe also that
$A + B \subset A^\epsilon + B^\epsilon \subset (A + B)^{2\epsilon}$. Hence, letting $\epsilon \to 0$, we see that (8) for
$A^\epsilon$ and $B^\epsilon$ implies the desired result for A and B. The general case,
in which we assume that A, B, and A + B are measurable, then follows
by approximating A and B from inside by compact sets, as in (iii) of
Theorem 3.4.
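The rectangle case (9), on which the rest of the argument is built, can be checked numerically for given side lengths. This is only a sanity check under floating-point arithmetic, not a proof; the function name `bm_sides` is illustrative.

```python
import math


def bm_sides(a, b):
    """Return both sides of inequality (9) for side lengths a = (a_j), b = (b_j)."""
    if len(a) != len(b) or not a:
        raise ValueError("a and b must be non-empty and of equal length")
    d = len(a)
    lhs = math.prod(x + y for x, y in zip(a, b)) ** (1 / d)
    rhs = math.prod(a) ** (1 / d) + math.prod(b) ** (1 / d)
    return lhs, rhs
```

When the side lengths are proportional, as in `bm_sides([1, 2, 3], [2, 4, 6])`, the two sides agree up to rounding; for `bm_sides([1, 1], [4, 9])` the left side is strictly larger.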
6 Exercises
1. Prove that the Cantor set C constructed in the text is totally disconnected and
perfect. In other words, given two distinct points $x, y \in C$, there is a point $z \notin C$
that lies between x and y, and yet C has no isolated points.
[Hint: If $x, y \in C$ and $|x - y| > 1/3^k$, then x and y belong to two different intervals
in $C_k$. Also, given any $x \in C$ there is an end-point $y_k$ of some interval in $C_k$ that
satisfies $x \ne y_k$ and $|x - y_k| \le 1/3^k$.]
Note that this decomposition is not unique since, for example, $1/3 = \sum_{k=2}^{\infty} 2/3^k$.
Prove that $x \in C$ if and only if x has a representation as above where every
$a_k$ is either 0 or 2.
(b) The Cantor-Lebesgue function is defined on C by
\[ F(x) = \sum_{k=1}^{\infty} \frac{b_k}{2^k} \quad \text{if } x = \sum_{k=1}^{\infty} a_k 3^{-k}, \text{ where } b_k = a_k/2. \]
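The formula in part (b) can be evaluated exactly for points of C that have a finite ternary expansion with digits 0 and 2. The sketch below assumes the digits $a_1, \ldots, a_n$ are supplied as a list; the function names are illustrative and not from the text.

```python
from fractions import Fraction


def ternary_value(digits):
    """x = sum_k a_k 3^{-k} for a finite list of ternary digits a_1, ..., a_n."""
    return sum(Fraction(a, 3 ** k) for k, a in enumerate(digits, start=1))


def cantor_lebesgue(digits):
    """F(x) = sum_k b_k 2^{-k} with b_k = a_k / 2, for digits a_k in {0, 2}."""
    if any(a not in (0, 2) for a in digits):
        raise ValueError("every digit must be 0 or 2")
    return sum(Fraction(a // 2, 2 ** k) for k, a in enumerate(digits, start=1))
```

For instance, the digit list `[2]` represents x = 2/3 and gives F(x) = 1/2, while `[0, 2]` represents x = 2/9 and gives F(x) = 1/4.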
3. Cantor sets of constant dissection. Consider the unit interval [0, 1], and
let ξ be a fixed real number with 0 < ξ < 1 (the case ξ = 1/3 corresponds to the
Cantor set C in the text).
In stage 1 of the construction, remove the centrally situated open interval in
[0, 1] of length ξ. In stage 2, remove two central intervals each of relative length ξ,
one in each of the remaining intervals after stage 1, and so on.
Let Cξ denote the set which remains after applying the above procedure indefi-
nitely.6
(a) Prove that the complement of $C_\xi$ in [0, 1] is the union of open intervals of
total length equal to 1.
(b) Show directly that $m_*(C_\xi) = 0$.
[Hint: After the kth stage, show that the remaining set has total length $= (1 - \xi)^k$.]
4. Cantor-like sets. Construct a closed set $\hat{C}$ so that at the kth stage of the
construction one removes $2^{k-1}$ centrally situated open intervals each of length $\ell_k$,
with
\[ \ell_1 + 2\ell_2 + \cdots + 2^{k-1}\ell_k < 1. \]
Show:
(a) If E is compact, then $m(E) = \lim_{n \to \infty} m(O_n)$.
(b) However, the conclusion in (a) may be false for E closed and unbounded; or
E open and bounded.
\[ m(\delta E) = \delta_1 \cdots \delta_d \, m(E). \]
\[ |L(x) - L(x')| \le M |x - x'| \]
for some M, we can see that L maps any cube of side length $\ell$ into a
cube of side length $c_d M \ell$, with $c_d = 2\sqrt{d}$. Now if m(E) = 0, there is a
collection of cubes $\{Q_j\}$ such that $E \subset \bigcup_j Q_j$, and $\sum_j m(Q_j) < \epsilon$. Thus
$m_*(L(E)) \le c'\epsilon$, and hence m(L(E)) = 0. Finally, use Corollary 3.5.
One can show that m(L(E)) = | det L| m(E); see Problem 4 in the next chapter.
9. Give an example of an open set O with the following property: the boundary
of the closure of O has positive Lebesgue measure.
[Hint: Consider the set obtained by taking the union of open intervals which are
deleted at the odd steps in the construction of a Cantor-like set.]
(The proof of this fact, which is given in the Appendix of Book I, is outlined in
Problem 4.) Since f is discontinuous on a set of positive measure, we find that f
is not Riemann integrable.
11. Let A be the subset of [0, 1] which consists of all numbers which do not have
the digit 4 appearing in their decimal expansion. Find m(A).
12. Theorem 1.3 states that every open set in R is the disjoint union of open
intervals. The analogue in $\mathbb{R}^d$, d ≥ 2, is generally false. Prove the following:
(b) An open connected set Ω is the disjoint union of open rectangles if and only
if Ω is itself an open rectangle.
14. The purpose of this exercise is to show that covering by a finite number of
intervals will not suffice in the definition of the outer measure $m_*$.
The outer Jordan content $J_*(E)$ of a set E in $\mathbb{R}$ is defined by
\[ J_*(E) = \inf \sum_{j=1}^{N} |I_j|, \]
where the inf is taken over every finite covering $E \subset \bigcup_{j=1}^{N} I_j$, by intervals $I_j$.
(a) Prove that $J_*(E) = J_*(\overline{E})$ for every set E (here $\overline{E}$ denotes the closure of
E).
(b) Exhibit a countable subset $E \subset [0, 1]$ such that $J_*(E) = 1$ while $m_*(E) = 0$.
15. At the start of the theory, one might define the outer measure by taking
coverings by rectangles instead of cubes. More precisely, we define
\[ m_*^{\mathcal{R}}(E) = \inf \sum_{j=1}^{\infty} |R_j|, \]
where the inf is now taken over all countable coverings $E \subset \bigcup_{j=1}^{\infty} R_j$ by (closed)
rectangles.
Show that this approach gives rise to the same theory of measure developed in
the text, by proving that $m_*(E) = m_*^{\mathcal{R}}(E)$ for every subset E of $\mathbb{R}^d$.
Let
17. Let $\{f_n\}$ be a sequence of measurable functions on [0, 1] with $|f_n(x)| < \infty$ for
a.e. x. Show that there exists a sequence $c_n$ of positive real numbers such that
\[ \frac{f_n(x)}{c_n} \to 0 \quad \text{a.e. } x \]
[Hint: Pick $c_n$ such that $m(\{x : |f_n(x)/c_n| > 1/n\}) < 2^{-n}$, and apply the Borel-
Cantelli lemma.]
18. Prove the following assertion: Every measurable function is the limit a.e. of a
sequence of continuous functions.
(c) Show, however, that A + B might not be closed even though A and B are
closed.
20. Show that there exist closed sets A and B with m(A) = m(B) = 0, but
m(A + B) > 0:
(a) In $\mathbb{R}$, let A = C (the Cantor set), B = C/2. Note that A + B ⊃ [0, 1].
(b) In $\mathbb{R}^2$, observe that if A = I × {0} and B = {0} × I (where I = [0, 1]), then
A + B = I × I.
21. Prove that there is a continuous function that maps a Lebesgue measurable
set to a non-measurable set.
[Hint: Consider a non-measurable subset of [0, 1], and its inverse image in C by the
function F in Exercise 2.]
22. Let $\chi_{[0,1]}$ be the characteristic function of [0, 1]. Show that there is no
everywhere
where continuous function f on R such that
in R is non-empty?
[Hint: Find an enumeration where the only rationals outside of a fixed bounded
interval take the form $r_n$, with $n = m^2$ for some integer m.]
27. Suppose $E_1$ and $E_2$ are a pair of compact sets in $\mathbb{R}^d$ with $E_1 \subset E_2$, and let
a = m(E1 ) and b = m(E2 ). Prove that for any c with a < c < b, there is a compact
set E with E1 ⊂ E ⊂ E2 and m(E) = c.
[Hint: As an example, if d = 1 and E is a measurable subset of [0, 1], consider
m(E ∩ [0, t]) as a function of t.]
28. Let E be a subset of $\mathbb{R}$ with $m_*(E) > 0$. Prove that for each 0 < α < 1, there
exists an open interval I so that
\[ m_*(E \cap I) \ge \alpha \, m_*(I). \]
Loosely speaking, this estimate shows that E contains almost a whole interval.
[Hint: Choose an open set O that contains E, and such that $m_*(E) \ge \alpha \, m_*(O)$.
Write O as the countable union of disjoint open intervals, and show that one of
these intervals must satisfy the desired property.]
29. Suppose E is a measurable subset of R with m(E) > 0. Prove that the
difference set of E, which is defined by
30. If E and F are measurable, and m(E) > 0, m(F ) > 0, prove that
E + F = {x + y : x ∈ E, y ∈ F }
contains an interval.
32. Let N denote the non-measurable subset of I = [0, 1] constructed at the end
of Section 3.
33. Let N denote the non-measurable set constructed in the text. Recall from the
exercise above that measurable subsets of N have measure zero.
Show that the set N c = I − N satisfies m∗ (N c ) = 1, and conclude that if E1 =
N and E2 = N c , then
34. Let C1 and C2 be any two Cantor sets (constructed in Exercise 3). Show that
there exists a function F : [0, 1] → [0, 1] with the following properties:
(i) F is continuous and bijective,
(a) Construct a measurable set E ⊂ [0, 1] such that for any non-empty open
sub-interval I in [0, 1], both sets E ∩ I and E c ∩ I have positive measure.
(b) Show that $f = \chi_E$ has the property that whenever g(x) = f(x) a.e. x, then
g must be discontinuous at every point in [0, 1].
[Hint: For the first part, consider a Cantor-like set of positive measure, and add in
each of the intervals that are omitted in the first step of its construction, another
Cantor-like set. Continue this procedure indefinitely.]
\[ (10) \qquad \frac{x_1 + \cdots + x_d}{d} \ge (x_1 \cdots x_d)^{1/d} \quad \text{for all } x_j \ge 0,\ j = 1, \ldots, d \]
by using backward induction as follows:
(b) If (10) holds for some integer $d \ge 2$, then it must hold for $d - 1$, that is,
one has $(y_1 + \cdots + y_{d-1})/(d-1) \ge (y_1 \cdots y_{d-1})^{1/(d-1)}$ for all $y_j \ge 0$, with
$j = 1, \ldots, d - 1$.
[Hint: For (a), if $k \ge 2$, write $(x_1 + \cdots + x_{2^k})/2^k$ as $(A + B)/2$, where
$A = (x_1 + \cdots + x_{2^{k-1}})/2^{k-1}$, and apply the inequality when d = 2. For (b), apply the
inequality to $x_1 = y_1, \ldots, x_{d-1} = y_{d-1}$ and $x_d = (y_1 + \cdots + y_{d-1})/(d-1)$.]
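The substitution suggested in the hint for (b) can be sketched as follows. The symbol M is introduced here only for illustration and denotes the average of $y_1, \ldots, y_{d-1}$; this is one possible way to organize the computation, not necessarily the intended write-up.

```latex
M = \frac{y_1 + \cdots + y_{d-1}}{d-1}, \qquad
\frac{y_1 + \cdots + y_{d-1} + M}{d} = \frac{(d-1)M + M}{d} = M .
% Apply (10) with x_j = y_j for j < d and x_d = M:
M \ge \left( y_1 \cdots y_{d-1}\, M \right)^{1/d}
\;\Longrightarrow\; M^{d} \ge y_1 \cdots y_{d-1}\, M
\;\Longrightarrow\; M^{d-1} \ge y_1 \cdots y_{d-1} \quad (\text{if } M > 0).
```

Taking (d − 1)-th roots gives the inequality for d − 1. If M = 0, then every $y_j$ vanishes and both sides are zero.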
7 Problems
1. Given an irrational x, one can show (using the pigeon-hole principle, for
example) that there exist infinitely many fractions p/q, with relatively prime
integers p and q such that
\[ \left| x - \frac{p}{q} \right| \le \frac{1}{q^2}. \]
However, prove that the set of those $x \in \mathbb{R}$ such that there exist infinitely many
fractions p/q, with relatively prime integers p and q such that
\[ \left| x - \frac{p}{q} \right| \le \frac{1}{q^3} \quad (\text{or } \le 1/q^{2+\epsilon}), \]
2. Any open set Ω can be written as the union of closed cubes, so that $\Omega = \bigcup Q_j$
with the following properties
(i) The $Q_j$'s have disjoint interiors.
(ii) $d(Q_j, \Omega^c) \approx$ side length of $Q_j$. This means that there are positive constants
c and C so that $c \le d(Q_j, \Omega^c)/\ell(Q_j) \le C$, where $\ell(Q_j)$ denotes the side
length of $Q_j$.
3. Find an example of a measurable subset C of [0, 1] such that m(C) = 0, yet the
difference set of C contains a non-trivial interval centered at the origin. Compare
with the result in Exercise 29.
[Hint: Pick the Cantor set $C = \mathcal{C}$. For a fixed $a \in [-1, 1]$, consider the line y =
x + a in the plane, and copy the construction of the Cantor set, but in the cube
Q = [0, 1] × [0, 1]. First, remove all but four closed cubes of side length 1/3, one at
each corner of Q; then, repeat this procedure in each of the remaining cubes (see
Figure 6). The resulting set is sometimes called a Cantor dust. Use the property
of nested compact sets to show that the line intersects this Cantor dust.]
(a) For every $\epsilon > 0$, the set of points c in J such that $\operatorname{osc}(f, c) \ge \epsilon$ is compact.
E = E1 ∪ E2 , E1 ∩ E2 = ∅.
6.∗ The fact that the axiom of choice and the well-ordering principle are equivalent
is a consequence of the following considerations.
One begins by defining a partial ordering on a set E to be a binary relation ≤
on the set E that satisfies:
(i) x ≤ x for all x ∈ E.
(ii) If x ≤ y and y ≤ x, then x = y.
(iii) If x ≤ y and y ≤ z, then x ≤ z.
If in addition x ≤ y or y ≤ x whenever x, y ∈ E, then ≤ is a linear ordering of E.
The axiom of choice and the well-ordering principle are then logically equivalent
to the Hausdorff maximal principle:
Every non-empty partially ordered set has a (non-empty) maximal
linearly ordered subset.
In other words, if E is partially ordered by ≤, then E contains a non-empty subset
F which is linearly ordered by ≤ and such that if F is contained in a set G also
linearly ordered by ≤, then F = G.
An application of the Hausdorff maximal principle to the collection of all well-
orderings of subsets of E implies the well-ordering principle for E. However, the
proof that the axiom of choice implies the Hausdorff maximal principle is more
complicated.
8.∗ Suppose A and B are open sets of finite positive measure. Then we have
equality in the Brunn-Minkowski inequality (8) if and only if A and B are convex
and similar, that is, there are a δ > 0 and an h ∈ Rd such that
A = δB + h.
2 Integration Theory
...amongst the many definitions that have been succes-
sively proposed for the integral of real-valued functions
of a real variable, I have retained only those which, in
my opinion, are indispensable to understand the trans-
formations undergone by the problem of integration,
and to capture the relationship between the notion of
area, so simple in appearance, and certain more com-
plicated analytical definitions of the integral.
One might ask if there is sufficient interest to oc-
cupy oneself with such complications, and if it is not
better to restrict oneself to the study of functions that
necessitate only simple definitions.... As we shall see
in this course, we would then have to renounce the
possibility of resolving many problems posed long ago,
and which have simple statements. It is to solve these
problems, and not for love of complications, that I
have introduced in this book a definition of the inte-
gral more general than that of Riemann.
H. Lebesgue, 1903
1. Simple functions
2. Bounded functions supported on a set of finite measure
3. Non-negative functions
We emphasize from the outset that all functions are assumed to be
measurable. At the beginning we also consider only finite-valued functions
which take on real values. Later we shall also consider extended-valued
functions, and also complex-valued functions.
where the Ek are measurable sets of finite measure and the ak are con-
stants. A complication that arises from this definition is that a simple
function can be written in a multitude of ways as such finite linear com-
binations; for example, 0 = χE − χE for any measurable set E of finite
measure. Fortunately, there is an unambiguous choice for the represen-
tation of a simple function, which is natural and useful in applications.
The canonical form of ϕ is the unique decomposition as in (1), where
the numbers ak are distinct and non-zero, and the sets Ek are disjoint.
Finding the canonical form of ϕ is straightforward: since ϕ can take
only finitely many distinct and non-zero values, say c1 , . . . , cM , we may
set $F_k = \{x : \varphi(x) = c_k\}$, and note that the sets $F_k$ are disjoint.
Therefore $\varphi = \sum_{k=1}^{M} c_k \chi_{F_k}$ is the desired canonical form of ϕ.
If ϕ is a simple function with canonical form $\varphi(x) = \sum_{k=1}^{M} c_k \chi_{F_k}(x)$,
then we define the Lebesgue integral of ϕ by
\[ \int_{\mathbb{R}^d} \varphi(x)\,dx = \sum_{k=1}^{M} c_k\, m(F_k). \]
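To make the definition concrete, here is a minimal Python sketch for simple functions on the real line whose sets $E_k$ are half-open intervals. The representation as triples `(a_k, left_k, right_k)` and the helper names `refine`, `canonical_form` and `integral` are illustrative assumptions, not notation from the text. The sketch computes the canonical form and compares the resulting integral with the sum of $a_k\, m(E_k)$ taken over a non-canonical decomposition.

```python
from collections import defaultdict
from fractions import Fraction


def refine(terms):
    """Split the line at every endpoint so that phi is constant on each piece.

    `terms` is a list of (a_k, left_k, right_k), read as a_k * chi_[left_k, right_k).
    Returns (value, left, right) triples on disjoint half-open intervals.
    """
    cuts = sorted({p for _, l, r in terms for p in (l, r)})
    pieces = []
    for left, right in zip(cuts, cuts[1:]):
        # Add the coefficients of all intervals E_k that contain this piece.
        value = sum(a for a, l, r in terms if l <= left and right <= r)
        pieces.append((value, left, right))
    return pieces


def canonical_form(terms):
    """Map each distinct non-zero value c_k to the measure m(F_k) of its level set."""
    measure = defaultdict(Fraction)
    for value, left, right in refine(terms):
        if value != 0:
            measure[value] += right - left
    return dict(measure)


def integral(terms):
    """Lebesgue integral of phi, computed from the canonical form."""
    return sum(c * m for c, m in canonical_form(terms).items())


# phi = 2*chi_[0,2) + 3*chi_[1,3) - 5*chi_[1,2): overlapping sets, not canonical.
phi = [(Fraction(2), Fraction(0), Fraction(2)),
       (Fraction(3), Fraction(1), Fraction(3)),
       (Fraction(-5), Fraction(1), Fraction(2))]

naive = sum(a * (r - l) for a, l, r in phi)  # sum of a_k * m(E_k)
print(canonical_form(phi), integral(phi), naive)
```

In this example ϕ equals 2 on [0, 1), 0 on [1, 2) and 3 on [2, 3), so the canonical form has two terms, and both ways of computing the integral agree.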
Proof. The only conclusion that is a little tricky is the first, which
asserts that the integral of a simple function can be calculated by us-
ing any of its decompositions as a linear combination of characteristic
functions.
Suppose that $\varphi = \sum_{k=1}^{N} a_k \chi_{E_k}$, where we assume that the sets $E_k$ are
disjoint, but we do not suppose that the numbers $a_k$ are distinct and
non-zero. For each distinct non-zero value a among the $\{a_k\}$ we define
$E'_a = \bigcup E_k$, where the union is taken over those indices k such that $a_k = a$.
Note then that the sets $E'_a$ are disjoint, and $m(E'_a) = \sum m(E_k)$, where
the sum is taken over the same set of k's. Then clearly $\varphi = \sum a \chi_{E'_a}$,
where the sum is over the distinct non-zero values of $\{a_k\}$. Thus
\[ \int \varphi = \sum a\, m(E'_a) = \sum_{k=1}^{N} a_k\, m(E_k). \]
Next, suppose $\varphi = \sum_{k=1}^{N} a_k \chi_{E_k}$, where we no longer assume that the $E_k$
are disjoint. Then we can "refine" the decomposition $\bigcup_{k=1}^{N} E_k$ by finding
sets $E_1^*, E_2^*, \ldots, E_n^*$ with the property that $\bigcup_{k=1}^{N} E_k = \bigcup_{j=1}^{n} E_j^*$; the
sets $E_j^*$ (j = 1, . . . , n) are mutually disjoint; and for each k, $E_k = \bigcup E_j^*$,
where the union is taken over those $E_j^*$ that are contained in $E_k$. (A proof
of this elementary fact can be found in Exercise 1.) For each j, let now
$a_j^* = \sum a_k$, with the summation taken over all k such that $E_k$ contains
$E_j^*$. Then clearly $\varphi = \sum_{j=1}^{n} a_j^* \chi_{E_j^*}$. However, this is a decomposition
already dealt with above because the $E_j^*$ are disjoint. Thus
\[ \int \varphi = \sum_j a_j^*\, m(E_j^*) = \sum_j \sum_{E_k \supset E_j^*} a_k\, m(E_j^*) = \sum_k a_k\, m(E_k), \]
The key lemma that follows allows us to define the integral for the class
of bounded functions supported on sets of finite measure.
By the uniform convergence, one has, for all $x \in A_\epsilon$ and all large n and
m, the estimate $|\varphi_n(x) - \varphi_m(x)| < \epsilon$, so we deduce that
Since ε is arbitrary and m(E) < ∞, this proves that $\{I_n\}$ is a Cauchy
sequence and hence converges, as desired.
For the second part, we note that if f = 0, we may repeat the argument
above to find that $|I_n| \le m(E)\epsilon + M\epsilon$, which yields $\lim_{n\to\infty} I_n = 0$, as
was to be shown.
Using Lemma 1.2 we can now turn to the integration of bounded func-
tions that are supported on sets of finite measure. For such a function f
we define its Lebesgue integral by
\[ \int f(x)\,dx = \lim_{n\to\infty} \int \varphi_n(x)\,dx, \]
Consequently,
\[ \int f_n \to \int f \quad \text{as } n \to \infty. \]
for all large n. Since ε is arbitrary, the proof of the theorem is complete.
by monotonicity of the integral. Thus $m(E_k) = 0$ for all k, and since
$\{x : f(x) > 0\} = \bigcup_{k=1}^{\infty} E_k$, we see that f = 0 almost everywhere.
where the integral on the left-hand side is the standard Riemann integral,
and that on the right-hand side is the Lebesgue integral.
Proof. By definition, a Riemann integrable function is bounded, say
|f (x)| ≤ M , so we need to prove that f is measurable, and then establish
the equality of integrals.
Again, by definition of Riemann integrability,1 we may construct two
sequences of step functions {ϕk } and {ψk } that satisfy the following
properties: |ϕk (x)| ≤ M and |ψk (x)| ≤ M for all x ∈ [a, b] and k ≥ 1,
ϕ1 (x) ≤ ϕ2 (x) ≤ · · · ≤ f ≤ · · · ≤ ψ2 (x) ≤ ψ1 (x),
and
\[ (2) \qquad \lim_{k\to\infty} \int_{[a,b]}^{\mathcal{R}} \varphi_k(x)\,dx = \lim_{k\to\infty} \int_{[a,b]}^{\mathcal{R}} \psi_k(x)\,dx = \int_{[a,b]}^{\mathcal{R}} f(x)\,dx. \]
and
\[ \lim_{k\to\infty} \int_{[a,b]}^{\mathcal{L}} \psi_k(x)\,dx = \int_{[a,b]}^{\mathcal{L}} \tilde{\psi}(x)\,dx. \]
where the supremum is taken over all measurable functions g such that
0 ≤ g ≤ f , and where g is bounded and supported on a set of finite
measure.
With the above definition of the integral, there are only two possible
cases; the supremum is either finite, or infinite. In the first case, when
$\int f(x)\,dx < \infty$, we shall say that f is Lebesgue integrable or simply
integrable.
Clearly, if E is any measurable subset of Rd , and f ≥ 0, then f χE is
also positive, and we define
\[ \int_E f(x)\,dx = \int f(x)\chi_E(x)\,dx. \]
\[ F_a(x) = \frac{1}{1 + |x|^a}, \quad \text{all } x \in \mathbb{R}^d. \]
η1 ≤ f and η2 ≤ g.
and shows that we must change our formulation of the question to obtain
a positive convergence result.
Let
\[ f_n(x) = \begin{cases} n & \text{if } 0 < x < 1/n, \\ 0 & \text{otherwise.} \end{cases} \]
Then $f_n(x) \to 0$ for all x, yet $\int f_n(x)\,dx = 1$ for all n. In this particular
example, the limit of the integrals is greater than the integral of the limit
function. This turns out to be the case in general, as we shall see now.
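The behaviour of this sequence can be checked directly. The following Python sketch uses exact rational arithmetic; the helper names `f` and `integral_f` and the sample point x = 1/10 are illustrative choices, and the integral is computed from the elementary formula height times length rather than by any general integration routine.

```python
from fractions import Fraction


def f(n, x):
    """f_n(x) = n on the interval (0, 1/n), and 0 elsewhere."""
    return n if 0 < x < Fraction(1, n) else 0


def integral_f(n):
    """Integral of f_n: height n times the length 1/n of (0, 1/n)."""
    return n * Fraction(1, n)


x = Fraction(1, 10)
values_at_x = [f(n, x) for n in range(1, 16)]   # f_1(x), ..., f_15(x)
integrals = [integral_f(n) for n in range(1, 16)]
print(values_at_x)
print(integrals)
```

At the fixed point x = 1/10 the values vanish as soon as 1/n ≤ x, that is, for n ≥ 10, while every integral equals 1.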
Lemma 1.7 (Fatou) Suppose $\{f_n\}$ is a sequence of measurable functions
with $f_n \ge 0$. If $\lim_{n\to\infty} f_n(x) = f(x)$ for a.e. x, then
\[ \int f \le \liminf_{n\to\infty} \int f_n. \]
Proof. Since $f_n(x) \le f(x)$ a.e. x, we necessarily have $\int f_n \le \int f$ for
all n; hence
\[ \limsup_{n\to\infty} \int f_n \le \int f. \]
This inequality combined with Fatou’s lemma proves the desired limit.
In particular, we can now obtain a basic convergence theorem for the
class of non-negative measurable functions. Its statement requires the
following notation.
In analogy with the symbols ↗ and ↘ used to describe increasing and
decreasing sequences of sets, we shall write
\[ f_n \nearrow f \]
whenever $\{f_n\}_{n=1}^{\infty}$ is a sequence of measurable functions that satisfies
\[ f_n(x) \le f_{n+1}(x) \ \text{a.e. } x, \ \text{all } n \ge 1 \quad \text{and} \quad \lim_{n\to\infty} f_n(x) = f(x) \ \text{a.e. } x. \]
\[ f_n(x) \ge f_{n+1}(x) \ \text{a.e. } x, \ \text{all } n \ge 1 \quad \text{and} \quad \lim_{n\to\infty} f_n(x) = f(x) \ \text{a.e. } x. \]
If $\sum_{k=1}^{\infty} \int a_k(x)\,dx$ is finite, then the series $\sum_{k=1}^{\infty} a_k(x)$ converges for
a.e. x.
Proof. Let $f_n(x) = \sum_{k=1}^{n} a_k(x)$ and $f(x) = \sum_{k=1}^{\infty} a_k(x)$. The functions
$f_n$ are measurable, $f_n(x) \le f_{n+1}(x)$, and $f_n(x) \to f(x)$ as n tends
to infinity. Since
\[ \int f_n = \sum_{k=1}^{n} \int a_k(x)\,dx, \]
If $\sum \int a_k < \infty$, then the above implies that $\sum_{k=1}^{\infty} a_k(x)$ is integrable,
and by our earlier observation, we conclude that $\sum_{k=1}^{\infty} a_k(x)$ is finite
almost everywhere.
We give two nice illustrations of this last corollary.
The first consists of another proof of the Borel-Cantelli lemma (see
Exercise 16, Chapter 1), which says that if $E_1, E_2, \ldots$ is a collection
of measurable subsets with $\sum m(E_k) < \infty$, then the set of points that
belong to infinitely many sets $E_k$ has measure zero. To prove this fact,
we let
\[ g(x) = \sum_{k=0}^{\infty} a_k(x) \quad \text{where} \quad a_k(x) = \frac{1}{(2^k \epsilon)^{d+1}}\, \chi_{A_k}(x), \]
then we must have $f(x) \le g(x)$, and hence $\int f \le \int g$. Since the set $A_k$
is obtained from $A = \{1 < |x| < 2\}$ by a dilation of factor $2^k \epsilon$, we have
Since all integrals involved are finite, we find the desired result
\[ \int f_1 - \int f_2 = \int g_1 - \int g_2. \]
(i) There exists a set of finite measure B (a ball, for example) such
that
\[ \int_{B^c} |f| < \epsilon. \]
and since $1 - \chi_{B_N} = \chi_{B_N^c}$, this implies $\int_{B_N^c} f < \epsilon$, as we set out to prove.
For the second part, assuming again that $f \ge 0$, we let $f_N(x) = f(x)\chi_{E_N}$
where
\[ E_N = \{x : f(x) \le N\}. \]
that integrability need not guarantee the more naive pointwise vanishing
as |x| becomes large. See Exercise 6.
We are now ready to prove a cornerstone of the theory of Lebesgue
integration, the dominated convergence theorem. It can be viewed as a
culmination of our efforts, and is a general statement about the interplay
between limits and integrals.
and consequently
\[ \int f_n \to \int f \quad \text{as } n \to \infty. \]
\[ \le \epsilon + 2\epsilon = 3\epsilon \]
Complex-valued functions
If f is a complex-valued function on Rd , we may write it as
where u and v are real-valued functions called the real and imaginary
parts of f , respectively. The function f is measurable if and only if both u
and v are measurable. We then say that f is Lebesgue integrable if the
function $|f(x)| = (u(x)^2 + v(x)^2)^{1/2}$ (which is non-negative) is Lebesgue
integrable in the sense defined previously.
It is clear that
The collection of all integrable functions with the above norm gives a
(somewhat imprecise) definition of the space $L^1(\mathbb{R}^d)$. We also note that
$\|f\| = 0$ if and only if f = 0 almost everywhere (see Proposition 1.6),
and this simple property of the norm reflects the practice we have
already adopted not to distinguish two functions that agree almost
everywhere. With this in mind, we take the precise definition of $L^1(\mathbb{R}^d)$ to be
the space of equivalence classes of integrable functions, where we define
two functions to be equivalent if they agree almost everywhere. Often,
however, it is convenient to retain the (imprecise) terminology that an
element $f \in L^1(\mathbb{R}^d)$ is an integrable function, even though it is only an
equivalence class of such functions. Note that by the above, the norm
$\|f\|$ of an element $f \in L^1(\mathbb{R}^d)$ is well-defined by the choice of any
integrable function in its equivalence class. Moreover, $L^1(\mathbb{R}^d)$ inherits the
property that it is a vector space. This and other straightforward facts
are summarized in the following proposition.
Proposition 2.1 Suppose f and g are two functions in $L^1(\mathbb{R}^d)$.
(i) $\|af\|_{L^1(\mathbb{R}^d)} = |a|\, \|f\|_{L^1(\mathbb{R}^d)}$ for all $a \in \mathbb{C}$.
(ii) $\|f + g\|_{L^1(\mathbb{R}^d)} \le \|f\|_{L^1(\mathbb{R}^d)} + \|g\|_{L^1(\mathbb{R}^d)}$.
(iii) $\|f\|_{L^1(\mathbb{R}^d)} = 0$ if and only if f = 0 a.e.
(iv) $d(f, g) = \|f - g\|_{L^1(\mathbb{R}^d)}$ defines a metric on $L^1(\mathbb{R}^d)$.
In (iv), we mean that d satisfies the following conditions. First, $d(f, g) \ge 0$
for all integrable functions f and g, and d(f, g) = 0 if and only if f = g
a.e. Also, d(f, g) = d(g, f), and finally, d satisfies the triangle inequality
$d(f, g) \le d(f, h) + d(h, g)$, for all $f, g, h \in L^1(\mathbb{R}^d)$.
A space V with a metric d is said to be complete if for every Cauchy
sequence $\{x_k\}$ in V (that is, $d(x_k, x_\ell) \to 0$ as $k, \ell \to \infty$) there exists
$x \in V$ such that $\lim_{k\to\infty} x_k = x$ in the sense that
\[ d(x_k, x) \to 0, \quad \text{as } k \to \infty. \]
Our main goal of completing the space of Riemann integrable functions
will be attained once we have established the next important theorem.
2 In this chapter the only norm we consider is the $L^1$-norm, so we often write $\|f\|$ for
$\|f\|_{L^1}$. Later, we shall have occasion to consider other norms, and then we shall modify
our notation accordingly.
and
\[ g(x) = |f_{n_1}(x)| + \sum_{k=1}^{\infty} |f_{n_{k+1}}(x) - f_{n_k}(x)|, \]
whenever n > N . Thus {fn } has the limit f in L1 , and the proof of the
theorem is complete.
Since every sequence that converges in the norm is a Cauchy sequence
in that norm, the argument in the proof of the theorem yields the fol-
lowing.
\[ \|f - \varphi_k\|_{L^1} \to 0 \quad \text{as } k \to \infty. \]
Thus there are simple functions that are arbitrarily close to f in the $L^1$
norm.
For (ii), we first note that by (i) it suffices to approximate simple
functions by step functions. Then, we recall that a simple function is
a finite linear combination of characteristic functions of sets of finite
measure, so it suffices to show that if E is such a set, then there is a
step function ψ so that $\|\chi_E - \psi\|_{L^1}$ is small. However, we now recall
that this argument was already carried out in the proof of Theorem 4.3,
Chapter 1. Indeed, there it is shown that there is an almost disjoint
family of rectangles $\{R_j\}$ with $m(E \,\triangle\, \bigcup_{j=1}^{M} R_j) \le 2\epsilon$. Thus $\chi_E$ and
$\psi = \sum_j \chi_{R_j}$ differ at most on a set of measure $2\epsilon$, and as a result we find
that $\|\chi_E - \psi\|_{L^1} < 2\epsilon$.
By (ii), it suffices to establish (iii) when f is the characteristic function
of a rectangle. In the one-dimensional case, where f is the characteristic
function of an interval [a, b], we may choose a continuous piecewise linear
function g defined by
\[ g(x) = \begin{cases} 1 & \text{if } a \le x \le b, \\ 0 & \text{if } x \le a - \epsilon \text{ or } x \ge b + \epsilon, \end{cases} \]
Invariance Properties
If f is a function defined on $\mathbb{R}^d$, the translation of f by a vector $h \in \mathbb{R}^d$
is the function $f_h$, defined by $f_h(x) = f(x - h)$. Here we want to examine
some basic aspects of translations of integrable functions.
First, there is the translation-invariance of the integral. One way to
state this is as follows: if f is an integrable function, then so is $f_h$ and
\[ (4) \qquad \int_{\mathbb{R}^d} f(x - h)\,dx = \int_{\mathbb{R}^d} f(x)\,dx. \]
We digress to record for later use two useful consequences of the above
invariance properties:
(i) Suppose that f and g are a pair of measurable functions on $\mathbb{R}^d$ so
that for some fixed $x \in \mathbb{R}^d$ the function $y \mapsto f(x - y)g(y)$ is integrable.
As a consequence, the function $y \mapsto f(y)g(x - y)$ is then also integrable
and we have
\[ (6) \qquad \int_{\mathbb{R}^d} f(x - y)g(y)\,dy = \int_{\mathbb{R}^d} f(y)g(x - y)\,dy. \]
This follows from (4) and (5) on making the change of variables which
replaces y by x − y, and noting that this change is a combination of a
translation and a reflection.
and
\[ (8) \qquad \int_{|x| \le \epsilon} \frac{dx}{|x|^a} = \epsilon^{-a+d} \int_{|x| \le 1} \frac{dx}{|x|^a} \quad \text{whenever } a < d. \]
It can also be seen that the integrals $\int_{|x| \ge 1} \frac{dx}{|x|^a}$ and $\int_{|x| \le 1} \frac{dx}{|x|^a}$
(respectively, when a > d and a < d) are finite by the argument that appears
after Corollary 1.10.
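The scaling factor in (8) can be traced through the substitution x = εu. The following is a sketch that assumes only the standard fact that a dilation by ε multiplies Lebesgue measure in $\mathbb{R}^d$ by $\epsilon^d$; the variable u is introduced here for illustration.

```latex
\int_{|x| \le \epsilon} \frac{dx}{|x|^{a}}
  = \int_{|u| \le 1} \frac{\epsilon^{d}\,du}{\epsilon^{a}\,|u|^{a}}
  = \epsilon^{-a+d} \int_{|u| \le 1} \frac{du}{|u|^{a}} .
```

The condition a < d is what makes the integral over the unit ball finite, so that the identity is between finite quantities.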
3 Fubini’s theorem
In elementary calculus integrals of continuous functions of several vari-
ables are often calculated by iterating one-dimensional integrals. We
shall now examine this important analytic device from the general point
of view of Lebesgue integration in Rd , and we shall see that a number of
interesting issues arise.
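In general, we may write $\mathbb{R}^d$ as a product
\[ \mathbb{R}^d = \mathbb{R}^{d_1} \times \mathbb{R}^{d_2} \quad \text{where } d = d_1 + d_2, \text{ and } d_1, d_2 \ge 1. \]
A point in $\mathbb{R}^d$ then takes the form (x, y), where $x \in \mathbb{R}^{d_1}$ and $y \in \mathbb{R}^{d_2}$.
With such a decomposition of $\mathbb{R}^d$ in mind, the general notion of a slice,
formed by fixing one variable, becomes natural. If f is a function in
$\mathbb{R}^{d_1} \times \mathbb{R}^{d_2}$, the slice of f corresponding to $y \in \mathbb{R}^{d_2}$ is the function $f^y$ of
the $x \in \mathbb{R}^{d_1}$ variable, given by
\[ f^y(x) = f(x, y). \]
Similarly, the slice of f for a fixed $x \in \mathbb{R}^{d_1}$ is $f_x(y) = f(x, y)$.
In the case of a set $E \subset \mathbb{R}^{d_1} \times \mathbb{R}^{d_2}$ we define its slices by
\[ E^y = \{x \in \mathbb{R}^{d_1} : (x, y) \in E\} \quad \text{and} \quad E_x = \{y \in \mathbb{R}^{d_2} : (x, y) \in E\}. \]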
See Figure 1 for an illustration.
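[Figure 1: a set E with its slices $E^y$ and $E_x$; horizontal axis $\mathbb{R}^{d_1}$, vertical axis $\mathbb{R}^{d_2}$]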
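As a concrete instance of slices, take E to be the closed unit disc in $\mathbb{R} \times \mathbb{R}$, so that $E^y$ is an interval of length $2\sqrt{1 - y^2}$ for |y| < 1. The Python sketch below integrates these slice lengths numerically and compares the result with π. It relies on the formula m(E) = ∫ m(E^y) dy for measurable sets, which is the content of Corollary 3.3 cited later in this section; the function names and the midpoint rule are illustrative choices.

```python
import math


def slice_length(y):
    """Length m(E^y) of the horizontal slice of the unit disc at height y."""
    return 2.0 * math.sqrt(1.0 - y * y) if abs(y) < 1.0 else 0.0


def area_from_slices(n=200_000):
    """Midpoint-rule approximation of the integral of m(E^y) over -1 <= y <= 1."""
    h = 2.0 / n
    return h * sum(slice_length(-1.0 + (i + 0.5) * h) for i in range(n))


area = area_from_slices()
print(area, math.pi)
```

The two printed numbers agree to several decimal places, which is what integrating the measures of the slices should give for a measurable set.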
Theorem 3.1 Suppose f (x, y) is integrable on Rd1 × Rd2 . Then for al-
most every y ∈ Rd2 :
Moreover:
(iii) $\int_{\mathbb{R}^{d_2}} \left( \int_{\mathbb{R}^{d_1}} f(x, y)\,dx \right) dy = \int_{\mathbb{R}^d} f.$
By assumption, for each k there exists a set $A_k \subset \mathbb{R}^{d_2}$, so that $f_k^y$ is
integrable on $\mathbb{R}^{d_1}$ whenever $y \notin A_k$. If $A = \bigcup_{k=1}^{\infty} A_k$, then m(A) = 0 in
$\mathbb{R}^{d_2}$, and if $y \notin A$, then $f_k^y$ is integrable on $\mathbb{R}^{d_1}$ for all k, and, by the
monotone convergence theorem, we find that
\[ g_k(y) = \int_{\mathbb{R}^{d_1}} f_k^y(x)\,dx \quad \text{increases to a limit} \quad g(y) = \int_{\mathbb{R}^{d_1}} f^y(x)\,dx \]
and combining this fact with (9) and (10), we conclude that
\[ \int_{\mathbb{R}^{d_2}} g(y)\,dy = \int_{\mathbb{R}^d} f(x, y)\,dx\,dy. \]
Since f is integrable, the right-hand integral is finite, and this proves that
g is integrable. Consequently g(y) < ∞ a.e. y, hence f y is integrable for
a.e. y, and
\[ \int_{\mathbb{R}^{d_2}} \left( \int_{\mathbb{R}^{d_1}} f(x, y)\,dx \right) dy = \int_{\mathbb{R}^d} f(x, y)\,dx\,dy. \]
(c) Suppose now E is a finite union of closed cubes whose interiors are
disjoint, $E = \bigcup_{k=1}^{K} Q_k$. Then, if $\tilde{Q}_k$ denotes the interior of $Q_k$, we may
write χE as a linear combination of the χQ̃k and χAk where Ak is a
subset of the boundary of Qk for k = 1, . . . , K. By our previous analysis,
we know that χQk and χAk belong to F for all k, and since Step 1
guarantees that F is closed under finite linear combinations, we conclude
that χE ∈ F, as desired.
(d) Next, we prove that if E is open and of finite measure, then χE ∈
F. This follows from taking a limit in the previous case. Indeed, by
Theorem 1.4 in Chapter 1, we may write E as a countable union of
almost disjoint closed cubes
\[ E = \bigcup_{j=1}^{\infty} Q_j. \]
Consequently, if we let $f_k = \sum_{j=1}^{k} \chi_{Q_j}$, then we note that the functions
fk increase to f = χE , which is integrable since m(E) is finite. Therefore,
we may conclude by Step 2 that f ∈ F.
(e) Finally, if E is a $G_\delta$ of finite measure, then $\chi_E \in F$. Indeed, by
definition, there exist open sets $\tilde{O}_1, \tilde{O}_2, \ldots$, such that
\[ E = \bigcap_{k=1}^{\infty} \tilde{O}_k. \]
Since E has finite measure, there exists an open set $\tilde{O}_0$ of finite measure
with $E \subset \tilde{O}_0$. If we let
\[ O_k = \tilde{O}_0 \cap \bigcap_{j=1}^{k} \tilde{O}_j, \]
Therefore
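\[ \int_{\mathbb{R}^{d_1}} \chi_G(x, y)\,dx = 0 \quad \text{for a.e. } y. \]
Consequently, the slice $G^y$ has measure 0 for a.e. y. The simple
observation that $E^y \subset G^y$ then shows that $E^y$ has measure 0 for a.e. y, and
$\int_{\mathbb{R}^{d_1}} \chi_E(x, y)\,dx = 0$ for a.e. y. Therefore,
\[ \int_{\mathbb{R}^{d_2}} \left( \int_{\mathbb{R}^{d_1}} \chi_E(x, y)\,dx \right) dy = 0 = \int_{\mathbb{R}^d} \chi_E, \]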
χE = χG − χG−E ,
3 Theorem 3.2 was formulated by Tonelli. We will, however, use the short-hand of
referring to it, as well as Theorem 3.1 and Corollary 3.3, as Fubini’s theorem.
Combining (11), (12), and (13) completes the proof of Theorem 3.2.
\[ E^y = \{x \in \mathbb{R}^{d_1} : (x, y) \in E\} \]
E = [0, 1] × N ⊂ R × R,
we see that
\[ E^y = \begin{cases} [0, 1] & \text{if } y \in N, \\ \emptyset & \text{if } y \notin N. \end{cases} \]
the reals, with the property that {x : x ≺ y} is a countable set for each
y ∈ R. (The construction of this ordering is discussed in Problem 5.)
Given this ordering we let
Note that for each $y \in [0, 1]$, $E^y = \{x : x \prec y\}$; thus $E^y$ is countable and
$m(E^y) = 0$. Similarly $m(E_x) = 1$, because $E_x$ is the complement of a
denumerable set in [0, 1]. If E were measurable, it would contradict the
formula in Corollary 3.3.
In relating a set E to its slices $E_x$ and $E^y$, matters are straightforward
for the basic sets which arise when we consider $\mathbb{R}^d$ as the product
$\mathbb{R}^{d_1} \times \mathbb{R}^{d_2}$. These are the product sets $E = E_1 \times E_2$, where $E_j \subset \mathbb{R}^{d_j}$.
Proof. By Corollary 3.3, we know that for a.e. $y \in \mathbb{R}^{d_2}$, the slice
function
\[ (\chi_{E_1 \times E_2})^y(x) = \chi_{E_1}(x)\chi_{E_2}(y) \]
with the understanding that if one of the sets Ej has exterior measure
zero, then m∗ (E1 × E2 ) = 0.
and
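\[ \sum_{k=1}^{\infty} |Q_k| \le m_*(E_1) + \epsilon \quad \text{and} \quad \sum_{\ell=1}^{\infty} |Q'_\ell| \le m_*(E_2) + \epsilon. \]
Since $E_1 \times E_2 \subset \bigcup_{k,\ell=1}^{\infty} Q_k \times Q'_\ell$, the sub-additivity of the exterior
measure yields
\[ m_*(E_1 \times E_2) \le \sum_{k,\ell=1}^{\infty} |Q_k \times Q'_\ell|
   = \left( \sum_{k=1}^{\infty} |Q_k| \right) \left( \sum_{\ell=1}^{\infty} |Q'_\ell| \right)
   \le (m_*(E_1) + \epsilon)(m_*(E_2) + \epsilon). \]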
If neither E1 nor E2 has exterior measure 0, then from the above we find
with the understanding that if one of the sets Ej has measure zero, then
m(E) = 0.
the previous proposition shows that $\{\tilde{f}(x, y) < a\}$ is measurable for each
$a \in \mathbb{R}$. Thus $\tilde{f}(x, y)$ is a measurable function on $\mathbb{R}^{d_1} \times \mathbb{R}^{d_2}$, as desired.
Finally, we return to an interpretation of the integral that arose first in
the calculus. We have in mind the notion that $\int f$ describes the "area"
under the graph of f. Here we relate this to the Lebesgue integral and
show how it extends to our more general context.
\[ A = \{(x, y) \in \mathbb{R}^d \times \mathbb{R} : 0 \le y \le f(x)\}. \]
Then:
(i) f is measurable on $\mathbb{R}^d$ if and only if A is measurable in $\mathbb{R}^{d+1}$.
(ii) If the conditions in (i) hold, then
\[ \int_{\mathbb{R}^d} f(x)\,dx = m(A). \]
F (x, y) = y − f (x)
as was to be shown.
We conclude this section with a useful result.
= m(O) m(Bk ),
4 The L2 theory will be dealt with in Chapter 5, and distributions will be studied in
Book IV.
The proof of the theorem requires only that we adapt the earlier argu-
ments carried out for Schwartz functions in Chapter 5 of Book I to the
present context. We begin with the “multiplication formula.”
Note that both integrals converge in view of the proposition above.
Consider the function $F(\xi, y) = g(\xi)f(y)e^{-2\pi i \xi \cdot y}$ defined for
$(\xi, y) \in \mathbb{R}^d \times \mathbb{R}^d = \mathbb{R}^{2d}$. It is measurable as a function on $\mathbb{R}^{2d}$ in view of Corollary 3.7.
We now apply Fubini's theorem to observe first that
\[ \int_{\mathbb{R}^d} \int_{\mathbb{R}^d} |F(\xi, y)|\,d\xi\,dy = \int_{\mathbb{R}^d} |g(\xi)|\,d\xi \int_{\mathbb{R}^d} |f(y)|\,dy < \infty. \]
Next, if we evaluate $\int_{\mathbb{R}^d} \int_{\mathbb{R}^d} F(\xi, y)\,d\xi\,dy$ by writing it as $\int_{\mathbb{R}^d} \left( \int_{\mathbb{R}^d} F(\xi, y)\,d\xi \right) dy$
we get the left-hand side of the desired equality. Evaluating the double
integral in the reverse order gives the right-hand side, proving the
lemma.
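Next we consider the modulated Gaussian, $g(\xi) = e^{-\pi\delta|\xi|^2} e^{2\pi i x \cdot \xi}$, where
for the moment δ and x are fixed, with δ > 0 and $x \in \mathbb{R}^d$. An elementary
calculation gives$^5$
\[ \hat{g}(y) = \int_{\mathbb{R}^d} e^{-\pi\delta|\xi|^2} e^{2\pi i (x - y) \cdot \xi}\,d\xi = \delta^{-d/2} e^{-\pi|x - y|^2/\delta}, \]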
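The closed form for the transform of the modulated Gaussian can be sanity-checked numerically in dimension d = 1. The Python sketch below approximates the integral by a uniform-grid Riemann sum and compares it with the right-hand side. The function names, the sample values of x, y and δ, and the grid parameters `half_width` and `step` are arbitrary illustrative choices.

```python
import cmath
import math


def ghat_numeric(x, y, delta, half_width=12.0, step=0.01):
    """Uniform-grid Riemann sum for the integral defining g-hat(y) when d = 1."""
    n = int(round(half_width / step))
    total = 0j
    for k in range(-n, n + 1):
        xi = k * step
        gaussian = math.exp(-math.pi * delta * xi * xi)
        total += gaussian * cmath.exp(2j * math.pi * (x - y) * xi)
    return total * step


def ghat_closed_form(x, y, delta):
    """Right-hand side delta^(-d/2) * exp(-pi |x - y|^2 / delta) with d = 1."""
    return delta ** -0.5 * math.exp(-math.pi * (x - y) ** 2 / delta)


x, y, delta = 0.3, -0.4, 0.5
numeric = ghat_numeric(x, y, delta)
exact = ghat_closed_form(x, y, delta)
print(numeric, exact)
```

Because the integrand is a rapidly decaying Gaussian, truncating to a bounded window and sampling on a fine grid already reproduces the closed form to high accuracy, and the imaginary part of the sum is negligible.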
Now, for given $\epsilon > 0$ we can find (by Proposition 2.5) $\eta > 0$ so small
that $\|f_y - f\| < \epsilon$ when $|y| < \eta$. Thus
\[ \|\Delta_\delta\| \le \epsilon + \int_{|y| > \eta} \|f_y - f\|\, K_\delta(y)\,dy \le \epsilon + 2\|f\| \int_{|y| > \eta} K_\delta(y)\,dy. \]
The first inequality follows by using (i) again; the second holds because
$\|f_y - f\| \le \|f_y\| + \|f\| = 2\|f\|$. Therefore, with the use of (ii), the
combination above is $\le 2\epsilon$ if δ is sufficiently small. To summarize: the
right-hand side of (16) converges to f in the $L^1$-norm as δ → 0, and thus
by Corollary 2.3 there is a subsequence that converges to f(x) almost
everywhere, and the theorem is proved.
Note that an immediate consequence of the theorem and the proposition
is that if $\hat{f}$ were in $L^1$, then f could be modified on a set of measure
zero to become continuous everywhere. This is of course impossible for
the general $f \in L^1(\mathbb{R}^d)$.
5 Exercises
1. Given a collection of sets $F_1, F_2, \ldots, F_n$, construct another collection
$F_1^*, F_2^*, \ldots, F_N^*$, with $N = 2^n - 1$, so that $\bigcup_{k=1}^{n} F_k = \bigcup_{j=1}^{N} F_j^*$; the
collection $\{F_j^*\}$ is disjoint; also $F_k = \bigcup_{F_j^* \subset F_k} F_j^*$, for every k.
[Hint: Consider the $2^n$ sets $F'_1 \cap F'_2 \cap \cdots \cap F'_n$ where each $F'_k$ is either $F_k$ or $F_k^c$.]
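For finite sets, the construction in the hint can be carried out mechanically. The Python sketch below is an illustration only: the function name `refine_collection` and the sample sets are made up, complements are taken inside the union of the given sets, and empty intersections are dropped, so fewer than $2^n - 1$ sets may be returned.

```python
from itertools import product


def refine_collection(sets):
    """Return the non-empty intersections F1' & ... & Fn', each Fk' being Fk
    or its complement, skipping the choice where every Fk' is a complement."""
    universe = set().union(*sets)
    atoms = []
    for choice in product([True, False], repeat=len(sets)):
        if not any(choice):
            continue  # the intersection of all complements misses the union
        atom = set(universe)
        for keep, s in zip(choice, sets):
            atom &= s if keep else (universe - s)
        if atom:
            atoms.append(frozenset(atom))
    return atoms


F = [{1, 2, 3}, {2, 3, 4}, {3, 5}]
atoms = refine_collection(F)
print(sorted(sorted(a) for a in atoms))
```

Each returned set lies entirely inside or entirely outside every $F_k$, which is why each $F_k$ is recovered as the union of the returned sets it contains.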
5. Suppose F is a closed set in R, whose complement has finite measure, and let
δ(x) denote the distance from x to F , that is,
Consider
\[ I(x) = \int_{\mathbb{R}} \frac{\delta(y)}{|x - y|^2}\,dy. \]
(a) Prove that δ is continuous, by showing that it satisfies the Lipschitz condi-
tion
(c) Show that I(x) < ∞ for a.e. x ∈ F. This may be surprising in view of the
fact that the Lipschitz condition cancels only one power of |x − y| in the
integrand of I.
[Hint: For the last part, investigate $\int_F I(x)\,dx$.]
8. If f is integrable on $\mathbb{R}$, show that $F(x) = \int_{-\infty}^{x} f(t)\,dt$ is uniformly continuous.
10. Suppose $f \ge 0$, and let $E_{2^k} = \{x : f(x) > 2^k\}$ and $F_k = \{x : 2^k < f(x) \le 2^{k+1}\}$.
If f is finite almost everywhere, then
\[ \bigcup_{k=-\infty}^{\infty} F_k = \{f(x) > 0\}, \]
11. Prove that if f is integrable on $\mathbb{R}^d$, real-valued, and $\int_E f(x)\,dx \ge 0$ for
every measurable E, then $f(x) \ge 0$ a.e. x. As a result, if $\int_E f(x)\,dx = 0$ for every
measurable E, then f(x) = 0 a.e.
12. Show that there are $f \in L^1(\mathbb{R}^d)$ and a sequence $\{f_n\}$ with $f_n \in L^1(\mathbb{R}^d)$ such
that
\[ \|f - f_n\|_{L^1} \to 0, \]
13. Give an example of two measurable sets A and B such that A + B is not
measurable.
[Hint: In R2 take A = {0} × [0, 1] and B = N × {0}.]
\[ v_d = \frac{\pi^{d/2}}{\Gamma(d/2 + 1)}. \]
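The formula is easy to evaluate in low dimensions. The sketch below assumes, as is standard, that $v_d$ denotes the volume of the unit ball in $\mathbb{R}^d$ (the surrounding exercise statement is not reproduced here); the function name is illustrative.

```python
import math


def unit_ball_volume(d):
    """v_d = pi^(d/2) / Gamma(d/2 + 1)."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)


volumes = {d: unit_ball_volume(d) for d in range(1, 6)}
for d, v in volumes.items():
    print(d, v)
```

For d = 1, 2, 3 this reproduces the familiar values 2 (length of [−1, 1]), π (area of the unit disc) and 4π/3 (volume of the unit ball in space).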
\[ F(x) = \sum_{n=1}^{\infty} 2^{-n} f(x - r_n). \]
Prove that F is integrable, hence the series defining F converges for almost every
$x \in \mathbb{R}$. However, observe that this series is unbounded on every interval, and in
fact, any function $\tilde{F}$ that agrees with F a.e. is unbounded in any interval.
(a) Verify that each slice $f^y$ and $f_x$ is integrable. Also for all x, $\int f_x(y)\,dy = 0$,
and hence $\int \left( \int f(x, y)\,dy \right) dx = 0$.
(b) However, $\int f^y(x)\,dx = a_0$ if $0 \le y < 1$, and $\int f^y(x)\,dx = a_n - a_{n-1}$ if
$n \le y < n + 1$ with $n \ge 1$. Hence $y \mapsto \int f^y(x)\,dx$ is integrable on (0, ∞) and
\[ \int \left( \int f(x, y)\,dx \right) dy = s. \]
(c) Note that $\int_{\mathbb{R} \times \mathbb{R}} |f(x, y)|\,dx\,dy = \infty$.
18. Let f be a measurable finite-valued function on [0, 1], and suppose that |f (x) −
f (y)| is integrable on [0, 1] × [0, 1]. Show that f (x) is integrable on [0, 1].
19. Suppose f is integrable on $\mathbb{R}^d$. For each α > 0, let $E_\alpha = \{x : |f(x)| > \alpha\}$.
Prove that
\[ \int_{\mathbb{R}^d} |f(x)|\,dx = \int_0^{\infty} m(E_\alpha)\,d\alpha. \]
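For a simple function the identity can be verified by hand, since $\alpha \mapsto m(E_\alpha)$ is then a step function. The Python sketch below does this with exact rational arithmetic. The representation of f as a list of pairs (value, measure of the set where it is taken) and the function names are illustrative assumptions; this is a check on one example, not a proof.

```python
from fractions import Fraction


def integral_abs(pieces):
    """Integral of |f| for f = sum of c_i * chi_{F_i} with the F_i disjoint.

    `pieces` is a list of (c_i, m(F_i)) pairs.
    """
    return sum(abs(c) * m for c, m in pieces)


def layer_cake(pieces):
    """Integral over alpha > 0 of m(E_alpha), where E_alpha = {|f| > alpha}.

    alpha -> m(E_alpha) only changes at the values |c_i|, so the integral
    is a finite sum over consecutive levels.
    """
    levels = sorted({abs(c) for c, _ in pieces if c != 0})
    total, previous = Fraction(0), Fraction(0)
    for level in levels:
        # For previous <= alpha < level, E_alpha is the union of the F_i
        # with |c_i| >= level.
        measure = sum(m for c, m in pieces if abs(c) >= level)
        total += (level - previous) * measure
        previous = level
    return total


f = [(Fraction(3), Fraction(1, 2)),
     (Fraction(-1), Fraction(2)),
     (Fraction(5, 2), Fraction(1, 3))]
print(integral_abs(f), layer_cake(f))
```

Both computations give 13/3 for this example: the first sums "value times measure" over the pieces, the second sums "height of a layer times measure of the set above it".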
20. The problem (highlighted in the discussion preceding Fubini’s theorem) that
certain slices of measurable sets can be non-measurable may be avoided by re-
stricting attention to Borel measurable functions and Borel sets. In fact, prove the
following:
Suppose E is a Borel set in $\mathbb{R}^2$. Then for every y, the slice $E^y$ is a Borel set in
$\mathbb{R}$.
[Hint: Consider the collection C of subsets E of $\mathbb{R}^2$ with the property that each
slice $E^y$ is a Borel set in $\mathbb{R}$. Verify that C is a σ-algebra that contains the open
sets.]
Show that f ∗ g is well defined for a.e. x (that is, f (x − y)g(y) is integrable
on Rd for a.e. x).
(d) Show that f ∗ g is integrable whenever f and g are integrable, and that
\[ \widehat{(f * g)}(\xi) = \hat{f}(\xi)\hat{g}(\xi). \]
23. As an application of the Fourier transform, show that there does not exist a
function I ∈ L1 (Rd ) such that
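25. Show that for each $\epsilon > 0$ the function $F(\xi) = \frac{1}{(1 + |\xi|^2)^{\epsilon}}$ is the Fourier transform
of an $L^1$ function.
[Hint: With $K_\delta(x) = e^{-\pi|x|^2/\delta} \delta^{-d/2}$ consider $f(x) = \int_0^{\infty} K_\delta(x) e^{-\pi\delta} \delta^{\epsilon - 1}\,d\delta$. Use
Fubini's theorem to prove $f \in L^1(\mathbb{R}^d)$, and
\[ \hat{f}(\xi) = \int_0^{\infty} e^{-\pi\delta|\xi|^2} e^{-\pi\delta} \delta^{\epsilon - 1}\,d\delta, \]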
6 Problems
1. If f is integrable on [0, 2π], then $\int_0^{2\pi} f(x)e^{-inx}\,dx \to 0$ as |n| → ∞.
Show as a consequence that if E is a measurable subset of [0, 2π], then
\[ \int_E \cos^2(nx + u_n)\,dx \to \frac{m(E)}{2}, \quad \text{as } n \to \infty \]
\[ \sum_{n=0}^{\infty} A_n(x) = \sum_{n=0}^{\infty} (a_n \cos nx + b_n \sin nx) \]
converges for x in a set of positive measure (or in particular for all x), then an → 0
and bn → 0 as n → ∞.
[Hint: Note that An (x) → 0 uniformly on a set E of positive measure.]
As a special case, note that the Lebesgue measure is invariant under rotations.
(For this special case see also Exercise 26 in the next chapter.)
The above identity can be proved using Fubini’s theorem as follows.
(a) Consider first the case d = 2, and L a "strictly" upper triangular
transformation $x' = x + ay$, $y' = y$. Then
Hence
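\[ m(L(E)) = \int_{\mathbb{R} \times \mathbb{R}} \left( \int \chi_E(x - ay, y)\,dx \right) dy
           = \int_{\mathbb{R} \times \mathbb{R}} \left( \int \chi_E(x, y)\,dx \right) dy
           = m(E), \]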
5. There is an ordering ≺ of R with the property that for each y ∈ R the set
{x ∈ R : x ≺ y} is at most countable.
The existence of this ordering depends on the continuum hypothesis, which
asserts: whenever S is an infinite subset of R, then either S is countable, or S has
the cardinality of R (that is, can be mapped bijectively to R).6
of the other axioms of set theory, and so we are also free to accept its validity.
To deal with $F'(x)$, we recall the definition of the derivative as the limit
of the quotient
\[ \frac{F(x + h) - F(x)}{h} \quad \text{when } h \text{ tends to } 0. \]
We note that this quotient takes the form (say in the case h > 0)
\[ \frac{1}{h} \int_x^{x+h} f(y)\,dy = \frac{1}{|I|} \int_I f(y)\,dy, \]
where we use the notation I = (x, x + h) and |I| for the length of this
interval. At this point, we pause to observe that the above expression
is the “average” value of f over I, and that in the limit as |I| → 0,
we might expect that these averages tend to f (x). Reformulating the
question slightly, we may ask whether
\[
  \lim_{\substack{|I| \to 0 \\ x \in I}} \frac{1}{|I|} \int_I f(y)\,dy = f(x)
\]
as desired.
The averaging problem has an affirmative answer, but to establish that
fact, which is qualitative in nature, we need to make some quantitative
estimates bearing on the overall behavior of the averages of f . This will
be done in terms of the maximal averages of |f |, to which we now turn.
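To see the averaging problem in the simplest possible case, the sketch below computes the averages of a continuous function over shrinking intervals $(x, x + h)$ and compares them with $f(x)$. The choice $f = \cos$, the point $x = 1$, and the helper name `interval_average` are assumptions made for illustration; the sketch says nothing about the general integrable case.

```python
import math


def interval_average(f, x, h, n=1000):
    """Midpoint-rule approximation of (1/h) * integral of f over (x, x + h)."""
    step = h / n
    total = sum(f(x + (k + 0.5) * step) for k in range(n))
    return total * step / h


f = math.cos
x = 1.0

# Errors |average over (x, x + h) - f(x)| for shrinking h.
errors = [abs(interval_average(f, x, h) - f(x)) for h in (0.1, 0.01, 0.001)]
print(errors)
```

For this smooth $f$ the error is roughly proportional to $h$, so each printed value should be about a tenth of the previous one.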
where the supremum is taken over all balls containing the point x. In
other words, we replace the limit in the statement of the averaging prob-
lem by a supremum, and f by its absolute value.
1. Differentiation of the integral 101
for all $\alpha$. Taking the limit as $\alpha$ tends to infinity, the third property yields
$m(\{x : f^*(x) = \infty\}) = 0$.
The proof of inequality (1) relies on an elementary version of a Vitali
covering argument.1
1 We note that the lemma that follows is the first of a series of covering arguments that
occur below in the theory of differentiation; see also Lemma 3.9 and its corollary, as well
as Lemma 3.5, where the covering assertion is more implicit.
In the last step we have used the fact that in Rd a dilation of a set by
δ > 0 results in the multiplication by δ d of the Lebesgue measure of this
set.
The proof of (iii) in Theorem 1.1 is now in reach. If we let $E_\alpha = \{x : f^*(x) > \alpha\}$, then for each $x \in E_\alpha$ there exists a ball $B_x$ that contains $x$, and such that
\[
  \frac{1}{m(B_x)} \int_{B_x} |f(y)|\,dy > \alpha.
\]
Fix a compact subset $K$ of $E_\alpha$. Since $K$ is covered by $\bigcup_{x \in E_\alpha} B_x$, we may select a finite subcover of $K$, say $K \subset \bigcup_{\ell=1}^{N} B_\ell$. The covering lemma guarantees the existence of a sub-collection $B_{i_1}, \dots, B_{i_k}$ of disjoint balls with
\[
  m\Bigl( \bigcup_{\ell=1}^{N} B_\ell \Bigr) \le 3^d \sum_{j=1}^{k} m(B_{i_j}). \tag{3}
\]
104 Chapter 3. DIFFERENTIATION AND INTEGRATION
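Since the balls $B_{i_1}, \dots, B_{i_k}$ are disjoint and satisfy (2) as well as (3), we find that
\[
\begin{aligned}
  m(K) \le m\Bigl( \bigcup_{\ell=1}^{N} B_\ell \Bigr) \le 3^d \sum_{j=1}^{k} m(B_{i_j})
    &\le \frac{3^d}{\alpha} \sum_{j=1}^{k} \int_{B_{i_j}} |f(y)|\,dy \\
    &= \frac{3^d}{\alpha} \int_{\bigcup_{j=1}^{k} B_{i_j}} |f(y)|\,dy \\
    &\le \frac{3^d}{\alpha} \int_{\mathbb{R}^d} |f(y)|\,dy.
\end{aligned}
\]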
Since this inequality is true for all compact subsets K of Eα , the proof
of the weak type inequality for the maximal operator is complete.
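The weak type inequality can be checked by brute force in a discrete analogue, sketched below under stated assumptions: the line is replaced by a grid of `n` cells of width `dx`, runs of consecutive cells play the role of balls, and the dimension is $d = 1$, so the constant is $3^d = 3$. The function name `maximal_function` and the test data are hypothetical; this is an illustration of the statement, not of the proof.

```python
def maximal_function(values):
    """Discrete uncentered maximal function: for each cell i, the largest
    average of |values| over any run of consecutive cells containing i."""
    n = len(values)
    prefix = [0.0]
    for v in values:
        prefix.append(prefix[-1] + abs(v))
    best = [0.0] * n
    for a in range(n):
        for b in range(a, n):
            average = (prefix[b + 1] - prefix[a]) / (b - a + 1)
            for i in range(a, b + 1):
                best[i] = max(best[i], average)
    return best


n = 60
dx = 1.0 / n
f = [1.0 if 20 <= i < 25 else 0.0 for i in range(n)]  # a small plateau
f[40] = 3.0                                           # and an isolated spike
l1_norm = sum(abs(v) for v in f) * dx
f_star = maximal_function(f)

for alpha in (0.1, 0.25, 0.5, 1.0):
    measure = sum(dx for v in f_star if v > alpha)  # m({f* > alpha})
    bound = 3.0 * l1_norm / alpha                   # (3^d / alpha) * ||f||_1 with d = 1
    print(alpha, measure, bound)
```

The triple loop is cubic in `n`, which is acceptable only because the grid is tiny; it is written for transparency rather than speed.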
has measure zero, because this assertion then guarantees that the set $E = \bigcup_{n=1}^{\infty} E_{1/n}$ has measure zero, and the limit in (4) holds at all points of $E^c$.
We fix $\alpha$, and recall Theorem 2.4 in Chapter 2, which states that for each $\varepsilon > 0$ we may select a continuous function $g$ of compact support with $\|f - g\|_{L^1(\mathbb{R}^d)} < \varepsilon$. As we remarked earlier, the continuity of $g$ implies that
\[
  \lim_{\substack{m(B) \to 0 \\ x \in B}} \frac{1}{m(B)} \int_B g(y)\,dy = g(x), \quad \text{for all } x.
\]
Since we may write the difference $\frac{1}{m(B)} \int_B f(y)\,dy - f(x)$ as
\[
  \frac{1}{m(B)} \int_B (f(y) - g(y))\,dy + \frac{1}{m(B)} \int_B g(y)\,dy - g(x) + g(x) - f(x)
\]
we find that
\[
  \limsup_{\substack{m(B) \to 0 \\ x \in B}} \left| \frac{1}{m(B)} \int_B f(y)\,dy - f(x) \right| \le (f - g)^*(x) + |g(x) - f(x)|,
\]
\[
  m(E_\alpha) \le \frac{A}{\alpha}\,\varepsilon + \frac{1}{\alpha}\,\varepsilon.
\]
Since $\varepsilon$ is arbitrary, we must have $m(E_\alpha) = 0$, and the proof of the theorem is complete.
Note that as an immediate consequence of the theorem applied to $|f|$, we see that $f^*(x) \ge |f(x)|$ for a.e. $x$, with $f^*$ the maximal function.
We have worked so far under the assumption that f is integrable. This
“global” assumption is slightly out of place in the context of a “local”
notion like differentiability. Indeed, the limit in Lebesgue’s theorem is
taken over balls that shrink to the point x, so the behavior of f far from
x is irrelevant. Thus, we expect the result to remain valid if we simply
assume integrability of f on every ball.
To make this precise, we say that a measurable function $f$ on $\mathbb{R}^d$ is locally integrable, if for every ball $B$ the function $f(x)\chi_B(x)$ is integrable. We shall denote by $L^1_{\mathrm{loc}}(\mathbb{R}^d)$ the space of all locally integrable functions. Loosely speaking, the behavior at infinity does not affect the local integrability of a function. For example, the functions $e^{|x|}$ and $|x|^{-1/2}$ are both locally integrable, but not integrable on $\mathbb{R}^d$.
Clearly, the conclusion of the last theorem holds under the weaker
assumption that f is locally integrable.
\[
  \lim_{\substack{m(B) \to 0 \\ x \in B}} \frac{m(B \cap E)}{m(B)} = 1.
\]
Loosely speaking, this condition says that small balls around x are almost
entirely covered by E. More precisely, for every α < 1 close to 1, and
every ball of sufficiently small radius containing x, we have
m(B ∩ E) ≥ αm(B).
At this stage, two simple observations about this definition are in order.
First, x belongs to the Lebesgue set of f whenever f is continuous at x.
Second, if x is in the Lebesgue set of f , then
\[
  \lim_{\substack{m(B) \to 0 \\ x \in B}} \frac{1}{m(B)} \int_B f(y)\,dy = f(x).
\]
we must have
\[
  \limsup_{\substack{m(B) \to 0 \\ x \in B}} \frac{1}{m(B)} \int_B |f(y) - f(x)|\,dy \le 2\varepsilon,
\]
rectangles the existence of the limit almost everywhere and the weak
type inequality fail (see Problem 8).
A collection of sets {Uα } is said to shrink regularly to x (or has
bounded eccentricity at x) if there is a constant c > 0 such that for
each Uα there is a ball B with
chapter.
2. Good kernels and approximations to the identity 109
(i) $\int_{\mathbb{R}^d} K_\delta(x)\,dx = 1$.
(ii) $\int_{\mathbb{R}^d} |K_\delta(x)|\,dx \le A$.
We observe that these requirements are stronger and imply the conditions
in the definition of good kernels. Indeed, we first prove (ii). For that, we
use the second illustration of Corollary 1.10 in Chapter 2, which gives
\[
  \int_{|x| \ge \varepsilon} \frac{dx}{|x|^{d+1}} \le \frac{C}{\varepsilon} \quad \text{for some } C > 0 \text{ and all } \varepsilon > 0. \tag{5}
\]
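Then, using the estimates (ii$'$) and (iii$'$) when $|x| < \delta$ and $|x| \ge \delta$, respectively, yields
\[
\begin{aligned}
  \int_{\mathbb{R}^d} |K_\delta(x)|\,dx
    &= \int_{|x| < \delta} |K_\delta(x)|\,dx + \int_{|x| \ge \delta} |K_\delta(x)|\,dx \\
    &\le A \int_{|x| < \delta} \frac{dx}{\delta^d} + A\delta \int_{|x| \ge \delta} \frac{1}{|x|^{d+1}}\,dx \\
    &\le A' + A'' < \infty.
\end{aligned}
\]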
Finally, the last condition of a good kernel is also verified, since another
application of (5) gives
\[
  \int_{|x| \ge \eta} |K_\delta(x)|\,dx \le A\delta \int_{|x| \ge \eta} \frac{dx}{|x|^{d+1}} \le \frac{A'\delta}{\eta},
\]
Figure 2. An approximation to the identity
so-called unit mass at the origin or Dirac delta “function.” The latter
is heuristically defined by
\[
  D(x) = \begin{cases} \infty & \text{if } x = 0 \\ 0 & \text{if } x \ne 0 \end{cases}
  \qquad \text{and} \qquad \int D(x)\,dx = 1.
\]
Then, if we set $K_\delta(x) = \delta^{-d}\varphi(\delta^{-1}x)$, the family $\{K_\delta\}_{\delta>0}$ is an approximation to the identity. The simple verification is left to the reader.
Important special cases are in the next two examples.
where δ = 1/N .
We note that Examples 2 through 5 have already appeared in Book I.
We now turn to a general result about approximations to the identity
that highlights the role of the Lebesgue set.
Theorem 2.1 If {Kδ }δ>0 is an approximation to the identity and f is
integrable on Rd , then
(f ∗ Kδ )(x) → f (x) as δ → 0
for every x in the Lebesgue set of f . In particular, the limit holds for
a.e. x.
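A small numerical sketch of this convergence in dimension $d = 1$ follows. The kernel is the box kernel $K_\delta = (2\delta)^{-1}\chi_{[-\delta,\delta]}$, the function is $f(y) = e^{-|y|}$, and the point is $x = 0$, where $f$ is continuous but not differentiable; all three choices, and the helper names `box_kernel` and `convolve_at`, are assumptions made for illustration.

```python
import math


def box_kernel(delta):
    """K_delta(y) = delta^{-1} * phi(y / delta), with phi = (1/2) * indicator of [-1, 1]."""
    def kernel(y):
        return 1.0 / (2.0 * delta) if abs(y) <= delta else 0.0
    return kernel


def convolve_at(f, kernel, x, delta, n=2000):
    """Midpoint-rule approximation of (f * K)(x), integrating over the support [-delta, delta]."""
    step = 2.0 * delta / n
    total = 0.0
    for k in range(n):
        y = -delta + (k + 0.5) * step
        total += f(x - y) * kernel(y)
    return total * step


def f(y):
    return math.exp(-abs(y))


# |(f * K_delta)(0) - f(0)| for shrinking delta.
errors = [abs(convolve_at(f, box_kernel(d), 0.0, d) - f(0.0)) for d in (0.5, 0.1, 0.02)]
print(errors)
```

For this example the convolution equals $(1 - e^{-\delta})/\delta$, so the error behaves like $\delta/2$ for small $\delta$.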
Since the integral of each kernel $K_\delta$ is equal to 1, we may write
\[
  (f * K_\delta)(x) - f(x) = \int [f(x - y) - f(x)]\,K_\delta(y)\,dy.
\]
Consequently,
\[
  |(f * K_\delta)(x) - f(x)| \le \int |f(x - y) - f(x)|\,|K_\delta(y)|\,dy,
\]
and it now suffices to prove that the right-hand side tends to 0 as δ goes
to 0. The argument we give depends on a simple result that we isolate
in the next lemma.
Lemma 2.2 Suppose that $f$ is integrable on $\mathbb{R}^d$, and that $x$ is a point of the Lebesgue set of $f$. Let
\[
  A(r) = \frac{1}{r^d} \int_{|y| \le r} |f(x - y) - f(x)|\,dy, \quad \text{whenever } r > 0.
\]
A(r) → 0 as r → 0.
Moreover, A(r) is bounded, that is, A(r) ≤ M for some M > 0 and all
r > 0.
By using the property (ii$'$) of the approximation to the identity, the first term is estimated by
\[
\begin{aligned}
  \int_{|y| \le \delta} |f(x - y) - f(x)|\,|K_\delta(y)|\,dy
    &\le \frac{c}{\delta^d} \int_{|y| \le \delta} |f(x - y) - f(x)|\,dy \\
    &\le c\,A(\delta).
\end{aligned}
\]
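Each term in the sum is estimated similarly, but this time by using property (iii$'$) of approximations to the identity:
\[
\begin{aligned}
  \int_{2^k\delta < |y| \le 2^{k+1}\delta} |f(x - y) - f(x)|\,|K_\delta(y)|\,dy
    &\le \frac{c\,\delta}{(2^k\delta)^{d+1}} \int_{|y| \le 2^{k+1}\delta} |f(x - y) - f(x)|\,dy \\
    &\le \frac{c'}{2^k (2^{k+1}\delta)^d} \int_{|y| \le 2^{k+1}\delta} |f(x - y) - f(x)|\,dy
\end{aligned}
\]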
is integrable, and
3 Differentiability of functions
We now take up the second question raised at the beginning of this chapter, that of finding a broad condition on functions $F$ that guarantees the identity
\[
  F(b) - F(a) = \int_a^b F'(x)\,dx. \tag{6}
\]
There are two phenomena that make a general formulation of this identity problematic. First, because of the existence of non-differentiable functions,$^4$ the right-hand side of (6) might not be meaningful if we merely assumed $F$ was continuous. Second, even if $F'(x)$ existed for every $x$, the function $F'$ would not necessarily be (Lebesgue) integrable. (See Exercise 12.)
\[
  \sum_{j=1}^{N} |z(t_j) - z(t_{j-1})| \le M. \tag{7}
\]
By definition, the length L(γ) of the curve is the supremum over all
partitions of the sum on the left-hand side, that is,
\[
  L(\gamma) = \sup_{a = t_0 < t_1 < \cdots < t_N = b} \; \sum_{j=1}^{N} |z(t_j) - z(t_{j-1})|.
\]
The answer to the first question leads directly to the class of functions
of bounded variation, a class that plays a key role in the theory of dif-
ferentiation.
Suppose F (t) is a complex-valued function defined on [a, b], and a =
t0 < t1 < · · · < tN = b is a partition of this interval. The variation of F
and if a and b are real, then |a + ib| ≤ |a| + |b| ≤ 2|a + ib|.
Intuitively, a function of bounded variation cannot oscillate too often
with amplitudes that are too large. Some examples should help clarify
this assertion.
We first fix some terminology. A real-valued function F defined on
[a, b] is increasing if F (t1 ) ≤ F (t2 ) whenever a ≤ t1 ≤ t2 ≤ b. If the
inequality is strict, we say that F is strictly increasing.
Example 1. If $F$ is real-valued, monotonic, and bounded, then $F$ is of bounded variation. Indeed, if for example $F$ is increasing and bounded by $M$, we see that
\[
  \sum_{j=1}^{N} |F(t_j) - F(t_{j-1})| = \sum_{j=1}^{N} F(t_j) - F(t_{j-1})
\]
Example 3. Let
\[
  F(x) = \begin{cases} x^a \sin(x^{-b}) & \text{for } 0 < x \le 1, \\ 0 & \text{if } x = 0. \end{cases}
\]
[Figure panels: a = 2, b = 1; a = 1, b = 1; a = 1/2, b = 1]
where the sup is over all partitions of [a, x]. The preceding definition
makes sense if F is complex-valued. The succeeding ones require that
F is real-valued. In the spirit of the first definition, we say that the
positive variation of F on [a, x] is
\[
  P_F(a, x) = \sup \sum_{(+)} F(t_j) - F(t_{j-1}),
\]
where the sum is over all j such that F (tj ) ≥ F (tj−1 ), and the supremum
is over all partitions of [a, x]. Finally, the negative variation of F on
[a, x] is defined by
\[
  N_F(a, x) = \sup \sum_{(-)} -[F(t_j) - F(t_{j-1})],
\]
where the sum is over all j such that F (tj ) ≤ F (tj−1 ), and the supremum
is over all partitions of [a, x].
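For one fixed partition, the three sums entering these definitions are easy to compute, and they already satisfy the partition-level versions of the identities relating total, positive, and negative variation. The sketch below is only that: it evaluates the sums for a single partition (no supremum is taken), with $F = \sin$ on $[0, 2\pi]$ and the helper name `partition_sums` chosen here for illustration.

```python
import math


def partition_sums(F, partition):
    """Return (P, N, T) for one partition: the sum of the positive increments,
    the sum of minus the negative increments, and the sum of absolute increments."""
    P = N = T = 0.0
    for t_prev, t_next in zip(partition, partition[1:]):
        increment = F(t_next) - F(t_prev)
        T += abs(increment)
        if increment >= 0:
            P += increment
        else:
            N -= increment
    return P, N, T


a, b = 0.0, 2.0 * math.pi
partition = [a + (b - a) * k / 100 for k in range(101)]
P, N, T = partition_sums(math.sin, partition)

print(P, N, T)                                # approximately 2, 2, 4
print(math.sin(b) - math.sin(a), P - N)       # increments telescope: F(b) - F(a) = P - N
```

Because this uniform partition contains the turning points $\pi/2$ and $3\pi/2$ of the sine, the sums for this partition already agree (up to rounding) with the variations themselves; for a general partition they are only lower bounds.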
and
(To see this, it suffices to use the definition to obtain similar estimates
for PF and NF with possibly different partitions, and then to consider a
common refinement of these two partitions.) Since we also note that
\[
  F(x) - F(a) = \sum_{(+)} F(t_j) - F(t_{j-1}) - \sum_{(-)} -[F(t_j) - F(t_{j-1})],
\]
we find that $|F(x) - F(a) - [P_F - N_F]| < 2\varepsilon$, which proves the first identity.
For the second identity, we also note that for any partition of a = t0 <
· · · < tN = x of [a, x] we have
\[
  \sum_{j=1}^{N} |F(t_j) - F(t_{j-1})| = \sum_{(+)} F(t_j) - F(t_{j-1}) + \sum_{(-)} -[F(t_j) - F(t_{j-1})],
\]
Once again, one can argue using common refinements of partitions in the
definitions of PF and NF to deduce the inequality PF + NF ≤ TF , and
the lemma is proved.
\[
  \lim_{h \to 0} \frac{F(x + h) - F(x)}{h}
\]
exists for almost every x ∈ [a, b]. By the previous result, it suffices to
consider the case when F is increasing. In fact, we shall first also assume
that F is continuous. This makes the argument simpler. As for the
general case, we leave that till later. (See Section 3.3.) It will then
be instructive to examine the nature of the possible discontinuities of a
function of bounded variation, and reduce matters to the case of “jump
functions.”
We begin with a nice technical lemma of F. Riesz, which has the effect
of a covering argument.
G(bk ) − G(ak ) = 0.
\[
  G(c) = \frac{G(a_k) + G(b_k)}{2},
\]
and in fact we may choose $c$ farthest to the right in the interval $(a_k, b_k)$. Since $c \in E$, there exists $d > c$ such that $G(d) > G(c)$. Since $b_k \notin E$, we must have $G(x) \le G(b_k)$ for all $x \ge b_k$; therefore $d < b_k$. Since $G(d) > G(c)$, there exists (by continuity) $c' > d$ with $c' < b_k$ and $G(c') = G(c)$, which contradicts the fact that $c$ was chosen farthest to the right in $(a_k, b_k)$. This shows that we must have $G(a_k) = G(b_k)$, and the lemma is proved.
Note. This result sometimes carries the name “rising sun lemma” for
the following reason. If one thinks of the sun rising from the east (at
the right) with the rays of light parallel to the x-axis, then the points
(x, G(x)) on the graph of G, with x ∈ E, are precisely the points which
are in the shade; these points appear in bold in Figure 5.
\[
  \Delta_h(F)(x) = \frac{F(x + h) - F(x)}{h}.
\]
3. Differentiability of functions 123
Thus all four Dini numbers are finite and equal almost everywhere, hence $F'(x)$ exists for almost every point $x$.
We recall that we assume that F is increasing, bounded, and continu-
ous on [a, b]. For a fixed γ > 0, let
Figure 6. Construction of $F_2$
6 The reader may check that indeed this function agrees with the one given in Exercise 2
of Chapter 1.
\[
  \sum_{i=1}^{N} m(B_i) \ge m(E) - \delta.
\]
\[
  m\Bigl( E - \bigcup_{i=1}^{N} B_i \Bigr) < 2\delta.
\]
We now return to the situation on the real line. To complete the proof
of the theorem it suffices to show that under its hypotheses we have
F (b) = F (a), since if that is proved, we can replace the interval [a, b] by
any sub-interval. Now let $E$ be the set of those $x \in (a, b)$ where $F'(x)$ exists and is zero. By our assumption $m(E) = b - a$. Next, momentarily fix $\varepsilon > 0$. Since for each $x \in E$ we have
\[
  \lim_{h \to 0} \left| \frac{F(x + h) - F(x)}{h} \right| = 0,
\]
then for each η > 0 we have an open interval I = (ax , bx ) ⊂ [a, b] con-
taining x, with
since the intervals $I_i$ are disjoint and lie in $[a, b]$. Next consider the complement of $\bigcup_{j=1}^{N} I_j$ in $[a, b]$. It consists of finitely many closed intervals $\bigcup_{k=1}^{M} [\alpha_k, \beta_k]$ with total length $\le \delta$ because of (9). Thus by the absolute continuity of $F$ (if $\delta$ is chosen appropriately in terms of $\varepsilon$), $\sum_{k=1}^{M} |F(\beta_k) - F(\alpha_k)| \le \varepsilon$. Altogether, then,
\[
  |F(b) - F(a)| \le \sum_{i=1}^{N} |F(b_i) - F(a_i)| + \sum_{k=1}^{M} |F(\beta_k) - F(\alpha_k)| \le \varepsilon(b - a) + \varepsilon.
\]
\[
  F(x_n^+) = F(x_n^-) + \alpha_n
\]
and
\[
  F(x_n) = F(x_n^-) + \theta_n \alpha_n, \quad \text{for some } \theta_n, \text{ with } 0 \le \theta_n \le 1.
\]
If we let
\[
  j_n(x) = \begin{cases} 0 & \text{if } x < x_n, \\ \theta_n & \text{if } x = x_n, \\ 1 & \text{if } x > x_n, \end{cases}
\]
(i) J(x) is discontinuous precisely at the points {xn } and has a jump
at xn equal to that of F .
\[
  J(x) = \sum_{n=1}^{N} \alpha_n j_n(x) + \sum_{n=N+1}^{\infty} \alpha_n j_n(x).
\]
Now,
\[
  J_0(b) - J_0(a) \ge \sum_{j=1}^{N} J_0(b_j) - J_0(a_j) > \varepsilon \sum (b_j - a_j) \ge \frac{\varepsilon}{3}\, m(K) \ge \frac{\varepsilon}{6}\,\delta.
\]
Thus by (11), $\varepsilon\delta/6 < \eta$, and since we are free to choose $\eta$, it follows that $\delta = 0$ and the theorem is proved.
In fact, because of Theorem 3.11, for any partition a = t0 < t1 < · · · <
tN = b of [a, b], we have
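\[
\begin{aligned}
  \sum_{j=1}^{N} |F(t_j) - F(t_{j-1})|
    &= \sum_{j=1}^{N} \left| \int_{t_{j-1}}^{t_j} F'(t)\,dt \right| \\
    &\le \sum_{j=1}^{N} \int_{t_{j-1}}^{t_j} |F'(t)|\,dt \\
    &= \int_a^b |F'(t)|\,dt.
\end{aligned}
\]
So this proves
\[
  T_F(a, b) \le \int_a^b |F'(t)|\,dt. \tag{13}
\]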
To prove the reverse inequality, fix $\varepsilon > 0$, and using Theorem 2.4 in Chapter 2 find a step function $g$ on $[a, b]$, such that $F' = g + h$ with $\int_a^b |h(t)|\,dt < \varepsilon$. Set $G(x) = \int_a^x g(t)\,dt$, and $H(x) = \int_a^x h(t)\,dt$. Then $F = G + H$, and as is easily seen
\[
  T_F(a, b) \ge T_G(a, b) - \varepsilon.
\]
Now partition the interval [a, b], as a = t0 < · · · < tN = b, so that the step
function g is constant on each of the intervals (tj−1 , tj ), j = 1, 2, . . . , N .
Then
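\[
\begin{aligned}
  T_G(a, b) &\ge \sum_{j=1}^{N} |G(t_j) - G(t_{j-1})| \\
    &= \sum_{j=1}^{N} \left| \int_{t_{j-1}}^{t_j} g(t)\,dt \right| \\
    &= \sum_{j=1}^{N} \int_{t_{j-1}}^{t_j} |g(t)|\,dt \\
    &= \int_a^b |g(t)|\,dt.
\end{aligned}
\]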
Since $\int_a^b |g(t)|\,dt \ge \int_a^b |F'(t)|\,dt - \varepsilon$, we obtain as a consequence that
\[
  T_F(a, b) \ge \int_a^b |F'(t)|\,dt - 2\varepsilon,
\]
We then say that the set $K$ has Minkowski content$^7$ if the limit
\[
  \lim_{\delta \to 0} \frac{m(K^\delta)}{2\delta}
\]
exists. When this limit exists, we denote it by $\mathcal{M}(K)$.
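\[
  \mathcal{M}^*(K) = \limsup_{\delta \to 0} \frac{m(K^\delta)}{2\delta} \quad \text{and} \quad \mathcal{M}_*(K) = \liminf_{\delta \to 0} \frac{m(K^\delta)}{2\delta}
\]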
Proof. Since the distance function and the Lebesgue measure are
invariant under translations and rotations (see Section 3 in Chapter 1
and Problem 4 in Chapter 2) we may transform the situation by an
appropriate composition of these motions. Therefore we may assume
that the end-points of the curve have been placed on the x-axis, and
thus we may suppose that z(a) = (A, 0), z(b) = (B, 0) with A < B, and
∆ = B − A (in the case A = B the conclusion is automatically verified).
By the continuity of the function x(t), there is for each x in [A, B] a
value t in [a, b], such that x = x(t). Since Q = (x(t), y(t)) ∈ Γ, the set
Chapter 7 below.
4. Rectifiable curves and the isoperimetric inequality 139
We now pass to the proof of the proposition. Let us assume first that
the curve is simple. Let P be any partition a = t0 < t1 < · · · < tN = b
of the interval [a, b], and let LP denote the length of the corresponding
polygonal line, that is,
\[
  L_P = \sum_{j=1}^{N} |z(t_j) - z(t_{j-1})|.
\]
\[
  \sum_{j=1}^{N} |z(b_j) - z(a_j)| \ge L_P - \varepsilon.
\]
\[
  m(\Gamma^\delta) \ge \sum_{j=1}^{N} m((\Gamma_j)^\delta) \ge 2\delta \sum |z(b_j) - z(a_j)|.
\]
M∗ (Γ) ≤ L.
(where $z(s)$ has been extended outside $[0, L]$, so that $z(s) = z(0)$ when $s < 0$, and $z(s) = z(L)$ when $s > L$). Because $z(s)$ is continuous the supremum of $h$ in the definition of $F_n(s)$ can be replaced by a supremum of countably many measurable functions, and hence each $F_n$ is measurable. However, $F_n(s) \to 0$ as $n \to \infty$ for a.e. $s \in [a, b]$. Thus by Egorov's theorem the convergence is uniform outside a set $E_\varepsilon$ with $m(E_\varepsilon) < \varepsilon$,
since $|h| \le \rho < r_\varepsilon$ by construction, and $|z(s_0 + h) - h| < \varepsilon|h|$ by (14). See Figure 11. Thus $(\Gamma_j)^\delta$ is contained in the rectangle
We now sum (15) over the good intervals (of which there are at most $L/\rho + 1$), and (16) over the bad intervals. There are at most $\varepsilon/\rho + 1$ of the latter kind, since their union is included in $E_\varepsilon$ and this set has measure $< \varepsilon$. Altogether, then,
\[
  m(\Gamma^\delta) \le 2\delta L + 2\delta\rho + O(\varepsilon\delta + \delta^2/\rho + \varepsilon\rho) + O\bigl( (\varepsilon/\rho + 1)(\delta^2 + \rho^2) \bigr),
\]
where in the last line we have used the fact that $\varepsilon < 1$ and $\rho < 1$. In order to obtain a favorable estimate from this as $\delta \to 0$, we need to choose $\rho$ (the length of the sub-intervals) very roughly of the same size as $\delta$. An effective choice is $\rho = \delta/\varepsilon^{1/2}$. If we fix this choice and restrict our attention to $\delta$ for which $0 < \delta < \varepsilon^{1/2} r_\varepsilon$, then automatically $\rho < r_\varepsilon$, as required by (14). Inserting $\rho = \delta/\varepsilon^{1/2}$ in the above inequality gives
\[
  \frac{m(\Gamma^\delta)}{2\delta} \le L + O\Bigl( \frac{\delta}{\varepsilon^{1/2}} + \varepsilon + \varepsilon^{1/2} + \frac{\delta}{\varepsilon} \Bigr),
\]
and thus
\[
  \limsup_{\delta \to 0} \frac{m(\Gamma^\delta)}{2\delta} \le L + O(\varepsilon + \varepsilon^{1/2}).
\]
and that this union is disjoint. Moreover, if D(δ) is the open ball (disc)
of radius δ centered at the origin, D(δ) = {x ∈ R2 , |x| < δ}, then clearly
\[
  \begin{cases} \Omega_+(\delta) \supset \Omega + D(\delta), \\ \Omega \supset \Omega_-(\delta) + D(\delta). \end{cases} \tag{18}
\]
Similarly, $m(\Omega) \ge m(\Omega_-(\delta)) + 2\pi^{1/2}\delta\, m(\Omega_-(\delta))^{1/2}$ using the second inclusion in (18), which implies
Now by (17)
4π m(Ω) ≤ M∗ (Γ)2 .
5 Exercises
1. Suppose $\varphi$ is an integrable function on $\mathbb{R}^d$ with $\int_{\mathbb{R}^d} \varphi(x)\,dx = 1$. Set $K_\delta(x) = \delta^{-d}\varphi(x/\delta)$, $\delta > 0$.
(c) Show that Theorem 2.3 (convergence in the L1 -norm) holds for good kernels
as well.
Thus Kδ satisfies conditions (i) and (ii) of approximations to the identity, but the
average value of Kδ is 0 instead of 1. Show that if f is integrable on Rd , then
3. Suppose 0 is a point of (Lebesgue) density of the set $E \subset \mathbb{R}$. Show that for each of the individual conditions below there is an infinite sequence of points $x_n \in E$, with $x_n \ne 0$, and $x_n \to 0$ as $n \to \infty$.
Generalize.
\[
  f^*(x) \ge \frac{c}{|x|^d}, \quad \text{for some } c > 0 \text{ and all } |x| \ge 1.
\]
Conclude that f ∗ is not integrable on Rd . Then, show that the weak type estimate
\[
  f^*(x) \ge \frac{c}{|x|\,(\log 1/|x|)} \quad \text{for some } c > 0 \text{ and all } |x| \le 1/2,
\]
6. In one dimension there is a version of the basic inequality (1) for the maximal function in the form of an identity. We define the “one-sided” maximal function
\[
  f_+^*(x) = \sup_{h > 0} \frac{1}{h} \int_x^{x+h} |f(y)|\,dy.
\]
If $E_\alpha^+ = \{x \in \mathbb{R} : f_+^*(x) > \alpha\}$, then
\[
  m(E_\alpha^+) = \frac{1}{\alpha} \int_{E_\alpha^+} |f(y)|\,dy.
\]
[Hint: Apply Lemma 3.5 to $F(x) = \int_0^x |f(y)|\,dy - \alpha x$. Then $E_\alpha^+$ is the union of disjoint intervals $(a_k, b_k)$ with $\int_{a_k}^{b_k} |f(y)|\,dy = \alpha(b_k - a_k)$.]
5. Exercises 147
9. Let F be a closed subset in R, and δ(x) the distance from x to F , that is,
Prove that f is of bounded variation in [0, 1] if and only if a > b. Then, by tak-
ing a = b, construct (for each 0 < α < 1) a function that satisfies the Lipschitz
condition of exponent α
12. Consider the function $F(x) = x^2 \sin(1/x^2)$, $x \ne 0$, with $F(0) = 0$. Show that $F'(x)$ exists for every $x$, but $F'$ is not integrable on $[-1, 1]$.
13. Show directly from the definition that the Cantor-Lebesgue function is not
absolutely continuous.
\[
  D^+(F)(x) = \limsup_{\substack{h \to 0 \\ h > 0}} \frac{F(x + h) - F(x)}{h}
\]
is measurable.
(b) Suppose $J(x) = \sum_{n=1}^{\infty} \alpha_n j_n(x)$ is a jump function as in Section 3.3. Show that
\[
  \limsup_{h \to 0} \frac{J(x + h) - J(x)}{h}
\]
is measurable.
17. Prove that if $\{K_\varepsilon\}_{\varepsilon > 0}$ is a family of approximations to the identity, then
18. Verify the agreement between the two definitions given for the Cantor-Lebesgue
function in Exercise 2, Chapter 1 and in Section 3.1 of this chapter.
20. This exercise deals with functions F that are absolutely continuous on [a, b]
and are increasing. Let A = F (a) and B = F (b).
(a) There exists such an $F$ that is in addition strictly increasing, but such that $F'(x) = 0$ on a set of positive measure.
(b) The $F$ in (a) can be chosen so that there is a measurable subset $E \subset [A, B]$, $m(E) = 0$, so that $F^{-1}(E)$ is not measurable.
(c) Prove, however, that for any increasing absolutely continuous $F$, and $E$ a measurable subset of $[A, B]$, the set $F^{-1}(E) \cap \{F'(x) > 0\}$ is measurable.
[Hint: (a) Let $F(x) = \int_a^x \chi_K(x)\,dx$, where $K$ is the complement of a Cantor-like set $C$ of positive measure. For (b), note that $F(C)$ is a set of measure zero. Finally, for (c) prove first that $m(O) = \int_{F^{-1}(O)} F'(x)\,dx$ for any open set $O$.]
21. Let F be absolutely continuous and increasing on [a, b] with F (a) = A and
F (b) = B. Suppose f is any measurable function on [A, B].
(a) Show that $f(F(x))F'(x)$ is measurable on $[a, b]$. Note: $f(F(x))$ need not be measurable by Exercise 20 (b).
(b) Prove the change of variable formula: If $f$ is integrable on $[A, B]$, then so is $f(F(x))F'(x)$, and
\[
  \int_A^B f(y)\,dy = \int_a^b f(F(x))F'(x)\,dx.
\]
[Hint: Start with the identity $m(O) = \int_{F^{-1}(O)} F'(x)\,dx$ used in (c) of Exercise 20 above.]
22. Suppose that F and G are absolutely continuous on [a, b]. Show that their
product F G is also absolutely continuous. This has the following consequences.
(b) Let $F$ be absolutely continuous in $[-\pi, \pi]$ with $F(\pi) = F(-\pi)$. Show that if
\[
  a_n = \frac{1}{2\pi} \int_{-\pi}^{\pi} F(x)e^{-inx}\,dx,
\]
such that $F(x) \sim \sum a_n e^{inx}$, then
\[
  F'(x) \sim \sum i n a_n e^{inx}.
\]
(a) Suppose $(D^+F)(x) \ge 0$ for every $x \in [a, b]$. Then $F$ is increasing on $[a, b]$.
(b) If $F'(x)$ exists for every $x \in (a, b)$ and $|F'(x)| \le M$, then $|F(x) - F(y)| \le M|x - y|$ and $F$ is absolutely continuous.
[Hint: For (a) it suffices to show that $F(b) - F(a) \ge 0$. Assume otherwise. Hence with $G_\varepsilon(x) = F(x) - F(a) + \varepsilon(x - a)$, for sufficiently small $\varepsilon > 0$ we have $G_\varepsilon(a) = 0$, but $G_\varepsilon(b) < 0$. Now let $x_0 \in [a, b)$ be the greatest value of $x_0$ such that $G_\varepsilon(x_0) \ge 0$. However, $(D^+G_\varepsilon)(x_0) > 0$.]
\[
  F = F_A + F_C + F_J,
\]
25. The following shows the necessity of allowing for general exceptional sets of
measure zero in the differentiation Theorems 1.4, 3.4, and 3.11. Let E be any set
of measure zero in Rd . Show that:
26. An alternative way of defining the exterior measure $m_*(E)$ of an arbitrary set $E$, as given in Section 2 of Chapter 1, is to replace the coverings of $E$ by cubes with coverings by balls. That is, suppose we define $m_*^B(E)$ as $\inf \sum_{j=1}^{\infty} m(B_j)$, where the infimum is taken over all coverings $E \subset \bigcup_{j=1}^{\infty} B_j$ by open balls. Then $m_*(E) = m_*^B(E)$. (Observe that this result leads to an alternate proof that the Lebesgue measure is invariant under rotations.)
Clearly $m_*(E) \le m_*^B(E)$. Prove the reverse inequality by showing the following. For any $\varepsilon > 0$, there is a collection of balls $\{B_j\}$ such that $E \subset \bigcup_j B_j$ while $\sum_j m(B_j) \le m_*(E) + \varepsilon$. Note also that for any preassigned $\delta$, we can choose the balls to have diameter $< \delta$.
[Hint: Assume first that $E$ is measurable, and pick $O$ open so that $O \supset E$ and $m(O - E) < \varepsilon'$. Next, using Corollary 3.10, find balls $B_1, \dots, B_N$ such that $\sum_{j=1}^{N} m(B_j) \le m(E) + 2\varepsilon'$ and $m(E - \bigcup_{j=1}^{N} B_j) \le 3\varepsilon'$. Finally, cover $E - \bigcup_{j=1}^{N} B_j$ by a union of cubes, the sum of whose measures is $\le 4\varepsilon'$, and replace these cubes by balls that contain them. For the general $E$, begin by applying the above when $E$ is a cube.]
27. A rectifiable curve has a tangent line at almost all points of the curve. Make
this statement precise.
(a) State and prove the analogues of the conditions dealing with the rectifiability
of curves and their length that are given in Theorems 3.1, 4.1, and 4.3.
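\[
  \frac{m(K^\delta)}{m_{d-1}(B(\delta))} \quad \text{as } \delta \to 0,
\]
where $m_{d-1}(B(\delta))$ is the measure (in $\mathbb{R}^{d-1}$) of the ball defined by $B(\delta) = \{x \in \mathbb{R}^{d-1},\ |x| < \delta\}$. State and prove analogues of Propositions 4.5 and 4.7 for curves in $\mathbb{R}^d$.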
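(a) $\int_{\mathbb{R}} |F(x + h) - F(x)|\,dx \le A|h|$, for some constant $A$ and all $h \in \mathbb{R}$.
(b) $\left| \int_{\mathbb{R}} F(x)\varphi'(x)\,dx \right| \le A$, where $\varphi$ ranges over all $C^1$ functions of bounded support with $\sup_{x \in \mathbb{R}} |\varphi(x)| \le 1$.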
31. Let F be the Cantor-Lebesgue function described in Section 3.1. Consider the
curve that is the graph of F , that is, the curve given by x(t) = t and y(t) = F (t)
with 0 ≤ t ≤ 1. Prove that the length L(x) of the segment 0 ≤ t ≤ x of the curve
is given by L(x) = x + F (x). Hence the total length of the curve is 2.
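The claimed total length can be approached numerically through the polygonal lengths used to define $L(\gamma)$. The sketch below evaluates the Cantor-Lebesgue function exactly at the points $k/3^n$ by reading ternary digits, then sums the chord lengths. The helper names `cantor_lebesgue` and `polygonal_length`, and the digit-based evaluation itself, are assumptions made here for illustration; this is evidence for the exercise, not a solution of it.

```python
from fractions import Fraction
import math


def cantor_lebesgue(x, depth=40):
    """Cantor-Lebesgue function at a rational x in [0, 1], read off ternary digits.

    Exact when the ternary expansion of x terminates within `depth` digits.
    """
    x = Fraction(x)
    if x >= 1:
        return Fraction(1)
    value, scale = Fraction(0), Fraction(1, 2)
    for _ in range(depth):
        x *= 3
        digit = int(x)          # next ternary digit (0, 1, or 2)
        x -= digit
        if digit == 1:          # x lies in a closed middle third, where F is constant
            return value + scale
        if digit == 2:
            value += scale
        scale /= 2
    return value


def polygonal_length(level):
    """Length of the polygonal line through (t, F(t)) at t = k / 3**level."""
    n = 3 ** level
    points = [(Fraction(k, n), cantor_lebesgue(Fraction(k, n))) for k in range(n + 1)]
    return sum(
        math.hypot(float(t1 - t0), float(y1 - y0))
        for (t0, y0), (t1, y1) in zip(points, points[1:])
    )


lengths = [polygonal_length(level) for level in (2, 4, 6)]
print(lengths)  # increasing, and bounded above by 2
```

At level $n$ the polygon consists of horizontal pieces of total length $1 - (2/3)^n$ and $2^n$ rising chords, so its length is $1 - (2/3)^n + 2^n\sqrt{3^{-2n} + 2^{-2n}}$, which tends to 2.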
|f (x) − f (y)| ≤ M |x − y|
for some M and all x, y ∈ R, if and only if f satisfies the following two properties:
6 Problems
\[
  m_*\Bigl( E - \bigcup_{j=1}^{\infty} B_j \Bigr) = 0 \quad \text{and} \quad \sum_{j=1}^{\infty} |B_j| \le (1 + \eta)\, m_*(E).
\]
j=1 j=1
\[
  \bigcup_{j=1}^{N} I_j = \bigcup_{k=1}^{K} I'_k \;\cup\; \bigcup_{\ell=1}^{L} I''_\ell.
\]
Note that, in contrast with Lemma 1.2, the full union is covered and not merely a
part.
[Hint: Choose $I'_1$ to be an interval whose left end-point is as far left as possible. Discard all intervals contained in $I'_1$. If the remaining intervals are disjoint from $I'_1$, select again an interval as far to the left as possible, and call it $I'_2$. Otherwise choose an interval that intersects $I'_1$, but reaches out to the right as far as possible, and call this interval $I''_1$. Repeat this procedure.]
for every x1 , x2 ∈ (a, b) and 0 ≤ θ ≤ 1. One can also observe as a consequence that
we have the following inequality of the slopes:
\[
  \frac{\varphi(x + h) - \varphi(x)}{h} \le \frac{\varphi(y) - \varphi(x)}{y - x} \le \frac{\varphi(y) - \varphi(y - h)}{h},
\]
whenever x < y, h > 0, and x + h < y.
The following can then be proved.
(a) ϕ is continuous on (a, b).
(b) $\varphi$ satisfies a Lipschitz condition of order 1 in any proper closed sub-interval $[a', b']$ of $(a, b)$. Hence $\varphi$ is absolutely continuous in each sub-interval.
(c) $\varphi'$ exists at all but an at most denumerable number of points, and $\varphi' = D^+\varphi$ is an increasing function with
\[
  \varphi(y) - \varphi(x) = \int_x^y \varphi'(t)\,dt.
\]
(d) Conversely, if $\psi$ is any increasing function on $(a, b)$, then $\varphi(x) = \int_c^x \psi(t)\,dt$ is a convex function in $(a, b)$ (for $c \in (a, b)$).
5. Suppose that $F$ is continuous on $[a, b]$, $F'(x)$ exists for every $x \in (a, b)$, and $F'(x)$ is integrable. Then $F$ is absolutely continuous and
\[
  F(b) - F(a) = \int_a^b F'(x)\,dx.
\]
[Hint: Assume $F'(x) \ge 0$ for a.e. $x$. We want to conclude that $F(b) \ge F(a)$. Let $E$ be the set of measure 0 of those $x$ such that $F'(x) < 0$. Then according to Exercise 25, there is a function $\Phi$ which is increasing, absolutely continuous, and for which $D^+\Phi(x) = \infty$, $x \in E$. Consider $F + \delta\Phi$, for each $\delta$ and apply the result (a) in Exercise 23.]
for all $\varphi \in C^1$ that have bounded support, and for which $\sup_{x \in \mathbb{R}^d} |\varphi(x)| \le 1$.
The class of functions that satisfy either (a$'$) or (b$'$) is the extension to $\mathbb{R}^d$ of the class of functions of bounded variation.
(a) Prove that $f_1$ satisfies $|f_1(x) - f_1(y)| \le A_\alpha |x - y|^\alpha$ for each $0 < \alpha < 1$.
8.* Let $\mathcal{R}$ denote the set of all rectangles in $\mathbb{R}^2$ that contain the origin, and with sides parallel to the coordinate axis. Consider the maximal operator associated to this family, namely
\[
  f^*_{\mathcal{R}}(x) = \sup_{R \in \mathcal{R}} \frac{1}{m(R)} \int_R |f(x - y)|\,dy.
\]
(a) Then, $f \mapsto f^*_{\mathcal{R}}$ does not satisfy the weak type inequality
\[
  m(\{x : f^*_{\mathcal{R}}(x) > \alpha\}) \le \frac{A}{\alpha}\, \|f\|_{L^1}
\]
(b) Using this, one can show that there exists $f \in L^1(\mathbb{R})$ so that for $R \in \mathcal{R}$
\[
  \limsup_{\mathrm{diam}(R) \to 0} \frac{1}{m(R)} \int_R f(x - y)\,dy = \infty \quad \text{for almost every } x.
\]
[Hint: For part (a), let $B$ be the unit ball, and consider the function $\varphi(x) = \chi_B(x)/m(B)$. For $\delta > 0$, let $\varphi_\delta(x) = \delta^{-2}\varphi(x/\delta)$. Then
\[
  (\varphi_\delta)^*_{\mathcal{R}}(x) \to \frac{1}{|x_1|\,|x_2|} \quad \text{as } \delta \to 0,
\]
for every $(x_1, x_2)$, with $x_1 x_2 \ne 0$. If the weak type inequality held, then we would have
\[
  m(\{|x| \le 1 : |x_1 x_2|^{-1} > \alpha\}) \le \frac{A}{\alpha}.
\]
This is a contradiction since the left-hand side is of the order of $(\log \alpha)/\alpha$ as $\alpha$ tends to infinity.]
4 Hilbert Spaces: An
Introduction
There are two reasons that account for the importance of Hilbert
spaces. First, they arise as the natural infinite-dimensional generaliza-
tions of Euclidean spaces, and as such, they enjoy the familiar properties
of orthogonality, complemented by the important feature of complete-
ness. Second, the theory of Hilbert spaces serves both as a conceptual
framework and as a language that formulates some basic arguments in
analysis in a more abstract setting.
For us the immediate link with integration theory occurs because of
the example of the Lebesgue space L2 (Rd ). The related example of
L2 ([−π, π]) is what connects Hilbert spaces with Fourier series. The
latter Hilbert space can also be used in an elegant way to analyze the
boundary behavior of bounded holomorphic functions in the unit disc.
A basic aspect of the theory of Hilbert spaces, as in the familiar finite-
dimensional case, is the study of their linear transformations. Given the
introductory nature of this chapter, we limit ourselves to rather brief
discussions of several classes of such operators: unitary mappings, pro-
jections, linear functionals, and compact operators.
The reader should compare those definitions with these for the space
L1 (Rd ) of integrable functions and its norm that were described in Sec-
tion 2, Chapter 2. A crucial difference is that L2 has an inner product,
which L1 does not. Some relative inclusion relations between those spaces
are taken up in Exercise 5.
The space $L^2(\mathbb{R}^d)$ is naturally equipped with the following inner product:
\[
  (f, g) = \int_{\mathbb{R}^d} f(x)\,\overline{g(x)}\,dx, \quad \text{whenever } f, g \in L^2(\mathbb{R}^d),
\]
therefore
\[
  \int |f + g|^2 \le 4 \int |f|^2 + 4 \int |g|^2 < \infty,
\]
|(f˜, g̃)| ≤ 1.
\[
\begin{aligned}
  \|f + g\|^2 &= (f + g, f + g) \\
    &= \|f\|^2 + (f, g) + (g, f) + \|g\|^2 \\
    &\le \|f\|^2 + 2\,|(f, g)| + \|g\|^2 \\
    &\le \|f\|^2 + 2\,\|f\|\,\|g\| + \|g\|^2 \\
    &= (\|f\| + \|g\|)^2,
\end{aligned}
\]
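Each step of this chain can be checked numerically in the finite-dimensional analogue $\mathbb{C}^N$, where the integral defining the inner product becomes a finite sum. The sketch below is only such an analogue; the vectors `f` and `g` and the helper names `inner` and `norm` are assumptions made for illustration.

```python
import math


def inner(f, g):
    """Inner product on C^N: linear in the first slot, conjugate-linear in the second."""
    return sum(a * b.conjugate() for a, b in zip(f, g))


def norm(f):
    return math.sqrt(inner(f, f).real)


f = [1 + 2j, -3j, 0.5 + 0j]
g = [2 - 1j, 1 + 1j, -4 + 0j]
f_plus_g = [a + b for a, b in zip(f, g)]

# Expansion of ||f + g||^2 used in the first two lines of the chain.
expanded = norm(f) ** 2 + (inner(f, g) + inner(g, f)).real + norm(g) ** 2

print(norm(f_plus_g) ** 2, expanded)           # equal up to rounding
print(abs(inner(f, g)), norm(f) * norm(g))     # Cauchy-Schwarz: first <= second
print(norm(f_plus_g), norm(f) + norm(g))       # triangle inequality: first <= second
```

Note that $(f, g) + (g, f) = 2\,\mathrm{Re}(f, g)$ is real, which is why only the real part is taken in `expanded`.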
and
\[
  g(x) = |f_{n_1}(x)| + \sum_{k=1}^{\infty} |f_{n_{k+1}}(x) - f_{n_k}(x)|,
\]
\[
  S_K(f)(x) = f_{n_1}(x) + \sum_{k=1}^{K} (f_{n_{k+1}}(x) - f_{n_k}(x))
\]
and
\[
  S_K(g)(x) = |f_{n_1}(x)| + \sum_{k=1}^{K} |f_{n_{k+1}}(x) - f_{n_k}(x)|,
\]
160 Chapter 4. HILBERT SPACES: AN INTRODUCTION
Theorem 1.3 The space $L^2(\mathbb{R}^d)$ is separable, in the sense that there exists a countable collection $\{f_k\}$ of elements in $L^2(\mathbb{R}^d)$ such that their linear combinations are dense in $L^2(\mathbb{R}^d)$.
Proof. Consider the family of functions of the form $r\chi_R(x)$, where $r$ is a complex number with rational real and imaginary parts, and $R$ is a rectangle in $\mathbb{R}^d$ with rational coordinates. We claim that finite linear combinations of functions of this type are dense in $L^2(\mathbb{R}^d)$.
Suppose $f \in L^2(\mathbb{R}^d)$ and let $\varepsilon > 0$. Consider for each $n \ge 1$ the function $g_n$ defined by
\[
  g_n(x) = \begin{cases} f(x) & \text{if } |x| \le n \text{ and } |f(x)| \le n, \\ 0 & \text{otherwise.} \end{cases}
\]
2. Hilbert spaces 161
2 Hilbert spaces
A set H is a Hilbert space if it satisfies the following:
1 By definition $f \in L^2(\mathbb{R}^d)$ implies that $|f|^2$ is integrable, hence $f(x)$ is finite for a.e. $x$.
2 At this stage we consider both cases, where the scalar field can be either $\mathbb{C}$ or $\mathbb{R}$. However, in many applications, such as in the context of Fourier analysis, one deals primarily with Hilbert spaces over $\mathbb{C}$.
for all f, g ∈ H.
(vi) H is separable.
\[ \mathbb{C}^N = \{(a_1, \dots, a_N) : a_k \in \mathbb{C}\} \]
One can formulate in the same way the real Hilbert space RN .
\[ (a, b) = \sum_{k=-\infty}^{\infty} a_k \overline{b_k} \quad \text{and} \quad \|a\| = \left( \sum_{k=-\infty}^{\infty} |a_k|^2 \right)^{1/2}. \]
The inner product and norm are then defined in the same way with the
sums extending from n = 1 to ∞.
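As a small numerical illustration of the inner product and norm just described, the following Python sketch works with finitely supported sequences, which certainly belong to $\ell^2$. The helper names `inner` and `norm` and the sample sequences are assumptions made for this example only; the inner product is taken to be linear in the first variable and conjugate linear in the second.

```python
import math

def inner(a, b):
    # (a, b) = sum_k a_k * conj(b_k): linear in a, conjugate linear in b
    return sum(complex(x) * complex(y).conjugate() for x, y in zip(a, b))

def norm(a):
    # ||a|| = (sum_k |a_k|^2)^(1/2)
    return math.sqrt(sum(abs(x) ** 2 for x in a))

# Two finitely supported sequences (all remaining entries are zero).
a = [1 + 2j, 0.5, -1j, 3]
b = [2, 1j, 1 - 1j, -0.25]

# Cauchy-Schwarz and the triangle inequality for this particular pair.
cauchy_schwarz_holds = abs(inner(a, b)) <= norm(a) * norm(b)
triangle_holds = norm([x + y for x, y in zip(a, b)]) <= norm(a) + norm(b)
```

Checking one pair of sequences proves nothing, of course; the sketch only makes the definitions concrete.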
A characteristic feature of a Hilbert space is the notion of orthogo-
nality. This aspect, with its rich geometric and analytic consequences,
distinguishes Hilbert spaces from other normed vector spaces. We now
describe some of these properties.
2.1 Orthogonality
Two elements f and g in a Hilbert space H with inner product (·, ·) are
orthogonal or perpendicular if
(f, ej ) = aj .
Proof. We prove that each property implies the next, with the last
one implying the first.
We begin by assuming (i). Given f ∈ H with (f, ej ) = 0 for all j, we
wish to prove that f = 0. By assumption, there exists a sequence {gn }
of elements in H that are finite linear combinations of elements in {ek },
and such that kf − gn k tends to 0 as n goes to infinity. Since (f, ej ) = 0
for all j, we must have (f, gn ) = 0 for all n; therefore an application of
the Cauchy-Schwarz inequality gives
\[ S_N(f) = \sum_{k=1}^{N} a_k e_k, \quad \text{where } a_k = (f, e_k), \]
(f − g, ej ) = 0 for all j.
Hence $f = g$ by assumption (ii), and we have proved that $f = \sum_{k=1}^{\infty} a_k e_k$.
Now assume that (iii) holds. Observe from (2) that we immediately
get in the limit as N goes to infinity
\[ \|f\|^2 = \sum_{k=1}^{\infty} |a_k|^2. \]
The first step in the proof of this fact is to recall that (by definition)
a Hilbert space H is separable. Hence, we may choose a countable col-
lection of elements F = {hk } in H so that finite linear combinations of
elements in F are dense in H.
We start by recalling a definition already used in the case of finite-
dimensional vector spaces. Finitely many elements g1 , . . . , gN are said to
be linearly independent if whenever
take $e_{k+1} = e'_{k+1}/\|e'_{k+1}\|$ to complete the inductive step. With this we
have found an orthonormal basis for $H$.
Note that we have implicitly assumed that the number of linearly in-
dependent elements f1 , f2 , . . . is infinite. In the case where there are only
N linearly independent vectors f1 , . . . , fN , then e1 , . . . , eN constructed
in the same way also provide an orthonormal basis for H. These two
cases are differentiated in the following definition. If H is a Hilbert space
with an orthonormal basis consisting of finitely many elements, then we
say that H is finite-dimensional. Otherwise H is said to be infinite-
dimensional.
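The inductive construction above (the Gram-Schmidt process) can be sketched numerically in $\mathbb{C}^N$. This is only an illustration under stated assumptions: vectors are plain Python lists, the names `inner` and `gram_schmidt` are hypothetical, and the tolerance `tol` used to discard a vector that depends linearly on the earlier ones is a choice made for this sketch rather than part of the text.

```python
import math

def inner(f, g):
    # (f, g) = sum_k f_k * conj(g_k)
    return sum(complex(x) * complex(y).conjugate() for x, y in zip(f, g))

def gram_schmidt(vectors, tol=1e-12):
    basis = []
    for f in vectors:
        # e' = f - sum_j (f, e_j) e_j is orthogonal to every e_j found so far
        e_prime = [complex(x) for x in f]
        for e in basis:
            c = inner(f, e)
            e_prime = [x - c * y for x, y in zip(e_prime, e)]
        length = math.sqrt(inner(e_prime, e_prime).real)
        if length > tol:  # skip f if it depends linearly on earlier vectors
            basis.append([x / length for x in e_prime])
    return basis

# Three linearly independent vectors in C^3 give an orthonormal basis of C^3.
es = gram_schmidt([[1, 1, 0], [1, 0, 1], [0, 1, 1]])
```

In the finite-dimensional case the loop simply stops producing new vectors once the span is exhausted, which mirrors the distinction between the two cases drawn above.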
To see this, it suffices to “polarize,” that is, to note that for any vector
space (say over C) with inner product (·, ·) and norm k · k, we have
\[ (F, G) = \frac{1}{4}\left[ \|F+G\|^2 - \|F-G\|^2 + i\left( \left\|\frac{F}{i} + G\right\|^2 - \left\|\frac{F}{i} - G\right\|^2 \right) \right] \]
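The polarization identity recovers the inner product from the norm alone. A quick numerical sanity check in $\mathbb{C}^3$ might look as follows; the function names and the sample vectors are assumptions of this sketch, and the inner product is again linear in the first variable and conjugate linear in the second.

```python
def inner(f, g):
    # (f, g) = sum_k f_k * conj(g_k)
    return sum(complex(x) * complex(y).conjugate() for x, y in zip(f, g))

def norm_sq(f):
    return inner(f, f).real

def polarization(F, G):
    # (1/4) [ |F+G|^2 - |F-G|^2 + i ( |F/i + G|^2 - |F/i - G|^2 ) ]
    plus = lambda u, v: [x + y for x, y in zip(u, v)]
    minus = lambda u, v: [x - y for x, y in zip(u, v)]
    F_over_i = [complex(x) / 1j for x in F]
    real_part = norm_sq(plus(F, G)) - norm_sq(minus(F, G))
    imag_part = norm_sq(plus(F_over_i, G)) - norm_sq(minus(F_over_i, G))
    return 0.25 * (real_part + 1j * imag_part)

F = [1 + 2j, -0.5j, 3]
G = [2 - 1j, 4, 1j]
difference = abs(polarization(F, G) - inner(F, G))  # should be at rounding level
```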
\[ U(f) = g, \quad \text{where } g = \sum_{k=1}^{\infty} a_k e'_k. \]
represented by $\{f_n^k\}_{n=1}^{\infty}$, $f_n^k \in H_0$. If we define $F \in H$ as represented by
the sequence $\{f_n\}$ with $f_n = f^n_{N(n)}$, where $N(n)$ is so that $\|f^n_{N(n)} - f^n_j\| \le 1/n$
for $j \ge N(n)$, then we note that $F^k \to F$ in $H$.
One can also observe that the completion H of H0 is unique up to
isomorphism. (See Exercise 14.)
to indicate that the sum on the right is the Fourier series of the func-
tion on the left. The theory developed thus far provides the natural
generalization of some earlier results obtained in Book I.
Theorem 3.1 Suppose f is integrable on [−π, π].
(i) If an = 0 for all n, then f (x) = 0 for a.e. x.
(ii) $\sum_{n=-\infty}^{\infty} a_n r^{|n|} e^{inx}$ tends to $f(x)$ for a.e. $x$, as $r \to 1$, $r < 1$.
The second conclusion is the almost everywhere “Abel summability” to
$f$ of its Fourier series. Note that since $|a_n| \le \frac{1}{2\pi}\int_{-\pi}^{\pi} |f(x)|\,dx$, the series
$\sum a_n r^{|n|} e^{inx}$ converges absolutely and uniformly for each $r$, $0 \le r < 1$.
Proof. The first conclusion is an immediate consequence of the second.
To prove the latter we recall the identity
\[ \sum_{n=-\infty}^{\infty} r^{|n|} e^{iny} = P_r(y) = \frac{1 - r^2}{1 - 2r\cos y + r^2} \]
for the Poisson kernel; see Book I, Chapter 2. Starting with our given
f ∈ L1 ([−π, π]) we extend it as a function on R by making it periodic of
period 2π.3 We then claim that for every x
\[ \sum_{n=-\infty}^{\infty} a_n r^{|n|} e^{inx} = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x-y)P_r(y)\,dy. \tag{3} \]
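The closed form of the Poisson kernel recalled above can be compared numerically with a truncation of the series $\sum r^{|n|} e^{iny}$. The sketch below is an illustration only; the function names and the truncation level `N` are assumptions, chosen so that the neglected geometric tail is negligible for the sample values of $r$.

```python
import cmath
import math

def poisson_kernel(r, y):
    # closed form (1 - r^2) / (1 - 2 r cos y + r^2)
    return (1 - r * r) / (1 - 2 * r * math.cos(y) + r * r)

def poisson_series(r, y, N=200):
    # symmetric partial sum of r^{|n|} e^{iny} over |n| <= N
    return sum(r ** abs(n) * cmath.exp(1j * n * y) for n in range(-N, N + 1))

r, y = 0.9, 1.0
gap = abs(poisson_series(r, y) - poisson_kernel(r, y))
```

For $r$ close to 1 the series converges slowly, so a larger `N` would be needed to reach the same accuracy.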
3 Note that we may without loss of generality assume that f (π) = f (−π) so as to make
To apply the previous results, we let $H = L^2([-\pi, \pi])$ with inner product
$(f, g) = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x)\overline{g(x)}\,dx$, and take the orthonormal set $\{e_k\}_{k=1}^{\infty}$
to be the exponentials $\{e^{inx}\}_{n=-\infty}^{\infty}$, with $k = 1$ when $n = 0$, $k = 2n$ for
$n > 0$, and $k = 2|n| + 1$ for $n < 0$.
By the previous result, assertion (ii) of Theorem 2.3 holds and thus
all the other conclusions hold. We therefore have Parseval’s relation,
and from (iv) we conclude that $\|f - S_N(f)\|^2 = \sum_{|n|>N} |a_n|^2 \to 0$ as
$N \to \infty$. Similarly, if $\{a_n\} \in \ell^2(\mathbb{Z})$ is given, then $\|S_N(f) - S_M(f)\|^2 \to 0$,
as $N, M \to \infty$. Hence the completeness of $L^2$ guarantees that there is
an $f \in L^2$ such that $\|f - S_N(f)\| \to 0$, and one verifies directly that $f$
3. Fourier series and Fatou’s theorem 173
has $\{a_n\}$ as its Fourier coefficients. Thus we deduce that the mapping
$f \mapsto \{a_n\}$ is onto and hence unitary. This is a key conclusion that holds
in the setting of $L^2$ and was not valid in an earlier context of Riemann
integrable functions. In fact the space $\mathcal{R}$ of such functions on $[-\pi, \pi]$ is
not complete in the norm, containing as it does the continuous functions,
but $\mathcal{R}$ is itself restricted to bounded functions.
\[ \lim_{\substack{r \to 1 \\ r < 1}} F(re^{i\theta}) \]
exists.
and the integral vanishes when n < 0. (See also Chapter 3, Section 7 in
Book II).
the origin and planes passing through the origin are the one-dimensional
and two-dimensional subspaces, respectively.
The subspace S is closed if whenever {fn } ⊂ S converges to some
f ∈ H, then f also belongs to S. In the case of finite-dimensional Hilbert
spaces, every subspace is closed. This is, however, not true in the gen-
eral case of infinite-dimensional Hilbert spaces. For instance, as we
have already indicated, the subspace of Riemann integrable functions
in L2 ([−π, π]) is not closed, nor is the subspace obtained by fixing a ba-
sis and taking all vectors that are finite linear combinations of these basis
elements. It is useful to note that every closed subspace S of H is itself a
Hilbert space, with the inner product on S that which is inherited from
H. (For the separability of S, see Exercise 11.)
Next, we show that a closed subspace enjoys an important character-
istic property of Euclidean geometry.
\[ \|f - g_0\| = \inf_{g \in S} \|f - g\|. \]
(f − g0 , g) = 0 for all g ∈ S.
\[ \|f - g_n\| \to d \quad \text{as } n \to \infty. \]
We claim that {gn } is a Cauchy sequence whose limit will be the desired
element g0 . In fact, it would suffice to show that a subsequence of {gn }
converges, and this is immediate in the finite-dimensional case because
a closed ball is compact. However, in general this compactness fails, as
we shall see in Section 6, and so a more intricate argument is needed at
this point.
To prove our claim, we use the parallelogram law, which states that
in a Hilbert space H
\[ \|A + B\|^2 + \|A - B\|^2 = 2\left[\|A\|^2 + \|B\|^2\right] \quad \text{for all } A, B \in H. \tag{4} \]
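The parallelogram law is special to norms that come from an inner product. The following Python sketch, in which the function names and sample vectors are assumptions made for illustration, measures the defect between the two sides of the law for the Euclidean norm on $\mathbb{R}^2$ and, for contrast, for the sum-of-absolute-values norm, which does not arise from an inner product.

```python
import math

def l2_norm(v):
    return math.sqrt(sum(abs(x) ** 2 for x in v))

def l1_norm(v):
    return sum(abs(x) for x in v)

def parallelogram_defect(norm, A, B):
    # |A+B|^2 + |A-B|^2 - 2(|A|^2 + |B|^2); zero exactly when the law holds
    s = [a + b for a, b in zip(A, B)]
    d = [a - b for a, b in zip(A, B)]
    return norm(s) ** 2 + norm(d) ** 2 - 2 * (norm(A) ** 2 + norm(B) ** 2)

A = [1.0, 0.0]
B = [0.0, 1.0]
defect_l2 = parallelogram_defect(l2_norm, A, B)  # zero up to rounding
defect_l1 = parallelogram_defect(l1_norm, A, B)  # 4 + 4 - 2 * (1 + 1) = 4
```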
\[ \|2f - (g_n + g_m)\| = 2\left\|f - \tfrac{1}{2}(g_n + g_m)\right\| \ge 2d. \]
Therefore
\[ \begin{aligned}
\|g_m - g_n\|^2 &= 2\left[\|f - g_n\|^2 + \|f - g_m\|^2\right] - \|2f - (g_n + g_m)\|^2 \\
&\le 2\left[\|f - g_n\|^2 + \|f - g_m\|^2\right] - 4d^2.
\end{aligned} \]
\[ \|f - (g_0 - \varepsilon g)\|^2 \ge \|f - g_0\|^2. \]
4. Closed subspaces and orthogonal pro jections 177
H = S ⊕ S ⊥.
f = g0 + (f − g0 ).
f = g + h = g̃ + h̃ where g, g̃ ∈ S and h, h̃ ∈ S ⊥ .
PS (f ) = g, where f = g + h and g ∈ S, h ∈ S ⊥ .
Example 2. Once again, consider L2 ([−π, π]) and let S denote the
subspace that consists of all F ∈ L2 ([−π, π]) with
\[ F(\theta) \sim \sum_{n=0}^{\infty} a_n e^{in\theta}. \]
\[ P(f)(z) = \sum_{n=0}^{\infty} a_n z^n. \]
\[ C(f)(z) = \frac{1}{2\pi i}\int_{\gamma} \frac{f(\zeta)}{\zeta - z}\,d\zeta, \]
where γ denotes the unit circle and z belongs to the unit disc. Then we
have the identity
5 Linear transformations
The focus of analysis in Hilbert spaces is largely the study of their lin-
ear transformations. We have already encountered two classes of such
transformations, the unitary mappings and the orthogonal projections.
There are two other important classes we shall deal with in this chapter
in some detail: the “linear functionals” and the “compact operators,”
and in particular those that are symmetric.
Suppose H1 and H2 are two Hilbert spaces. A mapping T : H1 → H2
is a linear transformation (also called linear operator or operator)
if
\[ \|T\| = \inf M, \]
where the infimum is taken over all $M$ so that (6) holds. A trivial example
is given by the identity operator $I$, with $I(f) = f$. It is of course a
unitary operator and a projection, with $\|I\| = 1$.
\[ |(Tf', g')| \le M. \]
complex numbers,
\[ \ell : H \to \mathbb{C}. \]
\[ \ell(f) = (f, g) \]
\[ S = \{f \in H : \ell(f) = 0\}. \]
5.2 Adjoints
The first application of the Riesz representation theorem is to determine
the existence of the “adjoint” of a linear transformation.
Proposition 5.4 Let T : H → H be a bounded linear transformation.
There exists a unique bounded linear transformation T ∗ on H so that:
(i) (T f, g) = (f, T ∗ g),
(ii) kT k = kT ∗ k,
(iii) (T ∗ )∗ = T .
The linear operator T ∗ : H → H satisfying the above conditions is called
the adjoint of T .
To prove the existence of an operator satisfying (i) above, we observe
that for each fixed $g \in H$, the linear functional $\ell = \ell_g$, defined by
\[ \ell(f) = (Tf, g), \]
To prove (iii), note that (T f, g) = (f, T ∗ g) for all f and g if and only
if (T ∗ f, g) = (f, T g) for all f and g, as one can see by taking complex
conjugates and reversing the roles of f and g.
We record here a few additional remarks.
(a) In the special case when T = T ∗ (we say that T is symmetric), then
This should be compared to Lemma 5.1, which holds for any linear operator.
To establish (7), let $M = \sup\{|(Tf, f)| : \|f\| = 1\}$. By Lemma 5.1
it is clear that $M \le \|T\|$. Conversely, if $f$ and $g$ belong to $H$, then one
has the following “polarization” identity, which is easy to verify:
\[ (Tf, g) = \frac{1}{4}\big[(T(f+g), f+g) - (T(f-g), f-g) + i\,(T(f+ig), f+ig) - i\,(T(f-ig), f-ig)\big]. \]
\[ |\mathrm{Re}(Tf, g)| \le \frac{M}{2}\left[\|f\|^2 + \|g\|^2\right]. \]
So if $\|f\| \le 1$ and $\|g\| \le 1$, then $|\mathrm{Re}(Tf, g)| \le M$. In general, we may
replace $g$ by $e^{i\theta}g$ in the last inequality to find that whenever $\|f\| \le 1$ and
$\|g\| \le 1$, then $|(Tf, g)| \le M$, and invoking Lemma 5.1 once again gives
the result, $\|T\| \le M$.
(b) Let us note that if T and S are bounded linear transformations of H to
itself, then so is their product T S, defined by (T S)(f ) = T (S(f )). More-
over we have automatically (T S)∗ = S ∗ T ∗ ; in fact, (T Sf, g) = (Sf, T ∗ g) =
(f, S ∗ T ∗ g).
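In the finite-dimensional Hilbert space $\mathbb{C}^N$ the adjoint of a matrix is its conjugate transpose, so the defining relation $(Tf, g) = (f, T^*g)$ and the rule $(TS)^* = S^*T^*$ can be checked directly. The sketch below is only an illustration; the helper names and the sample matrices are assumptions, with matrices stored as lists of rows.

```python
def inner(f, g):
    # (f, g) = sum_k f_k * conj(g_k)
    return sum(complex(x) * complex(y).conjugate() for x, y in zip(f, g))

def apply(T, f):
    return [sum(row[j] * f[j] for j in range(len(f))) for row in T]

def adjoint(T):
    # conjugate transpose
    return [[complex(T[j][i]).conjugate() for j in range(len(T))]
            for i in range(len(T[0]))]

def compose(T, S):
    # matrix of f -> T(S(f))
    return [[sum(T[i][k] * S[k][j] for k in range(len(S)))
             for j in range(len(S[0]))] for i in range(len(T))]

T = [[1 + 1j, 2], [0, 3 - 1j]]
S = [[0, 1j], [1, 2]]
f = [1, 1j]
g = [2 - 1j, 1]

lhs = inner(apply(T, f), g)           # (Tf, g)
rhs = inner(f, apply(adjoint(T), g))  # (f, T* g)

left = adjoint(compose(T, S))             # (TS)*
right = compose(adjoint(S), adjoint(T))   # S* T*
product_rule_holds = all(abs(left[i][j] - right[i][j]) < 1e-12
                         for i in range(2) for j in range(2))
```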
(c) One can also exhibit a natural connection between linear transforma-
tions on a Hilbert space and their associated bilinear forms. Suppose first
that T is a bounded operator in H. Define the corresponding bilinear
form B by
5.3 Examples
Having presented the elementary facts about Hilbert spaces, we now
digress to describe briefly the background of some of the early develop-
ments of the theory. A motivating problem of considerable interest was
that of the study of the “eigenfunction expansion” of a differential oper-
ator L. A particular case, that of a Sturm-Liouville operator, arises on
an interval [a, b] of R with L defined by
\[ L = \frac{d^2}{dx^2} - q(x), \]
where q is a given real-valued function. The question is then that of
expanding an “arbitrary” function in terms of the eigenfunctions ϕ, that
is those functions that satisfy L(ϕ) = µϕ for some µ ∈ R. The classi-
cal example of this is that of Fourier series, where L = d2 /dx2 on the
interval [−π, π] with each exponential einx an eigenfunction of L with
eigenvalue µ = −n2 .
When made precise in the “regular” case, the problem for L can be
resolved by considering an associated “integral operator” T defined on
L2 ([a, b]) by
\[ T(f)(x) = \int_a^b K(x, y)f(y)\,dy, \]
LT (f ) = f.
It turns out that a key feature that makes the study of T tractable is
a certain compactness it enjoys. We now pass to the definitions and
elaboration of some of these ideas, and begin by giving two relevant
illustrations of classes of operators on Hilbert spaces.
{ϕk } if
• kT k = supk |λk |.
Uh (f )(x) = f (x + h)
6 Compact operators
We shall use the notion of sequential compactness in a Hilbert space H:
a set X ⊂ H is compact if for every sequence {fn } in X, there exists a
subsequence {fnk } that converges in the norm to an element in X.
Let H denote a Hilbert space, and B the closed unit ball in H,
B = {f ∈ H : kf k ≤ 1}.
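If $\{\varphi_k\}_{k=1}^{\infty}$ denotes an orthonormal basis for $L^2(\mathbb{R}^d)$, then the collection
$\{\varphi_k(x)\varphi_\ell(y)\}_{k,\ell \ge 1}$ is an orthonormal basis for $L^2(\mathbb{R}^d \times \mathbb{R}^d)$; the proof of
this simple fact is outlined in Exercise 7. As a result
\[ K(x, y) \sim \sum_{k,\ell=1}^{\infty} a_{k\ell}\,\varphi_k(x)\varphi_\ell(y), \quad \text{with } \sum_{k,\ell} |a_{k\ell}|^2 < \infty. \]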
We define an operator
\[ T_n f(x) = \int_{\mathbb{R}^d} K_n(x, y)f(y)\,dy, \quad \text{where } K_n(x, y) = \sum_{k,\ell=1}^{n} a_{k\ell}\,\varphi_k(x)\varphi_\ell(y). \]
T ϕk = λk ϕk ,
then λk ∈ R and λk → 0 as k → ∞.
where we have used in the last equality the fact that the inner product is
conjugate linear in the second variable. Since $f \ne 0$, we must have $\lambda = \overline{\lambda}$
and hence $\lambda \in \mathbb{R}$.
For (ii), suppose f1 and f2 have eigenvalues λ1 and λ2 , respectively.
By the previous argument both λ1 and λ2 are real, and we note that
λ1 (f1 , f2 ) = (λ1 f1 , f2 )
= (T f1 , f2 )
= (f1 , T f2 )
= (f1 , λ2 f2 )
= λ2 (f1 , f2 ).
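The chain of equalities above can be followed on a concrete symmetric operator. In the sketch below, the real symmetric $2 \times 2$ matrix, its eigenvectors, and the helper names are all assumptions chosen for illustration; the eigenvalues 3 and 1 are distinct, and the corresponding eigenvectors turn out to be orthogonal.

```python
def inner(f, g):
    # real case: (f, g) = sum_k f_k g_k
    return sum(x * y for x, y in zip(f, g))

def apply(T, f):
    return [sum(row[j] * f[j] for j in range(len(f))) for row in T]

T = [[2.0, 1.0], [1.0, 2.0]]   # symmetric: T equals its transpose
f1, lam1 = [1.0, 1.0], 3.0     # T f1 = 3 f1
f2, lam2 = [1.0, -1.0], 1.0    # T f2 = 1 f2

is_eigen_1 = apply(T, f1) == [lam1 * x for x in f1]
is_eigen_2 = apply(T, f2) == [lam2 * x for x in f2]

# (T f1, f2) = (f1, T f2), so lam1 (f1, f2) = lam2 (f1, f2)
symmetric_pairing = inner(apply(T, f1), f2) == inner(f1, apply(T, f2))
orthogonal = inner(f1, f2) == 0.0
```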
Proof. Let Vλ denote the null-space of T − λI, that is, the eigenspace
of T corresponding to λ. If Vλ is not finite-dimensional, there exists
a countable sequence of orthonormal vectors {ϕk } in Vλ . Since T is
compact, there exists a subsequence {ϕnk } such that T (ϕnk ) converges.
the fact that $|\lambda_{n_k}| > \mu$ leads to a contradiction, since $\{\varphi_k\}$ is an orthonormal
set and thus $\|\lambda_{n_k}\varphi_{n_k} - \lambda_{n_j}\varphi_{n_j}\|^2 = \lambda_{n_k}^2 + \lambda_{n_j}^2 \ge 2\mu^2$.
\[ \lambda = \|T\| = \sup\{(Tf, f) : \|f\| = 1\}, \]
We are now equipped with the necessary tools to prove the spectral
theorem. Let S denote the closure of the linear space spanned by all
eigenvectors of T . By Lemma 6.5, the space S is non-empty. The goal
is to prove that S = H. If not, then since
(9) S ⊕ S ⊥ = H,
7 Exercises
1. Show that properties (i) and (ii) in the definition of a Hilbert space (Section 2)
imply property (iii): the Cauchy-Schwarz inequality $|(f, g)| \le \|f\| \cdot \|g\|$ and the
triangle inequality $\|f + g\| \le \|f\| + \|g\|$.
[Hint: For the first inequality, consider $(f + \lambda g, f + \lambda g)$ as a positive quadratic
function of $\lambda$. For the second, write $\|f + g\|^2$ as $(f + g, f + g)$.]
[Hint: For (a) consider f (x) = |x|−α , when |x| ≤ 1 or when |x| > 1.]
7. Suppose $\{\varphi_k\}_{k=1}^{\infty}$ is an orthonormal basis for $L^2(\mathbb{R}^d)$. Prove that the collection
$\{\varphi_{k,j}\}_{1 \le k,j < \infty}$ with $\varphi_{k,j}(x, y) = \varphi_k(x)\varphi_j(y)$ is an orthonormal basis of
$L^2(\mathbb{R}^d \times \mathbb{R}^d)$.
[Hint: First verify that the $\{\varphi_{k,j}\}$ are orthonormal, by Fubini’s theorem. Next,
for each $j$ consider $F_j(x) = \int_{\mathbb{R}^d} F(x, y)\overline{\varphi_j(y)}\,dy$. If one assumes that $(F, \varphi_{k,j}) = 0$
for all $j$, then $\int F_j(x)\overline{\varphi_k(x)}\,dx = 0$.]
8. Let η(t) be a fixed strictly positive continuous function on [a, b]. Define Hη =
L2 ([a, b], η) to be the space of all measurable functions f on [a, b] such that
Z b
|f (t)|2 η(t) dt < ∞.
a
(a) Show that Hη is a Hilbert space, and that the mapping U : f 7→ η 1/2 f gives
a unitary correspondence between Hη and the usual space L2 ([a, b]).
(b) As a result,
\[ \left\{ \frac{1}{\pi^{1/2}} \left(\frac{i - x}{i + x}\right)^{n} \frac{1}{i + x} \right\}_{n=-\infty}^{\infty} \]
P (f ) = f if f ∈ S and P (f ) = 0 if f ∈ S ⊥ .
14. Suppose $H$ and $H'$ are two completions of a pre-Hilbert space $H_0$. Show that
there is a unitary mapping from $H$ to $H'$ that is the identity on $H_0$.
[Hint: If $f \in H$, pick a Cauchy sequence $\{f_n\}$ in $H_0$ that converges to $f$ in $H$. This
sequence will also converge to an element $f'$ in $H'$. The mapping $f \mapsto f'$ gives the
required unitary mapping.]
(a) Verify that |F0 (z)| ≤ eπ/2 in the unit disc, but that limr→1 F0 (r) does not
exist.
[Hint: Note that |F0 (r)| = 1 and F0 (r) oscillates between ±1 infinitely often
as r → 1.]
\[ F(z) = \sum_{j=1}^{\infty} \delta^{j} F_0(ze^{-i\alpha_j}), \]
where δ is sufficiently small. Show that limr→1 F (reiθ ) fails to exist when-
ever θ = αj , and hence F fails to have a radial limit for a dense set of points
on the unit circle.
\[ \lim_{\substack{w \to z \\ w \in \Gamma_s(z)}} F(w) \]
exists.
Prove that if F is holomorphic and bounded on the open unit disc, then F has
a non-tangential limit for almost every point on the unit circle.
[Hint: Show that the Poisson integral of a function f has non-tangential limits at
every point of the Lebesgue set of f .]
18. Let H denote a Hilbert space, and L(H) the vector space of all bounded linear
operators on H. Given T ∈ L(H), we define the operator norm
\[ d(T_1, T_2) = \|T_1 - T_2\| \]
\[ \|TT^*\| = \|T^*T\| = \|T\|^2 = \|T^*\|^2. \]
21. There are several senses in which a sequence of bounded operators {Tn } can
converge to a bounded operator T (in a Hilbert space H). First, there is con-
vergence in the norm, that is, kTn − T k → 0, as n → ∞. Next, there is a weaker
convergence, which happens to be called strong convergence, that requires that
Tn f → T f , as n → ∞, for every vector f ∈ H. Finally, there is weak conver-
gence (see also Exercise 20) that requires (Tn f, g) → (T f, g) for every pair of
vectors f, g ∈ H.
(a) Show by examples that weak convergence does not imply strong convergence,
nor does strong convergence imply convergence in the norm.
(b) Show that for any bounded operator T there is a sequence {Tn } of bounded
operators of finite rank so that Tn → T strongly as n → ∞.
is compact in H.
25. Suppose T is a bounded operator that is diagonal with respect to a basis {ϕk },
with T ϕk = λk ϕk . Then T is compact if and only if λk → 0.
26. Suppose w is a measurable function on Rd with 0 < w(x) < ∞ for a.e. x, and
K is a measurable function on R2d that satisfies:
(i) $\int_{\mathbb{R}^d} |K(x, y)|\,w(y)\,dy \le A\,w(x)$ for almost every $x \in \mathbb{R}^d$, and
(ii) $\int_{\mathbb{R}^d} |K(x, y)|\,w(x)\,dx \le A\,w(y)$ for almost every $y \in \mathbb{R}^d$.
28. Suppose H = L2 (B), where B is the unit ball in Rd . Let K(x, y) be a mea-
surable function on B × B that satisfies |K(x, y)| ≤ A|x − y|−d+α for some α > 0,
whenever x, y ∈ B. Define
\[ Tf(x) = \int_B K(x, y)f(y)\,dy. \]
(c) Show that the range of λI − T is all of H if and only if the null-space of
λI − T ∗ is trivial.
30. Let H = L2 ([−π, π]) with [−π, π] identified as the unit circle. Fix a bounded
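sequence $\{\lambda_n\}_{n=-\infty}^{\infty}$ of complex numbers, and define an operator $T$ by
\[ Tf(x) \sim \sum_{n=-\infty}^{\infty} \lambda_n a_n e^{inx} \quad \text{whenever} \quad f(x) \sim \sum_{n=-\infty}^{\infty} a_n e^{inx}. \]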
(b) Verify that $T$ commutes with translations, that is, if we define $\tau_h f(x) =
f(x - h)$ then
\[ T \circ \tau_h = \tau_h \circ T \quad \text{for every } h \in \mathbb{R}. \]
and extended to R with period 2π. Suppose f ∈ L1 ([−π, π]) is extended to R with
period 2π, and define
\[ \begin{aligned}
Tf(x) &= \frac{1}{2\pi}\int_{-\pi}^{\pi} K(x - y)f(y)\,dy \\
&= \frac{1}{2\pi}\int_{-\pi}^{\pi} K(y)f(x - y)\,dy.
\end{aligned} \]
(b) Show that the mapping f 7→ T f is compact and symmetric on L2 ([−π, π]).
(c) Prove that $\varphi(x) \in L^2([-\pi, \pi])$ is an eigenfunction for $T$ if and only if $\varphi(x)$
is (up to a constant multiple) equal to $e^{inx}$ for some integer $n \ne 0$ with
eigenvalue $1/n$, or $\varphi(x) = 1$ with eigenvalue 0.
(d) Show as a result that {einx }n∈Z is an orthonormal basis of L2 ([−π, π]).
Note that in Book I, Chapter 2, Exercise 8, it is shown that the Fourier series
of K is
\[ K(x) \sim \sum_{n \ne 0} \frac{e^{inx}}{n}. \]
T (f )(t) = tf (t).
(a) Prove that T is a bounded linear operator with T = T ∗ , but that T is not
compact.
35. Let H be a Hilbert space. Prove the following variants of the spectral theorem.
(a) If T1 and T2 are two linear symmetric and compact operators on H that
commute (that is, T1 T2 = T2 T1 ), show that they can be diagonalized simul-
taneously. In other words, there exists an orthonormal basis for H which
consists of eigenvectors for both T1 and T2 .
8 Problems
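\[ (f, g) = \sum_{k=1}^{N} a_{\lambda_k}\overline{b_{\lambda_k}} \]
if $f(x) = \sum_{k=1}^{N} a_{\lambda_k} e^{i\lambda_k x}$ and $g(x) = \sum_{k=1}^{N} b_{\lambda_k} e^{i\lambda_k x}$.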
4.∗ This problem provides some examples of functions that fail to have radial limits
almost everywhere.
(a) At almost every point of the boundary unit circle, the function $\sum_{n=0}^{\infty} z^{2^n}$
fails to have a radial limit.
(b) More generally, suppose $F(z) = \sum_{n=0}^{\infty} a_n z^{2^n}$. Then, if $\sum |a_n|^2 = \infty$ the
function $F$ fails to have radial limits at almost every boundary point. However,
if $\sum |a_n|^2 < \infty$, then $F \in H^2(\mathbb{D})$, and we know by the proof of Theorem 3.3
that $F$ does have radial limits almost everywhere.
\[ L(f)(x) = \frac{d^2 f}{dx^2} - q(x)f(x). \]
Here the function q is continuous and real-valued on [a, b], and we assume for
simplicity that q is non-negative. We say that ϕ ∈ C 2 ([a, b]) is an eigenfunction
of L with eigenvalue µ if L(ϕ) = µϕ, under the assumption that ϕ satisfies the
boundary conditions ϕ(a) = ϕ(b) = 0. Then one can show:
(a) The eigenvalues µ are strictly negative, and the eigenspace corresponding
to each eigenvalue is one-dimensional.
(c) Let $K(x, y)$ be the “Green’s kernel” defined as follows. Choose $\varphi_-(x)$ to be
a solution of $L(\varphi_-) = 0$, with $\varphi_-(a) = 0$ but $\varphi_-'(a) \ne 0$. Similarly, choose
$\varphi_+(x)$ to be a solution of $L(\varphi_+) = 0$ with $\varphi_+(b) = 0$, but $\varphi_+'(b) \ne 0$. Let
$w = \varphi_+'(x)\varphi_-(x) - \varphi_-'(x)\varphi_+(x)$ be the “Wronskian” of these solutions, and
note that $w$ is a non-zero constant.
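Set
\[ K(x, y) = \begin{cases} \dfrac{\varphi_-(x)\varphi_+(y)}{w} & \text{if } a \le x \le y \le b, \\[2mm] \dfrac{\varphi_+(x)\varphi_-(y)}{w} & \text{if } a \le y \le x \le b. \end{cases} \]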
L(T f ) = f.
\[ L(f)(x) = (1 - x^2)\frac{d^2 f}{dx^2} - 2x\frac{df}{dx}. \]
11.∗ The Hermite functions hk (x) are defined by the generating identity
\[ \sum_{k=0}^{\infty} h_k(x)\frac{t^k}{k!} = e^{-(x^2/2 - 2tx + t^2)}. \]
(a) They satisfy the “creation” and “annihilation” identities $\left(x - \frac{d}{dx}\right)h_k(x) =
h_{k+1}(x)$ and $\left(x + \frac{d}{dx}\right)h_k(x) = h_{k-1}(x)$ for $k \ge 0$ where $h_{-1}(x) = 0$. Note
that $h_0(x) = e^{-x^2/2}$, $h_1(x) = 2xe^{-x^2/2}$, and more generally $h_k(x) =
P_k(x)e^{-x^2/2}$, where $P_k$ is a polynomial of degree $k$.
(b) Using (a) one sees that the hk are eigenvectors of the operator L = −d2 /dx2 +
x2 , with L(hk ) = λk hk , where λk = 2k + 1. One observes that these func-
tions are mutually orthogonal. Since
\[ \int_{\mathbb{R}} [h_k(x)]^2\,dx = \pi^{1/2}\,2^k k! = c_k, \]
(c) Suppose that $K(x, y) = \sum_{k=0}^{\infty} \frac{H_k(x)H_k(y)}{\lambda_k}$, and also $F(x) = T(f)(x) =
\int_{\mathbb{R}} K(x, y)f(y)\,dy$. Then $T$ is a symmetric Hilbert-Schmidt operator, and
if $f \sim \sum_{k=0}^{\infty} a_k H_k$, then $F \sim \sum_{k=0}^{\infty} \frac{a_k}{\lambda_k} H_k$.
One can show on the basis of (a) and (b) that whenever f ∈ L2 (R), not only is
F ∈ L2 (R), but also x2 F (x) ∈ L2 (R). Moreover, F can be corrected on a set of
measure zero, so it is continuously differentiable, F 0 is absolutely continuous, and
F 00 ∈ L2 (R). Finally, the operator T is the inverse of L in the sense that
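1 Recall that $x^\alpha = x_1^{\alpha_1} x_2^{\alpha_2} \cdots x_d^{\alpha_d}$ and $\left(\frac{\partial}{\partial x}\right)^{\beta} = \left(\frac{\partial}{\partial x_1}\right)^{\beta_1} \cdots \left(\frac{\partial}{\partial x_d}\right)^{\beta_d}$, where $\alpha =
(\alpha_1, \dots, \alpha_d)$ and $\beta = (\beta_1, \dots, \beta_d)$, with $\alpha_j$ and $\beta_j$ non-negative integers. The order of $\alpha$ is
denoted by $|\alpha|$ and defined to be $\alpha_1 + \cdots + \alpha_d$.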
1. The Fourier transform on L2 209
Lemma 1.2 The space S(Rd ) is dense in L2 (Rd ). In other words, given
any f ∈ L2 (Rd ), there exists a sequence {fn } ⊂ S(Rd ) such that
kf − fn kL2 (Rd ) → 0 as n → ∞.
For the proof of the lemma, we fix $f \in L^2(\mathbb{R}^d)$ and $\varepsilon > 0$. Then, for
each $M > 0$, we define
\[ g_M(x) = \begin{cases} f(x) & \text{if } |x| \le M \text{ and } |f(x)| \le M, \\ 0 & \text{otherwise.} \end{cases} \]
Then, |f (x) − gM (x)| ≤ 2|f (x)|, hence |f (x) − gM (x)|2 ≤ 4|f (x)|2 , and
since gM (x) → f (x) as M → ∞ for almost every x, the dominated con-
vergence theorem guarantees that for some M , we have
\[ K_\delta(x) = \delta^{-d}\varphi(x/\delta). \]
210 Chapter 5. HILBERT SPACES: SEVERAL EXAMPLES
Lemma 1.3 Let H1 and H2 denote Hilbert spaces with norms k · k1 and
k · k2 , respectively. Suppose S is a dense subspace of H1 and T0 : S → H2
a linear transformation that satisfies kT0 (f )k2 ≤ ckf k1 whenever f ∈ S.
\[ T(f) = \lim_{n \to \infty} T_0(f_n), \]
where the limit is taken in the L2 sense. Clearly, the argument in the
proof of the lemma shows that in our special case the extension F con-
tinues to satisfy the identity (3):
and satisfies again the identity $\|\mathcal{F}_0^{-1}(g)\|_{L^2} = \|g\|_{L^2}$. Therefore, arguing
in the same fashion as above, we can extend $\mathcal{F}_0^{-1}$ to $L^2(\mathbb{R}^d)$ by a limiting
argument. Then, given $f \in L^2(\mathbb{R}^d)$, we choose a sequence $\{f_n\}$ in the
Schwartz space so that $\|f - f_n\|_{L^2} \to 0$. We have
\[ f = \mathcal{F}^{-1}\mathcal{F}(f) = \mathcal{F}\mathcal{F}^{-1}(f), \]
where the limit is taken in the L2 -norm. Note in fact that if χR denotes
the characteristic function of the ball {x ∈ Rd : |x| ≤ R}, then for each
R the function f χR is in both L1 and L2 , and f χR → f in the L2 -norm.
2. The Hardy space of the upper half-plane 213
(iii) The identity of the various definitions of the Fourier transform dis-
cussed above allows us to choose fˆ as the preferred notation for the
Fourier transform. We adopt this practice in what follows.
(The choice of the particular notation F̂0 will become clearer below.)
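We claim that for any $\delta > 0$ the integral (6) converges absolutely and
uniformly as long as $y \ge \delta$. Indeed, $|\hat{F}_0(\xi)e^{2\pi i\xi z}| = |\hat{F}_0(\xi)|e^{-2\pi\xi y}$, hence
by the Cauchy-Schwarz inequality
\[ \int_0^{\infty} |\hat{F}_0(\xi)e^{2\pi i\xi z}|\,d\xi \le \left(\int_0^{\infty} |\hat{F}_0(\xi)|^2\,d\xi\right)^{1/2} \left(\int_0^{\infty} e^{-4\pi\xi\delta}\,d\xi\right)^{1/2}, \]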
2 Further motivation and some elementary background material may be found in The-
(Note that if ζ lies in the upper half-plane, Im(ζ) > δ, then the disc
centered at ζ of radius r belongs to R2+ .) Alternatively, integrating over
r, we have the mean-value property in terms of discs,
\[ F(\zeta) = \frac{1}{\pi\delta^2}\int_{|z|<\delta} F(\zeta + z)\,dx\,dy, \quad z = x + iy. \tag{9} \]
and Lemma 2.8, Chapter 5 in Book I for the case of harmonic functions);
later in this chapter we in fact prove the extension of (9) to Rd .
From (9) we see from the Cauchy-Schwarz inequality that
\[ |F(\zeta)|^2 \le \frac{1}{\pi\delta^2}\int_{|z|<\delta} |F(\zeta + z)|^2\,dx\,dy. \]
Recalling that η > δ, we see that the last expression is in fact majorized
by
\[ 2\delta \sup_{y>0}\int_{\mathbb{R}} |F(x + iy)|^2\,dx = 2\delta\,\|F\|^2_{H^2(\mathbb{R}^2_+)}. \]
In all $|F(\zeta)|^2 \le \frac{2}{\pi\delta}\|F\|^2_{H^2}$ in the half-plane $\mathrm{Im}(\zeta) > \delta$, which proves the
lemma.
We now turn to the proof of the identity (7). Starting with $F$ in
$H^2(\mathbb{R}^2_+)$, we improve it by replacing it with the function $F^\varepsilon$ defined by
\[ F^\varepsilon(z) = F(z)\frac{1}{(1 - i\varepsilon z)^2}, \quad \text{with } \varepsilon > 0. \]
Observe that $|F^\varepsilon(z)| \le |F(z)|$ when $\mathrm{Im}(z) > 0$; also $F^\varepsilon(z) \to F(z)$ for
each such $z$, as $\varepsilon \to 0$. This shows that for each $y > 0$, $F^\varepsilon(x + iy) \to$
\[ G(z) = F^\varepsilon(z)e^{-2\pi iz\xi}. \]
and hence
\[ \sup_{y>0}\int_{\mathbb{R}} |\hat{F}_0(\xi)|^2 e^{-4\pi\xi y}\,d\xi = \|F\|^2_{H^2(\mathbb{R}^2_+)} < \infty. \]
Finally this in turn implies that $\hat{F}_0(\xi) = 0$ for almost every $\xi \in (-\infty, 0)$.
For if this were not the case, then for appropriate positive numbers $a$, $b$,
and $c$ we could have that $|\hat{F}_0(\xi)| \ge a$ for $\xi$ in a set $E$ in $(-\infty, -b)$, with
$m(E) \ge c$. This would give $\int |\hat{F}_0(\xi)|^2 e^{-4\pi\xi y}\,d\xi \ge a^2 c\,e^{4\pi by}$, which grows
indefinitely as $y \to \infty$. The contradiction thus obtained shows that $\hat{F}_0(\xi)$
vanishes almost everywhere when $\xi \in (-\infty, 0)$.
To summarize, for each $y > 0$ the function $\hat{F}_y(\xi)$ equals $\hat{F}_0(\xi)e^{-2\pi\xi y}$,
with $\hat{F}_0 \in L^2(0, \infty)$. The Fourier inversion formula then yields the representation
(6) for an arbitrary element of $H^2$, and the proof of the theorem
is concluded.
The second result we deal with may be viewed as the half-plane ana-
logue of Fatou’s theorem in the previous chapter.
Theorem 2.3 Suppose $F$ belongs to $H^2(\mathbb{R}^2_+)$. Then $\lim_{y \to 0} F(x + iy) =
F_0(x)$ exists in the following two senses:
(i) As a limit in the L2 (R)-norm.
(ii) As a limit for almost every x.
Thus $F$ has boundary values (denoted by $F_0$) in either of the two senses
above. The function $F_0$ is sometimes referred to as the boundary-value
function of $F$. The proof of (i) is immediate from what we already know.
Indeed, if $F_0$ is the $L^2$ function whose Fourier transform is $\hat{F}_0$, then
\[ \|F(x + iy) - F_0(x)\|^2_{L^2(\mathbb{R})} = \int_0^{\infty} |\hat{F}_0(\xi)|^2\,|e^{-2\pi\xi y} - 1|^2\,d\xi, \]
with
\[ P_y(x) = \frac{1}{\pi}\frac{y}{y^2 + x^2} \]
the Poisson kernel.3 This identity holds for every (x, y) ∈ R2+ and any
function f in L2 (R). To see this, we begin by noting the following ele-
mentary integration formulas:
\[ \int_0^{\infty} e^{2\pi i\xi z}\,d\xi = \frac{i}{2\pi z} \quad \text{if } \mathrm{Im}(z) > 0, \tag{11} \]
3 This is the analogue in R of the identity (3) for the circle, given in Chapter 4.
and
\[ \int_{\mathbb{R}} e^{-2\pi|\xi|y}e^{2\pi i\xi x}\,d\xi = \frac{1}{\pi}\frac{y}{y^2 + x^2} \quad \text{if } y > 0. \tag{12} \]
which equals
\[ \frac{i}{2\pi}\left[\frac{1}{x + iy} + \frac{1}{-x + iy}\right] = \frac{1}{\pi}\frac{y}{y^2 + x^2} \]
by (11).
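Formula (12) lends itself to a numerical check. The sketch below approximates the integral over $\mathbb{R}$ by the trapezoid rule, using the fact that the imaginary parts cancel by symmetry so that only $2\int_0^{\infty} e^{-2\pi\xi y}\cos(2\pi\xi x)\,d\xi$ remains. The function names, the cutoff, and the number of steps are assumptions of this illustration, chosen for moderate values of $x$ and $y$.

```python
import math

def poisson_kernel(x, y):
    # (1/pi) * y / (y^2 + x^2)
    return y / (math.pi * (x * x + y * y))

def fourier_side(x, y, steps=20000):
    # Trapezoid rule for 2 * int_0^cutoff exp(-2 pi xi y) cos(2 pi xi x) d xi.
    # The cutoff is chosen so that exp(-2 pi * cutoff * y) = exp(-40).
    cutoff = 40.0 / (2.0 * math.pi * y)
    h = cutoff / steps
    g = lambda xi: math.exp(-2 * math.pi * xi * y) * math.cos(2 * math.pi * xi * x)
    total = 0.5 * (g(0.0) + g(cutoff))
    for k in range(1, steps):
        total += g(k * h)
    return 2.0 * h * total

error = abs(fourier_side(0.3, 0.5) - poisson_kernel(0.3, 0.5))
```

For very small $y$ or very large $|x|$ the integrand decays slowly or oscillates rapidly, and the step count would have to be increased accordingly.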
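Next we establish (10) when $f$ belongs to (say) the space $\mathcal{S}$. Indeed, for
fixed $(x, y) \in \mathbb{R}^2_+$ consider the function $\Phi(t, \xi) = f(t)e^{-2\pi i\xi t}e^{-2\pi|\xi|y}e^{2\pi i\xi x}$
on $\mathbb{R}^2 = \{(\xi, t)\}$. Since $|\Phi(t, \xi)| = |f(t)|e^{-2\pi|\xi|y}$, then (because $f$ is rapidly
decreasing) $\Phi$ is integrable over $\mathbb{R}^2$. Applying Fubini’s theorem yields
\[ \int_{\mathbb{R}}\left(\int_{\mathbb{R}} \Phi(t, \xi)\,d\xi\right)dt = \int_{\mathbb{R}}\left(\int_{\mathbb{R}} \Phi(t, \xi)\,dt\right)d\xi. \]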
The right-hand side obviously gives $\int_{\mathbb{R}} \hat{f}(\xi)e^{-2\pi|\xi|y}e^{2\pi ix\xi}\,d\xi$, while the
left-hand side yields $\int_{\mathbb{R}} f(t)P_y(x - t)\,dt$ in view of (12) above. However,
if we use the relation (6) in Chapter 2 we see that
\[ \int_{\mathbb{R}} f(t)P_y(x - t)\,dt = \int_{\mathbb{R}} f(x - t)P_y(t)\,dt. \]
Thus the Poisson integral representation (10) holds for every f ∈ S. For
a general f ∈ L2 (R) we consider a sequence {fn } of elements in S, so
that fn → f (and also fˆn → fˆ) in the L2 -norm. A passage to the limit
then yields the formula for f from the corresponding formula for each
fn . Indeed, by the Cauchy-Schwarz inequality we have
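\[ \left|\int_{\mathbb{R}} [\hat{f}(\xi) - \hat{f}_n(\xi)]e^{-2\pi|\xi|y}e^{2\pi ix\xi}\,d\xi\right| \le \|\hat{f} - \hat{f}_n\|_{L^2}\left(\int_{\mathbb{R}} e^{-4\pi|\xi|y}\,d\xi\right)^{1/2}, \]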
and also
\[ \left|\int_{\mathbb{R}} [f(x - t) - f_n(x - t)]P_y(t)\,dt\right| \le \|f - f_n\|_{L^2}\left(\int_{\mathbb{R}} |P_y(t)|^2\,dt\right)^{1/2}, \]
and the right-hand sides tend to 0 because for each fixed (x, y) ∈ R2+ the
functions e−2π|ξ|y , ξ ∈ R, and Py (t), t ∈ R, belong to L2 (R).
Having established the Poisson integral representation (10), we return
to our given element F ∈ H 2 (R2+ ). We know that there is an L2 function
F̂0 (ξ) (which vanishes when ξ < 0) such that (6) holds. With F0 the
L2 (R) function whose Fourier transform is F̂0 (ξ), we see from (10), with
f = F0 , that
\[ F(x + iy) = \int_{\mathbb{R}} F_0(x - t)P_y(t)\,dt. \]
However $\left(\int_{\mathbb{R}} |H(x - t)|^2\,dt\right)^{1/2} \le \|F_0\|_{L^2}$, while (as is easily seen)
$\int_{|t| \ge N} |P_y(t)|^2\,dt \to 0$ as $y \to 0$. Hence $F(x + iy) \to F_0(x)$ for a.e. $x$ with
The following comments may help clarify the thrust of the above the-
orems.
(i) Let S be the subspace of L2 (R) consisting of all functions F0 arising in
Theorem 2.3. Then, since the functions F0 are exactly those functions in
L2 whose Fourier transform is supported on the half-line (0, ∞), we see
that S is a closed subspace. We might be tempted to say that S consists
of those functions in L2 that arise as boundary values of holomorphic
functions in the upper half-plane; but this heuristic assertion is not exact
if we do not add a quantitative restriction such as in the definition (5)
of the Hardy space. See Exercise 4.
(ii) Suppose we define $P$ to be the orthogonal projection on the subspace
$S$ of $L^2$. Then, as is easily seen, $\widehat{(Pf)}(\xi) = \chi(\xi)\hat{f}(\xi)$ for any $f \in L^2(\mathbb{R})$;
here $\chi$ is the characteristic function of $(0, \infty)$. The operator $P$ is also
closely related to the Cauchy integral. Indeed, if F is the (unique)
element in H 2 (R2+ ) whose boundary function (according to Theorem 2.3)
is P (f ), then
\[ F(z) = \frac{1}{2\pi i}\int_{\mathbb{R}} \frac{f(t)}{t - z}\,dt, \quad z \in \mathbb{R}^2_+. \]
To prove this it suffices to verify that for any f ∈ L2 (R) and any fixed
z = x + iy ∈ R2+ , we have
\[ \int_0^{\infty} \hat{f}(\xi)e^{2\pi i\xi z}\,d\xi = \frac{1}{2\pi i}\int_{\mathbb{R}} \frac{f(t)}{t - z}\,dt. \]
This is proved in the same way as the Poisson integral representation (10)
except here we use the identity (11) instead of (12). The details may be
left to the interested reader. Also, the reader might note the close analogy
between this version of the Cauchy integral for the upper-half plane, and
a corresponding version for the unit disc, as given in Example 2, Section 4
of Chapter 4.
(iii) In analogy with the periodic case discussed in Exercise 30 of Chapter 4,
we define a Fourier multiplier operator $T$ on $\mathbb{R}$ to be a linear
operator on $L^2(\mathbb{R})$ determined by a bounded function $m$ (the multiplier),
such that $T$ is defined by the formula $\widehat{(Tf)}(\xi) = m(\xi)\hat{f}(\xi)$ for
any $f \in L^2(\mathbb{R})$. The orthogonal projection $P$ above is such an operator
and its multiplier is the characteristic function $\chi(\xi)$. Another closely
related operator of this type is the Hilbert transform $H$ defined by
3. Constant coefficient partial differential equations 221
$P = \frac{I + iH}{2}$. Then $H$ is a Fourier multiplier operator corresponding to the
multiplier $\frac{1}{i}\,\mathrm{sign}(\xi)$. Among the many important properties of $H$ is its
connection to conjugate harmonic functions. Indeed, for f a real-valued
function in L2 (R), f and H(f ) are, respectively, the real and imaginary
parts of the boundary values of a function in the Hardy space. More
about the Hilbert transform can be found in Exercises 9 and 10 and
Problem 5 below.
with aα ∈ C constants.
In the study of the classical examples of L, such as the wave equation,
the heat equation, and Laplace’s equation, one already sees the Fourier
transform entering in an important way.4 For general L, this key role
is further indicated by the following simple observation. If, for example,
we try to solve this equation with both u and f elements in S, then this
is equivalent to the algebraic equation
\[ P(\xi)\,\hat{u}(\xi) = \hat{f}(\xi), \]
where P(ξ) is the characteristic polynomial of L defined by
\[ P(\xi) = \sum_{|\alpha| \le n} a_\alpha (2\pi i \xi)^\alpha . \]
In a more general setting, matters are not so easy: aside from the ques-
tion of defining (13), the Fourier transform is not directly applicable;
also, solutions that we prove to exist (but are not unique!) have to be
understood in a wider sense.
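The passage from the differential equation to the algebraic one can be made concrete with a small numerical sketch. The function name `characteristic_polynomial`, the dictionary encoding of the coefficients, and the sample values below are illustrative assumptions, not notation from the text.

```python
from math import pi

def characteristic_polynomial(coeffs, xi):
    """Evaluate P(xi) = sum over alpha of a_alpha * (2*pi*i*xi)^alpha.

    coeffs maps a multi-index alpha (a tuple of non-negative ints) to the
    constant a_alpha; xi is a point of R^d given as a tuple of floats.
    """
    total = 0j
    for alpha, a in coeffs.items():
        term = complex(a)
        for xi_j, alpha_j in zip(xi, alpha):
            term *= (2j * pi * xi_j) ** alpha_j
        total += term
    return total

# Illustrative choice: the Laplacian in d = 2, L = d^2/dx1^2 + d^2/dx2^2.
laplacian = {(2, 0): 1.0, (0, 2): 1.0}
xi = (0.5, -1.0)
p_value = characteristic_polynomial(laplacian, xi)
expected = -4 * pi**2 * (xi[0]**2 + xi[1]**2)

# Where P(xi) != 0, the algebraic equation P(xi) * u_hat(xi) = f_hat(xi)
# determines u_hat(xi) from f_hat(xi) by division.
f_hat_value = 2.0 + 1.0j            # made-up sample value of f_hat at xi
u_hat_value = f_hat_value / p_value
```

For the Laplacian the sketch returns −4π²|ξ|², which vanishes only at ξ = 0; the division step is exactly where zeros of P cause difficulty.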
Lemma 3.1 The space C0∞(Ω) is dense in L2(Ω) in the norm ‖ · ‖L2(Ω).
where (·, ·) denotes the inner product on L2 (Ω) (which is the restriction
of the usual inner product on L2 (Rd )). The identity (14) is proved by
successive integration by parts. Indeed, consider first the special case
when L = ∂/∂xj , and then L∗ = −∂/∂xj . If we use Fubini’s theorem,
integrating first in the xj variable, then in this case (14) reduces to the
functions.
6 This means that the closure of the support of f , as defined in Section 1 of Chapter 2,
8 One may write, for example, fn = f ∗ ϕ1/n, where {ϕε} is the approximation to the
identity, as in the proof of Lemma 1.2.
The heart of the matter lies in an inequality that we state next, but
whose proof (which uses the Fourier transform) is postponed until the
next section.
The usefulness of this lemma comes about for the following reason.
If L is a finite-dimensional linear transformation, the solvability of L
(the fact that it is surjective) is of course equivalent with the fact that
its adjoint L∗ is injective. In effect, the lemma provides the analytic
substitute for this reasoning in an infinite-dimensional setting.
We first prove the theorem assuming the validity of the inequality in
the lemma.
Consider the pre-Hilbert space H0 = C0∞ (Ω) equipped with the inner
product and norm
Hence
\[ \|Kf\|_{L^2(\Omega)} = \|u\|_{L^2(\Omega)} = \|L^* U\|_{L^2(\Omega)} = \|U\|_0 \le c\,\|f\|_{L^2(\Omega)}, \]
This assertion follows directly from the mean-value identity (8) in Sec-
tion 2 with ζ = 0 and r = 1, via the Cauchy-Schwarz inequality. With it
we begin by factoring P :
\[ P(z) = \prod_{|\alpha| \ge 1} (z - \alpha) \prod_{|\beta| < 1} (z - \beta) = P_1(z) P_2(z), \]
where each product is finite and taken over the roots of P whose absolute
values are ≥ 1 and < 1, respectively.
Note that $|P_1(0)| = \prod_{|\alpha| \ge 1} |\alpha| \ge 1$.
For P2 we write
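\[ L^* = \sum_{0 \le k \le n} (-1)^k\, \overline{a_k} \left( \frac{\partial}{\partial x} \right)^k, \]
where $a_n = (2\pi i)^{-n}$. If we let $Q(\xi) = \sum_{0 \le k \le n} (-1)^k\, \overline{a_k}\, (2\pi i \xi)^k$ be its
characteristic polynomial, then we note that
\[ \widehat{L^* \psi}(\xi) = Q(\xi)\, \hat{\psi}(\xi) \quad \text{whenever } \psi \in C_0^\infty(\mathbb{R}). \]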
ter 2 and Exercise 26 in Chapter 3), integration in ξ can be carried out in the new
coordinates as well.
230 Chapter 5. HILBERT SPACES: SEVERAL EXAMPLES
where Py (x) is the analogous Poisson kernel for the upper half-plane. A
somewhat similar convolution formula was obtained when Ω is a strip.
Also, the Dirichlet problem can be solved explicitly for certain Ω by using
conformal mappings.11
In general, however, there are no explicit solutions, and other methods
must be found. An idea that was used initially was based on an approach
of wide utility in mathematics and physics: to find the equilibrium state
of a system one seeks to minimize an appropriate “energy” or “action.”
In the present case the role of this energy is played by the Dirichlet
integral, which is defined for appropriate functions U by
\[ D(U) = \int_\Omega |\nabla U|^2 = \int_\Omega \left( \left| \frac{\partial U}{\partial x_1} \right|^2 + \left| \frac{\partial U}{\partial x_2} \right|^2 \right) dx_1\, dx_2 . \]
(Note the similarity with the expression of the “potential energy” in the
case of the vibrating string in Chapters 3 and 6 of Book I.) In fact,
10 The Laplacian of a function u in Rd is defined by $\Delta u = \sum_{k=1}^{d} \partial^2 u / \partial x_k^2$.
11 The close relation between conformal maps and the Dirichlet problem is discussed in
the last part of Section 1 of Chapter 8, in Book II.
4*. The Dirichlet principle 231
that approach underlies the proof Riemann proposed for his well-known
mapping theorem. About this early history R. Courant has written:
We then note that D(u) = ⟨u, u⟩. If v is any function in C2(Ω) with
v|∂Ω = 0, then for all ε we have
since u + εv and u have the same boundary values, and u minimizes the
Dirichlet integral. We note, however, that
Hence
and since ε can be both positive or negative, this can happen only if
Re⟨u, v⟩ = 0. Similarly, considering the perturbation u + iεv, we find
Im⟨u, v⟩ = 0, and therefore ⟨u, v⟩ = 0. An integration by parts then provides
\[ 0 = \langle u, v \rangle = - \int_\Omega (\Delta u)\, \overline{v} \]
In the limit as ε tends to 0, we find that the minimum value of the integral
D(ϕ) is zero. This minimum value cannot be reached by a C1 function
satisfying the boundary conditions, since D(ϕ) = 0 implies ϕ′(x) = 0 and
thus ϕ is constant.
Thus
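\[ \iint_{D_\rho} \left( \left| \frac{\partial u}{\partial x_1} \right|^2 + \left| \frac{\partial u}{\partial x_2} \right|^2 \right) dx_1\, dx_2 = \int_0^\rho \int_0^{2\pi} \left( \left| \frac{\partial u}{\partial r} \right|^2 + \frac{1}{r^2} \left| \frac{\partial u}{\partial \theta} \right|^2 \right) d\theta\, r\, dr \]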
where Dρ is the disc of radius 0 < ρ < 1 centered at the origin. Since
Note that the left-hand side of (20) is well-defined for any u that is inte-
grable on compact subsets of Ω. Thus, in particular, a weakly harmonic
function needs to be defined only almost everywhere. Clearly, however,
any harmonic function is weakly harmonic.
Another notion is the mean-value property generalizing the iden-
tity (9) in Section 2 for holomorphic functions. A continuous function u
defined in Ω satisfies this property if
\[ u(x_0) = \frac{1}{m(B)} \int_B u(x)\, dx \tag{21} \]
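A quick numerical experiment illustrates the identity (21) in the plane. The sketch below is only a sanity check under stated assumptions: the quadrature routine `disc_average`, the test functions, and the chosen center and radius are illustrative choices, not part of the text.

```python
from math import cos, sin, pi

def disc_average(u, x0, y0, r, n_r=200, n_theta=200):
    """Approximate (1/m(B)) * integral of u over the disc B of radius r
    centered at (x0, y0), using a midpoint rule in polar coordinates."""
    total = 0.0
    dr = r / n_r
    dtheta = 2 * pi / n_theta
    for i in range(n_r):
        rho = (i + 0.5) * dr
        for j in range(n_theta):
            theta = (j + 0.5) * dtheta
            # The factor rho is the Jacobian of polar coordinates.
            total += u(x0 + rho * cos(theta), y0 + rho * sin(theta)) * rho
    return total * dr * dtheta / (pi * r * r)

def harmonic(x, y):
    return x * x - y * y        # Laplacian is 0

def not_harmonic(x, y):
    return x * x + y * y        # Laplacian is 4

center = (0.3, -0.2)
avg_h = disc_average(harmonic, center[0], center[1], 0.5)
avg_n = disc_average(not_harmonic, center[0], center[1], 0.5)
```

The average of the harmonic function agrees with its value at the center, while for x² + y² the average exceeds the central value by r²/2, so the mean-value property fails.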
Proof. Since the sets $\overline{\Omega}$ and ∂Ω are compact and u is continuous, the
two maxima above are clearly attained. We suppose that $\max_{x \in \overline{\Omega}} |u(x)|$
is attained at an interior point x0 ∈ Ω, for otherwise there is nothing to
prove.
Now by the mean-value property, $|u(x_0)| \le \frac{1}{m(B)} \int_B |u(x)|\, dx$. If for
some point x′ ∈ B we had |u(x′)| < |u(x0)|, then a similar inequality
would hold in a small neighborhood of x′, and since |u(x)| ≤ |u(x0)|
throughout B, the result would be that $\frac{1}{m(B)} \int_B |u(x)|\, dx < |u(x_0)|$, which
is a contradiction. Hence |u(x)| = |u(x0)| for each x ∈ B. Now this is
true for each ball Br of radius r, centered at x0, such that Br ⊂ Ω. Let
r0 be the least upper bound of such r; then $\overline{B_{r_0}}$ intersects the boundary
∂Ω at some point x̃. Since |u(x)| = |u(x0)| for all x ∈ Br, r < r0, it follows
by continuity that |u(x̃)| = |u(x0)|, proving the corollary.
Turning to the proofs of the theorems, we first establish a variant
of Green’s formula (for the unit ball) that does not explicitly involve
boundary terms.14 Here u, v, and η are assumed to be twice continuously
differentiable functions in a neighborhood of the closure of B, but η is
also supposed to be supported in a compact subset of B.
14 The more usual version requires integration over the (boundary) sphere, a topic
deferred to the next chapter. See also Exercises 6 and 7 in that chapter.
\[ \nabla v \cdot \nabla \eta = \sum_{j=1}^{d} \frac{\partial v}{\partial x_j} \frac{\partial \eta}{\partial x_j}, \]
This yields the lemma if we subtract from this the symmetric formula
with u and v interchanged.
We shall apply the lemma when u is a given harmonic function, while
v is one of the three following “test” functions: first, v(x) = 1; second,
v(x) = |x|^2; and third, v(x) = |x|^{−d+2} if d ≥ 3, while v(x) = log |x| if
d = 2. The relevance of these choices arises because Δv = 0 in the first
case, while Δv is a non-zero constant in the second case; also v in the
third case is a constant multiple of a “fundamental solution,” and in
particular v(x) is harmonic for x ≠ 0.
When v(x) = 1, we take $\eta = \eta_\epsilon^+$, where $\eta_\epsilon^+(x) = 1$ for |x| ≤ 1 − ε,
$\eta_\epsilon^+(x) = 0$ for |x| ≥ 1, and $|\nabla \eta_\epsilon^+(x)| \le c/\epsilon$. We accomplish this by setting
$\eta_\epsilon^+(x) = \chi\left( \frac{|x| - 1 + \epsilon}{\epsilon} \right)$ for 1 − ε ≤ |x| ≤ 1, where χ is a fixed C2 function
on [0, 1] that equals 1 in [0, 1/4] and equals 0 in [3/4, 1]. A picture of $\eta_\epsilon^+$
is given in Figure 3.
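One concrete way to realize such a cut-off is sketched below. The text only requires some fixed C2 function χ with the stated properties; the quintic "smoothstep" polynomial used here is an assumed choice, and the function names are illustrative.

```python
def smoothstep(t):
    """C^2 transition from 0 (t <= 0) to 1 (t >= 1): 6t^5 - 15t^4 + 10t^3.
    Its first and second derivatives vanish at t = 0 and t = 1."""
    if t <= 0.0:
        return 0.0
    if t >= 1.0:
        return 1.0
    return t * t * t * (t * (6.0 * t - 15.0) + 10.0)

def chi(t):
    """A C^2 function on [0, 1] equal to 1 on [0, 1/4] and 0 on [3/4, 1]."""
    return 1.0 - smoothstep((t - 0.25) / 0.5)

def eta_plus(norm_x, eps):
    """The cut-off eta_eps^+ written as a function of |x|."""
    if norm_x <= 1.0 - eps:
        return 1.0
    if norm_x >= 1.0:
        return 0.0
    return chi((norm_x - 1.0 + eps) / eps)

# For this particular chi, the derivative is at most 3.75 in absolute value,
# so the gradient of eta_eps^+ is bounded by 3.75 / eps.
GRADIENT_CONSTANT = 3.75
```

With this choice the constant c in the bound |∇η| ≤ c/ε can be taken to be 3.75, since the quintic has maximal slope 15/8 and is compressed onto an interval of length 1/2.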
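Since u is harmonic, we see that with v = 1, Lemma 4.5 implies
\[ \int_B \nabla u \cdot \nabla \eta_\epsilon^+\, dx = 0. \tag{22} \]
Next we take v(x) = |x|^2; then clearly Δv = 2d, and with $\eta = \eta_\epsilon^+$ the
lemma yields:
\[ 2d \int_B u\, \eta_\epsilon^+\, dx = \int_B |x|^2 (\nabla u \cdot \nabla \eta_\epsilon^+)\, dx - 2 \int_B u\, (x \cdot \nabla \eta_\epsilon^+)\, dx. \]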
Now the first integral is $(-d + 2) \int_{S_\epsilon^+} u\, |x|^{-d} (x \cdot \nabla \eta_\epsilon^+)\, dx$, which by (23)
tends to $c \int_B u\, dx$ as ε → 0, where c is the constant (2 − d)d, since $|x|^{-d} - 1 = O(\epsilon)$
over $S_\epsilon^+$. The second term tends to zero as ε → 0 because of (22)
and the fact that the integrand there is supported in the shell $S_\epsilon^+$. A
similar argument for d = 2, with v(x) = log |x|, yields the result with
c = 1.
To consider the contribution near the origin, that is, over $B_\epsilon$, we temporarily
make the additional assumption that u(0) = 0. Then because
of the differentiability assumption satisfied by a harmonic function, we
have u(x) = O(|x|) as |x| → 0. Now over $B_\epsilon$ we have two terms, the first
being $\int_{B_\epsilon} u\, \nabla(|x|^{-d+2}) \nabla \eta_\epsilon\, dx$, which is majorized by
\[ \int_{B_\epsilon} O(\epsilon)\, |x|^{-d+1}\, O(1/\epsilon)\, dx \le O\left( \int_{|x| \le \epsilon} |x|^{-d+1}\, dx \right) \le O(\epsilon), \]
using the result just cited. We have used the fact that ∇u is bounded
and $\nabla \eta_\epsilon$ is O(1/ε) throughout B. Letting ε → 0 we see that this term
tends to zero also. A similar argument works when d = 2.
Thus we have proved that if u is harmonic in a neighborhood of the
closure of the unit ball B, and u(0) = 0, then $\int_B u\, dx = 0$. We can drop
the assumption u(0) = 0 by applying the conclusion we have just reached
to u(x) − u(0) in place of u(x). Therefore we have achieved the mean-value
property (21) for the unit ball.
Now suppose Br(x0) = {x : |x − x0| < r} is the ball of radius r centered
at x0, and consider U(x) = u(x0 + rx). If we suppose that u is harmonic
in Br(x0), then clearly U is harmonic in the unit ball (indeed, the
property of being harmonic is unchanged under translations x → x + x0
and dilations x → rx, as is easily verified). Thus if u were harmonic in Ω,
and Br(x0) ⊂ Ω, then by the result just proved $U(0) = \frac{1}{m(B)} \int_B U(x)\, dx$,
which means that
\[ u(x_0) = \frac{1}{m(B)} \int_{|x| \le 1} u(x_0 + rx)\, dx = \frac{1}{r^d\, m(B)} \int_{|x| \le r} u(x_0 + x)\, dx = \frac{1}{m(B_r(x_0))} \int_{B_r(x_0)} u(x)\, dx, \]
and this is u(x0) if we use (25) again, this time with ψ = 1, and recall
that $\int \varphi(y)\, dy = 1$. We have therefore proved the lemma.
We see from this that every continuous function which satisfies the
mean-value property is its own regularization! To be precise, we have
\[ u(x_0 + x) - u(x_0) = \sum_{j=1}^{d} a_j x_j + \frac{1}{2} \sum_{j,k=1}^{d} a_{jk} x_j x_k + \epsilon(x), \tag{27} \]
where ε(x) = O(|x|^3) as |x| → 0. We note next that $\int_{|x| \le r} x_j\, dx = 0$ and
$\int_{|x| \le r} x_j x_k\, dx = 0$ for all j and k with k ≠ j. This follows by carrying
out the integrations first in the xj variable and noting that the integral
vanishes because xj is an odd function. Also by an obvious symmetry
$\int_{|x| \le r} x_j^2\, dx = \int_{|x| \le r} x_k^2\, dx$, and by the relative dilation-invariance (see
Section 3, Chapter 1) these are equal to $r^2 \int_{|x| \le r} (x_1/r)^2\, dx = r^{d+2} \int_{|x| \le 1} x_1^2\, dx = c\, r^{d+2}$,
with c > 0. We now integrate both sides of (27)
over the ball {|x| ≤ r}, divide by r^d, and use the mean-value property.
The result is that
\[ \frac{c}{2}\, r^2 \sum_{j=1}^{d} a_{jj} = \frac{c\, r^2}{2} (\Delta u)(x_0) = O\left( \frac{1}{r^d} \int_{|x| \le r} |\epsilon(x)|\, dx \right) = O(r^3). \]
by Fubini’s theorem, and the inner integral vanishes for y, |y| ≤ 1, because
it equals (u, Δψr), with ψr = ψ(x + ry). Thus we have
\[ (u * \varphi_r, \Delta \psi) = 0, \]
where of course
\[ \nabla u \cdot \nabla v = \sum_{j=1}^{d} \frac{\partial u}{\partial x_j} \frac{\partial v}{\partial x_j} . \]
Now since |Ij| ≤ d(Ω), summing over the disjoint intervals Ij gives
\[ \int_{J(x')} |v(x_1, x')|^2\, dx_1 \le d(\Omega)^2 \int_{J(x')} |\nabla v(x_1, x')|^2\, dx_1, \]
With these preliminaries out of the way, we first try to solve the boundary
value problem with f given on ∂Ω under the additional assumption
that f is the restriction to ∂Ω of a function F in $C^1(\overline{\Omega})$. (How this
additional hypothesis can be removed will be explained below.) Following
the prescription of Dirichlet’s principle, we seek a sequence {un}
with $u_n \in C^1(\overline{\Omega})$ and un|∂Ω = F|∂Ω, such that the Dirichlet integrals
‖un‖^2 converge to a minimum value. This means that un = F − vn,
with vn ∈ S0, and that $\lim_{n \to \infty} \|u_n\|$ minimizes the distance from F to
S0. Since $S = \overline{S_0}$, this sequence also minimizes the distance from F to
S in H.
Now what do the elementary facts about orthogonal projections teach
us? According to the proof of Lemma 4.1 in the previous chapter, we
conclude that the sequence {vn }, and hence also the sequence {un },
both converge in the norm of H, the former having a limit PS (F ). Now
applying Lemma 4.9 to vn − vm we deduce that {vn } and {un } are also
Cauchy in the L2 (Ω)-norm, and thus converge also in the L2 -norm. Let
u = limn→∞ un . Then
(31) u = F − PS (F ).
In fact, supposing we can deal with the first issue raised, then with the
lemma we proceed as follows. We find the functions Un that are har-
monic in Ω, continuous on Ω, and such that Un |∂Ω = Fn |∂Ω . Now since
the {Fn } converges uniformly (to f ) on ∂Ω, it follows by the maximum
principle that the sequence {Un } converges uniformly to a function u
that is continuous on Ω, has the property that u|∂Ω = f , and which is
moreover harmonic (by Corollary 4.8 above). This achieves our goal.
The proof of Lemma 4.10 is based on the following extension principle.
Lemma 4.11 Let f be a continuous function on a compact subset Γ of
Rd. Then there exists a function G on Rd that is continuous, and so that
G|Γ = f.
Proof. We begin with the observation that if K0 and K1 are two
disjoint compact sets, there exists a continuous function 0 ≤ g(x) ≤ 1 on
Rd which takes the value 0 on K0 and 1 on K1 . Indeed, if d(x, Ω) denotes
the distance from x to Ω, we see that
\[ g(x) = \frac{d(x, K_0)}{d(x, K_0) + d(x, K_1)} \]
has the required properties.
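The separating function g can be tried out directly when K0 and K1 are finite sets of points, which are certainly compact. The sketch below makes that simplifying assumption; the helper names and the sample points are illustrative only.

```python
from math import dist

def distance_to_set(x, K):
    """d(x, K) for a finite, non-empty set K of points in R^d."""
    return min(dist(x, p) for p in K)

def separating_function(x, K0, K1):
    """g(x) = d(x, K0) / (d(x, K0) + d(x, K1)).

    K0 and K1 are assumed disjoint, so the denominator never vanishes.
    """
    d0 = distance_to_set(x, K0)
    d1 = distance_to_set(x, K1)
    return d0 / (d0 + d1)

K0 = [(0.0, 0.0), (1.0, 0.0)]
K1 = [(0.0, 2.0), (3.0, 3.0)]
value_on_K0 = separating_function((1.0, 0.0), K0, K1)
value_on_K1 = separating_function((3.0, 3.0), K0, K1)
value_between = separating_function((0.0, 1.0), K0, K1)
```

The point (0, 1) is at distance 1 from both sets, so g takes the value 1/2 there; on K0 and K1 it takes the values 0 and 1 respectively.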
Now, we may assume without loss of generality that f is non-negative
and bounded by 1 on Γ. Let
and $0 \le G_N \le \frac{1}{3} \left( \frac{2}{3} \right)^{N-1}$ on Rd. If we define
\[ G = \sum_{n=1}^{\infty} G_n, \]
(4) The conditions on Ω in this theorem are not optimal: one can con-
struct examples of Ω when the problem is solvable for which the above
regularity fails.
For more details on the above, see Exercise 19 and Problem 4.
We turn to the proof of the theorem. It is based on the following
proposition, which may be viewed as a refined version of Lemma 4.9
above.
Proposition 4.13 For any bounded open set Ω in R2 that satisfies the
outside-triangle condition there are two constants c1 < 1 and c2 > 1 such
that the following holds. Suppose z is a point in Ω whose distance from
∂Ω is δ. Then whenever v belongs to $C^1(\overline{\Omega})$ and v|∂Ω = 0, we have
\[ \int_{B_{c_1 \delta}(z)} |v(x)|^2\, dx \le C \delta^2 \int_{B_{c_2 \delta}(z) \cap \Omega} |\nabla v(x)|^2\, dx. \tag{32} \]
The bound C can be chosen to depend only on the diameter of Ω and the
parameters ℓ and α which determine the triangles T.
Let us see how the proposition proves the theorem. We have already
shown that it suffices to assume that f is the restriction to ∂Ω of an
F that belongs to C 1 (Ω). We recall we had the minimizing sequence
un = F − vn , with vn ∈ C 1 (Ω) and vn |∂Ω = 0. Moreover, this sequence
converges in the norm of H and L2 (Ω) to a limit v, such that u = F − v
is harmonic in Ω. Then since (32) holds for each vn , it also holds for
v = F − u; that is,
\[ \int_{B_{c_1 \delta}(z)} |(F - u)(x)|^2\, dx \le C \delta^2 \int_{B_{c_2 \delta}(z) \cap \Omega} |\nabla (F - u)(x)|^2\, dx. \tag{33} \]
The absolute continuity of the integral guarantees that the last integral
tends to zero with δ, since m(Bc2 δ ) → 0. However, by the mean-value
property, Av(u)(z) = u(z), while by the continuity of F in Ω,
\[ \mathrm{Av}(F)(z) = \frac{1}{m(B_{c_1 \delta}(z))} \int_{B_{c_1 \delta}(z)} F(x)\, dx \to f(y), \]
because F |∂Ω = f and z → y. Altogether this gives u(z) → f (y), and the
theorem is proved, once the proposition is established.
To prove the proposition, we construct for each z ∈ Ω whose distance
from ∂Ω is δ, and for δ sufficiently small, a rectangle R with the following
properties:
angle of the side of the triangle to the x2 -axis. This angle can be taken
to be γ, with γ > α/4. (See Figure 7.)
There is an alternate possibility that occurs with this figure reflected
through the x2 -axis.
With this picture in mind we construct the rectangle R as indicated
in Figure 8.
It has its long side parallel to the x2 -axis, contains the disc Bc1 δ (z),
and every segment R parallel to the x2 -axis intersects the (extension) of
the side of the triangle.
Note that the coordinates of z are (−δ sin γ, δ cos γ). We choose c1 <
sin γ, then Bc1 δ (z) lies in the same (left) half-plane as z.
We next focus our attention on two points: P1 , which lies on the x1 -
axis at the intersection of this axis with the far side of the rectangle; and
P2 , which is at the corner of that side of the rectangle, that is, at the
intersection of the (continuation) of the side of the outside triangle and
the further side of the rectangle. The coordinates of P1 are (−a, 0), where
a = δc1 + δ sin γ. The coordinates of P2 are $(-a, -a \frac{\cos \gamma}{\sin \gamma})$. Note that the
distance of P2 from the origin is a/ sin γ, which is δ + c1δ/ sin γ ≤ 2δ,
since c1 < sin γ.
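The coordinates above are easy to check numerically. The sketch below simply transcribes the formulas for z, P1 and P2; the function name and the sample values of δ, γ and c1 are assumptions chosen for illustration.

```python
from math import sin, cos, hypot, pi

def rectangle_points(delta, gamma, c1):
    """Return z, P1, P2 for given delta > 0, angle gamma, and 0 < c1 < sin(gamma)."""
    if not 0.0 < c1 < sin(gamma):
        raise ValueError("need 0 < c1 < sin(gamma)")
    z = (-delta * sin(gamma), delta * cos(gamma))
    a = delta * c1 + delta * sin(gamma)
    p1 = (-a, 0.0)
    p2 = (-a, -a * cos(gamma) / sin(gamma))
    return z, p1, p2

delta, gamma, c1 = 0.1, pi / 6, 0.3
z, p1, p2 = rectangle_points(delta, gamma, c1)
dist_z = hypot(z[0], z[1])     # distance of z from the origin, equal to delta
dist_p2 = hypot(p2[0], p2[1])  # equal to a / sin(gamma)
```

For these sample values a = 0.08 and the distance of P2 from the origin is 0.16, which is indeed at most 2δ = 0.2.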
Now we observe that the length of the larger side of the rectangle is
the sum of the part that lies above the x1 -axis and the part that lies
below. The upper part has length the sum of the radius of the disc plus
the height of z, and this is c1δ + δ cos γ ≤ 2δ. The lower part has length
equal to a/ tan γ, which is $\delta \cos \gamma + \delta c_1 \frac{\cos \gamma}{\sin \gamma} \le 2\delta$, since c1 < sin γ. Thus
However, we note that Bc1 δ (z) ⊂ R, and Bc2 δ (z) ⊃ R when c2 ≥ 2. Thus
the desired inequality (32) is established, still under the assumption that
δ is small, that is, δ ≤ ℓ/2. When δ > ℓ/2 it suffices merely to use the
crude estimate (29) and the proposition is then proved. The proof of the
theorem is therefore complete.
5 Exercises
(c) Establish $\widehat{(f * k)}(\xi) = \hat{k}(\xi) \hat{f}(\xi)$ for a.e. ξ.
Prove that (2π)−1/2 M extends to a unitary operator from L2 (R+ , dt/t) to L2 (R).
The Mellin transform serves on R+ , with its multiplicative structure, the same
purpose as the Fourier transform on R, with its additive structure.
4. Consider $F(z) = e^{i/z}/(z + i)$ in the upper half-plane. Note that F(x + iy) ∈
L2(R), for each y > 0 and y = 0. Observe also that F(z) → 0 as |z| → ∞. However,
F ∉ H2(R2+). Why?
5. For a < b, let Sa,b denote the strip {z = x + iy, a < y < b}. Define H 2 (Sa,b )
to consist of the holomorphic functions F in Sa,b so that
\[ \|F\|^2_{H^2(S_{a,b})} = \sup_{a < y < b} \int_{\mathbb{R}} |F(x + iy)|^2\, dx < \infty. \]
Define H 2 (Sa,∞ ) and H 2 (S−∞,b ) to be the obvious variants of the Hardy spaces
for the half-planes {z = x + iy, y > a} and {z = x + iy, y < b}, respectively.
(a) Show that F ∈ H 2 (Sa,b ) if and only if F can be written as
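\[ F(z) = \int_{\mathbb{R}} f(\xi)\, e^{-2\pi i z \xi}\, d\xi, \]
with $\int_{\mathbb{R}} |f(\xi)|^2 (e^{4\pi a \xi} + e^{4\pi b \xi})\, d\xi < \infty$.
(c) Show that $\lim_{a < y < b,\ y \to a} F(x + iy) = F_a(x)$ exists in the L2-norm and also
almost everywhere, with a similar result for $\lim_{a < y < b,\ y \to b} F(x + iy)$.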
[Hint: Prove that for f ∈ H, we have $|f(z)| \le \frac{c}{d(z, \Omega^c)} \|f\|$ for z ∈ Ω, where
$c = \pi^{-1/2}$, using the mean-value property (9). Thus if {fn} is a Cauchy sequence in
H, it converges uniformly on compact subsets of Ω.]
(a) If $\{\varphi_n\}_{n=0}^{\infty}$ is an orthonormal basis of H, then
\[ \sum_{n=0}^{\infty} |\varphi_n(z)|^2 \le \frac{c^2}{d(z, \Omega^c)^2} \quad \text{for } z \in \Omega. \]
(c) To prove (b) it is useful to characterize the function B(z, w), called the
Bergman kernel, by the following property. Let T be the linear transfor-
mation on L2 (Ω) defined by
\[ T f(z) = \int_\Omega B(z, w) f(w)\, du\, dv, \qquad w = u + iv. \]
Also, the sequence $\left\{ z^n \left( \frac{n+1}{\pi} \right)^{1/2} \right\}_{n=0}^{\infty}$ is an orthonormal basis of H. Moreover,
in this case
\[ B(z, w) = \frac{1}{\pi (1 - z \overline{w})^2} . \]
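The closed form of the kernel for the unit disc can be compared with partial sums of the series built from the orthonormal basis, B(z, w) = Σ ϕn(z) times the conjugate of ϕn(w). The sketch below is a numerical check only; the function names, the number of terms, and the sample points are assumptions.

```python
from math import pi, sqrt

def phi(n, z):
    """The n-th basis function z^n * ((n + 1) / pi)^(1/2) on the unit disc."""
    return sqrt((n + 1) / pi) * z ** n

def bergman_partial_sum(z, w, terms=200):
    """Partial sum of sum_n phi_n(z) * conj(phi_n(w))."""
    return sum(phi(n, z) * phi(n, w).conjugate() for n in range(terms))

def bergman_closed_form(z, w):
    """1 / (pi * (1 - z * conj(w))^2)."""
    return 1.0 / (pi * (1.0 - z * w.conjugate()) ** 2)

z = complex(0.3, 0.4)
w = complex(-0.2, 0.1)
series_value = bergman_partial_sum(z, w)
closed_value = bergman_closed_form(z, w)
diagonal_value = bergman_partial_sum(z, z)   # equals sum of |phi_n(z)|^2
```

On the diagonal w = z the series reduces to Σ |ϕn(z)|², which is the quantity bounded in part (a).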
8. Continuing with Exercise 6, suppose Ω is the upper half-plane R2+ . Then every
f ∈ H has a representation
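\[ f(z) = \sqrt{4\pi} \int_0^\infty \hat{f}_0(\xi)\, e^{2\pi i \xi z}\, d\xi, \qquad z \in \mathbb{R}^2_+, \tag{34} \]
where $\int_0^\infty |\hat{f}_0(\xi)|^2\, \frac{d\xi}{\xi} < \infty$. Moreover, the mapping $\hat{f}_0 \to f$ given by (34) is a unitary
mapping from $L^2((0, \infty), \frac{d\xi}{\xi})$ to H.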
(c) If δa denotes the dilation operator, δa (f )(x) = f (ax) with a > 0, then H
commutes with δa , δa H = Hδa .
A converse is given in Problem 5 below.
10. Let f ∈ L2 (R) and let u(x, y) be the Poisson integral of f , that is u = (f ∗
Py )(x), as given in (10) above. Let v(x, y) = (Hf ∗ Py )(x), the Poisson integral of
the Hilbert transform of f . Prove that:
(a) F (x + iy) = u(x, y) + iv(x, y) is analytic in the half-plane R2+ , so that u and
v are conjugate harmonic functions. We also have f = limy→0 u(x, y) and
Hf = limy→0 v(x, y).
(b) $F(z) = \frac{1}{\pi i} \int_{\mathbb{R}} \frac{f(t)}{t - z}\, dt$.
(c) v(x, y) = f ∗ Qy, where $Q_y(x) = \frac{1}{\pi} \frac{x}{x^2 + y^2}$
is the conjugate Poisson kernel.
[Hint: Note that $\frac{i}{\pi z} = P_y(x) + i Q_y(x)$, z = x + iy.]
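The identity in the hint can be verified at a sample point. The sketch assumes the standard Poisson kernel for the upper half-plane, Py(x) = (1/π) y/(x² + y²), which is how we read formula (10); the function names and the sample point are illustrative.

```python
from math import pi

def poisson_kernel(x, y):
    """P_y(x) = (1/pi) * y / (x^2 + y^2), for y > 0 (assumed form of (10))."""
    return y / (pi * (x * x + y * y))

def conjugate_poisson_kernel(x, y):
    """Q_y(x) = (1/pi) * x / (x^2 + y^2)."""
    return x / (pi * (x * x + y * y))

z = complex(0.7, 0.4)
lhs = 1j / (pi * z)
rhs = complex(poisson_kernel(z.real, z.imag),
              conjugate_poisson_kernel(z.real, z.imag))
```

The computation behind the check is i/(πz) = i·conj(z)/(π|z|²) = (y + ix)/(π(x² + y²)), whose real and imaginary parts are the two kernels.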
(a) Assume d ≥ 2. Show that for each constant coefficient partial differential
operator L, there are unbounded connected open sets Ω for which the above
holds for all u ∈ C0∞ (Ω).
(b) Show that kukL2 (Rd ) ≤ ckL(u)kL2 (Rd ) for all u ∈ C0∞ (Rd ) if and only if
|P (ξ)| ≥ c > 0 all ξ, where P is the characteristic polynomial of L.
[Hint: For (a) consider first L = (∂/∂x1 )n and a strip {x : −1 < x1 < 1}.]
14. Suppose F and G are two integrable functions on a bounded interval [a, b].
Show that G is the weak derivative of F if and only if F can be corrected on a set
of measure 0, such that F is absolutely continuous and F′(x) = G(x) for almost
every x.
[Hint: If G is the weak derivative of F, use an approximation to show that
\[ \int_a^b G(x) \varphi(x)\, dx = - \int_a^b F(x) \varphi'(x)\, dx \]
15. Suppose f ∈ L2 (Rd ). Prove that there exists g ∈ L2 (Rd ) such that
\[ \left( \frac{\partial}{\partial x} \right)^\alpha f(x) = g(x) \]
in the weak sense, if and only if
16. Sobolev embedding theorem. Suppose n is the smallest integer > d/2. If
\[ f \in L^2(\mathbb{R}^d) \quad \text{and} \quad \left( \frac{\partial}{\partial x} \right)^\alpha f \in L^2(\mathbb{R}^d) \]
in the weak sense, for all 1 ≤ |α| ≤ n, then f can be modified on a set of measure
zero so that f is continuous and bounded.
[Hint: Express f in terms of fˆ, and show that fˆ ∈ L1 (Rd ) by the Cauchy-Schwarz
inequality.]
17. The conclusion of the Sobolev embedding theorem fails when n = d/2. Con-
sider the case d = 2, and let f (x) = (log 1/|x|)α η(x), where η is a smooth cut-
off function with η = 1 for x near the origin, but η(x) = 0 if |x| ≥ 1/2. Let
0 < α < 1/2.
(a) Verify that ∂f /∂x1 and ∂f /∂x2 are in L2 in the weak sense.
(b) Show that f cannot be corrected on a set of measure zero such that the
resulting function is continuous at the origin.
Then
\[ P(\xi) = \sum_{|\alpha| \le n} a_\alpha (2\pi i \xi)^\alpha \]
(a) Check that L is elliptic if and only if $\sum_{|\alpha| = n} a_\alpha (2\pi \xi)^\alpha$ vanishes only when
ξ = 0.
19. Suppose u is harmonic in the punctured unit disc D∗ = {z ∈ C : 0 < |z| < 1}.
(a) Show that if u is also continuous at the origin, then u is harmonic throughout
the unit disc.
[Hint: Show that u is weakly harmonic.]
(b) Prove that the Dirichlet problem for the punctured unit disc is in general
not solvable.
20. Let F be a continuous function on the closure $\overline{D}$ of the unit disc. Assume that
F is in C1 on the (open) disc D, and $\int_D |\nabla F|^2 < \infty$.
Let f(e^{iθ}) denote the restriction of F to the unit circle, and write
$f(e^{i\theta}) \sim \sum_{n=-\infty}^{\infty} a_n e^{in\theta}$. Prove that $\sum_{n=-\infty}^{\infty} |n|\, |a_n|^2 < \infty$.
[Hint: Write $F(re^{i\theta}) \sim \sum_{n=-\infty}^{\infty} F_n(r) e^{in\theta}$, with Fn(1) = an. Express $\int_D |\nabla F|^2$ in
polar coordinates, and use the fact that
\[ \frac{1}{2} |F(1)|^2 \le L^{-1} \int_{1/2}^{1} |F'(r)|^2\, dr + L \int_{1/2}^{1} |F(r)|^2\, dr, \]
6 Problems
1. Suppose F0 (x) ∈ L2 (R). Then a necessary and sufficient condition that there
exists an entire analytic function F , such that |F (z)| ≤ Aea|z| for all z ∈ C, and
F0 (x) = F (x) a.e. x ∈ R, is that F̂0 (ξ) = 0 whenever |ξ| > a/2π.
[Hint: Consider the regularization $F^\epsilon(z) = \int_{-\infty}^{\infty} F(z - t) \varphi_\epsilon(t)\, dt$ and apply to it
the considerations in Theorem 3.3 of Chapter 4 in Book II.]
3.∗ Suppose the bounded domain Ω has as its boundary a closed simple continuous
curve. Then the boundary value problem is solvable for Ω. This is because there
Domain I Domain II
Figure 11. Domains with a cusp
The set I has as its boundary a smooth curve, with the exception of an (inside)
cusp. The set II is similar, except it has an outside cusp. Both I and II fall
within the scope of the result of Problem 3, and hence the boundary value problem
is solvable in each case. However, II satisfies the outside-triangle condition while
I does not.
Next, for large N , choose Φ so that it equals 1 in the ball |ξ| ≤ N . Then m(ξ) =
T̂ (Φ)(ξ) for |ξ| ≤ N .]
As a consequence of this theorem show that if T is a bounded operator on L2 (R)
that commutes with translations and dilations (as in Exercise 9 above), then
(a) If (T f )(−x) = T (f (−x)) it follows T = cI, where c is an appropriate con-
stant and I the identity operator.
(b) If (T f )(−x) = −T (f (−x)), then T = cH, where c is an appropriate constant
and H the Hilbert transform.
where the supremum is taken over all balls containing the point x.
Complete the following outline to prove that there exists a constant C so that
In other words, the map that takes f to f ∗ (although not linear) is bounded
on L2 (Rd ). This differs notably from the situation in L1 (Rd ), as we observed in
Chapter 3.
(i) µ∗ (∅) = 0.
(ii) If E1 ⊂ E2 , then µ∗ (E1 ) ≤ µ∗ (E2 ).
(iii) If E1, E2, . . . is a countable family of sets, then
\[ \mu_*\left( \bigcup_{j=1}^{\infty} E_j \right) \le \sum_{j=1}^{\infty} \mu_*(E_j). \]
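On a finite set the three properties can be checked by brute force. The toy exterior measure below (0 on the empty set, 1 on every other set) is an illustrative assumption, not an example taken from the text; on a finite set, countable sub-additivity reduces to finite sub-additivity, which follows from the two-set case by induction.

```python
from itertools import combinations

def subsets(X):
    """All subsets of the finite set X, as frozensets."""
    items = sorted(X)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]

def mu_star(E):
    """A toy exterior measure: 0 on the empty set, 1 on every other set."""
    return 0 if not E else 1

X = frozenset({1, 2, 3})
all_sets = subsets(X)

# (i) the empty set has exterior measure zero
empty_is_null = mu_star(frozenset()) == 0
# (ii) monotonicity: E1 contained in E2 implies mu_star(E1) <= mu_star(E2)
monotone = all(mu_star(E1) <= mu_star(E2)
               for E1 in all_sets for E2 in all_sets if E1 <= E2)
# (iii) sub-additivity, checked here for pairs of sets
subadditive = all(mu_star(E1 | E2) <= mu_star(E1) + mu_star(E2)
                  for E1 in all_sets for E2 in all_sets)
```

This µ∗ is sub-additive but far from additive: two disjoint non-empty sets have total exterior measure 2, while their union has exterior measure 1.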
In other words, E separates any set A in two parts that behave well
in regard to the exterior measure µ∗ . For this reason, (1) is sometimes
referred to as the separation condition. One can show that in Rd with the
Lebesgue exterior measure the notion of measurability (1) is equivalent
1. Abstract measure spaces 265
≥ µ∗ (A).
Therefore all the inequalities above are equalities, and we conclude that
G ∈ M, as desired. Moreover, by taking A = G in the above, we find
that µ∗ is countably additive on M, and the proof of the theorem is
complete.
Our previous observation that sets of exterior measure 0 are Carathéodory
measurable shows that the measure space (X, M, µ) in the theorem
is complete: whenever F ∈ M satisfies µ(F ) = 0 and E ⊂ F , then
E ∈ M.
The last property is of course called the triangle inequality, and a func-
tion d that satisfies all these conditions is called a metric on X. For
example, the set Rd with d(x, y) = |x − y| is a metric space. Another
example is provided by the space of continuous functions on a compact
set K with d(f, g) = supx∈K |f (x) − g(x)|.
A metric space (X, d) is naturally equipped with a family of open balls.
Here
defines the open ball of radius r centered at x. Together with this, we say
that a set O ⊂ X is open if for any x ∈ O there exists r > 0 so that the
open ball Br (x) is contained in O. A set is closed if its complement is
open. With these definitions, one checks easily that an (arbitrary) union
of open sets is open, and a similar intersection of closed sets is closed.
Finally, on a metric space X we can define, as in Section 3 of Chapter 1,
the Borel σ-algebra, BX , that is the smallest σ-algebra of sets in X
that contains the open sets of X. In other words BX is the intersection
of all σ-algebras that contain the open sets. Elements in BX are called
Borel sets.
We now turn our attention to those exterior measures on X with the
special property of being additive on sets that are “well separated.” We
show that this property guarantees that this exterior measure defines a
measure on the Borel σ-algebra. This is achieved by proving that all
Borel sets are Carathéodory measurable.
Given two sets A and B in a metric space (X, d), the distance between
A and B is defined by
This property played a key role in the case of exterior Lebesgue measure.
An = {x ∈ F c ∩ A : d(x, F ) ≥ 1/n}.
Then An ⊂ An+1, and since F is closed we have $F^c \cap A = \bigcup_{n=1}^{\infty} A_n$.
Also, the distance between F ∩ A and An is ≥ 1/n, and since µ∗ is a
metric exterior measure, we have
\[ d(B_{n+1}, A_n) \ge \frac{1}{n(n+1)} . \]
Indeed, if x ∈ Bn+1 and d(x, y) < 1/n(n + 1) the triangle inequality shows
that d(y, F) < 1/n, hence y ∉ An. Therefore
\[ \mu_*(A_{2k+1}) \ge \sum_{j=1}^{k} \mu_*(B_{2j}). \]
\[ \mu_*(A_{2k}) \ge \sum_{j=1}^{k} \mu_*(B_{2j-1}). \]
Since µ∗(A) is finite, we find that both series $\sum \mu_*(B_{2j})$ and $\sum \mu_*(B_{2j-1})$
are convergent. Finally, we note that
\[ \mu_*(A_n) \le \mu_*(F^c \cap A) \le \mu_*(A_n) + \sum_{j=n+1}^{\infty} \mu_*(B_j), \]
and this proves the limit (3). Letting n tend to infinity in the inequal-
ity (2) we find that µ∗ (A) ≥ µ∗ (F ∩ A) + µ∗ (F c ∩ A), and hence F is
measurable, as was to be shown.
Given a metric space X, a measure µ defined on the Borel sets of X
will be referred to as a Borel measure. Borel measures that assign a
finite measure to all balls (of finite radius) also satisfy a useful regularity
property. The requirement that µ(B) < ∞ for all balls B is satisfied in
many (but not in all) circumstances that arise in practice.1 When it does
hold, we get the following proposition.
Proof. We need the following preliminary observation. Suppose
$F^* = \bigcup_{k=1}^{\infty} F_k$, where the Fk are closed sets. Then for any ε > 0, we can
find a closed set F ⊂ F∗ such that µ(F∗ − F) < ε. To prove this we can
assume that the sets {Fk} are increasing. Fix a point x0 ∈ X, and let Bn
denote the ball {x : d(x, x0) < n}, with B0 = {∅}. Since $\bigcup_{n=1}^{\infty} B_n = X$,
we have that
\[ F^* = \bigcup F^* \cap (\overline{B_n} - B_{n-1}). \]
1 This restriction is not always valid for the Hausdorff measures that are considered in
(i) µ0 (∅) = 0.
(ii) If E1, E2, . . . is a countable collection of disjoint sets in A with
$\bigcup_{k=1}^{\infty} E_k \in A$, then
\[ \mu_0\left( \bigcup_{k=1}^{\infty} E_k \right) = \sum_{k=1}^{\infty} \mu_0(E_k). \]
the sets E′k are disjoint elements of A, E′k ⊂ Ek and $E = \bigcup_{k=1}^{\infty} E'_k$. By
(ii) in the definition of a premeasure, we have
\[ \mu_0(E) = \sum_{k=1}^{\infty} \mu_0(E'_k) \le \sum_{k=1}^{\infty} \mu_0(E_k). \]
\[ \sum_{j=1}^{\infty} \mu_0(E_j) \le \mu_*(A) + \epsilon. \]
≥ µ∗ (E ∩ A) + µ∗ (E c ∩ A).
272 Chapter 6. ABSTRACT MEASURE AND INTEGRATION THEORY
One notes below that µ is the only such extension of µ0 under the as-
sumption that µ is σ-finite.
Proof. The exterior measure µ∗ induced by µ0 defines a measure µ on
the σ-algebra of Carathéodory measurable sets. Therefore, by the result
in the previous lemma, µ is also a measure on M that extends µ0 . (We
should observe that in general the class M is not as large as the class of
all sets that are measurable in the sense of (1).)
To prove that this extension is unique whenever µ is σ-finite, we argue
as follows. Suppose that ν is another measure on M that coincides with
µ0 on A, and suppose that F ∈ M has finite measure. We claim that
µ(F) = ν(F). If $F \subset \bigcup E_j$, where Ej ∈ A, then
\[ \nu(F) \le \sum_{j=1}^{\infty} \nu(E_j) = \sum_{j=1}^{\infty} \mu_0(E_j), \]
so that ν(F) ≤ µ(F). To prove the reverse inequality, note that if $E = \bigcup E_j$,
then the fact that ν and µ are two measures that agree on A gives
\[ \nu(E) = \lim_{n \to \infty} \nu\Big( \bigcup_{j=1}^{n} E_j \Big) = \lim_{n \to \infty} \mu\Big( \bigcup_{j=1}^{n} E_j \Big) = \mu(E). \]
If the sets Ej are chosen so that µ(E) ≤ µ(F) + ε, then the fact that
µ(F) < ∞ implies µ(E − F) ≤ ε, and therefore
Proposition 1.6 For any set E and any ε > 0, there are sets E1 ∈
Aσ and E2 ∈ Aσδ, such that E ⊂ E1, E ⊂ E2, and µ∗(E1) ≤ µ∗(E) + ε,
while µ∗(E2) = µ∗(E).
Measurable functions
A function f on X with values in the extended real numbers is measur-
able if
\[ \sum_{k=1}^{N} a_k \chi_{E_k}, \]
where Ek are measurable sets of finite measure and ak are real numbers.
Approximations by simple functions played an important role in the defi-
nition of the Lebesgue integral. Fortunately, this result continues to hold
in our abstract setting.
\[ |\varphi_k(x)| \le |\varphi_{k+1}(x)| \quad \text{and} \quad \lim_{k \to \infty} \varphi_k(x) = f(x) \quad \text{for all } x. \]
The proof of this result can be obtained with some obvious minor
modifications of the proofs of Theorems 4.1 and 4.2 in Chapter 1. Here,
one makes use of the technical condition imposed on X, that of being
σ-finite. Indeed, if we write $X = \bigcup F_k$, where Fk ∈ M are of finite measure,
then the sets Fk play the role of the cubes Qk in the proof of Theorem 4.1,
Chapter 1.
Another important result that generalizes immediately is Egorov’s the-
orem.
• Suppose $\{f_k\}_{k=1}^{\infty}$ is a sequence of measurable functions defined on
a measurable set E ⊂ X with µ(E) < ∞, and fk → f a.e. Then
for each ε > 0 there is a set Aε with Aε ⊂ E, µ(E − Aε) ≤ ε, and
such that fk → f uniformly on Aε.
2. Integration on a measure space 275
and consequently
\[ \int f_n \, d\mu \to \int f \, d\mu \quad \text{as } n \to \infty. \]
276 Chapter 6. ABSTRACT MEASURE AND INTEGRATION THEORY
Similarly we can define L²(X, µ) to be the equivalence class of measurable
functions for which $\int_X |f(x)|^2 \, d\mu(x) < \infty$. The norm is then
\[ \|f\|_{L^2(X,\mu)} = \left( \int_X |f(x)|^2 \, d\mu(x) \right)^{1/2}. \tag{5} \]
3 Examples
We now discuss some useful examples of the general theory.
and B ∈ M2 . We then let A denote the collection of all sets in X that are
finite unions of disjoint measurable rectangles. It is easy to check that A
is an algebra of subsets of X. (Indeed, the complement of a measurable
rectangle is the union of three disjoint such rectangles, while the union
of two measurable rectangles is the disjoint union of at most six such
rectangles.) From now on we abbreviate our terminology by referring to
measurable rectangles simply as “rectangles.”
On the rectangles we define the function µ0 by µ0 (A × B) = µ1 (A)µ2 (B).
Now the fact that µ0 has a unique extension to the algebra A for which
µ0 becomes a premeasure is a consequence of the following fact: whenever
a rectangle A × B is the disjoint union of a countable collection of
rectangles {Aj × Bj}, $A \times B = \bigcup_{j=1}^{\infty} A_j \times B_j$, then
\[ \mu_0(A \times B) = \sum_{j=1}^{\infty} \mu_0(A_j \times B_j). \tag{6} \]
Hence integrating in x1 and using the monotone convergence theorem we
get $\mu_1(A)\mu_2(B) = \sum_{j=1}^{\infty} \mu_1(A_j)\mu_2(B_j)$, which is (6).
Now that we know that µ0 is a premeasure on A, we obtain from The-
orem 1.5 a measure (which we denote by µ = µ1 × µ2 ) on the σ-algebra
M of sets generated by the algebra A of measurable rectangles. In this
way, we have defined the product measure space (X1 × X2 , M, µ1 × µ2 ).
Proof. One notes first that all the assertions hold immediately when
E is a (measurable) rectangle. Next suppose E is a set in Aσ . Then we
can decompose it as a countable union of disjoint rectangles Ej. (If the
Ej are not already disjoint we only need to replace the Ej by
$\bigcup_{k\le j} E_k - \bigcup_{k\le j-1} E_k$.) Then for each x2 we have
$E^{x_2} = \bigcup_{j=1}^{\infty} E_j^{x_2}$, and we observe
that $\{E_j^{x_2}\}$ are disjoint sets. Thus by (7) applied to each rectangle Ej
and the monotone convergence theorem we get our conclusion for each
set E ∈ Aσ .
Next assume E ∈ Aσδ and that (µ1 × µ2)(E) < ∞. Then there is
a sequence {Ej} of sets with Ej ∈ Aσ, Ej+1 ⊂ Ej, and $E = \bigcap_{j=1}^{\infty} E_j$.
We let $f_j(x_2) = \mu_1(E_j^{x_2})$ and $f(x_2) = \mu_1(E^{x_2})$. To see that $E^{x_2}$ is
µ1-measurable and f(x2) is well-defined, note that $E^{x_2}$ is the decreasing
limit of the sets $E_j^{x_2}$, which we have seen by the above are measurable.
Moreover, since E1 ∈ Aσ and (µ1 × µ2)(E1) < ∞, we see that
fj(x2) → f(x2), as j → ∞ for each x2. Thus f(x2) is measurable. However,
{fj(x2)} is a decreasing sequence of non-negative functions, hence
\[ \int_{X_2} f(x_2) \, d\mu_2(x_2) = \lim_{j\to\infty} \int_{X_2} f_j(x_2) \, d\mu_2(x_2), \]
and therefore (7) is proved in the case when (µ1 × µ2 )(E) < ∞. Now
since we assumed both µ1 and µ2 are σ-finite, we can find sequences
F1 ⊂ F2 ⊂ · · · ⊂ Fj ⊂ · · · ⊂ X1 and G1 ⊂ G2 ⊂ · · · ⊂ Gj ⊂ · · · ⊂ X2, with
$\bigcup_{j=1}^{\infty} F_j = X_1$, $\bigcup_{j=1}^{\infty} G_j = X_2$, µ1(Fj) < ∞, and µ2(Gj) < ∞ for all j.
Then we merely need to replace E by Ej = E ∩ (Fj × Gj ), and let j → ∞
to obtain the general result.
We now extend the result in the above proposition to an arbitrary
measurable set E in X1 × X2 , that is, E ∈ M, the σ-algebra generated
by the measurable rectangles.
Proof. Note that if the desired conclusions hold for finitely many
functions, they also hold for their linear combinations. In particular it
suffices to assume that f is non-negative. When f = χE , where E is a set
of finite measure, what we wish to prove is contained in Proposition 3.2.
Hence the desired result also holds for simple functions. Therefore by
the monotone convergence theorem it is established for all non-negative
functions, and the theorem is proved.
We remark that in general the product space (X, M, µ) constructed
above is not complete. However, if we define the completed space $(X, \overline{\mathcal{M}}, \overline{\mu})$
as in Exercise 2, the theorem continues to hold in this completed space.
The proof requires only a simple modification of the argument in Propo-
sition 3.2.
Our intention here is to deal with the formula that, with appropriate
definitions and under suitable hypotheses, states:
\[ \int_{\mathbb{R}^d} f(x) \, dx = \int_{S^{d-1}} \left( \int_0^{\infty} f(r\gamma) \, r^{d-1} \, dr \right) d\sigma(\gamma). \tag{9} \]
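As an informal numerical illustration of (9), and not a substitute for the proof, the following sketch compares both sides in the case d = 2 for the Gaussian f(x) = e^{−|x|²}, where each side should be close to π. The function names, the truncation radius 6 and the grid sizes are choices made here for illustration only; the midpoint rule stands in for the Lebesgue integral.

```python
import math

def cartesian_integral(f, half_width=6.0, n=400):
    """Midpoint rule for the integral of f over the square [-half_width, half_width]^2."""
    h = 2.0 * half_width / n
    total = 0.0
    for i in range(n):
        x = -half_width + (i + 0.5) * h
        for j in range(n):
            y = -half_width + (j + 0.5) * h
            total += f(x, y)
    return total * h * h

def polar_integral(f, r_max=6.0, n_r=400, n_theta=400):
    """Midpoint rule for the iterated integral over the circle S^1 of
    the inner integral of f(r * gamma) * r dr over (0, r_max)."""
    dr = r_max / n_r
    dtheta = 2.0 * math.pi / n_theta
    total = 0.0
    for a in range(n_theta):
        theta = (a + 0.5) * dtheta
        c, s = math.cos(theta), math.sin(theta)
        inner = 0.0
        for b in range(n_r):
            r = (b + 0.5) * dr
            inner += f(r * c, r * s) * r  # the factor r^(d-1) with d = 2
        total += inner * dr * dtheta
    return total

def gaussian(x, y):
    return math.exp(-(x * x + y * y))

lhs = cartesian_integral(gaussian)  # left-hand side of (9), truncated to a square
rhs = polar_integral(gaussian)      # right-hand side of (9), truncated to a disc
print(lhs, rhs, math.pi)
```

The two truncated regions differ (a square versus a disc), but the Gaussian is negligible outside radius 6, so both numbers approximate the same integral.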
intervals E1 = (a, b), and thus for all open sets. Thus we have m(E1 ×
E2 ) = µ1 (E1 )µ2 (E2 ) for all open sets E1 , and hence for all closed sets,
and therefore for all Lebesgue measurable sets. (In fact, we can find
sets F1 ⊂ E1 ⊂ O1 with F1 closed and O1 open, such that m1(O1) − ε ≤
m1(E1) ≤ m1(F1) + ε, and apply the above to F1 × E2 and O1 × E2.)
So we have established the identity (10) for all measurable rectangles
and as a result for all finite unions of measurable rectangles. This is
the algebra A that occurs in the proof of Theorem 3.3, and hence by
the uniqueness in Theorem 1.5, the identity extends to the σ-algebra
generated by A, which is the σ-algebra M on which the measure µ is
defined. To summarize, whenever E ∈ M, the assertion (9) holds for
f = χE .
To go further we note that any open set in R^d − {0} can be written
as a countable union of rectangles $\bigcup_{j=1}^{\infty} A_j \times B_j$, where Aj and Bj are
open in (0, ∞) and S^{d−1}, respectively. (This small technical point is
taken up in Exercise 12.) It follows that any open set is in M, and
therefore so is any Borel set. Thus (9) is valid for χE whenever E is
any Borel set in R^d − {0}. The result then goes over to any Lebesgue
set E′ ⊂ R^d − {0}, since such a set can be written as a disjoint union
E′ = E ∪ Z, where E is a Borel set and Z ⊂ F, with F a Borel set
of measure zero. To finish the proof we follow the familiar steps of
deducing (9) for simple functions, and then by monotonic convergence
for non-negative integrable functions, and from that for the general case.
where the infimum is taken over all coverings of E of the form $\bigcup_{j=1}^{\infty} (a_j, b_j]$.
It is easy to verify that µ∗ is an exterior measure on R. We observe
next that µ∗((a, b]) = F(b) − F(a), if a < b. Clearly µ∗((a, b]) ≤ F(b) −
F(a), since (a, b] covers itself. Next, suppose that $\bigcup_{j=1}^{\infty} (a_j, b_j]$
covers (a, b]; then it covers [a′, b] for any a < a′ < b. However, by the
right-continuity of F, if ε > 0 is given, we can always choose b′j > bj such
that $F(b_j') \le F(b_j) + \varepsilon/2^j$. Now the union of open intervals $\bigcup_{j=1}^{\infty} (a_j, b_j')$
covers [a′, b]. By the compactness of this interval, $\bigcup_{j=1}^{N} (a_j, b_j')$ covers
[a′, b] for some N. Thus since F is increasing we have
\[ F(b) - F(a') \le \sum_{j=1}^{N} \big( F(b_j') - F(a_j) \big) \le \sum_{j=1}^{N} \big( F(b_j) - F(a_j) + \varepsilon/2^j \big) \le \mu_*((a, b]) + \varepsilon. \]
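The identity µ∗((a, b]) = F(b) − F(a) can be made concrete with a small sketch. The function F below is a hypothetical example chosen here (slope one plus a unit jump at the origin), and `mu` is an illustrative name; the sketch only evaluates the formula on half-open intervals and does not construct the measure.

```python
def F(x):
    """A hypothetical increasing, right-continuous function:
    slope 1 everywhere plus a unit jump at x = 0."""
    return x + (1.0 if x >= 0 else 0.0)

def mu(a, b):
    """Value assigned to the half-open interval (a, b], namely F(b) - F(a)."""
    if not a < b:
        raise ValueError("need a < b")
    return F(b) - F(a)

# Additivity on adjacent half-open intervals: (-1, 1] = (-1, 0] union (0, 1].
whole = mu(-1.0, 1.0)
parts = mu(-1.0, 0.0) + mu(0.0, 1.0)

# The jump of F at 0 appears as mass on intervals (-h, 0] shrinking to {0} ...
mass_near_zero_left = mu(-1e-9, 0.0)
# ... while right-continuity makes the mass of (0, h] tend to 0.
mass_near_zero_right = mu(0.0, 1e-9)

print(whole, parts, mass_near_zero_left, mass_near_zero_right)
```

Half-open intervals (a, b] are the natural choice here because they fit together without overlap, and right-continuity of F is exactly what makes the mass of (0, h] vanish as h → 0.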
We may assume, after subdividing the intervals (aj , bj ] into smaller half-
open intervals, that each interval in the covering has length less than δ.
When this is so each interval can intersect at most one of the two sets E1
or E2. If we denote by J1 and J2 the sets of those indices for which (aj, bj]
intersects E1 and E2, respectively, then J1 ∩ J2 is empty; moreover, we
have $E_1 \subset \bigcup_{j\in J_1} (a_j, b_j]$ as well as $E_2 \subset \bigcup_{j\in J_2} (a_j, b_j]$. Therefore
\[ \mu_*(E_1) + \mu_*(E_2) \le \sum_{j\in J_1} \big( F(b_j) - F(a_j) \big) + \sum_{j\in J_2} \big( F(b_j) - F(a_j) \big) \le \sum_{j=1}^{\infty} \big( F(b_j) - F(a_j) \big) \le \mu_*(E_1 \cup E_2) + \varepsilon. \]
note that if, for instance, x0 > 0, the sets En = (0, x0 + 1/n] decrease
to E = (0, x0 ] as n → ∞, hence µ(En ) → µ(E), since µ(E1 ) < ∞. This
means that F (x0 + 1/n) → F (x0 ). Since F is increasing, this implies
that F is right-continuous at x0 . The argument for any x0 ≤ 0 is similar,
and thus the theorem is proved.
(ii) If $\{E_j\}_{j=1}^{\infty}$ are disjoint subsets of M, then
\[ \nu\Big( \bigcup_{j=1}^{\infty} E_j \Big) = \sum_{j=1}^{\infty} \nu(E_j). \]
Note that for this to hold the sum $\sum \nu(E_j)$ must be independent of
the rearrangements of terms, so that if $\nu(\bigcup_{j=1}^{\infty} E_j)$ is finite, it implies
that the sum converges absolutely.
Examples of signed measures arise naturally if we drop the assumption
that f be non-negative in the expression
\[ \nu(E) = \int_E f \, d\mu, \]
\[ |\nu|(E) = \sup \sum_{j=1}^{\infty} |\nu(E_j)|, \]
where the supremum is taken over all partitions of E, that is, over all
countable unions $E = \bigcup_{j=1}^{\infty} E_j$, where the sets Ej are disjoint and belong
to M.
The fact that |ν| is actually additive is not obvious, and is given in the
proof below.
Consequently, taking the supremum over the numbers αj gives the first
inequality in (11).
For the reverse inequality, let Fk be any other partition of E. For a
fixed k, {Fk ∩ Ej }j is a partition of Fk , so
\[ \sum_k |\nu(F_k)| = \sum_k \Big| \sum_j \nu(F_k \cap E_j) \Big|, \]
By the proposition we see that ν⁺ and ν⁻ are measures, and they clearly
satisfy
\[ \nu = \nu^+ - \nu^- \quad\text{and}\quad |\nu| = \nu^+ + \nu^-. \]
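On a finite set the total variation and the two identities above can be checked by brute force. The sketch below is only an illustration under assumptions made here: the four-point space, the weights, and the definitions ν⁺ = (|ν| + ν)/2 and ν⁻ = (|ν| − ν)/2 (the usual ones, assumed rather than quoted) are all hypothetical choices, and on a finite set countable partitions reduce to finite ones.

```python
# A hypothetical signed measure on the four-point set {a, b, c, d}, given by point weights.
weights = {"a": 3.0, "b": -2.0, "c": 0.5, "d": -1.5}

def nu(E):
    """Signed measure of a subset E (any iterable of points)."""
    return sum(weights[x] for x in E)

def partitions(items):
    """Yield every partition of the list `items` into disjoint non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for p in partitions(rest):
        yield [[first]] + p                      # `first` in a block of its own
        for i in range(len(p)):                  # or joined to an existing block
            yield p[:i] + [[first] + p[i]] + p[i + 1:]

def total_variation(E):
    """|nu|(E): the supremum of sum |nu(E_j)| over all partitions of E."""
    return max(sum(abs(nu(block)) for block in p) for p in partitions(list(E)))

def nu_plus(E):
    return 0.5 * (total_variation(E) + nu(E))    # assumed definition of the positive variation

def nu_minus(E):
    return 0.5 * (total_variation(E) - nu(E))    # assumed definition of the negative variation

X = ["a", "b", "c", "d"]
print(total_variation(X), nu_plus(X), nu_minus(X), nu(X))
```

In this example the supremum is attained by the partition into single points, so |ν|(E) is simply the sum of the absolute values of the weights in E.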
In the general situation the relation between the two conditions (12)
and (14) is clarified by the following observation.
Proposition 4.2 The assertion (14) implies (12). Conversely, if |ν| is
a finite measure, then (12) implies (14).
That (12) is a consequence of (14) is obvious because µ(E) = 0 gives
|ν(E)| < ε for every ε > 0. To prove the converse, it suffices to consider
the case when ν is positive, upon replacing ν by |ν|. We then assume
that (14) does not hold. This means that it fails for some fixed ε > 0.
Hence for each n, there is a measurable set En with $\mu(E_n) < 2^{-n}$
while ν(En) ≥ ε. Now let $E^* = \limsup_{n\to\infty} E_n = \bigcap_{n=1}^{\infty} E_n^*$, where
$E_n^* = \bigcup_{k\ge n} E_k$. Then since $\mu(E_n^*) \le \sum_{k\ge n} 1/2^k = 1/2^{n-1}$, and the decreasing
sets $\{E_k^*\}$ are contained in a set of finite measure ($E_1^*$), we get $\mu(E^*) = 0$.
Note that f ∈ L¹(X, µ), since νa(X) ≤ ν(X) < ∞. If µ and ν are σ-finite
and positive we may clearly find sets Ej ∈ M such that X = ⋃ Ej and
and then we can write for each j, νj = νj,a + νj,s where νj,s ⊥ µj and
νj,a = fj dµj. Then it suffices to set
\[ f = \sum f_j, \quad \nu_s = \sum \nu_{j,s}, \quad\text{and}\quad \nu_a = \sum \nu_{j,a}. \]
\[ \nu_a - \nu_a' = \nu_s' - \nu_s. \]
5* Ergodic theorems
Ergodic theory had its beginnings in certain problems in statistical me-
chanics studied in the late nineteenth century. Since then it has grown
rapidly and has gained wide influence in a number of mathematical disci-
plines, in particular those related to dynamical systems and probability
theory. It is not our purpose to try to give an account of this broad
and fascinating theory. Rather, we restrict our presentation to some of
the basic limit theorems that lie at its foundation. These theorems are
most naturally formulated in the general context of abstract measure
spaces, and thus for us they serve as excellent illustrations of the general
framework developed in this chapter.
The setting for the theory is a σ-finite measure space (X, M, µ) en-
dowed with a mapping τ : X → X such that whenever E is a measurable
subset of X, then so is τ⁻¹(E), and µ(τ⁻¹(E)) = µ(E). Here τ⁻¹(E) is
the pre-image of E under τ; that is, τ⁻¹(E) = {x ∈ X : τ(x) ∈ E}. A
mapping τ with these properties is called a measure-preserving transformation.
If in addition for such a τ we have the feature that it is a
bijection and τ⁻¹ is also a measure-preserving transformation, then τ is
referred to as a measure-preserving isomorphism.
Let us note that if τ is a measure-preserving transformation, then
f (τ (x)) is measurable if f (x) is measurable, and is integrable if f is
integrable; moreover, then
\[ \int_X f(\tau(x)) \, d\mu(x) = \int_X f(x) \, d\mu(x). \tag{18} \]
(i) Here X = Z, the integers, with µ its counting measure; that is,
µ(E) = #(E) = the number of integers in E, for any E ⊂ Z. We
define τ to be the unit translation, τ : n ↦ n + 1. Note that τ gives
a measure-preserving isomorphism of Z.
(iii) Here X is the unit circle, given as R/Z, with the measure induced
from Lebesgue measure on R. That is, we may realize X as the unit
interval (0, 1], and take µ to be the Lebesgue measure restricted
to this interval. For any real number α, the translation x ↦ x + α,
taken modulo Z, is well defined on X = R/Z, and is measure-preserving.
(See the related Exercise 3 in Chapter 2.) It can be
interpreted as a rotation of the circle by angle 2πα.
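The measure-preserving property of the rotation, and the resulting identity (18), can be tested numerically. The sketch below is an illustration under assumptions made here: the rotation number √2 − 1, the test function x², the interval (0.2, 0.5] and the midpoint rule are all arbitrary choices, and the representative [0, 1) produced by the `%` operator differs from (0, 1] only on a set of measure zero.

```python
import math

ALPHA = math.sqrt(2.0) - 1.0  # hypothetical choice of the rotation number

def tau(x, alpha=ALPHA):
    """Rotation x -> x + alpha taken modulo 1."""
    return (x + alpha) % 1.0

def integrate(g, n=100_000):
    """Midpoint rule on the unit interval, standing in for Lebesgue measure."""
    h = 1.0 / n
    return sum(g((i + 0.5) * h) for i in range(n)) * h

def f(x):
    return x * x

def indicator(y):
    """Characteristic function of E = (0.2, 0.5]."""
    return 1.0 if 0.2 < y <= 0.5 else 0.0

integral_of_f_after_tau = integrate(lambda x: f(tau(x)))  # left side of (18)
integral_of_f = integrate(f)                              # right side of (18)

# mu(tau^{-1}(E)) is the integral of the indicator of E composed with tau.
measure_of_preimage = integrate(lambda x: indicator(tau(x)))
measure_of_E = 0.5 - 0.2

print(integral_of_f_after_tau, integral_of_f, measure_of_preimage, measure_of_E)
```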
Having pointed out these examples, we can now return to the general
theory. The notions described above are of interest, in part, because they
abstract the idea of a dynamical system, one whose totality of states is
represented by the space X, with each point x ∈ X giving a particular
state of the system. The mapping τ : X → X then describes the trans-
formation of the system after a unit of time has elapsed. For such a
system there is often associated a notion of “volume” or “mass” that is
unchanged by the evolution, and this is the role of the invariant measure
µ. The iterates, τ^n = τ ◦ τ ◦ · · · ◦ τ (n times) describe the evolution of
the system after n units of time, and a principal concern is the average
behaviour, as n → ∞, of various quantities associated with the system.
Thus one is led to study averages
\[ A_n(f)(x) = \frac{1}{n} \sum_{k=0}^{n-1} f(\tau^k(x)), \tag{19} \]
\[ \|Tf\| = \|f\|, \tag{21} \]
where ‖ · ‖ denotes the Hilbert space (that is, the L²) norm. This is clear
from (18) with f replaced by |f|². Observe that if τ were also supposed
to be a measure-preserving isomorphism, then T would be invertible and
hence unitary; but we do not assume this.
Now with T as above, consider the subspace S of invariant vec-
tors, S = {f ∈ H : T (f ) = f }. Clearly, because of (21), the subspace
S is closed. Let P denote the orthogonal projection on this subspace.
The theorem that follows deals with the “mean” convergence, meaning
convergence in the norm.
(i) S = S∗ .
\[ A_n(f_0) = \frac{1}{n} \sum_{k=0}^{n-1} T^k(f_0) = f_0 = P(f) \quad \text{for every } n \ge 1. \]
\[ A_n(f_1') = \frac{1}{n} \sum_{k=0}^{n-1} T^k (1 - T)(g) = \frac{1}{n} \sum_{k=0}^{n-1} \big( T^k(g) - T^{k+1}(g) \big) = \frac{1}{n} \big( g - T^n(g) \big). \]
Since T is an isometry, the above identity shows that $A_n(f_1')$ converges
to 0 in the norm as n → ∞.
For the last term, we use once again the fact that each T^k is an isometry
to obtain
\[ \|A_n(f_1 - f_1')\| \le \frac{1}{n} \sum_{k=0}^{n-1} \|T^k(f_1 - f_1')\| \le \|f_1 - f_1'\| < \varepsilon. \]
Finally, from (22) and the above three observations, we deduce that
$\limsup_{n\to\infty} \|A_n(f) - P(f)\| \le \varepsilon$, and this concludes the proof of the
theorem.
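The telescoping step and the norm convergence can be observed in a finite-dimensional toy model. In the sketch below, which is only an illustration, the space is R⁵, T is composition with the cyclic shift i ↦ i + 1 mod 5 (an isometry for the Euclidean norm), and the invariant vectors are the constants, so P(f) is the mean of f times the constant vector; all names and sample vectors are choices made here.

```python
import math

def shift(f):
    """(T f)(i) = f(i + 1 mod N): composition with the cyclic shift on Z/NZ."""
    return f[1:] + f[:1]

def power(f, n):
    """T^n applied to f."""
    for _ in range(n):
        f = shift(f)
    return f

def average(f, n):
    """A_n(f) = (1/n) * sum_{k=0}^{n-1} T^k f."""
    total = [0.0] * len(f)
    current = list(f)
    for _ in range(n):
        total = [t + c for t, c in zip(total, current)]
        current = shift(current)
    return [t / n for t in total]

def norm(f):
    return math.sqrt(sum(v * v for v in f))

# Telescoping identity: A_n((1 - T) g) = (g - T^n g) / n.
g = [1.0, 4.0, -2.0, 0.5, 3.0]
f1 = [a - b for a, b in zip(g, shift(g))]
n = 7
lhs = average(f1, n)
rhs = [(a - b) / n for a, b in zip(g, power(g, n))]

# Mean convergence: A_n(f) approaches the projection of f onto the constants.
f = [2.0, -1.0, 0.0, 5.0, 1.5]
mean = sum(f) / len(f)
big_n = 10_001
error = norm([a - mean for a in average(f, big_n)])
print(lhs, rhs, error)
```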
\[ \mu(\{x : f^*(x) > \alpha\}) \le \frac{A}{\alpha} \|f\|_{L^1(X,\mu)} \quad \text{for all } \alpha > 0. \tag{24} \]
There are several proofs of this theorem. The one we choose emphasizes
the close connection to the maximal function given in Section 1.1 of
Chapter 3, and we shall in fact deduce the present theorem from the
one-dimensional case of that chapter. This argument gives the value
A = 6 for the constant in (24). By a different argument one can obtain
A = 1, but this improvement is not relevant in what follows.
Before beginning the proof, we make some preliminary remarks. Note
that in the present case the function f ∗ is automatically measurable,
since it is the supremum of a countable number of measurable functions.
Also, we may assume that our function f is non-negative, since otherwise
we may replace it by |f |.
[Figure 2. Extension of f to R. Left panel: f(n) on Z; right panel: f̃(x) on R.]
Similarly, if E ⊂ Z, denote by Ẽ the set in R given by $\tilde{E} = \bigcup_{n\in E} [n, n+1)$.
Note that as a result of these definitions we have m(Ẽ) = #(E) and
$\int_{\mathbb{R}} \tilde{f}(x) \, dx = \sum_{n\in\mathbb{Z}} f(n)$, and thus $\|\tilde{f}\|_{L^1(\mathbb{R})} = \|f\|_{L^1(\mathbb{Z})}$. Here m is the
Lebesgue measure on R, and # is the counting measure on Z. Note also
that
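\[ \sum_{k=0}^{m-1} f(n + k) = \int_0^m \tilde{f}(n + t) \, dt. \]
However, because $\int_0^m \tilde{f}(n + t) \, dt \le \int_{-1}^m \tilde{f}(x + t) \, dt$ whenever x ∈ [n, n + 1),
we see that
\[ \frac{1}{m} \sum_{k=0}^{m-1} f(n + k) \le \Big( \frac{m+1}{m} \Big) \frac{1}{m+1} \int_{-1}^{m} \tilde{f}(x + t) \, dt \quad \text{if } x \in [n, n+1). \]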
Taking the supremum over all m ≥ 1 in the above and noting that (m +
1)/m ≤ 2, we obtain
To be clear about the notation here: f ∗ (n) denotes the maximal function
of f on Z defined by (23), with f (τ k (n)) = f (n + k), while (f˜)∗ is the
maximal function as defined in Chapter 3, of the extended function f˜
on R.
By (25)
\[ \#(\{n : f^*(n) > \alpha\}) \le \frac{6}{\alpha} \|f\|_{L^1(\mathbb{Z})}, \tag{26} \]
since $\|\tilde{f}\|_{L^1(\mathbb{R})} = \|f\|_{L^1(\mathbb{Z})}$. This disposes of the special case when X = Z.
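The estimate (26) can be tried out on a finitely supported function. The sketch below is an illustration under assumptions made here: f is non-negative and supported on {0, ..., L − 1}, stored as a Python list, and the helper names are hypothetical. For such f the supremum over m only needs the values m ≤ L − n, since longer windows add zeros, and f*(n) ≤ ‖f‖/(1 − n) for n < 0, which bounds the search range.

```python
def maximal_function(f, n):
    """f*(n) = sup over m >= 1 of (1/m) * sum_{k=0}^{m-1} f(n + k),
    for a non-negative f supported on range(len(f))."""
    L = len(f)
    if n >= L:
        return 0.0
    best, running = 0.0, 0.0
    for m in range(1, L - n + 1):   # longer windows only add zeros
        idx = n + m - 1
        running += f[idx] if 0 <= idx < L else 0.0
        best = max(best, running / m)
    return best

def count_above(f, alpha):
    """#{n in Z : f*(n) > alpha}."""
    norm1 = sum(f)
    lowest = -int(norm1 / alpha) - 1  # below this index f*(n) <= norm1 / (1 - n) < alpha
    return sum(1 for n in range(lowest, len(f)) if maximal_function(f, n) > alpha)

f = [0.0, 3.0, 1.0, 0.0, 0.0, 4.0, 2.0]   # hypothetical sample data
norm1 = sum(f)
for alpha in (0.5, 1.0, 2.0, 3.0):
    print(alpha, count_above(f, alpha), 6.0 * norm1 / alpha)
```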
Step 2. The general case.
By a sleight-of-hand we shall “transfer” the result for Z just proved to
the general case. We proceed as follows.
For every positive integer N, we consider the truncated maximal function
$f_N^*$ defined as
\[ f_N^*(x) = \sup_{1\le m\le N} \frac{1}{m} \sum_{k=0}^{m-1} f(\tau^k(x)). \]
Since $\{f_N^*(x)\}$ forms an increasing sequence with N, and $\lim_{N\to\infty} f_N^*(x) = f^*(x)$
for every x, it suffices to show that
\[ \mu\{x : f_N^*(x) > \alpha\} \le \frac{A}{\alpha} \|f\|_{L^1(X,\mu)}, \tag{27} \]
with constant A independent of N . Letting N → ∞ will then give the
desired result.
So in place of f∗ we estimate $f_N^*$, and to simplify our notation we write
the latter as f∗, dropping the N subscript. Our argument will compare
the maximal function f∗ with the special case arising for Z. To clarify
the formula below we temporarily adopt the expedient of denoting the
second maximal function by M(f). Thus for a positive function f on Z
we set
\[ M(f)(n) = \sup_{1\le m} \frac{1}{m} \sum_{k=0}^{m-1} f(n + k). \]
Then
\[ A_m(f)(x) = \frac{1}{m} \sum_{k=0}^{m-1} f(\tau^k(x)) = \frac{1}{m} \sum_{k=0}^{m-1} F(x, k). \]
Thus
Because of the maximal estimate (26) for Z, we see that the integrand
above is no more than
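\[ \frac{A}{\alpha} \|F_b(x, n)\|_{L^1(\mathbb{Z})} = \frac{A}{\alpha} \sum_{n=0}^{b-1} f(\tau^n(x)), \]
with of course A = 6.
Hence, integrating this over X and recalling that $\int_X f(\tau^n(x)) \, d\mu = \int_X f(x) \, d\mu$
gives us
\[ a\,\mu(E_\alpha) \le \frac{A}{\alpha}\, b\, \|f\|_{L^1(X)} = \frac{A}{\alpha} (a + N) \|f\|_{L^1(X)}. \]
Thus $\mu(E_\alpha) \le \frac{A}{\alpha} \big( 1 + \frac{N}{a} \big) \|f\|_{L^1(X)}$, and letting a → ∞ yields estimate (27).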
which is finite.
As a result, Am (F )(x) converges for almost every x ∈ X. Finally,
to prove the corresponding convergence for Am (f )(x), we argue as in
Theorem 1.3 in Chapter 3 and set
Then it suffices to see that µ(Eα ) = 0 for all α > 0. However, since
An (f ) − Am (f ) = An (F ) − Am (F ) + An (H) − Am (H), and Am (F )(x) con-
verges almost everywhere as m → ∞, it follows that almost every point
and thus µ(Eα) ≤ µ(E′α) ≤ µ({x : 2 sup_m |Am(H)(x)| > α}). The last
quantity is majorized by $\frac{A}{\alpha/2} \|H\|_{L^1} \le 2\varepsilon A/\alpha$ by Theorem 5.3. Since
ε was arbitrary we see that µ(Eα) = 0, and hence Am(f)(x) is a Cauchy
sequence for almost every x, and the theorem is proved.
To establish the corollary, observe that if f ∈ L²(X), we know by
Theorem 5.1 that Am(f) converges to P(f) in the L²-norm, and hence
a subsequence converges almost everywhere to that limit, showing that
P(f) = P′(f) in that case. Next, for any f that is merely integrable, we
have
\[ \int_X |A_m(f)| \, d\mu(x) \le \frac{1}{m} \sum_{k=0}^{m-1} \int_X |f(\tau^k(x))| \, d\mu(x) = \int_X |f(x)| \, d\mu(x), \]
and thus since Am(f) → P′(f) almost everywhere, we get by Fatou’s
lemma that $\int_X |P'(f)(x)| \, d\mu(x) \le \int_X |f(x)| \, d\mu(x)$. With this the corollary
is also proved.
It can be shown that the conclusions of the theorem and corollary are
still valid if we drop the assumption that the space X has finite measure.
The modifications of the argument needed to obtain this more general
conclusion are outlined in Exercise 26.
to a constant, then both $\mu(E_a)$ and $\mu(E_a^c)$ must have strictly positive
measure for some a. In the converse direction we merely need to note
that if all characteristic functions of measurable sets that are invariant
must be constants, then τ is ergodic.
The following result subsumes the conclusion of Theorem 5.4 for er-
godic transformations. We keep to the assumption of that theorem that
the underlying space X has measure equal to 1.
\[ \frac{1}{m} \sum_{k=0}^{m-1} f(\tau^k(x)) \ \text{ converges to } \int_X f \, d\mu \ \text{ for a.e. } x \in X \text{ as } m \to \infty. \]
The result has the interpretation that the “time average” of f equals its
“space average.”
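The equality of time and space averages can be watched numerically for a rotation of the circle by an irrational angle, which is ergodic. The sketch below is only an illustration: the rotation number √2 − 1, the function x², the starting point 0.1 and the number of iterates are choices made here, floating-point arithmetic only approximates the orbit, and a single starting point proves nothing about almost every x.

```python
import math

ALPHA = math.sqrt(2.0) - 1.0  # hypothetical irrational rotation number

def time_average(f, x0, m, alpha=ALPHA):
    """(1/m) * sum_{k=0}^{m-1} f(tau^k(x0)) for the rotation tau(x) = x + alpha mod 1."""
    return sum(f((x0 + k * alpha) % 1.0) for k in range(m)) / m

def space_average(f, n=100_000):
    """Midpoint-rule approximation of the integral of f over the unit interval."""
    h = 1.0 / n
    return sum(f((i + 0.5) * h) for i in range(n)) * h

def f(x):
    return x * x

time_avg = time_average(f, x0=0.1, m=100_000)
space_avg = space_average(f)   # the exact value is 1/3
print(time_avg, space_avg)
```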
Proof. By Theorem 5.1 we know that the averages Am(f) converge
to P(f), whenever f ∈ L², where P is the orthogonal projection on the
subspace of invariant vectors. Since in this case the invariant vectors
form a one-dimensional space spanned by the constant functions, we
observe that $P(f) = 1(f, 1) = \int_X f \, d\mu$, where 1 designates the function
identically equal to 1 on X. To verify this, note that P is the identity on
constants and annihilates all functions orthogonal to constants. Next we
write any f ∈ L¹ as g + h, where g ∈ L² and $\|h\|_{L^1} < \varepsilon$. Then P′(f) =
P′(g) + P′(h). However, we also know that P′(g) = P(g), and $\|P'(h)\| \le \|h\|_{L^1} < \varepsilon$
by the corollary to Theorem 5.4. Thus
\[ P'(f) - \int_X f \, d\mu = \int_X (g - f) \, d\mu + P'(h) \]
yields that $\|P'(f) - \int_X f \, d\mu\|_{L^1} \le \|g - f\|_{L^1} + \varepsilon < 2\varepsilon$. This shows that
P′(f) is the constant $\int_X f \, d\mu$ and the assertion is proved.
We shall now elaborate on the nature of ergodicity and illustrate its
thrust in terms of several examples.
both m and k are 0, in which case the integral equals 1. Thus (31)
holds for all exponentials $f(x) = e^{2\pi i m x}$, $g(x) = e^{2\pi i k x}$, and therefore by
linearity for all trigonometric polynomials f and g. It is from there an
easy step to use the completeness in Chapter 4 to pass to all f and g in
L²((0, 1]) by approximating these functions in the L²-norm by trigonometric
polynomials.
Let us observe that the action of rotations τ : x ↦ x + α of the unit
circle for irrational α, although ergodic, is not mixing. Indeed, if we take
$f(x) = g(x) = e^{2\pi i m x}$, m ≠ 0, then $(T^n f, g) = e^{2\pi i n m \alpha}(f, g) = e^{2\pi i n m \alpha}$,
while (f, 1) = (1, g) = 0; thus $(T^n f, g)$ does not converge to (f, 1)(1, g)
as n → ∞.
Finally, we note that the doubling map τ : x ↦ 2x mod 1 on (0, 1]
is not uniquely ergodic. Besides the Lebesgue measure, the measure ν
with ν{1} = 1 but ν(E) = 0 if 1 ∉ E is also preserved by τ.
Further examples of ergodic transformations are given below.
Observe that property (i) is equivalent with each of the following three assertions
(holding for all pairs λ, µ with µ > λ): (a) the range of E(µ) contains the range of
E(λ); (b) E(µ)E(λ) = E(λ); (c) E(µ) − E(λ) is an orthogonal projection.
Now given a spectral resolution {E(λ)} and an element f ∈ H, note that the
function λ ↦ (E(λ)f, f) = ‖E(λ)f‖² is also increasing. As a result, the polarization
identity (see Section 5 in Chapter 4) shows that for every pair f, g ∈ H,
6*. Appendix: the spectral theorem 307
The last assertion means that if for some operator A we have AT = T A, then
AS = SA.
The existence of S is seen as follows. After multiplying by a suitable positive
scalar, we may assume that ‖T‖ ≤ 1. Consider the binomial expansion of $(1 - t)^{1/2}$,
given by $(1 - t)^{1/2} = \sum_{k=0}^{\infty} b_k t^k$, for |t| < 1. The relevant fact that is needed
here is that the $b_k$ are real and $\sum_{k=0}^{\infty} |b_k| < \infty$. Indeed, by direct calculation of
the power series expansion of $(1 - t)^{1/2}$ we find that $b_0 = 1$, $b_1 = -1/2$, $b_2 = -1/8$,
and more generally, $b_k = -\frac{1}{2} \cdot \frac{1}{2} \cdots (k - 3/2)/k!$, if k ≥ 2, from which it follows
that $b_k = O(k^{-3/2})$. Or more simply, since $b_k < 0$ when k ≥ 1, if we let t → 1 in
the definition, we see that $-\sum_{k=1}^{\infty} b_k = 1$ and so $\sum_{k=0}^{\infty} |b_k| = 2$.
Now let $s_n(t)$ denote the polynomial $\sum_{k=0}^{n} b_k t^k$. Then the polynomial
\[ s_n^2(t) - (1 - t) = \sum_{k=0}^{2n} c_k^n t^k \tag{34} \]
has the property that $\sum_{k=0}^{2n} |c_k^n| \to 0$ as n → ∞. In fact, $s_n(t) = (1 - t)^{1/2} - r_n(t)$,
with $r_n(t) = \sum_{k=n+1}^{\infty} b_k t^k$, so $s_n^2(t) - (1 - t) = -r_n^2(t) - 2 s_n(t) r_n(t)$. Now the
left-hand side is clearly a polynomial of degree ≤ 2n, and so comparing coefficients with
those on the right-hand side shows that the $c_k^n$ are majorized by $3 \sum_{j>n} |b_j| \, |b_{k-j}|$.
From this it is immediate that $\sum_k |c_k^n| = O(\sum_{j>n} |b_j|) \to 0$ as n → ∞, as asserted.
To apply this, set $T_1 = I - T$; then $0 \le T_1 \le I$, and thus $\|T_1\| \le 1$, by Proposition 6.2.
Let $S_n = s_n(T_1) = \sum_{k=0}^{n} b_k T_1^k$, with $T_1^0 = I$. Then in terms of operator
norms, $\|S_n - S_m\| \le \sum_{k \ge \min(n,m)} |b_k| \to 0$ as n, m → ∞, because $\|T_1^k\| \le \|T_1\|^k \le 1$.
Hence $S_n$ converges to some operator S. Clearly $S_n$ is symmetric for each n,
and thus S is also symmetric. Moreover, by (34), $S_n^2 - T = \sum_{k=0}^{2n} c_k^n T_1^k$, therefore
$\|S_n^2 - T\| \le \sum |c_k^n| \to 0$ as n → ∞, which implies that $S^2 = T$. Finally, if A commutes
with T it clearly commutes with every polynomial in T, hence with $S_n$, and
thus with S. The proof of the proposition is therefore complete.
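The construction can be followed step by step for a small matrix. The sketch below is an illustration under assumptions made here: the 2 × 2 symmetric matrix T (with eigenvalues in (0, 1)) is a hypothetical example, the recursion b_k = b_{k−1}(k − 3/2)/k is derived from the closed formula for the coefficients, and the number of terms is an arbitrary cutoff.

```python
def binomial_coefficients(n):
    """Return [b_0, ..., b_n], the Taylor coefficients of (1 - t)^(1/2) at t = 0,
    using b_0 = 1 and b_k = b_{k-1} * (k - 3/2) / k."""
    coeffs = [1.0]
    for k in range(1, n + 1):
        coeffs.append(coeffs[-1] * (k - 1.5) / k)
    return coeffs

def matmul(A, B):
    """Product of two 2 x 2 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def sqrt_by_series(T, n=300):
    """S_n = sum_{k=0}^{n} b_k * T1^k with T1 = I - T, as in the proof."""
    identity = [[1.0, 0.0], [0.0, 1.0]]
    T1 = [[identity[i][j] - T[i][j] for j in range(2)] for i in range(2)]
    S = [[0.0, 0.0], [0.0, 0.0]]
    power = identity                      # T1^0 = I
    for b in binomial_coefficients(n):
        S = [[S[i][j] + b * power[i][j] for j in range(2)] for i in range(2)]
        power = matmul(power, T1)
    return S

# A hypothetical positive symmetric matrix with norm at most 1.
T = [[0.5, 0.2], [0.2, 0.3]]
S = sqrt_by_series(T)
S_squared = matmul(S, S)
residual = max(abs(S_squared[i][j] - T[i][j]) for i in range(2) for j in range(2))

# Partial sums of |b_k| increase slowly towards 2.
abs_sum = sum(abs(b) for b in binomial_coefficients(10_000))
print(S, residual, abs_sum)
```

Since S is a limit of polynomials in T, it commutes with anything that commutes with T, which is the last assertion of the proposition.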
Proposition 6.5 Suppose T is symmetric and a and b are given by (33). If
$p(t) = \sum_{k=0}^{n} c_k t^k$ is a real polynomial which is positive for t ∈ [a, b], then the operator
$p(T) = \sum_{k=0}^{n} c_k T^k$ is positive.
To see this, write $p(t) = c \prod_j (t - \rho_j) \prod_k (\rho_k' - t) \prod_\ell ((t - \mu_\ell)^2 + \nu_\ell^2)$, where c is positive
and the third factor corresponds to the non-real roots of p(t) (arising in conjugate
pairs), and the real roots of p(t) lying in (a, b) which are necessarily of
even order. The first factor contains the real roots $\rho_j$ with $\rho_j \le a$, and the second
factor the real roots $\rho_k'$ with $\rho_k' \ge b$. Since each of the factors $T - \rho_j I$, $\rho_k' I - T$
and $(T - \mu_\ell I)^2 + \nu_\ell^2 I$ is positive and these commute, the desired conclusion follows
from the previous proposition.
Proof. We note that for each fixed f ∈ H the sequence of positive numbers
(Tn f, f ) is decreasing and hence convergent. Now observe that for any positive
operator S with kSk ≤ M we have
In fact, the quadratic function $(S(tI + S)f, (tI + S)f) = t^2(Sf, f) + 2t(Sf, Sf) + (S^2 f, Sf)$
is positive for all real t. Hence its discriminant is negative, that is,
$\|S(f)\|^4 \le (Sf, f)(S^2 f, Sf)$, and (35) follows. We apply this to $S = T_n - T_m$ with
n ≤ m; then $\|T_n - T_m\| \le \|T_n\| \le \|T_1\| = M$, and since $((T_n - T_m)f, f) \to 0$ as
n, m → ∞ we see that $\|T_n f - T_m f\| \to 0$ as n, m → ∞. Thus $\lim_{n\to\infty} T_n(f) = T(f)$
exists, and T is also clearly positive.
The basic functions Φ, $\Phi = \varphi^\lambda$, that give us the spectral resolution are defined
for each real λ by
We note that $\varphi^\lambda(t) = \lim \varphi_n^\lambda(t)$, where $\varphi_n^\lambda(t) = 1$ if t ≤ λ, $\varphi_n^\lambda(t) = 0$ if t ≥ λ + 1/n,
and $\varphi_n^\lambda(t)$ is linear for t ∈ [λ, λ + 1/n]. Thus each $\varphi^\lambda(t)$ is a limit of a decreasing
sequence of continuous functions. In accordance with the above we set
\[ E(\lambda) = \varphi^\lambda(T). \]
Since $\lim_{n\to\infty} \varphi_n^{\lambda_1}(t)\varphi_n^{\lambda_2}(t) = \varphi^{\lambda_1}(t)$ whenever $\lambda_1 \le \lambda_2$, we see that
$E(\lambda_1)E(\lambda_2) = E(\lambda_1)$. Thus $E(\lambda)^2 = E(\lambda)$ for every λ, and because E(λ) is symmetric it is
therefore an orthogonal projection. Moreover, for every f ∈ H
thus E(λ) is increasing. Clearly E(λ) = 0 if λ < a, since for those λ, $\varphi^\lambda(t) = 0$ on
[a, b]. Similarly, E(λ) = I for λ ≥ b.
Next we note that E(λ) is right-continuous. In fact, fix f ∈ H and ε > 0. Then
for some n, which we now keep fixed, $\|E(\lambda)f - \varphi_n^\lambda(T)f\| < \varepsilon$. However, $\varphi_n^\mu(t)$
converges to $\varphi_n^\lambda(t)$ uniformly in t as µ → λ. Hence $\sup_t |\varphi_n^\mu(t) - \varphi_n^\lambda(t)| < \varepsilon$, if
|µ − λ| < δ, for an appropriate δ. Thus by the corollary $\|\varphi_n^\mu(T) - \varphi_n^\lambda(T)\| < \varepsilon$
and therefore $\|E(\lambda)f - \varphi_n^\mu(T)f\| < 2\varepsilon$. Now with µ ≥ λ we have that E(µ)E(λ) =
E(λ) and $E(\mu)\varphi_n^\mu(T) = E(\mu)$. As a result $\|E(\lambda)f - E(\mu)f\| < 2\varepsilon$, if λ ≤ µ ≤ λ + δ.
Since ε was arbitrary, the right continuity is established.
Finally we verify the spectral representation (32). Let a = λ0 < λ1 < · · · < λk =
b be any partition of [a, b] for which supj (λj − λj−1 ) < δ. Then since
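\[ t = \sum_{j=1}^{k} t \big( \varphi^{\lambda_j}(t) - \varphi^{\lambda_{j-1}}(t) \big) + t \varphi^{\lambda_0}(t) \]
we note that
\[ t \le \sum_{j=1}^{k} \lambda_j \big( \varphi^{\lambda_j}(t) - \varphi^{\lambda_{j-1}}(t) \big) + \lambda_0 \varphi^{\lambda_0}(t) \le t + \delta. \]
\[ T \le \sum_{j=1}^{k} \lambda_j \big( E(\lambda_j) - E(\lambda_{j-1}) \big) + \lambda_0 E(\lambda_0) \le T + \delta I, \]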
and thus T differs in norm from the sum above by at most δ. As a result
\[ \Big| (Tf, f) - \sum_{j=1}^{k} \lambda_j \int_{(\lambda_{j-1}, \lambda_j]} d(E(\lambda)f, f) - \lambda_0 (E(\lambda_0)f, f) \Big| \le \delta \|f\|^2. \]
But as we vary the partitions of [a, b], letting their meshes δ tend to zero, the
above sum tends to $\int_{a^-}^{b} \lambda \, d(E(\lambda)f, f)$. Therefore $(Tf, f) = \int_{a^-}^{b} \lambda \, d(E(\lambda)f, f)$, and
the polarization identity gives (32).
A similar argument shows that if Φ is continuous on [a, b], then the operator
Φ(T ) has an analogous spectral representation
\[ (\Phi(T)f, g) = \int_{a^-}^{b} \Phi(\lambda) \, d(E(\lambda)f, g). \tag{36} \]
This is because $|\Phi(t) - \sum_{j=1}^{k} \Phi(\lambda_j)(\varphi^{\lambda_j}(t) - \varphi^{\lambda_{j-1}}(t)) - \Phi(\lambda_0)\varphi^{\lambda_0}(t)| < \delta'$, where
$\delta' = \sup_{|t - t'| \le \delta} |\Phi(t) - \Phi(t')|$, which tends to zero as δ → 0.
This representation also extends to continuous Φ that are complex-valued (by
considering the real and imaginary parts separately) or to Φ that are pointwise limits of
decreasing sequences of continuous functions.
6.4 Spectrum
We say that a bounded operator S on H is invertible if S is a bijection of H
and its inverse, S⁻¹, is also bounded. Note that S⁻¹ satisfies S⁻¹S = SS⁻¹ = I.
The spectrum of S, denoted by σ(S), is the set of complex numbers z for which
S − zI is not invertible.
Proposition 6.8 If T is symmetric, then σ(T ) is a closed subset of the interval
[a, b] given by (33).
Note that if z ∉ [a, b], the function $\Phi(t) = (t - z)^{-1}$ is continuous on [a, b] and
Φ(T)(T − zI) = (T − zI)Φ(T) = I, so Φ(T) is the inverse of T − zI. Now suppose
$T_0 = T - \lambda_0 I$ is invertible. Then we claim that $T_0 - \varepsilon I$ is invertible for all (complex)
ε that are sufficiently small. This will prove that the complement of σ(T) is
open. Indeed, $T_0 - \varepsilon I = T_0(I - \varepsilon T_0^{-1})$, and we can invert the operator $(I - \varepsilon T_0^{-1})$
(formally) by a geometric series, which gives for the inverse of $T_0 - \varepsilon I$ the sum
\[ \sum_{n=0}^{\infty} \varepsilon^n (T_0^{-1})^{n+1}. \]
Since $\sum_{n=0}^{\infty} \|\varepsilon^n (T_0^{-1})^{n+1}\| \le \sum |\varepsilon|^n \|T_0^{-1}\|^{n+1}$, the series converges when $|\varepsilon| < \|T_0^{-1}\|^{-1}$,
and the sum is majorized by
\[ \|T_0^{-1}\| \, \frac{1}{1 - |\varepsilon| \|T_0^{-1}\|}. \tag{37} \]
Thus we can define the operator $(T_0 - \varepsilon I)^{-1}$ as $\lim_{N\to\infty} \sum_{n=0}^{N} \varepsilon^n (T_0^{-1})^{n+1}$,
and it gives the desired inverse, as is easily verified.
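The series for the inverse can be checked directly on a small matrix. The sketch below is an illustration under assumptions made here: T0 is a hypothetical 2 × 2 symmetric matrix whose inverse is written down by hand, ε = 0.3 + 0.4i is a sample complex number with |ε| below ‖T0⁻¹‖⁻¹, and the number of terms is an arbitrary cutoff.

```python
def matmul(A, B):
    """Product of two 2 x 2 matrices stored as nested lists (entries may be complex)."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

T0 = [[2.0, 1.0], [1.0, 3.0]]           # hypothetical invertible symmetric matrix
T0_INV = [[0.6, -0.2], [-0.2, 0.4]]     # its inverse, since det(T0) = 5

def neumann_inverse(eps, terms=80):
    """Partial sum of sum_{n >= 0} eps^n * (T0^{-1})^(n+1)."""
    total = [[0.0, 0.0], [0.0, 0.0]]
    power = T0_INV                      # (T0^{-1})^(n+1) with n = 0
    coeff = 1.0                         # eps^n with n = 0
    for _ in range(terms):
        total = [[total[i][j] + coeff * power[i][j] for j in range(2)]
                 for i in range(2)]
        power = matmul(power, T0_INV)
        coeff *= eps
    return total

eps = 0.3 + 0.4j                        # |eps| = 0.5, small enough for convergence here
candidate = neumann_inverse(eps)
shifted = [[T0[i][j] - (eps if i == j else 0.0) for j in range(2)] for i in range(2)]
product = matmul(shifted, candidate)    # should be close to the identity matrix
defect = max(abs(product[i][j] - (1.0 if i == j else 0.0))
             for i in range(2) for j in range(2))
print(product, defect)
```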
Our last assertion connects the spectrum σ(T ) with the spectral resolution
{E(λ)}.
To put it another way, F (λ) is constant on each open interval of the complement
of σ(T ).
To prove this, let J be one of the open intervals in the complement of σ(T),
x0 ∈ J, and J0 the sub-interval centered at x0 of length 2ε, with $\varepsilon < \|(T - x_0 I)^{-1}\|$.
First note that if z has non-vanishing imaginary part then $(T - zI)^{-1}$ is given by
$\Phi_z(T)$, with $\Phi_z(t) = (t - z)^{-1}$. Hence $(T - zI)^{-1}(T - \bar{z}I)^{-1}$ is given by $\Psi_z(T)$,
with $\Psi_z(t) = 1/|t - z|^2$. Therefore by the estimate given in (37) and the representation
(36) applied to $\Phi = \Psi_z$, we obtain
\[ \int \frac{dF(\lambda)}{|\lambda - z|^2} \le A', \]
as long as z is complex and |x0 − z| < ε. We can therefore obtain the same inequality
for x real, |x0 − x| < ε. Now integration in x ∈ J0 using the fact that
$\int_{J_\varepsilon} \frac{dx}{|\lambda - x|^2} = \infty$ for every $\lambda \in J_\varepsilon$, gives $\int_{J_\varepsilon} dF(\lambda) = 0$. Thus F(λ) is constant in $J_\varepsilon$,
but since x0 was an arbitrary point of J the function F(λ) is constant throughout
J and the proposition is proved.
7 Exercises
2. Let (X, M, µ) be a measure space. One can define the completion of this
space as follows. Let $\overline{\mathcal{M}}$ be the collection of sets of the form E ∪ Z, where E ∈ M,
and Z ⊂ F with F ∈ M and µ(F) = 0. Also, define $\overline{\mu}(E \cup Z) = \mu(E)$. Then:
4. Let r be a rotation of R^d. Using the fact that the mapping x ↦ r(x) preserves
Lebesgue measure (see Problem 4 in Chapter 2 and Exercise 26 in Chapter 3), show
that it induces a measure-preserving map of the sphere S^{d−1} with its measure dσ.
A converse is stated in Problem 4.
(c) If B is the unit ball, $v_d = m(B) = \pi^{d/2}/\Gamma(d/2 + 1)$, since this quantity
equals $\big( \int_0^1 r^{d-1} \, dr \big) \sigma(S^{d-1})$. (See Exercise 14 in Chapter 2.)
6. A version of Green’s formula for the unit ball B in Rd can be stated as follows.
Suppose u and v are a pair of functions that are in C²(B). Then one has
\[ \int_B (v \Delta u - u \Delta v) \, dx = \int_{S^{d-1}} \Big( v \frac{\partial u}{\partial n} - u \frac{\partial v}{\partial n} \Big) \, d\sigma. \]
Here S^{d−1} is the unit sphere with dσ the measure defined in Section 3.2, and
∂u/∂n, ∂v/∂n denote the directional derivatives of u and v (respectively) along
the inner normals to S^{d−1}.
Show that the above can be derived from Lemma 4.5 of the previous chapter by
taking $\eta = \eta_\varepsilon^+$ and letting ε → 0.
8. The fact that the Lebesgue measure is uniquely characterized by its translation
invariance can be made precise by the following assertion: If µ is a Borel measure
on Rd that is translation-invariant, and is finite on compact sets, then µ is a
multiple of Lebesgue measure m. Prove this theorem by proceeding as follows.
9. Let C([a, b]) denote the vector space of continuous functions on the closed and
bounded interval [a, b]. Suppose we are given a Borel measure µ on this interval,
with µ([a, b]) < ∞. Then
$$f \mapsto \ell(f) = \int_a^b f(x)\,d\mu(x)$$
is a linear functional on $C([a, b])$, with $\ell$ positive in the sense that
$\ell(f) \ge 0$ if $f \ge 0$.
Prove that, conversely, for any linear functional $\ell$ on $C([a, b])$ that is positive in
the above sense, there is a unique finite Borel measure $\mu$ so that
$\ell(f) = \int_a^b f\,d\mu$ for $f \in C([a, b])$.
[Hint: Suppose $a = 0$ and $u \ge 0$. Define $F(u)$ by $F(u) = \lim_{\epsilon \to 0} \ell(f_\epsilon)$, where
$$f_\epsilon(x) = \begin{cases} 1 & \text{for } 0 \le x \le u, \\ 0 & \text{for } u + \epsilon \le x, \end{cases}$$
and $f_\epsilon$ is linear between $u$ and $u + \epsilon$. (See Figure 3.) Then $F$ is
increasing and right-continuous, and $\ell(f)$ can be written as $\int_a^b f(x)\,dF(x)$
via Theorem 3.5.]
The result also holds if [a, b] is replaced by a closed infinite interval; we then
assume that ` is defined on the continuous functions of bounded support, and
obtain that the resulting µ is finite on all bounded intervals.
A generalization is given in Problem 5.
(d) $\nu \ll |\nu|$.
[Figure 3: graph of $f_\epsilon$, with height 1 and horizontal axis marks $0$, $u$, $u + \epsilon$, $b$.]
(i) $\mu_A$ is absolutely continuous with respect to Lebesgue measure and
$\mu_A(E) = \int_E F'(x)\,dx$ for every Lebesgue measurable set $E$.
(ii) As a result, if $F$ is absolutely continuous, then
$\int f\,d\mu = \int f\,dF = \int f(x)F'(x)\,dx$ whenever $f$ and $fF'$ are integrable.
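The identity in (ii) can be illustrated numerically. The sketch below compares a Riemann-Stieltjes sum for $\int f\,dF$ with a midpoint-rule approximation of $\int f F'\,dx$; the choices $F(x) = x^2$ and $f(x) = x$ on $[0, 1]$ are assumptions made purely for illustration, and both sides should be close to $2/3$.

```python
def stieltjes_sum(f, F, a, b, n=20000):
    """Approximate the integral of f dF by summing f(midpoint) * (increment of F)."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        x0, x1 = a + i * h, a + (i + 1) * h
        total += f(0.5 * (x0 + x1)) * (F(x1) - F(x0))
    return total

def midpoint_integral(g, a, b, n=20000):
    """Approximate the ordinary integral of g over [a, b] by the midpoint rule."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

# Illustrative choices (not from the text): F(x) = x^2, so F'(x) = 2x, and f(x) = x.
f = lambda x: x
F = lambda x: x * x
F_prime = lambda x: 2 * x

lhs = stieltjes_sum(f, F, 0.0, 1.0)
rhs = midpoint_integral(lambda x: f(x) * F_prime(x), 0.0, 1.0)
print(lhs, rhs)
```

The agreement relies on $F$ being absolutely continuous; for a function such as the Cantor-Lebesgue function the right-hand side would vanish while the left-hand side need not.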
Here $r_j$ and $r'_k$ range over all positive rationals, and $\{\gamma_\ell\}$ is a
countable dense set of $S^{d-1}$.]
13. Let $m_j$ be the Lebesgue measure for the space $\mathbb{R}^{d_j}$, $j = 1, 2$. Consider
the product $\mathbb{R}^d = \mathbb{R}^{d_1} \times \mathbb{R}^{d_2}$ ($d = d_1 + d_2$), with
$m$ the Lebesgue measure on $\mathbb{R}^d$. Show that $m$ is the completion (in the sense of
Exercise 2) of the product measure $m_1 \times m_2$.
15. The product theory extends to infinitely many factors, under the requisite
assumptions. We consider measure spaces (Xj , Mj , µj ) with µj (Xj ) = 1 for all
but finitely many j. Define a cylinder set E as
(a) Check that the completion µ is Lebesgue measure induced on the cube
Q = {x : 0 < xj ≤ 1, j = 1, . . . , d}.
(b) For each function f on Q let f˜ be its extension to Rd which is periodic, that
is, f˜(x + z) = f˜(x) for every z ∈ Zd . Then f is measurable on Td if and
only if f˜ is measurable on Rd , and f is continuous on Td if and only if f˜ is
continuous on Rd .
(c) Suppose $f$ and $g$ are integrable on $\mathbb{T}^d$. Show that the integral defining
$(f * g)(x) = \int_{\mathbb{T}^d} f(x - y) g(y)\,dy$ is finite for a.e. $x$, that $f * g$ is
integrable over $\mathbb{T}^d$, and that $f * g = g * f$.
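to mean that $a_n = \int_{\mathbb{T}^d} f(x) e^{-2\pi i n \cdot x}\,dx$. Prove that if $g$ is
also integrable, and $g \sim \sum_{n \in \mathbb{Z}^d} b_n e^{2\pi i n \cdot x}$, then
$$f * g \sim \sum_{n \in \mathbb{Z}^d} a_n b_n e^{2\pi i n \cdot x}.$$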
(e) Verify that $\{e^{2\pi i n \cdot x}\}_{n \in \mathbb{Z}^d}$ is an orthonormal basis for
$L^2(\mathbb{T}^d)$. As a result $\|f\|^2_{L^2(\mathbb{T}^d)} = \sum_{n \in \mathbb{Z}^d} |a_n|^2$.
[Hint: For (e), reduce to the case $d = 1$ by Fubini's theorem. To prove (f) let
$g(x) = g_\epsilon(x) = \epsilon^{-d}$, if $0 < x_j \le \epsilon$, $j = 1, \ldots, d$, and
$g_\epsilon(x) = 0$ elsewhere in $Q$. Then $(f * g_\epsilon)(x) \to f(x)$ uniformly as
$\epsilon \to 0$. However $(f * g_\epsilon)(x) = \sum a_n b_n e^{2\pi i n \cdot x}$ with
$b_n = \int_{\mathbb{T}^d} g_\epsilon(x) e^{-2\pi i n \cdot x}\,dx$, and $\sum |a_n b_n| < \infty$.]
21. Let $\mathbb{T}^d$ be the torus, and $\tau : x \mapsto x + \alpha$ the mapping arising in
Exercise 17. Then $\tau$ is ergodic if and only if $\alpha = (\alpha_1, \ldots, \alpha_d)$
with $\alpha_1, \alpha_2, \ldots, \alpha_d$, and $1$ are linearly independent over the
rationals. To do this show that:
(a) $\frac{1}{m} \sum_{k=0}^{m-1} f(\tau^k(x)) \to \int_{\mathbb{T}^d} f(x)\,dx$ as
$m \to \infty$, for each $x \in \mathbb{T}^d$, whenever $f$ is continuous and periodic and
$\alpha$ satisfies the hypothesis.
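A numerical experiment makes statement (a) concrete in the simplest case $d = 1$. The sketch below is only illustrative: the choices $\alpha = \sqrt{2}$, $f(x) = \cos^2(2\pi x)$ (whose integral over the circle is $1/2$), and the starting point are assumptions, not part of the exercise. A rational $\alpha$ is included to show how the conclusion can fail when the independence hypothesis is dropped.

```python
import math

def birkhoff_average(f, x, alpha, m):
    """Average of f over the first m points of the orbit x, x + alpha, x + 2*alpha, ... (mod 1)."""
    return sum(f((x + k * alpha) % 1.0) for k in range(m)) / m

f = lambda x: math.cos(2 * math.pi * x) ** 2  # integral over [0, 1) equals 1/2

# alpha irrational: 1 and alpha are linearly independent over the rationals.
avg_irrational = birkhoff_average(f, 0.1, math.sqrt(2), 100_000)

# alpha rational: the orbit of 0 is {0, 1/2}, where f is identically 1.
avg_rational = birkhoff_average(f, 0.0, 0.5, 100_000)

print(avg_irrational, avg_rational)
```

The first average is close to $1/2$, while the second stays at $1$, far from the space average.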
22. Let $X = \prod_{i=1}^\infty X_i$, where each $(X_i, \mu_i)$ is identical to $(X_1, \mu_1)$,
with $\mu_1(X_1) = 1$, and let $\mu$ be the corresponding product measure defined in
Exercise 15. Define the shift $\tau : X \to X$ by $\tau((x_1, x_2, \ldots)) = (x_2, x_3, \ldots)$
for $x = (x_i) \in \prod_{i=1}^\infty X_i$.
[Hint: For (b) note that $\mu(\tau^{-n}(E) \cap F) = \mu(E)\mu(F)$ whenever $E$ and $F$ are
cylinder sets and $n$ is sufficiently large. For (c) note that, for example, if we fix a
point $x \in X_1$, the set $E = \{(x_i) : x_j = x \text{ all } j\}$ is invariant.]
23. Let $X = \prod_{i=1}^\infty Z(2)$, where each factor is the two-point space
$Z(2) = \{0, 1\}$ with $\mu_1(0) = \mu_1(1) = 1/2$, and suppose $\mu$ denotes the product
measure on $X$. Consider the mapping $D : X \to [0, 1]$ given by
$D(\{a_j\}) = \sum_{j=1}^\infty \frac{a_j}{2^j}$. Then there are denumerable sets
$Z_1 \subset X$ and $Z_2 \subset [0, 1]$, such that:
(a) D is a bijection from X − Z1 to [0, 1] − Z2 .
(b) A set E in X is measurable if and only if D(E) is measurable in [0, 1], and
µ(E) = m(D(E)), where m is Lebesgue measure on [0, 1].
(c) The shift map on $\prod_{i=1}^\infty Z(2)$ then becomes the doubling map of example (b)
in Section 5.4.
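Part (c) can be checked exactly on sequences with finitely many non-zero digits. The sketch below uses exact rational arithmetic; the function names are hypothetical, and a finite list of bits stands for the infinite sequence obtained by appending zeros.

```python
from fractions import Fraction

def D(bits):
    """D({a_j}) = sum over j >= 1 of a_j / 2^j, for a finite list of binary digits."""
    return sum(Fraction(a, 2 ** j) for j, a in enumerate(bits, start=1))

def shift(bits):
    """The shift (a_1, a_2, a_3, ...) -> (a_2, a_3, ...)."""
    return bits[1:]

def doubling(x):
    """The doubling map x -> 2x mod 1."""
    return (2 * x) % 1

bits = [1, 0, 1, 1, 0, 1]
print(D(bits), D(shift(bits)), doubling(D(bits)))
```

Here $D$ sends the sequence to $45/64$, and both $D(\text{shift})$ and the doubling map give $13/32$. The exceptional sets $Z_1$, $Z_2$ arise because a dyadic rational has two binary expansions, which a finite-list model cannot exhibit.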
24. Consider the following generalization of the doubling map. For each integer
$m$, $m \ge 2$, we define the map $\tau_m$ of $(0, 1]$ by $\tau_m(x) = mx \bmod 1$.
$$x = \sum_{j=1}^\infty \frac{a_j}{m^j}, \quad \text{where each } a_j \text{ is an integer } 0 \le a_j \le m - 1.$$
$$\frac{\#\{j : a_j = k,\ 1 \le j \le N\}}{N} \to \frac{1}{m} \quad \text{as } N \to \infty.$$
Note the analogy with the equidistribution statements in Section 2, Chap-
ter 4, of Book I.
25. Show that the mean ergodic theorem still holds if we replace the assumption
that $T$ is an isometry by the assumption that $T$ is a contraction, that is,
$\|Tf\| \le \|f\|$ for all $f \in \mathcal{H}$.
[Hint: Prove that $T$ is a contraction if and only if $T^*$ is a contraction, and use the
identity $(f, T^* f) = (Tf, f)$.]
satisfies
The proof is the same as outlined in Problem 6, Chapter 5 for the maximal function
on Rd . With this, extend the pointwise ergodic theorem to the case where µ(X) =
∞, as follows:
(a) Show that $\lim_{m \to \infty} \frac{1}{m} \sum_{k=0}^{m-1} f(\tau^k(x))$ converges for
a.e. $x$ to $P(f)(x)$ for every $f \in L^2(X)$, because this holds for a dense subspace of
$L^2(X)$.
(b) Prove that the conclusion holds for every $f \in L^1(X)$, because it holds for
the dense subspace $L^1(X) \cap L^2(X)$.
27. We saw that if $\|f_n\|_{L^2} \le 1$, then $\frac{f_n(x)}{n} \to 0$ as $n \to \infty$ for
a.e. $x$. However, show that the analogue where one replaces the $L^2$-norm by the
$L^1$-norm fails, by constructing a sequence $\{f_n\}$, $f_n \in L^1(X)$,
$\|f_n\|_{L^1} \le 1$, but with $\limsup_{n \to \infty} \frac{f_n(x)}{n} = \infty$ for a.e. $x$.
[Hint: Find intervals $I_n \subset [0, 1]$, so that $m(I_n) = 1/(n \log n)$ but
$\limsup_{n \to \infty} \{I_n\} = [0, 1]$. Then take $f_n(x) = n \log n\, \chi_{I_n}$.]
8 Problems
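hence $\Phi(Q_k) = \Phi(z_k) + \Phi'(z_k)(Q_k - z_k) + o(\epsilon)$, and as a result
$(1 - \eta(\epsilon))\Phi'(z_k)(Q_k - z_k) \subset \Phi(Q_k) - \Phi(z_k) \subset (1 + \eta(\epsilon))\Phi'(z_k)(Q_k - z_k)$,
where $\eta(\epsilon) \to 0$ as $\epsilon \to 0$. This means that
$$m(\Phi(\mathcal{O})) = \sum_k m(\Phi(Q_k)) = \sum_k |\det(\Phi'(z_k))|\, m(Q_k) + o(1) \quad \text{as } \epsilon \to 0$$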
$$\mu(\hat{B}) = \lim_{\delta \to 0} \frac{1}{2\delta}\, m((\hat{B})^\delta),$$
(b) One may apply (a) to the case when $S$ is the (upper) half of the unit sphere
in $\mathbb{R}^d$, given by $y = F(x)$, $F(x) = (1 - |x|^2)^{1/2}$, $|x| < 1$,
$x \in \mathbb{R}^{d-1}$. Show that in this case $d\mu = d\sigma$, the measure on the sphere
arising in the polar coordinate formula in Section 3.2.
(c) The above conclusion allows one to write an explicit formula for $d\sigma$ in
terms of spherical coordinates. Take, for example, the case $d = 3$, and
write $y = \cos\theta$, $x = (x_1, x_2) = (\sin\theta\cos\varphi, \sin\theta\sin\varphi)$ with
$0 \le \theta < \pi/2$, $0 \le \varphi < 2\pi$. Then according to (a) and (b) the element of
area $d\sigma$ equals $(1 - |x|^2)^{-1/2}\,dx$. Use the change of variable theorem in
Problem 1 to deduce that in this case $d\sigma = \sin\theta\,d\theta\,d\varphi$. This may be
generalized to $d$ dimensions, $d \ge 2$, to obtain the formulas in Section 2.4 of the
appendix in Book I.
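The computation requested in (c) can be checked numerically before it is proved. The sketch below, with hypothetical function names, estimates the Jacobian of $(\theta, \varphi) \mapsto x$ by central differences, multiplies it by $(1 - |x|^2)^{-1/2}$, and compares the result with $\sin\theta$. It also integrates $\sin\theta$ over the parameter rectangle to recover the area $2\pi$ of the upper hemisphere. This is a sanity check under these assumptions, not a substitute for the change of variable argument.

```python
import math

def x_of(theta, phi):
    """The parametrization x = (sin(theta) cos(phi), sin(theta) sin(phi))."""
    return (math.sin(theta) * math.cos(phi), math.sin(theta) * math.sin(phi))

def area_density(theta, phi, h=1e-6):
    """(1 - |x|^2)^(-1/2) times |det dx/d(theta, phi)|, using central differences."""
    dtheta = [(p - q) / (2 * h) for p, q in zip(x_of(theta + h, phi), x_of(theta - h, phi))]
    dphi = [(p - q) / (2 * h) for p, q in zip(x_of(theta, phi + h), x_of(theta, phi - h))]
    jacobian = abs(dtheta[0] * dphi[1] - dtheta[1] * dphi[0])
    x1, x2 = x_of(theta, phi)
    return jacobian / math.sqrt(1.0 - (x1 * x1 + x2 * x2))

def hemisphere_area(n=1000):
    """Midpoint rule for the integral of sin(theta) over 0 <= theta < pi/2, 0 <= phi < 2*pi."""
    h = (math.pi / 2) / n
    return 2 * math.pi * sum(math.sin((i + 0.5) * h) for i in range(n)) * h

print(area_density(0.7, 1.3), math.sin(0.7), hemisphere_area())
```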
4.∗ Let $\mu$ be a Borel measure on the sphere $S^{d-1}$ which is rotation-invariant in the
following sense: $\mu(r(E)) = \mu(E)$, for every rotation $r$ of $\mathbb{R}^d$ and each
Borel subset $E$ of $S^{d-1}$. If $\mu(S^{d-1}) < \infty$, then $\mu$ is a constant multiple
of the measure $\sigma$ arising in the polar coordinate integration formula.
[Hint: Show that
$$\int_{S^{d-1}} Y_k(x)\,d\mu(x) = 0$$
5.∗ Suppose $X$ is a metric space, and $\mu$ is a Borel measure on $X$ with the property
that $\mu(B) < \infty$ for every ball $B$. Define $C_0(X)$ to be the vector space of
continuous functions on $X$ that are each supported in some closed ball. Then
$\ell(f) = \int_X f\,d\mu$ defines a linear functional on $C_0(X)$ that is positive, that is,
$\ell(f) \ge 0$ if $f \ge 0$.
Conversely, for any positive linear functional $\ell$ on $C_0(X)$, there exists a unique
Borel measure $\mu$ that is finite on all balls, such that $\ell(f) = \int f\,d\mu$.
(b) Show that $\tau$ is ergodic (in fact, mixing) if and only if $A$ has no eigenvalues
of the form $e^{2\pi i p/q}$, where $p$ and $q$ are integers.
[Hint: The condition (b) is the same as $(A^t)^q$ has no invariant vectors, where $A^t$ is
the transpose of $A$. Note also that $f(\tau^k(x)) = e^{2\pi i (A^t)^k(n) \cdot x}$ where
$f(x) = e^{2\pi i n \cdot x}$.]
7.∗ There is a version of the maximal ergodic theorem that is akin to the “rising
sun lemma” and Exercise 6 in Chapter 3.
Suppose $f$ is real-valued, and $f^\#(x) = \sup_m \frac{1}{m} \sum_{k=0}^{m-1} f(\tau^k(x))$.
Let $E_0 = \{x : f^\#(x) > 0\}$. Then
$$\int_{E_0} f(x)\,dx \ge 0.$$
8. Let $X = [0, 1)$, $\tau(x) = \langle 1/x \rangle$, $x \ne 0$, $\tau(0) = 0$. Here
$\langle x \rangle$ denotes the fractional part of $x$. With the measure
$d\mu = \frac{1}{\log 2} \frac{dx}{1 + x}$, we have of course $\mu(X) = 1$.
Show that $\tau$ is a measure-preserving transformation.
[Hint: $\sum_{k=1}^\infty \frac{1}{(x+k)(x+k+1)} = \frac{1}{1+x}$.]
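Both the hint and the invariance claim lend themselves to a quick numerical check. The sketch below, with hypothetical function names and truncation levels, uses the observation that $\tau(x) \in [0, b)$ exactly when $x \in (1/(k+b), 1/k]$ for some integer $k \ge 1$, so $\mu(\tau^{-1}[0, b))$ is a sum of the $\mu$-measures of these intervals. The sums are truncated, so agreement is only approximate.

```python
import math

def gauss_measure(a, b):
    """mu([a, b]) for d(mu) = dx / ((1 + x) log 2)."""
    return (math.log1p(b) - math.log1p(a)) / math.log(2)

def preimage_measure(b, terms=200_000):
    """Truncated sum of mu((1/(k+b), 1/k]) over k >= 1, approximating mu(tau^{-1}[0, b))."""
    return sum(gauss_measure(1 / (k + b), 1 / k) for k in range(1, terms + 1))

def hint_partial_sum(x, terms):
    """Partial sum of 1/((x+k)(x+k+1)); it telescopes to 1/(x+1) - 1/(x+terms+1)."""
    return sum(1 / ((x + k) * (x + k + 1)) for k in range(1, terms + 1))

b = 0.5
print(preimage_measure(b), gauss_measure(0.0, b))
print(hint_partial_sum(0.3, 1000), 1 / 1.3 - 1 / 1001.3)
```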
10.∗ The connection between continued fractions and the transformation
$\tau(x) = \langle 1/x \rangle$ will now be described. A continued fraction,
$a_0 + 1/(a_1 + 1/a_2) \cdots$, also written as $[a_0 a_1 a_2 \cdots]$, where the $a_j$ are
positive integers, can be assigned to any positive real number $x$ in the following way.
Starting with $x$, we successively transform it by two alternating operations: reducing it
modulo 1 to lie in $[0, 1)$, and then taking the reciprocal of that number. The integers
$a_j$ that arise then define the continued fraction of $x$.
Thus we set $x = a_0 + r_0$, where $a_0 = [x]$ = the greatest integer in $x$, and
$r_0 \in [0, 1)$. Next we write $1/r_0 = a_1 + r_1$, with $a_1 = [1/r_0]$, $r_1 \in [0, 1)$,
to obtain successively $1/r_{n-1} = a_n + r_n$, where $a_n = [1/r_{n-1}]$, $r_n \in [0, 1)$.
If $r_n = 0$ for some $n$, we write $a_k = 0$ for all $k > n$, and say that such a continued
fraction terminates.
Note that if $0 \le x < 1$, then $r_0 = x$ and $a_1 = [1/x]$, while
$r_1 = \langle 1/x \rangle = \tau(x)$. More generally then,
$a_k(x) = [1/\tau^{k-1}(x)] = a_1(\tau^{k-1}(x))$. The following properties
of continued fractions of positive real numbers $x$ are known:
(c) The continued fraction is periodic, that is, ak+N = ak for some N ≥ 1, and
all sufficiently large k, if and only if x is an algebraic number of degree ≤ 2
over the rationals.
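The alternating procedure just described (reduce modulo 1, then take the reciprocal) translates directly into a short routine. The sketch below is an illustration for rational inputs only, using exact arithmetic; the function names and the cap `max_terms` are assumptions. For a rational $x$ the remainder eventually vanishes, so the expansion terminates, and the routine simply stops there instead of padding with zeros.

```python
import math
from fractions import Fraction

def gauss_map(x):
    """tau(x) = fractional part of 1/x for x != 0, and tau(0) = 0."""
    if x == 0:
        return Fraction(0)
    y = 1 / Fraction(x)
    return y - math.floor(y)

def continued_fraction(x, max_terms=20):
    """Digits [a_0, a_1, ...] obtained from x = a_0 + r_0 and 1/r_{n-1} = a_n + r_n."""
    x = Fraction(x)
    digits = [math.floor(x)]
    r = x - digits[0]
    while r != 0 and len(digits) < max_terms:
        digits.append(math.floor(1 / r))  # a_n = [1 / r_{n-1}]
        r = gauss_map(r)                  # r_n = tau(r_{n-1})
    return digits

print(continued_fraction(Fraction(355, 113)))
print(continued_fraction(Fraction(16, 113)))
```

The first call prints `[3, 7, 16]`, that is, $355/113 = 3 + 1/(7 + 1/16)$. The second illustrates $a_k(x) = a_1(\tau^{k-1}(x))$ for $x = 16/113$, whose digits after $a_0 = 0$ are the same $7, 16$.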
Another approach is relevant for curves that are not necessarily rectifiable.
Start with a curve $\Gamma = \{\gamma(t) : a \le t \le b\}$, and for each $\epsilon > 0$
consider polygonal lines joining $\gamma(a)$ to $\gamma(b)$, whose vertices lie on
successive points of $\Gamma$, with each segment not exceeding $\epsilon$ in length. Denote
by $\#(\epsilon)$ the least number of segments that arise for such polygonal lines.
If $\#(\epsilon) \approx \epsilon^{-1}$ as $\epsilon \to 0$, then $\Gamma$ is rectifiable.
However, $\#(\epsilon)$ may well grow more rapidly than $\epsilon^{-1}$ as $\epsilon \to 0$.
If we had $\#(\epsilon) \approx \epsilon^{-\alpha}$, $1 < \alpha$, then, in the spirit of the
previous example, it would be natural to say that $\Gamma$ has dimension $\alpha$. These
considerations have even an interest in other parts of science. For instance, in studying
the question of determining the length of the border of a country or its coastline,
L.F. Richardson found that the length of the west coast of Britain obeyed the empirical
law $\#(\epsilon) \approx \epsilon^{-\alpha}$, with $\alpha$ approximately 1.5. Thus one
might conclude that the coast has fractional dimension!
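A power law of this kind is usually read off as a slope on a log-log plot. The sketch below fits that slope by least squares. The data are idealized assumptions rather than measurements: a straight segment needs about $3^j$ pieces of length $3^{-j}$, while the $j$-th polygonal stage of the von Koch curve considered later in this chapter consists of $4^j$ segments of length $3^{-j}$. The function name is hypothetical.

```python
import math

def fit_exponent(eps_values, counts):
    """Least-squares slope of log(count) against log(1/eps)."""
    xs = [math.log(1.0 / e) for e in eps_values]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

eps = [3.0 ** (-j) for j in range(1, 9)]
segment_counts = [3 ** j for j in range(1, 9)]  # idealized count for a straight segment
koch_counts = [4 ** j for j in range(1, 9)]     # idealized count for the von Koch stages

print(fit_exponent(eps, segment_counts))  # close to 1
print(fit_exponent(eps, koch_counts))     # close to log 4 / log 3, about 1.26
```

With real coastline data the points would only be roughly collinear, and the fitted slope would play the role of Richardson's empirical exponent.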
While there are a number of different ways to make some of these
heuristic notions precise, the theory that has the widest scope and great-
est flexibility is the one involving Hausdorff measure and Hausdorff di-
mension. Probably the most elegant and simplest illustration of this
theory can be seen in terms of its application to a general class of self-
similar sets, and this is what we consider first. Among these are the
curves of von Koch type, and these can have any dimension between 1
and 2.
Next, we turn to an example of a space-filling curve, which, broadly
speaking, falls under the scope of self-replicating constructions. Not
only does this curve have an intrinsic interest, but its nature reveals the
important fact that from the point of view of measure theory the unit
interval and the unit square are the same.
Our final topic is of a somewhat different nature. It begins with the
realization of an unexpected regularity that all subsets of Rd (of finite
Lebesgue measure) enjoy, when d ≥ 3. This property fails in two di-
mensions, and the key counter-example is the Besicovitch set. This set
appears also in a number of other problems. While it has measure zero,
this is barely so, since its Hausdorff dimension is necessarily 2.
1 Hausdorff measure
The theory begins with the introduction of a new notion of volume or
mass. This “measure” is closely tied with the idea of dimension which
prevails throughout the subject. More precisely, following Hausdorff,
one considers for each appropriate set E and each α > 0 the quantity
mα (E), which can be interpreted as the α-dimensional mass of E among
sets of dimension α, where the word “dimension” carries (for now) only
where diam $S$ denotes the diameter of the set $S$, that is,
diam $S = \sup\{|x - y| : x, y \in S\}$. In other words, for each $\delta > 0$ we consider
covers of $E$ by countable families of (arbitrary) sets with diameter less than $\delta$,
and take the infimum of the sum $\sum_k (\operatorname{diam} F_k)^\alpha$. We then define
$m^*_\alpha(E)$ as the limit of these infimums as $\delta$ tends to 0. We note that the quantity
$$\mathcal{H}^\delta_\alpha(E) = \inf\left\{ \sum_k (\operatorname{diam} F_k)^\alpha : E \subset \bigcup_{k=1}^\infty F_k,\ \operatorname{diam} F_k \le \delta \text{ all } k \right\}$$
Since $\epsilon$ is arbitrary, the inequality
$\mathcal{H}^\delta_\alpha(E) \le \sum m^*_\alpha(E_j)$ holds, and we let
$\delta$ tend to 0 to prove the countable sub-additivity of $m^*_\alpha$.
Property 3 If $d(E_1, E_2) > 0$, then $m^*_\alpha(E_1 \cup E_2) = m^*_\alpha(E_1) + m^*_\alpha(E_2)$.
It suffices to prove that $m^*_\alpha(E_1 \cup E_2) \ge m^*_\alpha(E_1) + m^*_\alpha(E_2)$
since the reverse inequality is guaranteed by sub-additivity. Fix $\epsilon > 0$ with
$\epsilon < d(E_1, E_2)$. Given any cover of $E_1 \cup E_2$ with sets $F_1, F_2, \ldots$, of
diameter less than $\delta$, where $\delta < \epsilon$, we let
$$F'_j = E_1 \cap F_j \quad \text{and} \quad F''_j = E_2 \cap F_j.$$
Then $\{F'_j\}$ and $\{F''_j\}$ are covers for $E_1$ and $E_2$, respectively, and are
disjoint. Hence,
$$\sum_j (\operatorname{diam} F'_j)^\alpha + \sum_i (\operatorname{diam} F''_i)^\alpha \le \sum_k (\operatorname{diam} F_k)^\alpha.$$
Taking the infimum over the coverings, and then letting $\delta$ tend to zero
yields the desired inequality.
At this point, we note that $m^*_\alpha$ satisfies all the properties of a metric
Carathéodory exterior measure as discussed in Chapter 6. Thus $m^*_\alpha$
is a countably additive measure when restricted to the Borel sets. We
shall therefore restrict ourselves to Borel sets and write $m_\alpha(E)$ instead
of $m^*_\alpha(E)$. The measure $m_\alpha$ is called the α-dimensional Hausdorff
measure.
Property 4 If $\{E_j\}$ is a countable family of disjoint Borel sets, and
$E = \bigcup_{j=1}^\infty E_j$, then
$$m_\alpha(E) = \sum_{j=1}^\infty m_\alpha(E_j).$$
For what follows in this chapter, the full additivity in the above prop-
erty is not needed, and we can manage with a weaker form whose proof
is elementary and not dependent on the developments of Chapter 6. (See
Exercise 2.)
and rotations
mα (rE) = mα (E),
where r is a rotation in Rd .
Moreover, it scales as follows:
The constant $c_d$ equals $m(B)/(\operatorname{diam} B)^d$, for the unit ball $B$; note that
this ratio is the same for all balls $B$ in $\mathbb{R}^d$, and so $c_d = v_d/2^d$ (where
$v_d$ denotes the volume of the unit ball). The proof of this property relies on
the so-called iso-diametric inequality, which states that among all sets of
a given diameter, the ball has largest volume. (See Problem 2.) Without
using this geometric fact one can prove the following substitute.
Letting $\delta$ and $\epsilon$ tend to 0, we get $m_d(E) \le c_d^{-1} m(E)$. For the reverse
direction, let $E \subset \bigcup_j F_j$ be a covering with
$\sum_j (\operatorname{diam} F_j)^d \le m_d(E) + \epsilon$.
We can always find closed balls $B_j$ centered at a point of $F_j$ so that
$B_j \supset F_j$ and diam $B_j = 2$ diam $F_j$. However, $m(E) \le \sum_j m(B_j)$, since
$E \subset \bigcup_j B_j$, and the last sum equals
$$c_d \sum (\operatorname{diam} B_j)^d = 2^d c_d \sum (\operatorname{diam} F_j)^d \le 2^d c_d (m_d(E) + \epsilon).$$
Property 8 If $m^*_\alpha(E) < \infty$ and $\beta > \alpha$, then $m^*_\beta(E) = 0$. Also, if
$m^*_\alpha(E) > 0$ and $\beta < \alpha$, then $m^*_\beta(E) = \infty$.
Consequently
Since $m^*_\alpha(E) < \infty$ and $\beta - \alpha > 0$, we find in the limit as $\delta$
tends to 0, that $m^*_\beta(E) = 0$.
The contrapositive gives $m^*_\beta(E) = \infty$ whenever $m^*_\alpha(E) > 0$ and $\beta < \alpha$.
We now make some easy observations that are consequences of the
above properties.
1. If $I$ is a finite line segment in $\mathbb{R}^d$, then $0 < m_1(I) < \infty$.
2. More generally, if $Q$ is a $k$-cube in $\mathbb{R}^d$ (that is, $Q$ is the product of
$k$ non-trivial intervals and $d - k$ points), then $0 < m_k(Q) < \infty$.
3. If $\mathcal{O}$ is a non-empty open set in $\mathbb{R}^d$, then
$m_\alpha(\mathcal{O}) = \infty$ whenever $\alpha < d$. Indeed, this follows because
$m_d(\mathcal{O}) > 0$.
4. Note that we can always take $\alpha \le d$. This is because when $\alpha > d$,
$m_\alpha$ vanishes on every ball, and hence on all of $\mathbb{R}^d$.
2 Hausdorff dimension
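Given a Borel subset $E$ of $\mathbb{R}^d$, we deduce from Property 8 that there
exists a unique $\alpha$ such that
$$m_\beta(E) = \begin{cases} \infty & \text{if } \beta < \alpha, \\ 0 & \text{if } \alpha < \beta. \end{cases}$$
In other words, $\alpha$ is given by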
2.1 Examples
The Cantor set
The first striking example consists of the Cantor set C, which was con-
structed in Chapter 1 by successively removing the middle-third intervals
in [0, 1].
The inequality
mα (C) ≤ 1
and part (i) follows. This result now immediately implies conclusion (ii).
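The upper bound $m_\alpha(C) \le 1$ reflects a simple count: the $k$-th stage of the middle-third construction covers $C$ by $2^k$ intervals of length $3^{-k}$. The sketch below, with a hypothetical function name, evaluates the corresponding covering sum $2^k (3^{-k})^s$ for several exponents $s$. It only illustrates this particular family of covers; the lower bound needs a separate argument.

```python
import math

def cantor_cover_sum(k, s):
    """Sum of (diam)^s over the 2^k intervals of length 3^(-k) covering the Cantor set."""
    return (2 ** k) * (3.0 ** (-k)) ** s

alpha = math.log(2) / math.log(3)

for k in (5, 10, 20):
    print(k, cantor_cover_sum(k, alpha),        # stays equal to 1
             cantor_cover_sum(k, alpha + 0.1),  # tends to 0
             cantor_cover_sum(k, alpha - 0.1))  # tends to infinity
```

At $s = \alpha = \log 2/\log 3$ the sum is exactly 1 for every $k$, which is where the threshold between 0 and $\infty$ in Property 8 shows up for these covers.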
Having fixed $x$ and $y$, we then minimize the right hand side by choosing
$n$ so that both terms have the same order of magnitude. This is achieved
by taking $n$ so that $3^n |x - y|$ is between 1 and 3. Then, we see that
since $3^\gamma = 2$ and $3^{-n}$ is not greater than $|x - y|$. This argument is
repeated in Lemma 2.8 below.
With $E = C$, $f$ the Cantor-Lebesgue function, and $\alpha = \gamma = \log 2/\log 3$,
the two lemmas give
Rectifiable curves
A further example of the role of dimension comes from looking at continuous
curves in $\mathbb{R}^d$. Recall that a continuous curve $\gamma : [a, b] \to \mathbb{R}^d$ is
said to be simple if $\gamma(t_1) \ne \gamma(t_2)$ whenever $t_1 \ne t_2$, and quasi-simple
if the mapping $t \mapsto \gamma(t)$ is injective for $t$ in the complement of finitely
many points.
This follows since |t1 − t2 | is the length of the curve between t1 and t2 ,
which is greater than the distance from γ̃(t1 ) to γ̃(t2 ). Since γ̃ satisfies
the conditions of Lemma 2.2 with exponent 1 and M = 1, we find that
m1 (Γ) ≤ L.
$$\Gamma_j = \{\gamma(t) : t_j \le t \le t_{j+1}\},$$
so that $\Gamma = \bigcup_{j=0}^{N-1} \Gamma_j$, and hence
$$m_1(\Gamma) = \sum_{j=0}^{N-1} m_1(\Gamma_j)$$
and clearly the segment $[0, \ell_j]$ on the $x$-axis is contained in the image
$\pi(\Gamma_j)$. Therefore, Lemma 2.2 guarantees
$$\ell_j \le m_1(\Gamma_j),$$
and thus $m_1(\Gamma) \ge \sum \ell_j$. Since by definition the length $L$ of $\Gamma$ is the
supremum of the sums $\sum \ell_j$ over all partitions of $[a, b]$, we find that
$m_1(\Gamma) \ge L$, as desired.
Conversely, if Γ has strict Hausdorff dimension 1, then m1 (Γ) < ∞,
and the above argument shows that Γ is rectifiable.
The reader may note the resemblance of this characterization of rec-
tifiability and an earlier one in terms of Minkowski content, given in
Chapter 3. In this connection we point out that there is a different
notion of dimension that is sometimes used instead of Hausdorff dimension.
For a compact set $E$, this dimension is given in terms of the size
of $E^\delta = \{x \in \mathbb{R}^d : d(x, E) < \delta\}$ as $\delta \to 0$. One observes that
if $E$ is a $k$-dimensional cube in $\mathbb{R}^d$, then $m(E^\delta) \le c\,\delta^{d-k}$ as
$\delta \to 0$, with $m$ the Lebesgue measure of $\mathbb{R}^d$. With this in mind, the
Minkowski dimension of $E$ is defined by
One can show that the Hausdorff dimension of a set does not exceed its
Minkowski dimension, but that equality does not hold in general. More
details may be found in Exercises 17 and 18.
We choose to call the lower left vertex of a triangle the vertex of that
triangle. With this choice there are $3^k$ vertices of the $k$-th generation.
The argument that follows is based on the important fact that all these
vertices belong to $S$.
Suppose $S \subset \bigcup_{j=1}^\infty F_j$, with diam $F_j < \delta$. We wish to prove that
$$\sum_j (\operatorname{diam} F_j)^\alpha \ge c > 0$$
$$v \in \Delta_k \subset \Delta'_\ell \subset B^*,$$
as shown in Figure 2.
Next, there is a positive constant $c$ such that $B^*$ can contain at most
$c$ distinct triangles of the $\ell$-th generation. This is because triangles of the
$\ell$-th generation have disjoint interiors and area equal to $c' 4^{-\ell}$, while $B^*$
has area at most equal to $c'' 4^{-\ell}$. Finally, each $\Delta'_\ell$ contains
$3^{k-\ell}$ triangles of the $k$-th generation, hence $B$ can contain at most
$c\,3^{k-\ell}$ vertices of triangles of the $k$-th generation.
[Figure 2: the vertex $v$, the triangles $\Delta_k$ and $\Delta'_\ell$, and the balls $B$ and $B^*$.]
To complete the proof that $\sum_{j=1}^N (\operatorname{diam} B_j)^\alpha \ge c > 0$, note that
$$\sum_{j=1}^N (\operatorname{diam} B_j)^\alpha \ge \sum_\ell N_\ell\, 2^{-\ell\alpha},$$
as desired.
We give a final example that exhibits properties similar to the Cantor
set and Sierpinski triangle. It is the curve discovered by von Koch in 1904.
Figure 3. The first few stages ($K_0$, $K_1$, $K_2$, $K_3$) in the construction of the von Koch curve
whenever $j' \ge j$.
In the limit as $j$ tends to infinity, the polygonal lines $K_j$ tend to the
von Koch curve $K$. Indeed, we have
We have already observed that $|K_{j+1}(t) - K_j(t)| \le 3^{-j}$. Since $K_j$ travels
a distance of $3^{-j}$ in $4^{-j}$ units of time, we see that
$$|K'_j(t)| \le \left( \frac{4}{3} \right)^j \quad \text{except when } t = \ell/4^j.$$
and
and therefore
$$|f(t) - f_j(t)| \le \sum_{k=j}^\infty |f_{k+1}(t) - f_k(t)| \le \sum_{k=j}^\infty B^{-k} \le c B^{-j}.$$
$$A^j |t - s| \le B^{-j},$$
while raising the second inequality to the power $\gamma$, and using the fact
that $(AB)^\gamma = B$ gives
$$1 \le B^j |t - s|^\gamma.$$
as was to be shown.
In particular, this result with Lemma 2.2 implies that
$$\dim K \le \frac{1}{\gamma} = \frac{\log 4}{\log 3}.$$
To prove that mγ (K) > 0 and hence dim K = log 4/ log 3 requires an ar-
gument similar to the one given for the Sierpinski triangle. In fact,
this argument generalizes to cover a general family of sets that have a
self-similarity property. We therefore turn our attention to this general
theory next.
Remarks. We mention some further facts about the von Koch curve.
More details can be found in Exercises 13, 14, and 15 below.
Thus for every $\alpha$, $1 < \alpha < 2$, we have a curve of this kind of dimension
$\alpha$. Note that when $\ell \to 1/4$ the limiting curve is a straight line
segment, which has dimension 1. When $\ell \to 1/2$, the limit can be
seen to correspond to a “space-filling” curve.
2. The curves $t \mapsto K^\ell(t)$, $1/4 < \ell \le 1/2$, are each nowhere differentiable.
One can also show that each curve is simple when $1/4 \le \ell < 1/2$.
2.2 Self-similarity
The Cantor set C, the Sierpinski triangle S, and von Koch curve K all
share an important property: each of these sets contains scaled copies
of itself. Moreover, each of these examples was constructed by iterating
a process closely tied to its scaling. For instance, the interval [0, 1/3]
contains a copy of the Cantor set scaled by a factor of 1/3. The same is
true for the interval [2/3, 1], and therefore
C = C1 ∪ C 2 ,
where C1 and C2 are scaled versions of C. Also, each interval [0, 1/9],
[2/9, 3/9], [6/9, 7/9] and [8/9, 1] contains a copy of C scaled by a factor
of 1/9, and so on.
In the case of the Sierpinski triangle, each of the three triangles in the
first generation contains a copy of S scaled by the factor of 1/2. Hence
S = S1 ∪ S 2 ∪ S 3 ,
K = K1 ∪ K2 ∪ K3 ∪ K4 ,
F = S1 (F ) ∪ · · · ∪ Sm (F ).
We point out the relevance of the various examples we have already seen.
When F = C is the Cantor set, there are two similarities given by
Here, α and β are the points drawn in the first diagram in Figure 5.
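If $F = K$, the von Koch curve, we have
$$S_1(x) = \frac{x}{3}, \quad S_2(x) = \rho\frac{x}{3} + \alpha, \quad S_3(x) = \rho^{-1}\frac{x}{3} + \beta,$$
and
$$S_4(x) = \frac{x}{3} + \gamma,$$
where $\rho$ is the rotation centered at the origin and of angle $\pi/3$. There
are $m = 4$ similarities which have ratio $r = 1/3$. The points $\alpha$, $\beta$, and $\gamma$
are shown in the second diagram in Figure 5.
[Figure 5: two diagrams, the first marking the points $\alpha$ and $\beta$, the second marking $\alpha$, $\beta$, and $\gamma$.]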
Figure 6. Construction of the Cantor dust (stages $D_1$ and $D_2$)
S1 (x) = µx,
S2 (x) = µx + (0, 1 − µ),
S3 (x) = µx + (1 − µ, 1 − µ),
S4 (x) = µx + (1 − µ, 0).
F = S1 (F ) ∪ · · · ∪ Sm (F ).
Lemma 2.10 There exists a closed ball B so that Sj (B) ⊂ B for all
j = 1, . . . , m.
The proof of the lemma is simple and may be left to the reader.
Using both lemmas we may now prove Theorem 2.9. We first choose
$B$ as in Lemma 2.10, and let $F_k = \tilde{S}^k(B)$, where $\tilde{S}^k$ denotes the $k$-th
composition of $\tilde{S}$, that is, $\tilde{S}^k = \tilde{S}^{k-1} \circ \tilde{S}$ with
$\tilde{S}^1 = \tilde{S}$. Each $F_k$ is compact, non-empty, and $F_k \subset F_{k-1}$, since
$\tilde{S}(B) \subset B$. If we let
$$F = \bigcap_{k=1}^\infty F_k,$$
then $F$ is compact, non-empty, and clearly $\tilde{S}(F) = F$, since applying $\tilde{S}$
to $\bigcap_{k=1}^\infty F_k$ yields $\bigcap_{k=2}^\infty F_k$, which also equals $F$.
Uniqueness of the set F is proved as follows. Suppose G is another
compact set so that S̃(G) = G. Then, an application of part (iv) in
Lemma 2.11 yields dist(F, G) ≤ r dist(F, G). Since r < 1, this forces
dist(F, G) = 0, so that F = G, and the proof of Theorem 2.9 is com-
plete.
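The construction $F_k = \tilde{S}^k(B)$ in this proof can be traced explicitly for the Cantor set. The sketch below is an illustration under stated assumptions: it takes the two similarities to be $S_1(x) = x/3$ and $S_2(x) = x/3 + 2/3$ (the standard choice for the middle-third Cantor set), uses $B = [0, 1]$, represents each $F_k$ as a sorted list of closed intervals with exact rational endpoints, and uses hypothetical function names.

```python
from fractions import Fraction

def S_tilde(intervals):
    """Apply S~(A) = S_1(A) U S_2(A), with S_1(x) = x/3 and S_2(x) = x/3 + 2/3."""
    shift = Fraction(2, 3)
    images = []
    for a, b in intervals:
        images.append((a / 3, b / 3))                  # S_1 image
        images.append((a / 3 + shift, b / 3 + shift))  # S_2 image
    return sorted(images)

def F(k):
    """F_k = S~^k(B) for B = [0, 1], as a sorted list of closed intervals."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(k):
        intervals = S_tilde(intervals)
    return intervals

def is_nested(inner, outer):
    """True if every interval of `inner` lies inside some interval of `outer`."""
    return all(any(c <= a and b <= d for c, d in outer) for a, b in inner)

print(F(2))
print(is_nested(F(3), F(2)), sum(b - a for a, b in F(3)))
```

The second generation consists of $[0, 1/9]$, $[2/9, 1/3]$, $[2/3, 7/9]$, $[8/9, 1]$, the sets decrease with $k$, and the total length of $F_k$ is $(2/3)^k$.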
$$m_\alpha(F) = m r^\alpha m_\alpha(F).$$
$$\alpha = \frac{\log m}{\log 1/r}.$$
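The exponent $\alpha = \log m/\log(1/r)$ is the solution of $m r^\alpha = 1$, and it is quickly evaluated for the examples of this section. In the sketch below the function name is hypothetical, the values of $m$ and $r$ for the Cantor set, the Sierpinski triangle, and the von Koch curve are those described in the text, and the Cantor dust ratio $\mu = 1/4$ is an arbitrary illustrative choice. The number computed is only the candidate dimension; whether it equals the Hausdorff dimension depends on the hypotheses of the relevant theorem.

```python
import math

def similarity_exponent(m, r):
    """The exponent alpha with m * r**alpha == 1, that is, log m / log(1/r)."""
    return math.log(m) / math.log(1.0 / r)

examples = {
    "Cantor set": (2, 1 / 3),
    "Sierpinski triangle": (3, 1 / 2),
    "von Koch curve": (4, 1 / 3),
    "Cantor dust (mu = 1/4)": (4, 1 / 4),  # mu chosen for illustration only
}

for name, (m, r) in examples.items():
    alpha = similarity_exponent(m, r)
    print(f"{name}: alpha = {alpha:.4f}, m * r**alpha = {m * r ** alpha:.6f}")
```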
O ⊃ S1 (O) ∪ · · · ∪ Sm (O),
Fk = S̃ k (B),
and $\tilde{S}^k(B)$ is the union of $m^k$ sets of diameter less than $c r^k$ (with
$c = \operatorname{diam} B$), each of the form
$$\le c' m^k r^{\alpha k}$$
$$\le c',$$
Then the open sets of the $k$-th generation are disjoint, since those of
the first generation are disjoint. Moreover if $k \ge \ell$, each open set of the
$\ell$-th generation contains $m^{k-\ell}$ open sets of the $k$-th generation.
Suppose $v$ is a vertex of the $k$-th generation, and let $\mathcal{O}(v)$ denote the
open set in the $k$-th generation which is associated to $v$, that is, $v$ and
$\mathcal{O}(v)$ carry the same label $(n_1, n_2, \ldots, n_k)$. Since $x$ is at a fixed
distance from the original open set $\mathcal{O}$, and $\mathcal{O}$ has a finite diameter,
we find that
(a) $d(v, \mathcal{O}(v)) \le c r^k$.
(b) $c' r^k \le \operatorname{diam} \mathcal{O}(v) \le c r^k$.
As in the case of the Sierpinski triangle, it suffices to prove that if
$\mathcal{B} = \{B_j\}_{j=1}^N$ is a finite collection of balls whose diameters are less than
$\delta$ and whose union covers $F$, then
$$\sum_{j=1}^N (\operatorname{diam} B_j)^\alpha \ge c > 0.$$
$$r^\ell \le \operatorname{diam} B_j \le r^{\ell - 1}.$$
By the lemma, we see that the total number of vertices of the $k$-th generation
that can be covered by the collection $\mathcal{B}$ can be no more than
$c \sum_\ell N_\ell m^{k-\ell}$. Since all $m^k$ vertices of the $k$-th generation belong
to $F$, we must have $c \sum_\ell N_\ell m^{k-\ell} \ge m^k$, and hence
$$\sum_\ell N_\ell m^{-\ell} \ge c.$$
3 Space-filling curves
The year 1890 heralded an important discovery: Peano constructed a
continuous curve that filled an entire square in the plane. Since then,
many variants of his construction have been given. We shall describe here
a construction that has the feature of elucidating an additional significant
fact. It is that from the point of measure theory, speaking broadly, the
unit interval and unit square are “isomorphic.”
Theorem 3.1 There exists a curve $t \mapsto \mathcal{P}(t)$ from the unit interval to
the unit square with the following properties:
(i) P maps [0, 1] to [0, 1] × [0, 1] continuously and surjectively.
(ii) P satisfies a Lipschitz condition of exponent 1/2, that is,
Corollary 3.2 There are subsets Z1 ⊂ [0, 1] and Z2 ⊂ [0, 1] × [0, 1], each
of measure zero, such that P is bijective from
m1 (E) = m2 (P(E)).
$$\dim F([0, 1]) \le \frac{1}{\gamma} \dim [0, 1],$$
I1 ⊃ I2 ⊃ · · · ⊃ Ik ⊃ · · · ,
(iii) The set of t for which the chain in part (ii) is not unique is a set
of measure zero (in fact, this set is countable).
Proof. Part (i) follows from the fact that $\{I^k\}$ is a decreasing sequence
of compact sets whose diameters go to 0.
For part (ii), we fix $t$ and note that for each $k$ there exists at least one
quartic interval $I^k$ with $t \in I^k$. If $t$ is of the form $\ell/4^k$, where
$0 < \ell < 4^k$, then there are exactly two quartic intervals of the $k$-th generation that
contain $t$. Hence, the set of points for which the chain is not unique is
precisely the set of dyadic rationals
$$\frac{\ell}{4^k}, \quad \text{where } 1 \le k, \text{ and } 0 < \ell < 4^k.$$
Note that of course, these fractions are the same as those of the form
$\ell'/2^{k'}$ with $0 < \ell' < 2^{k'}$. This set is countable, hence has measure 0.
The points where ambiguity occurs are precisely those where ak = 3 for
all sufficiently large k, or equivalently where ak = 0 for all sufficiently
large k.
Part of our description of the Peano mapping will follow from associ-
ating to each quartic interval a dyadic square. These dyadic squares
are obtained by sub-dividing the unit square [0, 1] × [0, 1] in the plane by
successively bisecting the sides.
For instance, dyadic squares of the first generation arise from bisecting
the sides of the unit square. This yields four closed squares S1 , S2 , S3
and S4 , each of side length 1/2 and area |Si | = 1/4, for i = 1, . . . , 4.
The dyadic squares of the second generation are obtained by bisecting
each dyadic square of the first generation, and so on. In general, there
are $4^k$ squares of the $k$-th generation, each of side length $1/2^k$ and area
$1/4^k$.
A chain of dyadic squares is a decreasing sequence of squares
S1 ⊃ S2 ⊃ · · · ⊃ Sk ⊃ · · · ,
In this case, the set of ambiguities consists of all points (x1 , x2 ) where
one of the coordinates is a dyadic rational. Geometrically, this set is
the (countable) union of vertical and horizontal segments in [0, 1] × [0, 1]
determined by the grid of dyadic rationals. This set has measure zero.
where
$\bar{b}_k = (0, 0)$ if $b_k = 0$,
$\bar{b}_k = (0, 1)$ if $b_k = 1$,
$\bar{b}_k = (1, 0)$ if $b_k = 2$,
$\bar{b}_k = (1, 1)$ if $b_k = 3$.
Then m(E0 ) = 0.
for k even, and note that N2 corresponds to all sequences {ak } where
one of the following four exclusive alternatives holds for all sufficiently
large k: either ak is 0 or 1; or ak is 2 or 3; or ak is 0 or 2; or ak is 1
or 3. By similar reasoning the points Φ−1 (N2 ) and Φ(N1 ) form sets of
measure zero in I and I × I respectively.
We now turn to the proof that $\Phi^*$ (which is a bijection from $I - Z_1$
to $(I \times I) - Z_2$) is measure preserving. For this it is useful to recall
Theorem 1.4 in Chapter 1, whereby any open set $\mathcal{O}$ in the unit interval
$I$ can be realized as a countable union $\bigcup_{j=1}^\infty I_j$, where each $I_j$ is a
closed interval and the $I_j$ have disjoint interiors. Moreover, an examination of
the proof shows that the intervals can be taken to be dyadic, that is, of the
form $[\ell/2^j, (\ell + 1)/2^j]$, for appropriate integers $\ell$ and $j$. Further, such an
interval is itself a quartic interval if $j$ is even, $j = 2k$, or the union of two
quartic intervals $[(2\ell)/2^{2k}, (2\ell + 1)/2^{2k}]$ and
$[(2\ell + 1)/2^{2k}, (2\ell + 2)/2^{2k}]$, if $j$ is odd, $j = 2k - 1$. Thus any open set
in $I$ can be given as a union of quartic intervals whose interiors are disjoint.
Similarly, any open set in the square $I \times I$ is a union of dyadic squares whose
interiors are disjoint.
Now let $E$ be any set of measure zero in $I - Z_1$ and $\epsilon > 0$. Then we
can cover $E \subset \bigcup_j I_j$, where $I_j$ are quartic intervals and
$\sum_j m_1(I_j) < \epsilon$. Because $\Phi^*(E) \subset \bigcup_j \Phi^*(I_j)$, then
$$m_2(\Phi^*(E)) \le \sum_j m_2(\Phi^*(I_j)) = \sum_j m_1(I_j) < \epsilon.$$
Thus $\Phi^*(E)$ is measurable and $m_2(\Phi^*(E)) = 0$. Similarly, $(\Phi^*)^{-1}$ maps
sets of measure zero in $(I \times I) - Z_2$ to sets of measure zero in $I$.
Now the argument above also shows that if $\mathcal{O}$ is any open set in $I$,
then $\Phi^*(\mathcal{O} - Z_1)$ is measurable, and
$m_2(\Phi^*(\mathcal{O} - Z_1)) = m_1(\mathcal{O})$. Thus this identity goes over to
$G_\delta$ sets in $I$. Since any measurable set differs from a $G_\delta$ set by a set of
measure zero, we see that we have established that $m_2(\Phi^*(E)) = m_1(E)$ for any
measurable subset $E$ of $I - Z_1$. The same argument can be applied to $(\Phi^*)^{-1}$,
and this completes the proof of the theorem.
The Peano mapping will be obtained as Φ∗ for a special correspon-
dence Φ.
Now suppose $\Phi$ has been defined for all quartic intervals of generation
less than or equal to $k$. We now write the intervals in generation $k$ in
increasing order as $I_1, \ldots, I_{4^k}$, and let $S_j = \Phi(I_j)$. We then divide $I_1$
into four quartic intervals of generation $k + 1$ and denote them by $I_{1,1}$,
$I_{1,2}$, $I_{1,3}$, and $I_{1,4}$, where the intervals are chosen in increasing order.
Then, we assign to each interval $I_{1,j}$ a dyadic square $\Phi(I_{1,j}) = S_{1,j}$ of
generation $k + 1$ contained in $S_1$ so that:
(a) $S_{1,1}$ is the lower-left sub-square of $S_1$,
(b) $S_{1,4}$ touches the side that $S_1$ shares with $S_2$,
(c) $S_{1,1}$, $S_{1,2}$, $S_{1,3}$, and $S_{1,4}$ is a traverse.
This is possible, since the induction hypothesis guarantees that $S_2$ is
adjacent to $S_1$.
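The key feature of such a correspondence is that consecutive quartic intervals are sent to adjacent dyadic squares. The sketch below illustrates this with the standard index-to-cell routine for the Hilbert curve, which is swapped in here as a convenient stand-in: it is an assumption that it matches the inductive construction above only in spirit, since its orientation at a given stage may differ from that of $\Phi$. The function name is hypothetical; squares are labeled by the integer coordinates of their lower-left corners on an $n \times n$ grid, with $n = 2^k$.

```python
def hilbert_cell(n, d):
    """Cell (x, y) visited at step d (0 <= d < n*n) by the Hilbert ordering of an n-by-n grid.

    n must be a power of 2. This is the usual iterative index-to-coordinate conversion.
    """
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:            # rotate or reflect the current sub-square
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

first_generation = [hilbert_cell(2, d) for d in range(4)]
third_generation = [hilbert_cell(8, d) for d in range(64)]
print(first_generation)
```

For the first generation this visits lower-left, upper-left, upper-right, lower-right in that order, and in every generation each square shares a side with the next one while every square is visited exactly once.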
$$t_j = \frac{j - \frac{1}{2}}{4^k} \quad \text{for } j = 1, \ldots, 4^k.$$
Pk (tj ) = xj .
Also set
and
Note that the distance $|x_j - x_{j+1}| = 1/2^k$, while $|t_j - t_{j+1}| = 1/4^k$ for
$0 \le j \le 4^k$. Also
$$|x_1 - x_0| = |x_{4^k} - x_{4^k+1}| = \frac{1}{2 \cdot 2^k},$$
while
$$|t_1 - t_0| = |t_{4^k} - t_{4^k+1}| = \frac{1}{2 \cdot 4^k}.$$
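However,
$$|\mathcal{P}_{k+1}(t) - \mathcal{P}_k(t)| \le \sqrt{2}\, 2^{-k},$$
because when $\ell/4^k \le t \le (\ell + 1)/4^k$, then $\mathcal{P}_{k+1}(t)$ and
$\mathcal{P}_k(t)$ belong to the same dyadic square of generation $k$.
Therefore the limit
$$\mathcal{P}(t) = \lim_{k \to \infty} \mathcal{P}_k(t) = \mathcal{P}_1(t) + \sum_{j=1}^\infty \left( \mathcal{P}_{j+1}(t) - \mathcal{P}_j(t) \right)$$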
$k$. Thus $\Phi(I^k)$ and $\Phi(J^k)$ must be adjacent squares for all sufficiently
large $k$. Hence
$$\bigcap_k \Phi(I^k) = \bigcap_k \Phi(J^k).$$
This proves conclusion (iii) of Theorem 3.1. The other conclusions hav-
ing already been established, we need only note that the corollary is
contained in Theorem 3.5.
As a result, we conclude that $t \mapsto P(t)$ also induces a measure
preserving mapping from [0, 1] to [0, 1] × [0, 1]. This concludes the proof of
Theorem 3.1.
1 Note that there are two planes perpendicular to γ and of distance |t| from the origin;
this accounts for the fact that t may be either positive or negative.
4*. Besicovitch sets and regularity 361
Et,γ = E ∩ Pt,γ .
[Figure: the planes $P_{t_1,\gamma}$ and $P_{t_2,\gamma}$, and the slice $E_{t_1,\gamma}$.]
We observe that for almost every t the set $E_{t,\gamma}$ is $m_{d-1}$ measurable
and, moreover, $m_{d-1}(E_{t,\gamma})$ is a measurable function of t. This is a
direct consequence of Fubini’s theorem and the above decomposition,
$\mathbb{R}^d = \mathbb{R}^{d-1} \times \mathbb{R}$. In fact, so long as the direction γ is pre-assigned, not
much more can be said in general about the function $t \mapsto m_{d-1}(E_{t,\gamma})$.
A significant part of (i) is that for a.e. γ, the slice Et,γ is measurable
for all values of the parameter t. In particular, one has the following.
Corollary 4.2 Suppose E is a set of measure zero in Rd with d ≥ 3.
Then, for almost every γ ∈ S d−1 , the slice Et,γ has zero measure for all
t ∈ R.
The fact that there is no analogue of this when d = 2 is a consequence of
the existence of a Besicovitch set, (also called a “Kakeya set”), which is
defined as a set that satisfies the three conditions in the theorem below.
Theorem 4.3 There exists a set B in R2 that:
(i) is compact,
(ii) has Lebesgue measure zero,
(iii) contains a translate of every unit line segment.
Note that with F = B and γ ∈ S 1 one has m1 (F ∩ Pt0 ,γ ) ≥ 1 for some t0 .
If m1 (F ∩ Pt,γ ) were continuous in t, then this measure would be strictly
positive for an interval in t containing t0 , and thus we would have
m2 (F ) > 0, by Fubini’s theorem. This contradiction shows that the ana-
logue of Theorem 4.1 cannot hold for d = 2.
While the set B has zero two-dimensional measure, this assertion can-
not be improved by replacing this measure by α-dimensional Hausdorff
measure, with α < 2.
Theorem 4.4 Suppose F is any set that satisfies the conclusions (i)
and (iii) of Theorem 4.3. Then F has Hausdorff dimension 2.
The integration is performed over the plane $P_{t,\gamma}$ with respect to the
measure $m_{d-1}$ discussed above. We first make the following simple
observations:
1. If f is continuous and has compact support, then f is of course
integrable on every plane Pt,γ , and so R(f )(t, γ) is defined for all
(t, γ) ∈ R × S d−1 . Moreover it is a continuous function of the pair
(t, γ) and has compact support in the t-variable.
2. If f is merely Lebesgue integrable, then f may fail to be measurable
or integrable on Pt,γ for some (t, γ), and thus R(f )(t, γ) is not
defined for those (t, γ).
3. Suppose f is the characteristic function of the set E, that is, f =
χE . Then R(f )(t, γ) = md−1 (Et,γ ) if Et,γ is measurable.
It is this last property that links the Radon transform to our problem.
Key estimates in this conclusion involve a maximal “Radon transform”
defined by
\[ R^*(f)(\gamma) = \sup_{t \in \mathbb{R}} |R(f)(t, \gamma)|, \]
The proof of the theorem relies on the interplay between the Radon
transform and the Fourier transform.
For fixed $\gamma \in S^{d-1}$, we let $\hat{R}(f)(\lambda, \gamma)$ denote the Fourier transform of
$R(f)(t, \gamma)$ in the t-variable
\[ \hat{R}(f)(\lambda, \gamma) = \int_{-\infty}^{\infty} R(f)(t, \gamma)\, e^{-2\pi i \lambda t}\, dt. \]
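Proof. For each unit vector γ we use the adapted coordinate system
described above: $x = (x_1, \ldots, x_d)$ where γ coincides with the $x_d$ direction.
We can then write each $x \in \mathbb{R}^d$ as $x = (u, t)$ with $u \in \mathbb{R}^{d-1}$, $t \in \mathbb{R}$,
where $x \cdot \gamma = t = x_d$ and $u = (x_1, \ldots, x_{d-1})$. Moreover
\[ \int_{P_{t,\gamma}} f = \int_{\mathbb{R}^{d-1}} f(u, t)\, du, \]
and Fubini’s theorem shows that $\int_{\mathbb{R}^d} f(x)\, dx = \int_{-\infty}^{\infty} \bigl( \int_{P_{t,\gamma}} f \bigr)\, dt$.
Applying this to $f(x) e^{-2\pi i x \cdot (\lambda\gamma)}$ in place of $f(x)$ gives
\[
\hat{f}(\lambda\gamma) = \int_{\mathbb{R}^d} f(x)\, e^{-2\pi i x \cdot (\lambda\gamma)}\, dx
 = \int_{-\infty}^{\infty} \left( \int_{\mathbb{R}^{d-1}} f(u, t)\, du \right) e^{-2\pi i \lambda t}\, dt
 = \int_{-\infty}^{\infty} \left( \int_{P_{t,\gamma}} f \right) e^{-2\pi i \lambda t}\, dt.
\]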
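The identity between the Fourier transform of the Radon transform in t and the restriction of $\hat f$ to the line through γ can be tested numerically in a simple case. The sketch below is only an illustration under our own assumptions: d = 2, f is the Gaussian $e^{-\pi |x|^2}$ (whose Fourier transform, with the convention above, is $e^{-\pi |\xi|^2}$), integrals are replaced by Riemann sums on a fixed grid, and all names are ours. Since this f is radial, the result does not depend on γ.

```python
import math


def f(u, t):
    """Gaussian on R^2 in the adapted coordinates x = (u, t), t = x . gamma."""
    return math.exp(-math.pi * (u * u + t * t))


H = 0.05                                   # grid step (an assumption)
GRID = [H * n for n in range(-100, 101)]   # sample points in [-5, 5]


def radon(t):
    """R(f)(t, gamma): integral of f over the line {x . gamma = t}."""
    return H * sum(f(u, t) for u in GRID)


def radon_hat(lam):
    """Fourier transform in t of R(f)(t, gamma), by a Riemann sum.
    R(f) is even in t here, so only the cosine part contributes."""
    return H * sum(radon(t) * math.cos(2 * math.pi * lam * t) for t in GRID)


def f_hat(lam):
    """Exact Fourier transform of the Gaussian at xi = lam * gamma."""
    return math.exp(-math.pi * lam * lam)


if __name__ == "__main__":
    for lam in (0.0, 0.5, 1.0, 1.5):
        print(lam, radon_hat(lam), f_hat(lam))
```

For a Gaussian the Riemann sum is extremely accurate, so the two printed columns should agree to many digits; for a general f a finer grid or a proper quadrature rule would be needed.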
Let us observe the crucial point that the greater the dimension d, the
larger the factor $|\lambda|^{d-1}$ as |λ| tends to infinity. Hence the greater the
dimension, the better the decay of the Fourier transform $\hat{R}(f)(\lambda, \gamma)$,
and so the better the regularity of the Radon transform $R(f)(t, \gamma)$ as a
function of t.
Proof. The Plancherel formula in Chapter 5 guarantees that
\[ 2 \int_{\mathbb{R}^d} |f(x)|^2\, dx = 2 \int_{\mathbb{R}^d} |\hat{f}(\xi)|^2\, d\xi. \]
and the proof is complete once we invoke the result of Lemma 4.7.
The final ingredient in the proof of Theorem 4.5 consists of the follow-
ing:
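where
\[ \sup_{\lambda \in \mathbb{R}} |\hat{F}(\lambda)| \le A \quad\text{and}\quad \int_{-\infty}^{\infty} |\hat{F}(\lambda)|^2 |\lambda|^{d-1}\, d\lambda \le B^2. \]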
Then
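Clearly, the first integral is bounded by cA. To estimate the second integral
it suffices to bound $\int_{|\lambda|>1} |\hat{F}(\lambda)|\, d\lambda$. An application of the Cauchy-Schwarz
inequality gives
\[
\int_{|\lambda|>1} |\hat{F}(\lambda)|\, d\lambda \le
 \left( \int_{|\lambda|>1} |\hat{F}(\lambda)|^2 |\lambda|^{d-1}\, d\lambda \right)^{1/2}
 \left( \int_{|\lambda|>1} |\lambda|^{-d+1}\, d\lambda \right)^{1/2}.
\]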
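The second factor on the right is where the hypothesis d ≥ 3 enters. As a short worked computation (ours, using the constant B from the hypotheses above):

```latex
\[
\int_{|\lambda|>1} |\lambda|^{-d+1}\, d\lambda
  = 2 \int_{1}^{\infty} \lambda^{-(d-1)}\, d\lambda
  = \frac{2}{d-2} < \infty
  \quad\text{when } d \ge 3,
\]
\[
\text{so that}\quad
\int_{|\lambda|>1} |\hat{F}(\lambda)|\, d\lambda \le \Bigl( \frac{2}{d-2} \Bigr)^{1/2} B .
\]
```

When d = 2 the integral of $|\lambda|^{-1}$ over |λ| > 1 diverges logarithmically, which is consistent with the failure of the estimate in the plane.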
Since one has the inequality² $|e^{ix} - 1| \le |x|$, we immediately see that
We may then write the difference $F(t_1) - F(t_2)$ as a sum of two integrals.
The integral over |λ| ≤ 1 is clearly bounded by $cA|t_1 - t_2|^\alpha$. The
second integral, the one over |λ| > 1, can be estimated from above by
\[ |t_1 - t_2|^\alpha \int_{|\lambda|>1} |\hat{F}(\lambda)|\, |\lambda|^\alpha\, d\lambda. \]
We now gather these results to prove the theorem. For each γ ∈ S d−1
let
F (t) = R(f )(t, γ).
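Let
\[ A(\gamma) = \sup_{\lambda} |\hat{F}(\lambda)| \quad\text{and}\quad B^2(\gamma) = \int_{-\infty}^{\infty} |\hat{F}(\lambda)|^2 |\lambda|^{d-1}\, d\lambda. \]
Then by (4)
\[ \sup_{t \in \mathbb{R}} |F(t)| \le c\,(A(\gamma) + B(\gamma)). \]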
2 The distance in the plane from the point eix to the point 1 is shorter than the length
Therefore,
and thus
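\[ \int_{S^{d-1}} |R^*(f)(\gamma)|^2\, d\sigma(\gamma) \le c\,\bigl( \|f\|^2_{L^1(\mathbb{R}^d)} + \|f\|^2_{L^2(\mathbb{R}^d)} \bigr), \]
since $\int B^2(\gamma)\, d\sigma(\gamma) = 2\|f\|^2_{L^2}$ by Lemma 4.8. Consequently,
\[ \int_{S^{d-1}} R^*(f)(\gamma)\, d\sigma(\gamma) \le c\,\bigl( \|f\|_{L^1(\mathbb{R}^d)} + \|f\|_{L^2(\mathbb{R}^d)} \bigr). \]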
with F (t) = R(f )(t, γ), is justified for almost every γ ∈ S d−1 by the
Fourier inversion result in Theorem 4.2 of Chapter 2. Indeed, we have
seen that A(γ) and B(γ) are finite for almost every γ, and thus F̂ is
integrable for those γ. This completes the proof of the theorem. The
corollary follows the same way if we use (5) instead of (4).
We now return to the situation in the plane to see what information
we may deduce from the above analysis. The inequality (2) as it stands
does not hold when d = 2. However, a modification of it does hold, and
this will be used in the proof of Theorem 4.4.
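If $f \in L^1(\mathbb{R}^d)$ we define
\[
R_\delta(f)(t, \gamma) = \frac{1}{2\delta} \int_{t-\delta}^{t+\delta} R(f)(s, \gamma)\, ds
 = \frac{1}{2\delta} \int_{t-\delta \le x \cdot \gamma \le t+\delta} f(x)\, dx.
\]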
The same argument as in the proof of Theorem 4.5 applies here, except
that we need a modified version of Lemma 4.9. More precisely, let us set
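\[
F_\delta(t) = \int_{-\infty}^{\infty} \hat{F}(\lambda)
 \left( \frac{e^{2\pi i (t+\delta)\lambda} - e^{2\pi i (t-\delta)\lambda}}{2\pi i \lambda (2\delta)} \right) d\lambda,
\]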
Indeed, we use the fact that $|(\sin x)/x| \le 1$ to see that, in the definition
of $F_\delta(t)$, the integral over |λ| ≤ 1 gives the term cA. Also, the integral over
|λ| > 1 can be split and is bounded by the sum
\[
\int_{1<|\lambda|\le 1/\delta} |\hat{F}(\lambda)|\, d\lambda
 + \frac{c}{\delta} \int_{1/\delta \le |\lambda|} |\hat{F}(\lambda)|\, |\lambda|^{-1}\, d\lambda.
\]
≤ cB(log 1/δ)^{1/2} .
≤ cB
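The two bounds just quoted correspond to the two integrals in the sum. As a sketch of how they can be obtained when d = 2 and 0 < δ < 1 (so that $B^2$ bounds $\int |\hat{F}(\lambda)|^2 |\lambda|\, d\lambda$), one may apply the Cauchy-Schwarz inequality to each integral:

```latex
\[
\int_{1<|\lambda|\le 1/\delta} |\hat{F}(\lambda)|\, d\lambda
 \le \Bigl( \int |\hat{F}(\lambda)|^2 |\lambda|\, d\lambda \Bigr)^{1/2}
     \Bigl( \int_{1<|\lambda|\le 1/\delta} |\lambda|^{-1}\, d\lambda \Bigr)^{1/2}
 \le B\, \bigl( 2 \log(1/\delta) \bigr)^{1/2},
\]
\[
\frac{c}{\delta} \int_{1/\delta \le |\lambda|} |\hat{F}(\lambda)|\, |\lambda|^{-1}\, d\lambda
 \le \frac{c}{\delta}\, B\,
     \Bigl( \int_{1/\delta \le |\lambda|} |\lambda|^{-3}\, d\lambda \Bigr)^{1/2}
 = \frac{c}{\delta}\, B\, (\delta^{2})^{1/2} = cB .
\]
```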
supported in a fixed compact set, and so that fn (x) → f (x) a.e. By the
bounded convergence theorem, kfn − f kL1 and kfn − f kL2 both tend to
zero as n → ∞, and upon selecting a subsequence if necessary, we can
suppose that kfn − f kL1 + kfn − f kL2 ≤ 2−n . By what we have just
proved in Step 2 we have, for a.e. γ ∈ S d−1 , that fn (x) → f (x) on Pt,γ
a.e. with respect to the measure md−1 , for each t ∈ R. Thus again by the
bounded convergence theorem for those (t, γ), we see that R(fn )(t, γ) →
R(f )(t, γ), and this limit defines R(f ). Now applying Theorem 4.5 to
fn − fn−1 gives
\[ \sum_{n=1}^{\infty} \int_{S^{d-1}} R^*(f_n - f_{n-1})(\gamma)\, d\sigma(\gamma) \le c \sum_{n=1}^{\infty} 2^{-n} < \infty. \]
for a.e. $\gamma \in S^{d-1}$, and hence for those γ the sequence of functions $R(f_n)(t, \gamma)$
converges uniformly. As a consequence, for those γ the function $R(f)(t, \gamma)$
is continuous in t, and the inequality (2) is valid for this f. The inequality
(3) is deduced in the same way.
Finally, we deal with the general f in L1 ∩ L2 by approximating it by
a sequence of bounded functions each with bounded support. The details
of the argument are similar to the case treated above and are left to the
reader.
Observe that the special case f = χE of the proposition gives us The-
orem 4.1.
This inequality was proved under the assumption that f was continuous
and had compact support. In the present situation it goes over without
difficulty to the general case where f ∈ L1 ∩ L2 , by an easy limiting
argument, since it is clear that R∗δ (fn )(γ) converges to R∗δ (f )(γ) for all
γ if fn → f in the L1 -norm.
Now suppose F is a Besicovitch set and α is fixed with 0 < α < 2.
Assume that $F \subset \bigcup_{i=1}^{\infty} B_i$ is a covering, where $B_i$ are balls with diameter
less than a given number. We must show that
\[ \sum_i (\operatorname{diam} B_i)^\alpha \ge c_\alpha > 0. \]
m(F ∗ ) ≤ cN δ 2 .
then we obtain
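Let
\[ F_k = F \cap \Bigl( \bigcup_{2^{-k-1} \le \operatorname{diam} B_i \le 2^{-k}} B_i \Bigr), \]
and let
\[ F_k^* = \bigcup_{2^{-k-1} \le \operatorname{diam} B_i \le 2^{-k}} B_i^*, \]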
\[ m_1(s_\gamma \cap F_k) \ge a_k. \]
Otherwise, since $F = \bigcup F_k$, we would have
\[ m_1(s_\gamma \cap F) < \sum a_k = 1, \]
because any point of distance less than $2^{-k}$ from $F_k$ must belong to $F_k^*$.
Since the choice of k may depend on γ, we let
\[ m(E_{k'}) \ge 2\pi a_{k'}, \]
for otherwise $m(S^1) < 2\pi \sum a_k = 2\pi$. As a result
Recalling that by our choice $a_k \approx 2^{-\epsilon k}$, and noting that
$\|\chi_{F^*_{k'}}\|_{L^2} \le c N_{k'}^{1/2} 2^{-k'}$, we obtain
\[ 2^{(1-2\epsilon)k'} \le c\, (\log 2^{k'})^{1/2} N_{k'}^{1/2}. \]
Finally, this last inequality guarantees that $N_{k'} 2^{-\alpha k'} \ge c_\alpha$ as long as
$4\epsilon < 2 - \alpha$.
This concludes the proof of the theorem.
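For completeness, here is one way to see the last step (a sketch; the constant c is generic and may change from line to line). Squaring the inequality and using $\log 2^{k'} = k' \log 2$:

```latex
\[
N_{k'} \ge c\, \frac{2^{2(1-2\epsilon)k'}}{k'}
\quad\Longrightarrow\quad
N_{k'}\, 2^{-\alpha k'} \ge c\, \frac{2^{(2-\alpha-4\epsilon)k'}}{k'} .
\]
\[
\text{If } 4\epsilon < 2-\alpha, \text{ then }
\inf_{k' \ge 1} \frac{2^{(2-\alpha-4\epsilon)k'}}{k'} > 0,
\text{ hence } N_{k'}\, 2^{-\alpha k'} \ge c_\alpha > 0 .
\]
```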
Once the theorem is proved, our job is done. Indeed, a finite union of
rotations of the set F contains unit segments of any slope, and that set
is therefore a Besicovitch set.
The proof of the required properties of the set F amounts to showing
the following paradoxical facts about the set C + λC, for λ > 0. Here
C + λC = {x1 + λx2 : x1 ∈ C, x2 ∈ C}:
• C + λC has one-dimensional measure zero, for a.e. λ.
• $C + \tfrac{1}{2} C$ is the interval [0, 3/2].
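The second assertion can be checked at each finite stage of the construction. The sketch below assumes, in line with the identity C = C/4 ∪ (C/4 + 3/4) used later in the text, that C consists of the points $\sum a_k 4^{-k}$ with each $a_k$ equal to 0 or 3; the function names are ours. Working in integer units of $1/(2 \cdot 4^\ell)$, it verifies that the sums of the pairs of generation-ℓ intervals cover [0, 3/2] without gaps.

```python
from itertools import product


def left_endpoints(level):
    """Left endpoints of the 2**level intervals of generation `level`,
    in integer units of 4**(-level); the base-4 digits are 0 or 3."""
    return [
        sum(a * 4 ** (level - 1 - i) for i, a in enumerate(digits))
        for digits in product((0, 3), repeat=level)
    ]


def sumset_covers(level):
    """Check that C_level + (1/2) C_level equals [0, 3/2], where C_level
    is the union of the generation-`level` intervals, each of length
    4**(-level).  All quantities are in units of 1 / (2 * 4**level)."""
    ends = left_endpoints(level)
    # p + q/2 becomes 2p + q in the new units; each sum of two intervals
    # has length 2 + 1 = 3 in these units.
    starts = sorted({2 * p + q for p in ends for q in ends})
    length = 3
    covered_to = 0
    for s in starts:
        if s > covered_to:      # a gap would appear before this interval
            return False
        covered_to = max(covered_to, s + length)
    return starts[0] == 0 and covered_to == 3 * 4 ** level


if __name__ == "__main__":
    print([sumset_covers(level) for level in range(1, 6)])
```

This only treats the finite stages; passing to the limit set C still requires a compactness argument, which the code does not replace.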
Let us see how these two assertions imply the theorem. First, we note
that the set F is closed (and hence compact), because both E0 and E1
are closed. Next observe that with 0 < y < 1, the slice $F^y$ of the set
F is exactly $(1 - y)C + \tfrac{y}{2} C$. This set is obtained from the set C + λC,
where λ = y/(2(1 − y)), by scaling with the factor 1 − y. Hence $F^y$ is of
measure zero whenever C + λC is also of measure zero. Moreover, under
the mapping $y \mapsto \lambda$, sets of measure zero in (0, ∞) correspond to sets of
measure zero in (0, 1). (For this see, for example, Exercise 8 in Chapter 1,
or Problem 1 in Chapter 6.) Therefore, the first assertion and Fubini’s
theorem prove that the (two-dimensional) measure of F is zero.
Finally the slope s of the segment joining the point (x0 , 0), with the
point (x1 , 1) is s = 1/(x1 − x0 ). Thus the quantity s can be realized if
K(λ) = C + λC,
and we shall sometimes omit the λ and write K(λ) = K, when this causes
no confusion. By its definition we have
K = K1 ∪ K2 ∪ K3 ∪ K4 ,
where each $K_i^\ell$ equals $C^\ell_{j_1} + \lambda C^\ell_{j_2}$ for a pair of indices $j_1, j_2$. In fact,
this relation among the indices sets up a bijection between the i with
$1 \le i \le 4^\ell$, and the pair $j_1, j_2$ with $1 \le j_1 \le 2^\ell$ and $1 \le j_2 \le 2^\ell$. Note
that each $K_i^\ell$ is a translate of $K_1^\ell$, and each $K_i^\ell$ is also obtained from K by
a similarity of ratio $4^{-\ell}$. Now note that $C = C/4 \cup (C/4 + 3/4)$ implies
that
\[
K(\lambda) = C + \lambda C
 = \Bigl( C + \frac{\lambda}{4} C \Bigr) \cup \Bigl( C + \frac{\lambda}{4} C + \frac{3\lambda}{4} \Bigr)
 = K(\lambda/4) \cup \Bigl( K(\lambda/4) + \frac{3\lambda}{4} \Bigr).
\]
Thus K(λ) has measure zero if and only if K(λ/4) has measure zero.
Hence it suffices to prove that K(λ) has measure zero for a.e. λ ∈ [1, 4].
After these preliminaries let us observe that we immediately obtain
that m(K(λ)) = 0 for some special λ’s, those for which the following
coincidence takes place: for some ℓ and a pair i and i′ with i ≠ i′,
\[ m(K(\lambda)) \le \sum_{i=1,\ i \ne i'}^{4^\ell} m(K_i^\ell(\lambda)) = (4^\ell - 1)\, 4^{-\ell}\, m(K(\lambda)), \]
(10) $a + \lambda_0 b = a' + \lambda_0 b'$.
Note that the fact that $\nu_1 \ne \nu_2$ means that $|b - b'| \ge 1/2$. Next, looking
at the ℓth generation we find via (7) that there are indices
$1 \le j_1, j_2, j_1', j_2' \le 2^\ell$, so that $a \in C^\ell_{j_1} \subset C_{\mu_1}$, $b \in C^\ell_{j_2} \subset C_{\nu_1}$,
$a' \in C^\ell_{j_1'} \subset C_{\mu_2}$, $b' \in C^\ell_{j_2'} \subset C_{\nu_2}$. We also observe that the above sets are
translates of each other, that is, $C^\ell_{j_1} = C^\ell_{j_1'} + \tau_1$ and $C^\ell_{j_2} = C^\ell_{j_2'} + \tau_2$,
with $|\tau_k| \le 1$. Hence if
i and i′ correspond to the pairs $(j_1, j_2)$ and $(j_1', j_2')$, respectively, we have
Now let (A, B) be the pair that corresponds to (a′, b′) under the above
translations, namely
(12) $A = a' + \tau_1$, $B = b' + \tau_2$.
(13) $A + \lambda B = a' + \lambda b'$.
In fact, by (12) we have put B in $C^\ell_{j_2} \subset C_{\nu_1}$, while b′ is in $C^\ell_{j_2'} \subset C_{\nu_2}$. Thus
$|B - b'| \ge 1/2$, since $\nu_1 \ne \nu_2$. We can therefore solve (13) by taking
$\lambda = (A - a')/(b' - B)$. Now we compare this with (10), and get
$\lambda_0 = (a - a')/(b' - b)$. Moreover, $|A - a| \le 4^{-\ell}$ and $|B - b| \le 4^{-\ell}$, since A
and a both lie in $C^\ell_{j_1}$, and B and b lie in $C^\ell_{j_2}$. This yields the inequality
(14) $|\lambda - \lambda_0| \le c\, 4^{-\ell}$.
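For the reader's convenience, here is a sketch of how (14) follows. It assumes, as holds for points of C ⊂ [0, 1], that |a − a′| ≤ 1 and |b′ − b| ≤ 1, and it uses the lower bounds |b′ − B| ≥ 1/2 and |b′ − b| ≥ 1/2 noted above. Subtracting the two quotients and regrouping the numerator:

```latex
\[
\lambda - \lambda_0
 = \frac{A - a'}{b' - B} - \frac{a - a'}{b' - b}
 = \frac{(A - a)(b' - b) + (a - a')(B - b)}{(b' - B)(b' - b)} ,
\]
\[
|\lambda - \lambda_0|
 \le \frac{|A - a|\,|b' - b| + |a - a'|\,|B - b|}{|b' - B|\,|b' - b|}
 \le \frac{4^{-\ell} + 4^{-\ell}}{\tfrac12 \cdot \tfrac12}
 = 8 \cdot 4^{-\ell} .
\]
```

Under these assumptions the constant c in (14) may be taken to be 8.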
Also, (12) and (13) clearly imply $\tau(\lambda) = \tau_1 + \lambda \tau_2 = 0$, and this together
with (11) proves the coincidence.
Therefore our proposition is proved under the restriction we made
earlier that $\nu_1 \ne \nu_2$. The situation where instead $\mu_1 \ne \mu_2$ is obtained
from the case $\nu_1 \ne \nu_2$ if we replace $\lambda_0$ by $\lambda_0^{-1}$. Note that
$K_i^\ell(\lambda_0) = K_{i'}^\ell(\lambda_0)$ if and only if $C^\ell_{j_1} + \lambda_0 C^\ell_{j_2} = C^\ell_{j_1'} + \lambda_0 C^\ell_{j_2'}$ and this is the same as
$C^\ell_{j_2} + \lambda_0^{-1} C^\ell_{j_1} = C^\ell_{j_2'} + \lambda_0^{-1} C^\ell_{j_1'}$. This allows us to reduce to the case
$\mu_1 \ne \mu_2$, since $C^\ell_{j_1} \subset C_{\mu_1}$ and $C^\ell_{j_1'} \subset C_{\mu_2}$. Here the fact that $1 \le \lambda_0 \le 4$ gives
$\lambda_0^{-1} \le 1$ and guarantees that the constant c in (9) can be taken to be
independent of $\lambda_0$. The proposition is therefore established.
Note that as a consequence, the following holds near the points $\bar\lambda$ where
the coincidence (9) takes place: If $|\lambda - \bar\lambda| \le \epsilon 4^{-\ell}$, then
Indeed, for fixed ε > 0, let $\Lambda_\epsilon$ denote the set of λ that satisfies (15) for
some ℓ, i and i′. For any interval I of length not exceeding 1, we have
because of (9) and (15). Thus $\Lambda_\epsilon^c$ has no points of Lebesgue density,
hence $\Lambda_\epsilon^c$ has measure zero, and thus $\Lambda_\epsilon$ is a set of full measure. (See
Corollary 1.5 in Chapter 3.) Since $\Lambda = \bigcap_\epsilon \Lambda_\epsilon$, and $\Lambda_\epsilon$ decreases with ε,
we see that Λ also has full measure and our assertion is proved.
Finally, our theorem will be established once we show that m(K(λ)) =
0 whenever λ ∈ Λ. To prove this, we assume contrariwise that m(K(λ)) >
0. Using again the point of density argument, there must be for any
3 The terminology that Λ has “full measure” means that its complement has measure
zero.
ing intervals $I_i$ and $I_{i'}$, respectively, with $m(I_i) = m(I_{i'}) = 4^{-\ell} m(I)$.
Moreover,
Also, as in (15), $I_{i'} = I_i + \tau(\lambda)$, with $|\tau(\lambda)| \le \epsilon 4^{-\ell}$. This shows that
So $m(K_i^\ell \cap I_i \cap I_{i'}) > \tfrac{1}{2} m(I_i \cap I_{i'})$ and the same holds for i′ in place of i.
Hence $m(K_i^\ell \cap K_{i'}^\ell) > 0$, and this contradicts the decomposition (8) and
the fact that $m(K_i^\ell) = 4^{-\ell} m(K)$ for every i. Therefore we obtain that
m(K(λ)) = 0 for every λ ∈ Λ, and the proof of Theorem 4.12 is now
complete.
5 Exercises
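[Hint: Suppose $E_1 \cap E_2 = \{x\}$, let $B_\epsilon$ denote the open ball centered at x and of
diameter ε, and let $E^\epsilon = E \cap B_\epsilon^c$. Show that
\[ m_\alpha^*(E^\epsilon) \ge \mathcal{H}_\alpha^\epsilon(E) \ge m_\alpha^*(E) - \mu(\epsilon) - \epsilon^\alpha, \]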
6. Let {Ek } be a sequence of Borel sets in Rd . Show that if dim Ek ≤ α for some
α and all k, then
\[ \dim \bigcup_k E_k \le \alpha. \]
7. Prove that the (log 2/ log 3)-Hausdorff measure of the Cantor set is precisely
equal to 1.
[Hint: Suppose we have a covering of C by finitely many closed intervals $\{I_j\}$.
Then there exists another covering of C by intervals $\{I'_\ell\}$ each of length $3^{-k}$ for
some k, such that $\sum_j |I_j|^\alpha \ge \sum_\ell |I'_\ell|^\alpha \ge 1$, where α = log 2/ log 3.]
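The role of the exponent α = log 2/ log 3 in this hint can be seen numerically: the natural covering of C by the $2^k$ intervals of length $3^{-k}$ has α-sum equal to 1 at every generation, while a smaller exponent makes the sum grow and a larger one makes it shrink. A minimal sketch (the function name is ours; the arithmetic is floating point, so equality holds only up to rounding):

```python
import math

ALPHA = math.log(2) / math.log(3)   # the exponent in Exercise 7


def alpha_sum(k, alpha=ALPHA):
    """Sum of |I|**alpha over the 2**k intervals of length 3**(-k)
    that make up the k-th stage of the Cantor construction."""
    return (2 ** k) * (3.0 ** (-k)) ** alpha


if __name__ == "__main__":
    for k in (1, 5, 10, 20):
        print(k, alpha_sum(k), alpha_sum(k, 0.5), alpha_sum(k, 1.0))
```

This illustrates only the upper bound for the measure; the lower bound asserted in the exercise requires the covering argument of the hint.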
9. Consider the set Cξ1 × Cξ2 in R2 , with Cξ as in the previous exercise. Show that
Cξ1 × Cξ2 has strict Hausdorff dimension dim(Cξ1 ) + dim(Cξ2 ).
10. Construct a Cantor-like set (as in Exercise 4, Chapter 1) that has Lebesgue
measure zero, yet Hausdorff dimension 1.
[Hint: Choose $\ell_1, \ell_2, \ldots, \ell_k, \ldots$ so that $1 - \sum_{j=1}^{k} 2^{j-1} \ell_j$ tends to zero sufficiently
slowly as k → ∞.]
(a) Show that for any real number λ, the set Cξ + λCξ is similar to the projection
of D on the line in R2 with slope λ = tan θ.
(b) Note that among the Cantor sets Cξ , the value ξ = 1/2 is critical in the
construction of the Besicovitch set in Section 4.4. In fact, prove that with
ξ > 1/2, then Cξ + λCξ has Lebesgue measure zero for every λ. See also
Problem 10 below.
[Hint: mα (Cξ + λCξ ) < ∞ for α = dim Dµ .]
13. Consider the von Koch curve $K^\ell$, 1/4 < ℓ < 1/2, as defined in Section 2.1.
Prove for it the analogue of Theorem 2.7: the function $t \mapsto K^\ell(t)$ satisfies a
Lipschitz condition of exponent γ = log(1/ℓ)/ log 4. Moreover, show that the set $K^\ell$
has strict Hausdorff dimension α = 1/γ.
[Hint: Show that if O is the shaded open triangle indicated in Figure 14, then
$O \supset S_0(O) \cup S_1(O) \cup S_2(O) \cup S_3(O)$, where $S_0(x) = \ell x$, $S_1(x) = \rho_\theta(\ell x) + a$,
$S_2(x) = \rho_\theta^{-1}(\ell x) + c$, and $S_3(x) = \ell x + b$, with $\rho_\theta$ the rotation of angle θ. Note that the
sets $S_j(O)$ are disjoint.]
14. Show that if ℓ < 1/2, the von Koch curve $t \mapsto K^\ell(t)$ in Exercise 13 is a simple
curve.
[Hint: Observe that if $t = \sum_{j=1}^{\infty} a_j/4^j$, with $a_j$ = 0, 1, 2, or 3, then
\[ \{K(t)\} = \bigcap_{j=1}^{\infty} S_{a_j}\bigl( \cdots S_{a_2}\bigl( S_{a_1}(O) \bigr) \bigr). ] \]
15. Note that if we take ℓ = 1/2 in the definition of the von Koch curve in
Exercise 13 we get a “space-filling” curve, one that fills the right triangle whose
vertices are (0, 0), (1, 0), and (1/2, 1/2). The first three steps of the construction
are as in Figure 15, with the intervals traced out in the indicated order.
Figure 15. The first three steps of the von Koch curve when ℓ = 1/2
16. Prove that the von Koch curve $t \mapsto K^\ell(t)$, 1/4 < ℓ ≤ 1/2 is continuous but
nowhere differentiable.
[Hint: If $K'(t)$ exists for some t, then
\[ \lim_{n \to \infty} \frac{K(u_n) - K(v_n)}{u_n - v_n} \]
17. For a compact set E in $\mathbb{R}^d$, define #(ε) to be the least number of balls of
radius ε that cover E. Note that we always have $\#(\epsilon) = O(\epsilon^{-d})$ as ε → 0, and
#(ε) = O(1) if E is finite.
One defines the covering dimension of E, denoted by $\dim_C(E)$, as inf β such
that $\#(\epsilon) = O(\epsilon^{-\beta})$, as ε → 0. Show that $\dim_C(E) = \dim_M(E)$, where $\dim_M$ is the
Minkowski dimension discussed in Section 2.1, by proving the following inequalities
for all δ > 0:
[Hint: To prove (ii), use Lemma 1.2 in Chapter 3 to find a collection of disjoint
balls $B_1, B_2, \ldots, B_N$ of radius δ/3, each centered at E, such that their “triples”
$\tilde{B}_1, \tilde{B}_2, \ldots, \tilde{B}_N$ (of radius δ) cover E. Then #(δ) ≤ N, while $N m(B_j) = cN\delta^d \le m(E^\delta)$,
since the balls $B_j$ are disjoint and are contained in $E^\delta$.]
(a) Prove that dim(E) ≤ dimM (E), where dim and dimM are the Hausdorff and
Minkowski dimensions, respectively.
(b) However, prove that if E = {0, 1/ log 2, 1/ log 3, . . . , 1/ log n, . . .}, then
dimM E = 1, yet dim E = 0.
19. Show that there is a constant cd , dependent only on the dimension d, such
that whenever E is a compact set,
\[ m(E^{2\delta}) \le c_d\, m(E^\delta). \]
20. Show that if F is the self-similar set considered in Theorem 2.12, then it has
the same Minkowski dimension as Hausdorff dimension.
[Hint: Each $F_k$ is the union of $m^k$ balls of radius $c r^k$. In the converse direction one
sees by Lemma 2.13 that if $\epsilon = r^k$, then each ball of radius ε can contain at most
c′ vertices of the kth generation. So it takes at least $m^k/c'$ such balls to cover F.]
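The two directions of this hint can be combined as follows. This is only a sketch, writing #(ε) for the least number of balls of radius ε needed to cover F (the notation of Exercise 17), with c and c′ the constants of the hint:

```latex
\[
\#(c\, r^k) \le m^k
\quad\text{and}\quad
\#(r^k) \ge \frac{m^k}{c'} ,
\]
\[
\frac{\log \#(r^k)}{\log (1/r^k)}
 \ge \frac{k \log m - \log c'}{k \log(1/r)}
 \xrightarrow[k \to \infty]{} \frac{\log m}{\log(1/r)},
\qquad
\frac{\log \#(c\, r^k)}{\log \bigl( 1/(c\, r^k) \bigr)}
 \le \frac{k \log m}{k \log(1/r) - \log c}
 \xrightarrow[k \to \infty]{} \frac{\log m}{\log(1/r)} .
\]
```

Both ratios have the limit log m/ log(1/r), the value that Theorem 2.12 gives for the Hausdorff dimension of F. Passing from the scales $r^k$ to arbitrary ε, and from the covering dimension to the Minkowski dimension via Exercise 17, is left as in the exercise.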
21. From the unit interval, remove the second and fourth quarters (open intervals).
Repeat this process in the remaining two closed intervals, and so on. Let F be the
limiting set, so that
\[ F = \Bigl\{ x : x = \sum_{k=1}^{\infty} a_k/4^k, \ a_k = 0 \text{ or } 2 \Bigr\}. \]
23. Suppose S1 , . . . , Sm are similarities with ratio r, 0 < r < 1. For each set E,
let
S̃(E) = S1 (E) ∪ · · · ∪ Sm (E),
and suppose F denotes the unique non-empty compact set with S̃(F ) = F .
(a) If x ∈ F, show that the set of points $\{\tilde{S}^n(x)\}_{n=1}^{\infty}$ is dense in F.
24. Suppose E is a Borel subset of Rd with dim E < 1. Prove that E is totally
disconnected, that is, any two distinct points in E belong to different connected
components.
[Hint: Fix x, y ∈ E, and show that f (t) = |t − x| is Lipschitz of order 1, and hence
dim f (E) < 1. Conclude that f (E) has a dense complement in R. Pick r in the
complement of f (E) so that 0 < r < f (y), and use the fact that E = {t ∈ E :
|t − x| < r} ∪ {t ∈ E : |t − x| > r}.]
27. Show that the modification of the inequality (2) of Theorem 4.5 fails if we
drop kf kL2 (Rd ) from the right-hand side.
[Hint: Consider $R^*(f_\epsilon)$, with $f_\epsilon$ defined by $f_\epsilon(x) = (|x| + \epsilon)^{-d+\delta}$, for |x| ≤ 1, with
δ fixed, 0 < δ < 1, and ε → 0.]
6 Problems
• Each Ij is a finite sequence of consecutive positive integers; that is, for all j
where vd denotes the volume of the unit ball in Rd . In other words, among sets of
a given diameter, the ball has maximum volume. Clearly, it suffices to prove the
inequality for the closure $\overline{E}$ instead of E, so we can assume that E is compact.
(a) Prove the inequality in the special case when E is symmetric, that is, −x ∈ E
whenever x ∈ E.
In general, one reduces to the symmetric case by using a technique called Steiner
symmetrization. If e is a unit vector in $\mathbb{R}^d$, and P is a plane perpendicular to e,
the Steiner symmetrization of E with respect to e is defined by
\[ S(E, e) = \{ x + te : x \in P,\ |t| \le \tfrac{1}{2} L(E; e; x) \}, \]
3. Suppose S is a similarity.
(a) Show that S maps a line segment to a line segment.
(b) Show that if L1 and L2 are two segments that make an angle α, then S(L1 )
and S(L2 ) make an angle α or −α.
In the case when F is the Cantor set, the Cantor-Lebesgue function is µ([0, x]).
7. Formulate and prove a generalization of Theorem 3.5 to the effect that once
appropriate sets of measure zero are removed, there is a measure-preserving iso-
morphism of the unit interval in R and the unit cube in Rd .
8.∗ There exists a simple continuous curve in the plane of positive two-dimensional
measure.
10.∗ Let Cξ be the Cantor set considered in Exercises 8 and 11. If ξ < 1/2, then
Cξ + λCξ has positive Lebesgue measure for almost every λ.
Notes and References
There are several excellent books that cover many of the subjects treated here.
Among these texts are Riesz and Nagy [27], Wheeden and Zygmund [33], Fol-
land [13], and Bruckner et al. [4].
Introduction
The citation is a translation of a passage in a letter from Hermite to Stieltjes [18].
Chapter 1
The citation is a translation from the French of a passage in [3].
We refer to Devlin [7] for more details about the axiom of choice, Hausdorff
maximal principle, and well-ordering principle.
See the expository paper of Gardner [14] for a survey of results regarding the
Brunn-Minkowski inequality.
Chapter 2
The citation is a passage from the preface to the first edition of Lebesgue’s book
on integration [20].
Devlin [7] contains a discussion of the continuum hypothesis.
Chapter 3
The citation is from Hardy and Littlewood’s paper [15].
Hardy and Littlewood proved Theorem 1.1 in the one-dimensional case by
using the idea of rearrangements. The present form is due to Wiener.
Our treatment of the isoperimetric inequality is based on Federer [11]. This
work also contains significant generalizations and much additional material on
geometric measure theory.
A proof of the Besicovitch covering lemma in Problem 3∗ is in
Mattila [22].
For an account of functions of bounded variations in Rd , see Evans and
Gariepy [8].
An outline of the proof of Problem 7 (b)∗ can be found at the end of Chapter 5
in Book I.
The result in part (b) of Problem 8∗ is a theorem of S. Saks, and its proof as
a consequence of part (a) can be found in Stein [31].
Chapter 4
The citation is translated from the introduction of Plancherel’s article [25].
An account of the theory of almost periodic functions which is touched upon
in Problem 2∗ can be found in Bohr [2].
The results in Problems 4∗ and 5∗ are in Zygmund [35], in Chapters V and VII,
respectively.
Consult Birkhoff and Rota [1] for more on Sturm-Liouville systems, Legendre
polynomials, and Hermite functions.
Chapter 5
See Courant [6] for an account of the Dirichlet principle and some of its applica-
tions. The solution of the Dirichlet problem for general domains in R2 and the
related notion of logarithmic capacity of sets are treated in Ransford [26]. Fol-
land [12] contains another solution to the Dirichlet problem (valid in Rd , d ≥ 2)
by methods which do not use the Dirichlet principle.
The result regarding the existence of the conformal mapping stated in Prob-
lem 3∗ is in Chapter VII of Zygmund [35].
Chapter 6
The citation is a translation from the German of a passage in C. Carathéodory [5].
Petersen [24] gives a systematic presentation of ergodic theory, including a
proof of the theorem in Problem 7∗ .
The facts about spherical harmonics needed in Problem 4∗ can be found in
Chapter 4 in Stein and Weiss [32].
We refer to Hardy and Wright [16] for an introduction to continued fractions.
Their connection to ergodic theory is discussed in Ryll-Nardzewski [28].
Chapter 7
The citation is a translation from the German of a passage in Hausdorff’s arti-
cle [17], while Mandelbrot’s citation is from his book [21].
Mandelbrot’s book also contains many interesting examples of fractals arising
in a variety of different settings, including a discussion of Richardson’s work on
the length of coastlines. (See in particular Chapter 5.)
Falconer [10] gives a systematic treatment of fractals and Hausdorff dimension.
We refer to Sagan [29] for further details on space-filling curves, including the
construction of a curve arising in Problem 8∗ .
The monograph of Falconer [10] also contains an alternate construction of the
Besicovitch set, as well as the fact that such sets must necessarily have dimension
two. The particular Besicovitch set described in the text appears in Kahane [19],
but the fact that it has measure zero required further ideas which are contained,
for instance, in Peres et al. [30].
Regularity of sets in Rd , d ≥ 3, and the estimates for the maximal function
associated to the Radon transform are in Falconer [9], and Oberlin and Stein [23].
The theory of Besicovitch sets in higher dimensions, as well as a number of
interesting related topics can be found in the survey of Wolff [34].
Bibliography
[19] J. P. Kahane. Trois notes sur les ensembles parfaits linéaires.
Enseignement Math., 15:185–192, 1969.
The page numbers on the right indicate the first time the symbol or
notation is defined or used. As usual, Z, Q, R, and C denote the integers,
the rationals, the reals, and the complex numbers respectively.
Relevant items that also arose in Book I or Book II are listed in this
index, preceded by the numerals I or II, respectively.
Fσ, 23
Gδ, 23
σ-algebra
  Borel, 23
  of sets, 23
  Borel, 267
σ-finite, 263
σ-finite signed measure, 288
O notation, 12

absolute continuity
  of the Lebesgue integral, 66
absolutely continuous
  functions, 127
  measures, 288
adjoint, 183, 222
algebra of sets, 270
almost disjoint (union), 4
almost everywhere, a.e., 30
almost periodic function, 202
approximation to the identity, 109; (I)49
arc-length parametrization, 136; (I)103
area of unit sphere, 313
area under graph, 85
averaging problem, 100
axiom of choice, 26, 48

basis
  algebraic, 202
  orthonormal, 164
Bergman kernel, 254
Besicovitch
  covering lemma, 153
  set, 360, 362, 374
Bessel’s inequality, 166; (I)80
Blaschke factors, 227; (I)26, 153, 219
Borel
  σ-algebra, 23, 267
  measure, 269
    on R, 281
  sets, 23, 267
Borel-Cantelli lemma, 42, 63
boundary, 3
boundary-value function, 217
bounded convergence theorem, 56
bounded set, 3
bounded variation, 116
Brunn-Minkowski inequality, 34, 48

canonical form, 50
Cantor dust, 47, 343
Cantor set, 8, 38, 126, 330, 387
  constant dissection, 38
Cantor-Lebesgue
  function, 38, 126, 331, 387
  theorem, 95
Carathéodory measurable, 264
Cauchy
  in measure, 95
  integral, 179, 220; (II)48
  sequence, 159; (I)24; (II)24
Cauchy-Schwarz inequality, 157, 162; (I)72
chain
  of dyadic squares, 352
  of quartic intervals, 351
change of variable formula, 149; (I)292
characteristic
  function, 27
  polynomial, 221, 258
closed set, 2, 267; (II)6
closure, 3
coincidence, 377
compact linear operator, 188