Bressoud - A Radical Approach To Lebesgue's Theory of Integration (2008)


A RADICAL APPROACH TO LEBESGUE'S THEORY OF INTEGRATION

Meant for advanced undergraduate and graduate students in mathematics, this
lively introduction to measure theory and Lebesgue integration is rooted in and
motivated by the historical questions that led to its development. The author stresses
the original purpose of the definitions and theorems and highlights some of the
difficulties that were encountered as these ideas were refined.
The story begins with Riemann's definition of the integral, a definition created
so that he could understand how broadly one could define a function and yet
have it be integrable. The reader then follows the efforts of many mathematicians
who wrestled with the difficulties inherent in the Riemann integral, leading to the
work in the late nineteenth and early twentieth centuries of Jordan, Borel, and
Lebesgue, who finally broke with Riemann's definition. Ushering in a new way of
understanding integration, they opened the door to fresh and productive approaches
to many of the previously intractable problems of analysis.

David M. Bressoud is the DeWitt Wallace Professor of Mathematics at Macalester
College. He was a Peace Corps Volunteer in Antigua, West Indies, received his
PhD from Temple University, and taught at The Pennsylvania State University
before moving to Macalester. He has held visiting positions at the Institute for
Advanced Study, the University of Wisconsin, the University of Minnesota, and
the University of Strasbourg. He has received a Sloan Fellowship, a Fulbright
Fellowship, and the MAA Distinguished Teaching Award. He has published more
than 50 research articles in number theory, partition theory, combinatorics, and the
theory of special functions. His other books include Factorization and Primality
Testing, Second Year Calculus from Celestial Mechanics to Special Relativity, A
Radical Approach to Real Analysis, and Proofs and Confirmations, for which he
won the MAA Beckenbach Book Prize.
Council on Publications
James Daniel, Chair
MAA Textbooks Editorial Board
Zaven A. Karian, Editor
George Exner
Thomas Garrity
Charles R. Hadlock
William Higgins
Douglas B. Meade
Stanley E. Seltzer
Shahriar Shahriari
Kay B. Somers

MAA TEXTBOOKS

Combinatorics: A Problem Oriented Approach, Daniel A. Marcus
Complex Numbers and Geometry, Liang-shin Hahn
A Course in Mathematical Modeling, Douglas Mooney and Randall Swift
Creative Mathematics, H. S. Wall
Cryptological Mathematics, Robert Edward Lewand
Differential Geometry and Its Applications, John Oprea
Elementary Cryptanalysis, Abraham Sinkov
Elementary Mathematical Models, Dan Kalman
Essentials of Mathematics, Margie Hale
Field Theory and Its Classical Problems, Charles Hadlock
Fourier Series, Rajendra Bhatia
Game Theory and Strategy, Philip D. Straffin
Geometry Revisited, H. S. M. Coxeter and S. L. Greitzer
Knot Theory, Charles Livingston
Mathematical Connections: A Companion for Teachers and Others, Al Cuoco
Mathematical Modeling in the Environment, Charles Hadlock
Mathematics for Business Decisions Part 1: Probability and Simulation (electronic textbook),
Richard B. Thompson and Christopher G. Lamoureux
Mathematics for Business Decisions Part 2: Calculus and Optimization (electronic textbook),
Richard B. Thompson and Christopher G. Lamoureux
The Mathematics of Games and Gambling, Edward Packel
Math Through the Ages, William Berlinghoff and Fernando Gouvea
Noncommutative Rings, I. N. Herstein
Non-Euclidean Geometry, H. S. M. Coxeter
Number Theory Through Inquiry, David C. Marshall, Edward Odell, and Michael Starbird
A Primer of Real Functions, Ralph P. Boas
A Radical Approach to Real Analysis, 2nd edition, David M. Bressoud
Real Infinite Series, Daniel D. Bonar and Michael Khoury, Jr.
Topology Now!, Robert Messer and Philip Straffin
Understanding Our Quantitative World, Janet Andersen and Todd Swanson

MAA Service Center
P.O. Box 91112
Washington, DC 20090-1112
1-800-331-1MAA FAX: 1-301-206-9789
A RADICAL APPROACH TO LEBESGUE'S
THEORY OF INTEGRATION

DAVID M. BRESSOUD
Macalester College

CAMBRIDGE UNIVERSITY PRESS
Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo, Delhi

Cambridge University Press
32 Avenue of the Americas, New York, NY 10013-2473, USA
www.cambridge.org
Information on this title: www.cambridge.org/9780521884747
The Mathematical Association of America
1529 Eighteenth Street, NW, Washington, DC 20036

© David M. Bressoud 2008

This publication is in copyright. Subject to statutory exception
and to the provisions of relevant collective licensing agreements,
no reproduction of any part may take place without
the written permission of Cambridge University Press.

First published 2008

Printed in the United States of America

A catalog record for this publication is available from the British Library.

Library of Congress Cataloging in Publication Data

Bressoud, David M., 1950-
A Radical approach to Lebesgue's theory of integration / David M. Bressoud.
p. cm. -- (Classroom resource materials)
Includes bibliographical references and index.
ISBN-13: 978-0-521-88474-7 (hardback)
ISBN-10: 0-521-88474-8 (hardback)
ISBN-13: 978-0-521-71183-8 (pbk.)
ISBN-10: 0-521-71183-5 (pbk.)
1. Integrals, Generalized. I. Title. II. Series.
QA312.B67 2008
515'.42-dc22 2007035326

ISBN 978-0-521-88474-7 hardback
ISBN 978-0-521-71183-8 paperback

Cambridge University Press has no responsibility for
the persistence or accuracy of URLs for external or
third-party Internet Web sites referred to in this publication
and does not guarantee that any content on such
Web sites is, or will remain, accurate or appropriate.
Dedicated to Herodotus,
the little lion of Cambridge Street,
and to the woman who loves him
Contents

Preface
1 Introduction
1.1 The Five Big Questions
1.2 Presumptions
2 The Riemann Integral
2.1 Existence
2.2 Nondifferentiable Integrals
2.3 The Class of 1870
3 Explorations of ℝ
3.1 Geometry of ℝ
3.2 Accommodating Algebra
3.3 Set Theory
4 Nowhere Dense Sets and the Problem with the Fundamental Theorem of Calculus
4.1 The Smith–Volterra–Cantor Sets
4.2 Volterra's Function
4.3 Term-by-Term Integration
4.4 The Baire Category Theorem
5 The Development of Measure Theory
5.1 Peano, Jordan, and Borel
5.2 Lebesgue Measure
5.3 Carathéodory's Condition
5.4 Nonmeasurable Sets
6 The Lebesgue Integral
6.1 Measurable Functions
6.2 Integration
6.3 Lebesgue's Dominated Convergence Theorem
6.4 Egorov's Theorem
7 The Fundamental Theorem of Calculus
7.1 The Dini Derivatives
7.2 Monotonicity Implies Differentiability Almost Everywhere
7.3 Absolute Continuity
7.4 Lebesgue's FTC
8 Fourier Series
8.1 Pointwise Convergence
8.2 Metric Spaces
8.3 Banach Spaces
8.4 Hilbert Spaces
9 Epilogue
Appendixes
A Other Directions
A.1 The Cardinality of the Collection of Borel Sets
A.2 The Generalized Riemann Integral
B Hints to Selected Exercises
Bibliography
Index
Preface

I look at the burning question of the foundations of infinitesimal analysis without sorrow,
anger, or irritation. What Weierstrass — Cantor — did was very good. That's the way it had
to be done. But whether this corresponds to what is in the depths of our consciousness is a
very different question. I cannot but see a stark contradiction between the intuitively clear
fundamental formulas of the integral calculus and the incomparably artificial and complex
work of the "justification" and their "proofs." One must be quite stupid not to see this at
once, and quite careless if, after having seen this, one can get used to this artificial, logical
atmosphere, and can later on forget this stark contradiction.

— Nikolai Nikolaevich Luzin

Nikolai Luzin reminds us of a truth too often forgotten in the teaching of analysis:
the ideas, methods, definitions, and theorems of this study are neither natural nor
intuitive. It is all too common for students to emerge from this study with little
sense of how the concepts and results that constitute modern analysis hang together.
Here more than anywhere else in the advanced undergraduate/beginning graduate
curriculum, the historical context is critical to developing an understanding of the
mathematics.
This historical context is both interesting and pedagogically informative. From
transfinite numbers to the Heine–Borel theorem to Lebesgue measure, these ideas
arose from practical problems but were greeted with a skepticism that betrayed
confusion. Understanding what they mean and how they can be used was an uncertain
process. We should expect our students to encounter difficulties at precisely
those points at which the contemporaries of Weierstrass, Cantor, and Lebesgue had
balked.
Throughout this text I have tried to emphasize that no one set out to invent
measure theory or functional analysis. I find it both surprising and immensely satisfying
that the search for understanding of Fourier series continued to be one of the
principal driving forces behind the development of analysis well into the twentieth
century. The tools that these mathematicians had at hand were not adequate to the
task. In particular, the Riemann integral was poorly adapted to their needs.
It took several decades of wrestling with frustrating difficulties before mathematicians
were willing to abandon the Riemann integral. The route to its eventual
replacement, the Lebesgue integral, led through a sequence of remarkable insights
into the complexities of the real number line. By the end of the 1890s, it was recognized
that analysis and the study of sets were inextricably linked. From this rich interplay,
measure theory would emerge. With it came what today we call Lebesgue's dominated
convergence theorem, the holy grail of nineteenth-century analysis. What so
many had struggled so hard to discover now appeared as a gift that was almost free.
This text is an introduction to measure theory and Lebesgue integration, though
anyone using it to support such a course must be forewarned that I have intentionally
avoided stating results in their greatest possible generality. Almost all results are
given only for the real number line. Theorems that are true over any compact set
are often stated only for closed, bounded intervals. I want students to get a feel for
these results, what they say, and why they are important. Close examination of the
most general conditions under which conclusions will hold is something that can
come later, if and when it is needed.
The title of this book was chosen to communicate two important points. First,
this is a sequel to A Radical Approach to Real Analysis (ARATRA). That book ended
with Riemann's definition of the integral. That is where this text begins. All of the
topics that one might expect to find in an undergraduate analysis book that were
not in ARATRA are contained here, including the topology of the real number line,
fundamentals of set theory, transfinite cardinals, the Bolzano–Weierstrass theorem,
and the Heine–Borel theorem. I did not include them in the first volume because
I felt I could not do them justice there and because, historically, they are quite
sophisticated insights that did not arise until the second half of the nineteenth
century.
Second, this book owes a tremendous debt to Thomas Hawkins' Lebesgue's
Theory of Integration: Its Origins and Development. Like ARATRA, this book is
not intended to be read as a history of the development of analysis. Rather, this
is a textbook informed by history, attempting to communicate the motivations,
uncertainties, and difficulties surrounding the key concepts. This task would have
been far more difficult without Hawkins as a guide. Those who are intrigued by the
historical details encountered in this book are encouraged to turn to Hawkins and
other historians of this period for fuller explanation.
Even more than ARATRA, this is the story of many contributions by many members
of a large community of mathematicians working on different pieces of the
puzzle. I hope that I have succeeded in opening a small window into the workings
of this community. One of the most intriguing of these mathematicians is Axel
Harnack, who keeps reappearing in our story because he kept making mistakes, but
they were good mistakes. Harnack's errors condensed and made explicit many of
the misconceptions of his time, and so helped others to find the correct path. For
ARATRA, it was easy to select the four mathematicians who should grace the cover:
Fourier, Cauchy, Abel, and Dirichlet stand out as those who shaped the origins
of modern analysis. For this book, the choice is far less clear. Certainly I need to
include Riemann and Lebesgue, for they initiate and bring to conclusion the principal
elements of this story. Weierstrass? He trained and inspired the generation that
would grapple with Riemann's work, but his contributions are less direct. Heine,
du Bois-Reymond, Jordan, Hankel, Darboux, or Dini? They all made substantial
progress toward the ultimate solution, but none of them stands out sufficiently.
Cantor? Certainly yes. It was his recognition that set theory lies at the heart of
analysis that would enable the progress of the next generation. Who should we
select from that next generation: Peano, Volterra, Borel, Baire? Maybe Riesz or
one of the others who built on Lebesgue's insights, bringing them to fruition? Now
the choice is even less clear. I have settled on Borel for his impact as a young
mathematician and to honor him as the true source of the Heine–Borel theorem,
a result that I have been very tempted to refer to as he did: the first fundamental
theorem of measure theory.
I have drawn freely on the scholarship of others. I must pay special tribute to Soo
Bong Chae's Lebesgue Integration. When I first saw this book, my reaction was
that I did not need to write my own on Lebesgue integration. Here was someone
who had already put the subject into historical context, writing in an elegant yet
accessible style. However, as I have used his book over the years, I have found that
there is much that he leaves unsaid, and I disagree with his choice to use Riesz's
approach to the Lebesgue integral, building it via an analysis of step functions.
Riesz found an elegant route to Lebesgue integration, but in defining the integral
first and using it to define Lebesgue measure, the motivation for developing these
concepts is lost. Despite such fundamental divergences, the attentive reader will
discover many close parallels between Chae's treatment and mine.
I am indebted to many people who read and commented on early drafts of
this book. I especially thank Dave Renfro who gave generously of his time to
correct many of my historical and mathematical errors. Steve Greenfield had the
temerity to be the very first reader of my very first draft, and I appreciate his many
helpful suggestions on the organization and presentation of this book. I also want to
single out my students who, during the spring semester of 2007, struggled through
a preliminary draft of this book and helped me in many ways to correct errors
and improve the presentation of this material. They are Jacob Bond, Kyle Braam,
Pawan Dhir, Elizabeth Gillaspy, Dan Gusset, Sam Handler, Kassa Haileyesus, Xi
Luo, Jake Norton, Stella Stamenova, and Linh To.
I am also grateful to the mathematicians and historians of mathematics who suggested
corrections and changes or helped me find information. These include Roger
Cooke, Larry D'Antonio, Ivor Grattan-Guinness, Daesuk Han, Tom Hawkins, Mark
Huibregtse, Nicholas Rose, Peter Ross, Jim Smoak, John Stillwell, and Sergio B.
Volchan. I am also indebted to Don Albers of the MAA and Lauren Cowles of
Cambridge who so enthusiastically embraced this project, and to the reviewers for
both MAA and Cambridge whose names are unknown to me but who gave much
good advice.
Corrections, commentary, and additional material for this book can be found at
www.macalester.edu/aratra.

David M. Bressoud
[email protected]
June 19, 2007
1 Introduction

By 1850, most mathematicians thought they understood calculus. Real progress
was being made in extending the tools of calculus to complex numbers and spaces
of higher dimensions. With appropriate generalizations of Fourier series,
mathematicians were finding solutions to partial differential equations. Cauchy's insights had
been assimilated, and the concepts that had been unclear during his pioneering
work of the 1820s, concepts such as uniform convergence and uniform continuity,
were coming to be understood. There was reason to feel confident.
One of the small, nagging problems that remained was the question of the
convergence of the Fourier series expansion. When does it converge? When it
does, can we be certain that it converges to the original function from which the
Fourier coefficients were derived? In 1829, Peter Gustav Lejeune Dirichlet had
proven that as long as a function is piecewise monotonic on a closed and bounded
interval, the Fourier series converges to the original function. Dirichlet believed
that functions did not have to be piecewise monotonic in order for the Fourier series
to converge to the original function, but neither he nor anyone else had been able
to weaken this assumption.
In the early 1850s, Bernhard Riemann, a young protégé of Dirichlet and a student
of Gauss, would make substantial progress in extending our understanding of
trigonometric series. In so doing, he would call the certainties of calculus into
question. Over the next 60 years, five big questions would emerge and be answered.
The answers would be totally unexpected. They would forever change the nature
of analysis.

1. When does a function have a Fourier series expansion that converges to that function?
2. What is integration?
3. What is the relationship between integration and differentiation?
4. What is the relationship between continuity and differentiability?
5. When can an infinite series be integrated by integrating each term?

This book is devoted to explaining the answers to these five questions — answers
that are very much intertwined. Before we tackle what happened after 1850, we
need to understand what was known or believed in that year.

1.1 The Five Big Questions


Fourier Series
Fourier's method for expanding an arbitrary function F defined on $[-\pi, \pi]$ into a
trigonometric series is to use integration to calculate coefficients:

$$a_k = \frac{1}{\pi}\int_{-\pi}^{\pi} F(x)\cos(kx)\,dx \qquad (k \ge 0), \tag{1.1}$$

$$b_k = \frac{1}{\pi}\int_{-\pi}^{\pi} F(x)\sin(kx)\,dx \qquad (k \ge 1). \tag{1.2}$$

The Fourier expansion is then given by

$$F(x) = \frac{a_0}{2} + \sum_{k=1}^{\infty}\left[a_k\cos(kx) + b_k\sin(kx)\right]. \tag{1.3}$$
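As a rough numerical illustration of (1.1) and (1.2), the sketch below approximates the coefficients with a midpoint rule. The test function $F(x) = x$, the helper names `fourier_coefficients` and `partial_sum`, and the number of sample points are assumptions made for this example. For this F the exact values are $a_k = 0$ and $b_k = 2(-1)^{k+1}/k$.

```python
import math

def fourier_coefficients(F, K, samples=20000):
    """Approximate a_0..a_K and b_1..b_K from (1.1)-(1.2) with the midpoint rule.

    Hypothetical helper for illustration; b[0] is a placeholder so b[k] lines up with k.
    """
    h = 2 * math.pi / samples
    xs = [-math.pi + (j + 0.5) * h for j in range(samples)]
    a = [sum(F(x) * math.cos(k * x) for x in xs) * h / math.pi for k in range(K + 1)]
    b = [0.0] + [sum(F(x) * math.sin(k * x) for x in xs) * h / math.pi
                 for k in range(1, K + 1)]
    return a, b

def partial_sum(a, b, x):
    """Truncate the expansion (1.3) after k = K = len(a) - 1."""
    return a[0] / 2 + sum(a[k] * math.cos(k * x) + b[k] * math.sin(k * x)
                          for k in range(1, len(a)))

a, b = fourier_coefficients(lambda x: x, K=5)
print([round(c, 4) for c in b[1:]])
```

The printed values should be close to 2, -1, 2/3, -1/2, 2/5. The numerics cannot tell us whether `partial_sum(a, b, x)` approaches $F(x)$ as K grows. That is exactly the convergence question the text goes on to raise.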

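The heuristic argument for the validity of this procedure is that if F really can
be expanded in a series of the form given in Equation (1.3), then

$$\begin{aligned}
\int_{-\pi}^{\pi} F(x)\cos(nx)\,dx
&= \int_{-\pi}^{\pi}\left(\frac{a_0}{2} + \sum_{k=1}^{\infty}\left[a_k\cos(kx) + b_k\sin(kx)\right]\right)\cos(nx)\,dx \\
&= \frac{a_0}{2}\int_{-\pi}^{\pi}\cos(nx)\,dx + \sum_{k=1}^{\infty} a_k\int_{-\pi}^{\pi}\cos(kx)\cos(nx)\,dx \\
&\quad + \sum_{k=1}^{\infty} b_k\int_{-\pi}^{\pi}\sin(kx)\cos(nx)\,dx.
\end{aligned} \tag{1.4}$$

Since n and k are integers, all of these integrals are zero except for the one involving $a_n$. For n = 0 this is the constant term, which contributes $(a_0/2)\cdot 2\pi = \pi a_0$. The surviving integral is easily evaluated:

$$\int_{-\pi}^{\pi} F(x)\cos(nx)\,dx = \pi a_n. \tag{1.5}$$

Similarly,

$$\int_{-\pi}^{\pi} F(x)\sin(nx)\,dx = \pi b_n. \tag{1.6}$$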
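The step "all of the integrals are zero except for the one involving $a_n$" rests on the orthogonality of sines and cosines on $[-\pi, \pi]$. The check below is only a numerical sketch. The helper names and the 256-point grid are illustrative choices. It uses the midpoint rule on a uniform grid spanning one full period. That rule integrates trigonometric polynomials of degree below the number of points exactly, up to rounding, so the zeros it reports are not approximation artifacts.

```python
import math

def integrate_period(g, points=256):
    """Midpoint rule on [-pi, pi]. On a uniform grid spanning a full period this is
    exact (up to rounding) for trigonometric polynomials of degree below `points`."""
    h = 2 * math.pi / points
    return h * sum(g(-math.pi + (j + 0.5) * h) for j in range(points))

def cos_cos(k, n):
    """Integral of cos(kx) cos(nx) over [-pi, pi]."""
    return integrate_period(lambda x: math.cos(k * x) * math.cos(n * x))

def sin_cos(k, n):
    """Integral of sin(kx) cos(nx) over [-pi, pi]."""
    return integrate_period(lambda x: math.sin(k * x) * math.cos(n * x))

for k in range(4):
    print([round(cos_cos(k, n), 6) for n in range(4)])
```

The table should show $2\pi$ for $k = n = 0$, $\pi$ for $k = n \ge 1$, and zeros elsewhere. Every `sin_cos` value should vanish. This is why multiplying (1.3) by $\cos(nx)$ and integrating term by term isolates $\pi a_n$.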
This is a convincing heuristic, but it ignores the problem of interchanging integration
and summation, and it sidesteps two crucial questions:

1. Are the integrals that produce the Fourier coefficients well-defined?
2. If these integrals can be evaluated, does the resulting Fourier series actually
converge to the original function?
Not all functions are integrable. In the 1820s, Dirichlet proposed the following
example.

Example 1.1. The characteristic function of the rationals is defined as

$$f(x) = \begin{cases} 1, & x \text{ is rational}, \\ 0, & x \text{ is not rational}. \end{cases}$$

This example demonstrates how very strange functions can be if we take seriously
the definition of a function as a well-defined rule that assigns a value to each number
in the domain. Dirichlet's example represents an important step in the evolution
of the concept of function. To the early explorers of calculus, a function was an
algebraic rule such as $\sin x$ or $x^2 - 3$, an expression that could be computed to
whatever accuracy one might desire.
When Augustin-Louis Cauchy showed that any piecewise continuous function
is integrable, he cemented the realization that functions could also be purely geo-
metric, representable only as curves. Even in a situation in which a function has no
explicit algebraic formulation, it is possible to make sense of its integral, provided
the function is continuous.
Dirichlet stretched the concept of function to that of a rule that can be individually
defined for each value of the domain. Once this conception of function is accepted,
the gates are opened to very strange functions. At the very least, integrability can
no longer be assumed.
The next problem is to show that our trigonometric series converges. In his
1829 paper, Dirichlet accomplished this, but he needed the hypothesis that the
original function F is piecewise monotonic, that is, the domain can be partitioned
into a finite number of subintervals so that F is either monotonically increasing or
monotonically decreasing on each subinterval.
The final question is whether the function to which it converges is the function
F with which we started. Under the same assumptions, Dirichlet was able to show
that this is the case, provided that at any points of discontinuity of F, the value
taken by the function is the average of the limit from the left and the limit from the
right.
Dirichlet's result implies that the functions one is likely to encounter in physical
situations present no problems for conversion into Fourier series. Riemann recog-
nized that it was important to be able to extend this technique to more complicated
functions now arising in questions in number theory and geometry. The first step
was to get a better handle on what we mean by integration.

Integration
It is ironic that integration took so long to get right because it is so much older
than any other piece of calculus. Its roots lie in methods of calculating areas,
volumes, and moments that were undertaken by such scientists as Archimedes
(287–212 BC), Liu Hui (late third century AD), ibn al-Haytham (965–1039), and
Johannes Kepler (1571–1630). The basic idea was always the same. To evaluate
an area, one divided it into rectangles or triangles or other shapes of known area
that together approximated the desired region. As more and smaller figures were
used, the region would be matched more precisely. Some sort of limiting argument
would then be invoked, some means of finding the actual area based on an analysis
of the areas of the approximating regions.
Into the eighteenth century, integration was identified with the problem of
"quadrature," literally the process of finding a square equal in area to a given
area and thus, in practice, the problem of computing areas. In section 1 of Book
I of his Mathematical Principles of Natural Philosophy, Newton explains how to
calculate areas under curves. He gives a procedure that looks very much like the
definition of the Riemann integral, and he justifies it by an argument that would be
appropriate for any modern textbook.
Specifically, Newton begins by approximating the area under a decreasing curve
by subdividing the domain into equal subintervals (see Figure 1.1). Above each
subinterval, he constructs two rectangles: one whose height is the maximum value
of the function on that interval (the circumscribed rectangle) and the other whose
height is the minimum value of the function (the inscribed rectangle). The true area
lies between the sum of the areas of the circumscribed rectangles and the sum of
the areas of the inscribed rectangles.
The difference between these areas is the sum of the areas of the rectangles
aKbl, bLcm, cMdn, dDEo. If we slide all of these rectangles to line up under
aKbl, we see that the sum of their areas is just the change in height of the function
multiplied by the length of any one subinterval. As we take narrower subintervals,
the difference in the areas approaches zero. As Newton asserts: "The ultimate ratios
which the inscribed figure, the circumscribed figure, and the curvilinear figure have
to one another are ratios of equality," which is his way of saying that the ratio of
any two of these areas approaches 1. Therefore, the areas are all approaching the
same value as the length of the subinterval approaches 0.

Figure 1.1. Newton's illustration from Mathematical Principles of Natural Philosophy. (Newton, 1999, p. 433)
In Lemma 3 of his book, Newton considers the case where the subintervals are
not of equal length (using the dotted line f F in Figure 1.1 in place of 1B). He
observes that the sum of the differences of the areas is still less than the change in
height multiplied by the length of the longest subinterval. We therefore get the same
limit for the ratio so long as the length of the longest subinterval is approaching
zero.
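Newton's estimate can be checked with exact rational arithmetic. In the sketch below, the decreasing function $f(x) = 1/(1+x)$ on $[0, 1]$ and the sample partitions are assumptions made for illustration. For a decreasing function, the circumscribed rectangle on $[x_{k-1}, x_k]$ has height $f(x_{k-1})$ and the inscribed one has height $f(x_k)$. The difference of the two sums therefore telescopes. It equals $(f(a) - f(b))$ times the common width when the subintervals are equal, and it is at most $(f(a) - f(b))$ times the longest width otherwise.

```python
from fractions import Fraction

def f(x):
    """Hypothetical decreasing example: f(x) = 1/(1 + x) on [0, 1]."""
    return Fraction(1) / (1 + x)

def circumscribed_inscribed(f, partition):
    """Upper and lower rectangle sums for a DECREASING f.

    On [x0, x1] the maximum is f(x0) (circumscribed), the minimum f(x1) (inscribed)."""
    pairs = list(zip(partition, partition[1:]))
    upper = sum(f(x0) * (x1 - x0) for x0, x1 in pairs)
    lower = sum(f(x1) * (x1 - x0) for x0, x1 in pairs)
    return upper, lower

def longest(partition):
    """Length of the longest subinterval."""
    return max(x1 - x0 for x0, x1 in zip(partition, partition[1:]))

equal = [Fraction(k, 8) for k in range(9)]
unequal = [Fraction(0), Fraction(1, 10), Fraction(1, 4), Fraction(1, 2), Fraction(1)]

for p in (equal, unequal):
    upper, lower = circumscribed_inscribed(f, p)
    bound = (f(p[0]) - f(p[-1])) * longest(p)
    print(upper - lower, "<=", bound)
```

As the longest subinterval shrinks, the gap between the two sums shrinks in proportion. This is the content of Newton's argument for unequal subintervals.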
This method of finding areas is paradigmatic for an entire class of problems in
which one is multiplying two quantities such as

• area = height × width,
• volume = cross-sectional area × width,
• moment = mass × distance,
• work = force × distance,
• distance = speed × time, or
• velocity = acceleration × time,

where the value of the first quantity can vary as the second quantity increases. For
example, knowing that "distance = speed × time," we can find the distance traveled
by a particle whose speed is a function of time, say v(t) = 8t + 5, 0 ≤ t ≤ 4. If we
split the time into four intervals and use the velocity at the start of each interval,
we get an approximation to the total distance:

$$5 \cdot 1 + 13 \cdot 1 + 21 \cdot 1 + 29 \cdot 1 = 68.$$

If we use eight intervals of length 1/2 and again take the speed at the start of each
interval, we get

$$5 \cdot \tfrac{1}{2} + 9 \cdot \tfrac{1}{2} + 13 \cdot \tfrac{1}{2} + \cdots + 33 \cdot \tfrac{1}{2} = 76.$$

If we use 1,024 intervals of length 1/256 and take the speed at the start of each
interval, we get

$$5 \cdot \tfrac{1}{256} + \tfrac{161}{32} \cdot \tfrac{1}{256} + \tfrac{81}{16} \cdot \tfrac{1}{256} + \cdots + \tfrac{1183}{32} \cdot \tfrac{1}{256} = \tfrac{1343}{16} = 83.9375.$$
As we take more intervals of shorter length, we approach the true distance, which
is 84. How do you actually get 84? We can think of this as taking infinitely many
intervals of infinitely short length.
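The three approximations above are left-endpoint sums. Exact rational arithmetic reproduces them, which makes the arithmetic easy to check. This is a minimal sketch, and the function name `left_sum` is hypothetical.

```python
from fractions import Fraction

def v(t):
    """Speed from the example: v(t) = 8t + 5 on 0 <= t <= 4."""
    return 8 * t + 5

def left_sum(f, a, b, n):
    """Split [a, b] into n equal pieces and use the value at the start of each piece."""
    h = Fraction(b - a, n)
    return sum(f(a + k * h) * h for k in range(n))

for n in (4, 8, 1024):
    s = left_sum(v, 0, 4, n)
    print(n, s, float(s))
```

The loop prints 68, 76, and 1343/16 (that is, 83.9375) for n = 4, 8, and 1,024.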
Leibniz's notation is a brilliant encapsulation of this process:

$$\int f(x)\,dx.$$

The product is f(x) dx, the value of the first quantity times the infinitesimal
increment. The elongated S, ∫, represents the summation.
This is all precalculus. The insight at the heart of calculus is that if f(x) represents
the slope of the tangent to the graph of a function F at x, then this provides an
easy method for computing limits of sums of products: If x ranges over the interval
[a, b], then the value of this integral is F(b) − F(a). Thus, to find the area under
the curve v = 8t + 5 from t = 0 to t = 4, we can observe that f(t) = 8t + 5 is the
derivative of $F(t) = 4t^2 + 5t$. The desired area is equal to

$$(4 \cdot 4^2 + 5 \cdot 4) - (4 \cdot 0^2 + 5 \cdot 0) = 84.$$
The calculating power of calculus comes from this dual nature of the integral. It can
be viewed as a limit of sums of products or as the inverse process of differentiation.
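For this particular speed function, the left-endpoint sums have a closed form, which makes both readings of the integral agree explicitly. Take n equal subintervals of length 4/n, with tags at the left endpoints t = 4k/n:

$$S_n = \sum_{k=0}^{n-1} v\!\left(\frac{4k}{n}\right)\frac{4}{n} = \frac{4}{n}\sum_{k=0}^{n-1}\left(\frac{32k}{n} + 5\right) = \frac{4}{n}\left(\frac{32}{n}\cdot\frac{n(n-1)}{2} + 5n\right) = 84 - \frac{64}{n}.$$

Setting n = 4, 8, and 1,024 gives 68, 76, and 83.9375. The gap 64/n between $S_n$ and $F(4) - F(0) = 84$ goes to zero as n grows.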
It is hard to find a precise definition of the integral from the eighteenth century.
The scientists of this century understood and exploited the dual nature of the
integral, but most were reluctant to define it as the sum of products of f(x) times
the infinitesimal dx, for that inevitably led to the problem of what exactly is meant
by an "infinitesimal." It is a useful concept, but one that is hard to pin down.
George Berkeley aptly described infinitesimals as "ghosts of departed quantities."
He would object, "Now to conceive a quantity infinitely small, that is, infinitely
less than any sensible or imaginable quantity or than any the least finite magnitude
is, I confess, above my capacity."

George Berkeley, The Analyst, as quoted in Struik (1986, pp. 335, 338).
The result was that when a definition of $\int f(x)\,dx$ was needed, the integral was
simply defined as the operator that returns you to the function (or, in modern use, the
class of functions) whose derivative is f. One of the early calculus textbooks written
for an undergraduate audience was S. F. Lacroix's Traité élémentaire de Calcul
Différentiel et de Calcul Integral of 1802 (Elementary Treatise of Differential
Calculus and Integral Calculus). Translated into many languages, it would serve
as the standard text of the first half of the nineteenth century. It provides no explicit
definition of the integral, but does state that
Integral calculus is the inverse of differential calculus. Its goal is to restore the functions from
their differential coefficients.

After this clarification of what is meant by integration, Lacroix then proceeds to
deal with the definite integral, which "is found by successively calculating the value
of the integral when x = a, then when x = b, and subtracting the first result from
the second."
This would continue to be the standard definition of integration in calculus
texts until the 1950s and 1960s. There is no loss in the power of calculus. The
many textbook writers who took this approach then went on to explain how the
definite integral can be used to evaluate limits of sums of products. Pedagogically,
this approach has merit. It starts with the more intuitively accessible definition.
Mathematically, this definition of integration is totally inadequate.

Cauchy and Riemann Integrals


Fourier and Cauchy were among the first to fully realize the inadequacy of defining
integration as the inverse process of differentiation. It is too restrictive. Fourier
wanted to apply his methods to arbitrary functions. Not all functions have antiderivatives
that can be expressed in terms of standard functions. Fourier tried
defining the definite integral of a nonnegative function as the area between the
graph of the function and the x-axis, but that begs the question of what we mean
by area. Cauchy embraced Leibniz's understanding of the integral as a limit of sums of products, and he
found a way to avoid infinitesimals.
To define $\int_a^b f(x)\,dx$, Cauchy worked with finite approximating sums. Given a
partition of $[a, b]$: $(a = x_0 < x_1 < \cdots < x_n = b)$, we consider

$$\sum_{k=1}^{n} f(x_{k-1})\,(x_k - x_{k-1}).$$

If we can force all of these approximating sums to be as close to each other as
we wish simply by limiting the size of the difference between consecutive values
in the partition, then these summations have a limiting value that is designated as
the value of the definite integral, and the function f is said to be integrable over
[a, b].
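Cauchy's criterion can be probed numerically. Generate many partitions whose longest gap is at most some bound δ, then see how far apart the resulting sums can be. The sketch below reuses the speed v(t) = 8t + 5 on [0, 4] from earlier. The random partitions, the seed, and the helper names are illustrative assumptions. Because v is increasing, every left-endpoint sum lies within 32δ of the true value 84, where 32 is the rise v(4) − v(0). The spread of the sums must therefore shrink with δ.

```python
import random

def v(t):
    """Increasing speed from the earlier example, so left endpoints give the smallest values."""
    return 8 * t + 5

def cauchy_sum(f, partition):
    """Cauchy's approximating sum: f(x_{k-1}) (x_k - x_{k-1}) summed over the partition."""
    return sum(f(x0) * (x1 - x0) for x0, x1 in zip(partition, partition[1:]))

def random_partition(a, b, delta, rng):
    """Hypothetical helper: a random partition of [a, b] with every gap at most delta."""
    points = [a]
    while points[-1] + delta < b:
        points.append(points[-1] + rng.uniform(0.1 * delta, delta))
    points.append(b)  # the loop exit guarantees b - points[-2] <= delta
    return points

rng = random.Random(1)
for delta in (0.5, 0.05, 0.005):
    sums = [cauchy_sum(v, random_partition(0.0, 4.0, delta, rng)) for _ in range(20)]
    print(delta, min(sums), max(sums))
```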
Equipped with this definition, Cauchy succeeded in proving that any continuous
or piecewise continuous function is integrable. The class of functions to which
Fourier's analysis could be applied was suddenly greatly expanded.
When Riemann turned to the study of trigonometric series, he wanted to know
the limits of Cauchy's approach to integration. Was there an easy test that could
be used to determine whether or not a function could be integrated? Cauchy had
chosen to evaluate the function at the left-hand endpoint of the interval simply for
convenience. As Riemann thought about how far this definition could be pushed,
he realized that his analysis would be simpler if the definition were stated in a
slightly more complicated but essentially equivalent manner. Given a partition of
$[a, b]$: $(a = x_0 < x_1 < \cdots < x_n = b)$, we assign a tag to each interval, a number
contained in that interval, and consider all sums of the form

$$\sum_{k=1}^{n} f(x_k^*)\,(x_k - x_{k-1}).$$

A partition together with such a collection of tags, $x_k^* \in [x_{k-1}, x_k]$, is called a
tagged partition. If we can force all of these approximating sums to be as close
to each other as we wish simply by limiting the size of the difference between
consecutive values in the partition, then these summations have a limiting value.
We call this limiting value the definite integral, and the function f is said to be
integrable over [a, b]. In the next chapter, we shall see why this seemingly more
complicated definition of the integral simplifies the process of determining when a
function is integrable.
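Riemann's version lets each tag sit anywhere in its subinterval. The following minimal sketch uses hypothetical helper names (`riemann_sum`, `random_tagged_partition`, `mesh`), and the random partitions are only our way of sampling tagged partitions. It takes $f = \cos$ on $[0, \pi/2]$, whose integral is 1, and checks that every sampled sum lies within $\text{mesh} \times (b - a)$ of 1. That bound holds here because $|\cos'| \le 1$, so on each subinterval $f$ varies by at most the subinterval's length.

```python
import math
import random


def riemann_sum(f, points, tags):
    """Sum of f(tag_k) * (x_k - x_{k-1}) over a tagged partition.
    points: a = x_0 < x_1 < ... < x_n = b; tags: one tag per subinterval."""
    if len(tags) != len(points) - 1:
        raise ValueError("need exactly one tag per subinterval")
    total = 0.0
    for k in range(1, len(points)):
        lo, hi = points[k - 1], points[k]
        tag = tags[k - 1]
        if not lo <= tag <= hi:
            raise ValueError("each tag must lie in its own subinterval")
        total += f(tag) * (hi - lo)
    return total


def random_tagged_partition(a, b, n, rng):
    """n random subintervals of [a, b], each with a random tag inside it."""
    cuts = sorted(rng.uniform(a, b) for _ in range(n - 1))
    points = [a] + cuts + [b]
    tags = []
    for lo, hi in zip(points, points[1:]):
        # clamp guards against floating-point rounding just outside [lo, hi]
        tags.append(min(max(rng.uniform(lo, hi), lo), hi))
    return points, tags


def mesh(points):
    """Length of the longest subinterval."""
    return max(hi - lo for lo, hi in zip(points, points[1:]))


rng = random.Random(2008)
for n in (10, 100, 1000, 10000):
    pts, tags = random_tagged_partition(0.0, math.pi / 2, n, rng)
    s = riemann_sum(math.cos, pts, tags)
    bound = mesh(pts) * (math.pi / 2)
    print(n, round(mesh(pts), 5), s, abs(s - 1.0) <= bound + 1e-12)
```

Whatever tags are chosen, the sums crowd together once the mesh is small. That agreement is exactly what the definition asks for.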
Riemann succeeded in clarifying what is meant by integration. In the process, he
was able to clearly identify and delimit the set of functions that are integrable and
to make it possible for others to realize that this limit definition introduces serious
difficulties, difficulties that eventually would lead to the rejection of Riemann's
definition in favor of a radically different approach to integration proposed by Henri
Lebesgue. In particular, Riemann's definition greatly complicates the relationship
between integration and differentiation.

The Fundamental Theorem of Calculus


The fundamental theorem of calculus is, in essence, simply a statement of the
equivalence of the two means of understanding integration, as the inverse process
of differentiation and as a limit of sums of products. The precise theorems to
which this designation refers today arise from the assumption that integration is
defined as a limiting process. They then clarify the precise relationship between
integration and differentiation. The actual statements that we shall use are given by
the following theorems.

Theorem 1.1 (FTC, evaluation). If $f$ is the derivative of $F$ at every point on
$[a, b]$, then under suitable hypotheses we have that

$$\int_a^b f(t)\,dt = F(b) - F(a). \tag{1.7}$$

Theorem 1.2 (FTC, antiderivative). If $f$ is integrable on the interval $[a, b]$, then
under suitable hypotheses we have that

$$\frac{d}{dx}\int_a^x f(t)\,dt = f(x). \tag{1.8}$$

The first of these theorems tells us how we can use any antiderivative to obtain a
simple evaluation of a definite integral. The second shows that the definite integral
can be used to create an antiderivative: the definite integral of $f$ from $a$ to $x$ is a
function of $x$ whose derivative is $f$. Both of these statements would be meaningless
if we had defined the integral as the antiderivative. Their meaning and importance
come from the assumption that $\int_a^b f(t)\,dt$ is defined as a limit of summations.
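Both theorems can be checked numerically when the integral is built only from sums of products. This is a sketch, not a proof. It assumes $f = \cos$ with $a = 0$ (so one antiderivative is $\sin$) and uses midpoint tags; the names `integral_from` and `F` are hypothetical. For Theorem 1.1 it compares $F(1) - F(0)$ with $\sin 1 - \sin 0$. For Theorem 1.2 it compares a symmetric difference quotient of $F$ at $x = 1$ with $\cos 1$.

```python
import math


def integral_from(f, a, x, n=100_000):
    """Approximate the integral of f from a to x by a Riemann sum with
    n equal subintervals, each tagged at its midpoint."""
    h = (x - a) / n
    return h * math.fsum(f(a + (k + 0.5) * h) for k in range(n))


def F(x):
    """F(x) = integral of cos from 0 to x, computed only from sums."""
    return integral_from(math.cos, 0.0, x)


# Theorem 1.1: F(1) - F(0) should match sin(1) - sin(0)
print(F(1.0) - F(0.0), math.sin(1.0))

# Theorem 1.2: the derivative of F at x = 1 should match cos(1)
step = 1e-4
slope = (F(1.0 + step) - F(1.0 - step)) / (2 * step)
print(slope, math.cos(1.0))
```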
In both cases, I have not specified the hypotheses under which these theorems
hold. There are two reasons for this. One is that much of the interesting story that
is to be told about the creation of analysis in the late nineteenth century revolves
around finding necessary and sufficient conditions under which the conclusions
hold. When working with Riemann's definition of the integral, the answer is
complicated. The second reason is that the hypotheses that are needed depend on the
way we choose to define the integral. For Lebesgue's definition, the hypotheses are
quite different.

A Brief History of Theorems 1.1 and 1.2[2]


The earliest reference to Theorem 1.1 of which I am aware is Siméon Denis
Poisson's 1820 Suite du Mémoire sur les Intégrales Définies. There he refers to it
as "the fundamental proposition of the theory of definite integrals." Poisson's work
is worth some digression because it illustrates the importance of how we define the
definite integral and the difficulties encountered when it is defined as the difference
of the values of an antiderivative at the endpoints.

[2] With thanks to Larry D'Antonio and Ivor Grattan-Guinness for uncovering many of these references.

Siméon Denis Poisson (1781–1840) studied and then taught at the École Polytechnique.
He succeeded to Fourier's professorship in mathematics when Fourier
departed for Grenoble to become prefect of the department of Isère. It was Poisson
who wrote up the rejection of Fourier's Theory of the Propagation of Heat in
Solid Bodies in 1808. When, in 1815, Poisson published his own article on the flow
of heat, Fourier pointed out its many flaws and the extent to which Poisson had
rediscovered Fourier's own work.
Poisson, as a colleague of Cauchy at the École Polytechnique, almost certainly
was aware of Cauchy's definition of the definite integral even though Cauchy had
not yet published it. But the relationship between Poisson and Cauchy was far from
amicable, and it would have been surprising had Poisson chosen to embrace his
colleague's approach. Poisson defines the definite integral as the difference of the
values of the antiderivative. It would seem there is nothing to prove. What Poisson
does prove is that if F has a Taylor series expansion and F' = f, then

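$$F(b) - F(a) = \lim_{n\to\infty} \sum_{j=1}^{n} t\, f\bigl(a + (j-1)t\bigr), \qquad \text{where } t = \frac{b-a}{n}.$$

Poisson begins with the observation that for $1 \le j \le n$ and $t = (b-a)/n$, there
is a $k \ge 1$ and a collection of functions $R_j$ such that

$$F(a + jt) = F\bigl(a + (j-1)t\bigr) + t\, f\bigl(a + (j-1)t\bigr) + t^{1+k} R_j(t),$$

and therefore

$$F(b) - F(a) = \sum_{j=1}^{n} \Bigl[F(a + jt) - F\bigl(a + (j-1)t\bigr)\Bigr] = \sum_{j=1}^{n} t\, f\bigl(a + (j-1)t\bigr) + \sum_{j=1}^{n} t^{1+k} R_j(t).$$

Poisson now asserts that the functions $R_j(t)$ stay bounded. In fact, we know
by the Lagrange remainder theorem that we can take $k = 1$ and these functions
are bounded by the supremum of $|f'(x)|/2$ over all $x$ in $[a, b]$. Since there are
$n$ terms and $nt = b - a$, the error term satisfies
$\bigl|\sum_{j=1}^{n} t^{2} R_j(t)\bigr| \le (b-a)\, t \sup|f'|/2$, so it approaches 0 as $n$ approaches infinity.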
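Poisson's limit can be watched numerically. The sketch below is only an illustration, built on an example of our own choosing: $F(x) = e^x$ on $[0, 1]$, so $f = F' = e^x$. The helper name `poisson_sum` is hypothetical. The printed remainder is the error term $\sum_{j=1}^{n} t^{1+k} R_j(t)$ with $k = 1$, computed as $F(b) - F(a) - \sum_{j=1}^{n} t f(a + (j-1)t)$. It should stay below the bound $(b-a)\,t\,\sup|f'|/2 = e/(2n)$ while shrinking roughly like $1/n$.

```python
import math


def poisson_sum(f, a, b, n):
    """The sum t * [f(a) + f(a + t) + ... + f(a + (n-1)t)] with t = (b - a)/n."""
    t = (b - a) / n
    return t * math.fsum(f(a + (j - 1) * t) for j in range(1, n + 1))


a, b = 0.0, 1.0
exact = math.exp(b) - math.exp(a)  # F(b) - F(a) for F = exp
for n in (10, 100, 1000, 10000):
    remainder = exact - poisson_sum(math.exp, a, b, n)  # = sum of t^2 R_j(t)
    bound = (b - a) * ((b - a) / n) * math.exp(b) / 2     # (b - a) t sup|f'| / 2
    print(n, remainder, bound, n * remainder)
```

The last column settles near $(e - 1)/2 \approx 0.859$, which shows that the error term really is of size $t$, as the bound predicts.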
The confusion over the meaning of the definite integral is revealed in Poisson's
attempt to complete the proof by connecting this limit back to the definite integral.
He appeals to the Leibniz conception of the integral as a sum of products:
Using the language of infinitesimals, we shall say what we needed to show, that F(b) — F(a)
is the sum of the values of f(x) dx as x increases by infinitesimal amounts from x = a to
x = b, dx being the difference between two consecutive values of this variable.[3]

[3] Poisson (1820, pp. 323–324).



The statement and proof of Theorem 1.2 can be found in Cauchy's Résumé des
Leçons Données à l'École Polytechnique of 1823, the same place where he first
defines the definite integral. It is not stated as a fundamental theorem. In fact, it is
not identified as a theorem or proposition, simply a result mentioned in the text en
route to the real problem, which is to define the indefinite integral, the general class
of functions that have f as their derivative.
The term "Fundamental Principles of the Integral Calculus" appears in Lardner's
An Elementary Treatise on the Differential and Integral Calculus of 1825, and these
include the statement of the evaluation part of the fundamental theorem of calculus.
But this statement is one of nine principles that include the fact that the integral is
a linear operator as well as many rules for integrating specific functions.
The term "fundamental theorem for integrals" was used to refer to the evaluation
part of the fundamental theorem of calculus in Charles de Freycinet's De L'Analyse
Infinitésimale. Étude sur la Métaphysique du haut Calcul of 1860. Charles de Freycinet
(1828–1923) was trained as a mining engineer, was elected to the French Senate in
1876, and served four times as prime minister of France. It would be interesting
to know if there have been any other heads of state that have written calculus
textbooks.
The full modern statement of both parts of the fundamental theorem of calculus
with the definite integral defined as a limit in Cauchy's sense, referred to as the
"fundamental theorem of integral calculus," can be found in an appendix to an
article on trigonometric series published by Paul du Bois-Reymond in 1876. In
1880, he published an extended discussion and proof of this theorem in the widely
read journal Mathematische Annalen.
The fundamental theorem of integral calculus was popularized in English in the
early twentieth century by the publication of Hobson's The Theory of Functions
of a Real Variable and the Theory of Fourier's Series of 1907. This is a thorough
treatment of analysis that was very influential. Hobson gives statements of the
fundamental theorem for both the Riemann and Lebesgue integrals. Some evidence
that this may be the source of this phrase in English is given by the classic English-
language calculus textbook of the first half of the twentieth century, Granville's
Elements of the Differential and Integral Calculus. Granville does not mention
a "fundamental theorem" in his first edition of 1904, but in the second edition
of 1911, we do find it. Since Granville defines integration to be the reversal of
differentiation, his fundamental theorem is that the definite integral is equal to the
limit of the approximating summations.
It seems that G. H. Hardy may be responsible for dropping the adjective
"integral." In the first edition (1908) of Hardy's A Course of Pure Mathematics,
there is no mention of the phrase "fundamental theorem of calculus." It does
appear, without the adjective "integral," in the second edition, published in 1914.
Although the term "fundamental theorem of calculus" gained popularity as the
twentieth century progressed, it took a while before there was an agreed meaning.
Richard Courant's Differential and Integral Calculus of 1934 has a section entitled
"The Fundamental Theorems of the Differential and Integral Calculus" in which
he states Theorems 1.1 and 1.2 as well as several other related results:
• Different indefinite integrals of the same function differ only by an additive
constant.
• The integral of a continuous function f is itself a continuous function of the
upper limit.
• The difference of two primitives (antiderivatives) of the same function is always
a constant.
• Every primitive $F$ of a given function $f$ can be represented in the form
$F(x) = c + \int_a^x f(u)\,du$, where $c$ and $a$ are constants.
In the case of Theorem 1.2, Courant's hypothesis is that f is continuous. Theo-
rem 1.1 is stated as being true of any function f with antiderivative F. In fact, this
is not quite true. As we shall see, we either need to put some restrictions on f in
Theorem 1.1 or abandon the Riemann integral for one that is better-behaved.

Continuity and Differentiability


The fourth big question asks for the relationship between continuity and differen-
tiability. We know that a function that is differentiable at a given value of x must
also be continuous at that value, and it is clear that the converse does not hold.
The function $f(x) = |x|$ is continuous but not differentiable at $x = 0$. But how
nondifferentiable can a continuous function be?
Throughout the first half of the nineteenth century, it was generally believed
that a continuous function would be differentiable at most points.[4] Mathematicians
recognized that a function might have finitely many values at which it failed to
have a derivative. There might even be a sparse infinite set of points at which a
continuous function was not differentiable, but the mathematical community was
honestly surprised when, in 1875, Gaston Darboux and Paul du Bois-Reymond[5]
published examples of continuous functions that are not differentiable at any value.
The question then shifted to what additional assumptions beyond continuity
would ensure differentiability. Monotonicity was a natural candidate. Weierstrass
constructed a strictly increasing continuous function that is not differentiable at
any algebraic number, that is to say, at any number that is the root of a polynomial
with rational coefficients. It is not differentiable at 1/2 or … or … − 2.
Weierstrass's function is differentiable at … . Can we find a continuous, increasing
function that is not differentiable at any value? The surprising answer is No. In
fact, in a sense that later will be made precise, a continuous, monotonic function
is differentiable at "most" values of x. There are very important subtleties lurking
behind this fourth question.

[4] Although Bernhard Bolzano had shown how to construct a function that is everywhere continuous and nowhere
differentiable, his example only existed in a privately circulated manuscript and was not published until 1930.
[5] du Bois-Reymond's example was found by Weierstrass, who had described it in his lectures but never
published it.

Term-by-term Integration
Returning to Fourier series, we saw that the heuristic justification relied on inter-
changing summation and integration, integrating an infinite series of functions by
integrating each summand. This works for finite summations. It is not hard to find
infinite series for which term-by-term integration leads to a divergent series or,
even worse, a series that converges to the wrong value.
Weierstrass had shown that if the series converges uniformly, then term-by-term
integration is valid. The problem with this result is that the most interesting series,
especially Fourier series, often do not converge uniformly and yet term-by-term
integration is valid. Uniform convergence is sufficient, but it is very far from
necessary. As we shall see, finding useful conditions under which term-by-term
integration is valid is very difficult so long as we cling to the Riemann integral.
As Lebesgue would show in the opening years of the twentieth century, his
definition of the integral yields a simple, elegant solution, the Lebesgue dominated
convergence theorem.
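The failure of term-by-term integration shows up numerically with the sequence $f_n(x) = n x e^{-n x^2}$ from Exercise 1.1.18. Read it as the partial sums of a series whose terms are $f_1$ and $f_n - f_{n-1}$. For each fixed $x$ in $[0, 1]$, $f_n(x) \to 0$, so the limit function integrates to 0. Yet $\int_0^1 f_n(x)\,dx = (1 - e^{-n})/2$, which tends to $1/2$. The sketch below approximates these integrals with midpoint sums (the helper `midpoint_integral` is hypothetical). It illustrates the phenomenon and proves nothing.

```python
import math


def f(n, x):
    return n * x * math.exp(-n * x * x)


def midpoint_integral(g, a, b, m=100_000):
    """Riemann sum of g over [a, b] with m equal subintervals tagged at midpoints."""
    h = (b - a) / m
    return h * math.fsum(g(a + (i + 0.5) * h) for i in range(m))


for n in (1, 10, 100, 1000):
    approx = midpoint_integral(lambda x: f(n, x), 0.0, 1.0)
    exact = (1 - math.exp(-n)) / 2
    # f(n, 0.5) -> 0 as n grows, but the integrals approach 1/2, not 0
    print(n, f(n, 0.5), approx, exact)
```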

Exercises
1.1.1. Find the Fourier expansions for $f_1(x) = x$ and $f_2(x) = x^2$ over $[-\pi, \pi]$.

1.1.2. For the functions $f_1$ and $f_2$ defined in Exercise 1.1.1, differentiate each
summand in the Fourier series for $f_2$. Do you get the summands in the Fourier
series for $2f_1$? Differentiate each summand in the Fourier series for $f_1$. Do you get
the summand in the Fourier series for
1.1.3. Using the Fourier series expansion for $x^2$ (Exercise 1.1.1) evaluated at
$x = \pi$, show that

$$\sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{6}.$$

1.1.4. Show that if $k$ is an integer $\ge 1$, then

$$\int_{-\pi}^{\pi} \cos(kx)\,dx = \int_{-\pi}^{\pi} \sin(kx)\,dx = 0.$$

Show that if $n$ and $k$ are positive integers, then

$$\int_{-\pi}^{\pi} \sin(kx)\cos(nx)\,dx = 0.$$

Show that if $n$ and $k$ are distinct positive integers, then

$$\int_{-\pi}^{\pi} \cos(kx)\cos(nx)\,dx = \int_{-\pi}^{\pi} \sin(kx)\sin(nx)\,dx = 0.$$

1.1.5. Using the definition of continuity, justify the assertion that the characteristic
function of the rationals, Example 1.1, is not continuous at any real number.
1.1.6. Let $C$ be the circumscribed area, $I$ the inscribed area in Newton's illustration,
using intervals of length $\Delta x$. Newton claims that he has demonstrated that

$$\lim_{\Delta x \to 0} \frac{C}{I} = 1,$$

but what he actually proves is that

$$\lim_{\Delta x \to 0} (C - I) = 0.$$

Show that since $I$ is monotonically increasing as $\Delta x$ approaches 0, and $C$ is
monotonically decreasing, these two statements are equivalent.
1.1.7. The population of a certain city can be modeled using a population density
function, p(x), measured in people per square mile, where x is the distance from the
center of the city. The density function is valid in all directions for $0 \le x \le 5$ miles.
Set up a sum of products that approximates the total population and then convert
this sum of products into an integral.
1.1.8. Mass distributed along one side of a balance beam is modeled by a function
m(x), where x is the distance from the fulcrum, $0 \le x \le 6$ meters. Set up a sum
of products that approximates the total moment resulting from this mass and then
convert this sum of products into an integral.
1.1.9. Show that if a function is not bounded on [a, b], then the Riemann integral
on [a, b] cannot exist.
1.1.10. Consider the function f(x) = —1 <x <0, f(0) = 0. Since this
function is not bounded on [—1, 0], the Riemann integral does not exist (see
Exercise 1.1.9). Show that, nevertheless, the Cauchy integral of this function over
this interval does exist.
1.1.11. Explain why it is that if a function is Riemann integrable over [a, b], then
it must be Cauchy integrable over that interval.
1.1.12. There are many functions for which there is no simple, closed expression
for an antiderivative. The function $\sin(t^2)$ is one such example. Nevertheless, the
definite integral of this function can be evaluated to whatever precision is desired,
using the definition of the integral as a limit of sums of products. A certain object
travels along a straight line with velocity $v(t) = \sin(t^2)$, starting at $x = 3$ at time
t = 0. Explain how to use the fundamental theorem of calculus (either form) and a
definite integral to find the position at time t = 2, accurate to six digits.
1.1.13. Work through Poisson's proof of Theorem 1.1 in the specific case $F(x) = \ln(x)$,
$f(x) = F'(x) = 1/x$, $a = 1$, $b = 2$. Specifically: What is the value of $k$?
Use the Lagrange remainder theorem to find a bound on $R_j(t)$ that is valid for all
$n$ and $j$. Show that $\sum_{j=1}^{n} t^{1+k} R_j(t)$ approaches 0 as $n$ approaches infinity.
1.1.14. Explain how to use the Lagrange remainder theorem to justify Poisson's
assertion that if all derivatives of $F$ exist at every point in $[a, b]$, then

$$\lim_{n\to\infty} \sum_{j=1}^{n} t^{1+k} R_j(t) = 0.$$

1.1.15. Define

$$g(x) = \begin{cases} x, & x \text{ is rational},\\ 0, & x \text{ is not rational}.\end{cases}$$

For what values of $x$ is $g$ continuous? For what values of $x$ is $g$ differentiable?
1.1.16. Define

$$h(x) = \begin{cases} x^2, & x \text{ is rational},\\ 0, & x \text{ is not rational}.\end{cases}$$

For what values of $x$ is $h$ continuous? For what values of $x$ is $h$ differentiable?
1.1.17. Prove that if a function is not continuous at x = a then it cannot be differ-
entiable at x = a.
1.1.18. Show that

$$\int_0^1 \Bigl(\lim_{n\to\infty} n x e^{-nx^2}\Bigr)\,dx \ne \lim_{n\to\infty} \int_0^1 n x e^{-nx^2}\,dx.$$

1.2 Presumptions
In this book, we presume that the reader is familiar with certain notations, defini-
tions, and theorems. The most important of these are summarized here.

Notation
$\{x \in [a, b] \mid f(x) > 0\}$, set notation; to the left of $\mid$ is the description of the general
set in which this particular set sits, to the right is the condition or conditions
satisfied by elements of this set. Braces are also used to list the elements of the
set; thus, $\{1, 2, \ldots, 10\}$ is the set of positive integers from 1 to 10.
$(1/n)_{n=1}^{\infty}$, sequence notation in which the order is important; this sequence could
also be written as $(1, 1/2, 1/3, \ldots)$. When it is clear that we are working with a
sequence, this may be written without specifying the limits on $n$:
$(a_1, a_2, \ldots) = (a_n)$.

$\mathbb{N}$, the set of positive integers, $\{1, 2, 3, \ldots\}$.

$\mathbb{Q}$, the set of rational numbers.

$\mathbb{R}$, the set of real numbers.

$\mathbb{C}$, the set of complex numbers.

$(f_n) \to f$, the sequence of functions converges (pointwise) to $f$.


$S \cap T$, the intersection of sets $S$ and $T$; the set of elements in both $S$ and $T$.
$S \cup T$, the union of sets $S$ and $T$; the set of elements in either $S$ or $T$.
$S^c$, the complement of $S$; for the purposes of this book, the complement is always
taken in $\mathbb{R}$; $S^c$ is the set of real numbers that are not elements of $S$.
$S - T$, the set of elements of $S$ that are not in $T$; $S - T = S \cap T^c$.
$\emptyset$, the empty set; the set that has no elements.
$f(S)$, the image of $S$; $f(S) = \{f(x) \mid x \in S\}$.
$\lfloor a \rfloor$ denotes the floor of $a$, the greatest integer less than or equal to $a$; similarly,
$\lceil a \rceil$ denotes the ceiling of $a$, the least integer greater than or equal to $a$.

Definitions
continuity: The function $f$ is continuous at $c$ if for every $\epsilon > 0$ there is a response
$\delta > 0$ such that $|x - c| < \delta$ implies that $|f(x) - f(c)| < \epsilon$.

uniform continuity: The function $f$ is uniformly continuous over the set $S$ if for
every $\epsilon > 0$ there is a response $\delta > 0$ such that for every $x, c \in S$, $|x - c| < \delta$
implies that $|f(x) - f(c)| < \epsilon$.

intermediate value property: A function $f$ has the intermediate value property
on the interval $[a, b]$ if given any two points $x_1, x_2 \in [a, b]$ and any number $N$
satisfying $f(x_1) < N < f(x_2)$, there is at least one value $c$ between $x_1$ and $x_2$
such that $f(c) = N$.

monotonic sequence: a sequence that is either increasing (each element is greater
than or equal to the previous element) or decreasing (each element is less than
or equal to the previous element).

monotonic function: either an increasing function ($x < y \implies f(x) \le f(y)$) or a
decreasing function ($x < y \implies f(x) \ge f(y)$). A function is piecewise monotonic
on $[a, b]$ if we can partition this interval into finitely many subintervals so
that the function is monotonic on each subinterval.

convergence: The sequence $(a_n)$ converges to $A$ if for every $\epsilon > 0$ there is
a response $N$ such that $n \ge N$ implies that $|a_n - A| < \epsilon$. The series
$\sum_{k=1}^{\infty} c_k$ converges to $S$ if the sequence of partial sums, $(S_n)$, $S_n = \sum_{k=1}^{n} c_k$,
converges to $S$.

pointwise convergence: The sequence of functions $(f_n)$ converges pointwise to $F$
if at each value of $x$, the sequence $(f_n(x))$ converges to $F(x)$. Note that the
response $N$ may depend on both $\epsilon$ and $x$.

uniform convergence: The sequence of functions $(f_n)$ converges uniformly to
$F$ over the set $S$ if for every $\epsilon > 0$ there is a response $N$ such that for every $x \in S$
and every $n \ge N$, we have that $|f_n(x) - F(x)| < \epsilon$. Note that the response
may depend on $\epsilon$ but not on $x$.
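The two kinds of convergence differ in whether the response $N$ may depend on $x$. As an illustration of our own choosing (not an example from the text), take $f_n(x) = x^n$ on $[0, 1)$, which converges pointwise to $F(x) = 0$. The hypothetical helper `pointwise_response` finds the least $N$ that works at a given $x$, and that $N$ grows without bound as $x$ approaches 1. The points $x_n = 2^{-1/n}$, where $f_n(x_n) = 1/2$, show that no single $N$ can serve every $x$ when $\epsilon \le 1/2$. So the convergence is not uniform.

```python
def pointwise_response(x, eps):
    """Least N with x**N < eps, for 0 <= x < 1. Because x**n decreases in n,
    every n >= N then satisfies |x**n - 0| < eps."""
    n = 1
    while x ** n >= eps:
        n += 1
    return n


eps = 0.01
for x in (0.5, 0.9, 0.99, 0.999):
    print(x, pointwise_response(x, eps))  # the response grows as x -> 1

# For every n, the point x_n = 0.5 ** (1/n) lies in [0, 1) and x_n ** n = 1/2,
# so the sup over [0, 1) of |f_n(x) - 0| is at least 1/2 for every n.
for n in (10, 100, 1000):
    x_n = 0.5 ** (1.0 / n)
    print(n, x_n, x_n ** n)
```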

max $S$: the greatest element in $S$; min $S$: the least element in $S$.

least upper bound or sup $S$: the least value that is greater than or equal to every
element of $S$; greatest lower bound or inf $S$: the greatest value that is less than
or equal to every element of $S$. We also write

$$\sup_{x \in S} f(x) = \sup\{f(x) \mid x \in S\}, \qquad \inf_{x \in S} f(x) = \inf\{f(x) \mid x \in S\}.$$

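lim sup ($\overline{\lim}$), lim inf ($\underline{\lim}$): For a sequence $(a_n)$,

$$\overline{\lim}_{n\to\infty}\, a_n = \inf_{n \ge 1}\Bigl(\sup_{k \ge n} a_k\Bigr), \qquad \underline{\lim}_{n\to\infty}\, a_n = \sup_{n \ge 1}\Bigl(\inf_{k \ge n} a_k\Bigr).$$

For a function $f$,

$$\underline{\lim}_{x \to c}\, f(x) = \sup_{\epsilon > 0}\Bigl(\inf\{f(x) \mid 0 < |x - c| < \epsilon\}\Bigr).$$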
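On a computer the nested $\inf$ and $\sup$ can only be approximated by truncating each tail, and that truncation is an assumption: the sketch below looks at $a_k$ only for $n \le k <$ `HORIZON`. It uses an illustrative sequence of our own, $a_n = (-1)^n(1 + 1/n)$. The tail suprema $\sup_{k \ge n} a_k$ decrease toward $\overline{\lim}\, a_n = 1$, and the tail infima increase toward $\underline{\lim}\, a_n = -1$. Truncation does no harm here because each tail extremum is attained at the first even or odd index $k \ge n$.

```python
def a(n):
    """Illustrative sequence a_n = (-1)^n (1 + 1/n)."""
    return (-1) ** n * (1 + 1 / n)


def tail_sup(n, horizon):
    """sup_{k >= n} a_k, approximated by the max over n <= k < horizon."""
    return max(a(k) for k in range(n, horizon))


def tail_inf(n, horizon):
    """inf_{k >= n} a_k, approximated by the min over n <= k < horizon."""
    return min(a(k) for k in range(n, horizon))


HORIZON = 10_000
for n in (1, 10, 100, 1000):
    # tail_sup decreases toward 1 (lim sup); tail_inf increases toward -1 (lim inf)
    print(n, tail_sup(n, HORIZON), tail_inf(n, HORIZON))
```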

Cauchy sequence: The sequence $(a_n)$ is Cauchy if for each $\epsilon > 0$ there is a response
$N$ such that for every $m, n \ge N$ we have that $|a_m - a_n| < \epsilon$.

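nested interval principle: Given any nested sequence of closed intervals in $\mathbb{R}$,

$$[a_1, b_1] \supseteq [a_2, b_2] \supseteq [a_3, b_3] \supseteq \cdots,$$

there is at least one real number contained in all of these intervals,

$$\bigcap_{n=1}^{\infty} [a_n, b_n] \ne \emptyset.$$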
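Bisection produces exactly such a nested sequence. In this sketch (our own example, with a hypothetical function name), each step halves $[\mathit{lo}, \mathit{hi}]$ and keeps the closed half on which $\mathit{lo}^2 \le 2 \le \mathit{hi}^2$. Every interval therefore lies inside the previous one, and the one number common to all of them is $\sqrt{2}$. The sketch uses exact rational arithmetic. Every endpoint is rational, yet the common point is not, which is why the principle is a statement about $\mathbb{R}$ rather than $\mathbb{Q}$.

```python
from fractions import Fraction


def nested_intervals_sqrt2(steps):
    """Halve [1, 2] repeatedly, keeping the closed half [lo, hi] with
    lo^2 <= 2 <= hi^2. Each interval lies inside the previous one."""
    lo, hi = Fraction(1), Fraction(2)
    intervals = [(lo, hi)]
    for _ in range(steps):
        mid = (lo + hi) / 2
        if mid * mid <= 2:
            lo = mid
        else:
            hi = mid
        intervals.append((lo, hi))
    return intervals


ivs = nested_intervals_sqrt2(40)
lo, hi = ivs[-1]
print(float(lo), float(hi), float(hi - lo))
```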

vector space: A vector space is a set that is closed under addition, closed under
multiplication by scalars from a field such as IR, and that satisfies the following
conditions where X, Y, Z, 0 denote vectors and a, b, 1 denote scalars:

1. commutativity: X + Y = Y + X,
2. associativity of vectors: (X + Y) + Z = X + (Y + Z),
3. additive identity: 0+ X = X + 0 = X,
4. additive inverse: X + (—X) = 0,
5. associativity of scalars: a(bX) = (ab)X,
6. distributivity of scalars: (a + b)X = aX + bX,
7. distributivity of vectors: a(X + Y) = aX + aY,
8. scalar identity: 1X = X.

Theorems
The designation ARATRA 3.1 means that this is theorem (or proposition, lemma, or
corollary) 3.1 in A Radical Approach to Real Analysis.

Theorem 1.3 (DeMorgan's Laws). Let $\{S_k\}$ be any finite or infinite collection of
sets, then

$$\Bigl(\bigcup_k S_k\Bigr)^c = \bigcap_k S_k^c, \qquad \Bigl(\bigcap_k S_k\Bigr)^c = \bigcup_k S_k^c.$$

Theorem 1.4 (Distributivity). Let $S, T, U$ be any sets, then

$$S \cap (T \cup U) = (S \cap T) \cup (S \cap U), \qquad S \cup (T \cap U) = (S \cup T) \cap (S \cup U).$$
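On finite sets, the identities of Theorems 1.3 and 1.4 can be spot-checked with Python's built-in `set` type. This is a check on examples, not a proof, and it makes one substitution. Because a program cannot hold $\mathbb{R}$, complements are taken in a small finite universe, `UNIVERSE`, which is a hypothetical stand-in. The third set is named `W` so that it does not clash with that universe.

```python
from functools import reduce

UNIVERSE = set(range(20))  # finite stand-in for R; complements are taken here


def complement(s):
    return UNIVERSE - s


family = [{0, 1, 2, 3}, {2, 3, 5, 7, 11}, {1, 3, 5, 7, 9, 11, 13}]
union_all = reduce(set.union, family)
inter_all = reduce(set.intersection, family)
complements = [complement(s) for s in family]

# Theorem 1.3 (DeMorgan's laws) for this finite collection
print(complement(union_all) == reduce(set.intersection, complements))
print(complement(inter_all) == reduce(set.union, complements))

# Theorem 1.4 (distributivity), with W in the role of U
S, T, W = family
print(S & (T | W) == (S & T) | (S & W))
print(S | (T & W) == (S | T) & (S | W))
```

All four lines print `True`.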

Theorem 1.5 (Mean Value Theorem, ARATRA 3.1). Given a function $f$ that is
differentiable at all points strictly between $a$ and $x$ and continuous at all points on
the closed interval from $a$ to $x$, there exists a real number $c$ strictly between $a$ and
$x$ such that

$$\frac{f(x) - f(a)}{x - a} = f'(c). \tag{1.9}$$

Theorem 1.6 (Intermediate Value Theorem, ARATRA 3.3). If $f$ is continuous on
the interval $[a, b]$, then $f$ has the intermediate value property on this interval.

Theorem 1.7 (Darboux's Theorem, ARATRA 3.14). If $f$ is differentiable on $[a, b]$,
then $f'$ has the intermediate value property on $[a, b]$.

Theorem 1.8 (The Cauchy Criterion, ARATRA 4.2). A sequence of real numbers
converges if and only if it is a Cauchy sequence.

Theorem 1.9 (Absolute Convergence Theorem, ARATRA 4.4). If
$|a_1| + |a_2| + |a_3| + \cdots$ converges then so does $a_1 + a_2 + a_3 + \cdots$.

Theorem 1.10 (Continuity of Infinite Series, ARATRA 5.6). If $f_1 + f_2 + f_3 + \cdots$
converges uniformly to $F$ over the interval $(\alpha, \beta)$ and if each of the summands is
continuous at every point in $(\alpha, \beta)$, then the function $F$ is continuous at every point
in $(\alpha, \beta)$.

Theorem 1.11 (Term-by-term Differentiation, ARATRA 5.7). Let $f_1 + f_2 + f_3 + \cdots$
be a series of functions that converges at $x = a$ and for which the series of
derivatives, $f_1' + f_2' + f_3' + \cdots$, converges uniformly over an open interval $I$ that
contains $a$. It follows that
1. $F = f_1 + f_2 + f_3 + \cdots$ converges uniformly over the interval $I$,
2. $F$ is differentiable at $x = a$, and
3. for all $x \in I$, $F'(x) = f_1'(x) + f_2'(x) + f_3'(x) + \cdots$.

Theorem 1.12 (Term-by-term Integration, ARATRA 5.8). Let $f_1 + f_2 + f_3 + \cdots$
be uniformly convergent over the interval $[a, b]$, converging to $F$. If each $f_k$ is
integrable over $[a, b]$, then so is $F$ and

$$\int_a^b F(x)\,dx = \sum_{k=1}^{\infty} \int_a^b f_k(x)\,dx.$$

Theorem 1.13 (Continuity on $[a, b]$ $\Rightarrow$ Uniform Continuity, ARATRA 6.3). If
$f$ is continuous over the closed and bounded interval $[a, b]$, then it is uniformly
continuous over this interval.

Theorem 1.14 (Continuous $\Rightarrow$ Integrable, ARATRA 6.6). If $f$ is a continuous
function on the closed, bounded interval $[a, b]$, then $f$ is integrable
over $[a, b]$.

Theorem 1.15 (Continuity of Integral, ARATRA 6.8). Let $f$ be a bounded integrable
function on $[a, b]$ and define $F$ for $x$ in $[a, b]$ by

$$F(x) = \int_a^x f(t)\,dt.$$

Then $F$ is continuous at every point between $a$ and $b$.

Exercises
1.2.1. Give an example of a function and an interval for which the function is
continuous but not uniformly continuous on the interval.
1.2.2. Give an example of a sequence that converges but is not monotonic.
1.2.3. Prove or find a counterexample to the statement: Every infinite sequence
contains an infinite monotonic subsequence.
1.2.4. Give an example of a sequence of functions and an interval for which the
sequence converges pointwise but not uniformly on the interval.
1.2.5. Prove that 2 by showing how to find a response N for each
E >0.
1.2.6. The lim sup, $\overline{\lim}_{n\to\infty}\, a_n$, can also be defined as the value $A$ such that given
any $\epsilon > 0$, there is a response $N$ such that $n \ge N$ implies that $a_n < A + \epsilon$, and for
every $M \in \mathbb{N}$, there is an $m > M$ such that $A - \epsilon < a_m$. Show that this definition
is equivalent to the definition

$$A = \inf_{n \ge 1}\Bigl(\sup_{k \ge n} a_k\Bigr).$$

1.2.7. Prove that if $(a_n)$ is bounded, then $\overline{\lim}_{n\to\infty}\, a_n$ exists.

1.2.8. Prove that if $A = \overline{\lim}_{n\to\infty}\, a_n$ exists, then we can find a subsequence that
converges to $A$.
1.2.9. Show that the set of all real-valued continuous functions defined on [0, 1] is
a vector space.
1.2.10. Use the nested interval principle to prove that every Cauchy sequence
converges.
1.2.11. Show that the nested interval principle does not necessarily hold if we
replace closed intervals with open intervals.

1.2.12. Justify DeMorgan's laws (Theorem 1.3). Show that

(ysk)C x g
(ysk)C

1.2.13. Justify the distributivity theorem (Theorem 1.4).

1.2.14. Prove that given any two sets F1 and F2, if

S1 = F1 fl and S2 = fl

then

F1 = (F2 U fl

1.2.15. Give an example of a function $f$ and an interval $[a, b]$ such that $f$ is
continuous on $[a, b]$, differentiable at all but one point of $(a, b)$, and for which
there is no $c \in (a, b)$ for which

$$\frac{f(b) - f(a)}{b - a} = f'(c).$$
1.2.16. Give an example of a function f and an interval [a, b] such that f
has the intermediate value property on [a, b] but it is not continuous on this
interval.

1.2.17. Use the mean value theorem, Theorem 1.5, to prove the following weaker
form of Darboux's theorem: If $f'$ is the derivative of $f$ on an open interval containing
$c$ and if $\lim_{x \to c^-} f'(x)$ and $\lim_{x \to c^+} f'(x)$ exist, then these one-sided limits
must be equal.

1.2.18. Give an example of a series that converges but does not converge abso-
lutely.

1.2.19. Give an example of a series of continuous functions and an interval such
that the series does not converge uniformly over the interval, but it does converge
pointwise to a continuous function on this interval.

1.2.20. Give an example of a series of continuous functions and an interval such
that the series does not converge to a continuous function on this interval.

1.2.21. Give an example of a series, $f_1 + f_2 + \cdots$, of differentiable functions
and an interval such that the series converges uniformly to $f$ over the interval,
but

$$f'(x) \ne f_1'(x) + f_2'(x) + \cdots$$

for all $x$ in this interval.



1.2.22. Give an example of a series, $f_1 + f_2 + \cdots$, of integrable functions and an
interval $[a, b]$ such that the series does not converge uniformly to $f$ over $[a, b]$ but

$$\int_a^b \sum_{k=1}^{\infty} f_k(x)\,dx = \sum_{k=1}^{\infty} \int_a^b f_k(x)\,dx.$$
1.2.23. Prove that if f is integrable over [a, b] then there exists c e [a, b] for which
1

Ja a
2
The Riemann Integral

Bernhard Riemann received his doctorate in 1851, his Habilitation in 1854. The
habilitation confers recognition of the ability to create a substantial contribution
to research beyond the doctoral thesis, and it is a necessary prerequisite for
appointment as a professor in a German university. Riemann chose as his habilitation
thesis the problem of Fourier series. It was titled Über die Darstellbarkeit einer
Function durch eine trigonometrische Reihe (On the representability of a function
by a trigonometric series), and, strictly speaking, it answered the broader
question: When can a function over $(-\pi, \pi)$ be represented as a series of the form
$a_0/2 + \sum_{n=1}^{\infty} \bigl(a_n \cos(nx) + b_n \sin(nx)\bigr)$? This is where we find the Riemann
integral, introduced in a short section before the main body of the thesis, part of
the groundwork that he needed to lay before he could tackle the real problem of
representability by a trigonometric series.
Riemann had studied with Dirichlet in Berlin before going to Göttingen to complete
his doctorate under the direction of Gauss. In the fall of 1852, Dirichlet visited
Göttingen. Shortly afterward, Riemann wrote to his friend Richard Dedekind,

The other morning Dirichlet stayed with me for about two hours; he gave me the notes necessary
for my Habilitation so completely that my work has become much easier; otherwise, for some
things I would have searched for a long time in the library.[1]

Riemann was almost certainly referring to the extensive introduction to his thesis
in which he describes the progress that had been made in understanding Fourier
series until that time. But it is also clear that Dirichlet had continued to think about
this problem, and he may have had some useful advice.
Riemann's thesis on trigonometric series was not published until 1868, two years
after his death at the age of 39. Dedekind was responsible for this publication.

[1] Dedekind (1876, p. 578), as quoted in Hochkirchen (2003, p. 261).


Richard Dedekind (1831–1916) and Bernhard Riemann both studied with Gauss at
Göttingen and then worked with Dirichlet, who succeeded to Gauss's chair. They
developed a strong friendship. In 1862, Dedekind took a position at the Brunswick
Polytechnikum where he would remain for the rest of his career. Today he is
best known for his work in number theory and modern algebra, especially for
establishing the theory of the ring of integers of an algebraic number field.
In 1870, three significant papers appeared that built on Riemann's accomplishments:
Hermann Hankel's Untersuchungen über die unendlich oft oscillirenden
und unstetigen Funktionen (Investigations on infinitely often oscillating and discontinuous
functions), Eduard Heine's Über trigonometrische Reihen (On trigonometric
series), and Georg Cantor's Über einen die trigonometrischen Reihen betreffenden
Lehrsatz (On a theorem concerning trigonometric series). These papers
accomplished two important tasks. The first was to clarify the concept of uniform
convergence and the related issue of when term-by-term integration is legitimate.
The second was to turn the question of integrability of a function to the study of
the set of points at which the function is discontinuous, thus opening the way to
the development of set theory and a deeper understanding of the structure of the
real numbers.
A fourth seminal paper directly inspired by Riemann's thesis was Gaston
Darboux's Mémoire sur les fonctions discontinues (Memoir on discontinuous functions)
of 1875. In 1873, Darboux had published a translation of Riemann's thesis
into French. It is clear that he studied it very carefully. His 1875 paper greatly sim-
plified the treatment of the Riemann integral. In discussing the Riemann integral,
we shall rely on Darboux's definitions and insights.
Gaston Darboux (1842–1917) studied at the École Normale Supérieure and taught
there from 1872 to 1878. He then went to the Sorbonne where, in 1880, he succeeded
Michel Chasles as chair of higher geometry. Darboux is best known for his work
in differential geometry, but among his many contributions to mathematics, he also
edited Fourier's Collected Works.

2.1 Existence
Riemann devotes three brief pages to the definition of the definite integral, the
definition of an improper integral, and the statement and proof of the necessary
and sufficient condition for integrability. He then spends one page describing a
function that is discontinuous at every rational number with an even denominator
but which is integrable, thus showing that while continuity is a sufficient condition
for integrability, it is far from necessary. As Darboux demonstrated, there is a lot
to mine from these four pages.

Definition: Integration (Riemann)


A function f is Riemann integrable over the interval [a, b] and its integral has the
value V if for every error bound ε > 0, there is a response δ > 0 such that for any
partition $(a = x_0 < x_1 < \cdots < x_n = b)$ with subintervals of length less than δ (that is to
say, $|x_j - x_{j-1}| < \delta$ for all j) and for any set of tags $x_1^* \in [x_0, x_1]$, $x_2^* \in [x_1, x_2]$, ...,
$x_n^* \in [x_{n-1}, x_n]$, the corresponding Riemann sum lies within ε of the value V:

$$\left| \sum_{j=1}^{n} f(x_j^*)\,(x_j - x_{j-1}) - V \right| < \epsilon.$$

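Given a function f defined on [a, b], we can find a Riemann sum approximation
to the definite integral $\int_a^b f(x)\,dx$ by choosing a partition of the interval,
$a = x_0 < x_1 < \cdots < x_n = b$, and a set of tags $x_1^* \in [x_0, x_1], \ldots, x_n^* \in [x_{n-1}, x_n]$. The Riemann
sum is then given by

$$\sum_{j=1}^{n} f(x_j^*)\,(x_j - x_{j-1}).$$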

Using the Cauchy criterion for convergence, the value V will exist if given any
ε > 0, there is a response δ > 0 so that any two Riemann sums with intervals of
length less than δ will differ by less than ε. The value of the integral is denoted
by

$$V = \int_a^b f(x)\,dx.$$
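As a minimal sketch of the definition, the Python below forms Riemann sums for the uniform partition of [0, 1] into n = 1000 subintervals with three different choices of tags. The test function f(x) = x², the helper name `riemann_sum`, and the random-tag rule are illustrative assumptions of ours, not anything from Riemann or Darboux; the point is only that, once every subinterval is short, the choice of tags barely moves the sum.

```python
import random

def riemann_sum(f, partition, tags):
    """Return the Riemann sum  sum_j f(tag_j) * (x_j - x_{j-1})."""
    total = 0.0
    for (left, right), tag in zip(zip(partition, partition[1:]), tags):
        # Each tag must lie in its own subinterval [x_{j-1}, x_j].
        assert left <= tag <= right, "tag outside its subinterval"
        total += f(tag) * (right - left)
    return total

def square(x):
    return x * x

n = 1000
xs = [j / n for j in range(n + 1)]   # uniform partition of [0, 1]
rng = random.Random(0)               # fixed seed for reproducibility
left_tags = xs[:-1]
right_tags = xs[1:]
# min(...) guards against floating-point rounding pushing a tag past x_j
random_tags = [min(b, a + (b - a) * rng.random()) for a, b in zip(xs, xs[1:])]

for name, tags in (("left", left_tags), ("right", right_tags), ("random", random_tags)):
    print(name, riemann_sum(square, xs, tags))
```

Because x² is increasing on [0, 1], every one of these sums lies between the left-tag and right-tag sums, and all three land within 1/n of the exact value 1/3.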

The greatest difficulty with this definition is handling the variability in the tags,
since $x_j^*$ can be any value in the interval $[x_{j-1}, x_j]$. Darboux saw that the way
to do this is to work with the least upper bound² (or supremum) and the greatest
lower bound (or infimum) of the set $\{f(x) \mid x_{j-1} \le x \le x_j\}$.
Every Riemann sum for this partition lies between the upper and lower Darboux
sums (see top of next page). While it may not be possible to find a Riemann sum
that actually equals the upper or the lower Darboux sum, we can find Riemann
sums for this partition that come arbitrarily close to the Darboux sums.
The function f is Riemann integrable if and only if we can force all Riemann
sums to be within ε of our specified value $V = \int_a^b f(x)\,dx$ simply by restricting our

² Actually, Darboux at this time did not make a clear distinction between the supremum and maximum of a set.

Definition: Darboux sums


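Given a function f defined on [a, b] and a partition $P = (a = x_0 < x_1 < \cdots < x_n = b)$ of this interval, we define

$$M_j = \sup\{f(x) \mid x_{j-1} \le x \le x_j\}, \qquad m_j = \inf\{f(x) \mid x_{j-1} \le x \le x_j\}.$$

The upper Darboux sum, $\overline{S}(P; f)$, is

$$\overline{S}(P; f) = \sum_{j=1}^{n} M_j\,(x_j - x_{j-1}), \tag{2.1}$$

and the lower Darboux sum, $\underline{S}(P; f)$, is

$$\underline{S}(P; f) = \sum_{j=1}^{n} m_j\,(x_j - x_{j-1}). \tag{2.2}$$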

partitions to those with interval length less than an appropriately chosen response δ.
This will happen if and only if the upper and lower Darboux sums for these
partitions are within ε of the specified value V. It follows that f is Riemann
integrable if and only if we can make the difference between the upper and lower
Darboux sums as small as we wish by controlling the length of the intervals in the
partition,

$$x_j - x_{j-1} < \delta \text{ for all } j \implies \sum_{j=1}^{n} (M_j - m_j)(x_j - x_{j-1}) < \epsilon.$$

In order to guarantee that this sum is less than ε, we need some control on the
size of $M_j - m_j$, what is called the oscillation of the function over the interval
$[x_{j-1}, x_j]$. If we can force the oscillation to be as small as we wish by taking
sufficiently short intervals, then we have integrability. We choose δ so that
$M_j - m_j < \epsilon/(b - a)$ on every subinterval. It follows that

$$\sum_{j=1}^{n} (M_j - m_j)(x_j - x_{j-1}) < \frac{\epsilon}{b - a} \sum_{j=1}^{n} (x_j - x_{j-1}) = \epsilon.$$

This implies that every continuous function is integrable (see Exercise 2.1.9).
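For an increasing function the Darboux sums can be computed exactly, since the supremum on $[x_{j-1}, x_j]$ is $f(x_j)$ and the infimum is $f(x_{j-1})$. The sketch below relies on that assumption (it is not valid for a general f), uses Python's `fractions.Fraction` so that the arithmetic is exact, and uses helper names of our own. For f(x) = x² on [0, 1], the gap between the sums telescopes to (f(1) − f(0))/n = 1/n, so it shrinks as the partition is refined.

```python
from fractions import Fraction

def darboux_sums_increasing(f, a, b, n):
    """Upper and lower Darboux sums (2.1) and (2.2) of an increasing f on the uniform
    partition of [a, b] into n subintervals, computed exactly.
    Assumes f is increasing, so M_j = f(x_j) and m_j = f(x_{j-1})."""
    a, b = Fraction(a), Fraction(b)
    xs = [a + (b - a) * j / n for j in range(n + 1)]
    upper = sum(f(xs[j]) * (xs[j] - xs[j - 1]) for j in range(1, n + 1))
    lower = sum(f(xs[j - 1]) * (xs[j] - xs[j - 1]) for j in range(1, n + 1))
    return upper, lower

def square(x):
    return x * x   # increasing on [0, 1]

for n in (4, 40, 400):
    upper, lower = darboux_sums_increasing(square, 0, 1, n)
    print(n, upper, lower, upper - lower)
```

For n = 4 the sums are 15/32 and 7/32; in every row the last column is exactly 1/n.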
What about a discontinuous function? If f is discontinuous, then there will be
intervals that include the points of discontinuity where the oscillation cannot be
made as small as we wish. If our function is integrable and our partition includes
intervals where the oscillation is greater than or equal to σ, then the sum of the

lengths of these intervals must be less than ε/σ. If $\sum'$ denotes the sum over the
intervals on which the oscillation is at least σ, then

$$\sum{}'(x_j - x_{j-1}) \le \sum{}' \frac{M_j - m_j}{\sigma}\,(x_j - x_{j-1}) < \frac{\epsilon}{\sigma}.$$

If we choose a smaller bound for the difference between the upper and lower
Darboux sums, then we get an even smaller bound on the sum of the lengths of the
intervals on which the oscillation was at least σ. Since we can force the difference
between the upper and lower Darboux sums to be as small as we wish, we can
also force the sum of the lengths of the intervals on which the oscillation exceeds
σ to be as small as we wish, just by controlling the lengths of the intervals in the
partition.
Riemann realized that this also works the other way. If for every σ > 0, we can
force the sum of the lengths of the intervals on which the oscillation exceeds σ to
be as small as we wish by restricting the lengths of the intervals in the partition,
then we can force the upper and lower Darboux sums to be within any specified ε
of each other. We define D to be the difference between the least upper bound
and the greatest lower bound of $\{f(x) \mid a \le x \le b\}$, so that $M_j - m_j \le D$ for all
j. We let $\sigma = \epsilon/(2(b - a))$ and choose a limit δ on the partition intervals so that those
on which the oscillation is at least σ have total length less than ε/(2D). We split the
difference in Darboux sums into $\sum'$ over those intervals where the oscillation is
at least σ and $\sum''$ over the intervals where the oscillation is strictly less than σ:

$$\begin{aligned}
\sum_{j=1}^{n} (M_j - m_j)(x_j - x_{j-1}) &= \sum{}'(M_j - m_j)(x_j - x_{j-1}) + \sum{}''(M_j - m_j)(x_j - x_{j-1}) \\
&\le D \sum{}'(x_j - x_{j-1}) + \sigma \sum{}''(x_j - x_{j-1}) \\
&< D \cdot \frac{\epsilon}{2D} + \frac{\epsilon}{2(b - a)}\,(b - a) = \epsilon.
\end{aligned} \tag{2.3}$$

We have proven Riemann's criterion for integrability.

Theorem 2.1 (Conditions for Riemann Integrability). Let f be a bounded function
on [a, b]. This function is integrable over [a, b] if and only if for any σ > 0,
a bound on the oscillation, and for any ν > 0, a bound on the sum of the lengths
of the intervals where the oscillation exceeds σ, we can find a response δ so that
for any partition of [a, b] with subintervals of length less than δ, the subintervals
on which the oscillation is at least σ have a combined length that is strictly less
than ν.
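To see the split into $\sum'$ and $\sum''$ at work, the sketch below uses a hypothetical test function of our own, equal to x for x < 1/2 and to x + 1 for x ≥ 1/2 on [0, 1]. Because this function is increasing, the oscillation on $[x_{j-1}, x_j]$ is exactly $f(x_j) - f(x_{j-1})$, which is what the code assumes. With σ = 1/10, once n > 10 only the subinterval containing the jump has oscillation at least σ, so the total length of these subintervals is 1/n and shrinks with the mesh, as Theorem 2.1 requires.

```python
from fractions import Fraction

def jump_function(x):
    """Hypothetical increasing test function on [0, 1]: x below 1/2, x + 1 from 1/2 on."""
    return x + (1 if x >= Fraction(1, 2) else 0)

def split_oscillation(n, sigma):
    """For the uniform partition of [0, 1] into n pieces, return
    (total length of subintervals whose oscillation is at least sigma,
     sum of (M_j - m_j)(x_j - x_{j-1})).
    Because jump_function is increasing, its oscillation on [x_{j-1}, x_j]
    is exactly jump_function(x_j) - jump_function(x_{j-1})."""
    xs = [Fraction(j, n) for j in range(n + 1)]
    bad_length = Fraction(0)
    darboux_gap = Fraction(0)
    for left, right in zip(xs, xs[1:]):
        osc = jump_function(right) - jump_function(left)
        darboux_gap += osc * (right - left)
        if osc >= sigma:              # this subinterval belongs to the primed sum
            bad_length += right - left
    return bad_length, darboux_gap

sigma = Fraction(1, 10)
for n in (10, 100, 1000):
    bad, gap = split_oscillation(n, sigma)
    # Here D = jump_function(1) - jump_function(0) = 2 and b - a = 1, as in the bound (2.3).
    print(n, bad, gap, gap <= 2 * bad + sigma)
```

For n = 10 every subinterval has oscillation at least 1/10, so all of [0, 1] counts toward $\sum'$; at n = 100 a single subinterval of length 1/100 remains and the Darboux gap is 1/50.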

The Darboux Integrals


In 1881, Vito Volterra showed how to use Darboux sums to create upper and lower
integrals that exist for every function. Looking at the upper Darboux sums, we
see that as the partition gets finer (including more points), the value of the upper
sum can only decrease or stay the same, approaching the value of the Riemann integral when that integral exists.
This suggests taking the greatest lower bound of all the upper Darboux sums. If
the Riemann integral exists, it will equal this greatest lower bound. Similarly, if
the Riemann integral exists, then it will equal the least upper bound of the lower
Darboux sums. Although first described by Volterra, these integrals usually carry
Darboux's name because they are defined in terms of his sums.
It is not too hard to see that if f is Riemann integrable, then the upper and lower
Darboux integrals must be equal. It will take some work to show the implication
in the other direction, that if the upper and lower Darboux integrals are equal, then
the function is Riemann integrable. But this work will be worth it, for it produces
a very useful test for integrability.

Theorem 2.2 (Darboux Integrability Condition). Let f be a bounded function
on [a, b]. This function is Riemann integrable over this interval if and only if the
upper and lower Darboux integrals are equal.

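Proof. We take the easy direction first. We leave it as Exercise 2.1.11 to prove that

$$\underline{\int_a^b} f(x)\,dx \le \overline{\int_a^b} f(x)\,dx.$$

It follows that for any partition P, we have

$$\underline{S}(P; f) \le \underline{\int_a^b} f(x)\,dx \le \overline{\int_a^b} f(x)\,dx \le \overline{S}(P; f). \tag{2.4}$$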

Definition: Upper and lower Darboux integrals


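Let $\mathcal{P}$ denote the set of all partitions of [a, b]. The upper Darboux integral of f
over [a, b] is defined by

$$\overline{\int_a^b} f(x)\,dx = \inf_{P \in \mathcal{P}} \overline{S}(P; f).$$

Similarly, the lower Darboux integral is defined by

$$\underline{\int_a^b} f(x)\,dx = \sup_{P \in \mathcal{P}} \underline{S}(P; f).$$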

Figure 2.1. Solid vertical bars mark the points of partition P. Dotted vertical bars mark the points of partition
$P_3$. The partition Q consists of all vertical bars, solid or dotted.

If f is Riemann integrable, then we can find a partition P for which $\overline{S}(P; f) - \underline{S}(P; f)$
is less than any specified positive value. It follows from (2.4) that the absolute value of the
difference between the upper and lower Darboux integrals is also less than any
specified positive value, which can only be true if the difference is 0.
In the other direction, if the Darboux integrals are equal, then this common value
is our candidate for V, the value of the Riemann integral. Given any ε > 0, we can
find an upper Darboux sum $\overline{S}(P_1; f)$ and a lower Darboux sum $\underline{S}(P_2; f)$ that are
each less than ε/2 away from V. If we let $P_3$ denote the common refinement of $P_1$
and $P_2$, $P_3 = P_1 \cup P_2$, then

$$\underline{S}(P_2; f) \le \underline{S}(P_3; f) \le \overline{S}(P_3; f) \le \overline{S}(P_1; f),$$

and therefore every Riemann sum for the partition $P_3$ is within ε/2 of V. The same
is true for any refinement of $P_3$. We still need to show that every Riemann sum
with sufficiently short intervals differs by at most ε from V, even if it shares no
points with $P_3$.
Let D denote the oscillation (the difference between the least upper bound and
the greatest lower bound) of f over the entire interval [a, b]. Let m denote the
number of intervals in $P_3$. We take any partition P with intervals of length less than
ε/(2mD) (see Figure 2.1). Let $Q = P \cup P_3$; Q has at most m − 1 more points than
P. In Figure 2.1, Q has five more points than P. The difference between the upper
Darboux sum for P and the upper Darboux sum for Q is the sum of the areas of
the shaded rectangles. There are at most m − 1 such rectangles, their lengths are
each bounded by the lengths of the intervals in P, which are less than ε/(2mD), and
their heights are bounded by the oscillation of f, which is D. The upper Darboux
sums differ by at most

$$(m - 1) \cdot \frac{\epsilon}{2mD} \cdot D < \frac{\epsilon}{2}.$$

Since Q is a refinement of $P_3$, we get an upper bound on the upper Darboux sum
for P,

$$\overline{S}(P; f) < \overline{S}(Q; f) + \epsilon/2 \le \overline{S}(P_3; f) + \epsilon/2 < V + \epsilon.$$


By a similar argument,

$$\underline{S}(P; f) > \underline{S}(Q; f) - \epsilon/2 \ge \underline{S}(P_3; f) - \epsilon/2 > V - \epsilon.$$

Every Riemann sum for P is within ε of V.

This theorem gives us a simple condition that is equivalent to Riemann integrability.
If for each ε > 0, we can find just one partition P for which
$\overline{S}(P; f) - \underline{S}(P; f) < \epsilon$, then the upper and lower Darboux integrals must be equal.
If they are equal, then we can find such a partition for each ε.

Corollary 2.3 (One Partition Suffices). Let f be a bounded function on [a, b].
This function is Riemann integrable over this interval if and only if for each ε > 0
there is a partition P for which $\overline{S}(P; f) - \underline{S}(P; f) < \epsilon$.
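The key step in the proof of Theorem 2.2 is that passing to the common refinement $P_3 = P_1 \cup P_2$ can only raise lower sums and lower upper sums. The sketch below checks the chain of inequalities displayed in that proof for one concrete case. The function f(x) = x³, the particular partitions, and the helper name `darboux_sums` are our own illustrative choices, and the code again assumes f is increasing, so that the supremum and infimum on each subinterval sit at its endpoints.

```python
from fractions import Fraction

def darboux_sums(f, points):
    """Upper and lower Darboux sums of an INCREASING f over the partition `points`.
    Duplicate points are merged, so passing P1 + P2 yields the common refinement."""
    pts = sorted(set(Fraction(p) for p in points))
    pairs = list(zip(pts, pts[1:]))
    upper = sum(f(right) * (right - left) for left, right in pairs)  # M_j = f(x_j)
    lower = sum(f(left) * (right - left) for left, right in pairs)   # m_j = f(x_{j-1})
    return upper, lower

def cube(x):
    return x ** 3   # increasing on [0, 1]

P1 = [0, Fraction(1, 3), 1]
P2 = [0, Fraction(1, 2), Fraction(3, 4), 1]
P3 = P1 + P2        # common refinement once duplicates are removed

U1, _ = darboux_sums(cube, P1)
_, L2 = darboux_sums(cube, P2)
U3, L3 = darboux_sums(cube, P3)
print(L2, L3, U3, U1)
print(L2 <= L3 <= U3 <= U1)   # True
```

Any increasing f and any two partitions of the same interval should give `True` on the last line, which is the special case of Exercise 2.1.1 that the proof uses.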

Improper Integrals
One of the drawbacks of Riemann's definition of the integral is that it only applies
to bounded functions on finite intervals, an issue that clearly was of concern to
Riemann, for immediately after giving his definition, he explains how to deal with
integrals of unbounded functions. Today we refer to these as improper integrals.
Strictly speaking, the Riemann integral does not exist in this case. However,
there may be a value that can be assigned to such an integral by taking a limit of
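integrals that are Riemann integrable. For unbounded integrals such as

$$\int_{-1}^{1} \frac{dx}{|x|^{1/2}},$$

we evaluate the integral on intervals for which the function is bounded and then
take the limit of these values as the endpoints approach the point at which we have
a vertical asymptote:

$$\begin{aligned}
\int_{-1}^{1} \frac{dx}{|x|^{1/2}} &= \lim_{\epsilon_1 \to 0^+} \int_{-1}^{-\epsilon_1} \frac{dx}{|x|^{1/2}} + \lim_{\epsilon_2 \to 0^+} \int_{\epsilon_2}^{1} \frac{dx}{|x|^{1/2}} \\
&= \lim_{\epsilon_1 \to 0^+} \left( -2|x|^{1/2} \right)\Big|_{-1}^{-\epsilon_1} + \lim_{\epsilon_2 \to 0^+} \left( 2x^{1/2} \right)\Big|_{\epsilon_2}^{1} \\
&= \lim_{\epsilon_1 \to 0^+} \left( -2\epsilon_1^{1/2} + 2 \right) + \lim_{\epsilon_2 \to 0^+} \left( 2 - 2\epsilon_2^{1/2} \right) \\
&= 4.
\end{aligned}$$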
As Riemann went to great pains to point out, the existence of an antiderivative is
no guarantee that the improper integral exists. When there is more than one limit,

Definition: Improper integral


An integral is improper if either the function that is being integrated or the interval
over which the function is integrated is unbounded.

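they must be taken independently. For example, the antiderivative of 1/x is ln|x|
and ln|1| − ln|−1| = 0 − 0 = 0, but

$$\begin{aligned}
\int_{-1}^{1} \frac{dx}{x} &= \lim_{\epsilon_1 \to 0^+} \int_{-1}^{-\epsilon_1} \frac{dx}{x} + \lim_{\epsilon_2 \to 0^+} \int_{\epsilon_2}^{1} \frac{dx}{x} \\
&= \lim_{\epsilon_1 \to 0^+} \ln|x| \Big|_{-1}^{-\epsilon_1} + \lim_{\epsilon_2 \to 0^+} \ln|x| \Big|_{\epsilon_2}^{1} \\
&= \lim_{\epsilon_1 \to 0^+} \ln \epsilon_1 - \lim_{\epsilon_2 \to 0^+} \ln \epsilon_2.
\end{aligned}$$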

Since neither limit is finite, this function is not integrable over [—1, 1].
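The two computations above can be checked numerically by evaluating the truncated integrals with the same antiderivatives. In this sketch (function names are our own), the truncated integrals of $|x|^{-1/2}$ settle near 4 however $\epsilon_1$ and $\epsilon_2$ shrink, while the truncated integrals of 1/x equal $\ln \epsilon_1 - \ln \epsilon_2$, which is 0 when $\epsilon_1 = \epsilon_2$ but can be made as large or as negative as we please by letting the two cutoffs shrink at different rates.

```python
import math

def inv_sqrt_truncated(e1, e2):
    """Integral of |x|^(-1/2) over [-1, -e1] plus [e2, 1], from the antiderivatives
    -2|x|^(1/2) (for x < 0) and 2x^(1/2) (for x > 0)."""
    return (2 - 2 * math.sqrt(e1)) + (2 - 2 * math.sqrt(e2))

def inv_truncated(e1, e2):
    """Integral of 1/x over [-1, -e1] plus [e2, 1], from the antiderivative ln|x|:
    (ln e1 - ln 1) + (ln 1 - ln e2) = ln e1 - ln e2."""
    return math.log(e1) - math.log(e2)

# The cutoffs shrink at different rates in the first two rows and together in the third.
for e1, e2 in ((1e-4, 1e-8), (1e-8, 1e-4), (1e-12, 1e-12)):
    print(f"e1={e1:g} e2={e2:g}  |x|^(-1/2): {inv_sqrt_truncated(e1, e2):.6f}  1/x: {inv_truncated(e1, e2):.6f}")
```

The cancellation on the diagonal $\epsilon_1 = \epsilon_2$ is exactly what taking the two limits independently rules out.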

Exercises
2.1.1. Explain why if P and Q are partitions of the same interval and Q is a
refinement of P, Q ⊇ P, and if f is any bounded function on this interval, then

$$\underline{S}(P; f) \le \underline{S}(Q; f) \le \overline{S}(Q; f) \le \overline{S}(P; f).$$
2.1.2. Consider the function f defined by

$$f(x) = \begin{cases} 1, & x = 0, \\ x, & 0 < x < 1, \\ 0, & x = 1. \end{cases}$$

Let P be the partition (0, 1/4, 1/2, 3/4, 1). Find the upper and lower Darboux
sums, $\overline{S}(P; f)$ and $\underline{S}(P; f)$.
2.1.3. Using the function f defined in Exercise 2.1.2 and given ε = 1/2, find a
response δ so that for any partition P into intervals of length less than δ, the
difference between $\overline{S}(P; f)$ and $\underline{S}(P; f)$ will be less than 1/2.
2.1.4. Using the function f defined in Exercise 2.1.2 over the interval 1/2 ≤ x ≤ 1,
explain why no Riemann sum can equal the upper Darboux sum no matter what
partition we choose.
2.1.5. Consider the function
1
O<x<1,
2

where [a] denotes the greatest integer less than or equal to a. Show that this series
converges for all x ∈ [0, 1], that it is monotonically increasing, and that g(0) = 0,
g(1) = 1. Find all points at which g is discontinuous and at these points find the
difference between the limit from the left and the limit from the right.
2.1.6. Using the function g defined in Exercise 2.1.5, show that it is Riemann
integrable over [0, 1].

2.1.7. Using the function g defined in Exercise 2.1.5, find the value of $\int_0^1 g(x)\,dx$.
Show the work that leads to your conclusion.

2.1.8. Prove that f is continuous at c if and only if given any ε > 0 there is a
response δ for which the oscillation of f over (c − δ, c + δ) is less than ε.
2.1.9. Using the fact that a continuous function on a closed and bounded interval
is uniformly continuous on that interval, prove that if f is continuous on [a, b],
then f is Riemann integrable over [a, b].
2.1.10. Find the upper and lower Darboux integrals of the characteristic function
of the rationals (Example 1.1 on page 3) over the interval [0, 1].
2.1.11. Prove that if f is bounded on [a, b], then

$$\underline{\int_a^b} f(x)\,dx \le \overline{\int_a^b} f(x)\,dx.$$

2.1.12. Define the function h by

$$h(x) = \begin{cases} x, & x \in [0, 1] \cap \mathbb{Q}, \\ 0, & x \in [0, 1] - \mathbb{Q}. \end{cases}$$

Find the upper and lower Darboux integrals of h over [0, 1].

2.1.13. Define the function k by

$$k(x) = \begin{cases} x, & x \in [-3, 3] \cap \mathbb{Q}, \\ 0, & x \in [-3, 3] - \mathbb{Q}. \end{cases}$$

Find the upper and lower Darboux integrals of k over [−3, 3].
2.1.14. Define the function m by

$$m(x) = \begin{cases} 1, & x = 0, \\ 1/q, & x = p/q \in \mathbb{Q},\ \gcd(p, q) = 1,\ q \ge 1, \\ 0, & x \notin \mathbb{Q}. \end{cases}$$

Show that m is integrable over [0, 1].
2.2 Nondifferentiable Integrals 33

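2.1.15. Define the function n by

$$n(x) = \begin{cases} 1, & x = 1/n,\ n \in \mathbb{N}, \\ 0, & \text{otherwise}. \end{cases}$$

Show that n is integrable over [0, 1] and that $\int_0^1 n(x)\,dx = 0$.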

2.1.16. Define the function p by

$$p(x) = \begin{cases} 0, & x = 0, \\ 1/x - [1/x], & \text{otherwise}. \end{cases}$$

Show that p is integrable over [0, 1].
2.1.17. Find all positive values of a for which the improper integral

$$\int_{-1}^{1} \frac{dx}{|x|^a}$$

has a value. Show the work that leads to your conclusion.
2.1.18. Show that for every a, 0 < a < 1, the improper integral

$$\int_0^1 \left( \left[ \frac{a}{x} \right] - a \left[ \frac{1}{x} \right] \right) dx$$

exists and has value a ln a.

2.1.19. Prove that if a function is bounded and Cauchy integrable over [a, b], then
it is also Riemann integrable over that interval.

2.2 Nondifferentiable Integrals


What clearly excited Darboux most about Riemann's thesis was his example of a
function that has discontinuities at all rational numbers with even denominators
and yet is still integrable. Riemann's example appears on the fourth page of his
explanation of integration.

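Example 2.1. Riemann defined the function

$$((x)) = \begin{cases} x - [x], & [x] \le x < [x] + 1/2, \\ 0, & x = [x] + 1/2, \\ x - [x] - 1, & [x] + 1/2 < x < [x] + 1 \end{cases} \tag{2.5}$$

(see Figure 2.2). He then defined

$$f(x) = \sum_{n=1}^{\infty} \frac{((nx))}{n^2}. \tag{2.6}$$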
Figure 2.2. Graph of y = ((x)).

Figure 2.3. Graph of $y = \sum_{n=1}^{\infty} ((nx))/n^2$.

Since |((nx))| < 1/2, this series converges for all x. It has a discontinuity whenever
nx is half of an odd integer, and that will happen for every x that is a rational
number with an even denominator (see Figure 2.3).

Specifically, if x = a/(2b), where a is odd and a and b are relatively prime, and
if n is an odd multiple of b, then

$$\lim_{y \to x^+} ((ny)) = -1/2 \quad \text{and} \quad \lim_{y \to x^-} ((ny)) = 1/2.$$

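We want to be able to assert that

$$f(x^+) = \lim_{v \to 0^+} f(x + v) = f(x) - \frac{1}{2b^2} \sum_{m=1}^{\infty} \frac{1}{(2m - 1)^2} = f(x) - \frac{\pi^2}{16b^2}, \tag{2.7}$$

$$f(x^-) = \lim_{v \to 0^-} f(x + v) = f(x) + \frac{1}{2b^2} \sum_{m=1}^{\infty} \frac{1}{(2m - 1)^2} = f(x) + \frac{\pi^2}{16b^2}. \tag{2.8}$$

The first line of these equalities assumes that we can interchange limits, that is

$$\lim_{v \to 0^+} f(x + v) - f(x) = \lim_{v \to 0^+} \sum_{n=1}^{\infty} \frac{((nx + nv)) - ((nx))}{n^2} = \sum_{n=1}^{\infty} \lim_{v \to 0^+} \frac{((nx + nv)) - ((nx))}{n^2}. \tag{2.9}$$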

The justification of this interchange rests on the uniform convergence of our series
over the set of all x and is left as Exercise 2.2.1.
Our function f has a discontinuity at every rational number with an even denominator,
but it is integrable. Given any σ > 0, there are only finitely many rational
numbers between 0 and 1 at which the variation is larger than σ. If the variation is
larger than σ at x = a/(2b), then b must satisfy

$$\frac{\pi^2}{8b^2} > \sigma,$$

which means that b is a positive integer less than $\pi/\sqrt{8\sigma}$.

Given any ε > 0, we want to find a bound on the interval length that guarantees
that the upper and lower Darboux sums differ by less than ε. Choose σ = ε/2. If
there are N rational numbers of the form a/(2b) in [0, 1] with $b < \pi/\sqrt{8\sigma}$, then
we choose our interval bound δ so that Nδ is less than ε/2.
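The jump that drives this count can be observed numerically. The sketch below truncates Riemann's series (2.6) after 2000 terms and compares f just to the left and just to the right of a point. Both the truncation and the offset η = 10⁻⁹ are our own assumptions, chosen so that the truncation error (below 1/4000 at each point) and the drift caused by the offset are far smaller than the jumps being measured. At x = a/(2b) the difference should be close to π²/(8b²), while at a rational with an odd denominator, where f is continuous, it should be close to 0.

```python
import math

def saw(x):
    """Riemann's ((x)) from (2.5)."""
    frac = x - math.floor(x)
    if frac < 0.5:
        return frac          # x - [x]
    if frac == 0.5:
        return 0.0           # value at the midpoint
    return frac - 1.0        # x - [x] - 1

def riemann_f(x, terms=2000):
    """Partial sum of (2.6); the omitted tail is smaller than 1/(2 * terms)."""
    return sum(saw(n * x) / n**2 for n in range(1, terms + 1))

def jump(x, eta=1e-9):
    """Approximate f(x-) - f(x+) by sampling a distance eta on either side of x."""
    return riemann_f(x - eta) - riemann_f(x + eta)

for b in (1, 3):
    x = 1 / (2 * b)
    print(f"x = 1/{2 * b}: jump {jump(x):.5f}, pi^2/(8 b^2) = {math.pi**2 / (8 * b * b):.5f}")
print(f"x = 1/3: jump {jump(1 / 3):.2e}")
```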

Darboux's Observation
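Darboux observed that if f is integrable over an open interval containing a, if we
define a new function F by

$$F(x) = \int_a^x f(t)\,dt,$$

and if $\lim_{x \to a^+} f(x)$ and $\lim_{x \to a^-} f(x)$ exist, then

$$\lim_{h \to 0^+} \frac{F(a + h) - F(a)}{h} = \lim_{x \to a^+} f(x), \qquad \lim_{h \to 0^-} \frac{F(a + h) - F(a)}{h} = \lim_{x \to a^-} f(x).$$

This follows immediately from the mean value theorem of integral calculus:

$$F(a + h) - F(a) = \int_a^{a+h} f(t)\,dt = h \cdot \mu_h, \tag{2.10}$$

where $\mu_h$ lies between the greatest lower bound and the least upper bound of f on the
open interval between a and a + h, valid for any h ≠ 0. As h → 0⁺ these bounds approach
$\lim_{x \to a^+} f(x)$, and as h → 0⁻ they approach $\lim_{x \to a^-} f(x)$.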
Therefore, if $\lim_{x \to a^+} f(x) \ne \lim_{x \to a^-} f(x)$, then F cannot be differentiable at
a. On the other hand, F(a + h) − F(a) can be made arbitrarily small simply by
limiting the size of h, and therefore F is continuous at every point. The antiderivative
of Riemann's function is continuous everywhere and not differentiable at
rational values with even denominators.
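Darboux's observation can be watched at x = 1/2 for Riemann's function. Integrating (2.6) term by term (justified by uniform convergence) gives $F(x) = \int_0^x f(t)\,dt = \sum_{n \ge 1} G(nx)/n^3$, where $G(t) = \int_0^t ((s))\,ds$ works out to half the square of the distance from t to the nearest integer. Every term of f(1/2) vanishes, so the one-sided limits of f at 1/2 are −π²/16 from the right and +π²/16 from the left, and the one-sided difference quotients of F should approach those two different values. The sketch assumes 20000 terms and h = ±10⁻⁶, and it takes the base point of F at 0 rather than at a, which does not affect differences; the names `G` and `difference_quotient_F` are our own.

```python
import math

def G(t):
    """G(t) = integral of ((s)) from 0 to t = (distance from t to the nearest integer)^2 / 2."""
    d = abs(t - round(t))
    return d * d / 2

def difference_quotient_F(x, h, terms=20000):
    """(F(x + h) - F(x)) / h, where F(x) = sum_{n>=1} G(n x) / n^3 is the term-by-term
    integral of Riemann's function from 0 to x, truncated after `terms` terms."""
    total = 0.0
    for n in range(1, terms + 1):
        total += (G(n * (x + h)) - G(n * x)) / n**3
    return total / h

right = difference_quotient_F(0.5, 1e-6)    # should be near f(1/2+) = -pi^2/16
left = difference_quotient_F(0.5, -1e-6)    # should be near f(1/2-) = +pi^2/16
print(right, left, math.pi**2 / 16)
```

The two quotients settle on values of opposite sign, so F has different one-sided derivatives at 1/2 and is not differentiable there.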
This directly contradicts assertions made by Ampère and by Duhamel that continuity
guarantees differentiability, at least at all but a sparse set of values. Our
question #4, "What is the relationship between continuity and differentiability?"
was now wide open.
Darboux went beyond this to find a continuous function that is not differentiable
at any value of x.

Example 2.2. Consider

$$g(x) = \sum_{n=1}^{\infty} \frac{\sin((n + 1)!\,x)}{n!}.$$

This is a uniformly convergent series of continuous functions, and therefore g is
continuous for all values of x. The fact that it is not differentiable at any value of
x requires a bit more work.³

³ Quite a bit more work. Darboux's original justification, published in 1875, had several flaws. He published an
addendum in 1879 in which he corrected the justification of his original example and gave a simpler example,
$\sum_{n=1}^{\infty} \cos(n!\,x)/n!$ (see Exercises 2.2.4–2.2.9).

That same year of 1875, Paul du Bois-Reymond published Weierstrass's example
of an everywhere continuous but nowhere differentiable function.

Example 2.3. Consider the function defined by the uniformly convergent series

$$\sum_{n=0}^{\infty} b^n \cos(a^n \pi x),$$

where 0 < b < 1 and a is an odd integer for which ab > 1 + 3π/2 (for example,
b = 2/3, a = 9 can be used).⁴

Weierstrass had publicly presented this example to the Berlin Academy in 1872,
but it had not appeared in print.
At the same time, Weierstrass produced an example, valid for any bounded,
countably infinite set S, of an increasing, continuous function that is not differentiable
at any point of S. The set of rational numbers in [0, 1] is an example of a
countable set. The set of all algebraic numbers in [0, 1], all roots of polynomials
with rational coefficients, is also countable, as we shall see in the next chapter.

Example 2.4. Given our favorite bounded, countably infinite sequence, $(a_1, a_2, a_3, \ldots)$, we define the function

$$h(x) = x \left( 1 + \frac{1}{2} \sin\left( \frac{\ln(x^2)}{2} \right) \right), \quad x \ne 0, \qquad h(0) = 0.$$

We choose any k strictly between 0 and 1. The Weierstrass function is given by

$$w(x) = \sum_{n=1}^{\infty} k^n\, h(x - a_n).$$

If $x \ne a_n$, then the derivative of $h(x - a_n)$ with respect to x is

$$\frac{d}{dx} h(x - a_n) = 1 + \frac{1}{2} \sin\left( \frac{1}{2} \ln\left[(x - a_n)^2\right] \right) + \frac{1}{2} \cos\left( \frac{1}{2} \ln\left[(x - a_n)^2\right] \right),$$

which always lies strictly between 0 and 2. Since h is an increasing function, so is
w. If the set of possible values of x is bounded, so is the set of values of $h(x - a_n)$,
and therefore the series that defines w converges uniformly. Since h is a continuous
function, so is w.

To show that w is not differentiable at $a_N$, we rewrite our function as

$$w(x) = k^N h(x - a_N) + \sum_{n \ne N} k^n\, h(x - a_n).$$

⁴ For an explanation, see A Radical Approach to Real Analysis, 2nd ed., pp. 259–262.

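The terms with n < N are differentiable at $a_N$, since h is differentiable away from 0 and
the $a_n$ are distinct, so only the terms with n > N need attention. Using the mean value
theorem, we see that

$$0 < \sum_{n=N+1}^{\infty} k^{n-N}\, \frac{h(x - a_n) - h(a_N - a_n)}{x - a_N} < 2 \sum_{n=1}^{\infty} k^n = \frac{2k}{1 - k}.$$

On the other hand,

$$\frac{h(x - a_N) - h(0)}{x - a_N} = 1 + \frac{1}{2} \sin\left( \frac{1}{2} \ln\left[(x - a_N)^2\right] \right), \tag{2.11}$$

which oscillates between 1/2 and 3/2 as x approaches $a_N$. For k < 1/5, we have
2k/(1 − k) < 1/2, so the remaining terms cannot cancel this oscillation, and the function
w is not differentiable at $a_N$.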
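The two facts about h used above, that h′ stays strictly between 0 and 2 and that the quotient (2.11) keeps returning to both 1/2 and 3/2 as x → 0, can be spot-checked numerically. In the sketch below the sample points $x = e^{\pm\pi/2 - 2\pi k}$ are our own choice: at them $\frac{1}{2}\ln(x^2) = \ln x$ equals ±π/2 modulo 2π, so the sine in (2.11) is ±1. The grid for h′ and the finite-difference check are likewise illustrative assumptions.

```python
import math

def h(x):
    """h from Example 2.4: x * (1 + sin(ln(x^2) / 2) / 2) for x != 0, and h(0) = 0."""
    if x == 0:
        return 0.0
    return x * (1 + 0.5 * math.sin(0.5 * math.log(x * x)))

def h_prime(x):
    """The derivative formula from the text, valid for x != 0."""
    u = 0.5 * math.log(x * x)
    return 1 + 0.5 * math.sin(u) + 0.5 * math.cos(u)

# Points shrinking to 0 where sin(ln|x|) = +1 (highs) or -1 (lows).
highs = [h(x) / x for x in (math.exp(math.pi / 2 - 2 * math.pi * k) for k in range(1, 6))]
lows = [h(x) / x for x in (math.exp(-math.pi / 2 - 2 * math.pi * k) for k in range(1, 6))]
print("quotient (2.11) near 3/2:", highs)
print("quotient (2.11) near 1/2:", lows)

# h' on a grid of nonzero points in [-1, 1] should stay strictly between 0 and 2.
grid = [k / 1000 - 1 for k in range(2001) if k != 1000]
print("min h':", min(map(h_prime, grid)), "max h':", max(map(h_prime, grid)))

# Central-difference check of the derivative formula at one point.
x0, step = 0.3, 1e-6
print((h(x0 + step) - h(x0 - step)) / (2 * step), h_prime(x0))
```

In fact $h'(x) = 1 + \frac{1}{2}(\sin u + \cos u)$ with u = ln|x| ranges over $[1 - \sqrt{2}/2,\, 1 + \sqrt{2}/2]$, comfortably inside (0, 2).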

Summary
To summarize the situation with regard to question #4 as it stood in 1875:
• If f is differentiable at a, then it is also continuous at a. Any function that is
differentiable at every point in an interval is also continuous over that interval.
• There are functions that are continuous at every point in an interval but differen-
tiable at none of the points in that interval.
• For any countable set of points, we can find an increasing, continuous function
that is not differentiable at any point in the set.
What was not known was whether or not it is possible to construct an increasing,
continuous function that is not differentiable at any point in the interval. It would
take 30 years to find the answer to this question.
With regard to question #3, the fundamental theorem of calculus, we have seen
that we can find an integrable function f for which

$$\frac{d}{dx} \int_a^x f(t)\,dt$$

does not exist for values of x that are rational numbers with even denominators.
Questions that remained open included
• Could $\frac{d}{dx} \int_a^x f(t)\,dt$ exist but not equal f(x)?
• Could f be integrable but $\frac{d}{dx} \int_a^x f(t)\,dt$ fail to exist at every point?

Exercises
2.2.1. Prove that $f(x) = \sum_{n=1}^{\infty} ((nx))/n^2$ converges uniformly. Prove that the interchange
of limits in Equation (2.9) is allowed.
2.2.2. Use the mean value theorem to prove that if f is continuous on [a, b]
and differentiable with a nonnegative derivative at all points of (a, b) except for
c ∈ (a, b) where it is not differentiable, then f is a monotonically increasing
function over [a, b].

2.2.3. For the function h defined in Example 2.4, find the supremum and infimum
of $\{h'(x) \mid 0 < x < 1\}$. Show that

$$\limsup_{x \to 0} h'(x) \ne \liminf_{x \to 0} h'(x).$$

Exercises 2.2.4–2.2.9 step through Darboux's 1879 proof that

$$\psi(x) = \sum_{n=1}^{\infty} \frac{\cos(n!\,x)}{n!}$$

is continuous at all x and not differentiable at any value of x.

2.2.4. Show that ψ is a uniformly convergent series of continuous functions and
therefore is continuous.

We shall show that

$$\lim_{h \to 0} \frac{\psi(x + h) - \psi(x)}{h}$$

does not exist at any x, and therefore ψ is nowhere differentiable. We fix an ε > 0
and, for each N ∈ ℕ, define h = ε/N!. The variable h depends on N.

2.2.5. Show that for every positive integer n,

$$\frac{\cos(n!\,(x + h)) - \cos(n!\,x)}{n!\,h} = -\sin(n!\,x) - \frac{h}{2}\, n! \cos(n!\,\xi_n) \tag{2.12}$$

for some $\xi_n$ between x and x + h.

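2.2.6. Show that

$$\sum_{n=1}^{N-1} \frac{n!}{N!} \le \frac{2}{N},$$

and therefore

$$\left| \sum_{n=1}^{N-1} \frac{h}{2}\, n! \cos(n!\,\xi_n) \right| \le \frac{\epsilon}{N}. \tag{2.13}$$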

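2.2.7. Justify the equality

$$\frac{1}{h} \left( \sum_{n=N+1}^{\infty} \frac{\cos(n!\,(x + h))}{n!} - \sum_{n=N+1}^{\infty} \frac{\cos(n!\,x)}{n!} \right) = \sum_{n=N+1}^{\infty} \frac{\cos(n!\,(x + h)) - \cos(n!\,x)}{n!\,h}.$$

Justify each inequality,

$$\begin{aligned}
\left| \sum_{n=N+1}^{\infty} \frac{\cos(n!\,(x + h)) - \cos(n!\,x)}{n!\,h} \right| &\le \frac{2}{N!\,h} \left( \frac{1}{N + 1} + \frac{1}{(N + 1)(N + 2)} + \frac{1}{(N + 1)(N + 2)(N + 3)} + \cdots \right) \\
&\le \frac{4}{\epsilon N}.
\end{aligned} \tag{2.14}$$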
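2.2.8. Using Equations (2.12)–(2.14), we see that

$$\begin{aligned}
\frac{\psi(x + h) - \psi(x)}{h} &= -\sum_{n=1}^{N-1} \sin(n!\,x) + \frac{\cos(N!\,(x + h)) - \cos(N!\,x)}{N!\,h} + E(\epsilon, N) \\
&= -\sum_{n=1}^{N-1} \sin(n!\,x) + \frac{\cos(N!\,x + \epsilon) - \cos(N!\,x)}{\epsilon} + E(\epsilon, N),
\end{aligned} \tag{2.15}$$

where $|E(\epsilon, N)| \le \epsilon/N + 4/(\epsilon N)$. Note that for fixed ε > 0, N = N(h) approaches ∞ as h
approaches 0. Justify the following statement: If $\lim_{h \to 0} (\psi(x + h) - \psi(x))/h$
exists, then

$$\lim_{N \to \infty} \frac{\cos(N!\,x + \epsilon) - \cos(N!\,x)}{\epsilon} \tag{2.16}$$

is independent of ε.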
2.2.9. Show that

$$\frac{\cos(N!\,x + 2\epsilon) - \cos(N!\,x)}{2\epsilon} - \frac{\cos(N!\,x + \epsilon) - \cos(N!\,x)}{\epsilon} = \frac{\cos \epsilon - 1}{\epsilon} \cos(N!\,x + \epsilon). \tag{2.17}$$

Therefore, the limit in (2.16) is independent of ε if and only if

$$\lim_{N \to \infty} \cos(N!\,x + \epsilon) = 0$$

regardless of the value of ε > 0, and this is not true for any x.

2.3 The Class of 1870


Three important papers appeared in 1870, papers that built on Riemann's work
on Fourier series and integration in general. These were written by Eduard Heine,
Georg Cantor, and Hermann Hankel. Heinrich Eduard Heine (1821–1881) was a

student of Dirichlet in Berlin. In 1848 he took up a position at Halle University
where he would remain for the rest of his career. Georg Ferdinand Ludwig Philipp
Cantor (1845–1918) was born in St. Petersburg, Russia. The family moved to
Germany in 1856. He studied with Kummer and Weierstrass at the University of
Berlin, receiving his doctorate in 1867. He joined Heine at Halle in 1869. Cantor's
dissertation had been on number theory, but Heine was working on the problem of
the uniqueness of the representation of a function as a trigonometric series, and he
convinced Cantor to join him in this task.
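The traditional approach to Fourier series was to start with a function, calculate
its Fourier coefficients,

$$a_k = \frac{1}{\pi} \int_{-\pi}^{\pi} F(x) \cos(kx)\,dx \quad (k \ge 0), \qquad b_k = \frac{1}{\pi} \int_{-\pi}^{\pi} F(x) \sin(kx)\,dx \quad (k \ge 1),$$

and then study the convergence of the resulting series

$$\frac{a_0}{2} + \sum_{k=1}^{\infty} \left( a_k \cos(kx) + b_k \sin(kx) \right). \tag{2.18}$$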
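Each coefficient is itself a Riemann integral, so a Riemann sum gives a serviceable approximation. The sketch below uses midpoint tags on a uniform partition of [−π, π] with 20000 subintervals (our own choice of rule and resolution) and tests it on F(x) = x, whose coefficients are $a_k = 0$ and $b_k = 2(-1)^{k+1}/k$; that test function is an illustrative assumption, not one of the series discussed in the text.

```python
import math

def fourier_coefficients(F, k, samples=20000):
    """Approximate a_k and b_k over [-pi, pi] by a Riemann sum with midpoint tags."""
    dx = 2 * math.pi / samples
    a = b = 0.0
    for j in range(samples):
        x = -math.pi + (j + 0.5) * dx    # midpoint of the j-th subinterval
        a += F(x) * math.cos(k * x) * dx
        b += F(x) * math.sin(k * x) * dx
    return a / math.pi, b / math.pi

def identity(x):
    return x

for k in range(1, 6):
    a_k, b_k = fourier_coefficients(identity, k)
    print(k, round(a_k, 8), round(b_k, 6), 2 * (-1) ** (k + 1) / k)
```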

Riemann turned this around by starting with an arbitrary trigonometric series in the
form of (2.18) and asking what properties such a function must possess. Does a
trigonometric series have to be integrable? If it is integrable, then we can calculate
its Fourier coefficients. Is this Fourier series always identical to the series with
which we started? One outcome of this line of reasoning was the question whether
two distinct trigonometric series could converge to the same function. If this were
possible, then the difference between these series would be a trigonometric series
with some nonzero coefficients that converges to 0. It would have been very sur-
prising if someone had exhibited such a series, but no one could prove that it does
not exist.
Uniqueness is easy to prove if the trigonometric series converges uniformly. As
Weierstrass had shown in his Berlin lectures of the 1860s, term-by-term integration
is valid for uniformly convergent series. Since we begin with a trigonometric series,
Fourier's heuristic argument given in Equation (1.4) on page 2 actually proves that
the series is unique. The problem is that if a series of continuous functions converges
uniformly, then it converges to a continuous function. The most interesting Fourier
series of the time converged to discontinuous functions and thus could not be
uniformly convergent.
Dirichlet had been able to show that if we start with a continuous function that has
only finitely many maxima and minima and form the trigonometric series given in
Equation (2.18), then that series is uniformly convergent. It follows from his analysis
that if we work with a piecewise

continuous function, a function that is continuous at all but finitely many points,
then its Fourier series is uniformly convergent on any closed interval that does
not contain a point of discontinuity. This led Heine to describe a condition that
is almost as good as uniform convergence, uniform convergence in general. A
series with finitely many exceptional points that is uniformly convergent on any
closed interval that does not contain one of these points is uniformly convergent in
general.
Heine succeeded in proving that if a trigonometric series is uniformly convergent
in general and converges to the function identically equal to zero, then all of the
coefficients must be zero. That is to say, among the set of trigonometric series
that are uniformly convergent in general, no two distinct series converge to the
same function. It was Heine who convinced Cantor to take up the question of what
happens when the convergence is not uniform in general.
In his 1870 paper, Cantor drew on Riemann's methods to get around the need
for uniform convergence on any interval. He proved that if a trigonometric series
converges to 0 at all x, then all coefficients of the series must be 0. By 1871,
Cantor realized that his proof would work if it is known that the trigonometric
series converges to 0 at all but at most finitely many points. He began working on
the problem of an infinite number of exceptional points. For what infinite sets S
can we conclude that if a trigonometric series converges to 0 at all points not in
S, then all of the coefficients must be 0? Cantor was on his way to inventing set
theory.

Hankel's Innovations
Early in 1871, Cantor reviewed Hankel's 1870 paper Untersuchungen über die
unendlich oft oszillierenden und unstetigen Funktionen (Investigations on infinitely
often oscillating and discontinuous functions). It spurred his thinking about infinite
sets of discontinuities.
Hermann Hankel (1839–1873) took classes with Riemann at Göttingen and
Weierstrass in Berlin before earning his doctorate in 1862 at the University of
Leipzig where he then taught. His 1870 paper came shortly after his move to
Tübingen. In it, he attempted to clarify Riemann's necessary and sufficient conditions
for integrability.
We have considered the oscillation of a function over an interval. It is defined as
the difference between the least upper bound of the values of the function and the
greatest lower bound of those values. Hankel focused this onto a single point. We
consider all open intervals that contain that point and look at the oscillation over
each of these intervals. As the intervals become smaller, the oscillation can only
2.3 The Class of 1870 43

Definition: Oscillation
Given a function f and an interval I, the oscillation of f over I is

$$\omega(f; I) = \sup\{f(x) \mid x \in I\} - \inf\{f(x) \mid x \in I\}.$$

The oscillation of f at the point c is

$$\omega(f; c) = \inf_{I \in \mathcal{I}} \omega(f; I),$$

where $\mathcal{I}$ is the set of open intervals containing c. If $\liminf_{x \to c} f(x) \le f(c) \le \limsup_{x \to c} f(x)$, then this is equivalent to

$$\omega(f; c) = \limsup_{x \to c} f(x) - \liminf_{x \to c} f(x).$$

decrease. The oscillation at a point x is the greatest lower bound of the oscillation
over I, taken over all open intervals I that contain x.⁵
The following proposition follows immediately from the second definition of
oscillation at a point. The equivalence of these definitions is left as Exercise 2.3.14.

Proposition 2.4 (Continuous ⟺ ω = 0). The function f is continuous at c if
and only if ω(f; c) = 0.

With this notion, Riemann's criterion for integrability can now be stated in terms
of $S_\sigma$, the set of points with oscillation at least σ. A function f is integrable over
the interval [a, b] if and only if for each σ > 0, we can put the points of $S_\sigma \cap [a, b]$
inside a finite union of intervals, intervals that can be chosen so that the sum of
their lengths is less than any predetermined positive amount.
It would take many years before the terminology was fixed, but we see here the
beginning of the idea of the outer content of a set of points (see the definition below),
denoted $c_e$, e for "exterior."
Any finite set of points has outer content zero. This is because given any ε > 0,
we can put a small interval around each point so that the sum of the lengths of the
intervals is less than ε.
If we consider the set {1, 1/2, 1/3, 1/4, ...}, it also has outer content zero. Given
any ε > 0, we put an interval of length ε/2 around 0. That contains all but finitely
many points from this set. The remaining points, because there are only finitely many
of them, can be put inside a union of intervals whose lengths add to less than ε/2. On
the other hand, the set of rational numbers between 0 and 1, ℚ ∩ [0, 1], has outer content 1.
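The cover just described can be built explicitly. In the sketch below (the function names and the specific radii are our own choices), the central interval is (−ε/4, ε/4), which has length ε/2 and contains every 1/n < ε/4; each of the remaining K = ⌊4/ε⌋ points receives an interval of length ε/(4K), so the whole cover has total length at most 3ε/4 < ε. Exact `Fraction` arithmetic keeps the membership checks reliable.

```python
import math
from fractions import Fraction

def cover_reciprocals(eps):
    """Finite cover of {1, 1/2, 1/3, ...} by open intervals, total length at most 3*eps/4."""
    eps = Fraction(eps)
    central = (-eps / 4, eps / 4)          # length eps/2; holds every 1/n < eps/4
    K = math.floor(4 / eps)                # 1/n >= eps/4 exactly when n <= K
    if K == 0:
        return [central]                   # eps > 4: the central interval already holds 1
    r = eps / (8 * K)                      # half-width, so each interval has length eps/(4K)
    others = [(Fraction(1, n) - r, Fraction(1, n) + r) for n in range(1, K + 1)]
    return [central] + others

def total_length(cover):
    return sum(right - left for left, right in cover)

eps = Fraction(1, 10)
cover = cover_reciprocals(eps)
print(len(cover), total_length(cover))     # 41 intervals, total length 3/40
covered = all(any(l < Fraction(1, n) < r for l, r in cover) for n in range(1, 2000))
print(covered)
```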

⁵ Hankel actually defined a different but related concept he called the "jump" of f at c, the largest σ such that
inside any interval containing c there is a point x for which |f(x) − f(c)| > σ.
44 The Riemann Integral

Definition: Finite cover and outer content


Given a set S, a finite cover of S is a finite collection of intervals whose union contains S. The length of a cover C, denoted ℓ(C), is the sum of the lengths of the intervals in the cover. The outer content of a bounded set S is
$$c_e(S) = \inf_{C \in \mathcal{C}_S} \ell(C),$$
where $\mathcal{C}_S$ is the set of all finite covers of S.

Although Hankel did not have the terminology of outer content, he did grasp the
idea and turned it into a characterization of when a function is Riemann integrable.
Recast into the language of outer content, Hankel's insight is summarized in the
following theorem.

Theorem 2.5 (Integrable $\iff c_e(S_\sigma) = 0$). Given a bounded function f defined on the interval [a, b], let $S_\sigma$ be the set of points in [a, b] with oscillation greater than or equal to σ. The function f is Riemann integrable over [a, b] if and only if for every σ > 0, the outer content of $S_\sigma$ is zero.

The first to explicitly use this measure of the size of a set was Otto Stolz (1842–1905) in 1881. The term "content" (Inhalt) is due to Cantor in 1884. The distinction between inner and outer content would be made by Giuseppe Peano in 1887 (see Section 5.1). The concept would be popularized in Jordan's Cours d'analyse of 1893–1896, because of which it is sometimes referred to as Jordan content or Jordan measure when the inner and outer content of a set are the same.
The outer content is the same whether we use open or closed intervals in our
finite cover (see Exercise 2.3.15). Because the distinction was not yet recognized
as important, mathematicians of this period usually referred simply to intervals
without distinguishing whether they were open or closed.
Theorem 2.5 is a profound and very useful result. Hankel did not have the
terminology to state this result as we have here, but he did understand it fully.
Unfortunately, Hankel used it to reach a faulty conclusion about when a discontin-
uous function is Riemann integrable. He was led astray because of the paucity of
examples of highly discontinuous functions.

Hankel's Types of Discontinuity


It is clear that Hankel was very impressed by Riemann's example of a function (Example 2.1) that is discontinuous at all rational numbers with even denominators, yet is integrable. He sought to understand what happens in general. Using a technique that he dubbed "condensation of singularities," Hankel showed how to take a function with a singularity at one point, either a discontinuity or an infinite oscillation such as sin(1/x) near x = 0, and use it to construct integrable functions with singularities at every rational number. Cantor would later simplify this method and show how to apply it to any countable set. What is significant for our purposes is that both the set of points at which the function is continuous and the set of points at which the function is discontinuous are dense.

2.3 The Class of 1870 45

Definition: Dense

A set S is dense in the interval I if every open subinterval of I contains at least one point of S.

Definition: Totally discontinuous

A discontinuous function is totally discontinuous in an interval if the set of points of continuity is not dense in that interval.

Definition: Pointwise discontinuous

A discontinuous function is pointwise discontinuous in an interval if the set of points of continuity is dense in that interval.
The rational numbers are dense in R. The rational numbers with even denomi-
nators are also dense. So are the irrational numbers.
Hankel noticed that all of his examples of integrable functions that are discontin-
uous on a dense set of points have the property that the set of points of continuity is
also dense. The examples that we have seen so far of Riemann integrable functions
that are discontinuous on a dense set of points include Riemann's function (Exam-
ple 2.1), the function g in Exercise 2.1.5, and the function m in Exercise 2.1.14.
What characterizes all of these examples as well as the others that Hankel found is that the set of points of continuity is also dense. This suggested to him that he should separate discontinuous functions into two classes: those for which the points of continuity are not dense and those for which the points of continuity are dense.
Thus, for example, Dirichlet's function (Example 1.1) is totally discontinuous
since it is discontinuous at every point.
All of the examples that we have seen so far of Riemann integrable func-
tions that are discontinuous on a dense set of points are pointwise discontinuous.
Hankel believed that every pointwise discontinuous function must be Riemann
integrable.

Hankel's Error
Hankel's argument for his assertion that every pointwise discontinuous function is Riemann integrable is not unreasonable. We choose an arbitrary σ > 0. If a function is continuous at one value, then we can find an interval around this value on which the oscillation is less than σ. If the set of points of continuity is dense, then we have succeeded in putting each element of this dense set inside an interval that contains no points of $S_\sigma$.
Those points with oscillation at least σ constitute a very thin set. Between any two points of this set there must be an entire open interval of points not in the set. Hankel believed that such a set must have outer content 0 and therefore that the function must be Riemann integrable. This belief was reinforced by the fact that all of the examples of pointwise discontinuous functions that Hankel knew, examples such as Riemann's function, were integrable.
Hankel's fallacy, and he was not the only prominent mathematician to fall into
it, was to assume that such a thin set cannot have positive outer content. Thomas
Hawkins has presented evidence that between 1870 and 1875 this was the case for
Hankel, for Axel Harnack, and for Paul du Bois-Reymond. But in 1878, when Ulisse
Dini published his book on the theory of functions of real variables, Fondamenti per la teorica delle funzioni di variabili reali, Dini expressed doubt about the validity of
Hankel's claim. As we shall see in Chapter 4, finding the flaw in Hankel's reasoning
would greatly advance our understanding of the structure of the real numbers, as it
also revealed problems with Riemann's definition of the integral.

Cantor's 1872 Paper


In 1872, Cantor published Über die Ausdehnung eines Satzes aus der Theorie der trigonometrischen Reihen (On the extension of a theorem from the theory of trigonometric series), where he proved results on the uniqueness of trigonometric series that converge to zero except possibly at an infinite set of points. The preface
to this paper contains Cantor's construction of the irrational numbers, a step he
recognized as necessary before he could work with them.
Cantor's discussion of infinite sets to which he could extend his results on uniqueness of the trigonometric series began with the set {1, 1/2, 1/3, 1/4, ...}.
This set has the very nice property that if we consider any open interval that contains
0 and remove the points that are in that interval, then we are left with a finite set.
The point 0 is called an accumulation point of this set, and Cantor designated this
set as an infinite set of type 1.
Cantor actually defined type 1 sets to be infinite sets for which the derived set is finite, but he was only working with bounded sets. To extend his definition to unbounded sets, we count the number of times that we need to take the derived set in order to get to the empty set. A set with no accumulation points is considered to be type 0. The set {1, 1/2, 1/3, 1/4, ...} is type 1 because its derived set is {0} and the derived set of {0} is the empty set.

Definition: Accumulation point

Given a set S, a point x is an accumulation point (also known as a limit point or cluster point) of S if every open interval that contains x also contains infinitely many points of S.

Definition: Derived set, type 1 set

The set of accumulation points of S is called the derived set of S, denoted S'. A set is type 1 if its derived set is nonempty but the derived set of its derived set is empty.

Definition: Type n sets, first and second species

A set S is type n if its derived set is type n − 1 and S itself is not type n − 1. A set is called first species if it is type n for some finite integer n ≥ 0. A set that is not first species is said to be second species.
If a derived set is infinite, then we can consider the derived set of its derived set.
For example, starting with the set
$$T = \left\{ \frac{1}{m} + \frac{1}{n} \;\middle|\; m, n \in \mathbb{N} \right\},$$
its derived set contains {1, 1/2, 1/3, 1/4, ...}, and with a little work (see Exercise 2.3.11) you can show that the derived set equals {0, 1, 1/2, 1/3, 1/4, ...}. The set T is not type 1, but $T'' = \{0\}$, and we say that T is type 2.
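No finite computation can find accumulation points, but counting can make the clustering in T visible. The following Python sketch is our own illustration, and its names and cutoffs are hypothetical. It truncates T to the sums 1/m + 1/n with m ≤ n ≤ M, using exact fractions, and counts how many lie within 1/100 of a given point. Near 1/2, which lies in T', the count keeps growing as M grows. Near 7/10 = 1/2 + 1/5, which lies in T but not in T', it does not.

```python
from fractions import Fraction


def truncated_T(M):
    """All values 1/m + 1/n with 1 <= m <= n <= M, as exact fractions."""
    return {Fraction(1, m) + Fraction(1, n)
            for m in range(1, M + 1) for n in range(m, M + 1)}


def count_near(points, x, delta):
    """How many of the points p satisfy |p - x| < delta."""
    return sum(1 for p in points if abs(p - x) < delta)


delta = Fraction(1, 100)
for M in (100, 300):
    T_M = truncated_T(M)
    # (M, number of points near 1/2, number of points near 7/10)
    print(M, count_near(T_M, Fraction(1, 2), delta), count_near(T_M, Fraction(7, 10), delta))
```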


A set is of first species if we get to the empty set after a finite number of
derivations.
The set Q is not first species. Its derived set is the entire real number line. The
derived set of R is again R. What Cantor was able to prove is that if a trigonometric
series converges to 0 at all points except possibly on a set of exceptional points that
is first species, then all coefficients of the trigonometric series are zero.
Cantor's identification of sets of first species gave further impetus to the concept of outer content. Any bounded type 1 set, that is to say any bounded set with a finite number of limit points, has outer content zero. We put small intervals around the limit points so that the sum of the lengths of these intervals is less than ε/2. As we shall show in the next chapter, any infinite set in a bounded interval has a limit point, so once the limit points have been covered, there can only be finitely many points left, and these can be covered by intervals whose lengths add to less than ε/2.
Any bounded type 2 set has outer content zero. The set of limit points is a bounded type 1 set and so can be covered by intervals of total length less than ε/2. The points from the original set that are not covered by these intervals are finite in number and can be covered by intervals of total length less than ε/2.
We now see that, by induction, any bounded first species set has outer content
zero. If we have proven that a bounded set of type n must have outer content zero,
then so must any bounded set of type n + 1 because its derived set has type n.
Combining this with Hankel's insights, we see that if $S_\sigma$, the set of points at which the oscillation is greater than or equal to σ, is of first species for all σ > 0, then the function is Riemann integrable. This appears to give even greater credence to Hankel's claim that any pointwise discontinuous function is Riemann integrable.
Surely if there is an entire interval of continuity around every point of continuity
and the points of continuity are dense, then what is left over must be first species.
It is to Cantor's credit that he realized that what seemed obvious was not so
clear. He recognized that before any further progress could be made, he needed to
understand the structure of the real number line. Cantor now embarked on a quest
that would profitably engage the remainder of his career and mark him as one of
the great mathematicians.

Exercises
2.3.1. Find the derived set of each of the following sets. Which of these sets are
first species?
1. $\mathbb{Q} \cap [0, 1]$
2. $[0, 1] - \mathbb{Q}$
3.
4.
5. k=1 2or3j
6. $(0, 1) \cup (3, 4)$

2.3.2. Find the outer content of each of the sets in Exercise 2.3.1. Justify each
answer.

2.3.3. Show that if S has outer content zero and T is any bounded set, then
$$c_e(S \cup T) = c_e(T).$$

2.3.4. Define the function f by f(x) = sin(1/x), x ≠ 0, f(0) = 0. What is the oscillation of f at 0? Justify your answer.

2.3.5. Find the oscillation at x = 1/3 of the function


x,

Justify your answer.


2.3.6. Which of the following sets are dense in [0, 1]? For each set, justify your answer.

1. $\mathbb{Q}$
2. $[0, 1] - \mathbb{Q}$
3. The set of rational numbers with denominators that are a power of 2
4. The set of real numbers that have no 2 in their decimal expansion
5. The set of real numbers that have a 2 somewhere in their decimal expansion
6. The set of rational numbers with denominators less than or equal to 1,000
7. The set of rational numbers with denominators that are prime
8. The set of rational numbers with numerators that are prime
2.3.7. Show that the function m defined in Exercise 2.1.14 is pointwise discontinuous.
2.3.8. Consider the function h defined in Exercise 2.1.12. Is it possible to find an ε > 0 so that h is totally discontinuous in the open interval (−ε, ε)? Explain why or why not.
2.3.9. Consider the function g in Exercise 2.1.5. Find all values of x ∈ [0, 1] at which the function is not continuous, and find the oscillation of g at each of these points. Determine whether this function is totally discontinuous or pointwise discontinuous and justify your answer.
2.3.10. Prove that for any set S,
$$S'' \subseteq S',$$
the derived set of the derived set of S is contained in the derived set of S.
2.3.11. Prove that the derived set of $T = \{1/m + 1/n \mid m, n \in \mathbb{N}\}$ is the set $U = \{1/n \mid n \in \mathbb{N}\} \cup \{0\}$. First show that $U \subseteq T'$. To prove that T has no other limit points, show that there are only finitely many points of T in (1/N, 1/(N − 1)) that are not of the form 1/N + 1/n.
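2.3.12. Prove that
$$T_k = \left\{ \frac{1}{m_1} + \frac{1}{m_2} + \cdots + \frac{1}{m_k} \;\middle|\; m_1, m_2, \ldots, m_k \in \mathbb{N} \right\}$$
is type k. It is clear that its derived set includes $T_{k-1}$. The key is to show that for all k the derived set of $T_k$ does not contain any points other than 0 that are not in $T_{k-1}$.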

2.3.13. Consider the set of zeros of the function $f_1$ defined by $f_1(x) = \sin(1/x)$. What is the type of this set? What is the type of the set of zeros of $f_2(x) = \sin(1/f_1(x))$? What is the type of the set of zeros of $f_3(x) = \sin(1/f_2(x))$?

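2.3.14. Prove that if $\liminf_{x \to c} f(x) \le f(c) \le \limsup_{x \to c} f(x)$, then
$$\inf_{I \in \mathcal{I}} \omega(f; I) = \limsup_{x \to c} f(x) - \liminf_{x \to c} f(x),$$
where $\mathcal{I}$ is the set of open intervals containing c.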
2.3.15. Given a set S, let $\mathcal{O}$ be the set of finite covers of S by open intervals and let $\mathcal{C}$ be the set of finite covers of S by closed intervals. Show that if $O \in \mathcal{O}$ is any finite cover of S by open intervals, then adding the endpoints to each of these open intervals gives an element of $\mathcal{C}$, a finite cover by closed intervals. Show that if $C \in \mathcal{C}$ is a finite cover of S by closed intervals, then removing the endpoints of each of these closed intervals covers all of S except possibly for a finite set of points. Use these observations to prove that it does not matter whether we use open or closed (or even half-open) intervals in defining the outer content of a set.
2.3.16. Given a function f defined over some interval, we use $S_\sigma$ to denote the set of points at which the oscillation of f is at least σ. Give an example of a function for which $c_e(S_\sigma) = 0$ for all σ > 0 but for which
$$c_e\Big(\bigcup_{\sigma > 0} S_\sigma\Big) = 1.$$

2.3.17. Prove that if the function f is Riemann integrable on [0, 1], then it is either continuous or pointwise discontinuous on [0, 1].
3
Explorations of R

Like most students entering college, mathematicians of the midnineteenth century


thought they understood real numbers. In fact, the real number line turned out
to be much subtler and more complicated than they imagined. As Weierstrass,
Dedekind, Cantor, Peano, Jordan, and many others would show, the real numbers
contain many surprises and can be quite unruly. Until they were fully understood,
it would be impossible to come to a solid understanding of integration.
The complexity of real numbers illustrates a recurrent theme of mathematics. The
real number line is a human construct, created by extrapolation from the world we
experience, employing a process of mental experiments in which choices must be
made. In one sense, the choices that have been made in formulating the properties
of the real number line are arbitrary, but they have been guided by expectations
built from reality.
The complexity of the real numbers arises from the superposition of two sets of
patterns: the geometry of lines and distances on the one hand, and the experience
of discrete numbers — integers, rationals, algebraic numbers — on the other. This
is a template for much of mathematical creation. Patterns that arise in one context
are recognized as sharing attributes with those of a very different genesis. As these
patterns are overlaid and the points of agreement are matched, a larger picture
begins to emerge. The miracle of mathematics lies in the fact that this artificial
creation does not appear to be arbitrary. Repeatedly throughout the history of
mathematics, this superposition of patterns has led to insights that are useful. In
Wigner's phrase, we are privileged to witness the "unreasonable effectiveness of
mathematics." We have tapped into something that does appear to have a reality
beyond the constructions with which we began. That it entails unexpected subtleties
should come as no surprise.

3.1 Geometry of R
The real number line is, above all else, a line. While true lines may not exist in the
world of our senses, we do see them at the intersection of flat or apparently flat surfaces such as the line of the horizon when looking across a sea or prairie. To imagine the line as infinite is easy, for that is simply imagining the absence of an end.
For the line as a geometric construct, the natural operation is demarcation of
distance. There are two critical properties of distances that become central to the
nature of real numbers:
1. However small a distance we measure, it is always possible to imagine a smaller
distance.
2. Any two distances are commensurate. However small one distance might be
and large the other, one can always use the smaller to mark out the larger.
The second property is known as the Archimedean principle,¹ that given any two distances, one can always find a finite multiple of the smaller that exceeds the larger.
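The Archimedean principle is easy to state computationally. In the sketch below, whose function name is hypothetical, distances are represented as exact Python fractions. The smallest multiple of the smaller distance that exceeds the larger is found with a single floor. In a number system containing infinitesimals, no such finite multiple would exist for an infinitesimal and a measurable distance.

```python
import math
from fractions import Fraction


def archimedean_multiple(small, large):
    """Smallest positive integer n with n * small > large, for positive distances."""
    small, large = Fraction(small), Fraction(large)  # exact arithmetic, no rounding
    if small <= 0 or large <= 0:
        raise ValueError("distances must be positive")
    # n * small > large  <=>  n > large / small, so take the next integer above.
    return math.floor(large / small) + 1


print(archimedean_multiple(Fraction(1, 1000), 7))  # 7001
```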
Let us now take our line and mark a point on it, the origin. We conduct a mental
experiment. We stretch the line, doubling distances from the origin. What does the
line now look like? It cannot have gotten any thinner. It did not have any width to
begin with. A point that was a certain distance from the origin is now twice as far,
but the first property tells us that the line itself should look the same. No gaps or
previously unseen structures are going to appear as we stretch it. No matter how
many times we double the length, what we see does not change.

An Infinite Extension
We now kick our mental experiment up a level and imagine stretching the line by
an infinite amount. What happens to our line? There are two reasonable answers,
and a choice must be made. One reasonable answer is that it still looks like a line.
This answer builds on the human expectation that whatever has never changed,
never will change. Every time we have doubled the length, the line has remained
unchanged. Is stretching by an infinite factor so different?
It is different. Go back to our original line and identify one of the points other
than the origin. What happens to that point as we magnify by an infinite factor? No
matter how large our field of vision, that point has moved outside of it. All points
other than the origin have moved outside the field of vision. All that is left is the
point at the origin. Infinite magnification has turned our line into a single point.
I said that we have a choice. We could hold onto our instinctive answer that the
infinitely magnified line is still a line. That choice contradicts the Archimedean
principle, and so it is not the generally accepted route to construction of the real
numbers, but I do want to go down that road a little way.

¹ Also known as the Archimedean axiom or the continuity axiom. It predates Archimedes, appearing as definition 4 of book 5 of Euclid's Elements.

Consider one of the points other than the origin on the infinitely magnified
line. Where was it before the magnification? It certainly was not any measurable
distance away from the origin, otherwise it would have sailed off to infinity when
we magnified the line. It was not on top of the origin because then it would have
stayed put. It must have been off the origin, but less than any measurable distance
away from the origin. Its distance from the origin must have been so small that it is
incommensurate with any measurable distance. No matter how many of these tiny
distances we take, we cannot fill any measurable distance, no matter how small.
The tiny distances are known as infinitesimals. Leibniz used them to explain his
development of the calculus. It is possible to develop analysis using the real number
line with infinitesimals, though the full complexity of infinitesimals is much greater
than what is suggested by this simple thought experiment. This approach to calculus
through infinitesimals is called nonstandard analysis and was developed in the
early 1960s, beginning with the work of Abraham Robinson at Princeton. Initially,
the logical underpinnings required to work with infinitesimals were daunting. Since
then, these foundations have been greatly simplified, and nonstandard analysis has
vocal proponents.
If the real number line includes infinitesimals, then every point on the real line
must be surrounded by a cloud of points that are an infinitesimal distance away. The
infinitely magnified line, if it really is a replica of the original line, must also contain
infinitesimals, and they must be stretched from points that were infinitesimally small
with respect to the original infinitesimals. Thus every infinitesimal is surrounded
by a cloud of points whose distance is infinitesimally infinitesimal, and these by
third-order infinitesimals, and so on.
"And so on" is a wonderful human phrase. We actually can imagine this unimag-
inable construction, or at least imagine enough that we are prepared to accept it
and work with it. There is a real choice to be made. That choice was made in the
early nineteenth century. Looking back, the decision to hold to the Archimedean
principle appears inevitable. As I said earlier, the real number line is, above all else,
a line. We are working with distances, and distances in our everyday experience
are commensurate. Those seeking foundations for analysis preferred to stay on the simpler, surer, and more intuitive ground of commensurate distances.

Topology of R
The only geometry on a line is the measurement of distance. There is an entire
branch of mathematics that is built on the concept of distance and its generalizations:
topology. Topology gets much more interesting in higher dimensions, but there is
a lot going on in just one dimension.
We begin with some basic definitions. The basic building block of topology is the ε-neighborhood of a point a. Objects such as neighborhoods are best visualized in two- or three-dimensional space, but you also need to visualize them as they live on the real number line.

Definition: ε-Neighborhood

Given ε > 0, the ε-neighborhood of a point a, $N_\varepsilon(a)$, is the set of all points whose distance from a is strictly less than ε:
$$N_\varepsilon(a) = \{x \mid |x - a| < \varepsilon\}. \tag{3.1}$$

Definition: Open set

We say that a set S is open if for each x ∈ S, there is an ε-neighborhood of x that is contained entirely inside S: x ∈ S implies that there is an ε > 0 such that $N_\varepsilon(x) \subseteq S$.
On the real number line, the ε-neighborhood $N_\varepsilon(a)$ is the open interval (a − ε, a + ε). In the plane $\mathbb{R}^2$, it is a disc centered at a (without the bounding edge), and in $\mathbb{R}^3$ it is the solid ball centered at a (without the bounding surface). It is also often useful to work with a deleted or punctured neighborhood of a point a, consisting of a neighborhood with the point a removed, $\{x \mid 0 < |x - a| < \varepsilon\}$.
Any open interval is open. Any union of open intervals is open. The empty set is open. (Since there is no x in the empty set, the condition holds vacuously for every x in it.) The entire real line R is open. In two dimensions, the inside of any polygon, without the boundary, is open.
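For a finite union of open intervals, the definition of an open set can be checked mechanically. First merge overlapping intervals into disjoint pieces. The largest ε with $N_\varepsilon(x)$ inside the union is then the distance from x to the nearer endpoint of the piece containing x. The following Python sketch assumes each interval is given as a pair (a, b) with a < b, and the function names are hypothetical. By convention it returns 0 when x is not in the union at all.

```python
def merge_open_intervals(intervals):
    """Merge overlapping open intervals (a, b), a < b, into disjoint sorted pieces.

    Intervals that merely touch, such as (0, 1) and (1, 2), are kept apart,
    because their common endpoint belongs to neither of them.
    """
    merged = []
    for a, b in sorted(intervals):
        if merged and a < merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], b))
        else:
            merged.append((a, b))
    return merged


def neighborhood_radius(x, intervals):
    """Largest eps with N_eps(x) inside the union of the intervals, or 0 if x is not in it."""
    for a, b in merge_open_intervals(intervals):
        if a < x < b:
            return min(x - a, b - x)
    return 0


print(neighborhood_radius(0.5, [(0, 1), (1, 2)]))  # 0.5
print(neighborhood_radius(1, [(0, 1), (1, 2)]))    # 0: the point 1 is not in the union
```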

Theorem 3.1 (Equivalent Definition of Continuity). A function is continuous


over its domain if and only if the inverse image of every open set in the range is an
open set in the domain.

Consider the function defined by f(x) = x². The inverse image of (1, 4) consists of all points of R whose square lies strictly between 1 and 4. This is (−2, −1) ∪ (1, 2). The inverse image of (−1, 4) is (−2, 2). Think about why.
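A grid check cannot prove a statement about every real number, but it can catch mistakes in an inverse-image computation like the one above. The sketch below, whose names are hypothetical, works with exact fractions. At every point k/1000 in [−3, 3], it compares the condition lo < x² < hi with the claimed description of the inverse image.

```python
from fractions import Fraction


def preimage_agrees(lo, hi, claimed, grid):
    """True if, at every grid point x, lo < x**2 < hi holds exactly when claimed(x) does."""
    return all((lo < x * x < hi) == claimed(x) for x in grid)


# Exact rational grid k/1000 for -3 <= k/1000 <= 3
grid = [Fraction(k, 1000) for k in range(-3000, 3001)]

# Inverse image of (1, 4) should be (-2, -1) U (1, 2)
print(preimage_agrees(1, 4, lambda x: -2 < x < -1 or 1 < x < 2, grid))  # True
# Inverse image of (-1, 4) should be (-2, 2)
print(preimage_agrees(-1, 4, lambda x: -2 < x < 2, grid))  # True
```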
Proof We begin with the ε-δ definition of continuity. A function f is continuous at a if and only if given any ε > 0, there exists a response δ such that |x − a| < δ implies that |f(x) − f(a)| < ε.
We first translate this statement into the language of ε-neighborhoods. A function f is continuous at a if and only if given any ε-neighborhood of f(a), there is a δ-neighborhood of a such that
$$f\big(N_\delta(a)\big) \subseteq N_\varepsilon\big(f(a)\big).$$

This containment is equivalent to $N_\delta(a) \subseteq f^{-1}\big(N_\varepsilon(f(a))\big)$. Thus the ε-δ definition of continuity can be restated as: Given a point a in the domain of f and any ε > 0, there exists a response δ such that
$$N_\delta(a) \subseteq f^{-1}\big(N_\varepsilon(f(a))\big).$$
We now translate this into the language of open sets. Start by assuming that f is continuous and let T be an open set in the range. If $x \in f^{-1}(T)$, then f(x) ∈ T, so we can find an ε-neighborhood $N_\varepsilon(f(x)) \subseteq T$. We have seen that continuity implies that there is a δ > 0 for which $N_\delta(x) \subseteq f^{-1}\big(N_\varepsilon(f(x))\big) \subseteq f^{-1}(T)$, so $f^{-1}(T)$ is open.
If we assume that $f^{-1}(T)$ is open for every open set T in the range, then pick any x in the domain and any ε > 0. We know that $N_\varepsilon(f(x))$ is open, and therefore by our assumption so is $f^{-1}\big(N_\varepsilon(f(x))\big)$. Since this set is open and contains x, we can find a δ > 0 for which $N_\delta(x) \subseteq f^{-1}\big(N_\varepsilon(f(x))\big)$, which is exactly the ε-δ condition for continuity at x. □

Be careful. If f is continuous, it does not necessarily follow that if S is open, so


is f(S) (consider any nonmonotonic continuous function, see Exercise 3.1.2).

More Definitions

Definition: Closed set

A set S is closed if and only if its complement ($S^c$, the set of points that are not in S) is open.

A closed interval is closed. Any finite set of points is closed. Any set of points that forms a convergent sequence, taken together with its limit, is a closed set. The empty set and the entire real line R are closed. These are the only two sets that are both open and closed (see Exercise 3.1.8). Many sets such as (0, 1] and {1, 1/2, 1/3, 1/4, ...} (without the limiting value 0) are neither open nor closed.
If every neighborhood of x contains a point of S other than x itself, then every neighborhood of x contains infinitely many points of S (see Exercise 3.1.7), and therefore x is an accumulation point of S (recall the definition in Section 2.3). We can characterize closed sets in terms of accumulation points.

Proposition 3.2 (Characterization of Closed Sets). A set S is closed if and only if it contains all of its accumulation points.

Proof If S is closed, then its complement is open. No accumulation point of S could be in the complement (see Exercise 3.1.11), so S contains all of its accumulation points. If S contains all of its accumulation points, then any point in $S^c$ sits inside some ε-neighborhood entirely inside $S^c$ (see Exercise 3.1.12). Therefore, $S^c$ is open and so S is closed. □

Definition: Interior, closure, and boundary

The interior of S is the union of all open sets contained in S. The closure of S is the intersection of all closed sets that contain S. The boundary of S, ∂S, consists of all points in the closure that are not in the interior.

It follows that any closed set contains its derived set. If S is closed, then its
elements might or might not be accumulation points. Thus, the set

S = {0, 1/2, 1/3, 1/4, ..., 1/n, ...}

is closed, but 0 is the only accumulation point of S. On the other hand, if T = [0, 1],
then T is closed and every point in T is an accumulation point of T. This closed
set is equal to its derived set. Any derived set is closed (see Exercise 3.1.21).
It took a long time for the mathematical community to recognize the importance
of open and closed sets. Closed sets, which contain their own derived set, are the
older notion and can be traced back to Cantor in 1884. Mathematicians working
in analysis in the late nineteenth and even early twentieth centuries would fail to
make it clear whether the intervals that they described were to include endpoints
or not. Often it made no difference. Sometimes it was critically important. One of
the most infamous examples was Borel's 1905 Leçons sur les fonctions de variables réelles (Lectures on real variable functions).² As late as the early twentieth century,
some mathematicians including the Youngs used the term "open set" to mean a
set that is not closed. The current meaning can be traced to Baire in 1899. It was
Lebesgue who popularized the current definition.
The interior of an open set is itself. The interior of the closed interval [a, b] is the open interval (a, b). The interior of a finite set of points is the empty set. The closure of a closed set is itself, while the closure of the open interval (a, b) is [a, b]. The half-open, half-closed interval [−1, 1) has closure [−1, 1], interior (−1, 1), and a boundary that consists of two points, {−1, 1}. The closure of {1/n | n ∈ N} is {1/n | n ∈ N} ∪ {0}. Its interior is the empty set. The boundary is {1/n | n ∈ N} ∪ {0}.
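For a single bounded interval, the interior, closure, and boundary depend only on the endpoints, as the [−1, 1) example shows. The small Python sketch below assumes nondegenerate intervals with lo < hi. It uses a hypothetical Interval type that records whether each endpoint is included.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A bounded interval with lo < hi; each flag says whether that endpoint belongs to it."""
    lo: float
    hi: float
    lo_closed: bool
    hi_closed: bool


def interior(I):
    """Drop both endpoints: the interior of any such interval is (lo, hi)."""
    return Interval(I.lo, I.hi, False, False)


def closure(I):
    """Add both endpoints: the closure of any such interval is [lo, hi]."""
    return Interval(I.lo, I.hi, True, True)


def boundary(I):
    """Closure minus interior: just the two endpoints."""
    return {I.lo, I.hi}


half_open = Interval(-1, 1, True, False)  # the interval [-1, 1)
print(closure(half_open), interior(half_open), boundary(half_open))
```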
The interior, closure, and boundary can also be described in terms of ε-neighborhoods.

² Renfro speculates that this might have been the fault of Maurice Fréchet, then a young graduate student, who transcribed these lectures.

Proposition 3.3 (Characterizations of Interior, Closure, and Boundary). A point x is in the interior of S if and only if there is some ε-neighborhood of x that is completely contained in S. A point x is in the closure of S if and only if every ε-neighborhood of x contains a point that is in S. A point x is in ∂S if and only if every ε-neighborhood of x contains a point that is in S and a point that is not in S.

Proof The characterization of the interior of S is left for Exercise 3.1.22.
From DeMorgan's laws (Theorem 1.3), the complement of the intersection of the closed sets that contain S is the union of the open sets that are contained in $S^c$. Therefore, x is in the closure of S if and only if it is not in the interior of $S^c$. By the characterization of the interior, this holds if and only if no ε-neighborhood of x is completely contained in $S^c$, that is, if and only if every ε-neighborhood of x contains a point of S.
The characterization of points in the boundary follows from the definition of the boundary as the set of points that are in the closure and not in the interior. □

Note that a set S is dense (recall the definition in Section 2.3) in T if, for all x ∈ T, every ε-neighborhood of x contains infinitely many points of S. The set of rational numbers is dense in R. We can take a much smaller set and still be dense in R. The
set of rational numbers whose denominators are even (recall Riemann's function,
Example 2.1) is dense. Even if we restrict ourselves to the set of rational numbers
whose denominators are powers of 2, this also is a dense subset of R. If a set S is
dense in R, then every real number is an accumulation point of S. In this case, the
closure of S is the entire real number line.
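The density of the rationals whose denominators are powers of 2 has a short constructive proof. Given a < b, choose j with 1/2^j < b − a, and let k be the smallest integer with k > a·2^j. Then a < k/2^j ≤ a + 1/2^j < b. Below is a Python sketch of this construction with exact fractions; the function name is hypothetical.

```python
import math
from fractions import Fraction


def dyadic_between(a, b):
    """Return a rational k / 2**j with a < k / 2**j < b, assuming a < b."""
    a, b = Fraction(a), Fraction(b)
    if not a < b:
        raise ValueError("need a < b")
    j = 0
    while Fraction(1, 2**j) >= b - a:  # find j with 1/2**j < b - a
        j += 1
    k = math.floor(a * 2**j) + 1       # smallest integer k with k / 2**j > a
    return Fraction(k, 2**j)           # then k / 2**j <= a + 1/2**j < b


print(dyadic_between(Fraction(1, 3), Fraction(1, 3) + Fraction(1, 1000)))  # 171/512
```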

Exercises
3.1.1. Prove that every constant function is continuous.
3.1.2. Give an example of a continuous function f and an open set S in the domain of f such that f(S), the image of S, is not open.
3.1.3. Define the function f over the domain [−1/π, 1/π] by
f(x) = sin(1/x), x ≠ 0, f(0) = 0.
Describe $f^{-1}(N_{1/2}(0))$, the inverse image of $N_{1/2}(0)$. Is this inverse image open, closed, or neither?
3.1.4. Define the function g over the domain [−1/π, 1/π] by

g(x) = x sin(1/x), x ≠ 0, g(0) = 0.

Describe $g^{-1}(N_{1/2}(0))$, the inverse image of $N_{1/2}(0)$. Is this inverse image open, closed, or neither?

3.1.5. For each of the following sets in $\mathbb{R}^2$, state whether the set is open, closed, or
neither. Then find the closure, the interior, and the boundary of the set.
1.
2.
3.
4.
5.
6.

7.

3.1.6. For each of the following sets in R, state whether the set is open, closed, or
neither. Then find the closure, the interior, and the boundary of the set.
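1. $\mathbb{Q}$
2. $\mathbb{R} - \mathbb{Q}$
3. $\bigcup_{n=1}^{\infty} \left( \frac{1}{2n}, \frac{1}{2n-1} \right)$
4. $\bigcup_{n=1}^{\infty} \left[ \frac{1}{2n}, \frac{1}{2n-1} \right]$
5. the set of all rational numbers with denominators that are less than 1,000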
6. the set of all rational numbers with denominators that are powers of 2
7. the set of all rational numbers with numerators that are powers of 2

3.1.7. Prove that every neighborhood of x contains at least one point of S other
than x itself if and only if every neighborhood of x contains infinitely many points
of S.

3.1.8. Prove that if a set is both open and closed, then it is either ℝ or the empty
set.

3.1.9. Give an example of a set with an accumulation point that is not a boundary
point.

3.1.10. Give an example of a set with a boundary point that is not an accumulation
point.

3.1.11. Prove that if S^c, the complement of S, is open, then S^c cannot contain an
accumulation point of S.
3.1.12. Prove that if x ∈ S^c is not an accumulation point of S, then there is an
ε > 0 for which N_ε(x) ⊆ S^c.

3.1.13. Prove that any union of open sets is open.


3.1.14. Prove that any finite intersection of open sets is open.
3.1.15. Give an example of an infinite intersection of open sets that is not open.
3.2 Accommodating Algebra 59
3.1.16. Give an example of an infinite intersection of open sets that is open.
3.1.17. Prove that any intersection of closed sets is closed.
3.1.18. Prove that any finite union of closed sets is closed.
3.1.19. Give an example of an infinite union of closed sets that is not closed.
3.1.20. Give an example of an infinite union of closed sets that is closed.
3.1.21. Let S be a derived set (the set of accumulation points of some set T). Show
that S^c is open, and thus S is closed.
3.1.22. Prove that a point x is in the interior of S if and only if there is some
ε-neighborhood of x that is completely contained in S.

3.2 Accommodating Algebra


The real number line entails more than distance. We want to assign values to
its points. We choose a point that will represent the origin, 0, and then pick a
second point, label it "1," and use the distance between the origin and 1 as our
basic linear unit. We can locate points whose distances from the origin correspond
to integers and rational numbers. We can even locate points that correspond to
irrational lengths such as the length of the diagonal of a unit square. Taking
mirror images across the origin, we locate the negatives of these numbers. We have
imposed a system of a discrete set of objects — integers, rational numbers, algebraic
numbers — onto our continuum of distances. But this system does not account for
all points in ℝ. In 1844, Joseph Liouville proved that there are points on the real
line that are not algebraic.
It is a very small and self-evident step to believe that every point on the
continuum of the real line corresponds to a number, but this step carries enormous
repercussions, for from now on we will be using the geometric notion of distance
to inform our concept of number that, until now, had been restricted to quantities
arising from algebraic constructions.

Implications of the Bolzano–Weierstrass Theorem


Although some mathematicians resisted expanding the notion of number beyond
algebraic numbers, most recognized the need to do so. Foremost among them were
Richard Dedekind and Karl Weierstrass.
Beginning in the academic year 1857–1858 and then every 2 years until the
1880s, Weierstrass lectured on analysis at the University of Berlin. In these
lectures, he developed and expounded many of the basic principles of analysis.
His students would work through these ideas, refine them, and eventually publish
them. It was in these lectures that Weierstrass first explained what today we call the
Bolzano–Weierstrass theorem. His proof rested on the nested interval principle.
The shared attribution with Bernhard Bolzano arises from Weierstrass's
acknowledgment of his indebtedness to Bolzano's 1817 proof that the convergence of every
Cauchy sequence implies that every bounded, increasing sequence has a limit.

Theorem 3.4 (Bolzano–Weierstrass Theorem). Any bounded infinite set S has
an accumulation point.

Proof. Weierstrass's proof proceeds as follows. If we have an infinite set of points
contained in the bounded interval [a, b], we consider the two half-intervals,
[a, (a + b)/2] and [(a + b)/2, b], and choose one that has infinitely many points from
our set. We then divide this interval in half and choose an interval of length
(b − a)/4 that contains infinitely many points from our set. We continue in this
way, generating an infinite sequence of nested intervals of arbitrarily small length,
each of which contains infinitely many points from our set. The nested interval
principle guarantees a point α that lies in all of these intervals. Every neighborhood
of α contains one of these nested intervals, and therefore every neighborhood of α
contains infinitely many points from our set. The point α is an accumulation point
of S.
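The bisection in Weierstrass's proof can be imitated numerically, with one caveat. A computer cannot test whether a half-interval holds infinitely many points of S. The sketch below therefore substitutes a large finite sample of S and, at each step, keeps the half that holds more sample points. The function name `bisect_toward_accumulation` and the rule of keeping the left half on ties are assumptions made for illustration. The output only suggests where an accumulation point lies; it proves nothing.

```python
def bisect_toward_accumulation(points, a, b, depth):
    """Halve [a, b] `depth` times, keeping the closed half with more sample points.

    A finite sample stands in for "infinitely many points of S", so this is a
    heuristic illustration of the bisection, not a proof.
    """
    lo, hi = a, b
    for _ in range(depth):
        mid = (lo + hi) / 2
        left = sum(1 for p in points if lo <= p <= mid)
        right = sum(1 for p in points if mid <= p <= hi)
        # Keep the left half on ties, as an arbitrary but fixed rule.
        if left >= right:
            hi = mid
        else:
            lo = mid
    return lo, hi
```

For a sample of S = {1/n}, the kept intervals shrink toward 0, the only accumulation point of that set.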

The nested interval principle implies what we can call the Bolzano–Weierstrass
principle, that every bounded infinite set has a limit point, but the
Bolzano–Weierstrass principle also implies the nested interval principle. Given a nested
sequence of closed intervals, we create an infinite set S by choosing one point from
each interval so that no point duplicates any of the previously chosen points. By
Bolzano–Weierstrass, the set S has a limit point, and since every neighborhood
of this limit point contains infinitely many elements of S, this limit point must be
inside every interval.
How do we define and assign values to those points on the real number line that
are predicted by Bolzano–Weierstrass but are neither algebraic nor roots of common
functions? Is there a way we can start with arithmetic and build to all of the values
represented by the real number line? Several notable mathematicians wrestled with
this question. Beginning in the late 1850s, Richard Dedekind, Karl Weierstrass,
Georg Cantor, and H. Charles Méray each found his own solution. Dedekind and
Cantor published their solutions in 1872. The details of these solutions are less
important than what they have in common, all drawing on a fundamental observation
about the structure of ℝ: The set of rationals is dense on the real number line.
Every open interval, no matter how far we may have zoomed in, contains at
least one and therefore infinitely many rational numbers (see Exercise 3.2.1). Any
point on the real number line is uniquely determined by reference to these rational
numbers. To Dedekind, each irrational number is described by considering all
rationals less than the point in question and all rationals that are greater. Given
two such sets with the property that every element of the first is strictly less than
every element of the second and their union consists of all rational numbers, such
a "Dedekind cut" defines a unique point in ℝ. Weierstrass used series. Cantor and
Méray used sequences of rational values that could be forced as close as desired to
the point in question by taking sufficiently many terms or going out sufficiently far
in the sequence. Cantor identified each point in ℝ with the collection of rational
Cauchy sequences that converge to this point. Each point on the real number line
is identified by an appropriate collection of Cauchy sequences.
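Dedekind's description and Cantor's can be seen side by side in a small sketch. Below, the lower set of the cut for √2 is encoded as a hypothetical predicate `in_lower_set` on rationals. Repeated bisection against that predicate (hypothetical name `cut_approximations`) produces rationals that approach the cut, which is the kind of rational Cauchy sequence Cantor used. This illustrates the ideas under those assumptions; it is not a construction of ℝ.

```python
from fractions import Fraction

def in_lower_set(q: Fraction) -> bool:
    """Lower set of the Dedekind cut for sqrt(2): rationals q with q < 0 or q*q < 2."""
    return q < 0 or q * q < 2

def cut_approximations(steps: int) -> list:
    """Bisect between a rational in the lower set and one in the upper set.

    After step k (counting from 0) the gap to the upper set is 1/2**(k+1),
    so the returned rationals form a Cauchy sequence approaching the cut
    from below.
    """
    lo, hi = Fraction(1), Fraction(2)  # 1 lies in the lower set, 2 does not
    approximations = []
    for _ in range(steps):
        mid = (lo + hi) / 2
        if in_lower_set(mid):
            lo = mid
        else:
            hi = mid
        approximations.append(lo)
    return approximations
```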
We are using a modified form of Cantor's definition when we identify the
elements of ℝ with all possible decimals to infinitely many places. Such a decimal
expansion represents a choice of a particular Cauchy sequence. For example,
we identify π with a Cauchy sequence that begins (3, 3.1, 3.14, 3.141, 3.1415,
3.14159, ...). When we write "π = 3.14159...," there are two observations that
we need to make. The first is that, actually, we have not specified the location of
π. The information supplied by 3.14159... tells us nothing about the digit that
follows 9. There is an infinitude of points on the real number line that begin with
this particular decimal expansion. Giving a thousand or a million or a billion digits
gets us no further in the sense that there are still infinitely many different points
whose expansions start with those digits.
The second observation is that the statement "π = 3.14159..." nevertheless
does tell us something very important. It implies that there is a sequence of rational
approximations to π that begins (3, 3.1, 3.14, 3.141, 3.1415, ...) and that eventually
will enter and stay within any open interval containing π, no matter how small
that interval might be. We may not know how to find an arbitrary term of this
sequence, but we are asserting its existence. Such a statement tells us that π is a point
on the real number line and that it is located within the interval [3.14159, 3.14160].
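The decimal truncations 3, 3.1, 3.14, ... are themselves rationals, and each one pins π inside an interval of width 10^(−k). The sketch below computes them from Python's floating-point `math.pi`. That value is only a finite approximation of π, so the sketch can supply just the first dozen or so terms. This limitation mirrors the observation above that finitely many digits never locate π exactly. The helper name `decimal_truncations` is hypothetical.

```python
import math
from fractions import Fraction

def decimal_truncations(x: float, places: int) -> list:
    """Return the rationals floor(x * 10**k) / 10**k for k = 0, 1, ..., places."""
    return [Fraction(math.floor(x * 10**k), 10**k) for k in range(places + 1)]
```

Each term q_k satisfies q_k ≤ π < q_k + 10^(−k), so consecutive terms differ by less than 10^(−k).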
There are, of course, many explicit Cauchy sequences that represent π. One of
them is given by
$$\left(4,\; 4-\frac{4}{3},\; 4-\frac{4}{3}+\frac{4}{5},\; 4-\frac{4}{3}+\frac{4}{5}-\frac{4}{7},\; 4-\frac{4}{3}+\frac{4}{5}-\frac{4}{7}+\frac{4}{9},\; \ldots\right).$$
The point of Cantor's construction is not that we can explicitly find a Cauchy sequence
for each real number, but that one exists.
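The partial sums in this sequence are exact rationals. The alternating-series estimate says that the error after N terms is smaller than the next term, 4/(2N + 1). A short sketch with Python's `Fraction` type lets one check both claims. The function name `leibniz_partial_sums` is ours, not the book's.

```python
from fractions import Fraction

def leibniz_partial_sums(n_terms: int) -> list:
    """Exact partial sums 4, 4 - 4/3, 4 - 4/3 + 4/5, ... (n_terms of them)."""
    total = Fraction(0)
    sums = []
    for k in range(n_terms):
        total += Fraction(4 * (-1) ** k, 2 * k + 1)  # k-th term 4(-1)^k / (2k + 1)
        sums.append(total)
    return sums
```

The first three values are 4, 8/3 and 52/15. Every term is rational, even though the limit π is not.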

Completeness
Dedekind based his construction on the assumption that every nonempty bounded
set should have a least upper bound. Cantor based his construction on the
assumption that every Cauchy sequence should converge. The nested interval
principle is another equivalent assumption. These are different but equivalent ways of
making precise what we mean when we describe the real number line as a
continuum. This property of ℝ would eventually come to be known as completeness.

Definition: Completeness
A set of numbers is called complete if it has any of the four equivalent properties:

• Every sequence of closed, nested intervals has a nonempty intersection that
belongs to the set.
• Every nonempty bounded subset has a least upper bound in the set.
• Every Cauchy sequence converges to a point in the set.
• Every infinite bounded subset has a limit point in the set.

In particular, the set of all real numbers is complete. The set of all rational
numbers is not complete. Today, rather than attempting to define the set of real
numbers so as to justify their completeness, it is common to simply assert as
axiomatic that ℝ contains all rational numbers and is complete.
I have already explained Weierstrass's proof that the nested interval principle and
the Bolzano–Weierstrass principle are equivalent. The equivalence of the remaining
statements is left for you in Exercises 3.2.8–3.2.10.
The nineteenth century witnessed an increasing sense of paradox from the
interplay of algebra and geometry on the real number line. It reached a peak in the
1880s. One of the curious phenomena that was discovered and debated in that
decade began with the observation that the rational numbers are denumerable or
countable.
Informally, a set is countable if it is possible to list its elements in order: first,
second, third, .... The rational numbers in [0, 1] can be listed by ordering them by
the size of the denominator (when reduced) and among those of equal denominator
by the numerator:
$$\left(\frac{0}{1}, \frac{1}{1}, \frac{1}{2}, \frac{1}{3}, \frac{2}{3}, \frac{1}{4}, \frac{3}{4}, \frac{1}{5}, \frac{2}{5}, \frac{3}{5}, \frac{4}{5}, \frac{1}{6}, \frac{5}{6}, \frac{1}{7}, \ldots\right). \qquad (3.2)$$
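The listing (3.2) is easy to generate mechanically. The following Python sketch uses the hypothetical generator name `rationals_in_unit_interval`. It skips any fraction not in lowest terms, so no rational is listed twice, and every rational in [0, 1] appears at a definite finite position.

```python
from fractions import Fraction
from itertools import count
from math import gcd

def rationals_in_unit_interval():
    """Yield the rationals in [0, 1] in the order (3.2): by reduced denominator,
    then by numerator."""
    yield Fraction(0, 1)
    yield Fraction(1, 1)
    for q in count(2):
        for p in range(1, q):
            if gcd(p, q) == 1:  # keep only fractions already in lowest terms
                yield Fraction(p, q)
```

Taking the first fourteen values (for instance with `itertools.islice`) reproduces the fourteen fractions displayed in (3.2).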

The fact that the set of rational numbers is countable leads to an important
characterization of open sets in ℝ.

Theorem 3.5 (Characterization of Open Sets). Every open set in ℝ is a countable
union of disjoint open intervals.

Proof. Let U be an open set, and choose any t ∈ U. Let a = inf{x : (x, t] ⊆ U}. The
point a cannot be in U, otherwise there would be a neighborhood of a contained in U
and we could find an x < a for which (x, t] ⊆ U. Similarly, let b = sup{x : [t, x) ⊆ U}.
The point b cannot be in U, but (a, b) ⊆ U. (Note that we might have a = −∞
and/or b = ∞.) We call this interval I(t) = (a, b).
If s and t are two points in U, then either I(s) = I(t) or I(s) ∩ I(t) = ∅, so U
is a union of disjoint open intervals. To see that there are at most countably many
open intervals, we observe that we can find a distinct rational number inside each
interval.

Definition: Countable
A set is countable or denumerable if it is finite or if it is in one-to-one
correspondence with ℕ, the set of positive integers. A set that is countable and not
finite is called countably infinite.
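For a finite union of open intervals, the intervals I(t) in the proof of Theorem 3.5 can be computed directly by sorting and merging. The sketch below is limited to that finite case; the theorem itself covers arbitrary open sets. The function name `components` is hypothetical.

```python
def components(intervals):
    """Merge a finite union of open intervals (lo, hi), lo < hi, into its
    disjoint open components.

    Two open intervals belong to the same component only if they actually
    overlap: (0, 1) and (1, 2) stay separate because 1 lies in neither.
    """
    merged = []
    for lo, hi in sorted(intervals):
        if merged and lo < merged[-1][1]:
            # Overlaps the current component: extend its right end if needed.
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged
```

For example, `components([(0, 1), (0.5, 2), (2, 3), (4, 5), (4.5, 4.7)])` gives `[(0, 2), (2, 3), (4, 5)]`.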

Harnack's Mistake
We take the ordering of the rational points in [0, 1] given in (3.2) and call them
(a_1 = 0, a_2 = 1, a_3 = 1/2, a_4 = 1/3, ...). We choose any positive ε and let I_k be the open
interval of length ε/2^k that is centered at a_k: I_k = (a_k − ε/2^(k+1), a_k + ε/2^(k+1)). Does
the union of these intervals contain all points in [0, 1]? In other words, can we put
the closed interval [0, 1] inside a countable union of open intervals whose lengths
add up to ε? This was a problem first posed by Axel Harnack in 1885. He convinced
himself that the answer is "yes."
Axel Harnack (1851–1888) was the younger twin brother of the German
theologian Adolf von Harnack. Axel earned his doctorate at Erlangen-Nürnberg
University in 1875, working under the direction of Felix Klein. He is best known for his
work in harmonic analysis and the theory of algebraic curves.
In essence, what Harnack did was to ask himself, "What is the complement of a
countable union of intervals?" He believed that it must also be a countable union of
intervals. Think about this. The intervals might be open or closed or
half-open/half-closed, and a closed interval might be a single point. It is certainly true that the
complement of any finite union of intervals is a finite union of intervals. It is not
obvious that the same would not be true for countable unions. But if Harnack was
right, then the complement of ⋃ I_k is a countable union of intervals. The intervals
in the complement must be single points, otherwise they would contain rational
numbers between 0 and 1. We now put each of these countably many points inside
intervals whose lengths add up to ε, and we now have all of [0, 1] contained within
a union of countably many intervals whose lengths add up to 2ε.
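Harnack's question concerns infinitely many intervals, which no computation can settle, but the finite stages are instructive. The sketch below uses the hypothetical helpers `harnack_intervals` and `uncovered_point`. It builds the first n intervals I_k around the rationals listed in (3.2) and then scans from the left for a point of [0, 1] that none of them contains. By Exercise 3.2.3, such a point must exist whenever the total length is below 1.

```python
from fractions import Fraction
from itertools import count, islice
from math import gcd

def rationals_in_unit_interval():
    """Rationals in [0, 1] in the order (3.2): by denominator, then numerator."""
    yield Fraction(0)
    yield Fraction(1)
    for q in count(2):
        for p in range(1, q):
            if gcd(p, q) == 1:
                yield Fraction(p, q)

def harnack_intervals(eps, n):
    """First n intervals I_k = (a_k - eps/2**(k+1), a_k + eps/2**(k+1)), k = 1..n."""
    points = islice(rationals_in_unit_interval(), n)
    return [(a - eps / 2**(k + 1), a + eps / 2**(k + 1))
            for k, a in enumerate(points, start=1)]

def uncovered_point(intervals, a=Fraction(0), b=Fraction(1)):
    """Leftmost point of [a, b] lying in none of the open intervals, or None."""
    x = a  # invariant: [a, x) is covered, and x lies in no interval examined so far
    for lo, hi in sorted(intervals):
        if lo >= x:
            break  # x is not in this interval, nor in any later one (they start further right)
        x = max(x, hi)
    return x if x <= b else None
```

With ε = 1/2 and n = 60, the total length is ε(1 − 2^(−60)) and `uncovered_point` returns a rational in [0, 1] that lies outside all sixty intervals. This shows what happens at every finite stage; the infinite question is the one Borel settled.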
In fact, Harnack's basic premise, that the complement of a countable union of
intervals is a countable union of intervals, is wrong. The complement can be an
uncountable union. Georg Cantor and others understood this. The flaw in Harnack's
reasoning underscores some of the complexity of the real number line as a set of
numbers, and we shall treat it in full detail in the next section. But that does not
prove that his answer was wrong. This was accomplished by Émile Borel in 1895.

Borel's Series
A year after earning his doctorate, Borel published Sur quelques points de la théorie
des fonctions (On some points in the theory of functions), a paper that dealt with
a question from complex analysis. Specifically, he studied analytic continuation
across a boundary on which we have a countable dense set of poles (points for
which the function is unbounded in every neighborhood). We shall focus on a very
special case of the type of function he studied,³ the series
$$\sum_{n=1}^{\infty} \frac{A_n}{x - a_n}, \qquad \text{where } \sum_{n=1}^{\infty} A_n^{1/2} = A < \infty,$$
and the points {a_n} are dense in [0, 1]. For example, we could take {a_n} to be the
set of rational numbers in [0, 1].


Does this series converge at any points in [0, 1]? At first glance, it may appear
that the answer is obviously "no." After all, this function is unbounded in every
interval. But take a closer look. Choose any constant c > 0. If we can find an
x ∈ [0, 1] so that
$$|x - a_n| \ge c A_n^{1/2} \quad \text{for all } n \ge 1,$$
then the series converges,
$$\sum_{n=1}^{\infty} \left| \frac{A_n}{x - a_n} \right| \le \sum_{n=1}^{\infty} \frac{A_n}{c A_n^{1/2}} = \frac{1}{c} \sum_{n=1}^{\infty} A_n^{1/2} = \frac{A}{c}.$$
For any x in
$$E_c = \left\{ x \in [0, 1] : |x - a_n| \ge c A_n^{1/2} \text{ for all } n \ge 1 \right\},$$
our series converges. But are there any points in E_c?
Borel considered the complement of E_c. It is a union of open intervals,
|x − a_n| < cA_n^{1/2}, the nth of which has length 2cA_n^{1/2}. The sum of the lengths of these intervals
is 2cA. If we choose c < 1/(2A), then these intervals have total combined length
less than 1. To prove his result about analytic continuation, Borel needed to know
that there is at least one point in E_c.
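A finite version of Borel's estimate can be checked by machine. As a hypothetical instance, take A_n = 1/n^4, so that A_n^(1/2) = 1/n^2 and A = π²/6. Then c = 3/10 satisfies c < 1/(2A) = 3/π². The sketch below (the helper names are ours) finds a point x ∈ [0, 1] with |x − a_n| ≥ cA_n^(1/2) for the first `n_terms` rationals of (3.2). It then confirms that the partial sum of |A_n/(x − a_n)| stays below A/c. It says nothing about the infinitely many remaining terms.

```python
from fractions import Fraction
from itertools import count, islice
from math import gcd

def rationals_in_unit_interval():
    """Rationals in [0, 1] in the order (3.2): by denominator, then numerator."""
    yield Fraction(0)
    yield Fraction(1)
    for q in count(2):
        for p in range(1, q):
            if gcd(p, q) == 1:
                yield Fraction(p, q)

def uncovered_point(intervals, a=Fraction(0), b=Fraction(1)):
    """Leftmost point of [a, b] lying in none of the open intervals, or None."""
    x = a  # invariant: [a, x) is covered, and x lies in no interval examined so far
    for lo, hi in sorted(intervals):
        if lo >= x:
            break
        x = max(x, hi)
    return x if x <= b else None

def borel_partial_sum(n_terms, c=Fraction(3, 10)):
    """Hypothetical instance A_n = 1/n**4, so A_n**(1/2) = 1/n**2 and A = pi**2/6.

    Returns (x, s): a point x with |x - a_n| >= c/n**2 for n <= n_terms, and
    s = sum of |A_n / (x - a_n)| over those n, or None if no such x exists.
    """
    points = list(islice(rationals_in_unit_interval(), n_terms))
    radii = [c / n**2 for n in range(1, n_terms + 1)]
    x = uncovered_point([(a - r, a + r) for a, r in zip(points, radii)])
    if x is None:
        return None
    s = sum(1 / (n**4 * abs(float(x - a))) for n, a in enumerate(points, start=1))
    return x, s
```

Each term is at most 1/(c n²), so the partial sum is bounded by (1/c) Σ 1/n², which is less than A/c ≈ 5.48.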
Borel proved that if the sum of the lengths of the open intervals is strictly less
than 1, then there must be points, in fact uncountably many points, that are
not in any of these intervals. For c < 1/(2A), the set E_c contains uncountably many
points in [0, 1]. In a note appended to this paper, he remarked that his proof actually
demonstrated a stronger statement, a theorem that today we call the Heine–Borel
theorem. In stating and proving this theorem, Borel was interested only in the case
of a countable collection of open intervals. We give it in its most general form.

³ The interested reader can find a fuller description in Hawkins's Lebesgue's Theory of Integration,
pp. 97–106.

Theorem 3.6 (Heine–Borel Theorem). If {U_k} is any countable or uncountable
collection of open sets whose union contains [a, b], then there is a finite
subcollection, {U_{k_1}, U_{k_2}, ..., U_{k_n}}, whose union also contains [a, b].

This theorem will become one of our most useful tools for proving results about
measure theory and Lebesgue integration. It implies that if we have a collection
of open intervals for which the sum of the lengths is strictly less than 1, then the
union of those intervals cannot contain [0, 1]. If it did, then by Heine–Borel there
would be a finite subcollection that also contained [0, 1], and this is impossible. In
Exercise 3.2.3, you are asked to prove that a finite union of intervals of total length
less than 1 cannot cover [0, 1].
Henri Lebesgue in his 1904 book Leçons sur l'Intégration et la Recherche des
Fonctions Primitives (Lectures on integration and the search for antiderivatives)
gave the following proof, which is valid for any collection of open intervals,
including an uncountable collection.

Proof. Consider the set S of points x ∈ [a, b] for which [a, x] is contained in
a finite union of sets from {U_k}. Since a ∈ S, we know that this set is not
empty. Since the set has an upper bound, it must have a least upper bound, call it
β = sup S. The point β is contained in one of these open sets, say U_1, and since U_1
is open, there is a δ > 0 with (β − δ, β + δ) ⊆ U_1. Choose x ∈ S with x > β − δ. If we
take the finite union of sets that contains [a, x] and add to it the open set U_1, we
have a finite union of open sets that contains [a, β + δ/2] ∩ [a, b]. This contradicts
the assumption that β = sup S unless β = b, and in that case the same finite union
contains [a, b].
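For a finite list of open intervals, the idea behind Lebesgue's proof becomes a greedy procedure. Keep a point p such that [a, p) is already covered. Among the intervals containing p, choose the one that reaches farthest right and use it to push p forward. This Python sketch uses the hypothetical name `finite_subcover`. It handles only finitely many intervals, so it illustrates the mechanism of the proof rather than the theorem, whose force lies in infinite covers.

```python
def finite_subcover(intervals, a, b):
    """Extract a subcover of [a, b] from a finite list of open intervals (lo, hi).

    Keeps a point p such that [a, p) is already covered, then uses the interval
    containing p that reaches farthest right. Returns None if [a, b] is not covered.
    """
    chosen = []
    p = a
    while True:
        candidates = [iv for iv in intervals if iv[0] < p < iv[1]]
        if not candidates:
            return None          # p lies in no interval, so [a, b] is not covered
        best = max(candidates, key=lambda iv: iv[1])
        chosen.append(best)
        if best[1] > b:
            return chosen        # [a, b] lies inside the union of the chosen intervals
        p = best[1]              # [a, p) is covered; p itself still needs an interval
```

Because p strictly increases and each chosen interval reaches past the previous p, no interval is chosen twice, so the loop ends after at most as many steps as there are intervals.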

Compactness
This property of closed, bounded intervals is so important that it has a name,
compactness. The Heine–Borel theorem tells us that any closed, bounded interval,
[a, b], is compact. We are interested in all sets that share this property.

Definition: Cover and compactness
A collection of open sets, C, whose union contains the set S is called an open
cover of S. Given a cover C of S, a subcover is any subcollection of sets from C
whose union also contains S. A set S is said to be compact if every open cover of
S contains a finite subcover.

Corollary 3.7 (Compact ⇔ Closed and Bounded). A set is compact if and
only if it is closed and bounded.

Proof. We leave it as Exercises 3.2.4 and 3.2.5 to show that if a set is not closed or
not bounded, then it is not compact.
We now assume that S is closed and bounded. Since S is bounded, we can find a
closed interval [a, b] that contains S. Since S is closed, its complement S^c is open.
Let C be an open cover of S. If we add the set S^c to this collection, we get a
collection of open sets, call it C′, whose union is all of ℝ. It follows that C′ is an
open cover of [a, b] and so, by the Heine–Borel theorem, it has a finite subcover of
[a, b] ⊇ S. If S^c is in this finite subcover, we remove it to get a finite subcollection
of C that will still cover S.

Proposition 3.8 (Continuous Image of Compact Is Compact). If S is compact
and f is continuous on S, then f(S) is compact.

Proof. We first prove that f(S) is bounded. Let (I_n) be a sequence of open intervals
of length 1 for which f(S) ⊆ ⋃ I_n. Since f is continuous, f⁻¹(I_n) is open, and
(f⁻¹(I_n)) is an open cover of S. Since S is compact, we can find a finite
subcollection (f⁻¹(I_{n_k})) that contains all of S. It follows that ⋃ I_{n_k} contains f(S), and
this is a finite union of intervals of length 1.
We now prove that f(S) is closed. Let y be any accumulation point of f(S)
and choose a sequence of distinct points (y_j) ⊆ f(S) that converges to y. For each j,
choose x_j ∈ S with f(x_j) = y_j. Since (x_j) is contained in the bounded set S, the
Bolzano–Weierstrass theorem promises us an accumulation point, x_0, of this sequence.
It follows that there is a subsequence that converges to this accumulation point,
x_{j_k} → x_0. Since S is closed, x_0 must be in S. By continuity, y_{j_k} = f(x_{j_k})
converges to f(x_0), and therefore y = f(x_0) ∈ f(S).

Two Corollaries
There are two immediate corollaries of the Heine–Borel theorem that are
historically intertwined. They predate Borel's theorem of 1895. The first corollary is the
Bolzano–Weierstrass theorem, Theorem 3.4. Since S is bounded, it is contained in
a closed interval [a, b]. If there are no limit points, then for each point of [a, b] we
can find a neighborhood that contains only finitely many points of S. The collection
of these neighborhoods is an open cover of [a, b], and so there is a finite subcover
of [a, b] ⊇ S. But this finite subcover contains only finitely many points of S,
contradicting the assumption that S is infinite.
Since the Heine–Borel theorem follows from the assumption that every bounded
set has a least upper bound, and it in turn implies the Bolzano–Weierstrass theorem,
we see that the Heine–Borel theorem is yet another equivalent statement of what it
means to say that ℝ is complete.
The second corollary is Theorem 1.13: A continuous function on a closed and
bounded interval is uniformly continuous on that interval. In 1904, Lebesgue
observed that Heine–Borel implies "a pretty demonstration of the uniformity of
continuity." In his words,

Let f be a continuous function at all points of [a, b]. Each point of [a, b] is, by definition,
inside an interval on which the oscillation of f is less than ε. One can cover [a, b] with a
finite number of them. Let l be the length of the shortest interval that is used. In each interval
of length l, the oscillation of f is at most 2ε since such an interval overlaps at most two of the
intervals. The continuity is uniform.⁴

How Heine's Name Got Attached to This Theorem


Theorem 3.6 was discovered by Émile Borel in 1895. Eduard Heine died in 1881.
Why is this result called the Heine–Borel theorem? Pierre Dugac has told the
story⁵ behind this theorem, which he, tongue-in-cheek, refers to as the "Dirichlet–
Heine–Weierstrass–Borel–Schoenflies–Lebesgue theorem." I shall summarize the
high points. The essence of the answer is that Heine became known as the first
person to prove Theorem 1.13, and his proof looks suspiciously like the proof of
Theorem 3.6.
When Cauchy, in 1823, first proved that every continuous function is integrable,
he needed more than ordinary continuity. He needed uniform continuity.⁶ There is
some debate about whether Cauchy meant continuity or uniform continuity when
he talked about continuous functions. He never made a clear distinction. Dirichlet
was the first to recognize this distinction and to realize that over a closed and
bounded interval it did not matter because any continuous function would also be
uniformly continuous.
In 1852, Dirichlet gave a course on integration at the University of Berlin, in
which he gave a proof of Theorem 1.13 that I shall loosely paraphrase. Given a
function f, continuous on [a, b], and given any ε > 0, we need to find a response
δ so that for any pair of values x_1, x_2 ∈ [a, b], |x_1 − x_2| < δ implies that
|f(x_1) − f(x_2)| < ε. Consider the set
$$S_a = \{\, t \in [a, b] : |f(x) - f(a)| < \varepsilon \text{ for all } x \text{ with } a \le x \le t \,\}.$$

⁴ Lebesgue (1904, p. 113, fn). The notation for closed intervals was not standard at that time. Instead of using
[a, b], Lebesgue refers to it as "(a, b) including a and b."
⁵ See Dugac (1989, pp.
⁶ See A Radical Approach to Real Analysis, 2nd ed., pp. 241–243.
Clearly a ∈ S_a, so the set is not empty. Since it is bounded, it has a least upper
bound. Set c_1 = sup S_a. Dirichlet observes that if c_1 ≠ b, then |f(a) − f(c_1)| = ε
(see Exercise 3.2.6). We now define
$$c_2 = \sup\{\, t \in [c_1, b] : |f(x) - f(c_1)| < \varepsilon \text{ for all } x \text{ with } c_1 \le x \le t \,\}.$$
We continue in this way, obtaining a sequence of values a < c_1 < c_2 < c_3 < ⋯ ≤ b
with the property that |f(c_{k+1}) − f(c_k)| = ε and if c_k ≤ x < c_{k+1}, then
|f(x) − f(c_k)| < ε. If there are only a finite number of values of c_k before we get
to b, then we have uniform continuity (see Exercise 3.2.7). We only need to rule
out the possibility that (a < c_1 < c_2 < ⋯) is an infinite sequence, converging to
some c ≤ b.
If there is such a limit c, we know that f is continuous at c, and therefore we
can find a response δ so that c − δ < x < c + δ implies that |f(x) − f(c)| < ε/2.
Since the sequence (c_k) converges to c, we can find two consecutive elements of
the sequence that lie inside this δ-neighborhood, say c_j and c_{j+1}. But then
$$\varepsilon = |f(c_j) - f(c_{j+1})| \le |f(c_j) - f(c)| + |f(c) - f(c_{j+1})| < \varepsilon.$$
This concludes Dirichlet's proof.
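Dirichlet's points c_1 < c_2 < ⋯ can be approximated numerically. The true c_{k+1} is a supremum, which a computer cannot evaluate exactly. This sketch (hypothetical name `dirichlet_points`, with an arbitrarily chosen grid step `h`) therefore marches right from c_k in small steps until |f(x) − f(c_k)| first reaches ε on the grid. For f(x) = x² on [0, 1] with ε = 0.1, it produces points close to √(0.1k).

```python
def dirichlet_points(f, a, b, eps, h=1e-5):
    """Approximate Dirichlet's points a = c_0 < c_1 < ... < b by marching on a grid.

    From the current point c, step right in increments of h until |f(x) - f(c)|
    reaches eps on the grid; that x becomes the next point. The grid step h is
    an assumption of this sketch: the true c_{k+1} is a supremum, which a grid
    can only approximate.
    """
    points = [a]
    c = x = a
    while True:
        while x < b and abs(f(x) - f(c)) < eps:
            x += h
        if x >= b:
            points.append(b)   # reached the right endpoint after finitely many points
            return points
        c = x                  # here |f(c) - f(previous point)| >= eps
        points.append(c)
```

Between consecutive returned points, the grid values of f stay within ε of f at the left endpoint. This is the finite chain that Exercise 3.2.7 asks you to turn into a δ.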

Heine had been one of Dirichlet's students in Berlin. In 1872 he published an
important paper that solidified many of the key concepts of analysis, Die Elemente
der Functionenlehre (Elements of function theory). Among these was uniform
continuity. Heine stated Theorem 1.13 and proved it using precisely Dirichlet's
argument. Heine did not credit Dirichlet. The notes from Dirichlet's 1852 lectures
would be published in 1904, but Heine's paper marked the first time that this
argument appeared in print.
In succeeding years, specific cases of the Heine–Borel theorem reappeared in
various contexts. In 1880, Weierstrass proved that if for each x ∈ [a, b] a series
converges uniformly in some neighborhood of x, then the series converges
uniformly over [a, b]. In 1882, Salvatore Pincherle proved that if for each x ∈ [a, b]
there is a neighborhood in which f stays bounded, then f must be bounded over
[a, b].
Arthur Schönflies, in 1900, was the first to point out that the Heine–Borel theorem
applied equally well to covers consisting of an uncountable number of open sets.
He was also the first to connect Heine's name to this result, describing it as an
extension by Borel of a theorem of Heine and designating it as the Heine–Borel
theorem. This name gained popularity when it was picked up by William Henry
Young in his 1902 paper, Overlapping Intervals.
Lebesgue was particularly incensed that Heine's name had been attached to
this theorem, and he campaigned for the designation Borel–Schönflies. Both Paul
Montel and Giuseppe Vitali questioned whether Schönflies had any special claim
to this theorem. They referred to it as the Borel–Lebesgue theorem, acknowledging
Lebesgue's priority in proving the general case. Borel himself would come to call
it the "first fundamental theorem of measure theory," a name with a great deal of
merit that, unfortunately, has not stuck.
In 1904, Oswald Veblen pointed out that Theorem 3.4, the Bolzano–Weierstrass
theorem, follows from the Heine–Borel theorem and, in fact, is equivalent to it.
The key to proving Heine–Borel is that every bounded set must have a least upper
bound, but this is precisely the property of completeness and thus equivalent to the
Bolzano–Weierstrass theorem.

Exercises
3.2.1. Show that if every open interval contains at least one point of S, then every
open interval contains infinitely many points of S. Show that no matter how many
points of S we have found, if it is a finite number, then we can always find one
more.

3.2.2. Every point in ℝ can be represented with an infinite decimal expansion.
Thus, 3 can be written as 3.000000.... Explain why 3 can also be written as
2.9999999.... Which points of ℝ have more than one infinite decimal expansion?
Are there any points of ℝ that have more than two infinite decimal expansions?
3.2.3. Prove that the interval [0, 1] cannot be contained in a finite union of open
intervals whose lengths sum to less than 1.
3.2.4. Show that if a set S is not bounded, then we can find an infinite collection of
open sets whose union contains S and such that no finite subcollection will contain
S.
3.2.5. Show that if a set S is not closed, then we can find an infinite collection of
open sets whose union contains S and such that no finite subcollection will contain
S. Use the fact that S is not closed if and only if there is a limit point of S that is
not in S.
3.2.6. Prove that if f is continuous on [a, b] and
s = sup{t ≤ b : |f(x) − f(a)| < ε for all x with a ≤ x ≤ t} and s ≠ b, then |f(a) − f(s)| = ε.
3.2.7. Clean up Dirichlet's proof of Theorem 1.13. Assume that for every ε > 0,
there is a finite sequence of values, say a = c_0 < c_1 < ⋯ < c_n = b, such
that if c_k ≤ x < c_{k+1} then |f(x) − f(c_k)| < ε. Explain how to use this to find a δ
response to any challenge of an ε > 0.
3.2.8. Prove that the Bolzano–Weierstrass principle, that every infinite bounded
set has a limit point, implies that every Cauchy sequence converges.
3.2.9. Prove that if every Cauchy sequence converges, then every nonempty bounded set has
a least upper bound.
3.2.10. Prove that if every nonempty bounded set has a least upper bound, then every sequence
of closed, nested intervals has a nonempty intersection.
3.2.11. In 1880, Weierstrass published a proof that if a series of functions converges
uniformly in some neighborhood of each x in the closed and bounded interval
[a, b], then the series converges uniformly over the entire interval. An outline
of the proof, translated into modern terminology, is given below. Justify each
of these statements. Identify where and how Weierstrass used the completeness
of ℝ.

1. For each x ∈ [a, b], define
$$R(x) = \sup\{\, \varepsilon > 0 : \text{convergence is uniform over } N_\varepsilon(x) \,\}.$$
If x, y ∈ [a, b] and D = |x − y|, then R(x) ≤ D + R(y) and R(y) ≤ D + R(x); therefore,
$$R(x) - D \le R(y) \le R(x) + D.$$
2. For x ∈ [a, b], R is a continuous function.
3. The minimum value of R over [a, b] is strictly positive.
4. Choose an integer n so that (b − a)/n is strictly less than the minimum value
of R. Convergence is uniform over each of the intervals
$$\left[ a + (j-1)\frac{b-a}{n},\; a + j\,\frac{b-a}{n} \right], \qquad 1 \le j \le n.$$
5. Convergence is uniform over [a, b].


3.2.12. Let be a sequence of positive numbers. Show that there exists a
positive sequence such that
00 00

and only if >n=1 converges.


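3.2.13. Consider the series Σ A_n/(x − a_n), where the a_n are the rational numbers
in (0, 1) with denominators that are powers of 2:
$$a_1 = \frac{1}{2},\ a_2 = \frac{1}{4},\ a_3 = \frac{3}{4},\ a_4 = \frac{1}{8},\ a_5 = \frac{3}{8},\ a_6 = \frac{5}{8},\ a_7 = \frac{7}{8},\ a_8 = \frac{1}{16},\ \ldots$$
For each a_n = b/2^m, b odd, define A_n to be 2^(−4m). Show that
$$\sum_{n=1}^{\infty} A_n^{1/2} = \sum_{m=1}^{\infty} 2^{m-1} \cdot 2^{-2m}.$$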
3.3 Set Theory 71

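Find a value strictly between 0 and 1 that is not in
$$\bigcup_{n=1}^{\infty} \left( a_n - A_n^{1/2},\, a_n + A_n^{1/2} \right) = \bigcup_{m=1}^{\infty} \bigcup_{k=1}^{2^{m-1}} \left( \frac{2k-1}{2^m} - \frac{1}{4^m},\, \frac{2k-1}{2^m} + \frac{1}{4^m} \right).$$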
3.2.14. Consider the bounded infinite set of numbers of the form

3m
MeN.
m=1

Show that 1/3 is not an accumulation point. Find three accumulation points of
this set.

3.3 Set Theory


The first mathematician to really exploit the strangeness of the algebraic overlay
of the real number line was Dirichlet when he exhibited the nowhere continuous
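function of Example 1.1,
$$f(x) = \begin{cases} 1, & \text{if } x \text{ is rational}, \\ 0, & \text{if } x \text{ is irrational}. \end{cases}$$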
What is significant about this function is that it treats each point of the real
continuum as a discrete object that can be examined, tested, and categorized. Although
earlier mathematicians had defined functions in terms of their action at each real
value, no one before Dirichlet had pushed that notion of function to its logical
conclusion. Once this door was opened, truly strange functions began to emerge.
The key to the construction of pathological functions was an understanding of the
variety of possible subsets of R. Following Cantor's 1872 paper on trigonometric
series, many mathematicians assumed that any infinite subset must be either dense
in some interval or one of Cantor's sets of first species. If true, this would have
implied a simplicity in the structure of R that would have made the development
of analysis much easier. Cantor was less certain that this was the whole story.
Over the period 1872–1874, Cantor devoted his attention to the structure of
subsets of R. The first paper that emerged from this study marked the birth of
set theory. It made precise Cantor's realization that the algebraic numbers, those
arising from the algebraic overlay, constitute not just a minority but a negligible
wisp among the points of the real continuum. Georg Cantor had begun the study
of transfinite cardinals.

Cardinality
To lay the foundations for an explanation of Cantor's work, it will be helpful to
borrow concepts and language that would not come into being until much later,
specifically the concept of cardinality. We shall not attempt a formal definition
of cardinality. That would take us too far afield. Informally, it is a property that
every set possesses and that, in some sense, measures the size of the set. What is
important for our purposes is that two sets have the same cardinality if there is a
one-to-one and onto correspondence between them.
Thus, the sets {1, 2, 3, 4, 5}, {a, b, c, d, e}, and {2, 7, 10, 15, 38} share the same
cardinality, which is denoted by 5. Cardinality is most interesting when applied
to infinite sets. Finite sets are not good examples from which to build an intuitive
understanding of cardinality. The "smallest" infinite set is N, the positive integers,
{1, 2, 3, . . .}. Proper containment need not change cardinality. The sets {2, 3, 4, . . .}
and {2, 4, 6, 8, . . .} have the same cardinality as N. In the first case, the one-to-one
and onto mapping is $n \mapsto n + 1$; in the second it is $n \mapsto 2n$. Even if we
expand to Z, the set of all integers, we still have the same cardinality because
it is possible to list the integers so that we establish a correspondence with N:
{0, 1, −1, 2, −2, 3, −3, . . .}. The cardinality of these sets is designated as $\aleph_0$, read
aleph null. The countable sets are precisely those with cardinality that is finite or $\aleph_0$.
A word of warning about establishing that a set S is countable: The key is that we
must be able to list the elements of S so that if you give me any positive integer, I can
find the unique element of S to which it corresponds, and if I give you any element
of S, you can find the unique positive integer to which it corresponds. Thus, listing
Z as (. . . , −3, −2, −1, 0, 1, 2, 3, . . .) or as (0, 1, 2, 3, . . . , −1, −2, −3, . . .) does
not establish this one-to-one and onto relationship. In both there is no well-defined
positive integer to which −1 corresponds.
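The warning can be made concrete. The listing (0, 1, −1, 2, −2, . . .) is an explicit rule in both directions, from a position to an integer and from an integer back to its position. A minimal Python sketch follows; the function names are our own, chosen for illustration.

```python
def position_to_integer(n: int) -> int:
    """Integer in position n (n = 1, 2, 3, ...) of the list 0, 1, -1, 2, -2, ..."""
    if n < 1:
        raise ValueError("positions start at 1")
    # Even positions hold 1, 2, 3, ...; odd positions hold 0, -1, -2, ...
    return n // 2 if n % 2 == 0 else -(n // 2)


def integer_to_position(z: int) -> int:
    """The unique position at which the integer z appears in the list."""
    return 2 * z if z > 0 else 1 - 2 * z


print([position_to_integer(n) for n in range(1, 8)])  # [0, 1, -1, 2, -2, 3, -3]
```

For instance, position 7 holds −3, and `integer_to_position(-3)` returns 7. Each direction is a finite computation, which is exactly what the two-sided listing above fails to provide.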
If we consider Q, the set of all rational numbers, we still have not changed
the cardinality. To keep life a little simpler, we first list just the positive rational
numbers. We cannot list all the positive integers, then all the halves, then the thirds,
and so on. There must be a finite integer that corresponds to 3/2. We consider
rational numbers only in reduced form (we write integers with denominator 1), and
we order them by the sum of the numerator and denominator. For those with the
same sum of numerator and denominator, we order them by the numerator:
$$\frac{1}{1},\ \frac{1}{2},\ \frac{2}{1},\ \frac{1}{3},\ \frac{3}{1},\ \frac{1}{4},\ \frac{2}{3},\ \frac{3}{2},\ \frac{4}{1},\ \frac{1}{5},\ \ldots$$
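The ordering by numerator plus denominator is easy to mechanize. The sketch below is in Python; the generator and helper names are our own. It skips non-reduced fractions such as 2/2 and reproduces the list above. The helper `position_of` answers the other half of the warning by walking the list until it reaches a given rational, which it always does in finitely many steps.

```python
from fractions import Fraction
from itertools import count, islice
from math import gcd


def positive_rationals():
    """Each positive rational once, reduced, ordered by numerator + denominator, then numerator."""
    for total in count(2):          # total = numerator + denominator
        for p in range(1, total):   # increasing numerator
            q = total - p
            if gcd(p, q) == 1:      # skip non-reduced forms such as 2/2
                yield Fraction(p, q)


def position_of(r: Fraction) -> int:
    """Position (starting at 1) of the positive rational r in the list."""
    for n, s in enumerate(positive_rationals(), start=1):
        if s == r:
            return n


first_ten = list(islice(positive_rationals(), 10))
print([str(r) for r in first_ten])
# ['1', '1/2', '2', '1/3', '3', '1/4', '2/3', '3/2', '4', '1/5']
print(position_of(Fraction(3, 2)))  # 8
```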
A similar trick can be used to show that any countable union of countable sets is
again countable. For each $m \in \mathbb{N}$, let $(a_{m,1}, a_{m,2}, a_{m,3}, \ldots)$ be a countable sequence
of values. We can order the union of all of these sequences, ordering by the sum of
the subscripts and by the first subscript when the sums are equal (see Figure 3.1).
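The same diagonal ordering can be written as a formula. Assuming subscripts start at 1, the pairs whose subscripts sum to less than s number (s − 2)(s − 1)/2, so the position of $a_{m,k}$ can be computed directly, and the position can be inverted. The function names below are illustrative.

```python
def pair_to_position(m: int, k: int) -> int:
    """Position of a_{m,k} when pairs are ordered by m + k, then by m."""
    if m < 1 or k < 1:
        raise ValueError("subscripts start at 1")
    s = m + k
    earlier = (s - 2) * (s - 1) // 2  # pairs whose subscripts sum to less than s
    return earlier + m


def position_to_pair(n: int) -> tuple[int, int]:
    """Inverse of pair_to_position."""
    if n < 1:
        raise ValueError("positions start at 1")
    s = 2
    while (s - 1) * s // 2 < n:  # there are (s - 1)s/2 pairs with sum at most s
        s += 1
    m = n - (s - 2) * (s - 1) // 2
    return m, s - m


print([position_to_pair(n) for n in range(1, 7)])
# [(1, 1), (1, 2), (2, 1), (1, 3), (2, 2), (3, 1)]
```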
Once we have ordered the positive rational numbers,

a1 = 1, a2 = 1/2, a3 = 2, a4 = 1/3

rational numbers are algebraic, and so the cardinality of the algebraic numbers can
be neither more nor less than $\aleph_0$. The set of algebraic numbers has cardinality $\aleph_0$.

Theorem 3.9. The cardinality of R is not $\aleph_0$.

The proof presented here is a recasting into modern language of Cantor's original
1874 proof, interesting because of its use of the nested interval principle. The more
commonly known proof is based on an argument made by Cantor in 1891. You will
see it later in this section. The 1874 proof is important not just because it was first.
It also shows that if a set satisfies the nested interval principle and between any
two distinct elements there always lies a third, then the set cannot be countable.
Proof. We need to prove that we cannot order the real numbers as $(r_1, r_2, r_3, \ldots)$, so
we assume that we can and look for an absurdity that arises from this assumption.
We pick any closed interval on the real line, say [0, 1], and find the first two real
numbers in our list that are inside this interval. Since all real numbers are in our
list, there are at least two inside [0, 1]. Call them $a_0 < b_0$. All of the numbers in
the open interval $(a_0, b_0)$ are also inside [0, 1], so we have not yet encountered any
of them. We continue down the list until we find the first two real numbers in our
list that are inside $(a_0, b_0)$; call them $a_1 < b_1$. We have not yet encountered any
numbers from the open interval $(a_1, b_1)$, so we continue until we find the first two
in this interval.
We are generating a nested sequence of closed intervals,
$$[0, 1] \supseteq [a_0, b_0] \supset [a_1, b_1] \supset [a_2, b_2] \supset \cdots,$$
with $0 \le a_0 < a_1 < a_2 < \cdots$ and $1 \ge b_0 > b_1 > b_2 > \cdots$. By the nested interval
principle, there is at least one real number contained in all of these intervals; call it
$r_m$. By the strict inclusion of these intervals, $r_m$ is not equal to any of the endpoints.
Now we have a problem because $a_n$ and $b_n$ are preceded by at least 2n elements
from our sequence $(r_1, r_2, r_3, \ldots)$. This means that we can find an n for which
$a_n$ and $b_n$ come after $r_m$ in our sequence. This contradicts the fact that $a_n$ and $b_n$ are the
first two real numbers in the list that lie within the open interval $(a_{n-1}, b_{n-1})$, an interval that also contains $r_m$. □
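To see the construction in the proof at work, we can run it on one particular countable list. The sketch below uses, as an assumption of ours, the rationals in [0, 1] listed by denominator (0, 1, 1/2, 1/3, 2/3, 1/4, . . .). It is an illustration, not part of Cantor's argument. Because every rational appears in this list, the proof guarantees that the point common to all the intervals is irrational.

```python
from fractions import Fraction
from itertools import count
from math import gcd


def rationals_in_unit_interval():
    """Illustrative list (our choice): every rational in [0, 1] once,
    ordered by denominator, then numerator: 0, 1, 1/2, 1/3, 2/3, 1/4, ..."""
    yield Fraction(0)
    yield Fraction(1)
    for q in count(2):
        for p in range(1, q):
            if gcd(p, q) == 1:
                yield Fraction(p, q)


def nested_intervals(sequence, steps):
    """Endpoints (a_n, b_n), n = 0, ..., steps - 1, chosen as in the proof:
    keep reading the list (never restarting) until two terms lie in the current
    interval, first the closed [0, 1], afterwards the open (a_{n-1}, b_{n-1})."""
    terms = iter(sequence)
    lo, hi, closed = Fraction(0), Fraction(1), True
    intervals = []
    for _ in range(steps):
        found = []
        while len(found) < 2:
            r = next(terms)
            inside = (lo <= r <= hi) if closed else (lo < r < hi)
            if inside:
                found.append(r)
        lo, hi, closed = min(found), max(found), False
        intervals.append((lo, hi))
    return intervals


for a, b in nested_intervals(rationals_in_unit_interval(), 5):
    print(a, b)
```

Run as written, it prints the pairs 0 1, 1/3 1/2, 2/5 3/7, 7/17 5/12, and 12/29 17/41, so the intervals shrink toward a point near 0.414.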
Theorem 3.9 says that the continuum, the set of points in R, has a cardinality
strictly larger⁸ than $\aleph_0$, a cardinality that is denoted by the letter c. Are there any
subsets of [0, 1] whose cardinality is larger than $\aleph_0$ and less than c? Cantor believed
that it was not possible, a belief known as the continuum hypothesis.

8
It is not clear that cardinalities can always be ordered. A discussion of what it means for one cardinal number
to be larger than another can be found in Section 5.4.

The Continuum Hypothesis


The cardinality of R is denoted by c (for the continuum). One might well ask, why
not $\aleph_1$, as the next cardinality after $\aleph_0$? The problem is that we do not know that
c is the next cardinality after $\aleph_0$. Cantor believed that it is, but he could not prove
it. The statement that $c = \aleph_1$ is known as the continuum hypothesis. For much of
the twentieth century, settling the continuum hypothesis remained one of the great
unsolved challenges of mathematics.
The solution, achieved through work of Kurt Gödel and Paul Cohen, gives
us a surprising insight into the real numbers: We get to choose whether or not
the continuum hypothesis is true for the real number line. In other words, the
assumptions that have been made about the structure of the real number line are
consistent with having a subset whose cardinality lies between $\aleph_0$ and c, and they
are also consistent with not having such subsets. The real number line is not as
definitively determined as we might have thought.
Kurt Gödel (1906—1978) is perhaps best known for his incompleteness theorem,
that any system of mathematical axioms that is sufficiently complex to include
arithmetic will have propositions that can be neither proven nor disproven within
that system. Gödel taught at the University of Vienna. After the outbreak of the
Second World War, fearing that he might be conscripted into the German army,
Gödel left for Princeton where he eventually became a member of the Institute for
Advanced Study. Einstein was one of his closest friends.
Paul Cohen (1934—2007) studied at Brooklyn College and the University of
Chicago, writing his doctoral dissertation on Topics in the Theory of Uniqueness
of Trigonometric Series. For most of his career, he taught at Stanford University.
In 1966, he received the Fields Medal⁹ for his work on the foundations of set
theory.

9
Until 2002, the Fields Medal was the highest honor any mathematician could win. Awarded only every four
years but to up to four people, it is restricted to mathematicians under the age of 40. In 2002, the Abel Prize was
created by the Norwegian government to honor Niels Henrik Abel. Similar to the Nobel Prize, it is awarded
each year.

The continuum hypothesis is not the only property of the real number line over
which we have a choice. In the early twentieth century, mathematicians recognized
the importance of a certain subset S of [0, 1] with the property that every real
number is a rational distance away from exactly one element in S. Initially, it was
assumed that such a set must exist. We can construct it first by picking one rational
number. We then take the irrational numbers and divide them into classes. Two
irrationals are in the same class if they are a rational distance apart. We select
one irrational from each class. Surely, this gives us our set. A few mathematicians
quibbled that this means making an uncountable number of selections, and it is not
clear how that can be done.
The ability to choose one element from each equivalence class and so define
this subset is a consequence of what came to be known as the axiom of choice.
It appears in many proofs, but it also has surprising and disturbing consequences
such as the Banach—Tarski paradox. As it turns out, the truth of the existence of
this set is also a matter of choice, a fact also proven by Gödel and Cohen. This
axiom will come to play an important role in Section 5.4, where we shall explore
it in greater detail and say something about the Banach—Tarski paradox.
The problem arises from thinking of R as a set of values that in some sense are
equivalent to algebraic values. The status of the continuum hypothesis shows just
how strange it really is to impose the concept of sets of numbers onto the points
of a continuous line. This does not mean that we should not think of R as a set of
numbers. That is an extremely useful construction that will lie at the heart of our
eventual solution of all of the problems regarding integration and the fundamental
theorem of calculus. But it is a reminder that we must tread very carefully. We are
now in a realm where intuition can no longer be trusted.

Power Sets
If A and B are sets, we use $A^B$ to denote the set of all mappings from B to A. The
reason for this notation is best explained through an example. Let
A = {a, b, c}, B = {1, 2, 3, 4, 5}.
A mapping from B to A assigns one of the letters from A to each of the numbers in
B. There are three possible images of 1 (we can have 1 ↦ a or 1 ↦ b or 1 ↦ c),
three possible images of 2 (no reason that we cannot use the same image more than
once), and so on for a total of $3^5 = 243$ possible mappings. We see that for finite sets, the
cardinality of $A^B$ is the cardinality of A raised to the cardinality of B.
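For finite sets the count can be checked by brute force. This Python sketch (the variable names are ours) lists every mapping from B = {1, 2, 3, 4, 5} to A = {a, b, c} as a dictionary. A mapping is one choice of image for each element of B, which is a tuple drawn from A with repetition.

```python
from itertools import product

A = ["a", "b", "c"]
B = [1, 2, 3, 4, 5]

# One mapping per tuple of len(B) images chosen (with repetition) from A.
mappings = [dict(zip(B, images)) for images in product(A, repeat=len(B))]

print(len(mappings))  # 243, which is 3 ** 5
print(mappings[0])    # {1: 'a', 2: 'a', 3: 'a', 4: 'a', 5: 'a'}
```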
We now extend this idea to infinite cardinalities. In general $A^B$ means the
cardinality of the set of mappings from a set with cardinality B to a set with
cardinality A.
For example, $\aleph_0^2$ denotes the cardinality of the set of mappings from {1, 2} to N.
Each mapping is uniquely determined by a pair of positive integers $(i, j)$, $i, j \in \mathbb{N}$.
The first coordinate is the image of 1; the second coordinate is the image of
2. We have seen (see Figure 3.1) that the cardinality of such pairs is again $\aleph_0$, and
therefore
$$\aleph_0^2 = \aleph_0.$$
It is easy to extend this to any finite positive integer n: $\aleph_0^n = \aleph_0$.

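What about $2^{\aleph_0}$? This is the cardinality of the set of mappings from N to {0, 1}. We
are looking at infinite sequences of 0s and 1s, such as 1010010001000010. . . .
There is a natural correspondence between such sequences and the set of real
numbers between 0 and 1. We just put a binary point in front of the sequence and
read this sequence in base 2:
$$0.1010010001\ldots_2 = \frac{1}{2} + \frac{0}{2^2} + \frac{1}{2^3} + \frac{0}{2^4} + \frac{0}{2^5} + \frac{1}{2^6} + \frac{0}{2^7} + \frac{0}{2^8} + \frac{0}{2^9} + \frac{1}{2^{10}} + \cdots.$$
We do have times when two different sequences represent the same real number,
for example,
$$0.00100111\ldots_2 = 0.00101000\ldots_2,$$
where the first sequence continues with 1s forever and the second with 0s forever,
but there are only countably many of these duplications. Exercises 3.3.5–3.3.7
establish that $2^{\aleph_0}$ is the cardinality of R,
$$2^{\aleph_0} = c. \qquad (3.3)$$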
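The duplicated representations can be checked with exact arithmetic. The helper below (a hypothetical name of ours) evaluates a binary expansion given as a finite prefix followed by a block that repeats forever. It sums the repeating part as a geometric series.

```python
from fractions import Fraction


def binary_value(prefix: str, repeating: str) -> Fraction:
    """Exact value of 0.<prefix><repeating><repeating>..._2.
    Both arguments are strings of 0s and 1s; `repeating` must be nonempty."""
    head = sum((Fraction(int(d), 2 ** (i + 1)) for i, d in enumerate(prefix)), Fraction(0))
    period = len(repeating)
    block = sum((Fraction(int(d), 2 ** (i + 1)) for i, d in enumerate(repeating)), Fraction(0))
    # The block starts after len(prefix) digits and repeats with ratio 2**(-period).
    tail = block / 2 ** len(prefix) / (1 - Fraction(1, 2 ** period))
    return head + tail


print(binary_value("00100", "1"))  # 0.00100111..._2
print(binary_value("00101", "0"))  # 0.00101000..._2
```

Both calls print 5/32, since each expansion equals 1/8 + 1/32.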

The power set of S is the collection of all subsets of S. It is easy to establish a
correspondence between the power set and the set of all mappings from S to {0, 1}.
For each mapping, a ↦ 1 if and only if a is in the subset. The mapping from N
to {0, 1} given by 1010010001000010. . . corresponds to the subset that contains
{1, 3, 6, 10, . . .}. For this reason, the power set of any set S is usually denoted
by $2^S$.
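For a finite set, the correspondence between subsets and mappings to {0, 1} can be spelled out directly. The sketch below (function names are ours) converts in both directions and counts the subsets of a three-element set. It also reads off the subset that corresponds to the first sixteen terms of the sequence above.

```python
from itertools import product


def subset_to_mapping(subset, S):
    """Indicator mapping: a -> 1 if a is in the subset, else 0."""
    return {a: (1 if a in subset else 0) for a in S}


def mapping_to_subset(mapping):
    """Recover the subset from its 0/1 mapping."""
    return frozenset(a for a, bit in mapping.items() if bit == 1)


S = ["x", "y", "z"]
all_mappings = [dict(zip(S, bits)) for bits in product([0, 1], repeat=len(S))]
power_set = {mapping_to_subset(m) for m in all_mappings}
print(len(power_set))  # 8, which is 2 ** 3

prefix = "1010010001000010"
print(sorted(i + 1 for i, bit in enumerate(prefix) if bit == "1"))  # [1, 3, 6, 10, 15]
```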
It was in 1891, in an address to the first congress of the German Mathematical
Association, that Cantor stated and proved what is now known as Cantor's theorem,
that the cardinality of S can never equal the cardinality of its power set.

Theorem 3.10 (Cantor's Theorem). For any set S, the cardinality of S is not the
same as the cardinality of the power set of S.

Proof. We assume that S and $2^S$ have the same cardinality and look for a contradiction.
If the cardinalities were the same, then we would have a one-to-one and
onto mapping from S to the collection of subsets of S, $\psi: S \to 2^S$. We construct a
subset $T \subseteq S$ according to the following rule: $a \in T$ if and only if $a \notin \psi(a)$. Since
T is a subset of S, it is an element of $2^S$. Since ψ is a one-to-one and onto mapping,
we can find an element $b \in S$ for which $\psi(b) = T$. Is b in T? If b is in $T = \psi(b)$,
then $b \in \psi(b)$, so b is not an element of T. That is a contradiction, so b cannot be
in T. But if b is not in $T = \psi(b)$, then $b \notin \psi(b)$, so b is in T. Having assumed a
one-to-one and onto mapping, we are led to a contradiction whether or not b is in
T. The mapping cannot exist. □
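Cantor's argument does not depend on S being infinite, so it can be tested exhaustively on a small set. The sketch below (the names are ours) tries every mapping from a three-element set to its eight subsets, not only the one-to-one ones. It confirms that the set T built in the proof is never an image, so no such mapping is onto.

```python
from itertools import combinations, product


def power_set(S):
    """All subsets of S, as frozensets."""
    return [frozenset(c) for r in range(len(S) + 1) for c in combinations(S, r)]


def diagonal_set(psi, S):
    """T = {a in S : a not in psi(a)}, the set built in the proof."""
    return frozenset(a for a in S if a not in psi[a])


S = [0, 1, 2]
subsets = power_set(S)  # 8 subsets
checked = 0
# Every mapping psi: S -> subsets of S (8 ** 3 = 512 of them).
for images in product(subsets, repeat=len(S)):
    psi = dict(zip(S, images))
    T = diagonal_set(psi, S)
    assert T not in psi.values()  # T is never an image of psi
    checked += 1
print(checked)  # 512
```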

Note that, since R has the cardinality of the power set of N, this theorem implies
that R is not countable. In fact, a variation on this argument has become the
common proof that R is not countable. Assume that the set of real numbers in
[0, 1] is countable, and write down their decimal expansions in order:
$$\begin{array}{l}
0.a_1 a_2 a_3 a_4 a_5 \ldots \\
0.b_1 b_2 b_3 b_4 b_5 \ldots \\
0.c_1 c_2 c_3 c_4 c_5 \ldots \\
0.d_1 d_2 d_3 d_4 d_5 \ldots \\
\qquad \vdots
\end{array}$$
Now choose any decimal whose digits do not include 0 or 9 (Exercise 3.3.9 asks
you to explain why we avoid those digits) and whose first digit is not $a_1$, whose
second digit is not $b_2$, whose third digit is not $c_3$, whose fourth digit is not $d_4$,
and so on. This number does not appear in the list, and so we were wrong when
we claimed that we could list them in order. We see now that c is not the largest
possible cardinality; $2^c$ is bigger; $2^{2^c}$ is even bigger than that.
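The diagonal step can be imitated on a finite list, with each expansion truncated to a few digits. The sketch below is an illustration under that simplification. Digit strings stand in for the infinite expansions, and the replacement digits 5 and 4 are one arbitrary choice that avoids 0 and 9.

```python
def diagonal_escape(expansions):
    """Digits of a number 0.e1e2e3... whose nth digit differs from the nth digit
    of the nth listed expansion (given as digit strings after the decimal point).
    Only the digits 4 and 5 are used, so 0 and 9 never appear."""
    digits = []
    for n, expansion in enumerate(expansions):
        diagonal_digit = expansion[n]
        digits.append("5" if diagonal_digit != "5" else "4")
    return "0." + "".join(digits)


listed = ["3333333", "1234567", "5555555", "0000000", "9999999"]
new = diagonal_escape(listed)
print(new)  # 0.55455
```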
We can go even further, taking the union of countably many sets of which
the first set has cardinality c and the nth set is the power set of the (n − 1)st set,
so that their cardinalities are $c, 2^c, 2^{2^c}, \ldots$; call this union $T_1$. This union has cardinality strictly larger than
any of the cardinalities in the sequence. We can then take the power set of this union,
$T_2 = 2^{T_1}$. We can now restart with this power set, continue through countably many
power sets, and take the union of these sets: $U_1 = \bigcup_{n=1}^{\infty} T_n$. This is only the second
iteration of this process. We can do it countably many times: $V_1 = \bigcup_{n=1}^{\infty} U_n$, $W_1 = \bigcup_{n=1}^{\infty} V_n$, and so on.
We have still only described countably many cardinalities. There are
sets of cardinalities that are themselves uncountable, even sets of cardinalities that
have cardinality In fact, for every cardinality b, there is a set of cardinalities
that itself has cardinality b.

Exercises
3.3.1. Show that if $(b_1, b_2, \ldots)$ is any sequence, if $x < b_n$ for every n, and if $y \ge b_n$
for some n, then $x < y$.
3.3.2. Explain how to establish a one-to-one correspondence between N and the
set of all rational numbers.
3.3.3. Explain how to establish a one-to-one correspondence between R and the
set of all irrational numbers.
3.3.4. Describe a one-to-one and onto mapping between each of the following pairs of
sets:

1. N and the set of integer multiples of 3
2. R and the set of real numbers in [0, 1]
3. Q and the set of rational numbers in [0, 1]
4. N and the set of all pairs of rational numbers in $\mathbb{R}^2$
5. R and the set of all pairs of numbers (a, b) ∈

3.3.5. Prove that the set of real numbers in [0, 1] that have more than one representation
in base 2 is a countable set.
3.3.6. Prove that $c + \aleph_0 = c$. That is to say, find a one-to-one mapping from R to
R for which the image omits countably many elements of R. Hint: If $x = b/2^m$,
b odd, then $x \mapsto x/2$. Otherwise, $x \mapsto x$.

3.3.7. Prove that $\aleph_0 \cdot c = c$. That is to say, find a one-to-one mapping from
$\{(a, b) \mid a \in \mathbb{N}, b \in \mathbb{R}\}$ onto R.
3.3.8. Explain the connection between the proof of Cantor's theorem, Theo-
rem 3.10, and the proof that the set of real numbers in [0, 1] is not countable.
3.3.9. In the common proof that R is not countable, why do we avoid the digits 0
and 9?
3.3.10. Prove that if A ⊆ B and there is a mapping from A onto B, then A and B
have the same cardinality.
3.3.11. Describe the set where A = { 1, 2, 3}. Find the cardinality of this set
and justify your answer.
3.3.12. Describe the set $\mathbb{N}^A$, where A = {1, 2, 3}. Find the cardinality of this set
and justify your answer.
3.3.13. Describe the set $\mathbb{N}^{\mathbb{N}}$. Find the cardinality of this set and justify your answer.
3.3.14. Describe the sets and Find the cardinality of these sets and justify
your answers.
3.3.15. What is the meaning of and when S is a finite nonempty set? Does
the rule $|A^B| = |A|^{|B|}$ still hold?

3.3.16. Consider the set of pairs (x, y) that are roots of polynomials in x and y
with rational coefficients; for example, $x^2 + xy + y^2$. What is the cardinality of
the set of all such pairs? Justify your answer.

3.3.17. What is the cardinality of the set of all rational polynomials in π (all expressions
of the form $a_n\pi^n + a_{n-1}\pi^{n-1} + \cdots + a_1\pi + a_0$, where $a_0, a_1, \ldots,$

a nested sequence of intervals with rational endpoints, $[a_1, b_1] \supset [a_2, b_2] \supset \cdots$,
so that no rational number is contained in
$$\bigcap_{n=1}^{\infty} [a_n, b_n].$$
4
Nowhere Dense Sets and the Problem with the
Fundamental Theorem of Calculus

This chapter will focus on the types of sets that confused Hankel and Harnack
and many other mathematicians of the late nineteenth century. Consider again
what is left over when we order the rational numbers between 0 and 1, say $a_1 =
0$, $a_2 = 1$, $a_3 = 1/2, \ldots$, and remove all numbers within 1/8 of $a_1$, within 1/16 of
$a_2$, . . . , within $(1/2)^{n+2}$ of $a_n$, . . . . As Borel showed, there is something left over. In fact,
if we let S denote the set of points that are not eliminated, then $c_e(S) \ge 1/2$ (see
Exercise 4.1.1).
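A finite version of this construction can be computed exactly. The sketch below is our own illustration: it lists the rationals in [0, 1] as 0, 1, 1/2, 1/3, 2/3, 1/4, . . . , removes the first N neighborhoods, and merges overlapping pieces to find the length removed. It only checks the length bound, since the removed lengths sum to at most $2\sum_{n \ge 1} (1/2)^{n+2} = 1/2$. The statement about outer content is the subject of Exercise 4.1.1.

```python
from fractions import Fraction
from itertools import count, islice
from math import gcd


def rationals_in_unit_interval():
    """0, 1, 1/2, 1/3, 2/3, 1/4, 3/4, ...: every rational in [0, 1] once."""
    yield Fraction(0)
    yield Fraction(1)
    for q in count(2):
        for p in range(1, q):
            if gcd(p, q) == 1:
                yield Fraction(p, q)


def length_removed(N):
    """Length of [0, 1] covered by the first N open intervals
    (a_n - (1/2)**(n+2), a_n + (1/2)**(n+2)), found by merging overlaps."""
    pieces = []
    for n, a in enumerate(islice(rationals_in_unit_interval(), N), start=1):
        r = Fraction(1, 2 ** (n + 2))
        pieces.append((max(a - r, Fraction(0)), min(a + r, Fraction(1))))
    pieces.sort()
    total, cur_lo, cur_hi = Fraction(0), None, None
    for lo, hi in pieces:
        if cur_hi is None or lo > cur_hi:  # disjoint from the current run
            if cur_hi is not None:
                total += cur_hi - cur_lo
            cur_lo, cur_hi = lo, hi
        else:  # overlaps: extend the current run
            cur_hi = max(cur_hi, hi)
    if cur_hi is not None:
        total += cur_hi - cur_lo
    return total


removed = length_removed(200)
print(float(removed), float(1 - removed))
```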
This set also gives a counter-example to Hankel's contention that every pointwise
discontinuous function is Riemann integrable. If we define $\chi_S(x) = 1$ if $x \in S$, $\chi_S(x) = 0$
if $x \notin S$, then $\chi_S$ is continuous at every rational number and so is pointwise
discontinuous. But no matter how we partition the interval [0, 1], the subintervals
that contain points of S (and thus have oscillation 1) must have total length at
least 1/2. This function is not Riemann integrable over [0, 1].
Although S seems like a very sparse set, it does not fall into the category of
any of our characterizations of sparse sets. As we shall see in this chapter, it is not
countable. It does not have outer content 0. Since the outer content is not zero, it
cannot be first species. We need a new term to describe the way in which this set
is sparse.
A set T is dense in (a, b) if every open interval in (a, b) contains at least one
point of T. The set S is nowhere dense in (a, b) if and only if every open interval in
(a, b) contains an open subinterval with no points of S (see Exercise 4.1.3). Finite
sets are nowhere dense and so are discrete sets, sets such as N for which each
element is contained in a neighborhood that has no other elements of that set. But
there are also nowhere dense sets that are not discrete.
The confusion exhibited by Hankel and many of his contemporaries arises from
the attempt to connect the intuitive idea of a "sparse" set with any of the precise
definitions that were beginning to emerge. In some sense, to be countable is to
be sparse. Such a set, though infinite, is of a smaller order of infinity than the
continuum, R. But as we know with the rationals, a countable set can still be dense.
Mathematicians of the 1870s and into the 1880s would conflate the concepts of
discrete, nowhere dense, first species, and outer content zero, and often throw in the
assumption that such a set must be countable. It would take a while to straighten
these out and clarify how they are related.

Definition: Nowhere dense
A set S is nowhere dense if there are no open intervals in which S is dense. In other
words, S is nowhere dense if its derived set does not contain any open intervals.

The problem arises from the use of three incomparable ways of measuring the
size of a set: cardinality, density, and measure (which, for the moment, means
outer content). We shall straighten these out in Section 4.1. In Section 4.2 we
shall explain the disturbing implications for the fundamental theorem of calculus
that arise from the existence of nowhere dense sets with positive outer content.
Volterra's example of a bounded, pointwise discontinuous function that is not
Riemann integrable is built using such a set. In Section 4.3, we shall rely on our
improved understanding of nowhere dense sets to explore Osgood's justification
of term-by-term integration for any bounded series of continuous functions that
converge to a continuous function. Nowhere dense sets lie at the heart of these first
three sections, but as Osgood's proof makes very clear, they are an inconvenient
tool with which to explore analysis. I have included Osgood's proof for a specific
pedagogical purpose. In struggling with it, the reader is prepared to appreciate — as
did analysts of the early twentieth century — the incredible simplicity and clarity
of Lebesgue's approach. In Section 4.4 we shall explore Baire's insights into the
gulf that separates nowhere dense sets from intervals, insights that will restrict
the possibilities for the set of discontinuities of a function. His work marks the
culmination of our understanding of nowhere dense sets, preparing the ground for
and inspiring Lebesgue's development of measure theory.

4.1 The Smith—Volterra—Cantor Sets


We have seen that any first species set will have outer content zero. What about
sets that are not first species? Cantor would prove in 1883 that given any set S,
its derived set is a union of a countable set and a perfect set, a set that is its own
derived set.
The only perfect sets we have seen so far are the empty set, all of R, closed and
bounded proper intervals (not single points), and finite unions of closed, bounded,
and proper intervals. Since it contains its accumulation points, any perfect set is
closed. Not all closed sets are perfect.

Definition: Perfect
A set is perfect if it is equal to its derived set. In other words, S is perfect if and
only if every point of S is an accumulation point of S, and all accumulation points
of S are in S.

There are two big questions that we shall answer in this section:

1. Can a nonempty perfect set be nowhere dense? If so, then we would have a set
that is second species and nowhere dense.
2. Can a nowhere dense set have positive outer content? If so, then Hankel's proof
that pointwise discontinuous functions are Riemann integrable collapses. It
should be possible to find a function for which has positive outer content
even though it is nowhere dense.

As we shall see, bounded, perfect, nowhere dense sets can be constructed with any
outer content we wish, provided only that the outer content is strictly less than the
length of the interval that contains our set.
The first construction of a perfect, nowhere dense set was by the British math-
ematician Henry J. S. Smith (1826—1883) in 1875. Smith, who taught at Balliol
College in Oxford and was appointed Savilian professor of geometry in 1860, is
known primarily for his work in number theory. Not many mathematicians were
aware of Smith's construction, a fate that was shared by some of his other ground-
breaking work. Most of the exciting mathematics was happening in Germany and
France, and that is where attention was focused. In 1881, Vito Volterra showed how
to construct such a set, but Volterra was still a graduate student, and he published
in an Italian journal that was not widely read. Again, little notice was paid. Finally,
in 1883, Cantor rediscovered this construction for himself, and suddenly everyone
knew about it. Cantor's example is known as the Cantor ternary set. We shall use
this term to refer to Cantor's specific example, but the family of examples of perfect,
nowhere dense sets exemplified by the work of Smith, Volterra, and Cantor
will be referred to as the Smith–Volterra–Cantor sets, or SVC sets.

The Cantor Ternary Set


We shall build a perfect, nowhere dense set with outer content zero that is contained
in [0, 1]. We begin with this interval and remove the middle third, leaving us
with [0, 1/3] ∪ [2/3, 1]. We now remove the middle third from each of these
intervals, leaving us with [0, 1/9] ∪ [2/9, 1/3] ∪ [2/3, 7/9] ∪ [8/9, 1]. We remove
the middle third from each of these, leaving us with eight intervals, each of length
1/27. We remove the middle third from each of these and continue this process
indefinitely (see Figure 4.1). We shall call the set of values that remain the Cantor
ternary set. What is left?

Figure 4.1. Construction of the Cantor ternary set by removal of middle thirds.
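The stages of this construction are easy to generate with exact fractions. In the sketch below (the function name is ours), stage k consists of $2^k$ closed intervals of length $3^{-k}$, so the total length remaining is $(2/3)^k$.

```python
from fractions import Fraction


def cantor_stage(k):
    """Closed intervals left after k rounds of removing open middle thirds from [0, 1]."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(k):
        next_intervals = []
        for a, b in intervals:
            third = (b - a) / 3
            next_intervals += [(a, a + third), (b - third, b)]  # keep the outer thirds
        intervals = next_intervals
    return intervals


print([f"[{a}, {b}]" for a, b in cantor_stage(2)])
# ['[0, 1/9]', '[2/9, 1/3]', '[2/3, 7/9]', '[8/9, 1]']
```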
We clearly still have many of the rational numbers between 0 and 1 whose
denominators are powers of 3, the endpoints of the intervals we kept. It may seem
that that is all that we have, but there is more.
The easiest way to see what is left is to consider the base 3 expansion of the real
numbers between 0 and 1. This uses the digits 0, 1, and 2, for example,
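$$0.210201_3 = \frac{2}{3} + \frac{1}{3^2} + \frac{0}{3^3} + \frac{2}{3^4} + \frac{0}{3^5} + \frac{1}{3^6} = \frac{586}{729}.$$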

When we eliminate all values between 1/3 and 2/3, we are eliminating those
numbers between $0.1_3$ and $0.2_3$. In other words, we take out all values with a 1
in the third's place and a nonzero digit after that. When we remove the intervals
(1/9, 2/9) and (7/9, 8/9), we are removing the values between $0.01_3$ and $0.02_3$ and
the values between $0.21_3$ and $0.22_3$. In other words, we remove those values with
a 1 in the ninth's place and a nonzero digit somewhere after that. As we continue
our removal, what we are eliminating are the numbers with a 1 anywhere in the
base 3 expansion, provided the 1 is eventually followed by a nonzero digit.
We can simplify the description of the Cantor ternary set. Base 3 representations
that terminate can also be written with repeating 2s. Thus, we have

$$\begin{aligned}
0.1_3 &= 0.02222\ldots_3, \\
0.021_3 &= 0.020222\ldots_3, \\
0.0220201_3 &= 0.0220200222\ldots_3.
\end{aligned}$$

We can define the elements of the Cantor ternary set as those numbers that can be
written in base 3 without using the digit 1.
It is now easy to find elements of the Cantor ternary set that are not rational
numbers with denominators that are powers of 3. For example,
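$$0.020202\ldots_3 = \frac{2}{3^2} + \frac{2}{3^4} + \frac{2}{3^6} + \cdots = \frac{2}{9} \cdot \frac{1}{1 - 1/9} = \frac{2}{9} \cdot \frac{9}{8} = \frac{1}{4}.$$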
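The description by base 3 digits suggests a membership test for rational numbers. The sketch below is our own, not the book's. It follows the greedy base 3 expansion and accepts a terminating 1, which can be rewritten with repeating 2s. It rejects any other 1, and it stops once the remainders repeat, which they must for a rational number.

```python
from fractions import Fraction


def in_cantor_set(x: Fraction) -> bool:
    """Whether a rational x in [0, 1] has a base 3 expansion without the digit 1."""
    if not 0 <= x <= 1:
        return False
    if x == 1:
        return True  # 1 = 0.222..._3
    seen = set()
    while x not in seen:
        seen.add(x)
        digit = int(3 * x)  # next base 3 digit (floor, since 3x >= 0)
        x = 3 * x - digit   # remainder, again in [0, 1)
        if digit == 1:
            # ...1000..._3 can be rewritten ...0222..._3; any other tail lies
            # strictly inside a removed middle third.
            return x == 0
    return True  # the digits now cycle without ever producing a 1


print([in_cantor_set(Fraction(p, q)) for p, q in [(1, 4), (1, 10), (1, 3), (1, 2), (586, 729)]])
# [True, True, True, False, False]
```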

Proposition 4.1 (Properties of the Cantor Ternary Set). The Cantor ternary set,
C, is perfect, nowhere dense, and has outer content zero.

Proof. The complement of C is a union of open intervals, and therefore C is closed.
By Proposition 3.2, C contains all of its accumulation points. To show that every
point in C is an accumulation point, we begin with any $a \in C$ and any $\epsilon > 0$. We
can find another element of C that is in the ε-neighborhood of a by finding a k for
which $3^{-k} < \epsilon/2$ and then switching the kth digit of a. If there is a 0 in the $3^{-k}$
place of a, we change it to a 2. If there is a 2 in that place, we change it to a 0. The
number with the switched digit is in C and is equal to $a \pm 2 \cdot 3^{-k}$, and so it differs
from a by less than ε. We thus see that every element of C is an accumulation point
of C, and therefore C is a perfect set.
Since this set is perfect, it can be dense in some open interval only if it already
contains that open interval. But between any two numbers, we can find a number
that requires the digit 1 somewhere in its base 3 representation, so C cannot contain
any open intervals. It is nowhere dense.
Taking open intervals that contain and are just a little larger than [0, 1/3] and
[2/3, 1], we can get the Cantor ternary set inside a finite union of open intervals
whose total length is less than 2/3 + ε for any positive ε. Using the four intervals
of length 1/9 that contain C, we can get this set inside a finite union of open
intervals of total length less than 4/9 + ε. Using the eight intervals of length 1/27,
we see that the outer content is less than 8/27 + ε. In general, for any $n \in \mathbb{N}$, we
can get C inside open intervals whose total length is less than $(2/3)^n + \epsilon$. But
this bound can be brought as close to 0 as we wish, so the outer content of C is
zero. □

The SVC sets consist of those sets that are constructed by starting with a closed
interval and removing an open subinterval. One then removes an open subinterval
from each of the remaining subintervals and continues through an infinite sequence
of such removals, choosing the subintervals that are removed so that every open
subinterval of the original set overlaps with at least one of the subintervals that are
removed. The SVC set is the intersection of this countably infinite collection of the
sets that remain after each iteration. Every SVC set is closed and nowhere dense.
A particular family of SVC sets consists of those formed by, at the kth iteration,
removing an open interval of length $1/n^k$ from the center of each of the remaining
closed intervals. We shall denote the resulting set SVC(n), $n \ge 3$. The Cantor
ternary set will, from now on, be referred to as SVC(3).
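For SVC(n), the total length removed can be tracked stage by stage. The description above says that at stage j, an open interval of length $1/n^j$ is removed from the center of each of the $2^{j-1}$ remaining intervals. Under that description, the sketch below (the function name is ours) computes the length that remains after k stages. By our own computation, the remaining length tends to $1 - 1/(n-2) = (n-3)/(n-2)$, which is positive for n ≥ 4. This sketch does not establish that this limit is the outer content of SVC(n).

```python
from fractions import Fraction


def svc_remaining_length(n: int, k: int) -> Fraction:
    """Total length left in [0, 1] after k stages of SVC(n): at stage j an open
    interval of length 1/n**j is removed from the centre of each closed interval."""
    piece = Fraction(1)  # length of each remaining closed interval
    count = 1            # number of remaining closed intervals
    for j in range(1, k + 1):
        gap = Fraction(1, n ** j)
        if gap >= piece:
            raise ValueError("removed interval would not fit")
        piece = (piece - gap) / 2
        count *= 2
    return count * piece


print(svc_remaining_length(3, 5))   # 32/243, which is (2/3)**5
print(svc_remaining_length(4, 10))  # 1025/2048, which is 1/2 + 1/2**11
```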

The Devil's Staircase


The SVC sets lead to many strange constructions. One of the strangest is a function
often referred to as the Lebesgue singular function, but we shall use a more
descriptive label, the devil's staircase, DS(x).

Example 4.1. The function DS is a mapping from [0, 1] to [0, 1]. If x is in
SVC(3), then we take the base 3 expansion of x without 1s, replace each digit 2 in
this expansion by the digit 1, and read the resulting number as a base 2 expansion.
For example,
$$\frac{2}{3} = 0.2_3 \mapsto 0.1_2 = \frac{1}{2},$$
$$\frac{20}{81} = 0.0202_3 \mapsto 0.0101_2 = \frac{1}{4} + \frac{1}{16} = \frac{5}{16}.$$
This function maps SVC(3) onto all of [0, 1]. Since SVC(3) is also a subset of
[0, 1], this onto mapping implies that SVC(3) must have the same cardinality as
[0, 1], the cardinality c.

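Note that if a and b are elements of SVC(3), a < b, then DS(a) ≤ DS(b).
Equality occurs if and only if a and b are the endpoints of one of the open intervals
that was removed to create SVC(3). Assume that a and b are the endpoints of one
of these intervals, and assume that they agree in the first n digits, $d_1$ through $d_n$. We can
represent these values by
$$a = 0.d_1 d_2 \ldots d_n 0222\ldots_3, \qquad b = 0.d_1 d_2 \ldots d_n 2000\ldots_3.$$
Their images under the function DS are the same:
$$DS(a) = 0.e_1 e_2 \ldots e_n 0111\ldots_2 = 0.e_1 e_2 \ldots e_n 1000\ldots_2 = DS(b), \qquad (e_i = d_i/2).$$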
We now extend this function to all of [0, 1] by mapping the points not in SVC(3)
to the same value as the image of the endpoints of the removed open interval in
which the point lies. In other words, every point in [1/3, 2/3] is mapped to 1/2.
Every point in [1/9, 2/9] is mapped to 1/4; every point in [7/9, 8/9] is mapped to
3/4 (see Figure 4.2).
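The definition of DS, together with its extension, can be evaluated digit by digit. In the sketch below, the function name and the cutoff `max_digits` are ours. Base 3 digits 0 and 2 become binary digits 0 and 1. At the first digit 1, the point is either inside a removed interval or is the left endpoint of one, and the binary digit 1 ends the expansion. Points whose expansions never use 1, such as 1/4, are only approximated to within $2^{-\text{max\_digits}}$.

```python
from fractions import Fraction


def devils_staircase(x: Fraction, max_digits: int = 60) -> Fraction:
    """DS(x) for x in [0, 1], read off the greedy base 3 expansion of x."""
    if not 0 <= x <= 1:
        raise ValueError("x must lie in [0, 1]")
    if x == 1:
        return Fraction(1)
    value = Fraction(0)
    for position in range(1, max_digits + 1):
        digit = int(3 * x)  # next base 3 digit
        x = 3 * x - digit
        if digit == 1:
            # Inside (or at the left end of) a removed interval: DS is constant there.
            return value + Fraction(1, 2 ** position)
        value += Fraction(digit // 2, 2 ** position)  # 0 -> 0, 2 -> 1
    return value  # truncated: within 2**-max_digits of DS(x)


print(devils_staircase(Fraction(20, 81)))  # 5/16
print(devils_staircase(Fraction(1, 5)))    # 1/4, since 1/5 lies in [1/9, 2/9]
```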
We thus get a continuous, increasing function with a graph that connects (0, 0)
and (1, 1) but with horizontal steps of lengths that add up to 1. Between any two
steps, no matter how close they may be, there are infinitely many other steps. This
function has the curious property that it is a nonconstant function with a derivative
that exists and is zero at every point in [0, 1] except at values in SVC(3), a set with
outer content zero. Even though the derivative of DS does not exist at all values in
[0, 1], every interval in [0, 1] does contain a point at which the derivative exists. It
is possible to define the Riemann integral of the derivative if we simply restrict the
values at which we evaluate DS′ to those points where it exists. But then the integral
of the derivative of DS is a constant function, not DS.

Figure 4.2. The devil's staircase.
This suggests that for the evaluation part of the fundamental theorem of calculus,
$$F'(x) = f(x) \implies \int_a^b f(x)\,dx = F(b) - F(a),$$
we want to insist that the derivative of F must exist at all points in [a, b]. But this
is not the end of our troubles. As we shall see in the next section, we can use the
SVC sets to create examples of functions that are differentiable at every point in
[a, b], F'(x) = f(x), and yet
$$\int_a^b f(x)\,dx \neq F(b) - F(a).$$

Exercises
4.1.1. Prove that if $(r_1, r_2, r_3, \ldots)$ is any ordering of the rational numbers in [0, 1], then the
set of points that are not within $(1/2)^{n+2}$ of $r_n$ for any n has outer content at least 1/2.
4.1.2. Give an example of a closed set that is not perfect.
4.1.3. By definition, if S is nowhere dense in (a, b), then it is not dense in any
interval contained in (a, b). Show that this holds if and only if every open interval
contained in (a, b) has an open subinterval that has no points of S.
4.1.4. Explain the connection between the fact that the set of rational numbers that
can be written with denominators that are powers of 10 is dense in R and the fact
that every real number has a decimal expansion.
4.1.5. Prove that every real number between 0 and 1 can be represented in a base
3 expansion using the digits 0, 1, and 2.
4.1.6. Prove that a set that is first species must be nowhere dense.
4.1.7. Prove that between any two numbers, there will always be a number with a
1 somewhere in its base 3 representation. Consider the set of numbers whose base
3 expansion requires using the digit 1. Show that this set is dense in R.
4.1.8. Prove that the Cantor ternary set has cardinality c.
4.1.9. Prove that no finite set can be perfect unless it is the empty set.
4.1.10. Prove that no countably infinite set can be perfect.
4.1.11. Prove that a set is nowhere dense if and only if its derived set is nowhere
dense.
4.1.12. Prove that the devil's staircase is continuous by explaining how to find
a response δ > 0 to the challenge ε > 0 so that if |x − y| < δ, then |DS(x) − DS(y)| < ε.
4.1.13. Using the definition of the derivative, justify the assertion that the derivative
of the devil's staircase, DS, is not defined at any point of the Cantor ternary set,
SVC(3).
4.1.14. Let F denote the set of values in [0, 1] that can be written in base 5 without
the use of the digits 1 or 3. Thus, $1/5 = 0.1_5 = 0.0444\ldots_5$ is in F but $7/25 = 0.12_5$
is not. Describe the open intervals that are removed from [0, 1] to create F. Find
the outer content of F.
4.1.15. For the set F defined in Exercise 4.1.14, define a function, DSF, that
takes each point in F, $x = 0.d_1 d_2 \ldots_5$, to $y = 0.e_1 e_2 \ldots_3$, where $e_i = d_i/2$. Thus,
$1/5 = 0.0444\ldots_5$ is mapped to $0.0222\ldots_3 = 0.1_3 = 1/3$. Show that if (a, b) is
one of the open intervals that is removed to form F, then DSF(a) = DSF(b). We
can extend DSF to all of [0, 1] by defining it to equal DSF(a) = DSF(b) on each
of the removed intervals (a, b). Sketch the graph of DSF.

4.2 Volterra's Function


The Cantor ternary set, SVC(3), is an example of a perfect, nowhere dense set
with outer content zero. Vito Volterra was interested in a perfect, nowhere dense
set with positive outer content because it would enable him to construct a function
with a bounded derivative that exists everywhere, but this derivative would not be
Riemann integrable over any closed, bounded interval.
Volterra was born in 1860 in Ancona and earned his doctorate in physics at
the University of Pisa in 1882, studying under the direction of Enrico Betti. He
taught at Pisa and then Torino before being appointed to the chair of mathematical
physics at the University of Rome in 1900. In 1931, all university faculty in Italy
were required to take an oath of allegiance to the Fascist government. Volterra was
one of eleven in all of Italy who refused.
Since the Riemann integral is, strictly speaking, defined only for bounded func-
tions, it is fairly easy to find examples of functions whose derivative cannot be
integrated.

Example 4.2. We consider the function

$$f(x) = x^2 \sin(x^{-2}), \quad x \neq 0, \qquad f(0) = 0.$$

If x ≠ 0, then the derivative of f is given by (see Figure 4.3)
$$f'(x) = 2x \sin(x^{-2}) - 2x^{-1} \cos(x^{-2}).$$

To find the derivative of f at x = 0, we need to rely on the definition
$$f'(0) = \lim_{x \to 0} \frac{f(x) - f(0)}{x} = \lim_{x \to 0} x \sin(x^{-2}) = 0.$$

So f is a function that is differentiable over the interval [−1, 1], but f' is not a
bounded function on this interval. The Riemann integral exists only for bounded
functions. Notice that if we treat $\int_0^1 f'(x)\,dx$ as an improper integral, then it does
satisfy the fundamental theorem of calculus:
$$\lim_{\epsilon \to 0^+} \int_\epsilon^1 \left(2x \sin(x^{-2}) - 2x^{-1} \cos(x^{-2})\right) dx = \lim_{\epsilon \to 0^+} \left. x^2 \sin(x^{-2}) \right|_\epsilon^1 = \sin(1). \tag{4.1}$$

Figure 4.3. Graph of the derivative defined by f'(x) = 2x sin(x^{-2}) − 2x^{-1} cos(x^{-2}).

Similarly,
$$\lim_{\epsilon \to 0^-} \int_{-1}^{\epsilon} \left(2x \sin(x^{-2}) - 2x^{-1} \cos(x^{-2})\right) dx = -\sin(1). \tag{4.2}$$
The improper integral of f' from −1 to 1 exists and is equal to 0.
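A quick numerical check of Example 4.2 can be sketched as follows; it is our own illustration, and the names `f`, `f_prime`, `samples`, and `tails` are hypothetical. At $x_k = 1/\sqrt{2\pi k}$ we have $x_k^{-2} = 2\pi k$, so the formula for f' reduces to $-2\sqrt{2\pi k}$, showing that f' is unbounded near 0. Meanwhile the integral of f' over $[\epsilon, 1]$ equals $f(1) - f(\epsilon)$, which approaches sin(1) as in (4.1).

```python
import math


def f(x):
    """f(x) = x^2 sin(x^-2) for x != 0, f(0) = 0 (Example 4.2)."""
    return 0.0 if x == 0 else x * x * math.sin(x ** -2)


def f_prime(x):
    """f'(x) = 2x sin(x^-2) - 2x^-1 cos(x^-2) for x != 0."""
    return 2 * x * math.sin(x ** -2) - (2 / x) * math.cos(x ** -2)


# At x_k = 1/sqrt(2*pi*k): sin(x_k^-2) = 0 and cos(x_k^-2) = 1,
# so f'(x_k) = -2*sqrt(2*pi*k), which grows without bound as k increases.
samples = [f_prime(1 / math.sqrt(2 * math.pi * k)) for k in (1, 100, 10_000)]

# Integral of f' over [eps, 1] is f(1) - f(eps); since |f(eps)| <= eps^2,
# these values approach sin(1) as eps -> 0+.
tails = [f(1) - f(eps) for eps in (1e-1, 1e-3, 1e-6)]
```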
But is it possible to find a counterexample to the evaluation part of the fundamental
theorem of calculus that does not rely on unbounded functions? Can we
find a function f for which f' exists and is bounded over an interval, say [0, 1],
but f' is not Riemann integrable over [0, 1]?
In 1878, Dini observed that if f is a nonconstant function that has a bounded
derivative over [a, b], and if f' is zero on a dense subset of [a, b], then f' cannot
be integrable on [a, b] (see Exercise 4.2.3), but he could not produce an example
of such a function. This is precisely the type of function that Volterra constructed
in 1881.

SVC(4)
The set SVC(3) was created by removing an interval of length 1/3, then two
intervals of length $1/3^2$, then four of length $1/3^3$, and so on. To create the set
SVC(4), we remove an open interval of length 1/4 from the middle of [0, 1],
then an open interval of length $1/4^2$ from each of the two remaining pieces, then
intervals of length $1/4^3$ from each of the four intervals that remain, and so on.

Figure 4.4. Construction of the set SVC(4) (first two stages; marked points 0, 5/32, 7/32, 3/8, 5/8, 25/32, 27/32, 1).
We do not have a nice characterization of SVC(4) comparable to the base 3
description of SVC(3), but we still wind up with a perfect, nowhere dense set (see
Exercises 4.2.4 and 4.2.5). Any finite collection of open sets that covers SVC(4)
must have lengths that add up to at least
$$1 - \frac{1}{4} - \frac{2}{4^2} - \frac{2^2}{4^3} - \frac{2^3}{4^4} - \cdots = 1 - \frac{1}{4}\left(1 + \frac{1}{2} + \frac{1}{2^2} + \cdots\right) = 1 - \frac{1}{2} = \frac{1}{2}.$$
We can find finite open covers of SVC(4) for which the sum of the lengths of the
intervals comes as close as we wish to 1/2, and therefore the outer content of this
set is 1/2 (see Figure 4.4).
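The stage-by-stage construction is easy to reproduce exactly with rational arithmetic. The sketch below is our own; `svc_stage` and `total_length` are hypothetical helper names. It removes an open interval of length $1/n^k$ from the centre of every remaining piece at stage k and reports what is left, which lets us check the endpoints in Figure 4.4 and watch the total length approach 1/2.

```python
from fractions import Fraction


def svc_stage(n, steps):
    """Closed intervals remaining after `steps` stages of building SVC(n).

    At stage k an open interval of length 1/n^k is removed from the centre
    of each of the 2^(k-1) intervals still present (we assume n >= 3, so
    every removed piece fits inside its interval).
    """
    intervals = [(Fraction(0), Fraction(1))]
    for k in range(1, steps + 1):
        half_gap = Fraction(1, 2 * n ** k)
        refined = []
        for a, b in intervals:
            mid = (a + b) / 2
            refined.append((a, mid - half_gap))
            refined.append((mid + half_gap, b))
        intervals = refined
    return intervals


def total_length(intervals):
    """Sum of the lengths of a list of (left, right) intervals."""
    return sum((b - a for a, b in intervals), Fraction(0))
```

After two stages `svc_stage(4, 2)` gives [0, 5/32], [7/32, 3/8], [5/8, 25/32], [27/32, 1], as in Figure 4.4. After k stages the total length for SVC(4) is $1/2 + 1/2^{k+1}$, decreasing to the outer content 1/2; for SVC(3) it is $(2/3)^k$, which tends to 0.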
Our next function has a derivative that exists and is bounded but is not continuous
at x = 0.

Example 4.3. Consider the function

$$g(x) = x^2 \sin(x^{-1}), \qquad g(0) = 0.$$
This is very much like our previous function, but the derivative is now bounded,
$$g'(x) = 2x \sin(x^{-1}) - \cos(x^{-1}), \quad x \neq 0, \qquad g'(0) = 0.$$

Recall that a function is Riemann integrable if and only if for every σ > 0, the
set of points at which the oscillation exceeds σ has outer content zero. For the
function g', the oscillation at x = 0 is 2. We have only a single point at which the
oscillation is positive, so this function is Riemann integrable. But what if we could
construct a function for which the oscillation at every point of SVC(4) is 2? That
would imply that the set of points at which the oscillation is greater than 1 does
not have zero content, and so the function cannot be Riemann integrable. Our basic
idea is to take copies of g and paste them into each of the intervals that have been
removed. The behavior of our new function at each of the endpoints of the removed
intervals will look just like the behavior of g at x = 0.
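The claim that g' has oscillation 2 at x = 0 can be probed numerically. In this sketch of ours (the helper names `g_prime` and `oscillation_near_zero` are hypothetical), we sample g' at $x = 1/(k\pi)$, where $\sin(1/x) = 0$ and $\cos(1/x) = (-1)^k$, so that in exact arithmetic g' takes the values +1 and −1 arbitrarily close to 0. Together with the bound $|g'(x)| \le 1 + 2|x|$, this shows the oscillation at 0 is exactly 2.

```python
import math


def g_prime(x):
    """g'(x) = 2x sin(1/x) - cos(1/x) for x != 0, g'(0) = 0 (Example 4.3)."""
    if x == 0:
        return 0.0
    return 2 * x * math.sin(1 / x) - math.cos(1 / x)


def oscillation_near_zero(delta, k_max=10_000):
    """Lower estimate of sup g' - inf g' on (0, delta).

    Uses the points x = 1/(k*pi), where g'(x) = (-1)^(k+1) in exact
    arithmetic. Requires delta > 1/(k_max*pi) so that some sample lands
    inside (0, delta).
    """
    values = [g_prime(1 / (k * math.pi))
              for k in range(1, k_max + 1)
              if 1 / (k * math.pi) < delta]
    return max(values) - min(values)
```

Calling `oscillation_near_zero(0.01)` or `oscillation_near_zero(0.001)` gives 2 up to rounding error, however small the neighbourhood.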

Example 4.4 (Volterra's Function). We craft our function with some care. To
find the piece of the function that will go into the interval of length 1/4, we start
with our function g and find the largest value less than 1/8 at which g' is zero; call
it $a_1$. We now define the function $h_1(x)$ by (see Figure 4.5)
$$h_1(x) = \begin{cases} 0, & x < 3/8, \\ g(x - 3/8), & 3/8 \le x \le 3/8 + a_1, \\ g(a_1), & 3/8 + a_1 < x < 5/8 - a_1, \\ g(5/8 - x), & 5/8 - a_1 \le x \le 5/8, \\ 0, & x > 5/8. \end{cases} \tag{4.3}$$
We have constructed this function so that it is differentiable at every point in [0, 1],
and the oscillation of $h_1'$ is 2 at both 3/8 and 5/8.

Figure 4.5. Graph of $h_1(x)$, 0.35 < x < 0.7.
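Finding $a_1$ is a one-variable root-finding problem. The sketch below is one possible approach, our own and not a procedure taken from Volterra or the text; `largest_zero_below` and `h1` are hypothetical names. With $t = 1/x$ we have $g'(x) = (2\sin t - t\cos t)/t$, so we scan upward in t from $1/c$ for the first sign change of $2\sin t - t\cos t$ and then bisect. The scan step is an assumption: it only has to be small compared with the spacing of the zeros, which is roughly π.

```python
import math


def g(x):
    """g(x) = x^2 sin(1/x), g(0) = 0 (Example 4.3)."""
    return 0.0 if x == 0 else x * x * math.sin(1 / x)


def g_prime(x):
    """g'(x) = 2x sin(1/x) - cos(1/x) for x != 0."""
    return 2 * x * math.sin(1 / x) - math.cos(1 / x)


def largest_zero_below(c, step=1e-3, iterations=100):
    """Largest x < c with g'(x) = 0, for c > 0.

    With t = 1/x, g'(x) has the sign of q(t) = 2 sin t - t cos t.
    Scan upward in t from 1/c to the first sign change, then bisect.
    """
    q = lambda t: 2 * math.sin(t) - t * math.cos(t)
    lo = 1 / c
    while q(lo) * q(lo + step) > 0:
        lo += step
    hi = lo + step
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if q(lo) * q(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 2 / (lo + hi)          # x = 1/t at the midpoint of the final bracket


def h1(x, a1):
    """The piece (4.3) of Volterra's function, supported on [3/8, 5/8]."""
    if x < 3 / 8 or x > 5 / 8:
        return 0.0
    if x <= 3 / 8 + a1:
        return g(x - 3 / 8)
    if x < 5 / 8 - a1:
        return g(a1)
    return g(5 / 8 - x)
```

The same function gives $a_2$ via `largest_zero_below(1 / 32)`. Choosing $a_1$ at a zero of g' is what lets the constant middle piece join the two copies of g with a matching (zero) derivative.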
We now define $h_2$, which will be nonzero in the two intervals of length 1/16.
We first find $a_2$, the largest value less than 1/32 at which g' is zero. We then have
(see Figure 4.6)
$$h_2(x) = \begin{cases} 0, & x < 5/32, \\ g(x - 5/32), & 5/32 \le x \le 5/32 + a_2, \\ g(a_2), & 5/32 + a_2 < x < 7/32 - a_2, \\ g(7/32 - x), & 7/32 - a_2 \le x \le 7/32, \\ 0, & 7/32 < x < 25/32, \\ g(x - 25/32), & 25/32 \le x \le 25/32 + a_2, \\ g(a_2), & 25/32 + a_2 < x < 27/32 - a_2, \\ g(27/32 - x), & 27/32 - a_2 \le x \le 27/32, \\ 0, & x > 27/32. \end{cases} \tag{4.4}$$
Figure 4.6. The graph of $h_2(x)$, 0.15 < x < 0.22.

Figure 4.7. The graph of $h_1(x) + h_2(x)$, 0 ≤ x ≤ 1.

The derivative of $h_1 + h_2$ has oscillation 2 at 5/32, 7/32, 3/8, 5/8, 25/32, and 27/32
(see Figure 4.7).
We continue in this way. For each n we find the largest value less than
$1/2^{2n+1}$ at which g' is zero; call it $a_n$. We construct a function $h_n$ that is nonzero only inside
the intervals of length $4^{-n}$, and in each of those intervals it is two mirrored copies
of g over the interval $[0, a_n]$, connected by the constant function equal to $g(a_n)$.
Volterra's function is
$$V(x) = \sum_{n=1}^{\infty} h_n(x). \tag{4.5}$$

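If x is not in SVC(4), then we can find a neighborhood of x on which only one
of the $h_n$ is nonzero. It follows that for $x \notin$ SVC(4),
$$V'(x) = \sum_{n=1}^{\infty} h_n'(x). \tag{4.6}$$
If $x \in$ SVC(4), then V(x) = 0 and $|V(x) - V(y)| \le (y - x)^2$ (since $|g(t)| \le t^2$, $|V(y)|$ is at most the square of the distance from y to the nearer endpoint of the removed interval containing y). From the definition
of the derivative,
$$V'(x) = \lim_{y \to x} \frac{V(x) - V(y)}{x - y} = 0. \tag{4.7}$$
This is also equal to the value of $\sum_{n=1}^{\infty} h_n'(x)$ at any x in SVC(4).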
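The estimate behind (4.7) can be checked on a truncation of the series. The sketch below is our own floating-point illustration, with hypothetical names `a`, `removed`, and `volterra_partial`. It evaluates $V_N = h_1 + \cdots + h_N$ on SVC(4), finding each $a_n$ numerically as the largest zero of g' below $1/2^{2n+1}$. Because the $h_n$ have disjoint supports, $V_N$ agrees with V at every point lying in no interval removed after stage N, which covers the points we test next to 3/8.

```python
import math
from fractions import Fraction
from functools import lru_cache


def g(x):
    """g(x) = x^2 sin(1/x), g(0) = 0 (Example 4.3)."""
    return 0.0 if x == 0 else x * x * math.sin(1 / x)


@lru_cache(maxsize=None)
def a(n, step=1e-3):
    """a_n: largest zero of g' below 1/2^(2n+1), located via t = 1/x."""
    q = lambda t: 2 * math.sin(t) - t * math.cos(t)   # same sign as g'(1/t)
    lo = 2.0 ** (2 * n + 1)
    while q(lo) * q(lo + step) > 0:
        lo += step
    hi = lo + step
    for _ in range(100):
        mid = (lo + hi) / 2
        if q(lo) * q(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 2 / (lo + hi)


@lru_cache(maxsize=None)
def removed(n):
    """The 2^(n-1) open intervals of length 4^-n removed at stage n of SVC(4)."""
    kept, gaps = [(Fraction(0), Fraction(1))], []
    for k in range(1, n + 1):
        half_gap = Fraction(1, 2 * 4 ** k)
        gaps, refined = [], []
        for left, right in kept:
            mid = (left + right) / 2
            gaps.append((float(mid - half_gap), float(mid + half_gap)))
            refined += [(left, mid - half_gap), (mid + half_gap, right)]
        kept = refined
    return tuple(gaps)


def volterra_partial(x, N):
    """V_N(x) = h_1(x) + ... + h_N(x), a truncation of the series (4.5)."""
    total = 0.0
    for n in range(1, N + 1):
        an = a(n)
        for left, right in removed(n):
            if left <= x <= left + an:
                total += g(x - left)            # rising copy of g
            elif left + an < x < right - an:
                total += g(an)                  # constant middle piece
            elif right - an <= x <= right:
                total += g(right - x)           # mirrored copy of g
    return total
```

At the SVC(4) point 3/8, the values $|V_N(3/8 + d)|$ stay below $d^2$, so the difference quotient in (4.7) is at most d in absolute value and tends to 0.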
The oscillation of V' at any endpoint of one of the intervals is 2. Since every point
of SVC(4) is an accumulation point of the set of endpoints, every neighborhood
of a point in SVC(4) contains points where V' is 1 and points where V' is −1,
and thus we get oscillation 2 at every point of SVC(4). Since SVC(4) has outer
content 1/2, the set of points at which the oscillation of V' exceeds 1 does not have
outer content zero, and so V' is not Riemann integrable.
Notice that V' is pointwise discontinuous. Every open interval contains a point —
in fact an entire open interval of points — at which V' is continuous. This is the
counterexample to Hankel's claim that every pointwise discontinuous function is
Riemann integrable.

Perfect, Nowhere Dense Sets


The basic idea behind the SVC sets is to progressively remove a countable collection
of open intervals so that from each remaining interval we remove yet another open
interval. As the next theorem shows, this — or any method equivalent to it — is the
only way to obtain a perfect, nowhere dense set.

Theorem 4.2 (Characterization of Perfect, Nowhere Dense Sets). If S is a
bounded, perfect, nowhere dense, nonempty set, a = inf S, b = sup S, then there
is a countably infinite collection of open intervals, each contained in [a, b], such
that S is the derived set of the set of endpoints of these intervals. Furthermore, the
cardinality of S is c.

Proof. Since S contains its accumulation points, it is closed and points a and b are in
S. By Theorem 3.5, the complement of S in [a, b], $S^c \cap [a, b]$, consists of a countable
union of disjoint open intervals. Since every point of S is an accumulation point
of S, a cannot be a left endpoint of one of these open intervals, b cannot be a right
endpoint of one of these open intervals, and no right endpoint of any of these intervals
is a left endpoint of another interval. If $S^c \cap [a, b]$ consisted of only finitely
many intervals, then S would contain a closed interval of positive length. Since S is
nowhere dense, the number of intervals in $S^c \cap [a, b]$ must be countably infinite.
We now show that S is the derived set of the set of endpoints of the disjoint open
intervals whose union is $S^c \cap [a, b]$. Let $(I_1, I_2, I_3, \ldots)$ be an ordering of these
disjoint open intervals. Let $L(I_k)$ be the left endpoint of interval $I_k$ and $R(I_k)$ the
right endpoint. The set of endpoints is contained in S and, since S is perfect, its
derived set is also in S. Since S is nowhere dense, if s is any element of S, every
neighborhood of s has a nonempty intersection with at least one of the intervals $I_k$,
and therefore an endpoint of this interval is in this neighborhood of s. It follows
that s is an accumulation point for the set of endpoints, and, therefore, S is the
derived set of the set of endpoints.
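To prove that S is not countable, it is enough to find a one-to-one mapping from
the set of infinite sequences of 0s and 1s into S. Given an infinite sequence of 0s and 1s, $(m_1, m_2, m_3, \ldots)$, where $m_i = 0$
or 1, we create a sequence of nested intervals starting with $[a_0, b_0] = [a, b]$. Given
$[a_{i-1}, b_{i-1}]$, we find the first open interval in our ordered sequence, $I_{n_i}$, contained
in $[a_{i-1}, b_{i-1}]$. We define
$$[a_i, b_i] = \begin{cases} [a_{i-1}, L(I_{n_i})], & \text{if } m_i = 0, \\ [R(I_{n_i}), b_{i-1}], & \text{if } m_i = 1. \end{cases}$$
Each $a_i$ is either a or a right endpoint of one of the intervals, and each $b_i$ is either b or a left endpoint.
It follows that $n_{i+1} > n_i$ and that $I_{n_{i+1}}$ is strictly contained within $(a_i, b_i)$. The right-hand
endpoints of our nested intervals, $b_1 \ge b_2 \ge \cdots$, form a bounded, decreasing sequence
that converges to some $\beta$; since each $b_i$ is in S and S is closed, $\beta \in S$. This β is the image to which our sequence is mapped,
$$(m_1, m_2, m_3, \ldots) \longmapsto \beta.$$
To prove that two distinct sequences map to distinct elements of S, we let k be
the first position at which the sequences differ. If $m_k = 0$, then the image of this
sequence is less than or equal to $L(I_{n_k})$. If $m_k = 1$, then the image of this sequence is
greater than or equal to $R(I_{n_k})$. Since $L(I_{n_k}) < R(I_{n_k})$, the images are distinct.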
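The nested-interval map in this proof can be run on a concrete set. The sketch below is our own, with hypothetical names `cantor_removed_intervals` and `nested_interval`. It takes S = SVC(3), lists the removed intervals stage by stage so that longer intervals come first, and follows a finite prefix of $(m_1, m_2, \ldots)$ exactly as in the proof, returning $[a_k, b_k]$. Only finitely many stages can be listed, so prefixes must be no longer than the number of stages generated.

```python
from fractions import Fraction


def cantor_removed_intervals(stages):
    """Removed open intervals of SVC(3), listed stage by stage (longer first)."""
    kept = [(Fraction(0), Fraction(1))]
    ordered = []
    for _ in range(stages):
        refined = []
        for a, b in kept:
            third = (b - a) / 3
            ordered.append((a + third, b - third))        # open middle third
            refined += [(a, a + third), (b - third, b)]
        kept = refined
    return ordered


def nested_interval(bits, ordered):
    """Return [a_k, b_k] for the finite prefix `bits` of 0s and 1s."""
    a, b = Fraction(0), Fraction(1)
    for m in bits:
        # first interval in the ordering that lies inside [a, b]
        left, right = next((l, r) for l, r in ordered if a <= l and r <= b)
        if m == 0:
            b = left        # keep [a_{i-1}, L(I_{n_i})]
        else:
            a = right       # keep [R(I_{n_i}), b_{i-1}]
    return a, b
```

With six stages listed, the prefix (0, 1, 0, 1, 0, 1) leads to the interval [182/729, 183/729], and distinct prefixes of equal length lead to disjoint intervals, mirroring the injectivity argument.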

Theorem 4.2 has implications for the continuum hypothesis. If a perfect set is
dense in an interval (a, b), then it must contain every point in [a, b]. Therefore,
every nonempty perfect set has cardinality c. When this is combined with Cantor's
result (see p. 82) that every derived set is the union of a countable set and a perfect
set, we see that every derived set, and therefore every closed set, has cardinality that
is finite, equal to $\aleph_0$, or equal to c. Thus, if there is a subset of R with cardinality
strictly between $\aleph_0$ and c, then it is not closed.
In 1903, Young proved that a subset of R with cardinality strictly between $\aleph_0$
and c cannot be the intersection of a countable collection of open sets. In 1914,
Hausdorff extended this to exclude sets that are the union of a countable collection
of sets that are the intersection of a countable collection of open sets. Finally,
in 1916, Hausdorff and Alexandrov showed that a set with such an intermediate
cardinality cannot be a Borel set (see definition on p. 127).

SVC(n)
While there are many ways of constructing the countable disjoint open intervals
that constitute the complement of our perfect, nowhere dense set, the method used
for SVC(3) and SVC(4) will work for finding perfect, nowhere dense sets in [0, 1]
whose outer content comes as close to 1 as we wish. We define SVC(n) as the
set that remains after removing an interval of length 1/n centered at 1/2, then an
interval of length $1/n^2$ from the center of each of the two remaining intervals, then
intervals of length $1/n^3$ from the centers of each of the remaining four intervals,
and so on, leaving a set with outer content
$$1 - \frac{1}{n} - \frac{2}{n^2} - \frac{2^2}{n^3} - \cdots = 1 - \frac{1/n}{1 - 2/n} = \frac{n-3}{n-2}.$$
Exercises
4.2.1. For each of the following combinations, either give an example of a bounded,
nonempty set with these properties or explain why such a set cannot exist.
1. nowhere dense, first species, and outer content 0
2. nowhere dense, first species, and positive outer content
3. nowhere dense, second species, and outer content 0
4. nowhere dense, second species, and positive outer content
5. dense in some interval, first species, and outer content 0
6. dense in some interval, first species, and positive outer content
7. dense in some interval, second species, and outer content 0
8. dense in some interval, second species, and positive outer content
4.2.2. Find a perfect, nowhere dense subset of [0, 1] with outer content 9/10.
4.2.3. Show that if f is a nonconstant function that has a bounded derivative over
[a, b], and if f' is zero on a dense subset of [a, b], then f' cannot be integrable on
[a, b].

4.2.4. Show that SVC(4) is closed and that every point is an accumulation point of
this set.
4.2.5. Prove that every open interval contained in [0, 1] contains a subinterval with
no points of SVC(4), and therefore SVC(4) is nowhere dense.
4.2.6. For the function g in Example 4.3, prove that the oscillation of g' at x = 0
is 2.

4.2.7. Find the values of a1 and a2 to 10-digit accuracy, where a1 is the largest
number less than 1/8 for which g' is zero and a2 is the largest number less than
1/32 for which g' is zero.
4.2.8. Show that even though $V'(x) = \sum_{n=1}^{\infty} h_n'(x)$, this series does not converge
uniformly.
4.2.9. Show that if S = SVC(3) and the intervals are ordered so that longer intervals
precede shorter, then the mapping described in the proof of Theorem 4.2 takes
(0, 1, 0, 1, 0, 1, ...) to
$$\frac{2}{9} + \frac{2}{81} + \frac{2}{729} + \cdots = \frac{1}{4}.$$
Explain why this mapping is independent of how we choose to order intervals of
the same length.
4.2.10. Show that the derived set of the set described in Exercise 3.2.14 (p. 71) is
perfect and nowhere dense. Describe the countable union of open sets for which
this derived set is the complement in [−1/2, 1/2].
4.2.11. Of the types of sets listed in Exercise 4.2.1 that do exist, which can be
countable?
4.2.12. Give an example of a bounded, countable, nowhere dense set that has
positive outer content.
4.2.13. Let S be a bounded set with exactly one accumulation point, a. Define $S_n$
to be the set of points in S that are at least 1/n away from a. Use the fact that
$S - \{a\} = \bigcup_{n=1}^{\infty} S_n$ to prove that S is countable.
4.2.14. Using induction on the type, prove that any first species set is countable. It
is possible to mimic the proof from Exercise 4.2.13 and define $S_n$ to be the set of
points in S that are at least 1/n from any of the accumulation points of S, but the
statement
$$S - S' = \bigcup_{n=1}^{\infty} S_n$$
now requires proof.

4.2.15. One assumption that is sufficient to make the evaluation part of the fundamental
theorem of calculus correct for Riemann integrals is to assume that F'
is continuous: If F' = f, where f is continuous, then $\int_a^b f(t)\,dt = F(b) - F(a)$.
Explain why this assumption eliminates Volterra's counterexample.
4.2.16. Show that while the assumption that F' = f is continuous is sufficient
to imply the evaluation part of the fundamental theorem of calculus (see Exercise
4.2.15), it is not a necessary condition.
4.2.17. Following Weierstrass, we can modify the definition of the Riemann integral
by taking our Riemann sums, $\sum_j f(x_j^\star)(x_j - x_{j-1})$, with $x_j^\star$ restricted to
be a point of continuity of f in the interval $[x_{j-1}, x_j]$. Show that even with this
modified definition, the derivative of Volterra's function is not Riemann integrable.

4.3 Term-by-Term Integration


Despite Volterra's function, which revealed a disturbing exception to the fundamental
theorem of calculus, no one fully realized that the Riemann integral was inadequate
until Lebesgue described his own integral and demonstrated how many of the
difficulties that had been associated with the Riemann integral now evaporated.
Nowhere was this more evident than in the question of term-by-term integration.
A question that is extremely difficult to answer in the context of the Riemann
integral — what are the conditions that allow us to integrate an infinite series by
integrating each summand? — would suddenly have a simple and direct answer.
This more than anything else convinced mathematicians that the Lebesgue integral
was the correct approach to integration.
To fully appreciate the simplicity that Lebesgue made possible, it is necessary
to spend some time wrestling with term-by-term integration in the setting of the
Riemann integral.
In the 1860s, Weierstrass proved that if a series of integrable functions converges
uniformly over the interval [a, b], then we can integrate the series by summing the
integrals,
$$\int_a^b \left(\sum_{k=1}^{\infty} f_k(x)\right) dx = \sum_{k=1}^{\infty} \left(\int_a^b f_k(x)\,dx\right).$$

Heine popularized this result and clarified the distinction between pointwise and
uniform convergence in Die Elemente der Functionenlehre (The Elements of Func-
tion Theory) published in 1872. Uniform convergence is sufficient for term-by-term
integration, but it was clear that it was not necessary. Many series that do not con-
verge uniformly still allow for term-by-term integration. As we saw in Section 2.3,
Heine tried to work around the nonuniform convergence by isolating a small set
that was problematic and focusing on its complement where convergence would
be uniform.
As would be realized eventually, this problematic set has to be closed and
nowhere dense. In the early 1870s, it was hopefully believed that this meant it
was a small set. But as the work of Smith, Volterra, and Cantor showed, a closed,
nowhere dense set can be very large and can in fact have outer content as close as
desired to the length of the entire interval in which it lies.
In the 1880s, Paul du Bois-Reymond tackled the problem of term-by-term in-
tegration, proving in 1883 that any Fourier series of an integrable function can
be integrated term by term. In 1886, he published results on the general problem
and focused attention on those values of x for which convergence is uniform in-
side some neighborhood of x. His approach was picked up in 1896 by William F.
Osgood whose work we will study in detail.
Paul du Bois-Reymond (1831—1889) was German. His father had moved to
Germany from Neuchâtel in francophone Switzerland. He received his doctorate
under the direction of Ernst Kummer at the University of Berlin in 1853 and held
positions at a succession of universities including Heidelberg, Freiburg, Tübingen,
and finally at the Technische Hochschule Charlottenburg (Charlottenburg Institute
of Technology) in Berlin. Otto Hölder, whom we will meet later, was one of his
doctoral students in Tübingen.
William F. Osgood (1864—1943) is one of the few Americans to feature in this
story. He went to Germany for his graduate work, studying with Max Noether
at Erlangen and earning his doctorate in 1890. He returned to spend his career
teaching at Harvard.
To clarify the problems with which du Bois-Reymond and Osgood had to deal, it
is useful to consider some examples. Rather than working with series, it is simpler
if we work with sequences of integrable functions, and ask whether
$$\int_a^b \left(\lim_{n\to\infty} S_n(x)\right) dx = \lim_{n\to\infty} \left(\int_a^b S_n(x)\,dx\right).$$
This is equivalent to working with series, for we can always define
$$S_n(x) = \sum_{k=1}^{n} f_k(x), \qquad f_k(x) = S_k(x) - S_{k-1}(x).$$
We also assume that this sequence of functions converges pointwise to 0,
$$\lim_{n\to\infty} S_n(x) = 0, \quad \text{for all } x.$$

If $S_n(x) \to f(x)$ and if f is integrable, then we can replace our sequence $S_n \to f$
by $(S_n - f) \to 0$. The limit of integrable functions need not be integrable, a fact
that was illustrated in an example by René Baire in 1898.

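Example 4.5 (Baire's Sequence). We define, for x in [0, 1],
$$f_n(x) = \begin{cases} 1, & \text{if } x = p/q \text{ in lowest terms with } q \le n, \\ 0, & \text{otherwise}. \end{cases} \tag{4.8}$$
Each $f_n$ is discontinuous on a finite set of points, so each is integrable. The
function these approach is Dirichlet's function (Example 1.1), which is not Riemann
integrable.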

For our purposes, we shall assume that the functions in the sequence as well
as the limiting function are all continuous. In this case, we lose no generality if
we restrict ourselves to sequences that converge to 0. Finally, we assume that the
interval over which we integrate is [0, 1].

What Can Happen


Example 4.6. We begin with the example (see Figure 4.8)
$$A_n(x) = nx e^{-nx^2}.$$
Since $\lim_{n\to\infty} A_n(x) = 0$ at every x, we see that
$$\int_0^1 \left(\lim_{n\to\infty} A_n(x)\right) dx = 0.$$
However, we also have that
$$\int_0^1 nx e^{-nx^2}\,dx = \frac{1 - e^{-n}}{2},$$
and therefore
$$\lim_{n\to\infty} \left(\int_0^1 A_n(x)\,dx\right) = \lim_{n\to\infty} \frac{1 - e^{-n}}{2} = \frac{1}{2}.$$
The integral of the limit is not equal to the limit of the integral. If we convert this
into a series,
$$A_n(x) = \sum_{k=1}^{n} \left(kx e^{-kx^2} - (k-1)x e^{-(k-1)x^2}\right),$$
we have an example where term-by-term integration yields the wrong value.

Figure 4.8. Graph of $y = nxe^{-nx^2}$ at n = 3, 8, and 20.
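The gap between the two sides can be seen numerically. This sketch is ours, not the text's: `simpson` is a plain composite Simpson rule and `A` is a hypothetical name for the sequence of Example 4.6. For each n it compares a numerical value of $\int_0^1 A_n$ with the closed form $(1 - e^{-n})/2$, and prints $A_n(0.5)$ to show the pointwise values collapsing to 0 while the integrals approach 1/2. The panel count m = 20,000 is an assumption chosen so the narrow peak near $x = 1/\sqrt{2n}$ is resolved for n up to 1000.

```python
import math


def simpson(f, a, b, m=20_000):
    """Composite Simpson's rule on [a, b] with an even number m of panels."""
    h = (b - a) / m
    total = f(a) + f(b)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def A(n):
    """A_n(x) = n x exp(-n x^2) from Example 4.6."""
    return lambda x: n * x * math.exp(-n * x * x)


for n in (1, 10, 100, 1000):
    approx = simpson(A(n), 0.0, 1.0)
    exact = (1 - math.exp(-n)) / 2
    print(f"n={n:5d}  Simpson={approx:.10f}  (1-e^-n)/2={exact:.10f}  "
          f"A_n(0.5)={A(n)(0.5):.3e}")
```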

The sequence $(A_n)$ does not converge uniformly, so the ability to interchange
limit and integral is not guaranteed, but there are examples of sequences for which
the convergence is not uniform, and yet the integral of the limit is equal to the limit
of the integral.

Example 4.7. We next consider (see Figure 4.9)
$$B_n(x) = \frac{n^2 x}{1 + n^3 x^2}.$$
Again we have $\lim_{n\to\infty} B_n(x) = 0$ at every x, and therefore,
$$\int_0^1 \left(\lim_{n\to\infty} B_n(x)\right) dx = 0.$$
The convergence of $(B_n)$ is not uniform. In fact, the maximum value of $B_n(x)$
in [0, 1] is $n^{1/2}/2$, which occurs at $x = n^{-3/2}$. We cannot force all values of $B_n(x)$
within ε of 0 by taking n sufficiently large. Nevertheless, in this case
$$\int_0^1 \frac{n^2 x}{1 + n^3 x^2}\,dx = \frac{\ln(1 + n^3)}{2n},$$
and therefore
$$\lim_{n\to\infty} \left(\int_0^1 B_n(x)\,dx\right) = \lim_{n\to\infty} \frac{\ln(1 + n^3)}{2n} = 0. \tag{4.9}$$

Figure 4.9. Graph of $y = n^2x/(1 + n^3x^2)$ at n = 3, 8, and 12.

Figure 4.10. Graph of $y = nx/(1 + n^2x^2)$ at n = 3, 8, and 20.

Example 4.8. Our last example is (see Figure 4.10)
$$C_n(x) = \frac{nx}{1 + n^2 x^2}.$$
Again, we see that the convergence is not uniform, because the maximum value of
$C_n(x)$ is 1/2, occurring at x = 1/n. Here again we have that
$$\int_0^1 \left(\lim_{n\to\infty} C_n(x)\right) dx = 0.$$
In this case, the integral is
$$\int_0^1 \frac{nx}{1 + n^2 x^2}\,dx = \frac{\ln(1 + n^2)}{2n},$$
and, therefore,
$$\lim_{n\to\infty} \left(\int_0^1 C_n(x)\,dx\right) = \lim_{n\to\infty} \frac{\ln(1 + n^2)}{2n} = 0.$$
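Examples 4.7 and 4.8 can be compared side by side in a short sketch of our own; the names `B`, `C`, `integral_B`, and `integral_C` are hypothetical. It checks the closed-form integrals against a composite Simpson rule for small n (where a uniform grid resolves the peaks), then prints the peak heights and the integrals for larger n: the peaks of $B_n$ grow like $\sqrt{n}/2$ and those of $C_n$ stay at 1/2, yet both sequences of integrals tend to 0.

```python
import math


def simpson(f, a, b, m=20_000):
    """Composite Simpson's rule on [a, b] with an even number m of panels."""
    h = (b - a) / m
    total = f(a) + f(b)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def B(n):
    """B_n(x) = n^2 x / (1 + n^3 x^2) from Example 4.7."""
    return lambda x: n * n * x / (1 + n ** 3 * x * x)


def C(n):
    """C_n(x) = n x / (1 + n^2 x^2) from Example 4.8."""
    return lambda x: n * x / (1 + n * n * x * x)


def integral_B(n):
    """Closed form of the integral of B_n over [0, 1]: ln(1 + n^3) / (2n)."""
    return math.log(1 + n ** 3) / (2 * n)


def integral_C(n):
    """Closed form of the integral of C_n over [0, 1]: ln(1 + n^2) / (2n)."""
    return math.log(1 + n * n) / (2 * n)


for n in (10, 1_000, 100_000):
    print(f"n={n:6d}  max B_n={B(n)(n ** -1.5):9.3f}  max C_n={C(n)(1 / n):.3f}  "
          f"int B_n={integral_B(n):.2e}  int C_n={integral_C(n):.2e}")
```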

Interchanging limits and integrals works in these last two cases despite the fact
that convergence is not uniform. What is different about them?
We shall not be able to explain why the interchange works for $(B_n)$ until we have
the tools of Lebesgue integration in hand, but we can tackle $(C_n)$ now. What is
most noticeable about the sequence $(C_n)$ is that these functions stay bounded. In
the case of the $(A_n)$ we were able to get convergence to 0 and still have the area
under the graph of $y = A_n(x)$ increase toward 1/2 because the maximal values
of the functions were increasing. This did not interfere with the convergence to
0, because the location of that maximum kept moving left, approaching x = 0.
Whatever positive value of x we might choose, eventually the maximum will occur
to the left of it, and from then on the sequence approaches 0. But if our functions
are all bounded, then we cannot use that trick.

Preserving Some Uniformity


We now return to Osgood who, following du Bois-Reymond, separated those
points that lie within neighborhoods within which we have uniform convergence
from those points that do not. In the three examples, 4.6–4.8, if we take any x > 0,
then the convergence is uniform in some neighborhood of x. The problem point is
x = 0. No matter what neighborhood of 0 we choose, these sequences do not have
uniform convergence in that neighborhood. Osgood called these Γ-points (read
"gamma points").

Definition: Γ-Points
Given a sequence of functions, $(f_1, f_2, \ldots)$, that converges pointwise to 0, and
any σ > 0, we define $\Gamma_\sigma$ to be the set of x such that given any integer m and any
neighborhood $N_\delta(x)$ of x, there is an integer n ≥ m and a point $y \in N_\delta(x)$ for
which $|f_n(y)| > \sigma$. We call x a Γ-point if it is an element of $\Gamma_\sigma$ for some σ > 0.

Note that convergence is not uniform in any neighborhood of a point in $\Gamma_\sigma$. In
all three of our examples, x = 0 is the only Γ-point. In the first two examples,
$\Gamma_\sigma = \{0\}$ for any σ > 0, no matter how large. In the third example, 0 is an element
of $\Gamma_\sigma$, provided 0 < σ < 1/2.

Proposition 4.3 (Characterization of $\Gamma_\sigma$). For any sequence of continuous functions,
$(f_1, f_2, \ldots)$, that converges pointwise to 0 and for any σ > 0, the set $\Gamma_\sigma$
is closed and nowhere dense.

Proof. By Proposition 3.2, $\Gamma_\sigma$ is closed if it contains its accumulation points. Let
$x_0$ be an accumulation point of $\Gamma_\sigma$, and let $N_\delta(x_0)$ be an arbitrary neighborhood
of $x_0$. Since there is an element $x \in \Gamma_\sigma$ that lies in the open set $N_\delta(x_0)$, we may
choose a neighborhood of x, $N_{\delta'}(x)$, which is entirely contained within $N_\delta(x_0)$.
For any integer m, there is an n ≥ m and a point $y \in N_{\delta'}(x) \subseteq N_\delta(x_0)$ for which
$|f_n(y)| > \sigma$. Therefore $x_0$ is also in $\Gamma_\sigma$.
To prove that $\Gamma_\sigma$ is nowhere dense, we assume that it is dense in some open
interval $(a_1, b_1)$ and look for a contradiction. We can find an integer $n_1$ and a
point $y_1 \in (a_1, b_1)$ for which $|f_{n_1}(y_1)| > \sigma$. Since $f_{n_1}$ is continuous, there must
be an open interval $(a_2, b_2)$ containing $y_1$ and contained in $(a_1, b_1)$ over which
the absolute value $|f_{n_1}|$ is larger than σ/2. Since $\Gamma_\sigma$ is dense in $(a_2, b_2)$, there is an
integer $n_2 > n_1$ and a point $y_2 \in (a_2, b_2)$ for which $|f_{n_2}(y_2)| > \sigma$. We can find a
neighborhood of $y_2$, $(a_3, b_3)$ with $a_2 < a_3 < b_3 < b_2$, over which $|f_{n_2}| > \sigma/2$.
Continuing in this way, we generate an increasing sequence of integers,
$(n_1 < n_2 < n_3 < \cdots)$, and a sequence of nested open intervals, $(a_1, b_1) \supset (a_2, b_2) \supset (a_3, b_3) \supset \cdots$, for which $y \in (a_{k+1}, b_{k+1})$ implies that $|f_{n_k}(y)| > \sigma/2$. Since we have
strict containment, $a_{k-1} < a_k < b_k < b_{k-1}$, we can consider the closed intervals,
$$[a_2, b_2] \supset [a_3, b_3] \supset [a_4, b_4] \supset \cdots,$$
where, by continuity, $y \in [a_{k+1}, b_{k+1}]$ implies that $|f_{n_k}(y)| \ge \sigma/2$.
By the nested interval principle, there is a point c contained in all of these
intervals. Since $(f_n(c))$ converges to 0, there is an N such that n ≥ N implies that
$|f_n(c)| < \sigma/2$. Choose any $n_k \ge N$. This gives our contradiction because c is in
$[a_{k+1}, b_{k+1}]$ and, therefore, $|f_{n_k}(c)| \ge \sigma/2$. □
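For Example 4.8 the definition of $\Gamma_\sigma$ can be explored with a crude search, sketched here under our own assumptions; `C` and `find_witness` are hypothetical names. Given x, δ, m, and σ, it looks for n ≥ m and y within δ of x with $|C_n(y)| > \sigma$, the condition that must hold for every m and δ when $x \in \Gamma_\sigma$. The search is finite (n up to `n_max`, a grid of y values plus the peak y = 1/n), so finding a witness confirms the condition for that particular m and δ, while finding none proves nothing by itself.

```python
def C(n, y):
    """C_n(y) = n y / (1 + (n y)^2) from Example 4.8."""
    return n * y / (1 + (n * y) ** 2)


def find_witness(x, delta, m, sigma, n_max=2_000, grid=100):
    """Search for n >= m and |y - x| < delta with |C_n(y)| > sigma.

    Checks n = m, ..., n_max against a grid of y values in the neighbourhood,
    plus the peak y = 1/n whenever it lies in the neighbourhood. Returns
    (n, y) for the first witness found, or None if the search finds none.
    """
    ys = [x - delta + 2 * delta * (j + 0.5) / grid for j in range(grid)]
    for n in range(m, n_max + 1):
        peak = [1 / n] if abs(1 / n - x) < delta else []
        for y in ys + peak:
            if abs(C(n, y)) > sigma:
                return n, y
    return None
```

Near 0 a witness appears for σ = 0.4 once n is large enough that the peak of $C_n$ enters the neighbourhood, consistent with 0 being a Γ-point. Near 0.5 with m = 10 and δ = 0.1, no witness exists for σ = 0.25, since $C_n(y) < 1/(ny) \le 1/4$ there.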

Is Boundedness Sufficient?
As du Bois-Reymond knew by the time he was working on the problem of term-
by-term integration, closed nowhere dense sets can be quite large. As we saw in
the last section, there are closed and nowhere dense subsets of [0, 1] with outer
content as close to 1 as we wish. Because of the difficulty of working with such
sets, du Bois-Reymond was never certain whether or not uniform boundedness,
combined with continuity, would be enough to allow term-by-term integration.
He died before Osgood answered this question. Unbeknownst to either du Bois-Reymond
or Osgood, a mathematician at the University of Bologna, Cesare Arzelà
(1847–1912), had proven in 1885 that, for any convergent sequence of integrable
and uniformly bounded functions, the limit of the integrals is equal to the integral of
the limit. Arzelà, who had studied with Ulisse Dini in Pisa, anticipated many of the
results in analysis that others would discover, but his work was not widely known.
We shall follow Osgood's proof that relies on continuity because it is simpler and
ties directly to our study of perfect, nowhere dense sets. In fact, the only place that
Osgood used continuity was in the proof of Proposition 4.3.

Osgood's proof is more difficult to read than it needs to be because he did not
have access to the Heine—Borel theorem. Borel had published that result a year
earlier, but it would be another five years before Heine—Borel would be recognized
as the powerful tool that it is. Because of this, Osgood relied on the nested interval
principle that, as we have seen, is equivalent to the Heine—Borel theorem, but is
less well suited for Osgood's needs. To simplify matters, we shall use Heine—Borel
at the critical points in this proof.
The first time we use Heine—Borel is to prove Osgood's lemma. In the statement
of this lemma he assumed that the set G is closed, bounded, and nowhere dense.
In fact, he did not need the assumption that G is nowhere dense, so we shall prove
a more general form of his lemma. Recall that $c_e$ denotes outer content (p. 44).

Lemma 4.4 (Osgood's Lemma). Let G be a closed, bounded set and let
$G_1, G_2, \ldots$ be subsets of G such that
$$G_1 \subseteq G_2 \subseteq G_3 \subseteq \cdots \quad \text{and} \quad \bigcup_{k=1}^{\infty} G_k = G.$$
It follows that
$$\lim_{k\to\infty} c_e(G_k) = c_e(G).$$

As Osgood points out, we really need G to be closed and bounded. For example,
if $G = \mathbb{Q} \cap [0, 1]$, we can let $G_k$ be the set of rational numbers between 0 and 1
with denominators less than or equal to k. In this case, $c_e(G_k) = 0$ for all k, but
$c_e(G) = 1$.

Proof. Given an arbitrary $\epsilon > 0$, we must show that there is a response $N$ so that
$n \ge N$ implies that $c_e(G_n) \le c_e(G) < c_e(G_n) + \epsilon$. The first inequality follows from
the fact that $G_n \subseteq G$. Since $c_e(G_n) \ge c_e(G_N)$, the second inequality will follow if
we can show that $c_e(G) < c_e(G_N) + \epsilon$.
Let $U_k$ be a finite union of disjoint, open intervals that contains $G_k$ and such
that the sum of the lengths of the intervals is strictly less than $c_e(G_k) + \epsilon/2^k$. The
collection $\{U_k\}$ is an open cover of $G$. By the Heine–Borel theorem, it has a
finite subcover. Let $N$ be the largest subscript in this finite subcover.
If $U_k$ is in the finite subcover of $G$, $k < N$, then we divide it into two disjoint
sets: $U_k^1 = U_k \cap U_N$ and $U_k^2 = U_k - U_N$, the part of $U_k$ in $U_N$ and the part not
contained in $U_N$. Since both $U_k$ and $U_N$ are finite unions of open intervals, $U_k^2$ is
a finite union of intervals. (They might be open, closed, or half open.) Since both
$U_N$ and $U_k$ contain $G_k$ (recall that $G_k \subseteq G_N$), the sum of the lengths of the intervals in $U_k^1$ is at least
$c_e(G_k)$, and therefore the sum of the lengths of the intervals in $U_k^2$ is strictly less
than $\epsilon/2^k$. Because $U_k^2$ is a finite union of intervals, if any of the intervals are not
open, we can replace them by slightly larger open intervals and still keep the sum
of the lengths strictly less than $\epsilon/2^k$. We denote this finite union of open intervals
containing $U_k^2$ by $V_k$.
Let $U$ denote the union of $U_N$ with all of the $V_k$, $k < N$, for which $U_k$ is
in the finite subcover of $G$. The set $U$ is still a finite union of open intervals that
contains $G$, and we have the desired bounds,

$$c_e(G) \le c_e(U) < c_e(G_N) + \frac{\epsilon}{2^N} + \sum_{k=1}^{N-1} \frac{\epsilon}{2^k} < c_e(G_N) + \epsilon.$$
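Every quantity compared in this proof is the total length of a finite union of intervals. The following Python sketch (our illustration; `union_length` and `difference_length` are hypothetical helper names) merges overlapping intervals to compute such lengths, and uses the identity $|A - B| = |A \cup B| - |B|$ to measure a piece like $U_k^2 = U_k - U_N$.

```python
def union_length(intervals):
    """Total length of a finite union of intervals given as (left, right) pairs."""
    total = 0
    current = None  # the merged interval currently being extended
    for a, b in sorted(intervals):
        if current is None or a > current[1]:
            if current is not None:
                total += current[1] - current[0]
            current = [a, b]
        else:
            current[1] = max(current[1], b)  # overlap: extend the merged interval
    if current is not None:
        total += current[1] - current[0]
    return total

def difference_length(A, B):
    """Length of (union of A) minus (union of B), via |A ∪ B| - |B|."""
    return union_length(list(A) + list(B)) - union_length(B)

U_N = [(0.0, 0.4), (0.6, 1.0)]
U_k = [(0.3, 0.7)]
print(union_length(U_N))            # length of U_N
print(difference_length(U_k, U_N))  # length of U_k - U_N, the gap [0.4, 0.6]
```

Endpoints do not affect lengths, which is why the proof can freely swap closed or half-open pieces for slightly larger open ones.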

The Arzelà–Osgood Theorem

Theorem 4.5 (Arzelà–Osgood Theorem). Let $(f_1, f_2, \ldots)$ be a sequence of
continuous, uniformly bounded functions on $[0, 1]$ that converges pointwise to 0. It
follows that

$$\lim_{n \to \infty} \int_0^1 f_n(x)\,dx = \int_0^1 \Big(\lim_{n \to \infty} f_n(x)\Big)\,dx = 0.$$

As a consequence of this theorem, given any series of continuous functions that
converges pointwise to a continuous function, if the partial sums stay within a
uniformly bounded distance of the value of the series, then we may integrate the
series by integrating each summand. In the proof that follows we shall use Darboux
sums. Osgood did not use them in his proof, but their use clarifies Osgood's
argument.

Proof. We need to show that for any $\alpha > 0$, we can find an $N$ such that $n \ge N$
implies that

$$\int_0^1 |f_n(x)|\,dx < \alpha.$$

To do this, we shall separate the points in $\Gamma_{\alpha/2}$ from the points that are not in $\Gamma_{\alpha/2}$. The
proof presented here is a modified version of Osgood's proof, recast so as to take
maximum advantage of the Heine–Borel theorem.
From the definition of outer content, we can find a finite union of open intervals
that contains $\Gamma_{\alpha/2}$ and for which the sum of the lengths of the intervals is as close
as I wish to the outer content of $\Gamma_{\alpha/2}$. We shall call the union of these open intervals
$U$. The complement of $U$ is a finite union of closed intervals, some of which might
be single points. We shall use $C$ to denote the intersection of this complement with
$[0, 1]$, still a finite union of closed intervals. The theorem now breaks into two
parts, limiting the size of the integral over $U$ and limiting the size of the integral
over $C$. First, we need to specify our choice of $U$. For each $\xi \in \Gamma_{\alpha/2}$, we know that
$\lim_{n \to \infty} f_n(\xi) = 0$, so we can find an $N(\xi)$ so that $n \ge N(\xi)$ implies that $|f_n(\xi)| < \alpha/2$.
Define $G_i$ to be the set of $\xi \in \Gamma_{\alpha/2}$ for which $n \ge i$ implies that $|f_n(\xi)| < \alpha/2$. We
see that

$$G_1 \subseteq G_2 \subseteq G_3 \subseteq \cdots \quad\text{and}\quad \bigcup_{i=1}^{\infty} G_i = \Gamma_{\alpha/2}.$$

Let $B$ be a uniform bound on the $|f_n|$,

$$|f_n(x)| \le B \quad\text{for all } n \ge 1 \text{ and all } x \in [0, 1].$$

By Lemma 4.4, which applies because $\Gamma_{\alpha/2}$ is a closed subset of $[0, 1]$, we can find an integer $K$
and a finite union of open intervals, $U$, so that $U$ contains $\Gamma_{\alpha/2}$ and the sum of the lengths of the
intervals in $U$ is less than $c_e(G_K) + \alpha/(2B)$. We define $C$ to be the complement of $U$ in $[0, 1]$. The set $C$ is a
finite union of disjoint closed intervals, and therefore it also is closed and bounded.
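We first bound the integral over $C$. For each $x$ in $C$, $x$ is not in $\Gamma_{\alpha/2}$, so there
is a neighborhood, say $N_{\delta(x)}(x)$, and an integer, say $N(x)$, so that if $y \in N_{\delta(x)}(x)$ and
$n \ge N(x)$, then $|f_n(y)| < \alpha/2$. The set of intervals $N_{\delta(x)}(x)$ taken over all $x \in C$ is an
open cover of $C$. By the Heine–Borel theorem, we can find a finite number of these
neighborhoods whose union contains $C$,

$$N_{\delta_1}(x_1) \cup N_{\delta_2}(x_2) \cup \cdots \cup N_{\delta_v}(x_v) \supseteq C.$$

Let $A = \max\{N(x_1), N(x_2), \ldots, N(x_v)\}$. For all $x$ in $C$, if $n \ge A$, then $|f_n(x)| < \alpha/2$, and
therefore for any $n \ge A$, the integral over $C$ of $|f_n|$ is bounded by

$$\int_C |f_n(x)|\,dx \le \frac{\alpha}{2}\, c_e(C). \qquad (4.10)$$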

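We now consider the integral over $U$ of any $|f_n|$ with $n \ge K$. We know that the
sum of the lengths of the intervals in $U$ is less than $c_e(G_K) + \alpha/(2B)$. If we take
any partition $P$ of $U$, the sum of the lengths of the intervals that contain points in
$G_K$ must be at least $c_e(G_K)$. The infimum of the values of $|f_n|$ on these intervals
is strictly less than $\alpha/2$. On all other intervals in this partition, whose total length is
therefore less than $\alpha/(2B)$, the value of $|f_n|$ is still bounded by $B$. Taking $B \ge \alpha/2$,
as we may, this implies that we have the following bound on the lower
Darboux sum for this partition:

$$s(P; |f_n|) < \frac{\alpha}{2}\, c_e(G_K) + B \cdot \frac{\alpha}{2B}.$$

Since this inequality holds for all lower Darboux sums, and we know that the
function $|f_n|$ is Riemann integrable, this also provides an upper limit for the Riemann
integral of $|f_n|$ over $U$:

$$\int_U |f_n(x)|\,dx \le \frac{\alpha}{2}\, c_e(G_K) + \frac{\alpha}{2}. \qquad (4.11)$$

Combining equations (4.10) and (4.11) and using the fact that $c_e(C) + c_e(G_K) \le
c_e([0, 1]) = 1$, we see that if $n \ge \max\{A, K\}$, then

$$\int_0^1 |f_n(x)|\,dx \le \int_C |f_n(x)|\,dx + \int_U |f_n(x)|\,dx \le \frac{\alpha}{2}\big(c_e(C) + c_e(G_K)\big) + \frac{\alpha}{2} \le \alpha. \qquad (4.12)$$

Since $\alpha > 0$ was arbitrary, running the same argument with $\alpha/2$ in place of $\alpha$ gives the strict
inequality we wanted. $\blacksquare$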
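To see what the hypotheses buy us, consider a family of "tent" functions (our own example, not Osgood's): $f_n$ is 0 outside $(0, 2/n)$ and rises linearly to a peak at $x = 1/n$, so it converges pointwise to 0 on $[0, 1]$ without converging uniformly. The Python sketch below approximates the integrals with a midpoint rule (a convenience; these functions are easy to integrate by hand). With peak height 1 the sequence is uniformly bounded and the integrals are $1/n \to 0$, as the theorem predicts; with peak height $n$ the uniform bound fails and every integral equals 1.

```python
def tent(n, height):
    """Continuous tent: 0 at x = 0, peak `height` at x = 1/n, 0 for x >= 2/n."""
    def f(x):
        if 0 <= x <= 1 / n:
            return height * n * x
        if 1 / n < x < 2 / n:
            return height * (2 - n * x)
        return 0.0
    return f

def midpoint_integral(f, a=0.0, b=1.0, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

for n in (10, 100, 1000):
    bounded = midpoint_integral(tent(n, 1))    # about 1/n
    unbounded = midpoint_integral(tent(n, n))  # about 1
    print(n, round(bounded, 6), round(unbounded, 6))
```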

Exercises
4.3.1. Prove that for any $k \ge 1$,

urn =0.
fl

4.3.2. Evaluate (k —
dx, k> 1, and then show that
0O 1

dx) =
(f

— (k

4.3.3. In the proof of Proposition 4.3, where does the assumption that $\Gamma_\alpha$ is dense
in an open interval, that is, the negation of the conclusion, actually get used?

4.3.4. Show that if $G$ is any set with finite outer content,

$$G_1 \subseteq G_2 \subseteq G_3 \subseteq \cdots, \quad\text{and}\quad \bigcup_{k} G_k = G,$$

then

$$\lim_{k \to \infty} c_e(G_k) \le c_e(G).$$

4.3.5. Show that if $F$ is any open set,

$$F_1 \supseteq F_2 \supseteq F_3 \supseteq \cdots, \quad \bigcap_{k} F_k = F,$$

and $F_1$ has finite outer content, then

$$\lim_{k \to \infty} c_e(F_k) = c_e(F).$$



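4.3.6. Give an example of a sequence of sets $F_1 \supseteq F_2 \supseteq F_3 \supseteq \cdots$, where $F_1$ is
bounded and for which

$$\lim_{k \to \infty} c_e(F_k) \ne c_e\Big(\bigcap_{k} F_k\Big).$$

4.3.7. Show that if $F_1 \supseteq F_2 \supseteq F_3 \supseteq \cdots$, where $F_1$ is bounded, then

$$\lim_{k \to \infty} c_e(F_k) \ge c_e\Big(\bigcap_{k} F_k\Big).$$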

4.3.8. Show that $x$ is not a $\Gamma$-point if and only if for each $\alpha > 0$, there is a
neighborhood of $x$, $N_\delta(x)$, and an $N$ such that for all $y \in N_\delta(x)$
and all $n \ge N$, $|f_n(y)| < \alpha$.
4.3.9. Consider the sequence of functions defined on $[-1, 1]$ by

$$f_n(x) = \begin{cases} 0, & x = 0 \text{ or } |x| > 2/n,\\ \dfrac{1}{n}\sin(\pi/x), & 0 < |x| < 1/n,\\ \sin(\pi/x), & 1/n \le |x| \le 2/n. \end{cases}$$

Show that this sequence converges to 0 for all $x \in (-1, 1)$. Show that this convergence
is not uniform in any $\epsilon$-neighborhood of 0. For what values of $\alpha > 0$ is 0 in
$\Gamma_\alpha$?
4.3.10. Consider the sequence of functions $g_n$, defined on $(-1, 1)$ by $g_n(0) =
0$ and, for $x \ne 0$,

gn(x)=
1 ri 1\
I(Ixl——Im(m—1)I
112 1 1
lfl?2.
m—1L\ mJ J m rn—i
Show that this sequence converges to 0 for all $x \in [-1, 1]$. Show that this convergence
is not uniform in any $\epsilon$-neighborhood of 0. Show that 0 is an accumulation
point of $\Gamma$-points, but it is not a $\Gamma$-point. This demonstrates that while each set $\Gamma_\alpha$
is closed, the union of these sets need not be closed.

4.4 The Baire Category Theorem


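Osgood had shown that if a sequence of continuous functions converges to 0 and
is uniformly bounded, then the integral of the limit is the limit of the integrals.
This implies that if $(f_n)$ is a sequence of continuous functions converging to a
continuous function $f$ and if $|f_n - f|$ is uniformly bounded, then $(f_n - f)$ is a uniformly
bounded sequence of continuous functions converging to 0, and so

$$\lim_{n \to \infty} \int_a^b \big(f_n(x) - f(x)\big)\,dx = 0 \quad\Longrightarrow\quad \lim_{n \to \infty} \int_a^b f_n(x)\,dx = \int_a^b f(x)\,dx.$$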

Not every uniformly bounded sequence of continuous functions converges to a
continuous function. The partial sums of Fourier series provide some of the best examples. But for Fourier
series, it is true that the integral of the limit equals the limit of the integrals. What
more, beyond the restriction that $|f_n - f|$ is uniformly bounded, do we need before
we can conclude that term-by-term integration is legitimate?
Leopold Kronecker (1823–1891) obtained his doctorate at Berlin University in
1845, working on a problem in number theory under the direction of Dirichlet. A
wealthy man involved in his family's banking business, he would not hold an
academic appointment until 1883 when he was appointed chair at Berlin University,
but he was active in research and began lecturing at Berlin University in 1862,
following his election to the Berlin Academy. He was suspicious of the
direction in which Heine, Cantor, and others were leading the study of analysis and
tried to convince them not to publish the key papers we have described.
Nevertheless, he did make one important contribution to our story in a paper published in
1879.

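Theorem 4.6 (Kronecker's Theorem). Let $(f_n)$ be a sequence of integrable
functions on $[a, b]$ that converge to the integrable function $f$. If $|f - f_n|$ is bounded
on $[a, b]$, uniformly in $n$, and if, for all $\alpha > 0$,

$$\lim_{n \to \infty} c_e\big(E(n, \alpha)\big) = 0, \qquad (4.13)$$

where

$$E(n, \alpha) = \{x \in [a, b] : |f_n(x) - f(x)| \ge \alpha\},$$

then

$$\int_a^b f(x)\,dx = \lim_{n \to \infty} \int_a^b f_n(x)\,dx. \qquad (4.14)$$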

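Proof. Let $B$ be a bound on $|f_n - f|$, valid for all $n$. Given $\epsilon > 0$, we need to find an $N$ so that
$n \ge N$ implies that

$$\left| \int_a^b f_n(x)\,dx - \int_a^b f(x)\,dx \right| < \epsilon.$$

Choose $\alpha < \epsilon/(2(b - a))$, an $N$ so that $n \ge N$ implies that

$$c_e\big(E(n, \alpha)\big) < \epsilon/(3B),$$

and, for each such $n$, a finite union of open intervals, $U$, that contains $E(n, \alpha)$ and whose outer
content is less than $\epsilon/(2B)$. Since $|f_n - f| < \alpha$ on $[a, b] - U$, we then have that

$$\left| \int_a^b f_n(x)\,dx - \int_a^b f(x)\,dx \right| \le \int_{U \cap [a,b]} |f_n(x) - f(x)|\,dx + \int_{[a,b]-U} |f_n(x) - f(x)|\,dx \le B\, c_e(U) + \alpha\,(b - a) < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon. \qquad \blacksquare$$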
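The hypotheses of Kronecker's theorem can be checked by hand for $f_n(x) = x^n$ on $[0, 1]$ (our example), whose limit $f$ is 0 on $[0, 1)$ and 1 at $x = 1$. Here $|f_n - f| \le 1$, and for $0 < \alpha \le 1$ we get $E(n, \alpha) = [\alpha^{1/n}, 1)$, so $c_e(E(n, \alpha)) = 1 - \alpha^{1/n} \to 0$. The short Python sketch below (helper names hypothetical) tabulates this alongside $\int_0^1 x^n\,dx = 1/(n+1)$, which tends to $\int_0^1 f(x)\,dx = 0$ as (4.14) requires.

```python
def content_E(n, alpha):
    """c_e(E(n, alpha)) for f_n(x) = x**n on [0, 1], using E(n, alpha) = [alpha**(1/n), 1)."""
    if alpha > 1:
        return 0.0  # |f_n - f| <= 1, so E(n, alpha) is empty
    return 1.0 - alpha ** (1.0 / n)

def integral_fn(n):
    """Exact value of the integral of x**n over [0, 1]."""
    return 1.0 / (n + 1)

for n in (1, 10, 100, 1000):
    print(n, round(content_E(n, 0.1), 4), round(integral_fn(n), 4))
```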

In 1885, Cesare Arzelà proved that if $(f_n)$ is a sequence of integrable functions
that converges to an integrable function $f$ and if $|f_n - f|$ is uniformly bounded, then
$\lim_{n \to \infty} c_e(E(n, \alpha)) = 0$ (equation (4.13)). This raised the question whether one
could have a uniformly bounded sequence of integrable functions for which
$\lim_{n \to \infty} f_n$ is not integrable.

The first explicit example of such a sequence was given by René Baire in his
doctoral dissertation of 1899 (Example 4.5 on p. 100).
René-Louis Baire (1874–1932) entered the École Normale Supérieure as an
undergraduate in 1892 and earned his doctorate there in 1899. He went on to teach
at the University of Montpellier in 1902 and Dijon in 1905. Baire suffered from both
physical and psychological disorders that became progressively debilitating. By
1914, they completely prevented him from teaching or continuing his mathematics.
He spent his last years in bitter solitude.
Baire's dissertation had a profound effect on Henri Lebesgue and the further
development of integration. Baire's thesis, Sur les fonctions de variables réelles
(On functions of real variables), clarified the intimate connection between the
structure of the real numbers and properties of functions. In the process, he made it
very clear that outer content is a fundamentally flawed way of measuring the size
of a set. We begin with the central result of his thesis.

Theorem 4.7 (Baire Category Theorem). An open interval cannot be expressed
as the countable union of nowhere dense sets.

We have seen that nowhere dense sets can be quite large in the sense that their
outer content can be as close as desired to the length of the interval in which they
lie. Baire realized that not even a countable union of them could fill that interval.
This result is called the Baire category theorem because of the following definition, due to Baire.

Definition: Category
A set is of first category if it is a countable union of nowhere dense sets. A set
that is not of first category is said to be of second category.

Thus, the Baire category theorem is more succinctly put as: "Every open interval
is of second category."

Proof. We lose no generality if we assume that our interval is $(0, 1)$. Let $(S_n)$ be
a sequence of nowhere dense subsets of $(0, 1)$. We must show that there is at least
one $x \in (0, 1)$ that is not in the union of the $S_n$.
Since $S_1$ is nowhere dense, we can find an open subinterval of $(0, 1)$ that contains
no points of $S_1$. If necessary, we come in slightly from each endpoint to find a closed
interval $[a_1, b_1] \subset (0, 1)$, $a_1 < b_1$, that contains no points of $S_1$. Since $S_2$ is nowhere
dense, we can find a subinterval $[a_2, b_2] \subset (a_1, b_1)$, $a_2 < b_2$, that contains no points
of $S_2$. In general, once we have defined $[a_{n-1}, b_{n-1}]$, $a_{n-1} < b_{n-1}$, we choose a
subinterval $[a_n, b_n] \subset (a_{n-1}, b_{n-1})$, $a_n < b_n$, that contains no points of $S_n$. By the
nested interval principle, the intersection $\bigcap_{n=1}^{\infty} [a_n, b_n]$ contains at least one point,
and this point is not in any of the $S_n$. $\blacksquare$

This is not a hard proof. Baire's genius lay in recognizing how important this
simple observation can be.
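The proof is constructive enough to run. In the Python sketch below (our illustration; the names, and the rule of choosing between two fifths of the current interval, are our own choices), each $S_n$ is the single rational $r_n$ from an enumeration of $\mathbb{Q} \cap (0, 1)$. After $N$ steps the closed interval $[a_N, b_N]$ contains none of $r_1, \ldots, r_N$; only the full infinite intersection produces a point that misses every rational, which is the content of Cantor's theorem mentioned below.

```python
from fractions import Fraction
from itertools import islice

def rationals_in_unit_interval():
    """Enumerate the rationals in (0, 1) without repetition: 1/2, 1/3, 2/3, 1/4, 3/4, ..."""
    q = 2
    while True:
        for p in range(1, q):
            r = Fraction(p, q)
            if r.denominator == q:  # skip fractions that reduce to an earlier one
                yield r
        q += 1

def nested_intervals(points):
    """Yield closed intervals [a_n, b_n], each inside the open interior of the
    previous one, with [a_n, b_n] missing the n-th point."""
    a, b = Fraction(0), Fraction(1)
    for r in points:
        L = b - a
        left = (a + L / 5, a + 2 * L / 5)
        right = (a + 3 * L / 5, a + 4 * L / 5)
        # left and right are disjoint closed subintervals of (a, b); r lies in at most one.
        a, b = right if left[0] <= r <= left[1] else left
        yield a, b

first = list(islice(rationals_in_unit_interval(), 200))
intervals = list(nested_intervals(first))
a, b = intervals[-1]
x = (a + b) / 2
print(x in first)  # False: x lies in [a_200, b_200], which avoids r_1, ..., r_200
```

The point found after finitely many steps is itself rational; it simply appears later in the enumeration. This is why the proof needs the nested interval principle rather than any finite stage.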

Applications of Baire's Theorem


Notice that any countable set is of first category, because any set with a single element is
nowhere dense. We immediately get Cantor's theorem that the set of real numbers in
$(0, 1)$ is not countable. Another easy consequence is given by the next corollary.

Corollary 4.8 (Complement of First Category). The complement of any
first-category set in $\mathbb{R}$ is dense in $\mathbb{R}$. In fact, the complement of any first-category set
has an uncountable intersection with every open interval.

Proof. Let $S$ be of first category. Any subset of a first-category set is again of first
category (Exercise 4.4.7), so the intersection of $S$ with any open interval is of first
category, and by the Baire category theorem it cannot be the whole interval. It follows
that every open interval contains a point of $S^c$, and thus $S^c$ is
dense in $\mathbb{R}$. It is left for Exercise 4.4.8 to show that the intersection of $S^c$ with any
open interval cannot be countable.

Now we get to the heart of what interested Baire, the characterization of
discontinuous functions. Recall Hankel's distinction (p. 45) between totally discontinuous
functions, such as Dirichlet's function, Example 1.1, which is discontinuous at
every point, and pointwise discontinuous functions, such as Riemann's function,
Example 2.1, which is discontinuous at every rational point with even denominator
but is still continuous at all other points.
It will be convenient to follow Baire and consider the continuous functions as a
subset of the pointwise discontinuous functions. A continuous function is simply
a pointwise discontinuous function for which the set of points of discontinuity has
shrunk all the way down to the empty set.

Corollary 4.9 (Characterization of Pointwise Discontinuous). A function on
$[a, b]$ is pointwise discontinuous if and only if the set of points at which it is
discontinuous is of first category.

Proof. One direction is easy and follows from Corollary 4.8. If the set of
discontinuities is of first category, then its complement, the set of points at which the
function is continuous, is dense.
For the other direction, we assume that $f$ is pointwise discontinuous and show
that this implies that the set of discontinuities is of first category. We begin by recalling
(Proposition 2.4) that a function $f$ is continuous at $c$ if and only if the oscillation
of $f$ at $c$, $\omega(f; c)$, is zero. Let

$$P_k = \{x \in [a, b] : \omega(f; x) \ge 1/k\}.$$

The set of points at which $f$ is discontinuous is the countable union $\bigcup_{k=1}^{\infty} P_k$. If
we can show that each $P_k$ is nowhere dense, then we are done. We need to show
that every interval in $[a, b]$ contains a subinterval with no points of $P_k$.
Pick any interval $(\alpha, \beta) \subseteq [a, b]$. Since $f$ is pointwise discontinuous, we can
find a point of continuity, say $c$, in $(\alpha, \beta)$. Continuity implies that we can control
the change in $f$ by staying close enough to $c$, so we should be able to find a
neighborhood of $c$ in which every point has oscillation strictly less than $1/k$. We
now show how to find this neighborhood.
First find a response $\delta$ so that $(c - \delta, c + \delta) \subseteq (\alpha, \beta)$ and

$$|x - c| < \delta \implies |f(x) - f(c)| < \frac{1}{4(k + 1)}.$$

Now consider the interval $(c - \delta/2, c + \delta/2)$. If $x$ is in this interval and $|x - y| <
\delta/2$, then $|y - c| < \delta$ and

$$|f(x) - f(y)| \le |f(x) - f(c)| + |f(c) - f(y)| < \frac{1}{4(k + 1)} + \frac{1}{4(k + 1)} = \frac{1}{2(k + 1)}.$$

This implies that the oscillation of $f$ at $x$ must be less than or equal to $1/(k + 1) <
1/k$, so $(c - \delta/2, c + \delta/2)$ contains no points of $P_k$. Since $(\alpha, \beta)$ was arbitrary, we have
shown that $P_k$ is nowhere dense, and therefore the set of points of
discontinuity is of first category. $\blacksquare$
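The sets $P_k$ can be explored numerically, with a caveat: sampling can only estimate the oscillation, since $\omega(f; c)$ is a limit over shrinking neighborhoods and a finite grid can miss wild behavior. The Python sketch below (our own; the names and the choice of $\delta$ are assumptions) estimates $\omega(f; c)$ by the spread of $f$ over a symmetric grid around $c$ and uses it to test membership in $P_k$ for the step function $\lfloor x \rfloor$, whose only discontinuities among the sample points are at 1 and 2.

```python
import math

def oscillation_estimate(f, c, delta=1e-6, samples=1001):
    """Estimate omega(f; c) by max - min of f on an evenly spaced grid in [c - delta, c + delta]."""
    xs = [c - delta + 2 * delta * i / (samples - 1) for i in range(samples)]
    values = [f(x) for x in xs]
    return max(values) - min(values)

def P_k_estimate(f, candidates, k):
    """Candidate points whose estimated oscillation is at least 1/k."""
    return [c for c in candidates if oscillation_estimate(f, c) >= 1 / k]

candidates = [j / 4 for j in range(1, 12)]  # 0.25, 0.5, ..., 2.75
print(P_k_estimate(math.floor, candidates, k=2))  # [1.0, 2.0]
```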

The function $g$ of Exercise 1.1.15 (p. 15) is an example of a function that is
discontinuous at every rational number but continuous at every irrational. Volterra
showed that we cannot have a function that is continuous at the rational numbers
and discontinuous at the irrationals. In fact, he showed that we cannot have two
pointwise discontinuous functions for which the points of continuity of one are the
points of discontinuity of the other, and vice versa. This result is an easy corollary
of Baire's result. We can even strengthen it.

Corollary 4.10 (Volterra's Theorem Strengthened). Let $f_1, f_2, f_3, \ldots$ be any
countable collection of pointwise discontinuous functions on $(a, b)$. There are
uncountably many points in $(a, b)$ at which all of these functions are continuous.

Proof. By Corollary 4.9, the set of points of discontinuity for each function is
a set of first category. A countable union of sets of first category is a countable
union of countable unions of nowhere dense sets, so it is also of first category. By
Corollary 4.8, there are uncountably many points of $(a, b)$ not in this union. $\blacksquare$

Baire's Big Theorem


One of the points of this work was to be able to say something meaningful about
limits of continuous functions. How discontinuous can a Fourier series be? Baire's
result is impressively strong. We first need to explain what is meant by continuity
relative to a set; see the definition given below.
For example, Dirichlet's characteristic function of the rationals, $\chi_{\mathbb{Q}}$, is
continuous relative to the set of irrationals. It is also continuous relative to the set of
rationals. It is not continuous relative to any set in which both the rationals and
irrationals are dense.

Theorem 4.11 (Limit of Cont Fcns Ptwise Discont). If a function $f$ is
the limit of continuous functions, then it is pointwise discontinuous. In fact, if we
restrict $f$ to any closed set, $S$, then the points of continuity of $f$ relative to $S$ form
a dense subset of $S$.

Recall that continuous functions are considered to be a special case of pointwise
discontinuous functions. The limit of continuous functions can be continuous. It
cannot be as discontinuous as Dirichlet's function, which is discontinuous at every
point. It cannot even be as discontinuous as the characteristic function of the set
of points of the Cantor ternary set, SVC(3), that are not endpoints of the deleted
intervals (points of SVC(3) that are not of the form an integer divided by a power
of 3). If we take this characteristic function and restrict it to the Cantor ternary set,
a closed set, we still do not have any points of continuity relative to SVC(3).

Definition: Continuity relative to a set
The function $f$ is continuous at $c \in S$, relative to the set $S$, if given any $\epsilon > 0$ there
is a response $\delta > 0$ so that $x \in S$ and $|x - c| < \delta$ implies that $|f(x) - f(c)| < \epsilon$.

Definition: Class
Continuous functions constitute class 0. Pointwise discontinuous functions that
are not continuous constitute class 1. Inductively, if $f$ is the limit of functions in
class $n$ but it is not in any class $k \le n$, then we say that $f$ is in class $n + 1$.
I shall discuss the implications of Theorem 4.11 before sketching a proof. With
this theorem in mind, Baire defined classes of functions. Essentially, the larger the
class number, the more discontinuous the function.
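Dirichlet's function is not in class 1, but it is the limit of class 1 functions. To see
this, choose positive integers $k$ and $n$. The function $(\cos(k!\,\pi x))^{2n}$ is continuous.
The limit as $n$ approaches $\infty$ is

$$f_k(x) = \lim_{n \to \infty} \big(\cos(k!\,\pi x)\big)^{2n} = \begin{cases} 1, & k!\,x \in \mathbb{Z}, \text{ i.e., } x \in \mathbb{Q} \text{ and its denominator divides } k!,\\ 0, & \text{otherwise}. \end{cases}$$

This function is not continuous, but it is the limit of continuous functions, so it is
in class 1. For rational $x = p/q$ in lowest terms, $f_k(x) = 1$ as soon as $k \ge q$; for irrational $x$,
$f_k(x) = 0$ for every $k$. We now take the limit

$$f(x) = \lim_{k \to \infty} f_k(x) = \begin{cases} 1, & x \in \mathbb{Q},\\ 0, & x \notin \mathbb{Q}. \end{cases}$$

This is Dirichlet's function, which we know is not in class 1. It must be in class 2.
Are there functions in class 3, 4, ..., up to any finite number? Are there functions so
discontinuous that they are not in any finite class? In 1905, Henri Lebesgue would
show that the answer to both questions is "yes."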
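Both limits can be checked directly. In the Python sketch below (our illustration; the function names are hypothetical), `f_kn` evaluates the continuous function $(\cos(k!\,\pi x))^{2n}$ in floating point, while `f_k` computes the limit exactly for rational inputs by testing whether $k!\,x$ is an integer. Floating point cannot represent irrational numbers, so the irrational case, where every $f_k$ is 0, is only recorded in a comment.

```python
from fractions import Fraction
from math import cos, factorial, pi

def f_kn(x, k, n):
    """The continuous function (cos(k! * pi * x))**(2n), in floating point."""
    return cos(factorial(k) * pi * x) ** (2 * n)

def f_k(x: Fraction, k: int) -> int:
    """Limit of f_kn as n -> infinity, for rational x: 1 iff k! * x is an integer.
    (For irrational x the limit is 0 for every k, since |cos(k! * pi * x)| < 1.)"""
    return 1 if (factorial(k) * x).denominator == 1 else 0

x = Fraction(3, 7)
print([f_k(x, k) for k in range(1, 10)])  # [0, 0, 0, 0, 0, 0, 1, 1, 1]
print(f_kn(0.5, 3, 200), f_kn(0.4, 3, 200))
```

The first printed list shows the second limit at work: once $k$ reaches the denominator 7, $f_k(3/7)$ stays at 1, so $\lim_k f_k$ is Dirichlet's function.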

Lebesgue's Proof of Theorem 4.11


The previous year, 1904, Lebesgue had published a greatly simplified proof of
Theorem 4.11. It is still more complicated than I want to pursue in this book, but
an outline of his proof is instructive, for it both demonstrates the role of the Baire
category theorem and illustrates a clever idea that is the hallmark of Lebesgue's
approach and that will dominate the next several chapters. Lebesgue partitioned
the range of the function.

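Let $f$ be the limit of continuous functions, $f_n$, on $[a, b]$. As before, we
let $P_k$ denote the set of points at which the oscillation of $f$ is greater than or
equal to $1/k$. If we can show that each $P_k$ is nowhere dense, then $f$ is pointwise
discontinuous. We take any open interval $(\alpha, \beta) \subseteq [a, b]$ and partition the entire
$y$-axis from $-\infty$ to $\infty$ using points $\cdots < m_{-1} < m_0 < m_1 < m_2 < \cdots$ for which
$m_{i+1} - m_i < 1/(2k)$. Consider the set

$$E_i = \{x \in (\alpha, \beta) : m_{i-1} < f(x) < m_{i+1}\}.$$

Notice that if $f(x) = m_i$, then $x \in E_i$. If $m_i < f(x) < m_{i+1}$, then $x \in E_{i+1} \cap E_i$.
We have that

$$(\alpha, \beta) = \bigcup_{i=-\infty}^{\infty} E_i,$$

and the oscillation of $f$ on $E_i$, $\sup_{x_1, x_2 \in E_i} |f(x_1) - f(x_2)|$, is at most $m_{i+1} - m_{i-1} < 1/k$.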
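The bookkeeping behind this partition of the range is easy to check on sample values. In the Python sketch below (ours; the names and the choice $m_i = i h$ with $h = 1/(2k+1)$ are assumptions, since any spacing below $1/(2k)$ works), `partition_indices` returns the indices $i$ with $m_{i-1} < y < m_{i+1}$, so a value $y$ lands in one set $E_i$ or in two consecutive ones, and `spreads_below` confirms that within each $E_i$ the values differ by less than $1/k$.

```python
import math
from fractions import Fraction

def partition_indices(y, h):
    """Indices i with m_{i-1} < y < m_{i+1}, for the grid m_i = i * h."""
    i = math.floor(y / h)  # m_i <= y < m_{i+1}
    if y == i * h:
        return {i}         # y = m_i belongs only to E_i
    return {i, i + 1}      # m_i < y < m_{i+1} belongs to E_i and E_{i+1}

def spreads_below(values, k):
    """Group sample values y = f(x) by the sets E_i and check each spread is < 1/k."""
    h = Fraction(1, 2 * k + 1)  # spacing strictly less than 1/(2k)
    groups = {}
    for y in values:
        for i in partition_indices(y, h):
            groups.setdefault(i, []).append(y)
    return all(max(g) - min(g) < Fraction(1, k) for g in groups.values())

samples = [Fraction(j, 97) for j in range(-200, 201)]
print(spreads_below(samples, k=3))  # True
```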


Lebesgue begins by using the fact that $f$ is the limit of continuous functions to
prove that each $E_i$ is a countable union of closed sets. In fact, he does more than
this. He proves that $f$ is a limit of continuous functions if and only if, for each
$k \in \mathbb{N}$, the domain can be represented as a countable union of closed sets so that
the oscillation of $f$ on each set is strictly less than $1/k$.
He next proves that given any set $E$ that is a countable union of closed sets,
we can construct a function for which the points of discontinuity are precisely the
points of $E$. Let $\phi_i$ be a function on $(\alpha, \beta)$ for which the points of discontinuity
are precisely the points in $E_i$. Could all of the functions $\phi_i$, $-\infty < i < \infty$, be
pointwise discontinuous? If they were, then by Corollary 4.10 there would be a
point in $(\alpha, \beta)$, call it $c$, where all of them are continuous. But $c \in E_j$ for some
$j$, and that means that $\phi_j$ is not continuous at $c$, a contradiction. At least one of the
$\phi_j$ must be totally discontinuous.
If $\phi_j$ is totally discontinuous on $(\alpha, \beta)$, then there is an open subinterval of $(\alpha, \beta)$
for which $\phi_j$ is discontinuous at every point of this subinterval. By the way we
defined $\phi_j$, the set $E_j$ contains an open subinterval of $(\alpha, \beta)$. From the definition
of $E_j$, the oscillation of $f$ is less than $1/k$ at every point in this subinterval, so the subinterval
contains no points of $P_k$. We have shown that $P_k$ is nowhere dense, and therefore $f$ is pointwise discontinuous.

Discontinuities of Derivatives
We conclude this section with a corollary of Theorem 4.11. It was Darboux who
first observed that if a derivative is discontinuous, then its discontinuities must
be like those in the derivative of Volterra's function. Even though $\lim_{x \to c} f'(x)$
does not exist at such a point $c$, $f'$ must still satisfy the intermediate value property.¹ Baire showed
that Volterra's example illustrates the worst possible case in terms of the size and
density of the set of discontinuities of $f'$.

Corollary 4.12 (Derivative Dense Set of Continuities). Every function that
is a derivative is of class 0 or 1.

Proof. Let $f$ be differentiable on $(a, b)$, and let $f'$ be its derivative. Since $f$ is
differentiable, it is continuous on $(a, b)$. For each $k \ge 1$, the function defined by

$$f_k(x) = \frac{f(x + 1/k) - f(x)}{1/k}$$

is also continuous on $(a, b - 1/k)$. Choose a positive integer $K$. Since $f$ is
differentiable, $\lim_{k \to \infty} f_k(x)$, taken over $k \ge K$, exists and equals $f'(x)$ for all $x \in (a, b - 1/K)$.
By Theorem 4.11, $f'$ is pointwise discontinuous on $(a, b - 1/K)$. By Corollary 4.9, its points of
discontinuity form a set of first category. Since this is true for every $K \ge 1$, the set
of points of discontinuity of $f'$ on

$$(a, b) = \bigcup_{K=1}^{\infty} (a, b - 1/K)$$

is also of first category. Therefore, $f'$ is pointwise discontinuous, which implies
that it is of class either 0 or 1. $\blacksquare$
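The difference quotients in this proof are concrete. In the Python sketch below we take $f(x) = x^2 \sin(1/x)$, $f(0) = 0$ (the function that reappears in Exercise 4.4.10; the helper names are hypothetical). Each $f_k$ is continuous, and $f_k(x) \to f'(x)$ both at $x = 0$, where $f_k(0) = \sin(k)/k \to 0 = f'(0)$, and away from 0. Yet $f'(x) = 2x\sin(1/x) - \cos(1/x)$ takes values near $-1$ arbitrarily close to 0, so the limit is discontinuous there, exactly the kind of discontinuity the corollary permits.

```python
import math

def f(x):
    """x**2 * sin(1/x), extended by f(0) = 0."""
    return x * x * math.sin(1 / x) if x != 0 else 0.0

def f_prime(x):
    """Derivative of f computed by hand; f'(0) = 0 follows from the definition."""
    return 2 * x * math.sin(1 / x) - math.cos(1 / x) if x != 0 else 0.0

def f_k(x, k):
    """The continuous difference quotient (f(x + 1/k) - f(x)) / (1/k) used in the proof."""
    return (f(x + 1 / k) - f(x)) * k

for k in (10, 1_000, 100_000):
    print(k, f_k(0.0, k), f_k(0.3, k) - f_prime(0.3))

# f' is discontinuous at 0: at x = 1/(2*pi*m) it is close to -1, while f'(0) = 0.
print([round(f_prime(1 / (2 * math.pi * m)), 6) for m in (10, 100, 1000)])
```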

Exercises
4.4.1. Consider the sequence of functions defined in Example 4.5 on p. 100.
Describe the sets

$$E(n, \alpha) = \{x \in [0, 1] : |f_n(x) - f(x)| \ge \alpha\}.$$

Find the value of $c_e(E(n, \alpha))$.
4.4.2. Give an example of a sequence of integrable functions on $[0, 1]$ that converge
to an integrable function and such that

$$\lim_{n \to \infty} c_e\big(E(n, \alpha)\big) = 0$$

for all $\alpha > 0$, but such that

$$\int_0^1 \lim_{n \to \infty} f_n(x)\,dx \ne \lim_{n \to \infty} \int_0^1 f_n(x)\,dx.$$

Thus, we really do need the hypothesis that $|f - f_n|$ is bounded.

¹ See Theorem 1.7.

4.4.3. Prove that any finite union of nowhere dense sets is nowhere dense.

4.4.4. Let $C_1$ denote the Cantor ternary set, $C_1 = \text{SVC}(3)$. Let $C_2$ be the subset of
$[0, 1]$ formed by putting a copy of $C_1$ inside every open interval in $[0, 1] - C_1$. Let
$C_3$ be the subset of $[0, 1]$ formed by putting a copy of $C_1$ inside every open interval
in $[0, 1] - (C_1 \cup C_2)$. In general, let $C_n$ be the subset of $[0, 1]$ formed by putting
a copy of $C_1$ inside every open interval in $[0, 1] - (C_1 \cup C_2 \cup \cdots \cup C_{n-1})$. Show
that $C = \bigcup_{n=1}^{\infty} C_n$ is of first category.

4.4.5. For the set C defined in Exercise 4.4.4, find a description of the elements of
C in terms of their representation in base 3.

4.4.6. Let $V_1$ denote Volterra's set, $V_1 = \text{SVC}(4)$. Exactly as in Exercise 4.4.4,
construct a sequence of nowhere dense sets $V_1, V_2, V_3, \ldots$ such that $V_n$ is the
subset of $[0, 1]$ formed by putting a copy of $V_1$ inside every open interval in
$[0, 1] - (V_1 \cup V_2 \cup \cdots \cup V_{n-1})$. Show that $V = \bigcup_{n=1}^{\infty} V_n$ is of first category and its
outer content is 1.

4.4.7. Prove that any subset of a first-category set is of first category.

4.4.8. Show that if $S$ is of first category and $I$ is an open interval, then $S^c \cap I$
cannot be a countable set.

4.4.9. Show that if $f$ is totally discontinuous on some subinterval of $(-\pi, \pi)$, then
it cannot be represented by a trigonometric series,

$$a_0 + \sum_{k=1}^{\infty} \big(a_k \cos(kx) + b_k \sin(kx)\big).$$

4.4.10. We have seen (Corollary 4.12) that any derivative is continuous on a dense
set of points. Can a derivative also be discontinuous on a dense set of points? To
see that the answer to this question is "yes," let $(r_1, r_2, \ldots)$ be an ordering of the
rational numbers in $[0, 1]$. Let

$$f(x) = x^2 \sin(1/x), \quad f(0) = 0,$$
and define

Show that f is differentiable at every point in [0, 1] and that its derivative is
discontinuous at each

4.4.11. Prove that any countable union of first-category sets is first category.

4.4.12. Define

$$f(x) = \begin{cases} q, & x = p/q \in \mathbb{Q}, \text{ where } q \ge 1 \text{ and } \gcd(p, q) = 1,\\ 0, & \text{otherwise}. \end{cases}$$

Prove that $f$ is of class 2.
4.4.13. Let $(r_n)$ be a sequence of real numbers chosen so that no two differ by
a rational number, $r_i - r_j \notin \mathbb{Q}$ if $i \ne j$. Define

Q=UQn
Define the characteristic function of $Q$, $\chi_Q(x) = 1$ if $x \in Q$, $\chi_Q(x) = 0$ if $x \notin Q$. Prove
that this function is at most of class 3.
5
The Development of Measure Theory

Through the 1880s and 1890s, the Riemann integral piled up a list of inconve-
niences, including the following:

1. It is defined only for bounded functions. While improper integrals had been
introduced to deal with unbounded functions, this fix appears ad hoc.
Furthermore, recourse to improper integrals can work only if the set of points with
unbounded oscillation has outer content zero.
2. It is possible to have an integrable function with positive oscillation on a
dense set of points, and therefore the integral is not differentiable at any of the
points in this dense set (Riemann's function, Example 2.1). This violates the
antidifferentiation part of the fundamental theorem of calculus on this dense
set.
3. It is possible to have a bounded derivative that cannot be integrated (Volterra's
function, Example 4.4). This violates the evaluation part of the fundamental
theorem of calculus.
4. The limit of a bounded sequence of integrable functions is not necessarily
integrable (Baire's sequence, Example 4.5).
5. The question of finding necessary and sufficient conditions under which
term-by-term integration is valid was turning out to be extremely difficult.

Despite these inconveniences, few mathematicians were dissatisfied with the
Riemann integral. One of the few was Weierstrass. In a letter written to Paul du
Bois-Reymond in 1885, he expressed his unhappiness with the need to consider the
values of the integrand at points where the oscillation is positive. Hankel had proven
that if a function is Riemann integrable, then it is at worst pointwise discontinuous.
That is to say, the points where the function is continuous must be dense. While it is
true that any Riemann integrable function is at worst pointwise discontinuous (see
Exercise 5.1.1), recall from Section 2.3 that Hankel went further and also asserted
that every pointwise discontinuous function is Riemann integrable. This is false
(see Exercise 5.1.2).
Weierstrass suggested modifying Riemann's definition so that in the Riemann
sum,

$$\sum_{j=1}^{n} f(t_j)\,(x_j - x_{j-1}),$$

the $t_j$ are restricted to be points of continuity. It appears that he
was hoping to expand the class of integrable functions so that, for example, the
derivative of Volterra's function would now be integrable.
As du Bois-Reymond pointed out in his response, this does not save us. Even
under Weierstrass's definition, the derivative of Volterra's function fails to be
integrable. Any interval that contains a point of the set we have called SVC(4) will
contain points of continuity at which this derivative equals 1, and points of continuity
at which it equals $-1$.
A year later, in his Berlin lectures, Weierstrass took a different approach. He
returned to the idea of integral as area. Given a nonnegative function $f$, $f(x) \ge 0$
for all $x \in [a, b]$, we consider the set of points in the plane

$$S_f = \{(x, y) : a \le x \le b,\ 0 \le y \le f(x)\}.$$

The integral $\int_a^b f(x)\,dx$ should be the area of $S_f$. We can extend this to any function.
Define

$$f^+(x) = \max\{f(x), 0\}, \qquad f^-(x) = \max\{-f(x), 0\}.$$

We can then define the integral as

$$\int_a^b f(x)\,dx = \operatorname{area}(S_{f^+}) - \operatorname{area}(S_{f^-}).$$
The only question is "what do we mean by the area of a set of points in the plane?"
We can extend the idea of outer content to the plane. The area of any rectangle
is its length times its width. In exact analogy with the definition of outer content
on the real number line, given any set $S$, we let $\mathcal{C}$ denote the set of all coverings
$C$ of $S$ by a finite number of rectangles, let $\operatorname{area}(C)$ be the sum of the areas of the
rectangles in $C$, and define

$$c_e(S) = \inf_{C \in \mathcal{C}} \operatorname{area}(C).$$

This gives us the Weierstrass integral,

$$(W)\int_a^b f(x)\,dx = c_e(S_{f^+}) - c_e(S_{f^-}).$$

Definition: Characteristic function
The characteristic function of a set $S$, $\chi_S$, is defined by $\chi_S(x) = 1$ if $x \in S$ and $\chi_S(x) = 0$ if $x \notin S$.

The Weierstrass integral has the advantage that every bounded function is
integrable. It yields the desired value for the integral of the derivative of Volterra's
function.
It does, however, have a noticeable drawback. Consider the characteristic
function of a set. If $S$ and $T$ are disjoint sets that are both dense in $[0, 1]$ (e.g., $S$ could be the
rationals and $T$ the irrationals), then the region under each of $\chi_S$ and $\chi_T$ is dense in the
unit square, so each has outer content 1, and

$$(W)\int_0^1 \big(\chi_S(x) + \chi_T(x)\big)\,dx = 1 \ne (W)\int_0^1 \chi_S(x)\,dx + (W)\int_0^1 \chi_T(x)\,dx = 2.$$

It is important that integration should be additive, $\int (f + g) = \int f + \int g$.
Weierstrass's integral would not be the solution, but it was pushing in the right direction.
The key to integration would come from a better understanding of area.

5.1 Peano, Jordan, and Borel


Giuseppe Peano (1858–1932) studied and then taught at the University of Turin
(Torino). He began teaching there in 1880. Peano is best known for his construction
in 1890 of a space-filling curve. This is a curve that passes through every point in the
two-dimensional region $0 \le x \le 1$, $0 \le y \le 1$. He is also known for his axioms,
published in 1889, that define the natural numbers in terms of sets, creating the
foundations for later work in logic.
Peano's work on area came early in his career. In 1883, he showed how to
use the upper and lower Darboux integrals that had been invented by Volterra to
provide simplified proofs of many results for Riemann integrals. In 1887, as a
further elaboration of the ideas in the 1883 paper, Peano published Applicazioni
geometriche del calcolo infinitesimale (Geometric applications of infinitesimal
calculus) in which he became one of the first to provide precise definitions of the
interior and the boundary of a set. He distinguished inner and outer content. The
inner content of a set, $c_i(S)$, is obtained by considering all finite unions of disjoint
intervals contained in $S$. The inner content is defined as the supremum, taken over
all such unions, of the sum of the lengths of the intervals in the union. Any set that
does not contain any open intervals has inner content zero.
Inner and outer content are easily extended to sets in $\mathbb{R}^2$ or higher dimensions.
Instead of working with intervals, we work with rectangles or rectangular blocks
whose areas or volumes are defined to be the product of the lengths in each
dimension. Peano defined a set as having area if and only if the inner content is
equal to the outer content, in which case we can denote this area as simply the
content of the set.

Definition: Content
Let $\mathcal{C}_S$ be the set of all finite coverings of the set $S \subseteq \mathbb{R}^n$ using $n$-dimensional
rectangular boxes, and let $\mathcal{I}_S$ be the set of all pairwise disjoint finite collections
of open $n$-dimensional rectangular boxes for which the union is contained in $S$.
The volume of a rectangular box is the product of the lengths of the sides, and the
volume of $C \in \mathcal{C}_S$ or $C \in \mathcal{I}_S$, denoted $\operatorname{vol}(C)$, is the sum of the volumes of the boxes.
The inner and outer content of a bounded set $S$ are defined, respectively, as

$$c_i(S) = \sup_{C \in \mathcal{I}_S} \operatorname{vol}(C), \qquad c_e(S) = \inf_{C \in \mathcal{C}_S} \operatorname{vol}(C).$$

If $c_i(S) = c_e(S)$, then we say that $S$ has content

$$c(S) = c_i(S) = c_e(S).$$
If S is not bounded, let $N_k(0)$ be the neighborhood of the origin with radius k and define

$$c_i(S) = \lim_{k \to \infty} c_i\big(S \cap N_k(0)\big), \qquad c_e(S) = \lim_{k \to \infty} c_e\big(S \cap N_k(0)\big).$$

Content corresponds to the usual concept of length in $\mathbb{R}$, area in $\mathbb{R}^2$, and volume in $\mathbb{R}^3$. Inner and outer content differ only for sets with a large boundary, such as the following (Proposition 5.1 below makes this precise). For example, $\mathbb{Q} \cap [0, 1]$ has inner content 0 and outer content 1. The set SVC(4) has inner content 0, because it contains no intervals, and outer content 1/2. Its complement in [0, 1], $[0, 1] - \mathrm{SVC}(4)$, has inner content 1/2 and outer content 1.
Peano recognized the relationship between inner and outer content given in the following proposition. The concept was both popularized and made rigorous by Camille Jordan in the first volume of Cours d'analyse, published in 1893. Recall that the boundary of S, denoted $\partial S$, consists of all points for which every neighborhood contains at least one point of S and at least one point not in S.
Proposition 5.1 (Inner versus Outer Content). Let S be a bounded set in $\mathbb{R}^n$. We have that

$$c_e(S) = c_i(S) + c_e(\partial S). \tag{5.2}$$

As a consequence, the set S has a well-defined area, called the content of the set, if and only if $c_e(\partial S) = 0$.

Proof. We shall prove this theorem for two-dimensional sets. The same idea works in any number of dimensions. We subdivide $\mathbb{R}^2$ into squares of side $2^{-m}$ and restrict our attention to those squares that have nonempty intersection with S. Let $S_{e,m}$ be the union of the squares that contain at least one point of S. This is a cover of S. As m increases, the area of this cover decreases and approaches the outer content of S,

$$\lim_{m \to \infty} \mathrm{area}(S_{e,m}) = c_e(S).$$

The squares in $S_{e,m}$ are of two types: those that contain boundary points of S and those that do not. Let $S_{\partial,m}$ be the union of the squares that contain boundary points, and $S_{i,m}$ the union of the squares that do not and, therefore, are completely contained within S,

$$S_{e,m} = S_{i,m} \cup S_{\partial,m}, \qquad \mathrm{area}(S_{e,m}) = \mathrm{area}(S_{i,m}) + \mathrm{area}(S_{\partial,m}).$$

As m increases, the area of $S_{i,m}$ increases and approaches the inner content of S,

$$\lim_{m \to \infty} \mathrm{area}(S_{i,m}) = c_i(S).$$

As m increases, the area of $S_{\partial,m}$ decreases and approaches the outer content of the boundary of S,

$$\lim_{m \to \infty} \mathrm{area}(S_{\partial,m}) = c_e(\partial S).$$

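Therefore,

$$\begin{aligned} c_e(S) &= \lim_{m \to \infty} \mathrm{area}(S_{e,m}) \\ &= \lim_{m \to \infty} \mathrm{area}(S_{i,m}) + \lim_{m \to \infty} \mathrm{area}(S_{\partial,m}) \\ &= c_i(S) + c_e(\partial S). \end{aligned}$$

□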
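The decomposition used in this proof can be checked numerically on one concrete set. The Python sketch below is our own illustration, not part of the text: it takes S to be the closed unit disk (an arbitrary choice), tiles the plane with closed squares of side $2^{-m}$, and counts the squares that meet S (giving $S_{e,m}$), those that lie inside S without touching the circle (giving $S_{i,m}$), and those that meet the circle $\partial S$ (giving $S_{\partial,m}$). The function name `grid_areas` is hypothetical.

```python
import math


def grid_areas(m: int) -> tuple[float, float, float]:
    """Return (area(S_e), area(S_i), area(S_boundary)) for the closed unit disk S,
    using closed grid squares of side 2**-m as in the proof of Proposition 5.1."""
    h = 2.0 ** -m
    k = 2 ** m
    n_e = n_i = n_b = 0
    # indices -k-1 .. k cover [-1, 1]^2 plus one extra ring of squares, so that
    # squares touching the disk from outside (e.g. at (1, 0)) are also counted
    for i in range(-k - 1, k + 1):
        for j in range(-k - 1, k + 1):
            x0, x1 = i * h, (i + 1) * h
            y0, y1 = j * h, (j + 1) * h
            # nearest point of the square to the origin: clamp 0 into each side
            d_min = math.hypot(min(max(0.0, x0), x1), min(max(0.0, y0), y1))
            # farthest point of the square from the origin is one of its corners
            d_max = max(math.hypot(x, y) for x in (x0, x1) for y in (y0, y1))
            if d_min <= 1.0:        # the square meets S
                n_e += 1
                if d_max < 1.0:     # no point of the circle: square lies inside S
                    n_i += 1
                else:               # the square contains a boundary point of S
                    n_b += 1
    cell = h * h
    return n_e * cell, n_i * cell, n_b * cell
```

As m grows, the first two areas close in on π from above and below, while the boundary area shrinks roughly in proportion to $2^{-m}$, as the proof predicts for a set whose boundary has outer content 0.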

Peano observed that if f is a nonnegative function defined on [a, b] and if $S_f$ is the set of points under f as defined in equation (5.1), then

$$\underline{\int_a^b} f(x)\,dx = c_i(S_f), \qquad \overline{\int_a^b} f(x)\,dx = c_e(S_f). \tag{5.3}$$

It follows that f is Riemann integrable if and only if $S_f$ has area in the sense that its inner and outer contents are equal.

Jordan Measure
Camille Jordan (1838–1922) earned his doctorate in 1861 but worked as an engineer until 1876, when he took a position as professor of analysis at the École Polytechnique in Paris. His interests ranged widely, and he is known today for his contributions to group theory, topology, and number theory as well as analysis. His three-volume analysis textbook, Cours d'analyse de l'École Polytechnique (Course in analysis for the Polytechnical Institute), published 1893–1896, established Peano's content as the basis for calculus.

Definition: Jordan measure
A set S is Jordan measurable if and only if the inner and outer contents are the same, $c_i(S) = c_e(S)$. The Jordan measure of S is its content, given by either the inner or outer content, $c(S) = c_i(S) = c_e(S)$.
The problem that forced Jordan to focus on content was the issue of multidimensional integrals. In particular, he had to explain how to integrate real-valued functions of two real variables for which the domain might be a very irregular region. As long as the inner and outer contents of the domain were equal, one could make sense of the integral. A critical piece of this is the fact that if we have a finite collection of pairwise disjoint Jordan measurable sets, then the content of the union is the sum of the contents.

Proposition 5.2 (Finite Additivity of Content). Let $S_1, S_2, \ldots, S_n$ be a finite collection of pairwise disjoint Jordan measurable sets. The content of their union is equal to the sum of their contents,

$$c(S_1 \cup S_2 \cup \cdots \cup S_n) = c(S_1) + c(S_2) + \cdots + c(S_n). \tag{5.4}$$

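Proof. From the definition of inner and outer content, we have that

$$c_i(S_1) + \cdots + c_i(S_n) \le c_i(S_1 \cup \cdots \cup S_n) \le c_e(S_1 \cup \cdots \cup S_n) \le c_e(S_1) + \cdots + c_e(S_n).$$

The first inequality holds because disjoint boxes chosen inside the disjoint sets $S_k$ together form a disjoint collection inside the union; the last holds because finite covers of the individual $S_k$ together cover the union. Since each set is Jordan measurable, $c_i(S_k) = c_e(S_k) = c(S_k)$, so the two ends of this chain are equal, every inequality is an equality, and the union has content $c(S_1) + \cdots + c(S_n)$. □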

Jordan's Cours d'analyse was very influential. Henri Lebesgue studied it while an undergraduate at the École Normale Supérieure. Lebesgue later recounted how it had prepared the way for his own approach to integration. But using content to define area had one major flaw: for too many important sets, inner and outer content are not equal. Volterra's example of a nonintegrable derivative relies on the fact that the inner and outer content of SVC(4) are different. This suggested to several people that a more all-encompassing definition of area might get around the difficulties of Volterra's function. As we saw with Weierstrass's integral, outer content alone would not do it. Every bounded set has a well-defined outer content, but the Weierstrass integral is not additive because outer content is not additive. It is possible to have two disjoint sets, $S \cap T = \emptyset$, for which

$$c_e(S \cup T) < c_e(S) + c_e(T).$$

For example, the rationals and the irrationals in [0, 1] are disjoint and each has outer content 1, while their union, [0, 1], also has outer content 1, not 2.

Borel Measure
Émile Borel (1871–1956) was only 22 when he became chair of mathematics at the University of Lille. He returned to Paris in 1896 to teach at the École Normale Supérieure, where Lebesgue was then an undergraduate. In 1909 he became a professor at the Sorbonne. We have already encountered some of his work in the Heine–Borel theorem. We now turn to his study of area published in 1898 in Leçons sur la théorie des fonctions (Lectures on the theory of functions).
In Section 3.3, we discussed Borel's paper of 1895 and how his study of the convergence of certain infinite series led to the discovery of the Heine–Borel theorem. It did more than that. Borel was interested in the size of his set of points on which the series must converge,

$$S_c = \{x \in [0, 1] : |x - a_n| \ge c A_n^{1/2} \text{ for all } n \ge 1\},$$

where $\{a_n\}$ is dense in [0, 1], $A_n > 0$ for all $n \ge 1$, and $\sum_{n=1}^{\infty} A_n^{1/2} = A < \infty$. If $c < 1/(2A)$, then the complement of $S_c$ cannot be a Jordan measurable set. Its inner content is clearly less than or equal to $2c \sum_{n=1}^{\infty} A_n^{1/2} = 2cA < 1$. Since it contains $\{a_n\}$, a dense set of points, its outer content is 1. But there is a very natural definition of the size of $S_c$. Its complement in [0, 1] can be expressed as a countable union of pairwise disjoint intervals. The size of the complement should be the sum of the lengths of those intervals. The size of $S_c$ should be 1 minus the size of the complement.
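The "natural size" Borel had in mind can be approximated by computer for one concrete choice of data. In the Python sketch below (our own illustration; all function names are hypothetical), the $a_n$ enumerate the rationals in [0, 1] by increasing denominator, $A_n = 4^{-n}$ so that $A_n^{1/2} = 2^{-n}$ and $A = 1$, and $c = 1/4 < 1/(2A)$. The code merges the first N excluded intervals $(a_n - c\,2^{-n}, a_n + c\,2^{-n}) \cap [0, 1]$ into pairwise disjoint intervals and adds their lengths, using exact rational arithmetic.

```python
from fractions import Fraction


def rationals_in_unit_interval(count):
    """Return the first `count` distinct rationals p/q in [0, 1], listed by
    increasing denominator q (one enumeration among many; our choice)."""
    seen, out, q = set(), [], 1
    while len(out) < count:
        for p in range(q + 1):
            r = Fraction(p, q)
            if r not in seen:
                seen.add(r)
                out.append(r)
                if len(out) == count:
                    break
        q += 1
    return out


def merged_length(intervals):
    """Total length of a union of intervals, computed by sorting them and
    merging overlaps into pairwise disjoint intervals."""
    total = Fraction(0)
    cur_lo = cur_hi = None
    for lo, hi in sorted(intervals):
        if cur_hi is None or lo > cur_hi:
            # start a new disjoint piece, closing off the previous one
            if cur_hi is not None:
                total += cur_hi - cur_lo
            cur_lo, cur_hi = lo, hi
        else:
            # overlapping (or touching): extend the current piece
            cur_hi = max(cur_hi, hi)
    if cur_hi is not None:
        total += cur_hi - cur_lo
    return total


def excluded_length(count, c=Fraction(1, 4)):
    """Length of the union of the first `count` excluded intervals, clipped to
    [0, 1], with A_n = 4**-n, so that c * A_n**(1/2) = c / 2**n."""
    intervals = []
    for n, a in enumerate(rationals_in_unit_interval(count), start=1):
        r = c / 2**n
        intervals.append((max(a - r, Fraction(0)), min(a + r, Fraction(1))))
    return merged_length(intervals)
```

Every partial union has length below $2cA = 1/2$, so the size Borel would assign to the complement is at most 1/2 even though the complement is dense in [0, 1] and therefore has outer content 1.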
A similar set for which the same argument should apply is Volterra's set, SVC(4).
Its complement in [0, 1] is a union of pairwise disjoint intervals whose combined
length is 1/2. We know that the outer content of SVC(4) is 1/2, but since this
set contains no open intervals, its inner content is 0. It is not Jordan measurable.
Borel believed that measure should be redefined so that SVC(4) has a well-defined
measure equal to 1/2.
The problem with Jordan measure is that it is only finitely additive. Borel realized that he needed a measure that is countably additive. This means that if we are given an infinite sequence of pairwise disjoint sets, $(S_1, S_2, S_3, \ldots)$, $S_i \cap S_j = \emptyset$ for $i \ne j$, then we want the measure of the union to equal the sum of the measures,

$$m\Big(\bigcup_{k=1}^{\infty} S_k\Big) = \sum_{k=1}^{\infty} m(S_k).$$

As an example, such a countably additive measure would imply that the set of
rational numbers in [0, 1], a countable union of single points, must have measure 0.
Borel begins with three assumptions that uniquely define Borel measure:

1. The measure of a bounded interval is the length of that interval (whether open, closed, or half open).
2. The measure of a countable union of pairwise disjoint measurable sets is the sum of their measures.
3. If R and S are measurable sets, $R \subseteq S$, then so is $S - R$. Furthermore, $m(S - R) = m(S) - m(R)$.

As an example, each set SVC(n), $n \ge 3$, is the complement in [0, 1] of a countable union of open intervals. It is measurable in Borel's sense, and its measure is $(n - 3)/(n - 2)$. In fact, any open set is a countable union of open intervals, so open sets and closed sets are measurable in Borel's sense.
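The value $(n - 3)/(n - 2)$ comes from summing a geometric series. Assuming the construction of SVC(n) used earlier in the book, in which stage k removes $2^{k-1}$ open intervals of length $n^{-k}$, the total removed length is $\sum_{k \ge 1} 2^{k-1}/n^k = 1/(n - 2)$. The short Python sketch below (function names are ours) checks this with exact fractions.

```python
from fractions import Fraction


def svc_removed_length(n: int, stages: int) -> Fraction:
    """Length removed from [0, 1] in the first `stages` stages of SVC(n),
    assuming stage k removes 2**(k - 1) open intervals of length 1/n**k."""
    return sum(Fraction(2 ** (k - 1), n ** k) for k in range(1, stages + 1))


def svc_measure(n: int) -> Fraction:
    """Closed form: the removed lengths sum to 1/(n - 2), so the measure of
    SVC(n) is 1 - 1/(n - 2) = (n - 3)/(n - 2)."""
    return 1 - Fraction(1, n - 2)
```

Only n = 3 gives measure 0 (the Cantor ternary set); for $n \ge 4$ the set has positive measure even though it contains no intervals, which is exactly the situation Jordan's content cannot handle.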

Borel Sets
Borel came very close to our modern concept of measure, but in all of his discussions of the application of his measure, he restricted himself to sets that could be constructed from intervals using countable unions and complements. Today, we call the sets that can be built in this way Borel sets.
It is left for you to show (see Exercises 5.1.13–5.1.15) that under this definition, all open intervals and all half-open intervals are Borel sets, that any countable intersection of Borel sets is also a Borel set, and that if A, B are Borel sets with $A \supseteq B$, then $A - B$ is a Borel set.

Definition: σ-algebra
A σ-algebra, $\mathcal{A}$, is a collection of sets with the property that
1. $\emptyset \in \mathcal{A}$,
2. if $\{A_1, A_2, \ldots\}$ is any countable (finite or infinite) collection of sets in $\mathcal{A}$, then their union is also in $\mathcal{A}$ (note that we do not need them to be pairwise disjoint),
3. if $A \in \mathcal{A}$, then $A^c \in \mathcal{A}$.

Definition: Borel sets
The collection, $\mathcal{B}$, of Borel sets in $\mathbb{R}$ is the smallest σ-algebra in $\mathbb{R}$ that contains all closed intervals.
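The phrase "smallest σ-algebra that contains" can be made concrete in a finite toy setting (our own illustration, not part of the text). On a finite set every countable union is already a finite union, so the smallest σ-algebra containing a family of subsets can be found by repeatedly adding complements and pairwise unions until nothing new appears. The Borel sets themselves cannot be produced by a finite loop like this; the sketch only shows the closure idea, and the function name `generated_sigma_algebra` is hypothetical.

```python
from itertools import combinations


def generated_sigma_algebra(universe, generators):
    """Smallest family of subsets of a FINITE universe that contains every
    generator and is closed under complement and union."""
    universe = frozenset(universe)
    family = {frozenset(), universe} | {frozenset(g) for g in generators}
    changed = True
    while changed:
        changed = False
        current = list(family)
        # close under complement
        for a in current:
            comp = universe - a
            if comp not in family:
                family.add(comp)
                changed = True
        # close under unions (on a finite set, pairwise unions suffice)
        for a, b in combinations(current, 2):
            union = a | b
            if union not in family:
                family.add(union)
                changed = True
    return family
```

For example, `generated_sigma_algebra({1, 2, 3, 4}, [{1}, {1, 2}])` returns the 8 sets that are unions of the atoms {1}, {2}, {3, 4}; the set {3} is not included, because nothing in the generators separates 3 from 4.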

In the next section, we shall see that any Borel set is measurable in Borel's sense. This will take some work. If we have a countable union of pairwise disjoint sets for which the Borel measure is defined, then the Borel measure of the union is the sum of the Borel measures. But we need to define the Borel measure of any countable union of sets with well-defined Borel measure, whether or not those sets are pairwise disjoint.

The Limitations of Borel Measure


Borel never tried to apply his idea of measure to the problem of integration. In fact,
he went so far as to state,
It will be fruitful to compare the definitions that we have given with the more general definitions
that M. Jordan gives in his Cours d'analyse. The problem we investigate here is, moreover,
totally different from the one resolved by M. Jordan.¹

Borel measure actually applies to a much smaller collection of sets than Jordan measure. This may seem a strange comment in view of our examples of sets that are measurable in Borel's sense but not in Jordan's. But, as we shall see, the cardinality of $\mathcal{B}$, the collection of all Borel sets, is only $\mathfrak{c}$, the cardinality of [0, 1]. On the other hand, any subset of SVC(3) (the Cantor ternary set, which has content 0) will also have Jordan measure zero, since its outer content cannot exceed that of SVC(3). As we saw in Section 4.1, SVC(3) has cardinality $\mathfrak{c}$. The collection of its subsets has cardinality $2^{\mathfrak{c}}$. By Cantor's theorem (Theorem 3.9), this is a larger cardinality than $\mathfrak{c}$.

Proposition 5.3 (Borel Does Not Contain Jordan). The cardinality of $\mathcal{B}$ is $\mathfrak{c}$, the cardinality of $\mathbb{R}$. The cardinality of the collection of Jordan measurable sets in $\mathbb{R}$ is $2^{\mathfrak{c}}$. Therefore, there exist Jordan measurable sets that are not Borel measurable.

In view of the fact that there are so many more Jordan measurable sets than Borel
sets, one might expect that it is fairly easy to give an explicit example of a set that
is Jordan measurable and not Borel. In fact, finding such explicit sets is difficult
(but see Exercises 5.4.8—5.4.11 for an example).
Because it would take us far afield, a discussion of the proof that the cardinality of $\mathcal{B}$ is $\mathfrak{c}$ has been put in Appendix A.1, but there is a simple heuristic argument why this might be the case. The cardinality of the set of all intervals cannot exceed the cardinality of the set of pairs of real numbers, and that is $\mathfrak{c}$. Taking complements only doubles the number of elements in the set, so the cardinality is still $\mathfrak{c}$. Taking differences, unions of countable collections, and intersections of countable collections of the Borel sets we have already constructed still does not get us beyond cardinality $\mathfrak{c}^{\aleph_0} = \mathfrak{c}$. We proceed by induction.

¹ Borel (1950, p. 46n).
The problem with this approach is that the induction needs to go beyond all
finite positive integers. One has to be much more careful than this when arguing
by induction with transfinite numbers. But it gives an indication of why we might
believe that the cardinality is only c.
Borel knew that it would make sense to assign measure 0 to the subsets of the
Cantor ternary set. In general, if $A \subseteq S \subseteq B$, where A and B are Borel sets with
the same measure, then Borel recognized that we should assign that value as the
measure of S. But he never worked out the implications of this insight. And he
never recognized that this would provide the key to the problems of integration.
That revelation would come to the young graduate student, Henri Lebesgue.

Exercises
5.1.1. Prove that if f is Riemann integrable, then every open interval contains at
least one point at which f is continuous.
5.1.2. Give an example of a pointwise discontinuous function that is not Riemann
integrable.
5.1.3. Show that a bounded set S is Jordan measurable if and only if, given $\epsilon > 0$, we can find two finite unions of intervals, $E_1$ and $E_2$, such that

$$E_1 \subseteq S \subseteq E_2 \quad \text{and} \quad c(E_2) - c(E_1) < \epsilon.$$


5.1.4. Prove Proposition 5.2 with the weaker hypothesis that instead of having $S_1, S_2, \ldots, S_n$ be pairwise disjoint, we only insist that their interiors are pairwise disjoint.
5.1.5. Show that if S and T are Jordan measurable, then so are $S \cup T$ and $S \cap T$, and
$$c(S \cup T) + c(S \cap T) = c(S) + c(T).$$
5.1.6. Show that if S is Jordan measurable, then so are the interior of S and the closure of S, and all three sets have the same content.
5.1.7. Is the set

m+ m,nEN
I n+1
Jordan measurable? Justify your answer.
5.1.8. Let $(r_k)_{k \ge 1}$ be a sequence listing the rationals in [0, 1]. Let $I_k$ be the open interval of length $1/2^{k+1}$ centered at $r_k$. Show that $\bigcup_{k \ge 1} I_k$ is not Jordan measurable.
5.1.9. Give an example of a bounded open set that is not Jordan measurable.

5.1.10. Give an example of a bounded closed set that is not Jordan measurable.
5.1.11. Give an example of a Borel measurable set that is not Jordan measurable.
5.1.12. Find the smallest σ-algebra that contains all closed intervals with rational endpoints.

5.1.13. Show that under the definition of Borel sets, every open interval and every
half-open interval is a Borel set.
5.1.14. Show that under the definition of Borel sets, every countable intersection (finite or infinite) of Borel sets is again a Borel set.
5.1.15. Show that under the definition of Borel sets, if A and B are Borel sets, $A \supseteq B$, then $A - B$ is a Borel set.

5.1.16. Find an example of a Borel set that is neither the countable union of
intervals (open, closed, half open, or even a single point), nor is it the complement
of such a union.

5.1.17. Define f for x 0 by

f(x) = x > 0; f(0) = 0.

This function is not bounded and so not Riemann integrable on [0, 1], but its
improper integral does exist. Find the value of the improper integral $\int_0^1 f(x)\,dx$.
Consider the function g defined on [0, 1] by g(x) = 0 for $x \in \mathrm{SVC}(4)$, and, if (a, b)
is one of the disjoint open intervals whose union equals [0, 1] — SVC(4), then g
on (a, b) is given by

(x)—
a <x
— (a+b)/2<x<b.
Show that even the improper Riemann integral of g over [0, 1] does not exist. Find
the integral of g over each of the open intervals of length $4^{-k}$ on which g is nonzero. Sum these values (recalling that there are $2^{k-1}$ intervals of length $4^{-k}$) to find a value that would be reasonable to assign to this improper integral.
5.1.18. Consider the set C defined in Exercise 4.4.4. Find the Borel measure of C.
Justify your answer.
5.1.19. Consider the set V defined in Exercise 4.4.6. Find the Borel measure of V.
Justify your answer.

5.1.20. Let f be any real-valued function defined on R. Show that the set of points
of continuity of f is a Borel set.
5.1.21. Let $(f_n)$ be a sequence of continuous functions defined on $\mathbb{R}$. Show that the set of points at which this sequence converges is a Borel set.
5.1.22. A real number x is simply normal to base 10 if each digit appears with the same asymptotic frequency. Specifically, let N(x, d, n) be the number of occurrences of the digit d among the first n digits in the decimal expansion of x; then $\lim_{n \to \infty} N(x, d, n)/n = 1/10$. A real number x is normal to base 10 if each block of k digits appears with the same asymptotic frequency. Let N(x, B, n) be the number of occurrences of the block B (including overlapping occurrences) among the first n digits of the decimal expansion of x; then $\lim_{n \to \infty} N(x, B, n)/n = 10^{-k}$.

1. Show that for each digit d, $\{x : \lim_{n \to \infty} N(x, d, n)/n = 1/10\}$ is a Borel set.
2. Show that the set of real numbers that are simply normal to base 10 is a Borel set.
3. Show that for any base $b \ge 2$, the set of real numbers that are simply normal to base b is a Borel set.
4. Show that for any base $b \ge 2$, the set of real numbers that are normal to base b is a Borel set.
5. Show that the set of numbers that are normal to every base $b \ge 2$ is a Borel set.

5.2 Lebesgue Measure


Lebesgue was born in 1875. He entered the École Normale Supérieure in Paris in 1894. It was there that he studied Jordan's Cours d'analyse and met Émile Borel. He graduated in 1897, worked in the library for two years, and then took a high school teaching position in Nancy, all while working on his doctoral dissertation, in which he undertook nothing less than a revolutionary approach to the problem of integration. Over the period 1899–1901, he published the results of his studies. The dissertation was formally accepted at the Sorbonne in 1902, and in 1902–1903 he gave the prestigious Cours Peccot at the Collège de France, in which he explained his results in Leçons sur l'intégration et la recherche des fonctions primitives (Lectures on integration and the search for antiderivatives).
After receiving his doctorate, Lebesgue held professorships in Rennes and Poitiers and then at the Sorbonne beginning in 1910. In 1921 he was appointed professor at the Collège de France. He was elected to membership in the Académie des Sciences in 1922. Lebesgue was a prolific mathematician, making important contributions in topology, set theory, and partial differential equations. In his later years, he focused on pedagogy and the history of mathematics. Lebesgue died in 1941.
Figure 5.1. Lebesgue's horizontal partition.

His Lectures on integration, still in print, is an excellent introduction to the subject of Lebesgue measure and integration. The first third explains the Riemann integral, discusses its strengths and flaws, and goes over much of the history we have presented in the earlier chapters of this book. Lebesgue ends this section with Jordan measure and the theorem that for nonnegative functions the Riemann integral is simply the Jordan measure of $S_f$, the set of points bounded above by the graph of f and below by the x-axis.
In Chapter 7, he reveals his new idea. To define $\int_a^b f(x)\,dx$, he does not follow Newton, Leibniz, Cauchy, and Riemann, who partitioned the x-axis between a and b. Instead, he lets $l$ be the infimum of the values of f, $L$ the supremum, and then partitions the y-axis between $l$ and $L$. That is, instead of cutting the area using a finite number of vertical cuts, he takes a finite number of horizontal cuts (see Figure 5.1).
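Consider the partition of the y-axis: $l = l_0 < l_1 < l_2 < \cdots < l_n = L$. For each horizontal strip, $l_i \le y < l_{i+1}$ (with the top strip also including $y = L$), we consider all points in the domain for which f(x) lies in this strip:

$$S_i = \{x \in [a, b] : l_i \le f(x) < l_{i+1}\}.$$

Let $\chi_{S_i}$ be the characteristic function of this set. It is 1 if x is in the set, 0 if it is not. We can squeeze our function between two sums of characteristic functions:

$$\sum_{i=0}^{n-1} l_i\, \chi_{S_i}(x) \le f(x) \le \sum_{i=0}^{n-1} l_{i+1}\, \chi_{S_i}(x) \quad \text{for all } x \in [a, b].$$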

We are working with finite summations, so the integral of each of these sums should be the sum of the integrals, and integration should preserve the inequalities:

$$\sum_{i=0}^{n-1} l_i \int_a^b \chi_{S_i}(x)\,dx \le \int_a^b f(x)\,dx \le \sum_{i=0}^{n-1} l_{i+1} \int_a^b \chi_{S_i}(x)\,dx.$$

The integral of the characteristic function of a set should be the measure of that set. Our sets $S_i$ are, by the way they have been defined, pairwise disjoint. If they are always Jordan measurable, then the sum of the measures of the $S_i$ is the measure of their union, which is $b - a$. If the lengths of the intervals on the y-axis, $l_{i+1} - l_i$, are all less than $\epsilon$, then the upper and lower limits on the value for our integral differ by at most

$$\sum_{i=0}^{n-1} (l_{i+1} - l_i) \int_a^b \chi_{S_i}(x)\,dx \le \epsilon \sum_{i=0}^{n-1} \int_a^b \chi_{S_i}(x)\,dx = \epsilon(b - a).$$

We can always force the upper and lower bounds as close as we wish by taking the
partition of the y-axis sufficiently fine.
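Here is a minimal Python sketch of this horizontal slicing for one function whose sets $S_i$ can be measured by hand. We take $f(x) = x^2$ on [0, 1] (our choice), for which $S_i = [\sqrt{l_i}, \sqrt{l_{i+1}})$ has length $\sqrt{l_{i+1}} - \sqrt{l_i}$. The helper `square_preimage` supplies that length, and `lebesgue_sums` (a hypothetical name) forms the lower and upper sums for an evenly spaced partition of the y-axis. The sketch does not compute measures of general sets, which is precisely the hard part Lebesgue had to solve.

```python
import math
from typing import Callable


def lebesgue_sums(y_min: float, y_max: float, n: int,
                  preimage_measure: Callable[[float, float], float]) -> tuple[float, float]:
    """Lower and upper sums for the horizontal partition y_min = l_0 < ... < l_n = y_max
    (l and L in the text), split into n strips of equal height.

    preimage_measure(lo, hi) must return the measure of S_i = {x : lo <= f(x) < hi};
    in this sketch it is supplied by hand for one specific f."""
    levels = [y_min + (y_max - y_min) * i / n for i in range(n + 1)]
    lower = upper = 0.0
    for i in range(n):
        m_i = preimage_measure(levels[i], levels[i + 1])
        lower += levels[i] * m_i       # l_i * m(S_i)
        upper += levels[i + 1] * m_i   # l_{i+1} * m(S_i)
    return lower, upper


def square_preimage(lo: float, hi: float) -> float:
    """For f(x) = x**2 on [0, 1], S_i = [sqrt(lo), sqrt(hi)), of length sqrt(hi) - sqrt(lo)."""
    return math.sqrt(hi) - math.sqrt(lo)


low, up = lebesgue_sums(0.0, 1.0, 1000, square_preimage)
print(low, up)
```

With n strips of height 1/n on [0, 1], the two sums differ by $\frac{1}{n}\sum_i m(S_i) = \frac{1}{n}$, up to rounding, and they bracket the exact value 1/3.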
If we restrict ourselves to Jordan measure, then we are right back at the Riemann
integral. But Lebesgue saw that he could use Borel's idea of measure.
Consider V', the derivative of Volterra's function. Our inability to integrate this
function comes from the fact that in any neighborhood of a point in SVC(4), this
derivative takes on both the values +1 and −1. If we slice our function horizontally
and look at where the function lies between, say, 0.7 and 0.8, this is a fairly nice
set. It is a countable union of disjoint intervals (see Figure 5.2). The set of values of
x for which V'(x) lies between 0.7 and 0.8 is not measurable in Jordan's sense, but
it is a Borel set. If we use Borel measure, then the derivative of Volterra's function
is integrable. The fundamental theorem of calculus (evaluation part) appears to be
salvageable.

Figure 5.2. Lebesgue's partition of V′.

Improving on Borel
Lebesgue realized that he could not simply substitute Borel measure for Jordan measure. As we have seen, that severely reduces the number of sets that are measurable. What Lebesgue needed was a concept of measure that would encompass all Jordan measurable sets and all Borel measurable sets. He laid out three conditions that his measure would have to possess:

1. It is translation invariant: adding the same number to each element of a measurable set does not change its measure.
2. The measure of a countable union of pairwise disjoint measurable sets is equal to the sum of the measures of the individual sets.
3. The measure of the interval (0, 1) is 1.

Unlike Borel, Lebesgue sought to find the most general possible collection of sets
for which such a measure could be defined. Lebesgue measure is built on the
concept of the countable cover.
Any countable union of intervals can be expressed as a countable union of pairwise disjoint intervals. If C is a countable union of pairwise disjoint intervals, then
Lebesgue's three conditions imply that the Lebesgue measure of C, denoted m(C),
must equal the sum of the lengths of the intervals of C. We use this as our starting
point for the general definition of Lebesgue measure.
Definition: Countable cover
A countable cover of S is a countable collection of intervals whose union contains S.

Definition: Lebesgue outer measure
Given a bounded set $S \subseteq [a, b]$, let $\mathcal{C}$ be the collection of all countable covers of S. The Lebesgue outer measure of S, $m_e(S)$, is the infimum over $C \in \mathcal{C}$ of the sum of the lengths of the pairwise disjoint open intervals that constitute C,

$$m_e(S) = \inf_{C \in \mathcal{C}} m(C).$$

Lebesgue outer measure satisfies Lebesgue's three conditions, but it still misses one critical property that is present in Jordan and Borel measure: if S and T are Jordan or, respectively, Borel measurable sets, then $S - T$ is also respectively Jordan or Borel measurable, and the respective measure of $S - T$ is the measure of S minus the measure of $S \cap T$. In the case of Borel measure, this is built into the definition, since measurability is preserved under complementation. For Jordan measure, the justification is a bit more subtle.
Jordan measure is defined in terms of the measure of bounded sets. A bounded set is Jordan measurable when its inner and outer content are the same, and the inner content is defined by the supremum of the sum of the lengths of a finite number of disjoint intervals whose union is contained within the set. The complement of any finite union of disjoint intervals is a finite union of disjoint intervals (allowing an interval to be a single point). If $S \subseteq [a, b]$, then each C, a finite union of disjoint intervals contained in S, corresponds to exactly one K, a finite union of disjoint intervals that contains $[a, b] - S$, and vice versa. It follows that

$$\begin{aligned} c_i(S) &= \sup_C c(C) \\ &= \sup_K \big((b - a) - c(K)\big) \\ &= (b - a) - \inf_K c(K) \\ &= (b - a) - c_e([a, b] - S). \end{aligned}$$

The condition needed for Jordan measurability, $c_i(S) = c_e(S)$, is precisely the condition we need in order to guarantee that $c_e([a, b] - S) = (b - a) - c_e(S)$.
Lebesgue outer measure is more complicated because the complement of a countable union of disjoint intervals is no longer necessarily a countable union of disjoint intervals; witness the SVC sets. It might seem that the natural definition of Lebesgue inner measure would be to take the supremum over all countable unions of disjoint intervals contained in S of the sum of the lengths of these intervals. That turns out to be a useless notion because it just recreates the inner content (see Exercise 5.2.11). The right definition of Lebesgue inner measure parallels the alternate definition of inner content, the one that shows how to compute the content of a complement.

Definition: Lebesgue inner measure
Given a bounded set $S \subseteq [a, b]$, the Lebesgue inner measure of S, $m_i(S)$, is $b - a$ minus the Lebesgue outer measure of the complement of S in [a, b]:

$$m_i(S) = (b - a) - m_e([a, b] - S).$$

Definition: Lebesgue measure
Given a bounded set $S \subseteq [a, b]$, if $m_i(S) = m_e(S)$, then we say that S is Lebesgue measurable, and its measure is defined to be this common value:

$$m(S) = m_e(S) = m_i(S).$$
It may seem that we have stopped short of the full complementarity that we need: if S and T are measurable, then so is $S - T$, and $m(S - T) = m(S) - m(S \cap T)$. As we shall see in the next section, this more general statement of complementarity is a consequence of the statement that for measurable $S \subseteq [a, b]$,

$$m([a, b] - S) = (b - a) - m(S). \tag{5.5}$$

From now on, the terms outer measure, inner measure, and measure will refer to Lebesgue outer measure, Lebesgue inner measure, and Lebesgue measure. The fact that this definition satisfies the first and third conditions for our measure is easy to check and is left for the exercises. The second condition, countable additivity, will be proven in the next section.
As Lebesgue observed, outer measure is always subadditive.

Theorem 5.4 (Subadditivity of Outer Measure). Lebesgue outer measure is subadditive. That is to say, for any countable collection of sets, $(S_1, S_2, \ldots)$,

$$m_e\Big(\bigcup_{i=1}^{\infty} S_i\Big) \le \sum_{i=1}^{\infty} m_e(S_i). \tag{5.6}$$

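Proof. Choose any $\epsilon > 0$ and for each i choose a countable open cover $(I_{i1}, I_{i2}, \ldots)$ of $S_i$ for which

$$\sum_{j=1}^{\infty} |I_{ij}| < m_e(S_i) + \frac{\epsilon}{2^i},$$

where $|I|$ denotes the length of the interval I. We create a countable open cover of $\bigcup_i S_i$ by taking all of the open intervals in all of the chosen open covers. This is a countable collection of countable collections, so it is still countable. The outer measure of $\bigcup_i S_i$ is bounded by the sum of the lengths of all of these intervals. This sum is bounded by

$$\sum_{i=1}^{\infty} \Big(m_e(S_i) + \frac{\epsilon}{2^i}\Big) = \Big(\sum_{i=1}^{\infty} m_e(S_i)\Big) + \epsilon.$$

Since this upper bound holds for every $\epsilon > 0$, it follows that

$$m_e\Big(\bigcup_{i=1}^{\infty} S_i\Big) \le \sum_{i=1}^{\infty} m_e(S_i). \qquad \square$$

It follows that, for any $S \subseteq [a, b]$,

$$m_e(S) + m_e([a, b] - S) \ge m_e([a, b]) = b - a. \tag{5.7}$$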



This can be restated as

$$m_e(S) \ge (b - a) - m_e([a, b] - S) = m_i(S), \tag{5.8}$$

with exact equality if and only if S is measurable. Note that $S \subseteq [a, b]$ is measurable if and only if $[a, b] - S$ is measurable (see Exercise 5.2.5).
As the next theorem shows, subadditivity allows us to collect many examples of
measurable sets.

Theorem 5.5 (Examples of Measurable Sets). If the set S is bounded, then any
of the following conditions implies that S is measurable:
1. The outer measure of S is zero,
2. S is countable, or
3. S is an interval (open, closed, or half open).

Proof.
1. Because of inequality (5.7), we only need to prove that
$$m_e(S) + m_e([a, b] - S) \le b - a.$$
The outer measure of $[a, b] - S$ is less than or equal to $b - a$, and therefore
$$m_e(S) + m_e([a, b] - S) \le 0 + (b - a).$$
2. Any countable set has outer measure zero because, given any $\epsilon > 0$, we can put the nth point inside an open interval of length less than $\epsilon/2^n$ and so obtain a countable open cover for which the sum of the lengths of the intervals is less than $\epsilon$. By part 1, the set is measurable.
3. If S is an interval, we can choose [a, b] to be the closure of S (see Exercise 5.2.4). The set $[a, b] - S$ consists of at most two points, so its outer measure is zero. By part 1, $[a, b] - S$ is measurable, and therefore so is S (see Exercise 5.2.5).

Alternate Definition of Lebesgue Measure


Lebesgue defined his outer measure in terms of countable covers, which pushed him into a somewhat awkward definition of the inner measure. It was later realized that there is a simpler formulation of these definitions. As we have shown, every open set is a countable union of pairwise disjoint open intervals. We shall now assume what will be proven in the next section, that any countable union of pairwise disjoint intervals is measurable, and its measure is the sum of the lengths of these intervals. It follows that every open set is measurable. Therefore, if S is any bounded set and U is any open set that contains S, then U is a countable cover of S, and U has a well-defined measure. Furthermore, given any $\epsilon > 0$ and any countable cover C of S, we can always expand the length of the ith interval by $\epsilon/2^i$ and turn C into an open set U for which $m(U) \le m(C) + \epsilon$. In other words, we lose nothing if we restrict our countable covers to consist of open sets.

Definition: Lebesgue measure
Given a bounded set S, the Lebesgue outer measure of S is the infimum of m(U) taken over all open sets U that contain S. The Lebesgue inner measure of S is the supremum of m(F) taken over all closed sets F contained in S. The set S is measurable if and only if the inner and outer measures are equal.
If $S \subseteq [a, b]$ and U is an open set that contains $[a, b] - S$, then $[a, b] - U = [a, b] \cap U^c$ is a closed set contained in S. If we define the measure of $F = [a, b] - U$ to be $(b - a) - m(U)$, then the inner measure of S, which is $b - a$ minus the outer measure of $[a, b] - S$, is the supremum over all closed sets F contained in S of the measure of F (see Exercise 5.2.16). We have established the equivalent definition of Lebesgue inner and outer measure given above.

Exercises
5.2.1. Prove that if A and B are measurable sets and $A \subseteq B$, then $m(A) \le m(B)$.
5.2.2. Let x be a real number, and let $x + S = \{x + s : s \in S\}$. Prove that $m_e(x + S) = m_e(S)$. Show that this implies that if S is measurable, then $m(x + S) = m(S)$.
5.2.3. Prove that (0, 1) is measurable and its measure is equal to 1.
5.2.4. Show that the definition of the inner measure does not depend on the choice of the interval [a, b]. Let $\alpha = \inf S$ and $\beta = \sup S$. Show that
$$m_e(S^c \cap [a, b]) = (b - a) - (\beta - \alpha) + m_e(S^c \cap [\alpha, \beta]).$$
5.2.5. Prove that if $S \subseteq [a, b]$, then S is measurable if and only if $S^c \cap [a, b]$ is measurable.
5.2.6. Prove that if $m_e(S) = 0$, then $m_e(S \cup T) = m_e(T)$.
5.2.7. Prove that for any bounded set S and any $\epsilon > 0$, we can always find an open set $U \supseteq S$ such that
$$m_e(U) < m_e(S) + \epsilon.$$
5.2.8. Prove that for any bounded set S, we can always find a set T that is a countable intersection of open sets (and, thus, a Borel set) for which $S \subseteq T$ and
$$m_e(S) = m_e(T).$$
5.2.9. Prove that for any bounded set S and any $\epsilon > 0$, we can always find a closed set $K \subseteq S$ such that
$$m_e(K) > m_i(S) - \epsilon.$$
5.2.10. Prove that for any bounded set S, we can always find a set L that is a countable union of closed sets (and, thus, a Borel set) for which $S \supseteq L$ and
$$m_i(S) = m_e(L).$$
5.2.11. Given a set $S$, let $\mathcal{K}_S$ be the collection of all countable unions of pairwise disjoint intervals contained in $S$. Prove that
$$ \sup_{K \in \mathcal{K}_S} m(K) = c_i(S). $$

5.2.12. Let $S$ and $T$ each be a countable union of pairwise disjoint intervals, $S = \bigcup_i I_i$ and $T = \bigcup_j J_j$. Show that
$$ m(S \cup T) + m(S \cap T) = m(S) + m(T). $$
5.2.13. Show that if $S$ and $T$ are bounded sets, then
$$ m_e(S \cup T) + m_e(S \cap T) \le m_e(S) + m_e(T). $$

5.2.14. Let $S$ and $T$ be bounded sets such that
$$ \inf\{\, |x - y| : x \in S,\ y \in T \,\} > 0. $$
Show that
$$ m_e(S \cup T) = m_e(S) + m_e(T). $$


5.2.15. Show that if $S$ is a bounded, measurable set and $T \supseteq S$, then
$$ m_e(T - S) = m_e(T) - m(S). $$

5.2.16. Assuming that all open sets and all closed sets are measurable, show that if $S \subseteq [a, b]$, then the infimum over all open sets $U$ that contain $[a, b] - S$ of $m(U)$ is equal to $b - a$ minus the supremum over all closed sets $F \subseteq S$ of $m(F)$.

5.2.17. Prove that if $S \subseteq (a, b)$ has measure 0, then $(a, b) - S$ is dense in $(a, b)$ and is uncountable.

5.2.18. Using Exercise 5.2.17 and the fact that any uncountable closed subset of $\mathbb{R}$ has cardinality $\mathfrak{c}$ (see p. 96), prove that if $S \subseteq (a, b)$ has measure 0, then $(a, b) - S$ has cardinality $\mathfrak{c}$.
140 The Development of Measure Theory

5.3 Carathéodory's Condition


Constantin Carathéodory (1873–1950) came from a family of the Greek urban elite
of Constantinople (modern Istanbul). His father served as Ottoman ambassador to
Brussels from 1875 until 1900. Carathéodory grew up in Belgium where he studied
engineering. From 1897 until 1900 he worked on the construction of the Assiut
dam in Egypt, studying Jordan's Cours d'analyse in his spare time. He then went
to the University of Berlin where he earned his doctorate in mathematics under
the direction of Hermann Minkowski. For most of his career, he taught in German
universities.
In 1914, Carathéodory offered an alternative definition of measurability. It arose from his extension of the notion of measure to $k$-dimensional subsets of $\mathbb{R}^q$, $k < q$, an extension that, in 1919, would lead Felix Hausdorff to define and explore sets with noninteger dimension. We want measurable sets to have the property found in Borel sets that if $X$ and $S$ are measurable, then so is $X - S$. Furthermore, $m(X - S) = m(X) - m(X \cap S)$. Carathéodory showed that this follows from Lebesgue's definition of measure and that something much stronger is true.
What Carathéodory's condition says is that we can take any measurable set
and use it to cut any other set. The outer measure of the set that is being cut —
in this case, X — will equal the sum of the outer measures of the two pieces.
The first part of this section will be devoted to proving that Carathéodory's condition is a consequence of Lebesgue's definition of measurability. The heart of this section will use Carathéodory's condition to prove that any countable union or intersection of measurable sets is again measurable. This establishes the fact that Lebesgue measurable sets form a $\sigma$-algebra, and thus include all Borel sets.
Note also that Lebesgue's original definition of measure was restricted to
bounded sets. Carathéodory's condition enables us to determine when an un-
bounded set is measurable.
If a bounded set $S$ satisfies Carathéodory's condition, then it satisfies equation (5.9) when $X = [a, b] \supseteq S$, and so it satisfies Lebesgue's definition of a measurable set. It will take some work to show that any set that satisfies Lebesgue's condition also satisfies Carathéodory's.

Definition: Lebesgue measure, Carathéodory condition


A set $S$ is measurable if and only if for every set $X$ with finite outer measure,

$$ m_e(X - S) = m_e(X) - m_e(X \cap S). \qquad (5.9) $$

If $S$ is measurable, then the measure of $S$ is defined to be $m_e(S)$.
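Equation (5.9) can be tested mechanically on the simplest sets, finite unions of intervals, where outer measure is just total length once overlaps are merged. This is a sketch of our own, not an algorithm from the text: sets are assumed to be lists of half-open pairs $[a, b)$ (endpoints carry no length, so the choice is harmless), and the helper names are hypothetical. Passing the check for one $X$ proves nothing in general; the definition demands (5.9) for every $X$ of finite outer measure, and for sets this simple it always holds.

```python
from fractions import Fraction as F


def normalize(intervals):
    """Sort half-open intervals [a, b), drop empty ones, and merge overlaps."""
    merged = []
    for a, b in sorted(iv for iv in intervals if iv[0] < iv[1]):
        if merged and a <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], b))
        else:
            merged.append((a, b))
    return merged


def length(intervals):
    """Measure of a finite union of intervals: total length after merging."""
    return sum(b - a for a, b in normalize(intervals))


def intersect(xs, ys):
    """Intersection of two finite unions, piece by piece."""
    return normalize([(max(a, c), min(b, d)) for a, b in xs for c, d in ys])


def difference(xs, ys):
    """xs - ys: cut every piece of xs at each piece [c, d) of ys."""
    result = normalize(xs)
    for c, d in normalize(ys):
        pieces = []
        for a, b in result:
            pieces.append((a, min(b, c)))  # part of [a, b) to the left of c
            pieces.append((max(a, d), b))  # part of [a, b) to the right of d
        result = normalize(pieces)
    return result


S = [(F(0), F(1, 2)), (F(3, 4), F(1))]
X = [(F(1, 4), F(7, 8)), (F(2), F(3))]
# Equation (5.9) rearranged: m_e(X) = m_e(X ∩ S) + m_e(X - S).
print(length(X), length(intersect(X, S)) + length(difference(X, S)))  # 13/8 13/8
```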
5.3 Carathéodory's Condition 141

Theorem 5.6 (Lebesgue $\Rightarrow$ Carathéodory). If a bounded set $S$ satisfies

$$ m_e([a, b] - S) = (b - a) - m_e(S) \qquad (5.10) $$

for any interval $[a, b] \supseteq S$, then it satisfies Carathéodory's condition.

Before we prove this theorem, we need a lemma.

Lemma 5.7 (Local Additivity). Let $S$ be any bounded set and $(I_1, I_2, \ldots)$ any countable collection of pairwise disjoint intervals. Then

$$ m_e\Big(S \cap \bigcup_{j=1}^{\infty} I_j\Big) = \sum_{j=1}^{\infty} m_e(S \cap I_j). $$

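Proof. Given any $\epsilon > 0$, choose a countable open cover, $(J_1, J_2, \ldots)$, of $S \cap \bigcup_j I_j$ such that $\sum_i m(J_i) < m_e\big(S \cap \bigcup_j I_j\big) + \epsilon$. Because the intervals $I_1, I_2, \ldots$ are pairwise disjoint, we have that

$$ \sum_{j=1}^{\infty} m(J_i \cap I_j) \le m(J_i). $$

By the subadditivity of the outer measure, Theorem 5.4, and the fact that

$$ S \cap I_j \subseteq \bigcup_{i=1}^{\infty} (J_i \cap I_j), $$

we see that

$$ \begin{aligned} m_e\Big(S \cap \bigcup_{j=1}^{\infty} I_j\Big) &\le \sum_{j=1}^{\infty} m_e(S \cap I_j) \\ &\le \sum_{j=1}^{\infty} \sum_{i=1}^{\infty} m(J_i \cap I_j) \\ &\le \sum_{i=1}^{\infty} m(J_i) \\ &< m_e\Big(S \cap \bigcup_{j=1}^{\infty} I_j\Big) + \epsilon. \end{aligned} \qquad (5.12) $$

Since this is true for all $\epsilon > 0$, the first inequality must be an equality.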
Proof (Theorem 5.6) We assume that S satisfies equation (5.10). Our first step is
to show that S satisfies equation (5.9) when X is a bounded interval.

We find two intervals, $Y$ immediately to the left of $X$ and $Z$ immediately to the right, so that $Y$, $X$, and $Z$ are pairwise disjoint and $Y \cup X \cup Z$ is a single closed interval that contains $S$. From equation (5.10), we see that

$$ m_e(S) + m_e\big(S^c \cap (Y \cup X \cup Z)\big) = m(Y \cup X \cup Z) = m(Y) + m(X) + m(Z). \qquad (5.13) $$
By Lemma 5.7, we know

$$ m_e(S) = m_e(S \cap Y) + m_e(S \cap X) + m_e(S \cap Z), \qquad (5.14) $$

$$ m_e\big(S^c \cap (Y \cup X \cup Z)\big) = m_e(S^c \cap Y) + m_e(S^c \cap X) + m_e(S^c \cap Z). \qquad (5.15) $$
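Combining equations (5.13)–(5.15), we see that

$$ \begin{aligned} m(Y) + m(X) + m(Z) &= m_e(S) + m_e\big(S^c \cap (Y \cup X \cup Z)\big) \\ &= \big(m_e(S \cap Y) + m_e(S^c \cap Y)\big) \\ &\quad + \big(m_e(S \cap X) + m_e(S^c \cap X)\big) \\ &\quad + \big(m_e(S \cap Z) + m_e(S^c \cap Z)\big). \end{aligned} \qquad (5.16) $$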

Subadditivity implies that for each interval $I$, we have

$$ m_e(S \cap I) + m_e(S^c \cap I) \ge m(I). $$

The only way we can get equality (5.16) is if we have $m_e(S \cap I) + m_e(S^c \cap I) = m(I)$ for each of the three intervals. Therefore,

$$ m(X) = m_e(S \cap X) + m_e(S^c \cap X), $$

which is equation (5.9) for this $X$, since $X - S = S^c \cap X$.
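We now let $X$ be any set with finite outer measure. Given any $\epsilon > 0$, we choose a countable open cover, $(I_1, I_2, \ldots)$, of $X$ so that

$$ \sum_{j=1}^{\infty} m(I_j) < m_e(X) + \epsilon. $$

We use subadditivity and the first part of our proof:

$$ \begin{aligned} m_e(X) &\le m_e(S \cap X) + m_e(S^c \cap X) \\ &\le m_e\Big(S \cap \bigcup_{j=1}^{\infty} I_j\Big) + m_e\Big(S^c \cap \bigcup_{j=1}^{\infty} I_j\Big) \\ &\le \sum_{j=1}^{\infty} \big(m_e(S \cap I_j) + m_e(S^c \cap I_j)\big) \\ &= \sum_{j=1}^{\infty} m(I_j) \\ &< m_e(X) + \epsilon. \end{aligned} \qquad (5.17) $$

Again, since this is true for all $\epsilon > 0$, the first inequality must be an equality.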

We now take the first step toward proving that any countable union of measurable
sets is measurable. This theorem, in addition to moving us toward that result, is
very important in its own right.

Theorem 5.8 (Countable Additivity). If $(S_1, S_2, \ldots)$ are pairwise disjoint measurable sets whose union has finite outer measure, then

$$ m_e\Big(\bigcup_{i=1}^{\infty} S_i\Big) = \sum_{i=1}^{\infty} m(S_i). \qquad (5.18) $$

Proof. We start with two disjoint measurable sets, $S_1$ and $S_2$, and invoke the Carathéodory condition with $X = S_1 \cup S_2$. Since $S_1^c \cap X = S_2$, the Carathéodory condition gives us exactly what we need,

$$ m(S_1) + m(S_2) = m(S_1) + m_e(S_1^c \cap X) = m_e(X) = m_e(S_1 \cup S_2). $$

We now proceed by induction. Assume that additivity holds if we have $n - 1$ pairwise disjoint measurable sets, but that we are faced with $n$ pairwise disjoint measurable sets. Let $X = S_1 \cup S_2 \cup \cdots \cup S_n$. Then $S_1^c \cap X = S_2 \cup \cdots \cup S_n$. From the Carathéodory condition and our induction hypothesis, we see that

$$ m_e(X) = m(S_1) + m_e(S_2 \cup \cdots \cup S_n) = m(S_1) + m(S_2) + \cdots + m(S_n). $$

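Finally, we consider a countably infinite collection of pairwise disjoint measurable sets. For any finite value of $n$,

$$ m_e\Big(\bigcup_{i=1}^{\infty} S_i\Big) \ge m_e\Big(\bigcup_{i=1}^{n} S_i\Big) = \sum_{i=1}^{n} m(S_i). $$

Since this upper bound, which is finite by hypothesis, holds for all $n$, the summation converges as $n$ approaches infinity and

$$ m_e\Big(\bigcup_{i=1}^{\infty} S_i\Big) \ge \sum_{i=1}^{\infty} m(S_i). $$

The inequality in the other direction follows from subadditivity.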
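A concrete instance of this last step, with pieces of our own choosing (they do not appear in the text): the intervals $S_k = [2^{-k}, 2^{-(k-1)})$, $k = 1, 2, \ldots$, are pairwise disjoint and measurable, and their union is $(0, 1)$, whose outer measure is 1. The partial sums in the sketch below increase and never exceed that bound, so the series converges, here to exactly 1. The function names are hypothetical.

```python
from fractions import Fraction


def piece(k):
    """S_k = [2**-k, 2**-(k - 1)) for k = 1, 2, ...; the pieces are pairwise disjoint."""
    return Fraction(1, 2 ** k), Fraction(1, 2 ** (k - 1))


def partial_sum(n):
    """m(S_1) + ... + m(S_n), which equals 1 - 2**-n."""
    return sum(b - a for a, b in map(piece, range(1, n + 1)))


# The union of all S_k is (0, 1), so every partial sum is bounded by m_e((0, 1)) = 1.
print([str(partial_sum(n)) for n in range(1, 5)])  # ['1/2', '3/4', '7/8', '15/16']
```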

We would like to be able to say that these unions are also measurable. In fact,
we would like to be able to say that any countable union of measurable sets is
measurable. The first step is to consider finite unions and intersections.

Theorem 5.9 (Finite Unions and Intersections). Any finite union or intersection
of measurable sets is measurable.

Figure 5.3. The set X cut by S1 and S2.

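Proof. It is enough to show that the union of two measurable sets is measurable. Carathéodory's condition is symmetric in $S$ and $S^c$ (because $X - S = X \cap S^c$, it says $m_e(X \cap S) + m_e(X \cap S^c) = m_e(X)$), so the complement of a measurable set is measurable. The intersection of two sets is the complement of the union of their complements,

$$ S_1 \cap S_2 = (S_1^c \cup S_2^c)^c, $$

and by induction we can then conclude that any finite union or intersection of measurable sets is measurable.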
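Let $X$ be the arbitrary set to be cut by $S_1 \cup S_2$. We divide $X$ into four disjoint subsets (see Figure 5.3):

$$ X_1 = X \cap (S_1 \cap S_2^c), \quad X_2 = X \cap (S_1 \cap S_2), $$
$$ X_3 = X \cap (S_1^c \cap S_2), \quad X_4 = X \cap (S_1^c \cap S_2^c). \qquad (5.19) $$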

To show that $S_1 \cup S_2$ is measurable, we need to show that

$$ m_e(X_1 \cup X_2 \cup X_3) + m_e(X_4) = m_e(X). \qquad (5.20) $$

We first cut $X_1 \cup X_2 \cup X_3$ by $S_1$:

$$ m_e(X_1 \cup X_2 \cup X_3) + m_e(X_4) = m_e(X_1 \cup X_2) + m_e(X_3) + m_e(X_4). \qquad (5.21) $$

We next use $S_2$ to glue together $X_3$ and $X_4$:

$$ m_e(X_1 \cup X_2) + m_e(X_3) + m_e(X_4) = m_e(X_1 \cup X_2) + m_e(X_3 \cup X_4). \qquad (5.22) $$

Finally, we use $S_1$ to glue together $X_1 \cup X_2$ and $X_3 \cup X_4$:

$$ m_e(X_1 \cup X_2) + m_e(X_3 \cup X_4) = m_e(X). \qquad (5.23) $$

Theorem 5.10 (Countable Unions and Intersections). Any countable union or intersection of measurable sets is measurable.

Proof. Again, it is enough to prove this theorem for countable unions. Let $T_n = S_1 \cup S_2 \cup \cdots \cup S_n$ and $T = \bigcup_{i=1}^{\infty} S_i$. We shall also use the sets $U_n$, where $U_1 = T_1$ and $U_n = T_n - T_{n-1}$ for $n > 1$. In other words, $U_n$ consists of all elements of $S_n$ that are not in $S_1, S_2, \ldots,$ or $S_{n-1}$. By their construction, the sets $T_n$ and $U_n$ are measurable, and the $U_n$ are pairwise disjoint. The union of $U_1$ through $U_n$ is $T_n$.
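Since $T_n$ is measurable, we know that for any set $X$ with finite outer measure,

$$ m_e(X) = m_e(X \cap T_n) + m_e(X \cap T_n^c). \qquad (5.24) $$

Since $U_n$ is measurable, we can use it to cut $X \cap T_n$; because $T_n - U_n = T_{n-1}$,

$$ m_e(X \cap T_n) = m_e(X \cap U_n) + m_e(X \cap T_{n-1}). \qquad (5.25) $$

By induction,

$$ m_e(X \cap T_n) = \sum_{k=1}^{n} m_e(X \cap U_k). \qquad (5.26) $$

Since $T \supseteq T_n$, we know that $T^c \subseteq T_n^c$ and therefore $m_e(X \cap T_n^c) \ge m_e(X \cap T^c)$. We can rewrite equation (5.24) as an inequality,

$$ m_e(X) \ge \sum_{k=1}^{n} m_e(X \cap U_k) + m_e(X \cap T^c). \qquad (5.27) $$

We use the same trick we used in the proof of Theorem 5.8. Our summation has an upper bound independent of $n$, so the infinite summation must converge and

$$ m_e(X) \ge \sum_{k=1}^{\infty} m_e(X \cap U_k) + m_e(X \cap T^c). $$

By the subadditivity of the outer measure (Theorem 5.4),

$$ \sum_{k=1}^{\infty} m_e(X \cap U_k) \ge m_e\Big(\bigcup_{k=1}^{\infty} (X \cap U_k)\Big) = m_e\Big(X \cap \bigcup_{k=1}^{\infty} U_k\Big) = m_e(X \cap T). $$

Therefore,

$$ m_e(X) \ge m_e(X \cap T) + m_e(X \cap T^c). $$

Subadditivity gives us the inequality in the other direction.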

An important consequence of this result is the next theorem that shows that
Lebesgue measure is, in a real sense, not very far removed from Jordan content.

Definition: Symmetric difference


The symmetric difference of two sets is the set of points that are in exactly one of these sets. The symmetric difference of $S$ and $T$ is written $S \,\Delta\, T$,

$$ S \,\Delta\, T = (S - T) \cup (T - S) = (S \cap T^c) \cup (T \cap S^c). $$

For example, the symmetric difference of the overlapping intervals $[0, 2]$ and $[1, 3]$ is

$$ [0, 2] \,\Delta\, [1, 3] = [0, 1) \cup (2, 3]. $$
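For two bounded intervals the symmetric difference can be computed directly, which gives a quick check on the example. The sketch below is our own (the function names are hypothetical); it ignores whether endpoints are included, since single points have measure zero, and it uses $m(I \,\Delta\, J) = m(I) + m(J) - 2\,m(I \cap J)$.

```python
def sym_diff_pieces(i, j):
    """Pieces of I Δ J for bounded intervals I = (a, b) and J = (c, d).

    Endpoints are ignored: they have measure zero, so only lengths matter.
    """
    (a, b), (c, d) = i, j
    if min(b, d) <= max(a, c):  # no overlap: I Δ J is just I ∪ J
        return sorted([i, j])
    left = (min(a, c), max(a, c))
    right = (min(b, d), max(b, d))
    return [p for p in (left, right) if p[0] < p[1]]


def sym_diff_length(i, j):
    """m(I Δ J) = m(I) + m(J) - 2 m(I ∩ J)."""
    (a, b), (c, d) = i, j
    overlap = max(0, min(b, d) - max(a, c))
    return (b - a) + (d - c) - 2 * overlap


print(sym_diff_pieces((0, 2), (1, 3)), sym_diff_length((0, 2), (1, 3)))
# [(0, 1), (2, 3)] 2
```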

Theorem 5.11 (Approximation by Finite Number of Open Intervals). If $S \subseteq [a, b]$ is a measurable set, then for any $\epsilon > 0$ we can find a finite union of open intervals, $U$, such that

$$ m(S \,\Delta\, U) < \epsilon. \qquad (5.28) $$

Borel called this the "Second Fundamental Theorem of Measure Theory."² We shall use it several times over the next few chapters.
Proof. By the definition of outer measure, we can find a countable union of open intervals that contains $S$, call it $V$, such that

$$ m(S) \le m(V) < m(S) + \epsilon/2. $$

While $S$ and $V$ are subsets of $[a, b]$, it may be helpful to think of them as subsets of the plane (see Figure 5.4). Since both $S$ and $V$ are measurable, $m(V - S) = m(V) - m(S) < \epsilon/2$.
Let $W$ denote a countable union of open intervals that contains $S^c \cap [a, b]$ and such that $m_e\big(W - (S^c \cap [a, b])\big) < \epsilon/2$. Since $W$ is open, $W^c$ is closed. We know that

$$ W - (S^c \cap [a, b]) \supseteq W \cap S = S - W^c. \qquad (5.29) $$

Therefore,

$$ m(S - W^c) < \epsilon/2. $$

Since $W \supseteq S^c \cap [a, b]$, $S$ contains $W^c \cap [a, b]$, which is closed. We have now sandwiched our set $S$ between a closed set and an open set,

$$ W^c \cap [a, b] \subseteq S \subseteq V. $$

² Borel, Leçons sur la Théorie des Fonctions, 4th ed. 1950. The first fundamental theorem is the Heine–Borel theorem.

Figure 5.4. $V \supseteq S \supseteq W^c \cap [a, b]$. The shaded region is $W$. $U$ is the region inside the dotted hexagon.

Since $W^c \cap [a, b]$ is closed and bounded, it is compact. Since $V$ is a countable union of open intervals that contains $W^c \cap [a, b]$, the Heine–Borel theorem promises us a finite subcollection of these open intervals whose union $U$ satisfies

$$ W^c \cap [a, b] \subseteq U \subseteq V. $$

By subadditivity,

$$ m(S \,\Delta\, U) \le m_e(S \cap U^c) + m_e(S^c \cap U) \le m_e(S - W^c) + m_e(V - S) < \epsilon. $$
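When $S$ is itself a countable union of open intervals of finite total length, the conclusion of Theorem 5.11 can also be reached without compactness, simply by discarding a short enough tail. The sketch below uses a set of our own choosing, $V = \bigcup_{n \ge 1} (2^{-n}, 2^{-n} + 4^{-n})$, and hypothetical helper names: keeping the first $N$ intervals gives a finite union $U_N \subseteq V$ with $m(V \,\Delta\, U_N) \le \sum_{n > N} 4^{-n} = 4^{-N}/3$, and the code finds the smallest $N$ that brings this bound below $\epsilon$. For a general measurable $S$ the argument above is still needed.

```python
from fractions import Fraction


def finite_part(N):
    """U_N: the first N intervals (2**-n, 2**-n + 4**-n) of V, for n = 1..N."""
    return [(Fraction(1, 2 ** n), Fraction(1, 2 ** n) + Fraction(1, 4 ** n))
            for n in range(1, N + 1)]


def tail_bound(N):
    """Bound on m(V Δ U_N) = m(V - U_N): the sum of 4**-n over n > N, i.e. 4**-N / 3."""
    return Fraction(1, 3 * 4 ** N)


def intervals_needed(eps):
    """Smallest N for which the discarded tail has total length below eps."""
    N = 0
    while tail_bound(N) >= eps:
        N += 1
    return N


N = intervals_needed(Fraction(1, 1000))
print(N, tail_bound(N))  # 5 1/3072
```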

We finish with a corollary that stands in stark contrast to Lemma 4.4 on page 105
which was both much more complicated and much more restricted in its assump-
tions.

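Corollary 5.12 (Limit of Measure). If $S_1, S_2, \ldots$ are measurable sets such that

$$ S_1 \subseteq S_2 \subseteq S_3 \subseteq \cdots, \qquad \bigcup_{i=1}^{\infty} S_i = S, $$

then $S$ is measurable and

$$ m(S) = \lim_{i \to \infty} m(S_i). \qquad (5.30) $$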

Similarly, if $T_1 \supseteq T_2 \supseteq T_3 \supseteq \cdots$ are measurable, $T_1$ has finite measure, and

$$ \bigcap_{i=1}^{\infty} T_i = T, $$

then $T$ is measurable and

$$ m(T) = \lim_{i \to \infty} m(T_i). \qquad (5.31) $$

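Proof. The sets $S_2 - S_1, S_3 - S_2, \ldots$ are pairwise disjoint. If we define $S_0 = \emptyset$, then we can write

$$ \begin{aligned} \lim_{i \to \infty} m(S_i) &= \lim_{i \to \infty} m\Big(\bigcup_{j=1}^{i} (S_j - S_{j-1})\Big) \\ &= \lim_{i \to \infty} \sum_{j=1}^{i} m(S_j - S_{j-1}) \\ &= m\Big(\bigcup_{j=1}^{\infty} (S_j - S_{j-1})\Big) = m(S). \end{aligned} $$

To prove equation (5.31), we set $S_i = T_1 - T_i$ and apply equation (5.30) (see Exercise 5.3.5).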
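The requirement in (5.31) that $T_1$ have finite measure cannot be dropped. Two decreasing sequences of our own choosing make the point; both have empty intersection, so (5.31) would predict measures tending to $m(\emptyset) = 0$. For $T_n = (0, 1/n)$ the measures $1/n$ do tend to 0, but for $T_n = [n, \infty)$ every measure is infinite. The sketch only tabulates interval lengths (with `math.inf` for the unbounded ones, and a hypothetical helper name); the intersections are worked out by hand in the comments.

```python
import math


def interval_measure(a, b):
    """Length of the interval from a to b; b may be math.inf."""
    return b - a


n_values = range(1, 10_001)
# T_n = (0, 1/n): decreasing, m(T_1) = 1 is finite, intersection is empty.
shrinking = [interval_measure(0, 1 / n) for n in n_values]
# T_n = [n, inf): decreasing, intersection is also empty, but every m(T_n) is infinite.
escaping = [interval_measure(n, math.inf) for n in n_values]

print(shrinking[-1], escaping[-1])  # 0.0001 inf
```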

All intervals are measurable. With Theorem 5.10, we see that all Borel sets are measurable. I leave it as an exercise (Exercise 5.3.4) to verify that for any bounded set $S$,

$$ c_i(S) \le m_i(S) \le m_e(S) \le c_e(S), $$

and therefore any Jordan measurable set is measurable in Lebesgue's sense. Are there any sets that are not measurable? That is an important question with a very surprising answer that will be revealed in the next section.

Exercises
5.3.1. Show that both $\mathbb{R}$ and $\emptyset$ satisfy Carathéodory's condition.
5.3.2. Let $S$ be an unbounded set with finite outer measure such that for every $k \in \mathbb{N}$, $S \cap [-k, k]$ satisfies Lebesgue's condition for measurability,

$$ m_e([-k, k] - S) = 2k - m_e(S \cap [-k, k]). $$

Show that $S$ satisfies the Carathéodory condition and that

$$ m(S) = \lim_{k \to \infty} m(S \cap [-k, k]). $$
5.3.3. Show directly that the intersection of two measurable sets is measurable by proving that

$$ m_e(X_2) + m_e(X_1 \cup X_3 \cup X_4) = m_e(X) \qquad (5.32) $$

for the sets defined in equation (5.19) on p. 144.

5.3.4. Show that if the set $S$ is bounded, then

$$ c_i(S) \le m_i(S) \le m_e(S) \le c_e(S). $$

5.3.5. Use equation (5.30) to prove equation (5.31).

5.3.6. Justify the set containment

$$ W - (S^c \cap [a, b]) \supseteq W \cap S = S - W^c $$

given in (5.29).

5.3.7. Let $U$ be an open set such that $S \subseteq U$ and $U \cap T = \emptyset$. Show that

$$ m_e(S \cup T) = m_e(S) + m_e(T). $$
5.3.8. Show that if $S$ and $T$ are measurable, then

$$ m(S \cup T) + m(S \cap T) = m(S) + m(T). $$

5.3.9. Show that if $m_e(S) < \infty$ and there is a measurable subset $T \subseteq S$ such that $m(T) = m_e(S)$, then $S$ is measurable.
5.3.10. Let $S$ be any subset of $\mathbb{R}$. Show that for any $\epsilon > 0$ there is an open set $U \supseteq S$ such that $m(U) < m_e(S) + \epsilon$. Show that there is a countable intersection of open sets, $G$, such that $G \supseteq S$ and $m_e(S) = m(G)$.

5.3.11. Show that for any bounded set $S$, the following statements are equivalent:
1. $S$ is measurable.
2. Given any $\epsilon > 0$, there is an open set $U \supseteq S$ such that $m_e(U - S) < \epsilon$.
3. There is a countable intersection of open sets $G \supseteq S$ such that $m_e(G - S) = 0$.
4. Given any $\epsilon > 0$, there is a closed set $C \subseteq S$ such that $m_e(S - C) < \epsilon$.
5. There is a countable union of closed sets $F \subseteq S$ such that $m_e(S - F) = 0$.

5.3.12. Show that the statements of Exercise 5.3.11 are also equivalent when $S$ is any subset of $\mathbb{R}$.

5.3.13. Let $S$ and $T$ be sets with finite outer measure. Show that

$$ m_e(S \cup T) = m_e(S) + m_e(T) $$

if and only if there are measurable sets $S_1$ and $T_1$ such that $S \subseteq S_1$, $T \subseteq T_1$, and $m(S_1 \cap T_1) = 0$.

5.3.14. Let $S$ and $T$ be sets with finite outer measure. Show that if $m_e(S \cup T) = m_e(S) + m_e(T)$, then $m_i(S \cup T) = m_i(S) + m_i(T)$.

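5.3.15. For a sequence of sets $(S_n)$ in $\mathbb{R}$, we define the limit supremum and the limit infimum³ as

$$ \limsup_{n \to \infty} S_n = \bigcap_{k=1}^{\infty} \bigcup_{n=k}^{\infty} S_n \quad \text{and} \quad \liminf_{n \to \infty} S_n = \bigcup_{k=1}^{\infty} \bigcap_{n=k}^{\infty} S_n. $$

1. Show that if each $S_n$ is measurable, then
$$ m\Big(\liminf_{n \to \infty} S_n\Big) \le \liminf_{n \to \infty} m(S_n). $$
2. Show that if, in addition, $m(S_n \cup S_{n+1} \cup \cdots) < \infty$ for at least one $n \ge 1$, then
$$ m\Big(\limsup_{n \to \infty} S_n\Big) \ge \limsup_{n \to \infty} m(S_n). $$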
n—*00

5.3.16. We say that a sequence of sets, $(S_n)$, converges if $\limsup_{n \to \infty} S_n = \liminf_{n \to \infty} S_n$. We denote this common value by $\lim_{n \to \infty} S_n$.
1. Show that any monotonic sequence of sets converges.
2. Show that if $(S_n)$ is a convergent sequence of measurable sets, $S_n \subseteq T$ for all $n \ge 1$, and $m_e(T) < \infty$, then
$$ m\Big(\lim_{n \to \infty} S_n\Big) = \lim_{n \to \infty} m(S_n). $$

5.3.17. Let $S$ be the set of points in $[0, 1]$ that do not require the use of the digit 7 in their decimal expansion. Show that $S$ is measurable and $m(S) = 0$.
5.3.18. Find the Lebesgue measure of the set of points in $[0, 1]$ for which there is a decimal expansion that uses all of the digits 1 through 9.
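The definitions in Exercise 5.3.15 say that $\limsup S_n$ consists of the points lying in infinitely many $S_n$, and $\liminf S_n$ of the points lying in all but finitely many. For a periodic sequence of finite sets both can be read off from a single period; the sketch below (our own illustration, with a hypothetical function name, and deliberately not a solution to the exercise) does exactly that.

```python
from functools import reduce


def periodic_limsup_liminf(period):
    """lim sup and lim inf of the sequence S_1, S_2, ... that repeats `period` forever.

    Every tail S_k, S_{k+1}, ... runs through a whole period, so the union over
    n >= k and the intersection over n >= k are the same for every k.
    """
    tail_union = reduce(set.union, period, set())
    tail_intersection = reduce(set.intersection, period[1:], set(period[0]))
    # lim sup = intersection over k of identical tail unions;
    # lim inf = union over k of identical tail intersections.
    return tail_union, tail_intersection


sup_set, inf_set = periodic_limsup_liminf([{0, 1}, {1, 2}])
print(sorted(sup_set), sorted(inf_set))  # [0, 1, 2] [1]
```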

5.4 Nonmeasurable Sets


In 1905, Giuseppe Vitali (1875–1932) published an example of a nonmeasurable
set. Vitali graduated from the Scuola Normale Superiore in Pisa in 1899. He worked
with Dini for two years before taking a job as a high school teacher and then entering
politics, representing the Socialist Party on the city council in Genoa. With the rise
of the Fascists in 1922 and the dissolution of the Socialist Party, he returned to
mathematics. He suffered a stroke in 1926 that left half of his body paralyzed, but
he continued to make important contributions to analysis until his death in 1932
from a heart attack.
The idea behind Vitali's construction is as follows. We separate $\mathbb{R}$ into equivalence classes. Two numbers are in the same equivalence class if they differ by a rational number. All of the rational numbers constitute one equivalence class. All numbers of the form $\sqrt{2} - a$ where $a \in \mathbb{Q}$ form another equivalence class. All numbers of the form $\pi - b$ where $b \in \mathbb{Q}$ form a third. From each equivalence class, we select one number that lies within $(0, 1)$ and call the resulting set $\mathcal{N}$.

³ First introduced by Borel in 1905.

5.4 Nonmeasurable Sets 151

Theorem 5.13 (Existence of Nonmeasurable Set). The set $\mathcal{N}$ is not measurable.

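Proof. Let $q$ be any rational number and define the translation $\mathcal{N} + q$ to be $\{a + q \mid a \in \mathcal{N}\}$. We have added a rational number to each element of $\mathcal{N}$, so $\mathcal{N} + q$ also consists of exactly one element from each equivalence class. If $q_1$ and $q_2$ are distinct rational numbers, then $\mathcal{N} + q_1$ is disjoint from $\mathcal{N} + q_2$.
Every real number in $(0, 1)$ is contained in $\mathcal{N} + q$ for exactly one rational value of $q$, and this value of $q$ lies strictly between $-1$ and $1$. To see this, take any real number $\alpha \in (0, 1)$ and find the equivalent number $\beta \in \mathcal{N}$. By definition of the equivalence, $\beta - \alpha \in \mathbb{Q}$ and $-1 < \beta - \alpha < 1$, so $\alpha \in \mathcal{N} + q$ with $q = \alpha - \beta \in \mathbb{Q} \cap (-1, 1)$. We can bound the union of the pairwise disjoint sets $\mathcal{N} + q$ for rational $q$ between $-1$ and $1$:

$$ (0, 1) \subseteq \bigcup_{q \in \mathbb{Q} \cap (-1, 1)} (\mathcal{N} + q) \subseteq (-1, 2). $$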

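The outer measure is translation invariant (because it is based on interval lengths, and interval lengths are translation invariant), so $m_e(\mathcal{N} + q) = m_e(\mathcal{N})$. By subadditivity,

$$ 1 = m_e\big((0, 1)\big) \le m_e\Big(\bigcup_{q \in \mathbb{Q} \cap (-1, 1)} (\mathcal{N} + q)\Big) \le \sum_{q \in \mathbb{Q} \cap (-1, 1)} m_e(\mathcal{N} + q) = \sum_{q \in \mathbb{Q} \cap (-1, 1)} m_e(\mathcal{N}). $$

This tells us that $m_e(\mathcal{N}) > 0$.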


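If $\mathcal{N}$ is measurable, then all of the sets $\mathcal{N} + q$ must be measurable. This implies that $\bigcup_{q \in \mathbb{Q} \cap (-1, 1)} (\mathcal{N} + q)$ is a countable union of measurable, pairwise disjoint sets, and so

$$ m\Big(\bigcup_{q \in \mathbb{Q} \cap (-1, 1)} (\mathcal{N} + q)\Big) = \sum_{q \in \mathbb{Q} \cap (-1, 1)} m(\mathcal{N}) = \infty. $$

But this union is contained in $(-1, 2)$, so its measure is at most 3. The set $\mathcal{N}$ cannot be measurable.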
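Once countable additivity applies, the contradiction is arithmetic: countably many copies of the same number $c = m(\mathcal{N})$ add up to $0$ if $c = 0$ and to $\infty$ if $c > 0$, never to a value between 1 and 3. A sketch of our own, with `c` standing for the hypothetical value of $m(\mathcal{N})$ and hypothetical function names, shows both cases; since $\mathbb{Q} \cap (-1, 1)$ supplies infinitely many translates, any finite number of them is available.

```python
from fractions import Fraction


def union_measure_of_translates(c, k):
    """If N were measurable with m(N) = c, k disjoint translates N + q would have measure k * c."""
    return k * c


def translates_to_exceed(c, bound=3):
    """For c > 0, the smallest k with k * c > bound (bound 3 is the length of (-1, 2))."""
    k = 1
    while union_measure_of_translates(c, k) <= bound:
        k += 1
    return k


# c = 0: every finite sum is 0, so the countable sum is 0, below the lower bound 1.
print(union_measure_of_translates(0, 10**6))    # 0
# c > 0: finitely many translates already exceed the upper bound 3.
print(translates_to_exceed(Fraction(1, 1000)))  # 3001
```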

Difficulties
This would seem to settle the matter. There are nonmeasurable sets. But Vitali's
paper landed in the very center of a raging controversy among mathematicians.
The construction of $\mathcal{N}$ requires selecting one number from each equivalence class.
We have uncountably many equivalence classes.
Is it possible to have a set whose definition requires uncountably many choices?

Just a year earlier, many prominent mathematicians, Lebesgue among them,


asserted that this should not be allowed. To appreciate what Vitali stepped into, we
need to back up and investigate some of the issues created by Georg Cantor's work
on transfinite numbers.
When Cantor first introduced his different sizes of infinity, $\aleph_0$, $\mathfrak{c} = 2^{\aleph_0}$, $2^{\mathfrak{c}}$, $\ldots$, he thought of them only as cardinal numbers, descriptions of the relative size of a set. Finite cardinal numbers can also be thought of as ordinal numbers, describing a position in the ordered sequence of numbers. Thus "5" describes the size of the set $\{A, B, C, D, E\}$. It is also the integer that comes after 4 and before 6. Is there a similar ordering of the transfinite numbers? Can we write $\aleph_0 < \mathfrak{c} = 2^{\aleph_0} < 2^{\mathfrak{c}}$?
It may seem that the answer is obviously "yes," but there is a subtlety here. What
enables us to say "5 < 6" is the fact that in any set of cardinality 6, we can always
find a subset of cardinality 5. In every set of cardinality $\mathfrak{c}$, is there always a subset of cardinality $\aleph_0$? In general, given any two transfinite cardinals, $\alpha$ and $\beta$, is it always true that $\alpha < \beta$, $\alpha = \beta$, or $\beta < \alpha$? This became known as the trichotomy
property. If two sets are not in one-to-one correspondence, is one of the sets
always in one-to-one correspondence with a subset of the other? For infinite sets,
the answer is not clear. "Always" is a very big word. Mathematicians had learned
by now not to trust intuition when working with infinite sets.
In 1895, Cantor asserted that transfinite numbers possess the trichotomy property.
In a letter to Richard Dedekind written in 1899 he claimed that this would follow
from a property of transfinite sets that he had begun investigating in 1883, the
notion of a well-ordered set.
Cantor was correct. In 1904 Ernst Zermelo proved that the trichotomy property
follows if all sets are well ordered. Friedrich Hartogs would prove in 1915 that
these properties are equivalent: If the trichotomy property always holds, then every
set can be well ordered.

Definition: Total order


A total order on a set is a relation, call it $\prec$, such that (1) if $a \prec b$ and $b \prec c$, then $a \prec c$, and (2) for any two elements $a$ and $b$, exactly one of the following is true:

$$ a \prec b, \quad a = b, \quad \text{or} \quad b \prec a. $$

Definition: Well-ordered set


We say that a set S is well ordered if we can put a total order on the set so that every
nonempty subset has a smallest element. This implies that S has a first element,
and, given any element of $S$ other than a largest one, there is a well-defined next element.

Zermelo (1871–1953) earned his doctorate in 1894 at the University of Berlin,
working on the calculus of variations. After moving to Göttingen in 1897, and at
the urging of David Hilbert, he turned his attention to the problems of set theory.
He taught for several years at the University of Zurich before retiring to the Black
Forest of Germany because of poor health. He was awarded an honorary chair at
the University of Freiburg in 1926, a position he resigned in 1935 in protest against
Hitler's government.
To illustrate what is meant by a well-ordered set, we begin with the rational
numbers between 0 and 1. The rational numbers are not well ordered if we rely on
the usual order according to position on the real number line. With this order, the
set of rational numbers between 0 and 1 does not have a smallest element. If we
use an order that puts these rational numbers into one-to-one correspondence with
the natural numbers:

$$ \tfrac{1}{2} \prec \tfrac{1}{3} \prec \tfrac{2}{3} \prec \tfrac{1}{4} \prec \tfrac{3}{4} \prec \tfrac{1}{5} \prec \tfrac{2}{5} \prec \cdots, $$

this is a well ordering. Every subset has a first element when we use this order.
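Because this order simply lists the rationals, it can be sketched as a Python generator (an illustration of our own; the function names are hypothetical). `first_element` returns the least element of any nonempty subset described by a predicate. For the subset of rationals greater than 1/2, which has no smallest element in the usual order, it returns 2/3.

```python
from fractions import Fraction
from itertools import count, islice


def rationals_in_unit_interval():
    """Yield every rational in (0, 1) once, by denominator, then numerator:
    1/2, 1/3, 2/3, 1/4, 3/4, 1/5, 2/5, ...
    """
    for q in count(2):
        for p in range(1, q):
            r = Fraction(p, q)          # Fraction reduces p/q automatically
            if r.denominator == q:      # keep p/q only if it was in lowest terms
                yield r


def first_element(predicate):
    """Least element, in the order above, of the nonempty subset {r : predicate(r)}.

    It terminates because every rational sits at a finite position in the list;
    on an empty subset it would run forever.
    """
    for r in rationals_in_unit_interval():
        if predicate(r):
            return r


print([str(r) for r in islice(rationals_in_unit_interval(), 7)])
# ['1/2', '1/3', '2/3', '1/4', '3/4', '1/5', '2/5']
print(first_element(lambda r: r > Fraction(1, 2)))  # 2/3
```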
Is it possible to well order the real numbers between 0 and 1? Cantor thought
that it should be possible, but he could not find such an order. The problem
hung unanswered for 17 years, occasionally prodded by those few individuals truly
dedicated to set theory, but ignored by most mathematicians. Then in 1900, David
Hilbert, probably the most influential mathematician of the age, delivered an address
at the International Congress of Mathematicians in which he described the 23 most
pressing and important unsolved problems in mathematics. Problem number 1 on
his list was to settle the continuum hypothesis (see p. 75). The specific question he asked is whether or not there exists an infinite subset of $[0, 1]$ whose cardinality is neither $\aleph_0$ nor $\mathfrak{c}$.
In his explanation of problem 1, Hilbert raised the question whether it is possible
to well order the real numbers. Suddenly, this became an important problem.
Opinion was divided. In 1904, Julius König announced a proof that such a well
ordering could not exist. Flaws in his proof were quickly discovered. The same
year, Ernst Zermelo published his proof that it could be done, not just for the real
numbers but for any set.
What Zermelo actually accomplished was to show that every set can be well
ordered if and only if the axiom of choice always holds.

Definition: Axiom of choice


The axiom of choice says that given any set S, there is a mapping that assigns to
each nonempty subset of S one of the elements of that subset.

In other words, given any collection of subsets, even an uncountable collection,


we can always choose one element from each subset. Vitali used this axiom to
define $\mathcal{N}$ by selecting one element from each equivalence class.
If the set S is well ordered, we can always make the assignment by using this
well ordering, assigning the least element in the subset. Thus, the axiom of choice
follows from the well ordering principle. Zermelo's accomplishment was to show
that the axiom of choice also implies the well ordering principle.
This was the first time that the axiom of choice was stated explicitly. The
principle had been used implicitly many times previously. No one had raised
serious objections. But now that it was shown to be equivalent to well ordering,
there was doubt. It seemed too much like sleight-of-hand to claim that now we can
well order the real numbers.
Battle lines were drawn. Hilbert considered the problem solved. Hadamard
agreed. In opposition stood Borel, Jourdain, Bernstein, Schönflies, Baire, and
Lebesgue. Vitali's nonmeasurable set, appearing less than a year later, was greeted
by Lebesgue and many others as an empty exercise. They wanted an example of a
nonmeasurable set whose construction would not depend on the axiom of choice.

Pursuing the Axiom of Choice


Other mathematicians would find additional examples of nonmeasurable sets, Van
Vleck in 1908 and Bernstein the same year, but they all were dependent on the
axiom of choice. Also in 1908, Zermelo published his axioms of set theory. Sets
were recognized as forming the foundation for all of mathematics, and Zermelo
attempted to clarify the assumptions that enable us to construct and work with sets.
His axioms were modified by Fraenkel and Skolem in 1922, creating what today
are known as the Zermelo—Fraenkel (ZF) axioms. If we add the axiom of choice,
the system is known as ZFC. The axioms of ZF are the assumptions we need if we
are to build the mathematics that we know. In particular, they enable us to construct
the real number system. One of the great problems now facing those working on the
foundations of mathematics was whether the axiom of choice could be shown to be
a consequence of the ZF axioms, in contradiction to those axioms, or independent
of them.
At the same time, other mathematicians were discovering results that began to
cast doubt on the axiom of choice. In 1914, Felix Hausdorff used the axiom of
choice to create a most peculiar decomposition of the surface of a sphere. He first
removed a particular set of points, a set that is easily shown to be countable, and
then took the remainder of the sphere and decomposed it into three pairwise disjoint
pieces, call them A, B, and C, with the following properties:

1. It is possible to rotate A so that it exactly matches up with B.



2. It is possible to rotate A so that it exactly matches up with C. Note that so far,


there is nothing particularly remarkable about this decomposition.
3. It is possible to rotate A so that it exactly matches up with B U C.
In view of the first two properties, the last one is remarkable. The pieces are, of
course, nonmeasurable sets determined by the axiom of choice.
Then, in 1924, Stefan Banach and Alfred Tarski took Hausdorff's construction
and showed how it could be used to take a solid ball, cut it into five pieces, and then
reassemble those five pieces using only rigid motions (rotations and translations)
into two solid balls, each of the same size as the original. With just a little more
work, it was possible to modify the argument so that one could begin with any solid
object, dissect it into finitely many pieces, and reassemble those pieces using rigid
motions into any other solid object.
This result is sometimes referred to as "the pea and the sun theorem." One can
take a pea, cut it into finitely many (nonmeasurable) pieces, and reassemble those
pieces into a sphere the size of the sun.⁴ To those accepting the axiom of choice,
this simply illustrated how meaningless are our intuitive understandings of area or
volume once we begin working with nonmeasurable sets. To others, it confirmed
the implausibility of this axiom.
The next big step forward came from Kurt Gödel in 1938. He proved that the
Zermelo—Fraenkel axioms ZF are consistent with the continuum hypothesis, and
they are consistent with the axiom of choice. Chalk one up for those supporting the
axiom of choice.
In 1964, Paul Cohen published his proof that the axiom of choice could be false
without contradicting ZF. He also proved that the continuum hypothesis could be
false without contradicting ZFC (Zermelo—Fraenkel plus the axiom of choice). In
other words, the assumptions needed to define the real number line are consistent
with or without the axiom of choice. The axiom of choice is our choice to make.
What does this say about the existence of nonmeasurable sets? In 1970, Robert
Solovay proved that, while the existence of nonmeasurable sets is not enough to
imply the axiom of choice, it is an assumption that goes beyond ZF. We do not
violate any of our assumptions about the real number line if we assume that all sets
are measurable.

Do Nonmeasurable Sets Exist?


Does Theorem 5.13 say anything meaningful? Does the set $\mathcal{N}$ actually exist? As
we have seen, it exists if we accept the axiom of choice but its existence is a choice
we get to make. Life may seem much simpler if we choose to reject the existence of
nonmeasurable sets, but this is not the road taken by the mathematical community.
⁴ For a delightful and very accessible proof of this result, see The Pea and the Sun by Leonard P. Wapner.

We have encountered three decision points in our construction of the real number
line. The first was whether or not to include infinitesimals. The judgment to reject
them was made by Cauchy and his contemporaries in the early nineteenth century.
As we have seen, calculus could have been placed on a firm foundation with their
acceptance, but this was not fully realized until the work of Abraham Robinson in
the 1960s. It requires paying the price of greatly complicating the structure of the
real numbers and violating the intuitive principle of commensurability. Robinson's
nonstandard analysis has many supporters who believe that it should be the standard
approach to analysis, but they constitute a minority of mathematicians.
The second decision point involves the continuum hypothesis. We are free to
decide that there either are or are not infinite subsets of $\mathbb{R}$ with cardinality other
than $\aleph_0$ or $\mathfrak{c}$. Given this choice, most mathematicians would probably opt for no
other cardinalities. It keeps life simpler. Beyond those who work directly in set
theory, no one worries much about this. This preference does not impact other
branches of mathematics.
The axiom of choice is far more problematic because it does affect many other
branches of mathematics. There are results whose proofs are greatly simplified by
appeal to the axiom of choice, others that are possible only because of this axiom.
In 1918, Wacław Sierpiński published a list of such results. In 1929, Krull used
the axiom of choice to prove that in a commutative ring, every proper ideal can
be extended to a maximal prime ideal. In 1932, Hausdorff used the well-ordering
principle to prove that every vector space has a basis. In 1936, Teichmüller extended
this proof to show that every Hilbert space has an orthonormal basis. It is not
necessary to know what these statements mean to recognize that much modern
mathematics presupposes the axiom of choice. We certainly could live without it
or with a weaker form that creates fewer apparent paradoxes, but that would create
complications that most mathematicians would prefer to live without.
Sierpiński was unhappy calling this the "axiom of choice" since nothing is
chosen. Rather, this axiom asserts the existence of something that we can never
explicitly construct. In a letter to Émile Borel written in 1905, Jacques Hadamard
described this debate as centering on the distinction "between what is determined
and what can be described." Nonmeasurable sets can be determined in the sense
that they can be prescribed; they cannot be described. Hadamard goes on to com-
pare this debate to "the one which arose between Riemann and his predecessors
over the notion of function. The rule that Lebesgue demands appears to me to
resemble closely the analytic expression on which Riemann's adversaries insisted
so strongly." Here Hadamard adds a footnote:

I believe it necessary to reiterate this point, which, if I were to express myself fully, appears to
form the essence of the debate. From the invention of the infinitesimal calculus to the present,
it seems to me, the essential progress in mathematics has resulted from successively annexing
notions which, for the Greeks or the Renaissance geometers or the predecessor of Riemann,
were "outside mathematics" because it was impossible to describe them.⁵

The power of mathematical thinking is manifested precisely when we are willing
to explore promising avenues even when they lead us outside preexisting expecta-
tions. To me, the proper response to the Banach–Tarski paradox is fascination and
delight that the axiom of choice can lead us to such surprising conclusions.

Exercises
5.4.1. Prove that if are rational numbers, then

Exercises 5.4.2–5.4.7 establish the fact that if we accept the existence of a
nonmeasurable set, $M$, then every set of positive outer measure contains a nonmea-
surable set.
5.4.2. Show that any set of positive outer measure contains a bounded set of positive
outer measure.
5.4.3. Show that any set of positive outer measure contains a closed set of positive
measure. With Exercise 5.4.2, we can conclude that any set of positive outer
measure contains a closed and bounded set of positive measure.
5.4.4. Let $S$ be any closed, bounded set and let $U$ be any open set that contains $S$.
Using the fact that $S$ and $U^c$ are disjoint, closed sets, show that there is $\delta > 0$ such that
for any $x \in \mathbb{R}$ with $|x| < \delta$, the set $x + S = \{x + s \mid s \in S\}$ is contained in $U$.
5.4.5. Show that if $S$ is measurable, closed, and bounded and has positive measure,
then we can always choose the open set $U$ that contains $S$ to have measure strictly
less than $2m(S)$. Use this fact to show that if $\delta > 0$ is chosen so that $|x| < \delta$ implies
that $x + S \subseteq U$, then $(x + S) \cap S \neq \emptyset$.
5.4.6. Define $S \ominus S = \{s - t \mid s, t \in S\}$. Show that if $(x + S) \cap S \neq \emptyset$ for all $|x| < \delta$,
then $(-\delta, \delta) \subseteq S \ominus S$.
Putting this result with Exercises 5.4.2–5.4.5, show that if $S$ is measurable and has
positive measure, then $S \ominus S$ contains an open interval.
5.4.7. Let $S$ be a set with positive outer measure and define $S_q = S \cap (q + M)$,
$q \in \mathbb{Q}$. If $S_q$ is measurable with positive measure, then by Exercise 5.4.6, $S_q \ominus S_q$
contains an open interval, and therefore so does $(q + M) \ominus (q + M) = M \ominus M$.
Explain why this cannot happen. Explain why $S_q$ cannot have measure 0 for
all $q \in \mathbb{Q}$. Complete the proof that every set with positive measure contains a
nonmeasurable subset.

⁵ Translation due to G. H. Moore, Zermelo's Axiom of Choice, pp. 317–318.
Exercises 5.4.8–5.4.11 show that there exists a Jordan measurable set (and thus
a Lebesgue measurable set) that is not a Borel set. Recall that for any set $S$ in the
domain of $f$, $f(S) = \{f(s) \mid s \in S\}$.
5.4.8. Show that if $\psi$ is a continuous, strictly increasing function, then so is $\psi^{-1}$,
and any set $U$ in the domain of $\psi$ is open if and only if $\psi(U)$ is open.

5.4.9. Show that if $\psi$ is a continuous, strictly increasing function, then any set $S$
in the domain of $\psi$ is a Borel set if and only if $\psi(S)$ is a Borel set.
5.4.10. Define $\psi(x) = x + DS(x)$, where $DS$ is the Devil's staircase, Example 4.1
on p. 86. Show that $\psi$ is a continuous and strictly increasing function from
$[0, 1]$ onto $[0, 2]$. Let $C = SVC(3)$ be the Cantor ternary set on $[0, 1]$. Show
that $m(\psi(C)) = 1$, and thus $\psi(C)$ contains a nonmeasurable set, $M$.
5.4.11. Let $M$ be a nonmeasurable set contained in $\psi(C)$. Show that $\psi^{-1}(M)$ is
Jordan measurable, but it cannot be a Borel set.
6
The Lebesgue Integral

In Section 5.2, we saw that the idea behind the Lebesgue integral of $f$ is to
partition the $y$-axis, $l = l_0 < l_1 < l_2 < \cdots < l_n = L$, define
$S_j = \{x \mid l_j \le f(x) < l_{j+1}\}$, and then bound the integral by the summations
$$\sum_{j=0}^{n-1} l_j\, m(S_j) \le \int_a^b f(x)\,dx \le \sum_{j=0}^{n-1} l_{j+1}\, m(S_j).$$
As the partition of the $y$-axis gets finer, these sums will approach each other and
so approach a value for the integral. The only catch is that these sets, the $S_j$, must
be measurable.
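To see the two summations squeeze together, here is a minimal Python sketch for one function of our own choosing, $f(x) = x^2$ on $[0, 1]$, with $l = 0$, $L = 1$, and $l_j = j/n$. Because this $f$ is increasing, each $S_j$ is the interval $[\sqrt{l_j}, \sqrt{l_{j+1}})$, so its measure is known exactly; for a general function we could not compute $m(S_j)$ this way. The helper name `lebesgue_sums` is hypothetical.

```python
import math


def lebesgue_sums(n):
    """Lower and upper Lebesgue sums for f(x) = x**2 on [0, 1] with l_j = j/n.

    S_j = {x in [0, 1] : l_j <= x**2 < l_{j+1}} = [sqrt(l_j), sqrt(l_{j+1})), so
    m(S_j) = sqrt(l_{j+1}) - sqrt(l_j).  The single point x = 1, where f(x) = l_n,
    lies in no S_j, but it has measure zero.
    """
    levels = [j / n for j in range(n + 1)]
    lower = upper = 0.0
    for j in range(n):
        measure = math.sqrt(levels[j + 1]) - math.sqrt(levels[j])
        lower += levels[j] * measure      # l_j * m(S_j)
        upper += levels[j + 1] * measure  # l_{j+1} * m(S_j)
    return lower, upper


for n in (4, 16, 64):
    print(n, lebesgue_sums(n))
```

Both sums bracket $1/3$. Since $l_{j+1} - l_j = 1/n$ and the measures $m(S_j)$ add up to 1, the upper sum exceeds the lower sum by exactly $1/n$, up to floating-point rounding.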
In view of the difficulty involved in finding a nonmeasurable set, we should
expect that for reasonable functions, the $S_j$ are measurable. But there is something
to prove here. We shall call such functions measurable functions. The most
important result of the first section is that every Riemann integrable function is
measurable. We will lose nothing (and gain a great deal) by switching from the
Riemann integral to the Lebesgue integral.
Our greatest gain will be Lebesgue's dominated convergence theorem, stated
and proven in Section 6.3. Here at last we shall see a broadly applicable sufficient
condition allowing for term-by-term integration. In Section 6.4, we shall explore
the connection between measurability and uniform convergence. This will lead into
a discussion of some of the varied ways in which sequences can converge, a theme
that will be picked up and developed much further in Chapter 8.

6.1 Measurable Functions


We begin with the formal definition of a measurable function. This may not appear
to agree with the definition of the $S_j$, but that is taken care of with the next
proposition.

Definition: Measurable function


The function $f$ is measurable on the interval $[a, b]$ if for all $c \in \mathbb{R}$, the set
$\{x \in [a, b] \mid f(x) > c\}$ is measurable.

Proposition 6.1 (Equivalent Definitions of Measurability). The following statements
are equivalent:
1. for all $c \in \mathbb{R}$, $\{x \in [a, b] \mid f(x) > c\}$ is measurable,
2. for all $c \in \mathbb{R}$, $\{x \in [a, b] \mid f(x) \ge c\}$ is measurable,
3. for all $c \in \mathbb{R}$, $\{x \in [a, b] \mid f(x) < c\}$ is measurable, and
4. for all $c \in \mathbb{R}$, $\{x \in [a, b] \mid f(x) \le c\}$ is measurable.

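Proof. Since complements and countable intersections of measurable sets are
measurable and
$$\{x \mid f(x) \ge c\} = \bigcap_{n=1}^{\infty} \{x \mid f(x) > c - 1/n\}, \tag{6.1}$$
$$\{x \in [a, b] \mid f(x) < c\} = [a, b] - \{x \in [a, b] \mid f(x) \ge c\}, \tag{6.2}$$
$$\{x \mid f(x) \le c\} = \bigcap_{n=1}^{\infty} \{x \mid f(x) < c + 1/n\}, \quad \text{and} \tag{6.3}$$
$$\{x \in [a, b] \mid f(x) > c\} = [a, b] - \{x \in [a, b] \mid f(x) \le c\}, \tag{6.4}$$
it follows that statement 1 implies 2 which implies 3 which implies 4 which
implies 1.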

Corollary 6.2 (Lebesgue Sets $S_j$ Are Measurable). If $f$ is measurable on $[a, b]$,
then $\{x \in [a, b] \mid c \le f(x) < d\}$ is measurable.

Proof. The intersection of measurable sets is measurable, so
$$\{x \in [a, b] \mid c \le f(x) < d\} = \{x \in [a, b] \mid c \le f(x)\} \cap \{x \in [a, b] \mid f(x) < d\}$$
is measurable.
The next proposition shows us that simple combinations of measurable functions
are also measurable.

Proposition 6.3 (Measurable Functions Closed under $+$, $\times$, $|\cdot|$). If $f$ and $g$
are measurable functions on $[a, b]$ and if $k$ is any constant, then $kf$, $f^2$, $f + g$, $fg$,
and $|f|$ are also measurable on $[a, b]$.

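Proof. If $k = 0$, then $kf = 0$, and any constant function is measurable. If $k > 0$,
then
$$\{x \mid kf(x) > c\} = \{x \mid f(x) > c/k\}.$$
If $k < 0$, then the second set is $\{x \in [a, b] \mid f(x) < c/k\}$. If $c < 0$, then
$\{x \in [a, b] \mid f^2(x) > c\} = [a, b]$. If $c \ge 0$, then
$$\{x \in [a, b] \mid f^2(x) > c\} = \{x \in [a, b] \mid f(x) > \sqrt{c}\} \cup \{x \in [a, b] \mid f(x) < -\sqrt{c}\}.$$
If $q$ is any rational number, then
$$S_q = \{x \mid f(x) > q\} \cap \{x \mid g(x) > c - q\}$$
is measurable. I leave it as Exercise 6.1.2 to verify that
$$\{x \in [a, b] \mid f(x) + g(x) > c\} = \bigcup_{q \in \mathbb{Q}} S_q. \tag{6.5}$$
Since
$$fg = \frac{1}{2}\left((f + g)^2 - f^2 - g^2\right),$$
$fg$ is measurable. If $c < 0$, then $\{x \in [a, b] \mid |f(x)| > c\} = [a, b]$. If $c \ge 0$, then
$$\{x \in [a, b] \mid |f(x)| > c\} = \{x \in [a, b] \mid f(x) < -c\} \cup \{x \in [a, b] \mid f(x) > c\}.$$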
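The idea behind equation (6.5) is that the rationals are dense: if $f(x) + g(x) > c$, then the interval $(c - g(x), f(x))$ is nonempty, and any rational $q$ inside it places $x$ in $S_q$. Here is a minimal sketch, assuming the values $f(x)$, $g(x)$, $c$ are themselves exact rationals (Python's `fractions.Fraction`), so that the midpoint of that interval is a rational witness. The function name `rational_witness` is hypothetical.

```python
from fractions import Fraction


def rational_witness(fx, gx, c):
    """Return a rational q with fx > q and gx > c - q, given exact rationals with fx + gx > c.

    The point x then belongs to S_q = {f > q} ∩ {g > c - q}.
    """
    lo, hi = c - gx, fx          # we need lo < q < hi
    if not lo < hi:
        raise ValueError("requires f(x) + g(x) > c")
    return (lo + hi) / 2         # the midpoint of two rationals is rational


q = rational_witness(Fraction(1, 3), Fraction(2, 7), Fraction(1, 2))
print(q)  # 23/84
```

For irrational values one would instead choose any rational in the open interval; the sketch sidesteps that by working only with rational inputs.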

Limits of Measurable Functions


For the Riemann integral, the limit of a sequence of integrable functions is not
necessarily integrable (see example (4.5) on p. 100). This creates serious compli-
cations when we try to find conditions that allow term-by-term integration. As we
shall now see, limits of measurable functions are measurable.

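Proposition 6.4 (Inf and Sup of Measurable Functions). If $(f_n)$ is a sequence of
measurable functions on $[a, b]$, then the functions defined by
$$\inf_{n \ge 1} f_n(x), \qquad \sup_{n \ge 1} f_n(x), \qquad \liminf_{n \to \infty} f_n(x), \qquad \limsup_{n \to \infty} f_n(x)$$
are measurable functions.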

Definition: Almost everywhere


When we say that something happens almost everywhere, we mean that it happens
for all x in our domain except for a set of measure zero (which could be the empty
set).

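Proof. The measurability of $\inf_{n \ge 1} f_n$ and of $\sup_{n \ge 1} f_n$ follows from the
equalities
$$\{x \in [a, b] \mid \inf_{n \ge 1} f_n(x) \ge c\} = \bigcap_{n=1}^{\infty} \{x \in [a, b] \mid f_n(x) \ge c\}, \tag{6.6}$$
$$\{x \in [a, b] \mid \sup_{n \ge 1} f_n(x) > c\} = \bigcup_{n=1}^{\infty} \{x \in [a, b] \mid f_n(x) > c\}, \tag{6.7}$$
together with Proposition 6.1. The measurability of $\limsup f_n$ and of $\liminf f_n$ now
follows from the definitions:
$$\limsup_{n \to \infty} f_n(x) = \inf_{n \ge 1} \Bigl(\sup_{m \ge n} f_m(x)\Bigr), \tag{6.8}$$
$$\liminf_{n \to \infty} f_n(x) = \sup_{n \ge 1} \Bigl(\inf_{m \ge n} f_m(x)\Bigr). \tag{6.9}$$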

It follows immediately that if $(f_n(x))$ converges to $f(x)$, then $f$ is a
measurable function. Something even stronger is true. We might have a few values
of $x$ at which the sequence fails to converge or converges to something other than
$f(x)$ and still be able to conclude that $f$ is measurable. Let $S$ be the set of points
for which $\lim_{n \to \infty} f_n(x) \ne f(x)$. If the measure of $S$ is zero, then $f$ is measurable.
This idea that something happens except at points in a set of measure zero will
become so common that we give it a formal name, almost everywhere.
For example, for $x \in [a, b]$,
$$\lim_{n \to \infty} f_n(x) = f(x) \quad \text{almost everywhere}$$
means that this equality holds except possibly for values of $x \in [a, b]$ in a set of
measure zero. On this set of measure zero, the equality could fail to hold because
$\lim_{n \to \infty} f_n(x)$ does not exist, because $f(x)$ is not well defined, or because both
exist but are not equal.

Theorem 6.5 (Limit of Measurable Functions). If $(f_n)$ is a sequence of measurable
functions and $\lim_{n \to \infty} f_n(x) = f(x)$ almost everywhere in $[a, b]$, then $f$ is a
measurable function.

Proof. Recall that any set of outer measure zero is measurable (Theorem 5.5), and
thus any subset of a set of measure zero also has measure zero. There is a set $S$ of
measure 0 such that
$$\lim_{n \to \infty} f_n(x) = f(x)$$
for all $x \in [a, b] - S$. If we choose some $c \in \mathbb{R}$, then the sets
$$F_1 = \{x \in [a, b] \mid f(x) > c\} \quad \text{and} \quad F_2 = \{x \in [a, b] \mid \liminf_{n \to \infty} f_n(x) > c\}$$
are not necessarily identical, but any element that is in one but not in the other must
be in $S$.
Let
$$S_1 = F_1 - F_2 \quad \text{and} \quad S_2 = F_2 - F_1.$$
From Proposition 6.4, we know that $F_2$ is measurable. Since $S_1$ and $S_2$ are subsets
of $S$, they are also measurable. I leave it for you (Exercise 1.2.14) to show that
$$F_1 = (F_2 \cup S_1) \cap S_2^c.$$
Therefore, $F_1$ is measurable.
We now focus on the kind of function we want to use in our approximation to
the Lebesgue integral.
A function is simple if and only if it can be written as a finite linear combination
of characteristic functions of measurable sets. Simple functions admit many
different representations. For example,
$$\chi_{[0,2]} + \chi_{[1,3]} = \chi_{[0,1)} + 2\chi_{[1,2]} + \chi_{(2,3]} = 0 \cdot \chi_{\mathbb{R} - [0,3]} + \chi_{[0,1) \cup (2,3]} + 2\chi_{[1,2]}.$$
However, there is always a unique representation in the form
$$\phi(x) = \sum_{i=1}^{n} l_i\, \chi_{S_i}(x),$$
where the $l_i \in \mathbb{R}$ are distinct, $\chi_{S_i}$ is the characteristic function of $S_i$, and the sets
$S_i$ are nonempty, measurable, pairwise disjoint, and their union is the domain of $\phi$. In
dealing with a generic simple function, we shall assume that the representation we
are using is this unique representation.

Definition: Simple function


A function is called simple if its image consists of a finite number of values and
the set of points that map to each of these values is measurable.

Theorem 6.6 (Measurable Functions as Limits). A function $f$ is measurable
on $[a, b]$ if and only if it is the limit almost everywhere of simple functions on
$[a, b]$. If $f$ is measurable and bounded below, then it is the limit of a monotonically
increasing sequence of simple functions.

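Proof. I leave it for Exercise 6.1.5 to prove that simple functions are measurable.
It follows from Theorem 6.5 that the limit almost everywhere of simple functions,
if it exists, is a measurable function.
In the other direction, let $f$ be measurable. For each positive integer $n$ and for
$-n2^n \le k < n2^n$ define
$$E_{n,k} = \left\{x \in [a, b] \;\middle|\; \frac{k}{2^n} \le f(x) < \frac{k+1}{2^n}\right\}.$$
Define
$$E_{n,-\infty} = \{x \in [a, b] \mid f(x) < -n\}, \qquad E_{n,\infty} = \{x \in [a, b] \mid n \le f(x)\}.$$
Since all of the sets $E_{n,k}$, $E_{n,-\infty}$, and $E_{n,\infty}$ are measurable, we can define the
sequence of simple functions
$$\phi_n(x) = -n\,\chi_{E_{n,-\infty}}(x) + \sum_{k=-n2^n}^{n2^n - 1} \frac{k}{2^n}\,\chi_{E_{n,k}}(x) + n\,\chi_{E_{n,\infty}}(x).$$
For $n > |f(x)|$, we know that
$$0 \le f(x) - \phi_n(x) < 2^{-n},$$
so $f$ is the limit of these simple functions. If $f$ is bounded below, $f(x) \ge A$ for
all $x \in [a, b]$, then for all $n > |A|$ this sequence is monotonically increasing.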
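The construction can be traced numerically. The sketch below computes $\phi_n(x)$ from the single number $y = f(x)$: it returns $-n$ on $E_{n,-\infty}$, $n$ on $E_{n,\infty}$, and $k/2^n$ on $E_{n,k}$. Exact `Fraction` arithmetic avoids rounding in the floor; the function name `phi` is our own.

```python
import math
from fractions import Fraction


def phi(n, y):
    """Value of the n-th simple function of Theorem 6.6 at a point where f(x) = y."""
    if y < -n:
        return Fraction(-n)          # x lies in E_{n,-inf}
    if y >= n:
        return Fraction(n)           # x lies in E_{n,+inf}
    k = math.floor(y * 2**n)         # k/2^n <= y < (k+1)/2^n, so x lies in E_{n,k}
    return Fraction(k, 2**n)


y = Fraction(3, 10)
print([str(phi(n, y)) for n in range(1, 6)])  # ['0', '1/4', '1/4', '1/4', '9/32']
```

The values increase toward $3/10$, and since $n > |y|$ from the start, each one lies within $2^{-n}$ of $y$, as the proof asserts.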

Corollary 6.7 (Nonnegative Measurable Functions as Monotonic Limits). A
nonnegative measurable function on $[a, b]$ is the limit of a monotonically increasing
sequence of simple functions.

Note that f does not need to be a bounded function. For the Riemann integral,
we had to twist ourselves in knots to handle unbounded functions. For the Lebesgue
integral, unbounded functions present no special problems.

Farewell to the Riemann Integral


We now prove that every Riemann integrable function is measurable. In the next sec-
tion, we shall see that every bounded measurable function is Lebesgue integrable.

Lebesgue integration completely subsumes all integrals that can be defined us-
ing Riemann's definition and, fortunately, when they both exist, the values of the
Riemann and Lebesgue integrals are the same.

Theorem 6.8 (Riemann Integrable $\Rightarrow$ Measurable). Every Riemann integrable
function on $[a, b]$ is measurable on $[a, b]$.

Proof. Let $f$ be a Riemann integrable function on $[a, b]$. For each positive integer
$n$, we define $P_n$ as the partition of $[a, b]$ into $2^n$ equal intervals of length $(b - a)/2^n$.
Let $I_{n,k}$ be the $k$th interval of this partition,
$$I_{n,k} = \left[a + (b - a)\frac{k - 1}{2^n},\; a + (b - a)\frac{k}{2^n}\right), \qquad 1 \le k \le 2^n$$
(the last interval, $k = 2^n$, is taken to be closed so that it contains $b$), and let
$m_{n,k} = \inf_{x \in I_{n,k}} f(x)$. We define the simple function
$$\psi_n = \sum_{k=1}^{2^n} m_{n,k}\, \chi_{I_{n,k}}.$$
Each of our partitions is a refinement of the previous partition, and therefore
$\psi_n(x) \ge \psi_{n-1}(x)$. For each $x \in [a, b]$, $(\psi_n(x))$ forms an increasing
sequence. Since $f$ is Riemann integrable, it must be bounded, and therefore this
sequence converges,
$$\lim_{n \to \infty} \psi_n(x) = \psi(x) \le f(x).$$
All that remains is to prove that $\psi(x) = f(x)$ almost everywhere. Choose any
$x \in [a, b]$. For each $n$, choose $k$ so that $x \in I_{n,k}$. The oscillation of $f$ over $I_{n,k}$ is
$$\omega(f; I_{n,k}) \ge f(x) - m_{n,k} = f(x) - \psi_n(x) \ge f(x) - \psi(x).$$
Every open interval that contains $x$ will contain $I_{n,k}$ for some $n$, $k$, and therefore
the oscillation of $f$ at $x$ is at least $f(x) - \psi(x)$. Therefore, the set of points for
which $\lim_{n \to \infty} \psi_n(x) \ne f(x)$ is contained in the set of points at which $f$ is not
continuous.
To complete the proof, we shall show that if $f$ is Riemann integrable, then the
set of points at which it is not continuous has measure zero. In other words, if a
function is Riemann integrable, then it is continuous almost everywhere.
In Theorem 2.5, we saw that a function is Riemann integrable if and only if
for each $\alpha > 0$ the set $S_\alpha$, the set of points at which the oscillation is at least $\alpha$,
has outer content zero. Since outer measure is always less than or equal to outer
content, the measure of $S_\alpha$ is zero. The set of points at which the oscillation is
positive is the union
$$\bigcup_{k=1}^{\infty} S_{1/k}.$$
This is a countable union of sets of measure zero, so it also has measure zero.
We have shown that $\lim_{n \to \infty} \psi_n(x) = f(x)$ almost everywhere. Since each $\psi_n$ is
measurable, Theorem 6.5 implies that $f$ is measurable.
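The lower step functions $\psi_n$ of this proof are easy to compute when $f$ is nondecreasing, because then the infimum of $f$ over a cell $[\text{left}, \text{right})$ is $f(\text{left})$. The sketch below assumes exactly that; the names `psi` and `step` are ours. For a step function that jumps at $1/3$ (not a dyadic point), $\psi_n(1/3)$ stays at 0 for every $n$ while $f(1/3) = 1$, so the limit $\psi$ differs from $f$ only at the discontinuity, as the proof predicts.

```python
from fractions import Fraction


def psi(f, a, b, n, x):
    """psi_n(x): infimum of f over the cell of the 2^n-cell partition of [a, b] containing x.

    Assumes f is nondecreasing, so the infimum over a cell [left, right) is f(left).
    """
    width = (b - a) / 2**n
    k = min(int((x - a) // width), 2**n - 1)   # cell index; x = b falls in the last cell
    return f(a + k * width)


def step(t):
    """A hypothetical nondecreasing function, discontinuous only at t = 1/3."""
    return 1 if t >= Fraction(1, 3) else 0


a, b = Fraction(0), Fraction(1)
print([psi(step, a, b, n, Fraction(1, 3)) for n in range(1, 8)], step(Fraction(1, 3)))
print([str(psi(lambda t: t * t, a, b, n, Fraction(7, 10))) for n in range(1, 5)])
```

For the continuous function $t^2$, the values $\psi_n(7/10)$ increase toward $49/100$.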

In proving that every Riemann integrable function is measurable, we discovered


that every Riemann integrable function is continuous almost everywhere. Lebesgue
realized that this is also true in the other direction, yielding a simple characterization
of Riemann integrable functions.

Theorem 6.9 (Lebesgue's Characterization of Riemann Integrability). A
bounded function defined on a closed and bounded interval is Riemann integrable
if and only if it is continuous almost everywhere.

Proof. We confirmed one direction in the previous proof. All that remains is to
show that if $f$ is bounded and continuous almost everywhere, then it is Riemann
integrable. Again let $S_\alpha$ be the set of points at which the oscillation is greater than
or equal to $\alpha > 0$. By Theorem 2.5, we need to show that this set has outer content
zero. Since it is a subset of the set of points at which $f$ is discontinuous, we know
that $S_\alpha$ has measure zero. The problem is that $m(S_\alpha) \le c_e(S_\alpha)$, and we need to
show that $c_e(S_\alpha) = 0$. We shall need to be clever.
The fact that $m(S_\alpha) = 0$ means that for any $\epsilon > 0$, we can find a countable open
cover of $S_\alpha$ for which the sum of the lengths of the intervals is less than $\epsilon$. If we
can show that $S_\alpha$ is closed, then we can use the Heine–Borel theorem to conclude
that there is a finite subcover of $S_\alpha$ for which the sum of the lengths of the intervals
is less than $\epsilon$. This is exactly what we need to conclude that $c_e(S_\alpha) = 0$.
Our proof has come down to showing that $S_\alpha$ is closed. We will show that its
complement is open. If $c \in S_\alpha^c$, then the oscillation at $c$ is strictly less than $\alpha$,
$$\omega(f; c) = \limsup_{x \to c} f(x) - \liminf_{x \to c} f(x) < \alpha.$$
Let $\delta = (\alpha - \omega(f; c))/3$. By the definition of lim sup and lim inf, we can find an open
neighborhood of $c$ in which
$$\liminf_{x \to c} f(x) - \delta < f(x) < \limsup_{x \to c} f(x) + \delta.$$
The distance between these upper and lower bounds is
$$\Bigl(\limsup_{x \to c} f(x) + \delta\Bigr) - \Bigl(\liminf_{x \to c} f(x) - \delta\Bigr) = \omega(f; c) + 2\delta = \frac{1}{3}\,\omega(f; c) + \frac{2}{3}\,\alpha < \alpha.$$
This open neighborhood of $c$ is entirely contained in $S_\alpha^c$. Since we can find such a
neighborhood for any element of $S_\alpha^c$, this set is open, and $S_\alpha$ is closed.

There is a delicious irony here. Riemann introduced his definition of the integral
for the purpose of understanding how discontinuous a function could be and still be
integrable. It appeared that it could be very discontinuous, having discontinuities
at all rational numbers. In fact, there are Riemann integrable functions with dis-
continuities at the points in a set with cardinality c. Now that we are finally putting
the Riemann integral behind us, we get the answer that Riemann was seeking. A
Riemann integrable function is always a very continuous function. It is continuous
almost everywhere. A function that is discontinuous only at the rational numbers
is not very discontinuous.
The Lebesgue integral enables us to handle truly discontinuous functions. Dirich-
let's function, the characteristic function of the rationals, was created to show that
a function could be so discontinuous that it would make no sense to talk about
its integral. This was considered a function beyond the pale. Yet, as we shall see,
the Lebesgue integral of this function has a simple and natural meaning. Over any
interval, the integral of this function is the measure of the set of rationals in that
interval, which is zero.

Exercises
6.1.1. Using the definition of a measurable function, show that any constant func-
tion is measurable.
6.1.2. Prove equation (6.5).
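6.1.3. Show that
$$\chi_{S \cap T} = \chi_S \cdot \chi_T,$$
$$\chi_{S \cup T} = \chi_S + \chi_T - \chi_S \cdot \chi_T,$$
$$\chi_{S^c} = 1 - \chi_S.$$
6.1.4. Let $(S_n)$ be a sequence of sets. Prove the equality of the characteristic
function of the infimum of these sets (defined in Exercise 5.3.15) and the lim inf
of the characteristic functions of these sets,
$$\chi_{\liminf_{n \to \infty} S_n} = \liminf_{n \to \infty} \chi_{S_n}.$$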

6.1.5. Prove that every simple function is measurable.


6.1.6. Show that if $\phi$ is a simple function, then it is possible to find a representation,
$$\phi = \sum_{i=1}^{n} l_i\, \chi_{S_i},$$
for which the sets $S_i$ are both measurable and pairwise disjoint.
6.1.7. Prove that any sum or product of finitely many simple functions is a simple
function.
6.1.8. If $|f|$ is measurable, does it necessarily follow that $f$ is measurable?
6.1.9. Let $f$ be a real-valued function defined on $\mathbb{R}$. Show that the condition
$$\{x \in [a, b] \mid f(x) = c\} \text{ is measurable for all } c \in \mathbb{R}$$
is not enough to guarantee that $f$ is measurable on $[a, b]$.
6.1.10. Let $S$ be a dense subset of $\mathbb{R}$. Show that $f$ is measurable on the interval
$[a, b]$ if and only if $\{x \in [a, b] \mid f(x) > c\}$ is measurable for every $c \in S$.
6.1.11. Show that a real-valued function $f$ defined on $[a, b]$ is measurable if and
only if $f^{-1}(U)$ is measurable for every open set $U \subseteq \mathbb{R}$.
6.1.12. Show that if a real-valued function $f$ defined on $\mathbb{R}$ is measurable, then
$f^{-1}(B)$ is measurable for every Borel set $B \subseteq \mathbb{R}$.
6.1.13. Prove that any continuous function defined on [a, b] is measurable.
6.1.14. Prove that if f is measurable and f = g almost everywhere, then g is
measurable.
6.1.15. Assume that $f$ is continuous on $[a, b]$. Show that $f$ satisfies the condition
$$S \subseteq [a, b] \text{ and } m(S) = 0 \implies m(f(S)) = 0$$
if and only if
for any measurable set $M \subseteq [a, b]$, its image, $f(M)$, is measurable.
6.1.16. Show that if $g$ is measurable on $I = [a, b]$ and $f$ is continuous on $g(I)$,
then $f \circ g$ is measurable on $I$.
6.1.17. Suppose that $g$ is continuous on $I = [a, b]$ and $h$ is measurable on $g(I)$.
Does it necessarily follow that $h \circ g$ is measurable on $I$?
6.1.18. Suppose that $g$ is measurable and $f$ satisfies the condition that for every
open set $U$, the inverse image $f^{-1}(U)$ is a Borel set. Show that $f \circ g$ is measurable.
6.1.19. Give an example of a measurable function whose inverse is not measurable.
6.1.20. Let $f$ be differentiable on $[a, b]$. Show that its derivative, $f'$, is measurable
on $[a, b]$.

6.2 Integration
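We begin with the definition of the Lebesgue integral of a simple function and
prove a few of the properties we would expect of an integral.

Definition: Lebesgue integral of simple function

Given a simple function
$$\phi = \sum_{i=1}^{m} l_i\, \chi_{S_i},$$
where the $l_i$ are distinct, the $S_i$ are pairwise disjoint, and $\bigcup_{i=1}^{m} S_i$ is the domain of
$\phi$, we define its Lebesgue integral over the measurable set $E$ to be
$$\int_E \phi(x)\,dx = \sum_{i=1}^{m} l_i\, m(S_i \cap E).$$

Proposition 6.10 (Properties of Lebesgue Integral, Simple Functions). Let $\phi$
and $\psi$ be simple functions, $c \in \mathbb{R}$, and $E = E_1 \cup E_2$, where $E_1$ and $E_2$ are disjoint
measurable sets. The following properties hold:
1. $\int_E c\,\phi(x)\,dx = c \int_E \phi(x)\,dx$,
2. $\int_E (\phi(x) + \psi(x))\,dx = \int_E \phi(x)\,dx + \int_E \psi(x)\,dx$,
3. if $\phi(x) \le \psi(x)$ for all $x \in E$, then $\int_E \phi(x)\,dx \le \int_E \psi(x)\,dx$, and
4. $\int_E \phi(x)\,dx = \int_{E_1} \phi(x)\,dx + \int_{E_2} \phi(x)\,dx$.

The last of these statements may look a little unusual. It is simply a generalization
of the identity
$$\int_a^c f(x)\,dx = \int_a^b f(x)\,dx + \int_b^c f(x)\,dx.$$
Proof. We begin by setting
$$\phi = \sum_{i=1}^{m} l_i\, \chi_{S_i}, \qquad \psi = \sum_{j=1}^{n} k_j\, \chi_{T_j},$$
where the $l_i$ are distinct, the $k_j$ are distinct, the $S_i$ are pairwise disjoint, the $T_j$ are
pairwise disjoint, and $\bigcup_{i=1}^{m} S_i = \bigcup_{j=1}^{n} T_j$ is the common domain of $\phi$ and $\psi$, which
contains $E$.
1. If $c = 0$, both sides are 0. Otherwise the $c\,l_i$ are distinct, so
$$\int_E c\,\phi(x)\,dx = \sum_{i=1}^{m} c\,l_i\, m(S_i \cap E) = c \sum_{i=1}^{m} l_i\, m(S_i \cap E) = c \int_E \phi(x)\,dx.$$
2. The sum $\phi + \psi$ is a simple function given by
$$\phi + \psi = \sum_{i=1}^{m} \sum_{j=1}^{n} (l_i + k_j)\, \chi_{S_i \cap T_j}.$$
The values $l_i + k_j$ need not be distinct, but collecting terms with equal values only
merges disjoint sets, and $m$ is additive on disjoint sets, so the integral may be
computed from this representation. It follows that
$$\begin{aligned}
\int_E (\phi(x) + \psi(x))\,dx &= \sum_{i=1}^{m} \sum_{j=1}^{n} (l_i + k_j)\, m(S_i \cap T_j \cap E) \\
&= \sum_{i=1}^{m} l_i \sum_{j=1}^{n} m(S_i \cap T_j \cap E) + \sum_{j=1}^{n} k_j \sum_{i=1}^{m} m(S_i \cap T_j \cap E) \\
&= \sum_{i=1}^{m} l_i\, m(S_i \cap E) + \sum_{j=1}^{n} k_j\, m(T_j \cap E) \\
&= \int_E \phi(x)\,dx + \int_E \psi(x)\,dx.
\end{aligned}$$
3. The assumption $\phi \le \psi$ for all $x \in E$ implies that we can rewrite our functions
as
$$\phi = \sum_{i=1}^{m} \sum_{j=1}^{n} l_i\, \chi_{S_i \cap T_j}, \qquad \psi = \sum_{i=1}^{m} \sum_{j=1}^{n} k_j\, \chi_{S_i \cap T_j},$$
and for each pair $i, j$ for which $S_i \cap T_j \cap E \ne \emptyset$, we have $l_i \le k_j$. It follows
that
$$\int_E \phi(x)\,dx = \sum_{i=1}^{m} \sum_{j=1}^{n} l_i\, m(S_i \cap T_j \cap E) \le \sum_{i=1}^{m} \sum_{j=1}^{n} k_j\, m(S_i \cap T_j \cap E) = \int_E \psi(x)\,dx.$$
4. Since $E_1$ and $E_2$ are disjoint,
$$\begin{aligned}
\int_E \phi(x)\,dx &= \sum_{i=1}^{m} l_i\, m(S_i \cap E) = \sum_{i=1}^{m} l_i \bigl(m(S_i \cap E_1) + m(S_i \cap E_2)\bigr) \\
&= \sum_{i=1}^{m} l_i\, m(S_i \cap E_1) + \sum_{i=1}^{m} l_i\, m(S_i \cap E_2) \\
&= \int_{E_1} \phi(x)\,dx + \int_{E_2} \phi(x)\,dx.
\end{aligned}$$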
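These definitions can be exercised on concrete simple functions. In the sketch below, which rests on our own simplifying assumption that every measurable set is a finite union of disjoint half-open intervals $[lo, hi)$, a simple function is a list of (coefficient, intervals) terms; the hypothetical helper `canonical` rewrites it in the unique form with distinct values and disjoint sets, and `integral` applies the definition $\int_E \phi = \sum l_i\, m(S_i \cap E)$. The example is $\chi_{[0,2)} + \chi_{[1,3)}$, a half-open version of the earlier $\chi_{[0,2]} + \chi_{[1,3]}$ that differs only on a set of measure zero.

```python
from collections import defaultdict


def overlap(intervals, E):
    """m(S ∩ E) when S and E are each finite unions of disjoint half-open intervals."""
    return sum(max(0, min(b, d) - max(a, c)) for a, b in intervals for c, d in E)


def canonical(terms):
    """Rewrite sum(c * chi_S) as {value: intervals} with distinct values and disjoint sets."""
    points = sorted({p for _, ivs in terms for iv in ivs for p in iv})
    pieces = defaultdict(list)
    for lo, hi in zip(points, points[1:]):
        mid = (lo + hi) / 2                      # the function is constant on [lo, hi)
        value = sum(c for c, ivs in terms if any(a <= mid < b for a, b in ivs))
        pieces[value].append((lo, hi))
    return dict(pieces)


def integral(terms, E):
    """Lebesgue integral over E, computed from the canonical representation."""
    return sum(value * overlap(ivs, E) for value, ivs in canonical(terms).items())


phi = [(1, [(0, 2)]), (1, [(1, 3)])]   # chi_[0,2) + chi_[1,3)
psi = [(-2, [(1, 4)])]                 # -2 * chi_[1,4)
print(canonical(phi))                  # {1: [(0, 1), (2, 3)], 2: [(1, 2)]}
# phi + psi (list concatenation) represents the sum of the two functions
print(integral(phi, [(0, 3)]), integral(psi, [(0, 3)]), integral(phi + psi, [(0, 3)]))
```

The last line prints `4 -4 0`, a small instance of property 2.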

We have one more result to prove about integrals of simple functions. Like many
of our results, it applies to monotonic sequences.
Proposition 6.11 (Monotone Convergence, Simple Functions). Let $(\phi_n)$ be a
monotonically increasing sequence of nonnegative simple functions. If there is a
finite $A$ for which $\int_E \phi_n(x)\,dx < A$ for all $n$, then $(\phi_n(x))$ converges to a
finite-valued function $f$ almost everywhere.

Proof. Since every bounded increasing sequence converges, the conclusion of this
proposition is equivalent to the statement that $U$, the set of $x \in E$ for which
$(\phi_n(x))$ is unbounded, is a set of measure zero. Choose any $\epsilon > 0$, and define
$$E_n = \{x \in E \mid \phi_n(x) > A/\epsilon\}.$$
Since $\phi_n$ is nonnegative,
$$\frac{A}{\epsilon}\, m(E_n) \le \int_{E_n} \phi_n(x)\,dx \le \int_E \phi_n(x)\,dx < A.$$
Therefore, $m(E_n) < \epsilon$. Since $(\phi_n)$ is monotonically increasing, we see that
$E_1 \subseteq E_2 \subseteq \cdots$, and $U$ is contained in the union of the $E_n$ (see Exercise 6.2.11). By
Corollary 5.12,
$$m(U) \le m\Bigl(\bigcup_{n=1}^{\infty} E_n\Bigr) = \lim_{n \to \infty} m(E_n) \le \epsilon.$$
Since this is true for every $\epsilon > 0$, $m(U) = 0$.

Integration of Measurable Functions


We are almost ready to define the integral of a measurable function. The definition
that we give will need to apply to unbounded functions. If a function is bounded
below and unbounded above, we can do this easily. If it is unbounded both below
and above, we run into potential problems of offsetting infinities. To avoid such
problems, we restrict our attention to nonnegative functions, $f(x) \ge 0$. We shall
first define the integral of a nonnegative function. If $f$ is measurable but takes on
both positive and negative values, we can write $f$ as a difference of two nonnegative
functions:
$$f = f^+ - f^-, \quad \text{where} \tag{6.10}$$
$$f^+(x) = \max\{0, f(x)\}, \tag{6.11}$$
$$f^-(x) = \max\{0, -f(x)\}. \tag{6.12}$$
We saw in Theorem 6.6 that any nonnegative measurable function is the limit of a
monotonically increasing sequence of simple functions. We define the integral of $f$ in
terms of the integrals of simple functions.

Definition: Lebesgue integral of measurable function


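Given a nonnegative measurable function $f$ and a measurable set $E$, we define
$$\int_E f(x)\,dx = \sup \int_E \phi(x)\,dx, \tag{6.13}$$
where the supremum is taken over all simple functions $\phi$ for which $\phi(x) \le f(x)$
for all $x \in E$. For all other measurable functions, we say that $f$ is integrable if it
is measurable and both $\int_E f^+(x)\,dx$ and $\int_E f^-(x)\,dx$ are finite. If at least one of
them is finite, we define
$$\int_E f(x)\,dx = \int_E f^+(x)\,dx - \int_E f^-(x)\,dx. \tag{6.14}$$
If they both are infinite, then $\int_E f(x)\,dx$ does not exist.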

Note that an integral might have the value $+\infty$ or $-\infty$ even though the function
is not integrable. This is analogous to the situation of a sequence that diverges to
infinity. Such a sequence does not converge, but to say that this sequence approaches
$+\infty$ or that it approaches $-\infty$ still says something meaningful.
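Unbounded functions fit this definition without special treatment. As an example of our own choosing, take $f(x) = x^{-1/2}$ on $(0, 1]$ (the value at $0$ is irrelevant, since $\{0\}$ has measure zero) and integrate the simple functions $\phi_n$ of Theorem 6.6 exactly. For this $f$, the set $E_{n,k}$ is the interval $((2^n/(k+1))^2, (2^n/k)^2]$ and $E_{n,\infty} = \{f \ge n\} = (0, 1/n^2]$. The sketch (hypothetical name `integral_phi_n`) shows the integrals climbing toward 2, the value of the improper Riemann integral; the gap is at least $1/n$, the integral of $f - n$ over $(0, 1/n^2]$.

```python
from fractions import Fraction


def integral_phi_n(n):
    """Exact integral over (0, 1] of the n-th simple function of Theorem 6.6 for f(x) = x**(-1/2)."""
    total = Fraction(n) * Fraction(1, n * n)      # value n on {f >= n} = (0, 1/n^2]
    scale = 2**n
    # f >= 1 on (0, 1], so only 2^n <= k < n*2^n give nonempty sets E_{n,k}
    for k in range(scale, n * scale):
        measure = Fraction(scale, k) ** 2 - Fraction(scale, k + 1) ** 2
        total += Fraction(k, scale) * measure     # value k/2^n on E_{n,k}
    return total


for n in range(1, 7):
    print(n, float(integral_phi_n(n)))
```

The sequence starts at $\int \phi_1 = 1$ and stays below 2, with $2 - \int \phi_n$ lying between $1/n$ and $1/n + 2^{-n}$.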
The following proposition follows immediately from the definition of the
Lebesgue integral.

Proposition 6.12 (Integral over Set of Measure Zero). If $f$ is any measurable
function and $m(E) = 0$, then
$$\int_E f(x)\,dx = 0. \tag{6.15}$$
If $E = E_1 \cup E_2$, where $E_1$ and $E_2$ are disjoint measurable sets, then
$$\int_E f(x)\,dx = \int_{E_1} f(x)\,dx + \int_{E_2} f(x)\,dx. \tag{6.16}$$
Combining these results, we see that if $f = g$ almost everywhere, then
$$\int_E f(x)\,dx = \int_E g(x)\,dx. \tag{6.17}$$

Because of this proposition, if we wish to integrate $f(x) = \lim_{n \to \infty} f_n(x)$ over
a set $E$ and discover that the limit does not exist on a subset of measure zero, we
can safely ignore those values of $x$ at which the limit does not exist. No matter
how we choose to define $f$ at the points where the limit does not exist, it will not
change the value of the integral of $f$.

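Proposition 6.13 (Null Integral $\Leftrightarrow$ Zero AE). Let $f$ be a nonnegative
measurable function on the measurable set $E$. Then $f = 0$ almost everywhere if and
only if
$$\int_E f(x)\,dx = 0.$$

Proof. We first assume that the integral is zero. We set
$$E_n = \{x \in E \mid f(x) > 1/n\}.$$
Since $f$ is nonnegative, the simple function $\frac{1}{n}\chi_{E_n}$ lies below $f$ on $E$, and we see that
$$0 = \int_E f(x)\,dx \ge \int_E \frac{1}{n}\,\chi_{E_n}(x)\,dx = \frac{1}{n}\, m(E_n),$$
and therefore $m(E_n) = 0$. Since
$$\{x \in E \mid f(x) > 0\} = \bigcup_{n=1}^{\infty} E_n \quad \text{and} \quad E_1 \subseteq E_2 \subseteq \cdots,$$
we can invoke Corollary 5.12 to conclude that
$$m(\{x \in E \mid f(x) > 0\}) = \lim_{n \to \infty} m(E_n) = 0.$$
The other direction follows from Proposition 6.12.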
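The estimate at the heart of this proof is that for $t > 0$ the simple function $t\,\chi_{\{f > t\}}$ lies below $f$, so $t\, m(\{x \in E \mid f(x) > t\}) \le \int_E f(x)\,dx$ (here with $t = 1/n$). A minimal sketch checking this for one nonnegative simple function, stored by our own convention as a dictionary mapping each value $l_i$ to $m(S_i \cap E)$; the names `integral` and `measure_above` are hypothetical.

```python
from fractions import Fraction


def integral(levels):
    """Integral over E of a simple function given as {value l_i: m(S_i ∩ E)}, S_i disjoint."""
    return sum((value * size for value, size in levels.items()), Fraction(0))


def measure_above(levels, t):
    """m({x in E : phi(x) > t}) for the same representation."""
    return sum((size for value, size in levels.items() if value > t), Fraction(0))


# Hypothetical example: phi = 0 on measure 1/2, 3 on measure 1/4, 10 on measure 1/8.
phi = {Fraction(0): Fraction(1, 2), Fraction(3): Fraction(1, 4), Fraction(10): Fraction(1, 8)}
for t in (Fraction(1, 2), Fraction(1), Fraction(5)):
    # t * chi_{phi > t} lies below phi, so t * m({phi > t}) <= integral of phi
    print(t, t * measure_above(phi, t), integral(phi))
```

If the integral is 0, the same inequality forces $m(\{\phi > 1/n\}) = 0$ for every $n$, which is the first half of Proposition 6.13 in miniature.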

The Monotone Convergence Theorem


We now get our first result that enables us to interchange integration and a limit.
This is still a good deal weaker than the Arzelà–Osgood theorem, but it is an
important first step. In the years following Lebesgue's publication of his new
integral, many mathematicians studied it, discovering new properties and better
proofs of the fundamental relationships. One of these was Beppo Levi (1875–1961),
who published five papers on the Lebesgue integral in 1906. Levi was born
in Torino (Turin), where he also studied. His first professorship came in 1906 at
the University of Cagliari on the island of Sardinia. He went on to the University
of Parma in 1910 and then to the University of Bologna in 1928. He was fired in
1938 because of his Jewish heritage and took the position of director of the newly
created Universidad del Litoral in Rosario, Argentina, where he remained until
his death. Levi's primary work was in algebraic geometry, but he made important
contributions to our understanding of the Lebesgue integral. The following theorem
was first proven by Levi in 1906. Its proof is fairly complicated, but it has many
immediate corollaries and provides a very efficient route to Lebesgue's dominated
convergence theorem.
Figure 6.1. Monotonically increasing sequences of simple functions.

Theorem 6.14 (Monotone Convergence). Let $(f_n)$ be a monotonically increasing
sequence of nonnegative measurable functions. If there is a finite $A$ for which
$\int_E f_n(x)\,dx < A$ for all $n$, then $(f_n(x))$ converges to a finite-valued function $f$
almost everywhere, $f$ is integrable, and
$$\int_E f(x)\,dx = \lim_{n \to \infty} \int_E f_n(x)\,dx. \tag{6.18}$$

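Proof. We need to establish that $(f_n(x))$ is bounded almost everywhere. If, for
a given $x$, $(f_n(x))$ is bounded, then the sequence converges. We know from
Theorem 6.5 that on the set of $x$ where we have convergence, $f = \lim_{n \to \infty} f_n$ is
measurable. Since $f$ is nonnegative, showing that $\int_E f(x)\,dx$ is finite is all that
remains to be shown before we can conclude that $f$ is integrable. There are three
parts to our proof. First, we show that $U$, the set of $x \in E$ for which $(f_n(x))$ is
unbounded, is a set of measure zero. Next, we demonstrate the fairly easy inequality
$\lim_{n \to \infty} \int_E f_n(x)\,dx \le \int_E f(x)\,dx$. Finally, we tackle the more difficult inequality,
$\int_E f(x)\,dx \le \lim_{n \to \infty} \int_E f_n(x)\,dx$.
For the first part of the proof, we want to be able to use Proposition 6.11. We
know from Corollary 6.7 that each $f_n$ is the limit of a monotonically increasing
sequence of simple functions, $\phi_{n,k} \nearrow f_n$ (see Figure 6.1). We define a new sequence
of simple functions, $(\phi_n)$, by
$$\phi_n = \max_{1 \le j,k \le n} \phi_{j,k}.$$
By Proposition 6.4, $\phi_n$ is measurable. (The maximum is just the supremum taken
over a finite set.) We recognize that
$$\phi_n = \max_{1 \le j,k \le n} \phi_{j,k} \ge \max_{1 \le j,k \le n-1} \phi_{j,k} = \phi_{n-1},$$
so this is a monotonically increasing sequence. Since $\phi_{j,k} \le f_j \le f_n$ whenever
$j \le n$, we have $\phi_n \le f_n$, so the integrals
$$\int_E \phi_n(x)\,dx \le \int_E f_n(x)\,dx < A$$
are bounded.
If $x$ is a point at which $(f_n(x))$ is unbounded, then for any $M$, we can find an $n$ for
which $f_n(x) > M + 1$. Since $\phi_{n,k}(x) \nearrow f_n(x)$, we can find a $k$ for which $\phi_{n,k}(x) > M$.
This implies that $\phi_{\max\{n,k\}}(x) > M$, and so the sequence $(\phi_n(x))$ is also unbounded.
By Proposition 6.11, the set of $x$ on which $(\phi_n(x))$ is unbounded has measure
zero, and therefore so does $U$. By Proposition 6.12, we can define $f$ however we wish
on $U$. We choose to define $f(x) = 0$ for $x \in U$. This concludes the first part of the proof.
For the second part, we observe that if $g$ and $h$ are nonnegative measurable
functions and $g(x) \le h(x)$ for all $x \in E$, then the set of simple functions that are
less than or equal to $g$ is contained in the set of simple functions that are less than
or equal to $h$. Therefore,
$$\int_E g(x)\,dx = \sup_{\phi \le g} \int_E \phi(x)\,dx \le \sup_{\phi \le h} \int_E \phi(x)\,dx = \int_E h(x)\,dx. \tag{6.19}$$
From Proposition 6.12, if $g(x) \le h(x)$ except possibly for $x \in U$, where $m(U) = 0$,
then
$$\int_E g(x)\,dx = \int_{E - U} g(x)\,dx \le \int_{E - U} h(x)\,dx = \int_E h(x)\,dx.$$
Therefore, functional inequalities that hold almost everywhere imply integral
inequalities that hold on all of $E$. Since $f_n(x) \le f(x)$ for all $x \in E - U$, it follows that
for all $n$, $\int_E f_n(x)\,dx \le \int_E f(x)\,dx$. By (6.19), the integrals $\int_E f_n(x)\,dx$ form an
increasing sequence bounded by $A$, so their limit exists and
$$\lim_{n \to \infty} \int_E f_n(x)\,dx \le \int_E f(x)\,dx. \tag{6.20}$$
For the third part of the proof, we begin with any simple function less than
or equal to $f$, $\phi = \sum_{i=1}^{k} l_i\, \chi_{S_i} \le f$, where the $S_i$ are pairwise disjoint and
$\bigcup S_i = E$. Because $f \ge 0$, replacing $\phi$ by $\max\{\phi, 0\}$ keeps it below $f$ and does
not decrease its integral, so we may assume that every $l_i \ge 0$. Choose an $\alpha$ between
0 and 1, and define the sets
$$A_n = \{x \in E \mid f_n(x) \ge \alpha\,\phi(x)\}.$$
Each $A_n$ is a measurable set (explain why in Exercise 6.2.5) and
$$A_1 \subseteq A_2 \subseteq \cdots, \qquad \bigcup_{n=1}^{\infty} A_n = E.$$
By Corollary 5.12,
$$\lim_{n \to \infty} m(S_i \cap A_n) = m(S_i \cap E).$$
Choose an $\epsilon > 0$. For each set $S_i$ we can find an $N_i$ so that $n \ge N_i$ implies that
$m(S_i \cap A_n) \ge (1 - \epsilon)\, m(S_i \cap E)$. Let $N = \max\{N_1, N_2, \ldots, N_k\}$.
We now see that for $n \ge N$, since $f_n \ge \alpha\,\phi\,\chi_{A_n}$ on $E$,
$$\begin{aligned}
\int_E f_n(x)\,dx &\ge \int_E \alpha\,\phi(x)\,\chi_{A_n}(x)\,dx = \alpha \sum_{i=1}^{k} l_i\, m(S_i \cap A_n) \\
&\ge \alpha \sum_{i=1}^{k} l_i\, (1 - \epsilon)\, m(S_i \cap E) = \alpha (1 - \epsilon) \int_E \phi(x)\,dx.
\end{aligned}$$
Since this is true for every $\epsilon > 0$ and for every $\alpha$ between 0 and 1, it follows that
$$\int_E \phi(x)\,dx \le \lim_{n \to \infty} \int_E f_n(x)\,dx.$$
We have shown that every simple function $\phi \le f$ has an integral that is dominated
by $\lim_{n \to \infty} \int_E f_n(x)\,dx$. We can conclude that
$$\lim_{n \to \infty} \int_E f_n(x)\,dx \ge \sup_{\phi \le f} \int_E \phi(x)\,dx = \int_E f(x)\,dx. \tag{6.21}$$
Together with (6.20), this proves (6.18); in particular, $\int_E f(x)\,dx \le A$ is finite, so $f$
is integrable.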
It took some work to prove this theorem, but we are rewarded with four important
corollaries.

Corollary 6.15 (Properties of Lebesgue Integral). Let $f$, $g$ be integrable functions
and $c$ any constant. It follows that $|f|$, $cf$, and $f + g$ are integrable and
$$\int_E cf(x)\,dx = c \int_E f(x)\,dx, \tag{6.22}$$
$$\int_E (f(x) + g(x))\,dx = \int_E f(x)\,dx + \int_E g(x)\,dx. \tag{6.23}$$

Proof We know that I fi, cf, and f + g are measurable. Since the integrals of

f f f
6.2 Integration 177

To show that cf and f + g are integrable, we need to establish that the integrals
of (cf), (f + g) are finite. We see that

(cf) +
(f + (f + +
Since these functions are nonnegative and we know that the integrals of the larger
functions are all finite, so are the integrals of the smaller functions.
We first establish equations (6.22) and (6.23) for f, g nonnegative, C? 0. If
c = 0, then equation (6.22) is trivially true. Let be a monotonically increas-
ing sequence of simple functions that converges to f. It follows that is a
monotonically increasing sequence of simple functions that converges to cf. By
Theorem 6.14,

f cf(x) dx = lim f
E E
c dx = lim c fE dx = c
f f(x) dx.
E

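If $(\psi_n)$ is a monotonically increasing sequence of simple functions that converges to $g$, then $(\phi_n + \psi_n)$ is a monotonically increasing sequence of simple functions that converges to $f + g$. Again by Theorem 6.14,
$$\begin{aligned}
\int_E \bigl(f(x) + g(x)\bigr)\,dx &= \lim_{n\to\infty} \int_E \bigl(\phi_n(x) + \psi_n(x)\bigr)\,dx\\
&= \lim_{n\to\infty} \int_E \phi_n(x)\,dx + \lim_{n\to\infty} \int_E \psi_n(x)\,dx\\
&= \int_E f(x)\,dx + \int_E g(x)\,dx.
\end{aligned}$$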

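We now write $f = f^+ - f^-$ and $g = g^+ - g^-$. If $c \ge 0$, then the positive part of $cf$ is $c\,f^+$ and the negative part of $cf$ is $c\,f^-$. If $c < 0$, then the positive part of $cf$ is $|c|\,f^-$ and the negative part is $|c|\,f^+$. In either case, applying equation (6.22) for nonnegative functions, we have
$$\int_E cf(x)\,dx = \int_E (cf)^+(x)\,dx - \int_E (cf)^-(x)\,dx = c\left(\int_E f^+(x)\,dx - \int_E f^-(x)\,dx\right) = c\int_E f(x)\,dx.$$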
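To conclude the proof of equation (6.23), we begin with the observation that
$$(f + g)^+ - (f + g)^- = f + g = f^+ - f^- + g^+ - g^-,$$
and therefore
$$(f + g)^+ + f^- + g^- = (f + g)^- + f^+ + g^+.$$
We integrate each side of this equality. All of the summands are nonnegative measurable functions, so we can write each integral as a sum of integrals:
$$\int_E (f+g)^+(x)\,dx + \int_E f^-(x)\,dx + \int_E g^-(x)\,dx = \int_E (f+g)^-(x)\,dx + \int_E f^+(x)\,dx + \int_E g^+(x)\,dx.$$
All of these integrals are finite, so we can rearrange:
$$\int_E (f+g)^+(x)\,dx - \int_E (f+g)^-(x)\,dx = \int_E f^+(x)\,dx - \int_E f^-(x)\,dx + \int_E g^+(x)\,dx - \int_E g^-(x)\,dx,$$
that is,
$$\int_E (f+g)(x)\,dx = \int_E f(x)\,dx + \int_E g(x)\,dx.$$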

In the next corollary, we see that term-by-term integration is correct for series of
nonnegative measurable functions provided that the sum of the integrals converges.
The proof is left as Exercise 6.2.7.

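Corollary 6.16 (Term-by-term Integration, Summands ≥ 0). If $\sum_{k=1}^{\infty} f_k(x)$ is a series of nonnegative measurable functions and $\sum_{k=1}^{\infty} \left(\int_E f_k(x)\,dx\right)$ converges, then $\sum_{k=1}^{\infty} f_k(x)$ converges almost everywhere and
$$\int_E \left(\sum_{k=1}^{\infty} f_k(x)\right) dx = \sum_{k=1}^{\infty} \left(\int_E f_k(x)\,dx\right). \tag{6.24}$$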
The next corollary of the monotone convergence theorem gives us an important
result for approximating integrable functions by simple functions.

Corollary 6.17 (Approximation by Simple Function). Let $f$ be an integrable function on $[a, b]$. Given any $\epsilon > 0$, we can find a simple function $\phi$ such that
$$\int_a^b |f(x) - \phi(x)|\,dx < \epsilon. \tag{6.25}$$

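Proof. We write $f$ as the difference of two nonnegative functions, $f = f^+ - f^-$. By Corollary 6.7, we can find monotonically increasing sequences of simple functions, $(\phi_n)$ and $(\psi_n)$, that converge to $f^+$ and $f^-$, respectively. By the monotone convergence theorem, we can find an $N$ so that
$$0 \le \int_a^b f^+(x)\,dx - \int_a^b \phi_N(x)\,dx < \frac{\epsilon}{2}, \qquad 0 \le \int_a^b f^-(x)\,dx - \int_a^b \psi_N(x)\,dx < \frac{\epsilon}{2}.$$
Let $\phi = \phi_N - \psi_N$, which is also a simple function. We have that
$$\begin{aligned}
\int_a^b |f(x) - \phi(x)|\,dx &= \int_a^b \bigl|\bigl(f^+(x) - \phi_N(x)\bigr) - \bigl(f^-(x) - \psi_N(x)\bigr)\bigr|\,dx\\
&\le \int_a^b \bigl(f^+(x) - \phi_N(x)\bigr)\,dx + \int_a^b \bigl(f^-(x) - \psi_N(x)\bigr)\,dx\\
&< \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon.
\end{aligned}$$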
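The proof splits $f$ into $f^+$ and $f^-$ and truncates two monotone sequences of simple functions. A minimal numerical sketch, assuming the particular dyadic sequences $\phi_n = \lfloor 2^n f^+ \rfloor / 2^n$ and $\psi_n = \lfloor 2^n f^- \rfloor / 2^n$ (one common choice, valid here without truncation because $|f| \le 1$) and the illustrative function $f(x) = \sin(2\pi x)$ on $[0, 1]$. The integral in (6.25) is only estimated with a midpoint rule, not computed exactly.

```python
import math

def dyadic_floor(value, n):
    """Largest multiple of 2**-n not exceeding value (value >= 0)."""
    scale = 2 ** n
    return math.floor(value * scale) / scale

def l1_error(f, a, b, n, samples=20_000):
    """Midpoint-rule estimate of the integral of |f - phi| over [a, b], where
    phi = phi_n - psi_n with phi_n, psi_n dyadic approximations of f+ and f-."""
    h = (b - a) / samples
    total = 0.0
    for i in range(samples):
        x = a + (i + 0.5) * h
        fx = f(x)
        phi = dyadic_floor(max(fx, 0.0), n) - dyadic_floor(max(-fx, 0.0), n)
        total += abs(fx - phi) * h
    return total

def f(x):
    return math.sin(2 * math.pi * x)

eps = 1e-3
# First level n at which the estimated error drops below eps.
N = next(n for n in range(1, 40) if l1_error(f, 0.0, 1.0, n) < eps)
print(N, l1_error(f, 0.0, 1.0, N))
```

At each point only one of $f^+$, $f^-$ is nonzero, so $|f - \phi| < 2^{-n}$ pointwise here; for unbounded $f$ one would need the truncated sequences of Corollary 6.7 instead.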

Our final corollary will be a very important result that shows that for any inte-
grable function, even an unbounded function, we can force the integral to be as
small as we wish by taking a domain with sufficiently small measure.

Corollary 6.18 (Small Domain Small Integral). Let $f$ be an integrable function on $[a, b]$ and $\epsilon > 0$ any positive bound. There is always a positive response $\delta > 0$ so that for any measurable set $S \subseteq [a, b]$ with $m(S) < \delta$, we have that
$$\left|\int_S f(x)\,dx\right| < \epsilon. \tag{6.26}$$

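Proof. We know from Corollary 6.17 that for any $\epsilon > 0$ we can approximate $f$ by a simple function $\phi$ so that
$$\int_a^b |f(x) - \phi(x)|\,dx < \frac{\epsilon}{2}.$$
Since $\phi$ is simple, it is bounded, say $|\phi(x)| < B$ for all $x \in [a, b]$. We can choose $\delta = \epsilon/2B$. Given any set $S \subseteq [a, b]$ of measure less than $\epsilon/2B$,
$$\int_S |\phi(x)|\,dx \le B\, m(S) < B \cdot \frac{\epsilon}{2B} = \frac{\epsilon}{2}.$$
It follows that
$$\left|\int_S f(x)\,dx\right| \le \int_S |f(x) - \phi(x)|\,dx + \int_S |\phi(x)|\,dx < \int_a^b |f(x) - \phi(x)|\,dx + \frac{\epsilon}{2} < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon.$$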
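The response $\delta$ in the proof depends on the approximating simple function. For one concrete unbounded integrable function, $f(x) = x^{-1/2}$ on $(0, 1]$ (our choice for illustration), the corollary can be checked directly: because $f$ is decreasing, no set of measure $m(S)$ carries more integral than $[0, m(S))$, which gives $2\sqrt{m(S)}$, so $\delta = \epsilon^2/4$ works. The sketch tests that bound on randomly generated finite unions of intervals; the helper `random_small_set` is hypothetical, not from the text.

```python
import math
import random

def integral_inv_sqrt(intervals):
    """Exact integral of f(x) = x**-0.5 over disjoint intervals in [0, 1],
    using the antiderivative 2*sqrt(x)."""
    return sum(2 * (math.sqrt(b) - math.sqrt(a)) for a, b in intervals)

def random_small_set(total_length, pieces, rng):
    """Illustrative helper: `pieces` disjoint intervals in [0, 1] whose lengths add up
    to total_length (assumes total_length <= 1 / pieces, so each piece fits in its slot)."""
    weights = [rng.random() + 1e-9 for _ in range(pieces)]
    scale = total_length / sum(weights)
    slot = 1.0 / pieces
    intervals = []
    for i, w in enumerate(weights):
        length = w * scale
        start = i * slot + rng.random() * (slot - length)
        intervals.append((start, start + length))
    return intervals

eps = 0.1
delta = eps ** 2 / 4                # integral of f over [0, delta) is 2*sqrt(delta) = eps
rng = random.Random(2008)
measure = 0.99 * delta              # any measure strictly below delta
worst = max(integral_inv_sqrt(random_small_set(measure, 10, rng)) for _ in range(2000))
print(worst, 2 * math.sqrt(measure), eps)
```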

Exercises
6.2.1. Find the Lebesgue integral over [0, 1] of the function f defined by
$$f(x) = \begin{cases} x^2, & x \in [0,1] - \mathbb{Q},\\ 1, & x \in [0,1] \cap \mathbb{Q}. \end{cases}$$
Is this function Riemann integrable over [0, 1]?

6.2.2. Using the Cantor ternary set, SVC(3), define the function g on [0, 1] by
$$g(x) = \begin{cases} \ldots, & x \in SVC(3),\\ \ldots, & x \text{ is in a removed interval of length } \ldots \end{cases}$$
Find the value of the integral $\int_0^1 g(x)\,dx$. Is this function Riemann integrable over [0, 1]?

6.2.3. Using the Cantor ternary set, SVC(3), define the function h by
$$h(x) = \begin{cases} \ldots, & x \in [0, 1/2] - SVC(3),\\ \cos(\pi x), & x \in [1/2, 1] - SVC(3),\\ \ldots, & x \in SVC(3). \end{cases}$$
Find the value of the integral $\int_0^1 h(x)\,dx$. Is this function Riemann integrable over [0, 1]?

6.2.4. Show that if $\phi$ is a simple function given by
$$\phi(x) = \sum_{k=1}^{n} c_k\, \chi_{T_k}(x),$$
where $(c_1, \ldots, c_n)$ are any $n$ real numbers, not necessarily distinct, and $(T_1, \ldots, T_n)$ are measurable sets, not necessarily pairwise disjoint and whose union is not necessarily the domain of $\phi$, but for which $\phi(x) = 0$ for all $x \notin \bigcup_{k=1}^{n} T_k$, then it is still true that for any measurable set $E$,
$$\int_E \phi(x)\,dx = \sum_{k=1}^{n} c_k\, m(T_k \cap E).$$
6.2.5. Explain why it is that if $f_n$ and $\phi$ are measurable functions, then $A_n = \{x \in E \mid f_n(x) \ge \alpha\,\phi(x)\}$ is a measurable set.
6.2.6. Compare the hypotheses of the monotone convergence theorem, Theorem 6.14, with those of the Arzelà–Osgood theorem, Theorem 4.5 on p. 106.
Give an example of a sequence of functions that satisfies the hypotheses of the
Arzelà–Osgood theorem but not those of the monotone convergence theorem. Give
an example of a sequence of functions that satisfies the hypotheses of the monotone
convergence theorem but not those of the Arzelà–Osgood theorem.
6.2.7. Prove Corollary 6.16.
6.2.8. Show that the conclusion of Corollary 6.16 can be false if we do not require that $f_n(x) \ge 0$ for all $n$, even if we strengthen the bounding condition to
$$\sum_{n=1}^{\infty} \left|\int_E f_n(x)\,dx\right| < A.$$
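6.2.9. Let $\sum_{n=1}^{\infty} f_n$ be a series of integrable functions for which
$$\sum_{n=1}^{\infty} \int_a^b |f_n(x)|\,dx$$
converges. Show that the series $\sum_{n=1}^{\infty} f_n(x)$ converges almost everywhere on $[a, b]$ and
$$\sum_{n=1}^{\infty} \int_a^b f_n(x)\,dx = \int_a^b \sum_{n=1}^{\infty} f_n(x)\,dx.$$
6.2.10. Show that the conclusion of Exercise 6.2.9 can be false if the assumption is weakened to
$$\sum_{n=1}^{\infty} \left|\int_a^b f_n(x)\,dx\right|$$
converges.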
6.2.11. In the proof of Proposition 6.11, explain why $E_n \subseteq E_{n+1}$. Then show that
if is unbounded for some x, then x must be contained in at least one of
the Ek.
6.2.12. Show that if f is Lebesgue integrable on E and if
$$S_n = \{x \in E \mid |f(x)| \ge n\},$$
then $\lim_{n\to\infty} n \cdot m(S_n) = 0$.

6.2.13. Show that if f and g are integrable over E and $f(x) \le g(x)$ for all $x \in E$, then
$$\int_E f(x)\,dx \le \int_E g(x)\,dx.$$

6.2.14. Show that we can still conclude that $\int_E f(x)\,dx \le \int_E g(x)\,dx$ with the weaker hypothesis that $f(x) \le g(x)$ almost everywhere on E.

6.2.15. Prove the three identities of Proposition 6.12.

6.2.16. Show that if $\int_S f(x)\,dx = 0$ for every measurable subset $S \subseteq E$ and if $m(E) > 0$, then $f = 0$ almost everywhere on E. Show that the conclusion still holds if all that we know is that $\int_S f(x)\,dx = 0$ for every closed subset $S \subseteq E$.
6.2.17. Find an example of an integrable function f on a set of positive measure E so that $\int_S f(x)\,dx = 0$ for every open subset $S \subseteq E$, but f is not zero almost everywhere.

6.2.18. Show that if f is Lebesgue measurable on E and …, then … 0 almost everywhere on E.

6.2.19. Let f be a nonnegative, measurable function on the set E, $m(E) < \infty$. Prove that f is Lebesgue integrable if and only if $\sum_k k\, m(E_k)$ converges, where
$$E_k = \{x \in E \mid \ldots\}.$$

6.2.20. Let f be a nonnegative, measurable function on the set F, $m(F) < \infty$. Prove that f is Lebesgue integrable if and only if $\sum_k m(F_k)$ converges, where
$$F_k = \ldots$$
6.2.21. Let f be a nonnegative, measurable function on the set G, $m(G) < \infty$. For $\epsilon > 0$, define
$$S(\epsilon) = \ldots, \quad \text{where } G_k = \{x \in G \mid k\epsilon \le f(x) < (k + 1)\epsilon\}.$$
Prove that
$$\lim_{\epsilon \to 0^+} S(\epsilon) = \int_G f(x)\,dx.$$

6.3 Lebesgue's Dominated Convergence Theorem


In 1904, Lebesgue published his solution to the problem of term-by-term inte-
gration. Lebesgue's solution gives a sufficient condition rather than a necessary
condition, but it hews so closely to what is necessary that it gives us a simple
yet practical guide to determine when term-by-term integration is allowed. In this
section, we state the theorem, and then discuss how it is used and what it means,
and finally prove it.

Theorem 6.19 (Dominated Convergence Theorem). Let $(f_n)$ be a sequence of integrable functions that converges almost everywhere to $f$ over the measurable set $E$. If there is an integrable function $g$ for which
$$|f_n(x)| \le g(x) \quad \text{for all } n \ge 1, \text{ almost everywhere in } E,$$
then $f$ is integrable and
$$\int_E f(x)\,dx = \lim_{n\to\infty} \int_E f_n(x)\,dx. \tag{6.27}$$

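In terms of infinite series, this says that if the $f_n$ are integrable functions, if the series $\sum_{n=1}^{\infty} f_n(x)$ converges almost everywhere, and if the partial sums are bounded by an integrable function, $g$,
$$\left|\sum_{n=1}^{N} f_n(x)\right| \le g(x) \quad \text{almost everywhere in } E, \text{ for all } N,$$
then the integral of the series exists, is finite, and
$$\int_E \left(\sum_{n=1}^{\infty} f_n(x)\right) dx = \sum_{n=1}^{\infty} \left(\int_E f_n(x)\,dx\right). \tag{6.28}$$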

Uniform Convergence
Weierstrass had shown that if a sequence of Riemann integrable functions converges
uniformly, then the integral of the limit is the limit of the integrals. We shall show
that this is a special case of Theorem 6.19, the dominated convergence theorem.
To say that $(f_n)$ converges uniformly to $f$ over $[a, b]$ means that for any $\epsilon > 0$ we can find a response $N$ so that $n \ge N$ implies that
$$|f_n(x) - f(x)| < \epsilon \quad \text{for all } x \text{ in } [a, b].$$
Since we are working with Riemann integrable functions, each of the $f_n$ is bounded, and so is the uniform limit $f$. If we define the function $g$ by
$$g(x) = \max\{|f_1(x)|, |f_2(x)|, \ldots, |f_{N-1}(x)|, |f(x)| + \epsilon\},$$
then $g$ will also be measurable and bounded, and therefore, integrable on $[a, b]$. Since
$$|f_n(x)| \le g(x) \quad \text{for all } n \ge 1 \text{ and all } x \in [a, b],$$
the conditions of Theorem 6.19 are satisfied.

Bounded Convergence
Arzelà's generalization of Osgood's theorem says that if a sequence of integrable functions converges to an integrable function and there is a finite bound $A$ such that
$$|f_n(x) - f(x)| < A \quad \text{for all } n \ge 1 \text{ and for all } x \text{ in } [a, b],$$
then the integral of the limit is the limit of the integrals. Again, this is a special case of Theorem 6.19, the dominated convergence theorem. In this case, we define
$$g(x) = |f(x)| + 2A.$$
The function $g$ is integrable. For every $x \in [a, b]$ we have that
$$-g(x) < f(x) - A < f_n(x) < f(x) + A < g(x).$$

Example 4.7 from Section 4.3


In Section 4.3, we saw how Osgood's theorem explains Example 4.8. This was a sequence of continuous functions that do not converge uniformly, but do have bounded convergence. But the Arzelà and Osgood results did not help us with Example 4.7,
$$f_n(x) = \frac{n^2 x}{1 + n^3 x^2}.$$
These functions converge to zero on [0, 1], but the convergence is not bounded. The maximum value of this function on [0, 1] occurs at $x = n^{-3/2}$ and is equal to $\sqrt{n}/2$, a value that does not stay bounded. Nevertheless, as we saw in equation (4.9) on p. 102, the integral of the limit is equal to the limit of the integrals. This example can be explained by the dominated convergence theorem.

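Define $g(0) = 0$ and
$$g(x) = \frac{2^{2/3}}{3x^{1/3}}, \qquad 0 < x \le 1,$$
so that $f_n(x) \le g(x)$ for all $n \ge 1$ and all $x \in (0, 1]$ (see Exercise 6.3.3). This is an unbounded function, but it is integrable over [0, 1] in the Lebesgue sense.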
We can use an improper Riemann integral to verify that $g$ is Lebesgue integrable:
$$\lim_{a\to 0^+} \int_a^1 \frac{2^{2/3}}{3x^{1/3}}\,dx = \lim_{a\to 0^+} \left(2^{-1/3} - 2^{-1/3} a^{2/3}\right) = 2^{-1/3}.$$
The unproven theorem that we are using is that any nonnegative function for which the improper Riemann integral exists will be Lebesgue integrable (see Exercise 6.3.4).
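For a rigorous verification that $g$ is Lebesgue integrable, we show how to express $g$ as a limit of simple functions. For $1 \le i < m$, let $I_i$ be the interval $[(i-1)/m, i/m)$, which is closed on the left, open on the right, and let $I_m = [(m-1)/m, 1]$. Our function $g$ can be written as a limit of step functions,
$$g(x) = \lim_{m\to\infty} \sum_{i=1}^{m} \frac{2^{2/3}}{3} \left(\frac{i}{m}\right)^{-1/3} \chi_{I_i}(x), \qquad 0 < x \le 1.$$
We see that
$$\int_0^1 \sum_{i=1}^{m} \frac{2^{2/3}}{3} \left(\frac{i}{m}\right)^{-1/3} \chi_{I_i}(x)\,dx = \sum_{i=1}^{m} \frac{2^{2/3}}{3} \left(\frac{i}{m}\right)^{-1/3} \frac{1}{m} = \frac{2^{2/3}}{3m^{2/3}} \sum_{i=1}^{m} i^{-1/3}.$$
We can bound this summation by integrals,
$$\frac{3}{2}(m+1)^{2/3} - \frac{3}{2} = \int_1^{m+1} x^{-1/3}\,dx \le \sum_{i=1}^{m} i^{-1/3} \le 1 + \int_1^{m} x^{-1/3}\,dx = \frac{3}{2}m^{2/3} - \frac{1}{2}.$$
If we divide by $m^{2/3}$ and then take the limit as $m$ approaches $\infty$, we get
$$\lim_{m\to\infty} \frac{2^{2/3}}{3m^{2/3}} \sum_{i=1}^{m} i^{-1/3} = \frac{2^{2/3}}{3} \cdot \frac{3}{2} = 2^{-1/3}.$$
The integrals of the step functions are bounded, and therefore $g$ is integrable.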
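The three numerical facts used for Example 4.7 can be spot-checked directly: the domination $f_n \le g$, the unbounded peak height $\sqrt{n}/2$ at $x = n^{-3/2}$, and the convergence of the step-function integrals to $2^{-1/3}$. The grid, the range of $n$, and the cutoff $m = 10^5$ are arbitrary choices for this sketch; a grid check is evidence, not a proof (Exercise 6.3.3 asks for the proof).

```python
import math

def f(n, x):
    """Example 4.7: f_n(x) = n^2 x / (1 + n^3 x^2)."""
    return n * n * x / (1 + n ** 3 * x * x)

def g(x):
    """Dominating function g(x) = 2^(2/3) / (3 x^(1/3)) on (0, 1]."""
    return 2 ** (2 / 3) / (3 * x ** (1 / 3))

def step_integral(m):
    """Integral of the m-th step function: (2^(2/3) / (3 m^(2/3))) * sum_{i=1}^m i^(-1/3)."""
    return 2 ** (2 / 3) / (3 * m ** (2 / 3)) * math.fsum(i ** (-1 / 3) for i in range(1, m + 1))

# Domination f_n <= g on a grid of (0, 1], allowing for rounding error.
xs = [k / 1000 for k in range(1, 1001)]
dominated = all(f(n, x) <= g(x) * (1 + 1e-12) for n in range(1, 200) for x in xs)

# The peak of f_n sits at x = n^(-3/2) with height sqrt(n)/2, so no constant bound works.
peaks = {n: f(n, n ** -1.5) for n in (1, 100, 10_000)}

print(dominated, peaks, step_integral(10 ** 5), 2 ** (-1 / 3))
```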

However we verify it, $g$ is integrable on [0, 1], and therefore
$$\int_0^1 \lim_{n\to\infty} f_n(x)\,dx = \lim_{n\to\infty} \int_0^1 f_n(x)\,dx.$$

Example 4.6 from Section 4.3


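What about Example 4.6,
$$f_n(x) = n x e^{-n x^2}?$$
As we saw there,
$$\int_0^1 \left(\lim_{n\to\infty} f_n(x)\right) dx = 0 \quad \text{but} \quad \lim_{n\to\infty} \left(\int_0^1 f_n(x)\,dx\right) = \frac{1}{2}.$$
The conclusion of the dominated convergence theorem is false, so the hypothesis had better be false. Indeed, for $0 < x \le 1$, taking $n = \lceil 1/x^2 \rceil$ gives $n x \ge 1/x$ and $n x^2 < 2$, so that
$$\sup_{n \ge 1} n x e^{-n x^2} \ge \frac{1}{e^2 x},$$
which is not integrable over [0, 1].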
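A quick numerical check of the two facts used for Example 4.6, assuming nothing beyond the formulas above: the integrals $(1 - e^{-n})/2$ tend to $1/2$, and the choice $n = \lceil 1/x^2 \rceil$ makes $f_n(x) \ge 1/(e^2 x)$, so any dominating function would have to be at least $1/(e^2 x)$, whose integral over $[t, 1]$ is $-\ln t / e^2$ and grows without bound as $t \to 0^+$.

```python
import math

def f(n, x):
    """Example 4.6: f_n(x) = n x exp(-n x^2)."""
    return n * x * math.exp(-n * x * x)

def exact_integral(n):
    """Integral of f_n over [0, 1]: [-exp(-n x^2) / 2] from 0 to 1 = (1 - e^{-n}) / 2."""
    return (1 - math.exp(-n)) / 2

def midpoint(func, samples=100_000):
    """Midpoint-rule approximation of an integral over [0, 1]."""
    h = 1.0 / samples
    return h * math.fsum(func((i + 0.5) * h) for i in range(samples))

def lower_bound_ok(x):
    """With n = ceil(1/x^2): n x >= 1/x and n x^2 < 2, so f_n(x) >= 1/(e^2 x)."""
    n = math.ceil(1 / (x * x))
    return f(n, x) >= 1 / (math.e ** 2 * x)

# Integrals tend to 1/2 even though f_n(x) -> 0 at every x in [0, 1].
print([round(exact_integral(n), 6) for n in (1, 5, 20)])
print(round(midpoint(lambda x: f(5, x)), 6), round(exact_integral(5), 6))
print(all(lower_bound_ok(k / 1000) for k in range(1, 1001)))
```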

Sufficient but Not Necessary


The dominated convergence theorem says that if the functions $f_n$ are integrable and if $g = \sup_{n \ge 1} |f_n|$, which will always be measurable, has a finite integral, then the integral of the limit equals the limit of the integrals. What if the integral of $g$ is infinite? Does it follow that the integral of the limit does not equal the limit of the integrals? In a word, "no."

Example 6.1. Consider $f_n(x) = n\,\chi_{[1/n,\,1/n + 1/n^2]}(x)$, the function that is $n$ for $1/n \le x \le 1/n + 1/n^2$ and zero everywhere else.

Each $f_n$ is nonzero on a distinct interval, and therefore the integral of the supremum over [0, 2] is the sum of the integrals of the $f_n$:
$$\int_0^2 \sup_{n \ge 1} f_n(x)\,dx = \sum_{n=1}^{\infty} \frac{1}{n}.$$
This sum diverges to infinity. On the other hand, at each $x$,
$$\lim_{n\to\infty} f_n(x) = 0,$$
and
$$\lim_{n\to\infty} \int_0^2 f_n(x)\,dx = \lim_{n\to\infty} \frac{1}{n} = 0 = \int_0^2 \lim_{n\to\infty} f_n(x)\,dx.$$
This is a sequence that is not dominated by an integrable function, and yet the limit of the integrals does equal the integral of the limit. Lebesgue's condition is sufficient but not necessary.
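Example 6.1 can be checked exactly with rational arithmetic. A minimal sketch (the helper names are ours): it confirms that the supports are pairwise disjoint, that $\int f_n = 1/n$, and that the integral of $\sup_{n \le N} f_n$ is the harmonic number $H_N$, which is unbounded as $N$ grows.

```python
from fractions import Fraction

def support(n):
    """Exact endpoints of the interval [1/n, 1/n + 1/n^2] on which f_n equals n."""
    left = Fraction(1, n)
    return left, left + Fraction(1, n * n)

def integral(n):
    """Integral of f_n: height n times length 1/n^2, which is 1/n."""
    left, right = support(n)
    return n * (right - left)

N = 50
# The supports move left as n grows, and each ends before the previous one starts,
# since 1/(n+1) + 1/(n+1)^2 = (n+2)/(n+1)^2 < 1/n.  Hence they are pairwise disjoint.
disjoint = all(support(n + 1)[1] < support(n)[0] for n in range(1, N))

# With disjoint supports, sup_{n<=N} f_n = f_1 + ... + f_N, so its integral is the
# harmonic number H_N, which grows without bound as N increases.
harmonic = sum(integral(n) for n in range(1, N + 1))
print(disjoint, integral(10), float(harmonic))
```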
Nevertheless, the dominated convergence theorem is extremely useful. It gives us
a very generous condition under which term-by-term integration is always allowed.

Fatou's Lemma
Pierre Fatou (1878—1929) studied as an undergraduate at the École Normale
Supérieure, attending from 1898 to 1901. The result that carries his name was
part of his doctoral thesis of 1906. He worked as an astronomer at the Paris ob-
servatory. Much of his mathematics involved proving the existence of solutions to
systems of orbital differential equations. He also studied iterative processes and
was the first to investigate what today we call the Mandelbrot set.
If we think back to our examples where the limit of the integrals of a sequence of
functions is not equal to the integral of the limit (such as Example 4.6 on p. 100),
we see that the integral of the limit was always less than the limit of the integrals.
Fatou's lemma says what should be intuitively apparent, that if the functions are
nonnegative then we can never get the inequality to go in the other direction.

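Theorem 6.20 (Fatou's Lemma). If $(f_n)$ is a sequence of nonnegative, integrable functions, then
$$\int_E \liminf_{n\to\infty} f_n(x)\,dx \le \liminf_{n\to\infty} \int_E f_n(x)\,dx. \tag{6.29}$$
If $(f_n)$ converges to $f$ almost everywhere on $E$ and if $\int_E f_n(x)\,dx$ is bounded by a constant independent of $n$, then $f$ is integrable and
$$\int_E f(x)\,dx \le \liminf_{n\to\infty} \int_E f_n(x)\,dx. \tag{6.30}$$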

Proof. Define the sequence $(g_m)$ by $g_m(x) = \inf_{n \ge m} f_n(x)$. It follows that for all $n \ge m$, we have
$$\int_E g_m(x)\,dx \le \int_E f_n(x)\,dx,$$
and therefore
$$\int_E g_m(x)\,dx \le \liminf_{n\to\infty} \int_E f_n(x)\,dx.$$
By definition, the sequence $(g_m)$ is monotonically increasing and converges to $\liminf_{n\to\infty} f_n(x)$. By the monotone convergence theorem (Theorem 6.14), we have that
$$\int_E \liminf_{n\to\infty} f_n(x)\,dx = \lim_{m\to\infty} \int_E g_m(x)\,dx \le \liminf_{n\to\infty} \int_E f_n(x)\,dx.$$
If $(f_n)$ converges to $f$ almost everywhere, then $(g_m)$ increases to $f$ almost everywhere. If $\int_E f_n(x)\,dx$ is bounded by a constant independent of $n$, then so is $\int_E g_m(x)\,dx \le \int_E f_m(x)\,dx$, and the monotone convergence theorem tells us that $f$ is integrable.
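To see the construction $g_m = \inf_{n \ge m} f_n$ at work and why (6.29) can be a strict inequality, here is a minimal sketch with an alternating sequence of our own choosing (not an example from the text): $f_n$ is the indicator of $[0, 1/2]$ for even $n$ and of $(1/2, 1]$ for odd $n$. Every $g_m$ is identically 0, so the left side of (6.29) is 0, while every $\int_0^1 f_n = 1/2$.

```python
def f(n, x):
    """Alternating indicators on [0, 1]: chi_[0, 1/2] for even n, chi_(1/2, 1] for odd n."""
    if n % 2 == 0:
        return 1.0 if x <= 0.5 else 0.0
    return 1.0 if x > 0.5 else 0.0

def g(m, x):
    """g_m(x) = inf_{n >= m} f_n(x).  The sequence has period 2 in n,
    so the infimum over n >= m is the minimum over n = m, m + 1."""
    return min(f(m, x), f(m + 1, x))

def midpoint(func, samples=2000):
    """Midpoint-rule integral over [0, 1] (exact up to rounding here, because
    the breakpoint 1/2 falls between sample points)."""
    h = 1.0 / samples
    return h * sum(func((i + 0.5) * h) for i in range(samples))

int_f = [midpoint(lambda x, n=n: f(n, x)) for n in range(1, 11)]
int_g = [midpoint(lambda x, m=m: g(m, x)) for m in range(1, 11)]
# Each f_n integrates to 1/2 (up to rounding); each g_m is identically 0.
print(int_f[:3], int_g[:3])
```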

Proof of the Dominated Convergence Theorem


Finally, we prove Lebesgue's theorem.

Proof. (Dominated Convergence Theorem, Theorem 6.19) By Theorem 6.5, $f$ is measurable. We also know that
$$|f(x)| = \lim_{n\to\infty} |f_n(x)| \le g(x) \quad \text{almost everywhere}.$$
Since $f^+$ and $f^-$ are nonnegative, they are each bounded above almost everywhere by $g$. By Proposition 6.12, if we change the value of a function on a set of measure zero, it does not change the value of the integral. Therefore, the integrals of $f^+$ and of $f^-$ are bounded above by the integral of $g$, which is finite. We have proven that $f$ is integrable.
Since $g + f_n$ is nonnegative, we can apply Fatou's lemma:
$$\begin{aligned}
\int_E g(x)\,dx + \int_E f(x)\,dx = \int_E (g + f)(x)\,dx &\le \liminf_{n\to\infty} \int_E (g + f_n)(x)\,dx\\
&= \liminf_{n\to\infty} \left(\int_E g(x)\,dx + \int_E f_n(x)\,dx\right)\\
&= \int_E g(x)\,dx + \liminf_{n\to\infty} \int_E f_n(x)\,dx.
\end{aligned}$$
Therefore,
$$\int_E f(x)\,dx \le \liminf_{n\to\infty} \int_E f_n(x)\,dx. \tag{6.31}$$
We also can apply Fatou's lemma to $g - f_n$, which also is bounded below by 0:
$$\begin{aligned}
\int_E g(x)\,dx - \int_E f(x)\,dx = \int_E (g - f)(x)\,dx &\le \liminf_{n\to\infty} \int_E (g - f_n)(x)\,dx\\
&= \int_E g(x)\,dx - \limsup_{n\to\infty} \int_E f_n(x)\,dx.
\end{aligned}$$
Therefore,
$$\limsup_{n\to\infty} \int_E f_n(x)\,dx \le \int_E f(x)\,dx. \tag{6.32}$$
Combining inequalities (6.31) and (6.32) yields the desired equality.

At this point, it is worth going back and comparing this to Osgood's proof of a
much weaker result in Section 4.3. What should be most striking are the knots we
had to tie ourselves into to deal with those sets on which the convergence was not
nice. It is the ability to neatly excise troublesome sets of measure zero that makes
all the difference.

Exercises
6.3.1. Show that if $m \ne n$ are positive integers, then
$$\left[\frac{1}{m}, \frac{1}{m} + \frac{1}{m^2}\right] \cap \left[\frac{1}{n}, \frac{1}{n} + \frac{1}{n^2}\right] = \emptyset.$$


6.3.2. Show that if f is integrable, then so is $|f|$, and
$$\left|\int_E f(x)\,dx\right| \le \int_E |f(x)|\,dx.$$
Does the integrability of $|f|$ imply the integrability of f?


6.3.3. Prove that for all $n \ge 1$ and for all $x \in (0, 1]$,
$$\frac{2^{2/3}}{3x^{1/3}} \ge \frac{n^2 x}{1 + n^3 x^2}.$$
6.3.4. Prove that if f is unbounded over [a, b] and $f(x) > 0$ for $a < x < b$, but the improper Riemann integral $\int_a^b f(x)\,dx$ exists, then f is Lebesgue integrable over [a, b].
6.3.5. Show that as an improper Riemann integral,
$$\int_0^1 \left(2x \sin(x^{-2}) - 2x^{-1} \cos(x^{-2})\right) dx$$
exists (and equals sin(1)), but the Lebesgue integral does not exist in this case.

6.3.6. Show that the sequence of functions defined by = (n + over


$E = [0, 1]$ is an example for which $\int_E \lim_{n\to\infty} f_n(x)\,dx < \lim_{n\to\infty} \int_E f_n(x)\,dx$.
6.3.7. Define g(x) = + for 0 x < 1. Show that if this supremum
is taken over all real values $n \ge 1$, then $g(x) = -e^{-1}/\ln x$, and this function is not
Lebesgue integrable over [0, 1].

6.3.8. Show that there is no sequence of functions on $[0, 2\pi]$ of the type
$$f_n(x) = a_n \sin(nx) + b_n \cos(nx),$$
which converges to the function 1 almost everywhere on $[-\pi, \pi]$, and where $|a_n| + |b_n| < 10$.

6.3.9. Let f be integrable. Show that


pb
lim
h—*O Ja

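6.3.10. Let $(f_n)$ be a sequence of integrable functions such that
$$\sum_{n=1}^{\infty} |f_n(x)| < A < \infty \quad \text{for almost all } x \in [a, b].$$
Show that $\sum_{n=1}^{\infty} f_n(x)$ converges almost everywhere to an integrable function and
$$\int_a^b \left(\sum_{n=1}^{\infty} f_n(x)\right) dx = \sum_{n=1}^{\infty} \left(\int_a^b f_n(x)\,dx\right).$$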
6.3.11. Let $(f_n)$ be a sequence of integrable functions and let f be an integrable function such that
$$\lim_{n\to\infty} \int_a^b |f_n(x) - f(x)|\,dx = 0.$$
Show that if $(f_n(x))$ converges almost everywhere, then $(f_n(x))$ converges to f(x) almost everywhere.

6.3.12. Let $(f_n)$ be a sequence of nonnegative functions that converge to f on $\mathbb{R}$ and such that
$$\lim_{n\to\infty} \int_{-\infty}^{\infty} f_n(x)\,dx = \int_{-\infty}^{\infty} f(x)\,dx < \infty.$$
Show that for every measurable set E,
$$\lim_{n\to\infty} \int_E f_n(x)\,dx = \int_E f(x)\,dx.$$

6.3.13. Show that if $(f_n)$ is a sequence of integrable functions that converges uniformly to f over [a, b], then
$$\int_a^b f(x)\,dx = \lim_{n\to\infty} \int_a^b f_n(x)\,dx.$$

6.3.14. Let = — (n + 0 <x < 1, Show that

fn(x)dx)
f' dx
(f
and

6.3.15. Let be a sequence of measurable functions on E such that

Show that

$$\int_E \left(\sum_{n=1}^{\infty} f_n(x)\right) dx = \sum_{n=1}^{\infty} \left(\int_E f_n(x)\,dx\right).$$

6.3.16. Show that if f is integrable on $(-\infty, \infty)$, then
$$\lim_{n\to\infty} \int_{-\infty}^{\infty} f(x) \cos(nx)\,dx = 0.$$

6.3.17. Show that if f is integrable on $(-\infty, \infty)$ and g is bounded and measurable,
then

lim f If(x) (g(x) — g(x + dx =0.

6.4 Egorov's Theorem


In his book Lectures on the Theory of Functions, J. E. Littlewood explained that
there are three principles that lie behind work in real analysis:

1. Every measurable set is almost a finite union of open intervals.


2. Every measurable function is almost a continuous function.
3. Every convergent sequence of measurable functions is almost uniformly con-
vergent.

By "almost," we mean that it is true except for a set of measure less than $\epsilon$, where
$\epsilon$ can be any positive number, no matter how small. This means that we can first try
proving our theorem in the greatly simplified case where our sets are finite unions
of open intervals, our functions are continuous, and our convergent sequences
converge uniformly. We then use these "almost" statements to expand the range of
situations in which our theorem holds.
The actual theorem summarized in the first principle is Theorem 5.11, that for
any measurable set and any $\epsilon > 0$ we can find a finite union of open intervals
so that the symmetric difference between the original set and this finite union
has measure less than $\epsilon$. The theorems that correspond to the second and third
principles will be proven in this section. The second principle corresponds to
Luzin's theorem, Theorem 6.26. The third principle is made explicit in Egorov's
theorem, Theorem 6.21.
In Section 4.3 we saw how Osgood approached the justification of term-by-term
integration by looking for a large subset of our interval on which convergence is
uniform. Osgood sought to isolate the F-points, the points that are most problematic
for uniform convergence.
As explained in the last section, we do not need to address uniform convergence
directly in order to prove the dominated convergence theorem, but there is an
implicit use of uniform convergence. In 1911 Dimitri Egorov would make explicit
the connection between Lebesgue measure and uniform convergence. Egorov's
student, Nikolai N. Luzin, then used this result to prove that every measurable
function is almost continuous. By this, we mean that given any $\epsilon > 0$ and any
measurable function f, we can remove a set of measure less than $\epsilon$ from the domain of f,
and f will be continuous on what remains. Luzin was not the first to observe this.
Lebesgue had stated this theorem, though without providing a proof, in 1903.¹
Vitali published a proof of this result in 1905.
Dimitri Fedorovich Egorov (1869—1931) began teaching at Moscow University
in 1894 and earned his doctorate there in 1901. In addition to his work in real
analysis, he is noted for his contributions to differential geometry. In 1923, he
was appointed director of the Institute for Mechanics and Mathematics at Moscow
State University. Egorov protested against the arrests and execution of clergy in the
1920s and also against the attempt to impose Marxist methodology in science. He
was dismissed in 1929, arrested in 1930, and died in exile a year later.
Examples 4.6—4.8 from Section 4.3 are all nonuniformly convergent sequences.
But if we remove any neighborhood of 0, no matter how small, each of these
sequences converges uniformly on the interval that remains. These are all almost
uniformly convergent.

In a footnote, Lebesgue corrects a statement from a letter written to Borel in which he had claimed that one
could remove a set of measure zero and have the function be continuous on the set that remained.

Definition: Almost uniform convergence


A sequence of functions $(f_n)$ converges almost uniformly to $f$ on the measurable set $E$ if for each $\epsilon > 0$, there is a set $S \subseteq E$, $m(S) < \epsilon$, such that $(f_n)$ converges uniformly to $f$ on $E - S$.

Theorem 6.21 (Egorov's Theorem). If $(f_n)$ is a sequence of measurable functions that converges almost everywhere to $f$ on $[a, b]$, then this sequence converges almost uniformly to $f$.

Proof. By Theorem 6.5, $f$ is a measurable function, and therefore so is $f - f_n$. The sequence $(f_n)$ converges uniformly to $f$ on a given set if and only if the sequence $(f - f_n)$ converges uniformly to 0 on that set. This implies that we lose no generality if we restrict our attention to sequences that converge to 0 almost everywhere.
We define $(g_n)$ by
$$g_n(x) = \sup_{m \ge n} |f_m(x)|.$$
The sequence $(g_n)$ converges uniformly to 0 on a given set if and only if $(f_n)$ converges uniformly to 0 on that set (see Exercise 6.4.3). The advantage of working with $(g_n)$ is that it is a sequence of monotonically decreasing functions. Let $A$ be the subset of $[a, b]$ on which $(f_n(x))$ does not converge to 0. By our assumption, $m(A) = 0$.
Define
$$S_{k,n} = \{x \in [a, b] - A \mid g_n(x) < 1/k\}.$$
Since our sequence $(g_n)$ is monotonically decreasing and approaches 0 on $[a, b] - A$, we see that
$$S_{k,1} \subseteq S_{k,2} \subseteq \cdots, \qquad \text{and} \qquad \bigcup_{n=1}^{\infty} S_{k,n} = [a, b] - A.$$
By Corollary 5.12,
$$\lim_{n\to\infty} m(S_{k,n}) = m\left(\bigcup_{n=1}^{\infty} S_{k,n}\right) = m([a, b] - A) = b - a.$$
Given any $\epsilon > 0$, for each $k$, we choose an $n$ that may depend on $k$, written $n(k)$, so that
$$m(S_{k,n(k)}) > b - a - \frac{\epsilon}{2^k}.$$
We set
$$S = [a, b] - \bigcap_{k=1}^{\infty} S_{k,n(k)} = \bigcup_{k=1}^{\infty} \left([a, b] - S_{k,n(k)}\right).$$
The measure of this set is bounded by
$$m(S) < \sum_{k=1}^{\infty} \frac{\epsilon}{2^k} = \epsilon. \tag{6.33}$$
If $x \in [a, b]$ is not in $S$, then $x \in \bigcap_{k=1}^{\infty} S_{k,n(k)}$. Given any $k \ge 1$, we have that $x \in S_{k,n(k)}$, and therefore for any $m \ge n(k)$,
$$0 \le g_m(x) \le g_{n(k)}(x) < \frac{1}{k}.$$
The convergence is uniform for all $x \in [a, b] - S$.
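For a concrete instance of Theorem 6.21, consider $f_n(x) = x^n$ on $[0, 1]$ (our illustration, not one of the text's examples), which converges to 0 everywhere except at $x = 1$. Here the small set can be written down explicitly, $S = (1 - \epsilon/2, 1]$, rather than assembled from the sets $S_{k,n(k)}$ of the proof. The sketch computes a uniform index $N$ on $[0, 1] - S$, checks it on a grid, and shows that without removing a set the convergence is not uniform.

```python
import math

def uniform_index(eps, tol):
    """For f_n(x) = x^n on [0, 1], remove S = (1 - eps/2, 1], which has measure eps/2 < eps.
    On [0, 1 - eps/2] the supremum of x^n is r^n with r = 1 - eps/2, so return the
    smallest N with r^N < tol, i.e. the smallest integer N > ln(tol) / ln(r)."""
    r = 1 - eps / 2
    return math.floor(math.log(tol) / math.log(r)) + 1

eps, tol = 0.01, 1e-6
N = uniform_index(eps, tol)
r = 1 - eps / 2
grid = [r * k / 1000 for k in range(1001)]           # sample points of [0, 1 - eps/2]
uniform_on_rest = all(x ** n < tol for n in range(N, N + 50) for x in grid)

# Without removing a set, the convergence is not uniform: x = 1 - 1/(10n) keeps x^n near e^{-0.1}.
not_uniform = all((1 - 1 / (10 * n)) ** n > 0.89 for n in range(1, 1000))
print(N, uniform_on_rest, not_uniform)
```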


The converse of Egorov's theorem is easy to prove.

Theorem 6.22 (Egorov Converse). If $(f_n)$ is a sequence of measurable functions that converges almost uniformly to $f$ on $[a, b]$, then this sequence converges almost everywhere to $f$.

Proof. Let $S_k$ be a set of measure $< 1/k$ for which $(f_n)$ converges uniformly to $f$ on $[a, b] - S_k$. In particular, it converges to $f$ at every point of $[a, b] - S_k$. The set on which it does not converge to $f$ is contained in $\bigcap_{k=1}^{\infty} S_k$, which, by Corollary 5.12, has measure
$$\lim_{k\to\infty} m\left(\bigcap_{j=1}^{k} S_j\right) = 0.$$

Convergence in Measure
We have seen that convergence almost everywhere is equivalent to almost uniform
convergence. Both of these are weaker than pointwise convergence, which requires
convergence at every point. In the route we shall take to prove Luzin's theorem that
measurable functions are continuous once we remove an arbitrarily small set, we
make use of an even weaker type of convergence, convergence in measure.
Notice how similar this is to Kronecker's convergence (Theorem 4.6 on p. 110)
in which the outer content (rather than the measure) of this set must converge to 0.

Definition: Convergence in measure


A sequence of functions $(f_n)$ converges in measure to $f$ on the measurable set $E$ if for all $\alpha > 0$,
$$\lim_{n\to\infty} m(\{x \in E \mid |f_n(x) - f(x)| \ge \alpha\}) = 0.$$

Since the outer content is always greater than or equal to the measure, convergence
in measure is also considerably weaker than Kronecker's convergence.
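For the sequence $f_n(x) = x^n$ on $[0, 1]$ with limit 0 (an illustration of our own), the set in the definition is an interval whose measure can be written down exactly, so convergence in measure can be checked directly. The helper name `deviation_measure` is ours.

```python
def deviation_measure(n, alpha):
    """m({x in [0, 1] : |x^n - 0| >= alpha}) for f_n(x) = x^n, f = 0, and alpha > 0.

    For 0 < alpha <= 1 the set is the interval [alpha^(1/n), 1], of length
    1 - alpha^(1/n); for alpha > 1 it is empty.
    """
    if alpha > 1:
        return 0.0
    return 1 - alpha ** (1 / n)

alpha = 0.01
measures = [deviation_measure(n, alpha) for n in (1, 10, 100, 1000)]
print([round(v, 4) for v in measures])
```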

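Example 6.2. Consider the functions, defined for $1 \le k \le n$,
$$f_{k,n}(x) = \begin{cases} 1, & (k-1)/n \le x \le k/n,\\ 0, & \text{otherwise}. \end{cases}$$
The function $f_{k,n}$ is 0 except on an interval of length $1/n$. The sequence
$$f_{1,1}, f_{1,2}, f_{2,2}, f_{1,3}, f_{2,3}, f_{3,3}, f_{1,4}, \ldots$$
converges to 0 in measure. However, for each $x \in [0, 1]$, there are infinitely many functions in this sequence for which $f_{k,n}(x) = 1$. This sequence does not converge at any $x$ in $[0, 1]$.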
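The ordering of Example 6.2 and its two competing properties can be checked exactly with rational arithmetic. A minimal sketch (the helper names are ours): it enumerates the pairs $(k, n)$ in the order given above, notes that the supports shrink like $1/n$, and counts how often a fixed point, here $x = 1/3$, lands in a support among the first 5050 functions (all $n \le 100$).

```python
from fractions import Fraction

def typewriter(count):
    """The first `count` index pairs (k, n) in the order
    f_{1,1}, f_{1,2}, f_{2,2}, f_{1,3}, f_{2,3}, f_{3,3}, ..."""
    pairs = []
    n = 1
    while len(pairs) < count:
        for k in range(1, n + 1):
            if len(pairs) == count:
                break
            pairs.append((k, n))
        n += 1
    return pairs

def value(k, n, x):
    """f_{k,n}(x) = 1 if (k - 1)/n <= x <= k/n, else 0 (x is a Fraction for exact comparison)."""
    return 1 if Fraction(k - 1, n) <= x <= Fraction(k, n) else 0

pairs = typewriter(5050)              # all pairs with n <= 100, since 1 + 2 + ... + 100 = 5050
x = Fraction(1, 3)
hits = sum(value(k, n, x) for k, n in pairs)
# The support of f_{k,n} has length 1/n, so the measure in the definition tends to 0,
# yet x = 1/3 meets at least one support for every n, so f_{k,n}(x) = 1 over and over.
print(pairs[:6], Fraction(1, pairs[-1][1]), hits)
```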
Our first result shows what might be expected, that uniform convergence implies
convergence in measure. The second result is perhaps more surprising. If a sequence
converges in measure, it might not converge almost everywhere (as in our example),
but there will always be a subsequence that converges almost everywhere.

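Theorem 6.23 (Almost Uniform ⇒ In Measure). If the sequence $(f_n)$ converges almost uniformly to $f$ on $E$, then it converges in measure to $f$ on $E$.

Proof. Choose any $\delta > 0$ and find a set $S$ of measure less than $\delta$ so that $(f_n)$ converges uniformly to $f$ on $E - S$. Given $\alpha > 0$, there is a response $N$ so that for any $n \ge N$ and any $x \in E - S$, $|f_n(x) - f(x)| < \alpha$. It follows that for $n \ge N$,
$$m(\{x \in E \mid |f_n(x) - f(x)| \ge \alpha\}) \le m(S) < \delta.$$
We see that
$$\limsup_{n\to\infty} m(\{x \in E \mid |f_n(x) - f(x)| \ge \alpha\}) \le \delta.$$
Since this holds for every $\delta > 0$,
$$\lim_{n\to\infty} m(\{x \in E \mid |f_n(x) - f(x)| \ge \alpha\}) = 0.$$

Our second result, a kind of partial converse, is due to Frigyes Riesz (1880–1956)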
in 1909. He was a Hungarian mathematician who earned his doctorate in Budapest
in 1902, working in projective geometry. He is known as one of the founders of
functional analysis, the field that would provide the final answer to the problem of
representability by Fourier series.

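Theorem 6.24 (Riesz's Theorem). If the sequence $(f_n)$ converges in measure to $f$ on $E$, then it has a subsequence $(f_{n_k})$ that converges almost everywhere to $f$ on $E$.

Proof. Since the sequence converges in measure, we can find an $n_1$ so that
$$m(\{x \in E \mid |f_{n_1}(x) - f(x)| \ge 2^{-1}\}) < 2^{-1}.$$
For each $k > 1$, we find $n_k$ so that $n_k > n_{k-1}$ and
$$m(\{x \in E \mid |f_{n_k}(x) - f(x)| \ge 2^{-k}\}) < 2^{-k}.$$
I claim that the subsequence $(f_{n_k})$ converges almost everywhere to $f$.
Let
$$S_k = \bigcup_{j=k}^{\infty} \{x \in E \mid |f_{n_j}(x) - f(x)| \ge 2^{-j}\}.$$
Let
$$S = \bigcap_{k=1}^{\infty} S_k.$$
The measure of $S_k$ is bounded above by
$$m(S_k) \le 2^{-k} + 2^{-k-1} + 2^{-k-2} + \cdots = 2^{1-k},$$
and therefore, since $S_1 \supseteq S_2 \supseteq \cdots$, the measure of $S$ is
$$m(S) = \lim_{k\to\infty} m(S_k) = 0.$$
We need to show that if $x \in E - S$, then $f_{n_k}(x) \to f(x)$. Choose any $\epsilon > 0$ and find $K$ so that $2^{-K} < \epsilon$ and $x \notin S_K$. Then for all $k \ge K$, $x \notin S_k$, so
$$|f_{n_k}(x) - f(x)| < 2^{-k} \le 2^{-K} < \epsilon.$$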

Limits of Step Functions


A step function is a simple function for which the measurable sets are intervals
(open, closed, or half open). A single point is considered to be a closed interval,
but recall that a simple function is a sum involving finitely many characteristic
functions. A function that is constant on an interval is also continuous on the
interior of that interval. A step function can be discontinuous only at the endpoints
of the intervals, and therefore every step function is continuous almost everywhere.
One of the consequences of Riesz's result is the following theorem.

Theorem 6.25 (Measurable Limit of Step Functions). A function f defined


on [a, b] is measurable if and only if it is the limit almost everywhere of step
functions.

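Proof. One direction is easy. Step functions are measurable functions, so Theorem 6.5 implies that $f$ must be measurable.
In the other direction, we know from Theorem 6.6 that $f$ can be written as the limit almost everywhere of a sequence of simple functions. Let $(\phi_n)$ be such a sequence of simple functions, and let $T$ be the set of measure zero on which this sequence does not converge to $f$. We write $\phi_n$ as
$$\phi_n(x) = \sum_{k=1}^{m_n} c_{k,n}\, \chi_{S_{k,n}}(x),$$
where the $S_{k,n}$ are pairwise disjoint measurable sets whose union is $[a, b]$. By Theorem 5.11, we know that for each $S_{k,n}$ we can find a finite union of open intervals,
$$U_{k,n} = \bigcup_{i=1}^{N(k,n)} I_{k,n,i},$$
such that the measure of the symmetric difference between $S_{k,n}$ and $U_{k,n}$ is as small a positive value as we might wish. In particular, we can have
$$m(S_{k,n} \,\triangle\, U_{k,n}) < \frac{1}{n\, m_n}.$$
We replace the simple function $\phi_n$ by
$$\psi_n(x) = \sum_{k=1}^{m_n} c_{k,n}\, \chi_{U_{k,n}}(x).$$
Since each $U_{k,n}$ is a finite union of intervals, the function $\psi_n$ is a step function. Note that the intervals in $\{U_{1,n}, \ldots, U_{m_n,n}\}$ might overlap, but each intersection is a finite union of intervals, and there are only finitely many such intersections, so we can always rewrite $\psi_n$ as a simple function for which the measurable sets are nonoverlapping intervals.
Let $T_n = \bigcup_{k=1}^{m_n} (S_{k,n} \,\triangle\, U_{k,n})$. For every $x \in [a, b]$, $x$ lies in exactly one of the $S_{k,n}$, $1 \le k \le m_n$. Therefore, if $x \notin T_n$, then $\psi_n(x) = \phi_n(x)$. We also know that
$$m(T_n) \le \sum_{k=1}^{m_n} m(S_{k,n} \,\triangle\, U_{k,n}) < \frac{1}{n}.$$
Given $\epsilon > 0$, we have that
$$\{x \in [a, b] \mid |\psi_n(x) - f(x)| \ge \epsilon\} \subseteq \{x \in [a, b] \mid |\phi_n(x) - f(x)| \ge \epsilon\} \cup T_n.$$
Therefore,
$$m(\{x \in [a, b] \mid |\psi_n(x) - f(x)| \ge \epsilon\}) \le m(\{x \in [a, b] \mid |\phi_n(x) - f(x)| \ge \epsilon\}) + \frac{1}{n}.$$
Since $(\phi_n)$ converges almost everywhere to $f$ on $[a, b]$, it converges almost uniformly (Theorem 6.21) and hence in measure (Theorem 6.23) to $f$, and therefore
$$\lim_{n\to\infty} m(\{x \in [a, b] \mid |\phi_n(x) - f(x)| \ge \epsilon\}) = 0.$$
It follows that $(\psi_n)$ converges in measure to $f$.
By Riesz's theorem (Theorem 6.24), we can find a subsequence of $(\psi_n)$ that converges almost everywhere to $f$. A subsequence of step functions is still a sequence of step functions.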

Luzin's Theorem
Nikolai Nikolaevich Luzin (1883—1950) studied engineering at Moscow University
from 1901 to 1905. Here Egorov spotted his talent and encouraged him to pursue
mathematics. After graduation, Luzin began the study of medicine but returned
to Moscow University in 1909 to study mathematics with Egorov. They began
joint publications on function theory in 1910. Luzin's theorem comes from a
paper published in 1912. In 1915, Luzin received his doctorate. He was appointed
professor at Moscow University in 1917. In 1935 he became head of the Department
of the Theory of Functions of Real Variables at the Steklov Institute. In 1936
he was denounced for publishing his mathematical results outside of the Soviet
Union, activity that was viewed as anti-Soviet.2 He came close to dismissal and
possible imprisonment, but managed to survive. In addition to his work in function
theory, Luzin is noted for his contributions to descriptive set theory (measure-theoretic
and topological aspects of Borel sets and other σ-algebras) and to complex
analysis.
We are going to use the fact that a convergent sequence of measurable functions
is almost uniformly convergent to prove that for every measurable function, we
can remove a set of arbitrarily small measure and the function will be continuous
relative to what remains (see definition of continuity relative to a set on p. 114). We
know that if f is measurable over E, then we can find a sequence of step functions
that converge to f. We know that step functions are continuous almost everywhere.

2
For more on the "Luzin affair," see

Theorem 6.26 (Luzin's Theorem). If $f$ is a measurable function on the set $E$, then given any $\epsilon > 0$, we can find a subset $S$ of $E$ so that $m(S) < \epsilon$ and $f$ is continuous relative to the set $E - S$.

Proof. Let $(\psi_n)$ be a sequence of step functions that converges almost everywhere to $f$ on $E$. Let $T$ be the set of all points of discontinuity of all of the $\psi_n$. Since each $\psi_n$ has finitely many points of discontinuity, $T$ is countable and its measure is 0. Every $\psi_n$ is continuous on $E - T$. Given $\epsilon > 0$, let $S$ be a set of measure $< \epsilon$ so that $(\psi_n)$ converges uniformly to $f$ on $E - S$. Since a uniformly convergent sequence of continuous functions converges to a continuous function, $f$ is continuous relative to $E - T - S$. Because $m(S \cup T) = m(S) < \epsilon$, the set $S \cup T$ has the required properties.

Dirichlet's characteristic function of the rationals (Example 1.1), a function that


is discontinuous at every point, is not so discontinuous after all. In this case, we can
remove a set of measure 0, the rationals, and our function now is continuous on what
remains. Not every measurable function is quite this close to being continuous. We
cannot always remove a set of measure 0. But for any measurable function and
any $\epsilon > 0$, we can remove a set of measure less than $\epsilon$ and our function will be
continuous relative to what remains.

Exercises
6.4.1. Define the functions by

0, otherwise.

Show that $f_n \to 0$ on $[-2, 2]$ and there is no set $S$ of measure 0 such that $(f_n)$ converges uniformly on $[-2, 2] - S$.

6.4.2. Without using Egorov's theorem, show that for any $\epsilon > 0$, we can find a set $S$, $m(S) < \epsilon$, such that the sequence of functions defined in Exercise 6.4.1 converges uniformly on $[-2, 2] - S$.

6.4.3. Prove that if gn(x) = supm >n over [a, b], then (ga) converges uni-
formly over [a, b] if and only if converges uniformly over [a, b].

6.4.4. For the sequence given in Example 6.2, fi, i, fl,2, fl,3, ..., find a sub-
sequence that converges almost everywhere. Then apply the proof of Theorem 6.24
200 The Lehesgue Integral

to this sequence and find the subsequence predicted by that proof. In other words,
find the first pair $(k_1, n_1)$ such that
$$m\left(\{x \in [0, 1] \mid |f_{k_1,n_1}(x)| \geq 2^{-1}\}\right) < 2^{-1}.$$
For each $j > 1$, find the first pair $(k_j, n_j)$ after $(k_{j-1}, n_{j-1})$ such that
$$m\left(\{x \in [0, 1] \mid |f_{k_j,n_j}(x)| \geq 2^{-j}\}\right) < 2^{-j}.$$


6.4.5. Show that the sequence given in Example 6.2 (p. 195) also converges in
Kronecker's sense (see Theorem 4.6 on p. 110). That is to say, for all $\sigma > 0$,
$$\lim_{n \to \infty} c_e\left(\{x \in [0, 1] \mid |f_n(x)| \geq \sigma\}\right) = 0.$$
Show that any sequence that converges in Kronecker's sense must converge in
measure.
6.4.6. Find an example of a sequence of functions that converges in measure but
does not converge in Kronecker's sense.
6.4.7. Give an example of a measurable function for which we cannot remove a
set of measure 0 and have a function that is continuous relative to what remains.
Justify your answer.
6.4.8. Consider $\chi_{SVC(4)}$, the characteristic function of $SVC(4)$. Given any $\epsilon > 0$,
describe how to construct a set $S$ (the construction may depend on $\epsilon$) with $m(S) < \epsilon$
and with the property that $\chi_{SVC(4)}$ is continuous relative to $[0, 1] - S$. Justify your
answer.

6.4.9. Let Al be the nonmeasurable set described in Theorem 5.13 on p. 151. Define
the function f by f(x) = q, where q is the unique rational number chosen so that
x e Al + q. Prove that f is discontinuous at every value of x.
6.4.10. Let $(f_n)$ be a sequence of measurable functions on $E$, $m(E) < \infty$. Show
that
$$\lim_{n \to \infty} \int_E \frac{|f_n(x)|}{1 + |f_n(x)|}\,dx = 0$$
if and only if $(f_n)$ converges to 0 in measure. Show that this result is false if we
omit the assumption $m(E) < \infty$.
6.4.11. In equation (6.33) in the proof of Egorov's theorem, Theorem 6.21, we
showed that $m(S) < \epsilon$. We chose $\epsilon$ to be an arbitrary positive number. Where is the
flaw in our reasoning if we conclude from this statement that $m(S) = 0$?
6.4.12. Egorov's theorem does not claim that there exists a subset $E \subset [a, b]$ with
$m(E) = 0$ such that $(f_n)$ converges uniformly to $f$ on $[a, b] - E$. However, show

that it does imply that there exists a sequence of measurable sets, (En), in [a, b]
such that

and converges uniformly to f on each


6.4.13. Show that if $(f_n)$ converges in measure to $f$, then $(f_n)$ converges in measure
to $g$ whenever $f = g$ almost everywhere.

6.4.14. Prove that if for each $\epsilon > 0$ there is a measurable set $E$, $m(E) < \epsilon$, such
that $f$ is continuous on $[a, b] - E$, then $f$ is measurable on $[a, b]$.
6.4.15. Let $(f_n)$ be a sequence of measurable functions. Show that the set $E$ of
points at which this sequence converges must be a measurable set.

6.4.16. Prove that if $(f_n)$ is a sequence of nonnegative, measurable functions on
$[a, b]$ such that $\lim_{n \to \infty} \int_a^b f_n(x)\,dx = 0$, then $(f_n)$ converges to 0 in measure.
Show by example that we cannot replace the conclusion with the assertion that
$(f_n)$ converges to 0 almost everywhere.

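6.4.17. We say that the measurable functions $f_n$, $n \geq 1$, are equi-integrable over
the set $E$ if for every $\epsilon > 0$ there is a response $\delta > 0$ such that if $S \subset E$ and
$m(S) < \delta$, then
$$\int_S |f_n(x)|\,dx < \epsilon \quad \text{for all } n \geq 1.$$
Show that if $(f_n)$ is a convergent sequence of equi-integrable functions over a set
$E$ of finite measure, then
$$\lim_{n \to \infty} \int_E f_n(x)\,dx = \int_E \lim_{n \to \infty} f_n(x)\,dx.$$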

6.4.18. Prove the following version of Lebesgue's dominated convergence theorem: Let $(f_n)$ be a sequence of measurable functions that converges in measure
to $f$ on $[a, b]$. If there exists an integrable function on $[a, b]$, call it $g$, such that
$|f_n(x)| \leq g(x)$ for all $n \geq 1$ and all $x \in [a, b]$, then
$$\lim_{n \to \infty} \int_a^b f_n(x)\,dx = \int_a^b f(x)\,dx.$$
6.4.19. Show that if $(f_n)$ is a sequence of equi-integrable functions (see Exercise 6.4.17) that converges in measure to $f$ over a set $E$ of finite measure, then
$$\lim_{n \to \infty} \int_E f_n(x)\,dx = \int_E f(x)\,dx.$$



6.4.20. Let $(f_n)$ be a sequence that converges in measure to $f$ on $[a, b]$ and for
which $|f_n(x)| \leq C$ for all $n \geq 1$ and all $x \in [a, b]$. Show that if $g$ is continuous on
$[-C, C]$, then
$$\lim_{n \to \infty} \int_a^b g(f_n(x))\,dx = \int_a^b g(f(x))\,dx.$$

6.4.21. Let $(f_n)$ be a sequence that converges in measure to $f$ on $[a, b]$. Show that
$$\lim_{n \to \infty} \int_a^b \sin(f_n(x))\,dx = \int_a^b \sin(f(x))\,dx.$$
7
The Fundamental Theorem of Calculus

The derivative of Volterra's function oscillates between $+1$ and $-1$ in every neighborhood of a point in SVC(4), the Smith–Volterra–Cantor set with measure 1/2.
Since the set of discontinuities has positive measure, the derivative of Volterra's
function cannot be Riemann integrable. As we have seen, this violates the assumption, encoded in the evaluation part of the fundamental theorem of calculus, that if
we differentiate a function, we can then integrate that derivative to get back to our
original function.
The derivative of Volterra's function may not be Riemann integrable, but it is
not too hard to show that it is Lebesgue integrable. As we shall see in this chapter,
Lebesgue integration saves the evaluation part of the fundamental theorem of calculus.
The antiderivative part of the fundamental theorem faces a more fundamental
obstacle. The idea here is that if we integrate a function, we can then differentiate
that integral to get back to the original function. We have a problem if the function
with which we start does not satisfy the intermediate value property (see p. 16).
By Darboux's theorem (Theorem 1.7), such a function might be integrable, but it
cannot be the derivative of anything, so it is impossible to get back to the original function by differentiating the integral.
Even here, Lebesgue enables us to say something very close to what we might
wish. We shall see that if a function $f$ is Lebesgue integrable, then the derivative of
$\int_a^x f(t)\,dt$ exists and is equal to $f(x)$ almost everywhere. Even without assuming
that $f$ is continuous, the antidifferentiation part of the fundamental theorem of
calculus holds almost everywhere.
In this chapter, we need to take a closer look at differentiation. One of the
surprising outcomes will be Lebesgue's result that continuity plus monotonicity
implies differentiability almost everywhere. Weierstrass's example of a continuous,
monotonic function that is not differentiable at any point of an arbitrary countable
set (such as the set of all algebraic numbers) is essentially as nondifferentiable as
any continuous, monotonic function can be.

204 The Fundamental Theorem of Calculus

7.1 The Dini Derivatives


Ulisse Dini (1845–1918) grew up in Pisa and studied at the university in that city,
working under the direction of Enrico Betti. Riemann was in Pisa for extended
periods during 1863–1865, there for his health but working with Betti. Dini, who
was Betti's student at that time, must have learned much directly from Riemann.
In 1871, Dini took over Betti's chair in Analysis and Higher Geometry as Betti
switched his interests to physics. That same year, Dini entered politics and was
elected to the Pisa City Council. He was elected to parliament in 1880, became
rector of the University of Pisa in 1888, and was elected to the Italian senate in
1892. He was appointed director of the Scuola Normale Superiore in 1908.
Dini had begun work on the existence of and representability by Fourier series
in the early 1870s, inspired by the work of Heine, Cantor, and Hankel. In 1878, he
published an influential book, Fondamenti per la teorica delle funzioni di variabili
reali (Foundations for the theory of functions of real variables). Here was the first
statement and proof that any first species set has outer content zero. Here was
the first rigorous proof of a result that gave conditions under which continuity
implies differentiability: If $f$ is continuous on $[a, b]$ and if $f(x) + \lambda x$ is piecewise
monotonic for all but at most finitely many values of $\lambda$, then $f$ is differentiable
on a dense subset of $[a, b]$ (for Dini's proof with the slightly stricter condition
that $f(x) + \lambda x$ is piecewise monotonic for all $\lambda$, see Exercises 7.1.14 through
7.1.20). But the most important contribution in this book was Dini's work on
nondifferentiable functions.
A function $f$ is differentiable at $c$ if and only if
$$\lim_{x \to c} \frac{f(x) - f(c)}{x - c}$$
exists. There is not much more to say about this limit for a function that is differentiable at $c$. For a nondifferentiable function, there are many ways in which this
limit might fail to exist. It was Dini's insight to focus separately on the lim sup and
the lim inf of this ratio, on what happens as we approach $c$ from the right, and what
happens as we approach $c$ from the left. He defined four derivatives.

Definition: Dini derivatives

The four Dini derivatives of $f$ at $c$ are
$$D^+f(c) = \limsup_{x \to c^+} \frac{f(x) - f(c)}{x - c}, \qquad D_+f(c) = \liminf_{x \to c^+} \frac{f(x) - f(c)}{x - c},$$
$$D^-f(c) = \limsup_{x \to c^-} \frac{f(x) - f(c)}{x - c}, \qquad D_-f(c) = \liminf_{x \to c^-} \frac{f(x) - f(c)}{x - c}.$$
7.1 The Dini Derivatives 205
As examples, for $f(x) = |x|$ the Dini derivatives at 0 are
$$D^+f(0) = D_+f(0) = 1, \qquad D^-f(0) = D_-f(0) = -1.$$
For $g$ defined by $g(x) = x \sin(1/x)$, $x \neq 0$, $g(0) = 0$, the Dini derivatives at 0 are
$$D^+g(0) = D^-g(0) = 1, \qquad D_+g(0) = D_-g(0) = -1.$$
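These four values can also be explored numerically. The Python sketch below is a heuristic, not a proof. It approximates each lim sup and lim inf by the largest and smallest difference quotient over a finite sample of step sizes $h \in (0, \delta]$. The helper `dini_estimates`, the sampling scheme $h = 1/u$ with $u$ on an evenly spaced grid, and the parameters `delta`, `span`, and `n` are our own illustrative choices. We picked them so that an oscillation like $\sin(1/x)$ is sampled through many full periods.

```python
import math


def dini_estimates(f, c, delta=1e-3, span=100.0, n=100_000):
    """Heuristic estimates of (D^+ f(c), D_+ f(c), D^- f(c), D_- f(c)).

    Step sizes are h = 1/u with u on an evenly spaced grid over
    [1/delta, 1/delta + span], so every h lies in (0, delta].  The largest and
    smallest sampled quotients stand in for the lim sup and lim inf.
    """
    u0 = 1.0 / delta
    steps = [1.0 / (u0 + span * i / n) for i in range(n + 1)]
    right = [(f(c + h) - f(c)) / h for h in steps]      # x = c + h, approaching from the right
    left = [(f(c - h) - f(c)) / (-h) for h in steps]    # x = c - h, approaching from the left
    return max(right), min(right), max(left), min(left)


def g(x):
    """x sin(1/x), extended by g(0) = 0."""
    return x * math.sin(1.0 / x) if x != 0 else 0.0


print(dini_estimates(abs, 0.0))   # (1.0, 1.0, -1.0, -1.0)
print(dini_estimates(g, 0.0))     # close to (1, -1, 1, -1)
```

For $|x|$ the sampled quotients are exactly $\pm 1$. For $x\sin(1/x)$ they match the values above to within the grid spacing. A function that oscillates on a scale the grid does not resolve could fool such an estimator, which is one reason the Dini derivatives are defined through limits rather than samples.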

All of the Dini derivatives exist (possibly as $\pm\infty$), provided the function is defined in a neighborhood of $c$. A function is differentiable at $c$ if and only if the four Dini derivatives
at $c$ are finite and equal. It is possible to have a continuous and strictly increasing
function for which all four Dini derivatives at $c$ are different (Exercise 7.1.1). There
is no necessary relationship among these derivatives at a single point except the
fact that the lim sup is always greater than or equal to the lim inf,
$$D^+f(c) \geq D_+f(c), \qquad D^-f(c) \geq D_-f(c). \qquad (7.1)$$

One of Dini's important realizations is that if any one of these derivatives is
integrable, then they all must be integrable, and the values of the definite integrals
will be identical. In particular, he proved the following result, which helps to
patch up the fundamental theorem of calculus in the case where the integral is not
differentiable.

Theorem 7.1 (Dini's Theorem). Let $f$ be Riemann integrable on $[a, b]$. For
$x \in [a, b]$ define
$$F(x) = \int_a^x f(t)\,dt.$$
Let $DF$ be any of the Dini derivatives of $F$. Then $DF$ is bounded and integrable
on $[a, b]$, and
$$\int_a^b DF(t)\,dt = F(b) - F(a).$$
This is a nice result with which to introduce Dini derivatives because its proof
illustrates some of their characteristics that will be useful later.

Proof. We begin by noting that if $g(x) = -f(x)$, $h(x) = f(-x)$, and $k(x) = -f(-x)$, then (Exercise 7.1.2)
$$D_+f(c) = -D^+g(c), \qquad D^-f(c) = D^+k(-c), \qquad D_-f(c) = -D^+h(-c). \qquad (7.2)$$
It follows that if this theorem holds for $D^+$, then it holds for each of the Dini
derivatives.

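Since $f$ is Riemann integrable on $[a, b]$, it is bounded on this interval. For each
$x \in [a, b]$, define
$$l(x) = \lim_{\delta \to 0^+} \inf_{|t - x| < \delta} f(t), \qquad L(x) = \lim_{\delta \to 0^+} \sup_{|t - x| < \delta} f(t),$$
where $t$ ranges over $[a, b]$, so that $L(x) - l(x)$ is the oscillation of $f$ at $x$.
The functions $l$ and $L$ are bounded. For $x > c$, we have that
$$\left(\inf_{t \in [c,x]} f(t)\right)(x - c) \leq \int_c^x f(t)\,dt \leq \left(\sup_{t \in [c,x]} f(t)\right)(x - c),$$
and, therefore,
$$D^+F(c) = \limsup_{x \to c^+} \frac{1}{x - c}\int_c^x f(t)\,dt \leq \lim_{x \to c^+}\left(\sup_{t \in [c,x]} f(t)\right) \leq L(c),$$
$$D^+F(c) = \limsup_{x \to c^+} \frac{1}{x - c}\int_c^x f(t)\,dt \geq \lim_{x \to c^+}\left(\inf_{t \in [c,x]} f(t)\right) \geq l(c).$$
Therefore, $D^+F$ is also bounded. We also see that
$$\limsup_{x \to c} D^+F(x) \leq \limsup_{x \to c}\, \lim_{y \to x^+}\left(\sup_{t \in [x,y]} f(t)\right) \leq L(c), \quad \text{and}$$
$$\liminf_{x \to c} D^+F(x) \geq \liminf_{x \to c}\, \lim_{y \to x^+}\left(\inf_{t \in [x,y]} f(t)\right) \geq l(c).$$
The oscillation of $D^+F$ at each $x \in [a, b]$ is less than or equal to the oscillation of
$f$, so $D^+F$ is Riemann integrable. Furthermore, if the oscillation of $f$ at $c$ is zero,
then $D^+F(c) = f(c)$.
Since $f$ is integrable, it is continuous almost everywhere, so the set of points of
continuity of $f$ is dense in $[a, b]$. No matter how fine the partition, every interval
contains points at which $f$ and $D^+F$ are equal. This implies that for any partition
we can find Riemann sums for $f$ and $D^+F$ that are equal. Their integrals must be
equal.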
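The reflection identities (7.2) are what let the proof work with $D^+$ alone. As a numerical sanity check, the Python sketch below builds a hypothetical function `f` whose four Dini derivatives at 0 all differ, namely $D^+f(0) = 3$, $D_+f(0) = 1$, $D^-f(0) = 3/4$, $D_-f(0) = 1/4$. It then estimates $D^+$ of the reflected functions $g$, $h$, and $k$ at 0. This is a sketch and not part of Dini's argument. Because $c = 0$, the reflected point $-c$ is also 0. The estimator `upper_right` samples difference quotients on a finite grid of step sizes, an assumption that suits the $\sin(1/x)$ oscillation used here.

```python
import math


def upper_right(func, c, delta=1e-3, span=100.0, n=100_000):
    """Heuristic estimate of D^+ func(c) = lim sup_{x -> c+} (func(x) - func(c)) / (x - c).

    Step sizes are s = 1/u with u on an evenly spaced grid over
    [1/delta, 1/delta + span]; the largest sampled quotient stands in for the lim sup.
    """
    u0 = 1.0 / delta
    steps = (1.0 / (u0 + span * i / n) for i in range(n + 1))
    return max((func(c + s) - func(c)) / s for s in steps)


def f(x):
    # Hypothetical test function: D^+ f(0) = 3, D_+ f(0) = 1,
    # D^- f(0) = 0.75, D_- f(0) = 0.25.
    if x > 0:
        return x * (2.0 + math.sin(1.0 / x))
    if x < 0:
        return x * (0.5 + 0.25 * math.sin(1.0 / x))
    return 0.0


def g(x):
    return -f(x)       # expect D^+ g(0) = -D_+ f(0) = -1


def h(x):
    return f(-x)       # expect D^+ h(0) = -D_- f(0) = -0.25


def k(x):
    return -f(-x)      # expect D^+ k(0) = D^- f(0) = 0.75


print(upper_right(g, 0.0), upper_right(h, 0.0), upper_right(k, 0.0))
```

Each printed value should agree with the prediction from (7.2) to within the grid spacing.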

Bounded Variation
One of Dini's observations in his 1878 book was that if a function f has Dini
derivatives that are either bounded above or bounded below, then f can be written
as a difference of two monotonically increasing functions (see Exercise 7.1.3).
This is significant because Dirichlet's theorem that prescribed sufficient conditions
under which a function can be represented by its Fourier series included piecewise
monotonicity as one of the conditions. Three years later, in his 1881 paper Sur la
série de Fourier, Camille Jordan found a simple characterization that is equivalent
to being representable as the difference of two monotonically increasing functions,
a property he called bounded variation.

Definition: Total variation, bounded variation

Given a function $f$ defined on $[a, b]$ and a partition $P = (a = x_0 < x_1 < \cdots < x_m = b)$, the variation of $f$ with respect to $P$ is
$$V(P, f) = \sum_{j=1}^{m} |f(x_j) - f(x_{j-1})|.$$
The total variation of $f$ over $[a, b]$ is the supremum of the variation over all
partitions,
$$V_a^b(f) = \sup_P V(P, f).$$
We say that $f$ has bounded variation on this interval if the total variation is finite.
If a function is unbounded, it cannot have bounded variation. There are also
bounded functions that fail to have bounded variation. The classic example is the
function defined to be $\sin(1/x)$ for $x \neq 0$ and 0 at $x = 0$. We can make the variation
as large as we want by taking sufficiently small intervals near 0.

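Example 7.1. A more subtle example is the continuous function $g$ defined by
$$g(x) = x \cos(1/x), \ x \neq 0, \quad g(0) = 0, \quad \text{over } [0, 1/\pi].$$
If we take the partition $(0, 1/(N\pi), 1/((N-1)\pi), \ldots, 1/(2\pi), 1/\pi)$, then the variation is
$$\frac{1}{N\pi} + \left(\frac{1}{N\pi} + \frac{1}{(N-1)\pi}\right) + \cdots + \left(\frac{1}{2\pi} + \frac{1}{\pi}\right) = \frac{2}{\pi}\sum_{k=1}^{N} \frac{1}{k} - \frac{1}{\pi}.$$
Because the harmonic series diverges, we can make this as large as we want by taking $N$ sufficiently large, so this function
does not have bounded variation.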
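Example 7.1 can be checked directly. The Python sketch below evaluates $V(P, g)$ on the partitions $(0, 1/(N\pi), \ldots, 1/\pi)$ and compares the result with the closed form $\frac{2}{\pi}\sum_{k=1}^{N}\frac{1}{k} - \frac{1}{\pi}$, which grows like $\frac{2}{\pi}\ln N$. The names `variation`, `example_partition`, and `closed_form` are our own.

```python
import math


def variation(f, partition):
    """V(P, f): the sum of |f(x_j) - f(x_{j-1})| over consecutive partition points."""
    return sum(abs(f(x1) - f(x0)) for x0, x1 in zip(partition, partition[1:]))


def g(x):
    """Example 7.1: g(x) = x cos(1/x), g(0) = 0."""
    return x * math.cos(1.0 / x) if x != 0 else 0.0


def example_partition(N):
    """The partition (0, 1/(N pi), 1/((N-1) pi), ..., 1/(2 pi), 1/pi)."""
    return [0.0] + [1.0 / (k * math.pi) for k in range(N, 0, -1)]


def closed_form(N):
    """(2/pi) * (1 + 1/2 + ... + 1/N) - 1/pi."""
    return (2.0 * sum(1.0 / k for k in range(1, N + 1)) - 1.0) / math.pi


for N in (10, 100, 1_000, 10_000):
    print(N, variation(g, example_partition(N)), closed_form(N))
```

The two columns agree up to rounding error and keep growing with $N$. No bound can hold for every partition, so $g$ does not have bounded variation.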

Example 7.2. Still more subtle is the differentiable function $h$ defined by
$$h(x) = x^2 \cos(1/x), \ x \neq 0, \quad h(0) = 0, \quad \text{over } [0, 1/\pi].$$
Given any partition of $[0, 1/\pi]$, we find the smallest $N$ so that $1/(N\pi)$ is less than
$x_1$, the first point of the partition that lies to the right of 0. The variation that

corresponds to this partition is bounded above by

+
+

This quantity is bounded for all N by


2 1 1 1

This function has bounded variation, even though it oscillates infinitely often in
any neighborhood of 0.
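For contrast, the same kind of computation for $h(x) = x^2\cos(1/x)$ stays bounded. On the partitions of Example 7.1 the variation of $h$ equals $\frac{2}{\pi^2}\sum_{k=1}^{N}\frac{1}{k^2} - \frac{1}{\pi^2}$, which never reaches $1/3$. On $(0, 1/\pi]$ we have $|h'(x)| \leq 2x + 1$, and integrating $|h'|$ caps the variation over any partition at $\int_0^{1/\pi}(2x + 1)\,dx = 1/\pi + 1/\pi^2 \approx 0.42$. This second bound is our own shortcut, not the argument in the text. The uniform partitions in the Python sketch below are only sample inputs, and `variation` is our helper name.

```python
import math


def variation(f, partition):
    """V(P, f): the sum of |f(x_j) - f(x_{j-1})| over consecutive partition points."""
    return sum(abs(f(x1) - f(x0)) for x0, x1 in zip(partition, partition[1:]))


def h(x):
    """Example 7.2: h(x) = x^2 cos(1/x), h(0) = 0."""
    return x * x * math.cos(1.0 / x) if x != 0 else 0.0


end = 1.0 / math.pi
cap = 1.0 / math.pi + 1.0 / math.pi ** 2     # integral of 2x + 1 over [0, 1/pi]

for N in (10, 100, 1_000, 10_000):
    special = [0.0] + [1.0 / (k * math.pi) for k in range(N, 0, -1)]
    print("Example 7.1 type partition, N =", N, variation(h, special))   # stays below 1/3

for M in (10, 1_000, 100_000):
    uniform = [end * i / M for i in range(M + 1)]
    print("uniform partition, M =", M, variation(h, uniform), "<=", cap)
```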

We are now ready to state Jordan's characterization of functions that are a
difference of monotonically increasing functions.

Theorem 7.2 (Jordan Decomposition Theorem). A function defined on $[a, b]$
is the difference of two monotonically increasing functions if and only if it has
bounded variation.

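Proof. We begin with the assumption that $f = g - h$, where $g$ and $h$ are monotonically increasing. It follows that
$$\begin{aligned}
V(P, f) &= \sum_{j=1}^{m} \left|g(x_j) - h(x_j) - g(x_{j-1}) + h(x_{j-1})\right| \\
&\leq \sum_{j=1}^{m} \left(g(x_j) - g(x_{j-1})\right) + \sum_{j=1}^{m} \left(h(x_j) - h(x_{j-1})\right) \\
&= \left(g(b) - g(a)\right) + \left(h(b) - h(a)\right).
\end{aligned}$$
Since this bound does not depend on $P$, $f$ has bounded variation.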

In the other direction, we assume that $f$ has bounded variation. We define the
function $T$ by
$$T(x) = V_a^x(f),$$
the total variation of $f$ over the interval from $a$ to $x$. The function $T$ is monotonically
increasing. Since $f = T - (T - f)$, we only need to show that $T - f$ is also
monotonically increasing. We leave it as Exercise 7.1.8 to verify that for $a < b < c$,
we have
$$V_a^c(f) = V_a^b(f) + V_b^c(f).$$


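Take any pair $x, y$, $a \leq x < y \leq b$. We have that
$$(T(y) - f(y)) - (T(x) - f(x)) = V_x^y(f) - f(y) + f(x) = V_x^y(f) - \left(f(y) - f(x)\right).$$
This is greater than or equal to 0, because
$$f(y) - f(x) \leq |f(y) - f(x)| \leq V_x^y(f).$$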
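The proof has a direct discrete counterpart. The Python sketch below assumes, for simplicity, that $f$ is known only through samples on a grid $a = x_0 < \cdots < x_m = b$. The helper `jordan_decomposition` (our own name) replaces $T(x) = V_a^x(f)$ by the cumulative sum of $|f(x_j) - f(x_{j-1})|$. Each step of $T$ is at least the corresponding step of $f$, so both $T$ and $T - f$ are nondecreasing, exactly as in the argument above.

```python
import math


def jordan_decomposition(samples):
    """Discrete analogue of f = T - (T - f) for samples f(x_0), ..., f(x_m).

    T[i] is the variation of the samples over x_0, ..., x_i.  Each increment of T
    is |f(x_i) - f(x_{i-1})|, which is at least f(x_i) - f(x_{i-1}), so both
    T and T - f are nondecreasing.
    """
    T = [0.0]
    for prev, cur in zip(samples, samples[1:]):
        T.append(T[-1] + abs(cur - prev))
    T_minus_f = [t - v for t, v in zip(T, samples)]
    return T, T_minus_f


xs = [2 * math.pi * i / 400 for i in range(401)]
fs = [math.sin(x) for x in xs]
T, U = jordan_decomposition(fs)
print(T[-1])                                                # about 4, the variation of sin on [0, 2 pi]
print(all(b >= a for a, b in zip(T, T[1:])))                # True: T is nondecreasing
print(max(abs((t - u) - v) for t, u, v in zip(T, U, fs)))   # only rounding error: T - (T - f) = f
```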
Corollary 7.3 (Continuity of Variation). The function f is continuous and of
bounded variation on [a, b] if and only if it is equal to the difference of two
continuous, monotonically increasing functions.

Proof. We assume that $f$ is continuous and of bounded variation. To show that $f$
is the difference of two continuous, monotonically increasing functions, we only
need to establish that $T(x) = V_a^x(f)$ is continuous on $[a, b]$. We assume that $T$ is
not continuous at some $c \in [a, b]$ and show that this leads to a contradiction. Since
$T$ is monotonically increasing and bounded, $\lim_{x \to c^-} T(x)$ and $\lim_{x \to c^+} T(x)$ exist.
The only way $T$ could be discontinuous at $c$ is if at least one of these limits does
not equal $T(c)$. By symmetry (we can replace $f(x)$ by $g(x) = f(-x)$ so that a limit
from the right becomes a limit from the left) and the fact that $T$ is monotonically
increasing, we can assume that
$$T(c) = \lim_{x \to c^-} T(x) + \epsilon, \quad \text{for some } \epsilon > 0.$$
Since $f$ is continuous at $c$, we can find a response $\delta > 0$ so that $|x - c| < \delta$
implies that $|f(x) - f(c)| < \epsilon/2$. Take any partition of $[a, c]$ for which the last
subinterval has length less than $\delta$, say $a = x_0 < x_1 < \cdots < x_n = c$, $c - x_{n-1} < \delta$.
The variation is bounded by
$$\begin{aligned}
\sum_{j=1}^{n} |f(x_j) - f(x_{j-1})| &= \sum_{j=1}^{n-1} |f(x_j) - f(x_{j-1})| + |f(c) - f(x_{n-1})| \\
&< T(x_{n-1}) + \frac{\epsilon}{2} \leq \lim_{x \to c^-} T(x) + \frac{\epsilon}{2} = T(c) - \frac{\epsilon}{2}.
\end{aligned}$$
Any partition of $[a, c]$ can be refined to one whose last subinterval is shorter than $\delta$, and refining a partition never decreases the variation. This means that the variation of every partition of $[a, c]$ is less than $T(c) - \epsilon/2$, but this contradicts the definition of $T(c)$ as the supremum of the
variations over all partitions.
The other direction is left as Exercise 7.1.10.

Exercises
7.1.1. Find a strictly increasing, continuous function for which $D^+f(0)$, $D_+f(0)$,
$D^-f(0)$, and $D_-f(0)$ are all different.
7.1.2. Show that if $g(x) = -f(x)$, $h(x) = f(-x)$ and $k(x) = -f(-x)$, then
$$D_+f(c) = -D^+g(c), \qquad D^-f(c) = D^+k(-c), \qquad D_-f(c) = -D^+h(-c).$$
7.1.3. Show that if all four Dini derivatives are bounded below by $A$ and if $c$ is any
constant larger than $|A|$, then $f(x) + cx$ is a monotonically increasing function of
$x$. It follows that $f$ is the difference of two monotonically increasing functions.
7.1.4. Show that Dirichlet's function, the characteristic function of the rationals,
does not have bounded variation.
7.1.5. Prove that if a function has bounded variation on $[a, b]$, then it is bounded
on $[a, b]$.
7.1.6. Give two examples of continuous functions on [0, 1] that do not have
bounded variation and whose difference does not have bounded variation.
7.1.7. Show that the set of points of discontinuity of a monotonic function is
countable. Using this result, prove that any function of bounded variation has at
most countably many points of discontinuity.
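7.1.8. Show that for $a < b < c$, we have
$$V_a^c(f) = V_a^b(f) + V_b^c(f).$$
7.1.9. Show that
$$V_a^b(f + g) \leq V_a^b(f) + V_a^b(g) \quad \text{and} \quad V_a^b(cf) = |c|\,V_a^b(f).$$
7.1.10. Show that if $T(x) = V_a^x(f)$ is continuous, then so is $f$.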
7.1.11. Show that if $(f_n)$ is a sequence of functions that converges pointwise to $f$
on $[a, b]$, then
$$V_a^b(f) \leq \liminf_{n \to \infty} V_a^b(f_n).$$

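7.1.12. Let $f$ be defined by $f(0) = 0$ and $f(x) = x^2 \sin(1/x^2)$ for $x \neq 0$. Does $f$
have bounded variation on $[0, 1]$? Justify your answer.
7.1.13. For positive constants $\alpha$ and $\beta$ define $f_{\alpha,\beta}$ by $f_{\alpha,\beta}(0) = 0$ and $f_{\alpha,\beta}(x) = x^{\alpha} \sin(x^{-\beta})$ for $x \neq 0$. Prove that $f_{\alpha,\beta}$ has bounded variation on $[0, 1]$ if and only
if $\alpha > \beta$.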

Exercises 7.1.14–7.1.20 will lead you through Dini's proof that if $f$ is continuous on $[a, b]$ and $f(x) + \lambda x$ is piecewise monotonic for all $\lambda \in \mathbb{R}$, then $f$ is
differentiable on a dense set of points in $[a, b]$. In these exercises, by "interval" we
mean a closed interval with distinct endpoints.
7.1.14. For each pair of real numbers $p < q$ in $[a, b]$, consider the function defined
by
$$g(x; p, q) = f(x) - f(p) - \frac{f(q) - f(p)}{q - p}(x - p).$$
Show that $g(p; p, q) = g(q; p, q) = 0$. Explain why $g$ also is continuous and piecewise monotonic.
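7.1.15. Given any interval $I \subseteq [a, b]$, define
$$l_I = \inf\left\{\frac{f(x) - f(y)}{x - y} \;\middle|\; x, y \in I,\ x \neq y\right\}.$$
Show that for each $\epsilon > 0$, there exists a pair $r, s \in I$, $r < s$ such that
$$l_I \leq \frac{f(r) - f(s)}{r - s} < l_I + \epsilon.$$
7.1.16. Show that there is a subinterval $J \subseteq I$ on which $g(x; r, s)$ (defined in
Exercise 7.1.14) is decreasing.
7.1.17. Show that if $g$ is decreasing on the interval $J \subseteq I$, then for any $x \neq y$ in $J$
we have that
$$\frac{f(x) - f(y)}{x - y} \leq \frac{f(r) - f(s)}{r - s} < l_I + \epsilon.$$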
7.1.18. Explain how to construct a nested sequence of intervals, $I_1 \supseteq I_2 \supseteq \cdots$,
such that $x \neq y \in I_n$ implies that
$$l_n \leq \frac{f(x) - f(y)}{x - y} < l_n + \frac{1}{n}$$
for some sequence $(l_1, l_2, \ldots)$.
7.1.19. Show that the sequence $(l_n)$ is increasing and for all $n \geq 1$,
$$l_n \leq l_m < l_n + \frac{1}{n} \quad \text{for all } m \geq n.$$
It follows that this sequence converges. Denote this limit by $l_0$.
7.1.20. Let $x_0$ be any element of $\bigcap_{n \geq 1} I_n$. Show that $f$, the function with which
we began in Exercise 7.1.14, is differentiable at $x_0$ and that $f'(x_0) = l_0$.
7.1.21. Show that the devil's staircase, DS(x) (Example 4.1 on p. 86), is not
differentiable at 1/4.

7.2 Monotonicity Implies Differentiability Almost Everywhere


We are ready to tackle the proof that a monotonic, continuous function is differentiable almost everywhere. Since every function of bounded variation is a difference
of monotonic functions, this implies that every continuous function of bounded
variation is differentiable almost everywhere.
The connection between continuity and differentiability had long been debated.
As soon as we had the modern definition of continuity, the Bolzano–Cauchy definition, it was realized that differentiability necessarily implied continuity. It was
always clear that the implication did not work in the other direction, $|x|$ being the
classic example of a function that is continuous at $x = 0$ but not differentiable
there. But for many years it was believed that continuity on a closed and bounded
interval implied differentiability except possibly on a finite set. This was stated as
a theorem in J. L. Raabe's calculus text of 1839, Die Differential- und Integralrechnung. In Section 2.2, we discussed Weierstrass's example of a function that
is everywhere continuous and nowhere differentiable, as well as his example of a
monotonic, continuous function that is not differentiable at any of a countable set
of points. Weierstrass believed, as did others, that it was only a matter
of time before someone found a monotonic, continuous function that was nowhere
differentiable.
Dini's 1878 result (if $f(x) + \lambda x$, defined on a closed and bounded interval,
is piecewise monotonic for all but finitely many $\lambda$, then $f$ is differentiable on a
dense set) had cast doubt on the existence of a nowhere differentiable, continuous,
monotonic function, but it did not rule it out. Lebesgue laid the matter to rest
when he proved that any monotonic, continuous function must be differentiable
almost everywhere. In 1910, Georg Faber showed that continuity was not necessary.
Every monotonic function, and thus every function of bounded variation, is
differentiable almost everywhere. A year later, the husband and wife team of
William H. Young and Grace Chisholm Young published an independent proof
that continuity is not needed. In retrospect, the fact that continuity is not needed
should not be surprising. A function that is monotonic on a closed, bounded interval
is quite limited in how discontinuous it can be.
Georg Faber (1877–1966) received his doctorate in Munich in 1902 and his Habilitation in Würzburg in 1905. He taught at the Technische Hochschule (Institute of
Technology) in Munich and became rector of this university at the end of World War
II, overseeing the resumption of its activities. William Henry Young (1863–1942)
and Grace Chisholm Young (1868–1944) were a husband and wife team of British
mathematicians who made many important contributions to analysis. Although
William held a succession of positions at various British universities, the Youngs
made their home in Göttingen, Germany, until 1908, in Geneva, Switzerland, until
1915, and finally in Lausanne.
7.2 Monotonicity Implies Differentiability Almost Everywhere 213

The easiest way to prove the Faber–Chisholm–Young result is to first prove
Lebesgue's theorem and then to deal with discontinuous monotonic functions. Our
proof will follow that expounded by Chae, which, in turn, is based on the elementary
proof of Lebesgue's theorem by Riesz in 1932 and the proof that continuity is not
required that was given by Lee Rubel in 1963.

Theorem 7.4 (Continuity + Bounded Variation ⇒ Differentiable AE). If $f$
is continuous and has bounded variation on $[a, b]$, then $f$ is differentiable almost
everywhere on $[a, b]$.

Outlining the Proof


Riesz's proof may be elementary in the sense that it does not require the introduction
of sophisticated mathematical tools, but it is not simple. I shall discuss the approach
we are to take and the main obstacles and shall then break the proof into a sequence
of lemmas. In what follows, we assume that f is monotonically increasing. As
we saw in the previous section, any continuous function of bounded variation is a
difference of two continuous, monotonically increasing functions.
We use the Dini derivatives. Since $f$ is monotonically increasing, the Dini
derivatives are never negative. We allow the value $+\infty$ to signify that the limit
diverges to $\infty$. With this understanding, all Dini derivatives have values at every
$x \in [a, b]$. We need to show that the Dini derivatives are finite and equal almost
everywhere.
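We already know, by the definition of lim sup and lim inf, that
$$D^+f(x) \geq D_+f(x) \quad \text{and} \quad D^-f(x) \geq D_-f(x).$$
If we can establish that
$$D_-f(x) \geq D^+f(x) \quad \text{and} \quad D_+f(x) \geq D^-f(x),$$
then all four Dini derivatives are equal, since together these give $D^+f(x) \geq D_+f(x) \geq D^-f(x) \geq D_-f(x) \geq D^+f(x)$. If we can establish that the first of these
inequalities, $D_-f(x) \geq D^+f(x)$, holds almost everywhere provided that $f$ is continuous and monotonically increasing, then this inequality also holds almost everywhere on $[-b, -a]$ for the continuous, monotonically increasing function $k(x) = -f(-x)$: $D_-k(x) \geq D^+k(x)$. But as we saw in
the previous section,
$$D_-k(-x) = D_+f(x) \quad \text{and} \quad D^+k(-x) = D^-f(x).$$
It follows that
$$D_+f(x) \geq D^-f(x).$$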

Definition: Shadow point

Given a continuous function $g$ on $[a, b]$, we say that $x \in [a, b]$ is a shadow point
of $g$ if there exists $z \in [a, b]$, $z > x$, such that $g(z) > g(x)$.

We have shown that it is enough to prove that
$$D^+f(x) \leq D_-f(x) \text{ almost everywhere, and} \qquad (7.3)$$
$$D^+f(x) < \infty \text{ almost everywhere.} \qquad (7.4)$$

We start with inequality (7.3). Riesz observed that if $D_-f(x) < D^+f(x)$ for a
given value of $x$, then we can find two rational numbers, $r$ and $R$, such that
$D_-f(x) < r < R < D^+f(x)$.
For each pair of rational numbers $0 < r < R < \infty$, we define the set
$$E_r^R = \{x \in (a, b) \mid D_-f(x) < r < R < D^+f(x)\}.$$
There are a countable number of such pairs. If we can show that each set $E_r^R$ has
measure zero, then inequality (7.3) holds almost everywhere.
The set $E_r^R$ is the intersection of $E^R = \{x \in (a, b) \mid D^+f(x) > R\}$ and $E_r = \{x \in (a, b) \mid D_-f(x) < r\}$. We need to limit the size of these sets. Notice that the
set of $x$ for which $D^+f(x) = \infty$ is equal to $\bigcap_R E^R$. Using the flipping operation,
$h(x) = f(-x)$, we see that $D^+h(-x) = -D_-f(x)$, so whatever we can say about
$E^R$ can be translated into a comparable result for $E_r$. The key is to be able to limit
the size of $E^R$.
If $x \in E^R$, then we can find a $z > x$ such that
$$\frac{f(z) - f(x)}{z - x} > R, \quad \text{or, equivalently,} \quad f(z) - Rz > f(x) - Rx.$$
If we define $g(x) = f(x) - Rx$, then the set $E^R$ is contained in the set of shadow
points of $g$. The shadow points correspond to points in the valleys (see Figure 7.1).
A point $x$ is a shadow point if the point $(x, g(x))$ on the graph of $g$ lies in the
shadow of the rising sun. The next lemma provides the key to bounding the size of
$E^R$.

Lemma 7.5 (Rising Sun Lemma). Let $g$ be continuous on $[a, b]$. The set of shadow
points of $g$ that lie in $(a, b)$ is a countable union of pairwise disjoint open intervals
$(a_k, b_k)$ for which
$$g(a_k) \leq g(b_k) \quad \text{for all } k. \qquad (7.5)$$



Figure 7.1. Shadow points.

Proof. For each shadow point $x$, we can use the continuity of $g$ to find a small
neighborhood of $x$ that is left of $z$ and over which the value of the function stays less
than $g(z)$. This tells us that the set of shadow points is an open set. By Theorem 3.5,
the set of shadow points is a countable union of pairwise disjoint open intervals
$(a_k, b_k)$. What may seem obvious from looking at Figure 7.1 but actually takes
some work is that $g(a_k) \leq g(b_k)$ (inequality (7.5)). This is a critical part of the
rising sun lemma.
We shall prove that $g(x) \leq g(b_k)$ for every $x \in (a_k, b_k)$. The inequality for $x = a_k$
then follows from the continuity of $g$ (see Exercise 7.2.4). For $a_k < x < b_k$, let
$$S_x = \{y \in [x, b_k] \mid g(x) \leq g(y)\}.$$
Since $x \in S_x$, the set $S_x$ is nonempty. It is bounded by $b_k$, so $S_x$ has a least upper bound,
$t = \sup S_x$, and $g(t) \geq g(x)$ (see Exercise 7.2.5). If $t < b_k$, then $g(b_k) < g(x)$. Also,
if $t < b_k$, then $t \in (a_k, b_k)$, so $t$ is a shadow point. We can find a $z > t$ so that
$g(z) > g(t)$. We have the inequalities
$$g(z) > g(t) \geq g(x) > g(b_k). \qquad (7.6)$$
Since $t$ is the least upper bound of the $y \in [x, b_k]$ for which $g(y) \geq g(x)$, $z$ must be
larger than $b_k$, and, as we have seen, $g(z) > g(b_k)$. This means that $b_k$ is a shadow
point, a contradiction. Therefore $b_k = \sup S_x$ and $g(b_k) \geq g(x)$.
Notice that the shadow points of $g(x) = f(x) - Rx$ include much more than
just the points in $E^R$. The reason for using shadow points is so that we can work
with a countable union of intervals, as we shall see in the next result that tells us
about the size of $E^R$.
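The rising sun lemma also has a discrete form that is easy to compute. In the Python sketch below (the function `shadow_runs` is a hypothetical helper of ours), index $i$ counts as a shadow point when some later sample is strictly larger. Consecutive shadow indices are grouped into runs, the analogues of the intervals $(a_k, b_k)$. The same argument as in the proof shows that every sample in a run is at most the sample at the first non-shadow index $b$ that follows it. This is the discrete version of $g(x) \leq g(b_k)$.

```python
import math


def shadow_runs(samples):
    """Discrete rising sun lemma for samples g_0, ..., g_{n-1}.

    Index i is a shadow point if some later sample is strictly larger.
    Returns the list of shadow flags and the maximal runs of shadow indices,
    each as a pair (first, b): indices first, ..., b - 1 are shadow points and
    b is the first non-shadow index after them.
    """
    n = len(samples)
    later_max = [-math.inf] * n                 # later_max[i] = max(samples[i+1:])
    for i in range(n - 2, -1, -1):
        later_max[i] = max(samples[i + 1], later_max[i + 1])
    shadow = [samples[i] < later_max[i] for i in range(n)]

    runs, i = [], 0
    while i < n:
        if shadow[i]:
            j = i
            while shadow[j]:                    # stops: the last index is never a shadow point
                j += 1
            runs.append((i, j))
            i = j
        else:
            i += 1
    return shadow, runs


values = [0, 2, 1, 3, 0, 1, 0.5]
flags, runs = shadow_runs(values)
print(runs)                                                                      # [(0, 3), (4, 5)]
print(all(values[x] <= values[b] for first, b in runs for x in range(first, b)))  # True
```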

The Proof of Theorem 7.4


The next three lemmas complete the proof of Theorem 7.4.

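Lemma 7.6 (Size of $E^R$). If $R > 0$, $f$ is a monotonically increasing continuous
function on $(\alpha, \beta)$, and $E^R = \{x \in (\alpha, \beta) \mid D^+f(x) > R\}$, then $E^R$ is contained in
a countable union of pairwise disjoint open intervals, $(\alpha_k, \beta_k)$, for which
$$\sum_k (\beta_k - \alpha_k) \leq \frac{f(\beta) - f(\alpha)}{R}.$$

Proof. We apply the rising sun lemma to the function $g(x) = f(x) - Rx$. As we
have seen, $E^R$ is contained in
$$\bigcup_k (\alpha_k, \beta_k),$$
where the open intervals are pairwise disjoint and $g(\alpha_k) \leq g(\beta_k)$. This tells us that
$$f(\alpha_k) - R\alpha_k \leq f(\beta_k) - R\beta_k, \quad \text{or, equivalently,} \qquad (7.7)$$
$$\beta_k - \alpha_k \leq \frac{1}{R}\left(f(\beta_k) - f(\alpha_k)\right). \qquad (7.8)$$
Since $f$ is monotonically increasing,
$$\sum_k (\beta_k - \alpha_k) \leq \frac{1}{R}\sum_k \left(f(\beta_k) - f(\alpha_k)\right) \leq \frac{f(\beta) - f(\alpha)}{R}.$$
The set of $x$ for which $D^+f(x) = \infty$ is the intersection of $E^R$ taken over all
$R \in \mathbb{N}$. We see that $E^1 \supseteq E^2 \supseteq \cdots$. By Corollary 5.12, we can conclude that
$$m\left(\{x \in [a, b] \mid D^+f(x) = \infty\}\right) = \lim_{R \to \infty} m(E^R) \leq \lim_{R \to \infty} \frac{f(b) - f(a)}{R} = 0. \qquad (7.9)$$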
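Lemma 7.6 can be tested on sampled data in the same discrete setting. The Python sketch below assumes that $f$ is sampled on a uniform grid. The hypothetical helper `shadow_length_bound` finds the runs of shadow points of $g(x) = f(x) - Rx$ and adds up their lengths, each measured from the first shadow point of the run to the next non-shadow point. It then compares the total with $(f(b) - f(a))/R$. Take $f(x) = \sqrt{x}$ on $[0, 1]$ and $R = 2$. Here $E^2 = \{x \mid 1/(2\sqrt{x}) > 2\} = (0, 1/16)$, and the computed total is $1/16$, well under the bound $1/2$.

```python
import math


def shadow_length_bound(f, a, b, R, m=100_000):
    """Total length of the shadow runs of g(x) = f(x) - R x on a uniform grid
    over [a, b], returned together with the bound (f(b) - f(a)) / R of Lemma 7.6."""
    xs = [a + (b - a) * i / m for i in range(m + 1)]
    gs = [f(x) - R * x for x in xs]
    later_max = [-math.inf] * len(gs)          # later_max[i] = max(gs[i+1:])
    for i in range(len(gs) - 2, -1, -1):
        later_max[i] = max(gs[i + 1], later_max[i + 1])

    total, i = 0.0, 0
    while i < len(gs):
        if gs[i] < later_max[i]:               # i starts a run of shadow points
            j = i
            while gs[j] < later_max[j]:        # j ends at the next non-shadow index
                j += 1
            total += xs[j] - xs[i]
            i = j
        else:
            i += 1
    return total, (f(b) - f(a)) / R


print(shadow_length_bound(math.sqrt, 0.0, 1.0, 2.0))                         # about (0.0625, 0.5)
print(shadow_length_bound(lambda x: x + 0.4 * math.sin(2 * x), 0.0, 10.0, 1.2))
```

On each run, the discrete rising sun inequality gives $R(x_j - x_i) \leq f(x_j) - f(x_i)$. Monotonicity of $f$ then yields the comparison, as in the proof above.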
To find the size of $E_r^R$ and finish our proof, we need a slightly different result for
$E_r$.

Lemma 7.7 (Size of $E_r$). If $f$ is a monotonically increasing continuous function on
$(\alpha, \beta)$, and $E_r = \{x \in (\alpha, \beta) \mid D_-f(x) < r\}$, then $E_r$ is contained in a countable
union of pairwise disjoint open intervals, $(\alpha_k, \beta_k)$, for which
$$f(\beta_k) - f(\alpha_k) \leq r(\beta_k - \alpha_k).$$

Proof. We follow the proof of Lemma 7.6 with $f$ replaced by $h(x) = f(-x)$ and
$R$ replaced by $-r$. Notice that we did not need the fact that $R$ is positive until we
divided by $R$ to get inequality (7.8). We observe that
$$D^+h(-x) = -D_-f(x) > -r \iff D_-f(x) < r,$$
so $x \in E_r$ if and only if $-x \in \{y \in (-\beta, -\alpha) \mid D^+h(y) > -r\}$.
Following the proof of Lemma 7.6 up to inequality (7.7), we see that
$$h(-\beta_k) - (-r)(-\beta_k) \leq h(-\alpha_k) - (-r)(-\alpha_k). \qquad (7.10)$$
This is equivalent to
$$f(\beta_k) - f(\alpha_k) \leq r(\beta_k - \alpha_k). \qquad (7.11)$$

The next lemma completes the proof of Theorem 7.4. It uses a very ingenious
trick.

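Lemma 7.8 (Size of $E_r^R$). If $f$ is a monotonically increasing continuous function
on $(a, b)$, then
$$m\left(\{x \in (a, b) \mid D_-f(x) < r < R < D^+f(x)\}\right) = 0.$$

Proof. Take any interval $(\alpha, \beta) \subseteq (a, b)$, and consider $E_r^R \cap (\alpha, \beta)$.
By Lemma 7.7, $E_r \cap (\alpha, \beta)$ is contained in a countable union of disjoint open
intervals $(\alpha_k, \beta_k)$ for which
$$f(\beta_k) - f(\alpha_k) \leq r(\beta_k - \alpha_k).$$
For each interval $(\alpha_k, \beta_k)$, we know from Lemma 7.6 that $E^R \cap (\alpha_k, \beta_k)$ is contained
in a countable union of open intervals $(\alpha_{k,n}, \beta_{k,n})$ for which
$$\sum_n (\beta_{k,n} - \alpha_{k,n}) \leq \frac{f(\beta_k) - f(\alpha_k)}{R}.$$
We have shown that $E_r^R \cap (\alpha, \beta) \subseteq \bigcup_{k,n} (\alpha_{k,n}, \beta_{k,n})$, a union of pairwise disjoint
open intervals for which
$$\sum_{k,n} (\beta_{k,n} - \alpha_{k,n}) \leq \sum_k \frac{f(\beta_k) - f(\alpha_k)}{R} \leq \frac{r}{R}\sum_k (\beta_k - \alpha_k) \leq \frac{r}{R}(\beta - \alpha). \qquad (7.12)$$
For each pair $k, n$, we apply this result to $E_r^R \cap (\alpha_{k,n}, \beta_{k,n})$. This set is contained
in a countable union of disjoint open intervals $(\alpha_{k,n,s}, \beta_{k,n,s})$ for which
$$\sum_s (\beta_{k,n,s} - \alpha_{k,n,s}) \leq \frac{r}{R}(\beta_{k,n} - \alpha_{k,n}). \qquad (7.13)$$
Combining (7.12) and (7.13), we see that $E_r^R \cap (\alpha, \beta)$ is contained in
the countable union of pairwise disjoint open intervals $\bigcup_{k,n,s} (\alpha_{k,n,s}, \beta_{k,n,s})$, for
which
$$\sum_{k,n,s} (\beta_{k,n,s} - \alpha_{k,n,s}) \leq \frac{r}{R}\sum_{k,n} (\beta_{k,n} - \alpha_{k,n}) \leq \frac{r^2}{R^2}(\beta - \alpha). \qquad (7.14)$$
For each triple $k, n, s$, we apply (7.12) to $E_r^R \cap (\alpha_{k,n,s}, \beta_{k,n,s})$. We are
able to put $E_r^R \cap (\alpha, \beta)$ inside a countable collection of pairwise disjoint open
intervals for which the sum of the lengths is at most
$$\frac{r^3}{R^3}(\beta - \alpha). \qquad (7.15)$$
Proceeding by induction, for any positive integer $N$, we can put $E_r^R \cap (\alpha, \beta)$ inside a
countable collection of pairwise disjoint open intervals for which the sum of the
lengths is less than or equal to
$$\left(\frac{r}{R}\right)^N (\beta - \alpha).$$
Since $r/R < 1$, this implies that the outer measure of $E_r^R \cap (\alpha, \beta)$ is 0.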
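The geometric decay in the last step is what makes the argument work even when $r$ and $R$ are close together. The small arithmetic sketch below, in Python, evaluates the bound $(r/R)^N(\beta - \alpha)$ after $N$ rounds and counts the rounds needed to push it below a target. Both helper names are our own. With $r/R = 9/10$ and $\beta - \alpha = 1$ it takes 132 rounds to fall below $10^{-6}$, but for any $r < R$ the bound eventually gets there.

```python
def covering_bound(r, R, length, N):
    """(r/R)**N * length: the bound on the total length of the covering intervals
    after N rounds of alternating Lemma 7.7 and Lemma 7.6."""
    return (r / R) ** N * length


def rounds_needed(r, R, length, eps):
    """Smallest N with covering_bound(r, R, length, N) < eps; needs 0 < r < R."""
    if not 0 < r < R:
        raise ValueError("need 0 < r < R")
    N = 0
    while covering_bound(r, R, length, N) >= eps:
        N += 1
    return N


print(covering_bound(1, 2, 1.0, 10))       # 0.0009765625
print(rounds_needed(9, 10, 1.0, 1e-6))     # 132
```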

The Faber–Chisholm–Young Theorem


We now show that we can do without the assumption of continuity.

Theorem 7.9 (Bounded Variation ⇒ Differentiable AE). If $f$ has bounded
variation on $[a, b]$, then $f$ is differentiable almost everywhere on $[a, b]$.

Again, we can assume that $f$ is monotonically increasing. A function of bounded
variation is simply a difference of two such functions. If two functions are differentiable almost everywhere, then so is their difference.
The discontinuities of an increasing function are jumps, places where
$$\lim_{x \to c^-} f(x) < \lim_{x \to c^+} f(x).$$
Consider the subintervals of $(f(a), f(b))$ that are not in the range of $f$. Each
contains a distinct rational number. Therefore, the number of such intervals is
countable, and the set of points of $[a, b]$ at which $f$ is discontinuous is either
empty or a countable set.

Figure 7.2. A strictly increasing function with discontinuities. Its inverse.
Let g(x) = f(x) + x. The function f is differentiable if and only if g is differentiable, and g is slightly easier to work with because it is strictly increasing. The inverse of a strictly increasing function is continuous and monotonically increasing on its domain and can be uniquely extended to a continuous, monotonically increasing function on $[g(a),g(b)]$ (see Figure 7.2). This should be obvious from the picture. We can make it rigorous by defining

$$G(x)=\sup\{t\in[a,b]\mid g(t)<x\},\qquad g(a)\le x\le g(b).$$

This function is well defined and monotonically increasing, and, because g is strictly increasing, $G(g(x))=x$ (see Exercise 7.2.6). Recall from Theorem 3.1 that a function is continuous if and only if the inverse image of every open set is open. Since g is strictly increasing, the set of x with $\alpha<G(x)<\beta$ is the open interval $(g(\alpha^+),g(\beta^-))$, whose endpoints are the one-sided limits of g (when g is continuous at α and β, this is simply $(g(\alpha),g(\beta))$), and therefore G is continuous.
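A small numerical sketch can make the definition of G concrete. It is only an illustration under stated assumptions: the function g below is a hypothetical example on [0, 2] with a single jump at t = 1, and G is approximated by bisection, which works because $\{t : g(t) < x\}$ is an initial segment of [a, b] when g is increasing. The helper names are ours, not the book's.

```python
def g(t):
    # Hypothetical strictly increasing function on [0, 2] with a jump at t = 1.
    return t if t < 1 else t + 1


def G(x, a=0.0, b=2.0, iters=60):
    """Approximate G(x) = sup{t in [a, b] : g(t) < x} for g(a) <= x <= g(b)."""
    if g(a) >= x:
        return a  # the set is empty; use the convention G(g(a)) = a
    if g(b) < x:
        return b  # every t in [a, b] qualifies
    lo, hi = a, b  # invariant: g(lo) < x <= g(hi)
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) < x:
            lo = mid
        else:
            hi = mid
    return lo


# G(g(t)) recovers t, and G is flat across the gap [1, 2) in the range of g.
print(G(g(0.5)), G(g(1.5)), G(1.5))
```

Across the interval [1, 2], which g skips over, the sketch returns values close to 1, so G is flat there rather than jumping, in line with the continuity argument above.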
We can apply Theorem 7.4 to the function G. This will be the key to our proof, which will come in two parts. We use the existence of the derivative of G to prove that the Dini derivatives of g are equal (possibly $+\infty$) almost everywhere. We then prove that $D^+g$ is finite almost everywhere.

Lemma 7.10 (Equality of Dini Derivatives). Let g be a strictly increasing function on [a, b]. The Dini derivatives of g are equal (possibly $+\infty$) almost everywhere.

Proof. Let G be the continuous, monotonically increasing function obtained by extending the inverse of g to the entire interval [g(a), g(b)]. We know by Theorem 7.4 that G'(x) exists and is finite almost everywhere. We write

$$\frac{g(y)-g(x)}{y-x}=\frac{g(y)-g(x)}{G(g(y))-G(g(x))}=\left[\frac{G(g(y))-G(g(x))}{g(y)-g(x)}\right]^{-1}.$$

The points of discontinuity of g are countable, and thus have measure zero. Where g is continuous, $y\to x$ implies that $g(y)\to g(x)$, and, therefore, except on this set of measure zero,

$$\lim_{y\to x}\frac{g(y)-g(x)}{y-x}=\lim_{g(y)\to g(x)}\left[\frac{G(g(y))-G(g(x))}{g(y)-g(x)}\right]^{-1}=\frac{1}{G'(g(x))},$$

where $1/0$ is read as $+\infty$.
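The identity behind Lemma 7.10 can be checked numerically at a point where g is continuous. The sketch below assumes a hypothetical example, $g(t)=t^3+t$ on [0, 2], inverts it by bisection, and compares a difference quotient of g at x = 1 with the reciprocal of a difference quotient of G at g(1); both should be close to $g'(1)=4$. The function names are illustrative only.

```python
def g(t):
    # Hypothetical continuous, strictly increasing example on [0, 2].
    return t**3 + t


def G(x, a=0.0, b=2.0, iters=200):
    """Inverse of g on [g(a), g(b)], computed by bisection."""
    lo, hi = a, b
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) < x:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def diff_quotient(fun, x, h=1e-5):
    # Symmetric difference quotient, an approximation to fun'(x).
    return (fun(x + h) - fun(x - h)) / (2 * h)


x = 1.0
slope_g = diff_quotient(g, x)               # approximates g'(1) = 4
slope_via_G = 1 / diff_quotient(G, g(x))    # approximates 1 / G'(g(1))
print(slope_g, slope_via_G)
```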

Lemma 7.11 ($D^+g$ Finite AE). Let g be a strictly increasing function on [a, b]. The Dini derivative $D^+g$ is finite almost everywhere.

Proof. We can restrict our attention to those x at which the Dini derivatives of g are all equal, because by Lemma 7.10 the set of x on which they differ has measure zero. Let $E^\infty=\{x\in(a,b)\mid D^+g(x)=+\infty\}$. If $x\in E^\infty$, then for any positive N we can find s and t, $s<x<t$, such that

$$g(t)-g(x)>N(t-x),\qquad g(x)-g(s)>N(x-s).$$

Therefore, $g(t)-g(s)>N(t-s)$. We define $S_N$ to be the set of all $x\in(a,b)$ for which we can find s and t, $a<s<x<t<b$, such that $g(t)-g(s)>N(t-s)$. By what we have just shown about $E^\infty$, we know that it is a subset of $S_N$. For each $x\in S_N$ we select $s_x$ and $t_x$ so that $a<s_x<x<t_x<b$ and

$$g(t_x)-g(s_x)>N(t_x-s_x). \tag{7.16}$$

The intervals $(s_x,t_x)$, taken over all $x\in S_N$, provide an open cover of $S_N$. The set $S_N$ is open (Exercise 7.2.7). By Theorem 3.5, $S_N$ is the union of a countable collection of pairwise disjoint open intervals, $S_N=\bigcup_k(a_k,b_k)$. We choose a closed interval inside each $(a_k,b_k)$ whose length is exactly half of $b_k-a_k$,

$$[\alpha_k,\beta_k]\subset(a_k,b_k),\qquad\beta_k-\alpha_k=\tfrac12\,(b_k-a_k). \tag{7.17}$$

Each closed interval $[\alpha_k,\beta_k]$ is contained in $S_N$ and hence in the union $\bigcup_{x\in S_N}(s_x,t_x)$. By the Heine–Borel theorem, Theorem 3.6, we can find a finite collection of these open intervals that covers $[\alpha_k,\beta_k]$,

$$[\alpha_k,\beta_k]\subset\bigcup_{j=1}^{n_k}\bigl(s_{x(k,j)},t_{x(k,j)}\bigr).$$

This finite open cover can be ordered so that $s_{x(k,1)}<s_{x(k,2)}<\cdots<s_{x(k,n_k)}$. We can assume that $t_{x(k,1)}<t_{x(k,2)}<\cdots<t_{x(k,n_k)}$, because otherwise one of the open intervals is contained in another and can be eliminated from the cover. Furthermore, if the right endpoint of one interval is strictly greater than the left endpoint of the second interval to its right, $s_{x(k,j+2)}<t_{x(k,j)}$, then the interval in the middle is contained in their union,

$$\bigl(s_{x(k,j+1)},t_{x(k,j+1)}\bigr)\subset\bigl(s_{x(k,j)},t_{x(k,j)}\bigr)\cup\bigl(s_{x(k,j+2)},t_{x(k,j+2)}\bigr)$$

(see Exercise 7.2.8), so we can eliminate $(s_{x(k,j+1)},t_{x(k,j+1)})$ from the cover. Thus, after removing these superfluous intervals, the intervals in odd position are pairwise disjoint, as are the intervals in even position.
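The bookkeeping that thins the finite cover can be sketched as a short routine. This is an illustration under our own assumptions: intervals are given as (s, t) pairs of floats, `prune_cover` and `pairwise_disjoint` are hypothetical helper names, and the example cover of [0, 1] is made up.

```python
def prune_cover(intervals):
    """Thin a finite cover by open intervals (s, t), following the steps above.

    Step 1: sort by left endpoint and drop any interval contained in another,
            so that left and right endpoints are both strictly increasing.
    Step 2: while some s[j+2] < t[j], the middle interval (s[j+1], t[j+1]) lies
            in the union of its two neighbours (Exercise 7.2.8), so drop it.
    The union of the intervals never changes.
    """
    # Step 1: for equal left endpoints, the longer interval comes first.
    ordered = sorted(intervals, key=lambda iv: (iv[0], -iv[1]))
    kept, best_right = [], float("-inf")
    for s, t in ordered:
        if t > best_right:  # not contained in an earlier interval
            kept.append((s, t))
            best_right = t
    # Step 2: remove superfluous middle intervals until none remain.
    changed = True
    while changed:
        changed = False
        for j in range(len(kept) - 2):
            if kept[j + 2][0] < kept[j][1]:
                del kept[j + 1]
                changed = True
                break
    return kept


def pairwise_disjoint(intervals):
    # Open intervals sorted by left endpoint are disjoint iff each ends before the next starts.
    return all(t1 <= s2 for (_, t1), (s2, _) in zip(intervals, intervals[1:]))


cover = [(-0.1, 0.3), (0.2, 0.5), (0.25, 0.35), (0.4, 0.8), (0.45, 0.9), (0.7, 1.1)]
pruned = prune_cover(cover)
odd, even = pruned[0::2], pruned[1::2]  # first, third, ... and second, fourth, ...
print(pruned, pairwise_disjoint(odd), pairwise_disjoint(even))
```

For this example the routine keeps four intervals; the first and third are disjoint, as are the second and fourth, which is exactly the property used in the estimate that follows.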
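We can use the fact that g is strictly increasing, together with equation (7.16), to put a bound on the length of $[\alpha_k,\beta_k]$. Every point of $(s_x,t_x)$ belongs to $S_N$ (with $s_x$ and $t_x$ as witnesses), so each interval of the cover, since it meets $[\alpha_k,\beta_k]$, lies in $(a_k,b_k)$. Since the odd-position intervals are pairwise disjoint, as are the even-position intervals, and g is increasing, each of the two sums of increments of g below is at most $g(b_k)-g(a_k)$:

$$\begin{aligned}
\beta_k-\alpha_k&\le\sum_{1\le j\le\lceil n_k/2\rceil}\bigl(t_{x(k,2j-1)}-s_{x(k,2j-1)}\bigr)+\sum_{1\le j\le\lfloor n_k/2\rfloor}\bigl(t_{x(k,2j)}-s_{x(k,2j)}\bigr)\\
&<\frac1N\sum_{1\le j\le\lceil n_k/2\rceil}\bigl(g(t_{x(k,2j-1)})-g(s_{x(k,2j-1)})\bigr)+\frac1N\sum_{1\le j\le\lfloor n_k/2\rfloor}\bigl(g(t_{x(k,2j)})-g(s_{x(k,2j)})\bigr)\\
&\le\frac1N\bigl(g(b_k)-g(a_k)\bigr)+\frac1N\bigl(g(b_k)-g(a_k)\bigr)=\frac2N\bigl(g(b_k)-g(a_k)\bigr).
\end{aligned}$$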

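It follows that $E^\infty\subseteq S_N=\bigcup_k(a_k,b_k)$, where these intervals are pairwise disjoint. The measure of $E^\infty$ is less than or equal to

$$\sum_{k=1}^{\infty}(b_k-a_k)=2\sum_{k=1}^{\infty}(\beta_k-\alpha_k)\le\frac4N\sum_{k=1}^{\infty}\bigl(g(b_k)-g(a_k)\bigr)\le\frac4N\bigl(g(b)-g(a)\bigr).$$

Since this is true for all N > 0, no matter how large, we can conclude that $m(E^\infty)=0$.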

Exercises
7.2.1. Let f be the function defined by f(0) = 0 and f(x) = x sin(1/x) for x ≠ 0. Find $D^+f(0)$, $D_+f(0)$, $D^-f(0)$, and $D_-f(0)$.

7.2.2. Show that if a function f assumes its maximum at c, then $D^+f(c)\le0$ and $D_-f(c)\ge0$.
7.2.3. Show that if f is continuous on [a, b] and any one of its Dini derivatives is everywhere nonnegative on [a, b], then $f(b)\ge f(a)$.
7.2.4. Prove that if f is continuous on $[a_k,b_k]$ and if $f(x)<f(b_k)$ for all $x\in(a_k,b_k)$, then $f(a_k)\le f(b_k)$.
7.2.5. Prove that if f is continuous on $[a_k,b_k]$ and if
$$t=\sup\{y\in[a_k,b_k]\mid f(x)\le f(y)\},$$
then $f(t)\ge f(x)$.
7.2.6. Show that if g is strictly increasing and
$$G(x)=\sup\{t\in[a,b]\mid g(t)<x\},\qquad g(a)\le x\le g(b),$$
then $G(g(x))=x$.
7.2.7. Let f be a strictly increasing function and let $E_N$ be the set of all $x\in(a,b)$ for which we can find s and t, $a<s<x<t<b$, such that $f(t)-f(s)>N(t-s)$. Prove that $E_N$ is open.

7.2.8. Given three intervals $(a_1,b_1)$, $(a_2,b_2)$, $(a_3,b_3)$ that cover $(a_1,b_3)$ with $a_1<a_2<a_3$ and $b_1<b_2<b_3$, show that if $a_3<b_1$, then
$$(a_2,b_2)\subset(a_1,b_1)\cup(a_3,b_3).$$
7.2.9. Let f be monotonically increasing on [a, b], and c an arbitrary value in (a, b). Show that
$$\sup_{a<t<c}f(t)=\lim_{x\to c^-}f(x)\le f(c)\le\lim_{x\to c^+}f(x)=\inf_{c<t<b}f(t).$$
Explain how this implies that $\lim_{x\to c^-}f(x)$ and $\lim_{x\to c^+}f(x)$ exist.

7.2.10. Given an arbitrary sequence $(x_n)\subseteq[a,b]$ and a sequence of positive numbers $(c_n)$ such that $\sum c_n<\infty$, define the function f by
$$f(x)=\sum_{x_n<x}c_n.$$
Show that
1. f is monotonically increasing on [a, b],
2. f is discontinuous at each $x_n$, and
3. f is continuous at each $x\in[a,b]-\{x_1,x_2,\dots\}$.
7.3 Absolute Continuity 223
7.2.11. Verify that in the rising sun lemma (Lemma 7.5), we have f(ak) = f(bk)
except possibly when ak = a.

7.3 Absolute Continuity


Once mathematicians ceased to define integration as the inverse process of differentiation, they were faced with the two sets of questions that constitute the fundamental theorem of calculus:

1. Antiderivative part:
(a) When is a function integrable?
(b) If the integral exists, when can that integral be differentiated?
(c) When does differentiating the integral take us back to the original
function?
2. Evaluation part:
(a) When is a function differentiable?
(b) If the derivative exists, when can that derivative be integrated?
(c) When does integrating the derivative take us back to the original function?

For question 1(a), we have a simple characterization of Lebesgue integrable functions: measurable functions for which the integrals of $f^+$ and $f^-$ are finite. Question 1(b) asks when the resulting function $F(x)=\int_a^x f(t)\,dt$ is differentiable. As the next theorem implies, the answer is "always, almost everywhere." We shall answer question 1(c) in the next section.

Theorem 7.12 (Properties of Integral). If f is integrable on [a, b], then $F(x)=\int_a^x f(t)\,dt$ is uniformly continuous and of bounded variation on [a, b].
By Theorem 7.9, it follows that F is differentiable almost everywhere.

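Proof. We are looking at F on a closed and bounded interval, so continuity will imply uniform continuity. Given an ε > 0, we seek a response δ so that $|x-y|<\delta$ implies that

$$|F(x)-F(y)|=\left|\int_y^x f(t)\,dt\right|<\varepsilon.$$

Since f is integrable, by Corollary 6.17 we can find a simple function ψ such that

$$\int_a^b|f(t)-\psi(t)|\,dt<\frac{\varepsilon}{2}.$$

Since every simple function takes on only finitely many values, it is bounded, say $|\psi(x)|<B$ for all $x\in[a,b]$. Choose $\delta=\varepsilon/2B$; then $|x-y|<\delta$ implies that

$$\left|\int_y^x f(t)\,dt\right|\le\left|\int_y^x|f(t)-\psi(t)|\,dt\right|+\left|\int_y^x|\psi(t)|\,dt\right|<\frac{\varepsilon}{2}+B\cdot\frac{\varepsilon}{2B}=\varepsilon.$$

To see that F has bounded variation, we observe that

$$\int_a^x f^+(t)\,dt\qquad\text{and}\qquad\int_a^x f^-(t)\,dt$$

are monotonically increasing functions, and therefore

$$F(x)=\int_a^x f(t)\,dt=\int_a^x f^+(t)\,dt-\int_a^x f^-(t)\,dt$$

is a difference of monotonically increasing functions. By the Jordan decomposition theorem (Theorem 7.2), it has bounded variation.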

The Evaluation Part


The results of Section 7.2 give us a good answer to 2(a): If f has bounded variation,
then it is differentiable almost everywhere. Bounded variation is not necessary, but
it is a strong sufficient condition. We also have another approach. We can use one
of the Dini derivatives, which always exist, and then ask when a function that is a
Dini derivative can be integrated.
We are going to skip over 2(b) for the moment and go straight to 2(c). If we
have a function that can be differentiated almost everywhere, and that derivative
can be integrated, do we get back to the original function? If we do, then that says
that our original function could be represented as a definite integral. As we saw
in Theorem 7.12, that means that the original function must be continuous and of
bounded variation on [a, b]. We see that bounded variation is not only sufficient to
answer the first question affirmatively, it is necessary if we are to answer the last
question in the affirmative. What about continuity? If f is a continuous function with bounded variation, then does it always follow that
$$f(x)-f(a)=\int_a^x f'(t)\,dt?$$
The answer is no.

Definition: Absolute continuity


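A function f is absolutely continuous on [a, b] if given any ε > 0, there is a response δ > 0 so that given any finite collection of pairwise disjoint open intervals in [a, b], $\{(a_k,b_k)\}_{k=1}^n$, for which

$$\sum_{k=1}^n(b_k-a_k)<\delta,$$

we have that

$$\sum_{k=1}^n\bigl|f(b_k)-f(a_k)\bigr|<\varepsilon.$$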

To see why not, we consider the devil's staircase, DS(x) (Example 4.1 on p. 86). The total variation of this function is 1, so it has bounded variation. It is a continuous function that is constant on the open intervals that form the complement of SVC(3). Since SVC(3), the Cantor ternary set, has measure 0, we have that

$$\frac{d}{dx}DS(x)=0,\quad\text{almost everywhere}.$$

The integral of any function that is 0 almost everywhere is a constant function, and DS(x) is not constant. If we start with DS, differentiate, and then define

$$F(x)=\int_0^x DS'(t)\,dt,$$

then F(x) = 0 for all x ∈ [0, 1].
We need something stronger than continuity to characterize functions that are
integrals. We need absolute continuity.
To see that the devil's staircase is not absolutely continuous, let us take ε = 1/2. Our function increases by 1/2 from x = 0 to x = 1/3, so our response δ can be at most 1/3. But the increase of 1/2 also occurs as an increase of 1/4 over [0, 1/9] and an increase of 1/4 over [2/9, 1/3], so δ can be at most 2/9. But these increases actually occur over four intervals, each of length 1/27, so δ can be at most 4/27. Continuing in this way, we see that for each positive integer n,

$$\frac{2^n}{3^{n+1}}\ge\delta>0.$$

Since $2^n/3^{n+1}\to0$ as $n\to\infty$, there is no response.
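The failure of absolute continuity can also be seen numerically, using a slight variant of the argument above: after n removal steps, the $2^n$ closed intervals of length $3^{-n}$ that remain have total length $(2/3)^n$, yet DS rises by a total of 1 across them. The sketch evaluates DS from the ternary digits of x; this evaluation method and the function names are our own choices, and floating-point input makes the values approximate.

```python
def devils_staircase(x, depth=48):
    """Approximate the Cantor function DS(x) on [0, 1] from the ternary digits of x."""
    if x >= 1.0:
        return 1.0
    value, weight = 0.0, 0.5
    for _ in range(depth):
        x *= 3
        digit = int(x)
        x -= digit
        if digit == 1:                    # x lies in a removed middle third
            return value + weight
        value += weight * (digit // 2)    # ternary digit 2 -> binary digit 1
        weight /= 2
    return value


def cantor_level(n):
    """The 2**n closed intervals of length 3**-n that remain after n removals."""
    intervals = [(0.0, 1.0)]
    for _ in range(n):
        intervals = [piece
                     for lo, hi in intervals
                     for piece in ((lo, lo + (hi - lo) / 3), (hi - (hi - lo) / 3, hi))]
    return intervals


results = {}
for n in (1, 4, 8):
    level = cantor_level(n)
    total_length = sum(hi - lo for lo, hi in level)
    total_rise = sum(devils_staircase(hi) - devils_staircase(lo) for lo, hi in level)
    results[n] = (total_length, total_rise)
    print(n, total_length, total_rise)  # length shrinks like (2/3)**n; rise stays near 1
```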

A Little History
This property of definite integrals, absolute continuity, was first observed by Axel
Harnack in 1884. The name was coined by Vitali in 1905, but several mathematicians were aware of it and using it by the 1890s, including Charles de la Vallée Poussin, Camille Jordan, Otto Stolz, and E. H. Moore. As we shall prove later in this section, if a function F can be defined as a definite integral, $F(x)=\int_a^x f(t)\,dt$ (using either the Lebesgue or Riemann definition of the integral), then F is absolutely continuous.
What about the other direction? If a function is absolutely continuous, does that imply that it is an integral? For the Riemann integral, this question is intractable, but we can answer it for the Lebesgue integral. Because we are not limited to bounded functions, it will take more work to verify that any function defined as a definite integral must be absolutely continuous. But it will be possible to show that the implication also runs in the opposite direction: every absolutely continuous function is a definite integral. This result was observed by Lebesgue in 1904, but he gave no proof. The first proof was published by Vitali in 1905, in the same paper in which this property received its name. The next two propositions will move us toward the theorem that F can be written as

$$F(x)=\int_a^x f(t)\,dt$$

for some function f if and only if F is absolutely continuous.

Lebesgue Integral and Absolute Continuity


Proposition 7.13 (Lebesgue Integral Is Absolutely Continuous). If f is Lebesgue integrable on [a, b], then $F(x)=\int_a^x f(t)\,dt$ is absolutely continuous on [a, b].

Since every Riemann integrable function is also Lebesgue integrable, it follows that if f is Riemann integrable on [a, b], then $F(x)=\int_a^x f(t)\,dt$ is absolutely continuous on [a, b]. There are, however, other ways of defining the integral (see Appendix A.2) for which $\int_a^x f(t)\,dt$ might not be absolutely continuous.
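Proof. The proof is almost identical to the first part of the proof of Theorem 7.12. Given ε > 0, we find a simple function ψ such that

$$\int_a^b|f(t)-\psi(t)|\,dt<\frac{\varepsilon}{2}.$$

Since ψ is simple, it is bounded, say $|\psi(x)|<B$ for all $x\in[a,b]$. Choose the response $\delta=\varepsilon/2B$. Let

$$S=\bigcup_{k=1}^{n}(a_k,b_k),$$

where $a\le a_1<b_1\le a_2<b_2\le\cdots<b_n\le b$ and

$$\sum_{k=1}^n(b_k-a_k)<\delta=\frac{\varepsilon}{2B}.$$

The set S is a union of finitely many pairwise disjoint intervals in [a, b] for which the sum of the lengths is less than ε/2B. This means that m(S) < ε/2B. It follows that

$$\sum_{k=1}^n\bigl|F(b_k)-F(a_k)\bigr|=\sum_{k=1}^n\left|\int_{a_k}^{b_k}f(t)\,dt\right|\le\int_S|f(t)-\psi(t)|\,dt+\int_S|\psi(t)|\,dt<\frac{\varepsilon}{2}+B\cdot\frac{\varepsilon}{2B}=\varepsilon.$$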
What about the other direction? If F is absolutely continuous, can we find an integrable function f for which

$$F(x)=\int_a^x f(t)\,dt?$$

The natural candidate for f is F', but does F' exist? The next proposition guarantees that it does, almost everywhere.

Proposition 7.14 (Absolute Continuity ⇒ Bounded Variation). If F is absolutely continuous on [a, b], then it has bounded variation on [a, b].

Proof. We let δ be the response to ε = 1. Given any finite collection of pairwise disjoint intervals, $\{(a_k,b_k)\}_{k=1}^n$, for which the sum of the lengths is less than δ, we have that

$$\sum_{k=1}^n\bigl|F(b_k)-F(a_k)\bigr|<1.$$

Let $N=\lceil(b-a)/\delta\rceil$, the smallest integer greater than or equal to $(b-a)/\delta$. Let $P=(a=x_0,x_1,\dots,x_m=b)$ be any partition of [a, b] into intervals of length less than δ. For $1\le j\le N-1$, choose $l(j)$ to be the largest integer such that $x_{l(j)}<a+j\delta$. The interval from $x_{l(j)}$ to $x_{l(j)+1}$ is one of the intervals in P, so it has length less than δ. Since

$$a+j\delta\le x_{l(j)+1}\le x_{l(j+1)}<a+(j+1)\delta,$$

the interval from $x_{l(j)+1}$ to $x_{l(j+1)}$ also has length less than δ. On each of these intervals, the variation of F with respect to P is less than 1. Counting the initial and final intervals, $[a,x_{l(1)}]$ and $[x_{l(N-1)+1},b]$, there are at most 2N such intervals, so the variation of F with respect to P is strictly less than 2N,

$$V(P,F)<2N=2\left\lceil\frac{b-a}{\delta}\right\rceil.$$

Since every partition has a refinement with intervals of length less than δ and since refining a partition can only increase the variation, we see that the total variation is bounded by $2\lceil(b-a)/\delta\rceil$.

Definition: Lipschitz condition of order α

A function f defined on [a, b] is said to satisfy a Lipschitz condition of order α if there is a constant M > 0 such that

$$|f(x)-f(y)|\le M\,|x-y|^{\alpha}$$

for all x, y ∈ [a, b].
For the evaluation part of the fundamental theorem of calculus, if we start with
F, differentiate it, and then integrate, we wind up with an absolutely continuous
function. If we want any hope that we end with the same function with which we
started, then we need to have started with an absolutely continuous function. This
condition is necessary. As we shall see in the final section of this chapter, it is also
sufficient.
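The variation V(P, F) that appears in the proof of Proposition 7.14 is easy to compute for a given partition. The sketch below uses a hypothetical example, f(x) = sin 3x on [0, 2], to illustrate the fact used at the end of that proof: refining a partition can only increase the variation (by the triangle inequality). For a very fine partition the value approaches the total variation, which for this f is 4 + sin 6 ≈ 3.72. The helper names are ours.

```python
import math


def variation(f, partition):
    """V(P, f): the sum of |f(x_j) - f(x_{j-1})| over consecutive points of P."""
    return sum(abs(f(x1) - f(x0)) for x0, x1 in zip(partition, partition[1:]))


def uniform_partition(a, b, m):
    # a = x_0 < x_1 < ... < x_m = b, equally spaced
    return [a + (b - a) * j / m for j in range(m + 1)]


def f(x):
    # Hypothetical smooth (hence absolutely continuous) example on [0, 2].
    return math.sin(3 * x)


coarse = uniform_partition(0.0, 2.0, 4)
refined = sorted(set(coarse) | set(uniform_partition(0.0, 2.0, 12)))  # contains coarse
fine = uniform_partition(0.0, 2.0, 100_000)
print(variation(f, coarse), variation(f, refined), variation(f, fine))
```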

A Hierarchy of Functions
Absolute continuity implies bounded variation, but as the devil's staircase illustrates, bounded variation does not imply absolute continuity. With one more definition in place, the Lipschitz condition of order α, we can describe a nice hierarchy of functions defined on a closed bounded interval. In Section 8.1 we shall see how Lipschitz's condition arose and how he used it.
A function is said to be $C^1$, or continuously differentiable, on [a, b] if it is differentiable and its derivative is continuous on this interval. All of the following statements hold for functions on a closed and bounded interval:
1. If a function is $C^1$, then it has a bounded derivative.
2. If a function is differentiable with a bounded derivative, then it satisfies a
Lipschitz condition of order 1.
3. If a function satisfies a Lipschitz condition of order 1, then it is absolutely
continuous.
4. If a function is absolutely continuous, then it has bounded variation.
5. If a function has bounded variation, then it is differentiable almost everywhere.

The proofs of the first three statements are left as exercises. All of these implications go only one way, a fact that is also left for the exercises.

Absolute Continuity and Monotonicity


To conclude this section, we observe that any absolutely continuous function is the
difference of two absolutely continuous and monotonically increasing functions, a
key observation that we shall use in the next section.

Proposition 7.15 (Absolute Continuity of Variation). The function f is absolutely continuous on [a, b] if and only if it is equal to the difference of two absolutely continuous, monotonically increasing functions.

Proof. One direction is easy. The difference of absolutely continuous functions is absolutely continuous (see Exercise 7.3.13).
By Proposition 7.14, if f is absolutely continuous on [a, b], then it has bounded variation. We need to prove that $T(x)=V_a^x(f)$ is absolutely continuous on this interval; then $f=T-(T-f)$ expresses f as the required difference, since $T-f$ is monotonically increasing (for $x<y$, $T(y)-T(x)\ge|f(y)-f(x)|\ge f(y)-f(x)$) and is absolutely continuous as a difference of absolutely continuous functions. Given ε > 0, let δ > 0 be the response to ε/2: if the pairwise disjoint intervals $(a_k,b_k)$, $1\le k\le n$, have combined length less than δ, then $\sum_{k=1}^n|f(b_k)-f(a_k)|$ is strictly less than ε/2. We shall see that $\sum_{k=1}^n\bigl(T(b_k)-T(a_k)\bigr)<\varepsilon$.
We observe that

$$T(b_k)-T(a_k)=V_{a_k}^{b_k}(f)=\sup\sum_{j=1}^{m_k}\bigl|f(x_{k,j})-f(x_{k,j-1})\bigr|,$$

where the supremum is taken over all partitions, $a_k=x_{k,0}<x_{k,1}<\cdots<x_{k,m_k}=b_k$, of $[a_k,b_k]$. It follows that

$$\sum_{k=1}^n\bigl(T(b_k)-T(a_k)\bigr)=\sup\sum_{k=1}^n\sum_{j=1}^{m_k}\bigl|f(x_{k,j})-f(x_{k,j-1})\bigr|. \tag{7.18}$$

Since the set of intervals $(x_{k,j-1},x_{k,j})$, $1\le j\le m_k$, $1\le k\le n$, is a finite collection of pairwise disjoint intervals of total length less than δ, each double sum on the right side of equation (7.18) is less than ε/2, and so the supremum of these sums is strictly less than ε.
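The decomposition $f=T-(T-f)$ can be approximated on a grid. In this sketch (hypothetical example f(x) = sin 3x on [0, 2]; the grid approximation and the function names are our own), T is accumulated from the increments $|f(x_j)-f(x_{j-1})|$, and both T and T − f come out nondecreasing, as the proof requires.

```python
import math


def variation_function(f, a, b, m):
    """Grid approximation of T(x) = total variation of f on [a, x].

    T is accumulated along a uniform grid with m steps; for continuous f this
    approaches the true total variation function as m grows.
    """
    xs = [a + (b - a) * j / m for j in range(m + 1)]
    T = [0.0]
    for x0, x1 in zip(xs, xs[1:]):
        T.append(T[-1] + abs(f(x1) - f(x0)))
    return xs, T


def f(x):
    # Hypothetical absolutely continuous example on [0, 2].
    return math.sin(3 * x)


xs, T = variation_function(f, 0.0, 2.0, 2000)
other_part = [t - f(x) for x, t in zip(xs, T)]  # T - f, also increasing
# At every grid point, f = T - (T - f).
print(T[-1], min(b - a for a, b in zip(other_part, other_part[1:])))
```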

Exercises
7.3.1. Find an example of a simple function 0 such that

<0.1.

7.3.2. Let f be defined by f(0) = 0, $f(x)=x^2\sin(1/x^2)$ for x ≠ 0. Show that f does not have bounded variation in any neighborhood of 0, but it is differentiable at 0.

7.3.3. If a function is continuous and of bounded variation, does it necessarily follow that it is absolutely continuous?
7.3.4. Let f be absolutely continuous on the interval [ε, 1] for every ε > 0 and continuous on [0, 1]. Does it necessarily follow that f is absolutely continuous on [0, 1]? If we add the restriction that f has bounded variation on [0, 1], does it now follow that f is absolutely continuous on [0, 1]?
7.3.5. In the definition of absolute continuity, we restricted ourselves to a finite collection of pairwise disjoint open intervals. Show that this is equivalent to the following definition: A function f is absolutely continuous on [a, b] if given any ε > 0, there is a response δ so that given any countable collection (finite or infinite) of pairwise disjoint open intervals in [a, b], $\{(a_k,b_k)\}$, for which
$$\sum_k(b_k-a_k)<\delta,$$
we have that
$$\sum_k\bigl|f(b_k)-f(a_k)\bigr|<\varepsilon.$$

7.3.6. Show that if a function is $C^1$ on a closed and bounded interval, then it has a bounded derivative on this interval. Give an example of a function that is differentiable with a bounded derivative on a closed and bounded interval but which is not $C^1$ on this interval.
7.3.7. Show that if f is differentiable with bounded derivative on [a, b], then f
satisfies the Lipschitz condition of order 1. Show that if a differentiable function
f satisfies a Lipschitz condition of order 1 on [a, b], then the derivative of f is
bounded.
7.3.8. Give an example of a function that satisfies a Lipschitz condition of order
1 on some closed and bounded interval but is not differentiable at every interior
point of that interval.
7.4 Lebesgue's FTC 231

7.3.9. Consider the function f defined on [0, 1] as follows: If x ∈ SVC(3), then f(x) = 0. If x is in a removed interval of length $3^{-n}$ with center at $a/(2\cdot3^{n-1})$, then
$$f(x)=\left|x-\frac{a}{2\cdot3^{n-1}}\right|-\frac{1}{2\cdot3^{n}}.$$
This is a continuous function with a graph that looks like a lot of v's. Show that this function is Lipschitz of order 1 but that it is not differentiable at any point in SVC(3).
7.3.10. Using the idea from Exercise 7.3.9, show how to construct a bounded,
continuous function that is differentiable almost everywhere but does not have
bounded variation.
7.3.11. Give an example of a function that is differentiable at every point of [0, 1]
but does not satisfy the Lipschitz condition on this interval.
7.3.12. Show that if f satisfies a Lipschitz condition of order 1 on [a, b], then f
is absolutely continuous on [a, b].
7.3.13. Let f and g be absolutely continuous functions on [a, b], c ∈ ℝ. Show that cf, f + g, and fg are absolutely continuous on [a, b].
7.3.14. For positive constants α and β, define $f_{\alpha,\beta}$ by $f_{\alpha,\beta}(0)=0$ and $f_{\alpha,\beta}(x)=x^{\alpha}\sin(x^{-\beta})$ for x ≠ 0. Prove that $f_{\alpha,\beta}$ is absolutely continuous on [0, 1] if and only if α > β.
7.3.15. Let f be absolutely continuous on [a, b], let [c, d] be the range of f, and let g satisfy a Lipschitz condition on [c, d]. Show that g ∘ f, the composition of g with f, is absolutely continuous on [a, b].
7.3.16. Define the function f by f(0) = 0 and f(x) = x2 for x > 0.
Let g(x) = x > 0. Show that f, g, and f o g are absolutely continuous on
[0, 1], but g o f is not absolutely continuous on [0, 1].
7.3.17. Show that if f and g are absolutely continuous and g is monotonically increasing, then f ∘ g is absolutely continuous.
7.3.18. Let f be an absolutely continuous function on [0, 1] and S ⊂ [0, 1] a subset of measure zero. Show that f(S) has measure zero.
7.3.19. Show that any absolutely continuous function maps measurable sets to
measurable sets.

7.4 Lebesgue's FTC


Finally, we are prepared to prove the fundamental theorem of calculus. As we
have seen, for the antiderivative part, we must start with an integrable function, a
measurable function f for which the Lebesgue integrals of both $f^+$ and $f^-$ are
finite. For the evaluation part, we must start with an absolutely continuous function.
It turns out that these restrictions are not only necessary, they are sufficient.

Theorem 7.16 (FTC, Antiderivative). If f is integrable on [a, b], then

$$F(x)=\int_a^x f(t)\,dt$$

is differentiable almost everywhere, and F'(x) = f(x) almost everywhere.

Proof. We saw in Propositions 7.13 and 7.14 that F has bounded variation and thus, by Theorem 7.9, is differentiable almost everywhere. We shall show that

$$F'(x)\le f(x),\quad\text{almost everywhere}.$$

This will complete the proof because if f is integrable, then so is −f. The definite integral of −f is −F, and the derivative of −F is −F'. Thus once we have proven that F' ≤ f for every integrable f, it also follows that

$$-F'(x)\le-f(x),\quad\text{almost everywhere}.$$

We get the second inequality for free. But we pay dearly for the first inequality.
The proof is very reminiscent of the proof of Theorem 7.4. Let S be the subset of [a, b] on which F is differentiable and begin by considering the set

$$E_{p,q}=\{x\in S\mid f(x)<p<q<F'(x)\},\qquad p,q\in\mathbb{Q}.$$

The set of x in S for which f(x) < F'(x) is the union over all pairs p < q of $E_{p,q}$. This is a countable union. If we can show that $m(E_{p,q})=0$, then it follows that F'(x) ≤ f(x) almost everywhere.
For $x\in E_{p,q}$, f(x) < p, and therefore

$$\int_{E_{p,q}}f(t)\,dt\le p\,m(E_{p,q}). \tag{7.19}$$

On the other hand, if F' exists at $x\in E_{p,q}$, then

$$\lim_{y\to x}\frac{\int_x^y f(t)\,dt}{y-x}=\lim_{y\to x}\frac{F(y)-F(x)}{y-x}=F'(x)>q.$$

If $E_{p,q}$ were a closed interval, then we could turn this inequality (see Exercise 7.4.2) into the statement

$$\int_{E_{p,q}}f(t)\,dt\ge q\,m(E_{p,q}).$$

Together with equation (7.19), this would imply that $m(E_{p,q})=0$. Of course, $E_{p,q}$ may not be a closed interval, but we can use the rising sun lemma to approximate $E_{p,q}$ by intervals. This will be our approach.

Given any ε > 0, Corollary 6.18 guarantees a response δ so that for any measurable set $A\subseteq[a,b]$ for which $m(A)<\delta$ we have that

$$\left|\int_A f(x)\,dx\right|<\varepsilon. \tag{7.20}$$

We first find an open set U that contains $E_{p,q}$ and such that

$$m\bigl(U-E_{p,q}\bigr)<\delta.$$

We can find such a set U because $E_{p,q}$ is measurable. As an open set, U is a countable union of pairwise disjoint open intervals,

$$U=\bigcup_{k=1}^{\infty}(a_k,b_k).$$

Let

$$U_k=E_{p,q}\cap(a_k,b_k).$$

If $x\in U_k$, then, since F'(x) > q, we can find a $z\in(x,b_k)$ such that

$$\frac{F(z)-F(x)}{z-x}>q,\qquad\text{that is,}\qquad F(z)-qz>F(x)-qx.$$

This tells us that x is a shadow point for the function defined by F(x) − qx on $[a_k,b_k]$. By the rising sun lemma (Lemma 7.5), $U_k$ is contained within a countable union of pairwise disjoint open intervals $\{(\alpha_{k,j},\beta_{k,j})\}_j$, each contained in $(a_k,b_k)$, with

$$F(\beta_{k,j})-q\beta_{k,j}\ge F(\alpha_{k,j})-q\alpha_{k,j}.$$

It follows that

$$q\bigl(\beta_{k,j}-\alpha_{k,j}\bigr)\le F(\beta_{k,j})-F(\alpha_{k,j})=\int_{\alpha_{k,j}}^{\beta_{k,j}}f(t)\,dt. \tag{7.21}$$

Let

$$T=\bigcup_{k=1}^{\infty}\bigcup_{j=1}^{\infty}\bigl(\alpha_{k,j},\beta_{k,j}\bigr).$$

For each k, the intervals $(\alpha_{k,j},\beta_{k,j})$ are pairwise disjoint. Since $(\alpha_{k,j},\beta_{k,j})\subset(a_k,b_k)$ and the intervals $(a_k,b_k)$ are pairwise disjoint, all of the intervals over all pairs k, j are pairwise disjoint. We also have that

$$E_{p,q}\subseteq T\subseteq U,\quad\text{and therefore}\quad m\bigl(T-E_{p,q}\bigr)\le m\bigl(U-E_{p,q}\bigr)<\delta.$$

We are now ready to put the pieces back together, using equations (7.19)–(7.21). (Replacing f by f + c, for a constant c, replaces F' by F' + c and the pair p, q by p + c, q + c, so we may assume that q > 0.)

$$\begin{aligned}
q\,m(E_{p,q})\le q\,m(T)&=q\sum_{k=1}^{\infty}\sum_{j=1}^{\infty}\bigl(\beta_{k,j}-\alpha_{k,j}\bigr)\le\sum_{k=1}^{\infty}\sum_{j=1}^{\infty}\int_{\alpha_{k,j}}^{\beta_{k,j}}f(t)\,dt\\
&=\int_T f(t)\,dt=\int_{E_{p,q}}f(t)\,dt+\int_{T-E_{p,q}}f(t)\,dt<p\,m(E_{p,q})+\varepsilon.
\end{aligned}$$

Since this holds for every ε > 0, we have that $q\,m(E_{p,q})\le p\,m(E_{p,q})$. Since p < q and $m(E_{p,q})$ is nonnegative, this can happen only if $m(E_{p,q})=0$.
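Theorem 7.16 can be illustrated with a simple function, whose integral is easy to compute exactly. In the sketch below, the breakpoints and values are a hypothetical example; F is computed piece by piece, and its one-sided difference quotients are compared with f. Away from the breakpoints they agree with f, and at a breakpoint they disagree, which is allowed because the breakpoints form a set of measure zero.

```python
from bisect import bisect_right

# Hypothetical simple function on [0, 1]: the value values[i] on [breaks[i], breaks[i+1]).
breaks = [0.0, 0.3, 0.5, 1.0]
values = [2.0, -1.0, 4.0]


def f(t):
    i = min(bisect_right(breaks, t) - 1, len(values) - 1)
    return values[i]


def F(x):
    """F(x) = integral of f over [0, x], computed exactly piece by piece."""
    total = 0.0
    for lo, hi, v in zip(breaks, breaks[1:], values):
        if x <= lo:
            break
        total += v * (min(x, hi) - lo)
    return total


def right_quotient(x, h=1e-6):
    return (F(x + h) - F(x)) / h


def left_quotient(x, h=1e-6):
    return (F(x) - F(x - h)) / h


# Away from breakpoints the quotients approach f(x); at 0.3 the one-sided quotients differ.
print(right_quotient(0.2), f(0.2), right_quotient(0.7), f(0.7))
print(left_quotient(0.3), right_quotient(0.3))
```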

Before proving the evaluation part of the fundamental theorem of calculus, we need to give a precise statement and proof of the result that if f' = 0, then f is constant.

Lemma 7.17 (Zero Derivative ⇒ Constant). If f is absolutely continuous and monotonically increasing and if f'(x) = 0 almost everywhere on [a, b], then f is constant.

Note that without absolute continuity, f could be the devil's staircase, in which
case the conclusion to this lemma would be false.

Proof. Let $E=\{x\in[a,b]\mid f'(x)=0\}$. Since f' = 0 almost everywhere, we know that m(E) = b − a. Since f is monotonically increasing, we know that f(E), the image of E, is contained in [f(a), f(b)].
Let Z = (a, b) − E, so that m(Z) = 0. We shall use absolute continuity to show that f(Z) has measure 0. We begin by choosing any ε > 0. By absolute continuity, we can find a response δ > 0 such that given any finite collection of pairwise disjoint open intervals, the sum of whose lengths is less than δ, the sum of the lengths of the images of those intervals will be less than ε.
Since Z ⊂ (a, b) and it has measure zero, we can find a countable collection of pairwise disjoint open intervals $\{(a_k,b_k)\}$ whose union contains Z and for which

$$m(Z)\le\sum_{k=1}^{\infty}(b_k-a_k)<\delta.$$

If we take any finite collection of these, $\{(a_k,b_k)\}_{k=1}^n$, the sum of these lengths is less than δ, so f maps $\bigcup_{k=1}^n(a_k,b_k)$ into

$$\bigcup_{k=1}^n\bigl[f(a_k),f(b_k)\bigr],$$

a set of measure less than ε. Since the intervals $(f(a_k),f(b_k))$ are pairwise disjoint,

$$\sum_{k=1}^n\bigl(f(b_k)-f(a_k)\bigr)<\varepsilon.$$

Since this is true for all n, we see that

$$m\bigl(f(Z)\bigr)\le\sum_{k=1}^{\infty}\bigl(f(b_k)-f(a_k)\bigr)\le\varepsilon.$$

Since ε is an arbitrary positive constant, we can conclude that m(f(Z)) = 0.
So far, so good. We want to prove that f(a) = f(b). We know that

$$0\le f(b)-f(a)=m\bigl(f(E)\cup f(Z)\bigr)\le m\bigl(f(E)\bigr)+m\bigl(f(Z)\bigr)=m\bigl(f(E)\bigr).$$

This lemma comes down to proving that m(f(E)) = 0.
We again choose any ε > 0. For each x ∈ E, x < b, select a z > x such that

$$\frac{f(z)-f(x)}{z-x}<\varepsilon.$$

In what by now should look like a very familiar move, we rewrite this inequality as

$$\varepsilon z-f(z)>\varepsilon x-f(x).$$

The set E, apart from possibly the point b, is contained in the set of shadow points of the function defined by εx − f(x). Using the rising sun lemma (Lemma 7.5), we obtain a collection $\{(\alpha_k,\beta_k)\}$ of pairwise disjoint open intervals whose union contains these points and for which

$$\varepsilon\alpha_k-f(\alpha_k)\le\varepsilon\beta_k-f(\beta_k),\qquad\text{that is,}\qquad f(\beta_k)-f(\alpha_k)\le\varepsilon(\beta_k-\alpha_k).$$

Since f is monotonically increasing, f(E) is contained in $\bigcup_k[f(\alpha_k),f(\beta_k)]\cup\{f(b)\}$, a union of intervals that overlap at most at their endpoints, together with a single point. We have shown that

$$m\bigl(f(E)\bigr)\le\sum_{k=1}^{\infty}\bigl(f(\beta_k)-f(\alpha_k)\bigr)\le\varepsilon\sum_{k=1}^{\infty}(\beta_k-\alpha_k)\le\varepsilon(b-a).$$

This is true for all ε > 0; therefore, m(f(E)) = 0 and f(a) = f(b). Since f is monotonically increasing, it is constant on [a, b].
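The rising sun lemma, used in this proof and in the proof of Theorem 7.16, has a simple discrete analogue that may help build intuition. The sketch below is not the lemma itself, only an analogy on finitely many samples h[0], …, h[n−1] (think of sampled values of εx − f(x)): an index is "in the shadow" when some later sample is strictly higher. Grouping consecutive shadowed indices into runs, the first unshadowed index j after each run satisfies h[j] > h[k] for every k in the run, the discrete counterpart of the endpoint inequality for the intervals $(\alpha_k,\beta_k)$. The function name and sample data are hypothetical.

```python
def shadow_runs(h):
    """Discrete sketch of the rising sun lemma for samples h[0], ..., h[n-1].

    Index i is "in the shadow" when some later sample is strictly higher,
    i.e. h[i] < max(h[i+1:]).  Returns the maximal runs of shadowed indices
    as pairs (i, j): indices i, ..., j-1 are shadowed and j is not.
    """
    n = len(h)
    later_max = [float("-inf")] * n  # later_max[i] = max(h[i+1:])
    for i in range(n - 2, -1, -1):
        later_max[i] = max(h[i + 1], later_max[i + 1])
    shadowed = [h[i] < later_max[i] for i in range(n)]
    runs, i = [], 0
    while i < n:
        if shadowed[i]:
            j = i
            while shadowed[j]:  # terminates: the last index is never shadowed
                j += 1
            runs.append((i, j))
            i = j
        else:
            i += 1
    return runs


samples = [0, 2, 1, 3, 0, 1, 0.5]
runs = shadow_runs(samples)
# At the index ending each run, h exceeds every value inside the run.
print(runs, [all(samples[k] < samples[j] for k in range(i, j)) for i, j in runs])
```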

We are now prepared to state and prove the second half of the fundamental theorem of calculus. As we have seen, absolute continuity is not just sufficient; it is also necessary.

Theorem 7.18 (FTC, Evaluation). If f is absolutely continuous on [a, b], then it is differentiable almost everywhere, f' is integrable on [a, b], and

$$\int_a^b f'(t)\,dt=f(b)-f(a). \tag{7.22}$$

Proof. We have already shown that f has bounded variation on [a, b] and therefore is differentiable almost everywhere. We can extend f' however we wish so that it is defined on all of [a, b]. In particular, we can use any of the Dini derivatives in place of f'. Changing the value of the integrand on a set of measure zero does not affect the value of the Lebesgue integral. We need to prove that f' is integrable and to establish equation (7.22).
As shown in Proposition 7.15, if f is absolutely continuous, then it is the difference of two absolutely continuous and monotonically increasing functions. It is enough to prove our theorem with the added assumption that f is monotonically increasing.
To prove that f' is integrable, we define a sequence of functions by

$$f_n(x)=\frac{f(x+1/n)-f(x)}{1/n}=n\bigl(f(x+1/n)-f(x)\bigr),$$

where we extend f to the right of x = b by defining f(x) = f(b) for x > b. Each $f_n$ is nonnegative, and $(f_n)$ converges to f' almost everywhere. By Fatou's lemma (Theorem 6.20), if the integral of $f_n$ over [a, b] has a bound independent of n, then f' is integrable. The bound on the integral of $f_n$ follows from the monotonicity of f:

$$\begin{aligned}
\int_a^b f_n(x)\,dx&=n\int_a^b f(x+1/n)\,dx-n\int_a^b f(x)\,dx\\
&=n\int_{a+1/n}^{b+1/n}f(x)\,dx-n\int_a^b f(x)\,dx\\
&=n\int_b^{b+1/n}f(x)\,dx-n\int_a^{a+1/n}f(x)\,dx\\
&\le n\cdot\frac1n\,f(b+1/n)-n\cdot\frac1n\,f(a)=f(b)-f(a).
\end{aligned}$$

We can replace a by any lower limit x ∈ [a, b] and b by any upper limit y ∈ [a, b]. By Fatou's lemma, we see that for a ≤ x < y ≤ b,

$$\int_x^y f'(t)\,dt\le f(y)-f(x). \tag{7.23}$$
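The bound on $\int_a^b f_n$ can be checked on a concrete example. This sketch assumes the hypothetical choice $f(x)=\sqrt{x}$ on [0, 1], which is increasing and absolutely continuous, extends f by the constant f(1) to the right of 1 as in the proof, and approximates the integrals with a midpoint rule. For this f the chain of equalities above evaluates to $1-(2/3)/\sqrt{n}$, which stays below f(1) − f(0) = 1 and approaches it as n grows, consistent with $\int_0^1 f'(t)\,dt=1$.

```python
import math

a, b = 0.0, 1.0


def f(x):
    # Hypothetical increasing, absolutely continuous example: sqrt on [0, 1],
    # extended to the right of b by the constant f(b), as in the proof.
    return math.sqrt(min(x, b))


def f_n(x, n):
    # n * (f(x + 1/n) - f(x)), a nonnegative difference quotient
    return n * (f(x + 1.0 / n) - f(x))


def midpoint_integral(g, lo, hi, steps=100_000):
    h = (hi - lo) / steps
    return h * sum(g(lo + (k + 0.5) * h) for k in range(steps))


integrals = {n: midpoint_integral(lambda x: f_n(x, n), a, b) for n in (1, 4, 16, 64)}
print(integrals)
```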

We now define g by

$$g(x)=f(x)-\int_a^x f'(t)\,dt.$$

We want to apply Lemma 7.17 to the function g. From its definition, g(a) = f(a) − 0 = f(a). If we can show that g is constant on [a, b], then that constant is f(a) and equation (7.22) is proven. We only need to show that g is absolutely continuous and monotonically increasing, and that its derivative is 0 almost everywhere.
Since g is a difference of two absolutely continuous functions (f and, by Proposition 7.13, the integral of f'), it is absolutely continuous. By equation (7.23), x < y implies that

$$g(y)-g(x)=f(y)-f(x)-\int_x^y f'(t)\,dt\ge0,$$

so g is monotonically increasing. By the antiderivative part of the fundamental theorem of calculus, Theorem 7.16,

$$\frac{d}{dx}\int_a^x f'(t)\,dt=f'(x),\quad\text{almost everywhere},$$

and therefore g' = 0 almost everywhere. By Lemma 7.17, g is the constant function equal to f(a). For all x ∈ [a, b],

$$f(x)-\int_a^x f'(t)\,dt=f(a),\qquad\int_a^x f'(t)\,dt=f(x)-f(a).$$
We have now answered four of our five original questions. We have found the
right way to define integration. In Lebesgue's dominated convergence theorem we

have found a condition which, though not necessary, is a strong and useful sufficient
condition that allows term-by-term integration of a series. We have learned that the
connection between continuity and differentiability is stronger than we might have
expected. And in this section, we have explained the exact relationship between
integration and differentiation.
That still leaves one question, our very first question, the question that started us
asking all of these other questions,
When does a function have a Fourier series expansion that converges to that function?

We now have the tools to make serious progress. One of the most surprising
insights of the early twentieth century was that this is not quite the right way to
pose the problem. As we shall see in the next chapter, there is a better, more useful
question that will have a very elegant answer.

Exercises
7.4.1. Give an example of a function, f, integrable on [0, 1], such that for $F(x)=\int_0^x f(t)\,dt$, there is a c ∈ (0, 1) such that F is differentiable at c but F'(c) ≠ f(c).
7.4.2. Show that if
$$\lim_{y\to x}\frac{\int_x^y f(t)\,dt}{y-x}>q$$
for all x ∈ [a, b], then we can find a δ > 0 so that x, y ∈ [a, b] and 0 < y − x < δ implies that
$$\int_x^y f(t)\,dt>q(y-x).$$
It follows that
$$\int_a^b f(t)\,dt\ge q(b-a).$$
7.4.3. Let f have a bounded derivative on [a, b]. Show that for all x ∈ [a, b],
$$\int_a^x f'(t)\,dt=f(x)-f(a).$$
7.4.4. Let f be integrable on [a, b] with
$$\int_a^x f(t)\,dt=0$$
for all x ∈ [a, b]. Using Proposition 6.13 but not using Theorem 7.16, show that f = 0 almost everywhere.
7.4.5. Use the result of Exercise 7.4.4 and the evaluation part of the fundamental
theorem of calculus, Theorem 7.18, to prove the antiderivative part of the funda-
mental theorem of calculus, Theorem 7.16.
7.4.6. Let f and g be absolutely continuous on [a, b] with f' = g' almost every-
where. Show that f = g + c for some constant c.
7.4.7. Show that if f is absolutely continuous on [a, b], then the total variation
of f on [a, b] is
$$\int_a^b |f'(x)|\,dx.$$
Show that this is not necessarily true if f is not absolutely continuous.
7.4.8. A monotonic function f defined on [a, b] is said to be singular if f' = 0
almost everywhere. Show that any monotonically increasing function is the sum of
an absolutely continuous function and a singular function.
7.4.9. Let g be a strictly increasing, absolutely continuous function on [a, b] with
g(a) = c, g(b) = d.
1. Show that for any measurable set S ⊆ [a, b],
$$m(g(S)) = \int_S g'(x)\,dx.$$
2. Show that if A = {x ∈ [a, b] | g'(x) ≠ 0}, and B is any subset of [c, d] of
measure zero, then
$$m(A \cap g^{-1}(B)) = 0.$$
3. Show that if A is the set defined in part 2 and C is any measurable subset of
[c, d], then
$$m(C) = \int_{A \cap g^{-1}(C)} g'(x)\,dx = \int_a^b \chi_C(g(x))\,g'(x)\,dx.$$

7.4.10. [Change of Variable] Prove the change of variable formula for Lebesgue
integrals: If g is strictly increasing and absolutely continuous on [a, b] with g(a) =
c and g(b) = d and if f is integrable on [c, d], then
$$\int_c^d f(t)\,dt = \int_a^b f(g(x))\,g'(x)\,dx. \qquad (7.24)$$

7.4.11. Let f and g be integrable on [a, b] and define
$$F(x) = \alpha + \int_a^x f(t)\,dt, \qquad G(x) = \beta + \int_a^x g(t)\,dt.$$
Prove that
$$\int_a^b G(t)f(t)\,dt + \int_a^b g(t)F(t)\,dt = F(b)G(b) - F(a)G(a). \qquad (7.25)$$
7.4.12. [Integration by Parts] Prove the formula for integration by parts for
Lebesgue integrals and absolutely continuous functions: If f and g are absolutely
continuous on [a, b], then
$$\int_a^b f(t)g'(t)\,dt + \int_a^b f'(t)g(t)\,dt = f(b)g(b) - f(a)g(a). \qquad (7.26)$$

7.4.13. Let f be integrable on [a, b]. We say that c ∈ (a, b) is a Lebesgue point
if |f(c)| < +∞ and
$$\lim_{h\to 0}\frac{1}{h}\int_c^{c+h}|f(t) - f(c)|\,dt = 0.$$
Show that if c is a Lebesgue point for f, then $F(x) = \int_a^x f(t)\,dt$ is differentiable
at c and F'(c) = f(c).
7.4.14. Show that if f is integrable on [a, b], then each point of continuity of f is
a Lebesgue point for f.
7.4.15. Show that if f is integrable on [a, b], then almost every point of [a, b] (all
but a set of measure zero) is a Lebesgue point for f.
7.4.16. Let f, not necessarily a measurable function, be defined on [a, b]. For
each x_0 ∈ [a, b] and h, ε > 0, let S(x_0, h, ε) be the set of points x ∈ [x_0 − h, x_0 +
h] ∩ [a, b] for which |f(x) − f(x_0)| ≥ ε. We say that x_0 ∈ [a, b] is a point of
approximate continuity of f if for each ε > 0,
$$\lim_{h\to 0}\frac{m_e(S(x_0, h, \epsilon))}{2h} = 0.$$
Show that any point of continuity is also a point of approximate continuity. Give
an example of a function for which there is a point of approximate continuity that
is not a point of continuity. Justify your example.
7.4.17. Prove that if f is measurable on [a, b], then almost all points of [a, b] (all
but a set of measure zero) are points of approximate continuity.
8
Fourier Series

The development of measure theory and Lebesgue integration did not come about
because mathematicians decided they needed a new definition of the integral. It
happened because they were trying to develop and use tools of analysis to solve real
and practical problems. These included solutions to partial differential equations,
extensions of calculus to higher dimensions and to complex-valued functions, and
generalizations of the concepts of area and volume. Fourier series were not unique
in motivating work in analysis, but they constitute a very useful lens through which
to view the development of analysis because these series often were the principal
source of the questions that would prove most troublesome and insightful. As
progress was made in our understanding of analysis, these insights often translated
directly into answers about Fourier series.
This is true especially of Lebesgue's work on the integral. In 1905, armed with
the power of his new integral, he gave a definitive answer to the question of when
the Fourier series of a function converges pointwise to that function. We shall see
his answer in this first section.
The story does not stop there. Once we are using the Lebesgue integral, we can
change the values of the function on any set of measure zero without changing the
value of the integral. Therefore, two functions that are equal almost everywhere
will have the same Fourier coefficients, and so the same Fourier series. The best
we can hope for from a theorem with the weak assumption that f is integrable is
that the Fourier series of f converges to f almost everywhere. In fact, this is not
quite true, though we can come close to it by either strengthening the assumption
just a little (Theorem 8.9) or slightly weakening the conclusion (Theorem 8.2). If
we want the Fourier series to converge to f at every point, we shall need to be quite
restrictive about the kind of function with which we start.
If we are content with convergence almost everywhere, then we really need
to think of equivalence classes of functions where f ∼ g if f = g almost
everywhere, or, equivalently, $\int_a^b |f(x) - g(x)|\,dx = 0$. This integral defines a
natural distance between equivalence classes of integrable functions on [a, b],
$D(f, g) = \int_a^b |f(x) - g(x)|\,dx$. Suddenly, we are looking at a geometric space in
which each point is an equivalence class of functions. In fact, this is a vector space
equipped with a definition of distance. We can think of the partial sums of the
Fourier series of f as points in this space and ask if these points converge to the
point represented by f. Is every point in this space the limit of the partial sums
of its Fourier series? Is there a unique trigonometric series that converges to our
equivalence class? As we shall see, with some simple assumptions the answers to
these questions are "yes" and "yes."
It is very important to realize that these clean "yes's" do not answer the original
question about pointwise convergence. Real progress in mathematics often consists
of realizing when you are asking the wrong question. Asking the right question
in this case opened an entire world of new and powerful mathematics, functional
analysis.

8.1 Pointwise Convergence


Because the Fourier coefficients are given in terms of integrals of the original
function, we cannot even speak of a Fourier series unless the function is integrable.
When Dirichlet introduced the characteristic function of the rationals, he intended
it as an illustration that not all functions are integrable. While this characteristic
function is integrable in the Lebesgue sense, not all functions are. The question is
"how much more than integrability is needed in order to guarantee that the resulting
Fourier series converges to the original function, either at every point or almost
everywhere?"
In 1829, Peter Gustav Lejeune Dirichlet (1805–1859) gave us the first proof
of sufficient conditions. For simplicity, we assume that we are only considering
functions on the interval [−π, π].

Dirichlet's Conditions. The following conditions collectively imply that the
Fourier series of f converges pointwise to f on (−π, π):

1. f is integrable in the sense of Cauchy and Riemann (and thus bounded),
2. $f(x) = \frac{1}{2}\left(\lim_{y\to x^-} f(y) + \lim_{y\to x^+} f(y)\right)$,
3. f is piecewise monotonic on [−π, π], and
4. f is piecewise continuous on [−π, π].

Actually, Dirichlet's proof allows for an infinite number of points of discontinuity,
provided the set of points of discontinuity is nowhere dense. Dirichlet believed
that monotonicity was not necessary. While the requirement of piecewise monotonicity
can be weakened, Paul du Bois-Reymond would show in 1876 that there
are functions that satisfy conditions 1, 2, and 4 but for which the Fourier series
does not even converge at all values of x ∈ (−π, π). See Exercises 8.1.10–8.1.15
for Fejér's example of a continuous function whose Fourier series fails to converge
at any rational multiple of π.
Riemann would show that the function f does not need to be bounded. It is
enough that f is absolutely integrable.

Definition: Absolutely integrable
A function f is absolutely integrable on [a, b] if it is Riemann integrable on this
interval or if the improper Riemann integrals of f and |f| exist and are finite.
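Assumption 2 is essential. To say that the Fourier series for f converges to f(x)
at a given value of x is to say that
$$\lim_{n\to\infty}\left(f(x) - \frac{a_0}{2} - \sum_{k=1}^{n}\left(a_k\cos(kx) + b_k\sin(kx)\right)\right) = 0,$$
where
$$a_k = \frac{1}{\pi}\int_{-\pi}^{\pi} f(t)\cos(kt)\,dt, \qquad b_k = \frac{1}{\pi}\int_{-\pi}^{\pi} f(t)\sin(kt)\,dt.$$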
We substitute these integrals, interchange the finite summation and the integration,
use the sum of angles formula, and employ the trigonometric identity¹
$$\frac{1}{2} + \cos u + \cos 2u + \cdots + \cos nu = \frac{\sin[(2n+1)u/2]}{2\sin[u/2]} \qquad (8.1)$$
to rewrite the limit as
$$\lim_{n\to\infty}\left(f(x) - \frac{1}{\pi}\int_{-\pi}^{\pi}\frac{\sin[(2n+1)(t-x)/2]}{2\sin[(t-x)/2]}\,f(t)\,dt\right) = 0.$$
With a change of variable and the continuation of f outside (−π, π) by assuming
f(x + 2π) = f(x), we can rewrite this as
$$\lim_{n\to\infty}\left(f(x) - \frac{1}{\pi}\int_0^{\pi/2}\frac{\sin[(2n+1)u]}{\sin u}\,[f(x-2u) + f(x+2u)]\,du\right) = 0.$$

¹ This process leading to the derivation of equation (8.3) is done at a more leisurely pace in Section 6.1
of A Radical Approach to Real Analysis. It includes a proof of the assertion that equation (8.3) implies
equation (8.4).
We use the integral identity
$$\int_0^{\pi/2}\frac{\sin[(2n+1)u]}{\sin u}\,du = \frac{\pi}{2} \qquad (8.2)$$
to rewrite f(x) as
$$f(x) = \frac{1}{\pi}\int_0^{\pi/2}\frac{\sin[(2n+1)u]}{\sin u}\,2f(x)\,du.$$
We have reduced the problem of proving convergence of the Fourier series to f to
proving that
$$\lim_{n\to\infty}\int_0^{\pi/2}\frac{\sin[(2n+1)u]}{\sin u}\,[f(x-2u) + f(x+2u) - 2f(x)]\,du = 0. \qquad (8.3)$$
A necessary condition for this to be true is that
$$\lim_{u\to 0}\,[f(x-2u) + f(x+2u) - 2f(x)] = 0. \qquad (8.4)$$

This is equivalent to condition 2:
$$f(x) = \frac{1}{2}\left(\lim_{y\to x^-} f(y) + \lim_{y\to x^+} f(y)\right).$$
If we define
$$\phi_x(u) = f(x - 2u) + f(x + 2u) - 2f(x),$$
we can replace condition 2 with the requirement that $\phi_x$ is continuous at u = 0.
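The two identities used in this derivation, (8.1) and (8.2), are easy to spot-check numerically. The sketch below compares both sides of (8.1) at a few sample values of u and approximates the integral in (8.2) with a midpoint sum; the function names, sample points, and cell count are illustrative choices, not part of the text.

```python
import math

def kernel_sum(n, u):
    """Left side of (8.1): 1/2 + cos u + cos 2u + ... + cos nu."""
    return 0.5 + sum(math.cos(k * u) for k in range(1, n + 1))

def kernel_closed_form(n, u):
    """Right side of (8.1): sin[(2n+1)u/2] / (2 sin[u/2]), valid when sin(u/2) != 0."""
    return math.sin((2 * n + 1) * u / 2) / (2 * math.sin(u / 2))

def dirichlet_integral(n, cells=100_000):
    """Midpoint approximation of the integral in (8.2) over [0, pi/2].

    The integrand sin[(2n+1)u]/sin u extends continuously to u = 0,
    and midpoints never land on u = 0.
    """
    h = (math.pi / 2) / cells
    total = 0.0
    for i in range(cells):
        u = (i + 0.5) * h
        total += math.sin((2 * n + 1) * u) / math.sin(u)
    return h * total

for n in (1, 5, 20):
    for u in (0.3, 1.0, 2.5):
        assert abs(kernel_sum(n, u) - kernel_closed_form(n, u)) < 1e-9
    print(n, dirichlet_integral(n), math.pi / 2)
```

Note that the value of the integral in (8.2) does not depend on n, which is what lets f(x) be rewritten as an integral against the same kernel.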
How much more do we need?
Rudolph Otto Sigismund Lipschitz (1832–1903) was the first mathematician
after Dirichlet to make significant progress. In 1864, he published his doctoral
thesis on representation by Fourier series. He showed that beyond the two necessary
conditions of absolute integrability and the continuity of $\phi_x$ at u = 0 for all x, one
more condition would be enough to guarantee pointwise convergence. In
fact, that additional condition implies the continuity of $\phi_x$.
Lipschitz's Conditions. The following conditions imply that the Fourier series of
f converges pointwise to f on (−π, π):
1. f is absolutely integrable.
2. For each x there exist positive constants A, ε, and α such that for any u, t ∈
N_ε(x),
$$|f(u) - f(t)| < A\,|t - u|^{\alpha}. \qquad (8.5)$$
Any function that satisfies inequality (8.5) is said to satisfy a Lipschitz condition
of order α. It implies that $\phi_x$ is continuous at u = 0. Notice that it is not strong enough to
imply differentiability at u = 0 unless α > 1.
The next advances were made by Ulisse Dini. In 1872, he showed that the bound
A|t − u|^α on the right side of inequality (8.5) could be replaced by A/|log|t − u||.
In 1880, he found a single condition that implies pointwise convergence of the
Fourier series of f to f.
Dini's Condition. The following condition implies that the Fourier series of f
converges pointwise to f on (−π, π):

1. $\phi_x(u)/u$ is absolutely integrable over [0, π].


Dini's condition is simple to state, but not always simple to check. The following
year, Camille Jordan published his criteria. Note that bounded variation implies
Riemann integrability (see Exercise 8.1.2).
Jordan's Conditions. The following conditions imply that the Fourier series of f
converges pointwise to f on (−π, π):

1. f has bounded variation on (−π, π).
2. $\phi_x$ is continuous at u = 0.

Cesàro Convergence
In 1890, the Italian Ernesto Cesàro (1859–1906) broadened the definition of convergence.
For example, the series 1 − 1 + 1 − 1 + 1 − ⋯ corresponds to the sequence of
partial sums (1, 0, 1, 0, 1, 0, …). This sequence does not converge. But if we take
the sum of the first n terms of this sequence and divide it by n, we get
$$\text{for } n \text{ odd: } \frac{(n+1)/2}{n} = \frac{1}{2} + \frac{1}{2n}, \qquad \text{for } n \text{ even: } \frac{n/2}{n} = \frac{1}{2}.$$
The limit of this average value does exist. It equals 1/2. We say that the Cesàro
limit of 1 − 1 + 1 − 1 + 1 − ⋯ is 1/2.
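The averaging described above is easy to carry out mechanically. The following minimal sketch uses `itertools.accumulate` from the Python standard library; `cesaro_means` is a hypothetical helper name chosen for illustration, not notation from the text.

```python
from itertools import accumulate

def cesaro_means(seq):
    """Return the running averages (s_1 + ... + s_n)/n of a finite sequence."""
    return [total / n for n, total in enumerate(accumulate(seq), start=1)]

# Terms 1, -1, 1, -1, ... of the series; accumulate turns them into partial sums.
terms = [(-1) ** k for k in range(1000)]
partial_sums = list(accumulate(terms))    # 1, 0, 1, 0, ...
means = cesaro_means(partial_sums)

print(partial_sums[:6])   # the partial sums keep oscillating
print(means[:6])          # 1/2 + 1/(2n) for odd n, 1/2 for even n
print(means[-1])          # close to the Cesaro limit 1/2
```

Applying `accumulate` twice (once to form partial sums, once inside `cesaro_means`) mirrors the two steps in the text: first sum the series, then average the partial sums.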
Cesàro limits are particularly useful for Fourier series. Consider the Fourier
cosine series expansion of the constant function π/4, valid for −π/2 < x < π/2:
$$f(x) = \cos(x) - \frac{1}{3}\cos(3x) + \frac{1}{5}\cos(5x) - \frac{1}{7}\cos(7x) + \cdots$$

Definition: Cesàro limit
The sequence $(a_n)$ is said to have Cesàro limit A if
$$\lim_{n\to\infty}\frac{a_1 + a_2 + \cdots + a_n}{n} = A.$$

The derivative of f is 0 for all x ∈ (−π/2, π/2), but if we try to differentiate term
by term, we get a series that does not converge except at x = 0:
$$-\sin(x) + \sin(3x) - \sin(5x) + \sin(7x) - \cdots$$

Now consider the Cesàro sum of this series. We first need to find the kth partial
sum (see Exercise 8.1.5):
$$-\sin(x) + \sin(3x) - \sin(5x) + \cdots + (-1)^k\sin((2k-1)x) = \frac{(-1)^k\sin(2kx)}{2\cos(x)}. \qquad (8.6)$$
We now compute the average of the first n partial sums (see Exercise 8.1.6):
$$\frac{1}{n}\left(-\frac{\sin(2x)}{2\cos x} + \frac{\sin(4x)}{2\cos x} - \frac{\sin(6x)}{2\cos x} + \cdots + (-1)^n\frac{\sin(2nx)}{2\cos x}\right) = \frac{(\tan x)\left(-1 + (-1)^n\cos(2nx)\right) + (-1)^n\sin(2nx)}{4n\cos x}. \qquad (8.7)$$
We fix x ∈ (−π/2, π/2) and take the limit n → ∞. Since the numerator stays
bounded as the denominator approaches ∞, the Cesàro limit of the series obtained
by term-by-term differentiation is 0, regardless of the value of x.
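Equations (8.6) and (8.7) can be spot-checked numerically at a fixed x. The sketch below (the function names and the sample point x = 0.7 are our own illustrative choices) sums the differentiated series term by term, compares the results with the closed forms, and prints how the partial sums keep oscillating while their averages shrink like 1/n.

```python
import math

def partial_sum(k, x):
    """k-th partial sum of -sin(x) + sin(3x) - sin(5x) + ..., summed term by term."""
    return sum((-1) ** j * math.sin((2 * j - 1) * x) for j in range(1, k + 1))

def closed_form_86(k, x):
    """Right side of (8.6): (-1)^k sin(2kx) / (2 cos x)."""
    return (-1) ** k * math.sin(2 * k * x) / (2 * math.cos(x))

def cesaro_mean_87(n, x):
    """Right side of (8.7): the average of the first n partial sums."""
    sign = (-1) ** n
    numerator = math.tan(x) * (-1 + sign * math.cos(2 * n * x)) + sign * math.sin(2 * n * x)
    return numerator / (4 * n * math.cos(x))

x = 0.7
for k in range(1, 30):
    assert abs(partial_sum(k, x) - closed_form_86(k, x)) < 1e-9

for n in (10, 100, 1000):
    direct = sum(partial_sum(k, x) for k in range(1, n + 1)) / n
    assert abs(direct - cesaro_mean_87(n, x)) < 1e-9
    print(n, partial_sum(n, x), cesaro_mean_87(n, x))
```

The numerator of (8.7) is bounded by |tan x| · 2 + 1, so the printed averages decay like 1/n even though the partial sums do not settle down.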
What if the limit of a sequence exists? Is the Cesàro limit the same? Fortunately,
the answer is "yes."

Proposition 8.1 (Limit ⟹ Cesàro Limit). If $a_n \to A$, then the Cesàro limit of
$(a_n)$ exists and also equals A.

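Proof. Given any ε > 0, we can find an N such that n ≥ N implies that |a_n − A| <
ε. We take the average of the first n terms, n > N, and compare it to A:
$$\left|\frac{a_1 + \cdots + a_n}{n} - A\right| = \left|\frac{(a_1 + \cdots + a_N) - NA}{n} + \frac{(a_{N+1} - A) + \cdots + (a_n - A)}{n}\right|$$
$$\le \frac{|(a_1 + \cdots + a_N) - NA|}{n} + (n - N)\frac{\epsilon}{n} < \frac{|(a_1 + \cdots + a_N) - NA|}{n} + \epsilon.$$
For n sufficiently large,
$$\frac{|(a_1 + \cdots + a_N) - NA|}{n} < \epsilon,$$
and therefore
$$\left|\frac{a_1 + \cdots + a_n}{n} - A\right| < 2\epsilon$$
for all n sufficiently large. Since this is true for every ε > 0, the Cesàro limit
is A. □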

In 1900, the Hungarian mathematician Lipót Fejér (1880–1959) proved that if
f is continuous, then the Fourier series of f converges at least in the Cesàro sense
pointwise to f. For most of his career, Fejér taught at the University of Budapest.
Among his doctoral students are Paul Erdős, George Pólya, Gábor Szegő, and John
von Neumann.
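A minimal sketch of Fejér's result for one concrete continuous function, assuming f(x) = |x| on [−π, π] (our choice, not the text's). Its Fourier coefficients are a_0 = π, a_k = 2((−1)^k − 1)/(πk²) for k ≥ 1, and b_k = 0. Averaging the partial sums S_0, …, S_{N−1} gives each coefficient a_k the weight 1 − k/N, since the kth term appears in N − k of those partial sums; the names below are illustrative.

```python
import math

def coeff_a(k):
    """Fourier cosine coefficients of |x| on [-pi, pi] (all sine coefficients vanish)."""
    if k == 0:
        return math.pi
    return 2 * ((-1) ** k - 1) / (math.pi * k * k)

def partial_sum(n, x):
    """S_n(x) = a_0/2 + sum_{k=1}^{n} a_k cos(kx)."""
    return coeff_a(0) / 2 + sum(coeff_a(k) * math.cos(k * x) for k in range(1, n + 1))

def fejer_mean(N, x):
    """(S_0(x) + ... + S_{N-1}(x)) / N, written with the weights 1 - k/N."""
    return coeff_a(0) / 2 + sum(
        (1 - k / N) * coeff_a(k) * math.cos(k * x) for k in range(1, N)
    )

for x in (0.0, 1.0, 2.5):
    print(x, abs(x), fejer_mean(1000, x))
```

For this particular f the ordinary partial sums also converge; the point of the sketch is only to show how the Cesàro average of the partial sums is computed.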
In 1905, Lebesgue used his insights into integration to prove the following
theorem.

Theorem 8.2 (Lebesgue on Fourier). If f is integrable (in the Lebesgue sense) on
the interval [−π, π], then the Fourier series of f converges to f almost everywhere,
at least in the Cesàro sense of convergence.

In some sense, we could not possibly ask for a better result. The only assumption
we need to make about f is that it is integrable, an assumption needed before we
can even define the coefficients of the Fourier series. On the other hand, the
conclusion is weaker than we might have wished: almost everywhere instead of for
all x ∈ (−π, π), convergence in the Cesàro sense rather than strict convergence.
Yet, as mathematicians were beginning to realize, asking for certain properties to
hold for all x introduces unnecessary complications. For many purposes, it makes
sense to consider two functions to be equivalent if they agree almost everywhere.
If we work with equivalence classes and f is integrable, then the Cesàro limit of
its Fourier series exists and is equivalent to f. If the Fourier series of f converges
in the usual sense, then it converges to a function that is equivalent to f.
Allowing for Cesàro convergence does introduce its own complications. The
series

$$\sin x - \sin(3x) + \sin(5x) - \cdots$$

Cesàro converges to the constant function 0. We have lost the uniqueness of the
representation by a trigonometric series. That is a high price to pay. When it is
worth paying depends on how we want to use the trigonometric representation.
Sometimes existence is more important than uniqueness; sometimes it is not.
Exercises
8.1.1. Show that if
$$\lim_{n\to\infty}\int_0^{\pi/2}\frac{\sin[(2n+1)u]}{\sin u}\,[f(x-2u) + f(x+2u) - 2f(x)]\,du = 0,$$
where the integral is the Riemann integral, then
$$\lim_{u\to 0}\,[f(x-2u) + f(x+2u) - 2f(x)] = 0.$$


8.1.2. Show that if f has bounded variation on [a, b], then it is Riemann integrable
on this interval.
8.1.3. Consider the function f defined by f(0) = 0, f(x) = 1/ln(|x|/2π) for
x ∈ [−π, π] − {0}. Show that this function satisfies Jordan's conditions but does
not satisfy Lipschitz's conditions.
8.1.4. Consider the function g defined by g(0) = 0, g(x) = x cos(π/(2x)) for
x ∈ [−π, π] − {0}. Show that this function satisfies Lipschitz's conditions at x = 0
but does not satisfy Jordan's conditions in any neighborhood of x = 0.
8.1.5. Use geometric series to find the sum of

+ — + ... +
Set y = ix and use the fact that the imaginary part of $e^{ix}$ is sin x to prove
equation (8.6),
$$-\sin(x) + \sin(3x) - \sin(5x) + \cdots + (-1)^k\sin((2k-1)x) = \frac{(-1)^k\sin(2kx)}{2\cos(x)}.$$

8.1.6. Using the approach of Exercise 8.1.5, prove equation (8.7).


8.1.7. Let $(a_n)_{n=1}^{\infty}$ be a sequence of real numbers and define
$$\sigma_n = \frac{1}{n}(a_1 + a_2 + \cdots + a_n).$$
Show that
$$\liminf_{n\to\infty} a_n \le \liminf_{n\to\infty}\sigma_n \le \limsup_{n\to\infty}\sigma_n \le \limsup_{n\to\infty} a_n.$$

8.1.8. Cesàro convergence is also known as (C, 1)-convergence. If the sequence
$(\sigma_n)$ (see Exercise 8.1.7) does not converge, but it does converge in the Cesàro
sense, then we say that the original sequence has (C, 2)-convergence. In
general, for k > 1 define
$$\sigma_n^{(k)} = \frac{1}{n}\left(\sigma_1^{(k-1)} + \sigma_2^{(k-1)} + \cdots + \sigma_n^{(k-1)}\right), \qquad \sigma_n^{(1)} = \sigma_n.$$
If the sequence $(\sigma_n^{(k-1)})$ does not converge but $(\sigma_n^{(k)})$ does converge, then we
say that the original sequence has (C, k)-convergence. Find examples of sequences
with (C, k) convergence for each k, 2 ≤ k ≤ 4.

8.1.9. We can use the symbol
$$x_n \xrightarrow{C} x_0$$
to mean that x_0 is the Cesàro limit of $(x_n)$. We say that a function f is Cesàro
continuous at x = x_0 if
$$x_n \xrightarrow{C} x_0 \quad \text{implies} \quad f(x_n) \xrightarrow{C} f(x_0).$$
Note that we have weakened the conclusion, but we have also weakened the
hypothesis. Is every continuous function also Cesàro continuous? Is every Cesàro
continuous function also continuous? Is f(x) = x² Cesàro continuous? Is it Cesàro
continuous for any values of x?

Exercises 8.1.10–8.1.15 develop Fejér's example of a continuous function whose
Fourier series does not converge at any point of a dense subset of [−π, π].

8.1.10. Show that the Fourier sine series for the constant function π/4 on (0, π)
is given by
$$\frac{\pi}{4} = \sum_{k=1}^{\infty}\frac{1}{2k-1}\sin((2k-1)x), \qquad 0 < x < \pi. \qquad (8.8)$$

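8.1.11. Define the function
$$\frac{\cos((r+1)x)}{2n-1} + \frac{\cos((r+2)x)}{2n-3} + \cdots + \frac{\cos((r+n)x)}{1} - \frac{\cos((r+n+1)x)}{1} - \frac{\cos((r+n+2)x)}{3} - \cdots - \frac{\cos((r+2n)x)}{2n-1}.$$
Show that it is equal to
$$2\sin\left(\left(r + n + \tfrac{1}{2}\right)x\right)\sum_{k=1}^{n}\frac{1}{2k-1}\sin\left(\left(k - \tfrac{1}{2}\right)x\right) \qquad (8.9)$$
and therefore there is a bound B, independent of n, r, or x, such that the absolute
value of this function is less than B.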
8.1.12. Let $(A_m)$ be an increasing sequence of positive integers. Define the
function f by

2A1 + 2A2 + + x)
f(x) = 0, x)+ (8.10)

Show that this series converges absolutely and uniformly, regardless of the choice
of sequence $(A_m)$. Therefore, f is continuous on ℝ.

8.1.13. Using the uniqueness of the Fourier series expansion of a continuous function,
show that the Fourier series for f on [−π, π] is given by

f(x) = cos(nx),

where

=
m and k, the unique positive integers that satisfy

$$n = 2A_1 + 2A_2 + \cdots + 2A_{m-1} + k, \qquad 1 \le k \le 2A_m.$$


8.1.14. Show that the sum of the first $2A_1 + \cdots + 2A_{m-1} + A_m$ terms of the Fourier
series for f at x = 0 equals
2A1++2Am_i+Arn
cos(n .0) = (i + + + +
— 3 —

and this is asymptotically equal to
$$\frac{\ln(A_m)}{2m^2}$$
as m approaches ∞. Show that if $A_m = m^{m^2}$, then the Fourier series does not
converge at x = 0. Explain the difference between the series in equation (8.10) that
is used to define f and the Fourier series for f.

8.1.15. Show that if we define g by

2A1 + 2A2 + .. + n! x)
n=1

then g is continuous and the Fourier series for g does not converge at any point of
the form kπ/n, k ∈ ℤ, n ∈ ℕ.
Figure 8.1. Types of convergence on [a, b]. (Diagram relating uniform convergence,
pointwise convergence, L² convergence, almost uniform convergence, convergence
almost everywhere, L¹ convergence, convergence in measure, and Cesàro convergence
almost everywhere.)

8.2 Metric Spaces


We have seen several types of convergence for a sequence of functions on a
closed and bounded interval, [a, b]. We shall see several more in this section (see
Figure 8.1). Uniform convergence is the strictest. It implies pointwise convergence.
This, in turn, implies convergence almost everywhere, which we have seen is
equivalent to almost uniform convergence. Convergence almost everywhere implies
convergence in measure.
In the early twentieth century, mathematicians began to think of functions as
objects with various means of measuring the distance between them. This in turn
would lead to new ways of defining convergence, and an ultimately satisfying
answer to the question of when a function has a representation as a trigonometric
series that converges to that function.
In a remarkable feat of anticipation that came too early to be successful, Axel
Harnack in 1882 came up with an original approach to the question of the convergence
of Fourier series. He observed that if $(a_0, a_1, a_2, \ldots, b_1, b_2, \ldots)$ are the
Fourier coefficients of a function f whose square is integrable on [−π, π], then
$$\frac{a_0^2}{2} + \sum_{k=1}^{N}\left(a_k^2 + b_k^2\right) \le \frac{1}{\pi}\int_{-\pi}^{\pi} f^2(x)\,dx$$
for all N ≥ 1. It follows that if we let $S_n$ be the nth partial sum of the Fourier series,
$$S_n(x) = \frac{a_0}{2} + \sum_{k=1}^{n}\left(a_k\cos(kx) + b_k\sin(kx)\right),$$
then
$$\int_{-\pi}^{\pi}\left(S_n(x) - S_m(x)\right)^2 dx = \pi\sum_{k=m+1}^{n}\left(a_k^2 + b_k^2\right), \qquad m < n,$$
which converges to 0 as m approaches infinity.
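To see Harnack's observation in a concrete case, take f(x) = x on [−π, π] (an illustrative choice). Its Fourier coefficients are a_k = 0 and b_k = 2(−1)^{k+1}/k, and (1/π)∫ x² dx = 2π²/3. By orthogonality, the integral of (S_n − S_m)² over [−π, π] equals π times the sum of a_k² + b_k² for m < k ≤ n. The sketch checks Bessel's inequality for every N up to 2000, compares the coefficient formula for that integral with a direct midpoint sum, and prints how the gap shrinks as m grows; all helper names are hypothetical.

```python
import math

def b(k):
    """Fourier sine coefficient of f(x) = x on [-pi, pi]; every a_k is 0."""
    return 2 * (-1) ** (k + 1) / k

bound = (1 / math.pi) * (2 * math.pi ** 3 / 3)   # (1/pi) * integral of x^2 = 2*pi^2/3

running = 0.0                 # a_0^2/2 is 0 for this odd function
for N in range(1, 2001):
    running += b(N) ** 2
    assert running <= bound   # Bessel's inequality holds for every N

def l2_gap(m, n):
    """Integral over [-pi, pi] of (S_n - S_m)^2, computed from the coefficients."""
    return math.pi * sum(b(k) ** 2 for k in range(m + 1, n + 1))

def S(n, x):
    """Partial sum S_n(x) of the Fourier series of f(x) = x."""
    return sum(b(k) * math.sin(k * x) for k in range(1, n + 1))

def numeric_gap(m, n, cells=2000):
    """Midpoint-rule value of the same integral; exact (up to rounding) for
    trigonometric polynomials whose degree is well below the number of cells."""
    h = 2 * math.pi / cells
    xs = (-math.pi + (i + 0.5) * h for i in range(cells))
    return h * sum((S(n, x) - S(m, x)) ** 2 for x in xs)

for m in (10, 100, 1000):
    print(m, l2_gap(m, 2 * m))
```

For this f the printed gaps are close to 2π/m, so the partial sums form a Cauchy sequence for the squared-difference distance even though the partial sums converge slowly near x = ±π.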


Although Harnack never explicitly expressed it as such, integrating the square of
the difference of these two partial sums gives us a kind of distance between these
functions. From the fact that we can force the distance to be as small as we wish
by going out sufficiently far along the sequence, it can be shown that this sequence
has a limit g in the sense that
$$\lim_{n\to\infty}\int_{-\pi}^{\pi}\left(g(x) - S_n(x)\right)^2 dx = 0.$$
Initially, Harnack claimed that f = g "in general," that is to say at all but an isolated
set of points (a set of points with empty derived set). Later that year, he realized
that he was wrong.
That same year, George Halphen found a function f for which the integral of
$(S_n(x) - S_m(x))^2$ converges to 0, where $S_n$ is the partial sum of the trigonometric
series formed from the Fourier coefficients of f, but the sequence $(S_n)$ fails to
converge at any but a single point. In other words, if we define a distance between
two functions as
$$D(f, g) = \int_{-\pi}^{\pi}\left(f(x) - g(x)\right)^2 dx,$$
and if f² is integrable, then the partial sums of the Fourier series form a Cauchy
sequence under this notion of distance, and they converge, relative to this notion of
distance, to a well-defined function. But, as Halphen showed, we can find a function
(necessarily one for which the square is not integrable) for which the sequence of
partial sums of its Fourier series is still a Cauchy sequence, relative to distance D,
but the sequence of partial sums converges pointwise at only one point.
The truth, as we shall see, is that if we start with a function whose square is
integrable, then the limit function g exists and equals f almost everywhere. The
very statement of the result requires Lebesgue measure. The Lebesgue integral is
critical to its proof.
Maurice Fréchet (1878—1973), the next player in our story, had the great fortune
to be taught his high school mathematics by Jacques Hadamard at the Lycée Buf-
fon in Paris. Even after Hadamard went on to a university position in Bordeaux,
they continued to correspond, and after Hadamard returned to Paris, he became
Fréchet's doctoral adviser. Fréchet served during the First World War as an inter-
preter attached to the British army. After teaching at the University of Strasbourg
for many years, he eventually became professor of analysis and mechanics at the
École Normale Supérieure. In addition to his work in analysis, he is noted for his
contributions to probability and statistics.
Real progress toward our modern understanding of convergence came in 1906,
with the publication of Fréchet's doctoral dissertation, Sur quelques points du
calcul fonctionnel (On several aspects of functional calculus). It made use of and
demonstrated the power of thinking of the set of continuous functions on a closed
and bounded interval as points. The distance between two continuous functions
was defined as the maximum of the absolute value of their difference.
Frigyes Riesz read Fréchet's thesis with great interest, and in that same year of
1906, showed how Fréchet's use of distance between functions could be used to
prove a result of Erhard Schmidt on orthogonal systems of functions, a concept
that David Hilbert had devised for solving integral equations, about which there
will be more to say in Section 8.4. For now, suffice it to say that Riesz, who
was familiar with Harnack's attempts in 1882, recognized that with the Lebesgue
integral he could attain Harnack's goal and develop a very powerful tool for analysis
in the process. For functions f and g whose squares are Lebesgue integrable
over the interval [a, b], he defined the distance between these functions to be
$\left(\int_a^b (f(x) - g(x))^2\,dx\right)^{1/2}$. Riesz's proof of the convergence result for Fourier series
was published in 1907. Ernst Fischer (1874—1954) discovered the same result in
the same year.
In 1910, Riesz published his groundbreaking generalization, Untersuchungen
über Systeme integrierbarer Funktionen (Analysis of a system of integrable functions),
extending his analysis to the general space of functions f for which |f|^p
is integrable over [a, b], the L^p spaces, p ≥ 1. The term metric space would not
come until 1914 when Felix Hausdorff laid the foundations for topology in the
seminal work Grundzüge der Mengenlehre (A basic course in set theory).

L^p Spaces
The set of vectors in ℝ^n has a lot of structure. It is closed under addition and under
multiplication by any scalar:
$$(a_1, a_2, \ldots, a_n) + (b_1, b_2, \ldots, b_n) = (a_1 + b_1, a_2 + b_2, \ldots, a_n + b_n),$$
$$c(a_1, a_2, \ldots, a_n) = (ca_1, ca_2, \ldots, ca_n).$$
The set of all functions defined on [a, b] also is closed under addition and scalar
multiplication. Both sets have zero elements and additive inverses and satisfy the
basic properties of addition and scalar multiplication.
Given any vector $\vec{a}$ in ℝ^n, we can define its length or norm by
$$\|\vec{a}\| = \sqrt{a_1^2 + a_2^2 + \cdots + a_n^2}.$$
This is simply the square root of the dot product of the vector with itself. If θ is the
angle between vectors $\vec{a}$ and $\vec{b}$, then
$$\vec{a}\cdot\vec{b} = \|\vec{a}\|\,\|\vec{b}\|\cos\theta, \qquad \|\vec{a}\| = \sqrt{\vec{a}\cdot\vec{a}}.$$


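The distance between two vectors is the norm of their difference, $\|\vec{a} - \vec{b}\|$, which
satisfies the following equation, also known as the law of cosines,
$$\|\vec{a} - \vec{b}\|^2 = (\vec{a} - \vec{b})\cdot(\vec{a} - \vec{b}) = \vec{a}\cdot\vec{a} + \vec{b}\cdot\vec{b} - 2\,\vec{a}\cdot\vec{b} = \|\vec{a}\|^2 + \|\vec{b}\|^2 - 2\|\vec{a}\|\,\|\vec{b}\|\cos\theta. \qquad (8.11)$$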

The dot product is basic to working with vectors in ℝ^n. What makes it so useful is
that it maps a pair of vectors to a real number so that two vectors that are orthogonal
map to 0 and two identical unit vectors map to 1. If we define the natural basis unit
vectors by
$$\vec{e}_1 = (1, 0, \ldots, 0), \quad \vec{e}_2 = (0, 1, 0, \ldots, 0), \quad \ldots, \quad \vec{e}_n = (0, \ldots, 0, 1),$$
then the dot product satisfies
$$\vec{e}_j\cdot\vec{e}_k = \begin{cases} 1, & j = k, \\ 0, & j \neq k. \end{cases} \qquad (8.12)$$
Combining this with a distributive law and the ability to factor out scalars, equation
(8.12) uniquely defines the dot product of any two vectors,
$$(a_1, \ldots, a_n)\cdot(b_1, \ldots, b_n) = \left(\sum_{j=1}^{n} a_j\vec{e}_j\right)\cdot\left(\sum_{k=1}^{n} b_k\vec{e}_k\right) = \sum_{j=1}^{n}\sum_{k=1}^{n} a_j b_k\,\vec{e}_j\cdot\vec{e}_k = a_1b_1 + a_2b_2 + \cdots + a_nb_n.$$


Does this have an analog among the set of functions defined on [a, hI? Char-
acteristic functions play the role of unit vectors, so we need a mapping that takes
a pair of characteristic functions to a real number so that the image is 0 if and
only if the characteristic functions are "orthogonal." A natural characterization of
orthogonality would be if the sets are disjoint. Specifically, we define an inner
product of the characteristic functions for sets S and T as the measure of S ∩ T,
$$(\chi_S, \chi_T) = m(S \cap T).$$
Definition: Inner product
If f and g are integrable functions over [a, b], then we define the inner product
of f and g, denoted (f, g), by
$$(f, g) = \int_a^b f(x)g(x)\,dx.$$

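We can use linearity to extend this to inner products of simple functions. If
$\phi = \sum_i a_i\chi_{S_i}$ and $\psi = \sum_j b_j\chi_{T_j}$, then
$$(\phi, \psi) = \sum_{i,j} a_i b_j\, m(S_i \cap T_j) = \int_a^b \phi(x)\psi(x)\,dx.$$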
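For simple functions built from finitely many intervals, the double sum above can be computed directly. In the sketch below, a simple function is represented (our assumption, for illustration only) as a list of (coefficient, (left, right)) pairs, with each set taken to be an interval whose measure is its length; the result is compared with a midpoint approximation of the integral of φψ. All function names are hypothetical.

```python
def overlap(s, t):
    """Lebesgue measure of the intersection of two intervals s = (a, b), t = (c, d)."""
    return max(0.0, min(s[1], t[1]) - max(s[0], t[0]))

def inner_product(phi, psi):
    """(phi, psi) = sum over i, j of a_i * b_j * m(S_i intersect T_j), for simple
    functions given as lists of (coefficient, interval) pairs."""
    return sum(a * b * overlap(S, T) for a, S in phi for b, T in psi)

def evaluate(simple, x):
    """Value at x of sum a_i chi_{S_i}; intervals are treated as half-open [left, right)."""
    return sum(a for a, (left, right) in simple if left <= x < right)

def midpoint_product_integral(phi, psi, lo, hi, cells=100_000):
    """Midpoint approximation of the integral of phi * psi over [lo, hi]."""
    h = (hi - lo) / cells
    return h * sum(
        evaluate(phi, lo + (i + 0.5) * h) * evaluate(psi, lo + (i + 0.5) * h)
        for i in range(cells)
    )

phi = [(2.0, (0.0, 0.5)), (-1.0, (0.25, 1.0))]   # 2*chi_[0,1/2) - chi_[1/4,1)
psi = [(3.0, (0.4, 0.8))]                         # 3*chi_[0.4,0.8)

print(inner_product(phi, psi))
print(midpoint_product_integral(phi, psi, 0.0, 1.0))
```

Here the sets S_i overlap, which is allowed: the double sum only needs the measures of the pairwise intersections, exactly as in the formula above.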
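If f and g are measurable, then Theorem 6.6 guarantees sequences of simple
functions that converge to f and g, $\phi_n \to f$, $\psi_n \to g$. We define
$$(f, g) = \lim_{n\to\infty}(\phi_n, \psi_n) = \lim_{n\to\infty}\int_a^b \phi_n(x)\psi_n(x)\,dx.$$
If fg is integrable, then this limit exists and equals
$$\int_a^b \left(\lim_{n\to\infty}\phi_n(x)\psi_n(x)\right)dx = \int_a^b f(x)g(x)\,dx.$$
If f² and g² are each integrable, then so is their product, since
$|f(x)g(x)| \le \tfrac{1}{2}\left(f(x)^2 + g(x)^2\right)$.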
Now that we have an inner product, we can define the norm of a function and the
distance between two functions. If f² is integrable on [a, b], we define the norm
of f over [a, b] by
$$\|f\|_2 = \left(\int_a^b f(x)^2\,dx\right)^{1/2}.$$
We define the distance between two functions f and g whose squares are integrable
over [a, b] by
$$d(f, g) = \|f - g\|_2 = \left(\int_a^b \left(f(x) - g(x)\right)^2 dx\right)^{1/2}.$$
If f = g almost everywhere, then the distance between f and g is
zero. If the distance between f and g is zero, then (f — g)2 = 0 almost everywhere,
so f = g almost everywhere. Because it is common to insist that two objects are
the same if the distance between them is zero, we shall assume that f and g are the
same if they are equal almost everywhere. To be specific, we work with equivalence
classes of integrable functions over [a, b] where two functions are equivalent if
and only if they are equal almost everywhere.
Definition: L^p
The space L^p, p ≥ 1, consists of all functions f for which |f|^p is integrable over
[a, b], together with a norm defined by
$$\|f\|_p = \left(\int_a^b |f(x)|^p\,dx\right)^{1/p}$$
and a distance defined by
$$d_p(f, g) = \|f - g\|_p.$$
We consider f and g to be identical if they are equal almost everywhere.

Definition: L^∞
The space L^∞ consists of all functions f that are bounded almost everywhere over
[a, b], together with a norm defined by
$$\|f\|_\infty = \inf\{\alpha \mid |f(x)| \le \alpha \text{ almost everywhere}\},$$
and a distance defined by
$$d_\infty(f, g) = \|f - g\|_\infty.$$
We consider f and g to be identical if they are equal almost everywhere.
The set of functions f for which f² is integrable over [a, b], equipped with the
distance function we have just defined, is denoted by L², or L²[a, b] if we need to
specify the interval. More generally, Riesz defined L^p as shown above.
We need to check that these really are vector spaces. The properties of vector
spaces are given on p. 18. Most of these properties are easily seen to be satisfied,
and we leave these for Exercise 8.2.18. We shall prove closure under addition.

Proposition 8.3 (L^p Closed under Addition). If f, g ∈ L^p, then so is f + g.

Proof. Since f and g are measurable, so is f + g. We only need to verify that the
integral of |f + g|^p is finite. This follows because
$$|f + g|^p \le \left(|f| + |g|\right)^p \le \left(2\max\{|f|, |g|\}\right)^p \le 2^p\left(|f|^p + |g|^p\right),$$
which is integrable.

In addition to p = 2, the most important cases of L^p spaces are p = 1 and the
limit as p → ∞.

Definition: Norm
Given a vector space V and a mapping N from V to ℝ, we say that N is a norm
if it satisfies the following properties:

1. N(x) ≥ 0 for all x ∈ V.
2. N(x) = 0 if and only if x = 0.
3. N(ax) = |a| N(x) for all x ∈ V and a ∈ ℝ.
4. N(x + y) ≤ N(x) + N(y) for all x, y ∈ V.

Definition: Metric space
A metric space is a set M together with a distance function d that assigns a real
number to each pair of elements of M such that
1. d(x, y) ≥ 0 for all x, y ∈ M (positivity),
2. d(x, y) = 0 if and only if x = y (nondegeneracy),
3. d(x, y) = d(y, x) for all x, y ∈ M (symmetry), and
4. d(x, y) + d(y, z) ≥ d(x, z) for all x, y, z ∈ M (triangle inequality).

In this case, it is easy to see that this space is closed under addition and scalar
multiplication (see Exercise 8.2.19).
We have constructed examples of norms on sets of functions and then used these
norms to define distance. Our definition of an L^p space can be used with 0 < p < 1
to define a vector space. The problem with these values of p is that the resulting
norm does not satisfy the fourth of the required properties of a norm, the triangle
inequality (see Exercises 8.2.26–8.2.28).
The last of these conditions, the triangle inequality, is the only property of the
norms that does not follow immediately from the definition. Later in this section
we shall see how to prove it.

Convergence
Convergence of $(f_k)$ to f in $L^p$ means that given any $\epsilon > 0$, we can find a response K so that for any $k \ge K$, we have that

$$d(f_k, f) = \|f_k - f\|_p < \epsilon.$$


In order to be able to work with such a definition, we need some basic assumptions about how distance works, given in the definition of a metric space. Whenever distance is defined in terms of a norm, $d(x, y) = N(x - y)$, it will satisfy these conditions. Therefore, $L^p$ is a metric space if we can show that its norm satisfies the triangle inequality.
258 Fourier Series

For $p = 1$, the triangle inequality is just the real number inequality $|a + b| \le |a| + |b|$. I leave the case $p = \infty$ for Exercise 8.2.3. For finite $p \ge 1$, the triangle inequality is equivalent to the following result of Hermann Minkowski (1864–1909), who proved it for series in 1896, and Riesz, who proved it for integrals in 1910.

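Theorem 8.4 (Minkowski–Riesz Inequality). If $f, g \in L^p[a,b]$, $1 \le p < \infty$, then

$$\left(\int_a^b |f(x) + g(x)|^p\,dx\right)^{1/p} \le \left(\int_a^b |f(x)|^p\,dx\right)^{1/p} + \left(\int_a^b |g(x)|^p\,dx\right)^{1/p}. \tag{8.13}$$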

As we shall see, this will follow from the next inequality.

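Lemma 8.5 (Hölder–Riesz Inequality). For $p, q > 1$ such that $1/p + 1/q = 1$, we take any $f \in L^p[a,b]$ and $g \in L^q[a,b]$. We shall always have that $fg \in L^1[a,b]$ and

$$\int_a^b |f(x)\,g(x)|\,dx \le \left(\int_a^b |f(x)|^p\,dx\right)^{1/p}\left(\int_a^b |g(x)|^q\,dx\right)^{1/q}. \tag{8.14}$$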

Proof. Since f and g are measurable, so is fg. Inequality (8.14) implies that fg is integrable, so we only need to prove inequality (8.14).
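In Exercise 8.2.21, you are asked to verify that the equation $1/p + 1/q = 1$ is equivalent to $p - 1 = 1/(q - 1)$, to $(q - 1)p = q$, and to $(p - 1)q = p$. Using this equivalence, we see that for positive x (and fixed $y > 0$), the function $f(x) = xy - x^p/p$ has its maximum at $x = y^{1/(p-1)} = y^{q-1}$ (see Exercise 8.2.22). Therefore,

$$xy - \frac{x^p}{p} \le y^{q-1}\cdot y - \frac{y^{(q-1)p}}{p} = \left(1 - \frac{1}{p}\right)y^q = \frac{y^q}{q}.$$

Substituting $\alpha$ for x and $\beta$ for y with $\alpha, \beta > 0$, we have that

$$\alpha\beta \le \frac{\alpha^p}{p} + \frac{\beta^q}{q}. \tag{8.15}$$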
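Inequality (8.15), often called Young's inequality, can be probed numerically. This Python sketch (random ranges, seed, and tolerances are arbitrary choices) evaluates the gap $\alpha^p/p + \beta^q/q - \alpha\beta$ on random inputs and at the maximizer $\beta = \alpha^{p-1}$ found above, where the gap should vanish.

```python
import random

def young_gap(alpha: float, beta: float, p: float) -> float:
    """alpha^p/p + beta^q/q - alpha*beta with q = p/(p-1); (8.15) says this is >= 0."""
    q = p / (p - 1)  # conjugate exponent: 1/p + 1/q = 1
    return alpha ** p / p + beta ** q / q - alpha * beta

random.seed(1)
worst = min(young_gap(random.uniform(0.01, 5), random.uniform(0.01, 5), random.uniform(1.1, 5))
            for _ in range(10_000))
print(worst >= -1e-9)  # True

# Equality at the maximizer: alpha = beta^(q-1), i.e. beta = alpha^(p-1).
alpha, p = 1.7, 3.0
print(abs(young_gap(alpha, alpha ** (p - 1), p)) < 1e-12)  # True
```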

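We now set

$$A = \left(\int_a^b |f(x)|^p\,dx\right)^{1/p}, \qquad B = \left(\int_a^b |g(x)|^q\,dx\right)^{1/q}.$$

If A = 0 or B = 0, then f or g is zero almost everywhere and (8.14) is immediate, so we may assume that A, B > 0.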
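We use inequality (8.15) with $\alpha = |f(x)|/A$, $\beta = |g(x)|/B$:

$$\frac{1}{AB}\int_a^b |f(x)\,g(x)|\,dx = \int_a^b \frac{|f(x)|}{A}\cdot\frac{|g(x)|}{B}\,dx \le \int_a^b \frac{|f(x)|^p}{pA^p}\,dx + \int_a^b \frac{|g(x)|^q}{qB^q}\,dx = \frac{1}{p} + \frac{1}{q} = 1.$$

Multiplying both sides by AB gives precisely the inequality that we set out to prove.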
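A quick way to see (8.14) in action is to test it on step functions. In the sketch below (Python; the choices $[a, b] = [0, 1]$, 1000 cells, and the sample functions $f(x) = x^{-0.3}$ and $g(x) = \cos 5x + 2$ are illustrative assumptions), each function is replaced by the step function taking its midpoint value on each cell. For step functions the sums below are the integrals themselves, so each comparison is an exact instance of the Hölder–Riesz inequality rather than an approximation.

```python
import math

def lp_norm(values, p, h):
    """L^p norm of the step function equal to values[i] on the i-th cell of width h."""
    return (h * sum(abs(v) ** p for v in values)) ** (1.0 / p)

n = 1000
h = 1.0 / n                                   # [a, b] = [0, 1] split into n cells
xs = [(i + 0.5) * h for i in range(n)]        # cell midpoints
f = [x ** -0.3 for x in xs]                   # unbounded near 0
g = [math.cos(5 * x) + 2 for x in xs]

holder_ok = []
for p in (1.5, 2.0, 3.0):
    q = p / (p - 1)                           # conjugate exponent
    lhs = h * sum(abs(a * b) for a, b in zip(f, g))
    rhs = lp_norm(f, p, h) * lp_norm(g, q, h)
    holder_ok.append(lhs <= rhs * (1 + 1e-12))
    print(f"p={p}: {lhs:.6f} <= {rhs:.6f}")
print(all(holder_ok))  # True
```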

Now we can prove Theorem 8.4.

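Proof of the Minkowski–Riesz Inequality. We recall that $(p - 1)q = p$. Let M be the right-hand side of our inequality,

$$M = \left(\int_a^b |f(x)|^p\,dx\right)^{1/p} + \left(\int_a^b |g(x)|^p\,dx\right)^{1/p}.$$

We begin with the observation that

$$\int_a^b |f(x) + g(x)|^p\,dx \le \int_a^b |f(x) + g(x)|^{p-1}\,|f(x)|\,dx + \int_a^b |f(x) + g(x)|^{p-1}\,|g(x)|\,dx.$$

We apply the Hölder–Riesz inequality to each integral on the right side of this inequality,

$$\begin{aligned}
\int_a^b |f(x) + g(x)|^p\,dx &\le \left(\int_a^b |f(x) + g(x)|^{(p-1)q}\,dx\right)^{1/q}\left(\int_a^b |f(x)|^p\,dx\right)^{1/p} \\
&\quad + \left(\int_a^b |f(x) + g(x)|^{(p-1)q}\,dx\right)^{1/q}\left(\int_a^b |g(x)|^p\,dx\right)^{1/p} \\
&= M\left(\int_a^b |f(x) + g(x)|^{(p-1)q}\,dx\right)^{1/q} \\
&= M\left(\int_a^b |f(x) + g(x)|^p\,dx\right)^{1/q}.
\end{aligned}$$

To get the Minkowski–Riesz inequality, we divide each side by

$$\left(\int_a^b |f(x) + g(x)|^p\,dx\right)^{1/q},$$

which is finite by Proposition 8.3 (if it is zero, there is nothing to prove), and remember that $1 - 1/q = 1/p$.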
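The same step-function device gives a sanity check of (8.13). The Python sketch below draws random step functions on [0, 1] with 500 cells and random exponents $1 \le p \le 6$ (all of these are arbitrary illustrative choices) and compares $\|f + g\|_p$ with $\|f\|_p + \|g\|_p$. Because the sums are exact integrals of step functions, any failure would contradict the theorem, not merely a quadrature error.

```python
import random

def lp_norm(values, p, h):
    """L^p norm of the step function equal to values[i] on the i-th cell of width h."""
    return (h * sum(abs(v) ** p for v in values)) ** (1.0 / p)

random.seed(2)
n, h = 500, 1.0 / 500
minkowski_ok = []
for _ in range(200):
    p = random.uniform(1.0, 6.0)
    f = [random.gauss(0, 1) for _ in range(n)]
    g = [random.gauss(0, 1) for _ in range(n)]
    f_plus_g = [a + b for a, b in zip(f, g)]
    lhs = lp_norm(f_plus_g, p, h)
    rhs = lp_norm(f, p, h) + lp_norm(g, p, h)
    minkowski_ok.append(lhs <= rhs * (1 + 1e-12))   # relative slack for rounding
print(all(minkowski_ok))  # True
```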

Ordering $L^p$ Spaces

The space $L^1$ consists of all Lebesgue integrable functions. This includes many unbounded functions such as $f(x) = x^{-1/2}$ on [0, 1]. The space $L^\infty$ admits only those measurable functions that are bounded almost everywhere, a much more restricted class of functions. In general, on a finite interval, if $p < q$ then $L^p \supseteq L^q$: any function in $L^q$ is also in $L^p$.

Proposition 8.6 (Containment of $L^p$ Spaces). If $1 \le p \le q \le \infty$ and $L^p$, $L^q$ are defined over the finite interval [a, b], then $L^p \supseteq L^q$. Furthermore, if $q < \infty$ then

$$\|f\|_p \le (b - a)^{1/p - 1/q}\,\|f\|_q. \tag{8.16}$$

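Proof. Assume that $f \in L^q$. This implies that f is measurable. It is in $L^p$ if and only if $\int_a^b |f(x)|^p\,dx$ is finite. If $p = q$ there is nothing to prove, so let $r = q/p > 1$ and define s so that $1/r + 1/s = 1$; that is, $s = r/(r - 1) = q/(q - p)$. We use the Hölder–Riesz inequality, Lemma 8.5, with $|f|^p$ and the constant function 1,

$$\int_a^b |f(x)|^p\,dx = \int_a^b |f(x)|^p\cdot 1\,dx \le \left(\int_a^b |f(x)|^{pr}\,dx\right)^{1/r}\left(\int_a^b 1^s\,dx\right)^{1/s} = \left(\int_a^b |f(x)|^q\,dx\right)^{p/q}(b - a)^{(q-p)/q} < \infty.$$

Taking pth roots gives

$$\|f\|_p \le (b - a)^{(q-p)/(pq)}\left(\int_a^b |f(x)|^q\,dx\right)^{1/q} = (b - a)^{1/p - 1/q}\,\|f\|_q.$$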
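Inequality (8.16) can be checked the same way. In this Python sketch the interval $[a, b] = [0, 4]$, the 2000-cell grid, and the unbounded sample function $f(x) = x^{-0.2}$ are illustrative assumptions; the factor $(b - a)^{1/p - 1/q}$ matters because $b - a \ne 1$. A constant function should turn (8.16) into an equality, which the last lines confirm.

```python
def lp_norm(values, p, h):
    """L^p norm of the step function equal to values[i] on the i-th cell of width h."""
    return (h * sum(abs(v) ** p for v in values)) ** (1.0 / p)

a, b, n = 0.0, 4.0, 2000
h = (b - a) / n
xs = [a + (i + 0.5) * h for i in range(n)]
f = [x ** -0.2 for x in xs]                  # unbounded near x = 0, but in L^q for q < 5

containment_ok = []
for p, q in [(1, 2), (1.5, 3), (2, 4), (1, 4)]:
    lhs = lp_norm(f, p, h)
    rhs = (b - a) ** (1 / p - 1 / q) * lp_norm(f, q, h)
    containment_ok.append(lhs <= rhs * (1 + 1e-12))
    print(p, q, round(lhs, 4), round(rhs, 4))

# A constant function turns (8.16) into an equality.
const = [3.0] * n
gap = (b - a) ** (1 / 2 - 1 / 4) * lp_norm(const, 4, h) - lp_norm(const, 2, h)
print(all(containment_ok), abs(gap) < 1e-9)  # True True
```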
Exercises
8.2.1. For $1 \le p < q < \infty$, find an example of a function in $L^p$ that is not in $L^q$.

8.2.2. For $1 \le p < \infty$, find an example of a function in $L^p$ that is not in $L^\infty$.



8.2.3. Show that if f and g are bounded almost everywhere, then

$$\inf\{\alpha : |f(x) + g(x)| \le \alpha \text{ almost everywhere}\} \le \inf\{\alpha : |f(x)| \le \alpha \text{ almost everywhere}\} + \inf\{\alpha : |g(x)| \le \alpha \text{ almost everywhere}\}.$$
8.2.4. Show that for $f \in L^\infty[a, b]$,

8.2.5. Show that for $f \in L^\infty[a, b]$,

$$\|f\|_\infty = \inf_{g \sim f}\ \sup_{x \in [a,b]} |g(x)|,$$

where $g \sim f$ means that g = f almost everywhere.


8.2.6. Let f be a continuous function on (a, b), and for any c, a < c < b, let $g = \chi_{(a,c)}$, the characteristic function of (a, c). Show that

$$\|f - g\|_\infty \ge \frac{1}{2}.$$
8.2.7. Show that convergence in the $L^\infty$ norm is uniform convergence almost everywhere. That is to say, show that if $f_n \to f$ in $L^\infty[a, b]$, then there exists a set S of measure zero in [a, b] such that $f_n \to f$ uniformly on $[a, b] - S$, and if $f_n \to f$ uniformly almost everywhere in [a, b], then $f_n \to f$ in $L^\infty[a, b]$.
8.2.8. Prove that if $f_n \to f$ in $L^\infty[a, b]$, then $f_n \to f$ in $L^p[a, b]$ for every finite $p \ge 1$.
8.2.9. Prove that if $1 \le p < q < \infty$ and $f_n \to f$ in … $f_n \to f$ in measure (see definition on p. 194).

8.2.11. Let $(f_n)$ be a sequence in $L^p$ that converges to f in the $L^p$ norm. Show that if $(f_n)$ converges pointwise almost everywhere to g, then f = g almost everywhere.
8.2.12. Consider the sequence of functions

$$(f_{1,1}, f_{1,2}, f_{2,2}, f_{1,3}, f_{2,3}, f_{3,3}, \ldots),$$

where $f_{k,n}$ is the characteristic function of the open interval $((k-1)/n, k/n)$,

$$f_{k,n} = \chi_{((k-1)/n,\,k/n)}.$$

Show that if $1 \le p < \infty$, then this sequence converges in the $L^p$ norm. Show that it does not converge in the $L^\infty$ norm.

8.2.13. Show that the sequence given in Exercise 8.2.12 converges in the Cesàro
sense almost everywhere.
8.2.14. Define $f_n$ = . Show that $f_n$ converges pointwise to the constant function 0, but this sequence does not converge in any $L^p$ norm, $1 \le p \le \infty$. Compare to Example 4.6 on p. 100. Explain why this sequence does not converge to the constant function 1/2 in the $L^1$ norm.
8.2.15. Find a sequence of functions that converges in the L°° norm but does not
converge pointwise.
8.2.16. Find a sequence of functions that Cesàro converges almost everywhere but
does not converge in measure.
8.2.17. Find a sequence of functions that converges in measure but does not Cesàro
converge almost everywhere.
8.2.18. Verify that $L^p[a, b]$ satisfies the definition of a vector space as given on p. 18.
8.2.19. Show that the set of functions in $L^\infty[a, b]$ is closed under addition and scalar multiplication.
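8.2.20. Show that if x and y are nonnegative, then

$$\max\{x, y\} = \lim_{p \to \infty}\left(x^p + y^p\right)^{1/p}$$

and, in general, for nonnegative $x_1, x_2, \ldots, x_n$,

$$\max\{x_1, x_2, \ldots, x_n\} = \lim_{p \to \infty}\left(x_1^p + x_2^p + \cdots + x_n^p\right)^{1/p}.$$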

8.2.21. Show that if $p, q > 1$, then $1/p + 1/q = 1$ is equivalent to

$$p - 1 = \frac{1}{q - 1}, \qquad (q - 1)p = q, \qquad \text{and} \qquad (p - 1)q = p.$$
8.2.22. Show that for fixed $y > 0$, the function $f(x) = xy - x^p/p$, $x > 0$, has its maximum at $x = y^{1/(p-1)} = y^{q-1}$.
8.2.23. Show that equality in the Hölder–Riesz inequality holds if and only if there exist nonnegative numbers r and s, not both zero, such that

$$r\,|f(x)|^p = s\,|g(x)|^q \quad \text{almost everywhere}.$$
8.2.24. The $p = 1$ case of the Hölder–Riesz inequality is

$$\int_a^b |f(x)\,g(x)|\,dx \le \|g\|_\infty \int_a^b |f(x)|\,dx. \tag{8.17}$$

Prove this inequality.



8.2.25. If we take the limit $q \to \infty$ of each side of inequality (8.16), we get an inequality between $\|f\|_p$ and $\|f\|_\infty$, $1 \le p < \infty$. State and prove this inequality.

Exercises 8.2.26–8.2.28 establish the fact that the triangle inequality does not hold for $L^p$ spaces when $0 < p < 1$.
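8.2.26. Let $f \in L^p[a, b]$, where $0 < p < 1$, and $g \in L^q[a, b]$, $q = p/(p - 1) < 0$, where $f \ge 0$ and $g > 0$ on [a, b]. Show that

$$\int_a^b f(x)\,g(x)\,dx \ge \left(\int_a^b |f(x)|^p\,dx\right)^{1/p}\left(\int_a^b |g(x)|^q\,dx\right)^{1/q}.$$

8.2.27. Let $f, g \in L^p[a, b]$, where $0 < p < 1$ and $f, g \ge 0$ on [a, b]. Show that

$$\|f + g\|_p \ge \|f\|_p + \|g\|_p.$$

8.2.28. For $0 < p < 1$, show that there exist functions $f, g \in L^p[a, b]$ such that

$$\|f + g\|_p > \|f\|_p + \|g\|_p.$$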
8.2.29. For $n \in \mathbb{N}$, define

$$f_n = \left(n(n + 1)\right)^{1/p}\,\chi_{(1/(n+1),\,1/n)}.$$

Show that for each pair of distinct integers m, n, the distance between $f_m$ and $f_n$ is $2^{1/p}$, $\|f_m - f_n\|_p = 2^{1/p}$, and therefore this is a bounded sequence in $L^p$ that does not have a limit point in $L^p$.

8.3 Banach Spaces


The $L^p$ spaces are equipped with a norm and therefore a definition of distance.
We can begin to explore questions of convergence. As we have seen for sequences
of real numbers, we often find ourselves in the situation where we want to prove
convergence but we do not know the limit. Therefore, we cannot prove convergence
by showing that the terms of the sequence will eventually stay as close as we wish
to this limit. We need to resort to the Cauchy criterion, that if we can force the
terms of the sequence to be as close together as we wish by going far enough out,
there must be something to which the sequence converges.
We never really proved that the Cauchy criterion is valid for sequences of real
numbers. It is a property we expect of the real number line, and we take it as an
axiom that every Cauchy sequence converges. We would like this Cauchy criterion
to be true for sequences in our $L^p$ spaces, but we can no longer just assume that it is true. Now we need to prove it. We shall show that every $L^p$ space is a Banach space.

Definition: Banach space


A vector space equipped with a norm is called a Banach space if it is complete,
that is to say, if every Cauchy sequence converges.

The name used to designate complete normed vector spaces was chosen to honor one of the founders of functional analysis, Stefan Banach (1892–1945). He was born in Kraków in what was then Austria-Hungary, today Poland. In 1920, he began teaching at Lvov Technical University in what was then Poland and is now Ukraine. Until the Nazi occupation of Lvov in 1941, he was a prolific and important mathematician. Imprisoned briefly by the Nazis, he spent the remainder of the war feeding lice in Rudolf Stefan Weigl's Typhus Institute.² Banach died of lung cancer shortly after the war ended.
The proof that $L^p$ is complete for every p, $1 \le p \le \infty$, is known as the Riesz–Fischer theorem. Riesz and Fischer each proved it independently for $p = 2$ in 1907. The remaining cases were established by Riesz in 1910.

Theorem 8.7 ($L^p$ is Banach). For $1 \le p \le \infty$, $L^p$ is a Banach space.

Proof. We shall first prove the case $p = \infty$. To say that a sequence $(f_k)$ is Cauchy in the $L^\infty$ norm means that given any $\epsilon > 0$, we can find a response N so that $m, n \ge N$ implies that $\|f_m - f_n\|_\infty < \epsilon$. This means that $|f_m(x) - f_n(x)| < \epsilon$ almost everywhere. We eliminate the values of x in the following sets:

$$A_k = \{x \in [a, b] : |f_k(x)| > \|f_k\|_\infty\},$$
$$B_{m,n} = \{x \in [a, b] : |f_m(x) - f_n(x)| > \|f_m - f_n\|_\infty\}.$$

From the definition of the $L^\infty$ norm, each of these sets has measure zero (see Exercise 8.3.2). There are countably many of these sets, so their union also has measure zero,

$$F = \left(\bigcup_{k=1}^{\infty} A_k\right) \cup \left(\bigcup_{1 \le m < n < \infty} B_{m,n}\right), \qquad m(F) = 0.$$

For each x in what remains, $[a, b] - F$, we have that $m, n \ge N$ implies that

$$|f_m(x) - f_n(x)| \le \|f_m - f_n\|_\infty < \epsilon.$$

- For more information on this unpleasant occupation and how it was used to save the lives of a number of
Polish and Ukrainian intellectuals during World War II, see www.lwow.home.plfWeigl.html.

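The sequence $(f_k(x))$ is a Cauchy sequence, and so it converges to a finite value we can call f(x). We can define f however we wish for $x \in F$, say f(x) = 0. It follows that $f_k \to f$ almost everywhere. It only remains to show that $f \in L^\infty$ and that the convergence takes place in the $L^\infty$ norm.
Since f is a limit almost everywhere of measurable functions, it is measurable. We have to show that there is a bound on f that holds almost everywhere.
Let N be the Cauchy criterion response to $\epsilon = 1$ and let

$$\beta = \max\{\|f_1\|_\infty, \|f_2\|_\infty, \ldots, \|f_N\|_\infty\}.$$

Then for $x \in [a, b] - F$ and for all $k \ge 1$,

$$|f_k(x)| \le \beta + 1.$$

It follows that $|f(x)| \le \beta + 1$ almost everywhere, and therefore $f \in L^\infty$. Finally, given any $\epsilon > 0$ with response N, letting $n \to \infty$ in $|f_m(x) - f_n(x)| < \epsilon$ shows that $|f_m(x) - f(x)| \le \epsilon$ for all $x \in [a, b] - F$ and all $m \ge N$. Thus $\|f_m - f\|_\infty \le \epsilon$, and the sequence converges to f in the $L^\infty$ norm.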


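We now consider the case of finite p. To say that $(f_k)$ is Cauchy in the $L^p$ norm means that for any $\epsilon > 0$, we can find a response N so that $m, n \ge N$ implies that $\|f_m - f_n\|_p < \epsilon$. We construct a subsequence of the $f_k$ as follows. Choose $n_1$ so that $n \ge n_1$ implies that

$$\|f_n - f_{n_1}\|_p < \frac{1}{2}.$$

We choose $n_2 > n_1$ so that $n \ge n_2$ implies that

$$\|f_n - f_{n_2}\|_p < \frac{1}{4}.$$

In general, once we have found $n_{k-1}$, we choose $n_k > n_{k-1}$ so that $n \ge n_k$ implies that

$$\|f_n - f_{n_k}\|_p < \frac{1}{2^k}.$$

It follows that

$$\|f_{n_1}\|_p + \sum_{k=1}^{\infty} \|f_{n_{k+1}} - f_{n_k}\|_p < \|f_{n_1}\|_p + 1 < \infty.$$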

We shall use the fact that this sum is bounded.


We create a new sequence of functions by

$$g_k(x) = |f_{n_1}(x)| + \sum_{j=1}^{k} |f_{n_{j+1}}(x) - f_{n_j}(x)|.$$

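By the Minkowski–Riesz inequality, Theorem 8.4 on p. 258, we have a finite bound on $\int_a^b g_k^p(x)\,dx$ that does not depend on k,

$$\int_a^b g_k^p(x)\,dx \le \left(\|f_{n_1}\|_p + \sum_{j=1}^{k} \|f_{n_{j+1}} - f_{n_j}\|_p\right)^p < \left(\|f_{n_1}\|_p + 1\right)^p. \tag{8.18}$$

Using Fatou's lemma, Theorem 6.20 on p. 187,

$$\int_a^b \lim_{k \to \infty} g_k^p(x)\,dx \le \liminf_{k \to \infty} \int_a^b g_k^p(x)\,dx < \infty. \tag{8.19}$$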
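By the monotone convergence theorem, Theorem 6.14 on p. 174, the sequence of functions

$$g_k^p(x) = \left(|f_{n_1}(x)| + \sum_{j=1}^{k} |f_{n_{j+1}}(x) - f_{n_j}(x)|\right)^p$$

converges almost everywhere, and therefore so does the sequence of functions

$$g_k(x) = |f_{n_1}(x)| + \sum_{j=1}^{k} |f_{n_{j+1}}(x) - f_{n_j}(x)|.$$

We know that if a series of real numbers converges absolutely, then it converges (see Theorem 1.9). Therefore the sequence of functions

$$f_{n_1}(x) + \sum_{j=1}^{k}\left(f_{n_{j+1}}(x) - f_{n_j}(x)\right) = f_{n_{k+1}}(x)$$

converges almost everywhere.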


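We have found a subsequence of $(f_k)$ that converges almost everywhere. As before, we define f by $f(x) = \lim_{k \to \infty} f_{n_k}(x)$ where our sequence converges, and define f any way we want, say f(x) = 0, on the set of measure zero where the subsequence does not converge. From inequality (8.19),

$$\int_a^b |f(x)|^p\,dx = \int_a^b \left|f_{n_1}(x) + \sum_{j=1}^{\infty}\left(f_{n_{j+1}}(x) - f_{n_j}(x)\right)\right|^p dx \le \int_a^b \left(|f_{n_1}(x)| + \sum_{j=1}^{\infty} |f_{n_{j+1}}(x) - f_{n_j}(x)|\right)^p dx < \infty,$$

so $f \in L^p$. It only remains to prove that $(f_k)$ converges to f in the sense of the $L^p$ norm.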
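We observe that

$$f(x) - f_{n_k}(x) = \sum_{j=k}^{\infty}\left(f_{n_{j+1}}(x) - f_{n_j}(x)\right).$$

From the triangle inequality, it follows that

$$\|f - f_{n_k}\|_p \le \sum_{j=k}^{\infty} \|f_{n_{j+1}} - f_{n_j}\|_p < \sum_{j=k}^{\infty} \frac{1}{2^j} = \frac{1}{2^{k-1}}.$$

Given any $\epsilon > 0$, choose k so that $\epsilon > 3/2^k$. For all $n > n_k$ we see that

$$\|f - f_n\|_p \le \|f - f_{n_k}\|_p + \|f_{n_k} - f_n\|_p < \frac{1}{2^{k-1}} + \frac{1}{2^k} = \frac{3}{2^k} < \epsilon. \qquad \square$$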
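The key move in the finite-p case is extracting a subsequence whose successive gaps are smaller than $2^{-k}$. The Python sketch below mimics that selection on a finite list of step functions; the example sequence $f_n(x) = \sum_{j=1}^{n} x^j/j^2$ on [0, 1], the grid, and the helper names `fast_subsequence` and `dist` are hypothetical choices for illustration. A finite list can only be checked against the terms it actually contains, so this illustrates the construction rather than proving anything.

```python
def lp_norm(values, p, h):
    """L^p norm of the step function equal to values[i] on the i-th cell of width h."""
    return (h * sum(abs(v) ** p for v in values)) ** (1.0 / p)

def dist(u, v, p, h):
    """L^p distance between two step functions on the same grid."""
    return lp_norm([a - b for a, b in zip(u, v)], p, h)

def fast_subsequence(seq, p, h, levels):
    """Indices n_1 < n_2 < ... with ||f_n - f_{n_k}||_p < 2**-k for every later n in seq."""
    chosen, start = [], 0
    for k in range(1, levels + 1):
        for nk in range(start, len(seq)):
            if all(dist(seq[n], seq[nk], p, h) < 2.0 ** -k for n in range(nk + 1, len(seq))):
                chosen.append(nk)
                start = nk + 1
                break
        else:
            break  # the finite list is too short to reach this level
    return chosen

# Hypothetical example: f_n(x) = sum_{j=1}^{n} x^j / j^2 on [0, 1], a Cauchy sequence in L^2.
cells, terms, p = 100, 200, 2
h = 1.0 / cells
xs = [(i + 0.5) * h for i in range(cells)]
seq, current = [], [0.0] * cells
for j in range(1, terms + 1):
    current = [c + x ** j / j ** 2 for c, x in zip(current, xs)]
    seq.append(current)

idx = fast_subsequence(seq, p, h, levels=5)
gaps = [dist(seq[b], seq[a], p, h) for a, b in zip(idx, idx[1:])]
print(idx)
print(sum(gaps) < 1)  # the gaps are bounded by 1/2 + 1/4 + ..., so their sum is below 1
```

The design mirrors the proof: each level only demands closeness to later terms, which is exactly what makes the telescoping sum of gaps converge.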

The Riesz–Fischer Theorem


Recall that we began our study of L2 with the observation that we had an inner
product. We then used the inner product to define the norm, and from the norm
we defined distance. But we do not want to lose that inner product, because inner
products enable us to write vectors in terms of basis vectors.
For example, in $\mathbb{R}^3$, we can use (1, 0, 0), (0, 1, 0), and (0, 0, 1) as our basis vectors. Any two of these are orthogonal, and every vector is a unique linear combination of these three vectors. We could use other basis vectors. For example,

$$v_1 = (1, -1, 2), \qquad v_2 = (2, 0, -1), \qquad v_3 = (1, 5, 2).$$


In Exercise 8.3.1, you are asked to verify that each pair from $\{v_1, v_2, v_3\}$ is orthogonal. Each vector has a unique decomposition into a linear combination of these three vectors, and we can use the dot product to find this decomposition. The component of u in the direction of $v_k$ is

$$\frac{u \cdot v_k}{v_k \cdot v_k}\,v_k.$$

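Thus, if u = (1, 2, 3), the coefficient of $v_1$ is

$$\frac{1\cdot 1 + 2\cdot(-1) + 3\cdot 2}{1\cdot 1 + (-1)\cdot(-1) + 2\cdot 2} = \frac{5}{6}.$$

We can write (1, 2, 3) in terms of our basis as

$$(1, 2, 3) = \frac{5}{6}\,v_1 - \frac{1}{5}\,v_2 + \frac{17}{30}\,v_3.$$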
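The projection formula is easy to carry out with exact rational arithmetic. This Python sketch uses the standard-library `fractions` module to compute each coefficient $(u \cdot v_k)/(v_k \cdot v_k)$ for u = (1, 2, 3) and then rebuilds u from its components; the helper name `dot` is our own.

```python
from fractions import Fraction

def dot(u, v):
    """Ordinary dot product in R^n."""
    return sum(a * b for a, b in zip(u, v))

v1, v2, v3 = (1, -1, 2), (2, 0, -1), (1, 5, 2)
basis = [v1, v2, v3]
u = (1, 2, 3)

# The method only works because the basis vectors are pairwise orthogonal.
assert all(dot(basis[i], basis[j]) == 0 for i in range(3) for j in range(i + 1, 3))

# Coefficient of v_k is (u . v_k) / (v_k . v_k)
coeffs = [Fraction(dot(u, v), dot(v, v)) for v in basis]
print(coeffs)  # [Fraction(5, 6), Fraction(-1, 5), Fraction(17, 30)]

# Reassemble u from its components
rebuilt = tuple(sum(c * v[i] for c, v in zip(coeffs, basis)) for i in range(3))
print(rebuilt == u)  # True
```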

Now think about the Fourier series representation of a function in $L^2[-\pi, \pi]$, say f(x) = x:

$$f(x) = 2\sin(x) - \sin(2x) + \frac{2}{3}\sin(3x) - \frac{1}{2}\sin(4x) + \cdots. \tag{8.20}$$

The trigonometric functions used in the Fourier series are

$$1, \cos x, \cos 2x, \ldots, \sin x, \sin 2x, \ldots.$$

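These functions are orthogonal using the $L^2$ inner product! For $n \ge 1$, we note that

$$\int_{-\pi}^{\pi} 1\cdot\cos(nx)\,dx = \frac{1}{n}\sin(nx)\Big|_{-\pi}^{\pi} = 0,$$
$$\int_{-\pi}^{\pi} 1\cdot\sin(nx)\,dx = \frac{-1}{n}\cos(nx)\Big|_{-\pi}^{\pi} = 0,$$
$$\int_{-\pi}^{\pi} \sin(nx)\cos(nx)\,dx = \frac{-\cos^2(nx)}{2n}\Big|_{-\pi}^{\pi} = 0.$$

For $m \ne n$, we observe that

$$\int_{-\pi}^{\pi} \cos(mx)\cos(nx)\,dx = \left(\frac{\sin((m-n)x)}{2(m-n)} + \frac{\sin((m+n)x)}{2(m+n)}\right)\Big|_{-\pi}^{\pi} = 0,$$
$$\int_{-\pi}^{\pi} \sin(mx)\sin(nx)\,dx = \left(\frac{\sin((m-n)x)}{2(m-n)} - \frac{\sin((m+n)x)}{2(m+n)}\right)\Big|_{-\pi}^{\pi} = 0,$$
$$\int_{-\pi}^{\pi} \sin(mx)\cos(nx)\,dx = \left(\frac{-\cos((m-n)x)}{2(m-n)} - \frac{\cos((m+n)x)}{2(m+n)}\right)\Big|_{-\pi}^{\pi} = 0.$$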
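These orthogonality relations can also be seen numerically by tabulating the matrix of inner products (the Gram matrix). The Python sketch below approximates each $L^2[-\pi, \pi]$ inner product with a 4000-point midpoint rule (our choice); on a full period the midpoint rule integrates low-degree trigonometric polynomials exactly up to rounding, so the off-diagonal entries should print as zero, with $2\pi$ and $\pi$ on the diagonal.

```python
import math

def inner(f, g, n=4000):
    """Approximate the L^2 inner product (f, g) on [-pi, pi] with the midpoint rule."""
    h = 2 * math.pi / n
    total = 0.0
    for i in range(n):
        x = -math.pi + (i + 0.5) * h
        total += f(x) * g(x)
    return h * total

# 1, cos x, sin x, cos 2x, sin 2x, cos 3x, sin 3x
funcs = [lambda x: 1.0]
for k in range(1, 4):
    funcs.append(lambda x, k=k: math.cos(k * x))   # k=k freezes the current frequency
    funcs.append(lambda x, k=k: math.sin(k * x))

gram = [[inner(f, g) for g in funcs] for f in funcs]
for row in gram:
    print(" ".join(f"{v:8.4f}" for v in row))
```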
It appears that these functions form a basis, but one that is infinite. Even so, we should be able to use the inner product to find the coefficients:

$$\frac{(x, \cos(nx))}{(\cos(nx), \cos(nx))} = \frac{\int_{-\pi}^{\pi} x\cos(nx)\,dx}{\int_{-\pi}^{\pi} \cos^2(nx)\,dx} = \frac{0}{\pi} = 0,$$
$$\frac{(x, \sin(nx))}{(\sin(nx), \sin(nx))} = \frac{\int_{-\pi}^{\pi} x\sin(nx)\,dx}{\int_{-\pi}^{\pi} \sin^2(nx)\,dx} = \frac{(-1)^{n+1}\,2\pi/n}{\pi} = \frac{2(-1)^{n+1}}{n}.$$

The Fourier series expansion of our function f defined by f(x) = x, $-\pi < x < \pi$, is simply the representation of f in terms of this orthogonal basis of sines and cosines.
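The coefficient formulas can be checked by computing the same ratios of inner products numerically. In this Python sketch the midpoint rule with 20,000 cells is an arbitrary quadrature choice; the cosine coefficients should vanish and the sine coefficients should match $2(-1)^{n+1}/n$, the values appearing in (8.20).

```python
import math

def fourier_coefficient(basis_fn, f, n=20000):
    """(f, phi) / (phi, phi) on [-pi, pi], both integrals approximated by the midpoint rule."""
    h = 2 * math.pi / n
    xs = [-math.pi + (i + 0.5) * h for i in range(n)]
    num = h * sum(f(x) * basis_fn(x) for x in xs)
    den = h * sum(basis_fn(x) ** 2 for x in xs)
    return num / den

def identity(x):
    """The function f(x) = x."""
    return x

rows = []
for k in range(1, 5):
    a_k = fourier_coefficient(lambda x: math.cos(k * x), identity)
    b_k = fourier_coefficient(lambda x: math.sin(k * x), identity)
    exact_b = 2 * (-1) ** (k + 1) / k
    rows.append((k, a_k, b_k, exact_b))
    print(k, round(a_k, 6), round(b_k, 6), round(exact_b, 6))
```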
The amazing result discovered by Fischer and Riesz is that every function in
L2 has such a representation and every suitably convergent trigonometric series
corresponds to a function in L2.

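Theorem 8.8 (Riesz–Fischer Theorem). Let $f \in L^2[-\pi, \pi]$. Then f has a unique Fourier series representation

$$f(x) = a_0 + \sum_{n=1}^{\infty}\left(a_n\cos(nx) + b_n\sin(nx)\right),$$

where

$$a_n = \frac{(f(x), \cos(nx))}{(\cos(nx), \cos(nx))},\ n \ge 0, \qquad b_n = \frac{(f(x), \sin(nx))}{(\sin(nx), \sin(nx))},\ n \ge 1.$$

The convergence of this series is convergence in the sense of the $L^2$ norm. Furthermore,

$$a_0^2 + \sum_{n=1}^{\infty}\left(a_n^2 + b_n^2\right) < \infty. \tag{8.21}$$

The implication also goes the other way. If $(a_0, a_1, b_1, a_2, b_2, \ldots)$ is any sequence of real numbers for which $a_0^2 + \sum_{n=1}^{\infty}(a_n^2 + b_n^2)$ converges, then

$$a_0 + \sum_{n=1}^{\infty}\left(a_n\cos(nx) + b_n\sin(nx)\right)$$

is a function in $L^2$.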

We shall prove this theorem in the next (and last) section.



Comparing this to results on Fourier series in Section 8.1, we see that we have
strengthened Lebesgue's assumption. Instead of simply being integrable, we insist
that the square of the function must be integrable. In exchange, we get a much
stronger conclusion. We no longer need the Cesàro limit. Nevertheless, we get
convergence only in the L2 norm. But in 1966, the Swedish mathematician Lennart
Carleson (b. 1928) showed that convergence is not just in the L2 norm, it is pointwise
convergence almost everywhere. In 2006, Carleson received the Abel Prize "for his
profound and seminal contributions to harmonic analysis and the theory of smooth
dynamical systems." In 1970, Richard A. Hunt of Purdue University showed that there is nothing special about $L^2$. The same is true for functions in any $L^p$ space, provided only that p is strictly greater than 1. All of the problematic functions that require a Cesàro limit live in $L^1$ but not in $L^p$ for any $p > 1$.

Theorem 8.9 (Carleson–Hunt Theorem). If $f \in L^p[-\pi, \pi]$, $p > 1$, then the Fourier series representation of f converges almost everywhere to f.

The proof of the Carleson–Hunt theorem is beyond the scope of this book. The remainder of this chapter will be devoted to proving the Riesz–Fischer theorem.
Because it requires little additional effort to prove a far more general result, I shall
present this proof in the context of Hilbert spaces.

Exercises
8.3.1. Verify that each pair of the vectors

$$v_1 = (1, -1, 2), \qquad v_2 = (2, 0, -1), \qquad v_3 = (1, 5, 2)$$

is orthogonal.
8.3.2. Show that for functions in $L^\infty[a, b]$, each of the following sets has measure zero:

$$A_k = \{x \in [a, b] : |f_k(x)| > \|f_k\|_\infty\},$$
$$B_{m,n} = \{x \in [a, b] : |f_m(x) - f_n(x)| > \|f_m - f_n\|_\infty\}.$$

8.3.3. Let C = C[0, 1] be the set of all continuous functions on [0, 1]. For $f \in C$, define the max norm by $\|f\|_{\max} = \max_{x \in [0,1]} |f(x)|$. Show that C equipped with the max norm is a Banach space.
8.3.4. Let C = C[0, 1] be the set of all continuous functions on [0, 1]. Show that C equipped with the $L^2$ norm, $\|f\|_2 = \left(\int_0^1 f^2(x)\,dx\right)^{1/2}$, is not a Banach space.
8.3.5. Show that for f e L bI, 1 <p <oc and any E > 0, there exists a contin-
suchthat If — <E and If — <E.

8.3.6. Let $(f_n)$ be a sequence of functions in $L^p[a, b]$, $1 \le p < \infty$, that converges almost everywhere to $f \in L^p[a, b]$. Show that $(f_n)$ converges to f in the $L^p$ norm if and only if $\|f_n\|_p \to \|f\|_p$.

8.3.7. We say that a sequence $(f_n)$ of integrable functions is equi-integrable on [a, b] if for each $\epsilon > 0$ there is a response $\delta > 0$ so that for any $S \subseteq [a, b]$ with $m(S) < \delta$ we have that $\int_S |f_n(x)|\,dx < \epsilon$ for all $n \ge 1$. Show that if $(f_n)$ is an equi-integrable sequence that converges almost everywhere to f, then

$$\int_a^b f(x)\,dx = \lim_{n \to \infty} \int_a^b f_n(x)\,dx.$$

8.3.8. Let $(f_n)$ be a sequence of functions in $L^p[a, b]$, $1 < p < \infty$, that converges almost everywhere to $f \in L^p[a, b]$. Show that if the sequence of norms is bounded, $\|f_n\|_p < M$ for all $n \ge 1$, then for all $g \in L^q[a, b]$, $q = p/(p - 1)$, we have that

$$\int_a^b f(x)\,g(x)\,dx = \lim_{n \to \infty} \int_a^b f_n(x)\,g(x)\,dx.$$

Identify how $p > 1$ is used in your proof and give an example that shows that this result is not correct for $(f_n) \subset L^1[a, b]$, $g \in L^\infty[a, b]$.
8.3.9. Let be a sequence of functions that converges in the norm to f,
1< p < Let (ga) be a sequence of measurable functions such that
g almost everywhere. Show that converges to gf in the
norm.

8.4 Hilbert Spaces


We begin with the definition of a Hilbert space. Note that the inner product on $L^2$ satisfies all of the criteria.

Definition: Hilbert space
A Hilbert space is a Banach space equipped with an inner product, (x, y), that maps pairs of elements to real numbers and satisfies the following properties:

1. (x, y) = (y, x) (symmetry).
2. For $\alpha, \beta \in \mathbb{R}$, $(\alpha x + \beta y, z) = \alpha(x, z) + \beta(y, z)$ (linearity).
3. $(x, x) \ge 0$ (nonnegativity).
4. $(x, x) = \|x\|^2$ (defines norm).

It is ironic that these Banach spaces equipped with an inner product are today called Hilbert spaces, because Hilbert resisted this formalism. Nevertheless, it was his work on orthogonal systems of functions, especially in finding solutions of the general integral equation

$$f(s) = \phi(s) + \int_a^b K(s, t)\,\phi(t)\,dt,$$

where f and K are given and the task is to find an appropriate function $\phi$, that would lead others to develop this concept.
David Hilbert (1862–1943) was from Königsberg (modern Kaliningrad, in the little piece of Russia between Poland and Lithuania), earned his doctorate under the direction of Ferdinand Lindemann in 1885, and in 1895 became chair of mathematics at the University of Göttingen. He was probably the most influential and highly acclaimed German mathematician of the early twentieth century. In addition to his work on integral equations and the related field of calculus of variations, he is noted for his fundamental contributions to invariant theory, algebraic number theory, mathematical physics, and, especially, geometry.
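We first observe a basic inequality and identity for Hilbert spaces. Recall that the dot product in $\mathbb{R}^n$ satisfies

$$\vec{a}\cdot\vec{b} = |\vec{a}|\,|\vec{b}|\cos\theta,$$

where $\theta$ is the angle between $\vec{a}$ and $\vec{b}$. It follows that

$$|\vec{a}\cdot\vec{b}| \le |\vec{a}|\,|\vec{b}|.$$

This is true of any inner product.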

Proposition 8.10 (Cauchy–Schwarz–Bunyakovski Inequality). The inner product of any Hilbert space satisfies the inequality

$$|(x, y)| \le \|x\|\,\|y\|.$$

This result was proved for $\mathbb{R}^n$ by Cauchy in 1821 and for $L^2$ (though they did not call it that) independently by Victor Bunyakovsky in 1859 and Hermann A. Schwarz in 1885.

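Proof. By linearity, (x, 0) = 0, so the inequality holds when x = 0 or y = 0. We now assume $x, y \ne 0$, set $\lambda = \|x\|/\|y\|$, and observe that

$$\begin{aligned}
0 &\le (x - \lambda y, x - \lambda y) \\
&= (x, x) + \lambda^2(y, y) - 2\lambda(x, y) \\
&= 2\|x\|^2 - 2\frac{\|x\|}{\|y\|}(x, y),
\end{aligned}$$

so $0 \le \|x\|\,\|y\| - (x, y)$. Replacing x by $-x$ gives $0 \le \|x\|\,\|y\| + (x, y)$, and together these give $|(x, y)| \le \|x\|\,\|y\|$.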



The parallelogram law says that the sum of the squares of the diagonals of a parallelogram equals the sum of the squares of the sides,

$$|\vec{a} + \vec{b}|^2 + |\vec{a} - \vec{b}|^2 = 2\left(|\vec{a}|^2 + |\vec{b}|^2\right).$$

This is also true of any Hilbert space, and the proof is the same.

Proposition 8.11 (Parallelogram Law). The inner product of any Hilbert space satisfies the equality

$$\|x + y\|^2 + \|x - y\|^2 = 2\left(\|x\|^2 + \|y\|^2\right). \tag{8.22}$$

This proof is left as Exercise 8.4.2. Equipped with this result, we can quickly verify that $L^2$ is the only $L^p$ space whose norm arises from an inner product.

Proposition 8.12 ($L^2$ Alone). The only $L^p$ space that is also a Hilbert space is $L^2$.

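Proof. In $L^p[a, b]$, take two subsets $S, T \subset [a, b]$ such that $S \cap T = \emptyset$ and $m(S) = m(T) \ne 0$. Define $\lambda = m(S)^{-1/p}$. In Exercise 8.4.3, you are asked to show that

$$\|\lambda\chi_S\|_p = 1, \quad \|\lambda\chi_T\|_p = 1, \quad \|\lambda\chi_S + \lambda\chi_T\|_p = \|\lambda\chi_S - \lambda\chi_T\|_p = 2^{1/p}. \tag{8.23}$$

The parallelogram law for this example becomes $2^{2/p} + 2^{2/p} = 2(1^2 + 1^2)$, which is true only when $p = 2$.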
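The computation in this proof can be reproduced with actual step functions. In the Python sketch below, S = [0, 1/4) and T = [1/2, 3/4) inside [0, 1] and a 400-cell grid are illustrative choices; `parallelogram_defect` (our own name) returns $\|x+y\|^2 + \|x-y\|^2 - 2(\|x\|^2 + \|y\|^2)$, which by (8.23) should equal $2\cdot 2^{2/p} - 4$ and vanish only for $p = 2$. The case $p = \infty$ is included by using the sup norm, where the defect is $-2$.

```python
import math

def lp_norm(values, p, h):
    """L^p norm of a step function on cells of width h (p = math.inf gives the sup norm)."""
    if math.isinf(p):
        return max(abs(v) for v in values)
    return (h * sum(abs(v) ** p for v in values)) ** (1.0 / p)

n = 400
h = 1.0 / n
# S = [0, 1/4) and T = [1/2, 3/4): disjoint subsets of [0, 1] with m(S) = m(T) = 1/4
chi_S = [1.0 if i < n // 4 else 0.0 for i in range(n)]
chi_T = [1.0 if n // 2 <= i < 3 * n // 4 else 0.0 for i in range(n)]

def parallelogram_defect(p):
    """||x+y||^2 + ||x-y||^2 - 2(||x||^2 + ||y||^2) for x = lam*chi_S, y = lam*chi_T."""
    lam = 1.0 if math.isinf(p) else 0.25 ** (-1.0 / p)   # lam = m(S)^(-1/p)
    x = [lam * s for s in chi_S]
    y = [lam * t for t in chi_T]
    plus = [u + v for u, v in zip(x, y)]
    minus = [u - v for u, v in zip(x, y)]
    return (lp_norm(plus, p, h) ** 2 + lp_norm(minus, p, h) ** 2
            - 2 * (lp_norm(x, p, h) ** 2 + lp_norm(y, p, h) ** 2))

for p in (1, 1.5, 2, 3, math.inf):
    print(p, round(parallelogram_defect(p), 6))
```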

Complete Orthogonal Set


We know that the set $\{1, \cos x, \sin x, \cos 2x, \sin 2x, \ldots\}$ is an orthogonal set in $L^2$. Is anything missing? Is there any function (other than the constant function 0) that is orthogonal to all of these in $L^2$? In other words, do we have a complete orthogonal set?

Definition: Orthogonality
If x and y are elements of a Hilbert space such that (x, y) = 0, then we say that x and y are orthogonal. A set Q of elements of a Hilbert space is called an orthogonal set if for each pair $x \ne y \in Q$, we have that (x, y) = 0.

Definition: Complete orthogonal set


An orthogonal set Q is complete if (x, y) = 0 for all $y \in Q$ means that x = 0.

Theorem 8.13 ($L^2$ Complete Orthogonal Set). The set

$$1, \cos x, \sin x, \cos 2x, \sin 2x, \cos 3x, \ldots$$

is a complete orthogonal set in $L^2[-\pi, \pi]$.

Our proof will be spread over Lemmas 8.14–8.16 and will proceed by contradiction. We assume that we have a nonzero function $f \in L^2[-\pi, \pi]$ for which

$$\int_{-\pi}^{\pi} f(x)\cos nx\,dx = 0,\ n \ge 0, \qquad \text{and} \qquad \int_{-\pi}^{\pi} f(x)\sin nx\,dx = 0,\ n \ge 1.$$

The proof breaks up into three pieces. We first establish the existence of finite trigonometric series with certain properties that help us isolate the value of f near specific points. Next, we use the existence of such a finite series, together with the assumption that f is continuous, to find a contradiction. Finally, we pull this all together to find a contradiction when all we assume about f is that it is integrable. Notice that we need to prove our result only for $f \in L^2$. What we shall actually prove is stronger than we need.

Lemma 8.14 (Special Trigonometric Polynomial). Given any $\delta > 0$ and any $\epsilon > 0$, we can find a finite trigonometric series,

$$T(x) = a_0 + \sum_{n=1}^{N}\left(a_n\cos(nx) + b_n\sin(nx)\right),$$

for which
1. $T(x) \ge 0$ for all $x \in [-\pi, \pi]$,
2. $\int_{-\pi}^{\pi} T(x)\,dx = 1$, and
3. $T(x) < \epsilon$ for all $\delta \le |x| \le \pi$.

These functions are all nonnegative, and the area underneath the graph is always 1, but that area is concentrated as close to zero as we wish (see Figure 8.2). The effect of integrating fT is to pick out the values of f closest to 0. If f is nonzero at any point, say $f(z) \ne 0$, then integrating $f(x + z)T(x)$ picks out those values of f in an arbitrarily small neighborhood of z.

Proof. Define

$$T_n(x) = \frac{(1 + \cos x)^n}{\int_{-\pi}^{\pi}(1 + \cos x)^n\,dx} = \frac{(\cos(x/2))^{2n}}{\int_{-\pi}^{\pi}(\cos(x/2))^{2n}\,dx}.$$


Figure 8.2. Graphs of $T_5$, $T_{10}$, $T_{20}$, and $T_{50}$.

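The first two properties are clearly satisfied by this function. For the third property, we observe that for $\delta \le |x| \le \pi$ we have that

$$T_n(x) \le \frac{(\cos(\delta/2))^{2n}}{\int_{-\delta/2}^{\delta/2}(\cos(x/2))^{2n}\,dx} \le \frac{(\cos(\delta/2))^{2n}}{\delta\,(\cos(\delta/4))^{2n}} = \frac{1}{\delta}\left(\frac{\cos(\delta/2)}{\cos(\delta/4)}\right)^{2n}.$$

Since

$$\frac{\cos(\delta/2)}{\cos(\delta/4)} < 1,$$

we can find an n for which $T_n(x) < \epsilon$ for all $\delta \le |x| \le \pi$.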
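The concentration of $T_n$ is easy to watch numerically. This Python sketch takes $\delta = 1$ (an arbitrary choice), uses the values of n from Figure 8.2 plus n = 200, and approximates the normalizing integral by a midpoint rule. Since $T_n$ is even and decreasing on $[0, \pi]$, its largest value on $\delta \le |x| \le \pi$ is $T_n(\delta)$, which is compared with the bound $\frac{1}{\delta}\left(\cos(\delta/2)/\cos(\delta/4)\right)^{2n}$ derived above.

```python
import math

def make_T(n, grid=20000):
    """Return T_n(x) = cos(x/2)^(2n) / integral_{-pi}^{pi} cos(t/2)^(2n) dt.

    The normalizing integral is approximated with the midpoint rule."""
    h = 2 * math.pi / grid
    mass = h * sum(math.cos((-math.pi + (i + 0.5) * h) / 2) ** (2 * n) for i in range(grid))
    return lambda x: math.cos(x / 2) ** (2 * n) / mass

delta = 1.0
tails, bounds = [], []
for n in (5, 10, 20, 50, 200):
    T = make_T(n)
    # T_n is even and decreasing on [0, pi], so its largest value on delta <= |x| <= pi is T_n(delta).
    tails.append(T(delta))
    bounds.append((1 / delta) * (math.cos(delta / 2) / math.cos(delta / 4)) ** (2 * n))
    print(n, f"{tails[-1]:.3e}", f"{bounds[-1]:.3e}")
```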

Lemma 8.15 (Continuous f). If f is continuous on $[-\pi, \pi]$ and f is orthogonal to cos(nx), $n \ge 0$, and to sin(nx), $n \ge 1$, then f is the constant function 0 on this interval.

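Proof. We extend the definition of f to all of $\mathbb{R}$ by assuming that $f(x + 2\pi) = f(x)$. This might entail changing the value of f at one of the endpoints of $[-\pi, \pi]$, but that will not change the values of any of the integrals. Let T(x) be any finite trigonometric series in cos(nx) and sin(nx), and let y be any real number. Using the difference of angles formulas, $T(x - y)$ is also a finite trigonometric series in cos(nx) and sin(nx) (see Exercise 8.4.6). From the orthogonality of f and the periodicity of f and T,

$$0 = \int_{-\pi}^{\pi} f(x)\,T(x - y)\,dx = \int_{-\pi - y}^{\pi - y} f(x + y)\,T(x)\,dx = \int_{-\pi}^{\pi} f(x + y)\,T(x)\,dx.$$

Assume there is a $z \in (-\pi, \pi)$ at which $f(z) = c \ne 0$. We can assume c is positive (otherwise take the negative of f). By the continuity of f we can find a $\delta > 0$ so that $f(x) > c/2$ for all $x \in (z - \delta, z + \delta) \subset (-\pi, \pi)$. Since f is continuous, we also know that it is bounded on $[-\pi, \pi]$, say $|f| < M$. In what follows, we shall use the lower limit on f, $f > -M$. We choose T so that it satisfies the conditions of Lemma 8.14. We have that

$$\begin{aligned}
0 &= \int_{-\pi}^{\pi} f(x + z)\,T(x)\,dx \\
&= \frac{c}{2}\int_{-\pi}^{\pi} T(x)\,dx + \int_{-\delta}^{\delta}\left(f(x + z) - \frac{c}{2}\right)T(x)\,dx + \int_{-\pi}^{-\delta}\left(f(x + z) - \frac{c}{2}\right)T(x)\,dx + \int_{\delta}^{\pi}\left(f(x + z) - \frac{c}{2}\right)T(x)\,dx \\
&\ge \frac{c}{2} + 0 - \left(M + \frac{c}{2}\right)\int_{-\pi}^{-\delta} T(x)\,dx - \left(M + \frac{c}{2}\right)\int_{\delta}^{\pi} T(x)\,dx \\
&> \frac{c}{2} - 2\pi\left(M + \frac{c}{2}\right)\epsilon,
\end{aligned}$$

where we can get any positive value we wish for $\epsilon$ by suitable choice of T. Choosing $\epsilon < c/\left(8\pi(M + c/2)\right)$ tells us that

$$0 = \int_{-\pi}^{\pi} f(x + z)\,T(x)\,dx > \frac{c}{4} > 0,$$

a contradiction. Therefore, there is no $z \in (-\pi, \pi)$ at which $f(z) \ne 0$, and by continuity f is also 0 at $\pm\pi$.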

Lemma 8.16 (Integrable f). If f is integrable on $[-\pi, \pi]$ and f is orthogonal to cos(nx), $n \ge 0$, and to sin(nx), $n \ge 1$, then f is the constant function 0 almost everywhere on this interval.

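Proof. We use the fact that the integral of f,

$$F(x) = \int_{-\pi}^{x} f(t)\,dt,$$

is continuous and that we can relate the Fourier coefficients of f and F by means of integration by parts (see Exercise 7.4.11 on p. 239). For $n \ge 1$, we have that

$$\int_{-\pi}^{\pi} F(x)\cos(nx)\,dx = F(x)\frac{\sin(nx)}{n}\Big|_{-\pi}^{\pi} - \frac{1}{n}\int_{-\pi}^{\pi} f(x)\sin(nx)\,dx = 0,$$
$$\int_{-\pi}^{\pi} F(x)\sin(nx)\,dx = -F(x)\frac{\cos(nx)}{n}\Big|_{-\pi}^{\pi} + \frac{1}{n}\int_{-\pi}^{\pi} f(x)\cos(nx)\,dx = 0,$$

where the boundary term in the second line vanishes because $F(-\pi) = 0$ and $F(\pi) = \int_{-\pi}^{\pi} f(t)\,dt = 0$. The only nonzero Fourier coefficient of F is the constant term, say $A_0$. All Fourier coefficients of $F - A_0$ are zero, so by Lemma 8.15, $F - A_0 = 0$ and F is constant. This implies that $f = F' = 0$ almost everywhere.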

Complete Orthonormal Sets


We have seen that our sequence of sines and cosines is complete. Before working
with arbitrary complete orthogonal sets, it is useful to consider those sets for which
each element has norm 1. This can be done just by dividing each element by its
norm.
In $L^2[-\pi, \pi]$, we turn our sequence into an orthonormal set if we replace the constant function 1 with $1/\sqrt{2\pi}$ and divide each of the other functions by $\sqrt{\pi}$. The set

$$\left\{\frac{1}{\sqrt{2\pi}}, \frac{\cos x}{\sqrt{\pi}}, \frac{\sin x}{\sqrt{\pi}}, \frac{\cos 2x}{\sqrt{\pi}}, \frac{\sin 2x}{\sqrt{\pi}}, \frac{\cos 3x}{\sqrt{\pi}}, \ldots\right\}$$

is a complete orthonormal set in $L^2[-\pi, \pi]$. The advantage of working with this orthonormal set is that the Fourier coefficient corresponding to any one of these functions is simply the inner product of f with that function.
All of the remaining results needed to establish the Riesz–Fischer theorem will be done in the context of an arbitrary Hilbert space H for which there exists a countable, complete, orthonormal set $\{\phi_1, \phi_2, \phi_3, \ldots\}$. For $f \in H$, the real number $(f, \phi_k)$ is called the generalized Fourier coefficient.
For $f \in H$, we need to show that

$$f = \sum_{k=1}^{\infty} (f, \phi_k)\,\phi_k.$$

Definition: Orthonormal set
An orthogonal set Q is orthonormal if each element has norm 1.

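This equality is to be understood in terms of convergence in the norm. If we want to find the finite sum $\sum_{k=1}^{N} c_k\phi_k$ that most closely approximates f using our norm, we see that

$$\begin{aligned}
\left\|f - \sum_{k=1}^{N} c_k\phi_k\right\|^2 &= \left(f - \sum_{j=1}^{N} c_j\phi_j,\ f - \sum_{k=1}^{N} c_k\phi_k\right) \\
&= (f, f) - 2\sum_{k=1}^{N} c_k(f, \phi_k) + \sum_{j=1}^{N}\sum_{k=1}^{N} c_j c_k(\phi_j, \phi_k) \\
&= \|f\|^2 - 2\sum_{k=1}^{N} c_k(f, \phi_k) + \sum_{k=1}^{N} c_k^2 \\
&= \|f\|^2 - \sum_{k=1}^{N} (f, \phi_k)^2 + \sum_{k=1}^{N}\left((f, \phi_k) - c_k\right)^2.
\end{aligned} \tag{8.24}$$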
As we would hope, the distance between f and $\sum_{k=1}^{N} c_k\phi_k$ is minimized when $c_k = (f, \phi_k)$. Equation (8.24) implies the following two results, equation (8.25) and inequality (8.26), known as Bessel's identity and Bessel's inequality. They are named for Friedrich Wilhelm Bessel (1784–1846), a mathematical astronomer who spent most of his career at the Königsberg observatory (then in Germany, today it is the Russian city Kaliningrad). His most widely known mathematical work is the development of the Bessel functions, a complete orthonormal set for $L^2$ that arose in his study of planetary perturbations.
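$$\left\|f - \sum_{k=1}^{N} (f, \phi_k)\,\phi_k\right\|^2 = \|f\|^2 - \sum_{k=1}^{N} (f, \phi_k)^2, \tag{8.25}$$

$$\sum_{k=1}^{N} (f, \phi_k)^2 \le \|f\|^2. \tag{8.26}$$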

Since inequality (8.26) holds for all N, we have that

$$\sum_{k=1}^{\infty} (f, \phi_k)^2 \le \|f\|^2, \tag{8.27}$$

and therefore if $f \in H$, then the sum of the squares of the generalized Fourier coefficients must converge. This proves inequality (8.21) in the Riesz–Fischer theorem (Theorem 8.8 on p. 269).
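For the running example f(x) = x on $[-\pi, \pi]$, Bessel's inequality can be watched term by term. With respect to the orthonormal set $\{1/\sqrt{2\pi}, \cos x/\sqrt{\pi}, \sin x/\sqrt{\pi}, \ldots\}$, the only nonzero generalized Fourier coefficients are $(f, \sin(nx)/\sqrt{\pi}) = 2\sqrt{\pi}(-1)^{n+1}/n$, and $\|f\|^2 = 2\pi^3/3$. The Python sketch below (the cutoff of 10,000 terms is an arbitrary choice) checks (8.26) for every N; because $\sum 1/n^2 = \pi^2/6$, the gap also shrinks toward 0, in line with Parseval's equation (8.29).

```python
import math

norm_sq = 2 * math.pi ** 3 / 3          # ||f||^2 = integral of x^2 over [-pi, pi]

def coeff(n):
    """(f, sin(nx)/sqrt(pi)) for f(x) = x; the constant and cosine coefficients are all 0."""
    return 2 * math.sqrt(math.pi) * (-1) ** (n + 1) / n

partial = 0.0
bessel_ok = True
for N in range(1, 10_001):
    partial += coeff(N) ** 2                          # adds 4*pi / N^2
    bessel_ok = bessel_ok and partial <= norm_sq      # inequality (8.26) for this N
    if N in (1, 10, 100, 1_000, 10_000):
        print(N, round(partial, 6), round(norm_sq - partial, 6))
print(bessel_ok)  # True
```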

Completing the Proof of the Riesz–Fischer Theorem

There are still two pieces of the Riesz–Fischer theorem that must be established. First, that the series $\sum (f, \phi_k)\phi_k$ converges in norm to f. Second, that any series $\sum c_k\phi_k$ is in H if $\sum c_k^2$ converges. Equation (8.29), known as Parseval's equation, is a pleasant surprise that appears in the course of the proof. It is named for Marc-Antoine Parseval des Chênes (1755–1836).

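Theorem 8.17 (Convergence of Fourier Series). Let $\{\phi_k\}$ be a complete orthonormal set in the Hilbert space H. For each $f \in H$, we have that

$$f = \sum_{k=1}^{\infty} (f, \phi_k)\,\phi_k. \tag{8.28}$$

Furthermore, we have that

$$\|f\|^2 = \sum_{k=1}^{\infty} (f, \phi_k)^2. \tag{8.29}$$

Finally, if $(c_k)$ is any sequence of real numbers for which $\sum_{k=1}^{\infty} c_k^2$ converges, then

$$\sum_{k=1}^{\infty} c_k\,\phi_k$$

converges to an element of H.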

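Proof. Let

$$f_n = \sum_{k=1}^{n} (f, \phi_k)\,\phi_k.$$

We need to prove that

$$\lim_{n \to \infty} \|f - f_n\| = 0.$$

We have that for m < n,

$$f_n - f_m = \sum_{k=m+1}^{n} (f, \phi_k)\,\phi_k$$

and, therefore,

$$\|f_n - f_m\|^2 = \sum_{k=m+1}^{n} (f, \phi_k)^2.$$

Bessel's inequality (8.26) implies that $\sum_{k=1}^{\infty} (f, \phi_k)^2$ converges, so we can force the distance between $f_n$ and $f_m$ to be as small as we wish by taking m and n sufficiently large. That means that our sequence $(f_n)$ is Cauchy, and therefore it must converge. Let g be its limit,

$$\lim_{n \to \infty} f_n = g = \sum_{k=1}^{\infty} (f, \phi_k)\,\phi_k.$$

We fix k. Then for all $n \ge k$ we have that

$$(f_n, \phi_k) = \sum_{j=1}^{n} (f, \phi_j)(\phi_j, \phi_k) = (f, \phi_k).$$

It follows, by the continuity of the inner product (Exercise 8.4.7), that

$$(g, \phi_k) = \lim_{n \to \infty} (f_n, \phi_k) = (f, \phi_k)$$

and, therefore,

$$(f - g, \phi_k) = 0 \quad \text{for every } k.$$

Since our orthonormal set is complete, g = f.

We now use Bessel's identity, equation (8.25), to prove Parseval's equation:

$$\|f\|^2 - \sum_{k=1}^{\infty} (f, \phi_k)^2 = \lim_{N \to \infty}\left(\|f\|^2 - \sum_{k=1}^{N} (f, \phi_k)^2\right) = \lim_{N \to \infty}\left\|f - \sum_{k=1}^{N} (f, \phi_k)\,\phi_k\right\|^2 = 0.$$

Finally, as seen above, if $\sum c_k^2$ converges, then the sequence of partial sums of $\sum c_k\phi_k$ is Cauchy. Since every Hilbert space is a Banach space, H is complete. Therefore this series converges to an element of H.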

Exercises
8.4.1. Show that if x is orthogonal to ..., then x is orthogonal to any linear
combination

8.4.2. Using the definition of the norm in terms of the inner product, prove that

$$\|x + y\|^2 + \|x - y\|^2 = 2\left(\|x\|^2 + \|y\|^2\right).$$



8.4.3. Finish the proof of Proposition 8.12 by proving the four identities in equa-
tion (8.23).
8.4.4. Use integration by parts (Exercise 7.4.12) to show that if F is a differentiable function over $[-\pi, \pi]$ and if $F' = f$ has the Fourier series representation

$$f(x) = a_0 + \sum_{n=1}^{\infty}\left(a_n\cos(nx) + b_n\sin(nx)\right),$$

then the Fourier series for F is

F(x) = A0 + cos(nx) + sin(nx)).

8.4.5. Explore the values of $\int_{-\pi}^{\pi}(1 + \cos x)^n\,dx = 2^n\int_{-\pi}^{\pi}(\cos(x/2))^{2n}\,dx$ for n = 1, 2, 3, …. Find a general formula for the value of this function of the positive integers n.
8.4.6. Using the difference of angles formula, show that if T(x) is a finite trigono-
metric series in cos(nx) and sin(nx), then T(x — y) is also a finite trigonometric
series in cos(nx) and sin(nx).
8.4.7. Show that any inner product is continuous: if $x_n \to x$ and $y_n \to y$, then $(x_n, y_n) \to (x, y)$.

8.4.8. Let B be a Banach space whose norm satisfies the parallelogram law (equation (8.22)). Show that if we define the inner product by

$$(x, y) = \frac{1}{4}\left(\|x + y\|^2 - \|x - y\|^2\right),$$

this makes B into a Hilbert space.

8.4.9. Let C[0, 1] be the set of continuous functions with the max norm (see Exercise 8.3.3). Does this space satisfy the parallelogram law? Justify your answer.
8.4.10. Let x be orthogonal to each of the elements and let y =
Show that x is orthogonal to y.
9
Epilogue

Does anyone believe that the difference between the Lebesgue and Riemann integrals can
have physical significance, and that whether say, an airplane would or would not fly could
depend on this difference? If such were claimed, I should not care to fly in that plane.

— Richard W. Hamming'

Hamming's comment, though cast in a more prosaic style, echoes that of Luzin
with which we began the preface. When all is said and done, the Lebesgue integral
has moved us so far from the intuitive, practical notion of integration that one can
begin to question whether the journey was worth the price.
Before undertaking this study of the development of analysis in the late nine-
teenth, early twentieth centuries, I had been under the misapprehension that what
convinced mathematicians to adopt the Lebesgue integral was the newfound ability
to integrate the characteristic function of the rationals. In fact, the evidence is that
they were quite content to leave that function unintegrable. The ability to integrate
the derivative of Volterra's function was important, but less for the fact that the
Lebesgue integral expanded the realm of integrable functions than that, in so doing,
this integral simplified the fundamental theorem of calculus. This begins to get to
the heart of what made the Lebesgue integral so attractive: It simplifies analysis.
I have included Osgood's proof to show how difficult it can be to make progress
when chained to the Riemann integral and how easily such a powerful result as the
dominated convergence theorem flows from the machinery of Lebesgue measure
and integration.
The real significance of the Lebesgue integral was the reappraisal of the notion
of function that enabled and, in turn, was promoted by its creation. Following
Dirichlet and Riemann, mathematicians had begun to grasp how very significant
it would be to take seriously the notion of a function as an arbitrary rule mapping
elements of one set to another. Through the second half of the nineteenth century,
they came to realize that the study of real-valued functions of a real variable is the
study of the structure of R. Set theory and the geometry of R took on an importance
that was totally new.

See Hamming (1998) for a full elaboration on this theme.

This insight was solidified in Jordan's Cours d'analyse, the textbook of the mid-
1890s that would shape the mathematical thinking of Borel, Lebesgue, and their
contemporaries. Jordan established the principle that the integral is fundamentally
a geometric object whose definition rests on the concept of measure. Jordan got
the principle correct, but the details — his choice of a measure based on finite
covers — were discovered to be flawed. This was the direct inspiration for the work
of Borel and Lebesgue. The great simplification that came out of their work was
the recognition that what happens on a set of measure zero can be ignored.
Chapter 8 hints at the fundamental shift that occurred in the early twentieth
century when the theory of functions as points in a vector space — the basis for
functional analysis — emerged. To appreciate this new field, we must view it in
the context of all that was happening in mathematics. This book has followed a
single strand from the historical development of mathematics and so ignored much
else that was happening, influencing and being influenced by the development
of the theory of integration. We saw a hint of this in Borel's work in complex
analysis that led him to the Heine—Borel theorem and in occasional references to
multidimensional integrals. But we have ignored the entire development of complex
analysis, the insights into the calculus of variations, the study of partial differential
equations, and the nascent work in probability theory. Most of the nineteenth-
century mathematicians working in what today we call real analysis were working
broadly on analytical questions, and especially on the practical questions of finding
and describing solutions to situations modeled by partial differential equations.
The manuscript that Joseph Fourier deposited at the Institut de France on December
21, 1807, the event that I identified in A Radical Approach to Real Analysis as
the beginning of real analysis, showed how to solve Laplace's equation, a partial
differential equation,

$$\frac{\partial^2 z}{\partial x^2} + \frac{\partial^2 z}{\partial w^2} = 0.$$

The same trick that was used there, the assumption that

z(x, w) =

would be shown to work on many other partial differential equations. The difficulty
would come in expressing the distribution of values along the boundary in terms
of the basis functions,

f(x) = z(x, 0) =

If these basis functions form a complete orthogonal system with respect to an
appropriately defined inner product over some Hilbert space, then we are home free.
The Hilbert spaces that we saw in Section 8.4 unify our understanding of many
different partial differential equations.
Exact solutions to partial differential equations are rare. But as physicists, as-
tronomers, and mathematicians came to realize, it is often possible, even when
an exact solution cannot be found, to say something definite about the existence,
uniqueness, and stability of solutions. Often, this is all that really is needed.
The classic example of this is Henri Poincaré's Les Méthodes Nouvelles de la
Mécanique Céleste (New methods of celestial mechanics). Newton had established
the physical laws that govern the motion of the sun and the planets within our solar
system, laws that are easily translated into the language of differential
equations. But even a system as simple as three bodies (the sun, the earth, and the
moon) does not yield an exact solution. From 1799 until 1825, Laplace published
his five-volume Traité de Mécanique Céleste (Treatise on celestial mechanics),
greatly extending and simplifying the tools needed to study celestial motion. Yet
even he was stymied by the three-body problem. The great problem that neither
Newton nor Laplace could solve was to determine whether or not our solar system
is intrinsically stable. Orbits will vary under the pull of the constantly shifting
planets. Do we need to worry that next year the earth will reach a tipping point
where it suddenly begins a rapid spiral into the sun?
Poincaré solved this problem. He did not find an exact solution. Rather, he
invented entirely new tools for analyzing solutions to differential equations,
tools that enabled him to conclude that our solar system is indeed stable. On each
revolution, planetary orbits will return to within a certain clearly delimited window.
Poincaré's work was published in three volumes appearing in 1892, 1893, and 1899.
It considered each orbit as a point in space and examined the possible perturbations
of these points.
By 1910, there was strong evidence coming from many directions that it was
very fruitful to work with functions as single points in an abstract function space.
As we have seen, such spaces have many possible definitions for distance, the
norms giving some indication of what is possible. Topology begins when we
study properties that remain unchanged as we modify our definition of distance,
properties that include compactness, completeness, and separability.²
An even more fundamental transformation in our understanding of functions was
about to take place. At the start of Chapter 3, I described the power of the realization
that two very different mathematical patterns exhibit points of similarity. Sets of
functions were now beginning to look like Euclidean spaces with their notions
of dimension, distance, and orthogonality. This suggests that we might usefully
apply the tools that have been used to analyze Euclidean space, tools that include
linear transformations with their eigenvalues and eigenvectors and the algebraic
structures that have been created to probe and explain symmetries.
This was done with astounding if often complex results. The twentieth century
saw new and powerful methods: spectral theory, algebraic topology, harmonic anal-
ysis, K-theory, and This is the mathematics at the heart of quantum
mechanics, string theory, and modern physics. These are the methods that would
lead to breakthroughs in number theory. Analysis today looks very little like cal-
culus. Today's advances are more likely to look like geometry and algebra, but
acting on strange, twisted spaces with operations that are only vaguely reminiscent
of multiplication or addition. The floodgates had been opened.

² A set is separable if it contains a subset that is both countable and dense.
Appendix A
Other Directions

A.1 The Cardinality of the Collection of Borel Sets


The collection¹ of Borel sets, $\mathcal{B}$, was defined on p. 127 as the smallest σ-algebra
that contains all closed intervals. This is a top-down definition, starting with all
σ-algebras and finding the smallest that contains all closed intervals. Our approach
to proving that this collection has cardinality c is to seek a bottom-up definition,
recursively adding Borel sets until we have them all and showing that at each
step the collection we have built has cardinality c. The bottom-up approach runs
into difficulties with transfinite induction. Today, these difficulties are bulldozed
by the machinery of transfinite ordinals, machinery that was built from the study
of problems such as this. While efficient, it can obscure the real issues. We shall
instead proceed naively as did the early investigators and see where we get into
trouble.
We start with $\mathcal{B}_0$, the set of all closed intervals. Since each closed interval is
determined by its endpoints, $\mathcal{B}_0$ has cardinality c. Let $\mathcal{B}_1$ be the collection that
consists of all countable unions of sets from $\mathcal{B}_0$, all countable intersections of sets
from $\mathcal{B}_0$, and all differences of sets in $\mathcal{B}_0$. Since we have defined countable to
include finite, $\mathcal{B}_0$ is contained in $\mathcal{B}_1$. The cardinality of the collection of countable
unions is at most the cardinality of the collection of sequences of sets taken from
$\mathcal{B}_0$. The collection of sequences of sets from $\mathcal{B}_0$ is in one-to-one correspondence
with the set of mappings from ℕ to $\mathcal{B}_0$. (For each positive integer, choose a set
to go in that position.) It follows that the collection of unions has cardinality at
most $c^{\aleph_0} = (2^{\aleph_0})^{\aleph_0} = 2^{\aleph_0 \cdot \aleph_0} = c$. The same is true for the cardinality of the collection of countable
intersections. And the collection of differences has cardinality bounded by $c \cdot c = c$.
Since the cardinality of $\mathcal{B}_1$ is at least the cardinality of $\mathcal{B}_0$, the cardinality of $\mathcal{B}_1$ is
exactly c.

¹ To avoid confusion, from here on we shall use "collection" when we are speaking of a set of sets.


We proceed inductively, defining $\mathcal{B}_{n+1}$ to be the collection of all countable unions
of sets from $\mathcal{B}_n$, all countable intersections of sets from $\mathcal{B}_n$, and all differences
of sets in $\mathcal{B}_n$. Again, $\mathcal{B}_n \subseteq \mathcal{B}_{n+1}$ and, using the same argument given before,
the cardinality of $\mathcal{B}_{n+1}$ is c. It might seem that we get all Borel sets if we simply
take the union $\mathcal{B}_\infty = \bigcup_{n=0}^{\infty} \mathcal{B}_n$, but there are many Borel sets that are not yet accounted
for. It can be shown that for each positive integer n, there is a Borel set that is in
$\mathcal{B}_n$ and not in $\mathcal{B}_{n-1}$. Demonstrating the existence of such a set is not easy. It was
proven by Lebesgue in 1905 (Lebesgue 1905c), at the same time that he proved
almost everything else in this appendix. For our purposes, we shall simply accept
the existence of this Borel set and move on. Since there is a Borel set in $\mathcal{B}_n - \mathcal{B}_{n-1}$,
there is at least one Borel set in $\mathcal{B}_n - \mathcal{B}_{n-1}$ that is contained within the interval
(n, n + 1) (see Exercise A.1.1). Call it $E_n$. The set $E = \bigcup_{n=1}^{\infty} E_n$ is a Borel
set, but it is not in any of the $\mathcal{B}_n$, and so it is not in the collection $\mathcal{B}_\infty$. We
need to define a larger collection of Borel sets, $\mathcal{B}_{\infty+1}$, the collection of all countable
unions, countable intersections, and differences of sets in $\mathcal{B}_\infty$. We note that
the cardinality of $\mathcal{B}_{\infty+1}$ is c.
We still are not done. We can find a countable collection of sets in $\mathcal{B}_{\infty+1}$ whose
union is not in $\mathcal{B}_{\infty+1}$. We need another collection that consists of all countable unions
of sets from $\mathcal{B}_{\infty+1}$, all countable intersections of sets from $\mathcal{B}_{\infty+1}$, and all differences of
sets in $\mathcal{B}_{\infty+1}$. We shall call this collection of sets $\mathcal{B}_{\infty+2}$, and now we begin to see the
problem. We are going to get many transfinite subscripts.
In set theory, this first infinite subscript is usually denoted by ω rather than
∞ to avoid confusion with the transfinite cardinal numbers, for which adding
1 results in no change ($\aleph_0 + 1 = \aleph_0$). The subscript ω or ∞ is referred to as the
first transfinite ordinal number, distinguishing these from the transfinite cardinal
numbers. Because we do not really need a new symbol for infinity (we shall use ∞
consistently in this appendix), we shall stick with ∞. We are in good company. Our
notation is what Cantor used in his early explorations of transfinite induction when
he attempted to classify second species sets by continuing the notion of derived sets
into transfinite iterations: If $S^{(n)}$ is the nth derived set of S, then

$$S^{(\infty)} = \bigcap_{n=1}^{\infty} S^{(n)},$$

and $S^{(\infty+1)}$ is the derived set of $S^{(\infty)}$.
We continue, aware that we have moved into the realm of transfinite numbers
where great care must be taken. By taking unions, intersections, and differences of
sets in $\mathcal{B}_{\infty+1}$, we get $\mathcal{B}_{\infty+2}$, and so on through $\mathcal{B}_{\infty+n}$ for all n ∈ ℕ. Again, Lebesgue
showed that each $\mathcal{B}_{\infty+n}$ is a proper subcollection of $\mathcal{B}_{\infty+n+1}$, and so $\bigcup_n \mathcal{B}_{\infty+n}$
does not contain all of the Borel sets. The collection of all unions, intersections, and
differences of sets in $\bigcup_n \mathcal{B}_{\infty+n}$ can be denoted by $\mathcal{B}_{\infty+\infty}$, or, more succinctly,
as $\mathcal{B}_{\infty\cdot 2}$. This collection also has cardinality c. Of course, from here we get $\mathcal{B}_{\infty\cdot 2+1}$
and so on through $\mathcal{B}_{\infty\cdot 2+n}$, n ∈ ℕ. It should by now be clear how we can build $\mathcal{B}_{\infty\cdot n}$
for any n ∈ ℕ. What about the set built from all unions, intersections, and
differences of sets in $\bigcup_n \mathcal{B}_{\infty\cdot n}$? We shall call this $\mathcal{B}_{\infty^2}$. Continuing in this way,
it is possible to define $\mathcal{B}_{p(\infty)}$, where p is any polynomial with nonnegative integer
coefficients.
We still are not done. We can define the collection of unions, intersections,
and differences of sets in $\bigcup_n \mathcal{B}_{\infty^n}$ to be $\mathcal{B}_{\infty^\infty}$. The subscript $\infty^\infty$ should not be
confused with $\aleph_0^{\aleph_0}$. The former still corresponds to a countable union of countable
sets, and so is a countable ordinal (an ordinal number that can be reached in
countably many steps); the latter is the cardinality of the set of mappings from ℕ to
ℕ, and this is c. In fact, we can build towers of infinities, $\infty^{\infty^\infty}, \infty^{\infty^{\infty^\infty}}, \ldots$ Their
limit is an infinite tower of infinities,

$$T = \infty^{\infty^{\infty^{\cdot^{\cdot^{\cdot}}}}}.$$
At each stage, we have taken countable unions, countable intersections, and


differences of Borel sets chosen from a collection with cardinality c, and so the car-
dinality of each collection of Borel sets remains c. The cardinality of the collection
$\mathcal{B}_T$ is c, and it is still not the last such collection. We can add 1 to this subscript.
We have barely scratched the surface. Every transfinite ordinal we have seen
can be reached in countably many steps. Lebesgue was able to show that no matter
how large a subscript we consider, if it can be reached in countably many steps,
then B with that subscript cannot contain all Borel sets. It should be clear that we
never get to the end.
Nevertheless, we can talk about the smallest set of ordinal numbers, Ω, that
contains 0, is closed under addition by 1, and is closed under limits of monotonically
increasing sequences of elements of this set. In fact, Ω is the set of all ordinal
numbers that can be reached in countably many steps. This includes both those
that can be described using finitely many symbols, such as $T^{T^T} + 1$, and those that
cannot. Most famous among the latter is the Church–Kleene ordinal, $\omega_1^{\mathrm{CK}}$. This is
defined to be the smallest ordinal for which the ordinals that are less than it cannot
all be described using a finite number of symbols. (Note that we have just defined
the Church–Kleene ordinal using a finite number of symbols, which is why it is
incorrect to define it as the smallest ordinal that cannot be described using a finite
number of symbols.)
The set Ω is well ordered, and its cardinality is the smallest cardinal strictly
larger than $\aleph_0$. The proofs of these statements are not difficult, but do require a
formal treatment of ordinals and limit sequences.²
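By the same inductive arguments we have been using up until this point, the
cardinality of $\mathcal{B}_\alpha$ is c for every α ∈ Ω. Furthermore, $\bigcup_{\alpha \in \Omega} \mathcal{B}_\alpha \subseteq \mathcal{B}$, the collection of
Borel sets. It only remains to show that every Borel set is in $\mathcal{B}_\alpha$ for some α ∈ Ω. This
means that we need to show that $\bigcup_{\alpha \in \Omega} \mathcal{B}_\alpha$ is a σ-algebra: since it contains all closed
intervals, it must then contain the smallest σ-algebra with that property, namely $\mathcal{B}$. If S, T are in
$\bigcup_{\alpha \in \Omega} \mathcal{B}_\alpha$, then we can find a, b ∈ Ω so that $S \in \mathcal{B}_a$, $T \in \mathcal{B}_b$. Choose whichever subscript
is larger, say a ≥ b, and it follows that $S, T \in \mathcal{B}_a$, so $S - T \in \mathcal{B}_{a+1}$. Given $\{S_n\}_{n=1}^{\infty}$,
a countable collection of sets in $\bigcup_{\alpha \in \Omega} \mathcal{B}_\alpha$, we choose $a_n \in \Omega$ so that $S_n \in \mathcal{B}_{a_n}$.
Ordering these so that $a_1 \le a_2 \le a_3 \le \cdots$ and setting A as the limit of the $a_n$, we have
that $S_n \in \mathcal{B}_A$ for all n ≥ 1, and therefore both the union and the intersection of the
$S_n$ are contained in $\mathcal{B}_{A+1}$.

² See, for example, Devlin (1993).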
It follows that the collection of Borel sets, $\mathcal{B}$, is the union of $\aleph_1$-many collections
of cardinality c, and therefore $\mathcal{B}$ has cardinality c. One way to see this is to note that
we can assign to each Borel set a unique ordered pair, (α, d), whose first coordinate,
α, is the ordinal that identifies the first collection of Borel sets, $\mathcal{B}_\alpha$, in which this
set appears, and whose second coordinate identifies which of the Borel sets in $\mathcal{B}_\alpha$
we have chosen. The first coordinate is chosen from a set of cardinality $\aleph_1$; the second
coordinate is chosen from a set of cardinality c. The set of all such pairs has cardinality
$\aleph_1 \cdot c \le c \cdot c = c$. It follows that there are at most c Borel sets.³

Connection to Baire
It was this 1905 paper (Lebesgue 1905c) in which Lebesgue solved the question
of the existence of functions in class 3, 4, ... (see definition of class, p. 115). Not
only was he able to prove the existence of functions in each finite class, he was
able to prove the existence of functions in class ∞, functions that are not in class n
for any finite n but are limits of functions taken from the union of all finite classes.
Just as with Borel sets, there are functions that are not in class ∞ but are limits of
functions in class ∞, that is to say, functions in class ∞ + 1. The entire process
repeats. For every countable ordinal, there are functions in that class.

Exercises
A.1.1. Using an appropriate linear function composed with the arctangent function,
construct a one-to-one, onto, and continuous function, f, from ℝ to (n, n + 1).
Show that for each n > 0, S is a set in $\mathcal{B}_n$ if and only if f(S) is a set in $\mathcal{B}_n$. Use this
result to show that if E is in $\mathcal{B}_n - \mathcal{B}_{n-1}$, then f(E) is also in $\mathcal{B}_n - \mathcal{B}_{n-1}$.

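A.1.2. Assume $E_n \in \mathcal{B}_n - \mathcal{B}_{n-1}$ and $E_n \subseteq (n, n + 1)$ for each positive integer n.
Prove that $\bigcup_{k=1}^{\infty} E_k$ is not an element of $\mathcal{B}_n$ for any n.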

³ Dave Renfro has pointed out that there is another, totally different approach to proving that $|\mathcal{B}| = c$. See Renfro
(2007).

A.2 The Generalized Riemann Integral


It might seem that the problems of the fundamental theorem of calculus were put to
rest with the discovery of the Lebesgue integral, but there is one nagging loose end.
It concerns the integrals of Exercise 6.3.5 on p. 189. If we begin with the function
f defined by

$$f(x) = x^2 \sin(x^{-2}), \quad x \neq 0; \qquad f(0) = 0,$$


then f is differentiable for all x:

$$f'(x) = 2x \sin(x^{-2}) - 2x^{-1} \cos(x^{-2}), \quad x \neq 0; \qquad f'(0) = 0.$$

The derivative is not continuous at x = 0. It is not even bounded in any neighborhood
of x = 0. But even though the Riemann integral of f' does not exist, the
improper integral is always finite because for a > 0,

$$\int_0^a f'(x)\,dx = \lim_{\epsilon \to 0^+} \int_\epsilon^a f'(x)\,dx = \lim_{\epsilon \to 0^+} \left(f(a) - f(\epsilon)\right) = f(a) - f(0) = f(a).$$

Surely, this should be an example of the fundamental theorem of calculus (evalua-
tion part) in action.
But it is not. As shown in Exercise 6.3.5, f' is not Lebesgue integrable over
[0, a]. The problem is that the nonnegative part of this function has an integral that
is not bounded.
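If we go back to the evaluation part of the fundamental theorem of calculus,
Theorem 7.18, we see that the only assumption that this theorem requires of f is
that it be absolutely continuous. Our f is not. Given any δ > 0, choose K so that

$$\frac{1}{\sqrt{(K - 1/2)\pi}} < \delta,$$

and for k ≥ K let $a_k = 1/\sqrt{(k + 1/2)\pi}$ and $b_k = 1/\sqrt{(k - 1/2)\pi}$, so that
$f(a_k) = (-1)^k/((k + 1/2)\pi)$ and $f(b_k) = (-1)^{k-1}/((k - 1/2)\pi)$.
For every N > K, the open intervals $\{(a_k, b_k)\}_{k=K}^{N}$ are pairwise disjoint and their
union is contained in (0, δ), but

$$\sum_{k=K}^{N} \left|f(b_k) - f(a_k)\right| = \frac{1}{\pi} \sum_{k=K}^{N} \left(\frac{1}{k + 1/2} + \frac{1}{k - 1/2}\right) > \frac{2}{\pi} \sum_{k=K}^{N} \frac{1}{k}.$$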

The sum of the oscillations is unbounded. This tells us that f cannot be expressed
as a Lebesgue integral of any function.
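The divergence of the oscillation sum can be watched numerically. This sketch takes K = 10 and, as an assumption consistent with the argument above, the endpoints a_k = 1/√((k + 1/2)π) and b_k = 1/√((k − 1/2)π), at which sin(x^(−2)) = ±1 with alternating signs; the helper names are hypothetical.

```python
import math
from typing import Tuple


def f(x: float) -> float:
    """f(x) = x^2 sin(x^-2), with f(0) = 0."""
    return 0.0 if x == 0 else x * x * math.sin(x ** -2)


def endpoints(k: int) -> Tuple[float, float]:
    """Hypothetical choice a_k = 1/sqrt((k+1/2)pi), b_k = 1/sqrt((k-1/2)pi)."""
    a_k = 1 / math.sqrt((k + 0.5) * math.pi)
    b_k = 1 / math.sqrt((k - 0.5) * math.pi)
    return a_k, b_k


def oscillation_and_length(first_k: int, last_k: int) -> Tuple[float, float]:
    """Return (sum |f(b_k) - f(a_k)|, sum (b_k - a_k)) over first_k <= k <= last_k."""
    oscillation = 0.0
    length = 0.0
    for k in range(first_k, last_k + 1):
        a_k, b_k = endpoints(k)
        oscillation += abs(f(b_k) - f(a_k))
        length += b_k - a_k
    return oscillation, length


for last_k in (10**3, 10**4, 10**5):
    osc, length = oscillation_and_length(10, last_k)
    # Total length stays below b_10, but the oscillation grows like (2/pi) log N.
    print(last_k, round(length, 6), round(osc, 3))
```

Because b_{k+1} = a_k, the intervals fit end to end inside (0, b_K), so their total length never exceeds b_K no matter how many are taken, while the oscillation sum keeps pace with a multiple of the harmonic series.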
Maybe absolute continuity is too strict. Is it possible to define an integral so that
if a function is differentiable on the interval [a, b], then its derivative is always
integrable and the evaluation part of the fundamental theorem of calculus always
holds?
In fact, this is possible. Several mathematicians, beginning with Arnaud Denjoy
in 1912, found ways of extending the Lebesgue integral. Denjoy's original definition
was greatly simplified by Luzin. In 1914, Oskar Perron came up with a different
formulation. Jaroslav Kurzweil, in 1957, discovered an extension of the Riemann
integral that, in the 1960s, Ralph Henstock would rediscover, realizing that it
could do the trick. The integrals of Denjoy, Perron, and Kurzweil–Henstock are
equivalent, though this would not be established until the 1980s. The description we
shall use is that of Kurzweil and Henstock. Because of the many people to whom
this integral could be attributed, we shall refer to it simply as the generalized
Riemann integral.
Recall that a tagged partition of [a, b] is a partition, $(x_0 = a < x_1 < \cdots < x_n = b)$,
of [a, b] together with a set of tags, $\{x_1^*, \ldots, x_n^*\}$, one value taken from each
of the intervals, $x_{j-1} \le x_j^* \le x_j$. A function fails to be Riemann integrable when we
cannot control the oscillation over an interval simply by restricting the length of
the interval. The Kurzweil–Henstock idea is to tie the allowed length of the interval
to the choice of tag.
Specifically, given an error-bound ε > 0 (how much the approximating sum is
allowed to deviate from the desired value), we define a gauge, $\delta_\epsilon$, a mapping from
the set of possible tags, [a, b], to the set of positive real numbers, chosen so that
problematic points are assigned appropriately small values. We restrict ourselves
to those Riemann sums for which $x_j - x_{j-1} < \delta_\epsilon(x_j^*)$. If we can find a suitable
gauge for every ε > 0, then the generalized Riemann integral exists.

Definition: Generalized Riemann integral


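The generalized Riemann integral of a function f over the interval [a, b] exists
and has the value V if for every error-bound ε > 0 there is a gauge function $\delta_\epsilon$,
$\delta_\epsilon(x) > 0$ for all x ∈ [a, b], such that for any tagged partition $(x_0 = a < x_1 < \cdots < x_n = b)$,
$\{x_j^*\}$, $x_{j-1} \le x_j^* \le x_j$, with $x_j - x_{j-1} < \delta_\epsilon(x_j^*)$ for
all j, the corresponding Riemann sum lies within ε of the value V:

$$\left| \sum_{j=1}^{n} f(x_j^*)(x_j - x_{j-1}) - V \right| < \epsilon.$$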
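The definition can be made concrete with a checker for a single tagged partition. This is a sketch under stated assumptions: partitions are hypothetical lists of (x_{j−1}, x_j, x_j^*) triples of exact `Fraction`s, and the gauge is any function returning a positive `Fraction`. It tests one partition against one gauge; it does not search over all gauge-fine partitions, so it illustrates the condition rather than deciding integrability.

```python
from fractions import Fraction
from typing import Callable, List, Tuple

# A tagged interval is (x_{j-1}, x_j, x_j^*), stored as exact fractions.
TaggedInterval = Tuple[Fraction, Fraction, Fraction]
Gauge = Callable[[Fraction], Fraction]


def is_gauge_fine(partition: List[TaggedInterval], a: Fraction, b: Fraction,
                  delta: Gauge) -> bool:
    """Check that the intervals run from a to b without gaps, that each tag
    lies in its interval, and that x_j - x_{j-1} < delta(x_j^*)."""
    if not partition or partition[0][0] != a or partition[-1][1] != b:
        return False
    for j, (left, right, tag) in enumerate(partition):
        if j > 0 and left != partition[j - 1][1]:
            return False
        if not (left < right and left <= tag <= right):
            return False
        if not (right - left < delta(tag)):
            return False
    return True


def riemann_sum(f: Callable[[Fraction], Fraction],
                partition: List[TaggedInterval]) -> Fraction:
    """Sum of f(x_j^*)(x_j - x_{j-1}) over the tagged partition."""
    return sum((f(tag) * (right - left) for left, right, tag in partition),
               Fraction(0))


# Example: 20 equal subintervals of [0, 1], tagged at their midpoints.
n = 20
points = [Fraction(j, n) for j in range(n + 1)]
midpoints = [(points[j], points[j + 1], (points[j] + points[j + 1]) / 2)
             for j in range(n)]

print(is_gauge_fine(midpoints, Fraction(0), Fraction(1), lambda x: Fraction(1, 10)))  # True
print(is_gauge_fine(midpoints, Fraction(0), Fraction(1), lambda x: Fraction(1, 40)))  # False
print(riemann_sum(lambda x: x, midpoints))  # 1/2
```

With a constant gauge the condition is just the ordinary restriction on the lengths of the subintervals; the gauge becomes more powerful than Riemann's definition only when δ is allowed to vary with the tag.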

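Dirichlet's Function
To see that this works for Dirichlet's function, the characteristic function of the
rationals over [0, 1], we let $\{r_1, r_2, r_3, \ldots\}$ be an ordering of the set of rational numbers in
[0, 1]. We define

$$\delta_\epsilon(x) = \begin{cases} \epsilon / 2^{i+1}, & x = r_i \in \mathbb{Q}, \\ 1, & x \notin \mathbb{Q}. \end{cases}$$

Each rational number can serve as the tag of at most two intervals, so the sum of the
lengths of the intervals for which the tag is rational is strictly less than
$2\sum_{i=1}^{\infty} \epsilon/2^{i+1} = \epsilon$, and therefore

$$0 \le \sum_{j=1}^{n} \chi_{\mathbb{Q}}(x_j^*)(x_j - x_{j-1}) < \epsilon.$$

The generalized Riemann integral of $\chi_{\mathbb{Q}}$ over [0, 1] exists and is equal to 0.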

The Fundamental Theorem of Calculus


One of the advantages of this characterization of the generalized Riemann integral
is that it yields a simple proof of the evaluation part of the fundamental theorem of
calculus.

Theorem A.1 (FTC, Evaluation, Generalized Riemann Integrals). If f is differentiable
at every point in [a, b], then the generalized Riemann integral of f'
over [a, b] exists and

$$\int_a^b f'(x)\,dx = f(b) - f(a). \qquad \text{(A.1)}$$
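Proof. We need to show that for each ε > 0, we can find a gauge, $\delta_\epsilon$, so that

$$\left| \sum_{j=1}^{n} f'(x_j^*)(x_j - x_{j-1}) - \left(f(b) - f(a)\right) \right| < \epsilon.$$

We choose our gauge so that for each x ∈ [a, b], $0 < |z - x| < \delta_\epsilon(x)$ implies that

$$\left| \frac{f(z) - f(x)}{z - x} - f'(x) \right| < \frac{\epsilon}{b - a}.$$

Differentiability at x guarantees that we can do this. This is equivalent to

$$\left| f(z) - f(x) - f'(x)(z - x) \right| < \frac{\epsilon}{b - a}\,|z - x|.$$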

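We now observe that

$$\begin{aligned}
\left| \sum_{j=1}^{n} f'(x_j^*)(x_j - x_{j-1}) - \left(f(b) - f(a)\right) \right|
&= \left| \sum_{j=1}^{n} f'(x_j^*)(x_j - x_{j-1}) - \sum_{j=1}^{n} \left(f(x_j) - f(x_{j-1})\right) \right| \\
&\le \sum_{j=1}^{n} \Big( \left| f(x_j) - f(x_j^*) - f'(x_j^*)(x_j - x_j^*) \right| \\
&\qquad\quad + \left| f(x_{j-1}) - f(x_j^*) - f'(x_j^*)(x_{j-1} - x_j^*) \right| \Big) \\
&< \sum_{j=1}^{n} \frac{\epsilon}{b - a} \left( (x_j - x_j^*) + (x_j^* - x_{j-1}) \right) \\
&= \frac{\epsilon}{b - a} \sum_{j=1}^{n} (x_j - x_{j-1}) = \epsilon.
\end{aligned}$$

Each of $x_j - x_j^*$ and $x_j^* - x_{j-1}$ is at most $x_j - x_{j-1} < \delta_\epsilon(x_j^*)$, so the bound above
applies to every term; a term with $x_j^* = x_j$ or $x_j^* = x_{j-1}$ is simply 0, and because
$x_j > x_{j-1}$, at least one of the two terms for each j satisfies the strict inequality.

Since f(x) = x² sin(x⁻²) is differentiable everywhere, its derivative is integrable
using the generalized Riemann integral.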
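The evaluation part of Theorem A.1 can be watched at work on this f. The sketch below is only an illustration under stated assumptions: the interval touching 0 is tagged at 0, where f'(0) = 0 allows it a comparatively long length h < ε/a, and the rest of [0, a] is covered by a uniform midpoint grid that is assumed (not proved) to be fine enough there. The function names are hypothetical.

```python
import math


def f(x: float) -> float:
    """f(x) = x^2 sin(x^-2), f(0) = 0."""
    return 0.0 if x == 0 else x * x * math.sin(x ** -2)


def f_prime(x: float) -> float:
    """f'(x) = 2x sin(x^-2) - (2/x) cos(x^-2), f'(0) = 0."""
    if x == 0:
        return 0.0
    return 2 * x * math.sin(x ** -2) - (2 / x) * math.cos(x ** -2)


def tagged_riemann_sum(a: float, eps: float, n: int) -> float:
    """Riemann sum of f' over [0, a] built from one interval [0, h] tagged at 0
    plus n equal subintervals of [h, a] tagged at their midpoints."""
    # At the tag 0, |f(z) - f(0) - f'(0) z| <= z^2 < (eps / a) z whenever 0 < z < eps / a,
    # so any h < eps / a is an admissible length for the interval tagged at 0.
    h = 0.5 * eps / a
    total = f_prime(0.0) * h  # contributes 0 because f'(0) = 0
    width = (a - h) / n
    for j in range(n):
        total += f_prime(h + (j + 0.5) * width) * width
    return total


approx = tagged_riemann_sum(1.0, 0.1, 200_000)
print(approx, f(1.0))  # the two values agree to within 0.1
```

The interval tagged at 0 contributes nothing to the sum, and its true contribution, f(h) − f(0), is at most h² in absolute value; this is exactly the kind of control that a gauge tied to the tag provides and that a single mesh size cannot.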
The other part of the fundamental theorem of calculus is also true, though its
proof is much subtler and we shall not pursue it here. The interested reader can find
this proof in Gordon's The Integrals of Lebesgue, Denjoy, Perron, and Henstock
(1994, p. 145).

Theorem A.2 (FTC, Antidifferentiation, Generalized Riemann Integrals). If
the generalized Riemann integral of f exists over [a, b], then

$$\frac{d}{dx} \int_a^x f(t)\,dt = f(x) \quad \text{almost everywhere on } [a, b]. \qquad \text{(A.2)}$$

The antiderivative is still continuous, though not necessarily absolutely contin-
uous. The fact that the antiderivative is differentiable almost everywhere requires
proof.

Comparison with the Lebesgue Integral


We have seen that there is at least one function that is integrable using the general-
ized Riemann integral that is not integrable using the Lebesgue integral. Are there
any Lebesgue integrable functions that cannot be integrated using the generalized
Riemann integral? No.

Theorem A.3 (Lebesgue ⇒ Generalized Riemann). If f is Lebesgue integrable
on [a, b], then the generalized Riemann integral of f exists over this interval and
equals the Lebesgue integral.

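Proof. Given ε > 0, we must show how to construct a gauge function δ so that
given any Riemann sum $\sum_{j=1}^{n} f(x_j^*)(x_j - x_{j-1})$ for which $x_j - x_{j-1} < \delta(x_j^*)$, we
have that

$$\left| \sum_{j=1}^{n} f(x_j^*)(x_j - x_{j-1}) - \int_a^b f(x)\,dx \right| < \epsilon.$$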

We begin by partitioning the range of f and identifying those values of x in
[a, b] at which f lies in each interval. Let β be a positive quantity to be determined
later. For −∞ < k < ∞, define

$$E_k = \{ x \in [a, b] : (k - 1)\beta < f(x) \le k\beta \}.$$

Since each $E_k$ is measurable, we can find an open set $G_k \supseteq E_k$ for which
$m(G_k - E_k) < \nu(k)$, where ν is a strictly positive function on ℤ that will be determined
later.
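Given x ∈ [a, b], we define our gauge by finding the unique $E_k$ such that $x \in E_k$
and choosing δ(x) so that

$$0 < \delta(x) < \inf_{y \notin G_k} |x - y|. \qquad \text{(A.3)}$$

The fact that $G_k$ is open guarantees that we can choose δ(x) > 0. Given a Riemann
sum that satisfies this gauge, $\sum_{j=1}^{n} f(x_j^*)(x_j - x_{j-1})$, let $E_{k_j}$ be the set that contains
$x_j^*$. Since $x_j - x_{j-1} < \delta(x_j^*) < \inf_{y \notin G_{k_j}} |x_j^* - y|$, we know that $G_{k_j} \supseteq [x_{j-1}, x_j]$.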
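We now take the difference between the Riemann sum and the Lebesgue integral
and bound it by pieces that we can bound,

$$\begin{aligned}
\left| \sum_{j=1}^{n} f(x_j^*)(x_j - x_{j-1}) - \int_a^b f(x)\,dx \right|
&= \left| \sum_{j=1}^{n} \int_{[x_{j-1}, x_j]} \left(f(x_j^*) - f(x)\right) dx \right| \\
&\le \sum_{j=1}^{n} \int_{[x_{j-1}, x_j] \cap E_{k_j}} \left|f(x_j^*) - f(x)\right| dx
 + \sum_{j=1}^{n} \int_{[x_{j-1}, x_j] - E_{k_j}} |f(x_j^*)|\,dx \\
&\qquad + \sum_{j=1}^{n} \int_{[x_{j-1}, x_j] - E_{k_j}} |f(x)|\,dx \\
&\le \sum_{j=1}^{n} \beta\,(x_j - x_{j-1}) + \sum_{k=-\infty}^{\infty} (|k| + 1)\beta\,\nu(k) + \int_{\bigcup_k (G_k - E_k)} |f|\,dx.
\end{aligned}$$

The first bound holds because $f(x_j^*)$ and f(x) both lie in $((k_j - 1)\beta, k_j\beta]$ on $E_{k_j}$;
the second because $|f(x_j^*)| \le (|k_j| + 1)\beta$ and the sets $[x_{j-1}, x_j] - E_{k_j}$ are
nonoverlapping subsets of $G_{k_j} - E_{k_j}$; the third because these same sets lie in $\bigcup_k (G_k - E_k)$.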
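We want to choose β and ν so that each of these pieces is strictly less than ε/3.
We pick β = ε/(4(b − a)), so the first piece equals ε/4. Since f is Lebesgue integrable,
so is |f|. Corollary 6.18 tells us there is an η > 0 such that $m\left(\bigcup_k (G_k - E_k)\right) < \eta$
implies that

$$\int_{\bigcup_k (G_k - E_k)} |f|\,dx < \frac{\epsilon}{3}.$$

Since $m\left(\bigcup_k (G_k - E_k)\right) \le \sum_k m(G_k - E_k) < \sum_k \nu(k)$, we need to choose ν so that
$\sum_k \nu(k) \le \eta$, say $\nu(k) \le \eta / 2^{|k|+2}$, which gives $\sum_k \nu(k) \le 3\eta/4$. And we need
to choose ν(k) so that

$$\sum_{k=-\infty}^{\infty} (|k| + 1)\beta\,\nu(k) = \frac{\epsilon}{4(b - a)} \sum_{k=-\infty}^{\infty} (|k| + 1)\,\nu(k) < \frac{\epsilon}{3}.$$

This inequality will be satisfied if

$$\nu(k) \le \frac{b - a}{(|k| + 1)\,2^{|k|+2}},$$

since then $\sum_k (|k| + 1)\nu(k) \le (b - a)\sum_k 2^{-|k|-2} = \tfrac{3}{4}(b - a)$, and the piece is at most 3ε/16.
We choose ν(k) to be whichever of the two bounds is smaller.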

Final Thoughts
Over the past few decades, several mathematicians have made the argument that the
generalized Riemann integral should be introduced in undergraduate real analysis
and that at the undergraduate level it should take priority over the Lebesgue integral.
Their argument is based on three premises: that these students are already familiar
with the Riemann integral, that the Lebesgue integral, resting as it does on the notion
of measure theory, is too complicated to introduce at the level of undergraduate
analysis, and that the generalized Riemann integral is much more satisfying because
it can handle a strictly larger class of functions and provides a clean and simple
proof of the most general statement of the evaluation part of the fundamental
theorem of calculus.
I disagree. While many calculus texts introduce the Riemann integral, the fact
is that Riemann's definition is both subtle and sophisticated. Its only real purpose
is to explore how discontinuous a function can be while remaining integrable. For
the functions encountered in the first year of calculus, which should be limited to
piecewise analytic functions, there is no reason to work with tagged partitions in
their full generality. Even in undergraduate analysis, it requires a certain level of
mathematical maturity before students are ready to appreciate the Riemann integral.
In countering the second point, the beauty of the Lebesgue integral is that it
defines the integral of a nonnegative function as the area of the set of points that
lie between the x-axis and the values of f. This is the intuitive understanding of
the integral that is lost in Riemann's definition. What is sophisticated about the
Lebesgue integral is how this area must be defined and how the principal results
are then proven. I fully agree that this is a topic that is seldom appropriate for
undergraduate mathematics, but given a choice between spending time introducing
measure theory or the generalized Riemann integral, I would always see the former
as more useful and interesting.
Finally, there is the issue of the fundamental theorem of calculus. I find it
instructive that while mathematicians of the late nineteenth century considered
Volterra's example of a function with a bounded derivative that could not be inte-
grated disturbing, most were perfectly content to live with this anomaly. It was the
usefulness of measure theory, not its ability to handle integration of exotic func-
tions, that drove the adoption of the Lebesgue integral. Regarding the fundamental
theorem of calculus, it is not clear that Theorem A.1 is more useful or satisfying
than Theorem 7.18. In the former, we make the assumption that f is differentiable
at every point. Lebesgue's form of this theorem is based on the assumption that
f is absolutely continuous. These two conditions overlap, but neither implies the
other. The two theorems are not directly comparable.
There is no hope for a theorem using the generalized Riemann integral that
assumes only that f is differentiable almost everywhere. Once a function is not
required to be absolutely continuous, it is possible for the derivative to exist almost
everywhere but $\int_a^b f'(x)\,dx \neq f(b) - f(a)$. The devil's staircase is the prime
example.
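The devil's staircase (the Cantor function) makes this failure concrete: it is constant on each removed middle third, so its derivative is 0 almost everywhere, yet it rises from 0 to 1, and any integral that assigns 0 to a function vanishing almost everywhere gives $\int_0^1 f'(x)\,dx = 0 \neq 1 = f(1) - f(0)$. This sketch evaluates the function exactly with Python's `Fraction`; the function names are hypothetical.

```python
from fractions import Fraction


def cantor(x: Fraction, depth: int = 60) -> Fraction:
    """Devil's staircase on [0, 1], read from the ternary digits of x.

    The value is exact as soon as a ternary digit 1 appears; otherwise the
    result is accurate to within 2**(-depth).
    """
    if x <= 0:
        return Fraction(0)
    if x >= 1:
        return Fraction(1)
    value, weight = Fraction(0), Fraction(1, 2)
    for _ in range(depth):
        x *= 3
        digit = int(x)  # floor of x, since x >= 0
        x -= digit
        if digit == 1:
            return value + weight  # x lies in a removed middle third
        value += weight * (digit // 2)  # ternary digit 2 becomes binary digit 1
        weight /= 2
    return value


def removed_length(stages: int) -> Fraction:
    """Total length of the open middle thirds removed in the first `stages` steps."""
    return 1 - Fraction(2, 3) ** stages


# Constant on [1/3, 2/3] and on [7/9, 8/9], so the derivative is 0 there ...
print(cantor(Fraction(1, 3)), cantor(Fraction(1, 2)), cantor(Fraction(2, 3)))  # 1/2 1/2 1/2
print(cantor(Fraction(7, 9)), cantor(Fraction(8, 9)))                          # 3/4 3/4
# ... the removed intervals fill almost all of [0, 1] ...
print(float(removed_length(50)))
# ... and yet the function climbs from 0 to 1.
print(cantor(Fraction(0)), cantor(Fraction(1)))                                # 0 1
```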
It is possible, however, to weaken the assumption of differentiability at all points
of [a, b] in Theorem A.1. It is not too hard to show (see Exercise A.2.5) that the
conclusion of the evaluation part of the fundamental theorem of calculus still holds
if we assume that f is differentiable at all but a countable set of points. It is possible
to push even further, exploring when the fundamental theorem of calculus holds
for something called an "approximate derivative." Here the results are neither as
clean nor as compelling as Theorems 7.18 and A.1, and they begin to call for ever
more complex extensions of the generalized Riemann integral.
My personal conclusion is that there is no single best possible statement of the
fundamental theorem of calculus that would hold if only we used the correct defi-
nition of the integral. I consider the generalized Riemann integral to be interesting
and worthy of attention, but only as an appendix to the main show, the Lebesgue
integral.

Exercises
A.2.1. Prove that if f is Riemann integrable over [a, b], then the generalized
Riemann integral of f over [a, b] also exists.
A.2.2. Prove that for any gauge over any closed interval, there is at least one tagged
partition that corresponds to that gauge.
A.2.3. Explain why we can choose a gauge that satisfies the bounds given in
inequality (A.3).
A.2.4. Find an example of a function that is differentiable on [0, 1] but not absolutely
continuous on this interval. Find an example of a function that is absolutely
continuous on [0, 1] but not differentiable at every point of this interval.
A.2.5. Modify the proof of Theorem A.1 to weaken the assumption on f so that it
is differentiable at all but countably many points in [a, b].
Appendix B
Hints to Selected Exercises

Exercises that can also be found in Kaczor and Nowak are listed at the start of each
section following the symbol The significance of 3.1.2 = II:2.1.1 is that
Exercise 3.1.2 in this book can be found in Kaczor and Nowak, volume II, problem
2.1.1.

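1.1.4 Use the trigonometric identities

$$\sin x \cos y = \tfrac{1}{2}\big(\sin(x + y) + \sin(x - y)\big), \tag{B.1}$$

$$\cos x \cos y = \tfrac{1}{2}\big(\cos(x + y) + \cos(x - y)\big), \tag{B.2}$$

$$\sin x \sin y = \tfrac{1}{2}\big(\cos(x - y) - \cos(x + y)\big). \tag{B.3}$$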
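Before using (B.1) to (B.3), it can help to confirm them numerically. The following is a minimal floating-point check at a few sample points. The helper name `product_to_sum_residuals` and the tolerance are hypothetical choices for this sketch, and agreement at sample points illustrates the identities but does not prove them.

```python
import math

def product_to_sum_residuals(x: float, y: float) -> tuple[float, float, float]:
    """Return (left side minus right side) of (B.1), (B.2), (B.3) at (x, y)."""
    r1 = math.sin(x) * math.cos(y) - 0.5 * (math.sin(x + y) + math.sin(x - y))
    r2 = math.cos(x) * math.cos(y) - 0.5 * (math.cos(x + y) + math.cos(x - y))
    r3 = math.sin(x) * math.sin(y) - 0.5 * (math.cos(x - y) - math.cos(x + y))
    return r1, r2, r3

for x, y in [(0.3, 1.7), (-2.0, 0.5), (3.1, -4.2)]:
    # Each residual should be at the level of rounding error (about 1e-16).
    print(max(abs(r) for r in product_to_sum_residuals(x, y)) < 1e-12)
```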

1.1.6 Write $C = A + E_1(\Delta x)$ and $I = A - E_2(\Delta x)$, where $E_1$ and $E_2$ are monotonically
decreasing functions that are greater than or equal to 0 for $\Delta x > 0$.
1.1.7 In the ring between distance $x$ and distance $x + \Delta x$, the total population can be
approximated by the density, $p(x)$, multiplied by the area, $\pi(x + \Delta x)^2 - \pi x^2 =
2\pi x\,\Delta x + \pi(\Delta x)^2$.
1.1.10 Given any partition of [−1, 0], let −a be the left endpoint of the rightmost
interval (the interval whose right endpoint is 0). Show that any Riemann sum
using left endpoints for the tags is bounded above by 2 − …
1.1.13 Lagrange's remainder theorem says that

$$F(x + t) = F(x) + t\,F'(x) + \frac{t^2}{2}\,F''(c)$$

for some $c$ with $x < c < x + t$. Let $x = a + (j - 1)t$.


1.2.3 Try to find a sequence of nested intervals $[a_1, b_1] \supseteq [a_2, b_2] \supseteq [a_3, b_3] \supseteq \cdots$
so that the endpoints are elements of the sequence and $a_k$ equals or precedes
$a_{k+1}$, and $b_k$ equals or precedes $b_{k+1}$ in the sequence.
1.2.6 Let $S_n = \sup_{k \ge n} a_k$. Show that $S_1 \ge S_2 \ge \cdots$. If $A$ is the infimum of this
sequence, then it is also the limit. Show that $A$ must satisfy the ε-definition of
the lim sup. Show that if $A$ satisfies the ε-definition of the lim sup, then it is the
greatest lower bound of the sequence of $S_n$.
1.2.17 Since $f$ is differentiable at $c$, $\lim_{x \to c} (f(x) - f(c))/(x - c)$ exists and
equals $f'(c)$. Use the fact that for $x > c$, $(f(x) - f(c))/(x - c) = f'(y)$ for
some $y$ between $c$ and $x$. Show that $\lim_{y \to c^+} f'(y) = f'(c)$.
1.2.21 Show that $\sum x^n/n^2$ converges uniformly over [−1, 1].

1.2.23 Use the intermediate value theorem.

2.1.5 The discontinuities of this function will occur when + l)/2 e Z.


2.1.7 Show that $g(x) = 1 - g(1 - x)$, and therefore $\int_0^1 g(x)\,dx = 1 - \int_0^1 g(x)\,dx$.
2.1.2 The key is to prove that for any two partitions P and Q, S(P; f) ? S(Q; f).
Consider their common refinement.
2.1.14 Show that $m$ is discontinuous at every rational number but continuous at
every irrational number. For the latter, show that if $x$ is irrational, then for every
$\varepsilon > 0$ there are at most finitely many $y$ in any bounded interval such that $m(y) \ge \varepsilon$.

2.1.18 Integrate from $1/n$ to 1 and then take the limit as $n \to \infty$. Rewrite as
[1 fl/n[aj
fl/n \ x Lx]! Jcx/n x
11
Wdx- Jcx/n
fl/n Lx] X

Show that the first two integrals cancel and the third integral approaches a ln a
as n approaches infinity.
2.1.19 Using the fact that the oscillation over $[a, b]$ is bounded, show that for any
partition $P$ and any $\varepsilon > 0$, we can find a refinement of $P$, $Q \supseteq P$, so that the
Riemann sum for $Q$ with tags at the left-hand endpoints is strictly larger than
$S(P; f) - \varepsilon$.

2.2.5 Use the Taylor polynomial expansion with Lagrange remainder term.
2.2.6 One approach is to prove this by induction on N.
2.2.7 Uniform convergence will imply the first inequality.

2.3.3 = III:1.7.7
2.3.6 8. Fix a prime p. Consider the set of rational numbers in [0, 1] with numerator
equal to p. What is the greatest distance between any real number in this interval
and the nearest rational number with numerator equal to p?
2.3.7 Show that m is continuous at every irrational number and discontinuous at
every rational.
2.3.8 Show that h is continuous at x = 0 and nowhere else.
2.3.9 Prove that $g$ is continuous at $c$ if $2^n c \notin \mathbb{Z}$ for any integer $n$. Given $\varepsilon > 0$,
let $M$ be the smallest integer such that … $< \varepsilon$. Find $\delta > 0$ so that … is not
an integer for any $y \in (c - \delta, c + \delta)$. Show that … $+ 1)/2]$ is
constant on $(x - \delta, x + \delta)$. Show that $|g(x) - \ldots| < \varepsilon$ for $|x - \ldots| < \delta$.
2.3.12 Proceed by induction on $k$ and assume that the derived set of $T_{k-1}$ is $T_{k-2} \cup \{0\}$.
Let $x \ne 0$ be an element of $T_k - T_{k-1}$. Explain why $x$ is not in the derived set
of $T_{k-1}$. Explain why there must be an $\varepsilon > 0$ so that $T_{k-1} \cap (x - \varepsilon, x + \varepsilon) = \emptyset$.
Show that $(x - \varepsilon/2, x + \varepsilon/2)$ contains at most finitely many elements of $T_k$.

2.3.17 Let $[a, b]$ be any closed subinterval of $[0, 1]$, $a < b$. Show that there
must be a closed subinterval $[a_1, b_1] \subset [a, b]$, $a_1 < b_1$, on which the oscillation
stays less than 1. Show that this has a closed subinterval $[a_2, b_2] \subset
[a_1, b_1]$, $a_2 < b_2$, on which the oscillation stays less than 1/2. In general, once
you have found $[a_{k-1}, b_{k-1}]$, show that it must contain a closed subinterval
$[a_k, b_k] \subset [a_{k-1}, b_{k-1}]$, $a_k < b_k$, on which the oscillation stays less than $1/k$.
Let $\alpha$ be contained in the intersection of all of these intervals. Show that $f$ must
be continuous at $\alpha$. Thus, every closed interval contains a point of continuity.

3.1.6 7. For each x e R, consider the set of fractions strictly less than x and of the
form +2'7b, b odd. Does this set have a largest element?

3.2.12 = I:3.2.27
3.2.3 Order the intervals so that $I_1 = (a_1, b_1)$, $I_2 = (a_2, b_2)$, ..., $I_r = (a_r, b_r)$,
where $b_1 \le b_2 \le \cdots \le b_r$. Why can we assume that no two of the $b_i$ are equal?
Why can we assume that $a_1 < a_2 < \cdots < a_r$? Why can we assume that $a_1 <
0 < b_1$? Now finish the proof.
3.2.6 Show that $|f(a) - f(s)| < \varepsilon$ implies that there is an element larger than $s$
that is in this set. Show that $|f(a) - f(s)| > \varepsilon$ implies that there is an element
smaller than $s$ that is greater than or equal to every element in this set.

3.2.8 Show that every Cauchy sequence is bounded and that the limit point of this
set of points is equal to its limit as a sequence.
3.2.9 Let $S$ be a bounded set and consider the sequence: $a_1$ chosen from $S$, $a_2$
chosen from among the upper bounds of $S$, $a_3 = (a_1 + a_2)/2$, ..., $a_r$ is the average
of the largest element of $\{a_1, a_2, \ldots, a_{r-1}\}$ that is less than or equal to some
element of $S$ and the smallest element of $\{a_1, a_2, \ldots, a_{r-1}\}$ that is an upper
bound for $S$. Show that this sequence is Cauchy and therefore converges. Show
that the element to which it converges must be the least upper bound of $S$.
3.2.10 Consider the set of left-hand endpoints of the nested intervals and prove that
the least upper bound of this set must lie in all of the intervals.
3.2.12 To prove that converges if such a sequence, (ba), exists, first show
that = < +
3.2.14 Show that 1/3 is the only point of this set that is in the open interval
(5/18, 7/18).

3.3.2 Figure 3.1 shows how to get the correspondence between N and the positive
rational numbers.
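Assuming Figure 3.1 shows the usual diagonal arrangement of the fractions p/q, the correspondence can be sketched as a generator that walks the diagonals p + q = 2, 3, 4, … and skips fractions that are not in lowest terms. The function name `positive_rationals` is hypothetical, and the figure's exact ordering may differ from this one.

```python
from fractions import Fraction
from itertools import islice
from math import gcd

def positive_rationals():
    """Yield each positive rational exactly once, walking the diagonals p + q = s."""
    s = 2
    while True:
        for p in range(1, s):
            q = s - p
            if gcd(p, q) == 1:  # skip repeats such as 2/2 or 2/4
                yield Fraction(p, q)
        s += 1

# The nth value yielded is the rational paired with n in N.
print(list(islice(positive_rationals(), 6)))
```

Skipping fractions that are not in lowest terms is what makes the map one-to-one. Each p/q in lowest terms appears on diagonal p + q, so the map is also onto.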
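3.3.3 Pick a countable subset of $\mathbb{R}$ that is disjoint from the rationals, say $\mathbb{Q} + \pi =
\{a + \pi : a \in \mathbb{Q}\}$. Define the correspondence so that if $x \notin \mathbb{Q} \cup (\mathbb{Q} + \pi)$, then $x$
gets mapped to itself, and the union $\mathbb{Q} \cup (\mathbb{Q} + \pi)$ gets mapped to just $\mathbb{Q} + \pi$.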
3.3.4 2. First find a one-to-one correspondence between R and (0, 1). Find a way
to use the arctangent function. Now find a one-to-one correspondence between
the rational numbers in (0, 1) and the rational numbers in [0, 1]. 5. You need to
be able to combine every pair of real numbers into a single real number in a way
that you can recover the original pair. Think of using the decimal expansions.
3.3.6 See hint to 3.3.3.
3.3.7 First define a one-to-one mapping $\phi: \mathbb{N} \times \mathbb{Z} \to \mathbb{Z}$ and then map $(a, b) \mapsto
\phi(a, [b]) + b - [b]$.
3.3.8 Explain the natural bijection between real numbers in [0, 1] and the set of
mappings from $\mathbb{N}$ to $\{0, 1, 2, 3, 4, 5, 6, 7, 8, 9\}$ (which has cardinality $10^{\mathbb{N}}$). Note
that each rational number with a denominator that is a power of 10, other than
0 and 1, corresponds to two different mappings. We assume that the set of
mappings is countable, and assign one such mapping (equivalently, we assign
the decimal expansion of one real number) to each positive integer. We let $\psi$
be a one-to-one and onto mapping from $\mathbb{N}$ to $10^{\mathbb{N}}$. We now define a mapping
$T$ (equivalently, a real number) with the property that $T(n)$ (the $n$th digit of
$T$) is not the image of $n$ in $\psi(n)$. (Equivalently, it is not equal to the $n$th digit
of $\psi(n)$.)
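The diagonal step can be sketched in Python by modeling a listing as a function from n to the digit sequence of the nth real number. Restricting T to the digits 4 and 5 is a choice made for this sketch. It sidesteps the double expansions such as 0.5 = 0.4999… that arise for numbers whose denominator is a power of 10. The names `diagonal`, `listing`, and `toy` are hypothetical.

```python
from typing import Callable, Sequence

def diagonal(listing: Callable[[int], Sequence[int]]) -> Callable[[int], int]:
    """Return T such that T(n) differs from the nth digit of listing(n), n = 1, 2, ...

    T uses only the digits 4 and 5, so it never ends in repeated 0s or 9s and
    therefore names one real number with a single decimal expansion.
    """
    def T(n: int) -> int:
        nth_digit = listing(n)[n - 1]  # nth digit of the nth listed number
        return 4 if nth_digit == 5 else 5
    return T

def toy(n: int) -> list[int]:
    """Toy listing (hypothetical): every digit of the nth number equals n % 10."""
    return [n % 10] * n

T = diagonal(toy)
print([T(n) for n in range(1, 11)])  # prints [5, 5, 5, 5, 4, 5, 5, 5, 5, 5]
```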
3.3.10 Let $\psi$ be the mapping from $A$ onto $B$. Select $A' \subset A$ so that $\psi : A' \to B$
is one to one. Define $\psi^1 = \psi$ and $\psi^n = \psi \circ \psi^{n-1}$. We define the one-to-one
correspondence between $A$ and $B$ as follows: For each $a \in A'$, if we can find an
$n$ so that $\psi^n(a) \in B - A'$, then $a$ is mapped to $\psi(a)$. If $a$ is not in $A'$ or there
is no such $n$ (if each time we apply $\psi$ we get back an element of $A'$), then $a$ is
mapped to itself. Show that under this mapping, each element of $B$ is the image
of exactly one element of $A$.
3.3.13 These are countably infinite sequences of natural numbers.
3.3.15 There is always one map from the empty set to any other set, the trivial map
that takes nothing to nothing.

4.1.3 To prove that nowhere dense implies the existence of subintervals with no
points of $S$, you may find it easier to prove the contrapositive: If every subinterval
of $(a, b)$ contains at least one point of $S$, then every point in $(a, b)$ is an
accumulation point of $S$.
4.1.10 One approach is to modify Cantor's first proof that $\mathbb{R}$ is not countable.
Assume $S = (s_1, s_2, \ldots)$ is perfect. Start with $s_1$, $s_2$, and $s_3$. For simplicity,
first assume that $s_1 < s_2 < s_3$. Since $s_2$ is an accumulation point of $S$, there
are infinitely many points of $S$ between $s_1$ and $s_3$. Pick the one with smallest
subscript larger than 3. If the new point is larger than $s_2$, discard $s_3$. If it is
smaller, discard $s_1$. Continue doing this to create a sequence of nested intervals.
4.1.12 Show that if $|x - y| < 1/3$, then $|DS(x) - DS(y)| \le 1/2$. What if $|x - y| <
1/9$?
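The hint can be explored numerically. This sketch assumes DS is the Cantor function (devil's staircase) on [0, 1] and evaluates it from a truncated ternary expansion. The name `devils_staircase` and the cutoff `depth` are hypothetical. Floating-point rounding limits the useful depth to roughly 30 ternary digits, which is ample for a grid check.

```python
def devils_staircase(x: float, depth: int = 40) -> float:
    """Approximate the Cantor function DS(x) for x in [0, 1].

    Ternary digits 0 and 2 of x become binary digits 0 and 1 of DS(x);
    at the first ternary digit 1 we add one more binary 1 and stop.
    """
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    value, weight = 0.0, 0.5
    for _ in range(depth):
        x *= 3
        digit = int(x)          # next ternary digit: 0, 1, or 2
        x -= digit
        if digit == 1:
            return value + weight
        value += weight * (digit // 2)
        weight /= 2
    return value

def worst_gap(max_dist: float, steps: int = 120) -> float:
    """Largest |DS(x) - DS(y)| over grid points with |x - y| < max_dist."""
    grid = [i / steps for i in range(steps + 1)]
    return max(abs(devils_staircase(x) - devils_staircase(y))
               for x in grid for y in grid if abs(x - y) < max_dist)

print(worst_gap(1 / 3), worst_gap(1 / 9))  # expected to be at most 1/2 and 1/4
```

A grid check is evidence, not a proof. The proof compares which of the intervals [0, 1/3], (1/3, 2/3), [2/3, 1] contain x and y.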

4.2.3 First show that for any partition of $[a, b]$, there is a Riemann sum approximation
to the integral of $f'$ whose value is zero.
4.2.4 Note that every neighborhood of any point in SVC(4) contains at least part
of one of the removed intervals, and since it is not entirely contained in this
interval, it must contain an endpoint of one of the removed intervals.
4.2.5 Show that for every $\varepsilon > 0$, there is some $n \in \mathbb{N}$ so that after removing all
intervals of length 1… the intervals that remain all have length strictly less
than $\varepsilon$.

4.2.10 Find the open interval centered at 0. Find the open intervals on either side of
+1/3. For each M e N, find the open intervals on either side of
4.2.14 Show that if $x \in S - S'$, then we cannot have points of $S'$ that are arbitrarily
close to $x$.

4.3.3 If we have any point in … then we can always find an increasing sequence
of integers ($n_1 < n_2 < n_3 < \cdots$) and a sequence of intervals, $[a_1, b_1]$, $[a_2, b_2]$,
$[a_3, b_3]$, ..., for which … is greater than or equal to $\varepsilon$ at all points in $[a_k, b_k]$.
Think about our three examples.
4.3.5 Since $F_1$ has finite outer content, it must be bounded, say $F_1 \subseteq [a, b]$. Define
$G_k = [a, b] - F_k$.
4.3.10 Show that for every n, the oscillation of at x = +1/rn is 1/rn.

4.4.2 Consider the examples you know for which the limit of the integrals does not
equal the integral of the limit.
4.4.5 The Cantor set consists of numbers that can be represented in base 3 without
the use of the digit 1. The set C does include numbers that have a 1 in their base
3 representation. How many is?
4.4.10 Show that the series of the derivatives of n2f(x — converges uniformly.

4.4.13 Let RN +Q = {rk <k <N, q e Qj. Show that the characteristic
function of this set is in class 2.

5.1.3 = III:1.7.1, 5.1.4 = III:1.7.2, 5.1.5 = III:1.7.3, 5.1.6 = III:1.7.8, 5.1.7 =
III:1.7.13, 5.1.8 = III:1.7.14, 5.1.9 = III:1.7.15, 5.1.10 = III:1.7.15

5.1.1 We know that for any $\alpha > 0$, the set of points with oscillation $\ge \alpha$ must have
outer content 0. Thus for each $k \ge 1$, we can find a closed interval (not a single
point) on which the oscillation is less than $1/k$. Show how to find a sequence
of nested closed intervals so that all points in the $k$th interval have oscillation
less than $1/k$. Prove that the point in all of these intervals must be a point of
continuity.

5.1.2 Consider the characteristic function of a suitable set.


5.1.4 The interior of $S$ is $S - \partial S$. Show that if $S$ is Jordan measurable, then the
inner content of the interior of $S$ is the same as the content of $S$.
5.1.5 $S \cup T = (S - T) \cup (S \cap T) \cup (T - S)$.

5.1.6 Let $S^{\circ}$ denote the interior of $S$ and $\overline{S}$ the closure. Then $S^{\circ} \cup (S \cap \partial S) = S$
and $S \cup (\overline{S} \cap \partial S) = \overline{S}$.

5.1.12 Are there any Borel sets that are not in this σ-algebra?

5.1.14 $\big(\bigcap S_n\big)^c = \bigcup \big(S_n^c\big)$.


5.1.16 One possibility is to show that C as defined in Exercise 4.4.5 is such
a set.
5.1.20 Start by showing that for each a > 0, the set of points with oscillation
strictly less than a is an open set.
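5.1.21 Let $S_{N,k}$ be the set of all $x$ such that for all $m, n > N$, we have that
$|f_m(x) - f_n(x)| \le 1/k$. Show that the complement of $S_{N,k}$ is open, and therefore $S_{N,k}$ is closed.
Show that the set of points of convergence is precisely $\bigcap_{k=1}^{\infty} \bigcup_{N=1}^{\infty} S_{N,k}$.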

5.1.22 For part 1, show that the set in question is

non
k=1 m=1 n=m

5.2.12 = III:2.1.3, 5.2.13 = III:2.1.4


5.2.4 Since $S \subseteq [\alpha, \beta]$, it follows that $S^c \cap [a, b] = [a, \alpha) \cup (S^c \cap [\alpha, \beta]) \cup
(\beta, b]$, and these three sets are pairwise disjoint.
5.2.6 $m_e(S \cup T) \le m_e(S) + m_e(T)$.
5.2.11 Use the assumption that the measure of a countable union of pairwise disjoint
intervals is the sum of the lengths of the intervals to show that if $K = \bigcup_{j \ge 1} I_j$,
where the $I_j$ are pairwise disjoint intervals, then

$$m(K) = \sup_{J \ge 1} m\Big(\bigcup_{j=1}^{J} I_j\Big).$$
5.2.12 Use Exercise 5.1.5 and the fact that rn(S) = m ia).
5.2.13 Given any $\varepsilon > 0$, choose covers $U$ of $S$ and $V$ of $T$ so that $m_e(S) > m(U) - \varepsilon$
and $m_e(T) > m(V) - \varepsilon$. Now use the result proven in Exercise 5.2.12.
5.2.14 Choose $\delta > 0$ that is strictly less than $\inf\{|x - y| : x \in S, y \in T\}$. Define
$U = \bigcup_{x \in S} \ldots$ and $V = \bigcup_{y \in T} \ldots$. We can restrict the collection of covers
of $S \cup T$ to those that are contained in $U \cup V$.

5.2.15 By subadditivity, $m_e(T - S) \ge m_e(T) - m(S)$. Use the result of Exercise 5.2.13
to prove the other inequality.

5.3.7 = III:2.1.5, 5.3.8 = III:2.1.9, 5.3.9 = III:2.1.10, 5.3.10 = III:2.1.11,
5.3.11 = III:2.1.12, 5.3.13 = III:2.1.16, 5.3.14 = III:2.1.17, 5.3.15 = III:2.1.20,
5.3.16 = III:2.1.21, 5.3.17 = III:2.1.23, 5.3.18 = III:2.1.26

5.3.2S= 1)).

5.3.4 We already know the middle inequality. The third inequality is fairly easy
to establish. The toughest part of this problem is to prove that if $S \subset [a, b]$,
then

$$c_i(S) = b - a - c_e(S^c \cap [a, b]).$$
5.3.7 Use the fact that U is measurable and thus satisfies Carathéodory's condition.
5.3.9 Show that $m(T) < m_e(S)$.
5.3.11 Show that 1 implies 2 implies 3 implies 1. Show that 1 implies 4 implies 5
implies 1.
5.3.12 Every set is a countable union of bounded sets.
5.3.13 One direction, use the Carathéodory condition for both Si and Ti. Other
direction, show that there are measurable sets S and
and and me(T) = m(Ti). Show that
m fl Ti) = 0.
5.3.14 Let $S_1$, $T_1$ be the sets whose existence is established in Exercise 5.3.13. Let
$C \subseteq S \cup T$ be a measurable set for which $m(C) > m_i(S \cup T) - \varepsilon$. Show that

$$m_i(S \cup T) - \varepsilon < m(C) = m(C \cap S_1) + m(C \cap T_1) < m_e(S) + m_i(T).$$
5.3.15 Show that

I
\k=1 n=k I \n=k
sup inf = lim inf
n>k

5.4.4 Show that $\inf_{x \in S,\, y \in U^c} |x - y| > 0$.



6.1.9 = III:2.2.3, 6.1.10 = III:2.2.4, 6.1.11 = III:2.2.5, 6.1.12 = III:2.2.6,
6.1.13 = III:2.2.7, 6.1.14 = III:2.2.8, 6.1.15 = III:2.2.15, 6.1.16 = III:2.2.16,
6.1.17 = III:2.2.17, 6.1.18 = III:2.2.18, 6.1.19 = III:2.2.19, 6.1.20 = III:2.2.20
6.1.9 Let $M$ be a nonmeasurable set and define $f$ so that the evaluation of $f(x)$
depends on whether or not $x$ is in $M$.
6.1.10 Use the result of Exercise 1.2.3, that every infinite sequence contains a
monotonic sequence.
6.1.11 Show that $f^{-1}(U)$ is measurable for every open set $U$ if and only if
$f^{-1}((a, b))$ is measurable for every open interval $(a, b)$.
6.1.13 Use the result of Exercise 6.1.11.
6.1.15 In one direction, use the result of Exercise 5.4.7. In the other direction, use
the result of Exercise 5.4.9.
6.1.16 Show that $\{y : f(y) > c\}$ is an open set. Show that for any open set $U$ in the
range of $g$, $\{x : g(x) \in U\}$ is measurable.
6.1.17 Consider the function $\psi$ defined in Exercise 5.4.10. Let $M$ be a nonmeasurable
set contained in the image of the Cantor set. Show that if $g = \ldots$ and
$h = \chi_{\ldots}$, then $g$ is continuous, $h$ is measurable, and $h \circ g$ is not measurable.
6.1.18 Use Exercise 6.1.12.
6.1.19 Start with the function from Exercise 5.4.10.
6.1.20 Write f' as a limit of a sequence of continuous (and hence measurable)
functions.

6.2.1 = III:2.3.1, 6.2.2 = III:2.3.2, 6.2.3 = III:2.3.3, 6.2.12 = III:2.3.5,
6.2.16 = III:2.3.7, 6.2.18 = III:2.3.8, 6.2.19 = III:2.3.11, 6.2.20 = III:2.3.12,
6.2.21 = III:2.3.13
6.2.1 Remember that for the Lebesgue integral, changing the values of a function
on a set of measure zero does not change the value of the integral.
6.2.2 Use the fact that = (1 —

6.2.3 The outer content of the Cantor set SVC(3) is zero.


6.2.5 Show that

$$\{x : f(x) < g(x)\} = \bigcup_{q \in \mathbb{Q}} \big(\{x : f(x) < q\} \cap \{x : q < g(x)\}\big).$$

6.2.7 Explain why if $\sum_{k=1}^{\infty} \int_E f_k(x)\,dx$ converges, then $\int_E \big(\sum_{k=1}^{N} f_k(x)\big)\,dx$ is
bounded for all $N$.
6.2.8 Note the conditions are not sufficient to guarantee the convergence of the
series Even if the series converges, it could happen that the series of
integrals does not converge.
6.2.9 Consider the sequences (ft) and
6.2.10 Find a sequence of functions (fr) for which f
dx converges
conditionally and does not converge for any x in the chosen interval
[a,bI.
6.2.12 First show that there is a bound B such that n < B for all n. Show
that this implies that given any E > 0, there is a response N so that for all n N,
fE. If(x)Idx <E.
6.2.17 Choose E so that its only open subset is the empty set.

6.2.18 The equality is equivalent to

f f f f
6.2.21 First show that if $E = \bigcup_{k=1}^{\infty} E_k$, where the $E_k$ are pairwise disjoint measurable
sets, and $\phi$ is any simple function, then $\int_E \phi(x)\,dx = \sum_{k=1}^{\infty} \int_{E_k} \phi(x)\,dx$.
Then extend this result to any integrable function, $\int_E f(x)\,dx = \sum_{k=1}^{\infty} \int_{E_k} f(x)\,dx$.

6.3.12 = III:2.3.14, 6.3.13 = III:2.3.15, 6.3.14 = III:2.3.18, 6.3.15 = III:2.3.19

6.3.4 It is enough to prove this if $f$ is unbounded in all neighborhoods of a single
point $c \in (a, b)$. To say that the improper integral of $f$ exists is to say that

$$\lim_{\varepsilon_1, \varepsilon_2 \to 0^+} \left( \int_a^{c - \varepsilon_1} f(x)\,dx + \int_{c + \varepsilon_2}^{b} f(x)\,dx \right)$$

exists. Define $f_n = f\,\chi_{[a, c - 1/n] \cup [c + 1/n, b]}$ and use the monotone convergence
theorem.
6.3.5 The integrand is the derivative of x2 sin(x2). Show that the integral of the
positive part of this function is not bounded.
6.3.8 Show that the assumptions of Lebesgue's dominated convergence theorem
are satisfied.

6.3.9 Use Corollary 6.17 to show that it is enough to prove this theorem when f
is a simple function. Then use the definition of a simple function to show that
it is enough to prove this when f is the characteristic function of a measurable
set. Then use Theorem 5.11 to show that it is enough to prove this when f is the
characteristic function of an interval.
6.3.11 Use Fatou's lemma.
6.3.12 Apply Fatou's lemma to both fE dx and
fEc
6.3.16 See hint to Exercise 6.3.9.

6.3.17 See hint to Exercise 6.3.9.

6.4.2 = III:2.3.10, 6.4.16 = III:2.3.9, 6.4.17 = III:2.3.20, 6.4.18 = III:2.3.21,
6.4.19 = III:2.3.22, 6.4.20 = III:2.3.23, 6.4.21 = III:2.3.24

6.4.1 If $m(S) = 0$, then $[-2, 2] - S$ is dense in $[-2, 2]$.

6.4.6 Show that Baire's sequence, Example 4.5 on p. 100, converges in measure but
does not converge in Kronecker's sense.
6.4.7 Consider the characteristic function of a suitable set.
6.4.9 Choose any two numbers $a, b \in M$. Show that for any $\delta > 0$ and any $x \in \mathbb{R}$
there exist $x_1, x_2 \in N_\delta(x)$ for which $|f(x_1) - f(x_2)| > |b - a|$. Show that this
implies that $f$ is discontinuous at $x$.
6.4.10 Let E E

f
I dx + rn(E — Er).
1+E 1+E
To show that the measure of E must be finite, consider the sequence defined by
= 1/(nx).
6.4.11 Consider Exercises 6.4.1 and 6.4.2.
6.4.14 If $f$ is continuous relative to $[a, b] - E$, then $\{x \in [a, b] - E : f(x) > c\}$
is an open set, and therefore measurable. Any subset of a set of measure zero is
measurable.
6.4.15 Show that
0000 00

=flU fl jx <1/kj.
k=1 n=1 m=1

6.4.18 The sequence $\big(\int_E f_n(x)\,dx\big)$ is bounded. If it does not converge to
$\int_E f(x)\,dx$, then it has a convergent subsequence that converges to a different
value. Show that this leads to a contradiction.
6.4.19 See the hint to Exercise 6.4.18.

6.4.20 By Exercise 6.1.16, $g \circ f_n$ and $g \circ f$ are measurable. Use the fact that $g$ is
uniformly continuous on $[-C, C]$ to show that $g \circ f_n$ converges in measure to
$g \circ f$.

7.1.1 Start with the function x sin(1/x). Create a piecewise-defined function that
is continuous at 0 but with four different Dini derivatives at x = 0. Add an
appropriate linear function to make this function strictly increasing.
7.1.3 Assume there exists $b > a$ such that $f(b) + cb < f(a) + ca$. For each $x \in
[a, b]$ show that there is a neighborhood $N_\delta(x)$ so that $y \in N_\delta(x)$, $y \ne x$, implies
that

$$\frac{f(y) + cy - (f(x) + cx)}{y - x} > \frac{c - M}{2}.$$

These neighborhoods provide a cover of $[a, b]$. Use the Heine–Borel theorem
to find a contradiction.
7.1.7 If $c$ is a point of discontinuity of a monotonically increasing function, $f$, then
$\lim_{x \to c^-} f(x)$ and $\lim_{x \to c^+} f(x)$ exist, and $\lim_{x \to c^-} f(x) < \lim_{x \to c^+} f(x)$.
7.1.11 Show that for each partition $P$ of $[a, b]$ and each $\varepsilon > 0$, there is a response $N$
so that for all $n > N$, $|V(P, f) - V(P, f_n)| < \varepsilon$. It follows that for each $P$ and
for $n$ sufficiently large (how large depends on $P$), $V(P, f) < V(P, f_n) + \varepsilon$.

7.1.12 Explain why it is that if we can show that f has bounded variation on [0, a]
for some 0 <a < 1,then f has bounded variation on [0, 11. Find V(PN, f),
where PN is the partition of [0, with cut points at — 1

and + 1 n N. Explain why V(PN, f) is less than


the total variation on [0, and it is larger than the total variation on
[0,

7.1.13 See the hint to Exercise 7.1.12.


7.1.16 Use the assumption that for any $\lambda$, $f(x) + \lambda x$ is piecewise monotonic. By
decreasing, we mean weakly decreasing: $y > x$ implies $f(y) \le f(x)$.
7.1.21 Find a sequence $(x_n)$ that converges to $1/4$ and for which
$\big(DS(x_n) - DS(1/4)\big)/(x_n - 1/4)$ does not converge.

7.2.3 First show this result for a function $g$ for which … $\ge \varepsilon > 0$. Apply this to
$g(x) = f(x) + \varepsilon x$.

7.3.3 = III:2.4.7, 7.3.14 = III:2.4.3, 7.3.16 = III:2.4.4, 7.3.17 = III:2.4.5,
7.3.19 = III:2.4.6
7.3.2 See Exercise 7.1.12.

7.3.4 For the second question, from Corollary 7.3 we know that T(x) = is
continuous.

7.3.5 Since the summands are positive, if every finite sum is $< \varepsilon$, then the infinite
sum is $\le \varepsilon$.

7.3.7 Use the mean value theorem.

7.3.14 See Exercises 7.1.13, 7.3.4, 7.3.12, and 7.3.7.


7.3.17 If g is monotonically increasing and the intervals (ak, bk) are pairwise
disjoint, then so are the intervals (g(ak), g(bk)).
7.3.18 Show that for any $\varepsilon > 0$, there is an open cover of $F(S)$ for which the sum
of the lengths of the intervals is less than $\varepsilon$.
7.3.19 Let S be a measurable set. Explain why we can assume that S is bounded.
From the definition of Lebesgue measure on page 138, if S is measurable, then
we can find a sequence of closed sets S and a set Z of measure zero so that
S= ZU Use Proposition 3.8 and Exercise 7.3.18 to finish the proof.

7.4.7 = III:2.4.11, 7.4.9 = III:2.4.12, 7.4.10 = III:2.4.13, 7.4.11 = III:2.4.14,
7.4.12 = III:2.4.15, 7.4.13 = III:2.4.22, 7.4.14 = III:2.4.23, 7.4.15 = III:2.4.24,
7.4.17 = III:2.4.27
7.4.1 Changing the value of f at one point does not change the value of F.
7.4.2 Use the Heine–Borel theorem.

7.4.3 Show that if $f$ is differentiable with a bounded derivative on $[a, b]$, then it is
absolutely continuous on this interval.
7.4.4 Use Theorems 6.21 and 6.25.

7.4.5 Show that if $F(x) = \int_a^x f(t)\,dt$, then Theorem 7.18 implies that
$\int_a^x \big(F'(t) - f(t)\big)\,dt = 0$ for all $x \in [a, b]$.
7.4.6 Use Theorem 7.18.

7.4.7 First show that if $P$ is any partition of $[a, b]$, then $V(P, f) \le
\int_a^b |f'(x)|\,dx$. For the other inequality, define $S^+ = \{x \in [a, b] : f'(x) > 0\}$ and
$S^- = \{x \in [a, b] : f'(x) < 0\}$, so that

$$\int_a^b |f'(x)|\,dx = \int_{S^+} f'(x)\,dx - \int_{S^-} f'(x)\,dx.$$

Using the fact that every measurable set is almost a finite union of pairwise
disjoint open intervals, show that for any $\varepsilon > 0$ there is a partition $P$ of $[a, b]$
for which

$$\int_a^b |f'(x)|\,dx < V(P, f) + \varepsilon.$$

7.4.8 f'(t) dt is absolutely continuous.


7.4.9

1. First prove this equality for the case where g(S) is open. Use this to prove
it for the case where $g(S)$ is closed. Then use the fact that given any $\varepsilon > 0$
we can find an open set $G \supseteq g(S)$ and a closed set $F \subseteq g(S)$ for which
$m(g(S)) - \varepsilon < m(F) \le m(G) < m(g(S)) + \varepsilon$.
2. To avoid the possibility that g'(B) might not be measurable,
define a sequence of open sets [c, d] G1 B for
which = 0. We have B G = and g'(G) =
is measurable. Use part 1 to show that m (A fl g'(G)) = 0.
3. Again define a sequence of open sets [c, d] H1 H2 C for
which = m(C). Then C = B U where m(B) =0.

7.4.10 Use Exercise 7.4.9. First prove it when f is a simple function, then when f
is nonnegative, and finally for an arbitrary integrable function.
7.4.11 Let $A = \sup_{x \in [a, b]} |F(x)|$ and $B = \sup_{x \in [a, b]} |G(x)|$. Use the inequality

$$|F(t)G(t) - F(s)G(s)| \le A\,|G(t) - G(s)| + B\,|F(t) - F(s)|$$

to prove that $FG$ is absolutely continuous.
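Assuming, as the inequality suggests, that $A$ and $B$ bound $|F|$ and $|G|$ on $[a, b]$, the inequality comes from adding and subtracting $F(t)G(s)$. A worked version of that step:

```latex
\begin{aligned}
F(t)G(t) - F(s)G(s)
  &= F(t)\bigl(G(t) - G(s)\bigr) + G(s)\bigl(F(t) - F(s)\bigr), \\
|F(t)G(t) - F(s)G(s)|
  &\le |F(t)|\,|G(t) - G(s)| + |G(s)|\,|F(t) - F(s)| \\
  &\le A\,|G(t) - G(s)| + B\,|F(t) - F(s)|.
\end{aligned}
```

Summed over any finite collection of pairwise disjoint subintervals $(s_k, t_k)$, the right side is at most $A$ times the corresponding sum for $G$ plus $B$ times the sum for $F$. The absolute continuity of $F$ and $G$ therefore carries over to $FG$.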


7.4.12 Use the result of Exercise 7.4.11.
7.4.15 For each rational number $q$, consider the function defined by $|f(x) - q|$.
Show that

$$\lim_{h \to 0} \frac{1}{h} \int_c^{c + h} |f(t) - q|\,dt = |f(c) - q| \quad \text{almost everywhere.} \tag{B.4}$$

Let $E_q$ be the set of $x \in [a, b]$ for which equation (B.4) fails to hold and define

$$E = \bigcup_{q \in \mathbb{Q}} E_q.$$

Using the fact that we can find a rational value as close as we wish to $f(c)$, show
that every point in $[a, b] - E$ is a Lebesgue point.
7.4.17 Use Luzin's theorem, Theorem 6.26.

8.1.10–8.1.15 = III:2.5.36

8.1.1 Observe that

$$\lim_{u \to 0} \frac{\sin[(2n + 1)u]}{\sin u} = 2n + 1.$$

For $\delta < u < \pi/2$, we have that

$$\left|\frac{\sin[(2n + 1)u]}{\sin u}\right| \le \frac{1}{\sin \delta}.$$
8.1.8 Work backward starting with the fact that $(0, 1, 0, 1, 0, 1, \ldots)$ is $(C, 1)$. If this
is the sequence $(a_1^{(1)}, a_2^{(1)}, a_3^{(1)}, a_4^{(1)}, \ldots)$, what is $(a_1, a_2, a_3, a_4, \ldots)$?

8.1.10 Define $f$ on $(-\pi, \pi)$ by $f(0) = 0$, $f(x) = -\pi/4$ for $x < 0$, and $f(x) = \pi/4$
for $x > 0$. The Fourier series for $f$ is a pure sine series. Restrict it to $(0, \pi)$ and
then do a change of variables, replacing $x$ by $x/2$.
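This sketch assumes that f is odd, with f(x) = −π/4 for x < 0. Its sine series is then the sum of sin((2k+1)x)/(2k+1) over k ≥ 0; the coefficients come from the standard square-wave computation, stated here without derivation. The function name `sine_series_partial_sum` is hypothetical. At x = π/2 the partial sums are the Leibniz partial sums for π/4.

```python
import math

def sine_series_partial_sum(x: float, terms: int) -> float:
    """Partial sum of sum_{k>=0} sin((2k+1)x)/(2k+1), the assumed sine series of f."""
    return sum(math.sin((2 * k + 1) * x) / (2 * k + 1) for k in range(terms))

# At x = pi/2 the terms are (-1)^k/(2k+1), so these approach pi/4 = f(pi/2).
for terms in (10, 100, 1000):
    print(terms, sine_series_partial_sum(math.pi / 2, terms), math.pi / 4)
```

Convergence is slow, with error of order 1/terms, so this illustrates the limit rather than serving as a practical way to compute π.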
8.1.11 $\cos(a + b) - \cos(a - b) = -2 \sin a \sin b$. Use the convergence proven in
Exercise 8.1.10 to justify the bound.

8.2.1 Use the fact that $f$ is integrable over $[0, 1]$ if and only if $a > -1$.
8.2.9 Use Proposition 8.6.
8.2.11 Use Egorov's theorem, Theorem 6.21.
8.2.13 Show that it converges in the Cesàro sense to 0 at every irrational value of $x$ in
$[0, 1]$.

8.2.15 Find a sequence of functions that converges uniformly almost everywhere


but fails to converge on a set of measure zero.
8.2.16 Find a sequence (ar) that converges only in the Cesàro sense, and define

8.2.17 Find a suitable sequence (ar) so that for fk,n = Xe-i the sequence
fi, 1, f2,2, does not converge in the Cesàro any irrational
value of x in [0, 1].
8.2.20 If $x < y$, rewrite the term inside the limit as $y\big(1 + (x/y)^p\big)$…
8.2.23 First show that for positive $\alpha$ and $\beta$ and $1/p + 1/q = 1$, $\alpha\beta = \alpha^p/p + \beta^q/q$
if and only if $\alpha = \beta^{q-1}$.
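The equality case in Young's inequality can be checked numerically. In this sketch `alpha`, `beta`, and `p` play the roles of α, β, and p. Equality α^p = β^q is the same as β = α^(p−1), because (p − 1)q = p. The function name `young_gap` is hypothetical.

```python
def young_gap(alpha: float, beta: float, p: float) -> float:
    """alpha**p/p + beta**q/q - alpha*beta, where 1/p + 1/q = 1 (requires p > 1).

    Young's inequality says this is >= 0 for positive alpha and beta.
    """
    q = p / (p - 1)  # conjugate exponent
    return alpha ** p / p + beta ** q / q - alpha * beta

print(young_gap(2.0, 3.0, 3.0))               # strictly positive
print(young_gap(2.0, 2.0 ** (3.0 - 1), 3.0))  # equality case beta = alpha**(p-1)
```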
8.2.26 Show that for $0 < p < 1$ and positive $x$, $f(x) = xy - x^p/p$ has its minimum
at $x = y^{1/(p-1)}$.
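Taking $y > 0$ (an assumption of this sketch), the location of the minimum follows from one derivative computation:

```latex
\begin{aligned}
f'(x) &= y - x^{p-1} = 0 \iff x^{p-1} = y \iff x = y^{1/(p-1)}, \\
f''(x) &= (1 - p)\,x^{p-2} > 0 \quad \text{for } x > 0,\ 0 < p < 1.
\end{aligned}
```

Since $f'' > 0$, $f$ is strictly convex on $(0, \infty)$, so this critical point is the minimum there.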
8.2.28 Let $A$ and $B$ be disjoint subsets of $[a, b]$ and set $f = \alpha\chi_A$, $g = \beta\chi_B$.
Compute the norms and find values of $\alpha$ and $\beta$ for which these functions satisfy
the inequality.

8.3.7 = III:2.3.20, 8.3.8 = III:2.3.31, 8.3.9 = III:2.3.32


8.3.3 $C[0, 1]$ with the max norm is a subset of $L^\infty[0, 1]$. Observe that it is closed
under vector space operations. Since $L^\infty$ is a Banach space, any sequence of
continuous functions that is Cauchy in the max norm converges in the max norm
to a function in $L^\infty$. Show that this limit is a continuous function.
8.3.4 $C[0, 1]$ with the $L^2$ norm is a subset of $L^2[0, 1]$. Observe that it is closed
under vector space operations. Since $L^2[0, 1]$ is a Banach space, any sequence
of continuous functions that is Cauchy in the $L^2$ norm converges in the $L^2$
norm to a function in $L^2[0, 1]$. Find a sequence of continuous functions that
converges pointwise to $\chi_{SVC(4)}$. Show that it also converges to $\chi_{SVC(4)}$ in the $L^2$
norm.

8.3.7 Use Egorov's theorem and Exercise 6.3.13.


8.3.8 Use the Hölder–Riesz inequality to show that the sequence $(f_n)$ is equi-integrable,
and therefore

$$\int_a^b |f_n(x)|^p\,dx \to \int_a^b |f(x)|^p\,dx.$$

Use Exercise 8.3.6 to show that $\|f_n - f\|_p \to 0$. Finally use the Hölder–Riesz inequality
again to show that $f_n g \to fg$ in the $L^1$ norm.
The condition $p > 1$ is used in the proof that $(f_n)$ is equi-integrable.
8.3.9 Using Lebesgue's dominated convergence theorem, show that gn f converges
to gf in the norm. To show that f,, converges to gf in the L" norm, use
the fact that
- + -

8.4.8 Linearity is the only property of inner products that does not follow immediately.
To prove that $(x + y, z) = (x, z) + (y, z)$, use the parallelogram law to
show that

$$\begin{aligned}
\|x + y + z\|^2 &= 2\|x + y\|^2 + 2\|z\|^2 - \|x + y - z\|^2 \\
&= 2\|x + z\|^2 + 2\|y\|^2 - \|x - y + z\|^2 \\
&= 2\|y + z\|^2 + 2\|x\|^2 - \|x - y - z\|^2.
\end{aligned}$$

There is a similar set of identities for $\|x + y - z\|^2$. To show that $(ax, y) =
a(x, y)$, first prove this for integer $a$, then rational $a$, and then use the continuity
of the inner product (Exercise 8.4.7) to finish the proof. But be careful: we do
not yet know that this is an inner product. What properties were needed to prove
continuity?
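The three expressions can be sanity-checked numerically. This sketch works in R^3 with the Euclidean norm, which already comes from an inner product, so it confirms only the algebra of the parallelogram-law identities, not the exercise itself. It also assumes the candidate inner product is defined by the polarization formula (x, y) = (‖x + y‖² − ‖x − y‖²)/4. All helper names are hypothetical.

```python
import random

def norm_sq(v: list[float]) -> float:
    """Squared Euclidean norm."""
    return sum(c * c for c in v)

def lin(*terms: tuple[float, list[float]]) -> list[float]:
    """Linear combination: lin((1, x), (-1, y)) is x - y, componentwise."""
    n = len(terms[0][1])
    return [sum(s * v[i] for s, v in terms) for i in range(n)]

def polar(x: list[float], y: list[float]) -> float:
    """Candidate inner product defined from the norm by polarization."""
    return (norm_sq(lin((1, x), (1, y))) - norm_sq(lin((1, x), (-1, y)))) / 4

rng = random.Random(0)
x, y, z = ([rng.uniform(-1, 1) for _ in range(3)] for _ in range(3))

lhs = norm_sq(lin((1, x), (1, y), (1, z)))
rhs = [
    2 * norm_sq(lin((1, x), (1, y))) + 2 * norm_sq(z) - norm_sq(lin((1, x), (1, y), (-1, z))),
    2 * norm_sq(lin((1, x), (1, z))) + 2 * norm_sq(y) - norm_sq(lin((1, x), (-1, y), (1, z))),
    2 * norm_sq(lin((1, y), (1, z))) + 2 * norm_sq(x) - norm_sq(lin((1, x), (-1, y), (-1, z))),
]
print([abs(lhs - r) < 1e-12 for r in rhs])
# Additivity in the first slot, the property the hint is after:
print(abs(polar(lin((1, x), (1, y)), z) - polar(x, z) - polar(y, z)) < 1e-12)
```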

A.2.2 Use the Heine–Borel theorem.


A.2.5 Let $(t_n)$ be an ordering of the points at which $f$ is not differentiable. Define
the gauge at $t_n$ to be … + 1).
Bibliography

Baire, R. 1898. Sur les fonctions discontinues développables en series de fonctions


continues. C. R. Acad. Sci. Paris. 126: 884—887.
1900. Nouvelle demonstration d'un théorème sur les fonctions discontin-
ues. Bull. Soc. Math. France. 28: 173—179.
Bear, H. S. 1995. A Primer of Lebesgue Integration. San Diego, CA: Academic
Press.
Bhatia, R. 2005. Fourier Series. Washington, DC: The Mathematical Association
of America.
Birkhoff, G. 1973. A Source Book in ClassicalAnalysis. Cambridge, MA: Harvard
University Press.
Borel, E. 1895. Sur quelques points de la théorie des fonctions. Ann. Sci. École
Norm. Sup. (3). 12: 9—55.
1905. Leçons sur les Fonctions de Variables Réelles. Paris: Gauthier-
Villars.
1950. Leçons sur la Théorie des Fonctions. 4th ed. Paris: Gauthier-Villars.
(Original work published 1898.)
Bressoud, D. M. 2007. A Radical Approach to Real Analysis, 2nd ed. Washington,
DC: The Mathematical Association of America.
Browder, A. 1996. Mathematical Analysis: An Introduction. New York: Springer-
Verlag.
Burk, F. 1998. Lebesgue Measure and Integration: An Introduction. New York:
John Wiley & Sons.
Burkill, J. C. 1971. The Lebesgue Integral. Cambridge: Cambridge University
Press.
Cauchy, A.-L. 1897. Cours d'Analyse de l'Ecole Royale Polytechnique, reprinted in
Completes d'Augustin Cauchy, series 2, vol. 3. Paris: Gauthier-Villars.
(Original work published 1821.)

317
318 Bibliography

Cauchy, A.-L. 1899a. Leçons sur le Calcul Différentiel. Reprinted in Oeuvres Complètes d'Augustin Cauchy, series 2, vol. 4. Paris: Gauthier-Villars. (Original work published 1829.)
Cauchy, A.-L. 1899b. Résumé des Leçons données à l'Ecole Royale Polytechnique sur le Calcul Infinitésimal, series 2, vol. 4. Paris: Gauthier-Villars. (Original work published 1823.)
Chae, S. B. 1995. Lebesgue Integration, 2nd ed. New York: Springer-Verlag.
Darboux, G. 1875. Mémoire sur les fonctions discontinues. Ann. Sci. École Norm. Sup. série. 4: 57–112.
Darboux, G. 1879. Addition au mémoire sur les fonctions discontinues. Ann. Sci. École Norm. Sup. série. 8: 195–202.
Dauben, J. W. 1979. Georg Cantor: His Mathematics and Philosophy of the Infinite. Cambridge, MA: Harvard University Press.
de Freycinet, C. 1860. De L'Analyse Infinitésimale. Étude sur la Métaphysique du Haut Calcul. Paris: Mallet-Bachelier.
de la Vallée Poussin, C. 1946. Cours d'Analyse Infinitésimale, 2nd ed. New York: Dover.
Devlin, K. 1993. The Joy of Sets: Fundamentals of Contemporary Set Theory. New York: Springer-Verlag.
Dieudonné, J. 1981. History of Functional Analysis. Mathematical Studies Vol. 49. Amsterdam: North-Holland.
Dirichlet, G. L. 1969. Werke, reprint. New York: Chelsea.
du Bois-Reymond, P. 1876. Anhang über den Fundamentalsatz der Integralrechnung. Abhandlungen der Mathematisch-Physikalischen Classe der Königlich Bayerischen Akademie der Wissenschaften zu München. 12: 161–166.
du Bois-Reymond, P. 1880. "Der Beweis des Fundamentalsatzes der Integralrechnung: $\int_a^b F'(x)\,dx = F(b) - F(a)$." Math. Ann. 16: 115–128.
Dugac, P. 1989. Sur la correspondance de Borel et le théorème de Dirichlet-Heine-Weierstrass-Borel-Schoenflies-Lebesgue. Arch. Int. Hist. Sci. 39: 69–110.
Dunham, W. 1990. Journey through Genius: The Great Theorems of Mathematics. New York: John Wiley & Sons.
Dunham, W. 2005. The Calculus Gallery: Masterpieces from Newton to Lebesgue. Princeton: Princeton University Press.
Edgar, G. A. 2004. Classics on Fractals. Boulder, CO: Westview Press.
Edwards, C. H., Jr. 1979. The Historical Development of the Calculus. New York: Springer-Verlag.
Epple, M. 2003. The end of the science of quantity: Foundations of analysis, 1860–1910. In A History of Analysis. Edited by H. N. Jahnke. Providence, RI: American Mathematical Society, pp. 291–325.
Ferreirós, J. 1999. Labyrinth of Thought: A History of Set Theory and its Role in Modern Mathematics. Basel: Birkhäuser.
Fichera, G. 1994. Vito Volterra and the birth of functional analysis. In Development of Mathematics, 1900–1950. Edited by J. P. Pier. Basel: Birkhäuser, pp. 171–184.
Gauss, C. F. 1876. Werke, vol. 3. Göttingen: Königlichen Gesellschaft der Wissenschaften.
Gordon, R. A. 1994. The Integrals of Lebesgue, Denjoy, Perron, and Henstock. Graduate Studies in Mathematics, vol. 4. Providence, RI: American Mathematical Society.
Grabiner, J. V. 1981. The Origins of Cauchy's Rigorous Calculus. Cambridge, MA: MIT Press.
Grattan-Guinness, I. 1970. The Development of the Foundations of Mathematical Analysis from Euler to Riemann. Cambridge, MA: MIT Press.
Grattan-Guinness, I. 1972. Joseph Fourier, 1768–1830. Cambridge, MA: MIT Press.
Grattan-Guinness, I. 1990. Convolutions in French Mathematics, 1800–1840, vols. I–III. Basel: Birkhäuser Verlag.
Hamming, R. W. 1998. Mathematics on a distant planet. Am. Math. Mon. 105: 640–650.
Hardy, G. H. 1991. Divergent Series, 2nd ed. New York: Chelsea.
Hartman, S. and J. Mikusiński. 1961. The Theory of Lebesgue Measure and Integration. Translated by L. F. Boron. New York: Pergamon Press.
Hawkins, T. 1975. Lebesgue's Theory of Integration: Its Origins and Development, 2nd ed. New York: Chelsea.
Heine, E. 1870. Ueber trigonometrische Reihen. J. Reine Angew. Math. 71: 353–365.
Heine, E. 1872. Die Elemente der Functionenlehre. J. Reine Angew. Math. 74: 172–188.
Hermite, C. and T. J. Stieltjes. 1903–1905. Correspondance d'Hermite et de Stieltjes. Edited by B. Baillaud and H. Bourget. Paris: Gauthier-Villars.
Hobson, E. W. 1950. The Theory of Functions of a Real Variable and the Theory of Fourier's Series, 3rd ed. Washington, DC: Harren Press.
Hochkirchen, T. 2003. Theory of Measure and Integration from Riemann to Lebesgue. In A History of Analysis. Edited by H. N. Jahnke. Providence, RI: American Mathematical Society, pp. 197–212.
Jordan, C. 1881. Sur la série de Fourier. C. R. Acad. Sci. Paris. 92: 228–230.
Jordan, C. 1892. Remarques sur les intégrales définies. J. Math. Pures Appl. 4: 69–99.
Jordan, C. 1893–1896. Cours d'analyse de l'Ecole Polytechnique, 3 vols. Paris: Gauthier-Villars.
Kaczor, W. J. and M. T. Nowak. 2000–2003. Problems in Mathematical Analysis, vols. I–III. Student Mathematical Library vols. 4, 12, 21. Providence, RI: American Mathematical Society.
Lacroix, S. F. 1816. An Elementary Treatise on the Differential and Integral Calculus. Translated by C. Babbage, G. Peacock, and J. Herschel with appendix and notes. Cambridge: J. Deighton and Sons.
Lacroix, S. F. 1828. Traité Élémentaire de Calcul Différentiel et de Calcul Intégral, 4th ed. Paris: Bachelier.
Lardner, D. 1825. An Elementary Treatise on the Differential and Integral Calculus. London: John Taylor.
Laugwitz, D. 1999. Bernhard Riemann 1826–1866: Turning Points in the Conception of Mathematics. Translated by A. Shenitzer. Basel: Birkhäuser.
Laugwitz, D. 2002. Riemann's Dissertation and Its Effect on the Evolution of Mathematics. In Mathematical Evolutions. Edited by A. Shenitzer and J. Stillwell. Washington DC: Mathematical Association of America, pp. 55–62.
Lebesgue, H. 1903. Sur une propriété des fonctions. C. R. Acad. Sci. Paris. 137: 1228–1230.
Lebesgue, H. 1904. Une propriété caractéristique des fonctions de classe 1. Bull. Soc. Math. France. 32: 229–242.
Lebesgue, H. 1905a. Sur une condition de convergence des séries de Fourier. C. R. Acad. Sci. Paris. 140: 1378–1381.
Lebesgue, H. 1905b. Recherches sur la convergence des séries de Fourier. Math. Ann. 61: 251–280.
Lebesgue, H. 1905c. Sur les fonctions représentables analytiquement. J. Math. Pures Appl. 6: 139–216.
Lebesgue, H. 1966. Measure and the Integral. Translated and edited by K. O. May. San Francisco: Holden-Day.
Lebesgue, H. 2003. Leçons sur l'Intégration et la Recherche des fonctions primitives, 3rd ed. New York: Chelsea. Reprinted by American Mathematical Society. Providence, RI. (Original work published 1904.)
Lützen, J. 2003. The foundations of analysis in the 19th century. In A History of Analysis. Edited by H. N. Jahnke. Providence, RI: American Mathematical Society, pp. 155–196.
Luzin, N. 2002a. Function. In Mathematical Evolutions. Translated by A. Shenitzer and edited by A. Shenitzer and J. Stillwell. Washington, DC: Mathematical Association of America, pp. 17–34.
Luzin, N. 2002b. Two letters by N. N. Luzin to M. Ya. Vygodskii. In Mathematical Evolutions. Translated by A. Shenitzer and edited by A. Shenitzer and J. Stillwell. Washington, DC: Mathematical Association of America, pp. 35–54.
Marek, V. and J. Mycielski. 2002. Foundations of mathematics in the twentieth century. In Mathematical Evolutions. Edited by A. Shenitzer and J. Stillwell. Washington, DC: Mathematical Association of America, pp. 225–246.
Medvedev, Fyodor A. 1991. Scenes from the History of Real Functions, transl. Roger Cooke. Basel: Birkhäuser Verlag.
Moore, Gregory H. 1982. Zermelo's Axiom of Choice: Its Origins, Development, and Influence. New York: Springer-Verlag.
Mykytiuk, S. and A. Shenitzer. 2002. Four significant axiomatic systems and some of the issues associated with them. In Mathematical Evolutions. Edited by A. Shenitzer and J. Stillwell. Washington, DC: Mathematical Association of America, pp. 219–224.
Newton, I. 1999. The Principia: Mathematical Principles of Natural Philosophy. Translated by I. B. Cohen and A. Whitman. Berkeley, CA: University of California Press. (Originally published 1687.)
Osgood, W. F. 1897. Non-uniform convergence and integration of series term by term. Am. J. Math. 19: 155–190.
Pier, J.-P., ed. 1994. Intégration et mesure 1900–1950. In Development of Mathematics 1900–1950. Basel: Birkhäuser Verlag, pp. 517–564.
Poisson, S.-D. 1820. Suite du mémoire sur les intégrales définies. J. de l'Ecole Roy. Poly. cahier. 11: 295–335.
Renfro, D. L. 2007. Message from discussion Borel set. Google Groups. groups.google.com/group/sci.math/msg/66168cf580929605. Accessed August 21, 2007.
Riemann, B. 1990. Gesammelte Mathematische Werke. Reprinted with comments by R. Narasimhan. New York: Springer-Verlag.
Rudin, W. 1976. Principles of Mathematical Analysis, 3rd ed. New York: McGraw-Hill.
Saxe, K. 2002. Beginning Functional Analysis. New York: Springer-Verlag.
Schappacher, N. and R. Schoof. 1995. Beppo Levi and the Arithmetic of Elliptic Curves. http://hal.archives-ouvertes.fr/hal-00129719/fr. Accessed 21 August, 2007.
Serret, J.-A. 1894. Calcul Différentiel et Intégral, 4th ed. Paris: Gauthier-Villars.
Shenitzer, A. and J. Stepräns. 2002. The evolution of integration. In Mathematical Evolutions. Edited by A. Shenitzer and J. Stillwell. Washington, DC: Mathematical Association of America, pp. 63–70.
Siegmund-Schultze, R. 2003. The origins of functional analysis. In A History of Analysis. Edited by H. N. Jahnke. Providence, RI: American Mathematical Society, pp. 385–408.
Struik, D. J. 1986. A Source Book in Mathematics 1200–1800. Princeton: Princeton University Press.
Vitali, G. 1905. Una proprietà delle funzioni misurabili. Reale Istituto Lombardo di Scienze e Lettere. Rendiconti (2). 38: 600–603.
Wapner, L. M. 2005. The Pea and the Sun: A Mathematical Paradox. Wellesley, MA: A. K. Peters.
Weierstrass, K. T. W. 1894–1927. Mathematische Werke von Karl Weierstrass, 7 vols. Berlin: Mayer & Müller.
Whittaker, E. T. and G. N. Watson. 1978. A Course of Modern Analysis, 4th ed. Cambridge: Cambridge University Press.
Index

72
Abel, Niels Henrik, 75n
Abel Prize, 75n, 270
absolute continuity, 225, 226, 227
  of total variation, 229
absolute convergence theorem, 19
absolutely integrable function, 243
accumulation point, 47, 55
Alexandrov, Pavel Sergeevich, 96
almost everywhere, 162
almost uniform convergence, 192, 195
Ampère, André Marie, 36
approximate continuity, point of, 240
Archimedean principle, 52
Archimedes, 4, 52n
Arzelà, Cesare, 104, 111, 184
Arzelà–Osgood theorem, 106, 173
axiom of choice, 76, 153, 154–157
Baire, René-Louis, 56, 82, 99, 111, 114, 115, 117, 154
Baire category theorem, 111
Baire's sequence, 100, 120
Banach, Stefan, 155, 264
Banach space, 263
Banach–Tarski paradox, 155, 157
Berkeley, George, 6
Bernstein, Felix, 154
Bessel, Friedrich Wilhelm, 278
Bessel functions, 278
Bessel's identity, 278
Bessel's inequality, 278
Betti, Enrico, 204
Bolzano, Bernhard, 12n
Bolzano–Weierstrass theorem, 60, 66, 69
Borel, Émile, 56, 64–69, 105, 126, 127–129, 131, 134, 146, 150n, 154, 156, 283
Borel measure, 126–129
Borel set, 127, 158, 287
boundary, 56
bounded variation, 207
  and absolute continuity, 227
  and continuity, 209
  and differentiability, 218
Bunyakovsky, Viktor Yakovlevich, 272
C, 16
c, 74
C¹, 228
Cantor, Georg, 24, 41, 42, 44–48, 51, 56, 60, 61, 64, 71, 73–75, 77, 82, 83, 99, 110, 152, 153, 204, 288
Cantor ternary set, 83–87, 128
Cantor's theorem, 77, 112, 128
Carathéodory, Constantin, 140
Carathéodory condition, 140
cardinal number, 152
cardinality, 72
  of 74
Carleson, Lennart Axel Edvard, 270
Carleson–Hunt theorem, 270
category
  first, 111, 112
  second, 111
Cauchy, Augustin Louis, 1, 3, 7, 8, 11, 67, 132, 156, 272
Cauchy criterion, 19, 263
Cauchy integral, 7
Cauchy sequence, 18, 62
Cauchy–Schwarz–Bunyakovski Inequality, 272
ceiling, 16
Cesàro, Ernesto, 245
Cesàro limit, 245, 246
Chae, Soo Bong, 213
change of variables
  in Lebesgue integral, 239
characteristic function, 122
  of the rationals, 3, 71
Chasles, Michel, 24
Church–Kleene ordinal, 289
class, 115, 290
closed set, 55
  compact, 65
closure, 56
cluster point, 47
Cohen, Paul, 75, 155
compact set, 66
  closed and bounded, 65
  continuous image of, 66
complement, 16
complete orthogonal set, 273
completeness, 62
  of L^p, 264
content, 44, 44, 123
continuity, 16, 54
  absolute, 225, 226, 227
    of total variation, 229
  and compactness, 66
  and differentiability, 12, 36–38, 203, 212, 213
  and integrability, 20, 44–46
  and oscillation at a point, 43
  approximate, 240
  Cesàro, 249
  of infinite series, 19
  of integral, 20
  piecewise, 42
  relative to a set, 114
  uniform, 16, 19, 67
continuously differentiable function, 228
continuum, 74
continuum hypothesis, 75, 95, 153, 156
convergence, 17
  absolute, 19
  almost everywhere, 162, 171
  almost uniform, 192, 195
  bounded, 184
  Cesàro, 245
    (C, 1), 248
    (C, k), 249
  in measure, 194
  Kronecker's, 110, 194
  pointwise, 17
  table of, 251
  uniform, 17, 19, 35, 98, 183
  uniform in general, 42
countable additivity, 143
countable set, 63
countably infinite, 63
Courant, Richard, 12
cover
  countable, 134
  finite, 44
  open, 66
  subcover, 66
D'Antonio, Larry, 9n
Darboux, Gaston, 12, 24, 24, 25, 33, 36, 39, 116
Darboux integral
  lower, 28
  upper, 28
Darboux sum, 26
  lower, 26
  upper, 26
Darboux's functions, 36, 39
Darboux's theorem, 19, 21
de Freycinet, Charles, 11
de la Vallée Poussin, Charles Jean Gustave Nicolas, 226
decreasing function, 17
decreasing sequence, 17
Dedekind, Julius Wilhelm Richard, 23, 24, 51, 59–61
Dedekind cut, 61
deleted neighborhood, 54
DeMorgan's laws, 18
Denjoy, Arnaud, 292
dense, 45, 57
denumerable set, 63
derivative
  Dini, 204, 219
  of infinite series, 19
derived set, 47, 56, 82
devil's staircase, 85–87, 297
Dini, Ulisse, 46, 90, 104, 150, 204, 206, 210, 212, 245
Dini derivative, 204, 219
Dini's theorem, 205
Dirichlet, Peter Gustav Lejeune, 1, 3, 4, 23, 24, 41, 67, 68, 71, 110, 242, 244, 283
Dirichlet's function, 3, 45, 71, 100, 115, 167, 199, 242, 293
discontinuity
  pointwise, 45, 112–114
  total, 45, 112
discrete set, 81
distance between functions, 255
dominated convergence theorem, 183, 188
DS. See devil's staircase
du Bois-Reymond, Paul David Gustav, 11, 12, 37, 46, 99, 103, 104, 120, 121, 242
Dugac, Pierre, 67, 67n
Duhamel, Jean Marie Constant, 36
Egorov, Dimitri Fedorovich, 192, 198
Egorov's theorem, 192, 193
Einstein, Albert, 75
empty set, 16
equi-integrable sequence, 271
equivalence class, 150
Erdős, Paul, 247
Euclid, 52n
Faber, Georg, 212
Faber–Chisholm–Young theorem, 218
Fatou, Pierre Joseph Louis, 187
Fatou's lemma, 187
Fejér, Lipót, 243, 247, 249
Fields Medal, 75n
finite cover, 44
first category, 111, 112
first fundamental theorem of measure theory, 69, 146n
first species, 47, 81
Fischer, Ernst Sigismund, 253, 264, 269
floor, 16
Fourier, Jean Baptiste Joseph, 7, 10, 24, 41, 283
Fourier series, 2–4, 41–42, 114
  Carleson–Hunt theorem, 270
  convergence, 279
  Dini's condition, 245
  Dirichlet's conditions, 242
  Fejér's condition, 247
  Jordan's conditions, 245
  Lebesgue's condition, 247
  Lipschitz's conditions, 244
  Riesz–Fischer condition, 269
Fraenkel, Adolf Abraham Halevi, 154
Fréchet, Maurice René, 56n, 253
fundamental theorem of calculus, 8–12
  antidifferentiation, 9, 38, 203, 223, 232, 294
  evaluation, 9, 223, 236, 293
Γ_α, 103
Γ-point, 103
gauge, 292
Gauss, Carl Friedrich, 1, 23
generalized Fourier coefficient, 277
generalized Riemann integral, 292, 295
Gödel, Kurt, 75, 76, 155
Granville, William Anthony, 11
Grattan-Guinness, Ivor, 9n
greatest lower bound, 17
Hadamard, Jacques Salomon, 154, 156, 252
Halphen, George Henri, 252
Hamming, Richard Wesley, 282
Hankel, Hermann, 24, 40, 42, 44–46, 48, 83, 94, 112, 120, 204
Hardy, Godfrey Harold, 11
Harnack, Axel, 46, 63, 226, 251, 253
Hartogs, Friedrich Moritz, 152
Hausdorff, Felix, 96, 140, 154, 156, 253
Hawkins, Thomas, 46, 64n
Heine, Heinrich Eduard, 24, 40, 42, 67, 68, 98, 110, 204
Heine–Borel theorem, 65, 67–69, 105
Henstock, Ralph, 292
Hilbert, David, 153, 154, 253, 272
Hilbert space, 271
Hobson, Ernest William, 11
Hochkirchen, Thomas, 23n
Hölder, Otto Ludwig, 99
Hölder–Riesz inequality, 258
Hunt, Richard A., 270
ibn al-Haytham, Abu 'Ali al-Hasan, 4
image, 16
improper integral, 31, 189
increasing function, 17
increasing sequence, 17
inf, 17
infimum of sequence of sets, 150
infinite series
  differentiation, 19
  integration, 19
  of continuous functions, 19
infinitesimal, 53, 156
inner content, 123
inner measure, 136
inner product, 255
  of Hilbert space, 271
integral
  Cauchy, 7
  generalized Riemann, 292, 295
  improper, 31, 189
  Lebesgue, 171, 295
    absolute continuity, 226
    change of variables, 239
    integration by parts, 240
    of simple function, 169
    properties, 169, 176, 223
  null, 172
  of continuous function, 20
  of infinite series, 19
  over set of measure zero, 172
  over small domain, 179
  Riemann, 8, 23, 25, 48, 120, 132, 165, 166
    necessary and sufficient conditions for existence, 27, 28, 30, 44
  Weierstrass, 121, 126
integration, 4–8
integration by parts, 240
interior, 56
intermediate value property, 16, 19
intermediate value theorem, 19
Jordan, Camille, 44, 51, 123, 124, 125, 128, 131, 140, 206, 226, 245, 283
Jordan decomposition theorem, 208
Jordan measure, 125, 128, 132, 158
Jourdain, Philip Edward Bertrand, 154
Kepler, Johannes, 4
Klein, Felix, 63
König, Julius, 153
Kronecker, Leopold, 110
Kronecker's convergence, 110, 194
Kronecker's theorem, 110
Krull, Wolfgang, 156
Kummer, Ernst Eduard, 41, 99
Kurzweil, Jaroslav, 292
L^p, 256
  closed under addition, 256
  completeness, 264
  containment, 260
L^∞, 256
Lacroix, Sylvestre François, 7
Laplace, Pierre-Simon, 284
Lardner, Dionysius, 11
law of cosines, 254
least upper bound, 17
Lebesgue, Henri Léon, 8, 13, 56, 65, 67, 67n, 68, 82, 98, 115–116, 125, 126, 129, 131, 133, 134, 140, 152, 154, 156, 166, 173, 183, 188, 203, 212, 226, 241, 247, 270, 283, 288–290
Lebesgue's dominated convergence theorem, 183, 188
Lebesgue inner measure, 135
Lebesgue integral, 171, 295
  absolute continuity, 226
  change of variables, 239
  integration by parts, 240
  of simple function, 169
  over small domain, 179
  properties, 169, 176, 223
Lebesgue measure, 135, 138
  Carathéodory condition, 140
Lebesgue outer measure, 134, 136
Lebesgue point, 240
Lebesgue singular function. See devil's staircase, 85
Leibniz, Gottfried Wilhelm, 6, 7, 53, 132
length, 44
Levi, Beppo, 173
lim inf, 17
  of sequence of sets, 150
lim sup, 17
  of sequence of sets, 150
limit point, 47
Lindemann, Ferdinand, 272
Liouville, Joseph, 59, 73
Lipschitz, Rudolf Otto Sigismund, 228n, 244
Lipschitz condition, 228, 244
Littlewood, John Edensor, 191
Littlewood's three principles, 191
Liu Hui, 4
local additivity, 141
lower Darboux integral, 28
lower Darboux sum, 26
Luzin, Nikolai Nikolaevich, 192, 198, 282, 292
Luzin's theorem, 192, 199
max, 17
mean value theorem, 18
measurable function, 159, 159
  as limit of simple functions, 164
  as limit of step functions, 197
  closure, 160
  equivalent definitions, 160
  limit, 161, 162
measurable set, 137, 160
measure, 136
  Borel, 126–129
  Carathéodory condition, 140
  countable additivity, 143
  inner, 136
  Jordan, 125, 128, 132, 158
  Lebesgue, 135, 138
  Lebesgue inner, 135, 138
  Lebesgue outer, 134, 136, 138
  local additivity, 141
  outer, 136
  zero, 172
Méray, Hugues Charles, 60, 61
metric space, 257
min, 17
Minkowski, Hermann, 140, 258
Minkowski–Riesz inequality, 258
monotone convergence theorem, 174
monotonic function, 17
  piecewise, 17
monotonic sequence, 17
Montel, Paul, 69
Moore, Eliakim Hastings, 226
Moore, Gregory H., 157n
N, 16
neighborhood, 54
  deleted, 54
nested interval principle, 18, 62, 74
Newton, Isaac, 4, 5, 14, 132, 284
Noether, Max, 99
nonstandard analysis, 156
nonmeasurable set, 151
nonstandard analysis, 53
norm, 253
  of a function, 255
  vector space, 257
normal, 131
  simply, 131
nowhere dense, 82
  and perfect, 83, 84
open cover, 66
open set, 54, 62
ordinal number, 152
  Church–Kleene, 289
  transfinite, 288
orthogonal set, 273
  complete, 273
orthonormal set, 277
oscillation, 26, 43
  and continuity, 43
  at a point, 43
  over an interval, 43
Osgood, William Fogg, 82, 99, 103–106, 109, 184, 189, 192, 282
Osgood's lemma, 105
outer content, 44, 123
  in the plane, 121
outer measure, 136
parallelogram law, 273
Parseval des Chênes, Marc-Antoine, 279
Parseval's equation, 279
partition
  tagged, 8
pea and sun theorem, 155
Peano, Giuseppe, 44, 51, 122, 123, 124
perfect set, 83
  and nowhere dense, 83, 84, 94
Perron, Oskar, 292
piecewise continuity, 42
piecewise monotonic function, 17
Pincherle, Salvatore, 68
Poincaré, Jules Henri, 284
pointwise convergence, 17
pointwise discontinuity, 45, 112–114
Poisson, Siméon Denis, 10, 10, 15
Pólya, George, 247
power set, 77
punctured neighborhood, 54
R, 16
Raabe, J. L., 212
Renfro, Dave, 56n, 290n
Riemann, Georg Friedrich Bernhard, 1, 4, 8, 23, 24, 27, 30, 33, 41–43, 132, 156, 167, 204, 243, 283
Riemann integral, 8, 23, 25, 48, 120, 132, 165, 166
  improper, 31
  necessary and sufficient conditions for existence, 27, 28, 30, 44
Riemann sum, 25
Riemann's function, 33, 36, 44, 45, 120
Riesz, Frigyes, 195, 213, 214, 253, 256, 258, 264, 269
Riesz's theorem, 196
Riesz–Fischer theorem, 269
rising sun lemma, 214
Robinson, Abraham, 53, 156
Rubel, Lee Albert, 213
Schmidt, Erhard, 253
Schönflies, Arthur, 68, 69, 154
Schwarz, Hermann Amandus, 272
second category, 111
second fundamental theorem of measure theory, 146
second species, 47
sequence, 16
  absolute convergence, 19
  Cauchy, 18, 62
  Cauchy criterion for convergence, 19
  convergence, 17
  decreasing, 17
  equi-integrable, 271
  increasing, 17
  lim inf, 17
  lim sup, 17
  monotonic, 17
set
  Borel, 127, 158, 287
  boundary, 56
  Cantor ternary, 83–87, 128
  cardinality, 72
  closed, 55
  closure, 56
  compact, 66
  complement, 16
  complete, 62
  complete orthogonal, 273
  countable, 63
  countably infinite, 63
  dense, 45, 57
  denumerable, 63
  derived, 47, 56, 82
  difference, 16
  discrete, 81
  distributivity, 18
  empty, 16
  first species, 47, 81
  inf, 17
  interior, 56
  intersection, 16
  max, 17
  measurable, 137, 141, 143, 144, 147, 160
  min, 17
  nonmeasurable, 151
  notation, 15
  nowhere dense, 82
    and perfect, 83, 84, 94
  open, 54, 62
  orthogonal, 273
  orthonormal, 277
  perfect, 83
    and nowhere dense, 83, 84, 94
  power, 77
  second species, 47
  separable, 285n
  sup, 17
  SVC, 83, 85
  symmetric difference, 146
  type 1, 47
  type n, 47
  union, 16
  well-ordered, 152
shadow point, 214
Sierpiński, Wacław, 156
σ-algebra, 127
simple function, 163
  approximation by, 178
simply normal, 131
singular function, 239
Skolem, Thoralf Albert, 154
Smith, Henry J. S., 83, 99
Solovay, Robert M., 155
step function, 196
Stolz, Otto, 44, 226
Struik, Dirk J., 6n
subadditivity, 136
subcover, 66
sup, 17
supremum of sequence of sets, 150
SVC set, 83, 85
SVC(4), 90
SVC(n), 96
symmetric difference, 146
Szegő, Gábor, 247
tag, 8, 25
tagged partition, 8, 292
Tarski, Alfred, 155
Teichmüller, Paul Julius Oswald, 156
term-by-term differentiation, 19
term-by-term integration, 13, 19, 82, 108, 110, 120, 178, 187
topology, 53, 284
total discontinuity, 45, 112
total order, 152
total variation, 207
transfinite ordinal, 288
triangle inequality, 257
trichotomy property, 152
trigonometric series, 41–42, 46
type 1, 47
type n, 47
uniform continuity, 16, 67
uniform convergence, 17, 19, 35, 98, 183
uniform convergence in general, 42
upper Darboux integral, 28
upper Darboux sum, 26
Van Vleck, Edward Burr, 154
variation
  bounded, 207
  total, 207
Veblen, Oswald, 69
vector space, 18
Vitali, Giuseppe, 69, 150, 151, 152, 154, 192, 226
Volterra, Vito, 28, 82, 83, 89, 90, 99, 114, 122
Volterra's function, 91–94, 116, 120, 122, 125, 133, 203, 282
Volterra's theorem, 114
von Harnack, Adolf, 63
von Neumann, John, 247
Wapner, Leonard P., 155n
Weierstrass, Karl Theodor Wilhelm, 12n, 13, 37, 41, 51, 59–62, 68, 70, 98, 120–122, 183, 203, 212
Weierstrass integral, 121, 126
Weierstrass's functions, 37, 203, 212
Weigl, Rudolf Stefan, 264
well-ordered set, 152
Wigner, Eugene, 51
Young, Grace Chisholm, 56, 212
Young, William Henry, 56, 68, 96, 212
Zermelo, Ernst, 153, 154
Zermelo–Fraenkel axioms, 154, 155
ZF, 154
ZFC, 154

You might also like