Analysis For Applied Mathematics - Ward Cheney


Graduate Texts in Mathematics 208

Editorial Board
S. Axler   F.W. Gehring   K.A. Ribet

Springer Science+Business Media, LLC


Graduate Texts in Mathematics
1 TAKEUTI/ZARING. Introduction to Axiomatic Set Theory. 2nd ed.
2 OXTOBY. Measure and Category. 2nd ed.
3 SCHAEFER. Topological Vector Spaces. 2nd ed.
4 HILTON/STAMMBACH. A Course in Homological Algebra. 2nd ed.
5 MAC LANE. Categories for the Working Mathematician. 2nd ed.
6 HUGHES/PIPER. Projective Planes.
7 SERRE. A Course in Arithmetic.
8 TAKEUTI/ZARING. Axiomatic Set Theory.
9 HUMPHREYS. Introduction to Lie Algebras and Representation Theory.
10 COHEN. A Course in Simple Homotopy Theory.
11 CONWAY. Functions of One Complex Variable I. 2nd ed.
12 BEALS. Advanced Mathematical Analysis.
13 ANDERSON/FULLER. Rings and Categories of Modules. 2nd ed.
14 GOLUBITSKY/GUILLEMIN. Stable Mappings and Their Singularities.
15 BERBERIAN. Lectures in Functional Analysis and Operator Theory.
16 WINTER. The Structure of Fields.
17 ROSENBLATT. Random Processes. 2nd ed.
18 HALMOS. Measure Theory.
19 HALMOS. A Hilbert Space Problem Book. 2nd ed.
20 HUSEMOLLER. Fibre Bundles. 3rd ed.
21 HUMPHREYS. Linear Algebraic Groups.
22 BARNES/MACK. An Algebraic Introduction to Mathematical Logic.
23 GREUB. Linear Algebra. 4th ed.
24 HOLMES. Geometric Functional Analysis and Its Applications.
25 HEWITT/STROMBERG. Real and Abstract Analysis.
26 MANES. Algebraic Theories.
27 KELLEY. General Topology.
28 ZARISKI/SAMUEL. Commutative Algebra. Vol. I.
29 ZARISKI/SAMUEL. Commutative Algebra. Vol. II.
30 JACOBSON. Lectures in Abstract Algebra I. Basic Concepts.
31 JACOBSON. Lectures in Abstract Algebra II. Linear Algebra.
32 JACOBSON. Lectures in Abstract Algebra III. Theory of Fields and Galois Theory.
33 HIRSCH. Differential Topology.
34 SPITZER. Principles of Random Walk. 2nd ed.
35 ALEXANDER/WERMER. Several Complex Variables and Banach Algebras. 3rd ed.
36 KELLEY/NAMIOKA et al. Linear Topological Spaces.
37 MONK. Mathematical Logic.
38 GRAUERT/FRITZSCHE. Several Complex Variables.
39 ARVESON. An Invitation to C*-Algebras.
40 KEMENY/SNELL/KNAPP. Denumerable Markov Chains. 2nd ed.
41 APOSTOL. Modular Functions and Dirichlet Series in Number Theory. 2nd ed.
42 SERRE. Linear Representations of Finite Groups.
43 GILLMAN/JERISON. Rings of Continuous Functions.
44 KENDIG. Elementary Algebraic Geometry.
45 LOEVE. Probability Theory I. 4th ed.
46 LOEVE. Probability Theory II. 4th ed.
47 MOISE. Geometric Topology in Dimensions 2 and 3.
48 SACHS/WU. General Relativity for Mathematicians.
49 GRUENBERG/WEIR. Linear Geometry. 2nd ed.
50 EDWARDS. Fermat's Last Theorem.
51 KLINGENBERG. A Course in Differential Geometry.
52 HARTSHORNE. Algebraic Geometry.
53 MANIN. A Course in Mathematical Logic.
54 GRAVER/WATKINS. Combinatorics with Emphasis on the Theory of Graphs.
55 BROWN/PEARCY. Introduction to Operator Theory I: Elements of Functional Analysis.
56 MASSEY. Algebraic Topology: An Introduction.
57 CROWELL/FOX. Introduction to Knot Theory.
58 KOBLITZ. p-adic Numbers, p-adic Analysis, and Zeta-Functions. 2nd ed.
59 LANG. Cyclotomic Fields.
60 ARNOLD. Mathematical Methods in Classical Mechanics. 2nd ed.
61 WHITEHEAD. Elements of Homotopy Theory.
62 KARGAPOLOV/MERLZJAKOV. Fundamentals of the Theory of Groups.
63 BOLLOBAS. Graph Theory.
64 EDWARDS. Fourier Series. Vol. I. 2nd ed.
65 WELLS. Differential Analysis on Complex Manifolds. 2nd ed.
(continued after index)
Ward Cheney

Analysis for Applied


Mathematics
With 27 Illustrations

Springer
Ward Cheney
Department of Mathematics
University of Texas at Austin
Austin, TX 78712-1082
USA

Editorial Board
S. Axler, Mathematics Department, San Francisco State University, San Francisco, CA 94132, USA
F.W. Gehring, Mathematics Department, East Hall, University of Michigan, Ann Arbor, MI 48109, USA
K.A. Ribet, Mathematics Department, University of California at Berkeley, Berkeley, CA 94720-3840, USA

Mathematics Subject Classification (2000): 46Bxx, 65L60, 32Wxx, 42B10

Library of Congress Cataloging-in-Publication Data


Cheney, E. W. (Elliott Ward), 1929-
Analysis for applied mathematics / Ward Cheney.
p. cm. - (Graduate texts in mathematics; 208)
Includes bibliographical references and index.
ISBN 978-1-4419-2935-8 ISBN 978-1-4757-3559-8 (eBook)
DOI 10.1007/978-1-4757-3559-8
1. Mathematical analysis. I. Title. II. Series.
QA300.C4437 2001
515-dc21 2001-1020440

Printed on acid-free paper.

© 2001 Springer Science+Business Media New York


Originally published by Springer-Verlag New York, Inc in 2001.
Softcover reprint of the hardcover 1st edition 2001
All rights reserved. This work may not be translated or copied in whole or in part without the
written permission of the publisher (Springer Science+Business Media, LLC), except for
brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form
of information storage and retrieval, electronic adaptation, computer software, or by similar or
dissimilar methodology now known or hereafter developed is forbidden.
The use of general descriptive names, trade names, trademarks, etc., in this publication, even if
the former are not especially identified, is not to be taken as a sign that such names, as understood
by the Trade Marks and Merchandise Marks Act, may accordingly be used freely by anyone.

Production managed by Terry Kornak; manufacturing supervised by Jerome Basma.


Photocomposed from the author's TeX files.

9 8 7 6 5 4 3 2 1

SPIN 10833405
Preface

This book evolved from a course at our university for beginning graduate stu-
dents in mathematics-particularly students who intended to specialize in ap-
plied mathematics. The content of the course made it attractive to other math-
ematics students and to graduate students from other disciplines such as en-
gineering, physics, and computer science. Since the course was designed for
two semesters' duration, many topics could be included and dealt with in de-
tail. Chapters 1 through 6 reflect roughly the actual nature of the course, as it
was taught over a number of years. The content of the course was dictated by
a syllabus governing our preliminary Ph.D. examinations in the subject of ap-
plied mathematics. That syllabus, in turn, expressed a consensus of the faculty
members involved in the applied mathematics program within our department.
The text in its present manifestation is my interpretation of that syllabus: my
colleagues are blameless for whatever flaws are present and for any inadvertent
deviations from the syllabus.
The book contains two additional chapters having important material not
included in the course: Chapter 8, on measure and integration, is for the ben-
efit of readers who want a concise presentation of that subject, and Chapter 7
contains some topics closely allied, but peripheral, to the principal thrust of the
course.
This arrangement of the material deserves some explanation. The ordering
of chapters reflects our expectation of our students: If they are unacquainted
with Lebesgue integration (for example), they can nevertheless understand the
examples of Chapter 1 on a superficial level, and at the same time, they can
begin to remedy any deficiencies in their knowledge by a little private study
of Chapter 8. Similar remarks apply to other situations, such as where some
point-set topology is involved; Section 7.6 will be helpful here. To summarize:
We encourage students to wade boldly into the course, starting with Chapter 1,
and, where necessary, fill in any gaps in their prior preparation. One advantage
of this strategy is that they will see the necessity for topology, measure theory,
and other topics - thus becoming better motivated to study them. In keeping
with this philosophy, I have not hesitated to make forward references in some
proofs to material coming later in the book. For example, the Banach contraction
mapping theorem is needed at least once prior to the section in Chapter 4 where
it is dealt with at length.
Each of the book's six main topics could certainly be the subject of a year's
course (or a lifetime of study), and many of our students indeed study functional
analysis and other topics of the book in separate courses. Most of them eventu-
ally or simultaneously take a year-long course in analysis that includes complex
analysis and the theory of measure and integration. However, the applied math-
ematics course is typically taken in the first year of graduate study. It seems
to bridge the gap between the undergraduate and graduate curricula in a way
that has been found helpful by many students. In particular, the course and the


book certainly do not presuppose a thorough knowledge of integration theory nor


of topology. In our applied mathematics course, students usually enhance and
reinforce their knowledge of undergraduate mathematics, especially differential
equations, linear algebra, and general mathematical analysis. Students may, for
the first time, perceive these branches of mathematics as being essential to the
foundations of applied mathematics.
The book could just as well have been titled Prolegomena to Applied Math-
ematics, inasmuch as it is not about applied mathematics itself but rather about
topics in analysis that impinge on applied mathematics. Of course, there is
no end to the list of topics that could lay claim to inclusion in such a book.
Who is bold enough to predict what branches of mathematics will be useful in
applications over the next decade? A look at the past would certainly justify
my favorite algorithm for creating an applied mathematician: Start with a pure
mathematician, and turn him or her loose on real-world problems.
As in some other books I have been involved with, I owe a great debt of
gratitude to Ms. Margaret Combs, our departmental TeX-pert. She typeset and
kept up-to-date the notes for the course over many years, and her resourcefulness
made my burden much lighter.
The staff of Springer-Verlag has been most helpful in seeing this book to
completion. In particular, I worked closely with Dr. Ina Lindemann and Ms.
Terry Kornak on editorial matters, and I thank them for their efforts on my
behalf. I am indebted to David Kramer for his meticulous copy-editing of the
manuscript; it proved to be very helpful in the final editorial process.
I thank my wife, Victoria, for her patience and assistance during the period
of work on the book, especially the editorial phase. I dedicate the book to her
in appreciation.
I will be pleased to hear from readers having questions or suggestions
for improvements in the book. For this purpose, electronic mail is efficient:
[email protected]. I will also maintain a web site for material related
to the book at http://www.math.utexas.edu/users/cheney/AAMbook
Ward Cheney
Department of Mathematics
University of Texas at Austin
Contents

Preface .................................................................... v

Chapter 1. Normed Linear Spaces ..................................... 1


1.1 Definitions and Examples ............................................ 1
1.2 Convexity, Convergence, Compactness, Completeness ................. 6
1.3 Continuity, Open Sets, Closed Sets .................................. 15
1.4 More About Compactness .......................................... 19
1.5 Linear Transformations ............................................. 24
1.6 Zorn's Lemma, Hamel Bases, and the Hahn-Banach Theorem ....... 30
1.7 The Baire Theorem and Uniform Boundedness ...................... 40
1.8 The Interior Mapping and Closed Mapping Theorems ............... 47
1.9 Weak Convergence ................................................. 53
1.10 Reflexive Spaces .................................................... 58

Chapter 2. Hilbert Spaces . ............................................ 61


2.1 Geometry .......................................................... 61
2.2 Orthogonality and Bases ............................................ 70
2.3 Linear Functionals and Operators ................................... 81
2.4 Spectral Theory .................................................... 91
2.5 Sturm-Liouville Theory ........................................... 105

Chapter 3. Calculus in Banach Spaces .............................. 115


3.1 The Frechet Derivative ............................................ 115
3.2 The Chain Rule and Mean Value Theorems ........................ 121
3.3 Newton's Method ................................................. 125
3.4 Implicit Function Theorems ....................................... 135
3.5 Extremum Problems and Lagrange Multipliers ..................... 145
3.6 The Calculus of Variations ........................................ 152

Chapter 4. Basic Approximate Methods of Analysis . ............. . 170


4.1 Discretization ..................................................... 170
4.2 The Method of Iteration ........................................... 176
4.3 Methods Based on the Neumann Series ........................... 186
4.4 Projections and Projection Methods ............................... 191
4.5 The Galerkin Method ............................................. 198
4.6 The Rayleigh-Ritz Method ........................................ 205
4.7 Collocation Methods .............................................. 213
4.8 Descent Methods .................................................. 226
4.9 Conjugate Direction Methods ...................................... 232
4.10 Methods Based on Homotopy and Continuation .................... 237


Chapter 5. Distributions .............................................. 246


5.1 Definitions and Examples .......................................... 246
5.2 Derivatives of Distributions ........................................ 253
5.3 Convergence of Distributions ...................................... 257
5.4 Multiplication of Distributions by Functions ....................... 260
5.5 Convolutions ...................................................... 268
5.6 Differential Operators ............................................. 273
5.7 Distributions with Compact Support .............................. 280

Chapter 6. The Fourier Transform . ................................. 287


6.1 Definitions and Basic Properties ................................... 287
6.2 The Schwartz Space .............................................. 294
6.3 The Inversion Theorems ........................................... 301
6.4 The Plancherel Theorem .......................................... 305
6.5 Applications of the Fourier Transform ............................. 310
6.6 Applications to Partial Differential Equations ...................... 318
6.7 Tempered Distributions ........................................... 321
6.8 Sobolev Spaces .................................................... 325

Chapter 7. Additional Topics .. ...................................... 333


7.1 Fixed-Point Theorems ............................................ 333
7.2 Selection Theorems ................................................ 339
7.3 Separation Theorems .............................................. 342
7.4 The Arzela-Ascoli Theorems ...................................... 347
7.5 Compact Operators and the Fredholm Theory ..................... 351
7.6 Topological Spaces ................................................ 361
7.7 Linear Topological Spaces ......................................... 367
7.8 Analytic Pitfalls ................................................... 373

Chapter 8. Measure and Integration . ............................... 381


8.1 Extended Reals, Outer Measures, Measurable Spaces ............... 381
8.2 Measures and Measure Spaces ..................................... 386
8.3 Lebesgue Measure ................................................. 391
8.4 Measurable Functions ............................................. 394
8.5 The Integral for Nonnegative Functions ............................ 399
8.6 The Integral, Continued ........................................... 404
8.7 The LP-Spaces .................................................... 409
8.8 The Radon-Nikodym Theorem .................................... 413
8.9 Signed Measures .................................................. 417
8.10 Product Measures and Fubini's Theorem .......................... .420
References ............................................................... 429
Index .................................................................... 437
Symbols .................................................................. 443
Chapter 1

Normed Linear Spaces

1.1 Definitions and Examples 1


1.2 Convexity, Convergence, Compactness, Completeness 6
1.3 Continuity, Open Sets, Closed Sets 15
1.4 More about Compactness 19
1.5 Linear Transformations 24
1.6 Zorn's Lemma, Hamel Bases, and the Hahn-Banach Theorem 30
1.7 The Baire Theorem and Uniform Boundedness 40
1.8 The Interior Mapping and Closed Mapping Theorems 47
1.9 Weak Convergence 53
1.10 Reflexive Spaces 58

1.1 Definitions and Examples

This chapter gives an introduction to the theory of normed linear spaces. A


skeptical reader may wonder why this topic in pure mathematics is useful in
applied mathematics. The reason is quite simple: Many problems of applied
mathematics can be formulated as a search for a certain function, such as the
function that solves a given differential equation. Usually the function sought
must belong to a definite family of acceptable functions that share some useful
properties. For example, perhaps it must possess two continuous derivatives.
The families that arise naturally in formulating problems are often linear spaces.
This means that any linear combination of functions in the family will be another
member of the family. It is common, in addition, that there is an appropriate
means of measuring the "distance" between two functions in the family. This
concept comes into play when the exact solution to a problem is inaccessible,
while approximate solutions can be computed. We often measure how far apart
the exact and approximate solutions are by using a norm. In this process we are
led to a normed linear space, presumably one appropriate to the problem at hand.
Some normed linear spaces occur over and over again in applied mathematics,
and these, at least, should be familiar to the practitioner. Examples are the
space of continuous functions on a given domain and the space of functions
whose squares have a finite integral on a given domain. A knowledge of function
spaces enables an applied mathematician to consider a problem from a more

lofty viewpoint, from which he or she may have the advantage of being more
aware of significant features as distinguished from less significant details.
We begin by reviewing the concept of a vector space, or linear space.
(These terms are interchangeable.) The reader is probably already familiar with
these spaces, or at least with the example of vectors in JRn. However, many
function spaces are also linear spaces, and much can be learned about these
function spaces by exploiting their similarity to the more elementary examples.
Here, as a reminder, we include the axioms for a vector space or linear space.
A real vector space is a triple (X, +, .), in which X is a set, and + and·
are binary operations satisfying certain axioms. Here are the axioms:
(i) If x and y belong to X, then so does x + y (closure axiom).
(ii) x + y = y + x (commutativity).
(iii) x + (y + z) = (x + y) + z (associativity).
(iv) X contains a unique element, 0, such that x + 0 = x for all x in X.
(v) With each element x there is associated a unique element, -x, such
that x + (-x) = 0.
(vi) If x ∈ X and λ ∈ ℝ, then λ·x ∈ X (ℝ denotes the set of real numbers)
(closure axiom).
(vii) λ·(x + y) = λ·x + λ·y (λ ∈ ℝ) (distributivity).
(viii) (λ + μ)·x = λ·x + μ·x (λ, μ ∈ ℝ) (distributivity).
(ix) λ·(μ·x) = (λμ)·x (associativity).
(x) 1·x = x.
These axioms need not be intimidating. The essential feature of a linear space
is that there is an addition defined among the elements of X, and when we add
two elements, the result is again in the space X. One says that the space is
closed (algebraically) under the operation of addition. A similar remark holds
true for multiplication of an element by a real number. The remaining axioms
simply tell us that the usual rules of arithmetic are valid for the two operations.
Most rules that you expect to be true are indeed true, but if they do not appear
among the axioms it is because they follow from the axioms. The effort to keep
the axioms minimal has its rewards: When one must verify that a given system
is a real vector space there will be a minimum of work involved!
In this set of axioms, the first five define an (additive) Abelian group. In
axiom (iv), the uniqueness of 0 need not be mentioned, for it can be proved
with the aid of axiom (ii). Usually, if λ ∈ ℝ and x ∈ X, we write λx in place
of λ·x. The reader will note the ambiguity in the symbol + and the symbol
0. For example, when we write 0x = 0 two different zeros are involved, and in
axiom (viii) the plus signs are not the same. We usually write x - y in place of
x + (-y). Furthermore, we are not going to belabor elementary consequences of
the axioms such as λ Σ_{i=1}^n x_i = Σ_{i=1}^n λx_i. We usually refer to X as the linear space
rather than (X, +, ·). Observe that in a linear space, we have no way of assigning
a meaning to expressions that involve a limiting process, such as Σ_{i=1}^∞ x_i. This
drawback will disappear soon, upon the introduction of a norm.
From time to time we will prefer to deal with a complex vector space. In
such a space λ·x is defined (and belongs to X) whenever λ ∈ ℂ and x ∈ X. (The
symbol ℂ denotes the set of complex numbers.) Other fields can be employed
in place of ℝ and ℂ, but they are rarely useful in applied mathematics. The
field elements are often termed scalars, and the elements of X are often called
vectors.
Let X be a vector space. A norm on X is a real-valued function, denoted
by || ||, that fulfills three axioms:
(i) ||x|| > 0 for each nonzero element x in X.
(ii) ||λx|| = |λ| ||x|| for each λ in ℝ and each x in X.
(iii) ||x + y|| ≤ ||x|| + ||y|| for all x, y ∈ X. (Triangle Inequality)
A vector space in which a norm has been introduced is called a normed linear
space. Here are eleven examples.
Example 1. Let X = ℝ, and define ||x|| = |x|, the familiar absolute value
function.
Example 2. Let X = ℂ, where the scalar field is also ℂ. Use ||x|| = |x|, where
|x| has its usual meaning for a complex number x. Thus if x = a + ib (where a
and b are real), then |x| = √(a^2 + b^2). •
Example 3. Let X = ℂ, and take the scalar field to be ℝ. The terminology
we have adopted requires that this be called a real vector space, since the scalar
field is ℝ. •
Example 4. Let X = ℝ^n. Here the elements of X are n-tuples of real numbers
that we can display in the form x = [x(1), x(2), ..., x(n)] or x = [x_1, x_2, ..., x_n].
A useful norm is defined by the equation

||x||_∞ = max_{1≤i≤n} |x(i)|

Note that an n-tuple is a function on the set {1, 2, ..., n}, and so the notation
x(i) is consistent with that interpretation. (This is the "sup" norm.) •
Example 5. Let X = ℝ^n, and define a norm by the equation ||x|| =
Σ_{i=1}^n |x(i)|. Observe that in Examples 4 and 5 we have two distinct normed
linear spaces, although each involves the same linear space. This shows the ad-
vantage of being more formal in the definition and saying that a normed linear
space is a pair (X, || ||), etc., but we refrain from doing this unless it is
necessary. •
Example 6. Let X be the set of all real-valued continuous functions defined
on a fixed compact interval [a, b]. The norm usually employed here is

||x|| = max_{a≤s≤b} |x(s)|

(The notation max_{a≤s≤b} |x(s)| denotes the maximum of the expression |x(s)| as
s runs over the interval [a, b].) The space X described here is often denoted
by C[a, b]. Sticklers would insist on C([a, b]), because C(S) will be used for
the continuous functions on some general domain S. (This again is the "sup"
norm.) •
4 Chapter 1 Normed Linear Spaces

Example 7. Let X be the set of all Lebesgue-integrable functions defined on
a fixed interval [a, b]. The usual norm for this space is ||x|| = ∫_a^b |x(s)| ds. In this
Jx(s)Jds. In this
space, the vectors are actually equivalence classes of functions, two functions
being regarded as equivalent if they differ only on a set of measure O. (The
reader who is unfamiliar with the Lebesgue integral can substitute the Riemann
integral in this example. The resulting spaces are different, one being complete
and the other not. This is a rather complicated matter, best understood after
the study of measure theory and Lebesgue integration. Chapter 8 is devoted to
this branch of analysis. The notion of completeness of a space is taken up in the
next section.) •
Example 8. Let X = ℓ, the space of all sequences in ℝ

x = [x(1), x(2), ...]

in which only a finite number of terms are nonzero. (The number of nonzero
terms is not fixed but can vary with different sequences.) Define ||x|| =
max_n |x(n)|. •
Example 9. Let X = ℓ∞, the space of all real sequences x for which
sup_n |x(n)| < ∞. Define ||x|| to be that supremum, as in Example 8. •
Example 10. Let X = Π, the space of all polynomials having real coefficients.
A typical element of Π is a function x having the form

x(t) = a_0 + a_1 t + a_2 t^2 + ··· + a_n t^n

One possible norm on Π is x ↦ max_i |a_i|. Others are x ↦ max_{0≤t≤1} |x(t)| or
x ↦ ∫_0^1 |x(t)| dt or x ↦ (Σ_i |a_i|^3)^{1/3}. •
Example 11. Let X = ℝ^n, and use the familiar Euclidean norm, defined by

||x||_2 = (Σ_{i=1}^n |x(i)|^2)^{1/2}
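As a quick numerical illustration of Examples 4, 5, and 11 (a Python sketch; the sample vector is chosen arbitrarily), the three norms of a single vector in ℝ^3 can be computed side by side, showing that the same linear space carries several distinct norms:

    # Three norms on the same linear space R^n (Examples 4, 5, and 11).
    x = [3.0, -4.0, 1.0]

    sup_norm = max(abs(t) for t in x)          # Example 4: the "sup" norm
    one_norm = sum(abs(t) for t in x)          # Example 5: sum of absolute values
    two_norm = sum(t * t for t in x) ** 0.5    # Example 11: the Euclidean norm

    print(sup_norm, one_norm, two_norm)        # 4.0  8.0  5.099...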
In all of these examples (as well as in others to come) it is regarded as
obvious how the algebraic structure is defined. A complete development would
define x + y, λx, 0, and -x, and then verify the axioms for a linear space. After
that, the alleged norm would be shown to satisfy the axioms for a norm. Thus,
in Example 6, the zero element is the function denoted by 0 and defined by
0(s) = 0 for all s ∈ [a, b]. The operation of addition is defined by the equation

(x + y)(s) = x(s) + y(s)


and so on.
The concept of linear independence is of central importance. Recall that a
subset S in a linear space is linearly independent if it is not possible to find a
finite, nonempty, set of distinct vectors x_1, x_2, ..., x_m in S and nonzero scalars
c_1, c_2, ..., c_m for which

c_1 x_1 + c_2 x_2 + ··· + c_m x_m = 0

(Linear independence is not a property of a point; it is a property of a set


of points. Because of this, the usage "the vectors ... are independent" is mis-
leading.) The reader probably recalls how this notion enters into the theory
of nth-order ordinary differential equations: A general solution must involve a
linearly independent set of n solutions.
Some other basic concepts to recall from linear algebra are mentioned here.
The span of a set S in a vector space X is denoted by span(S), and consists
of all vectors in X that are expressible as linear combinations of vectors in S.
Remember that linear combinations are always finite expressions of the form
Σ_{i=1}^n λ_i x_i. We say that "S spans X" when X = span(S). A base or basis
for a vector space X is any set that is linearly independent and spans X. Both
properties are essential. Any set that is linearly independent is contained in a
basis, and any set that spans the space contains a basis. A vector space is said
to be finite dimensional if it has a finite basis. An important theorem states
that if a space is finite dimensional, then every basis for that space has the same
number of elements. This common number is then called the dimension of the
space. (There is an infinite-dimensional version of this theorem as well.)
The material of this chapter is accessible in many textbooks and treatises,
such as: [Au], [Av], [BN], [Ban], [Bea], [CP], [Day], [Dies], [Dieu], [DS], [Edw],
[Frie2], [Fried], [GP], [Gre], [Gri], [HS], [HP], [Hol], [Horv], [Jam], [KA], [Kee],
[KF], [Kre], [Lan1], [Lo], [Moo], [NaSn], [OD], [Ped], [Red], [RS], [RN], [Roy],
[Ru1], [Sim], [Tay2], [Yo], and [Ze].

Problems 1.1

Here is a Chinese proverb that is pertinent to the problems: I hear, I forget; I see, I
remember; I do, I understand!

1. Let X be a linear space over the complex field. Let X_r be the space obtained from X by
restricting the scalars to the real field. Prove that X_r is a real linear space. Show by an
example that not every real linear space is of the form X_r for some complex linear space
X. Caution: When we say that a linear space is a real linear space, this has nothing to
do with the elements of the space. It means only that the scalar field is ℝ and not ℂ.
2. Prove the norm axioms for Examples 4-7.
3. Prove that in any normed linear space,

||0|| = 0    and    | ||x|| - ||y|| |  ≤  ||x - y||

4. Denote the norms in Examples 4 and 5 by || ||_∞ and || ||_1, respectively. Find the best
constants in the inequality

α ||x||_∞  ≤  ||x||_1  ≤  β ||x||_∞

Prove that your constants are the best. (The "constants" α and β will depend on n but
not on x.)
5. In Examples 4, 5, 6, and 7 find the precise conditions under which we have ||x + y|| =
||x|| + ||y||.
6. Prove that in any normed linear space, if x ≠ 0, then x/||x|| is a vector of norm 1.
7. The Euclidean norm on ℝ^n is defined in Example 11. Find the best constants in the
inequality α ||x||_∞ ≤ ||x||_2 ≤ β ||x||_∞.
6 Chapter 1 Normed Linear Spaces

8. What theorems in elementary analysis are needed to prove the closure axioms for Example 6?
9. What is the connection between the normed linear spaces ℓ and Π defined in Examples
8 and 10?
10. For any t in the open interval (0, 1), let t̂ be the sequence [t, t^2, t^3, ...]. Notice that
t̂ ∈ ℓ∞. Prove that the set {t̂ : 0 < t < 1} is linearly independent.
11. In the space Π we define special elements called monomials. They are given by x_n(t) =
t^n, where n = 0, 1, 2, ... Prove that {x_n : n = 0, 1, 2, 3, ...} is linearly independent.

12. Let T be a set of real numbers. We say that T is bounded above if there is an M
in ℝ such that t ≤ M for all t in T. We say that M is an upper bound of T. The
completeness axiom for ℝ asserts that if a set T is bounded above, then the set of
all its upper bounds is an interval of the form [b, ∞). The number b is the least upper
bound, or supremum, of T, written b = l.u.b.(T) = sup(T). Prove that if x < b, then
(x, ∞) ∩ T is nonempty. Give examples to show that [b, ∞) ∩ T can be empty or nonempty.
There are corresponding concepts of bounded below, lower bound, greatest lower
bound, and infimum.
13. Which of these expressions define norms on ℝ^2? Explain.
(a) max{|x(1)|, |x(1) + x(2)|}
(b) |x(2) - x(1)|
(c) |x(1)| + |x(2) - x(1)| + |x(2)|
14. Prove that in any normed linear space the conditions ||x|| = 1 and ||x - y|| < ε < 1 imply
that ||x - y/||y|| || < 2ε.
15. Prove that if N_1 and N_2 are norms on a linear space, then so are α_1 N_1 + α_2 N_2 (when
α_1 > 0 and α_2 > 0) and (N_1^2 + N_2^2)^{1/2}.
16. Is the following set of axioms for a norm equivalent to the set given in the text? (a) ||x|| ≠ 0
if x ≠ 0, (b) ||λx|| = -λ||x|| if λ ≤ 0, (c) ||x + y|| ≤ ||x|| + ||y||.
17. Prove that in a normed linear space, if ||x + y|| = ||x|| + ||y||, then ||αx + βy|| = ||αx|| + ||βy||
for all nonnegative α and β.
18. Why is the word "distinct" essential in our definition of linear independence on page 4?
19. Is the set of functions f_i(x) = |x - i|, where i = 1, 2, ..., linearly independent?
20. One example of an "exotic" vector space is described as follows. Let X be the set
of positive real numbers. We define an "addition", ⊕, by x ⊕ y = xy and a "scalar
multiplication" by a ⊙ x = x^a. Prove that (X, ⊕, ⊙) is a vector space.
21. In Example 10, two norms (say N_1 and N_2) were suggested. Do there exist constants
such that N_1 ≤ αN_2 or N_2 ≤ βN_1?
22. In Examples 4 and 5, let n = 2, and draw sketches of the sets {x ∈ ℝ^2 : ||x|| = 1}.
(Symmetries can be exploited.)

1.2 Convexity, Convergence, Compactness, Completeness

A subset K in a linear space is said to be convex if it contains every line segment


connecting two of its elements. Formally, convexity is expressed as follows:
[x ∈ K  &  y ∈ K  &  0 ≤ λ ≤ 1]  ⟹  λx + (1 - λ)y ∈ K
The notion of convexity arises frequently in optimization problems. For example,
the theory of linear programming (optimization of linear functions) is based on

the fact that a linear function on a convex polyhedral set must attain its extrema
at the vertices of the set. Thus, to locate the maxima of a linear function
over a convex polyhedral set, one need only test the vertices. The central idea
of Dantzig's famous simplex method is to move from vertex to vertex, always
improving the value of the objective function.
Another application of convexity occurs in studying deformations of a physi-
cal body. The "yield surface" of an object is generally convex. This is the surface
in 6-dimensional space that gives the stresses at which an object will fail struc-
turally. Six dimensions are needed to account for all the variables. See [Mar],
pages 100-104.
Among examples of convex sets in a linear space X we have:
(i) the space X itself;
(ii) any set consisting of a single point;
(iii) the empty set;
(iv) any linear subspace of X;
(v) any line segment; i.e., a set of the following form, in which a and b are
fixed:

{λa + (1 - λ)b : 0 ≤ λ ≤ 1}

In a normed linear space, another important convex set is the unit cell or unit
ball:

{x ∈ X : ||x|| ≤ 1}

In order to see that the unit ball is convex, let ||x|| ≤ 1, ||y|| ≤ 1, and 0 ≤ λ ≤ 1.
Then, with μ = 1 - λ,

||λx + μy|| ≤ ||λx|| + ||μy|| = λ||x|| + μ||y|| ≤ λ + μ = 1
If we let n = 2 in Examples 4 and 5 of Section 1.1, then we can draw pictures


of the unit balls. They are shown in Figures 1.1 and 1.2.

Figures 1.1 and 1.2. Unit balls


There is a family of norms on ℝ^n, known as the ℓ_p-norms, of which the norms
in Examples 4 and 5 are special cases. The general formula, for 1 ≤ p < ∞, is

||x||_p = (Σ_{i=1}^n |x(i)|^p)^{1/p}

The case p = ∞ is special; for it we use the formula

||x||_∞ = max_{1≤i≤n} |x(i)|

It can be shown (Problem 1) that lim_{p→∞} ||x||_p = ||x||_∞. (This explains the
notation.) The unit balls (in ℝ^2) for || ||_p are shown for p = 1, 2, and 7, in
Figure 1.3.

Figure 1.3. The unit balls in ℓ_p, for p = 1, 2, and 7.
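The limiting behavior asserted in Problem 1 is easy to observe numerically. A brief Python sketch (with an arbitrarily chosen vector) shows the p-norms decreasing toward the sup norm as p grows:

    # Observe ||x||_p approaching ||x||_inf as p increases.
    x = [1.0, -3.0, 2.0]
    sup_norm = max(abs(t) for t in x)
    for p in [1, 2, 4, 8, 16, 32, 64]:
        p_norm = sum(abs(t) ** p for t in x) ** (1.0 / p)
        print(p, p_norm, sup_norm)
    # The printed p-norms decrease toward sup_norm = 3.0.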


In any normed linear space there exists a metric (and its corresponding
topology) that arises by defining the distance between two points as
d(x, y) = ||x - y||
All the topological notions from the theory of metric spaces then become avail-
able in a normed linear space. (See Problem 23.) In Chapter 7, Section 6,
the theory of general topological spaces is broached. But we shall discuss here
topological concepts restricted to metric spaces or to normed linear spaces. A
sequence x_1, x_2, ... in a normed linear space is said to converge to a point x
(and we write x_n → x) if

lim_{n→∞} ||x_n - x|| = 0
For example, in the space of continuous functions on [0, 1] furnished with the
max-norm (as in Example 6 of Section 1, page 3), the sequence of functions
x_n(t) = sin(t/n) converges to 0, since

||x_n - 0|| = sup_{0≤t≤1} |sin(t/n)| = sin(1/n) → 0

The notion of convergence is often needed in applied mathematics. For example,


the solution to a problem may be a function that is difficult to find but can be
approached by a suitable sequence of functions that are easier to obtain. (Maybe
they can be explicitly calculated.) One then would need to know exactly in what
sense the sequence was approaching the actual solution to the problem.
A subset K in a normed space is said to be compact if each sequence
in K has a subsequence that converges to a point in K. (Caution: In general
topology, this concept would be called sequential compactness. Refer to Section
7.6.) A subsequence of a sequence Xl, X2, ... is of the form x nl ' x n2 ' ... , where
the integers ni satisfy nl < n2 < n3 < .... Our notation for a sequence is [xn J,
or [xn : n E NJ, or [Xl, X2, .. . J. With this meagre equipment we can already
prove some interesting results.
Section 1.2 Convexity, Convergence, Compactness, Completeness 9

Theorem 1. Let K be a compact set in a normed linear space X.


To each x in X there corresponds at least one point in K of minimum
distance from x.

Proof. Let x be any member of X. The distance from x to K is defined to be


the number
dist(x, K) = inf_{z∈K} ||x - z||

By the definition of an infimum (Problem 12 in Section 1.1, page 6), there exists
a sequence [y_n] in K such that ||x - y_n|| → dist(x, K). Since K is compact,
there is a subsequence converging to a point in K, say y_{n_i} → y ∈ K. Since

||x - y|| ≤ ||x - y_{n_i}|| + ||y_{n_i} - y||

we have in the limit ||x - y|| ≤ dist(x, K) ≤ ||x - y||. (The final inequality follows
from the definition of the distance function.) •
The preceding theorem can be useful in problems involving noisy measure-
ments. For example, suppose that a noisy measurement of a single entity x is
available. If a set K of admissible noise-free values for x is prescribed, then
the best noise-free estimate of x can be taken to be a point of K as close as
possible to x. Theorem 1 is also important in approximation theory, a branch
of analysis that provides the theoretical underpinning for many areas of applied
mathematics.
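When the admissible set K is finite (and hence compact), the nearest-point estimate just described reduces to a minimization over finitely many candidates. Here is a small Python sketch of that denoising step; the candidate set and the measurement are made-up data used only for illustration:

    # Best noise-free estimate: the element of a finite set K closest
    # (in the Euclidean norm) to the noisy measurement x.
    def nearest_point(x, K):
        return min(K, key=lambda z: sum((a - b) ** 2 for a, b in zip(x, z)) ** 0.5)

    K = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]   # admissible values
    x = (0.9, 0.2)                                         # noisy measurement
    print(nearest_point(x, K))                             # -> (1.0, 0.0)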
Example 1. On the real line, an open interval (a, b) is not compact, for we
can take a sequence in the interval that converges to the endpoint b, say. Then
every subsequence will also converge to b. Since b is not in the interval, the
interval cannot be compact. On the other hand, a closed and bounded interval,
say [a, b], is compact. This is a special case of the Heine-Borel theorem. See the
discussion before Lemma 1 in Section 1.4, page 20. •

Given a sequence [xn] in a normed linear space (or indeed in any metric
space), is it possible to determine, from the sequence alone, whether it con-
verges? This is certainly an important matter for practical purposes, since we
often use algorithms to generate sequences that should converge to a solution
of a given problem. The answer to the posed question is that we cannot infer
convergence, in general, solely from the sequence itself. If we confine ourselves to
the information contained in the sequence, we can construct the doubly indexed
sequence c_{nm} = ||x_n - x_m||. If [c_{nm}] does not converge to zero, then the given
sequence [x_n] cannot converge, as is easily proved: For any x in the space, write

c_{nm} = ||x_n - x_m|| ≤ ||x_n - x|| + ||x - x_m||

This shows that if c_{nm} does not converge to 0, then [x_n] cannot converge. On
the other hand, if Cnm converges to zero, one intuitively thinks that the sequence
ought to converge, and if it does not, there must be a flaw in the space itself: The
limit of the sequence should exist, but the limiting point is somehow missing from
the space. Think of the rational numbers as an example. The missing ingredient
is completeness of the space, to which we now turn.

A sequence [xn] in a normed linear space X is said to have the Cauchy


property or to be a Cauchy sequence if

lim_{n→∞}  sup_{i≥n, j≥n} ||x_i - x_j|| = 0

If every Cauchy sequence in the space X is convergent (to a point of X, of


course), then the space X is said to be complete. A complete normed linear
space is termed a Banach space, in honor of Stefan Banach, who lived from 1892
to 1945. His book [Ban] stimulated the study of functional analysis for several
decades. Examples 1-7, 9, and 11, given previously, are all Banach spaces.
The real number field ℝ is complete, and so is the complex number field ℂ.
The rational field ℚ is not complete. These facts are established in elementary
analysis courses.
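A familiar illustration of the failure of completeness for ℚ: the decimal truncations of √2 form a Cauchy sequence of rational numbers whose limit is irrational. A small Python sketch (floating-point, so only suggestive) of those truncations:

    # Decimal truncations of sqrt(2): rational, Cauchy, but the limit is not rational.
    import math

    approx = [round(math.sqrt(2.0), k) for k in range(1, 8)]
    gaps = [abs(approx[i + 1] - approx[i]) for i in range(len(approx) - 1)]
    print(approx)   # 1.4, 1.41, 1.414, ...
    print(gaps)     # successive differences shrink roughly like 10**(-k)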
Completeness is important in constructing solutions to a problem by taking
the limit of successive approximations. One often wants information about the
limit (i.e., the solution). Does it have the same properties as the approximations?
For example, if all the approximating functions are continuous, must the limit
also be continuous? If all the approximating functions are bounded, is the limit
also bounded? The answers to such questions depend on the sense in which the
limit is achieved; in other words, they depend on the norm that has been chosen
and the function space that goes with it. Typically, one wants a norm that leads
to a complete normed linear space, i.e., a Banach space.
Here is an example of a normed linear space that is not a Banach space:
Example 2. Let the space be the one described in Example 8 of Section 1.1,
page 4. This is ℓ, the space of "finitely nonzero sequences," with the "sup norm"
||x|| = max_i |x(i)|. Define a sequence [x_k] in ℓ by the equation

x_k = [1, 1/2, 1/3, ..., 1/k, 0, 0, ...]

If m > n, then

x_m - x_n = [0, ..., 0, 1/(n+1), ..., 1/m, 0, ...]

Since ||x_m - x_n|| = 1/(n+1), we conclude that the sequence [x_k] has the Cauchy
property. If the space were complete, we would have x_n → y, where y ∈ ℓ. The
point y would be finitely nonzero, say y(n) = 0 for n > N. Then for m > N, x_m
would have as its Nth term the value 1/N, while the Nth term of y is 0. Thus
||x_m - y|| ≥ 1/N, and convergence cannot take place. •
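The distances appearing in Example 2 can also be checked numerically. A short Python sketch (truncating the sequences to a finite window only so that they can be stored) confirms that ||x_m - x_n|| = 1/(n+1) for m > n, while the coordinatewise limit [1, 1/2, 1/3, ...] has no zero tail and so lies outside the space:

    # The Cauchy sequence of Example 2, truncated to 50 coordinates for computation.
    def x(k, length=50):
        return [1.0 / (i + 1) if i < k else 0.0 for i in range(length)]

    def sup_dist(u, v):
        return max(abs(a - b) for a, b in zip(u, v))

    for n in [5, 10, 20]:
        print(n, sup_dist(x(2 * n), x(n)))   # prints 1/(n+1): the sequence is Cauchy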

Theorem 2. The space C[a, b] with norm ||x|| = max_s |x(s)| is a


Banach space.

Proof. Let [x_n] be a Cauchy sequence in C[a, b]. (This space is described in
Example 6, page 3.) Then for each s, [x_n(s)] is a Cauchy sequence in ℝ. Since ℝ
is complete, this latter sequence converges to a real number that we may denote
by x(s). The function x thus defined must now be shown to be continuous, and
we must also show that ||x_n - x|| → 0. Let t be fixed as the point at which
continuity is to be proved. We write

(1)    |x(s) - x(t)| ≤ |x(s) - x_n(s)| + |x_n(s) - x_n(t)| + |x_n(t) - x(t)|

This inequality should suggest to the reader how the proof must proceed. Let
ε > 0. Select N so that ||x_n - x_m|| ≤ ε/3 whenever m ≥ n ≥ N (Cauchy
property). Then for m ≥ n ≥ N, |x_n(s) - x_m(s)| ≤ ε/3. By letting m → ∞ we
get |x_n(s) - x(s)| ≤ ε/3 for all s. This shows that ||x_n - x|| ≤ ε/3 and that the
sequence ||x_n - x|| converges to 0. By the continuity of x_n there exists a δ > 0
such that |x_n(s) - x_n(t)| < ε/3 whenever |t - s| < δ. Inequality (1) now shows
that |x(s) - x(t)| < ε when |t - s| < δ. (This proof illustrates what is sometimes
called "an ε/3 argument.") •
Remarks. Theorem 2 is due to Weierstrass. It remains valid if the interval
[a, b] is replaced by any compact Hausdorff space. (For topological notions, refer
to Section 7.6, starting on page 361.) The traditional formulation of this theorem
states that a uniformly convergent sequence of continuous functions on a closed
and bounded interval must have a continuous limit. A sequence of functions [f_n]
converges uniformly to f if

(2)    ∀ε ∃n ∀k ∀s  [k > n  ⟹  |f_k(s) - f(s)| < ε]

(In this succinct description, it is understood that ε > 0, n ∈ ℕ, k ∈ ℕ, and s is
in the domain of the functions.) By contrast, pointwise convergence is defined
by

∀s ∀ε ∃n ∀k  [k > n  ⟹  |f_k(s) - f(s)| < ε]

Our use of the austere and forbidding logical notation is to bring out clearly
and to emphasize the importance of the order of the quantifiers. Thus, in the
definition of uniform convergence, n does not (cannot) depend on s, while in
the definition of pointwise convergence, n may depend on s. Notice that by the
definition of the norm being used, (2) can be written

∀ε ∃n ∀k  [k > n  ⟹  ||f_k - f||_∞ ≤ ε]

or simply as lim_{n→∞} ||f_n - f||_∞ = 0. The latter is conceptually rather simple, if
one is already comfortable with this norm (called the "supremum norm" or the
"maximum norm").

The (perhaps) simplest example of a sequence of continuous functions that
converges pointwise but not uniformly to a continuous function is the sequence
[f_n] described as follows. The value of f_n(x) is 1 everywhere except on the
interval [0, 2/n], where its value is given by |nx - 1|.
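A short Python sketch of this last example (using a grid on [0, 1] merely to estimate the supremum) shows the two modes of convergence pulling apart: at each fixed x the values f_n(x) eventually equal 1, yet the supremum of |f_n - 1| remains 1 because of the spike near x = 0:

    def f(n, x):
        # value 1 everywhere except on [0, 2/n], where it equals |n*x - 1|
        return abs(n * x - 1.0) if x <= 2.0 / n else 1.0

    grid = [i / 1000.0 for i in range(1001)]          # sample points in [0, 1]
    for n in [1, 10, 100, 1000]:
        sup_err = max(abs(f(n, x) - 1.0) for x in grid)
        print(n, f(n, 0.5), sup_err)   # f_n(0.5) is eventually 1, but sup_err stays 1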

Problems 1.2

1. Prove that lim_{p→∞} ||x||_p = max_{1≤i≤n} |x(i)| for every x in ℝ^n.



2. Is this property of a sequence equivalent to the Cauchy property?

lim_{n→∞} sup_{k≥n} ||x_k - x_n|| = 0

Answer the same question for this property: For every positive ε there is a natural number
n such that ||x_m - x_n|| < ε whenever m ≥ n.

3. Prove that if a sequence [x_n] in a Banach space satisfies Σ_{n=1}^∞ ||x_n|| < ∞, then the
series Σ_{n=1}^∞ x_n converges.
4. Prove that Theorem 2 is not true for the norm ∫ |x(t)| dt.
5. Prove that the union of a finite number of compact sets is compact. Give an example to
show that the union of an infinite family of compact sets can fail to be compact.
6. Prove that || ||_p on ℝ^n does not satisfy the triangle inequality if 0 < p < 1 and n ≥ 2.

7. Prove that if x_n → x, then the set {x, x_1, x_2, ...} is compact.


8. A cluster point (or accumulation point) of a sequence is the limit of any convergent
subsequence. Prove that if a sequence lies in a compact set and has only one cluster
point, then it is convergent.
9. Prove that the convergence in Problem 1 above is monotone.
10. Give an example of a countable compact set in IR having infinitely many accumulation
points. If your example has more than a countable number of accumulation points, give
another example, having no more than a countable number.
11. Let Xo and Xl be any two points in a normed linear space. Define X2, X3, ... inductively
by putting
n = 0,1,2, ...
Prove that the resulting sequence is a Cauchy sequence.
12. A particular Banach space of great importance is the space ℓ∞(S), consisting of all
bounded real-valued functions on a given set S. For x ∈ ℓ∞(S) we define

||x||_∞ = sup_{s∈S} |x(s)|

Prove that this space is complete. Cultural note: The space ℓ∞(ℕ) is of special interest.
Every separable metric space can be embedded isometrically in it! You might enjoy
trying to prove this, but that is not part of problem 12.
13. Prove that in a normed linear space a sequence cannot converge to two different points.
14. How does a sequence [x_n : n ∈ ℕ] differ from a countable set {x_n : n ∈ ℕ}?
15. Is there a norm that makes the space of all real sequences a Banach space?
16. Let c_0 denote the space of all real sequences that converge to zero. Define ||x|| =
sup_n |x(n)|. Prove that c_0 is a Banach space.
17. If K is a convex set in a linear space, then these two sets are also convex:

u + K = {u + x : x ∈ K}    and    λK = {λx : x ∈ K}

18. Let A be a subset of a linear space. Put

A^c = {Σ_{i=1}^n λ_i a_i : n ∈ ℕ, λ_i ≥ 0, a_i ∈ A, Σ_{i=1}^n λ_i = 1}

Prove that A ⊂ A^c. Prove that A^c is convex. Prove that A^c is the smallest convex set
containing A. This latter assertion means that if A is contained in a convex set B, then
A^c is also contained in B. The set A^c is the convex hull of A.

19. If A and B are convex sets, is their vector sum convex? The vector sum of these two sets
is A + B = {a + b : a ∈ A, b ∈ B}.
20. Can a norm be recovered from its unit ball? Hint: If x ∈ X, then x/λ is in the unit
ball whenever |λ| ≥ ||x||. (Prove this.) On the other hand, x/λ is not in the unit ball if
|λ| < ||x||. (Prove this.)
21. What are necessary and sufficient conditions on a set S in a linear space X in order that
S be the unit ball for some norm on X?
22. Prove that the intersection of a family of convex sets (all contained in one linear space)
is convex.
23. A metric space is a pair (X, d) in which X is a set and d is a function (called a metric)
from X × X to ℝ such that
(i) d(x, y) ≥ 0
(ii) d(x, y) = 0 if and only if x = y
(iii) d(x, y) = d(y, x)
(iv) d(x, y) ≤ d(x, z) + d(z, y)
Prove that a normed linear space is a metric space if d(x, y) is defined as ||x - y||.

24. For this problem only, we use the following notation for a line segment in a linear space:

(a, b) = {λa + (1 - λ)b : 0 ≤ λ ≤ 1}

A polygonal path joining points a and b is any finite union of line segments
⋃_{i=1}^n (a_i, a_{i+1}), where a_1 = a and a_{n+1} = b. If the linear space has a norm, the length
of the polygonal path is Σ_{i=1}^n ||a_i - a_{i+1}||. Give an example of a pair of points a, b in
a normed linear space and a polygonal path joining them such that the polygonal path
is not identical to (a, b) but has the same length. A path of length ||a - b|| connecting a
and b is called a geodesic path. Prove that any geodesic polygonal path connecting a
and b is contained in the set {x : ||x - a|| ≤ ||b - a||}.

25. If x_n → x and if the Cesàro means are defined by a_n = (x_1 + ··· + x_n)/n, then a_n → x.
(This is to be proved in an arbitrary normed linear space.)

26. Prove that a Cauchy sequence that contains a convergent subsequence must converge.

27. A compact set in a normed linear space must be bounded; i.e., contained in some multiple
of the unit ball.

28. Prove that the equation f(x) = Σ_{k=0}^∞ a^k cos(b^k x) defines a continuous function on ℝ,
provided that 0 ≤ a < 1. The parameter b can be any real number. You will find
useful Theorem 2 and Problem 3. Cultural Note: If 0 < a < 1 and if b is an odd
integer greater than a^{-1}, then f is differentiable nowhere. This is the famous Weierstrass
nondifferentiable function. (See Section 7.8, page 374, for more information about this
function.)

29. Prove that a sequence [x_n] in a normed linear space converges to a point x if and only if
every subsequence of [x_n] converges to x.

30. Prove that if φ is a strictly increasing function from ℕ into ℕ, then φ(n) ≥ n for all n.

31. Let S be a subset of a linear space. Let S_1 be the union of all line segments that join
pairs of points in S. Is S_1 necessarily convex?

32. (continuation) What happens if we repeat the process and construct S_2, S_3, ...? (Thus,
for example, S_2 is the union of line segments joining points in S_1.)

33. Let I be a compact interval in ℝ, I = [a, b]. Let X be a Banach space. The notation
C(I, X) denotes the linear space of all continuous maps f : I → X. We norm C(I, X)
by putting ||f|| = sup_{t∈I} ||f(t)||. Prove that C(I, X) is a Banach space.

34. Define f_n(x) = e^{-nx}. Show that this sequence of functions converges pointwise on [0, 1]
to the function g such that g(0) = 1 and g(t) = 0 for t ≠ 0. Show that in the L^2-norm
on [0, 1], f_n converges to 0. The L^2-norm is defined by ||f|| = {∫_0^1 |f(t)|^2 dt}^{1/2}.

35. Let [x_n] be a sequence in a Banach space. Suppose that for every ε > 0 there is a
convergent sequence [y_n] such that sup_n ||x_n - y_n|| < ε. Prove that [x_n] converges.

36. In any normed linear space, define K(x, r) = {y : ||x - y|| ≤ r}. Prove that if K(x, 1/2) ⊂
K(0, 1), then 0 ∈ K(x, 1/2).

37. Show that the closed unit ball in a normed linear space cannot contain a disjoint pair of
closed balls having radius 1/2.

38. (Converse of Problem 3) Prove that if every absolutely convergent series converges
in a normed linear space, then the space is complete. (A series Σ x_n is absolutely
convergent if Σ ||x_n|| < ∞.)
39. Let X be a compact Hausdorff space, and let C(X) be the space of all real-valued
continuous functions on X, with norm ||f|| = sup |f(x)|. Let [f_n] be a Cauchy sequence
in C(X). Prove that

lim_{x→x_0} lim_{n→∞} f_n(x) = lim_{n→∞} lim_{x→x_0} f_n(x)

Give examples to show why compactness, continuity, and the Cauchy property are needed.

40. The space ℓ_1 consists of all sequences x = [x(1), x(2), ...] in which x(n) ∈ ℝ and
Σ |x(n)| < ∞. The space ℓ_2 consists of sequences for which Σ |x(n)|^2 < ∞. Prove
that ℓ_1 ⊂ ℓ_2 by establishing the inequality Σ |x(n)|^2 ≤ (Σ |x(n)|)^2.

41. Let X be a normed linear space, and S a dense subset of X. Prove that if each Cauchy
sequence in S has a limit in X, then X is complete. A set S is dense in X if each point
of X is the limit of some sequence in S.

42. Give an example of a linearly independent sequence [x_0, x_1, x_2, ...] of vectors in ℓ∞ such
that Σ_{n=0}^∞ x_n = 0. Don't forget to prove that Σ x_n = 0.
43. Prove, in a normed space, that if x_n → x and ||x_n - y_n|| → 0, then y_n → x. If x_n → x
and ||x_n - y_n|| → 1, what is lim y_n?

44. Whenever we consider real-valued or complex-valued functions, there is a concept of
absolute value of a function. For example, if x ∈ C[0, 1], we define |x| by writing |x|(t) =
|x(t)|. A norm on a space of functions is said to be monotone if ||x|| ≥ ||y|| whenever
|x| ≥ |y|. Prove that the norms || ||_∞ and || ||_p are monotone norms.

45. (Continuation) Prove that there is no monotone norm on the space of all real-valued
sequences.

46. Why isn't the example of this section a counterexample to Theorem 2?



47. Any normed linear space X can be embedded as a dense subspace in a complete normed
linear space X̂. The latter is fully determined by the former, and is called the completion
of X. A more general assertion of the same sort is true for metric spaces. Prove that the
completion of the space ℓ in Example 8 of Section 1.1 (page 4) is the space c_0 described
in Problem 16. Further remarks about the process of completion occur in Section 1.8,
page 60.

48. Metric spaces were defined in Problem 23, page 13. In a metric space, a Cauchy sequence
is one that has the property lim_{n,m→∞} d(x_n, x_m) = 0. A metric space is complete if
every Cauchy sequence converges to some point in the space. For the discrete metric
space mentioned in Problem 11 (page 19), identify the Cauchy sequences and determine
whether the space is complete.

1.3 Continuity, Open Sets, Closed Sets

Consider a function f, defined on a subset D of a normed linear space X and


taking values in another normed linear space Y. We say that f is continuous
at a point x in D if for every sequence [x_n] in D converging to x, we have also
f(x_n) → f(x). Expressed otherwise,

x_n → x  ⟹  f(x_n) → f(x)
A function that is continuous at each point of its domain is said simply to be


continuous. Thus a continuous function is one that preserves the convergence
of sequences.
Example. The norm in a normed linear space is continuous. To see that this
is so, just use Problem 3, page 5, to write

| ||x_n|| - ||x|| |  ≤  ||x_n - x||

Thus, if x_n → x, it follows that ||x_n|| → ||x||.
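The inequality cited from Problem 3 follows from two applications of the triangle inequality. A short derivation (set in LaTeX notation here):

    \|x_n\| \le \|x_n - x\| + \|x\| \quad\Longrightarrow\quad \|x_n\| - \|x\| \le \|x_n - x\|,
    \|x\|   \le \|x - x_n\| + \|x_n\| \quad\Longrightarrow\quad \|x\| - \|x_n\| \le \|x_n - x\|,
    \text{hence}\quad \bigl|\,\|x_n\| - \|x\|\,\bigr| \le \|x_n - x\|.

In words: each of ||x_n|| - ||x|| and ||x|| - ||x_n|| is bounded by ||x_n - x||, so their common absolute value is as well.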
With these definitions at our disposal, we can prove a number of important

(yet elementary) theorems.

Theorem 1. Let f be a continuous mapping whose domain D is a


compact set in a normed linear space and whose range is contained in
another normed linear space. Then f(D) is compact.

Proof. To show that f(D) is compact, we let [y_n] be any sequence in f(D),
and prove that this sequence has a convergent subsequence whose limit is in
f(D). There exist points x_n ∈ D such that f(x_n) = y_n. Since D is compact, the
sequence [x_n] has a subsequence [x_{n_i}] that converges to a point x ∈ D. Since f
is continuous, f(x_{n_i}) → f(x).
Thus the subsequence [y_{n_i}] converges to a point in f(D). •



The following is a generalization to normed linear spaces of a theorem that
should be familiar from elementary calculus. It provides a tool for optimization
problems-even those for which the solution is a function.

Theorem 2. A continuous real-valued function whose domain is a


compact set in a normed linear space attains its supremum and infi-
mum; both of these are therefore finite.

Proof. Let f be a continuous real-valued function whose domain is a compact
set D in a normed linear space. Let M = sup{f(x) : x ∈ D}. Then there
is a sequence [x_n] in D for which f(x_n) → M. (At this stage, we admit the
possibility that M may be +∞.) By compactness, there is a subsequence [x_{n_i}]
converging to a point x ∈ D. By continuity, f(x_{n_i}) → f(x). Hence f(x) = M,
and of course M < ∞. The proof for the infimum is similar. •

A function f whose domain and range are subsets of normed linear spaces
is said to be uniformly continuous if there corresponds to each positive ε a
positive δ such that ||f(x) - f(y)|| < ε for all pairs of points (in the domain of
f) satisfying ||x - y|| < δ. The crucial feature of this definition is that δ serves
simultaneously for all pairs of points. The definition is global, as distinguished
from local.

Theorem 3. A continuous function whose domain is a compact


subset of a normed space and whose values lie in another normed space
is uniformly continuous.

Proof. Let f be a function (defined on a compact set) that is not uniformly
continuous. We shall show that f is not continuous. There exists an ε > 0 for
which there is no corresponding δ to fulfill the condition of uniform continuity.
That implies that for each n there is a pair of points (x_n, y_n) satisfying the
condition ||x_n - y_n|| < 1/n and ||f(x_n) - f(y_n)|| ≥ ε. By compactness the
sequence [x_n] has a subsequence [x_{n_i}] that converges to a point x in the domain
of f. Then y_{n_i} → x also, because ||y_{n_i} - x|| ≤ ||y_{n_i} - x_{n_i}|| + ||x_{n_i} - x||. Now the
continuity of f at x fails because

ε ≤ ||f(x_{n_i}) - f(y_{n_i})|| ≤ ||f(x_{n_i}) - f(x)|| + ||f(x) - f(y_{n_i})||    •

A subset F in a normed space is said to be closed if the limit of every convergent sequence in F is also in F. Thus, for all sequences this implication is valid:

    [xₙ ∈ F and xₙ → x]  ⟹  x ∈ F

As is true of the notion of completeness, the concept of a closed set is useful when the solution of a problem is constructed as a limit of an approximating sequence.
By Problem 4, the intersection of any family of closed sets is closed. Therefore, the intersection of all the closed sets containing a given set A is a closed set containing A, and it is the smallest such set. It is commonly written as Ā or cl(A), and is called the closure of A.

Theorem 4. The inverse image of a closed set by a continuous map is closed.

Proof. Recall that the inverse image of a set A by a map f is defined to be f⁻¹(A) = {x : f(x) ∈ A}. Let f : X → Y, where X and Y are normed spaces and f is continuous. Let K be a closed set in Y. To show that f⁻¹(K) is closed, we start by letting [xₙ] be a convergent sequence in f⁻¹(K). Thus xₙ → x and f(xₙ) ∈ K. By continuity, f(xₙ) → f(x). Since K is closed, f(x) ∈ K. Hence x ∈ f⁻¹(K). •

As an example, consider the unit ball in a normed space:

    {x : ‖x‖ ≤ 1}

This is the inverse image of the closed interval [0,1] by the function x ↦ ‖x‖. This function is continuous, as shown above. Hence, the unit ball is closed. Likewise, each of the sets

    {x : ‖x − a‖ ≤ r}    {x : ‖x − a‖ ≥ r}    {x : α ≤ ‖x − a‖ ≤ β}

is closed.
An open set is a set whose complement is closed. Thus, from the preceding remarks, the so-called "open unit ball," i.e., the set

    U = {x : ‖x‖ < 1}

is open, because its complement is closed. Likewise, all of these sets are open:

    {x : ‖x‖ > 1}    {x : ‖x − a‖ < r}    {x : α < ‖x − a‖ < β}
An alternative way of describing the open sets, closer to the spirit of general topology, will now be discussed.
The open ε-cell or ε-ball about a point x₀ is the set

    B(x₀, ε) = {x : ‖x − x₀‖ < ε}

Sometimes this is called the ε-neighborhood of x₀. A useful characterization of open sets is the following: A subset U in X is open if and only if for each x ∈ U there is an ε > 0 such that B(x, ε) ⊂ U. The collection of open sets is called the topology of X. One can verify easily that the topology T for a normed linear space has these characteristic properties:
(1) the empty set, ∅, belongs to T;
(2) the space itself, X, belongs to T;
(3) the intersection of any two members of T belongs to T;
(4) the union of any subfamily of T belongs to T.
These are the axioms for any topology. One section of Chapter 7 provides an introduction to general topology.
A series Σ_{k=1}^∞ xₖ whose elements are in a normed linear space is convergent if the sequence of partial sums Sₙ = Σ_{k=1}^n xₖ converges. The given series is said to be absolutely convergent if the series of real numbers Σ_{k=1}^∞ ‖xₖ‖ is convergent. That means simply that Σ_{k=1}^∞ ‖xₖ‖ < ∞. Problem 3, page 13, asks for a proof that absolute convergence implies convergence, provided that the space is complete. See also Problem 38, page 14. The following theorem gives another important property of absolutely convergent series.

Theorem 5. If a series in a Banach space is absolutely convergent,


then all rearrangements of the series converge to a common value.

Proof. Let Σᵢ₌₁^∞ xᵢ be such a series and Σᵢ₌₁^∞ x_{k_i} a rearrangement of it. Put x = Σᵢ₌₁^∞ xᵢ, Sₙ = Σᵢ₌₁ⁿ xᵢ, S̄ₙ = Σᵢ₌₁ⁿ x_{k_i}, and M = Σᵢ₌₁^∞ ‖xᵢ‖. Then Σᵢ₌₁^∞ ‖x_{k_i}‖ ≤ M. This proves that Σᵢ₌₁^∞ x_{k_i} is absolutely convergent and hence convergent. (Here we require the completeness of the space.) Put y = Σᵢ₌₁^∞ x_{k_i}. Let ε > 0. Select n such that Σ_{i>n} ‖xᵢ‖ < ε and such that ‖Sₘ − x‖ < ε when m ≥ n. Select r so that ‖S̄ᵣ − y‖ < ε and so that {1, ..., n} ⊂ {k₁, ..., kᵣ}. Select m such that {k₁, ..., kᵣ} ⊂ {1, ..., m}. Then m ≥ n and

    ‖Sₘ − S̄ᵣ‖ = ‖(x₁ + ··· + xₘ) − (x_{k_1} + ··· + x_{k_r})‖ ≤ Σ_{i=n+1}^m ‖xᵢ‖ < ε

Hence

    ‖x − y‖ ≤ ‖x − Sₘ‖ + ‖Sₘ − S̄ᵣ‖ + ‖S̄ᵣ − y‖ < 3ε

Since ε was arbitrary, x = y. •


In using a series that is not absolutely convergent, some caution must be exercised. Even in the case of a series of real numbers, bizarre results can arise if the series is re-ordered. A good example of a series of real numbers that converges yet is not absolutely convergent is the series Σₙ (−1)ⁿ/n. The series of corresponding absolute values is the divergent harmonic series. There is a remarkable theorem that includes this example:

Riemann's Theorem. If a series of real numbers is convergent but


not absolutely so, then for every real number, some rearrangement of
the series converges to that real number.

Proof. Let the series Σ xₙ satisfy the hypotheses. Then lim xₙ = 0 and

    Σ_{xₙ>0} xₙ − Σ_{xₙ<0} xₙ = Σ |xₙ| = ∞

Since the series Σ xₙ converges, the two series on the left of the preceding equation must diverge to +∞ and −∞, respectively. (See Problems 12 and 13.) Now let r be any real number. Select positive terms (in order) from the series until their sum exceeds r. Now add negative terms (chosen in order) until the new partial sum is less than r. Continue in this manner. Since lim xₙ = 0, the partial sums thus created differ from r by quantities that tend to zero. •
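The proof is constructive, and it is easy to try out numerically. The following sketch (an illustration only, not part of the text) applies the greedy selection rule above to the alternating series Σₙ (−1)ⁿ⁺¹/n, with the target value 0.7 chosen arbitrarily.

    # Illustration (not from the text): rearranging the conditionally convergent
    # series sum_n (-1)^(n+1)/n so that its partial sums approach a chosen target.
    def rearranged_partial_sums(target, n_terms=10000):
        positives = (1.0 / k for k in range(1, 10**9, 2))    # 1, 1/3, 1/5, ...
        negatives = (-1.0 / k for k in range(2, 10**9, 2))   # -1/2, -1/4, ...
        s, sums = 0.0, []
        for _ in range(n_terms):
            # greedy rule from the proof: add positive terms until the target is
            # exceeded, then negative terms until the sum drops below it
            s += next(positives) if s <= target else next(negatives)
            sums.append(s)
        return sums

    sums = rearranged_partial_sums(0.7)
    print(sums[-1])   # close to 0.7, as Riemann's theorem predicts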

Problems 1.3

1. Prove that the sequential definition of continuity of f at x is equivalent to the "ε, δ" definition, which is

    ∀ε > 0  ∃δ > 0  ∀u  [ ‖x − u‖ < δ  ⟹  ‖f(x) − f(u)‖ < ε ]

2. Let U be an arbitrary subset of a normed space. Prove that the function x ↦ dist(x, U) is continuous. This function was defined in the proof of Theorem 1 in Section 1.2, page 9. Prove, in fact, that it is "nonexpansive":

    | dist(x, U) − dist(y, U) | ≤ ‖x − y‖

3. Let X be a normed space. We make X × X into a normed linear space by defining ‖(x, y)‖ = ‖x‖ + ‖y‖. Show that the map (x, y) ↦ x + y is continuous. Show that the norm is continuous. Show that the map (λ, x) ↦ λx is continuous when ℝ × X is normed by ‖(λ, x)‖ = |λ| + ‖x‖.
4. Prove that the intersection of a family of closed sets is closed.
5. If x ≠ 0, put x̄ = x/‖x‖. This defines the radial projection of x onto the surface of the unit ball. Prove that if x and y are not zero, then

    ‖x̄ − ȳ‖ ≤ 2‖x − y‖ / ‖x‖

6. Use Theorem 2 and Problem 2 in this section to give a brief proof of Theorem 1 in
Section 2, page 9.
7. Using the definition of an open set as given in this section, prove that a set U is open if and only if for each x in U there is a positive ε such that B(x, ε) ⊂ U.
8. Prove that the inverse image of an open set by a continuous map is open.
9. The (algebraic) sum of two sets in a linear space is defined by A + B = {a + b : a ∈ A, b ∈ B}. Is the sum of two closed sets (in a normed linear space) closed? (Cf. Problem 19, page 13.)
10. Prove that if the series Σᵢ₌₁^∞ xᵢ converges (in some normed linear space), then xᵢ → 0.

11. A common misconception about metric spaces is that the closure of an open ball S = {x : d(a, x) < r} is the closed ball S′ = {x : d(a, x) ≤ r}. Investigate whether this is correct in a discrete metric space (X, d), where d(x, y) = 1 if x ≠ y. What is the situation in a normed linear space? (Refer to Problem 23, page 13.)
12. Let Σ xₙ and Σ yₙ be two series of nonnegative terms. Prove that if one of these series converges but the other does not, then the series Σ(xₙ − yₙ) diverges. Can you improve this result by weakening the hypotheses?
13. Let Σ xₙ be a convergent series of real numbers such that Σ |xₙ| = ∞. Prove that the series of positive terms extracted from the series Σ xₙ diverges to ∞. It may be helpful to introduce uₙ = max(xₙ, 0) and vₙ = min(xₙ, 0). By using the partial sums of series, one reduces the question to matters concerning the convergence of sequences.
14. Refer to Problem 12, page 12, for the space ℓ∞(S). We write ≤ to signify a pointwise inequality between two members of this space. Let gₙ and fₙ be elements of this space, for n = 1, 2, .... Let gₙ ≥ 0, fₙ₋₁ − gₙ₋₁ ≤ fₙ ≤ M, and Σᵢ₌₁ⁿ gᵢ ≤ M for all n. Prove that the sequence [fₙ] converges pointwise. Give an example to show that convergence in norm may fail.

1.4 More About Compactness

We continue our study of compactness in normed linear spaces. The starting


point for the next group of theorems is the Heine-Borel theorem, which states
that every closed and bounded subset of the real line is compact, and conversely.
We assume that the reader is familiar with that theorem.

Our first goal in this section is to show that the Heine-Borel theorem is true
for a normed linear space if and only if the space is finite-dimensional. Since most
interesting function spaces are infinite-dimensional, verifying the compactness of
a set in these spaces requires information beyond the simple properties of being
bounded and closed. Many important theorems in functional analysis address
the question of identifying the compact sets in various normed linear spaces.
Examples of such theorems will appear in Chapter 7.

Lemma 1. In the space ℝⁿ with norm ‖x‖∞ = max_{1≤i≤n} |x(i)|, each ball {x : ‖x‖∞ ≤ c} is compact.

Proof. Let [xₖ] be a sequence of points in ℝⁿ satisfying ‖xₖ‖∞ ≤ c. Then the components obey the inequality −c ≤ xₖ(i) ≤ c. By the compactness of the interval [−c, c], there exists an increasing sequence I₁ ⊂ ℕ having the property that lim [xₖ(1) : k ∈ I₁] exists. Next, there exists another increasing sequence I₂ ⊂ I₁ such that lim [xₖ(2) : k ∈ I₂] exists. Then lim [xₖ(1) : k ∈ I₂] exists also, because I₂ ⊂ I₁. Continuing in this way, we obtain at the nth step an increasing sequence Iₙ such that lim [xₖ(i) : k ∈ Iₙ] exists for each i = 1, ..., n. Denoting that limit by x*(i), we have defined a vector x* such that ‖xₖ − x*‖∞ → 0 as k runs through the sequence of integers Iₙ. •

Lemma 2. A closed subset of a compact set is compact.

Proof. If F is a closed subset of a compact set K, and if [xₙ] is a sequence in F, then by the compactness of K a subsequence converges to a point of K. The limit point must be in F, since F is closed. •

A subset S in a normed linear space is said to be bounded if there is a constant c such that ‖x‖ ≤ c for all x ∈ S. Expressed otherwise, sup_{x∈S} ‖x‖ < ∞.

Theorem 1. In a finite-dimensional normed linear space, each


closed and bounded set is compact.

Proof. Let X be a finite-dimensional normed linear space. Select a basis for X, say {x₁, ..., xₙ}. Define a mapping T : ℝⁿ → X by the equation

    Ta = Σᵢ₌₁ⁿ a(i) xᵢ ,    a = (a(1), ..., a(n)) ∈ ℝⁿ

If we assign the norm ‖·‖∞ to ℝⁿ, then T is continuous, because

    ‖Ta − Tb‖ = ‖ Σᵢ₌₁ⁿ (a(i) − b(i)) xᵢ ‖ ≤ Σᵢ₌₁ⁿ |a(i) − b(i)| ‖xᵢ‖
              ≤ maxᵢ |a(i) − b(i)| · Σⱼ₌₁ⁿ ‖xⱼ‖ = ‖a − b‖∞ Σⱼ₌₁ⁿ ‖xⱼ‖

Now let F be a closed and bounded set in X. Put M = T⁻¹(F). Then M is closed, by Theorem 4 in Section 1.3, page 17. Since F = T(M), we can use Theorem 1 in Section 1.3, page 15, to conclude that F is compact, provided that M is compact. To show that M is compact, we can use Lemmas 1 and 2 above if we can show that for some c,

    M ⊂ {a ∈ ℝⁿ : ‖a‖∞ ≤ c}

In other words, we have only to prove that M is bounded. To this end, define

    β = inf{ ‖Ta‖ : ‖a‖∞ = 1 }

This is the infimum of a continuous map on a compact set (prove that). Hence the infimum is attained at some point b. Thus ‖b‖∞ = 1 and

    β = ‖Tb‖ = ‖ Σᵢ b(i) xᵢ ‖

Since the points xᵢ constitute a linearly independent set, and since b ≠ 0, we conclude that Tb ≠ 0 and that β > 0. Since F is bounded, there is a constant c such that ‖x‖ ≤ c for all x ∈ F. Now, if a ∈ ℝⁿ and a ≠ 0, then a/‖a‖∞ is a vector of norm 1; consequently, ‖T(a/‖a‖∞)‖ ≥ β, or

    ‖Ta‖ ≥ β ‖a‖∞

This is obviously true for a = 0 also. For a ∈ M we have Ta ∈ F, and β‖a‖∞ ≤ ‖Ta‖ ≤ c, whence ‖a‖∞ ≤ c/β. Thus, M is indeed bounded. •

Corollary 1. Every finite-dimensional normed linear space is com-


plete.

Proof. Let [xₙ] be a Cauchy sequence in such a space. Let us prove that the sequence is bounded. Select an index m such that ‖xᵢ − xⱼ‖ < 1 whenever i, j ≥ m. Then we have

    ‖xᵢ‖ ≤ ‖xᵢ − xₘ‖ + ‖xₘ‖ < 1 + ‖xₘ‖    (i ≥ m)

Hence for all i,

    ‖xᵢ‖ ≤ c ,  where  c = max{ ‖x₁‖, ..., ‖xₘ₋₁‖, 1 + ‖xₘ‖ }

Since the ball of radius c is compact, our sequence must have a convergent subsequence, say xₙᵢ → x'. Given ε > 0, select N so that ‖xᵢ − xⱼ‖ < ε when i, j ≥ N. Then ‖xⱼ − xₙᵢ‖ < ε when i, j ≥ N, because nᵢ ≥ i. By taking the limit as i → ∞, we conclude that ‖xⱼ − x'‖ ≤ ε when j ≥ N. This shows that xⱼ → x'. •

Corollary 2. Every finite-dimensional subspace in a normed linear


space is closed.

Proof. Recall that a subset Y in a linear space is a subspace if it is a linear space in its own right. (The only axioms that require verification are the ones concerned with algebraic closure of Y under addition and scalar multiplication.) Let Y be a finite-dimensional subspace in a normed space. To show that Y is closed, let yₙ ∈ Y and yₙ → y. We want to know that y ∈ Y. The preceding corollary establishes this: The convergent sequence has the Cauchy property and hence converges to a point in Y, because Y is complete. •

Riesz's Lemma. If U is a closed and proper subspace (U is neither 0 nor the entire space) in a normed linear space, and if 0 < λ < 1, then there exists a point x such that ‖x‖ = 1 and dist(x, U) > λ.

Proof. Since U is proper, there exists a point z ∈ X \ U. Since U is closed, dist(z, U) > 0. (See Problem 11.) By the definition of dist(z, U) there is an element u in U satisfying the inequality ‖z − u‖ < λ⁻¹ dist(z, U). Put x = (z − u)/‖z − u‖. Obviously, ‖x‖ = 1. Also, with the help of Problem 7, we have

    dist(x, U) = dist(z − u, U)/‖z − u‖ = dist(z, U)/‖z − u‖ > λ  •

Theorem 2. If the unit ball in a normed linear space is compact, then the space has finite dimension.

Proof. If the space is not finite dimensional, then a sequence [xₙ] can be defined inductively as follows. Let x₁ be any point such that ‖x₁‖ = 1. If x₁, ..., xₙ₋₁ have been defined, let Uₙ₋₁ be the subspace that they span. By Corollary 2, above, Uₙ₋₁ is closed. Use Riesz's Lemma to select xₙ so that ‖xₙ‖ = 1 and dist(xₙ, Uₙ₋₁) > ½. Then ‖xₙ − xᵢ‖ > ½ whenever i < n. This sequence cannot have any convergent subsequence. •
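A small numerical illustration (mine, not from the text): in a sup-norm sequence space the coordinate unit vectors all lie in the unit ball, yet any two of them are at distance 1 from each other, so no subsequence can converge. Here the vectors are truncated to 100 coordinates purely so that they can be stored.

    # Illustration (not from the text): pairwise distances between unit vectors
    # in a sup-norm space are all 1, so the unit ball admits sequences with no
    # convergent subsequence once the dimension is infinite.
    def sup_norm(x):
        return max(abs(t) for t in x)

    def unit_vector(n, dim=100):
        return [1.0 if i == n else 0.0 for i in range(dim)]

    e = [unit_vector(n) for n in range(10)]
    distances = {sup_norm([a - b for a, b in zip(e[i], e[j])])
                 for i in range(10) for j in range(i + 1, 10)}
    print(distances)   # {1.0}: every pair of distinct unit vectors is 1 apart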

Putting Theorems 1 and 2 together, we have the following result.

Theorem 3. A normed linear space is finite dimensional if and only


if its unit ball is compact.

In any normed linear space, a compact set is necessarily closed and bounded.
In a finite-dimensional space, these two conditions are also sufficient for compact-
ness. In any infinite-dimensional space, some additional hypothesis is required
to imply compactness. For many spaces, necessary and sufficient conditions for
compactness are known. These invariably involve some uniformity hypothesis.
See Section 7.4, page 347, for some examples, and [DS] (Section IV.14) for many
others.

Problems 1.4

1. A real-valued function f defined on a normed space is said to be lower semicontinuous if each set {x : f(x) ≤ λ} is closed (λ ∈ ℝ). Prove that every continuous function is lower semicontinuous. Prove that if f and −f are lower semicontinuous, then f is continuous. Prove that a lower semicontinuous function attains its infimum on a compact set.
2. Prove that the collection of open sets (as we have defined them) in a normed linear space fulfills the axioms for a topology.
3. Two norms, N₁ and N₂, on a vector space X are said to be equivalent if there exist positive constants α and β such that αN₁ ≤ N₂ ≤ βN₁. Show that this is an equivalence relation. Show that the topologies engendered by a pair of equivalent norms are identical.
4. Prove that a Cauchy sequence converges if and only if it has a convergent subsequence.
5. Let X be the linear subspace of all real sequences x = [x(l), x(2), ... J such that only a
finite number of terms are nonzero. Is there a norm for X such that (X, II II) is a Banach
space?
6. Using the notation in the proof of Theorem 1, prove in detail that F = T(M).
7. Prove these properties of the distance function dist(x, U) (defined in Section 1.2, page 9) when U is a linear subspace in a normed linear space:
(a) dist(λx, U) = |λ| dist(x, U)
(b) dist(x − u, U) = dist(x, U)   (u ∈ U)
(c) dist(x + y, U) ≤ dist(x, U) + dist(y, U)
8. Prove this version of Riesz's Lemma: If U is a finite-dimensional proper subspace in a
normed linear space X, then there exists a point x for which IIxll = dist(x, U) = 1.
9. Prove that if the unit ball in a normed linear space is complete, then the space is complete.
10. Let U be a finite-dimensional subspace in a normed linear space X. Show that for each x ∈ X there exists a u ∈ U satisfying ‖x − u‖ = dist(x, U).
11. Let U be a closed subspace in a normed space X. Prove that the distance functional has the property that for x ∈ X \ U, dist(x, U) > 0.
12. In any infinite-dimensional normed linear space, the open unit ball contains an infinite disjoint family of open balls all having radius ¼. (!) (Prove it, of course. While you're at it, try to improve the number ¼.)
13. In the proof of Theorem 1, show that M is bounded as follows. If it is not bounded, let aₖ ∈ M and ‖aₖ‖∞ → ∞. Put a′ₖ = aₖ/‖aₖ‖∞. Prove that the sequence [a′ₖ] has a convergent subsequence whose limit is nonzero. By considering Ta′ₖ, obtain a contradiction of the injective nature of T.
14. Prove that the sequence [xnJ constructed in the proof of Theorem 2 is linearly indepen-
dent.
15. Prove that in any infinite-dimensional normed linear space there is a sequence [xₙ] in the unit ball such that ‖xₙ − xₘ‖ > 1 when n ≠ m. If you don't succeed, prove the same result with the weaker inequality ‖xₙ − xₘ‖ ≥ 1. (Use the proof of Theorem 2 and Problem 8 above.) Also prove that the unit ball in ℓ∞ contains a sequence satisfying ‖xₙ − xₘ‖ = 2 when n ≠ m. Reference: [Dies].
16. Let S be a subset of a normed linear space such that ‖x − y‖ ≥ 1 when x and y are different points in S. Prove that S is closed. Prove that if S is an infinite set then it cannot be compact. Give an example of such a set that is bounded and infinite in the space C[0, 1].
17. Let A and B be nonempty closed sets in a normed linear space. Prove that if A + B is
compact, then so are A and B. Why do we assume that the sets are nonempty? Prove
that if A is compact, then A + B is closed.

1.5 Linear Transformations

Consider two vector spaces X and Y over the same scalar field. A mapping f : X → Y is said to be linear if

    f(αu + βv) = αf(u) + βf(v)

for all scalars α and β and for all vectors u, v in X. A linear map is often called a linear transformation or a linear operator. If Y happens to be the scalar field, the linear map is called a linear functional. By taking α = β = 0 we see at once that a linear map f must have the property f(0) = 0. This meaning of the word "linear" differs from the one used in elementary mathematics, where a linear function of a real variable x means a function of the form x ↦ ax + b.
Example 1. If X = ℝⁿ and Y = ℝᵐ, then each linear map of X into Y is of the form f(x) = y, where

    y(i) = Σⱼ₌₁ⁿ aᵢⱼ x(j)    (1 ≤ i ≤ m)

and where the aᵢⱼ are certain real numbers that form an m × n matrix. •
Example 2. Let X = C[0, 1] and Y = ℝ. One linear functional is defined by f(x) = ∫₀¹ x(s) ds. •
Example 3. Let X be the space of all functions on [0, 1] that possess n continuous derivatives, x′, x″, ..., x⁽ⁿ⁾. Let a₀, a₁, ..., aₙ be fixed elements of X. Then a linear operator D is defined by

    Dx = Σᵢ₌₀ⁿ aᵢ x⁽ⁱ⁾

Such an operator is called a differential operator. •
Example 4. Let X = C[0, 1] = Y. Let k be a continuous function on [0, 1] × [0, 1]. Define K by

    (Kx)(s) = ∫₀¹ k(s, t) x(t) dt

This is a linear operator, in fact a linear integral operator. •
Example 5. Let X be the set of all bounded continuous functions on ℝ₊ = {t ∈ ℝ : t ≥ 0}. Put

    (Lx)(s) = ∫₀^∞ e^(−st) x(t) dt

This linear operator is called the Laplace Transform. •

Example 6. Let X be the set of all continuous functions on ℝ for which ∫_{−∞}^{∞} |x(t)| dt < ∞. Define

    (Fx)(s) = ∫_{−∞}^{∞} e^(−2πist) x(t) dt

This linear operator is called the Fourier Transform. •
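In computation, operators such as the one in Example 4 are usually handled by discretization. The following sketch (an illustration only; the kernel k(s, t) = e^(−|s−t|) and the midpoint rule are my own sample choices, not from the text) turns the integral operator into finite sums on a grid.

    # Sketch (illustrative only): the integral operator K of Example 4,
    # discretized on m grid points with the midpoint rule.
    import math

    m = 200
    h = 1.0 / m
    grid = [(i + 0.5) * h for i in range(m)]           # midpoints of [0,1]

    def k(s, t):                                        # sample kernel (assumption)
        return math.exp(-abs(s - t))

    def K(x):                                           # x: list of samples x(t_j)
        return [h * sum(k(s, t) * xt for t, xt in zip(grid, x)) for s in grid]

    x = [math.sin(math.pi * t) for t in grid]
    y = K(x)                                            # samples of (Kx)(s)
    print(max(abs(v) for v in y))                       # sup-norm of the image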


If a linear transformation T acts between two normed linear spaces, then

the concept of continuity becomes meaningful.

Theorem 1. A linear transformation acting between normed linear spaces is continuous if and only if it is continuous at zero.

Proof. Let T : X → Y be such a linear transformation. If it is continuous, then of course it is continuous at 0. For the converse, suppose that T is continuous at 0. For each ε > 0 there is a δ > 0 such that for all x,

    ‖x‖ < δ  ⟹  ‖Tx‖ < ε

Hence

    ‖x − y‖ < δ  ⟹  ‖Tx − Ty‖ = ‖T(x − y)‖ < ε  •

A linear transformation T acting between two normed linear spaces is said to be bounded if it is bounded in the usual sense on the unit ball:

    sup { ‖Tx‖ : ‖x‖ ≤ 1 } < ∞

Example 7. Let X = C¹[0, 1], the space of all continuously differentiable functions on [0, 1]. Give X the norm ‖x‖∞ = sup |x(s)|. Let f be the linear functional defined by f(x) = x′(1). This functional is not bounded, as is seen by considering the vectors xₙ(s) = sⁿ. On the other hand, the functional in Example 2 is bounded, since |f(x)| ≤ ∫₀¹ |x(s)| ds ≤ ‖x‖∞. •
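A quick numerical check of Example 7 (an illustration of mine, not from the text): each xₙ(s) = sⁿ has sup norm 1 on [0, 1], while f(xₙ) = x′ₙ(1) = n, which grows without bound.

    # Illustration (not from the text): x_n(s) = s^n has sup norm 1 on [0,1],
    # yet the functional f(x) = x'(1) takes the value n on it -- f is unbounded.
    for n in (1, 10, 100, 1000):
        sup_norm = max((k / 1000.0) ** n for k in range(1001))   # = 1, at s = 1
        derivative_at_1 = n * 1.0 ** (n - 1)                     # x_n'(1) = n
        print(n, sup_norm, derivative_at_1)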

Theorem 2. A linear transformation acting between normed linear spaces is continuous if and only if it is bounded.

Proof. Let T : X → Y be such a map. If it is continuous, then there is a δ > 0 such that

    ‖x‖ ≤ δ  ⟹  ‖Tx‖ ≤ 1

If ‖x‖ ≤ 1, then δx is a vector of norm at most δ. Consequently, ‖T(δx)‖ ≤ 1, whence ‖Tx‖ ≤ 1/δ. Conversely, if ‖Tx‖ ≤ M whenever ‖x‖ ≤ 1, then

    ‖x‖ ≤ ε/M  ⟹  ‖(M/ε)x‖ ≤ 1  ⟹  ‖T((M/ε)x)‖ ≤ M  ⟹  ‖Tx‖ ≤ ε

This proves continuity at 0, which suffices, by the preceding theorem. •

If T : X → Y is a bounded linear transformation, we define

    ‖T‖ = sup{ ‖Tx‖ : ‖x‖ ≤ 1 }

It can be shown that this defines a norm on the family of all bounded linear transformations from X into Y; this family is a vector space, and it now becomes a normed linear space, denoted by ℒ(X, Y).
The definition of ‖T‖ leads at once to the important inequality

    ‖Tx‖ ≤ ‖T‖ ‖x‖

To prove this, notice first that it is correct for x = 0, since T0 = 0. On the other hand, if x ≠ 0, then x/‖x‖ is a vector of norm 1. By the definition of ‖T‖, we have ‖T(x/‖x‖)‖ ≤ ‖T‖, which is equivalent to the inequality displayed above. That inequality contains three distinct norms: the ones defined on X, Y, and ℒ(X, Y).
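For the matrix operators of Example 1 the operator norm can be computed explicitly when both spaces carry the max-norm: it equals the largest absolute row sum (this is the content of Problem 6 below). A small check of my own, not from the text; the matrix A is an arbitrary sample, and the brute-force search uses the fact that the supremum over the unit ball is attained at sign vectors:

    # Illustration (see Problem 6 below): with the max-norm on R^n and R^m,
    # the operator norm of a matrix equals its maximum absolute row sum.
    import itertools

    A = [[1.0, -2.0, 0.5],
         [0.0,  3.0, -1.0]]

    row_sum_norm = max(sum(abs(a) for a in row) for row in A)

    # brute force: ||Ax||_inf over all sign vectors x (the extreme points of
    # the unit ball for the max-norm)
    best = max(
        max(abs(sum(a * x for a, x in zip(row, signs))) for row in A)
        for signs in itertools.product((-1.0, 1.0), repeat=3)
    )
    print(row_sum_norm, best)   # both print 4.0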

Theorem 3. A linear functional on a normed space is continuous if and only if its kernel ("null space") is closed.

Proof. Let f : X → ℝ be a linear functional. Its kernel is

    ker(f) = {x : f(x) = 0}

This is the same as f⁻¹({0}). Thus if f is continuous, its kernel is closed, by Theorem 4 in Section 1.3, page 17. Conversely, if f is discontinuous, then it is not bounded. Let ‖xₙ‖ ≤ 1 and f(xₙ) → ∞. Take any x not in the kernel and consider the points x − εₙxₙ, where εₙ = f(x)/f(xₙ). These points belong to the kernel of f and converge to x, which is not in the kernel, so the latter is not closed. •

Corollary 1. Every linear functional on a finite-dimensional normed


linear space is continuous.

Proof. If f is such a functional, its null space is a subspace, which, by Corollary


2 in Section 1.4, page 22, must be closed. Then Theorem 3 above implies that
f is continuous. •

Corollary 2. Every linear transformation from a finite-dimensional


normed space to another normed space is continuous.

Proof. Let T : X → Y be such a transformation. Let {b₁, ..., bₙ} be a basis for X. Then each x ∈ X has a unique expression as a linear combination of basis elements. The coefficients depend on x, and so we write x = Σᵢ₌₁ⁿ λᵢ(x) bᵢ. These functionals λᵢ are in fact linear. Indeed, from the previous equation and the equation u = Σᵢ λᵢ(u) bᵢ we conclude that

    αx + βu = Σᵢ₌₁ⁿ [αλᵢ(x) + βλᵢ(u)] bᵢ

Since we have also

    αx + βu = Σᵢ₌₁ⁿ λᵢ(αx + βu) bᵢ

we may conclude (by the uniqueness of the representations) that

    λᵢ(αx + βu) = αλᵢ(x) + βλᵢ(u)

Now use the preceding corollary to infer that the functionals λᵢ are continuous. Getting back to T, we have

    Tx = Σᵢ₌₁ⁿ λᵢ(x) Tbᵢ

and this is obviously continuous. •

Corollary 3. All norms on a finite-dimensional vector space are equivalent, as defined in Problem 3, page 23.

Proof. Let X be a finite-dimensional vector space having two norms, ‖·‖₁ and ‖·‖₂. The identity map I from (X, ‖·‖₁) to (X, ‖·‖₂) is continuous by the preceding result. Hence it is bounded. This implies that for some α,

    ‖x‖₂ ≤ α ‖x‖₁    (x ∈ X)

By the symmetry in the hypotheses, there is a β such that ‖x‖₁ ≤ β ‖x‖₂. •
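A small numerical illustration of Corollary 3 (mine, not from the text): on ℝⁿ the familiar norms ‖·‖₁, ‖·‖₂, and ‖·‖∞ satisfy ‖x‖∞ ≤ ‖x‖₂ ≤ ‖x‖₁ ≤ n‖x‖∞, so each pair is equivalent; random vectors never violate these bounds.

    # Illustration (not from the text): equivalence of three norms on R^n.
    import math, random

    def norms(x):
        n1 = sum(abs(t) for t in x)
        n2 = math.sqrt(sum(t * t for t in x))
        ninf = max(abs(t) for t in x)
        return n1, n2, ninf

    n = 5
    for _ in range(1000):
        x = [random.uniform(-1, 1) for _ in range(n)]
        n1, n2, ninf = norms(x)
        assert ninf <= n2 <= n1 <= n * ninf   # equivalence constants 1 and n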


Recall that if X and Y are two normed linear spaces, then the notation ℒ(X, Y) denotes the set of all bounded linear maps of X into Y. We have seen that boundedness is equivalent to continuity for linear maps in this context. The space ℒ(X, Y) has, in a natural way, all the structure of a normed linear space. Specifically, we define

    (αA + βB)(x) = α(Ax) + β(Bx)

    ‖A‖ = sup{ ‖Ax‖_Y : x ∈ X, ‖x‖_X ≤ 1 }

In these equations, A and B are elements of ℒ(X, Y), and x is any member of X.

Theorem 4. If X is a normed linear space and Y is a Banach space, then ℒ(X, Y) is a Banach space.

Proof. The only issue is the completeness of ℒ(X, Y). Let [Aₙ] be a Cauchy sequence in ℒ(X, Y). For each x ∈ X, we have

    ‖Aₙx − Aₘx‖ ≤ ‖Aₙ − Aₘ‖ ‖x‖

This shows that [Aₙx] is a Cauchy sequence in Y. By the completeness of Y we can define Ax = lim Aₙx. The linearity of A follows by letting n → ∞ in the equation

    Aₙ(αx + βy) = αAₙx + βAₙy

The boundedness of A follows from the boundedness of the Cauchy sequence [Aₙ]. If ‖Aₙ‖ ≤ M, then ‖Aₙx‖ ≤ M‖x‖ for all x, and in the limit we have ‖Ax‖ ≤ M‖x‖. Finally, we have ‖Aₙ − A‖ → 0, because if ‖Aₙ − Aₘ‖ ≤ ε when m, n ≥ N, then for all x of norm 1 we have ‖Aₙx − Aₘx‖ ≤ ε when m, n ≥ N. Then we can let m → ∞ to get ‖Aₙx − Ax‖ ≤ ε and ‖Aₙ − A‖ ≤ ε. •
The composition of two linear mappings A and B is conventionally written as AB rather than A∘B. Thus, (AB)x = A(Bx). If AA is well-defined (i.e., the range of A is contained in its domain), then we write it as A². All nonnegative powers are then defined recursively by writing A⁰ = I, Aⁿ⁺¹ = AAⁿ.

Theorem 5. The Neumann Theorem. Let A be a bounded linear operator on a Banach space X (and taking values in X). If ‖A‖ < 1, then I − A is invertible, and

    (I − A)⁻¹ = Σ_{k=0}^∞ Aᵏ

Proof. Put Bₙ = Σ_{k=0}^n Aᵏ. The sequence [Bₙ] has the Cauchy property, for if n > m, then

    ‖Bₙ − Bₘ‖ = ‖ Σ_{k=m+1}^n Aᵏ ‖ ≤ Σ_{k=m+1}^n ‖Aᵏ‖ ≤ Σ_{k=m+1}^∞ ‖A‖ᵏ
              = ‖A‖^{m+1} Σ_{k=0}^∞ ‖A‖ᵏ = ‖A‖^{m+1} / (1 − ‖A‖)

(In this calculation we used Problem 20.) Since the space of all bounded linear operators on X into X is complete (Theorem 4), the sequence [Bₙ] converges to a bounded linear operator B. We have

    (I − A)Bₙ = Bₙ − ABₙ = Σ_{k=0}^n Aᵏ − Σ_{k=1}^{n+1} Aᵏ = I − A^{n+1}

Taking a limit, we obtain (I − A)B = I. Similarly, B(I − A) = I. Hence B = (I − A)⁻¹. •
The Neumann Theorem is a powerful tool, having applications to many
applied problems, such as integral equations and the solving of large systems of
linear equations. For examples, see Section 4.3, which is devoted to this theorem,
and Section 3.3, which has an example of a nonlinear integral equation.
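Here is a small computational sketch of the Neumann series (my own illustration, not from the text): for a 2 × 2 matrix A with ‖A‖ < 1, the partial sums I + A + A² + ··· approximate (I − A)⁻¹, which can be checked by multiplying back by I − A. The particular matrix is an arbitrary sample.

    # Illustration (not from the text): Neumann series for a small matrix A
    # with ||A|| < 1, checked against the defining identity (I - A) B = I.
    A = [[0.2, 0.1],
         [0.0, 0.3]]
    I = [[1.0, 0.0], [0.0, 1.0]]

    def matmul(P, Q):
        return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
                for i in range(2)]

    def matadd(P, Q):
        return [[P[i][j] + Q[i][j] for j in range(2)] for i in range(2)]

    B, power = I, I                      # B accumulates I + A + A^2 + ...
    for _ in range(50):
        power = matmul(power, A)
        B = matadd(B, power)

    ImA = [[I[i][j] - A[i][j] for j in range(2)] for i in range(2)]
    print(matmul(ImA, B))                # approximately [[1, 0], [0, 1]]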

Problems 1.5

1. Prove that the closure of a linear subspace in a normed linear space is also a subspace.
(The closure operation is defined on page 16.)
2. Prove that the operator norm defined here has the three properties required of a norm.
3. Prove that the kernel of a linear functional is either closed or dense. (A subset in a
topological space X is dense if its closure is X.)
4. Let {x₁, ..., xₖ} be a linearly independent finite set in a normed linear space. Show that there exists a δ > 0 such that the condition

    max_{1≤i≤k} ‖xᵢ − yᵢ‖ < δ

implies that {y₁, ..., yₖ} is also linearly independent.

5. Prove directly that if T is an unbounded linear operator, then it is discontinuous at 0. (Start with a sequence [xₙ] such that ‖xₙ‖ ≤ 1 and ‖Txₙ‖ → ∞.)

6. Let A be an m × n matrix. Let X = ℝⁿ, with norm ‖x‖∞ = max_{1≤i≤n} |x(i)|. Let Y = ℝᵐ, with norm ‖y‖∞ = max_{1≤i≤m} |y(i)|. Define a linear transformation T from X to Y by putting (Tx)(i) = Σⱼ₌₁ⁿ aᵢⱼ x(j), 1 ≤ i ≤ m. Prove that ‖T‖ = maxᵢ Σⱼ₌₁ⁿ |aᵢⱼ|.

7. Prove that a linear map is injective (i.e., one-to-one) if and only if its kernel is the 0 subspace. (The kernel of a map T is {x : Tx = 0}.)
8. Prove that the norm of a linear transformation is the infimum of all the numbers M that satisfy the inequality ‖Tx‖ ≤ M‖x‖ for all x.

9. Prove the (surprising) result that a linear transformation is continuous if and only if it
transforms every sequence converging to zero into a bounded sequence.

10. If f is a linear functional on X and N is its kernel, then there exists a one-dimensional subspace Y such that X = Y ⊕ N. (For two sets in a linear space, we define U + V as the set of all sums u + v when u ranges over U and v ranges over V. If U and V are subspaces with only 0 in common, we write this sum as U ⊕ V.)

11. The space ℓ∞(S) was defined in Problem 12 of Section 1.2, page 12. Let S = ℕ, and define T : ℓ∞(ℕ) → C[−½, ½] by the equation (Tx)(s) = Σₖ₌₁^∞ x(k) sᵏ. Prove that T is linear and continuous.

12. Prove or disprove: A linear map from a normed linear space into a finite-dimensional
normed linear space must be continuous.

13. Addition of sets in a vector space is defined by A + B = {a + b : a ∈ A, b ∈ B}. Better: A + B = {x : ∃a ∈ A and ∃b ∈ B such that x = a + b}. Scalar multiplication is λA = {λa : a ∈ A}. Does the family of all subsets of a vector space X form a vector space with these definitions?

14. Let Y be a closed subspace in a Banach space X. A "coset" is a set of the form x + Y = {x + y : y ∈ Y}. Show that the family of all cosets is a normed linear space if we use the norm |||x + Y||| = dist(x, Y).
15. Refer to Problem 12 in the preceding section, page 23. Show that the assertion there is not true if ¼ is replaced by ½.
16. Prove that for a bounded linear transformation T : X → Y,

    ‖T‖ = sup_{‖x‖=1} ‖Tx‖ = sup_{x≠0} ‖Tx‖ / ‖x‖

17. Prove that a bounded linear transformation maps Cauchy sequences into Cauchy se-
quences.

18. Prove that if a linear transformation maps some nonvoid open set of the domain space
to a bounded set in the range space, then it is continuous.

19. On the space C[0, 1] we define "point-evaluation functionals" by t*(x) = x(t). Here t ∈ [0, 1] and x ∈ C[0, 1]. Prove that ‖t*‖ = 1. Prove that if φ = Σᵢ₌₁ⁿ λᵢtᵢ*, where t₁, t₂, ..., tₙ are distinct points in [0, 1], then ‖φ‖ = Σᵢ₌₁ⁿ |λᵢ|.

20. In the proof of the Neumann Theorem we used the inequality ‖Aᵏ‖ ≤ ‖A‖ᵏ. Prove this.

21. Prove that if {φ₁, ..., φₙ} is a linearly independent set of linear functionals, then for suitable xⱼ we have φᵢ(xⱼ) = δᵢⱼ for 1 ≤ i, j ≤ n.

22. Prove that if a linear transformation is discontinuous at one point, then it is discontinuous
everywhere.

23. Linear transformations on infinite-dimensional spaces do not always behave like their counterparts on finite-dimensional spaces. The space c₀ was defined in Problem 1.2.16 (page 12). On the space c₀ define

    Ax = A[x(1), x(2), ...] = [x(2), x(3), ...]
    Bx = B[x(1), x(2), ...] = [0, x(1), x(2), ...]

Prove that A is surjective but not invertible. Prove that B is injective but not invertible. Determine whether right or left inverses exist for A and B.

24. What is meant by the assertion that the behavior of a linear map at any point of its domain is exactly like its behavior at 0?

25. Prove that every linear functional f on ℝⁿ has the form f(x) = Σᵢ₌₁ⁿ αᵢ x(i), where x(1), x(2), ..., x(n) are the coordinates of x. Let α = [α₁, α₂, ..., αₙ] and show that the relationship f ↦ α is linear, injective, and surjective (hence, an isomorphism).

26. Is it true for linear operators in general that continuity follows from the null space being
closed?

27. Let φ₀, φ₁, ..., φₙ be linear functionals on a linear space. Prove that if the kernel of φ₀ contains the kernels of all φᵢ for 1 ≤ i ≤ n, then φ₀ is a linear combination of φ₁, ..., φₙ.

28. If L is a bounded linear map from a normed space X to a Banach space Y, then L has a
unique continuous linear extension defined on the completion of X and taking values in
Y. (Refer to Problem 1.2.47, page 15.) Prove this assertion as well as the fact that the
norm of the extension equals the norm of the original L.

29. Let A be a continuous linear operator on a Banach space X. Prove that the series Σₙ₌₀^∞ Aⁿ/n! converges in ℒ(X, X). The resulting sum can be denoted by e^A. Is e^A invertible?

30. Investigate the continuity of the Laplace transform (in Example 5, page 24).

1.6 Zorn's Lemma, Hamel Bases, and the Hahn-Banach Theorem

This section is devoted to two results that require the Axiom of Choice for their
proofs. These are a theorem on existence of Hamel bases, and the Hahn-Banach
Theorem. The first of these extends to all vector spaces the notion of a base,
which is familiar in the finite-dimensional setting. The Hahn-Banach Theorem
is needed at first to guarantee that on a given normed linear space there can
be defined continuous maps into the scalar field. There are many situations
in applied mathematics where the Hahn-Banach Theorem plays a crucial role;
convex optimization theory is a prime example.
The Axiom of Choice is an axiom that most mathematicians use unre-
servedly, but is nonetheless controversial. Its status was clarified in 1940 by a
famous theorem of Gödel [Go]. His theorem can be stated as follows.

Theorem 1. If a contradiction can be derived from the Zermelo-


Fraenkel axioms of set theory (which include the Axiom of Choice),
then a contradiction can be derived within the restricted set theory
based on the Zermelo-Fraenkel axioms without the Axiom of Choice.

In other words, the Axiom of Choice by itself cannot be responsible for intro-
ducing an inconsistency in set theory. That is why most mathematicians are
willing to accept it. In 1963, Paul Cohen [Coh] proved that the Axiom of Choice
is independent of the remaining axioms in the Zermelo-Fraenkel system. Thus
it cannot be proved from them. The statement of this axiom is as follows:

Axiom of Choice. If A is a set and f a function on A such that f(a) is a nonvoid set for each a ∈ A, then f has a "choice function." That means a function c on A such that c(a) ∈ f(a) for all a ∈ A.

For example, suppose that A is a finite set: A = {a₁, ..., aₙ}. For each i in {1, 2, ..., n} a nonempty set f(aᵢ) is given. In n steps, we can select "representatives" x₁ ∈ f(a₁), x₂ ∈ f(a₂), etc. Having done so, define c(aᵢ) = xᵢ for i = 1, 2, ..., n. Attempting the same construction for an infinite set such as A = ℝ, with accompanying infinite sets f(a), leads to an immediate difficulty. To get around the difficulty, one might try to order the elements of each set f(a) in such a way that there is always a "first" element in f(a). Then c(a) can be defined to be the first element in f(a). But the proposed ordering will require another axiom at least as strong as the Axiom of Choice! For a second example, see Problem 45, page 40.
A number of other set-theoretic axioms are equivalent to the Axiom of
Choice. See [Kel] and [RR]. Among these equivalent axioms, we single out
Zorn's Lemma as being especially useful. First, we require some definitions.
Definition 1. A partially ordered set is a pair (X, ≺) in which X is a set and ≺ is a relation on X such that
(i) x ≺ x for all x;
(ii) if x ≺ y and y ≺ z, then x ≺ z.

Definition 2. A chain, or totally ordered set, is a partially ordered set in which for any two elements x and y, either x ≺ y or y ≺ x.
Definition 3. In a partially ordered set X, an upper bound for a subset A in X is any point x in X such that a ≺ x for all a ∈ A.
Example 1. Let S be any set, and denote by 2^S the family of all subsets of S, including the empty set ∅ and S itself. This is often called the "power set" of S. Order 2^S by the inclusion relation ⊂. Then (2^S, ⊂) is a partially ordered set. It is not totally ordered. An upper bound for any subset of 2^S is S. •
Example 2. In ℝ², define x ≺ y to mean that x(i) ≤ y(i) for i = 1 and 2. This is a partial ordering but not a total ordering. Which quadrants in ℝ² have upper bounds? •
Example 3. Let ℱ be a family of functions (whose ranges and domains need not be specified). For f and g in ℱ we write f ≺ g if two conditions are fulfilled:
(i) dom(f) ⊂ dom(g)
(ii) f(x) = g(x) for all x in dom(f)
When this occurs, we say that "g is an extension of f." Notice that this is equivalent to the assertion f ⊂ g, provided that we interpret (as ultimately we must) f and g as sets of pairs of elements. •
Definition 4. An element m in a partially ordered set X is said to be a maximal element if every x in X that satisfies the condition m ≺ x also satisfies x ≺ m.

Zorn's Lemma. A partially ordered set contains a maximal element if each totally ordered subset has an upper bound.

Definition 5. Let X be a linear space. A subset H of X is called a Hamel base, or Hamel basis, if each point in X has a unique expression as a finite linear combination of elements of H.
Example 4. Let X be the space of all polynomials defined on ℝ. A Hamel base for X is given by the sequence [hₙ], where hₙ(s) = sⁿ, n = 0, 1, 2, .... •

Theorem 2. Every nontrivial vector space has a Hamel base.

Proof. Let X be a nontrivial vector space. To show that X has a Hamel base we first prove that X has a maximal linearly independent set, and then we show that any such set is necessarily a Hamel base. Consider the collection of all linearly independent subsets of X, and partially order this collection by inclusion, ⊂. In order to use Zorn's Lemma, we verify that every chain in this partially ordered set has an upper bound. Let C be a chain. Consider S* = ∪{S : S ∈ C}. This certainly satisfies S ⊂ S* for all S ∈ C. But is S* linearly independent? Suppose that Σᵢ₌₁ⁿ αᵢsᵢ = 0 for some scalars αᵢ and for some distinct points sᵢ in S*. Each sᵢ belongs to some Sᵢ ∈ C. Since C is a chain (and since there are only finitely many Sᵢ), one of these sets (say Sⱼ) contains all the others. Since Sⱼ is linearly independent, we conclude that Σ |αᵢ| = 0. This establishes the linear independence of S* and the fact that every chain in our partially ordered set has an upper bound. Now by Zorn's Lemma, the collection of all linearly independent sets in X has a maximal element, H. To see that H is a Hamel base, let x be any element of X. By the maximality of H, either H ∪ {x} is linearly dependent or H ∪ {x} ⊂ H (and then x ∈ H). In either case, x is a linear combination of elements of H. If x can be represented in two different ways as a linear combination of members of H, then by subtraction, we obtain 0 as a nontrivial linear combination of elements of H, contradicting the linear independence of H. •
In the next theorem, when we say that one real-valued function, f, is dominated by another, p, we mean simply that f(x) ≤ p(x) for all x.

Hahn-Banach Theorem. Let X be a real linear space, and let p be a function from X to ℝ such that p(x + y) ≤ p(x) + p(y) and p(λx) = λp(x) if λ ≥ 0. Any linear functional defined on a subspace of X and dominated by p has an extension that is linear, defined on X, and dominated by p.

Proof. Let f be such a functional, and let X₀ be its domain. Thus X₀ is a linear subspace of X. In approaching the theorem for the first time and wondering how to discover a proof, one naturally asks how to extend the functional f to a domain containing X₀ that is only one dimension larger than X₀. If that is impossible, then the theorem itself cannot be true. Accordingly, let y be a point not in the original domain. To extend f to X₀ + span(y) it suffices to specify a value for f(y), because of the necessary equation

    f(x + λy) = f(x) + λf(y)    (x ∈ X₀ , λ ∈ ℝ)

The value of f(y) must be assigned in such a way that

    f(x) + λf(y) ≤ p(x + λy)    (x ∈ X₀ , λ ∈ ℝ)

If λ = 0, this inequality is certainly valid. If λ > 0, we must have

    f(x/λ) + f(y) ≤ p(x/λ + y)    (x ∈ X₀)

or

    f(x₁) + f(y) ≤ p(x₁ + y)    (x₁ ∈ X₀)

If λ < 0, we must have

    f(x₂) + f(y) ≥ (1/λ) p(x + λy) = −p(−x₂ − y)    (x₂ = x/λ ∈ X₀)

These two conditions on f(y) can be written together as

    −p(−x₂ − y) − f(x₂) ≤ f(y) ≤ p(x₁ + y) − f(x₁)    (x₁, x₂ ∈ X₀)

In order to see that there is a number satisfying this inequality, we compute

    f(x₁) − f(x₂) = f(x₁ − x₂) ≤ p(x₁ − x₂) = p(x₁ + y − x₂ − y) ≤ p(x₁ + y) + p(−x₂ − y)

This completes the extension by one dimension.

Next, we partially order by the inclusion relation (⊂) all the linear extensions of f that are dominated by p. Thus h ⊂ g if and only if the domain of g contains the domain of h, and g(x) = h(x) on the domain of h. In order to use Zorn's Lemma, we must verify that each chain in this partially ordered set has an upper bound. But this is true, since the union of all the elements in such a chain is an upper bound for the chain. (Problem 2.) By Zorn's Lemma, there exists a maximal element f̃ in our partially ordered set. Then f̃ is a linear functional that is an extension of f and is dominated by p. Finally, f̃ must be defined on all of X, for if it were not, a further extension would be possible, as shown in the first part of the proof. •

Corollary 1. Let φ be a linear functional defined on a subspace Y in a normed linear space X and satisfying

    |φ(y)| ≤ M‖y‖    (y ∈ Y)

Then φ has a linear extension defined on all of X and satisfying the above inequality on X.

Proof. Use the Hahn-Banach Theorem with p(x) = M‖x‖. •

Corollary 2. Let Y be a subspace in a normed linear space X. If w ∈ X and dist(w, Y) > 0, then there exists a continuous linear functional φ defined on X such that φ(y) = 0 for all y ∈ Y, φ(w) = 1, and ‖φ‖ = 1/dist(w, Y).

Proof. Let Z be the subspace generated by Y and w. Each element of Z has a unique representation as y + λw, where y ∈ Y and λ ∈ ℝ. It is clear that φ must be defined on Z by writing φ(y + λw) = λ. The norm of φ on Z is computed as follows, in which the supremum is over all nonzero vectors in Z:

    ‖φ‖ = sup |φ(y + λw)| / ‖y + λw‖ = sup |λ| / ‖y + λw‖ = sup 1 / ‖y/λ + w‖
        = 1 / inf ‖y + w‖ = 1 / dist(w, Y)

By Corollary 1, we can extend the functional φ to all of X without increase of its norm. •

Corollary 3. To each point w in a normed linear space there corresponds a continuous linear functional φ such that ‖φ‖ = 1 and φ(w) = ‖w‖.

Proof. In Corollary 2, take Y to be the 0-subspace. •

At this juncture, it makes sense to associate with any normed linear space X a normed space X* consisting of all continuous linear functionals defined on X. Corollary 3 shows that X* is not trivial. The space X* is called the conjugate space of X, or the dual space or the adjoint of X.
Example 4. Let X = ℝⁿ, endowed with the max-norm. Then X* is (or can be identified with) ℝⁿ with the norm ‖·‖₁. To see that this is so, recall (Problem 1.5.25, page 30) that if φ ∈ X*, then φ(x) = Σᵢ₌₁ⁿ u(i)x(i) for a suitable u ∈ ℝⁿ. Then

    ‖φ‖ = sup_{‖x‖∞ ≤ 1} | Σᵢ₌₁ⁿ u(i)x(i) | = Σᵢ₌₁ⁿ |u(i)| = ‖u‖₁  •

Example 5. Let c₀ denote the Banach space of all real sequences that converge to zero, normed by putting ‖x‖∞ = sup |x(n)|. Let ℓ¹ denote the Banach space of all real sequences u for which Σₙ₌₁^∞ |u(n)| < ∞, normed by putting ‖u‖₁ = Σₙ₌₁^∞ |u(n)|. With each u ∈ ℓ¹ we associate a functional φᵤ ∈ c₀* by means of the equation φᵤ(x) = Σₙ₌₁^∞ u(n)x(n). (The connection between these two spaces is the subject of the next result.) •
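A quick numerical check of the computation in Example 4 (my illustration, not from the text, with an arbitrary sample vector u): the supremum of |φ(x)| over the max-norm unit ball is attained at a vector of signs and equals ‖u‖₁.

    # Illustration (not from the text): for the max-norm on R^n, the norm of
    # x -> sum u(i) x(i) equals ||u||_1, attained at x(i) = sgn(u(i)).
    import itertools

    u = [2.0, -0.5, 1.5]
    phi = lambda x: sum(ui * xi for ui, xi in zip(u, x))

    norm_by_formula = sum(abs(ui) for ui in u)                  # ||u||_1 = 4.0
    norm_by_search = max(abs(phi(x))
                         for x in itertools.product((-1.0, 1.0), repeat=len(u)))
    print(norm_by_formula, norm_by_search)                      # 4.0 4.0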

Proposition. The mapping u ↦ φᵤ is an isometric isomorphism between ℓ¹ and c₀*. Thus we can say that c₀* "is" ℓ¹.

Proof. Perhaps we had better give a name to this mapping. Let Λ : ℓ¹ → c₀* be defined by Λu = φᵤ. It is to be shown that for each u, Λu is linear and continuous on c₀. Then it is to be shown that Λ is linear, surjective, and isometric. Isometric means ‖Λu‖ = ‖u‖₁. That φᵤ is well-defined follows from the absolute convergence of the series defining φᵤ(x):

    Σ |u(n)x(n)| ≤ ‖x‖∞ Σ |u(n)| = ‖x‖∞ ‖u‖₁

The linearity of φᵤ is obvious:

    φᵤ(αx + βy) = Σ u(n)[αx(n) + βy(n)] = α Σ u(n)x(n) + β Σ u(n)y(n) = αφᵤ(x) + βφᵤ(y)

The continuity or boundedness of φᵤ is easy:

    |φᵤ(x)| = | Σ u(n)x(n) | ≤ Σ |u(n)| |x(n)| ≤ ‖u‖₁ ‖x‖∞

By taking a supremum in this last inequality, considering only x for which ‖x‖∞ ≤ 1, we get

    ‖φᵤ‖ ≤ ‖u‖₁

On the other hand, if ε > 0 is given, we can select N so that Σ_{n=N+1}^∞ |u(n)| < ε. Then we define x by putting x(n) = sgn u(n) for n ≤ N, and by setting x(n) = 0 for n > N. Clearly, x ∈ c₀ and ‖x‖∞ = 1. Hence

    ‖φᵤ‖ ≥ φᵤ(x) = Σₙ₌₁ᴺ x(n)u(n) = Σₙ₌₁ᴺ |u(n)| > ‖u‖₁ − ε

Since ε was arbitrary, ‖φᵤ‖ ≥ ‖u‖₁. Hence we have proved

    ‖Λu‖ = ‖φᵤ‖ = ‖u‖₁

Next we show that Λ is surjective. Let ψ ∈ c₀*. Let δₙ be the element of c₀ that has a 1 in the nth coordinate and zeros elsewhere. Then for any x,

    x = Σₙ₌₁^∞ x(n)δₙ

Since ψ is continuous and linear,

    ψ(x) = Σₙ₌₁^∞ x(n)ψ(δₙ)

Consequently, if we put u(n) = ψ(δₙ), then ψ(x) = φᵤ(x) and ψ = φᵤ. To verify that u ∈ ℓ¹, we define (as above) x(n) = sgn u(n) for n ≤ N and x(n) = 0 for n > N. Then

    Σₙ₌₁ᴺ |u(n)| = Σ x(n)u(n) = ψ(x) ≤ ‖ψ‖ ‖x‖ = ‖ψ‖

Thus ‖u‖₁ ≤ ‖ψ‖.
Finally, the linearity of Λ follows from writing

    φ_{αu+βv}(x) = Σ (αu + βv)(n) x(n) = α Σ u(n)x(n) + β Σ v(n)x(n) = (αφᵤ + βφᵥ)(x)  •
Corollary 4. For each x in a normed linear space X, we have

    ‖x‖ = max{ |φ(x)| : φ ∈ X*, ‖φ‖ = 1 }

Proof. If φ ∈ X* and ‖φ‖ = 1, then

    |φ(x)| ≤ ‖φ‖ ‖x‖ = ‖x‖

Therefore,

    sup{ |φ(x)| : φ ∈ X*, ‖φ‖ = 1 } ≤ ‖x‖

For the reverse inequality, note first that it is trivial if x = 0. Otherwise, use Corollary 3. Then there is a functional ψ ∈ X* such that ψ(x) = ‖x‖ and ‖ψ‖ = 1. Note that the supremum is attained. •

A subset Z in a normed space X is said to be fundamental if the set of all linear combinations of elements in Z is dense in X. Expressed otherwise, for each x ∈ X and for each ε > 0 there is a vector Σᵢ₌₁ⁿ λᵢzᵢ such that zᵢ ∈ Z, λᵢ ∈ ℝ, and

    ‖x − Σᵢ₌₁ⁿ λᵢzᵢ‖ < ε

We could also state that dist(x, span Z) = 0 for all x ∈ X. As an example, the vectors

    δ₁ = (1, 0, 0, ...)
    δ₂ = (0, 1, 0, ...)
    etc.

form a fundamental set in the space c₀.
Example 6. In the space C[a, b], with the usual supremum norm, an important fundamental set is the sequence of monomials

    u₀(t) = 1 ,  u₁(t) = t ,  u₂(t) = t² , ...

The Weierstrass Approximation Theorem asserts the fundamentality of this sequence. Thus, for any x ∈ C[a, b] and any ε > 0 there is a polynomial u for which ‖x − u‖∞ < ε. Of course, u is of the form Σᵢ₌₀ⁿ λᵢuᵢ. •
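One concrete way to realize the Weierstrass theorem on [0, 1] is through Bernstein polynomials; the following is a small illustration of mine (not from the text), approximating the sample function x(t) = |t − ½| and watching the sup-norm error shrink as the degree grows.

    # Illustration (not from the text): Bernstein polynomial approximation of
    # x(t) = |t - 1/2| on [0,1]; the sup-norm error decreases with the degree n.
    from math import comb

    def bernstein(x, n, t):
        return sum(x(k / n) * comb(n, k) * t**k * (1 - t)**(n - k)
                   for k in range(n + 1))

    x = lambda t: abs(t - 0.5)
    grid = [i / 1000.0 for i in range(1001)]
    for n in (5, 20, 80):
        err = max(abs(x(t) - bernstein(x, n, t)) for t in grid)
        print(n, err)    # errors decrease as n grows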

Definition 5. If A is a subset of a normed linear space X, then the annihilator of A is the set

    A⊥ = {φ ∈ X* : φ(a) = 0 for all a ∈ A}

Theorem 3. A subset in a normed space is fundamental if and only if its annihilator is {0}.

Proof. Let X be the space and Z the subset in question. Let Y be the closure of the linear span of Z. If Y ≠ X, let x ∈ X \ Y. Then by Corollary 2, there exists φ ∈ X* such that φ(x) = 1 and φ ∈ Y⊥. Hence φ ∈ Z⊥ and Z⊥ ≠ 0. If Y = X, then any element of Z⊥ annihilates the span of Z as well as Y and X. Thus it must be the zero functional; i.e., Z⊥ = 0. •

Theorem 4. If X is a normed linear space (not necessarily complete), then its conjugate space X* is complete.

Proof. This follows from Theorem 4 in Section 1.5, page 27, by letting Y = ℝ in that theorem. •

Problems 1.6

1. Let X and Y be sets. A function from a subset of X to Y is a subset f of X x Y such


that for each x E X there is at most one y E Y satisfying (x, y) E f. We write then
f(x) = y. The set of all such functions is denoted by S. Prove or disprove the following:
(a) S is partially ordered by inclusion. (b) The union of two elements of S is a member
of S. (c) The intersection of two elements of S is a member of S. (d) The union of any
chain in S is a member of S.
2. In the proof of the Hahn-Banach theorem, show that the union of the elements in a chain
is an upper bound for the chain. (There are five distinct things to prove.)
3. Denote by c₀ the normed linear space of all functions x : ℕ → ℝ having the property limₙ→∞ x(n) = 0, with norm given by ‖x‖ = supₙ |x(n)|. Do the vectors eₘ defined by eₘ(n) = δₙₘ form a Hamel base for c₀?
4. If {h_α : α ∈ I} is a Hamel base for a vector space X, then each element x in X has a representation x = Σ_α λ(α)h_α in which λ : I → ℝ and {α : λ(α) ≠ 0} is finite. (Prove this.)
5. Prove that every real vector space is isomorphic to a vector space whose elements are
real-valued functions. ("Function spaces are all there are.")
6. Prove that any linearly independent set in a vector space can be extended to produce a
Hamel base.
7. If U is a linear subspace in a vector space X, then U has an "algebraic complement,"
which is a subspace V such that X = U + V, Un V = O. ("0" denotes the zero subspace.)
(Prove this.)
FIVE EXERCISES (8-12) ON BANACH LIMITS

8. The space ℓ∞ consists of all bounded sequences, with norm ‖x‖∞ = supₙ |x(n)|. Define T : ℓ∞ → ℓ∞ by putting

    Tx = [x(1), x(2) − x(1), x(3) − x(2), x(4) − x(3), ...]

Let M denote the range of T, and put u = [1, 1, 1, ...]. Prove that dist(u, M) = 1.
9. Prove that there exists a continuous linear functional φ ∈ M⊥ such that ‖φ‖ = φ(u) = 1. The functional φ is called a Banach limit, and is sometimes written LIM.
10. Prove that if x ∈ ℓ∞ and x ≥ 0, then φ(x) ≥ 0.
11. Prove that φ(x) = limₙ x(n) when the limit exists.
12. Prove that if y = [x(2), x(3), ...], then φ(x) = φ(y).

13. Let ℓ∞ denote the normed linear space of all bounded real sequences, with norm given by ‖x‖∞ = supₙ |x(n)|. Prove that ℓ∞ is complete, and therefore a Banach space. Prove that ℓ₁* = ℓ∞, where the equality here really means isometrically isomorphic.

14. A hyperplane in a normed space is any translate of the null space of a continuous, linear, nontrivial functional. Prove that a set is a hyperplane if and only if it is of the form {x : φ(x) = λ}, where φ ∈ X* \ 0 and λ ∈ ℝ. A translate of a set S in a vector space is a set of the form v + S = {v + s : s ∈ S}.
15. A half-space in a normed linear space X is any set of the form {x : φ(x) ≤ λ}, where φ ∈ X* \ 0 and λ ∈ ℝ. Prove that for every x satisfying ‖x‖ = 1 there exists a half-space such that x is on the boundary of the half-space and the unit ball is contained in the half-space.

16. Prove that a linear functional φ is a linear combination of linear functionals φ₁, ..., φₙ if and only if N(φ) ⊃ ∩ᵢ₌₁ⁿ N(φᵢ). Here N(φ) denotes the null space of φ. (Use induction and trickery.)

17. Prove that a linear map transforms convex sets into convex sets.

18. Prove that in a normed linear space, the closure of a convex set is convex.

19. Let Y be a linear subspace in a normed linear space X. Prove that

    dist(x, Y) = sup{ φ(x) : φ ∈ X*, φ ⊥ Y, ‖φ‖ = 1 }

Here the notation φ ⊥ Y means that φ(y) = 0 for all y ∈ Y.
20. Let Y be a subset of a normed linear space X. Prove that Y⊥ is a closed linear subspace in X*.

21. If Z is a linear subspace in X*, where X is a normed linear space, we define

    Z⊥ = {x ∈ X : φ(x) = 0 for all φ ∈ Z}

Prove that for any closed subspace Y in X, (Y⊥)⊥ = Y. Generalize.

22. Let f(z) = Σₙ₌₀^∞ aₙzⁿ, where [aₙ] is a sequence of complex numbers for which naₙ → 0. Prove the famous theorem of Tauber that Σ aₙ converges if and only if lim_{z→1} f(z) exists. (See [DS], page 78.)

23. Do the vectors δₙ defined just after Corollary 4 form a fundamental set in the space ℓ∞ consisting of bounded sequences with norm ‖x‖∞ = supₙ |x(n)|?
THREE EXERCISES (24-26) ON SCHAUDER BASES (See [Sem] and [Sing].)

24. A Schauder base (or basis) for a Banach space X is a sequence [uₙ] in X such that each x in X has a unique representation

    x = Σₙ₌₁^∞ λₙuₙ

This equation means, of course, that lim_{N→∞} ‖x − Σₙ₌₁ᴺ λₙuₙ‖ = 0. Show that one Schauder base for c₀ is given by uₙ(m) = δₙₘ (n, m = 1, 2, 3, ...).

25. Prove that the λₙ in the preceding problem are functions of x and must be, in fact, linear and continuous.

26. Prove that if the Banach space X possesses a Schauder base, then X must be separable.
That is, X must contain a countable dense set.

27. Prove that for any set A in a normed linear space all these sets are the same: A⊥, (closure A)⊥, (span A)⊥, [closure (span A)]⊥.

28. Prove that for x E co,

29. Use the Axiom of Choice to prove that for any set S having at least 2 points there is a
function f :S --t S that does not have a fixed point.

30. An interesting Banach space is the space c consisting of all convergent sequences. The norm is ‖x‖∞ = supₙ |x(n)|. Obviously, we have these set inclusions among the examples encountered so far:

    c₀ ⊂ c ⊂ ℓ∞

Prove that c₀ is a hyperplane in c. Identify in concrete terms the conjugate space c*.

31. Prove that if H is a Hamel base for a normed linear space, then so is {h/‖h‖ : h ∈ H}.
32. Let X and Y be linear spaces. Let H be a Hamel base for X. Prove that a linear map
from X to Y is completely determined by its values on H, and that these values can be
arbitrarily-assigned elements of Y.

33. Prove that on every infinite-dimensional normed linear space there exist discontinuous
linear functionals. (The preceding two problems can be useful here.)

34. Using Problem 33 and Problem 1.5.3, page 28, prove that every infinite-dimensional
normed linear space is the union of a disjoint pair of dense convex sets.

35. Let two equivalent norms be defined on a single linear space. (See Problem 1.4.3, page
23.) Prove that if the space is complete with respect to one of the norms, then it is
complete with respect to the other. Prove that this result fails (in general) if we assume
only that one norm is less than or equal to a constant multiple of the other.

36. Let Y be a subspace of a normed space X. Prove that there is a norm-preserving injective map J : Y* → X* such that for each φ ∈ Y*, Jφ is an extension of φ.

37. Let Y be a subspace of a normed space X. Prove that if Y⊥ = 0, then Y is dense in X.

38. Let T be a bounded linear map of c₀ into c₀. Show that T must have the form (Tx)(n) = Σᵢ₌₁^∞ aₙᵢ x(i) for a suitable infinite matrix [aₙᵢ]. Prove that supₙ Σᵢ₌₁^∞ |aₙᵢ| = ‖T‖.

39. Prove that if #S = n, then #2^S = 2ⁿ.

40. What implications exist among these four properties of a set S in a normed linear space
X? (a) S is fundamental in X; (b) S is linearly independent; (c) S is a Schauder base
for X; (d) S is a Hamel base for X.

41. A "spanning set" in a linear space is a set S such that each point in the space is a linear
combination of elements from S. Prove that every linear space has a minimal spanning
set.

42. Let f : ℝ → ℝ. Define x ≺ y to mean f(x) ≤ f(y). Under what conditions is this a partial order or a total order?

43. Criticize the following "proof" that if X and Y are any two normed linear spaces, then X* = Y*. We can assume that X and Y are subspaces of a third normed space Z. (For example, we could use Z = X ⊕ Y, a direct sum.) Clearly, X* is a subspace of Z*, since the Hahn-Banach Theorem asserts that an element of X* can be extended, without increasing its norm, to Z. Clearly, Z* is a subspace of Y*, since each element of Z* can be restricted to become an element of Y*. So, we have X* ⊂ Z* ⊂ Y*. By symmetry, Y* ⊂ X*. So X* = Y*.

44. Let K be a subset of a linear space X, and let f : K --t IR. Establish necessary and
sufficient conditions in order that f be the restriction to K of a linear functional on X.

45. For each a in a set A, let f(a) be a subset of N. Without using the Axiom of Choice,
prove that f has a choice function.

1.7 The Baire Theorem and Uniform Boundedness

This section is devoted to the first consequences of completeness in a normed


linear space. These are stunning and dramatic results that distinguish Banach
spaces from other normed linear spaces. Once we have these theorems (in this
section and the next), it will be clear why it is always an advantage to be working
with a complete space. The reader has undoubtedly seen this phenomenon when
studying the real number system (which is complete). When we compare the
real and the rational number systems, we notice that the latter has certain
deficiencies, which indeed had already been encountered by the ancient Greeks.
For example, they knew that no square could have rational sides and rational
diagonal! Put another way, certain problems posed within the realm of rational
numbers do not have solutions among the rational numbers; rather, we must
expect solutions sometimes to be irrational. The simplest example, of course, is
x² = 2. Our story begins with a purely metric-space result.

Theorem 1. Baire's Theorem. In a complete metric space, the


intersection of a countable family of open dense sets is dense.

Proof. (A set is "dense" if its closure is the entire space.) Let O₁, O₂, ... be
open dense sets in a complete metric space X. In order to show that ⋂_{n=1}^∞ O_n is
dense, it is sufficient to prove that this set intersects an arbitrary nonvoid open
ball S₁ in X. For each n we will define an open ball and a closed ball:

    S_n = {x ∈ X : d(x, x_n) < r_n}        S'_n = {x ∈ X : d(x, x_n) ≤ r_n}

Select any x₁ ∈ X and let r₁ > 0. We want to prove that S₁ intersects ⋂_{n=1}^∞ O_n.
Since O₁ is open and dense, O₁ ∩ S₁ is open and nonvoid. Take S'₂ ⊂ S₁ ∩ O₁.

Then take S'₃ ⊂ S₂ ∩ O₂, S'₄ ⊂ S₃ ∩ O₃, and so on. At the same time we can
insist that r_n ↓ 0. Then for all n,

    S'_{n+1} ⊂ S_n ∩ O_n

The points x_n form a Cauchy sequence because x_i, x_j ∈ S_n if i, j > n, and so

    d(x_i, x_j) ≤ d(x_i, x_n) + d(x_n, x_j) < 2r_n → 0

Since X is complete, the sequence [x_n] converges to some point x'. Since for
i > n,
    x_i ∈ S'_{n+1} ⊂ S₁ ∩ O_n
we can let i → ∞ to conclude that x' ∈ S'_{n+1} ⊂ S₁ ∩ O_n. Since this is true for
all n, the set ⋂_{n=1}^∞ O_n does indeed intersect S₁.  •

Corollary. If a complete metric space is expressed as a countable


union of closed sets, then one of the closed sets must have a nonempty
interior.

Proof. Let X be a complete metric space, and suppose that X = ⋃_{n=1}^∞ F_n,
where each F_n is a closed set having empty interior. The sets O_n = X \ F_n are
open and dense. Hence by Baire's Theorem, ⋂_{n=1}^∞ O_n is dense. In particular, it
is nonempty. If x ∈ ⋂_{n=1}^∞ O_n, then x ∈ X \ ⋃_{n=1}^∞ F_n, a contradiction.  •
A subset in a metric space X (or indeed in any topological space) is said to
be nowhere dense in X if its closure has an empty interior. Thus the set of
irrational points on the horizontal axis in ℝ² is nowhere dense in ℝ². A set that
is a countable union of nowhere dense sets is said to be of category I in X. A
set that is not of category I is said to be of category II in X.
Observe that all three of these notions are dependent on the space. Thus
one can have E C X C Z, where E is of category II in X and of category I in
Z. For a concrete example, the one in the preceding paragraph will serve.
The Corollary implies that if X is a complete metric space, then X is of the
second category in X.
Intuitively, we think of sets of the first category as being "thin," and those
of the second category as "fat." (See Problems 5, 6, 7, for example.)

Theorem 2. The Banach-Steinhaus Theorem. Let {A_α}
be a family of continuous linear transformations defined on a Banach
space X and taking values in a normed linear space. In order that
sup_α ‖A_α‖ < ∞, it is necessary and sufficient that the set {x ∈ X :
sup_α ‖A_α x‖ < ∞} be of the second category in X.

Proof. Assume first that c = sup_α ‖A_α‖ < ∞. Then every x satisfies ‖A_α x‖ ≤
c‖x‖, and every x belongs to the set F = {x : sup_α ‖A_α x‖ < ∞}. Since F = X,
the preceding corollary implies that F is of the second category in X.
For the sufficiency, define

    F_n = {x ∈ X : sup_α ‖A_α x‖ ≤ n}

and assume that F is of the second category in X. Notice that F = ⋃_{n=1}^∞ F_n.
Since F is of the second category, and each F_n is a closed subset of X, the
definition of second category implies that some F_m contains a ball. Suppose
that
    B = {x ∈ X : ‖x − x₀‖ ≤ r} ⊂ F_m        (r > 0)
For any x satisfying ‖x‖ ≤ 1 we have x₀ + rx ∈ B. Hence

    ‖A_α x‖ = ‖A_α[r⁻¹(x₀ + rx − x₀)]‖
            ≤ r⁻¹‖A_α(x₀ + rx)‖ + r⁻¹‖A_α x₀‖ ≤ 2r⁻¹m

Hence ‖A_α‖ ≤ 2r⁻¹m for all α.  •



Theorem 3. The Principle of Uniform Boundedness. Let
{A_α} be a collection of continuous linear maps from a Banach space X
into a normed linear space. If sup_α ‖A_α x‖ < ∞ for each x ∈ X, then
sup_α ‖A_α‖ < ∞.

(This follows at once from Theorem 2: by the Corollary to Baire's Theorem, the
complete space X is of the second category in itself.)

Example 1. Consider the familiar space C[0,1]. We are going to show that
most members of C[0,1] are not differentiable. Select a point ξ in the open
interval (0,1). For small positive values of h we define a linear functional φ_h by
the equation

    φ_h(x) = [x(ξ + h) − x(ξ − h)] / (2h)        (x ∈ C[0,1])

It is elementary to prove that φ_h is linear and that ‖φ_h‖ = h⁻¹. Consequently,
by the Banach-Steinhaus Theorem, the set of x such that sup_h |φ_h(x)| < ∞ is
of the first category. Hence the set of x for which sup_h |φ_h(x)| = ∞ is of the
second category in C[0,1]. In other words, the set of functions in C[0,1] that
are not differentiable at ξ is of the second category in C[0,1].  •
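The blow-up of the difference quotients can be seen concretely for a single badly behaved function. The following short script is a numerical illustration only (the function x and the step sizes h are our own choices, not part of the text): it evaluates φ_h at ξ = 1/2 for x(t) = √(max(t − 1/2, 0)), which is continuous on [0,1] but not differentiable at ξ, and the values |φ_h(x)| grow without bound as h → 0.

```python
import numpy as np

xi = 0.5                                              # the point xi of Example 1
x = lambda t: np.sqrt(np.maximum(t - xi, 0.0))        # continuous, not differentiable at xi

def phi_h(x, h, xi=0.5):
    """The functional of Example 1: symmetric difference quotient at xi."""
    return (x(xi + h) - x(xi - h)) / (2.0 * h)

for h in [1e-1, 1e-2, 1e-4, 1e-6, 1e-8]:
    print(f"h = {h:8.0e}   phi_h(x) = {phi_h(x, h, xi):12.3f}")
# Here phi_h(x) = sqrt(h)/(2h) = 1/(2*sqrt(h)), so sup_h |phi_h(x)| = infinity:
# exactly the behavior that the Banach-Steinhaus argument shows is generic in C[0,1].
```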
Example 2. The formal Fourier series of a function x is

    Σ_{n=−∞}^{∞} α_n(x) e^{int}

where the functionals α_n are defined by

    α_n(x) = (1/2π) ∫₀^{2π} x(s) e^{−ins} ds

If x belongs to C_{2π}, the space of continuous 2π-periodic functions on [0, 2π]
(endowed with the sup-norm), then the coefficients α_n(x) certainly exist; (in
fact, they exist if x is only Lebesgue integrable). A sequence of linear operators,
called Fourier projections, is obtained by truncation of the series:

    (A_n x)(t) = Σ_{k=−n}^{n} α_k(x) e^{ikt}

It can be shown that the norm of A_n, considered as a map of C_{2π} into itself,
is roughly (4/π²) log n. In fact, the norm of each functional t* ∘ A_n has this
property. Recall from Problem 19 in Section 1.5 (page 29) that t* denotes
point-evaluation at t, so that

    (t* ∘ A_n)(x) = t*(A_n x) = (A_n x)(t)

Since sup_n ‖t* ∘ A_n‖ = +∞, the set of x in C_{2π} whose Fourier series diverge
at a specified point t is a set of the second category. Thus, for most periodic
continuous functions, the Fourier series do not converge.  •
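The growth rate (4/π²) log n can be checked numerically. The sketch below is only an illustration, using the standard identification of ‖t* ∘ A_n‖ with the Lebesgue constant (1/2π)∫₀^{2π}|D_n(s)| ds, where D_n is the Dirichlet kernel; the grid size is an arbitrary choice.

```python
import numpy as np

def lebesgue_constant(n, m=200001):
    """Approximate ||t* o A_n|| = (1/2pi) * integral over [0, 2pi] of |D_n(s)|,
    where D_n(s) = sin((n + 1/2)s) / sin(s/2) is the Dirichlet kernel."""
    s = np.linspace(1e-9, 2*np.pi - 1e-9, m)   # avoid the removable singularity at the endpoints
    D = np.sin((n + 0.5) * s) / np.sin(s / 2.0)
    return np.abs(D).mean()                     # mean over [0, 2pi] = integral / (2 pi)

for n in [4, 16, 64, 256]:
    print(n, round(lebesgue_constant(n), 3), round(4 / np.pi**2 * np.log(n), 3))
# The two columns grow together (their difference stays bounded), so
# sup_n ||t* o A_n|| = +infinity, as used in Example 2.
```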

Theorem 4. Let [A_n] be a sequence of continuous linear transfor-
mations from a Banach space X into a normed linear space. In order
that lim_n A_n x = 0 for all x ∈ X it is necessary and sufficient that
sup_n ‖A_n‖ < ∞ and that A_n u → 0 for each u in some fundamental
subset of X.

Proof. If A_n x → 0 for all x, then obviously sup_n ‖A_n x‖ < ∞ for all x. Hence
sup_n ‖A_n‖ < ∞, by the Principle of Uniform Boundedness.
For the other half of the theorem, assume that ‖A_n‖ < M for all n and
that A_n u → 0 for all u in a fundamental set F. It is elementary to prove that
A_n y → 0 for all y in the linear span of F. Now let x ∈ X. Let ε > 0. Select y
in the linear span of F so that ‖x − y‖ < ε/2M. Select m so that ‖A_n y‖ < ε/2
whenever n ≥ m. Then for n ≥ m we have

    ‖A_n x‖ ≤ ‖A_n(x − y)‖ + ‖A_n y‖ ≤ M‖x − y‖ + ε/2 < ε    •
Example 3. The Riemann integral of a continuous function x defined on [a, b]
can be obtained as a limit as follows:

    ∫_a^b x(s) ds = lim_{n→∞} Σ_{i=1}^{n} x(a + i(b−a)/n) · (b−a)/n

This suggests that we consider the problem of approximating functionals ψ that
have the form

(1)    ψ(x) = ∫_a^b x(s) w(s) ds        x ∈ C[a,b]

in which w is a fixed integrable function called the weight. We seek to approx-
imate ψ by a sequence of functionals φ_n having the form

(2)    φ_n(x) = Σ_{i=1}^{n} A_{ni} x(s_{ni})        x ∈ C[a,b]

Notice that φ_n is simply a linear combination of point-evaluation functionals.


One can argue with some justification that from the practical, numerical, stand-
point only such functionals are realizable. Other functionals, such as those

involving integrals, must be approximated by the simpler realizable ones. Functionals
of this type were considered in Problem 1.5.19 (page 29), and a result of
that problem is the formula

    ‖φ_n‖ = Σ_{i=1}^{n} |A_{ni}|

Here it is necessary to assume that for each n, {s_{n1}, s_{n2}, ..., s_{nn}} is a set of n
distinct points in [a, b]. We call these points the "nodes" of the functional φ_n.
An old theorem of Szegő, presented next, concerns this example.  •

Theorem 5. Let ψ and φ_n be as in Equations (1) and (2) above.
In order that φ_n(x) → ψ(x) for each x ∈ C[a, b], it is necessary and
sufficient that these two conditions be fulfilled:

(i) sup_n Σ_{i=1}^{n} |A_{ni}| < ∞
(ii) The convergence occurs for all the elementary monomial functions,
     s ↦ s^k, k = 0, 1, 2, ....

Proof. Consider the sequence of functionals [ψ − φ_n]. The norm of ψ is

    ‖ψ‖ = sup_{‖x‖≤1} |∫_a^b x(s) w(s) ds| ≤ ∫_a^b |w(s)| ds

Consequently, condition (i) is equivalent to the condition

    sup_n ‖ψ − φ_n‖ < ∞

Next observe that the functions e_k defined by the equation e_k(s) = s^k, where
k = 0, 1, ..., form a fundamental set in C[a, b], by the Weierstrass Polynomial
Approximation Theorem. Now apply the preceding theorem.  •
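To see Theorem 5 in action, consider the composite trapezoid rule on [a, b] = [0, 1] with weight w ≡ 1: its coefficients A_{ni} are positive and sum to 1, so condition (i) holds trivially, and condition (ii) can be checked on a few monomials. The sketch below is purely illustrative; the choice of rule and of test exponents is ours, not the text's.

```python
import numpy as np

def trapezoid_rule(n):
    """Nodes s_ni and coefficients A_ni of the composite trapezoid rule on [0, 1]."""
    s = np.linspace(0.0, 1.0, n)                  # n distinct nodes
    A = np.full(n, 1.0 / (n - 1))
    A[0] = A[-1] = 0.5 / (n - 1)                  # endpoint weights
    return s, A

def phi_n(x, n):
    s, A = trapezoid_rule(n)
    return np.sum(A * x(s))

# Condition (i): sup_n sum |A_ni| is finite (here every sum equals 1).
print([round(np.abs(trapezoid_rule(n)[1]).sum(), 12) for n in (5, 50, 500)])

# Condition (ii): convergence on the monomials e_k(s) = s^k, whose exact
# integrals against w = 1 are 1/(k+1).
for k in range(4):
    print(k, round(phi_n(lambda s: s**k, 500), 6), "exact", round(1 / (k + 1), 6))
# By Szego's theorem, phi_n(x) then converges to the integral of x for every x in C[0,1].
```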

Problems 1.7

1. Prove the equivalence of these properties of a set U in a normed linear space X:


(a) U intersects each nonempty open set in X
(b) U intersects each open ball in X
(c) The closure of U is X
(d) For each x ∈ X and each ε > 0 there is a point u ∈ U satisfying the inequality
    ‖x − u‖ < ε
(e) The set X \ U contains no open ball.
2. An interesting metric space is obtained by taking any set X and defining d(x, y) to be 1
if x ≠ y and 0 if x = y. In such a metric space identify the open sets, the closed sets,
the convergent sequences, and the compact sets. Also determine whether the closure of
{x : d(x, y) < r} is the set {x : d(x, y) ≤ r}. Is (X, d) complete?
3. Prove that the set of functions in C[0,1] that do not possess a right-derivative at a given
point in [0, 1) is dense.

4. Is every set of the second category the complement of a set of the first category?
5. Prove that in a complete metric space, the complement of a set of the first category is
dense and of second category.
6. Prove that a closed, proper subspace in a normed linear space is nowhere dense (and
hence of first category).
7. Prove that in a Banach space, a subspace of second category must be dense.
8. Prove that in a Banach space every nonempty open set is of the second category. Prove
that this assertion is not true for normed linear spaces in general. (Give an example.)
9. Let [x_n] be a sequence in a Banach space X. Assume that sup_n |φ(x_n)| < ∞ for each
φ ∈ X*. Prove that [x_n] is bounded. Does X have to be complete for this? If so, give a
suitable example.
10. Determine the category of these sets: (a) the rationals in ℝ; (b) the irrationals in ℝ; (c) the
union of all vertical lines in ℝ² that pass through a rational point on the horizontal axis;
(d) the set of all polynomials in C[0,1].
11. Does a homeomorphism (continuous map having a continuous inverse) preserve the cat-
egory of sets?
12. Give an example to show that a homeomorphic image of a complete metric space need
not be complete.
13. Prove that any subset of a set of the first category is also of the first category. Prove
that a set that contains a set of second category is also of second category.
14. Is the closure of a nowhere dense set also nowhere dense? Is the closure of a set of the
first category also of the first category?
15. For each natural number n, let An be a continuous linear transformation of a Banach
space X into a normed linear space Y. Suppose that for each x E X the sequence [An x]
is convergent. Define A by the equation Ax = lim_{n→∞} A_n x. Prove that A is linear and
continuous. Explain why completeness is needed.
16. Let X be the space of real sequences x = [x(1), x(2), ...] in which only a finite number of
terms are nonzero. Give X the supremum norm. Define functionals φ_n by the equation
φ_n(x) = Σ_{i=1}^{n} x(i). Show that the sequence [φ_n(x)] is bounded for each x, that each
φ_n is continuous, but that the sequence [φ_n] is not bounded. (Compare to the Uniform
Boundedness Theorem.)

17. Prove that the set of reals whose decimal expansions do not contain the digit 7 is a set
of the first category.
18. Select a function x₀ ∈ C[0,1] and a sequence of reals [a_n]. Define recursively

n = 0, 1, ...

Assume that for each t ∈ [0,1] there is an n for which x_n(t) = 0. Prove that x₀ = 0.
19. Return to Problem 15, and suppose that Y is complete. Weaken the hypotheses on An
so that An is not necessarily linear and the set of x for which [Anx] converges is of the
second category. Prove that this set must be X and that A is continuous.
20. Let [An] be a sequence of continuous linear maps from one Banach space X to another.
Prove that the set of x for which [Anx] is a Cauchy sequence is either X or a set of first
category.
21. Prove that in a complete metric space a set of the first category has empty interior.
22. Prove that in a complete metric space, if a countable intersection of open sets is dense,
then it is of second category.
23. Give an example of a metric space having countably many points that contains no subset
of second category.

24. Prove that a set V is nowhere dense if and only if each nonempty open set has a nonempty
open subset that lies in the complement of V.
25. (Principle of Condensation of Singularities). For each n and m in ℕ, let A_{nm} be a
bounded linear operator from a Banach space X into a normed linear space Y. Assume
that sup_m ‖A_{nm}‖ = ∞ for each n. Prove that the set

    {x ∈ X : sup_m ‖A_{nm} x‖ = ∞ for each n}

is of second category.
26. (The Cantor Set). This famous set is C = [0,1] \ ⋃_{n=1}^∞ A_n, where
    A₁ = (1/3, 2/3),   A₂ = (1/9, 2/9) ∪ (7/9, 8/9),
    A₃ = (1/27, 2/27) ∪ (7/27, 8/27) ∪ (19/27, 20/27) ∪ (25/27, 26/27),  and so on.

Draw pictures of [0,1] \ A₁, [0,1] \ (A₁ ∪ A₂), and so on to see that we are successively
removing the middle thirds from intervals. Each A_n is open, so ⋃ A_n is open. Hence C
is closed. Prove that C is nowhere dense. Prove that the lengths of the removed intervals
add up to 1. Explain how there can be anything left in C. Prove that C is a "perfect
set," i.e., if x ∈ C, then C \ {x} is not closed.
27. Prove this theorem: Let X be a complete metric space. Let {f_α} be a family of continuous
real-valued maps defined on X. Assume that for each x, sup_α |f_α(x)| < ∞. Then for
some nonvoid open set O, sup_{x∈O} sup_α |f_α(x)| < ∞.
28. Prove that a countable union of sets of the first category is also a set of the first category.
29. Prove that a nowhere dense set is of the first category.
30. Is a countable set in a metric space necessarily a set of the first category?
31. Answer the question in Problem 30 for countable subsets of a normed linear space.
32. Prove that the sets Fn occurring in the proof of the Banach-Steinhaus Theorem are
closed.
33. In a complete metric space, is every nonempty open set of the second category?
34. A metric space (X, d) is said to be discrete if d(x, y) = 1 whenever x ≠ y. In such a
space identify the nowhere dense sets, sets of first category, sets of second category, and
dense sets. (Cf. Problem 2.)
35. Can a normed linear space have any of the peculiar properties of discrete metric spaces?
36. Show that a countable discrete metric space can be embedded isometrically in the Banach
space Co.
37. Give an example of sets S ⊂ F ⊂ X, where X is a complete metric space, F is a closed
set in X, and S is of Category II in F but of Category I in X.
38. The intersection of a countable family of open sets is called a G_δ-set. Prove that the set
of rationals is not a G_δ-set in ℝ.
39. (Continuation) Let f : ℝ → ℝ be continuous. Show that each set f⁻¹(r) is a G_δ-set.
40. (Continuation) Let f : ℝ → ℝ. Define

    ω(x) = inf_{ε>0} sup{ |f(u) − f(v)| : |x − u| < ε, |x − v| < ε }

Prove that ω(x) = 0 for each x at which f is continuous. Prove that for c > 0, the set
{x : ω(x) < c} is open.

41. (Continuation) Prove that there is no function f : ℝ → ℝ that is continuous at each
rational point and discontinuous at each irrational point.

42. Can an infinite-dimensional Banach space have a countable Hamel base?


43. Prove that the complement of a nowhere dense set is dense. What about the converse:
is it true?
44. Let f ∈ C^∞(ℝ). Thus f has derivatives of all orders on ℝ. Suppose that 0 ∈ {f^{(n)}(t) :
n = 0, 1, 2, ...} for each t. Then f is a polynomial.
45. A point x in a metric space is isolated if for some c > 0, the ball of radius c centered at
x contains no point of the space except x. Prove that a complete metric space in which
there are no isolated points is uncountable.
46. If f : ℝ → ℝ, then there is an interval (a, b) and a number M such that each point of
(a, b) is the limit of a sequence [x_n] such that a < x_n < b and |f(x_n)| ≤ M.
47. Let X be a complete metric space and for each n let F_n be a closed set having empty
interior. Prove that ⋃_{n=1}^∞ F_n has empty interior.

48. Prove that a set E in a metric space X (or any topological space) is nowhere dense if
and only if X \ Ē is dense (Ē denoting the closure of E).
49. In a metric space, is a singleton {x} always nowhere dense? Answer the same question
for a normed linear space.
50. Prove that if A is of the second category and B is of the first category, then A " B is of
the second category.
51. Is a countable intersection of sets of the second category necessarily a set of the second
category?
52. A subset of a metric space is called a residual set if its complement is of the first category.
Prove that the intersection of countably many residual sets is a residual set.

1.8 The Interior Mapping and Closed Mapping Theorems

A function f from one normed linear space X to another Y is said to be closed


(or to have a closed graph) if f is closed as a subset of X × Y. Expressed
otherwise, the set
    {(x, f(x)) : x ∈ X}
is a closed set in X × Y. In terms of sequences, the closed property of f is
that the conditions x_n → x and f(x_n) → y imply that y = f(x). It is clear
that a continuous map is closed. For general topological spaces this is still
true if Y is a Hausdorff space ([Ru1], page 29). The outstanding example of a
linear transformation that is closed but not continuous is the derivative operator
D acting on the differentiable functions in C[a, b] and mapping into C[a, b]. If
x_n → x and Dx_n → y, then y = Dx. This is actually a theorem of calculus
([Wid], page 305). Let us stop to prove it. We denote by C¹[a, b] the linear
space of all functions on [a, b] whose derivatives exist and are continuous on
[a, b].

Theorem 1. Let x_n ∈ C¹[a, b], ‖x_n − x‖_∞ → 0, and ‖x_n′ − y‖_∞ → 0.
Then y ∈ C[a, b] and x′ = y.

Proof. Since x_n ∈ C¹[a, b], we have x_n′ ∈ C[a, b]. Thus y ∈ C[a, b], by Theorem
2 in Section 1.2, page 10. By the Fundamental Theorem of Calculus and the
continuity of integration,

    ∫_a^t y(s) ds = ∫_a^t lim_n x_n′(s) ds = lim_n ∫_a^t x_n′(s) ds
                  = lim_n [x_n(t) − x_n(a)] = x(t) − x(a)

Differentiation with respect to t now yields y(t) = x′(t).  •



Of course, in general we may not infer that x_n′ → x′ from the sole hypothesis
that x_n → x, even if x_n ∈ C¹[a, b] and the convergence is uniform. For example,
the sequence x_n(s) = (1/n) sin ns converges (uniformly) to 0, but the sequence
x_n′(s) = cos ns does not converge even pointwise.
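A quick numerical confirmation of this example (illustrative only; the sample grid over [0, π] is an arbitrary choice) shows ‖x_n‖_∞ = 1/n shrinking while ‖x_n′‖_∞ remains 1:

```python
import numpy as np

s = np.linspace(0.0, np.pi, 10001)        # sample points in [a, b] = [0, pi]
for n in (1, 10, 100, 1000):
    x_n  = np.sin(n * s) / n              # converges uniformly to 0
    dx_n = np.cos(n * s)                  # its derivative does not converge at all
    print(n, np.max(np.abs(x_n)), np.max(np.abs(dx_n)))
# The first column of sup-norms tends to 0 while the second stays at 1,
# so x_n -> 0 uniformly although Dx_n = cos(ns) fails to converge even pointwise.
```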
Another property that a mapping f : X → Y may have is being an interior
(or "open") mapping. That means that f maps open sets to open sets.

Theorem 2. The Interior Mapping Theorem. If a closed


linear transformation maps one Banach space onto another, then it is
an interior map.

Proof. Let L : X ↠ Y, where L is linear and closed, and X and Y are Banach
spaces. (This double arrow signifies a surjection.) Let S be the open unit ball
in X or Y, depending on the context. Since L is surjective,

    Y = L(X) = L(⋃_{n=1}^∞ nS) = ⋃_{n=1}^∞ L(nS)

Since Y is complete, the Baire Theorem implies that one of the sets cl[L(nS)]
has a nonempty interior. Suppose, then, that for some m in ℕ and r > 0,

    v + rS ⊂ cl L(mS)

It follows that v ∈ cl L(mS), and hence

    rS ⊂ cl L(mS) − v ⊂ cl L(mS) − cl L(mS) ⊂ cl L(2mS)

Hence S ⊂ cl L(tS) for some t > 0, namely t = 2m/r.


We will now prove that S ⊂ L(2tS). Let y be any point of S. Select a
sequence of positive numbers δ_n such that Σ_{n=1}^∞ δ_n < 1. Since y ∈ cl L(tS),
there is an x₁ in tS such that ‖y − Lx₁‖ < δ₁. Since

    y − Lx₁ ∈ δ₁S ⊂ δ₁ cl L(tS) = cl L(δ₁tS)

there is a point x₂ ∈ δ₁tS such that ‖y − Lx₁ − Lx₂‖ < δ₂. We continue this
construction, obtaining a sequence x₁, x₂, ... whose partial sums z_n = x₁ + ⋯ +
x_n have the property ‖y − Lz_n‖ < δ_n. Also, we have ‖x_{n+1}‖ < tδ_n for each n.

The sequence [z_n] has the Cauchy property because

    ‖z_{n+i} − z_n‖ = ‖x_{n+1} + ⋯ + x_{n+i}‖ < tδ_n + ⋯ + tδ_{n+i−1} < t Σ_{j≥n} δ_j

Since X is complete, z_n → z for some z ∈ X. Clearly, ‖z‖ < 2t, or z ∈ 2tS.
Since L is closed and Lz_n → y, we conclude that y = Lz; thus y ∈ L(2tS) as
claimed.
To complete the proof we show that L(U) is open in Y whenever U is open
in X. Let y be any point in L(U). Then y = Lx for some x ∈ U. Since U is open,
there exists a δ > 0 such that x + δS ⊂ U. Then y + δL(S) ⊂ L(U). By our
previous work, we know that S ⊂ L(2tS). Hence (δ/2t)S ⊂ δL(S) and

    y + (δ/2t)S ⊂ L(U)

Thus L(U) contains a neighborhood of y, and L(U) is open.  •



Corollary 1. If an algebraic isomorphism of one Banach space onto
another is continuous, then its inverse is continuous.

Proof. Let L : X ↠ Y be such a map. (The two-headed arrow denotes a
surjective map. Thus L(X) = Y.) Being continuous, L is closed. By the Interior
Mapping Theorem, L is an interior map. Hence L⁻¹ is continuous. (Recall that
a map f is continuous if f⁻¹ carries open sets to open sets.)  •

Corollary 2. If a linear space can be made into a Banach space


with two norms, one of which dominates the other, then these norms
are equivalent.

Proof. Let X be the space, and N₁, N₂ the two norms. The equivalence of
two norms is explained in Problem 1.4.3, page 23. Let I denote the identity map
acting from (X, N₂) to (X, N₁). Assume that the norms bear the relationship
N₁ ≤ N₂. Since N₁(Ix) ≤ N₂(x), we see that I is continuous. By the preceding
corollary, I⁻¹ is continuous. Hence for some constant c, N₂(x) = N₂(I⁻¹x) ≤ cN₁(x).  •

Theorem 3. The Closed Graph Theorem. A closed linear map


from one Banach space into another is continuous.

Proof. Let L : X → Y be closed and linear. In X, define a new norm N(x) =
‖x‖ + ‖Lx‖. Then (X, N) is complete. Indeed, if [x_n] is a Cauchy sequence with
the norm N, then [x_n] and [Lx_n] are Cauchy sequences with the given norms in
X and Y. Hence x_n → x and Lx_n → y, since X and Y are complete. Since L is
closed, Lx = y, and so

    N(x − x_n) = ‖x − x_n‖ + ‖Lx − Lx_n‖ → 0

By the preceding corollary, N(x) ≤ c‖x‖ for some c. Hence ‖Lx‖ ≤ c‖x‖.  •

Theorem 4. A normed linear space that is the image of a Banach


space by a bounded, linear, interior map is also a Banach space.

Proof. Let L : X ↠ Y be the bounded, linear, interior map. Assume that X
is a Banach space. By Problem 1.2.38 (page 14), it suffices to prove that each
absolutely convergent series in Y is convergent. Let y_n ∈ Y and Σ‖y_n‖ < ∞.
By Problem 2 (of this section), there exist x_n ∈ X such that Lx_n = y_n and (for
some c > 0) ‖x_n‖ ≤ c‖y_n‖. Then Σ‖x_n‖ ≤ c Σ‖y_n‖ < ∞. By Problem 1.2.3,
page 12, the series Σx_n converges. Since L is continuous and linear, L(Σx_n) =
ΣLx_n = Σy_n, and the latter series is convergent.  •
Let L be a bounded linear transformation from one normed linear space,
X, to another, Y. The adjoint of L is the map L* : Y* → X* defined by
L*φ = φ ∘ L. Here φ ranges over Y*. It is elementary to prove that L* is linear.
It is bounded because

    ‖L*‖ = sup_φ ‖L*φ‖ = sup_φ sup_x |(L*φ)(x)|
         = sup_x sup_φ |φ(Lx)| = sup_x ‖Lx‖ = ‖L‖

In this equation φ ranges over functionals of norm 1 in Y*, and x ranges over
vectors of norm 1 in X. We used Corollary 4 on page 36.
In a finite-dimensional setting, an operator L can be represented by a matrix
A (which is not necessarily square). This requires the prior selection of bases
for the domain and range of L. The adjoint operator L* is then represented by
the conjugate transpose matrix A*. An elementary theorem asserts that A is
surjective ("onto") if and only if A* is injective ("one-to-one"). (See Problem 20.)
The situation in an infinite-dimensional space is only slightly more complicated,
as indicated in the next three theorems.
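This elementary finite-dimensional fact is easy to test with a small computation. The snippet below is a hypothetical numerical check (the matrix is arbitrary): surjectivity of A and injectivity of the conjugate transpose A* both reduce to the statement rank(A) = m.

```python
import numpy as np

A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0]])                  # a 2 x 3 matrix, so m = 2, n = 3

m, n = A.shape
A_star = A.conj().T                              # adjoint = conjugate transpose, an n x m matrix

A_surjective     = (np.linalg.matrix_rank(A) == m)       # range of A is all of R^m
A_star_injective = (np.linalg.matrix_rank(A_star) == m)  # null space of A* is {0}

print(A_surjective, A_star_injective)            # True True: the two properties occur together,
                                                 # since rank(A) = rank(A*)
```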

Theorem 5. Let L be a continuous linear transformation from one


normed linear space to another. The range of L is dense if and only if
L * is injective.

Proof. Let L : X → Y. By Theorem 3 in Section 1.6 (page 37), applied
to L(X), we have these equivalent assertions: (1) L(X) is dense in Y. (2)
L(X)⊥ = 0. (3) If φ ∈ L(X)⊥, then φ = 0. (4) If φ(Lx) = 0 for all x, then
φ = 0. (5) If L*φ = 0, then φ = 0. (6) L* is injective.  •

Theorem 6. The Closed Range Theorem. Let L be a bounded
linear transformation defined on a normed linear space and taking val-
ues in another normed linear space. The range of L and the null space
of L*, denoted by N(L*), are related by the fact that [N(L*)]_⊥ is the
closure of the range of L.

Proof. Recall the notation U_⊥ for the set {x ∈ X : φ(x) = 0 for all φ ∈ U},
where X is a normed linear space and U is a subset of X*. (See Problems 1.6.20
and 1.6.21, on page 38, as well as Problem 13 in this section, page 52.) We
denote by R(L) the range of L. To prove [closure R(L)] ⊂ [N(L*)]_⊥, let y be

an element of the set on the left. Then y = lim y_n for some sequence [y_n] in
R(L). Write y_n = Lx_n for appropriate x_n. To show that y ∈ [N(L*)]_⊥ we must
prove that φ(y) = 0 for all φ ∈ N(L*). We have

    φ(y) = φ(lim y_n) = lim φ(y_n) = lim φ(Lx_n)
         = lim (φ ∘ L)(x_n) = lim (L*φ)(x_n) = lim 0 = 0

To prove the reverse inclusion, suppose that y is not in [closure R(L)]. We
shall show that y is not in [N(L*)]_⊥. By Corollary 2 of the Hahn-Banach
Theorem (page 34), there is a continuous linear functional φ such that φ(y) ≠ 0
and φ annihilates each member of [closure R(L)]. It follows that for all x,
(L*φ)(x) = (φ ∘ L)(x) = φ(Lx) = 0. Consequently, φ ∈ N(L*). Since φ(y) ≠ 0,
we conclude that y ∉ [N(L*)]_⊥.  •

Theorem 7. Let L be a continuous, linear, injective map from one
Banach space into another. The range of L is closed if and only if L is
bounded below:  inf_{‖x‖=1} ‖Lx‖ > 0.

Proof. Assume first that ‖Lx‖ ≥ c > 0 when ‖x‖ = 1. By homogeneity,
‖Lx‖ ≥ c‖x‖ for all x. To prove that the range, R(L), is closed, let y_n ∈ R(L)
and y_n → y. It is to be shown that y ∈ R(L). Let y_n = Lx_n. The inequality

    ‖x_n − x_m‖ ≤ c⁻¹‖Lx_n − Lx_m‖ = c⁻¹‖y_n − y_m‖

reveals that [x_n] is a Cauchy sequence. By the completeness of the domain space,
x_n → x for some x. Then, by continuity,

    Lx = L(lim x_n) = lim Lx_n = lim y_n = y

Hence y ∈ R(L).
Now assume that R(L) is closed. Then L maps the domain space X injec-
tively onto the Banach space R(L). By Corollary 1 of the Interior Mapping The-
orem (page 49), L has a continuous inverse. The inequality ‖L⁻¹y‖ ≤ ‖L⁻¹‖ ‖y‖
is equivalent to ‖x‖ ≤ ‖L⁻¹‖ ‖Lx‖, showing that L is bounded below.  •

Problems 1.8

1. Use the notation in the proof of the Interior Mapping Theorem. Show that a linear map
L : X → Y is interior if and only if L(S) ⊃ rS for some r > 0.
2. Show that a linear map L : X ↠ Y is interior if and only if there is a constant c such
that for each y ∈ Y there is an x ∈ X satisfying Lx = y, ‖x‖ ≤ c‖y‖.
3. Define T : c₀ → c₀ by the equation (Tx)(n) = x(n + 1). Which of these properties does
T have: injective, surjective, open, closed, invertible? Does T have either a right or a
left inverse?
4. Prove that a closed (and possibly nonlinear) map of one normed linear space into another
maps compact sets to closed sets.
5. Let L be a linear map from one Banach space into another. Suppose that the conditions
x_n → 0 and Lx_n → y imply that y = 0. Prove that L is continuous.
52 Chapter 1 Normed Linear Spaces

6. Prove that if a closed map has an inverse, then the inverse is also closed.
7. Let M and N be closed linear subspaces in a Banach space. Define L : M × N → M + N
by writing L(x, y) = x + y. Prove that M + N is closed if and only if L is an interior
map.
8. Adopt the hypotheses of Problem 7. Prove that M + N is closed if and only if there is
a constant c such that each z ∈ M + N can be written z = x + y where x ∈ M, y ∈ N,
and ‖x‖ + ‖y‖ ≤ c‖z‖.
9. Let L : X ↠ Y be a continuous linear surjection, where X and Y are Banach spaces.
Let y_n → y in Y. Prove that there exist points x_n ∈ X and a constant c ∈ ℝ such that
Lx_n = y_n, the sequence [x_n] converges, and ‖x_n‖ ≤ c‖y_n‖.
10. Recall the space ℓ defined in Example 8 of Section 1.1. Define L : ℓ → ℓ by (Lx)(n) =
nx(n). Use the sup-norm in ℓ and prove that L is discontinuous, surjective, and closed.
11. Is the identity map from (C[−1,1], ‖·‖_∞) into (C[−1,1], ‖·‖₁) an interior map? Is it
continuous?
12. (Continuation) Denote the two spaces in Problem 11 by X and Y, respectively. Let

Show that G is closed in Y. Define

    g_n(s) = ns         if |s| ≤ 1/n
    g_n(s) = s/|s|      if |s| > 1/n

Show that [g_n] is a Cauchy sequence in Y. Since the space L¹[−1,1] is complete, g_n → g
in L¹. Since G is closed, g should be in G. But it is discontinuous. Explain.
13. Let X be a normed linear space, and let K ⊂ X and U ⊂ X*. Define

    K⊥ = {φ ∈ X* : φ(x) = 0 for all x ∈ K}

    U_⊥ = {x ∈ X : φ(x) = 0 for all φ ∈ U}

Prove that these are closed subspaces in X* and X, respectively.
14. Prove that for any subset K in a normed linear space, (K⊥)_⊥ is the closure of the linear
span of K. The Hahn-Banach Theorem can be used as in the proof of the Closed Range
Theorem. Problem 13 will also be helpful.
15. Prove that if L is a linear operator having closed range and acting between normed linear
spaces, then the equation Lx = y is solvable for x if and only if y ∈ [N(L*)]_⊥.
16. Prove that if L is a bounded linear operator from one normed space into another, and
if ‖Lx‖/dist(x, N(L)) is bounded away from 0 when ‖x‖ = 1, then the conclusion of
Problem 15 is again valid.
17. Let T be a linear map of a Banach space X into itself. Suppose that there exists a
continuous, linear, one-to-one map L : X --+ X such that LT is continuous. Does it
follow that T is continuous?
18. Define an operator L by the equation

Describe the range of L and prove that it does not contain the function t ↦ e^t.
19. (Continuation) Draw the same conclusion as in Problem 18 by invoking the Closed Range
Theorem. Thus, find φ in the null space of L* such that φ(f) ≠ 0.

20. For an m × n matrix A prove the equivalence of these assertions: (a) A* is injective. (b)
The null space of A* is 0. (c) The columns of A* form a linearly independent set. (d)
The rows of A form a linearly independent set. (e) The row space of A has dimension
m. (f) The column space of A has dimension m. (g) The column space of A is ℝ^m. (h)
The range of A is ℝ^m. (i) A is surjective, as a map from ℝ^n to ℝ^m.

1.9 Weak Convergence

A sequence [x_n] in a normed linear space X is said to converge weakly to an
element x if φ(x_n) → φ(x) for every φ in X*. (Sometimes we write x_n ⇀ x.)
The usual type of convergence can be termed norm convergence or strong
convergence. It refers, of course, to ‖x_n − x‖ → 0. Clearly, if x_n → x, then
x_n ⇀ x, because each φ in X* is continuous. This observation justifies the
terms "strong" and "weak."
Example 1. For an example of a sequence that converges weakly to zero yet
does not converge strongly to any point, consider the vectors e_n in c₀ defined by
e_n(i) = δ_{in}. These are the "standard unit vectors" in the space c₀. (This space
was defined in Problem 1.2.16, on page 12.) Recall from Section 1.6, particularly
the proposition on page 34, that every continuous linear functional on c₀ is of
the form

    φ(x) = Σ_{i=1}^∞ a(i) x(i)

for a suitable point a ∈ ℓ₁. Thus φ(e_n) = a(n) → 0. The sequence [e_n] does not
have the Cauchy property, because ‖e_n − e_m‖ = 1 when n ≠ m.  •

Lemma. A weakly convergent sequence is bounded.

Proof. Let X be the ambient space, and suppose that x_n ⇀ x. Define func-
tionals x̂_n on X* by putting

    x̂_n(φ) = φ(x_n)        (φ ∈ X*)

For each φ, the sequence [φ(x_n)] converges in ℝ; hence it is bounded. Thus
sup_n |x̂_n(φ)| < ∞. By the Uniform Boundedness Theorem (page 42), applied in
the complete space X*, ‖x̂_n‖ ≤ M for some constant M. Hence, for all n,

    sup{|x̂_n(φ)| : φ ∈ X*, ‖φ‖ ≤ 1} ≤ M

By Corollary 4 of the Hahn-Banach Theorem (page 36), ‖x_n‖ ≤ M.  •

Theorem 1. In a finite-dimensional normed linear space, weak and


strong convergence coincide.

Proof. Let X be a k-dimensional space. Select a base {b₁, ..., b_k} for X and
let φ₁, ..., φ_k be the linear functionals such that for each x,

    x = Σ_{i=1}^{k} φ_i(x) b_i

By Corollary 1 on page 26, each functional φ_i is continuous. Now if x_n ⇀ x,
then we have φ_i(x_n) → φ_i(x), and consequently,

    ‖x − x_n‖ = ‖Σ_{i=1}^{k} φ_i(x − x_n) b_i‖ ≤ Σ_{i=1}^{k} |φ_i(x − x_n)| ‖b_i‖ → 0    •
Theorem 2. If a sequence [xnJ in a normed linear space converges
weakly to an element x, then a sequence of linear combinations of the
elements Xn converges strongly to x.

Proof. Another way of stating the conclusion is that x belongs to the closed
subspace
    Y = closure (span{x₁, x₂, ...})
If x ∉ Y, then by Corollary 2 of the Hahn-Banach Theorem (page 34), there is
a continuous linear functional φ such that φ ∈ Y⊥ and φ(x) = 1. This clearly
contradicts the assumption that x_n ⇀ x.  •

A refinement of this theorem states that a sequence of convex linear combi-
nations of {x₁, x₂, ...} converges strongly to x. This can be proved with the aid
of a separation theorem, such as Theorem 3 in Section 7.3, page 344.

Theorem 3. If the sequence [x₀, x₁, x₂, ...] is bounded in a normed
linear space X and if φ(x_n) → φ(x₀) for all φ in a fundamental subset
of X*, then x_n ⇀ x₀.

Proof. (The term "fundamental" was defined in Section 1.6, page 36.) Let
F be the fundamental subset of X* mentioned in the theorem. Let ψ be any
member of X*. We want to prove that ψ(x_n) → ψ(x₀). By hypothesis, there is a
constant M such that ‖x_i‖ < M for i = 0, 1, 2, .... Given ε > 0, select φ₁, ..., φ_m
in F and scalars λ₁, ..., λ_m such that

    ‖ψ − Σ_{i=1}^{m} λ_i φ_i‖ < ε/(3M)

Put φ = Σ λ_i φ_i. It is easily seen that φ(x_n) → φ(x₀). Select N so that for all
n > N we have the inequality |φ(x_n) − φ(x₀)| < ε/3. Then for n > N,

    |ψ(x_n) − ψ(x₀)| ≤ |ψ(x_n) − φ(x_n)| + |φ(x_n) − φ(x₀)| + |φ(x₀) − ψ(x₀)|
                     ≤ ‖ψ − φ‖ ‖x_n‖ + ε/3 + ‖φ − ψ‖ ‖x₀‖
                     ≤ (ε/3M)·M + ε/3 + (ε/3M)·M = ε    •

Example 2. Fix a real number p in the range 1 ≤ p < ∞. The space ℓ_p is
defined to be the set of all real sequences x for which Σ_{n=1}^∞ |x(n)|^p < ∞. We
define a norm on the vector space ℓ_p by the equation

    ‖x‖_p = (Σ_{n=1}^∞ |x(n)|^p)^{1/p}

For p = ∞, we take ℓ_∞ to be the space of bounded sequences, with norm
‖x‖_∞ = sup_n |x(n)|. We shall outline some of the theory of these spaces. (This
theory is actually included in the theory of the L^p spaces as given in Chapter
8.) Notice that in these spaces there is a natural partial order: x ≤ y means
that x(n) ≤ y(n) for all n. We also define |x| by the equation |x|(n) = |x(n)|.  •

Hölder Inequality. Let 1 < p < ∞, 1/p + 1/q = 1, x ∈ ℓ_p, and
y ∈ ℓ_q. Then

    Σ_{n=1}^∞ x(n) y(n) ≤ ‖x‖_p ‖y‖_q

Minkowski Inequality. If x and y are two members of ℓ_p, then
‖x + y‖_p ≤ ‖x‖_p + ‖y‖_p.

Proof. For p = 1 an elementary proof goes as follows:

    ‖x + y‖₁ = Σ |x(n) + y(n)| ≤ Σ |x(n)| + Σ |y(n)| = ‖x‖₁ + ‖y‖₁

Now assume 1 < p < ∞. Then

    Σ |x(n) + y(n)|^p ≤ Σ {|x(n)| + |y(n)|}^p
                      ≤ Σ {2 max[|x(n)|, |y(n)|]}^p
                      = Σ 2^p max{|x(n)|^p, |y(n)|^p}
                      ≤ 2^p Σ {|x(n)|^p + |y(n)|^p} < ∞

This proves that x + y ∈ ℓ_p. Now let 1/p + 1/q = 1 and observe that |x + y|^{p−1} ∈ ℓ_q,
because

    Σ (|x(n) + y(n)|^{p−1})^q = Σ |x(n) + y(n)|^p < ∞        (since (p − 1)q = p)

Therefore, by the Hölder inequality,

    ‖x + y‖_p^p = Σ |x(n) + y(n)|^p
                ≤ Σ |x(n) + y(n)|^{p−1} |x(n)| + Σ |x(n) + y(n)|^{p−1} |y(n)|
                ≤ ‖ |x + y|^{p−1} ‖_q {‖x‖_p + ‖y‖_p}
                = ‖x + y‖_p^{p/q} {‖x‖_p + ‖y‖_p}

Thus, finally, ‖x + y‖_p ≤ ‖x‖_p + ‖y‖_p, since p − p/q = 1.  •
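Both inequalities are easy to test numerically. The snippet below is a simple sanity check on randomly generated finite sequences (which, extended by zeros, lie in every ℓ_p); the exponent p and vector length are arbitrary choices of ours.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 3.0
q = p / (p - 1.0)                        # conjugate exponent: 1/p + 1/q = 1

def norm(z, r):
    return np.sum(np.abs(z) ** r) ** (1.0 / r)

x = rng.standard_normal(1000)            # finitely supported real sequences
y = rng.standard_normal(1000)

holder    = np.sum(x * y) <= norm(x, p) * norm(y, q)       # Holder inequality
minkowski = norm(x + y, p) <= norm(x, p) + norm(y, p)      # Minkowski inequality
print(holder, minkowski)                 # True True
```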

Some theorems about these spaces are given here without proof.

Theorem 4. The conjugate of ℓ_p is isometrically isomorphic to
ℓ_q, where p⁻¹ + q⁻¹ = 1. (Here 1 ≤ p < ∞.) The isomorphism pairs
each element φ in ℓ_p* with the unique element y in ℓ_q such that φ(x) =
Σ_k x(k) y(k).

Theorem 5. Let x and x_n be in ℓ_p, where 1 < p < ∞. We have x_n ⇀ x if and only if
‖x_n‖_p is bounded and x_n(k) → x(k) for each k.

Theorem 6. Let S be a compact Hausdorff space, and suppose
x, x_n ∈ C(S). We have x_n ⇀ x if and only if ‖x_n‖_∞ is bounded and
x_n(s) → x(s) for each s ∈ S.

Theorem 7. (Schur's Lemma) In the space ℓ₁, the concepts of weak
and strong convergence of sequences coincide.

A subset F in a normed linear space X is said to be weakly sequentially
closed if the weak limit of any weakly convergent sequence in F is also in F. A
weakly sequentially closed set F is necessarily closed in the norm topology, for
if x_n ∈ F and x_n → x, then x_n ⇀ x ∈ F. (A simple example of a closed set that
is not weakly sequentially closed is the surface of the unit ball in the space c₀.)

Theorem 8. A subspace of a normed linear space is closed if and


only if it is weakly sequentially closed.

Proof. Let Y be a weakly sequentially closed subspace in the normed space
X. If y_n ∈ Y and y_n → y, then y_n ⇀ y and y ∈ Y. Hence Y is norm-closed.
For the converse, suppose that Y is norm-closed, and let y_n ∈ Y, y_n ⇀ y.
If y ∉ Y, then (because Y is closed) we have dist(y, Y) > 0. By Corollary 2 of
the Hahn-Banach Theorem (page 34) there is a functional φ ∈ Y⊥ such that
φ(y) = 1. Hence φ(y_n) does not converge to φ(y), contradicting the assumed
weak convergence.  •

A refinement of this theorem states that a convex set is closed if and only
if it is weakly sequentially closed. See [DS], page 422.

Theorem 9. A linear continuous mapping between normed spaces


is weakly sequentially continuous.

Proof. Let A : X → Y be linear and norm-continuous. In order to prove
that A is weakly sequentially continuous, let x_n ⇀ x. For all φ ∈ Y*, φ ∘ A ∈ X*. Hence
φ(Ax_n − Ax) → 0 for all φ ∈ Y*.  •

In a conjugate space X*, the concept of weak convergence is also available.
Thus φ_n ⇀ φ if and only if F(φ_n) → F(φ) for each F ∈ X**. There is another
type of convergence, called weak* convergence. We say that [φ_n] converges to φ
in the weak* sense if φ_n(x) → φ(x) for all x ∈ X.

Theorem 10. Let X be a separable normed linear space, and [φ_n]
a bounded sequence in X*. Then there is a subsequence [φ_{n_i}] that
converges in the weak* sense to an element of X*.

Proof. Since X is separable, it contains a countable dense set, {x₁, x₂, ...}.
Since [φ_n] is bounded, so is the sequence [φ_n(x₁)]. We can therefore find an
increasing sequence ℕ₁ ⊂ ℕ such that lim_{n∈ℕ₁} φ_n(x₁) exists. By the same rea-
soning there is an increasing sequence ℕ₂ ⊂ ℕ₁ such that lim_{n∈ℕ₂} φ_n(x₂) exists.
Continuing in this way, we generate sequences

    ℕ ⊃ ℕ₁ ⊃ ℕ₂ ⊃ ℕ₃ ⊃ ⋯

Now use the Cantor diagonalization process: Define n_i to be the ith element
of ℕ_i. We claim that lim_{i→∞} φ_{n_i}(x_k) exists for each k. This is true because
lim_{n∈ℕ_k} φ_n(x_k) exists by construction, and if i ≥ k, then n_i ∈ ℕ_i ⊂ ℕ_k. For any
x ∈ X we write (with M a bound on the norms ‖φ_n‖)

    |φ_{n_i}(x) − φ_{n_j}(x)| ≤ M‖x − x_k‖ + |φ_{n_i}(x_k) − φ_{n_j}(x_k)| + M‖x_k − x‖

This inequality shows that [φ_{n_i}(x)] has the Cauchy property in ℝ for each x ∈
X. Hence it converges to something that we may denote by φ(x). Standard
arguments show that φ ∈ X*.  •
For Schur's Lemma, see [HP], page 37, or [Ban], page 137. The original
source is [Schu]. See also [Jam], page 288, or [Hol], page 149.

Problems 1.9

1. Show that the Hölder Inequality remains true if we replace the left-hand side by
Σ |x(n)| |y(n)|.
2. If 1 < p < q, what inclusion relation exists between ℓ_p and ℓ_q?
3. Prove that if x_n ⇀ x and ‖x_n‖ ≤ c, then ‖x‖ ≤ c. Why can we not conclude that
‖x‖ = c? Give examples. Explain in terms of weak continuity and weak semicontinuity
of the norm.
4. Fix p > 1 and define a nonlinear map T on ℓ_p by the equation Tx = |x|^{p−1} sgn(x).
Thus, (Tx)(n) = |x(n)|^{p−1} sgn(x(n)) for all n. Prove that T maps ℓ_p into ℓ_q, where
1/p + 1/q = 1. Then determine whether T is surjective.
5. Prove this theorem: In order that a sequence [x_n] in a normed linear space X converge
weakly to an element x it is necessary and sufficient that the sequence be bounded and
that φ(x_n) → φ(x) for all functionals φ in a set that is dense on the surface of the unit
ball in X*.
6. Prove this characterization of weak convergence in the space c₀: In order that a sequence
x_n converge weakly to an element x in the space c₀ it is necessary and sufficient that the
sequence be bounded and that (for each i) we have lim_{n→∞} x_n(i) = x(i).
7. A Banach space X is said to be weakly complete if every sequence [x_n] such that φ(x_n)
converges for each φ in X* must converge weakly to an element x in X. Prove that the
space c₀ is not weakly complete.

1.10 Reflexive Spaces

Let X be a Banach space. It is possible to embed X isomorphically and iso-
metrically as a subspace of X**. There may be many ways to do this, but one
embedding is called the natural or canonical embedding, denoted by J. Thus
J : X → X**, and its definition is

    (Jx)(φ) = φ(x)        (φ ∈ X*,  x ∈ X)

The reader may wish to pause and prove that J is a linear isometry.
For an example of this embedding, let X = c₀; then X* = ℓ₁ and X** = ℓ∞.
In this case, J : c₀ → ℓ∞, and J can be interpreted as the identity embedding,
since φ(x) = Σ_{n=1}^∞ u(n) x(n) for an appropriate u ∈ ℓ₁.
If the natural map of X into X** is surjective, we say that X is reflexive.
Thus if X is reflexive, it is isometrically isomorphic to X**. The converse is false,
however. A famous example of R.C. James exhibits an X that is isometrically
isomorphic to X**, but the isometry is not the canonical map J, and indeed the
canonical image J(X) is a proper subspace of X** in the example. See [Ja2].

Theorem 1. Each space ℓ_p, where 1 < p < ∞, is reflexive.

Proof. If p⁻¹ + q⁻¹ = 1, then ℓ_p* = ℓ_q and ℓ_q* = ℓ_p by Theorem 4 of Section
1.9, page 56. Hence ℓ_p** = ℓ_p. But we must be sure that the isometry involved
in this statement is the natural one, J. Let A : ℓ_p → ℓ_q* and B : ℓ_q → ℓ_p* be
the isometries that have already been discussed in a previous section. Thus, for
example, if x ∈ ℓ_p then Ax is the functional on ℓ_q defined by

    (Ax)(y) = Σ_{n=1}^∞ x(n) y(n)

Define B* : ℓ_p** → ℓ_q* by the equation

    B*φ = φ ∘ B

One of the problems asks for a proof of the fact that B* is an isometric isomor-
phism of ℓ_p** onto ℓ_q*. Thus B*⁻¹A is an isometric isomorphism of ℓ_p onto ℓ_p**.
Now we wonder whether B*⁻¹A = J. Equivalent questions are these:

    B*⁻¹Ax = Jx              (x ∈ ℓ_p)
    Ax = B*Jx                (x ∈ ℓ_p)
    (Ax)(y) = (B*Jx)(y)      (x ∈ ℓ_p, y ∈ ℓ_q)
    (Ax)(y) = (Jx)(By)       (x ∈ ℓ_p, y ∈ ℓ_q)
    (Ax)(y) = (By)(x)        (x ∈ ℓ_p, y ∈ ℓ_q)

The final assertion is true because both sides of the equation are by definition
Σ_{n=1}^∞ x(n) y(n).  •

Theorem 2. A closed linear subspace in a reflexive Banach space


is reflexive.

Proof. Let Y be a closed subspace in a reflexive Banach space X. Let J : X ↠
X** be the natural map. Define R : X* → Y* by the equation Rφ = φ|Y. (This
is the restriction map.) Let f ∈ Y**. Define y = J⁻¹(f ∘ R). We claim that
y ∈ Y. Suppose that y ∉ Y. By a corollary of the Hahn-Banach Theorem, there
exists φ ∈ X* such that φ(y) ≠ 0 and φ(Y) = 0. Then it will follow that Rφ = 0
and that φ(y) = (Jy)(φ) = (f ∘ R)(φ) = f(Rφ) = 0, a contradiction. Next we claim
that for all ψ ∈ Y*, f(ψ) = ψ(y). Let ψ̃ be a Hahn-Banach extension of ψ in
X*. Then ψ = Rψ̃ and f(ψ) = f(Rψ̃) = (f ∘ R)(ψ̃) = (Jy)(ψ̃) = ψ̃(y) = ψ(y).  •

Theorem 3. A Banach space is reflexive if and only if its conjugate


space is reflexive.

Proof. Let X be reflexive. Then the natural embedding J : X → X** is
surjective. Let Φ ∈ X***, and define φ ∈ X* by the equation φ = Φ ∘ J. Then
for arbitrary f ∈ X** we have f = Jx for some x, and consequently,

    f(φ) = (Jx)(φ) = φ(x) = (Φ ∘ J)(x) = Φ(Jx) = Φ(f)

Thus Φ is the image of φ under the natural map of X* into X***. This natural
map is therefore surjective, and X* is reflexive.
For the converse, suppose that X* is reflexive. By what we just proved,
X** is reflexive. But J(X) is a closed subspace in X**, and by the preceding
theorem, J(X) is reflexive. Hence X is reflexive (being isometrically isomorphic
to J(X)).  •

Eberlein-Smulyan Theorem. A Banach space is reflexive if and
only if its unit ball is weakly sequentially compact.

Proof. (Partial) Let X be reflexive, S its unit ball, and [y_n] a sequence in
S. We wish to extract a subsequence [y_{n_i}] such that y_{n_i} ⇀ y ∈ S. To start,
let Y be the closure of the linear span of {y₁, y₂, ...}. Then Y is a closed and
separable subspace of X. By Theorem 2, Y is reflexive, and so Y = Y**. Since
Y** is separable, so is Y*. Let {ψ₁, ψ₂, ...} be a countable dense set in Y*. Since
[ψ₁(y_n)] is bounded, there exists an infinite set ℕ₁ ⊂ ℕ such that lim_{n∈ℕ₁} ψ₁(y_n)
exists. Proceeding as we did in the proof of Theorem 10, Section 1.9, page 57, we
find a subsequence [y_{n_i}] such that ψ(y_{n_i}) converges for all ψ ∈ Y*. By a corollary
of the uniform boundedness theorem, there is an element f of Y** such that
ψ(y_{n_i}) → f(ψ) for all ψ ∈ Y*. Since Y is reflexive, f(ψ) = ψ(y) for some y ∈ Y.
Hence ψ(y_{n_i}) → ψ(y) for all ψ ∈ Y*. Now if φ ∈ X*, then φ|Y ∈ Y*. Hence

    φ(y_{n_i}) = (φ|Y)(y_{n_i}) → (φ|Y)(y) = φ(y)

Thus y_{n_i} ⇀ y. By a corollary of the Hahn-Banach Theorem, ‖y‖ ≤ 1.

The converse is more difficult, and we do not give the proof. See [Yo], page
141, or [Tay2], page 230.  •

Theorem of James. A Banach space X is reflexive if and only


if each continuous linear functional on X attains its supremum on the
unit ball of X.

Proof. (Partial) Suppose that X is reflexive. Let φ ∈ X*, and select x_n ∈ X
such that ‖x_n‖ ≤ 1 and φ(x_n) → ‖φ‖. By the Eberlein-Smulyan Theorem, there
is a subsequence [x_{n_i}] that converges weakly to a point x satisfying ‖x‖ ≤ 1. By
the definition of weak convergence,

    φ(x) = lim_i φ(x_{n_i}) = ‖φ‖

so φ attains its supremum on the unit ball at x.
The converse is more difficult, and we refer the reader to [Hol], page 157.  •

One application of the second conjugate space occurs in the process of com-
pletion. If X is a normed linear space that is not complete, can we embed it
linearly and isometrically as a dense set in a Banach space? If so, such a Banach
space is termed a completion of X. The Cantor method of completion of a
metric space is fully discussed in [KF]. The idea of that method is to create a
new metric space whose elements are Cauchy sequences in the original metric
space.
If X is a normed linear space, we can embed it, using the natural map
J, into its second conjugate space X**. The latter is automatically complete.
Hence the closure of J(X) can be regarded as a completion of X. It can be proved that all
completions of X are isometrically isomorphic to each other.
The Lebesgue spaces L_p[a, b] can be defined without knowing anything
about Lebesgue measure or integration. Here is how to do this. Consider the
space C[a, b] of all continuous real-valued functions on the interval [a, b]. For
1 ≤ p < ∞, we introduce the norm

    ‖x‖_p = (∫_a^b |x(s)|^p ds)^{1/p}

In this equation, the integration is with respect to the Riemann integral. The
space C[a, b], endowed with this norm, is denoted by C_p[a, b]. It is not complete.
Its completion is L_p[a, b]. Thus if J is the natural map of C_p[a, b] into its second
conjugate space, then L_p[a, b] can be identified with the closure of J(C_p[a, b]) in C_p[a, b]**.

Problems 1.10

1. Use the fact that c₀* = ℓ₁ and ℓ₁* = ℓ∞ to prove that the successive conjugate spaces of
c₀ are all nonreflexive.

2. Find a sequence in the unit ball of c₀ that has no weakly convergent subsequence.
Chapter 2

Hilbert Spaces

2.1 Geometry 61
2.2 Orthogonality and Bases 70
2.3 Linear Functionals and Operators 81
2.4 Spectral Theory 91
2.5 Sturm-Liouville Theory 105

2.1 Geometry

Hilbert spaces are a special type of Banach space. In fact, the distinguishing
characteristic is that the Parallelogram Law is assumed to hold:

    ‖x + y‖² + ‖x − y‖² = 2‖x‖² + 2‖y‖²

This succinct description gives no hint of the manifold implications of that as-
sumption. The additional structure available in a Hilbert space makes it the
preferred domain for much of applied mathematics! We pursue a more tradi-
tional approach to the subject, not basing everything on the Parallelogram Law,
but using ideas that are undoubtedly already familiar to the reader, in particular
the dot product or inner product of vectors. An inner-product space is a
vector space X over the complex field in which an inner product (x, y) has been
defined. We require these properties, for all x, y, and z in X:
(1) (x, y) is a complex number
(2) (x, y) is the complex conjugate of (y, x)
(3) (αx, y) = α(x, y),  α ∈ ℂ
(4) (x, x) > 0 if x ≠ 0
(5) (x + y, z) = (x, z) + (y, z)
The term "pre-Hilbert space" is also used for an inner-product space. Occa-
sionally, we will employ real inner-product spaces and real Hilbert spaces. For
them, the scalar field is ℝ, and the inner product is real-valued. However, some
theorems to be proved later are valid only in the complex case.


Example 1. Let X = ℂⁿ (the set of all complex n-tuples). If two points
are given in ℂⁿ, say x = [x(1), x(2), ..., x(n)] and y = [y(1), y(2), ..., y(n)], let
(x, y) = Σ_{i=1}^{n} x(i) ȳ(i), the bar denoting complex conjugation.  •
Example 2. Let X be the set of all complex-valued continuous functions
defined on [0,1]. For x and y in X, define (x, y) = ∫₀¹ x(t) ȳ(t) dt.  •
In any inner-product space it is easy to prove that

    (x + y, x + y) = (x, x) + 2ℜ(x, y) + (y, y)        (ℜ = "real part")
    (x, αy) = ᾱ(x, y)
    (x, y + z) = (x, y) + (x, z)
    (Σ_{i=1}^{n} x_i, y) = Σ_{i=1}^{n} (x_i, y)

In an inner-product space, we define the norm of an element x to be ‖x‖ =
√(x, x).

Theorem 1. The norm has these properties:

a. ‖x‖ > 0 if x ≠ 0
b. ‖αx‖ = |α| ‖x‖   (α ∈ ℂ)
c. |(x, y)| ≤ ‖x‖ ‖y‖                             Cauchy-Schwarz Inequality
d. ‖x + y‖ ≤ ‖x‖ + ‖y‖                            Triangle Inequality
e. ‖x + y‖² + ‖x − y‖² = 2‖x‖² + 2‖y‖²            Parallelogram Equality
f. If (x, y) = 0, then ‖x + y‖² = ‖x‖² + ‖y‖²     Pythagorean Law

Proof. Only c and d offer any difficulty. For c, let ‖y‖ = 1 and write

    0 ≤ (x − λy, x − λy) = (x, x) − λ̄(x, y) − λ(y, x) + |λ|²(y, y)

Now let λ = (x, y) to get 0 ≤ ‖x‖² − |(x, y)|². This establishes c in the case
‖y‖ = 1. By homogeneity, this suffices. To prove d, we use c as follows:

    ‖x + y‖² = (x + y, x + y) = (x, x) + (y, x) + (x, y) + (y, y)
             = ‖x‖² + 2ℜ(x, y) + ‖y‖² ≤ ‖x‖² + 2|(x, y)| + ‖y‖²
             ≤ ‖x‖² + 2‖x‖ ‖y‖ + ‖y‖² = (‖x‖ + ‖y‖)²    •


Item e in Theorem 1 is called the Parallelogram Equality (or "Law") because it
states that the sum of the squares of the four sides of a parallelogram is equal
to the sum of the squares of the two diagonals.

Lemma. In an inner-product space:

a. x = 0 if and only if (x, v) = 0 for all v
b. x = y if and only if (x, v) = (y, v) for all v
c. ‖x‖ = sup{|(x, v)| : ‖v‖ = 1}

Proof. If x = 0, then (x, v) = 0 for all v by Axiom 3 for the inner product. If
(x, v) = 0 for all v, then (x, x) = 0, and so x = 0 by Axiom 4. The condition
x = y is equivalent to x − y = 0, to (x − y, v) = 0 for all v, and to (x, v) = (y, v)
for all v. If ‖v‖ = 1, then by the Cauchy-Schwarz Inequality, |(x, v)| ≤ ‖x‖. If
x = 0, then ‖x‖ ≤ |(x, v)| for all v. If x ≠ 0, let v = x/‖x‖. Then ‖v‖ = 1 and
(x, v) = ‖x‖.  •
Definition. A Hilbert space is a complete inner-product space.
Recall the definition of completeness from Section 1.2 (page 10): It means
that every Cauchy sequence in the space converges to an element of the space.
Example 3. The space of complex-valued continuous functions on [0,1] fur-
nished with the inner product

    (x, y) = ∫₀¹ x(t) ȳ(t) dt

is not complete. Consider the sequence shown in Figure 2.1. The sequence
of functions has the Cauchy property, but does not converge to a continuous
function.  •

Figure 2.1

Example 4. We write L²[a, b] for the set of all complex-valued Lebesgue
measurable functions x on [a, b] such that

    ∫_a^b |x(t)|² dt < ∞

(The concept of measurability is explained in Chapter 8, Section 4, page 394.)
In L²[a, b], put (x, y) = ∫_a^b x(t) ȳ(t) dt. This space is a Hilbert space, a fact known
as the Riesz-Fischer Theorem (1906). See Chapter 8, Section 7, page 411 for
the proof. This space contains many functions that have singularities. Thus,
the function t ↦ t^{−1/3} belongs to L²[0,1], but t ↦ t^{−2/3} does not.  •
In L²[a, b], two functions f and g are regarded as equivalent if they differ
only on a set of measure zero. Refer to Chapter 8 for an extended treatment
of these matters. A set of measure 0 is easily described: For any ε > 0 we can

cover the given set with a sequence of open intervals (a_n, b_n) whose total length
satisfies Σ_n (b_n − a_n) < ε. An important consequence is that if f is an element
of L², then f(x) is meaningless! Indeed, f stands for an equivalence class of
functions that can differ from each other at the point x, or indeed on any set of
points having measure 0. When f(x) appears under an integral sign, remember
that the x is dispensable: The integration operates on the function as a whole,
and no particular values f(x) are involved.
Example 5. Let (S, A, μ) be any measure space. The notation L²(S) then
denotes the space of measurable complex functions on S such that ∫ |f(s)|² dμ <
∞. In L²(S), define (f, g) = ∫ f(s) ḡ(s) dμ. Then L²(S) is a Hilbert space. See
Theorem 3 in Section 8.7, page 411.  •
Example 6. The space ℓ₂ (or ℓ²) consists of all complex sequences x =
[x(1), x(2), ...] such that Σ |x(n)|² < ∞. The inner product is (x, y) =
Σ x(n) ȳ(n). This is a Hilbert space, in fact a special case of Example 5. Just
take S = ℕ and use "counting" measure. (This is the measure that assigns to
a set the number of elements in that set.) This example is also included in the
general theory of the spaces ℓ_p, as outlined in Section 1.9, pages 54-56.  •

Theorem 2. If K is a closed, convex, nonvoid set in a Hilbert space


X, then to each x in X there corresponds a unique point y in K closest
to x; that is,
Ilx - YII = dist(x, K) := inf{ Ilx - vii: v E K}
Proof. Put δ = dist(x, K), and select y_n ∈ K so that ‖x − y_n‖ → δ. Notice
that ½(y_n + y_m) ∈ K by the convexity of K. Hence ‖½(y_n + y_m) − x‖ ≥ δ. By
the Parallelogram Law,

    ‖y_n − y_m‖² = ‖(y_m − x) − (y_n − x)‖²
                 = 2‖y_n − x‖² + 2‖y_m − x‖² − ‖y_n + y_m − 2x‖²
                 = 2‖y_n − x‖² + 2‖y_m − x‖² − 4‖½(y_n + y_m) − x‖²
                 ≤ 2‖y_n − x‖² + 2‖y_m − x‖² − 4δ² → 0

This shows that [y_n] is a Cauchy sequence. Hence y_n → y for some y ∈ X. Since
K is closed, y ∈ K. By continuity,

    ‖x − y‖ = lim_n ‖x − y_n‖ = δ

For the uniqueness of the point y, suppose that y₁ and y₂ are points in K of
distance δ from x. By the previous calculation we have

    ‖y₁ − y₂‖² ≤ 2δ² + 2δ² − 4δ² = 0    •

In an inner-product space, the notion of orthogonality is important. If
(x, y) = 0, we say that the points x and y are orthogonal to each other, and we
write x ⊥ y. (We do not say that the points are orthogonal, but we could say
that the pair of points is orthogonal.) If Y is a set, the notation x ⊥ Y signifies
that x ⊥ y for all y ∈ Y. If U and V are sets, U ⊥ V means that u ⊥ v for all
u ∈ U and all v ∈ V.

Theorem 3. Let Y be a subspace in an inner-product space X. Let


x E X and y E Y. These are equivalent assertions:
a. x − y ⊥ Y, i.e., (x − y, v) = 0 for all v ∈ Y.
b. y is the unique point of Y closest to x.

Proof. If a is true, then for any u ∈ Y we have

    ‖x − u‖² = ‖(x − y) + (y − u)‖² = ‖x − y‖² + ‖y − u‖² ≥ ‖x − y‖²

with equality only when u = y. Here we used the Pythagorean Law (part f of Theorem 1).

Now suppose that b is true. Let u be any point of Y and let λ be any scalar.
Then (because y is the point closest to x)

    ‖x − y‖² ≤ ‖x − y − λu‖² = ‖x − y‖² − 2ℜ{λ̄(x − y, u)} + |λ|²‖u‖²

Hence
    2ℜ{λ̄(x − y, u)} ≤ |λ|²‖u‖²
If (x − y, u) ≠ 0, then u ≠ 0 and we can put λ = (x − y, u)/‖u‖² to get a
contradiction:

    2|(x − y, u)|²/‖u‖² ≤ |(x − y, u)|²/‖u‖²    •
Definition. The orthogonal complement of a subset Y in an inner-product
space X is

    Y⊥ = {x ∈ X : (x, y) = 0 for all y ∈ Y}

Theorem 4. If Y is a closed subspace of a Hilbert space X, then
X = Y ⊕ Y⊥.

Proof. We have to prove that Y⊥ is a subspace, that Y ∩ Y⊥ = 0, and that
X ⊂ Y + Y⊥. If v₁ and v₂ belong to Y⊥, then so does α₁v₁ + α₂v₂, since for
y ∈ Y,
    (y, α₁v₁ + α₂v₂) = ᾱ₁(y, v₁) + ᾱ₂(y, v₂) = 0

If x ∈ Y ∩ Y⊥, then (x, x) = 0, so x = 0. If x is any element of X, let y be
the element of Y closest to x. By the preceding theorem, x − y ⊥ Y. Hence the
equation x = y + (x − y) shows that X ⊂ Y + Y⊥.  •

Theorem 5. If the Parallelogram Law is valid in a normed linear
space, then that space is an inner-product space. In other words, an
inner product can be defined in such a way that (x, x) = ‖x‖².

Proof. We define the inner product by the equation

    4(x, y) = ‖x + y‖² − ‖x − y‖² + i‖x + iy‖² − i‖x − iy‖²
66 Chapter 2 Hilbert Spaces

From the definition, it follows that
$$4\Re(x, y) = \|x + y\|^2 - \|x - y\|^2$$
From this equation and the Parallelogram Law we obtain
$$4\Re(u + v, y) = \|u + v + y\|^2 - \|u + v - y\|^2$$
$$= \{2\|u + y\|^2 + 2\|v\|^2 - \|u + y - v\|^2\} - \{2\|u\|^2 + 2\|v - y\|^2 - \|u - v + y\|^2\}$$
$$= \{\|u + y\|^2 - \|u - y\|^2\} + \{\|v + y\|^2 - \|v - y\|^2\}$$
$$\quad + \{\|u + y\|^2 + \|u - y\|^2 - 2\|u\|^2 - 2\|y\|^2\} + \{2\|y\|^2 + 2\|v\|^2 - \|v + y\|^2 - \|v - y\|^2\}$$
$$= 4\Re(u, y) + 4\Re(v, y)$$
This proves that $\Re(u + v, y) = \Re(u, y) + \Re(v, y)$. Now by putting $iy$ in place
of $y$ in the definition of $(x, y)$ we obtain $(x, iy) = -i(x, y)$. Hence the imaginary
parts of these complex numbers satisfy
$$\Im(u + v, y) = -\Re\,i(u + v, y) = \Re(u + v, iy)$$
$$= \Re(u, iy) + \Re(v, iy) = -\Re\,i(u, y) - \Re\,i(v, y) = \Im(u, y) + \Im(v, y)$$
(In this equation, $\Im$ denotes "the imaginary part of.") Thus we have fully es-
tablished that $(u + v, y) = (u, y) + (v, y)$. By induction, we can then prove that
$(nx, y) = n(x, y)$ for all positive integers $n$. From this it follows, for any two
positive integers $m$ and $n$, that
$$\Big(\frac{m}{n}x, y\Big) = m\Big(\frac{1}{n}x, y\Big) = \frac{m}{n}(x, y)$$

By continuity, we obtain $(\lambda x, y) = \lambda(x, y)$ for any $\lambda \geq 0$. From the definition,
we quickly verify that
$$(-x, y) = -(x, y) \quad\text{and}\quad (ix, y) = i(x, y)$$
Hence $(\lambda x, y) = \lambda(x, y)$ for all complex scalars $\lambda$. From the definition we obtain

$$4(x, x) = \|2x\|^2 + i\|x + ix\|^2 - i\|x - ix\|^2$$
$$= 4\|x\|^2 + i|1 + i|^2\|x\|^2 - i|1 - i|^2\|x\|^2 = 4\|x\|^2$$
Finally, we have
$$4(y, x) = \|y + x\|^2 - \|y - x\|^2 + i\|y + ix\|^2 - i\|y - ix\|^2$$
$$= \|x + y\|^2 - \|x - y\|^2 + i\|{-i}(y + ix)\|^2 - i\|i(y - ix)\|^2$$
$$= \|x + y\|^2 - \|x - y\|^2 + i\|x - iy\|^2 - i\|x + iy\|^2 = 4\overline{(x, y)} \qquad ∎$$
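As a quick sanity check (not part of the text), the polarization identity used in this proof can be verified numerically for the standard inner product on $\mathbb{C}^n$. The sketch below assumes NumPy and the convention $(x,y) = \sum x_k\overline{y_k}$, conjugate-linear in the second argument, as in Example 6.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=3) + 1j * rng.normal(size=3)
y = rng.normal(size=3) + 1j * rng.normal(size=3)

def inner(u, v):
    # (u, v) = sum u_k * conj(v_k); np.vdot conjugates its FIRST argument
    return np.vdot(v, u)

def polarization(u, v):
    n = np.linalg.norm
    return 0.25 * (n(u + v)**2 - n(u - v)**2
                   + 1j * n(u + 1j*v)**2 - 1j * n(u - 1j*v)**2)

print(np.allclose(inner(x, y), polarization(x, y)))   # True
```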


In an inner-product space, the angle between two nonzero vectors can be
defined. In order to see what a reasonable definition is, we recall the Law of
Cosines from elementary trigonometry. In a triangle having sides $a$, $b$, $c$ and
angle $\theta$ opposite side $c$, we have
$$c^2 = a^2 + b^2 - 2ab\cos\theta$$
Notice that when $\theta = 90°$, this equation gives the Pythagorean rule. In an
inner-product space, we consider a triangle as shown in Figure 2.2.
[Figure 2.2: a triangle with vertices at $0$, $x$, and $y$.]
We have
$$\|x - y\|^2 = (x - y, x - y) = (x,x) - (x,y) - (y,x) + (y,y) = \|x\|^2 + \|y\|^2 - 2\Re(x, y)$$

On the other hand, we would like to have the law of cosines:
$$\|x - y\|^2 = \|x\|^2 + \|y\|^2 - 2\|x\|\,\|y\|\cos\theta$$
Therefore, we define $\cos\theta$ so that $\|x\|\,\|y\|\cos\theta = \Re(x, y)$. Thus
$$\theta = \operatorname{Arccos}\frac{\Re(x,y)}{\|x\|\,\|y\|}$$
The "principal value" of Arccos is used; it is an angle in the interval [0,7l'J. Is
the definition proper? Yes, because the number R(x,y)llxll-lllyll-l lies in the
interval [-1, 1], by the Cauchy-Schwarz inequality. Other definitions for the
angle between two vectors can be given. See [Ar], pages 87-90.
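The definition translates directly into a small computation; the sketch below (an illustration, not from the text) uses the real inner product on $\mathbb{R}^n$, where $\Re(x,y)$ is just the dot product, and clips the cosine into $[-1,1]$ only to guard against floating-point round-off.

```python
import numpy as np

def angle(x, y):
    """theta = Arccos( (x,y) / (||x|| ||y||) ) for nonzero real vectors."""
    c = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
    return np.arccos(np.clip(c, -1.0, 1.0))   # principal value in [0, pi]

print(angle(np.array([1.0, 0.0]), np.array([1.0, 1.0])))  # pi/4 ~ 0.785
```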
There are many sources for the theory of Hilbert spaces. In addition to the
references indicated at the end of Section 1.1, there are these specialized texts:
[AG], [Ar], [Berb], [Berb2], [DM], [Hal2], [Hal3], [St], and [Youn].

Problems 2.1

1. Verify that Example 1 is an inner-product space.


2. Verify that Example 2 gives an inner product. Give all details, especially for the fourth
axiom.
3. Prove the four equations stated in the text just after Example 2.
4. Fix $x$ and $y$ in an inner-product space, and determine the value of $\lambda$ for which $\|x - \lambda y\|$
is a minimum.
5. Prove the Parallelogram Law.

6. Let $K$ be a convex set in an inner-product space. Let $z$ be a point at distance $\delta$ from
$K$. Prove that the diameter of the set
$$\{x \in K : \|x - z\| \leq \delta + \varepsilon\}$$
is not greater than $2\sqrt{2\delta\varepsilon + \varepsilon^2}$. The diameter of a set $S$ is $\sup_{u,v\in S}\|u - v\|$.
7. Prove that in an inner-product space, if $\|x\| = 1 < \|y\|$, then $\|x - y/\|y\|\,\| < \|x - y\|$.
8. Prove that in an inner-product space $X$, the mapping $x \mapsto (x, v)$ is continuous. (Here $v$
can be any fixed vector in the space.) Prove that on $X \times X$ the mapping $(x, y) \mapsto (x, y)$
is continuous.
9. In an inner-product space X, let M = {x EX: (x,v) = O}, where v is a fixed, nonzero
vector. Show that M is a closed subspace. Prove that M has codimension 1.
10. For any subset $M$ of an inner-product space $X$, define $M^\perp = \{x \in X : (x, m) = 0$ for all
$m \in M\}$. Prove that $M^\perp$ is a closed subspace and that $M \cap M^\perp$ is either $\varnothing$ (the empty
set) or $0$. (Here $0$ denotes the zero subspace, $\{0\}$.)
11. Prove that l(x,y)1 = Ilxll lIyll if and only if one of the vectors x and y is a multiple of
the other.
12. Let X be any linear space, and let H be a Hamel basis for X. Show how to use H to
define an inner product on X and thus create an inner-product space.
13. Let A be an nxn matrix. In the real space IRn, define (x,y) = yT Ax. (Here we interpret
elements of IRn as n x 1 matrices. Thus yT is a 1 x n matrix.) Find necessary and sufficient
conditions on A in order that our definition shall produce a genuine inner product.
14. Let $X$ be the space of all "finitely nonzero sequences" of complex numbers. Thus $x \in X$ if
$x : \mathbb{N} \to \mathbb{C}$ and $\{n : x(n) \neq 0\}$ is finite. For $x$ and $y$ in $X$, define $(x,y) = \sum_{n=1}^{\infty} x(n)\overline{y(n)}$.
Prove that $X$ is not a Hilbert space.
15. Let X = IR2, and define an inner product between vectors x = [xU), x(2)J and y =
[y(l), y(2)J by the equation

(x, y) = 2x(1)y(1) + x(2)y(2)


Prove that this makes X a real inner-product space. Let

Y = {y EX: y(l) - y(2) = O}


Find the point y of Y closest to x = [O,lJ. Draw an accurate sketch showing all of this.
Explain why x - y is not perpendicular to Y. Does this contradict Theorem 5? Draw a
sketch of the unit ball.
16. Prove or disprove this analogue of Theorem 4: If Y is a subspace of a Hilbert space X,
then X =
Y Ell Y 1. .
17. Let x and y be points in a real inner-product space such that Ilx + Yll2 = IIxll 2+ IIYIl2.
Show that x .1 y. Show that this is not always true in a complex inner-product space.

18. In an inner-product space, prove that if $\|x_n\| \to \|y\|$ and $(x_n, y) \to \|y\|^2$, then $x_n \to y$.

19. Prove or disprove: In a Hilbert space, if $\sum_{n=1}^{\infty} \|x_n\|^2 < \infty$, then the series $\sum_{n=1}^{\infty} x_n$
converges.
20. Find all solutions to the equation (x, a)e = b, assuming that a, b, and e are given vectors
in an inner-product space.
21. Indicate how the equation Ax = b can be solved if the operator A is defined by Ax =
2:~=1 (x, ai)ei. Describe the set of all solutions.

22. Find all solutions to the equation x + (x, a)e = b.


23. Use Problem 22 to solve the integral equation $x(s) + \int_0^1 x(t)\,t^2s\,dt = \cos s$.

24. Let $v = [v(1), v(2), \dots]$ be an element of $\ell^2$. Prove that the set $\{x \in \ell^2 : |x(n)| \leq |v(n)|$
for all $n\}$ is compact in $\ell^2$.
25. Prove that if $M$ is a closed subspace in a Hilbert space, then $M^{\perp\perp} = M$.
26. Prove that if $M = M^{\perp\perp}$ for every closed linear subspace in an inner-product space, then
the space is complete.
27. Prove that if $M$ and $N$ are closed subspaces of a Hilbert space and if $M \perp N$, then
$M + N$ is closed.
28. Consider the mapping A in Problem 21. Find necessary and sufficient conditions on ai
and Ci in order that A have a fixed point other than O.
29. In a Hilbert space, elements $w$, $u_i$, and $v_i$ are given. Show how to find an $x$ such that
$$x = w + \sum_{i=1}^{n} (x, v_i)u_i$$

30. In a Hilbert space, let $\|x_n\| \to c$, $\|y_n\| \to c$, and $(x_n, y_n) \to c^2$. Prove that $\|x_n - y_n\| \to 0$.
Then make two generalizations. Is there any similar result for unbounded sequences?

31. If $M \subset N$, then $N^\perp \subset M^\perp$. Prove this.

32. Let $K$ be a closed convex set in a Hilbert space $X$. Let $x \in X$ and let $y$ be the point of $K$
closest to $x$. Prove that $\Re(x - y, v - y) \leq 0$ for all $v \in K$. Interpret this as a separation
theorem, i.e., an assertion about a hyperplane and a convex set. Prove the converse.

33. Prove that if ai ~ 0 and L::':1 ai < 00, then

34. The Banach space $\ell^1$ consists of sequences $[x_1, x_2, \dots]$ for which $\sum |x_n| < \infty$. The norm
is defined to be $\|x\| = \sum |x_n|$. Prove that $\ell^1$ is dense in $\ell^2$, and explain why this does
not contradict the fact that $\ell^1$ is complete.

35. Prove that in a real-inner product space

36. In a real inner-product space, does the equation $\|x + y + z\|^2 = \|x\|^2 + \|y\|^2 + \|z\|^2$
imply any orthogonality relations among the three points?

37. Find the necessary and sufficient conditions on the complex numbers $w_1, w_2, \dots, w_n$ in
order that the equation
$$(x, y) = \sum_{k=1}^{n} x(k)\overline{y(k)}\,w_k$$
shall define an inner product on $\mathbb{C}^n$.


38. Prove that if $x$ is an element of $\ell^2$, then for all natural numbers $n$,
$$\inf_{k>n}\; |x(k)|\sum_{j=1}^{k} |x(j)| = 0$$

39. Let $K$ be a closed convex set in a Hilbert space $X$. For each $x$ in $X$, let $Px$ be the point
of $K$ closest to $x$. Prove that $\|Px - Py\| \leq \|x - y\|$. (Cf. Problems 2.2.24, 2.1.32.)

40. (Continuation) Prove that each closed convex set K in a Hilbert space X is a "retract",
i.e., the identity map on K has a continuous extension mapping X onto K.

41. Let F and G be two maps (not assumed to be linear or continuous) of an inner product
space X into itself. Suppose that for all x and y in X, (F(x),y) = (x,G(y». Prove that
if a sequence Xn converges to x, and G(xn) converges to y, then y = G(x). Prove also
that F(O) = G(O) = O.

42. Prove that in an inner product space, if A > 0, then

2.2 Orthogonality and Bases

Definition. A set $A$ of vectors in an inner-product space is said to be or-
thogonal if $(x, y) = 0$ whenever $x \in A$, $y \in A$, and $x \neq y$. Recall that we write
$x \perp y$ to mean $(x,y) = 0$, $x \perp S$ to mean that $x \perp y$ for all $y \in S$, and $U \perp V$
to mean that $x \perp y$ for all $x \in U$ and $y \in V$.

Theorem 1. Pythagorean Law. If $\{x_1, x_2, \dots, x_n\}$ is a finite
orthogonal set of $n$ distinct elements in an inner-product space, then
$$\Big\|\sum_{j=1}^{n} x_j\Big\|^2 = \sum_{j=1}^{n} \|x_j\|^2$$

Proof. By our assumptions, $x_i \neq x_j$ if $i \neq j$, and consequently $x_i \perp x_j$; hence
$$\Big\|\sum_{j=1}^{n} x_j\Big\|^2 = \Big(\sum_{j=1}^{n} x_j, \sum_{i=1}^{n} x_i\Big) = \sum_{j=1}^{n}\sum_{i=1}^{n} (x_j, x_i) = \sum_{j=1}^{n} (x_j, x_j) = \sum_{j=1}^{n} \|x_j\|^2 \qquad ∎$$

This theorem has a counterpart for orthogonal sets that are not finite, but
its meaning will require some explanation. What should we mean by the sum
of the elements in an arbitrary subset $A$ in $X$? If $A$ is finite, we know what is
meant. For an infinite set, we shall say that the sum of the elements of $A$ is $s$ if
and only if the following is true: For each positive $\varepsilon$ there exists a finite subset
$A_0$ of $A$ such that for every larger finite subset $F$ we have
$$\Big\|s - \sum_{x\in F} x\Big\| < \varepsilon$$
When we say "larger set" we mean only that $A_0 \subset F \subset A$. Notice that the
definition employs only finite subsets of $A$. For the reader who knows all about

"nets," "generalized sequences," or "Moore-Smith convergence," we remark that


what is going on here is this: We partially order the finite subsets of A by
inclusion. With each finite subset F of A we associate the sum S(F) of all the
elements in F. Then S is a net (i.e., a function on a directed set). The limit of
this net, if it exists, is the sum S of all the elements of A. To be more precise,
it is often called the unordered sum over A.
When dealing with an orthogonal indexed set of elements $[x_i]$ in an inner-
product space, we always assume that $x_i \neq x_j$ if $i \neq j$. This assumption allows
us to write $x_i \perp x_j$ when $i \neq j$.

Theorem 2. The General Pythagorean Law. Let $[x_j]$ be an
orthogonal sequence in a Hilbert space. The series $\sum x_j$ converges if
and only if $\sum \|x_j\|^2 < \infty$. If $\sum \|x_j\|^2 = A < \infty$, then $\|\sum x_j\|^2 = A$,
and the sum $\sum x_j$ is independent of the ordering of the terms.

Proof. Put $S_n = \sum_1^n x_j$ and $s_n = \sum_1^n \|x_j\|^2$.
By the finite version of the Pythagorean Law, we have (for $m > n$)
$$\|S_m - S_n\|^2 = \Big\|\sum_{n+1}^{m} x_j\Big\|^2 = \sum_{n+1}^{m} \|x_j\|^2 = |s_m - s_n|$$
Hence $[S_n]$ is a Cauchy sequence in $X$ if and only if $[s_n]$ is a Cauchy sequence
in $\mathbb{R}$. This establishes the first assertion in the theorem.
Now assume that $A < \infty$. By the Pythagorean Law, $\|S_n\|^2 = s_n$, and hence
in the limit we have $\|\sum x_j\|^2 = A$. Let $u$ be a rearrangement of the original
series, say $u = \sum x_{k_j}$. Let $u_n = \sum_1^n x_{k_j}$. By the theory of absolutely convergent
series in $\mathbb{R}$, we have $\sum \|x_{k_j}\|^2 = A$. Hence, by our previous analysis, $u_n \to u$
and $\|u\|^2 = A$. Now compute
$$(u_n, S_m) = \Big(\sum_{j=1}^{n} x_{k_j}, \sum_{i=1}^{m} x_i\Big) = \sum_{j=1}^{n}\sum_{i=1}^{m} \|x_i\|^2\,\delta_{i k_j}$$
We let $n \to \infty$ to get $(u, S_m) = \sum_{i=1}^{m} \|x_i\|^2$. Then let $m \to \infty$ to get $(u, x) = A$,
where $x = \lim S_m$. It follows that $x = u$, because
$$\|x - u\|^2 = \|x\|^2 - 2\Re(x, u) + \|u\|^2 = A - 2A + A = 0 \qquad ∎$$

Definition. A set $U$ in an inner-product space is said to be orthonormal if
each element has norm 1 and if $(u, v) = 0$ when $u, v \in U$ and $u \neq v$. If the set $U$
is indexed in a one-to-one manner so that $U = [u_i : i \in I]$, then the condition of
orthonormality is simply $(u_i, u_j) = \delta_{ij}$, where, as usual, $\delta_{ij}$ is 1 when $i = j$ and
is 0 otherwise. If an indexed set is asserted to be orthonormal, we shall always
assume that the indexing is one-to-one, and that the equation just mentioned
applies.
If $[v_i : i \in I]$ is an orthogonal set of nonzero vectors, then $[v_i/\|v_i\| : i \in I]$
is an orthonormal set.

Theorem 3. If $[y_1, y_2, \dots, y_n]$ is an orthonormal set in an inner-
product space, and if $Y$ is the linear span of $\{y_i : 1 \leq i \leq n\}$, then for
any $x$, the point in $Y$ closest to $x$ is $\sum_{i=1}^{n} (x, y_i)y_i$.

Proof. Let $y = \sum_{i=1}^{n} (x, y_i)y_i$. By Theorem 3 in Section 2.1, page 65, it suffices
to verify that $x - y \perp Y$. For this it is enough to verify that $x - y$ is orthogonal
to each basis vector $y_k$. We have
$$(x - y, y_k) = (x, y_k) - \Big(\sum_i (x, y_i)y_i, y_k\Big) = (x, y_k) - \sum_i (x, y_i)(y_i, y_k) = (x, y_k) - (x, y_k) = 0 \qquad ∎$$


The vector Y in the above proof is called the orthogonal projection of x onto
Y. The coefficients (x, Yi) are called the (generalized) Fourier coefficients of
x with respect to the given orthonormal system. The operator that produces
Y from x is called an orthogonal projection or an orthogonal projector.
Look ahead to Theorem 7 for a further discussion.
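The formula of Theorem 3 is easy to exercise numerically. The following sketch (an illustration, not from the text) projects a vector of $\mathbb{R}^4$ onto the span of two orthonormal vectors and checks the orthogonality property $x - y \perp Y$ asserted in part a.

```python
import numpy as np

def project(x, onb):
    """Orthogonal projection onto span(onb), onb a list of orthonormal vectors:
    y = sum_i (x, y_i) y_i  (generalized Fourier coefficients)."""
    return sum(np.dot(x, u) * u for u in onb)

y1 = np.array([1.0, 0.0, 0.0, 0.0])
y2 = np.array([0.0, 1.0, 1.0, 0.0]) / np.sqrt(2)
x = np.array([2.0, 1.0, 3.0, 5.0])

y = project(x, [y1, y2])
print(y)                                     # [2. 2. 2. 0.]
print(np.dot(x - y, y1), np.dot(x - y, y2))  # both ~ 0
```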

Corollary 1. If $x$ is a point in the linear span of an orthonormal
set $[y_1, y_2, \dots, y_n]$, then $x = \sum_{i=1}^{n} (x, y_i)y_i$.

Theorem 4. Bessel's Inequality. If $[u_i : i \in I]$ is an orthonormal
system in an inner-product space, then for every $x$,
$$\sum_{i\in I} |(x, u_i)|^2 \leq \|x\|^2$$

Proof. For $j$ ranging over a finite subset $J$ of $I$, let $y = \sum_{j\in J}(x, u_j)u_j$. This vector
$y$ is the orthogonal projection of $x$ onto the subspace $U = \operatorname{span}[u_j : j \in J]$. By
Theorem 3, $x - y \perp U$. Hence by the Pythagorean Law
$$\|x\|^2 = \|x - y\|^2 + \|y\|^2 \geq \|y\|^2 = \sum_{j\in J} |(x, u_j)|^2$$
This proves our result for any finite set of indices. The result for $I$ itself now
follows from Problem 4. ∎

Corollary 2. If $[u_1, u_2, \dots]$ is an orthonormal sequence in an
inner-product space, then for each $x$, $\lim_{n\to\infty}(x, u_n) = 0$.

Corollary 3. If [Ui : i E I] is an orthonormal system, then for


each x at most a countable number of the Fourier coefficients (x, Ui)
are nonzero.

Proof. Fixing $x$, put $J_n = \{i \in I : |(x, u_i)| > 1/n\}$. By the Bessel Inequality,
$$\#(J_n)\cdot\frac{1}{n^2} \leq \sum_{j\in J_n} |(x, u_j)|^2 \leq \|x\|^2$$

Hence $J_n$ is a finite set. Since
$$\{i : (x, u_i) \neq 0\} = \bigcup_{n=1}^{\infty} J_n$$
we see that this set must be countable, it being a union of countably many finite
sets. ∎
Let X be any inner-product space. An orthonormal basis for X is any
maximal orthonormal set in X. It is also called an "orthonormal base." In this
context, "maximal" means not properly contained in another orthonormal set.
In other words, it is a maximal element in the partially ordered family of all
orthonormal sets, when the partial order is set inclusion, C. (Refer to Section
1.6, page 31, for a discussion of partially ordered sets.)

Theorem 5. Every nontrivial inner-product space has an orthonor-


mal basis.

Proof. Call the space $X$. Since it is not $0$, it contains a nonzero vector $x$.
The set consisting solely of $x/\|x\|$ is orthonormal. Now order the family of
all orthonormal subsets of $X$ in the natural way (by inclusion). In order to use
Zorn's Lemma, one must verify that each chain of orthonormal sets has an upper
bound. Let $\mathcal{C}$ be such a chain, and put $A^* = \bigcup\{A : A \in \mathcal{C}\}$. It is obvious that
$A^*$ is an upper bound for $\mathcal{C}$, but is $A^*$ orthonormal? Take $x$ and $y$ in $A^*$ such
that $x \neq y$. Say $x \in A_1 \in \mathcal{C}$ and $y \in A_2 \in \mathcal{C}$. Since $\mathcal{C}$ is a chain, either $A_1 \subset A_2$
or $A_2 \subset A_1$. Suppose the latter. Then $x, y \in A_1$. Since $A_1$ is orthonormal,
$(x,y) = 0$. Obviously, $\|x\| = 1$. Hence $A^*$ is orthonormal. ∎

Theorem 6. The Orthonormal Basis Theorem. For an or-
thonormal family $[u_i]$ (not necessarily finite or countable) in a Hilbert
space $X$, the following properties are equivalent:
a. $[u_i]$ is an orthonormal basis for $X$.
b. If $x \in X$ and $x \perp u_i$ for all $i$, then $x = 0$.
c. For each $x \in X$, $x = \sum (x, u_i)u_i$.
d. For each $x$ and $y$ in $X$, $(x, y) = \sum (x, u_i)\overline{(y, u_i)}$.
e. For each $x$ in $X$, $\|x\|^2 = \sum |(x, u_i)|^2$. (Parseval Identity)

Proof. To prove that a implies b, suppose that b is false. Let $x \neq 0$ and
$x \perp u_i$ for all $i$. Adjoin $x/\|x\|$ to the family $[u_i]$ to get a larger orthonormal
family. Thus the original family is not maximal and is not a basis.
To prove that b implies c, assume b and let $x$ be any point in $X$. Let
$y = \sum (x, u_i)u_i$. By Bessel's inequality (Theorem 4), we have
$$\sum |(x, u_i)|^2 \leq \|x\|^2 < \infty$$
By Theorem 2, the series defining $y$ converges. (Here the completeness of $X$ is
needed.) Then straightforward calculation (as in the proof of Theorem 3) shows
that $x - y \perp u_i$ for all $i$. By b, $x - y = 0$.

To prove that c implies d, assume c and write
$$x = \sum (x, u_i)u_i, \qquad y = \sum (y, u_i)u_i$$
Straightforward calculation then yields $(x, y) = \sum (x, u_i)\overline{(y, u_i)}$.
To prove that d implies e, assume d and let $y = x$ in d. The result is the
assertion in e.
To prove that e implies a, suppose that a is false. Then $[u_i]$ is not a maximal
orthonormal set. Adjoin a new element, $x$, to obtain a larger orthonormal set.
Then $1 = \|x\|^2 \neq \sum |(x, u_i)|^2 = 0$, showing that e is false. ∎
Example 1. One orthonormal basis in $\ell^2$ is obtained by defining $u_n(j) = \delta_{nj}$.
Thus
$$u_1 = [1, 0, 0, \dots], \quad u_2 = [0, 1, 0, \dots], \quad\text{etc.}$$
To see that this is actually an orthonormal base, use the preceding theorem, in
particular the equivalence of a and b. Suppose $x \in \ell^2$ and $(x, u_n) = 0$ for all $n$.
Then $x(n) = 0$ for all $n$, and $x = 0$. ∎
Example 2. An orthonormal basis for $L^2[0,1]$ is provided by the functions
$u_n(t) = e^{2\pi int}$, where $n \in \mathbb{Z}$. One verifies the orthonormality by computing the
appropriate integrals. To show that $[u_n]$ is a base, we use Part b of Theorem 6.
Let $x \in L^2[0,1]$ and $x \neq 0$. It is to be shown that $(x, u_n) \neq 0$ for some $n$. Since
the set of continuous functions is dense in $L^2$, there is a continuous $y$ such that
$\|x - y\| < \|x\|/5$. Then $\|y\| \geq \|x\| - \|x - y\| > \tfrac45\|x\|$. By the Weierstrass
Approximation Theorem, the linear span of $[u_n]$ is dense in the space $C[0,1]$,
furnished with the supremum norm. Select a linear combination $p$ of $[u_n]$ such
that $\|p - y\|_\infty < \|x\|/5$. Then $\|p - y\| < \|x\|/5$. Hence $\|p\| > \|y\| - \|y - p\| >
\tfrac35\|x\|$. Then
$$|(x, p)| \geq |(p, p)| - |(y - p, p)| - |(x - y, p)| \geq \|p\|^2 - \|y - p\|\,\|p\| - \|x - y\|\,\|p\| > 0$$
Thus it is not possible to have $(x, u_n) = 0$ for all $n$. ∎
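As a numerical companion to this example (an illustration, not from the text), one can approximate the Fourier coefficients $(x, u_n) = \int_0^1 x(t)e^{-2\pi i n t}\,dt$ by a Riemann sum and watch Parseval's identity $\|x\|^2 = \sum_n |(x,u_n)|^2$ emerge as more terms are kept.

```python
import numpy as np

t = np.linspace(0.0, 1.0, 4000, endpoint=False)
dt = t[1] - t[0]
x = t * (1 - t)                      # a sample element of L^2[0,1]

def coeff(n):
    # (x, u_n) = integral of x(t) * conj(e^{2 pi i n t}) dt, via a Riemann sum
    return np.sum(x * np.exp(-2j * np.pi * n * t)) * dt

norm_sq = np.sum(np.abs(x)**2) * dt
parseval = sum(abs(coeff(n))**2 for n in range(-50, 51))
print(norm_sq, parseval)             # both close to 1/30 ~ 0.0333
```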
Recall that we have defined the orthogonal projection of a Hilbert space

X onto a closed subspace Y to be the mapping P such that for each x E X, Px
is the point of Y closest to x.

Theorem 7. The Orthogonal Projection Theorem. The
orthogonal projection $P$ of a Hilbert space $X$ onto a closed subspace
$Y$ has these properties:
a. It is well-defined; i.e., $Px$ exists and is unique in $Y$.
b. It is surjective, i.e., $P(X) = Y$.
c. It is linear.
d. If $Y$ is not $0$ (the zero subspace), then $\|P\| = 1$.
e. $x - Px \perp Y$ for all $x$.
f. $P$ is Hermitian; i.e., $(Px, w) = (x, Pw)$ for all $x$ and $w$.
g. If $[y_i]$ is an orthonormal basis for $Y$, then $Px = \sum (x, y_i)y_i$.
h. $P$ is idempotent; i.e., $P^2 = P$.
i. $Py = y$ for all $y \in Y$. Thus $P|Y = I_Y$.
j. $\|x\|^2 = \|Px\|^2 + \|x - Px\|^2$.
Proof. This is left to the problems.

The Gram-Schmidt process, familiar from the study of linear algebra, is an
algorithm for producing orthonormal bases. It is a recursive process that can be
applied to any linearly independent sequence in an inner-product space, and it
yields an orthonormal sequence, as described in the next theorem.

Theorem 8. The Gram-Schmidt Construction. Let
$[v_1, v_2, v_3, \dots]$ be a linearly independent sequence in an inner-product
space. Having set $u_1 = v_1/\|v_1\|$, define recursively
$$u_n = \frac{v_n - \sum_{i=1}^{n-1} (v_n, u_i)u_i}{\Big\|v_n - \sum_{i=1}^{n-1} (v_n, u_i)u_i\Big\|} \qquad n = 2, 3, \dots$$
Then $[u_1, u_2, u_3, \dots]$ is an orthonormal sequence, and for each $n$,
$$\operatorname{span}\{u_1, u_2, \dots, u_n\} = \operatorname{span}\{v_1, v_2, \dots, v_n\}.$$

Notice that in the equation describing this algorithm there is a normalization


process: the dividing of a vector by its norm to produce a new vector pointing
in the same direction but having unit length. The other action being carried out
is the subtraction from the vector Vn of its projection on the linear span of the
orthonormal set presently available, Ul, U2, ... ,Un-I' This action is obeying the
equation in Theorem 3, and it produces a vector that is orthogonal to the linear
span just described. These remarks should make the formulas easy to derive or
remember.
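Here is a direct transcription of the construction in Theorem 8 into NumPy (a sketch for $\mathbb{R}^n$ with the standard inner product; not from the text). It normalizes $v_1$ and then, at each step, subtracts from $v_n$ its projection onto the span of the vectors already produced before normalizing.

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalize a linearly independent sequence (Theorem 8)."""
    basis = []
    for v in vectors:
        w = v - sum(np.dot(v, u) * u for u in basis)  # remove projection onto span(basis)
        basis.append(w / np.linalg.norm(w))           # normalize
    return basis

V = [np.array([1.0, 1.0, 0.0]),
     np.array([1.0, 0.0, 1.0]),
     np.array([0.0, 1.0, 1.0])]
U = gram_schmidt(V)
G = np.array([[np.dot(a, b) for b in U] for a in U])
print(np.round(G, 10))   # identity matrix: the u_i are orthonormal
```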
Example 3. (A nonseparable inner-product space). A normed linear space
(or any topological space) is said to be separable if it contains a countable
dense set. If an inner-product space is nonseparable, it cannot have a count-
able orthonormal base. For an example, we consider the uncountable family of
functions $u_\lambda(t) = e^{i\lambda t}$, where $t \in \mathbb{R}$ and $\lambda \in \mathbb{R}$. This family of functions is
linearly independent (Problem 5), and is therefore a Hamel basis for a linear
space $X$. We introduce an inner product in $X$ by defining the inner product of
two elements in the Hamel base:
$$(u_\lambda, u_\sigma) = \delta_{\lambda\sigma}$$
This is the value that arises in the following integration:
$$\lim_{T\to\infty} \frac{1}{2T}\int_{-T}^{T} u_\lambda(t)\overline{u_\sigma(t)}\,dt = \lim_{T\to\infty} \frac{1}{2T}\int_{-T}^{T} e^{i(\lambda-\sigma)t}\,dt$$

If $\lambda = \sigma$, this calculation produces the result 1. If $\lambda \neq \sigma$, we get 0. Elements of
$X$ have the property of almost periodicity. (See Problem 1.) ∎
Example 4. (Other abstract Hilbert spaces). A higher level of abstraction
can be used to generate further inner-product spaces and Hilbert spaces. Let us
create at one stroke a Hilbert space of any given dimension. Let $S$ be any set.
The notation $\mathbb{C}^S$ denotes the family of all functions from $S$ to the field $\mathbb{C}$. This
set of functions has a natural linear structure, for if $x$ and $y$ belong to $\mathbb{C}^S$, $x + y$
can be defined by
$$(x + y)(s) = x(s) + y(s)$$
A similar equation defines $\lambda x$ for $\lambda \in \mathbb{C}$. Within $\mathbb{C}^S$ we single out the subspace
$X$ of all $x \in \mathbb{C}^S$ such that
$$(1)\qquad \sum\big[|x(s)|^2 : s \in S\big] < \infty$$
(Here we are using the notion of unordered sum as defined previously.) This
construction is familiar in certain cases. For example, if $S = \{1, 2, \dots, n\}$, then
the space $X$ just constructed is the familiar space $\mathbb{C}^n$. On the other hand, if
$S = \mathbb{N}$, then $X$ is the familiar space $\ell^2$. In the space $X$, addition and scalar
multiplication are already defined, since $X \subset \mathbb{C}^S$. Naturally, we define the inner
product by
$$(2)\qquad (x, y) = \sum\big[x(s)\overline{y(s)} : s \in S\big]$$
Much of what we are doing here loses its mystery when we recall (from the
Corollary to Theorem 4) that the sums in Equations (1) and (2) are always
countable. The space discussed here is denoted by $\ell^2(S)$. ∎
Example 5. (Legendre polynomials.) An important example of an orthonor-
mal basis is provided by the Legendre polynomials. We consider the space
$C[-1,1]$ and use the simple inner product
$$(f, g) = \int_{-1}^{1} f(t)g(t)\,dt$$
Now apply the Gram-Schmidt process to the monomials $t \mapsto 1, t, t^2, t^3, \dots$ The
un-normalized polynomials that result can be described recursively, using the
classical notation $P_n$:
$$P_0(t) = 1, \qquad P_1(t) = t$$
$$P_n(t) = \frac{2n-1}{n}\,t\,P_{n-1}(t) - \frac{n-1}{n}\,P_{n-2}(t) \qquad (n = 2, 3, \dots)$$
The orthonormal system is, of course, $p_n = P_n/\|P_n\|$. The completion of the
space $C[-1,1]$ with respect to the norm induced by the inner product is the
space $L^2[-1,1]$. Every function $f$ in this space is represented in the $L^2$-sense
by the series
$$f = \sum_{k=0}^{\infty} (f, p_k)p_k$$

We should be very cautious about writing
$$f(t) = \sum_{k=0}^{\infty} (f, p_k)p_k(t)$$
because, in the first place, $f(t)$ is meaningless for an element $f \in L^2[-1,1]$.
In this context, $f$ stands for an equivalence class of functions that differ from
each other on sets of measure zero. In the second place, such an equation would
seem to imply a pointwise convergence of the series, and that is questionable,
if not false. Without more knowledge about the expansion of $f$ in Legendre
polynomials, we can write only
$$\lim_{n\to\infty}\Big\|f - \sum_{k=0}^{n} (f, p_k)p_k\Big\| = 0$$
Consult [Davis] or [Sz] for the conditions on $f$ that guarantee uniform conver-
gence of the series to $f$. ∎
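The three-term recursion above is easy to run; the sketch below (illustrative, not from the text) builds $P_0, \dots, P_4$ as NumPy coefficient arrays, assuming $P_1(t) = t$, and checks orthogonality over $[-1,1]$ with a simple Riemann sum.

```python
import numpy as np
from numpy.polynomial import polynomial as P

def legendre(nmax):
    """P_0,...,P_nmax via P_n = ((2n-1)/n) t P_{n-1} - ((n-1)/n) P_{n-2}."""
    polys = [np.array([1.0]), np.array([0.0, 1.0])]   # P_0 = 1, P_1 = t
    for n in range(2, nmax + 1):
        pn = (2*n - 1)/n * P.polymulx(polys[-1]) - (n - 1)/n * np.pad(polys[-2], (0, 2))
        polys.append(pn)
    return polys

t = np.linspace(-1, 1, 2001)
dt = t[1] - t[0]
polys = legendre(4)
P2, P3 = P.polyval(t, polys[2]), P.polyval(t, polys[3])
print(np.sum(P2 * P3) * dt)    # ~ 0: (P_2, P_3) = 0
print(np.sum(P2 * P2) * dt)    # ~ 2/5 = ||P_2||^2
```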

Problems 2.2

1. A function $f : \mathbb{R} \to \mathbb{C}$ is said to be almost periodic if for every $\varepsilon > 0$ there is an $\ell > 0$
such that each interval of length $\ell$ contains a number $\tau$ for which
$$\sup_{s\in\mathbb{R}} |f(s + \tau) - f(s)| < \varepsilon$$

Prove that every periodic function is almost periodic, and that the sum of two almost
periodic functions is almost periodic. Refer to [Bes] and [Tay2] for further information.
2. Prove Theorem 7.
3. Prove Theorem 8. (Theorem 7 will help.)

4. Let x : I -+ IR+, where I is some index set. Suppose that there is a number M such that
L [Xj : j E J] ~ M for every finite subset J in I. Prove that L[Xi : i E I] exists and
does not exceed M. What happens if we drop the hypothesis x j ~ O?

5. Prove that the set of functions {u.x : .x E IR}, defined in Example 3, is linearly indepen-
dent.

6. Using the inner product
$$(x, y) = \int_{-1}^{1} x(t)y(t)\,dt$$
construct an orthonormal set $\{u_0, u_1, u_2, u_3\}$ where (for each $j$) $u_j$ is a polynomial of
degree at most $j$. (One can apply the Gram-Schmidt process to the functions $v_j(t) = t^j$.)

7. Prove that the functions $u_n(t) = e^{int}$ $(n = 0, \pm 1, \pm 2, \dots)$ form an orthonormal system
with respect to the inner product
$$(x, y) = \frac{1}{2\pi}\int_{-\pi}^{\pi} x(t)\overline{y(t)}\,dt$$

8. Prove that the functions
$$u_n(t) = \begin{cases} \cos nt & n = -1, -2, -3, \dots \\ \sin nt & n = 1, 2, 3, \dots \\ 1/\sqrt{2} & n = 0 \end{cases}$$
form an orthonormal system with respect to the inner product
$$(x, y) = \frac{1}{\pi}\int_{-\pi}^{\pi} x(t)y(t)\,dt$$

9. Prove that the Chebyshev polynomials
$$T_n(t) = \cos(n \operatorname{Arccos} t) \qquad (-1 \leq t \leq 1;\ n = 0, 1, 2, \dots)$$
form an orthogonal system with respect to the inner product
$$(x, y) = \int_{-1}^{1} x(t)y(t)(1 - t^2)^{-1/2}\,dt$$
What is the corresponding orthonormal system? Hint: Make a change of variable $t = \cos\theta$
and apply Problem 8.

10. Let VI, V2, ... be a sequence in a Hilbert space X such that span{ VI, V2, ... } = X. Show
that X is finite dimensional.

11. Prove that any orthonormal set in an inner product space can be enlarged to form an
orthonormal basis.

12. Let $D$ be the open unit disk in the complex plane. The space $H^2(D)$ is defined to be the
space of functions $f$ analytic in $D$ and satisfying $\int_D |f(z)|^2\,dz < \infty$. In $H^2(D)$ we define
$(f,g) = \int_D f(z)\overline{g(z)}\,dz$. Prove that the functions $u_n(z) = z^n$ $(n = 0, 1, 2, \dots)$ form an
orthogonal system in $H^2(D)$. What is the corresponding orthonormal sequence?

13. If 0 < Q < {3, which of these implies the other?


(a) L Ilxnll" < 00 , (b) L Ilxnll i3 < 00 .

14. Prove that if $\{v_1, v_2, \dots\}$ is linearly independent, then an orthogonal system can be
constructed from it by defining $u_1 = v_1$ and
$$u_n = v_n - \sum_{j=1}^{n-1} (v_n, u_j)u_j/\|u_j\|^2 \qquad n = 2, 3, \dots$$

15. Illustrate the process in Problem 14 with the four vectors $v_0, v_1, v_2, v_3$, where $v_j(t) = t^j$
and the inner product is defined by $(x, y) = \int_{-1}^{1} x(t)y(t)\,dt$.

16. Let lUi : i Ell be an orthonormal basis for a Hilbert space X. Let [Vi: i Ell be an
orthonormal set satisfying Li Ilui - vil1 2 < 1. Show that [v;] is also a basis for X.
17. Where does the proof of Theorem 6 fail if X is an incomplete inner-product space? Which
equivalences remain true?

18. Prove that if P is the orthogonal projection of a Hilbert space X onto a closed subspace
Y, then I - P is the orthogonal projection of X onto y.L.

19. (Cf. Problem 12.) Let r be the unit circle in the complex plane. For functions continuous
on r define (f, g) = -i I.
f(z )g(z) z dz. Prove that this is an inner product and that the
functions zn form an orthogonal family.

20. Prove that an orthogonal projection P has the property that (Px, x) = IIPxl1 2 for all x.

21. Let [un} be an orthonormal sequence in an inner product space. Let [On} C <C and
L::'=1 IOnl 2 < 00. Show that the sequence of vectors Yn = L:7=1 OjUj has the Cauchy
property.

22. Let [Ul, U2,· .. , un} be an orthonormal set in an inner product space X. What choice of
coefficients AJ makes the expression IIx- L:7=1 AjUj II a minimum? Here x is a prescribed
point in X.
23. Define $P_n(t) = \dfrac{d^n}{dt^n}(t^2 - 1)^n$ for $n = 0, 1, 2, \dots$ Prove the orthogonality of $\{P_n : n \in \mathbb{N}\}$
with respect to the inner product $(x, y) = \int_{-1}^{1} x(t)y(t)\,dt$.

24. If K is a closed convex set in a Hilbert space X, there is a well-defined map P : X ..... K
such that Ilx - Pxll = dist(x, K) for all x. Which properties (a), ... , (j) in Theorem 7
does this mapping have? (Cf. Problem 2.1.39, page 70.)

25. Consider the real Hilbert space $X = L^2[-\pi, \pi]$, having its usual inner product, $(x, y) =
\int_{-\pi}^{\pi} x(t)y(t)\,dt$. Let $U$ be the subspace of even functions in $X$; these are functions such
that $u(-t) = u(t)$. Let $V$ be the subspace of odd functions, $v(-t) = -v(t)$. Prove that
$X = U + V$ and that $U \perp V$. Prove that the orthogonal projection of $X$ onto $U$ is given
by $Px = u$, where $u(t) = \tfrac12[x(t) + x(-t)]$. Find the orthogonal projection $Q : X \to V$.
Give orthonormal bases for $U$ and $V$, and express $P$ and $Q$ in terms of them.

26. Let $[e_n]$ be an orthonormal sequence in a Hilbert space. Let $M$ be the linear span of this
sequence. Prove that the closure of $M$ is
$$\overline{M} = \Big\{\sum_{n=1}^{\infty} a_ne_n : \sum_{n=1}^{\infty} |a_n|^2 < \infty\Big\}$$
27. Let $[e_n : n \in \mathbb{N}]$ be an orthonormal basis in a Hilbert space. Let $[\alpha_n]$ be a sequence
in $\mathbb{C}$. What are the precise conditions under which we can solve the infinite system of
equations $(x, e_n) = \alpha_n$ $(n \in \mathbb{N})$?

28. Find orthonormal bases for the Hilbert spaces in Examples 3 and 4.

29. What are necessary and sufficient conditions in order that an orthogonal set be linearly
independent?

30. A linear map P is a projection if p2 = P. Prove that if P is a projection defined on a


Hilbert space and IIPII = 1, then P is the orthogonal projection onto a subspace.

31. Let $[u_n : n \in \mathbb{N}]$ be an orthonormal sequence in a Hilbert space $X$. Define
$$Y = \Big\{\sum_{n=1}^{\infty} a_nu_n : \sum_{n=1}^{\infty} |a_n|^2 < \infty\Big\}$$
Prove that the map $a \mapsto \sum a_nu_n$ is an isometry of $\ell^2$ onto $Y$. Prove that $Y$ is a closed
subspace in $X$.

32. An indexed set $[u_i : i \in I]$ in a Hilbert space is said to be stable if there exist positive
constants $A$ and $B$ such that
$$A\Big(\sum_i |a_i|^2\Big)^{1/2} \leq \Big\|\sum_i a_iu_i\Big\| \leq B\Big(\sum_i |a_i|^2\Big)^{1/2}$$
whenever $a \in \ell^2(I)$. Prove that a stable family is linearly independent. Prove that every
orthonormal family is stable.

33. (Continuation) Let lUi : i E Z] be an orthonormal family. Define Vi = Ui + Ui+l' Prove


that [Vi: i E Z] is stable. Generalize.

34. (Continuation) Let [Ui : i E I] be a stable family. Let a : I --t C. Prove that these
properties ofa are equivalent: (1) Llail2 < 00; (2) Laiui converges; (3) Lai(X,Ui)
converges for each x in the Hilbert space.

35. (Continuation) Let lUi : i E I] be an indexed family of vectors of norm 1 in a Hilbert


space. Prove that if Litj I(Ui, UjW < 1, then the given family is stable.
36. (Continuation) Prove that if lUi : i E I] is stable, then {L aitti : a E [2(1)} is a closed
subspace.

37. Let [X1,X2, ... ,Xn ] be an ordered set in an inner-product space. Assume that it is
orthogonal in this sense: If Xi # Xj, then (Xi,Xj) = O. Show by an example that the
Pythagorean law in Theorem 1 may fail.

38. (Direct sums of Hilbert spaces). For $n = 1, 2, 3, \dots$ let $X_n$ be a Hilbert space over the
complex field. The direct sum of these spaces is denoted by $\bigoplus_{n=1}^{\infty} X_n$, and its elements
are sequences $[x_n : n \in \mathbb{N}]$, where $x_n \in X_n$ and $\sum_{n=1}^{\infty} \|x_n\|^2 < \infty$. Show how to make
this space into a Hilbert space and prove the completeness.

39. This problem gives a pair of closed subspaces whose sum is not closed. Let X be an
infinite-dimensional Hilbert space, and let {un} be an orthonormal sequence in X. Put

$$v_n = u_{2n}, \qquad w_n = u_{2n+1}, \qquad z_n = \frac{1}{n}\,v_n + \frac{\sqrt{n^2 - 1}}{n}\,w_n, \qquad x_0 = \sum_{n=1}^{\infty}\frac{1}{n}\,v_n$$

Let W and Z denote the closed linear spaces generated by {w n } and {zn}. Prove that

(1) All three sequences {Vn}, {Wn}, {Zn} are orthonormal.

(2) The vector Xo is well-defined; i.e., its series converges.

(3) The vector Xo is in the closure of W + Z.


(4) If Z E Z, then (z, vn) = (z, zn)/n.

(5) If W E W, then (w, v n ) = O.


(6) If $x_0 = w + z$, where $w \in W$ and $z \in Z$, then
$$1 = n(x_0, v_n) = n(w + z, v_n) = n(z, v_n) = (z, z_n) \to 0$$
This contradiction will show that $x_0 \notin W + Z$.

40. Prove that an orthonormal set in a separable Hilbert space can have at most a countable
number of elements. Hint: Consider the open balls of radius $\tfrac12$ centered at the points in
the orthonormal set.

41. Let $[u_n]$ be an orthonormal base in a Hilbert space. Define $v_n = 2^{-1/2}(u_{2n} + u_{2n+1})$.
Prove that $[v_n]$ is orthonormal. Define another sequence $[w_n]$ by the same formula,
except $+$ is replaced by $-$. Show that the $v$-sequence and the $w$-sequence together
provide an orthonormal basis for the space.

42. Let $X$ and $Y$ be measure spaces, and $f \in L^2(X \times Y)$. Let $[u_i]$ be an orthonormal basis
for $L^2(X)$. Prove that for suitable $v_i \in L^2(Y)$, we have $f(x, y) = \sum_i u_i(x)v_i(y)$.

2.3 Linear Functionals and Operators

Recall from Section 1.5, page 24, that a linear functional on a vector
space $X$ is a mapping $\phi$ from $X$ into the scalar field such that for vectors $x, y$
and scalars $a, b$,
$$\phi(ax + by) = a\phi(x) + b\phi(y)$$
If the space $X$ has a norm, and if
$$(1)\qquad \sup_{\|x\|=1} |\phi(x)| < \infty$$
we say that $\phi$ is bounded, and we denote by $\|\phi\|$ the supremum in the inequality
(1). (Boundedness is equivalent to continuity, by Theorem 2 on page 25.)
The bounded linear functionals on a Hilbert space have a very simple form,
as revealed in the following important result.

Theorem 1. Riesz Representation Theorem. Every continuous
linear functional defined on a Hilbert space is of the form $x \mapsto (x, v)$
for an appropriate vector $v$ that is uniquely determined by the given
functional.

Proof. Let $X$ be the Hilbert space, and $\phi$ a continuous linear functional. De-
fine $Y = \{x \in X : \phi(x) = 0\}$. (This is the null space or kernel of $\phi$.) If $Y = X$,
then $\phi(x) = 0$ for all $x$ and $\phi(x) = (x, 0)$. If $Y \neq X$, then let $0 \neq u \in Y^\perp$. (Use
Theorem 4 in Section 2.1, page 65.) We can assume that $\phi(u) = 1$. Observe
that $X = Y \oplus \mathbb{C}u$, because $x = x - \phi(x)u + \phi(x)u$, and $x - \phi(x)u \in Y$. Define
$v = u/\|u\|^2$. Then
$$(x, v) = (x - \phi(x)u, v) + (\phi(x)u, v) = \phi(x)(u, v) = \phi(x)(u, u)/\|u\|^2 = \phi(x) \qquad ∎$$

Example 1. Let $X$ be a finite-dimensional Hilbert space with a basis
$[u_1, u_2, \dots, u_n]$, not necessarily orthonormal. Each point $x$ of $X$ can be rep-
resented uniquely in the form $x = \sum_j \lambda_j(x)u_j$, and the $\lambda_j$ are continuous linear
functionals. (Refer to Corollary 2 in Section 1.5, page 26.) Hence by Theorem 1
there exist points $v_j \in X$ such that
$$x = \sum_{j=1}^{n} (x, v_j)u_j \qquad x \in X$$
Since $u_i = \sum_{j=1}^{n} (u_i, v_j)u_j$, we must have $(u_i, v_j) = \delta_{ij}$. In this situation, we say
that the two sets $[u_1, u_2, \dots, u_n]$ and $[v_1, v_2, \dots, v_n]$ are mutually biorthogonal
or that they form a biorthogonal pair. See [Brez]. ∎
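In coordinates, the vectors $v_j$ of Example 1 can be found by inverting the Gram matrix of the basis; the sketch below (illustrative, not from the text) does this for a small non-orthonormal basis of $\mathbb{R}^2$ and checks the biorthogonality relation $(u_i, v_j) = \delta_{ij}$.

```python
import numpy as np

U = np.array([[1.0, 1.0],     # columns u_1, u_2: a basis, not orthonormal
              [0.0, 1.0]])
G = U.T @ U                    # Gram matrix of the basis
V = U @ np.linalg.inv(G)       # columns v_1, v_2 of the dual (biorthogonal) set

print(np.round(U.T @ V, 10))   # identity: (u_i, v_j) = delta_ij

x = np.array([3.0, 5.0])
coeffs = V.T @ x               # lambda_j(x) = (x, v_j)
print(U @ coeffs)              # reconstructs x = sum_j (x, v_j) u_j
```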
Before reading further about linear operators on a Hilbert space, the reader
may wish to review Section 1.5 (pages 24-30) concerning the theory of linear
transformations acting between general normed linear spaces.
Example 2. The orthogonal projection P of a Hilbert space X onto a closed
subspace Y is a bounded linear operator from X into X. Theorem 7 in Sec-
tion 2.2 (page 74) indicates that P has a number of endearing properties. For
example, Ilpll= 1. •
Example 3. It is easy to create bounded linear operators on a Hilbert space $X$.
Take any orthonormal system $[u_i]$ (it may be finite, countable, or uncountable),
and define $Ax = \sum_i\sum_j a_{ij}(x, u_j)u_i$. If the coefficients $a_{ij}$ have the property
$\sum_i\sum_j |a_{ij}|^2 < \infty$, then $A$ will be continuous. ∎

Theorem 2. Existence of Adjoints. If $A$ is a bounded linear
operator on a Hilbert space $X$ (thus $A : X \to X$), then there is a
uniquely defined bounded linear operator $A^*$ such that
$$(Ax, y) = (x, A^*y) \qquad (x, y \in X)$$
Furthermore, $\|A^*\| = \|A\|$.


Proof. For each fixed $y$, the mapping $x \mapsto (Ax, y)$ is a bounded linear func-
tional on $X$:
$$(A(\lambda x + \mu z), y) = (\lambda Ax + \mu Az, y) = \lambda(Ax, y) + \mu(Az, y)$$
$$|(Ax, y)| \leq \|Ax\|\,\|y\| \leq \|A\|\,\|x\|\,\|y\|$$
Hence by the Riesz Representation Theorem (Theorem 1 above) there is a unique
vector $v$ such that $(Ax, y) = (x, v)$. Since $v$ depends on $A$ and $y$, we are at liberty
to denote it by $A^*y$. It remains to be seen whether the mapping $A^*$ thus defined
is linear and bounded. We ask whether
$$A^*(\lambda y + \mu z) = \lambda A^*y + \mu A^*z$$
By the Lemma in Section 2.1, page 63, it would suffice to prove that for all $x$,
$$(x, A^*(\lambda y + \mu z)) = (x, \lambda A^*y + \mu A^*z)$$
For this it will be sufficient to prove
$$(x, A^*(\lambda y + \mu z)) = \bar\lambda(x, A^*y) + \bar\mu(x, A^*z)$$
By the definition of $A^*$, this equation can be transformed to
$$(Ax, \lambda y + \mu z) = \bar\lambda(Ax, y) + \bar\mu(Ax, z)$$
This we recognize as a correct equation, and the steps we took can be reversed.
For the boundedness of $A^*$ we use the lemma in Section 2.1 (page 63) and
Problem 15 of this section (page 90) to write
$$\|A^*\| = \sup_{\|y\|=1}\|A^*y\| = \sup_{\|y\|=1}\sup_{\|x\|=1} |(x, A^*y)| = \sup_{\|x\|=1}\sup_{\|y\|=1} |(Ax, y)| = \sup_{\|x\|=1}\|Ax\| = \|A\|$$
The uniqueness of $A^*$ is left as a problem. (Problem 11, page 89) ∎



The operator $A^*$ described in Theorem 2 is called the adjoint of $A$. For
an operator $A$ on a Banach space $X$, $A^*$ is defined on $X^*$ by the equation
$A^*\phi = \phi \circ A$. If $X$ is a Hilbert space, $X^*$ can be identified with $X$ by the Riesz
Theorem: $\phi(x) = (x, y)$. Then $(A^*\phi)(x) = (\phi\circ A)(x) = \phi(Ax) = (Ax, y)$. Thus,
the Hilbert space adjoint is almost the same, and no shame attaches to this
innocent blurring of the distinction.
Example 4. Let an operator $T$ on $L^2(S)$ be defined by the equation
$$(Tx)(s) = \int_S k(s,t)x(t)\,dt$$
Here, $S$ can be any measure space, as in Example 5, page 64. Assume that the
kernel of this integral operator satisfies the inequality
$$\int_S\!\int_S |k(s,t)|^2\,dt\,ds < \infty$$
Then $T$ is bounded, and its adjoint is an integral operator of the same type,
whose kernel is $(s, t) \mapsto \overline{k(t, s)}$. Such operators have other attractive proper-
ties. (See Theorem 5, below.) They are special cases of Hilbert-Schmidt
operators, defined in Section 2.4, page 98. ∎
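A finite discretization makes the adjoint formula tangible: sampling the kernel on a grid turns $T$ into a matrix, and the adjoint kernel $\overline{k(t,s)}$ into the conjugate transpose. The sketch below (an illustration under these discretization assumptions, not from the text) checks $(Tx, y) \approx (x, T^*y)$ numerically.

```python
import numpy as np

n = 300
s = np.linspace(0.0, 1.0, n)
ds = s[1] - s[0]
K = np.exp(1j * np.outer(s, s))          # sampled kernel k(s,t) = e^{i s t} on [0,1]^2

def apply_T(x):      # (Tx)(s) = integral k(s,t) x(t) dt, via a Riemann sum
    return K @ x * ds

def apply_Tstar(y):  # adjoint kernel conj(k(t,s)): conjugate transpose of the matrix
    return K.conj().T @ y * ds

inner = lambda u, v: np.sum(u * np.conj(v)) * ds
x = np.sin(np.pi * s) + 0j
y = s**2 + 0j
print(np.allclose(inner(apply_T(x), y), inner(x, apply_Tstar(y))))   # True
```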
If $A$ is a bounded linear operator such that $A = A^*$, we say that $A$ is self-
adjoint. A related concept is that of being Hermitian. A linear map $A$ on
an inner-product space is said to be Hermitian if $(Ax, y) = (x, Ay)$ for all $x$
and $y$. This definition does not presuppose the boundedness of $A$. However,
the following theorem indicates that the Hermitian property (together with the
completeness of the space) implies self-adjointness.

Theorem 3. If a linear map A on a Hilbert space satisfies (Ax, y) =


(x, Ay) for all x and y, then A is bounded and self-adjoint.

Proof. For each $y$ in the unit ball, define a functional $\phi_y$ by writing $\phi_y(x) =
(Ax, y)$. It is obvious that $\phi_y$ is linear, and we see also that it is bounded, since
by the Cauchy-Schwarz inequality
$$|\phi_y(x)| = |(Ax, y)| = |(x, Ay)| \leq \|x\|\,\|Ay\|$$
Notice also that by the Lemma in Section 2.1, page 63,
$$\sup_{\|y\|\leq 1} |\phi_y(x)| = \sup_{\|y\|\leq 1} |(Ax, y)| = \|Ax\|$$
By the Uniform Boundedness Principle (Section 1.7, page 42),
$$\infty > \sup_{\|y\|\leq 1} \|\phi_y\| = \sup_{\|y\|\leq 1}\sup_{\|x\|\leq 1} |\phi_y(x)| = \sup_{\|x\|\leq 1}\sup_{\|y\|\leq 1} |(Ax, y)| = \sup_{\|x\|\leq 1}\|Ax\| = \|A\|$$
The equation $(Ax, y) = (x, Ay) = (x, A^*y)$, together with the uniqueness of the
adjoint, shows that $A = A^*$. ∎

With any bounded linear transformation $A$ on an inner-product space we
can associate a quadratic form $x \mapsto (Ax, x)$. We define
$$|||A||| = \sup_{\|x\|=1} |(Ax, x)|$$

Lemma 1. Generalized Cauchy-Schwarz Inequality. If $A$ is
a Hermitian operator, then
$$|(Ax, y)| \leq |||A|||\,\|x\|\,\|y\|$$
Proof. Consider these two elementary equations:
$$(A(x + y), x + y) = (Ax,x) + (Ax,y) + (Ay,x) + (Ay,y)$$
$$-(A(x - y), x - y) = -(Ax,x) + (Ax,y) + (Ay,x) - (Ay,y)$$
By adding these equations and using the Hermitian property of $A$, we get
$$(1)\qquad (A(x + y), x + y) - (A(x - y), x - y) = 4\Re(Ax, y)$$
From the definition of $|||A|||$ and a homogeneity argument, we obtain
$$(2)\qquad |(Ax, x)| \leq |||A|||\,\|x\|^2 \qquad (x \in X)$$
Using Equation (1), then (2), and finally the Parallelogram Law, we obtain
$$|4\Re(Ax, y)| = |(A(x + y), x + y) - (A(x - y), x - y)|$$
$$\leq |(A(x + y), x + y)| + |(A(x - y), x - y)|$$
$$\leq |||A|||\,\|x + y\|^2 + |||A|||\,\|x - y\|^2 = |||A|||(2\|x\|^2 + 2\|y\|^2)$$
Letting $\|x\| = \|y\| = 1$ in the preceding equation establishes that
$$|\Re(Ax, y)| \leq |||A||| \qquad (\|x\| = \|y\| = 1)$$
For a fixed pair $x, y$ we can select a complex number $\theta$ such that $|\theta| = 1$ and
$\theta(Ax, y) = |(Ax, y)|$. Then
$$|(Ax, y)| = |(A(\theta x), y)| = |\Re(A(\theta x), y)| \leq |||A|||$$
By homogeneity, this suffices to prove the lemma. ∎

Lemma 2. If $A$ is Hermitian, then $\|A\| = |||A|||$.
Proof. By the Cauchy-Schwarz inequality,
$$|||A||| = \sup_{\|u\|=1} |(Au, u)| \leq \sup_{\|u\|=1} \|Au\|\,\|u\| = \sup_{\|u\|=1}\|Au\| = \|A\|$$
For the reverse inequality, use the preceding lemma to write
$$\|A\| = \sup_{\|x\|=1}\|Ax\| = \sup_{\|x\|=1}\sup_{\|y\|=1} |(Ax, y)| \leq \sup_{\|x\|=1}\sup_{\|y\|=1} |||A|||\,\|x\|\,\|y\| = |||A||| \qquad ∎$$
Definition. An operator A, mapping one normed linear space into another, is
said to be compact if it maps the unit ball of the domain to a set whose closure
is compact.
When we recall that a continuous operator is one that maps the unit ball to
a bounded set, it becomes evident that compactness of an operator is stronger
than continuity. It is certainly not equivalent if the spaces involved are infinite
dimensional. For example, the identity map on an infinite-dimensional space is
continuous but not compact.

Lemma 3. Every continuous linear operator (from one normed
linear space into another) having finite-dimensional range is compact.

Proof. Let $A$ be such an operator, and let $B$ be the unit ball. Since $A$ is
continuous, $A(B)$ is a bounded set in a finite-dimensional subspace, and its
closure is compact, by Theorem 1 in Section 1.4, page 20. ∎

Theorem 4. If $X$ and $Y$ are Banach spaces, then the set of compact
operators in $\mathcal{L}(X, Y)$ is closed.
Proof. Let $[A_n]$ be a sequence of compact operators from $X$ to $Y$. Suppose
that $\|A_n - A\| \to 0$. To prove that $A$ is compact, let $[x_i]$ be a sequence in the
unit ball of $X$. We wish to find a convergent subsequence in $[Ax_i]$. Since $A_1$
is compact, there is an increasing sequence $I_1 \subset \mathbb{N}$ such that $[A_1x_i : i \in I_1]$
converges. Since $A_2$ is compact, there is an increasing sequence $I_2 \subset I_1$ such
that $[A_2x_i : i \in I_2]$ converges. Note that $[A_1x_i : i \in I_2]$ converges. Continue
this process, and use Cantor's diagonal process. Thus we let $I$ be the sequence
whose $i$th member is the $i$th member of $I_i$ for $i = 1, 2, \dots$ By the construction,
$[A_nx_i : i \in I]$ converges. To prove that $[Ax_i : i \in I]$ converges, it suffices to
show that it is a Cauchy sequence. This follows from the inequality
$$\|Ax_i - Ax_j\| \leq \|Ax_i - A_nx_i\| + \|A_nx_i - A_nx_j\| + \|A_nx_j - Ax_j\|$$
$$\leq \|A - A_n\|\,\|x_i\| + \|A_nx_i - A_nx_j\| + \|A_n - A\|\,\|x_j\| \qquad ∎$$
Theorem 5. Let $S$ be any measure space. In the space $L^2(S)$,
consider the integral operator $T$ defined by the equation
$$(Tx)(s) = \int_S k(s,t)x(t)\,dt$$
If the kernel $k$ belongs to the space $L^2(S \times S)$, then $T$ is a compact
operator from $L^2(S)$ into $L^2(S)$.
Proof. Select an orthonormal basis $[u_n]$ for $L^2(S)$, and define $a_{nm} =
(Tu_m, u_n)$. This is the "matrix" for $T$ relative to the chosen basis. In fact,
we have for any $x$ in $L^2(S)$, $x = \sum_n (x, u_n)u_n$, whence
$$(3)\qquad Tx = \sum_n (Tx, u_n)u_n = \sum_n \Big(\sum_m (x, u_m)Tu_m, u_n\Big)u_n = \sum_n\Big[\sum_m a_{nm}(x, u_m)\Big]u_n$$
Using the notation $k_s$ for the univariate function $t \mapsto k(s,t)$, we have
$$\int\!\!\int |k(s,t)|^2\,dt\,ds = \int \sum_n\Big|\int_S k_s(t)u_n(t)\,dt\Big|^2\,ds = \int \sum_n |(Tu_n)(s)|^2\,ds$$
$$= \sum_n \int |(Tu_n)(s)|^2\,ds = \sum_n \|Tu_n\|^2 = \sum_n\sum_m |(Tu_n, u_m)|^2 = \sum_n\sum_m |a_{mn}|^2$$
$$= \sum_{m=1}^{\infty} \beta_m \qquad\text{where}\quad \beta_m = \sum_n |a_{mn}|^2$$
Equation (3) suggests truncating the series that defines $T$ in order to obtain
operators of finite rank that approximate $T$. Hence, we put
$$T_nx = \sum_{i=1}^{n}\sum_{j=1}^{\infty} a_{ij}(x, u_j)u_i$$
By subtraction,
$$Tx - T_nx = \sum_{i>n}\sum_{j=1}^{\infty} a_{ij}(x, u_j)u_i$$
whence, by the Cauchy-Schwarz inequality (in $\ell^2$!) and the Bessel inequality,
$$\|Tx - T_nx\|^2 = \sum_{i>n}\Big|\sum_{j=1}^{\infty} a_{ij}(x, u_j)\Big|^2 \leq \sum_{i>n}\sum_{j=1}^{\infty} |a_{ij}|^2 \sum_{k=1}^{\infty} |(x, u_k)|^2 \leq \|x\|^2\sum_{i>n}\sum_{j=1}^{\infty} |a_{ij}|^2 = \|x\|^2\sum_{i>n}\beta_i$$
This shows that $\|T - T_n\| \to 0$. Since each $T_n$ is compact, so is the limit $T$, by
Theorem 4. ∎
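The truncation argument can be mimicked in finite dimensions: for a matrix whose entries are square-summable in the sense above, zeroing out all rows past $n$ gives a finite-rank operator $T_n$, and the operator norm $\|T - T_n\|$ is controlled by the tail $\sum_{i>n}\beta_i$. The sketch below (illustrative, with an arbitrarily chosen kernel matrix) compares the two quantities.

```python
import numpy as np

N = 200
i, j = np.indices((N, N)) + 1
A = 1.0 / (i**2 + j**2)                      # a_ij with sum of |a_ij|^2 finite
beta = np.sum(np.abs(A)**2, axis=1)          # beta_i = sum_j |a_ij|^2

for n in (5, 20, 80):
    Tn = A.copy()
    Tn[n:, :] = 0.0                          # keep only the first n rows: finite rank
    op_norm = np.linalg.norm(A - Tn, 2)      # spectral norm of T - T_n
    tail = np.sqrt(np.sum(beta[n:]))         # bound from the proof: ||T - T_n||^2 <= sum_{i>n} beta_i
    print(n, op_norm, tail, op_norm <= tail + 1e-12)
```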

Theorem 6. The null space of a bounded linear operator on a


Hilbert space is the orthogonal complement of the range of its adjoint.

Proof. Let $A$ be the operator and $N(A)$ its null space. Denote the range of
$A^*$ by $R(A^*)$. If $x \in N(A)$ and $z$ is arbitrary, then
$$(x, A^*z) = (Ax, z) = (0, z) = 0$$
Hence $x \in R(A^*)^\perp$ and $N(A) \subset R(A^*)^\perp$. Conversely, if $x \in R(A^*)^\perp$, then
$$(Ax, Ax) = (x, A^*(Ax)) = 0$$
whence $Ax = 0$, $x \in N(A)$, and $R(A^*)^\perp \subset N(A)$. ∎

Corollary. A Hermitian operator whose range is dense is injective
(one-to-one).

A sequence $[x_n]$ in a Hilbert space is said to converge weakly to a point
$x$ if, for all $y$,
$$(x_n, y) \to (x, y)$$
A convenient notation for this is $x_n \rightharpoonup x$. Notice that this definition is in com-
plete harmony with the definition of weak convergence in an arbitrary normed
linear space, as in Chapter 1, Section 9, (page 53). Of course, the Riesz Repre-
sentation Theorem, proved earlier in this section (page 81), is needed to connect
the two concepts.

Example 5. If $[u_n]$ is an orthonormal sequence, then $u_n \rightharpoonup 0$. This follows
from Bessel's inequality,
$$\sum_n |(u_n, y)|^2 \leq \|y\|^2$$
which shows that $(u_n, y) \to 0$ for all $y$. ∎


We say that a sequence [xn] in an inner-product space is a weakly Cauchy

sequence if, for each y in the space, the sequence [(xn' y)] has the Cauchy prop-
erty in C.

Lemma 4. A weakly Cauchy sequence in a Hilbert space is weakly


convergent to a point in the Hilbert space.

Proof. Let $[x_n]$ be such a sequence. For each $y$, the sequence $[(y, x_n)]$ has
the Cauchy property, and is therefore bounded in $\mathbb{C}$. The linear functionals $\phi_n$
defined by $\phi_n(y) = (y, x_n)$ have the property
$$\sup_n |\phi_n(y)| < \infty \qquad (y \in X)$$
By the Uniform Boundedness Principle (Section 1.7, page 42), we infer that
$\|\phi_n\| \leq M$ for some constant $M$. Since
$$\|x_n\| = \sup_{\|y\|=1} |(y, x_n)| = \|\phi_n\| \leq M$$
we conclude that $[x_n]$ is bounded. Put $\phi(y) = \lim_n(y, x_n)$. Then $\phi$ is a bounded
linear functional on $X$. By the Riesz Representation Theorem, there is an $x$ for
which $\phi(y) = (y, x)$. Hence $\lim_n(y, x_n) = (y, x)$ and $x_n \rightharpoonup x$. ∎
Many problems in applied mathematics can be cast as solving a linear equa-
tion, Ax = b. For our discussion here, A can be any linear operator on a Hilbert
space, X, and b E X. Does the equation have a solution, and if it does, can
we calculate it? The first question is the same as asking whether b is in the
range of A. Here is a basic theorem, called the "Fredholm Alternative." It is
the Hilbert space version of the Closed Range Theorem in Section 1.8, page 50.
Other theorems called the Fredholm Alternative occur in Section 7.5.

Theorem 7. Let A be a continuous linear operator on a Hilbert


space. If the range of A is closed, then it is the orthogonal complement
of the null space of A *; in symbols,

$$R(A) = [N(A^*)]^\perp$$

Proof. This is similar to Theorem 6, and is therefore left to the problems.


(Half of the theorem does not require the closed range.) •

Problems 2.3

1. Let $X$ be a Hilbert space and let $A : X \to X$ be a bounded linear operator. Let $[u_i : i \in I]$
be any orthonormal basis for $X$. (The index set may be uncountable.) Show that there
exists a "matrix" (a function $a$ on $I \times I$) such that for all $x$, $Ax = \sum_i\sum_j a_{ij}(x, u_j)u_i$.

2. Adopt the hypotheses of Problem 1. Show that there exist vectors Vi such that Ax =
Li(x,Vi)Ui' Show also that the vectors Vi can be chosen so that IIvili ::::; IIAII.

3. Let $[u_n]$ be an orthonormal sequence and let $Ax = \sum \lambda_n(x, u_n)u_n$, where $[\lambda_n]$ is
bounded. Prove that $A = A^*$ if and only if $[\lambda_n] \subset \mathbb{R}$.
4. Prove that a bounded linear transformation on a Hilbert space is completely determined
by its values on an orthonormal basis. To what extent can these images be arbitrary?
5. Let X be a complex Hilbert space. Let A : X -t X be bounded and linear. Prove that
if Ax ~ x for all x, then A = O. Show that this is not true for real Hilbert spaces.
6. Let $A$ be an operator on a Hilbert space having the form $Ax = \sum \lambda_n(x, u_n)u_n$, where
$[u_n]$ is an orthonormal sequence and $[\lambda_n]$ is a bounded sequence in $\mathbb{C}$. If $f$ is analytic
on a domain containing $[\lambda_n]$, then we define $f(A)(x)$ to be $\sum f(\lambda_n)(x, u_n)u_n$. For the
function $f(z) = e^z$ prove that $f(A + B) = f(A)f(B)$, provided that $AB = BA$.
7. Prove, without using the Hahn-Banach theorem, that a bounded linear functional defined
on a closed subspace of a Hilbert space has an extension (of the same norm) to the whole
Hilbert space.
8. Let Y be a subspace (possibly not closed) in a Hilbert space X. Let L be a linear map
from X to Y such that x - Lx ~ Y for all x EX. Prove that L is continuous and
idempotent. Prove that Y is closed and that L is the orthogonal projection of X onto Y.
9. Let $A$ be a bounded linear operator mapping a Banach space $X$ into $X$. Prove that if
$$\sum_{n=0}^{\infty} |c_n|\,\|A\|^n < \infty$$
then $\sum_{n=0}^{\infty} c_nA^n$ is also a bounded linear operator from $X$ into $X$.


10. An operator A whose adjoint has dense range is injective.
11. Prove the uniqueness of $A^*$ and that $A^{**} = A$.
12. Prove the continuity assertion in Example 3.
13. Let $[e_n]$ be an orthonormal sequence, $[\lambda_n]$ a bounded sequence in $\mathbb{C}$, and $Ax =
\sum \lambda_n(x, e_n)e_n$. Show that the operators defined by the partial sums $\sum_1^n \lambda_k(x, e_k)e_k$
need not converge (in operator norm) to $A$. Find the exact conditions on $[\lambda_n]$ for which
this operator convergence is valid. Prove that if the partial sum operators converge to
$A$, then $A$ is compact.
14. Let $X$ be a separable Hilbert space, and $[u_n]$ an orthonormal basis for $X$. Define $A :
X \to X$ by
$$Ax = \sum_{n=1}^{\infty} \frac{1}{n}(x, u_n)u_n$$
Notice that $A$ is a compact Hermitian operator. Prove that the range of $A$ is the set
$$\Big\{y \in X : \sum_{n=1}^{\infty} |n(y, u_n)|^2 < \infty\Big\}$$
Prove that the range of $A$ is not closed. Hint: Consider the vector $v = \sum_{n=1}^{\infty} u_n/n$.

15. Let $X$ and $Y$ be two arbitrary sets. For a function $f : X \times Y \to \mathbb{R}$, prove that
$$\sup_{x\in X}\sup_{y\in Y} f(x, y) = \sup_{y\in Y}\sup_{x\in X} f(x, y)$$
Show that this equation is not generally true if we replace $\sup_{x\in X}$ by $\inf_{x\in X}$ on both
sides.
16. Prove that if An --t A, then A~ --t A*. (This is continuity of the map A t-t A*.)
17. Prove that the range of a Hermitian operator is orthogonal to its kernel. Can this
phenomenon occur for an operator that is not Hermitian?
18. Prove that for a Hermitian operator A, the function x t-t dist(x, R(A)) is a norm on
ker(A). Here R(A) denotes the range of A.
19. Let A be a bounded linear operator on a Hilbert space. Define [x, yJ = (Ax, y). Which
properties of an inner product does [, J have? What takes the place of the Cauchy-
Schwarz inequality? What additional assumptions must be made in order that [, J be an
inner product?
20. Give an example of a nontrivial operator A on a real Hilbert space such that Ax 1. x
for all x. You should be able to find an example in JR 2 . Can you do it with a Hermitian
operator? (Cf. Problem 5.)
21. Let [unJ be an orthonormal sequence in an inner product space. Let [AnJ be a sequence
of scalars such that the series L AnUn converges. Prove that L IAnl2 < 00.
22. Let [unJ be an orthonormal sequence in a Hilbert space. Let Ax = L~I On (x, Un)un ,
where [onJ is a bounded sequence in IC. Prove that A is continuous. Prove that if [OnJ
is a bounded sequence in JR, then A is Hermitian. Prove that if [onJ is a sequence in C
such that L IOnl 2 < 00, then A is compact. Suggestion: Use Lemma 3 and Theorem 4.

23. If [unJ is an orthonormal sequence and Ax =L An (x, un)un where An E C and An It 0,


then A is not compact.
24. Let $v$ be a point in a Hilbert space $X$. Define $\phi(x) = (x, v)$ for all $x \in X$. Show that the
mapping $T$ such that $Tv = \phi$ is one-to-one, onto $X^*$, norm-preserving, and conjugate
linear: $T(\alpha_1v_1 + \alpha_2v_2) = \bar\alpha_1Tv_1 + \bar\alpha_2Tv_2$.
25. Prove that if X is an infinite--dimensional Hilbert space, then a compact operator on X
cannot be invertible.
26. Let X be a Hilbert space. Let A : X --t X be linear and let B : X --t X be any map such
that (Ax, y) = (x, By) for all x and y. Prove that A is continuous, that B is linear, and
that B is continuous.
27. Adopt the hypotheses of Problem 3, and prove that IIAII ~ sUPn IAnl.

28. Illustrate the Fredholm Alternative with this example. In a real Hilbert space, let $A$ be
defined by the equation $Ax = x - \lambda(v, x)w$, where $v$ and $w$ are prescribed elements of
the space, and $(v, w) \neq 0$. The scalar $\lambda$ is arbitrary. What are $A^*$, $N(A^*)$, $R(A)$? (The
answers depend on the value of $\lambda$.)
29. Refer to Theorem 5, and assume that $S = [0,1]$. Prove that if the kernel $k$ is continuous,
then $Tx$ is continuous, for each $x$ in $L^2(S)$.
30. Let $A$ be a bounded linear operator on a Hilbert space, and let $[u_n]$ and $[v_n]$ be two
orthonormal bases for the space. Prove that if $\sum_n\sum_m |(Au_n, v_m)|^2 < \infty$, then $A$ is
compact. Suggestion: Base the proof on Lemma 3 and Theorem 4.
31. Define the operator $T$ as in Theorem 5, page 86, and assume that
$$c = \int\!\!\int |k(s,t)|^2\,ds\,dt < \infty$$
Prove that if $[u_n]$ is an orthonormal sequence and if $Tu_n = \lambda_nu_n$ for each $n$, then
$\sum_n |\lambda_n|^2 \leq c$.
32. Prove Theorem 7.
33. Prove the assertion made in Example 3.

2.4 Spectral Theory

In this section we shall study the structure of linear operators on a Hilbert


space. Ideally, we would dissect an operator into a sum of simple operators or
perhaps an infinite sum of simple operators. In the latter scenario, the terms of
the series should decrease in magnitude in order to achieve convergence and to
make feasible the truncation of the series for actual computation.
What qualifies as a "simple" operator? Certainly, we would call this one
"simple": $Qx = (x, u)v$, where $u$ and $v$ are two prescribed elements of the space.
The range of $Q$ is the subspace generated by the single vector $v$. Thus, $Q$ is an
operator of rank 1 (rank = dimension of range). We may assume that $\|v\| = 1$,
since we can compensate for this by redefining $u$. Every operator of rank one is
of this form.
Another example of a simple operator (again of rank 1) is $Tx = a(x, u)u$.
Notice that in defining the operator $T$ there is no loss of generality in assuming
that $\|u\| = 1$, because one can adjust the scalar $a$ to compensate. Next, having
adopted this slight simplification, we notice that $T$ has the property $Tu = au$.
Thus, $a$ is an eigenvalue of $T$ and $u$ is an accompanying eigenvector. From such
primitive building blocks we can construct very general operators, such as
$$Lx = \sum_{j=1}^{\infty} a_j(x, u_j)u_j$$
This goal, of representing a given operator $L$ in the form shown, is beautifully
achieved when the operator $L$ is compact and Hermitian. (These terms are
defined later.) We even have the serendipitous bonus of orthonormality in the
sequence $[u_n]$. Each $u_n$ will then be an eigenvector, since
$$Lu_n = \sum_{j=1}^{\infty} a_j(u_n, u_j)u_j = a_nu_n$$

Definition. An eigenvalue of an operator $A$ is a complex number $\lambda$ such that
$A - \lambda I$ has a nontrivial null space. The set of all eigenvalues of $A$ is denoted
here by $\Lambda(A)$. (Caution: $\Lambda(A)$ is defined differently in many books.)
If $X$ is a finite-dimensional space, and if $A : X \to X$ is a linear map, then
$A$ certainly has some eigenvalues. To see that this is so, introduce a basis for $X$
so that $A$ can be identified with a square matrix. The following conditions on a
complex number $\lambda$ are then equivalent:
(i) $A - \lambda I$ has a nontrivial null space

(ii) $A - \lambda I$ is singular
(iii) $\det(A - \lambda I) = 0$ (det is the determinant function)
Since the map $\lambda \mapsto \det(A - \lambda I)$ is a polynomial of degree $n$ (if $A$ is an $n \times n$
matrix), we see that there exist exactly $n$ eigenvalues, it being understood that
each is counted a number of times equal to its multiplicity as a root of the
polynomial. This argument obviously fails for an infinite-dimensional space.
Indeed, an operator with no eigenvalues is readily at hand in Problem 1.
If $\lambda$ is an eigenvalue of an operator $A$, then any nontrivial solution of the
equation $Ax = \lambda x$ is called an eigenvector of $A$ belonging to the eigenvalue $\lambda$.

Lemma 1. If $A$ is a Hermitian operator on an inner-product space,
then:
(1) All eigenvalues of $A$ are real.
(2) Any two eigenvectors of $A$ belonging to different eigenvalues
are orthogonal to each other.
(3) The quadratic form $x \mapsto (Ax, x)$ is real-valued.

Proof. Let $Ax = \lambda x$, $Ay = \mu y$, $x \neq 0$, $y \neq 0$, $\lambda \neq \mu$. Then
$$\lambda(x, x) = (\lambda x, x) = (Ax, x) = (x, Ax) = (x, \lambda x) = \bar\lambda(x, x)$$
Thus $\lambda$ is real. To see that $(x, y) = 0$, use the fact that $\lambda$ and $\mu$ are real and
write
$$(\lambda - \mu)(x, y) = (\lambda x, y) - (x, \mu y) = (Ax, y) - (x, Ay) = 0$$
For (3), note that $(Ax, x) = (x, Ax) = \overline{(Ax, x)}$. ∎
Lemma 2. A compact Hermitian operator $A$ on an inner-product
space has at least one eigenvalue $\lambda$ such that $|\lambda| = \|A\|$.
Proof. Since the case $A = 0$ is trivial, we assume that $A \neq 0$. Put $|||A||| =
\sup\{|(Ax, x)| : \|x\| = 1\}$. By Lemma 2 in Section 2.3 (page 85), $|||A||| = \|A\|$.
Take a sequence of points $x_n$ such that $\|x_n\| = 1$ and $\lim |(Ax_n, x_n)| = |||A|||$.
Since $A$ is compact, there is a sequence of integers $n_1, n_2, \dots$ such that $\lim_i Ax_{n_i}$
exists. Put $y = \lim_i Ax_{n_i}$. Then $y \neq 0$ because $|(Ax_{n_i}, x_{n_i})| \to |||A||| \neq 0$. By
taking a further subsequence we can assume that the limit $\lambda = \lim_i (Ax_{n_i}, x_{n_i})$
exists. By Lemma 1, $\lambda$ is real. Then
$$\|Ax_{n_i} - \lambda x_{n_i}\|^2 = \|Ax_{n_i}\|^2 - \lambda(Ax_{n_i}, x_{n_i}) - \lambda(x_{n_i}, Ax_{n_i}) + \lambda^2\|x_{n_i}\|^2$$
Hence
$$0 \leq \lim_i \|Ax_{n_i} - \lambda x_{n_i}\|^2 = \|y\|^2 - \lambda^2 - \lambda^2 + \lambda^2 = \|y\|^2 - \lambda^2$$
This proves that $|\lambda| \leq \|y\|$. On the other hand, from the above work we also
have
$$\|y\| = \lim \|Ax_{n_i}\| \leq \lim \|A\|\,\|x_{n_i}\| = \|A\| = |\lambda|$$
Thus our previous inequality shows that $0 \leq \lim \|Ax_{n_i} - \lambda x_{n_i}\| \leq 0$, and that
$$\|y - \lambda x_{n_i}\| \leq \|y - Ax_{n_i}\| + \|Ax_{n_i} - \lambda x_{n_i}\| \to 0$$
Thus $x_{n_i} \to y/\lambda$. Finally, $Ay = A(\lim \lambda x_{n_i}) = \lambda \lim Ax_{n_i} = \lambda y$, so $y$ is in the
null space of $A - \lambda I$, and $\lambda$ is an eigenvalue. ∎

Theorem 1. The Spectral Theorem. If A is a compact Hermitian operator defined on an inner-product space, then A is of the form Ax = Σ_k λ_k (x, e_k) e_k for an appropriate orthonormal sequence {e_k} (possibly finite) and appropriate real numbers λ_k satisfying lim λ_k = 0. Furthermore, the equations Ae_k = λ_k e_k hold.

Proof. If A = 0, the conclusion is trivial. If A ≠ 0, we let X_1 = X. Let λ_1 and e_1 be an eigenvalue and unit eigenvector determined by the preceding lemma. Thus, |λ_1| = ||A||. Let X_2 = {x : (x, e_1) = 0}. Then X_2 is a subspace of X_1, and A maps X_2 into itself, since (Ax, e_1) = (x, Ae_1) = (x, λ_1 e_1) = λ̄_1(x, e_1) = 0 for any x ∈ X_2. (Thus X_2 is an invariant subspace of A.) We consider the restriction of A to the inner-product space X_2, denoted by A|X_2. This operator is also compact and Hermitian. Also, ||A|X_2|| ≤ ||A||. If A|X_2 ≠ 0, then the preceding lemma produces λ_2 and e_2, where ||e_2|| = 1, |λ_2| = ||A|X_2|| ≤ |λ_1|, e_2 ⊥ e_1, and Ae_2 = λ_2 e_2. We continue this process. At the nth stage we have |λ_1| ≥ |λ_2| ≥ ... ≥ |λ_n| > 0, {e_1, ..., e_n} orthonormal, and Ae_k = λ_k e_k for k = 1, ..., n. We define X_{n+1} to be the orthogonal complement of the linear span of [e_1, ..., e_n]. If A|X_{n+1} = 0, the process stops. Then the range of A is spanned by e_1, ..., e_n. Indeed, for any x, the vector x − Σ_{k=1}^n (x, e_k) e_k is orthogonal to {e_1, ..., e_n}; hence it lies in X_{n+1}, and so A maps it to zero. In other words,

Ax = Σ_{k=1}^n (x, e_k) Ae_k = Σ_{k=1}^n λ_k (x, e_k) e_k

If A|X_{n+1} ≠ 0, we apply the preceding lemma to get λ_{n+1} and e_{n+1}. It remains to be proved that if the above process does not terminate, then lim λ_k = 0. Suppose on the contrary that |λ_n| ≥ ε > 0 for all n ∈ ℕ. Then e_n/λ_n is a bounded sequence, and by the compactness of A, the sequence A(e_n/λ_n) must contain a convergent subsequence. But this is not possible, since A(e_n/λ_n) = e_n and {e_n}, being orthonormal, satisfies ||e_n − e_m|| = √2. In the infinite case let y_n = x − Σ_{k=1}^n (x, e_k) e_k. Since y_n ⊥ Σ_{k=1}^n (x, e_k) e_k,

||x||² = ||y_n + Σ_{k=1}^n (x, e_k) e_k||² = ||y_n||² + Σ_{k=1}^n |(x, e_k)|² ≥ ||y_n||²

Since |λ_{n+1}| is the norm of A|X_{n+1} and y_n ∈ X_{n+1}, we have

||Ax − Σ_{k=1}^n λ_k (x, e_k) e_k|| = ||Ay_n|| ≤ |λ_{n+1}| ||y_n|| ≤ |λ_{n+1}| ||x|| → 0

which establishes the asserted representation. •
Remark. Every nonzero eigenvalue of A is in the sequence [λ_n].

Proof. Suppose Ax = λx, x ≠ 0, λ ≠ 0, λ ∉ {λ_n : n ∈ ℕ}. Then x ⊥ e_n for all n by Lemma 1. But then Ax = Σ λ_n (x, e_n) e_n = 0, a contradiction. •

Remark. Each nonzero eigenvalue λ of A occurs in the sequence [λ_n] repeated a number of times equal to dim{x : (A − λI)x = 0}. Each of these numbers is finite.

Proof. Since λ_n → 0, a nonzero eigenvalue λ can be repeated only a finite number of times in the sequence. If it is repeated p times, then the subspace {x : (A − λI)x = 0} contains an orthonormal set of p elements and so has dimension at least p. If the dimension were greater than p, there would exist x ≠ 0 such that Ax = λx and (x, e_n) = 0 for all n (again impossible). •
The next theorem gives an application of the spectral resolution of an operator, namely, a formula for inverting the operator A − λI when A is compact and λ is not an eigenvalue of A. (The Hermitian property is not assumed.)

Theorem 2. Let A be a compact operator (on an inner-product space) having spectral decomposition Ax = Σ λ_n (x, e_n) e_n. (We allow λ_n to be complex.) If 0 ≠ λ ∉ {λ_n}, then A − λI is invertible, and

(A − λI)^{-1} x = −λ^{-1} x + λ^{-1} Σ_n [λ_n / (λ_n − λ)] (x, e_n) e_n
Proof. If the series converges, then our formula is correct. Indeed, by the continuity of A − λI we have by straightforward calculation

(A − λI)Bx = B(A − λI)x = x

where Bx is defined by the right side of the equation in the statement of the theorem. In order to prove that the series converges, define the partial sums

v_n = Σ_{k=1}^n [(x, e_k) / (λ_k − λ)] e_k

The sequence [v_n] is bounded, because with an application of the Pythagorean law and Bessel's inequality we have

||v_n||² = Σ_{k=1}^n |(x, e_k)|² / |λ_k − λ|² ≤ sup_j (1/|λ_j − λ|²) Σ_{k=1}^n |(x, e_k)|² ≤ β ||x||²

Since A is compact, λ_n → 0, by Problem 15. Thus β < ∞. Also, the sequence [Av_n] contains a convergent subsequence. But [Av_n] is a Cauchy sequence, and a Cauchy sequence having a convergent subsequence is convergent (Problem 1.2.26, page 13). To see that [Av_n] is a Cauchy sequence, write

Av_n = Σ_{k=1}^n [λ_k (x, e_k) / (λ_k − λ)] e_k

and observe that for m > n

||Av_m − Av_n||² = Σ_{k=n+1}^m |λ_k / (λ_k − λ)|² |(x, e_k)|² ≤ γ Σ_{k=n+1}^m |(x, e_k)|²

where γ = sup_k |λ_k / (λ_k − λ)|² < ∞. •
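For a Hermitian matrix the formula of Theorem 2 can be checked directly. The sketch below is an added illustration (not from the text): it compares (A − λI)⁻¹x with the series expression, which in finite dimensions is a finite sum; the chosen scalar λ is assumed not to be an eigenvalue of the random matrix.

    import numpy as np

    rng = np.random.default_rng(1)
    B = rng.standard_normal((6, 6))
    A = (B + B.T) / 2                      # a Hermitian "operator" in finite dimensions
    lam_n, E = np.linalg.eigh(A)           # eigenvalues lambda_n, eigenvectors e_n

    lam = 0.37                             # assumed not an eigenvalue of A
    x = rng.standard_normal(6)

    lhs = np.linalg.solve(A - lam * np.eye(6), x)
    coeffs = E.T @ x                       # (x, e_n)
    rhs = -x / lam + (E @ (lam_n * coeffs / (lam_n - lam))) / lam

    print(np.allclose(lhs, rhs))           # True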
If an operator A is not necessarily compact but has a known spectral res-
olution (in the form of an orthonormal series), then certain conclusions can be
drawn, as illustrated in the next three theorems.

Theorem 3. Let A be an operator on an inner-product space having the form Ax = Σ_{n=1}^∞ λ_n (x, e_n) e_n, where {e_n} is an orthonormal sequence and [λ_n] is a bounded sequence of nonzero complex numbers. Let M be the linear span of {e_n : n ∈ ℕ}. Then M⊥ = ker(A).

Proof. The following are equivalent properties of a vector x:
(a) x ∈ ker(A)
(b) ||Ax||² = 0
(c) Σ |λ_n (x, e_n)|² = 0
(d) (x, e_n) = 0 for all n.

Theorem 4. Adopt the hypotheses of Theorem 3. The orthonormal set {e_n} is maximal if and only if ker(A) = 0.

Proof. By Theorem 3, ker(A) = 0 if and only if M⊥ = 0. (In these equations, 0 denotes the zero subspace.) The condition M⊥ = 0 is equivalent to the maximality of {e_n}. Here refer to Theorem 6 in Section 2.2, page 73, and observe that the equivalence of (a) and (b) in that theorem does not require the completeness of the space. •

Theorem 5. Let A be an operator on a Hilbert space such that Ax = Σ_{n=1}^∞ λ_n (x, e_n) e_n, where [e_n] is an orthonormal sequence and [λ_n] is a bounded sequence of nonzero complex numbers. If v is in the range of A, then one solution of the equation Ax = v is x = Σ_{n=1}^∞ λ_n^{-1} (v, e_n) e_n.

Proof. Since v is in the range of A, v = Az for some z. Hence

(v, e_m) = (Az, e_m) = (Σ_n λ_n (z, e_n) e_n, e_m) = λ_m (z, e_m)

From this we have

Σ_{n=1}^∞ |λ_n^{-1} (v, e_n)|² = Σ_{n=1}^∞ |(z, e_n)|² ≤ ||z||²

This implies the convergence of the series x = Σ_{n=1}^∞ λ_n^{-1} (v, e_n) e_n, by Theorem 2 in Section 2.2, page 71. It follows that

Ax = Σ_{m=1}^∞ λ_m (x, e_m) e_m = Σ_{m=1}^∞ (v, e_m) e_m = Σ_{m=1}^∞ λ_m (z, e_m) e_m = Az = v •

Example 1. Consider the operator A defined on L²[0, 1] by the equation

(Ax)(t) = ∫_0^1 G(s, t) x(s) ds

where

G(s, t) = (1 − s)t   when 0 ≤ t ≤ s ≤ 1
G(s, t) = (1 − t)s   when 0 ≤ s ≤ t ≤ 1

The eigenvalues of A are λ_n = n^{-2}π^{-2}, and the corresponding eigenfunctions are e_n(t) = √2 sin(nπt). This example is discussed also in Section 2.5 (page 107) and in Section 4.7 (page 215). Theorem 5 shows how to solve the equation Ax = v when v is a prescribed function in the range of A. •
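A quick numerical confirmation of this example (added here as an illustration, not from the text): discretize the kernel G on a uniform grid and compare the largest eigenvalues of the resulting matrix, scaled by the grid spacing, with (nπ)⁻².

    import numpy as np

    m = 400
    t = (np.arange(m) + 0.5) / m          # midpoint grid on [0, 1]
    h = 1.0 / m
    S, T = np.meshgrid(t, t, indexing="ij")
    G = np.where(T <= S, (1 - S) * T, (1 - T) * S)   # the kernel of Example 1

    # The matrix h*G approximates the integral operator A.
    eigvals = np.sort(np.linalg.eigvalsh(h * G))[::-1]

    n = np.arange(1, 6)
    print(eigvals[:5])                    # approximately 1/(n^2 pi^2)
    print(1.0 / (n * np.pi) ** 2)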
We turn now to the topic of Fredholm integral equations of the first kind. These have the form

(2) ∫_S K(s, t) x(t) dt = f(s)

In this equation, the functions K and f are prescribed. The unknown function x is to be determined. A natural setting is the space L²(S), described on page 64. Let us assume that the kernel K is in the class L²(S × S), so that Theorem 5 of Section 2.3 (page 86) is applicable.
If the integral on the left side of Equation (2) were a Riemann integral on the interval [0, 1], it would be a limit of linear combinations of sections of the bivariate function K. That is,

lim_{n→∞} Σ_{j=1}^n (1/n) K(s, j/n) x(j/n)

The sections of K are functions of s parametrized by the variable t:

s ↦ K_t(s) = K(s, t)

Thus, we must expect the integral equation to have a solution only if f is in the L²-closure of the linear span of the sections K_t. This argument is informal, but nevertheless alerts us to the possibility of there being no solution.
Adopting the notation of Theorem 5 in Section 2.3 (and its proof), we have

(Tx)(s) = ∫_S K(s, t) x(t) dt

The operator T thus defined from L²(S) to L²(S) is compact. (It is an example of a Hilbert-Schmidt operator.) Its range cannot be all of L²(S), except in the special case when L²(S) is finite dimensional. Equation (2) will have a solution if and only if f is in the range of T. Now, as in Section 2.3,

Tx = Σ_{i=1}^∞ Σ_{j=1}^∞ a_ij (x, u_j) u_i

On the other hand, if f is in the range of T, we have

(3) f = Σ_{i=1}^∞ (f, u_i) u_i

Hence the equation Tx = f will be true if and only if Equation (3) holds and

Σ_{j=1}^∞ a_ij (x, u_j) = (f, u_i)   (i = 1, 2, ...)

Putting ξ_j = (x, u_j) and β_i = (f, u_i), we have the following infinite system of linear equations in an infinite number of unknowns:

Σ_{j=1}^∞ a_ij ξ_j = β_i   (i = 1, 2, ...)

A pragmatic approach is to "truncate" the system by choosing a large integer n and considering the finite matrix problem

Σ_{j=1}^n a_ij ξ_j^{(n)} = β_i   (i = 1, 2, ..., n)

Here the notation ξ_j^{(n)} serves to remind us that we must not expect ξ_j^{(n)} to equal (x, u_j). One can define x_n = Σ_{j=1}^n ξ_j^{(n)} u_j and examine the behavior of the sequences [x_n] and [Tx_n], as in the sketch below. Will this procedure succeed always? Certainly not, for the integral equation may have no solution, as previously mentioned.
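The truncation idea can be written out in a few lines. The following sketch is an added illustration with an arbitrarily chosen kernel and basis (the exponential kernel and the sine basis are my assumptions, not taken from the text); it forms the n × n system Σ_j a_ij ξ_j^{(n)} = β_i by quadrature and solves it. Whether x_n converges to anything useful depends, as the text warns, on f actually being in the range of T.

    import numpy as np

    m, n = 600, 20                        # quadrature points, truncation order
    s = (np.arange(m) + 0.5) / m
    w = 1.0 / m                           # midpoint-rule weights on [0, 1]

    def K(u, v):                          # an arbitrary smooth kernel (illustration only)
        return np.exp(-np.abs(u - v))

    # Orthonormal basis u_i(s) = sqrt(2) sin(i pi s), sampled on the grid.
    U = np.sqrt(2) * np.sin(np.pi * np.outer(np.arange(1, n + 1), s))

    f = s * (1 - s)                       # a prescribed right-hand side

    Kmat = K(s[:, None], s[None, :])
    TU = (Kmat * w) @ U.T                 # columns approximate T u_j on the grid
    a = (U * w) @ TU                      # a_ij = (T u_j, u_i) by quadrature
    beta = (U * w) @ f                    # beta_i = (f, u_i)

    xi = np.linalg.solve(a, beta)         # xi_j^(n)
    x_n = U.T @ xi                        # the approximate solution on the grid
    print(np.linalg.norm((Kmat * w) @ x_n - f))   # residual ||T x_n - f||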
Other approaches to the solution of integral equations are explored in Chapter 4. The case of Equation (2) in which the kernel is separable or "degenerate," i.e., of the form

K(s, t) = Σ_{i=1}^n u_i(s) v_i(t)

is easily handled:

(Tx)(s) = ∫_S K(s, t) x(t) dt = ∫_S Σ_{i=1}^n u_i(s) v_i(t) x(t) dt = Σ_{i=1}^n u_i(s) (v_i, x)

This shows that the range of T is the finite-dimensional space spanned by the functions u_1, u_2, ..., u_n. Hence, in order that there exist a solution to the given integral equation it is necessary that f be in that same space: f = Σ_{i=1}^n c_i u_i. Any x such that (v_i, x) = c_i will be a solution.
Spectral methods can also be applied to Equation (2). Here, one assumes the kernel to be Hermitian: K(s, t) is the complex conjugate of K(t, s). Then the operator T is Hermitian, and consequently has a spectral form

Tx = Σ_{n=1}^∞ λ_n (x, u_n) u_n

in which [u_n] is an orthonormal sequence. If f is in the span of that orthonormal sequence, we write f = Σ_{n=1}^∞ (f, u_n) u_n. The solution, if it exists, must then be the function x whose Fourier coefficients are (f, u_n)/λ_n. If this sequence is not an ℓ² sequence, we are out of luck! Here we are following Theorem 5 above. This procedure succeeds if f is in the range of T.
For compact operators that are not self-adjoint or even normal there is still a
useful canonical form that can be exploited. It is described in the next theorem.

Theorem 6. Singular-Value Decomposition for Compact Operators. Every compact operator on a separable Hilbert space is expressible in the form

Ax = Σ_{n=1}^∞ (x, u_n) v_n

in which [u_n] is an orthonormal basis for the space and [v_n] is an orthogonal sequence tending to zero. (The sequences [u_n] and [v_n] depend on A.)

Proof. The operator A*A is compact and Hermitian. Its eigenvalues are nonnegative, because if A*Ax = βx, then

0 ≤ (Ax, Ax) = (x, A*Ax) = (x, βx) = β(x, x)

Now apply the spectral theorem to A*A, obtaining

A*Ax = Σ_{n=1}^∞ λ_n² (x, u_n) u_n

where [u_n] is an orthonormal basis for the space and λ_n² → 0. Since we are assuming that [u_n] is a basis, we permit some (possibly an infinite number) of the λ_n to be zero. In the spectral representation above, each nonzero eigenvalue λ_n² is repeated a number of times equal to its geometric multiplicity. Define v_n = Au_n. Then we have

(v_n, v_m) = (Au_n, Au_m) = (u_n, A*Au_m) = λ_m² (u_n, u_m) = λ_n² δ_nm

Hence [v_n] is orthogonal, and ||v_n|| = λ_n → 0. Since [u_n] is a basis, we have for arbitrary x,

x = Σ_{n=1}^∞ (x, u_n) u_n

Consequently,

Ax = Σ_{n=1}^∞ (x, u_n) Au_n = Σ_{n=1}^∞ (x, u_n) v_n •
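In matrix terms Theorem 6 is the familiar singular-value decomposition. A short check with NumPy, added here as an illustration:

    import numpy as np

    rng = np.random.default_rng(2)
    A = rng.standard_normal((5, 5))

    # numpy's svd gives A = W diag(sigma) Vh, so A x = sum_n sigma_n (x, u_n) w_n,
    # with u_n the rows of Vh (an orthonormal basis) and w_n the columns of W.
    W, sigma, Vh = np.linalg.svd(A)

    x = rng.standard_normal(5)
    Ax_svd = W @ (sigma * (Vh @ x))       # sum_n sigma_n (x, u_n) w_n
    print(np.allclose(A @ x, Ax_svd))     # True

    # In the notation of Theorem 6, v_n = sigma_n w_n is orthogonal with ||v_n|| -> 0.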

A general class of compact operators that has received much study is the Hilbert-Schmidt class, consisting of operators A such that

Σ_α ||A u_α||² < ∞

for some orthonormal basis [u_α]. It turns out that if this sum is finite for one orthonormal basis, then it is finite for all. In fact, there is a better result:

Theorem 7. Let [u_α] and [v_β] be two orthonormal bases for a Hilbert space. Every linear operator A on the space satisfies

Σ_α ||A u_α||² = Σ_β ||A v_β||²

Proof. By the Orthonormal Basis Theorem, Section 2.2 (page 73), we have

Σ_α ||A u_α||² = Σ_α Σ_β |(A u_α, v_β)|² = Σ_β Σ_α |(A u_α, v_β)|² = Σ_β Σ_α |(u_α, A* v_β)|² = Σ_β ||A* v_β||²

Letting {u_α} = {v_β} in this calculation, we obtain Σ_β ||A v_β||² = Σ_β ||A* v_β||². By combining these equations, we obtain the required result. •
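In finite dimensions the assertion of Theorem 7 is easy to test numerically (an added illustration): the sum Σ_α ||Au_α||² equals the squared Frobenius norm no matter which orthonormal basis is used.

    import numpy as np

    rng = np.random.default_rng(3)
    A = rng.standard_normal((6, 6))

    # Two random orthonormal bases, obtained from QR factorizations.
    U, _ = np.linalg.qr(rng.standard_normal((6, 6)))
    V, _ = np.linalg.qr(rng.standard_normal((6, 6)))

    sum_u = sum(np.linalg.norm(A @ U[:, k]) ** 2 for k in range(6))
    sum_v = sum(np.linalg.norm(A @ V[:, k]) ** 2 for k in range(6))
    print(np.isclose(sum_u, sum_v), np.isclose(sum_u, np.linalg.norm(A, "fro") ** 2))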
Example 2. An example of a Hilbert-Schmidt operator arises in the following integral equation from scattering theory:

u(x) = f(x) + ∫_{ℝⁿ} G(|x − y|) h(y) u(y) dy

Here, f, G, and h are prescribed functions, and u is the unknown function. The function h often has compact support. (Thus it vanishes on the complement of a compact set.) It models the sound speed in the medium, and in a simple case could be a constant on its support. The function f in the integral equation represents the incident wave in a scattering experiment. An important concrete case is

u(x) = e^{ip·x} − (1/4π) ∫_{ℝ³} [e^{i|x−y|} / |x − y|] h(y) u(y) dy

In this equation, p is a unit vector (prescribed). Notice the singularity in the kernel of this integral equation. Unfortunately, in the real world, such singularities are the rule rather than the exception. •
References for operator theory in general are [DS, vol. II], [RS], [AG], [Hal2].

Problems 2.4

1. Let X be a Hilbert space having a countable orthonormal basis [u_1, u_2, ...]. Define an operator A by the equation

Ax = Σ_{n=1}^∞ (x, u_n) u_{n+1}

What are the eigenvalues of A? Is A compact? Is A Hermitian? What is the norm of A?

2. Repeat Problem 1 for the operator

Ax = Σ_{n=1}^∞ α_n (x, u_n) u_n

in which [α_n] is some prescribed bounded real sequence. Find the conditions under which A^{-1} exists as a bounded linear operator.
3. Repeat Problem 1 for the operator

Ax = Σ_{n=1}^∞ (x, u_{n+1}) u_n

4. Repeat Problem 1 if the basis is [..., u_{-2}, u_{-1}, u_0, u_1, ...] and Ax = Σ_{n=-∞}^∞ (x, u_n) u_{n+1}. What is A^{-1}?
5. Let Y be a subspace of a Hilbert space X, and let A : Y -+ X be a (possibly unbounded)
linear map such that A-I: X -+ Y exists and is a compact linear operator. Prove that
if (A - AI)-1 exists, then it is compact.
6. Prove that for a compact Hermitian operator A on a Hilbert space these properties are
equivalent:
(a) (Ax, x) ~ 0 for all x
(b) All eigenvalues of A are nonnegative
7. Prove these facts about the spectral sets (Λ is defined on page 91):
(a) Λ(A) = Λ(A*)
(b) If A is invertible, Λ(A^{-1}) = {λ^{-1} : λ ∈ Λ(A)}
(c) Λ(Aⁿ) ⊃ {λⁿ : λ ∈ Λ(A)} for n = 1, 2, 3, ...
8. Let {e_1, e_2, ...} be an orthonormal system (countable or finite). Let λ_1, λ_2, ... be complex numbers such that lim λ_n = 0. Define Ax = Σ λ_n (x, e_n) e_n. Prove that the series converges, that A is a bounded linear operator, and that A is compact. Prove also that if the λ_k are real, then A is Hermitian. Suggestion: Exploit the facts that operators of finite rank are compact and limits of compact operators are compact.
9. In the spectral theorem, when is the following equation true?

x = Σ_{n=1}^∞ (x, e_n) e_n

10. Let P be the orthogonal projection of a Hilbert space onto a closed subspace. What are
the eigenvalues of P? Give the spectral form of P and 1- P.
11. Let A be a bounded linear operator on a Hilbert space. Prove that:
(1) A commutes with An for n = 0, 1,2, ....
(2) A commutes with p(A) for any polynomial p.
(3) If A-I exists, then A commutes with A -n for n = 0, 1, 2, 3, ....
(4) If (A - AI)-1 exists, then it commutes with A.
12. An operator A is said to be normal if AA* = A* A. Give an example of an operator
that is not normal. (The eminent mathematician Olga Tausky once observed that most
counterexamples in matrix theory are of size 2 x 2.) Are there any real 2 x 2 normal
matrices that are not self-adjoint? (Other problems on normal operators: 29,39,40,41.)
13. Establish the first equation in the proof of Theorem 4.
14. If A is a bounded linear operator on a Hilbert space, then A + A* and i(A - A*) are
self-adjoint. Hence A is of the form B + iC, where Band C are self-adjoint.
15. Let {e_1, e_2, ...} be an orthonormal sequence. Let Ax = Σ λ_n (x, e_n) e_n, in which 0 < inf |λ_n| ≤ sup |λ_n| < ∞. Prove that the series defining Ax converges. Prove that A is not compact. Prove that A is bounded. What are the eigenvalues and eigenvectors of A?

16. Find the eigenvalues and eigenvectors for the operator Ax = -x" acting on the space
X = {x E £2[0,1]: x(O) = 0 and x'(l) +'Yx(l) = O}. Here'Y is a prescribed real number.
How can the eigenvalues be computed numerically? Find the first one accurate to 3 digits
when 'Y = -~. Newton's method, described in Section 3.3, can be used.
17. Prove that if Ax = λx, A*y = μy, and λ ≠ μ̄, then x ⊥ y.
18. If A is Hermitian and x is a vector such that Ax ≠ 0, then Aⁿx ≠ 0 for n = 0, 1, 2, ....
19. Every compact Hermitian operator is a limit of a sequence of linear combinations of
orthogonal projections.

20. If λ is an eigenvalue of A² and λ > 0, then either +√λ or −√λ is an eigenvalue of A. (Here, A is any bounded operator.) Hint: If A²x = λx, then for suitable c, x ± cAx is an eigenvector of A.
21. Consider the problem x″ + (λ² − q)x = 0, x(0) = 1, x′(0) = 0. Show that this initial-value problem can be solved by solving instead the integral equation

x(t) − (1/λ) ∫_0^t q(s) sin(λ(t − s)) x(s) ds = cos(λt)

22. If λ is an eigenvalue of A, then ||A|| ≥ |λ|.


23. If A is Hermitian and p is a polynomial having real coefficients, then p(A) is Hermitian.
24. A bounded linear operator A on a Hilbert space is said to be unitary if AA* = A* A = I.
Prove that for such an operator, (Ax,Ay) = (x,y) and IIAxll = Ilxll.
25. (Continuation) All eigenvalues of a unitary operator satisfy |λ| = 1.
26. If Ax = Σ_{n=1}^∞ λ_n (x, e_n) e_n, what is a formula for A^k (k = 0, 1, 2, ...)? (The e_n form an orthonormal sequence.)
27. Let A and B be compact Hermitian operators on a Hilbert space. Assume that AB = BA. Prove that there is an orthonormal sequence [u_n] and real sequences [λ_n], [μ_n] such that

Ax = Σ λ_n (x, u_n) u_n   and   Bx = Σ μ_n (x, u_n) u_n

Hint: If λ is an eigenvalue of A, put E = {x : Ax = λx}, and show that B(E) ⊂ E. Apply the spectral theorem to B|E.
28. An operator A on a Hilbert space is said to be skew-Hermitian if A* = -A. Prove a
spectral theorem for compact skew-Hermitian operators. (Hint: Consider iA.)
29. Assume that A is "normal" (AA* = A*A) and compact. Prove a spectral theorem for A. Use A = ½(A + A*) + ½(A − A*), Problem 28, and Problem 27.
30. Let A be a compact Hermitian operator on a Hilbert space X. Assume that all eigenvalues
of A are positive, and prove that (Ax, x) > 0 for all nonzero x.
31. Prove that a compact operator on an infinite-dimensional normed linear space cannot be
invertible.
32. Let [u_n] be an orthonormal sequence in a Hilbert space and let [λ_n] be a bounded sequence in ℂ. The operator Ax = Σ λ_n (x, u_n) u_n is compact if and only if λ_n → 0.
33. Criticize this argument: Let A be defined as in Problem 32. We show that A is surjective, provided that λ_n ≠ 0 for all n. Take y arbitrarily. To find an x such that Ax = y we write the equivalent equation Σ λ_n (x, u_n) u_n = y. Take the inner product on both sides with u_m, obtaining λ_m (x, u_m) = (y, u_m). Thus

x = Σ_m [(y, u_m) / λ_m] u_m

34. Let A be a bounded linear operator on a Hilbert space. Suppose that the spectral decomposition of A is known:

Ax = Σ_{n=1}^∞ λ_n (x, e_n) e_n

where [e_n] is an orthonormal sequence. Show how this information can be used to solve the equation Ax − μx = b. Make modest additional assumptions if necessary.
35. Prove that the eigenvalues of a bounded linear operator A on a normed linear space all
lie in the disk of radius IIAII in the complex plane.
36. Prove that if P is an orthogonal projection of a Hilbert space onto a subspace, then for any scalars α and β the operator αP + β(I − P) is normal (i.e., commutes with its adjoint).
37. Prove that an operator in the Hilbert-Schmidt class is necessarily compact.
38. Prove that every operator having the form described in Theorem 6 is compact, thus
establishing a necessary and sufficient condition for compactness.
39. Find all normal 2 x 2 real matrices. Repeat the problem for complex matrices.
40. Prove that for a normal operator, eigenvectors corresponding to different eigenvalues are
mutually orthogonal.
41. Prove that a normal operator and its adjoint have the same null space.

Appendix to Section 2.4

In this appendix we consider a finite-dimensional vector space X, and discuss


the relationship between linear transformations and matrices.
Let L : X -t X be a linear transformation. If an ordered basis is selected
for X, then a matrix can be associated with L in a certain standard way. (If Lis
held fixed while the basis or its ordering is changed, then the matrix associated
with L will change.) The association we use is very simple. Let [u_1, ..., u_n] be an ordered basis for X. Then there must exist scalars a_ij such that

(1) Lu_j = Σ_{i=1}^n a_ij u_i   (1 ≤ j ≤ n)

The n × n array of scalars A = (a_ij) is called the matrix of L relative to the ordered basis [u_1, ..., u_n].
With the aid of the matrix A it is easy to describe the effect of L on any vector x. Write x = Σ_{j=1}^n c_j u_j. The n-tuple (c_1, ..., c_n) is called the coordinate vector of x relative to the ordered basis [u_1, ..., u_n]. Then

(2) Lx = Σ_{j=1}^n c_j Lu_j = Σ_{j=1}^n c_j Σ_{i=1}^n a_ij u_i = Σ_{i=1}^n (Σ_{j=1}^n a_ij c_j) u_i

The coordinate vector of Lx therefore has as its entries

Σ_{j=1}^n a_ij c_j   (1 ≤ i ≤ n)

This n-tuple is obtained from the matrix product Ac, where c is the column vector (c_1, ..., c_n)ᵀ.
If the basis for X is changed, what will be the new matrix for L? Let [v_1, ..., v_n] be another ordered basis for X. Write

(3) v_j = Σ_{i=1}^n p_ij u_i   (1 ≤ j ≤ n)

The n × n matrix P thus introduced is nonsingular. Now let B denote the matrix of L relative to the new ordered basis. Thus

(4) Lv_j = Σ_{k=1}^n b_kj v_k = Σ_{k=1}^n b_kj Σ_{i=1}^n p_ik u_i   (1 ≤ j ≤ n)

Another expression for Lv_j can be obtained by use of Equations (2) and (3):

(5) Lv_j = Σ_{i=1}^n (Σ_{k=1}^n a_ik p_kj) u_i   (1 ≤ j ≤ n)

Upon comparing (4) with (5) we conclude that

(6) Σ_{k=1}^n p_ik b_kj = Σ_{k=1}^n a_ik p_kj   (1 ≤ i, j ≤ n)

Thus in matrix terms,

(7) PB = AP   or   B = P^{-1}AP


Any two matrices A and B are said to be similar to each other if there exists a nonsingular matrix P such that B = P^{-1}AP. The matrices for a given linear
transformation relative to different ordered bases form an equivalence class under
the similarity relation. What is the simplest matrix that can be obtained for
a linear transformation by changing the basis? This is a difficult question, to
which one answer is provided by the Jordan canonical form. Another answer
can be given in the context of a finite-dimensional Hilbert space when the linear
transformation L is Hermitian.

Let [u_1, ..., u_n] be an ordered orthonormal basis for the n-dimensional Hilbert space X. Let L be Hermitian. The spectral theorem asserts the existence of an ordered orthonormal basis [v_1, ..., v_n] and an n-tuple of real numbers (λ_1, ..., λ_n) such that

(8) Lx = Σ_{i=1}^n λ_i (x, v_i) v_i

As above, we introduce matrices A and P such that

(9) Lu_j = Σ_{i=1}^n a_ij u_i   and   v_j = Σ_{i=1}^n p_ij u_i   (1 ≤ j ≤ n)

The matrix B that represents L relative to the v-basis is the diagonal matrix diag(λ_1, ..., λ_n), as we see from Equation (8). Thus from Equation (7) we conclude that A is similar to a diagonal matrix having real entries. More can be said, however, because P has a special structure. Notice that

(I)_jk = δ_jk = (v_k, v_j) = (Σ_i p_ik u_i, Σ_r p_rj u_r) = Σ_i Σ_r p_ik p̄_rj (u_i, u_r) = Σ_i p_ik p̄_ij = Σ_i (P*)_ji (P)_ik = (P*P)_jk

This shows that

(10) P*P = I

(It follows by elementary linear algebra that PP* = I.) Matrices having the property (10) are said to be unitary. We can therefore state that the matrix A (representing the Hermitian operator L with respect to an orthonormal basis) is unitarily similar to a real diagonal matrix.
Finally, we note that if an n × n complex matrix A is such that A = A*, then A is the matrix of a Hermitian transformation relative to an orthonormal basis. Indeed, we have only to select any orthonormal basis [u_1, ..., u_n] and define L by

Lu_j = Σ_{i=1}^n a_ij u_i   (1 ≤ j ≤ n)

Then, of course,

Lx = L(Σ_{j=1}^n c_j u_j) = Σ_{j=1}^n Σ_{i=1}^n c_j a_ij u_i

By straightforward calculation we have

(Lx, y) = (x, Ly)

A matrix A satisfying A* = A is said to be Hermitian. We have proved therefore the following important result, regarded by many as the capstone of elementary matrix theory:

Theorem. Every complex Hermitian matrix is unitarily similar to


a real diagonal matrix.
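A numerical restatement of the theorem (added here as an illustration): numpy.linalg.eigh returns a unitary P for which P*AP is diagonal with real entries.

    import numpy as np

    rng = np.random.default_rng(4)
    B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
    A = (B + B.conj().T) / 2              # a complex Hermitian matrix

    lam, P = np.linalg.eigh(A)            # P is unitary, lam is real
    D = P.conj().T @ A @ P                # P* A P

    print(np.allclose(D, np.diag(lam)))                # True: unitarily similar to diag(lam)
    print(np.allclose(P.conj().T @ P, np.eye(4)))      # P*P = I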

2.5 Sturm-Liouville Theory

In this section differential equations are attacked with the weapons of


Hilbert space theory. Recall that in elementary calculus we interpret integration
and differentiation as mutually inverse operations. So it is here, too, that dif-
ferential operators and integral operators can be inverse to each other. We find
that a differential operator is usually ill-behaved, whereas the corresponding in-
tegral operator may be well-behaved, even to the point of being compact. Thus,
we often try to recast a differential equation as an equivalent integral equation,
hoping that the transformed problem will be less troublesome. (This theme will
reappear many times in Chapter 4.) This strategy harmonizes with our general
impression that differentiation emphasizes the roughness of a function, whereas
integration is a smoothing operation, and is thus applicable to a broader class
of functions.
Definition. The Sturm-Liouville operator is defined by

(Ax)(t) = [p(t)x′(t)]′ + q(t)x(t),   i.e.,   Ax = (px′)′ + qx

where x is twice continuously differentiable, p is real-valued and continuously differentiable, and q is real-valued and continuous. The domain of the functions x, p, and q is an interval [a, b]. We permit x to be complex-valued. Let eight real numbers α_ij, β_ij be specified, 1 ≤ i, j ≤ 2. Assume that

p(a)(β_11 β_22 − β_12 β_21) = p(b)(α_11 α_22 − α_12 α_21)

Let X be the subspace of L²[a, b] consisting of all twice continuously differentiable functions x such that

α_11 x(a) + α_12 x′(a) + β_11 x(b) + β_12 x′(b) = 0
α_21 x(a) + α_22 x′(a) + β_21 x(b) + β_22 x′(b) = 0

Assume also that β_11 β_22 ≠ β_12 β_21 or α_11 α_22 ≠ α_12 α_21.

Theorem 1. Under the preceding hypotheses, A is a Hermitian operator on X.

Proof. Let x, y ∈ X. We want to prove that (Ax, y) = (x, Ay). We compute

(Ax, y) − (x, Ay) = ∫_a^b [ȳ Ax − x (Ay)‾] = ∫_a^b [ȳ(px′)′ + ȳqx − x(pȳ′)′ − xqȳ]
  = ∫_a^b [ȳ(px′)′ − x(pȳ′)′]
  = ∫_a^b [ȳ(px′)′ + ȳ′px′ − x(pȳ′)′ − x′pȳ′]
  = ∫_a^b [px′ȳ − pxȳ′]′ = [px′ȳ − pxȳ′]_a^b
  = p(b)[x′(b)ȳ(b) − x(b)ȳ′(b)] − p(a)[x′(a)ȳ(a) − x(a)ȳ′(a)]
  = −p(b)[det w(b)] + p(a)[det w(a)]

where w(t) is the Wronski matrix

w(t) = [ x(t)   ȳ(t) ]
       [ x′(t)  ȳ′(t) ]

Put also

α = [ α_11  α_12 ]        β = [ β_11  β_12 ]
    [ α_21  α_22 ]            [ β_21  β_22 ]

Our hypothesis on p is that p(a) det β = p(b) det α. The fact that x, y ∈ X (and that the coefficients α_ij, β_ij are real) gives us αw(a) + βw(b) = 0. This yields (det α)[det w(a)] = (det β)[det w(b)]. Note that det(−β) = det(β) because β is of even order. Multiplying this by p(b) gives us p(b) det α det w(a) = p(b) det β det w(b). By a previous equation, this is p(a) det β det w(a) = p(b) det β det w(b). If det β ≠ 0, we have p(b) det w(b) = p(a) det w(a). If det α ≠ 0, a similar calculation can be used. •

Lemma. A second-order linear differential equation

a(t)x″(t) + b(t)x′(t) + c(t)x(t) = d(t)   (a ≤ t ≤ b)

can be put into the form of a Sturm-Liouville equation (px′)′ + qx = f, provided that the functions a, b, c are continuous and a(t) ≠ 0 for all t in the interval [a, b].

Proof. We transform the equation ax″ + bx′ + cx = d by multiplying by the integrating factor a^{-1} exp(∫ b(t)/a(t) dt). Thus

x″ e^{∫ b/a} + (b/a) x′ e^{∫ b/a} + (c/a) x e^{∫ b/a} = (d/a) e^{∫ b/a}

or

(e^{∫ b/a} x′)′ + (c/a) e^{∫ b/a} x = (d/a) e^{∫ b/a}

Let p = e^{∫ b/a}, q = (c/a) e^{∫ b/a}, and f = (d/a) e^{∫ b/a}. •

Example 1. If Ax = −x″ (i.e., p(t) = −1 and q(t) = 0), what are the eigenvalues and eigenfunctions? The solutions to −x″ = λx are of the form c_1 sin(√λ t) + c_2 cos(√λ t). Hence every complex number λ is an eigenvalue, and each eigenspace is of dimension 2. •
Example 2. Let Ax = −x″ as before, but let the inner-product space be the subspace of L²[0, π] consisting of twice continuously differentiable functions that satisfy x(0) = x(π) = 0. The eigenvalues are n² for n = 1, 2, 3, ..., and the eigenfunctions are sin nt. •

The next theorem illustrates one case of the Sturm-Liouville Problem. We take p(t) = 1 in the differential equation and let β_11 = β_12 = α_21 = α_22 = 0. We assume that |α_11| + |α_12| > 0 and |β_21| + |β_22| > 0. It is left to Problem 8 to prove that the differential operator is Hermitian on the subspace of functions that satisfy the boundary conditions.
Our goal is to develop a method for solving the equation Ax = y, where y is a given function, and x is to be determined. The plan of attack is to find a right inverse of A (say AB = I) and to give x = By as the solution to the problem. It will turn out that the spectral theorem is applicable to B.
We assume that there exist functions u and v such that

(1) u″ = qu,   β_21 u(b) + β_22 u′(b) = 0
(2) v″ = qv,   α_11 v(a) + α_12 v′(a) = 0
(3) u′(a)v(a) − u(a)v′(a) = 1

From (3) we see that u ≠ 0 and v ≠ 0. The left side of (3) is the Wronskian of u and v evaluated at a.
In practical terms, u and v can be obtained by solving two initial-value problems. This is often done as follows. Find u_0 and v_0 such that

u_0″ = q u_0,   u_0(b) = 1,   u_0′(b) = 0
v_0″ = q v_0,   v_0(a) = 0,   v_0′(a) = 1

The u and v required will then be suitable linear combinations of u_0 and v_0. Now we observe that for all s,

u′(s)v(s) − u(s)v′(s) = 1

This is true because the left side takes the value 1 at s = a and is constant. Indeed,

d/ds [u′v − uv′] = u″v + u′v′ − uv″ − u′v′ = quv − uqv = 0
Next we construct a function g, called the Green's function for the problem:

g(s, t) = u(s)v(t)   for a ≤ t ≤ s ≤ b
g(s, t) = v(s)u(t)   for a ≤ s ≤ t ≤ b

The operator A in this case is defined by

(4) Ax = x″ − qx

and the domain of A is the closure in L²[a, b] of the set of all twice continuously differentiable functions x such that

α_11 x(a) + α_12 x′(a) = 0   and   β_21 x(b) + β_22 x′(b) = 0
Theorem 2. A right inverse of A in Equation (4) is the operator B defined by

(5) (By)(s) = ∫_a^b g(s, t) y(t) dt

Proof. It is to be proved that AB = I. Let y ∈ C[a, b] and put x = By. We show first that Ax = y. From the equation

x(s) = ∫_a^b g(s, t) y(t) dt
     = ∫_a^s u(s)v(t)y(t) dt + ∫_s^b v(s)u(t)y(t) dt
     = u(s) ∫_a^s v(t)y(t) dt + v(s) ∫_s^b u(t)y(t) dt

we have

x′(s) = u′(s) ∫_a^s v(t)y(t) dt + u(s)v(s)y(s) + v′(s) ∫_s^b u(t)y(t) dt − v(s)u(s)y(s)
      = u′(s) ∫_a^s v(t)y(t) dt + v′(s) ∫_s^b u(t)y(t) dt

Another differentiation gives us

x″(s) = u″(s) ∫_a^s v(t)y(t) dt + u′(s)v(s)y(s) + v″(s) ∫_s^b u(t)y(t) dt − v′(s)u(s)y(s)
      = q(s)u(s) ∫_a^s v(t)y(t) dt + q(s)v(s) ∫_s^b u(t)y(t) dt + y(s)[u′(s)v(s) − u(s)v′(s)]
      = q(s)x(s) + y(s)

In the last step, the constant value of the Wronskian was substituted. Our calculation shows that x″ − qx = y, or Ax = y, as asserted. Hence AB = I.
It remains to prove that x ∈ X, i.e., that x satisfies the boundary conditions. We have, from previous equations,

x(a) = v(a) ∫_a^b u(t)y(t) dt = c v(a)   and   x′(a) = v′(a) ∫_a^b u(t)y(t) dt = c v′(a)

Hence

α_11 x(a) + α_12 x′(a) = α_11 c v(a) + α_12 c v′(a) = 0

Similarly we verify that β_21 x(b) + β_22 x′(b) = 0. •

Remark. If it is known that the homogeneous boundary-value problem has only the trivial solution, then B is also a left inverse of A. In order to verify this, let x ∈ X, y = Ax, and By = z. The previous theorem shows that y = ABy = Az and that z ∈ X. Hence x − z ∈ X and A(x − z) = 0. It follows that x − z = 0, so that x = By = BAx.
Remark. The operator B in the previous theorem is Hermitian, because (by Problem 9) g satisfies the equation

g(s, t) = g(t, s)

Now we apply the Spectral Theorem to the operator B. Notice that B is compact by Theorem 5 in Section 2.3, page 86. There exist an orthonormal sequence [u_n] in L²[a, b] and real numbers λ_n such that

By = Σ_{n=1}^∞ λ_n (y, u_n) u_n

Since Bu_k = λ_k u_k, we have u_k = λ_k Au_k, and u_k satisfies the boundary conditions. This equation shows that u_k is an eigenvector of A corresponding to the eigenvalue 1/λ_k. Since λ_k → 0, 1/λ_k → ∞. Consequently, a solution to the problem Ax = y, where y is given and x must satisfy the boundary conditions, is

x = By = Σ_{n=1}^∞ λ_n (y, u_n) u_n

Example 3. Consider the boundary-value problem

Ax = x″ + x = y,   x′(0) = x(π) = 0

We shall solve it by means of a Green's function. For the functions u and v we can take u(t) = sin t and v(t) = cos t. In this case the Green's function is

g(s, t) = sin s cos t   for 0 ≤ t ≤ s ≤ π
g(s, t) = cos s sin t   for 0 ≤ s ≤ t ≤ π

The compact Hermitian integral operator B is given by

(By)(s) = sin s ∫_0^s cos t · y(t) dt + cos s ∫_s^π sin t · y(t) dt   •
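One can test this Green's function numerically. The sketch below is an added illustration (with a right-hand side y chosen arbitrarily): it applies B to y by quadrature and checks that the result satisfies x″ + x = y together with the boundary conditions.

    import numpy as np

    m = 4001
    s = np.linspace(0.0, np.pi, m)
    h = s[1] - s[0]
    y = np.sin(2 * s) + s                     # a sample right-hand side y(t) on the grid

    def cumtrapz(f):                          # cumulative trapezoidal integral from 0
        out = np.zeros_like(f)
        out[1:] = np.cumsum((f[:-1] + f[1:]) * h / 2)
        return out

    I1 = cumtrapz(np.cos(s) * y)              # int_0^s cos(t) y(t) dt
    I2 = cumtrapz(np.sin(s) * y)
    x = np.sin(s) * I1 + np.cos(s) * (I2[-1] - I2)   # (By)(s)

    # Check Ax = x'' + x = y in the interior and the boundary conditions.
    xpp = (x[2:] - 2 * x[1:-1] + x[:-2]) / h**2
    print(np.max(np.abs(xpp + x[1:-1] - y[1:-1])))   # small
    print(abs((x[1] - x[0]) / h), abs(x[-1]))        # x'(0) near 0, x(pi) = 0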

Example 4. Let us solve the problem in Example 3 by using the Spectral Theorem. The eigenvalues and eigenvectors of the differential operator A are obtained by solving x″ + x = μx. The general solution of the differential equation is

x(t) = c_1 sin(√(1 − μ) t) + c_2 cos(√(1 − μ) t)

Imposing the conditions x′(0) = x(π) = 0, we find that the eigenvalues are μ_n = 1 − (n − ½)² and the eigenfunctions are v_n(t) = cos((2n − 1)t/2). The v_n are also eigenfunctions of B, corresponding to eigenvalues λ_n = 1/μ_n = (n − n² + ¾)^{-1}.

Observe that the eigenfunctions v_n are not of unit norm. If α_n = 1/||v_n||, then [α_n v_n] is an orthonormal system, and the spectral resolution of B is

By = Σ_{n=1}^∞ λ_n (y, α_n v_n)(α_n v_n)

A computation reveals that α_n = (2/π)^{1/2}. Hence we can write

By = (2/π) Σ_{n=1}^∞ λ_n (y, v_n) v_n

Use of this formula is equivalent to the traditional method for solving the boundary-value problem

x″ + x = y,   x′(0) = x(π) = 0

The traditional method starts with the functions v_n(t) = cos((2n − 1)t/2), which satisfy the boundary conditions. Then we build a function of the form x = Σ_{n=1}^∞ c_n v_n. This also satisfies the boundary conditions. We hope that with a correct choice of the coefficients we will have Ax = y. Since Av_n = μ_n v_n, this equation reduces to Σ_{n=1}^∞ c_n μ_n v_n = y. To discover the values of the coefficients, take the inner product of both sides with v_m:

Σ_{n=1}^∞ c_n μ_n (v_n, v_m) = (y, v_m)

By orthogonality, we get c_m μ_m α_m^{-2} = (y, v_m) and c_m = (y, v_m) μ_m^{-1} α_m². •


Notice that Theorem 2 has given us an alternative method for solving the inhomogeneous boundary-value problem. Namely, we simply use the Green's function to get x:

x(s) = (By)(s) = ∫_0^π g(s, t) y(t) dt



Our next task is to find out how to determine a Green's function for the more general Sturm-Liouville problem. The differential equation and its boundary conditions are as follows:

(6)  Ax ≡ (px′)′ + qx = y,   x ∈ C²[a, b]
     α_11 x(a) + α_12 x′(a) + β_11 x(b) + β_12 x′(b) = 0
     α_21 x(a) + α_22 x′(a) + β_21 x(b) + β_22 x′(b) = 0

We are looking for a function g defined on [a, b] × [a, b]. As usual, the t-sections of g are given by g_t(s) = g(s, t).

Theorem 3. The Green's function for the above problem is characterized by these five properties:
(i) g is continuous in [a, b] × [a, b].
(ii) ∂g/∂s is continuous in a < s < t < b and in a < t < s < b.
(iii) For each t, g_t satisfies the boundary conditions.
(iv) Ag_t = 0 in the two open triangles described in (ii).
(v) lim_{t↓s} (∂g/∂s)(s, t) − lim_{t↑s} (∂g/∂s)(s, t) = −1/p(s).

Proof. As in the previous proof, we take y ∈ C[a, b] and define

x(s) = ∫_a^b g(s, t) y(t) dt

It is to be shown that x is in the domain of A and that Ax = y. The domain of A consists of twice continuously differentiable functions that satisfy the boundary conditions. Let us use ′ to denote partial differentiation with respect to s. Since

x(s) = ∫_a^s g(s, t) y(t) dt + ∫_s^b g(s, t) y(t) dt

we have (as in the previous proof)

x′(s) = ∫_a^s g′(s, t) y(t) dt + ∫_s^b g′(s, t) y(t) dt

It follows that

x(a) = ∫_a^b g(a, t) y(t) dt        x(b) = ∫_a^b g(b, t) y(t) dt
x′(a) = ∫_a^b g′(a, t) y(t) dt      x′(b) = ∫_a^b g′(b, t) y(t) dt

Any linear combination of x(a), x(b), x′(a), and x′(b) is obtained by an integration of the same linear combination of g(a, t), g(b, t), g′(a, t), and g′(b, t). Since g_t satisfies the boundary conditions, so does x. We now compute x″(s) from the equation for x′(s):

x″(s) = g′(s, s⁻)y(s) + ∫_a^s g″(s, t)y(t) dt − g′(s, s⁺)y(s) + ∫_s^b g″(s, t)y(t) dt
      = y(s)/p(s) + ∫_a^b g″(s, t) y(t) dt

Here the following notation has been used:

g′(s, s⁺) = lim_{t↓s} g′(s, t)        g′(s, s⁻) = lim_{t↑s} g′(s, t)

Now it is easy to verify that Ax = y. We have

Ax = (px′)′ + qx = p′x′ + px″ + qx

Hence

(Ax)(s) = p′(s) ∫_a^b g′(s, t)y(t) dt + y(s) + p(s) ∫_a^b g″(s, t)y(t) dt + q(s) ∫_a^b g(s, t)y(t) dt
        = y(s) + ∫_a^b [(p(s)g′(s, t))′ + q(s)g(s, t)] y(t) dt = y(s)

because g_t is a solution of the differential equation. •


Example 5. Find the Green's function for this Sturm-Liouville problem:

x″ = y,   x(0) = x′(0) = 0

The preceding theorem asserts that g_t should solve the homogeneous differential equation in the intervals 0 < s < t < 1 and 0 < t < s < 1. Furthermore, g_t should be continuous, and it should satisfy the boundary conditions. Lastly, g′(s, t) should have a jump discontinuity of magnitude −1 as t passes through the value s. One can guess that g is given by

g(s, t) = 0       for 0 ≤ s ≤ t ≤ 1
g(s, t) = s − t   for 0 ≤ t ≤ s ≤ 1

If we proceed systematically, it will be seen that this is the only solution. In the triangle 0 < s < t < 1, Ag_t = 0, and therefore g_t must be a linear function of s. We write g(s, t) = a(t) + b(t)s. Since g_t must satisfy the boundary conditions, we have g(0, t) = (∂g/∂s)(0, t) = 0. Thus a(t) = b(t) = 0 and g(s, t) = 0 in this triangle. In the second triangle, 0 < t < s < 1. Again g_t must be linear, and we write g(s, t) = α(t) + β(t)s. Continuity of g on the diagonal implies that α(t) + β(t)t = 0, and we therefore have g(s, t) = −β(t)t + β(t)s = β(t)(s − t). The condition (∂g/∂s)(s, s⁺) − (∂g/∂s)(s, s⁻) = −1/p leads to the equation 0 − β(t) = −1. Hence g(s, t) = s − t in this triangle. The solution to the inhomogeneous boundary-value problem x″ = y is therefore given by

x(s) = ∫_0^s (s − t) y(t) dt
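For a concrete y this formula can be confirmed symbolically. The check below is an added illustration using SymPy, with an arbitrarily chosen right-hand side:

    import sympy as sp

    s, t = sp.symbols("s t")
    y = sp.cos(3 * t) + t**2                      # an arbitrary sample right-hand side

    x = sp.integrate((s - t) * y, (t, 0, s))      # x(s) = int_0^s (s - t) y(t) dt

    print(sp.simplify(sp.diff(x, s, 2) - y.subs(t, s)))  # 0, so x'' = y
    print(x.subs(s, 0), sp.diff(x, s).subs(s, 0))        # both 0: x(0) = x'(0) = 0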

Example 6. Find the Green's function for the problem

x″ − x′ − 2x = y,   x(0) = 0 = x(1)

We tentatively set

(7)  g(s, t) = u(s)v(t)   for 0 ≤ s ≤ t ≤ 1
     g(s, t) = v(s)u(t)   for 0 ≤ t ≤ s ≤ 1

and try to determine the functions u and v. The homogeneous differential equation has as its general solution the function

x(s) = αe^{−s} + βe^{2s}

The solution satisfying the condition x(0) = 0 is

u(s) = αe^{−s} − αe^{2s}

The solution satisfying the condition x(1) = 0 is

v(s) = βe^{3−s} − βe^{2s}

With these choices, the function g in Equation (7) satisfies the first four requirements in Theorem 3. With a suitable choice of the parameters α and β, the fifth requirement can be met as well. The calculation produces the following equation involving the Wronskian of u and v:

g′(s, s⁺) − g′(s, s⁻) = u′(s)v(s) − v′(s)u(s) = αβ(3 − 3e³)e^s

In this problem, the function p is p(s) = e^{−s}, because

e^{−s}[x″(s) − x′(s)] = (e^{−s}x′(s))′

Hence condition (v) in Theorem 3 requires us to choose α and β such that αβ = −(3 − 3e³)^{-1} ≈ 0.017465. Then

g(s, t) = αβ(e^{−s} − e^{2s})(e^{3−t} − e^{2t})   for 0 ≤ s ≤ t ≤ 1
g(s, t) = αβ(e^{3−s} − e^{2s})(e^{−t} − e^{2t})   for 0 ≤ t ≤ s ≤ 1   •
Example 7. Find the Green's function for this Sturm-Liouville problem:

x″ + 9x = y,   x(0) = x(π/2) = 0

According to the preceding theorem, g should be a continuous function on the square 0 ≤ s, t ≤ π/2, and g_t should solve the homogeneous problem in the intervals 0 ≤ s ≤ t and t ≤ s ≤ π/2. Finally, ∂g/∂s should have a jump of magnitude −1 as t increases through the value s. These considerations lead us to define

g(s, t) = −⅓ sin 3s cos 3t   for 0 ≤ s ≤ t ≤ π/2
g(s, t) = −⅓ cos 3s sin 3t   for 0 ≤ t ≤ s ≤ π/2   •
Problems 2.5

1. Find the eigenvalues and eigenfunctions for the Sturm-Liouville operator when p = q = 1 and

2. Prove that an operator of the form

(Ax)(s) = ∫_a^b k(s, t) x(t) dt

is Hermitian if and only if k(s, t) = k(t, s).

3. Find the Green's function for the problem

x″ − 3x′ + 2x = y,   x(0) = 0 = x(1)

4. Find the Green's function for the problem

x″ − 9x = y,   x(0) = 0 = x(1)

5. Prove that if u and v are in C²[a, b], then the function

g(s, t) = u(s)v(t)   for a ≤ s ≤ t ≤ b
g(s, t) = v(s)u(t)   for a ≤ t ≤ s ≤ b

has properties (i) and (ii) mentioned in Theorem 3.

6. (Continuation) Show that if

p v² (u/v)′ = −1

then g (in Problem 5) will have property (v) in Theorem 3.

8. Prove that if p = 1 in the Sturm-Liouville problem and β_11 = β_12 = α_21 = α_22 = 0, then A is Hermitian.
9. Prove that the function g in Equation (4) is symmetric: g(s, t) = g(t, s).
10. Let Ax = (px′)′ − qx. Prove Lagrange's identity:

x Ay − y Ax = [p(xy′ − yx′)]′

11. (Continuation) Prove Green's formula:

∫_a^b (x Ay − y Ax) = p(xy′ − yx′) |_a^b

12. Show that the Wronskian for any two solutions of the equation (px′)′ − qx = 0 is a scalar multiple of 1/p, and so is either identically zero or never zero. (Here we assume p(t) ≠ 0 for a ≤ t ≤ b.)
13. Find the eigenvalues and eigenfunctions for the operator A defined by the equation Ax = −x″ + 2x′ − x. Assume that the domain of A is the set of twice continuously differentiable functions on [0, 1] that have boundary values x(0) = x(1) = 0.
Chapter 3

Calculus in Banach Spaces

3.1 The Frechet Derivative 115


3.2 The Chain Rule and Mean Value Theorems 121
3.3 Newton's Method 125
3.4 Implicit Function Theorems 135
3.5 Extremum Problems and Lagrange Multipliers 145
3.6 The Calculus of Variations 152

3.1 The Frechet Derivative

In this chapter we develop the theory of the derivative for mappings between
Banach spaces. Partial derivatives, Jacobians, and gradients are all examples of
the general theory, as are the Gateaux and F'rechet differentials. Kantorovich's
theorem on Newton's method is proved. Following that there is a section on
implicit function theorems in a general setting. Such theorems can often be
used to prove the existence of solutions to integral equations and other similar
problems. Another section, devoted to extremum problems, illustrates how the
methods of calculus (in Banach spaces) can lead to solutions. A section on the
"calculus of variations" closes the chapter.
The first step is to transfer, with as little disruption as possible, the ele-
mentary ideas of calculus to the more general setting of a normed linear space.

Definition. Let f : D → Y be a mapping from an open set D in a normed linear space X into a normed linear space Y. Let x ∈ D. If there is a bounded linear map A : X → Y such that

(1) lim_{h→0} ||f(x + h) − f(x) − Ah|| / ||h|| = 0

then f is said to be Frechet differentiable at x, or simply differentiable at x. Furthermore, A is called the (Frechet) derivative of f at x.


Theorem 1. If f is differentiable at x, then the mapping A in the definition is uniquely determined. (It depends on x as well as f.)

Proof. Suppose that A_1 and A_2 are two linear maps having the required property, expressed in Equation (1). Then to each ε > 0 there corresponds a δ > 0 such that

||f(x + h) − f(x) − A_i h|| ≤ ε||h||   (i = 1, 2)

whenever ||h|| < δ. By the triangle inequality, ||A_1 h − A_2 h|| ≤ 2ε||h|| whenever ||h|| < δ. Since A_1 − A_2 is homogeneous, the preceding inequality is true for all h. Hence ||A_1 − A_2|| ≤ 2ε. Since ε was arbitrary, ||A_1 − A_2|| = 0. •
Notation. If f is differentiable at x, its derivative, denoted by A in the definition, will usually be denoted by f′(x). Notice that with this notation f′(x) ∈ L(X, Y). This is NOT the same as saying f′ ∈ L(X, Y). It will be necessary to distinguish carefully between f′ and f′(x).

Theorem 2. If f is bounded in a neighborhood of x and if a linear map A has the property in Equation (1), then A is a bounded linear map; in other words, A is the Frechet derivative of f at x.

Proof. Choose δ > 0 so that whenever ||h|| ≤ δ we will have

||f(x + h)|| ≤ M   and   ||f(x + h) − f(x) − Ah|| ≤ ||h||

Then for ||h|| ≤ δ we have ||Ah|| ≤ 2M + ||h|| ≤ 2M + δ. For ||u|| ≤ 1, ||δu|| ≤ δ, whence ||A(δu)|| ≤ 2M + δ. Thus ||A|| ≤ (2M + δ)/δ. •
Example 1. Let X = Y = ℝ. Let f be a function whose derivative (in the elementary sense) at x is A. Then the Frechet derivative of f at x is the linear map h ↦ Ah, because

lim_{h→0} |f(x + h) − f(x) − Ah| / |h| = lim_{h→0} | [f(x + h) − f(x)]/h − A | = 0

Thus, the terminology adopted here is slightly different from the elementary notion of derivative in calculus. •

Example 2. Let X and Y be arbitrary normed linear spaces. Define f : X → Y by f(x) = y_0, where y_0 is a fixed element of Y. (Naturally, such an f is called a constant map.) Then f′(x) = 0. (This is the 0 element of L(X, Y).) •

Example 3. Let f be a bounded linear map of X into Y. Then f′(x) = f. Indeed, ||f(x + h) − f(x) − f(h)|| = 0. Observe that the equation f′ = f is not true. This illustrates again the importance of distinguishing carefully between f′(x) and f′. •

Theorem 3. If f is differentiable at x, then it is continuous at x.

Proof. Let A = f′(x). Then A ∈ L(X, Y). Given ε > 0, select δ > 0 so that δ < ε/(1 + ||A||) and so that the following implication is valid:

||h|| < δ ⟹ ||f(x + h) − f(x) − Ah|| / ||h|| < 1

Then for ||h|| < δ, we have by the triangle inequality

||f(x + h) − f(x)|| ≤ ||f(x + h) − f(x) − Ah|| + ||Ah|| < ||h|| + ||A|| ||h|| < δ(1 + ||A||) < ε   •
Example 4. Let X = Y = C[0, 1] and let φ : ℝ → ℝ be continuously differentiable. Define f : X → Y by the equation f(x) = φ ∘ x, where x is any element of C[0, 1]. What is f′(x)? To answer this, we undertake a calculation of f(x + h) − f(x), using the classical mean value theorem:

[f(x + h) − f(x)](t) = φ(x(t) + h(t)) − φ(x(t)) = φ′(x(t) + θ(t)h(t)) h(t)

where 0 < θ(t) < 1. This suggests that we define A by

Ah = (φ′ ∘ x) h

With this definition, we shall have at every point t,

[f(x + h) − f(x) − Ah](t) = φ′(x(t) + θ(t)h(t)) h(t) − φ′(x(t)) h(t)

Hence, upon taking the supremum norm, we have

||f(x + h) − f(x) − Ah|| ≤ ||φ′ ∘ (x + θh) − φ′ ∘ x|| ||h||

By comparing this to Equation (1) and invoking the continuity of φ′, we see that A is indeed the derivative of f at x. Hence f′(x) is multiplication by φ′ ∘ x. •

Theorem 4. Let f : ℝⁿ → ℝ. If each of the partial derivatives D_i f (= ∂f/∂x_i) exists in a neighborhood of x and is continuous at x, then f′(x) exists, and a formula for it is

f′(x)h = Σ_{i=1}^n D_i f(x) · h_i

Speaking loosely, we say that the Frechet derivative of f is given by the gradient of f.

Proof. We must prove that

lim_{h→0} ||h||^{-1} | f(x + h) − f(x) − Σ_{i=1}^n h_i D_i f(x) | = 0

We begin by writing

f(x + h) − f(x) = f(vⁿ) − f(v⁰) = Σ_{i=1}^n [f(vⁱ) − f(vⁱ⁻¹)]

where the vectors vⁱ and vⁱ⁻¹ differ in only one coordinate. Thus we put v⁰ = x and vⁱ = vⁱ⁻¹ + h_i e_i, where e_i is the ith standard unit vector. By the mean value theorem for functions of one variable,

f(vⁱ) − f(vⁱ⁻¹) = h_i D_i f(vⁱ⁻¹ + θ_i h_i e_i)

where 0 < θ_i < 1. Putting this together, and using the Cauchy-Schwarz inequality, we have

||h||^{-1} | f(x + h) − f(x) − Σ_i h_i D_i f(x) | = ||h||^{-1} | Σ_i h_i [D_i f(vⁱ⁻¹ + θ_i h_i e_i) − D_i f(x)] |
  ≤ ||h||^{-1} ||h|| √( Σ_i [D_i f(vⁱ⁻¹ + θ_i h_i e_i) − D_i f(x)]² ) → 0

as ||h|| → 0, by the continuity of D_i f at x. Note that each point vⁱ⁻¹ + θ_i h_i e_i lies within distance ||h|| of x, so the continuity of the partial derivatives at x applies. •

Theorem 5. Let f : ℝⁿ → ℝᵐ, and let f_1, ..., f_m be the component functions of f. If all partial derivatives D_j f_i exist in a neighborhood of x and are continuous at x, then f′(x) exists, and

(f′(x)h)_i = Σ_{j=1}^n D_j f_i(x) · h_j   for all h ∈ ℝⁿ

Speaking informally, we say that the Frechet derivative of f is given by the Jacobian matrix J of f at x: J_ij = D_j f_i(x).

Proof. By the definition of the Euclidean norm,

||f(x + h) − f(x) − Jh||² = Σ_{i=1}^m | f_i(x + h) − f_i(x) − Σ_{j=1}^n D_j f_i(x) h_j |²

Each of the m terms in the sum (including the divisor ||h||²) converges to 0 as h → 0. This is exactly the content of the preceding theorem. •
Example 5. Let f(x) = √|x_1 x_2|. Then the two partial derivatives of f exist at (0, 0), but f′(0, 0) does not exist. Details are left to Problem 16. •

Example 6. Let L be a bounded linear operator on a real Hilbert space X. Define F : X → ℝ by the equation F(x) = (x, Lx). In order to discover whether F is differentiable at x, we write

F(x + h) − F(x) = (x + h, Lx + Lh) − (x, Lx) = (x, Lh) + (h, Lx) + (h, Lh)

Since the derivative is a linear map, we guess that A should be Ah = (x, Lh) + (h, Lx). With that choice, |Ah| ≤ 2||x|| ||L|| ||h||, showing that ||A|| ≤ 2||x|| ||L||. Thus A is a bounded linear functional. Furthermore,

|F(x + h) − F(x) − Ah| = |(h, Lh)| ≤ ||L|| ||h||² = o(h)

(The notation o(h) is explained in Problem 6.) This establishes that A = F′(x). Notice that

Ah = (L*x + Lx, h)
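In ℝⁿ this example reduces to the familiar fact that the gradient of x ↦ (x, Lx) is (L + Lᵀ)x. The finite-difference check below is an added illustration, not part of the text:

    import numpy as np

    rng = np.random.default_rng(5)
    n = 7
    L = rng.standard_normal((n, n))
    x = rng.standard_normal(n)
    h = 1e-6 * rng.standard_normal(n)

    F = lambda z: z @ (L @ z)             # F(x) = (x, Lx)
    A_h = x @ (L @ h) + h @ (L @ x)       # the derivative F'(x)h = (x, Lh) + (h, Lx)

    print(F(x + h) - F(x) - A_h)          # equals (h, Lh), on the order of ||h||^2
    print(np.allclose(x @ L + L @ x, (L + L.T) @ x))   # gradient is (L + L^T) x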
References for the material in this chapter are [Av1], [Av2], [Bart], [Bl], [Bo], [Car], [Cart], [CS], [Cou], [Dieu], [Els], [Ewi], [Fox], [FM], [GF], [Gold], [Hes1], [Hes2], [JLJ], [Lan1], [NSS], [PBGM], [Sag], [Schj], [Wein], and [You1].

Problems 3.1

1. Let g be a function of two real variables such that g_22 is continuous. (This notation means the second partial derivative with respect to the second argument.) Define f : C[0,1] → C[0,1] by the equation (f(x))(t) = ∫_0^1 g(t, x(s)) ds. Compute the Frechet derivative of f. You may need Taylor's Theorem.
2. Let f be a Frechet-differentiable function from a Hilbert space X into ℝ. The gradient of f at x is a vector v ∈ X such that f′(x)h = (h, v) for all h ∈ X. Prove that such a v exists. (It depends on x.) Illustrate with f(x) = (a, x)², a ∈ X and fixed.
3. Prove that if f and g are differentiable at x, then so is f + g, and (f + g)′(x) = f′(x) + g′(x).
4. Let X, Y and Z be normed linear spaces. Prove that if f : X → Y is differentiable and if A : Y → Z is a bounded linear map, then (A ∘ f)′ = A ∘ f′.
5. Let f : X → X be differentiable, X a real Hilbert space, and v ∈ X. Define g : X → ℝ by g(x) = (f(x), v). Prove that g is differentiable, and determine g′.
6. We write h ↦ o(h) for a generic function that has the property

lim_{h→0} o(h) / ||h|| = 0

Thus f′(x) is characterized by the equation f(x + h) − f(x) − f′(x)h = o(h). Prove that the family of all such functions o from X to Y is a vector space.
7. Find the derivative of the map f : C[0,1] → C[0,1] defined by f(x) = g · x. Here the dot signifies ordinary multiplication, and g ∈ C[0,1].
8. Supply the missing details in Example 4. For example, you should establish the fact that ||φ′ ∘ (x + θh) − φ′ ∘ x|| converges to 0 when h converges to 0. Quote any theorems from real analysis that you use.
9. Let X and Y be two normed linear spaces, and let x ∈ X. Let f and g be functions defined on a neighborhood of x and taking values in Y. Following Dieudonne, we say that "f and g are tangent at x" if

lim_{h→0} ||f(x + h) − g(x + h)|| / ||h|| = 0

Prove that this is an equivalence relation. Prove that the relationship is preserved if the norms in X and Y are changed to equivalent ones. Prove that x ↦ f(x_0) + f′(x_0)(x − x_0) is the unique affine map tangent to f at x_0. (An affine map is a constant plus a linear map.)
10. Show that these two functions are tangent at x = 2:

f(x) = x²,   g(x) = 3 + √(17 − (x − 6)²)

Draw a picture to illustrate.

11. Prove that if f and g are tangent at x and if both are differentiable at x, then f′(x) = g′(x). Here f and g should be as general as in Problem 9.
12. Let X = C[0,1], with its usual sup-norm. Select t_i ∈ [0,1] and v_i ∈ C[0,1], and define f(x) = Σ_{i=1}^n [x(t_i)]² v_i. Prove that f is differentiable at all points of X and give a formula for f′.
13. Prove that the supremum norm on the space C[0,1] is not differentiable at any element x for which there are two or more points t in [0,1] where |x(t)| = ||x||.
14. Recall that c_0 is the space of sequences converging to 0 and that the norm is ||x|| = max_n |x(n)|. Prove that the norm is differentiable at x if and only if there is a unique n such that |x(n)| = ||x||.
15. Let y_0 be a point in a normed linear space Y. Define f : ℝ → Y by the equation f(t) = t y_0. Compute f′. Now define g(t) = (sin t) y_0 and compute g′.
16. Supply the missing details for Example 5.

17. Define f : C[0,1] → C[0,1] by the equation [f(x)](t) = x(t) + ∫_0^1 [x(st)]² ds. Compute f′(x).
18. Prove that if f is differentiable at x, then f is Lipschitz continuous at x. This means that ||f(y) − f(x)|| ≤ A||y − x|| for some A and all y in a neighborhood of x.
19. Let a_n (n = 0, 1, 2, ...) be real numbers such that Σ_{n=0}^∞ a_n zⁿ converges for all z ∈ ℂ. Let X be a Banach space. Define f : L(X, X) → L(X, X) by the equation f(A) = Σ_{n=0}^∞ a_n Aⁿ. What is the Frechet derivative of f?

20. Explain the difference between these statements:


(i) f' is continuous at x.
(ii) f'(x) is continuous.
Prove that if f'(x) exists, then it is continuous and differentiable. Give an example of a
mapping f such that f' is continuous but not differentiable.
21. Refer to the definition of the Frechet derivative. If the bounded linear map A satisfies the weaker condition

lim_{λ→0} (1/λ) ||f(x + λh) − f(x) − λAh|| = 0

for every h ∈ X, then f is said to be Gateaux differentiable at x, and A is the Gateaux derivative at x. Prove that if f is Frechet differentiable at x, then it is Gateaux differentiable at x, and the two derivatives are equal.
22. Let f be a differentiable map from one normed linear space into another. Let y be a point such that f^{-1}({y}) contains no point x for which f′(x) = 0. Prove that f^{-1}({y}) contains no nonvoid open set.
23. If f : ℝ → ℝⁿ, what is the formula for f′(x)?
24. Prove that in an inner-product space the functions f(x) = ||x||² and g(x) = (a, x) are differentiable. Give formulas for the derivatives.

3.2 The Chain Rule and Mean Value Theorems

We continue to work with a function f : D → Y, where D is an open set in a normed linear space X, and Y is another normed linear space. In the next theorem, we have another mapping g defined on an open set in Y and taking values in a third normed space. In the proof we use notation explained in Problem 3.1.6, page 119.

Theorem 1. The Chain Rule. If f is differentiable at x and if g is differentiable at f(x), then g ∘ f is differentiable at x, and

(g ∘ f)′(x) = g′(f(x)) ∘ f′(x)

Proof. Define F = g ∘ f, A = f′(x), y = f(x), B = g′(y), and

o_1(h) = f(x + h) − f(x) − Ah   (h ∈ X)
o_2(k) = g(y + k) − g(y) − Bk   (k ∈ Y)
φ(h) = Ah + o_1(h)

It is to be shown that F′(x) = BA. This requires a calculation as follows:

F(x + h) − F(x) − BAh = g(f(x + h)) − g(f(x)) − BAh
  = g[f(x) + Ah + o_1(h)] − g(y) − BAh
  = g[y + φ(h)] − g(y) − BAh
  = g(y) + Bφ(h) + o_2(φ(h)) − g(y) − BAh
  = B[Ah + o_1(h)] + o_2(φ(h)) − BAh
  = B o_1(h) + o_2(φ(h))

In order to see that this last expression is o(h), notice first that ||B o_1(h)|| ≤ ||B|| ||o_1(h)||. Hence this term is o(h). Now let ε > 0. Select δ_1 > 0 so that

||k|| < δ_1 ⟹ ||o_2(k)|| < ε||k|| / (||A|| + 1)

Select δ > 0 so that δ < δ_1/(||A|| + 1) and so that

||h|| < δ ⟹ ||o_1(h)|| < ||h||

Now let ||h|| < δ. Then we have

||φ(h)|| = ||Ah + o_1(h)|| ≤ ||A|| ||h|| + ||o_1(h)|| < (||A|| + 1)||h|| < (||A|| + 1)δ < δ_1

Consequently, using k = φ(h), we conclude that

||o_2(φ(h))|| < ε||φ(h)|| / (||A|| + 1) < ε||h||   •
The mean value theorem of elementary calculus does not have an exact analogue for mappings between general normed linear spaces. (An exception to this assertion occurs in the case when f : X → ℝ. See Theorem 2, below.) Even for functions f : ℝ → X, the expected mean-value theorem fails, as we now illustrate.

Example. Define f : ℝ → ℝ² by the equation f(t) = (cos t, sin t). We ask: Is the equation

f(2π) − f(0) = f′(t) · 2π

true for some t ∈ (0, 2π)? The answer is "No," because the left side of the equation is (0, 0), while f′(t) = (−sin t, cos t) ≠ (0, 0). •
However, the mean value theorem of elementary calculus does have a generalization to real-valued functions on a normed linear space. We present this first.
However, the mean value theorem of elementary calculus does have a gen-
eralization to real-valued functions on a normed linear space. We present this
first.

Theorem 2. Mean Value Theorem I. Let f be a real-valued mapping defined on an open set D in a normed linear space. Let a, b ∈ D. Assume that the line segment

[a, b] = {a + t(b − a) : 0 ≤ t ≤ 1}

lies in D. If f is continuous on [a, b] and differentiable on the open line segment (a, b), then for some ξ in (a, b),

f(b) − f(a) = f′(ξ)(b − a)

Proof. Put g(t) = f(a + t(b − a)). Then g is continuous on the interval [0, 1] and differentiable on (0, 1). By the chain rule,

g′(t) = f′(a + t(b − a))(b − a)

By the mean value theorem of elementary calculus, there is a τ in (0, 1) for which

f(b) − f(a) = g(1) − g(0) = g′(τ) = f′(a + τ(b − a))(b − a) = f′(ξ)(b − a)   •

Theorem 3. Mean Value Theorem II. Let f be a continuous
map of a compact interval [a, b] of the real line into a normed linear
space Y. If, for each x in (a, b), f'(x) exists and satisfies 11f'(x)11 ~ M,
then Ilf(b) - f(a)11 ~ M(b - a).

Proof. It suffices to prove that if a < α < β < b, then ‖f(β) − f(α)‖ ≤ M(b − a), because the desired result then follows by continuity. Also, it suffices to prove ‖f(β) − f(α)‖ ≤ (M + ε)(b − a) for an arbitrary positive ε. Let S be the set of all x in [α, β] such that

‖f(x) − f(α)‖ ≤ (M + ε)(x − α)

By continuity, S is a closed set. Let x₀ = sup S. Since S is compact, x₀ ∈ S. To complete the proof, the main task is to show that x₀ = β. Suppose that x₀ < β and look for a contradiction. Since f is differentiable at x₀, there is a positive δ such that δ < β − x₀ and

|h| < δ  ⟹  ‖f(x₀ + h) − f(x₀) − f'(x₀)h‖ < ε|h|



Put h = δ/2 and u = x₀ + δ/2. Then

‖f(u) − f(x₀) − f'(x₀)(u − x₀)‖ < ε(u − x₀)

Hence

‖f(u) − f(x₀)‖ < ‖f'(x₀)(u − x₀)‖ + ε(u − x₀) ≤ (M + ε)(u − x₀)

Since x₀ ∈ S, we have also

‖f(x₀) − f(α)‖ ≤ (M + ε)(x₀ − α)

Hence

‖f(u) − f(α)‖ ≤ ‖f(u) − f(x₀)‖ + ‖f(x₀) − f(α)‖ ≤ (M + ε)(u − α)

This proves that u ∈ S. Since u > x₀, we have a contradiction. Thus x₀ = β, β ∈ S, and

‖f(β) − f(α)‖ ≤ (M + ε)(β − α) < (M + ε)(b − a)  •



Theorem 4. Mean Value Theorem III. Let f be a map from an open set D in one normed linear space into another normed linear space. If the line segment

S = {ta + (1 − t)b : 0 ≤ t ≤ 1}

lies in D and if f'(x) exists at each point of S, then

‖f(b) − f(a)‖ ≤ ‖b − a‖ sup{‖f'(x)‖ : x ∈ S}

Proof. Define g(t) = f(ta + (1 − t)b) for 0 ≤ t ≤ 1. By the chain rule, g' exists and g'(t) = f'(ta + (1 − t)b)(a − b). By the second Mean Value Theorem,

‖f(b) − f(a)‖ = ‖g(1) − g(0)‖ ≤ sup{‖g'(t)‖ : 0 ≤ t ≤ 1} ≤ ‖b − a‖ sup{‖f'(x)‖ : x ∈ S}

Notice that g = f ∘ ℓ, where ℓ(t) = ta + (1 − t)b. Thus ℓ'(t) ∈ ℒ(ℝ, X). Hence in the formula for g', the term (a − b) is interpreted as a mapping from ℝ to X defined by t ↦ t·(a − b).  •

Theorem 5. Let X and Y be normed spaces, D a connected open set in X, and f a differentiable map of D into Y. If f'(x) = 0 for all x ∈ D, then f is a constant function.

Proof. Since f'(x) exists for all x ∈ D, f is continuous on D (by Theorem 3 of Section 3.1, page 117). Select x₀ ∈ D and define A = {x ∈ D : f(x) = f(x₀)}. This is a closed subset of D (i.e., the intersection of D with a closed set in X). But we can prove that A is also open. Indeed, if x ∈ A, then there is a ball B(x, r) ⊂ D, because D is open. If y ∈ B(x, r), then the line segment from x to y lies in B(x, r). By the Mean Value Theorem III,

‖f(x) − f(y)‖ ≤ ‖x − y‖ sup{‖f'(tx + (1 − t)y)‖ : 0 ≤ t ≤ 1} = 0

So f(y) = f(x) = f(x₀). This means that y ∈ A. Hence B(x, r) ⊂ A. Thus A is open (it contains a neighborhood of each of its points). A connected set contains no nonempty proper subset that is both open and closed in it. Since A is open and closed and nonempty, A = D.  •

The connectedness of D is essential in the preceding theorem, even if D ⊂ ℝ. For example, suppose that D = (0,1) ∪ (2,3) and that f(x) = 1 on (0,1) while f(x) = 2 on (2,3). Then f is certainly not constant, although f'(x) = 0 at each point of D.

Problems 3.2

1. Let X = C[0,1] and let f(x) = ‖x‖₁ = ∫₀¹ |x(t)| dt. Is f differentiable?

2. Prove that the norm in a real Hilbert space is differentiable except at 0. Hint: Find the derivative of ‖x‖² first.

3. Let X be a real Hilbert space and v ∈ X. Define f(x) = ‖x‖²v. What is f'(x)?

4. Let f be a continuous real-valued map on a Hilbert space. If f'(x₀) exists, then there is a direction of steepest descent at x₀. This means that there exists a vector u of norm 1 for which (d/dt)f(x₀ + tu)|ₜ₌₀ is a maximum. What is u?

5. Let f be a differentiable and continuous real-valued function defined on an open set D in a normed linear space. Suppose that x₀ ∈ D and that f(x₀) ≤ f(x) for all x ∈ D. Prove that f'(x₀) = 0.

6. Let D be a bounded open set in a finite-dimensional normed linear space, and let D̄ be the closure of D. Let f : D̄ → ℝ be continuous. Assume that f is differentiable in D and that f is constant on D̄ \ D (the boundary of D). Show that f'(x) = 0 for some x ∈ D. (Hint: A continuous real-valued function on a compact set achieves its maximum and minimum. Use Problem 5.)

7. Let K be a closed convex set contained in an open set D contained in a Banach space X. Let f : D → X. Assume that f'(x) exists for each x ∈ K and that f(K) ⊂ K. Assume also that sup{‖f'(x)‖ : x ∈ K} < 1. Show that f has a unique fixed point in K. (Banach's Theorem, page 177, is helpful.)

8. The mean value theorem for functions f : ℝ → ℝ states that f(x + h) − f(x) = hf'(x + θh) for some θ ∈ (0,1). Show that this is not valid for complex functions. Try e^z, z = 0, h = 2πi, and at least one other function.

9. Let f be a differentiable map from a normed space X to a normed space Y. Let y₀ be a point of Y such that f' is invertible at each point of f⁻¹(y₀). Prove that f⁻¹(y₀) is a discrete set.

10. Write out the conclusion of Theorem 2 in the case that X = ℝⁿ, using the partial derivatives ∂f/∂xᵢ.

3.3 Newton's Method

The elementary form of Newton's method is used to find a zero of a function f : ℝ → ℝ (or a "root" of the equation f(x) = 0). The method is iterative and employs the formula xₙ₊₁ = xₙ − f(xₙ)/f'(xₙ). Its rationale is as follows: Suppose that xₙ is an approximation to a zero of f. We try to find a suitable correction to xₙ so as to obtain the nearby root. That is, we try to determine h so that f(xₙ + h) = 0. By Taylor's Theorem,

0 = f(xₙ + h) = f(xₙ) + hf'(xₙ) + o(h)

So, by ignoring the o(h) term, we are led to h = −f(xₙ)/f'(xₙ). If f is now a mapping of one Banach space, X, into another, Y, the same rationale leads us to xₙ₊₁ = xₙ − [f'(xₙ)]⁻¹ f(xₙ). Of course f'(xₙ) is a linear operator from X into Y, and the inverse [f'(xₙ)]⁻¹ will have to be assumed to exist as a bounded linear operator from Y to X. First, we examine the simple case, when f : ℝ → ℝ.
Theorem 1. Let f be a function from ℝ to ℝ. Assume that f″ is bounded, that f(r) = 0, and that f'(r) ≠ 0. Let δ be a positive number such that

ρ = ½ δ · max{|f″(x)| : |x − r| ≤ δ} ÷ min{|f'(x)| : |x − r| ≤ δ} < 1

If Newton's method is started with x₀ ∈ [r − δ, r + δ], then for all n,

|xₙ₊₁ − r| ≤ ρ|xₙ − r|      and      |xₙ₊₁ − r| ≤ (ρ/δ)|xₙ − r|²

Proof. Define eₙ = xₙ − r. By Taylor's Theorem,

0 = f(r) = f(xₙ − eₙ) = f(xₙ) − eₙ f'(xₙ) + ½ eₙ² f″(ξₙ)

In this equation, the point ξₙ is between xₙ and r. Hence |ξₙ − r| ≤ |xₙ − r| = |eₙ|. Using this we have

eₙ₊₁ = xₙ₊₁ − r = xₙ − f(xₙ)/f'(xₙ) − r = eₙ − f(xₙ)/f'(xₙ)
     = [eₙ f'(xₙ) − f(xₙ)]/f'(xₙ) = ½ eₙ² f″(ξₙ)/f'(xₙ)

Since |x₀ − r| ≤ δ by hypothesis, we have |e₀| ≤ δ and |ξ₀ − r| ≤ δ. Hence |e₁| ≤ ½ e₀² |f″(ξ₀)|/|f'(x₀)| ≤ ½ e₀² · 2ρ/δ ≤ ρ|e₀|. By repeating this we establish that |xₙ₊₁ − r| ≤ ρ|xₙ − r| (convergence). Similarly, we have |e₁| ≤ (ρ/δ)e₀² and |eₙ₊₁| ≤ (ρ/δ)eₙ² (quadratic convergence).  •
The successive errors eₙ in the preceding theorem obey an inequality |eₙ₊₁| ≤ C|eₙ|². Suppose, for example, that C = 1 and |e₀| ≤ 10⁻¹. Then |e₁| ≤ 10⁻², |e₂| ≤ 10⁻⁴, |e₃| ≤ 10⁻⁸, and so on. For an iterative process, this is an extraordinarily favorable state of affairs, as it indicates a doubling of the number of significant digits in the numerical solution at each step.
Example 1. For finding the square root of a given positive number a, one can solve the equation x² − a = 0 by Newton's method. The iteration formula turns out to be

xₙ₊₁ = ½(xₙ + a/xₙ)

This formula was known to the ancient Greeks and is called Heron's formula. In order to see how well it performs, we can use a computer system such as Mathematica, Maple, or Matlab to obtain the Newton approximations to √2. The iteration function is g(x) = (x + 2/x)/2, and a reasonable starting point is x₀ = 1. Mathematica is capable of displaying xₙ with any number of significant figures; we chose 60. The input commands to Mathematica are shown here. (Each one should be separated from the following one by a semicolon, as shown.) The output, not shown, indicates that the seventh iterate has at least 60 correct digits!

g[x_] := (x + (2/x))/2;  g[1];  N[%, 60];  g[%];  g[%];
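
Readers without access to a symbolic system can reproduce the same experiment with a short Python sketch such as the following, using the standard decimal module for high-precision arithmetic. (The 70-digit working precision and the function name are arbitrary choices.)

from decimal import Decimal, getcontext

getcontext().prec = 70              # work with a little more than 60 digits

def heron_step(x, a=Decimal(2)):
    # One Newton (Heron) step for f(x) = x^2 - a.
    return (x + a / x) / 2

x = Decimal(1)                      # starting point x0 = 1
for n in range(1, 8):
    x = heron_step(x)
    print(n, str(+x)[:62])          # xn to roughly 60 significant digits

Each step roughly doubles the number of correct digits, as the quadratic-convergence estimate |eₙ₊₁| ≤ C|eₙ|² predicts.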

Example 2. We illustrate the mechanics of Newton's method in higher dimensions with the following problem:

x − y + 1 = 0
x² + y² − 4 = 0

where x and y are real variables. We have here a mapping f : ℝ² → ℝ², and we seek one or more zeros of f. The Newton iteration is uₙ₊₁ = uₙ − [f'(uₙ)]⁻¹ f(uₙ), where uₙ = (xₙ, yₙ) ∈ ℝ². The derivative f'(u) is given by the Jacobian matrix J. We find that

J = [ 1    −1 ]          J⁻¹ = 1/(2x + 2y) [  2y   1 ]
    [ 2x   2y ]                            [ −2x   1 ]

Hence the iteration formula, in detail, is this:

[xₙ₊₁]   [xₙ]         1          [  2yₙ   1 ] [ xₙ − yₙ + 1   ]
[yₙ₊₁] = [yₙ]  −  ―――――――――――――  [ −2xₙ   1 ] [ xₙ² + yₙ² − 4 ]
                   2xₙ + 2yₙ

If we start at u₀ = (0, 2)ᵀ, the next vectors are u₁ = (1, 2)ᵀ and u₂ = (5/6, 11/6)ᵀ. A symbolic computation system such as those mentioned above can be used here, too. The problem is chosen intentionally as one easily visualized: One seeks the points where a line intersects a circle. See Figure 3.1.  •

Figure 3.1
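
A rough numerical counterpart of this example, written in Python with NumPy (the function names are our own), carries out the iteration from u₀ = (0, 2):

import numpy as np

def F(u):
    x, y = u
    return np.array([x - y + 1.0, x**2 + y**2 - 4.0])

def J(u):
    x, y = u
    return np.array([[1.0, -1.0], [2.0 * x, 2.0 * y]])

u = np.array([0.0, 2.0])                    # u0
for n in range(6):
    u = u - np.linalg.solve(J(u), F(u))     # Newton step: solve J(u) h = F(u)
    print(n + 1, u)

The first two iterates agree with u₁ = (1, 2) and u₂ = (5/6, 11/6), and the sequence settles rapidly on one of the two points where the line meets the circle.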
The remarkable theorem of Kantorovich is presented next. This theorem:
(1) Proves the existence of a zero of a function from suitable hypotheses, and (2)
Establishes the quadratic convergence of the Newton algorithm. When it was
published in 1948, this theorem gave new information about Newton's method
even when the domain space X was two-dimensional.

Theorem 2. Kantorovich Theorem on Newton's Method. Let f : X → Y be a map from a Banach space X into a Banach space Y. Let x₀ be a point of X where f'(x₀) exists and is invertible. Define

a₀ = ‖f'(x₀)⁻¹ f(x₀)‖        b₀ = ‖f'(x₀)⁻¹‖
S = {x ∈ X : ‖x − x₀‖ ≤ 2a₀}
k = 2 sup{‖f'(x) − f'(v)‖ / ‖x − v‖ : x, v ∈ S, x ≠ v}

If f is differentiable in S and if a₀b₀k ≤ ½, then f has a zero in S. Newton's iteration started at x₀ converges quadratically to the zero.

Proof. At the nth step we will have xₙ, aₙ, bₙ such that

(1) xₙ ∈ S
(2) f'(xₙ)⁻¹ exists
(3) ‖f'(xₙ)⁻¹ f(xₙ)‖ ≤ aₙ
(4) ‖f'(xₙ)⁻¹‖ ≤ bₙ
(5) aₙbₙk ≤ ½
(6) aₙ ≤ a₀/2ⁿ

Observe that 1 − aₙbₙk ≥ ½, and that properties (1)–(6) are true for n = 0. Now define

xₙ₊₁ = xₙ − f'(xₙ)⁻¹ f(xₙ),    bₙ₊₁ = bₙ(1 − aₙbₙk)⁻¹,    aₙ₊₁ = ½ k bₙ₊₁ aₙ²

We will prove properties (1)–(6) for n + 1.


(I) xₙ₊₁ is well-defined because of (2).

(II) ‖xₙ − xₙ₊₁‖ = ‖f'(xₙ)⁻¹ f(xₙ)‖ ≤ aₙ.

(III)

‖xₙ₊₁ − x₀‖ ≤ ‖xₙ₊₁ − xₙ‖ + ‖xₙ − xₙ₋₁‖ + ⋯ + ‖x₁ − x₀‖
           ≤ aₙ + aₙ₋₁ + ⋯ + a₀
           ≤ a₀(1/2ⁿ + 1/2ⁿ⁻¹ + ⋯ + ½ + 1) ≤ 2a₀

Thus xₙ₊₁ ∈ S.

(IV)

aₙ₊₁bₙ₊₁k = (½ k bₙ₊₁ aₙ²) bₙ₊₁ k
          = ½ (aₙbₙ₊₁k)²
          = ½ (aₙbₙk)²(1 − aₙbₙk)⁻²
          ≤ 2(aₙbₙk)² ≤ ½

Observe that aₙbₙ₊₁k ≤ 1.

(V) Let H = f'(xₙ)⁻¹ f'(xₙ₊₁). Then H is invertible because

‖I − H‖ = ‖f'(xₙ)⁻¹{f'(xₙ) − f'(xₙ₊₁)}‖
        ≤ ‖f'(xₙ)⁻¹‖ ‖f'(xₙ) − f'(xₙ₊₁)‖
        ≤ bₙ · ½k‖xₙ − xₙ₊₁‖ ≤ bₙ · ½k aₙ ≤ ¼

We know also that ‖H⁻¹‖ ≤ (1 − ½aₙbₙk)⁻¹. It follows that f'(xₙ₊₁) is invertible, since it is a product of invertible operators, f'(xₙ₊₁) = f'(xₙ)H.

(VI) From (V) we have

‖f'(xₙ₊₁)⁻¹‖ = ‖H⁻¹f'(xₙ)⁻¹‖ ≤ ‖H⁻¹‖ ‖f'(xₙ)⁻¹‖
            ≤ (1 − ½aₙbₙk)⁻¹ bₙ ≤ bₙ(1 − aₙbₙk)⁻¹ = bₙ₊₁

(VII) Define g(x) = x − f'(xₙ)⁻¹ f(x). Then g(xₙ) = xₙ₊₁. If x ∈ S, then

‖g'(x)‖ = ‖I − f'(xₙ)⁻¹ f'(x)‖ = ‖f'(xₙ)⁻¹{f'(xₙ) − f'(x)}‖
        ≤ ‖f'(xₙ)⁻¹‖ ‖f'(xₙ) − f'(x)‖ ≤ bₙ · ½k‖xₙ − x‖

(VIII) Using the Mean Value Theorem and parts (VII) and (II), we have

‖f'(xₙ)⁻¹ f(xₙ₊₁)‖ = ‖xₙ₊₁ − g(xₙ₊₁)‖ = ‖g(xₙ) − g(xₙ₊₁)‖
                  ≤ ‖g'(x)‖ ‖xₙ − xₙ₊₁‖ ≤ ‖g'(x)‖ aₙ

Here x is some point on the line segment joining xₙ to xₙ₊₁. From part (VII), it follows that

‖f'(xₙ)⁻¹ f(xₙ₊₁)‖ ≤ bₙ · ½k‖xₙ − x‖ aₙ ≤ bₙ · ½k‖xₙ − xₙ₊₁‖ aₙ ≤ ½ bₙ k aₙ²

(IX)

‖f'(xₙ₊₁)⁻¹ f(xₙ₊₁)‖ = ‖H⁻¹ f'(xₙ)⁻¹ f(xₙ₊₁)‖
                    ≤ ‖H⁻¹‖ ‖f'(xₙ)⁻¹ f(xₙ₊₁)‖
                    ≤ (1 − ½aₙbₙk)⁻¹ bₙ · ½k aₙ²
                    ≤ (1 − aₙbₙk)⁻¹ bₙ · ½k aₙ² = ½ bₙ₊₁ aₙ² k = aₙ₊₁

(X) Using the observation made in (IV), we have

aₙ₊₁ = ½(aₙbₙ₊₁k)aₙ ≤ ½ aₙ ≤ a₀/2ⁿ⁺¹

This completes the induction phase of the proof.


(XI) If m > n, then from parts (II) and (X) we have

‖xₙ − xₘ‖ ≤ ‖xₙ − xₙ₊₁‖ + ‖xₙ₊₁ − xₙ₊₂‖ + ⋯ + ‖xₘ₋₁ − xₘ‖
         ≤ aₙ + aₙ₊₁ + ⋯ ≤ aₙ(1 + ½ + ¼ + ⋯) = 2aₙ

Notice that aₙ → 0 by (6). Hence [xₙ] is a Cauchy sequence. By the completeness of X there is a point x* such that xₙ → x*. We have x* ∈ S by (1).
(XII) Define hₙ = aₙbₙk. From part (IV), we have

hₙ₊₁ = aₙ₊₁bₙ₊₁k ≤ 2(aₙbₙk)² = 2hₙ²

Therefore,

2hₙ ≤ (2hₙ₋₁)² ≤ (2h₀)^(2ⁿ),  that is,  hₙ ≤ ½(2h₀)^(2ⁿ)

(XIII)

aₙ = ½kbₙaₙ₋₁² = ½kbₙ₋₁(1 − aₙ₋₁bₙ₋₁k)⁻¹aₙ₋₁²
   ≤ (kbₙ₋₁aₙ₋₁)aₙ₋₁ = hₙ₋₁aₙ₋₁

Repeating this inequality, we obtain

aₙ ≤ hₙ₋₁hₙ₋₂ ⋯ h₀ a₀ ≤ [½(2h₀)^(2ⁿ⁻¹)][½(2h₀)^(2ⁿ⁻²)] ⋯ [½(2h₀)^(2⁰)] a₀
   = a₀(½)ⁿ(2h₀)^(2ⁿ⁻¹ + 2ⁿ⁻² + ⋯ + 1)
   = a₀2⁻ⁿ(2h₀)^(2ⁿ − 1)

(XIV) ‖xₙ − x*‖ ≤ 2aₙ ≤ 2a₀·2⁻ⁿ·(2a₀b₀k)^(2ⁿ − 1) = 2a₀·2⁻ⁿ·θ^(2ⁿ − 1), where θ ≡ 2a₀b₀k. If θ < 1, this is quadratic convergence. Here x* = lim xₙ. The sequence [xₙ] has the Cauchy property, by part (XI).

(XV) In order to prove that f(x*) = 0, write ‖f(xₙ)‖ = ‖f'(xₙ)(xₙ − xₙ₊₁)‖ ≤ ‖f'(xₙ)‖ ‖xₙ − xₙ₊₁‖. Now ‖xₙ − xₙ₊₁‖ → 0, and ‖f'(xₙ)‖ is bounded as a function of n, because

‖f'(xₙ)‖ ≤ ‖f'(xₙ) − f'(x₀)‖ + ‖f'(x₀)‖ ≤ ½k‖xₙ − x₀‖ + ‖f'(x₀)‖

Since f is continuous, f(x*) = f(lim xₙ) = lim f(xₙ) = 0.  •

For the preceding theorem and the one to follow (for which we do not give the
proof), we refer the reader to [Gold] and to [KA].

Theorem 3. Kantorovich's Theorem on the Simplified Newton Method. Assume the hypotheses of Kantorovich's Theorem, except that the radius of S is set equal to the quantity [1 − √(1 − 2a₀b₀k)]/(b₀k). Then the simplified Newton iteration

xₙ₊₁ = xₙ − [f'(x₀)]⁻¹ f(xₙ)

converges at least geometrically to a zero of f in S.
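
To see the difference between the full and the simplified iterations, one can redo the line-and-circle system of Example 2 with the derivative frozen at the starting point. The following Python sketch is our own illustration, not part of the theorem.

import numpy as np

def F(u):
    x, y = u
    return np.array([x - y + 1.0, x**2 + y**2 - 4.0])

u0 = np.array([0.0, 2.0])
A = np.array([[1.0, -1.0], [2.0 * u0[0], 2.0 * u0[1]]])   # f'(x0), reused at every step

u = u0.copy()
for n in range(25):
    u = u - np.linalg.solve(A, F(u))        # simplified Newton step
print(u)

In this instance the iterates still converge to the intersection point near (0.823, 1.823), but only geometrically, in contrast with the quadratic convergence of the full Newton iteration.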

The next theorem concerns a variant of Newton's method due to R. E. Moore [Moo]. In this theorem, we have two normed linear spaces X and Y. An open set Ω in X is given, and a mapping F : Ω → Y is prescribed. It is known that F has a zero x* in Ω and that F'(x*) exists. We wish to determine x*. For this purpose we set up an iterative scheme of the form

(7)  G(x) = x − A(x)F(x)

Here, A(x) ∈ ℒ(Y, X), and we assume that

(8)  sup{‖A(x)‖ : x ∈ Ω} = M < ∞

It is intended that A(x) be an approximate inverse of F'(x*). We assume that

(9)  sup{‖I − A(x)F'(x*)‖ : x ∈ Ω} = λ < 1

Theorem 4. There is a neighborhood of x* such that the iteration sequence defined in Equation (7) converges to x* for arbitrary starting points in that neighborhood.

Proof. Select ε > 0 such that

(10)  B ≡ λ + Mε < 1

By the definition of the Fréchet derivative F'(x*), we can write

(11)  F(x) = F(x*) + F'(x*)(x − x*) + η(x)

where η(x) is o(‖x − x*‖). In particular, we can select δ > 0 so that

(12)  ‖x − x*‖ ≤ δ  ⟹  [x ∈ Ω and ‖η(x)‖ ≤ ε‖x − x*‖]

From (11), using the fact that F(x*) = 0 and the definition of G, we have

G(x) − x* = x − x* − A(x)F(x)
          = x − x* − A(x)[F'(x*)(x − x*) + η(x)]
          = x − x* − A(x)F'(x*)(x − x*) − A(x)η(x)
          = [I − A(x)F'(x*)](x − x*) − A(x)η(x)

If we assume further that ‖x − x*‖ ≤ δ, then

‖G(x) − x*‖ ≤ λ‖x − x*‖ + M‖η(x)‖
           ≤ λ‖x − x*‖ + Mε‖x − x*‖
           = (λ + Mε)‖x − x*‖ = B‖x − x*‖

If the starting point x₀ for the iteration is within distance δ of x*, then

‖x₁ − x*‖ = ‖G(x₀) − x*‖ ≤ B‖x₀ − x*‖ ≤ Bδ

Continuing, we have

‖x₂ − x*‖ = ‖G(x₁) − x*‖ ≤ B‖x₁ − x*‖ ≤ B²δ

In general, ‖xₙ − x*‖ ≤ Bⁿδ, and hence xₙ → x*.  •

Corollary 1. If there is an r > 0 such that

sup{‖F'(x)⁻¹‖ : ‖x − x*‖ ≤ r} < ∞   and   sup{‖I − F'(x)⁻¹F'(x*)‖ : ‖x − x*‖ ≤ r} < 1

then there is a neighborhood of x* in which Newton's method converges from arbitrary starting points.

Corollary 2. If X = Y and if ‖I − F'(x*)‖ < 1, then the iteration xₙ₊₁ = xₙ − F(xₙ) will converge to x* if started sufficiently near to x*.

Corollary 3. If ‖I − F'(x₀)⁻¹F'(x*)‖ < 1, then the simplified Newton iteration xₙ₊₁ = xₙ − F'(x₀)⁻¹F(xₙ) converges to x* if started sufficiently near to x*.

Applications to nonlinear integral equations. In the following paragraphs we shall discuss the application of Newton's method and the Neumann Theorem to nonlinear integral equations. A rather general model problem is considered:

(13)  x(s) − λ ∫₀¹ g(s, t, x(t)) dt = v(s)

Here λ, v, and g are given. We assume that v ∈ C[0,1] and that g is continuous on the 3-dimensional set

D = {(s, t, u) : 0 ≤ s ≤ 1,  0 ≤ t ≤ 1,  −∞ < u < ∞}

Also, we assume that |g(s, t, u₁) − g(s, t, u₂)| ≤ k|u₁ − u₂| in the domain D.

Theorem 5. If |λ|k < 1, then the integral equation (13) above has a unique solution.

Proof. Apply the Contraction Mapping Theorem (Chapter 4, Section 2, page 177) to the mapping F defined on C[0,1] by (Fx)(s) = v(s) + λ ∫₀¹ g(s, t, x(t)) dt.
We see easily that

‖Fx₁ − Fx₂‖ = sup_s |(Fx₁)(s) − (Fx₂)(s)|
            ≤ |λ| sup_s ∫₀¹ |g(s, t, x₁(t)) − g(s, t, x₂(t))| dt
            ≤ |λ| ∫₀¹ k|x₁(t) − x₂(t)| dt
            ≤ |λ|k‖x₁ − x₂‖  •

If |λ|k < 1, then the sequence xₙ₊₁ = F(xₙ) will converge, in the space C[0,1], to a solution of the integral equation. In this process, x₀ can be an arbitrary starting point in C[0,1]. Newton's method can also be used, provided that we start at a point sufficiently close to the solution. For Newton's method, we define the mapping f by

(f(x))(s) = x(s) − λ ∫₀¹ g(s, t, x(t)) dt − v(s)

We require f', which is given by

[f'(x)h](s) = h(s) − λ ∫₀¹ g₃(s, t, x(t)) h(t) dt

where g₃ is the partial derivative of g with respect to its third argument, i.e., g₃(s, t, u) = (∂/∂u)g(s, t, u).

Lemma. If g₃ exists on the domain

Q = {(s, t, u) : 0 ≤ s ≤ 1,  0 ≤ t ≤ 1,  |u| ≤ ‖x‖}

and if

lim_{r→0} (1/r)[g(s, t, u + r) − g(s, t, u) − r g₃(s, t, u)] = 0

uniformly in Q, then f'(x) is as given above.

The next step in using Newton's method is to compute f'(x)⁻¹. Observe that f'(x) = I − λA, where A is the integral operator whose kernel is g₃(s, t, x(t)). Explicitly,

(Ah)(s) = ∫₀¹ g₃(s, t, x(t)) h(t) dt

This is a linear operator, since x is fixed. If |λ| ‖A‖ < 1, then I − λA is invertible, and by the Neumann Theorem in Chapter 1, Section 5, page 28, we have

f'(x)⁻¹ = Σ_{k=0}^∞ (λA)^k = I + λA + λ²A² + λ³A³ + ⋯ = I + λB

where B = A + λA² + λ²A³ + ⋯.


If A is any integral operator of the form

(Ah)(s) = 11 k(s, t)h(t) dt

then we can define a companion operator B depending on a real parameter A by


the equation

(14) R = A-I [(I - AA)-I - I]


Theorem 6. The operator B, as just defined, is also an integral operator, having the form

(15)  (Bh)(s) = ∫₀¹ r(s, t) h(t) dt

The kernel r satisfies these two integral equations:

(16)  r(s, t) = k(s, t) + λ ∫₀¹ k(s, u) r(u, t) du
      r(s, t) = k(s, t) + λ ∫₀¹ k(u, t) r(s, u) du

Proof. From the definition of B we have λB = (I − λA)⁻¹ − I, or I + λB = (I − λA)⁻¹. Consequently, we have

(I + λB)(I − λA) = (I − λA)(I + λB) = I

I + λB − λA − λ²BA = I − λA + λB − λ²AB = I

and

(17)  B = (I + λB)A = A(I + λB)

Conversely, from Equation (17) we can prove Equation (14). Thus Equation (17) serves to characterize B. Now assume that r satisfies Equations (16) and that B is defined by Equation (15). We will show that B must satisfy Equation (17), and hence Equation (14). We have

[(A + λBA)h](s) = ∫₀¹ k(s, t) h(t) dt + λ ∫₀¹ r(s, u)(Ah)(u) du
               = ∫₀¹ k(s, t) h(t) dt + λ ∫₀¹ ∫₀¹ r(s, u) k(u, t) h(t) dt du
               = ∫₀¹ { k(s, t) + λ ∫₀¹ r(s, u) k(u, t) du } h(t) dt
               = ∫₀¹ r(s, t) h(t) dt = (Bh)(s)

This proves that B = A + λBA. Similarly, B = A + λAB.  •



Example 2. ([Gold], page 160.) Solve the integral equation

x(s) − ∫₀¹ s t Arctan x(t) dt = 1 + s² − 0.485 s

This conforms to the general theory outlined above. We have as kernel g(s, t, u) = s t Arctan u, and g₃(s, t, u) = s t/(1 + u²). We take as starting point for the Newton iteration the constant function x₀(t) = 3/2. Then

g₃(s, t, x₀(t)) = s t/(1 + 9/4) = (4/13) s t = a s t

Then f'(x₀) = I − A, where (Ax)(s) = ∫₀¹ a s t x(t) dt. Also we can express f'(x₀)⁻¹ = (I − A)⁻¹ = I + B, as in the preceding proof. We know that B is an integral operator whose kernel, r, satisfies the equations

r(s, t) = a s t + ∫₀¹ a s u r(u, t) du
r(s, t) = a s t + ∫₀¹ a t u r(s, u) du

From these equations it is evident that r(s, t)/(s t) is on the one hand a function of t only, and on the other hand a function of s only. Thus r(s, t)/(s t) is constant, say β, and r(s, t) = β s t. Substituting in the integral equation for r and solving gives us β = 12/35. One step in the Newton algorithm will be x₁ = x₀ − f'(x₀)⁻¹ f(x₀). We compute y = f(x₀) as follows:

y(s) = x₀(s) − ∫₀¹ g(s, t, x₀(t)) dt − v(s)
     = 3/2 − ∫₀¹ s t Arctan(3/2) dt − 1 − s² + 0.485 s
     = 1/2 − 0.0063968616 s − s²

Then x₁ = x₀ − f'(x₀)⁻¹ y = x₀ − (I + B)y = x₀ − y − By. Hence

x₁(s) = 3/2 − [1/2 − γ s − s²] − ∫₀¹ β s t [1/2 − γ t − t²] dt      (γ ≈ 0.0063968616)
      = 1 + s² + (0.0071279315) s
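
A crude numerical check of this example can be made by discretizing the integral with the trapezoidal rule and applying the contraction (Picard) iteration of Theorem 5; the grid size and iteration count below are arbitrary choices.

import numpy as np

n = 201
t = np.linspace(0.0, 1.0, n)
w = np.full(n, 1.0 / (n - 1))
w[0] = w[-1] = 0.5 / (n - 1)                 # trapezoid weights

v = 1.0 + t**2 - 0.485 * t
x = np.full(n, 1.5)                          # starting point x0 = 3/2

for _ in range(40):
    # Picard step: x <- v + integral of s*t*Arctan(x(t)) dt
    x = v + t * np.sum(w * t * np.arctan(x))

print(np.max(np.abs(x - (1.0 + t**2))))      # the solution is very close to 1 + s^2

The computed solution differs from 1 + s² only in a small multiple of s, which is consistent with the Newton iterate x₁ found above.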

Problems 3.3

1. For the one-dimensional version of Newton's method, prove that if r is a root of multiplicity m, then quadratic convergence in the algorithm can be preserved by defining

xₙ₊₁ = xₙ − m f(xₙ)/f'(xₙ)

2. Prove the corollaries, giving in each case the precise assumptions that must be made concerning the starting points.

3. Let e₀, e₁, … be a sequence of positive numbers satisfying eₙ₊₁ ≤ c eₙ². Find necessary and sufficient conditions for the convergence limₙ eₙ = 0.

4. Let f be a function from ℝ to ℝ that satisfies the inequalities f' > 0 and f″ > 0. Prove that if f has a zero, then the zero is unique, and Newton's iteration, started at any point, converges to the zero.

5. How must the analysis in Theorem 1 be modified to accommodate functions from ℂ to ℂ? (Remember that the Mean Value Theorem in its real-variable form is not valid.)

6. If r is a zero of a function f, then the corresponding "basin of attraction" is the set of all x such that the Newton sequence starting at x converges to r. For the function f(z) = z² + 1, z ∈ ℂ, and the zero r = i, prove that the basin of attraction contains the disk of radius ½ about r.

3.4 Implicit Function Theorems

In this section we give several versions of the Implicit Function Theorem and prove its corollary, the Inverse Function Theorem. Theorems in this broad category are often used to establish the existence of solutions to nonlinear equations of the form f(x) = y. The conclusions are typically local in nature, and describe how the solution x depends on y in a neighborhood of a given solution (x₀, y₀). Usually, there will be a hypothesis involving invertibility of the derivative f'(x₀).

The intuition gained from examining some simple cases proves to be completely reliable in attacking very general cases. Consider, then, a function F : ℝ² → ℝ. We ask whether the equation F(x, y) = 0 defines y to be a unique function of x. For example, we can ask this question for the equation

(1)  x + y² − 1 = 0      (x, y ∈ ℝ)

This can be "solved" to yield y = √(1 − x). The graph of this is shown in the accompanying Figure 3.2. It is clear that we cannot let x be the point A in the figure, because there is no corresponding y for which F(x, y) ≡ x + y² − 1 = 0. One must start with a point (x₀, y₀) like B in the figure, where we already have F(x₀, y₀) = 0. Finally, observe that at the point C there will be a difficulty, for there are values of x near C to which no y's correspond. This is a point

where dy/dx = ∞. Recall that if y = y(x) and if F(x, y(x)) = 0, then y' can be obtained from the equation

D₁F(x, y(x)) + D₂F(x, y(x)) y'(x) = 0

(In this equation, Dᵢ is partial differentiation with respect to the ith argument.) Thus y' = −D₁F/D₂F, and the condition y'(x₀) = ∞ corresponds to D₂F(x₀, y₀) = 0. In this example, notice that another function arises from Equation (1), namely

y = −√(1 − x)

In a neighborhood of (1, 0), both functions solve Equation (1), and there is a failure of uniqueness.
Figure 3.2
In the classical implicit function theorem we have a function F of two real variables in class C¹. That means simply that ∂F/∂x and ∂F/∂y exist and are continuous. It is convenient to denote these partial derivatives by F₁ and F₂.

Theorem 1. Classical Implicit Function Theorem. Let F be a C¹-function on the square

S = {(x, y) : |x − x₀| ≤ δ,  |y − y₀| ≤ δ}

If F(x₀, y₀) = 0 and F₂(x₀, y₀) ≠ 0, then there is a continuously differentiable function f defined in a neighborhood of x₀ such that y₀ = f(x₀) and F(x, f(x)) = 0 in that neighborhood. Furthermore, f'(x) = −F₁(x, y)/F₂(x, y), where y = f(x).

Proof. Assume that F₂(x₀, y₀) > 0. Then by continuity, F₂(x, y) > α > 0 in a neighborhood of (x₀, y₀), which we assume to be the original δ-neighborhood. The function y ↦ F(x₀, y) is strictly increasing for y₀ − δ ≤ y ≤ y₀ + δ. Hence

F(x₀, y₀ − δ) < F(x₀, y₀) = 0 < F(x₀, y₀ + δ)

By continuity there is an ε in (0, δ) such that F(x, y₀ − δ) < 0 < F(x, y₀ + δ) if |x − x₀| < ε. By continuity, there corresponds to each such x a value of y such that F(x, y) = 0 and y₀ − δ < y < y₀ + δ. If there were two such y's, then by Rolle's theorem, F₂(x, y) = 0 at some point, contrary to hypothesis. Hence y is unique, and we may put y = f(x). Then we have F(x, f(x)) = 0 and y₀ = f(x₀).

Now fix x₁ in the ε-neighborhood of x₀. Put y₁ = f(x₁). Let Δx be a small number and let y₁ + Δy = f(x₁ + Δx). Then

0 = F(x₁ + Δx, y₁ + Δy)
  = F₁(x₁ + θΔx, y₁ + θΔy)Δx + F₂(x₁ + θΔx, y₁ + θΔy)Δy

for an appropriate θ satisfying 0 ≤ θ ≤ 1. (This is the Mean Value Theorem for a function from ℝ² to ℝ. See Problem 3.2.8, page 124.) This equation gives us

Δy/Δx = −F₁(x₁ + θΔx, y₁ + θΔy) / F₂(x₁ + θΔx, y₁ + θΔy)

As Δx → 0 the right side remains bounded. Hence, so does the left side. This proves that as Δx converges to 0, Δy also converges to 0. Hence f is continuous at x₁. After all, Δy = f(x₁ + Δx) − f(x₁). Furthermore,

f'(x₁) = lim_{Δx→0} Δy/Δx = −F₁(x₁, y₁)/F₂(x₁, y₁)

Therefore, f is differentiable at x₁. The formula can be written

f'(x) = −F₁(x, f(x))/F₂(x, f(x))

and this shows that f' is continuous at x, provided that x is in the open interval (x₀ − ε, x₀ + ε).  •
Theorem 2. Implicit Function Theorem for Many Variables. Let F : ℝⁿ × ℝ → ℝ, and suppose that F(x₀, y₀) = 0 for some x₀ ∈ ℝⁿ and y₀ ∈ ℝ. If all n + 1 partial derivatives DᵢF exist and are continuous in a neighborhood of (x₀, y₀) and if Dₙ₊₁F(x₀, y₀) ≠ 0, then there is a continuously differentiable function f defined on a neighborhood of x₀ such that F(x, f(x)) = 0, f(x₀) = y₀, and

Dᵢf(x) = −DᵢF(x, f(x))/Dₙ₊₁F(x, f(x))      (1 ≤ i ≤ n)

Proof. This is left as a problem (Problem 3.4.4).  •

Example 1. F(x, y) = x² + y² + 1, or x² + y², or x² + y² − 1. (Three phenomena are illustrated.)  •
If we expect to generalize the preceding theorems to normed linear spaces, there will be several difficulties. Of course, division by F₂ will become multiplication by F₂⁻¹, and the invertibility of the Fréchet derivative will have to be hypothesized. A more serious problem occurs in defining the value of y corresponding to x. The order properties of the real line were used in the preceding proofs; in the more general theorems, an appeal to a fixed point theorem will be substituted.

Definition. Let X, Y, Z be Banach spaces. Let F : X × Y → Z be a mapping. The Cartesian product X × Y is also a Banach space if we give it the norm ‖(x, y)‖ = ‖x‖ + ‖y‖. If they exist, the partial derivatives of F at (x₀, y₀) are bounded linear operators D₁F(x₀, y₀) and D₂F(x₀, y₀) such that

lim ‖F(x₀ + h, y₀) − F(x₀, y₀) − D₁F(x₀, y₀)h‖ / ‖h‖ = 0      (h ∈ X, h → 0)

and

lim ‖F(x₀, y₀ + k) − F(x₀, y₀) − D₂F(x₀, y₀)k‖ / ‖k‖ = 0      (k ∈ Y, k → 0)

Thus D₁F(x₀, y₀) ∈ ℒ(X, Z) and D₂F(x₀, y₀) ∈ ℒ(Y, Z). We often use the notation Fᵢ in place of DᵢF.

Theorem 3. General Implicit Function Theorem. Let X, Y, and Z be normed linear spaces, Y being assumed complete. Let Ω be an open set in X × Y. Let F : Ω → Z. Let (x₀, y₀) ∈ Ω. Assume that F is continuous at (x₀, y₀), that F(x₀, y₀) = 0, that D₂F exists in Ω, that D₂F is continuous at (x₀, y₀), and that D₂F(x₀, y₀) is invertible. Then there is a function f defined on a neighborhood of x₀ such that F(x, f(x)) = 0, f(x₀) = y₀, f is continuous at x₀, and f is unique in the sense that any other such function must agree with f on some neighborhood of x₀.

Proof. We can assume that (x₀, y₀) = (0, 0). Select δ > 0 so that

{(x, y) : ‖x‖ ≤ δ, ‖y‖ ≤ δ} ⊂ Ω

Put A = D₂F(0, 0). Then A ∈ ℒ(Y, Z) and A⁻¹ ∈ ℒ(Z, Y). For each x satisfying ‖x‖ ≤ δ we define Gₓ(y) = y − A⁻¹F(x, y). Here ‖y‖ ≤ δ. Observe that if Gₓ has a fixed point y*, then

y* = Gₓ(y*) = y* − A⁻¹F(x, y*)

from which we conclude that F(x, y*) = 0. Let us therefore set about proving that Gₓ has a fixed point. We shall employ the Contraction Mapping Theorem (Chapter 4, Section 2, page 177). We have

Gₓ'(y) = I − A⁻¹D₂F(x, y) = A⁻¹[D₂F(0, 0) − D₂F(x, y)]

By the continuity of D₂F at (0, 0) we can reduce δ if necessary so that

‖Gₓ'(y)‖ ≤ ½      whenever ‖x‖ ≤ δ and ‖y‖ ≤ δ

Now Gₓ(0) = −A⁻¹F(x, 0) = −A⁻¹{F(x, 0) − F(0, 0)}. Let 0 < ε < δ. By the continuity of F at (0, 0) we can find δ₀ ∈ (0, δ) so that

‖x‖ ≤ δ₀  ⟹  ‖Gₓ(0)‖ ≤ ε/2

If ‖x‖ ≤ δ₀ and ‖y‖ ≤ ε, then by the Mean Value Theorem III of Section 2, page 123,

‖Gₓ(y)‖ ≤ ‖Gₓ(0)‖ + ‖Gₓ(y) − Gₓ(0)‖
        ≤ ε/2 + sup{‖Gₓ'(λy)‖ : 0 ≤ λ ≤ 1} · ‖y‖
        ≤ ε/2 + ε/2 = ε

Define U = {y ∈ Y : ‖y‖ ≤ ε}. We have shown that, for each x satisfying ‖x‖ ≤ δ₀, the function Gₓ maps U into U. We also know that ‖Gₓ'(y)‖ ≤ ½. By Problem 1, Gₓ has a unique fixed point y in U. Since this fixed point depends on x, we write y = f(x), thus defining f. From the observations above we infer that

F(x, f(x)) = 0

Since F(0, 0) = 0, it follows that G₀(0) = 0. By the uniqueness of the fixed point, 0 = f(0). Since ε was arbitrary in (0, δ), we have this conclusion: For each ε in (0, δ) there is a δ_ε such that

‖x‖ ≤ δ_ε  ⟹  Gₓ(U) ⊂ U,   where U = {y ∈ Y : ‖y‖ ≤ ε}

Our analysis then showed that y = f(x) ∈ U, or ‖f(x)‖ ≤ ε. As a consequence, ‖x‖ ≤ δ_ε ⟹ ‖f(x)‖ ≤ ε, showing continuity at 0. For the uniqueness, suppose that f̄ is another function defined on a neighborhood of 0 such that f̄ is continuous at 0, f̄(0) = 0, and F(x, f̄(x)) = 0. If 0 < ε < δ, find a positive number σ < δ_ε such that

‖x‖ ≤ σ  ⟹  ‖f̄(x)‖ ≤ ε

Then f̄(x) ∈ U. So we have apparently two fixed points, f(x) and f̄(x), for the function Gₓ. Since this is not possible, f(x) = f̄(x) whenever ‖x‖ ≤ σ.  •

Theorem 4. Second Version of the Implicit Function Theorem. In the preceding theorem, assume further that F is continuously differentiable in Ω and that D₂F(x₀, y₀) is invertible. Then the function f will be continuously differentiable, and

f'(x) = −[D₂F(x, f(x))]⁻¹ ∘ D₁F(x, f(x))

Furthermore, there will exist a neighborhood of x₀ in which f is unique.

This theorem can be found in [Dieu], page 265.

Theorem 5. Inverse Function Theorem I. Let f be a continuously differentiable map from an open set Ω in a Banach space into a normed linear space. If x₀ ∈ Ω and if f'(x₀) is invertible, then there is a continuously differentiable function g defined on a neighborhood N of f(x₀) such that f(g(y)) = y for all y ∈ N.

Proof. For x in Ω and y in the second space, define F(x, y) = f(x) − y. Put y₀ = f(x₀), so that F(x₀, y₀) = 0. Note that D₁F(x, y) = f'(x), and thus D₁F(x₀, y₀) is invertible. By Theorem 4 (with the roles of x and y reversed), there is a neighborhood N of y₀ and a continuously differentiable function g defined on N such that F(g(y), y) = 0, or f(g(y)) − y = 0, for all y ∈ N.  •

Theorem 6. Surjective Mapping Theorem I. Let X and Y be Banach spaces, Ω an open set in X. Let f : Ω → Y be a continuously differentiable map. Let x₀ ∈ Ω and y₀ = f(x₀). If f'(x₀) is invertible, as an element of ℒ(X, Y), then f(Ω) is a neighborhood of y₀.

Proof. Define F : Ω × Y → Y by putting F(x, y) = f(x) − y. Then F(x₀, y₀) = 0 and D₁F(x₀, y₀) = f'(x₀). (D₁ is a partial derivative, as defined previously.) By hypothesis, D₁F(x₀, y₀) is invertible. By the Implicit Function Theorem (with the roles of x and y reversed!), there exist a neighborhood N of y₀ and a function g : N → Ω such that g(y₀) = x₀ and F(g(y), y) = 0 for all y ∈ N. From the definition of F we have f(g(y)) − y = 0 for all y ∈ N. In other words, each element y of N is the image under f of some point in Ω, namely, g(y).  •

Theorem 7. A Fixed Point Theorem. Let Ω be an open set in a Banach space X, and let G be a differentiable map from Ω to X. Suppose that there is a closed ball B ≡ B(x₀, r) in Ω such that

(i)  k ≡ sup{‖G'(x)‖ : x ∈ B} < 1
(ii) ‖G(x₀) − x₀‖ < r(1 − k)

Then G has a unique fixed point in B.

Proof. First, we show that G|B is a contraction. If x₁ and x₂ are in B, then by the Mean Value Theorem (Theorem 4 in Section 3.2, page 123),

‖G(x₁) − G(x₂)‖ ≤ sup{‖G'(x₁ + λ(x₂ − x₁))‖ : 0 ≤ λ ≤ 1} ‖x₁ − x₂‖ ≤ k‖x₁ − x₂‖

Second, we show that G maps B into B. If x ∈ B, then

‖G(x) − x₀‖ ≤ ‖G(x) − G(x₀)‖ + ‖G(x₀) − x₀‖
           ≤ k‖x − x₀‖ + r(1 − k)
           ≤ kr + (1 − k)r = r

Since X is complete, B is a complete metric space. By the Contraction Mapping Theorem (page 177), G has a unique fixed point in B.  •

Theorem 8. Inverse Function Theorem II. Let Ω be an open set in a Banach space X. Let f be a differentiable map from Ω to a normed space Y. Assume that Ω contains a closed ball B ≡ B(x₀, r) such that

(i)  The linear transformation A ≡ f'(x₀) is invertible.
(ii) k ≡ sup{‖I − A⁻¹f'(x)‖ : x ∈ B} < 1

Then for each y in Y satisfying ‖y − f(x₀)‖ < (1 − k)r‖A⁻¹‖⁻¹, the equation f(x) = y has a unique solution in B.

Proof. Let y be as hypothesized, and define G(x) = x − A⁻¹[f(x) − y]. It is clear that f(x) = y if and only if x is a fixed point of G. The map G is differentiable in Ω, and G'(x) = I − A⁻¹f'(x). To verify hypothesis (i) in the preceding theorem, write

‖G'(x)‖ = ‖I − A⁻¹f'(x)‖ ≤ k < 1      (x ∈ B)

By the assumptions made about y, we can verify hypothesis (ii) of the preceding theorem by writing

‖G(x₀) − x₀‖ = ‖x₀ − A⁻¹[f(x₀) − y] − x₀‖
             = ‖A⁻¹[f(x₀) − y]‖ ≤ ‖A⁻¹‖ ‖f(x₀) − y‖
             < ‖A⁻¹‖(1 − k)r‖A⁻¹‖⁻¹ = (1 − k)r

By the preceding theorem, G has a unique fixed point in B, which is the unique solution of f(x) = y in B.  •

Example 2. Consider a nonlinear Volterra integral equation

x(t) − 2x(0) + ½ ∫₀ᵗ cos(st)[x(s)]² ds = y(t)      (0 ≤ t ≤ 1)

in which y ∈ C[0,1]. Notice that when y = 0 the integral equation has the solution x = 0. We ask: Does the equation have solutions when ‖y‖ is small? Here, we use the usual sup-norm on C[0,1], as this makes the space complete. (Weighted sup-norms would have this property, too.) Write the integral equation as f(x) = y, where f has the obvious interpretation. Then f'(x) is given by

[f'(x)h](t) = h(t) − 2h(0) + ∫₀ᵗ cos(st) x(s) h(s) ds

Let A = f'(0), so that Ah = h − 2h(0). One verifies easily that A²h = h, from which it follows that A⁻¹ = A. In order to use the preceding theorem, with x₀ = 0, we must verify its hypotheses. We have just seen that A is invertible. Let ‖x‖ ≤ r, where r is to be chosen later so that ‖I − A⁻¹f'(x)‖ ≤ k < 1. From an equation above,

|[f'(x)h](t) − (Ah)(t)| = |∫₀ᵗ cos(st) x(s) h(s) ds| ≤ ‖h‖ ‖x‖

It follows that

‖f'(x)h − Ah‖ ≤ ‖h‖ ‖x‖

and that

‖f'(x) − A‖ ≤ ‖x‖ ≤ r

Since ‖A‖ = ‖A⁻¹‖ = 3, we have

‖I − A⁻¹f'(x)‖ = ‖A⁻¹(A − f'(x))‖ ≤ ‖A⁻¹‖ r = 3r

The hypothesis of the preceding theorem requires that 3r ≤ k < 1, where k is to be chosen later. By the preceding theorem, the equation f(x) = y will have a unique solution if

‖y‖ = ‖y − f(0)‖ < (1 − k)r‖A⁻¹‖⁻¹ = (1 − k)k/9      (taking r = k/3)

In order for this bound to be as generous as possible, we let k = ½, arriving at the restriction ‖y‖ < 1/36.  •
Lemma. Let X and Y be Banach spaces. Let Ω be an open set in X, and let f : Ω → Y be a continuously differentiable mapping. If x₀ ∈ Ω and ε > 0, then there is a δ > 0 such that

‖x₁ − x₀‖ < δ,  ‖x₂ − x₀‖ < δ  ⟹  ‖f(x₁) − f(x₂) − f'(x₀)(x₁ − x₂)‖ ≤ ε‖x₁ − x₂‖

Proof. The map x ↦ f'(x) is continuous from Ω to ℒ(X, Y). Therefore, in correspondence with the given ε, there is a δ > 0 such that

‖x − x₀‖ < δ  ⟹  ‖f'(x) − f'(x₀)‖ < ε

(We may assume also that B(x₀, δ) ⊂ Ω.) If ‖x₁ − x₀‖ < δ and ‖x₂ − x₀‖ < δ, then the line segment S joining x₁ to x₂ satisfies S ⊂ B(x₀, δ) ⊂ Ω. By Problem 2, page 145, we have

‖f(x₁) − f(x₂) − f'(x₀)(x₁ − x₂)‖ ≤ ‖x₁ − x₂‖ · sup{‖f'(x) − f'(x₀)‖ : x ∈ S} ≤ ε‖x₁ − x₂‖  •
Theorem 9. Surjective Mapping Theorem II. Let X and Y be Banach spaces. Let Ω be an open set in X. Let f : Ω → Y be continuously differentiable. If x₀ ∈ Ω and f'(x₀) has a right inverse in ℒ(Y, X), then f(Ω) is a neighborhood of f(x₀).

Proof. Put A = f'(x₀) and let L be a member of ℒ(Y, X) such that AL = I, where I denotes the identity map on Y. Let c = ‖L‖. By the preceding lemma, there exists δ > 0 such that B(x₀, δ) ⊂ Ω and such that

‖u − x₀‖ ≤ δ,  ‖v − x₀‖ ≤ δ  ⟹  ‖f(u) − f(v) − A(u − v)‖ ≤ (1/2c)‖u − v‖

Let y₀ = f(x₀) and y ∈ B(y₀, δ/2c). We will find x ∈ Ω such that f(x) = y. The point x is constructed as the limit of a sequence {xₙ} defined inductively as follows. We start with the given x₀. Put x₁ = x₀ + L(y − y₀). From then on we define

xₙ₊₁ = xₙ − L[f(xₙ) − f(xₙ₋₁) − A(xₙ − xₙ₋₁)]

By induction we establish that ‖xₙ − xₙ₋₁‖ ≤ δ/2ⁿ and ‖xₙ − x₀‖ ≤ δ. Here are the details of the induction:

‖x₁ − x₀‖ = ‖L(y − y₀)‖ ≤ c‖y − y₀‖ ≤ cδ/(2c) = δ/2
‖xₙ₊₁ − xₙ‖ ≤ c‖f(xₙ) − f(xₙ₋₁) − A(xₙ − xₙ₋₁)‖
           ≤ c(1/2c)‖xₙ − xₙ₋₁‖ ≤ δ/2ⁿ⁺¹
‖xₙ₊₁ − x₀‖ ≤ ‖xₙ₊₁ − xₙ‖ + ‖xₙ − xₙ₋₁‖ + ⋯ + ‖x₁ − x₀‖
           ≤ δ/2ⁿ⁺¹ + δ/2ⁿ + ⋯ + δ/2 ≤ δ

Next we observe that the sequence [xₙ] has the Cauchy property, since (for m > n)

‖xₙ − xₘ‖ ≤ ‖xₙ − xₙ₊₁‖ + ⋯ + ‖xₘ₋₁ − xₘ‖ ≤ δ(1/2ⁿ⁺¹ + 1/2ⁿ⁺² + ⋯) ≤ δ/2ⁿ

Since X is complete, we can define x = lim xₙ. All that remains is to prove x ∈ Ω and f(x) = y. Since ‖xₙ − x₀‖ ≤ δ, we have ‖x − x₀‖ ≤ δ and x ∈ Ω. From the equation defining xₙ₊₁ we have

A(xₙ₊₁ − xₙ) = −AL{f(xₙ) − f(xₙ₋₁) − A(xₙ − xₙ₋₁)}
             = A(xₙ − xₙ₋₁) − {f(xₙ) − f(xₙ₋₁)}

By using this equation recursively we reach finally

A(xₙ₊₁ − xₙ) = A(x₁ − x₀) − {f(xₙ) − f(xₙ₋₁)}
              − {f(xₙ₋₁) − f(xₙ₋₂)} − ⋯ − {f(x₁) − f(x₀)}
             = AL(y − y₀) − f(xₙ) + f(x₀)
             = y − y₀ − f(xₙ) + y₀ = y − f(xₙ)

Let n → ∞ in this equation to get 0 = y − f(x).  •


Corollary. Let f be a continuously differentiable function mapping an open set Ω in a Banach space into a finite-dimensional Banach space. If x₀ ∈ Ω and if f'(x₀) is surjective, then f(Ω) is a neighborhood of f(x₀).

Proof. By comparing this assertion to the preceding theorem, we see that it will suffice to prove that f'(x₀) has a right inverse. It suffices to give the proof in the case that f maps Ω into Euclidean n-space ℝⁿ, because all finite-dimensional Banach spaces are topologically equivalent. Let {e₁, …, eₙ} be the usual basis for ℝⁿ. Let X be the Banach space containing Ω, and set A = f'(x₀). Since A is surjective, there exist points u₁, …, uₙ in X such that Auᵢ = eᵢ. Define L : ℝⁿ → X by requiring Leᵢ = uᵢ (and that L be linear, of course). Obviously, ALeᵢ = eᵢ, so ALy = y for all y ∈ ℝⁿ. Also, ‖L‖ ≤ (Σ‖uᵢ‖²)^(1/2), because the Cauchy–Schwarz inequality yields (with y = Σcᵢeᵢ)

‖Ly‖ = ‖Σcᵢuᵢ‖ ≤ Σ|cᵢ| ‖uᵢ‖ ≤ (Σcᵢ²)^(1/2)(Σ‖uᵢ‖²)^(1/2) = ‖y‖(Σ‖uᵢ‖²)^(1/2)  •
Example 3. Let f : ℝ³ → ℝ³ be given by

f(x) = y
171 = 2~t + 6 cos 6 - 66
172 = (6 + 6? - 4sin6
173 = log(6 + 1) + 56 + cos 6 - 1

Notice that f(0) = 0. We ask: For y close to zero, is there an x for which f(x) = y? To answer this, one can use the Inverse Function Theorem. We compute the Fréchet derivative, or Jacobian:

8~r - 6 -6 sin6 COS 6 - 6 ]


f'(x) = [3(6: 6)2 -4cos6 3(6 +6)2
(6 + 1)-1 -sin6

At x = 0 we have

Obviously, f'(0) is invertible, and so we can conclude that in some neighborhood of y = 0 there is defined a function g such that f(g(y)) = y.  •

The direct sum of n normed linear spaces X₁, X₂, …, Xₙ is denoted by Σᵢ₌₁ⁿ ⊕Xᵢ. Its elements are n-tuples x = (x₁, x₂, …, xₙ), where xᵢ ∈ Xᵢ for i = 1, 2, …, n. Although many definitions of the norm are suitable, we use ‖x‖ = Σᵢ₌₁ⁿ ‖xᵢ‖.

If X = Σᵢ₌₁ⁿ ⊕Xᵢ and if f is a mapping from an open set of X into a normed space Y, then the partial derivatives Dᵢf(x), if they exist, are continuous linear maps (Dᵢf)(x) ∈ ℒ(Xᵢ, Y) such that

‖f(x₁, …, xᵢ₋₁, xᵢ + h, xᵢ₊₁, …, xₙ) − f(x) − (Dᵢf)(x)h‖ = o(‖h‖)

The connection between partial derivatives and a "total" derivative is as one expects from multivariable calculus. That relationship is formalized next.

Theorem 10. Let f be defined on an open set Ω in the direct-sum space X = Σᵢ₌₁ⁿ ⊕Xᵢ and take values in a normed space Y. Assume that all the partial derivatives Dᵢf exist in Ω and are continuous at a point x in Ω. Then f is Fréchet differentiable at x, and its Fréchet derivative is given by

(1)  f'(x)h = Σᵢ₌₁ⁿ Dᵢf(x)hᵢ      (h ∈ X)

Proof. Equation (1) defines a linear transformation from X to Y, and

‖f'(x)h‖ ≤ Σᵢ₌₁ⁿ ‖Dᵢf(x)hᵢ‖ ≤ Σᵢ₌₁ⁿ ‖Dᵢf(x)‖ ‖hᵢ‖
         ≤ max{‖Dⱼf(x)‖ : 1 ≤ j ≤ n} Σᵢ₌₁ⁿ ‖hᵢ‖
         = max{‖Dⱼf(x)‖ : 1 ≤ j ≤ n} ‖h‖

Thus Equation (1) defines a bounded linear transformation. Let

G(h) = f(x + h) − f(x) − Σᵢ₌₁ⁿ Dᵢf(x)hᵢ

We want to prove that ‖G(h)‖ = o(‖h‖). For sufficiently small h, x + h is in Ω, and the partial derivatives of G exist at x + h. They are

DᵢG(h) = Dᵢf(x + h) − Dᵢf(x)

If ε is a given positive number, we use the assumed continuity of Dᵢf at x to find a positive δ such that for ‖h‖ < δ we have ‖DᵢG(h)‖ < ε, for 1 ≤ i ≤ n. Then, by the mean value theorem,

‖G(h)‖ ≤ ‖G(h₁, h₂, …, hₙ) − G(0, h₂, …, hₙ)‖
       + ‖G(0, h₂, …, hₙ) − G(0, 0, h₃, …, hₙ)‖
       + ⋯ + ‖G(0, 0, …, hₙ) − G(0, 0, …, 0)‖
       ≤ Σᵢ₌₁ⁿ ε‖hᵢ‖ = ε‖h‖

Since ε was arbitrary, this shows that ‖G(h)‖ = o(‖h‖).  •


Problems 3.4

1. Let U be a closed ball in a Banach space. Let F : U⁺ → U, where U⁺ is an open set containing U. Prove that if sup{‖F'(x)‖ : x ∈ U} < 1, then F has a unique fixed point in U.

2. Let X be a Banach space and Ω an open set in X containing a, b, x₀, and the line segment S joining a to b. Prove that

‖f(b) − f(a) − f'(x₀)(b − a)‖ ≤ ‖b − a‖ sup{‖f'(x) − f'(x₀)‖ : x ∈ S}

[Suggestion: Use the function g(x) = f(x) − f'(x₀)x.] Determine whether the same inequality is true when f'(x₀) is replaced by an arbitrary linear operator. In this problem, f : X → Y, where Y is any normed space.

3. Suppose F(x₀, y₀) = 0. If x₁ is close to x₀, there should be a y₁ such that F(x₁, y₁) = 0. Show how Newton's method can be used to obtain y₁. (Here F : X × Y → Z, and X, Y, Z are Banach spaces.)

4. Prove Theorem 2.

5. Let f : Ω → Y be a continuously differentiable map, where Ω is an open set in a Banach space, and Y is a normed linear space. Assume that f'(x) is invertible for each x ∈ Ω, and prove that f(Ω) is open.

6. Let α be the point in [0,1] where cos α = α. Define X to be the vector space of all continuously differentiable functions on [0,1] that vanish at the point α. Define a norm on X by writing ‖x‖ = sup{|x'(t)| : 0 ≤ t ≤ 1}. Prove that there exists a positive number δ such that if y ∈ X and ‖y‖ < δ, then there exists an x ∈ X satisfying

sin ∘ x + x ∘ cos = y

7. Let f be a continuous map from an open set Ω in a Banach space X into a Banach space Y. Suppose that for some x₀ in Ω, f'(x₀) exists and is invertible. Prove that f is one-to-one in some neighborhood of x₀.

8. In Example 2, with the nonlinear integral equation, show that the mapping x ↦ f'(x) is continuous; indeed, it satisfies a Lipschitz condition.

9. Rework Example 2 when the term 2x(0) is replaced by αx(0), for an arbitrary constant α. In particular, treat the case when α = 0.

3.5 Extremum Problems and Lagrange Multipliers

A minimum point of a real-valued function f defined on a set M is a point x₀ such that f(x₀) ≤ f(x) for all x ∈ M. If M has a topology, then the concept of relative minimum point is defined as a point x₀ ∈ M such that for some neighborhood N of x₀ we have f(x₀) ≤ f(x) for all x in N.

Theorem 1. Necessary Condition for Extremum. Let Ω be an open set in a normed linear space, and let f : Ω → ℝ. If x₀ is a minimum point of f and if f'(x₀) exists, then f'(x₀) = 0.

Proof. Let X be the Banach space, and assume f'(x₀) ≠ 0. Then there exists v ∈ X such that f'(x₀)v = −1. By the definition of f'(x₀) we can take λ > 0 so small that x₀ + λv is in Ω and

|f(x₀ + λv) − f(x₀) − λf'(x₀)v| / λ‖v‖ < (2‖v‖)⁻¹

This means that (1/λ)[f(x₀ + λv) − f(x₀)] is within distance ½ of −1, and so is negative. This implies f(x₀ + λv) < f(x₀).  •

In this section we will be concerned mostly with constrained extremum problems. A simple illustrative case is the following. We have two nice functions, f and g, on ℝ² to ℝ. We put M = {(x, y) : g(x, y) = 0} and seek an extremum of f|M. (That means f restricted to M.) If the equation g(x, y) = 0 defines y as a function of x, say y = y(x), then we can look for an unrestricted extremum of φ(x) = f(x, y(x)). Hence we try to solve the equation φ'(x) = 0. This leads to

0 = f₁(x, y(x)) + f₂(x, y(x))y'(x)
  = f₁(x, y(x)) − f₂(x, y(x))g₁(x, y(x))/g₂(x, y(x))

Thus we must solve simultaneously

(1)  g(x, y) = 0   and   f₁(x, y) − f₂(x, y)g₁(x, y)/g₂(x, y) = 0

The method of Lagrange multipliers introduces the function

H(x, y, λ) = f(x, y) + λg(x, y)

and solves simultaneously H₁ = H₂ = H₃ = 0. Thus

f₁(x, y) + λg₁(x, y) = f₂(x, y) + λg₂(x, y) = g(x, y) = 0

If g₂(x, y) ≠ 0, then λ = −f₂(x, y)/g₂(x, y), and we recover system (1). The method of Lagrange multipliers treats x and y symmetrically, and includes both cases of the implicit function theorem. Thus y can be a differentiable function of x, or x can be a differentiable function of y.

Example 1. Let f and g be functions from ℝ² to ℝ defined by f(x, y) = x² + y², g(x, y) = x − y + 1. The set M = {(x, y) : g(x, y) = 0} is the straight line shown in Figure 3.3. Also shown are some level sets of f, i.e., sets of the type {(x, y) : f(x, y) = c}. At the solution, the gradient of f is parallel to the gradient of g. The function H is H(x, y, λ) = x² + y² + λ(x − y + 1), and the three equations to be solved are 2x + λ = 2y − λ = x − y + 1 = 0. The solution is (−½, ½).  •

Figure 3.3
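
When a computer algebra system is at hand, the three equations of Example 1 can be solved mechanically; a SymPy sketch (the variable names are ours) is:

import sympy as sp

x, y, lam = sp.symbols('x y lam', real=True)
H = x**2 + y**2 + lam * (x - y + 1)
equations = [sp.diff(H, var) for var in (x, y, lam)]    # H1 = H2 = H3 = 0
print(sp.solve(equations, [x, y, lam], dict=True))      # gives x = -1/2, y = 1/2, lam = 1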

Example 2. Let f(x, y) = x² − y² and g(x, y) = x² + y² − 1. Again we show M and some level sets of f, which are hyperbolas and straight lines. There are four extrema; some are maxima and some are minima. Which are which? The H-function is H = x² − y² + λ(x² + y² − 1), and the three equations to solve are 2x + 2λx = −2y + 2λy = x² + y² − 1 = 0. The (x, y, λ) solutions are (0, 1, 1), (0, −1, 1), (1, 0, −1), (−1, 0, −1). Figure 3.4 is pertinent.  •

Figure 3.4

If there are several constraint functions, there will be several Lagrange multipli-
ers, as in the next example.

Example 3. Find the minimum distance from a point to a line in ℝ³. Let the line be given as the intersection of two planes whose equations are ⟨a, x⟩ = k and ⟨b, x⟩ = ℓ. (Here, x, a, and b belong to ℝ³.) Let the point be c. Then H should be

H = ‖x − c‖² + λ[⟨a, x⟩ − k] + μ[⟨b, x⟩ − ℓ]

This H is a function of (x₁, x₂, x₃, λ, μ). The five equations to solve are

2(x₁ − c₁) + λa₁ + μb₁ = 2(x₂ − c₂) + λa₂ + μb₂ = 2(x₃ − c₃) + λa₃ + μb₃ = 0
⟨a, x⟩ − k = ⟨b, x⟩ − ℓ = 0

We see that x is of the form x = c + αa + βb. When this is substituted in the second set of equations, we obtain two linear equations for determining α and β:

⟨a, a⟩α + ⟨a, b⟩β = k − ⟨a, c⟩   and   ⟨a, b⟩α + ⟨b, b⟩β = ℓ − ⟨b, c⟩   •
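
Numerically, the 2 × 2 system for α and β is solved in one line; the following Python sketch uses arbitrary sample data for a, b, k, ℓ, and c.

import numpy as np

a = np.array([1.0, 0.0, 1.0]); k = 1.0       # plane <a, x> = k
b = np.array([0.0, 1.0, -1.0]); l = 0.0      # plane <b, x> = l
c = np.array([2.0, 1.0, 3.0])                # the given point

M = np.array([[a @ a, a @ b],
              [a @ b, b @ b]])
rhs = np.array([k - a @ c, l - b @ c])
alpha, beta = np.linalg.solve(M, rhs)

x = c + alpha * a + beta * b                 # nearest point on the line
print(x, np.linalg.norm(x - c))              # the minimum distance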

Theorem 2. Lagrange Multiplier. Let f and g be continuously differentiable real-valued functions on an open set Ω in a Banach space. Let M = {x ∈ Ω : g(x) = 0}. If x₀ is a local minimum point of f|M and if g'(x₀) ≠ 0, then f'(x₀) = λg'(x₀) for some λ ∈ ℝ.

Proof. Let X be the Banach space in question. Select a neighborhood U of x₀ such that

x ∈ U ∩ M  ⟹  f(x₀) ≤ f(x)

We can assume U ⊂ Ω. Define F : U → ℝ² by F(x) = (f(x), g(x)). Then F(x₀) = (f(x₀), 0) and F'(x)v = (f'(x)v, g'(x)v) for all v ∈ X. Observe that if r < f(x₀), then (r, 0) is not in F(U). Hence F(U) is not a neighborhood of F(x₀). By the Corollary in Section 3.4, F'(x₀) is not surjective (as a linear map from X to ℝ²). Hence F'(x₀)v = α(v)(θ, μ) for some continuous linear functional α. (Thus α ∈ X*.) It follows that f'(x₀)v = α(v)θ and g'(x₀)v = α(v)μ. Since g'(x₀) ≠ 0, μ ≠ 0. Therefore,

f'(x₀) = (θ/μ)g'(x₀) = λg'(x₀)   with λ = θ/μ   •

Theorem 3. Lagrange Multipliers. Let f, g₁, …, gₙ be continuously differentiable real-valued functions defined on an open set Ω in a Banach space X. Let M = {x ∈ Ω : g₁(x) = ⋯ = gₙ(x) = 0}. If x₀ is a local minimum point of f|M (the restriction of f to M), then there is a nontrivial linear relation of the form

μf'(x₀) + λ₁g₁'(x₀) + ⋯ + λₙgₙ'(x₀) = 0

Proof. Select a neighborhood U of x₀ such that U ⊂ Ω and such that f(x₀) ≤ f(x) for all x ∈ U ∩ M. Define F : U → ℝⁿ⁺¹ by the equation

F(x) = (f(x), g₁(x), …, gₙ(x))

If r < f(x₀), then the point (r, 0, 0, …, 0) is not in F(U). Thus F(U) does not contain a neighborhood of the point (f(x₀), g₁(x₀), …, gₙ(x₀)) = (f(x₀), 0, 0, …, 0). By the Corollary in Section 3.4, page 143, F'(x₀) is not surjective. Since the range of F'(x₀) is a linear subspace of ℝⁿ⁺¹, we now know that it is a proper subspace of ℝⁿ⁺¹. Hence it is contained in a hyperplane through the origin. This means that for some μ, λ₁, …, λₙ (not all zero) we have

μf'(x₀)v + λ₁g₁'(x₀)v + ⋯ + λₙgₙ'(x₀)v = 0

for all v ∈ X. This implies the equation in the statement of the theorem.  •
Example 4. Let A be a compact Hermitian operator on a Hilbert space X. Then ‖A‖ = max{|λ| : λ ∈ Λ(A)}, where Λ(A) is the set of eigenvalues of A. This is proved by Lemma 2, page 92, together with Problem 22, page 101. Then by Lemma 2 in Section 2.3, page 85, we have ‖A‖ = sup{|⟨Ax, x⟩| : ‖x‖ = 1}. Hence we can find an eigenvalue of A by determining an extremum of ⟨Ax, x⟩ on the set defined by ‖x‖ = 1. An alternative is given by the next result.  •

Lemma. If A is Hermitian, then the "Rayleigh quotient" f(x) = ⟨Ax, x⟩/⟨x, x⟩ has a stationary value at each eigenvector.

Proof. Let Ax = λx, x ≠ 0. Then f(x) = ⟨Ax, x⟩/⟨x, x⟩ = λ. Recall that the eigenvalues of a Hermitian operator are real. Let us compute the derivative of f at x and show that it is 0.

lim_{h→0} |f(x + h) − f(x)| / ‖h‖
  = lim |⟨A(x + h), x + h⟩/⟨x + h, x + h⟩ − λ| / ‖h‖
  = lim |⟨Ax, x⟩ + ⟨Ah, x⟩ + ⟨Ax, h⟩ + ⟨Ah, h⟩ − λ‖x + h‖²| / ‖h‖ ‖x + h‖²
  = lim |⟨h, Ax⟩ + λ⟨x, h⟩ + ⟨Ah, h⟩ − 2λ Re⟨x, h⟩ − λ⟨h, h⟩| / ‖h‖ ‖x + h‖²
  = lim |λ⟨h, x⟩ + λ⟨x, h⟩ + ⟨Ah, h⟩ − 2λ Re⟨x, h⟩ − λ⟨h, h⟩| / ‖h‖ ‖x + h‖²
  = lim |⟨Ah, h⟩ − λ⟨h, h⟩| / ‖h‖ ‖x + h‖²
  = lim |⟨Ah − λh, h⟩| / ‖h‖ ‖x + h‖²
  ≤ lim ‖Ah − λh‖ ‖h‖ / ‖h‖ ‖x + h‖²
  ≤ lim ‖A − λI‖ ‖h‖ / ‖x + h‖² = 0

Thus from the definition of f'(x) as the operator that makes the equation

lim_{h→0} |f(x + h) − f(x) − f'(x)h| / ‖h‖ = 0

true, we have f'(x) = 0.  •


Since the Rayleigh quotient can be written as

(Ax, x) / (X) x)
~==\A W 'W
it is possible to consider the simpler function F(x) == (Ax, x) restricted to the
unit sphere.

Theorem 4. If A is a Hermitian operator on a Hilbert space, then each local constrained minimum or maximum point of ⟨Ax, x⟩ on the unit sphere is an eigenvector of A. The value of ⟨Ax, x⟩ is the corresponding eigenvalue.

Proof. Use F(x) = ⟨Ax, x⟩ and G(x) = ‖x‖² − 1. Then

F'(x)h = 2⟨Ax, h⟩        G'(x)h = 2⟨x, h⟩

Our theorem about Lagrange multipliers gives a necessary condition in order that x be a local extremum, namely that μF'(x) + λG'(x) = 0 in a nontrivial manner. Since ‖x‖ = 1, G'(x) ≠ 0. Hence μ ≠ 0, and by homogeneity we can set μ = −1. This leads to

−2⟨Ax, h⟩ + 2λ⟨x, h⟩ = 0      (h ∈ X)

whence Ax = λx.  •
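
A small numerical experiment (with a symmetric matrix of our own choosing) illustrates the theorem: a crude projected-gradient ascent for ⟨Ax, x⟩ on the unit sphere converges to an eigenvector belonging to the largest eigenvalue.

import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
A = (B + B.T) / 2                            # a real symmetric (Hermitian) matrix

x = rng.standard_normal(5)
x /= np.linalg.norm(x)
for _ in range(2000):
    x = x + 0.05 * (A @ x)                   # ascent direction: gradient of <Ax, x> is 2Ax
    x /= np.linalg.norm(x)                   # project back onto the unit sphere

print(x @ A @ x)                             # approximately the largest eigenvalue of A
print(np.max(np.linalg.eigvalsh(A)))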
Extremum problems with inequality constraints can also be discussed in a general setting free of dimensionality restrictions. This leads to the so-called Kuhn–Tucker theory.

Inequalities in a vector space require some elucidation. An ordered vector space is a pair (X, ≥) in which X is a real vector space and ≥ is a partial order in X that is consistent with the linear structure. This means simply that

x ≥ y  ⟹  x + z ≥ y + z      (z ∈ X)
x ≥ y,  λ ≥ 0  ⟹  λx ≥ λy

In an ordered vector space, the positive cone is

P = {x : x ≥ 0}

A cone having vertex v is a set C such that v + λ(x − v) ∈ C when x ∈ C and λ ≥ 0. It is elementary to prove that P is a convex cone having vertex at 0. Also, the partial order can be recovered from P by defining x ≥ y to mean x − y ∈ P.

If X is a normed space with an order as described, then X* is ordered in a standard way; namely, we define φ ≥ 0 to mean φ(x) ≥ 0 for all x ≥ 0. Here φ ∈ X*.

These matters are well illustrated by the space C[a, b], in which the natural order f ≥ g is defined to mean f(t) ≥ g(t) for all t ∈ [a, b]. The conjugate space consists of signed measures.
In the next theorem, X and Y are normed linear spaces, and Y is an ordered vector space. Differentiable functions f : X → ℝ and G : X → Y are given. We seek necessary conditions for a point x₀ to maximize f(x) subject to G(x) ≥ 0.

Theorem 5. If x₀ is a local maximum point of f on the set {x : G(x) ≥ 0} and if there is an h ∈ X such that G(x₀) + G'(x₀)h is an interior point of the positive cone, then there is a nonnegative functional φ ∈ Y* such that φ(G(x₀)) = 0 and f'(x₀) = −φ ∘ G'(x₀).

Proof. (Following Luenberger [Lue2].) Working in the space ℝ × Y, we define two convex sets

H = {(t, y) : for some h,  t ≤ f'(x₀)h  and  y ≤ G(x₀) + G'(x₀)h}
K = {(t, y) : t ≥ 0,  y ≥ 0} = [0, ∞) × P

One of the hypotheses in the theorem shows that P has an interior point, and consequently K has an interior point. No interior point of K lies in H, however.

In order to prove this, suppose that (t, y) is an interior point of K and belongs to H. Then for some h ∈ X we have

0 < t ≤ f'(x₀)h
0 < y ≤ G(x₀) + G'(x₀)h

Here the inequality y > 0 is interpreted to mean that y is an interior point of the positive cone P. For λ ∈ (0, 1) we have

G(x₀ + λh) = G(x₀) + G'(x₀)λh + o(λ)
           = (1 − λ)G(x₀) + λ[G(x₀) + G'(x₀)h] + o(λ)
           ≥ λ[G(x₀) + G'(x₀)h] + o(λ)
           ≥ λy + o(λ)

Since y is an interior point of P, there is an ε > 0 such that B(y, ε) ⊂ P. By Problem 3, B(λy, λε) ⊂ P for all λ > 0. Select λ small enough so that ‖o(λ)‖ < λε. Then

λy + o(λ) ∈ B(λy, λε) ⊂ P

Consequently, G(x₀ + λh) ≥ 0. Similarly, for small λ we have

f(x₀ + λh) = f(x₀) + f'(x₀)λh + o(λ) ≥ f(x₀) + λt + o(λ) > f(x₀)

Thus x₀ + λh lies in the constraint set and produces a larger value in f than f(x₀). This contradiction shows that H is disjoint from the interior of K.

Now use the Separation Theorem (Theorem 2 in Section 7.3, page 343). It asserts the existence of a hyperplane separating K from H. Thus there exist μ ∈ ℝ and φ ∈ Y* such that |μ| + ‖φ‖ > 0 and

μt + φ(y) ≤ c   when (t, y) ∈ H
μt + φ(y) ≥ c   when (t, y) ∈ K

Since (0, 0) ∈ H ∩ K, we see that c = 0. From the definition of K, we see that μ ≥ 0 and φ ≥ 0. Actually, μ > 0. To verify this, suppose μ = 0. Then φ ≠ 0, and φ(y) ≤ 0 whenever (t, y) ∈ H. From the hypotheses of the theorem, there is an h such that G(x₀) + G'(x₀)h ≡ z is interior to P. By the definition of H, (f'(x₀)h, z) ∈ H, and so φ(z) ≤ 0. Hence there are points y near z and in P where φ(y) < 0. But this contradicts the fact that φ ≥ 0.

Since μ > 0, we can take it to be 1. For any h, the point

(f'(x₀)h, G(x₀) + G'(x₀)h)

belongs to H. Consequently,

f'(x₀)h + φ[G(x₀) + G'(x₀)h] ≤ 0      (h ∈ X)

Taking h = 0 in this inequality gives us φ[G(x₀)] ≤ 0. Since G(x₀) ≥ 0 and φ ≥ 0, we have φ[G(x₀)] ≥ 0. Thus φ[G(x₀)] = 0.

Now we conclude that

f'(x₀)h + φ[G'(x₀)h] ≤ 0      (h ∈ X)

Since h can be replaced by −h here, it follows that

f'(x₀)h + φ[G'(x₀)h] = 0

In other words,

f'(x₀) = −φ ∘ G'(x₀)  •

Problems 3.5

1. (a) Use Lagrange multipliers to find the maximum of xy subject to x + y = c. (b) Find
the shortest distance from the point (1,0) to the parabola given by y2 = 4x.
2. Let the equations f(x,y) = 0 and g(x,y) = 0 define two non-intersecting curves in JR2.
What system of equations should be solved if we wish to find minimum or maximum
distances between points on these two curves?
3. Show that in an ordered vector space with positive cone P, if B(x, r) ⊂ P, then B(λx, λr) ⊂ P for λ ≥ 0.
4. Prove that the positive cone P determines the vector order.
5. Let A be a Hermitian operator on a Hilbert space. Define f(x) = ⟨Ax, x⟩ and g(x) = ⟨x, x⟩ − 1. What are f'(x) and g'(x)? Find a necessary condition for the extrema of f on the set M = {x : g(x) = 0}. (Use the first theorem on Lagrange multipliers.) Prove that your necessary condition is fulfilled by any eigenvector of A in M.
6. What is f'(x) in the lemma of this section if x is not an eigenvector?
7. Use the method of Lagrange multipliers to find a point on the surface (x − y)² − z² = 1 as close as possible to the origin in ℝ³.
8. Let A and B be Hermitian operators on a real Hilbert space. Prove that the stationary values of ⟨x, Ax⟩ on the manifold where ⟨x, Bx⟩ = 1 are necessarily numbers λ for which A − λB is not invertible.
9. Find the dimensions of a rectangular box (whose edges are parallel to the coordinate axes) that is contained in the ellipsoid a²x² + b²y² + c²z² = 1 and has maximum volume.
10. Find the least distance between two points, one on the parabola y = x2 and the other
on the parabola y = -(x - 4)2.
11. Find the distance from the point (3,2) to the curve xy = 2.
12. In JR3 the equation x 2 + y2 = 5 describes a cylinder. The equation 6x + 3y + 2z = 6
describes a plane. The intersection of the cylinder and the plane is an ellipse. Find the
points on this ellipse that are nearest the origin and farthest from the origin.
13. Find the minimum and maximum values of xy + yz + zx on the unit sphere in JR3
(x 2 + y2 + z2 = 1). See [Barb], page 21.

3.6 The Calculus of Variations

The "calculus of variations," interpreted broadly, deals with extremum problems


involving functions. It is analogous to the theory of maxima and minima in
elementary calculus, but with the added complication that the unknowns in the
problems are not simple numbers but functions. We begin with some classical
illustrations, posing the problems only, and postponing their solutions until after
some techniques have been explained. Traditional notation is used, in which x
and y are real variables, x being "independent" and y being "dependent." This
harmonizes with most books on this subject.

Example 1. Find the equation of an arc of minimal length joining two points
in the plane. Let the points be (a, α) and (b, β), where a < b. Let the arc
be given by a continuously differentiable function y = y(x), where y(a) = α
and y(b) = β. The arc length is given by the integral $\int_a^b \sqrt{1 + y'(x)^2}\, dx$. Here
y ∈ C¹[a, b]. The solution, as we know, is a straight line, and this fact will be
proved later. •

Example 2. Find a function y in C¹[a, b], satisfying y(a) = α and y(b) = β,
such that the surface of revolution obtained by rotating the graph of y about the
x-axis has minimum area. To solve this, one starts by recalling from calculus
that the area to be minimized is given by

(1)   $2\pi \int_a^b y(x)\sqrt{1 + y'(x)^2}\, dx$

The solution turns out to be (in many cases) a catenary, as shown later. Figure
3.5 shows one of these surfaces. •

Example 3. In a vertical plane, with gravity exerting a downward force, we
imagine a particle sliding without friction along a curve joining two points, say
(0, 0) and (b, β). There is no loss of generality in taking b > 0, and if the positive
direction of the y-axis is downward, then β > 0 also. We ask for the curve along
which the particle would fall in the least time. If the curve is the graph of a
function y in C¹[0, b], then the time of descent is

(2)   $\int_0^b \sqrt{\frac{1 + y'(x)^2}{2g\,y(x)}}\, dx$

as is shown later. In the integral, g is the acceleration due to gravity. This prob-
lem is the "Brachistochrone Problem," posed as a challenge by John Bernoulli
in 1696. Figure 3.6 shows two cases of such curves, corresponding to two choices
of the terminal point (b, β). Both curves are cycloids, one being a subset of the
other.

Figure 3.5

Figure 3.6

In 1696, Isaac Newton had just recently become Warden of the Mint and
was in the midst of overseeing a massive recoinage. Nevertheless, when he heard
of the problem, he found that he could not sleep until he had solved it, and
having done so, he published the solution anonymously. Bernoulli, however,
knew at once that the author of the solution was Newton, and in a famous
remark asserted that he "recognized the Lion by the print of its paw". [West] •

The three examples given above have a common form; thus, in each one
there is a nonlinear functional to be minimized, and it has the form

(3)   $\int_a^b F(x, y(x), y'(x))\, dx$

The unknown function y is required to satisfy endpoint conditions y(a) = α and
y(b) = β. In addition, some smoothness conditions must be imposed on y, since
the functional is allowed to involve y'. The first theorem establishes a necessary
condition for extrema, known as the Euler equation, or the Euler–Lagrange
Equation.

Theorem 1. The Euler Equation. Let F be a mapping
from ℝ³ to ℝ¹, possessing piecewise continuous partial derivatives of
the second order. In order that a function y in C¹[a, b] minimize
$\int_a^b F(x, y(x), y'(x))\, dx$ subject to the constraints y(a) = α, y(b) = β, it
is necessary that Euler's equation hold:

(4)   $\frac{d}{dx}F_3(x, y(x), y'(x)) = F_2(x, y(x), y'(x))$

(Here F₂ and F₃ are partial derivatives.)

Proof. Let u ∈ C¹[a, b] with u(a) = u(b) = 0. Assume that y is a solution of
the problem. For all real θ, y + θu is a competing function. Hence

$\frac{d}{d\theta} \int_a^b F(x, y(x) + \theta u(x), y'(x) + \theta u'(x))\, dx \Big|_{\theta=0} = 0$

This leads to $\int_a^b (F_2 u + F_3 u')\, dx = 0$. The second term can be integrated by parts.
The result is

$\int_a^b \left[ F_2(x, y(x), y'(x)) - \frac{d}{dx}F_3(x, y(x), y'(x)) \right] u(x)\, dx = 0$

By invoking the following lemma, we obtain the Euler equation.

Lemma. If v is piecewise continuous on [a, b] and if $\int_a^b u(x)v(x)\, dx = 0$
for every u in C¹[a, b] that vanishes at the endpoints a and b, then
v = 0.

Proof. Assume the hypotheses, and suppose that v ≠ 0. Then there is a
nonempty open interval (α, β) contained in [a, b] in which v is continuous and
has no zero. We may assume that v(x) > 0 on (α, β). There is a function u
in C¹[a, b] such that u(x) > 0 on (α, β) and u(x) = 0 elsewhere in [a, b]. Since
$\int_a^b uv = \int_\alpha^\beta uv > 0$, we have a contradiction, and v = 0. •
Example 1 revisited. (Shortest distance between two points.) In this problem,
F(u, v, w) = √(1 + w²). Hence F₁ = F₂ = 0 and F₃ = w(1 + w²)^{−1/2}. Then

$F_3(x, y(x), y'(x)) = y'(x)\left[1 + y'(x)^2\right]^{-1/2}$

The Euler equation is $\frac{d}{dx}F_3(x, y(x), y'(x)) = 0$. This can be integrated to yield
F₃(x, y(x), y'(x)) = constant. Then we find that y' must be constant and that
y(x) = α + m(x − a), where m = (β − α)/(b − a). •
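Although the text settles this example analytically, the conclusion is easy to test numerically: discretize the arc-length functional and check that perturbing the straight line (by functions vanishing at the endpoints) never shortens it. This is a minimal sketch; the endpoint data a, b, α, β below are illustrative, not from the text.

```python
import numpy as np

a, b, alpha, beta = 0.0, 1.0, 0.0, 2.0        # illustrative endpoints (a, alpha), (b, beta)
x = np.linspace(a, b, 401)

def arc_length(y):
    """Approximate the functional  integral of sqrt(1 + y'^2) dx  by the trapezoid rule."""
    dy = np.gradient(y, x)
    return np.trapz(np.sqrt(1.0 + dy**2), x)

# The straight line predicted by the Euler equation.
m = (beta - alpha) / (b - a)
line = alpha + m * (x - a)

# Perturb by functions vanishing at the endpoints; the functional should not decrease.
for k in (1, 2, 3):
    u = np.sin(k * np.pi * (x - a) / (b - a))      # vanishes at x = a and x = b
    for eps in (0.05, 0.2):
        assert arc_length(line + eps * u) >= arc_length(line) - 1e-12

print("straight-line length:", arc_length(line))   # ~ sqrt((b-a)^2 + (beta-alpha)^2)
```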

Theorem 2. In Theorem 1, if F₁ = 0, then the Euler equation
implies that

(5)   $y'(x)F_3(x, y(x), y'(x)) - F(x, y(x), y'(x)) = \text{constant}$

Proof. Differentiate the left side of (5) with respect to x:

$\frac{d}{dx}\left[ y'F_3 - F \right] = y''F_3 + y'\frac{d}{dx}F_3 - F_1 - F_2 y' - F_3 y'' = y'\left[ \frac{d}{dx}F_3 - F_2 \right] - F_1$

By the Euler equation the bracketed term vanishes, and F₁ = 0 by hypothesis. •

Example 2 revisited. In Example 2, the function F in the general theory will
be F(u, v, w) = v(1 + w²)^{1/2}, where u = x, v = y(x), and w = y'(x). Then, by
the preceding theorem, wF₃ − F is constant. In the present case, it means that

$\frac{v w^2}{\sqrt{1 + w^2}} - v\sqrt{1 + w^2} = c$

In this equation, multiply by (1 + w²)^{1/2}, obtaining

$vw^2 - v(1 + w^2) = c\sqrt{1 + w^2}, \qquad \text{that is,} \qquad -v = c\sqrt{1 + w^2}$

This gives us c²(1 + w²) = v², from which we get

$w = \frac{dy}{dx} = \frac{\sqrt{y^2 - c^2}}{c}$

Write this as

$\frac{dy}{\sqrt{y^2 - c^2}} = \frac{dx}{c}$

This can be integrated to give cosh⁻¹(y/c) = (x/c) + A. Without loss of general-
ity, we take the left-hand endpoint to be (0, α). The curve y = c cosh((x/c) + A)
passes through this point if and only if α = c cosh A. Hence c can be eliminated
to give us a one-parameter family of catenaries:

(6)   $y = \frac{\alpha}{\cosh A} \cosh\left( \frac{\cosh A}{\alpha}\, x + A \right)$

Here A is the parameter. If this catenary is to pass through the other given
endpoint (b, β), then A will have to satisfy the equation

$\beta = (\alpha / \cosh A)\, \cosh\!\left( b \cosh A / \alpha + A \right)$

Here α, b, β are prescribed and A is to be determined. In [Bl] (pages 85–119) you
will find an exhaustive discussion. Here are the main conclusions, without proof:
I. The one-parameter family of catenaries in Equation (6) has the appearance
shown in Figure 3.7. The "envelope" of the family is defined by g(x) =
min_A y_A(x), where y_A is the function in Equation (6).
II. If the terminal point (b, β) is below the envelope, no member of the family
(6) passes through it. The problem is then solved by the "Goldschmidt
solution" described below.
III. If the terminal point is above the envelope, two catenaries of the family (6)
pass through it. One of these is a local minimum in the problem but not
necessarily the absolute minimum. If it is not the absolute minimum, the
Goldschmidt solution again solves the problem.
IV. For terminal points sufficiently far above the envelope, the upper catenary
of the two passing through the point is the solution to the problem.
V. The Goldschmidt solution is a broken line from (0, α) to (0, 0) to (b, 0) and
to (b, β). It generates a surface of revolution whose area is π(α² + β²). •

Figure 3.7
Example 3 revisited. Consider again the Brachistochrone problem. We are
using (x, y) to denote points in ℝ², t will denote time, and s will be arc length.
The derivation of the integral given previously for the time of descent is as
follows. At any point of the curve, the downward force of gravity is mg, where m
is the mass of the particle and g is the constant acceleration due to gravity. The
component of this force along the tangent to the curve is mg cos θ = mg(dy/ds),
where θ is the angle between the tangent and the vertical. (See Figure 3.8.) The
velocity of the particle is ds/dt, and its acceleration is d²s/dt².

Figure 3.8

By Newton's law of motion (F = ma) we have mg(dy/ds) = m(d²s/dt²), or
d²s/dt² = g(dy/ds). Multiply by 2(ds/dt) to get

$2\frac{ds}{dt}\frac{d^2s}{dt^2} = 2g\frac{dy}{ds}\frac{ds}{dt}$

whence

$\frac{d}{dt}\left(\frac{ds}{dt}\right)^2 = 2g\frac{dy}{dt} \qquad\text{and}\qquad \left(\frac{ds}{dt}\right)^2 = 2gy + C$

If the initial conditions are t = 0, y = 0, and ds/dt = 0, then C = 0. Hence

$\frac{ds}{dt} = \sqrt{2gy} \qquad\text{and}\qquad dt = \frac{ds}{\sqrt{2gy}}$

so that the time of descent is

$\int_0^b \sqrt{\frac{1 + y'(x)^2}{2g\,y(x)}}\, dx$

Since we seek to minimize this integral, the factor 1/√(2g) may be ignored. The
function F in the general theory is then F(u, v, w) = √((1 + w²)/v). Since
F₁ = 0, Theorem 2 applies, and we can infer that y'(x)F₃(x, y(x), y'(x)) −
F(x, y(x), y'(x)) = c (constant). For the particular F in this example,
F₃(u, v, w) = w[v(1 + w²)]^{−1/2}. Thus

$\frac{y'(x)^2}{\sqrt{y(x)[1 + y'(x)^2]}} - \sqrt{\frac{1 + y'(x)^2}{y(x)}} = c$

After a little algebraic manipulation we get y(x)[1 + y'(x)²] = c⁻². When this
is "separated" we get dx = √(y/(k − y)) dy and x = ∫ √(y/(k − y)) dy. The
integration is carried out by making the substitution y = k sin²θ. The result is
x = k(θ − ½ sin 2θ). Then we have a curve given parametrically by the two
formulas. With φ = 2θ they become x = (k/2)(φ − sin φ), y = (k/2)(1 − cos φ).
These are the standard equations of a cycloid. •
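As a numerical companion (not in the text), one can recover the cycloid through a prescribed terminal point (b, β) by solving the two parametric equations for the constant k and the terminal parameter value. A minimal sketch, with illustrative endpoint values:

```python
import numpy as np
from scipy.optimize import brentq

b, beta = 1.0, 0.6          # illustrative terminal point (b, beta), beta measured downward

# Eliminate k:  x/y = (phi - sin phi)/(1 - cos phi);  solve for the terminal phi in (0, 2*pi).
ratio = lambda phi: (phi - np.sin(phi)) / (1.0 - np.cos(phi)) - b / beta
phi_b = brentq(ratio, 1e-6, 2 * np.pi - 1e-6)
k = 2.0 * beta / (1.0 - np.cos(phi_b))

# Check that x = (k/2)(phi - sin phi), y = (k/2)(1 - cos phi) hits (b, beta) at phi = phi_b.
print(phi_b, k)
print((k / 2) * (phi_b - np.sin(phi_b)), (k / 2) * (1 - np.cos(phi_b)))   # ~ (b, beta)
```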
Example 4. This is the Brachistochrone Problem, except that the terminal
point is allowed to be anywhere on a given vertical line. Following the previous
discussion, we are led to minimize the expression

$\int_0^b \sqrt{\frac{1 + y'(x)^2}{y(x)}}\, dx$

subject to y ∈ C²[0, b] and y(0) = 0. Notice that the value y(b) is not prescribed.
To solve such a problem, we require a modification of Theorem 1, namely:

Theorem 3. Any function y in C²[a, b] that minimizes

$\int_a^b F(x, y(x), y'(x))\, dx$

subject to the constraint y(a) = α must satisfy the two conditions

(7)   $\frac{d}{dx}F_3(x, y(x), y'(x)) = F_2(x, y(x), y'(x))$   and   $F_3(b, y(b), y'(b)) = 0$

Proof. This is left as a problem.

Returning now to Example 4, we conclude that F₃(b, y(b), y'(b)) = 0. This
entails $y'(b)/\sqrt{y(b)[1 + y'(b)^2]} = 0$, or y'(b) = 0. Thus the slope of our cycloid

must be zero at x = b. The cycloids going through the initial point are given
parametrically by

$\begin{cases} x = k(\phi - \sin\phi) \\ y = k(1 - \cos\phi) \end{cases}$

The slope is

$\frac{dy}{dx} = \frac{dy}{d\phi} \div \frac{dx}{d\phi} = \frac{\sin\phi}{1 - \cos\phi}$

This is 0 at φ = π. The value x = b corresponds to φ = π, and k = b/π. The
solution is given by x = (b/π)(φ − sin φ), y = (b/π)(1 − cos φ), 0 ≤ φ ≤ π. •

Example 5. (The Generalized Isoperimetric Problem.) Find the function y
that minimizes an integral

$\int_a^b F(x, y(x), y'(x))\, dx$

subject to constraints that y belong to C¹[a, b] and

$\int_a^b G(x, y(x), y'(x))\, dx = 0 \qquad y(a) = \alpha \qquad y(b) = \beta$

The next theorem pertains to this problem.

Theorem 4. If F and G map ℝ³ to ℝ and have continuous partial
derivatives of the second order, and if y is an element of C²[a, b] that
minimizes $\int_a^b F(x, y(x), y'(x))\, dx$ subject to endpoint constraints and
$\int_a^b G(x, y(x), y'(x))\, dx = 0$, then there is a nontrivial linear combina-
tion H = μF + λG such that

(8)   $H_2(x, y(x), y'(x)) = \frac{d}{dx}H_3(x, y(x), y'(x))$

Proof. As in previous problems of this section, we try to obtain a neces-
sary condition for a solution by perturbing the solution in such a way that
the constraints are not violated. Suppose that y is a solution in C¹[a, b]. Let
η₁ and η₂ be two functions in C¹[a, b] that vanish at the endpoints. Consider
the function z = y + θ₁η₁ + θ₂η₂. It belongs to C¹[a, b] and takes correct val-
ues at the endpoints: z(a) = α, z(b) = β. We require two perturbing func-
tions, η₁ and η₂, because the constraint $\int_a^b G(x, z(x), z'(x))\, dx = 0$ will be true
only if we allow a relationship between the two parameters θ₁ and θ₂. Let
I(θ₁, θ₂) = $\int_a^b F(x, z(x), z'(x))\, dx$ and J(θ₁, θ₂) = $\int_a^b G(x, z(x), z'(x))\, dx$. The
minimum of I(θ₁, θ₂) under the constraint J(θ₁, θ₂) = 0 occurs at θ₁ = θ₂ = 0,
because y is a solution of the original problem. By the Theorem on Lagrange
Multipliers (Theorem 3, page 148), there is a nontrivial linear relation of the
form μI'(0, 0) + λJ'(0, 0) = 0. Thus

$\mu \frac{\partial I}{\partial \theta_1} + \lambda \frac{\partial J}{\partial \theta_1} = 0 \ \text{ at } (\theta_1, \theta_2) = (0,0), \quad\text{and}\quad \mu \frac{\partial I}{\partial \theta_2} + \lambda \frac{\partial J}{\partial \theta_2} = 0 \ \text{ at } (0,0)$

Following the usual procedure, including an integration by parts, we eventually
obtain Equation (8). •

Example 6. It is required to find the curve of given length ℓ joining the
point (−1, 0) to the point (1, 0) that, together with the interval [−1, 1] on the
horizontal axis, encloses the greatest possible area.
We assume that 2 < ℓ < π. Let the curve be given by y = y(x), where y
belongs to C¹[−1, 1]. The area to be maximized is then

$\int_{-1}^{1} y(x)\, dx$

and the constraints are

$\int_{-1}^{1} \sqrt{1 + y'(x)^2}\, dx = \ell \qquad y(-1) = y(1) = 0$

This problem can be treated with Theorem 4, taking

$F(x, y, y') = y \qquad\text{and}\qquad G(x, y, y') = \sqrt{1 + (y')^2} - \ell/2$

The necessary condition of Theorem 4 is that for a suitable nontrivial pair of
coefficients μ and λ

$\mu F_2 + \lambda G_2 = \frac{d}{dx}\left[ \mu F_3 + \lambda G_3 \right]$

(In these equations, subscript 2 means a partial derivative with respect to the
second argument of the function, and so on.) In the case being considered, we
have F₂ = 1, F₃ = 0, G₂ = 0, and G₃ = y'(x)[1 + y'(x)²]^{−1/2}. The necessary
condition then reads

$\mu - \lambda\,\frac{d}{dx}\,\frac{y'(x)}{\sqrt{1 + y'(x)^2}} = 0$

If μ = 0, then λ must be 0 as well. Hence we are free to set μ = 1 and integrate
the previous equation, arriving at

$x - \frac{\lambda\, y'(x)}{\sqrt{1 + y'(x)^2}} = c_1$

This can be solved for y'(x):

$y'(x) = \frac{x - c_1}{\sqrt{\lambda^2 - (x - c_1)^2}}$

Another integration leads to

$y(x) = -\sqrt{\lambda^2 - (x - c_1)^2} + c_2$

We see that the curve is a circle by writing this last equation in the form

$(x - c_1)^2 + (y - c_2)^2 = \lambda^2$

Since the circle must pass through the points (−1, 0) and (1, 0), we find that
c₁ = 0 and that 1 + c₂² = λ². When the condition on the length of the arc is
imposed, we obtain ℓ = 2λ Arcsin(1/λ), from which λ can be computed. •
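The equation ℓ = 2λ Arcsin(1/λ) has no closed-form solution for λ, but it is easy to solve numerically. The following sketch (with an illustrative ℓ, not from the text) finds λ by bisection and checks the resulting arc length.

```python
import numpy as np
from scipy.optimize import brentq

ell = 2.5                        # illustrative arc length with 2 < ell < pi

# Solve  ell = 2*lam*arcsin(1/lam)  for lam >= 1 (lam is the circle's radius).
f = lambda lam: 2.0 * lam * np.arcsin(1.0 / lam) - ell
lam = brentq(f, 1.0 + 1e-12, 1e6)

c2 = np.sqrt(lam**2 - 1.0)       # the upward-bulging arc is y = sqrt(lam^2 - x^2) - c2
x = np.linspace(-1.0, 1.0, 100001)
y = np.sqrt(lam**2 - x**2) - c2
arc = np.trapz(np.sqrt(1.0 + np.gradient(y, x)**2), x)
print(lam, arc)                  # arc reproduces ell up to discretization error
```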
Example 7. (The Classical Isoperimetric Problem.) Among all the plane
curves having a prescribed length, find one enclosing the greatest area. We
assume a parametric representation x = x(t) and y = y(t) with continuously
differentiable functions. We can also assume that 0 ≤ t ≤ b and that x(0) = x(b),
y(0) = y(b), so the curve is closed. Let us assume further that as t increases from
0 to b, the curve is described in the counterclockwise direction. The region
enclosed is then always on the left. Recall Green's Theorem, [Wid1], page 223:

$\oint_\Gamma (u\, dx + v\, dy) = \iint_R (v_x - u_y)\, dx\, dy$

where R is the region enclosed by the curve Γ and the subscripts denote partial
derivatives. A special case of Green's Theorem is

$\frac{1}{2}\oint_\Gamma (-y\, dx + x\, dy) = \frac{1}{2}\iint_R (1 + 1)\, dx\, dy = \text{Area of } R$

Thus our isoperimetric problem is to maximize the integral

$\int_0^b \left( -y\frac{dx}{dt} + x\frac{dy}{dt} \right) dt$

subject to

$\int_0^b \sqrt{\left(\frac{dx}{dt}\right)^2 + \left(\frac{dy}{dt}\right)^2}\, dt = \text{constant}$

This isoperimetric problem involves again the minimization of an integral subject
to a constraint expressed as an integral. But now we have two unknown functions
to be determined. A straightforward extension of Theorem 4 applies in this
situation. Suppose that we wish to minimize

$\int_a^b F(t, x(t), x'(t), y(t), y'(t))\, dt$

subject to the usual endpoint constraints and a constraint

$\int_a^b G(t, x(t), x'(t), y(t), y'(t))\, dt = 0$

The Euler necessary condition is that for a suitable nontrivial linear combination
H = μF + λG,

$H_2(t, x, x', y, y') = \frac{d}{dt}H_3(t, x, x', y, y')$

$H_4(t, x, x', y, y') = \frac{d}{dt}H_5(t, x, x', y, y')$
If we apply this result to Example 7, we will use

$H(t, x, x', y, y') = \mu(xy' - yx') + \lambda\sqrt{x'^2 + y'^2}$

The Euler equations are

$\mu y' = \frac{d}{dt}\left[ -\mu y + \lambda x'(x'^2 + y'^2)^{-1/2} \right]$

$-\mu x' = \frac{d}{dt}\left[ \mu x + \lambda y'(x'^2 + y'^2)^{-1/2} \right]$

Upon integrating these with respect to t, we obtain

$2\mu y = \lambda x'(x'^2 + y'^2)^{-1/2} + A$

$-2\mu x = \lambda y'(x'^2 + y'^2)^{-1/2} - B$

If μ = 0, we infer that x'(x'² + y'²)^{−1/2} and y'(x'² + y'²)^{−1/2} are constant, and
then the "curve" is a straight line. Hence μ ≠ 0, and by homogeneity we can
assume μ = ½. Then y − A = λx'(x'² + y'²)^{−1/2} and x − B = −λy'(x'² + y'²)^{−1/2}.
Square these two equations and add to obtain the equation of a circle:
(x − B)² + (y − A)² = λ². •

Applications to Geometrical Optics. Fermat's Principle states that a ray


of light passing between two points will follow a path that minimizes the elapsed
time. In a homogeneous medium, the velocity of light is constant, and the least
elapsed time will occur for the shortest path, which is a straight line. Consider
now two homogeneous media separated by a plane. Let the velocities of light
in the two media be Cl and C2. What is the path of a ray of light from a point
in the first medium to a point in the second? We assume that the path lies in
a plane. By the remarks made above, the path consists of two lines meeting at
the plane that separates the two media.
Figure 3.9

If the coordinate system is as shown in Figure 3.9, and if the unknown point
on the x-axis is (x, 0), then the time of passage is

$T = \frac{1}{c_1}\sqrt{(x - x_1)^2 + y_1^2} + \frac{1}{c_2}\sqrt{(x - x_2)^2 + y_2^2} = c_1^{-1}p_1 + c_2^{-1}p_2$

For an extremum we want dT/dx = 0. Thus

$c_1^{-1}\frac{dp_1}{dx} + c_2^{-1}\frac{dp_2}{dx} = 0$

$c_1^{-1}\frac{1}{p_1}(x - x_1) + c_2^{-1}\frac{1}{p_2}(x - x_2) = 0$

$c_1^{-1}\sin\phi_1 = c_2^{-1}\sin\phi_2$

This last equation is known as Snell's Law.
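Snell's Law can be confirmed by minimizing the travel time directly. The following sketch uses illustrative velocities and endpoints (none of these numbers come from the text) and checks that the two ratios sin φᵢ/cᵢ coincide at the minimizing crossing point.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Illustrative data: point 1 above the interface (the x-axis), point 2 below it.
c1, c2 = 1.0, 0.7
x1, y1 = 0.0, 1.0
x2, y2 = 2.0, -1.5

# Travel time as a function of the crossing point (x, 0) on the interface.
T = lambda x: np.hypot(x - x1, y1) / c1 + np.hypot(x - x2, y2) / c2
x_star = minimize_scalar(T, bounds=(x1, x2), method="bounded").x

# Angles measured from the normal to the interface.
sin1 = (x_star - x1) / np.hypot(x_star - x1, y1)
sin2 = (x2 - x_star) / np.hypot(x2 - x_star, y2)
print(sin1 / c1, sin2 / c2)      # the two ratios agree at the minimizing crossing point
```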


Now consider a medium in which the velocity of light is a function of y; let
us say c = c(y). This would be the case in the Earth's atmosphere or in the
ocean. Think of the medium as being composed of many thin layers, in each of
which the velocity of light is constant. See Figure 3.10, in which three layers are
shown.

Figure 3.10

Snell's Law yields

$\frac{\sin\phi_1}{c_1} = \frac{\sin\phi_2}{c_2} = \frac{\sin\phi_3}{c_3} = \cdots = k \quad (\text{constant})$

For a continuously varying speed c(y), the path of a ray of light should satisfy
sin φ(y)/c(y) = k. Notice that the slope of the curve is

$y'(x) = \tan\left(\tfrac{\pi}{2} - \phi\right) = \cot\phi = \cos\phi/\sin\phi = \sqrt{1 - \sin^2\phi}\,/\sin\phi = \sqrt{1 - k^2c^2}\,/(kc)$

Hence

$dx = \frac{k\,c(y)\, dy}{\sqrt{1 - k^2 c(y)^2}} \qquad\text{and}\qquad x = \int \frac{k\,c(y)\, dy}{\sqrt{1 - k^2 c(y)^2}}$
Example 8. What is the path of a light beam if the velocity of light in the
medium is c = αy (where α is a constant)?
Solution. The path is the graph of a function y such that

$x = \int \frac{k\alpha y}{\sqrt{1 - k^2\alpha^2 y^2}}\, dy$

The integration produces

$x = -\frac{1}{k\alpha}\sqrt{1 - k^2\alpha^2 y^2} + A$

Here A and k are constants that can be adjusted so that the path passes through
two given points. The equation can be written in the form

$(x - A)^2 = \frac{1 - k^2\alpha^2 y^2}{k^2\alpha^2}$

or in the standard form of a circle:

$(x - A)^2 + y^2 = \frac{1}{k^2\alpha^2}$
The analysis above can also be based directly on Fermat's Principle. The
time taken to traverse a small piece of the path having length Δs is Δs/c(y).
The total time elapsed is then

$\int_a^b \frac{\sqrt{1 + y'(x)^2}}{c(y)}\, dx$

This is to be minimized under the constraint that y(a) = α and y(b) = β. Here
the ray of light is to pass from (a, α) to (b, β) in the shortest time. By Theorem
2, a necessary condition on y can be expressed (after some work) as

$[c(y(x))]^{-1}[1 + y'(x)^2]^{-1/2} = \text{constant}$

In order to handle problems in which there are several unknown functions
to be determined, one needs the following theorem.

Theorem 5. Suppose that y₁, ..., yₙ are functions (of t) in C²[a, b]
that minimize the integral

$\int_a^b F(y_1, \ldots, y_n, y_1', \ldots, y_n')\, dt$

subject to endpoint constraints that prescribe values for all yᵢ(a), yᵢ(b).
Then the Euler equations hold:

(9)   $\frac{d}{dt}\frac{\partial F}{\partial y_i'} = \frac{\partial F}{\partial y_i} \qquad (1 \leq i \leq n)$

Proof. Take functions η₁, ..., ηₙ in C²[a, b] that vanish at the endpoints. The
expression

$\int_a^b F(y_1 + \theta_1\eta_1, \ldots, y_n + \theta_n\eta_n)\, dt$

will have a minimum when (θ₁, ..., θₙ) = (0, 0, ..., 0). Proceeding as in previous
proofs, one arrives at the given equations. •
Geodesic Problems. Find the shortest arc lying on a given surface and
joining two points on the surface. Let the surface be defined by z = z(x, y). Let
the two points be (x₀, y₀, z₀) and (x₁, y₁, z₁). Arc length is

$\int \sqrt{dx^2 + dy^2 + dz^2}$

If the curve is given parametrically as x = x(t), y = y(t), z = z(x(t), y(t)),
0 ≤ t ≤ 1, then our problem is to minimize

(10)   $\int_0^1 \sqrt{x'^2 + y'^2 + (z_x x' + z_y y')^2}\, dt$

subject to x ∈ C²[0, 1], y ∈ C²[0, 1], x(0) = x₀, x(1) = x₁, y(0) = y₀, y(1) = y₁.
Example 9. We search for geodesics on a cylinder. Let the surface be the
cylinder x² + z² = 1, or z = (1 − x²)^{1/2} (upper-half cylinder). In the general
theory, F(x, y, x', y') = √(x'² + y'² + (z_x x' + z_y y')²). In this particular case this
is

$F = \left[ x'^2 + y'^2 + z_x^2 x'^2 \right]^{1/2} = \left[ (1 - x^2)^{-1}x'^2 + y'^2 \right]^{1/2}$

Then computations show that

$\frac{\partial F}{\partial x} = \frac{x x'^2}{(1 - x^2)^2 F} \qquad \frac{\partial F}{\partial x'} = \frac{x'}{(1 - x^2)F} \qquad \frac{\partial F}{\partial y} = 0 \qquad \frac{\partial F}{\partial y'} = \frac{y'}{F}$

To simplify the work we take t to be arc length and drop the requirement that
0 ≤ t ≤ 1. Since ds = √(x'² + y'² + z'²) dt, we have x'² + y'² + z'² = 1 and
F(x, y, x', y') = 1 along the minimizing curve. The Euler equations yield

$\frac{(1 - x^2)x'' + 2x x'^2}{(1 - x^2)^2} = \frac{x x'^2}{(1 - x^2)^2} \qquad\text{and}\qquad y'' = 0$

The first of these can be written x'' = xx'²/(x² − 1). The second one gives
y = at + b, for appropriate constants a and b that depend on the boundary
conditions. The condition 1 = F² leads to x'²/(1 − x²) + y'² = 1 and then to
x'²/(1 − x²) = 1 − a². The Euler equation for x then simplifies to x'' = (a² − 1)x.
There are three interesting cases:
Case 1: a = 1. Then x'' = 0, and thus both x(t) and y(t) are linear expressions
in t. The path is a straight line on the surface (necessarily parallel to
the y-axis).
Case 2: a = 0. Then x'' = −x, and x = c cos(t + d) for suitable constants c
and d. The condition x'²/(1 − x²) = 1 gives us c = 1. It follows that
x = cos(t + d), y = b, and z = √(1 − x²) = sin(t + d). The curve is a
circle parallel to the xz-plane.
Case 3: 0 < a < 1. Then x = c cos(√(1 − a²) t + d), and as before, c = 1. Again
z = sin(√(1 − a²) t + d), and y = at + b. The curve is a spiral. •
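A quick numerical check of Case 3 (not part of the text; the parameter values below are illustrative) confirms that the spiral is parametrized by arc length and satisfies the simplified Euler equation x'' = (a² − 1)x.

```python
import numpy as np

a, b_, d = 0.6, 0.0, 0.3
t = np.linspace(0.0, 5.0, 20001)
w = np.sqrt(1.0 - a**2)

x = np.cos(w * t + d)          # the spiral of Case 3 (c = 1)
y = a * t + b_
z = np.sin(w * t + d)

dx, dy, dz = (np.gradient(v, t) for v in (x, y, z))
speed = np.sqrt(dx**2 + dy**2 + dz**2)
xpp = np.gradient(dx, t)
print(np.max(np.abs(speed - 1.0)))                        # ~ 0: unit speed
print(np.max(np.abs(xpp - (a**2 - 1.0) * x)[50:-50]))     # ~ 0 away from the interval ends
```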

Examples of Problems in the Calculus of Variations with No Solutions.
Some interesting examples are given in [CH], Vol. 1.
I. Minimize the integral $\int_0^1 \sqrt{1 + y'^2}\, dx$ subject to constraints y(0) = y(1) = 0,
y'(0) = y'(1) = 1. An admissible curve is shown in Figure 3.11, but there
is none of least length, since the infimum of the admissible lengths is 1, but
is not attained by an admissible y.

Figure 3.11

II. Minimize $\int_{-1}^{1} x^2\, y'(x)^2\, dx$ subject to constraints that y be piecewise contin-
uously differentiable, continuous, and satisfy y(−1) = −1, y(1) = 1. An
admissible y is shown in Figure 3.12, and the value of the integral for this
function is 2ε/3. The infimum is 0 but is not attained by an admissible y.
This example was given by Weierstrass himself!

Figure 3.12
Direct Methods in the Calculus of Variations. These are methods that
proceed directly to the minimization of the given functional without first looking
at necessary conditions. Such methods sometimes yield a constructive proof
of existence of the solution. (Methods based solely on the use of necessary
conditions never establish existence of the solution.)
The Rayleigh–Ritz Method. (We shall consider this again in Chapter 4.)
Suppose that U is a set of "admissible" functions, and Φ is a functional on U
that we desire to minimize. Put p = inf {Φ(u) : u ∈ U}. We assume p > −∞,
and seek a u ∈ U such that Φ(u) = p. The problem, of course, is that the
infimum defining p need not be attained. In the Rayleigh–Ritz method, we start
with a sequence of functions w₁, w₂, ... such that every linear combination
c₁w₁ + ... + cₙwₙ is admissible. Also, we must assume that for each u ∈ U and
for each ε > 0 there is a linear combination v of the wᵢ such that Φ(v) ≤ Φ(u) + ε.
For each n we select vₙ in the linear span of w₁, ..., wₙ to minimize Φ(vₙ). This
is an ordinary minimization problem for n real parameters c₁, ..., cₙ. It can be
attacked with the ordinary techniques of calculus.
Example 10. We wish to minimize the expression $\iint_R (\phi_x^2 + \phi_y^2)\, dx\, dy$ sub-
ject to the constraints that φ be a continuously differentiable function on the
rectangle R = {(x, y) : 0 ≤ x ≤ a, 0 ≤ y ≤ b}, that φ = 0 on the perimeter of
R, and that $\iint_R \phi^2\, dx\, dy = 1$. A suitable set of base functions for this problem
is the doubly indexed sequence

$u_{nm}(x, y) = \frac{2}{\sqrt{ab}} \sin\frac{n\pi x}{a} \sin\frac{m\pi y}{b} \qquad (n, m \geq 1)$

It turns out that this is an orthonormal set with respect to the inner prod-
uct (u, v) = $\iint_R u(x, y)v(x, y)\, dx\, dy$. We are looking for a function φ =
$\sum_{n,m=1}^{\infty} c_{nm}u_{nm}$ that will solve the problem. Clearly, the function φ vanishes
on the perimeter of R. The condition $\iint \phi^2 = 1$ means $\sum_{n,m=1}^{\infty} c_{nm}^2 = 1$ by the
Parseval identity (page 73). Now we compute

$\frac{\partial}{\partial x}u_{nm}(x, y) = \frac{2}{\sqrt{ab}}\,\frac{n\pi}{a} \cos\frac{n\pi x}{a} \sin\frac{m\pi y}{b}$

The system of functions

$\frac{2}{\sqrt{ab}} \cos\frac{n\pi x}{a} \sin\frac{m\pi y}{b}$

is also orthonormal. Thus $\iint_R \phi_x^2 = \sum_{n,m} \left( \frac{n\pi}{a}\,c_{nm} \right)^2$. Similarly,

$\iint_R \phi_y^2 = \sum_{n,m} \left( \frac{m\pi}{b}\,c_{nm} \right)^2$

Hence we are trying to minimize the expression

$\pi^2 \sum_{n,m} \left( \frac{n^2}{a^2} + \frac{m^2}{b^2} \right) c_{nm}^2$

subject to the constraint $\sum c_{nm}^2 = 1$. Because of the coefficients n² and m², we
obtain a solution by letting c₁₁ = 1 and the remaining coefficients all be zero.
Hence $\phi = u_{11} = \frac{2}{\sqrt{ab}} \sin\frac{\pi x}{a} \sin\frac{\pi y}{b}$. (This example is taken from [CH], page
178.) •
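The finite-dimensional minimization at the heart of the Rayleigh–Ritz step can be exhibited explicitly. In the orthonormal basis u_nm the functional is diagonal, so over a truncated basis the constrained minimum is simply the smallest coefficient. A minimal sketch with illustrative rectangle dimensions and truncation size:

```python
import numpy as np

a, b, N = 1.0, 2.0, 6            # illustrative rectangle sides and truncation level

# In the u_nm basis:  Phi(c) = sum over n,m of lam_nm * c_nm^2  with constraint sum c_nm^2 = 1,
# where lam_nm = pi^2 (n^2/a^2 + m^2/b^2).  The constrained minimum is the smallest lam_nm.
n = np.arange(1, N + 1)
lam = np.pi**2 * (n[:, None]**2 / a**2 + n[None, :]**2 / b**2)

i, j = np.unravel_index(np.argmin(lam), lam.shape)
print("minimizing basis function: u_%d%d" % (i + 1, j + 1))     # u_11
print("minimum value:", lam[i, j], "=", np.pi**2 * (1/a**2 + 1/b**2))
```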
Example 11. (Dirichlet Problem for the Circle.) The 2-dimensional Dirichlet
problem is to find a function that is harmonic on the interior of a given 2-
dimensional region and takes prescribed values on the boundary of that region.
"Harmonic" means that the function u satisfies Laplace's equation:

$u_{xx} + u_{yy} = 0 \qquad\text{or}\qquad \left( \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} \right) u(x, y) = 0$

Laplace's equation arises as the Euler equation in the calculus of variations when
we seek to minimize the integral $\iint_R (u_x^2 + u_y^2)\, dx\, dy$ subject to the constraint
that u be twice continuously differentiable and take prescribed values on the
boundary of the region R. To illustrate the Rayleigh–Ritz method, we take R
to be the circle {(x, y) : x² + y² ≤ 1}. Then polar coordinates are appropriate.
Here are the formulas that are useful. (They are easy but tedious to derive.)

$x = r\cos\theta \qquad y = r\sin\theta \qquad r = \sqrt{x^2 + y^2} \qquad \theta = \tan^{-1}(y/x)$

$u_x = u_r\,\frac{x}{r} - u_\theta\,\frac{y}{r^2} \qquad u_y = u_r\,\frac{y}{r} + u_\theta\,\frac{x}{r^2}$

$u_x^2 + u_y^2 = u_r^2 + r^{-2}u_\theta^2 \qquad dx\, dy = r\, dr\, d\theta$

The integral to be minimized is now

$I = \int_0^{2\pi}\!\!\int_0^1 \left( u_r^2 + r^{-2}u_\theta^2 \right) r\, dr\, d\theta$

The boundary points of the domain are characterized by their value of θ. Let
the prescribed boundary values of u be given by f(θ). Let f'' be continuous.
Then (by classical theorems) f is represented by its Fourier series:

$f(\theta) = \sum_{n=0}^{\infty}{}'\, (a_n\cos n\theta + b_n\sin n\theta)$   (the prime means that the first term is halved)
These are the values that u(r, θ) must assume when r = 1. We therefore postu-
late that u(r, θ) has the form

$u(r, \theta) = \sum_{n=0}^{\infty}{}'\, [f_n(r)\cos n\theta + g_n(r)\sin n\theta] \qquad g_0 = 0$

The integral I consists of two parts, of which the first is $I_1 = \int_0^1 r \int_0^{2\pi} u_r^2\, d\theta\, dr$.
Now

$u_r = \sum{}'\, [f_n'(r)\cos n\theta + g_n'(r)\sin n\theta]$

and consequently,

$\int_0^{2\pi} u_r^2\, d\theta = \pi \sum_n \left[ f_n'(r)^2 + g_n'(r)^2 \right]$

Thus $I_1 = \pi \sum_n \int_0^1 r\left[ f_n'(r)^2 + g_n'(r)^2 \right] dr$. Similarly, the other part of I is
$I_2 = \int_0^1 r^{-1} \int_0^{2\pi} u_\theta^2\, d\theta\, dr$. We have

$u_\theta = \sum_{n=0}^{\infty}{}'\, [-n f_n(r)\sin n\theta + n g_n(r)\cos n\theta]$

and consequently,

$\int_0^{2\pi} u_\theta^2\, d\theta = \pi \sum_n n^2\left[ f_n(r)^2 + g_n(r)^2 \right]$

Thus $I_2 = \pi \sum_{n=1}^{\infty} n^2 \int_0^1 r^{-1}\left[ f_n(r)^2 + g_n(r)^2 \right] dr$. Hence

$I = \pi \sum_{n=0}^{\infty} \int_0^1 \left[ r f_n'(r)^2 + r g_n'(r)^2 + n^2 r^{-1} f_n(r)^2 + n^2 r^{-1} g_n(r)^2 \right] dr$

We therefore must solve these minimization problems individually:

minimize $\int_0^1 \left[ r f_n'(r)^2 + n^2 r^{-1} f_n(r)^2 \right] dr$ subject to $f_n(1) = a_n$

minimize $\int_0^1 \left[ r g_n'(r)^2 + n^2 r^{-1} g_n(r)^2 \right] dr$ subject to $g_n(1) = b_n$

Fixing n and concentrating on the function fₙ, we suppose it is a polynomial of
high degree, m ≥ n. Thus fₙ(r) = c₀ + c₁r + c₂r² + ... + c_m r^m. The integral
to be minimized becomes a function of (c₀, c₁, c₂, ..., c_m), and the constraint
is c₀ + c₁ + c₂ + ... + c_m = aₙ. The solution of this minimization problem is
cₙ = aₙ, all other c_j being 0. Hence fₙ(r) = aₙrⁿ. Similarly, gₙ(r) = bₙrⁿ.
Thus the u-function is

$u(r, \theta) = \frac{a_0}{2} + \sum_{n=1}^{\infty} r^n (a_n\cos n\theta + b_n\sin n\theta)$
Problems 3.6

1. Among all the functions x in C¹[a, b] that satisfy x(a) = 0, x(b) = β, find the one for
which $\int_a^b u(t)^2 x'(t)^2\, dt$ is a minimum. Here u is given as an element of C[a, b].

2. Prove Theorem 3, assuming that F has continuous second derivatives.

3. Find a function y ∈ C²[0, 1] that minimizes the integral $\int_0^1 [\tfrac{1}{2}y'(x)^2 + y(x)y'(x) + y'(x) + y(x)]\, dx$. Note that y(0) and y(1) are not specified.
4. Prove this theorem: If {uₙ} is an orthonormal system in L²(S, μ) and {vₙ} is an or-
thonormal system in L²(T, ν), then {uₙ ⊗ vₘ : 1 ≤ n < ∞, 1 ≤ m < ∞} is orthonormal
in L²(S × T). Here uₙ ⊗ vₘ is the function whose value at (s, t) is uₙ(s)vₘ(t). Explain
how this theorem is pertinent to Example 10.
5. Determine the path of a light beam in the xy-plane if the velocity of light is l/y.

6. Find a function u in C¹[0, 1] that minimizes the integral $\int_0^1 [u'(t)^2 + u'(t)]\, dt$ subject to
the constraints u(0) = 0 and u(1) = 1.


7. Repeat the preceding problem when the integrand is u(t)² + u'(t)².
8. Explain what happens in Example 6 if ℓ > π. Try to solve the problem using polar
coordinates: r = r(θ), where 0 ≤ θ ≤ π.
9. Verify that the family of catenaries in Example 2 passing through the point (0, 4) is given
by y = f(c, x), where c is a parameter and

$f(c, x) = \frac{4}{\cosh c} \cosh\left[ \frac{\cosh c}{4}\, x - c \right]$

10. Suppose that the path of a ray of light in the xy-plane is along the parabola described
by 2x = y2. What function describes the speed of light in the medium?

11. Find the function u in C²[0, 1] that minimizes the integral $\int_0^1 \{[u(t)]^2 + [u'(t)]^2\}\, dt$ subject
to the constraints u(0) = u(1) = 1.
Chapter 4

Basic Approximate
Methods in Analysis

4.1 Discretization 170


4.2 The Method of Iteration 176
4.3 Methods Based on the Neumann Series 186
4.4 Projections and Projection Methods 191
4.5 The Galerkin Method 198
4.6 The Rayleigh-Ritz Method 205
4.7 Collocation Methods 213
4.8 Descent Methods 225
4.9 Conjugate Direction Methods 232
4.10 Methods Based on Homotopy and Continuation 237

In this chapter we will explain and illustrate some important strategies


that can often be used to solve operator equations such as differential equations,
integral equations, and two-point boundary value problems. The methods to be
discussed are as indicated in the table above.
There is often not a clear-cut distinction between these methods, and nu-
merical procedures may combine several different methods in the solution of a
problem. Thus, for example, the Galerkin technique can be interpreted as a pro-
jection method, and iterative procedures can be combined with a discretization
of a problem to effect its solution.

4.1 Discretization

The term "discretize" has come to mean the replacement of a continuum by a


finite subset (or at least a discrete subset) of it. A discrete set is characterized by
the property that each of its points has a neighborhood that contains no other
points of the set. A function defined on the continuum can be restricted to that
discrete set, and the restricted function is a simpler object to determine. For
example, in the numerical solution of a differential equation on an interval [a, b],
it is usual to determine an approximate solution only on a discrete subset of that
interval, say at points a = t₀ < t₁ < ... < t_{n+1} = b. From values of a function
u at these points, one can create a function ũ on [a, b] by some interpolation
process. It is important to recognize that the problem itself is usually changed
by the passage to a discrete set.
Let us consider an idealized situation, and enumerate the steps involved in
a solution by "discretization."
1. At the beginning, a problem P is posed that has as its solution a function
u defined on a domain D. Our objective is to determine u, or an approxi-
mation to it.
2. The domain D is replaced by a discrete subset D h , where h is a parameter
that ideally will be allowed to approach zero in order to get finer discrete
sets. The problem P is replaced by a "discrete version" Ph.
3. Problem P_h is solved, yielding a function v_h defined on D_h.
4. By means of an interpolation process, a function ṽ_h is obtained whose do-
main is D and whose values agree with v_h on D_h.
5. The function ṽ_h is regarded as an approximate solution of the original prob-
lem P. Error estimates are made to justify this. In particular, as h → 0, ṽ_h
should converge to a solution of P.

Example 1. This strategy will now be illustrated by a two-point boundary-
value problem:

(1)   u'' + au' + bu = c   (0 < t < 1),   u(0) = 0,   u(1) = 0

The coefficients a, b, and c are allowed to be functions, assumed to be continuous
in the independent variable t. Notice that the differential equation is linear.
That is, the unknown function u occurs in a linear fashion. The boundary-value
problem (1) is problem P in the previous discussion. The domain D is the
interval [0, 1].

For a discretization, let us choose equally spaced points in the interval as
follows:

0 = t₀ < t₁ < ... < tₙ < t_{n+1} = 1,   tᵢ = ih,   h = 1/(n + 1)

The parameter h in the previous discussion is just the step size in the boundary-
value problem.
Directly from Equations (1), we have

$u''(t_i) + a(t_i)u'(t_i) + b(t_i)u(t_i) = c(t_i) \qquad (1 \leq i \leq n)$

That equation is written in abbreviated form as

$u_i'' + a_i u_i' + b_i u_i = c_i$

A "discrete version" of the original problem is obtained by replacing deriva-


tives in Equation (1) by approximations. Two commonly used formulas (dis-
cussed later, in Lemma 2) are these:

(2) f"(t) = f(t + h) - 2{~t) + f(t - h) _ l~h2 f(4)(7)

(3) f'(t) = f(t + h) - f(t - h) _ ~h2 f"'(7)


2h 6

We use these (without the error terms involving 7) to approximate u' and u" at
the points ti. Since we wish to use u as the solution of the original problem, we
use a different letter to denote the solution of the discretized problem. Thus v
will be a vector of n + 2 components, and Vi is expected to be an approximate
value of U(ti). The problem Ph is

2Vi + Vi-l
(4) {
Vi+l -
h2
Vi+l - Vi-l
+ ai --~ + biVi = Ci
(
1~ i ~ n
)

Va = Vn+l = 0

Here we have written aᵢ = a(tᵢ), and so on. Problem (4) is a system of n linear
equations in n unknowns vᵢ. It is solved by standard methods of linear algebra,
such as Gaussian elimination. The ith equation in the system can be written in
the form

(5)   $\left( h^{-2} - \tfrac{1}{2}h^{-1}a_i \right) v_{i-1} + \left( -2h^{-2} + b_i \right) v_i + \left( h^{-2} + \tfrac{1}{2}h^{-1}a_i \right) v_{i+1} = c_i$

It is clear that the coefficient matrix for this system is tridiagonal, because the ith
equation contains only the three unknowns v_{i−1}, vᵢ, and v_{i+1}. Furthermore, if h
is small enough and if b(t) < 0, the matrix will be diagonally dominant. Indeed,
assume that h|aᵢ| ≤ 2. Then h⁻² ± ½h⁻¹aᵢ is nonnegative and −2h⁻² + bᵢ is
nonpositive. The condition for diagonal dominance in a generic n × n matrix
A = (A_{ij}) is

$|A_{ii}| - \sum_{j=1,\, j \neq i}^{n} |A_{ij}| > 0 \qquad (i = 1, \ldots, n)$

In this particular case, the condition becomes

(6)   $\left| -2h^{-2} + b_i \right| - \left( h^{-2} - \tfrac{1}{2}h^{-1}a_i \right) - \left( h^{-2} + \tfrac{1}{2}h^{-1}a_i \right) = -b_i > 0$

We write System (5) in the form Av = c, where A is the tridiagonal matrix and
c now denotes the vector having components cᵢ. The vectors v and c should be
"column vectors."
Let us assume that the linear system has been solved to produce the vector
v. The next step is to "fill in" the values of a continuous function ṽ such that
ṽ(tᵢ) = vᵢ (1 ≤ i ≤ n). This can be done in many ways, such as by means of a
cubic spline interpolant. Another way of interpreting this step is to say that we
have extended the vector v to the function ṽ.

In order to investigate how close the approximate solution may be to the
true solution, we begin by recording three equations that are satisfied by the
true solution:

u'' + au' + bu = c

$u''(t_i) + a(t_i)u'(t_i) + b(t_i)u(t_i) = c(t_i)$

(7)   $\frac{u(t_{i+1}) - 2u(t_i) + u(t_{i-1})}{h^2} + a_i\,\frac{u(t_{i+1}) - u(t_{i-1})}{2h} + b_i u(t_i) = c_i + d_i$

On the other hand, the solution to the discrete problem satisfies the equation

(8)   $\frac{v_{i+1} - 2v_i + v_{i-1}}{h^2} + a_i\,\frac{v_{i+1} - v_{i-1}}{2h} + b_i v_i = c_i$

By subtracting one equation from the other, we arrive at an equation for the
"error" e = u − v:

(9)   $\frac{e_{i+1} - 2e_i + e_{i-1}}{h^2} + a_i\,\frac{e_{i+1} - e_{i-1}}{2h} + b_i e_i = d_i$

Here $d_i = \frac{h^2}{12}u^{(4)}(\tau_i) + \frac{h^2}{6}a_i u^{(3)}(\xi_i)$. Equation (9) has the same coefficient matrix
as Equation (8). If we denote that matrix by A_h, Equation (9) has the form
A_h e = d. Now by the lemma that follows, and by Equation (6),

$\|A_h^{-1}\|_\infty \leq \max_i\, (-b_i)^{-1}$

Since b is continuous and negative on [0, 1], there is a positive δ, independent of
i and h, such that −bᵢ ≥ δ. Thus $\|A_h^{-1}\|_\infty \leq 1/\delta$, and we have

$\|e\|_\infty \leq \|A_h^{-1}\|_\infty \|d\|_\infty = O(h^2)$

Thus as h → 0, the discrete solution converges to the true solution at the speed
O(h²). (Here O(h²) denotes a generic function such that |O(h²)| ≤ ch².) •
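The whole procedure—assemble the tridiagonal system (5), solve it, and check the discrete equations—is short to program. A minimal sketch; the coefficient functions a, b, c below are illustrative choices (b is kept negative so diagonal dominance holds), not taken from the text.

```python
import numpy as np

a_fun = lambda t: 1.0 + t            # a(t)
b_fun = lambda t: -(2.0 + t**2)      # b(t) < 0
c_fun = lambda t: np.sin(np.pi * t)  # c(t)

n = 99
h = 1.0 / (n + 1)
t = np.linspace(0.0, 1.0, n + 2)     # t_0, ..., t_{n+1}
ti = t[1:-1]
ai, bi, ci = a_fun(ti), b_fun(ti), c_fun(ti)

# Tridiagonal system (5).
A = np.zeros((n, n))
A[np.arange(n), np.arange(n)] = -2.0 / h**2 + bi
A[np.arange(1, n), np.arange(n - 1)] = 1.0 / h**2 - ai[1:] / (2 * h)     # sub-diagonal
A[np.arange(n - 1), np.arange(1, n)] = 1.0 / h**2 + ai[:-1] / (2 * h)    # super-diagonal

v = np.zeros(n + 2)
v[1:-1] = np.linalg.solve(A, ci)     # v_0 = v_{n+1} = 0 are the boundary values

residual = (v[2:] - 2*v[1:-1] + v[:-2]) / h**2 + ai * (v[2:] - v[:-2]) / (2*h) + bi * v[1:-1] - ci
print(np.max(np.abs(residual)))      # ~ 0: the discrete equations (4) are satisfied
```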

Lemma 1. If an n × n matrix A is diagonally dominant, then it is
nonsingular, and

$\|A^{-1}\|_\infty \leq \left[ \min_i \left( |a_{ii}| - \sum_{j \neq i} |a_{ij}| \right) \right]^{-1}$

Proof. Let x be any nonzero vector, and let y = Ax. Select i so that |xᵢ| =
‖x‖∞. Then

$a_{ii}x_i + \sum_{j \neq i} a_{ij}x_j = y_i$

$|a_{ii}x_i| \leq |y_i| + \sum_{j \neq i} |a_{ij}|\,|x_j|$

$|a_{ii}|\,\|x\|_\infty \leq |y_i| + |x_i|\sum_{j \neq i} |a_{ij}| \leq \|y\|_\infty + \|x\|_\infty \sum_{j \neq i} |a_{ij}|$

Hence

$\|x\|_\infty \left( |a_{ii}| - \sum_{j \neq i} |a_{ij}| \right) \leq \|y\|_\infty$

This shows that y ≠ 0. Thus A maps no nonzero vector into 0, and A is
nonsingular. If we write x = A⁻¹y in the above inequality, we obtain

$\|A^{-1}y\|_\infty \leq \|y\|_\infty \left( |a_{ii}| - \sum_{j \neq i} |a_{ij}| \right)^{-1}$

and this implies the upper bound in the lemma.

Lemma 2. If f⁽⁴⁾ is continuous on (t − h, t + h), then

$f''(t) = h^{-2}\left[ f(t+h) - 2f(t) + f(t-h) \right] - \frac{1}{12}h^2 f^{(4)}(\xi)$

$f'(t) = (2h)^{-1}\left[ f(t+h) - f(t-h) \right] - \frac{1}{6}h^2 f'''(\eta)$

Proof. We derive the first formula and leave the second as a problem. By
Taylor's Theorem we have

$f(t+h) = f(t) + hf'(t) + \tfrac{1}{2}h^2 f''(t) + \tfrac{1}{6}h^3 f'''(t) + \tfrac{1}{24}h^4 f^{(4)}(\xi_1)$

$f(t-h) = f(t) - hf'(t) + \tfrac{1}{2}h^2 f''(t) - \tfrac{1}{6}h^3 f'''(t) + \tfrac{1}{24}h^4 f^{(4)}(\xi_2)$

Upon adding these two equations, we get

$f(t+h) + f(t-h) = 2f(t) + h^2 f''(t) + \tfrac{1}{24}h^4\left[ f^{(4)}(\xi_1) + f^{(4)}(\xi_2) \right]$

Upon rearranging this, we obtain

$f''(t) = h^{-2}\left[ f(t+h) - 2f(t) + f(t-h) \right] - \tfrac{1}{12}h^2 \cdot \tfrac{1}{2}\left[ f^{(4)}(\xi_1) + f^{(4)}(\xi_2) \right]$

Observe that the expression ½[f⁽⁴⁾(ξ₁) + f⁽⁴⁾(ξ₂)] is the average of two values of
f⁽⁴⁾ on the interval [t − h, t + h]. Its value therefore lies between the maximum
and minimum of f⁽⁴⁾ on this interval. Since f⁽⁴⁾ is continuous, this value is assumed
at some point ξ in the same interval. Hence the error term can be written as
−h²f⁽⁴⁾(ξ)/12. •
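The O(h²) error in both difference formulas is easy to observe numerically. A quick check with an illustrative smooth function (not from the text):

```python
import numpy as np

f, fp, fpp = np.sin, np.cos, lambda t: -np.sin(t)
t = 1.0

for h in (0.1, 0.05, 0.025):
    d2 = (f(t + h) - 2*f(t) + f(t - h)) / h**2
    d1 = (f(t + h) - f(t - h)) / (2*h)
    print(h, abs(d2 - fpp(t)), abs(d1 - fp(t)))   # each error drops by ~4 when h is halved
```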
Example 2. We give another illustration of the discretization strategy. Con-
sider a linear integral equation, such as

$\int_a^b k(s, t)x(s)\, ds = v(t)$

In this equation, the kernel k and the function v are prescribed. We seek the
unknown function x.
Suppose that a quadrature formula of the type

$\int_a^b g(s)\, ds \approx \sum_{j=1}^{n} c_j\, g(s_j)$

is available. (The points s_j need not be equally spaced.) Taking t = sᵢ in the
integral equation, we have

$\int_a^b k(s, s_i)x(s)\, ds = v(s_i) \qquad (1 \leq i \leq n)$

Applying the quadrature formula leads to a discrete version of the integral equa-
tion:

$\sum_{j=1}^{n} c_j\, k(s_j, s_i)\, x(s_j) = v(s_i) \qquad (1 \leq i \leq n)$

This is a system of n linear equations in the unknowns x(s_j); it can be solved
by standard methods. Then an interpolation method can be used to reconstruct
x(t) on the interval a ≤ t ≤ b. Approximations have been made at two stages,
and the resulting function x is not the solution of the original problem. This
strategy is considered later in more detail (Section 4.7). •
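A minimal sketch of this quadrature discretization, using the composite trapezoid rule; the kernel, right-hand side, and grid size are illustrative assumptions, and the right-hand side is manufactured to be consistent with the rule so the recovery can be checked. (Integral equations of the first kind are in general ill-conditioned; this example sidesteps that issue.)

```python
import numpy as np

a, b, n = 0.0, 1.0, 40
s = np.linspace(a, b, n)
c = np.full(n, (b - a) / (n - 1)); c[0] *= 0.5; c[-1] *= 0.5      # trapezoid weights c_j

k = lambda s, t: np.exp(-np.abs(s - t))                           # illustrative kernel k(s, t)
x_true = lambda s: 1.0 + s**2                                     # manufactured solution
v = np.array([np.sum(c * k(s, ti) * x_true(s)) for ti in s])      # v(s_i) consistent with the rule

# Discrete version: sum_j c_j k(s_j, s_i) x(s_j) = v(s_i).
A = c[None, :] * k(s[None, :], s[:, None])
x = np.linalg.solve(A, v)
print(np.max(np.abs(x - x_true(s))))                              # ~ 0 for this manufactured case
```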
References for this chapter are [AR], [AY], [AIG], [At], [Au], [Bak], [Bar],
[Brez], [BrS], [CCC], [CMY], [Cia], [CH], [Dav], [DMo], [Det], [Dzy], [Eav], [Fie],
[GG], [GZ1], [GZ2], [GZ3], [Gold], [Gre], [Gri], [Hen], [HS], [IK], [KA], [KK],
[Kee], [KC], [Kras], [Kr], [LSU], [Leis], [Li], [LM], [Lo], [Lue1], [Lue2], [Mey],
[Mil], [Moo], [Mor1], [Mor2], [Naz1], [Naz2], [OD], [OR], [Ped], [Pry], [Red],
[Rh1], [Rh2], [Ros], [Sa], [Sm], [Tod], [Wac], [Was], [Wat], [Wilf], and [Zien].

Problems 4.1

1. Establish the second formula in the second lemma.


2. Derive the following formula and its error term:

$f'''(x) \approx \left[ f(x+2h) - 2f(x+h) + 2f(x-h) - f(x-2h) \right]/(2h^3)$

3. To change a boundary value problem

u'' + au' + bu = c,   u(α) = 0,   u(β) = 0

into an equivalent one on the interval [0, 1], we change the independent variable from t
to s using the equation t = βs + α(1 − s). What is the new boundary value problem?
4. To change a boundary value problem

u'' + au' + bu = c,   u(0) = α,   u(1) = β

into an equivalent one having homogeneous boundary conditions, we make a change in
the dependent variable v = u − ℓ, where ℓ(t) = α + (β − α)t. What is the new boundary
value problem?
5. A kernel of the form K(s, t) = Σⁿᵢ₌₁ uᵢ(s)vᵢ(t) is said to be "separable" or "degenerate."
If such a kernel occurs in the integral equation

$x(t) = \int_0^1 K(s, t)x(s)\, ds + w(t)$

then a solution can be found in the form x = w + Σⁿᵢ₌₁ aᵢvᵢ. Carry out the solution
based on this idea. Illustrate, by using the separable kernel e^{s+t}.

4.2 The Method of Iteration

The term iteration can be applied to any repetitive process, but traditionally
it refers to an algorithm of the following nature:

(1)   x_{n+1} = F(xₙ),   x₀ given,   n = 0, 1, 2, ...

We can also write xₙ = Fⁿx₀, where F⁰ is the identity map and F^{n+1} =
F ∘ Fⁿ. In such a procedure, the entities x₀, x₁, ... are usually elements in
some topological space X, and the map F : X → X should be continuous. If
lim_{n→∞} xₙ exists, then it is a fixed point of F, because

(2)   F(lim xₙ) = lim Fxₙ = lim x_{n+1} = lim xₙ


The method of iteration can be considered as one technique for finding fixed
points of operators.
The Contraction Mapping Theorem (due to Banach, 1922) is an elegant
and powerful tool for establishing that the sequence defined in Equation (1)
converges. We require the notion of a contraction, or contractive mapping.
Such a mapping is defined from a metric space X into itself and satisfies an
inequality

(3)   d(Fx, Fy) ≤ θ d(x, y)   (x, y ∈ X)

in which θ is a positive constant less than 1. Complete metric spaces were the
subject of Problem 48 in Section 1.2, page 15. Every Banach space is neces-
sarily a complete metric space, it being assumed that the distance function is
d(x, y) = ‖x − y‖. A closed set in a Banach space is also a complete metric space.
Since most of our examples occur in this setting, the reader will lose very little
generality by letting X be a closed subset of a Banach space in the Contraction
Mapping Theorem.

Theorem 1. Contraction Mapping Theorem. If F is a contrac-
tion on a complete metric space X, then F has a unique fixed point ξ.
The point ξ is the limit of every sequence generated from an arbitrary
point x by iteration:

[x, Fx, F²x, ...]   (x ∈ X)

Proof. Reverting to the previous notation, we select x₀ arbitrarily in X and
define x_{n+1} = Fxₙ for n = 0, 1, 2, .... We have

d(x_{n+1}, xₙ) = d(Fxₙ, Fx_{n−1}) ≤ θ d(xₙ, x_{n−1})

This argument can be repeated, and we conclude that

(4)   d(x_{n+1}, xₙ) ≤ θⁿ d(x₁, x₀)

In order to establish the Cauchy property of the sequence [xₙ], let n > N and
m > N. There is no loss of generality in supposing that m ≥ n. Then from
Equation (4),

d(x_m, xₙ) ≤ d(x_m, x_{m−1}) + d(x_{m−1}, x_{m−2}) + ... + d(x_{n+1}, xₙ)
        ≤ [θ^{m−1} + θ^{m−2} + ... + θⁿ] d(x₁, x₀)
        ≤ [θ^N + θ^{N+1} + ...] d(x₁, x₀)
        = θ^N (1 − θ)⁻¹ d(x₁, x₀)

Since 0 ≤ θ < 1, lim_{N→∞} θ^N = 0. This proves the Cauchy property. Since the
space X is complete, the sequence converges to a point ξ. Since the contractive
property implies directly that F is continuous, the argument in Equation (2)
shows that ξ is a fixed point of F.
If η is also a fixed point of F, then we have

(5)   d(ξ, η) = d(Fξ, Fη) ≤ θ d(ξ, η)

If ξ ≠ η, then d(ξ, η) > 0, and Inequality (5) leads to the contradiction θ ≥ 1.
This proves the uniqueness of the fixed point. •

The iterative procedure is well illustrated by a Fredholm integral equation,
x = Fx, where

(6)   $(Fx)(t) = \int_0^1 K(s, t, x(s))\, ds + w(t) \qquad (0 \leq t \leq 1)$

It is assumed that w is continuous and that K(s, t, r) is continuous on the domain
in ℝ³ defined by the inequalities

0 ≤ s ≤ 1,   0 ≤ t ≤ 1,   −∞ < r < ∞

We will seek a solution x in the space C[0, 1]. This space is complete if it is
given the standard norm

‖x‖∞ = sup_t |x(t)|

In order to see whether F is a contraction, we estimate ‖Fu − Fv‖:

(7)   $|(Fu)(t) - (Fv)(t)| \leq \int_0^1 |K(s, t, u(s)) - K(s, t, v(s))|\, ds$

If K satisfies a Lipschitz condition of the type

(8)   $|K(s, t, \xi) - K(s, t, \eta)| \leq \theta|\xi - \eta| \qquad (\theta < 1)$

then from Equation (7) we get

|(Fu)(t) − (Fv)(t)| ≤ θ‖u − v‖

and the contraction condition

(9)   ‖Fu − Fv‖ ≤ θ‖u − v‖

By Banach's theorem, the iteration x_{n+1} = Fxₙ leads to a solution, starting
from any function x₀ in C[0, 1]. This proves the following result.

Theorem 2. If K satisfies the hypotheses in the preceding para-
graph, then the integral equation (6) has a unique solution in the space
C[0, 1].

Example 1. Consider the nonlinear Fredholm equation

(10)   $x(t) = \frac{1}{2}\int_0^1 \cos(st\,x(s))\, ds$

By the mean value theorem,

$|\cos(st\xi) - \cos(st\eta)| = |\sin(st\zeta)|\,|st\xi - st\eta| \leq |\xi - \eta|$

Thus the preceding theory is applicable with θ = ½. If the iteration is begun
with x₀ = 0, the next two steps are x₁(t) = ½ and x₂(t) = t⁻¹ sin(t/2). The
next element in the sequence is given by

$x_3(t) = \frac{1}{2}\int_0^1 \cos\!\left( t\sin\frac{s}{2} \right) ds$

This integration cannot be effected in elementary functions. In fact, it is analo-
gous to the Bessel function J₀, whose definition is

$J_0(z) = \frac{1}{\pi}\int_0^\pi \cos(z\sin\theta)\, d\theta$

If the iteration method is to be continued in this example, numerical procedures
for indefinite integration will be needed. •
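The numerical continuation alluded to is straightforward: replace the integral by a quadrature rule on a grid and iterate the resulting map. A minimal sketch (grid size and iteration count are illustrative choices):

```python
import numpy as np

m = 401
s = np.linspace(0.0, 1.0, m)
w = np.full(m, 1.0 / (m - 1)); w[0] *= 0.5; w[-1] *= 0.5        # trapezoid weights

t = s.copy()                  # evaluate x on the same grid
x = np.zeros(m)               # x_0 = 0
for _ in range(30):
    # (F x)(t_i) = (1/2) * integral over s of cos(s * t_i * x(s)) ds
    x_new = 0.5 * np.sum(w[None, :] * np.cos(s[None, :] * t[:, None] * x[None, :]), axis=1)
    if np.max(np.abs(x_new - x)) < 1e-12:
        break
    x = x_new

print(x[0], x[-1])            # x(0) = 1/2 exactly; convergence is rapid since theta = 1/2
```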
The method of iteration can also be applied to differential equations, usually
by first turning them into equivalent integral equations. The procedure is of great
theoretical importance, as it is capable of yielding existence theorems with very
little effort. We present some important theorems to illustrate this topic.

Theorem 3. Let S be an interval of the form S = [0, b]. Let f be
a continuous map of S × ℝ to ℝ. Assume a Lipschitz condition in the
second argument:

|f(s, t₁) − f(s, t₂)| ≤ λ|t₁ − t₂|

where λ is a constant depending only on f. Then the initial-value
problem

x'(s) = f(s, x(s)),   x(0) = β

has a unique solution in C(S).

Proof. We introduce a new norm in C(S) by defining

$\|x\|_w = \sup_{s \in S} |x(s)|\, e^{-2\lambda s}$

The space C(S), accompanied by this norm, is complete. Since the initial-value
problem is equivalent to the integral equation

$x = Ax \qquad (Ax)(s) = \beta + \int_0^s f(t, x(t))\, dt \qquad x \in C(S)$

all we have to do is show that the mapping A has a fixed point. In order for
the Contraction Mapping Theorem to be used, it suffices to establish that A is
a contraction. Let u, v ∈ C(S). Then we have, for 0 ≤ s ≤ b,

$|(Au - Av)(s)| \leq \int_0^s |f(t, u(t)) - f(t, v(t))|\, dt$

$\leq \int_0^s \lambda|u(t) - v(t)|\, dt$

$= \lambda\int_0^s e^{2\lambda t} e^{-2\lambda t}|u(t) - v(t)|\, dt$

$\leq \lambda\|u - v\|_w \int_0^s e^{2\lambda t}\, dt$

$\leq \lambda\|u - v\|_w (2\lambda)^{-1} e^{2\lambda s}$

From this we conclude that

$|(Au - Av)(s)|\, e^{-2\lambda s} \leq \tfrac{1}{2}\|u - v\|_w$

and that

$\|Au - Av\|_w \leq \tfrac{1}{2}\|u - v\|_w$ •
Example 2. Does the following initial value problem have a solution in the
space C[0, 10]?

x'(s) = cos(e^s x(s)),   x(0) = 0

This is an illustration of the general theory in which f(s, t) = cos(te^s). By the
mean value theorem,

$|f(s, t_1) - f(s, t_2)| = |\sin(\zeta e^s)|\, e^s\, |t_1 - t_2|$

For 0 ≤ s ≤ 10 and t ∈ ℝ,

$|f(s, t_1) - f(s, t_2)| \leq e^{10}|t_1 - t_2|$

Hence, the hypothesis of Theorem 3 is satisfied, and our problem has a unique
solution in C[0, 10]. •
Example 3. If f is continuous but does not satisfy the Lipschitz condition in
Theorem 3, the conclusions of the theorem may fail. For example, the problem
x' = x^{2/3}, x(0) = 0 has two solutions, x(s) = 0 and x(s) = s³/27. There is no
Lipschitz condition of the form

$|t_1^{2/3} - t_2^{2/3}| \leq \lambda|t_1 - t_2|$

(Consider the implications of this inequality when t₂ = 0.) •


Example 4. The problem x' = x 2 , x(O) = 1 does not conform to the hypothe-
ses of Theorem 3 because there is no Lipschitz condition. By appealing to other
theorems in the theory of differential equations, one can conclude that there is
a solution in some interval about the initial point s = O. •
In order for us to handle systems of differential equations, the preceding
theorem must be extended. A system of n differential equations accompanied
by initial values has this form:

x₁'(s) = f₁(s, x₁(s), x₂(s), ..., xₙ(s)),   x₁(0) = 0
x₂'(s) = f₂(s, x₁(s), x₂(s), ..., xₙ(s)),   x₂(0) = 0
   ⋮
The right way of viewing this is as a single equation involving a function
x : S → ℝⁿ, where S is an interval of the form [0, b]. Likewise, we must have
a function f : S × ℝⁿ → ℝⁿ. We then adopt any convenient norm on ℝⁿ, and
define the norm of x to be

$\|x\| = \sup_{s \in S} \|x(s)\|$

The Lipschitz condition on f is

‖f(s, u) − f(s, v)‖ ≤ λ‖u − v‖,   u, v ∈ ℝⁿ

The setting for the theorem is now C(S, ℝⁿ), which is the space of all continuous
maps x : S → ℝⁿ, normed in this way. The equation x'(s) = f(s, x(s)) now
represents the system of differential equations referred to earlier. For further
discussion see the book by Edwards [Edw], pp. 153–155.
The use of iteration to solve differential equations predates Banach's result
by many years. Ince [4J says that it was probably known to Cauchy, but was
apparently first published by Liouville in 1838. Picard described it in its gen-
eral form in 1893. It is often referred to as Picard iteration. It is rarely used
directly in the numerical solution of initial value problems because the step-by-
step methods of numerical integration are superior. Here is an artificial example
to show how it works.
Example 5.

x' = 2t(1 + x),   x(0) = 0

The formula for the Picard iteration in this example is

$x_{n+1}(t) = \int_0^t 2s\,(1 + x_n(s))\, ds$

If x₀ = 0, then successive computations yield

$x_1(t) = t^2, \qquad x_2(t) = t^2 + \frac{t^4}{2}, \qquad x_3(t) = t^2 + \frac{t^4}{2} + \frac{t^6}{6}, \ \ldots$

It appears that we are producing the partial sums in the Taylor series for e^{t²} − 1,
and one verifies readily that this is indeed the solution. •
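The Picard iterates above can be generated mechanically. A minimal sketch done symbolically (sympy is used purely for convenience; it is not part of the text):

```python
import sympy as sp

t, s = sp.symbols("t s")
x = sp.Integer(0)                               # x_0 = 0
for _ in range(4):
    x = sp.integrate(2 * s * (1 + x.subs(t, s)), (s, 0, t))
    print(sp.expand(x))                         # t**2, t**2 + t**4/2, ...: partial sums of exp(t**2) - 1
```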
In some applications it is useful to have the following extension of Banach's
Theorem:

Theorem 4. Let F be a mapping of a complete metric space into
itself such that for some m, F^m is contractive. Then F has a unique
fixed point. It is the limit of every sequence [F^k x], for arbitrary x.

Proof. Since F^m is contractive, it has a unique fixed point ξ by Theorem 1.
Then

F^m(Fξ) = F(F^m ξ) = Fξ

This shows that Fξ is also a fixed point of F^m. By the uniqueness of ξ, Fξ = ξ.
Thus F has at least one fixed point (namely ξ), and ξ can be obtained by
iteration using the function F^m. If x is any fixed point of F, then

Fx = x   and hence   F^m x = x

Thus x is a fixed point of F^m, and x = ξ.

It remains to be proved that the sequence Fⁿx converges to ξ as n → ∞.
Observe that for i ∈ {1, 2, ..., m} we have

$\lim_{n \to \infty} F^{nm+i}x = \lim_{n \to \infty} (F^m)^n(F^i x) = \xi$

by the first part of the proof. If ε > 0, we can select an integer N having the
property

d(F^{nm+i}x, ξ) < ε   whenever n ≥ N   (1 ≤ i ≤ m)

Since each integer j greater than Nm can be written as j = nm + i, where n ≥ N
and 1 ≤ i ≤ m, we have

d(F^j x, ξ) < ε   whenever j > Nm

This proves that lim_j F^j x = ξ.


To illustrate the application of this theorem, we consider a linear Volterra

integral equation, which typically would have the form

(11)   $x(t) = \int_a^t K(t, s)x(s)\, ds + v(t) \qquad x \in C[a, b]$

(The presence of an indefinite integral classifies this as a Volterra equation.)
Equation (11) can be written as

x = Ax + v

in which A is a linear operator defined by writing

$(Ax)(t) = \int_a^t K(t, s)x(s)\, ds$

It is clear that

$|(Ax)(t)| \leq \|K\|_\infty \|x\|_\infty (t - a)$

From this it follows that

$|(A^2x)(t)| \leq \int_a^t |K(t, s)|\,|(Ax)(s)|\, ds \leq \int_a^t |K(t, s)|\,\|K\|_\infty \|x\|_\infty (s - a)\, ds \leq \|K\|_\infty^2 \|x\|_\infty \frac{(t - a)^2}{2}$

Repetition of this argument leads to the estimate

$|(A^n x)(t)| \leq \|K\|_\infty^n \|x\|_\infty \frac{(t - a)^n}{n!}$

This tells us that

$\|A^n\| \leq \|K\|_\infty^n \frac{(b - a)^n}{n!}$

Now select m so that ‖A^m‖ < 1. Denote the right side of Equation (11) by
(Fx)(t), so that Fx = Ax + v. Then

F²x = A²x + Av + v
F³x = A³x + A²v + Av + v

and so on. Thus

$F^m x - F^m y = A^m(x - y), \qquad \|F^m x - F^m y\| \leq \|A^m\|\,\|x - y\|$

This shows that F^m is a contraction. By Theorem 4, F has a unique fixed point,
which can be obtained by iteration of the map F. Observe that this conclusion
is reached without making strong assumptions about the kernel K. Our work
establishes the following existence theorem.

Theorem 5. Let v be continuous on [a, b] and let K be continuous


on the square [a, b] x [a, b]. Then the integral equation

$x(t) = \int_a^t K(s, t)x(s)\, ds + v(t)$

has a unique solution in C[a, b].

The next theorem concerns the solvability of a nonlinear equation F(x) = b


in a Hilbert space. The result is due to Zarantonello [Za].

Theorem 6. Let F be a mapping of a Hilbert space into itself such
that
(a) (Fx − Fy, x − y) ≥ α‖x − y‖²   (α > 0)
(b) ‖Fx − Fy‖ ≤ β‖x − y‖
Then F is surjective and injective. Consequently, F⁻¹ exists.

Proof. The injectivity follows at once from (a): If x ≠ y, then Fx ≠ Fy.
For the surjectivity, let w be any point in the Hilbert space. It is to be proved
that, for some x, Fx = w. It is equivalent to prove, for any λ > 0, that an x
exists satisfying x − λ(Fx − w) = x. Define Gx = x − λ(Fx − w), so that our
task is to prove the existence of a fixed point for G. The Contraction Mapping
Theorem will be applied. To prove that G is a contraction, we let λ = α/β² in
the following calculation:

‖Gx − Gy‖² = ‖x − λ(Fx − w) − y + λ(Fy − w)‖²
           = ‖x − y − λ(Fx − Fy)‖²
           = ‖x − y‖² − 2λ(Fx − Fy, x − y) + λ²‖Fx − Fy‖²
           ≤ ‖x − y‖² − 2λα‖x − y‖² + λ²β²‖x − y‖²
           = ‖x − y‖²(1 − 2λα + λ²β²)
           = ‖x − y‖²(1 − 2α²/β² + α²/β²)
           = ‖x − y‖²(1 − α²/β²) •

Problems 4.2

1. From the existence theorem proved in the text deduce a similar theorem for the initial
value problem
x'(s) = f(s,x(s)) x(a) =c
2. Let F be a mapping of a Banach space X into itself. Let x₀ ∈ X, r > 0, and 0 ≤ λ < 1.
Assume that on the closed ball B(x₀, r) we have

‖Fx − Fy‖ ≤ λ‖x − y‖


Assume also that ‖x₀ − Fx₀‖ < (1 − λ)r. Prove that Fⁿx₀ ∈ B(x₀, r) and that x* =
lim Fⁿx₀ exists. Prove that x* is a fixed point of F and that ‖Fⁿx₀ − x*‖ ≤ λⁿr.
3. For what values of λ can we be sure that the integral equation

$x(t) = \lambda \int_0^1 e^{st}\cos x(s)\, ds + \tan t$

has a continuous solution on [0, 1]?


4. Prove that there is no contraction of X onto X if X is a compact metric space having at
least two points.
5. Give an example of a Banach space X and a map F : X → X having both of the following
properties:
(a) ‖Fx − Fy‖ < ‖x − y‖ whenever x ≠ y
(b) Fx ≠ x for all x
6. Prove that the integral equation

x(t) = 1 t
[X(S) + s]sinsds

has a unique solution in e[O, 11"/2], and give an iterative process whose limit is the solution.
7. Prove that if X is a compact metric space and F is a mapping from X to X such that
d(Fx, Fy) < d(x, y) when x -# y, then F has a unique fixed point.
8. Let F be a contraction defined on a metric space that is not assumed to be complete.
Prove that

$\inf_x d(x, Fx) = 0$

9. Let F be a mapping on a metric space such that d(Fx, Fy) < d(x, y) when x ≠ y. Let x
be a point such that the sequence Fⁿx has a cluster point. Show that this cluster point
is a fixed point of F (Edelstein).
10. Carry out 4 steps of Picard iteration in the initial value problem x' = x + 1, x(O) = O.
11. Give an example of a discontinuous map F : ℝ → ℝ such that F ∘ F is a contraction.
Find the fixed point of F.
12. Extend the theorem in Problem 7 by showing that the fixed point is the limit of Fnx,
for arbitrary x.
13. The diameter of a metric space X is

diam(X) = sup{d(x, y) : x ∈ X, y ∈ X}

This is allowed to be +∞. Show that there cannot exist a surjective contraction on a
metric space of finite nonzero diameter. (Cf. Problem 4.)
14. Let X be a Banach space and f a mapping of X into X. Are these two properties of f
equivalent?
(i) f has a fixed point.
(ii) There is a nonempty closed set E in X such that f(E) ⊂ E and such that ‖f(x) −
f(y)‖ < ½‖x − y‖ for all x, y in E.

15. Let T be a contraction on a metric space:

d(Tx, Ty) ≤ λ d(x, y)   (λ < 1)

Prove that the set {x : d(x, Tx) ≤ ε} is nonempty, closed, and of diameter at most
ε(1 − λ)⁻¹.
16. The Volterra integral equation

$x(t) = w(t) + \int_0^t (t + s)x(s)\, ds$

is equivalent to an initial-value problem involving a second-order linear differential equa-
tion. Find it.
17. Let F be a mapping of a complete metric space into itself. If ξ is a fixed point of F^m
(for some m), does it follow that ξ is a fixed point of F?
18. Let f : [0, b] × ℝ → ℝ. Prove that if f and ∂f/∂t are continuous (t being the second
argument of f), then the initial-value problem x'(s) = f(s, x(s)), x(0) = β has a unique
solution on [0, b].
19. Prove that if F : ℝⁿ → ℝⁿ is a contraction, then I + F is a homeomorphism of ℝⁿ onto
ℝⁿ.
20. Prove that if g ∈ C[0, b], then the differential equation x'(s) = cos(x(s)g(s)) with pre-
scribed initial value x(0) = β has a unique solution.
21. A map F : X → X, where X is a metric space, is said to be nonexpansive if d(Fx, Fy) ≤
d(x, y) for all x and y. Prove that if F is nonexpansive and if the sequence [Fⁿx] converges
for each x, then the map x ↦ limₙ Fⁿx is continuous.
22. Let c₀ be the Banach space of all real sequences x = [x₁, x₂, ...] that tend to zero.
The norm is defined by ‖x‖ = maxᵢ |xᵢ|. Find all the fixed points of the mapping
f : c₀ → c₀ defined by f(x) = [1, x₁, x₂, ...]. Show that f is nonexpansive, according to
the definition in the preceding problem. Explain the significance of this example vis-à-vis
the Contraction Mapping Theorem.
23. Let U be a closed set in a complete metric space X. Let f be a contraction defined on
U and taking values in X. Let λ be the contraction constant. Suppose that x₀ ∈ U,
x₁ = f(x₀), r = λ(1 − λ)⁻¹ d(x₀, x₁), and B(x₁, r) ⊂ U. Prove that f has a fixed point
in B(x₁, r).
24. Prove that the following integral equation (in which u is the unknown function) has a
continuous solution if 1>'1 < i.
u(t) - >.11 J t : 3 sin[tu(s)] ds = et (0 ~ t ~ 1)

25. Let F be a contraction defined on a Banach space. Prove that I - F is invertible and
that (I - F)-I = limn Hn, where Ho = I and Hn+1 = 1+ FHn.
26. Prove that the following integral equation has a solution in the space G[O, 1].

x(t) = et + 11 cos[t 2 - s2 + x(s) sin(s)] ds

27. In the study of radiative transfer, one encounters integral equations of the form

u(t) = a(t) + 11 u(s)k(t - s)ds

in which u represents the flux density of the radiation at a specified wave length. Prove
that this equation has a solution in the special case k(t) = sin t.

4.3 Methods Based on the Neumann Series

Recall the Neumann Theorem in Section 1.5, page 28, which asserts that if a
linear operator on a Banach space, A: X --+ X, has operator norm less than 1,
then I - A is invertible, and
00

(1) (I - A)-I = LAn


n=O

The series in this equation is known as the Neumann series for (I - A)-I.
This theorem is easy to remember because it is the analogue of the familiar
geometric series for complex numbers:

1
(2) --=l+z+z 2
+z 3 + ... Izl < 1
1-z
In using the Neumann series, one can generate the partial sums Xn
I:Z=o Akv by setting Xo = Vo = v and computing inductively
(3) vn = AVn-1 Xn = Xn-I + Vn (n=1,2, ... )

Another iteration is suggested in Problem 25.


For linear operator equations in a Banach space, one should not overlook the
possibility of a solution by the Neumann series. This remark will be illustrated
with some examples.
Section 4.3 Methods Based on the Neumann Series 187

Example 1. First, consider an integral equation of the form

(4) x(t) -..\ 11 K(s, t)x(s) ds = v(t) (0 ~ t ~ 1)

In this equation, K and v are prescribed, and x is to be determined. For certain


values of ..\, solutions will exist. Write the equation in the form

(5) (1 - "\A)x = v

in which A is the integral operator in Equation (4). If we have chosen a suitable


Banach space and if II"\AII
< 1, then the Neumann series gives a formula for the
solution:
00

(6) x = (I - "\A)-IV = 2)"\Atv = v + "\Av +..\2 A 2 v +


n=Q
...

Example 2. For a concrete example of this, consider

(7) x(t) = ..\ 11 et-Sx(s) ds + v(t)

Here, we use an operator A defined by

(Ax)(t) = 11 et-Sx(s)ds

If we compute A 2 x, the result is

= 11 11et - s eS- 17 x(a) dads

= 11 [11 et - s eS- 17 ds]X(a)da

= 11 et - 17 x(a)da = (Ax)(t)

This shows that A2 = A. Consequently, the solution by the Neumann series in


Equation (6) simplifies to

:17 = V + "\Av +..\2 Av +..\3 Av + ...

+ (_1__
= v
1-..\
I)Av = v + -..\-Av
1-,,\ •
Example 3. Another important application of the Neumann series occurs in a
process called iterative refinement. Suppose that we wish to solve an operator
188 Chapter 4 Basic Approximate Methods in Analysis

equation Ax = v. If A is invertible, the solution is x = A-IV. Suppose now that


we are in possession of an approximate right inverse of A. We mean by that
B
an operator such that III - ABII< 1. Can we use B to solve the problem? It
is amazing that the answer is "Yes." Obviously, Xo = Bv is a first approximation
to x. By the Neumann theorem, we know that AB is invertible and that
00

(AB)-l = L(I - AB)k


k=O

It is clear that the vector x = B(AB)-lV is a solution, because Ax


AB(AB)-lV = v. Hence x =
00

(8) B(AB)-lV = B L(I - AB)kv = Bv + B(I - AB)v + B(I - AB)2V + ...•


k=O
Theorem. The partial sums in the series of Equation (8) can be
computed by the algorithm Xo = Bv, Xn+l = Xn + B(v - Ax n ).

Proof. Let the sequence [xnl be defined by the algorithm, and let
n
Yn = B L(I - AB)kv
k=O

We wish to prove that Xn = Yn for all n. For n = 0, we have Yo = Bv = Xo.


Now assume that for some n, Xn = Yn. We shall prove that Xn+l = Yn+l. Put
Sn = LZ=o(I - AB)k, so that Yn = BSnv. Then

Xn+l = Xn + B(v - AXn) = Yn + Bv - BAYn


= BSnv + Bv - BABSnv = B(Sn + I - ABSn)v

= B[(I - AB)Sn + I]v = B [~(I - ABl+ 1 + I] v


= BSn+1 v = Yn+l

The algorithm in the preceding theorem is known as iterative refinement.
The vector v - AXn is called the residual vector associated with Xn . If the
hypothesis III - ABII < 1 is fulfilled, the Neumann series converges, the partial
sums in Equation (8) converge, and by the theorem, the sequence [xnl converges
to the solution of the problem. The residuals therefore converge to zero.
The method of iterative refinement is commonly applied in numerically
solving systems of linear equations. Let such a system be written in the form
Ax = v, in which A is an N x N matrix, assumed invertible. The numerical
solution of such a system on a computer involves a finite sequence of linear
operations being applied to v to produce the numerical solution Xo. Thus (since
every linear operator on ]Rn is effected by a matrix) we have Xo = Bv, in which
B is a certain N x N matrix. Ideally, B would equal A-I and Xo would be
the correct solution. Because of roundoff errors (which are inevitable), B is
Section 4.3 Methods Based on the Neumann Series 189

only an approximate inverse of A. If, in fact, 111-


ABII < 1, then the theory
outlined previously applies. Thus the initial approximation Xo = Bv can be
"refined" by adding BTo to it, where TO is the residual corresponding to Xo.
Thus TO = V - Axo. From Xl = Xo + BTo further refinement is possible by
adding BTl, and so on. The numerical success of this process depends upon the
residuals being computed with higher precision than is used in the remaining
calculations.
Problems 4.3

1. The problem Ax = v is equivalent to the fixed-point problem Fx = x, if we define


Fx = x - Ax + v. Suppose that A is a linear operator and that III - All < 1. Show that
F is a contraction. Let Xo = v and Xn +1 = Fx n . Show that Xn is the partial sum of the
Neumann series appropriate to the problem Ax = v .

2. Prove that if A is invertible and if the operator B satisfies IIA - BII < II A -111- 1, then
B is invertible. What does this imply about the set of invertible elements in £(X, X)?
(Here X is a Banach space.)
3. Prove that if infA III - >'AII < 1, then A is invertible.
4. If IIAII is small, then (I - A)-l ~ 1+ A. Find € > 0 such that the condition IIAII < €
implies

5. Make this statement precise: If IIAB - III < 1, then 2B - BAB is superior to B as an
approximate inverse of A.
6. Prove that if X is a Banach space, if A E £(X, X), and if IIAII < 1, then the iteration
Xn +1 = AXn +b
converges to a solution of the equation x = Ax + b from any starting point Xo.
7. Let X and Y be Banach spaces. Show that the set n of invertible elements in £(X, Y) is
an open set and that the map f: n -+ £(Y,X) defined by f(A) = A-1 is continuously
differentiable.
8. Give an example of an operator A that has a right inverse but is not invertible. Observe
that in the theory of iterative refinement, A need not be invertible.
9. Prove that if the equation Ax = v has a solution xo and if III - BAli < 1, then Xo =
(BA)-l Bv, and a suitable modification of iterative refinement will work.
10. In Example 2, prove that the solution given there is correct for all >. satisfying>. # 1. In
particular, it is not necessary to assume that II>'AII < 1 in this example.
11. In Example 2, compute IIAII.
12. Show how to solve the equation (I - >.A)x = v when A is idempotent (Le., A2 = A).
13. Let A be a bounded linear operator on a normed linear space. Prove that if A is nilpotent
(i.e., Am = 0 for some m ~ 0), then I - A is invertible. Give a formula for (I - A)-l.
14. Prove this generalization of the Neumann Theorem. If A is a bounded linear transfor-
mation from a Banach space X into X such that the sequence Sn = l:~=o Ak has the
Cauchy property, then (I _A)-l exists and equals limn-+oo Sn. Give an example to show
that this is a generalization.
15. Prove or disprove: If A is a bounded linear operator on a Banach space and if IIAmll < 1
for some m, then (I - A)-l = l:;;"=oAk.
16. A Volterra integral operator is one of the form

(Ax)(t) = it K(s, t)x(s) ds


190 Chapter 4 Basic Approximate Methods in Analysis

Assume that A maps eta, b] into eta, b]. Prove that (I - A)-1 exists and is given by
the usual Neumann series. Refer to Section 4.2 for further information about Volterra
integral equations.
17. Define A : e[O, 1] -t e[O, 1] by the following equation, and prove that A is surjective.

(Ax)(t) = 11 cos(st)x(s) ds + 2x(t)

18. Prove that the set of nonsingular n x n matrices is open and dense in the set of all n x n
matrices.
19. Let ¢ E e[O, 1] and satisfy ¢(t) > 0 on [0,1]. Put K(s, t) = ¢(s)/¢(t) and

(Ax)(t) = 11 K(s, t)x(s) ds

Prove that A2 = A. What are the implications for the integral equations AAx = x + w?
20. Suppose that the operator A satisfies a polynomial equation 2::7=0 CjAj = 0 in which
co -# o. Prove that A is invertible and give a formula for its inverse.
21. Prove that if IIAII < 1, then
_ 1 _ :( 11(1 _ A)-III :( _I_
1+ IIAII ~ ~ 1- IIAII

22. Assume that Am+l = A for some integer m ~ 1, and show how to solve the equation
x - AAx b. =
23. Investigate the nature of the solutions to the Fredholm integral equation x(t) = 1 +
Jo
1
x(st) ds.

24. Define F: e[O, 1] -t e[O, 1] by the equation

(Fx)(t) = x(t) -11 K{t,x(s»)ds

Make reasonable assumptions about K and compute the Frechet derivative F'(x). Make
further assumptions about K and prove that F'(x) is invertible.
25. In Example 3, do we have A-I = B(AB)-I?
26. Find the connection between the Neumann Theorem and Lemma 1 in Section 4.1. (That
lemma concerns diagonally dominant matrices.)
27. Let a and b be elements of a (possibly noncommutative) ring with unit 1. Show that the
partial sums of the series 2::;;"=0
b(1 - ab)k can be computed by the formulas Xo = b and
Xn+l = Xn + b(1 - ax n ).

28. Define an operator A from e[O, 1] into e[O, 1] by the equation

(Ax)(t) = x(t) -11 x(s>[s2 +~] ds


Prove that A is invertible and give a series for its inverse.

29. Consider the integral operator (Kx)(t) = J;


k(t, s)x(s) ds, where x E C[0,1] and k E
e([O,I] x [0,1]). Prove that 1- K is invertible. Prove that this assertion may be false if
the upper limit in the integral is replaced by 1.
Section 4.4 Projections and Projection Methods 191

4.4 Projections and Projection Methods

Consider a normed linear space X. An element P of £(X,X) is called a pro-


jection if P is idempotent: p 2 ::::: P. Notice particularly that linearity and
continuity are incorporated in the definition. Obvious examples are P ::::: 0 and
P ::::: 1. In Hilbert space, if {VI, V2, ... } is a finite or infinite orthonormal system,
the equation
(1)
j

defines a projection. To prove this, notice first that


PVi::::: L(Vi,Vj)Vj::::: L§ijVj::::: Vi
j j

Thus P leaves undisturbed each Vi. Consequently,


p 2 x::::: P(Px)::::: L(x,Vj)PVj::::: L(x,Vj)Vj::::: Px
j j

Here are some elementary results about projections.

Theorem 1. The range of a projection is identical with the set of


its fixed points.

Proof. Let P : X -+ X be a projection and V its range. If V E V, then v::::: Px


for some x, and consequently,
Pv ::::: p2X ::::: Px ::::: V
Thus v is a fixed point of P. The reverse inclusion is obvious: If x::::: Px, then
x is in the range of P. •

Theorem 2. If P is a projection, then so is 1 - P. The range of


each is the null space of the other.

Proof·
(1 - p)2 ::::: (1 - P)(1 - P) ::::: 1 - 2P + p2 ::::: 1 - P
Use Rand N for "range" and "null space." Then the preceding theorem shows
that
(2) R(P) ::::: N(1 - P)
Applying this to 1 - P, we get
(3) R(1 - P) ::::: N(P) •
It should be noted that the range of a projection P is necessarily closed,
because the continuity of P implies that the set {x : (I - P)x ::::: 0 } is closed.
This is a special property not possessed by all elements of £(X, X). Notice also
that P acts like the identity on its range. Thus, every projection can be regarded
as a continuous linear extension of the identity operator defined initially on a
subspace of the given space. If P is a projection of X onto V, then V is closed,
and we say that "P is a projection of X onto V," writing P : X -» V, where
the double arrow signifies a surjection.
192 Chapter 4 Basic Approximate Methods in Analysis

Theorem 3. The adjoint of a projection is also a projection.


Proof. Let P be a projection defined on the normed space X. Recall that its
adjoint P' maps X' to X' and is defined by the equation
P*</>=</>oP </>EX'
Thus it follows that
(p.)2</> = P*(P*</» = (</>oP)oP=</>op 2 = </>oP=P*</> •
In the next theorem, the structure of a projection having a finite-
dimensional range is revealed.
Theorem 4. Let P be a projection of a normed space X onto a
finite-dimensional subspace V. If {VI, ... ,vn } is any basis for V then
there exist functionals </>i in X' such that
( 4) </>i (Vj) = 8ij (1 :;;; i, j :;;; n)

(5) (x E X)
i=1
Proof. Select 1/Ji E V' such that for any v E V,
n

i=1
The functionals 1/Ji are linear and continuous, by Corollary 1 in Section 1.5, page
26. (See also the proof of Corollary 2 in the same section.) For each x E X,
Px E V, and so Px = L::=I1/Ji(PX)Vi. Hence we can let </>i(X) = 1/Ji(PX).
Being a composition of continuous linear maps, ¢i is also linear and continuous.
The equation PVj = Vj implies that </>i(Vj) = 8ij by the uniqueness of the
representation of Px as a linear combination of the basis vectors Vi. •
A set of vectors VI, V2, ... and a set of functionals </>1, </>2, ... is said to form a
biorthogonal system if </>i(Vj) = 8ij for all i and j. The book [Brez] is devoted
to this topic.
In practical problems, a projection is often used to provide approximations.
Thus, if x is an element of a normed space X and if V is a subspace of X, it
may be desired to approximate x by an element of V. If V E V, the error or
deviation of x from V is Ilx - vii,
and the minimum deviation or distance
from x to V is
dist(x, V) = inf Ilx -
vEV
vii
This quantity represents the best that can be done in approximating x by an
element of V. In most normed spaces it is quite difficult to determine a best
approximation to x. That would be an element v E V such that
(6) Ilx - vii
= dist(x, V)
Such an element need not exist. It will exist if V is finite dimensional and in
certain other special cases. Usually, we find a convenient projection P : X V ---#

and accept Px as an approximation to x. In Hilbert space, we can use the


orthogonal projection of X onto V and thereby obtain the best approximation.
In most spaces, best approximations-even if they exist-cannot be obtained by
using linear maps.
Section 4.4 Projections and Projection Methods 193

Theorem 5. If P is a projection of X onto a subspace V, then for


allxEX,

(7) Ilx - Pxll :,;; III - pil dist(x, V)

Proof. For any v E V, Pv = v. Hence

Ilx - Pxll = II(x - v) - P(x - v)11 = II(I - P)(x - v)11 :,;; III - plllix - vii

Now take an infimum as v ranges over V.


As remarked above, the Hilbert space case is especially favorable, since the

orthogonal projection onto a subspace does yield best approximations. If X is
a Hilbert space and P : X --# V is a projection, we call it the orthogonal
projection onto V if

(8) x - Px..l V (x E X)

If this is the case, then by the Pythagorean Law,

(9)

Hence P and I - P have operator norms at most 1. Inequality (7) now shows that
Px is the best approximation to x in the subspace V. Furthermore, x - Px is the
best approximation of x in V.L. (This last space is the orthogonal complement
of V in the Hilbert space X.)
Example 1. Consider the familiar space era, bJ. In it we single out for special
attention the subspace II n - 1 consisting of all polynomials of degree at most n-1.
This has dimension n. Now select points tl < t2 < ... < tn in [a, bj, and define
polynomials
fi(S) = IT
j=1
s-tj
ti - tj
#i
These polynomials have degree n - 1 and satisfy the equation

(l:';;i,j:';;n)

This is a special case of Equation (4) above. The operator L, defined for x E
era, bjby the equation
n
Lx = L x(ti)fi
i=l

is the Lagrange Interpolation Operator; it is a projection.


For any t E [a, b], we can define a functional t* on the space era, bj by

writing t*(x) = x(t), where x runs over era, bj. This functional is called a point
evaluation functional. Notice that in Example 1, the functions f 1 ,f2 , ..• ,fn
and the functionals ti, ti, .. . , t~ form together a biorthogonal system, as defined
just after Theorem 4.
194 Chapter 4 Basic Approximate Methods in Analysis

Besides being used directly to provide approximations to elements in a


normed linear space, projections are used to solve operator equations. Let us
illustrate this in a Hilbert space X. Suppose that it is desired to solve an op-
erator equation Ax = b. Here, A E £(X,X); it could be an integral operator,
for example. Let [Uj : j E N] be an orthonormal basis for X. (In such a case, X
must be separable.) The equation Ax = b is equivalent to an infinite system of
equations:

(10) (j E N)

In attempting to find an approximate solution to this problem, we solve the


finite system
(1 ~ j ~ n)
This is the same as

(11)

where Pn is the orthogonal projection defined by


n
(12) Pny = 2:)y,Uj)Uj
j=1

Does this strategy have any chance of success? It depends on whether the
sequence [x n ), arising as outlined above, converges. Assume that Xn -+ x. Let
us verify that x is a solution: Ax = b. By the continuity of A, AXn -+ Ax. Since
Ilpnll= 1, Pn(Ax n - Ax) -+ O. But PnAx n = Pnb by our choice of x n . Hence
Pnb - PnAx -+ O. In the limit, this gives us b = Ax.
Notice that this proof uses the essential fact that PnY -+ Y for all y. For
our general theorem (in any Banach space) this assumption is needed.

Theorem 6. Let [Pn] be a sequence of projections on a Banach


space X, and assume that PnY -+ Y for each Y in X. Let b E X and
A E £(X, X). For each n let Xn be a point such that Pn(Ax n - b) = O.
If Xn -+ x, then Ax = b.

Proof. SincePny-+ y, we also have IlpnylI-+ IIYII,and thereforesuPn IIpnyll <


00 for each y. Since X is complete, we may apply the Uniform Boundedness Prin-
ciple (Section 1.7, page 42) and conclude that SUPn Ilpnll < 00. By the continuity
of A, AXn -+ Ax. By the boundedness of IIpnll, we have Pn(AXn - Ax) -+ O.
By the choice of x n , PnAx n = Pnb. Hence Pnb - PnAx -+ O. In the limit, this
yields b = Ax. •
A projection method for solving an equation of the form Ax = b, where
A E £(X, X), begins by selecting a sequence of subspaces

(13)
Section 4.4 Projections and Projection Methods 195

and associated projections Pn : X ---» Vn . For each n, we find an Xn that satisfies

(14)

Often we insist that Xn E Vn , but this is not essential. One hopes that the
sequence [xnJ will converge to a solution of the original problem. We shall give
some positive results in this direction. These apply to a problem of the form

(15) x-Ax=b

Theorem 7. Let P be a projection of the normed space X


onto a subspace V. Suppose that x E X, x - Ax - b = 0, x E V, and
P(X - Ax - b) = O. If I - PA is invertible, then

(16) Ilx - xii ~ I (I - PA)-lllllx - Pxll


Proof. From Equation (15), PAx = Px - Pb. Hence

(17) x - PAx = x - Px + Pb

Since x E V, it follows that Px = X. Consequently,

(18) x- PAx = Pb

Subtraction between Equations (17) and (18) gives

x - x - PA(x - x) = x - Px

or
(I - P A)(x - x) = x - Px
Thus we have
x - x = (I - PA)-I(X - Px)

This leads to Inequality (16).



Let us see how this theorem can be applied in the case where X is a Hilbert
space. Let [VI, V2, ... J be an orthonormal basis for X, and suppose that we wish
to solve

(19) x-Ax=b

in which A E £(X, X) and IIAII< 1. Let Vn be the linear subspace of dimension


n generated by {VI, .... , Vn }, and let Pn be the orthogonal projection of X onto
Vn . The familiar formula for Pn is
n
(20) Pnx = l)x,Vj)Vj
j=1
196 Chapter 4 Basic Approximate Methods in Analysis

In applying the projection method to Equation (19), let us select elements


Xn in Vn for which

(21)

By the preceding theorem, the actual solution x is related to the approximate


solution Xn by the inequality

(22)

Now Ilpnll = I, and so IIPnAl1 :::;; IIAII < 1. Hence


II(I - PnA)-111 :::;; (l-IIAID- 1

(This estimate comes from the proof of the Neumann Theorem.) Also, we know
that
Ilx - Pn X l1 2
= II. f
t=n+l
2
(X,Vi) Vi Il = f
i=n+l
I(X,ViW ~0
We conclude therefore from Inequality (22) that the approximate solutions Xn
converge to x as n ~ 00. We summarize this discussion in the next theorem.

Theorem 8. Let [VI, V2, ... J be an orthonormal basis in a Hilbert


space X. Let A E .c(X, X) and IIAII
< 1. For each n, let Xn be a linear
combination of VI, ... ,Vn chosen so that

(23) (l:::;;i:::;;n)

Then [xnJ converges to the solution of the equation x - Ax = b.

Of course, the Neumann Theorem can be used to solve the equation (J -A)x = b.
It gives x = (I - A)-lb = :L:~=o Anb. There seems to be no obvious connection
between this solution and the one provided by Theorem 8.
In the general projection method, in solving the equation

(24) Pn(Ax - b) = 0

we need not confine ourselves to the case where x is chosen in the range of Pn .
Instead, we can let x be a linear combination of other prescribed elements, say
x = L~=1 CiUi' In this case, we attempt to choose Ci so that

(25) Pn(tCjAUj-b) =0
)=1

Suppose that Pn is a projection of rank n having the explicit formula


n
(26) Pnx = L tPi(X)Vi
i=1
Section 4.4 Projections and Projection Methods 197

Then Equation (25) gives us

(27) ePi('i:CjAUj - b) =0 (1 ~ i ~ n)
J=1
or
n
(28) L cjePi(Auj) = ePi(b) (1 ~ i ~ n)
j=1
This is a system of linear equations having coefficient matrix (ePi (Auj )). In the
next two sections we shall see examples of this procedure.

Problems 4.4

1. Let {UI, ... , Un} be a linearly independent set in a inner-product space. Prove that the
Gram matrix, whose elements are (Ui,Uj), is nonsingular.
2. Let P be a projection of a normed space X onto a subspace V. Prove that Px = 0 if and
only if <p(x) = 0 for each <P in the range of P·.
3. Let PI, P2, . .. be a sequence of projections on a normed space X. Suppose that
Pn+IPn = Pn for all n and that the union of the ranges of these projections is dense in
X. Suppose further that sUPn IIPnl1 < 00. Prove that Pnx -t X for all x E X.
4. Let H, P2, . .. be a sequence of projections on a Banach space X. Prove that if Pnx -t X
for all x E X, then sUPn lipnll < 00 and the union of the ranges of the projections is
dense in X. Hint: The Uniform Boundedness Theorem is useful.
5. Let X be a Banach space and H, P2, . .. projections on X such that Pnx -t X for every
x. Suppose that A is an invertible element of £(X, X). For each n, let Xn be a point
such that Pnxn = Xn and Pn(Axn - b) = O. Prove or disprove that the sequence [xnl
necessarily converges to the solution of the equation Ax = b.
6. Let {<PI, ... ,<Pn} be a linearly independent set in X*. Is there a projection P: X -t X
having rank n of the form Px = I::I <Pi(X)Vi? (The rank of a linear operator is the
dimension of its range.)
7. Adopt the notation of Theorem 4, and prove that <Pi (Px) = <Pi (x) for all i in {1, 2, ... , n}
and for all x in X.
8. Prove that the operator L in Example 1 is a projection. Prove that IILII = II I: Il!illi oo '
9. Let A and P be elements of £( X, X), where p2 = P. Let V denote the range of P. Show
that P AJV E £(V, V). Is P AJV invertible?
10. Prove a variant of Theorem 6 in which P is an arbitrary linear operator and x satisfies
Px=x.
11. In the setting of Theorem 8, prove that the solution to the problem is given by x =
2::"=0
Anb. How is this solution related to the one given in the theorem, namely, x =
limx n ?
12. Consider the familiar sequence space Co. (It was described in Problem 1.2.16, page 12.)
We define a projection P : Co -t Co by selecting any set of integers J and setting

x(n) (n EN" J)
(Px)(n) = { 0
(n E J)

Prove that P is a projection. Identify the null space and range of P. Give the formula
for 1- P. Compute IiPIl and III - pli. How many projections of this type are there?
What is the distance between any two different such projections?
198 Chapter 4 Basic Approximate Methods in Analysis

13. Let {UI' U2, ... , Un} and {VI, V2, ... ,vn } be sets in a Hilbert space, the second set be-
ing assumed to be linearly independent. Define Ax = I::I (x, Ui)Vi. Determine the
necessary and sufficient conditions on {Ui} in order that A be a projection, i.e., A2 = A.
14. (Variation on Problem 5.) Let X be a Banach space, and let PI, P2, ... be projections
on X such that Pnx -+ x for each x in X. Assume that /lPn /I = 1 for all n. Let A
be a linear operator such that /II - A/I < 1. If the points Xn satisfy Pnxn = Xn and
=
Pn(Ax n - b) 0, then the sequence [Xn] converges to a solution of the equation Ax b. =
15. In JR2, let U = (1,1), V = (1,0), and Px = (x, u)v. Prove that P is a projection. Using the
Euclidean norm in JR2, compute /lpii. This problem illustrates the fact that projections
on Hilbert spaces need not have norm 1.
16. Is a norm-l projection defined on a Hilbert space necessarily an orthogonal projection?
17. Explain why point-evaluation functionals, as defined on the space era, b], cannot be
defined on any of the spaces LP[a, b].

4.5 The Galerkin Method

The procedure that goes by the name of the mathematician Galerkin is one of
the projection methods, in fact, the one described at length in the preceding
section. We review the method briefly, and then discuss concrete examples of
its use.
We wish to solve an equation of the form
(1) Au= b
in which A is an operator acting on a Hilbert space U. A finite-dimensional
subspace V is chosen in U, and we let P denote the orthogonal projection of U
u
onto V. Then we find E V such that
(2) P(Au - b) = 0

If VI, V2,"" Vn is a basis for V, and if we set u= L:;=1 CjVj, then Equation (2)
leads to
n
(3) 2: Cj (AVj, Vi) = (b, Vi) (1 ~ i ~ n)
j=1

These are the "Galerkin Equations" for U.


Here is an example of the Galerkin method in the subject of partial differ-
ential equations. We recall the definition of the Laplacian operator V'2 (also
denoted by ~):
a2 u a2 u
V' 2u = ax2 + ay2
An important problem, known as the Dirichlet problem, is to find a function
u = u(x,y) that obeys Laplace's equation V' 2 u = 0 in a bounded open set ("re-
gion") n in the plane and takes prescribed values on the boundary of the region
(denoted by an). Thus there are two conditions on the unknown function u:

{
V'2u = 0 in n
(4)
u(x, y) = g(x, y) on an
Section 4.5 The Galerkin Method 199

A function U that has continuous second-order partial derivatives and sat-


isfies '\7 2u = 0 in a region 0 is said to be harmonic in O. In the Dirichlet
problem, the function 9 is defined only on 00 and should be continuous. Thus
the Dirichlet problem has the goal of reconstructing a harmonic function in a
region 0 from a knowledge of its values on the boundary of O. It furnishes a
nice example of the recovery of a function from incomplete information. (The
general topic of optimal recovery of functions has many other specific examples,
such as in computed tomography, where the density function of a solid object is
to be recovered from information produced by X-ray scanning.)
In applying Galerkin's method to the Dirichlet problem, it is advantageous
to select base functions UI, ... , Un that are harmonic in O. Then an arbitrary
linear combination 'L7=1 CjUj will also be harmonic, and we need only to adjust
the coefficients so that the boundary conditions are approximately satisfied.
I
That could mean that 'L7=1 CjUj - gil
is to be minimized, where the norm is
one that involves only function values on 00. In Galerkin's method, however,
we select Cj so that
n
(5) (2::CjUj - g, Ui) = 0 i = 1, ... ,n
j=1
where the inner product could be a line integral around the boundary of n.
A plenitude of harmonic functions can be obtained from the fact that the
real and imaginary parts of a holomorphic function of a complex variable are
harmonic. To prove this, let w be holomorphic, and let w = U + iv, where U and
v are the real and imaginary parts of w. By the Cauchy-Riemann Equations,
we have

02U + 02u = ~ ou + ~ ou = ~ ov + ~ (_ OV) = 0


ox 2 oy2 ox ox oy oy ox 8y oy ox

The proof for v comes from observing that it is the real part of -iw.
To illustrate this, consider the function z >-+ Z2. We have

w = Z2 = (x + iy? = x 2 - y2 + 2ixy = U + iv
Thus the functions U = x 2 - y2 and v = 2xy are harmonic. (See Problem 10.)
The Dirichlet problem is frequently encountered with Poisson's Equation:

{
'\72u = f in 0
(6)
U = 9 on 00

One way of solving (6) is to solve two related but easier problems:

(7) {
'\72v ~ f on 0 {
'\72w = 0 on 0
v = 0 on 00 w = 9 on 00

Clearly, the function U = v + w will then solve (6). The problem involving w
was discussed previously. The Galerkin procedure for approximating v begins
with the selection of base functions VI, V2,... that vanish on 00. Then an
200 Chapter 4 Basic Approximate Methods in Analysis

approximate solution v is sought having the form v = ,£7=1 CjVj. The usual
Galerkin criterion is applied, so we have to solve the linear equations
n
(8) 2::>j (V' 2Vj, Vi) = (1, Vi) i = 1, ... ,n
j=1

The inner product here should be defined by the equation

(9) (u,V) = l u(x, y)v(x, y) dxdy

Theorem 1. If n is a region to which Green's Theorem applies,


then the Laplacian is self-adjoint with respect to the inner product (9)
when applied to functions vanishing on an.

Proof. Using subscripts to denote partial derivatives, we write Green's Theo-


rem in the form

Jnr (Qx - Py) = r (Pdx + Qdy)


Jan
Applying this to the functions Q = UV x - vU x and P = vUy - uVy, we obtain

Since V and u vanish on an, we conclude that (u, V' 2v) = (V' 2u, v). •
A remark about Equation (8) is in order. Some authors argue that the
coefficients Cj should be chosen to minimize the expression

where the Hilbert space norm is being used, corresponding to the inner product
in Equation (9). This is a problem of approximating f as well as possible by a
linear combination of the functions V' 2 Vi (1 ~ i ~ n). The solution is obtained
via the normal equations
n
L Cj V' 2Vj - f ..l V' 2Vi (1 ~ i ~ n)
j=1

This leads to the system


n
(10) L Cj (V' 2Vj, V' 2Vi) = (1, V' 2Vi) (1 ~ i ~ n)
j=1

This is not the classical Galerkin method, although it is an example of the


general theory presented in Section 6.4.
An existence theorem for solutions of the Dirichlet problem is quoted here,
from [Gar], page 288. It concerns domains with smooth boundary. Other such
theorems can be found in [Kello].
Section 4.5 The Galerkin Method 201

Theorem 2. Let n be a bounded open set in ]R2 whose boundary


an consists of a finite number of simple closed curves. Assume the
existence of a number r > 0 such that at each point of an there are
two circles of radius r tangent to an at that point, one circle in n-
and the other in ]R2 "n. Let 9 be a twice continuously differentiable
function on n. Then the Dirichlet problem (4) has a unique solution.

For boundary-value problems involving differential equations, the Galerkin


strategy can be applied after first turning the boundary-value problem into a
"variational form." This typically leads to particular cases of a general problem
that we now describe.
Two Hilbert spaces U and V are prescribed, and there is given a bilinear
functional on U x V. ("Bilinear" means linear in each variable.) Calling this
functional B, let us make further assumptions as follows:
(a) IB(u, v)1 ~ allulllivil
(b) infllull=l sUPllvll=l IB(u, v)1 = f3 > 0
(c) If v:f. 0, then sUPu B(u, v) > 0
With this setting established, there is a standard problem to be solved, namely,
given a specific point z in V, to find w in U such that, for all v in V, B( w, v) =
(z, v). The following theorem concerning this problem is Babuska's generalization
of a theorem proved first by Lax and Milgram. The proof given is adapted from
[OdR].

Theorem 3. Babuska-Lax-Milgram Under the hypotheses listed


above we have the following: For each z in V there is a unique w in U
such that
B(w,v) = (z,v) forallvinV
Furthermore, w depends linearly and continuously on z.

Proof. As usual, we define the u-sections of B by Bu(v) = B(u, v). Then each
Bu is a continuous linear functional on V. Indeed,

IIBul1 = sup{IB(u, v)1 : v E V, Ilvil = I} ~ allull

By the Riesz Representation Theorem (Section 2.3, Theorem 1, page 81), there
corresponds to each u in U a unique point Au in V such that Bu(v) = (Au, v).
Elementary arguments show that A is a linear map of U into V. Thus,

(Au, v) = Bu(v) = B(u, v)


The continuity of A follows from the inequality IIAul1 IIBul1 ~ allull. The
operator A is also bounded from below:

IIAuil = sup (Au, v) = sup IB(u, v)1 ~ f3llull


Ilvll=l IIvll=l
202 Chapter 4 Basic Approximate Methods in Analysis

In order to prove that the range of A is closed, let [vnl be a convergent sequence
in the range of A. Write Vn = Aun , and note that by the Cauchy property

0= lim
n,m~oo
lim IIAun - Aumll ~ (3 n,m
Ilvn - vmll = n,m lim Ilun - umll

Consequently, [unl is a Cauchy sequence. Let U = Um n Un. By the continuity of


A, Vn = AU n -+ Au, showing that limn Vn is in the range of A. Next, we wish
to establish that the range of A is dense in V. If it is not, then the closure of
the range is a proper subspace of V. Select a nonzero vector p orthogonal to the
range of A. Then (Au,p) = 0 for all u. Equivalently, B(u,p) = 0, contrary to
the hypothesis (3) on B. At this juncture, we know that A-I exists as a linear
map. Its continuity follows from the fact that A is bounded below: Ifu = A-lV,
then the inequality IIAul1 ~ {3llull implies Ilvll ~ (3IIA-IVII· The equation we
seek to solve is B(w,v) = (z,v) for all v. Equivalently, (Aw,v) = (z,v). Hence
Aw = z and w = A-Iz. Since there is no other choice for w, we conclude that
it is unique and depends continuously and linearly on z.

If a problem has been recast into the form of finding a vector w for which
B(w,v) = (z,v), as described above, then the Galerkin procedure can be used
to solve this problem on a succession of finite-dimensional subspaces Un C U
and Vn C V.
Reviewing the details of this strategy, we start by assuming that dim(Un ) =
dim(Vn ) == n. Select bases {Ui} for Un and {Vi} for Vn . A solution Wn to the
"partial problem" is sought:

(1 ~ i ~ n)

A "trial solution" is hypothesized: Wn = L~ CjUj. We must now solve the


following system of n linear equations in the n unknown quantities Cj:

n
'2:CjB(Uj,Vi) = (Z,Vi) (1 ~ i ~ n)
j=1

In order to have at this. stage a nonsingular n x n matrix B(Uj, Vi), we would


have to make an assumption like hypothesis (b) for the two spaces Un and Vn .
For example, we could assume
b*) There is a positive {3n such that

sup IB(u,v)1 ~ {3nllull


vEVn , \lv\\=1

Problem 14 asks for a proof that this hypothesis will guarantee the nonsingularity
of the matrix described above.
Section 4.5 The Galerkin Method 203

Example 1. Consider the two-point boundary-value problem

(pu')' - qu = f u(a) = 0 u(b) = 0

This is a Sturm-Liouville problem, the subject of Theorem 1 in the next section


(page 206) as well as Section 2.5, pages 105ff. In order to apply Theorem 3,
one requires the bilinear form and linear functional appropriate to the problem.
They are revealed by a standard procedure: Multiply the differential equation
by a function v that vanishes at the endpoints a and b, and then use integration

lb lb
by parts:
[v(pu')' - vqu] = vf

vpu'l~ -lb -lb = lb


v'pu' vqu vf

lb + lb -
[pu'v' quv] = fv

B(u, v) = lb + = -l
(pu'v' quv) (f,v)
b
fv

There is much more to be said about this problem, but here we wish to emphasize
only the formal construction of the maps that enter into Theorem 3.
Example 2. The steady-state distribution of heat in a two-dimensional domain
n is governed by Poisson's Equation:

Here, u(x,y) is the temperature at the location (x,y) in 1R2, and f is the heat-
source function. If the temperature on the boundary an is held constant, then,
with suitable units for the measurement of temperature, we may take u(x, y) =
o on an. This simple case leads to the problem of discovering u such that
B(u, v) = (f,v) for all v, where

B(u, v) = -l (uxvx + UyV y )

To arrive at this form of the problem, first write the equivalent equation

for all v

The integral form of this is

l vV 2 u = In vf for all v

The integral on the left is treated by using Green's Theorem (also known as
Gauss's Theorem). (This theorem plays the role of integration by parts for
multivariate functions.) It states that

+ Qy) = (Pdy - Qdx)


Jo{(P x {
JaO
204 Chapter 4 Basic Approximate Methods in Analysis

This equation holds true under mild assumptions on P, Q, n, and an. (See
[Wid].) Exploiting the hypothesis of zero boundary values, we have

10 vV2u = 10 (uxx + Uyy)v


= 10 [(uxv)x + (uyv)y - UxVx - UyVy]

= r (uxv - uyv) - Jnr(uxv x + UyVy)


Jan
= - 10 (uxvx + UyVy) = B(u, v)

References. For the classical theory of harmonic functions consult [Kello].
For the existence theory for solutions to the Dirichlet Problem, see [Gar]; in
particular, Theorem 2 above can be found in that reference. For the Galerkin
method, consult [KK], [KA], [OdR], [Kee], and [Gre], [Gri].

Problems 4.5

1. ([Mil], page 115) Find an approximate solution of the two-point boundary-value problem

x" + tx' + x = 2t x(O) =1 x(1) =0


by using Galerkin's method with trial functions

2. Find an approximate solution of the problem

(tx')' +x = t x(O) = 0 x(1) = 1

by using Galerkin's method and the trial solution

x(t) = t + t(1 - t)(Cl + C2t)

3. Invent an efficient algorithm for generating the sequences of harmonic functions [Un], [v n ],
where zn = Un + iVn and n = 0, 1, 2, ...
4. Prove that a differentiable function f : lR -t lR such that infx f'(x) > 0 is necessarily
surjective. Show by an example that the simpler condition f'(x) > 0 is not adequate.
5. A sequence VI, V2, . .. in a Banach space U is called a basis (or more exactly a Schauder
basis) if each x E U has a unique representation as a convergent series in U of the form
x = L::'=I an(x)vn . The an depend on x in a continuous and linear manner, i.e.,
an E U·. If U has a Schauder basis, then U is separable. Prove that an(v m ) =
8nm .
Prove that no loss of generality occurs in assuming that IIvnll = 1 for all n. See problems
24-26 in Section 1.6, pages 38 and 39.
6. (Continuation) Prove that for each n, the map Pn defined by the equation Pnx =
L:;=Iak(x)vk is a (bounded, linear) projection.

7. (Continuation) Prove that sUPn IIpnll < 00.


8. (Continuation) What are the Galerkin equations for the problem Ax b when the
projections in the preceding problems are employed?
Section 4.6 The Rayleigh-Ritz Method 205

9. (Continuation) Use Galerkin's method to solve Ax = b when A is defined by AV n =


",,00 '-2 d b ",,00 2-n
L.Jj=nJ Vj an = L.Jn=l Vn .

10. Use the computer system Mathematica to find the real and imaginary parts of zn for
n = 1 to n = 10. Version 2 (or later) of Mathematica will be necessary. The input to do
this all at once is

n-l; While [(n-n+l)<ll. Print[ComplexExpand[(x+I y)An]]]

11. Use Green's Theorem to show that the operator A defined by

Au = auxx + buxy + CU yy
is Hermitian on the space of functions having continuous partial derivatives of orders 0,
1, 2 in 0 and vanishing on ao. In the definition of A, the coefficients are constants.
12. Give an elementary proof of Green's Theorem for any rectangle in ]R2 whose sides are
parallel to the coordinate axes.
13. Solve the Dirichlet problem on the unit disk in ]R2 when the boundary values are given
by the expression 8x 4 - 8y4 + 1.
14. Prove that the matrix (B( Uj, Vi)) described following Theorem 3 of this section is non-
singular if the hypothesis (b*) is fulfilled by the spaces Un and Vn .

4.6 The Rayleigh-Ritz Method

The basic strategy of this method is to recast a problem such as

(1) F(x) =0 (x E X)

into an extremal problem, i.e., a problem of finding the maximum or minimum


of some functional <1>. This extremum problem is then solved on an increasing
family of subspaces in the normed space X:

It is obvious that there are many ways to create an extremum problem


equivalent to the problem in (1). For example, we can put

(2) <I>(x) = IIF(x)11


If this choice is made with a linear problem in a Hilbert space, we are led to
a procedure very much like Galerkin's Method. Suppose that F(x) = Ax - v,
where A is a linear operator on a Hilbert space. The minimization of IIAx - vii,
where x E Un, reduces to a standard least-squares calculation. Suppose that Un
has a basis [Ul,U2, ... ,unJ. Then let x = L:;=l CjUj. The minimum of
206 Chapter 4 Basic Approximate Methods in Analysis

is obtained when the coefficient vector (Cl' C2, ... ,cn ) satisfies the "normal" equa-
tions
n
(3) 2:::: cjAuj - v ..1 A(Un)
j=l

This means that


n
(4) 2:::: Cj (Auj, AUi) = (v, AUi) (1 ~ i ~ n)
j=l

These are not the Galerkin equations (Equation (3) in Section 4.5, page 198).
The Rayleigh-Ritz method (in its classical formulation) applies to differen-
tial equations, and the functional <P is directly related to the differential equa-
tion. We illustrate with a two-point boundary-value problem, in which all the
functions are assumed to be sufficiently smooth. (They are functions of t.)

(px')' - qx = f
(5) {
x(a) =a x(b) = (3
In correspondence with this problem, a functional <P is defined by

(6)

Theorem 1. If x is a function in C 2 [a, b] that minimizes <p( x) locally


subject to the constraints x(a) = a and x(b) = (3, then x solves the
problem in (5).

Proof. Assume the hypotheses, and let y be any element of C 2 [a, b] such that
y(a) = y(b) = o. We use what is known as a variational argument. For each
real number ,x, x +,Xy is a competitor of x in the minimization of <P. Hence the
function ,x ~ <p(x + ,Xy) has a local minimum at ,x = O. We compute

d d Ib
d,X <p(x + ,Xy) = d,X a [(x' + 'xy'?p + (x + ,Xy)2q + 2(x + ,Xy)f] dt

= 21b [(x' + ,Xy')y'p + (x + ,Xy)yq + yf] dt

Evaluating this derivative at ,x = 0 and setting the result equal to 0 yields the
necessary condition

(7) lb (px'y' + qxy + fy) dt = 0

We use iL' "gration by parts on the first term, like this:

b
I px'y' = PX'yl b - lb (px')'y =- Ib (px')'y
a a a a
Section 4.6 The Rayleigh-Ritz Method 207

Here the fact that y(a) = y(b) = 0 has been exploited. Equation (7) now reads

(8) lb [-(px')' + qx + flY dt = 0

(The steps just described are the same as those in Example 1, page 203.) Since
y is an arbitrary element of C 2 [a, b] vanishing at a and b, we conclude from
Equation (8) that
-(px')' + qx + f = 0
The details of this last argument are as follows. Let z = -(px')' + qx + f.
Then J:z(t)y(t) dt = 0 for all functions y of the type described above. Suppose
that z 1= o. Then for some r, z(r) 1= O. For definiteness, let us assume that
z(r) == £ > o. Then there is a closed interval J c (a, b) in which z(t) ~ £/2.
There is an open interval I containing J in which z(t) > o. Let y be a C 2

J:
function that is constantly equal to 1 on J and constantly equal to 0 on the
complement of I. Then z(t)y(t) dt > o. •
Theorem 2. Assume that p(t) > 0 and q(t) ~ 0 on [a,b]. Ifx is a
function in C 2 [a, b] that solves the boundary-value problem (5) then x
is the unique local minimizer of <1> subject to the boundary conditions
as constraints.

Proof. Let z E C 2 [a,b], z 1= x, z(a) = 0:, and z(b) = (3. Then the function
y = z - x satisfies O-boundary conditions but is not O. By calculations like those
in the preceding proof,

Using integration by parts on the middle term, we find that it is zero. Then
Equation (9) shows that <1>(z) > <1>(x). •

Returning now to the two-point boundary-value problem in Equations (5),


we make the simplifying assumption that the boundary conditions are x(O) =
x( 1) = o. (In Problems 4.1.3 - 4, page 176, it was noted that simple changes
of variable can be employed to arrange this state of affairs.) The subspaces
Un should now be composed of functions that satisfy x(O) = x(l) = o. For
example, we could use linear combinations of terms ti(1 - t)j where i,j ~ 1. If
x = 2:;=1 CjUj is substituted in the functional <1> of Equation (6), the result is
a real-valued function of (C1, C2, ... , Cn ) to be minimized. The minimum occurs
when
(i=I,2, ... ,n)

When the calculations indic~ted here are carried out, the system of equations
to be solved emerges as
n
LaijCj = bi (1 ~ i ~ n)
j=1
208 Chapter 4 Basic Approximate Methods in Analysis

11 [p(t)u~(t)uj(t) +
where
aij = q(t)Ui(t)Uj(t)] dt

bi = -1 1
J(t)Ui(t) dt

This completes the description of the Rayleigh-Ritz method for this problem.
In order to prove theorems about the convergence of the method, some
preliminaries must be dealt with. The following lemma is formulated for an
arbitrary topological space. Refer to Chapter 7, Section 6, pages 361ff, for basic
topology.

Lemma. Let II! be an upper semicontinuous nonlinear functional


defined on a topological space X. Let Xl C X 2 C ... be subsets of X
such that U~=l Xn is dense in X. Then as n too, we have

(10) inf II! (x) -J.. inf II! (x )


xEXn xEX

Proof. Let p = infxEx lI!(x). (We permit p = -00.) For any r > p, the set

0= { x EX: lI!(x) < r }

is nonempty. It is open, because II! is upper semicontinuous. Since U Xn is


dense, it intersects O. Select m such that Xm no contains a point, say~. Then
for n ~ m
p!( inf II! (x )!( inf II! (x ) !( II!(O < r
xEX n xEX m •
In applying the Rayleigh-Ritz method to the two-point boundary-value
problem

(11) (px')' - qx = J
(12) x(O) = x(l) = 0

we take X to be the space {x E C2[0, 1] : x(O) = x(l) = O}, normed by defining


Ilxll Ilxll oo Ilx'lloo·
= + Next, we assume that the finite-dimensional subspaces
Un in X are nested (Un C U n +1 ) and that U~=l Un is dense in X. Thus, the
elements of Un satisfy (12). Notice that X has codimension 2 in C2[0, 1]. In fact,
X EBIl1 = C 2 [0, 1], where III denotes the subspace of all first-degree polynomials.
The functional <l> is

<l>(x) = 11 [p(x'? + qx 2 + 2Jx] dt

The functional <l> attains its infimum on each finite-dimensional subspace of


X. A proof of this is outlined in Problems 6, 7, and 8.
Section 4.6 The Rayleigh-Ritz Method 209

Theorem 3. Let x denote the solution of the boundary-value prob-


lem (11,12), and let Xn be a point in Un that minimizes <I> on Un.
Assume q ~ 0 and p > 0 on [0,1]. Then xn(t) -+ x(t) uniformly.

Proof. Since we have assumed that U~=1 Un is dense in X, the preceding


lemma and Theorem 2 imply that <I>(x n) .l. <I>(x). Notice that our choice of norm
on X guarantees the continuity of <I>. In the following, we use the standard

11
inner-product notation
(u, v) = u(t)v(t) dt

and I 112is the accompanying quadratic norm. From Equation (9) in the proof
of Theorem 2, we have

<I>(x n ) - <I>(x) = 11 [p. (x~ - X')2 + q. (x n - X)2] dt

~ ior p. (x~ _ X')2 dt ~ O.;;t';;l


1
min p(t)llx~ - x'll~

This shows that Ilx~ - x'I12


-+ O. By obvious manipulations, including the
Cauchy-Schwarz inequality, we have now

Ixn(s) - x(s)1 = 11 [x~(t)


8
- x'(t)] dtl ~ 1Ix~(t) - x'(t)1 dt
8

~ 11 Ix~(t) - x'(t)1 dt = (Ix~ - x'l, 1) ~ Ilx~ - x'I12 .111112


This shows that

As an illustration of the preceding theorem, consider the subspaces

Un = {wv : v E lIn } w(t) = t(l - t)

The union of these subspaces is the space of all functions having the form
n
t t---+ t(l - t) L aktk
k=O

Is this subspace dense in the space

X = { x E C 2[0, 1] : x(O) = x(l) = 0 } ?

The norm on X is defined to be Ilxll = IlxlLx> + Ilx't). To prove density,


let x be any element of X, and let € > o. By the Weierstrass Approximation
Theorem, there is a polynomial p such that lip - x'iloo < 00. Define u(s) =
210 Chapter 4 Basic Approximate Methods in Analysis

f; p(t) dt, so that u' = p, u(O) = 0, and lIu' - x' 1100 < c. Then lu(s) - x(s)1 =
I f;[u'(t) - x'(t)] dtl < c. Since x(l) = 0, lu(1)1 < c. Put v(t) = tu(1). Then
Iv'(t)1 = lu(I)1 < c and Iv(t)1 ~ lu(I)1 < c. Notice that u - v is a polynomial
that takes the value 0 at 0 and 1. Hence u - v contains w(t) = t(1 - t) as a
factor, and belongs to one of the spaces Un. Also,

Ilu - v - xll oo < Ilu - xll oo + Ilvlloo < 2c


Ilu' - Vi - x/lloo < Ilu' - x/ll oo + Ilv'lloo < 2c
Thus X can be approximated with arbitrary precision by elements of U~l Un.
To summarize, we state the following theorem.

Theorem 4. In the two-point boundary-value problem described in


Equations (11,12), assume that pi, q, and f are continuous. Assume
further that p > 0 and q ~ o. If, for each n, Xn is the polynomial of
degree n that minimizes II> subject to the constraint xn(O) = x n(l) = 0,
then [x n ] converges uniformly to a solution of Equations (11, 12).

Another illustration of the Rayleigh-Ritz method is provided by a boundary-


value problem involving Poisson's equation in two variables:

(13)

In correspondence with this problem, we set up the functional

lI>(u) = 10 (u~ + u~ + 2uJ) dxdy


We shall now show that any function u that minimizes 11>( u) under the constraint
that u = g on &0. must solve the boundary-value problem (13).
Proceeding as before, take a function w that vanishes on &0., and consider
lI>(u + AW). Since u is a minimum point of 11>,

This leads, by straightforward calculation, to the equation

(14) 10 (uxwx + UyWy + fw) dxdy = 0


In order to proceed, we require Green's Theorem, which asserts (under reason-
able hypotheses concerning 0.) that

r(Px + Qy) dxdy = iaor (Pdy - Qdx)


io
Section 4.6 The Rayleigh-Ritz Method 211

Using this, we have

10 (UxW x + UyW y ) = 10 [(WUx)x + (WUy)y - w\7 2U]

= { (-WU y dx + WU x dy) - { w\7 2 u


Jan Jn
Since W vanishes on an, we can now write (14) in the form

J10 (-w\7 2 u + fw)dxdy =0

Since W is almost arbitrary, this equation implies that

There are many problems in applied mathematics that arise naturally as


minimization problems. This occurs, for example when a configuration involving
minimum energy is sought in a mechanical system. One inclusive setting for such
problems is described here, and an elegant, easy, theorem is proved concerning
it.
Let X be a Banach space, and suppose that a continuous, symmetric bilinear
functional B is given. Let ¢ be a continuous linear functional on X, and let a
closed convex set K be prescribed. We seek the minimum of B(x,x) + ¢(x) on
K.

Theorem 5. In addition to the hypotheses in the preceding para-


graph, assume that B is "elliptic," in the sense that B(x, x) ~ ,8llxl12
for ,8 of
some > O. Then the minimum B(x, x) + ¢(x) on K is attained
at a unique point of K.

Proof. The bilinear form B defines an inner product on X. The norm arising
from the inner product is written IlxilB
= y'B(x, x). Since B is continuous,
there is a positive constant a such that IB(x,y)1 ~ Consequently,allxllllYII.
IlxilB allxll·
~ On the other hand, from the condition of ellipticity, we have
IlxilB v11llxll·
~ Thus the two norms on X
are equivalent. Hence, (X,II·IIB)
is complete and therefore a Hilbert space. Also, K is a closed convex set in
this Hilbert space. By the Riesz Representation Theorem, ¢(x) = -2B(v,x) for
some v in X. Write

B(x, x) + ¢(x) = B(x - V,x - v) - B(v, v)

.
This shows that our'minimization problem is the standard one of finding a point
of K closest to v, in the Hilbert space setting. Theorem 2 in Section 2.1 (page
64) applies and establishes the existence of a unique point x in K solving the
~~.
Historical note. A biographical article about George Green is [Cannl], and a
book by the same author is [Cann2]. When you are next in England, you would
212 Chapter 4 Basic Approximate Methods in Analysis

enjoy visiting Nottingham and seeing the well-restored mill, of which George
Green was the proprietor. His collected works have been published in [Green].

Problems 4.6

1. Let u be an element of C[O, 1] such that

11 u(t)x(t) dt =0

whenever x is an element of C[O, 1] satisfying the equation 2::'=1 [x(1/n)]2 = O. Prove


or disprove that u is necessarily O.
2. Let t1, t2, ... , tm be specified points in [0,1], and let x be an element of C[O, 1] that
vanishes at the points t1, ... , t m . Can we approximate x with arbitrary precision by a
polynomial that also vanishes at these points?
3. Solve the two-point boundary-value problem

x(O) =6 x(l) =~
Suggestion: Multiply by x' and integrate, or try some likely functions containing param-
eters.
4. Let x be an element of C 2 [a, b] that minimizes the functional

<I>(x) = lb [p(x')2 + qx 2 + 2Jx] dt

subject to the constraints x(a) = (t and x'(b) = O. Find the two-point boundary-value
problem that x solves.
5. Use au elementary change of variable in the problem

{
(px')' - qx =J
x(a) =a x(b) = f3
to find an equivalent problem having homogeneous boundary conditions. Thus, yea) =
y(b) = 0 in the new variable. Is the new problem also of Sturm-Liouville form?

6. Define X = {x E C 2 [a,b] : x(a) = x(b) = O}. Prove that if x E X then Ilx'll oo ~


IIxlloo(b-a)-l. Try to improve the bound.
7. (Continuation) In the Sturm-Liouville problem described by Equations (11) and (12),
assume that pet) ~ .5 > 0 and that q ~ O. Prove that for x E X,

8. (Continuation) Use the two preceding problems to show that on any finite-dimensional
subspace in X, the infimum of <I> is attained.
9. Solve the two-point boundary-value problem

x(O) =0 x(l) =1
This is deceptively similar to Problem 3, but harder. Look for a solution of the form
x(t) = 2:::0 antn. You should find a "general" solution of the differential equation
containing two arbitrary constants ao and a1. All remaining coefficients can then be
Section 4.7 Collocation Methods 213

obtained by a recurrence relation. After imposing the condition x(O) = 0, your solution
will contain only the powers t, t 4 , t1, t lO , ..• The parameter al will be available to secure
the remaining boundary condition, x(l) = 1. Reference: [Dav].
10. Let [a, b] be a compact interval in JR, and define

Ilxlloc = sup Ix(t)1 IIxl12 ={


b
lIX(t),2 dt
}1/2
a~t~b

Prove that if Ilx~112 ---+ 0 and if infa~t~b Ixn(t)1 ---+ 0, then IlxnlLXl ---+ O. Is the result
true when the interval is replaced by [a, oo)? Assume x~ continuous.
11. In C 1 [a, b] consider the two norms in the preceding problem. Prove that

Ilxlloc ~ Ix(a)1 + kllx'I12 where k = (b - a)1/2

12. Consider the two-point boundary-value problem

x" = f(t, x) x(O) = Q x(l) = (3


and the functional
<l>(x) = 11 [(x')2 + 2g(t, x)] dt

Assume that f = og/ox. Prove that any C2-function that minimizes <l>(x) subject to the
constraints x(O) = Q and x(l) = (3 is a solution of the boundary-value problem. Show by
example that the converse is not necessarily true.
13. Prove that there exists a polynomial of degree 5 such that p(O) = p'(O) = p"(O) = p'(I) =
p"(I) = 0 and p(l) = 1.
14. (Continuation) Let a < b. Using the polynomial in the preceding problem, show that
there exists a polynomial q of degree 5 such that q( a) = q' (a) = q" ~a) = q' (b) = q" ( b) = 0
and q(b) = 1. With the help of q construct a nondecreasing C -function f such that
f(t) = 0 on (-00, a) and f(t) = 1 on (b,oo).
15. Solve the integral equation

x(t) = sin t + 1" (t 2 + s2)x(s) ds

16. This two-point boundary value problem is easily solved:

x"(t) = cost x(O) = X(7r) = 0


Use the Ritz method to solve it, employing trial functions of the form x(t) = an sin nt. L
Orthogonality relations among the trigonometric functions will be useful in minimizing
the Ritz functional.

4.7 Collocation Methods

The collocation method for solving a linear operator equation

(1) Ax= b
214 Chapter 4 Basic Approximate Methods in Analysis

is one of the projection methods, as described in Section 4.4. It begins by


selecting base functions U1, ... ,Un and linear functionals ¢1, ... , ¢n. We try to
satisfy Equation (1) with an x ofform x = 'L7=1 CjUj. This leads to the equation

n
(2) LcjAuj=b
j=l

In general, this system is inconsistent, because b is usually not in the linear


space generated by AU1, AU2, . .. ,Aun . We apply the functionals ¢i to Equa-
tion (2), however, and arrive at a set of n linear equations for the n unknowns
n
LCj¢i(Auj) = ¢i(b)
j=l
Of course, care must be taken to ensure that the n x n matrix whose elements
are ¢i (Auj) is nonsingular. In the classical collocation method, the problem (1)
involves a function space; i.e., the unknown x is a function. Then the functionals
are chosen to be point-evaluation functionals:

¢i(X) = x(t i )

Here, the points ti have been specified in the domain of x.


Let us see how a two-point boundary-value problem can be solved approxi-
mately by the method of collocation. We take a linear problem with zero bound-
ary conditions:

x" + px' + qx = f
(3) {
x(O) = x(l) = 0

As usual, it makes matters easier to select base functions that satisfy the ho-
mogeneous part of the problem. Suppose that we letuj(t) = (1 - t)t j for
j = 1,2, ... ,n. As the functionals, we use ¢i(X) = x(t i ), where the points
t 1, ... , tn can be chosen in the interval [0, 1]. For example, we can let ti = (i-l )h,
where h = 1/(n - 1). The operator A is defined by

Ax = x" + px' + qx
and by computing we find that

(AUj)(t) = j(j - l)t j - 2 - j(j + 1)tj - 1 + p(t)[jt j - 1 - (j + l)t j ] + q(t)[t j - tj+1]

The matrix whose elements are (AUj)(ti) is easily written down, but it is not
instructive to do so. It will probably be an ill-conditioned matrix, because the
base functions we have chosen are not suited to numerical work. Better choices
for the base functions Ui would be the Chebyshev polynomials (suitable to the
interval in question) or a set of B-splines.
More examples of collocation techniques will be given later in this sec-
tion, but first we shall discuss the important technique of turning a two-point
Section 4.7 Collocation Methods 215

boundary-value problem into an equivalent integral equation. This continues a


theme introduced in Section 2.5. The integral equation can then be shown to
have a solution by applying a fixed-point theorem, and in this way we obtain an
existence theorem for the original two-point boundary-value problem.
We consider a two-point problem of the form

x"=j(t,X) O~t~l
(4) {
x(O)=O x(l)=O

There is no loss of generality in assuming that the interval of interest is [0,1]


and that the boundary conditions are homogeneous, because if these hypotheses
are not fulfilled at the beginning, they can be brought about by suitable changes
in the variables. (In this connection, refer to Problems 3 and 4 in Section 4.1,
page 176.) Observe that Equation (1) is, in general, nonlinear. We assume that
j is continuous on [0,1] x R
The Green's function for the boundary value problem (4) is defined to
be the function

t(l-S) O~t~s~l
(5) G(t, s) = {
s(l - t) O~s~t~l

Notice that G is defined on the unit square in the st-plane, and vanishes on the
boundary of the square. Although G is continuous, its partial derivatives have
jump discontinuities along the line s = t. Using the Green's function as kernel,
we define an integral equation

(6) x(t) = -10 G(t, s)J(s,x(s)) ds


1

Theorem 1. Each solution of the boundary-value problem (4) solves


the integral equation (6) and conversely.

Proof. Let x be any function in C[O, 1], and define y by writing

yet) = -11 G(t,s)J(s,x(s))ds

= -lot G(t,s)J(s,x(s))ds+ jt G(t,s)j(s,x(s))ds


We intend to differentiate in this equation, and the reader should recall the
general rule:
d
dt
it a h(s)ds = h(t)

Then the chain rule gives us

d
-
dt a
l k (t)
h(s)ds = h(k(t))k'(t)
216 Char ter 4 Basic Approximate Methods in Analysis

Now, for y'(t) we have

y'(t) = -G(t,t)f(t,x(t)) -1t Gt(t,s)f(s,x(s)) ds

+ G(t,t)f(t,x(t)) + /t Gt(t,s)f(s,x(s))ds

= 1t sf(s,x(s)) ds + /t(l_ s)f(s,x(s)) ds


A second differentiation yields
(7) y"(t) = tf(t,x(t)) + (1- t)f(t,x(t)) = f(t,x(t))
If x is a solution of the integral equation, then y = x, and our calculation (7)
shows that x" = y" = f(t,x). Since G(t, s) = 0 on the boundary of the square,
y(O) = y(l) = o. Hence x(O) = x(l) = O. This proves half of the theorem.
Now suppose that x is a solution of the boundary-value problem. The above
calculation (7) shows that
y"(t) = f(t,x(t)) = x"(t)
It follows that the two functions x and y can differ only by a linear function of
t (because x" - y" = 0). Since x(t) and y(t) take the same values at t = 0 and
t = 1, we conclude that x = y. Thus x solves the integral equation. •

Theorem 2. Let f(s, t) be continuous in the domain defined by the


inequalities 0 ~ s ~ 1, -00 < t < 00. Assume also that f satisfies a
Lipschitz condition in this domain:
(8) If(s, td - J(s, t2)1 ~ klt1 - t21 (k < 8)
Then the integral equation (6) has a unique solution in C[O, 1].

Proof. Consider the nonlinear mapping F : C[O, 1] --+ C[O, 1] defined by

(Fx)(t) = -1 1
G(t, s)J(s,x(s)) ds x E C[O, 1]
We shall prove that F is a contraction. We have

i(Fu)(t) - (Fv)(t)i ~11 G(t, s)iJ(s,u(s)) - f(s, v(s))i ds

~ 11 k G(t,s)iu(s) - v(s)i ds

~ kllu - vll oo 11 G(t, s) ds

= (k/8)llu - vll oo
It follows that
IIFu - Fvll
oo ~ (k/8)llu - vll
oo
and that F is a contraction. Now apply Banach's Theorem, page 177, taking
note of the fact that C[O, 1], with the supremum norm, is complete. •
Section 4.7 Collocation Methods 217

Corollary. If the function f satisfies the hypotheses of Theorem 2,


then the boundary-value problem (1) has a unique solution in e[O, 1].

Example 1. Consider the two-point boundary-value problem

xl/(t) = ~ exp { ~(t + 1) cos [x(t) + 7 - 3t]} -1 ~ t ~ 1


(9) {
x(-I) = -10 x(l) =-4

Our existence theorem does not apply to this directly, and some changes of
variables are called for. We set

z(t) = x(t) - 3t + 7

and find that z should solve this problem:

zl/(t) = ~expn(t+ l)cosz(t)}


(10) {
z(-I) = z(+I) = 0

Next we set
t = -1 + 2s y(s) = z(t)
and find that y should solve this problem:

yl/(s) = 2exp{ scosy(s)}


(11) {
y(O) = y(l) = 0

To this problem we can apply the preceding corollary. The function f(s,r) =
2escosr satisfies a Lipschitz condition, as we see by applying the mean value
theorem:
If(s, rl) - f(s, r2) = I~~ (s,r3)1 h - r21
The derivative here is bounded as follows

12eSCosr( -ssin r)1 ~ 2e ~ 5.436

Since the Lipschitz constant 2e is less than 8, the boundary-value problem (11)
has a solution y. Hence (9) has a solution x, and it is given by

x(t) = y((t + 1)/2) + 3t - 7



For the practical solution of boundary-value problems, one usually relies
on numerical methods (some of which have already been discussed) such as
discretization, Galerkin's method, and collocation. For the problem considered
above, namely

xl/=f(t,X) O~t~1
(12) {
x(O) = x(1) = 0
218 Chapter 4 Basic Approximate Methods in Analysis

there is now an additional method of proceeding. One can set up an equivalent


integral equation,

(13) x(t) = 11 -G(t,s)f(s,x(s))ds

and solve it instead. If we discretize both problems (12) and (13) in a certain
uniform way, the two new problems will be equivalent, a result to which we now
turn our attention.
The standard discretization of the boundary value problem (12) is done by
introducing a formula for numerical differentiation, as in Section 4.1. For the
integral equation, we require a formula for numerical integration, and choose for
this purpose a simple Riemann sum. Thus the discretized problems are

{
Yi-1 - 2Yi + Yi+1 = h 2 f(t i, Yi)
(14)
Yo = Yn+1 = 0
n
(15) Yi = -h LG(ti,tj)f(tj,Yj) O~i~n+l
j=1
In both of these we have set h = 1/(n + 1) and ti = ih. Of course, Y E IRn+2.
Notice that we have used the fact that G vanishes on the boundary of the square.

Theorem 3. Problems (14) and (15) are equivalent.

Proof. We proceed as in the proof of Theorem 1, which concerns the "undis-


cretized" problems. The role of the second derivative is now played by a set of
linear functionals Li defined on IR n +2 by the equation
(16)
Here Z is an arbitrary element of IR n+2 written in the form Z = (zo, Z1, ... , Zn+ 1).
Now let (Yo, . .. , Yn+1) be arbitrary, and let Z be defined by
n
(17) Zi = -h LG(ti,tj)f(tj,Yj) O~i~n+l
j=1
We assert that LiZ = f(ti, Yi) for i = 1,2, ... , n. In order to prove this, apply
Li to z, using the linearity of L i . The result is
n
(18) LiZ = -h Lf(tj,Yj)LiG(.,tj )
j=1
Now G(t,s) is a linear function of t in each of the two intervals 0 ~ t ~ sand
s ~ t ~ 1. Thus LiG(·, tj) = 0 unless i = j. In the case i = j we have
LiG(·, ti) = h- 2 [G(ti-1' ti) - 2G(ti' ti) + G(ti+1' t i )]
= h- 2 [ti-1(1- t i ) - 2ti(l- ti) + ti(l- ti+d]
= h- 2 [(1- ti)(ti-1 - t i ) + t i (1- ti+1 - 1 + ti)]
= h- 2 [-h(l- t i ) - hti] = _h- 1
Section 4.7 Collocation Methods 219

Thus from Equation (18) we have, as asserted,

Liz = -h!(ti,Yi)(-h- 1 ) = !(ti,Yi)

Now suppose that (Yo, . .. ,Yn+l) solves the equations in (15). Then Zi = Yi
for 0 ~ i ~ n + 1. Consequently, Liy = LiZ = !(ti , y;). Since Zo = Zn+l = 0,
from (17), we have also Yo = Yn+l = O. Thus Y solves the equations in (14).
Conversely, if Y solves the equations in (14), then Liy = !(t i , Yi) = LiZ.
Since the second divided differences of Y and Z are equal, these two vectors can
differ only by an arithmetic progression. But Yn+l = Zn+l and Yo = zo, so the
vectors are in fact identical. Thus Y satisfies the equations in (15). •
Now reconsider the integral equation (13), which is equivalent to the
boundary-value problem (12). One advantage of the integral equation is that
many different numerical quadrature formulas can be applied to it. The most ac-
curate of these formulas do not employ equally spaced nodes. The idea of using
unequally spaced points in the discretized problem of (14) would not normally
be entertained, as that would only complicate matters without producing any
obvious advantage in precision. The quadrature formulas of maximal accuracy
are well known, however, and are certainly to be recommended in the numerical
solution of integral equations, in spite of their involving unequally spaced nodes.
A quadrature formula of the type needed here will have the form

(19)

in which 9 E C[O, 1], 0 ~ Sj ~ 1, and the Aj are coefficients, often called


weights. The Riemann sums employed in Equation (15) are obviously obtained
by a formula of the type in (19). However, there are other formulas that are
markedly superior. The result of using formula (19) to discretize the integral
equation (13) is

Notice that this equation can be used used in a practical way in functional
iteration. We can start with any Yo in e[O, 1] and define inductively
n
Ym+l(t) = - 'L,AjG(t,sj)!(Sj,Ym(Sj))
j=l

The right-hand side of this equation is certainly computable. It is a linear


combination of sections GSj. One can also use collocation at the points Si to
proceed. This will lead to
n
(20) Yi = - 'L,AjG(si,Sj)!(Sj,Yj) (1 ~ i ~ n)
j=l

In this equation, Yi is an approximation to x( Si)' In general, Equation (20) will


represent a system of n nonlinear equations in the n unknowns (Yl, ... , Yn). The
220 Chapter 4 Basic Approximate Methods in Analysis

solution of such a system may be a difficult matter, but for the moment we shall
suppose that a solution Y = (Y1,"" Yn) has been obtained. Let us use x to
denote the solution function for the integral equation (13). It is to be hoped
that !Yi - X(Si)! will be small. Here the nodes of the quadrature formula are
Sl, ... , Sn. Two functions that enter the theorem are

(21) u(t) = 11
o
G(t,s)J(s,x(s))ds- tAjG(t'Sj)J(Sj,x(Sj))
j=l

(22) v(t) = 11
o
G(t, s) ds - t AjG(t, Sj)
j=l
If the quadrature formula is a good one, these functions will be small in norm.
We continue to assume the Lipschitz inequality (8) on f.

Theorem 4. If k(l + 811vllo) < 8, and if the weights Aj in Equation


(19) are all positive, then for i = 1,2, ... ,n,

where A = [1 - kG + IlvIloo)]-l
Proof. Let Ei = !X(Si) - Yi! and E = maXEi' Then for each i we have

Ei = III G(Si,s)J(s,x(s))ds- tAjG(si,Sj)J(Sj'Yj)1


o j=l
n n
= IU(Si) + LAjG(Si,Sj)!(Sj,X(Sj») - LAjG(Si,Sj)!(Sj,Yj)1
j=l j=l
n
: :; Ilull oo + LAjG(Si,Sj)IJ(Sj,x(Sj)) - J(Sj,Yj)1
j=l

~ Ilull oo + kEG + I/ v l/oo)


It follows that E :::;; I/ul/ oo + kE (k + /Iv 1/00) . When this inequality is solved for E,
the result is the one stated in the theorem. •

Theorem 5. The "discretized" integral equation (20) can be solved


by the method of iteration if +kG Ilviloo)
< 1 and Aj > O. Note that
k, v, and Aj are as in Equations (8), (22), and (19).

Proof. Interpret Equation (20) as posing a fixed-point problem for a mapping


U : ]Rn -+ ]Rn whose definition is
n
(UY)i = - 'L,AjG(Si,Sj)J(sj,Yj)
j=l
Section 4.7 Collocation Methods 221

A short calculation will show that this is a contraction:


n
!(Uy - UZ)i! ~ LAjG(Si, Sj)!f(sj,Yj) - f(sj, Zj)!
j=1
n
~ k max IYj - Zj IL A"G(Si, s,,)
J ,,=1

As in the preceding proof, the sum in this last inequality has the upper bound
~+ Ilvlloo'
Hence we have


The Lipschitz condition in (8) is usually established by estimating the
partial derivative 12 == 8f(t, s)/8s and using the mean value theorem. If
1121 ~ k < 8 on the domain where 0 ~ t ~ 1 and -00 < S < 00, then we
can also use Newton's method to solve the discretized integral equation in (20).
The equations that govern the procedure can be derived in the following way.
Suppose that an approximate solution (Yl, Y2, . .. ,Yn) for system (20) is avail-
able. We seek to calculate corrections hi so that the vector (Yl + hI, ... ,Yn + h n )
will be an exact solution of (20). Thus we desire that
n
(23) Yi + hi = - L AjG(Si, Sj)f(Sj, Yj + hj)
j=1

Of course, we take just the linear terms in the Taylor expansion of the nonlinear
expression f(sj,Yj + hj) and use the resulting linear equations to solve for the
hi' These linear equations are
n
(24) Yi+hi =- LAjG(si,Sj)[f(Sj,Yj) + hjh(sj,Yj)]
j=1

(Having made this approximation, we can no longer expect the corrections to


produce the exact solution; hence the need for iteration.) When Equation (24)
is rearranged we have
n
(25) hi + LEijhj = di (1 ~ i ~ n)
j=1

in which

and
n
di = -Yi - LAjG(si,sj)f(sj,Yj)
j=1
222 Chapter 4 Basic Approximate Methods in Analysis

Equation (25) has the form (I + E)h = d. We can see that 1+ E is invertible
(nonsingular) by verifying that E 00 < 1: I I
n n
IIElloo = max L IEijl = max
• j=1
L AjG(si, sj)lh(sj, Yj)1
• j=1

If our numerical integration formula is sufficiently accurate, then will be Ilviloo


small enough to yield IIElloo
< 1, since k < 8.
This section is concluded with a few remarks about the quadrature formulas
mentioned above. Such a formula is of the type

(26) lb
a
x(t)w(t) dt ;::::: :ti=1 Aix(ti) x E C[a,b]

The function w is assumed to be positive on [a, b] and remains fixed in the


discussion. The points ti are called "nodes." They are fixed in [a, b]. The coeffi-
cients Ai are termed "weights." The formula is expected to be used on arbitrary
functions x in C[a, b].

Theorem 6. If the nodes and the function ware prescribed, then


there exists a formula of the type displayed in Equation (26) that is
exact for all polynomials of degree at most n - 1.

Proof. Recall the Lagrange interpolation operator described in Example 1 of


Section 4.4, page 193. Its formula is
n
(27) Lx = L X(ti)€i
i=1
Since L is a projection of C[a, b] onto ITn-1' we have Lx = x for all x E ITn-l>
and consequently, for such an x,

(28) 1 a
b
x(t)w(t) dt = 1Li=1
a
b n
X(ti)€i(t)W(t) dt =
n
L x(ti )
i=1
1 a
b
€i(t)W(t) dt

Example 2. If (t1,t2, t3) = (-1,0,+1) and [a,b] = [-1,1]' what is the quadra-
ture formula produced by the preceding method when w(t) = I? We follow the
prescription and begin with the functions €i:

€1(t) = (t - t2)(t - t3)(t1 - t2)-1(t1 - t3)-1 = t(t - 1)/2


€2(t) = (t - td(t - t3)(t2 - td- 1(t2 - t3)-1 = 1 - t2
€3(t) = t(t + 1)/2 (by symmetry)
Section 4.7 Collocation Methods 223

The integrals J~l Ci(t) dt are ~, ~, and ~, and the quadrature formula is therefore

(29) ill x(t) dt ~ ~x( -1) + ~x(O) + ~x(l)


The formula gives correct values for each x E IT2' by the analysis in the proof
of Theorem 6. As a matter of fact, the formula is correct on IT3 because of
symmetries in Equation (29). This formula is Simpson's Rule. •

Theorem 7. Gaussian Quadrature. For appropriate nodes and


weights, the quadrature formula (26) is exact on IT2n-l.

Proof. Define an inner product on C[a, b] by putting

(30) (x,y) = lb x(t)y(t)w(t)dt

Let p be the unique monic polynomial in ITn that is orthogonal to ITn-l' or-
thogonality being defined by the inner product (30). Let the nodes t tn be
the zeros of p. These are known to be simple zeros and lie in (a, b), although we
l,... ,
do not stop to prove this. (See [Ch], page 111.) By Theorem 6, there is a set
of weights Ai for which the quadrature formula (26) is exact on ITn-l. We now
show that it is exact on IT 2n - l . Let x E IT2n-l. By the division algorithm, we
can write x = qp + r, where q (the quotient) and r (the remainder) belong to
ITn-l. Now write

lb = lb + lbxw qpw rw

Since p ~ ITn-l and q E ITn-l the integral J qpw is zero. Since p(t i ) = 0, we
have X(ti) = r(ti). Finally, since r E TIn-I> the quadrature formula (26) is exact
for r. Putting these facts together yields

lba
x(t)w(t)dt = lb
a
r(t)w(t)dt = tAir(ti) = tAix(t i )
i=l i=l

Formulas that conform to Theorem 7 are known as Gaussian quadrature


formulas.

Theorem 8. The weights in a Gaussian quadrature formula are


positive.

Proof. Suppose that Formula (26) is exact on II 2n - l . Then it will integrate


C; exactly:
0< 1 a
b
C;(t)w(t) dt =
n
L AiC;(ti) = Aj
i=l

224 Chapter 4 Basic Approximate Methods in Analysis

Problems 4.7

1. Refer to the proof of Theorem 3 and show that if Z is a vector in IRn+2 for which LiZ = 0
(1 ~ i ~ n), then Z is an arithmetic progression.
2. Prove that if, in Equation (28), a = -b, w is even, and the nodes are symmetrically
placed about the origin, then the formula will give correct results on when n is odd.TIn
3. Prove that if the formula (26) is exact on TI2n-I' then the nodes must be the zeros of
a polynomial orthogonal to TIn-I.

4. Let (x, y) = fl x(t)y(t) dt. Verify that the polynomial p(t) = t 3- ~t is orthogonal to
TI2. Find the Gaussian quadrature formula for this case, Le., n = 3, w(t) == 1, a = -1,
b = +1.
5. Define

Verify that this improper integral converges whenever x and y are continuous functions on
the interval [-1,1]. Accepting the fact that the Chebyshev polynomial T3(t) = 4t 3 -3t is
orthogonal to TI2' find the Gaussian quadrature formula in this case. Hint: T3(COSO) =
cos 30. Use the change of variable t = cos 0 to facilitate the work.
6. Consider this 2-point boundary value problem:

x(O) =0 x(l) =1
By using Theorem 2, show that the problem has a unique solution in the space in C[O, 1].
7. Prove that the general second-order linear differential equation

ux" + vx' + wx =f
can be put into Sturm-Liouville form, assuming that u > 0, by applying an integrating
factor exp J(v - u')/u.

8. Prove that J: IG(t,s)lds =~.


9. Find the Green's function for the problem

{
X' = f(t,x)
x(O) =0
Prove that it is correct.
10. Prove that if x E C[O, 1] and if x satisfies the integral relation (6), in which f is continuous,
then x E C2[0, 1].
11. Prove that this two-point boundary value problem has no solution:

x" +x =0 x(O) =3 x(7r) = 7

12. Convert the two-point boundary value problem in Problem 11 to an equivalent homoge-
neous problem on the interval [0,1], and explain why Theorem 2 and its corollary do not
apply.
13. An integral equation of the form

x(t) + lb K(t,s)f(s,x(s))ds = v(t)


Section 4.8 Descent Methods 225

is called a Hammerstein equation. Show that it can be written in the form x+AFx = v,
where A and F are respectively a linear and a nonlinear operator defined by

(Ax)(t) = lb K(t, s)x(s) ds (F(x))(t) = f{t, x(t))

14. (Continuation) Show that the boundary-value problem

x"(t) + g{x(t)) = v(t) x(O) = x(l) = 0


is equivalent to a Hammerstein integral equation.
15. Consider the initial-value problem

x" + ux' + vx = f x(O) = ex, x'(O) = /3 (t )! 0)

Show that this is equivalent to the Volterra integral equation

x(t) + it K(t, s)x(s) ds = f(t) - /3u(t) - exv(t) - /3tv(t)

in which K(t,s) = u(t) + v(t)(t - s).


16. For what two-point boundary-value problem is this the Green's Function?

g(s,t) = min{s,t} - ~st o ~ s,t ~ 1

17. Prove that if Uo = 0 and L;u = 0 for i = 1,2, ... , then U; = ia for a suitable constant a.
(Refer to the proof of Theorem 3, page 218, for definitions.)
18. Write down the fixed-point problem that is equivalent to the boundary-value problem in
Equation (11), page 217. Take one step in the iteration, starting with Yo(t) = O. Check
your answer against ours: Yl(t) = 2[e t + (1 - e)t - 1J.
19. Consider a numerical integration formula

j b
x(t)w(t) dt ~ 2: A;x(t;)
n

a i=l

Assume that w is positive and continuous on [a, bJ. Assume also that t; are n distinct
points in [a, bJ. Prove that the formula gives correct results for "most" functions in eta, bJ.
Interpret the word "most" in terms of dimension of certain subspaces.
20. Prove that the following two-point boundary-value problem has a continuous solution
and that the solution satisfies x(t) = x(l - t):

x"(t) = sin(x - t) sin(x + t -1) x(O) = x(1) = 0 (0 < t < 1)

4.8 Descent Methods

Here, as in the Rayleigh-Ritz method, we assume that a problem confronting us


has been somehow recast as a minimization problem. Assume therefore that a
226 Chapter 4 Basic Approximate Methods in Analysis

functional <I> : K --+ IR is given, where K (the domain of <I» is a subset of some
Banach space X. Usually <I> is nonlinear. Let

(1) p = inf <I> (x )


xEK

We admit the possibility that p = -00. The objective is to find a point Xo in K


that yields
<I>(Xo) = p
It is obvious that many problems of best approximation are of this nature;
in such problems <I>(x) would be the distance between x and some element z that
was to be approximated. The domain of <I> would typically be a linear subspace
consisting of all the approximants.
Another familiar problem that can be recast as a minimization problem is
the two-point boundary value problem for a Sturm-Liouville equation. When
the Rayleigh-Ritz method is applied, we seek a minimum of the functional

In the calculus of variations, similar functionals are encountered. For ex-


ample, in the "brachistochrone" problem (page 153),

A goal somewhat more modest than finding the minimum point is to gen-
erate a minimizing sequence for <I>. That means a sequence Xl, X2, . .. in K
such that

(2)

The sequence itself mayor may not converge.

Theorem 1. A lower semicontinuous functional on a compact set


attains its infimum.

Proof. Recall that lower semi continuity of <I> means that each set of the form

K)..={XEK:<I>(X):::;A}

is closed. If A > p, then K).. is nonempty. The family of closed sets {K)..
A > p} has the finite-intersection property (Le., the intersection of any finite
subcollection is nonempty). Since the space is compact,

(see [Kel]). Any point in this intersection satisfies the inequality <I>(x) :::; p. •
Section 4.8 Descent Methods 227

The preceding theorem can be proved also for a space that is only count-
ably compact. This term signifies that any countable open cover of the space
has a finite subcover. ([Kel] page 162). A consequence is that each sequence
in the space has a cluster point. Let Xn be chosen so that <I>(x n ) < p + lin.
Let x· be a cluster point of the sequence [x n ]. Then <I>(x·) ::;; p. Indeed, if
this inequality is false, then for some m, <I>(x·) > p + 11m. Since <I> is lower
semicontinuous, the set

0= {x: <I>(x) > p+ 11m}

is a neighborhood of x·. Since x· is a cluster point, the sequence [x n ] is frequently


in O. But this is absurd, since x m , X m +1,'" lie outside of O.
A standard strategy for proving existence theorems consists of the following
three steps:
I. Formulate the existence question in terms of minimizing a functional <I> on
a set K.
II. Find a topology r for K such that K is r-compact and <I> is r-Iower-
semicontinuous.
III. Apply the preceding theorem.
(The two requirements on the topology are in opposition to each other.
The bigger the topology, the more difficult it is for K to be compact, but the
easier it is for <I> to be lower semicontinuous.) Examples of this strategy can be
given in spectral theory, approximation theory, and other fields. Here we wish to
concentrate on methods for constructing a minimizing sequence for a functional
<I>.
If <I> is a functional on a normed space X, and x EX, then the Frechet
derivative of <I> at x mayor may not exist. If it exists, it is an element <I>'(x) of
X· and has the property

(3) <I>(x + h) - <I>(x) - <I>'(x)h = o(h) (h E X)

These matters are discussed fully in Chapter 3. The linear functional <I>' (x) is
usually called the gradient of <I> at x. In the special case when X = lRn, and
x = (6,6, ... , ~n), h = (111,112, ... , 11n), it has the form
8<I>
L
n
(4) <I>'(x)h = ~(X)11i hE lRn
i=1 ~.
If <I>' (x) exists at a specific point x, then for any hEX, we have

d
(5) dt <I>(x + th)lt=o = <I>'(x)h

Indeed, by the chain rule (Section 3.2, page 121),

d
dt <I>(x + th) = <I>'(x + th)h
The left-hand side of Equation (5) is called the directional derivative of <I> at
x in the direction h. The existence of the Frechet derivative is sufficient for the
228 Chapter 4 Basic Approximate Methods in Analysis

existence of the directional derivative, but not necessary. (An example occurs
in Problem 2.) The mapping
d
h 1----+ dt <I> (x + th) It=O
is called the Gateaux derivative. (The mathematician R. Gateaux was killed
while serving as a soldier in the First World War, September 1914.)
If, among all h of norm 1 in X, there is a vector for which <I>' (x) h is a
maximum, this vector is said to point in the direction of steepest ascent. Its
negative gives the direction of steepest descent. These matters are most easily
understood when X is a Hilbert space. Suppose, then, that <I> is a functional
on a Hilbert space X and that <I>'(x) exists for some point x. By the Riesz
representation theorem for functionals on a Hilbert space, the functional <I>' (x)
is represented by a vector v in X, so that <I>'(x)h = (h, v): If Ilhll
= 1, then by
the Cauchy-Schwarz inequality,
Ilhllllvil = Ilvll
(h,v) ~ l(h,v)1 ~
We have equality here if and only if h is taken to be v/llvll.Thus the (unnor-
malized) direction of steepest ascent is v.
An iterative procedure called the method of steepest descent can now
be described. If any point x is given, the direction of steepest descent at x is
computed. Let this be v. The functional <I> is now minimized along the "ray"
consisting of points x + tv, t E JR. This is done by a familiar technique from
elementary calculus; namely, we solve for t in the equation
d
dt <I> (x + tv) = 0

If the appropriate t is denoted by t, then the process is repeated with x replaced


by x+tv.
These matters will now be illustrated by a special but important problem,
namely, the problem of solving the equation
(6) Ax= b
in which A is a positive definite self-adjoint operator on a real Hilbert space X,
and b EX. In symbols, the hypotheses on A are that
(7) (Ax, y) = (x, Ay)
(8) (Ax, x) > 0 if x f. 0
For this problem, we define the functional
(9) <I>(x) = (Ax - 2b,x)
Theorem 2. Under the hypotheses above, a point y satisfies the
equation Ay = b if and only if y is a global minimum point of <I>.

Proof. Let x be an arbitrary point and v any nonzero vector. Then


<I>(x + tv) = (Ax + tAv - 2b,x + tv)
(10) = (Ax - 2b, x) + t(Ax - 2b, v) + t(Av, x) + t 2 (Av, v)
= <I>{x) + 2t(Ax - b, v) + t2(Av, v)
Section 4.8 Descent Methods 229

The derivative of this expression, as a function of t, is

d
(11) dt <I> (x + tv) = 2(Ax - b, v) + 2t(Av, v)

The minimum of <I>(x + tv) occurs when this derivative is zero. The value of t
for which this happens is

(12) t= (b-Ax,v)(Av,v)-l
When this value is substituted in Equation (10) the result is

(13) <I>(x + tv) = <I>(x) - (b - Ax, v)2(Av, v)-l

This shows that we can cause <I>(x) to decrease by passing to the point x + tv,
except when b - Ax1-v. If b - Ax =J 0, then many directions v can be chosen for
our purpose, but if Ax = b, we cannot decrease <I>(x). •
In the problem under consideration, the directional derivative of <I> is ob-
tained by putting t = 0 in Equation (11):

(14) ddt <I>(x + tv)1 t=o = 2(Ax - b, v)

It follows that the direction of steepest descent is the residual vector r = b- Ax.
(Positive scalar factors can be ignored in specifying a direction vector.) The
algorithm for steepest descent in this problem is therefore described by these
formulas:

(15)

Since the method of steepest descent is not competitive with the conjugate
direction methods on this problem, we will not go into further detail, but simply
state without proof the following theorem. See [KA], pages 606-608.

Theorem 3. If A is self-adjoint and satisfies

inf (Ax, x) >0


IIxll=l

then the steepest-descent sequence in Equation (15) converges to the


solution of the equation Ax = b.

There is more to this theorem than meets the eye, .because the hypotheses on
A imply its invertibility, and consequently the equation Ax = b has a unique
solution for each b in the Hilbert space. See the lemma in Section 4.9, page 234,
for the appropriate formal result.
230 Chapter 4 Basic Approximate Methods in Analysis

Example. We consider the problem Ax = b when

How does the method of steepest descent perform on this example? We prefer
to let Mathematica do the work, and give it these inputs:
A={{1.,2.},{2.,5.}}
b={3.,1.}
Inverse [A]
%.b
The output is A-I = [~2 ~2] and the solution, x = (13, -5f. Next, we
program Mathematica to compute 10 steps of steepest descent, starting at x =
(0,0). The following input accomplishes this.
x={O. ,O.}
Do [r=b-A.xiPrint [r] iphi=-x.(r+b)iPrint[phi] i
t=(r.r)/(r.A.r)iy=x+t rix=YiPrint[x],{10}]
After 10 steps, the output is x = (5.7, -1.7) and <I> = -22.4587. Since the
solution is x· = (13, -5) and <I> = -34, the algorithm works very slowly. Of
course, with some starting points, the solution will be obtained in one step.
Such starting points are x* + SV, for any eigenvector v. Here are Mathematica
commands to compute eigenvectors of A:
A={{1.,2.},{2.,5.}}
Eigenvectors [N[A]]
If we start the steepest descent process at a remote point such as x· + 100v, the
first step (carried out numerically) gives a point very close to x·. The contours
(level sets) of <I> for this example are shown in Figure 4.1.
o

-2

-4

-6

-8

o 5 10 15 20
Figure 4.1
Section 4.8 Descent Methods 231

Problems 4.8

1. Refer to Equation (13) and discuss the problem of determining v so as to maximize

The solution should be, of course, v = Xo - x (or a multiple of it), where Xo = A -1 b.


2. Define f : IR2 -+ IR by putting f(x, y) = 0 if x = 0 and f(x, y) = xy2(x 2+y4)-1 otherwise.
Prove that f has a Gateaux derivative at 0 in every direction, but that f'(0) does not
exist. Show, in fact, that f is discontinuous at O.
3. Denote by C2 the linear space of all continuous real-valued functions on [0,1] with
L2-norm. Prove that point-evaluation functionals are discontinuous on C2. A point-
evaluation functional is of the form t* for some t E [0,1]' where t*(x) = x(t) for all
x E C2. Does t* have a directional derivative?
4. What is the direction of steepest descent for the function

<I>(x) = ~i + sin(66) + exp6


at the point (1,2, 3)? Does this function attain its minimum on IR3?
5. Let A be a bounded linear operator on a real Hilbert space X. How does the functional

behave on a ray x + tv? Where is the minimum point on this ray, and what is the
minimum value of <I> on this ray? What is the direction of steepest descent? What are
the answers if A is self-adjoint?
6. A functional <I> is said to be convex if the condition 0 < A < 1 implies
<I>(AX + (1 - A)y) ~ A<I>(X) + (1 - A)<I>(y)

for any two points x and y. Is the functional x >-t (Ax-2b, x) convex when A is Hermitian
and positive definite?
7. Let A be any bounded linear operator on a Hilbert space, and let H be a positive definite
Hermitian operator. Put
<I>(x) = (b - Ax, H(b - Ax»
Discuss methods for solving Ax = b based upon the minimization of <I>. Investigate the
equivalence of the two problems, give the Gateaux derivative of <I>, and derive the formula
for steepest descent. In the latter, the method of Lagrange multipliers would be helpful.
Determine the amount by which <I>(x) decreases in each step.
8. What happens to the theory if the coefficient 2 is replaced by 1 in Equation (9)?
9. Prove that when the method of steepest descent is applied to the problem Ax = b the
minimum value of <I> is -(x, b), where x is the solution of the problem.
10. Let the method of steepest descent be applied to solve the equation Ax = b, as described
in the text. Show that

Xn+2 = Xn + (tn + tn+llrn - tntn+1Arn


rn+1 = (I - tnA)rn
11. Prove that if v is an eigenvector of A and ifAxo = b, then the method of steepest descent
will produce the solution in one step if started at Xo + sv.
12. Let A be a bounded linear operator on a complex Hilbert space. We assume that A is
self-adjoint and positive definite. Prove that ifAxo = b, then Xo minimizes the functional

<I>(x) = (Ax, x) - (x, b) - (b, x)


232 Chapter 4 Basic Approximate Methods in Analysis

13. Let f : IR -t IR be continuous. Let g(t) = f(t) everywhere except at t = 0, where we


define g(O) = f(O) - 1. Is glower semicontinuous? Generalize.
14. Prove that the method of steepest descent, as given in Equation (15), has this property:

Show that if b is not in the closure of the range of A, then <I>(xn) -t -00.

15. Prove that if infllxll=l (Ax, x) = m > 0, then the method of steepest descent (described
in Equation (15» has this property:

16. In the method of steepest descent, we expect successive direction vectors to be orthog-
onal to each other. Why? Prove that this actually occurs in the example described by
Equation (15).
17. In the method of steepest descent applied to the equation Ax = b, explain how it is
possible for <I> to be bounded below on each line yet not bounded below on the whole
Hilbert space.
18. Use the definition of Gateaux derivative given on page 228 or in Problem 3.1.21 (page
120) to verify that Equation (5) gives the Gateaux derivative of <I>.

4.9 Conjugate Direction Methods

In this section we continue our study of algorithms for solving the equation

(1) Ax= b

assuming throughout that A is an operator on a real Hilbert space X. Later, the


application of these methods to general optimization problems will be considered.
Recall (from Theorem 2 in the preceding section, page 228) that when A is
self-adjoint and positive definite, solving Equation (1) is equivalent to minimizing
the functional

(2) <I>(x) = (Ax - 2b,x)

A general descent algorithm goes as follows. At the nth step, a vector Xn is avail-
able from prior computations. By means of some strategy, a "search direction"
is determined. This is a vector V n . Then we let

(3)

Formula (3) ensures that <I>(Xn+l) will be as small as possible when Xn+l is
restricted to the ray Xn + tvn .
In this algorithm, considerable freedom is present in choosing the search
direction Vn . For example, in the method of steepest descent, Vn = b - Axn .
We shall discuss an alternative that has many advantages over steepest descent.
One advantage is that the idea of searching for a minimum value of a functional
Section 4.9 Conjugate Direction Methods 233

is abandoned, and we retain only the algorithm in Equations (3). The operator
(or matrix in the finite-dimensional case) need not be self-adjoint or positive
definite. Finally, the direction vectors vn are subject to weaker hypotheses.
First some definitions are needed. For an operator A, a sequence of vectors
VI, V2, ... in X is said to be A-orthogonal if

(4)
This new concept reduces to the familiar type of orthogonality if A is the identity
operator. The descent algorithm (3) is called a conjugate direction method
if the search directions VI, V2,'" are nonzero and form an A-orthogonal set.
A slightly stronger hypothesis is that our set of vectors Vi is A-orthonormal,
meaning that the condition (vi,Avj) = bij is fulfilled. The formula for On in
Equation (3) is then simpler.

Theorem 1. In the conjugate direction algorithm (3), using an


A-orthonormal set of direction vectors, each residual rn = b - AXn is
orthogonal to all the previous search directions VI, ... , V n -1.

Proof. Let rn = b - Ax n . We wish to prove that

(5) (n = 2,3, ... )

First we observe that

Consequently, by the definition of 01, we have

Now assume that Equation (5) is true for a certain index n. In order to prove
Equation (5) for n + 1, let 1 ::s; i ::s; n and use Equation (6) to write

(8)

For i < n, both terms on the right side of Equation (8) are zero. For i = n the
definition of On shows that the right side is zero, as in Equation (7). •

Corollary. Let A be an m x m matrix. Let {VI, . .. ,vm } be an A-


orthonormal set of vectors. Then the conjugate direction algorithm (3)
produces a solution to the problem in (1) no later than the (m + l)st
step. Thus AXm+1 = b.

Proof. By the preceding theorem, rm +1 .L {VI, ... , V m }. But this set of m


vectors is linearly independent, because

Hence rm+1 = O. Thus b - AXm+1 = O.



234 Chapter 4 Basic Approximate Methods in Analysis

Lemma. If A is an Hermitian operator on a Hilbert space and


satisfies the inequality

(9) IIAxl1 ~ mllxll (m > 0)


then A is invertible, and IIA-lil ~ 11m.

Proof. (The techniques of this proof were used previously, in Theorem 3 of


Section 4.5, page 201.) Recall from Theorem 3 of Section 2.3 (page 84) that the
Hermitian property implies continuity and self-adjointness. In order to prove
that the range of A, R(A), is closed, let [Ynl be a convergent sequence in R(A).
Write Yn = AXn and Yn ---+ y. Then [Ynl has the Cauchy property. By using (9)
we get

This shows that [xnl is a Cauchy sequence. Hence Xn ---+ x for some x. By the
continuity of A, Yn = AX n ---+ Ax, and Y = Ax E R(A).
Next we observe that R(A)l. = O. Indeed, if Y E R(A)l. then for every x,

(Ay, x) = (y, Ax) = 0


This implies that Ay = 0, and then Y = 0 by Inequality (9). Since R(A) is
closed and R(A)l. = 0, we infer that R(A) = X. Since the null space of A is 0,
A-I exists as a (possibly unbounded) operator. But if Y = Ax, then

IIYII = IIAxl1 ~ mllxll = mlIA-lYII


whence IIA-lyll ~ ~IIYII. •
Theorem 2. Let A be a self-adjoint operator on a Hilbert space X.
Assume that

(10) (m > 0)

Let VI, V2, . .. be an A-orthonormal sequence whose linear span is dense


in X. Then the conjugate direction algorithm

(11)

produces a sequence that converges to A-I b from any starting point


Xl·

Proof. (After [Lue2]) Putting On = (v n , b - Ax n ), we have, from Equation


(11),
X2 = Xl + 0lVl
X3 = X2 + 02V2 = Xl + 0lVl + 02 V2
and so on. Thus, in general,

(12)
Section 4.9 Conjugate Direction Methods 235

From Equation (12) and the A-orthogonal property,

(13)

From the definition of Q n and Equation (13), we get

Qn = (v n , b - AXn) = (vn' b - AXI - AX n + AXl)


= (v n , b - AXl) = (vn' A(A-lb - xd)

This shows that the right side of Equation (12) represents the partial sum of the
Fourier series of A-I b - Xl, if we use for this expansion the inner product

[X, y] = (x, Ay)

These two inner products lead to the same topology on X because of Equation
(10). Hence Xn - Xl -t A-lb - Xl. •

In the conjugate direction algorithm, there is still some freedom in the choice
of the direction vectors Vi. In the conjugate gradient method, these vectors
are generated in such a way that (for each n) Xn minimizes <I> on a certain
linear variety of dimension n - 1. The conjugate gradient algorithm appears in
a number of different versions. For a theoretical analysis of the method, this
version seems to be the best:
I. To start, let Xl be arbitrary, and define VI = b - AXI.
II. Given Xn and Vn , we set

(14) Xn+l = Xn + QnVn Q n = (b - Ax n , vn)(vn , AVn)-l


(15) Vn+l = b - AXn+l - f3nvn f3n = (b - AXn+l' Avn ) (Vn, AVn)-l

Theorem 3. Let A be a self-adjoint operator on a Hilbert space.


Assume that for some positive m and M,

(16)

Then the sequence [xnl generated by the conjugate gradient algorithm


converges to A-I b.

Proof. Throughout the proof, the nth residual is defined to be

(17) n= 1,2,3, ...

It follows that

First we establish two orthogonality relations:

(19) (rn+l' vn)=O n=1,2, .. .


(20) (Vn+l, Avn) = 0 n=I,2, .. .
236 Chapter 4 Basic Approximate Methods in Analysis

These are consequences of formulas (18), (14), and (15), as follows:

(rn+l, Vn ) = (rn - anAvn , vn ) = (rn, v n ) - an (Av n , vn )


= (rn, v n ) - (rn, v n ) = 0
(Vn+l, Avn ) = (rn+l - .Bnvn, Avn) = (rn+l, Avn) - .Bn (v n , Avn )
= (rn+l, Avn) - (rn+l, Avn) = 0

Define a sequence en by the equation

From this we have

(21)

Using Equation (18) we can express en+l as follows:

en+l = (rn+l, A-1rn+l)


= (rn - anAvn,A-1(r n - anAvn )}
= (rn - anAvn,A-1rn - anvn )
= (rn, A -lrn ) - an (Av n , A -lrn ) - an (rn, vn ) + a;' (Av n , v n )
= en - an (vn , rn) - an (v n , rn) + an (v n , rn)
=en -an(vn,rn}
= en [l -an(vn,rn}/en]

In order to show that en converges geometrically to zero, it suffices to prove that


the bracketed expression in the previous equation is less than 1- m/M. We will
prove two inequalities that accomplish this objective, namely

(22) an ~ l/M
(23) (vn,rn}/en ~ m

From Equations (15) and (19) we have

(24)

Equation (21) and the Cauchy-Schwarz inequality imply that

This leads to mllA-lrnll ~ Ilrnll. Inequality (23) now follows from


men = m(rn,A-1rn) ~ mllrnllllA-1rnll ~ 11rn11 2 = (rn, rn) = (vn, rn)

To prove Inequality (22), we start with Equation (15), written in the form

rn = Vn + .Bn-lvn-l
Section 4.10 Methods Based on Homotopy and Continuation 237

From this we conclude that

Since (vn-l,Av n ) = (AVn-l,V n ) = 0 by Equation (20), we obtain

Thus, using (16) and (24) we have

Hence
an = (rn,vn)(vn,Aun)-l ~ 11M
At this stage, we have established that

Consequently, en --+ o. From Inequality (21) we conclude that A-1rn --+ 0, or


A-1b - Xn --+ o. •
Problems 4.9

1. Let A be an m X m matrix that is symmetric and positive definite. Let U be an m x m


matrix whose columns form an A-orthonormal set. Prove that U T AU = I.
2. Let A be an m xm symmetric matrix such that (x, Ax) f. 0 when x f. O. Let {Ul, ... , Um}
be a basis for lRm. Define VI = Ul and

(k=1,2, ... ,m-l)

Prove that {VI, V2, ... , v m } is an A-orthogonal basis for lRm.


3. Show that if A is an m X m symmetric positive definite matrix and if {VI, ... ,vm} is an
A-orthonormal set, then the solution of Ax = b is x = 2::::1
(b, Vi)Vi.

4.10 Methods Based on Homotopy and Continuation

In this section we address the problem of finding the roots of an equation


or the zeros of a mapping

(1) f(x) = 0

Here f can be a mapping from one Banach space to another, say f : X --+ Y.
This problem is so general that it includes systems of algebraic equations, integral
equations, differential equations, and so on. We will describe a tactic called the
238 Chapter 4 Basic Approximate Methods in Analysis

continuation method for attacking this problem. The discussion is adapted


from that in [KC].
The fundamental idea of the continuation method is to embed the given
problem in a one-parameter family of problems, using a parameter t that runs
over the interval [O,lJ. The original problem will correspond to t = 1, and
another problem whose solution is known will correspond to t = o. For example,
we can define

(2) h(t,x) = tf(x) + (1- t)g(x)


The equation g(x) = 0 should have a known solution. The next step is to select
points to, t 1 , . .. such that

0= to < tl < t2 < ... < tm = 1

One then attempts to solve each equation h( ti, x) = 0, (0 ,,;; i ,,;; m). Assuming
that some iterative method will be used (such as Newton's method), it makes
sense to use the solution at the ith step as the starting point in computing a
solution at the (i + 1)st step. .
This whole procedure is designed to cure the difficulty that plagues Newton's
method, viz., the need for a good starting point.
The relationship (2), which embeds the original problem (1) in a family of
problems, is an example of a homotopy that connects the two functions f and
g. In general, a homotopy can be any continuous connection between f and g.
Formally, a homotopy between two functions f, g : X -+ Y is a continuous map

(3) h : [0, 1J x X -+ Y

such that h(O, x) = g(x) and h(l, x) = f(x). If such a map exists, we say that f
is homotopic to g. This is an equivalence relation among the continuous maps
from X to Y, where X and Y can be any two topological spaces.
An elementary homotopy that is often used in the continuation method is

h(t,x) = tf(x)+ (1- t)[f(x) - f(xo)]


(4)
= f(x) + (t - l)f(xo)
Here Xo can be any point in X, and it is clear that Xo will be a solution of the
problem when t = O.
If the equation h(t,x) = 0 has a unique root for each t E [0,1]' then that
root is a function of t, and we can write x(t) as the unique member of X that
makes the equation h(t,x(t)) = 0 true. The set

(5) { x( t) : 0";; t ,,;; 1 }

can be interpreted as an arc or curve in X, parametrized by t. This arc starts


at the known point x(O) and proceeds to the solution of our problem, x(l).
The continuation method determines this curve by computing points on it,
x(to),x(t 1 ), ... ,x(tm).
Section 4.10 Methods Based on Homotopy and Continuation 239

If the function t H x(t) is differentiable and if h is differentiable, then the


Implicit Function Theorems of Section 3.4, pages 136ff, enable us to compute
x'(t). By following this idea, we can describe the curve in Equation (5) by a
differential equation. Assuming an arbitrary homotopy, we have

(6) O=h(t,x(t))

Differentiating with respect to t, we obtain

(7) 0= hl(t,X(t)) + h2(t,X(t))x'(t)


in which subscripts denote partial derivatives. Thus

(8)

This is a differential equation for x. Its initial value is known, because x(O) has
been assumed to be known. Upon integrating this differential equation across
the interval 0 ~ t ~ 1 (usually by numerical procedures), one reaches the value
x(I), which is the solution to Equation (1).
Example 1. Let X = Y = ]R2, and define

[
sin 6 + e~2 - 3 ]
f(x) = (6 + 3)2 - 6 - 4

A convenient homotopy is defined by Equation (4), and we select the starting


point Xo = (5,3). The derivatives on the right side of Equation (8) are computed
to be

h2 = f , (x) =
[COS 6
-1

where a = sin 5 + e 3 - 3 and b = 27. The inverse of J'(x) is

'x
[J()]
_l_-.!..[26 1+6
-A
-e~2]
C
U COS"l

The differential equation that controls the path leading away from the point Xo is
Equation (8). In this concrete case it is a pair of ordinary differential equations:

When this system was integrated numerically on the interval 0 ~ t :::; 1, the
terminal value of x (at t = 1) was close to (12, 1). In order to find a more
240 CImpter 4 Basic Approximate Methods in Analysis

accurate solution, we can use Newton's iteration starting at the point produced
by the homotopy method. The Newton iteration replaces any approximate root
x by x - 15, the correction 15 being defined by

(These matters are the subject of Section 3.3, beginning at page 125.) In the
current example, the vector 15 is

Five steps of the Newton iteration produced these results:

6 6
k=O 12.000000000000000000 1.0000000000000000000
k=1 12.691334908752890571 1.0864168635941113213
k=2 12.628177397290770959 1.0777753827891591357
k=3 12.628268254380085321 1.0777773669468545670
k=4 12.628268254564651450 1.0777773669690025700
k=5 12.628268254564651450 1.0777773669690025700

The curve {x(t) : 0::::; t ::::; 1} is shown in Figure 4.2



3 ____ ~ !

2. S -------- -------- --------i -~\


2r---r----T---~---~r-~H

1. Sl----/------+-----+------+----+-l/

6 8 10 12
Figure 4.2
In an example such as this one, the differential equation need not be solved
numerically with high precision, because the objective is to end at a point near
the solution-in fact, near enough so that the classical Newton method will
succeed if started at that point.
A formal result that gives some conditions under which the homotopy
method will succeed is as follows. This result is from [OR].

Theorem. If f : IRn --+ IR n is continuously differentiable and if


IIf'(x)-lll is bounded on IRn , then for any Xo E IRn there is a unique
curve {x(t) : 0 ::::; t ::::; 1} in IR n such that f(x(t» + (t - 1)f(xo) = 0,
Section 4.10 Methods Based on Homotopy and Continuation 241

o~ t ~ 1. The function t H x(t) is a continuously-differentiable


solution of the initial value problem x' = - f'(X)-l f(xo), x(O) = Xo.

Another way of describing the path x(t) has been given by Garcia and Zangwill
[GZ]. We start with the equation h(t,x) = 0, assuming now that x E lR,n and
t E [0,1]. A vector y E lR,n+l is defined by

where 6,6, . .. are the components of x. Thus our equation is simply h(y) =
O. Each component of y, including t, is now allowed to be a function of an
independent variable s, and we write h(y( s)) = O. Differentiation with respect
to s leads to the basic differential equation

(9) h'(y)y'(s) = 0

The variables sand t start at O. The initial value of x is x(O) = Xo. Thus
suitable starting values are available for the differential equation (9).
Since f and 9 are maps of lR,n into lR,n, h is a map of lR,n+l into lR,n. The
Fnkhet derivative h'(y) is therefore represented by an n x (n + 1) matrix, A.
The vector y'(s) has n + 1 components, which we denote by 1J~, 1J~, .. . , 1J~+1' By
appealing to the lemma below, we can obtain another form for Equation (9),
namely

(10) (1 ~ j ~ n + 1)
where Aj is the n x n matrix that results from A by deleting its jth column.
Let us illustrate this formalism with a problem similar to the one in Example 1.
Example 2. Let f be the mapping
x _ [~r 3~i +
f()- 66+6
- 3]
We take the starting point Xo = (1,1) and use the homotopy of Equation (4).
Then
h(t,x) =
- 3~~ + 2 + t]
[~r66 -1 +7t
The differential equation (9) is given by

(11)

It is preferable to use Equation (10), however, and to write the differential


equations in the form

t' = -(2~r + 6~~) t(O) = 0


(12) { ~i = 6 +426 ~l(O) = 1
~~ = -(6 - 146) 6(0) = 1
242 Chapter 4 Basic Approximate Methods in Analysis

The derivatives in this system are with respect to s. Since we want t to run from
o to 1, it is clear (from the equation governing t) that we must let s proceed
to the left. Alternatively, we can appeal to the homogeneity in the system, and
simply change the signs on the right side of (12). Following the latter course,
and performing a numerical integration, we arrive at these two points:
s = .087 , t = .969, ~1 = -2.94 , 6 = 1.97
s = .088 , t = 1.010 , 6 = -3.02, 6 = 2.01

Either of these can be used to start a Newton iteration, as was done in Example
1. The path generated by this homotopy is shown in Figure 4.3. •
2.2

1.8

1.6

1.4

1.2

-3 -2 -1

Figure 4.3

A drawback to the method used in Example 2 is that one has no a priori knowl-
edge of the value of s corresponding to t = 1. In practice, this may necessitate
several computer runs.

Lemma. Let A be an n x (n + 1) matrix. A solution of the equation


Ax = 0 is given by Xj = (-I)jdet(Aj), where Aj is the matrix A
without its column j.

Proof. Select any row (for example the ith row) in A and adjoin a copy of it
as a new row at the top of A. This creates an (n + 1) x (n + 1) matrix B that
is obviously singular, because row i of A occurs twice in B. In expanding the
determinant of B by the elements in its top row we obtain
n+1 n+1
0= detB = ~)-I)jaij det(A j ) = L aijXj
j=1 j=1

Since this is true for i = 1,2, ... ,n, we have Ax = O.


The connection between the homotopy methods and Newton's method is

deeper than may be seen at first glance. Let us start with the homotopy

h(t,x) = f(x) - e-tf(xo)

In this equation t will run from 0 to 00. We seek a curve or path, x = x(t), on
which
0= h(t,x(t)) = f(x(t)) - e-tf(xo)
Section 4.10 Methods Based on Homotopy and Continuation 243

As usual, differentiation with respect to t will lead to a differential equation


describing the path:
(13) 0 = f'(x(t))x'(t) + e- t f(xo) = f'(x(t))x'(t) + f(x(t))

(14) x'(t) = - f'(x(t)r l f(x(t))


If this differential equation is integrated using Euler's method and step size 1,
the result is the formula

This is, of course, the formula for Newton's method. It is clear that one can
expect to obtain better results by solving the differential equation (14) with
a more accurate numerical method (incorporating a variable step size). These
matters have been thoroughly explored by Smale and others. See, for example,
[Sm].
Application to Linear Programming. The homotopy method can be
used to solve linear programming problems. This approach leads naturally to
the algorithm proposed in 1984 by Karmarkar [Kar]. In explaining the homotopy
method in this context, we follow closely the description in [BroS].
Consider the standard linear programming problem
maximize cT x
(15) {
subject to Ax = b and x ;;:: 0
Here, c E ]Rn, x E ]Rn, b E ]Rm, and A is an m x n matrix. We start with a
feasible point, i.e., a point X O that satisfies the constraints. The feasible set
is
F = { x E]Rn : Ax = b and x;;:: 0 }
Our intention is to move from xO to a succession of other points, remaining
always in F, and increasing the value of the objective function, cT x. It is
clear that if we move from XO to Xl, the difference Xl - x O must lie in the null
space of A. We shall try to find a curve t H x(t) in the feasible set, starting at
X O and leading to a solution of the extremal problem. Our requirements are

(i) x(t) ;;:: 0 for t ;;:: 0


(ii) Ax(t) = b for t ;;:: 0
(iii) cT x(t) is increasing for t ;;:: O.
The curve will be defined by an initial-value problem:
(16) x' = F(x) x(O) = xO
The task facing us is to determine a suitable F. In order to satisfy condition (i),
we shall arrange that whenever a component Xi approaches 0, its velocity x~(t)
shall also approach O. This can be accomplished by letting D(x) be the diagonal
matrix

D(x) = [
Xl X2 . 0]
o Xn
244 Chapter 4 Basic Approximate Methods in Analysis

and assuming that for some bounded function C,

(17) F(x) = D(x)C(x)

If this is the case, then from Equations (15) and (17) we shall have

x; = XiCi(X)
and clearly x~ -+ 0 if Xi -+ O.
In order to satisfy requirement (ii), it suffices to require Ax' = O. Indeed,
if Ax' = 0 then Ax(t) is constant as a function of t. Since Ax(O) = b, we have
Ax(t) = b for all t. Since x' = F = DC, we must require ADC = O. This is
most conveniently arranged by letting C = PH, where H is any function, and
P is the orthogonal projection onto the null space of AD.
Finally, in order to secure property (iii), we should select H so that cT x(t)
is increasing. Thus, we want

d
0< dt (c T x(t)) = cT x' = cT F(x) = cT DC = cT DPH
A convenient choice for H is Dc, for then we have, (using v = Dc),

cTDPH = cTDPDc = vTpv = (v,Pv)


= (v - Pv + Pv,Pv) = (Pv,Pv) ~ 0

Notice that v - Pv is orthogonal to the range of P, and (v - Pv, Pv) = O.


The final version of our initial-value problem is

(18) x' = D(x)P(x)D(x)c x(O) = xO

The theoretical formula for P is

(19) P = 1- (ADf [(AD)(AD)T] -1 AD


The validity of this depends upon B == AD having full rank, so that BBT will
be nonsingular. This will, in turn, require Xi > 0 for each component. Thus the
points x(t) should remain in the interior of the set

{x: x~O}

In particular, xO should be so chosen. In practice, Pv is computed not by (19)


but by solving the equation BBT z = Bv and noting that

Pv = v- BTz

As mentioned earlier, the initial-value problem (18) need not be solved very
accurately. A variation of the Euler Method can be used. Recall that the Euler
Method for Equation (16) advances the solution by

x(t + 6) = x(t) + 6x'(t) = x(t) + 6F(x)


Section 4.10 Methods Based on Homotopy and Continuation 245

Using this type of formula, we generate a sequence of vectors XO, Xl , . .. by the


equation
Xk+l = xk + JkF(xk)
Although it is tempting to take 15k as large as possible subject to the requirement
X k+ l E F, that will lead to a point Xk+l having at least one zero component.
As pointed out previously, that will introduce other difficulties. What seems to
work well in practice is to take 15k approximately 9/10 of the maximum possible
step. This maximum step is easily computed; it is the maximum A for which
X k + l ~ O. (The constraint Ax = b is maintained automatically.)

Problems 4.10

1. Solve the system of equations

x - 2y + y2 + y3 - 4 = -x - y + 2y2 - 1 = 0
by the homotopy method used in Example 2, starting with the point (0,0). (All the
calculations can be performed without recourse to numerical methods.)
2. Consider the homotopy h(t,x) = tf(x) + (1- t)g(x), in which
f(x) = x2 - 5x + 6 g(x)=x 2 -1

Show that there is no path connecting a zero of g to a zero of f.


3. Let y = y( s) be a differentiable function from IR to IR n satisfying the differential equation
(9). Assume that h(y(O)) = O. Prove that h(y(s)) = O.
4. If the homotopy method of Example 2 is to be used on the system

sinx + cosy + e XY = tan- 1 (x + y) - xy =0


starting at (0,0), what is the system of differential equations that will govern the path?
5. Prove 'that homotopy is an equivalence relation among the continuous maps from one
topological space to another.
6. Are the functions f(x) = sin x and g(x) = cos x homotopic?
7. Consider these maps of [0,1] into [0,1] U [2,3]:

f(t) =0 g(t) = 2

Are they homotopic?


8. To find ,j2 we can solve the equation f(x) = x 2 - 2 = O. Let Xo = 1 and h(t,x) =
tf(x)+(1-t)[f(x)- f(xo)]. Determine the initial value problem that arises from Equation
(8). Solve it in closed form and verify that x(1) = ,j2.
9. In Example 1 are the hypotheses of the Ortega-Rheinboldt theorem fulfilled?
10. Prove that any two continuous maps from a topological space into a normed linear space
are homotopic.
Chapter 5

Distributions

5.1 Definition and Examples 246


5.2 Derivatives of Distributions 253
5.3 Convergence of Distributions 257
5.4 Multiplication of Distributions by Functions 260
5.5 Convolutions 268
5.6 Differential Operators 273
5.7 Distributions with Compact Support 280

5.1 Definition and Examples

The theory of distributions originated in the work of Laurent Schwartz in


the era 1945-1952 [SchI2]. Earlier work by Sobolev was along similar lines.
The objective was to treat functions as functionals, and to notice that when
so interpreted, differentiation was always possible. This opened the way to the
study of partial differential equations by new methods that bypassed the classical
restrictions on functions. The functionals that now become the focus of study
are called "distributions"-not to be confused with distributions in probability
theory! The term "generalized functions" is also used, especially by Russian
authors.
Since this exposition of distribution theory is addressed to readers who are
seeing these matters for the first time, we have used notation that maintains
distinctions between entities that, i~ other literature, are often denoted by a
single symbol. For example, we use f to denote the distribution arising from a
locally integrable function f. We use ¢j -- ¢ to signify the special convergence
defined in a space of test functions, and we use 80: for a distributional derivative,
in contrast to DO: for a classical derivative.
In these first sections of Chapter 5 we consider all test functions to be
defined on lR n , not on a prescribed open set n. This frees the exposition from
some additional complication.
We begin with the notion of a multi-index. This is any n-tuple of nonnegative integers α = (α₁, α₂, …, αₙ). The order of a multi-index α is the quantity

|α| = Σ_{i=1}^{n} αᵢ
If α is a multi-index, there is a partial differential operator D^α corresponding to it. Its definition is

D^α = (∂/∂x₁)^{α₁} (∂/∂x₂)^{α₂} ⋯ (∂/∂xₙ)^{αₙ} = ∂^{|α|}/(∂x₁^{α₁} ∂x₂^{α₂} ⋯ ∂xₙ^{αₙ})

This operates on functions of n real variables x₁, …, xₙ. Thus, for example, if n = 3 and α = (3, 0, 4), then

D^α φ = ∂⁷φ/(∂x₁³ ∂x₃⁴)

The space C^∞(ℝⁿ) consists of all functions φ : ℝⁿ → ℝ such that D^α φ ∈ C(ℝⁿ) for each multi-index α. Thus, the partial derivatives of φ of all orders exist and are continuous.
A vector space 𝒟, called the space of test functions, is now introduced. Its elements are all the functions in C^∞(ℝⁿ) having compact support. The support of a function φ is the closure of {x : φ(x) ≠ 0}. Another notation for 𝒟 is C_c^∞(ℝⁿ). The value of n is usually fixed in our discussion. If we want to show n in the notation, we can write 𝒟(ℝⁿ).
At first glance, it may seem that 1> is empty! After all, an analytic function
that vanishes on an open nonempty set must be 0 everywhere. But that is a
theorem about complex-valued functions of complex variables, whereas we are
here considering real-valued functions of real variables.
An important example of a function in 𝒟 is given by the formula

(1)  ρ(x) = { c·exp[(|x|² − 1)⁻¹]   if x ∈ ℝⁿ and |x| < 1
              0                     if x ∈ ℝⁿ and |x| ≥ 1

where c is chosen so that ∫ ρ(x) dx = 1. Here and elsewhere we use |x| for the Euclidean norm:

|x| = (Σ_{i=1}^{n} xᵢ²)^{1/2}
The graph of ρ in the case n = 1 is shown in Figure 5.1.

Figure 5.1  Graph of ρ
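A quick numerical sketch of ρ in one dimension (our illustration; the normalizing constant c is estimated by quadrature, which also answers Problem 19 below for n = 1):

```python
import numpy as np

def rho_unnormalized(x):
    """The bump of Equation (1) with c = 1, in dimension n = 1."""
    out = np.zeros_like(x, dtype=float)
    inside = np.abs(x) < 1
    out[inside] = np.exp(1.0 / (x[inside]**2 - 1.0))
    return out

x = np.linspace(-1, 1, 20001)
mass = np.trapz(rho_unnormalized(x), x)   # about 0.444 in one dimension
c = 1.0 / mass                            # about 2.25, cf. Problem 19
print(c)
```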

The fact that p E D is not at all obvious, and the next two lemmas are inserted
solely to establish this fact.
248 Chapter 5 Distributions

Lemma 1.  For any polynomial P, the function f : ℝ → ℝ defined by

f(x) = { P(1/x)e^{−1/x}   x > 0
         0                x ≤ 0

is in C^∞(ℝ).

Proof.  First we show that f is continuous. The only questionable point is x = 0. We have

lim_{x↓0} f(x) = lim_{x↓0} P(1/x)/exp(1/x) = lim_{t↑∞} P(t)/exp(t)

By using L'Hôpital's rule repeatedly on this last limit, we see that its value is 0. Hence f is continuous. Differentiation of f gives

f′(x) = { Q(1/x)e^{−1/x}   x > 0
          0                x < 0

where Q(x) = x²[P(x) − P′(x)]. By the first part of the proof, lim_{x↓0} f′(x) = 0. It remains only to be proved that f′(0) = 0. We have, by the mean value theorem,

f′(0) = lim_{h→0} [f(h) − f(0)]/h = lim_{h→0} f′(ξ(h)) = 0

where ξ(h) is strictly between 0 and h. (Note that h can be positive or negative in this argument.) We have shown that

f′(x) = { Q(1/x)e^{−1/x}   x > 0
          0                x ≤ 0

This has the same form as f, and therefore f′ is continuous. The argument can be repeated indefinitely. The reader should observe that our argument requires the following version of the mean value theorem: If g is continuous on [a, b] and differentiable on (a, b), then for some ξ in (a, b)

g′(ξ) = [g(b) − g(a)]/(b − a)



Lemma 2.  The function ρ defined in Equation (1) belongs to 𝒟.

Proof.  The function f in the preceding lemma (with P(x) = 1) has the property that ρ(x) = c f(1 − |x|²). Thus ρ = c f∘g, where g(x) = 1 − |x|² and g belongs to C^∞(ℝⁿ). By the chain rule, D^α ρ can be expressed as a sum of products of ordinary derivatives of f with various partial derivatives of g. Since these are all continuous, D^α ρ ∈ C(ℝⁿ) for all multi-indices α.  ∎

The support of a function φ is denoted by supp(φ). An element φ of C^∞(ℝⁿ) such that

supp(φ) ⊂ {x : |x| ≤ 1}
is called a mollifier. The function ρ defined above is thus a mollifier. If φ is a mollifier, then the scaled versions of φ, defined by

(2)  φⱼ(x) = jⁿ φ(jx)   (j = 1, 2, …)

play a role in certain arguments, such as in Sections 5.5 and 6.8. They, too, are mollifiers.
The linear space 𝒟 is now furnished with a notion of sequential convergence. A sequence [φⱼ] in 𝒟 converges to 0 if there is a single compact set K containing the supports of all φⱼ, and if for each multi-index α,

D^α φⱼ → 0 uniformly on K

We write φⱼ ⇝ 0 if these two conditions are fulfilled. Further, we write φⱼ ⇝ φ if and only if φⱼ − φ ⇝ 0. The use of the symbol ⇝ is to remind the reader of the special nature of convergence in 𝒟. Uniform convergence to 0 on K of the sequence D^α φⱼ means that

sup_{x∈K} |(D^α φⱼ)(x)| → 0   as j → ∞

Since all φⱼ vanish outside of K, we also have

sup_{x∈ℝⁿ} |(D^α φⱼ)(x)| → 0

Continuity and other topological notions will be based upon the convergence of sequences as just defined. In particular, a map F from 𝒟 into a topological space is continuous if the condition φⱼ ⇝ φ implies the condition F(φⱼ) → F(φ). The legitimacy of defining topological notions by means of sequential convergence is a matter that would require an excursus into the theory of locally convex linear topological spaces. We refer the reader to [Rul] for these matters.
The next result gives an example of this type of continuity.

Theorem 1.  For every multi-index α, D^α is a continuous linear transformation of 𝒟 into 𝒟.

Proof.  The linearity is a familiar feature of differentiation. For the continuity, it suffices to prove continuity at 0 because D^α is linear. Thus, suppose that φⱼ ∈ 𝒟 and φⱼ ⇝ 0. Let K be a compact set containing the supports of all the functions φⱼ. Then D^β φⱼ(x) → 0 uniformly on K for every multi-index β. Consequently, D^β D^α φⱼ(x) → 0 uniformly for each β, and so D^α φⱼ ⇝ 0, by the definition of convergence in 𝒟.  ∎
A distribution is a continuous linear functional on 𝒟. Continuity of such a linear functional T is defined by this implication:

φⱼ ⇝ 0   ⟹   T(φⱼ) → 0

The space of all distributions is denoted by 𝒟′, or by 𝒟′(ℝⁿ).


250 Chapter 5 Distributions

Example 1.  A Dirac distribution δ_ξ is defined by selecting ξ ∈ ℝⁿ and writing

(3)  δ_ξ(φ) = φ(ξ)   (φ ∈ 𝒟)

It is a distribution, because firstly, it is linear: δ_ξ(aφ + bψ) = aφ(ξ) + bψ(ξ) = aδ_ξ(φ) + bδ_ξ(ψ). Secondly, it is continuous, because the condition φⱼ ⇝ 0 implies that φⱼ(ξ) → 0.

If we write δ without a subscript, it refers to evaluation at 0; i.e., δ = δ₀.  ∎

Example 2.  The Heaviside distribution is defined, when n = 1, by

(4)  H̃(φ) = ∫₀^∞ φ(x) dx   (φ ∈ 𝒟)

Example 3.  Let f : ℝⁿ → ℝ be continuous. With f we associate a distribution f̃ by means of the definition

(5)  f̃(φ) = ∫ f(x)φ(x) dx   (φ ∈ 𝒟)

The linearity of f̃ is obvious. For the continuity, we observe that if φⱼ ⇝ 0, then there is a compact K containing the supports of the φⱼ. Then we have

|f̃(φⱼ)| ≤ ∫_K |f(x)||φⱼ(x)| dx ≤ (sup_x |φⱼ(x)|) ∫_K |f(x)| dx → 0

because φⱼ ⇝ 0 entails sup_x |φⱼ(x)| → 0.  ∎
Example 4. Fix a multi-index Q and define

(¢ E 1»

This is a distribution. (The proof involves the use of Theorem 1.)



Example 5.  If H is the Heaviside function, defined by the equation

H(x) = { 1   if x ≥ 0
         0   if x < 0

then Example 2 above illustrates the principle in Example 3, although H is obviously not continuous.  ∎

The distributions f̃ described in Example 3 are the subject of the next theorem.

Theorem 2.  If f ∈ C(ℝⁿ), then f̃, as defined in Equation (5), is a distribution. The map f ↦ f̃ is linear and injective from C(ℝⁿ) into 𝒟′.

Proof.  We have already seen that f̃ is a distribution. The linearity of the mapping f ↦ f̃ follows from the equation

(a₁f₁ + a₂f₂)~(φ) = ∫ (a₁f₁ + a₂f₂)φ = a₁∫ f₁φ + a₂∫ f₂φ = a₁f̃₁(φ) + a₂f̃₂(φ) = (a₁f̃₁ + a₂f̃₂)(φ)

For the injective property it suffices to prove that if f ≠ 0, then f̃ ≠ 0. Supposing that f ≠ 0, let ξ be a point where f(ξ) ≠ 0. Select j such that f(x) is of one sign in the ball around ξ of radius 1/j. Then ρⱼ(x − ξ), as defined in Equation (2), is positive in this same ball about ξ and vanishes elsewhere. Hence ∫ f(x)ρⱼ(x − ξ) dx ≠ 0. This means that f̃(φ) ≠ 0 if φ(x) = ρⱼ(x − ξ).  ∎
Example 3 shows that in a certain natural way, each continuous function f : ℝⁿ → ℝ "is" a distribution. That is, we can associate a distribution f̃ with f. In fact, the same is true for some functions that are not continuous. The appropriate family of functions is described now.
A Lebesgue-measurable function f : ℝⁿ → ℝ is said to be locally integrable if for every compact set K ⊂ ℝⁿ, ∫_K |f(x)| dx < ∞. As is usual when dealing with measurable functions, we define two functions to be equivalent if they differ only on a set of measure zero. The equivalence classes of locally integrable functions make up the space L¹_loc(ℝⁿ).
We mention, without proof, the result corresponding to the preceding theorem for the case of locally integrable functions. See [Rul] page 142, or [Lanl] page 277.

Theorem 3.  If f is locally integrable, then the equation f̃(φ) = ∫ fφ defines a distribution f̃ that does not depend on the representative selected from the equivalence class of f. The mapping f ↦ f̃ is linear and injective from L¹_loc(ℝⁿ) into 𝒟′.

Theorem 4.  Let μ be a positive Borel measure on ℝⁿ such that μ(K) < ∞ for each compact set K in ℝⁿ. Then μ induces a distribution T by the formula

T(φ) = ∫_{ℝⁿ} φ(x) dμ(x)   (φ ∈ 𝒟)

Proof.  The linearity is obvious. For the continuity of T, let φⱼ ∈ 𝒟 and φⱼ ⇝ 0. Then there is a compact set K containing the supports of all φⱼ. Consequently,

|T(φⱼ)| ≤ ∫_K |φⱼ(x)| dμ(x) ≤ sup_{y∈K} |φⱼ(y)| ∫_K dμ(x) = μ(K) sup_{y∈K} |φⱼ(y)| → 0   ∎
252 Chapter 5 Distributions

The distributions described in Theorem 3 are said to be regular.


Suggested references for this chapter are [Ad], [Con], [Dono], [Edw], [Fol],
[Fri], [Friel], [Fried], [GV], [Gri], [Ho], [Horv], [Hu], [Jon], [Maz], rOD], [RS],
[Rul], [Schl], [SchI2], [So], [Yo], [Ze], [Zem], and [Zie].

Problems 5.1

1. Describe the null space of D^α in the case n = 2. Do this first when the domain is C^∞(ℝⁿ) and second when it is 𝒟.
2. Let f : ℝ → ℝ. Suppose that f′ exists and is continuous in the two intervals (−∞, 0), (0, ∞). Assume further that lim_{x↓0} f′(x) = lim_{x↑0} f′(x). Does it follow that f′ is continuous on ℝ? Examples and theorems are wanted.
3. Prove that for each x₀ ∈ ℝⁿ and for each r > 0 there is an element φ of 𝒟 such that the set {x : φ(x) ≠ 0} is the open ball B(x₀, r) having center x₀ and radius r.
4. Prove that if O is any bounded open set in ℝⁿ, then there exists an element φ of 𝒟 such that {x : φ(x) ≠ 0} = O. Hints: Use the functions in Problem 3. Maybe a series of such functions Σ 2^{−k}φₖ will be useful. Don't forget that the points of ℝⁿ whose coordinates are rational form a dense set.
5. For each v ∈ ℝⁿ there is a translation operator E_v on 𝒟. Its definition is (E_v φ)(x) = φ(x − v). Prove that E_v is linear, continuous, injective, surjective, and invertible from 𝒟 to 𝒟.
6. For each φ in C^∞(ℝⁿ) there is a multiplication operator M_φ defined on 𝒟 by the equation M_φ ψ = φψ. Prove that M_φ is linear and continuous from 𝒟 into 𝒟. Under what conditions will M_φ be injective? surjective? invertible?
7. For suitable φ there is a composition operator C_φ defined on 𝒟 by the equation C_φ ψ = ψ∘φ. What must be assumed about φ in order that C_φ map 𝒟 into 𝒟? Prove that C_φ is linear and continuous from 𝒟 to 𝒟. Find conditions for C_φ to be injective, surjective, or invertible.
8. Prove that E_v, as defined in Problem 5, has this property for all test functions φ and ψ:

What is the analogous property for M_φ in Problem 6?
9. Prove that if T is a distribution and if A is a continuous linear map of 𝒟 into 𝒟, then T∘A is a distribution. Use the notation of the preceding problems and identify δ_ξ ∘ E_v, δ_ξ ∘ M_φ, and δ_ξ ∘ C_φ in elementary terms. What is (δ_ξ ∘ E_v ∘ M_θ ∘ C_ψ)(φ)?
10. Show that D^α D^β = D^{α+β}, and that consequently D^α D^β = D^β D^α.
11. Let φ ∈ 𝒟. Prove that if there exists a multi-index α for which D^α φ = 0, then φ = 0. Suggestion: Do the cases |α| = 0 and |α| = 1 first. Proceed by induction on |α|.
12. Prove (in detail) that each test function is uniformly continuous.
13. Prove that 𝒟 is a ring without unit under pointwise multiplication. Prove that 𝒟 is an ideal in the ring C^∞(ℝⁿ). This means that fφ ∈ 𝒟 when f ∈ C^∞ and φ ∈ 𝒟.
14. For φ ∈ 𝒟(ℝ), define T(φ) = Σ_{k=0}^{∞} (D^k φ)(k). Prove that T is a distribution.
15. Give an example of a sequence [φⱼ] in 𝒟 such that [D^α φⱼ] converges uniformly to 0 for each multi-index α, yet [φⱼ] does not converge to 0 in the topology of 𝒟.
16. Show that supp(φ) is not always the same as {x : φ(x) ≠ 0}. Which of these sets contains the other? When are these sets identical?

17. A distribution T is said to be of order 0 if there is a constant C such that |T(φ)| ≤ C‖φ‖_∞ (for all test functions φ). Which regular distributions are of order 0?
18. Prove that the Dirac distributions in Example 1 are not regular.
19. Give a rough estimate of c in Equation (1). (Start with n = 1.)
20. Show that the notion of convergence in 𝒟 is consistent with the linear structure in 𝒟.

5.2 Derivatives of Distributions

We have seen that the space 1)' of distributions is very large; it contains (im-
ages of) all continuous functions on IR n and even all locally integrable functions.
Then, too, it contains functionals on 1) that are not readily identified with func-
tions. Such, for example, is the Dirac distribution, which is a "point-evaluation"
functional. We now will define derivatives of distributions, taking care that
the new notion of derivative will coincide with the classical one when both are
meaningful.
Definition.  If T is a distribution and α is a multi-index, then ∂^α T is the distribution defined by

(1)  (∂^α T)(φ) = (−1)^{|α|} T(D^α φ)   (φ ∈ 𝒟)

Notice that it is a little simpler to write ∂^α T = T ∘ (−D)^α. The first question is whether ∂^α T is a distribution. Its linearity is clear, since T and D^α are linear. Its continuity follows by the same reasoning. (Here Theorem 1 from the preceding section is needed.)
The next question is whether this new definition is consistent with the old. Let f be a function on ℝⁿ such that D^α f exists and is continuous whenever |α| ≤ k. Then f̃ is a distribution, and when |α| ≤ k,

(2)  ∂^α f̃ = (D^α f)~

To verify this, we write (for any test function φ)

(3)  (D^α f)~(φ) = ∫ (D^α f)φ = (−1)^{|α|} ∫ f D^α φ = (−1)^{|α|} f̃(D^α φ) = (∂^α f̃)(φ)

In this calculation integration by parts was used repeatedly. Here is how a single integration by parts works:

∫_{−∞}^{∞} (∂f/∂xᵢ) φ dxᵢ = fφ |_{−∞}^{∞} − ∫_{−∞}^{∞} f (∂φ/∂xᵢ) dxᵢ

Since φ ∈ 𝒟, φ vanishes outside some compact set, and the first term on the right-hand side of the equation is zero. Each application of integration by parts transfers one derivative from f to φ and changes the sign of the integral. The number of these steps is |α| = Σ_{i=1}^{n} αᵢ.
Now, it can happen that ∂^α f̃ ≠ (D^α f)~ for a function f that does not have continuous partial derivatives. For an example, the reader should consult [Rul], page 144.
254 Chapter 5 Distributions

Example 1.  Let H̃ be the Heaviside distribution (Example 2, page 250), and let δ be the Dirac distribution at 0 (Example 1, page 250). Then with n = 1 and α = (1), we have ∂H̃ = δ. Indeed, for any test function φ,

(∂H̃)(φ) = −H̃(Dφ) = −∫₀^∞ φ′ = φ(0) − φ(∞) = φ(0) = δ(φ)   ∎

Example 2.  Again let n = 1 and α = (1), so that D is an ordinary derivative. Let

f(x) = { x   if x ≥ 0
         0   if x ≤ 0

One is tempted to say that f′ is the Heaviside function H. But this is not true, since f′(0) is undefined in the classical sense. However, ∂f̃ = H̃, and so in the sense of distributions the equation f′ = H becomes correct.   ∎
The nomenclature that is often used in these matters is as follows: A "distribution derivative" (or a "distributional derivative") of a function f is a distribution T such that ∂f̃ = T. In the general case of an operator D^α, we require ∂^α f̃ = T. If T is a regular distribution, say T = g̃, then the defining equation is

∫ gφ = (−1)^{|α|} ∫ f D^α φ   (φ ∈ 𝒟)
Example 3.  What is the distribution derivative of the function f(x) = |x|? It is a distribution g̃, where g is a function such that for all test functions φ,

∫ gφ = −∫ fφ′ = −∫_{−∞}^{0} (−x)φ′(x) dx − ∫_{0}^{∞} xφ′(x) dx
     = xφ(x)|_{−∞}^{0} − ∫_{−∞}^{0} φ(x) dx − xφ(x)|_{0}^{∞} + ∫_{0}^{∞} φ(x) dx
     = ∫_{−∞}^{0} (−1)φ(x) dx + ∫_{0}^{∞} (+1)φ(x) dx

Thus

g(x) = { −1   x < 0
         +1   x ≥ 0 }  = 2H(x) − 1

We say that f′ = g in the sense of distributions, or ∂f̃ = g̃. Note particularly that f does not have a "classical" derivative.   ∎
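A small numerical check of this computation (our own illustration, not part of the text): for a smooth, compactly supported φ the two integrals −∫|x|φ′(x)dx and ∫(2H(x)−1)φ(x)dx agree.

```python
import numpy as np

def phi(x):
    """A C-infinity bump supported on (-0.7, 1.3): rho(x - 0.3) up to a constant."""
    t = x - 0.3
    out = np.zeros_like(t)
    inside = np.abs(t) < 1
    out[inside] = np.exp(1.0 / (t[inside]**2 - 1.0))
    return out

x  = np.linspace(-2, 2, 400001)
dx = x[1] - x[0]
dphi = np.gradient(phi(x), dx)           # numerical phi'

lhs = -np.trapz(np.abs(x) * dphi, x)     # -integral |x| phi'(x) dx
rhs = np.trapz(np.sign(x) * phi(x), x)   # integral (2H(x)-1) phi(x) dx
print(lhs, rhs)                          # the two values agree to several digits
```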
Example 4.  What is the distribution derivative f″ when f(x) = |x|? If we blindly use the techniques of classical calculus, we have from Examples 1 and 3, f′ = 2H − 1 and f″ = 2δ. This procedure is justified by the next theorem.   ∎

Theorem 1.  The operators ∂^α are linear from 𝒟′ into 𝒟′. Furthermore, ∂^α ∂^β = ∂^β ∂^α = ∂^{α+β} for any pair of multi-indices.

Proof.  The linearity of ∂^α is obvious from the definition, Equation (1). The commutative property rests upon a theorem of classical calculus that states that for any function f of two variables, if ∂²f/∂x∂y and ∂²f/∂y∂x exist and are continuous, then they are equal. Therefore, for any φ ∈ 𝒟, we have D^α D^β φ = D^β D^α φ. Consequently, for an arbitrary distribution T we have

∂^α(∂^β T) = (−1)^{|α|}(∂^β T) ∘ D^α = (−1)^{|α|}(−1)^{|β|} T ∘ D^β ∘ D^α
           = (−1)^{|β|}(−1)^{|α|} T ∘ D^α ∘ D^β
           = (−1)^{|β|}(∂^α T) ∘ D^β
           = ∂^β ∂^α T   ∎
Theorem 2.  For n = 1 (i.e., for functions of one variable), every distribution is the derivative of another distribution.

Proof.  Prior to beginning the proof we define some linear maps. Let 1̃ be the distribution defined by the constant 1:

1̃(φ) = ∫_{−∞}^{∞} φ(x) dx   (φ ∈ 𝒟)

Let M be the kernel (null space) of 1̃. Then M is a closed hyperplane in 𝒟. Select a test function ψ such that 1̃(ψ) = 1, and define

Aφ = φ − 1̃(φ)ψ   (φ ∈ 𝒟)

(Bφ)(x) = ∫_{−∞}^{x} φ(y) dy   (φ ∈ M)

We observe that if φ ∈ M, then Bφ ∈ 𝒟.
Now let T be any distribution, and set S = −T ∘ B ∘ A. It is to be shown that S is a distribution and that ∂S = T. Because Aφ ∈ M for every test function, BAφ ∈ 𝒟. Since B ∘ A is continuous from 𝒟 into 𝒟, one concludes that S is a distribution. Finally, we compute

(∂S)(φ) = −S(φ′) = T(BAφ′) = T(Bφ′) = T(φ)

Here we used the elementary facts that 1̃(φ′) = 0 and that Bφ′ = φ.   ∎
256 Chapter 5 Distributions

Theorem 3.  Let n = 1, and let T be a distribution for which ∂T = 0. Then T is c̃ for some constant c.

Proof.  Adopt the notation of the preceding proof. The familiar equation

φ(x) = (d/dx) ∫_{−∞}^{x} φ(y) dy

says that φ = DBφ, and this is valid for all φ ∈ M. Since Aφ ∈ M for all φ ∈ 𝒟, we have Aφ = DBAφ for all φ ∈ 𝒟. Consequently, if ∂T = 0, then for all test functions

T(φ) = T(Aφ + 1̃(φ)ψ) = T(DBAφ) + 1̃(φ)T(ψ)
     = −(∂T)(BAφ) + T(ψ)1̃(φ)
     = T(ψ)1̃(φ)

Thus T = c̃, with c = T(ψ).
We state without proof a generalization of Theorem 2.

Theorem 4. If T is a distribution and K is a compact set in JR n ,
then there exists an f E C(JRn) and a multi-index 0: such that for all
¢ E 1) whose supports are in K,

Proof. For the proof, consult [Ru1]' page 152.



Problems 5.2

1. Let f be a C1-function on (-00,0] and on [0,00). Let a = limf(x) -limf(x). Express


x.j.O xto
the distribution derivative of f in terms of a and familiar distributions.

2. Let 8 and ii be the Dirac and Heaviside distributions. What are an8 and an ii?
3. Find all the distributions T for which a"T = 0 whenever 101 = 1.

4. Use notation introduced in the proof of Theorem 2. Prove that loA = 0, that DBA = A,
and that A 0 D = D. Prove that BoA is continuous on 1).
5. The characteristic function of a set A is the function XA defined by

X A(S) = { 1 if sEA
o if s It A
If A = (a, b) C JR, what is the distributional derivative of X A?
6. For what functions f on JR is the equation a1 = l' true?
7. Let n = 1. Prove that D : 1) ~ 1) is injective. Prove that a : 1)1 ~ 1)1 is not injective.
8. Work in 1)(JRn). Let a be a multi-index such that 101 = 1. Prove or disprove that
D" : 1) ~ 1) is injective. Prove or disprove that a" : 1)1 ~ 1)1 is injective.
9. Let n = 1. Is every test function the derivative of a test function?
Section 5.3 Convergence of Distributions 257

10. Let f E GOC(JR) and let H be the Heaviside function. Compute the distributional deriva-
tives of fH. Show by induction that am(fH) = HD m f + 2::;:'=-01D k f(0)a m - k - 1 8.
11. Prove that the hyperplane M defined in the proof of Theorem 2 is the range of the
d
operator dx when the latter is interpreted as acting from 1)(JR1 ) into 1)(JR1).

12. If two locally integrable functions are the same except on a set of measure 0, then the
corresponding distributions are the same. If H is the Heaviside function, then H'(x) = 0
except on a set of measure O. Therefore, the distributional derivative of H should be O.
Explain the fallacy in this argument.
13. Find the distributional derivative of this function:

COSx x>O
f(x) = {
sinx x ~0

14. Define A on 1)(JR2) by putting

Prove that A maps 1)(JR2 ) into 1)(JR1).


15. Refer to the proof of Theorem 2 and show that BA is a surjective map of 1) to 1).
16. Refer to Problem 5. Does the characteristic function of every measurable set have a
distributional derivative?
17. Consider the maps f ..... 1, DO, and ao. Draw a commutative diagram expressing the
consistency of DO and ao in Equation (1).
18. Find a distribution T on JR such that a2 T + T = 8.

5.3 Convergence of Distributions

If [Tj ] is a sequence of distributions, we will write T j ~ 0 if and only if


Tj (¢) ~ 0 for each test function ¢. If T is a distribution, Tj ~ T means that
T j - T ~ O. The reader will recognize this as weak* convergence of a sequence
of linear functionals. It is also "pointwise" convergence, meaning convergence at
each point in the domain. Topological notions in 1)', such as continuity, will be
based on this notion of convergence (which we refer to simply as "convergence
of distributions"). For example, we have the following theorem.

Theorem 1. For every multi-index a, ao. is a continuous linear map


of 1)' into 1)' .

Proof. Let T j ~ O. In order to prove that ao.Tj ~ 0, we select an arbitrary


test function ¢, and attempt to prove that (ao.Tj)(¢) ~ O. This means that
(-1)10.ITj(Do.¢) ~ 0, which is certainly true, because Do.¢ is a test function.
See Theorem 1 in Section 5.2. •
An important result, whose proof can be found, for example, in [Rul] page
146, or [Ho] page 38, is the following.
258 Chapter 5 Distributions

Theorem 2. If a sequence of distributions [Tj ] has the property


that [Tj (1))] is convergent for each test function 1>, then the equation
T(1)) = limj Tj (1)) defines a distribution T, and Tj ~ T.

The theorem asserts that for a sequence of distributions Tj if limj Tj (1))


exists in lR for every test function 1>, then the equation

defines a distribution. There is no question about T being well-defined. Its


linearity is also trivial, since we have

The only real issue is whether T is continuous, and the proof of this requires
some topological vector space theory beyond the scope of this chapter.

Corollary 1. A series of distributions, L:j:l Tj , converges to a


distribution if and only iffor each test function 1> the series L:j:l Tj (1))
is convergent in R

Corollary 2. If L: Tj is a convergent series of distributions, then


for any multi-index a, aa L:Tj = L: aaTj .

Proof. By Theorem 1, aa is continuous. Hence


The previous theorem and its corollaries stand in sharp contrast to the
situation that prevails for classical derivatives and functions. Thus one can
construct a pointwise convergent sequence of continuous functions whose limit
is discontinuous. For example, consider the functions fk shown in Figure 5.2.

1
k
Figure 5.2
Section 5.3 Convergence of Distributions 259

Similarly, even a uniformly convergent series of continuously differentiable func-


tions can fail to satisfy the equation

d d
- L!k=L - fk
dx dx

A famous example of this phenomenon is provided by the Weierstrass nondiffer-


entiable function
f(x) = Σ_{k=1}^{∞} 2^{−k} cos(3^k x)

This function is continuous but not differentiable at any point! (This example
is treated in [Ti2] and [Ch]. See also Section 7.8 in this book, pages 374ff, where
some graphics are displayed.)

Example.  Let fₙ(x) = cos nx. This sequence of functions does not converge. Is the same true for the accompanying distributions f̃ₙ? To answer this, we take any test function φ and contemplate the effect of f̃ₙ on it:

f̃ₙ(φ) = ∫_{−∞}^{∞} φ(x) cos nx dx = ∫_a^b φ(x) cos nx dx

Here the interval [a, b] is chosen to contain the support of φ. For large values of n the C^∞ function φ is being integrated against the highly oscillatory function fₙ. This produces very small values, because of a cancellation of positive areas and negative areas. The limit will be zero, and hence f̃ₙ → 0. This conclusion can also be justified by writing fₙ = gₙ′, where gₙ(x) = sin(nx)/n. We see that gₙ → 0 uniformly, hence g̃ₙ → 0 in 𝒟′, and f̃ₙ = ∂g̃ₙ → 0 follows.
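A quick numerical illustration of this decay (ours; a Gaussian stands in for a test function, which is not compactly supported but decays fast enough for the numerical point):

```python
import numpy as np

phi = lambda x: np.exp(-x**2)      # smooth stand-in for a test function

x = np.linspace(-6, 6, 200001)
for n in (1, 10, 100, 1000):
    val = np.trapz(phi(x) * np.cos(n * x), x)
    print(n, val)                  # values shrink rapidly toward 0 as n grows
```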

Theorem 3.  Let f, f₁, f₂, … belong to L¹_loc(ℝⁿ), and suppose that fⱼ → f pointwise almost everywhere. If there is an element g ∈ L¹_loc(ℝⁿ) such that |fⱼ| ≤ g, then f̃ⱼ → f̃ in 𝒟′.

Proof.  The question is whether

(1)  ∫ fⱼφ → ∫ fφ

for all test functions φ. We have fⱼφ ∈ L¹(K) if K is the support of φ. Furthermore, |fⱼφ| ≤ g|φ| and (fⱼφ)(x) → (fφ)(x) almost everywhere. Hence by the Lebesgue Dominated Convergence Theorem (Section 8.6, page 406), Equation (1) is valid.   ∎
260 Chapter 5 Distributions

Theorem 4.  Let [fⱼ] be a sequence of nonnegative functions in L¹_loc(ℝⁿ) such that ∫ fⱼ = 1 for each j and such that

lim_{j→∞} ∫_{|x|≥r} fⱼ(x) dx = 0

for all positive r. Then f̃ⱼ → δ (the Dirac distribution).

Proof.  Let φ ∈ 𝒟 and put ψ = φ − φ(0). Let ε > 0, and select r > 0 so that |ψ(x)| < ε when |x| < r. Then

|∫ fⱼφ − φ(0)| = |∫ fⱼ[φ − φ(0)]| = |∫ fⱼψ| ≤ ∫ |fⱼψ| ≤ ∫_{|x|<r} |fⱼψ| + ∫_{|x|≥r} |fⱼψ|

Taking the limit as j → ∞, we obtain

|lim_j f̃ⱼ(φ) − δ(φ)| ≤ ε

Since ε was arbitrary, lim_j f̃ⱼ(φ) = δ(φ).   ∎
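For instance (our illustration), with fⱼ(x) = j/[π(1 + j²x²)], the family appearing in Problem 6(a) below, the pairing with a smooth function approaches its value at 0:

```python
import numpy as np

phi = lambda x: np.exp(-x**2)   # smooth; not compactly supported, but adequate here

x = np.linspace(-10, 10, 400001)
for j in (1, 10, 100, 1000):
    fj = j / (np.pi * (1 + (j * x)**2))     # f_j >= 0 with total integral 1
    print(j, np.trapz(fj * phi(x), x))      # approaches phi(0) = 1
```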


Problems 5.3

1. What is the distributional derivative of the Weierstrass nondifferentiable function men-


tioned in this section?
2. Let {Te : (J E A} be a set of distributions indexed with a real parameter (J; i.e. A C JR.
Make a suitable definition for lime->r Te.
3. (Continuation) If </> E 1:>, if n = 1, and if (J E JR, let </>e(x) = </>(x - (J). If T
is a distribution, let Te be the distribution defined by Te(</» = T(</>e). Prove that
lime->o (J-l(Te - T) = aT.
4. Let [~j] be a sequence of distinct points in JRn, and let OJ be the corresponding Dirac
distributions. Under what conditions does L:;:l
CjOj represent a distribution? (Here
Cj E IR.) A necessary and sufficient condition on [Ci] would be ideal.

5. Use Theorem 4 to prove that if f is a nonneg~ive element of Ltoc(JRn) such that ff =1


and if fj(x) =jnf(jx) for j = 1,2, ... , then fJ ---t o.
6. Do these sequences have the properties described in Theorem 4?
(a) fj(x) = j/[7r(1 + j2X2)]

(b) fj(x) =j7r- l / 2 exp(-px 2 )


7. Let the real line be partitioned by points -00 = xo < Xl < ... < Xn+l = 00, and suppose
that f is a piecewise continuously differentiable function, with breaks at Xl, ... ,X n · Prove
that the distributional derivative of f is (f/)~ + L:~=1 CiOXi' where Ci is the magnitude of
the jump in f at Xi; i.e., Ci = f(Xi + 0) - f(Xi - 0). Notice that this problem emphasizes
Section 5.4 Multiplication of Distributions by Functions 261

the difference between (f')~ and (f~)'. The prime symbol has different meanings in
different contexts.

5.4 Multiplication of Distributions by Functions

Before getting to the main topic of this section, let us record some results from
multivariate algebra and calculus.
Recall the definition of the classical binomial coefficients:

(1)  (m choose k) = m!/[k!(m − k)!]   if 0 ≤ k ≤ m, and 0 otherwise

These are the coefficients that make the Binomial Theorem true:

(2)  (x + y)^m = Σ_{k=0}^{m} (m choose k) x^k y^{m−k}

The multivariate version of this theorem is presented below.

The multivariate version of this theorem is presented below.


Definitions.  For two multi-indices α and β in ℤ₊ⁿ, we write β ≤ α if βᵢ ≤ αᵢ for each i = 1, 2, …, n. We denote by α + β the multi-index having components

(α + β)ᵢ = αᵢ + βᵢ   (1 ≤ i ≤ n)

If β ≤ α, then α − β is the multi-index whose components are αᵢ − βᵢ. Finally, if β ≤ α, we define

(3)  (α choose β) = (α₁ choose β₁)(α₂ choose β₂) ⋯ (αₙ choose βₙ)

The function x ↦ x^α is a monomial. For n = 3, here are seven typical monomials:

1

These are the building blocks for polynomials. The degree of a monomial x^α is defined to be |α|. Thus, in the examples, the degrees are 10, 9, 3, 0, 1, 1, and 1. A polynomial in n variables is a function
p(x) = Σ_α c_α x^α

in which the sum is finite, the c_α are real numbers, and α ∈ ℤ₊ⁿ. The degree of p is

max{|α| : c_α ≠ 0}

If all c_α are 0, then p(x) = 0, and we assign the degree −∞ in this case. A polynomial of degree 0 is a constant function. Here are some examples, again with n = 3:
PI (x) = 3 + 2XI - 7X~X3 + 2XIX~X~
P2(X) = v2XIX3 - 7l"X~X~X3
These have degrees 12 and 9, respectively.
The completely general polynomial of degree at most k in n variables can be written as

x ↦ Σ_{|α|≤k} c_α x^α

This sum has (k+n choose n) terms, as established later in Theorem 2.


We have seen in Section 5.1 that multi-indices are also useful in defining
differential operators. If we set D = (0.° , 0. a ,... , 0.a ). Then in a natural
UXI UX2 UX n
way we define

= IT ax. i = -::--"o-;-:-"""'o"-;:----=-
n aOi alol
DO -0-
ax lax 2 .•. axon
i=l' 1 2 n

A further definition is
n
a! = al!a2! ... an! = II ail
i=l

The n-dimensional binomial coefficients are then expressible in the form

if 0 ~ f3 ~ a

otherwise

Multivariate Binomial Theorem.  For all x and y in ℝⁿ and all α in ℤ₊ⁿ,

(x + y)^α = Σ_{0≤β≤α} (α choose β) x^β y^{α−β}

Proof.

(x + y)^α = Π_{i=1}^{n} (xᵢ + yᵢ)^{αᵢ} = Π_{i=1}^{n} Σ_{βᵢ=0}^{αᵢ} (αᵢ choose βᵢ) xᵢ^{βᵢ} yᵢ^{αᵢ−βᵢ}
 = Σ_{β₁=0}^{α₁} ⋯ Σ_{βₙ=0}^{αₙ} Π_{i=1}^{n} (αᵢ choose βᵢ) xᵢ^{βᵢ} yᵢ^{αᵢ−βᵢ} = Σ_{0≤β≤α} Π_{i=1}^{n} (αᵢ choose βᵢ) Π_{j=1}^{n} xⱼ^{βⱼ} yⱼ^{αⱼ−βⱼ}   ∎


We will usually abbreviate the inner product (x, y) of two vectors x, Y E JRn
by the simpler notation xy.

Multinomial Theorem.  Let x, y ∈ ℝⁿ and m ∈ ℕ. Then

(xy)^m = Σ_{|α|=m} (m!/α!) x^α y^α

Proof.  It suffices to consider only the special case y = (1, 1, …, 1), because

xy = ⟨(x₁y₁, …, xₙyₙ), (1, …, 1)⟩

For this special case we proceed by induction on n. The case n = 1 is trivially true, and the case n = 2 is the usual binomial formula:

(x₁ + x₂)^m = Σ_{α₁+α₂=m} (m!/(α₁!α₂!)) x₁^{α₁} x₂^{α₂} = Σ_{j=0}^{m} (m!/(j!(m − j)!)) x₁^{j} x₂^{m−j}

Suppose that the multinomial formula is true for a particular value of n. The proof for the next case goes as follows. Let x = (x₁, …, xₙ), w = (x₁, …, x_{n+1}), α = (α₁, …, αₙ), and β = (α₁, …, α_{n+1}). Then

(x₁ + ⋯ + x_{n+1})^m = [(x₁ + ⋯ + xₙ) + x_{n+1}]^m
 = Σ_{j=0}^{m} (m choose j) ( Σ_{|α|=j} (j!/α!) x^α ) x_{n+1}^{m−j}
 = Σ_{j=0}^{m} Σ_{|α|=j} (m!/(j!(m − j)!)) (j!/α!) x^α x_{n+1}^{m−j}
 = Σ_{|β|=m} (m!/((m − j)! α!)) w^β = Σ_{|β|=m} (m!/β!) w^β

In this calculation, we let β = (α₁, …, αₙ, m − j), where |α| = j.   ∎

The linear space of all polynomials of degree at most m in n real variables is denoted by Π_m(ℝⁿ). Thus each element of this space can be written as

p(x) = Σ_{|α|≤m} c_α x^α

Consequently, the set of monomials

{x ↦ x^α : |α| ≤ m}

spans Π_m(ℝⁿ). Is this set in fact a basis?
264 Chapter 5 Distributions

Theorem 1.  The set of monomials x ↦ x^α on ℝⁿ is linearly independent.

Proof.  If n = 1, the monomials are the elementary functions x₁ ↦ x₁^j for j = 0, 1, 2, …. They form a linearly independent set, because a nontrivial linear combination Σ_{j=0}^{m} cⱼx₁^j cannot vanish as a function. (Indeed, it can have at most m zeros.)
Suppose now that our assertion has been proved for dimension n − 1. Let x = (x₁, …, xₙ) and α = (α₁, …, αₙ). Suppose that Σ_{α∈J} c_α x^α = 0, where the sum is over a finite set J ⊂ ℤ₊ⁿ. Put

J_k = {α ∈ J : α₁ = k}

Then for some m, J = J₀ ∪ ⋯ ∪ J_m, and we can write

Σ_{k=0}^{m} x₁^k ( Σ_{α∈J_k} c_α x₂^{α₂} ⋯ xₙ^{αₙ} ) = 0

By the one-variable case, we infer that for k = 0, …, m,

Σ_{α∈J_k} c_α x₂^{α₂} ⋯ xₙ^{αₙ} = 0

Note that as α runs over J_k, the multi-indices (α₂, …, αₙ) are all distinct. By the induction hypothesis, we then infer that for all α ∈ J_k, c_α = 0. Since k runs from 0 to m, all c_α are 0.   ∎
We want to calculate the dimension of Π_m(ℝⁿ). The following lemma is needed before this can be done. Its proof is left as a problem.

Lemma 1.  For n = 1, 2, 3, … and m = 0, 1, 2, … we have

Σ_{k=0}^{m} (k+n choose n) = (m+n+1 choose n+1)
Theorem 2.  The dimension of Π_m(ℝⁿ) is (m+n choose n).

Proof.  The preceding theorem asserts that a basis for Π_m(ℝⁿ) is {x ↦ x^α : |α| ≤ m}. Here x ∈ ℝⁿ. Using # to denote the number of elements in a set, we have only to prove

#{α ∈ ℤ₊ⁿ : |α| ≤ m} = (m+n choose n)

We use induction on n. For n = 1, the formula is correct, since

#{α ∈ ℤ₊ : α ≤ m} = m + 1 = (m+1 choose 1)

Assume that the formula is correct for a particular n. For the next case we write

#{α ∈ ℤ₊^{n+1} : |α| ≤ m} = # ⋃_{k=0}^{m} {α ∈ ℤ₊^{n+1} : α_{n+1} = k, Σ_{i=1}^{n} αᵢ ≤ m − k}
 = Σ_{k=0}^{m} #{α ∈ ℤ₊ⁿ : |α| ≤ m − k} = Σ_{k=0}^{m} (m−k+n choose n)
 = Σ_{k=0}^{m} (k+n choose n) = (m+n+1 choose n+1)

In the last step, we applied Lemma 1.   ∎
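A short computational check of Theorem 2 (our illustration): count the multi-indices of order at most m in n variables and compare with the binomial coefficient.

```python
from itertools import product
from math import comb

def dim_poly_space(n, m):
    """Number of multi-indices alpha in Z_+^n with |alpha| <= m."""
    return sum(1 for alpha in product(range(m + 1), repeat=n) if sum(alpha) <= m)

for n in (1, 2, 3, 4):
    for m in (0, 1, 2, 5):
        assert dim_poly_space(n, m) == comb(m + n, n)
print("dim Pi_m(R^n) = C(m+n, n) verified for the cases tested")
```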

Theorem 3.  (The Leibniz Formula)  If φ and ψ are test functions, then for any multi-index α we have

(4)  D^α(φψ) = Σ_{β≤α} (α choose β) D^β φ · D^{α−β} ψ
Proof. (after Horvath) We use induction on 101. If 101 = 0, then 0 =
(0,0, ... ,0), and both sides of Equation (4) reduce to ¢1j;.
Now suppose that (4) has been established for all multi-indices 0 such that
101 ~ m. Let, be a multi-index of order m + 1. By renumbering the variables
if necessary we can assume that ,1 ~ 1. Let 0 = (rl - 1,,2,,3, ... "n). Then
D"'I = Dl DQ;, where Dl denotes a/ aXl' Since 101 ~ m, the induction hypothesis
applies to DQ;, and hence

D"'I(¢1j;) = DIDQ;(¢1j;) = Dl L e)Df3¢. DQ;-f31j;


f3~Q;
(5)
= L (;) [D Df3¢· DQ;-f31j; + Df3¢· D DQ;-f31j;]
1 1
f3~Q;

Now we set f3 = (f31,f32,oo.,f3n) and f3' = (f31 + 1,f32,oo.,f3n). Observe that


f3~ 0 if and only if f3' ~ ,. Hence the first part of the sum in Equation (5) can
be written as

(6)
266 Chapter 5 Distributions

The second part of the sum in Equation (5) can be written as

(7)

Now invoke the easily proved identity

Adding these two parts of the sum in Equation (5), we obtain


Since 𝒟′ is a vector space, a distribution can be multiplied by a constant to produce another distribution. Multiplication of a distribution T by a function f ∈ C^∞(ℝⁿ) can also be defined:

(f·T)(φ) = T(fφ)

Notice that if f is a constant function, say f(x) = c, then f·T, as just defined, agrees with cT. In order to verify that f·T is a distribution, let φⱼ ⇝ 0 in 𝒟. Then there is a single compact set K containing the supports of all φⱼ, and D^α φⱼ → 0 uniformly on K for all multi-indices. If f ∈ C^∞(ℝⁿ) then by Leibniz's formula,

D^α(fφⱼ) = Σ_{β≤α} (α choose β) D^β f · D^{α−β} φⱼ → 0

This proves that fφⱼ ⇝ 0 in 𝒟. Since T is continuous, T(fφⱼ) → 0; that is, (f·T)(φⱼ) → 0. Hence f·T is continuous. Its linearity is obvious.


Here is a theorem from elementary calculus, extended to products of func-
tions and distributions.

Theorem 4. Let D be a simple partial derivative, say D = 88 ,


Xi
and let 8 be the corresponding distribution derivative. IfT E 1)' and
f E COO(JRn ), then

aUT) = D f . T +f . 8T
Section 5.4 Multiplication of Distributions by Functions 267

Proof·

(Df· T + f· aT)(</» = T(Df· </» + aT(f· </» = T(Df· </» - T(D(f</»)


= T(Df· </» - T(Df· </> + f· D</»
= -T(f· D</» = -(fT)(D</» = (a(fT))(</» •

Theorem 5.  Let n = 1, let T be a distribution, and let u be an element of C^∞(ℝ). If ∂T + uT = f̃ for some f in C(ℝ), then T = g̃ for some g in C¹(ℝ), and g′ + ug = f.

Proof.  If u = 0, then ∂T = f̃. Write f = h′, where

h(x) = ∫₀^x f(y) dy

Then h ∈ C¹(ℝ). From the equation

∂(T − h̃) = ∂T − f̃ = 0

we conclude that T − h̃ = c̃ for some constant c. (See Theorem 3 in Section 5.2, page 256.) Hence T = h̃ + c̃.
If u is not zero, let v = exp(∫ u dx). Then v′ = vu and v ∈ C^∞(ℝ). Then vT is well-defined, and by Theorem 4,

∂(vT) = v′T + v∂T = v(uT + ∂T) = v·f̃ = (vf)~

By the first part of the proof, we have vT = g̃ for some g ∈ C¹(ℝ). Hence T = (g/v)~. It is easily verified that (g/v)′ + u(g/v) = f.   ∎

Theorem 6.  If φ ∈ 𝒟, then

∫_{ℝⁿ} φ(x) dx = lim_{h↓0} hⁿ Σ_{α∈ℤⁿ} φ(hα)

Proof.  The right side is just the limit of Riemann sums for the integral. In the case n = 2, we set up a lattice of points in ℝ². These points are of the form (ih, jh) = h(i, j) = hα, where α runs over the set of all multi-integers, having positive or negative entries. Each square created by four adjacent lattice points has area h².   ∎

Problems 5.4
1. Prove that if vE coo(JRn) and if f E Ltoc(JRn ), then ;;J = vi
2. For integers n and m, (,~) = (n.:'m)' Is a similar result true for multi-indices?
3. Prove that if T j -t T in 1)1 and if f E Coo (JR n ), then fTj -t fT.
4. Let 8 be a test function such that 8(0) f. O. Prove that every test function is the sum of
a multiple of 8 and a test function that vanishes at O.
5. (Continuation) Prove that if f E Ck(JR) and f(O) = 0, then f(x)/x, when defined appro-
priately at 0, is in Ck-1(JR).
268 Chapter 5 Distributions

6. (Continuation) Let n = 1 and put 'Ij;(x) = x. Prove that a distribution T that satisfies
'lj;T = 0 must be a scalar multiple of the Dirac distribution.
7. For fixed 'Ij; in COO(JR n ) there is a multiplication operator M", defined on 1)1 by the
equation M",T = 'lj;T. Prove that M", is linear and continuous.
8. Prove that the product of a COO-function and a regular distribution is a regular distri-
bution.
9. Prove that if hi = f and h E C 1 (JR), then ot" = J
10. Let </> E Coo(JR) and let H be the Heaviside function. Compute o(</>H).
11. Prove the Leibniz formula for the product of a COO-function and a distribution.
12. Define addition of multi-indices a and {3 by the formula (0 + (3)i = 0i + {3i for 1 :::;; i :::;; n.
If {3 :::;; 0, we can define subtraction of {3 from a by (a - (3)i = 0i - {3i' Define also
a! = 01!02!'" on!. Prove that

a a!
if {3:::;; a
({3) = (3!(0 - (3)!

13. (Continuation) Express (1 + Ixl 2)m as a linear combination of "monomials" x"'. Here
x E JRn, mEN, a E Nn.
14. Prove that for any multi-index 0,

D"'(l + Ixl2)m = L c~x~(1 + IxI 2 r- I"'1


I~I";;I"'I

15. Verify the formal Taylor's expansion

f(x + h) = "'" ~ h'" D'" f(x)


La!

16. The binomial coefficients are often displayed in "Pascal's triangle":

1
1 1
1 2 1
1 3 3 1
1 4 6 4 1

Each entry in Pascal's triangle (other than the l's) is the sum of the two elements
appearing above it to the right and left. Prove that this statement is correct, Le., that

17. Prove that Σ_{k=0}^{n} (n choose k) = 2ⁿ. Is a similar result true for the sum Σ_{|β|≤|α|} (α choose β)?
18. Prove that the number of multi-indices {3 that satisfy the inequality 0 :::;; {3 :::;; a is
(1 + (1)'" (1 + on).
19. Find a single general proof that can establish the univariate and multivariate cases of
the Binomial Theorem and the Leibniz Formula. This proof would exploit the similarity
between the formulas

20. Prove Lemma 1.


Section 5.5 Convolutions 269

5.5 Convolutions

The convolution of two functions f and φ on ℝⁿ is a function f * φ whose defining equation is

(1)  (f * φ)(x) = ∫_{ℝⁿ} f(y)φ(x − y) dy

The integral will certainly exist if φ ∈ 𝒟 and if f ∈ L¹_loc(ℝⁿ), because for each x, the integration takes place over a compact subset of ℝⁿ. With a change of variable in the integral, y = x − z, one proves that

(f * φ)(x) = ∫_{ℝⁿ} f(x − z)φ(z) dz = (φ * f)(x)

In taking the convolution of two functions, one can expect that some favorable properties of one factor will be inherited by the convolution function. This vague concept will be illustrated now in several ways. Suppose that f is merely integrable, while φ is a test function. In Equation (1), suppose that n = 1, and that we wish to differentiate f * φ (with respect to x, of course). On the right side of the equation, x appears only in the function φ, and consequently

(f * φ)′(x) = ∫_{−∞}^{∞} f(y)φ′(x − y) dy

The differentiability of the factor φ is inherited by the convolution product f * φ. This phenomenon persists with higher derivatives and with many variables.
It follows from what has already been said that if φ is a polynomial of degree at most k, then so is f * φ. This is because the (k+1)st derivative of f * φ will be zero. Similarly, if φ is a periodic function, then so is f * φ.
We shall see that convolutions are useful in approximating functions by smooth functions. Here the "mollifiers" of Section 5.1 play a role. Let φ be a mollifier; that is, φ ∈ 𝒟, φ ≥ 0, ∫φ = 1, and φ(x) = 0 when |x| ≥ 1. Define φⱼ(x) = jⁿφ(jx). It is easy to verify that ∫φⱼ = 1. (In this discussion j ranges over the positive integers.) Then

f(x) − (f * φⱼ)(x) = f(x) − ∫ f(x − z)φⱼ(z) dz
 = ∫ f(x)φⱼ(z) dz − ∫ f(x − z)φⱼ(z) dz
 = ∫ [f(x) − f(x − z)]φⱼ(z) dz

Since φ(x) vanishes outside the unit ball in ℝⁿ, φⱼ(x) vanishes outside the ball of radius 1/j, as is easily verified. Hence in the equation above the only values of z that have any effect are those for which |z| < 1/j. If f is uniformly continuous, the calculation shows that f * φⱼ(x) is close to f(x), and we have therefore approximated f by the smooth function f * φⱼ. Variations on this idea will appear from time to time.
270 Chapter 5 Distributions

Using special linear operators B and E_x defined by

(2)  (E_x φ)(y) = φ(y − x)
(3)  (Bφ)(y) = φ(−y)

we can write Equation (1) in the form

(4)  (f * φ)(x) = f̃(E_x Bφ)

For f ∈ L¹_loc(ℝⁿ) and φ ∈ 𝒟 we have

(5)  (E_x f)~(φ) = ∫ E_x f · φ = ∫ f(y − x)φ(y) dy = ∫ f(z)φ(z + x) dz = f̃(E_{−x}φ)

Equations (4) and (5) suggest the following definition.

Definition.  If T is a distribution, E_x T is defined to be T ∘ E_{−x}. If φ ∈ 𝒟, then the convolution T * φ is defined by (T * φ)(x) = T(E_x Bφ).

Lemma 1. For T E 2>' and ¢ E 2>,


(6) Ex(T * ¢) = (ExT) * ¢ = T * Ex¢
Proof. Straightforward calculation, using some results in Problem I, gives us
[Ex(T * ¢)](y) = (T * ¢)(y - x) = T(Ey_xB¢)
[(ExT) * ¢](y) = (ExT)(EyB¢) = T(E_xEyB¢) = T(Ey_xB¢)
[T * Ex¢](Y) = T(EyBEx¢) = T(EyE_xB¢) = T(Ey_xB¢) •
Lemma 2. IfT is a distribution and if ¢j -# ¢ in 2>, then T * ¢i -+
T * ¢ pointwise.

Proof. By linearity (see Problem 3), it suffices to consider the case when ¢ = o.
- # 0 in 2>, then for all x,
If ¢j

(T * ¢j)(x) = T(ExB¢j) -+ 0
by the continuity of B, Ex, and T (Problem 8).

Lemma 3. Let [Xj] be a sequence of points in JRn converging to x.
For each ¢ E 2>,
(7)

Proof. If K I = {x, Xl, X2, .•. } and if K 2 is the support of ¢, then (as is easily
verified) the supports of E xj ¢ are contained in the compact set

K I +K2 ={u+V:UEKI , VEK2 }

Now we observe that


(8) (Exj¢)(Y) -+ (Ex¢)(Y) uniformly for y E KI + K2
Section 5.5 Convolutions 271

Indeed, for a given € > 0 there is a 8 > 0 such that


lu - vi < 8 ==:;.I¢(u) - ¢(v)1 < €

(This is uniform continuity of the continuous function ¢ on a compact set.)


Hence if IXj - xl < 8, then I¢(y - Xj) - ¢(y - x)1 < e. It now follows that

(DO EXj¢)(Y) ---+ (DO Ex¢)(Y) uniformly for y E KI + K2


because DO Ex.¢
)
= ExDo¢, and (8) can be applied to DO¢.
J •
Lemma 4. Let e = (1,0, ... ,0),0 < It I < 1, and Ft = C l (Eo - Ete ).
Then for each test function ¢, Ft ¢ --# 88¢ as t ---+ O. (This convergence
Xl
is in the topology of 'D .)

Proof. Since It I < 1, there is a single compact set K containing the supports of
Ft ¢ and 88¢ . By the mean value theorem (used twice) we have (for 0 < 0,0' < 1)
Xl

I:~ (X) - (Ft¢)(X) =I I:~ (X) - Cl[¢(x) - ¢(x - te)ll

= I~(x) - 8¢ (X - Ote) I
8XI 8XI

The norm used here is the supremum norm on K. Our inequality shows that
as t ---+ 0, (Ft¢)(x) ---+ (:~) (x) uniformly in X on K. Since ¢ can be any test
function, we can apply our conclusion to DO¢, inferring that FtDO¢ converges
uniformly to -88 DO¢ on K. Since DO commutes with Ft (Problem 9) and
Xl
with other derivatives, we conclude that DO Ft ¢ converges uniformly on K to
DO 88¢. This proves that the convergence of Ft ¢ is in accordance with the
Xl
notion of convergence adopted in 'D. •

Theorem.  If T is a distribution and if φ is a test function, then for each multi-index α,

(9)  D^α(T * φ) = (∂^α T) * φ = T * D^α φ

Proof.  From Equations (3) and (2) we infer that

D^α B = (−1)^{|α|} B D^α      and      D^α E_x = E_x D^α

Hence

(∂^α T * φ)(x) = (∂^α T)(E_x Bφ) = (−1)^{|α|} T(D^α E_x Bφ)
 = (−1)^{|α|} T(E_x D^α Bφ) = T(E_x B D^α φ) = (T * D^α φ)(x)
This proves one part of Equation (9). For the other part, consider the special case
8
Q = (0, ... ,0,1,0, ... ,0). Thus D = -8 . Let e = (0, ... ,0,1,0, ... ,0) E JRn,
Xi
and Ft = t-1(Eo - Ete). Since

lim ¢(y) - ¢(y - te) = lim [C1(Eo - E te )¢] (y)


88Xi ¢(y) = t~O t t~O

= lim(Ft¢)(y)
t~O

we have D¢ = limFt ¢, by Lemma 4. Using Lemma 1, we have


t~O

By Lemma 2, we can let t -+ ° to obtain

D(T * ¢) = T * D¢
By iteration of this basic result we obtain, for any multi-index Q,


Corollary. JfT E 1)' and ¢ E 1), then T * ¢ E coo(JRn).
Proof. We have to prove that DO(T*¢) E C(JR n) for all multi-indices Q. Put
'¢ = DO¢. Then by the theorem,

To see that T * '¢ is continuous, let [Xj] be a sequence in JRn tending to x. By


Lemma 3,

Problems 5.5
1. Prove that
(a) ExEy == Ex+y
(b) BEx == E-xB
(c) ¢J* t/J == 1/1* e/>
(d) liyExB == lixEy
2. Prove that Ex : 1)1 -t 1)1 is linear, continuous, injective, and surjective.
3. Prove that for fixed T E 1)1 the map ¢ t-t T * e/> is linear from 1) to COO(JR n ).
4. Prove that if T E 1)1 and ¢ E 1), then

T(e/» == Ii(T * B¢) (Ii == Dirac distribution)


Section 5.6 Differential Operators 273

5. Fixing a distribution T, define the convolution operator GT by GT¢> = T * ¢>. Show that
tixGT = T ExB.
6. Prove that the vector sum of two compact sets in ]Rn is compact. Show by example that
the vector sum of two closed sets need not be closed. Show that the vector sum of a
compact set and a closed set is closed.
7. For 27r-periodic functions, define (f * g) (x) = J02" f(y)g(x - y) dy. Compute the convo-
lution of f(x) = sin x and g(x) = cosx.
8. Prove that B and Ex are continuous linear maps of 1) into 1). Are they injective? Are
they surjective?
9. Prove that DQEx = ExDQ.
10. What is ti * ¢>?
11. Which of these equations is (or are) valid?
(a) B(Ex(¢>(Y))) = ¢>(x - y)
(b) (B(Ex¢»)(Y) = ¢>(x - y)
(c) Ex(B(¢>(y))) = ¢>(-x - y)
(d) Ex(B(¢>(y))) = ¢>(x - y)

5.6 Differential Operators

Definition. A linear differential operator with constant coefficients is any


finite sum of terms cCiDCi. Such an operator has the representation

The constants CCi may be complex numbers. Clearly, A can be applied to any
function in cm(lRn).

Definition. A distribution T is called a fundamental solution of the oper-


ator L cCiD Ci if L cCiaCiT is the Dirac distribution.

Example 1.  What are the fundamental solutions of the operator D in the case n = 1? (D = d/dx.) We seek all the distributions T that satisfy ∂T = δ. We saw in Example 1 of Section 5.2 (page 254) that ∂H̃ = δ, where H is the Heaviside function. Thus H̃ is one of the fundamental solutions. Since the distributions sought are exactly those for which ∂T = ∂H̃, we see by Theorem 3 in Section 5.2 (page 256) that T = H̃ + c̃ for some constant c.

Theorem 1. The Malgrange-Ehrenpreis Theorem. Every


operator L cCiD Ci has a fundamental solution.

For the proof of this basic theorem, consult [Ho] page 189, or [Ru1] page 195.
The next theorem reveals the importance of fundamental solutions in the study
of partial differential equations.
274 Chapter 5 Distributions

Theorem 2.  Let A be a linear differential operator with constant coefficients, and let T be a distribution that is a fundamental solution of A. Then for each test function φ, A(T * φ) = φ.

Proof.  Let A = Σ c_α D^α. Then Σ c_α ∂^α T = δ. The basic formula (the theorem of Section 5.5, page 271) states that

D^α(T * φ) = ∂^α T * φ

From this we conclude that

A(T * φ) = Σ c_α D^α(T * φ) = Σ c_α (∂^α T * φ) = (Σ c_α ∂^α T) * φ = δ * φ = φ

In the last step we use the calculation

(δ * φ)(x) = δ(E_x Bφ) = (E_x Bφ)(0) = (Bφ)(0 − x) = φ(x)   ∎

Example 2.  We use the theory of distributions to find a solution of the differential equation du/dx = φ, where φ is a test function. By Example 1, one fundamental solution is the distribution H̃. By the preceding theorem, H̃ * φ will solve the differential equation. We have, with a simple change of variable,

u(x) = (H̃ * φ)(x) = ∫_{−∞}^{∞} H(y)φ(x − y) dy = ∫_{−∞}^{x} φ(z) dz   ∎

Example 3.  Let us search for a solution of the differential equation

u′ + au = φ

using distribution theory. First, we try to discover a fundamental solution, i.e., a distribution T such that ∂T + aT = δ. If T is such a distribution and if v(x) = e^{ax}, then

∂(v·T) = Dv·T + v·∂T = av·T + v·(δ − aT) = v·δ = δ

Consequently, by Example 1,

v·T = H̃ + c̃      and      T = (1/v)(H̃ + c̃)

Thus T is a regular distribution f̃, and since c is arbitrary, we use c = 0, arriving at

f(x) = e^{−ax} H(x)

A solution to the differential equation is then given by

u(x) = (f * φ)(x) = ∫_{−∞}^{∞} e^{−ay} H(y)φ(x − y) dy = ∫_{0}^{∞} e^{−ay} φ(x − y) dy

This formula produces a solution if φ is bounded and of class C¹.   ∎
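A numerical check of this formula (ours; a = 1 and a Gaussian in place of a compactly supported test function, which is adequate for the illustration):

```python
import numpy as np

a   = 1.0
phi = lambda x: np.exp(-x**2)               # smooth, rapidly decaying stand-in

def u(x, ymax=40.0, m=200001):
    """u(x) = integral_0^infinity exp(-a*y) * phi(x - y) dy, truncated at ymax."""
    y = np.linspace(0.0, ymax, m)
    return np.trapz(np.exp(-a * y) * phi(x - y), y)

h = 1e-3
for x in np.linspace(-2, 2, 9):
    du = (u(x + h) - u(x - h)) / (2 * h)    # centered difference for u'
    print(x, du + a * u(x) - phi(x))        # residual u' + a*u - phi, close to 0
```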

The Laplacian.  In the following paragraphs, a fundamental solution to the Laplacian operator will be derived. This operator, denoted by Δ or by ∇², is given by

Δ = ∂²/∂x₁² + ⋯ + ∂²/∂xₙ²

At first, some elementary calculations need to be recorded. The notation is x = (x₁, …, xₙ) and |x| = (x₁² + ⋯ + xₙ²)^{1/2}.

Lemma 1.  For x ≠ 0, ∂|x|/∂xⱼ = xⱼ|x|^{−1}.

Proof.

∂|x|/∂xⱼ = ∂/∂xⱼ (x₁² + ⋯ + xₙ²)^{1/2} = ½(x₁² + ⋯ + xₙ²)^{−1/2}(2xⱼ) = xⱼ|x|^{−1}   ∎

Lemma 2.  For x ≠ 0 and g ∈ C¹(0, ∞), ∂g(|x|)/∂xⱼ = g′(|x|) xⱼ|x|^{−1}.

Proof.  This follows from Lemma 1 and the chain rule.   ∎

Lemma 3.  For x ≠ 0, and g ∈ C²(0, ∞),

Δg(|x|) = g″(|x|) + (n − 1)|x|^{−1} g′(|x|)

Proof.

Δg(|x|) = Σ_{j=1}^{n} ∂/∂xⱼ [g′(|x|) xⱼ|x|^{−1}]
 = Σ_{j=1}^{n} [g″(|x|) xⱼ²|x|^{−2} + g′(|x|)|x|^{−1} − g′(|x|) xⱼ²|x|^{−3}]
 = g″(|x|)|x|²|x|^{−2} + g′(|x|)(n|x|^{−1} − |x|²|x|^{−3})
 = g″(|x|) + (n − 1)|x|^{−1} g′(|x|)   ∎
276 Chapter 5 Distributions

For reasons that become clear later, we require a function g (not a constant) such that Δg(|x|) = 0 throughout ℝⁿ, with the exception of the singular point x = 0. By Lemma 3, we see that g must satisfy the following differential equation, in which the notation r = |x| has been introduced:

g″(r) + ((n − 1)/r) g′(r) = 0

This can be written in the form

g″(r)/g′(r) = (1 − n)r^{−1}

and this can be interpreted as

(d/dr) log g′(r) = (1 − n)r^{−1}

From this we infer that

log g′(r) = (1 − n) log r + log c
g′(r) = c r^{1−n}

If n ≥ 3, this last equation gives g(r) = r^{2−n} as the desired solution. Thus we have proved the following result:

Theorem 3.  If n ≥ 3 then Δ|x|^{2−n} = 0 at all points of ℝⁿ except x = 0.

Of course, this theorem can be proved by a direct verification that Ixl 2 - n


satisfies Laplace's equation, except at O. The fact that Laplace's equation is not
satisfied at 0 is of special importance in what follows. Let f(x) = |x|^{2−n}. As usual, f̃ will denote the corresponding distribution.
In accordance with the definition of derivative of a distribution, we have, for any test function φ,

(1)  (Δf̃)(φ) = f̃(Δφ) = ∫ |x|^{2−n} Δφ(x) dx

The integral on the right is improper because of the singular point at 0. It is therefore defined to be

(2)  lim_{ε↓0} ∫_{A_ε} |x|^{2−n} Δφ(x) dx

For sufficiently small ε, the support of φ will be contained in {x : |x| ≤ ε^{−1}}. The integral in (2) can be taken over the set

A_ε = {x : ε ≤ |x| ≤ ε^{−1}}

An appeal will be made to Green's Second Identity, which states that for regions Ω satisfying certain mild hypotheses,

∫_Ω (uΔv − vΔu) = ∫_{∂Ω} (u∇v − v∇u)·N

The three-dimensional version of this can be found in [Hur] page 489, [MT] page 449, [Tay1] page 459, or [Las] page 118. The n-dimensional form can be found in [Fla] page 83. In the formula, N denotes the unit normal vector to the surface ∂Ω. Applying Green's formula to the integral in Equation (2), we notice that Δ|x|^{2−n} = 0 in A_ε. Hence the integral is

(3)

The boundary of A€ is the union of two spheres whose radii are c and c 1 . On
the outer boundary, </> = ~ </> = 0 because the support of </> is interior to A€. The
following computation will also be needed:

The first term on the right in Equation (3) is estimated as follows

Hence when c --+ 0, this term approaches O. The symbol an represents the "area"
of the unit sphere in IRn. As for the other term,

IJ1xl=€
f [</>(x) - </>(0)]~lxI2-n . NI dS::;; (n - 2) f IxI1-nl</>(x) - </>(0)1 dS
J1xl=€

::;; (n - 2)c 1- n max I</>(x) - </>(0)1 f dS


Ixl=€ J1xl=€
= (n - 2)cl-nw(c)ancn-l --+ 0

In this calculation, w(c) is the maximum of I</>(x) - </>(0)1 on the sphere defined
by Ixl = c. Obviously, w(c) --+ 0, because </> is continuous. Thus the integral in
Equation (3) is

Hence this is the value of the integral in Equation (1). We have established, therefore, that Δf̃ = (2 − n)aₙδ. Summarizing, we have the following result.
278 Chapter 5 Distributions

Theorem 4.  A fundamental solution of the Laplacian operator in dimension n ≥ 3 is the distribution corresponding to the function

x ↦ |x|^{2−n} / [(2 − n)aₙ]

where aₙ denotes the area of the unit sphere in ℝⁿ.
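A symbolic spot-check of Theorem 3 (our illustration), for n = 3, that the Laplacian of |x|^{2−n} = |x|^{−1} vanishes away from the origin:

```python
import sympy as sp

x1, x2, x3 = sp.symbols("x1 x2 x3", real=True)
r = sp.sqrt(x1**2 + x2**2 + x3**2)
f = r**(2 - 3)                                    # |x|^(2-n) with n = 3
laplacian = sum(sp.diff(f, v, 2) for v in (x1, x2, x3))
print(sp.simplify(laplacian))                     # prints 0 (valid for x != 0)
```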

Example 4.  Find a fundamental solution of the operator A defined (for n = 1) by the equation

Aφ = φ″ + 2aφ′ + bφ   (φ ∈ 𝒟)

We seek a distribution T such that AT = δ. Let us look for a regular distribution, T = f̃. Using the definition of derivatives of distributions, we have

(Af̃)(φ) = f̃(φ″ − 2aφ′ + bφ) = ∫_{−∞}^{∞} f(x)[φ″(x) − 2aφ′(x) + bφ(x)] dx

Guided by previous examples, we guess that f should have as its support the interval [0, ∞). The integral above then is restricted to the same interval. Using integration by parts, we obtain

fφ′|₀^∞ − ∫₀^∞ f′φ′ − 2a fφ|₀^∞ + 2a ∫₀^∞ f′φ + b ∫₀^∞ fφ
 = −f(0)φ′(0) − f′φ|₀^∞ + ∫₀^∞ f″φ + 2a f(0)φ(0) + ∫₀^∞ (2af′ + bf)φ
 = −f(0)φ′(0) + f′(0)φ(0) + 2a f(0)φ(0) + ∫₀^∞ (f″ + 2af′ + bf)φ

The easiest way to make this last expression simplify to φ(0) is to define f on [0, ∞) in such a way that

(i) f″ + 2af′ + bf = 0
(ii) f(0) = 0
(iii) f′(0) = 1

This is an initial-value problem, which can be solved by writing down the general solution of the equation in (i) and adjusting the coefficients in it to achieve (ii) and (iii). The characteristic equation of the differential equation in (i) is

λ² + 2aλ + b = 0

Its roots are −a ± √(a² − b). Let d = √(a² − b). If d ≠ 0, then the general solution of (i) is

f(x) = c₁ e^{(−a+d)x} + c₂ e^{(−a−d)x}

Upon imposing the conditions (ii) and (iii) we find that

f(x) = { e^{−ax} sinh(dx)/d   x ≥ 0
         0                    x < 0
Section 5.6 Differential Operators 279

The case when d = 0 is left as Problem 14.


A linear differential operator with nonconstant coefficients is typically of the form

(4)  A = Σ_α c_α(x) D^α

In order for this to interact properly with distributions, it is necessary to assume that c_α ∈ C^∞(ℝⁿ). Then AT is defined, when T is a distribution, by

(5)  AT = Σ_α c_α · ∂^α T = Σ_α (−1)^{|α|} c_α · (T ∘ D^α)

Remember that T ∘ D^α is a distribution; multiplication of this distribution by the C^∞-function c_α is well-defined (as in Section 5.5). The result of applying this to a test function φ is therefore

(6)  (AT)(φ) = Σ_α (−1)^{|α|} T(D^α(c_α φ))

Notice that the parentheses in Equation (5) are necessary because c_α T ∘ D^α is ambiguous; it could mean (c_α T) ∘ D^α.
It is useful to define the formal adjoint of the operator A in Equation (4). It is

(7)  A*φ = Σ_α (−1)^{|α|} D^α(c_α φ)

Notice that this definition is in harmony with the definition of adjoint for operators on Hilbert space, for we have

(AT)(φ) = T(A*φ)   (T ∈ 𝒟′, φ ∈ 𝒟)

and this can be written in the notation of linear functionals as

⟨AT, φ⟩ = ⟨T, A*φ⟩   (T ∈ 𝒟′, φ ∈ 𝒟)
Using Example 4 as a model, we can now prove a theorem about funda-
mental solutions of ordinary differential operators (i.e., n = 1).

Theorem 5.  Consider the operator

A = Σ_{j=0}^{m} cⱼ(x) d^j/dx^j

in which cⱼ ∈ C^∞(ℝ) and c_m(x) ≠ 0 for all x. This operator has a fundamental solution that is a regular distribution.

Proof. We find a function! defined on [0,00) such that


m
(i) LCj(x)!(j)(x) =0
j=O

(iii) (0 ~ j ~ m - 2)
280 Chapter 5 Distributions

Such a function exists by the theory of ordinary differential equations. In par-


ticular, an initial-value problem has a unique solution that is defined on any
interval [0, b], provided that the coefficient functions are continuous there and
the leading coefficient does not have a zero in [0, b]. We also extend f to all of
IR by setting f(x) =_0 on the interval (-00,0). With the function f in hand, we
must verify that Af = Ii. This is done as in Example 4. •

Problems 5.6
1. If the coefficients c_α are constants, then the formal adjoint of the operator Σ c_α D^α is
   Σ (-1)^{|α|} c_α D^α. If the former is denoted by A, then the latter is denoted by A*. Prove
   that for any distribution T, AT = T ∘ A*.
2. (Continuation) Prove that the Laplacian

   Δ = Σ_{i=1}^{n} (∂/∂x_i)²

   is self-adjoint; i.e., Δ* = Δ.
3. Solve the equation Y'' + 2Y' + Y = δ + δ' in the distribution sense, using a function of
   the form Y(x) = H(x)f(x).
4. If P is a polynomial in n variables, say P = Σ c_α x^α, and if D is the n-tuple
   (∂/∂x₁, ∂/∂x₂, ..., ∂/∂xₙ), then what should we mean by P(D)?
5. Fix y ∈ ℝⁿ and let f(x) = e^{⟨x,y⟩} for x ∈ ℝⁿ. Prove that P(D)f = P(y)f. Express this
   result in the language of eigenvalues and eigenvectors.
6. If the functions v_α belong to C^∞(ℝⁿ), then a differential operator Σ v_α D^α has a coun-
   terpart Σ v_α ∂^α that acts on distributions. Consider the operator

   and compute its effect on the Dirac distribution on ℝ².
7. Let n = 2 and find a fundamental solution of the operator ∂²/(∂x₁ ∂x₂). (Try an analogue of
   the Heaviside distribution.)
8. What is the null space of the operator in Problem 7, interpreting it as a map on C^∞(ℝ²)?
9. What is the null space of the operator in Problem 7 if we interpret it as a map on 𝒟′?
10. Let n = 1, and find a fundamental solution of the operator d²/dx². Use it to give a solution
    to u'' = φ in the form of an integral.
11. Let n = 1 and f(x) = e^{ik|x|}. Show that a multiple of f̃ is a fundamental solution of the
    operator d²/dx² + k²I. Give an integral that solves the equation u'' + k²u = φ.
12. Let p be a function such that p and 1/p belong to C¹(ℝ). Define f(x) to be ∫₀^x dt/p(t) if
    x > 0 and to be 0 if x ≤ 0. Show that, in the distributional sense, (pf')' = δ.
13. In Examples 2 and 3 find more general solutions by retaining the constant in H + c.
14. Complete Example 4 by obtaining the fundamental solution when d = 0.

5.7 Distributions with Compact Support

In this section we prove a theorem on partitions of unity in the space 𝒟 of
test functions. Then we define the support of a distribution, and study the
distributions whose support is compact. In particular, the convolution S ∗ T of
two distributions can be defined if one of S and T has compact support. Recall
the more fundamental notion of the support of a function f. It means the
closure of the set {x : f(x) ≠ 0}.

Lemma 1. There is a function f E COO(JR) such that 0 ~ f ~ 1,


f(x) = 0 on (-00,0], and f(x) = 1 on [1,00).

Proof. Define

g(x) = { exp[x²/(x² - 1)]   |x| < 1
       { 0                  |x| ≥ 1

and

f(x) = { g(x - 1)   x ≤ 1
       { 1          otherwise

The graphs of f and 9 are shown in Figure 5.3.

Figure 5.3

Lemma 2. If x₀ ∈ ℝⁿ and ρ > r > 0, then there is a test function
φ such that
(i) 0 ≤ φ ≤ 1
(ii) φ(x) = 1 if |x - x₀| ≤ r
(iii) φ(x) = 0 if |x - x₀| ≥ ρ.

Proof. Use the function f from the preceding lemma, and define

φ(x) = 1 - f(a|x - x₀|² - b)

with a = (ρ² - r²)^{-1} and b = r²a. If |x - x₀| ≤ r, then a|x - x₀|² - b ≤ ar² - b = 0,
so φ(x) = 1. If |x - x₀| ≥ ρ, then a|x - x₀|² - b ≥ aρ² - b = a(ρ² - r²) = 1, so
φ(x) = 0.   ∎
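The construction in Lemma 2 is easy to experiment with numerically. The sketch below is not from the text: it builds a smooth step from exp(-1/t) (a standard choice, not necessarily the f of Lemma 1) and spot-checks properties (i)-(iii); NumPy is assumed.

    import numpy as np

    def smooth_step(t):
        """C-infinity step: 0 for t <= 0, 1 for t >= 1, strictly between otherwise."""
        t = np.asarray(t, dtype=float)
        g = np.where(t > 0, np.exp(-1.0 / np.maximum(t, 1e-300)), 0.0)
        h = np.where(1 - t > 0, np.exp(-1.0 / np.maximum(1 - t, 1e-300)), 0.0)
        return g / (g + h)

    def cutoff(x, x0, r, rho):
        """phi(x) = 1 - f(a|x - x0|^2 - b), a = (rho^2 - r^2)^{-1}, b = r^2 a, as in Lemma 2."""
        a = 1.0 / (rho**2 - r**2)
        b = r**2 * a
        return 1.0 - smooth_step(a * np.sum((x - x0)**2, axis=-1) - b)

    x0 = np.zeros(3)
    pts = np.random.uniform(-3, 3, size=(1000, 3))
    vals = cutoff(pts, x0, r=1.0, rho=2.0)
    d = np.linalg.norm(pts - x0, axis=1)
    assert np.all(vals[d <= 1.0] == 1.0) and np.all(vals[d >= 2.0] == 0.0)
    assert np.all((vals >= 0) & (vals <= 1))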
282 Chapter 5 Distributions

Theorem 1. Partitions of Unity. Let 𝒜 be a collection of open
sets in ℝⁿ whose union is denoted by Ω. Then there is a sequence [φᵢ]
in 𝒟 (called a "partition of unity subordinate to 𝒜") such that:
a. 0 ≤ φᵢ ≤ 1 for i = 1, 2, ...
b. For each i there is a Vᵢ ∈ 𝒜 such that supp(φᵢ) ⊂ Vᵢ.
c. For each compact subset K of Ω there is an index m such that

φ₁(x) + φ₂(x) + ··· + φ_m(x) = 1

on a neighborhood of K.

Proof. (Rudin) Let [B(Xi' ri)] denote the sequence of all closed balls in ]Rn
having rational center Xi, rational radius ri, and contained in a member of A.
By the preceding lemma, there exists for each i a test function ψᵢ such that
0 ≤ ψᵢ ≤ 1, ψᵢ(x) = 1 on B(xᵢ, rᵢ/2), and ψᵢ(x) = 0 outside of B(xᵢ, rᵢ). Put
φ₁ = ψ₁ and

(1)   φᵢ = ψᵢ(1 - ψ₁)(1 - ψ₂)···(1 - ψᵢ₋₁)     (i ≥ 2)

It is clear that on the complement of B(xᵢ, rᵢ), we have ψᵢ(x) = 0 and φᵢ(x) = 0.
By induction we now prove that

(2)   φ₁ + φ₂ + ··· + φᵢ = 1 - (1 - ψ₁)(1 - ψ₂)···(1 - ψᵢ)

Equation (2) is obviously correct for i = 1. If it is correct for the index i - 1,
then it is correct for i because

φ₁ + ··· + φᵢ = 1 - [(1 - ψ₁)···(1 - ψᵢ₋₁)] + [(1 - ψ₁)···(1 - ψᵢ₋₁)]ψᵢ
             = 1 - [(1 - ψ₁)···(1 - ψᵢ₋₁)(1 - ψᵢ)]

Since 0 ≤ ψᵢ ≤ 1 for all i, we see from Equation (2) that

Σ_{i=1}^{∞} φᵢ(x) ≤ 1

On the other hand, if x ∈ ⋃_{i=1}^{m} B(xᵢ, rᵢ/2), then ψᵢ(x) = 1 for some i in
{1, ..., m}. Then φ₁(x) + ··· + φ_m(x) = 1 from Equation (2). Since the open
balls B(xᵢ, rᵢ/2) cover Ω, each compact set K in Ω is contained in a finite union
⋃_{i=1}^{m} B(xᵢ, rᵢ/2). This establishes (c).   ∎
Fixing a distribution T, we consider a closed set F in ]Rn having this prop-
erty:

(3) T(</» = 0 for all test functions </> satisfying supp(</» C]Rn" F

Theorem 2. Let supp(T) denote the intersection of all closed sets


having property (3). Then supp(T) is the smallest closed set having
property (3).

Proof. Let F be the family of all closed sets F having property (3). Then

supp(T) = n{F: F E F}
Section 5.7 Distributions with Compact Support 283

Being an intersection of closed sets, supp(T) is itself closed. The only question
is whether it has property (3). To verify this, let φ be a test function such that
supp(φ) ⊂ ℝⁿ ∖ supp(T). It is to be shown that T(φ) = 0. By De Morgan's
law,

ℝⁿ ∖ supp(T) = ⋃{ℝⁿ ∖ F : F ∈ ℱ}

By the preceding theorem, there is a partition of unity [ψᵢ] subordinate to the
family of open sets {ℝⁿ ∖ F : F ∈ ℱ}. Since supp(φ) is compact, there exists
(by Theorem 1) an index m such that

Σ_{i=1}^{m} ψᵢ(x) = 1 on a neighborhood of supp(φ)

Notice that φ = φ Σ_{i=1}^{m} ψᵢ, because if φ(x) = 0, the equation is obviously true,
while if φ(x) ≠ 0, then x ∈ supp(φ) and Σ_{i=1}^{m} ψᵢ(x) = 1. Hence, by the linearity
of T,

(4)   T(φ) = Σ_{i=1}^{m} T(φψᵢ)

Again by Theorem 1, there exists for each i an Fᵢ ∈ ℱ such that

supp(φψᵢ) ⊂ supp(ψᵢ) ⊂ ℝⁿ ∖ Fᵢ

Since Fᵢ ∈ ℱ, Fᵢ has property (3), and we conclude that T(φψᵢ) = 0 for 1 ≤ i ≤
m. By Equation (4), T(φ) = 0.   ∎
Notice that suppO has two different meanings: one for functions on JRn and
another for distributions. This is the conventional practice. By Problem 10, the
two definitions are compatible.
Example 1. The support of the Dirac distribution δ is the set {0}. If φ is
a test function for which supp(φ) ⊂ ℝⁿ ∖ {0}, then clearly δ(φ) = 0.   ∎

Example 2. The support of the Heaviside distribution H̃ is the set [0, ∞).

Definition. The space E is defined to be the space Coo (JR n ) with convergence

defined as follows: cf>j -+ 0 if for each multi-index 0, Dacf>j(x) converges uni-
formly to 0 on every compact set.

Theorem 3. Each distribution having a compact support has an


extension to E that is continuous.

Proof. Let T be a distribution for which supp(T) is compact. By the theo-


rem on partitions of unity, there is a test function '!j1 such that '!j1(x) == 1 on a
neighborhood of supp(T). Define T on E by the equation T(cf» = T(cf>'!j1). This is
meaningful because cf>'!j1 E 1). Now we wish to establish that T is continuous on
E. To this end, let cf>j E E and suppose that cf>j -+ 0, the convergence being as
prescribed in E. All the functions cf>j1/J vanish outside of supp( '!j1), and for each

multi-index α, D^α(φⱼψ) converges uniformly to 0 by the Leibniz formula. Hence
φⱼψ ⇝ 0 in 𝒟. By the continuity of T and the definition of T̄,

T̄(φⱼ) = T(φⱼψ) → 0

Finally, we must prove that T is an extension of T. Let ¢> be any test


function; we want to show that T(¢» = T(¢». Equivalent equations are T(¢>¢) =
T(¢» and T(¢>¢ - ¢» = o. To establish the latter, it suffices to show that
supp( ¢>¢ - ¢» C JR n " supp(T)
(Here we have used Theorem 2.) Since ¢>¢ - ¢> = ¢>. (¢ - 1), it is enough to prove
that
supp( ¢ - 1) C JR n " supp(T)
To this end, let x E supp(¢ - 1). By definition of a support, we can write
x = limxj, where (¢ -1)(xj) "10. Since ¢(Xj) "I 1, we have Xj ¢. N, where N is
an open neighborhood of supp(T) on which ¢ is identically 1. Since Xj E JRn "N,
we have x E JRn "N because the latter is closed. Hence x E JR n " supp(T). •

Theorem 4. Each continuous linear functional on E is an extension


of some distribution having compact support.

Proof. Let L be a continuous linear functional on E. Let T = L 11>, which


denotes the restriction of L to 1>. It is easily seen that T is a distribution. In
order to prove that the support of T is compact, suppose otherwise. Then for
each k there is a test function ¢>k whose support is contained in {x : Ixi > k}
such that T(¢>k) = 1. It follows that ¢>k --t 0 in E, whereas L(¢>k) = 1. In order
to prove that L = T, as in the preceding proof select rj E 1> so that rj(x) = 1
if Ixi :::; j and rj(x) = 0 if Ixi ~ 2j. If ¢> E E, then rj¢> --t ¢> in E. Hence
L(¢» = limL(rj¢» = limT(rj¢» = T(¢»
because supp(T) C supp(rj) for all sufficiently large j.
The preceding two theorems say in effect that the space E' consisting of all

continuous linear functionals on E can be identified with a subset of 1>', viz. the
set of all distributions having compact support.
Recall that the convolution of a test function φ with a function f ∈ L¹_loc(ℝⁿ)
has been defined by

(4)   (f ∗ φ)(x) = ∫_{ℝⁿ} f(y)φ(x - y) dy

The convolution of a distribution T with a test function φ has been defined by

(5)   (T ∗ φ)(x) = T(E_x Bφ)

where (Bφ)(x) = φ(-x) and (E_x φ)(y) = φ(y - x).
Now observe that if T has compact support, then T (or more properly, its
extension T) can operate (as a linear functional) on any element of E. Conse-
quently, in this case, (5) is meaningful not only for ¢> E 1> but also for ¢> E E.
Equation (5) is adopted as the definition of the convolution of a distribution
having compact support with a function in coo(JRn).
Section 5.7 Distributions with Compact Support 285

Theorem 5. If T is a distribution with compact support and if


¢ E E, then T * ¢ E E.
Proof. See [Ru1]' Theorem 6.35, page 159.
Now let Sand T be two distributions, at least one of which has compact

support. We define S * T to be a distribution whose action is given by the
following formula:

(6)   (S ∗ T)(φ) = δ(S ∗ (T ∗ Bφ))     (φ ∈ 𝒟)

Here δ(φ) = φ(0) and (Bφ)(x) = φ(-x). We first verify that (6) is meaningful,
i.e., that each argument is in the domain of the operator that is applied to it.
Obviously, Bφ ∈ 𝒟 and T ∗ Bφ ∈ ℰ by the corollary in Section 5.5, page 272. If
S has compact support, then by the preceding lemma, S ∗ (T ∗ Bφ) ∈ ℰ. Hence
δ can be applied. On the other hand, if T has compact support, then T ∗ Bφ
is an element of ℰ having compact support; in other words, an element of 𝒟.
Then S ∗ (T ∗ Bφ) belongs to ℰ, and again δ can be applied.
It is a fact, which we do not stop to prove, that S ∗ T is a continuous linear
functional on 𝒟; thus it is a distribution. (See [Ru1], page 160.)
Finally, we indicate the source of the definition in Equation (6). If 8 and T
are regular distributions, then they correspond to functions J and 9 in L}oc(JR n).
In that case,

(S ∗ T)(φ) = (f ∗ g)~(φ) = ∫ (f ∗ g)(x)φ(x) dx = ∫∫ f(y)g(x - y)φ(x) dy dx

On the other hand,

δ(f ∗ (g ∗ (Bφ))) = (f ∗ (g ∗ Bφ))(0) = ∫ f(y)(g ∗ Bφ)(-y) dy
                 = ∫∫ f(y)g(z)(Bφ)(-y - z) dz dy
                 = ∫∫ f(y)g(z)φ(y + z) dz dy
                 = ∫∫ f(y)g(x - y)φ(x) dx dy

Problems 5.7
1. Refer to the theorem concerning partitions of unity, page 282, and prove that for each x
there is an index j such that I/>i (x) = 0 for all i > j.
2. Let [xil be a list of all the rational points in IRn. Define T by T(I/» = L::':1 2- i 4>(Xi),
where I/> is any test function. Prove that T is a distribution and supp(T) = IRn.
3. For a distribution T, let ;:1 be the family of all closed sets F such that T( 4» = 0
when I/> I F = O. Let ;:2 be the family of all closed sets F such that T(I/» = 0 when
supp(l/» c IR n "F. Show that ;:1 is generally a proper subset of ;:2.
286 Chapter 5 Distributions

4. Refer to the theorem on partitions of unity and prove that 1j; L~=l ¢i ..... 1j; as j ..... 00,
provided that supp(1j;) C O.
5. Prove that the extension of T as defined in the proof of Theorem 3 is independent of the
particular 1j; chosen in the proof.
6. If ¢ E 2), T E 2)', and supp(¢) n supp(T) = 0 (the empty set), then T(¢) = O.
7. If T E 2)' and supp(T) = 0, what conclusion can be drawn?
8. Show that if ¢ E COO(lR n ), if T E 2)', and if ¢(x) = 1 on a neighborhood of supp(T),
then ¢T = T.
9. Why, in proving Lemma 1, can we not take 9 to be a multiple of the function p introduced
in Section 7.1? _
10. Let f E C(lRn). Show that supp(f) = supp(f).
11. Let T be an arbitrary distribution, and let K be a compact set. Show that there exists
a distribution S having compact support such that S(¢) = T(¢) for all test functions ¢
that satisfy supp(¢) C K.
12. Prove that a distribution can have at most one continuous extension on c.
13. Prove that if a distribution does not have compact support, then it cannot have a con-
tinuous extension on C.
14. Let T be a distribution and N a neighborhood of supp(T). Prove that for any test
function ¢, T(¢) depends only on ¢ IN.
15. Prove or disprove: If two test functions ¢ and 1j; take the same values on the support of
a distribution T, then T(¢) = T(1j;).
16. Refer to the theorem on partitions of unity and prove that the balls B(x;, r;/2) cover the
complement of the support of T.
17. Prove that if f and 9 are in LtoclRn, then for all test functions ¢,

(f * g)(¢) = f(B(g * B¢»


Chapter 6

The Fourier Transform

6.1 Definitions and Basic Properties 287


6.2 The Schwartz Space S 294
6.3 The Inversion Theorems 301
6.4 The Plancherel Theorem 305
6.5 Applications of the Fourier Transform 310
6.6 Applications to Partial Differential Equations 318
6.7 Tempered Distributions 321
6.8 Sobolev Spaces 325

6.1 Definitions and Basic Properties

The concept of an integral transform is undoubtedly familiar to the reader in


its manifestation as the Laplace transform. This is a useful mechanism for
handling certain differential equations. In general, integral transforms are helpful
in problems where there is a function f to be determined from an equation that
it satisfies. A judiciously chosen transform is then applied to that equation, the
result being a simpler equation in the transformed function F. After this simpler
equation has been solved for F, the inverse transform is applied to obtain f. We
illustrate with the Laplace transform.
Example. Consider the initial value problem

(1)   f'' - f' - 2f = 0,   f(0) = α,   f'(0) = β

The Laplace transform of f is the function F defined by

F(s) = ∫₀^∞ f(t)e^{-st} dt

The theory of this transform enables us to write down the equation satisfied by
F:

(2)   (s² - s - 2)F(s) + (1 - s)α - β = 0

Thus the Laplace transform has turned a differential equation (1) into an alge-
braic equation (2). The solution of (2) is

F(s) = (β + αs - α)/(s² - s - 2)

By taking the inverse Laplace transform, we obtain f:

f(t) = (1/3)(α + β)e^{2t} + (1/3)(2α - β)e^{-t}
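As an illustrative check, assuming SymPy is available (this sketch is not part of the text), one can solve the initial-value problem directly and compare with the formula produced by the Laplace transform:

    import sympy as sp

    t, alpha, beta = sp.symbols('t alpha beta')
    f = sp.Function('f')
    sol = sp.dsolve(f(t).diff(t, 2) - f(t).diff(t) - 2*f(t), f(t),
                    ics={f(0): alpha, f(t).diff(t).subs(t, 0): beta})
    target = (sp.Rational(1, 3)*(alpha + beta)*sp.exp(2*t)
              + sp.Rational(1, 3)*(2*alpha - beta)*sp.exp(-t))
    print(sp.simplify(sol.rhs - target))   # expect 0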
The Fourier transform, now to be taken up, has applications of the type

just outlined as well as a myriad of other uses in mathematics, especially in
partial differential equations. The Fourier transform can be defined on any
locally compact Abelian group, but we confine our attention to IR n (which is
such a group). The material presented here is accessible in many authoritative
sources, such as [Ru1], [Ru2], [Ru3], [SW], and [Fol].
The reader should be aware that in the literature there is very little uni-
formity in the definition of the Fourier transform. We have chosen to use here
the definition of Stein and Weiss [SWJ. It has a number of advantages, not the
least of which is harmony with their monograph. It is the same as the definition
used by Horvath [HorvJ, Dym and McKean [DM], and Lieb and Loss [LLJ. Other
favorable features of this definition are the simplicity of the inversion formula,
elegance in the Plancherel Theorem, and its suitability in approximation theory.
We define a set of functions called characters e_y by the formula

e_y(x) = e^{2πixy}     (x, y ∈ ℝⁿ)

Here we have written

xy = ⟨x, y⟩ = x · y = x₁y₁ + x₂y₂ + ··· + xₙyₙ
where the Xi and Yi are components of the vectors x and y. Notice that each
character maps IR n to the unit circle in the complex plane. Some convenient
properties of the characters are summarized in the next result, the proof of
which has been relegated to the problems.

Theorem 1. The characters satisfy these equations:


(a) ey(u + v) = ey(u)ey(v)
(b) Euey = ey(-u)ey where (Euf)(x) = f(x - u)
(c) ey(x) = ex(Y)
(d) ey(Ax) = eAy(x) (A E C)

The Fourier transform of a function f in L¹(ℝⁿ) is the function f̂ defined
by the equation

f̂(y) = ∫_{ℝⁿ} e^{-2πixy} f(x) dx

In this equation, f can be complex-valued. The kernel e^{-2πixy} is obviously
complex-valued, but x and y run over ℝⁿ. Notice that

f̂(y) = ⟨f, e_y⟩

since in dealing with complex-valued functions, the conjugate of the second
function appears under the integral defining the inner product.
Section 6.1 Definitions and Basic Properties 289

Example. Let n = 1 and let f be given by

f(x) = { 1   on [-1, 1]
       { 0   elsewhere

Then

f̂(y) = ∫_{-1}^{1} e^{-2πixy} dx = [e^{-2πixy}/(-2πiy)]_{x=-1}^{x=1}
     = (1/(-2πiy))(e^{-2πiy} - e^{2πiy}) = (1/(πy)) · (e^{2πiy} - e^{-2πiy})/(2i) = sin(2πy)/(πy)

The function x ↦ sin(2πx)/(πx) is called the sinc function. It plays an
important role in signal processing and approximation theory. See [CL], Chapter
29, and the further references given there.
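A numerical spot check of this example (not in the text; NumPy and SciPy are assumed) compares the defining integral with sin(2πy)/(πy) at a few points:

    import numpy as np
    from scipy.integrate import quad

    def fhat(y):
        # f is the indicator of [-1, 1]; its transform is the integral of e^{-2 pi i x y} over [-1, 1]
        re, _ = quad(lambda x: np.cos(2*np.pi*x*y), -1, 1)
        im, _ = quad(lambda x: -np.sin(2*np.pi*x*y), -1, 1)
        return re + 1j*im

    for y in [0.3, 1.0, 2.7]:
        exact = np.sin(2*np.pi*y) / (np.pi*y)
        assert abs(fhat(y) - exact) < 1e-10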
If f E L1(JR n ), what can be said of j? Later, we shall prove that it is
continuous and vanishes at 00. For the present we simply note that it is bounded.
Indeed, ‖f̂‖_∞ ≤ ‖f‖₁, because

(3)   |f̂(y)| ≤ ∫ |e^{-2πixy}| |f(x)| dx = ∫ |f(x)| dx = ‖f‖₁

In order to use the Fourier transform effectively, it is essential to know how


it interacts with other linear operators on functions, such as translation and
differentiation. The next theorem begins to establish results of that type.

Theorem 2. Let E denote the translation operator, defined by
(E_y f)(x) = f(x - y). Then we have (E_y f)^ = e_{-y} f̂ and (e_y f)^ = E_y f̂.

Proof. We verify the first equation and leave the second to the problems. We
have

(E_y f)^(x) = ∫ f(u - y)e^{-2πixu} du = ∫ f(v)e^{-2πix(v+y)} dv
           = e^{-2πixy} ∫ f(v)e^{-2πixv} dv = e_{-y}(x) f̂(x)   ∎
Recall the definition of the convolution of two functions, as given in Section
5.5, page 269:
(f * g) (x) = ( f(y)g(x - y) dy
JJRn
290 Chapter 6 The Fourier Transform

Theorem 3. If f and g belong to L¹(ℝⁿ), then the same is true of
f ∗ g, and ‖f ∗ g‖₁ ≤ ‖f‖₁ ‖g‖₁.

Proof. ([Smi]) Define a function h on ]Rn x ]Rn by the equation

h(x, y) = g(x - y)

Let us prove that h is measurable. It is not enough to observe that the map
(x, y) ~ x - Y is continuous and that 9 is measurable, because the composition
of a measurable function with a continuous function need not be measurable.
For any open set 0 we must show that h- 1 (0) is measurable. Define a linear
transformation A by A(x, y) = (x - y, x + y). The following equivalences are
obvious:

(x,y) Eh- 1 (0) -¢=:::} h(x,y)EO


-¢=:::} g(x - y) EO
-¢=:::} x-yEg- 1 (0)
-¢=:::} (x - y,x + y) E g-l(O) x]Rn
-¢=:::} A(x,y) E g-l(O) x]Rn
-¢=:::} (x, y) E A-I [g-l(O) X ]Rn]

This shows that

Since 9 is measurable, g-l(O) and g-l(O) X ]Rn are measurable sets. Since A
is invertible, A-I is a linear transformation; it carries each measurable set to
another measurable set. Hence h- 1 (0) is measurable. Here we use the theorem
that a function of class C 1 from ]Rn to ]Rn maps measurable sets into measurable
sets, and apply that theorem to A-I.
The function F(x, y) = I(y)g(x - y) is measurable, and

JJ IF(x,y)1 dxdy = JII(y)1 J Ig(x - y)1 dxdy

= JII(y)lllgI11 dy = 111111 IIgl11


By Fubini's Theorem (See Chapter 8, page 426), F is integrable (i.e., F E
L1(]Rn X ]Rn)). By the Fubini Theorem again,

III * gill = flU * g)(x)1 dx ~ ff IF(x, y)1 dydx = 111111 IIgl11 •


Theorem 3 can be found in many references, such as [Smi] page 334, [Ru3] page
156, [Gol] page 19.
Section 6.1 Definitions and Basic Properties 291

Theorem 4. If f and g belong to L¹(ℝⁿ), then (f ∗ g)^ = f̂ · ĝ.

Proof. We use the Fubini Theorem again:

(f ∗ g)^(x) = ∫ e_{-x}(y)(f ∗ g)(y) dy = ∫ e_{-x}(y) ∫ f(u)g(y - u) du dy
           = ∫∫ e_{-x}(u + y - u) f(u) g(y - u) du dy
           = ∫ e_{-x}(u) f(u) ∫ e_{-x}(y - u) g(y - u) dy du
           = ∫ e_{-x}(u) f(u) du ∫ e_{-x}(z) g(z) dz = f̂(x) ĝ(x)



Theorem 5. If f ∈ L¹(ℝⁿ), then f̂ ∈ C₀(ℝⁿ). Thus, f̂ is continuous
and "vanishes at ∞."

Proof. From the definition of f̂,

|f̂(y) - f̂(x)| ≤ ∫ |e^{-2πizy} - e^{-2πizx}| |f(z)| dz

If y converges to x through a sequence of values yⱼ, then the integrand is bounded
above by 2|f(z)|, and converges to 0 pointwise (i.e., for each z). By the Lebesgue
dominated convergence theorem (Chapter 8, page 406), the integral tends to 0.
Hence f̂(yⱼ) → f̂(x).
In order to see that f̂ vanishes at infinity, we note that -1 = e^{-πi} and
compute f̂ as follows, using r = 2|x|²:

f̂(x) = ∫ f(y)e^{-2πixy} dy = -∫ f(u)e^{-2πix(u + x/r)} du

With the change of variable y = u + x/r this becomes

f̂(x) = -∫ f(y - x/r) e^{-2πixy} dy

It follows that

2f̂(x) = ∫ [f(y) - f(y - x/r)] e^{-2πixy} dy

and that

2|f̂(x)| ≤ ∫ |f(y) - f(y - x/r)| dy
292 Chapter 6 The Fourier Transform

At this point we want to say that as x tends to infinity, x/r -+ 0 (because


r = 2IxI 2 ), and the right-hand side of the previous inequality tends to zero.
That assertion is justified by Lemma 3 of Section 6.4, page 306. •
Suggested references for this chapter are [Ad], [BN], [Bac], [Br], [CL], [DM],
[Fol], [GV], [Gol], [Gre], [Gri], [Hel], [Ho], [Horv], [Kat], [Ko], [Lan1], [LL], [Loa],
[RS], [Ru1], [Ru2], [Ru3], [Sch1], [SW], [Ti1], [Ti2], [Wal], [Wie], [Yo], [Ze], and
[Zem].
The table of Fourier transforms presented next uses some definitions and
proofs that emerge in later sections.

Table of Fourier Transforms

  Function                 Its Fourier Transform         Definitions

  f                        f̂                             xy = Σ_{j=1}^{n} x_j y_j
  E_v f                    e_{-v} f̂                      (E_v f)(x) = f(x - v)
  P(D)f                    P⁺ f̂                          P⁺(x) = P(2πix)
  f ∗ g                    f̂ ĝ                           (f ∗ g)(x) = ∫ f(y)g(x - y) dy
  P f                      P⁻(D) f̂                       P⁻(x) = P(-x/(2πi))
  S_λ f                    λ^{-n} S_{1/λ} f̂              (S_λ f)(x) = f(λx)
  χ_{[-1,1]}               x ↦ sin(2πx)/(πx)             χ_A(x) = 1 if x ∈ A, 0 if x ∉ A
  x ↦ e^{-πx²}             x ↦ e^{-πx²}
  x ↦ (x² + a²)^{-1}       x ↦ (π/a) e^{-2πa|x|}
  x ↦ sin(2πx)/(πx)        χ_{[-1,1]}

Problems 6.1
1. Prove Theorem 1.
2. Does the group JRn have any continuous characters other than those described in the
text?
3. Express 8(f * etl in terms of a Fourier transform.
4. What are the characters of the additive group Z?
5. Find the Fourier transform of the function

f(x) = {ocosx 0~ x ~ 1
elsewhere.

6. Prove the second assertion of Theorem 2. ~


7. Prove that the mapping F that takes a function f into f is linear and continuous from
Ll(JRn) to C(JRn). (Note: F is also called the Fourier transform.)
8. Prove that if,\ > 0 and hex) = f(x/'\), then hex) = ,\nj(,\x).
9. Prove that if hex) = f( -x), then hex) = j(x).
10. Prove that Ll (JRn) with convolution as product is a commutative Banach algebra. A
Banach algebra is a Banach space in which a mUltiplication has been introduced such
that x(yz) = (xy)z, x(y + z) = xy + xz, (x + y)z = xz + yz, '\(xy) = ('\x)y = x('\y),
and IIxyll ~ Ilxlillyll·
11. Does the Banach algebra described in Problem 10 have a unit element? That is, does
there exist an element u such that u * f = f * u = f for all f?
12. Show that the function f(x) = (1 + ix)-2, x E JR, has the property that l(s) > 0 for
s < 0 and j(s) = 0 otherwise.
13. Prove that the function 4>(x) = x 2 has the property that for each f E Ll(JRn), 4> 0 1
is the Fourier transform of some function in £l(JRn). What other functions 4> have this
property?
14. Prove that the function f(x) = exp(x - eX) has as its Fourier transform the function
j(s) = r(l-is), and that j(s) is never O. The Gamma function is expounded in [Wid1].
15. (This problem relates to the proof of Theorem 4.) Let f : X --+ Y and 9 : Y --+ Z be
two functions. Show that for any subset A in Z, (g 0 f)-leA) = f-l(g-l(A)). Now let
X = Y = R.. Show that if 9 is continuous and f is measurable, then 9 0 f is measurable.
Explain why fog need not be measurable.
16. How are the Fourier transforms of f and J related?
17. Prove that if fj --+ f in £l(JR n ), then hex) --+ j(x) uniformly in JRn.
18. What logic can there be for the following approximate formula?

L
k=N
j(x) ~ f(k)e-27rikx
k=-N

Under what conditions does the approximate equation become an exact equation?

I:
19. (The Autocorrelation Theorem) Prove that if

g(x) = f(u)f(u+x)du

lil
then 9 = 2 .
J
20. Prove that f * 9 = f g.J J
21. Assume that f is real-valued and prove that the maximum value of f * Bf occurs at the
origin. The definition of B is (Bf)(x) = f( -x).
22. Recall the Heaviside function H from Section 5.1, page 250. Define f(x) = e- ax H(x)
and g(x) = e- bx H(x). Compute f * g, assuming that 0 < a < b.
23. Prove that if f is real-valued, then lil
2 is an even function.
24. We have adopted the following definition of the Fourier transform:

(Flf)(y) = r
iRn
e-27rixy f(x) dx
294 Chapter 6 The Fourier Transform

1.
Other books and papers sometimes use an alternative definition:

(F2f)(y) = -1)n/2
( e-'xy f(x) dx
211" Rn

Find the relationship between these two transforms.


25. (Continuation) Prove that the inverse Fourier transforms obey this formula:
r;1 = (2n)-n/2 F 2- 1081 / 2 1< where (8),f)(x) = f(>.x)
26. (Generalization of Theorem 3) Prove that if f E LP(IRn) and 9 E Lq(IRn), then the
convolution f * 9 is well-defined, and Ilf * glLXl ~ Ilfllpllgllq'
27. Let f be the characteristic function of the interval [-1/2,1/2J. Thus, f(x) = 1 if Ixl ~
1/2, and f(x) = 0 otherwise. Define g(x) = (1 -lxl)f(x/2) and show that f * f = g.
28. Prove the "Modulation Theorem": If g(x) =f(x)cos(ax), then

g(x) = !2 1(~
211"
+ x) + !2 1(~
211"
- x)
29. Define the operator B by the equation (Bf)(x) = f( -x), and prove that f *Bf is always
even if f is real-valued.

6.2 The Schwartz Space

The space S, also denoted by S(JRn ), is the set of all ¢ in coo(JR n) such that
p. DCi¢ is a bounded function, for each polynomial P and each multi-index Q.
Functions with this property are said to be "rapidly decreasing," and the space
itself is called the Schwartz space. In the case n = 1, membership in the
Schwartz space simply requires sUPx Ixm¢(kl(x)1 to be finite for all m and k.
Example 1. The Gaussian function φ defined by

φ(x) = e^{-|x|²}

belongs to S.
It is easily seen, with the aid of Leibniz's formula, that if ¢ E S, then

p. ¢ E S for any polynomial P, and DCi¢ E S for any multi-index Q.
We note that S(JR n ) is a subspace of Ll(JR n ). This is because functions in
S decrease with sufficient rapidity to be integrable. Specifically, if ¢ E S, then
the function x ↦ (1 + |x|²)ⁿ φ(x) is bounded, say by M. Then

∫ |φ(x)| dx ≤ M ∫ (1 + |x|²)^{-n} dx = M ∫₀^∞ r^{n-1} ω_n (1 + r²)^{-n} dr < ∞

In this calculation we used "polar" coordinates and the "method of shells." The
thickness of a shell is dr, its radius is r, and its area is r^{n-1}ω_n, where ω_n
denotes the area of the unit sphere in ℝⁿ.
Definition. In S, convergence is defined by saying that ¢j -» 0 if and only
if P(x)· DCi¢j(X) -t 0 uniformly in JRn for each multi-index Q and for each
polynomial P. In other terms, lip·
DCi¢j 1100 -t 0 for every multi-index Q and
every polynomial P, the sup-norm being computed over JR n .
Section 6.2 The Schwartz Space 295

Lemma 1. If P is a polynomial, then the mapping ¢ f--+ P . ¢ is


linear and continuous from S into S.

Proof. Let ¢j ...... o. We ask whether Q . D{3 (P . ¢j) -+ 0 uniformly for each
polynomial Q and multi-index (3. By using the Leibniz formula, this expression
can be exhibited as a sum of terms Q,. DO¢j, where the Q, are polynomials and
0: is a multi-index such that 0: ~ (3. Each of these terms individually converges
uniformly to zero, because that is a consequence of ¢j ...... 0 in S. Therefore, their
sum also converges to O. •

Lemma 2. If g E S, then the mapping ¢ f--+ g¢ is linear and


continuous from S into S

Proof. This is left to the problems.



Lemma 3. For any multi-index 0:, the mapping ¢ f--+ DO¢ is linear
and continuous from S into S.

Proof. This is left to the problems.



In studying how the Fourier transform interacts with differential operators,
it is convenient to adopt the following definitions. Let P be a polynomial in
n variables. Then P has a representation as a finite sum P(x) = I: coxo, in
which each 0: is a multi-index, x = (Xl, ... ,xn ), Co is a complex number, and
XO = X~l x~2 ... x~n. Each function X f--+ XO is called a monomial. We define
also

Lemma 4. The function e_y defined by e_y(x) = e^{2πixy} obeys the
equation P(D)e_y = P(2πiy)e_y for any polynomial P.

Proof. It suffices to deal with the case of one monomial and establish that
D^α e_y = (2πiy)^α e_y. We have

(∂/∂xⱼ) e_y(x) = (∂/∂xⱼ) e^{2πi(x₁y₁ + ··· + xₙyₙ)} = 2πiyⱼ e_y(x)

Thus, by induction, we have

(∂/∂xⱼ)^{αⱼ} e_y = (2πiyⱼ)^{αⱼ} e_y

Consequently, D^α e_y = (2πiy₁)^{α₁}(2πiy₂)^{α₂} ··· (2πiyₙ)^{αₙ} e_y = (2πiy)^α e_y.   ∎


The next result illustrates how the Fourier transform can simplify certain
processes, such as differentiation. In this respect, it mimics the performance of
the Laplace transform.

Theorem 1. If φ ∈ S and if P is a polynomial, then
[P(D/(2πi))φ]^ = P · φ̂. Equivalently, [P(D)φ]^ = P⁺ φ̂, where
P⁺(x) = P(2πix).

Proof. We have to show that

[P(D/(2πi))φ]^(y) = P(y) φ̂(y)

Since the Fourier map f ↦ f̂ is linear, it suffices to prove this when P is a single monomial:

[(D/(2πi))^α φ]^(y) = y^α φ̂(y)

Equivalently, we must prove that

(2πi)^{-|α|} [D^α φ]^(y) = y^α φ̂(y)

Thus we must prove that

(2πi)^{-|α|} ∫ (D^α φ)(x) e_{-y}(x) dx = y^α φ̂(y)

In this integral we can use integration by parts repeatedly to transfer all deriva-
tives from φ to the kernel function e_{-y}. Each use of integration by parts will
introduce a factor of -1. Observe that no boundary values enter during the
integration by parts, since φ ∈ S. Using also the preceding lemma, we find that
the integral becomes successively

(2πi)^{-|α|}(-1)^{|α|} ∫ φ(x) [D^α e_{-y}](x) dx = (-1)^{|α|}(2πi)^{-|α|} ∫ φ(x)(-2πiy)^α e_{-y}(x) dx

= y^α ∫ φ(x) e_{-y}(x) dx = y^α φ̂(y)   ∎
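For n = 1 and P(x) = x, Theorem 1 says that the transform of φ' is 2πiy φ̂(y). The following sketch (not part of the text; SciPy assumed) checks this numerically for the Gaussian φ(x) = e^{-πx²}, computing both transforms by quadrature:

    import numpy as np
    from scipy.integrate import quad

    phi = lambda x: np.exp(-np.pi * x**2)
    dphi = lambda x: -2*np.pi*x*np.exp(-np.pi * x**2)

    def transform(g, y):
        re, _ = quad(lambda x: g(x)*np.cos(2*np.pi*x*y), -np.inf, np.inf)
        im, _ = quad(lambda x: -g(x)*np.sin(2*np.pi*x*y), -np.inf, np.inf)
        return re + 1j*im

    for y in [0.25, 1.0, 1.5]:
        lhs = transform(dphi, y)             # Fourier transform of phi'
        rhs = 2j*np.pi*y*transform(phi, y)   # P^+(y) phi-hat(y) with P(x) = x
        assert abs(lhs - rhs) < 1e-8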



Example 2. Let Δ denote the Laplacian operator; i.e.,

Δ = Σ_{j=1}^{n} (∂/∂xⱼ)²

Then Δ = P(D/(2πi)) if P is defined to be

P(x) = (-4π²)(x₁² + x₂² + ··· + xₙ²) = -4π²|x|²

Hence, for φ ∈ S,

(Δφ)^ = P · φ̂

Equivalently,

(Δφ)^(y) = -4π²|y|² φ̂(y)
Theorem 2. If φ ∈ S and P is a polynomial, then P(-D/(2πi))φ̂ =
(Pφ)^. Equivalently, P(D)φ̂ = (P*φ)^, where P*(y) = P(-2πiy).

Proof. We insert the variables, and interpret P(D) as differentiating with
respect to the variable x. Thus, with the help of Lemma 4, we have

[P(D)φ̂](x) = P(D) ∫ e_{-x}(y)φ(y) dy = P(D) ∫ e_{-y}(x)φ(y) dy
           = ∫ [P(D)e_{-y}](x)φ(y) dy = ∫ P(-2πiy) e_{-y}(x)φ(y) dy
           = ∫ P*(y) e_{-x}(y)φ(y) dy = (P*φ)^(x)   ∎

In the preceding proof, one requires the following theorem from calculus.
See, for example, [Wid 1] page 352, or [Bart] page 271.

Theorem 3. If f and ∂f/∂x are continuous functions on ℝ², then,
provided that the integral on the right converges, we have

d/dx ∫₀^∞ f(x, t) dt = ∫₀^∞ (∂/∂x) f(x, t) dt

Theorem 4. The mapping ¢ r---+ ¢; is continuous and linear from S


into S.

Proof. First we must prove that j E S when ¢ E S. It is to be shown that ¢;


is a COO-function and that P . DOt¢ is bounded for each polynomial P and for
each multi-index Q. In Theorem 5 of Section 6.1 (page 291), we noted that ¢; is
continuous. By Theorem 2 above, DOt¢; = Q.¢ for an appropriate polynomial
Q. Since Q. </> E S (by Lemma 1), we know that Q.¢ is continuous, and can
therefore conclude that Da¢; is continuous. Hence ¢; E Coo.
298 Chapter 6 The Fourier Transform

Now we ask whether P · D^α φ̂ is bounded. By the preceding remarks and
Theorem 1,

(1)   P · D^α φ̂ = P · (Q · φ)^ = [P(D/(2πi))(Q · φ)]^

Since P(D/(2πi))(Q · φ) ∈ S, its Fourier transform is bounded, as indicated in
Equation (3) of Section 6.1, page 289.
For the continuity of the map, let φⱼ ⇝ 0 in S. We want to prove that φ̂ⱼ ⇝ 0
in S. That means that P · D^α φ̂ⱼ → 0 uniformly for any polynomial P and any
multi-index α. By Equation (1) above, the question to be addressed is whether
[P(D/(2πi))(Q · φⱼ)]^ → 0 uniformly. If we put ψⱼ = P(D/(2πi))(Q · φⱼ), we
ask whether ψ̂ⱼ(t) → 0 uniformly. Now, ψⱼ ∈ S, and ψⱼ ⇝ 0 in S by Lemmas 1
and 3. Hence (1 + |x|²)ⁿψⱼ(x) → 0 uniformly. It follows that for a given ε > 0
there is an integer m such that (1 + |x|²)ⁿ|ψⱼ(x)| < ε whenever j > m. For such
j,

∫ |ψⱼ(x)| dx < ε ∫ (1 + |x|²)^{-n} dx = cε

and this shows that ∫ |ψⱼ| → 0. From the inequality

|ψ̂ⱼ(x)| ≤ ∫ |ψⱼ|

we infer that ψ̂ⱼ(x) → 0 uniformly.   ∎


This section concludes with a proof of the Poisson summation formula. This

important result states that under suitable hypotheses on the function f, the
following equation is valid:

(1) L f(v) = L 1(v)


I/EZ n I/EZn

A variety of hypotheses can be adopted for this result. See, for example, [SW]
page 252, [Lanl] page 373, [Yo] page 149, [Gri] page 32, [Wall page 60, [Kat]
page 129, [Fri] page 104, [Ho] page 177, [DM] page 111, [Fol] page 337, [Til]
page 60.

Theorem 5. Poisson Summation Formula. If f ∈ C(ℝⁿ) and if

sup_x (|f(x)| + |f̂(x)|)(1 + |x|)^{n+ε} < ∞

for some ε > 0, then Σ_{ν∈ℤⁿ} f(ν) = Σ_{ν∈ℤⁿ} f̂(ν).
Proof. Let c equal the supremum in the hypotheses. Then for ‖x‖_∞ ≤ 1 and
ν ≠ 0 we have

|f(x + ν)| ≤ c(1 + |x + ν|)^{-n-ε} ≤ c(1 + ‖x + ν‖_∞)^{-n-ε} ≤ c‖ν‖_∞^{-n-ε}

(In verifying these calculations, notice that the exponents are negative.) Then
we have

Σ_{ν≠0} |f(x + ν)| ≤ c Σ_{ν≠0} ‖ν‖_∞^{-n-ε} = c Σ_{j=1}^{∞} Σ_{‖ν‖_∞ = j} j^{-n-ε}
                  = c Σ_{j=1}^{∞} j^{-n-ε} #{ν : ‖ν‖_∞ = j}
                  = c Σ_{j=1}^{∞} j^{-n-ε}(c₁ j^{n-1}) = c₂ Σ_{j=1}^{∞} j^{-1-ε} < ∞

By a theorem of Weierstrass (the "M-Test", page 373)), this proves that the
function F(x) = Lv f(x + v) is continuous, for its series is absolutely and
uniformly convergent. The function F is integer-periodic: For Jl E zn,
F(x + Jl) = L f(x + Jl + v) = L f(x + v) = F(x)
v v

Let Q = [0, 1)ⁿ, the unit cube in ℝⁿ. The Fourier coefficients of the periodic
function F are

A_ν = ∫_Q F(x) e^{-2πiνx} dx = ∫_Q Σ_μ f(x + μ) e^{-2πiνx} dx
    = Σ_μ ∫_{Q+μ} f(y) e^{-2πiν(y-μ)} dy = Σ_μ ∫_{Q+μ} f(y) e^{-2πiνy} dy

Our hypotheses on f are strong enough to imply that f ∈ L¹(ℝⁿ). Hence
A_ν = f̂(ν), and f̂ ∈ C₀(ℝⁿ). The hypothesis on f̂ shows that the series Σ_ν f̂(x + ν) is absolutely
and uniformly convergent. Hence

Σ_ν |A_ν| = Σ_ν |f̂(ν)| < ∞

From this we see that the Fourier series of F, Σ_ν A_ν e^{2πiνx}, is uniformly and
absolutely convergent. By the classical theory of Fourier series (such as in [Zy],
vol. II, page 300) we have

F(x) = Σ_ν A_ν e^{2πiνx}

It follows that

Σ_ν f(ν) = F(0) = Σ_ν A_ν = Σ_ν f̂(ν)   ∎
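A numerical illustration (not part of the text; NumPy assumed): for f(x) = e^{-πax²} with a > 0, a rescaling of Theorem 1 in Section 6.3 gives f̂(y) = a^{-1/2} e^{-πy²/a}, and both sides of the Poisson summation formula can be summed directly.

    import numpy as np

    a = 0.37
    nu = np.arange(-50, 51)
    lhs = np.sum(np.exp(-np.pi * a * nu**2))                 # sum of f(nu)
    rhs = np.sum(np.exp(-np.pi * nu**2 / a)) / np.sqrt(a)    # sum of f-hat(nu)
    assert abs(lhs - rhs) < 1e-12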
300 Chapter 6 The Fourier Transform

Problems 6.2
1. Prove Lemma 2.
2. Prove Lemma 3.
3. Let f be an even function in L1(JR). Show that

~f(t) = 1 00
f(x)cos(27rtx)dx

The right-hand side of this equation is known as the Fourier Cosine Transform of f.
4. Let B be the operator such that (Bf)(x) = f( -x). Find formulas for ii] and Bj How
are these related?
5. What is the Leibniz formula appropriate for the operator (D/(27ri»"'?
6. Let f E L1(JRn) and g(x) = f(Ax), where A is a nonsingular n x n matrix. Find the
relation between / and g.
7. Prove that, after Fourier transforms have been taken, the differential equation f'(x) +
xf(x) = 0 becomes (fJ'(t) + 47r 2 tf(t) = o.
8. Give a complete formal proof that P . f E S whenever P is a polynomial and f E S.
9. Prove that for a function of n variables having the special form

n
f(X1, ... ,Xn ) = II/j(Xj)
j=1

we have
n
f(tl> ... , tn) = II jj(tj)
j=1

10. Prove that if f E L1(JR) and f > 0, then

(t # 0)

11. Let fm(x) = 1 if Ixl :::;; m, and fm(x) = 0 otherwise. (Here x E JR, and m = 1,2, .... )
Compute fm * It and show that it is the Fourier transform of a function in L1(JR).
12. Interpret Lemma 4 as a statement about eigenvalues and eigenvectors of a differential
operator.
13. Prove, by using Fubini's Theorem, that for functions f and 9 in U(JRn), J /g J= fi·
14. Explain why e- 1xl is not in S.
15. Prove that if ¢> E S, then ~ exists for any multi-index Q.
16. Let P be a polynomial on JRn and let 9 be an element of COO (JRn) such that Ig(x)1 :::;; IP(x)1
for all x E JRn. Is the mapping f o--t gf continuous from S into S?
17. Prove that S is the subspace of Coo (JRn) consisting of all functions </> such that for each
Q, the map x o--t x"'(D"'</>)(x) is bounded.

18. Show that </>j -» 0 in S if and only if P(D)( Q¢>j) -t 0 uniformly in JRn for all polynomials
P and Q.
19. Prove that if P is a polynomial and c is a scalar, then P(cD)e ll = P(27ricy)ell.
20. Prove that if P is a polynomial and c is a scalar, then P(cD)¢ = P;;P, where Pc(x) =
P( -27ricx).
21. Prove that P(-D)¢ =-;+;;, where P+(x) = P(27rix).
22. Prove that for x E JRn, Ix"'l:::;; Ixl''''l.
23. Using the operator B in Problem 4, prove that = BJ. f
24. Prove that
dk
dx k eX = eX Pk(X)
2 2

where the polynomials Pk are defined recursively by the equations PQ(x) 1 and
Pk+1(X) = 2XPk(X) + p~(x).
Section 6.3 The Inversion Theorems 301

25. Prove that if f obeys the differential equation

then the same is true of i


26. Prove that for each n there is a constant en such that

#{v E Zn : IIvlLXl = j} = cnjn-l


Suggestion: Compute #{v E zn : IIvIL", ~ j} first.
27. Prove that for </> E S(lRn) and>' i- 0,

L </>(x + >.v) = L >.-n;t;(*)e21fiVX/),


vEIn vEzn

28. Evaluate ~;;"=-oc (1 + k2)-1 by using Theorem 5.


29. Evaluate ~::1(k4 +a 4)-1 by using Theorem 5.

I:
30. The first moment of a function f is defined to be

xf(x)dx

Prove that under suitable hypotheses, the first moment is (jj'(O)/( -2ni).

6.3 The Inversion Theorems

In the previous section it was shown that the operator F defined by F(4)) = J;
is linear and continuous from S into S. In this section our goal is to prove that
F is surjective and invertible, and to give an elegant formula for F- 1 .

Theorem 1. The function θ defined on ℝⁿ by θ(x) = e^{-πx²} is a
fixed point of the Fourier transform. Thus, θ̂ = θ.

Proof. First observe that the notation is

x² = xx = x · x = ⟨x, x⟩ = Σ_{j=1}^{n} xⱼ² = |x|²

We prove our result first when n = 1 and then derive the general case. Define,
for x ∈ ℝ, the analogous function ψ(x) = e^{-πx²}. Since ψ'(x) = e^{-πx²}(-2πx) =
-2πxψ(x), we see that ψ is the unique solution of the initial-value problem

(1)   ψ'(x) + 2πxψ(x) = 0,   ψ(0) = 1

By Problem 7 in Section 6.2, or the direct use of Theorems 1 and 2 (pages
296-297), we obtain, by taking Fourier transforms in Equation (1),

(ψ̂)'(x) + 2πxψ̂(x) = 0

The initial value of ψ̂ is

ψ̂(0) = ∫_{-∞}^{∞} ψ(x) dx = ∫_{-∞}^{∞} e^{-πx²} dx = 1

(See Problem 10 for this.) We have seen that ψ and ψ̂ are two solutions of
the initial-value problem (1). By the theory of ordinary differential equations,
ψ = ψ̂. This proves the theorem for n = 1. Now we notice that

θ(x) = exp[-π(x₁² + ··· + xₙ²)]

By Problem 9 of Section 6.2, page 300,

θ̂(x) = Π_{j=1}^{n} ψ̂(xⱼ) = Π_{j=1}^{n} ψ(xⱼ) = θ(x)   ∎
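For n = 1 the fixed-point property can also be confirmed by direct quadrature. The sketch below is illustrative only (not part of the text; SciPy assumed):

    import numpy as np
    from scipy.integrate import quad

    theta = lambda x: np.exp(-np.pi * x**2)

    def theta_hat(y):
        re, _ = quad(lambda x: theta(x)*np.cos(2*np.pi*x*y), -np.inf, np.inf)
        return re    # the imaginary part vanishes because theta is even

    for y in [0.0, 0.5, 1.3]:
        assert abs(theta_hat(y) - theta(y)) < 1e-10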

Theorem 2. First Inversion Theorem. If φ ∈ S(ℝⁿ), then

φ(x) = ∫ φ̂(y) e^{2πixy} dy
Proof. We use the conjurer's tricks of smoke and mirrors. Let θ be the func-
tion in the preceding theorem, and put g(x) = θ(x/λ). Then ĝ(y) = λⁿθ̂(λy).
(Problem 8 in Section 6.1, page 293.) By Problem 13 in Section 6.2, page 300,

∫ φ̂(y)θ(y/λ) dy = ∫ φ̂(y)g(y) dy = ∫ φ(y)ĝ(y) dy = λⁿ ∫ φ(y)θ̂(λy) dy = ∫ φ(u/λ)θ̂(u) du

In the preceding calculation, let λ = k, where k ∈ ℕ, and contemplate letting
k → ∞. In order to use the Dominated Convergence Theorem (Section 8.6, page
406), we must establish L¹-bounds on the integrands. Here they are:

|φ̂(y)θ(y/k)| ≤ |φ̂(y)| ‖θ‖_∞,      φ̂ ∈ L¹(ℝⁿ)
|φ(u/k)θ̂(u)| ≤ ‖φ‖_∞ |θ̂(u)|,      θ̂ ∈ L¹(ℝⁿ)

Then by the Dominated Convergence Theorem,

(2)   θ(0) ∫ φ̂(y) dy = φ(0) ∫ θ̂(u) du


Section 6.3 The Inversion Theorems 303

But we have, by the special properties of θ,

1 = θ(0) = θ̂(0) = ∫ θ(x) dx = ∫ θ̂(x) dx

Thus Equation (2) becomes

(3)   ∫ φ̂(y) dy = φ(0)

This result is now applied to the shifted function E_{-x}φ:

∫ (E_{-x}φ)^(y) dy = (E_{-x}φ)(0) = φ(x)

By Theorem 2 in Section 6.1, page 289, this is equivalent to

∫ φ̂(y) e_x(y) dy = φ(x)   ∎



Theorem 3. The Fourier transform operator F from S(JRn ) to
S(JRn ) is a continuous linear bijection, and F- 1 = F3.

Proof. The continuity and linearity of F were established by Theorem 4 in


Section 6.2, page 297. The fact that F is surjective is established by writing the
basic inversion formula from the preceding theorem as

4>(x) = J¢(y)ex(y)dy = J¢(y)e_x(-y)dy = J ¢(-u)e_x(u)du = [F(B¢)](x)


Here B is the operator such that (B4»(x) = 4>(-x). The inversion formula also
shows that F is injective, for if ¢ = 0, then obviously 4> = O. Again by the
inversion formula,

(φ̂)^(y) = ∫ φ̂(x) e_{-y}(x) dx = φ(-y) = (Bφ)(y)

Thus F² = B. It follows that F⁴ = I and F³F = I.   ∎

Theorem 4. Second Inversion Theorem. If f and f̂ belong to
L¹(ℝⁿ), then for almost all x,

f(x) = ∫_{ℝⁿ} f̂(y) e^{2πixy} dy
Proof. Assume that f and f̂ are in L¹(ℝⁿ). Let φ ∈ S. Then by Theorem 2,
φ(x) = ∫ e_x φ̂. By Problem 13 in Section 6.2, page 300, ∫ f̂ φ = ∫ f φ̂. Hence if
we put F(y) = ∫ f̂ e_y, then we have (with the help of the Fubini theorem)

∫ φ̂(x)f(x) dx = ∫ φ(x)f̂(x) dx = ∫∫ e_x(y)φ̂(y) dy f̂(x) dx = ∫ φ̂(y) ∫ e_y(x)f̂(x) dx dy = ∫ φ̂(y)F(y) dy

Thus ∫ ψ(x)(f - F)(x) dx = 0 for all ψ ∈ S, because φ̂ can be any element of
S. The same equation is true for all ψ ∈ 𝒟, since 𝒟 is a subset of S. Now apply
Theorem 2 of Section 5.1, page 251, according to which g = 0 when g̃ = 0. The
conclusion is that f(x) = F(x) almost everywhere.   ∎

Problems 6.3
1. Does:F commute with the operators B and Ex?
2. Find the inverse Fourier transform of the function

sint It I :::; 7r
{
f(t) = 0 It I > 7r

3. Explain why, for 1> E S,

4. Let f E Ll(JR) and define h(x) = J: f(t)dt. Prove that if h E Ll(JR), then h(t)
(27rit)-1 j(t).
5. For the function f(x) = e- 1xl , show that j(x) = 2/(1 + 47r 2 x 2 ). Show that jis analytic
in a horizontal strip in the complex plane described by the inequality IIm(z)1 < 1/(47r2 ).
=
(Here n 1.)
6. Let f(x) = e- X for x ;;:: 0 and let f(x) = 0 for x < O. Find j and verify by direct
J
integration that f(x) = j. ex·
7. In Section 6.1 we saw that the following is a Fourier transform pair:

I -l:::;x:::;l
f(x) = {
o otherwise

Prove that f belongs to £l(JR) but j does not. Explain why this does not violate the
inversion theorem.
8. Using Theorem 1 of this section and Problem 8 in Section 6.1, page 293, prove that the
Fourier transform of the function 1>(x) = e- ax2 is

Prove also that :F- 11> = :F1>. Prove that this last equation follows from the sole fact that
1> is an even function.
9. Prove that if f is odd and belongs to Ll(JR), then

. 1
t~

- f(t) =
2 0
00
f(x) sin(27rtx) dx

The right-hand side of this equation defines the Fourier Sine Transform of f.
10. Prove that

This can be accomplished by considering the square of this integral, which can be written
as the (double) integral of e-(x 2 +y2) over JR2. This double integral can be computed by
polar coordinates.
Section 6.4 The Plancherel Theorem 305

6.4 The Plancherel Theorem

This section is devoted to extending the Fourier operator from the Schwartz
space S(JRn ) to L2(JR n ). It turns out that the extended operato!; has a number
of endearing properties, leading one to conclude that L2(JRn) is the "natural"
setting for this important operator.

Lemma 1. If f and g belong to the Schwartz space S(ℝⁿ), then
f ∗ g also belongs to S(ℝⁿ), and furthermore, (fg)^ = f̂ ∗ ĝ.

Proof. Since f and g belong to S, so does fg, by Lemma 2 in Section 6.2,
page 295. By Theorem 4 of Section 6.2, page 297, f̂, ĝ, and (fg)^ belong to
S. Consequently, f̂ ĝ belongs to S. By Theorem 4 of Section 6.1, page 290,
(f ∗ g)^ = f̂ ĝ. Hence (f ∗ g)^ ∈ S, and by the inversion theorem (Theorem 2 in the
preceding section), f ∗ g ∈ S. Using the operator F such that F(f) = f̂ and the
operator B such that (Bf)(x) = f(-x), we have

f̂ ∗ ĝ = F⁻¹F(f̂ ∗ ĝ) = F⁻¹(F²f · F²g)
      = F⁻¹(Bf · Bg) = F⁻¹B(fg) = F⁻¹F²(fg) = (fg)^   ∎

Lemma 2. If f ∈ L¹(ℝⁿ) and g is a test function, then f ∗ g ∈ C^∞(ℝⁿ).

Proof. By the theorem in Section 5.5, page 271,

(1)   D^α(T ∗ φ) = T ∗ (D^α φ)     (T ∈ 𝒟′, φ ∈ 𝒟)

(Recall that the definition of convolution involving distributions was made to
conform to the ordinary convolution if the distribution arises from a function.)
Now f * g is continuous for any continuous g with compact support, as is easily
seen from writing

(f * g)(x) - (f * g)(y) = Jf(u) [g(x - u) - g(y - u)] du

Applying this to the right side of Equation (1), we see that DQ(f*g) is continuous
for every multi-index Q. •

The Lebesgue space U(JRn), where 1 :::;; p < 00, has as elements all mea-
surable functions f such that Ifl P E Ll(JR n ). The norm is Ilfllp
= IlifIPII~/p.
Further information about these spaces is found in Section 8.7, pages 409ff.
306 Chapter 6 The Fourier Transform

Lemma 3. The translation operator E_x has the following continuity
property: if 1 ≤ p < ∞ and f ∈ Lᵖ(ℝⁿ), then the mapping x ↦ E_x f
is continuous from ℝⁿ to Lᵖ(ℝⁿ).

Proof. The continuous functions with compact support form a dense set in
LP, if 1 ~ P < 00. Hence, if e > 0, then there exists such a continuous function
h for which ‖f - h‖_p ≤ ε. Let the support of h be contained in the ball B_r of
radius r centered at 0. By the uniform continuity of h there is a δ > 0 such that

|x - y| < δ  ⟹  |h(x) - h(y)| < ε

There is no loss of generality in supposing that δ < r. If |x - y| < δ, then

‖E_x h - E_y h‖_p ≤ ε(4r)^{n/p}

(the difference vanishes outside a ball of radius 2r, a set of measure at most (4r)ⁿ, and is bounded by ε).
From the triangle inequality it follows that

IIExl - Eylilp ~ IIExl - Exhllp + IIExh - Eyhll p+ IIEyh - Ey/ilp


= IIEx(f - h)llp + IIEx h - Eyhll p+ IIEy(h - f)llp

~ III - hll p+ c(4rt/ p+ Ilh - Illp


~ 2c + c(4r)n/ p

In the next result we use a function ρ ∈ 𝒟 such that ρ ≥ 0 and ∫ρ = 1. This is
a "mollifier." Then ρ_k is defined by ρ_k(x) = kⁿρ(kx), for k = 1, 2, ...

Theorem 1. If f ∈ L¹(ℝⁿ), then f ∗ ρ_k → f in the metric of L¹(ℝⁿ).
Furthermore, f ∗ ρ_k ∈ C^∞(ℝⁿ).

Proof. Since ∫ρ_k = 1,

(f ∗ ρ_k)(x) - f(x) = ∫ [f(x - z) - f(x)] ρ_k(z) dz

Hence by Fubini's Theorem (Chapter 8, page 426),

∫ |f ∗ ρ_k - f| ≤ ∫∫ |f(x - z) - f(x)| ρ_k(z) dz dx = ∫∫ |f(x - z) - f(x)| dx ρ_k(z) dz

Here we need Lemma 3: if f ∈ L¹(ℝⁿ) and ε > 0, then there is a δ > 0 such
that ‖E_z f - f‖₁ ≤ ε whenever |z| ≤ δ. If ρ(x) = 0 when |x| > r, then ρ_k(x) = 0
when |x| > r/k. Hence when r/k ≤ δ we will have ‖f ∗ ρ_k - f‖₁ ≤ ε. Lemma 2
shows that f ∗ ρ_k ∈ C^∞(ℝⁿ).   ∎
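A small numerical illustration of Theorem 1 (not in the text; NumPy assumed): convolve the indicator of [0,1] with ρ_k built from a standard bump and watch the L¹ error shrink as k grows. The grid convolution only approximates the integral.

    import numpy as np

    x = np.linspace(-2.0, 3.0, 5001)
    dx = x[1] - x[0]
    f = ((x >= 0.0) & (x <= 1.0)).astype(float)     # indicator of [0, 1], an L^1 function

    def bump(t):                                     # smooth, >= 0, supported in [-1, 1]
        t = np.asarray(t, dtype=float)
        inside = np.abs(t) < 1
        out = np.zeros_like(t)
        out[inside] = np.exp(-1.0 / (1.0 - t[inside]**2))
        return out

    for k in [2, 8, 32]:
        half = int(round(1.0 / (k * dx)))            # rho_k is supported in [-1/k, 1/k]
        t = np.arange(-half, half + 1) * dx
        rho_k = k * bump(k * t)
        rho_k /= np.sum(rho_k) * dx                  # enforce integral 1 on the grid
        conv = np.convolve(f, rho_k, mode='same') * dx
        print(k, np.sum(np.abs(conv - f)) * dx)      # L^1 error; decreases as k grows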
Section 6.4 The Plancherel Theorem 307

Theorem 2. The space of test functions 1>(JRn) is a dense subspace


of £1 (JRn ).

Proof. Let I E Ll(JRn), and let c > O. We wish to find an element of 1>(Rn)
within distance c of I. The function 1* Pk from the preceding theorem would
be a candidate, but it need not have compact support. So, we do the natural
thing, which is to define

I(x) if Ixl ~ m
Im(x) ={
o elsewhere

Then Im(x) -+ I(x) pointwise, and the Dominated Convergence Theorem (page
406) gives us J Ilml -+ J III. Consequently, we can select an integer m such that
‖f‖₁ - ‖f_m‖₁ < ε/2. Then

∫_{|x|>m} |f(x)| dx < ε/2

Now select a "mollifier" ρ; i.e., ρ is a nonnegative test function such that ∫ρ = 1.
As usual, let ρ_k(x) = kⁿρ(kx). By Theorem 1, there is an index k such that
‖f_m ∗ ρ_k - f_m‖₁ < ε/2. Hence

‖f - f_m ∗ ρ_k‖₁ ≤ ‖f - f_m‖₁ + ‖f_m - f_m ∗ ρ_k‖₁ < ε

Observe that f_m ∗ ρ_k has compact support and belongs to C^∞(ℝⁿ), by Lemma
2. Hence f_m ∗ ρ_k is in 𝒟.   ∎

Plancherel's Theorem. The Fourier operator :F defined originally


on S(JRn ) has a unique continuous extension defined on L2(JRn), and
this extended operator is an isometry of L2(JR n ) onto L2(JRn ).

Proof. For two functions in S we have the Parseval formula:

(f, g) = (f̂, ĝ),   or   ∫ f \overline{g} = ∫ f̂ \overline{ĝ}

This is proved with the following calculation, in which the inversion theorem is
used:

(f̂, ĝ) = ∫ f̂(y) \overline{ĝ(y)} dy = ∫∫ f(x) e^{-2πixy} \overline{ĝ(y)} dx dy
       = ∫ f(x) \overline{∫ e^{2πixy} ĝ(y) dy} dx = ∫ f(x) \overline{g(x)} dx = (f, g)

This leads to the isometry property for functions in S:

‖f‖₂² = ∫ |f|² = ∫ |f̂|² = ‖f̂‖₂²


308 Chapter 6 The Fourier Transform

Since S is dense in L2(JR n ) (Problem 4), F has a unique continuous extension


with the same bound, IIFII~ 1 (Problem 6). It is then easily seen that the
extension is also an isometry (Problem 7). •
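A numerical illustration of the isometry (not part of the text; SciPy assumed), using the pair f(x) = e^{-|x|}, f̂(y) = 2/(1 + 4π²y²) from Problem 5 of Section 6.3:

    import numpy as np
    from scipy.integrate import quad

    fhat = lambda y: 2.0 / (1.0 + 4*np.pi**2*y**2)

    lhs = 2*quad(lambda x: np.exp(-2*x), 0, np.inf)[0]    # ||f||_2^2, f(x) = e^{-|x|}
    rhs = 2*quad(lambda y: fhat(y)**2, 0, np.inf)[0]      # ||f-hat||_2^2
    assert abs(lhs - rhs) < 1e-8                          # both equal 1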
The extension of F referred to in this theorem is sometimes called the
Fourier-Plancherel transform. One must not assume that the usual formula for
1 can be used for I E L2(JRn), because the integrand in the usual formula

i(x) = f l(y)e-27riXY dy

need not be integrable (Le., in Ll). However, I is an L2 limit of Ll functions,


because U(JRn) is a dense subset of L2(JR n ). For example, we can use this
sequence:
I(x) if Ixl ~ m
Im(x) = {
o if Ixl > m
Since f belongs to L², f_m belongs to L¹. Indeed, letting χ_m be the characteristic
function of the ball {x : |x| ≤ m}, we have by the Cauchy-Schwarz inequality

∫ |f_m| = ∫ |f| χ_m ≤ ‖f‖₂ ‖χ_m‖₂ < ∞

The sequence [f_m] converges to f in the metric of L² because

‖f - f_m‖₂² = ∫ |f(x) - f_m(x)|² dx = ∫_{|x|>m} |f(x)|² dx → 0

It now follows that f̂ = lim f̂_m, the limit being taken in the L² sense. This state
of affairs is often expressed by writing

f̂(y) = L.I.M._{m→∞} ∫_{|x|≤m} f(x) e^{-2πixy} dx

In this equation, L.I.M. stands for "limit in the mean," and this refers to a limit
in the space L2.
Another procedure for generating a sequence that converges to in L2 is to 1
select an orthonormal basis [un] for L2, to express I in terms of the basis, and
to take Fourier transforms:

L (f, um)um
00

I =
m=O

1= L
00

(f, um)u;.
m=O

This manipulation is justified by the linearity and continuity of the Fourier


transform operator acting on L2. In order to be practical, this formula must
employ an orthonormal basis of functions whose Fourier transforms are known.
Section 6.4 The Plancherel Theorem 309

An example in JR is given by the Hermite functions hm of Problem 12. They


form an orthogonal basis for L2(JR). The functions Hm = hm/llhmil provide an
orthonormal basis, and by Problem 12,

f̂ = Σ_{m=0}^{∞} (f, H_m) Ĥ_m = Σ_{m=0}^{∞} (f, H_m)(-i)^m H_m

Having seen that the Fourier operator F is continuous from L²(ℝⁿ) to
L²(ℝⁿ) and from L¹(ℝⁿ) to L^∞(ℝⁿ), one might ask whether it is continuous
from Lᵖ(ℝⁿ) to L^q(ℝⁿ) in general, when 1/p + 1/q = 1. The answer is "Yes" if
p ≤ q. We quote without proof the Hausdorff-Young Theorem: If 1 ≤ p ≤ 2, and
if 1/p + 1/q = 1, then the Fourier operator is continuous from Lᵖ(ℝⁿ) to L^q(ℝⁿ),
and its norm is not greater than 1. For further information, consult [RS] and
[SW]. The exact value of the norm has been established by Beckner [Bec].

Problems 6.4
1.In Lemma 2, can we conclude that 1* 9 belongs to Co(JRn)?
2. Explain why Lemma 3 is not true mrthe space Loo(JRn).
3. Prove that S C L2(JRn).
4. Prove that S is dense in L2(JRn).
5. Prove that if I, 9 E S, then fIg = f 1'9.
6. Prove this theorem: Let Y be a deJllle subspace of a normed linear space' X. Let A E
.c(Y, Z), where Z is a Banach space, Then there is a unique A E .c(X, Z) such that
A I Y = A and IIAII = IIAII. Suggestions: If x EX, then there is a sequence y" E Y such
that y" -+ x. Put Ax = limAy". Show that the limit exists and is independent of the
sequence y".
7. In the situation of Problem 6, show that if A is an isometry, then so is A.
8. Show that neither of the inclusions Ll(JRn ) C L2(JRn), L2(JRn ) C Ll(JRn ) is valid.
9. Find an element of L2(JR) '- Ll(JR) and compute its Fourier-Plancherel transform.
10. Prove that the Fourier transform of the function

I(x) = {e-o aZ x ~0
x<o

1
is (21rix + a)-l. Here, a> O. Show that E L2(JR) '- Ll(JR).
11. Prove that the equation jg = 1*9holds for functions in L2(JRn).
12. The eigenvalues of:F : L2(JR) -t L2(JR) are ±1, ±i, and no others. Show that hm =
(-i)mhm, where h m is the Hermite function

Suggestions: Prove that hm+l(x) = h~(x) - 21rxhm (x). Then prove that hm+l(X) =
-i(hm)'(x)~+ 21rixhm (x). Show that the functions (-l)mhm obey the same recurrence
relation as h m .
13. For I and 9 in S, prove that

Then prove this for functions in L2(JRn).


14. Generalize Lemma 2 so that it applies to I E Ltoc(JR n ). Can the hypotheses on I be
relaxed?
310 Chapter 6 The Fourier Transform

15. Define f(x} = 0 for x < 1 and f(x} = X-I for x;:;, 1. Find 1.
Useful reference: Chapter 5
in [AS].
1
16. Is this reasoning correct? If f E L1, then is continuous. Since L1 is dense in L2, the
same conclusion must hold for f E L2.
17. The variance of a function f is defined to be liufli/llfli, where u(x} = x. Prove this
version of the Uncertainty Principle: The product of the variances of f and 1cannot be
less than 1/(411"}.

6.5 Applications of the Fqurier Transform

We will give some representative examples to show how the Fourier transform
can be used to solve differential equations and integral equations. Then, an
application to multi-variate interpolation will be presented. These are what
might be called direct applications, as contrasted with applications to other
branches of abstract mathematics.

Example 1. Let n = 1 and D = d/dx. If P is a polynomial, say P(λ) = Σ_{j=0}^{m} c_j λʲ,
then P(D) is a linear differential operator with constant coefficients:

(1)   P(D) = Σ_{j=0}^{m} c_j Dʲ = Σ_{j=0}^{m} c_j (2πi)ʲ (D/(2πi))ʲ

Consider the ordinary differential equation

(2) P{D)u = 9 -oo<x<oo

in which 9 is given and is assumed to be an element of £1(1R). Apply the Fourier


transform F to both sides of Equation (2). Then use Theorem 1 of Section 6.2
(page 296), which asserts that if u E S, then

(3) F[P(D)u] = P+F(u)

where P+(x) = P(27rix). The transformed version of Equation (2) is therefore

(4) P+F(u) = F(g)

The solution of Equation (4) is

(5) F(u) = F(g)/P+

The function u is recovered by taking the inverse transformation, if it exists:

(6)   u = F⁻¹[F(g)/P⁺]

Theorem 4 in Section 6.1, page 291, states that

(7) F(¢ * ¢) = F(¢) . F(¢)


Section 6.5 Applications of the Fourier Transform 311

An equivalent formulation, in terms of F- 1 , is

(8) ¢ * 1jJ = F- 1 [F(¢)· F(1jJ)]

If h is a function such that ĥ = 1/P⁺, then Equations (6) and (8) yield

(9)   u = g ∗ h

In detail,

(10)   u(x) = ∫_{-∞}^{∞} g(y)h(x - y) dy

The function h must be obtained by the equation h = F⁻¹(1/P⁺).

Example 2. This is a concrete case of Example 1, namely

(11)   u'(x) + bu(x) = e^{-|x|}     (b > 0, b ≠ 1)

The Fourier transform of the function g(x) = e^{-|x|} is ĝ(t) = 2/(1 + 4π²t²)
(Problem 5 of Section 6.3, page 304). Hence the Fourier transform of Equation
(11) is

(2πit + b)û(t) = 2/(1 + 4π²t²)

Solving for û, we have

û(t) = 2/[(1 + 4π²t²)(b + 2πit)]

By the Inversion Theorem,

u(t) = ∫_{-∞}^{∞} û(y) e^{2πity} dy = ∫_{-∞}^{∞} [2/((1 + 4π²y²)(b + 2πiy))] e^{2πity} dy

To simplify this, substitute z = 2πy, to obtain

u(t) = (1/π) ∫_{-∞}^{∞} e^{itz} / [(1 + z²)(b + iz)] dz

The integrand, call it f(z), has poles at z = i, -i, and ib. In order to evaluate
this integral, we use the residue calculus, as outlined at the end of this section.
Let the complex variable be expressed as z = x + iy. Then |e^{itz}| = e^{-ty}.
For t > 0 we see that

lim_{r→∞} sup { |z f(z)| : |z| = r, Im(z) ≥ 0 } = 0
Hence by Theorem 4 at the end of this section,

∫_{-∞}^{∞} f(z) dz = 2πi × (residue at i + residue at ib)

By partial fraction decomposition we obtain

f(z) = e^{itz} [ (2ib - 2i)^{-1}/(z - i) - (2ib + 2i)^{-1}/(z + i) + (i - ib²)^{-1}/(z - ib) ]

Hence the residues at i, -i, and ib are respectively

e^{-t}/(2i(b - 1)),    -e^{t}/(2i(b + 1)),    e^{-bt}/(i(1 - b²))

Thus for t > 0,

u(t) = (1/π) · 2πi [ e^{-t}/(2i(b - 1)) + e^{-bt}/(i(1 - b²)) ] = e^{-t}/(b - 1) + 2e^{-bt}/(1 - b²)

Similarly, for t < 0,

u(t) = e^{t}/(1 + b)   ∎
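The formulas obtained for u can be checked against the differential equation itself. The sketch below is not in the text (NumPy assumed); it uses b = 3 and a central-difference derivative:

    import numpy as np

    b = 3.0
    def u(t):
        t = np.asarray(t, dtype=float)
        pos = np.exp(-t)/(b - 1) + 2*np.exp(-b*t)/(1 - b**2)
        neg = np.exp(t)/(1 + b)
        return np.where(t > 0, pos, neg)

    t = np.linspace(-4, 4, 8001)
    t = t[np.abs(t) > 1e-6]                # avoid the kink at 0 when differencing
    h = 1e-6
    du = (u(t + h) - u(t - h)) / (2*h)     # central-difference derivative
    residual = du + b*u(t) - np.exp(-np.abs(t))
    assert np.max(np.abs(residual)) < 1e-4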
Example 3. Consider the integral equation

∫_{-∞}^{∞} k(x - s)u(s) ds = g(x)

in which k and g are given, and u is an unknown function. We can write

k ∗ u = g

After taking Fourier transforms and using Theorem 4 in Section 6.1 (page 290)
we have

k̂ û = ĝ

whence û = ĝ/k̂ and u = F⁻¹(ĝ/k̂).


For a concrete case, contemplate this integral equation:

∫_{-∞}^{∞} e^{-|x-s|} u(s) ds = e^{-x²/2}

Here, the functions k and g in the above discussion are

k(x) = e^{-|x|},   g(x) = e^{-x²/2}
From Problems 6.3.5 and 6.3.8 (page 304), we have these Fourier transforms:

k̂(x) = 2/(1 + 4π²x²)

(It turns out that we do not require ĝ.) Hence

û = ĝ/k̂ = ½(1 + 4π²x²)ĝ

To take the inverse transform, use the principle in Theorem 1 of Section 6.2
(page 296) that (P(D)g)^ = P⁺ · ĝ. We let P(x) = (1 - x²)/2, so that P⁺(x) =
P(2πix) = (1 + 4π²x²)/2. Then

û = P⁺ ĝ = (P(D)g)^

The inverse transform then gives us

u = P(D)g = ½(g - g''),   that is,   u(x) = (1 - x²/2) e^{-x²/2}   ∎
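The solution just found can be verified numerically in the original integral equation. The following sketch is illustrative only (not part of the text; SciPy assumed):

    import numpy as np
    from scipy.integrate import quad

    u = lambda s: (1 - s**2/2) * np.exp(-s**2/2)
    g = lambda x: np.exp(-x**2/2)

    for x in [-1.7, 0.0, 0.6, 2.3]:
        # split the integral at the kink s = x of e^{-|x - s|}
        left, _ = quad(lambda s: np.exp(-(x - s)) * u(s), -np.inf, x)
        right, _ = quad(lambda s: np.exp(-(s - x)) * u(s), x, np.inf)
        assert abs(left + right - g(x)) < 1e-8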


As another example of applications of the Fourier transform, we consider a
problem of multi-variate interpolation. First, what is meant by "multi-variate
interpolation"? Let us work, as usual, in ]Rn. Suppose that at a finite set of
points called "nodes" we have data, interpreted as the values of some unknown
function. We will assume that the nodes are all different from one another.
Since we will not need the components of the nodes, we can use the notation
Xl, X2, ... ,X m for the set of nodes. Let the corresponding data values be real
numbers AI, A2, ... , Am. We seek now an "interpolating" function for this infor-
mation. That will be some nice, smooth function that is defined everywhere and
takes the values Ai at the nodes Xi. (Polynomials are not recommended for this
task.) One way of obtaining a simple interpolating function is to start with a
suitable function f, and use linear combinations of its translates to do the job.
Thus, we will try to accomplish the interpolation with a function of the form
m
x>---t L,cjf(x - Xj)
j=l

When the interpolation conditions are imposed, we arrive at the equations


m

L, Cjf(Xi - Xj) = Ai (1 ~ i ~ m)
j=l

This is a system of m linear equations in the m unknowns c_j. How can we be sure
that the system has a solution? Since we want to be able to solve this problem
for any λ_i, we must have a nonsingular coefficient matrix. This can be called the
"interpolation matrix"; it is the matrix A_{ij} = f(x_i − x_j). A striking theorem
gives us an immense class of useful functions f to play the role described above.

Theorem 1. If f is the Fourier transform of a positive function
in L¹(ℝⁿ), then for any finite set of points x_1, x_2, ..., x_m in ℝⁿ the
matrix having elements f(x_i − x_j) will be positive definite (and hence
nonsingular).

Proof. Let f = ĝ, where g ∈ L¹(ℝⁿ) and g(x) > 0 everywhere. The interpola-
tion matrix in question must be shown to be positive definite. This means that
ū′Au > 0 for all nonzero vectors u in ℂᵐ. We undertake a calculation of this
quadratic form:

    ū′Au = Σ_{k=1}^{m} Σ_{j=1}^{m} ū_k A_{kj} u_j = Σ_{k=1}^{m} Σ_{j=1}^{m} ū_k u_j f(x_k − x_j)
         = Σ_{k=1}^{m} Σ_{j=1}^{m} ū_k u_j ∫ g(y) e^{−2πi y·(x_k − x_j)} dy = ∫ g(y) |h(y)|² dy

Here we have written

    h(y) = Σ_{j=1}^{m} u_j e^{2πi y·x_j}
So far, we have proved only that the interpolation matrix A is nonnegative
definite. How can we conclude that the final integral above is positive? It will
suffice to establish that the functions y ↦ e^{2πi y·x_j} form a linearly independent
set, for in our computation the vector u was not zero. Once we have the linear
independence, it will follow that |h(y)|² is positive somewhere in ℝⁿ, and by
continuity will be positive on an open set. Since g is positive everywhere, the
final integral above would have to be positive. The linear independence is proved
separately in two lemmas.  •

Lemma 1. Let λ_1, ..., λ_m be m distinct complex numbers, and let
c_1, ..., c_m be complex numbers. If Σ_{j=1}^{m} c_j e^{λ_j z} = 0 for all z in a subset
of ℂ that has an accumulation point, then Σ_{j=1}^{m} |c_j| = 0.

Proof. Use induction on m. If m = 1, the result is obvious, because e^{λ_1 z} is
not zero for any z ∈ ℂ. If the lemma has been established for a certain integer
m − 1, then we can prove it for m as follows. Let f(z) = Σ_{j=1}^{m} c_j e^{λ_j z}, and suppose
that f(z_k) = 0 for some convergent sequence [z_k]. Since f is an entire function,
we infer that f(z) = 0 for all z in ℂ. (See, for example, [Ti2] page 88, or [Ru3]
page 226.) Consider now the function

    F(z) = d/dz [ e^{−λ_m z} f(z) ] = d/dz Σ_{j=1}^{m} c_j e^{(λ_j − λ_m)z} = Σ_{j=1}^{m−1} c_j (λ_j − λ_m) e^{(λ_j − λ_m)z}

Since f = 0, we have F = 0. By the induction hypothesis, c_j(λ_j − λ_m) = 0 for
1 ≤ j ≤ m − 1. Since the λ_j are distinct, we infer that c_1 = ⋯ = c_{m−1} = 0.
The function f then reduces to f(z) = c_m e^{λ_m z}. Since f = 0, c_m = 0.  •

Lemma 2. Let w_1, ..., w_m be m distinct points in ℂⁿ. Let
c_1, ..., c_m be complex numbers. If Σ_{j=1}^{m} c_j e^{w_j·x} = 0 for all x in a
nonempty open subset of ℝⁿ, then Σ_{j=1}^{m} |c_j| = 0.

Proof. Let O be an open set in ℝⁿ having the stated property. Select ξ ∈ O
such that the complex inner products w_j·ξ are all different. This is possible
by the following reasoning. The condition on ξ can be expressed in the form
w_j·ξ ≠ w_k·ξ for 1 ≤ j < k ≤ m. This, in turn, means that ξ does not lie in any
of the sets

    H_{jk} = { x ∈ ℝⁿ : (w_j − w_k)·x = 0 }    (1 ≤ j < k ≤ m)

Each set H_{jk} is the intersection of two hyperplanes in ℝⁿ. (See Problem 4.)
Hence each H_{jk} is a set of Lebesgue measure 0 in ℝⁿ, and the same is true of
any countable union of such sets. The finite family of sets H_{jk} therefore cannot
cover the open set O, which must have positive measure. Now define, for t ∈ ℂ,
the function f(t) = Σ_{j=1}^{m} c_j e^{(w_j·ξ)t}. Since ξ ∈ O, our hypothesis gives us f(1) = 0.
Let U be a neighborhood of 1 in ℂ such that tξ ∈ O when t ∈ U. Since f(t) = 0
on U, Lemma 1 shows that Σ_{j=1}^{m} |c_j| = 0.  •
More information on the topic of interpolation can be found in the textbook
[CL]. Functions of the type f, as in Theorem 1, are said to be "strictly positive
definite on ℝⁿ." They are often used in neural networks, in the "hidden layers,"
where most of the heavy computing is done.
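
A brief Python sketch of this interpolation scheme may be helpful. It uses the Gaussian
f(x) = exp(−|x|²) of Problem 6 below; the nodes and data values are arbitrary illustrative
choices, and numpy is assumed. By Theorem 1 the interpolation matrix is positive definite,
so the linear system has a unique solution.

    import numpy as np

    # nodes in R^2 and data values (illustrative choices)
    nodes = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]])
    lam   = np.array([1.0, 2.0, 0.0, -1.0, 3.0])

    def f(r2):                       # Gaussian, written as a function of |x|^2
        return np.exp(-r2)

    # interpolation matrix A_ij = f(x_i - x_j)
    diff = nodes[:, None, :] - nodes[None, :, :]
    A = f(np.sum(diff**2, axis=2))
    c = np.linalg.solve(A, lam)      # coefficients c_j

    def interpolant(x):
        r2 = np.sum((x - nodes)**2, axis=1)
        return f(r2) @ c

    print([interpolant(p) for p in nodes])   # reproduces lam up to round-off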
The remainder of this section is devoted to a review of the residue calculus.
This group of techniques is often needed in evaluating the integrals that arise in
inverting the Fourier transform.

Theorem 2. Laurent's Theorem. Let f be a function that is
analytic inside and on a circle C in the complex plane, except for having
an isolated singularity at the center ζ. Then at each point inside C with
the exception of ζ we have

    f(z) = Σ_{n=−∞}^{∞} c_n (z − ζ)ⁿ,    where    c_n = (1/2πi) ∮_C f(z)/(z − ζ)^{n+1} dz

The coefficient c_{−1} is called the residue of f at ζ. By Laurent's theorem, the
residue is also given by

(12)    c_{−1} = (1/2πi) ∮_C f(z) dz

Example 4. The integral ∮_C e^z/z⁴ dz, where C is the unit circle, can be
computed with the principle in Equation (12). Indeed, the given integral is 2πi
times the residue of e^z/z⁴ at 0. Since

    e^z/z⁴ = ( 1 + z + z²/2! + z³/3! + ⋯ ) / z⁴

we see that the residue is 1/3! = 1/6 and the integral is πi/3.  •
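
Symbolic software can confirm such residue computations. A minimal sketch with sympy
(whose residue function computes exactly the coefficient c_{−1} of the Laurent series):

    import sympy as sp

    z = sp.symbols('z')
    print(sp.residue(sp.exp(z)/z**4, z, 0))                  # 1/6
    print(2*sp.pi*sp.I*sp.residue(sp.exp(z)/z**4, z, 0))     # I*pi/3, the integral above
    print(sp.residue(1/(z**2 + 1), z, sp.I))                 # -I/2, as in Example 5 below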



Theorem 3. The Residue Theorem. Let C be a simple closed
curve inside of which f is analytic with the exception of isolated singu-
larities at the points ζ_1, ..., ζ_m. Then (1/2πi) ∮_C f(z) dz is the sum of the
residues of f at ζ_1, ..., ζ_m.

Proof. Draw mutually disjoint circles C_1, ..., C_m around the singularities and
contained within C. The integral around the path shown in the figure is zero,
by Cauchy's integral theorem. (Figure 6.1a depicts the case m = 2.) Therefore,

    0 = ∮_C f(z) dz − ∮_{C_1} f(z) dz − ⋯ − ∮_{C_m} f(z) dz

In this equation, divide by 2πi and note that the negative terms are the residues
of f at ζ_1, ..., ζ_m.  •

Figure 6.1 (a), (b)

Example 5. Let us compute ∮_C dz/(z² + 1), where C is the circle described by
|z − i| = 1. By the preceding theorem, the integral is 2πi times the sum of the
residues inside C. We have

    f(z) = 1/(z² + 1) = 1/[(z + i)(z − i)] = (i/2)/(z + i) − (i/2)/(z − i)

The residue at i is therefore −i/2, and the value of the integral is π.  •

Example 6. Let us compute the integral in Example 5 when C is the circle
|z − i| = 3. This circle is large enough to enclose both singularities. The residue
at −i is i/2, and the sum of the residues is 0. The integral is therefore 0. (This
illustrates the next theorem.)  •

Theorem 4. If f is a proper rational function and if the curve C
encloses all the poles of f, then ∮_C f(z) dz = 0.

Proof. Write f = p/q, where p and q are polynomials. Since f is proper, the
degree of p is less than that of q. Hence the point at ∞ is not a singularity
of f. Now, C is the boundary of one region containing the poles, and it is
also the boundary of the complementary region in which f is analytic. Hence
∮_C f(z) dz = 0.  •

Theorem 5. Let f be analytic in the closed upper half-plane with
the exception of a finite number of poles, none of which are on the real
axis. Define

    M_r ≡ sup { |z f(z)| : |z| = r, Im(z) ≥ 0 }

If M_r converges to 0 as r → ∞, then (1/2πi) ∫_{−∞}^{∞} f(z) dz is the sum of the
residues at the poles in the upper half-plane.

Proof. Consider the region shown in Figure 6.1b, where C is the semicircular
arc and r is chosen so large that all the poles of f lying in the upper half-
plane are contained in the semicircular region. On C we have z = re^{iθ} and
dz = ire^{iθ} dθ. Hence

    | ∫_C f(z) dz | = | ∫_0^π f(re^{iθ}) i re^{iθ} dθ | ≤ π sup_{0≤θ≤π} |re^{iθ} f(re^{iθ})| ≤ π M_r

By Theorem 3,

    ∫_{−r}^{r} f(z) dz + ∫_C f(z) dz = 2πi × (sum of residues)

By taking the limit as r → ∞, we obtain the desired result.



Problems 6.5
1. Solve this integral equation

2. Solve this integral equation

    ∫_{−∞}^{x} u(s) ds − ∫_{x}^{∞} u(s) ds + u(x) = f(x)

3. What happens in Example 2 if b = 1? Is the solution u(t) = sinh t?


4. Prove that if w ∈ ℂⁿ, then w = u + iv for suitable points u and v in ℝⁿ. We will write
u = R(w) and v = I(w). Prove, then, that for x ∈ ℝⁿ, the equation x·w = 0 holds if and
only if x·R(w) = 0 and x·I(w) = 0.
5. Let g be analytic in a circle C centered at z₀, and let f be analytic in C except for having
a simple pole at z₀. What is the residue of gf at z₀?
6. Prove that arbitrary data at arbitrary nodes x_1, x_2, ..., x_m in ℝⁿ can be interpolated by
a function of the form x ↦ Σ_{k=1}^{m} c_k exp(−|x − x_k|²). (These are Gaussian functions.)

6.6 Applications to Partial Differential Equations

Example 1. The simplest case of the heat equation is

(1)    u_xx = u_t

in which the subscripts denote partial derivatives. The distribution of heat


in an infinite bar would obey this equation for 00 < x < 00 and t ~ 0. A fully
defined practical problem would consist of the differential equation (1) and some
auxiliary conditions. To illustrate, we consider (1) with initial condition

(2)    u(x, 0) = f(x)    (−∞ < x < ∞)

The function f gives the initial temperature distribution in the rod. We define
û(y, t) to be the Fourier transform of u in the space variable. Thus

    û(y, t) = ∫_{−∞}^{∞} u(x, t) e^{−2πixy} dx

Taking the Fourier transform in Equations (1) and (2) with respect to the space
variable, we obtain

(3)    −4π²y² û(y, t) = û_t(y, t),    û(y, 0) = f̂(y)

Here, again, we use the principle of Theorem 1 in Section 6.2, page 296: (P(D)u)^ =
P⁺û, where P⁺(x) = P(2πix).
Equation (3) defines an initial-value problem involving a first-order linear
ordinary differential equation for the function û(y, ·). (The variable y can be
ignored, or interpreted as a parameter.) We note that (û)_t = (u_t)^. The phe-
nomenon just observed is typical: Often, a Fourier transform will lead us from
a partial differential equation to an ordinary differential equation. The solution
of (3) is

(4)    û(y, t) = f̂(y) e^{−4π²y²t}

Now let us think of t as a parameter, and ignore it. Write Equation (4) as
û(y, t) = f̂(y) Ĝ(y, t), where Ĝ(y, t) = e^{−4π²y²t}. Using the principle that φ̂ · ψ̂ =
(φ * ψ)^ (Theorem 4 in Section 6.1, page 291), we have

(5)    u(·, t) = f(·) * G(·, t)

where G(·, t) is the inverse transform of y ↦ e^{−4π²y²t}. This inverse is
G(x, t) = (4πt)^{−1/2} e^{−x²/(4t)}, by Problem 8 of Section 6.3, page 304. Conse-
quently,

(6)    u(x, t) = (4πt)^{−1/2} ∫_{−∞}^{∞} f(x − z) e^{−z²/(4t)} dz

Example 2. We consider the problem

(7)    u_xx = u_t,    u(x, 0) = f(x),    u(0, t) = 0    (x ≥ 0, t ≥ 0)

This is a minor modification of Example 1. The bar is "semi-infinite," and one
end remains constantly at temperature zero. It is clear that f should have the
property f(0) = u(0, 0) = 0. Suppose that we extend f somehow into the interval
(−∞, 0), and then use the solution of the previous example. Then at x = 0 we
have

(8)    u(0, t) = (4πt)^{−1/2} ∫_{−∞}^{∞} f(−z) e^{−z²/(4t)} dz

The easiest way to ensure that this will be zero (and thus satisfy the bound-
ary condition in our problem) is to extend f to be an odd function. Then the
integrand in Equation (8) is odd, and u(0, t) = 0 automatically. So we define
f(−x) = −f(x) for x > 0, and then Equation (6) gives the solution for Equa-
tion (7).  •
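
Formula (6) of Example 1 is easy to test numerically. In the Python sketch below (numpy
and scipy assumed), the initial temperature is itself a Gaussian, f(x) = e^{−x²}, for which
the convolution in (6) can also be evaluated in closed form as (1 + 4t)^{−1/2} e^{−x²/(1+4t)};
the quadrature reproduces that value. The test points are illustrative choices.

    import numpy as np
    from scipy.integrate import quad

    def f(x):                       # initial temperature
        return np.exp(-x**2)

    def u(x, t):                    # formula (6)
        kernel = lambda z: np.exp(-z**2/(4*t)) / np.sqrt(4*np.pi*t)
        val, _ = quad(lambda z: f(x - z)*kernel(z), -np.inf, np.inf)
        return val

    def exact(x, t):                # closed form for this particular f
        return np.exp(-x**2/(1 + 4*t)) / np.sqrt(1 + 4*t)

    for (x, t) in [(0.0, 0.1), (1.0, 0.5), (-2.0, 1.0)]:
        print(u(x, t), exact(x, t))     # pairs agree to quadrature accuracy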
Example 3. Again, we consider the heat equation with boundary conditions:

(9)    u_xx = u_t,    u(x, 0) = f(x),    u(0, t) = g(t)    (x ≥ 0, t ≥ 0)

Because the differential equation is linear and homogeneous, the method of su-
perposition can be applied. We solve two related problems, viz.,

(10)    v_xx = v_t,    v(x, 0) = f(x),    v(0, t) = 0

(11)    w_xx = w_t,    w(x, 0) = 0,    w(0, t) = g(t)

The solution of (9) will then be u = v + w. The problem in (10) is solved in
Example 2. In (11), we take the sine transform in the space variable, using w^S
to denote the transformed function. With the aid of Problem 1, we have

    (w^S)_t(y, t) = −4π²y² w^S(y, t) + 2πy g(t),    w^S(y, 0) = 0

Again this is an ordinary differential equation, linear and of the first order. Its
solution is easily found to be

    w^S(y, t) = 2πy e^{−4π²y²t} ∫_0^t e^{4π²y²σ} g(σ) dσ

If w is made into an odd function by setting w(x, t) = −w(−x, t) when x < 0,
then we know from Problem 9 in Section 6.3 (page 304) that

    ŵ(y, t) = −2i w^S(y, t)

Therefore, by the Inversion Theorem (Section 6.3, page 303),

    w(x, t) = ∫_{−∞}^{∞} ŵ(y, t) e^{2πixy} dy

or

    w(x, t) = −4πi ∫_{−∞}^{∞} e^{2πixy} y e^{−4π²y²t} ∫_0^t e^{4π²y²σ} g(σ) dσ dy

To simplify this, let z = 2πy. Then

    w(x, t) = −(i/π) ∫_{−∞}^{∞} z e^{ixz} ∫_0^t e^{−z²(t−σ)} g(σ) dσ dz    •

Example 4. The Helmholtz Equation is

    Δu − gu = f

in which Δ is the Laplacian, Σ_{k=1}^{n} ∂²/∂x_k². The functions f and g are prescribed
on ℝⁿ, and u is the unknown function of n variables. We shall look at the special
case when g is the constant 1. To illustrate some variety in approaching such
problems, let us simply try the hypothesis that the problem can be solved with
an appropriate convolution: u = f * h. Substitution of this form for u in the
differential equation leads to

    Δ(f * h) − f * h = f

Carrying out the differentiation under the integral that defines the convolution,
we obtain

    f * (Δh) − f * h = f

Is there a way to cancel the three occurrences of f in this equation? After
all, L¹ is a Banach algebra, with multiplication defined by convolution. But
there are pitfalls here, since there is no unit element, and therefore there are no
inverses. However, the Fourier transform converts the convolutions into ordinary
products, according to Theorem 4 in Section 6.1 (page 291):

    f̂ · (Δh)^ − f̂ · ĥ = f̂

From this equation cancel the factor f̂, and then express (Δh)^ as in Example
2 in Section 6.2 (page 297):

    ĥ(x) = −1/(1 + 4π²|x|²)

The formula for h itself is obtained by use of the inverse Fourier transform, which
leads to

    h(x) = π^{n/2} ∫_0^∞ t^{−n/2} exp(−t − |πx|²/t) dt

The calculation leading to this is given in [Ev], page 187. In that reference, a
different definition of the Fourier transform is used, and Problem 6.1.24, page
293, can be helpful in transferring results among different systems. •

Problems 6.6
1. Define the Sine Transform by the equation

       f^S(t) = ∫_0^∞ f(x) sin(2πxt) dx

   Show that (f″)^S(t) = 2πt f(0) − 4π²t² f^S(t). (Two integrations by parts are needed, in
   addition to the assumption that f ∈ L¹.)
2. Define the Cosine Transform by the equation

       f^C(t) = ∫_0^∞ f(x) cos(2πxt) dx

   Show that (f″)^C(t) = −f′(0) − 4π²t² f^C(t).
3. A function can be decomposed into odd and even parts by writing f = f_o + f_e, in which

       f_o(x) = ½f(x) − ½f(−x),    f_e(x) = ½f(x) + ½f(−x)

   Show that f̂ = 2 f_e^C − 2i f_o^S.
4. If w^S is the sine transform in the first variable in the function (x, t) ↦ w(x, t), what is
   the difference between (w^S)_t and (w_t)^S?
5. Define a scaling operator by the equation (S_λ f)(x) = f(λx). Prove that D^α ∘ S_λ =
   λ^{|α|} S_λ ∘ D^α.
6. If the problem Δu − u = f is solved by the formula u = f * h, for a certain function h,
   how can we solve the problem Δu − c²u = f, assuming c > 0?

6.7 Tempered Distributions

Let us recall the definitions of two important spaces. The space 𝒟, the space of
"test functions," consists of all functions in C^∞(ℝⁿ) that have compact support.
(Of course, 𝒟 depends on n, but the notation does not show this.) In 𝒟 we
define convergence by saying that φ_j ⇝ 0 in 𝒟 if there is one compact set

containing the supports of all φ_j and if (D^α φ_j)(x) converges uniformly to 0 for
each multi-index α.
The space 𝒮 consists of all functions φ in C^∞(ℝⁿ) such that the function
P · D^α φ is bounded, for all polynomials P and for all multi-indices α. In 𝒮 we
define φ_j ⇝ 0 to mean that P · D^α φ_j converges uniformly to 0, for each P and
for each α.
It is clear that 𝒟 ⊂ 𝒮. A distribution T, being a continuous linear functional
on 𝒟, may or may not possess a continuous linear extension to 𝒮. If it does
possess such an extension, the distribution T is said to be tempered.

Theorem 1. Every distribution having compact support is tem-


pered.

Proof. Let T be a distribution with compact support K. Select ψ ∈ 𝒟 so
that ψ(x) = 1 for all x in an open neighborhood of K. We extend T by defining
T̃(φ) = T(φψ) when φ ∈ 𝒮. Is T̃ an extension of T? In other words, do we have
T̃(φ) = T(φ) for φ ∈ 𝒟? An equivalent question is whether T(ψφ − φ) = 0 for
φ ∈ 𝒟. We use the definition of the support of T to answer this. We must verify
only that the support of (1 − ψ)φ is contained in ℝⁿ \ K. This is true because
1 − ψ is zero on a neighborhood of K. The linearity of T̃ is trivial. For the
continuity, suppose that φ_j ⇝ 0 in 𝒮. Then for any α, D^α φ_j tends uniformly
to 0, and D^α(φ_j ψ) tends uniformly to 0 by Leibniz's Rule. Since there is one
compact set containing the supports of all ψφ_j, we can conclude that ψφ_j ⇝ 0
in 𝒟. By the continuity of T, T(ψφ_j) → 0, and hence T̃(φ_j) → 0.  •
If T is a tempered distribution, it is customary to make no notational dis-
tinction between T and its extension to 𝒮. If V is a continuous linear functional
on 𝒮, then its restriction V|𝒟 is a distribution. Indeed, the linearity of V|𝒟
is obvious, and the continuity is verified as follows. Let φ_j ⇝ 0 in 𝒟. Since
there is one compact set K containing the supports of all φ_j and since any poly-
nomial is bounded on K, we see that P(x)(D^α φ_j)(x) → 0 uniformly for any
multi-index α and for any polynomial P. Hence φ_j ⇝ 0 in 𝒮. Since V is con-
tinuous, (V|𝒟)(φ_j) → 0. This proves that the space 𝒮′ of all continuous linear
functionals on 𝒮 can be identified with the space of all tempered distributions.

Lemma. The set 𝒟(ℝⁿ) is dense in the space 𝒮(ℝⁿ).

Proof. Given an element φ in 𝒮, we must construct a sequence in 𝒟 converging
to φ (in the topology of 𝒮). For this purpose, select ψ ∈ 𝒟 such that ψ(x) = 1
whenever |x| ≤ 1. For j = 1, 2, ..., let ψ_j(x) = ψ(x/j). It is obvious that ψ_j φ
belongs to 𝒟. In order to show that these elements converge to φ in 𝒮, we must
prove that for any polynomial P and any multi-index α,

(1)    P · D^α(φ − φψ_j) → 0    uniformly in ℝⁿ

In the following, P and α are fixed. By the Leibniz Formula, the expression in
(1) equals

(2)    P · Σ_{β≤α} (α choose β) D^{α−β}φ · D^β(1 − ψ_j)

Since φ ∈ 𝒮, we must have P · D^{α−β}φ ∈ 𝒮 also. Hence |x|² |P(x) · D^{α−β}φ(x)| is
bounded, say by M. This bound can be chosen to serve for all β in the range
0 ≤ β ≤ α. Increase M if necessary so that, for all β in that same range and all j,

    sup_x |D^β(1 − ψ_j)(x)| ≤ M

Fix an index j. If |x| ≤ j, then 1 − ψ_j(x) = 0, and the expression in (2) is 0 at
x. If |x| ≥ j, then

    |P(x) · D^{α−β}φ(x)| ≤ M/|x|² ≤ M/j²

Also, for |x| ≥ j, |D^β(1 − ψ_j)(x)| ≤ M. Hence the expression in (2) has modulus
no greater than Σ_{β≤α} (α choose β) M²/j², which tends to 0 as j → ∞.
This establishes (1).  •



Theorem 2. Let f be a measurable function such that f/P ∈
L¹(ℝⁿ) for some polynomial P. Then f̃ is a tempered distribution.

Proof. For φ ∈ 𝒮, we have

    f̃(φ) = ∫_{ℝⁿ} f(x) φ(x) dx

Suppose that P is a polynomial such that f/P ∈ L¹. Write

    f̃(φ) = ∫ (f/P)(P · φ)

Since φ ∈ 𝒮, Pφ is bounded, and the integral exists. If φ_j ⇝ 0 in 𝒮, then
P(x)φ_j(x) → 0 uniformly on ℝⁿ, and consequently, f̃(φ_j) → 0.  •
Definition. The Fourier transform of a tempered distribution T is defined by
the equation T̂(φ) = T(φ̂) for all φ ∈ 𝒮. An equivalent equation is T̂ = T ∘ F,
where F is the Fourier operator mapping φ to φ̂.

Theorem 3. If T is a tempered distribution, then so is T̂. Moreover,
the map T ↦ T̂ is linear, injective, surjective, and continuous from
𝒮′ to 𝒮′.

Proof. The Fourier operator F is a continuous linear bijection from 𝒮 onto 𝒮
by Theorem 3 in Section 6.3, page 303. Also, F⁻¹ = F³. Since T̂ = T ∘ F, we
see that T̂ is the composition of two continuous linear maps, and is therefore
itself continuous and linear. Hence T̂ is a member of 𝒮′.

For the linearity of the map in question we write

    (aT + bU)^ = (aT + bU) ∘ F = aT ∘ F + bU ∘ F = aT̂ + bÛ

For the injectivity, suppose T̂ = 0. Then T ∘ F = 0 and T(φ) = 0 for all φ
in the range of F. Since F is surjective from 𝒮 to 𝒮, the range of F is 𝒮. Hence
T(φ) = 0 for all φ in 𝒮; i.e., T = 0.
For the surjectivity, let T be any element of 𝒮′. Then T = T ∘ F⁴ =
(T ∘ F³) ∘ F. Note that T ∘ F³ is in 𝒮′ by the first part of this proof.
For the continuity, let T_j ∈ 𝒮′ and T_j → 0. This means that T_j(φ) → 0 for
all φ in 𝒮. Consequently, T̂_j(φ) = T_j(φ̂) → 0 and T̂_j → 0.  •
Example. Since the Dirac distribution δ has compact support, it is a
tempered distribution. What is its Fourier transform? We have, for any φ ∈ 𝒮,

    δ̂(φ) = δ(φ̂) = φ̂(0) = ∫ φ(x) dx = 1̃(φ)

(Remember that the tilde denotes the distribution corresponding to a function.)
Thus, δ̂ = 1̃.  •
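
One can watch this identity emerge as a limit of ordinary Fourier transforms. In the sketch
below (sympy assumed; its fourier_transform uses the same convention ∫ f(x)e^{−2πixt} dx as
this book), a narrow Gaussian of unit integral approximates δ, and its transform approaches
the constant function 1.

    import sympy as sp

    x, t = sp.symbols('x t', real=True)
    eps = sp.symbols('epsilon', positive=True)

    # an approximate identity: unit-mass Gaussian of width eps
    g = sp.exp(-x**2/(2*eps**2)) / (eps*sp.sqrt(2*sp.pi))
    ghat = sp.fourier_transform(g, x, t)
    print(sp.simplify(ghat))          # exp(-2*pi**2*epsilon**2*t**2)
    print(sp.limit(ghat, eps, 0))     # 1, in accordance with  delta-hat = 1-tilde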

Theorem 4. If T is a tempered distribution and P is a polynomial,
then

    (P(∂)T)^ = P⁺ · T̂    and    P · T̂ = ( P(∂/(2πi)) T )^

Proof. For φ in 𝒮 we have

    (P(∂)T)^(φ) = (P(∂)T)(φ̂) = T(P(−D)φ̂) = T((P⁺φ)^) = T̂(P⁺φ) = (P⁺ · T̂)(φ)

We used Theorem 1 in Section 6.2, page 296, in this calculation. The other
equation is left as Problem 6.  •
Problems 6.7
1. Prove that every f in L1 (IRn) is a tempered distribution.
2. Prove that every polynomial is a tempered distribution.
3. Prove that if f is measurable and satisfies If I ~ IPI for some polynomial P, then f is a
tempered distribution.
4. Prove that the function f defined by f(x) = e^x (n = 1) is not a tempered distribution.
Note that f̃ is a distribution, however.
5. Let f(x) = (|x| + 1)^{−1}. Explain how it is possible for f̂ to belong to L²(ℝ) in spite of
the fact that the integral ∫_{−∞}^{∞} f(x) e^{−2πixt} dx is meaningless.
6. Prove the remaining part of Theorem 4.
7. Under what circumstances is the reciprocal of a polynomial a tempered distribution?
8. Define δ_a by δ_a(φ) = φ(a). Compute δ̂_a.
9. Prove that our definition of the Fourier transform of a tempered distribution is consistent
with the classical Fourier transform of a function.
10. Is the function f(x) = e^{−|x|} a member of 𝒮? Is f̃ a tempered distribution?
11. What flaw is there in defining T̂(φ) = T(φ̂) for T ∈ 𝒟′ and φ ∈ 𝒟?
12. Find the Fourier transforms of these functions, interpreted as tempered distributions:
(a) g(x) = x    (x ∈ ℝ)

(b) g(x) = 1    (x ∈ ℝⁿ)
(c) g(x) = x²    (x ∈ ℝⁿ).
13. Let T ∈ 𝒟′(ℝⁿ). Why can we not prove that T is tempered by using the density of 𝒟 in
𝒮 and Problem 6.4.6 on page 309?

6.8 Sobolev Spaces

This section provides an introduction to Sobolev spaces. These are Banach


spaces that have become essential in the study of differential equations and the
numerical processes for solving them, such as the finite element method. It is
therefore not surprising that the elements in these Sobolev spaces are functions
possessing derivatives of certain orders. The theory relies on distribution theory
and capitalizes on the fact that distributions have derivatives of all orders.
Our first step is to generalize the theory of distributions slightly by con-
sidering the domain of our test functions to be an arbitrary open set 0 in IRn.
Usually, 0 will remain fixed in any application of the theory. Our test functions
are Coo functions defined on 0 and having compact support. The support of a
test function is, then, a compact subset of O. The space of all these test functions
is denoted by 𝒟(Ω). Convergence in this space has the expected meaning: The
assertion φ_j ⇝ 0 means that there is a single compact subset K in Ω containing
the supports of all φ_j, and that on K we have D^α φ_j → 0 uniformly for each
multi-index α. The use of the distinctive symbol ⇝ is to remind us of the very
special concept of convergence employed in this context.
The dual space of 1)(0) is the space of distributions, denoted by 1)1(0). Its
elements are the continuous linear functionals defined on 1)(0). Continuity of a
distribution T means that T preserves limits: From the hypothesis "</>j converges
to </>" we may conclude that "T(</>j) converges to T(</»."
As before, we can create distributions from locally integrable functions.
Local integrability of a function f defined on Ω means that for every compact
set K in Ω, the integral ∫_K |f| is finite. We then write f ∈ L¹_loc(Ω). All these
definitions are in harmony with the definitions first seen on ℝⁿ, and indeed, we
include Ω = ℝⁿ as a special case. The distribution corresponding to a locally
integrable function f has been denoted by f̃; its definition is now

(1)    f̃(φ) = ∫_Ω f(x) φ(x) dx    (φ ∈ 𝒟(Ω))

Sometimes we do not belabor the distinction between f and f̃, and think of each
f in L¹_loc(Ω) as a distribution. The clear advantage of this is that such functions
will possess derivatives of all orders (in the distribution sense). Derivatives of
this type are called "weak derivatives" or "distribution derivatives" to distinguish
them from the classical derivatives, which are then called "strong" derivatives.
Thus if f ∈ L¹_loc(Ω) and if α is a multi-index, D^α f need not exist in the classical
sense, but ∂^α f̃ will always exist. Recall that in this book the symbol ∂^α was
reserved for distributions, and

(2)    (∂^α T)(φ) = T((−D)^α φ) = (−1)^{|α|} T(D^α φ)    (φ ∈ 𝒟(Ω))

Equation (2) can be written more succinctly as ∂^α T = T ∘ (−D)^α. Then, for
any polynomial P, we have P(∂)T = T ∘ P(−D).
The classical spaces L^p(Ω), for 1 ≤ p < ∞, are defined as follows. The
elements of L^p(Ω) are the Lebesgue measurable functions f defined on Ω for
which

(3)    ||f||_p ≡ { ∫_Ω |f(x)|^p dx }^{1/p} < ∞

The norm here can also be denoted by ||f||_{L^p(Ω)}. The resulting normed linear
space is complete. A fine point that complicates matters is that, in fact, the
elements of L^p are equivalence classes of functions, two functions being regarded
as equivalent (i.e., belonging to the same equivalence class) if they differ only
on a set of measure zero. Section 8.7 (pages 409ff) provides more information
about the L^p spaces.
Definition. To say that a distribution T belongs to L^p(Ω) means that T = g̃
for some g ∈ L^p(Ω). When this circumstance occurs, we write ||T||_p = ||g||_p.
With suitable caution, one can also write T ∈ L^p(Ω).
Let k ∈ ℤ⁺ and 1 ≤ p ≤ ∞. The Sobolev space W^{k,p}(Ω) consists of all
functions f in L^p(Ω) such that ∂^α f̃ ∈ L^p(Ω) for all multi-indices α satisfying
|α| ≤ k. More precisely, this space is

    { f ∈ L^p(Ω) : there exist g_α ∈ L^p(Ω) such that ∂^α f̃ = g̃_α for |α| ≤ k }

Observe that we need the fact (indicated in Problem 2) that each member of
L^p(Ω) is in L¹_loc(Ω). In the space W^{k,p}(Ω), a norm is defined by putting

(4)    ||f||_{k,p} = { Σ_{|α|≤k} ||∂^α f||_p^p }^{1/p}

The verification of the norm axioms is relegated to the problems. Notice that in
Equation (4) the conventions of the above definition are being used.
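
On a uniform grid the norm (4) can be approximated with finite differences. The Python
sketch below (numpy assumed; the grid, the difference scheme, and the test function are
illustrative choices) estimates the W^{1,2}(0, 2π) norm of f(x) = sin x, for which the exact
value is (∫ sin² + ∫ cos²)^{1/2} = √(2π).

    import numpy as np

    a, b, n = 0.0, 2*np.pi, 100000
    x = np.linspace(a, b, n)
    h = x[1] - x[0]
    f = np.sin(x)
    df = np.gradient(f, h)                  # finite-difference approximation to f'

    def lp_power(v, p=2):                   # crude Riemann-sum approximation to integral |v|^p
        return np.sum(np.abs(v)**p) * h

    sobolev_norm = (lp_power(f) + lp_power(df))**0.5    # Equation (4) with k = 1, p = 2
    print(sobolev_norm, np.sqrt(2*np.pi))                # both are approximately 2.5066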

Theorem 1. The Sobolev spaces Wk,p(n) are complete.

Proof. We begin by observing a useful implication:

(5)    if g_j → g in L^p(Ω), then g̃_j(φ) → g̃(φ) for every test function φ

To prove this, let φ be a test function. By the Hölder Inequality (Section 8.7,
page 409),

(6)    |g̃_j(φ) − g̃(φ)| ≤ ∫_Ω |g_j − g| |φ| ≤ ||g_j − g||_p ||φ||_q

Here 1 ≤ p < ∞ and 1/p + 1/q = 1. Inequality (6) establishes (5).
Now let [f_j] be a Cauchy sequence in W^{k,p}(Ω). By the definition of this
space, there exist functions g_j^α ∈ L^p(Ω) such that ∂^α f̃_j = g̃_j^α for |α| ≤ k. By
Problem 2, each g_j^α belongs to L¹_loc(Ω), and therefore g̃_j^α is a distribution. From
the definition of the norm (4), we see that for each α the sequence [g_j^α] has
the Cauchy property in L^p(Ω). Since L^p(Ω) is complete (by the Riesz-Fischer
Theorem, page 411), there exist functions f^α ∈ L^p(Ω) such that g_j^α → f^α in
L^p(Ω). By (5), we conclude that g̃_j^α → f̃^α. In particular, for α = 0, g̃_j^0 → f̃^0.
Consider the equation

(7)    g̃_j^α(φ) = (∂^α f̃_j)(φ) = (−1)^{|α|} f̃_j(D^α φ)    (φ ∈ 𝒟(Ω))

By letting j → ∞ in (7) and using the convergence of g̃_j^α and f̃_j, we obtain

    f̃^α(φ) = (−1)^{|α|} f̃^0(D^α φ) = (∂^α f̃^0)(φ)

This proves that ∂^α f̃^0 = f̃^α, and shows that for |α| ≤ k, ∂^α f̃^0 ∈ L^p(Ω). By the
definition of the Sobolev space, f^0 ∈ W^{k,p}(Ω). Finally, we write

    ||f_j − f^0||_{k,p}^p = Σ_{|α|≤k} ||∂^α f̃_j − ∂^α f̃^0||_p^p = Σ_{|α|≤k} ||g_j^α − f^α||_p^p → 0    •

The test function space 1>(n) is, in general, not a dense subspace of the
Sobolev space Wk,p(n). This is easy to understand: Each ¢ in 1>(n) has compact
support in n. Consequently, ¢(x) = 0 on the boundary of n. The closure
of 1>(n) in Wk,p(n) can therefore contain only functions that vanish on the
boundary of n. However, the special case n = JRn is satisfactory from this
standpoint:

Theorem 2. The test function subspace 1>(JR n ) is dense in


Wk,P(JR n ).

For the proof of Theorem 2, consult [Hu]. Some closely related theorems
are given in this section.
In many proofs we require a mollifier, which is a test function ψ having these
additional properties: ψ ≥ 0, ψ(x) = 0 when ||x|| ≥ 1, and ∫ψ = 1. Then one
puts ψ_j(x) = jⁿ ψ(jx). A "mollification of f with radius ε" is then ψ_j * f with
1/j < ε. These matters are discussed in Section 5.1 (pages 246ff) and Section
5.5 (pages 269ff).

Lemma 1. If f ∈ L^p(ℝⁿ), and if ψ_j is as described above, then
f * ψ_j → f in L^p(ℝⁿ), as j → ∞.

Proof. The case p = 1 is contained in the proof of Theorem 1 in Section 6.4,


page 306. Let B j be the support of 'l/Jj (i.e., the ball at 0 of radius 1/j). By

familiar calculations and HOlder's inequality (Section 8.7, page 409) we have

IU*ljIj)(x)-/(x)1 Ih [/(X- Y)-/(X)]ljIj(Y) dy l

r'
=
J

~ {L, If(x - y) - f(x)I' dy 111,11,


(Here q is the index conjugate to p: pq = p + q.) Hence,

IU * ljIj)(x) - l(xW ~ 111jIj II: h.I/(X - y) - l(x)I P dy


J

Thus, using the Fubini theorem (page 426), we have

We can write this in the form

111* ljIj - III: ~ 111jIj II: h.IIEYI - III: dy


J

where Ey denotes the translation operator defined by (Ey</»(x) = </>(x - y).


Recall, from Lemma 3 in Section 6.4 (page 306), that for a fixed element f in
LP(JRn ), the map y f------t Eyf is continuous from JRn to LP(JR n ). Hence there
corresponds to any positive c a positive 6 such that

Thus if I/j ~ 6, we shall have, from the above inequalities,

where J1(Bj ) is the Lebesgue measure of the ball of radius I/j. By enclosing
that ball in a "cube" of side 2/j, we see that J1(Bj ) ~ (2/j)n. Thus,

In order to estimate the right-hand side in the above inequality, use Problem 6
to get

c(}f/PlIljIjllq = c(}f/ Pjn (q- l l/qllljlllq = c2 njn{l-I/q-I/Pl Il1jlIlq = c2nll1jlllq




Theorem 3. The set of functions in Wk,P(f!) that are of class Coo


is dense in Wk,P(f!).

Proof. Let B 1 , B 2 , ... be a sequence of open balls such that Bi c f! for all i and
UBi = f!. The center and radius of Bi are indicated by writing' Bi = B(Xi, Ti).
Appealing to Theorem 1 in Section 5.7 (page 282), we obtain a partition of
unity subordinate to the collection of open balls. Thus, we have test functions
<Pi satisfying 0 ::::; <Pi ::::; 1. Further, SUPP(<Pi) C Bi , and for any compact set K
in f!, there exists an integer m such that L:~ <Pi = 1 on a neighborhood of K.
Now suppose that 1 E Wk,P(f!). Let 0 < E < 1/2. Eventually, we shall find a
Coo-function 9 in Wk,P(f!) such that 111 - gil < 2E.
Select a sequence Di .J- 0 such that B(Xi, (1 + Di)Ti) c f! for each i. Define
1; = <Pd· Let gi be a mollification of 1 with radius DiTi. At the same time,
we decrease Di if necessary to obtain the inequality Ilgi - 1;llw k ,p(fl) < E/2i.
(This step requires the preceding lemma.) Define 9 = L: gi. If 0 is a bounded
open set in f!, then 0 is compact, and for some integer m, L:::1 <Pi = 1 on a
neighborhood of o. On 0, we have
m m m
L 1; =L <pd = 1 L <Pi =1
i=1 i=1 i=1
Then we can perform the following calculation, in which the norm in the space
Wk,P(O) is employed (until the last step, where the domain f! enters):

m 00 00

Iii - gil = I L1; - Lgill = I LUi - gi)11


i=1 i=1 i=1
00 00

~ L 11f; - 9ill ~ L 11f; - 9illw ,p(fl) k


i=1 i=1
::::;c:/2+c:/4+···=c:

Another way of interpreting the space Wk,P(f!) will be described here. It
allows one to understand this space without an appeal to distributions. Notice,
to start with, that the set

v= {J E Ck(f!) : 111 Ilk ,P < oo}


is a linear subspace of Wk,P(f!). In drawing this conclusion it is necessary to
identify any classical derivative DC>. 1 with its distributional counterpart 8C>.(1).
Indeed, the elements of Wk,P(f!) have been defined to be distributions.
Since V C Wk,P(f!), the closure of Vis also a subspace, and it is denoted
here by Vk,P(f!). We can also characterize Vk,P(f!) as the completion of V in
the norm II· Ilk .
,P
The true state of affairs is quite simple, as stated in the next
theorem, for the proof of which we refer the reader to [Ad] or [Maz].

Theorem 4. The Meyers-Serrin Theorem. For 1 ≤ p < ∞,

    V^{k,p}(Ω) = W^{k,p}(Ω)

Embedding Theorems. Here we explore the relations that may exist be-
tween two Sobolev spaces, in particular, the relation of one such space being
continuously embedded in another.
For general normed linear spaces (E, ||·||_E) and (F, ||·||_F), we say that F
is embedded in E (and write F ↪ E) if
(a) F ⊂ E;
(b) There is a constant c such that ||f||_E ≤ c ||f||_F for all f ∈ F.
Part (a) of this definition is algebraic: It asserts that F is a linear subspace
of the linear space E. Part (b) is topological: It asserts that the identity map
I : F → E is continuous (i.e., bounded). Indeed, if

    ||I|| = sup{ ||If||_E : ||f||_F = 1 } = c

then the inequality in Part (b) follows.
Example 1. Every continuous function on the interval [a, b] is integrable.
Hence, this simple containment relation is valid: C[a, b] ⊂ L¹[a, b]. Is this an
embedding? We seek a constant c such that

    ||f||_{L¹[a,b]} ≤ c ||f||_∞    (f ∈ C[a, b])

The constant c = b − a obviously serves:

    ∫_a^b |f(x)| dx ≤ (b − a) max_{a≤x≤b} |f(x)|    •


Example 2. If 1 ≤ s < r < ∞ and if the domain Ω has finite Lebesgue
measure, then L^r(Ω) ↪ L^s(Ω). To prove this, start with an f in L^r(Ω) and
write r = ps. We may assume that f ≥ 0. Then f^s is in L^p(Ω) because
∫ (f^s)^p = ∫ f^r. Use the Hölder Inequality (page 409) with conjugate indices p and
q = p/(p − 1):

    ∫ f^s = ∫ f^s · 1 ≤ { ∫ f^{sp} }^{1/p} { ∫ 1 }^{1/q} = ||f||_r^s μ(Ω)^{1/q}

Taking the 1/s power in this inequality gives us

    ||f||_s ≤ ||f||_r μ(Ω)^{1/(sq)}    •
Theorem 5. Wl,2(1R) Y WO,OO(IR).

Proof. (In outline. For details, see [LL], Chapter 8.) Let f be an element of
W^{1,2}(ℝ). Since 𝒟(ℝ) is dense in W^{1,2}(ℝ), there exists a sequence [f_i] in 𝒟(ℝ)
converging to f in the norm of W^{1,2}. Each f_i has compact support and therefore
satisfies f_i(±∞) = 0. Since f_i f_i′ = (f_i²)′/2, we have

    f_i²(x) = 2 ∫_{−∞}^{x} f_i(s) f_i′(s) ds

By taking the limit of a suitable subsequence, we obtain the same equation for
f, at almost all points x. Then, with the aid of the Cauchy-Schwarz inequality
and the inequality between the geometric and arithmetic means, we have

    f²(x) ≤ 2 ||f||₂ ||f′||₂

Consequently,

    |f(x)|² ≤ ||f||₂² + ||f′||₂²

This establishes the embedding inequality:

    ||f||_∞ ≤ ||f||_{1,2}    •
The next theorem is one of many embedding theorems, and is given here as
just a sample from this vast landscape. It involves one of the spaces W_0^{k,p}(Ω).
This space is defined to be the closed subspace of W^{k,p}(Ω) generated by the set
of test functions 𝒟(Ω). For this theorem and many others in the same area,
consult [Zie] pages 53ff, or [Ad] pages 97ff.

Theorem 6. Let Ω be an open set in ℝⁿ. Let k and j be nonnegative
integers. If 1 ≤ p < ∞, kp ≤ n, and p ≤ r ≤ np/(n − kp), then

    W_0^{j+k,p}(Ω) ↪ W_0^{j,r}(Ω)
There are in the literature many theorems concerning compact embeddings


of Sobolev spaces. This means, naturally, that the identity map that arises in
the definition is a compact operator, i.e., it maps bounded sets to sets having
compact closure. An example of such a theorem is the next one, known by the
names Rellich and Kondrachov. It is obviously a counterpart of Theorem 6.

Theorem 7. Let Ω₀ be an open and bounded subset of an open
domain Ω in ℝⁿ. If j ≥ 0, k ≥ 1, 1 ≤ p < ∞, 0 < n − kp ≤ n, kp ≤ n,
and 1 ≤ r < np/(n − kp), then there is a compact embedding

    W_0^{j+k,p}(Ω) ↪ W^{j,r}(Ω₀)
For describing embeddings into spaces of continuous functions, define
C_b^m(Ω) to be the set of all functions f defined on Ω such that the derivatives D^α f
exist, are continuous, and are bounded on Ω, for all multi-indices α satisfying
|α| ≤ m. The norm adopted for this space is

    ||f||_{C_b^m(Ω)} = max_{|α|≤m} sup_{x∈Ω} |(D^α f)(x)|

Theorem 8. If k > nm/2, then W_0^{k,p}(Ω) ↪ C_b^m(Ω).

Theorem 8 (and others like it) can be used to establish that a distributional
solution of a partial differential equation is in fact a classical solution.

The Sobolev-Hilbert Spaces. The spaces W^{k,2}(Ω) are Hilbert spaces and
are conventionally denoted by H^k(Ω). For the special case Ω = ℝⁿ, we can
follow Friedlander [Fri], and define them for arbitrary real indices s as follows.
The space H^s(ℝⁿ) consists of all tempered distributions T for which T̂ is a
function and ∫_{ℝⁿ} (1 + |y|²)^s |T̂(y)|² dy < ∞.
Matters not touched upon here: (1) The importance of conditions on the
boundary of n for more powerful embeddings. (2) The Sobolev spaces for non-
integer values of k. (3) The duality theory of Sobolev spaces; i.e., identifying
their conjugate spaces as function spaces.
Problems 6.8
1. Prove that the norm defined in Equation (4) satisfies all the postulates for a norm.
2. Prove that LP(fl) C Ltoc(fl).
3. Prove that for 1 ≤ p < ∞, 𝒟(Ω) ⊂ L^p(Ω) ⊂ 𝒟′(Ω). Show that the embedding of L^p(Ω)
in 𝒟′(Ω) is continuous and injective.
4. Show that the function

       f(x) = 1 if |x| < 1,    f(x) = 0 if |x| ≥ 1

belongs to W^{0,p}(ℝ) but not to W^{1,p}(ℝ).
5. Prove this theorem of W.H. Young: If f ∈ L^p(ℝ) and g ∈ L¹(ℝ), then f * g ∈ L^p(ℝ), and

       ||f * g||_p ≤ ||f||_p ||g||_1

(See [HewS], page 414, for a stronger result.)


6. Prove that if φ ∈ 𝒟(ℝⁿ) and φ_j(x) = jⁿφ(jx), then ||φ_j||_q = j^{n(q−1)/q} ||φ||_q.
7. Why can we not define a more general Sobolev space, say W^{α,p}(Ω), where α is a multi-
index, and admit all functions f such that ∂^β f ∈ L^p(Ω) for all multi-indices β ≤ α? What
would the norm be? Would the space be complete?
8. Find the norm of the identity operator for these embeddings: (a) C[a, b] ↪ L²[a, b]; (b)
ℓ¹ ↪ ℓ²; (c) (ℝⁿ, ||·||_∞) ↪ (ℝⁿ, ||·||₂).
9. Prove that if m ~ k, then Wm,P(fl) C Wk,P(fl), and this set inclusion is actually a
continuous embedding. That is, the identity map of wm,p(fl) into Wk,P(fl) is continuous.
What is the relationship between the norms in these two spaces?
10. If f̃ ∈ L^p(Ω), does it follow that f ∈ L^p(Ω)?
11. Let 0 be an open set in JRn that contains the closed ball B(x,r). Prove that for some
p> r, B(x, p) is contained in O. (This was used in the proof of Theorem 2.J
12. Prove that the following formula defines an inner product in the space W^{k,2}(Ω):

       ⟨f, g⟩ = Σ_{|α|≤k} ∫_Ω (D^α f)(D^α g) dx

13. Let g and h be locally integrable functions on the open set Ω. If

       ∫_Ω g(x)φ(x) dx = ∫_Ω h(x) D^α φ(x) dx

for all φ ∈ 𝒟(Ω), what conclusion can be drawn?


14. Prove that if ¢ E 1>(fl), then extending this function to JRn by setting ¢(x) = 0 on
JRn '-.. fl produces a function in 1)(JRn).
15. Prove that if 101 ::;; k, then D" is a continuous linear transformation from Wk,P(fl) into
LP(fl).
Chapter 7

Additional Topics

7.1 Fixed-Point Theorems 333


7.2 Selection Theorems 339
7.3 Separation Theorems 342
7.4 The Arzela-Ascoli Theorem 347
7.5 Compact Operators and the Fredholm Theory 351
7.6 Topological Spaces 361
7.7 Linear Topological Spaces 367
7.8 Analytic Pitfalls 373

7.1 Fixed-Point Theorems

The Contraction Mapping Theorem was proved in Section 4.2, and was accompa-
nied by a number of applications that illustrate its power. In the literature, past
and present, there are many other fixed-point theorems, based upon a variety of
hypotheses. We shall sample some of these theorems here.
In reading this chapter, refer, if necessary, to Section 7.6 for topological
spaces, and to Section 7.7 for linear topological spaces.
Let us say that a topological space X has the fixed-point property if every
continuous map f : X -+ X has a fixed point (that is, a point p such that
f(p) = p). An important problem, then, is to identify all the topological spaces
that have the fixed-point property. A celebrated theorem of Brouwer (1910)
begins this program.

Theorem 1. Brouwer's Fixed-Point Theorem. Every compact


convex set in ]Rn has the fixed-point property.

We shall not prove this theorem here, but refer the reader to proofs in [DS]
page 468, [Vic] page 28, [Dug] page 340, [Schj] page 74, [Lax], [KA] page 636,
[Sma] page 11, [Gr] page 149, and [Smi] page 406.


Theorem 2. If a topological space has the fixed-point property,


then the same is true of every space homeomorphic to it.

Proof. Let spaces X and Y be homeomorphic. This means that there is a


homeomorphism h : X -» Y (a continuous map having a continuous inverse).
Suppose that X has the fixed-point property. To prove that Y has the fixed-
point property, let f be a continuous map of Y into Y. Then the map h- 1 0
f 0 h is continuous from X to X, and thus has a fixed point x. The equation
h- 1 (f(h(x))) = x leads immediately to f(h(x)) = h(x), and h(x) is a fixed point
of f. •
Lemma. If K is a compact set in a locally convex linear topological
space, and if U is a symmetric, convex, open neighborhood of 0, then
there is a finite set F in K and a continuous map P from K to the
convex hull of F such that x - Px E U for all x E K.

Proof. The family {x+U : x E K} is an open cover of K, and by compactness


there must exist points x_1, ..., x_n in K such that K ⊂ ⋃_{i=1}^{n} (x_i + U). Let h be
the Minkowski functional of U, defined by the equation

    h(x) = inf { λ : x/λ ∈ U, λ > 0 }

(See [KN], page 15.) Define

    g_i(x) = max{0, 1 − h(x − x_i)}    (1 ≤ i ≤ n)

It is elementary to verify that the inequality g_i(x) > 0 is equivalent to the
assertion that x − x_i ∈ U. Since each x in K belongs to at least one of the sets
x_i + U, we have Σ_{i=1}^{n} g_i(x) > 0 for all x ∈ K. Define

    Px = Σ_{i=1}^{n} g_i(x) x_i / Σ_{j=1}^{n} g_j(x) ≡ Σ_{i=1}^{n} θ_i(x) x_i

Since θ_i(x) ≥ 0 and Σθ_i(x) = 1, we see that Px is in the convex hull of
{x_1, x_2, ..., x_n} whenever x ∈ K. Since the condition θ_i(x) > 0 occurs if and
only if x_i ∈ x + U, we see that Px is a convex combination of points in x + U.
By the convexity of this set, Px ∈ x + U.  •

Theorem 3. The Schauder-Tychonoff Fixed-Point Theorem.


Every compact convex set in a locally convex linear topological Haus-
dorff space has the fixed-point property.

Proof. ([Day], [Sma]) Let K be such a set, and let f be a continuous map of
K into K. We denote the family of all convex, symmetric, open neighborhoods
of 0 by {Ua : 0: E A}. The set A is simply an index set, which we partia1ly
order by writing 0: ~ {3 when Ua C U{3. Thus ordered, A becomes a directed set,

suitable as the domain of a net. Since K is compact, the map J is uniformly


continuous, and there corresponds to any 0: E A an 0:' E A such that Uex' C Uex
and J(x) - J(y) E Uex whenever x - y E Uex'.
For any 0: E A, the preceding lemma provides a continuous map Pex such
that Pex(K) is a compact, convex, finite-dimensional subset of K. This map
has the further property that x - PexX E Uex for each x in K. The composition
Pex 0 J maps Pex(K) into itself. Hence, by the Brouwer Fixed-Point Theorem
(Theorem 1 above), Pex 0 J has a fixed point Zex in Pex(K). By the compactness
of K, the net [zex : 0: E A] has a cluster point z in K. In order to see that z is a
fixed point of J, write

(1)    f(z) − z = [f(z) − f(z_α)] + [f(z_α) − P_α f(z_α)] + [z_α − z]


For any β ∈ A, we can select α ∈ A such that α ≥ β and z − z_α ∈ U_{β′}. Then
f(z) − f(z_α) ∈ U_β. Also, f(z_α) − P_α f(z_α) ∈ U_α ⊂ U_β. Finally, z − z_α ∈ U_{β′} ⊂ U_β.
Equation (1) now shows that f(z) − z ∈ 3U_β. Since β is any element of A,
f(z) = z. Theorem 1 in Section 7.7 (page 368) justifies this last conclusion.  •

Corollary. If a continuous map is defined on a domain D in a


locally convex linear topological Hausdorff space and takes values in a
compact, convex subset of D, then it has a fixed point.

Proof. Let F : D --+ K, where K is a compact, convex set in D. Then the


restriction of F to K is a continuous map of K to K. By the Schauder-Tychonoff
Theorem, FIK has a fixed point. •
In Section 4.2 it was shown how fixed-point theorems can lead to existence
proofs for solutions of differential equations. This topic is taken up again here.
We consider an initial-value problem for a system of first-order differential equa-
tions:
    x_i′(t) = f_i(t, x_1(t), ..., x_n(t))    (1 ≤ i ≤ n)
    x_i(0) = 0    (1 ≤ i ≤ n)
This is written more compactly in the form

(2)    x′(t) = f(t, x(t)),    x(0) = 0

where x = (x_1, x_2, ..., x_n) and f = (f_1, f_2, ..., f_n).
Although the choice of initial values Xi(O) = 0 may seem to sacrifice gen-
erality, these initial values can always be obtained by making simple changes of
variable. Changing t to t - a shifts the initial point, and changing Xi to Xi - Ci
shifts the initial values.
The space C_n[a, b] consists of n-tuples of functions in C[a, b]. If x =
(x_1, ..., x_n) ∈ C_n[a, b], we write

    ||x||_∞ = sup_{a≤t≤b} ||x(t)||₁

where ||·||₁ denotes the ℓ¹-norm on ℝⁿ. That is,

    ||u||₁ = Σ_{i=1}^{n} |u_i|    if u = (u_1, u_2, ..., u_n) ∈ ℝⁿ
Theorem 4. Let f(t, u) be defined for 0 ≤ t ≤ a and for u ∈ ℝⁿ
such that ||u||₁ ≤ r. Assume that on this domain f is continuous and
satisfies ||f(t, u)||₁ ≤ r/a. Then the initial-value problem (2) has a
solution x in C_n[0, a], and ||x||_∞ ≤ r.

Proof. Refer to Section 4.2, page 179, where an initial-value problem is shown
to be equivalent to an integral equation. In the present circumstances, the
integral equation arising from Equation (2) is

(3)    x(t) = ∫_0^t f(s, x(s)) ds

Equation (3) presents us with a fixed-point problem for the nonlinear operator
A defined by

    (Ax)(t) = ∫_0^t f(s, x(s)) ds
The domain of A is taken to be

    D = { x ∈ C_n[0, a] : ||x||_∞ ≤ r }

First, we shall prove that A maps D into D. Let x ∈ D and y = Ax. Since
||x||_∞ ≤ r, the inequality ||x(s)||₁ ≤ r follows for all s in the interval [0, a].
Hence

    ||y(t)||₁ = Σ_{i=1}^{n} |y_i(t)| = Σ_{i=1}^{n} | ∫_0^t f_i(s, x(s)) ds | ≤ Σ_{i=1}^{n} ∫_0^a |f_i(s, x(s))| ds
             = ∫_0^a ||f(s, x(s))||₁ ds ≤ a (r/a) = r

This shows that ||y||_∞ ≤ r.


The next step is to prove that A(D) is equicontinuous. If x and y are as in
the preceding paragraph, and if 0 ≤ t₁ ≤ t₂ ≤ a, then

    ||y(t₂) − y(t₁)||₁ = Σ_{i=1}^{n} |y_i(t₂) − y_i(t₁)| ≤ ∫_{t₁}^{t₂} ||f(s, x(s))||₁ ds ≤ (r/a)(t₂ − t₁)

The set A(D) is an equicontinuous subset of the bounded set D in C_n[0, a]. By
the Ascoli Theorem (Section 7.4, page 349), the closure of A(D) is compact.
By Mazur's Theorem (Theorem 10, below), the closed convex hull H of A(D)
is compact. Since D is closed and convex, H ⊂ D. The preceding corollary is
therefore applicable, and A has a fixed point x in H. Then ||x||_∞ ≤ r, and x
solves the initial-value problem.  •
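
Although the proof above relies on a topological fixed-point theorem, a fixed point of the
operator A can often be found in practice by simply iterating x ← Ax. The Python sketch
below (numpy assumed) does this for the scalar test problem x′ = cos(t + x), x(0) = 0 on
[0, 1]; the example, the discretization, and the stopping tolerance are illustrative choices,
not part of the theorem.

    import numpy as np

    a, n = 1.0, 2000
    t = np.linspace(0.0, a, n)
    dt = t[1] - t[0]

    def f(s, x):                        # right-hand side of x' = f(t, x), here n = 1
        return np.cos(s + x)

    x = np.zeros(n)                     # start from the zero function
    for _ in range(50):                 # iterate  x <- Ax,  (Ax)(t) = integral_0^t f(s, x(s)) ds
        integrand = f(t, x)
        x_new = np.concatenate(([0.0],
                np.cumsum(0.5*(integrand[1:] + integrand[:-1])*dt)))   # trapezoid rule
        if np.max(np.abs(x_new - x)) < 1e-12:
            break
        x = x_new

    # residual check: x'(t) - f(t, x(t)) should be small in the interior of [0, a]
    print(np.max(np.abs(np.gradient(x, dt)[1:-1] - f(t, x)[1:-1])))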

Theorem 5. There is no continuous mapping of the closed unit


ball in IRn to its boundary that leaves all boundary points fixed. (In
other words, there is no "retraction" of the unit ball in IRn onto its
boundary.)

Proof. Let En be the ball and sn-l the sphere that is its boundary. Suppose
that f : En -+ sn-l, that f is continuous, and that f(x) = x for all x E sn-l.
Let 9 be the antipodal map on sn-l, given by g(x) = -x. Then go f has
no fixed point (in violation of the Brouwer Fixed-Point Theorem). To see this,
suppose g(f(z)) = z. Then f(z) = -z and 1 = Ilf(z)11 11- zll Ilzll·
= = Thus
z E sn-l. The point z contradicts our assumption that f(x) = x on sn-l. •
The next theorem is a companion to the corollary of Theorem 3. Notice that
the hypothesis of convexity has been transferred from the range to the domain
of f.

Theorem 6. Let D be a convex set in a locally convex linear


topological Hausdorff space. If f maps D continuously into a compact
subset of D, then f has a fixed point.

Proof. As in the proof of Theorem 3, we use the family of neighborhoods Ua.


Let K be a compact subset of D that contains f(D). Proceed as in the proof
of Theorem 3, using the same set of neighborhoods U_α. By the lemma, for each
α there is a finite set F_α in K and a continuous map P_α : K → co(F_α) such that
x − P_α x ∈ U_α for each x ∈ K. If x ∈ co(F_α), then x ∈ D, f(x) ∈ K, and
P_α(f(x)) ∈ co(F_α). Thus P_α ∘ f maps the compact, convex, finite-dimensional
set co(F_α) into itself. By the Brouwer Theorem, P_α ∘ f has a fixed point z_α
in co(F_α). Then f(z_α) lies in the compact set K, and the net [f(z_α) : α ∈ A]
has a cluster point y in K. We will show that f(y) = y by establishing that
f(y) − y ∈ U_α for all α. Theorem 1 in Section 7.7, page 368, applies here.
Let α be given. Select β ≥ α so that U_β + U_β ⊂ U_α. By the continuity of
f at y, select γ ≥ β so that f(y) − f(x) ∈ U_β whenever x ∈ K and y − x ∈ U_γ.
Select δ ≥ γ so that U_δ + U_δ ⊂ U_γ. Select ε ≥ δ so that f(z_ε) ∈ y + U_δ. Then
we have

    y − z_ε = [y − f(z_ε)] + [f(z_ε) − P_ε f(z_ε)] ∈ U_δ + U_ε ⊂ U_δ + U_δ ⊂ U_γ

Hence f(y) − f(z_ε) ∈ U_β. Furthermore,

    f(y) − y = [f(y) − f(z_ε)] + [f(z_ε) − y] ∈ U_β + U_δ ⊂ U_β + U_β ⊂ U_α    •



Theorem 7. Rothe's Theorem. Let B denote the closed unit


ball of a normed linear space X. If f maps B continuously into a
compact subset of X and if f(8B) c B, then f has a fixed point.

Proof. Let r denote the radial projection into B defined by r(x) = x if Ilx/1 ~ 1
r( I I I I
and x) = x I x if x > 1. This map is continuous (Problem 1). Hence 0 f r
maps B into a compact subset of B. By Theorem 6, r 0 f has a fixed point x in
B. If IIxll
= 1, then IIf(x)1I = 1 by hypothesis, and we have x = r(f(x)) = f(x)
by the definition of If r. IIxll
< 1, then IIr(f(x))1I < 1 and x = r(f(x)) = f(x),
again by the definition of r. •

Theorem 8. Let B denote the closed unit ball in a normed space


X. Let {It : 0 ~ t ~ I} be a family of continuous maps from B into
one compact subset of X. Assume that
(i) fo(8B) c B.
(ii) The map (t,x) >-+ ft(x) is continuous on [0,1] x B.
(iii) No ft has a fixed point in 8B.
Then II has a fixed point in B.

Proof. (From [Sma]) If 0 < £ < 1, define

IIxll ~ 1- £

1- £ ~ IIxll ~ 1
Notice that gc is continuous, since the two formulas agree when = 1 - £. IIxll
If x E 8B, then IIxll
= 1 and gc(x) = fo(x) E B. Thus f maps 8B into B. If
K is a compact set containing all the images ft(B), then gc(B) c K, by the
definition of gc. The map gc satisfies the hypotheses of Theorem 7, and gc has
a fixed point Xc in B.
I I
We now shall prove that for all sufficiently small £, Xc ~ 1 - £. If this
is not true, then we can let £ converge to zero through a suitable sequence of
values and have, for each £ in the sequence, IIx l
c > 1 - £. Since gc(x c ) = xc,
we see that Xc is in K. By compactness, we can assume that the sequence of £'s
has the further properties Xc ~ Xo and (1 c -lIx /D/£
~ t, where Xo E K and
t E [0, 1]. By the definition of gc'

f(1-IIX c ll)/c (//:://) = Xc

In the limit, we have ft(x o ) = Xo and IIxo l = 1, in contradiction of hypothesis


(iii) .
We now know that II I ~ 1 -
Xc £ for all sufficiently small £. Thus, for such
values of £,

The points x€ belong to K, and for any cluster point we will have x = JI(x) .•

Problems 7.1

1. Prove that the radial projection defined in the proof of Theorem 7 is continuous.
2. Prove Theorem 7 for an arbitrary closed convex set that contains 0 as an interior point.
Hint: Replace the norm by a Minkowski functional as in the proof of the lemma.
3. In Theorem 6 assume that D is closed. Show that the theorem is now an easy corollary
of Theorem 3, by using the closed convex hull of K and Mazur's Theorem.
4. Prove that the unit ball in £2(Z) does not have the fixed-point property by following
this outline. Points in £2(Z) are functions on Z such that L Ix(n)12 < 00. Let 8 be
the element in P(Z) such that 8(0) = 1, and 8(n) = 0 otherwise. Let A be the linear
operator defined by (Ax)(n) = x(n + 1). Define f(x) = (1 - Ilxll)8 + Ax. This function
maps the unit ball into itself continuously but has no fixed point. This example is due
to Kakutani.
5. In IR n , define B = {x : 0 < IIxll ~ I} and S = {x : Ilxll = I}. Is there a continuous
map f : B -+ S such that f(x) = x when xES? (Cf. Theorem 5.)
6. In an alternative exposition of fixed-point theory, Theorem 5 is established first, and
then the Brouwer theorem is proved from it. Fill in this outline of such a proof. Suppose
f : B n -+ B n is continuous and has no fixed point. Define a retraction 9 of B n onto
sn-l as follows. Let g(x) be the point where the ray from f(x) through x pierces sn-l.

7. In 1904, Bohl proved that the "cube" K = {x E IR n : IIxlioo ~ I} has this property: If f
maps K continuously into K and maps no point to 0, then for some x on the boundary
of K, f(x) is a negative multiple of x. Using Bohl's Theorem, prove that the boundary of
K is not a retract of K (and thus substantiate the claim that Bohl deserves much credit
for the Brouwer Theorem).

7.2 Selection Theorems

Let X and Y be two topological spaces. The notation 2Y denotes the family
of all subsets of Y. Let <I>: X -t 2Y . Thus, for each x E X, <I>(x) is a subset of
Y. Such a map is said to be set-valued. A selection for <I> is a map f : X -t Y
such that f(x) E <I>(x) for each x E X. Thus f "selects" an element of <I>(x),
namely f(x). If <I> (x ) is a nonempty subset of Y for each x EX, then a selection
f must exist. This is one way of expressing the axiom of choice. In the setting
adopted above, one can ask whether <I> has a continuous selection. The Michael
Selection Theorem addresses this question.
Here is a concrete situation in which a good selection theorem can be used.
Let X be a Banach space, and Y a finite-dimensional subspace in X. For each
x EX, we define the distance from x to Y by the formula

    dist(x, Y) = inf_{y∈Y} ||x − y||

Since Y is finite-dimensional, an easy compactness argument shows that for each


x, the set
    Φ(x) = { y ∈ Y : ||x − y|| = dist(x, Y) }

is nonempty. That is, each x in X has at least one nearest point (or "best
approximation") in Y. In general, the nearest point will not be unique. See the
sketch in Figure 7.1 for the reason.

Figure 7.1

In the sketch, the box with center at 0 is the unit ball. The line of slope 1
represents a subspace Y. The small box is centered at a point x outside Y. That
box is the ball of least radius centered at x that intersects Y. The intersection is
elI(x). The set elI(x) is the set of all best approximations to x in Y. In this case
elI(x) is convex, since it is the intersection of a subspace with a ball. It is also
closed, by the definition of cP and the continuity of the norm. Now we ask, is
there a continuous map f : X -+ Y such that for each x, f(x) is a nearest point
to x in Y? One way to answer such questions is to invoke Michael's theorem, to
which we now turn.
First some definitions are required. An open covering of a topological
space X is a family of open sets whose union is X. One covering B is a re-
finement of another A if each member of B is contained in some member of
A. A covering B is said to be locally finite if each point of X has a neigh-
borhood that intersects only finitely many members of B. A Hausdorff space X
is paracompact if each open covering of X has a refinement that is an open
and locally finite covering of X ([KelJ, page 156). Clearly, a compact Hausdorff
space is paracompact.
It is a nontrivial and useful fact that all metric spaces are paracompact
([KelJ, page 156). In many applications this obviates the proving of paracom-
pactness by means of special arguments.
Given the set-valued mapping Φ : X → 2^Y and a subset U in Y, we put

    Φ⁻(U) = { x ∈ X : Φ(x) ∩ U is nonempty }

Finally, we declare Φ to be lower semicontinuous if Φ⁻(U) is open in X

whenever U is open in Y.

Theorem 1. The Michael Selection Theorem. Let cP be a lower


semicontinuous set-valued map defined on a paracompact topological
space and taking as values nonempty closed convex sets in a Banach
space. Then cP has a continuous selection.

For the proof of this theorem, we refer the reader to [Michl] and [Mich2]. As an
application of Michael's theorem, we give a result about approximating possibly
discontinuous maps by continuous ones.

Theorem 2 Let X be a paracompact space, Y a Banach space, and


H a closed subspace in Y. Suppose that f : X -+ Y is continuous and
g : X → H is bounded. Then for each ε > 0 there is a continuous map
g̃ : X → H that satisfies

(1)    sup_{x∈X} ||f(x) − g̃(x)|| ≤ sup_{x∈X} ||f(x) − g(x)|| + ε

Thus when approximating the continuous map f by the bounded map
g, we can find a continuous map g̃ that is almost as good as g.

Proof. Let A denote the number on the right in Inequality (1). For each x EX,
define
cp(x) = {h EH: Ilf(x) - hll.~ A}

This set is nonempty because g(x) E cp(x). (Notice that 9 is a selection for cP
but not necessarily a continuous selection.) The set cp(x) is closed and convex
in the Banach space H.
We shall prove that <I> is lower semicontinuous. Let U be open in H. It is to
be shown that cP- (U) is open in X. Let x E cP- (U). Then cp( x) n U is nonempty.
Select h in this set. Then hE U and Ilf(x) - hll
~ A. Also Ilf(x) - g(x)11 < A.
So, by considering the line segment from h to g(x), we conclude that there is
E
an h' U such that Ilf(x) - h'li
< A. Since f is continuous at x, there is a
neighborhood N of x such that

Ilf(u) - f(x)11 < A -llf(x) - h'li (u EN)

By the triangle inequality, Ilf(8) - h'li


< A when 8E
N. This proves that
h' E CP(8), that CP(8) n U is nonempty, that 8 E CP-(U), that N C CP-(U), that
CP-(U) is open, and that cP is lower semicontinuous.
Now apply Michael's theorem to obtain a continuous selection 9 for CPo Then
9 is a continuous map of X into H and satisfies g(x) E cp(x) for all X. Hence 9
satisfies (1). •

Another important theorem that follows readily from Michael's is the the-
orem of Bartle and Graves:

Theorem 3. The Bartle-Graves Theorem. A continuous linear


map of one Banach space onto another must have a continuous (but
not necessarily linear) right inverse.

Proof. Let A : X → Y, as in the hypotheses. Since A is surjective, the
equation Ax = y has solutions x for each y ∈ Y. At issue, then, is whether a
continuous choice of x can be made. It is clear that we should set

      Φ(y) = {x ∈ X : Ax = y}

Obviously, each set Φ(y) is closed, convex, and nonempty. Is Φ lower semicon-
tinuous? Let O be open in X. We must show that the set Φ⁻(O) is open in
Y. But Φ⁻(O) = A(O) by a short calculation. By the Interior Mapping Theo-
rem (Section 1.8, page 48), A(O) is open. Thus Φ is lower semicontinuous, and
by Michael's theorem, a continuous selection f exists. Thus f(y) ∈ Φ(y), or
A(f(y)) = y. •

In the literature there are many selection theorems that involve measur-
able functions instead of continuous ones. If X is a measurable space and Y a
topological space, a map Φ : X → 2^Y is said to be weakly measurable if the
set
      {x ∈ X : Φ(x) ∩ O is not empty}
is measurable in X for each open set O in Y. (For a discussion of measurable
spaces, see Section 8.1, pages 381ff.) The measurable selection theorem of Ku-
ratowski and Ryll-Nardzewski follows. Its proof can be found in [KRN], [Part],
and [Wag].

Theorem 4. Kuratowski and Ryll-Nardzewski Theorem. Let
Φ be a weakly measurable map of X to 2^Y, where X is a measurable
space and Y is a complete, separable metric space. Assume that for
each x, Φ(x) is closed and nonempty. Then Φ has a measurable selec-
tion. Thus, there exists a function f : X → Y such that f(x) ∈ Φ(x)
for all x, and f⁻¹(O) is measurable for each open set O in Y.

7.3 Separation Theorems

The next three theorems are called "separation theorems." They pertain
to disjoint pairs of convex sets, and to the positioning of a hyperplane so that
the convex sets are on opposite sides of the hyperplane. In ℝ², the hyperplanes
are lines, and simple figures show the necessity of convexity in carrying out
this separation. In Theorem 3, one can see the necessity of compactness by
considering one set to be the lower half plane and the other to be the set of
points (x, y) for which y ≥ x⁻¹ and x > 0.

Theorem 1. Let X be a normed linear space and let K be a
convex subset of X that contains 0 as an interior point. If z ∈ X \ K,
then there is a continuous linear functional φ defined on X such that
for all x ∈ K, φ(x) ≤ 1 ≤ φ(z).

Proof. Again, we need the Minkowski functional of K. It is

      p(x) = inf{λ : λ > 0 and x/λ ∈ K}

We prove now that p(x + y) ≤ p(x) + p(y) for all x and y. Select λ, μ > 0 so
that x/λ and y/μ are in K. By the convexity of K,

      (x + y)/(λ + μ) = [λ/(λ + μ)](x/λ) + [μ/(λ + μ)](y/μ) ∈ K

Hence p(x + y) ≤ λ + μ. Taking the infima over λ and μ, we obtain p(x + y) ≤
p(x) + p(y).
Next we prove that for λ ≥ 0 the equation p(λx) = λp(x) is true.
Select μ > 0 so that x/μ ∈ K. Then λx/(λμ) ∈ K and p(λx) ≤ λμ. Taking the
infimum over μ, we conclude that p(λx) ≤ λp(x). From this we obtain the reverse
inequality by writing λp(x) = λp(λ⁻¹λx) ≤ λλ⁻¹p(λx) = p(λx).
Now define a linear functional φ on the one-dimensional subspace generated
by z by writing
      φ(λz) = λp(z)     (λ ∈ ℝ)
If λ ≥ 0, then φ(λz) = p(λz). If λ < 0, then φ(λz) = λp(z) ≤ 0 ≤ p(λz). Hence
φ ≤ p. By the Hahn-Banach Theorem (Section 1.6, page 32), φ has a linear
extension (denoted also by φ) that is dominated by p. For each x ∈ K we have
φ(x) ≤ p(x) ≤ 1. As for z, we have φ(z) = p(z) ≥ 1, because if p(z) < 1, then
z/λ ∈ K for some λ ∈ (0, 1), and by convexity the point
      z = λ(z/λ) + (1 − λ)0
would belong to K.
Lastly, we prove that φ is continuous. Select a positive r such that the ball
B(0, r) is contained in K. For ‖x‖ < 1 we have rx ∈ B(0, r) and rx ∈ K. Hence
p(rx) ≤ 1, φ(rx) ≤ 1, and φ(x) ≤ 1/r. Thus ‖φ‖ ≤ 1/r. •
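The Minkowski functional can be made concrete for a polytope. The sketch below is an illustration added here, not part of the text: for K = {y : Cy ≤ d} with d > 0 (so that 0 is interior), x/λ lies in K exactly when λ ≥ (Cx)ᵢ/dᵢ for every i, so p(x) = max(0, maxᵢ (Cx)ᵢ/dᵢ). The matrix C, vector d, and test points are hypothetical.

```python
import numpy as np

def minkowski_functional(C, d, x):
    """p(x) = inf{lam > 0 : x/lam in K} for the polytope K = {y : C y <= d},
    assuming d > 0 so that 0 is an interior point of K.  Then x/lam lies in K
    exactly when lam >= (C x)_i / d_i for every row i."""
    ratios = (C @ x) / d
    return max(0.0, ratios.max())

# Hypothetical example: K = [-1, 1]^2, described by four inequalities.
C = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
d = np.ones(4)

print(minkowski_functional(C, d, np.array([0.5, 0.25])))   # 0.5 (inside K)
print(minkowski_functional(C, d, np.array([2.0, -3.0])))   # 3.0 (outside K)
```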

Theorem 2. Let K₁, K₂ be a disjoint pair of convex sets in a
normed linear space X. If one of them has an interior point, then there
is a nonzero functional φ ∈ X* such that

      sup_{x∈K₁} φ(x) ≤ inf_{x∈K₂} φ(x)

Proof. By performing a translation and by relabeling the two sets, we can
assume that 0 is an interior point of K₁. Fix a point z in K₂ and consider the
set K₁ − K₂ + z. This set is convex and contains 0 as an interior point. Also,
z ∉ K₁ − K₂ + z, because K₁ is disjoint from K₂. By the preceding theorem, there
is a φ ∈ X* such that for u ∈ K₁ and v ∈ K₂ we have φ(u − v + z) ≤ 1 ≤ φ(z).
Hence φ(u) ≤ φ(v). •

Theorem 3. Let K₁, K₂ be a disjoint pair of closed convex sets
in a normed linear space X. Assume that at least one of the sets is
compact. Then there is a φ ∈ X* such that

      sup_{x∈K₂} φ(x) < inf_{x∈K₁} φ(x)

Proof. The set K₁ − K₂ is closed and convex. (See Problems 1.2.19 on page
12 and 1.4.17 on page 23.) Also, 0 ∉ K₁ − K₂, and consequently there is a
ball B(0, r) that is disjoint from K₁ − K₂. By the preceding theorem, there is a
nonzero continuous functional φ such that

      sup_{‖x‖≤r} φ(x) ≤ inf_{x∈K₁−K₂} φ(x)

Since φ is not zero, there is an ε > 0 such that for u ∈ K₁ and v ∈ K₂,
ε ≤ φ(u) − φ(v). •

Separation theorems have applications in optimization theory, game theory,


approximation theory, and in the study of linear inequalities. The next theorem
gives an example of the latter.

Theorem 4. Let U be a compact set in a real Hilbert space. In order
that the system of linear inequalities

(1)      ⟨u, x⟩ > 0     (u ∈ U)

be consistent (i.e., have a solution, x) it is necessary and sufficient that
0 not be in the closed convex hull of U.

Proof. For the sufficiency of the condition, assume the condition to be true.
Thus, 0 ∉ co(U). By Theorem 3, there is a vector x and a real number λ such
that co(U) and 0 are on opposite sides of the hyperplane

      {y : ⟨y, x⟩ = λ}

We can suppose that ⟨y, x⟩ > λ for y ∈ co(U) and that ⟨0, x⟩ < λ. Obviously,
λ > 0 and x solves the system (1).
Now assume that system (1) is consistent and that x is a solution of it. By
continuity and compactness, there exists a positive ε such that ⟨u, x⟩ ≥ ε for all
u ∈ U. For any v ∈ co(U) we can write a convex combination v = Σ θᵢuᵢ and
then compute

      ⟨v, x⟩ = Σ θᵢ⟨uᵢ, x⟩ ≥ Σ θᵢε = ε

Then, by continuity, ⟨w, x⟩ ≥ ε for all w ∈ co(U). Obviously, 0 ∉ co(U). •
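For a finite set U, the criterion of Theorem 4 can be tested by linear programming: the strict system (1) is solvable exactly when the scaled system ⟨uᵢ, x⟩ ≥ 1 is feasible, and infeasibility signals that 0 lies in the convex hull of U. The following sketch is illustrative only (not from the text); it assumes SciPy is available and the vectors are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

def solve_strict_system(U):
    """Try to find x with <u, x> > 0 for every row u of U (a finite set of
    vectors).  For finite U this is equivalent to feasibility of <u, x> >= 1,
    posed here as a linear program with a dummy objective."""
    m, n = U.shape
    res = linprog(c=np.zeros(n), A_ub=-U, b_ub=-np.ones(m),
                  bounds=[(None, None)] * n, method="highs")
    return res.x if res.success else None   # None: 0 lies in co(U)

# Hypothetical data: three vectors lying in the open right half-plane.
U = np.array([[1.0, 0.0], [1.0, 2.0], [2.0, -1.0]])
print(solve_strict_system(U))                             # some solution x

# Adding -u1 puts 0 in the convex hull, so the system becomes inconsistent.
print(solve_strict_system(np.vstack([U, [-1.0, 0.0]])))   # None
```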



In order to prove a representative result in game theory, some notation will


be useful. The standard n-dimensional simplex is the set

      Sₙ = {x ∈ ℝⁿ : x ≥ 0 and Σᵢ₌₁ⁿ xᵢ = 1}

Theorem 5. For an m × n matrix A, either Ax ≥ 0 for some x ∈ Sₙ,
or yᵀA < 0 for some y ∈ Sₘ.

Proof. Suppose that there is no x in the simplex Sₙ for which Ax ≥ 0. Then
A(Sₙ) contains no point in the nonnegative orthant

      Pₘ = {y ∈ ℝᵐ : y ≥ 0}

Consequently, the convex sets A(Sₙ) and Pₘ can be separated by a hyperplane.
Suppose, then, that
      Pₘ ⊂ {y ∈ ℝᵐ : ⟨u, y⟩ > λ}
      A(Sₙ) ⊂ {y ∈ ℝᵐ : ⟨u, y⟩ < λ}
Since 0 ∈ Pₘ, λ < 0. Let eᵢ denote the i-th standard unit vector in ℝᵐ. For
positive t, teᵢ ∈ Pₘ. Hence ⟨u, teᵢ⟩ > λ, tuᵢ > λ, uᵢ > λ/t, and uᵢ ≥ 0. Thus
u ∈ Pₘ and ⟨u, Ax⟩ < 0 for all x ∈ Sₙ. Obviously, u ≠ 0, so we can assume
u ∈ Sₘ. Since uᵀAx < 0 for all x ∈ Sₙ, we have uᵀAeᵢ < 0 for 1 ≤ i ≤ n, or,
in other terms, uᵀA < 0. (In this last argument, eᵢ was a standard unit vector
in ℝⁿ.) •

This section concludes with a brief discussion of a fundamental topic in


game theory. A rectangular two-person game depends on an m x n matrix
of real numbers. Player 1 selects in secret an integer i in the range 1 ::;:; i ::;:; m.
Likewise, Player 2 selects in secret an integer j in the range 1 ::;; j ::;; n. The
two chosen integers i and j are now revealed, and the payoff to Player 1 is the
quantity aij in the matrix. If this payoff is positive, Player 2 pays Player 1. If
the payoff is negative, Player 1 pays Player 2.
Both players have full knowledge of the matrix A. If Player 1 chooses i,
then he can assure himself of winning at least the quantity minj aij' His best
choice for i ensures that he will win maxi minj aij' Player 2 reasons similarly:
By choosing j, he limits his loss to maXi aij' His best choice for j will minimize
the worst loss and that number is then minj maXi aij' If

min max aij = max min aij


J' 'J

then this common number is the amount that one player can be sure of winning
and is the limit on what the other can lose.
In the more interesting case (which will include the case just discussed) the
players will make random choices of the two integers (following carefully assigned
probability distributions) and play the game over and over. Player 1 will assign
specific probabilities to each possible choice from the set {1, 2, ..., m}. The
probabilities can be denoted by xᵢ. We will then want xᵢ ≥ 0 for each i as
well as Σᵢ₌₁ᵐ xᵢ = 1. In brief, x ∈ Sₘ. Similarly, Player 2 assigns probabilities
yⱼ to the choices in {1, 2, ..., n}. Thus y ∈ Sₙ. When the game is played just
once, the expected payoff to Player 1 can be computed to be Σᵢ₌₁ᵐ Σⱼ₌₁ⁿ aᵢⱼxᵢyⱼ.
Player 1 seeks to maximize this with an appropriate choice of x in Sₘ, while
Player 2 seeks to minimize this by a suitable choice of y in Sₙ. The principal
theorem in this subject is as follows.

Theorem 6. The Min-Max Theorem of Von Neumann. Let A
be any m × n matrix. Then

      max_{x ∈ Sₘ} min_{y ∈ Sₙ} xᵀAy = min_{y ∈ Sₙ} max_{x ∈ Sₘ} xᵀAy

Proof. It is easy to prove an inequality ≤ between the terms in the above
equation. To do so, let u ∈ Sₙ and v ∈ Sₘ. Then

      min_{y ∈ Sₙ} vᵀAy ≤ vᵀAu ≤ max_{x ∈ Sₘ} xᵀAu

Since u and v were arbitrary in the sets Sₙ and Sₘ, respectively, we can choose
them so that we get

(2)      max_{x ∈ Sₘ} min_{y ∈ Sₙ} xᵀAy ≤ min_{y ∈ Sₙ} max_{x ∈ Sₘ} xᵀAy

Now suppose that a strict inequality holds in Inequality (2). Select a real
number T such that

(3)      max_{x ∈ Sₘ} min_{y ∈ Sₙ} xᵀAy < T < min_{y ∈ Sₙ} max_{x ∈ Sₘ} xᵀAy

Consider the matrix A' whose generic element is aᵢⱼ − T. By Theorem 5 (applied
actually to −A'), either A'u ≤ 0 for some u ∈ Sₙ or vᵀA' ≥ 0 for some v ∈ Sₘ.
If the first of these alternatives is true, then for all x ∈ Sₘ, we have xᵀA'u ≤
0. In quick succession, one concludes that max_x xᵀA'u ≤ 0, min_y max_x xᵀA'y ≤
0, and min_y max_x xᵀAy ≤ T. In the last inequality we simply compute the
bilinear form xᵀA'y, remembering that x ∈ Sₘ and y ∈ Sₙ. The concluding
inequality here is a direct contradiction of Inequality (3).
Similarly, if there exists v ∈ Sₘ for which vᵀA' ≥ 0, then we have for all y ∈
Sₙ, vᵀA'y ≥ 0, min_y vᵀA'y ≥ 0, max_x min_y xᵀA'y ≥ 0, and max_x min_y xᵀAy ≥
T, contradicting Inequality (3) again. •

In the language of game theory, Theorem 6 asserts that each player of a
rectangular game has an optimal strategy. These are the "probability vectors"
x ∈ Sₘ and y ∈ Sₙ such that

      xᵀAy = max_{x ∈ Sₘ} min_{y ∈ Sₙ} xᵀAy = min_{y ∈ Sₙ} max_{x ∈ Sₘ} xᵀAy

The common value of these three quantities is called the value of the game.
Convenient references for these matters and for the theory of games in general
are [McK] and [Mor].
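A practical footnote, not part of the text: optimal strategies and the value of a rectangular game can be computed by linear programming. After a constant shift that makes every entry of A positive (which raises the value by the same constant), the row player's problem becomes a small LP. The sketch below assumes SciPy and uses the "matching pennies" matrix as a hypothetical example.

```python
import numpy as np
from scipy.optimize import linprog

def solve_matrix_game(A):
    """Value of the game and an optimal mixed strategy for the row player,
    via the classical LP reformulation.  A constant shift makes every entry
    positive so that the (shifted) value is positive."""
    A = np.asarray(A, dtype=float)
    shift = 1.0 - A.min()            # A + shift > 0 entrywise
    B = A + shift
    m, n = B.shape
    # minimize sum(p) subject to B^T p >= 1, p >= 0; then v = 1/sum(p), x = v*p
    res = linprog(c=np.ones(m), A_ub=-B.T, b_ub=-np.ones(n), method="highs")
    p = res.x
    value = 1.0 / p.sum()
    return value - shift, value * p

# Matching pennies: value 0, optimal strategy (1/2, 1/2).
print(solve_matrix_game([[1, -1], [-1, 1]]))
```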

Problems 7.3

1. Let f : X × Y → ℝ, where X and Y are arbitrary sets. Prove that

      sup_x inf_y f(x, y) ≤ inf_y sup_x f(x, y)

(In order for this to be universally valid, one must admit +∞ as a permissible value for
the supremum and −∞ for the infimum.)
2. Prove for any u ∈ ℝⁿ that max_{x∈Sₙ} ⟨u, x⟩ = max_{1≤i≤n} uᵢ.
3. Let Pₙ denote the set of x in ℝⁿ that satisfy x ≥ 0. Prove that for u ∈ ℝⁿ and λ ∈ ℝ
these properties are equivalent:
a. Pₙ ⊂ {x : ⟨u, x⟩ > λ}
b. u ∈ Pₙ and λ < 0.
4. Saddle points. If there is a pair of integers (r, s) such that aᵢₛ ≤ aᵣₛ ≤ aᵣⱼ for all i
and j, then aᵣₛ is called a "saddle point" for the rectangular game. Prove that if such a
point exists, each player has an optimal strategy of the form (0, ..., 0, 1, 0, ..., 0).
5. (A variation on Theorem 4) Let X be any linear space, Φ a set of linear functionals on
X. Prove that the system of linear inequalities

      φ(x) < 0     (φ ∈ Φ)

has a finite inconsistent subsystem if and only if 0 ∈ co(Φ).

6. (A result in approximation theory) Let K be a closed convex set in a normed linear space
X. Let u be a point not in K, and set r = dist(u, K). Prove that there exists a functional
φ ∈ X* such that
      sup_{‖x−u‖≤r} φ(x) ≤ inf_{x∈K} φ(x)

7. (Separation theorem in Hilbert space) Let K be a closed convex set in Hilbert space, and
u a point outside K. Then there is a unique point v in K such that for all x in K,

      ⟨x, u − v⟩ ≤ ⟨v, u − v⟩ < ⟨u, u − v⟩

8. Let φ₁, φ₂, ..., φₙ be continuous linear functionals on a normed space. Let a₁, a₂, ..., aₙ
be scalars, and define affine functionals ψᵢ(x) = φᵢ(x) + aᵢ. Define F(x) = maxᵢ ψᵢ(x).
Prove that F is bounded below if and only if the inequality maxᵢ φᵢ(x) ≥ 0 is true for
all x of norm 1.

7.4 The Arzela-Ascoli Theorems

The hypothesis of compactness is often present in important theorems of analy-


sis. For this reason, much attention has been directed to the problem of charac-
terizing the compact sets in various Banach spaces. The Arzela-Ascoli Theorem
does this for spaces of continuous functions. The Dunford-Pettis Theorem does

this for L¹-spaces, and the Frechet-Kolmogorov Theorem does it for the Lᵖ-
spaces. The most extensive source for results on this topic is [DS], Chapter 4.
See also [Yo] for the Frechet-Kolmogorov Theorem.
We begin with spaces of continuous functions. Let (X, d) and (Y, p) be
compact metric spaces. For example, X and Y could be compact intervals on
the real line. We denote by C(X, Y) the space of all continuous maps from X
into Y. It is known that continuity and uniform continuity are the same for
maps of X into Y. Thus, a map f of X into Y belongs to C(X, Y) if and only
if there corresponds to each positive ε a positive δ such that ρ(f(u), f(v)) < ε
whenever d(u, v) < δ.
The space C(X, Y) is made into a metric space by defining its distance
function Δ by the equation

(1)      Δ(f, g) = sup_{x∈X} ρ(f(x), g(x))

A first goal is to characterize the compact sets in C(X, Y).


Let K be a subset of C(X, Y). We say that K is equicontinuous if to
each positive ε there corresponds a positive δ such that this implication is valid:

(2)      [f ∈ K and d(u, v) < δ]  ⟹  ρ(f(u), f(v)) < ε

Theorem 1. First Arzela-Ascoli Theorem. Let X and Y be


compact metric spaces. A subset of C(X, Y) is compact if and only if
it is closed and equicontinuous.

Proof. Let K be the subset in question. First, suppose that K is compact.
Then it is closed, by the theorem in general topology that asserts that compact
sets in Hausdorff spaces are closed ([Kel], page 141). In order to prove that
K is equicontinuous, let ε be a prescribed positive number. Since K is com-
pact, it is totally bounded ([Kel], page 198). Consequently, there exist elements
f₁, f₂, ..., fₙ in K such that

      K ⊂ ⋃ᵢ₌₁ⁿ B(fᵢ, ε)

where B(f, ε) is the ball {g : Δ(f, g) < ε}. A finite set of continuous functions is
obviously equicontinuous, and therefore there exists a δ for which this implication
is valid:

      [1 ≤ i ≤ n and d(u, v) < δ]  ⟹  ρ(fᵢ(u), fᵢ(v)) < ε

If g ∈ K and d(u, v) < δ, then for a suitable value of j we have

      ρ(g(u), g(v)) ≤ ρ(g(u), fⱼ(u)) + ρ(fⱼ(u), fⱼ(v)) + ρ(fⱼ(v), g(v)) ≤ 3ε

The index j is chosen so that Δ(g, fⱼ) < ε. The above inequality establishes the
equicontinuity of K.

Now suppose that K is closed and equicontinuous. The space M(X, Y) of
all maps from X into Y with the metric Δ as defined in Equation (1) is a metric
space that contains C(X, Y) as a closed subset. It suffices then to prove that K
is compact in M(X, Y).
Let ε be a prescribed positive number. Select a positive δ such that the
implication (2) above is valid. Since X and Y are totally bounded, we can
arrange that

      X ⊂ ⋃ᵢ₌₁ⁿ B(xᵢ, δ)        Y ⊂ ⋃ᵢ₌₁ᵐ B(yᵢ, ε)

In order to have a disjoint cover of X, let

      Aᵢ = B(xᵢ, δ) \ [B(x₁, δ) ∪ ··· ∪ B(xᵢ₋₁, δ)]     (1 ≤ i ≤ n)

Notice that if x ∈ Aᵢ, then it follows that x ∈ B(xᵢ, δ), that d(xᵢ, x) < δ, and
that ρ(f(xᵢ), f(x)) < ε for all f ∈ K.
Now consider the functions g from X to Y that are constant on each Aᵢ
and are allowed to assume only the values y₁, y₂, ..., yₘ. (There are exactly mⁿ
such functions.) The balls B(g, 2ε) cover K. To verify this, let f ∈ K. For
each i, select yⱼᵢ so that ρ(f(xᵢ), yⱼᵢ) < ε. Then let g be a function of the type
described above whose value on Aᵢ is yⱼᵢ. For each x ∈ X there is an index i
such that x ∈ Aᵢ. Then

      ρ(f(x), g(x)) ≤ ρ(f(x), f(xᵢ)) + ρ(f(xᵢ), g(x)) < 2ε

Hence Δ(f, g) < 2ε. This proves that K is totally bounded. Since a closed and
totally bounded set is compact, K is compact. •
As usual, if X is a compact metric space, C(X) will denote the Banach
space of all continuous real-valued functions on X, normed by writing

      ‖f‖ = sup_{x∈X} |f(x)|

Theorem 2. Arzela-Ascoli Theorem II. Let X be a compact


metric space. A subset of C(X) is compact if and only if it is closed,
bounded, and equicontinuous.

Proof. Suppose that K is a compact set in C(X). Then it is closed. It is also
totally bounded, and can be covered by a finite number of balls of radius 1:

      K ⊂ ⋃ᵢ₌₁ⁿ B(fᵢ, 1)

For any g ∈ K there is an index i for which g ∈ B(fᵢ, 1). Then

      ‖g‖ ≤ ‖g − fᵢ‖ + ‖fᵢ‖ ≤ 1 + maxᵢ ‖fᵢ‖ ≡ M

Thus K is bounded. Let Y = [−M, M]. Then
      K ⊂ C(X, Y)
The preceding theorem now is applicable, and K is equicontinuous.
For the other half of the proof let K be a closed, bounded, and equicontinu-
ous set. Since K is bounded, we have again K ⊂ C(X, Y), where Y is a suitable
compact interval. The preceding theorem now shows that K is compact. •

Theorem 3. Dini's Theorem. Let f₁, f₂, ... be continuous real-
valued functions on a compact topological space. For each x assume
that |fₙ(x)| ↓ 0. Then this convergence is uniform.

Proof. Given ε > 0, put Sₖ = {x : |fₖ(x)| ≥ ε}. Then each Sₖ is closed, and
Sₖ₊₁ ⊂ Sₖ. For each x there is an index k such that x ∉ Sₖ. Hence ⋂ₖ Sₖ is
empty. By compactness and the finite intersection property, we conclude that
⋂ₖ₌₁ⁿ Sₖ is empty for some n. This means that Sₙ is empty, and that |fₙ(x)| < ε
for all x. Thus |fₖ(x)| < ε for all k ≥ n. This is uniform convergence. •
We conclude this section by quoting some further compactness theorems.
In the spaces Lᵖ(ℝ), the following characterization of compact sets holds. Here,
1 ≤ p < ∞. A precursor of this theorem was given by Riesz, and a generalization
to locally compact groups with their Haar measure has been proved by Weil. See
[Edw] page 269, [DS] page 297, and [Yo] page 275.

Theorem 4. The Frechet-Kolmogorov Theorem. A closed
and bounded set K in the space Lᵖ(ℝ) is compact if and only if the
following two limits hold true, uniformly for f in K:

      lim_{h→0} ∫_ℝ |f(x + h) − f(x)|ᵖ dx = 0

      lim_{M→∞} ∫_{|x|>M} |f(x)|ᵖ dx = 0

Theorem 5. A closed and bounded set K in the space c₀ (de-
fined in Section 1.1) is compact if and only if for each positive ε there
corresponds an integer n such that sup_{x∈K} sup_{i>n} |x(i)| < ε.

Theorem 6. A closed and bounded set K in ℓ² is compact if and
only if

      lim_{n→∞} sup_{x∈K} Σ_{i≥n} x(i)² = 0

Problems 7.4

1. Define fₙ ∈ C[0, 1] by fₙ(x) = nx/(nx + 1). Is the set {fₙ : n ∈ ℕ} equicontinuous? Is
it bounded? Is it closed?
2. Let f_λ(x) = e^{λx}. Show that {f_λ : λ ≤ b} is equicontinuous on [0, a].
3. In the space C[a, b] let K be the set of all polynomials of degree at most n that satisfy
‖p‖ ≤ 1. (Here n is fixed.) Is K equicontinuous? Is it compact?
4. Let α and A be fixed positive numbers. Let K be the set of all functions f on [a, b] that
satisfy the Lipschitz condition

      |f(x) − f(y)| ≤ A|x − y|^α

Is K closed? compact? equicontinuous? bounded?

5. Let K be a set of continuously differentiable functions on [a, b]. Put K' = {f′ : f ∈ K}.
Prove that if K' is bounded, then K is equicontinuous.
6. Let K be an equicontinuous set in C[a, b]. Prove that if there exists a point x₀ in [a, b]
such that {f(x₀) : f ∈ K} is bounded, then K is bounded.
7. Define an operator L on the space C[a, b] by

      (Lf)(x) = ∫ₐᵇ k(x, y)f(y) dy

where k is continuous on [a, b] × [a, b]. Prove that L is a compact operator; that is, it
maps the unit ball into a compact set.
8. Prove or disprove: Let [fₙ] be a sequence of continuous functions on a compact space.
Let f be a function such that |f(x) − fₙ(x)| ↓ 0 for all x. Then f is continuous and the
convergence fₙ → f is uniform.
9. Select an element a ∈ ℓ², and define

      K = {x ∈ ℓ² : |xᵢ| ≤ |aᵢ| for all i}

Prove that K is compact. The special case when aᵢ = 1/i gives the so-called Hilbert
cube.
10. Prove that not every compact set in ℓ² is of the form described in the preceding problem.
11. Reconcile the compactness theorems for L² and ℓ², in the light of the isometry between
these spaces.

7.5 Compact Operators and the Fredholm Theory

This section is devoted to operators that we think of as "perturbations of the


identity," meaning operators I + A, where I is the identity and A is a compact
operator. The definition and elementary properties of compact operators were
given in Section 2.3, page 85. In particular, we found that operators with finite-
dimensional range are compact, and that the set of compact members of £(X, Y)
is closed if X and Yare Banach spaces. Thus, a limit of operators, each having
finite-dimensional range, is necessarily a compact operator. In many (but not all)
Banach spaces, every compact operator is such a limit. This fact can be exploited
in practical problems involving compact operators; one begins by approximating
the operator by a simpler one having finite-dimensional range. Typically, this
leads to a system of linear equations that must be solved numerically. (Examples
of problems involving operators with finite-dimensional range occur in Problems
20,21,22, and 29 in Section 2.1, pages 68-69.)
Here, however, we consider a related class of operators, namely those of the
form I + A (where A is compact), and find that such operators have favorable
properties too. Intuitively, we expect such operators to be well behaved, be-
cause they are close to the identity operator. For example, we shall prove the
famous Fredholm Theorem, which asserts that for such operators the property
of injectivity (one-to-oneness) is equivalent to surjectivity (being "onto"). This
is a theorem familiar to us in the context of linear operators from IR n to IR n :

For an n x n matrix, the properties of having a O-dimensional kernel and an n-


dimensional range are equivalent. The proof of the Fredholm Theorem is given
in several pieces, and then further properties of such operators are explored. We
have relied heavily on the exposition in [Jam], and recommend this reference to
the reader.

Lemma 1. Let A be a compact operator on a normed linear space.


If I + A is surjective, then it is injective.

Proof. Let B = I + A and Xₙ = ker(Bⁿ). Suppose that B is surjective but not
injective. We shall be looking for a contradiction. Note that 0 ⊂ X₁ ⊂ X₂ ⊂ ···
It is now to be proved that these inclusions are proper. Select a nonzero element
y₁ in X₁. Since B is surjective, there exist points y₂, y₃, ... such that Byₙ = yₙ₋₁
for n = 2, 3, ... We have

      Bⁿyₙ = Bⁿ⁻¹yₙ₋₁ = ··· = By₁ = 0

Furthermore,

      Bⁿ⁻¹yₙ = Bⁿ⁻²yₙ₋₁ = ··· = By₂ = y₁ ≠ 0

These two equations prove that yₙ ∈ Xₙ \ Xₙ₋₁ and that those inclusions men-
tioned above are proper.
By the Riesz Lemma (Section 1.4, page 22), there exist points xₙ such
that xₙ ∈ Xₙ, ‖xₙ‖ = 1, and dist(xₙ, Xₙ₋₁) ≥ 1/2. If m > n, then we
have Bᵐxₘ = 0 because xₘ ∈ Xₘ = ker(Bᵐ). Also, Bᵐ⁻¹xₙ = 0 because
xₙ ∈ Xₙ ⊂ Xₘ₋₁. Finally, Bᵐxₙ = 0 because xₙ ∈ Xₙ ⊂ Xₘ. These
observations show that

      Bxₘ + xₙ − Bxₙ ∈ Xₘ₋₁

Now we can write

      ‖Axₙ − Axₘ‖ = ‖(B − I)xₙ − (B − I)xₘ‖ = ‖Bxₙ − xₙ − Bxₘ + xₘ‖
                  = ‖xₘ − (Bxₘ + xₙ − Bxₙ)‖ ≥ dist(xₘ, Xₘ₋₁) ≥ 1/2

The sequence [Axₙ] therefore can have no Cauchy subsequence, contradicting
the compactness property of A. •

Lemma 2. If A is a compact operator on a Banach space, then the


range of I + A is closed.

Proof. Let B = I + A. Take a convergent sequence [yₙ] in the range of B,
and write y = lim yₙ. We want to prove that y is in the range of B. Since this is
obvious if y = 0, we assume that y ≠ 0. Denote the kernel (null space) of B by
K. Let yₙ = Bxₙ for suitable points xₙ.
If [xₙ] contains a bounded subsequence, then (because A is compact) [Axₙ]
contains a convergent subsequence, say Axₙᵢ → u. Since Axₙᵢ + xₙᵢ = Bxₙᵢ =
yₙᵢ → y, we infer that xₙᵢ = yₙᵢ − Axₙᵢ → y − u. Then y = lim Bxₙᵢ = B(y − u),
and y is in the range of B. This completes the proof in this case.
If [xₙ] contains no bounded subsequence, then ‖xₙ‖ → ∞. Since y ≠ 0, we
can discard a finite number of terms from the sequence [xₙ], and assume that
xₙ ∉ K for all n. Using Riesz's Lemma, construct vectors vₙ = kₙ + αₙxₙ so
that ‖vₙ‖ = 1, kₙ ∈ K, and dist(vₙ, K) ≥ 1/2. Note that

(1)      Bvₙ = αₙBxₙ = αₙyₙ

Since ‖αₙyₙ‖ = ‖Bvₙ‖ ≤ ‖B‖ and yₙ → y ≠ 0, we see that [αₙ] is bounded.
Since [vₙ] is bounded, [Avₙ] contains a convergent subsequence. Using the
boundedness of [αₙ], we can arrange that

      Avₙᵢ → z     and     αₙᵢ → α

From Equation (1), we conclude that (I + A)vₙᵢ = αₙᵢyₙᵢ and

      vₙᵢ = αₙᵢyₙᵢ − Avₙᵢ → αy − z

If α were 0, we would have vₙᵢ → −z and −Bz = lim Bvₙᵢ = lim(vₙᵢ + Avₙᵢ) =
−z + z = 0. This would show that z ∈ K. This cannot be true, because it would
imply

      1/2 ≤ dist(vₙᵢ, K) ≤ ‖vₙᵢ − (−z)‖ → 0

Hence α ≠ 0. Since Bvₙᵢ → αy, we have

      B(αy − z) = lim Bvₙᵢ = αy

Consequently, B(y − α⁻¹z) = y, and y is in the range of B. •



Lemma 3. Let A be a compact operator on a Banach space. If
I + A is injective, then it is surjective.

Proof. Let B = I + A and let Xₙ denote the range of Bⁿ. We have

      Bⁿ = (I + A)ⁿ = I + nA + [n(n − 1)/2]A² + ··· + Aⁿ

Since each Aᵏ is compact (for k ≥ 1), Bⁿ is the identity plus a compact operator.
Thus Xₙ is closed by Lemma 2.
If x ∈ Xₙ for some n, then for an appropriate u we have x = Bⁿu = Bⁿ⁻¹(Bu).
Thus

(2)      X₀ ⊃ X₁ ⊃ X₂ ⊃ ···

Now our objective is to establish that X₁ = X₀.
If all the inclusions in the list (2) are proper, we can use Riesz's Lemma to
select xₙ ∈ Xₙ such that ‖xₙ‖ = 1 and dist(xₙ, Xₙ₊₁) ≥ 1/2. Then, for n < m,
we have

      ‖Axₘ − Axₙ‖ = ‖(B − I)xₘ − (B − I)xₙ‖ = ‖xₙ − (xₘ + Bxₙ − Bxₘ)‖
                  ≥ dist(xₙ, Xₙ₊₁) ≥ 1/2

because xₘ ∈ Xₘ ⊂ Xₙ₊₁, Bxₘ ∈ Xₘ₊₁ ⊂ Xₙ₊₁, and Bxₙ ∈ Xₙ₊₁. This
argument shows that [Axₙ] can contain no Cauchy subsequence, contradicting
the compactness of A.
Thus, not all the inclusions in the list (2) are proper, and for some n,
Xₙ = Xₙ₊₁. We define n to be the first integer having this property. All we
have to do now is prove that n = 0.
If n > 0, let x be any point in Xₙ₋₁. Then x = Bⁿ⁻¹y for some y, and

      Bx = Bⁿy ∈ Xₙ = Xₙ₊₁

It follows that Bx = Bⁿ⁺¹z for some z. Since B is injective by hypothesis, x =
Bⁿz ∈ Xₙ. Since x was an arbitrary point in Xₙ₋₁, this shows that Xₙ₋₁ ⊂ Xₙ.
But the inclusion Xₙ₋₁ ⊃ Xₙ also holds. Hence Xₙ₋₁ = Xₙ, contrary to our
choice of n. Hence n = 0. •

Theorem 1. The Fredholm Alternative. Let A be a compact


linear operator on a Banach space. The operator I + A is surjective if
and only if it is injective.

Proof. This is the result of putting Lemmas 1 and 3 together.


The name attached to this theorem is derived from its traditional formu-

lation, which states that one and only one of these alternatives holds: (1)
I + A is surjective; (2) I + A is not injective. A stronger result is known, and
we refer the reader to [BN] or [KA] for its proof:

Theorem 2. If A is a bounded linear operator on a Banach space


and if An is compact for some natural number n, then the properties
of surjectivity and injectivity of I + A imply each other.

An easy extension of Theorem 1 is important:

Theorem 3. Let B be a bounded linear invertible operator, and


let A be a compact operator, both defined on one Banach space and
taking values in another. Then B + A is surjective if and only if it is
injective.

Proof. Suppose that B + A is injective. Then so are B⁻¹(B + A) and I + B⁻¹A.
Now, the product of a compact operator with a bounded operator is compact.
(See Problem 7.) Thus, Theorem 1 is applicable, and I + B⁻¹A is surjective.
Hence so are B(I + B⁻¹A) and B + A. The proof of the reverse implication is
similar. •

Theorem 4. A compact linear transformation operating from one


normed linear space to another maps weakly convergent sequences into
strongly convergent sequences.

Proof. Let A be such an operator, A : X → Y. Let xₙ ⇀ x (weak conver-
gence) in X. It suffices to consider only the case when x = 0. Thus we want
to prove that Axₙ → 0. By the weak convergence, φ(xₙ) → 0 for all φ ∈ X*.
Interpret φ(xₙ) as a sequence of linear maps xₙ acting on an element φ ∈ X*.
Since X* is complete even if X is not, the Uniform Boundedness Theorem (Sec-
tion 1.7, page 42) is applicable in X*. One concludes that ‖xₙ‖ is bounded. For
any ψ ∈ Y*,

      ψ(Axₙ) = (ψ ∘ A)(xₙ) → 0

because ψ ∘ A ∈ X*. Thus Axₙ ⇀ 0. If Axₙ does not converge strongly to 0,
there will exist a subsequence such that ‖Axₙᵢ‖ ≥ ε > 0. By the compactness
of A, and by taking a further subsequence, we may assume that Axₙᵢ → y for
some y. Obviously, ‖y‖ ≥ ε. Now we have the contradiction Axₙᵢ ⇀ y and
Axₙᵢ ⇀ 0. •
Lemma 4. Let [Anl be a bounded sequence of continuous linear
transformations from one normed linear space to another. If Anx -+ 0
for each x in a compact set K, then this convergence is uniform on K.

Proof. Suppose that the convergence in question is not uniform. Then there
exist a positive ε, a sequence of integers nᵢ, and points xₙᵢ ∈ K such that
‖Aₙᵢxₙᵢ‖ ≥ ε. Since K is compact, we can assume at the same time that
xₙᵢ converges to a point x in K. Then we have a contradiction of pointwise
convergence from this inequality:

      ‖Aₙᵢx‖ = ‖Aₙᵢxₙᵢ + (Aₙᵢx − Aₙᵢxₙᵢ)‖
             ≥ ‖Aₙᵢxₙᵢ‖ − ‖Aₙᵢx − Aₙᵢxₙᵢ‖
             ≥ ε − ‖Aₙᵢ‖ ‖x − xₙᵢ‖ •
For some Banach spaces X, each compact operator A : X -+ X is a limit of
operators of finite rank. This is true of X = C(T) and X = L2(T), but not of
all Banach spaces. One positive general result in this direction is as follows.

Theorem 5. Let X and Y be Banach spaces. If Y has a (Schauder)
basis, then every compact operator from X to Y is a limit of finite-rank
operators.

Proof. If [vₙ] is a basis for Y, then each y in Y has a unique representation of
the form

      y = Σ_{k=1}^∞ λₖ(y)vₖ

(See Problems 24-26 in Section 1.6, pages 38-39.) The functionals λₖ are con-
tinuous, linear, and satisfy supₖ ‖λₖ‖ < ∞. By taking the partial sum of the
first n terms, we define a projection Pₙ of Y onto the linear span of the first n
vectors vₖ. Now let A be a compact linear transformation from X to Y, and let
S denote the unit ball in X. The closure of A(S) is compact in Y, and Pₙ − I
converges pointwise to 0 in Y. By the preceding lemma, this convergence is
uniform on A(S). This implies that (PₙA − A)(x) converges uniformly to 0 on
S. Since each PₙA has finite-dimensional range, this completes the proof. •

Theorem 6. Let A be a compact operator acting between two
Banach spaces. If the range of A is closed, then it is finite dimensional.

Proof. Since A is compact, it is continuous and has a closed graph. Assume
that A : X → Y and that A(X) is closed in Y. Then A(X) is a Banach space.
Let S denote the unit ball in X. By the Interior Mapping Theorem (Section
1.8, page 48), A(S) is a neighborhood of 0 in A(X). On the other hand, by
its compactness, A maps S into a compact subset of A(X). Since A(X) has a
compact neighborhood of 0, A(X) is finite dimensional, by Theorem 2 in Section
1.4, page 22. •

Let us consider the very practical problem of solving an equation Ax − λx = b
when A is of finite rank. In other words, A has a finite-dimensional range. Let
{v₁, ..., vₙ} be a basis for the range of A. Let b ∈ X, Ab = Σᵢ βᵢvᵢ, and
Avⱼ = Σᵢ aᵢⱼvᵢ.
We must assume that the numerical values of aᵢⱼ and βᵢ are available to us.
Determining the unknown x will now reduce to a standard problem in numerical
linear algebra. The case when λ = 0 is somewhat different, and we dispose of
that first.
If λ = 0, we find uᵢ such that Auᵢ = vᵢ. Since the equation Ax = b can be
solved only if b is in the range of A, we write b = Σ γᵢvᵢ. Then the solution is
x = Σᵢ₌₁ⁿ γᵢuᵢ, because with that definition of x we have Ax = Σᵢ₌₁ⁿ γᵢAuᵢ =
Σᵢ₌₁ⁿ γᵢvᵢ = b.
Now assume that λ is not zero. The following two assertions are equivalent:
(a) There is an x ∈ X such that Ax − λx = b.
(b) There exist c₁, ..., cₙ ∈ ℝ such that Σⱼ aᵢⱼcⱼ − λcᵢ = βᵢ (for i = 1, ..., n).
To prove this equivalence, first assume that (a) is true. Define cᵢ by Ax =
Σ cᵢvᵢ. Then one has successively

      Σⱼ₌₁ⁿ cⱼvⱼ − λx = b

      Σⱼ₌₁ⁿ cⱼAvⱼ − λAx = Ab

      Σⱼ₌₁ⁿ cⱼ Σᵢ₌₁ⁿ aᵢⱼvᵢ − λ Σᵢ₌₁ⁿ cᵢvᵢ = Σᵢ₌₁ⁿ βᵢvᵢ

Since {v₁, ..., vₙ} is linearly independent,

(3)      Σⱼ₌₁ⁿ aᵢⱼcⱼ − λcᵢ = βᵢ     for i = 1, ..., n

This proves (b).
For the converse, suppose that (b) is true. Define x = (Σⱼ cⱼvⱼ − b)/λ.
Then we verify (a) by calculating

      Ax − λx = (1/λ)[Σᵢ₌₁ⁿ Σⱼ₌₁ⁿ aᵢⱼcⱼvᵢ − Σᵢ₌₁ⁿ βᵢvᵢ − λ Σᵢ₌₁ⁿ cᵢvᵢ] + b = b

This analysis has established that the original problem is equivalent to a
matrix problem of order n, where n is the dimension of the range of the operator
A. The actual numerical calculations to obtain x involve solving the Equation
(3) for the unknown coefficients cⱼ. We have not yet made any assumptions to
guarantee the solvability of Equation (3).
Now one can prove the Fredholm Alternative for this case by elementary
linear algebra. Indeed we have these equivalences (in which λ ≠ 0):
(i) A − λI is surjective.
(ii) For each b there is an x such that Ax − λx = b.
(iii) For all (β₁, ..., βₙ) the system Σⱼ₌₁ⁿ aᵢⱼcⱼ − λcᵢ = βᵢ (i = 1, ..., n) is
soluble.
(iv) The system Σⱼ₌₁ⁿ aᵢⱼcⱼ − λcᵢ = 0 has only the trivial solution.
(v) The equation Ax − λx = 0 has only the trivial solution.
(vi) λ is not an eigenvalue of the operator A.
(vii) A − λI is injective.
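As a small numerical illustration of these equivalences (added here, with made-up data), injectivity and surjectivity of the matrix of system (3) can be tested together by a single rank computation:

```python
import numpy as np

# Hypothetical coefficient matrix (a_ij) and a nonzero lambda.
A = np.array([[2.0, 1.0], [0.0, 3.0]])
lam = 1.0
M = A - lam * np.eye(2)                               # matrix of system (3)

injective = np.linalg.matrix_rank(M) == M.shape[1]    # trivial kernel
surjective = np.linalg.matrix_rank(M) == M.shape[0]   # full row rank
print(injective, surjective)    # for a square matrix, always equal
```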
Integral Equations. The theory of linear operators is well illustrated in the
study of integral equations. These have arisen in earlier parts of this book, such
as in Sections 2.3, 2.5, 4.1, 4.2, and 4.3. A special type of integral equation
has what is known as a degenerate kernel. A kernel is called degenerate or
separable if it is of the form k(s, t) = Σᵢ₌₁ⁿ uᵢ(s)vᵢ(t). The corresponding integral
operator is

      (Kx)(t) = ∫ k(s, t)x(s) ds = ∫ Σᵢ₌₁ⁿ uᵢ(s)vᵢ(t)x(s) ds = Σᵢ₌₁ⁿ vᵢ(t) ∫ uᵢ(s)x(s) ds

If we use the inner-product notation for the integrals in the above equation,
we have the simpler form

      Kx = Σᵢ₌₁ⁿ ⟨x, uᵢ⟩vᵢ
It is clear that K has a finite-dimensional range and is therefore a compact
operator. (Various spaces are suitable for the discussion.)

Lemma 5. In the definition of the degenerate kernel k = Σᵢ₌₁ⁿ uᵢvᵢ,
there is no loss of generality in supposing that {u₁, ..., uₙ} and
{v₁, ..., vₙ} are linearly independent sets.

Proof. Suppose that {v₁, ..., vₙ} is linearly dependent. Then one vector is a
linear combination of the others, say vₙ = Σᵢ₌₁ⁿ⁻¹ aᵢvᵢ. Then we can write the
kernel with a sum of fewer terms as follows:

      Kx = Σᵢ₌₁ⁿ ⟨x, uᵢ⟩vᵢ = Σᵢ₌₁ⁿ⁻¹ ⟨x, uᵢ⟩vᵢ + ⟨x, uₙ⟩vₙ
         = Σᵢ₌₁ⁿ⁻¹ ⟨x, uᵢ⟩vᵢ + ⟨x, uₙ⟩ Σᵢ₌₁ⁿ⁻¹ aᵢvᵢ
         = Σᵢ₌₁ⁿ⁻¹ [⟨x, uᵢ⟩ + aᵢ⟨x, uₙ⟩]vᵢ = Σᵢ₌₁ⁿ⁻¹ ⟨x, uᵢ + aᵢuₙ⟩vᵢ

A similar argument applies if {u₁, ..., uₙ} is dependent. •

To solve the integral equation Kx − λx = b when K has a separable kernel (as
above), we assume λ ≠ 0 and that {v₁, ..., vₙ} is independent. Then {v₁, ..., vₙ}
is a basis for the range of K, and the theory of the preceding pages applies. If
the integral equation has a solution, then the system of linear equations

      Σⱼ₌₁ⁿ aᵢⱼcⱼ − λcᵢ = βᵢ     (1 ≤ i ≤ n)

has a solution, where Kvⱼ = Σᵢ aᵢⱼvᵢ and Kb = Σᵢ βᵢvᵢ. It follows that the
solution is x = λ⁻¹(Σᵢ cᵢvᵢ − b).
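Here is a minimal numerical sketch of this recipe (not from the text; the kernel, right-hand side, and λ are made up, and SciPy is assumed): the coefficients aᵢⱼ = ∫ uᵢ(s)vⱼ(s) ds and βᵢ = ∫ uᵢ(s)b(s) ds are computed by quadrature, the n × n system is solved, and the residual of the integral equation is checked.

```python
import numpy as np
from scipy.integrate import quad

# Hypothetical separable kernel k(s,t) = u1(s)v1(t) + u2(s)v2(t) on [0,1].
u = [lambda s: 1.0, lambda s: s]
v = [lambda t: t, lambda t: t**2]
b = lambda t: 1.0 + t
lam = 0.5                      # assumed not an eigenvalue of K

n = len(u)
# a_ij = <v_j, u_i> = integral of u_i(s) v_j(s) ds,  beta_i = <b, u_i>
A = np.array([[quad(lambda s: u[i](s) * v[j](s), 0, 1)[0] for j in range(n)]
              for i in range(n)])
beta = np.array([quad(lambda s: u[i](s) * b(s), 0, 1)[0] for i in range(n)])

c = np.linalg.solve(A - lam * np.eye(n), beta)        # the system (3)
x = lambda t: (sum(c[i] * v[i](t) for i in range(n)) - b(t)) / lam

# Residual check of Kx - lam*x = b at a few points.
Kx = lambda t: sum(v[i](t) * quad(lambda s: u[i](s) * x(s), 0, 1)[0]
                   for i in range(n))
for t in (0.0, 0.3, 0.9):
    print(abs(Kx(t) - lam * x(t) - b(t)))             # essentially zero
```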
The spectrum of a linear operator A on a normed linear space is the set
of all complex numbers λ for which A − λI is not invertible.

Theorem 7. Let A be a compact operator on a Banach space. Each
nonzero element of the spectrum of A is an eigenvalue of A.

Proof. Let λ ≠ 0 and suppose that λ is not an eigenvalue of A. We want to
show that λ is not in the spectrum, or equivalently, that A − λI is invertible.
Since λ is not an eigenvalue, the equation (A − λI)x = 0 has only the solution
x = 0. Hence A − λI is injective. By the Fredholm Alternative, A − λI is
surjective. Hence (A − λI)⁻¹ exists as a linear map. The only question is whether
it is a bounded linear map. The affirmative answer comes immediately from the
Interior Mapping Theorem and its corollaries in Section 1.8, page 48ff. That
would complete the proof. There is an alternative that avoids use of the Interior
Mapping Theorem but uses again the compactness of A. To follow this path,
assume that (A − λI)⁻¹ is not bounded. We can find xₙ such that ‖xₙ‖ = 1 and
‖(A − λI)⁻¹xₙ‖ → ∞. Put yₙ = (A − λI)⁻¹xₙ. Then ‖yₙ‖/‖(A − λI)yₙ‖ → ∞.
Put zₙ = yₙ/‖yₙ‖, so that ‖zₙ‖ = 1 and ‖(A − λI)zₙ‖ → 0. Since A is compact,
there is a convergent subsequence Azₙₖ → w. Then

      zₙₖ = λ⁻¹[Azₙₖ − (A − λI)zₙₖ] → λ⁻¹w

Hence A(λ⁻¹w) = w or (A − λI)w = 0. Since ‖w‖ = |λ| ≠ 0, we have contra-
dicted the injective property of A − λI. •
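For the degenerate kernels above, the nonzero part of the spectrum is easy to exhibit: if Kvⱼ = Σᵢ aᵢⱼvᵢ with {vᵢ} independent, a short computation shows that a nonzero λ is an eigenvalue of K exactly when it is an eigenvalue of the matrix (aᵢⱼ). A quick check (illustrative only, reusing the hypothetical kernel of the previous sketch):

```python
import numpy as np
from scipy.integrate import quad

# Same hypothetical separable kernel as before: k(s,t) = 1*t + s*t^2 on [0,1].
u = [lambda s: 1.0, lambda s: s]
v = [lambda t: t, lambda t: t**2]
n = len(u)
A = np.array([[quad(lambda s: u[i](s) * v[j](s), 0, 1)[0] for j in range(n)]
              for i in range(n)])

# Nonzero eigenvalues of K coincide with the eigenvalues of the matrix (a_ij).
print(np.linalg.eigvals(A))     # approximately [0.731, 0.019]
```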
For an integral equation having a more general kernel, there is a possibility
of approximating the kernel by a separable (degenerate) one, and solving the
resulting simpler integral equation. The approximation is possible, as we shall
see. We begin by recalling the important Stone-Weierstrass Theorem:

Theorem 8. Stone-Weierstrass Theorem. Let T be a compact


topological space. Every subalgebra of C(T) that contains the constant
functions and separates the points of T is dense in C(T).

Definition. A family of functions on a set is said to separate the points of


that set if for every pair of distinct points in the set there exists a function in
the family that takes different values at the two points.
Example 1. Let T = [a, b] ⊂ ℝ. Let A be the algebra of all polynomials
in C(T). Then A is dense in C(T), by the Stone-Weierstrass Theorem. This
implies that for any continuous function f defined on [a, b] and for any ε > 0
there is a polynomial p such that

      ‖f − p‖ = max{|f(t) − p(t)| : a ≤ t ≤ b} < ε

Example 2. Let S and T be compact spaces. Then the set

      {f : f(s, t) = Σᵢ₌₁ⁿ aᵢ(s)bᵢ(t) for some n, aᵢ ∈ C(S), bᵢ ∈ C(T)}

is dense in C(S × T).
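Example 1 invites a quick experiment (illustrative only, not part of the text): fit polynomials of increasing degree to a continuous function and watch the uniform error shrink. The sketch below uses NumPy's Chebyshev least-squares fit on a fine grid as the polynomial p.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

# Approximate f(t) = |t| on [-1, 1] by polynomials of increasing degree.
f = np.abs
t = np.linspace(-1.0, 1.0, 2001)

for deg in (4, 8, 16, 32):
    p = C.Chebyshev.fit(t, f(t), deg)     # least-squares Chebyshev fit
    err = np.max(np.abs(f(t) - p(t)))     # uniform error on the grid
    print(deg, err)                       # decreases toward 0 as deg grows
```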


The next theorem shows that theoretically we obtain a solution to the orig-
inal problem as a limit of solutions to the simpler ones involving operators of
finite rank.

Theorem 9. Let A₀, A₁, A₂, ... be compact operators on a Banach
space, and suppose limₙ Aₙ = A₀. If λ is not an eigenvalue of A₀ and
if for each n there is a point xₙ such that Aₙxₙ − λxₙ = b, then for all
sufficiently large n,

      xₙ − x₀ = (Aₙ − λI)⁻¹(A₀ − Aₙ)x₀

where x₀ is the solution of A₀x₀ − λx₀ = b (so that, in particular, xₙ → x₀).

Proof. Since λ is not an eigenvalue of A₀, it is not in the spectrum of A₀, by
Theorem 7. Hence A₀ − λI is invertible. Select m such that for n ≥ m

      ‖A₀ − Aₙ‖ < ‖(A₀ − λI)⁻¹‖⁻¹

By Problem 2 in Section 4.3 (page 189), (Aₙ − λI)⁻¹ exists (when n ≥ m). Now
write

      xₙ − x₀ = [(Aₙ − λI)⁻¹ − (A₀ − λI)⁻¹]b
              = (Aₙ − λI)⁻¹[I − (Aₙ − λI)(A₀ − λI)⁻¹]b
              = (Aₙ − λI)⁻¹[I − {A₀ − λI − (A₀ − Aₙ)}(A₀ − λI)⁻¹]b
              = (Aₙ − λI)⁻¹(A₀ − Aₙ)(A₀ − λI)⁻¹b
              = (Aₙ − λI)⁻¹(A₀ − Aₙ)x₀ •
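One way to watch Theorem 9 at work (a made-up illustration, not from the text) is to discretize the integral operator with kernel e^{st} on [0,1] on a grid and compare the solution of Kx − λx = b with the solutions obtained from the degenerate Taylor truncations kₙ(s,t) = Σ_{i<n} (st)ⁱ/i!, which converge to the full kernel.

```python
import numpy as np
from math import factorial

# Midpoint-rule discretization of (Kx)(t) = integral_0^1 k(s,t) x(s) ds.
m = 400
s = (np.arange(m) + 0.5) / m          # quadrature nodes, each of weight 1/m
lam, b = 0.5, np.ones(m)

def solve(kernel):
    K = kernel(s[None, :], s[:, None]) / m    # K[i, j] ~ k(s_j, t_i) / m
    return np.linalg.solve(K - lam * np.eye(m), b)

x_full = solve(lambda ss, tt: np.exp(ss * tt))
for n in (1, 2, 3, 5, 8):
    kn = lambda ss, tt, n=n: sum((ss * tt) ** i / factorial(i)
                                 for i in range(n))
    print(n, np.max(np.abs(solve(kn) - x_full)))   # error shrinks as n grows
```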

Problems 7.5

1. Supply the "similar" argument omitted from the proof of Lemma 1.


2. Let k(s, t) = Σᵢ₌₁ⁿ uᵢ(s)vᵢ(t), where uᵢ ∈ L²(S) and vᵢ ∈ L²(T). Show that k(s, t) can
also be represented as Σᵢ₌₁ⁿ ũᵢ(s)ṽᵢ(t), where {ṽᵢ} is an orthonormal set.
3. On page 358 we saw how to solve Kx − λx = b if K is an integral operator having a
separable kernel and λ ≠ 0. Give a complete analysis of the case when λ = 0.
4. Is the set of polynomials of the form c₀ + c₁t¹⁷ + c₂t³⁴ + c₃t⁵¹ + ··· dense in C[a, b]?
Generalize.
5. Solve the integral equation

      ∫₀¹ (t − s)x(s) ds − λx(t) = b(t)

6. Solve the integral equation

      ∫₀¹ [a(s) + b(t)]x(s) ds − x(t) = c(t)

7. Prove that the set of compact operators in £(X, X) is an ideal. This means that if A
is compact and B is any bounded linear operator, then AB and BA are compact. (This
property is in addition to the subspace axioms.)
8. Let A be a compact operator on a Banach space. Prove that if I - A is injective, then it
is invertible.
9. Let A, B ∈ £(X, X). Assume that AB = BA and that AB is invertible. Prove
that (a) A(AB)⁻¹ = (AB)⁻¹A; (b) B⁻¹ = A(AB)⁻¹; (c) A⁻¹ = B(AB)⁻¹; (d)
B(AB)⁻¹ = (AB)⁻¹B; (e) (BA)⁻¹ = A⁻¹B⁻¹.

10. Let A be a linear operator from X to Y. Suppose that we are in possession of elements
U 1, U2, ... , Un whose images under A span the range of A. Describe how to solve the
equation Ax = b.
11. Prove that if A is a compact operator on a normed linear space, then for some natural
number n, the ranges of (I + A)n, (I + A)n+l, . .. are all identical.
12. Let A be a linear transformation defined on and taking values in a linear space. Prove
that if A is surjective but not injective, then ker(An) is a proper subset of ker(A n+l),
for n = 1,2,3, ...
13. Let A be a bounded linear operator defined on and taking values in a normed linear
space. Suppose that for n = 1,2,3, ... , the range of An properly contains the range of
An+l. Prove that the sum A + K is never invertible when K is a compact operator.

14. Prove that if An are continuous linear transformations acting between two Banach spaces,
and if Anx --+ 0 for all x, then this convergence is uniform on all compact sets. (Cf.
Lemma 4.)
15. Give examples of operators on the space Co that have one but not both of the properties
injectivity and surjectivity.
16. (More general form of Lemma 5) Let A be an operator defined by the equation Ax =
Σᵢ₌₁ⁿ φᵢ(x)vᵢ, in which x ∈ X, vᵢ ∈ Y, and φᵢ ∈ X*. Prove that there is no loss of
generality in supposing that {v₁, ..., vₙ} and {φ₁, ..., φₙ} are linearly independent sets.
17. Show how to solve the equation Ax − x = b if the range of A is spanned by {Au₁, ..., Auₙ},
for some uᵢ ∈ X. Prove that the equation is solvable if 1 is not an eigenvalue of A.
18. Let A and B be members of £(X, Y), where X and Y are Banach spaces. Suppose that
B is invertible and that (B⁻¹A)ᵐ is compact for some natural number m. Prove that
B + A is surjective if and only if it is injective.
19. Provide the details for the assertions in Example 2.
20. Use the Stone-Weierstrass Theorem to prove this result of Diaconis and Shahshahani
[DiS]: If X is a normed linear space and f is a continuous function from X to ℝ, then
for any compact set K in X and for any positive ε there exist φᵢ ∈ X* and coefficients
cᵢ such that

      sup_{x∈K} |f(x) − Σᵢ₌₁ⁿ cᵢ e^{φᵢ(x)}| < ε

21. Prove this for an arbitrary compact operator A: The transformation I +A is surjective
if and only if -1 is not an eigenvalue of A.
22. Prove the finite-dimensional version of the Fredholm alternative, which we formulate as
follows, for an arbitrary matrix A and vector b: The system Ax = b is consistent if and
only if the system yᵀA = 0, yᵀb ≠ 0 is inconsistent.
23. Discuss the existence and uniqueness of solutions to the integral equation

      ∫₀^π cos(s − t)u(s) ds = f(t)

where f is a prescribed function, and u is the unknown function.

7.6 Topological Spaces

In this section we provide an abbreviated introduction to topological notions-


hardly more than enough to bring us to the Tychonoff Theorem.
So far in this book we have been dealing routinely with topological spaces,
but of a special kind, usually metric spaces or normed linear spaces. Now we
require a more general discussion so that there will be a suitable framework for
the weak topologies on linear spaces and for other examples.
A good starting point is the question, What is a topology? It is a family
of sets such that:
a. The empty set, ∅, is a member of the family.
b. The intersection of any two members of the family is also in the family.
c. The union of any subfamily is also in the family.
If T is a topology, we define X to be the union of all members of T. We say
that X is the space of T and that T is a topology on X. We call the pair

(X, T) a topological space. Each member of T is called an open set in X.


By Axiom c, X is open. So is the empty set, ∅. The use of the word "open"
will be unambiguous if there is only one topology being discussed. If there are
several, a more exact terminology will be needed. For example, one could refer
to the T-open sets and the S-open sets, if T and S are topologies. In any case,
X and 0 will always be open, no matter what topology has been assigned to X.
Example 1. Let X be any set and let T = 2^X. That notation signifies that T
is the family of all subsets of X, including the empty set and X itself. Obviously,
the axioms are fulfilled in this example. This topology is the largest one that
can be defined on X, and is called the discrete topology. Every singleton {x}
is an open set in this topological space. Every topology on X is contained in
the discrete topology on X. •
Example 2. Let X be any set, and define T to consist of only the empty
set and the given set X. This is the smallest topology on X, and is called
whimsically the indiscrete topology on X. Every topology on X contains the
indiscrete topology. •
Example 3. In ℝ define a set V to be open if every point x in V is the center
of an interval (x − ε, x + ε) that lies wholly in V. This defines the usual topology
on ℝ. We do not stop to prove that this definition of "open" leads to a family
satisfying the axioms (but this is a good exercise for the reader). •
Example 4. In any set X let a notion of distance between points be intro-
duced. The distance from x to y can be denoted by d(x, y), and this function
should satisfy three axioms:
a. If x ≠ y, then d(x, y) = d(y, x) > 0.
b. For each x, d(x, x) = 0.
c. For all x, y, z, d(x, z) ≤ d(x, y) + d(y, z).
Then a topology can be defined as in Example 3. Namely, a set V is open if
each point x in V is the center of a "ball"

      B(x, ε) = {y : d(x, y) < ε}

that lies wholly in V. The pair (X, d) is a metric space, and is a topological
space, it being understood that its topology is the one just described. All normed
linear spaces are metric spaces, because the equation d(x, y) = ‖x − y‖ defines
a metric. •
A topological space is said to be a Hausdorff space if for any pair of
distinct points x and y there is a disjoint pair of open sets U and V such that
x ∈ U and y ∈ V. Every metric space is a Hausdorff space, since B(x, ε) and
B(y, ε) will be disjoint from each other if ε is sufficiently small. The Hausdorff
property is one of a number of separation axioms that topological spaces may
satisfy. It is useful in questions of convergence, for it ensures that a sequence
(or net) can converge to at most one limit.
A base for a topology T is any subfamily B of T such that every open set
is a union of sets in B. For example, the open intervals with rational endpoints
form a base for the usual topology on JR. In a discrete space, the singletons {x}
form a base.

A subbase for a topology T is a subfamily S of T such that the finite


intersections of sets in S form a base for T. For example, the intervals of the
form (a, 00) and (-00, b) provide a subbase for the (usual) topology of R This is
evident, since by intersecting two such intervals we can obtain an interval (a, b).
A topology on a space X can be defined by specifying any family S of
subsets of X as a subbase for the topology. A base B is then the family of all
finite intersections of sets in S, and the topology itself consists of all unions of
sets in B. An easy proof then establishes that the resulting family satisfies the
axioms for a topology.
If A is any set in a topological space, we can form the largest open set
contained in A by simply taking the union of all open sets that are subsets of A.
The resulting set is called the interior of A and is often denoted by A°. The
reader can verify that a set is open if and only if it equals its interior.
In a topological space, a neighborhood of a point is any set whose interior
contains the given point. It is easily verified that a set is open if and only if it
is a neighborhood of each of its points.
In a topological space, the closed sets are the complements of the open
sets. Further properties are easily proved:
a. The empty set and the space itself are closed.
b. The intersection of any family of closed sets is closed.
c. The union of any finite collection of closed sets is closed.
As a consequence of the preceding definitions, each set can be enclosed in
a smallest closed set, called the closure of that set. Namely, we take as the
closure of A the intersection of all closed sets containing A. Then a set is closed
if and only if it equals its closure. One quickly proves that a point x belongs to
the closure of a set A if and only if each neighborhood of x intersects A.
Another basic notion in general topology is that of the relative topology
in a subset of a topological space. If Y is a subset of a topological space X,
and if T is the topology on X, then we take as the "relative" topology on Y the
family
      U = {Y ∩ O : O ∈ T}
A set can be open in Y (meaning that it belongs to U) without necessarily being
open in X. For example, if Y is a non-open subset of X, then Y ∈ U, while
Y ∉ T.
Another ingredient of general topology is an extended concept of conver-
gence. It turns out that sequential convergence is inadequate for describing
topological notions, in general. Sequential convergence suffices in some spaces,
such as metric spaces, but not in all spaces. The generalization of sequences
consists principally in allowing an ordered set other than the natural numbers
to serve as the domain of the indices. The definitions pertaining to this topic
are as follows.

A partially ordered set is a pair (D, ≺) in which D is a set and ≺ is a
relation obeying these axioms:
a. α ≺ α
b. If α ≺ β and β ≺ γ, then α ≺ γ.
A directed set is a partially ordered set in which an additional axiom is re-
quired:
c. Given α and β in D, there is γ ∈ D such that α ≺ γ and β ≺ γ.
The reader will recognize ℕ as a familiar example of a directed set, it being
understood that ≺ is the ordinary relation ≤. Another important example is the
set of all neighborhoods of a point in a topological space, where ≺ is interpreted
as ⊃.
A net or generalized sequence is a function on a directed set. This is
obviously more general than a sequence, which is a function on ℕ. We can use
the notation [x, D, ≺] for a net, specifying the function x, the directed set D, and
the relation ≺. When we need not concern ourselves with niceties, the notation
[x_α] can be used, just as we abuse notation for sequences and write [xₙ].
Useful conventions are as follows. A net [x_α] is eventually in a set V if
there is a β such that x_α ∈ V whenever β ≺ α. If a net is eventually in every
neighborhood of a point y, then we say that the net converges to y. Let us
illustrate with one example of a theorem employing nets.

Theorem 1. A point y is in the closure of a set S in a topological


space if and only if some net in S converges to y.

Proof. If the net [x_α] is in S and converges to y, then to each neighborhood
U of y there corresponds an index β such that x_α ∈ U whenever β ≺ α. In
particular, x_β ∈ U. Thus each neighborhood of y contains a point of S, and y is
in the closure of S. Conversely, suppose that y is in the closure of S. Let D be
the family of all neighborhoods of y, ordered by inclusion: α ≺ β means β ⊂ α.
Since y is in the closure of S, there exists for each α ∈ D a point x_α ∈ α ∩ S.
The net [x_α] thus defined (with the aid of the Axiom of Choice) is in S and
converges to y. •
In the preceding proof, had we known in advance that the point y pos-
sessed a countable neighborhood base, we could have used a sequential argu-
ment. However, there exist spaces in which some points do not have a countable
neighborhood base. (A base for the neighborhoods of a point x is a family of
neighborhoods of x such that every neighborhood of x contains one of the sets
in the family.)
A family A of sets in a topological space X is said to be an open cover
of X if all the sets in A are open and if X is contained in the union of the
sets in A. If every open cover of X has a finite subfamily that is also an open
cover of X, then X is said to be compact. The compact sets on the real
line are precisely the closed and bounded sets. The same assertion is true for
any finite-dimensional normed linear space. These matters were investigated in
Section 1.4, pages 19-22. But in that part of the book we adopted a sequential
definition of compactness that is inadequate for general topology.

Alexander's Theorem. Let a subbase be specified for a topology


on a space. If every cover of the space by subbase elements has a finite
subcover, then the space is compact.

Proof. Let X be the space, T the topology, and S the subbase in question.
Assume that every cover of X by elements of S has a finite subcover. Suppose
that X is not compact. We seek a contradiction.
The family of open covers that do not have finite subcovers is a nonempty
family. Partially order this family by inclusion, and invoke Zorn's Lemma (Sec-
tion 1.6, page 32). This maneuver produces an open cover A of X that is
maximal with respect to the property of possessing no finite subcover. Define
A' = S ∩ A. Certainly, no finite subfamily of A' covers X. Since all sets in A'
are members of the subbase, our hypotheses imply that A' itself does not cover
X.
This last assertion implies that there exists a point x that is contained in
no member of A'. Since A is an open cover of X, we can select U ∈ A such that
x ∈ U. By the properties of a subbase, there exist sets S₁, ..., Sₙ in S such that
x ∈ ⋂ᵢ₌₁ⁿ Sᵢ ⊂ U. Since x is contained in no member of A', one concludes that
Sᵢ ∉ A'. Hence Sᵢ ∉ A. By the maximal property of A, each enlarged family
A ∪ {Sᵢ} contains a finite subcover of X, for i = 1, 2, ..., n. Hence, for each i in
{1, 2, ..., n}, there is an open set Oᵢ that is a union of finitely many sets in A
and has the property Oᵢ ∪ Sᵢ = X. Define B = O₁ ∪ ··· ∪ Oₙ. Then B ∪ Sᵢ = X
for each i, and ⋂ᵢ₌₁ⁿ (B ∪ Sᵢ) = X. It follows that

      X = B ∪ ⋂ᵢ₌₁ⁿ Sᵢ ⊂ B ∪ U

Since B is the union of finitely many sets in A, and since U ∈ A, we see that
a finite subfamily of A covers X, contradicting a property of A established
previously. •
If we have two topological spaces, say (X₁, T₁) and (X₂, T₂), then we can
topologize the Cartesian product X₁ × X₂ in a standard way: We take as a base
for the topology of X₁ × X₂ the family of all sets A × B, where A is open in X₁
and B is open in X₂. (The topology itself consists of all unions of sets in the
base.)
The notion of a product extends to any family (finite, countable, or un-
countable) of topological spaces (Xᵢ, Tᵢ), where i ∈ I. The index set I can be
of arbitrary cardinality. The product space is denoted by ΠXᵢ or Π{Xᵢ : i ∈ I}
and is defined to be the set of all functions x on I such that x(i) ∈ Xᵢ for all
i ∈ I. In this context, we usually write xᵢ = x(i). This is exactly the process by
which we construct ℝⁿ from ℝ. We take n factors, all equal to ℝ. The generic
element of the product space is a "vector" that we write as x = [x₁, x₂, ..., xₙ].
Thus, x is a function on the index set {1, 2, ..., n}.
For each i ∈ I there is a projection Pᵢ from the product space X = ΠXᵢ to
Xᵢ. It is defined by Pᵢ(x) = xᵢ. The topology on X is taken to be the weakest
one that makes each of these projections continuous. One then must require
that each set Pᵢ⁻¹[O] be open when O is an open set in Xᵢ. The family of all
these sets is taken as a subbase for the product topology on X.

Tychonoff Theorem. A topological product of compact spaces is


compact.

Proof. For each i in an index set I, let (X i ,7;) be a compact topological


space. We form the product space X = IIXi , and give it the product topology
as described above. Use the projections Pi described above also. A subbase for
the product topology is the family S of all sets of the form pi-l(O), where i
ranges over I and 0 ranges over 7;. In order to take advantage of Alexander's
Theorem, let W be a cover of X by subbase sets. Thus

Wc S and X = U{ 0 : 0 E W}
For each i let Vi be the family of all open sets in Xi whose inverse images by Pi
are in W:

Assertion: For some i, Vi covers Xi. To prove this, assume that it is false. Then
for each i, Vi fails to cover Xi, and consequently, there exists a point Xi E Xi
such that Xi ~ U{O : 0 E V;}. By the Axiom of Choice we can select these
points Xi simultaneously and thereby construct an X in X such that

PiX = x(i) = Xi E Xi" U{O: 0 E Vi}


iEI

Consequently, we have for each i and for each open set 0 in Xi the following
implications:

However, W consists exclusively of sets having the form pi-l(O), and so the
above implication reads as follows:

This contradicts the fact that W is a cover of X, and proves the assertion.
Now select an index j ∈ I such that Vⱼ covers Xⱼ. By the compactness of Xⱼ,
a finite subfamily of Vⱼ covers Xⱼ, say O₁, ..., Oₙ ∈ Vⱼ and Xⱼ = O₁ ∪ ··· ∪ Oₙ. It
follows (by using Pⱼ⁻¹) that X = Pⱼ⁻¹(O₁) ∪ ··· ∪ Pⱼ⁻¹(Oₙ). Since these n sets are
in W (by the definition of Vⱼ), we have found a finite subcover in W, as desired. •

A particular case of the topological product is especially useful. It occurs
when all the factors Xᵢ are equal, say to X. In the general theory, we can then
take each Xᵢ to be a copy of X. The notation X^I now is more natural than
∏{X : i ∈ I}. This space consists of all functions from I to X. We still have
the projections Pᵢ from X^I to X, and Pᵢ(x) = x(i) for all i ∈ I.

7.7 Linear Topological Spaces

Although normed linear spaces have served us well in this book, there are some
matters of importance in applied mathematics that require a more general topol-
ogized linear space. (The theory of distributions is a pertinent example.) What
is needed is a linear space in which the topological notions of continuity, com-
pactness, completeness, etc., do not necessarily arise from a norm and its induced
metric. The appropriate definition follows.
Definition. A linear topological space is a pair (X, T) in which X is a linear
space and T is a topology on X such that the algebraic operations in X are
continuous.
Being more specific about the continuity, we say that the two maps

    (x, y) ↦ x + y        (λ, x) ↦ λx

are continuous, the first being defined on X × X and the second being defined
on ℝ × X. There is a corresponding definition if the scalar field is taken to be
ℂ rather than ℝ.
We remind the reader that the sets belonging to the family T are called
the open sets, and a neighborhood of a point x is any set U such that for some
open set O we have x ∈ O ⊂ U. The continuity axioms above can be stated in
terms of neighborhoods like this:
a. If U is a neighborhood of x + y, then there exist neighborhoods V of x
and W of y such that v + w is in U whenever v ∈ V and w ∈ W.
b. If U is a neighborhood of λx, then there are neighborhoods V of λ and
W of x such that αw ∈ U whenever α ∈ V and w ∈ W.
A very useful fact is that the topology is completely determined by the
neighborhoods of 0. This is formally stated in the next lemma.

Lemma 1 In a linear topological space, a set V is a neighborhood


of a point z if and only if -z + V is a neighborhood of o.

Proof. Hold z fixed, and define f(x) = x + z. This mapping sends 0 to z. Let
V be a neighborhood of z. Since f is continuous, f⁻¹(V) is a neighborhood of
0. Observe, now, that f⁻¹(V) = {x : f(x) ∈ V} = {x : x + z ∈ V} = −z + V.
Conversely, assume that −z + V is a neighborhood of 0. We have f⁻¹(x) = x − z,
and f⁻¹ is also continuous. It maps z to 0. Hence (f⁻¹)⁻¹ carries −z + V to a
neighborhood of z. But

    (f⁻¹)⁻¹(−z + V) = {x : f⁻¹(x) ∈ −z + V} = {x : x − z ∈ −z + V} = V  •

In any topological space, a family U of neighborhoods of a point x is called


a base for the neighborhoods of x if each neighborhood of x contains a
member of U. For example, one base for the neighborhoods of a point x on the
real line is the set of intervals [x − 1/n, x + 1/n], where n ranges over ℕ.
Since we often want the Hausdorff axiom to hold in our linear topological
spaces, we record here the appropriate condition.

Theorem 1. A linear topological space is a Hausdorff space if and


only if 0 is the only element common to all neighborhoods of O.

Proof. The Hausdorff property is that for any pair of points x ≠ y there must
exist neighborhoods U and V of x and y respectively such that the pair U, V
is disjoint. Select a neighborhood W of 0 such that x − y ∉ W. Then (using
the continuity of subtraction) select another neighborhood W' of zero such that
W' − W' ⊂ W. Then x + W' is disjoint from y + W', for if z is a point in
their intersection, we could write z = x + w₁ = y + w₂, with wᵢ ∈ W'. Then
x − y = w₂ − w₁ ∈ W' − W' ⊂ W. The other half of the proof is even easier:
just separate any nonzero point from 0 by selecting a neighborhood of zero that
excludes the nonzero point. •
At this juncture, we should alert the reader to the fact that some authors
assume the Hausdorff property as part of the definition of a linear topological
space.
Let X be a linear space (preferably without a topology, so that confusion
between two topologies can be avoided in what we are about to discuss). The
notation X' signifies the algebraic dual of X, i.e., the space of all linear maps
from X into the scalar field (ℝ or ℂ). We can use X' to define a "weak" topology
on X, and we can use X to define a weak topology on X'. There is an abstract
description that includes both of these constructions, but let us proceed in a
more pedestrian manner. What we have in mind is rather simple: We want
the topologies to lead to pointwise convergence in both cases. Although we
did not discuss it here, a topology can be defined by specifying the meaning of
convergence of nets. The topic is addressed in [Kel], pages 73-76.
The topology on X induced by X' can be called the weak topology. A base
for the neighborhood system at 0 is given by all sets of the form

    {x ∈ X : |φᵢ(x)| < ε, 1 ≤ i ≤ n}

In this equation ε is any positive number, and {φ₁, ..., φₙ} is any finite subset of
X'. Convergence in this topology means the following: A net [x_α] in X converges
to a point x if [φ(x_α)] converges to φ(x) for every φ ∈ X'.
The topology on X' induced by X is often called the weak* topology. A
base for the neighborhood system of 0 is given by

    V(ε; x₁, x₂, ..., xₙ) = {φ ∈ X' : |φ(xᵢ)| < ε, 1 ≤ i ≤ n}

Here, ε is any positive number and {x₁, ..., xₙ} is any finite set in X. With
this topology, a net [φ_α] in X' converges to φ if and only if [φ_α(x)] converges to
φ(x) for all x ∈ X. This is pointwise convergence.
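As a concrete illustration (a standard example, included only to make the definition tangible), let X = C[0, 1] and define φₙ(f) = f(1/n). For every f ∈ X we have φₙ(f) = f(1/n) → f(0), so the sequence [φₙ] converges in the weak* topology to the point-evaluation functional φ(f) = f(0); on the other hand, ‖φₙ − φ‖ = 2 for every n in the conjugate (norm) topology, so there is no convergence in norm.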
The topologies just described are both Hausdorff topologies, as is easily
deduced from Theorem 1. Also, one sees immediately that the space X' is a
subspace of ℝ^X, since the latter is the space of all functions from X to ℝ, while
the former contains only linear functions. This observation leads one to surmise
that the Tychonoff Theorem can help in understanding compactness in X'. The
result that carries this out requires one further notion: In a linear topological
space, a set A is bounded if for any neighborhood of zero, say U, we have
A ⊂ λU for some real λ.

Compactness Theorem. Let X be a linear space, and let X' be its


algebraic dual. Give X' the weak* topology. Then the compact sets in
X' are precisely the closed and bounded sets.

Proof. Let K be a compact set in X'. The set K is closed because in any
Hausdorff space compact sets are closed. (See [Kel], page 141.) To prove that
K is bounded, let U be any neighborhood of 0 in X'. It is to be shown that
K ⊂ λU for some λ. First, select a "basic" neighborhood V = V(ε; x₁, ..., x_m)
contained in U. Then

    K ⊂ ⋃{φ + V : φ ∈ K}

Since K is compact, this covering has a finite subcover:

    K ⊂ (φ₁ + V) ∪ ··· ∪ (φₙ + V)

(Here φᵢ ∈ K.) Select λ so that all the φᵢ lie in λV. Then

    φᵢ + V ⊂ λV + V = (λ + 1)V ⊂ (λ + 1)U

(An easy calculation justifies the equation in this string of inclusions.) Consequently, K ⊂ (λ + 1)U and K is bounded.
For the converse, let K be a closed and bounded set in X'. For each x in
X, define

    Uₓ = {φ ∈ X' : |φ(x)| ≤ 1}

Thus Uₓ is a neighborhood of 0 in X'. Since K is bounded, there exists (for each
x in X) a positive scalar rₓ such that K ⊂ rₓUₓ. Put Dₓ = {c : |c| ≤ rₓ}. The
set Dₓ is either a disk in ℂ or an interval in ℝ. In either case, Dₓ is a compact
set in the scalar field. If φ ∈ K, then φ ∈ rₓUₓ for all x. Hence φ(x) ∈ Dₓ
for all x, and φ is in the product space ∏{Dₓ : x ∈ X}. Consequently, K is a
subset of this product. That K is a closed subset therein is easily proved. By
the Tychonoff Theorem, this product of compact sets is compact in the product
topology of ℝ^X (or ℂ^X). This is the weak* topology in X'. Since K is a closed
subset of a compact space, K is compact. •

Naturally, we are more interested in the spaces that are already linear topo-
logical spaces. In this case, there will be two topologies on X and three on X'.
(Notice that X' will have a weak* topology coming from X and a weak topology
coming from X".) The originally given topologies can be called the "strong"
topologies, in contrast to the "weak" ones discussed above. (Rudin argues for
the term "original topology" instead of "strong topology," because elsewhere in
functional analysis, "strong topology" means something else.)

Theorem 2. Let X be a linear topological space, and U a neigh-


borhood of 0. Then the polar set

    U⁰ = {φ ∈ X* : |φ(x)| ≤ 1 for all x ∈ U}

is compact in the weak* topology of X* .

Proof. The linear space X* (whose elements are continuous linear functionals)
is a subspace of X' (whose elements are linear functionals). The weak* topology
in X* is the relative topology in X* derived from the weak* topology on X'. By
the preceding theorem, we need only prove that U⁰ is closed and bounded in the
weak* sense in X'. If we have a net [φ_α] in U⁰ and φ_α → φ (weak*), then φ_α(x) → φ(x)
for all x ∈ U. Consequently, |φ(x)| ≤ 1 for all x ∈ U, and φ ∈ U⁰. Thus U⁰ is
closed in the weak* topology of X'. If W is any neighborhood of 0 in X', then
W contains a set of the form

    V(ε; x₁, ..., xₙ) = {ψ ∈ X' : |ψ(xᵢ)| < ε, 1 ≤ i ≤ n}

Select r > 0 so that rxᵢ ∈ U for each i. Then for φ ∈ U⁰ we have |φ(x)| ≤ 1 for all x ∈ U.
Furthermore, |φ(rxᵢ)| ≤ 1 and φ ∈ (1/(rε))W. Thus U⁰ is bounded. •

Theorem 3. (The Banach-Alaoglu Theorem) The unit ball in


the conjugate space of a normed linear space is compact in the weak*
topology.

Proof. In the preceding theorem, take U to be the unit ball of X. The polar
of U will then be the unit ball in X*. •
If the neighborhoods of 0 in a linear topological space have a base consisting
of convex sets, the space is said to be locally convex. It is these spaces that
we shall emphasize in the following discussion. Among such spaces we find the
normed spaces and the pseudo-normed spaces. A pseudo-norm or seminorm
is a real-valued function p defined on a linear space X such that:
1. p(λx) = |λ| p(x) for all λ ∈ ℝ, x ∈ X
2. p(x + y) ≤ p(x) + p(y) for all x, y ∈ X
It follows that p(0) = 0 and that p(x) ≥ 0 for all x ∈ X. If p is a seminorm on
a linear space X, then in a standard way X receives a locally convex topology.
Namely, a base for the neighborhoods of 0 is taken to be the family of sets

    V(ε) = {x ∈ X : p(x) < ε}        (ε > 0)

It is easy to see that this set is convex.


In many spaces, the topology is not quite so simple; their topologies are
defined not by a single seminorm but by a family of seminorms. Let P be a
family of seminorms on a linear space X. We define the topology by giving a
neighborhood base for 0. The base consists of all sets

    V(ε; p₁, ..., pₙ) = {x ∈ X : pᵢ(x) < ε, 1 ≤ i ≤ n}

in which ε > 0 and {p₁, ..., pₙ} is any finite subset of P.


In a linear space topologized with a family P of seminorms, it is easy to
prove that a sequence (or generalized sequence) of points x_k converges to zero if
and only if p(x_k) converges to zero for each p ∈ P.
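A familiar instance of such a topology (a standard example, offered here only as an illustration) is the space C(ℝ) of continuous real-valued functions on ℝ, equipped with the seminorms p_j(f) = sup{|f(x)| : |x| ≤ j} for j = 1, 2, 3, .... By the criterion just stated, f_k → 0 in this topology precisely when f_k converges to zero uniformly on every compact subset of ℝ, and it can be shown that no single norm produces this topology.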
Notice that these basic neighborhoods of 0 are convex, and X, thus topol-
ogized, is a locally convex linear topological space. A remarkable theorem now
can be stated.

Theorem 4. For any locally convex linear topological space there


is a family of continuous seminorms that induces the topology.

Proof. Let P be the family of all continuous seminorms defined on the given
space. Let U be a neighborhood of 0 in the original topology. First we must
prove that U contains one of the sets V(ε; p₁, ..., pₙ). Since the space is locally
convex, U contains a convex neighborhood U₁ of 0. By the continuity of scalar
multiplication, there exist a convex neighborhood U₂ of 0 and a number δ > 0
such that cx ∈ U₁ whenever x ∈ U₂ and |c| < δ. The set U₃ = ⋃{λU₂ : |λ| < δ}
is a convex neighborhood of 0 contained in U. Its Minkowski functional p is
continuous because for any r > 0 and any x ∈ rU₃, we have p(x) ≤ r. Thus,
V(1; p) ⊂ U₃ ⊂ U. (Minkowski functionals were defined in the proof of
Theorem 1 in Section 7.3, page 343.)
Now let V be any "basic" neighborhood of 0 in the new topology. Say,
V = V(ε; p₁, ..., pₙ). Since each pᵢ is continuous in the original topology, V is
open in the original topology. It therefore contains a convex neighborhood of 0
from the original topology. •
One of the main justifications for emphasizing locally convex linear topological
spaces is that such spaces have useful conjugate spaces. For any linear topolog-
ical space X, one can define X' to be the linear space of all continuous linear
functionals on X. Without further assumptions, X' may have only one ele-
ment, namely 0! A good example of this phenomenon is the space ℓᵖ in which
0 < p < 1. The topology is given by a norm-like functional that is actually not
a norm (since it fails the triangle inequality):

    ‖x‖ = ( ∑_{n=1}^{∞} |xₙ|ᵖ )^{1/p}

The only continuous linear functional is 0.


The principal corollary of the Hahn-Banach Theorem is valid for locally
convex spaces, and takes the following form.

Theorem 5. Every continuous linear functional defined on a sub-


space of a locally convex space has a continuous linear extension defined
on the entire space.

Example. Consider the space 𝒟 of test functions on ℝⁿ. This space was
defined in Chapter 5, page 247. Its elements are C^∞ functions having compact
support. The convergence to zero of a sequence [φ_k] in 𝒟 was defined to mean
that there was one compact set containing the supports of all φ_k, and on that
compact set, D^α φ_k converged uniformly to zero, for every α. This notion of
convergence can be defined with a sequence of seminorms. For j = 1, 2, 3, ...,
define

    p_j(φ) = sup{ |(D^α φ)(x)| : x ∈ ℝⁿ, ‖x‖ ≤ j, |α| ≤ j }

Thus topologized, the space of test functions becomes a locally convex linear
topological space. Its conjugate space is the space of distributions. •
In a linear topological space, a set A is totally bounded if, for any neigh-
borhood U of 0, A can be covered by a finite number of translates of U:
    A ⊂ ⋃_{i=1}^{m} (xᵢ + U)

More succinctly, A ⊂ F + U, for some finite set F. From the definition of
compactness in terms of coverings, it is obvious that a compact set in a linear
topological space is totally bounded. We shall use these ideas to prove Mazur's
Theorem, to the effect that c̄o(K) is compact when K is compact. We shall
require the following result, for which we refer the reader to [KN].

Theorem 6. A set in a linear topological space is compact if and


only if it is both totally bounded and complete.

Lemma 2. In a locally convex linear topological space, the convex


hull of a totally bounded set is totally bounded.

Proof. Let Y be such a set and let U be any neighborhood of 0. Select a
convex neighborhood V of 0 such that V + V ⊂ U. Since Y is totally bounded,
there is a finite set F such that Y ⊂ F + V. Let Z = co(F). The set Z is
compact, being the image of a compact set under a continuous map of the form
(θ₁, ..., θₙ) ↦ ∑_{i=1}^{n} θᵢzᵢ, where {z₁, ..., zₙ} = F. It follows that Z is totally
bounded, and that Z ⊂ F' + V for another finite set F'. By the convexity of V
we have

    co(Y) ⊂ co(F + V) = co(F) + V = Z + V ⊂ F' + V + V ⊂ F' + U  •

Theorem 7. Mazur's Theorem. The closed convex hull of


a totally bounded set in a complete locally convex linear topological
space is compact.

Proof. Let K be such a set in such a space. By the preceding lemma, co(K) is
totally bounded. Hence c̄o(K), being the closure of a totally bounded set, is closed
and totally bounded. Since the ambient space is complete, c̄o(K) is complete and
totally bounded. Hence, by Theorem 6, it is compact. •

7.8 Analytic Pitfalls

The purpose of this section is to frighten (or amuse) the reader by exhibiting
some examples where erroneous conclusions are reached through an analysis
that seems at first glance to be sound. In every case, however, some theorem
pertinent to the situation has been overlooked. The relevant theorems are all
quoted somewhere in this section or elsewhere in the book. Proofs or references
are given for each of them. A connecting thread for many of these examples is
the question of whether interchanging the order of two limit processes is justified.
We begin with some matters from the subject of Calculus.
Here is an elementary example to show what can go wrong:

    lim_{x→0} lim_{y→0} (x − y)/(x + y) = lim_{x→0} x/x = 1

    lim_{y→0} lim_{x→0} (x − y)/(x + y) = lim_{y→0} (−y)/y = −1
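As a quick numerical sanity check (a small Python sketch; the particular step sizes are arbitrary choices), one can fix one variable very near zero and let the other shrink, reproducing the two different iterated limits:

    def f(x, y):
        # the function whose iterated limits disagree
        return (x - y) / (x + y)

    # y -> 0 first (inner), then x -> 0 (outer): values approach +1
    print([f(10.0 ** -k, 10.0 ** -(k + 6)) for k in range(1, 5)])

    # x -> 0 first (inner), then y -> 0 (outer): values approach -1
    print([f(10.0 ** -(k + 6), 10.0 ** -k) for k in range(1, 5)])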
A theorem governing this situation (and many others) is E.H. Moore's theorem,
proved later.
It is natural to think that if a function is defined by a series of analytic
functions, then the resulting function should be continuous, continuously differ-
entiable, and so on. (This was a commonly held view until the mid-1850s.) For
example, the series

1
'L 2n cos(3nx)
00
(1) f(x) =
n=l

consists of analytic terms, and the function f should be a "nice" one. We think
that the function defined by the series should inherit the good properties of the
terms in the series. Indeed, in this example, f is continuous, by the Weierstrass
M-Test. This test, or theorem, goes as follows.

Theorem 1. (Weierstrass M-Test.) If the functions gn are


continuous on a compact Hausdorff space X and if

(2)    |gₙ(x)| ≤ Mₙ  (for all x ∈ X)    and    ∑_{n=1}^{∞} Mₙ < ∞

then the series ∑_{n=1}^{∞} gₙ(x) converges uniformly on X and defines a
function that is continuous on X.

The hypotheses in display (2) constitute the "M-Test." In modern notation, we
could write instead ∑_{n=1}^{∞} ‖gₙ‖_∞ < ∞. In the example of Equation (1), one can
set gₙ(x) = 2⁻ⁿ cos(3ⁿx) and see immediately that the constants Mₙ = 2⁻ⁿ
serve in Weierstrass's Theorem.
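The effect of the M-Test can also be seen numerically; the following Python sketch (the grid and the truncation level N are arbitrary choices) compares two partial sums of the series in Equation (1) and checks that they never differ by more than the tail bound ∑_{n>N} 2⁻ⁿ = 2⁻ᴺ:

    import math

    def partial_sum(x, N):
        # S_N(x) = sum_{n=1}^{N} 2^(-n) * cos(3^n * x)
        return sum(2.0 ** -n * math.cos(3.0 ** n * x) for n in range(1, N + 1))

    N = 10
    xs = [k * 0.001 for k in range(-3000, 3001)]
    gap = max(abs(partial_sum(x, 2 * N) - partial_sum(x, N)) for x in xs)
    print(gap, "<=", 2.0 ** -N)   # the difference never exceeds the tail bound 2^(-N)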

The Weierstrass M-Test gives us some hypotheses under which we can
interchange two limits:

    lim_{h→0} lim_{m→∞} ∑_{n=1}^{m} gₙ(x + h) = lim_{m→∞} lim_{h→0} ∑_{n=1}^{m} gₙ(x + h)

Returning to the function f in Equation (1), we propose to compute f′ by
differentiating term by term in the series, getting

    f′(x) = −∑_{n=1}^{∞} 3ⁿ 2⁻ⁿ sin(3ⁿ x)

But here there is an alarming difference, as the factors 3ⁿ2⁻ⁿ are growing, not
shrinking. The very convergence of the series is questionable.

This example, f, is the famous Non-Differentiable Function of Weierstrass.

It is not differentiable at any point whatsoever! A detailed proof can be found

in [Ti2] or [Ch]. A sketch showing a partial sum of the series is in Figure 7.2.

Figure 7.2 A partial sum in the non-differentiable function

When we take more terms and blow up the picture, we see more or less the same
behavior, which reminds us of fractals. See Figure 7.3, where a magnification
factor of about 15 has been used.

Figure 7.3 Another partial sum, magnified


Now for the positive side of this question concerning differentiating a series
term by term: A classical theorem that can be found, for example, in [Wi] is as
follows.

Theorem 2. If the functions gn are continuously differentiable on


a closed and bounded interval, if the series ∑ₙ gₙ(x) converges on
that interval, and if the series ∑ₙ gₙ′(x) converges uniformly on that
interval, then (∑ₙ gₙ)′ = ∑ₙ gₙ′.

Since differentiation involves a limiting process, the theorem just quoted is again
providing hypotheses to justify the interchange of two limits.
What can be said, in general, to legitimate interchanging limits? A famous
theorem of Eliakim Hastings Moore gives one possible answer to this question.

Theorem 3. Let f : ℕ × ℕ → ℝ. Assume that lim_{n→∞} f(n, m) exists
for each m and that lim_{m→∞} f(n, m) exists for each n, uniformly in n.
Then the two limits limₙ limₘ f(n, m) and limₘ limₙ f(n, m) exist and
are equal.

Proof. Define g(m) = limₙ f(n, m) and h(n) = limₘ f(n, m). Let ε > 0. Find a
positive integer M such that

    m ≥ M   ⟹   |f(n, m) − h(n)| < ε    for all n

Notice that the uniformity hypothesis is being used at this step. A consequence is
that |f(n, M) − h(n)| < ε, and by the triangle inequality |f(n, m) − f(n, M)| < 2ε
when m ≥ M. Find N such that

    n ≥ N   ⟹   |f(n, M) − g(M)| < ε

No uniformity of the limit in m is needed here, as M has been fixed. Now we
have |f(N, M) − g(M)| < ε and |f(N, M) − f(n, M)| < 2ε when n ≥ N. We
next conclude that |f(n, m) − f(N, M)| < 4ε when n ≥ N and m ≥ M. This
establishes that the doubly indexed sequence f(n, m) has the Cauchy property.
By the completeness of ℝ, the limit lim_{(n,m)→(∞,∞)} f(n, m) exists. Call it L. Then,
by letting (n, m) go to its limit, we conclude that |L − f(N, M)| ≤ 4ε. Also,
|L − f(n, m)| < 8ε if n ≥ N and m ≥ M. Letting n go to its limit, we get
|L − g(m)| < 8ε if m ≥ M. By letting m go to its limit, we get |L − h(n)| ≤ 8ε
if n ≥ N. Hence h(n) → L and g(m) → L. •
Moore's theorem is actually more general: The range space can be any com-
plete metric space, and the sequences can be replaced by "generalized" sequences
("nets"). See [DS], page 28. The reader will find it a pleasant exercise in the
use of these concepts to carry out the proof in the more general case.
Another case in which the interchange of limits creates difficulties is pre-
sented next in the form of a problem.
Problem. Let U be an orthonormal sequence in a Hilbert space, say U =
{u₁, u₂, ...}. Is it true that each point in the closed convex hull of U is representable
as an infinite series ∑_{n=1}^{∞} aₙuₙ, in which aₙ ≥ 0 and ∑ aₙ = 1?
At first, this seems to be almost obvious: We are simply allowing an "infinite"
convex combination of elements from U in order to represent points in the closure
of the convex hull of U. A proof might proceed as follows. (Here we use "co"
for the convex hull and "c̄o" for the closed convex hull.) Suppose that x ∈ c̄o(U).
Then there exists a sequence xₙ ∈ co(U) such that xₙ → x. With no loss of
generality, we may suppose that

    xₙ = ∑_{i=1}^{n} aₙᵢuᵢ    where aₙᵢ ≥ 0 and ∑_{i=1}^{n} aₙᵢ = 1 for all n

Letting n tend to ∞, we arrive at x = ∑_{i=1}^{∞} aᵢuᵢ, where aᵢ = limₙ aₙᵢ. This
limit is justified by the Hilbert space structure. Indeed, by the properties of an
orthonormal sequence, we must have aₙᵢ = (xₙ, uᵢ), and therefore limₙ aₙᵢ =
limₙ(xₙ, uᵢ) = (x, uᵢ).
After examining the proof and discovering the flaw in it, the reader should
contemplate the following special case. Define xₙ = (u₁ + ··· + uₙ)/n. Certainly,
xₙ is in the convex hull of U. Since xₙ is given by an orthonormal expansion,
we have ‖xₙ‖² = n(1/n²) = 1/n. This calculation shows that xₙ → 0. Hence 0
is in the closure of the convex hull of U. But it is not possible to represent 0 as
an "infinite" convex combination of the vectors uₙ. The only representation of
0 is the trivial one, because its coefficients would have to be aₙ = (0, uₙ) = 0,
and those coefficients do not add up to 1.
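The computation ‖xₙ‖ = 1/√n can be made concrete with a short numerical sketch (Python with NumPy; the ambient dimension is an arbitrary truncation standing in for the Hilbert space, with the standard basis vectors playing the role of the orthonormal sequence):

    import numpy as np

    N = 10_000                     # ambient dimension (a truncation of the Hilbert space)
    for n in (10, 100, 1000):
        x = np.zeros(N)
        x[:n] = 1.0 / n            # x_n = (u_1 + ... + u_n)/n with u_i the standard basis vectors
        print(n, np.linalg.norm(x), 1.0 / np.sqrt(n))   # the last two columns agree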
The foregoing example shows that in general, for a series of constants,

    lim_{n→∞} ∑_{i=1}^{n} cₙᵢ ≠ ∑_{i=1}^{∞} lim_{n→∞} cₙᵢ

The same phenomenon can be illustrated by a more familiar example. Con-


sider the contrast between approximating a function with a polynomial and

expanding that function in a Taylor series. The expansion in powers of the vari-
able is not obtained simply by allowing more and more terms in a polynomial
and appealing to the Weierstrass approximation theorem. Indeed, only a select
few of the continuous functions will have Taylor series. If f is continuous, say
on [−1, 1], then there is a sequence of polynomials pₙ such that ‖f − pₙ‖_∞ → 0.
If it is desired to represent f as a series, one can write

    f = p₁ + (p₂ − p₁) + (p₃ − p₂) + (p₄ − p₃) + ··· = ∑_{n=1}^{∞} qₙ

where the polynomials qₙ have the obvious interpretation. However, this is
not a simple Taylor series, in general. Thus, if we have pₙ → f and write
pₙ(x) = ∑_{i=0}^{n} cₙᵢ xⁱ, it is not legitimate to conclude that

    f(x) = lim_{n→∞} pₙ(x) = lim_{n→∞} ∑_{i=0}^{n} cₙᵢ xⁱ = ∑_{i=0}^{∞} cᵢ xⁱ

where cᵢ = lim_{n→∞} cₙᵢ. This last limit will not exist in most cases.
The expansion of a function in an orthogonal series has its own cautionary
examples. Consider the orthonormal family of Legendre polynomials, p₀, p₁, ...,
defined on the interval [−1, 1]. They have the property

    ∫_{−1}^{1} pₙ(x) pₘ(x) dx = δₙₘ

For any continuous function f defined on this same interval, we can construct
its series in Legendre functions:

    f ~ ∑_{k=0}^{∞} aₖ pₖ ,    where    aₖ = ∫_{−1}^{1} f(x) pₖ(x) dx

Here we write ~ to remind us that equality may or may not hold. It is only
asserted that each continuous function has a corresponding formal series in Leg-
endre functions. Can we not appeal to the Weierstrass approximation theorem
to conclude that the series does converge to f? By now, the reader must guess
that the answer is "No". The reason is not at all obvious, but depends on a
startling theorem in Analysis, quoted here. (A proof is to be found in [Ch].)

Theorem 4. The Kharshiladze-Lozinski Theorem. For each


n = 0, 1, 2, ... let Pₙ be a projection of the space C[−1, 1] onto the
subspace Πₙ of polynomials of degree at most n. Then ‖Pₙ‖ → ∞.

It is readily seen that the equation

    Pₙ(f) = ∑_{k=0}^{n} aₖ(f) pₖ

where the coefficients aₖ are as above, defines a projection of the type appearing
in Theorem 4. That is, Pₙ is a continuous linear idempotent map from C[−1, 1]
onto Πₙ. Hence, ‖Pₙ‖ → ∞. By the Banach-Steinhaus Theorem (Chapter 1,
Section 7, page 41) the set of f in C[−1, 1] for which the series above converges
uniformly to f is of the first category (relatively small) in C[−1, 1].
One should think of this phenomenon in the following way. The space
C[−1, 1] contains not only the nice familiar functions of elementary calculus, but
also the bizarre unmanageable ones that we do not see unless we go searching
for them. Most functions are of the latter type. See Example 1 on page 42 to
be convinced of this. To guarantee convergence of the series under discussion,
one must make further smoothness assumptions about f. For example, if f is
an analytic function of a complex variable in an ellipse that contains the line
segment [−1, 1], then the Legendre series for f will converge uniformly to f on
that segment. For results about these series, consult [San].
Interchanging the order of two integrals in a double integral can also involve
difficulties. The Fubini Theorem in Chapter 8 addresses this issue. Here we offer
an example of a double integral in a discrete measure space, where the integrals
become sums. This is adapted from [MT].
Example. Consider a function of two positive integers defined by the formula

    f(n, m) =  0          if m > n
              −1          if m = n
               2^{m−n}    if m < n
The two possible sums can be calculated in a straightforward way, and they turn
out to be different:

    ∑_m ∑_n f(n, m) = ∑_{m=1}^{∞} ∑_{n=m}^{∞} f(n, m) = ∑_{m=1}^{∞} [−1 + 1/2 + 1/4 + ···] = ∑_m 0 = 0

    ∑_n ∑_m f(n, m) = ∑_{n=1}^{∞} ∑_{m=1}^{n} f(n, m) = f(1, 1) + ∑_{n=2}^{∞} ∑_{m=1}^{n} f(n, m)

                    = −1 + ∑_{n=2}^{∞} [2^{1−n} + 2^{2−n} + ··· + 2^{−1} − 1]

                    = −1 + ∑_{n=2}^{∞} (−2^{1−n}) = −1 − 1 = −2

The difficulty here is not to be attributed to the fact that our domain N x N
stretches infinitely far to the right and upwards in the 2-dimensional plane. One
can make this example work on the unit square in the plane by the following
construction. Define intervals Iₙ = [1/(n + 1), 1/n]. On each rectangle Iₙ × Iₘ
define a function F whose integral over that rectangle is f(n, m), as defined
previously. We then find, from the above calculations, that

    ∫₀¹ ∫₀¹ F(x, y) dx dy = ∑_m ∑_n f(n, m) = 0

whereas

    ∫₀¹ ∫₀¹ F(x, y) dy dx = ∑_n ∑_m f(n, m) = −2

By referring to the Fubini Theorem, page 426, we see that our functions
f and F do not satisfy the essential hypothesis of that theorem: They are not
integrable over the Cartesian domain. The function |f|, for example, has an
infinite number of values +1, and so cannot yield a finite integral over ℕ × ℕ.
Had we wished to apply the Tonelli Theorem, the crucial missing hypothesis
would have been that f ≥ 0 or F ≥ 0.
Let us return to the functions defined by infinite series, for such functions are
truly ubiquitous in Mathematics. We can ask, "How does integration interact
with the summation process? Can integration be interchanged with summa-
tion?" The answer is that the conditions for this to be valid are less stringent
than those for differentiation. This is to be expected, for (in general) integration
is a smoothing process, whereas differentiation is the opposite: It emphasizes or
magnifies the variations in a function. The relevant theorem, again conveniently
accessible in [Wi], is as follows.

Theorem 5. If the functions gₙ are continuous on [a, b], and if the
series ∑ₙ gₙ converges uniformly, then

    ∫_a^b ∑_n gₙ(x) dx = ∑_n ∫_a^b gₙ(x) dx
This theorem is often used to obtain Taylor series for troublesome functions.
For example, if a Taylor series is needed for the Arctan function, we can start
with the relationship
    (d/dt) Arctan(t) = 1/(1 + t²) = ∑_{n=0}^{∞} (−t²)ⁿ

This is valid for t² < 1. Then for x² < 1,

    Arctan(x) = ∫₀ˣ ∑_{n=0}^{∞} (−t²)ⁿ dt = ∑_{n=0}^{∞} ∫₀ˣ (−1)ⁿ t^{2n} dt = ∑_{n=0}^{∞} (−1)ⁿ x^{2n+1}/(2n + 1)
The interchange of differentiation and integration is another common tech-


nique in analysis. Here there are various theorems that apply, for example, the
following one, given in [Wi].

Theorem 6. Let f(x) = ∫_a^b g(x, t) dt, where g and ∂g/∂x are
continuous on the rectangle [A, B] × [a, b] in the xt-plane. Then

    f′(x) = ∫_a^b (∂g/∂x)(x, t) dt

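The conclusion of Theorem 6 is easy to test numerically. The following Python sketch (the integrand g(x, t) = sin(xt), the quadrature rule, and the step sizes are all arbitrary choices made only for illustration) compares a difference quotient of f with the integral of ∂g/∂x:

    import math

    def integrate(h, a, b, n=10_000):
        # simple midpoint rule on [a, b]
        dt = (b - a) / n
        return sum(h(a + (k + 0.5) * dt) for k in range(n)) * dt

    g  = lambda x, t: math.sin(x * t)          # the integrand
    gx = lambda x, t: t * math.cos(x * t)      # its partial derivative in x

    x, eps = 1.3, 1e-5
    f = lambda x: integrate(lambda t: g(x, t), 0.0, 1.0)
    diff_quotient = (f(x + eps) - f(x - eps)) / (2 * eps)
    print(diff_quotient, integrate(lambda t: gx(x, t), 0.0, 1.0))   # nearly equal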
(Amusing anecdotes about differentiating under the integral sign occur in


Feynman's memoirs [Feyn], pages 86,87, and 110.) For more general situations,
involving an arbitrary measure J.L, we can still raise the question of whether

(3)
d
dx iTrg(x, t) dJ.L(t) = iTr ox
og
(x, t) dJ.L(t)

The setting is as follows. A measure space (T, A, μ) is prescribed. Thus T
is a set, A is a σ-algebra of subsets of T, and μ : A → [0, ∞] is a measure. An
open interval (a, b) is also prescribed. The function g is defined on (a, b) × T
and takes values in ℝ. Select a point x₀ in (a, b) where (∂g/∂x)(x₀, t) exists for
all t. What further assumptions are needed in order that Equation (3) shall be
true at a prescribed point x₀? Let us assume that:
(A) For each x in (a, b), the function t ↦ g(x, t) belongs to L¹(T, A, μ).
(B) There exists a function G ∈ L¹(T, A, μ) such that

    |[g(x, t) − g(x₀, t)]/(x − x₀)| ≤ G(t)    (t ∈ T, a < x < b, x ≠ x₀)
Theorem 7. Under the hypotheses given above, Equation (3) is
true for the point x = x₀.

Proof. By Hypothesis (A) we are allowed to define

    f(x) = ∫_T g(x, t) dμ(t)

The derivative f′(x₀) exists if and only if for each sequence [xₙ] converging to
x₀ we have

    f′(x₀) = lim_{n→∞} [f(xₙ) − f(x₀)]/(xₙ − x₀) = lim_{n→∞} ∫_T [g(xₙ, t) − g(x₀, t)]/(xₙ − x₀) dμ(t)

By Hypothesis (B), the integrands in the preceding equation are bounded in
magnitude by the single L¹-function G. The Lebesgue Dominated Convergence
Theorem (see Chapter 8, page 406) allows an interchange of limit and integral.
Hence

    f′(x₀) = ∫_T lim_{n→∞} [g(xₙ, t) − g(x₀, t)]/(xₙ − x₀) dμ(t) = ∫_T (∂g/∂x)(x₀, t) dμ(t)  •
This proof is given by Bartle [Bart1]. A related theorem can be found in
McShane's book [McS]. A useful corollary of Theorem 7 is as follows.
Theorem 8. Let (T, A, μ) be a measure space such that μ(T) < ∞.
Let g : (a, b) × T → ℝ. Assume that for each n, (∂ⁿg/∂xⁿ)(x, t) exists,
is measurable, and is bounded on (a, b) × T. Then

(4)    (dⁿ/dxⁿ) ∫_T g(x, t) dμ(t) = ∫_T (∂ⁿg/∂xⁿ)(x, t) dμ(t)        (n = 1, 2, ...)

Proof. Since μ(T) < ∞, any bounded measurable function on T is integrable.
To see that Hypothesis (B) of the preceding theorem is true, use the mean value
theorem:

    |[g(x, t) − g(x₀, t)]/(x − x₀)| = |(∂g/∂x)(ξ, t)| ≤ M

where M is a bound for |∂g/∂x| on (a, b) × T. By the preceding theorem,
Equation (4) is valid for n = 1. The same argument can be repeated to give an
inductive proof for all n. •
Chapter 8

Measure and Integration

8.1 Extended Reals, Outer Measures, Measurable Spaces 381


8.2 Measures and Measure Spaces 386
8.3 Lebesgue Measure 391
8.4 Measurable Functions 394
8.5 The Integral for Nonnegative Functions 399
8.6 The Integral, Continued 404
8.7 The Lᵖ-Spaces 408
8.8 The Radon-Nikodym Theorem 413
8.9 Signed Measures 417
8.10 Product Measures and Fubini's Theorem 420

8.1 Extended Reals, Outer Measures, Measurable Spaces

This chapter gives, in as brief a form as possible, the main features of measure
theory and integration. The presentation is sufficiently general to cover Lebesgue
measures and measures that arise in studying the continuous linear functionals
on Banach spaces.
Since measures are employed to assign a size to sets, they are often allowed
to assume infinite values. The extended real number system is designed to
assist in this situation. This is the set ℝ* = ℝ ∪ {∞} ∪ {−∞}. The two new
points, ∞ and −∞, that have been adjoined to ℝ are required to behave as
follows:
(1) (−∞, ∞) = ℝ
(2) x + ∞ = ∞ for x ∈ (−∞, ∞]
(3) x·∞ = ∞ for x ∈ (0, ∞]
(4) 0·∞ = 0
From these rules various others follow, such as x − ∞ = −∞ when x ∈ [−∞, ∞).
One advantage of ℝ* is that every subset of ℝ* has a supremum and an infimum
in ℝ*. For example, the equation sup A = ∞ means that for each x ∈ ℝ, A
contains an element a such that a > x. Note that certain expressions, such as
−∞ + ∞, must remain undefined.


Definition. Let X be an arbitrary set. An outer measure "on X" is a


function J.t : 2x --+ JR" such that:
(1) J.t(0) == 0 (0 is the empty set).
(2) If A c B, then J.t(A) ~ J.t(B).
00 00

(3) J.t[U Ai] ~ LJ.t(Ai ).


i=1 i=1

Of course, in these postulates, A, B, .. . are arbitrary subsets of X. Notice that


(1) and (2) together imply that J.t(A) ;;:: 0 for all A. Let us look at some examples.
Example 1. Let X be any set. If A ⊂ X, define μ(A) = 0. •
Example 2. Let X be any set. Define μ(∅) = 0, and for any nonempty set
A, put μ(A) = +∞. •

Example 3. Let X be any set. For a finite subset A, let μ(A) = #(A), the
number of elements in A. For all other sets, put μ(A) = ∞. This is called
counting measure. •
Example 4. Let X be any set and let x₀ be any point in X. Define μ(A) to
be 1 if x₀ ∈ A and to be 0 if x₀ ∉ A. •
Example 5. Let X be any infinite set. Let {x₁, x₂, ...} be a countable set
of (distinct) points in X. Let λₙ be positive numbers (n = 1, 2, ...). Define
μ(A) = ∑{λₙ : xₙ ∈ A}, and μ(∅) = 0. •
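Example 5 is simple enough to render directly in code. The following Python sketch (the particular points and weights are arbitrary choices) implements μ and checks monotonicity and subadditivity on a pair of sets:

    # Discrete outer measure of Example 5: mu(A) = sum of the weights of the chosen points lying in A
    points  = [1, 2, 3, 4, 5]                    # the x_n
    weights = [1.0, 0.5, 0.25, 0.125, 0.0625]    # the lambda_n

    def mu(A):
        return sum(w for x, w in zip(points, weights) if x in A)

    A, B = {1, 3}, {1, 2, 3, 9}
    print(mu(set()), mu(A), mu(B))            # 0, 1.25, 1.75
    print(mu(A) <= mu(B))                     # monotonicity (A is a subset of B): True
    print(mu(A | B) <= mu(A) + mu(B))         # subadditivity: True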
Example 6. Lebesgue Outer Measure. Let X be the real line. Define

    μ(A) = inf{ ∑_{i=1}^{∞} (bᵢ − aᵢ) : A ⊂ ⋃_{i=1}^{∞} (aᵢ, bᵢ) }  •
Example 7. Lebesgue Outer Measure on ℝⁿ. In ℝⁿ define the "unit
cube" to be the set Q of all points (ξ₁, ..., ξₙ) whose components lie in the
interval [0, 1]. If x ∈ ℝⁿ and λ ∈ ℝ, define x + λQ = {x + λv : v ∈ Q}. For
A ⊂ ℝⁿ, define

    μ(A) = inf{ ∑_{i=1}^{∞} λᵢⁿ : A ⊂ ⋃_{i=1}^{∞} (xᵢ + λᵢQ), λᵢ > 0 }  •
Example 8. Lebesgue-Stieltjes Outer Measure. Let X be the real line,
and select a monotone nondecreasing function γ : ℝ → ℝ. Define

    μ(A) = inf{ ∑_{i=1}^{∞} [γ(bᵢ) − γ(aᵢ)] : A ⊂ ⋃_{i=1}^{∞} (aᵢ, bᵢ) }

Notice that Lebesgue outer measure (in Example 6) is a special case, obtained
when γ(x) = x. •
In order to see that Examples 6, 7, and 8 are bona fide outer measures, one
can appeal to the following theorem.

Theorem 1. Let X be an arbitrary set, and C a collection of subsets
of X, countably many of which cover X. Let β be a function from C
to ℝ* such that

    inf{β(G) : G ∈ C} = 0

Then the equation

    μ(A) = inf{ ∑_{i=1}^{∞} β(Gᵢ) : A ⊂ ⋃_{i=1}^{∞} Gᵢ, Gᵢ ∈ C }

defines an outer measure on X.

Proof. Assume all the hypotheses. There are now three postulates for an outer
measure to be verified. Our assumption about β implies that β(G) ≥ 0 for all
G ∈ C. Therefore, μ(A) ≥ 0 for all A. Since ∅ ⊂ G for all G ∈ C, μ(∅) ≤ β(G)
for all G. Taking an infimum yields μ(∅) ≤ 0.
If A ⊂ B and B ⊂ ⋃_{i=1}^{∞} Gᵢ, then A ⊂ ⋃_{i=1}^{∞} Gᵢ and μ(A) ≤ ∑_{i=1}^{∞} β(Gᵢ).
Taking an infimum over all countable covers of B, we have μ(A) ≤ μ(B).
Let Aᵢ ⊂ X (i ∈ ℕ) and let ε > 0. By the definition of μ(Aᵢ) there exist
Gᵢⱼ ∈ C such that Aᵢ ⊂ ⋃_{j=1}^{∞} Gᵢⱼ and ∑_{j=1}^{∞} β(Gᵢⱼ) ≤ μ(Aᵢ) + ε/2ⁱ. Since
⋃_{i=1}^{∞} Aᵢ ⊂ ⋃_{i,j} Gᵢⱼ, we obtain

    μ(⋃_{i=1}^{∞} Aᵢ) ≤ ∑_{i=1}^{∞} ∑_{j=1}^{∞} β(Gᵢⱼ) ≤ ∑_{i=1}^{∞} μ(Aᵢ) + ε

Since ε was arbitrary, μ(⋃_{i=1}^{∞} Aᵢ) ≤ ∑_{i=1}^{∞} μ(Aᵢ). •
The postulates for an outer measure do not include all the desirable at-
tributes that are needed for integration. For example, an essential property is
additivity:
    A ∩ B = ∅   ⟹   μ(A ∪ B) = μ(A) + μ(B)
This property cannot be deduced from the axioms for an outer measure. See
Problem 6 for a simple example. Even Lebesgue outer measure is not additive,
although it seems to be a natural or intrinsic definition for "the measure of a
set" in lR. If we concentrate for the moment on this all-important example, we
ask, How can additivity be obtained? We could change the definition. But if the
measure of an interval is to be its length, changing the definition will not succeed.
The only solution is to reduce the domain of μ from 2^ℝ (or 2^X in general) to a
smaller class of sets. This is the brilliant idea of Lebesgue (1901) that leads to
Lebesgue measure on lR. The procedure for accomplishing this domain reduction
is dealt with in the theorem of Caratheodory, proved later. First, we describe in
the abstract the sort of domain that will be used for measures.

Definition. A measurable space is a pair (X, A) in which X is a set and A


is a nonempty family of subsets of X such that:
(i) If A E A, then X" A E A.

UAi EA.
00

(ii) If AI, A 2 , ... E A, then


i=l

In brief, A is closed under complementation and forming countable unions. A


family A having these two properties is said to be a a-algebra. If a measurable
space (X, A) is prescribed, we call the sets in A the measurable subsets of X.
Example 9. Let X be any set, and let A consist solely of X and ∅. Then
(X, A) is a measurable space. In fact, this is the smallest nonempty σ-algebra
of subsets of X. •

Example 10. Let X be an arbitrary set, and let A = 2^X (the family of all
subsets of X). Then (X, A) is a measurable space and A is the largest σ-algebra
of subsets of X. •
Example 11. Let X be any set and let A be a particular subset of X. Define
A = {X, ∅, A, X ∖ A}. This is the smallest σ-algebra containing A. We are
observing that, as long as A is nonempty, the set X and the empty set ∅ must
belong to A. In other words, these two sets will always be measurable. •
Example 12. Let X be any set, and let A consist of all countable subsets of
X and their complements. Then A is the smallest σ-algebra containing all finite
subsets of X. •

Lemma 1. If (X, A) is a measurable space, then X and ∅ belong


to A. Furthermore, A is closed under countable intersections and set
difference.

Proof. This is left to the reader as a problem.



Lemma 2. For any subset F of 2^X there is a smallest σ-algebra
containing F.

Proof. As Example 10 shows, there is certainly one σ-algebra containing F.
The smallest one will be the intersection ⋂_ν A_ν of all the σ-algebras A_ν containing
F. It is only necessary to verify that ⋂_ν A_ν is a σ-algebra. If Aᵢ ∈ ⋂_ν A_ν, then
Aᵢ ∈ A_ν for all ν. Since A_ν is a σ-algebra, ⋃_{i=1}^{∞} Aᵢ ∈ A_ν. Since this is true for
all ν, ⋃_{i=1}^{∞} Aᵢ ∈ ⋂_ν A_ν. A similar proof is needed for the other axiom. •
In any topological space X, the smallest σ-algebra containing the topology
(i.e., containing all the open sets) is called the σ-algebra of Borel sets, or the
"Borel σ-algebra."
Suggested references for this chapter are [Bart2], [Berb3], [Berb4], [DS],
[Frie2], [Hal4], [HS], [Jo], [KS], [Loo], [OD], [RN], [Roy], [Ru3], [Tay3], and
[Ti2].

Problems 8.1

1. Does the extended real number system ℝ* become a topological space if a neighborhood
of ∞ is defined to be any set that contains an interval of the form (a, ∞], and similarly
for −∞?
2. Why, in defining Lebesgue outer measure, do we not "approximate from within" and
define

    μ(A) = sup{ ∑_{i=1}^{∞} (bᵢ − aᵢ) : (aᵢ, bᵢ) ⊂ A, intervals mutually disjoint } ?

3. Prove that if μ is an outer measure and if μ(B) = 0, then μ(A ∪ B) = μ(A).
4. An outer measure on a group is said to be invariant if μ(x + A) = μ(A) for all x and
A. Prove that Lebesgue outer measure has this property.
5. Under what conditions is Lebesgue-Stieltjes outer measure invariant, as defined in Problem 4?
6. Let X = {1, 2}, and define μ(∅) = 0, μ(X) = 2, μ({1}) = 1, and μ({2}) = 2. Show that
μ is an outer measure but is not additive.
7. Prove that the Lebesgue outer measure of each countable subset of ℝ is 0.
8. How many outer measures having range in {0, 1, ..., n} are there on a set of n elements?
9. Prove that the Lebesgue outer measure of the interval [a, b] is b − a.
10. Let μ be an outer measure on X, and let Y ⊂ X. Define ν(A) = μ(A) when A ⊂ Y. Is
ν an outer measure on Y?
11. Does an outer measure necessarily obey this equation?

    μ(A ∪ B) + μ(A ∩ B) ≤ μ(A) + μ(B)

12. Let μ be an outer measure on X, and let Y ⊂ X. Define ν(A) = μ(Y ∩ A). Is ν an outer
measure on X?
13. Let μ and ν be outer measures on X. Define θ(A) = max{μ(A), ν(A)} for all A ⊂ X. Is
θ an outer measure on X?
14. Are the outer measures in Examples 3 and 5 additive?
15. What is the Lebesgue outer measure of the set of irrational numbers in [0, 1]?
16. Prove Lemma 1.
17. Prove that every countable set in ℝ is a Borel set.
18. Let (X, A) be a measurable space, and let A and B be two subsets of X. If A is measurable
and B ∖ A is not, what conclusions can be drawn about A ∪ B and A ∖ B?
19. Let (X, A) be a measurable space, and let Y ∈ A. Define B to be the family of all sets
Y ∩ A, where A ranges over A. Prove that (Y, B) is a measurable space.
20. Does there exist a countably infinite σ-algebra?
21. Prove that A in Example 12 is a σ-algebra.
22. Is there an example of a σ-algebra containing exactly 5 sets?

8.2 Measures and Measure Spaces

Let (X, A) be a measurable space, as defined in the preceding section. A function


p, : A -+ JR" is called a measure if:
(a) p,(0) = O.
(b) p,(A) ~ 0 for all A in A.
(c) P,(U~l Ai) = l:~l p,(Ai) if {AI, A2, ... } is a disjoint sequence in A.
Notice that the additivity property discussed in the preceding section is now
being assumed in a strong form. It is called countable additivity. On the
other hand, the domain of p, is not assumed to be 2x but is instead the (J-
algebra A.
Example 1. Let X be any set and let A = {X, ∅}. Define μ(∅) = 0 and let
μ(X) be any number in [0, ∞]. •
Example 2. Let X be any set and A = 2^X. Define μ(A) to be the number of
elements in A if A is finite, and define μ(A) = ∞ otherwise. •
Example 3. If A = 2^X, let μ(∅) = 0 and μ(A) = ∞ if A ≠ ∅. •
Example 4. Let A be a subset of X such that A ≠ ∅ and A ≠ X. Let
A = {∅, A, X ∖ A, X} and define μ(∅) = 0, μ(A) = 1, μ(X ∖ A) = 1, μ(X) = 2. •

Example 5. Let A = 2^X, and let {x₁, x₂, ...} be a countable subset of X.
Select numbers λᵢ ∈ [0, ∞], i ∈ ℕ, and define μ(A) = ∑{λᵢ : xᵢ ∈ A}, μ(∅) = 0. •

If (X, A) is a measurable space and μ is a measure defined on A, then
(X, A, μ) is called a measure space.

Lemma 1. If (X, A, μ) is a measure space, then:

(1) μ(A) ≤ μ(B) if A ∈ A, B ∈ A, and A ⊂ B.
(2) μ(⋃_{i=1}^{∞} Aᵢ) ≤ ∑_{i=1}^{∞} μ(Aᵢ) if Aᵢ ∈ A.

Proof. For (1), write

    μ(B) = μ[A ∪ (B ∖ A)] = μ(A) + μ(B ∖ A) ≥ μ(A)

For (2), we create a disjoint sequence of sets Bᵢ by writing B₁ = A₁, B₂ =
A₂ ∖ A₁, B₃ = A₃ ∖ (A₁ ∪ A₂), and so on. (Try to remember this little trick.)
These sets are in A by Lemma 1 in the preceding section. Also, Bᵢ ⊂ Aᵢ and
⋃_{i=1}^{∞} Bᵢ = ⋃_{i=1}^{∞} Aᵢ, so that

    μ(⋃_{i=1}^{∞} Aᵢ) = μ(⋃_{i=1}^{∞} Bᵢ) = ∑_{i=1}^{∞} μ(Bᵢ) ≤ ∑_{i=1}^{∞} μ(Aᵢ)  •



Lemma 2. Let (X, A, μ) be a measure space. If Aᵢ ∈ A for i ∈ ℕ
and A₁ ⊂ A₂ ⊂ ···, then μ(Aₙ) ↑ μ(⋃_{i=1}^{∞} Aᵢ).

Proof. Let A₀ = ∅ and observe that Aₙ = ⋃_{i=1}^{n} (Aᵢ ∖ Aᵢ₋₁). It follows
from the disjoint nature of the sets Aᵢ ∖ Aᵢ₋₁ that μ(Aₙ) = ∑_{i=1}^{n} μ(Aᵢ ∖ Aᵢ₋₁).
Hence

    μ(⋃_{i=1}^{∞} Aᵢ) = μ(⋃_{i=1}^{∞} (Aᵢ ∖ Aᵢ₋₁)) = ∑_{i=1}^{∞} μ(Aᵢ ∖ Aᵢ₋₁) = lim_{n→∞} μ(Aₙ)  •
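Lemma 2 can be illustrated with a toy computation (a Python sketch; counting measure on finite sets stands in for a general measure space):

    # Counting measure on finite sets: mu(A) = number of elements of A
    mu = len

    A = [set(range(n + 1)) for n in range(100)]   # an increasing sequence A_1, A_2, ...
    union = set().union(*A)

    print([mu(a) for a in A[:5]], mu(A[-1]))      # mu(A_n) = n: 1, 2, 3, 4, 5, ..., 100
    print(mu(union))                              # mu of the union equals the limit, 100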
Definition. A measure space (X, A, μ) is said to be complete if the conditions
A ⊂ B, B ∈ A, μ(B) = 0 imply that A ∈ A.
We now arrive at the point where we want to create a measure from an
outer measure μ. This is to be done by restricting the domain of μ from 2^X to
a suitable σ-algebra. A remarkable theorem of Caratheodory accomplishes this
in one stroke:

Theorem 1 (Caratheodory Theorem). Let μ be an outer
measure on a set X. Let A be the family of all subsets A of X having
the property

(1)    μ(S) = μ(S ∩ A) + μ(S ∖ A)    for all S ⊂ X

Then (X, A, μ) is a complete measure space.

Proof. (In the conclusion of the theorem, it is understood that μ is restricted
to A, although we refrain from writing μ|A.) It is to be shown that A is a
σ-algebra, that μ is a measure on A, and that the newly created measure space
is complete. There are six tasks to be undertaken.
I. If A ∈ A, then X ∖ A ∈ A, because for an arbitrary set S,

    μ(S) = μ(S ∩ A) + μ(S ∖ A) = μ[S ∖ (X ∖ A)] + μ[S ∩ (X ∖ A)]

II. We prove that A is closed under the formation of finite unions. It suffices
to consider the union of two sets A and B in A. We have, for any S,

    μ(S) = μ(S ∩ A) + μ(S ∖ A)
         = μ(S ∩ A) + μ[(S ∖ A) ∩ B] + μ[(S ∖ A) ∖ B]
         ≥ μ[(S ∩ A) ∪ ((S ∖ A) ∩ B)] + μ[S ∖ (A ∪ B)]
         = μ[S ∩ (A ∪ B)] + μ[S ∖ (A ∪ B)]
         ≥ μ[(S ∩ (A ∪ B)) ∪ (S ∖ (A ∪ B))] = μ(S)

III. Let Aᵢ ∈ A. Here we prove that ⋃_{i=1}^{∞} Aᵢ ∈ A. Define Bᵢ = A₁ ∪ ··· ∪ Aᵢ
and Cᵢ = Bᵢ ∖ Bᵢ₋₁, where B₀ = ∅. By Parts I and II, together with the
equation Cᵢ = X ∖ [Bᵢ₋₁ ∪ (X ∖ Bᵢ)], we see that Cᵢ and Bᵢ are in A. For any

S, μ(S ∩ Bₙ) ≥ ∑_{i=1}^{n} μ(S ∩ Cᵢ). This is proved by induction. For n = 1 it is trivial,
because B₁ = C₁. If it is true for n − 1, then

    μ(S ∩ Bₙ) = μ(S ∩ Bₙ ∩ Cₙ) + μ[(S ∩ Bₙ) ∖ Cₙ]
              = μ(S ∩ Cₙ) + μ(S ∩ Bₙ₋₁)
              ≥ μ(S ∩ Cₙ) + ∑_{i=1}^{n−1} μ(S ∩ Cᵢ) = ∑_{i=1}^{n} μ(S ∩ Cᵢ)

With this inequality available, we have, with A = ⋃_{i=1}^{∞} Aᵢ,

(2)    μ(S) = μ(S ∩ Bₙ) + μ(S ∖ Bₙ) ≥ ∑_{i=1}^{n} μ(S ∩ Cᵢ) + μ(S ∖ A)

Since this is true for each n, we can write an inequality to show that A ∈ A:

    μ(S) ≥ ∑_{i=1}^{∞} μ(S ∩ Cᵢ) + μ(S ∖ A) ≥ μ[⋃_{i=1}^{∞} (S ∩ Cᵢ)] + μ(S ∖ A)
         = μ[S ∩ ⋃_{i=1}^{∞} Cᵢ] + μ(S ∖ A) = μ(S ∩ A) + μ(S ∖ A) ≥ μ(S)

IV. The postulate μ(∅) = 0 is true for μ on A because it is a postulate for
outer measures. That μ(A) ≥ 0 follows from the postulates of an outer measure:

    0 = μ(∅) ≤ μ(A)    because ∅ ⊂ A.

V. For the countable additivity of μ, look at the proof in Part III. If the
sequence A₁, A₂, ... is disjoint, then Cᵢ = Aᵢ, and Equation (2) will read, with
A = ⋃_{i=1}^{∞} Aᵢ,

    μ(S) ≥ ∑_{i=1}^{n} μ(S ∩ Aᵢ) + μ(S ∖ A)

In this equation, let S = A and let n → ∞ to conclude that

    μ(A) ≥ ∑_{i=1}^{∞} μ(Aᵢ) ≥ μ(⋃_{i=1}^{∞} Aᵢ) = μ(A)

VI. That the measure space (X, A, μ) is complete follows from the more
general fact that A ∈ A if μ(A) = 0. Indeed, if μ(A) = 0, then for S ∈ 2^X,

    μ(S) = μ(A) + μ(S) ≥ μ(S ∩ A) + μ(S ∖ A) ≥ μ(S)  •


If μ is an outer measure on X, then the sets A that have the property
in Equation (1) of Caratheodory's Theorem are said to be μ-measurable.
Caratheodory's Theorem thus asserts that the family of μ-measurable sets (μ
being an outer measure) is a σ-algebra. If this σ-algebra is denoted by A, then

the concepts of μ-measurable and A-measurable are the same (by the definition
of A). However, there can be other σ-algebras present (for the same space X),
and there can be different kinds of measurability.
One example of this situation occurs when μ is Lebesgue outer measure,
as defined in Example 6 of the preceding section (page 382). The σ-algebra A
that arises from Caratheodory's Theorem is called the σ-algebra of Lebesgue
measurable sets. A smaller σ-algebra is the family B of all Borel sets. This
is the smallest σ-algebra containing the open sets. It turns out that B is a
proper subset of A. In some situations one uses the measure space (ℝ, B, μ) in
preference to (ℝ, A, μ). It is convenient to use μ without indicating notationally
whether its domain is 2^ℝ, or A, or B. Remember, however, that μ on 2^ℝ is only
an outer measure, and countable additivity can fail for sets not in A.
We have seen that every outer measure leads to a measure via Caratheodory's
Theorem. There is a converse theorem asserting, roughly, that every measure
can be obtained in this way.

Theorem 2. Let (X, A, μ) be a measure space. For S ∈ 2^X define

    μ*(S) = inf{μ(A) : S ⊂ A ∈ A}

Then μ* is an outer measure whose restriction to A is μ. Furthermore,
each set in A is μ*-measurable.

Proof. I. Since μ is nonnegative, so is μ*. Since ∅ ∈ A, we have 0 ≤ μ*(∅) ≤
μ(∅) = 0.
II. If S ⊂ T, then {A : S ⊂ A ∈ A} contains {A : T ⊂ A ∈ A}. Hence

    μ*(S) = inf{μ(A) : S ⊂ A ∈ A} ≤ inf{μ(A) : T ⊂ A ∈ A} = μ*(T)

III. If Sᵢ ∈ 2^X and ε > 0, select Aᵢ ∈ A so that Sᵢ ⊂ Aᵢ and μ*(Sᵢ) ≥
μ(Aᵢ) − ε/2ⁱ. Then ⋃_{i=1}^{∞} Sᵢ ⊂ ⋃_{i=1}^{∞} Aᵢ ∈ A. Consequently,

    μ*(⋃_{i=1}^{∞} Sᵢ) ≤ μ(⋃_{i=1}^{∞} Aᵢ) ≤ ∑_{i=1}^{∞} μ(Aᵢ) ≤ ∑_{i=1}^{∞} μ*(Sᵢ) + ε

Since ε can be any positive number, μ*(⋃_{i=1}^{∞} Sᵢ) ≤ ∑_{i=1}^{∞} μ*(Sᵢ).
IV. If S ∈ A and S ⊂ A ∈ A, then μ*(S) ≤ μ(S) ≤ μ(A). Taking an
infimum for all choices of A, we get μ*(S) ≤ μ(S) ≤ μ*(S). This proves that μ*
is an extension of μ.
V. To prove that each A in A is μ*-measurable, let S be any subset of X.
Given ε > 0, we find B ∈ A such that μ*(S) ≥ μ(B) − ε and S ⊂ B. Then

    μ*(S) + ε ≥ μ(B) = μ(B ∩ A) + μ(B ∖ A) = μ*(B ∩ A) + μ*(B ∖ A)
              ≥ μ*(S ∩ A) + μ*(S ∖ A) ≥ μ*(S)

This calculation used Parts III and IV of the present proof. Since ε was arbitrary,
μ*(S) = μ*(S ∩ A) + μ*(S ∖ A) for all S. Hence A is μ*-measurable. •
The construction of μ* in the preceding theorem yields some additional
information. First, we define the concept of regularity for an outer measure. An
outer measure μ on 2^X is said to be regular if for each S ∈ 2^X there corresponds
a μ-measurable set A such that S ⊂ A and μ(S) = μ(A).

Theorem 3. Under the same hypotheses as in Theorem 2, the
outer measure μ* is regular.

Proof. Let S be any subset of X. For each n ∈ ℕ select Aₙ ∈ A so that
S ⊂ Aₙ and μ*(S) ≥ μ(Aₙ) − 1/n. Put A = ⋂_{n=1}^{∞} Aₙ. Since A is a σ-algebra,
A ∈ A. (See Lemma 1 in the preceding section, page 384.) From the inclusion
S ⊂ A ⊂ Aₙ we get

    μ*(S) ≤ μ*(A) = μ(A) ≤ μ(Aₙ) ≤ μ*(S) + 1/n

Since this is true for all n, μ*(S) = μ*(A). By the preceding theorem, A is
μ*-measurable. •

Problems 8.2

1. Let μ be Lebesgue outer measure. Let ℚ be the set of all rational numbers. Prove that
if ℚ ∩ [0, 1] is contained in ⋃_{i=1}^{n} (aᵢ, bᵢ), then ∑_{i=1}^{n} (bᵢ − aᵢ) ≥ 1. Show that this is not
true if we permit a countable number of intervals (aᵢ, bᵢ).
2. Is there an example of a set X and a measure μ on 2^X such that μ(X) = 1 and μ({x}) = 0
for all points x in X?
3. In ℝ, is the smallest σ-algebra containing all singletons {x} the same as the σ-algebra of
all Borel sets?
4. Let (X, A, μ) be a measure space, and let μ* be the outer measure defined in Theorem 2
(page 389). Define the inner measure μ∗ induced by μ via the equation

    μ∗(S) = sup{μ(A) : S ⊃ A ∈ A}

Prove these properties of μ∗: (i) μ∗(S) ≤ μ*(S); (ii) μ∗(S) ≥ 0; (iii) μ∗(S) ≤ μ∗(T)
when S ⊂ T; (iv) μ∗(∅) = 0; (v) μ∗(A) = μ(A) if A ∈ A.
5. Prove that an outer measure μ on 2^X is a measure on 2^X if and only if every set in 2^X
is μ-measurable.
6. Let (X, A, μ) be a measure space, and let Aᵢ ∈ A. Prove that μ(⋃_{i=1}^{∞} Aᵢ) =
lim_{n→∞} μ(⋃_{i=1}^{n} Aᵢ).
7. The symmetric difference of two sets A and B is A △ B = (A ∖ B) ∪ (B ∖ A). Prove
that for measurable sets A and B in a measure space, the condition μ(A △ B) = 0 implies
that μ(A) = μ(B).
8. Let (X, A, μ) be an incomplete measure space. Show how to enlarge A and extend μ so
that a complete measure space is obtained.
9. Let (X, A, μ) be a measure space. Let B be the family of all sets B ∈ 2^X such that
A ∩ B ∈ A whenever A ∈ A and μ(A) < ∞. Show that B is a σ-algebra containing A.
10. Prove that if (X, A, μ) and (X, A, ν) are measure spaces, then so is (X, A, μ + ν). Generalize.
11. If (X, A, μ) and (X, A, ν) are measure spaces such that μ ≥ ν, is there a measure θ (on
A) such that ν + θ = μ? (Caution: ∞ − ∞ is not defined in ℝ*.)
12. Let (X, A, μ) be a measure space. Suppose that Aₙ ∈ A and Aₙ₊₁ ⊂ Aₙ for all n. Does
it follow that μ(⋂_{n=1}^{∞} Aₙ) = lim_{n→∞} μ(Aₙ)?

13. Let X be an uncountable set and A = 2^X. Define μ(A) = 0 if A is countable and
μ(A) = ∞ otherwise. Is (X, A, μ) a measure space?

14. Prove that for any outer measure μ and any set A such that μ(A) = 0, A is μ-measurable.

15. Let (X, A, μ) be a measure space, and let B ∈ A. Define ν on A by writing ν(A) =
μ(A ∩ B). Prove that (X, A, ν) is a measure space.

16. Let (X, A, μ) be a measure space. Let Aₙ ∈ A and ∑_{n=1}^{∞} μ(Aₙ) < ∞. Prove that the
set of x belonging to infinitely many Aₙ has measure 0.
17. Let (X, A, μ) be a measure space for which μ(X) < ∞. Let Aₙ be a sequence of measurable
sets such that A₁ ⊂ A₂ ⊂ ··· and X = ⋃ Aₙ. Show that μ(X ∖ Aₙ) ↓ 0.

8.3 Lebesgue Measure

In this section μ will denote both Lebesgue outer measure and Lebesgue measure.
Both are defined for subsets of ℝ by the equation

(1)    μ(S) = inf{ ∑_{i=1}^{∞} (bᵢ − aᵢ) : S ⊂ ⋃_{i=1}^{∞} (aᵢ, bᵢ) }

The outer measure is defined for all subsets of ℝ, while the measure μ is the
restriction of the outer measure to the σ-algebra described in Caratheodory's
Theorem (page 387). The sets in this σ-algebra are called the Lebesgue measurable
subsets of ℝ. It is a very large class of sets, bigger than the σ-algebra of
Borel sets. The latter has cardinality c, while the former has cardinality 2^c.

Theorem 1. The Lebesgue outer measure of an interval is its


length.

Proof. Consider first a compact interval [a, b]. Since [a, b] ⊂ (a − ε, b + ε), we
conclude from the definition (1) that μ([a, b]) ≤ b − a + 2ε for every positive ε.
Hence μ([a, b]) ≤ b − a. Suppose now that μ([a, b]) < b − a. Find intervals (aᵢ, bᵢ)
such that [a, b] ⊂ ⋃_{i=1}^{∞} (aᵢ, bᵢ) and ∑_{i=1}^{∞} |bᵢ − aᵢ| < b − a. We can assume aᵢ < bᵢ
for all i. By compactness and renumbering we can get [a, b] ⊂ ⋃_{i=1}^{n} (aᵢ, bᵢ).
It follows that ∑_{i=1}^{n} (bᵢ − aᵢ) < b − a. By renumbering again we can assume
a ∈ (a₁, b₁), b₁ ∈ (a₂, b₂), b₂ ∈ (a₃, b₃), and so on. There must exist an index
k ≤ n such that b < b_k. Then we reach a contradiction:

    b − a > ∑_{i=1}^{n} (bᵢ − aᵢ) ≥ ∑_{i=1}^{k} (bᵢ − aᵢ) = b_k − a₁ + ∑_{i=1}^{k−1} (bᵢ − aᵢ₊₁) > b_k − a₁ > b − a

If J is a bounded interval of the type (a, b), (a, b], or [a, b), then from the
inclusions
    [a + ε, b − ε] ⊂ J ⊂ [a − ε, b + ε]
we obtain b − a − 2ε ≤ μ(J) ≤ b − a + 2ε and μ(J) = b − a.
Finally, if J is an unbounded interval, then it contains intervals [a, b] of
arbitrarily great length. Hence μ(J) = ∞. •

Theorem 2. Every Borel set in ℝ is Lebesgue measurable.

Proof. (S. J. Bernau) The family of Borel sets is the smallest σ-algebra containing
all the open sets. The Lebesgue measurable sets form a σ-algebra. Hence
it suffices to prove that every open set is Lebesgue measurable.
Recall that every open set in ℝ can be expressed as a countable union of
open intervals (a, b). Thus it suffices to prove that each interval (a, b) is Lebesgue
measurable. We begin with an interval of the form (a, ∞), where a ∈ ℝ.
To prove that the open interval (a, ∞) is measurable, we must prove, for
any set S in ℝ, that

(2)    μ[S ∩ (a, ∞)] + μ[S ∖ (a, ∞)] ≤ μ(S)

Let us use the notation |I| for the length of an interval I. Given ε > 0, select
open intervals Iₙ such that S ⊂ ⋃_{n=1}^{∞} Iₙ and ∑ |Iₙ| < μ(S) + ε. Define Jₙ =
Iₙ ∩ (a, ∞), Kₙ = Iₙ ∩ (−∞, a), and K₀ = (a − ε, a + ε). Then we have

    S ∩ (a, ∞) ⊂ ⋃_{n=1}^{∞} Jₙ

    S ∖ (a, ∞) = S ∩ (−∞, a] ⊂ ⋃_{n=0}^{∞} Kₙ

Consequently,

    μ[S ∩ (a, ∞)] + μ[S ∖ (a, ∞)] ≤ ∑_{n=1}^{∞} {|Jₙ| + |Kₙ|} + |K₀|

                                   ≤ ∑_{n=1}^{∞} |Iₙ| + 2ε < μ(S) + 3ε

Because ε was arbitrary, this establishes Equation (2). Since the measurable sets
make up a σ-algebra, each set of the form (−∞, b] = ℝ ∖ (b, ∞) is measurable.
Hence the set (−∞, b) = ⋃_{n=1}^{∞} (−∞, b − 1/n] is measurable, and so is (a, b) =
(−∞, b) ∩ (a, ∞). •

Theorem 3. Lebesgue outer measure is invariant on the group
(ℝ, +).

Proof. The statement means that μ(S) = μ(v + S) for all S ∈ 2^ℝ and all v ∈ ℝ.
The translate v + S is defined to be {v + x : x ∈ S}. Notice that the condition
S ⊂ ⋃_{i=1}^{∞} (aᵢ, bᵢ) is equivalent to the condition x + S ⊂ ⋃_{i=1}^{∞} (x + aᵢ, x + bᵢ). Since
the length of (x + aᵢ, x + bᵢ) is the same as the length of (aᵢ, bᵢ), the definition
of μ gives equal values for μ(S) and μ(x + S). •

Lemma. Let {r₁, r₂, ...} be an enumeration of the rational numbers
in [−1, 1]. There exists a set P contained in [0, 1] such that the
sequence of sets rᵢ + P is disjoint and covers [0, 1].

Proof. Let Q be the set of all rational numbers. Consider the family F of
all sets of the form x + Q, where 0 ≤ x ≤ 1. Although our description of F
involves many sets being listed more than once, the family F is, in fact, disjoint.
To verify this, suppose that x + Q and y + Q have a point t in common. Then
t = x + q₁ = y + q₂, for appropriate qᵢ ∈ Q. Consequently, x − y = q₂ − q₁ ∈ Q,
from which it follows easily that x + Q = y + Q.
The family of sets (x + Q) ∩ [0, 1], where 0 ≤ x ≤ 1, is also disjoint, and
each of these sets is nonempty. By Zermelo's Postulate ([Kel], page 33), there
exists a set P ⊂ [0, 1] such that P contains one and only one point from each
set in the family.
Now we want to prove that the family {rᵢ + P} is disjoint. Suppose that
t ∈ (rᵢ + P) ∩ (rⱼ + P). Then rᵢ + p = rⱼ + p' for suitable p and p' in P. We have
p = p' + (rⱼ − rᵢ), whence p ∈ P ∩ (p' + Q). Since 0 ∈ Q, we have p' ∈ P ∩ (p' + Q).
By the properties of P, p = p'. From an equation above, rᵢ = rⱼ and then i = j.
Finally, we show that [0, 1] ⊂ ⋃_{i=1}^{∞} (rᵢ + P). If 0 ≤ x ≤ 1, then P contains
an element p in x + Q. By the definition of P, 0 ≤ p ≤ 1. We write p = x + r
for a suitable r in Q. Then r = p − x ∈ [−1, 1] and −r ∈ [−1, 1]. Thus −r = rᵢ
for some i. It follows that x = p − r = p + rᵢ ∈ rᵢ + P. •

Theorem 4. There exists no translation-invariant measure ν
defined on 2^ℝ such that 0 < ν([0, 1]) < ∞. Consequently, there exist
subsets of ℝ that are not Lebesgue measurable.

Proof. The second assertion follows from the first because if every set of reals
were Lebesgue measurable, then Lebesgue measure would contradict the first
assertion.
To prove the first assertion, suppose that a measure v exists as described.
By the preceding lemma, the set P given there has the property

[0, 1] ⊂ ∪_{i=1}^∞ (r_i + P) ⊂ [−1, 2]

Also by the lemma, the sequence of sets r_i + P is disjoint. Consequently,

0 < ν([0, 1]) ≤ ν(∪_{i=1}^∞ (r_i + P)) = Σ_{i=1}^∞ ν(r_i + P) = Σ_{i=1}^∞ ν(P)

Therefore, ν(P) > 0, Σ ν(P) = ∞, and we have the contradiction

∞ = Σ_{i=1}^∞ ν(r_i + P) = ν(∪_{i=1}^∞ (r_i + P)) ≤ ν([−1, 2]) ≤ 3 ν([0, 1]) < ∞ •

Problems 8.3

1. Zermelo's Postulate states that if F is a disjoint family of nonempty sets, then there is
a set that contains exactly one element from each set in the family F. Prove this, using
the Axiom of Choice.
2. Prove that the set P in the lemma is not Lebesgue measurable.
3. Prove that Lebesgue measure restricted to the Borel sets is not complete.
4. An F_σ-set is any countable union of closed sets, and a G_δ-set is any countable intersection
of open sets. Prove that both types of sets are Borel sets.
5. Prove that the Cantor "middle-third" set is an uncountable Borel set of Lebesgue mea-
sure 0. This set is defined in Problem 26 of Section 1.7, page 46.
6. Prove that the infimum in the definition of Lebesgue outer measure is attained if the set
is bounded and open.
7. Prove that the set Q of all rational numbers is a Borel set of measure O.
8. Prove that for any Lebesgue measurable set A of finite measure and for any ε > 0 there
are an open set G and a closed set F such that F ⊂ A ⊂ G and μ(G) ≤ μ(F) + ε.
9. Let S be a subset of ℝ such that for each ε > 0 there is a closed set F contained in S for
which μ(S ∖ F) < ε. Prove that S is Lebesgue measurable.
10. Prove that a set of Lebesgue measure 0 cannot contain a nonmeasurable set, but every
set of positive measure does contain a nonmeasurable set.
11. Under what set operations is 2^ℝ ∖ B closed? Here B is the σ-algebra of Borel sets.
12. Prove that if S ⊂ ℝ and for every ε > 0 there is an open set G containing S and satisfying
μ(G ∖ S) < ε, then S is Lebesgue measurable.
13. In Theorem 4, is the result valid when the domain of ν is a subset of 2^ℝ?

8.4 Measurable Functions

In the study of topological spaces, continuous functions play an important


role. Analogously, in the study of measurable spaces, measurable functions are
important. In fact, they are perhaps the principal reason for creating measurable
spaces. Our considerations here are general, i.e., not restricted to Lebesgue
measure.
Consider an arbitrary measurable space (X, A), as defined in Section 8.1
(page 384). A function f : X → ℝ* is said to be A-measurable (or simply
measurable) if f^{-1}(B) ∈ A whenever B is a Borel subset of ℝ*.
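For example, every continuous function f : ℝ → ℝ is Borel measurable: for each open set O, f^{-1}(O) is open and hence a Borel set, and by Theorem 1 below (condition f) this is enough to guarantee measurability.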

Theorem 1. Let (X, A) be a measurable space. A function f
from X to the extended reals ℝ* is measurable if it has any one of the
following properties:
a. f^{-1}((a, ∞]) ∈ A for each a ∈ ℝ*
b. f^{-1}([a, ∞]) ∈ A for each a ∈ ℝ*
c. f^{-1}([−∞, a)) ∈ A for each a ∈ ℝ*
d. f^{-1}([−∞, a]) ∈ A for each a ∈ ℝ*
e. f^{-1}((a, b)) ∈ A for all a and b in ℝ*
f. f^{-1}(O) ∈ A for each open set O in ℝ*

Proof. We shall prove that each condition implies the one following it, and
that f implies that f is measurable. That a implies b follows from the equation
f^{-1}([a, ∞]) = ∩_{n=1}^∞ f^{-1}((a − 1/n, ∞]) and from the properties of a σ-algebra.
That b implies c follows in the same manner from the equation f^{-1}([−∞, a)) =
X ∖ f^{-1}([a, ∞]). That c implies d follows from the equation f^{-1}([−∞, a]) =
∩_{n=1}^∞ f^{-1}([−∞, a + 1/n)). That d implies e follows from writing f^{-1}((a, b)) =
∪_{n=1}^∞ f^{-1}([−∞, b − 1/n)) ∖ f^{-1}([−∞, a]). That e implies f is a consequence of
the theorem that each open set in ℝ* is a countable union of intervals of the
form (a, b), where a and b are in ℝ*. To complete the proof, assume condition
f. Let S be the family of all sets E contained in ℝ* such that f^{-1}(E) ∈ A. It is
straightforward to verify that S is a σ-algebra. By hypothesis, each open set in
ℝ* belongs to S. Hence S contains the σ-algebra of Borel sets. Consequently,
f^{-1}(B) ∈ A for each Borel set B, and f is measurable. •

Our next goal is to study, for a given measurable space (X, A), the class of
measurable functions. First, we define the characteristic function of a set A
to be the function χ_A given by

χ_A(x) = 1 if x ∈ A,    χ_A(x) = 0 if x ∉ A

Theorem 2. Let (X, A) be a measurable space. The family of
all measurable functions contains the characteristic function of each
measurable set and is closed under these operations:
a. f + g (provided that there is no point x where f(x) and g(x)
are infinite and of opposite signs).
b. λf (λ ∈ ℝ)
c. fg
d. sup f_i (i ∈ ℕ)
e. inf f_i (i ∈ ℕ)
f. lim inf f_i (i ∈ ℕ)
g. lim sup f_i (i ∈ ℕ)

Proof. If A ∈ A, then the characteristic function χ_A is measurable because
χ_A^{-1}((a, ∞]) = {x : χ_A(x) > a}, and this last set is either X, A, or ∅. These
three sets are measurable, and Theorem 1 applies.
Now suppose that f and g are measurable functions. Let r_1, r_2, ... be an
enumeration of all the rational numbers. Then

(f + g)^{-1}((a, ∞]) = {x : f(x) + g(x) > a}
 = ∪_{i=1}^∞ {x : f(x) > r_i and g(x) > a − r_i}
 = ∪_{i=1}^∞ [f^{-1}((r_i, ∞]) ∩ g^{-1}((a − r_i, ∞])]

To verify this, notice that f(x) + g(x) > a if and only if a − g(x) < f(x), and
this last inequality is true if and only if a − g(x) < r_i < f(x) for some i. The last
term in the displayed equation is a countable union of measurable sets, because
f and g are measurable.
If f is measurable, then so is λf, because (λf)^{-1}((a, ∞]) is either ∅ (when
λ = 0 and a ≥ 0), or X (when λ = 0 and a < 0), or f^{-1}((a/λ, ∞]) (when
λ > 0), or f^{-1}([−∞, a/λ)) (when λ < 0).
If f is measurable, then so is f², because (f²)^{-1}((a, ∞]) is X when a < 0,
and it is f^{-1}((√a, ∞]) ∪ f^{-1}([−∞, −√a)) when a ≥ 0. From the identity
fg = ¼(f + g)² − ¼(f − g)² it follows that fg is measurable if f and g are
measurable.
If f_i are measurable and if g(x) = sup_i f_i(x), then g is measurable because
g^{-1}((a, ∞]) = ∪_{i=1}^∞ f_i^{-1}((a, ∞]). A similar argument applies to infima, if we
use an interval [−∞, a).
If f_i are measurable and g(x) = lim sup f_i(x), then g is measurable, because
g(x) = lim_{n→∞} sup_{i≥n} f_i(x) = inf_n sup_{i≥n} f_i(x). A similar argument applies to
the limit infimum. •
Consider now a measure space (X, A, μ). Let f and g be functions on X
taking values in ℝ*. If the set {x : f(x) ≠ g(x)} belongs to A and has measure
0, then we say that f(x) = g(x) almost everywhere. This is an equivalence
relation if the measure space is complete (Problem 1). More generally, if P(x) is
a proposition for each x in X, then we say that P is true almost everywhere if
the set {x : P(x) is false} is a measurable set of measure 0. The abbreviation a.e.
is used for "almost everywhere." The French use p.p. for "presque partout."

Theorem 3. Let (X, A, μ) be a complete measure space, as
defined in Section 8.2, page 387. If f is a measurable function and if
f(x) = g(x) almost everywhere, then g is measurable.

Proof. Define A = {x : f(x) ≠ g(x)}. Then A is measurable and μ(A) = 0.
Also, X ∖ A is measurable. For a ∈ ℝ* we write

g^{-1}((a, ∞]) = [g^{-1}((a, ∞]) ∖ A] ∪ [g^{-1}((a, ∞]) ∩ A]

On the right side of this equation we see the union of two sets. The first of
these is measurable because it is f^{-1}((a, ∞]) ∖ A. The second set is measurable
because it is a subset of a set of measure 0, and the measure space is complete. •

Let (X, A, μ) be a measure space, and let f, f_1, f_2, ... be measurable func-
tions. We say that f_n → f almost uniformly if to each positive ε there
corresponds a measurable set of measure at most ε on the complement of which
f_n → f uniformly.
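For example, on X = [0, 1] with Lebesgue measure, the functions f_n(x) = x^n converge to 0 almost uniformly: given ε > 0, the convergence is uniform on [0, 1 − ε], and the omitted set (1 − ε, 1] has measure ε. The convergence is not uniform on [0, 1), since sup_{0 ≤ x < 1} x^n = 1 for every n.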

Theorem 4 (Egorov's Theorem). Let (X, A, μ) be a measure
space such that μ(X) < ∞. For a sequence of finite-valued measurable
functions f, f_1, f_2, ... these properties are equivalent:
a. f_n → f almost everywhere
b. f_n → f almost uniformly

Proof. Assume that b is true. For each m in ℕ there is a measurable set
A_m such that μ(A_m) < 1/m, and on X ∖ A_m, f_n(x) → f(x) uniformly. Define
A = ∩_{m=1}^∞ A_m. Then μ(A) = 0 because A ⊂ A_m for all m. Also, f_n(x) → f(x)
on X ∖ A because f_n(x) → f(x) for x ∈ X ∖ A_m and X ∖ A = X ∖ ∩ A_m =
∪(X ∖ A_m). Thus a is true.
Now assume that a is true. Let g_n = f − f_n. By altering g_n on a set
of measure 0, we can assume that g_n(x) → 0 everywhere. Next, we define
A_n^m = {x : |g_i(x)| ≤ 1/m for i ≥ n}. Thus A_1^m ⊂ A_2^m ⊂ ⋯. For each x there is
an index n such that x ∈ A_n^m; in other words, x ∈ ∪_{n=1}^∞ A_n^m and X ⊂ ∪_{n=1}^∞ A_n^m.
Since X has finite measure, μ(X ∖ A_n^m) → 0 as n → ∞. (See Lemma 2 in
Section 8.2, page 387.) Let ε > 0. For each m, let n_m be an integer such that
μ(X ∖ A_{n_m}^m) < ε/2^m. Define A = ∩_{m=1}^∞ A_{n_m}^m. Then

μ(X ∖ A) = μ(∪_{m=1}^∞ (X ∖ A_{n_m}^m)) ≤ Σ_{m=1}^∞ μ(X ∖ A_{n_m}^m) < Σ_{m=1}^∞ ε/2^m = ε

On A, g_i(x) → 0 uniformly. Indeed, for x ∈ A we have (for all m)

|g_i(x)| ≤ 1/m whenever i ≥ n_m •

Let (X, A) be a measurable space. A simple function is a measurable
function f : X → ℝ* whose range is a finite subset of ℝ*. Then f can be written
in the form f = Σ_{i=1}^n λ_i χ_{A_i}, where the λ_i can be taken to be distinct elements
of ℝ*, and A_i can be the set {x : f(x) = λ_i}. It then turns out that each
A_i is measurable, that these sets are mutually disjoint, and their union is X.
(Problem 2)

Theorem 5. Let (X, A) be a measurable space, and f any nonneg-
ative measurable function. Then there exists a sequence of nonnegative
simple functions g_n such that g_n(x) ↑ f(x) for each x. If f is bounded,
this sequence can be constructed so that g_n ↑ f uniformly.

Proof. ([HewS], page 159.) Define

A_i^n = {x ∈ X : i/2^n ≤ f(x) < (i + 1)/2^n}
B_n = {x ∈ X : f(x) ≥ n}
g_n = Σ_i (i/2^n) χ_{A_i^n} + n χ_{B_n}

The sets A_i^n and B_n are measurable, by Theorem 1. Hence g_n is a simple
function. The definition of g_n shows directly that g_n ≤ f. In order to verify
that g_n(x) converges to f(x) for each x, consider first the case when f(x) ≠ ∞.
For large n and a suitable i, x ∈ A_i^n. Then f(x) − g_n(x) < (i + 1)/2^n − i/2^n = 1/2^n.
On the other hand, if f(x) = +∞, then g_n(x) = n → f(x).
For the monotonicity of g_n(x) as a function of n for x fixed, first ver-
ify (Problem 3) that (for i < n2^n) A_i^n = A_{2i}^{n+1} ∪ A_{2i+1}^{n+1}. If x ∈ A_{2i}^{n+1},
then g_{n+1}(x) = 2i/2^{n+1} = i/2^n = g_n(x). If x ∈ A_{2i+1}^{n+1}, then g_{n+1}(x) =
(2i + 1)/2^{n+1} ≥ 2i/2^{n+1} = g_n(x). If x ∈ B_n, then f(x) ≥ n, and therefore
x ∈ ∪_{i ≥ n2^{n+1}} A_i^{n+1} ∪ B_{n+1}. It follows that g_{n+1}(x) ≥ n = g_n(x).
Finally, if f is bounded by m, then for n ≥ m we have 0 ≤ f(x) − g_n(x) ≤
2^{-n}. In this case the convergence is uniform. •
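For a concrete illustration of this construction, take X = [0, 2] and f(x) = x. Then g_1 has the value 0 on [0, 1/2), the value 1/2 on [1/2, 1), and the value 1 on B_1 = [1, 2]; and g_2 has the value i/4 on [i/4, (i + 1)/4) for i = 0, 1, ..., 7, and the value 2 on B_2 = {2}. One checks directly that g_1 ≤ g_2 ≤ f and that f − g_n < 2^{-n} off the set B_n.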

Problems 8.4

1. Prove that the relation of two functions being equal almost everywhere is an equivalence
relation if the underlying measure space is complete.
2. Prove the assertions about the sets Ai that were mentioned in the definition of a simple
function.
3. In the proof of Theorem 5, verify that A_i^{(n)} = A_{2i}^{(n+1)} ∪ A_{2i+1}^{(n+1)}.
4. Let (X, A) be a measurable space, and let r_1, r_2, ... be an enumeration of the rational
numbers. Prove that a function f : X → ℝ* is measurable if and only if all the sets
f^{-1}((r_i, ∞]) are measurable.
5. Prove that every Borel set in ℝ* is one of the four types B, B ∪ {+∞}, B ∪ {−∞},
B ∪ {+∞, −∞}, where B is a Borel set in ℝ.
6. Prove that in order for f to be measurable it is necessary and sufficient that f^{-1}(O) be
measurable for all open sets O in ℝ and that f^{-1}({+∞}) and f^{-1}({−∞}) be measurable.
7. Prove that if f and g are measurable functions, then the sets {x : f(x) = g(x)}, {x :
f(x) ≥ g(x)}, and {x : f(x) > g(x)} are measurable.
8. Prove that if f and g are measurable functions and if (f + g)(x) is assigned some constant
value on the set where f(x) and g(x) are infinite and of opposite sign, then f + g is
measurable.
9. Let (IR, A) be the measurable space in which A is the family of all Lebesgue measurable
sets. Give an example of a nonmeasurable function in this setting.
10. Let B be the σ-algebra of Borel sets in ℝ. Let A be the σ-algebra of Lebesgue measurable
sets. Prove that B is a proper subset of A.
11. Let (X, A, μ) be a measure space for which μ(X) < ∞. Prove that if f is a measurable
function that is finite-valued almost everywhere, then for each ε > 0 there is an M such
that μ({x : |f(x)| > M}) < ε.

12. Let (X, A) be a measurable space and f a measurable function. What can you say about
the following set?

{S : S ⊂ ℝ and f^{-1}(S) ∈ A}

13. Prove that the composition f ∘ g of two Borel measurable functions on ℝ is Borel mea-
surable.
14. Let (X, A, μ) be a measure space and f a measurable function. For each Borel set B in
ℝ* define ν(B) = μ(f^{-1}(B)). Show that ν is a Borel measure, i.e., a measure on B, the
σ-algebra of Borel sets.
15. If |f| is measurable, does it follow that f is measurable?
16. Show that the composition of two Lebesgue measurable functions need not be Lebesgue
measurable.
17. Prove that if f is a real-valued Lebesgue measurable function, then there is a Borel
measurable function equal to f almost everywhere.
18. Let X = ℕ, A = 2^X, and let μ be counting measure, as defined in Example 3 in
Section 8.1, page 382. Let f_n be the characteristic function of the set {1, 2, ..., n}.
Prove that the sequence [f_n] has property (a) but not property (b) in Egorov's Theorem.
Resolve the apparent contradiction.
19. Refer to Problem 7 in Section 8.2, page 390, for the definition of the symmetric difference
of two sets. Prove that |χ_A − χ_B| = χ_{A △ B}.
20. Prove that a monotone function f : ℝ → ℝ is Borel measurable.
21. Prove that the set of points where a sequence of measurable functions converges is a
measurable set.

8.5 The Integral for Nonnegative FUnctions

With any measure space (X, A, μ) there is associated (in a certain standard
way) an integral. It will be a linear functional on the space of all measurable
functions from X into ℝ*. The motivation for an appropriate definition arises
from our wish that the integral of the characteristic function of a measurable set
should be the measure of the set:

(1)    ∫ χ_A = μ(A)    (A ∈ A)

The requirement that the integral act linearly leads to the definition of the
integral of a simple function f = Σ_{i=1}^n α_i χ_{A_i}:

(2)    ∫ f = Σ_{i=1}^n α_i μ(A_i)

In Equation (2) we assume that the sets A_1, ..., A_n are mutually disjoint and
that the α_i are distinct. Such a representation f = Σ α_i χ_{A_i} is called canonical.
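For example, on the real line with Lebesgue measure, the simple function f = 2 χ_{[0,1]} + 5 χ_{[3,4]} has canonical sets A_1 = [0, 1] and A_2 = [3, 4] (together with the value 0 off these sets, contributing 0·∞ = 0 by the usual convention), and Equation (2) gives ∫ f = 2 μ([0, 1]) + 5 μ([3, 4]) = 2 + 5 = 7.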

Lemma 1. Let f = Σ_{i=1}^n α_i χ_{A_i}, where we assume only
that the sets A_i are mutually disjoint measurable sets. Then ∫ f =
Σ_{i=1}^n α_i μ(A_i).

Proof. The function f is simple, and its range contains at most n elements. Let
{β_1, ..., β_k} be the range of f, and let B_i = f^{-1}({β_i}). Then f = Σ_{i=1}^k β_i χ_{B_i},

and this representation is canonical; i.e., it conforms to the requirements of
Equation (2). Putting J_i = {j : α_j = β_i}, we have

∫ f = Σ_{i=1}^k β_i μ(B_i) = Σ_{i=1}^k β_i μ(∪_{j∈J_i} A_j) = Σ_{i=1}^k Σ_{j∈J_i} β_i μ(A_j)
 = Σ_{i=1}^k Σ_{j∈J_i} α_j μ(A_j) = Σ_{j=1}^n α_j μ(A_j) •

Lemma 2. If g and f are simple functions such that g ≤ f, then
∫ g ≤ ∫ f.

Proof. Start with canonical representations, as described following Equa-
tion (2):

g = Σ_{i=1}^n α_i χ_{A_i}        f = Σ_{j=1}^k β_j χ_{B_j}

Then we have (non-canonical) representations conforming to Lemma 1:

g = Σ_{i=1}^n Σ_{j=1}^k α_i χ_{A_i ∩ B_j}        f = Σ_{j=1}^k Σ_{i=1}^n β_j χ_{A_i ∩ B_j}

Since g ≤ f, we have α_i ≤ β_j whenever A_i ∩ B_j ≠ ∅. By Lemma 1

∫ g = Σ_{i=1}^n Σ_{j=1}^k α_i μ(A_i ∩ B_j)        ∫ f = Σ_{j=1}^k Σ_{i=1}^n β_j μ(A_i ∩ B_j)

Hence ∫ g ≤ ∫ f. •

The next step in the process involves the approximation of nonnegative mea-
surable functions by simple functions, as addressed in Theorem 5 of Section 8.4,
page 397. Suppose, then, that g_1, g_2, ... are nonnegative simple functions such
that g_n ↑ f. Then we want the integral of f to be the limit of ∫ g_n. For technical
reasons this is best accomplished by defining

(3)    ∫ f = sup { ∫ g : g simple and g ≤ f }

In this equation we continue to assume f ≥ 0.
At this juncture we have two definitions for the integral of a nonnegative
simple function, namely, Equations (2) and (3). Let us verify that these defini-
tions are not in conflict with each other.

Lemma 3. If f is a nonnegative simple function, then its integral
as given in Equation (2) equals its integral as given in Equation (3).

Proof. Since f itself is simple, the expression on the right of Equation (3) is
at least ∫ f. On the other hand, if g is simple and if g ≤ f, then by Lemma 2,
∫ g ≤ ∫ f. By taking a supremum, we see that the right side of Equation (3) is
at most ∫ f. •

Lemma 4. If f and g are nonnegative simple functions, then
∫(f + g) = ∫ f + ∫ g.

Proof. Proceed exactly as in the proof of Lemma 2. Then

g + f = Σ_{i=1}^n Σ_{j=1}^k (α_i + β_j) χ_{A_i ∩ B_j}

By the disjoint nature of the family {A_i ∩ B_j} we have

∫(g + f) = Σ_{i=1}^n Σ_{j=1}^k (α_i + β_j) μ(A_i ∩ B_j)

This is the same as ∫ g + ∫ f, as we see from an equation in the proof of Lemma 2.



Lemma 5. For two measurable functions f and g, the condition
0 ≤ f ≤ g implies 0 ≤ ∫ f ≤ ∫ g.

Proof. Since 0 is a simple function, the definition of ∫ f in Equation (3) gives
∫ f ≥ ∫ 0 = 0. If h is a simple function such that h ≤ f, then h ≤ g and
∫ h ≤ ∫ g by the definition of ∫ g. In this last inequality, take the supremum in
h to get ∫ f ≤ ∫ g. •
We now arrive at the first of the celebrated convergence theorems for the
integral. It is these theorems that distinguish the integral defined here from
other integrals, such as the Riemann integral.

Theorem 1. Monotone Convergence Theorem. If [f_n]
is a sequence of measurable functions such that 0 ≤ f_n ↑ f, then
0 ≤ ∫ f_n ↑ ∫ f.

Proof. (Rudin) Since 0 ≤ f_n ≤ f_{n+1} ≤ f, we have 0 ≤ ∫ f_n ≤ ∫ f_{n+1} ≤ ∫ f
by Lemma 5. Hence lim ∫ f_n exists and is no greater than ∫ f. For the reverse
inequality, let 0 < θ < 1 and let g be a simple function satisfying 0 ≤ g ≤ f.
Put A_n = {x : f_n(x) ≥ θ g(x)}. If f(x) = 0, then g(x) = f_n(x) = 0, and x ∈ A_n
for all n. If f(x) > 0, then eventually f_n(x) ≥ θ g(x). Hence x ∈ ∪_{n=1}^∞ A_n and
X = ∪_{n=1}^∞ A_n. Also, we have A_n ⊂ A_{n+1} for all n. By Lemma 2 in Section 8.2,
page 387, we have, for any measurable set E,

(4)    μ(A_n ∩ E) ↑ μ(E)    as n → ∞

From this it is easy to prove that ∫ g χ_{A_n} ↑ ∫ g. Indeed, we write g =
Σ_i λ_i χ_{E_i} (the E_i being mutually disjoint) and observe that

∫ g χ_{A_n} = Σ_i λ_i μ(A_n ∩ E_i)

As n ↑ ∞, we have μ(A_n ∩ E_i) ↑ μ(E_i) by Equation (4). Since the coefficients
λ_i are nonnegative, ∫ g χ_{A_n} ↑ Σ_i λ_i μ(E_i) = ∫ g. We have proved that

θ ∫ g = lim_n ∫ θ g χ_{A_n} ≤ lim_n ∫ f_n

Since this is true for any θ in (0, 1), one concludes that ∫ g ≤ lim_n ∫ f_n. In this
inequality take a supremum over all simple g for which 0 ≤ g ≤ f, arriving at
∫ f ≤ lim_n ∫ f_n. •
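For example, on (0, 1] with Lebesgue measure, let f(x) = x^{-1/2} and f_n = f χ_{[1/n, 1]}. Then 0 ≤ f_n ↑ f, each f_n is bounded, and the familiar elementary integral gives ∫ f_n = 2 − 2/√n, so the Monotone Convergence Theorem yields ∫ f = lim ∫ f_n = 2.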

Theorem 2. For nonnegative measurable functions f and g we
have ∫(f + g) = ∫ f + ∫ g.

Proof. By Theorem 5 in Section 8.4, page 397, there exist nonnegative simple
functions f_n ↑ f and g_n ↑ g. Then f_n + g_n ↑ f + g. By Theorem 1 (the Monotone
Convergence Theorem) and Lemma 4 above, we have

∫(f + g) = lim ∫(f_n + g_n) = lim [∫ f_n + ∫ g_n] = ∫ f + ∫ g •
Theorem 3. Let f be nonnegative and measurable. The conditions
∫ f = 0 and f(x) = 0 almost everywhere are equivalent.

Proof. Let A = {x : f(x) > 0} and B = X ∖ A. If f(x) = 0 almost every-
where, then μ(A) = 0. Hence

∫ f = ∫(f χ_A + f χ_B) = ∫ f χ_A + ∫ f χ_B ≤ ∫ ∞ χ_A + ∫ 0 χ_B = ∞·μ(A) + 0·μ(B) = 0

For the other implication, assume ∫ f = 0. Define A_n = {x : f(x) > 1/n}.
Then A = ∪_{n=1}^∞ A_n. Since (1/n) χ_{A_n} is a simple function bounded above by f we
have

0 = ∫ f ≥ ∫ (1/n) χ_{A_n} = (1/n) μ(A_n)

Thus μ(A_n) = 0 for all n and μ(A) = 0 by Lemma 1 in Section 8.2, page 386. •

Theorem 4. Fatou's Lemma. For a sequence of nonnegative
measurable functions, ∫(lim inf f_n) ≤ lim inf ∫ f_n.

Proof. Recall that the limit infimum of a sequence of real numbers [x_n] is
defined to be lim_{n→∞} inf_{i≥n} x_i. The limit infimum of a sequence of real-valued
functions is defined pointwise: (lim inf f_n)(x) = lim inf f_n(x) = lim g_n(x), where
g_n(x) = inf_{i≥n} f_i(x). Observe that g_{n−1}(x) ≤ g_n(x) ≤ f_n(x) and that g_n ↑
lim inf f_n. Hence by Theorem 1 (the Monotone Convergence Theorem)

∫(lim inf f_n) = ∫ lim g_n = lim ∫ g_n = lim inf ∫ g_n ≤ lim inf ∫ f_n •

Theorem 5. If f and g are nonnegative measurable functions that
are equal almost everywhere, then ∫ f = ∫ g.

Proof. Let A = {x : f(x) = g(x)} and B = X ∖ A. Then

0 ≤ ∫ f χ_B ≤ ∫ ∞ χ_B = ∞·μ(B) = ∞·0 = 0

Similarly, ∫ g χ_B = 0. Hence

∫ f = ∫ f χ_A + ∫ f χ_B = ∫ f χ_A = ∫ g χ_A = ∫ g χ_A + ∫ g χ_B = ∫ g •

Theorem 5 states that ∫ f is not affected if f is altered on a set of measure
0, while retaining measurability.

Problems 8.5
1. Give an example in which strict inequality occurs in Fatou's Lemma (Theorem 4).
2. Show that a monotone convergence theorem for decreasing sequences is not true. For
example, consider In as the characteristic function of the interval [n,oo).

3. Prove that if f is nonnegative and Lebesgue integrable on ℝ and if F(x) = ∫_{−∞}^x f, then
F is continuous.
4. Define f_n(x) to be n if |x| ≤ 1/n and to be 0 otherwise. What are ∫ lim f_n and lim ∫ f_n?
5. Prove or disprove: If (X, A, μ) is a measure space and if A and B are measurable sets,
then ∫ |χ_A − χ_B| = μ(A △ B). Recall that A △ B = (A ∖ B) ∪ (B ∖ A).

6. Let f be Lebesgue measurable on [0, 1], and define φ(t) = μ(f^{-1}((−∞, t))). Find the
salient properties of φ. For example, is it continuous from the right or left? Is it mono-
tone? Is it measurable? Is it invertible? What are lim_{t→∞} φ(t) and lim_{t→−∞} φ(t)?
7. (Continuation). Define f*(x) = sup{t : φ(t) ≤ x}. Prove that φ(t) ≤ x if and only
if t ≤ f*(x). Prove that the sets {x : f(x) < t} and {x : f*(x) < t} have equal
measure. Hence, f* is called an equimeasurable nondecreasing rearrangement of
f. Prove that the sets {x : f(x) ≤ t} and {x : f*(x) ≤ t} have the same measure. Prove
that the sets {x : f(x) > t} and {x : f*(x) > t} have the same measure. Prove that
f*(φ(x)) ≥ x ≥ φ(f*(x)).
8. Give an example to show that the nonnegativity hypothesis cannot be dropped from
Fatou's Lemma (Theorem 4).

9. Let f_n be nonnegative measurable functions (on any measure space). Prove that if
f_n → f and f ≥ f_n for all n, then ∫ f_n → ∫ f.

10. Prove, for any sequence in ℝ, that lim inf (−x_n) = −lim sup x_n.
11. Let X = [0, 1]. Is there a Borel measure μ on X that assigns the same positive measure
to each open interval (0, 1/n), n = 1, 2, 3, ...?
12. If f is a bounded function, then there is a sequence of simple functions converging uni-
formly to f. (The domain of f can be any set, and no measurability assumptions are
needed.)
13. Let f_n be measurable functions such that f_n ≥ 0 a.e. ("almost everywhere") and f_n ↑ f
a.e. Prove that ∫ f_n ↑ ∫ f.

14. Let f_n be nonnegative and measurable. Prove that ∫ Σ_{n=1}^∞ f_n = Σ_{n=1}^∞ ∫ f_n.

15. Let f be nonnegative and measurable. Prove that ∫ f = ∫_A f, where A = {x : f(x) > 0}.
16. Let f_n be measurable functions such that f_n ≥ f_{n+1} ≥ 0 for all n and ∫ f_n ↓ 0. Prove
that f_n ↓ 0 a.e.
17. Let f be the characteristic function of the set of irrational points in [0, 1]. Is f measurable?
What are the Riemann and the Lebesgue integrals of f?
18. Prove that if {A_n} is a disjoint sequence of measurable sets and if X = ∪_{n=1}^∞ A_n, then
∫ f = Σ_{n=1}^∞ ∫_{A_n} f.

19. Prove that if f_n are nonnegative measurable functions for which Σ_{n=1}^∞ ∫ f_n < ∞, then
f_n → 0 a.e.

20. Give an example of a sequence of Riemann integrable functions such that the inequalities
0 ≤ f_n ≤ f_{n+1} ≤ 1 hold, yet lim f_n is not Riemann integrable.
21. Give an example of a sequence of simple functions f_n converging pointwise to a simple
function f, and yet ∫ |f_n − f| ↛ 0.

22. Find a sequence of simple functions f_n converging uniformly to 0, yet ∫ |f_n| ↛ 0.

8.6 The Integral, Continued

In the preceding section, the integral for nonnegative functions was developed in
the general setting of an arbitrary measure space (X, A, μ). Next on the agenda
is the extension of the integral to "arbitrary" functions.

Definition. Let (X, A, μ) be a measure space, and let f : X → ℝ*. We define

∫ f = ∫ f^+ − ∫ f^−

where f^+ = max(f, 0) and f^− = max(−f, 0). Note that ∫ f remains undefined
if ∫ f^+ = ∫ f^− = ∞.
The lattice operators max and min are defined for functions in a pointwise
manner. Thus, f^+(x) = max(f(x), 0). Notice that f = f^+ − f^−, that f^+ ≥ 0,
that f^− ≥ 0, and that |f| = f^+ + f^−.

The general definition just given for the integral is in harmony with the
previous definition, Equation (3), Section 8.5, page 400, in the cases where both
definitions are applicable. Indeed, if f ≥ 0, then f^+ = f and f^− = 0.

Definition. A function f : X → ℝ* is said to be integrable if it is measurable
and if ∫ |f| < ∞. The set of all integrable functions on the given measure space
is denoted by L^1(X, A, μ), or simply by L^1 if there can be no ambiguity about
the underlying measure space.
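For example, with Lebesgue measure on ℝ, the function f(x) = e^{-|x|} is integrable, since |f| = f and the familiar elementary integral gives ∫ |f| = 2 ∫_0^∞ e^{-x} dx = 2 < ∞. By contrast, the constant function f(x) = 1 is measurable but not integrable on ℝ, because ∫ |f| = μ(ℝ) = ∞.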

Lemma 1. A function f is integrable if and only if its positive and
negative parts, f^+ and f^−, are integrable.

Proof. Assume that f is integrable. Then it is measurable, and the measura-
bility of f^+ follows from the fact that {x : f^+(x) ≥ a} is X when a ≤ 0 and is
{x : f(x) ≥ a} when a > 0. The finiteness of the integral of |f^+| is immediate
from the inequality |f^+(x)| ≤ |f(x)|. The remainder of the proof involves similar
elementary ideas. •

Theorem 1. The set L^1(X, A, μ) is a linear space, and the integral
is a linear functional on it.

Proof. Let f and g be members of L^1. To show that f + g ∈ L^1, write h = f + g,
and

h^+ − h^− = f^+ − f^− + g^+ − g^−

From this it follows that

h^+ + f^− + g^− = h^− + f^+ + g^+

Since these are all nonnegative functions, Theorem 2 of Section 8.5, page 402, is
applicable, and

∫ h^+ + ∫ f^− + ∫ g^− = ∫ h^− + ∫ f^+ + ∫ g^+

Therefore, by Lemma 1,

∫ h = ∫ h^+ − ∫ h^− = ∫ f^+ − ∫ f^− + ∫ g^+ − ∫ g^− = ∫ f + ∫ g

With this equation now established, we use Lemma 5 in Section 8.5 (page 401)
to write

∫ |f + g| ≤ ∫(|f| + |g|) = ∫ |f| + ∫ |g| < ∞

For scalar multiplication, observe first that if λ ≥ 0 and f ≥ 0, then the
definition of the integral in Equation (3) of Section 8.5 (page 400) gives ∫ λf =
λ ∫ f. If f ≥ 0 and λ < 0, then

∫ λf = −∫ (λf)^− = −∫ (−λ)f = λ ∫ f

In the general case, we use what has already been proved:

∫ λf = ∫ [λf^+ + (−λ)f^−] = ∫ λf^+ + ∫ (−λ)f^− = λ ∫ f^+ − λ ∫ f^−
 = λ [∫ f^+ − ∫ f^−] = λ ∫ f

The finiteness of the integral is now trivial:

∫ |λf| = ∫ |λ| |f| = |λ| ∫ |f| < ∞ •


The second of the celebrated convergence theorems in the theory can now
be given.

Theorem 2. Dominated Convergence Theorem. Let
g, f_1, f_2, ... be functions in L^1(X, A, μ) such that |f_n| ≤ g. If the
sequence [f_n] converges pointwise to a function f, then f ∈ L^1 and
∫ f_n → ∫ f.

Proof. The functions f_n + g are nonnegative. By Fatou's Lemma (Theorem 4
in Section 8.5, page 403) and by the preceding theorem,

∫ g + ∫ f = ∫(g + f) = ∫ lim inf (g + f_n) ≤ lim inf ∫(g + f_n)
 = lim inf [∫ g + ∫ f_n] = ∫ g + lim inf ∫ f_n

Since ∫ g < ∞, we conclude that ∫ f ≤ lim inf ∫ f_n. Since −f and −f_n satisfy
the hypotheses of our theorem, the same conclusion can be drawn for them:
∫(−f) ≤ lim inf ∫(−f_n). This is equivalent to −∫ f ≤ −lim sup ∫ f_n and to ∫ f ≥
lim sup ∫ f_n. Putting this all together produces

lim inf ∫ f_n ≤ lim sup ∫ f_n ≤ ∫ f ≤ lim inf ∫ f_n •
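For example, on [0, 1] with Lebesgue measure, the functions f_n(x) = x^n converge pointwise to the function f that is 0 on [0, 1) and 1 at x = 1, and they are dominated by the integrable function g = χ_{[0,1]}. The Dominated Convergence Theorem therefore gives ∫ f_n → ∫ f = 0.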


A step function is a function on ℝ that is a simple function Σ_{i=1}^n c_i χ_{A_i}
in which the sets A_i are intervals, mutually disjoint.

Theorem 3. Let f be Lebesgue integrable on the real line. For
any positive ε there exist a simple function g, a step function h, and a
continuous function k having compact support such that

∫ |f − g| < ε        ∫ |f − h| < ε        ∫ |f − k| < ε


Proof. By Lemma 1, f^+ and f^− are integrable. By the definition of the
integral, Equation (3) in Section 8.5, page 400, there exist simple functions g_1
and g_2 such that g_1 ≤ f^+, g_2 ≤ f^−, ∫ f^+ < ∫ g_1 + ε, and ∫ f^− < ∫ g_2 + ε. Then
g_1 − g_2 is a simple function such that

∫ |f − (g_1 − g_2)| ≤ ∫ (f^+ − g_1) + ∫ (f^− − g_2) < 2ε

In order to establish the second part of the theorem, it now suffices to prove
it in the special case that f is an integrable simple function. It is therefore a
linear combination of characteristic functions of measurable sets of finite mea-
sure. It then suffices to prove this part of the theorem when f = χ_A for some
measurable set A having finite measure. By the definition of Lebesgue mea-
sure, there is a countable family of open intervals {I_n} that cover A and satisfy
μ(A) ≤ Σ_{n=1}^∞ μ(I_n) < μ(A) + ε. There is no loss of generality in assuming
that the family {I_n} is disjoint, because if two of these intervals have a point
in common, their union is a single open interval. Since the series Σ μ(I_n) con-
verges, there is an index m such that Σ_{n=m+1}^∞ μ(I_n) < ε. Put B = ∪_{n=1}^m I_n,
E = ∪_{n=m+1}^∞ I_n, h = χ_B, and φ = χ_E. Then h is a step function. Since
A ⊂ B ∪ E, we have f ≤ h + φ. Then

|h − f| ≤ |h + φ − f| + |φ| = (h + φ − f) + φ

Consequently,

∫ |h − f| ≤ ∫ (h + φ − f) + ∫ φ = μ(B) + μ(E) − μ(A) + μ(E) ≤ 2ε

For the third part of the proof it suffices to consider an f that is an integrable
step function. For this, in turn, it is enough to prove that the characteristic
function of a single compact interval can be approximated in L^1 by a continuous
function that vanishes outside that interval. This can certainly be done with a
piecewise linear function. •
The linear space L^1(X, A, μ) becomes a pseudo-normed space upon intro-
ducing the definition ‖f‖ = ∫ |f|. Since a function that is equal to 0 almost
everywhere will satisfy ‖f‖ = 0, we will not have a true norm unless we interpret
each f in L^1 as an equivalence class consisting of all functions equal to f almost
everywhere. This manner of proceeding is eventually the same as introducing
the null space of the norm, N = {g ∈ L^1 : ‖g‖ = 0}, and considering the quo-
tient space L^1/N. The elements of this space are cosets f + N, and the norm
of a coset is defined to be ‖f + N‖ = inf{‖f + g‖ : g ∈ N}. This is the same as
‖f‖.
A consequence of these considerations is that for f in L^1, the expression
f(x) is meaningless. After all, f stands for a class of functions that can differ
from each other on sets of measure 0. The single point x is a set of measure
zero, and we can change the value of f at x without changing f as a member
of L^1. The conventional notation ∫ f(x) dx should always be interpreted as
∫ f. Remember that the integral of f is not affected by changing the values
of f on any set of measure 0, such as the set of all rational points on the line!
Problems 8.6

1. Define the lattice operations ∨ and ∧ by writing

(f ∨ g)(x) = max(f(x), g(x))        (f ∧ g)(x) = min(f(x), g(x))

Prove that the set of measurable functions on a given measurable space is closed under
these lattice operations. Prove the same assertion for L^1.
2. Complete the proof of Lemma 1.
3. Let X = (0, 1), let A be the σ-algebra of Lebesgue measurable subsets of (0, 1), and let μ
be Lebesgue measure on A. Which of these functions are in L^1(X, A, μ): (a) f(x) = x^{-1},
(b) g(x) = x^{-1/2}, (c) h(x) = exp(−x^{-1}), (d) k(x) = log x?
4. If f = g almost everywhere, does it follow that f^+ = g^+ and f^− = g^− almost everywhere?
What can be said of the converse?
5. Prove or disprove: If {f_n} is a sequence of measurable functions such that f_n ↑ f, then
∫ f_n ↑ ∫ f.

6. Prove that if f ∈ L^1(X, A, μ), then |f| ∈ L^1 and |∫ f| ≤ ∫ |f|. Verify that |f| = f^+ + f^−,
that f = f^+ − f^−, that 0 ≤ f^+ ≤ |f|, and 0 ≤ f^− ≤ |f|.
7. Show that from the five hypotheses f_n integrable, h integrable, g measurable, f_n → f,
|f_n| ≤ h one cannot draw the conclusion ∫ f_n g → ∫ fg. Find an appropriately weak
additional hypothesis that makes the inference valid.
8. This problem and the next four involve convergence in measure. If f, f_1, f_2, ... are
measurable functions on a measure space (X, A, μ) and if lim_n μ{x : |f_n(x) − f(x)| > ε}
is 0 for each ε > 0, then we say that f_n → f in measure. Prove that if f_n → f
almost uniformly, then f_n → f in measure. (Almost uniform convergence is defined in
Section 8.4, page 397.)
9. Consider the following sequence of intervals: A_1 = [0, 1], A_2 = [0, 1/2], A_3 = [1/2, 1],
A_4 = [0, 1/4], A_5 = [1/4, 1/2], A_6 = [1/2, 3/4], A_7 = [3/4, 1], A_8 = [0, 1/8], ... Let f_n
denote the characteristic function of A_n. Prove that f_n → 0 in measure but f_n does not
converge almost everywhere.
10. Using Lebesgue measure, test the sequence f_n = χ_{[n−1, n]} for pointwise convergence,
convergence almost everywhere, convergence almost uniformly, and convergence in mea-
sure.
11. Let (X,A,J.l) be a measure space such that J.l(X) < 00. Let /,/1,12, ... be real-valued
measurable functions such that /n -t / almost everywhere. Prove that /n -t f in
measure.
12. Prove that the Monotone Convergence Theorem (page 401), Fatou's Lemma (page 403),
and the Dominated Convergence Theorem (page 406) are valid for sequences of functions
converging in measure.
13. Prove that if A is a Lebesgue measurable set of finite measure, then for each ε > 0 there
is a finite union B of open intervals such that μ(A △ B) < ε.
14. Prove that if f is Lebesgue measurable and finite-valued on a compact interval, then
there is a sequence of continuous functions g_n defined on the same interval such that
g_n → f almost uniformly.
15. Prove Lusin's Theorem: If f is Lebesgue measurable and finite-valued on [a, b] and
if ε > 0, then there is a continuous function g defined on [a, b] that has the property
μ{x : f(x) ≠ g(x)} < ε.

8.7 The LP-Spaces

Throughout this section, a fixed measure space (X, A, μ) is the setting. For
each p > 0, the notation L^p(X, A, μ), or just L^p, will denote the space of all
measurable functions f such that ∫ |f|^p < ∞. The case when p = 1 has been
considered in the preceding section. We write

(1)    ‖f‖_p = (∫ |f|^p)^{1/p}

although this equation generally does not define a norm (nor even a seminorm
if p < 1). The case p = ∞ will be included in our discussion by making two
special definitions. First, f ∈ L^∞ shall mean that for some M, |f(x)| ≤ M
almost everywhere. Second, we define

(2)    ‖f‖_∞ = inf{M : |f(x)| ≤ M a.e.}

The functions in L^∞ are said to be essentially bounded, and ‖f‖_∞ is called
the essential supremum of |f|, written as ‖f‖_∞ = ess sup |f(x)|.
When the equation 1/p + 1/q = 1 appears, it is understood that q will be ∞
when p = 1, and vice versa.
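For example, with Lebesgue measure on [0, 1], let f(x) = x for irrational x and f(x) = 5 for rational x. Then sup |f(x)| = 5, but |f(x)| ≤ 1 almost everywhere (the rationals form a set of measure 0), so f ∈ L^∞ and ‖f‖_∞ = 1.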

Theorem 1. Hölder's Inequality. Let 1 ≤ p ≤ ∞, 1/p + 1/q = 1,
f ∈ L^p, and g ∈ L^q. Then fg ∈ L^1 and

(3)    ‖fg‖_1 ≤ ‖f‖_p ‖g‖_q

Proof. The seminorms involved here are homogeneous: ‖λf‖ = |λ| ‖f‖. Con-
sequently, it will suffice to establish Equation (3) in the special case when ‖f‖_p =
‖g‖_q = 1. At first, let p = 1 and q = ∞. Since g ∈ L^∞, we have |g(x)| ≤ M a.e.
for some M. From this it follows that ∫ |fg| ≤ M ∫ |f| = M ‖f‖_1. By taking
the infimum for all M, we obtain ‖fg‖_1 ≤ ‖f‖_1 ‖g‖_∞.
Suppose now that p > 1. We prove first that if a > 0, b > 0, and 0 ≤ t ≤ 1,
then a^t b^{1−t} ≤ ta + (1 − t)b. The accompanying Figure 8.1 shows the functions
of t on the two sides of this inequality (when a = 2 and b = 12). It is clear that
we should prove convexity of the function φ(t) = a^t b^{1−t}. This requires that we
prove φ″(t) ≥ 0. Since log φ(t) = t log a + (1 − t) log b, we have

φ′(t)/φ(t) = log a − log b = c

whence φ″(t) = c φ′(t) = c² φ(t) ≥ 0.

Figure 8.1
Now let a = |f(x)|^p, b = |g(x)|^q, t = 1/p, 1 − t = 1/q. Our inequality yields
then |f(x)g(x)| ≤ (1/p)|f(x)|^p + (1/q)|g(x)|^q. By hypothesis, the functions on the right
in this inequality belong to L^1. Hence by integrating we obtain

‖fg‖_1 = ∫ |fg| ≤ (1/p) ∫ |f|^p + (1/q) ∫ |g|^q = 1/p + 1/q = 1 = ‖f‖_p ‖g‖_q •


Theorem 2. Minkowski's Inequality. Let 1 ≤ p ≤ ∞. If f and g
belong to L^p, then so does f + g, and

‖f + g‖_p ≤ ‖f‖_p + ‖g‖_p

Proof. The cases p = 1 and p = ∞ are special. For the first of these cases,
just write

∫ |f + g| ≤ ∫(|f| + |g|) = ∫ |f| + ∫ |g|

For p = ∞, select constants M and N for which |f(x)| ≤ M a.e. and |g(x)| ≤ N
a.e. Then |f(x) + g(x)| ≤ M + N a.e. This proves that f + g ∈ L^∞ and that
‖f + g‖_∞ ≤ M + N. By taking infima we get ‖f + g‖_∞ ≤ ‖f‖_∞ + ‖g‖_∞.
Now let 1 < p < ∞. From the observation that |f + g| ≤ 2 max{|f|, |g|},
we have

|f + g|^p ≤ 2^p max{|f|^p, |g|^p} ≤ 2^p (|f|^p + |g|^p)

This establishes that f + g ∈ L^p. Next, write

|f + g|^p = |f + g| |f + g|^{p−1} ≤ |f| |f + g|^{p−1} + |g| |f + g|^{p−1}

Since |f + g| ∈ L^p, we can infer that |f + g|^{p−1} ∈ L^q (where 1/p + 1/q = 1) because

∫ |f + g|^{(p−1)q} = ∫ |f + g|^p < ∞

By the homogeneity of Minkowski's inequality, we may assume that ‖f + g‖_p = 1.
Observe now that Hölder's Inequality is applicable to the product |f| |f + g|^{p−1}
and to the product |g| |f + g|^{p−1}. Consequently,

1 = ∫ |f + g|^p ≤ ∫ |f| |f + g|^{p−1} + ∫ |g| |f + g|^{p−1}
 ≤ ‖f‖_p ‖ |f + g|^{p−1} ‖_q + ‖g‖_p ‖ |f + g|^{p−1} ‖_q

This is equivalent to

‖f + g‖_p = 1 ≤ ‖f‖_p + ‖g‖_p

since ‖ |f + g|^{p−1} ‖_q = (∫ |f + g|^p)^{1/q} = 1. •

Theorem 3. The Riesz–Fischer Theorem. Each space
L^p(X, A, μ), where 1 ≤ p ≤ ∞, is complete.

Proof. The case p = ∞ is special and is addressed first. Let [f_n] be a Cauchy
sequence in L^∞. Define

E_{nm} = {x : |f_n(x) − f_m(x)| > ‖f_n − f_m‖_∞}    (n, m ∈ ℕ)

By Problem 1, these sets all have measure 0. Hence the same is true of their
union, E. If x ∈ X ∖ E, then |f_n(x) − f_m(x)| ≤ ‖f_n − f_m‖_∞, and thus [f_n(x)]
is a Cauchy sequence in ℝ for each x ∈ X ∖ E. This sequence converges to a
number that we may denote by f(x). Define f(x) = 0 for x ∈ E. On X ∖ E,
|f(x)| = lim |f_n(x)| ≤ lim ‖f_n‖_∞ < ∞. (Use the fact that a Cauchy sequence
in a metric space is bounded.) Thus, f ∈ L^∞. To prove that ‖f_n − f‖_∞ → 0,
let ε > 0 and select N so that ‖f_n − f_m‖_∞ < ε when n > m > N. Then
|f_n(x) − f_m(x)| < ε on X ∖ E, and |f(x) − f_m(x)| ≤ ε for m > N.
All the cases when 1 ≤ p < ∞ can be done together. Let [f_n] be a Cauchy
sequence in L^p. For each k = 1, 2, 3, ... there exists a least index n_k such that
the following implication is valid:

i ≥ n_k and j ≥ n_k  ⟹  ‖f_i − f_j‖_p < 2^{-k}

It follows that n_1 ≤ n_2 ≤ ⋯ and that ‖f_{n_{k+1}} − f_{n_k}‖_p < 2^{-k}. Let g_0 = f_{n_1} and
g_k = f_{n_{k+1}} − f_{n_k} for k ≥ 1. Then

Σ_{k=0}^∞ ‖g_k‖_p < ‖f_{n_1}‖_p + Σ_{k=1}^∞ 2^{-k} = ‖f_{n_1}‖_p + 1

Define h_n = Σ_{k=0}^n |g_k| and h = lim h_n. By Minkowski's Inequality (Theo-
rem 2), ‖h_n‖_p ≤ Σ_{k=0}^n ‖g_k‖_p < ‖f_{n_1}‖_p + 1, and thus by Fatou's Lemma (Theorem 4 in
Section 8.5, page 403)

∫ h^p ≤ lim inf ∫ h_n^p ≤ (‖f_{n_1}‖_p + 1)^p < ∞

This proves that h ∈ L^p. Consequently, the set A on which h(x) = ∞ is of
measure 0. On X ∖ A the two series Σ_{k=0}^∞ |g_k(x)| and Σ_{k=0}^∞ g_k(x) converge.
Therefore, we can define f(x) = Σ_{k=0}^∞ g_k(x) for x ∈ X ∖ A and let f(x) = 0 on
A. Since

Σ_{k=0}^i g_k = f_{n_1} + (f_{n_2} − f_{n_1}) + (f_{n_3} − f_{n_2}) + ⋯ + (f_{n_{i+1}} − f_{n_i}) = f_{n_{i+1}}

we have f_{n_i}(x) → f(x) a.e. Since |f| ≤ Σ_{k=0}^∞ |g_k| = h, we conclude that f ∈ L^p.
It remains to be shown that ‖f − f_n‖_p → 0. By the definition of n_k, if j ≥ n_k,
then by Fatou's Lemma

∫ |f − f_j|^p ≤ lim inf_i ∫ |f_{n_i} − f_j|^p ≤ 2^{-kp}

Thus ‖f − f_j‖_p ≤ 2^{-k} whenever j ≥ n_k, and therefore ‖f_n − f‖_p → 0. •

Problems 8.7

1. Prove that if f ∈ L^∞, then the set {x : |f(x)| > ‖f‖_∞} has measure 0.

2. Let X be any set, and take A to be 2^X and μ to be counting measure. In this setting, the
space L^p(X, A, μ) is often denoted by ℓ^p(X). Prove that for each f ∈ ℓ^p(X) the support
of f is countable. Here, the support of f is defined to be {x ∈ X : f(x) ≠ 0}.
3. (Continuation) Prove that if X is a set of n points, then dim ℓ^p(X) is n.
4. (Continuation) For n = 2, draw the set {f ∈ ℓ^p(X) : ‖f‖_p = 1} using p = 1, 2, 10, ∞.
5. Let f_n ∈ L^∞ and f_n ≥ 0. Prove that sup ‖f_n‖_∞ = ‖sup f_n‖_∞.

6. In L^p(X, A, μ), write f ≡ g if ‖f − g‖_p = 0. Prove that f ≡ g if and only if f = g a.e.
(Thus the equivalence relation is independent of p.) Prove that the equivalence relation
is "consistent" with the other structure in L^p by establishing that the conditions f_1 ≡ f_2
and g_1 ≡ g_2 imply that f_1 + g_1 ≡ f_2 + g_2, λf_1 ≡ λf_2, and ‖f_1‖ = ‖f_2‖.
7. The space ℓ^p(ℕ) of Problem 2 is usually written simply as ℓ^p, and if f ∈ ℓ^p, we usually
write f_n instead of f(n). Show that if f ∈ ℓ^p, g ∈ ℓ^q, 1/p + 1/q = 1, then fg ∈ ℓ^1 and
Σ_{n=1}^∞ |f_n g_n| ≤ (Σ_{n=1}^∞ |f_n|^p)^{1/p} (Σ_{n=1}^∞ |g_n|^q)^{1/q}.
8. Let (E, ‖·‖) be a pseudo-normed linear space. Let M = {f ∈ E : ‖f‖ = 0}. Prove that
M is a linear subspace of E. In the quotient space E/M the elements are cosets f + M.
Define ‖f + M‖ = inf{‖f + g‖ : g ∈ M}. Show that this defines a norm in E/M.
9. Let f and f_n belong to L^∞(X, A, μ). Show that ‖f − f_n‖_∞ → 0 if and only if f_n → f
almost uniformly. (See the definition in Section 8.4, page 397.)
10. Let (X, A, μ) be a measure space for which μ(X) < ∞. Show that if 0 < α < β ≤ ∞,
then L^β ⊂ L^α. Show that the hypothesis μ(X) < ∞ cannot be omitted.
11. Prove that if 0 < α < β ≤ ∞, then ℓ^α ⊂ ℓ^β. (See the definition in Problem 7.)
12. Show that in the proof of the Riesz–Fischer Theorem (Theorem 3), the sequence [f_n]
need not converge to f almost everywhere. Consider, for example, the characteristic
functions of the intervals [0, 1], [0, 1/2], [1/2, 1], [0, 1/3], [1/3, 2/3], [2/3, 1], ... Show that ‖f_n‖_p → 0
but f_n(x) is divergent for each x in [0, 1].
13. Let (X, A, μ) be a measure space for which μ(X) < ∞. Prove that for each f ∈ L^∞,
lim_{p→∞} ‖f‖_p = ‖f‖_∞.

14. Prove that for any measure space, if 0 < α < β ≤ ∞, then L^∞ ∩ L^α ⊂ L^∞ ∩ L^β.
15. Prove for any measure space: If 0 < α < β < γ < ∞, then L^α ∩ L^γ ⊂ L^β ∩ L^γ.
16. Let f(x) = [x log²(1/x)]^{-1} and prove that f is in L^1[0, 1/2] but is not in ∪_{p>1} L^p[0, 1/2].
17. Prove that if [f_n] is a Cauchy sequence in L^p, then it has a subsequence that converges
almost everywhere.
18. Let f and f_n belong to L^p. If ‖f_n − f‖_p → 0 and f_n → g a.e., what relationship exists
between f and g?
19. Let (X, A, μ) be a measure space for which μ(X) = 1. (Such a space is a probability
space.) Prove that if f and g are positive, measurable, and satisfy fg ≥ 1, then the
inequality ∫ f · ∫ g ≥ 1 holds.

20. Prove that if f_n ∈ L^1(X, A, μ) and Σ_{n=1}^∞ ‖f_n‖_1 < ∞, then f_n → 0 a.e.

21. If 0 < ∫ |f| < ∞, then there is a continuous function g having compact support such
that ∫ fg ≠ 0.
22. Prove that if f ∈ L^p(X, A, μ) for all sufficiently large values of p, and if the limit of ‖f‖_p
exists when p → ∞, then the value of the limit is ‖f‖_∞.

23. Show that in general, L^∞(X, A, μ) ≠ ∩_{p>1} L^p(X, A, μ). Are there cases when equality
occurs?
24. If A = 2^X and μ is counting measure on A, what is L^∞(X, A, μ)?

25. Prove that for f ∈ L^1(X, A, μ) we have |∫ f| ≤ ∫ |f|. When does equality occur here?

26. Let 1 < p < ∞, 1/p + 1/q = 1, and f ∈ L^p. Prove that |f|^p ∈ L^1, that |f|^{p−1} ∈ L^q, and
that for r ≠ 0, |f|^r ∈ L^{p/r}.

8.8 The Radon-Nikodym Theorem

In elementary calculus, the expression ∫_a^x f(t) dt is called an "indefinite
integral." It is a function of the two arguments f and x, or of f and the
set [a, x]. Therefore, in general integration theory the analogous concept is an
integral ∫_A f depending on the two arguments f and A. Recall that our notation
is as follows:

∫_A f = ∫ f χ_A

where χ_A is the characteristic function of the set A. The set A and the function
f should be measurable with respect to the underlying measure space (X, A, μ).
Now suppose that a second measure ν is defined on the σ-algebra A. If
ν(A) = 0 whenever μ(A) = 0, we say that ν is absolutely continuous with
respect to μ, and we write ν ≪ μ.
One easy way to produce such a measure 1/ is given in the next theorem.

Theorem 1. If (X, A, μ) is a measure space, and if f is a nonneg-
ative measurable function, then the equation

(1)    ν(A) = ∫_A f    (A ∈ A)

defines a measure ν that is absolutely continuous with respect to μ.

Proof. The postulates for a measure are quickly verified.

(a) ν(∅) = ∫_∅ f = ∫ f χ_∅ = ∫ 0 = 0
(b) ν(A) ≥ 0 because f ≥ 0
(c) If [A_i] is a disjoint sequence of measurable sets, then

ν(∪_{i=1}^∞ A_i) = ∫_{∪A_i} f = ∫ f χ_{∪A_i} = ∫ f Σ_i χ_{A_i} = ∫ Σ_i f χ_{A_i} = Σ_{i=1}^∞ ∫ f χ_{A_i} = Σ_{i=1}^∞ ν(A_i)

This calculation used the Monotone Convergence Theorem (Section 8.5, page 401).
The absolute continuity of ν is clear: if μ(A) = 0, then ν(A) = ∫_A f = 0. •
It is natural to seek a converse for this theorem. Thus we ask whether each
measure that is absolutely continuous with respect to μ must be of the form
in Equation (1). The answer is a qualified "Yes." It is necessary to make a
slight restriction. Consider a general measure space (X, A, μ). We say that X
(or μ) is σ-finite if X can be written as a countable union of measurable sets,
each having finite measure. For example, the real line with Lebesgue measure is
σ-finite, since we can write ℝ = ∪_{n=1}^∞ [−n, n].
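On the other hand, counting measure on an uncountable set such as ℝ (with A = 2^ℝ) is not σ-finite: a set of finite counting measure is a finite set, and no countable union of finite sets can cover ℝ.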

Theorem 2. Radon–Nikodym Theorem. Let μ and ν be σ-finite
measures on a measurable space (X, A). If ν is absolutely continuous
with respect to μ, then there exists a nonnegative measurable function
h, determined uniquely up to a set of μ-measure 0, such that ν(A) =
∫_A h dμ for all A ∈ A.

Proof. We prove the theorem first under the assumption that μ(X) < ∞ and
ν(X) < ∞. Consider the Hilbert space L² = L²(X, A, μ + ν). For any f in
L², define φ(f) = ∫ f dμ. It is easily verified that φ is a linear functional on
L². Furthermore it is bounded (continuous) because by the Hölder Inequality
(Theorem 1 in Section 8.7, page 409)

|φ(f)| = |∫ f · 1 dμ| ≤ ∫ |f| · 1 d(μ + ν) ≤ ‖f‖_2 ‖1‖_2

By the Riesz Representation Theorem for Hilbert space (Section 2.3, page 81)
there exists an element h_0 in L² such that

φ(f) = ∫ f h_0 d(μ + ν)

This means that ∫ f dμ = ∫ f h_0 d(μ + ν), whence

∫ f (1 − h_0) dμ = ∫ f h_0 dν

Let B = {x : h_0(x) ≤ 0}. Then 1 − h_0 ≥ 1 on B, and consequently

μ(B) ≤ ∫ χ_B (1 − h_0) dμ = ∫ χ_B h_0 dν ≤ 0

Thus μ(B) = 0 and h_0(x) > 0 a.e. (with respect to μ). Since ν ≪ μ, we have
ν(B) = 0 also. Hence for any A ∈ A,

ν(A) = ∫ χ_A dν = ∫ h_0^{-1} χ_A h_0 dν = ∫ h_0^{-1} χ_A (1 − h_0) dμ
 = ∫_A h_0^{-1}(1 − h_0) dμ = ∫_A h dμ    (h = h_0^{-1}(1 − h_0))

To see that h ≥ 0 a.e., with respect to μ, write A = {x : h(x) < 0}, so that
0 ≤ ν(A) = ∫_A h dμ ≤ 0, whence μ(A) = 0.
For the second half of the proof we assume only that μ and ν are σ-finite.
Then X = ∪_{n=1}^∞ A_n = ∪_{n=1}^∞ B_n, where A_n and B_n are measurable sets such
that μ(A_n) < ∞ and ν(B_n) < ∞ for each n. Write the doubly-indexed family
A_i ∩ B_j as a sequence C_n. Then X = ∪ C_n, μ(C_n) < ∞, and ν(C_n) < ∞.
With no loss of generality we assume that the sequence [C_n] is disjoint. Define
measures ν_n and μ_n by putting ν_n(A) = ν(A ∩ C_n) and μ_n(A) = μ(A ∩ C_n).
Since ν ≪ μ, we have ν_n ≪ μ_n for all n. By the first half of the proof there exist
functions h_n such that ν_n(A) = ∫_A h_n dμ_n, for all A ∈ A. Since the C_n-sequence
is disjoint, we can define h on X by specifying that h(x) = h_n(x) for x ∈ C_n.
Then we have

ν(A) = Σ_{n=1}^∞ ν(A ∩ C_n) = Σ_{n=1}^∞ ν_n(A) = Σ_{n=1}^∞ ∫_{A ∩ C_n} h dμ = ∫_A h dμ

For the uniqueness of h, suppose that ∫_A h dμ = ∫_A h′ dμ for all A ∈ A. Letting
A = {x : h(x) > h′(x)}, we have

∫_A (h − h′) dμ = 0    and    h > h′ on A

It follows that μ(A) = 0. By symmetry, the set where h′(x) > h(x) is also of
measure 0. Hence h = h′ a.e. (μ). •
The preceding paragraphs have involved the concept of absolute continuity
of one measure with respect to another. The antithesis of this is "mutual singu-
larity." Two measures μ and ν on the same measure space are said to be mu-
tually singular if there is a measurable set B such that μ(B) = ν(X ∖ B) = 0.
This relation is written symbolically as μ ⊥ ν. It is obviously a symmetric
relation.
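For example, on the Borel subsets of ℝ, Lebesgue measure μ and the Dirac measure δ_0 (defined by δ_0(A) = 1 if 0 ∈ A and δ_0(A) = 0 otherwise) are mutually singular: taking B = {0} gives μ(B) = 0 and δ_0(ℝ ∖ B) = 0.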

Theorem 3. Lebesgue Decomposition Theorem. If μ and ν
are σ-finite measures on the measurable space (X, A), then there exist
unique measures ν_1 and ν_2 on A such that ν = ν_1 + ν_2, ν_1 ≪ μ, and
ν_2 ⊥ μ.

Proof. By the Radon–Nikodym Theorem (Theorem 2, above), or indeed by
the first half of its proof, there exists a measurable function h such that

μ(A) = ∫_A h d(μ + ν)    (A ∈ A)

Define the set B = {x : h(x) = 0}. Next, define

ν_1(A) = ν(A ∖ B)    and    ν_2(A) = ν(A ∩ B)    (A ∈ A)

Obviously, ν_1 + ν_2 = ν. By Problem 8.2.15, page 391, ν_1 and ν_2 are measures.
Let us prove that ν_2 ⊥ μ. Since h = 0 on B, we have μ(B) = ∫_B h d(μ + ν) = 0.
On the other hand, ν_2(X ∖ B) = ν((X ∖ B) ∩ B) = ν(∅) = 0. Next, we prove
that ν_1 ≪ μ. Suppose that μ(A) = 0. Then ∫_A h d(μ + ν) = 0, ∫_A h dν = 0,
and ∫_{A ∖ B} h dν = 0. But h > 0 on A ∖ B, and hence ν(A ∖ B) = 0, ν_1(A) = 0.
Finally, we prove the uniqueness of our decomposition. Suppose that another
decomposition is given: ν = ν_3 + ν_4, where ν_3 ≪ μ and ν_4 ⊥ μ. Then there
exists a set C such that μ(C) = ν_4(X ∖ C) = 0. (This set C is akin to B in
the first part of our proof.) If D = B ∪ C, then 0 = μ(B) + μ(C) ≥ μ(D) and
μ(D) = 0. It follows, for any measurable set A, that ν_1(A ∩ D) ≤ ν_1(D) = 0.
Hence
ν(A ∩ D) = (ν_1 + ν_2)(A ∩ D) = ν_2(A ∩ D)
 = ν_2(A ∩ D) + ν_2(A ∖ D) = ν_2(A)

The same argument will prove that ν(A ∩ D) = ν_4(A). Hence ν_2 = ν_4. Since
ν = ν_1 + ν_2 = ν_3 + ν_4, one is tempted to conclude outright that ν_1 = ν_3.
However, if A is a set for which ν_2(A) = ν_4(A) = ∞, we cannot perform the
necessary subtraction. Using the σ-finite property of the space, we find a disjoint
sequence of measurable sets X_n such that X = ∪ X_n and ν(X_n) < ∞. Then
ν_1(X_n ∩ A) = ν_3(X_n ∩ A) for all n and for all A. It follows that ν_1(A) = ν_3(A)
and that ν_1 = ν_3. •
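For example, take μ to be Lebesgue measure on the Borel subsets of ℝ and ν = μ + δ_0, where δ_0 is the Dirac measure at 0 described above. The decomposition of the theorem is then ν_1 = μ, which is absolutely continuous with respect to μ, and ν_2 = δ_0, which is singular with respect to μ (via the set B = {0}).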

Problems 8.8

1. Is the relation of absolute continuity (for measures) reflexive? What about symmetry
and transitivity? Is it a partial order? A linear order? A well-ordering? Give examples
to support each conclusion.
2. Solve Problem 1 for the relation of mutual singularity.
3. The function h in the Radon–Nikodym Theorem is often denoted by dν/dμ. Prove that
dν/dσ = (dν/dμ)(dμ/dσ) if ν ≪ μ ≪ σ.
4. Refer to Problem 3 and prove that d(ν + σ)/dμ = dν/dμ + dσ/dμ if ν ≪ μ and σ ≪ μ.
5. Refer to Problem 3 and prove that (dμ/dν)(dν/dμ) = 1 if ν ≪ μ ≪ ν.

6. Refer to Problem 3 and prove that ∫ f dν = ∫ f (dν/dμ) dμ if ν ≪ μ.

7. Let X = [0,1]' let A be the family of all Lebesgue measurable subsets of X, let II be
Lebesgue measure, and let J.L be counting measure. Show that II «J.L. Show that there
exists no function h for which II(A) = fA h dJ.L. Explain the apparent conflict with the
Radon-Nikodym Theorem.
8. Prove that in the Radon-Nikodym Theorem, h(x) < 00 for all x. Show also that if
II(X) < 00, then h E LI(X,A,J.L).
9. Extension of Radon–Nikodym Theorem. Let μ and ν be measures on a measurable space
(X, A). Suppose that there exists a disjoint family {B_α} of measurable sets having these
properties:
(i) μ(B_α) < ∞ for all α.

(ii) μ(A) = 0 if A ∈ A and μ(A ∩ B_α) = 0 for all α.

(iii) ν(A) = 0 if A ∈ A and ν(A ∩ B_α) = 0 for all α.

If ν ≪ μ, then there is an h as in the Radon–Nikodym Theorem, but it may be measurable
only with respect to the σ-algebra

B = {B : B ⊂ X, B ∩ A ∈ A when A ∈ A and μ(A) < ∞}

10. If there corresponds to each positive ε a positive δ such that

[A ∈ A and μ(A) < δ]  ⟹  ν(A) < ε

then ν ≪ μ, and conversely.


11. Let (X, A, μ) be a measure space. Fix B ∈ A and define ν(A) = μ(B ∩ A) for all A ∈ A.
Is ν absolutely continuous with respect to μ? Is μ − ν a measure? Is the equation
∫_A f dν = ∫_{A ∩ B} f dμ true for measurable f and A? Is μ − ν singular with respect to μ?
Give examples.
12. If ν ≪ μ and λ ⊥ μ, then ν ⊥ λ.
13. If μ ⊥ ν ≪ μ, then ν = 0.
14. Give an example of the Radon–Nikodym Theorem in which the function h fails to be
bounded a.e.
15. Let (X, A, μ) be a σ-finite measure space. Let ν be a measure on A. The existence of
a constant c for which ν ≤ cμ is equivalent to the existence of a bounded measurable
nonnegative function h such that ν(A) = ∫_A h dμ.

8.9 Signed Measures

In this section we examine the consequences of relaxing the nonnegativity re-
quirement on a measure. Let (X, A) be a measurable space. A function μ from
A to ℝ* is called a signed measure if
(i) The range of μ does not include both +∞ and −∞.
(ii) μ(∅) = 0
(iii) μ(∪_{i=1}^∞ A_i) = Σ_{i=1}^∞ μ(A_i) when {A_i} is a disjoint sequence in A.
The reason for the first requirement is that we want to avoid the meaningless
expression ∞ − ∞. Thus, if μ(A) = ∞ and μ(B) = −∞, then from the equation
A = (A ∩ B) ∪ (A ∖ B) we see that one of the terms μ(A ∩ B) and μ(A ∖ B)
must be +∞. Likewise, one of μ(A ∩ B) and μ(B ∖ A) must be −∞. Hence the
right side of the equation

μ(A ∪ B) = μ(A ∖ B) + μ(A ∩ B) + μ(B ∖ A)

is meaningless.
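For example, if λ is Lebesgue measure on the Borel subsets of [−1, 1] and f(x) = x, then μ(A) = ∫_A f dλ defines a signed measure: it is finite-valued, μ(∅) = 0, and it is the difference of the two finite measures A ↦ ∫_A f^+ dλ and A ↦ ∫_A f^− dλ (see Theorem 1 below). Here μ([−1, 0]) = −1/2 and μ([0, 1]) = 1/2.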

Theorem 1. Jordan Decomposition. The difference of two
measures (defined on the same σ-algebra), one of which is finite, is a
signed measure. Conversely, every signed measure μ is the difference
of two measures μ^+ and μ^−, one of which is finite. Furthermore, we
may require these two measures to be mutually singular, and in that
case they are uniquely determined by μ.

Proof. For the first assertion, let μ_1 and μ_2 be measures, and suppose that μ_1
is finite. Put μ = μ_1 − μ_2. To see that μ is a signed measure, note first that μ
does not assume the value +∞. Next, we have μ(∅) = 0 since μ_1 and μ_2 have
this property. Finally, let {A_i} be a disjoint sequence of measurable sets. Then

μ(∪_{i=1}^∞ A_i) = μ_1(∪_{i=1}^∞ A_i) − μ_2(∪_{i=1}^∞ A_i) = Σ_{i=1}^∞ μ_1(A_i) − Σ_{i=1}^∞ μ_2(A_i)
 = lim_n Σ_{i=1}^n [μ_1(A_i) − μ_2(A_i)] = Σ_{i=1}^∞ μ(A_i)

Notice that on the second line of this calculation the first sum is finite, although
the second may be infinite.
For the other half of the proof, let μ be a signed measure that does not
assume the value +∞. In an abuse of language, we say that a (measurable) set
S is positive if μ(A) ≥ 0 for all measurable subsets A of S. Define

θ = sup{μ(S) : S is a positive set}

Let S_n be a sequence of positive sets such that μ(S_n) ↑ θ, and define P =
∪_{n=1}^∞ S_n. Let us prove that P is a positive set. If A ⊂ P, we write

A = ∪_{n=1}^∞ A_n,    where A_n = A ∩ (S_n ∖ (S_1 ∪ ⋯ ∪ S_{n−1}))

Since A_n ⊂ S_n, we have μ(A_n) ≥ 0. Since A is the union of the disjoint family
{A_n}, it follows that

μ(A) = Σ_{n=1}^∞ μ(A_n) ≥ 0

Since P is a positive set,

μ(P) = μ(S_n) + μ(P ∖ S_n) ≥ μ(S_n)    (n = 1, 2, 3, ...)

whence μ(P) ≥ θ and θ < ∞.


Now we wish to prove that μ(A) ≤ 0 whenever A ⊂ X ∖ P. Suppose, on the
contrary, that A ⊂ X ∖ P and μ(A) > 0. If A contains a positive set B of positive
measure, then P ∪ B is a positive set for which μ(P ∪ B) = μ(P) + μ(B) > θ, in
contradiction to the definition of θ. Thus, A contains no positive set of positive
measure. Define sets A_1, A_2, ... as follows. Let n_1 be the first positive integer
such that there exists a set A_1 satisfying

A_1 ⊂ A    and    μ(A_1) < −1/n_1

Since 0 < μ(A) = μ(A ∖ A_1) + μ(A_1), we see that A ∖ A_1 is a subset of A having
positive measure. It is therefore not a positive set. Hence there is a first positive
integer n_2 and a set A_2 such that

A_2 ⊂ A ∖ A_1    and    μ(A_2) < −1/n_2

Continue in this manner, finding at the kth step a set A_k and an integer n_k such
that

A_k ⊂ A ∖ (A_1 ∪ ⋯ ∪ A_{k−1})    and    μ(A_k) < −1/n_k

Define B = A ∖ ∪_{k=1}^∞ A_k. By the same argument used earlier, B has positive
measure. It is actually a positive set. To verify this, suppose on the contrary
that there exists a set C ⊂ B such that μ(C) < 0. Let m be the first positive
integer such that μ(C) < −1/m. Since C ⊂ A ∖ (A_1 ∪ ⋯ ∪ A_{k−1}) for every
k, we have n_k ≤ m for all k. Hence μ(∪_{k=1}^∞ A_k) = −∞ and μ(B) = +∞, a
contradiction.
Now define μ^+ and μ^− by writing, for A ∈ A,

μ^+(A) = μ(A ∩ P)    and    μ^−(A) = −μ(A ∖ P)

We see that μ^+ ⊥ μ^− because μ^+(X ∖ P) = 0 = μ^−(P).
Our last task is to prove the uniqueness of this decomposition. Suppose
that μ = μ_1 − μ_2 = ν_1 − ν_2, where these are measures such that ν_1 ⊥ ν_2 and
μ_1 ⊥ μ_2. Then there exists a set Q such that ν_1(X ∖ Q) = 0 = ν_2(Q). We can
prove that ν_1 ≤ μ_1 by writing

ν_1(A) = ν_1(A ∩ Q) + ν_1(A ∖ Q) = ν_1(A ∩ Q) = (μ + ν_2)(A ∩ Q)
 = μ(A ∩ Q) = (μ_1 − μ_2)(A ∩ Q) ≤ μ_1(A ∩ Q) ≤ μ_1(A)

By the symmetry in this situation, we can prove μ_1 ≤ ν_1. Hence μ_1 = ν_1 and
μ_2 = ν_2. •
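For example, for the signed measure μ(A) = ∫_A x dλ(x) on the Borel subsets of [−1, 1] mentioned earlier (λ denoting Lebesgue measure), one may take P = [0, 1] in the proof; this gives μ^+(A) = μ(A ∩ [0, 1]) = ∫_A x^+ dλ and μ^−(A) = −μ(A ∖ [0, 1]) = ∫_A x^− dλ. These two measures are mutually singular, and μ = μ^+ − μ^−.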

Theorem 2. Radon-Nikodym Theorem for Signed Measures.


Let (X,A,Jl) be a a-finite measure space. Ifv is a finite-valued signed
measure that is absolutely continuous with respect to Jl, then there is
a measurable function h such that for all A E A, v(A) = fA hdJl.

Proof. By the preceding theorem, there exist measures ν⁺ and ν⁻ such that ν = ν⁺ − ν⁻ and ν⁺ ⊥ ν⁻. Consequently, there exists a measurable set P for which ν⁺(X \ P) = 0 = ν⁻(P). If A is a measurable set satisfying μ(A) = 0, then μ(A ∩ P) = 0 and ν(A ∩ P) = 0, by the absolute continuity. Hence

    ν⁺(A) = ν⁺(A ∩ P) + ν⁺(A \ P) = ν⁺(A ∩ P)
          = (ν + ν⁻)(A ∩ P) = ν(A ∩ P) = 0
This establishes that ν⁺ is absolutely continuous with respect to μ. It follows that ν⁻ is also absolutely continuous with respect to μ. By the earlier Radon-Nikodym Theorem (Theorem 2 in Section 8.8, page 414), there exist nonnegative measurable functions h₁ and h₂ such that for A in A,

    ν⁺(A) = ∫_A h₁ dμ   and   ν⁻(A) = ∫_A h₂ dμ

It follows that h₁ and h₂ are finite almost everywhere. Thus, there is nothing suspicious in the equation

    ν(A) = ν⁺(A) − ν⁻(A) = ∫_A h₁ dμ − ∫_A h₂ dμ = ∫_A (h₁ − h₂) dμ

so that h = h₁ − h₂ has the required property.  •
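In the purely atomic case the function h of Theorem 2 can be written down explicitly, which may help fix ideas. The sketch below (an illustration under finite, discrete assumptions; the data and names are invented for the example) computes h = dν/dμ and verifies ν(A) = ∫_A h dμ on a few sets.

    # Discrete illustration of Theorem 2: when every atom x has mu({x}) > 0 and
    # nu << mu, the function h(x) = nu({x}) / mu({x}) satisfies
    # nu(A) = Σ_{x in A} h(x) mu({x}) = ∫_A h dmu.  (nu may be signed; h is then signed.)

    mu = {'a': 1.0, 'b': 2.0, 'c': 0.5}
    nu = {'a': 0.3, 'b': -1.0, 'c': 0.25}      # absolutely continuous with respect to mu

    h = {x: nu[x] / mu[x] for x in mu}         # the Radon-Nikodym derivative dnu/dmu

    def integral(f, measure, A):
        """Integral of f over A with respect to a discrete measure."""
        return sum(f[x] * measure[x] for x in A)

    for A in ({'a'}, {'a', 'b'}, {'b', 'c'}, set(mu)):
        assert abs(sum(nu[x] for x in A) - integral(h, mu, A)) < 1e-12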

Theorem 3. The Hahn Decomposition. If μ is a signed measure on the measurable space (X, A), then there is a decomposition of X into a disjoint pair of measurable sets N and P such that μ(A) ≥ 0 when A ⊂ P and μ(A) ≤ 0 when A ⊂ N.

Proof. Left as a problem.



Problems 8.9

1. Use the Jordan decomposition theorem to prove the Hahn decomposition theorem.
2. Prove that μ⁺ in Theorem 1 has the property

    μ⁺(A) = sup{μ(S) : S ∈ A and S ⊂ A}

3. Prove that a signed measure μ is monotone on a positive set. Thus, if A ⊂ B ⊂ S, where S is a positive set, then μ(A) ≤ μ(B).
4. If μ is a signed measure, does it follow that −μ is also a signed measure? Are sums and differences of signed measures signed measures?
5. Let (X, A, μ) be a measure space, and let h be a nonnegative measurable function. Define ν(A) = ∫_A h dμ. Prove that ∫ f dν = ∫ f h dμ for all measurable f.
6. Let μ and ν be measures on the measurable space (X, A). Suppose that 0 is the only measurable function f such that ν(A) ≥ ∫_A f dμ for all A ∈ A. Prove that ν ⊥ μ.
7. Let (X, A, μ) be a measure space such that each singleton {x} is measurable. Define ν(A) to be the sum of all μ({x}) as x ranges over A. Does this define a measure on A?
8. Is the function h in Theorem 2 unique?

8.10 Product Measures and Fubini's Theorem

Suppose that two measure spaces are given: (X, A, μ) and (Y, B, ν). Is there a suitable way of making the Cartesian product X × Y into a measure space? In particular, can this be done in such a way that

    ∫_{X×Y} f(x, y) = ∫_X ∫_Y f(x, y) dν(y) dμ(x) ?
We begin by forming the class of all sets of the form A × B, where A ∈ A and B ∈ B. Such sets are called measurable rectangles or simply rectangles. The family of all rectangles is not a σ-algebra. For example, (A₁ × B₁) ∪ (A₂ × B₂) will not be a rectangle, in general. To understand this, observe that if points (x₁, y₁) and (x₂, y₂) belong to a rectangle, then (x₁, y₂) and (x₂, y₁) also belong to that rectangle.
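The smallest possible example already exhibits this failure; the following sketch (illustrative only) checks that the union of the two rectangles {1} × {1} and {2} × {2} is not itself a rectangle, precisely because the exchanged points are missing.

    # If (x1,y1) and (x2,y2) lie in a rectangle A x B, then so do (x1,y2) and (x2,y1).
    # The union ({1} x {1}) ∪ ({2} x {2}) violates this, so it is not a rectangle.

    E = {(1, 1), (2, 2)}                       # union of two measurable rectangles

    def is_rectangle(E):
        """A finite set E ⊂ X x Y is a rectangle iff E equals the product of its projections."""
        A = {x for x, _ in E}
        B = {y for _, y in E}
        return E == {(x, y) for x in A for y in B}

    assert not is_rectangle(E)                 # (1,2) and (2,1) are missing
    assert is_rectangle({(1, 1), (1, 2), (2, 1), (2, 2)})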
The next step, therefore, is to construct the σ-algebra A ⊗ B generated by the rectangles. (Refer to Lemma 2 in Section 8.1, page 384.) For any subset E of the Cartesian product X × Y, we define cross-sections

    E_x = {y ∈ Y : (x, y) ∈ E}
    E^y = {x ∈ X : (x, y) ∈ E}

Lemma 1. Let (X, A) and (Y, B) be two measurable spaces. If E ∈ A ⊗ B, then E_x ∈ B for all x ∈ X and E^y ∈ A for all y ∈ Y.

Proof. Define

    M = {E : E ⊂ X × Y and E^y ∈ A for all y ∈ Y}

We shall prove that M is a σ-algebra containing all rectangles. From this it will follow that M ⊃ A ⊗ B, since the latter is the smallest σ-algebra containing all rectangles. Then, if E ∈ A ⊗ B, we can conclude that E ∈ M and that E^y ∈ A for each y. Now consider any rectangle E = A × B. If y ∈ B, then E^y = A ∈ A. If y ∉ B, then E^y = ∅ ∈ A. Thus in all cases E^y ∈ A and E ∈ M. Next, let E be any member of M. The equation

(1)    [(X × Y) \ E]^y = X \ E^y

shows that (X × Y) \ E belongs to M. If Eᵢ ∈ M, then by the equation

(2)    [⋃_{i=1}^∞ Eᵢ]^y = ⋃_{i=1}^∞ Eᵢ^y

we see that ⋃ Eᵢ ∈ M. Thus M is a σ-algebra containing the rectangles, and the assertion about E^y is proved. The assertion about E_x is established in the same way.  •

An algebra of subsets of a set X is a collection C such that if A and B belong to C then X \ A and A ∪ B belong also to C.

Lemma 2. The collection of all unions of finite disjoint families of rectangles constructed from a pair of σ-algebras is an algebra.

Proof. Let C be the collection referred to, and let E and F be members of C. Then E and F have expressions E = ⋃_{i=1}^n (Aᵢ × Bᵢ) and F = ⋃_{j=1}^m (Cⱼ × Dⱼ), both being unions of disjoint families. Since

(3)    E ∩ F = ⋃_{i=1}^n ⋃_{j=1}^m [(Aᵢ ∩ Cⱼ) × (Bᵢ ∩ Dⱼ)]
we see that E ∩ F ∈ C, and that C is closed under the taking of intersections. From the equation

(4)    (X × Y) \ (A × B) = [(X \ A) × B] ∪ [X × (Y \ B)]

we get

    (X × Y) \ E = (X × Y) \ ⋃_{i=1}^n (Aᵢ × Bᵢ) = ⋂_{i=1}^n [(X × Y) \ (Aᵢ × Bᵢ)]
                = ⋂_{i=1}^n {[(X \ Aᵢ) × Bᵢ] ∪ [X × (Y \ Bᵢ)]}

This shows that the complement of E belongs to C, because C is closed under finite intersections. By the de Morgan identities, C is closed under unions.  •

Lemma 3. In any measure space (X, A, μ) the following are true for measurable sets Aᵢ:
(1) If A₁ ⊂ A₂ ⊂ ⋯, then μ(⋃_{i=1}^∞ Aᵢ) = limₙ μ(Aₙ)
(2) If A₁ ⊃ A₂ ⊃ ⋯ and μ(A₁) < ∞, then μ(⋂_{i=1}^∞ Aᵢ) = limₙ μ(Aₙ)

Proof. Assume the hypothesis in (1), and define Bₙ = Aₙ \ Aₙ₋₁ (with A₀ = ∅). The sequence {Bₙ} is disjoint, and consequently,

    μ(⋃_{i=1}^∞ Aᵢ) = μ(⋃_{i=1}^∞ Bᵢ) = Σ_{i=1}^∞ μ(Bᵢ) = lim_{n→∞} Σ_{i=1}^n μ(Bᵢ)
                    = lim_{n→∞} μ(⋃_{i=1}^n Bᵢ) = lim_{n→∞} μ(Aₙ)

To establish (2), assume its hypothesis. Then {A₁ \ Aₙ} is an increasing sequence, and by part (1) we have

    μ(A₁) − μ(⋂_{i=1}^∞ Aᵢ) = μ(A₁ \ ⋂_{i=1}^∞ Aᵢ) = μ(⋃_{i=1}^∞ (A₁ \ Aᵢ))
                            = lim_{n→∞} μ(A₁ \ Aₙ) = lim_{n→∞} (μ(A₁) − μ(Aₙ))
                            = μ(A₁) − lim_{n→∞} μ(Aₙ)  •
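The hypothesis μ(A₁) < ∞ in part (2) cannot be dropped: for the counting measure on the natural numbers and Aₙ = {n, n+1, …} one has ⋂ₙ Aₙ = ∅ although μ(Aₙ) = ∞ for every n. The following sketch (a finite illustration with counting measure; the sets are chosen arbitrarily) checks both parts of the lemma in a setting where every sequence stabilizes.

    # Finite illustration of Lemma 3 for counting measure (mu = cardinality).
    # For part (2) the hypothesis mu(A1) < ∞ is essential: with counting measure on
    # the natural numbers and A_n = {n, n+1, ...} the intersection is empty although
    # mu(A_n) = ∞ for every n.

    counting = len                                            # counting measure on finite sets

    A = [set(range(n + 1)) for n in range(10)]                # increasing: A_0 ⊂ A_1 ⊂ ... ⊂ A_9
    assert counting(set().union(*A)) == counting(A[-1])       # part (1): mu(∪A_n) = lim mu(A_n)

    B = [set(range(k, 10)) for k in range(10)]                # decreasing, and mu(B_0) = 10 < ∞
    assert counting(set.intersection(*B)) == counting(B[-1])  # part (2): mu(∩B_n) = lim mu(B_n)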

A monotone class of sets is a family M having these two properties:
(1) If Aᵢ ∈ M and A₁ ⊂ A₂ ⊂ ⋯ then ⋃_{i=1}^∞ Aᵢ ∈ M
(2) If Aᵢ ∈ M and A₁ ⊃ A₂ ⊃ ⋯ then ⋂_{i=1}^∞ Aᵢ ∈ M

If X is a set, then 2^X is a monotone class. Also, every σ-algebra of sets is a monotone class (easily verified). If A is any family of sets, then there exists a smallest monotone class containing A. This assertion depends on the easy fact that the intersection of a collection of monotone classes is also a monotone class.
Lemma 4. Let C be an algebra of sets, as defined above. Then the monotone class generated by C is identical with the σ-algebra generated by C.

Proof. Let M and S be respectively the monotone class and the σ-algebra generated by C. Since every σ-algebra is a monotone class, we have M ⊂ S. The rest of the proof is devoted to showing that M is a σ-algebra (so that S ⊂ M).
For any set F in the monotone class M we define

    K_F = {A : the sets A \ F, F \ A, and A ∪ F belong to M}

Assertion 1. K_F is a monotone class.
There are two properties to verify, one of which we leave to the reader. Suppose that Aᵢ ∈ K_F and A₁ ⊂ A₂ ⊂ ⋯ Let A = ⋃_{i=1}^∞ Aᵢ. Then Aᵢ \ F, F \ Aᵢ, and Aᵢ ∪ F all belong to M and form monotone sequences. Since M is a monotone class, we have

    A \ F = ⋃_{i=1}^∞ (Aᵢ \ F) ∈ M
    F \ A = ⋂_{i=1}^∞ (F \ Aᵢ) ∈ M
    A ∪ F = ⋃_{i=1}^∞ (Aᵢ ∪ F) ∈ M

These calculations establish that A ∈ K_F.
Assertion 2. If F ∈ C, then C ⊂ K_F.
To prove this let E be any element of C. Since C is an algebra, we have E \ F, F \ E, and E ∪ F all belonging to C and to M. By the definition of K_F, E ∈ K_F.
Assertion 3. If F ∈ C, then M ⊂ K_F.
To prove this, note that K_F is a monotone class containing C, by Assertions 1 and 2. Hence K_F ⊃ M, since M is the smallest monotone class containing C.
Assertion 4. If F ∈ C and E ∈ M, then E ∈ K_F.
This is simply another way of expressing Assertion 3.
Assertion 5. If F ∈ C and E ∈ M, then F ∈ K_E.
This is true because the statement E ∈ K_F is logically equivalent to F ∈ K_E.
Assertion 6. If E ∈ M, then C ⊂ K_E.
This is a restatement of Assertion 5.
Assertion 7. If E ∈ M, then M ⊂ K_E.
This follows from Assertions 6 and 1 because K_E is a monotone class containing C, while M is the smallest such monotone class.
Assertion 8. M is an algebra.
To prove this, let E and F be members of M. Then F ∈ K_E by Assertion 7. Hence E \ F, F \ E, and E ∪ F all belong to M.
Assertion 9. M is a σ-algebra.
To prove this, let Aᵢ ∈ M and define Bₙ = A₁ ∪ ⋯ ∪ Aₙ. By Assertion 8, M is an algebra. Hence Bₙ ∈ M and B₁ ⊂ B₂ ⊂ ⋯ Since M is a monotone class, ⋃_{n=1}^∞ Bₙ ∈ M. It follows that ⋃_{n=1}^∞ Aₙ ∈ M.  •
Theorem 1. First Fubini Theorem. If (X, A, μ) and (Y, B, ν) are σ-finite measure spaces, and if E ∈ A ⊗ B, then
(1) The function y ↦ μ(E^y) is measurable.
(2) The function x ↦ ν(E_x) is measurable.
(3) ∫_X ν(E_x) dμ(x) = ∫_Y μ(E^y) dν(y)

Proof. Let M be the family of all sets E in A ⊗ B for which the assertion in the theorem is true. Our task is to show that M = A ⊗ B.
We begin by showing that every measurable rectangle belongs to M. Let E = A × B, where A ∈ A and B ∈ B. Since E^y = A or E^y = ∅, depending on whether y ∈ B or y ∈ Y \ B, we have μ(E^y) = χ_B(y) μ(A). This is a measurable function of y. Furthermore,

    ∫_Y μ(E^y) dν(y) = μ(A) ν(B)

We can carry out the same argument for ν(E_x) to see that E ∈ M.
In the second part of the proof, let C denote the class of all sets in A ⊗ B that are unions of finite disjoint families of rectangles. By Lemma 2, C is an algebra. We shall prove that C ⊂ M. Let E ∈ C. Then E = ⋃_{i=1}^n Eᵢ, where E₁, …, Eₙ is a disjoint set of rectangles. Hence

    μ(E^y) = Σ_{i=1}^n μ(Eᵢ^y)

This shows that y ↦ μ(E^y) is a measurable function. By the symmetry in this situation, x ↦ ν(E_x) is measurable and ν(E_x) = Σ_{i=1}^n ν((Eᵢ)_x). Since Eᵢ ∈ M by the first part of our proof, we have

    ∫_X ν(E_x) dμ(x) = Σ_{i=1}^n ∫_X ν((Eᵢ)_x) dμ(x) = Σ_{i=1}^n ∫_Y μ(Eᵢ^y) dν(y) = ∫_Y μ(E^y) dν(y)

This establishes that E ∈ M and that C ⊂ M.


In the third segment of the proof, we show that M is closed under the taking of unions of increasing sequences of sets. Let Eᵢ ∈ M and E₁ ⊂ E₂ ⊂ ⋯ Define E = ⋃_{i=1}^∞ Eᵢ. Then by Lemmas 1 and 3, μ(E^y) = μ(⋃_{i=1}^∞ Eᵢ^y) = limₙ μ(Eₙ^y). Hence, y ↦ μ(E^y) is a measurable function. Also, since Eₙ ∈ M,

    ∫_Y μ(E^y) dν(y) = limₙ ∫_Y μ(Eₙ^y) dν(y) = limₙ ∫_X ν((Eₙ)_x) dμ(x) = ∫_X ν(E_x) dμ(x)

by the Monotone Convergence Theorem (Theorem 1 in Section 8.5, page 401). This shows that E ∈ M.
In the fourth part of the proof we establish that M is closed under taking intersections of decreasing sequences of sets. Since X and Y are σ-finite, there
exist Aₙ ∈ A and Bₙ ∈ B such that X = ⋃_{n=1}^∞ Aₙ, Y = ⋃_{n=1}^∞ Bₙ, μ(Aₙ) < ∞, and ν(Bₙ) < ∞. We may suppose further that A₁ ⊂ A₂ ⊂ ⋯ and that B₁ ⊂ B₂ ⊂ ⋯ Let {Eᵢ} be a decreasing sequence of sets in M, and set E = ⋂_{n=1}^∞ Eₙ. We want to prove that E ∈ M. Since E = ⋃_{n=1}^∞ [E ∩ (Aₙ × Bₙ)] and since M is closed under "increasing unions," it suffices to prove that E ∩ (Aₙ × Bₙ) ∈ M for each n. We therefore define

    ℱ = {F : F ∩ (Aₙ × Bₙ) ∈ M for n = 1, 2, …}

Now it is to be proved that E ∈ ℱ. Since E = ⋂_{i=1}^∞ Eᵢ, it will be sufficient to prove that ℱ is a monotone class and that Eᵢ ∈ ℱ for each i. Since Eᵢ ∈ M ⊂ A ⊗ B, we have only to prove that ℱ is a monotone class containing A ⊗ B. By Lemma 4, this will follow if we can show that ℱ is a monotone class containing C. That ℱ ⊃ C can be verified as follows. Since Aₙ × Bₙ ∈ C, and C is an algebra, we have the implications

    F ∈ C ⟹ F ∩ (Aₙ × Bₙ) ∈ C ⊂ M ⟹ F ∈ ℱ

To prove that ℱ is a monotone class, let {Fᵢ} be an increasing sequence in ℱ, and set F = ⋃_{i=1}^∞ Fᵢ. The equation

    F ∩ (Aₙ × Bₙ) = ⋃_{i=1}^∞ [Fᵢ ∩ (Aₙ × Bₙ)]

shows that F ∩ (Aₙ × Bₙ) ∈ M, since M is closed under "increasing unions." Hence F ∈ ℱ. Next, take a decreasing sequence {Fᵢ} in ℱ, and let F = ⋂_{i=1}^∞ Fᵢ. Let n be fixed. For each i, the set Gᵢ = Fᵢ ∩ (Aₙ × Bₙ) belongs to M. Let G = ⋂_{i=1}^∞ Gᵢ. For each y ∈ Y, Gᵢ^y ⊂ Aₙ, whence μ(Gᵢ^y) ≤ μ(Aₙ) < ∞. It follows from Lemma 3 that μ(G^y) = limᵢ μ(Gᵢ^y). This proves that μ(G^y) is a measurable function of y. Since

    μ(Gᵢ^y) ≤ μ(Aₙ) χ_{Bₙ}(y)   and   ∫_Y μ(Aₙ) χ_{Bₙ}(y) dν(y) = μ(Aₙ) ν(Bₙ) < ∞

the Dominated Convergence Theorem (Theorem 2 in Section 8.6, page 406) implies that ∫_Y μ(G^y) dν(y) = limᵢ ∫_Y μ(Gᵢ^y) dν(y). Similarly, ∫_X ν(G_x) dμ(x) = limᵢ ∫_X ν((Gᵢ)_x) dμ(x). But for each i, ∫_X ν((Gᵢ)_x) dμ(x) = ∫_Y μ(Gᵢ^y) dν(y) because Gᵢ ∈ M. Hence ∫_Y μ(G^y) dν(y) = ∫_X ν(G_x) dμ(x). Thus G ∈ M. Since G = F ∩ (Aₙ × Bₙ), F ∈ ℱ.
We are now at the point where M is a monotone class containing C. By Lemma 4, M contains the σ-algebra generated by C. Thus M ⊃ A ⊗ B.  •
The preceding theorem enables us to define a measure φ on A ⊗ B by the equation

    φ(E) = ∫_X ν(E_x) dμ(x) = ∫_Y μ(E^y) dν(y)       (E ∈ A ⊗ B)

This measure φ is called the product measure of μ and ν. It is often denoted by μ ⊗ ν.
Lemma 5. If (X, A, μ) and (Y, B, ν) are σ-finite measure spaces, then so is (X × Y, A ⊗ B, μ ⊗ ν).

Proof. It is clear that the set function φ has the property φ(∅) = 0 and the property φ(E) ≥ 0. If {Eᵢ} is a disjoint sequence of sets in A ⊗ B, then {Eᵢ^y} is a disjoint sequence in A. Hence, by the Dominated Convergence Theorem,

    φ(⋃_{i=1}^∞ Eᵢ) = ∫_Y μ((⋃_{i=1}^∞ Eᵢ)^y) dν(y) = ∫_Y μ(⋃_{i=1}^∞ Eᵢ^y) dν(y)
                    = ∫_Y Σ_{i=1}^∞ μ(Eᵢ^y) dν(y) = Σ_{i=1}^∞ ∫_Y μ(Eᵢ^y) dν(y) = Σ_{i=1}^∞ φ(Eᵢ)

Thus φ is a measure. For the σ-finiteness, observe that if X = ⋃_{n=1}^∞ Aₙ and Y = ⋃_{n=1}^∞ Bₙ, where A₁ ⊂ A₂ ⊂ ⋯ and B₁ ⊂ B₂ ⊂ ⋯, then X × Y = ⋃_{n=1}^∞ (Aₙ × Bₙ). If, further, μ(Aₙ) < ∞ and ν(Bₙ) < ∞ for all n, then we have φ(Aₙ × Bₙ) = μ(Aₙ) ν(Bₙ) < ∞.  •

Theorem 2. Second Fubini Theorem. Let (X, A, μ) and (Y, B, ν) be two σ-finite measure spaces. Let f be a nonnegative function on X × Y that is measurable with respect to (X × Y, A ⊗ B). Then
(1) For each x, y ↦ f(x, y) is measurable
(2) For each y, x ↦ f(x, y) is measurable
(3) y ↦ ∫_X f(x, y) dμ(x) is measurable
(4) x ↦ ∫_Y f(x, y) dν(y) is measurable
(5) ∫_{X×Y} f(x, y) dφ = ∫_X ∫_Y f(x, y) dν dμ = ∫_Y ∫_X f(x, y) dμ dν

Proof. If f is the characteristic function of a measurable set E, then (1) is true because f(x, y) = χ_{E_x}(y). Part (2) is true by the symmetry in the situation. Since

    ∫_X f(x, y) dμ(x) = μ(E^y)

the preceding lemma asserts that (3) is also true in this case. Part (4) is true by symmetry. For part (5), write

    ∫_{X×Y} f(x, y) dφ = φ(E) = ∫_Y μ(E^y) dν(y) = ∫_Y ∫_X f(x, y) dμ(x) dν(y)

The other equality is similar. Thus, Theorem 2 is true when f is the characteristic function of a measurable set.
If f is a simple function, then f has properties (1) to (5) by the linearity of the integrals.
If f is an arbitrary nonnegative measurable function, then there exist simple functions fₙ such that fₙ ↑ f. Since the limit of a sequence of measurable functions is measurable, f has properties (1) to (4). By the Monotone Convergence Theorem, property (5) follows for f.  •
An extension of the Fubini Theorem exists for a more general class of measures, namely signed measures and even complex-valued measures. Since the complex-valued measures include the real-valued measures, we describe them first. Let (X, A) be a measurable space. A complex measure "on X" is a function μ : A → ℂ such that:
(I) sup_{A∈A} |μ(A)| < ∞
(II) μ(⋃ Aᵢ) = Σ_{i=1}^∞ μ(Aᵢ) for any disjoint sequence of measurable sets Aᵢ.

In an abuse of notation, we define |μ| by the equation

    |μ|(A) = sup Σ_{i=1}^∞ |μ(Aᵢ)|

where the supremum is over all partitions of A into a disjoint sequence of measurable sets. It is clear that |μ(A)| ≤ |μ|(A), because {A} is a competing partition of A. The theory goes on to establish that |μ| is an ordinary (i.e., nonnegative) measure and |μ|(X) < ∞. This feature distinguishes the theory of complex or signed measures from the traditional nonnegative measures. References: [DS], [Roy], [Ru3], [HS], [Berb3], [Berb4].
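For a purely atomic complex measure the supremum in the definition of |μ| is attained by partitioning A into its atoms, so |μ|(A) is simply the sum of the moduli of the point masses. The sketch below (an illustration of the definition in the finite case; the data are invented) computes |μ|(A) this way and checks the inequality |μ(A)| ≤ |μ|(A).

    # Discrete illustration of the total variation: for a complex measure given by point
    # masses, the finest partition (into atoms) attains the supremum, so
    # |mu|(A) = Σ_{x in A} |mu({x})|, and |mu(A)| <= |mu|(A) for every measurable A.

    mass = {'a': 1 + 2j, 'b': -3.0, 'c': 0.5j}

    def mu(A):        return sum(mass[x] for x in A)
    def total_var(A): return sum(abs(mass[x]) for x in A)

    A = {'a', 'b', 'c'}
    assert abs(mu(A)) <= total_var(A) + 1e-12
    # Coarser partitions give smaller sums, e.g. the partition {{'a','b'}, {'c'}}:
    assert abs(mu({'a', 'b'})) + abs(mu({'c'})) <= total_var(A) + 1e-12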
The Fubini Theorem in this new setting is as follows:

Theorem 3. Fubini's Theorem for Complex Measures. Let (X, A) and (Y, B) be two measurable spaces, and let μ and ν be complex measures on X and Y, respectively. Let f be a complex-valued measurable function on X × Y. If ∫_Y ∫_X |f(x, y)| d|μ| d|ν| < ∞, then

    ∫_Y ∫_X f(x, y) dμ(x) dν(y) = ∫_X ∫_Y f(x, y) dν(y) dμ(x)

This theorem is to be found in [DS].

Problems 8.10

1. Verify Equations (1) and (2).


2. Verify Equations (3) and (4). Show that the two sets on the right side of Equation (4)
are mutually disjoint.
3. Prove that every σ-algebra is a monotone class. Prove that the intersection of a family of monotone classes is also a monotone class.
4. Prove that if a monotone class is an algebra, then it is a σ-algebra. Can you get this conclusion from weaker hypotheses?
References

[AS] Abramowitz, M. and I. Stegun, Handbook of Mathematical Functions with Formulas,


Graphs, and Mathematical Tables, U.S. Department of Commerce, National Bureau of Stan-
dards, 1964. Reprint, Dover Publications, New York.
[Ad] Adams, R. A., Sobolev Spaces, Academic Press, New York, 1975. (Vol. 65 in the series
Pure and Applied Mathematics.)
[Agm] Agmon, S., Lectures on Elliptic Boundary Value Problems, Van Nostrand, New York,
1965.
[AG] Akhiezer, N. l. and l. M. Glazman, Theory of Linear Operators in Hilbert Space, Ungar,
New York, 1963.
[ATS] Alekseev, V. M., V. M. Tikhomirov, and S. V. Fomin, Optimal Control, Consultants
Bureau, New York, c1987.
[AY] Alexander, J. C. and J. A. Yorke, "The homotopy continuation method: Numerically
implemented topological procedures," Trans. Amer. Math. Soc. 242 (1978), 271-284.
[AlG] Allgower, E. and K. Georg, "Simplicial and continuation methods for approximating
fixed points and solutions to systems of equations," SIAM Review 22 (1980),28-85.
[AGP] Allgower, E. L., K. Glasshoff, and H.-O. Peitgen, eds., Numerical Solution of Nonlinear
Equations, Lecture Notes in Math., vol. 878, Springer-Verlag, New York, 1981.
[Ar] Aronszajn, N., Introduction to the Theory of Hilbert Spaces, Research Foundation, Okla-
homa State University, Stillwater, Oklahoma, 1950.
[At] Atkinson, K. E., A Survey of Numerical Methods for the Solution of Fredholm Integral
Equations of the Second Kind, SIAM Publications, Philadelphia, 1976.
[Au] Aubin, J. P., Applied Functional Analysis, 2nd ed., Wiley, New York, 1999.
[Av1] Avez, A., Introduction to Functional Analysis, Banach Spaces, and Differential Calculus,
Wiley, New York, 1986.
[Av2] Avez, A., Differential Calculus, Wiley, New York, 1986.
[Ax] Axelsson, 0., "On Newton Type Continuation Methods," Comm. Applied Analysis 4
(2000), 575-595.
[BN] Bachman, G. and L. Narici, Functional Analysis, Academic Press, New York, 1966.
[Bac] Bachman, G., Elements of Abstract Harmonic Analysis, Academic Press, New York,
1964.
[Bak] Baker, C. T. H., The Numerical Treatment of Integral Equations, Oxford University
Press, 1977.
[Ban] Banach, S., Theorie des Operations Lineaires, Hafner, New York, 1932.
[Barb] Barbeau, E. J., Mathematical Fallacies, Flaws, and Flimflam, Mathematical Associa-
tion of America, Washington, 2000.
[Bar] Barnes, E. R., "A variation on Karmarkar's algorithm for solving linear programming
problems," Mathematical Programming 36 (1986), 174-182.
[Bart] Bartle, R. G., The Elements of Real Analysis, 2nd ed., Wiley, New York, 1976.
[Bart2] Bartle, R. G., Elements of Integration Theory, Wiley, New York, 1966. Revised edition,
1995.
[Bea] Beauzamy, B., Introduction to Banach Spaces and their Geometry, North Holland, Am-
sterdam, 1985.
[Bee] Beckner, W., "Inequalities in Fourier Analysis," Ann. Math. 102 (1975),159-182.
[Berb] Berberian, S. K., Introduction to Hilbert Space, Chelsea Publishing Co., New York,
1976. Reprint by American Mathematical Society, Providence, Rl.
[Berb2] Berberian, S. K., Notes on Spectral Theory, Van Nostrand, New York, 1966.
[Berb3] Berberian, S. K., Fundamentals of Real Analysis, Springer-Verlag, New York, 1999.
[Berb4] Berberian, S. K., Measure and integration. Macmillan, New York, 1965.


[Berez] Berezanski, Ju. M., Expansions in Eigenfunctions of Self-Adjoint Operators, Amer. Math. Soc., Providence, RI, 1968.
[Bes] Besicovitch, A. S., Almost Periodic Functions, Dover, New York, 1954.
[Bl] Bliss, G. A., Calculus of Variations, Mathematical Association of America, 1925.
[Bo] Bolza, O., Lectures on the Calculus of Variations, 1904. Chelsea Publishing Co., Reprint, 1973.
[Br] Bracewell, R. N., The Fourier Transform and Its Applications, McGraw-Hill, New York, 1986.
[Brae] Braess, D., Finite Elements, Cambridge University Press, Cambridge, 1997.
[Brez] Brezinski, C., Biorthogonality and its Application to Numerical Analysis, Dekker, Basel, 1991.
[BrS] Brophy, J. F. and P. W. Smith, "Prototyping Karmarkar's algorithm using MATH-
PROTAN," Directions 5 (1988), 2-3. IMSL Corp. Houston.
[Cann1] Cannell, D. M., "George Green: An enigmatic mathematician," Amer. Math. Monthly
106 (1999), 137-151.
[Cann2] Cannell, D. M., George Green, Mathematician and Physicist, 1793-1841, SIAM, Philadelphia, 2000.
[Car] Caratheodory, C., Calculus of Variations and Partial Differential Equations of the First Order, Holden-Day Publishers, 1965. Second edition, Chelsea, New York, 1982.
[Cart] Cartan, H., Cours de Calcul Différentiel, Hermann, Paris, 1977.
[CCC] Chen, M. J., Z. Y. Chen, and G. R. Chen, Approximate Solutions of Operator Equations, World Scientific, Singapore, 1997.
[Ch] Cheney, E. W., Introduction to Approximation Theory, McGraw-Hill, New York, 1966. 2nd ed., Chelsea Publ. Co., New York, 1985. American Mathematical Society, 1998.
[CL] Cheney, E. W. and W. A. Light, A Course in Approximation Theory, Brooks/Cole,
Pacific Grove, CA, 1999.
[CMY] Chow, S. N., J. Mallet-Paret, J. A. Yorke, "Finding zeroes of maps: Homotopy methods
that are constructive with probability one," Math. Comp. 32 (1978), 887-899.
[Cia] Ciarlet, P., The Finite Element Method for Elliptic Problems, North Holland, New York,
1978.
[Coc] Cochran, J. E., Applied Mathematics, Wadsworth, Belmont, CA, 1982.
[Coh] Cohen, P. J., "The independence of the continuum hypothesis," Proc. Nat. Acad. Sci.,
U.S.A. 50 (1963), 1143-1148 and 51 (1964), 105-110.
[Col] Collatz, L., Functional Analysis and Numerical Mathematics, Springer-Verlag, New York,
1966.
[Con] Constantinescu, F., Distributions and Their Applications in Physics, Pergamon Press,
Oxford, Eng., 1980.
[Cor] Corduneanu, C., Integral Equations and Applications, Cambridge University Press, 1991.
[CS] Corwin, L. and R. Szczarba, Calculus in Banach Spaces, Dekker, New York, 1979.
[Cou] Courant, R., Calculus of Variations, Lecture Notes from the Courant Institute, 1946.
[CH] Courant, R. and D. Hilbert, Methods of Mathematical Physics, Vol. I, II, Interscience,
New York, 1953, 1962.
[CP] Curtain, R.F. and A.J. Pritchard, FUnctional Analysis in Modern Applied Mathematics,
Academic Press, New York, 1977.
[Dav] Davis, H. T., Introduction to Nonlinear Differential and Integral Equations, Dover Pub-
lications, New York, 1962.
[Davp] Davis, P. J., Interpolation and Approximation, Blaisdell, New York, 1963. Reprint,
Dover Publications, New York.
[Day] Day, M. M., Normed Linear Spaces, Academic Press, New York, 1962. Reprint, Springer-
Verlag, Berlin.
[DM] Debnath, L. and P. Mikusinski, Introduction to Hilbert Spaces with Applications, Aca-
demic Press, New York, 1990.
[Deb] Debnath, L., Integral Transforms and Their Applications, CRC Press, Boca Raton, FL.,
1995.

[DMo] Delves, L. M. and J. L. Mohamed, Computational Methods for Integral Equations,


Cambridge University Press, 1985.
[Det] Dettman, J. W., Mathematical Methods in Physics and Engineering, McGraw-Hill, New
York, 1962. Reprint, Dover Publications, New York, 1988.
[DiS] Diaconis, P. and M. Shahshahani, "On nonlinear functions of linear combinations," SIAM
J. Sci. Statis. Comput. 5 (1984), 175-19l.
[Dies] Diestel, J., Sequences and Series in Banach Spaces, Springer-Verlag, New York, 1984.
[Dieu] Dieudonné, J., Foundations of Modern Analysis, Academic Press, New York, 1960.
[Dono] Donoghue, W. F., Distributions and Fourier Transforms, Academic Press, New York,
1969. Vol. 32 in the series Pure and Applied Mathematics.
[Dug] Dugundji, J., Topology, Allyn and Bacon, Boston, 1965.
[DS] Dunford, N. and J. T. Schwartz, Linear Operators, Part I, General Theory, Interscience,
New York, 1958.
[DM] Dym, H. and H. P. McKean, Fourier Series and Integrals, Academic Press, New York,
1972.
[Dzy] Dzyadyk, V. K., Approximation Methods for Solutions of Differential and Integral Equa-
tions, VSP Publishers, Zeist, The Netherlands, 1995.
[Eav] Eaves, B. C., "A short course in solving equations with PL homotopies," SIAM-AMS
Proceedings 9 (1976), 73-144.
[Edw] Edwards, R. E., Functional Analysis, Holt Rinehart and Winston, New York, 1965.
[Egg] Eggleston, H. G., Convexity, Cambridge University Press, 1958.
[Els] Elsgolc, L. E., Calculus of Variations, Pergamon, London, 1961.
[Ev] Evans, L. C., Partial Differential Equations, Amer. Math. Soc., Providence, RI, 1998.
[Ewi] Ewing, G. W., Calculus of Variations with Applications, W.W. Norton, 1969. Reprint,
Dover Publications, 1985.
[Fef] Feferman, S., "Does mathematics need new axioms?," Amer. Math. Monthly 106 (1999),
99-111.
[Fern] Fernandez, L. A., "On the limits of the Lagrange multiplier rule," SIAM Rev. 39 (1997),
292-297.
[Feyn] Feynman, R. P., Surely You're Joking, Mr. Feynman, W. W. Norton, New York, 1985.
[Fic] Ficken, F. A., "The continuation method for functional equations," Comm. Pure Appl.
Math. 4 (1951), 435-456.
[Fla] Flanders, H., Differential Forms, Academic Press, New York, 1963.
[Fol] Folland, G. B., Fourier Analysis and Its Applications, Wadsworth-Brooks-Cole, Pacific
Grove, Calif., 1992.
[Fox] Fox, C., An Introduction to the Calculus of Variations, Oxford University Press, 1963.
Reprint, Dover Publications, 1987.
[Fri] Friedlander, F. G., Introduction to the Theory of Distributions, Cambridge University
Press, 1982.
[Friel] Friedman, A., Generalized Functions and Partial Differential Equations, Prentice-Hall,
Englewood Cliffs, N.J., 1963.
[Frie2] Friedman, A., Foundations of Modern Analysis, Holt Rinehart and Winston, New York, 1970. Reprint, Dover Publications, New York, 1982.
[Fried] Friedman, B., Principles and Techniques of Applied Mathematics, Wiley, New York,
1956. Reprint, Dover Publications.
[FM] Furi, M. and M. Martelli, "On the mean-value theorem, inequality, and inclusion," Amer.
Math. Monthly 98 (1991), 840-846.
[Gar] Garabedian, P. R., Partial Differential Equations, Wiley, New York, 1964. Reprint,
Chelsea Publications, New York.
[GG] Garcia, C. B. and F. J. Gould, "Relations between several path-following algorithms and
local and global Newton methods," SIAM Rev. 22 (1980), 263-274.
[GZ1] Garcia, C. B. and W. I. Zangwill, "An approach to homotopy and degree theory," Math. Oper. Res. 4 (1979), 390-405.

[GZ2] Garcia, C. B. and W. I. Zangwill, "Finding all solutions to polynomial systems and
other systems of equations," Math. Programming 16 (1979), 159-176.
[GZ3] Garcia, C. B. and W. I. Zangwill, Pathways to Solutions, Fixed Points and Equilibria,
Prentice Hall, Englewood Cliffs, N.J., 1981.
[GF] Gelfand, I. M. and S. V. Fomin, Calculus of Variations, Prentice-Hall, Englewood Cliffs,
N.J., 1963.
[GV] Gelfand, I. M. and N. Ya. Vilenkin, Generalized Functions, 4 volumes, Academic Press,
1964. (Vol. 1 is by Gelfand and G.E. Shilov.)
[Go] Gödel, K., The Consistency of the Axiom of Choice and of the Generalized Continuum Hypothesis with the Axioms of Set Theory, Princeton University Press, 1940.
[GP] Goffman, C. and G. Pedrick, First Course in Functional Analysis, Chelsea Publishing
Co., New York. Reprint, American Mathematical Society.
[Gol] Goldberg, R. R., Fourier Transforms, Cambridge University Press, 1970.
[Gold] Goldstein, A. A., Constructive Real Analysis, Harper and Row, New York, 1967.
[Gr] Graves, L. M., The Theory of Functions of Real Variables, McGraw-Hill, New York, 1946.
[Green] Green, G., Mathematical Papers of George Green, edited by N.M. Ferrers, Amer.
Math. Soc., Providence, RI, 1970.
[Gre] Greenberg, M. D., Foundations of Applied Mathematics, Prentice Hall, Englewood Cliffs,
NJ, 1978.
[Gri] Griffel, D. H., Applied FUnctional Analysis, John Wiley, New York, 1981.
[Gro] Groetsch, C. W., Elements of Applicable Functional Analysis, Marcel Dekker, New York,
1980.
[Hal1] Halmos, P. R., "What does the spectral theorem say?," Amer. Math. Monthly 70 (1963), 241-247.
[Hal2] Halmos, P. R., A Hilbert Space Problem Book, Van Nostrand, Princeton, 1967.
[Hal3] Halmos, P. R., Introduction to Hilbert Space, Chelsea Publishing Co., New York, 1951.
[Hal4] Halmos, P. R., Measure Theory, Van Nostrand, New York, 1950. Reprint, Springer-Verlag, New York.
[Hel] Helson, H., Harmonic Analysis, Addison-Wesley, London, 1983.
[Hen] Henrici, P. Discrete Variable Methods in Ordinary Differential Equations, Wiley, New
York, 1962,
[Hesl] Hestenes, M. R., Calculus of Variations and Optimal Control Theory, Wiley, New York,
1965.
[Hes2] Hestenes, M. R., "Elements of the Calculus of Variations" pp. 59-91 in Modem Math-
ematics for the Engineer, E. F. Beckenback, ed., McGraw-Hill, New York, 1956.
[HS] Hewitt, E. and K. Stromberg, Real and Abstract Analysis, Springer-Verlag, New York,
1965.
[HP] Hille, E. and R. S. Phillips, FUnctional Analysis and Semigroups, Amer. Math. Soc.,
Providence, RI 1957.
[HS] Hirsch, M. W. and S. Smale, "On algorithms for solving f(x) = 0," Comm. Pure Appl.
Math. 32 (1979), 281-312.
[Hoi] Holmes, R. B., Geometric Functional Analysis and its Applications, Springer-Verlag,
New York, 1975.
[Ho] Hörmander, L., The Analysis of Linear Partial Differential Operators I, Springer-Verlag,
Berlin, 1983.
[Horv] Horvath, J., Topological Vector Spaces and Distributions, Addison-Wesley, London,
1966.
[Hu] Huet, D., Distributions and Sobolev Spaces, Lecture Note #6, Department of Mathemat-
ics, University of Maryland, 1970.
[Hur] Hurley, J. F., Multivariate Calculus, Saunders, Philadelphia, 1981.
[In] Ince, E. L., Ordinary Differential Equations, Longmans Green, London, 1926. Reprint,
Dover Publications, New York, 1948.
[IK] Isaacson, E. and H. B. Keller, Analysis of Numerical Methods, Wiley, New York, 1966.
[Jal] James, R., "Weak compactness and reflexivity," Israel J. Math. 2 (1964), 101-119.

[Ja2] James, R., "A non-reflexive Banach space isometric with its second conjugate space,"
Proc. Nat. Acad. Sci. U.S.A. 37 (1951),174-177.
[Jam] Jameson, G. J. 0., Topology and Normed Spaces, Chapman and Hall, London, 1974.
[JKP] Jaworowski, J., W. A. Kirk, and S. Park, Antipodal Points and Fixed Points, Lecture
Note Series, Number 28, Seoul National University, Seoul 1995.
[Jon] Jones, D. S., The Theory of Generalised Functions, McGraw-Hill, 1966. 2nd. Edition,
Cambridge University Press, 1982.
[Jo] Jones, F., Lebesgue Integration on Euclidean Space, Jones and Bartlet, Boston, 1993.
[JLJ] Jost, J., and X. Li-Jost, Calculus of Variations, Cambridge University Press, 1999.
[KA] Kantorovich, L. V. and G. P. Akilov, FUnctional Analysis in Normed Spaces, Pergamon
Press, London, 1964.
[KK] Kantorovich, L. V. and V. I. Krylov, Approximate Methods of Higher Mathematics, Inter-
science, New York, 1964.
[Kar] Karmarkar, N., "A new polynomial-time algorithm for linear programming," Combina-
torica 4 (1984), 373-395.
[Kat] Katznelson, Y., An Introduction to Harmonic Analysis, Wiley, New York, 1968. Reprint,
Dover Publications, New York.
[Kee] Keener, J. P., Principles of Applied Mathematics, Addison-Wesley, New York, 1988.
[Kel] Kelley, J. L., General Topology, D. Van Nostrand, New York, 1955. Reprint, Springer-
Verlag, New York.
[KN] Kelley, J. L., I. Namioka, et al., Linear Topological Spaces, D. Van Nostrand, New York,
1963.
[KS] Kelley, J. L. and T. P. Srinivasen, Measure and Integral, Springer-Verlag, New York,
1988.
[Kello] Kellogg, O. D., Foundations of Potential Theory, Dover, New York.
[Ken] Keener, J. P., Principles of Applied Mathematics, Perseus Books Group, Boulder, CO,
1999.
[KC] Kincaid, D. and Cheney, W., Numerical Analysis, 3nd ed., Brooks/Cole, Pacific Grove,
CA., 2001.
[KF] Kolmogorov, A. N. and S. V. Fomin, Introductory Real Analysis, Dover Publications,
New York, 1975.
[Ko] Korner, T. W., Fourier Analysis, Cambridge University Press, 1988.
[Kras] Krasnoselski, M.A., Topological Methods in the Theory of Nonlinear Integral Equations,
Pergamon, New York, 1964.
[Kr] Kress, R. Linear Integral Equations, Springer-Verlag, Berlin, 1989. 2nd edition, 1999.
[Kre] Kreyszig, E., Introductory Functional Analysis with Applications, Wiley, New York, 1978.
[KRN] Kuratowski, K. and C. Ryll-Nardzewski, "A general theorem on selectors", Bull. Acad.
Polonaise Sciences, Serie des Sciences Math. Astr. Phys. 13 (1965), 397-403.
[Lane] Lanczos, C., Applied Mathematics, Dover Publications, New York, 1988.
[Lanl] Lang, S., Analysis II, Addison-Wesley, London, 1969.
[Lan2] Lang, S., Introduction to Differentiable Manifolds, Interscience, New York, 1962.
[Las] Lass, H., Vector and Tensor Analysis, McGraw Hill, New York, 1950.
[Lax] Lax, P. D., "Change of variables in multiple integrals," Amer. Math. Monthly 106
(1999),497-501.
[LSU] Lebedev, N. N., I. P. Skalskaya, and Y. S. Uflyand, Worked Problems in Applied Math-
ematics, Reprint, Dover Publications, New York, 1979.
[Leis] Leis, R., Initial Boundary Value Problems in Mathematical Physics, Wiley, New York,
1986.
[Li] Li, T. Y. "Solving polynomial systems," Math. Intelligencer 9 (1987), 33-39.
[LL] Lieb, E. H. and M. Loss, Analysis, Amer. Math. Soc., Providence, 1997.
[LT] Lindenstrauss, J. and L. Tzafriri, Classical Banach Spaces I, Springer-Verlag, Berlin.
[LM] Lions, J.L. and E. Magenes, Nonhomogeneous Boundary Value Problems and Applica-
tions, Springer-Verlag, New York, 1972.

[Lo] Logan, J. D., Applied Mathematics: A Contemporary Approach, Wiley, New York, 1987.
[Lov] Lovett, W. V., Linear Integral Equations, McGraw-Hill, New York, 1924. Reprint, Dover
Publications, New York, 1950.
[Loo] Loomis, L. H., An Introduction to Abstract Harmonic Analysis, Van Nostrand, New
York, 1953.
[Lue1] Luenberger, D. G., Introduction to Linear and Nonlinear Programming, Addison-Wesley, London, 1965.
[Lue2] Luenberger, D. G., Optimization by Vector Space Methods, Wiley, New York, 1969.
[MT] Marsden, J. E. and A. J. Tromba, Vector Calculus (2nd ed.), W. H. Freeman, San Francisco, 1981.
[Mar] Martin, J. B., Plasticity: Fundamentals and General Results, MIT Press, Cambridge,
MA,1975.
[Mas] Mason, J., Methods of Functional Analysis for Applications in Solid Mechanics, Elsevier,
Amsterdam, 1985.
[Maz] Mazja, V. G., Sobolev Spaces, Springer-Verlag, Berlin, 1985.
[McK] McKinsey, J. C. C., Introduction to the Theory of Games, McGraw-Hill, New York,
1952.
[Mey] Meyer, G. H., "On solving nonlinear equations with a one-parameter operator embed-
ding," SIAM J. Numer. Analysis 5 (1968), 739-752.
[Michl] Michael, E., "Continuous Selections," Ann. Math. 63 (1956), 361-382.
[Mich2] Michael, E., "Selected Selection Theorems," Amer. Math. Monthly 63 (1956), 233-
238.
[Mil] Milne, W. E., Numerical Solution of Differential Equations, Dover, New York.
[Moo] Moore, R. E., Computational FUnctional Analysis, Wiley, New York, 1985.
[Morl] Morgan, A. "A homotopy for solving polynomial systems," Applied Math. and Compo
18 (1986), 87-92.
[Mor2] Morgan, A. Solving Polynomial Systems Using Continuation for Engineering and Sci-
entific Problems, Prentice Hall, Englewood Cliffs, N.J., 1987.
[Morr] Morris, P., Introduction to Game Theory, Springer-Verlag, New York, 1994.
[NaSn] Naylor, A.W. and G.R. Snell, Linear Operator Theory in Engineering and Science,
Springer-Verlag, New York, 1982.
[NazI] Nazareth, J. L., "Homotopy techniques in linear programming," Algorithmica 1 (1986),
529-535.
[Naz2] Nazareth, J. L., "The implementation of linear programming algorithms based on ho-
motopies," Algorithmica 15 (1996),332-350.
[Nel] Nelson, E., Topics in Dynamics, Vol 1: Flows, Princeton University Press (1969).
[NSS] Nickerson, H. K., D. C. Spencer, and N. E. Steenrod, Advanced Calculus, van Nostrand,
New York, 1959.
[OD] Oden, J. T. and L. F. Demkowicz, Applied Functional Analysis, CRC Press, New York,
1996.
[OdR] Oden, J. T. and J. N. Reddy, An Introduction to the Mathematical Theory of Finite
Elements, Wiley, New York, 1976.
[01] Olver, F. W. J., Asymptotics and Special Functions, Academic Press, New York, 1974.
[OR] Ortega, J. M. and W. C. Rheinboldt, Iterative Solution of Nonlinear Equations in Several
Variables, Academic Press, New York, 1970.
[Par] Park, Sehie, "Eighty years of the Brouwer fixed point theorem," in Antipodal Points and
Fixed Points by J. Jaworowski, W. A. Kirk, and S. Park. Lecture Notes Series, No. 28, Seoul
National University, 1995. 55-97.
[Part] Parthasarathy, T., Selection Theorems and Their Applications, Lecture Notes in Math.
No. 263, Springer-Verlag, New York, 1972.
[Ped] Pedersen, M., FUnctional Analysis in Applied Mathematics and Engineering, CRC, Boca
Raton, FL, 1999.
[Pet] Petrovskii, I. G., Lectures on the Theory of Integral Equations, Graylock Press, Rochester,
NY, 1957.

[PM] Polyanin, A. and A. V. Manzhirov, Handbook of Integral Equations, CRC Press, Boca
Raton, FL, 1998.
[PBGM] Pontryagin, L. S., V. G. Boltyanskii, R. V. Gamkrelidze, E. F. Mishchenko, The
Mathematical Theory of Optimal Processes, Interscience Pub!., New York, 1962.
[Pry] Pryce, J. D., Numerical Solution of Sturm-Liouville Problems, Oxford University Press,
1993.
[Red] Reddy, J.N., Applied FUnctional Analysis and Variational Methods in Engineering,
McGraw-Hill, New York, 1986.
[RS] Reed, M. and B. Simon, Methods of Modern Mathematical Physics, Vo!' I, Academic
Press, New York, 1980.
[Rh1] Rheinboldt, W. C., Numerical Analysis of Parameterized Nonlinear Equations, Wiley,
New York, 1986.
[Rh2] Rheinboldt, W. C., "Solution fields of nonlinear equations and continuation methods,"
SIAM J. Numer. Analysis 17 (1980), 221-237.
[Ri] Richtmyer, R. D., Principles of Advanced Mathematical Physics, 2 volumes, Springer-
Verlag, New York, 1978.
[RN] Riesz, F. and B. Sz.-Nagy, FUnctional Analysis, Frederick Ungar, 1955. Reprint, Dover
Publications, New York, 1991.
[Rie] Riesz, T., Perturbation Theory for Linear Operators, Springer-Verlag, New York, 1966.
[Ro] Roach, G. F., Green's FUnctions, 2nd ed., Cambridge University Press, 1982.
[Ros] Rosenbloom, P. C., "The method of steepest descent" in Numerical Analysis, J. H.
Curtiss, ed., Symposia in Applied Math., vol. VI, 1956, 127-176.
[Roy] Royden, H. L., Real Analysis, Macmillan, New York, 1968.
[Rub] Rubin, H. and J. E. Rubin, Equivalents of the Axiom of Choice, North Holland Publ. Co., Amsterdam, 1985.
[Ru1] Rudin, W., Functional Analysis, McGraw-Hill, New York, 1973.
[Ru2] Rudin, W., Fourier Analysis on Groups, Interscience, New York, 1963.
[Ru3] Rudin, W., Real and Complex Analysis, 2nd ed., McGraw-Hill, New York, 1974.
[Sa] Saaty, T. L., Modern Nonlinear Equations, McGraw-Hill, New York, 1967. Reprint, Dover
Publications, New York, 1981.
[Sag] Sagan, H., Introduction to the Calculus of Variations, McGraw-Hill Book Co., 1969.
Reprint, Dover Publications, 1992.
[San] Sansone, G. Orthogonal Functions, Interscience, New York, 1959.
[Schu] Schur, I., "Über lineare Transformationen in der Theorie der unendlichen Reihen," J. Reine Angew. Math. 151 (1920), 79-111.
[SchJ] Schwartz, J. T., Non-Linear Functional Analysis, Gordon and Breach, New York, 1969.
[Schl] Schwartz, L., Mathematics for the Physical Sciences, Addison-Wesley, London, 1966.
[Schl2] Schwartz, L., Théorie des Distributions, I, II, Hermann et Cie, Paris, 1951.
[Sem] Semadeni, Z., Schauder Bases in Banach Spaces of Continuous Functions, Lecture Notes in Mathematics, vol. 918, Springer-Verlag, New York, 1982.
[Sho] Showalter, R. E., Hilbert Space Methods for Partial Differential Equations, Pitman, London, 1977. (Available on-line from http://ejde.math.swt.edu/mono-toc.html.)
[Sim] Simmons, G. F., Introduction to Topology and Modern Analysis, McGraw-Hill, 1963.
[Sing] Singer, I., Bases in Banach Spaces (2 volumes), Springer-Verlag, Berlin, 1970, 1981.
[Sm] Smale, S., "Algorithms for solving equations," Proceedings of the International Congress
of Mathematicians, 1986.
[Sma] Smart, D. R., Fixed Point Theorems, Cambridge University Press, 1974.
[Smi] Smith, K. T., Primer of Modern Analysis, Bogden and Quigley, Belmont, CA, 1971.
Springer-Verlag, Berlin, 1983.
[So] Sobolev, S. L., Applications of Functional Analysis in Mathematical Physics, Amer. Math.
Soc. Translations Series, 1963.
[Sta] Stakgold, I., Green's FUnctions and Boundary Value Problems, Wiley, New York, 1979.

[SW] Stein, E. M. and G. Weiss, Introduction to Fourier Analysis on Euclidean Spaces, Prince-
ton University Press, 1971.
[StWi] Stoer, J. and C. Witzgall, Convexity and Optimization in Finite Dimensions, Springer-
Verlag, New York, 1970.
[Str1] Strang, G., Linear Algebra and Its Applications, 3rd ed., Harcourt Brace Jovanovich,
San Diego, 1988.
[Str2] Strang, G., Introduction to Applied Mathematics, Wellesley-Cambridge, Wellesley, MA,
1986.
[Sz] Szego, G., Orthogonal Polynomials, American Mathematical Society Colloquium Publica-
tions, vo!' 23, 1959.
[Tay1] Taylor, A. E., Advanced Calculus, Ginn, New York, 1955.
[Tay2] Taylor, A. E., Introduction to Functional Analysis, Wiley, New York, 1958. Reprint,
Dover Publications.
[Tay3] Taylor, A. E. General Theory of Functions and Integration, Blaisdell, New York, 1965.
Reprint, Dover Publications, New York.
[Ti1] Titchmarsh, E. C., Introduction to the Theory of Fourier Integrals, Oxford University Press, 1937. Reprinted by Chelsea Publ. Co., New York, 1986.
[Ti2] Titchmarsh, E. C., The Theory of Functions, Oxford University Press, 1939.
[Tod] Todd, M. J., "An introduction to piecewise linear homotopy algorithms for solving
systems of equations" in Topics in Numerical Analysis, P. R. Turner, ed., Lecture Notes in
Mathematics, vo!' 965, Springer-Verlag, New York, 1982, 147-202.
[Tri] Tricomi, F. G., Integral Equations, Interscience, New York, 1957. Reprint, Dover Publi-
cations, New York, 1985.
[Vic] Vick, J. W., Homology Theory, Academic Press, New York, 1973.
[Wac] Wacker, H. G., ed., Continuation Methods, Academic Press, New York, 1978.
[Wag] Wagner, D. H., "Survey of measurable selection theorems: an update," in Measure
Theory Oberwolfach 1979, D. Kolzow, ed., Lecture Notes in Mathematics, vo!' 794, Springer-
Verlag, Berlin, 1980,
[Wal] Walter, G. G., Wavelets and Other Orthogonal Systems with Applications, CRC Press,
Boca Raton, FL, 1994.
[Was] Wasserstrom, E., "Numerical solutions by the continuation method," SIAM Review 15
(1973), 89-119.
[Wat] Watson, L. T., "A globally convergent algorithm for computing fixed points of C 2 maps,"
Appl. Math. Comput. 5 (1979), 297-311.
[Wein] Weinstock, R., Calculus of Variations, with Applications to Physics and Engineering,
McGraw-Hill, New York, 1952. Reprint, Dover Publications 1974.
[West] Westfall, R. S., Never at Rest: A Biography of Isaac Newton, Cambridge University
Press, 1980.
[Whi] Whitehead, G. W., Homotopy Theory, MIT Press, Cambridge, Massachusetts, 1966.
[Wid1] Widder, D. V., Advanced Calculus, 2nd ed., Prentice-Hall, Englewood Cliffs, NJ, 1961.
Reprint, Dover Publications, New York.
[Wie] Wiener, N., The Fourier Integral and Certain of Its Applications, Cambridge University
Press, Cambridge, 1933. Reprint, Dover Publications, New York, 1958.
[Wilf] Wilf, H. S., Mathematics for the Physical Sciences, Dover Publications, New York, 1978.
[Will] Williamson, J. H., Lebesgue Integration, Holt, Rinehart and Winston, New York, 1962.
[Yo] Yosida, K., Functional Analysis, 4th ed., Springer-Verlag, Berlin, 1974.
[Youl] Young, L. C., Lectures on the Calculus of Variations and Optimal Control Theory,
Chelsea Publishing Co., 1980.
[Youn] Young, N., An Introduction to Hilbert Space, Oxford University Press, 1988.
[Ze] Zeidler, E., Applied Functional Analysis, Springer-Verlag, New York, 1995.
[Zem] Zemanian, A. H., Distribution Theory and Transform Analysis, Dover Publications,
New York, 1987.
[Zie] Ziemer, W. P., Weakly Differentiable Functions, Springer, New York, 1989.
[Zien] Zienkiewicz, O. C. and K. Morgan, Finite Elements and Approximation, Wiley, New
York, 1983.
[Zy] Zygmund, A., Trigonometric Series, 2nd ed., Cambridge University Press, 1959.
Index
A-orthogonal, 233 Bounded functional, 81
A-orthonormal, 233 Bounded map, 25
Absolute continuity, 413 Bounded set, 20, 368
Absolutely convergent, 14, 17 Brachistochrone Problem, 153,
Accumulation point, 12 157ff
Adjoint of an operator, 50, 82-83 Brouwer's Theorem, 333
Adjoint space, 34 Calculus of Variations, 152
Affine map, 120 Canonical embedding, 58
Alaoglu Theorem, 370 Cantor set. 46
Alexander's Theorem, 36.') Caratheodory's Theorem, 387
Algebra of sets, 421 Category argument, 45, 46, 47, 48
Almost everywhere, 396 Category, 41
Almost periodic functions, 76, 77 Catenary, 153, 156, 169
Almost uniformly, 397 Cauchy sequence, 10
Angle between vectors, 67 Cauchy-Riemann equations, 199
Annihilator, 36 Cauchy-Schwarz Inequality, 62
Approximate inverse, 188 Cesaro means, 13
Arzela-Ascoli Theorems, 347ff Chain Rule, 121
Autocorrelation, 293 Chain, 31
Axiom of Choice, 31 Characteristic function of a set,
Babuska-Lax-Milgram Theorem, 395
201 Characters, 288
Baire Theorem, 40 Chebyshev polynomials, 214
Banach limits, 37 Closed Graph Theorem, 49
Banach space, 10 Closed Range Theorem, 50
Banach-Alaoglu Theorem, 370 Closed graph, 47
Banach-Steinhaus Theorem, 41 Closed mapping, 47
Bartle-Graves Theorem, 342 Closed set, 16
Base for a topology, 362 Closure of a set, 16, 363
Base, 5 Cluster point, 12
Basin of attraction, 135 Collocation methods, 213ff
Bernoulli, J., 153 Compact operator, 85, 351
Bessel functions, 179 Compact set, 8
Bessel's Inequality, 72 Compactness in the weak
Best approximation, 192 topologies, 369
Bilinear functional, 201 Compactness, 19, 20, 364
Binomial Theorem, 262 Complete measure space, 387
Binomial coefficients, 261 Completeness, 9, 10, 15, 21
Biorthogonal System, 82, 192 Completion of a space, 15, 60
Bohl's Theorem, 339 Composition operator, 252
Borel Sigma-algebra, 384 Condensation of singularities, 46
Borel sets, 392 Conjugate direction methods, 232
Bounded above, 6 Conjugate gradient method, 235


Conjugate space, 34 Dominated convergence theorem,


Conjugate-linear map, 90 406
Connectedness, 124 Dual space, 34
Continuation methods, 238 Eberlein-Smulyan Theorem, 59
Continuity, 15 Egorov's Theorem, 397
Contraction Mapping Theorem, Eigenvalue, 91
177,333 Eigenvector, 92
Contraction, 132, 176 Elliptic, 211
Convergence in measure, 408 Embedding theorems, 330ff
Convergence of distributions, 257 Equimeasurable rearrangement,
Convergence of test functions, 249 403
Convergence, 8, 11, 17 Equivalence, 4
Convex functional, 231 Equivalent norms, 23, 27,39
Convex hull, 12 Essential supremum, 409
Convex set, 6 Euclidean norm, 4
Convolution of distributions, 285 Euler Equation, 155ff, 164
Convolution, 269ff, 290ff Euler-Lagrange Equation, 155
Coset, 29 Extended real number system, 381
Cosine transform, 300, 321 Extension of a function, 31
Extremum problems, 145
Countable additivity, 386
Fatou's Lemma, 403
Countably compact, 227
Feasible set, 243
Counting measure, 382
Fermat's Principle, 162, 164
Cycloid, 153, 158
Finite dimensional, 5
Degenerate kernel, 176, 357
Fixed point of Fourier transform,
Dense set, 14, 28, 36
301
Derivative of a distribution, 253
Fixed-Point Theorems, 140, 333
Descent methods, 225
Formal adjoint, 279, 280
Diaconis-Shahshahani Theorem, Fourier coefficients, 72
361 Fourier projections, 42
Diagonal dominance, 172ff Fourier series, 42, 167
Diameter of a set, 185 Fourier transform table, 292
Differentiable, 115 Fourier transform, 24, 287ff
Differential operator, 24, 273 Frechet derivative, 115
Dini's Theorem, 350 Frechet-Kolmogorov Theorem,
Dirac distribution, 250, 256, 260, 350
268, 283 Fredholm Alternative, 351ff
Direct sum, 80, 143 Fredholm integral equation, 175,
Directed set, 363 178, 190
Directional derivative, 227 Fredholm theory, 356
Dirichlet Problem, 167, 198 Fubini Theorems, 424, 426, 427
Discrete space, 46 Fundamental solution of an
Discrete topology, 362 operator, 273
Discretization, 170 Fundamental set, 36
Distance function, 9, 19, 23, 34, Gδ set, 46
64 Gödel's Theorem, 30
Distributions, 246, 249 Gateaux derivative, 120, 228
Dominate, 32 Galerkin method, 198

Game theory, 345 Inner measure, 390


Gamma function, 293 Inner product, 61
Gaussian elimination, 172 Integrable function, 405
Gaussian function, 318 Integral equations, 131, 141, 357
Gaussian quadrature, 223 Integral operator, 24
Generalized Cauchy-Schwarz Integration, 399ff
Inequality, 84 Interior Mapping Theorem, 48
Generalized function, 246 Interior of a set, 363
Generalized sequence, 364 Invariant measure, 385, 392
Geodesic, 13, 164ff Inverse Fourier transform, 301ff
Geometrical optics, 162 Inverse Function Theorems, 139,
Goldschmidt solution, 157 140
Gradient, 117 Invertible, 28
Gram matrix, 197 Isolated point, 47
Gram-Schmidt process, 75 Isometric, 35
Greatest lower bound, 6 Isoperimetric Problem, 159, 161
Green's Identity, 277 Iteration, 176
Green's Theorem, 161, 200, 203, Iterative refinement, 187, 188
205,210 Jacobian, 118
Green's functions, 107ff, 215 James' Theorem, 60
Holder Inequality, 55, 409 Jordan decomposition, 417
Hahn decomposition, 420 Kantorovich Theorem, 127, 130
Hahn-Banach Theorem, 32 Kernel, 26
Half-space, 38 Kharshiladze-Lozinski Theorem,
Hamel base, 32 377
Hammerstein Equation, 225 K uratowski-Ryll-N ardzewski
Harmonic function, 199 Theorem, 342
Harmonic series, 18 Lagrange interpolation, 193
Hausdorff space, 362 Lagrange multipliers, 145, 148,
Hausdorff-Young Theorem, 309 152, 159
Heat equation, 318ff Laplace transform, 24, 287
Heaviside distribution, 250, 254, Laplacian, 198, 275, 297
256,257, 283 Laurent's Theorem, 315
Heine-Borel Theorem, 19 Least upper bound, 6
Helmholtz equation, 320 Lebesgue Decomposition
Hermite functions, 309 Theorem, 415
Hermitian matrices, 104 Lebesgue measurable set, 389, 391
Hermitian operator, 83 Lebesgue measure, 391
Hilbert cube, 351 Lebesgue outer measure, 382
Hilbert space, 61, 63 Lebesgue space, 4
Hilbert-Schmidt operator, 83, 96, Lebesgue-Stieltjes outer measure,
98 382
Homotopy, 237ff Legendre polynomials, 76, 77, 377
Hyperplane, 38 Leibniz formula, 265
Idempotent operator, 189, 191 Limit in the mean, 308
Implicit Function Theorems, 135ff Linear functional, 24
Infimum, 6 Linear independence, 4
Initial-value problem, 179ff Linear inequalities, 344

Linear mapping, 24 Neighborhood, 17


Linear operator, 24 Net, 364
Linear programming, 243 Neumann Theorem, 28, 133, 186
Linear space, 2 Neural networks, 315
Linear topological spaces, 367ff Newton's Method, 125
Linear transformation, 24 Newton, I., 154
Lion, 154 Non-differentiable function, 13
Lipschitz condition, 120, 178, 180 Non-expansive, 19, 185
Local integrability, 251 Norm, 3
Locally-convex space, 370 Normal equations, 200
Locally-finite covering, 345 Normal operator, 100
Lower semicontinuity, 22, 226, 340 Nowhere dense, 41
Lusin's Theorem, 408 Null space, 26
Malgrange-Ehrenpreis Theorem, o-notation, 119
273 Objective function, 243
Mathematica, 126, 205, 230 Open set, 17
Maximal element, 32 Order of a distribution, 253
Mazur's Theorem, 372 Order of a multi-index, 247
Mean-Value Theorem, 122, 123 Ordered vector space, 150, 152
Measurable functions, 394ff Orthogonal complement, 65
Measurable rectangle, 421 Orthogonal projection, 72, 74, 193
Measurable sets, 384 Orthogonal set, 64, 70
Measurable space, 384 Orthonormal base, 73
Measure space, 386 Orthonormal set, 71
Measure, 386 Outer measure, 382
Metric space, 8, 13 Paracompactness, 345
Meyers-Serrin Theorem, 330 Parallelogram law, 61, 62
Michael Selection Theorem, 341 Partial derivative, 117, 118, 144
Min-Max Theorem, 346 Partially ordered set, 31, 363
Minimizing sequence, 226 Partition of unity, 282
Minimum deviation, 192 Pascal's triangle, 268
Minkowski Inequality, 55 Picard iteration, 181
Minkowski functional, 334, 343 Plancherel Theorem, 305ff
Minkowski's Inequality, 410 Point-evaluation functional, 29,
Mollifier, 249 193,214
Monomial, 6, 261 Pointwise convergence, 11
Monotone class, 422 Poisson summation formula, 298
Monotone convergence theorem, Poisson's Equation, 203, 210
401 Polar set, 370
Monotone norm, 14 Polygonal path, 13
Moore's Theorem, 373, 375 Polynomial, 261
Multi-index, 246 Positive cone, 150, 152
Multinomial Theorem, 263 Positive sets, 418
Multiplication operator, 252, 268 Pre-Hilbert space, 61
Multivariate interpolation, 313 Product measures, 420ff, 425
Mutually singular, 415 Product spaces, 365
Natural embedding, 58 Projection methods, 79, 191, 194
Neighborhood base, 367 Pseudo-norm, 370

Pythagorean Law, 62, 70 Sobolev spaces, 325


Quadrature, 175, 219, 222 Sobolev-Hilbert spaces, 332
Radial projection, 19 Span, 5
Radiative transfer, 186 Spectral Theorem, 93
Radon-Nikodym Theorem, 413ff Stable sequence, 80
Rank of an operator, 197 Steepest Descent, 124, 228
Rapidly decreasing function, 294 Step function, 406
Rayleigh quotient, 149 Stone-Weierstrass Theorem, 359
Rayleigh-Ritz Method, 166ff, Strictly positive definite functions,
205ff 315
Reflexive spaces, 58 Sturm-Liouville problems, 105ff,
Regular distribution, 252 203
Regular outer measure, 389 Subbase for a topology, 363
Relative topology, 363 Subsequence, 8
Rellich-Kondrachov Theorem, 331 Sup norm, 3
Residual set, 47 Support of a distribution, 282
Residual vector, 188, 229 Support of a function, 247
Residue calculus, 315ff Supremum, 6
Riemann Sum, 218 Surjective Mapping Theorem, 139,
Riemann integral, 43 142
Riemann's Theorem, 18 Szego's Theorem, 44
Riesz Representation Theorem, 81 Tangent, 119
Riesz's Lemma, 22 Tauber Theorem, 38
Riesz-Fischer Theorem, 63, 411 Tempered distributions, 321ff
Rothe's Theorem, 338 Test function, 247
Saddle point, 347 Topological spaces, 17, 361
Schauder base, 38, 204 Totally ordered set, 31
Schauder-Tychonoff Theorem, 334 Translation of a distribution, 270
Schur's Lemma, 56 Translation operator, 38, 252, 328
Schwartz space, 294 Tridiagonal, 172
Selection theorems, 339ff Two-point boundary value
Self-adjoint operator, 83 problem, 171, 208ff
Seminorm, 370 Tychonoff Theorem, 366
Separable kernel, 176, 357 Uncertainty Principle, 310
Separable space, 75 Uniform Boundedness Theorem,
Separation theorem, 151, 342, 343 42
Sigma-Algebra, 384 Uniform continuity, 16
Sigma-finite, 414 Uniform convergence, 11
Signed measures, 417ff Unit ball, 7
Similarity, 103 Unit cell, 7
Simple function, 397 Unitary matrices, 104
Simplex, 345 Unitary operator, 101
Simpson's Rule, 223 Upper bound, 6, 31
Sinc-function, 289 Upper semicontinuity, 208
Sine transform, 321 Variance of a function, 310
Singular-Value decomposition, 98 Vector space, 2
Skew-Hermitian operator, 101 Volterra integral equation, 141,
Snell's Law, 163 182, 183, 185, 189

Weak Cauchy property, 88 Weierstrass non differentiable


Weak convergence in Hilbert function, 259ff, 374
space, 87 Weierstrass, 11
Weak convergence, 53 Wronskian, 106
Weak topology, 368 Young's Theorem, 332
Weak* topology, 368 Zarantonello, 183
Weakly complete, 57 Zermelo-Fraenkel Axioms, 31
Weierstrass M- Test, 373 Zorn's Lemma, 32
Symbols

t* Point-evaluation functional, 29, 43


L* The adjoint of a mapping L, 50
dim Dimension, 5
ℝ The real number field
ℝ* The extended real number system, 381
ℂ The complex number field, 3
Πk(ℝⁿ) Space of all polynomials of degree at most k in n variables, 263
Π The space of all polynomials in one variable
‖x‖∞ Sup-norm on ℝⁿ, 3
‖x‖₁ ℓ¹-norm on ℝⁿ, 3
ℝⁿ n-Dimensional Euclidean space, 3-4
δij Kronecker delta (1 if i = j and 0 otherwise), 71
C(S) Space of continuous functions on a domain S, 3, 14, 348ff
COO(JR) Space of all infinitely differentiable functions on JR, 247
LIU Restriction of a map L to a set U, 59
fog The composition of functions, f with g, 27
eoo or eoo Space of bounded functions on N with sup-norm, 4, 12
e1 Space of summable functions on N, 14, 34
ep Space of p-th power summable sequences, 54
Co Space of sequences converging to zero, with sup-norm, 12
LP Space of p-th power integrable functions, 409
R(L) Range of operator L, 51, 191
L(X, Y) Space of bounded linear maps from X to Y, 25, 27
xy Inner product in JRn, 263, 288
(x, y) Inner product, 61
# Number of elements in a set, 39
z Set of all integers
z+ Set of all nonnegative integers
z:t Set of n-tuples of nonnegative integers.
N The set of natural numbers {1, 2, ... }
dist Distance from a point to a set, 9, 19, 23
Ilk Space of polynomials of degree at most k in one variable.
Surjective mapping, 49, 193
Special convergence for test functions, 249
Implication symbol
* Convolution, 269

χ Characteristic function of a set, 395


X* Conjugate Banach space, 34
⊥ Orthogonality symbol, 64
⊥ Annihilator symbol, 52, 65
f̂ Fourier transform of f, 288
↦ Mapping symbol
⇀ Symbol for weak convergence, 53
III A III Norm of a quadratic form, 84
(;;,) Binomial coefficient, 261
𝒟 Space of test functions, 247
𝒟' Space of distributions, 249
S the Schwartz space, 294
∅ Empty set, 17
∇² Laplacian, 198
:3 "There exists"
V "For all"
n Intersection of a family of sets
U Union of a family of sets
"- Set difference, 22
& Logical AND
Wk,P(f?) Sobolev space, 326
Vk,P(fl) Sobolev space, 329
C;:(f?) 331
Hk(f?) 332
Graduate Texts in Mathematics
(continued from page ii)

66 WATERHOUSE. Introduction to Affine 100 BERG/CHRISTENSEN/RESSEL. Harmonic


Group Schemes. Analysis on Semigroups: Theory of
67 SERRE. Local Fields. Positive Definite and Related 'Functions.
68 WEIDMANN. Linear Operators in Hilbert 101 EDWARDS. Galois Theory.
Spaces. 102 VARADARAJAN. Lie Groups, Lie Algebras
69 LANG. Cyclotomic Fields II. and Their Representations.
70 MASSEY. Singular Homology Theory. 103 LANG. Complex Analysis. 3rd ed.
71 FARKAS/KRA. Riemann Surfaces. 2nd ed. 104 DUBROVIN/FoMENKOlNovIKOV. Modern
72 STILLWELL. Classical Topology and Geometry-Methods and Applications.
Combinatorial Group Theory. 2nd ed. Part II.
73 HUNGERFORD. Algebra. lOS LANG. SL2(R).
74 DAVENPORT. Multiplicative Number 106 SILVERMAN. The Arithmetic of Elliptic
Theory. 3rd ed. Curves.
75 HOCHSCHILD. Basic Theory of Algebraic 107 OLVER. Applications of Lie Groups to
Groups and Lie Algebras. Differential Equations. 2nd ed.
76 !ITAKA. Algebraic Geometry. 108 RANGE. Holomorphic Functions and
77 HECKE. Lectures on the Theory of Integral Representations in Several
Algebraic Numbers. Complex Variables.
78 BURRIS/SANKAPPANAVAR. A Course in 109 LEHTO. Univalent Functions and
Universal Algebra. Teichmiiller Spaces.
79 WALTERS. An Introduction to Ergodic 110 LANG. Algebraic Number Theory.
Theory. III HUSEMOLLER. Elliptic Curves.
80 ROBINSON. A Course in the Theory of 112 LANG. Elliptic Functions.
Groups. 2nd ed. 113 KARATZAS/SHREVE. Brownian Motion and
81 FORSTER. Lectures on Riemann Surfaces. Stochastic Calculus. 2nd ed.
82 BOTT/Tu. Differential Fonns in Algebraic 114 KOBLITZ. A Course in Number Theory and
Topology. Cryptography. 2nd ed.
83 WASHINGTON. Introduction to Cyclotomic 115 BERGERIGOSHAUX. Differential Geometry:
Fields. 2nd ed. Manifolds, Curves, and Surfaces.
84 IRELAND/RoSEN. A Classical Introduction 116 KELLEy/SRINIVASAN. Measure and Integral.
to Modern Number Theory. 2nd ed. Vol. I.
85 EDWARDS. Fourier Series. Vol. II. 2nd ed. 117 SERRE. Algebraic Groups and Class Fields.
86 VAN LINT. Introduction to Coding Theory. 118 PEDERSEN. Analysis Now.
2nded. 119 ROTMAN. An Introduction to Algebraic
87 BROWN. Cohomology of Groups. Topology.
88 PIERCE. Associative Algebras. 120 ZIEMER. Weakly Differentiable Functions:
89 LANG. Introduction to Algebraic and Sobolev Spaces and Functions of Bounded
Abelian Functions. 2nd ed. Variation.
90 BRONDSTED. An Introduction to Convex 121 LANG. Cyclotomic Fields I and II.
Polytopes. Combined 2nd ed.
91 BEARDON. On the Geometry of Discrete 122 REMMERT. Theory of Complex Functions.
Groups. Readings in Mathematics
92 DIESTEL. Sequences and Series in Banach 123 EBBINGHAUS/HERMES et al. Numbers.
Spaces. Readings in Mathematics
93 DUBROVIN/FoMENKOlNovIKOV. Modern 124 DUBROVIN/FoMENKOINOVIKOV. Modern
Geometry-Methods and Applications. Geometry-Methods and Applications.
Part I. 2nd ed. Part Ill.
94 WARNER. Foundations of Differentiable 125 BERENSTEIN/GAY. Complex Variables: An
Manifolds and Lie Groups. Introduction.
95 SHIRYAEV. Probability. 2nd ed. 126 BOREL. Linear Algebraic Groups. 2nd ed.
96 CONWAY. A Course in Functional 127 MASSEY. A Basic Course in Algebraic
Analysis. 2nd ed. Topology.
97 KOBLITZ. Introduction to Elliptic Curves 128 RAUCH. Partial Differential Equations.
and Modular Fonns. 2nd ed. 129 FULTON/HARRIS. Representation Theory: A
98 BROCKERIToM DIECK. Representations of First Course.
Compact Lie Groups. Readings in Mathematics
99 GROVE/BENSON. Finite Reflection Groups. 130 DODSON/POSTON. Tensor Geometry.
2nd ed.
131 LAM. A First Course in Noncommutative 165 NATHANSON. Additive Number Theory:
Rings. 2nd ed. Inverse Problems and the Geometry of
132 BEARDON. Iteration of Rational Functions. Sumsets.
133 HARRIS. Algebraic Geometry: A First 166 SHARPE. Differential Geometry: Cartan's
Course. Generalization of Klein's Erlangen
134 ROMAN. Coding and Information Theory. Program.
135 ROMAN. Advanced Linear Algebra. 167 MORANDI. Field and Galois Theory.
136 ADKINS/WEINTRAUB. Algebra: An 168 EWALD. Combinatorial Convexity and
Approach via Module Theory. Algebraic Geometry.
137 AXLERIBoURDON/RAMEY. Harmonic 169 BHATIA. Matrix Analysis.
Function Theory. 2nd ed. 170 BREDON. Sheaf Theory. 2nd ed.
138 COHEN. A Course in Computational 171 PETERSEN. Riemannian Geometry.
Algebraic Number Theory. 172 REMMERT. Classical Topics in Complex
139 BREDON. Topology and Geometry. Function Theory.
140 AUBIN. Optima and Equilibria. An 173 DIESTEL. Graph Theory. 2nd ed.
Introduction to Nonlinear Analysis. 174 BRIDGES. Foundations of Real and
141 BECKERIWEISPFENNING/KREDEL. Griibner Abstract Analysis.
Bases. A Computational Approach to 175 LICKORISH. An Introduction to Knot
Commutative Algebra. Theory.
142 LANG. Real and Functional Analysis. 176 LEE. Riemannian Manifolds.
3rd ed. 177 NEWMAN. Analytic Number Theory.
143 DOOB. Measure Theory. 178 CLARKEILEDY AEV/STERNIWOLENSKI.
144 DENNIS/F ARB. Noncommutative Nonsmooth Analysis and Control
Algebra. Theory.
145 VICK. Homology Theory. An 179 DOUGLAS. Banach Algebra Techniques in
Introduction to Algebraic Topology. Operator Theory. 2nd ed.
2nd ed. 180 SRIVASTAVA. A Course on Borel Sets.
146 BRIDGES. Computability: A 181 KRESS. Numerical Analysis.
Mathematical Sketchbook. 182 WALTER. Ordinary Differential
147 ROSENBERG. Algebraic K- Theory Equations.
and Its Applications. 183 MEGGINSON. An Introduction to Banach
148 ROTMAN. An Introduction to the Space Theory.
Theory of Groups. 4th ed. 184 BOLLOBAS. Modern Graph Theory.
149 RATCLIFFE. Foundations of 185 COX/LITTLE/O'SHEA. Using Algebraic
Hyperbolic Manifolds. Geometry.
150 EISENBUD. Commutative Algebra 186 RAMAKRISHNANNALENZA. Fourier
with a View Toward Algebraic Analysis on Number Fields.
Geometry. 187 HARRIS/MoRRISON. Moduli of Curves.
151 SILVERMAN. Advanced Topics in 188 GOLDBLATT. Lectures on the Hyperreals:
the Arithmetic of Elliptic Curves. An Introduction to Nonstandard Analysis.
152 ZIEGLER. Lectures on Polytopes. 189 LAM. Lectures on Modules and Rings.
153 FULTON. Algebraic Topology: A 190 ESMONDEIMURTY. Problems in Algebraic
First Course. Number Theory.
154 BROWN/PEARCY. An Introduction to 191 LANG. Fundamentals of Differential
Analysis. Geometry.
155 KASSEL. Quantum Groups. 192 HIRSCH/LACOMBE. Elements of Functional
156 KECHRIS. Classical Descriptive Set Analysis.
Theory. 193 COHEN. Advanced Topics in
157 MALLIA VIN. Integration and Computational Number Theory.
Probability. 194 ENGELINAGEL. One-Parameter Semigroups
158 ROMAN. Field Theory. for Linear Evolution Equations.
159 CONWAY. Functions of One 195 NATHANSON. Elementary Methods in
Complex Variable II. Number Theory.
160 LANG. Differential and Riemannian 196 OSBORNE. Basic Homological Algebra.
Manifolds. 197 EISENBUDIHARRIS. The Geometry of
161 BORWEIN/ERDEL YI. Polynomials and Schemes.
Polynomial Inequalities. 198 ROBERT. A Course inp-adic Analysis.
162 ALPERINIBELL. Groups and 199 HEDENMALMIKORENBLUMIZHU. Theory
Representations. of Bergman Spaces.
163 DIXON/MORTIMER. Permutation Groups. 200 BAO/CHERN/SHEN. An Introduction to
164 NATHANSON. Additive Number Theory: Riemann-Finsler Geometry.
The Classical Bases.
201 HINDRy/SILVERMAN. Diophantine 205 FELlXlHALPERINITHOMAS. Rational
Geometry: An Introduction. Homotopy Theory.
202 LEE. Introduction to Topological 206 MURTY. Problems in Analytic Number
Manifolds. Theory.
203 SAGAN. The Symmetric Group: Readings in Mathematics
Representations, Combinatorial 207 GODSILIRoYLE. Algebraic Graph Theory.
Algorithms, and Symmetric Functions. 208 CHENEY. Analysis for Applied
2nded. Mathematics.
204 ESCOFIER. Galois Theory.
