Analysis For Applied Mathematics - Ward Cheney
Springer
Ward Cheney
Department of Mathematics
University of Texas at Austin
Austin, TX 78712-1082
USA
Editorial Board
S. Axler, Mathematics Department, San Francisco State University, San Francisco, CA 94132, USA
F. W. Gehring, Mathematics Department, East Hall, University of Michigan, Ann Arbor, MI 48109, USA
K.A. Ribet, Mathematics Department, University of California at Berkeley, Berkeley, CA 94720-3840, USA
Preface
This book evolved from a course at our university for beginning graduate
students in mathematics, particularly students who intended to specialize in
applied mathematics. The content of the course made it attractive to other
mathematics students and to graduate students from other disciplines such as
engineering, physics, and computer science. Since the course was designed for
two semesters' duration, many topics could be included and dealt with in
detail. Chapters 1 through 6 reflect roughly the actual nature of the course, as it
was taught over a number of years. The content of the course was dictated by
a syllabus governing our preliminary Ph.D. examinations in the subject of
applied mathematics. That syllabus, in turn, expressed a consensus of the faculty
members involved in the applied mathematics program within our department.
The text in its present manifestation is my interpretation of that syllabus: my
colleagues are blameless for whatever flaws are present and for any inadvertent
deviations from the syllabus.
The book contains two additional chapters having important material not
included in the course: Chapter 8, on measure and integration, is for the
benefit of readers who want a concise presentation of that subject, and Chapter 7
contains some topics closely allied, but peripheral, to the principal thrust of the
course.
This arrangement of the material deserves some explanation. The ordering
of chapters reflects our expectation of our students: If they are unacquainted
with Lebesgue integration (for example), they can nevertheless understand the
examples of Chapter 1 on a superficial level, and at the same time, they can
begin to remedy any deficiencies in their knowledge by a little private study
of Chapter 8. Similar remarks apply to other situations, such as where some
point-set topology is involved; Section 7.6 will be helpful here. To summarize:
We encourage students to wade boldly into the course, starting with Chapter 1,
and, where necessary, fill in any gaps in their prior preparation. One advantage
of this strategy is that they will see the necessity for topology, measure theory,
and other topics, thus becoming better motivated to study them. In keeping
with this philosophy, I have not hesitated to make forward references in some
proofs to material coming later in the book. For example, the Banach contraction
mapping theorem is needed at least once prior to the section in Chapter 4 where
it is dealt with at length.
Each of the book's six main topics could certainly be the subject of a year's
course (or a lifetime of study), and many of our students indeed study functional
analysis and other topics of the book in separate courses. Most of them
eventually or simultaneously take a year-long course in analysis that includes complex
analysis and the theory of measure and integration. However, the applied
mathematics course is typically taken in the first year of graduate study. It seems
to bridge the gap between the undergraduate and graduate curricula in a way
that has been found helpful by many students. In particular, the course and the
Chapter 1 Normed Linear Spaces
Section 1.1 Definitions and Examples
lofty viewpoint, from which he or she may have the advantage of being more
aware of significant features as distinguished from less significant details.
We begin by reviewing the concept of a vector space, or linear space.
(These terms are interchangeable.) The reader is probably already familiar with
these spaces, or at least with the example of vectors in $\mathbb{R}^n$. However, many
function spaces are also linear spaces, and much can be learned about these
function spaces by exploiting their similarity to the more elementary examples.
Here, as a reminder, we include the axioms for a vector space or linear space.
A real vector space is a triple $(X, +, \cdot)$, in which $X$ is a set, and $+$ and $\cdot$
are binary operations satisfying certain axioms. Here are the axioms:
(i) If $x$ and $y$ belong to $X$, then so does $x + y$ (closure axiom).
(ii) $x + y = y + x$ (commutativity).
(iii) $x + (y + z) = (x + y) + z$ (associativity).
(iv) $X$ contains a unique element, $0$, such that $x + 0 = x$ for all $x$ in $X$.
(v) With each element $x$ there is associated a unique element, $-x$, such
that $x + (-x) = 0$.
(vi) If $x \in X$ and $\lambda \in \mathbb{R}$, then $\lambda \cdot x \in X$ ($\mathbb{R}$ denotes the set of real numbers)
(closure axiom).
(vii) $\lambda \cdot (x + y) = \lambda \cdot x + \lambda \cdot y$ ($\lambda \in \mathbb{R}$) (distributivity).
(viii) $(\lambda + \mu) \cdot x = \lambda \cdot x + \mu \cdot x$ ($\lambda, \mu \in \mathbb{R}$) (distributivity).
(ix) $\lambda \cdot (\mu \cdot x) = (\lambda\mu) \cdot x$ (associativity).
(x) $1 \cdot x = x$.
These axioms need not be intimidating. The essential feature of a linear space
is that there is an addition defined among the elements of X, and when we add
two elements, the result is again in the space X. One says that the space is
closed (algebraically) under the operation of addition. A similar remark holds
true for multiplication of an element by a real number. The remaining axioms
simply tell us that the usual rules of arithmetic are valid for the two operations.
Most rules that you expect to be true are indeed true, but if they do not appear
among the axioms it is because they follow from the axioms. The effort to keep
the axioms minimal has its rewards: When one must verify that a given system
is a real vector space there will be a minimum of work involved!
In this set of axioms, the first five define an (additive) Abelian group. In
axiom (iv), the uniqueness of $0$ need not be mentioned, for it can be proved
with the aid of axiom (ii). Usually, if $\lambda \in \mathbb{R}$ and $x \in X$, we write $\lambda x$ in place
of $\lambda \cdot x$. The reader will note the ambiguity in the symbol $+$ and the symbol
$0$. For example, when we write $0x = 0$ two different zeros are involved, and in
axiom (viii) the plus signs are not the same. We usually write $x - y$ in place of
$x + (-y)$. Furthermore, we are not going to belabor elementary consequences of
the axioms such as $\lambda \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} \lambda x_i$. We usually refer to $X$ as the linear space
rather than $(X, +, \cdot)$. Observe that in a linear space, we have no way of assigning
a meaning to expressions that involve a limiting process, such as $\sum_{i=1}^{\infty} x_i$. This
drawback will disappear soon, upon the introduction of a norm.
From time to time we will prefer to deal with a complex vector space. In
such a space $\lambda \cdot x$ is defined (and belongs to $X$) whenever $\lambda \in \mathbb{C}$ and $x \in X$. (The
symbol $\mathbb{C}$ denotes the set of complex numbers.) Other fields can be employed
in place of $\mathbb{R}$ and $\mathbb{C}$, but they are rarely useful in applied mathematics. The
field elements are often termed scalars, and the elements of $X$ are often called
vectors.
Let $X$ be a vector space. A norm on $X$ is a real-valued function, denoted
by $\|\cdot\|$, that fulfills three axioms:
(i) $\|x\| > 0$ for each nonzero element in $X$.
(ii) $\|\lambda x\| = |\lambda| \, \|x\|$ for each $\lambda$ in $\mathbb{R}$ and each $x$ in $X$.
(iii) $\|x + y\| \le \|x\| + \|y\|$ for all $x, y \in X$. (Triangle Inequality)
A vector space in which a norm has been introduced is called a normed linear
space. Here are eleven examples.
Example 1. Let $X = \mathbb{R}$, and define $\|x\| = |x|$, the familiar absolute value
function. •
Example 2. Let $X = \mathbb{C}$, where the scalar field is also $\mathbb{C}$. Use $\|x\| = |x|$, where
$|x|$ has its usual meaning for a complex number $x$. Thus if $x = a + ib$ (where $a$
and $b$ are real), then $|x| = \sqrt{a^2 + b^2}$. •
Example 3. Let $X = \mathbb{C}$, and take the scalar field to be $\mathbb{R}$. The terminology
we have adopted requires that this be called a real vector space, since the scalar
field is $\mathbb{R}$. •
Example 4. Let $X = \mathbb{R}^n$. Here the elements of $X$ are $n$-tuples of real numbers
that we can display in the form $x = [x(1), x(2), \ldots, x(n)]$ or $x = [x_1, x_2, \ldots, x_n]$.
A useful norm is defined by the equation
$$\|x\|_\infty = \max_{1 \le i \le n} |x(i)|$$
Note that an $n$-tuple is a function on the set $\{1, 2, \ldots, n\}$, and so the notation
$x(i)$ is consistent with that interpretation. (This is the "sup" norm.) •
Example 5. Let $X = \mathbb{R}^n$, and define a norm by the equation
$\|x\| = \sum_{i=1}^{n} |x(i)|$.
Observe that in Examples 4 and 5 we have two distinct normed
linear spaces, although each involves the same linear space. This shows the
advantage of being more formal in the definition and saying that a normed linear
space is a pair $(X, \|\cdot\|)$, etc., but we refrain from doing this unless it is
necessary. •
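As a small numerical illustration of Examples 4 and 5, the following sketch evaluates both norms on sample vectors in $\mathbb{R}^3$ and spot-checks homogeneity and the triangle inequality. The function names `sup_norm` and `one_norm` and the sample vectors are assumptions made for this illustration only; a numerical check on two vectors is of course not a proof of the norm axioms.

```python
def sup_norm(x):
    """Norm of Example 4: the largest absolute value among the entries."""
    return max(abs(t) for t in x)


def one_norm(x):
    """Norm of Example 5: the sum of the absolute values of the entries."""
    return sum(abs(t) for t in x)


# Two sample vectors in R^3 (arbitrary choices for the illustration).
x = [3.0, -4.0, 1.0]
y = [-1.0, 2.0, 5.0]
x_plus_y = [a + b for a, b in zip(x, y)]

for norm in (sup_norm, one_norm):
    # Axiom (ii): homogeneity, checked for the scalar -2.
    assert norm([-2 * t for t in x]) == 2 * norm(x)
    # Axiom (iii): the triangle inequality.
    assert norm(x_plus_y) <= norm(x) + norm(y)
    print(norm.__name__, norm(x), norm(y), norm(x_plus_y))
```

The same vector receives different lengths under the two norms, which is the sense in which Examples 4 and 5 are distinct normed linear spaces built on one linear space.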
Example 6. Let $X$ be the set of all real-valued continuous functions defined
on a fixed compact interval $[a, b]$. The norm usually employed here is
$$\|x\|_\infty = \max_{a \le s \le b} |x(s)|$$ •
Example 7. Let $X$ be the set of all Lebesgue-integrable functions defined on
a fixed interval $[a, b]$. The usual norm for this space is $\|x\| = \int_a^b |x(s)| \, ds$. In this
space, the vectors are actually equivalence classes of functions, two functions
being regarded as equivalent if they differ only on a set of measure 0. (The
reader who is unfamiliar with the Lebesgue integral can substitute the Riemann
integral in this example. The resulting spaces are different, one being complete
and the other not. This is a rather complicated matter, best understood after
the study of measure theory and Lebesgue integration. Chapter 8 is devoted to
this branch of analysis. The notion of completeness of a space is taken up in the
next section.) •
Example 8. Let $X = \ell$, the space of all sequences in $\mathbb{R}$,
$x = [x(1), x(2), \ldots]$,
in which only a finite number of terms are nonzero. (The number of nonzero
terms is not fixed but can vary with different sequences.) Define
$\|x\| = \max_n |x(n)|$. •
Example 9. Let $X = \ell^\infty$, the space of all real sequences $x$ for which
$\sup_n |x(n)| < \infty$. Define $\|x\|$
to be that supremum, as in Example 8. •
Example 10. Let $X = \Pi$, the space of all polynomials having real coefficients.
A typical element of $\Pi$ is a function $x$ having the form
Problems 1.1
Here is a Chinese proverb that is pertinent to the problems: I hear, I forget; I see, I
remember; I do, I understand!
1. Let X be a linear space over the complex field. Let XT be the space obtained from X by
restricting the scalars to the real field. Prove that XT is a real linear space. Show by an
example that not every real linear space is of the form XT for some complex linear space
X. Caution: When we say that a linear space is a real linear space, this has nothing to
do with the elements of the space. It means only that the scalar field is $\mathbb{R}$ and not $\mathbb{C}$.
2. Prove the norm axioms for Examples 4-7.
3. Prove that in any normed linear space,
4. Denote the norms in Examples 4 and 5 by $\|\cdot\|_\infty$ and $\|\cdot\|_1$, respectively. Find the best
constants in the inequality
Prove that your constants are the best. (The "constants" $\alpha$ and $\beta$ will depend on $n$ but
not $x$.)
5. In Examples 4, 5, 6, and 7 find the precise conditions under which we have $\|x + y\| =
\|x\| + \|y\|$.
6. Prove that in any normed linear space, if $x \ne 0$, then $x/\|x\|$ is a vector of norm 1.
7. The Euclidean norm on $\mathbb{R}^n$ is defined in Example 11. Find the best constants in the
inequality $\alpha \|x\|_\infty \le \|x\|_2 \le \beta \|x\|_\infty$.
8. What theorems in elementary analysis are needed to prove the closure axioms for Example
6?
9. What is the connection between the normed linear spaces $\ell$ and $\Pi$ defined in Examples
8 and 10?
10. For any $t$ in the open interval $(0,1)$, let $\tilde{t}$ be the sequence $[t, t^2, t^3, \ldots]$. Notice that
$\tilde{t} \in \ell^\infty$. Prove that the set $\{\tilde{t} : 0 < t < 1\}$ is linearly independent.
11. In the space $\Pi$ we define special elements called monomials. They are given by $x_n(t) =
t^n$ where $n = 0, 1, 2, \ldots$ Prove that $\{x_n : n = 0, 1, 2, 3, \ldots\}$ is linearly independent.
12. Let $T$ be a set of real numbers. We say that $T$ is bounded above if there is an $M$
in $\mathbb{R}$ such that $t \le M$ for all $t$ in $T$. We say that $M$ is an upper bound of $T$. The
completeness axiom for $\mathbb{R}$ asserts that if a set $T$ is bounded above, then the set of
all its upper bounds is an interval of the form $[b, \infty)$. The number $b$ is the least upper
bound, or supremum of $T$, written $b = \operatorname{l.u.b.}(T) = \sup(T)$. Prove that if $x < b$, then
$(x, \infty) \cap T$ is nonempty. Give examples to show that $[b, \infty) \cap T$ can be empty or nonempty.
There are corresponding concepts of bounded below, lower bound, greatest lower
bound, and infimum.
13. Which of these expressions define norms on $\mathbb{R}^2$? Explain.
the fact that a linear function on a convex polyhedral set must attain its extrema
at the vertices of the set. Thus, to locate the maxima of a linear function
over a convex polyhedral set, one need only test the vertices. The central idea
of Dantzig's famous simplex method is to move from vertex to vertex, always
improving the value of the objective function.
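The vertex property can be checked numerically on a small example. The sketch below is not the simplex method; it is a brute-force illustration under assumed data (a hypothetical convex polygon in the plane and a hypothetical objective vector `c`). It samples random convex combinations of the vertices and confirms that none of them beats the best vertex.

```python
import random


def linear(c, p):
    """Value of the linear function p -> c[0]*p[0] + c[1]*p[1]."""
    return c[0] * p[0] + c[1] * p[1]


def random_convex_combination(vertices, rng):
    """Random point of the polygon, written as a convex combination of vertices."""
    weights = [rng.random() for _ in vertices]
    total = sum(weights)
    weights = [w / total for w in weights]  # nonnegative weights summing to 1
    px = sum(w * v[0] for w, v in zip(weights, vertices))
    py = sum(w * v[1] for w, v in zip(weights, vertices))
    return (px, py)


# Hypothetical data: vertices of a convex pentagon and an objective vector.
vertices = [(0.0, 0.0), (4.0, 0.0), (5.0, 3.0), (2.0, 5.0), (0.0, 3.0)]
c = (2.0, 1.0)

best_vertex_value = max(linear(c, v) for v in vertices)

rng = random.Random(0)  # fixed seed so the run is reproducible
samples = [random_convex_combination(vertices, rng) for _ in range(10000)]
largest_sampled_value = max(linear(c, p) for p in samples)

print(best_vertex_value, largest_sampled_value)
```

Testing only the five vertices already gives the maximum; the sampled interior points serve only as a sanity check.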
Another application of convexity occurs in studying deformations of a physi-
cal body. The "yield surface" of an object is generally convex. This is the surface
in 6-dimensional space that gives the stresses at which an object will fail struc-
turally. Six dimensions are needed to account for all the variables. See [Mar],
pages 100-104.
Among examples of convex sets in a linear space X we have:
(i) the space X itself;
(ii) any set consisting of a single point;
(iii) the empty set;
(iv) any linear subspace of X;
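(v) any line segment; i.e., a set of the following form in which $a$ and $b$ are
fixed:
$$\{\lambda a + (1 - \lambda) b : 0 \le \lambda \le 1\}$$
In a normed linear space, another important convex set is the unit cell or unit
ball:
$$\{x \in X : \|x\| \le 1\}$$
In order to see that the unit ball is convex, let $\|x\| \le 1$, $\|y\| \le 1$, and $0 \le \lambda \le 1$.
Then, with $\mu = 1 - \lambda$,
$$\|\lambda x + \mu y\| \le \lambda \|x\| + \mu \|y\| \le \lambda + \mu = 1$$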
$$\|x\|_p = \Bigl( \sum_{i=1}^{n} |x(i)|^p \Bigr)^{1/p}$$
It can be shown (Problem 1) that $\lim_{p \to \infty} \|x\|_p = \|x\|_\infty$. (This explains the
notation.) The unit balls (in $\mathbb{R}^2$) for $\|\cdot\|_p$ are shown for $p = 1$, 2, and 7, in
Figure 1.3.
[Figure 1.3]
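The limit $\lim_{p \to \infty} \|x\|_p = \|x\|_\infty$ can be observed numerically. The following is a minimal sketch; the helper names `p_norm` and `sup_norm`, the sample vector, and the chosen values of $p$ are assumptions for the illustration, and the computation suggests the limit without proving it.

```python
def p_norm(x, p):
    """The p-norm: (sum of |x(i)|^p) raised to the power 1/p."""
    return sum(abs(t) ** p for t in x) ** (1.0 / p)


def sup_norm(x):
    """The sup norm: the largest absolute value among the entries."""
    return max(abs(t) for t in x)


x = [1.0, -2.0, 3.0]  # sample vector in R^3

# As p grows, the p-norm decreases toward the sup norm of x.
for p in (1, 2, 7, 50, 200):
    print(p, p_norm(x, p))
print("sup norm:", sup_norm(x))
```

For this vector the printed values decrease from 6 (at $p = 1$) toward 3, the largest absolute entry.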
Given a sequence $[x_n]$ in a normed linear space (or indeed in any metric
space), is it possible to determine, from the sequence alone, whether it
converges? This is certainly an important matter for practical purposes, since we
often use algorithms to generate sequences that should converge to a solution
of a given problem. The answer to the posed question is that we cannot infer
convergence, in general, solely from the sequence itself. If we confine ourselves to
the information contained in the sequence, we can construct the doubly indexed
sequence $c_{nm} = \|x_n - x_m\|$. If $[c_{nm}]$ does not converge to zero, then the given
sequence $[x_n]$ cannot converge, as is easily proved: For any $x$ in the space, write
$$c_{nm} = \|x_n - x_m\| \le \|x_n - x\| + \|x - x_m\|$$
This shows that if $c_{nm}$ does not converge to 0, then $[x_n]$ cannot converge. On
the other hand, if $c_{nm}$ converges to zero, one intuitively thinks that the sequence
ought to converge, and if it does not, there must be a flaw in the space itself: The
limit of the sequence should exist, but the limiting point is somehow missing from
the space. Think of the rational numbers as an example. The missing ingredient
is completeness of the space, to which we now turn.
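The remark about the rational numbers can be made concrete. The sketch below is an illustration under assumptions not found in the text: it uses Newton's iteration for $\sqrt{2}$, carried out in exact rational arithmetic, as the sample sequence, and the helper names are hypothetical. Every term is rational, the quantities $c_{nm} = |x_n - x_m|$ shrink rapidly, yet no term squares to 2, so the would-be limit is missing from the rationals.

```python
from fractions import Fraction


def newton_sqrt2(steps):
    """Rational sequence x_{n+1} = (x_n + 2/x_n)/2 starting from x_0 = 1."""
    x = Fraction(1)
    seq = [x]
    for _ in range(steps):
        x = (x + 2 / x) / 2
        seq.append(x)
    return seq


def tail_diameter(seq, start):
    """Largest |x_n - x_m| over the computed terms with n, m >= start."""
    tail = seq[start:]
    return max(tail) - min(tail)


seq = newton_sqrt2(6)
diameters = [tail_diameter(seq, n) for n in range(6)]

for n, d in enumerate(diameters):
    print(n, float(d))

# The squares approach 2, but no rational term ever equals a square root of 2.
print(float(seq[-1] ** 2 - 2))
```

Only finitely many terms are computed, so this displays the Cauchy behavior rather than proving it.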
Proof. Let $[x_n]$ be a Cauchy sequence in $C[a, b]$. (This space is described in
Example 6, page 3.) Then for each $s$, $[x_n(s)]$ is a Cauchy sequence in $\mathbb{R}$. Since $\mathbb{R}$
is complete, this latter sequence converges to a real number that we may denote
by $x(s)$. The function $x$ thus defined must now be shown to be continuous, and
we must also show that $\|x_n - x\| \to 0$. Let $t$ be fixed as the point at which
continuity is to be proved. We write
$$|x(s) - x(t)| \le |x(s) - x_n(s)| + |x_n(s) - x_n(t)| + |x_n(t) - x(t)| \qquad (1)$$
This inequality should suggest to the reader how the proof must proceed. Let
$\varepsilon > 0$. Select $N$ so that $\|x_n - x_m\| \le \varepsilon/3$ whenever $m \ge n \ge N$ (Cauchy
property). Then for $m \ge n \ge N$, $|x_n(s) - x_m(s)| \le \varepsilon/3$. By letting $m \to \infty$ we
get $|x_n(s) - x(s)| \le \varepsilon/3$ for all $s$. This shows that $\|x_n - x\| \le \varepsilon/3$ and that the
sequence $\|x_n - x\|$ converges to 0. By the continuity of $x_n$ there exists a $\delta > 0$
such that $|x_n(s) - x_n(t)| < \varepsilon/3$ whenever $|t - s| < \delta$. Inequality (1) now shows
that $|x(s) - x(t)| < \varepsilon$ when $|t - s| < \delta$. (This proof illustrates what is sometimes
called "an $\varepsilon/3$ argument.") •
Remarks. Theorem 2 is due to Weierstrass. It remains valid if the interval
[a, b] is replaced by any compact Hausdorff space. (For topological notions, refer
to Section 7.6, starting on page 361.) The traditional formulation of this theorem
states that a uniformly convergent sequence of continuous functions on a closed
and bounded interval must have a continuous limit. A sequence of functions $[f_n]$
converges uniformly to $f$ if
Problems 1.2
Answer the same question for this property: For every positive $\varepsilon$ there is a natural number
$n$ such that $\|x_m - x_n\| < \varepsilon$ whenever $m \ge n$.
3. Prove that if a sequence $[x_n]$ in a Banach space satisfies $\sum_{n=1}^{\infty} \|x_n\| < \infty$, then the
series $\sum_{n=1}^{\infty} x_n$ converges.
4. Prove that Theorem 2 is not true for the norm $\int |x(t)| \, dt$.
5. Prove that the union of a finite number of compact sets is compact. Give an example to
show that the union of an infinite family of compact sets can fail to be compact.
6. Prove that $\|\cdot\|_p$ on $\mathbb{R}^n$ does not satisfy the triangle inequality if $0 < p < 1$ and $n \ge 2$.
Prove that this space is complete. Cultural note: The space $\ell^\infty(\mathbb{N})$ is of special interest.
Every separable metric space can be embedded isometrically in it! You might enjoy
trying to prove this, but that is not part of problem 12.
13. Prove that in a normed linear space a sequence cannot converge to two different points.
14. How does a sequence $[x_n : n \in \mathbb{N}]$ differ from a countable set $\{x_n : n \in \mathbb{N}\}$?
15. Is there a norm that makes the space of all real sequences a Banach space?
16. Let $c_0$ denote the space of all real sequences that converge to zero. Define $\|x\| =
\sup_n |x(n)|$. Prove that $c_0$ is a Banach space.
17. If $K$ is a convex set in a linear space, then these two sets are also convex:
$$A^c = \Bigl\{ \sum_{i=1}^{n} \lambda_i a_i : n \in \mathbb{N},\ \lambda_i \ge 0,\ a_i \in A,\ \sum_{i=1}^{n} \lambda_i = 1 \Bigr\}$$
Prove that $A \subset A^c$. Prove that $A^c$ is convex. Prove that $A^c$ is the smallest convex set
containing $A$. This latter assertion means that if $A$ is contained in a convex set $B$, then
$A^c$ is also contained in $B$. The set $A^c$ is the convex hull of $A$.
Section 1.2 Convexity, Convergence, Compactness, Completeness
19. If $A$ and $B$ are convex sets, is their vector sum convex? The vector sum of these two sets
is $A + B = \{a + b : a \in A,\ b \in B\}$.
20. Can a norm be recovered from its unit ball? Hint: If $x \in X$, then $x/\lambda$ is in the unit
ball whenever $|\lambda| \ge \|x\|$. (Prove this.) On the other hand, $x/\lambda$ is not in the unit ball if
$|\lambda| < \|x\|$. (Prove this.)
21. What are necessary and sufficient conditions on a set $S$ in a linear space $X$ in order that
$S$ be the unit ball for some norm on $X$?
22. Prove that the intersection of a family of convex sets (all contained in one linear space)
is convex.
23. A metric space is a pair $(X, d)$ in which $X$ is a set and $d$ is a function (called a metric)
from $X \times X$ to $\mathbb{R}$ such that
(i) $d(x, y) \ge 0$
(ii) $d(x, y) = 0$ if and only if $x = y$
(iii) $d(x, y) = d(y, x)$
(iv) $d(x, y) \le d(x, z) + d(z, y)$
Prove that a normed linear space is a metric space if $d(x, y)$ is defined as $\|x - y\|$.
24. For this problem only, we use the following notation for a line segment in a linear space:
25. If $x_n \to x$ and if the Cesàro means are defined by $\sigma_n = (x_1 + \cdots + x_n)/n$, then $\sigma_n \to x$.
(This is to be proved in an arbitrary normed linear space.)
26. Prove that a Cauchy sequence that contains a convergent subsequence must converge.
27. A compact set in a normed linear space must be bounded; i.e., contained in some multiple
of the unit ball.
28. Prove that the equation $f(x) = \sum_{k=0}^{\infty} a^k \cos b^k x$ defines a continuous function on $\mathbb{R}$,
provided that $0 \le a < 1$. The parameter $b$ can be any real number. You will find
useful Theorem 2 and Problem 3. Cultural Note: If $0 < a < 1$ and if $b$ is an odd
integer greater than $a^{-1}$, then $f$ is differentiable nowhere. This is the famous Weierstrass
nondifferentiable function. (See Section 7.8, page 374, for more information about this
function.)
29. Prove that a sequence $[x_n]$ in a normed linear space converges to a point $x$ if and only if
every subsequence of $[x_n]$ converges to $x$.
30. Prove that if $\phi$ is a strictly increasing function from $\mathbb{N}$ into $\mathbb{N}$, then $\phi(n) \ge n$ for all $n$.
31. Let $S$ be a subset of a linear space. Let $S_1$ be the union of all line segments that join
pairs of points in $S$. Is $S_1$ necessarily convex?
32. (continuation) What happens if we repeat the process and construct $S_2, S_3, \ldots$? (Thus,
for example, $S_2$ is the union of line segments joining points in $S_1$.)
33. Let $I$ be a compact interval in $\mathbb{R}$, $I = [a, b]$. Let $X$ be a Banach space. The notation
$C(I, X)$ denotes the linear space of all continuous maps $f : I \to X$. We norm $C(I, X)$
by putting $\|f\| = \sup_{t \in I} \|f(t)\|$. Prove that $C(I, X)$ is a Banach space.
34. Define $f_n(x) = e^{-nx}$. Show that this sequence of functions converges pointwise on $[0,1]$
to the function $g$ such that $g(0) = 1$ and $g(t) = 0$ for $t \ne 0$. Show that in the $L^2$-norm
on $[0,1]$, $f_n$ converges to 0. The $L^2$-norm is defined by $\|f\| = \bigl\{ \int_0^1 |f(t)|^2 \, dt \bigr\}^{1/2}$.
35. Let $[x_n]$ be a sequence in a Banach space. Suppose that for every $\varepsilon > 0$ there is a
convergent sequence $[y_n]$ such that $\sup_n \|x_n - y_n\| < \varepsilon$. Prove that $[x_n]$ converges.
36. In any normed linear space, define $K(x, r) = \{y : \|x - y\| \le r\}$. Prove that if $K(x, \tfrac12) \subset
K(0, 1)$ then $0 \in K(x, \tfrac12)$.
37. Show that the closed unit ball in a normed linear space cannot contain a disjoint pair of
closed balls having radius $\tfrac12$.
38. (Converse of Problem 3) Prove that if every absolutely convergent series converges
in a normed linear space, then the space is complete. (A series $\sum x_n$ is absolutely
convergent if $\sum \|x_n\| < \infty$.)
39. Let $X$ be a compact Hausdorff space, and let $C(X)$ be the space of all real-valued
continuous functions on $X$, with norm $\|f\| = \sup |f(x)|$. Let $[f_n]$ be a Cauchy sequence
in $C(X)$. Prove that
$$\lim_{x \to x_0} \lim_{n \to \infty} f_n(x) = \lim_{n \to \infty} \lim_{x \to x_0} f_n(x)$$
Give examples to show why compactness, continuity, and the Cauchy property are needed.
40. The space $\ell^1$ consists of all sequences $x = [x(1), x(2), \ldots]$ in which $x(n) \in \mathbb{R}$ and
$\sum |x(n)| < \infty$. The space $\ell^2$ consists of sequences for which $\sum |x(n)|^2 < \infty$. Prove
that $\ell^1 \subset \ell^2$ by establishing the inequality $\sum |x(n)|^2 \le \bigl( \sum |x(n)| \bigr)^2$.
41. Let $X$ be a normed linear space, and $S$ a dense subset of $X$. Prove that if each Cauchy
sequence in $S$ has a limit in $X$, then $X$ is complete. A set $S$ is dense in $X$ if each point
of $X$ is the limit of some sequence in $S$.
42. Give an example of a linearly independent sequence $[x_0, x_1, x_2, \ldots]$ of vectors in $\ell^\infty$ such
that $\sum_{n=0}^{\infty} x_n = 0$. Don't forget to prove that $\sum x_n = 0$.
43. Prove, in a normed space, that if $x_n \to x$ and $\|x_n - y_n\| \to 0$, then $y_n \to x$. If $x_n \to x$
and $\|x_n - y_n\| \to 1$, what is $\lim y_n$?
45. (Continuation) Prove that there is no monotone norm on the space of all real-valued
sequences.
47. Any normed linear space $X$ can be embedded as a dense subspace in a complete normed
linear space $\widetilde{X}$. The latter is fully determined by the former, and is called the completion
of $X$. A more general assertion of the same sort is true for metric spaces. Prove that the
completion of the space $\ell$ in Example 8 of Section 1.1 (page 4) is the space $c_0$ described
in Problem 16. Further remarks about the process of completion occur in Section 1.8,
page 60.
48. Metric spaces were defined in Problem 23, page 13. In a metric space, a Cauchy sequence
is one that has the property $\lim_{n,m} d(x_n, x_m) = 0$. A metric space is complete if
every Cauchy sequence converges to some point in the space. For the discrete metric
space mentioned in Problem 11 (page 19), identify the Cauchy sequences and determine
whether the space is complete.
Proof. To show that $f(D)$ is compact, we let $[y_n]$ be any sequence in $f(D)$,
and prove that this sequence has a convergent subsequence whose limit is in
$f(D)$. There exist points $x_n \in D$ such that $f(x_n) = y_n$. Since $D$ is compact, the
sequence $[x_n]$ has a subsequence $[x_{n_i}]$ that converges to a point $x \in D$. Since $f$
is continuous,
A function $f$ whose domain and range are subsets of normed linear spaces
is said to be uniformly continuous if there corresponds to each positive $\varepsilon$ a
positive $\delta$ such that $\|f(x) - f(y)\| < \varepsilon$ for all pairs of points (in the domain of
$f$) satisfying $\|x - y\| < \delta$. The crucial feature of this definition is that $\delta$ serves
simultaneously for all pairs of points. The definition is global, as distinguished
from local.
Proof. Let $\sum_{i=1}^{\infty} x_i$ be such a series and $\sum_{i=1}^{\infty} x_{k_i}$ a rearrangement of it. Put
$$\|S_m - S_r\| = \|(x_1 + \cdots + x_m) - (x_{k_1} + \cdots + x_{k_r})\| \le \sum_{i=n+1}^{m} \|x_i\| < \varepsilon$$
Hence
•
In using a series that is not absolutely convergent, some caution must be
exercised. Even in the case of a series of real numbers, bizarre results can arise
if the series is randomly re-ordered. A good example of a series of real numbers
that converges yet is not absolutely convergent is the series $\sum_n (-1)^n / n$. The
series of corresponding absolute values is the divergent harmonic series. There
is a remarkable theorem that includes this example:
Proof. Let the series $\sum x_n$ satisfy the hypotheses. Then $\lim x_n = 0$ and
$$\sum_{x_n > 0} x_n - \sum_{x_n < 0} x_n = \sum |x_n| = \infty$$
Since the series $\sum x_n$ converges, the two series on the left of the preceding
equation must diverge to $+\infty$ and $-\infty$, respectively. (See Problems 12 and 13.)
Now let $T$ be any real number. Select positive terms (in order) from the series
until their sum exceeds $T$. Now add negative terms (chosen in order) until the
new partial sum is less than $T$. Continue in this manner. Since $\lim x_n = 0$, the
partial sums thus created differ from $T$ by quantities that tend to zero. •
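The selection procedure in this proof can be sketched in code for the series $\sum_n (-1)^n / n$ mentioned above. This is an illustration under assumptions: the function name, the targets 0.7 and $-1.5$, and the number of terms are hypothetical choices, and floating-point arithmetic only approximates the exact partial sums. One small deviation from the wording of the proof: negative terms are added until the partial sum is at most $T$ rather than strictly less than $T$.

```python
def rearranged_partial_sums(target, n_terms):
    """Greedy rearrangement of the series sum_n (-1)^n / n.

    The positive terms are 1/2, 1/4, 1/6, ... (even n) and the negative
    terms are -1, -1/3, -1/5, ... (odd n). Each kind is used in its
    original order: a positive term while the running sum is at most the
    target, a negative term while the running sum exceeds it.
    """
    next_even = 2  # index of the next unused positive term
    next_odd = 1   # index of the next unused negative term
    total = 0.0
    sums = []
    for _ in range(n_terms):
        if total <= target:
            total += 1.0 / next_even
            next_even += 2
        else:
            total -= 1.0 / next_odd
            next_odd += 2
        sums.append(total)
    return sums


for target in (0.7, -1.5):
    sums = rearranged_partial_sums(target, 20000)
    print(target, sums[-1])
```

Because the terms tend to zero, each crossing of the target overshoots by less than the previous one, which is why the printed partial sums land close to the chosen targets.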
Problems 1.3
2. Let $U$ be an arbitrary subset of a normed space. Prove that the function $x \mapsto \operatorname{dist}(x, U)$
is continuous. This function was defined in the proof of Theorem 1 in Section 1.2, page
9. Prove, in fact, that it is "nonexpansive":
6. Use Theorem 2 and Problem 2 in this section to give a brief proof of Theorem 1 in
Section 2, page 9.
7. Using the definition of an open set as given in this section, prove that a set $U$ is open if
and only if for each $x$ in $U$ there is a positive $\varepsilon$ such that $B(x, \varepsilon) \subset U$.
8. Prove that the inverse image of an open set by a continuous map is open.
9. The (algebraic) sum of two sets in a linear space is defined by $A + B = \{a + b : a \in A,\
b \in B\}$. Is the sum of two closed sets (in a normed linear space) closed? (Cf. Problem
19, page 13.)
10. Prove that if the series $\sum_{i=1}^{\infty} x_i$ converges (in some normed linear space), then $x_i \to 0$.
11. A common misconception about metric spaces is that the closure of an open ball $S = \{x :
d(a, x) < r\}$ is the closed ball $S' = \{x : d(a, x) \le r\}$. Investigate whether this is correct
in a discrete metric space $(X, d)$, where $d(x, y) = 1$ if $x \ne y$. What is the situation in a
normed linear space? (Refer to Problem 23, page 13.)
12. Let $\sum x_n$ and $\sum y_n$ be two series of nonnegative terms. Prove that if one of these series
converges but the other does not, then the series $\sum (x_n - y_n)$ diverges. Can you improve
this result by weakening the hypotheses?
13. Let $\sum x_n$ be a convergent series of real numbers such that $\sum |x_n| = \infty$. Prove that the
series of positive terms extracted from the series $\sum x_n$ diverges to $\infty$. It may be helpful
to introduce $u_n = \max(x_n, 0)$ and $v_n = \min(x_n, 0)$. By using the partial sums of series,
one reduces the question to matters concerning the convergence of sequences.
14. Refer to Problem 12, page 12, for the space $\ell^\infty(S)$. We write $\le$ to signify a pointwise
inequality between two members of this space. Let $g_n$ and $f_n$ be elements of this space,
for $n = 1, 2, \ldots$ Let $g_n \ge 0$, $f_{n-1} - g_{n-1} \le f_n \le M$, and $\sum_{i=1}^{n} g_i \le M$ for all $n$. Prove
that the sequence $[f_n]$ converges pointwise. Give an example to show that convergence
in norm may fail.
Our first goal in this section is to show that the Heine-Borel theorem is true
for a normed linear space if and only if the space is finite-dimensional. Since most
interesting function spaces are infinite-dimensional, verifying the compactness of
a set in these spaces requires information beyond the simple properties of being
bounded and closed. Many important theorems in functional analysis address
the question of identifying the compact sets in various normed linear spaces.
Examples of such theorems will appear in Chapter 7.
$$\le \max_i |a(i) - b(i)| \cdot \sum_{j=1}^{n} \|x_j\| = \|a - b\|_\infty \sum_{j=1}^{n} \|x_j\|$$
Section 1.4 More about Compactness
In other words, we have only to prove that $M$ is bounded. To this end, define
$$\beta = \|Tb\| = \Bigl\| \sum_{i=1}^{n} b(i) x_i \Bigr\|$$
Since the points $x_i$ constitute a linearly independent set, and since $b \ne 0$, we
conclude that $Tb \ne 0$ and that $\beta > 0$. Since $F$ is bounded, there is a constant
$c$ such that $\|x\| \le c$ for all $x \in F$. Now, if $a \in \mathbb{R}^n$ and $a \ne 0$, then $a/\|a\|_\infty$ is a
vector of norm 1; consequently, $\|T(a/\|a\|_\infty)\| \ge \beta$, or
Proof. Let $[x_n]$ be a Cauchy sequence in such a space. Let us prove that the
sequence is bounded. Select an index $m$ such that $\|x_i - x_j\| < 1$ whenever
$i, j \ge m$. Then we have
$(i \ge m)$
Since the ball of radius $c$ is compact, our sequence must have a convergent
subsequence, say $x_{n_i} \to x^*$. Given $\varepsilon > 0$, select $N$ so that $\|x_i - x_j\| < \varepsilon$ when
$i, j \ge N$. Then $\|x_j - x_{n_i}\| < \varepsilon$ when $i, j \ge N$, because $n_i \ge i$. By taking the
limit as $i \to \infty$, we conclude that $\|x_j - x^*\| \le \varepsilon$ when $j \ge N$. This shows that
$x_j \to x^*$. •
In any normed linear space, a compact set is necessarily closed and bounded.
In a finite-dimensional space, these two conditions are also sufficient for compact-
ness. In any infinite-dimensional space, some additional hypothesis is required
to imply compactness. For many spaces, necessary and sufficient conditions for
compactness are known. These invariably involve some uniformity hypothesis.
See Section 7.4, page 347, for some examples, and [DS] (Section IV.14) for many
others.
Problems 1.4
Consider two vector spaces $X$ and $Y$ over the same scalar field. A mapping
$f : X \to Y$ is said to be linear if
$$f(\alpha u + \beta v) = \alpha f(u) + \beta f(v)$$
for all scalars $\alpha$ and $\beta$ and for all vectors $u$, $v$ in $X$. A linear map is often called
a linear transformation or a linear operator. If $Y$ happens to be the scalar
field, the linear map is called a linear functional. By taking $\alpha = \beta = 0$ we see
at once that a linear map $f$ must have the property $f(0) = 0$. This meaning of
the word "linear" differs from the one used in elementary mathematics, where a
linear function of a real variable $x$ means a function of the form $x \mapsto ax + b$.
Example 1. If $X = \mathbb{R}^n$ and $Y = \mathbb{R}^m$, then each linear map of $X$ into $Y$ is of
the form $f(x) = y$,
$$y(i) = \sum_{j=1}^{n} a_{ij} x(j) \qquad (1 \le i \le m)$$
where the $a_{ij}$ are certain real numbers that form an $m \times n$ matrix. •
Example 2. Let $X = C[0, 1]$ and $Y = \mathbb{R}$. One linear functional is defined by
$f(x) = \int_0^1 x(s) \, ds$. •
Example 3. Let $X$ be the space of all functions on $[0, 1]$ that possess $n$
continuous derivatives, $x', x'', \ldots, x^{(n)}$. Let $a_0, a_1, \ldots, a_n$ be fixed elements of
$X$. Then a linear operator $D$ is defined by
$$Dx = \sum_{i=0}^{n} a_i x^{(i)}$$
Such an operator is called a differential operator. •
Example 4. Let $X = C[0, 1] = Y$. Let $k$ be a continuous function on $[0, 1] \times
[0, 1]$. Define $K$ by
$$(Kx)(s) = \int_0^1 k(s, t) x(t) \, dt$$
This is a linear operator, in fact a linear integral operator. •
Example 5. Let $X$ be the set of all bounded continuous functions on $\mathbb{R}_+ =
\{t \in \mathbb{R} : t \ge 0\}$. Put
$$(Lx)(s) = \int_0^{\infty} e^{-st} x(t) \, dt$$
$$(Fx)(s) = \int_{-\infty}^{\infty} e^{-2\pi i s t} x(t) \, dt$$
$$\ker(f) = \{x : f(x) = 0\}$$
Now use the preceding corollary to infer that the functionals $\lambda_i$ are continuous.
Getting back to $T$, we have
Proof. Let $X$ be a finite-dimensional vector space having two norms $\|\cdot\|_1$ and
$\|\cdot\|_2$. The identity map $I$ from $(X, \|\cdot\|_1)$ to $(X, \|\cdot\|_2)$ is continuous by the
preceding result. Hence it is bounded. This implies that
Proof. The only issue is the completeness of $\mathcal{L}(X, Y)$. Let $[A_n]$ be a Cauchy
sequence in $\mathcal{L}(X, Y)$. For each $x \in X$, we have
$$(I - A)^{-1} = \sum_{k=0}^{\infty} A^k$$
Proof. Put $B_n = \sum_{k=0}^{n} A^k$. The sequence $[B_n]$ has the Cauchy property, for
if n > m, then
(In this calculation we used Problem 20.) Since the space of all bounded linear
operators on X into X is complete (Theorem 4), the sequence [Bn] converges to
a bounded linear operator B. We have
$$(I - A)B_n = B_n - AB_n = \sum_{k=0}^{n} A^k - \sum_{k=1}^{n+1} A^k = I - A^{n+1}$$
Taking a limit, we obtain $(I - A)B = I$. Similarly, $B(I - A) = I$. Hence
$B = (I - A)^{-1}$. •
The Neumann Theorem is a powerful tool, having applications to many
applied problems, such as integral equations and the solving of large systems of
linear equations. For examples, see Section 4.3, which is devoted to this theorem,
and Section 3.3, which has an example of a nonlinear integral equation.
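As a small illustration of the Neumann Theorem in the finite-dimensional case, the sketch below sums the series $I + A + A^2 + \cdots$ for a 2 x 2 matrix and checks that the partial sum nearly inverts $I - A$. The matrix, the number of terms, and the helper names are assumptions made for the example; the matrix is chosen so that its maximum row sum is 0.7, which makes its norm less than 1.

```python
def mat_mul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def mat_add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def neumann_partial_sum(A, terms=60):
    """Return B_n = I + A + ... + A^terms, which approximates (I - A)^{-1} when ||A|| < 1."""
    n = len(A)
    B = identity(n)
    P = identity(n)          # running power A^k
    for _ in range(terms):
        P = mat_mul(P, A)
        B = mat_add(B, P)
    return B

# Illustrative matrix (an assumption): max row sum is 0.7 < 1
A = [[0.2, 0.1],
     [0.3, 0.4]]
B = neumann_partial_sum(A)
I_minus_A = [[1.0 - 0.2, -0.1],
             [-0.3, 1.0 - 0.4]]
residual = mat_mul(I_minus_A, B)   # equals I - A^(terms+1), so it should be close to I
print(residual)
```

The residual differs from the identity by $A^{61}$, whose norm is at most $0.7^{61}$, in line with the identity $(I - A)B_n = I - A^{n+1}$ used in the proof.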
Problems 1.5
1. Prove that the closure of a linear subspace in a normed linear space is also a subspace.
(The closure operation is defined on page 16.)
2. Prove that the operator norm defined here has the three properties required of a norm.
3. Prove that the kernel of a linear functional is either closed or dense. (A subset in a
topological space X is dense if its closure is X.)
4. Let $\{x_1, \ldots, x_k\}$ be a linearly independent finite set in a normed linear space. Show that
there exists a $\delta > 0$ such that the condition
$$\max_{1 \le i \le k} \|x_i - y_i\| < \delta$$
7. Prove that a linear map is injective (i.e., one-to-one) if and only if its kernel is the 0
subspace. (The kernel of a map T is {x : Tx = O}.)
8. Prove that the norm of a linear transformation is the infimum of all the numbers M that
satisfy the inequality $\|Tx\| \le M\|x\|$ for all x.
9. Prove the (surprising) result that a linear transformation is continuous if and only if it
transforms every sequence converging to zero into a bounded sequence.
10. If f is a linear functional on X and N is its kernel, then there exists a one-dimensional
subspace Y such that $X = Y \oplus N$. (For two sets in a linear space, we define U + V as
the set of all sums u + v when u ranges over U and v ranges over V. If U and V are
subspaces with only 0 in common we write this sum as $U \oplus V$.)
11. The space $\ell_\infty(S)$ was defined in Problem 12 of Section 1.2, page 12. Let $S = \mathbb{N}$, and
define $T : \ell_\infty(\mathbb{N}) \to C[-\tfrac12, \tfrac12]$ by the equation $(Tx)(s) = \sum_{k=1}^{\infty} x(k)s^k$. Prove that T is
linear and continuous.
12. Prove or disprove: A linear map from a normed linear space into a finite-dimensional
normed linear space must be continuous.
14. Let Y be a closed subspace in a Banach space X. A "coset" is a set of the form $x + Y =
\{x + y : y \in Y\}$. Show that the family of all cosets is a normed linear space if we use the
norm $|||x + Y||| = \operatorname{dist}(x, Y)$.
15. Refer to Problem 12 in the preceding section, page 23. Show that the assertion there is
not true if ~ is replaced by !.
16. Prove that for a bounded linear transformation $T : X \to Y$
17. Prove that a bounded linear transformation maps Cauchy sequences into Cauchy se-
quences.
18. Prove that if a linear transformation maps some nonvoid open set of the domain space
to a bounded set in the range space, then it is continuous.
19. On the space C[0, 1] we define "point-evaluation functionals" by $t^*(x) = x(t)$. Here
$t \in [0, 1]$ and $x \in C[0, 1]$. Prove that $\|t^*\| = 1$. Prove that if $\phi = \sum_{i=1}^{n} \lambda_i t_i^*$, where
$t_1, t_2, \ldots, t_n$ are distinct points in [0, 1], then $\|\phi\| = \sum_{i=1}^{n} |\lambda_i|$.
20. In the proof of the Neumann Theorem we used the inequality $\|A^k\| \le \|A\|^k$. Prove this.
21. Prove that if $\{\phi_1, \ldots, \phi_n\}$ is a linearly independent set of linear functionals, then for
suitable $x_j$ we have $\phi_i(x_j) = \delta_{ij}$ for $1 \le i, j \le n$.
30 Chapter 1 Normed Linear Spaces
22. Prove that if a linear transformation is discontinuous at one point, then it is discontinuous
everywhere.
23. Linear transformations on infinite-dimensional spaces do not always behave like their
counterparts on finite-dimensional spaces. The space $c_0$ was defined in Problem 1.2.16
(page 12). On the space $c_0$ define
24. What is meant by the assertion that the behavior of a linear map at any point of its
domain is exactly like its behavior at 0?
25. Prove that every linear functional f on $\mathbb{R}^n$ has the form $f(x) = \sum_{i=1}^{n} \alpha_i x(i)$, where
$x(1), x(2), \ldots, x(n)$ are the coordinates of x. Let $\alpha = [\alpha_1, \alpha_2, \ldots, \alpha_n]$ and show that the
relationship $f \mapsto \alpha$ is linear, injective, and surjective (hence, an isomorphism).
26. Is it true for linear operators in general that continuity follows from the null space being
closed?
27. Let $\phi_0, \phi_1, \ldots, \phi_n$ be linear functionals on a linear space. Prove that if the kernel of $\phi_0$
contains the kernels of all $\phi_i$ for $1 \le i \le n$, then $\phi_0$ is a linear combination of $\phi_1, \ldots, \phi_n$.
28. If L is a bounded linear map from a normed space X to a Banach space Y, then L has a
unique continuous linear extension defined on the completion of X and taking values in
Y. (Refer to Problem 1.2.47, page 15.) Prove this assertion as well as the fact that the
norm of the extension equals the norm of the original L.
29. Let A be a continuous linear operator on a Banach space X. Prove that the series
$\sum_{n=0}^{\infty} A^n/n!$ converges in $\mathcal{L}(X, X)$. The resulting sum can be denoted by $e^A$. Is $e^A$
invertible?
30. Investigate the continuity of the Laplace transform (in Example 5, page 24).
This section is devoted to two results that require the Axiom of Choice for their
proofs. These are a theorem on existence of Hamel bases, and the Hahn-Banach
Theorem. The first of these extends to all vector spaces the notion of a base,
which is familiar in the finite-dimensional setting. The Hahn-Banach Theorem
is needed at first to guarantee that on a given normed linear space there can
be defined continuous maps into the scalar field. There are many situations
in applied mathematics where the Hahn-Banach Theorem plays a crucial role;
convex optimization theory is a prime example.
The Axiom of Choice is an axiom that most mathematicians use unre-
servedly, but is nonetheless controversial. Its status was clarified in 1940 by a
famous theorem of Gödel [Go]. His theorem can be stated as follows.
Section 1.6 Zorn's Lemma, Hamel Bases, Hahn-Banach Theorem 31
In other words, the Axiom of Choice by itself cannot be responsible for intro-
ducing an inconsistency in set theory. That is why most mathematicians are
willing to accept it. In 1963, Paul Cohen [Coh] proved that the Axiom of Choice
is independent of the remaining axioms in the Zermelo-Fraenkel system. Thus
it cannot be proved from them. The statement of this axiom is as follows:
For example, suppose that A is a finite set: $A = \{a_1, \ldots, a_n\}$. For each i in
$\{1, 2, \ldots, n\}$ a nonempty set $f(a_i)$ is given. In n steps, we can select "representatives"
$x_1 \in f(a_1)$, $x_2 \in f(a_2)$, etc. Having done so, define $c(a_i) = x_i$ for
$i = 1, 2, \ldots, n$. Attempting the same construction for an infinite set such as
$A = \mathbb{R}$, with accompanying infinite sets $f(a)$, leads to an immediate difficulty.
To get around the difficulty, one might try to order the elements of each set $f(a)$
in such a way that there is always a "first" element in $f(a)$. Then $c(a)$ can be
defined to be the first element in $f(a)$. But the proposed ordering will require
another axiom at least as strong as the Axiom of Choice! For a second example,
see Problem 45, page 40.
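The finite case and the "first element" idea can be made concrete. The sketch below is only an illustration under stated assumptions: the family is stored as a Python dictionary, the sets are finite sets of natural numbers, and the function names are invented for the example. For such sets the usual order supplies a first element, so no appeal to an axiom is needed.

```python
def choice_finite(family):
    """Pick one representative from each nonempty set, one step per index a."""
    return {a: next(iter(s)) for a, s in family.items()}

def choice_least(family):
    """For sets of natural numbers, c(a) = least element of f(a) is a canonical choice."""
    return {a: min(s) for a, s in family.items()}

# Hypothetical family f: each a_i is paired with a nonempty set f(a_i)
f = {"a1": {3, 5}, "a2": {7}, "a3": {2, 4, 6}}

c = choice_least(f)
c2 = choice_finite(f)
print(c)
```

Both functions return a map c with $c(a) \in f(a)$ for every a; only the second rule, taking the least element, is determined without arbitrary selection.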
A number of other set-theoretic axioms are equivalent to the Axiom of
Choice. See [Kel] and [RR]. Among these equivalent axioms, we single out
Zorn's Lemma as being especially useful. First, we require some definitions.
Definition 1. A partially ordered set is a pair $(X, \prec)$ in which X is a set
and $\prec$ is a relation on X such that
(i) $x \prec x$ for all x
(ii) If $x \prec y$ and $y \prec z$, then $x \prec z$
or
1
f(X2) + f(y) ~ -;xp(x + AY) = -p( -X2 - y)
$$|\phi(y)| \le M\|y\| \qquad (y \in Y)$$
Then $\phi$ has a linear extension defined on all of X and satisfying the
above inequality on X.
Example 5. Let $c_0$ denote the Banach space of all real sequences that converge
to zero, normed by putting $\|x\|_\infty = \sup_n |x(n)|$. Let $\ell_1$ denote the Banach
space of all real sequences u for which $\sum_{n=1}^{\infty} |u(n)| < \infty$, normed by putting
$\|u\|_1 = \sum_{n=1}^{\infty} |u(n)|$. With each $u \in \ell_1$ we associate a functional $\phi_u \in c_0^*$ by
means of the equation $\phi_u(x) = \sum_{n=1}^{\infty} u(n)x(n)$. (The connection between these
two spaces is the subject of the next result.) •
On the other hand, if $\varepsilon > 0$ is given, we can select N so that $\sum_{n=N+1}^{\infty} |u(n)| < \varepsilon$.
Then we define x by putting $x(n) = \operatorname{sgn} u(n)$ for $n \le N$, and by setting $x(n) = 0$
for $n > N$. Clearly, $x \in c_0$ and $\|x\|_\infty = 1$. Hence
$$\|\phi_u\| \ge \phi_u(x) = \sum_{n=1}^{N} x(n)u(n) = \sum_{n=1}^{N} |u(n)| > \|u\|_1 - \varepsilon$$
Next we show that A is surjective. Let $\psi \in c_0^*$. Let $\delta_n$ be the element of $c_0$
that has a 1 in the nth coordinate and zeros elsewhere. Then for any x,
$$x = \sum_{n=1}^{\infty} x(n)\delta_n$$
Consequently, if we put $u(n) = \psi(\delta_n)$, then $\psi(x) = \phi_u(x)$ and $\psi = \phi_u$. To verify
that $u \in \ell_1$, we define (as above) $x(n) = \operatorname{sgn} u(n)$ for $n \le N$ and $x(n) = 0$ for
$n > N$. Then
Proof. Let X be the space and Z the subset in question. Let Y be the closure
of the linear span of Z. If $Y \ne X$, let $x \in X \setminus Y$. Then by Corollary 2, there
exists $\phi \in X^*$ such that $\phi(x) = 1$ and $\phi \in Y^\perp$. Hence $\phi \in Z^\perp$ and $Z^\perp \ne 0$. If
$Y = X$, then any element of $Z^\perp$ annihilates the span of Z as well as Y and X.
Thus it must be the zero functional; i.e., $Z^\perp = 0$. •
Proof. This follows from Theorem 4 in Section 1.5, page 27, by letting $Y = \mathbb{R}$
in that theorem. •
Problems 1.6
8. The space $\ell_\infty$ consists of all bounded sequences, with norm $\|x\|_\infty = \sup_n |x(n)|$. Define
$T : \ell_\infty \to \ell_\infty$ by putting
Let M denote the range of T, and put $u = [1, 1, 1, \ldots]$. Prove that $\operatorname{dist}(u, M) = 1$.
9. Prove that there exists a continuous linear functional $\phi \in M^\perp$ such that $\|\phi\| = \phi(u) = 1$.
The functional $\phi$ is called a Banach limit, and is sometimes written LIM.
10. Prove that if $x \in \ell_\infty$ and $x \ge 0$, then $\phi(x) \ge 0$.
11. Prove that $\phi(x) = \lim_n x(n)$ when the limit exists.
12. Prove that if $y = [x(2), x(3), \ldots]$ then $\phi(x) = \phi(y)$.
13. Let $\ell_\infty$ denote the normed linear space of all bounded real sequences, with norm given
by $\|x\|_\infty = \sup_n |x(n)|$. Prove that $\ell_\infty$ is complete, and therefore a Banach space. Prove
that $\ell_1^* = \ell_\infty$, where the equality here really means isometrically isomorphic.
14. A hyperplane in a normed space is any translate of the null space of a continuous,
linear, nontrivial functional. Prove that a set is a hyperplane if and only if it is of the
form $\{x : \phi(x) = \lambda\}$, where $\phi \in X^* \setminus 0$ and $\lambda \in \mathbb{R}$. A translate of a set S in a vector
space is a set of the form $v + S = \{v + s : s \in S\}$.
15. A half-space in a normed linear space X is any set of the form $\{x : \phi(x) \le \lambda\}$, where
$\phi \in X^* \setminus 0$ and $\lambda \in \mathbb{R}$. Prove that for every x satisfying $\|x\| = 1$ there exists a half-space
such that x is on the boundary of the half-space and the unit ball is contained in the
half-space.
16. Prove that a linear functional $\phi$ is a linear combination of linear functionals $\phi_1, \ldots, \phi_n$ if
and only if $N(\phi) \supset \bigcap_{i=1}^{n} N(\phi_i)$. Here $N(\phi)$ denotes the null space of $\phi$. (Use induction
and trickery.)
17. Prove that a linear map transforms convex sets into convex sets.
18. Prove that in a normed linear space, the closure of a convex set is convex.
22. Let $f(z) = \sum_{n=0}^{\infty} a_n z^n$, where $[a_n]$ is a sequence of complex numbers for which $n a_n \to 0$.
Prove the famous theorem of Tauber that $\sum a_n$ converges if and only if $\lim_{z \to 1} f(z)$
exists. (See [DS], page 78.)
23. Do the vectors $\delta_n$ defined just after Corollary 4 form a fundamental set in the space $\ell_\infty$
consisting of bounded sequences with norm $\|x\|_\infty = \max_n |x(n)|$?
THREE EXERCISES (24-26) ON SCHAUDER BASES (See [Sem] and [Sing].)
24. A Schauder base (or basis) for a Banach space X is a sequence $[u_n]$ in X such that each
x in X has a unique representation
$$x = \sum_{n=1}^{\infty} \lambda_n u_n$$
This equation means, of course, that $\lim_{N \to \infty} \|x - \sum_{n=1}^{N} \lambda_n u_n\| = 0$. Show that one
Schauder base for $c_0$ is given by $u_n(m) = \delta_{nm}$ $(n, m = 1, 2, 3, \ldots)$.
25. Prove that the $\lambda_n$ in the preceding problem are functions of x and must be, in fact, linear
and continuous.
26. Prove that if the Banach space X possesses a Schauder base, then X must be separable.
That is, X must contain a countable dense set.
27. Prove that for any set A in a normed linear space all these sets are the same:
$A^\perp$, $(\text{closure } A)^\perp$, $(\text{span } A)^\perp$, $[\text{closure}(\text{span } A)]^\perp$.
29. Use the Axiom of Choice to prove that for any set S having at least 2 points there is a
function $f : S \to S$ that does not have a fixed point.
30. An interesting Banach space is the space c consisting of all convergent sequences. The
norm is $\|x\|_\infty = \sup_n |x(n)|$. Obviously, we have these set inclusions among the examples
encountered so far:
Prove that $c_0$ is a hyperplane in c. Identify in concrete terms the conjugate space $c^*$.
31. Prove that if H is a Hamel base for a normed linear space, then so is $\{h/\|h\| : h \in H\}$.
32. Let X and Y be linear spaces. Let H be a Hamel base for X. Prove that a linear map
from X to Y is completely determined by its values on H, and that these values can be
arbitrarily-assigned elements of Y.
33. Prove that on every infinite-dimensional normed linear space there exist discontinuous
linear functionals. (The preceding two problems can be useful here.)
34. Using Problem 33 and Problem 1.5.3, page 28, prove that every infinite-dimensional
normed linear space is the union of a disjoint pair of dense convex sets.
35. Let two equivalent norms be defined on a single linear space. (See Problem 1.4.3, page
23.) Prove that if the space is complete with respect to one of the norms, then it is
complete with respect to the other. Prove that this result fails (in general) if we assume
only that one norm is less than or equal to a constant multiple of the other.
36. Let Y be a subspace of a normed space X. Prove that there is a norm-preserving injective
map $J : Y^* \to X^*$ such that for each $\phi \in Y^*$, $J\phi$ is an extension of $\phi$.
38. Let T be a bounded linear map of $c_0$ into $c_0$. Show that T must have the form $(Tx)(n) =
\sum_{i=1}^{\infty} a_{ni}\, x(i)$ for a suitable infinite matrix $[a_{ni}]$. Prove that $\sup_n \sum_{i=1}^{\infty} |a_{ni}| = \|T\|$.
40. What implications exist among these four properties of a set S in a normed linear space
X? (a) S is fundamental in X; (b) S is linearly independent; (c) S is a Schauder base
for X; (d) S is a Hamel base for X.
41. A "spanning set" in a linear space is a set S such that each point in the space is a linear
combination of elements from S. Prove that every linear space has a minimal spanning
set.
42. Let $f : \mathbb{R} \to \mathbb{R}$. Define $x \prec y$ to mean $f(x) \le f(y)$. Under what conditions is this a
partial order or a total order?
43. Criticize the following "proof" that if X and Y are any two normed linear spaces, then
$X^* = Y^*$. We can assume that X and Y are subspaces of a third normed space Z.
(For example, we could use $Z = X \oplus Y$, a direct sum.) Clearly, $X^*$ is a subspace of
$Z^*$, since the Hahn-Banach Theorem asserts that an element of $X^*$ can be extended,
without increasing its norm, to Z. Clearly, $Z^*$ is a subspace of $Y^*$, since each element
of $Z^*$ can be restricted to become an element of $Y^*$. So, we have $X^* \subset Z^* \subset Y^*$. By
symmetry, $Y^* \subset X^*$. So $X^* = Y^*$.
44. Let K be a subset of a linear space X, and let $f : K \to \mathbb{R}$. Establish necessary and
sufficient conditions in order that f be the restriction to K of a linear functional on X.
45. For each a in a set A, let f(a) be a subset of $\mathbb{N}$. Without using the Axiom of Choice,
prove that f has a choice function.
Select any $x_1 \in X$ and let $r_1 > 0$. We want to prove that $S_1$ intersects $\bigcap_{n=1}^{\infty} O_n$.
Since $O_1$ is open and dense, $O_1 \cap S_1$ is open and nonvoid. Take $S_2' \subset S_1 \cap O_1$.
Section 1.7 The Baire Theorem and Uniform Boundedness 41
The points $x_n$ form a Cauchy sequence because $x_i, x_j \in S_n$ if $i, j > n$, and so
Since X is complete, the sequence $[x_n]$ converges to some point $x^*$. Since for
$i > n$,
$$x_i \in S_{n+1}' \subset S_1 \cap O_n$$
we can let $i \to \infty$ to conclude that $x^* \in S_{n+1}' \subset S_1 \cap O_n$. Since this is true for
all n, the set $\bigcap_{n=1}^{\infty} O_n$ does indeed intersect $S_1$. •
Proof. Let X be a complete metric space, and suppose that $X = \bigcup_{n=1}^{\infty} F_n$,
where each $F_n$ is a closed set having empty interior. The sets $O_n = X \setminus F_n$ are
open and dense. Hence by Baire's Theorem, $\bigcap_{n=1}^{\infty} O_n$ is dense. In particular, it
is nonempty. If $x \in \bigcap_{n=1}^{\infty} O_n$, then $x \in X \setminus \bigcup_{n=1}^{\infty} F_n$, a contradiction. •
A subset in a metric space X (or indeed in any topological space) is said to
be nowhere dense in X if its closure has an empty interior. Thus the set of
irrational points on the horizontal axis in $\mathbb{R}^2$ is nowhere dense in $\mathbb{R}^2$. A set that
is a countable union of nowhere dense sets is said to be of category I in X. A
set that is not of category I is said to be of category II in X.
Observe that all three of these notions are dependent on the space. Thus
one can have $E \subset X \subset Z$, where E is of category II in X and of category I in
Z. For a concrete example, the one in the preceding paragraph will serve.
The Corollary implies that if X is a complete metric space, then X is of the
second category in X.
Intuitively, we think of sets of the first category as being "thin," and those
of the second category as "fat." (See Problems 5, 6, 7, for example.)
Proof. Assume first that $c = \sup_\alpha \|A_\alpha\| < \infty$. Then every x satisfies $\|A_\alpha x\| \le
c\|x\|$, and every x belongs to the set $F = \{x : \sup_\alpha \|A_\alpha x\| < \infty\}$. Since $F = X$,
the preceding corollary implies that F is of the second category in X.
For the sufficiency, define
and assume that F is of the second category in X. Notice that $F = \bigcup_{n=1}^{\infty} F_n$.
Since F is of the second category, and each $F_n$ is a closed subset of X, the
definition of second category implies that some $F_m$ contains a ball. Suppose
that
$$B \equiv \{x \in X : \|x - x_0\| \le r\} \subset F_m \qquad (r > 0)$$
For any x satisfying $\|x\| \le 1$ we have $x_0 + rx \in B$. Hence
Example 1. Consider the familiar space C[0, 1]. We are going to show that
most members of C[0, 1] are not differentiable. Select a point $\xi$ in the open
interval (0, 1). For small positive values of h we define a linear functional $\phi_h$ by
the equation
It is elementary to prove that $\phi_h$ is linear and that $\|\phi_h\| = h^{-1}$. Consequently,
by the Banach-Steinhaus Theorem, the set of x such that $\sup_h |\phi_h(x)| < \infty$ is
of the first category. Hence the set of x for which $\sup_h |\phi_h(x)| = \infty$ is of the
second category in C[0, 1]. In other words, the set of functions in C[0, 1] that
are not differentiable at $\xi$ is of the second category in C[0, 1]. •
Example 2. The formal Fourier series of a function x is
$$\sum_{n=-\infty}^{\infty} \alpha_n(x)\, e^{int}$$
where
$$\alpha_n(x) = \frac{1}{2\pi} \int_0^{2\pi} x(s)\, e^{-ins}\,ds$$
It can be shown that the norm of $A_n$, considered as a map of $C_{2\pi}$ into itself,
is roughly $(4/\pi^2) \log n$. In fact, the norm of each functional $t^* \circ A_n$ has this
property. Recall from Problem 19 in Section 1.5 (page 29) that $t^*$ denotes
point-evaluation at t, so that
Since $\sup_n \|t^* \circ A_n\| = +\infty$, the set of x in $C_{2\pi}$ whose Fourier series diverge
at a specified point t is a set of the second category. Thus, for most periodic
continuous functions, the Fourier series do not converge. •
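The logarithmic growth quoted in this example can be observed numerically. The sketch below assumes that $A_n$ is the operator taking x to the nth symmetric partial sum of its Fourier series, whose norm is then the classical Lebesgue constant $\frac{1}{2\pi}\int_{-\pi}^{\pi} |D_n(t)|\,dt$ with $D_n(t) = \sin((n+\tfrac12)t)/\sin(t/2)$. The midpoint rule and the function name are choices made for this illustration only.

```python
import math

def lebesgue_constant(n, panels=20000):
    """Approximate (1/2pi) * integral over [-pi, pi] of |D_n(t)| by the midpoint rule.

    With an even number of panels no midpoint falls on t = 0,
    so the quotient below never divides by zero.
    """
    h = 2.0 * math.pi / panels
    total = 0.0
    for k in range(panels):
        t = -math.pi + (k + 0.5) * h
        total += abs(math.sin((n + 0.5) * t) / math.sin(0.5 * t))
    return total * h / (2.0 * math.pi)

for n in (1, 10, 100, 1000):
    # second column: computed norm; third column: the leading term (4/pi^2) log n
    print(n, lebesgue_constant(n), (4.0 / math.pi ** 2) * math.log(n))
```

The computed values keep increasing with n while staying within a bounded distance of $(4/\pi^2)\log n$, which is the sense in which the norms are "roughly" of that size and hence unbounded.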
Proof. If $A_n x \to 0$ for all x, then obviously $\sup_n \|A_n x\| < \infty$ for all x. Hence
$\sup_n \|A_n\| < \infty$, by the Principle of Uniform Boundedness.
For the other half of the theorem, assume that $\|A_n\| < M$ for all n and
that $A_n u \to 0$ for all u in a fundamental set F. It is elementary to prove that
$A_n y \to 0$ for all y in the linear span of F. Now let $x \in X$. Let $\varepsilon > 0$. Select y
in the linear span of F so that $\|x - y\| < \varepsilon/2M$. Select m so that $\|A_n y\| < \varepsilon/2$
whenever $n \ge m$. Then for $n \ge m$ we have
•
Example 3. The Riemann integral of a continuous function x defined on [a, b]
can be obtained as a limit as follows:
$$\int_a^b x(s)\,ds = \lim_{n \to \infty} \sum_{i=1}^{n} x\Big(a + i\,\frac{b-a}{n}\Big) \cdot \frac{b-a}{n}$$
Here it is necessary to assume that for each n, $\{s_{n1}, s_{n2}, \ldots, s_{nn}\}$ is a set of n
distinct points in [a, b]. We call these points the "nodes" of the functional $\phi_n$.
An old theorem of Szegő, presented next, concerns this example. •
Theorem 5. Let $\psi$ and $\phi_n$ be as in Equations (1) and (2) above.
In order that $\phi_n(x) \to \psi(x)$ for each $x \in C[a, b]$, it is necessary and
sufficient that these two conditions be fulfilled:
(i) $\sup_n \sum_{i=1}^{n} |A_{ni}| < \infty$
(ii) The convergence occurs for all the elementary monomial functions,
$s \mapsto s^k$, $k = 0, 1, 2, \ldots$.
Proof. Consider the sequence of functionals $[\psi - \phi_n]$. The norm of $\psi$ is
Next observe that the functions $e_k$ defined by the equation $e_k(s) = s^k$, where
$k = 0, 1, \ldots$, form a fundamental set in C[a, b], by the Weierstrass Polynomial
Approximation Theorem. Now apply the preceding theorem. •
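The two conditions of Theorem 5 can be checked by hand for the equally spaced Riemann sums of Example 3. The sketch below assumes the interval [0, 1], nodes $s_{ni} = i/n$, and weights $A_{ni} = 1/n$; the function names are illustrative. It confirms that the weight sums stay bounded and that the errors on the first few monomials shrink as n grows.

```python
def riemann_rule(n, a=0.0, b=1.0):
    """Nodes s_ni = a + i(b-a)/n and weights A_ni = (b-a)/n for i = 1..n."""
    h = (b - a) / n
    nodes = [a + i * h for i in range(1, n + 1)]
    weights = [h] * n
    return nodes, weights

def apply_rule(rule, x):
    """phi_n(x) = sum_i A_ni * x(s_ni)."""
    nodes, weights = rule
    return sum(w * x(s) for s, w in zip(nodes, weights))

def monomial_errors(k, sizes=(10, 100, 1000)):
    """|phi_n(e_k) - psi(e_k)| with e_k(s) = s^k and psi(e_k) = 1/(k+1) on [0, 1]."""
    exact = 1.0 / (k + 1)
    return [abs(apply_rule(riemann_rule(n), lambda s: s ** k) - exact) for n in sizes]

# Condition (i): the sums of |A_ni| are bounded (here each equals b - a = 1)
weight_sums = [sum(abs(w) for w in riemann_rule(n)[1]) for n in (10, 100, 1000)]

# Condition (ii): convergence on the monomials s -> s^k
errors = {k: monomial_errors(k) for k in range(4)}
print(weight_sums)
print(errors)
```

Only finitely many monomials and values of n are tested here, so this is a plausibility check of the hypotheses rather than a verification of the theorem.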
Problems 1.7
4. Is every set of the second category the complement of a set of the first category?
5. Prove that in a complete metric space, the complement of a set of the first category is
dense and of second category.
6. Prove that a closed, proper subspace in a normed linear space is nowhere dense (and
hence of first category).
7. Prove that in a Banach space, a subspace of second category must be dense.
8. Prove that in a Banach space every nonempty open set is of the second category. Prove
that this assertion is not true for normed linear spaces in general. (Give an example.)
9. Let $[x_n]$ be a sequence in a Banach space X. Assume that $\sup_n |\phi(x_n)| < \infty$ for each
$\phi \in X^*$. Prove that $[x_n]$ is bounded. Does X have to be complete for this? If so, give a
suitable example.
10. Determine the category of these sets: (a) the rationals in $\mathbb{R}$; (b) the irrationals in $\mathbb{R}$; (c) the
union of all vertical lines in $\mathbb{R}^2$ that pass through a rational point on the horizontal axis;
(d) the set of all polynomials in C[0, 1].
11. Does a homeomorphism (continuous map having a continuous inverse) preserve the cat-
egory of sets?
12. Give an example to show that a homeomorphic image of a complete metric space need
not be complete.
13. Prove that any subset of a set of the first category is also of the first category. Prove
that a set that contains a set of second category is also of second category.
14. Is the closure of a nowhere dense set also nowhere dense? Is the closure of a set of the
first category also of the first category?
15. For each natural number n, let $A_n$ be a continuous linear transformation of a Banach
space X into a normed linear space Y. Suppose that for each $x \in X$ the sequence $[A_n x]$
is convergent. Define A by the equation $Ax = \lim_{n \to \infty} A_n x$. Prove that A is linear and
continuous. Explain why completeness is needed.
16. Let X be the space of real sequences $x = [x(1), x(2), \ldots]$ in which only a finite number of
terms are nonzero. Give X the supremum norm. Define functionals $\phi_n$ by the equation
$\phi_n(x) = \sum_{i=1}^{n} x(i)$. Show that the sequence $[\phi_n(x)]$ is bounded for each x, that each
$\phi_n$ is continuous, but that the sequence $[\phi_n]$ is not bounded. (Compare to the Uniform
Boundedness Theorem.)
17. Prove that the set of reals whose decimal expansions do not contain the digit 7 is a set
of the first category.
18. Select a function $x_0 \in C[0, 1]$ and a sequence of reals $[a_n]$. Define recursively
n = 0, 1, ...
Assume that for each $t \in [0, 1]$ there is an n for which $x_n(t) = 0$. Prove that $x_0 = 0$.
19. Return to Problem 15, and suppose that Y is complete. Weaken the hypotheses on An
so that An is not necessarily linear and the set of x for which [Anx] converges is of the
second category. Prove that this set must be X and that A is continuous.
20. Let [An] be a sequence of continuous linear maps from one Banach space X to another.
Prove that the set of x for which [Anx] is a Cauchy sequence is either X or a set of first
category.
21. Prove that in a complete metric space a set of the first category has empty interior.
22. Prove that in a complete metric space, if a countable intersection of open sets is dense,
then it is of second category.
23. Give an example of a metric space having countably many points that contains no subset
of second category.
24. Prove that a set V is nowhere dense if and only if each nonempty open set has a nonempty
open subset that lies in the complement of V.
25. (Principle of Condensation of Singularities). For each n and m in $\mathbb{N}$, let $A_{nm}$ be a
bounded linear operator from a Banach space X into a normed linear space Y. Assume
that $\sup_m \|A_{nm}\| = \infty$ for each n. Prove that the set
Draw pictures of $[0, 1] \setminus A_1$, $[0, 1] \setminus (A_1 \cup A_2)$, and so on to see that we are successively
removing the middle thirds from intervals. Each $A_n$ is open, so $\bigcup A_n$ is open. Hence C
is closed. Prove that C is nowhere dense. Prove that the lengths of the removed intervals
add up to 1. Explain how there can be anything left in C. Prove that C is a "perfect
set," i.e., if $x \in C$, then $C \setminus \{x\}$ is not closed.
27. Prove this theorem: Let X be a complete metric space. Let $\{f_\alpha\}$ be a family of continuous
real-valued maps defined on X. Assume that for each x, $\sup_\alpha |f_\alpha(x)| < \infty$. Then for
some nonvoid open set O, $\sup_{x \in O} \sup_\alpha |f_\alpha(x)| < \infty$.
28. Prove that a countable union of sets of the first category is also a set of the first category.
29. Prove that a nowhere dense set is of the first category.
30. Is a countable set in a metric space necessarily a set of the first category?
31. Answer the question in Problem 30 for countable subsets of a normed linear space.
32. Prove that the sets Fn occurring in the proof of the Banach-Steinhaus Theorem are
closed.
33. In a complete metric space, is every nonempty open set of the second category?
34. A metric space (X, d) is said to be discrete if $d(x, y) = 1$ whenever $x \ne y$. In such a
space identify the nowhere dense sets, sets of first category, sets of second category, and
dense sets. (Cf. Problem 2.)
35. Can a normed linear space have any of the peculiar properties of discrete metric spaces?
36. Show that a countable discrete metric space can be embedded isometrically in the Banach
space Co.
37. Give an example of sets $S \subset F \subset X$, where X is a complete metric space, F is a closed
set in X, and S is of Category II in F but of Category I in X.
38. The intersection of a countable family of open sets is called a $G_\delta$-set. Prove that the set
of rationals is not a $G_\delta$-set in $\mathbb{R}$.
39. (Continuation) Let $f : \mathbb{R} \to \mathbb{R}$ be continuous. Show that each set $f^{-1}(r)$ is a $G_\delta$-set.
40. (Continuation) Let $f : \mathbb{R} \to \mathbb{R}$. Define
$$\omega(x) = \inf_{\varepsilon > 0}\ \sup_{\substack{|x-u| < \varepsilon \\ |x-v| < \varepsilon}} |f(u) - f(v)|$$
41. (Continuation) Prove that there is no function $f : \mathbb{R} \to \mathbb{R}$ that is continuous at each
rational point and discontinuous at each irrational point.
Section 1.8 Interior Mappings and Closed Mappings 47
48. Prove that a set E in a metric space X (or any topological space) is nowhere dense if
and only if $X \setminus \overline{E}$ is dense, where $\overline{E}$ denotes the closure of E.
49. In a metric space, is a singleton {x} always nowhere dense? Answer the same question
for a normed linear space.
50. Prove that if A is of the second category and B is of the first category, then $A \setminus B$ is of
the second category.
51. Is a countable intersection of sets of the second category necessarily a set of the second
category?
52. A subset of a metric space is called a residual set if its complement is of the first category.
Prove that the intersection of countably many residual sets is a residual set.
Proof. Since $x_n \in C^1[a, b]$, we have $x_n' \in C[a, b]$. Thus $y \in C[a, b]$, by Theorem
2 in Section 1.2, page 10. By the Fundamental Theorem of Calculus and the
continuity of integration,
$$\int_a^t y(s)\,ds = \int_a^t \lim_n x_n'(s)\,ds = \lim_n \int_a^t x_n'(s)\,ds$$
Proof. Let $L : X \twoheadrightarrow Y$, where L is linear and closed, and X and Y are Banach
spaces. (This double arrow signifies a surjection.) Let S be the open unit ball
in X or Y, depending on the context. Since L is surjective,
Since Y is complete, the Baire Theorem implies that one of the sets $\operatorname{cl}[L(nS)]$
has a nonempty interior. Suppose, then, that for some m in $\mathbb{N}$ and $r > 0$
$$v + rS \subset \operatorname{cl} L(mS)$$
there is a point $x_2 \in \delta_1 t S$ such that $\|y - Lx_1 - Lx_2\| < \delta_2$. We continue this
construction, obtaining a sequence $x_1, x_2, \ldots$ whose partial sums $z_n = x_1 + \cdots + x_n$
have the property $\|y - Lz_n\| < \delta_n$. Also, we have
$$\|z_{n+i} - z_n\| = \|x_{n+1} + \cdots + x_{n+i}\| < t\delta_n + \cdots + t\delta_{n+i-1} < t \sum_{j \ge n} \delta_j$$
$$y + (\delta/2t)S \subset L(U)$$
Proof. Let X be the space, and $N_1$, $N_2$ the two norms. The equivalence of
two norms is explained in Problem 1.4.3, page 23. Let I denote the identity map
acting from $(X, N_2)$ to $(X, N_1)$. Assume that the norms bear the relationship
$N_1 \le N_2$. Since $N_1(Ix) \le N_2(x)$, we see that I is continuous. By the preceding
corollary, $I^{-1}$ is continuous. Hence for some $\alpha$, $N_2(x) = N_2(I^{-1}x) \le \alpha N_1(x)$. •
By the preceding corollary, $N(x) \le \alpha\|x\|$ for some $\alpha$. Hence $\|Lx\| \le \alpha\|x\|$. •
Proof. Let $L : X \twoheadrightarrow Y$ be the bounded, linear, interior map. Assume that X
is a Banach space. By Problem 1.2.38 (page 14), it suffices to prove that each
absolutely convergent series in Y is convergent. Let $y_n \in Y$ and $\sum \|y_n\| < \infty$.
By Problem 2 (of this section), there exist $x_n \in X$ such that $Lx_n = y_n$ and (for
some $c > 0$) $\|x_n\| \le c\|y_n\|$. Then $\sum \|x_n\| \le c \sum \|y_n\| < \infty$. By Problem 1.2.3,
page 12, the series $\sum x_n$ converges. Since L is continuous and linear, $L(\sum x_n) =
\sum Lx_n = \sum y_n$, and the latter series is convergent. •
Let L be a bounded linear transformation from one normed linear space,
X, to another, Y. The adjoint of L is the map $L^* : Y^* \to X^*$ defined by
$L^*\phi = \phi \circ L$. Here $\phi$ ranges over $Y^*$. It is elementary to prove that $L^*$ is linear.
It is bounded because
In this equation ¢ ranges over functionals of norm 1 in Y*, and x ranges over
vectors of norm 1 in X. We used Corollary 4 on page 36.
In a finite-dimensional setting, an operator L can be represented by a matrix
A (which is not necessarily square). This requires the prior selection of bases
for the domain and range of L. The adjoint operator $L^*$ is represented by
the conjugate transpose matrix $A^*$. An elementary theorem asserts that A is
surjective ("onto") if and only if $A^*$ is injective ("one-to-one"). (See Problem 20.)
The situation in an infinite-dimensional space is only slightly more complicated,
as indicated in the next three theorems.
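The elementary matrix fact mentioned above can be tested on small examples. The sketch below restricts attention to real matrices, so that the adjoint is simply the transpose, and decides surjectivity and injectivity by computing ranks with exact rational arithmetic. The function names and the sample matrices are assumptions made for the illustration.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (list of rows) via Gaussian elimination over the rationals."""
    M = [[Fraction(v) for v in r] for r in rows]
    r = 0
    for c in range(len(M[0])):
        pivot = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, len(M)):
            factor = M[i][c] / M[r][c]
            M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == len(M):
            break
    return r

def transpose(rows):
    return [list(col) for col in zip(*rows)]

def is_surjective(A):
    """An m x n matrix maps onto R^m exactly when its rank is m."""
    return rank(A) == len(A)

def is_injective(A):
    """An m x n matrix has trivial null space exactly when its rank is n."""
    return rank(A) == len(A[0])

A = [[1, 2, 3],
     [4, 5, 6]]        # rank 2: surjective from R^3 onto R^2
B = [[1, 2],
     [2, 4]]           # rank 1: not surjective

for name, mat in (("A", A), ("B", B)):
    print(name, is_surjective(mat), is_injective(transpose(mat)))
```

In both cases the two answers agree, as the theorem predicts; for complex matrices the transpose would have to be replaced by the conjugate transpose, which does not change the rank.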
Proof. Recall the notation $U_\perp$ for the set $\{x \in X : \phi(x) = 0 \text{ for all } \phi \in U\}$,
where X is a normed linear space and U is a subset of $X^*$. (See Problems 1.6.20
and 1.6.21, on page 38, as well as Problem 13 in this section, page 52.) We
denote by R(L) the range of L. To prove $[\text{closure } R(L)] \subset [N(L^*)]_\perp$, let y be
an element of the set on the left. Then $y = \lim y_n$ for some sequence $[y_n]$ in
R(L). Write $y_n = Lx_n$ for appropriate $x_n$. To show that $y \in [N(L^*)]_\perp$ we must
prove that $\phi(y) = 0$ for all $\phi \in N(L^*)$. We have
Problems 1.8
1. Use the notation in the proof of the Interior Mapping Theorem. Show that a linear map
$L : X \to Y$ is interior if and only if $L(S) \supset rS$ for some $r > 0$.
2. Show that a linear map $L : X \twoheadrightarrow Y$ is interior if and only if there is a constant c such
that for each $y \in Y$ there is an $x \in X$ satisfying $Lx = y$, $\|x\| \le c\|y\|$.
3. Define $T : c_0 \to c_0$ by the equation $(Tx)(n) = x(n + 1)$. Which of these properties does
T have: injective, surjective, open, closed, invertible? Does T have either a right or a
left inverse?
4. Prove that a closed (and possibly nonlinear) map of one normed linear space into another
maps compact sets to closed sets.
5. Let L be a linear map from one Banach space into another. Suppose that the conditions
$x_n \to 0$ and $Lx_n \to y$ imply that $y = 0$. Prove that L is continuous.
6. Prove that if a closed map has an inverse, then the inverse is also closed.
7. Let M and N be closed linear subspaces in a Banach space. Define $L : M \times N \to M + N$
by writing L(x, y) = x + y. Prove that M + N is closed if and only if L is an interior
map.
8. Adopt the hypotheses of Problem 7. Prove that M + N is closed if and only if there is
a constant c such that each z E M + N can be written z = x + y where x E M, yEN,
and Ilxll + lIyll ~ cllzll·
9. Let L : X --+> Y be a continuous linear surjection, where X and Y are Banach spaces.
Let Yn --+ Y in Y. Prove that there exist points Xn E X and a constant e E IR such that
LXn = Yn, the sequence [XnJ converges, and IIxnll ~ ellYnll.
10. Recall the space f. defined in Example 8 of Section 1.1. Define L : l --+ l by (Lx)(n) =
nx(n). Use the sup-norm in l and prove that L is discontinuous, surjective, and closed.
11. Is the identity map from (C[-1, 1J, II II",,) into (C[-l, 1], II 111) an interior map? Is it
continuous?
12. (Continuation) Denote the two spaces in Problem 11 by X and Y, respectively. Let
nx Ixl ~ lin
9n(S) = { x/lxl Ixl > lin
Show that [9nJ is a Cauci1y sequence in Y. Since the space L1[-1, 1J is complete, 9n --+ 9
in L1. Since G is closed, 9 should be in G. But it is discontinuous. Explain.
13. Let X be a normed linear space, and let K C X and U C X*. Define
Describe the range of L and prove that it does not contain the function f(x) = et .
19. (Continuation) Draw the same conclusion as in Problem 18 by invoking the Closed Range
Theorem. Thus, find $\phi$ in the null space of $L^*$ such that $\phi(f) \ne 0$.
Section 1.9 Weak Convergence 53
20. For an $m \times n$ matrix A prove the equivalence of these assertions: (a) A* is injective. (b)
The null space of A* is 0. (c) The columns of A* form a linearly independent set. (d)
The rows of A form a linearly independent set. (e) The row space of A has dimension
m. (f) The column space of A has dimension m. (g) The column space of A is $\mathbb{R}^m$. (h)
The range of A is $\mathbb{R}^m$. (i) A is surjective, as a map from $\mathbb{R}^n$ to $\mathbb{R}^m$.
$$\phi(x) = \sum_{i=1}^{\infty} a(i)x(i)$$
for a suitable point $a \in \ell_1$. Thus $\phi(e_n) = a(n) \to 0$. The sequence $[x_n]$ does not
have the Cauchy property, because $\|x_n - x_m\| = 1$ when $n \ne m$. •
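A small computation illustrates the two facts just used. The sketch below is an illustration only: it truncates sequences to 200 coordinates, takes the unit vectors e_n as the sequence in question, and assumes the particular summable sequence a(i) = 1/i^2. The function names are hypothetical.

```python
def unit_vector(n, length):
    """e_n truncated to `length` coordinates (n is a 1-based index)."""
    return [1.0 if i == n else 0.0 for i in range(1, length + 1)]

def functional(a, x):
    """phi(x) = sum a(i) x(i), the action of a summable sequence a on x."""
    return sum(ai * xi for ai, xi in zip(a, x))

def sup_distance(x, y):
    """Sup-norm distance between two truncated sequences."""
    return max(abs(xi - yi) for xi, yi in zip(x, y))

LENGTH = 200
a = [1.0 / i**2 for i in range(1, LENGTH + 1)]   # assumed example of a summable a

# phi(e_n) = a(n) tends to 0 ...
values = [functional(a, unit_vector(n, LENGTH)) for n in (1, 10, 100)]
# ... while distinct unit vectors stay at sup-norm distance 1 from each other.
gap = sup_distance(unit_vector(3, LENGTH), unit_vector(7, LENGTH))
print(values, gap)
```

The values of the functional shrink with n, yet the mutual distances do not, which is exactly the difference between weak and strong convergence.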
Proof. Let X be the ambient space, and suppose that $x_n \rightharpoonup x$. Define functionals $\hat{x}_n$ on $X^*$ by putting
$$\hat{x}_n(\phi) = \phi(x_n) \qquad (\phi \in X^*)$$
For each $\phi$, the sequence $[\phi(x_n)]$ converges in $\mathbb{R}$; hence it is bounded. Thus
$\sup_n |\hat{x}_n(\phi)| < \infty$. By the Uniform Boundedness Theorem (page 42), applied in
the complete space $X^*$, $\|\hat{x}_n\| \le M$ for some constant M. Hence, for all n,
$$\|x_n\| = \sup\{|\hat{x}_n(\phi)| : \phi \in X^*,\ \|\phi\| \le 1\} \le M$$
$$\|x - x_n\| = \Big\|\sum_{i=1}^{k} \phi_i(x - x_n)b_i\Big\| \le \sum_{i=1}^{k} |\phi_i(x - x_n)|\,\|b_i\| \to 0$$
•
Theorem 2. If a sequence $[x_n]$ in a normed linear space converges
weakly to an element x, then a sequence of linear combinations of the
elements $x_n$ converges strongly to x.
Proof. Another way of stating the conclusion is that x belongs to the closed
subspace
$$Y = \operatorname{closure}(\operatorname{span}\{x_1, x_2, \ldots\})$$
If $x \notin Y$, then by Corollary 2 of the Hahn-Banach Theorem (page 34), there is
a continuous linear functional $\phi$ such that $\phi \in Y^\perp$ and $\phi(x) = 1$. This clearly
contradicts the assumption that $x_n \rightharpoonup x$. •
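Theorem 2 can be seen at work in a familiar case. The sketch below is an added illustration, not the author's: it assumes the unit vectors e_n in the space of square-summable sequences, which converge weakly to 0, and computes the norm of their averages (e_1 + ... + e_N)/N. These particular linear combinations have norm $N^{-1/2}$ and so converge strongly to 0.

```python
import math

def average_of_unit_vectors(count):
    """(e_1 + ... + e_count) / count, stored as a list of `count` coordinates."""
    return [1.0 / count] * count

def l2_norm(x):
    return math.sqrt(sum(t * t for t in x))

# The norm of the N-th average is N ** -0.5.
norms = {count: l2_norm(average_of_unit_vectors(count)) for count in (1, 100, 10000)}
print(norms)
```

Each e_n has norm 1, so the sequence itself does not converge strongly; only the averaged combinations do.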
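Proof. (The term "fundamental" was defined in Section 1.6, page 36.) Let
$\mathcal{F}$ be the fundamental subset of $X^*$ mentioned in the theorem. Let $\psi$ be any
member of $X^*$. We want to prove that $\psi(x_n) \to \psi(x_0)$. By hypothesis, there is a
constant M such that $\|x_i\| < M$ for i = 0, 1, 2, ... Given $\varepsilon > 0$, select $\phi_1, \ldots, \phi_m$
in $\mathcal{F}$ and scalars $\lambda_1, \ldots, \lambda_m$ such that
$$\Big\|\psi - \sum \lambda_i \phi_i\Big\| < \frac{\varepsilon}{3M}$$
Put $\phi = \sum \lambda_i \phi_i$. It is easily seen that $\phi(x_n) \to \phi(x_0)$. Select N so that for all
n > N we have the inequality $|\phi(x_n) - \phi(x_0)| < \varepsilon/3$. Then for n > N,
$$|\psi(x_n) - \psi(x_0)| \le |\psi(x_n) - \phi(x_n)| + |\phi(x_n) - \phi(x_0)| + |\phi(x_0) - \psi(x_0)|$$
$$\le \frac{\varepsilon}{3M}M + \frac{\varepsilon}{3} + \frac{\varepsilon}{3M}M = \varepsilon$$
•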
Example 2. Fix a real number p in the range $1 \le p < \infty$. The space $\ell_p$ is
defined to be the set of all real sequences x for which $\sum_{n=1}^{\infty} |x(n)|^p < \infty$. We
define a norm on the vector space $\ell_p$ by the equation
$$\|x\|_p = \Big(\sum_{n=1}^{\infty} |x(n)|^p\Big)^{1/p}$$
For $p = \infty$, we take $\ell_\infty$ to be the space of bounded sequences, with norm
$\|x\|_\infty = \sup_n |x(n)|$. We shall outline some of the theory of these spaces. (This
theory is actually included in the theory of the $L^p$ spaces as given in Chapter
8.) Notice that in these spaces there is a natural partial order: $x \le y$ means
that $x(n) \le y(n)$ for all n. We also define |x| by the equation |x|(n) = |x(n)|. •
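Hölder Inequality. Let $1 < p < \infty$, $1/p + 1/q = 1$, $x \in \ell_p$, and
$y \in \ell_q$. Then
$$\sum_{n=1}^{\infty} x(n)y(n) \le \|x\|_p \|y\|_q$$
$$\sum |x(n) + y(n)|^p \le \sum \{2\max[|x(n)|, |y(n)|]\}^p$$
$$= \sum 2^p \max\{|x(n)|^p, |y(n)|^p\}$$
$$\le 2^p \sum \{|x(n)|^p + |y(n)|^p\} < \infty$$
This proves that $x + y \in \ell_p$. Now let $1/p + 1/q = 1$ and observe that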
because
Thus, finally,
•
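Both inequalities of this section are easy to test numerically on finite sequences. The following sketch is an added illustration with hypothetical helper names; it uses the absolute-value form of the Hölder Inequality and the triangle inequality for the p-norm, and the sample vectors are arbitrary.

```python
def p_norm(x, p):
    """(sum |x(n)|^p)^(1/p) for a finite sequence x."""
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

def holder_gap(x, y, p):
    """||x||_p ||y||_q - sum |x(n) y(n)|, which is >= 0 when 1/p + 1/q = 1."""
    q = p / (p - 1.0)
    return p_norm(x, p) * p_norm(y, q) - sum(abs(a * b) for a, b in zip(x, y))

def minkowski_gap(x, y, p):
    """||x||_p + ||y||_p - ||x + y||_p, which is >= 0."""
    s = [a + b for a, b in zip(x, y)]
    return p_norm(x, p) + p_norm(y, p) - p_norm(s, p)

x = [1.0, -2.0, 0.5, 3.0]
y = [0.25, 4.0, -1.0, 2.0]
gaps = [(holder_gap(x, y, p), minkowski_gap(x, y, p)) for p in (1.5, 2.0, 3.0)]
print(gaps)
```

Every gap printed should be nonnegative, up to rounding error.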
Some theorems about these spaces are given here without proof.
A refinement of this theorem states that a convex set is closed if and only
if it is weakly sequentially closed. See [DS], page 422.
Now use the Cantor diagonalization process: Define $n_i$ to be the ith element
of $N_i$. We claim that $\lim_{i\to\infty} \phi_{n_i}(x_k)$ exists for each k. This is true because
$\lim_{n \in N_k} \phi_n(x_k)$ exists by construction, and if $i \ge k$, then $n_i \in N_i \subset N_k$. For any
$x \in X$ we write
This inequality shows that $[\phi_{n_i}(x)]$ has the Cauchy property in $\mathbb{R}$ for each $x \in X$.
Hence it converges to something that we may denote by $\phi(x)$. Standard
arguments show that $\phi \in X^*$. •
For Schur's Lemma, see [HP] page 37, or [Ban] page 137. The original
source is [Schu]. See also [Jam] page 288, or [Hol] page 149.
Problems 1.9
1. Show that the Hölder Inequality remains true if we replace the left-hand side by
$\sum |x(n)||y(n)|$.
2. If $1 < p < q$, what inclusion relation exists between $\ell_p$ and $\ell_q$?
3. Prove that if $x_n \rightharpoonup x$ and $\|x_n\| \to c$, then $\|x\| \le c$. Why can we not conclude that
$\|x\| = c$? Give examples. Explain in terms of weak continuity and weak semicontinuity
of the norm.
4. Fix p > 1 and define a nonlinear map T on $\ell_p$ by the equation $Tx = |x|^{p-1}\operatorname{sgn}(x)$.
Thus, $(Tx)(n) = |x(n)|^{p-1}\operatorname{sgn}(x(n))$ for all n. Prove that T maps $\ell_p$ into $\ell_q$, where
$1/p + 1/q = 1$. Then determine whether T is surjective.
5. Prove this theorem: In order that a sequence $[x_n]$ in a normed linear space X converge
weakly to an element x it is necessary and sufficient that the sequence be bounded and
that $\phi(x_n) \to \phi(x)$ for all functionals $\phi$ in a set that is dense on the surface of the unit
ball in $X^*$.
6. Prove this characterization of weak convergence in the space $c_0$: In order that a sequence
$x_n$ converge weakly to an element x in the space $c_0$ it is necessary and sufficient that the
sequence be bounded and that (for each i) we have $\lim_{n\to\infty} x_n(i) = x(i)$.
7. A Banach space X is said to be weakly complete if every sequence $[x_n]$ such that $\phi(x_n)$
converges for each $\phi$ in $X^*$ must converge weakly to an element x in X. Prove that the
space $c_0$ is not weakly complete.
The reader may wish to pause and prove that J is a linear isometry.
For an example of this embedding, let $X = c_0$; then $X^* = \ell_1$ and $X^{**} = \ell_\infty$.
In this case, $J : c_0 \to \ell_\infty$, and J can be interpreted as the identity embedding,
since $\phi(x) = \sum_{n=1}^{\infty} u(n)x(n)$ for an appropriate $u \in \ell_1$.
If the natural map of X into X** is surjective, we say that X is reflexive.
Thus if X is reflexive, it is isometrically isomorphic to X**. The converse is false,
however. A famous example of R.C. James exhibits an X that is isometrically
isomorphic to X**, but the isometry is not the canonical map J, and indeed the
canonical image of J(X) is a proper subspace of X** in the example. See [Ja2].
Proof. If $p^{-1} + q^{-1} = 1$, then $\ell_p^* = \ell_q$ and $\ell_q^* = \ell_p$ by Theorem 4 of Section
1.9, page 56. Hence $\ell_p^{**} = \ell_p$. But we must be sure that the isometry involved
in this statement is the natural one, J. Let $A : \ell_p \to \ell_q^*$ and $B : \ell_q \to \ell_p^*$ be
the isometries that have already been discussed in a previous section. Thus, for
example, if $x \in \ell_p$ then Ax is the functional on $\ell_q$ defined by
$$(Ax)(y) = \sum_{n=1}^{\infty} x(n)y(n)$$
$$B^*\phi = \phi \circ B$$
One of the problems asks for a proof of the fact that $B^*$ is an isometric isomorphism
of $\ell_p^{**}$ onto $\ell_q^*$. Thus $B^{*-1}A$ is an isometric isomorphism of $\ell_p$ onto $\ell_p^{**}$.
Now we wonder whether $B^{*-1}A = J$. Equivalent questions are these:
$B^{*-1}Ax = Jx$   ($x \in \ell_p$)
$Ax = B^*Jx$   ($x \in \ell_p$)
$(Ax)(y) = (B^*Jx)(y)$   ($x \in \ell_p$, $y \in \ell_q$)
$(Ax)(y) = (Jx)(By)$   ($x \in \ell_p$, $y \in \ell_q$)
$(Ax)(y) = (By)(x)$   ($x \in \ell_p$, $y \in \ell_q$)
The final assertion is true because both sides of the equation are by definition
$\sum_{n=1}^{\infty} x(n)y(n)$. •
Section 1.10 Reflexive Spaces 59
Proof. (Partial) Let X be reflexive, S its unit ball, and $[y_n]$ a sequence in
S. We wish to extract a subsequence $[y_{n_i}]$ such that $y_{n_i} \rightharpoonup y \in S$. To start,
let Y be the closure of the linear span of $\{y_1, y_2, \ldots\}$. Then Y is a closed and
separable subspace of X. By Theorem 2, Y is reflexive, and so $Y = Y^{**}$. Since
$Y^{**}$ is separable, so is $Y^*$. Let $\{\psi_1, \psi_2, \ldots\}$ be a countable dense set in $Y^*$. Since
$[\psi_1(y_n)]$ is bounded, there exists an infinite set $N_1 \subset \mathbb{N}$ such that $\lim_{n \in N_1} \psi_1(y_n)$
exists. Proceeding as we did in the proof of Theorem 10, Section 1.9, page 57, we
find a subsequence $y_{n_i}$ such that $\psi(y_{n_i})$ converges for all $\psi \in Y^*$. By a corollary
of the uniform boundedness theorem, there is an element f of $Y^{**}$ such that
$\psi(y_{n_i}) \to f(\psi)$ for all $\psi \in Y^*$. Since Y is reflexive, $f(\psi) = \psi(y)$ for some $y \in Y$.
Hence $\psi(y_{n_i}) \to \psi(y)$ for all $\psi \in Y^*$. Now if $\phi \in X^*$, then $\phi|Y \in Y^*$. Hence
Proof. (Partial) Suppose that X is reflexive. Let $\phi \in X^*$, and select $x_n \in X$
such that $\|x_n\| \le 1$ and $\phi(x_n) \to \|\phi\|$. By the Eberlein-Smulyan Theorem, there
is a subsequence $[x_{n_i}]$ that converges weakly to a point x satisfying $\|x\| \le 1$. By
the definition of weak convergence,
The converse is more difficult, and we refer the reader to [Hol], page 157. •
One application of the second conjugate space occurs in the process of com-
pletion. If X is a normed linear space that is not complete, can we embed it
linearly and isometrically as a dense set in a Banach space? If so, such a Banach
space is termed a completion of X. The Cantor method of completion of a
metric space is fully discussed in [KF]. The idea of that method is to create a
new metric space whose elements are Cauchy sequences in the original metric
space.
If X is a normed linear space, we can embed it, using the natural map
J, into its second conjugate space $X^{**}$. The latter is automatically complete.
Hence J(X) can be regarded as a completion of X. It can be proved that all
completions of X are isometrically isomorphic to each other.
The Lebesgue spaces $L^p[a, b]$ can be defined without knowing anything
about Lebesgue measure or integration. Here is how to do this. Consider the
space C[a, b] of all continuous real-valued functions on the interval [a, b]. For
$1 \le p < \infty$, we introduce the norm
$$\|x\|_p = \Big[\int_a^b |x(s)|^p\,ds\Big]^{1/p}$$
In this equation, the integration is with respect to the Riemann integral. The
space C[a, b], endowed with this norm, is denoted by $C_p[a, b]$. It is not complete.
Its completion is $L^p[a, b]$. Thus if J is the natural map of $C_p[a, b]$ into its second
conjugate space, then
Problems 1.10
1. Use the fact that $c_0^* = \ell_1$ and $\ell_1^* = \ell_\infty$ to prove that the successive conjugate spaces of
$c_0$ are all nonreflexive.
2. Find a sequence in the unit ball of $c_0$ that has no weakly convergent subsequence.
Chapter 2
Hilbert Spaces
2.1 Geometry 61
2.2 Orthogonality and Bases 70
2.3 Linear Functionals and Operators 81
2.4 Spectral Theory 91
2.5 Sturm-Liouville Theory 105
2.1 Geometry
Hilbert spaces are a special type of Banach space. In fact, the distinguishing
characteristic is that the Parallelogram Law is assumed to hold:
$$\|x + y\|^2 + \|x - y\|^2 = 2\|x\|^2 + 2\|y\|^2$$
This succinct description gives no hint of the manifold implications of that as-
sumption. The additional structure available in a Hilbert space makes it the
preferred domain for much of applied mathematics! We pursue a more tradi-
tional approach to the subject, not basing everything on the Parallelogram Law,
but using ideas that are undoubtedly already familiar to the reader, in particular
the dot product or inner product of vectors. An inner-product space is a
vector space X over the complex field in which an inner product (x, y) has been
defined. We require these properties, for all x, y, and z in X:
(1) (x, y) is a complex number
(2) $(x, y) = \overline{(y, x)}$ (complex conjugate)
(3) $(ax, y) = a(x, y)$, $a \in \mathbb{C}$
(4) $(x, x) > 0$ if $x \ne 0$
(5) (x + y, z) = (x, z) + (y, z)
The term "pre-Hilbert space" is also used for an inner-product space. Occa-
sionally, we will employ real inner-product spaces and real Hilbert spaces. For
them, the scalar field is $\mathbb{R}$, and the inner product is real-valued. However, some
theorems to be proved later are valid only in the complex case.
$$\Big(\sum_{i=1}^{n} x_i,\ y\Big) = \sum_{i=1}^{n} (x_i, y)$$
Proof. Only c and d offer any difficulty. For c, let $\|y\| = 1$ and write
$$0 \le (x - \lambda y, x - \lambda y) = (x, x) - \bar{\lambda}(x, y) - \lambda(y, x) + |\lambda|^2 (y, y).$$
Now let $\lambda = (x, y)$ to get $0 \le \|x\|^2 - |(x, y)|^2$. This establishes c in the case
$\|y\| = 1$. By homogeneity, this suffices. To prove d, we use c as follows:
$$\|x + y\|^2 = (x + y, x + y) = (x, x) + (y, x) + (x, y) + (y, y)$$
$$= \|x\|^2 + 2\mathcal{R}(x, y) + \|y\|^2 \le \|x\|^2 + 2|(x, y)| + \|y\|^2$$
$$\le \|x\|^2 + 2\|x\|\|y\| + \|y\|^2 = (\|x\| + \|y\|)^2$$
$$(x, y) = \int_0^1 x(t)y(t)\,dt$$
is not complete. Consider the sequence shown in Figure 2.1. The sequence
of functions has the Cauchy property, but does not converge to a continuous
function.
Figure 2.1
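The phenomenon can be reproduced numerically. The sketch below is an added illustration under stated assumptions: the interval is taken to be [0, 1], and the functions are hypothetical ramps that equal 0 up to t = 1/2 and climb linearly to 1 over a width 1/n; they need not be the functions drawn in Figure 2.1. The integral is approximated by the midpoint rule.

```python
import math

def ramp(n):
    """Continuous function: 0 up to 1/2, rising linearly to 1 over a width 1/n."""
    return lambda t: min(1.0, max(0.0, n * (t - 0.5)))

def l2_distance(f, g, steps=20000):
    """Midpoint-rule approximation of the L2[0, 1] distance between f and g."""
    h = 1.0 / steps
    total = sum((f((k + 0.5) * h) - g((k + 0.5) * h)) ** 2 for k in range(steps))
    return math.sqrt(total * h)

def sup_distance(f, g, steps=20000):
    """Largest difference between f and g on a uniform grid of [0, 1]."""
    h = 1.0 / steps
    return max(abs(f(k * h) - g(k * h)) for k in range(steps + 1))

# Distances between the n-th and 2n-th ramps in the two norms.
d_l2 = [l2_distance(ramp(n), ramp(2 * n)) for n in (4, 16, 64)]
d_sup = [sup_distance(ramp(n), ramp(2 * n)) for n in (4, 16, 64)]
print(d_l2, d_sup)
```

For these ramps the distance in the integral norm works out to $(12n)^{-1/2}$, which tends to 0, while the supremum distance stays near 1/2. The only candidate limit is a step function, which is not continuous.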
cover the given set with a sequence of open intervals $(a_n, b_n)$ whose total length
satisfies $\sum_n (b_n - a_n) < \varepsilon$. An important consequence is that if f is an element
of $L^2$, then f(x) is meaningless! Indeed, f stands for an equivalence class of
functions that can differ from each other at the point x, or indeed on any set of
points having measure 0. When f(x) appears under an integral sign, remember
that the x is dispensable: The integration operates on the function as a whole,
and no particular values f(x) are involved.
Example 5. Let $(S, \mathcal{A}, \mu)$ be any measure space. The notation $L^2(S)$ then
denotes the space of measurable complex functions on S such that $\int |f(s)|^2\,d\mu < \infty$.
In $L^2(S)$, define $(f, g) = \int f(s)\overline{g(s)}\,d\mu$. Then $L^2(S)$ is a Hilbert space. See
Theorem 3 in Section 8.7, page 411. •
Example 6. The space $\ell^2$ (or $\ell_2$) consists of all complex sequences x =
[x(1), x(2), ...] such that $\sum |x(n)|^2 < \infty$. The inner product is $(x, y) = \sum x(n)\overline{y(n)}$.
This is a Hilbert space, in fact a special case of Example 5. Just
take $S = \mathbb{N}$ and use "counting" measure. (This is the measure that assigns to
a set the number of elements in that set.) This example is also included in the
general theory of the spaces $\ell_p$, as outlined in Section 1.9, pages 54-56. •
For the uniqueness of the point Y, suppose that Yl and Y2 are points in K of
distance 0 from x. By the previous calculation we have
•
In an inner-product space, the notion of orthogonality is important. If
$(x, y) = 0$, we say that the points x and y are orthogonal to each other, and we
write $x \perp y$. (We do not say that the points are orthogonal, but we could say
that the pair of points is orthogonal.) If Y is a set, the notation $x \perp Y$ signifies
that $x \perp y$ for all $y \in Y$. If U and V are sets, $U \perp V$ means that $u \perp v$ for all
$u \in U$ and all $v \in V$.
Hence
$$2\mathcal{R}\{\bar{\lambda}(x - y, u)\} \le |\lambda|^2 \|u\|^2$$
If $(x - y, u) \ne 0$, then $u \ne 0$ and we can put $\lambda = (x - y, u)/\|u\|^2$ to get a
contradiction:
•
Definition. The orthogonal complement of a subset Y in an inner-product
space X is
$$Y^\perp = \{x \in X : (x, y) = 0 \text{ for all } y \in Y\}$$
(In this equation, $\mathcal{I}$ denotes "the imaginary part of.") Thus we have fully
established that (u + v, y) = (u, y) + (v, y). By induction, we can then prove that
(nx, y) = n(x, y) for all positive integers n. From this it follows, for any two
positive integers m and n, that
By continuity, we obtain $(\lambda x, y) = \lambda(x, y)$ for any $\lambda \ge 0$. From the definition,
we quickly verify that
Hence $(\lambda x, y) = \lambda(x, y)$ for all complex scalars $\lambda$. From the definition we obtain
= 4(x, y)
•
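The idea behind this proof, that the inner product can be recovered from the norm alone, can be checked on finite complex sequences. The sketch below is an added illustration; it assumes the inner product $(x, y) = \sum x(n)\overline{y(n)}$ and the standard polarization formula $(x, y) = \frac{1}{4}\sum_{k=0}^{3} i^k \|x + i^k y\|^2$, which may differ in form from the definition used in the text. The function names are hypothetical.

```python
def inner(x, y):
    """(x, y) = sum x(n) * conj(y(n)): linear in x, conjugate-linear in y."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def norm_sq(x):
    return inner(x, x).real

def polarization(x, y):
    """Recover (x, y) from norms alone: (1/4) * sum over k of i^k ||x + i^k y||^2."""
    total = 0j
    for k in range(4):
        unit = 1j ** k
        shifted = [a + unit * b for a, b in zip(x, y)]
        total += unit * norm_sq(shifted)
    return total / 4

x = [1 + 2j, -0.5j, 3.0 + 0j]
y = [2 - 1j, 1 + 1j, -1j]

# The Parallelogram Law holds for this norm ...
plus = [a + b for a, b in zip(x, y)]
minus = [a - b for a, b in zip(x, y)]
law_gap = norm_sq(plus) + norm_sq(minus) - 2 * norm_sq(x) - 2 * norm_sq(y)
# ... and the polarization formula returns the inner product.
print(law_gap, polarization(x, y), inner(x, y))
```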
Notice that when $\theta = 90°$, this equation gives the Pythagorean rule. In an
inner-product space, we consider a triangle as shown in Figure 2.2.
Figure 2.2 (a triangle with vertices 0, x, and y)
We have
$$\theta = \operatorname{Arccos} \frac{\mathcal{R}(x, y)}{\|x\|\|y\|}$$
The "principal value" of Arccos is used; it is an angle in the interval $[0, \pi]$. Is
the definition proper? Yes, because the number $\mathcal{R}(x, y)\|x\|^{-1}\|y\|^{-1}$ lies in the
interval [-1, 1], by the Cauchy-Schwarz inequality. Other definitions for the
angle between two vectors can be given. See [Ar], pages 87-90.
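The definition of the angle translates directly into a few lines of code. The sketch below is an added illustration for finite complex sequences with the inner product $\sum x(n)\overline{y(n)}$; the names inner and angle are hypothetical, and the ratio is clamped to [-1, 1] only to guard against rounding error.

```python
import math

def inner(x, y):
    """(x, y) = sum x(n) * conj(y(n)) for finite complex sequences."""
    return sum(complex(a) * complex(b).conjugate() for a, b in zip(x, y))

def angle(x, y):
    """Arccos of Re(x, y) / (||x|| ||y||), principal value in [0, pi]."""
    ratio = inner(x, y).real / math.sqrt(inner(x, x).real * inner(y, y).real)
    return math.acos(max(-1.0, min(1.0, ratio)))   # clamp against rounding

right = angle([1, 0], [0, 1])          # orthogonal vectors
eighth = angle([1, 1], [1, 0])         # 45 degrees
opposite = angle([1j, 2], [-1j, -2])   # y = -x
print(right, eighth, opposite)
```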
There are many sources for the theory of Hilbert spaces. In addition to the
references indicated at the end of Section 1.1, there are these specialized texts:
[AG], [Ar], [Berb], [Berb2], [DM], [Hal2], [Hal3], [St], and [Youn].
Problems 2.1
18. In an inner-product space, prove that if $\|x_n\| \to \|y\|$ and $(x_n, y) \to \|y\|^2$, then $x_n \to y$.
19. Prove or disprove: In a Hilbert space, if $\sum_{n=1}^{\infty} \|x_n\|^2 < \infty$, then the series $\sum_{n=1}^{\infty} x_n$
converges.
20. Find all solutions to the equation (x, a)c = b, assuming that a, b, and c are given vectors
in an inner-product space.
21. Indicate how the equation Ax = b can be solved if the operator A is defined by
$Ax = \sum_{i=1}^{n} (x, a_i)c_i$. Describe the set of all solutions.
24. Let v = [v(1), v(2), ...] be an element of $\ell^2$. Prove that the set $\{x \in \ell^2 : |x(n)| \le |v(n)|$
for all n$\}$ is compact in $\ell^2$.
25. Prove that if M is a closed subspace in a Hilbert space, then $M^{\perp\perp} = M$.
26. Prove that if $M = M^{\perp\perp}$ for every closed linear subspace in an inner-product space, then
the space is complete.
27. Prove that if M and N are closed subspaces of a Hilbert space and if $M \perp N$, then
M + N is closed.
28. Consider the mapping A in Problem 21. Find necessary and sufficient conditions on $a_i$
and $c_i$ in order that A have a fixed point other than 0.
29. In a Hilbert space, elements w, $u_i$, and $v_i$ are given. Show how to find an x such that
$$x = w + \sum_{i=1}^{n} (x, v_i)u_i$$
30. In a Hilbert space, let $\|x_n\| \to c$, $\|y_n\| \to c$, and $(x_n, y_n) \to c^2$. Prove that $\|x_n - y_n\| \to 0$.
Then make two generalizations. Is there any similar result for unbounded sequences?
32. Let K be a closed convex set in a Hilbert space X. Let $x \in X$ and let y be the point of K
closest to x. Prove that $\mathcal{R}(x - y, v - y) \le 0$ for all $v \in K$. Interpret this as a separation
theorem, i.e., an assertion about a hyperplane and a convex set. Prove the converse.
34. The Banach space $\ell_1$ consists of sequences $[x_1, x_2, \ldots]$ for which $\sum |x_n| < \infty$. The norm
is defined to be $\|x\| = \sum |x_n|$. Prove that $\ell_1$ is dense in $\ell_2$, and explain why this does
not contradict the fact that $\ell_1$ is complete.
37. Find the necessary and sufficient conditions on the complex numbers $w_1, w_2, \ldots, w_n$ in
order that the equation
$$(x, y) = \sum_{k=1}^{n} x(k)\overline{y(k)}w_k$$
39. Let K be a closed convex set in a Hilbert space X. For each x in X, let Px be the point
of K closest to x. Prove that $\|Px - Py\| \le \|x - y\|$. (Cf. Problems 2.2.24, 2.1.32.)
40. (Continuation) Prove that each closed convex set K in a Hilbert space X is a "retract",
i.e., the identity map on K has a continuous extension mapping X onto K.
41. Let F and G be two maps (not assumed to be linear or continuous) of an inner product
space X into itself. Suppose that for all x and y in X, (F(x), y) = (x, G(y)). Prove that
if a sequence $x_n$ converges to x, and $G(x_n)$ converges to y, then y = G(x). Prove also
that F(0) = G(0) = 0.
This theorem has a counterpart for orthogonal sets that are not finite, but
its meaning will require some explanation. What should we mean by the sum
of the elements in an arbitrary subset A in X? If A is finite, we know what is
meant. For an infinite set, we shall say that the sum of the elements of A is s if
and only if the following is true: For each positive $\varepsilon$ there exists a finite subset
$A_0$ of A such that for every larger finite subset F we have
$$\Big\|\sum_{x \in F} x - s\Big\| < \varepsilon$$
When we say "larger set" we mean only that $A_0 \subset F \subset A$. Notice that the
definition employs only finite subsets of A. For the reader who knows all about
Section 2.2 Orthogonality and Bases 71
We let $n \to \infty$ to get $(u, s_m) = \sum_{i=1}^{m} \|x_i\|^2$. Then let $m \to \infty$ to get $(u, x) = \lambda$,
where $x = \lim s_m$. It follows that x = u, because
Proof. Let $y = \sum_{i=1}^{n} (x, y_i)y_i$. By Theorem 3 in Section 2.1, page 65, it suffices
to verify that $x - y \perp Y$. For this it is enough to verify that x - y is orthogonal
to each basis vector $y_k$. We have
$$(x - y, y_k) = (x, y_k) - \Big(\sum_i (x, y_i)y_i,\ y_k\Big) = (x, y_k) - \sum_i (x, y_i)(y_i, y_k) = (x, y_k) - (x, y_k) = 0$$
•
The vector y in the above proof is called the orthogonal projection of x onto
Y. The coefficients $(x, y_i)$ are called the (generalized) Fourier coefficients of
x with respect to the given orthonormal system. The operator that produces
y from x is called an orthogonal projection or an orthogonal projector.
Look ahead to Theorem 7 for a further discussion.
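The recipe for the orthogonal projection is short enough to try out. The following sketch is an added illustration in real three-dimensional space with a hypothetical orthonormal pair; it forms the Fourier coefficients, assembles the projection, and leaves the residual x - y available for an orthogonality check.

```python
import math

def inner(x, y):
    """Real inner product of two finite sequences."""
    return sum(a * b for a, b in zip(x, y))

def project(x, basis):
    """Return (y, coeffs), where y = sum of (x, y_i) y_i over an orthonormal list."""
    coeffs = [inner(x, b) for b in basis]
    y = [sum(c * b[k] for c, b in zip(coeffs, basis)) for k in range(len(x))]
    return y, coeffs

s = 1 / math.sqrt(2)
basis = [[s, s, 0.0], [0.0, 0.0, 1.0]]   # assumed orthonormal pair
x = [3.0, 1.0, 2.0]

y, coeffs = project(x, basis)
residual = [a - b for a, b in zip(x, y)]
print(y, residual, [inner(residual, b) for b in basis])
```

Here y is (2, 2, 2) up to rounding, the residual (1, -1, 0) is orthogonal to both basis vectors, and the sum of the squared coefficients, 12, does not exceed the squared norm of x, 14.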
Proof. For j ranging over a finite subset J of I, let $y = \sum (x, u_j)u_j$. This vector
y is the orthogonal projection of x onto the subspace $U = \operatorname{span}[u_j : j \in J]$. By
Theorem 3, $x - y \perp U$. Hence by the Pythagorean Law
This proves our result for any finite set of indices. The result for I itself now
follows from Problem 4. •
jEJn jEJn
$$\{i : (x, u_i) \ne 0\} = \bigcup_{n=1}^{\infty} J_n$$
we see that this set must be countable, it being a union of countably many finite
sets. •
Let X be any inner-product space. An orthonormal basis for X is any
maximal orthonormal set in X. It is also called an "orthonormal base." In this
context, "maximal" means not properly contained in another orthonormal set.
In other words, it is a maximal element in the partially ordered family of all
orthonormal sets, when the partial order is set inclusion, $\subset$. (Refer to Section
1.6, page 31, for a discussion of partially ordered sets.)
To see that this is actually an orthonormal base, use the preceding theorem, in
particular the equivalence of a and b. Suppose $x \in \ell^2$ and $(x, u_n) = 0$ for all n.
Then x(n) = 0 for all n, and x = 0. •
Example 2. An orthonormal basis for $L^2[0, 1]$ is provided by the functions
$u_n(t) = e^{2\pi i n t}$, where $n \in \mathbb{Z}$. One verifies the orthonormality by computing the
appropriate integrals. To show that $[u_n]$ is a base, we use Part b of Theorem 6.
Let $x \in L^2[0, 1]$ and $x \ne 0$. It is to be shown that $(x, u_n) \ne 0$ for some n. Since
the set of continuous functions is dense in $L^2$, there is a continuous y such that
$\|x - y\| < \|x\|/5$. Then $\|y\| \ge \|x\| - \|x - y\| > \frac{4}{5}\|x\|$. By the Weierstrass
Approximation Theorem, the linear span of $[u_n]$ is dense in the space C[0, 1],
furnished with the supremum norm. Select a linear combination p of $[u_n]$ such
that $\|p - y\|_\infty < \|x\|/5$. Then $\|p - y\| < \|x\|/5$. Hence $\|p\| \ge \|y\| - \|y - p\| > \frac{3}{5}\|x\|$. Then
i. Py = y for all $y \in Y$. Thus $P|Y = I_Y$.
j. $\|x\|^2 = \|Px\|^2 + \|x - Px\|^2$
Proof. This is left to the problems.
•
The Gram-Schmidt process, familiar from the study of linear algebra, is an
algorithm for producing orthonormal bases. It is a recursive process that can be
applied to any linearly independent sequence in an inner-product space, and it
yields an orthonormal sequence, as described in the next theorem.
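For readers who want to experiment, here is a minimal sketch of the process for real finite-dimensional vectors. It is an added illustration, not the formulation of the theorem that follows: it uses the "modified" variant, in which each projection is subtracted from the running remainder, and the function names and the dependence tolerance 1e-12 are assumptions.

```python
import math

def dot(x, y):
    """Real inner product of two finite sequences."""
    return sum(a * b for a, b in zip(x, y))

def gram_schmidt(vectors, inner):
    """Orthonormalize a linearly independent list with respect to `inner`."""
    ortho = []
    for v in vectors:
        w = list(v)
        for u in ortho:
            c = inner(w, u)                       # component of w along u
            w = [a - c * b for a, b in zip(w, u)]
        length = math.sqrt(inner(w, w))
        if length < 1e-12:
            raise ValueError("vectors are linearly dependent")
        ortho.append([a / length for a in w])
    return ortho

vs = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
us = gram_schmidt(vs, dot)
gram = [[dot(a, b) for b in us] for a in us]      # should be the identity matrix
print(us)
```

Passing the inner product as an argument lets the same routine be reused with other inner products, for instance a weighted sum.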
$$\lim_{T\to\infty} \frac{1}{2T}\int_{-T}^{T} u_\lambda(t)\overline{u_\sigma(t)}\,dt = \lim_{T\to\infty} \frac{1}{2T}\int_{-T}^{T} e^{i(\lambda - \sigma)t}\,dt$$
(1)
(Here we are using the notion of unordered sum as defined previously.) This
construction is familiar in certain cases. For example, if S = {1, 2, ..., n}, then
the space X just constructed is the familiar space $\mathbb{C}^n$. On the other hand, if
$S = \mathbb{N}$, then X is the familiar space $\ell^2$. In the space X, addition and scalar
multiplication are already defined, since $X \subset \mathbb{C}^S$. Naturally, we define the inner
product by
$$f = \sum_{k=0}^{\infty} (f, p_k)p_k$$
$$f(t) = \sum_{k=0}^{\infty} (f, p_k)p_k(t)$$
Consult [Davis] or [Sz] for the conditions on f that guarantee uniform conver-
gence of the series to f.
Problems 2.2
Prove that every periodic function is almost periodic, and that the sum of two almost
periodic functions is almost periodic. Refer to [Bes] and [Tay2] for further information.
2. Prove Theorem 7.
3. Prove Theorem 8. (Theorem 7 will help.)
4. Let $x : I \to \mathbb{R}_+$, where I is some index set. Suppose that there is a number M such that
$\sum[x_j : j \in J] \le M$ for every finite subset J in I. Prove that $\sum[x_i : i \in I]$ exists and
does not exceed M. What happens if we drop the hypothesis $x_j \ge 0$?
5. Prove that the set of functions $\{u_\lambda : \lambda \in \mathbb{R}\}$, defined in Example 3, is linearly independent.
7. Prove that the functions $u_n(t) = e^{int}$ ($n = 0, \pm 1, \pm 2, \ldots$) form an orthonormal system
with respect to the inner product
$$(x, y) = \frac{1}{2\pi}\int_{-\pi}^{\pi} x(t)\overline{y(t)}\,dt$$
$$(x, y) = \frac{1}{\pi}\int_{-\pi}^{\pi} x(t)y(t)\,dt$$
10. Let $v_1, v_2, \ldots$ be a sequence in a Hilbert space X such that $\operatorname{span}\{v_1, v_2, \ldots\} = X$. Show
that X is finite dimensional.
11. Prove that any orthonormal set in an inner product space can be enlarged to form an
orthonormal basis.
12. Let D be the open unit disk in the complex plane. The space $H^2(D)$ is defined to be the
space of functions f analytic in D and satisfying $\int_D |f(z)|^2\,dz < \infty$. In $H^2(D)$ we define
$(f, g) = \int_D f(z)\overline{g(z)}\,dz$. Prove that the functions $u_n(z) = z^n$ (n = 0, 1, 2, ...) form an
orthogonal system in $H^2(D)$. What is the corresponding orthonormal sequence?
14. Prove that if $\{v_1, v_2, \ldots\}$ is linearly independent, then an orthogonal system can be
constructed from it by defining $u_1 = v_1$ and
$$u_n = v_n - \sum_{j=1}^{n-1} \frac{(v_n, u_j)}{(u_j, u_j)}\,u_j$$
15. Illustrate the process in Problem 14 with the four vectors $v_0, v_1, v_2, v_3$, where $v_j(t) = t^j$
and the inner product is defined by $(x, y) = \int_{-1}^{1} x(t)y(t)\,dt$.
16. Let $[u_i : i \in I]$ be an orthonormal basis for a Hilbert space X. Let $[v_i : i \in I]$ be an
orthonormal set satisfying $\sum_i \|u_i - v_i\|^2 < 1$. Show that $[v_i]$ is also a basis for X.
17. Where does the proof of Theorem 6 fail if X is an incomplete inner-product space? Which
equivalences remain true?
18. Prove that if P is the orthogonal projection of a Hilbert space X onto a closed subspace
Y, then I - P is the orthogonal projection of X onto $Y^\perp$.
19. (Cf. Problem 12.) Let $\Gamma$ be the unit circle in the complex plane. For functions continuous
on $\Gamma$ define $(f, g) = -i\int_\Gamma f(z)\overline{g(z)}\,\bar{z}\,dz$. Prove that this is an inner product and that the
functions $z^n$ form an orthogonal family.
20. Prove that an orthogonal projection P has the property that $(Px, x) = \|Px\|^2$ for all x.
21. Let $[u_n]$ be an orthonormal sequence in an inner product space. Let $[\theta_n] \subset \mathbb{C}$ and
$\sum_{n=1}^{\infty} |\theta_n|^2 < \infty$. Show that the sequence of vectors $y_n = \sum_{j=1}^{n} \theta_j u_j$ has the Cauchy
property.
22. Let $[u_1, u_2, \ldots, u_n]$ be an orthonormal set in an inner product space X. What choice of
coefficients $\lambda_j$ makes the expression $\|x - \sum_{j=1}^{n} \lambda_j u_j\|$ a minimum? Here x is a prescribed
point in X.
23. Define $p_n(t) = \frac{d^n}{dt^n}(t^2 - 1)^n$ for n = 0, 1, 2, ... Prove the orthogonality of $\{p_n : n \in \mathbb{N}\}$
with respect to the inner product $(x, y) = \int_{-1}^{1} x(t)y(t)\,dt$.
24. If K is a closed convex set in a Hilbert space X, there is a well-defined map $P : X \to K$
such that $\|x - Px\| = \operatorname{dist}(x, K)$ for all x. Which properties (a), ..., (j) in Theorem 7
does this mapping have? (Cf. Problem 2.1.39, page 70.)
25. Consider the real Hilbert space $X = L^2[-\pi, \pi]$, having its usual inner product,
$(x, y) = \int_{-\pi}^{\pi} x(t)y(t)\,dt$. Let U be the subspace of even functions in X; these are functions such
that u(-t) = u(t). Let V be the subspace of odd functions, v(-t) = -v(t). Prove that
X = U + V and that $U \perp V$. Prove that the orthogonal projection of X onto U is given
by Px = u, where $u(t) = \frac{1}{2}[x(t) + x(-t)]$. Find the orthogonal projection $Q : X \to V$.
Give orthonormal bases for U and V, and express P and Q in terms of them.
26. Let $[e_n]$ be an orthonormal sequence in a Hilbert space. Let M be the linear span of this
sequence. Prove that the closure of M is
27. Let $[e_n : n \in \mathbb{N}]$ be an orthonormal basis in a Hilbert space. Let $[\alpha_n]$ be a sequence
in $\mathbb{C}$. What are the precise conditions under which we can solve the infinite system of
equations $(x, e_n) = \alpha_n$ ($n \in \mathbb{N}$)?
28. Find orthonormal bases for the Hilbert spaces in Examples 3 and 4.
29. What are necessary and sufficient conditions in order that an orthogonal set be linearly
independent?
Prove that the map $a \mapsto \sum a_n u_n$ is an isometry of $\ell^2$ onto Y. Prove that Y is a closed
subspace in X.
32. An indexed set $[u_i : i \in I]$ in a Hilbert space is said to be stable if there exist positive
constants A and B such that
whenever $a \in \ell^2(I)$. Prove that a stable family is linearly independent. Prove that every
orthonormal family is stable.
34. (Continuation) Let $[u_i : i \in I]$ be a stable family. Let $a : I \to \mathbb{C}$. Prove that these
properties of a are equivalent: (1) $\sum |a_i|^2 < \infty$; (2) $\sum a_i u_i$ converges; (3) $\sum a_i (x, u_i)$
converges for each x in the Hilbert space.
37. Let $[x_1, x_2, \ldots, x_n]$ be an ordered set in an inner-product space. Assume that it is
orthogonal in this sense: If $x_i \ne x_j$, then $(x_i, x_j) = 0$. Show by an example that the
Pythagorean law in Theorem 1 may fail.
38. (Direct sums of Hilbert spaces). For n = 1, 2, 3, ... let $X_n$ be a Hilbert space over the
complex field. The direct sum of these spaces is denoted by $\bigoplus_{n=1}^{\infty} X_n$, and its elements
are sequences $[x_n : n \in \mathbb{N}]$, where $x_n \in X_n$ and $\sum_{n=1}^{\infty} \|x_n\|^2 < \infty$. Show how to make
this space into a Hilbert space and prove the completeness.
39. This problem gives a pair of closed subspaces whose sum is not closed. Let X be an
infinite-dimensional Hilbert space, and let $\{u_n\}$ be an orthonormal sequence in X. Put
$$v_n = u_{2n} \qquad w_n = u_{2n+1} \qquad z_n = \frac{1}{n}v_n + \frac{\sqrt{n^2 - 1}}{n}w_n$$
Let W and Z denote the closed linear spaces generated by $\{w_n\}$ and $\{z_n\}$. Prove that
40. Prove that an orthonormal set in a separable Hilbert space can have at most a countable
number of elements. Hint: Consider the open balls of radius $\frac{1}{2}$ centered at the points in
the orthonormal set.
41. Let $[u_n]$ be an orthonormal base in a Hilbert space. Define $v_n = 2^{-1/2}(u_{2n} + u_{2n+1})$.
Prove that $[v_n]$ is orthonormal. Define another sequence $[w_n]$ by the same formula,
except + is replaced by -. Show that the v-sequence and the w-sequence together
provide an orthonormal basis for the space.
42. Let X and Y be measure spaces, and $f \in L^2(X \times Y)$. Let $[u_i]$ be an orthonormal basis
for $L^2(X)$. Prove that for suitable $v_i \in L^2(Y)$, we have $f(x, y) = \sum u_i(x)v_i(y)$.
Recall from Section 1.5, page 24, that a linear functional on a vector
space X is a mapping ¢ from X into the scalar field such that for vectors x, y
and scalars a, b,
¢(ax + by) = a¢(x) + b¢(y)
If the space X has a norm, and if
$$(1) \qquad \sup\{|\phi(x)| : x \in X,\ \|x\| \le 1\} < \infty$$
we say that $\phi$ is bounded, and we denote by $\|\phi\|$ the supremum in the inequality
(1). (Boundedness is equivalent to continuity, by Theorem 2 on page 25.)
The bounded linear functionals on a Hilbert space have a very simple form,
as revealed in the following important result.
Proof. Let X be the Hilbert space, and $\phi$ a continuous linear functional. Define
$Y = \{x \in X : \phi(x) = 0\}$. (This is the null space or kernel of $\phi$.) If Y = X,
then $\phi(x) = 0$ for all x and $\phi(x) = (x, 0)$. If $Y \ne X$, then let $0 \ne u \in Y^\perp$. (Use
Theorem 4 in Section 2.1, page 65.) We can assume that $\phi(u) = 1$. Observe
that $X = Y \oplus \mathbb{C}u$, because $x = x - \phi(x)u + \phi(x)u$, and $x - \phi(x)u \in Y$. Define
$v = u/\|u\|^2$. Then
$$(x, v) = (x - \phi(x)u, v) + (\phi(x)u, v) = \phi(x)(u, v) = \phi(x)(u, u)/\|u\|^2 = \phi(x)$$
•
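The construction in this proof can be carried out explicitly in a finite-dimensional complex space. The sketch below is an added illustration: the functional phi and the coefficient list c are hypothetical, the inner product is $(x, y) = \sum x(k)\overline{y(k)}$, and the vector spanning the orthogonal complement of the kernel is supplied by hand as the conjugate of c.

```python
def inner(x, y):
    """(x, y) = sum x(k) * conj(y(k)): linear in x, conjugate-linear in y."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def riesz_vector(phi, normal):
    """Follow the proof: scale `normal` (orthogonal to the kernel of phi) so that
    phi(u) = 1, then return v = u / ||u||^2."""
    scale = phi(normal)
    u = [a / scale for a in normal]
    norm_sq = inner(u, u).real
    return [a / norm_sq for a in u]

c = [2 + 1j, -1j, 0.5 + 0j]                      # assumed coefficients of phi

def phi(x):
    return sum(ck * xk for ck, xk in zip(c, x))

normal = [ck.conjugate() for ck in c]            # orthogonal to every x with phi(x) = 0
v = riesz_vector(phi, normal)

x = [1 - 1j, 3 + 0j, 2j]
print(phi(x), inner(x, v))
```

In this setting the representing vector comes out as the conjugate of c, and phi(x) agrees with (x, v) for the sample x.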
$$x = \sum_{j=1}^{n} (x, v_j)u_j \qquad (x \in X)$$
Since $u_i = \sum_{j=1}^{n} (u_i, v_j)u_j$, we must have $(u_i, v_j) = \delta_{ij}$. In this situation, we say
that the two sets $[u_1, u_2, \ldots, u_n]$ and $[v_1, v_2, \ldots, v_n]$ are mutually biorthogonal
or that they form a biorthogonal pair. See [Brez]. •
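A biorthogonal pair is easy to exhibit in the plane. The sketch below is an added illustration for real two-dimensional space with a hypothetical function name; it computes the pair by the 2 x 2 inverse formula and then verifies the expansion of a sample vector.

```python
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def dual_pair_2d(u1, u2):
    """Return v1, v2 with (u_i, v_j) = delta_ij for a basis u1, u2 of the plane."""
    det = u1[0] * u2[1] - u1[1] * u2[0]
    if det == 0:
        raise ValueError("u1 and u2 are not linearly independent")
    v1 = [u2[1] / det, -u2[0] / det]
    v2 = [-u1[1] / det, u1[0] / det]
    return v1, v2

u1, u2 = [1.0, 0.0], [1.0, 1.0]          # a basis that is not orthogonal
v1, v2 = dual_pair_2d(u1, u2)

x = [3.0, 5.0]
# Expansion x = (x, v1) u1 + (x, v2) u2
rebuilt = [dot(x, v1) * a + dot(x, v2) * b for a, b in zip(u1, u2)]
print(v1, v2, rebuilt)
```

For this basis the pair is v1 = (1, -1) and v2 = (0, 1), and the expansion returns x itself.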
Before reading further about linear operators on a Hilbert space, the reader
may wish to review Section 1.5 (pages 24-30) concerning the theory of linear
transformations acting between general normed linear spaces.
Example 2. The orthogonal projection P of a Hilbert space X onto a closed
subspace Y is a bounded linear operator from X into X. Theorem 7 in Section
2.2 (page 74) indicates that P has a number of endearing properties. For
example, $\|P\| = 1$. •
Example 3. It is easy to create bounded linear operators on a Hilbert space X.
Take any orthonormal system $[u_i]$ (it may be finite, countable, or uncountable),
and define $Ax = \sum_i \sum_j a_{ij}(x, u_j)u_i$. If the coefficients $a_{ij}$ have the property
$\sum_i \sum_j |a_{ij}|^2 < \infty$, then A will be continuous. •
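The reason A is continuous is the bound $\|Ax\| \le (\sum_i \sum_j |a_{ij}|^2)^{1/2}\|x\|$, which follows from the Cauchy-Schwarz inequality applied to each row. The sketch below is an added illustration of that bound; it assumes a finite 50 x 50 array of coefficients $a_{ij} = 1/(ij)$ and identifies x with its list of coefficients $(x, u_j)$. All names are hypothetical.

```python
import math

def apply_operator(a, x):
    """Coordinates of Ax in the system [u_i]: the i-th is sum_j a[i][j] x(j)."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]

def norm(x):
    return math.sqrt(sum(abs(t) ** 2 for t in x))

def hilbert_schmidt(a):
    """Square root of the double sum of |a_ij|^2."""
    return math.sqrt(sum(abs(aij) ** 2 for row in a for aij in row))

a = [[1.0 / ((i + 1) * (j + 1)) for j in range(50)] for i in range(50)]
x = [(-1.0) ** j / (j + 1) for j in range(50)]

lhs = norm(apply_operator(a, x))
rhs = hilbert_schmidt(a) * norm(x)
print(lhs, rhs)
```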
By the Lemma in Section 2.1, page 63, it would suffice to prove that for all x,
This we recognize as a correct equation, and the steps we took can be reversed.
For the boundedness of A* we use the lemma in Section 2.1 (page 63) and
Problem 15 of this section (page 90) to write
$$(Tx)(s) = \int_S k(s, t)x(t)\,dt$$
Here, S can be any measure space, as in Example 5, page 64. Assume that the
kernel of this integral operator satisfies the inequality
Proof. For each $y$ in the unit ball, define a functional $\phi_y$ by writing $\phi_y(x) =
\langle Ax, y\rangle$. It is obvious that $\phi_y$ is linear, and we see also that it is bounded, since
by the Cauchy-Schwarz inequality
The equation (Ax, y) = (x, Ay) = (x, A*y), together with the uniqueness of the
adjoint, shows that A = A*. •
$$|\langle Ax, y\rangle| \le |||A|||\,\|x\|\,\|y\|$$
Proof. Consider these two elementary equations:
Proof. Let $A$ be such an operator, and let $\Sigma$ be the unit ball. Since $A$ is
continuous, $A(\Sigma)$ is a bounded set in a finite-dimensional subspace, and its
closure is compact, by Theorem 1 in Section 1.4, page 20. •
86 Chapter 2 Hilbert Spaces
$$(Tx)(s) = \int_S k(s,t)\,x(t)\,dt$$
If the kernel $k$ belongs to the space $L^2(S \times S)$, then $T$ is a compact
operator from $L^2(S)$ into $L^2(S)$.
Proof. Select an orthonormal basis $[u_n]$ for $L^2(S)$, and define $a_{nm} =
\langle Tu_m, u_n\rangle$. This is the "matrix" for $T$ relative to the chosen basis. In fact,
we have for any $x$ in $L^2(S)$, $x = \sum_n \langle x, u_n\rangle u_n$, whence
$$= \sum_n \int_S |(Tu_n)(s)|^2\,ds = \sum_n \|Tu_n\|^2$$
L 13m
00
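Equation (3) suggests truncating the series that defines $T$ in order to obtain
operators of finite rank that approximate $T$. Hence, we put
$$T_n x = \sum_{i=1}^{n} \sum_{j=1}^{\infty} a_{ij} \langle x, u_j\rangle u_i$$
By subtraction,
$$Tx - T_n x = \sum_{i>n} \sum_{j=1}^{\infty} a_{ij} \langle x, u_j\rangle u_i$$
whence, by the Cauchy-Schwarz inequality (in $\ell^2$!) and the Bessel inequality,
$$\|Tx - T_n x\|^2 = \sum_{i>n} \Big| \sum_{j=1}^{\infty} a_{ij} \langle x, u_j\rangle \Big|^2 \le \sum_{i>n} \sum_{j=1}^{\infty} |a_{ij}|^2 \sum_{j=1}^{\infty} |\langle x, u_j\rangle|^2 \le \|x\|^2 \sum_{i>n} \sum_{j=1}^{\infty} |a_{ij}|^2$$
This shows that $\|T - T_n\| \to 0$. Since each $T_n$ is compact, so is the limit $T$, by
Theorem 4. •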
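The truncation estimate can be checked numerically in a finite-dimensional stand-in. The following Python sketch is only an illustration: the coefficient array $a_{ij} = 1/((i+1)(j+1))$, its cutoff to a 30 x 30 block, and the helper names `apply_rows`, `norm`, and `tail_bound` are all assumptions made here, not part of the text. It compares $\|Tx - T_nx\|$ with the bound $\|x\| \big(\sum_{i>n}\sum_j |a_{ij}|^2\big)^{1/2}$.

```python
import math

N = 30  # size of the finite block standing in for the infinite matrix (assumption)
a = [[1.0 / ((i + 1) * (j + 1)) for j in range(N)] for i in range(N)]

def apply_rows(rows, x):
    """Coordinates of sum_{i in rows} sum_j a[i][j] * x[j] * u_i."""
    return [sum(a[i][j] * x[j] for j in range(N)) if i in rows else 0.0
            for i in range(N)]

def norm(v):
    return math.sqrt(sum(t * t for t in v))

def tail_bound(n):
    """Square root of the sum of |a_ij|^2 over the rows dropped by T_n."""
    return math.sqrt(sum(a[i][j] ** 2 for i in range(n, N) for j in range(N)))

x = [math.sin(k + 1.0) for k in range(N)]  # arbitrary test vector
Tx = apply_rows(set(range(N)), x)
errors = []
for n in (1, 5, 10, 20):
    Tnx = apply_rows(set(range(n)), x)
    err = norm([s - t for s, t in zip(Tx, Tnx)])
    errors.append((n, err, norm(x) * tail_bound(n)))
    print(n, err, norm(x) * tail_bound(n))
```

In this sketch the error never exceeds the bound, and both shrink as $n$ grows, which mirrors the statement $\|T - T_n\| \to 0$.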
Proof. Let A be the operator and N(A) its null space. Denote the range of
A* by R(A*). If x E N(A) and z is arbitrary, then
A convenient notation for this is $x_n \rightharpoonup x$. Notice that this definition is in complete
harmony with the definition of weak convergence in an arbitrary normed
linear space, as in Chapter 1, Section 9 (page 53). Of course, the Riesz Representation
Theorem, proved earlier in this section (page 81), is needed to connect
the two concepts.
Proof. Let $[x_n]$ be such a sequence. For each $y$, the sequence $[\langle y, x_n\rangle]$ has
the Cauchy property, and is therefore bounded in $\mathbb{C}$. The linear functionals $\phi_n$
defined by $\phi_n(y) = \langle y, x_n\rangle$ have the property
By the Uniform Boundedness Principle (Section 1.7, page 42), we infer that
$\|\phi_n\| \le M$ for some constant $M$. Since
$$\|x_n\| = \sup_{\|y\|=1} |\langle y, x_n\rangle| = \|\phi_n\| \le M$$
$$R(A) = [N(A^*)]^\perp$$
Problems 2.3
1. Let $X$ be a Hilbert space and let $A : X \to X$ be a bounded linear operator. Let $[u_i : i \in I]$
be any orthonormal basis for $X$. (The index set may be uncountable.) Show that there
exists a "matrix" (a function $\alpha$ on $I \times I$) such that for all $x$, $Ax = \sum_i \sum_j \alpha_{ij} \langle x, u_j\rangle u_i$.
2. Adopt the hypotheses of Problem 1. Show that there exist vectors $v_i$ such that
$Ax = \sum_i \langle x, v_i\rangle u_i$. Show also that the vectors $v_i$ can be chosen so that $\|v_i\| \le \|A\|$.
$$\sum_{n=0}^{\infty} |c_n|\,\|A\|^n < \infty$$
$$Ax = \sum_{n=1}^{\infty} \frac{1}{n} \langle x, u_n\rangle u_n$$
Notice that $A$ is a compact Hermitian operator. Prove that the range of $A$ is the set
$$\Big\{ y \in X : \sum_{n=1}^{\infty} |n \langle y, u_n\rangle|^2 < \infty \Big\}$$
Prove that the range of $A$ is not closed. Hint: Consider the vector $v = \sum_{n=1}^{\infty} u_n/n$.
15. Let $X$ and $Y$ be two arbitrary sets. For a function $f : X \times Y \to \mathbb{R}$, prove that
$$\sup_{x \in X} \sup_{y \in Y} f(x, y) = \sup_{y \in Y} \sup_{x \in X} f(x, y)$$
Show that this equation is not generally true if we replace $\sup_{x \in X}$ by $\inf_{x \in X}$ on both
sides.
16. Prove that if $A_n \to A$, then $A_n^* \to A^*$. (This is continuity of the map $A \mapsto A^*$.)
17. Prove that the range of a Hermitian operator is orthogonal to its kernel. Can this
phenomenon occur for an operator that is not Hermitian?
18. Prove that for a Hermitian operator $A$, the function $x \mapsto \mathrm{dist}(x, R(A))$ is a norm on
$\ker(A)$. Here $R(A)$ denotes the range of $A$.
19. Let $A$ be a bounded linear operator on a Hilbert space. Define $[x, y] = \langle Ax, y\rangle$. Which
properties of an inner product does $[\cdot, \cdot]$ have? What takes the place of the Cauchy-Schwarz
inequality? What additional assumptions must be made in order that $[\cdot, \cdot]$ be an
inner product?
20. Give an example of a nontrivial operator $A$ on a real Hilbert space such that $Ax \perp x$
for all $x$. You should be able to find an example in $\mathbb{R}^2$. Can you do it with a Hermitian
operator? (Cf. Problem 5.)
21. Let $[u_n]$ be an orthonormal sequence in an inner product space. Let $[\lambda_n]$ be a sequence
of scalars such that the series $\sum \lambda_n u_n$ converges. Prove that $\sum |\lambda_n|^2 < \infty$.
22. Let $[u_n]$ be an orthonormal sequence in a Hilbert space. Let $Ax = \sum_{n=1}^{\infty} \alpha_n \langle x, u_n\rangle u_n$,
where $[\alpha_n]$ is a bounded sequence in $\mathbb{C}$. Prove that $A$ is continuous. Prove that if $[\alpha_n]$
is a bounded sequence in $\mathbb{R}$, then $A$ is Hermitian. Prove that if $[\alpha_n]$ is a sequence in $\mathbb{C}$
such that $\sum |\alpha_n|^2 < \infty$, then $A$ is compact. Suggestion: Use Lemma 3 and Theorem 4.
28. Illustrate the Fredholm Alternative with this example. In a real Hilbert space, let $A$ be
defined by the equation $Ax = x - \lambda \langle v, x\rangle w$, where $v$ and $w$ are prescribed elements of
the space, and $\langle v, w\rangle \ne 0$. The scalar $\lambda$ is arbitrary. What are $A^*$, $N(A^*)$, $R(A)$? (The
answers depend on the value of $\lambda$.)
29. Refer to Theorem 5, and assume that $S = [0, 1]$. Prove that if the kernel $k$ is continuous,
then $Tx$ is continuous, for each $x$ in $L^2(S)$.
30. Let $A$ be a bounded linear operator on a Hilbert space, and let $[u_n]$ and $[v_n]$ be two
orthonormal bases for the space. Prove that if $\sum_n \sum_m |\langle Au_n, v_m\rangle|^2 < \infty$, then $A$ is
compact. Suggestion: Base the proof on Lemma 3 and Theorem 4. Write
31. Define the operator $T$ as in Theorem 5, page 86, and assume that
Prove that if $[u_n]$ is an orthonormal sequence and if $Tu_n = \lambda_n u_n$ for each $n$, then
$\sum_n |\lambda_n|^2 \le c$.
32. Prove Theorem 7.
33. Prove the assertion made in Example 3.
$$Lx = \sum_{j \ge 1} a_j \langle x, u_j\rangle u_j$$
$$Lu_n = \sum_{j \ge 1} a_j \langle u_n, u_j\rangle u_j = a_n u_n$$
This proves that $|\lambda| \le \|y\|$. On the other hand, from the above work we also
have
$$\|y\| = \lim \|Ax_{n_i}\| \le \lim \|A\|\,\|x_{n_i}\| = \|A\| = |\lambda|$$
Thus our previous inequality shows that $0 \le \lim \|Ax_{n_i} - \lambda x_{n_i}\| \le 0$, and that
$$Ax = \sum_{k=1}^{n} \langle x, e_k\rangle Ae_k = \sum_{k=1}^{n} \lambda_k \langle x, e_k\rangle e_k$$
If $A|X_{n+1} \ne 0$, we apply the preceding lemma to get $\lambda_{n+1}$ and $e_{n+1}$. It remains
to be proved that if the above process does not terminate, then $\lim \lambda_k = 0$.
Suppose on the contrary that $|\lambda_n| \ge \varepsilon > 0$ for all $n \in \mathbb{N}$. Then $e_n/\lambda_n$ is a
bounded sequence, and by the compactness of $A$, the sequence $A(e_n/\lambda_n)$ must
contain a convergent subsequence. But this is not possible, since $A(e_n/\lambda_n) = e_n$
and $\{e_n\}$, being orthonormal, satisfies $\|e_n - e_m\| = \sqrt{2}$. In the infinite case let
$y_n = x - \sum_{k=1}^{n} \langle x, e_k\rangle e_k$. Since $y_n \perp \sum_{k=1}^{n} \langle x, e_k\rangle e_k$,
$$\|x\|^2 = \Big\| y_n + \sum_{k=1}^{n} \langle x, e_k\rangle e_k \Big\|^2 = \|y_n\|^2 + \sum_{k=1}^{n} |\langle x, e_k\rangle|^2 \ge \|y_n\|^2$$
Since $|\lambda_{n+1}|$ is the norm of $A|X_{n+1}$, we have
t i:'~klI2 ~ s~p
law and Bessel's inequality we have
A _ ~ A (x,ek)
Vn - L k A _ Aek
k=1 k
and
$$\sum_{k=n+1}^{m} |\langle x, e_k\rangle|^2$$ •
If an operator A is not necessarily compact but has a known spectral res-
olution (in the form of an orthonormal series), then certain conclusions can be
drawn, as illustrated in the next three theorems.
Section 2.4 Spectral Theory 95
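This implies the convergence of the series $x = \sum_{n=1}^{\infty} \lambda_n^{-1} \langle v, e_n\rangle e_n$, by Theorem 2
in Section 2.2, page 71. It follows that
$$(Ax)(t) = \int_0^1 G(s,t)\,x(s)\,ds$$
where
$$G(s,t) = \begin{cases} (1-s)t & \text{when } 0 \le t \le s \le 1 \\ (1-t)s & \text{when } 0 \le s \le t \le 1 \end{cases}$$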
$$(Tx)(s) = \int_S K(s,t)\,x(t)\,dt$$
The operator $T$ thus defined from $L^2(S)$ to $L^2(S)$ is compact. (It is an example
of a Hilbert-Schmidt operator.) Its range cannot be all of $L^2(S)$, except in the
special case when $L^2(S)$ is finite dimensional. Equation (2) will have a solution
if and only if $f$ is in the range of $T$. Now, as in Section 2.3,
$$Tx = \sum_{i=1}^{\infty} \sum_{j=1}^{\infty} a_{ij} \langle x, u_j\rangle u_i$$
Hence the equation $Tx = f$ will be true if and only if Equation (3) holds and
00
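Putting $\xi_j = \langle x, u_j\rangle$ and $\beta_i = \langle f, u_i\rangle$, we have the following infinite system of
linear equations in an infinite number of unknowns:
$$\sum_{j=1}^{\infty} a_{ij} \xi_j = \beta_i \qquad (i = 1, 2, \ldots)$$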
Here the notation $\xi_j^{(n)}$ serves to remind us that we must not expect $\xi_j^{(n)}$ to equal
$\langle x, u_j\rangle$. One can define $x_n = \sum_{j=1}^{n} \xi_j^{(n)} u_j$ and examine the behavior of the
sequences $[x_n]$ and $[Tx_n]$. Will this procedure succeed always? Certainly not,
for the integral equation may have no solution, as previously mentioned.
Other approaches to the solution of integral equations are explored in Chap-
ter 4. The case of Equation (2) in which the kernel is separable or "degenerate,"
i.e., of the form
$$K(s,t) = \sum_{i=1}^{n} u_i(s) v_i(t)$$
is easily handled:
$$(Tx)(s) = \int_S K(s,t)\,x(t)\,dt = \int_S \sum_{i=1}^{n} u_i(s) v_i(t) x(t)\,dt = \sum_{i=1}^{n} u_i(s) \langle v_i, x\rangle$$
This shows that the range of $T$ is the finite-dimensional space spanned by the
functions $u_1, u_2, \ldots, u_n$. Hence, in order that there exist a solution to the given
integral equation it is necessary that $f$ be in that same space: $f = \sum_{i=1}^{n} c_i u_i$.
Any $x$ such that $\langle v_i, x\rangle = c_i$ will be a solution.
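The recipe for a separable kernel can be tried on a small case. The Python sketch below is a hedged illustration under assumptions chosen here, not taken from the text: $S = [0,1]$, real-valued functions, the hypothetical kernel $K(s,t) = 1 + st$ (so $u_1 = v_1 = 1$, $u_2(s) = s$, $v_2(t) = t$), and a solution sought in the span of $v_1, v_2$. The conditions $\langle v_i, x\rangle = c_i$ then reduce to a 2 x 2 Gram system.

```python
# Hypothetical separable kernel on S = [0, 1]: K(s, t) = u1(s)v1(t) + u2(s)v2(t)
u = [lambda s: 1.0, lambda s: s]
v = [lambda t: 1.0, lambda t: t]
c = [2.0, -1.0]  # f = 2*u1 - u2, chosen arbitrarily

def integrate(h, n=2000):
    """Midpoint rule on [0, 1]."""
    step = 1.0 / n
    return step * sum(h((k + 0.5) * step) for k in range(n))

# Look for x = b1*v1 + b2*v2; the conditions <v_i, x> = c_i become G b = c,
# where G is the Gram matrix of v1, v2.
G = [[integrate(lambda t, i=i, j=j: v[i](t) * v[j](t)) for j in range(2)]
     for i in range(2)]
det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
b = [(c[0] * G[1][1] - G[0][1] * c[1]) / det,
     (G[0][0] * c[1] - G[1][0] * c[0]) / det]

def x(t):
    return b[0] * v[0](t) + b[1] * v[1](t)

def Tx(s):
    return integrate(lambda t: sum(u[i](s) * v[i](t) for i in range(2)) * x(t))

def f(s):
    return c[0] * u[0](s) + c[1] * u[1](s)

residual = max(abs(Tx(s) - f(s)) for s in (0.0, 0.25, 0.5, 1.0))
print(b, residual)
```

Restricting $x$ to the span of the $v_i$ is only one convenient choice; any other $x$ with the same inner products $\langle v_i, x\rangle$ would serve equally well.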
Spectral methods can also be applied to Equation (2). Here, one assumes
the kernel to be Hermitian: $K(s,t) = \overline{K(t,s)}$. Then the operator $T$ is Hermitian,
and consequently has a spectral form
$$Tx = \sum_{n=1}^{\infty} \lambda_n \langle x, u_n\rangle u_n$$
in which $[u_n]$ is an orthonormal sequence. If $f$ is in the span of that orthonormal
sequence, we write $f = \sum_{n=1}^{\infty} \langle f, u_n\rangle u_n$. The solution, if it exists, must then be
the function $x$ whose Fourier coefficients are $\langle f, u_n\rangle/\lambda_n$. If this sequence is not
an $\ell^2$ sequence, we are out of luck! Here we are following Theorem 5 above. This
procedure succeeds if $f$ is in the range of $T$.
For compact operators that are not self-adjoint or even normal there is still a
useful canonical form that can be exploited. It is described in the next theorem.
$$Ax = \sum_{n=1}^{\infty} \langle x, u_n\rangle v_n$$
in which $[u_n]$ is an orthonormal basis for the space and $[v_n]$ is an orthogonal
sequence tending to zero. (The sequences $[u_n]$ and $[v_n]$ depend on $A$.)
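Proof. The operator $A^*A$ is compact and Hermitian. Its eigenvalues are nonnegative,
because if $A^*Ax = \beta x$, then
$$\beta \langle x, x\rangle = \langle A^*Ax, x\rangle = \langle Ax, Ax\rangle \ge 0$$
By the Spectral Theorem,
$$A^*Ax = \sum_{n=1}^{\infty} \lambda_n^2 \langle x, u_n\rangle u_n$$
where $[u_n]$ is an orthonormal basis for the space and $\lambda_n^2 \to 0$. Since we are
assuming that $[u_n]$ is a base, we permit some (possibly an infinite number) of
the $\lambda_n$ to be zero. In the spectral representation above, each nonzero eigenvalue
$\lambda_n^2$ is repeated a number of times equal to its geometric multiplicity. Define
$v_n = Au_n$. Then we have
$$\langle v_n, v_m\rangle = \langle Au_n, Au_m\rangle = \langle A^*Au_n, u_m\rangle = \lambda_n^2 \langle u_n, u_m\rangle = \lambda_n^2 \delta_{nm}$$
Hence $[v_n]$ is orthogonal, and $\|v_n\| = \lambda_n \to 0$. Since $[u_n]$ is a base, we have for
arbitrary $x$,
$$x = \sum_{n=1}^{\infty} \langle x, u_n\rangle u_n$$
Consequently,
$$Ax = \sum_{n=1}^{\infty} \langle x, u_n\rangle Au_n = \sum_{n=1}^{\infty} \langle x, u_n\rangle v_n$$
•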
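A finite-dimensional instance of this canonical form may make the construction concrete. The Python sketch below is an illustration under assumptions made here (a hypothetical real 2 x 2 matrix `A`, the real inner product, and the helper names `matvec` and `dot`): it diagonalizes $A^TA$ by a rotation, sets $v_n = Au_n$, and checks that the $v_n$ are orthogonal and that $Ax = \sum_n \langle x, u_n\rangle v_n$.

```python
import math

A = [[2.0, 1.0],
     [0.0, 1.0]]  # arbitrary real 2x2 matrix, not symmetric (assumption)

# M = A^T A is symmetric with nonnegative eigenvalues.
M = [[sum(A[k][i] * A[k][j] for k in range(2)) for j in range(2)]
     for i in range(2)]

# Rotation angle that diagonalizes a symmetric 2x2 matrix.
theta = 0.5 * math.atan2(2.0 * M[0][1], M[0][0] - M[1][1])
u = [(math.cos(theta), math.sin(theta)), (-math.sin(theta), math.cos(theta))]

def matvec(B, x):
    return tuple(B[i][0] * x[0] + B[i][1] * x[1] for i in range(2))

def dot(x, y):
    return x[0] * y[0] + x[1] * y[1]

v = [matvec(A, un) for un in u]                # v_n = A u_n
lam_sq = [dot(matvec(M, un), un) for un in u]  # eigenvalues lambda_n^2 of A^T A

x = (0.3, -1.7)
Ax = matvec(A, x)
expansion = tuple(sum(dot(x, u[n]) * v[n][i] for n in range(2)) for i in range(2))
print(dot(v[0], v[1]), lam_sq, Ax, expansion)
```

In infinite dimensions the same construction needs the compactness of $A^*A$ to produce the orthonormal basis; the matrix case only shows the algebra.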
A general class of compact operators that has received much study is the
Hilbert-Schmidt class, consisting of operators A such that
for some orthonormal basis $[u_\alpha]$. It turns out that if this sum is finite for one
orthonormal base, then it is finite for all. In fact, there is a better result:
U(x)=J(x) f G(lx-yl)h(y)u(y)dy
J..t n
Here, $f$, $G$, and $h$ are prescribed functions, and $u$ is the unknown function. The
function $h$ often has compact support. (Thus it vanishes on the complement
of a compact set.) It models the sound speed in the medium, and in a simple
case could be a constant on its support. The function $f$ in the integral equation
represents the incident wave in a scattering experiment. An important concrete
case is
u(x) = eWx
.1 ei\x-y\
4 I
1R3 1[" X - Y
Ih(y)u(y) dy
In this equation, p is a unit vector (prescribed). Notice the singularity in the ker-
nel of this integral equation. Unfortunately, in the real world, such singularities
are the rule rather than the exception. •
References for operator theory in general are [DS, vol. II], [RS], [AG], [Hal2].
Problems 2.4
$$Ax = \sum_{n=1}^{\infty} \alpha_n \langle x, u_n\rangle u_n$$
in which $[\alpha_n]$ is some prescribed bounded real sequence. Find the conditions under which
$A^{-1}$ exists as a bounded linear operator.
3. Repeat Problem 1 for the operator
$$Ax = \sum_{n=1}^{\infty} \langle x, u_{n+1}\rangle u_n$$
4. Repeat Problem 1 if the basis is $[\ldots, u_{-2}, u_{-1}, u_0, u_1, \ldots]$ and $Ax = \sum_{n=-\infty}^{\infty} \langle x, u_n\rangle u_{n+1}$.
What is $A^{-1}$?
5. Let $Y$ be a subspace of a Hilbert space $X$, and let $A : Y \to X$ be a (possibly unbounded)
linear map such that $A^{-1} : X \to Y$ exists and is a compact linear operator. Prove that
if $(A - \lambda I)^{-1}$ exists, then it is compact.
6. Prove that for a compact Hermitian operator A on a Hilbert space these properties are
equivalent:
(a) $\langle Ax, x\rangle \ge 0$ for all $x$
(b) All eigenvalues of A are nonnegative
7. Prove these facts about the spectral sets: (A is defined on page 91.)
x = L(x,en)en
n=l
10. Let P be the orthogonal projection of a Hilbert space onto a closed subspace. What are
the eigenvalues of $P$? Give the spectral form of $P$ and $I - P$.
11. Let $A$ be a bounded linear operator on a Hilbert space. Prove that:
(1) $A$ commutes with $A^n$ for $n = 0, 1, 2, \ldots$.
(2) $A$ commutes with $p(A)$ for any polynomial $p$.
(3) If $A^{-1}$ exists, then $A$ commutes with $A^{-n}$ for $n = 0, 1, 2, 3, \ldots$.
(4) If $(A - \lambda I)^{-1}$ exists, then it commutes with $A$.
12. An operator $A$ is said to be normal if $AA^* = A^*A$. Give an example of an operator
that is not normal. (The eminent mathematician Olga Taussky once observed that most
counterexamples in matrix theory are of size 2 x 2.) Are there any real 2 x 2 normal
matrices that are not self-adjoint? (Other problems on normal operators: 29, 39, 40, 41.)
13. Establish the first equation in the proof of Theorem 4.
14. If $A$ is a bounded linear operator on a Hilbert space, then $A + A^*$ and $i(A - A^*)$ are
self-adjoint. Hence $A$ is of the form $B + iC$, where $B$ and $C$ are self-adjoint.
15. Let $\{e_1, e_2, \ldots\}$ be an orthonormal sequence. Let $Ax = \sum \lambda_n \langle x, e_n\rangle e_n$, in which
$0 < \inf |\lambda_n| < \sup |\lambda_n| < \infty$. Prove that the series defining $Ax$ converges. Prove that $A$ is
not compact. Prove that $A$ is bounded. What are the eigenvalues and eigenvectors of $A$?
16. Find the eigenvalues and eigenvectors for the operator $Ax = -x''$ acting on the space
$X = \{x \in L^2[0,1] : x(0) = 0 \text{ and } x'(1) + \gamma x(1) = 0\}$. Here $\gamma$ is a prescribed real number.
How can the eigenvalues be computed numerically? Find the first one accurate to 3 digits
when 'Y = -~. Newton's method, described in Section 3.3, can be used.
17. Prove that if $Ax = \lambda x$, $A^*y = \mu y$, and $\lambda \ne \bar{\mu}$, then $x \perp y$.
18. If $A$ is Hermitian and $x$ is a vector such that $Ax \ne 0$, then $A^n x \ne 0$ for $n = 0, 1, 2, \ldots$.
19. Every compact Hermitian operator is a limit of a sequence of linear combinations of
orthogonal projections.
20. If $\lambda$ is an eigenvalue of $A^2$ and $\lambda > 0$, then either $+\sqrt{\lambda}$ or $-\sqrt{\lambda}$ is an eigenvalue of $A$.
(Here, $A$ is any bounded operator.) Hint: If $A^2x = \lambda x$, then for suitable $c$, $x \pm cAx$ is
an eigenvector of $A$.
21. Consider the problem $x'' + (\lambda^2 - q)x = 0$, $x(0) = 1$, $x'(0) = 0$. Show that this initial-value
problem can be solved by solving instead the integral equation
$$x(t) - \frac{1}{\lambda} \int_0^t q(s) \sin(\lambda(t - s))\,x(s)\,ds = \cos(\lambda t)$$
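The integral equation in Problem 21 lends itself to a simple numerical experiment. The Python sketch below is an illustration, not the book's method: it applies the trapezoidal rule on a uniform grid, and the function name `solve_volterra`, the choice $\lambda = 2$, and the constant coefficient $q = 1$ are assumptions made here. For constant $q < \lambda^2$ the initial-value problem has the closed-form solution $\cos(\sqrt{\lambda^2 - q}\,t)$, which serves as a reference.

```python
import math

def solve_volterra(lam, q, T=1.0, n=1000):
    """Trapezoidal scheme for
    x(t) - (1/lam) * int_0^t q(s) sin(lam (t - s)) x(s) ds = cos(lam t)."""
    h = T / n
    t = [i * h for i in range(n + 1)]
    x = [1.0]  # x(0) = cos(0) = 1
    for i in range(1, n + 1):
        # The kernel vanishes at s = t_i, so x_i does not appear on the right
        # and each step is explicit.
        total = 0.5 * q(t[0]) * math.sin(lam * (t[i] - t[0])) * x[0]
        total += sum(q(t[j]) * math.sin(lam * (t[i] - t[j])) * x[j]
                     for j in range(1, i))
        x.append(math.cos(lam * t[i]) + (h / lam) * total)
    return t, x

lam = 2.0
t, x = solve_volterra(lam, lambda s: 1.0)  # constant q = 1 (assumption)
exact = math.cos(math.sqrt(lam ** 2 - 1.0) * t[-1])
print(x[-1], exact)
```

The scheme costs on the order of $n^2$ kernel evaluations, which is acceptable for a sketch of this size.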
34. Let $A$ be a bounded linear operator on a Hilbert space. Suppose that the spectral
decomposition of $A$ is known:
$$Ax = \sum_{n=1}^{\infty} \lambda_n \langle x, e_n\rangle e_n$$
where $[e_n]$ is an orthonormal sequence. Show how this information can be used to solve
the equation $Ax - \mu x = b$. Make modest additional assumptions if necessary.
35. Prove that the eigenvalues of a bounded linear operator $A$ on a normed linear space all
lie in the disk of radius $\|A\|$ in the complex plane.
36. Prove that if $P$ is an orthogonal projection of a Hilbert space onto a subspace, then
for any scalars $\alpha$ and $\beta$ the operator $\alpha P + \beta(I - P)$ is normal (i.e., commutes with its
adjoint).
37. Prove that an operator in the Hilbert-Schmidt class is necessarily compact.
38. Prove that every operator having the form described in Theorem 6 is compact, thus
establishing a necessary and sufficient condition for compactness.
39. Find all normal 2 x 2 real matrices. Repeat the problem for complex matrices.
40. Prove that for a normal operator, eigenvectors corresponding to different eigenvalues are
mutually orthogonal.
41. Prove that a normal operator and its adjoint have the same null space.
is called the matrix of $L$ relative to the ordered basis $[u_1, \ldots, u_n]$.
With the aid of the matrix $A$ it is easy to describe the effect of $L$ on any
vector $x$. Write $x = \sum_{j=1}^{n} c_j u_j$. The $n$-tuple $(c_1, \ldots, c_n)$ is called the coordinate
vector of $x$ relative to the ordered basis $[u_1, \ldots, u_n]$. Then
(2) $Lx = \sum_{j=1}^{n} c_j Lu_j = \sum_{j=1}^{n} c_j \sum_{i=1}^{n} a_{ij} u_i = \sum_{i=1}^{n} \Big( \sum_{j=1}^{n} a_{ij} c_j \Big) u_i$
Appendix to Section 2.4 103
$(1 \le i \le n)$
If the basis for $X$ is changed, what will be the new matrix for $L$? Let $[v_1, \ldots, v_n]$
be another ordered basis for $X$. Write
(3) $v_j = \sum_{i=1}^{n} p_{ij} u_i \qquad (1 \le j \le n)$
The $n \times n$ matrix $P$ thus introduced is nonsingular. Now let $B$ denote the matrix
of $L$ relative to the new ordered basis. Thus
(4) $Lv_j = \sum_{k=1}^{n} b_{kj} v_k = \sum_{k=1}^{n} b_{kj} \sum_{i=1}^{n} p_{ik} u_i \qquad (1 \le j \le n)$
Another expression for $Lv_j$ can be obtained by use of Equations (2) and (3):
(5) $(1 \le j \le n)$
Let $[u_1, \ldots, u_n]$ be an ordered orthonormal basis for the $n$-dimensional
Hilbert space $X$. Let $L$ be Hermitian. The spectral theorem asserts the existence
of an ordered orthonormal basis $[v_1, \ldots, v_n]$ and an $n$-tuple of real numbers
$(\lambda_1, \ldots, \lambda_n)$ such that
(8) $Lx = \sum_{i=1}^{n} \lambda_i \langle x, v_i\rangle v_i$
As above, we introduce matrices $A$ and $P$ such that
(9) $Lu_j = \sum_{i=1}^{n} a_{ij} u_i \qquad v_j = \sum_{i=1}^{n} p_{ij} u_i \qquad (1 \le j \le n)$
The matrix $B$ that represents $L$ relative to the $v$-basis is the diagonal matrix
$\mathrm{diag}(\lambda_1, \ldots, \lambda_n)$, as we see from Equation (8). Thus from Equation (7) we
conclude that $A$ is similar to a diagonal matrix having real entries. More can be
said, however, because $P$ has a special structure. Notice that
$$(I)_{jk} = \delta_{jk} = \langle v_k, v_j\rangle = \Big\langle \sum_{i=1}^{n} p_{ik} u_i, \sum_{r=1}^{n} p_{rj} u_r \Big\rangle = \sum_{i=1}^{n} \sum_{r=1}^{n} p_{ik} \overline{p_{rj}} \langle u_i, u_r\rangle$$
$$= \sum_{i=1}^{n} p_{ik} \overline{p_{ij}} = \sum_{i=1}^{n} (P^*)_{ji} (P)_{ik} = (P^*P)_{jk}$$
This shows that
(10) P*P= I
(It follows by elementary linear algebra that P P* = I.) Matrices having the
property (10) are said to be unitary. We can therefore state that the matrix A
(representing the Hermitian operator L with respect to an orthonormal base) is
unitarily similar to a real diagonal matrix.
Finally, we note that if an $n \times n$ complex matrix $A$ is such that $A = A^*$, then
$A$ is the matrix of a Hermitian transformation relative to an orthonormal basis.
Indeed, we have only to select any orthonormal base $[u_1, \ldots, u_n]$ and define $L$
by
$$Lu_j = \sum_{i=1}^{n} a_{ij} u_i \qquad (1 \le j \le n)$$
Then, of course,
$$Lx = L\Big( \sum_{j=1}^{n} c_j u_j \Big) = \sum_{j=1}^{n} \sum_{i=1}^{n} c_j a_{ij} u_i$$
By straightforward calculation we have
$$\langle Lx, y\rangle = \langle x, Ly\rangle$$
A matrix $A$ satisfying $A^* = A$ is said to be Hermitian. We have proved
therefore the following important result, regarded by many as the capstone of
elementary matrix theory:
$$= \int_a^b [y(px')' - x(py')']$$
x(t) y(t)]
[
w(t) = x'(t) y'(t)
Put also
Proof. We transform the equation $ax'' + bx' + cx = d$ by multiplying by the
integrating factor $\frac{1}{a} \exp \int (b(t)/a(t))\,dt$. Thus
$$e^{\int b/a} x'' + (b/a) e^{\int b/a} x' + (c/a) e^{\int b/a} x = (d/a) e^{\int b/a}$$
Let
$$p = e^{\int b/a}, \qquad q = (c/a) e^{\int b/a}$$
•
Example 1. If $Ax = -x''$ (i.e., $p(t) = -1$ and $q(t) = 0$), what are the
eigenvalues and eigenfunctions? The solutions to $-x'' = \lambda x$ are of the form
$c_1 \sin \sqrt{\lambda}\,t + c_2 \cos \sqrt{\lambda}\,t$.
Hence every complex number $\lambda$ is an eigenvalue, and
each eigenspace is of dimension 2. •
Example 2. Let $Ax = -x''$ as before, but let the inner-product space be
the subspace of $L^2[0, \pi]$ consisting of twice continuously differentiable functions
that satisfy $x(0) = x(\pi) = 0$. The eigenvalues are $n^2$ for $n = 1, 2, \ldots$, and the
eigenfunctions are $\sin nt$. (The value $n = 0$ yields only the zero function.) •
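As a quick plausibility check of Example 2, the Python sketch below tests that the functions $\sin nt$ with $n = 1, 2, 3$ vanish at $0$ and $\pi$ and satisfy $-x'' = n^2 x$ up to discretization error. The helper `second_derivative`, the step size, and the sample points are choices made here for illustration only.

```python
import math

def second_derivative(f, t, h=1e-4):
    """Central-difference approximation of f''(t)."""
    return (f(t - h) - 2.0 * f(t) + f(t + h)) / (h * h)

max_residual = 0.0
boundary_ok = True
for n in (1, 2, 3):
    x = lambda t, n=n: math.sin(n * t)
    # boundary conditions x(0) = x(pi) = 0, up to rounding
    boundary_ok = boundary_ok and abs(x(0.0)) < 1e-12 and abs(x(math.pi)) < 1e-12
    for t in (0.5, 1.0, 2.0, 3.0):
        residual = abs(-second_derivative(x, t) - n * n * x(t))
        max_residual = max(max_residual, residual)
print(boundary_ok, max_residual)
```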
(1) $u'' = qu$
(2) $v'' = qv \qquad \alpha_{11} v(a) + \alpha_{12} v'(a) = 0$
(3) $u'(a)v(a) - u(a)v'(a) = 1$
From (3) we see that $u \ne 0$ and $v \ne 0$. The left side of (3) is the Wronskian of
$u$ and $v$ evaluated at $a$.
In practical terms, $u$ and $v$ can be obtained by solving two initial-value
problems. This is often done as follows. Find $u_0$ and $v_0$ such that
The $u$ and $v$ required will then be suitable linear combinations of $u_0$ and $v_0$.
Now we observe that for all s,
u'(s)v(s) - u(s)v'(s) = 1
This is true because the left side takes the value 1 at s = a and is constant.
Indeed,
(4) $Ax = x'' - qx$
and the domain of $A$ is the closure in $L^2[a, b]$ of the set of all twice continuously
differentiable functions $x$ such that
$$x(s) = \int_a^b g(s,t)\,y(t)\,dt = \int_a^s u(s)v(t)y(t)\,dt + \int_s^b v(s)u(t)y(t)\,dt$$
$$= u(s) \int_a^s v(t)y(t)\,dt + v(s) \int_s^b u(t)y(t)\,dt$$
we have
$$x'(s) = u'(s) \int_a^s v(t)y(t)\,dt + u(s)v(s)y(s) + v'(s) \int_s^b u(t)y(t)\,dt - v(s)u(s)y(s)$$
$$= u'(s) \int_a^s v(t)y(t)\,dt + v'(s) \int_s^b u(t)y(t)\,dt$$
$$x''(s) = u''(s) \int_a^s v(t)y(t)\,dt + u'(s)v(s)y(s) + v''(s) \int_s^b u(t)y(t)\,dt - v'(s)u(s)y(s)$$
$$= q(s)u(s) \int_a^s v(t)y(t)\,dt + q(s)v(s) \int_s^b u(t)y(t)\,dt + y(s) [u'(s)v(s) - u(s)v'(s)]$$
$$= q(s)x(s) + y(s)$$
In the last step, the constant value of the Wronskian was substituted. Our
calculation shows that $x'' - qx = y$ or $Ax = y$, as asserted. Hence $AB = I$.
It remains to prove that $x \in X$, i.e., that $x$ satisfies the boundary conditions.
We have, from previous equations,
$$x(a) = v(a) \int_a^b u(t)y(t)\,dt = cv(a)$$
and
$$x'(a) = v'(a) \int_a^b u(t)y(t)\,dt = cv'(a)$$
Hence
$$\alpha_{11} x(a) + \alpha_{12} x'(a) = \alpha_{11} cv(a) + \alpha_{12} cv'(a) = 0$$
Similarly we verify that
let $x \in X$, $y = Ax$, and $By = z$. The previous theorem shows that $y = ABy =
Az$ and that $z \in X$. Hence $x - z \in X$ and $A(x - z) = 0$. It follows that $x - z = 0$,
so that $x = By = BAx$.
Remark. The operator $B$ in the previous theorem is Hermitian, because (by
Problem 9) $g$ satisfies the equation
$$g(s,t) = g(t,s)$$
$$By = \sum_{n=1}^{\infty} \lambda_n \langle y, u_n\rangle u_n$$
Since $Bu_k = \lambda_k u_k$, we have $u_k = \lambda_k Au_k$, and $u_k$ satisfies the boundary conditions.
This equation shows that $u_k$ is an eigenvector of $A$ corresponding to
the eigenvalue $1/\lambda_k$. Since $\lambda_k \to 0$, $1/\lambda_k \to \infty$. Consequently, a solution to the
problem $Ax = y$, where $y$ is given and $x$ must satisfy the boundary conditions,
is
$$x = By = \sum_{n=1}^{\infty} \lambda_n \langle y, u_n\rangle u_n$$
$$g(s,t) = \begin{cases} \sin s \cos t & 0 \le t \le s \le \pi \\ \cos s \sin t & 0 \le s \le t \le \pi \end{cases}$$
$$(By)(s) = \sin s \int_0^s \cos t\,y(t)\,dt + \cos s \int_s^{\pi} \sin t\,y(t)\,dt$$
•
Example 4. Let us solve the problem in Example 3 by using the Spectral
Theorem. The eigenvalues and eigenvectors of the differential operator $A$ are
obtained by solving $x'' + x = \mu x$. The general solution of the differential equation
is
$$x(t) = c_1 \sin \sqrt{1 - \mu}\,t + c_2 \cos \sqrt{1 - \mu}\,t$$
Imposing the conditions $x'(0) = x(\pi) = 0$, we find that the eigenvalues are $\mu_n =
1 - (n - \tfrac{1}{2})^2$ and the eigenfunctions are $v_n(t) = \cos((2n - 1)t/2)$. The $v_n$ are also
eigenfunctions of $B$, corresponding to eigenvalues $\lambda_n = 1/\mu_n = (n - n^2 + \tfrac{3}{4})^{-1}$.
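The claim that the functions $v_n(t) = \cos((2n-1)t/2)$ are eigenfunctions of $B$ with eigenvalues $\lambda_n = (n - n^2 + \tfrac{3}{4})^{-1}$ can be tested by quadrature. The Python sketch below is a rough check under assumptions made here: the Green's function is taken to be $\sin s \cos t$ for $t \le s$ and $\cos s \sin t$ for $s \le t$ on $[0, \pi]$, the integral is approximated by the midpoint rule, and the helper names are hypothetical.

```python
import math

def g(s, t):
    """Green's function of Example 3 on [0, pi] (as assumed in the lead-in)."""
    return math.sin(s) * math.cos(t) if t <= s else math.cos(s) * math.sin(t)

def B(y, s, n=4000):
    """(By)(s) = int_0^pi g(s, t) y(t) dt by the midpoint rule."""
    h = math.pi / n
    return h * sum(g(s, (k + 0.5) * h) * y((k + 0.5) * h) for k in range(n))

def v(n):
    return lambda t: math.cos((2 * n - 1) * t / 2.0)

def lam(n):
    return 1.0 / (n - n * n + 0.75)

worst = 0.0
for n in (1, 2, 3):
    for s in (0.3, 1.0, 2.5):
        worst = max(worst, abs(B(v(n), s) - lam(n) * v(n)(s)))
print(worst)
```

For $n = 1$ the check amounts to confirming that $x = \tfrac{4}{3}\cos(t/2)$ solves $x'' + x = \cos(t/2)$ with $x'(0) = x(\pi) = 0$.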
Observe that the eigenfunctions $v_n$ are not of unit norm. If $\alpha_n = 1/\|v_n\|$, then
$[\alpha_n v_n]$ is an orthonormal system, and the spectral resolution of $B$ is
$$By = \sum_{n=1}^{\infty} \lambda_n \langle y, \alpha_n v_n\rangle (\alpha_n v_n)$$
Use of this formula is equivalent to the traditional method for solving the
boundary-value problem
The traditional method starts with the functions $v_n(t) = \cos \frac{2n-1}{2} t$, which
satisfy the boundary conditions. Then we build a function of the form $x =
\sum_{n=1}^{\infty} c_n v_n$. This also satisfies the boundary conditions. We hope that with a
correct choice of the coefficients we will have $Ax = y$. Since $Av_n = \mu_n v_n$, this
equation reduces to $\sum_{n=1}^{\infty} c_n \mu_n v_n = y$. To discover the values of the coefficients,
take the inner product of both sides with $v_m$:
We are looking for a function $g$ defined on $[a, b] \times [a, b]$. As usual, the $t$-sections
of $g$ are given by $g^t(s) = g(s,t)$.
Section 2.5 Sturm-Liouville Theory 111
$$x'(s) = \int_a^s g'(s,t)\,y(t)\,dt + \int_s^b g'(s,t)\,y(t)\,dt$$
It follows that
Any linear combination of $x(a)$, $x(b)$, $x'(a)$, and $x'(b)$ is obtained by an integration
of the same linear combination of $g(a,t)$, $g(b,t)$, $g'(a,t)$, and $g'(b,t)$. Since
$g^t$ satisfies the boundary conditions, so does $x$. We now compute $x''(s)$ from the
equation for $x'(s)$:
$$x''(s) = g'(s, s^-)y(s) + \int_a^s g''(s,t)\,y(t)\,dt - g'(s, s^+)y(s) + \int_s^b g''(s,t)\,y(t)\,dt$$
$$= y(s)/p(s) + \int_a^b g''(s,t)\,y(t)\,dt$$
$$(Ax)(s) = p'(s) \int_a^b g'(s,t)\,y(t)\,dt + y(s) + p(s) \int_a^b g''(s,t)\,y(t)\,dt + q(s) \int_a^b g(s,t)\,y(t)\,dt$$
The preceding theorem asserts that $g^t$ should solve the homogeneous differential
equation in the intervals $0 < s < t < 1$ and $0 < t < s < 1$. Furthermore, $g^t$
should be continuous, and it should satisfy the boundary conditions. Lastly,
$g'(s,t)$ should have a jump discontinuity of magnitude $-1$ as $t$ passes through
the value $s$. One can guess that $g$ is given by
$$g(s,t) = \begin{cases} 0 & 0 \le s \le t \le 1 \\ s - t & 0 \le t \le s \le 1 \end{cases}$$
If we proceed systematically, it will be seen that this is the only solution. In the
triangle $0 < s < t < 1$, $Ag^t = 0$, and therefore $g^t$ must be a linear function of $s$.
We write $g(s,t) = a(t) + b(t)s$. Since $g^t$ must satisfy the boundary conditions,
we have $g(0,t) = (\partial g/\partial s)(0,t) = 0$. Thus $a(t) = b(t) = 0$ and $g(s,t) = 0$ in
this triangle. In the second triangle, $0 < t < s < 1$. Again $g^t$ must be linear,
and we write $g(s,t) = \alpha(t) + \beta(t)s$. Continuity of $g$ on the diagonal implies that
$\alpha(t) + \beta(t)t = 0$, and we therefore have $g(s,t) = -\beta(t)t + \beta(t)s = \beta(t)(s - t)$.
The condition $(\partial g/\partial s)(s, s^+) - (\partial g/\partial s)(s, s^-) = -1/p$ leads to the equation
$0 - \beta(t) = -1$. Hence $g(s,t) = s - t$ in this triangle. The solution to the
inhomogeneous boundary-value problem $x'' = y$ is therefore given by
$$x(s) = \int_0^s (s - t)\,y(t)\,dt$$
•
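The formula $x(s) = \int_0^s (s - t)\,y(t)\,dt$ can be sanity-checked on a right-hand side with a known answer. The Python sketch below is an illustration with choices made here: $y = \cos$, for which the solution of $x'' = y$ with $x(0) = x'(0) = 0$ is $1 - \cos s$, and a midpoint-rule quadrature in a helper named `x`.

```python
import math

def x(s, y, n=2000):
    """x(s) = int_0^s (s - t) y(t) dt by the midpoint rule."""
    h = s / n
    return h * sum((s - (k + 0.5) * h) * y((k + 0.5) * h) for k in range(n))

# With y = cos, the solution of x'' = y, x(0) = x'(0) = 0 is x(s) = 1 - cos(s).
err = max(abs(x(s, math.cos) - (1.0 - math.cos(s))) for s in (0.25, 0.5, 1.0))
print(err)
```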
Example 6. Find the Green's function for the problem
(7) $g(s,t) = \begin{cases} u(s)v(t) & 0 \le s \le t \le 1 \\ v(s)u(t) & 0 \le t \le s \le 1 \end{cases}$
and try to determine the functions u and v. The homogeneous differential equa-
tion has as its general solution the function
With these choices, the function $g$ in Equation (7) satisfies the first four requirements
in Theorem 3. With a suitable choice of the parameters $\alpha$ and $\beta$, the
fifth requirement can be met as well. The calculation produces the following
equation involving the Wronskian of $u$ and $v$:
1. Find the eigenvalues and eigenfunctions for the Sturm-Liouville operator when p =q=1
and
$$g(s,t) = \begin{cases} u(s)v(t) & a \le s \le t \le b \\ v(s)u(t) & a \le t \le s \le b \end{cases}$$
12. Show that the Wronskian for any two solutions of the equation $(px')' - qx = 0$ is a scalar
multiple of $1/p$, and so is either identically zero or never zero. (Here we assume $p(t) \ne 0$
for $a \le t \le b$.)
13. Find the eigenvalues and eigenfunctions for the operator $A$ defined by the equation $Ax =
-x'' + 2x' - x$. Assume that the domain of $A$ is the set of twice continuously differentiable
functions on $[0,1]$ that have boundary values $x(0) = x(1) = 0$.
Chapter 3
In this chapter we develop the theory of the derivative for mappings between
Banach spaces. Partial derivatives, Jacobians, and gradients are all examples of
the general theory, as are the Gâteaux and Fréchet differentials. Kantorovich's
theorem on Newton's method is proved. Following that there is a section on
implicit function theorems in a general setting. Such theorems can often be
used to prove the existence of solutions to integral equations and other similar
problems. Another section, devoted to extremum problems, illustrates how the
methods of calculus (in Banach spaces) can lead to solutions. A section on the
"calculus of variations" closes the chapter.
The first step is to transfer, with as little disruption as possible, the ele-
mentary ideas of calculus to the more general setting of a normed linear space.
116 Chapter 3 Calculus in Banach Spaces
Proof. Suppose that $A_1$ and $A_2$ are two linear maps having the required property,
expressed in Equation (1). Then to each $\varepsilon > 0$ there corresponds a $\delta > 0$
such that
$$\|f(x + h) - f(x) - A_i h\| < \varepsilon \|h\| \qquad (i = 1, 2)$$
whenever $\|h\| < \delta$. By the triangle inequality, $\|A_1 h - A_2 h\| < 2\varepsilon \|h\|$ whenever
$\|h\| < \delta$. Since $A_1 - A_2$ is homogeneous, the preceding inequality is true for all
$h$. Hence $\|A_1 - A_2\| \le 2\varepsilon$. Since $\varepsilon$ was arbitrary, $\|A_1 - A_2\| = 0$. •
Notation. If $f$ is differentiable at $x$, its derivative, denoted by $A$ in the
definition, will usually be denoted by $f'(x)$. Notice that with this notation
$f'(x) \in \mathcal{L}(X, Y)$. This is NOT the same as saying $f' \in \mathcal{L}(X, Y)$. It will be
necessary to distinguish carefully between $f'$ and $f'(x)$.
Thus, the terminology adopted here is slightly different from the elementary
notion of derivative in calculus. •
Proof. Let $A = f'(x)$. Then $A \in \mathcal{L}(X, Y)$. Given $\varepsilon > 0$, select $\delta > 0$ so that
$\delta < \varepsilon/(1 + \|A\|)$ and so that the following implication is valid:
$$\|h\| < \delta \implies \|f(x + h) - f(x) - Ah\|/\|h\| < 1$$
Then for $\|h\| < \delta$, we have by the triangle inequality
By comparing this to Equation (1) and invoking the continuity of $\phi'$, we see that
$A$ is indeed the derivative of $f$ at $x$. Hence $f'(x) = \phi' \circ x$. •
We begin by writing
$$f(x + h) - f(x) = f(v^n) - f(v^0) = \sum_{i=1}^{n} [f(v^i) - f(v^{i-1})]$$
where the vectors $v^i$ and $v^{i-1}$ differ in only one coordinate. Thus we put $v^0 = x$
and $v^i = v^{i-1} + h_i e_i$, where $e_i$ is the $i$th standard unit vector. By the mean
value theorem for functions of one variable,
where $0 < \theta_i < 1$. Putting this together, and using the Cauchy-Schwarz inequality,
we have
$$(f'(x)h)_i = \sum_{j=1}^{n} D_j f_i(x) \cdot h_j \quad \text{for all } h \in \mathbb{R}^n$$
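The formula $(f'(x)h)_i = \sum_j D_j f_i(x)\,h_j$ can be illustrated numerically. In the Python sketch below, the map `f` on $\mathbb{R}^2$, its hand-computed matrix of partial derivatives, and the direction `d` are all hypothetical choices made here. The sketch checks the defining property of the derivative: the remainder $\|f(x+h) - f(x) - f'(x)h\|$ divided by $\|h\|$ shrinks as $h$ shrinks.

```python
import math

def f(x):
    """Hypothetical smooth map f: R^2 -> R^2 used only for illustration."""
    return [x[0] * x[0] * x[1], math.sin(x[0]) + x[1]]

def jacobian(x):
    """Matrix of partial derivatives D_j f_i(x), computed by hand for f above."""
    return [[2.0 * x[0] * x[1], x[0] * x[0]],
            [math.cos(x[0]), 1.0]]

def derivative_apply(x, h):
    """(f'(x)h)_i = sum_j D_j f_i(x) h_j."""
    J = jacobian(x)
    return [sum(J[i][j] * h[j] for j in range(2)) for i in range(2)]

x = [1.0, 2.0]
d = [0.3, -0.4]
ratios = []
for scale in (1e-1, 1e-2, 1e-3):
    h = [scale * d[0], scale * d[1]]
    fx = f(x)
    fxh = f([x[0] + h[0], x[1] + h[1]])
    lin = derivative_apply(x, h)
    remainder = math.hypot(fxh[0] - fx[0] - lin[0], fxh[1] - fx[1] - lin[1])
    ratios.append(remainder / math.hypot(h[0], h[1]))
print(ratios)
```

For this smooth map the ratios fall roughly in proportion to $\|h\|$, which is what a second-order Taylor remainder would predict.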
Problems 3.1
1. Let $g$ be a function of two real variables such that $g_{22}$ is continuous. (This notation means
second partial derivative with respect to the second argument.) Define $f : C[0,1] \to
C[0,1]$ by the equation $(f(x))(t) = \int_0^1 g(t, x(s))\,ds$. Compute the Fréchet derivative of
$f$. You may need Taylor's Theorem.
2. Let $f$ be a Fréchet-differentiable function from a Hilbert space $X$ into $\mathbb{R}$. The gradient
of $f$ at $x$ is a vector $v \in X$ such that $f'(x)h = \langle h, v\rangle$ for all $h \in X$. Prove that such a $v$
exists. (It depends on $x$.) Illustrate with $f(x) = \langle a, x\rangle^2$, $a \in X$ and fixed.
3. Prove that if $f$ and $g$ are differentiable at $x$, then so is $f + g$, and $(f + g)'(x) = f'(x) + g'(x)$.
4. Let $X$, $Y$ and $Z$ be normed linear spaces. Prove that if $f : X \to Y$ is differentiable and
if $A : Y \to Z$ is a bounded linear map, then $(A \circ f)' = A \circ f'$.
5. Let $f : X \to X$ be differentiable, $X$ a real Hilbert space, and $v \in X$. Define $g : X \to \mathbb{R}$
by $g(x) = \langle f(x), v\rangle$. Prove that $g$ is differentiable, and determine $g'$.
6. We write $h \mapsto o(h)$ for a generic function that has the property
$$\lim_{h \to 0} \frac{o(h)}{\|h\|} = 0$$
Thus $f'(x)$ is characterized by the equation $f(x + h) - f(x) - f'(x)h = o(h)$. Prove that
the family of all such functions $o$ from $X$ to $Y$ is a vector space.
7. Find the derivative of the map $f : C[0,1] \to C[0,1]$ defined by $f(x) = g \cdot x$. Here the dot
signifies ordinary multiplication, and $g \in C[0,1]$.
8. Supply the missing details in Example 4. For example, you should establish the fact that
$\|\phi' \circ (x + \theta h) - \phi' \circ x\|$ converges to 0 when $h$ converges to 0. Quote any theorems from
real analysis that you use.
9. Let $X$ and $Y$ be two normed linear spaces, and let $x \in X$. Let $f$ and $g$ be functions
defined on a neighborhood of $x$ and taking values in $Y$. Following Dieudonné, we say
that "$f$ and $g$ are tangent at $x$" if
Prove that this is an equivalence relation. Prove that the relationship is preserved if the
norms in $X$ and $Y$ are changed to equivalent ones. Prove that $x \mapsto f(x_0) + f'(x_0)(x - x_0)$
is the unique affine map tangent to $f$ at $x_0$. (An affine map is a constant plus a linear
map.)
10. Show that these two functions are tangent at $x = 2$:
17. Define $f : C[0,1] \to C[0,1]$ by the equation $[f(x)](t) = x(t) + \int_0^1 [x(st)]^2\,ds$. Compute
$f'(x)$.
18. Prove that if $f$ is differentiable at $x$, then $f$ is Lipschitz continuous at $x$. This means
that $\|f(y) - f(x)\| \le \lambda \|y - x\|$ for some $\lambda$ and all $y$ in a neighborhood of $x$.
19. Let $a_n$ ($n = 0, 1, 2, \ldots$) be real numbers such that $\sum_{n=0}^{\infty} a_n z^n$ converges for all $z \in \mathbb{C}$.
Let $X$ be a Banach space. Define $f : \mathcal{L}(X, X) \to \mathcal{L}(X, X)$ by the equation
$f(A) = \sum_{n=0}^{\infty} a_n A^n$. What is the Fréchet derivative of $f$?
24. Prove that in an inner-product space the functions $f(x) = \|x\|^2$ and $g(x) = \langle a, x\rangle$ are
differentiable. Give formulas for the derivatives.
Section 3.2 The Chain Rule and Mean- Value Theorems 121
In order to see that this last expression is $o(h)$, notice first that $\|Bo_1(h)\| \le
\|B\|\,\|o_1(h)\|$.
Hence this term is $o(h)$. Now let $\varepsilon > 0$. Select $\delta_1 > 0$ so that
•
The mean value theorem of elementary calculus does not have an exact
analogue for mappings between general normed linear spaces. (An exception to
this assertion occurs in the case when $f : X \to \mathbb{R}$. See Theorem 2, below.) Even
for functions $f : \mathbb{R} \to X$, the expected mean-value theorem fails, as we now
illustrate.
$$[a, b] = \{a + t(b - a) : 0 \le t \le 1\}$$
Proof. Put g(t) = f(a + t(b - a)). Then g is continuous on the interval [0,1]
and differentiable on (0,1). By the chain rule,
Proof. It suffices to prove that if $a < \alpha < \beta < b$, then $\|f(\beta) - f(\alpha)\| \le M(b - a)$
because the desired result would follow from this by continuity. Also, it suffices
to prove $\|f(\beta) - f(\alpha)\| \le (M + \varepsilon)(b - a)$ for an arbitrary positive $\varepsilon$. Let $S$ be
the set of all $x$ in $[\alpha, \beta]$ such that
Hence
Hence
This proves that $u \in S$. Since $u > x_0$, we have a contradiction. Thus $x_0 = \beta$,
$\beta \in S$, and
$$S = \{ta + (1 - t)b : 0 \le t \le 1\}$$
Proof. Define $g(t) = f(ta + (1 - t)b)$ for $0 \le t \le 1$. By the chain rule, $g'$ exists
and $g'(t) = f'(ta + (1 - t)b)(a - b)$. By the second Mean Value Theorem
$$\|f(b) - f(a)\| = \|g(1) - g(0)\| \le \sup_{0 \le t \le 1} \|g'(t)\| \le \|b - a\| \sup_{x \in S} \|f'(x)\|$$
Notice that $g = f \circ \ell$, where $\ell(t) = ta + (1 - t)b$. Thus $\ell'(t) \in \mathcal{L}(\mathbb{R}, X)$. Hence
in the formula for $g'$, the term $(a - b)$ is interpreted as a mapping from $\mathbb{R}$ to $X$
defined by $t \mapsto t \cdot (a - b)$. •
Proof. Since $f'(x)$ exists for all $x \in D$, f is continuous on D (by Theorem 3 of
Section 3.1, page 117). Select $x_0 \in D$ and define $A = \{x \in D : f(x) = f(x_0)\}$.
This is a closed subset of D (i.e., the intersection of D with a closed set in X).
But we can prove that A is also open. Indeed, if $x \in A$, then there is a ball
$B(x, r) \subset D$, because D is open. If $y \in B(x, r)$, then the line segment from x to
y lies in $B(x, r)$. By the Mean Value Theorem II,
So $f(y) = f(x) = f(x_0)$. This means that $y \in A$. Hence $B(x, r) \subset A$. Thus A is
open (it contains a neighborhood of each of its points). A set is connected if it
contains no nonempty proper subset that is open and closed. Since A is open and closed
and nonempty, $A = D$. •
The connectedness of D is essential in the preceding theorem, even if $D \subset \mathbb{R}$.
For example, suppose that $D = (0, 1) \cup (2, 3)$ and that f(x) = 1 on (0, 1) while
f(x) = 2 on (2, 3). Then f is certainly not constant, although $f'(x) = 0$ at each
point of D.
Problems 3.2
3. Let X be a real Hilbert space and $v \in X$. Define $f(x) = \|x\|^2 v$. What is $f'(x)$?
4. Let f be a continuous real-valued map on a Hilbert space. If $f'(x_0)$ exists, then there is
a direction of steepest descent at $x_0$. This means that there exists a vector u of norm 1
for which $(d/dt) f(x_0 + tu)|_{t=0}$ is a maximum. What is u?
5. Let f be a differentiable and continuous real-valued function defined on an open set D
in a normed linear space. Suppose that $x_0 \in D$ and that $f(x_0) \le f(x)$ for all $x \in D$.
Prove that $f'(x_0) = 0$.
6. Let D be a bounded open set in a finite-dimensional normed linear space. Let $\overline{D}$ be the
closure of D. Let $f : \overline{D} \to \mathbb{R}$ be continuous. Assume f differentiable in D and that f is
constant on $\overline{D} \setminus D$ (the boundary of D). Show that $f'(x) = 0$ for some $x \in D$. (Hint: A
continuous real-valued function on a compact set achieves its maximum and minimum.
Use Problem 5.)
7. Let K be a closed convex set contained in an open set D contained in a Banach space
X. Let $f : D \to X$. Assume that $f'(x)$ exists for each $x \in K$ and that $f(K) \subset K$.
Assume also that $\sup\{\|f'(x)\| : x \in K\} < 1$. Show that f has a unique fixed point in K.
(Banach's Theorem, page 177, is helpful.)
8. The mean value theorem for functions $f : \mathbb{R} \to \mathbb{R}$ states that $f(x + h) - f(x) = h f'(x + \theta h)$
for some $\theta \in (0, 1)$. Show that this is not valid for complex functions. Try $e^z$, $z = 0$, $h = 2\pi i$,
and at least one other function.
9. Let f be a differentiable map from a normed space X to a normed space Y. Let $y_0$ be a
point of Y such that $f'$ is invertible at each point of $f^{-1}(y_0)$. Prove that $f^{-1}(y_0)$ is a
discrete set.
Section 3.3 Newton's Method 125
10. Write out the conclusion of Theorem 2 in the case that $X = \mathbb{R}^n$, using the partial
derivatives $\partial f / \partial x_i$.
In this equation, the point $\xi_n$ is between $x_n$ and r. Hence $|\xi_n - r| \le |x_n - r| = |e_n|$.
Using this we have
\[ e_{n+1} = x_{n+1} - r = x_n - \frac{f(x_n)}{f'(x_n)} - r = e_n - \frac{f(x_n)}{f'(x_n)} = \frac{e_n f'(x_n) - f(x_n)}{f'(x_n)} = \frac{1}{2} e_n^2 \frac{f''(\xi_n)}{f'(x_n)} \]
Since $|x_0 - r| \le \delta$ by hypothesis, we have $|e_0| \le \delta$ and $|\xi_0 - r| \le \delta$. Hence
$|e_1| \le \tfrac{1}{2} e_0^2 |f''(\xi_0)| / |f'(x_0)| \le \tfrac{1}{2} e_0^2 \cdot 2\rho/\delta \le \rho |e_0|$. By repeating this we establish
that $|x_{n+1} - r| \le \rho |x_n - r|$ (convergence). Similarly, we have $|e_1| \le (\rho/\delta) e_0^2$ and
$|e_{n+1}| \le (\rho/\delta) e_n^2$ (quadratic convergence). •
The successive errors $e_n$ in the preceding theorem obey an inequality
$|e_{n+1}| \le C |e_n|^2$. Suppose, for example, that C = 1 and $|e_0| \le 10^{-1}$. Then
$|e_1| \le 10^{-2}$, $|e_2| \le 10^{-4}$, $|e_3| \le 10^{-8}$, and so on. For an iterative process, this
is an extraordinarily favorable state of affairs, as it indicates a doubling of the
number of significant digits in the numerical solution at each step.
Example 1. For finding the square root of a given positive number a, one can
solve the equation $x^2 - a = 0$ by Newton's method. The iteration formula turns
out to be
\[ x_{n+1} = \frac{1}{2}\left( x_n + \frac{a}{x_n} \right) \]
This formula was known to the ancient Greeks and is called Heron's formula.
In order to see how well it performs, we can use a computer system such as
Mathematica, Maple, or Matlab to obtain the Newton approximations to $\sqrt{2}$.
The iteration function is g(x) = (x + 2/x)/2, and a reasonable starting point is
$x_0 = 1$. Mathematica is capable of displaying $x_n$ with any number of significant
figures; we chose 60. The input commands to Mathematica are shown here.
(Each one should be separated from the following one by a semicolon, as shown.)
The output, not shown, indicates that the seventh iterate has at least 60 correct
digits!
g[x_] := (x + (2/x))/2; g[1]; N[%, 60]; g[%]; g[%];
•
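The digit doubling in Example 1 can also be observed without Mathematica. The following is a minimal sketch in Python, assuming the standard `decimal` module with a working precision of 70 digits; the function name `heron_sqrt` and the choice of precision are ours, not part of the text.

```python
from decimal import Decimal, getcontext

def heron_sqrt(a, x0, steps, digits=70):
    """Return the Newton (Heron) iterates x_0, ..., x_steps for x^2 - a = 0."""
    getcontext().prec = digits          # working precision in significant digits
    a = Decimal(a)
    x = Decimal(x0)
    iterates = [x]
    for _ in range(steps):
        x = (x + a / x) / 2             # x_{n+1} = (x_n + a/x_n)/2
        iterates.append(x)
    return iterates

iterates = heron_sqrt(2, 1, 7)
exact = Decimal(2).sqrt()               # reference value at the same precision
errors = [abs(x - exact) for x in iterates]
for n, e in enumerate(errors):
    print(n, e)
```

The printed errors shrink roughly by squaring at each step, in agreement with the inequality $|e_{n+1}| \le C|e_n|^2$.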
Example 2. We illustrate the mechanics of Newton's method in higher dimensions
with the following problem:
\[ \begin{cases} x - y + 1 = 0 \\ x^2 + y^2 - 4 = 0 \end{cases} \]
where x and y are real variables. We have here a mapping $f : \mathbb{R}^2 \to \mathbb{R}^2$,
and we seek one or more zeros of f. The Newton iteration is $u_{n+1} = u_n - [f'(u_n)]^{-1} f(u_n)$,
where $u_n = (x_n, y_n) \in \mathbb{R}^2$. The derivative $f'(u)$ is given by the
Jacobian matrix J. We find that
\[ J = \begin{bmatrix} 1 & -1 \\ 2x & 2y \end{bmatrix}, \qquad J^{-1} = \frac{1}{2x + 2y} \begin{bmatrix} 2y & 1 \\ -2x & 1 \end{bmatrix} \]
can be used here, too. The problem is chosen intentionally as one easily visual-
ized: One seeks the points where a line intersects a circle. See Figure 3.1. •
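The iteration of Example 2 is short enough to carry out directly. Here is a sketch in plain Python that applies the explicit inverse Jacobian given above; the starting points (1, 2) and (-2, -1) and the number of steps are our own choices for illustration.

```python
import math

def f(x, y):
    # The system of Example 2: a line and a circle of radius 2.
    return (x - y + 1.0, x * x + y * y - 4.0)

def newton_step(x, y):
    f1, f2 = f(x, y)
    det = 2.0 * x + 2.0 * y            # determinant of J = [[1, -1], [2x, 2y]]
    # J^{-1} = (1/det) [[2y, 1], [-2x, 1]], applied to (f1, f2)
    dx = (2.0 * y * f1 + f2) / det
    dy = (-2.0 * x * f1 + f2) / det
    return x - dx, y - dy

def newton(x, y, steps=8):
    for _ in range(steps):
        x, y = newton_step(x, y)
    return x, y

root_a = newton(1.0, 2.0)      # start near the intersection in the first quadrant
root_b = newton(-2.0, -1.0)    # start near the intersection in the third quadrant
print(root_a, root_b)
```

Eliminating y by hand gives $2x^2 + 2x - 3 = 0$, so the two intersection points have $x = (-1 \pm \sqrt{7})/2$ and $y = x + 1$, which can be compared with the computed values.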
Figure 3.1
The remarkable theorem of Kantorovich is presented next. This theorem:
(1) Proves the existence of a zero of a function from suitable hypotheses, and (2)
Establishes the quadratic convergence of the Newton algorithm. When it was
published in 1948, this theorem gave new information about Newton's method
even when the domain space X was two-dimensional.
\[ \le 2(a_n b_n k)^2 \le \tfrac{1}{2} \]
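(VII) Define $g(x) = x - f'(x_n)^{-1} f(x)$. Then $g(x_n) = x_{n+1}$. If $x \in S$, then
\[ \begin{aligned} \|g'(x)\| &= \|I - f'(x_n)^{-1} f'(x)\| = \|f'(x_n)^{-1} \{f'(x_n) - f'(x)\}\| \\ &\le \|f'(x_n)^{-1}\|\, \|f'(x_n) - f'(x)\| \le b_n \tfrac{1}{2} k \|x_n - x\| \end{aligned} \]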
(VIII) Using the Mean Value Theorem and parts (VII) and (II), we have
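Here x is some point on the line segment joining $x_n$ to $x_{n+1}$. From part (VII),
it follows that
\[ \|f'(x_n)^{-1} f(x_{n+1})\| \le b_n \tfrac{1}{2} k \|x_n - x\|\, a_n \]
(IX)
\[ \begin{aligned} &\le \|H^{-1}\|\, \|f'(x_n)^{-1} f(x_{n+1})\| \\ &\le (1 - \tfrac{1}{2} a_n b_n k)^{-1} b_n \tfrac{1}{2} k a_n^2 \\ &\le (1 - a_n b_n k)^{-1} b_n \tfrac{1}{2} k a_n^2 = \tfrac{1}{2} b_{n+1} a_n^2 k = a_{n+1} \end{aligned} \]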
Therefore,
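(XIII)
\[ \begin{aligned} a_n &= \tfrac{1}{2} k b_n a_{n-1}^2 = \tfrac{1}{2} k b_{n-1} (1 - a_{n-1} b_{n-1} k)^{-1} a_{n-1}^2 \\ &\le (k b_{n-1} a_{n-1}) a_{n-1} = h_{n-1} a_{n-1} \end{aligned} \]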
\[ = a_0 2^{-n} (2h_0)^{2^n - 1} \]
For the preceding theorem and the one to follow (for which we do not give the
proof), we refer the reader to [Gold] and to [KA].
The next theorem concerns a variant of Newton's method due to R.E. Moore
[Moo]. In this theorem, we have two normed linear spaces X and Y. An open
set $\Omega$ in X is given, and a mapping $F : \Omega \to Y$ is prescribed. It is known that
F has a zero $x^*$ in $\Omega$ and that $F'(x^*)$ exists. We wish to determine $x^*$. For this
purpose we set up an iterative scheme of the form
\[ (8) \qquad \sup_{x \in \Omega} \|A(x)\| = M < \infty \]
From (11), using the fact that $F(x^*) = 0$ and the definition of G, we have
\[ \begin{aligned} G(x) - x^* &= x - x^* - A(x)F(x) \\ &= x - x^* - A(x)[F'(x^*)(x - x^*) + \eta(x)] \\ &= x - x^* - A(x)F'(x^*)(x - x^*) - A(x)\eta(x) \\ &= [I - A(x)F'(x^*)](x - x^*) - A(x)\eta(x) \end{aligned} \]
If we assume further that $\|x - x^*\| \le \delta$, then
Corollary 3. If $\|I - F'(x_0)^{-1} F'(x^*)\| < 1$, then the simplified
Newton iteration $x_{n+1} = x_n - F'(x_0)^{-1} F(x_n)$ converges to $x^*$ if started
sufficiently near to $x^*$.
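To see the simplified Newton iteration at work, consider a scalar sketch in Python. The test function $F(x) = x^2 - 2$ and the starting point 1.5 are our own illustrative choices; the derivative is evaluated once at the starting point and then reused.

```python
import math

def simplified_newton(F, dF, x0, steps):
    """Iterate x_{n+1} = x_n - F'(x0)^{-1} F(x_n) with the derivative frozen at x0."""
    slope = dF(x0)                 # computed once and reused at every step
    x = x0
    history = [x]
    for _ in range(steps):
        x = x - F(x) / slope
        history.append(x)
    return history

F = lambda x: x * x - 2.0
dF = lambda x: 2.0 * x
history = simplified_newton(F, dF, 1.5, 20)
root = math.sqrt(2.0)
# |1 - F'(x*)/F'(x0)| is the quantity required to be less than 1 in the corollary
factor = abs(1.0 - dF(root) / dF(1.5))
print(history[-1], factor)
```

In this example the factor is about 0.057, and the errors decrease by roughly that ratio at each step, so the convergence is linear rather than quadratic.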
Here $\lambda$, v, and g are given. We assume that $v \in C[0, 1]$ and that g is continuous
on the 3-dimensional set
Also, we assume that $|g(s, t, u_1) - g(s, t, u_2)| \le k |u_1 - u_2|$ in the domain D.
Theorem 5. If $|\lambda| k < 1$, then the integral equation (13) above has
a unique solution.
Proof. Apply the Contraction Mapping Theorem (Chapter 4, Section 2, page
177) to the mapping F defined on C[0, 1] by $(Fx)(s) = v(s) + \lambda \int_0^1 g(s, t, x(t))\, dt$.
We see easily that
\[ \le |\lambda| k \|x_1 - x_2\| \]
•
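The fixed-point iteration $x_{n+1} = F(x_n)$ for this mapping can be sketched numerically. In the Python sketch below, the integral is replaced by the midpoint rule, and the data $g(s, t, u) = st \sin u$, $v(s) = 1 + s$, and $\lambda = 0.5$ are hypothetical choices made only so that $|\lambda| k < 1$ holds with k = 1.

```python
import math

def picard_solve(g, v, lam, n=100, sweeps=40):
    """Iterate x <- F(x), (Fx)(s) = v(s) + lam * integral_0^1 g(s, t, x(t)) dt,
    with the integral replaced by the midpoint rule on n nodes."""
    nodes = [(j + 0.5) / n for j in range(n)]
    weight = 1.0 / n

    def apply_F(x):
        return [v(s) + lam * weight * sum(g(s, t, xt) for t, xt in zip(nodes, x))
                for s in nodes]

    x = [0.0] * n                       # arbitrary starting point x_0
    for _ in range(sweeps):
        x = apply_F(x)
    residual = max(abs(p - q) for p, q in zip(x, apply_F(x)))
    return nodes, x, residual

# Hypothetical data: g is Lipschitz in u with k = 1, and |lam| * k = 0.5 < 1.
g = lambda s, t, u: s * t * math.sin(u)
v = lambda s: 1.0 + s
nodes, x, residual = picard_solve(g, v, lam=0.5)
print(residual)
```

The residual measures how far the final iterate is from being a fixed point of the discretized map; it says nothing about the quadrature error, which depends on n.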
If $|\lambda| k < 1$, then the sequence $x_{n+1} = F(x_n)$ will converge, in the space C[0, 1],
to a solution of the integral equation. In this process, $x_0$ can be an arbitrary
starting point in C[0, 1]. Newton's method can also be used, provided that we
start at a point sufficiently close to the solution. For Newton's method, we define
the mapping f by
where $g_3$ is the partial derivative of g with respect to its third argument, i.e.,
$g_3(s, t, u) = (\partial / \partial u) g(s, t, u)$.
and if
\[ \lim_{r \to 0} \frac{1}{r} [g(s, t, u + r) - g(s, t, u) - r g_3(s, t, u)] = 0 \]
Explicitly,
\[ (Ah)(s) = \int_0^1 g_3(s, t, x(t))\, h(t)\, dt \]
\[ I + \lambda B - \lambda A - \lambda^2 BA = I - \lambda A + \lambda B - \lambda^2 AB = I \]
and
\[ (17) \qquad B = (I + \lambda B)A = A(I + \lambda B) \]
Conversely, from Equation (17) we can prove Equation (14). Thus Equation
(17) serves to characterize B. Now assume that r satisfies Equations (16) and
that B is defined by Equation (15). We will show that B must satisfy Equation
(17), and hence Equation (14). We have
\[ \begin{cases} r(s, t) = a\,st + \int_0^1 a\,su\, r(u, t)\, du \\ r(s, t) = a\,st + \int_0^1 a\,tu\, r(s, u)\, du \end{cases} \]
From these equations it is evident that $r(s, t)/st$ is on the one hand a function of
t only, and on the other hand a function of s only. Thus $r(s, t)/st$ is constant, say
$\beta$, and $r(s, t) = \beta s t$. Substituting in the integral equation for r and solving gives
us $\beta = 12/35$. One step in the Newton algorithm will be $x_1 = x_0 - f'(x_0)^{-1} f(x_0)$.
We compute $y = f(x_0)$ as follows:
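\[ \begin{aligned} y(s) &= x_0(s) - \int_0^1 g(s, t, x_0(t))\, dt - v(s) \\ &= \tfrac{3}{2} - \int_0^1 st \operatorname{Arctan} \tfrac{3}{2}\, dt - 1 - s^2 + .485 s \\ &= \tfrac{1}{2} - .0063968616 s - s^2 \end{aligned} \]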
Section 3.4 Implicit Function Theorems 135
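\[ \begin{aligned} x_1(s) &= \tfrac{3}{2} - \left[ \tfrac{1}{2} - \gamma s - s^2 \right] - \int_0^1 \beta s t \left[ \tfrac{1}{2} - \gamma t - t^2 \right] dt \qquad (\gamma \approx .0063968616) \\ &= 1 + s^2 + (.0071279315)\, s \end{aligned} \]
•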
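The arithmetic in this Newton step can be checked in a few lines. The Python sketch below rests on assumptions read off from the displayed computations rather than stated in the surviving text: the starting function is taken to be $x_0(s) = 3/2$, and the constant a in the kernel equations is taken to be 4/13, the value consistent with $\beta = 12/35$.

```python
import math

# Assumed data, inferred from the displayed computation: x0(s) = 3/2 and a
# kernel constant a = 4/13, so that beta solves beta = a + a*beta/3.
a = 4.0 / 13.0
beta = a / (1.0 - a / 3.0)

# f(x0)(s) = 1/2 - gamma*s - s^2, with gamma = Arctan(3/2)/2 - 0.485
gamma = math.atan(1.5) / 2.0 - 0.485

# integral_0^1 t*(1/2 - gamma*t - t^2) dt = 1/4 - gamma/3 - 1/4 = -gamma/3
moment = 0.25 - gamma / 3.0 - 0.25

# x1(s) = 3/2 - (1/2 - gamma*s - s^2) - beta*s*moment = 1 + s^2 + coeff*s
coeff = gamma - beta * moment
print(beta, gamma, coeff)
```

Under these assumptions the script reproduces the constants 12/35, .0063968616, and .0071279315 that appear in the example.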
Problems 3.3
1. For the one-dimensional version of Newton's method, prove that if r is a root of multiplicity
m, then quadratic convergence in the algorithm can be preserved by defining
2. Prove the corollaries, giving in each case the precise assumptions that must be made
concerning the starting points.
3. Let $e_0, e_1, \ldots$ be a sequence of positive numbers satisfying $e_{n+1} \le c e_n^2$. Find necessary
and sufficient conditions for the convergence $\lim_n e_n = 0$.
4. Let f be a function from $\mathbb{R}$ to $\mathbb{R}$ that satisfies the inequalities $f' > 0$ and $f'' > 0$. Prove
that if f has a zero, then the zero is unique, and Newton's iteration, started at any point,
converges to the zero.
5. How must the analysis in Theorem 1 be modified to accommodate functions from $\mathbb{C}$ to
$\mathbb{C}$? (Remember that the Mean Value Theorem in its real-variable form is not valid.)
6. If r is a zero of a function f, then the corresponding "basin of attraction" is the set
of all x such that the Newton sequence starting at x converges to r. For the function
$f(z) = z^2 + 1$, $z \in \mathbb{C}$, and the zero $r = i$, prove that the basin of attraction contains the
disk of radius $\tfrac{1}{2}$ about r.
In this section we give several versions of the Implicit Function Theorem and
prove its corollary, the Inverse Function Theorem. Theorems in this broad category
are often used to establish the existence of solutions to nonlinear equations
of the form f(x) = y. The conclusions are typically local in nature, and describe
how the solution x depends on y in a neighborhood of a given solution $(x_0, y_0)$.
Usually, there will be a hypothesis involving invertibility of the derivative $f'(x_0)$.
The intuition gained from examining some simple cases proves to be completely
reliable in attacking very general cases. Consider, then, a function
$F : \mathbb{R}^2 \to \mathbb{R}$. We ask whether the equation F(x, y) = 0 defines y to be a
unique function of x. For example, we can ask this question for the equation
\[ (1) \qquad x + y^2 - 1 = 0 \qquad (x, y \in \mathbb{R}) \]
This can be "solved" to yield $y = \sqrt{1 - x}$. The graph of this is shown in the
accompanying Figure 3.2. It is clear that we cannot let x be the point A in the
figure, because there is no corresponding y for which $F(x, y) \equiv x + y^2 - 1 = 0$.
One must start with a point $(x_0, y_0)$ like B in the figure, where we already have
$F(x_0, y_0) = 0$. Finally, observe that at the point C there will be a difficulty,
for there are values of x near C to which no y's correspond. This is a point
where $dy/dx = \infty$. Recall that if $y = y(x)$ and if $F(x, y(x)) = 0$, then $y'$ can be
obtained from the equation
\[ D_1 F(x, y(x)) + D_2 F(x, y(x))\, y'(x) = 0 \]
(In this equation, $D_i$ is partial differentiation with respect to the ith argument.)
Thus $y' = -D_1 F / D_2 F$, and the condition $y'(x_0) = \infty$ corresponds to
$D_2 F(x_0, y_0) = 0$. In this example, notice that another function arises from
Equation (1), namely
\[ y = -\sqrt{1 - x} \]
In a neighborhood of (1, 0), both functions solve Equation (1), and there is a
failure of uniqueness.
Figure 3.2
In the classical implicit function theorem we have a function F of two real
variables in class $C^1$. That means simply that $\partial F / \partial x$ and $\partial F / \partial y$ exist and are
continuous. It is convenient to denote these partial derivatives by $F_1$ and $F_2$.
Proof. Assume that $F_2(x_0, y_0) > 0$. Then by continuity, $F_2(x, y) > \alpha > 0$ in
a neighborhood of $(x_0, y_0)$, which we assume to be the original $\delta$-neighborhood.
The function $y \mapsto F(x_0, y)$ is strictly increasing for $y_0 - \delta \le y \le y_0 + \delta$. Hence
Now fix $x_1$ in the $\varepsilon$-neighborhood of $x_0$. Put $y_1 = f(x_1)$. Let $\Delta x$ be a small
number and let $y_1 + \Delta y = f(x_1 + \Delta x)$. Then
\[ \begin{aligned} 0 &= F(x_1 + \Delta x, y_1 + \Delta y) \\ &= F_1(x_1 + \theta \Delta x, y_1 + \theta \Delta y) \Delta x + F_2(x_1 + \theta \Delta x, y_1 + \theta \Delta y) \Delta y \end{aligned} \]
for an appropriate $\theta$ satisfying $0 \le \theta \le 1$. (This is the Mean Value Theorem for
a function from $\mathbb{R}^2$ to $\mathbb{R}$. See Problem 3.2.8, page 124.) This equation gives us
\[ f'(x_1) = \lim_{\Delta x \to 0} \frac{\Delta y}{\Delta x} = -\frac{F_1(x_1, y_1)}{F_2(x_1, y_1)} \]
Therefore, f is differentiable at $x_1$. The formula can be written
\[ f'(x) = -F_1(x, f(x)) / F_2(x, f(x)) \]
and this shows that $f'$ is continuous at x, provided that x is in the open interval
$(x_0 - \varepsilon, x_0 + \varepsilon)$. •
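The derivative formula just obtained is easy to test on the introductory example $F(x, y) = x + y^2 - 1$, whose upper branch is $f(x) = \sqrt{1 - x}$. The following Python sketch compares $-F_1/F_2$ with a central difference quotient; the sample point 0.36 and the step size are arbitrary choices of ours.

```python
import math

def F(x, y):
    return x + y * y - 1.0          # the example F(x, y) = x + y^2 - 1

def F1(x, y):
    return 1.0                      # partial derivative with respect to x

def F2(x, y):
    return 2.0 * y                  # partial derivative with respect to y

def f(x):
    return math.sqrt(1.0 - x)       # the branch through (0, 1)

def implicit_slope(x):
    return -F1(x, f(x)) / F2(x, f(x))

x1, h = 0.36, 1e-6
central_difference = (f(x1 + h) - f(x1 - h)) / (2.0 * h)
print(implicit_slope(x1), central_difference)
```

At $x_1 = 0.36$ we have $f(x_1) = 0.8$, so both quantities should be close to $-1/1.6 = -0.625$. Near x = 1 the denominator $F_2$ tends to 0 and the formula breaks down, as the discussion of the point C suggests.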
Theorem 2. Implicit Function Theorem for Many Variables.
Let $F : \mathbb{R}^n \times \mathbb{R} \to \mathbb{R}$, and suppose that $F(x_0, y_0) = 0$ for some $x_0 \in \mathbb{R}^n$
and $y_0 \in \mathbb{R}$. If all n + 1 partial derivatives $D_i F$ exist and are continuous
in a neighborhood of $(x_0, y_0)$ and if $D_{n+1} F(x_0, y_0) \ne 0$, then there is a
continuously differentiable function f defined on a neighborhood of $x_0$
such that $F(x, f(x)) = 0$, $f(x_0) = y_0$, and
\[ D_i f(x) = -D_i F(x, f(x)) / D_{n+1} F(x, f(x)) \qquad (1 \le i \le n) \]
Proof. This is left as a problem (Problem 3.4.4). •
Example 1. $F(x, y) = x^2 + y^2 + 1$ or $x^2 + y^2$ or $x^2 + y^2 - 1$. (Three phenomena
are illustrated.) •
If we expect to generalize the preceding theorems to normed linear spaces,
there will be several difficulties. Of course, division by $F_2$ will become multiplication
by $F_2^{-1}$, and the invertibility of the Fréchet derivative will have to be
hypothesized. A more serious problem occurs in defining the value of y corresponding
to x. The order properties of the real line were used in the preceding
proofs; in the more general theorems, an appeal to a fixed point theorem will be
substituted.
Definition. Let X, Y, Z be Banach spaces. Let $F : X \times Y \to Z$ be a mapping.
The Cartesian product $X \times Y$ is also a Banach space if we give it the norm
$\|(x, y)\| = \|x\| + \|y\|$. If they exist, the partial derivatives of F at $(x_0, y_0)$ are
bounded linear operators $D_1 F(x_0, y_0)$ and $D_2 F(x_0, y_0)$ such that
\[ \lim \|F(x_0 + h, y_0) - F(x_0, y_0) - D_1 F(x_0, y_0) h\| / \|h\| = 0 \qquad (h \in X,\ h \to 0) \]
and
\[ \lim \|F(x_0, y_0 + k) - F(x_0, y_0) - D_2 F(x_0, y_0) k\| / \|k\| = 0 \qquad (k \in Y,\ k \to 0) \]
Thus $D_1 F(x_0, y_0) \in \mathcal{L}(X, Z)$ and $D_2 F(x_0, y_0) \in \mathcal{L}(Y, Z)$. We often use the
notation $F_i$ in place of $D_i F$.
Proof. We can assume that $(x_0, y_0) = (0, 0)$. Select $\delta > 0$ so that
\[ \{(x, y) : \|x\| \le \delta,\ \|y\| \le \delta\} \subset \Omega \]
Put $A = D_2 F(0, 0)$. Then $A \in \mathcal{L}(Y, Z)$ and $A^{-1} \in \mathcal{L}(Z, Y)$. For each x satisfying
$\|x\| \le \delta$ we define $G_x(y) = y - A^{-1} F(x, y)$. Here $\|y\| \le \delta$. Observe that if $G_x$
has a fixed point $y^*$, then
from which we conclude that $F(x, y^*) = 0$. Let us therefore set about proving
that $G_x$ has a fixed point. We shall employ the Contraction Mapping Theorem
(Chapter 4, Section 2, page 177). We have
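If $\|x\| \le \delta_0$ and $\|y\| \le \varepsilon$, then by the Mean Value Theorem III of Section 2,
page 123,
\[ \begin{aligned} \|G_x(y)\| &\le \|G_x(0)\| + \|G_x(y) - G_x(0)\| \\ &\le \tfrac{1}{2} \varepsilon + \sup_{0 \le \lambda \le 1} \|G_x'(\lambda y)\| \cdot \|y\| \\ &\le \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon \end{aligned} \]
that
\[ F(x, f(x)) = 0 \]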
Proof. For x in $\Omega$ and y in the second space, define $F(x, y) = f(x) - y$. Put
$y_0 = f(x_0)$ so that $F(x_0, y_0) = 0$. Note that $D_1 F(x, y) = f'(x)$, and thus
$D_1 F(x_0, y_0)$ is invertible. By Theorem 4, there is a neighborhood N of $y_0$ and
a continuously differentiable function g defined on N such that $F(g(y), y) = 0$,
or $f(g(y)) - y = 0$ for all $y \in N$. •
\[ \le k \|x_1 - x_2\| \]
Second, we show that G maps B into B. If $x \in B$, then
$(x \in B)$
By the assumptions made about y, we can verify hypothesis (ii) of the preceding
theorem by writing
in which $y \in C[0, 1]$. Notice that when y = 0 the integral equation has the
solution x = 0. We ask: Does the equation have solutions when $\|y\|$ is small?
Here, we use the usual sup-norm on C[0, 1], as this makes the space complete.
(Weighted sup-norms would have this property, too.) Write the integral equation
as f(x) = y, where f has the obvious interpretation. Then $f'(x)$ is given by
(We may assume also that $B(x_0, \delta) \subset \Omega$.) If $\|x_1 - x_0\| < \delta$ and $\|x_2 - x_0\| < \delta$,
then the line segment S joining $x_1$ to $x_2$ satisfies $S \subset B(x_0, \delta) \subset \Omega$. By Problem
2, page 145, we have
\[ \|f(x_1) - f(x_2) - f'(x_0)(x_1 - x_2)\| \le \|x_1 - x_2\| \cdot \sup_{x \in S} \|f'(x) - f'(x_0)\| \le \varepsilon \|x_1 - x_2\| \]
•
Theorem 9. Surjective Mapping Theorem II. Let X and Y
be Banach spaces. Let $\Omega$ be an open set in X. Let $f : \Omega \to Y$ be
continuously differentiable. If $x_0 \in \Omega$ and $f'(x_0)$ has a right inverse in
$\mathcal{L}(Y, X)$, then $f(\Omega)$ is a neighborhood of $f(x_0)$.
Proof. Put $A = f'(x_0)$ and let L be a member of $\mathcal{L}(Y, X)$ such that AL = I,
where I denotes the identity map on Y. Let $c = \|L\|$. By the preceding lemma,
there exists $\delta > 0$ such that $B(x_0, \delta) \subset \Omega$ and such that
\[ \|u - x_0\| \le \delta,\ \|v - x_0\| \le \delta \implies \|f(u) - f(v) - A(u - v)\| \le \frac{1}{2c} \|u - v\| \]
Let $y_0 = f(x_0)$ and $y \in B(y_0, \delta/2c)$. We will find $x \in \Omega$ such that f(x) = y.
The point x is constructed as the limit of a sequence $\{x_n\}$ defined inductively
as follows. We start with the given $x_0$. Put $x_1 = x_0 + L(y - y_0)$. From then on
we define
\[ x_{n+1} = x_n - L[f(x_n) - f(x_{n-1}) - A(x_n - x_{n-1})] \]
By induction we establish that $\|x_n - x_{n-1}\| \le \delta/2^n$ and $\|x_n - x_0\| \le \delta$. Here
are the details of the induction:
\[ \|x_1 - x_0\| = \|L(y - y_0)\| \le c \|y - y_0\| \le c\delta/(2c) = \delta/2 \]
\[ \|x_{n+1} - x_n\| \le c \|f(x_n) - f(x_{n-1}) - A(x_n - x_{n-1})\| \le c(1/2c) \|x_n - x_{n-1}\| \le \delta/2^{n+1} \]
\[ \|x_{n+1} - x_0\| \le \|x_{n+1} - x_n\| + \|x_n - x_{n-1}\| + \cdots + \|x_1 - x_0\| \le \frac{\delta}{2^{n+1}} + \frac{\delta}{2^n} + \cdots + \frac{\delta}{2} \le \delta \]
Next we observe that the sequence $[x_n]$ has the Cauchy property, since (for
m > n)
•
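The sequence constructed in this proof can be computed explicitly in a small case. The Python sketch below uses a hypothetical map $f(x_1, x_2) = x_1 + x_1^2 + x_2$ from $\mathbb{R}^2$ to $\mathbb{R}$ with $x_0 = (0, 0)$, so that $A = f'(x_0)$ acts as $h \mapsto h_1 + h_2$ and $L(r) = (r/2, r/2)$ is one right inverse of A; none of these choices comes from the text.

```python
def f(x):
    # Hypothetical map from R^2 to R with f(0, 0) = 0
    return x[0] + x[0] ** 2 + x[1]

def A(h):
    # A = f'(0, 0) acts on h = (h1, h2) as h1 + h2
    return h[0] + h[1]

def L(r):
    # A right inverse of A: A(L(r)) = r/2 + r/2 = r
    return (0.5 * r, 0.5 * r)

def solve(y, steps=40):
    x_prev = (0.0, 0.0)                               # x_0, with y_0 = f(x_0) = 0
    step = L(y - f(x_prev))
    x = (x_prev[0] + step[0], x_prev[1] + step[1])    # x_1 = x_0 + L(y - y_0)
    for _ in range(steps):
        diff = (x[0] - x_prev[0], x[1] - x_prev[1])
        corr = L(f(x) - f(x_prev) - A(diff))
        x_prev, x = x, (x[0] - corr[0], x[1] - corr[1])
    return x

y = 0.05
x = solve(y)
print(x, f(x))
```

Since f maps a two-dimensional space onto a one-dimensional one, the equation f(x) = y has many solutions; the iteration selects one of them, determined by the choice of right inverse L.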
Example 3. Let f : ]R3 -+ ]R3 be given by
f(x) = y
171 = 2~t + 6 cos 6 - 66
172 = (6 + 6? - 4sin6
173 = log(6 + 1) + 56 + cos 6 - 1
Notice that f(O) = O. We ask: For y close to zero is there an x for which
f(x) = y? To answer this, one can use the Inverse Function Theorem. We
compute the Frechet derivative or Jacobian:
At x = 0 we have
The direct sum of n normed linear spaces $X_1, X_2, \ldots, X_n$ is denoted by
$\sum_{i=1}^{n} \oplus X_i$. Its elements are n-tuples $x = (x_1, x_2, \ldots, x_n)$, where $x_i \in X_i$ for
$i = 1, 2, \ldots, n$. Although many definitions of the norm are suitable, we use
$\|x\| = \sum_{i=1}^{n} \|x_i\|$.
\[ \le \max_{1 \le j \le n} \|D_j f(x)\| \sum_{i=1}^{n} \|h_i\| = \max_{1 \le j \le n} \|D_j f(x)\|\, \|h\| \]
[Suggestion: Use the function $g(x) = f(x) - f'(x_0)x$.] Determine whether the same
inequality is true when $f'(x_0)$ is replaced by an arbitrary linear operator. In this problem,
$f : X \to Y$, where Y is any normed space.
3. Suppose $F(x_0, y_0) = 0$. If $x_1$ is close to $x_0$, there should be a $y_1$ such that $F(x_1, y_1) = 0$.
Show how Newton's method can be used to obtain $y_1$. (Here $F : X \times Y \to Z$, and
X, Y, Z are Banach spaces.)
4. Prove Theorem 2.
5. Let $f : \Omega \to Y$ be a continuously differentiable map, where $\Omega$ is an open set in a Banach
space, and Y is a normed linear space. Assume that $f'(x)$ is invertible for each $x \in \Omega$,
and prove that $f(\Omega)$ is open.
6. Let $\alpha$ be the point in [0, 1] where $\cos \alpha = \alpha$. Define X to be the vector space of all
continuously differentiable functions on [0, 1] that vanish at the point $\alpha$. Define a norm
on X by writing $\|x\| = \sup_{0 \le t \le 1} |x'(t)|$. Prove that there exists a positive number $\delta$
such that if $y \in X$ and $\|y\| < \delta$, then there exists an $x \in X$ satisfying
\[ \sin \circ x + x \circ \cos = y \]
7. Let f be a continuous map from an open set $\Omega$ in a Banach space X into a Banach
space Y. Suppose that for some $x_0$ in $\Omega$, $f'(x_0)$ exists and is invertible. Prove that f is
one-to-one in some neighborhood of $x_0$.
8. In Example 2, with the nonlinear integral equation, show that the mapping $x \mapsto f'(x)$ is
continuous; indeed, it satisfies a Lipschitz condition.
9. Rework Example 2 when the term 2x(0) is replaced by $\alpha x(0)$, for an arbitrary constant
$\alpha$. In particular, treat the case when $\alpha = 0$.
Proof. Let X be the Banach space, and assume $f'(x_0) \ne 0$. Then there exists
$v \in X$ such that $f'(x_0)v = -1$. By the definition of $f'(x_0)$ we can take $\lambda > 0$
and so small that $x_0 + \lambda v$ is in $\Omega$ and
This means that $\frac{1}{\lambda}[f(x_0 + \lambda v) - f(x_0)]$ is within distance $\tfrac{1}{2}$ from $-1$, and so is
negative. This implies $f(x_0 + \lambda v) < f(x_0)$. •
If $g_2(x, y) \ne 0$, then $\lambda = -f_2(x, y)/g_2(x, y)$, and we recover system (1). The
method of Lagrange multipliers treats x and y symmetrically, and includes both
cases of the implicit function theorem. Thus y can be a differentiable function
of x, or x can be a differentiable function of y.
Figure 3.3
Figure 3.4
If there are several constraint functions, there will be several Lagrange multipli-
ers, as in the next example.
Example 3. Find the minimum distance from a point to a line in $\mathbb{R}^3$. Let the
line be given as the intersection of two planes whose equations are $\langle a, x \rangle = k$
and $\langle b, x \rangle = \ell$. (Here, x, a, and b belong to $\mathbb{R}^3$.) Let the point be c. Then H
should be
This H is a function of $(x_1, x_2, x_3, \lambda, \mu)$. The five equations to solve are
\[ 2(x_1 - c_1) + \lambda a_1 + \mu b_1 = 2(x_2 - c_2) + \lambda a_2 + \mu b_2 = 2(x_3 - c_3) + \lambda a_3 + \mu b_3 = 0 \]
\[ \langle a, x \rangle - k = \langle b, x \rangle - \ell = 0 \]
We see that x is of the form $x = c + \alpha a + \beta b$. When this is substituted in the
second set of equations, we obtain two linear equations for determining $\alpha$ and
$\beta$:
\[ \langle a, a \rangle \alpha + \langle a, b \rangle \beta = k - \langle a, c \rangle \quad \text{and} \quad \langle a, b \rangle \alpha + \langle b, b \rangle \beta = \ell - \langle b, c \rangle \]
•
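The two linear equations for $\alpha$ and $\beta$ can be solved by Cramer's rule. A short Python sketch follows; the particular planes and the point c are hypothetical data chosen for illustration, and the function name `closest_point` is ours.

```python
import math

def dot(u, w):
    return sum(ui * wi for ui, wi in zip(u, w))

def closest_point(a, k, b, ell, c):
    """Closest point to c on the line {x : <a,x> = k, <b,x> = ell} in R^3."""
    # 2x2 system for alpha, beta from substituting x = c + alpha*a + beta*b
    aa, ab, bb = dot(a, a), dot(a, b), dot(b, b)
    r1, r2 = k - dot(a, c), ell - dot(b, c)
    det = aa * bb - ab * ab             # nonzero when a and b are not parallel
    alpha = (r1 * bb - ab * r2) / det   # Cramer's rule
    beta = (aa * r2 - ab * r1) / det
    return tuple(ci + alpha * ai + beta * bi for ci, ai, bi in zip(c, a, b))

# Hypothetical data for illustration
a, k = (1.0, 1.0, 0.0), 2.0
b, ell = (0.0, 1.0, 1.0), 3.0
c = (1.0, 2.0, 3.0)
x = closest_point(a, k, b, ell, c)
distance = math.sqrt(sum((xi - ci) ** 2 for xi, ci in zip(x, c)))
print(x, distance)
```

For these data one finds $\alpha = 0$ and $\beta = -1$, so the closest point is c - b = (1, 1, 2) and the distance is $\sqrt{2}$. The vector x - c is orthogonal to the direction of the line, as expected of a nearest point.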
Theorem 3. Lagrange Multipliers. Let $f, g_1, \ldots, g_n$ be continuously
differentiable real-valued functions defined on an open set $\Omega$ in
a Banach space X. Let $M = \{x \in \Omega : g_1(x) = \cdots = g_n(x) = 0\}$. If
$x_0$ is a local minimum point of $f|M$ (the restriction of f to M), then
there is a nontrivial linear relation of the form
If $r < f(x_0)$, then the point (r, 0, 0, ..., 0) is not in F(U). Thus F(U)
does not contain a neighborhood of the point $(f(x_0), g_1(x_0), \ldots, g_n(x_0)) =
(f(x_0), 0, 0, \ldots, 0)$. By the Corollary in Section 3.4, page 143, $F'(x_0)$ is not
surjective. Since the range of $F'(x_0)$ is a linear subspace of $\mathbb{R}^{n+1}$, we now know
that it is a proper subspace of $\mathbb{R}^{n+1}$. Hence it is contained in a hyperplane
through the origin. This means that for some $\mu, \lambda_1, \ldots, \lambda_n$ (not all zero) we
have
Section 3.5 Extremum Problems and Lagrange Multipliers 149
for all $v \in X$. This implies the equation in the statement of the theorem. •
Example 4. Let A be a compact Hermitian operator on a Hilbert space X.
Then $\|A\| = \max\{|\lambda| : \lambda \in \Lambda(A)\}$, where $\Lambda(A)$ is the set of eigenvalues of A.
This is proved by Lemma 2, page 92, together with Problem 22, page 101. Then
by Lemma 2 in Section 2.3, page 85, we have $\|A\| = \sup\{|\langle Ax, x \rangle| : \|x\| = 1\}$.
Hence we can find an eigenvalue of A by determining an extremum of $\langle Ax, x \rangle$
on the set defined by $\|x\| = 1$. An alternative is given by the next result. •
Proof. Let $Ax = \lambda x$, $x \ne 0$. Then $f(x) = \langle Ax, x \rangle / \langle x, x \rangle = \lambda$. Recall that the
eigenvalues of a Hermitian operator are real. Let us compute the derivative of
f at x and show that it is 0.
\[ \lim_{h \to 0} |f(x + h) - f(x) - f'(x)h| \,/\, \|h\| = 0 \]
\[ -2\langle Ax, h \rangle + 2\lambda \langle x, h \rangle = 0 \qquad (h \in X) \]
whence $Ax = \lambda x$.
Extremum problems with inequality constraints can also be discussed in a
general setting free of dimensionality restrictions. This leads to the so-called
Kuhn-Tucker Theory.
Inequalities in a vector space require some elucidation. An ordered vector
space is a pair $(X, \ge)$ in which X is a real vector space and $\ge$ is a partial order
in X that is consistent with the linear structure. This means simply that
\[ x \ge y \implies x + z \ge y + z \]
\[ x \ge y,\ \lambda \ge 0 \implies \lambda x \ge \lambda y \]
In an ordered vector space, the positive cone is
\[ P = \{x : x \ge 0\} \]
In order to prove this, suppose that (t, y) is an interior point of K and belongs
to H. Then for some $h \in X$ we have
\[ 0 < t \le f'(x_0)h \]
\[ 0 < y \le G(x_0) + G'(x_0)h \]
Here the inequality y > 0 is interpreted to mean that y is an interior point of
the positive cone P. For $\lambda \in (0, 1)$ we have
Thus $x_0 + \lambda h$ lies in the constraint set and produces a larger value in f than
$f(x_0)$. This contradiction shows that H is disjoint from the interior of K.
Now use the Separation Theorem (Theorem 2 in Section 7.3, page 343). It
asserts the existence of a hyperplane separating K from H. Thus there exist
$\mu \in \mathbb{R}$ and $\phi \in Y^*$ such that $|\mu| + \|\phi\| > 0$ and
\[ f'(x_0)h + \phi[G'(x_0)h] \le 0 \qquad (h \in X) \]
\[ f'(x_0)h + \phi[G'(x_0)h] = 0 \]
In other words,
\[ f'(x_0) = -\phi \circ G'(x_0) \]
•
Problems 3.5
1. (a) Use Lagrange multipliers to find the maximum of xy subject to x + y = c. (b) Find
the shortest distance from the point (1,0) to the parabola given by y2 = 4x.
2. Let the equations f(x, y) = 0 and g(x, y) = 0 define two non-intersecting curves in $\mathbb{R}^2$.
What system of equations should be solved if we wish to find minimum or maximum
distances between points on these two curves?
3. Show that in an ordered vector space with positive cone P, if $B(x, r) \subset P$, then
$B(\lambda x, \lambda r) \subset P$ for $\lambda \ge 0$.
4. Prove that the positive cone P determines the vector order.
5. Let A be a Hermitian operator on a Hilbert space. Define f(x) = (Ax,x) and g(x) =
(x,x) - 1. What are f'(x) and g'(x)? Find a necessary condition for the extrema of f
on the set M = {x : g(x) = O}. (Use the first theorem on Lagrange multipliers.) Prove
that your necessary condition is fulfilled by any eigenvector of A in M.
6. What is $f'(x)$ in the lemma of this section if x is not an eigenvector?
7. Use the method of Lagrange multipliers to find a point on the surface
$(x - y)^2 - z^2 = 1$ as close as possible to the origin in $\mathbb{R}^3$.
8. Let A and B be Hermitian operators on a real Hilbert space. Prove that the stationary
values of $\langle x, Ax \rangle$ on the manifold where $\langle x, Bx \rangle = 1$ are necessarily numbers $\lambda$ for which
$A - \lambda B$ is not invertible.
9. Find the dimensions of a rectangular box (whose edges are parallel to the coordinate
axes) that is contained in the ellipsoid $a^2 x^2 + b^2 y^2 + c^2 z^2 = 1$ and has maximum volume.
10. Find the least distance between two points, one on the parabola $y = x^2$ and the other
on the parabola $y = -(x - 4)^2$.
11. Find the distance from the point (3, 2) to the curve xy = 2.
12. In $\mathbb{R}^3$ the equation $x^2 + y^2 = 5$ describes a cylinder. The equation 6x + 3y + 2z = 6
describes a plane. The intersection of the cylinder and the plane is an ellipse. Find the
points on this ellipse that are nearest the origin and farthest from the origin.
13. Find the minimum and maximum values of xy + yz + zx on the unit sphere in $\mathbb{R}^3$
($x^2 + y^2 + z^2 = 1$). See [Barb], page 21.
problems are not simple numbers but functions. We begin with some classical
illustrations, posing the problems only, and postponing their solutions until after
some techniques have been explained. Traditional notation is used, in which x
and y are real variables, x being "independent" and y being "dependent." This
harmonizes with most books on this subject.
Example 1. Find the equation of an arc of minimal length joining two points
in the plane. Let the points be $(a, \alpha)$ and $(b, \beta)$, where a < b. Let the arc
be given by a continuously differentiable function y = y(x), where $y(a) = \alpha$
and $y(b) = \beta$. The arc length is given by the integral $\int_a^b \sqrt{1 + y'(x)^2}\, dx$. Here
$y \in C^1[a, b]$. The solution, as we know, is a straight line, and this fact will be
proved later. •
Example 2. Find a function y in $C^1[a, b]$, satisfying $y(a) = \alpha$ and $y(b) = \beta$,
such that the surface of revolution obtained by rotating the graph of y about the
x-axis has minimum area. To solve this, one starts by recalling from calculus
that the area to be minimized is given by
\[ (1) \qquad 2\pi \int_a^b y(x) \sqrt{1 + y'(x)^2}\, dx \]
The solution turns out to be (in many cases) a catenary, as shown later. Figure
3.5 shows one of these surfaces. •
(0, 0) and $(b, \beta)$. There is no loss of generality in taking b > 0, and if the positive
direction of the y-axis is downward, then $\beta > 0$ also. We ask for the curve along
which the particle would fall in the least time. If the curve is the graph of a
function y in $C^1[0, b]$, then the time of descent is
\[ (2) \qquad \int_0^b \sqrt{\frac{1 + y'(x)^2}{2g\, y(x)}}\, dx \]
as is shown later. In the integral, g is the acceleration due to gravity. This problem
is the "Brachistochrone Problem," posed as a challenge by John Bernoulli
in 1696. Figure 3.6 shows two cases of such curves, corresponding to two choices
of the terminal point $(b, \beta)$. Both curves are cycloids, one being a subset of the
other.
Figure 3.5
In 1696, Isaac Newton had just recently become Warden of the Mint and
was in the midst of overseeing a massive recoinage. Nevertheless, when he heard
of the problem, he found that he could not sleep until he had solved it, and
having done so, he published the solution anonymously. Bernoulli, however,
knew at once that the author of the solution was Newton, and in a famous
remark asserted that he "recognized the Lion by the print of its paw". [West] •
The three examples given above have a common form; thus, in each one
Section 3.6 Calculus of Variations 155
\[ (3) \qquad \int_a^b F(x, y(x), y'(x))\, dx \]
This leads to $\int_a^b (F_2 u + F_3 u') = 0$. The second term can be integrated by parts.
The result is
\[ \int_a^b \left[ F_2(x, y(x), y'(x)) - \frac{d}{dx} F_3(x, y(x), y'(x)) \right] u(x)\, dx = 0 \]
By invoking the following lemma, we obtain the Euler equation. •
Lemma. If v is piecewise continuous on [a, b] and if $\int_a^b u(x) v(x)\, dx = 0$
for every u in $C^1[a, b]$ that vanishes at the endpoints a and b, then
v = 0.
Write this as
\[ \frac{dy}{\sqrt{y^2 - c^2}} = \frac{dx}{c} \]
This can be integrated to give $\cosh^{-1}(y/c) = (x/c) + \lambda$. Without loss of generality,
we take the left-hand endpoint to be $(0, \alpha)$. The curve $y = c \cosh((x/c) + \lambda)$
passes through this point if and only if $\alpha = c \cosh \lambda$. Hence c can be eliminated
to give us a one-parameter family of catenaries:
\[ (6) \qquad y = \frac{\alpha}{\cosh \lambda} \cosh\left( \frac{\cosh \lambda}{\alpha}\, x + \lambda \right) \]
Here $\lambda$ is the parameter. If this catenary is to pass through the other given
endpoint $(b, \beta)$, then $\lambda$ will have to satisfy the equation
III. If the terminal point is above the envelope, two catenaries of the family (6)
pass through it. One of these is a local minimum in the problem but not
necessarily the absolute minimum. If it is not the absolute minimum, the
Goldschmidt solution again solves the problem.
IV. For terminal points sufficiently far above the envelope, the upper catenary
of the two passing through the point is the solution to the problem.
V. The Goldschmidt solution is a broken line from $(0, \alpha)$ to (0, 0) to (b, 0) and
to $(b, \beta)$. It generates a surface of revolution whose area is $\pi(\alpha^2 + \beta^2)$. •
Figure 3.7
Example 3 revisited. Consider again the Brachistochrone problem. We are
using (x, y) to denote points in }R2, t will denote time, and s will be arc length.
The derivation of the integral given previously for the time of descent is as
follows. At any point of the curve, the downward force of gravity is mg, where m
is the mass of the particle and 9 is the constant acceleration due to gravity. The
component of this force along the tangent to the curve is mg cos (J = mg (dy / ds),
where (J is the angle between the tangent and the vertical. (See Figure 3.8.) The
velocity of the particle is ds / dt, and its acceleration is d2 s / de.
~dS
dY~
dx
Figure 3.8
By Newton's law of motion (F = ma) we have mg(dy/ds) = m(d 2 s/dt 2 ), or
d2 s/dt 2 = g(dy/ds). Multiply by 2(ds/dt) to get
2 ds d 2 S _ 2 dy ds
r
dt dt 2 - 9 ds dt
whence
!!:.. 2
dt
( ds )
dt
_ 2 dy
- 9 dt and ( ~; = 2gy +C
158 Chapter 3 Calculus in Banach Spaces
$$\frac{ds}{dt} = \sqrt{2gy} \qquad \frac{dt}{ds} = \frac{1}{\sqrt{2gy}}$$
$$\int_0^b \sqrt{\frac{1 + y'(x)^2}{2g\,y(x)}}\;dx$$
Since we seek to minimize this integral, the factor $1/\sqrt{2g}$ may be ignored. The
function $F$ in the general theory is then $F(u, v, w) = \sqrt{(1 + w^2)/v}$. Since
$F_1 = 0$, Theorem 2 applies, and we can infer that $y'(x)F_3(x, y(x), y'(x)) -
F(x, y(x), y'(x)) = c$ (constant). For the particular $F$ in this example,
$F_3(u, v, w) = w[v(1 + w^2)]^{-1/2}$. Thus
$$\int_0^b \sqrt{\frac{1 + y'(x)^2}{y(x)}}\;dx$$
subject to $y \in C^2[0, b]$ and $y(0) = 0$. Notice that the value $y(b)$ is not prescribed.
To solve such a problem, we require a modification of Theorem 1, namely:
$$\int_a^b F(x, y(x), y'(x))\,dx$$
(7) $$\frac{d}{dx}F_3(x, y(x), y'(x)) = F_2(x, y(x), y'(x)) \quad\text{and}\quad F_3(b, y(b), y'(b)) = 0$$
must be zero at x = b. The cycloids going through the initial point are given
parametrically by
$$x = k(\phi - \sin\phi), \qquad y = k(1 - \cos\phi)$$
The slope is
$$\frac{dy}{dx} = \frac{dy}{d\phi} \div \frac{dx}{d\phi} = \frac{\sin\phi}{1 - \cos\phi}$$
This is $0$ at $\phi = \pi$. The value $x = b$ corresponds to $\phi = \pi$, and $k = b/\pi$. The
solution is given by $x = (b/\pi)(\phi - \sin\phi)$, $y = (b/\pi)(1 - \cos\phi)$, $0 \le \phi \le \pi$. •
lb F(x,y(x),y'(x))dx
°
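$I(\theta_1, \theta_2) = \int_a^b F(x, z(x), z'(x))\,dx$ and $J(\theta_1, \theta_2) = \int_a^b G(x, z(x), z'(x))\,dx$. The
minimum of $I(\theta_1, \theta_2)$ under the constraint $J(\theta_1, \theta_2) = 0$ occurs at $\theta_1 = \theta_2 = 0$,
because $y$ is a solution of the original problem. By the Theorem on Lagrange
Multipliers (Theorem 3, page 148), there is a nontrivial linear relation of the
form $\mu I'(0, 0) + \lambda J'(0, 0) = 0$. Thus
$$\mu\frac{\partial I}{\partial\theta_1} + \lambda\frac{\partial J}{\partial\theta_1} = 0 \ \text{ at } (\theta_1, \theta_2) = (0, 0), \quad\text{and}\quad \mu\frac{\partial I}{\partial\theta_2} + \lambda\frac{\partial J}{\partial\theta_2} = 0 \ \text{ at } (0, 0)$$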
$$y(-1) = y(1) = 0$$
This problem can be treated with Theorem 4, taking
(In these equations, subscript 2 means a partial derivative with respect to the
second argument of the function, and so on.) In the case being considered, we
have $F_2 = 1$, $F_3 = 0$, $G_2 = 0$, and $G_3 = y'(x)[1 + y'(x)^2]^{-1/2}$. The necessary
condition then reads
$$\mu - \lambda\,\frac{d}{dx}\,\frac{y'(x)}{\sqrt{1 + y'(x)^2}} = 0$$
If $\mu = 0$, then $\lambda$ must be $0$ as well. Hence we are free to set $\mu = 1$ and integrate
the previous equation, arriving at
$$x - \frac{\lambda\,y'(x)}{\sqrt{1 + y'(x)^2}} = c_1$$
This can be solved for y' (x):
We see that the curve is a circle by writing this last equation in the form
Since the circle must pass through the points $(-1, 0)$ and $(1, 0)$, we find that
$c_1 = 0$ and that $1 + c_2^2 = \lambda^2$. When the condition on the length of the arc is
imposed, we obtain $\ell = 2\lambda \operatorname{Arcsin}(1/\lambda)$, from which $\lambda$ can be computed. •
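The last step, computing $\lambda$ from the prescribed length $\ell$, has no closed form, so a numerical sketch may help. The following is only an illustration under stated assumptions: it treats the minor arc (so $2 < \ell \le \pi$), and the function names `arc_length` and `solve_lambda` are hypothetical, not taken from the text.

```python
import math

def arc_length(lam):
    """Length 2*lam*Arcsin(1/lam) of the minor arc of radius lam through (-1,0), (1,0)."""
    return 2.0 * lam * math.asin(1.0 / lam)

def solve_lambda(ell, tol=1e-12):
    """Solve ell = 2*lam*Arcsin(1/lam) for lam >= 1 by bisection.

    arc_length decreases from pi (at lam = 1) toward 2 (as lam grows),
    so this sketch only handles 2 < ell <= pi.
    """
    if not (2.0 < ell <= math.pi):
        raise ValueError("this sketch handles only 2 < ell <= pi")
    lo, hi = 1.0, 2.0
    while arc_length(hi) > ell:      # enlarge the bracket until it contains the root
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if arc_length(mid) > ell:
            lo = mid                 # arc still too long, so the radius must grow
        else:
            hi = mid
    return 0.5 * (lo + hi)

lam = solve_lambda(2.5)
print(lam, arc_length(lam))          # the second value should reproduce ell = 2.5
```

Bisection is used here because the length is monotone in $\lambda$ on the range considered; any other root finder would serve equally well.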
Example 7. (The Classical Isoperimetric Problem.) Among all the plane
curves having a prescribed length, find one enclosing the greatest area. We
assume a parametric representation x = x(t) and y = y(t) with continuously
differentiable functions. We can also assume that $0 \le t \le b$ and that $x(0) = x(b)$,
$y(0) = y(b)$ so the curve is closed. Let us assume further that as $t$ increases from
$0$ to $b$, the curve is described in the counterclockwise direction. The region
enclosed is then always on the left. Recall Green's Theorem, [Wid1], page 223:
where $R$ is the region enclosed by the curve $\Gamma$ and the subscripts denote partial
derivatives. A special case of Green's Theorem is
$$\frac{1}{2}\int_\Gamma (-y\,dx + x\,dy) = \frac{1}{2}\iint_R (1 + 1)\,dx\,dy = \text{Area of } R$$
$$\int_0^b \left(-y\,\frac{dx}{dt} + x\,\frac{dy}{dt}\right) dt$$
subject to
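The Euler necessary condition is that for a suitable nontrivial linear combination
$H = \mu F + \lambda G$,
$$H_2(t, x, x', y, y') = \frac{d}{dt}H_3(t, x, x', y, y') \qquad H_4(t, x, x', y, y') = \frac{d}{dt}H_5(t, x, x', y, y')$$
$$\begin{cases} \ \ \mu y' = \dfrac{d}{dt}\left[-\mu y + \lambda x'(x'^2 + y'^2)^{-1/2}\right] \\[2mm] -\mu x' = \dfrac{d}{dt}\left[\mu x + \lambda y'(x'^2 + y'^2)^{-1/2}\right] \end{cases}$$
Upon integrating these with respect to $t$, we obtain
$$2\mu y = \lambda x'(x'^2 + y'^2)^{-1/2} + A$$
$$-2\mu x = \lambda y'(x'^2 + y'^2)^{-1/2} - B$$
If $\mu = 0$, we infer that $x' = y' = 0$, and then the "curve" is a straight line.
Hence $\mu \ne 0$, and by homogeneity we can assume $\mu = \tfrac12$. Then $y - A =
\lambda x'(x'^2 + y'^2)^{-1/2}$ and $x - B = -\lambda y'(x'^2 + y'^2)^{-1/2}$. Square these two equations
and add to obtain the equation of a circle: $(x - B)^2 + (y - A)^2 = \lambda^2$. •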
Figure 3.9 (coordinate axes; velocity $c_1$ in one medium, velocity $c_2$ in the other)
If the coordinate system is as shown in Figure 3.9, and if the unknown point
on the x-axis is $(x, 0)$, then the time of passage is
$$T = \frac{1}{c_1}\sqrt{(x - x_1)^2 + y_1^2} + \frac{1}{c_2}\sqrt{(x - x_2)^2 + y_2^2} = c_1^{-1}\rho_1 + c_2^{-1}\rho_2$$
$$c_1^{-1}\frac{d\rho_1}{dx} + c_2^{-1}\frac{d\rho_2}{dx} = 0,$$
$$c_1^{-1}\frac{1}{\rho_1}(x - x_1) + c_2^{-1}\frac{1}{\rho_2}(x - x_2) = 0$$
$$c_1^{-1}\sin\phi_1 = c_2^{-1}\sin\phi_2$$
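The relation between the two sines can be checked numerically by minimizing the time of passage directly. The sketch below is illustrative only: the endpoints, the velocities, and the helper names `travel_time` and `fastest_crossing` are assumptions chosen for the example, and golden-section search is simply one convenient way to minimize a convex function of one variable.

```python
import math

def travel_time(x, p1, p2, c1, c2):
    """Time along the broken line p1 -> (x, 0) -> p2."""
    rho1 = math.hypot(x - p1[0], p1[1])
    rho2 = math.hypot(x - p2[0], p2[1])
    return rho1 / c1 + rho2 / c2

def fastest_crossing(p1, p2, c1, c2, iters=200):
    """Golden-section search for the crossing point (T is convex in x)."""
    lo, hi = min(p1[0], p2[0]), max(p1[0], p2[0])
    g = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(iters):
        a = hi - g * (hi - lo)
        b = lo + g * (hi - lo)
        if travel_time(a, p1, p2, c1, c2) < travel_time(b, p1, p2, c1, c2):
            hi = b
        else:
            lo = a
    return 0.5 * (lo + hi)

# Hypothetical data: one point above the x-axis, one below.
p1, p2, c1, c2 = (0.0, 1.0), (2.0, -1.0), 1.0, 0.6
x = fastest_crossing(p1, p2, c1, c2)
sin1 = (x - p1[0]) / math.hypot(x - p1[0], p1[1])   # sine of the angle in medium 1
sin2 = (p2[0] - x) / math.hypot(x - p2[0], p2[1])   # sine of the angle in medium 2
print(sin1 / c1, sin2 / c2)                         # the two values should agree
```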
Figure 3.10 (layered medium with velocities $c_1$, $c_2$, $c_3$ and angle $\phi_3$)
Hence
$$\frac{dx}{dy} = \frac{k\,c(y)}{\sqrt{1 - k^2c(y)^2}}$$
and
$$x = \int \frac{k\,c(y)}{\sqrt{1 - k^2c(y)^2}}\;dy$$
Example 8. What is the path of a light beam if the velocity of light in the
medium is c = ay (where a is a constant)?
Solution. The path is the graph of a function y such that
$$x = \int \frac{kay}{\sqrt{1 - k^2a^2y^2}}\;dy = -\frac{1}{ka}\sqrt{1 - k^2a^2y^2} + A$$
Here A and k are constants that can be adjusted so that the path passes through
two given points. The equation can be written in the form
l a
b }1 + Y'(x)2
-'-------;--=-:---'--'-
c(y)
dx
This is to be minimized under the constraint that y(a) = Q and y(b) = {3. Here
the ray of light is to pass from (a, Q) to (b, {3) in the shortest time. By Theorem
2, a necessary condition on y can be expressed (after some work) as
Theorem 5. Suppose that $y_1, \ldots, y_n$ are functions (of $t$) in $C^2[a, b]$
that minimize the integral
Proof. Take functions $\eta_1, \ldots, \eta_n$ in $C^2[a, b]$ that vanish at the endpoints. The
expression
$$\int_a^b F(y_1 + \theta_1\eta_1, \ldots, y_n + \theta_n\eta_n)\,dt$$
will have a minimum when $(\theta_1, \ldots, \theta_n) = (0, 0, \ldots, 0)$. Proceeding as in previous
proofs, one arrives at the given equations. •
Geodesic Problems. Find the shortest arc lying on a given surface and
joining two points on the surface. Let the surface be defined by z = z(x, y). Let
the two points be $(x_0, y_0, z_0)$ and $(x_1, y_1, z_1)$. Arc length is
subject to $x \in C^2[0, 1]$, $y \in C^2[0, 1]$, $x(0) = x_0$, $x(1) = x_1$, $y(0) = y_0$, $y(1) = y_1$.
Example 9. We search for geodesics on a cylinder. Let the surface be the
cylinder $x^2 + z^2 = 1$, or $z = (1 - x^2)^{1/2}$ (upper-half cylinder). In the general
theory, $F(x, y, x', y') = \sqrt{x'^2 + y'^2 + (z_xx' + z_yy')^2}$. In this particular case this
is
$$F = \left[x'^2 + y'^2 + z_x^2x'^2\right]^{1/2} = \left[(1 - x^2)^{-1}x'^2 + y'^2\right]^{1/2}$$
Then computations show that
$$\frac{\partial F}{\partial x} = \frac{xx'^2}{(1 - x^2)^2F} \qquad \frac{\partial F}{\partial x'} = \frac{x'}{(1 - x^2)F} \qquad \frac{\partial F}{\partial y} = 0 \qquad \frac{\partial F}{\partial y'} = \frac{y'}{F}$$
To simplify the work we take $t$ to be arc length and drop the requirement that
$0 \le t \le 1$. Since $dt = ds = \sqrt{x'^2 + y'^2 + z'^2}\,dt$, we have $x'^2 + y'^2 + z'^2 = 1$ and
$F(x, y, x', y') = 1$ along the minimizing curve. The Euler equations yield
$$\frac{(1 - x^2)x'' + 2xx'^2}{(1 - x^2)^2} = \frac{xx'^2}{(1 - x^2)^2} \quad\text{and}\quad y'' = 0$$
The first of these can be written $x'' = xx'^2/(x^2 - 1)$. The second one gives
$y = at + b$, for appropriate constants $a$ and $b$ that depend on the boundary
conditions. The condition $1 = F^2$ leads to $x'^2/(1 - x^2) + y'^2 = 1$ and then to
$x'^2/(1 - x^2) = 1 - a^2$. The Euler equation for $x$ then simplifies to $x'' = (a^2 - 1)x$.
There are three interesting cases:
Case 1: a = 1. Then x" = 0, and thus both x(t) and y(t) are linear expressions
in t. The path is a straight line on the surface (necessarily parallel to
the y-axis).
Case 2: $a = 0$. Then $x'' = -x$, and $x = c\cos(t + d)$ for suitable constants $c$
and $d$. The condition $x'^2/(1 - x^2) = 1$ gives us $c = 1$. It follows that
$x = \cos(t + d)$, $y = b$, and $z = \sqrt{1 - x^2} = \sin(t + d)$. The curve is a
circle parallel to the xz-plane.
Case 3: $0 < a < 1$. Then $x = c\cos(\sqrt{1 - a^2}\,t + d)$, and as before, $c = 1$. Again
$z = \sin(\sqrt{1 - a^2}\,t + d)$, and $y = at + b$. The curve is a spiral. •
Figure 3.11
Figure 3.12
Direct Methods in the Calculus of Variations. These are methods that
proceed directly to the minimization of the given functional without first looking
at necessary conditions. Such methods sometimes yield a constructive proof
of existence of the solution. (Methods based solely on the use of necessary
conditions never establish existence of the solution.)
The Rayleigh-Ritz Method. (We shall consider this again in Chapter 4.)
Suppose that $U$ is a set of "admissible" functions, and $\Phi$ is a functional on $U$
that we desire to minimize. Put $\rho = \inf\{\Phi(u) : u \in U\}$. We assume $\rho > -\infty$,
and seek a $u \in U$ such that $\Phi(u) = \rho$. The problem, of course, is that the
infimum defining $\rho$ need not be attained. In the Rayleigh-Ritz method, we start
with a sequence of functions $w_1, w_2, \ldots$ such that every linear combination
$c_1w_1 + \cdots + c_nw_n$ is admissible. Also, we must assume that for each $u \in U$ and
for each $\varepsilon > 0$ there is a linear combination $v$ of the $w_i$ such that $\Phi(v) \le \Phi(u) + \varepsilon$.
For each $n$ we select $v_n$ in the linear span of $w_1, \ldots, w_n$ to minimize $\Phi(v_n)$. This
is an ordinary minimization problem for $n$ real parameters $c_1, \ldots, c_n$. It can be
attacked with the ordinary techniques of calculus.
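To make the finite-dimensional minimization concrete, here is a minimal sketch for a model problem that is not taken from the text: minimize $\Phi(u) = \int_0^1 [\tfrac12 u'(x)^2 - u(x)]\,dx$ over functions vanishing at $0$ and $1$, using the assumed basis $w_k(x) = \sin k\pi x$. For $u = \sum c_kw_k$ the functional is the quadratic $\tfrac12 c^TKc - f^Tc$, so setting its gradient to zero gives the linear system $Kc = f$. The exact minimizer of this model problem is $u(x) = x(1 - x)/2$, which gives a check on the output. The names `rayleigh_ritz` and `evaluate` are hypothetical.

```python
import numpy as np

def rayleigh_ritz(n, num_quad=2000):
    """Coefficients c_1..c_n minimizing Phi over span{sin(k*pi*x)}, k = 1..n."""
    x = (np.arange(num_quad) + 0.5) / num_quad        # midpoint quadrature nodes on [0, 1]
    h = 1.0 / num_quad
    k = np.arange(1, n + 1)
    W = np.sin(np.pi * np.outer(k, x))                # w_k(x)
    dW = (np.pi * k)[:, None] * np.cos(np.pi * np.outer(k, x))   # w_k'(x)
    K = h * dW @ dW.T                                 # K[i, j] ~ integral of w_i' w_j'
    f = h * W.sum(axis=1)                             # f[i]    ~ integral of w_i
    return np.linalg.solve(K, f)                      # gradient of Phi vanishes: K c = f

def evaluate(c, x):
    """Value at x of the combination sum_k c_k sin(k*pi*x)."""
    k = np.arange(1, len(c) + 1)
    return float(np.sin(np.pi * k * x) @ c)

c = rayleigh_ritz(5)
print(evaluate(c, 0.5))     # compare with the exact value u(1/2) = 0.125
```

Because this particular basis is orthogonal for both integrals, the matrix $K$ is nearly diagonal; with a less convenient basis the same code would still apply, only the linear system would be full.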
Example 10. We wish to minimize the expression $\iint_R (\phi_x^2 + \phi_y^2)\,dx\,dy$
subject to the constraints that $\phi$ be a continuously differentiable function on the
rectangle $R = \{(x, y) : 0 \le x \le a,\ 0 \le y \le b\}$, that $\phi = 0$ on the perimeter of
$R$, and that $\iint_R \phi^2\,dx\,dy = 1$. A suitable set of base functions for this problem
is the doubly indexed sequence
$$u_{nm}(x, y) = \frac{2}{\sqrt{ab}}\,\sin\frac{n\pi x}{a}\,\sin\frac{m\pi y}{b} \qquad (n, m \ge 1)$$
It turns out that this is an orthonormal set with respect to the inner product
$\langle u, v\rangle = \iint_R u(x, y)v(x, y)\,dx\,dy$. We are looking for a function $\phi =
\sum_{n,m=1}^\infty c_{nm}u_{nm}$ that will solve the problem. Clearly, the function $\phi$ vanishes
on the perimeter of $R$. The condition $\iint \phi^2 = 1$ means $\sum_{n,m=1}^\infty c_{nm}^2 = 1$ by the
Parseval identity (page 73). Now we compute
$$u_{xx} + u_{yy} = 0 \quad\text{or}\quad \left(\frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}\right)u(x, y) = 0$$
Laplace's equation arises as the Euler equation in the calculus of variations when
we seek to minimize the integral $\iint_R (u_x^2 + u_y^2)\,dx\,dy$ subject to the constraint
that $u$ be twice continuously differentiable and take prescribed values on the
boundary of the region $R$. To illustrate the Rayleigh-Ritz method, we take $R$
to be the circle $\{(x, y) : x^2 + y^2 \le 1\}$. Then polar coordinates are appropriate.
Here are the formulas that are useful. (They are easy but tedious to derive.)
$$x = r\cos\theta \qquad y = r\sin\theta \qquad r = \sqrt{x^2 + y^2} \qquad \theta = \tan^{-1}(y/x)$$
$$u_x = u_r\,\frac{x}{r} - u_\theta\,\frac{y}{r^2} \qquad u_y = u_r\,\frac{y}{r} + u_\theta\,\frac{x}{r^2}$$
$$u_x^2 + u_y^2 = u_r^2 + r^{-2}u_\theta^2 \qquad dx\,dy = r\,dr\,d\theta$$
$$I = \int_0^{2\pi}\!\int_0^1 (u_r^2 + r^{-2}u_\theta^2)\,r\,dr\,d\theta$$
The boundary points of the domain are characterized by their value of $\theta$. Let
the prescribed boundary values of $u$ be given by $f(\theta)$. Let $f''$ be continuous.
Then (by classical theorems) $f$ is represented by its Fourier series:
These are the values that $u(r, \theta)$ must assume when $r = 1$. We therefore
postulate that $u(r, \theta)$ have the form
The integral $I$ consists of two parts, of which the first is $I_1 = \int_0^1 r\int_0^{2\pi} u_r^2\,d\theta\,dr$.
Now
$$u_r = {\sum}'\,[f_n'(r)\cos n\theta + g_n'(r)\sin n\theta]$$
and consequently,
$$I_2 = \int_0^1 r^{-1}\int_0^{2\pi} u_\theta^2\,d\theta\,dr.$$
We have
and consequently,
Thus
$$I_2 = \pi\sum_{n=1}^\infty n^2\int_0^1 r^{-1}[f_n(r)^2 + g_n(r)^2]\,dr.$$
Hence
$$I = \pi\sum_{n=0}^\infty \int_0^1 \left[rf_n'(r)^2 + rg_n'(r)^2 + n^2r^{-1}f_n(r)^2 + n^2r^{-1}g_n(r)^2\right] dr$$
$c_n = a_n$, all other $c_j$ being $= 0$. Hence $f_n(r) = a_nr^n$. Similarly, $g_n(r) = b_nr^n$.
Thus the $u$-function is
$$u(r, \theta) = \frac{a_0}{2} + \sum_{n=1}^\infty r^n(a_n\cos n\theta + b_n\sin n\theta)$$
•
Problems 3.6
1. Among all the functions $x$ in $C^1[a, b]$ that satisfy $x(a) = 0$, $x(b) = \beta$, find the one for
which $\int_a^b u(t)^2x'(t)^2\,dt$ is a minimum. Here $u$ is given as an element of $C[a, b]$.
3. Find a function $y \in C^2[0, 1]$ that minimizes the integral $\int_0^1 [\tfrac12 y'(x)^2 + y(x)y'(x) + y'(x) +
y(x)]\,dx$. Note that $y(0)$ and $y(1)$ are not specified.
4. Prove this theorem: If $\{u_n\}$ is an orthonormal system in $L^2(S, \mu)$ and $\{v_n\}$ is an
orthonormal system in $L^2(T, \nu)$, then $\{u_n \otimes v_m : 1 \le n < \infty,\ 1 \le m < \infty\}$ is orthonormal
in $L^2(S \times T)$. Here $u_n \otimes v_m$ is the function whose value at $(s, t)$ is $u_n(s)v_m(t)$. Explain
how this theorem is pertinent to Example 10.
5. Determine the path of a light beam in the xy-plane if the velocity of light is $1/y$.
6. Find a function $u$ in $C^1[0, 1]$ that minimizes the integral $\int_0^1 [u'(t)^2 + u'(t)]\,dt$ subject to
$$f(c, x) = \frac{4}{\cosh c}\,\cosh\left[\frac{\cosh c}{4}\,x - c\right]$$
10. Suppose that the path of a ray of light in the xy-plane is along the parabola described
by $2x = y^2$. What function describes the speed of light in the medium?
11. Find the function $u$ in $C^2[0, 1]$ that minimizes the integral $\int_0^1 \{[u(t)]^2 + [u'(t)]^2\}\,dt$ subject
to the constraints $u(0) = u(1) = 1$.
Chapter 4
Basic Approximate
Methods in Analysis
4.1 Discretization
interval, say at points $a = t_0 < t_1 < \cdots < t_{n+1} = b$. From values of a function
$u$ at these points, one can create a function $\bar u$ on $[a, b]$ by some interpolation
process. It is important to recognize that the problem itself is usually changed
by the passage to a discrete set.
Let us consider an idealized situation, and enumerate the steps involved in
a solution by "discretization."
1. At the beginning, a problem P is posed that has as its solution a function
u defined on a domain D. Our objective is to determine u, or an approxi-
mation to it.
2. The domain $D$ is replaced by a discrete subset $D_h$, where $h$ is a parameter
that ideally will be allowed to approach zero in order to get finer discrete
sets. The problem $P$ is replaced by a "discrete version" $P_h$.
3. Problem $P_h$ is solved, yielding a function $v_h$ defined on $D_h$.
4. By means of an interpolation process, a function $\bar v_h$ is obtained whose
domain is $D$ and whose values agree with $v_h$ on $D_h$.
5. The function $\bar v_h$ is regarded as an approximate solution of the original
problem $P$. Error estimates are made to justify this. In particular, as $h \to 0$, $\bar v_h$
should converge to a solution of $P$.
The parameter h in the previous discussion is just the step size in the boundary-
value problem.
Directly from Equations (1), we have
$(1 \le i \le n)$
We use these (without the error terms involving $\tau$) to approximate $u'$ and $u''$ at
the points $t_i$. Since we wish to use $u$ as the solution of the original problem, we
use a different letter to denote the solution of the discretized problem. Thus $v$
will be a vector of $n + 2$ components, and $v_i$ is expected to be an approximate
value of $u(t_i)$. The problem $P_h$ is
(4) $$\begin{cases} \dfrac{v_{i+1} - 2v_i + v_{i-1}}{h^2} + a_i\,\dfrac{v_{i+1} - v_{i-1}}{2h} + b_iv_i = c_i & (1 \le i \le n) \\[2mm] v_0 = v_{n+1} = 0 \end{cases}$$
Here we have written $a_i = a(t_i)$, and so on. Problem (4) is a system of $n$ linear
equations in $n$ unknowns $v_i$. It is solved by standard methods of linear algebra,
such as Gaussian elimination. The $i$th equation in the system can be written in
the form
It is clear that the coefficient matrix for this system is tridiagonal, because the $i$th
equation contains only the three unknowns $v_{i-1}$, $v_i$, and $v_{i+1}$. Furthermore, if $h$
is small enough and if $b(t) < 0$, the matrix will be diagonally dominant. Indeed,
assume that $h|a_i| \le 2$. Then $h^{-2} \pm \tfrac12 h^{-1}a_i$ is nonnegative and $-2h^{-2} + b_i$ is
nonpositive. The condition for diagonal dominance in a generic $n \times n$ matrix
$A = (A_{ij})$ is
$$|A_{ii}| - \sum_{\substack{j=1 \\ j \ne i}}^n |A_{ij}| > 0 \qquad (i = 1, \ldots, n)$$
In this particular case, the condition becomes
(6)
We write System (5) in the form $Av = c$, where $A$ is the tridiagonal matrix and
$c$ now denotes the vector having components $c_i$. The vectors $v$ and $c$ should be
"column vectors."
Let us assume that the linear system has been solved to produce the vector
$v$. The next step is to "fill in" the values of a continuous function $\bar v$ such that
$\bar v(t_i) = v_i$ $(1 \le i \le n)$. This can be done in many ways, such as by means of a
cubic spline interpolant. Another way of interpreting this step is to say that we
have extended the function $v$ to the function $\bar v$.
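The assembly and solution of the discrete problem (4) can be sketched as follows. This is a minimal illustration, not the book's program: the test problem ($a = 0$, $b = -1$, exact solution $\sin \pi t$ on $[0, 1]$) and the function name `solve_bvp` are assumptions chosen for the example, and a dense solver is used for brevity even though the matrix is tridiagonal.

```python
import numpy as np

def solve_bvp(a, b, c, n, left=0.0, right=1.0):
    """Finite-difference solution of u'' + a u' + b u = c with u = 0 at both ends."""
    h = (right - left) / (n + 1)
    t = left + h * np.arange(1, n + 1)           # interior points t_1, ..., t_n
    A = np.zeros((n, n))
    for i in range(n):
        ai, bi = a(t[i]), b(t[i])
        A[i, i] = -2.0 / h**2 + bi               # coefficient of v_i
        if i > 0:
            A[i, i - 1] = 1.0 / h**2 - ai / (2.0 * h)   # coefficient of v_{i-1}
        if i < n - 1:
            A[i, i + 1] = 1.0 / h**2 + ai / (2.0 * h)   # coefficient of v_{i+1}
    v = np.linalg.solve(A, c(t))                 # boundary values v_0 = v_{n+1} = 0 drop out
    return t, v

# Hypothetical test problem: u'' - u = -(pi^2 + 1) sin(pi t), exact solution sin(pi t).
t, v = solve_bvp(lambda s: 0.0, lambda s: -1.0,
                 lambda s: -(np.pi**2 + 1.0) * np.sin(np.pi * s), n=99)
print(np.max(np.abs(v - np.sin(np.pi * t))))     # maximum error at the grid points
```

Halving $h$ in this sketch should reduce the printed error by roughly a factor of four, in line with the $O(h^2)$ convergence discussed in the text.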
On the other hand, the solution to the discrete problem satisfies the equation
By subtracting one equation from the other, we arrive at an equation for the
"error" e = u - v:
Here $d_i = \tfrac{1}{12}u^{(4)}(\tau_i) + \tfrac16 a_iu^{(3)}(\xi_i)$. Equation (9) has the same coefficient matrix
as Equation (8). If we denote that matrix by $A_h$, Equation (9) has the form
Thus as $h \to 0$, the discrete solution converges to the true solution at the speed
$O(h^2)$. ($O(h)$ is a generic function such that $|O(h)| \le ch$.) •
Proof. Let $x$ be any nonzero vector, and let $y = Ax$. Select $i$ so that $|x_i| =
\|x\|_\infty$. Then
$$a_{ii}x_i + \sum_{\substack{j=1 \\ j \ne i}}^n a_{ij}x_j = y_i$$
$$|a_{ii}x_i| \le |y_i| + \sum_{\substack{j=1 \\ j \ne i}}^n |a_{ij}|\,|x_j|$$
Hence
Proof. We derive the first formula and leave the second as a problem. By
Taylor's Theorem we have
Observe that the expression $\tfrac12[f^{(4)}(\xi_1) + f^{(4)}(\xi_2)]$ is the average of two values of
$f^{(4)}$ on the interval $[t - h, t + h]$. Its value therefore lies between the maximum
and minimum of $f^{(4)}$ on this interval. If $f^{(4)}$ is continuous, this value is assumed
at some point $\xi$ in the same interval. Hence the error term can be written as
$-h^2f^{(4)}(\xi)/12$. •
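The $h^2$ behavior of this error term is easy to observe numerically. The sketch below assumes that the formula in question is the usual central second difference $[f(t+h) - 2f(t) + f(t-h)]/h^2$ as an approximation to $f''(t)$ (the statement itself is not reproduced above), and uses $f = \sin$ as a hypothetical test function. Halving $h$ should divide the error by about four.

```python
import math

def second_difference(f, t, h):
    """Central second difference, an approximation to f''(t)."""
    return (f(t + h) - 2.0 * f(t) + f(t - h)) / h**2

t = 1.0
exact = -math.sin(t)                                   # second derivative of sin at t
err_h = abs(second_difference(math.sin, t, 0.1) - exact)
err_h2 = abs(second_difference(math.sin, t, 0.05) - exact)
print(err_h, err_h2, err_h / err_h2)                   # the ratio should be close to 4
```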
Example 2. We give another illustration of the discretization strategy. Con-
sider a linear integral equation, such as
$$\int_a^b k(s, t)x(s)\,ds = v(t)$$
In this equation, the kernel k and the function v are prescribed. We seek the
unknown function x.
Suppose that a quadrature formula of the type
$(1 \le i \le n)$
Applying the quadrature formula leads to a discrete version of the integral
equation:
$$\sum_{j=1}^n c_jk(s_j, s_i)x(s_j) = v(s_i) \qquad (1 \le i \le n)$$
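The following sketch assembles the matrix of this discrete system for one concrete choice of quadrature. Everything specific here is an assumption made for illustration: the interval $[0, 1]$, an 8-point Gauss-Legendre rule, the kernel $k(s, t) = e^{st}$, and the known solution $x \equiv 1$, for which $v(t) = (e^t - 1)/t$. The sketch only checks that the discrete equations are nearly satisfied by the true solution; it does not address how well conditioned the system is when one solves for $x$.

```python
import numpy as np

n = 8
nodes, weights = np.polynomial.legendre.leggauss(n)   # Gauss-Legendre rule on [-1, 1]
s = 0.5 * (nodes + 1.0)        # nodes s_j mapped to [0, 1]
c = 0.5 * weights              # weights c_j mapped to [0, 1]

k = lambda sj, t: np.exp(sj * t)       # hypothetical kernel k(s, t)
x = lambda sj: np.ones_like(sj)        # hypothetical known solution x(s) = 1
v = lambda t: np.expm1(t) / t          # exact value of the integral of e^{st} over [0, 1]

# Matrix of the discrete system: M[i, j] = c_j * k(s_j, s_i)
M = c[None, :] * k(s[None, :], s[:, None])
residual = M @ x(s) - v(s)
print(np.max(np.abs(residual)))        # should be at rounding-error level
```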
Problems 4.1
4. To change a boundary value problem
into an equivalent one on the interval $[0, 1]$, we change the independent variable from $t$
to $s$ using the equation $t = \beta s + \alpha(1 - s)$. What is the new boundary value problem?
$$x(t) = \int_0^1 K(s, t)x(s)\,ds + w(t)$$
then a solution can be found in the form $x = w + \sum_{i=1}^n a_iv_i$. Carry out the solution
based on this idea. Illustrate, by using the separable kernel e"~t.
The term iteration can be applied to any repetitive process, but traditionally
it refers to an algorithm of the following nature:
We can also write $x_n = F^nx_0$, where $F^0$ is the identity map and $F^{n+1} =
F \circ F^n$. In such a procedure, the entities $x_0, x_1, \ldots$ are usually elements in
some topological space $X$, and the map $F : X \to X$ should be continuous. If
$\lim_{n\to\infty} x_n$ exists, then it is a fixed point of $F$, because
Such a mapping is defined from a metric space X into itself and satisfies an
inequality
in which $\theta$ is a positive constant less than 1. Complete metric spaces were the
subject of Problem 48 in Section 1.2, page 15. Every Banach space is necessarily
a complete metric space, it being assumed that the distance function is
$d(x, y) = \|x - y\|$. A closed set in a Banach space is also a complete metric space.
Since most of our examples occur in this setting, the reader will lose very little
generality by letting X be a closed subset of a Banach space in the Contraction
Mapping Theorem.
(4)
In order to establish the Cauchy property of the sequence $[x_n]$, let $n > N$ and
$m > N$. There is no loss of generality in supposing that $m \ge n$. Then from
Equation (4),
$$\begin{aligned} d(x_m, x_n) &\le d(x_m, x_{m-1}) + d(x_{m-1}, x_{m-2}) + \cdots + d(x_{n+1}, x_n) \\ &\le [\theta^{m-1} + \theta^{m-2} + \cdots + \theta^n]\,d(x_1, x_0) \\ &\le [\theta^N + \theta^{N+1} + \cdots]\,d(x_1, x_0) \\ &= \theta^N(1 - \theta)^{-1}d(x_1, x_0) \end{aligned}$$
Since $0 \le \theta < 1$, $\lim_{N\to\infty} \theta^N = 0$. This proves the Cauchy property. Since the
space $X$ is complete, the sequence converges to a point $\xi$. Since the contractive
property implies directly that $F$ is continuous, the argument in Equation (2)
shows that $\xi$ is a fixed point of $F$.
If $\eta$ is also a fixed point of $F$, then we have
If $\xi \ne \eta$, then $d(\xi, \eta) > 0$, and Inequality (5) leads to the contradiction $\theta \ge 1$.
This proves the uniqueness of the fixed point. •
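The estimate obtained in the proof, $d(x_N, \xi) \le \theta^N(1 - \theta)^{-1}d(x_1, x_0)$ (let $m \to \infty$ in the chain of inequalities), can be watched in action on a simple example. The sketch below is an illustration under assumptions not taken from the text: it uses $F = \cos$ on the closed set $[0, 1]$, where $|F'| \le \sin 1 < 1$, so that $\theta = \sin 1$ is an admissible contraction constant; the helper name `iterate` is hypothetical.

```python
import math

def iterate(F, x0, steps):
    """Return the list [x_0, x_1, ..., x_steps] with x_{n+1} = F(x_n)."""
    xs = [x0]
    for _ in range(steps):
        xs.append(F(xs[-1]))
    return xs

theta = math.sin(1.0)            # |cos'(x)| = |sin x| <= sin 1 < 1 on [0, 1]
xs = iterate(math.cos, 1.0, 100)
xi = xs[-1]                      # numerically converged fixed point of cos
d10 = abs(xs[1] - xs[0])
for N in (5, 10, 20):
    bound = theta**N / (1.0 - theta) * d10
    print(N, abs(xs[N] - xi), bound)   # actual error versus the a priori bound
```

The a priori bound is typically pessimistic, since it uses the worst contraction constant over the whole set rather than the local rate near the fixed point.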
-oo<r<oo
We will seek a solution x in the space C[O, 1]. This space is complete if it is
given the standard norm
$$\|x\|_\infty = \sup_t |x(t)|$$
$$J_0(z) = \frac{1}{\pi}\int_0^\pi \cos(z\sin\theta)\,d\theta$$
The space $C(S)$, accompanied by this norm, is complete. Since the initial-value
problem is equivalent to the integral equation
$$\begin{aligned} |(Au - Av)(s)| &\le \int_0^s |f(t, u(t)) - f(t, v(t))|\,dt \\ &\le \int_0^s \lambda|u(t) - v(t)|\,dt \\ &= \lambda\int_0^s e^{2\lambda t}e^{-2\lambda t}|u(t) - v(t)|\,dt \\ &\le \lambda\|u - v\|_w\int_0^s e^{2\lambda t}\,dt \end{aligned}$$
and that
$$\|Au - Av\|_w \le \tfrac12\|u - v\|_w$$ •
Example 2. Does the following initial value problem have a solution in the
space $C[0, 10]$?
x(O) = 0
Hence, the hypothesis of Theorem 3 is satisfied, and our problem has a unique
solution in C[O, 10]. •
Example 3. If $f$ is continuous but does not satisfy the Lipschitz condition in
Theorem 3, the conclusions of the theorem may fail. For example, the problem
$x' = x^{2/3}$, $x(0) = 0$ has two solutions, $x(s) = 0$ and $x(s) = s^3/27$. There is no
Lipschitz condition of the form
a function $f : S \times \mathbb{R}^n \to \mathbb{R}^n$. We then adopt any convenient norm on $\mathbb{R}^n$, and
define the norm of $x$ to be
The setting for the theorem is now $C(S, \mathbb{R}^n)$, which is the space of all continuous
maps $x : S \to \mathbb{R}^n$, normed with $\|\cdot\|_w$.
The equation $x'(s) = f(s, x(s))$ now
represents the system of differential equations referred to earlier. For further
discussion see the book by Edwards [Edw], pp. 153-155.
The use of iteration to solve differential equations predates Banach's result
by many years. Ince [4J says that it was probably known to Cauchy, but was
apparently first published by Liouville in 1838. Picard described it in its gen-
eral form in 1893. It is often referred to as Picard iteration. It is rarely used
directly in the numerical solution of initial value problems because the step-by-
step methods of numerical integration are superior. Here is an artificial example
to show how it works.
Example 5.
$$x' = 2t(1 + x) \qquad x(0) = 0$$
The formula for the Picard iteration in this example is
$$x_{n+1}(t) = \int_0^t 2s\,(1 + x_n(s))\,ds$$
It appears that we are producing the partial sums in the Taylor series for $e^{t^2} - 1$,
and one verifies readily that this is indeed the solution. •
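Since every iterate in this example is a polynomial, the Picard iteration $x_{n+1}(t) = \int_0^t 2s(1 + x_n(s))\,ds$ can be carried out exactly on coefficient lists. The sketch below is one possible way to do so; the representation (a list whose entry $k$ is the coefficient of $t^k$) and the name `picard_step` are choices made here for illustration, and the starting function $x_0 = 0$ is an assumption.

```python
from fractions import Fraction

def picard_step(x):
    """One Picard step for x' = 2t(1 + x), x(0) = 0.

    x is a list of coefficients: x[k] multiplies t**k.
    """
    one_plus_x = list(x)
    one_plus_x[0] += 1                                         # 1 + x_n(s)
    integrand = [Fraction(0)] + [2 * c for c in one_plus_x]    # 2s * (1 + x_n(s))
    # Integrate term by term from 0 to t: c * s**k -> c/(k+1) * t**(k+1)
    return [Fraction(0)] + [c / (k + 1) for k, c in enumerate(integrand)]

x = [Fraction(0)]              # assumed starting function x_0 = 0
for n in range(1, 4):
    x = picard_step(x)
    print(n, x)                # coefficients of x_n in increasing powers of t
```

With this starting function the third iterate has coefficients $1$, $\tfrac12$, $\tfrac16$ on $t^2$, $t^4$, $t^6$, which are the leading terms of the Taylor series of $e^{t^2} - 1$.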
In some applications it is useful to have the following extension of Banach's
Theorem:
This shows that $F\xi$ is also a fixed point of $F^m$. By the uniqueness of $\xi$, $F\xi = \xi$.
Thus $F$ has at least one fixed point (namely $\xi$), and $\xi$ can be obtained by
iteration using the function $F^m$. If $x$ is any fixed point of $F$, then
$$Fx = x$$
by the first part of the proof. If $\varepsilon > 0$, we can select an integer $N$ having the
property
$(1 \le i \le m)$
Since each integer $j$ greater than $Nm$ can be written as $j = nm + i$, where $n \ge N$
and $1 \le i \le m$, we have
$$x = Ax + v$$
$$(Ax)(t) = \int_a^t K(t, s)x(s)\,ds$$
It is clear that
$$F^2x = A^2x + Av + v$$
$$F^3x = A^3x + A^2v + Av + v$$
and so on. Thus
Problems 4.2
1. From the existence theorem proved in the text deduce a similar theorem for the initial
value problem
x'(s) = f(s,x(s)) x(a) =c
2. Let $F$ be a mapping of a Banach space $X$ into itself. Let $x_0 \in X$, $r > 0$, and $0 \le \lambda < 1$.
Assume that on the closed ball $B(x_0, r)$ we have
$$x(t) = \int_0^t [x(s) + s]\sin s\,ds$$
has a unique solution in $C[0, \pi/2]$, and give an iterative process whose limit is the solution.
7. Prove that if X is a compact metric space and F is a mapping from X to X such that
$d(Fx, Fy) < d(x, y)$ when $x \ne y$, then $F$ has a unique fixed point.
8. Let $F$ be a contraction defined on a metric space that is not assumed to be complete.
Prove that
$$\inf_x d(x, Fx) = 0$$
9. Let $F$ be a mapping on a metric space such that $d(Fx, Fy) < d(x, y)$ when $x \ne y$. Let $x$
be a point such that the sequence $F^nx$ has a cluster point. Show that this cluster point
is a fixed point of $F$ (Edelstein).
10. Carry out 4 steps of Picard iteration in the initial value problem $x' = x + 1$, $x(0) = 0$.
11. Give an example of a discontinuous map $F : \mathbb{R} \to \mathbb{R}$ such that $F \circ F$ is a contraction.
Find the fixed point of $F$.
12. Extend the theorem in Problem 7 by showing that the fixed point is the limit of $F^nx$,
for arbitrary $x$.
13. The diameter of a metric space X is
This is allowed to be +00. Show that there cannot exist a surjective contraction on a
metric space of finite nonzero diameter. (Cf. Problem 4.)
14. Let X be a Banach space and f a mapping of X into X. Are these two properties of f
equivalent?
(i) f has a fixed point.
(ii) There is a nonempty closed set $E$ in $X$ such that $f(E) \subset E$ and such that $\|f(x) -
f(y)\| < \tfrac12\|x - y\|$ for all $x, y$ in $E$.
Prove that the set $\{x : d(x, Tx) \le \varepsilon\}$ is nonempty, closed, and of diameter at most
$\varepsilon(1 - \lambda)^{-1}$.
16. The Volterra integral equation
$x_1 = f(x_0)$, $r = \lambda(1 - \lambda)^{-1}d(x_0, x_1)$, and $B(x_1, r) \subset U$. Prove that $f$ has a fixed point
in $B(x_1, r)$.
24. Prove that the following integral equation (in which u is the unknown function) has a
continuous solution if 1>'1 < i.
u(t) - >.11 J t : 3 sin[tu(s)] ds = et (0 ~ t ~ 1)
25. Let $F$ be a contraction defined on a Banach space. Prove that $I - F$ is invertible and
that $(I - F)^{-1} = \lim_n H_n$, where $H_0 = I$ and $H_{n+1} = I + FH_n$.
26. Prove that the following integral equation has a solution in the space $C[0, 1]$.
27. In the study of radiative transfer, one encounters integral equations of the form
in which u represents the flux density of the radiation at a specified wave length. Prove
that this equation has a solution in the special case k(t) = sin t.
Recall the Neumann Theorem in Section 1.5, page 28, which asserts that if a
linear operator on a Banach space, $A : X \to X$, has operator norm less than 1,
then $I - A$ is invertible, and
(1) $$(I - A)^{-1} = \sum_{k=0}^\infty A^k$$
The series in this equation is known as the Neumann series for $(I - A)^{-1}$.
This theorem is easy to remember because it is the analogue of the familiar
geometric series for complex numbers:
(2) $$\frac{1}{1 - z} = 1 + z + z^2 + z^3 + \cdots \qquad |z| < 1$$
In using the Neumann series, one can generate the partial sums $x_n =
\sum_{k=0}^n A^kv$ by setting $x_0 = v_0 = v$ and computing inductively
(3) $$v_n = Av_{n-1} \qquad x_n = x_{n-1} + v_n \qquad (n = 1, 2, \ldots)$$
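Recursion (3) translates directly into a short loop. In the sketch below the operator is a small matrix with norm less than 1; the matrix, the vector, and the name `neumann_solve` are hypothetical choices made only to illustrate the recursion, and the result is compared with a direct solution of $(I - A)x = v$.

```python
import numpy as np

def neumann_solve(A, v, num_terms):
    """Partial sum x_n = sum_{k=0}^{n} A^k v, built by v_n = A v_{n-1}, x_n = x_{n-1} + v_n."""
    term = v.copy()          # v_0 = v
    x = v.copy()             # x_0 = v
    for _ in range(num_terms):
        term = A @ term      # v_n = A v_{n-1}
        x = x + term         # x_n = x_{n-1} + v_n
    return x

A = np.array([[0.2, 0.1],
              [0.0, 0.3]])   # hypothetical matrix; its max-row-sum norm is 0.3 < 1
v = np.array([1.0, 2.0])
x = neumann_solve(A, v, 60)
print(x, np.linalg.solve(np.eye(2) - A, v))   # the two vectors should agree
```

The recursion needs only one application of $A$ per term, which is the point of computing $v_n$ from $v_{n-1}$ instead of forming the powers $A^k$ explicitly.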
(5) $$(I - \lambda A)x = v$$
$$(Ax)(t) = \int_0^1 e^{t-s}x(s)\,ds$$
$$= \int_0^1 e^{t-\sigma}x(\sigma)\,d\sigma = (Ax)(t)$$
$$= v + \left(\frac{1}{1 - \lambda} - 1\right)Av = v + \frac{\lambda}{1 - \lambda}\,Av$$ •
Example 3. Another important application of the Neumann series occurs in a
process called iterative refinement. Suppose that we wish to solve an operator
Proof. Let the sequence $[x_n]$ be defined by the algorithm, and let
$$y_n = B\sum_{k=0}^n (I - AB)^kv$$
2. Prove that if $A$ is invertible and if the operator $B$ satisfies $\|A - B\| < \|A^{-1}\|^{-1}$, then
$B$ is invertible. What does this imply about the set of invertible elements in $\mathcal{L}(X, X)$?
(Here $X$ is a Banach space.)
3. Prove that if $\inf_\lambda \|I - \lambda A\| < 1$, then $A$ is invertible.
4. If $\|A\|$ is small, then $(I - A)^{-1} \approx I + A$. Find $\varepsilon > 0$ such that the condition $\|A\| < \varepsilon$
implies
5. Make this statement precise: If $\|AB - I\| < 1$, then $2B - BAB$ is superior to $B$ as an
approximate inverse of $A$.
6. Prove that if $X$ is a Banach space, if $A \in \mathcal{L}(X, X)$, and if $\|A\| < 1$, then the iteration
$$x_{n+1} = Ax_n + b$$
converges to a solution of the equation $x = Ax + b$ from any starting point $x_0$.
7. Let $X$ and $Y$ be Banach spaces. Show that the set $\Omega$ of invertible elements in $\mathcal{L}(X, Y)$ is
an open set and that the map $f : \Omega \to \mathcal{L}(Y, X)$ defined by $f(A) = A^{-1}$ is continuously
differentiable.
8. Give an example of an operator A that has a right inverse but is not invertible. Observe
that in the theory of iterative refinement, A need not be invertible.
9. Prove that if the equation $Ax = v$ has a solution $x_0$ and if $\|I - BA\| < 1$, then $x_0 =
(BA)^{-1}Bv$, and a suitable modification of iterative refinement will work.
10. In Example 2, prove that the solution given there is correct for all $\lambda$ satisfying $\lambda \ne 1$. In
particular, it is not necessary to assume that $\|\lambda A\| < 1$ in this example.
11. In Example 2, compute $\|A\|$.
12. Show how to solve the equation $(I - \lambda A)x = v$ when $A$ is idempotent (i.e., $A^2 = A$).
13. Let $A$ be a bounded linear operator on a normed linear space. Prove that if $A$ is nilpotent
(i.e., $A^m = 0$ for some $m \ge 0$), then $I - A$ is invertible. Give a formula for $(I - A)^{-1}$.
14. Prove this generalization of the Neumann Theorem. If $A$ is a bounded linear
transformation from a Banach space $X$ into $X$ such that the sequence $S_n = \sum_{k=0}^n A^k$ has the
Cauchy property, then $(I - A)^{-1}$ exists and equals $\lim_{n\to\infty} S_n$. Give an example to show
that this is a generalization.
15. Prove or disprove: If $A$ is a bounded linear operator on a Banach space and if $\|A^m\| < 1$
for some $m$, then $(I - A)^{-1} = \sum_{k=0}^\infty A^k$.
16. A Volterra integral operator is one of the form
Assume that $A$ maps $C[a, b]$ into $C[a, b]$. Prove that $(I - A)^{-1}$ exists and is given by
the usual Neumann series. Refer to Section 4.2 for further information about Volterra
integral equations.
17. Define $A : C[0, 1] \to C[0, 1]$ by the following equation, and prove that $A$ is surjective.
18. Prove that the set of nonsingular $n \times n$ matrices is open and dense in the set of all $n \times n$
matrices.
19. Let $\phi \in C[0, 1]$ and satisfy $\phi(t) > 0$ on $[0, 1]$. Put $K(s, t) = \phi(s)/\phi(t)$ and
Prove that $A^2 = A$. What are the implications for the integral equations $\lambda Ax = x + w$?
20. Suppose that the operator $A$ satisfies a polynomial equation $\sum_{j=0}^n c_jA^j = 0$ in which
$c_0 \ne 0$. Prove that $A$ is invertible and give a formula for its inverse.
21. Prove that if $\|A\| < 1$, then
$$\frac{1}{1 + \|A\|} \le \|(I - A)^{-1}\| \le \frac{1}{1 - \|A\|}$$
22. Assume that $A^{m+1} = A$ for some integer $m \ge 1$, and show how to solve the equation
$x - \lambda Ax = b$.
23. Investigate the nature of the solutions to the Fredholm integral equation $x(t) = 1 +
\int_0^1 x(st)\,ds$.
Make reasonable assumptions about K and compute the Frechet derivative F'(x). Make
further assumptions about K and prove that F'(x) is invertible.
25. In Example 3, do we have $A^{-1} = B(AB)^{-1}$?
26. Find the connection between the Neumann Theorem and Lemma 1 in Section 4.1. (That
lemma concerns diagonally dominant matrices.)
27. Let $a$ and $b$ be elements of a (possibly noncommutative) ring with unit 1. Show that the
partial sums of the series $\sum_{k=0}^\infty b(1 - ab)^k$ can be computed by the formulas $x_0 = b$ and
$x_{n+1} = x_n + b(1 - ax_n)$.
Proof.
$$(I - P)^2 = (I - P)(I - P) = I - 2P + P^2 = I - P$$
Use $R$ and $N$ for "range" and "null space." Then the preceding theorem shows
that
(2) $$R(P) = N(I - P)$$
Applying this to $I - P$, we get
(3) $$R(I - P) = N(P)$$ •
It should be noted that the range of a projection $P$ is necessarily closed,
because the continuity of $P$ implies that the set $\{x : (I - P)x = 0\}$ is closed.
This is a special property not possessed by all elements of $\mathcal{L}(X, X)$. Notice also
that $P$ acts like the identity on its range. Thus, every projection can be regarded
as a continuous linear extension of the identity operator defined initially on a
subspace of the given space. If $P$ is a projection of $X$ onto $V$, then $V$ is closed,
and we say that "$P$ is a projection of $X$ onto $V$," writing $P : X \twoheadrightarrow V$, where
the double arrow signifies a surjection.
(5) $$Px = \sum_{i=1}^n \phi_i(x)v_i \qquad (x \in X)$$
Proof. Select $\psi_i \in V^*$ such that for any $v \in V$,
$$v = \sum_{i=1}^n \psi_i(v)v_i$$
The functionals $\psi_i$ are linear and continuous, by Corollary 1 in Section 1.5, page
26. (See also the proof of Corollary 2 in the same section.) For each $x \in X$,
$Px \in V$, and so $Px = \sum_{i=1}^n \psi_i(Px)v_i$. Hence we can let $\phi_i(x) = \psi_i(Px)$.
Being a composition of continuous linear maps, $\phi_i$ is also linear and continuous.
The equation $Pv_j = v_j$ implies that $\phi_i(v_j) = \delta_{ij}$ by the uniqueness of the
representation of $Px$ as a linear combination of the basis vectors $v_i$. •
A set of vectors $v_1, v_2, \ldots$ and a set of functionals $\phi_1, \phi_2, \ldots$ is said to form a
biorthogonal system if $\phi_i(v_j) = \delta_{ij}$ for all $i$ and $j$. The book [Brez] is devoted
to this topic.
In practical problems, a projection is often used to provide approximations.
Thus, if $x$ is an element of a normed space $X$ and if $V$ is a subspace of $X$, it
may be desired to approximate $x$ by an element of $V$. If $v \in V$, the error or
deviation of $x$ from $v$ is $\|x - v\|$, and the minimum deviation or distance
from $x$ to $V$ is
$$\operatorname{dist}(x, V) = \inf_{v \in V} \|x - v\|$$
This quantity represents the best that can be done in approximating $x$ by an
element of $V$. In most normed spaces it is quite difficult to determine a best
approximation to $x$. That would be an element $v \in V$ such that
(6) $$\|x - v\| = \operatorname{dist}(x, V)$$
Such an element need not exist. It will exist if $V$ is finite dimensional and in
certain other special cases. Usually, we find a convenient projection $P : X \twoheadrightarrow V$
$$\|x - Px\| = \|(x - v) - P(x - v)\| = \|(I - P)(x - v)\| \le \|I - P\|\,\|x - v\|$$
(8) $$x - Px \perp V \qquad (x \in X)$$
(9)
Hence $P$ and $I - P$ have operator norms at most 1. Inequality (7) now shows that
$Px$ is the best approximation to $x$ in the subspace $V$. Furthermore, $x - Px$ is the
best approximation of $x$ in $V^\perp$. (This last space is the orthogonal complement
of $V$ in the Hilbert space $X$.)
Example 1. Consider the familiar space $C[a, b]$. In it we single out for special
attention the subspace $\Pi_{n-1}$ consisting of all polynomials of degree at most $n - 1$.
This has dimension $n$. Now select points $t_1 < t_2 < \cdots < t_n$ in $[a, b]$, and define
polynomials
$$\ell_i(s) = \prod_{\substack{j=1 \\ j \ne i}}^n \frac{s - t_j}{t_i - t_j}$$
These polynomials have degree $n - 1$ and satisfy the equation
$$\ell_i(t_j) = \delta_{ij} \qquad (1 \le i, j \le n)$$
This is a special case of Equation (4) above. The operator $L$, defined for $x \in
C[a, b]$ by the equation
$$Lx = \sum_{i=1}^n x(t_i)\ell_i$$
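A small numerical sketch can confirm that the operator $L$ built from these polynomials behaves like a projection: applying it twice gives the same result as applying it once, and polynomials of degree at most $n - 1$ are reproduced. The nodes, the test functions, and the name `lagrange_operator` below are hypothetical choices for illustration.

```python
import numpy as np

def lagrange_operator(nodes):
    """Return L with (Lx)(s) = sum_i x(t_i) * l_i(s) for the given nodes t_i."""
    nodes = np.asarray(nodes, dtype=float)

    def basis(i, s):
        # l_i(s) = product over j != i of (s - t_j) / (t_i - t_j)
        others = np.delete(nodes, i)
        return np.prod((s - others) / (nodes[i] - others))

    def L(x):
        values = [x(t) for t in nodes]       # only the values x(t_i) are used
        return lambda s: sum(values[i] * basis(i, s) for i in range(len(nodes)))

    return L

L = lagrange_operator([0.0, 0.5, 1.0])
p = L(np.exp)            # quadratic interpolant of exp at the three nodes
pp = L(p)                # applying L again should change nothing
print(p(0.3), pp(0.3))
```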
(10) $(j \in \mathbb{N})$
(11)
Does this strategy have any chance of success? It depends on whether the
sequence $[x_n]$, arising as outlined above, converges. Assume that $x_n \to x$. Let
us verify that $x$ is a solution: $Ax = b$. By the continuity of $A$, $Ax_n \to Ax$. Since
$\|P_n\| = 1$, $P_n(Ax_n - Ax) \to 0$. But $P_nAx_n = P_nb$ by our choice of $x_n$. Hence
$P_nb - P_nAx \to 0$. In the limit, this gives us $b = Ax$.
Notice that this proof uses the essential fact that $P_ny \to y$ for all $y$. For
our general theorem (in any Banach space) this assumption is needed.
(13)
Section 4.4 Projections and Projection Methods 195
(14)
Often we insist that $x_n \in V_n$, but this is not essential. One hopes that the
sequence $[x_n]$ will converge to a solution of the original problem. We shall give
some positive results in this direction. These apply to a problem of the form
(15) $x - Ax = b$
(17) $x - PAx = x - Px + Pb$
(18) $\bar{x} - PA\bar{x} = Pb$
Subtracting (18) from (17) gives
$$x - \bar{x} - PA(x - \bar{x}) = x - Px$$
or
$$(I - PA)(x - \bar{x}) = x - Px$$
Thus we have
$$x - \bar{x} = (I - PA)^{-1}(x - Px)$$
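The identity just derived can be checked on a small example. The sketch below is an illustration under stated assumptions, not the author's computation: X is taken to be the plane, P is the orthogonal projection onto the first coordinate axis, and the matrix A and vector b are arbitrary choices. The second point in the identity (written `x_bar` here) is the element of the range of P that satisfies Equation (18).

```python
def matvec(M, v):
    """Product of a 2-by-2 matrix with a vector."""
    return [sum(M[i][j] * v[j] for j in range(2)) for i in range(2)]


def matmul(M, N):
    """Product of two 2-by-2 matrices."""
    return [[sum(M[i][k] * N[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def matsub(M, N):
    """Difference of two 2-by-2 matrices."""
    return [[M[i][j] - N[i][j] for j in range(2)] for i in range(2)]


def solve2(M, rhs):
    """Solve a 2-by-2 linear system by Cramer's rule."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [
        (rhs[0] * M[1][1] - M[0][1] * rhs[1]) / det,
        (M[0][0] * rhs[1] - rhs[0] * M[1][0]) / det,
    ]


I = [[1.0, 0.0], [0.0, 1.0]]
P = [[1.0, 0.0], [0.0, 0.0]]          # projection onto the first axis (assumed)
A = [[0.2, 0.1], [0.3, 0.4]]          # illustrative operator
b = [1.0, 2.0]                        # illustrative right-hand side

x = solve2(matsub(I, A), b)                       # x - Ax = b
I_minus_PA = matsub(I, matmul(P, A))
x_bar = solve2(I_minus_PA, matvec(P, b))          # x_bar - PA x_bar = Pb
Px = matvec(P, x)

error = [x[i] - x_bar[i] for i in range(2)]
predicted = solve2(I_minus_PA, [x[i] - Px[i] for i in range(2)])
```

The two vectors `error` and `predicted` should agree up to rounding, and the second component of `x_bar` is zero, so `x_bar` does lie in the range of P.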
(19) x-Ax=b
(21)
(22)
(This estimate comes from the proof of the Neumann Theorem.) Also, we know
that
$$\|x - P_n x\|^2 = \Bigl\| \sum_{i=n+1}^{\infty} (x, v_i)\, v_i \Bigr\|^2 = \sum_{i=n+1}^{\infty} |(x, v_i)|^2 \to 0$$
We conclude therefore from Inequality (22) that the approximate solutions $x_n$
converge to x as $n \to \infty$. We summarize this discussion in the next theorem.
(23) $(1 \le i \le n)$
Of course, the Neumann Theorem can be used to solve the equation (I - A)x = b.
It gives $x = (I - A)^{-1} b = \sum_{n=0}^{\infty} A^n b$. There seems to be no obvious connection
between this solution and the one provided by Theorem 8.
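For a concrete feeling of the Neumann series, the partial sums can be compared with a direct solution. This is only a sketch: the 2-by-2 matrix A (whose norm is less than 1) and the vector b are illustrative assumptions, and the helper names are hypothetical.

```python
def matvec(M, v):
    """Product of a 2-by-2 matrix with a vector."""
    return [sum(M[i][j] * v[j] for j in range(2)) for i in range(2)]


def solve2(M, rhs):
    """Solve a 2-by-2 linear system by Cramer's rule."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [
        (rhs[0] * M[1][1] - M[0][1] * rhs[1]) / det,
        (M[0][0] * rhs[1] - rhs[0] * M[1][0]) / det,
    ]


def neumann_partial_sum(A, b, terms):
    """Return b + Ab + A^2 b + ... with the given number of terms."""
    total = [0.0, 0.0]
    power_b = b[:]                      # holds A^n b
    for _ in range(terms):
        total = [total[i] + power_b[i] for i in range(2)]
        power_b = matvec(A, power_b)
    return total


A = [[0.2, 0.1], [0.3, 0.4]]            # illustrative, maximum row sum 0.7 < 1
b = [1.0, 2.0]

I_minus_A = [[1.0 - A[0][0], -A[0][1]], [-A[1][0], 1.0 - A[1][1]]]
direct = solve2(I_minus_A, b)
series = neumann_partial_sum(A, b, 200)
```

With these data both `direct` and `series` approximate the vector (16/9, 38/9).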
In the general projection method, in solving the equation
(24) $P_n(Ax - b) = 0$
we need not confine ourselves to the case where x is chosen in the range of $P_n$.
Instead, we can let x be a linear combination of other prescribed elements, say
$x = \sum_{i=1}^{n} c_i u_i$. In this case, we attempt to choose $c_i$ so that
(25) $P_n\Bigl(\sum_{j=1}^{n} c_j A u_j - b\Bigr) = 0$
(27) $\phi_i\Bigl(\sum_{j=1}^{n} c_j A u_j - b\Bigr) = 0 \qquad (1 \le i \le n)$
or
(28) $\sum_{j=1}^{n} c_j \phi_i(A u_j) = \phi_i(b) \qquad (1 \le i \le n)$
This is a system of linear equations having coefficient matrix $(\phi_i(A u_j))$. In the
next two sections we shall see examples of this procedure.
Problems 4.4
1. Let $\{u_1, \ldots, u_n\}$ be a linearly independent set in an inner-product space. Prove that the
Gram matrix, whose elements are $(u_i, u_j)$, is nonsingular.
2. Let P be a projection of a normed space X onto a subspace V. Prove that Px = 0 if and
only if $\phi(x) = 0$ for each $\phi$ in the range of $P^*$.
3. Let $P_1, P_2, \ldots$ be a sequence of projections on a normed space X. Suppose that
$P_{n+1} P_n = P_n$ for all n and that the union of the ranges of these projections is dense in
X. Suppose further that $\sup_n \|P_n\| < \infty$. Prove that $P_n x \to x$ for all $x \in X$.
4. Let $P_1, P_2, \ldots$ be a sequence of projections on a Banach space X. Prove that if $P_n x \to x$
for all $x \in X$, then $\sup_n \|P_n\| < \infty$ and the union of the ranges of the projections is
dense in X. Hint: The Uniform Boundedness Theorem is useful.
5. Let X be a Banach space and $P_1, P_2, \ldots$ projections on X such that $P_n x \to x$ for every
x. Suppose that A is an invertible element of $\mathcal{L}(X, X)$. For each n, let $x_n$ be a point
such that $P_n x_n = x_n$ and $P_n(Ax_n - b) = 0$. Prove or disprove that the sequence $[x_n]$
necessarily converges to the solution of the equation Ax = b.
6. Let $\{\phi_1, \ldots, \phi_n\}$ be a linearly independent set in $X^*$. Is there a projection $P : X \to X$
having rank n of the form $Px = \sum_{i=1}^{n} \phi_i(x) v_i$? (The rank of a linear operator is the
dimension of its range.)
7. Adopt the notation of Theorem 4, and prove that $\phi_i(Px) = \phi_i(x)$ for all i in {1, 2, ..., n}
and for all x in X.
8. Prove that the operator L in Example 1 is a projection. Prove that $\|L\| = \bigl\| \sum_{i=1}^{n} |\ell_i| \bigr\|_\infty$.
9. Let A and P be elements of $\mathcal{L}(X, X)$, where $P^2 = P$. Let V denote the range of P. Show
that $PA|V \in \mathcal{L}(V, V)$. Is $PA|V$ invertible?
10. Prove a variant of Theorem 6 in which P is an arbitrary linear operator and x satisfies
Px = x.
11. In the setting of Theorem 8, prove that the solution to the problem is given by
$x = \sum_{n=0}^{\infty} A^n b$. How is this solution related to the one given in the theorem, namely,
$x = \lim x_n$?
12. Consider the familiar sequence space $c_0$. (It was described in Problem 1.2.16, page 12.)
We define a projection $P : c_0 \to c_0$ by selecting any set of integers J and setting
$$(Px)(n) = \begin{cases} x(n) & (n \in \mathbb{N} \setminus J) \\ 0 & (n \in J) \end{cases}$$
Prove that P is a projection. Identify the null space and range of P. Give the formula
for I - P. Compute $\|P\|$ and $\|I - P\|$. How many projections of this type are there?
What is the distance between any two different such projections?
198 Chapter 4 Basic Approximate Methods in Analysis
13. Let $\{u_1, u_2, \ldots, u_n\}$ and $\{v_1, v_2, \ldots, v_n\}$ be sets in a Hilbert space, the second set being
assumed to be linearly independent. Define $Ax = \sum_{i=1}^{n} (x, u_i) v_i$. Determine the
necessary and sufficient conditions on $\{u_i\}$ in order that A be a projection, i.e., $A^2 = A$.
14. (Variation on Problem 5.) Let X be a Banach space, and let $P_1, P_2, \ldots$ be projections
on X such that $P_n x \to x$ for each x in X. Assume that $\|P_n\| = 1$ for all n. Let A
be a linear operator such that $\|I - A\| < 1$. If the points $x_n$ satisfy $P_n x_n = x_n$ and
$P_n(Ax_n - b) = 0$, then the sequence $[x_n]$ converges to a solution of the equation Ax = b.
15. In $\mathbb{R}^2$, let u = (1,1), v = (1,0), and Px = (x, u)v. Prove that P is a projection. Using the
Euclidean norm in $\mathbb{R}^2$, compute $\|P\|$. This problem illustrates the fact that projections
on Hilbert spaces need not have norm 1.
16. Is a norm-1 projection defined on a Hilbert space necessarily an orthogonal projection?
17. Explain why point-evaluation functionals, as defined on the space C[a, b], cannot be
defined on any of the spaces $L^p[a, b]$.
The procedure that goes by the name of the mathematician Galerkin is one of
the projection methods, in fact, the one described at length in the preceding
section. We review the method briefly, and then discuss concrete examples of
its use.
We wish to solve an equation of the form
(1) Au= b
in which A is an operator acting on a Hilbert space U. A finite-dimensional
subspace V is chosen in U, and we let P denote the orthogonal projection of U
onto V. Then we find $u \in V$ such that
(2) $P(Au - b) = 0$
If $v_1, v_2, \ldots, v_n$ is a basis for V, and if we set $u = \sum_{j=1}^{n} c_j v_j$, then Equation (2)
leads to
(3) $\sum_{j=1}^{n} c_j (A v_j, v_i) = (b, v_i) \qquad (1 \le i \le n)$
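To see the system (3) in action, here is a small sketch under assumptions that are ours and not the text's: the operator is $Au = -u''$ on [0, 1] with zero boundary values, the right-hand side is $b(t) = 1$, and the basis consists of the sine functions $v_j(t) = \sin(j\pi t)$. Inner products are approximated by a midpoint rule, and the helper names `inner` and `solve` are hypothetical. The exact solution of this model problem is $t(1-t)/2$.

```python
import math


def inner(f, g, m=2000):
    """Midpoint-rule approximation of the L^2[0, 1] inner product (f, g)."""
    h = 1.0 / m
    return h * sum(f((k + 0.5) * h) * g((k + 0.5) * h) for k in range(m))


def solve(M, rhs):
    """Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * sol[c] for c in range(r + 1, n))
        sol[r] = (aug[r][n] - tail) / aug[r][r]
    return sol


n = 5
basis = [lambda t, j=j: math.sin(j * math.pi * t) for j in range(1, n + 1)]
# For the sine basis, A v_j = -v_j'' = (j*pi)^2 v_j.
A_basis = [lambda t, j=j: (j * math.pi) ** 2 * math.sin(j * math.pi * t)
           for j in range(1, n + 1)]
b = lambda t: 1.0

# Row i, column j holds (A v_j, v_i), as in Equation (3).
matrix = [[inner(A_basis[j], basis[i]) for j in range(n)] for i in range(n)]
rhs = [inner(b, basis[i]) for i in range(n)]
c = solve(matrix, rhs)


def u_galerkin(t):
    """Galerkin approximation sum_j c_j v_j(t)."""
    return sum(c[j] * basis[j](t) for j in range(n))
```

Because the sine functions are orthogonal, the matrix is diagonal up to rounding here; a general basis would produce a full matrix, which is why a general solver is used.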
(4) $\begin{cases} \nabla^2 u = 0 & \text{in } \Omega \\ u(x, y) = g(x, y) & \text{on } \partial\Omega \end{cases}$
Section 4.5 The Galerkin Method 199
The proof for v comes from observing that it is the real part of -iw.
To illustrate this, consider the function $z \mapsto z^2$. We have
$$w = z^2 = (x + iy)^2 = x^2 - y^2 + 2ixy = u + iv$$
Thus the functions $u = x^2 - y^2$ and $v = 2xy$ are harmonic. (See Problem 10.)
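A quick numerical sanity check of harmonicity is possible with the standard five-point difference approximation to the Laplacian. This is a sketch only; the step size, the sample point, and the comparison function `w` (which is not harmonic) are illustrative assumptions.

```python
def laplacian(f, x, y, h=1e-3):
    """Five-point finite-difference approximation of the Laplacian of f at (x, y)."""
    return (f(x + h, y) + f(x - h, y) + f(x, y + h) + f(x, y - h)
            - 4.0 * f(x, y)) / h**2


def u(x, y):
    """Real part of z^2."""
    return x * x - y * y


def v(x, y):
    """Imaginary part of z^2."""
    return 2.0 * x * y


def w(x, y):
    """A function that is not harmonic; its Laplacian is 4."""
    return x * x + y * y


lap_u = laplacian(u, 0.3, -0.7)
lap_v = laplacian(v, 0.3, -0.7)
lap_w = laplacian(w, 0.3, -0.7)
```

For quadratic polynomials the five-point formula has no truncation error, so `lap_u` and `lap_v` differ from zero only by rounding, while `lap_w` is close to 4.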
The Dirichlet problem is frequently encountered with Poisson's Equation:
(6) $\begin{cases} \nabla^2 u = f & \text{in } \Omega \\ u = g & \text{on } \partial\Omega \end{cases}$
One way of solving (6) is to solve two related but easier problems:
(7) $\begin{cases} \nabla^2 v = f & \text{in } \Omega \\ v = 0 & \text{on } \partial\Omega \end{cases} \qquad \begin{cases} \nabla^2 w = 0 & \text{in } \Omega \\ w = g & \text{on } \partial\Omega \end{cases}$
Clearly, the function u = v + w will then solve (6). The problem involving w
was discussed previously. The Galerkin procedure for approximating v begins
with the selection of base functions $v_1, v_2, \ldots$ that vanish on $\partial\Omega$. Then an
approximate solution v is sought having the form $v = \sum_{j=1}^{n} c_j v_j$. The usual
Galerkin criterion is applied, so we have to solve the linear equations
(8) $\sum_{j=1}^{n} c_j (\nabla^2 v_j, v_i) = (f, v_i) \qquad i = 1, \ldots, n$
Since v and u vanish on $\partial\Omega$, we conclude that $(u, \nabla^2 v) = (\nabla^2 u, v)$. •
A remark about Equation (8) is in order. Some authors argue that the
coefficients $c_j$ should be chosen to minimize the expression
$$\Bigl\| \sum_{j=1}^{n} c_j \nabla^2 v_j - f \Bigr\|$$
where the Hilbert space norm is being used, corresponding to the inner product
in Equation (9). This is a problem of approximating f as well as possible by a
linear combination of the functions $\nabla^2 v_i$ $(1 \le i \le n)$. The solution is obtained
via the normal equations
$$\sum_{j=1}^{n} c_j \nabla^2 v_j - f \perp \nabla^2 v_i \qquad (1 \le i \le n)$$
Proof. As usual, we define the u-sections of B by $B_u(v) = B(u, v)$. Then each
$B_u$ is a continuous linear functional on V. Indeed,
By the Riesz Representation Theorem (Section 2.3, Theorem 1, page 81), there
corresponds to each u in U a unique point Au in V such that $B_u(v) = (Au, v)$.
Elementary arguments show that A is a linear map of U into V. Thus,
In order to prove that the range of A is closed, let $[v_n]$ be a convergent sequence
in the range of A. Write $v_n = Au_n$, and note that by the Cauchy property
$$0 = \lim_{n,m \to \infty} \|v_n - v_m\| = \lim_{n,m \to \infty} \|Au_n - Au_m\| \ge \beta \lim_{n,m \to \infty} \|u_n - u_m\|$$
$(1 \le i \le n)$
$$\sum_{j=1}^{n} c_j B(u_j, v_i) = (z, v_i) \qquad (1 \le i \le n)$$
Problem 14 asks for a proof that this hypothesis will guarantee the nonsingularity
of the matrix described above.
$$\int_a^b [v(pu')' - vqu] = \int_a^b vf$$
by parts:
$$\int_a^b [pu'v' + quv] = -\int_a^b fv$$
$$B(u, v) = \int_a^b (pu'v' + quv) \qquad (f, v) = -\int_a^b fv$$
There is much more to be said about this problem, but here we wish to emphasize
only the formal construction of the maps that enter into Theorem 3.
Example 2. The steady-state distribution of heat in a two-dimensional domain
$\Omega$ is governed by Poisson's Equation:
Here, u(x, y) is the temperature at the location (x, y) in $\mathbb{R}^2$, and f is the heat-source
function. If the temperature on the boundary $\partial\Omega$ is held constant, then,
with suitable units for the measurement of temperature, we may take u(x, y) =
0 on $\partial\Omega$. This simple case leads to the problem of discovering u such that
B(u, v) = (f,v) for all v, where
To arrive at this form of the problem, first write the equivalent equation
for all v
$$\int_\Omega v \nabla^2 u = \int_\Omega v f \qquad \text{for all } v$$
The integral on the left is treated by using Green's Theorem (also known as
Gauss's Theorem). (This theorem plays the role of integration by parts for
multivariate functions.) It states that
This equation holds true under mild assumptions on P, Q, $\Omega$, and $\partial\Omega$. (See
[Wid].) Exploiting the hypothesis of zero boundary values, we have
Problems 4.5
1. ([Mil], page 115) Find an approximate solution of the two-point boundary-value problem
3. Invent an efficient algorithm for generating the sequences of harmonic functions $[u_n]$, $[v_n]$,
where $z^n = u_n + i v_n$ and n = 0, 1, 2, ...
4. Prove that a differentiable function $f : \mathbb{R} \to \mathbb{R}$ such that $\inf_x f'(x) > 0$ is necessarily
surjective. Show by an example that the simpler condition f'(x) > 0 is not adequate.
5. A sequence $v_1, v_2, \ldots$ in a Banach space U is called a basis (or more exactly a Schauder
basis) if each $x \in U$ has a unique representation as a convergent series in U of the form
$x = \sum_{n=1}^{\infty} a_n(x) v_n$. The $a_n$ depend on x in a continuous and linear manner, i.e.,
$a_n \in U^*$. If U has a Schauder basis, then U is separable. Prove that $a_n(v_m) = \delta_{nm}$.
Prove that no loss of generality occurs in assuming that $\|v_n\| = 1$ for all n. See problems
24-26 in Section 1.6, pages 38 and 39.
6. (Continuation) Prove that for each n, the map $P_n$ defined by the equation
$P_n x = \sum_{k=1}^{n} a_k(x) v_k$ is a (bounded, linear) projection.
10. Use the computer system Mathematica to find the real and imaginary parts of $z^n$ for
n = 1 to n = 10. Version 2 (or later) of Mathematica will be necessary. The input to do
this all at once is
$$Au = a u_{xx} + b u_{xy} + c u_{yy}$$
is Hermitian on the space of functions having continuous partial derivatives of orders 0,
1, 2 in $\Omega$ and vanishing on $\partial\Omega$. In the definition of A, the coefficients are constants.
12. Give an elementary proof of Green's Theorem for any rectangle in $\mathbb{R}^2$ whose sides are
parallel to the coordinate axes.
13. Solve the Dirichlet problem on the unit disk in $\mathbb{R}^2$ when the boundary values are given
by the expression $8x^4 - 8y^4 + 1$.
14. Prove that the matrix $(B(u_j, v_i))$ described following Theorem 3 of this section is nonsingular
if the hypothesis (b*) is fulfilled by the spaces $U_n$ and $V_n$.
(1) $F(x) = 0 \qquad (x \in X)$
is obtained when the coefficient vector $(c_1, c_2, \ldots, c_n)$ satisfies the "normal" equations
(3) $\sum_{j=1}^{n} c_j A u_j - v \perp A(U_n)$
These are not the Galerkin equations (Equation (3) in Section 4.5, page 198).
The Rayleigh-Ritz method (in its classical formulation) applies to differential
equations, and the functional $\Phi$ is directly related to the differential equation.
We illustrate with a two-point boundary-value problem, in which all the
functions are assumed to be sufficiently smooth. (They are functions of t.)
(5) $\begin{cases} (px')' - qx = f \\ x(a) = \alpha \qquad x(b) = \beta \end{cases}$
In correspondence with this problem, a functional $\Phi$ is defined by
(6) $\Phi(x) = \int_a^b \bigl[ p\,(x')^2 + q\,x^2 + 2fx \bigr]\,dt$
Proof. Assume the hypotheses, and let y be any element of $C^2[a, b]$ such that
y(a) = y(b) = 0. We use what is known as a variational argument. For each
real number $\lambda$, $x + \lambda y$ is a competitor of x in the minimization of $\Phi$. Hence the
function $\lambda \mapsto \Phi(x + \lambda y)$ has a local minimum at $\lambda = 0$. We compute
$$\frac{d}{d\lambda} \Phi(x + \lambda y) = \frac{d}{d\lambda} \int_a^b \bigl[ (x' + \lambda y')^2 p + (x + \lambda y)^2 q + 2(x + \lambda y) f \bigr]\,dt$$
Evaluating this derivative at $\lambda = 0$ and setting the result equal to 0 yields the
necessary condition
(7) $\int_a^b \bigl[ p x' y' + q x y + f y \bigr]\,dt = 0$
The first term is integrated by parts:
$$\int_a^b p x' y' = p x' y \Big|_a^b - \int_a^b (px')' y = -\int_a^b (px')' y$$
Section 4.6 The Rayleigh-Ritz Method 207
Here the fact that y(a) = y(b) = 0 has been exploited. Equation (7) now reads
(8) $\int_a^b \bigl[ -(px')' + qx + f \bigr]\, y\,dt = 0$
(The steps just described are the same as those in Example 1, page 203.) Since
y is an arbitrary element of $C^2[a, b]$ vanishing at a and b, we conclude from
Equation (8) that
$$-(px')' + qx + f = 0$$
The details of this last argument are as follows. Let $z = -(px')' + qx + f$.
Then $\int_a^b z(t) y(t)\,dt = 0$ for all functions y of the type described above. Suppose
that $z \ne 0$. Then for some $\tau$, $z(\tau) \ne 0$. For definiteness, let us assume that
$z(\tau) = \varepsilon > 0$. Then there is a closed interval $J \subset (a, b)$ in which $z(t) \ge \varepsilon/2$.
There is an open interval I containing J in which z(t) > 0. Let y be a $C^2$
function that is constantly equal to 1 on J and constantly equal to 0 on the
complement of I. Then $\int_a^b z(t) y(t)\,dt > 0$. •
Theorem 2. Assume that p(t) > 0 and $q(t) \ge 0$ on [a, b]. If x is a
function in $C^2[a, b]$ that solves the boundary-value problem (5) then x
is the unique local minimizer of $\Phi$ subject to the boundary conditions
as constraints.
Proof. Let $z \in C^2[a, b]$, $z \ne x$, $z(a) = \alpha$, and $z(b) = \beta$. Then the function
y = z - x satisfies 0-boundary conditions but is not 0. By calculations like those
in the preceding proof,
Using integration by parts on the middle term, we find that it is zero. Then
Equation (9) shows that $\Phi(z) > \Phi(x)$. •
When the calculations indicated here are carried out, the system of equations
to be solved emerges as
$$\sum_{j=1}^{n} a_{ij} c_j = b_i \qquad (1 \le i \le n)$$
where
$$a_{ij} = \int_0^1 \bigl[ p(t) u_i'(t) u_j'(t) + q(t) u_i(t) u_j(t) \bigr]\,dt$$
$$b_i = -\int_0^1 f(t) u_i(t)\,dt$$
This completes the description of the Rayleigh-Ritz method for this problem.
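A small worked instance may help fix the formulas for $a_{ij}$ and $b_i$. The data below are assumptions chosen for illustration: p = q = 1 on [0, 1], and $f(t) = t^2 - t - 2$, so that $x(t) = t(1 - t)$ satisfies $(px')' - qx = f$ with zero boundary values. The base functions $u_1 = t(1-t)$ and $u_2 = t^2(1-t)$ are also our choice. Since the exact solution lies in their span, the Rayleigh-Ritz coefficients should come out close to (1, 0). Integrals are approximated by a composite Simpson rule (a hypothetical helper, not from the text).

```python
def simpson(g, lo=0.0, hi=1.0, m=200):
    """Composite Simpson rule with m (even) subintervals."""
    h = (hi - lo) / m
    total = g(lo) + g(hi)
    for k in range(1, m):
        total += (4.0 if k % 2 else 2.0) * g(lo + k * h)
    return total * h / 3.0


p = lambda t: 1.0
q = lambda t: 1.0
f = lambda t: t * t - t - 2.0          # makes x(t) = t(1 - t) the exact solution

basis = [lambda t: t * (1.0 - t), lambda t: t * t * (1.0 - t)]
dbasis = [lambda t: 1.0 - 2.0 * t, lambda t: 2.0 * t - 3.0 * t * t]

# a_ij = integral of p u_i' u_j' + q u_i u_j;  b_i = - integral of f u_i
a = [[simpson(lambda t, i=i, j=j: p(t) * dbasis[i](t) * dbasis[j](t)
              + q(t) * basis[i](t) * basis[j](t))
      for j in range(2)] for i in range(2)]
rhs = [-simpson(lambda t, i=i: f(t) * basis[i](t)) for i in range(2)]

# Solve the 2-by-2 system by Cramer's rule.
det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
c = [(rhs[0] * a[1][1] - a[0][1] * rhs[1]) / det,
     (a[0][0] * rhs[1] - rhs[0] * a[1][0]) / det]
```

With more base functions the same assembly applies; only the linear solve needs to be replaced by a general method.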
In order to prove theorems about the convergence of the method, some
preliminaries must be dealt with. The following lemma is formulated for an
arbitrary topological space. Refer to Chapter 7, Section 6, pages 361ff, for basic
topology.
Proof. Let $\rho = \inf_{x \in X} \Psi(x)$. (We permit $\rho = -\infty$.) For any $r > \rho$, the set
(11) $(px')' - qx = f$
(12) $x(0) = x(1) = 0$
inner-product notation
$$(u, v) = \int_0^1 u(t) v(t)\,dt$$
and $\|\ \|_2$ is the accompanying quadratic norm. From Equation (9) in the proof
of Theorem 2, we have
The union of these subspaces is the space of all functions having the form
$$t \mapsto t(1 - t) \sum_{k=0}^{n} a_k t^k$$
$\int_0^s p(t)\,dt$, so that u' = p, u(0) = 0, and $\|u' - x'\|_\infty < \varepsilon$. Then $|u(s) - x(s)| =
\bigl| \int_0^s [u'(t) - x'(t)]\,dt \bigr| < \varepsilon$. Since x(1) = 0, $|u(1)| < \varepsilon$. Put v(t) = t u(1). Then
$|v'(t)| = |u(1)| < \varepsilon$ and $|v(t)| \le |u(1)| < \varepsilon$. Notice that u - v is a polynomial
that takes the value 0 at 0 and 1. Hence u - v contains w(t) = t(1 - t) as a
factor, and belongs to one of the spaces $U_n$. Also,
(13)
Proof. The bilinear form B defines an inner product on X. The norm arising
from the inner product is written $\|x\|_B = \sqrt{B(x, x)}$. Since B is continuous,
there is a positive constant $\alpha$ such that $|B(x, y)| \le \alpha \|x\| \|y\|$. Consequently,
$\|x\|_B \le \sqrt{\alpha}\, \|x\|$. On the other hand, from the condition of ellipticity, we have
$\|x\|_B \ge \sqrt{\beta}\, \|x\|$. Thus the two norms on X
are equivalent. Hence, $(X, \|\cdot\|_B)$
is complete and therefore a Hilbert space. Also, K is a closed convex set in
this Hilbert space. By the Riesz Representation Theorem, $\phi(x) = -2B(v, x)$ for
some v in X. Write
This shows that our minimization problem is the standard one of finding a point
of K closest to v, in the Hilbert space setting. Theorem 2 in Section 2.1 (page
64) applies and establishes the existence of a unique point x in K solving the
problem.
Historical note. A biographical article about George Green is [Cann1], and a
book by the same author is [Cann2]. When you are next in England, you would
enjoy visiting Nottingham and seeing the well-restored mill, of which George
Green was the proprietor. His collected works have been published in [Green].
Problems 4.6
$$\int_0^1 u(t) x(t)\,dt = 0$$
x(O) =6 x(l) =~
Suggestion: Multiply by x' and integrate, or try some likely functions containing param-
eters.
4. Let x be an element of $C^2[a, b]$ that minimizes the functional
subject to the constraints $x(a) = \alpha$ and x'(b) = 0. Find the two-point boundary-value
problem that x solves.
5. Use an elementary change of variable in the problem
$$\begin{cases} (px')' - qx = f \\ x(a) = \alpha \qquad x(b) = \beta \end{cases}$$
to find an equivalent problem having homogeneous boundary conditions. Thus, y(a) =
y(b) = 0 in the new variable. Is the new problem also of Sturm-Liouville form?
8. (Continuation) Use the two preceding problems to show that on any finite-dimensional
subspace in X, the infimum of <I> is attained.
9. Solve the two-point boundary-value problem
x(O) =0 x(l) =1
This is deceptively similar to Problem 3, but harder. Look for a solution of the form
$x(t) = \sum_{n=0}^{\infty} a_n t^n$. You should find a "general" solution of the differential equation
containing two arbitrary constants $a_0$ and $a_1$. All remaining coefficients can then be
Section 4.7 Collocation Methods 213
obtained by a recurrence relation. After imposing the condition x(0) = 0, your solution
will contain only the powers $t, t^4, t^7, t^{10}, \ldots$ The parameter $a_1$ will be available to secure
the remaining boundary condition, x(1) = 1. Reference: [Dav].
10. Let [a, b] be a compact interval in $\mathbb{R}$, and define
Prove that if $\|x_n'\|_2 \to 0$ and if $\inf_{a \le t \le b} |x_n(t)| \to 0$, then $\|x_n\|_\infty \to 0$. Is the result
true when the interval is replaced by $[a, \infty)$? Assume $x_n'$ continuous.
11. In $C^1[a, b]$ consider the two norms in the preceding problem. Prove that
Assume that $f = \partial g / \partial x$. Prove that any $C^2$-function that minimizes $\Phi(x)$ subject to the
constraints $x(0) = \alpha$ and $x(1) = \beta$ is a solution of the boundary-value problem. Show by
example that the converse is not necessarily true.
13. Prove that there exists a polynomial of degree 5 such that p(0) = p'(0) = p''(0) = p'(1) =
p''(1) = 0 and p(1) = 1.
14. (Continuation) Let a < b. Using the polynomial in the preceding problem, show that
there exists a polynomial q of degree 5 such that q(a) = q'(a) = q''(a) = q'(b) = q''(b) = 0
and q(b) = 1. With the help of q construct a nondecreasing $C^2$-function f such that
f(t) = 0 on $(-\infty, a)$ and f(t) = 1 on $(b, \infty)$.
15. Solve the integral equation
(1) Ax= b
(2) $\sum_{j=1}^{n} c_j A u_j = b$
$$\phi_i(x) = x(t_i)$$
(3) $\begin{cases} x'' + px' + qx = f \\ x(0) = x(1) = 0 \end{cases}$
As usual, it makes matters easier to select base functions that satisfy the homogeneous
part of the problem. Suppose that we let $u_j(t) = (1 - t)t^j$ for
j = 1, 2, ..., n. As the functionals, we use $\phi_i(x) = x(t_i)$, where the points
$t_1, \ldots, t_n$ can be chosen in the interval [0, 1]. For example, we can let $t_i = (i - 1)h$,
where h = 1/(n - 1). The operator A is defined by
$$Ax = x'' + px' + qx$$
and by computing we find that
The matrix whose elements are $(Au_j)(t_i)$ is easily written down, but it is not
instructive to do so. It will probably be an ill-conditioned matrix, because the
base functions we have chosen are not suited to numerical work. Better choices
for the base functions $u_i$ would be the Chebyshev polynomials (suitable to the
interval in question) or a set of B-splines.
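The collocation procedure just described can be sketched in a few lines for a small n. The test problem is an assumption made for illustration: p = 0, q = 1, and $f(t) = -2 + t - t^2$, so that the exact solution is $x(t) = t(1 - t) = u_1(t)$ and the computed coefficients should be close to (1, 0, 0). The helper names are hypothetical, and the linear solver is plain Gaussian elimination, which is adequate only because n is tiny; the ill-conditioning mentioned above would matter for larger n.

```python
def solve(M, rhs):
    """Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * sol[c] for c in range(r + 1, n))
        sol[r] = (aug[r][n] - tail) / aug[r][r]
    return sol


def u(j, t):
    """Base function u_j(t) = (1 - t) t^j = t^j - t^(j+1)."""
    return (1.0 - t) * t**j


def du(j, t):
    """First derivative of u_j."""
    return j * t**(j - 1) - (j + 1) * t**j


def d2u(j, t):
    """Second derivative of u_j (the t^(j-2) term is absent when j = 1)."""
    first = j * (j - 1) * t**(j - 2) if j >= 2 else 0.0
    return first - (j + 1) * j * t**(j - 1)


p = lambda t: 0.0                       # illustrative coefficients
q = lambda t: 1.0
f = lambda t: -2.0 + t - t * t          # exact solution is t(1 - t)


def apply_A(j, t):
    """(A u_j)(t) with Ax = x'' + p x' + q x."""
    return d2u(j, t) + p(t) * du(j, t) + q(t) * u(j, t)


n = 3
h = 1.0 / (n - 1)
points = [(i - 1) * h for i in range(1, n + 1)]       # t_i = (i - 1) h
matrix = [[apply_A(j, t) for j in range(1, n + 1)] for t in points]
rhs = [f(t) for t in points]
c = solve(matrix, rhs)
```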
More examples of collocation techniques will be given later in this sec-
tion, but first we shall discuss the important technique of turning a two-point
(4) $\begin{cases} x'' = f(t, x) & 0 \le t \le 1 \\ x(0) = 0 \qquad x(1) = 0 \end{cases}$
(5) $G(t, s) = \begin{cases} t(1 - s) & 0 \le t \le s \le 1 \\ s(1 - t) & 0 \le s \le t \le 1 \end{cases}$
Notice that G is defined on the unit square in the st-plane, and vanishes on the
boundary of the square. Although G is continuous, its partial derivatives have
jump discontinuities along the line s = t. Using the Green's function as kernel,
we define an integral equation
$$\frac{d}{dt} \int_a^{k(t)} h(s)\,ds = h(k(t))\,k'(t)$$
$$+\; G(t, t) f(t, x(t)) + \int_t^1 G_t(t, s) f(s, x(s))\,ds$$
$$(Fx)(t) = -\int_0^1 G(t, s) f(s, x(s))\,ds \qquad x \in C[0, 1]$$
We shall prove that F is a contraction. We have
$$\le \int_0^1 k\,G(t, s)\,|u(s) - v(s)|\,ds \le (k/8)\,\|u - v\|_\infty$$
It follows that
$$\|Fu - Fv\|_\infty \le (k/8)\,\|u - v\|_\infty$$
and that F is a contraction. Now apply Banach's Theorem, page 177, taking
note of the fact that C[O, 1], with the supremum norm, is complete. •
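The constant 1/8 in the contraction estimate reflects the fact that $\int_0^1 G(t, s)\,ds = t(1 - t)/2$, whose maximum on [0, 1] is 1/8. The sketch below checks this numerically with a midpoint rule; the grid sizes and the helper name `integral_of_green` are illustrative assumptions.

```python
def green(t, s):
    """Green's function G(t, s) of Equation (5)."""
    return t * (1.0 - s) if t <= s else s * (1.0 - t)


def integral_of_green(t, m=2000):
    """Midpoint-rule approximation of the integral of G(t, s) over 0 <= s <= 1."""
    h = 1.0 / m
    return h * sum(green(t, (k + 0.5) * h) for k in range(m))


sample = integral_of_green(0.3)                     # expect 0.3 * 0.7 / 2
grid = [k / 20.0 for k in range(21)]
max_value = max(integral_of_green(t) for t in grid)  # expect 1/8, attained at t = 1/2
```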
Our existence theorem does not apply to this directly, and some changes of
variables are called for. We set
z(t) = x(t) - 3t + 7
Next we set
$$t = -1 + 2s \qquad y(s) = z(t)$$
and find that y should solve this problem:
To this problem we can apply the preceding corollary. The function $f(s, r) =
2e^s \cos r$ satisfies a Lipschitz condition, as we see by applying the mean value
theorem:
$$|f(s, r_1) - f(s, r_2)| = \Bigl| \frac{\partial f}{\partial r}(s, r_3) \Bigr|\, |r_1 - r_2|$$
The derivative here is bounded as follows
Since the Lipschitz constant 2e is less than 8, the boundary-value problem (11)
has a solution y. Hence (9) has a solution x, and it is given by
(12) $\begin{cases} x'' = f(t, x) & 0 \le t \le 1 \\ x(0) = x(1) = 0 \end{cases}$
and solve it instead. If we discretize both problems (12) and (13) in a certain
uniform way, the two new problems will be equivalent, a result to which we now
turn our attention.
The standard discretization of the boundary value problem (12) is done by
introducing a formula for numerical differentiation, as in Section 4.1. For the
integral equation, we require a formula for numerical integration, and choose for
this purpose a simple Riemann sum. Thus the discretized problems are
(14) $\begin{cases} y_{i-1} - 2y_i + y_{i+1} = h^2 f(t_i, y_i) \\ y_0 = y_{n+1} = 0 \end{cases}$
(15) $y_i = -h \sum_{j=1}^{n} G(t_i, t_j) f(t_j, y_j) \qquad 0 \le i \le n + 1$
In both of these we have set h = 1/(n + 1) and $t_i = ih$. Of course, $y \in \mathbb{R}^{n+2}$.
Notice that we have used the fact that G vanishes on the boundary of the square.
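The relation between the two discretizations can be observed directly when f does not depend on y, because then (15) gives the vector y explicitly. In the sketch below the choice $f(t, y) = 6t$ and n = 9 are illustrative assumptions; the exact solution of the boundary-value problem is then $t^3 - t$, and its second differences are reproduced without error by the difference formula, so the computed $y_i$ should match $t_i^3 - t_i$ up to rounding.

```python
def green(t, s):
    """Green's function G(t, s) of Equation (5)."""
    return t * (1.0 - s) if t <= s else s * (1.0 - t)


def f(t, y):
    """Illustrative right-hand side, independent of y."""
    return 6.0 * t


n = 9
h = 1.0 / (n + 1)
t = [i * h for i in range(n + 2)]

# Equation (15): y_i = -h * sum_j G(t_i, t_j) f(t_j, y_j), for 0 <= i <= n + 1.
y = [-h * sum(green(t[i], t[j]) * f(t[j], 0.0) for j in range(1, n + 1))
     for i in range(n + 2)]

# Residuals of the difference equations in (14), for 1 <= i <= n.
residuals = [y[i - 1] - 2.0 * y[i] + y[i + 1] - h * h * f(t[i], y[i])
             for i in range(1, n + 1)]
```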
Now suppose that $(y_0, \ldots, y_{n+1})$ solves the equations in (15). Then $z_i = y_i$
for $0 \le i \le n + 1$. Consequently, $L_i y = L_i z = f(t_i, y_i)$. Since $z_0 = z_{n+1} = 0$,
from (17), we have also $y_0 = y_{n+1} = 0$. Thus y solves the equations in (14).
Conversely, if y solves the equations in (14), then $L_i y = f(t_i, y_i) = L_i z$.
Since the second divided differences of y and z are equal, these two vectors can
differ only by an arithmetic progression. But $y_{n+1} = z_{n+1}$ and $y_0 = z_0$, so the
vectors are in fact identical. Thus y satisfies the equations in (15). •
Now reconsider the integral equation (13), which is equivalent to the
boundary-value problem (12). One advantage of the integral equation is that
many different numerical quadrature formulas can be applied to it. The most ac-
curate of these formulas do not employ equally spaced nodes. The idea of using
unequally spaced points in the discretized problem of (14) would not normally
be entertained, as that would only complicate matters without producing any
obvious advantage in precision. The quadrature formulas of maximal accuracy
are well known, however, and are certainly to be recommended in the numerical
solution of integral equations, in spite of their involving unequally spaced nodes.
A quadrature formula of the type needed here will have the form
(19)
Notice that this equation can be used in a practical way in functional
iteration. We can start with any $y_0$ in C[0, 1] and define inductively
$$y_{m+1}(t) = -\sum_{j=1}^{n} A_j G(t, s_j) f(s_j, y_m(s_j))$$
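The iteration can be sketched as follows. The quadrature rule used here is the composite midpoint rule (weights 1/n at the midpoints), which is our choice for simplicity and not one of the maximal-accuracy formulas recommended above. The function $f(s, r) = 2e^s \cos r$ is the one appearing in the example of this section, and the starting function is $y_0 = 0$. Only the nodal values $y_m(s_j)$ need to be stored, since they determine $y_{m+1}(t)$ for every t.

```python
import math


def green(t, s):
    """Green's function G(t, s) of Equation (5)."""
    return t * (1.0 - s) if t <= s else s * (1.0 - t)


def f(s, r):
    """Right-hand side f(s, r) = 2 e^s cos r."""
    return 2.0 * math.exp(s) * math.cos(r)


n = 100
nodes = [(j + 0.5) / n for j in range(n)]     # midpoint nodes s_j (assumed rule)
weights = [1.0 / n] * n                       # midpoint weights A_j


def next_iterate(values, t):
    """y_{m+1}(t) computed from the nodal values y_m(s_j)."""
    return -sum(weights[j] * green(t, nodes[j]) * f(nodes[j], values[j])
                for j in range(n))


values = [0.0] * n                            # y_0 = 0 at the nodes
y1_at_half = next_iterate(values, 0.5)        # first iterate at t = 1/2

change = None
for _ in range(40):
    new_values = [next_iterate(values, s) for s in nodes]
    change = max(abs(new_values[j] - values[j]) for j in range(n))
    values = new_values
```

The value `y1_at_half` can be compared with the closed form $2[e^t + (1 - e)t - 1]$ at t = 1/2 quoted in Problem 18 of this section, and `change` shows how quickly successive iterates settle down.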
solution of such a system may be a difficult matter, but for the moment we shall
suppose that a solution $y = (y_1, \ldots, y_n)$ has been obtained. Let us use x to
denote the solution function for the integral equation (13). It is to be hoped
that $|y_i - x(s_i)|$ will be small. Here the nodes of the quadrature formula are
$s_1, \ldots, s_n$. Two functions that enter the theorem are
(21) $u(t) = \int_0^1 G(t, s) f(s, x(s))\,ds - \sum_{j=1}^{n} A_j G(t, s_j) f(s_j, x(s_j))$
(22) $v(t) = \int_0^1 G(t, s)\,ds - \sum_{j=1}^{n} A_j G(t, s_j)$
If the quadrature formula is a good one, these functions will be small in norm.
We continue to assume the Lipschitz inequality (8) on f.
where $\lambda = \bigl[ 1 - k\bigl( \tfrac{1}{8} + \|v\|_\infty \bigr) \bigr]^{-1}$
Proof. Let $\varepsilon_i = |x(s_i) - y_i|$ and $\varepsilon = \max_i \varepsilon_i$. Then for each i we have
As in the preceding proof, the sum in this last inequality has the upper bound
$\tfrac{1}{8} + \|v\|_\infty$.
Hence we have
•
The Lipschitz condition in (8) is usually established by estimating the
partial derivative $f_2 \equiv \partial f(t, s)/\partial s$ and using the mean value theorem. If
$|f_2| \le k < 8$ on the domain where $0 \le t \le 1$ and $-\infty < s < \infty$, then we
can also use Newton's method to solve the discretized integral equation in (20).
The equations that govern the procedure can be derived in the following way.
Suppose that an approximate solution $(y_1, y_2, \ldots, y_n)$ for system (20) is available.
We seek to calculate corrections $h_i$ so that the vector $(y_1 + h_1, \ldots, y_n + h_n)$
will be an exact solution of (20). Thus we desire that
(23) $y_i + h_i = -\sum_{j=1}^{n} A_j G(s_i, s_j) f(s_j, y_j + h_j)$
Of course, we take just the linear terms in the Taylor expansion of the nonlinear
expression $f(s_j, y_j + h_j)$ and use the resulting linear equations to solve for the
$h_i$. These linear equations are
(24) $y_i + h_i = -\sum_{j=1}^{n} A_j G(s_i, s_j) \bigl[ f(s_j, y_j) + h_j f_2(s_j, y_j) \bigr]$
in which
and
$$d_i = -y_i - \sum_{j=1}^{n} A_j G(s_i, s_j) f(s_j, y_j)$$
Equation (25) has the form (I + E)h = d. We can see that I + E is invertible
(nonsingular) by verifying that $\|E\|_\infty < 1$:
$$\|E\|_\infty = \max_i \sum_{j=1}^{n} |E_{ij}| = \max_i \sum_{j=1}^{n} A_j G(s_i, s_j)\, |f_2(s_j, y_j)|$$
(26) $\int_a^b x(t) w(t)\,dt \approx \sum_{i=1}^{n} A_i x(t_i) \qquad x \in C[a, b]$
(28) $\int_a^b x(t) w(t)\,dt = \int_a^b \sum_{i=1}^{n} x(t_i) \ell_i(t) w(t)\,dt = \sum_{i=1}^{n} x(t_i) \int_a^b \ell_i(t) w(t)\,dt$
•
Example 2. If $(t_1, t_2, t_3) = (-1, 0, +1)$ and [a, b] = [-1, 1], what is the quadrature
formula produced by the preceding method when w(t) = 1? We follow the
prescription and begin with the functions $\ell_i$:
The integrals $\int_{-1}^{1} \ell_i(t)\,dt$ are $\tfrac13$, $\tfrac43$, and $\tfrac13$, and the quadrature formula is therefore
$$\int_{-1}^{1} x(t)\,dt \approx \tfrac13 x(-1) + \tfrac43 x(0) + \tfrac13 x(1)$$
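The weights in Example 2 can be reproduced exactly with rational arithmetic by building each Lagrange polynomial as a coefficient list and integrating it term by term. This is a minimal sketch; the helper names `poly_mul`, `lagrange_coeffs`, and `integrate` are hypothetical.

```python
from fractions import Fraction


def poly_mul(p, q):
    """Multiply two polynomials given as ascending coefficient lists."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def lagrange_coeffs(nodes, i):
    """Ascending coefficients of the i-th Lagrange polynomial."""
    coeffs = [Fraction(1)]
    for j, tj in enumerate(nodes):
        if j != i:
            denom = nodes[i] - tj
            # Factor (s - t_j) / (t_i - t_j) as a degree-one polynomial in s.
            coeffs = poly_mul(coeffs, [-tj / denom, Fraction(1) / denom])
    return coeffs


def integrate(coeffs, a, b):
    """Exact integral over [a, b] of the polynomial with the given coefficients."""
    return sum(c * (b ** (k + 1) - a ** (k + 1)) / (k + 1)
               for k, c in enumerate(coeffs))


nodes = [Fraction(-1), Fraction(0), Fraction(1)]
weights = [integrate(lagrange_coeffs(nodes, i), Fraction(-1), Fraction(1))
           for i in range(3)]
# weights == [Fraction(1, 3), Fraction(4, 3), Fraction(1, 3)]
```

The same functions work for any set of distinct rational nodes when w(t) = 1.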
Let p be the unique monic polynomial in $\Pi_n$ that is orthogonal to $\Pi_{n-1}$, orthogonality
being defined by the inner product (30). Let the nodes $t_1, \ldots, t_n$ be
the zeros of p. These are known to be simple zeros and lie in (a, b), although we
do not stop to prove this. (See [Ch], page 111.) By Theorem 6, there is a set
of weights $A_i$ for which the quadrature formula (26) is exact on $\Pi_{n-1}$. We now
show that it is exact on $\Pi_{2n-1}$. Let $x \in \Pi_{2n-1}$. By the division algorithm, we
can write x = qp + r, where q (the quotient) and r (the remainder) belong to
$\Pi_{n-1}$. Now write
$$\int_a^b x w = \int_a^b q p w + \int_a^b r w$$
Since $p \perp \Pi_{n-1}$ and $q \in \Pi_{n-1}$ the integral $\int qpw$ is zero. Since $p(t_i) = 0$, we
have $x(t_i) = r(t_i)$. Finally, since $r \in \Pi_{n-1}$, the quadrature formula (26) is exact
for r. Putting these facts together yields
$$\int_a^b x(t) w(t)\,dt = \int_a^b r(t) w(t)\,dt = \sum_{i=1}^{n} A_i r(t_i) = \sum_{i=1}^{n} A_i x(t_i)$$
•
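The gain in exactness can be seen in the smallest interesting case. The sketch below assumes n = 2, w(t) = 1, and [a, b] = [-1, 1]; the monic polynomial orthogonal to $\Pi_1$ is then $t^2 - 1/3$, with zeros $\pm 1/\sqrt{3}$. The weights are the integrals of the two linear Lagrange polynomials. The formula should integrate the monomials of degree 0 through 3 exactly (degree 2n - 1 = 3) and fail on $t^4$.

```python
import math

# Zeros of p(t) = t^2 - 1/3, the monic polynomial orthogonal to Pi_1 on [-1, 1].
nodes = [-1.0 / math.sqrt(3.0), 1.0 / math.sqrt(3.0)]

# The integral over [-1, 1] of (t - other) / (node - other) is -2*other / (node - other).
weights = [-2.0 * nodes[1] / (nodes[0] - nodes[1]),
           -2.0 * nodes[0] / (nodes[1] - nodes[0])]


def quad(x):
    """Apply the two-point quadrature formula to the function x."""
    return sum(A * x(t) for A, t in zip(weights, nodes))


def exact_monomial(k):
    """Exact integral of t^k over [-1, 1]."""
    return 0.0 if k % 2 else 2.0 / (k + 1)


errors = [abs(quad(lambda t, k=k: t**k) - exact_monomial(k)) for k in range(5)]
```

The first four entries of `errors` are at rounding level, while the last one is visibly nonzero.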
Problems 4.7
1. Refer to the proof of Theorem 3 and show that if z is a vector in $\mathbb{R}^{n+2}$ for which $L_i z = 0$
$(1 \le i \le n)$, then z is an arithmetic progression.
2. Prove that if, in Equation (28), a = -b, w is even, and the nodes are symmetrically
placed about the origin, then the formula will give correct results on $\Pi_n$ when n is odd.
3. Prove that if the formula (26) is exact on $\Pi_{2n-1}$, then the nodes must be the zeros of
a polynomial orthogonal to $\Pi_{n-1}$.
4. Let $(x, y) = \int_{-1}^{1} x(t) y(t)\,dt$. Verify that the polynomial $p(t) = t^3 - \tfrac35 t$ is orthogonal to
$\Pi_2$. Find the Gaussian quadrature formula for this case, i.e., n = 3, $w(t) \equiv 1$, a = -1,
b = +1.
5. Define
Verify that this improper integral converges whenever x and y are continuous functions on
the interval [-1, 1]. Accepting the fact that the Chebyshev polynomial $T_3(t) = 4t^3 - 3t$ is
orthogonal to $\Pi_2$, find the Gaussian quadrature formula in this case. Hint: $T_3(\cos\theta) =
\cos 3\theta$. Use the change of variable $t = \cos\theta$ to facilitate the work.
6. Consider this 2-point boundary value problem:
x(O) =0 x(l) =1
By using Theorem 2, show that the problem has a unique solution in the space C[0, 1].
7. Prove that the general second-order linear differential equation
$$ux'' + vx' + wx = f$$
can be put into Sturm-Liouville form, assuming that u > 0, by applying an integrating
factor $\exp \int (v - u')/u$.
{
X' = f(t,x)
x(O) =0
Prove that it is correct.
10. Prove that if x E C[O, 1] and if x satisfies the integral relation (6), in which f is continuous,
then x E C2[0, 1].
11. Prove that this two-point boundary value problem has no solution:
12. Convert the two-point boundary value problem in Problem 11 to an equivalent homoge-
neous problem on the interval [0,1], and explain why Theorem 2 and its corollary do not
apply.
13. An integral equation of the form
is called a Hammerstein equation. Show that it can be written in the form x+AFx = v,
where A and F are respectively a linear and a nonlinear operator defined by
17. Prove that if $u_0 = 0$ and $L_i u = 0$ for i = 1, 2, ..., then $u_i = ia$ for a suitable constant a.
(Refer to the proof of Theorem 3, page 218, for definitions.)
18. Write down the fixed-point problem that is equivalent to the boundary-value problem in
Equation (11), page 217. Take one step in the iteration, starting with $y_0(t) = 0$. Check
your answer against ours: $y_1(t) = 2[e^t + (1 - e)t - 1]$.
19. Consider a numerical integration formula
$$\int_a^b x(t) w(t)\,dt \approx \sum_{i=1}^{n} A_i x(t_i)$$
Assume that w is positive and continuous on [a, b]. Assume also that $t_i$ are n distinct
points in [a, b]. Prove that the formula gives correct results for "most" functions in C[a, b].
Interpret the word "most" in terms of dimension of certain subspaces.
20. Prove that the following two-point boundary-value problem has a continuous solution
and that the solution satisfies x(t) = x(l - t):
functional $\Phi : K \to \mathbb{R}$ is given, where K (the domain of $\Phi$) is a subset of some
Banach space X. Usually $\Phi$ is nonlinear. Let
A goal somewhat more modest than finding the minimum point is to generate
a minimizing sequence for $\Phi$. That means a sequence $x_1, x_2, \ldots$ in K
such that
(2)
Proof. Recall that lower semicontinuity of $\Phi$ means that each set of the form
$$K_\lambda = \{ x \in K : \Phi(x) \le \lambda \}$$
is closed. If $\lambda > \rho$, then $K_\lambda$ is nonempty. The family of closed sets $\{K_\lambda :
\lambda > \rho\}$ has the finite-intersection property (i.e., the intersection of any finite
subcollection is nonempty). Since the space is compact,
(see [Kel]). Any point in this intersection satisfies the inequality $\Phi(x) \le \rho$. •
Section 4.8 Descent Methods 227
The preceding theorem can be proved also for a space that is only countably
compact. This term signifies that any countable open cover of the space
has a finite subcover. ([Kel] page 162). A consequence is that each sequence
in the space has a cluster point. Let $x_n$ be chosen so that $\Phi(x_n) < \rho + 1/n$.
Let $x^*$ be a cluster point of the sequence $[x_n]$. Then $\Phi(x^*) \le \rho$. Indeed, if
this inequality is false, then for some m, $\Phi(x^*) > \rho + 1/m$. Since $\Phi$ is lower
semicontinuous, the set
These matters are discussed fully in Chapter 3. The linear functional $\Phi'(x)$ is
usually called the gradient of $\Phi$ at x. In the special case when $X = \mathbb{R}^n$, and
$x = (\xi_1, \xi_2, \ldots, \xi_n)$, $h = (\eta_1, \eta_2, \ldots, \eta_n)$, it has the form
(4) $\Phi'(x) h = \sum_{i=1}^{n} \frac{\partial \Phi}{\partial \xi_i}(x)\, \eta_i \qquad h \in \mathbb{R}^n$
If $\Phi'(x)$ exists at a specific point x, then for any $h \in X$, we have
(5) $\frac{d}{dt} \Phi(x + th) \Big|_{t=0} = \Phi'(x) h$
$$\frac{d}{dt} \Phi(x + th) = \Phi'(x + th) h$$
The left-hand side of Equation (5) is called the directional derivative of $\Phi$ at
x in the direction h. The existence of the Frechet derivative is sufficient for the
existence of the directional derivative, but not necessary. (An example occurs
in Problem 2.) The mapping
$$h \mapsto \frac{d}{dt} \Phi(x + th) \Big|_{t=0}$$
is called the Gateaux derivative. (The mathematician R. Gateaux was killed
while serving as a soldier in the First World War, September 1914.)
If, among all h of norm 1 in X, there is a vector for which $\Phi'(x) h$ is a
maximum, this vector is said to point in the direction of steepest ascent. Its
negative gives the direction of steepest descent. These matters are most easily
understood when X is a Hilbert space. Suppose, then, that $\Phi$ is a functional
on a Hilbert space X and that $\Phi'(x)$ exists for some point x. By the Riesz
representation theorem for functionals on a Hilbert space, the functional $\Phi'(x)$
is represented by a vector v in X, so that $\Phi'(x) h = (h, v)$. If $\|h\| = 1$, then by
the Cauchy-Schwarz inequality,
$$(h, v) \le |(h, v)| \le \|h\| \|v\| = \|v\|$$
We have equality here if and only if h is taken to be $v / \|v\|$. Thus the (unnormalized)
direction of steepest ascent is v.
An iterative procedure called the method of steepest descent can now
be described. If any point $x$ is given, the direction of steepest descent at $x$ is
computed. Let this be $v$. The functional $\Phi$ is now minimized along the "ray"
consisting of points $x + tv$, $t \in \mathbb{R}$. This is done by a familiar technique from
elementary calculus; namely, we solve for $t$ in the equation
$$\frac{d}{dt}\,\Phi(x + tv) = 0$$
(11) $$\frac{d}{dt}\,\Phi(x + tv) = 2\langle Ax - b, v\rangle + 2t\langle Av, v\rangle$$
The minimum of $\Phi(x + tv)$ occurs when this derivative is zero. The value of $t$
for which this happens is
(12) $$t = \langle b - Ax, v\rangle\,\langle Av, v\rangle^{-1}$$
When this value is substituted in Equation (10) the result is
This shows that we can cause $\Phi(x)$ to decrease by passing to the point $x + tv$,
except when $b - Ax \perp v$. If $b - Ax \ne 0$, then many directions $v$ can be chosen for
our purpose, but if $Ax = b$, we cannot decrease $\Phi(x)$. •
In the problem under consideration, the directional derivative of $\Phi$ is
obtained by putting $t = 0$ in Equation (11):
It follows that the direction of steepest descent is the residual vector $r = b - Ax$.
(Positive scalar factors can be ignored in specifying a direction vector.) The
algorithm for steepest descent in this problem is therefore described by these
formulas:
(15)
Since the method of steepest descent is not competitive with the conjugate
direction methods on this problem, we will not go into further detail, but simply
state without proof the following theorem. See [KA], pages 606-608.
There is more to this theorem than meets the eye, because the hypotheses on
A imply its invertibility, and consequently the equation Ax = b has a unique
solution for each b in the Hilbert space. See the lemma in Section 4.9, page 234,
for the appropriate formal result.
How does the method of steepest descent perform on this example? We prefer
to let Mathematica do the work, and give it these inputs:
A={{1.,2.},{2.,5.}}
b={3.,1.}
Inverse [A]
%.b
The output is $A^{-1} = \begin{bmatrix} 5 & -2 \\ -2 & 1 \end{bmatrix}$ and the solution, $x = (13, -5)^T$. Next, we
program Mathematica to compute 10 steps of steepest descent, starting at $x =
(0,0)$. The following input accomplishes this.
x={0.,0.}
(* each pass prints the residual r, then phi at the current x, then the updated x *)
Do[r=b-A.x;Print[r];phi=-x.(r+b);Print[phi];
t=(r.r)/(r.A.r);y=x+t r;x=y;Print[x],{10}]
After 10 steps, the output is $x = (5.7, -1.7)$ and $\Phi = -22.4587$. Since the
solution is $x^* = (13, -5)$ and $\Phi = -34$, the algorithm works very slowly. Of
course, with some starting points, the solution will be obtained in one step.
Such starting points are $x^* + sv$, for any eigenvector $v$ and scalar $s$. Here are
Mathematica commands to compute eigenvectors of $A$:
A={{1.,2.},{2.,5.}}
Eigenvectors[N[A]]
If we start the steepest descent process at a remote point such as $x^* + 100v$, the
first step (carried out numerically) gives a point very close to $x^*$. The contours
(level sets) of $\Phi$ for this example are shown in Figure 4.1.
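The experiment above can also be reproduced outside Mathematica. The following is a minimal Python sketch (standard library only) of the same iteration for $\Phi(x) = \langle Ax - 2b, x\rangle$; the helper names `matvec`, `dot`, `phi`, and `steepest_descent` are illustrative choices, not taken from the text. Note that the Mathematica loop prints $\Phi$ before it updates $x$, so the last value of $\Phi$ it prints appears to belong to the ninth iterate rather than the tenth; the sketch stores $\Phi$ for every iterate so the two can be told apart.

```python
import math

A = [[1.0, 2.0], [2.0, 5.0]]
b = [3.0, 1.0]

def matvec(M, x):
    """Matrix-vector product for small dense lists."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def phi(x):
    """Phi(x) = <Ax - 2b, x>."""
    Ax = matvec(A, x)
    return dot([Ax[i] - 2.0 * b[i] for i in range(2)], x)

def steepest_descent(x, steps):
    """Return the iterates x_0, ..., x_steps of steepest descent."""
    iterates = [list(x)]
    for _ in range(steps):
        Ax = matvec(A, x)
        r = [b[i] - Ax[i] for i in range(2)]      # residual = descent direction
        rr = dot(r, r)
        if rr > 0.0:
            t = rr / dot(r, matvec(A, r))         # exact line search, Equation (12) with v = r
            x = [x[i] + t * r[i] for i in range(2)]
        iterates.append(list(x))
    return iterates

xs = steepest_descent([0.0, 0.0], 10)
phis = [phi(x) for x in xs]

# One-step convergence from x* + 100 v, where v is an eigenvector of A
# (for this A, v = (1, 1 + sqrt 2) belongs to the eigenvalue 3 + 2 sqrt 2).
x_star = [13.0, -5.0]
v = [1.0, 1.0 + math.sqrt(2.0)]
start = [x_star[i] + 100.0 * v[i] for i in range(2)]
one_step = steepest_descent(start, 1)[-1]
```

Printing `phis` shows the slow, steady decrease toward $-34$, while `one_step` lands on $x^*$ up to rounding.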
[Figure 4.1: contours (level sets) of $\Phi$]
Problems 4.8
behave on a ray $x + tv$? Where is the minimum point on this ray, and what is the
minimum value of $\Phi$ on this ray? What is the direction of steepest descent? What are
the answers if $A$ is self-adjoint?
6. A functional $\Phi$ is said to be convex if the condition $0 < \lambda < 1$ implies
$$\Phi(\lambda x + (1 - \lambda)y) \le \lambda\Phi(x) + (1 - \lambda)\Phi(y)$$
for any two points $x$ and $y$. Is the functional $x \mapsto \langle Ax - 2b, x\rangle$ convex when $A$ is Hermitian
and positive definite?
7. Let $A$ be any bounded linear operator on a Hilbert space, and let $H$ be a positive definite
Hermitian operator. Put
$$\Phi(x) = \langle b - Ax, H(b - Ax)\rangle$$
Discuss methods for solving $Ax = b$ based upon the minimization of $\Phi$. Investigate the
equivalence of the two problems, give the Gâteaux derivative of $\Phi$, and derive the formula
for steepest descent. In the latter, the method of Lagrange multipliers would be helpful.
Determine the amount by which $\Phi(x)$ decreases in each step.
8. What happens to the theory if the coefficient 2 is replaced by 1 in Equation (9)?
9. Prove that when the method of steepest descent is applied to the problem $Ax = b$ the
minimum value of $\Phi$ is $-\langle x, b\rangle$, where $x$ is the solution of the problem.
10. Let the method of steepest descent be applied to solve the equation $Ax = b$, as described
in the text. Show that
Show that if $b$ is not in the closure of the range of $A$, then $\Phi(x_n) \to -\infty$.
15. Prove that if $\inf_{\|x\|=1} \langle Ax, x\rangle = m > 0$, then the method of steepest descent (described
in Equation (15)) has this property:
16. In the method of steepest descent, we expect successive direction vectors to be
orthogonal to each other. Why? Prove that this actually occurs in the example described by
Equation (15).
17. In the method of steepest descent applied to the equation $Ax = b$, explain how it is
possible for $\Phi$ to be bounded below on each line yet not bounded below on the whole
Hilbert space.
18. Use the definition of Gâteaux derivative given on page 228 or in Problem 3.1.21 (page
120) to verify that Equation (5) gives the Gâteaux derivative of $\Phi$.
In this section we continue our study of algorithms for solving the equation
(1) Ax= b
A general descent algorithm goes as follows. At the $n$th step, a vector $x_n$ is
available from prior computations. By means of some strategy, a "search direction"
is determined. This is a vector $v_n$. Then we let
(3)
Formula (3) ensures that $\Phi(x_{n+1})$ will be as small as possible when $x_{n+1}$ is
restricted to the ray $x_n + tv_n$.
In this algorithm, considerable freedom is present in choosing the search
direction $v_n$. For example, in the method of steepest descent, $v_n = b - Ax_n$.
We shall discuss an alternative that has many advantages over steepest descent.
One advantage is that the idea of searching for a minimum value of a functional
Section 4.9 Conjugate Direction Methods 233
is abandoned, and we retain only the algorithm in Equations (3). The operator
(or matrix in the finite-dimensional case) need not be self-adjoint or positive
definite. Finally, the direction vectors $v_n$ are subject to weaker hypotheses.
First some definitions are needed. For an operator $A$, a sequence of vectors
$v_1, v_2, \ldots$ in $X$ is said to be $A$-orthogonal if
(4)
This new concept reduces to the familiar type of orthogonality if $A$ is the identity
operator. The descent algorithm (3) is called a conjugate direction method
if the search directions $v_1, v_2, \ldots$ are nonzero and form an $A$-orthogonal set.
A slightly stronger hypothesis is that our set of vectors $v_i$ is $A$-orthonormal,
meaning that the condition $\langle v_i, Av_j\rangle = \delta_{ij}$ is fulfilled. The formula for $\alpha_n$ in
Equation (3) is then simpler.
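A small numerical illustration may help here. The Python sketch below assumes that the step in Equation (3) has the form $x_{n+1} = x_n + \alpha_n v_n$ with $\alpha_n = \langle r_n, v_n\rangle / \langle v_n, Av_n\rangle$ and $r_n = b - Ax_n$ (this matches Equation (12) of the preceding section, but it is an assumption about (3) itself). It further assumes a symmetric positive definite matrix, so that $A$-orthogonal directions can be produced by a Gram-Schmidt process in the form $\langle u, Av\rangle$. All function names are illustrative.

```python
def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def a_orthogonalize(A, vectors):
    """Gram-Schmidt with respect to <u, A v>; A is assumed symmetric positive definite."""
    dirs = []
    for w in vectors:
        v = list(w)
        for d in dirs:
            coef = dot(d, matvec(A, w)) / dot(d, matvec(A, d))
            v = [vi - coef * di for vi, di in zip(v, d)]
        dirs.append(v)
    return dirs

def conjugate_directions(A, b, x, dirs):
    """One pass of x <- x + alpha v over the given A-orthogonal directions."""
    for v in dirs:
        r = [bi - yi for bi, yi in zip(b, matvec(A, x))]   # residual b - A x
        alpha = dot(r, v) / dot(v, matvec(A, v))
        x = [xi + alpha * vi for xi, vi in zip(x, v)]
    return x

A = [[1.0, 2.0], [2.0, 5.0]]
b = [3.0, 1.0]
dirs = a_orthogonalize(A, [[1.0, 0.0], [0.0, 1.0]])
solution = conjugate_directions(A, b, [0.0, 0.0], dirs)
```

For this $2 \times 2$ system the two $A$-orthogonal directions are $(1,0)$ and $(-2,1)$, and two steps already give the solution of $Ax = b$, in contrast to the slow progress of steepest descent on the same matrix.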
Now assume that Equation (5) is true for a certain index $n$. In order to prove
Equation (5) for $n + 1$, let $1 \le i \le n$ and use Equation (6) to write
(8)
For $i < n$, both terms on the right side of Equation (8) are zero. For $i = n$ the
definition of $\alpha_n$ shows that the right side is zero, as in Equation (7). •
This shows that $[x_n]$ is a Cauchy sequence. Hence $x_n \to x$ for some $x$. By the
continuity of $A$, $y_n = Ax_n \to Ax$, and $y = Ax \in R(A)$.
Next we observe that $R(A)^\perp = 0$. Indeed, if $y \in R(A)^\perp$ then for every $x$,
(10) (m > 0)
(11)
(12)
(13)
This shows that the right side of Equation (12) represents the partial sum of the
Fourier series of $A^{-1}b - x_1$, if we use for this expansion the inner product
These two inner products lead to the same topology on $X$ because of Equation
(10). Hence $x_n - x_1 \to A^{-1}b - x_1$. •
In the conjugate direction algorithm, there is still some freedom in the choice
of the direction vectors $v_i$. In the conjugate gradient method, these vectors
are generated in such a way that (for each $n$) $x_n$ minimizes $\Phi$ on a certain
linear variety of dimension $n - 1$. The conjugate gradient algorithm appears in
a number of different versions. For a theoretical analysis of the method, this
version seems to be the best:
I. To start, let $x_1$ be arbitrary, and define $v_1 = b - Ax_1$.
II. Given $x_n$ and $v_n$, we set
(16)
It follows that
(21)
(22) $\alpha_n \ge 1/M$
(23) (vn,rn}/en ~ m
(24)
To prove Inequality (22), we start with Equation (15), written in the form
$$r_n = v_n + \beta_{n-1}v_{n-1}$$
Section 4.10 Methods Based on Homotopy and Continuation 237
Hence
$$\alpha_n = \langle r_n, v_n\rangle\,\langle v_n, Av_n\rangle^{-1} \ge 1/M$$
At this stage, we have established that
(1) $f(x) = 0$
Here $f$ can be a mapping from one Banach space to another, say $f : X \to Y$.
This problem is so general that it includes systems of algebraic equations, integral
equations, differential equations, and so on. We will describe a tactic called the
One then attempts to solve each equation $h(t_i, x) = 0$ $(0 \le i \le m)$. Assuming
that some iterative method will be used (such as Newton's method), it makes
sense to use the solution at the $i$th step as the starting point in computing a
solution at the $(i + 1)$st step.
This whole procedure is designed to cure the difficulty that plagues Newton's
method, viz., the need for a good starting point.
The relationship (2), which embeds the original problem (1) in a family of
problems, is an example of a homotopy that connects the two functions f and
g. In general, a homotopy can be any continuous connection between f and g.
Formally, a homotopy between two functions $f, g : X \to Y$ is a continuous map
(3) $$h : [0, 1] \times X \to Y$$
such that $h(0, x) = g(x)$ and $h(1, x) = f(x)$. If such a map exists, we say that $f$
is homotopic to g. This is an equivalence relation among the continuous maps
from X to Y, where X and Y can be any two topological spaces.
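The continuation idea described above can be sketched in a few lines of Python. The example is hypothetical and not from the text: it takes the scalar function $f(x) = x^3 + x - 10$ (whose root is $x = 2$), the trivial function $g(x) = x - x_0$ with $x_0 = 0$, and the convex-combination homotopy $h(t, x) = t f(x) + (1 - t) g(x)$, which satisfies $h(0, x) = g(x)$ and $h(1, x) = f(x)$. Each equation $h(t_i, x) = 0$ is solved by Newton's method, started from the solution found for $t_{i-1}$.

```python
def f(x):
    return x**3 + x - 10.0          # hypothetical target problem, root at x = 2

def df(x):
    return 3.0 * x * x + 1.0

X0 = 0.0                            # known root of the easy problem g(x) = x - X0

def h(t, x):
    return t * f(x) + (1.0 - t) * (x - X0)

def dh_dx(t, x):
    return t * df(x) + (1.0 - t)

def continuation(m=20, newton_iters=20, tol=1e-14):
    """Solve h(t_i, x) = 0 for t_i = i/m, warm-starting Newton at the previous root."""
    x = X0
    path = [x]
    for i in range(1, m + 1):
        t = i / m
        for _ in range(newton_iters):
            step = h(t, x) / dh_dx(t, x)
            x -= step
            if abs(step) < tol:
                break
        path.append(x)
    return path

path = continuation()
```

Here `path[-1]` is the computed root of $f$, and the intermediate entries trace the curve $x(t)$. The example was chosen so that $\partial h/\partial x > 0$ everywhere, which keeps the path free of turning points; that is a property of this particular $f$ and $g$, not of homotopies in general.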
An elementary homotopy that is often used in the continuation method is
(6) $0 = h(t, x(t))$
(8)
This is a differential equation for $x$. Its initial value is known, because $x(0)$ has
been assumed to be known. Upon integrating this differential equation across
the interval $0 \le t \le 1$ (usually by numerical procedures), one reaches the value
$x(1)$, which is the solution to Equation (1).
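Example 1. Let $X = Y = \mathbb{R}^2$, and define
$$f(x) = \begin{bmatrix} \sin \xi_1 + e^{\xi_2} - 3 \\ (\xi_2 + 3)^2 - \xi_1 - 4 \end{bmatrix}$$
$$h_x = f'(x) = \begin{bmatrix} \cos \xi_1 & e^{\xi_2} \\ -1 & 2\xi_2 + 6 \end{bmatrix}$$
$$[f'(x)]^{-1} = \frac{1}{\Delta} \begin{bmatrix} 2\xi_2 + 6 & -e^{\xi_2} \\ 1 & \cos \xi_1 \end{bmatrix} \qquad \Delta = (2\xi_2 + 6)\cos \xi_1 + e^{\xi_2}$$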
The differential equation that controls the path leading away from the point Xo is
Equation (8). In this concrete case it is a pair of ordinary differential equations:
When this system was integrated numerically on the interval $0 \le t \le 1$, the
terminal value of $x$ (at $t = 1$) was close to $(12, 1)$. In order to find a more
accurate solution, we can use Newton's iteration starting at the point produced
by the homotopy method. The Newton iteration replaces any approximate root
$x$ by $x - \delta$, the correction $\delta$ being defined by
(These matters are the subject of Section 3.3, beginning at page 125.) In the
current example, the vector $\delta$ is
        $\xi_1$                    $\xi_2$
k=0     12.000000000000000000      1.0000000000000000000
k=1     12.691334908752890571      1.0864168635941113213
k=2     12.628177397290770959      1.0777753827891591357
k=3     12.628268254380085321      1.0777773669468545670
k=4     12.628268254564651450      1.0777773669690025700
k=5     12.628268254564651450      1.0777773669690025700
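The Newton iterates in this table can be checked with a short Python sketch. It assumes that the mapping of Example 1 is $f(x) = (\sin\xi_1 + e^{\xi_2} - 3,\ (\xi_2 + 3)^2 - \xi_1 - 4)$, which is the reading of the example that is consistent with the tabulated root, and it solves the $2 \times 2$ linear system for the correction $\delta$ by the explicit inverse. The function names are illustrative.

```python
import math

def f(x):
    x1, x2 = x
    return (math.sin(x1) + math.exp(x2) - 3.0,
            (x2 + 3.0) ** 2 - x1 - 4.0)

def newton_step(x):
    """Replace x by x - delta, where f'(x) delta = f(x)."""
    x1, x2 = x
    a11, a12 = math.cos(x1), math.exp(x2)     # first row of f'(x)
    a21, a22 = -1.0, 2.0 * (x2 + 3.0)         # second row of f'(x)
    f1, f2 = f(x)
    det = a11 * a22 - a12 * a21
    d1 = (a22 * f1 - a12 * f2) / det
    d2 = (-a21 * f1 + a11 * f2) / det
    return (x1 - d1, x2 - d2)

iterates = [(12.0, 1.0)]                      # point delivered by the homotopy method
for _ in range(5):
    iterates.append(newton_step(iterates[-1]))
```

In double precision the iterates agree with the table to roughly the fifteen or sixteen digits that the arithmetic allows; the remaining digits of the table need higher-precision arithmetic.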
[Figure 4.2]
In an example such as this one, the differential equation need not be solved
numerically with high precision, because the objective is to end at a point near
the solution-in fact, near enough so that the classical Newton method will
succeed if started at that point.
A formal result that gives some conditions under which the homotopy
method will succeed is as follows. This result is from [OR].
Another way of describing the path $x(t)$ has been given by Garcia and Zangwill
[GZ]. We start with the equation $h(t, x) = 0$, assuming now that $x \in \mathbb{R}^n$ and
$t \in [0, 1]$. A vector $y \in \mathbb{R}^{n+1}$ is defined by
where $\xi_1, \xi_2, \ldots$ are the components of $x$. Thus our equation is simply $h(y) = 0$.
Each component of $y$, including $t$, is now allowed to be a function of an
independent variable $s$, and we write $h(y(s)) = 0$. Differentiation with respect
to $s$ leads to the basic differential equation
(9) $$h'(y)\,y'(s) = 0$$
The variables $s$ and $t$ start at 0. The initial value of $x$ is $x(0) = x_0$. Thus
suitable starting values are available for the differential equation (9).
Since $f$ and $g$ are maps of $\mathbb{R}^n$ into $\mathbb{R}^n$, $h$ is a map of $\mathbb{R}^{n+1}$ into $\mathbb{R}^n$. The
Fréchet derivative $h'(y)$ is therefore represented by an $n \times (n + 1)$ matrix, $A$.
The vector $y'(s)$ has $n + 1$ components, which we denote by $\eta_1', \eta_2', \ldots, \eta_{n+1}'$. By
appealing to the lemma below, we can obtain another form for Equation (9),
namely
(10) $(1 \le j \le n + 1)$
where $A_j$ is the $n \times n$ matrix that results from $A$ by deleting its $j$th column.
Let us illustrate this formalism with a problem similar to the one in Example 1.
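Example 2. Let $f$ be the mapping
$$f(x) = \begin{bmatrix} \xi_1^2 - 3\xi_2^2 + 3 \\ \xi_1\xi_2 + 6 \end{bmatrix}$$
We take the starting point $x_0 = (1, 1)$ and use the homotopy of Equation (4).
Then
$$h(t, x) = \begin{bmatrix} \xi_1^2 - 3\xi_2^2 + 2 + t \\ \xi_1\xi_2 - 1 + 7t \end{bmatrix}$$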
The differential equation (9) is given by
(11)
The derivatives in this system are with respect to $s$. Since we want $t$ to run from
0 to 1, it is clear (from the equation governing $t$) that we must let $s$ proceed
to the left. Alternatively, we can appeal to the homogeneity in the system, and
simply change the signs on the right side of (12). Following the latter course,
and performing a numerical integration, we arrive at these two points:
$s = .087$, $t = .969$, $\xi_1 = -2.94$, $\xi_2 = 1.97$
$s = .088$, $t = 1.010$, $\xi_1 = -3.02$, $\xi_2 = 2.01$
Either of these can be used to start a Newton iteration, as was done in Example
1. The path generated by this homotopy is shown in Figure 4.3. •
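As a cross-check on Example 2, here is a Python sketch that follows the same path with a cruder tool. It assumes the reading $f(x) = (\xi_1^2 - 3\xi_2^2 + 3,\ \xi_1\xi_2 + 6)$ and the homotopy $h(t,x) = f(x) - (1-t)f(x_0)$, which reproduces the displayed $h$. Instead of the parameter $s$, it uses $t$ itself as the independent variable: differentiating $h(t, x(t)) = 0$ gives $f'(x)\,x'(t) = -f(x_0)$, and this is integrated by Euler's method and then polished by Newton's iteration. Parametrizing by $t$ is an assumption that happens to work here because $\det f'(x) = 2\xi_1^2 + 6\xi_2^2$ does not vanish along the path.

```python
def f(x):
    x1, x2 = x
    return (x1 * x1 - 3.0 * x2 * x2 + 3.0, x1 * x2 + 6.0)

def jac(x):
    x1, x2 = x
    return ((2.0 * x1, -6.0 * x2), (x2, x1))

def solve2(J, rhs):
    """Solve the 2x2 system J d = rhs by the explicit inverse."""
    (a, b), (c, d) = J
    det = a * d - b * c
    return ((d * rhs[0] - b * rhs[1]) / det,
            (-c * rhs[0] + a * rhs[1]) / det)

def euler_path(x0, n_steps=2000):
    """Integrate x'(t) = -f'(x)^{-1} f(x0) from t = 0 to t = 1."""
    c = f(x0)
    x = x0
    dt = 1.0 / n_steps
    for _ in range(n_steps):
        dx = solve2(jac(x), c)
        x = (x[0] - dt * dx[0], x[1] - dt * dx[1])
    return x

def newton(x, iters=8):
    for _ in range(iters):
        d = solve2(jac(x), f(x))
        x = (x[0] - d[0], x[1] - d[1])
    return x

rough = euler_path((1.0, 1.0))   # crude endpoint of the homotopy path
root = newton(rough)             # polished by Newton's method
```

The Euler endpoint `rough` only needs to fall inside the region where Newton's method converges; `root` then agrees with the exact zero $(-3, 2)$ of $f$ to machine precision.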
[Figure 4.3: path generated by the homotopy in Example 2]
A drawback to the method used in Example 2 is that one has no a priori knowl-
edge of the value of s corresponding to t = 1. In practice, this may necessitate
several computer runs.
Proof. Select any row (for example the ith row) in A and adjoin a copy of it
as a new row at the top of A. This creates an (n + 1) x (n + 1) matrix B that
is obviously singular, because row i of A occurs twice in B. In expanding the
determinant of B by the elements in its top row we obtain
$$0 = \det B = \sum_{j=1}^{n+1} (-1)^j a_{ij} \det(A_j) = \sum_{j=1}^{n+1} a_{ij} x_j$$
In this equation $t$ will run from 0 to $\infty$. We seek a curve or path, $x = x(t)$, on
which
$$0 = h(t, x(t)) = f(x(t)) - e^{-t}f(x_0)$$
This is, of course, the formula for Newton's method. It is clear that one can
expect to obtain better results by solving the differential equation (14) with
a more accurate numerical method (incorporating a variable step size). These
matters have been thoroughly explored by Smale and others. See, for example,
[Sm].
Application to Linear Programming. The homotopy method can be
used to solve linear programming problems. This approach leads naturally to
the algorithm proposed in 1984 by Karmarkar [Kar]. In explaining the homotopy
method in this context, we follow closely the description in [BroS].
Consider the standard linear programming problem
(15) maximize $c^T x$ subject to $Ax = b$ and $x \ge 0$
Here, $c \in \mathbb{R}^n$, $x \in \mathbb{R}^n$, $b \in \mathbb{R}^m$, and $A$ is an $m \times n$ matrix. We start with a
feasible point, i.e., a point $x^0$ that satisfies the constraints. The feasible set
is
$$F = \{x \in \mathbb{R}^n : Ax = b \text{ and } x \ge 0\}$$
Our intention is to move from $x^0$ to a succession of other points, remaining
always in $F$, and increasing the value of the objective function, $c^T x$. It is
clear that if we move from $x^0$ to $x^1$, the difference $x^1 - x^0$ must lie in the null
space of $A$. We shall try to find a curve $t \mapsto x(t)$ in the feasible set, starting at
$x^0$ and leading to a solution of the extremal problem. Our requirements are
$$D(x) = \begin{bmatrix} x_1 & & & 0 \\ & x_2 & & \\ & & \ddots & \\ 0 & & & x_n \end{bmatrix}$$
If this is the case, then from Equations (15) and (17) we shall have
$$x_i' = x_i C_i(x)$$
and clearly $x_i' \to 0$ if $x_i \to 0$.
In order to satisfy requirement (ii), it suffices to require $Ax' = 0$. Indeed,
if $Ax' = 0$ then $Ax(t)$ is constant as a function of $t$. Since $Ax(0) = b$, we have
$Ax(t) = b$ for all $t$. Since $x' = F = DC$, we must require $ADC = 0$. This is
most conveniently arranged by letting $C = PH$, where $H$ is any function, and
$P$ is the orthogonal projection onto the null space of $AD$.
Finally, in order to secure property (iii), we should select $H$ so that $c^T x(t)$
is increasing. Thus, we want
$$0 < \frac{d}{dt}\bigl(c^T x(t)\bigr) = c^T x' = c^T F(x) = c^T DC = c^T DPH$$
A convenient choice for $H$ is $Dc$, for then we have (using $v = Dc$),
$$\{x : x \ge 0\}$$
$$Pv = v - B^T z$$
As mentioned earlier, the initial-value problem (18) need not be solved very
accurately. A variation of the Euler Method can be used. Recall that the Euler
Method for Equation (16) advances the solution by
Problems 4.10
x - 2y + y2 + y3 - 4 = -x - y + 2y2 - 1 = 0
by the homotopy method used in Example 2, starting with the point (0,0). (All the
calculations can be performed without recourse to numerical methods.)
2. Consider the homotopy $h(t, x) = tf(x) + (1 - t)g(x)$, in which
$$f(x) = x^2 - 5x + 6 \qquad g(x) = x^2 - 1$$
f(t) =0 g(t) = 2
Distributions
Section 5.1 Definition and Examples 247
i=l
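If $\alpha$ is a multi-index, there is a partial differential operator $D^\alpha$ corresponding to
it. Its definition is
$$D^\alpha = \left(\frac{\partial}{\partial x_1}\right)^{\alpha_1} \left(\frac{\partial}{\partial x_2}\right)^{\alpha_2} \cdots \left(\frac{\partial}{\partial x_n}\right)^{\alpha_n} = \frac{\partial^{|\alpha|}}{\partial x_1^{\alpha_1} \cdots \partial x_n^{\alpha_n}}$$
This operates on functions of $n$ real variables $x_1, \ldots, x_n$. Thus, for example, if
$n = 3$ and $\alpha = (3, 0, 4)$, then
$$D^\alpha \phi = \frac{\partial^7 \phi}{\partial x_1^3\, \partial x_3^4}$$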
The space $C^\infty(\mathbb{R}^n)$ consists of all functions $\phi : \mathbb{R}^n \to \mathbb{R}$ such that $D^\alpha\phi \in
C(\mathbb{R}^n)$ for each multi-index $\alpha$. Thus, the partial derivatives of $\phi$ of all orders
exist and are continuous.
A vector space $\mathcal{D}$, called the space of test functions, is now introduced.
Its elements are all the functions in $C^\infty(\mathbb{R}^n)$ having compact support. The
support of a function $\phi$ is the closure of $\{x : \phi(x) \ne 0\}$. Another notation for
$\mathcal{D}$ is $C_c^\infty(\mathbb{R}^n)$. The value of $n$ is usually fixed in our discussion. If we want to
show $n$ in the notation, we can write $\mathcal{D}(\mathbb{R}^n)$.
At first glance, it may seem that $\mathcal{D}$ is empty! After all, an analytic function
that vanishes on an open nonempty set must be 0 everywhere. But that is a
theorem about complex-valued functions of complex variables, whereas we are
here considering real-valued functions of real variables.
An important example of a function in $\mathcal{D}$ is given by the formula
(1) $$\rho(x) = \begin{cases} c \cdot \exp\bigl((|x|^2 - 1)^{-1}\bigr) & \text{if } x \in \mathbb{R}^n \text{ and } |x| < 1 \\ 0 & \text{if } x \in \mathbb{R}^n \text{ and } |x| \ge 1 \end{cases}$$
where $c$ is chosen so that $\int \rho(x)\,dx = 1$. Here and elsewhere we use $|x|$ for the
Euclidean norm:
$$|x| = \Bigl(\sum_{i=1}^{n} x_i^2\Bigr)^{1/2}$$
The graph of $\rho$ in the case $n = 1$ is shown in Figure 5.1.
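For readers who want to experiment with $\rho$, the following Python sketch evaluates it in the case $n = 1$ and estimates the normalizing constant $c$ by a midpoint rule. The quadrature routine `midpoint` and the number of subintervals are arbitrary choices made for this illustration.

```python
import math

def bump(x):
    """exp(1/(x^2 - 1)) for |x| < 1 and 0 otherwise (rho without the constant c)."""
    if abs(x) >= 1.0:
        return 0.0
    return math.exp(1.0 / (x * x - 1.0))

def midpoint(fun, a, b, n):
    """Composite midpoint rule with n subintervals."""
    width = (b - a) / n
    return width * sum(fun(a + (k + 0.5) * width) for k in range(n))

integral = midpoint(bump, -1.0, 1.0, 20000)
c = 1.0 / integral                      # normalizing constant for n = 1

def rho(x):
    return c * bump(x)
```

The computed `c` is a little larger than 2, and `rho` vanishes identically outside $(-1, 1)$ while remaining smooth at $x = \pm 1$, which is the point of the two lemmas that follow.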
[Figure 5.1: graph of $\rho$ for $n = 1$]
The fact that $\rho \in \mathcal{D}$ is not at all obvious, and the next two lemmas are inserted
solely to establish this fact.
248 Chapter 5 Distributions
where $Q(x) = x^2[P(x) - P'(x)]$. By the first part of the proof, $\lim_{x \downarrow 0} f'(x) = 0$. It
remains only to be proved that $f'(0) = 0$. We have, by the mean value theorem,
where $\xi(h)$ is strictly between 0 and $h$. (Note that $h$ can be positive or negative
in this argument.) We have shown that
$$f'(x) = \begin{cases} Q(1/x)e^{-1/x} & x > 0 \\ 0 & x \le 0 \end{cases}$$
This has the same form as $f$, and therefore $f'$ is continuous. The argument can
be repeated indefinitely. The reader should observe that our argument requires
the following version of the mean value theorem: If $g$ is continuous on $[a, b]$ and
differentiable on $(a, b)$, then for some $\xi$ in $(a, b)$
Proof. The function $f$ in the preceding lemma (with $P(x) = 1$) has the
property that $\rho(x) = cf(1 - |x|^2)$. Thus $\rho = c\,f \circ g$, where $g(x) = 1 - |x|^2$ and belongs
to $C^\infty(\mathbb{R}^n)$. By the chain rule, $D^\alpha\rho$ can be expressed as a sum of products of
ordinary derivatives of $f$ with various partial derivatives of $g$. Since these are
all continuous, $D^\alpha\rho \in C(\mathbb{R}^n)$ for all multi-indices $\alpha$. •
(2)
play a role in certain arguments, such as in Sections 5.5 and 6.8. They, too, are
mollifiers.
The linear space $\mathcal{D}$ is now furnished with a notion of sequential convergence.
A sequence $[\phi_j]$ in $\mathcal{D}$ converges to 0 if there is a single compact set $K$ containing
the supports of all $\phi_j$, and if for each multi-index $\alpha$,
We write $\phi_j \twoheadrightarrow 0$ if these two conditions are fulfilled. Further, we write $\phi_j \twoheadrightarrow \phi$
if and only if $\phi_j - \phi \twoheadrightarrow 0$. The use of the symbol $\twoheadrightarrow$ is to remind the reader of
the special nature of convergence in $\mathcal{D}$. Uniform convergence to 0 on $K$ of the
sequence $D^\alpha\phi_j$ means that
$$\sup_{x \in K} |(D^\alpha\phi_j)(x)| \to 0 \quad \text{as } j \to \infty$$
$$\sup_{x \in \mathbb{R}^n} |(D^\alpha\phi_j)(x)| \to 0$$
Continuity and other topological notions will be based upon the convergence
of sequences as just defined. In particular, a map $F$ from $\mathcal{D}$ into a topological
space is continuous if the condition $\phi_j \twoheadrightarrow \phi$ implies the condition $F(\phi_j) \to
F(\phi)$. The legitimacy of defining topological notions by means of sequential
convergence is a matter that would require an excursus into the theory of locally
convex linear topological spaces. We refer the reader to [Ru1] for these matters.
The next result gives an example of this type of continuity.
(3) $(\phi \in \mathcal{D})$
(4) $$\tilde{H}(\phi) = \int_0^\infty \phi(x)\,dx \qquad (\phi \in \mathcal{D})$$
•
Example 3. Let $f : \mathbb{R}^n \to \mathbb{R}$ be continuous. With $f$ we associate a distribution
$\tilde{f}$ by means of the definition
(5) $$\tilde{f}(\phi) = \int f(x)\phi(x)\,dx \qquad (\phi \in \mathcal{D})$$
The linearity of $\tilde{f}$ is obvious. For the continuity, we observe that if $\phi_j \twoheadrightarrow 0$, then
there is a compact $K$ containing the supports of the $\phi_j$. Then we have
(¢ E 1»
$$H(x) = \begin{cases} 1 & \text{if } x \ge 0 \\ 0 & \text{if } x < 0 \end{cases}$$
then Example 2 above illustrates the principle in Example 3, although $H$ is
obviously not continuous. •
$$T(\phi) = \int_{\mathbb{R}^n} \phi(x)\,d\mu(x) \qquad (\phi \in \mathcal{D})$$
Consequently,
Problems 5.1
1. Describe the null space of $D^\alpha$ in the case $n = 2$. Do this first when the domain is $C^\infty(\mathbb{R}^n)$
and second when it is $\mathcal{D}$.
2. Let $f : \mathbb{R} \to \mathbb{R}$. Suppose that $f'$ exists and is continuous in the two intervals
$(-\infty, 0)$, $(0, \infty)$. Assume further that $\lim_{x \downarrow 0} f'(x) = \lim_{x \uparrow 0} f'(x)$. Does it follow that $f'$
is continuous on $\mathbb{R}$? Examples and theorems are wanted.
3. Prove that for each $x_0 \in \mathbb{R}^n$ and for each $r > 0$ there is an element $\phi$ of $\mathcal{D}$ such that the
set $\{x : \phi(x) \ne 0\}$ is the open ball $B(x_0, r)$ having center $x_0$ and radius $r$.
4. Prove that if $\mathcal{O}$ is any bounded open set in $\mathbb{R}^n$, then there exists an element $\phi$ of $\mathcal{D}$ such
that $\{x : \phi(x) \ne 0\} = \mathcal{O}$. Hints: Use the functions in Problem 3. Maybe a series of such
functions $\sum 2^{-k}\phi_k$ will be useful. Don't forget that the points of $\mathbb{R}^n$ whose coordinates
are rational form a dense set.
5. For each $v \in \mathbb{R}^n$ there is a translation operator $E_v$ on $\mathcal{D}$. Its definition is $(E_v\phi)(x) =
\phi(x - v)$. Prove that $E_v$ is linear, continuous, injective, surjective, and invertible from
$\mathcal{D}$ to $\mathcal{D}$.
6. For each $\phi$ in $C^\infty(\mathbb{R}^n)$ there is a multiplication operator $M_\phi$ defined on $\mathcal{D}$ by the
equation $M_\phi\psi = \phi\psi$. Prove that $M_\phi$ is linear and continuous from $\mathcal{D}$ into $\mathcal{D}$. Under
what conditions will $M_\phi$ be injective? surjective? invertible?
7. For suitable $\phi$ there is a composition operator $C_\phi$ defined on $\mathcal{D}$ by the equation
$C_\phi\psi = \psi \circ \phi$. What must be assumed about $\phi$ in order that $C_\phi$ map $\mathcal{D}$ into $\mathcal{D}$? Prove
that $C_\phi$ is linear and continuous from $\mathcal{D}$ to $\mathcal{D}$. Find conditions for $C_\phi$ to be injective,
surjective, or invertible.
8. Prove that $E_v$, as defined in Problem 5, has this property for all test functions $\phi$ and $\psi$:
9. Prove that if $T$ is a distribution and if $A$ is a continuous linear map of $\mathcal{D}$ into $\mathcal{D}$, then
$T \circ A$ is a distribution. Use the notation of the preceding problems and identify $\delta_\xi \circ E_v$,
$\delta_\xi \circ M_\phi$, and $\delta_\xi \circ C_\phi$ in elementary terms. What is $(\delta_\xi \circ E_v \circ M_\theta \circ C_\psi)(\phi)$?
10. Show that $D^\alpha D^\beta = D^{\alpha+\beta}$, and that consequently $D^\alpha D^\beta = D^\beta D^\alpha$.
11. Let $\phi \in \mathcal{D}$. Prove that if there exists a multi-index $\alpha$ for which $D^\alpha\phi = 0$, then $\phi = 0$.
Suggestion: Do the cases $|\alpha| = 0$ and $|\alpha| = 1$ first. Proceed by induction on $|\alpha|$.
12. Prove (in detail) that each test function is uniformly continuous.
13. Prove that $\mathcal{D}$ is a ring without unit under pointwise multiplication. Prove that $\mathcal{D}$ is an
ideal in the ring $C^\infty(\mathbb{R}^n)$. This means that $f\phi \in \mathcal{D}$ when $f \in C^\infty$ and $\phi \in \mathcal{D}$.
14. For $\phi \in \mathcal{D}(\mathbb{R})$, define $T(\phi) = \sum_{k=0}^{\infty} (D^k\phi)(k)$. Prove that $T$ is a distribution.
15. Give an example of a sequence $[\phi_j]$ in $\mathcal{D}$ such that $[D^\alpha\phi_j]$ converges uniformly to 0 for
each multi-index $\alpha$, yet $[\phi_j]$ does not converge to 0 in the topology of $\mathcal{D}$.
16. Show that $\mathrm{supp}(\phi)$ is not always the same as $\{x : \phi(x) \ne 0\}$. Which of these sets contains
the other? When are these sets identical?
Section 5.2 Derivatives of Distributions 253
17. A distribution $T$ is said to be of order 0 if there is a constant $C$ such that $|T(\phi)| \le
C\|\phi\|_\infty$ (for all test functions $\phi$). Which regular distributions are of order 0?
18. Prove that the Dirac distributions in Example 1 are not regular.
19. Give a rough estimate of $c$ in Equation (1). (Start with $n = 1$.)
20. Show that the notion of convergence in $\mathcal{D}$ is consistent with the linear structure in $\mathcal{D}$.
We have seen that the space $\mathcal{D}'$ of distributions is very large; it contains
(images of) all continuous functions on $\mathbb{R}^n$ and even all locally integrable functions.
Then, too, it contains functionals on $\mathcal{D}$ that are not readily identified with
functions. Such, for example, is the Dirac distribution, which is a "point-evaluation"
functional. We now will define derivatives of distributions, taking care that
the new notion of derivative will coincide with the classical one when both are
meaningful.
Definition. If $T$ is a distribution and $\alpha$ is a multi-index, then $\partial^\alpha T$ is the
distribution defined by
(1) $$(\partial^\alpha T)(\phi) = (-1)^{|\alpha|}\,T(D^\alpha\phi) \qquad (\phi \in \mathcal{D})$$
Notice that it is a little simpler to write $\partial^\alpha T = T \circ (-D)^\alpha$. The first
question is whether $\partial^\alpha T$ is a distribution. Its linearity is clear, since $T$ and $D^\alpha$
are linear. Its continuity follows by the same reasoning. (Here Theorem 1 from
the preceding section is needed.)
The next question is whether this new definition is consistent with the old.
Let $f$ be a function on $\mathbb{R}^n$ such that $D^\alpha f$ exists and is continuous whenever
$|\alpha| \le k$. Then $\tilde{f}$ is a distribution, and when $|\alpha| \le k$,
(2) $$\partial^\alpha \tilde{f} = \widetilde{D^\alpha f}$$
To verify this, we write (for any test function $\phi$)
(3) $$\widetilde{D^\alpha f}(\phi) = \int (D^\alpha f)\phi = (-1)^{|\alpha|} \int f\,D^\alpha\phi = (-1)^{|\alpha|}\,\tilde{f}(D^\alpha\phi) = (\partial^\alpha \tilde{f})(\phi)$$
In this calculation integration by parts was used repeatedly. Here is how a single
integration by parts works:
$$\int_{-\infty}^{\infty} \frac{\partial f}{\partial x_i}\,\phi\,dx_i = f\phi\,\Big|_{-\infty}^{\infty} - \int_{-\infty}^{\infty} f\,\frac{\partial \phi}{\partial x_i}\,dx_i$$
Since $\phi \in \mathcal{D}$, $\phi$ vanishes outside some compact set, and the first term on the
right-hand side of the equation is zero. Each application of integration by parts
transfers one derivative from $f$ to $\phi$ and changes the sign of the integral. The
number of these steps is $|\alpha| = \sum_{i=1}^{n} \alpha_i$.
Now, it can happen that $\partial^\alpha \tilde{f} \ne \widetilde{D^\alpha f}$ for a function $f$ that does not have
continuous partial derivatives. For an example, the reader should consult [Ru1],
page 144.
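The definition can be tried out numerically on the Heaviside distribution in one variable. By definition, $(\partial \tilde{H})(\phi) = -\tilde{H}(\phi') = -\int_0^\infty \phi'(x)\,dx$, and the fundamental theorem of calculus says this equals $\phi(0)$, i.e., the Dirac distribution applied to $\phi$. The Python sketch below checks this for shifted copies of the test function $\exp(-1/(1 - x^2))$; the quadrature routine and the particular shifts are arbitrary choices made for the illustration.

```python
import math

def phi(x):
    """Test function exp(-1/(1 - x^2)) supported on [-1, 1]."""
    if abs(x) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - x * x))

def dphi(x):
    """Classical derivative of phi."""
    if abs(x) >= 1.0:
        return 0.0
    u = 1.0 - x * x
    return -2.0 * x / (u * u) * math.exp(-1.0 / u)

def midpoint(fun, a, b, n=20000):
    width = (b - a) / n
    return width * sum(fun(a + (k + 0.5) * width) for k in range(n))

def derivative_of_heaviside(shift):
    """(dH)(psi) = -integral over [0, inf) of psi', for psi(x) = phi(x - shift)."""
    upper = shift + 1.0                      # psi vanishes beyond shift + 1
    if upper <= 0.0:
        return 0.0
    return -midpoint(lambda x: dphi(x - shift), 0.0, upper)

value_at_zero_shift = derivative_of_heaviside(0.0)   # should be close to phi(0)
value_at_shift = derivative_of_heaviside(0.3)        # should be close to phi(-0.3)
```

In both cases the number produced by the distributional derivative matches the point evaluation $\psi(0)$, even though $H$ has no classical derivative at 0.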
Theorem 1. The operators $\partial^\alpha$ are linear from $\mathcal{D}'$ into $\mathcal{D}'$. Furthermore,
$\partial^\alpha\partial^\beta = \partial^\beta\partial^\alpha = \partial^{\alpha+\beta}$ for any pair of multi-indices.
Proof. The linearity of $\partial^\alpha$ is obvious from the definition, Equation (1). The
commutative property rests upon a theorem of classical calculus that states that
for any function $f$ of two variables, if $\dfrac{\partial^2 f}{\partial x\,\partial y}$ and $\dfrac{\partial^2 f}{\partial y\,\partial x}$ exist and are continuous,
then they are equal. Therefore, for any $\phi \in \mathcal{D}$, we have $D^\alpha D^\beta\phi = D^\beta D^\alpha\phi$.
Consequently, for an arbitrary distribution $T$ we have
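Proof. Prior to beginning the proof we define some linear maps. Let $\tilde{1}$ be the
distribution defined by the constant 1:
$$\tilde{1}(\phi) = \int_{-\infty}^{\infty} \phi(x)\,dx \qquad (\phi \in \mathcal{D})$$
$$A\phi = \phi - \tilde{1}(\phi)\psi \qquad (\phi \in \mathcal{D})$$
Here we used the elementary facts that $\tilde{1}(\phi') = 0$ and that $B\phi' = \phi$. •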
Proof. Adopt the notation of the preceding proof. The familiar equation
$$\phi(x) = \frac{d}{dx} \int_{-\infty}^{x} \phi(y)\,dy$$
says that $\phi = DB\phi$, and this is valid for all $\phi \in M$. Since $A\phi \in M$ for all $\phi \in \mathcal{D}$,
we have $A\phi = DBA\phi$ for all $\phi \in \mathcal{D}$. Consequently, if $\partial T = 0$, then for all test
functions
$$T(\phi) = T(A\phi + \tilde{1}(\phi)\psi) = T(DBA\phi) + \tilde{1}(\phi)T(\psi)$$
$$= -(\partial T)(BA\phi) + T(\psi)\tilde{1}(\phi)$$
$$= T(\psi)\tilde{1}(\phi)$$
Thus $T = \tilde{c}$, with $c = T(\psi)$. •
We state without proof a generalization of Theorem 2.
Theorem 4. If $T$ is a distribution and $K$ is a compact set in $\mathbb{R}^n$,
then there exists an $f \in C(\mathbb{R}^n)$ and a multi-index $\alpha$ such that for all
$\phi \in \mathcal{D}$ whose supports are in $K$,
2. Let $\delta$ and $\tilde{H}$ be the Dirac and Heaviside distributions. What are $\partial^n\delta$ and $\partial^n\tilde{H}$?
3. Find all the distributions $T$ for which $\partial^\alpha T = 0$ whenever $|\alpha| = 1$.
4. Use notation introduced in the proof of Theorem 2. Prove that $\tilde{1} \circ A = 0$, that $DBA = A$,
and that $A \circ D = D$. Prove that $B \circ A$ is continuous on $\mathcal{D}$.
5. The characteristic function of a set $A$ is the function $\chi_A$ defined by
$$\chi_A(s) = \begin{cases} 1 & \text{if } s \in A \\ 0 & \text{if } s \notin A \end{cases}$$
If $A = (a, b) \subset \mathbb{R}$, what is the distributional derivative of $\chi_A$?
6. For what functions $f$ on $\mathbb{R}$ is the equation $\partial\tilde{f} = \widetilde{f'}$ true?
7. Let $n = 1$. Prove that $D : \mathcal{D} \to \mathcal{D}$ is injective. Prove that $\partial : \mathcal{D}' \to \mathcal{D}'$ is not injective.
8. Work in $\mathcal{D}(\mathbb{R}^n)$. Let $\alpha$ be a multi-index such that $|\alpha| = 1$. Prove or disprove that
$D^\alpha : \mathcal{D} \to \mathcal{D}$ is injective. Prove or disprove that $\partial^\alpha : \mathcal{D}' \to \mathcal{D}'$ is injective.
9. Let $n = 1$. Is every test function the derivative of a test function?
Section 5.3 Convergence of Distributions 257
10. Let $f \in C^\infty(\mathbb{R})$ and let $H$ be the Heaviside function. Compute the distributional
derivatives of $fH$. Show by induction that $\partial^m(fH) = H D^m f + \sum_{k=0}^{m-1} D^k f(0)\,\partial^{m-k-1}\delta$.
11. Prove that the hyperplane $M$ defined in the proof of Theorem 2 is the range of the
operator $\dfrac{d}{dx}$ when the latter is interpreted as acting from $\mathcal{D}(\mathbb{R}^1)$ into $\mathcal{D}(\mathbb{R}^1)$.
12. If two locally integrable functions are the same except on a set of measure 0, then the
corresponding distributions are the same. If $H$ is the Heaviside function, then $H'(x) = 0$
except on a set of measure 0. Therefore, the distributional derivative of $H$ should be 0.
Explain the fallacy in this argument.
13. Find the distributional derivative of this function:
$$f(x) = \begin{cases} \cos x & x > 0 \\ \sin x & x \le 0 \end{cases}$$
The only real issue is whether T is continuous, and the proof of this requires
some topological vector space theory beyond the scope of this chapter.
•
The previous theorem and its corollaries stand in sharp contrast to the
situation that prevails for classical derivatives and functions. Thus one can
construct a pointwise convergent sequence of continuous functions whose limit
is discontinuous. For example, consider the functions fk shown in Figure 5.2.
[Figure 5.2: the functions $f_k$]
$$\frac{d}{dx} \sum f_k = \sum \frac{d}{dx} f_k$$
$$f(x) = \sum_{k=1}^{\infty} 2^{-k} \cos 3^k x$$
This function is continuous but not differentiable at any point! (This example
is treated in [Ti2] and [Ch]. See also Section 7.8 in this book, pages 374ff, where
some graphics are displayed.)
Example. Let $f_n(x) = \cos nx$. This sequence of functions does not converge.
Is the same true for the accompanying distributions $\tilde{f}_n$? To answer this, we take
any test function $\phi$ and contemplate the effect of $\tilde{f}_n$ on it:
Here the interval $[a, b]$ is chosen to contain the support of $\phi$. For large values
of $n$ the $C^\infty$ function is being integrated with the highly oscillatory function
$f_n$. This produces very small values because of a cancellation of positive areas
and negative areas. The limit will be zero, and hence $\tilde{f}_n \to 0$. This conclusion
can also be justified by writing $f_n = g_n'$, where $g_n(x) = \sin nx / n$. We see that
$g_n \to 0$ uniformly, and that the equations $\tilde{g}_n \to 0$ and $\tilde{f}_n \to 0$ follow, in $\mathcal{D}'$.
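The cancellation described in this example is easy to observe numerically. The following Python sketch evaluates $\tilde{f}_n(\phi) = \int \cos(nx)\,\phi(x)\,dx$ for the particular test function $\phi(x) = \exp(-1/(1 - x^2))$ on $(-1, 1)$; the choice of $\phi$ and of the midpoint rule are assumptions made for the illustration.

```python
import math

def phi(x):
    """Test function supported on [-1, 1]."""
    if abs(x) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - x * x))

def pairing(n, samples=20000):
    """Midpoint-rule approximation of the integral of cos(n x) phi(x) over [-1, 1]."""
    width = 2.0 / samples
    total = 0.0
    for k in range(samples):
        x = -1.0 + (k + 0.5) * width
        total += math.cos(n * x) * phi(x)
    return width * total

values = {n: pairing(n) for n in (1, 10, 50, 200)}
```

The entry for $n = 1$ is of moderate size, while the entries for $n = 50$ and $n = 200$ are very small, in line with $\tilde{f}_n \to 0$ in $\mathcal{D}'$ even though $\cos nx$ itself does not converge.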
•
Theorem 3. Let $f, f_1, f_2, \ldots$ belong to $L^1_{\mathrm{loc}}(\mathbb{R}^n)$, and suppose
that $f_j \to f$ pointwise almost everywhere. If there is an element
$g \in L^1_{\mathrm{loc}}(\mathbb{R}^n)$ such that $|f_j| \le g$, then $\tilde{f}_j \to \tilde{f}$ in $\mathcal{D}'$.
(1)
for all test functions $\phi$. We have $f_j\phi \in L^1(K)$ if $K$ is the support of $\phi$.
Furthermore, $|f_j\phi| \le g|\phi|$ and $(f_j\phi)(x) \to (f\phi)(x)$ almost everywhere. Hence by the
Lebesgue Dominated Convergence Theorem (Section 8.6, page 406), Equation
(1) is valid. •
$$\le \int_{|x| < r} |f_j\psi| + \int_{|x| \ge r} |f_j\psi|$$
$$\Bigl|\lim_j \tilde{f}_j(\phi) - \delta(\phi)\Bigr| \le \varepsilon$$
the difference between $(f')^{\sim}$ and $(\tilde f)'$. The prime symbol has different meanings in different contexts.
Before getting to the main topic of this section, let us record some results from
multivariate algebra and calculus.
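Recall the definition of the classical binomial coefficients:
(1) $$\binom{m}{k} = \begin{cases} \dfrac{m!}{k!\,(m-k)!} & \text{if } 0 \le k \le m \\ 0 & \text{otherwise} \end{cases}$$
These are the coefficients that make the Binomial Theorem true:
(2) $$(a+b)^m = \sum_{k=0}^{m} \binom{m}{k} a^k b^{m-k}$$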
$(1 \le i \le n)$
If $\beta \le \alpha$, then $\alpha - \beta$ is the multi-index whose components are $\alpha_i - \beta_i$. Finally, if $\beta \le \alpha$, we define
The function $x \mapsto x^\alpha$ is a monomial. For $n = 3$, here are seven typical monomials:
1
These are the building blocks for polynomials. The degree of a monomial $x^\alpha$ is defined to be $|\alpha|$. Thus, in the examples, the degrees are 10, 9, 3, 0, 1, 1, and 1. A polynomial in $n$ variables is a function
$$p(x) = \sum_{\alpha} c_\alpha x^\alpha$$
in which the sum is finite, the $c_\alpha$ are real numbers, and $\alpha \in \mathbb{Z}^n_+$. The degree of $p$ is
$$\max\{|\alpha| : c_\alpha \ne 0\}$$
If all $c_\alpha$ are 0, then $p(x) = 0$, and we assign the degree $-\infty$ in this case. A polynomial of degree 0 is a constant function. Here are some examples, again with $n = 3$:
PI (x) = 3 + 2XI - 7X~X3 + 2XIX~X~
P2(X) = v2XIX3 - 7l"X~X~X3
These have degrees 12 and 9, respectively.
The completely general polynomial of degree at most k in n variables can
be written as
$$x \mapsto \sum_{|\alpha| \le k} c_\alpha x^\alpha$$
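$$D^\alpha = \prod_{i=1}^{n} \left(\frac{\partial}{\partial x_i}\right)^{\alpha_i} = \frac{\partial^{|\alpha|}}{\partial x_1^{\alpha_1}\,\partial x_2^{\alpha_2} \cdots \partial x_n^{\alpha_n}}$$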
A further definition is
$$\alpha! = \alpha_1!\,\alpha_2! \cdots \alpha_n! = \prod_{i=1}^{n} \alpha_i!$$
if 0 ~ f3 ~ a
otherwise
Proof·
Section 5.4 Multiplication of Distributions by Functions 263
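$$= \sum_{\beta_1=0}^{\alpha_1} \cdots \sum_{\beta_n=0}^{\alpha_n} \prod_{i=1}^{n} \binom{\alpha_i}{\beta_i} x_i^{\beta_i} y_i^{\alpha_i-\beta_i} = \sum_{0 \le \beta \le \alpha} \prod_{i=1}^{n} \binom{\alpha_i}{\beta_i} \prod_{j=1}^{n} x_j^{\beta_j} y_j^{\alpha_j-\beta_j}$$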
•
We will usually abbreviate the inner product $\langle x, y\rangle$ of two vectors $x, y \in \mathbb{R}^n$ by the simpler notation $xy$.
-_ '"'
L..
m ()
m. '"'
L.. L.,,x"m-j
x n+l
j=O ) l"l=j a.
$$J_k = \{\alpha \in J : \alpha_1 = k\}$$
Note that as $\alpha$ runs over $J_k$, the multi-indices $(\alpha_2, \ldots, \alpha_n)$ are all distinct. By the induction hypothesis, we then infer that for all $\alpha \in J_k$, $c_\alpha = 0$. Since $k$ runs from 0 to $m$, all $c_\alpha$ are 0. •
We want to calculate the dimension of $\Pi_m(\mathbb{R}^n)$. The following lemma is needed before this can be done. Its proof is left as a problem.
$$\#\{\alpha \in \mathbb{Z}^n_+ : |\alpha| \le m\} = \binom{m+n}{n}$$
$$\#\{\alpha \in \mathbb{Z}_+ : \alpha \le m\} = m + 1 = \binom{m+1}{1}$$
Assume that the formula is correct for a particular n. For the next case we write
$$= \sum_{k=0}^{m} \binom{k+n}{n} = \binom{m+n+1}{n+1}$$
(6)
(7)
•
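The counting formula $\#\{\alpha \in \mathbb{Z}^n_+ : |\alpha| \le m\} = \binom{m+n}{n}$ used in this section is easy to test by brute force for small $n$ and $m$. The sketch below is only an illustration; the function name `count_multi_indices` is hypothetical, and the enumeration is deliberately naive.

```python
from itertools import product
from math import comb


def count_multi_indices(n: int, m: int) -> int:
    """Count multi-indices alpha in Z_+^n with alpha_1 + ... + alpha_n <= m."""
    # Each component lies in 0..m, so enumerate the cube and filter by the sum.
    return sum(1 for alpha in product(range(m + 1), repeat=n) if sum(alpha) <= m)


if __name__ == "__main__":
    for n in range(1, 5):
        for m in range(0, 6):
            assert count_multi_indices(n, m) == comb(m + n, n)
    print("formula confirmed for n = 1..4 and m = 0..5")
```

A check of this kind does not replace the induction argument; it only guards against misreading the formula.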
Since $\mathcal{D}'$ is a vector space, a distribution can be multiplied by a constant to produce another distribution. Multiplication of a distribution $T$ by a function $f \in C^\infty(\mathbb{R}^n)$ can also be defined:
$$(f \cdot T)(\phi) = T(f\phi)$$
$$\partial(fT) = Df \cdot T + f \cdot \partial T$$
Proof·
h(x) = l x
f(y) dy.
$$\partial(T - \tilde h) = \partial T - \tilde f = 0$$
we conclude that $T - \tilde h = \tilde c$ for some constant $c$. (See Theorem 3 in Section 5.2, page 256.) Hence $T = \tilde h + \tilde c$.
If $u$ is not zero, let $v = \exp \int u\,dx$. Then $v' = vu$ and $v \in C^\infty(\mathbb{R})$. Then $vT$ is well-defined, and by Theorem 4,
Proof. The right side is just the limit of Riemann sums for the integral. In the case $n = 2$, we set up a lattice of points in $\mathbb{R}^2$. These points are of the form $(ih, jh) = h(i, j) = h\alpha$, where $\alpha$ runs over the set of all multi-integers, having positive or negative entries. Each square created by four adjacent lattice points has area $h^2$. •
Problems 5.4
1. Prove that if $v \in C^\infty(\mathbb{R}^n)$ and if $f \in L^1_{\mathrm{loc}}(\mathbb{R}^n)$, then $\widetilde{vf} = v\tilde f$.
2. For integers $n$ and $m$, $\binom{n}{m} = \binom{n}{n-m}$. Is a similar result true for multi-indices?
3. Prove that if $T_j \to T$ in $\mathcal{D}'$ and if $f \in C^\infty(\mathbb{R}^n)$, then $fT_j \to fT$.
4. Let $\theta$ be a test function such that $\theta(0) \ne 0$. Prove that every test function is the sum of a multiple of $\theta$ and a test function that vanishes at 0.
5. (Continuation) Prove that if $f \in C^k(\mathbb{R})$ and $f(0) = 0$, then $f(x)/x$, when defined appropriately at 0, is in $C^{k-1}(\mathbb{R})$.
6. (Continuation) Let $n = 1$ and put $\psi(x) = x$. Prove that a distribution $T$ that satisfies $\psi T = 0$ must be a scalar multiple of the Dirac distribution.
7. For fixed $\psi$ in $C^\infty(\mathbb{R}^n)$ there is a multiplication operator $M_\psi$ defined on $\mathcal{D}'$ by the equation $M_\psi T = \psi T$. Prove that $M_\psi$ is linear and continuous.
8. Prove that the product of a $C^\infty$-function and a regular distribution is a regular distribution.
9. Prove that if $h' = f$ and $h \in C^1(\mathbb{R})$, then $\partial\tilde h = \tilde f$.
10. Let $\phi \in C^\infty(\mathbb{R})$ and let $H$ be the Heaviside function. Compute $\partial(\phi\tilde H)$.
11. Prove the Leibniz formula for the product of a $C^\infty$-function and a distribution.
12. Define addition of multi-indices $\alpha$ and $\beta$ by the formula $(\alpha + \beta)_i = \alpha_i + \beta_i$ for $1 \le i \le n$. If $\beta \le \alpha$, we can define subtraction of $\beta$ from $\alpha$ by $(\alpha - \beta)_i = \alpha_i - \beta_i$. Define also $\alpha! = \alpha_1!\,\alpha_2! \cdots \alpha_n!$. Prove that
$$\binom{\alpha}{\beta} = \frac{\alpha!}{\beta!\,(\alpha - \beta)!} \quad \text{if } \beta \le \alpha$$
13. (Continuation) Express $(1 + |x|^2)^m$ as a linear combination of "monomials" $x^\alpha$. Here $x \in \mathbb{R}^n$, $m \in \mathbb{N}$, $\alpha \in \mathbb{N}^n$.
14. Prove that for any multi-index $\alpha$,
2
3 3
4 6 4
Each entry in Pascal's triangle (other than the 1's) is the sum of the two elements appearing above it to the right and left. Prove that this statement is correct, i.e., that
5.5 Convolutions
(1) $$(f * \phi)(x) = \int_{\mathbb{R}^n} f(y)\phi(x - y)\,dy$$
The integral will certainly exist if $\phi \in \mathcal{D}$ and if $f \in L^1_{\mathrm{loc}}(\mathbb{R}^n)$, because for each $x$, the integration takes place over a compact subset of $\mathbb{R}^n$. With a change of variable in the integral, $y = x - z$, one proves that
$$(f * \phi)(x) = \int_{\mathbb{R}^n} f(x - z)\phi(z)\,dz = (\phi * f)(x)$$
In taking the convolution of two functions, one can expect that some favor-
able properties of one factor will be inherited by the convolution function. This
vague concept will be illustrated now in several ways. Suppose that f is merely
integrable, while $\phi$ is a test function. In Equation (1), suppose that $n = 1$, and that we wish to differentiate $f * \phi$ (with respect to $x$, of course). On the right side of the equation, $x$ appears only in the function $\phi$, and consequently
$$(f * \phi)'(x) = \int_{-\infty}^{\infty} f(y)\phi'(x - y)\,dy$$
Since $\phi(x)$ vanishes outside the unit ball in $\mathbb{R}^n$, $\phi_j(x)$ vanishes outside the ball of radius $1/j$, as is easily verified. Hence in the equation above the only values of $z$ that have any effect are those for which $|z| < 1/j$. If $f$ is uniformly continuous, the calculation shows that $(f * \phi_j)(x)$ is close to $f(x)$, and we have therefore approximated $f$ by the smooth function $f * \phi_j$. Variations on this idea will appear from time to time.
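The approximation of $f$ by $f * \phi_j$ can be tried out numerically in one dimension. The sketch below is an illustration under stated assumptions, not the book's construction: it takes $\phi$ to be the bump $\exp(-1/(1-x^2))$ normalized to have integral 1, sets $\phi_j(x) = j\,\phi(jx)$, and evaluates the convolution with a midpoint rule. All function names are hypothetical.

```python
import math


def bump(x: float) -> float:
    """Unnormalized test function supported in [-1, 1]."""
    if abs(x) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - x * x))


def integrate(func, a: float, b: float, samples: int = 4000) -> float:
    """Midpoint rule on [a, b]."""
    h = (b - a) / samples
    return h * sum(func(a + (i + 0.5) * h) for i in range(samples))


BUMP_MASS = integrate(bump, -1.0, 1.0)


def phi_j(j: int, x: float) -> float:
    """Assumed rescaling j * phi(j x), with phi normalized so its integral is 1."""
    return j * bump(j * x) / BUMP_MASS


def mollify(f, j: int, x: float) -> float:
    """(f * phi_j)(x): integral of f(x - z) * phi_j(z) over |z| < 1/j."""
    return integrate(lambda z: f(x - z) * phi_j(j, z), -1.0 / j, 1.0 / j)


if __name__ == "__main__":
    # f = |x| is uniformly continuous but not differentiable at 0.
    for j in (1, 4, 16, 64):
        print(j, abs(mollify(abs, j, 0.0) - abs(0.0)))
```

For $f(x) = |x|$ the error at the origin is proportional to $1/j$, so the printed values decrease as $j$ grows.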
(5)
$$\widetilde{E_x f}(\phi) = \int (E_x f)\,\phi = \int f(y - x)\phi(y)\,dy = \int f(z)\phi(z + x)\,dz = \tilde f(E_{-x}\phi)$$
Proof. By linearity (see Problem 3), it suffices to consider the case when $\phi = 0$. If $\phi_j \twoheadrightarrow 0$ in $\mathcal{D}$, then for all $x$,
$$(T * \phi_j)(x) = T(E_x B\phi_j) \to 0$$
by the continuity of $B$, $E_x$, and $T$ (Problem 8).
•
Lemma 3. Let $[x_j]$ be a sequence of points in $\mathbb{R}^n$ converging to $x$. For each $\phi \in \mathcal{D}$,
(7)
Proof. If $K_1 = \{x, x_1, x_2, \ldots\}$ and if $K_2$ is the support of $\phi$, then (as is easily verified) the supports of $E_{x_j}\phi$ are contained in the compact set
Proof. Since $|t| < 1$, there is a single compact set $K$ containing the supports of $F_t\phi$ and $\partial\phi/\partial x_1$. By the mean value theorem (used twice) we have (for $0 < \theta, \theta' < 1$)
$$= \left| \frac{\partial\phi}{\partial x_1}(x) - \frac{\partial\phi}{\partial x_1}(x - \theta t e) \right|$$
The norm used here is the supremum norm on $K$. Our inequality shows that as $t \to 0$, $(F_t\phi)(x) \to \left(\frac{\partial\phi}{\partial x_1}\right)(x)$ uniformly in $x$ on $K$. Since $\phi$ can be any test function, we can apply our conclusion to $D^\alpha\phi$, inferring that $F_t D^\alpha\phi$ converges uniformly to $\frac{\partial}{\partial x_1} D^\alpha\phi$ on $K$. Since $D^\alpha$ commutes with $F_t$ (Problem 9) and with other derivatives, we conclude that $D^\alpha F_t\phi$ converges uniformly on $K$ to $D^\alpha \frac{\partial\phi}{\partial x_1}$. This proves that the convergence of $F_t\phi$ is in accordance with the notion of convergence adopted in $\mathcal{D}$. •
(9)
Proof. From Equations (3) and (2) we infer that
Hence
$$= \lim_{t \to 0} (F_t\phi)(y)$$
$$D(T * \phi) = T * D\phi$$
By iteration of this basic result we obtain, for any multi-index $\alpha$,
•
Corollary. If $T \in \mathcal{D}'$ and $\phi \in \mathcal{D}$, then $T * \phi \in C^\infty(\mathbb{R}^n)$.
Proof. We have to prove that $D^\alpha(T * \phi) \in C(\mathbb{R}^n)$ for all multi-indices $\alpha$. Put $\psi = D^\alpha\phi$. Then by the theorem,
5. Fixing a distribution $T$, define the convolution operator $G_T$ by $G_T\phi = T * \phi$. Show that $\delta_x G_T = T E_x B$.
6. Prove that the vector sum of two compact sets in $\mathbb{R}^n$ is compact. Show by example that the vector sum of two closed sets need not be closed. Show that the vector sum of a compact set and a closed set is closed.
7. For $2\pi$-periodic functions, define $(f * g)(x) = \int_0^{2\pi} f(y)g(x - y)\,dy$. Compute the convolution of $f(x) = \sin x$ and $g(x) = \cos x$.
8. Prove that $B$ and $E_x$ are continuous linear maps of $\mathcal{D}$ into $\mathcal{D}$. Are they injective? Are they surjective?
9. Prove that $D^\alpha E_x = E_x D^\alpha$.
10. What is $\delta * \phi$?
11. Which of these equations is (or are) valid?
(a) $B(E_x(\phi(y))) = \phi(x - y)$
(b) $(B(E_x\phi))(y) = \phi(x - y)$
(c) $E_x(B(\phi(y))) = \phi(-x - y)$
(d) $E_x(B(\phi(y))) = \phi(x - y)$
The constants $c_\alpha$ may be complex numbers. Clearly, $A$ can be applied to any function in $C^m(\mathbb{R}^n)$.
Example 1. What are the fundamental solutions of the operator $D$ in the case $n = 1$? ($D = \frac{d}{dx}$.) We seek all the distributions $T$ that satisfy $\partial T = \delta$. We saw in Example 1 of Section 5.2 (page 254) that $\partial\tilde H = \delta$, where $H$ is the Heaviside function. Thus $\tilde H$ is one of the fundamental solutions. Since the distributions sought are exactly those for which $\partial T = \partial\tilde H$, we see by Theorem 3 in Section 5.2 (page 256) that $T = \tilde H + \tilde c$ for some constant $c$.
For the proof of this basic theorem, consult [Ho] page 189, or [Ru1] page 195.
The next theorem reveals the importance of fundamental solutions in the study
of partial differential equations.
Proof. Let $A = \sum c_\alpha D^\alpha$. Then $\sum c_\alpha \partial^\alpha T = \delta$. The basic formula (the theorem of Section 5, page 271) states that
$$D^\alpha(T * \phi) = \partial^\alpha T * \phi$$
From this we conclude that
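$$\partial(v \cdot T) = Dv \cdot T + v \cdot \partial T = av \cdot T + v \cdot (\delta - aT) = v \cdot \delta = \delta$$
Consequently, by Example 1,
$$v \cdot T = \tilde H + \tilde c \quad \text{and} \quad T = \frac{1}{v}(\tilde H + \tilde c)$$
Thus $T$ is a regular distribution $\tilde f$, and since $c$ is arbitrary, we use $c = 0$, arriving at
$$f(x) = e^{-ax} H(x)$$
A solution to the differential equation is then given by
$$(f * \phi)(x) = \int_{-\infty}^{\infty} f(y)\phi(x - y)\,dy = \int_0^{\infty} e^{-ay}\phi(x - y)\,dy$$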
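The integral formula can be checked numerically. The calculation above uses $\partial T = \delta - aT$, which suggests that the operator in question is $D + a$; on that assumption, $u(x) = \int_0^\infty e^{-ay}\phi(x-y)\,dy$ should satisfy $u' + au = \phi$. The sketch below tests this for one test function (the bump $\exp(-1/(1-x^2))$, chosen only for illustration) using a midpoint rule and a central difference. The names `u` and `residual` are hypothetical.

```python
import math


def bump(x: float) -> float:
    """Test function supported in [-1, 1]."""
    if abs(x) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - x * x))


def u(x: float, a: float, samples: int = 4000) -> float:
    """u(x) = integral over y >= 0 of exp(-a y) * bump(x - y) dy (midpoint rule)."""
    # bump(x - y) vanishes unless x - 1 < y < x + 1, so the range is finite.
    lo, hi = max(0.0, x - 1.0), x + 1.0
    if hi <= lo:
        return 0.0
    h = (hi - lo) / samples
    total = 0.0
    for i in range(samples):
        y = lo + (i + 0.5) * h
        total += math.exp(-a * y) * bump(x - y)
    return total * h


def residual(x: float, a: float, step: float = 1e-4) -> float:
    """Approximate u'(x) + a u(x) - bump(x) with a central difference for u'."""
    du = (u(x + step, a) - u(x - step, a)) / (2.0 * step)
    return du + a * u(x, a) - bump(x)


if __name__ == "__main__":
    for x in (-0.5, 0.0, 0.5, 1.5):
        print(x, residual(x, a=2.0))
```

The residuals are at the level of the discretization error, consistent with $u$ solving the equation.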
Section 5.6 Differential Operators 275
Lemma 1. For $x \ne 0$, $\dfrac{\partial}{\partial x_j}|x| = x_j|x|^{-1}$.
Proof·
Lemma 2.
Proof·
For reasons that become clear later, we require a function $g$ (not a constant) such that $\Delta g(|x|) = 0$ throughout $\mathbb{R}^n$, with the exception of the singular point $x = 0$. By Lemma 3, we see that $g$ must satisfy the following differential equation, in which the notation $r = |x|$ has been introduced:
$$g''(r) + \frac{n-1}{r}\,g'(r) = 0$$
(2)
For sufficiently small $\varepsilon$, the support of $\phi$ will be contained in $\{x : |x| \le \varepsilon^{-1}\}$. The integral in (2) can be over the set
An appeal will be made to Green's Second Identity, which states that for regions $\Omega$ satisfying certain mild hypotheses,
(3)
The boundary of $A_\varepsilon$ is the union of two spheres whose radii are $\varepsilon$ and $\varepsilon^{-1}$. On the outer boundary, $\phi = \nabla\phi = 0$ because the support of $\phi$ is interior to $A_\varepsilon$. The following computation will also be needed:
Hence when $\varepsilon \to 0$, this term approaches 0. The symbol $\sigma_n$ represents the "area" of the unit sphere in $\mathbb{R}^n$. As for the other term,
$$\left| \int_{|x|=\varepsilon} [\phi(x) - \phi(0)]\,\nabla|x|^{2-n} \cdot N\,dS \right| \le (n - 2) \int_{|x|=\varepsilon} |x|^{1-n}\,|\phi(x) - \phi(0)|\,dS$$
In this calculation, $\omega(\varepsilon)$ is the maximum of $|\phi(x) - \phi(0)|$ on the sphere defined by $|x| = \varepsilon$. Obviously, $\omega(\varepsilon) \to 0$, because $\phi$ is continuous. Thus the integral in Equation (3) is
Hence this is the value of the integral in Equation (1). We have established, therefore, that $\Delta\tilde f = (2 - n)\sigma_n\delta$. Summarizing, we have the following result.
$$(A\tilde f)(\phi) = \int_{-\infty}^{\infty} f\,(\phi'' - 2a\phi' + b\phi)$$
Guided by previous examples, we guess that $f$ should have as its support the interval $[0, \infty)$. The integral above then is restricted to the same interval. Using integration by parts, we obtain
The easiest way to make this last expression simplify to $\phi(0)$ is to define $f$ on $[0, \infty)$ in such a way that
This is an initial-value problem, which can be solved by writing down the general
solution of the equation in (i) and adjusting the coefficients in it to achieve (ii)
and (iii). The characteristic equation of the differential equation in (i) is
x~O
x<O
(5)
(6)
Notice that the parentheses in Equation (5) are necessary because $c_\alpha T \circ D$ is ambiguous; it could mean $(c_\alpha T) \circ D$.
It is useful to define the formal adjoint of the operator $A$ in Equation (4). It is
(7)
Notice that this definition is in harmony with the definition of adjoint for operators on Hilbert space, for we have
$$(AT)(\phi) = T(A^*\phi) \qquad (T \in \mathcal{D}',\ \phi \in \mathcal{D})$$
and this can be written in the notation of linear functionals as
$$\langle AT, \phi\rangle = \langle T, A^*\phi\rangle \qquad (T \in \mathcal{D}',\ \phi \in \mathcal{D})$$
Using Example 4 as a model, we can now prove a theorem about fundamental solutions of ordinary differential operators (i.e., $n = 1$).
(iii) $(0 \le j \le m - 2)$
Problems 5.6
1. If the coefficients $c_\alpha$ are constants, then the formal adjoint of the operator $\sum c_\alpha D^\alpha$ is $\sum (-1)^{|\alpha|} c_\alpha D^\alpha$. If the former is denoted by $A$, then the latter is denoted by $A^*$. Prove that for any distribution $T$, $AT = T \circ A^*$.
2. (Continuation) Prove that the Laplacian
$$\Delta = \sum_{i=1}^{n} \left(\frac{\partial}{\partial x_i}\right)^2$$
is self-adjoint; i.e., $\Delta^* = \Delta$.
3. Solve the equation $Y'' + 2Y' + Y = \delta + \delta'$ in the distribution sense, using a function of the form $Y(x) = H(x)f(x)$.
4. If $P$ is a polynomial in $n$ variables, say $P = \sum c_\alpha x^\alpha$, and if $D$ is the $n$-tuple
10. Let $n = 1$, and find a fundamental solution of the operator $\frac{d^2}{dx^2}$. Use it to give a solution to $u'' = \phi$ in the form of an integral.
11. Let $n = 1$ and $f(x) = e^{ik|x|}$. Show that a multiple of $\tilde f$ is a fundamental solution of the operator $\frac{d^2}{dx^2} + \cdots$
13. In Examples 2 and 3 find more general solutions by retaining the constant in $H + c$.
14. Complete Example 4 by obtaining the fundamental solution when $d = 0$.
Section 5.7 Distributions with Compact Support 281
Proof. Define
$$g(x) = \begin{cases} \exp[x^2/(x^2 - 1)] & |x| < 1 \\ 0 & |x| \ge 1 \end{cases}$$
and
$$f(x) = \begin{cases} g(x - 1) & x \ge 1 \\ 1 & \text{otherwise} \end{cases}$$
The graphs of $f$ and $g$ are shown in Figure 5.3. •
Figure 5.3 (graphs of $f$ and $g$, each plotted for $-2 \le x \le 2$)
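The two functions defined in this proof are easy to evaluate, which helps in reading the figure. The following sketch simply transcribes the formulas $g(x) = \exp[x^2/(x^2-1)]$ for $|x| < 1$ (zero otherwise) and $f(x) = g(x-1)$ for $x \ge 1$ (one otherwise); it is an illustration, and the sample points are arbitrary.

```python
import math


def g(x: float) -> float:
    """exp[x^2 / (x^2 - 1)] for |x| < 1, and 0 for |x| >= 1."""
    if abs(x) >= 1.0:
        return 0.0
    return math.exp(x * x / (x * x - 1.0))


def f(x: float) -> float:
    """g(x - 1) for x >= 1, and 1 otherwise."""
    return g(x - 1.0) if x >= 1.0 else 1.0


if __name__ == "__main__":
    # f equals 1 up to x = 1, decreases on [1, 2], and vanishes from x = 2 on.
    for x in (-2.0, 0.0, 1.0, 1.25, 1.5, 1.75, 2.0, 3.0):
        print(x, f(x))
```

Since $g(0) = 1$, the two pieces of $f$ agree at $x = 1$, and $f$ makes a smooth transition from the value 1 to the value 0 across the interval $[1, 2]$.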
Proof. Use the function f from the preceding lemma, and define
on a neighborhood of K.
Proof. (Rudin) Let $[B(x_i, r_i)]$ denote the sequence of all closed balls in $\mathbb{R}^n$ having rational center $x_i$, rational radius $r_i$, and contained in a member of $\mathcal{A}$. By the preceding lemma, there exists for each $i$ a test function $\psi_i$ such that $0 \le \psi_i \le 1$, $\psi_i(x) = 1$ on $B(x_i, r_i/2)$, and $\psi_i(x) = 0$ outside of $B(x_i, r_i)$. Put $\phi_1 = \psi_1$ and
(1) $$\phi_i = (1 - \psi_1)(1 - \psi_2) \cdots (1 - \psi_{i-1})\psi_i \qquad (i \ge 2)$$
It is clear that on the complement of $B(x_i, r_i)$, we have $\psi_i(x) = 0$ and $\phi_i(x) = 0$. By induction we now prove that
(2) $$\phi_1 + \cdots + \phi_i = 1 - (1 - \psi_1)(1 - \psi_2) \cdots (1 - \psi_i)$$
Equation (2) is obviously correct for $i = 1$. If it is correct for the index $i - 1$, then it is correct for $i$ because
$$\phi_1 + \cdots + \phi_i = 1 - [(1 - \psi_1) \cdots (1 - \psi_{i-1})] + [(1 - \psi_1) \cdots (1 - \psi_{i-1})]\psi_i = 1 - [(1 - \psi_1) \cdots (1 - \psi_{i-1})(1 - \psi_i)]$$
Since $0 \le \psi_i \le 1$ for all $i$, we see from Equation (2) that
$$0 \le \sum_{i=1}^{\infty} \phi_i(x) \le 1$$
On the other hand, if $x \in \bigcup_{i=1}^{m} B(x_i, r_i/2)$, then $\psi_i(x) = 1$ for some $i$ in $\{1, \ldots, m\}$. Then $\phi_1(x) + \cdots + \phi_m(x) = 1$ from Equation (2). Since the open balls $B(x_i, r_i/2)$ cover $\Omega$, each compact set $K$ in $\Omega$ is contained in a finite union $\bigcup_{i=1}^{m} B(x_i, r_i/2)$. This establishes (c). •
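The algebra behind this proof is a telescoping identity that holds for any numbers $\psi_1, \ldots, \psi_m$, and it can be checked mechanically. The sketch below is an illustration only: it treats the values $\psi_i(x)$ at a single point $x$ as plain numbers in $[0, 1]$, forms $\phi_i = (1-\psi_1)\cdots(1-\psi_{i-1})\psi_i$, and compares $\sum \phi_i$ with $1 - \prod (1-\psi_i)$. The function names are hypothetical.

```python
import random


def phis_from_psis(psis):
    """Return [phi_1, ..., phi_m] with phi_i = (1-psi_1)...(1-psi_{i-1}) * psi_i."""
    phis = []
    running = 1.0  # product of (1 - psi_k) over the indices k already processed
    for psi in psis:
        phis.append(running * psi)
        running *= 1.0 - psi
    return phis


def identity_gap(psis):
    """|sum(phi_i) - (1 - prod(1 - psi_i))|, which should vanish up to rounding."""
    product = 1.0
    for psi in psis:
        product *= 1.0 - psi
    return abs(sum(phis_from_psis(psis)) - (1.0 - product))


if __name__ == "__main__":
    random.seed(0)
    values = [random.random() for _ in range(10)]
    print(identity_gap(values))
    # If some psi_i equals 1 at the point, the phi_i sum to exactly 1 there.
    print(sum(phis_from_psis([0.2, 1.0, 0.7])))
```

The second printout reflects the step in the proof where $\psi_i(x) = 1$ for some $i \le m$ forces $\phi_1(x) + \cdots + \phi_m(x) = 1$.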
Fixing a distribution $T$, we consider a closed set $F$ in $\mathbb{R}^n$ having this property:
(3) $T(\phi) = 0$ for all test functions $\phi$ satisfying $\operatorname{supp}(\phi) \subset \mathbb{R}^n \setminus F$
Proof. Let $\mathcal{F}$ be the family of all closed sets $F$ having property (3). Then
$$\operatorname{supp}(T) = \bigcap\{F : F \in \mathcal{F}\}$$
Being an intersection of closed sets, $\operatorname{supp}(T)$ is itself closed. The only question is whether it has property (3). To verify this, let $\phi$ be a test function such that $\operatorname{supp}(\phi) \subset \mathbb{R}^n \setminus \operatorname{supp}(T)$. It is to be shown that $T(\phi) = 0$. By De Morgan's Law,
Notice that $\phi = \phi\sum_{i=1}^{m}\psi_i$, because if $\phi(x) = 0$, the equation is obviously true, while if $\phi(x) \ne 0$, then $x \in \operatorname{supp}(\phi)$ and $\sum_{i=1}^{m}\psi_i(x) = 1$. Hence, by the linearity of $T$,
•
Definition. The space $\mathcal{E}$ is defined to be the space $C^\infty(\mathbb{R}^n)$ with convergence defined as follows: $\phi_j \to 0$ if for each multi-index $\alpha$, $D^\alpha\phi_j(x)$ converges uniformly to 0 on every compact set.
(4) $$(f * \phi)(x) = \int_{\mathbb{R}^n} f(y)\phi(x - y)\,dy$$
The convolution of a distribution $T$ with a test function $\phi$ has been defined by
(5) $$(T * \phi)(x) = T(E_x B\phi)$$
where $(B\phi)(x) = \phi(-x)$ and $(E_x\phi)(y) = \phi(y - x)$.
Now observe that if $T$ has compact support, then $T$ (or more properly, its extension $\bar T$) can operate (as a linear functional) on any element of $\mathcal{E}$. Consequently, in this case, (5) is meaningful not only for $\phi \in \mathcal{D}$ but also for $\phi \in \mathcal{E}$. Equation (5) is adopted as the definition of the convolution of a distribution having compact support with a function in $C^\infty(\mathbb{R}^n)$.
Here $\delta(\phi) = \phi(0)$ and $(B\phi)(x) = \phi(-x)$. We first verify that (6) is meaningful, i.e., that each argument is in the domain of the operator that is applied to it. Obviously, $B\phi \in \mathcal{D}$ and $T * B\phi \in \mathcal{E}$ by the corollary in Section 5.5, page 272. If $S$ has compact support, then by the preceding lemma, $S * (T * B\phi) \in \mathcal{E}$. Hence $\delta$ can be applied. On the other hand, if $T$ has compact support, then $T * B\phi$ is an element of $\mathcal{E}$ having compact support; in other words, an element of $\mathcal{D}$. Then $S * (T * B\phi)$ belongs to $\mathcal{E}$, and again $\delta$ can be applied.
It is a fact, which we do not stop to prove, that $S * T$ is a continuous linear functional on $\mathcal{D}$; thus it is a distribution. (See [Ru1], page 160.)
Finally, we indicate the source of the definition in Equation (6). If $S$ and $T$ are regular distributions, then they correspond to functions $f$ and $g$ in $L^1_{\mathrm{loc}}(\mathbb{R}^n)$. In that case,
$$= \iint f(y)g(z)(B\phi)(-y - z)\,dz\,dy$$
$$= \iint f(y)g(z)\phi(y + z)\,dz\,dy$$
$$= \iint f(y)g(x - y)\phi(x)\,dx\,dy$$
Problems 5.7
1. Refer to the theorem concerning partitions of unity, page 282, and prove that for each $x$ there is an index $j$ such that $\phi_i(x) = 0$ for all $i > j$.
2. Let $[x_i]$ be a list of all the rational points in $\mathbb{R}^n$. Define $T$ by $T(\phi) = \sum_{i=1}^{\infty} 2^{-i}\phi(x_i)$, where $\phi$ is any test function. Prove that $T$ is a distribution and $\operatorname{supp}(T) = \mathbb{R}^n$.
3. For a distribution $T$, let $\mathcal{F}_1$ be the family of all closed sets $F$ such that $T(\phi) = 0$ when $\phi\,|\,F = 0$. Let $\mathcal{F}_2$ be the family of all closed sets $F$ such that $T(\phi) = 0$ when $\operatorname{supp}(\phi) \subset \mathbb{R}^n \setminus F$. Show that $\mathcal{F}_1$ is generally a proper subset of $\mathcal{F}_2$.
4. Refer to the theorem on partitions of unity and prove that $\psi\sum_{i=1}^{j}\phi_i \to \psi$ as $j \to \infty$, provided that $\operatorname{supp}(\psi) \subset \Omega$.
5. Prove that the extension of $T$ as defined in the proof of Theorem 3 is independent of the particular $\psi$ chosen in the proof.
6. If $\phi \in \mathcal{D}$, $T \in \mathcal{D}'$, and $\operatorname{supp}(\phi) \cap \operatorname{supp}(T) = \emptyset$ (the empty set), then $T(\phi) = 0$.
7. If $T \in \mathcal{D}'$ and $\operatorname{supp}(T) = \emptyset$, what conclusion can be drawn?
8. Show that if $\phi \in C^\infty(\mathbb{R}^n)$, if $T \in \mathcal{D}'$, and if $\phi(x) = 1$ on a neighborhood of $\operatorname{supp}(T)$, then $\phi T = T$.
9. Why, in proving Lemma 1, can we not take $g$ to be a multiple of the function $p$ introduced in Section 7.1?
10. Let $f \in C(\mathbb{R}^n)$. Show that $\operatorname{supp}(\tilde f) = \operatorname{supp}(f)$.
11. Let $T$ be an arbitrary distribution, and let $K$ be a compact set. Show that there exists a distribution $S$ having compact support such that $S(\phi) = T(\phi)$ for all test functions $\phi$ that satisfy $\operatorname{supp}(\phi) \subset K$.
12. Prove that a distribution can have at most one continuous extension on $\mathcal{E}$.
13. Prove that if a distribution does not have compact support, then it cannot have a continuous extension on $\mathcal{E}$.
14. Let $T$ be a distribution and $N$ a neighborhood of $\operatorname{supp}(T)$. Prove that for any test function $\phi$, $T(\phi)$ depends only on $\phi\,|\,N$.
15. Prove or disprove: If two test functions $\phi$ and $\psi$ take the same values on the support of a distribution $T$, then $T(\phi) = T(\psi)$.
16. Refer to the theorem on partitions of unity and prove that the balls $B(x_i, r_i/2)$ cover the complement of the support of $T$.
17. Prove that if $f$ and $g$ are in $L^1_{\mathrm{loc}}(\mathbb{R}^n)$, then for all test functions $\phi$,
$$F(s) = \int_0^\infty f(t)e^{-st}\,dt$$
The theory of this transform enables us to write down the equation satisfied by
F:
288 Chapter 6 The Fourier Transform
Thus the Laplace transform has turned a differential equation (1) into an algebraic equation (2). The solution of (2) is
$$F(s) = (\beta + \alpha s - \alpha)/(s^2 - s - 2)$$
By taking the inverse Laplace transform, we obtain $f$:
$$f(t) = \tfrac{1}{3}(\alpha + \beta)e^{2t} + \tfrac{1}{3}(2\alpha - \beta)e^{-t}$$
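The pair $F$, $f$ displayed above can be checked against the definition $F(s) = \int_0^\infty f(t)e^{-st}\,dt$. The following sketch is an illustration only: it truncates the integral at a finite upper limit (an assumption that is harmless for $s > 2$, where the integrand decays exponentially), uses a midpoint rule, and compares the result with the closed form. The function names and the sample values of $\alpha$, $\beta$, $s$ are arbitrary choices.

```python
import math


def f(t: float, alpha: float, beta: float) -> float:
    """f(t) = (1/3)(alpha + beta) e^{2t} + (1/3)(2 alpha - beta) e^{-t}."""
    return (alpha + beta) / 3.0 * math.exp(2.0 * t) + (2.0 * alpha - beta) / 3.0 * math.exp(-t)


def F(s: float, alpha: float, beta: float) -> float:
    """F(s) = (beta + alpha s - alpha) / (s^2 - s - 2)."""
    return (beta + alpha * s - alpha) / (s * s - s - 2.0)


def laplace_numeric(s: float, alpha: float, beta: float,
                    upper: float = 20.0, samples: int = 100000) -> float:
    """Midpoint-rule approximation of the integral of f(t) e^{-st} over [0, upper]."""
    h = upper / samples
    total = 0.0
    for i in range(samples):
        t = (i + 0.5) * h
        total += f(t, alpha, beta) * math.exp(-s * t)
    return total * h


if __name__ == "__main__":
    for alpha, beta in ((1.0, 2.0), (2.0, -1.0)):
        print(alpha, beta, laplace_numeric(5.0, alpha, beta), F(5.0, alpha, beta))
```

The agreement reflects the partial-fraction decomposition of $F$ over the roots $s = 2$ and $s = -1$ of $s^2 - s - 2$. One can also confirm directly from the formula that $f(0) = \alpha$ and $f'(0) = \beta$.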
•
The Fourier transform, now to be taken up, has applications of the type just outlined as well as a myriad of other uses in mathematics, especially in partial differential equations. The Fourier transform can be defined on any locally compact Abelian group, but we confine our attention to $\mathbb{R}^n$ (which is such a group). The material presented here is accessible in many authoritative sources, such as [Ru1], [Ru2], [Ru3], [SW], and [Fol].
The reader should be aware that in the literature there is very little uniformity in the definition of the Fourier transform. We have chosen to use here the definition of Stein and Weiss [SW]. It has a number of advantages, not the least of which is harmony with their monograph. It is the same as the definition used by Horvath [Horv], Dym and McKean [DM], and Lieb and Loss [LL]. Other favorable features of this definition are the simplicity of the inversion formula, elegance in the Plancherel Theorem, and its suitability in approximation theory.
We define a set of functions called characters $e_y$ by the formula
$$e_y(x) = e^{2\pi i xy}$$

$$f(x) = \begin{cases} 1 & \text{on } [-1, 1] \\ 0 & \text{elsewhere} \end{cases}$$
Then
$$\hat f(y) = \int_{-1}^{1} e^{-2\pi i xy}\,dx = \left.\frac{e^{-2\pi i xy}}{-2\pi i y}\right|_{x=-1}^{x=1} = \frac{1}{\pi y}\cdot\frac{e^{2\pi i y} - e^{-2\pi i y}}{2i} = \frac{\sin(2\pi y)}{\pi y}$$
•
The function $x \mapsto \sin(2\pi x)/(\pi x)$ is called the sinc function. It plays an important role in signal processing and approximation theory. See [CL], Chapter 29, and the further references given there.
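The transform of the indicator function of $[-1, 1]$ can be confirmed by direct numerical integration. The sketch below is an illustration using the convention $\hat f(y) = \int f(x)e^{-2\pi i xy}\,dx$ adopted in this chapter; the midpoint rule and the function names are choices made here, not part of the text.

```python
import cmath
import math


def fourier_indicator(y: float, samples: int = 20000) -> complex:
    """Midpoint-rule approximation of the integral of exp(-2 pi i x y) over [-1, 1]."""
    h = 2.0 / samples
    total = 0.0 + 0.0j
    for i in range(samples):
        x = -1.0 + (i + 0.5) * h
        total += cmath.exp(-2j * math.pi * x * y)
    return total * h


def closed_form(y: float) -> float:
    """sin(2 pi y) / (pi y), extended by its limiting value 2 at y = 0."""
    if y == 0.0:
        return 2.0
    return math.sin(2.0 * math.pi * y) / (math.pi * y)


if __name__ == "__main__":
    for y in (0.0, 0.25, 0.3, 1.0, 2.5):
        print(y, fourier_indicator(y), closed_form(y))
```

The imaginary part of the numerical value is negligible, as expected for an even real-valued function.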
If $f \in L^1(\mathbb{R}^n)$, what can be said of $\hat f$? Later, we shall prove that it is continuous and vanishes at $\infty$. For the present we simply note that it is bounded. Indeed, $\|\hat f\|_\infty \le \|f\|_1$, because
Proof. We verify the first equation and leave the second to the problems. We
have
$$h(x, y) = g(x - y)$$
Let us prove that $h$ is measurable. It is not enough to observe that the map $(x, y) \mapsto x - y$ is continuous and that $g$ is measurable, because the composition of a measurable function with a continuous function need not be measurable. For any open set $O$ we must show that $h^{-1}(O)$ is measurable. Define a linear transformation $A$ by $A(x, y) = (x - y, x + y)$. The following equivalences are obvious:
Since $g$ is measurable, $g^{-1}(O)$ and $g^{-1}(O) \times \mathbb{R}^n$ are measurable sets. Since $A$ is invertible, $A^{-1}$ is a linear transformation; it carries each measurable set to another measurable set. Hence $h^{-1}(O)$ is measurable. Here we use the theorem that a function of class $C^1$ from $\mathbb{R}^n$ to $\mathbb{R}^n$ maps measurable sets into measurable sets, and apply that theorem to $A^{-1}$.
The function $F(x, y) = f(y)g(x - y)$ is measurable, and
$$= \iint e_{-x}(u + y - u) f(u) g(y - u)\,du\,dy$$
~ =
f(x) I ( X) e-~1I:tXY
- f y - -:;: ~. dy
It follows that
J
and that
21j(x)1 ~ If(Y) - f(y - ~) Idy
$$(E_v f)(x) = f(x - v)$$
$$(f * g)(x) = \int f(y)g(x - y)\,dy$$
$$x \mapsto (x^2 + a^2)^{-1}$$
$$x \mapsto \frac{\sin(2\pi x)}{\pi x} \qquad \chi_{[-1,1]}$$
Section 6.1 Definitions and Basic Properties 293
Problems 6.1
1. Prove Theorem 1.
2. Does the group $\mathbb{R}^n$ have any continuous characters other than those described in the text?
3. Express $\delta(f * e_t)$ in terms of a Fourier transform.
4. What are the characters of the additive group $\mathbb{Z}$?
5. Find the Fourier transform of the function
$$f(x) = \begin{cases} \cos x & 0 \le x \le 1 \\ 0 & \text{elsewhere.} \end{cases}$$
$$\hat f(x) \approx \sum_{k=-N}^{N} f(k)e^{-2\pi i kx}$$
Under what conditions does the approximate equation become an exact equation?
19. (The Autocorrelation Theorem) Prove that if
$$g(x) = \int_{-\infty}^{\infty} f(u)f(u + x)\,du$$
then $\hat g = |\hat f|^2$.
J
20. Prove that f * 9 = f g.J J
21. Assume that f is real-valued and prove that the maximum value of f * Bf occurs at the
origin. The definition of B is (Bf)(x) = f( -x).
22. Recall the Heaviside function $H$ from Section 5.1, page 250. Define $f(x) = e^{-ax}H(x)$ and $g(x) = e^{-bx}H(x)$. Compute $f * g$, assuming that $0 < a < b$.
23. Prove that if $f$ is real-valued, then $|\hat f|^2$ is an even function.
24. We have adopted the following definition of the Fourier transform:
$$(\mathcal{F}_1 f)(y) = \int_{\mathbb{R}^n} e^{-2\pi i xy} f(x)\,dx$$
Other books and papers sometimes use an alternative definition:
$$(\mathcal{F}_2 f)(y) = \left(\frac{1}{2\pi}\right)^{n/2} \int_{\mathbb{R}^n} e^{-ixy} f(x)\,dx$$
g(x) = !2 1(~
211"
+ x) + !2 1(~
211"
- x)
29. Define the operator B by the equation (Bf)(x) = f( -x), and prove that f *Bf is always
even if f is real-valued.
The space $\mathcal{S}$, also denoted by $\mathcal{S}(\mathbb{R}^n)$, is the set of all $\phi$ in $C^\infty(\mathbb{R}^n)$ such that $P \cdot D^\alpha\phi$ is a bounded function, for each polynomial $P$ and each multi-index $\alpha$. Functions with this property are said to be "rapidly decreasing," and the space itself is called the Schwartz space. In the case $n = 1$, membership in the Schwartz space simply requires $\sup_x |x^m\phi^{(k)}(x)|$ to be finite for all $m$ and $k$.
Example 1. The Gaussian function $\phi$ defined by
$$\phi(x) = e^{-|x|^2}$$
belongs to $\mathcal{S}$. •
It is easily seen, with the aid of Leibniz's formula, that if $\phi \in \mathcal{S}$, then $P \cdot \phi \in \mathcal{S}$ for any polynomial $P$, and $D^\alpha\phi \in \mathcal{S}$ for any multi-index $\alpha$.
We note that $\mathcal{S}(\mathbb{R}^n)$ is a subspace of $L^1(\mathbb{R}^n)$. This is because functions in $\mathcal{S}$ decrease with sufficient rapidity to be integrable. Specifically, if $\phi \in \mathcal{S}$, then the function $x \mapsto (1 + |x|^2)^n\phi(x)$ is bounded, say by $M$. Then
$$\int_{\mathbb{R}^n} |\phi(x)|\,dx \le M\int_{\mathbb{R}^n} (1 + |x|^2)^{-n}\,dx = M\int_0^\infty r^{n-1}\omega_n(1 + r^2)^{-n}\,dr < \infty$$
In this calculation we used "polar" coordinates and the "method of shells." The thickness of the shell is $dr$, the radius of the shell is $r$, and the area of the shell is $r^{n-1}\omega_n$, where $\omega_n$ denotes the area of the unit sphere in $\mathbb{R}^n$.
Definition. In $\mathcal{S}$, convergence is defined by saying that $\phi_j \twoheadrightarrow 0$ if and only if $P(x) \cdot D^\alpha\phi_j(x) \to 0$ uniformly in $\mathbb{R}^n$ for each multi-index $\alpha$ and for each polynomial $P$. In other terms, $\|P \cdot D^\alpha\phi_j\|_\infty \to 0$ for every multi-index $\alpha$ and every polynomial $P$, the sup-norm being computed over $\mathbb{R}^n$.
Section 6.2 The Schwartz Space 295
Proof. Let $\phi_j \twoheadrightarrow 0$. We ask whether $Q \cdot D^\beta(P \cdot \phi_j) \to 0$ uniformly for each polynomial $Q$ and multi-index $\beta$. By using the Leibniz formula, this expression can be exhibited as a sum of terms $Q_\alpha \cdot D^\alpha\phi_j$, where the $Q_\alpha$ are polynomials and $\alpha$ is a multi-index such that $\alpha \le \beta$. Each of these terms individually converges uniformly to zero, because that is a consequence of $\phi_j \twoheadrightarrow 0$ in $\mathcal{S}$. Therefore, their sum also converges to 0. •
Proof. It suffices to deal with the case of one monomial and establish that $D^\alpha e_y = (2\pi i y)^\alpha e_y$. We have
Since the Fourier map $f \mapsto \hat f$ is linear, it suffices to prove that
In this integral we can use integration by parts repeatedly to transfer all derivatives from $\phi$ to the kernel function $e_{-y}$. Each use of integration by parts will introduce a factor of $-1$. Observe that no boundary values enter during the integration by parts, since $\phi \in \mathcal{S}$. Using also the preceding lemma, we find that the integral becomes successively
$$\Delta = \sum_{j=1}^{n} \left(\frac{\partial}{\partial x_j}\right)^2$$
Equivalently,
•
Theorem 2. If $\phi \in \mathcal{S}$ and $P$ is a polynomial, then $P(-D/(2\pi i))\hat\phi = \widehat{P\phi}$. Equivalently, $P(D)\hat\phi = \widehat{P^*\phi}$, where $P^*(y) = P(-2\pi i y)$.
$$= \int P^*(y)e_{-x}(y)\phi(y)\,dy = \widehat{P^*\phi}(x)$$
•
In the preceding proof, one requires the following theorem from calculus.
See, for example, [Wid 1] page 352, or [Bart] page 271.
$$\frac{d}{dx}\int_0^\infty f(x, t)\,dt = \int_0^\infty \frac{\partial}{\partial x} f(x, t)\,dt$$
$$\int |\psi_j(x)|\,dx < \varepsilon\int (1 + |x|^2)^{-n}\,dx = c\varepsilon$$
and this shows that $\int |\psi_j| \to 0$. From the inequality
A variety of hypotheses can be adopted for this result. See, for example, [SW] page 252, [Lan1] page 373, [Yo] page 149, [Gri] page 32, [Wal] page 60, [Kat] page 129, [Fri] page 104, [Ho] page 177, [DM] page 111, [Fol] page 337, [Ti1] page 60.
(In verifying these calculations, notice that the exponents are negative.) Then
we have
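$$\sum_{\nu \ne 0} |f(x + \nu)| \le c\sum_{\nu \ne 0} \|\nu\|_\infty^{-n-\varepsilon} = c\sum_{j=1}^{\infty} \sum_{\|\nu\|_\infty = j} \|\nu\|_\infty^{-n-\varepsilon} = c\sum_{j=1}^{\infty} j^{-n-\varepsilon}\,\#\{\nu : \|\nu\|_\infty = j\}$$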
=
j=1
j=1 j=1
By a theorem of Weierstrass (the "M-Test", page 373), this proves that the function $F(x) = \sum_\nu f(x + \nu)$ is continuous, for its series is absolutely and uniformly convergent. The function $F$ is integer-periodic: For $\mu \in \mathbb{Z}^n$,
$$F(x + \mu) = \sum_\nu f(x + \mu + \nu) = \sum_\nu f(x + \nu) = F(x)$$
Let $Q = [0, 1)^n$, the unit cube in $\mathbb{R}^n$. The Fourier coefficients of the periodic function $F$ are
$$A_\nu = \int_Q F(x)e^{-2\pi i\nu x}\,dx = \int_Q \sum_\mu f(x + \mu)e^{-2\pi i\nu x}\,dx$$
From this we see that the Fourier series of $F$, $\sum_\nu A_\nu e^{2\pi i\nu x}$, is uniformly and absolutely convergent. By the classical theory of Fourier series (such as in [Zy], vol. II, page 300) we have
It follows that
$$\sum_\nu f(\nu) = F(0) = \sum_\nu A_\nu = \sum_\nu \hat f(\nu)$$
•
Problems 6.2
1. Prove Lemma 2.
2. Prove Lemma 3.
3. Let $f$ be an even function in $L^1(\mathbb{R})$. Show that
$$\tfrac{1}{2}\hat f(t) = \int_0^\infty f(x)\cos(2\pi tx)\,dx$$
The right-hand side of this equation is known as the Fourier Cosine Transform of $f$.
4. Let $B$ be the operator such that $(Bf)(x) = f(-x)$. Find formulas for $\widehat{Bf}$ and $B\hat f$. How are these related?
5. What is the Leibniz formula appropriate for the operator $(D/(2\pi i))^\alpha$?
6. Let $f \in L^1(\mathbb{R}^n)$ and $g(x) = f(Ax)$, where $A$ is a nonsingular $n \times n$ matrix. Find the relation between $\hat f$ and $\hat g$.
7. Prove that, after Fourier transforms have been taken, the differential equation $f'(x) + xf(x) = 0$ becomes $(\hat f)'(t) + 4\pi^2 t\hat f(t) = 0$.
8. Give a complete formal proof that $P \cdot f \in \mathcal{S}$ whenever $P$ is a polynomial and $f \in \mathcal{S}$.
9. Prove that for a function of $n$ variables having the special form
$$f(x_1, \ldots, x_n) = \prod_{j=1}^{n} f_j(x_j)$$
we have
$$\hat f(t_1, \ldots, t_n) = \prod_{j=1}^{n} \hat f_j(t_j)$$
$(t \ne 0)$
11. Let $f_m(x) = 1$ if $|x| \le m$, and $f_m(x) = 0$ otherwise. (Here $x \in \mathbb{R}$, and $m = 1, 2, \ldots$.) Compute $f_m * f_1$ and show that it is the Fourier transform of a function in $L^1(\mathbb{R})$.
12. Interpret Lemma 4 as a statement about eigenvalues and eigenvectors of a differential operator.
13. Prove, by using Fubini's Theorem, that for functions $f$ and $g$ in $L^1(\mathbb{R}^n)$, $\int \hat f g = \int f\hat g$.
14. Explain why $e^{-|x|}$ is not in $\mathcal{S}$.
15. Prove that if $\phi \in \mathcal{S}$, then ~ exists for any multi-index $\alpha$.
16. Let $P$ be a polynomial on $\mathbb{R}^n$ and let $g$ be an element of $C^\infty(\mathbb{R}^n)$ such that $|g(x)| \le |P(x)|$ for all $x \in \mathbb{R}^n$. Is the mapping $f \mapsto gf$ continuous from $\mathcal{S}$ into $\mathcal{S}$?
17. Prove that $\mathcal{S}$ is the subspace of $C^\infty(\mathbb{R}^n)$ consisting of all functions $\phi$ such that for each $\alpha$, the map $x \mapsto x^\alpha(D^\alpha\phi)(x)$ is bounded.
18. Show that $\phi_j \twoheadrightarrow 0$ in $\mathcal{S}$ if and only if $P(D)(Q\phi_j) \to 0$ uniformly in $\mathbb{R}^n$ for all polynomials $P$ and $Q$.
19. Prove that if $P$ is a polynomial and $c$ is a scalar, then $P(cD)e_y = P(2\pi i c y)e_y$.
20. Prove that if $P$ is a polynomial and $c$ is a scalar, then $P(cD)\hat\phi = \widehat{P_c\phi}$, where $P_c(x) = P(-2\pi i c x)$.
21. Prove that $P(-D)\hat\phi = \widehat{P_+\phi}$, where $P_+(x) = P(2\pi i x)$.
22. Prove that for $x \in \mathbb{R}^n$, $|x^\alpha| \le |x|^{|\alpha|}$.
23. Using the operator $B$ in Problem 4, prove that $\hat{\hat f} = Bf$.
24. Prove that
$$\frac{d^k}{dx^k} e^{x^2} = e^{x^2} p_k(x)$$
where the polynomials $p_k$ are defined recursively by the equations $p_0(x) = 1$ and $p_{k+1}(x) = 2xp_k(x) + p_k'(x)$.
Section 6.3 The Inversion Theorems 301
30. The first moment of a function $f$ is defined to be
$$\int_{-\infty}^{\infty} x f(x)\,dx$$
Prove that under suitable hypotheses, the first moment is $(\hat f)'(0)/(-2\pi i)$.
In the previous section it was shown that the operator $\mathcal{F}$ defined by $\mathcal{F}(\phi) = \hat\phi$ is linear and continuous from $\mathcal{S}$ into $\mathcal{S}$. In this section our goal is to prove that $\mathcal{F}$ is surjective and invertible, and to give an elegant formula for $\mathcal{F}^{-1}$.
j=1
We prove our result first when $n = 1$ and then derive the general case. Define, for $x \in \mathbb{R}$, the analogous function $\psi(x) = e^{-\pi x^2}$. Since $\psi'(x) = e^{-\pi x^2}(-2\pi x) = -2\pi x\psi(x)$, we see that $\psi$ is the unique solution of the initial-value problem
$$(\hat\psi)'(x) + 2\pi x\hat\psi(x) = 0$$
$$\hat\psi(0) = \int_{-\infty}^{\infty} \psi(x)\,dx = \int_{-\infty}^{\infty} e^{-\pi x^2}\,dx = 1$$
(See Problem 10 for this.) We have seen that $\psi$ and $\hat\psi$ are two solutions of the initial-value problem (1). By the theory of ordinary differential equations, $\psi = \hat\psi$. This proves the theorem for $n = 1$. Now we notice that
Proof. We use the conjurer's tricks of smoke and mirrors. Let $\theta$ be the function in the preceding theorem, and put $g(x) = \theta(x/\lambda)$. Then $\hat g(y) = \lambda^n\theta(\lambda y)$. (Problem 8 in Section 6.1, page 293.) By Problem 13 in Section 6.2, page 300,
= J ¢G)O(u) du
Thus $\int \psi(x)(f - F)(x)\,dx = 0$ for all $\psi \in \mathcal{S}$, because $\phi$ can be any element of $\mathcal{S}$. The same equation is true for all $\psi \in \mathcal{D}$, since $\mathcal{D}$ is a subset of $\mathcal{S}$. Now apply Theorem 2 of Section 5.1, page 251, according to which $g = 0$ when $\tilde g = 0$. The conclusion is that $f(x) = F(x)$ almost everywhere. •
Problems 6.3
1. Does $\mathcal{F}$ commute with the operators $B$ and $E_x$?
2. Find the inverse Fourier transform of the function
$$f(t) = \begin{cases} \sin t & |t| \le \pi \\ 0 & |t| > \pi \end{cases}$$
4. Let $f \in L^1(\mathbb{R})$ and define $h(x) = \int_{-\infty}^{x} f(t)\,dt$. Prove that if $h \in L^1(\mathbb{R})$, then $\hat h(t) =
(2\pi i t)^{-1}\hat f(t)$.
5. For the function $f(x) = e^{-|x|}$, show that $\hat f(x) = 2/(1 + 4\pi^2x^2)$. Show that $\hat f$ is analytic
in a horizontal strip in the complex plane described by the inequality $|\mathrm{Im}(z)| < 1/(4\pi^2)$.
(Here $n = 1$.)
6. Let $f(x) = e^{-x}$ for $x \ge 0$ and let $f(x) = 0$ for $x < 0$. Find $\hat f$ and verify by direct
integration that $f(x) = \int \hat f \cdot e_x$.
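7. In Section 6.1 we saw that the following is a Fourier transform pair:
$$f(x) = \begin{cases} 1 & -1 \le x \le 1 \\ 0 & \text{otherwise} \end{cases} \qquad \hat f(t) = \frac{\sin(2\pi t)}{\pi t}$$
Prove that $f$ belongs to $L^1(\mathbb{R})$ but $\hat f$ does not. Explain why this does not violate the
inversion theorem.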
8. Using Theorem 1 of this section and Problem 8 in Section 6.1, page 293, prove that the
Fourier transform of the function $\phi(x) = e^{-ax^2}$ is
Prove also that $\mathcal{F}^{-1}\phi = \mathcal{F}\phi$. Prove that this last equation follows from the sole fact that
$\phi$ is an even function.
9. Prove that if $f$ is odd and belongs to $L^1(\mathbb{R})$, then
$$\frac{i}{2}\,\hat f(t) = \int_0^{\infty} f(x)\sin(2\pi t x)\,dx$$
The right-hand side of this equation defines the Fourier Sine Transform of $f$.
10. Prove that
$$\int_{-\infty}^{\infty} e^{-x^2}\,dx = \sqrt{\pi}$$
This can be accomplished by considering the square of this integral, which can be written
as the (double) integral of $e^{-(x^2+y^2)}$ over $\mathbb{R}^2$. This double integral can be computed by
polar coordinates.
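Two facts used in this section, the Gaussian integral of Problem 10 and the identity $\hat\psi = \psi$ for $\psi(x) = e^{-\pi x^2}$, can be checked numerically. The following is only a sketch under stated assumptions: the helper name `fourier_transform`, the truncation interval, and the step count are illustrative choices, not part of the text, and the trapezoid rule stands in for the exact integral.

```python
import cmath
import math


def fourier_transform(f, y, half_width=8.0, steps=4000):
    """Trapezoid-rule approximation of the integral of f(x) exp(-2 pi i x y)
    over [-half_width, half_width] (an assumed truncation of the real line)."""
    h = 2.0 * half_width / steps
    total = 0.0 + 0.0j
    for k in range(steps + 1):
        x = -half_width + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * f(x) * cmath.exp(-2j * math.pi * x * y)
    return total * h


def psi(x):
    """The Gaussian psi(x) = exp(-pi x^2) of the text."""
    return math.exp(-math.pi * x * x)


# Problem 10: the integral of exp(-x^2) over the line (transform at y = 0).
gaussian_integral = fourier_transform(lambda x: math.exp(-x * x), 0.0).real

# Fixed-point property: the transform of psi, sampled at a few points.
sample_points = (0.0, 0.5, 1.0)
psi_hat_values = [fourier_transform(psi, y) for y in sample_points]
```

Comparing `gaussian_integral` with `math.sqrt(math.pi)` and each entry of `psi_hat_values` with `psi(y)` gives agreement to many digits, because the trapezoid rule is very accurate for rapidly decaying smooth integrands.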
Section 6.4 The Plancherel Theorem 305
This section is devoted to extending the Fourier operator from the Schwartz
space $\mathcal{S}(\mathbb{R}^n)$ to $L^2(\mathbb{R}^n)$. It turns out that the extended operator has a number
of endearing properties, leading one to conclude that $L^2(\mathbb{R}^n)$ is the "natural"
setting for this important operator.
($T \in \mathcal{D}'$, $\phi \in \mathcal{D}$)
(1)
Applying this to the right side of Equation (1), we see that $D^\alpha(f*g)$ is continuous
for every multi-index $\alpha$. •
The Lebesgue space $L^p(\mathbb{R}^n)$, where $1 \le p < \infty$, has as elements all measurable
functions $f$ such that $|f|^p \in L^1(\mathbb{R}^n)$. The norm is $\|f\|_p = \big\|\,|f|^p\,\big\|_1^{1/p}$.
Further information about these spaces is found in Section 8.7, pages 409ff.
Proof. The continuous functions with compact support form a dense set in
$L^p$, if $1 \le p < \infty$. Hence, if $\varepsilon > 0$, then there exists such a continuous function
$h$ for which $\|f - h\|_p \le \varepsilon$. Let the support of $h$ be contained in the ball $B_r$ of
radius $r$ centered at 0. By the uniform continuity of $h$ there is a $\delta > 0$ such that
Proof. Since $\int \rho_k = 1$,
Here we need Lemma 3: If $f \in L^1(\mathbb{R}^n)$ and $\varepsilon > 0$, then there is a $\delta > 0$ such
that $\|E_zf - f\|_1 \le \varepsilon$ whenever $|z| \le \delta$. If $\rho(x) = 0$ when $|x| > r$, then $\rho_k(x) = 0$
when $|x| > r/k$. Hence when $r/k \le \delta$ we will have $\|f * \rho_k - f\|_1 \le \varepsilon$. Lemma 2
shows that $f * \rho_k \in C^\infty(\mathbb{R}^n)$. •
Proof. Let $f \in L^1(\mathbb{R}^n)$, and let $\varepsilon > 0$. We wish to find an element of $\mathcal{D}(\mathbb{R}^n)$
within distance $\varepsilon$ of $f$. The function $f * \rho_k$ from the preceding theorem would
be a candidate, but it need not have compact support. So, we do the natural
thing, which is to define
$$f_m(x) = \begin{cases} f(x) & \text{if } |x| \le m \\ 0 & \text{elsewhere} \end{cases}$$
Then $f_m(x) \to f(x)$ pointwise, and the Dominated Convergence Theorem (page
406) gives us $\int |f_m| \to \int |f|$. Consequently, we can select an integer $m$ such that
$\|f\|_1 - \|f_m\|_1 < \varepsilon/2$. Then
$$\int_{|x|>m} |f(x)|\,dx < \varepsilon/2$$
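$\langle f, g\rangle = \langle \hat f, \hat g\rangle$ or
This is proved with the following calculation, in which the inversion theorem is
used:
$$\langle \hat f, \hat g\rangle = \int \hat f(y)\,\overline{\hat g(y)}\,dy = \int\!\!\int f(x)\,e^{-2\pi i x y}\,\overline{\hat g(y)}\,dx\,dy$$
$$= \int f(x)\,\overline{\int e^{2\pi i x y}\,\hat g(y)\,dy}\;dx = \int f(x)\,\overline{g(x)}\,dx$$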
$$\hat f(x) = \int f(y)\,e^{-2\pi i x y}\,dy$$
It now follows that $\hat f = \lim \hat f_m$, the limit being taken in the $L^2$ sense. This state
of affairs is often expressed by writing
In this equation, L.I.M. stands for "limit in the mean," and this refers to a limit
in the space L2.
Another procedure for generating a sequence that converges to $\hat f$ in $L^2$ is to
select an orthonormal basis $[u_n]$ for $L^2$, to express $f$ in terms of the basis, and
to take Fourier transforms:
$$f = \sum_{m=0}^{\infty} \langle f, u_m\rangle\,u_m \qquad\qquad \hat f = \sum_{m=0}^{\infty} \langle f, u_m\rangle\,\hat u_m$$
Problems 6.4
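1. In Lemma 2, can we conclude that $f * g$ belongs to $C_0(\mathbb{R}^n)$?
2. Explain why Lemma 3 is not true for the space $L^\infty(\mathbb{R}^n)$.
3. Prove that $\mathcal{S} \subset L^2(\mathbb{R}^n)$.
4. Prove that $\mathcal{S}$ is dense in $L^2(\mathbb{R}^n)$.
5. Prove that if $f, g \in \mathcal{S}$, then $\int \hat f g = \int f \hat g$.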
6. Prove this theorem: Let $Y$ be a dense subspace of a normed linear space $X$. Let $A \in
\mathcal{L}(Y, Z)$, where $Z$ is a Banach space. Then there is a unique $\bar A \in \mathcal{L}(X, Z)$ such that
$\bar A \mid Y = A$ and $\|\bar A\| = \|A\|$. Suggestions: If $x \in X$, then there is a sequence $y_n \in Y$ such
that $y_n \to x$. Put $\bar Ax = \lim Ay_n$. Show that the limit exists and is independent of the
sequence $y_n$.
7. In the situation of Problem 6, show that if $A$ is an isometry, then so is $\bar A$.
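8. Show that neither of the inclusions $L^1(\mathbb{R}^n) \subset L^2(\mathbb{R}^n)$, $L^2(\mathbb{R}^n) \subset L^1(\mathbb{R}^n)$ is valid.
9. Find an element of $L^2(\mathbb{R}) \setminus L^1(\mathbb{R})$ and compute its Fourier-Plancherel transform.
10. Prove that the Fourier transform of the function
$$f(x) = \begin{cases} e^{-ax} & x \ge 0 \\ 0 & x < 0 \end{cases}$$
is $(2\pi i x + a)^{-1}$. Here, $a > 0$. Show that $\hat f \in L^2(\mathbb{R}) \setminus L^1(\mathbb{R})$.
11. Prove that the equation $\widehat{fg} = \hat f * \hat g$ holds for functions in $L^2(\mathbb{R}^n)$.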
12. The eigenvalues of $\mathcal{F} : L^2(\mathbb{R}) \to L^2(\mathbb{R})$ are $\pm 1$, $\pm i$, and no others. Show that $\hat h_m =
(-i)^m h_m$, where $h_m$ is the Hermite function
Suggestions: Prove that $h_{m+1}(x) = h_m'(x) - 2\pi x\,h_m(x)$. Then prove that $\hat h_{m+1}(x) =
-i(\hat h_m)'(x) + 2\pi i x\,\hat h_m(x)$. Show that the functions $(-i)^m h_m$ obey the same recurrence
relation as $\hat h_m$.
13. For $f$ and $g$ in $\mathcal{S}$, prove that
15. Define $f(x) = 0$ for $x < 1$ and $f(x) = x^{-1}$ for $x \ge 1$. Find $\hat f$.
Useful reference: Chapter 5 in [AS].
16. Is this reasoning correct? If $f \in L^1$, then $\hat f$ is continuous. Since $L^1$ is dense in $L^2$, the
same conclusion must hold for $f \in L^2$.
17. The variance of a function $f$ is defined to be $\|uf\|/\|f\|$, where $u(x) = x$. Prove this
version of the Uncertainty Principle: The product of the variances of $f$ and $\hat f$ cannot be
less than $1/(4\pi)$.
We will give some representative examples to show how the Fourier transform
can be used to solve differential equations and integral equations. Then, an
application to multi-variate interpolation will be presented. These are what
might be called direct applications, as contrasted with applications to other
branches of abstract mathematics.
(1) $$P(D) = \sum_{j=0}^{m} c_j D^j = \sum_{j=0}^{m} c_j (2\pi i)^j \left(\frac{D}{2\pi i}\right)^{j}$$
(6)
If $h$ is a function such that $\hat h = 1/P^+$, then Equations (6) and (8) yield
(9)
In detail,
The Fourier transform of the function $g(x) = e^{-|x|}$ is $\hat g(t) = 2/(1 + 4\pi^2t^2)$
(Problem 5 of Section 6.3, page 304). Hence the Fourier transform of Equation
(11) is
$$u(t) = \frac{1}{\pi}\int_{-\infty}^{\infty} \frac{e^{itz}}{(1+z^2)(b+iz)}\,dz$$
The integrand, call it $f(z)$, has poles at $z = +i$, $-i$, and $ib$. In order to evaluate
this integral, we use the residue calculus, as outlined at the end of this section.
Let the complex variable be expressed as $z = x + iy$. Then
Hence by Theorem 4 at the end of this section,
$$u(t) = \frac{e^{-t}}{b-1} + \frac{2e^{-bt}}{1-b^2}$$
After taking Fourier transforms and using Theorem 4 in Section 6.1 (page 290)
we have
$$\int_{-\infty}^{\infty} e^{-|x-s|}\,u(s)\,ds = e^{-x^2/2}$$
$$k(x) = e^{-|x|}$$
Section 6.5 Applications of the Fourier Transform 313
From Problems 6.3.5 and 6.3.8 (page 304), we have these Fourier transforms:
$$\hat k(x) = \frac{2}{1 + 4\pi^2x^2}$$
To take the inverse transform, use the principle in Theorem 1 of Section 6.2
(page 296) that $\widehat{P(D)g} = P^+ \cdot \hat g$. We let $P(x) = (1 - x^2)/2$, so that $P^+(x) =
P(2\pi i x) = (1 + 4\pi^2x^2)/2$. Then
$$\hat u = P^+\hat g = \widehat{P(D)g}$$
The inverse transform then gives us
•
As another example of applications of the Fourier transform, we consider a
problem of multi-variate interpolation. First, what is meant by "multi-variate
interpolation"? Let us work, as usual, in ]Rn. Suppose that at a finite set of
points called "nodes" we have data, interpreted as the values of some unknown
function. We will assume that the nodes are all different from one another.
Since we will not need the components of the nodes, we can use the notation
$x_1, x_2, \ldots, x_m$ for the set of nodes. Let the corresponding data values be real
numbers $\lambda_1, \lambda_2, \ldots, \lambda_m$. We seek now an "interpolating" function for this
information. That will be some nice, smooth function that is defined everywhere and
takes the values $\lambda_i$ at the nodes $x_i$. (Polynomials are not recommended for this
task.) One way of obtaining a simple interpolating function is to start with a
suitable function $f$, and use linear combinations of its translates to do the job.
Thus, we will try to accomplish the interpolation with a function of the form
$$x \mapsto \sum_{j=1}^{m} c_j f(x - x_j)$$
The interpolation conditions are
$$\sum_{j=1}^{m} c_j f(x_i - x_j) = \lambda_i \qquad (1 \le i \le m)$$
This is a system of $m$ linear equations in the $m$ unknowns $c_j$. How can we be sure
that the system has a solution? Since we want to be able to solve this problem
for any $\lambda_i$, we must have a nonsingular coefficient matrix. This can be called the
"interpolation matrix"; it is the matrix $A_{ij} = f(x_i - x_j)$. A striking theorem
gives us an immense class of useful functions $f$ to play the role described above.
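The interpolation scheme just described can be sketched in a few lines. This is an illustration under assumptions that are not in the text: the nodes lie in $\mathbb{R}$, the function $f$ is the Gaussian $f(x) = e^{-x^2}$ (whose Fourier transform is a positive Gaussian), and the node and data values are made up. The names `solve` and `interpolant` are hypothetical helpers.

```python
import math


def gaussian(x):
    """Assumed choice of f: a Gaussian, whose Fourier transform is positive."""
    return math.exp(-x * x)


def solve(a, b):
    """Solve the square system a c = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def interpolant(nodes, values, f=gaussian):
    """Return x -> sum_j c_j f(x - x_j), with c solving A c = values, A_ij = f(x_i - x_j)."""
    matrix = [[f(xi - xj) for xj in nodes] for xi in nodes]
    coeffs = solve(matrix, values)
    return lambda x: sum(c * f(x - xj) for c, xj in zip(coeffs, nodes))


nodes = [0.0, 0.7, 1.5, 2.2]      # illustrative distinct nodes
values = [1.0, -2.0, 0.5, 3.0]    # illustrative data
s = interpolant(nodes, values)
residuals = [abs(s(x) - v) for x, v in zip(nodes, values)]
```

The entries of `residuals` measure how well the computed function reproduces the data at the nodes; for distinct nodes they are at the level of rounding error.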
Proof. Let $f = \hat g$, where $g \in L^1(\mathbb{R}^n)$ and $g(x) > 0$ everywhere. The
interpolation matrix in question must be shown to be positive definite. This means that
$u^*Au > 0$ for all nonzero vectors $u$ in $\mathbb{C}^m$. We undertake a calculation of this
quadratic form:
$$u^*Au = \sum_{k=1}^{m}\sum_{j=1}^{m} \bar u_k A_{kj} u_j = \sum_{k=1}^{m}\sum_{j=1}^{m} \bar u_k u_j f(x_k - x_j)$$
$$h(y) = \sum_{j=1}^{m} u_j e^{2\pi i y x_j}$$
So far, we have proved only that the interpolation matrix $A$ is nonnegative
definite. How can we conclude that the final integral above is positive? It will
suffice to establish that the functions $y \mapsto e^{2\pi i y x_j}$ form a linearly independent
set, for in our computation, the vector $u$ was not zero. Once we have the linear
independence, it will follow that $|h(y)|^2$ is positive somewhere in $\mathbb{R}^n$, and by
continuity will be positive on an open set. Since $g$ is positive everywhere, the
final integral above would have to be positive. The linear independence is proved
separately in two lemmas. •
Lemma 1. Let $\lambda_1, \ldots, \lambda_m$ be $m$ distinct complex numbers, and let
$c_1, \ldots, c_m$ be complex numbers. If $\sum_{j=1}^{m} c_j e^{\lambda_j z} = 0$ for all $z$ in a subset
of $\mathbb{C}$ that has an accumulation point, then $\sum_{j=1}^{m} |c_j| = 0$.
$(1 \le j < k \le m)$
Each set $H_{jk}$ is the intersection of two hyperplanes in $\mathbb{R}^n$. (See Problem 4.)
Hence each $H_{jk}$ is a set of Lebesgue measure 0 in $\mathbb{R}^n$, and the same is true of
any countable union of such sets. The finite family of sets $H_{jk}$ therefore cannot
cover the open set $\mathcal{O}$, which must have positive measure. Now define, for $t \in \mathbb{C}$,
the function $f(t) = \sum_{j=1}^{m} c_j e^{(w_j\xi)t}$. Since $\xi \in \mathcal{O}$, our hypothesis gives us $f(1) = 0$.
Let $U$ be a neighborhood of 1 in $\mathbb{C}$ such that $t\xi \in \mathcal{O}$ when $t \in U$. Since $f(t) = 0$
on $U$, Lemma 1 shows that $\sum_{j=1}^{m} |c_j| = 0$. •
More information on the topic of interpolation can be found in the textbook
[CL]. Functions of the type $f$, as in Theorem 1, are said to be "strictly positive
definite on $\mathbb{R}^n$." They are often used in neural networks, in the "hidden layers,"
where most of the heavy computing is done.
The remainder of this section is devoted to a review of the residue calculus.
This group of techniques is often needed in evaluating the integrals that arise in
inverting the Fourier transform.
$$f(z) = \sum_{n=-\infty}^{\infty} c_n (z - \zeta)^n \qquad c_n = \frac{1}{2\pi i}\int_C \frac{f(z)\,dz}{(z-\zeta)^{n+1}}$$
(12) $$c_{-1} = \frac{1}{2\pi i}\int_C f(z)\,dz$$
Example 4. The integral $\int_C e^z/z^4\,dz$, where $C$ is the unit circle, can be
computed with the principle in Equation (12). Indeed, the given integral is $2\pi i$
times the residue of $e^z/z^4$ at 0. Since
$$\frac{e^z}{z^4} = \left(1 + z + \frac{z^2}{2!} + \frac{z^3}{3!} + \cdots\right)\Big/ z^4$$
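As a numerical check of Example 4: the coefficient of $1/z$ in the expansion above is $1/3!$, so Equation (12) predicts the value $2\pi i/3! = \pi i/3$ for the integral. The sketch below parametrizes the unit circle as $z = e^{i\theta}$ and sums over equally spaced points; the function name `contour_integral` and the number of points are illustrative assumptions.

```python
import cmath
import math


def contour_integral(f, steps=2000):
    """Approximate the integral of f over the unit circle z = exp(i theta),
    using dz = i z dtheta and equally spaced values of theta."""
    h = 2.0 * math.pi / steps
    total = 0.0 + 0.0j
    for k in range(steps):
        z = cmath.exp(1j * k * h)
        total += f(z) * 1j * z * h
    return total


value = contour_integral(lambda z: cmath.exp(z) / z ** 4)
expected = 2j * math.pi / math.factorial(3)  # 2 pi i times the residue 1/3!
```

For a periodic analytic integrand this equally spaced rule converges very quickly, so `value` and `expected` agree to near machine precision.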
Proof. Draw mutually disjoint circles $C_1, \ldots, C_m$ around the singularities and
contained within $C$. The integral around the path shown in the figure is zero,
by Cauchy's integral theorem. (Figure 6.1a depicts the case $m = 2$.) Therefore,
In this equation, divide by $2\pi i$ and note that the negative terms are the residues
of $f$ at $\zeta_1, \ldots, \zeta_m$. •
[Figure 6.1, panels (a) and (b)]
$$f(z) = \frac{1}{z^2+1} = \frac{1}{(z+i)(z-i)} = \frac{i/2}{z+i} - \frac{i/2}{z-i}$$
The residue at $i$ is therefore $-i/2$, and the value of the integral is $\pi$. •
Proof. Write $f = p/q$, where $p$ and $q$ are polynomials. Since $f$ is proper, the
degree of $p$ is less than that of $q$. Hence the point at $\infty$ is not a singularity
of $f$. Now, $C$ is the boundary of one region containing the poles, and it is
also the boundary of the complementary region in which $f$ is analytic. Hence
$\int_C f(z)\,dz = 0$. •
$\frac{1}{2\pi i}\int_{-\infty}^{\infty} f(z)\,dz$ is the sum of the
residues at the poles in the upper half-plane.
Proof. Consider the region shown in Figure 6.1b, where $C$ is the semicircular
arc and $r$ is chosen so large that all the poles of $f$ lying in the upper
half-plane are contained in the semicircular region. On $C$ we have $z = re^{i\theta}$ and
$dz = ire^{i\theta}\,d\theta$. Hence
By Theorem 3,
$$\int_{-r}^{r} f(z)\,dz + \int_C f(z)\,dz = 2\pi i \times (\text{sum of residues})$$
$$\int_{-\infty}^{x} u(s)\,ds - \int_{x}^{\infty} u(s)\,ds + u(x) = f(x)$$
(1) $u_{xx} = u_t$
The function $f$ gives the initial temperature distribution in the rod. We define
$\hat u(y, t)$ to be the Fourier transform of $u$ in the space variable. Thus
Taking the Fourier transform in Equations (1) and (2) with respect to the space
variable, we obtain
(3) $$\begin{cases} -4\pi^2y^2\,\hat u(y, t) = \hat u_t(y, t) \\ \hat u(y, 0) = \hat f(y) \end{cases}$$
Here, again, we use the principle of Theorem 1 in Section 6.2, page 296: $\widehat{P(D)u} =
P^+\hat u$, where $P^+(x) = P(2\pi i x)$.
Equation (3) defines an initial-value problem involving a first-order linear
ordinary differential equation for the function $\hat u(y, \cdot)$. (The variable $y$ can be
ignored, or interpreted as a parameter.) We note that $(\hat u)_t = \widehat{(u_t)}$. The
phenomenon just observed is typical: Often, a Fourier transform will lead us from
a partial differential equation to an ordinary differential equation. The solution
of (3) is
(4) $$\hat u(y, t) = \hat f(y)\,e^{-4\pi^2y^2t}$$
Section 6.6 Applications to Partial Differential Equations 319
Now let us think of $t$ as a parameter, and ignore it. Write Equation (4) as
$\hat u(y, t) = \hat f(y)G(y, t)$, where $G(y, t) = e^{-4\pi^2y^2t}$. Using the principle that $\hat\phi\,\hat\psi =
\widehat{\phi * \psi}$
(6) $$u(x, t) = (4\pi t)^{-1/2}\int_{-\infty}^{\infty} f(x - z)\,e^{-z^2/(4t)}\,dz$$
•
Example 2. We consider the problem
(7) $$\begin{cases} u_{xx} = u_t & x \ge 0,\ t \ge 0 \\ u(x, 0) = f(x),\quad u(0, t) = 0 & x \ge 0,\ t \ge 0 \end{cases}$$
(8) $$u(0, t) = (4\pi t)^{-1/2}\int_{-\infty}^{\infty} f(-z)\,e^{-z^2/(4t)}\,dz$$
The easiest way to ensure that this will be zero (and thus satisfy the boundary
condition in our problem) is to extend $f$ to be an odd function. Then the
integrand in Equation (8) is odd, and $u(0, t) = 0$ automatically. So we define
$f(-x) = -f(x)$ for $x > 0$, and then Equation (6) gives the solution for
Equation (7). •
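The convolution formula (6) and the odd-extension device of Example 2 can be illustrated numerically. The sketch below rests on assumptions that are not in the text: the integral is truncated to a finite interval and approximated by the trapezoid rule, the initial data are $f(x) = e^{-x^2}$ (for which the convolution with the heat kernel has the closed form $(1+4t)^{-1/2}e^{-x^2/(1+4t)}$), and the odd test function is $x e^{-x^2}$. All function names are hypothetical.

```python
import math


def heat_solution(f, x, t, half_width=12.0, steps=6000):
    """Approximate (4 pi t)^(-1/2) * integral of f(x - z) exp(-z^2/(4t)) dz
    by the trapezoid rule on [-half_width, half_width] (assumed truncation)."""
    h = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps + 1):
        z = -half_width + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * f(x - z) * math.exp(-z * z / (4.0 * t))
    return total * h / math.sqrt(4.0 * math.pi * t)


def gaussian(x):
    return math.exp(-x * x)


def exact_gaussian_solution(x, t):
    """Closed form of formula (6) for the initial data exp(-x^2)."""
    return math.exp(-x * x / (1.0 + 4.0 * t)) / math.sqrt(1.0 + 4.0 * t)


def odd_data(x):
    """An odd initial temperature, as produced by the odd extension."""
    return x * math.exp(-x * x)


u_numeric = heat_solution(gaussian, 0.3, 0.5)
u_exact = exact_gaussian_solution(0.3, 0.5)
u_boundary = heat_solution(odd_data, 0.0, 0.5)  # should vanish, as in Equation (8)
```

The value `u_boundary` is zero up to rounding, which is the numerical counterpart of the remark that the integrand in Equation (8) is odd.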
Example 3. Again, we consider the heat equation with boundary conditions:
(9) $u(0, t) = g(t)$ $\qquad x \ge 0,\ t \ge 0$
Because the differential equation is linear and homogeneous, the method of
superposition can be applied. We solve two related problems, viz.,
Again this is an ordinary differential equation, linear and of the first order. Its
solution is easily found to be
If $w$ is made into an odd function by setting $w(x, t) = -w(-x, t)$ when $x < 0$,
then we know from Problem 9 in Section 6.3 (page 304) that
Therefore by the Inversion Theorem (Section 6.3, page 303)
$$w(x, t) = -4\pi i\int_{-\infty}^{\infty} e^{2\pi i x y}\,y\,e^{-4\pi^2y^2t}\int_0^t e^{4\pi^2y^2\sigma}g(\sigma)\,d\sigma\,dy$$
or
$$w(x, t) = -\frac{i}{\pi}\int_{-\infty}^{\infty} z\,e^{ixz}\int_0^t e^{-z^2(t-\sigma)}g(\sigma)\,d\sigma\,dz$$
•
•
Example 4. The Helmholtz Equation is
$$\Delta u - gu = f$$
in which $\Delta$ is the Laplacian, $\sum_{k=1}^{n} \partial^2/\partial x_k^2$. The functions $f$ and $g$ are prescribed
on $\mathbb{R}^n$, and $u$ is the unknown function of $n$ variables. We shall look at the special
case when $g$ is the constant 1. To illustrate some variety in approaching such
problems, let us simply try the hypothesis that the problem can be solved with
an appropriate convolution: $u = f * h$. Substitution of this form for $u$ in the
differential equation leads to
Carrying out the differentiation under the integral that defines the convolution,
we obtain
$$\hat f\,(\Delta h)^{\wedge} - \hat f\,\hat h = \hat f$$
Section 6.7 Tempered Distributions 321
From this equation cancel the factor $\hat f$, and then express $(\Delta h)^{\wedge}$ as in Example
2 in Section 6.2 (page 297):
$$\hat h(x) = \frac{-1}{1 + 4\pi^2|x|^2}$$
The formula for $h$ itself is obtained by use of the inverse Fourier transform, which
leads to
$$h(x) = \pi^{n/2}\int_0^{\infty} t^{-n/2}\exp\!\left(-t - |\pi x|^2/t\right)dt$$
The calculation leading to this is given in [Ev], page 187. In that reference, a
different definition of the Fourier transform is used, and Problem 6.1.24, page
293, can be helpful in transferring results among different systems. •
Problems 6.6
1. Define the Sine Transform by the equation
Show that $(f'')^S(t) = 2\pi t f(0) - 4\pi^2t^2 f^S(t)$. (Two integrations by parts are needed, in
addition to the assumption that $f \in L^1$.)
2. Define the Cosine Transform by the equation
f
Show that = 2ff - 2if:·
4. If $w^S$ is the sine transform in the first variable in the function $(x, t) \mapsto w(x, t)$, what is
the difference between $(w^S)_t$ and $(w_t)^S$?
5. Define a scaling operator by the equation $(S_\lambda f)(x) = f(\lambda x)$. Prove that $D^\alpha \circ S_\lambda =
\lambda^{|\alpha|} S_\lambda \circ D^\alpha$.
6. If the problem $\Delta u - u = f$ is solved by the formula $u = f * h$, for a certain function $h$,
how can we solve the problem $\Delta u - c^2u = f$, assuming $c > 0$?
Let us recall the definitions of two important spaces. The space $\mathcal{D}$, the space of
"test functions," consists of all functions in $C^\infty(\mathbb{R}^n)$ that have compact support.
(Of course, $\mathcal{D}$ depends on $n$, but the notation does not show this.) In $\mathcal{D}$ we
define convergence by saying that $\phi_j \twoheadrightarrow 0$ in $\mathcal{D}$ if there is one compact set
containing the supports of all $\phi_j$ and if $(D^\alpha\phi_j)(x)$ converges uniformly to 0 for
each multi-index $\alpha$.
The space $\mathcal{S}$ consists of all functions $\phi$ in $C^\infty(\mathbb{R}^n)$ such that the function
$P \cdot D^\alpha\phi$ is bounded, for all polynomials $P$ and for all multi-indices $\alpha$. In $\mathcal{S}$ we
defined $\phi_j \twoheadrightarrow 0$ to mean that $P \cdot D^\alpha\phi_j$ converges uniformly to 0, for each $P$ and
for each $\alpha$.
It is clear that $\mathcal{D} \subset \mathcal{S}$. A distribution $T$, being a continuous linear functional
on $\mathcal{D}$, may or may not possess a continuous linear extension to $\mathcal{S}$. If it does
possess such an extension, the distribution $T$ is said to be tempered.
(2) $$P \cdot \sum_{\beta \le \alpha} \binom{\alpha}{\beta} D^{\alpha-\beta}\phi \cdot D^{\beta}(1 - \psi_j)$$
Since $\phi \in \mathcal{S}$, we must have $P \cdot D^{\alpha-\beta}\phi \in \mathcal{S}$ also. Hence $|x|^2\,|P(x) \cdot D^{\alpha-\beta}\phi(x)|$ is
bounded, say by $M$. This bound can be chosen to serve for all $\beta$ in the range
$0 \le \beta \le \alpha$. Increase $M$ if necessary so that for all $\beta$ in that same range,
Also, for Ixl ~ j, ID{3(l- 'lj;j)1 ~ M. Hence the expression in (2) has modulus
no greater than
$$\tilde f(\phi) = \int_{\mathbb{R}^n} f(x)\phi(x)\,dx$$
Suppose that $P$ is a polynomial such that $f/P \in L^1$. Write
$$\tilde f(\phi) = \int (f/P)(P \cdot \phi)$$
Since $\phi \in \mathcal{S}$, $P\phi$ is bounded, and the integral exists. If $\phi_j \twoheadrightarrow 0$ in $\mathcal{S}$, then
$P(x)\phi_j(x) \to 0$ uniformly on $\mathbb{R}^n$, and consequently,
•
Definition. The Fourier transform of a tempered distribution $T$ is defined by
the equation $\hat T(\phi) = T(\hat\phi)$ for all $\phi \in \mathcal{S}$. An equivalent equation is $\hat T = T \circ \mathcal{F}$,
where $\mathcal{F}$ is the Fourier operator mapping $\phi$ to $\hat\phi$.
We used Theorem 1 in Section 6.2, page 296, in this calculation. The other
equation is left as Problem 6. •
Problems 6.7
1. Prove that every $f$ in $L^1(\mathbb{R}^n)$ is a tempered distribution.
2. Prove that every polynomial is a tempered distribution.
3. Prove that if $f$ is measurable and satisfies $|f| \le |P|$ for some polynomial $P$, then $f$ is a
tempered distribution.
4. Prove that the function $f$ defined by $f(x) = e^x$ ($n = 1$) is not a tempered distribution.
Note that $f$ is a distribution, however.
5. Let $f(x) = (|x| + 1)^{-1}$. Explain how it is possible for $\hat f$ to belong to $L^2(\mathbb{R})$ in spite of
the fact that the integral $\int_{-\infty}^{\infty} f(x)e^{-2\pi i x t}\,dx$ is meaningless.
6. Prove the remaining part of Theorem 4.
7. Under what circumstances is the reciprocal of a polynomial a tempered distribution?
8. Define $\delta_a$ by $\delta_a(\phi) = \phi(a)$. Compute $\hat\delta_a$.
9. Prove that our definition of the Fourier transform of a tempered distribution is consistent
with the classical Fourier transform of a function.
10. Is the function $f(x) = e^{-|x|}$ a member of $\mathcal{S}$? Is $f$ a tempered distribution?
11. What flaw is there in defining $\hat T\phi = T\hat\phi$ for $T \in \mathcal{D}'$ and $\phi \in \mathcal{D}$?
12. Find the Fourier transforms of these functions, interpreted as tempered distributions:
(a) $g(x) = x$ ($x \in \mathbb{R}$)
Section 6.8 Sobolev Spaces 325
Sometimes we do not belabor the distinction between $f$ and $\tilde f$, and think of each
$f$ in $L^1_{\mathrm{loc}}(\Omega)$ as a distribution. The clear advantage of this is that such functions
will possess derivatives of all orders (in the distribution sense). Derivatives of
this type are called "weak derivatives" or "distribution derivatives" to distinguish
them from the classical derivatives, which are then called "strong" derivatives.
Thus if $f \in L^1_{\mathrm{loc}}(\Omega)$ and if $\alpha$ is a multi-index, $D^\alpha f$ need not exist in the classical
sense, but $\partial^\alpha\tilde f$ will always exist. Recall that in this book the symbol $\partial^\alpha$ was
reserved for distributions, and
(2)
Equation (2) can be written more succinctly as $\partial^\alpha T = T \circ (-D)^\alpha$. Then, for
any polynomial $P$, we have $P(\partial)T = T \circ P(-D)$.
The classical spaces $L^p(\Omega)$, for $1 \le p < \infty$, are defined as follows. The
elements of $L^p(\Omega)$ are the Lebesgue measurable functions $f$ defined on $\Omega$ for
which
(4)
The verification of the norm axioms is relegated to the problems. Notice that in
Equation (4) the conventions of the above definition are being used.
To prove this, let $\phi$ be a test function. By the Hölder Inequality (Section 8.7,
page 409),
This proves that 801. fO = fOl., and shows that for lal ~ k, 801. fO E U(n). By the
definition of the Sobolev space, fO E Wk,p(n). Finally, we write
The test function space $\mathcal{D}(\Omega)$ is, in general, not a dense subspace of the
Sobolev space $W^{k,p}(\Omega)$. This is easy to understand: Each $\phi$ in $\mathcal{D}(\Omega)$ has compact
support in $\Omega$. Consequently, $\phi(x) = 0$ on the boundary of $\Omega$. The closure
of $\mathcal{D}(\Omega)$ in $W^{k,p}(\Omega)$ can therefore contain only functions that vanish on the
boundary of $\Omega$. However, the special case $\Omega = \mathbb{R}^n$ is satisfactory from this
standpoint:
For the proof of Theorem 2, consult [Hu]. Some closely related theorems
are given in this section.
In many proofs we require a mollifier, which is a test function $\psi$ having these
additional properties: $\psi \ge 0$, $\psi(x) = 0$ when $\|x\| \ge 1$, and $\int\psi = 1$. Then one
puts $\psi_j(x) = j^n\psi(jx)$. A "mollification of $f$ with radius $\varepsilon$" is then $\psi_j * f$ with
$1/j < \varepsilon$. These matters are discussed in Section 5.1 (pages 246ff) and Section
5.5 (pages 269ff).
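The construction of $\psi_j$ and of a mollification can be sketched in one dimension. The assumptions here go beyond the text: $n = 1$, the bump $\exp(-1/(1-x^2))$ is one standard choice of test function, its normalizing constant is computed numerically rather than exactly, and the function being mollified is a unit step. The names `bump`, `psi_j`, and `mollify` are illustrative.

```python
import math


def bump(x):
    """Unnormalized smooth bump, positive on (-1, 1) and zero elsewhere."""
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1.0 else 0.0


def integrate(g, a, b, steps=2000):
    """Composite trapezoid rule on [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (g(a) + g(b))
    for k in range(1, steps):
        total += g(a + k * h)
    return total * h


NORMALIZER = integrate(bump, -1.0, 1.0)  # numerical stand-in for the exact constant


def psi_j(x, j):
    """psi_j(x) = j * psi(j x), the case n = 1 of the scaling in the text."""
    return j * bump(j * x) / NORMALIZER


def mollify(f, x, j):
    """(psi_j * f)(x), integrating over the support [-1/j, 1/j] of psi_j."""
    r = 1.0 / j
    return integrate(lambda z: psi_j(z, j) * f(x - z), -r, r)


def step(x):
    return 1.0 if x >= 0 else 0.0


mass = integrate(lambda z: psi_j(z, 4), -0.25, 0.25)
far_right = mollify(step, 1.0, 4)
far_left = mollify(step, -1.0, 4)
at_jump = mollify(step, 0.0, 4)
```

Away from the jump the mollified step agrees with the step itself, while at the jump it takes a value near 1/2; this is the smoothing effect that the lemma below quantifies in the Sobolev norm.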
familiar calculations and Hölder's inequality (Section 8.7, page 409) we have
where $\mu(B_j)$ is the Lebesgue measure of the ball of radius $1/j$. By enclosing
that ball in a "cube" of side $2/j$, we see that $\mu(B_j) \le (2/j)^n$. Thus,
In order to estimate the right-hand side in the above inequality, use Problem 6
to get
Proof. Let $B_1, B_2, \ldots$ be a sequence of open balls such that $B_i \subset \Omega$ for all $i$ and
$\bigcup B_i = \Omega$. The center and radius of $B_i$ are indicated by writing $B_i = B(x_i, r_i)$.
Appealing to Theorem 1 in Section 5.7 (page 282), we obtain a partition of
unity subordinate to the collection of open balls. Thus, we have test functions
$\phi_i$ satisfying $0 \le \phi_i \le 1$. Further, $\mathrm{supp}(\phi_i) \subset B_i$, and for any compact set $K$
in $\Omega$, there exists an integer $m$ such that $\sum_{i=1}^{m}\phi_i = 1$ on a neighborhood of $K$.
Now suppose that $f \in W^{k,p}(\Omega)$. Let $0 < \varepsilon < 1/2$. Eventually, we shall find a
$C^\infty$-function $g$ in $W^{k,p}(\Omega)$ such that $\|f - g\| < 2\varepsilon$.
Select a sequence $\delta_i \downarrow 0$ such that $B(x_i, (1 + \delta_i)r_i) \subset \Omega$ for each $i$. Define
$f_i = \phi_i f$. Let $g_i$ be a mollification of $f_i$ with radius $\delta_i r_i$. At the same time,
we decrease $\delta_i$ if necessary to obtain the inequality $\|g_i - f_i\|_{W^{k,p}(\Omega)} < \varepsilon/2^i$.
(This step requires the preceding lemma.) Define $g = \sum g_i$. If $\mathcal{O}$ is a bounded
open set in $\Omega$, then $\overline{\mathcal{O}}$ is compact, and for some integer $m$, $\sum_{i=1}^{m}\phi_i = 1$ on a
neighborhood of $\overline{\mathcal{O}}$. On $\mathcal{O}$, we have
$$\sum_{i=1}^{m} f_i = \sum_{i=1}^{m} \phi_i f = f\sum_{i=1}^{m}\phi_i = f$$
Then we can perform the following calculation, in which the norm in the space
$W^{k,p}(\mathcal{O})$ is employed (until the last step, where the domain $\Omega$ enters):
($1 \le p < \infty$)
Embedding Theorems. Here we explore the relations that may exist
between two Sobolev spaces, in particular, the relation of one such space being
continuously embedded in another.
For general normed linear spaces $(E, \|\cdot\|_E)$ and $(F, \|\cdot\|_F)$, we say that $F$
is embedded in $E$ (and write $F \hookrightarrow E$) if
(a) $F \subset E$;
(b) There is a constant $c$ such that $\|f\|_E \le c\|f\|_F$ for all $f \in F$.
Part (a) of this definition is algebraic: It asserts that $F$ is a linear subspace
of the linear space $E$. Part (b) is topological: It asserts that the identity map
$I : F \to E$ is continuous (i.e., bounded). Indeed, if
($f \in C[a, b]$)
•
Example 2. If $1 \le s < r < \infty$ and if the domain $\Omega$ has finite Lebesgue
measure, then $L^r(\Omega) \hookrightarrow L^s(\Omega)$. To prove this, start with an $f$ in $L^r(\Omega)$ and
write $r = ps$. We may assume that $f \ge 0$. Then $f^s$ is in $L^p(\Omega)$ because
$\int f^{sp} = \int f^r$. Use the Hölder Inequality (page 409) with conjugate indices $p$ and
$q = p/(p - 1)$:
•
Theorem 5. $W^{1,2}(\mathbb{R}) \hookrightarrow W^{0,\infty}(\mathbb{R})$.
Proof. (In outline. For details, see [LL], Chapter 8.) Let $f$ be an element of
$W^{1,2}(\mathbb{R})$. Since $\mathcal{D}(\mathbb{R})$ is dense in $W^{1,2}(\mathbb{R})$, there exists a sequence $[f_j]$ in $\mathcal{D}(\mathbb{R})$
converging to $f$ in the norm of $W^{1,2}$. Each $f_j$ has compact support and therefore
satisfies $f_j(\pm\infty) = 0$. Since $f_jf_j' = (f_j^2)'/2$, we have
By taking the limit of a suitable subsequence, we obtain the same equation for
$f$, at almost all points $x$. Then, with the aid of the Cauchy-Schwarz inequality
and the inequality between the geometric and arithmetic means, we have
Consequently,
$$|f(x)| \le \sqrt{\|f\|_2^2 + \|f'\|_2^2}$$
This establishes the embedding inequality:
•
The next theorem is one of many embedding theorems, and is given here as
just a sample from this vast landscape. It involves one of the spaces $W_0^{k,p}(\Omega)$.
This space is defined to be the closed subspace of $W^{k,p}(\Omega)$ generated by the set
of test functions $\mathcal{D}(\Omega)$. For this theorem and many others in the same area,
consult [Zie] pages 53ff, or [Ad] pages 97ff.
Theorem 8 (and others like it) can be used to establish that a distributional
solution of a partial differential equation is in fact a classical solution.
The Sobolev-Hilbert Spaces. The spaces $W^{k,2}(\Omega)$ are Hilbert spaces and
are conventionally denoted by $H^k(\Omega)$. For the special case $\Omega = \mathbb{R}^n$, we can
follow Friedlander [Fri], and define them for arbitrary real indices $s$ as follows.
The space $H^s(\mathbb{R}^n)$ consists of all tempered distributions $T$ for which
Matters not touched upon here: (1) The importance of conditions on the
boundary of $\Omega$ for more powerful embeddings. (2) The Sobolev spaces for
noninteger values of $k$. (3) The duality theory of Sobolev spaces; i.e., identifying
their conjugate spaces as function spaces.
Problems 6.8
1. Prove that the norm defined in Equation (4) satisfies all the postulates for a norm.
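2. Prove that $L^p(\Omega) \subset L^1_{\mathrm{loc}}(\Omega)$.
3. Prove that for $1 \le p < \infty$, $\mathcal{D}(\Omega) \subset L^p(\Omega) \subset \mathcal{D}'(\Omega)$. Show that the embedding of $L^p(\Omega)$
in $\mathcal{D}'(\Omega)$ is continuous and injective.
4. Show that the function
$$f(x) = \begin{cases} 1 & |x| < 1 \\ 0 & |x| \ge 1 \end{cases}$$
belongs to $W^{0,p}(\mathbb{R})$ but not to $W^{1,p}(\mathbb{R})$.
5. Prove this theorem of W.H. Young. If $f \in L^p(\mathbb{R})$ and $g \in L^1(\mathbb{R})$, then $f * g \in L^p(\mathbb{R})$, and

$$\langle f, g\rangle = \sum_{|\alpha| \le k}\int_\Omega (D^\alpha f)(D^\alpha g)\,dx$$
13. Let $g$ and $h$ be locally integrable functions on the open set $\Omega$. If
$$\int_\Omega g(x)\phi(x)\,dx = \int_\Omega h(x)D^\alpha\phi(x)\,dx$$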
Additional Topics
The Contraction Mapping Theorem was proved in Section 4.2, and was accompa-
nied by a number of applications that illustrate its power. In the literature, past
and present, there are many other fixed-point theorems, based upon a variety of
hypotheses. We shall sample some of these theorems here.
In reading this chapter, refer, if necessary, to Section 7.6 for topological
spaces, and to Section 7.7 for linear topological spaces.
Let us say that a topological space X has the fixed-point property if every
continuous map f : X -+ X has a fixed point (that is, a point p such that
f(p) = p). An important problem, then, is to identify all the topological spaces
that have the fixed-point property. A celebrated theorem of Brouwer (1910)
begins this program.
We shall not prove this theorem here, but refer the reader to proofs in [DS]
page 468, [Vic] page 28, [Dug] page 340, [Schj] page 74, [Lax], [KA] page 636,
[Sma] page 11, [Gr] page 149, and [Smi] page 406.
$$h(x) = \inf\{\lambda : x/\lambda \in U,\ \lambda > 0\}$$
Proof. ([Day], [Sma]) Let $K$ be such a set, and let $f$ be a continuous map of
$K$ into $K$. We denote the family of all convex, symmetric, open neighborhoods
of 0 by $\{U_\alpha : \alpha \in A\}$. The set $A$ is simply an index set, which we partially
order by writing $\alpha \ge \beta$ when $U_\alpha \subset U_\beta$. Thus ordered, $A$ becomes a directed set,
Section 7.1 Fixed-Point Theorems 335
(2) $$\begin{cases} x'(t) = f(t, x(t)) \\ x(0) = 0 \end{cases}$$
where $x = (x_1, x_2, \ldots, x_n)$ and $f = (f_1, f_2, \ldots, f_n)$.
Although the choice of initial values $x_i(0) = 0$ may seem to sacrifice
generality, these initial values can always be obtained by making simple changes of
variable. Changing $t$ to $t - a$ shifts the initial point, and changing $x_i$ to $x_i - c_i$
shifts the initial values.
The space $C^n[a, b]$ consists of $n$-tuples of functions in $C[a, b]$. If $x =
(x_1, \ldots, x_n) \in C^n[a, b]$, we write
Proof. Refer to Section 4.2, page 179, where an initial-value problem is shown
to be equivalent to an integral equation. In the present circumstances, the
integral equation arising from Equation (2) is
$$x(t) = \int_0^t f(s, x(s))\,ds$$
First, we shall prove that $A$ maps $D$ into $D$. Let $x \in D$ and $y = Ax$. Since
$\|x\|_\infty \le r$, the inequality $\|x(s)\|_1 \le r$ follows for all $s$ in the interval $[0, a]$.
Hence
$$\|y(t)\|_1 = \sum_{i=1}^{n} |y_i(t)| = \sum_{i=1}^{n}\left|\int_0^t f_i(s, x(s))\,ds\right| \le \sum_{i=1}^{n}\int_0^a |f_i(s, x(s))|\,ds$$
$$= \int_0^a \sum_{i=1}^{n} |f_i(s, x(s))|\,ds = \int_0^a \|f(s, x(s))\|_1\,ds \le a\left(\frac{r}{a}\right) = r$$
The set $A(D)$ is an equicontinuous subset of the bounded set $D$ in $C^n[0, a]$. By
the Ascoli Theorem (Section 7.4, page 349), the closure of $A(D)$ is compact.
By Mazur's Theorem (Theorem 10, below), the closed convex hull $H$ of $A(D)$
is compact. Since $D$ is closed and convex, $H \subset D$. The preceding corollary is
therefore applicable, and $A$ has a fixed point $x$ in $H$. Then $\|x\|_\infty \le r$, and $x$
solves the initial-value problem. •
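The operator $A$ of the proof, $(Ax)(t) = \int_0^t f(s, x(s))\,ds$, can be explored numerically. The proof above yields only the existence of a fixed point; the sketch below instead iterates $A$ (Picard iteration), which is a different technique and converges under the additional assumption that $f$ is Lipschitz in $x$. Further assumptions, none of them from the text: the scalar case $n = 1$, the test problem $x' = 1 + x^2$, $x(0) = 0$ on $[0, 0.5]$ with exact solution $\tan t$, a uniform grid, and the trapezoid rule for the integral.

```python
import math


def picard(f, a, steps=200, iterations=40):
    """Iterate (Ax)(t) = integral from 0 to t of f(s, x(s)) ds on a uniform grid
    over [0, a], starting from x = 0 and using the cumulative trapezoid rule."""
    h = a / steps
    grid = [k * h for k in range(steps + 1)]
    x = [0.0] * (steps + 1)
    for _ in range(iterations):
        values = [f(s, xs) for s, xs in zip(grid, x)]
        new_x = [0.0]
        for k in range(1, steps + 1):
            new_x.append(new_x[-1] + 0.5 * h * (values[k - 1] + values[k]))
        x = new_x
    return grid, x


# Illustrative problem: x' = 1 + x^2, x(0) = 0, whose solution is tan(t).
grid, x = picard(lambda t, x: 1.0 + x * x, 0.5)
error = max(abs(xk - math.tan(tk)) for tk, xk in zip(grid, x))
```

The quantity `error` is the largest deviation from $\tan t$ on the grid; it reflects only the discretization of the integral, since the iteration itself has converged after a few dozen steps.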
Proof. Let $B^n$ be the ball and $S^{n-1}$ the sphere that is its boundary. Suppose
that $f : B^n \to S^{n-1}$, that $f$ is continuous, and that $f(x) = x$ for all $x \in S^{n-1}$.
Let $g$ be the antipodal map on $S^{n-1}$, given by $g(x) = -x$. Then $g \circ f$ has
no fixed point (in violation of the Brouwer Fixed-Point Theorem). To see this,
suppose $g(f(z)) = z$. Then $f(z) = -z$ and $1 = \|f(z)\| = \|-z\| = \|z\|$. Thus
$z \in S^{n-1}$. The point $z$ contradicts our assumption that $f(x) = x$ on $S^{n-1}$. •
The next theorem is a companion to the corollary of Theorem 3. Notice that
the hypothesis of convexity has been transferred from the range to the domain
of f.
Proof. Let $r$ denote the radial projection into $B$ defined by $r(x) = x$ if $\|x\| \le 1$
and $r(x) = x/\|x\|$ if $\|x\| > 1$. This map is continuous (Problem 1). Hence $r \circ f$
maps $B$ into a compact subset of $B$. By Theorem 6, $r \circ f$ has a fixed point $x$ in
$B$. If $\|x\| = 1$, then $\|f(x)\| \le 1$ by hypothesis, and we have $x = r(f(x)) = f(x)$
by the definition of $r$. If $\|x\| < 1$, then $\|r(f(x))\| < 1$ and $x = r(f(x)) = f(x)$,
again by the definition of $r$. •
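The radial projection $r$ used in this proof is simple enough to write down directly. The sketch assumes the Euclidean norm on $\mathbb{R}^n$ and represents points as Python lists; the name `radial_projection` is illustrative.

```python
import math


def radial_projection(x):
    """Return x if its Euclidean norm is at most 1, and x / ||x|| otherwise."""
    norm = math.sqrt(sum(c * c for c in x))
    if norm <= 1.0:
        return list(x)
    return [c / norm for c in x]


inside = radial_projection([0.3, 0.4])   # norm 0.5, left unchanged
outside = radial_projection([3.0, 4.0])  # norm 5, pulled back to the unit sphere
```

Points of the ball are fixed by the map and all other points are sent to the boundary sphere, which is why $r \circ f$ maps $B$ into $B$ in the argument above.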
$\|x\| \le 1 - \varepsilon$
$1 - \varepsilon \le \|x\| \le 1$
Notice that $g_\varepsilon$ is continuous, since the two formulas agree when $\|x\| = 1 - \varepsilon$.
If $x \in \partial B$, then $\|x\| = 1$ and $g_\varepsilon(x) = f_0(x) \in B$. Thus $g_\varepsilon$ maps $\partial B$ into $B$. If
$K$ is a compact set containing all the images $f_t(B)$, then $g_\varepsilon(B) \subset K$, by the
definition of $g_\varepsilon$. The map $g_\varepsilon$ satisfies the hypotheses of Theorem 7, and $g_\varepsilon$ has
a fixed point $x_\varepsilon$ in $B$.
We now shall prove that for all sufficiently small $\varepsilon$, $\|x_\varepsilon\| \le 1 - \varepsilon$. If this
is not true, then we can let $\varepsilon$ converge to zero through a suitable sequence of
values and have, for each $\varepsilon$ in the sequence, $\|x_\varepsilon\| > 1 - \varepsilon$. Since $g_\varepsilon(x_\varepsilon) = x_\varepsilon$,
we see that $x_\varepsilon$ is in $K$. By compactness, we can assume that the sequence of $\varepsilon$'s
has the further properties $x_\varepsilon \to x_0$ and $(1 - \|x_\varepsilon\|)/\varepsilon \to t$, where $x_0 \in K$ and
$t \in [0, 1]$. By the definition of $g_\varepsilon$,
The points $x_\varepsilon$ belong to $K$, and for any cluster point we will have $x = f_1(x)$. •
Problems 7.1
1. Prove that the radial projection defined in the proof of Theorem 7 is continuous.
2. Prove Theorem 7 for an arbitrary closed convex set that contains 0 as an interior point.
Hint: Replace the norm by a Minkowski functional as in the proof of the lemma.
3. In Theorem 6 assume that D is closed. Show that the theorem is now an easy corollary
of Theorem 3, by using the closed convex hull of K and Mazur's Theorem.
4. Prove that the unit ball in $\ell^2(\mathbb{Z})$ does not have the fixed-point property by following
this outline. Points in $\ell^2(\mathbb{Z})$ are functions on $\mathbb{Z}$ such that $\sum |x(n)|^2 < \infty$. Let $\delta$ be
the element in $\ell^2(\mathbb{Z})$ such that $\delta(0) = 1$, and $\delta(n) = 0$ otherwise. Let A be the linear
operator defined by $(Ax)(n) = x(n + 1)$. Define $f(x) = (1 - \|x\|)\delta + Ax$. This function
maps the unit ball into itself continuously but has no fixed point. This example is due
to Kakutani.
5. In $\mathbb{R}^n$, define $B = \{x : 0 < \|x\| \le 1\}$ and $S = \{x : \|x\| = 1\}$. Is there a continuous
map $f : B \to S$ such that $f(x) = x$ when $x \in S$? (Cf. Theorem 5.)
6. In an alternative exposition of fixed-point theory, Theorem 5 is established first, and
then the Brouwer theorem is proved from it. Fill in this outline of such a proof. Suppose
$f : B^n \to B^n$ is continuous and has no fixed point. Define a retraction g of $B^n$ onto
$S^{n-1}$ as follows. Let g(x) be the point where the ray from f(x) through x pierces $S^{n-1}$.
7. In 1904, Bohl proved that the "cube" $K = \{x \in \mathbb{R}^n : \|x\|_\infty \le 1\}$ has this property: If f
maps K continuously into K and maps no point to 0, then for some x on the boundary
of K, f(x) is a negative multiple of x. Using Bohl's Theorem, prove that the boundary of
K is not a retract of K (and thus substantiate the claim that Bohl deserves much credit
for the Brouwer Theorem).
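The map in Problem 4 can be explored on finitely supported sequences. The sketch below is only an illustration under assumptions of our own: a point of $\ell^2(\mathbb{Z})$ with finite support is stored as a Python dictionary from indices to values, and the names `shift` and `kakutani` are hypothetical. It checks nothing about the infinite-dimensional argument; it merely evaluates $f(x) = (1 - \|x\|)\delta + Ax$ at sample points.

```python
import math

def norm(x):
    """The l2 norm of a finitely supported sequence stored as {index: value}."""
    return math.sqrt(sum(v * v for v in x.values()))

def shift(x):
    """(Ax)(n) = x(n + 1): the entry at index k moves to index k - 1."""
    return {k - 1: v for k, v in x.items()}

def kakutani(x):
    """f(x) = (1 - ||x||) * delta + A x, where delta(0) = 1 and delta(n) = 0 otherwise."""
    y = shift(x)
    y[0] = y.get(0, 0.0) + (1.0 - norm(x))
    return y

x = {0: 0.3, 1: 0.4}      # a point of norm 0.5 in the unit ball
fx = kakutani(x)          # entries at indices -1 and 0
f_zero = kakutani({})     # the image of the zero sequence is delta
```

For this sample point the image still has norm at most 1, in agreement with the estimate $\|f(x)\| \le (1 - \|x\|) + \|Ax\| = 1$, and the zero sequence is visibly not fixed.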
Let X and Y be two topological spaces. The notation $2^Y$ denotes the family
of all subsets of Y. Let $\Phi : X \to 2^Y$. Thus, for each $x \in X$, $\Phi(x)$ is a subset of
Y. Such a map is said to be set-valued. A selection for $\Phi$ is a map $f : X \to Y$
such that $f(x) \in \Phi(x)$ for each $x \in X$. Thus f "selects" an element of $\Phi(x)$,
namely f(x). If $\Phi(x)$ is a nonempty subset of Y for each $x \in X$, then a selection
f must exist. This is one way of expressing the axiom of choice. In the setting
adopted above, one can ask whether $\Phi$ has a continuous selection. The Michael
Selection Theorem addresses this question.
Here is a concrete situation in which a good selection theorem can be used.
Let X be a Banach space, and Y a finite-dimensional subspace in X. For each
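$x \in X$, we define the distance from x to Y by the formula
$\operatorname{dist}(x, Y) = \inf\{\|x - y\| : y \in Y\}$
Because Y is finite-dimensional, the set
$\Phi(x) = \{y \in Y : \|x - y\| = \operatorname{dist}(x, Y)\}$
is nonempty. That is, each x in X has at least one nearest point (or "best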
approximation") in Y. In general, the nearest point will not be unique. See the
sketch in Figure 7.1 for the reason.
Figure 7.1
In the sketch, the box with center at 0 is the unit ball. The line of slope 1
represents a subspace Y. The small box is centered at a point x outside Y. That
box is the ball of least radius centered at x that intersects Y. The intersection is
$\Phi(x)$. The set $\Phi(x)$ is the set of all best approximations to x in Y. In this case
$\Phi(x)$ is convex, since it is the intersection of a subspace with a ball. It is also
closed, by the definition of $\Phi$ and the continuity of the norm. Now we ask, is
there a continuous map $f : X \to Y$ such that for each x, f(x) is a nearest point
to x in Y? One way to answer such questions is to invoke Michael's theorem, to
which we now turn.
First some definitions are required. An open covering of a topological
space X is a family of open sets whose union is X. One covering B is a re-
finement of another A if each member of B is contained in some member of
A. A covering B is said to be locally finite if each point of X has a neigh-
borhood that intersects only finitely many members of B. A Hausdorff space X
is paracompact if each open covering of X has a refinement that is an open
and locally finite covering of X ([Kel], page 156). Clearly, a compact Hausdorff
space is paracompact.
It is a nontrivial and useful fact that all metric spaces are paracompact
([Kel], page 156). In many applications this obviates the proving of paracompactness
by means of special arguments.
Given the set-valued mapping $\Phi : X \to 2^Y$ and a subset U in Y, we put
$\Phi^-(U) = \{x \in X : \Phi(x) \cap U \ne \emptyset\}$
For the proof of this theorem, we refer the reader to [Michl] and [Mich2]. As an
application of Michael's theorem, we give a result about approximating possibly
discontinuous maps by continuous ones.
Proof. Let $\lambda$ denote the number on the right in Inequality (1). For each $x \in X$,
define
$\Phi(x) = \{h \in H : \|f(x) - h\| \le \lambda\}$
This set is nonempty because $g(x) \in \Phi(x)$. (Notice that g is a selection for $\Phi$
but not necessarily a continuous selection.) The set $\Phi(x)$ is closed and convex
in the Banach space H.
We shall prove that $\Phi$ is lower semicontinuous. Let U be open in H. It is to
be shown that $\Phi^-(U)$ is open in X. Let $x \in \Phi^-(U)$. Then $\Phi(x) \cap U$ is nonempty.
Select h in this set. Then $h \in U$ and $\|f(x) - h\| \le \lambda$. Also $\|f(x) - g(x)\| < \lambda$.
So, by considering the line segment from h to g(x), we conclude that there is
an $h' \in U$ such that $\|f(x) - h'\| < \lambda$. Since f is continuous at x, there is a
neighborhood N of x such that
Another important theorem that follows readily from Michael's is the theorem
of Bartle and Graves:
$\Phi(y) = \{x \in X : Ax = y\}$
Obviously, each set $\Phi(y)$ is closed, convex, and nonempty. Is $\Phi$ lower semicontinuous?
Let O be open in X. We must show that the set $\Phi^-(O)$ is open in
Y. But $\Phi^-(O) = A(O)$ by a short calculation. By the Interior Mapping Theorem
(Section 1.8, page 48), A(O) is open. Thus $\Phi$ is lower semicontinuous, and
by Michael's theorem, a continuous selection f exists. Thus $f(y) \in \Phi(y)$, or
$A(f(y)) = y$. •
In the literature there are many selection theorems that involve measurable
functions instead of continuous ones. If X is a measurable space and Y a
topological space, a map $\Phi : X \to 2^Y$ is said to be weakly measurable if the
set
$\{x \in X : \Phi(x) \cap O \text{ is not empty}\}$
is measurable in X for each open set O in Y. (For a discussion of measurable
spaces, see Section 8.1, pages 381ff.) The measurable selection theorem of Kuratowski
and Ryll-Nardzewski follows. Its proof can be found in [KRN], [Part],
and [Wag].
The next three theorems are called "separation theorems." They pertain
to disjoint pairs of convex sets, and to the positioning of a hyperplane so that
the convex sets are on opposite sides of the hyperplane. In $\mathbb{R}^2$, the hyperplanes
are lines, and simple figures show the necessity of convexity in carrying out
this separation. In Theorem 3, one can see the necessity of compactness by
considering one set to be the lower half plane and the other to be the set of
points (x, y) for which $y \ge x^{-1}$ and $x > 0$.
Section 7.3 Separation Theorems 343
Proof. The set $K_1 - K_2$ is closed and convex. (See Problems 1.2.19 on page
12 and 1.4.17 on page 23.) Also, $0 \notin K_1 - K_2$, and consequently there is a
ball $B(0, r)$ that is disjoint from $K_1 - K_2$. By the preceding theorem, there is a
nonzero continuous functional $\phi$ such that
Proof. For the sufficiency of the condition, assume the condition to be true.
Thus, $0 \notin \operatorname{co}(U)$. By Theorem 3, there is a vector x and a real number $\lambda$ such
that co(U) and 0 are on opposite sides of the hyperplane
$\{y : \langle y, x\rangle = \lambda\}$
We can suppose that $\langle y, x\rangle > \lambda$ for $y \in \operatorname{co}(U)$ and that $\langle 0, x\rangle < \lambda$. Obviously,
$\lambda > 0$ and x solves the system (1).
Now assume that system (1) is consistent and that x is a solution of it. By
continuity and compactness, there exists a positive $\varepsilon$ such that $\langle u, x\rangle \ge \varepsilon$ for all
$u \in U$. For any $v \in \operatorname{co}(U)$ we can write a convex combination $v = \sum \theta_i u_i$ and
then compute
$S_n = \{x \in \mathbb{R}^n : x \ge 0 \text{ and } \sum_{i=1}^n x_i = 1\}$
then this common number is the amount that one player can be sure of winning
and is the limit on what the other can lose.
In the more interesting case (which will include the case just discussed) the
players will make random choices of the two integers (following carefully assigned
probability distributions) and play the game over and over. Player 1 will assign
Proof. It is easy to prove an inequality $\le$ between the terms in the above
equation. To do so, let $u \in S_n$ and $v \in S_m$. Then
Since u and v were arbitrary in the sets Sn and Sm, respectively, we can choose
them so that we get
Now suppose that a strict inequality holds in Inequality (2). Select a real
number r such that
Consider the matrix A' whose generic element is $a_{ij} - r$. By Theorem 5 (applied
actually to $-A'$), either $A'u \le 0$ for some $u \in S_n$ or $v^T A' \ge 0$ for some $v \in S_m$.
If the first of these alternatives is true, then for all $x \in S_m$, we have $x^T A'u \le 0$.
In quick succession, one concludes that $\max_x x^T A'u \le 0$, $\min_y \max_x x^T A'y \le 0$,
and $\min_y \max_x x^T Ay \le r$. In the last inequality we simply compute the
bilinear form $x^T A'y$, remembering that $x \in S_m$ and $y \in S_n$. The concluding
inequality here is a direct contradiction of Inequality (3).
Similarly, if there exists $v \in S_m$ for which $v^T A' \ge 0$, then we have for all $y \in S_n$,
$v^T A'y \ge 0$, $\min_y v^T A'y \ge 0$, $\max_x \min_y x^T A'y \ge 0$, and $\max_x \min_y x^T Ay \ge r$,
contradicting Inequality (3) again. •
The common value of these three quantities is called the value of the game.
Convenient references for these matters and for the theory of games in general
are [McK] and [Mor].
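When the players are restricted to pure strategies, the two quantities $\max_i \min_j a_{ij}$ and $\min_j \max_i a_{ij}$ can be computed directly, and they bracket the value of the game. The following Python sketch does this for a payoff matrix stored as a list of rows; the function names and the two sample matrices are assumptions chosen for illustration, and no attempt is made to compute optimal mixed strategies.

```python
def maximin(a):
    """max over rows i of min over columns j of a[i][j]."""
    return max(min(row) for row in a)

def minimax(a):
    """min over columns j of max over rows i of a[i][j]."""
    return min(max(col) for col in zip(*a))

def saddle_points(a):
    """Index pairs (r, s) with a[i][s] <= a[r][s] <= a[r][j] for all i and j."""
    cols = list(zip(*a))
    return [(r, s) for r, row in enumerate(a) for s, v in enumerate(row)
            if v == min(row) and v == max(cols[s])]

with_saddle = [[4, 2, 5],
               [3, 1, 6]]
no_saddle = [[1, -1],
             [-1, 1]]
```

For `with_saddle` the two numbers coincide at the entry in row 0, column 1, so pure strategies are already optimal. For `no_saddle` they are -1 and 1, and only mixed strategies close the gap; with equal weights on both rows and both columns the bilinear form takes the value 0.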
Problems 7.3
1. Let $f : X \times Y \to \mathbb{R}$, where X and Y are arbitrary sets. Prove that
$\sup_x \inf_y f(x, y) \le \inf_y \sup_x f(x, y)$
(In order for this to be universally valid, one must admit $+\infty$ as a permissible value for
the supremum and $-\infty$ for the infimum.)
2. Prove for any $u \in \mathbb{R}^n$ that $\max_{x \in S_n} \langle u, x\rangle = \max_{1 \le i \le n} u_i$.
3. Let $P_n$ denote the set of x in $\mathbb{R}^n$ that satisfy $x \ge 0$. Prove that for $u \in \mathbb{R}^n$ and $\lambda \in \mathbb{R}$
these properties are equivalent:
a. $P_n \subset \{x : \langle u, x\rangle > \lambda\}$
b. $u \in P_n$ and $\lambda < 0$.
4. Saddle points. If there is a pair of integers (r, s) such that $a_{is} \le a_{rs} \le a_{rj}$ for all i
and j, then $a_{rs}$ is called a "saddle point" for the rectangular game. Prove that if such a
point exists, each player has an optimal strategy of the form (0, ..., 0, 1, 0, ..., 0).
5. (A variation on Theorem 4) Let X be any linear space, $\Phi$ a set of linear functionals on
X. Prove that the system of linear inequalities
7. (Separation theorem in Hilbert space) Let K be a closed convex set in Hilbert space, and
u a point outside K. Then there is a unique point v in K such that for all x in K,
8. Let $\phi_1, \phi_2, \ldots, \phi_n$ be continuous linear functionals on a normed space. Let $a_1, a_2, \ldots, a_n$
be scalars, and define affine functionals $\psi_i(x) = \phi_i(x) + a_i$. Define $F(x) = \max_i \psi_i(x)$.
Prove that F is bounded below if and only if the inequality $\max_i \phi_i(x) \ge 0$ is true for
all x of norm 1.
this for $L^1$-spaces, and the Fréchet-Kolmogorov Theorem does it for the $L^p$-spaces.
The most extensive source for results on this topic is [DS], Chapter 4.
See also [Yo] for the Fréchet-Kolmogorov Theorem.
We begin with spaces of continuous functions. Let (X, d) and $(Y, \rho)$ be
compact metric spaces. For example, X and Y could be compact intervals on
the real line. We denote by C(X, Y) the space of all continuous maps from X
into Y. It is known that continuity and uniform continuity are the same for
maps of X into Y. Thus, a map f of X into Y belongs to C(X, Y) if and only
if there corresponds to each positive $\varepsilon$ a positive $\delta$ such that $\rho(f(u), f(v)) < \varepsilon$
whenever $d(u, v) < \delta$.
The space C(X, Y) is made into a metric space by defining its distance
function $\Delta$ by the equation
where $B(f, \varepsilon)$ is the ball $\{g : \Delta(f, g) < \varepsilon\}$. A finite set of continuous functions is
obviously equicontinuous, and therefore there exists a $\delta$ for which this implication
is valid:
The index j is chosen so that $\Delta(g, f_j) < \varepsilon$. The above inequality establishes the
equicontinuity of K.
Section 7.4 The Arzela-Ascoli Theorems 349
Proof. Given $\varepsilon > 0$, put $S_k = \{x : |f_k(x)| \ge \varepsilon\}$. Then each $S_k$ is closed, and
$S_{k+1} \subset S_k$. For each x there is an index k such that $x \notin S_k$. Hence $\bigcap_{k=1}^{\infty} S_k$ is
empty. By compactness and the finite intersection property, we conclude that
$\bigcap_{k=1}^{n} S_k$ is empty for some n. This means that $S_n$ is empty, and that $|f_n(x)| < \varepsilon$
for all x. Thus $|f_k(x)| < \varepsilon$ for all $k \ge n$. This is uniform convergence. •
We conclude this section by quoting some further compactness theorems.
In the spaces $L^p(\mathbb{R})$, the following characterization of compact sets holds. Here,
$1 \le p < \infty$. A precursor of this theorem was given by Riesz, and a generalization
to locally compact groups with their Haar measure has been proved by Weil. See
[Edw] page 269, [DS] page 297, and [Yo] page 275.
$\lim_{h \to 0} \int_{\mathbb{R}} |f(x + h) - f(x)|^p \, dx = 0$
$\lim_{M \to \infty} \int_{|x| > M} |f(x)|^p \, dx = 0$
Problems 7.4
5. Let K be a set of continuously differentiable functions on [a, b]. Put $K' = \{f' : f \in K\}$.
Prove that if K' is bounded, then K is equicontinuous.
6. Let K be an equicontinuous set in C[a, b]. Prove that if there exists a point $x_0$ in [a, b]
such that $\{f(x_0) : f \in K\}$ is bounded, then K is bounded.
7. Define an operator L on the space C[a, b] by
where k is continuous on $[a, b] \times [a, b]$. Prove that L is a compact operator; that is, it
maps the unit ball into a compact set.
8. Prove or disprove: Let $[f_n]$ be a sequence of continuous functions on a compact space.
Let f be a function such that $|f(x) - f_n(x)| \downarrow 0$ for all x. Then f is continuous and the
convergence $f_n \to f$ is uniform.
9. Select an element $a \in \ell^2$, and define
Prove that K is compact. The special case when $a_i = 1/i$ gives the so-called Hilbert
cube.
10. Prove that not every compact set in $\ell^2$ is of the form described in the preceding problem.
11. Reconcile the compactness theorems for $L^2$ and $\ell^2$, in the light of the isometry between
these spaces.
Furthermore,
These two equations prove that $y_n \in X_n \setminus X_{n-1}$ and that those inclusions mentioned
above are proper.
By the Riesz Lemma (Section 1.4, page 22), there exist points $x_n$ such
that $x_n \in X_n$, $\|x_n\| = 1$, and $\operatorname{dist}(x_n, X_{n-1}) \ge 1/2$. If $m > n$, then we
have $B^m x_m = 0$ because $x_m \in X_m = \ker(B^m)$. Also, $B^{m-1} x_n = 0$ because
$x_n \in X_n \subset X_{m-1}$. Finally, $B^m x_n = 0$ because $x_n \in X_n \subset X_m$. These
observations show that
$y_{n_i} \to y$, we infer that $x_{n_i} = y_{n_i} - Ax_{n_i} \to y - u$. Then $y = \lim Bx_{n_i} = B(y - u)$,
and y is in the range of B. This completes the proof in this case.
If $[x_n]$ contains no bounded subsequence, then $\|x_n\| \to \infty$. Since $y \ne 0$, we
can discard a finite number of terms from the sequence $[x_n]$ and assume that
$x_n \notin K$ for all n. Using Riesz's Lemma, construct vectors $v_n = k_n + \alpha_n x_n$ so
that $\|v_n\| = 1$, $k_n \in K$, and $\operatorname{dist}(v_n, K) \ge 1/2$. Note that
(1)
and
Since each $A^k$ is compact (for $k \ge 1$), $B^n$ is the identity plus a compact operator.
Thus $X_n$ is closed by Lemma 2.
If $x \in X_n$ for some n, then for an appropriate u we have
Thus
(2)
If all the inclusions in the list (2) are proper, we can use Riesz's Lemma to
select $x_n \in X_n$ such that $\|x_n\| = 1$ and $\operatorname{dist}(x_n, X_{n+1}) \ge 1/2$. Then, for $n < m$,
we have
Proof. Suppose that $B + A$ is injective. Then so are $B^{-1}(B + A)$ and $I + B^{-1}A$.
Now, the product of a compact operator with a bounded operator is compact.
(See Problem 7.) Thus, Theorem 1 is applicable, and $I + B^{-1}A$ is surjective.
Hence so are $B(I + B^{-1}A)$ and $B + A$. The proof of the reverse implication is
similar. •
Section 7.5 Compact Operators and the Fredholm Theory 355
because $\psi \circ A \in X'$. Thus $Ax_n \rightharpoonup 0$. If $Ax_n$ does not converge strongly to 0,
there will exist a subsequence such that $\|Ax_{n_i}\| \ge \varepsilon > 0$. By the compactness
of A, and by taking a further subsequence, we may assume that $Ax_{n_i} \to y$ for
some y. Obviously, $\|y\| \ge \varepsilon$. Now we have the contradiction $Ax_{n_i} \rightharpoonup y$ and
$Ax_{n_i} \rightharpoonup 0$. •
Lemma 4. Let $[A_n]$ be a bounded sequence of continuous linear
transformations from one normed linear space to another. If $A_n x \to 0$
for each x in a compact set K, then this convergence is uniform on K.
Proof. Suppose that the convergence in question is not uniform. Then there
exist a positive $\varepsilon$, a sequence of integers $n_i$, and points $x_{n_i} \in K$ such that
$\|A_{n_i} x_{n_i}\| \ge \varepsilon$. Since K is compact, we can assume at the same time that
$x_{n_i}$ converges to a point x in K. Then we have a contradiction of pointwise
convergence from this inequality:
$y = \sum_{k=1}^{\infty} \lambda_k(y) v_k$
(See Problems 24-26 in Section 1.6, pages 38-39.) The functionals $\lambda_k$ are continuous,
linear, and satisfy $\sup_k \|\lambda_k\| < \infty$. By taking the partial sum of the
first n terms, we define a projection $P_n$ of Y onto the linear span of the first n
vectors $v_k$. Now let A be a compact linear transformation from X to Y, and let
S denote the unit ball in X. The closure of A(S) is compact in Y, and $P_n - I$
converges pointwise to 0 in Y. By the preceding lemma, this convergence is
uniform on A(S). This implies that $(P_n A - A)(x)$ converges uniformly to 0 on
S. Since each $P_n A$ has finite-dimensional range, this completes the proof. •
$\sum_{j=1}^{n} c_j v_j - \lambda x = b$
$\sum_{j=1}^{n} c_j A v_j - \lambda A x = A b$
$\sum_{j=1}^{n} c_j \sum_{i=1}^{n} a_{ij} v_i - \lambda \sum_{i=1}^{n} c_i v_i = \sum_{i=1}^{n} \beta_i v_i$
$= \frac{1}{\lambda}\Bigl[\sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij} c_j v_i - \sum_{i=1}^{n} \beta_i v_i - \lambda \sum_{i=1}^{n} c_i v_i\Bigr] + b = b$
This analysis has established that the original problem is equivalent to a
matrix problem of order n, where n is the dimension of the range of the operator
A. The actual numerical calculations to obtain x involve solving the Equation
(3) for the unknown coefficients Cj. We have not yet made any assumptions to
guarantee the solvability of Equation (3).
Now one can prove the Fredholm Alternative for this case by elementary
linear algebra. Indeed we have these equivalences (in which $\lambda \ne 0$):
(i) $A - \lambda I$ is surjective.
(ii) For each b there is an x such that $Ax - \lambda x = b$.
(iii) For all $(\beta_1, \ldots, \beta_n)$ the system $\sum_{j=1}^{n} a_{ij} c_j - \lambda c_i = \beta_i$ ($i = 1, \ldots, n$) is
soluble.
(iv) The system $\sum_{j=1}^{n} a_{ij} c_j - \lambda c_i = 0$ has only the trivial solution.
(v) The equation $Ax - \lambda x = 0$ has only the trivial solution.
(vi) $\lambda$ is not an eigenvalue of the operator A.
(vii) $A - \lambda I$ is injective.
Integral Equations. The theory of linear operators is well illustrated in the
study of integral equations. These have arisen in earlier parts of this book, such
as in Sections 2.3, 2.5, 4.1, 4.2, and 4.3. A special type of integral equation
has what is known as a degenerate kernel. A kernel is called degenerate or
separable if it is of the form $k(s, t) = \sum_{i=1}^{n} u_i(s) v_i(t)$. The corresponding integral
operator is
$(Kx)(t) = \int k(s, t) x(s) \, ds = \sum_{i=1}^{n} v_i(t) \int u_i(s) x(s) \, ds$
If we use the inner-product notation for the integrals in the above equation,
we have the simpler form
$Kx = \sum_{i=1}^{n} \langle x, u_i\rangle v_i$
It is clear that K has a finite-dimensional range and is therefore a compact
operator. (Various spaces are suitable for the discussion.)
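The reduction of $Kx - \lambda x = b$ to a matrix problem of order n can be tried out in a finite-dimensional model. In the Python sketch below, functions are replaced by vectors in $\mathbb{R}^m$ and integrals by the Euclidean inner product; these replacements, the helper names, and the sample data are all assumptions made for illustration. With $a_{ij} = \langle v_j, u_i\rangle$ and $\beta_i = \langle b, u_i\rangle$, the code solves $\sum_j a_{ij} c_j - \lambda c_i = \beta_i$ and then recovers $x = (\sum_j c_j v_j - b)/\lambda$.

```python
def dot(x, y):
    """Euclidean inner product of two vectors given as lists."""
    return sum(s * t for s, t in zip(x, y))

def solve_linear(m, rhs):
    """Solve the square system m c = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [[float(t) for t in row] + [float(r)] for row, r in zip(m, rhs)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(aug[i][k]))
        if aug[p][k] == 0.0:
            raise ValueError("singular system")
        aug[k], aug[p] = aug[p], aug[k]
        for i in range(k + 1, n):
            factor = aug[i][k] / aug[k][k]
            for j in range(k, n + 1):
                aug[i][j] -= factor * aug[k][j]
    c = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * c[j] for j in range(i + 1, n))
        c[i] = (aug[i][n] - s) / aug[i][i]
    return c

def apply_K(us, vs, x):
    """Kx = sum_i <x, u_i> v_i."""
    return [sum(dot(x, u) * v[k] for u, v in zip(us, vs)) for k in range(len(x))]

def solve_finite_rank(us, vs, lam, b):
    """Solve Kx - lam * x = b for lam != 0 by reducing to an n-by-n system."""
    n = len(us)
    # a[i][j] = <v_j, u_i>, so that K v_j = sum_i a[i][j] v_i
    a = [[dot(vs[j], us[i]) for j in range(n)] for i in range(n)]
    m = [[a[i][j] - (lam if i == j else 0.0) for j in range(n)] for i in range(n)]
    beta = [dot(b, u) for u in us]   # K b = sum_i beta[i] v_i
    c = solve_linear(m, beta)        # sum_j a[i][j] c[j] - lam * c[i] = beta[i]
    # x = (sum_j c[j] v_j - b) / lam
    return [(sum(c[j] * vs[j][k] for j in range(n)) - b[k]) / lam
            for k in range(len(b))]

us = [[1, 0, 0], [0, 1, 0]]
vs = [[1, 1, 0], [0, 1, 1]]
lam, b = 2.0, [1, 0, -1]
x = solve_finite_rank(us, vs, lam, b)
residual = [kx - lam * t - s for kx, t, s in zip(apply_K(us, vs, x), x, b)]
```

The sketch raises an error when the n-by-n system is singular, which is the situation in which $\lambda$ is an eigenvalue and the Fredholm Alternative has to be consulted. The elimination routine is written out by hand only to keep the example free of external libraries.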
$= \sum_{i=1}^{n-1} \bigl[\langle x, u_i\rangle + a_i \langle x, u_n\rangle\bigr] v_i = \sum_{i=1}^{n-1} \langle x, u_i + a_i u_n\rangle v_i$
$\sum_{j=1}^{n} a_{ij} c_j - \lambda c_i = \beta_i \qquad (1 \le i \le n)$
By Problem 2 in Section 4.3 (page 189), $(A_n - \lambda I)^{-1}$ exists (when $n \ge m$). Now
write
Problems 7.5
7. Prove that the set of compact operators in $\mathcal{L}(X, X)$ is an ideal. This means that if A
is compact and B is any bounded linear operator, then AB and BA are compact. (This
property is in addition to the subspace axioms.)
8. Let A be a compact operator on a Banach space. Prove that if I - A is injective, then it
is invertible.
9. Let $A, B \in \mathcal{L}(X, X)$. Assume that $AB = BA$ and that AB is invertible. Prove
that (a) $A(AB)^{-1} = (AB)^{-1}A$; (b) $B^{-1} = A(AB)^{-1}$; (c) $A^{-1} = B(AB)^{-1}$; (d)
$A(AB)^{-1} = (AB)^{-1}A$; (e) $(BA)^{-1} = A^{-1}B^{-1}$.
10. Let A be a linear operator from X to Y. Suppose that we are in possession of elements
$u_1, u_2, \ldots, u_n$ whose images under A span the range of A. Describe how to solve the
equation Ax = b.
11. Prove that if A is a compact operator on a normed linear space, then for some natural
number n, the ranges of $(I + A)^n, (I + A)^{n+1}, \ldots$ are all identical.
12. Let A be a linear transformation defined on and taking values in a linear space. Prove
that if A is surjective but not injective, then $\ker(A^n)$ is a proper subset of $\ker(A^{n+1})$,
for $n = 1, 2, 3, \ldots$
13. Let A be a bounded linear operator defined on and taking values in a normed linear
space. Suppose that for $n = 1, 2, 3, \ldots$, the range of $A^n$ properly contains the range of
$A^{n+1}$. Prove that the sum A + K is never invertible when K is a compact operator.
Section 7.6 Topological Spaces 361
14. Prove that if $A_n$ are continuous linear transformations acting between two Banach spaces,
and if $A_n x \to 0$ for all x, then this convergence is uniform on all compact sets. (Cf.
Lemma 4.)
15. Give examples of operators on the space $c_0$ that have one but not both of the properties
injectivity and surjectivity.
16. (More general form of Lemma 5) Let A be an operator defined by the equation $Ax =
\sum_{i=1}^{n} \phi_i(x) v_i$, in which $x \in X$, $v_i \in Y$, and $\phi_i \in X'$. Prove that there is no loss of
generality in supposing that $\{v_1, \ldots, v_n\}$ and $\{\phi_1, \ldots, \phi_n\}$ are linearly independent sets.
17. Show how to solve the equation $Ax - x = b$ if the range of A is spanned by $\{Au_1, \ldots, Au_n\}$,
for some $u_i \in X$. Prove that the equation is solvable if 1 is not an eigenvalue of A.
18. Let A and B be members of $\mathcal{L}(X, Y)$, where X and Y are Banach spaces. Suppose that
B is invertible and that $(B^{-1}A)^m$ is compact for some natural number m. Prove that
B + A is surjective if and only if it is injective.
19. Provide the details for the assertions in Example 2.
20. Use the Stone-Weierstrass Theorem to prove this result of Diaconis and Shahshahani
[DiS]: If X is a normed linear space and f is a continuous function from X to $\mathbb{R}$, then
for any compact set K in X and for any positive $\varepsilon$ there exist $\phi_i \in X'$ and coefficients
$c_i$ such that
$\sup_{x \in K} \Bigl| f(x) - \sum_{i=1}^{n} c_i e^{\phi_i(x)} \Bigr| < \varepsilon$
21. Prove this for an arbitrary compact operator A: The transformation I + A is surjective
if and only if -1 is not an eigenvalue of A.
22. Prove the finite-dimensional version of the Fredholm alternative, which we formulate as
follows, for an arbitrary matrix A and vector b: The system Ax = b is consistent if and
only if the system $y^T A = 0$, $y^T b \ne 0$ is inconsistent.
23. Discuss the existence and uniqueness of solutions to the integral equation
that lies wholly in V. The pair (X, d) is a metric space, and is a topological
space, it being understood that its topology is the one just described. All normed
linear spaces are metric spaces, because the equation $d(x, y) = \|x - y\|$ defines
a metric. •
A topological space is said to be a Hausdorff space if for any pair of
distinct points x and y there is a disjoint pair of open sets U and V such that
$x \in U$ and $y \in V$. Every metric space is a Hausdorff space, since $B(x, \varepsilon)$ and
$B(y, \varepsilon)$ will be disjoint from each other if $\varepsilon$ is sufficiently small. The Hausdorff
property is one of a number of separation axioms that topological spaces may
satisfy. It is useful in questions of convergence, for it ensures that a sequence
(or net) can converge to at most one limit.
A base for a topology T is any subfamily B of T such that every open set
is a union of sets in B. For example, the open intervals with rational endpoints
form a base for the usual topology on $\mathbb{R}$. In a discrete space, the singletons {x}
form a base.
A partially ordered set is a pair $(D, \prec)$ in which D is a set and $\prec$ is a
relation obeying these axioms:
a. $\alpha \prec \alpha$
b. If $\alpha \prec \beta$ and $\beta \prec \gamma$, then $\alpha \prec \gamma$.
A directed set is a partially ordered set in which an additional axiom is required:
c. Given $\alpha$ and $\beta$ in D, there is $\gamma \in D$ such that $\alpha \prec \gamma$ and $\beta \prec \gamma$.
The reader will recognize $\mathbb{N}$ as a familiar example of a directed set, it being
understood that $\prec$ is the ordinary relation $\le$. Another important example is the
set of all neighborhoods of a point in a topological space, where $\prec$ is interpreted
as $\supset$.
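For a finite set the three axioms can be verified mechanically. The Python sketch below is an illustration under our own assumptions: the function name `is_directed` is hypothetical, the relation is passed in as a two-argument predicate, and the sample space is the discrete space {0, 1, 2}, in which the neighborhoods of the point 0 are exactly the subsets containing 0.

```python
from itertools import combinations

def is_directed(elements, precedes):
    """Check axioms a, b, and c for a finite set with the relation `precedes`."""
    reflexive = all(precedes(a, a) for a in elements)
    transitive = all(precedes(a, c)
                     for a in elements for b in elements for c in elements
                     if precedes(a, b) and precedes(b, c))
    bounded = all(any(precedes(a, g) and precedes(b, g) for g in elements)
                  for a in elements for b in elements)
    return reflexive and transitive and bounded

def contains(u, v):
    """The relation 'u precedes v' interpreted as: u is a superset of v."""
    return u >= v

# All subsets of {0, 1, 2} that contain the point 0.
neighborhoods = [frozenset(s) | {0}
                 for r in range(3) for s in combinations([1, 2], r)]

# Two sets with no common subset inside the family: axiom c fails.
not_directed = [frozenset({0, 1}), frozenset({0, 2})]
```

In the neighborhood family, any two members are both followed by their intersection, which is why axiom c holds there; the two-element family fails precisely because that intersection has been left out.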
A net or generalized sequence is a function on a directed set. This is
obviously more general than a sequence, which is a function on $\mathbb{N}$. We can use
the notation $[x, D, \prec]$ for a net, specifying the function x, the directed set D, and
the relation $\prec$. When we need not concern ourselves with niceties, the notation
$[x_\alpha]$ can be used, just as we abuse notation for sequences and write $[x_n]$.
Useful conventions are as follows. A net $[x_\alpha]$ is eventually in a set V if
there is a $\beta$ such that $x_\alpha \in V$ whenever $\beta \prec \alpha$. If a net is eventually in every
neighborhood of a point y, then we say that the net converges to y. Let us
illustrate with one example of a theorem employing nets.
Proof. Let X be the space, T the topology, and S the subbase in question.
Assume that every cover of X by elements of S has a finite subcover. Suppose
that X is not compact. We seek a contradiction.
The family of open covers that do not have finite subcovers is a nonempty
family. Partially order this family by inclusion, and invoke Zorn's Lemma (Section
1.6, page 32). This maneuver produces an open cover A of X that is
maximal with respect to the property of possessing no finite subcover. Define
$A' = S \cap A$. Certainly, no finite subfamily of A' covers X. Since all sets in A'
are members of the subbase, our hypotheses imply that A' itself does not cover
X.
This last assertion implies that there exists a point x that is contained in
no member of A'. Since A is an open cover of X, we can select $U \in A$ such that
$x \in U$. By the properties of a subbase, there exist sets $S_1, \ldots, S_n$ in S such that
$x \in \bigcap_{i=1}^{n} S_i \subset U$. Since x is contained in no member of A', one concludes that
$S_i \notin A'$. Hence $S_i \notin A$. By the maximal property of A, each enlarged family
$A \cup \{S_i\}$ contains a finite subcover of X, for $i = 1, 2, \ldots, n$. Hence, for each i in
$\{1, 2, \ldots, n\}$, there is an open set $O_i$ that is a union of finitely many sets in A
and has the property $O_i \cup S_i = X$. Define $B = O_1 \cup \cdots \cup O_n$. Then $B \cup S_i = X$
for each i, and $\bigcap_{i=1}^{n} (B \cup S_i) = X$. It follows that
Since B is the union of finitely many sets in A, and since $U \in A$, we see that
a finite subfamily of A covers X, contradicting a property of A established
previously. •
If we have two topological spaces, say $(X_1, T_1)$ and $(X_2, T_2)$, then we can
topologize the Cartesian product $X_1 \times X_2$ in a standard way: We take as a base
for the topology of $X_1 \times X_2$ the family of all sets $A \times B$, where A is open in $X_1$
and B is open in $X_2$. (The topology itself consists of all unions of sets in the
base.)
The notion of a product extends to any family (finite, countable, or uncountable)
of topological spaces $(X_i, T_i)$, where $i \in I$. The index set I can be
of arbitrary cardinality. The product space is denoted by $\prod X_i$ or $\prod\{X_i : i \in I\}$
and is defined to be the set of all functions x on I such that $x(i) \in X_i$ for all
$i \in I$. In this context, we usually write $x_i = x(i)$. This is exactly the process by
which we construct $\mathbb{R}^n$ from $\mathbb{R}$. We take n factors, all equal to $\mathbb{R}$. The generic
element of the product space is a "vector" that we write as $x = [x_1, x_2, \ldots, x_n]$.
Thus, x is a function on the index set $\{1, 2, \ldots, n\}$.
For each $i \in I$ there is a projection $P_i$ from the product space $X = \prod X_i$ to
$X_i$. It is defined by $P_i(x) = x_i$. The topology on X is taken to be the weakest
one that makes each of these projections continuous. One then must require
that each set $P_i^{-1}[O]$ be open when O is an open set in $X_i$. The family of all
these sets is taken as a subbase for the product topology on X.
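For two finite spaces one can check by brute force that the subbase of inverse images $P_i^{-1}[O]$ generates the same topology as the base of "rectangles" $A \times B$. The Python sketch below does this for a two-point Sierpinski space and a two-point discrete space; the choice of spaces and the function name `topology_from_subbase` are assumptions made for the illustration, and the enumeration of all subfamilies is feasible only because the spaces are tiny.

```python
from itertools import combinations, product

def topology_from_subbase(points, subbase):
    """Finite intersections of subbase sets form a base; unions of base sets are the open sets."""
    base = {frozenset(points)}
    for r in range(1, len(subbase) + 1):
        for family in combinations(subbase, r):
            base.add(frozenset.intersection(*family))
    base = list(base)
    opens = {frozenset()}
    for r in range(1, len(base) + 1):
        for family in combinations(base, r):
            opens.add(frozenset.union(*family))
    return opens

x1, t1 = [0, 1], [set(), {0}, {0, 1}]                    # Sierpinski space
x2, t2 = ["a", "b"], [set(), {"a"}, {"b"}, {"a", "b"}]   # discrete space
points = list(product(x1, x2))

# Inverse images of open sets under the two coordinate projections.
subbase = ([frozenset(p for p in points if p[0] in o) for o in t1] +
           [frozenset(p for p in points if p[1] in o) for o in t2])
# Rectangles A x B with A open in the first space and B open in the second.
rectangles = [frozenset(product(a, b)) for a in t1 for b in t2]

from_subbase = topology_from_subbase(points, subbase)
from_rectangles = topology_from_subbase(points, rectangles)
```

The rectangles are already closed under finite intersection, so passing them through the same routine simply forms all their unions; the two resulting families of open sets coincide.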
$W \subset S$ and $X = \bigcup\{O : O \in W\}$
For each i let $V_i$ be the family of all open sets in $X_i$ whose inverse images by $P_i$
are in W:
Assertion: For some i, $V_i$ covers $X_i$. To prove this, assume that it is false. Then
for each i, $V_i$ fails to cover $X_i$, and consequently, there exists a point $x_i \in X_i$
such that $x_i \notin \bigcup\{O : O \in V_i\}$. By the Axiom of Choice we can select these
points $x_i$ simultaneously and thereby construct an x in X such that
Consequently, we have for each i and for each open set O in $X_i$ the following
implications:
However, W consists exclusively of sets having the form $P_i^{-1}(O)$, and so the
above implication reads as follows:
This contradicts the fact that W is a cover of X, and proves the assertion.
Now select an index $j \in I$ such that $V_j$ covers $X_j$. By the compactness of $X_j$,
a finite subfamily of $V_j$ covers $X_j$, say $O_1, \ldots, O_n \in V_j$ and $X_j = O_1 \cup \cdots \cup O_n$. It
follows (by using $P_j^{-1}$) that $X = P_j^{-1}(O_1) \cup \cdots \cup P_j^{-1}(O_n)$. Since these n sets are
in W (by the definition of $V_j$), we have found a finite subcover in W, as desired.
Although normed linear spaces have served us well in this book, there are some
matters of importance in applied mathematics that require a more general topol-
ogized linear space. (The theory of distributions is a pertinent example.) What
is needed is a linear space in which the topological notions of continuity, com-
pactness, completeness, etc., do not necessarily arise from a norm and its induced
metric. The appropriate definition follows.
Definition. A linear topological space is a pair (X, T) in which X is a linear
space and T is a topology on X such that the algebraic operations in X are
continuous.
Being more specific about the continuity, we say that the two maps
$(x, y) \mapsto x + y$ and $(\lambda, x) \mapsto \lambda x$
are continuous, the first being defined on $X \times X$ and the second being defined
on $\mathbb{R} \times X$. There is a corresponding definition if the scalar field is taken to be
$\mathbb{C}$ rather than $\mathbb{R}$.
We remind the reader that the sets belonging to the family T are called
the open sets, and a neighborhood of a point x is any set U such that for some
open set O we have $x \in O \subset U$. The continuity axioms above can be stated in
terms of neighborhoods like this:
a. If U is a neighborhood of $x + y$, then there exist neighborhoods V of x
and W of y such that $v + w$ is in U whenever $v \in V$ and $w \in W$.
b. If U is a neighborhood of $\lambda x$, then there are neighborhoods V of $\lambda$ and
W of x such that $aw \in U$ whenever $a \in V$ and $w \in W$.
A very useful fact is that the topology is completely determined by the
neighborhoods of 0. This is formally stated in the next lemma.
Proof. Hold z fixed, and define $f(x) = x + z$. This mapping sends 0 to z. Let
V be a neighborhood of z. Since f is continuous, $f^{-1}(V)$ is a neighborhood of
0. Observe, now, that $f^{-1}(V) = \{x : f(x) \in V\} = \{x : x + z \in V\} = -z + V$.
Conversely, assume that $-z + V$ is a neighborhood of 0. We have $f^{-1}(x) = x - z$,
and $f^{-1}$ is also continuous. It maps z to 0. Hence $(f^{-1})^{-1}$ carries $-z + V$ to a
neighborhood of z. But
Proof. The Hausdorff property is that for any pair of points $x \ne y$ there must
exist neighborhoods U and V of x and y respectively such that the pair U, V
is disjoint. Select a neighborhood W of 0 such that $x - y \notin W$. Then (using
the continuity of subtraction) select another neighborhood W' of zero such that
$W' - W' \subset W$. Then $x + W'$ is disjoint from $y + W'$, for if z is a point in
their intersection, we could write $z = x + w_1 = y + w_2$, with $w_i \in W'$. Then
$x - y = w_2 - w_1 \in W' - W' \subset W$. The other half of the proof is even easier:
just separate any nonzero point from 0 by selecting a neighborhood of zero that
excludes the nonzero point. •
At this juncture, we should alert the reader to the fact that some authors
assume the Hausdorff property as part of the definition of a linear topological
space.
Let X be a linear space (preferably without a topology, so that confusion
between two topologies can be avoided in what we are about to discuss). The
notation X' signifies the algebraic dual of X, i.e., the space of all linear maps
from X into the scalar field ($\mathbb{R}$ or $\mathbb{C}$). We can use X' to define a "weak" topology
on X, and we can use X to define a weak topology on X'. There is an abstract
description that includes both of these constructions, but let us proceed in a
more pedestrian manner. What we have in mind is rather simple: We want
the topologies to lead to pointwise convergence in both cases. Although we
did not discuss it here, a topology can be defined by specifying the meaning of
convergence of nets. The topic is addressed in [Kel], pages 73-76.
The topology on X induced by X' can be called the weak topology. A base
for the neighborhood system at 0 is given by all sets of the form
In this equation $\varepsilon$ is any positive number, and $\{\phi_1, \ldots, \phi_n\}$ is any finite subset of
X'. Convergence in this topology means the following: A net $[x_\alpha]$ in X converges
to a point x if $[\phi(x_\alpha)]$ converges to $\phi(x)$ for every $\phi \in X'$.
The topology on X' induced by X is often called the weak* topology. A
base for the neighborhood system of 0 is given by
Here, $\varepsilon$ is any positive number and $\{x_1, \ldots, x_n\}$ is any finite set in X. With
this topology, a net $[\phi_\alpha]$ in X' converges to $\phi$ if and only if $[\phi_\alpha(x)]$ converges to
$\phi(x)$ for all $x \in X$. This is pointwise convergence.
The topologies just described are both Hausdorff topologies, as is easily
deduced from Theorem 1. Also, one sees immediately that the space X' is a
subspace of $\mathbb{R}^X$, since the latter is the space of all functions from $X$ to $\mathbb{R}$, while
the former contains only linear functions. This observation leads one to surmise
that the Tychonoff Theorem can help in understanding compactness in $X'$. The
result that carries this out requires one further notion: In a linear topological
space, a set $A$ is bounded if for any neighborhood of zero, say $U$, we have
$A \subset \lambda U$ for some real $\lambda$.
Section 7.7 Linear Topological Spaces 369
Proof. Let K be a compact set in X'. The set K is closed because in any
Hausdorff space compact sets are closed. (See [Kel], page 141.) To prove that
K is bounded, let U be any neighborhood of 0 in X'. It is to be shown that
$K \subset \lambda U$ for some $\lambda$. First, select a "basic" neighborhood $V = V(\varepsilon; x_1, \dots, x_m)$
contained in $U$. Then
$K \subset \bigcup\{\phi + V : \phi \in K\}$
(An easy calculation justifies the equation in this string of inclusions.) Consequently,
$K \subset (\lambda + 1)U$ and $K$ is bounded.
For the converse, let K be a closed and bounded set in X'. For each X in
X, define
$U_x = \{\phi \in X' : |\phi(x)| \le 1\}$
Naturally, we are more interested in the spaces that are already linear topo-
logical spaces. In this case, there will be two topologies on X and three on X'.
(Notice that X' will have a weak* topology coming from X and a weak topology
coming from X".) The originally given topologies can be called the "strong"
topologies, in contrast to the "weak" ones discussed above. (Rudin argues for
the term "original topology" instead of "strong topology," because elsewhere in
functional analysis, "strong topology" means something else.)
370 Chapter 7 Additional Topics
Proof. The linear space $X^*$ (whose elements are continuous linear functionals)
is a subspace of $X'$ (whose elements are linear functionals). The weak* topology
in $X^*$ is the relative topology in $X^*$ derived from the weak* topology on $X'$. By
the preceding theorem, we need only prove that $U^\circ$ is closed and bounded in the
weak* sense in $X'$. If we have a net $[\phi_\alpha]$ in $U^\circ$ and $\phi_\alpha \rightharpoonup \phi$, then $\phi_\alpha(x) \to \phi(x)$
for all $x \in U$. Consequently, $|\phi(x)| \le 1$ for all $x \in U$, and $\phi \in U^\circ$. Thus $U^\circ$ is
closed in the weak* topology of $X'$. If $W$ is any neighborhood of 0 in $X'$, then
$W$ contains a set of the form
Proof. In the preceding theorem, take U to be the unit ball of X. The polar
of U will then be the unit ball in X*. •
If the neighborhoods of 0 in a linear topological space have a base consisting
of convex sets, the space is said to be locally convex. It is these spaces that
we shall emphasize in the following discussion. Among such spaces we find the
normed spaces and the pseudo-normed spaces. A pseudo-norm or seminorm
is a real-valued function p defined on a linear space X such that:
1. $p(\lambda x) = |\lambda| p(x)$ for all $\lambda \in \mathbb{R}$, $x \in X$
2. $p(x + y) \le p(x) + p(y)$ for all $x, y \in X$
It follows that $p(0) = 0$ and that $p(x) \ge 0$ for all $x \in X$. If $p$ is a seminorm on
a linear space $X$, then in a standard way $X$ receives a locally convex topology.
Namely, a base for the neighborhoods of 0 is taken to be the family of sets
Proof. Let $P$ be the family of all continuous seminorms defined on the given
space. Let $U$ be a neighborhood of 0 in the original topology. First we must
prove that $U$ contains one of the sets $V(\varepsilon; p_1, \dots, p_n)$. Since the space is locally
convex, $U$ contains a convex neighborhood $U_1$ of 0. By the continuity of scalar
multiplication, there exists a convex neighborhood $U_2$ of 0 and a number $\delta > 0$
such that $cx \in U_1$ whenever $x \in U_2$ and $|c| < \delta$. The set $U_3 = \bigcup\{\lambda U_2 : |\lambda| < 1\}$
is a convex neighborhood of 0 contained in $U$. Its Minkowski functional $p$ is
continuous because for any $r > 0$ and any $x \in rU_3$, we have $p(x) \le r$. Thus,
$V(\tfrac{1}{2}; p) \subset U_3 \subset U$. (Minkowski functionals were defined in the proof of
Theorem 1 in Section 7.3, page 343.)
Now let $V$ be any "basic" neighborhood of 0 in the new topology. Say,
$V = V(\varepsilon; p_1, \dots, p_n)$. Since each $p_i$ is continuous in the original topology, $V$ is
open in the original topology. It therefore contains a convex neighborhood of 0
from the original topology. •
One of the main justifications for emphasizing locally convex linear topological
spaces is that such spaces have useful conjugate spaces. For any linear topological
space $X$, one can define $X^*$ to be the linear space of all continuous linear
functionals on $X$. Without further assumptions, $X^*$ may have only one element,
namely 0! A good example of this phenomenon is the space $\ell^p$ in which
$0 < p < 1$. The topology is given by a norm-like functional that is actually not
a norm (since it fails the triangle inequality):
$\|x\| = \left( \sum_{n=1}^{\infty} |x_n|^p \right)^{1/p}$
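The failure of the triangle inequality can be checked on a two-term example. The following is only an illustrative sketch: the function name `p_functional`, the choice $p = 1/2$, and the two test vectors are assumptions made here, not taken from the text.

```python
def p_functional(x, p):
    """Return (sum |x_n|^p)^(1/p) for a finite sequence x (hypothetical helper)."""
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

p = 0.5                      # any exponent with 0 < p < 1 would do
x = [1.0, 0.0]
y = [0.0, 1.0]
x_plus_y = [a + b for a, b in zip(x, y)]

lhs = p_functional(x_plus_y, p)                 # (1 + 1)^2
rhs = p_functional(x, p) + p_functional(y, p)   # 1 + 1
triangle_inequality_holds = lhs <= rhs          # False for this pair
```

With these vectors the functional of the sum exceeds the sum of the functionals, so the functional cannot be a norm.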
Example. Consider the space $\mathcal{D}$ of test functions on $\mathbb{R}^n$. This space was
defined in Chapter 5, page 247. Its elements are $C^\infty$ functions having compact
support. The convergence to zero of a sequence $[\phi_k]$ in $\mathcal{D}$ was defined to mean
that there was one compact set containing the supports of all $\phi_k$, and on that
compact set, $D^\alpha \phi_k$ converged uniformly to zero, for every $\alpha$. This notion of
convergence can be defined with a sequence of seminorms. For $j = 1, 2, 3, \dots$,
define
$p_j(\phi) = \sup\{ |(D^\alpha \phi)(x)| : x \in \mathbb{R}^n,\ \|x\| \le j,\ |\alpha| \le j \}$
Thus topologized, the space of test functions becomes a locally convex linear
topological space. Its conjugate space is the space of distributions. •
In a linear topological space, a set A is totally bounded if, for any neigh-
borhood U of 0, A can be covered by a finite number of translates of U:
$A \subset \bigcup_{i=1}^{m} (x_i + U)$
Proof. Let K be such a set in such a space. By the preceding lemma, co( K) is
totally bounded. Hence co(K) is closed and totally bounded. Since the ambient
space is complete, co(K) is complete and totally bounded. Hence, by Theorem 6,
it is compact. •
Section 7.8 Analytic Pitfalls 373
The purpose of this section is to frighten (or amuse) the reader by exhibiting
some examples where erroneous conclusions are reached through an analysis
that seems at first glance to be sound. In every case, however, some theorem
pertinent to the situation has been overlooked. The relevant theorems are all
quoted somewhere in this section or elsewhere in the book. Proofs or references
are given for each of them. A connecting thread for many of these examples is
the question of whether interchanging the order of two limit processes is justified.
We begin with some matters from the subject of Calculus.
Here is an elementary example to show what can go wrong:
(1)  $f(x) = \sum_{n=1}^{\infty} \frac{1}{2^n} \cos(3^n x)$
consists of analytic terms, and the function f should be a "nice" one. We think
that the function defined by the series should inherit the good properties of the
terms in the series. Indeed, in this example, f is continuous, by the Weierstrass
M-Test. This test, or theorem, goes as follows.
The hypotheses in display (2) constitute the "M-Test." In modern notation, we
could write instead $\sum_{n=1}^{\infty} \|g_n\|_\infty < \infty$. In the example of Equation (1), one can
set $g_n(x) = 2^{-n} \cos(3^n x)$ and see immediately that the constants $M_n = 2^{-n}$
serve in Weierstrass's Theorem.
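The role of the constants $M_n$ can be observed numerically. The sketch below is an illustration only; the function names and the sample point $x = 0.3$ are assumptions introduced here. It compares the gap between two partial sums of the series in Equation (1) with the tail $\sum_{n > m} 2^{-n} = 2^{-m}$, which bounds that gap uniformly in $x$.

```python
import math

def partial_sum(x, m):
    """m-th partial sum of sum_{n>=1} 2^{-n} cos(3^n x)."""
    return sum(2.0 ** -n * math.cos(3.0 ** n * x) for n in range(1, m + 1))

def tail_bound(m):
    """Sum of M_n = 2^{-n} over n > m, which equals 2^{-m}."""
    return 2.0 ** -m

x = 0.3  # arbitrary sample point; the bound does not depend on x
gap = abs(partial_sum(x, 30) - partial_sum(x, 20))
within_bound = gap <= tail_bound(20)
```

The same comparison at any other sample point should behave the same way, since the bound comes only from $|\cos| \le 1$.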
The Weierstrass M-Test gives us some hypotheses under which we can in-
$\lim_{h \to 0} \lim_{m \to \infty} \sum_{n=1}^{m} g_n(x + h) = \lim_{m \to \infty} \lim_{h \to 0} \sum_{n=1}^{m} g_n(x + h)$
$f'(x) = -\sum_{n=1}^{\infty} 3^n 2^{-n} \sin(3^n x)$
But here there is an alarming difference, as the factors $3^n 2^{-n}$ are growing, not
in [Ti2] or [Ch]. A sketch showing a partial sum of the series is in Figure 7.2.
When we take more terms and blow up the picture, we see more or less the same
behavior, which reminds us of fractals. See Figure 7.3, where a magnification
factor of about 15 has been used.
Since differentiation involves a limiting process, the theorem just quoted is again
providing hypotheses to justify the interchange of two limits.
What can be said, in general, to legitimate interchanging limits? A famous
theorem of Eliakim Hastings Moore gives one possible answer to this question.
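That some hypothesis is needed can be seen from a simple double sequence whose two iterated limits disagree. The example $f(n, m) = n/(n + m)$ is an assumption chosen here for illustration and is not taken from the text; the sketch merely evaluates it at large indices.

```python
def f(n, m):
    """A double sequence whose iterated limits disagree (illustrative choice)."""
    return n / (n + m)

big = 10 ** 9
# For fixed m, f(n, m) approaches 1 as n grows.
inner_limit_in_n = [f(big, m) for m in (1, 10, 100)]
# For fixed n, f(n, m) approaches 0 as m grows.
inner_limit_in_m = [f(n, big) for n in (1, 10, 100)]
```

Taking the limit in $n$ first gives the constant 1, while taking the limit in $m$ first gives the constant 0, so the iterated limits are 1 and 0 respectively.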
Proof. Define $g(m) = \lim_n f(n, m)$ and $h(n) = \lim_m f(n, m)$. Let $\varepsilon > 0$. Find a
positive integer $M$ such that
Letting $n$ tend to $\infty$, we arrive at $x = \sum_{i=1}^{\infty} a_i u_i$, where $a_i = \lim_n a_{ni}$. This
limit is justified by the Hilbert space structure. Indeed, by the properties of an
orthonormal sequence, we must have $a_{ni} = \langle x_n, u_i \rangle$, and therefore
$\lim_n a_{ni} = \lim_n \langle x_n, u_i \rangle = \langle x, u_i \rangle$.
After examining the proof and discovering the flaw in it, the reader should
contemplate the following special case. Define $x_n = (u_1 + \cdots + u_n)/n$. Certainly,
$x_n$ is in the convex hull of $U$. Since $x_n$ is given by an orthonormal expansion,
we have $\|x_n\|^2 = n(1/n^2) = 1/n$. This calculation shows that $x_n \to 0$. Hence 0
is in the closure of the convex hull of $U$. But it is not possible to represent 0 as
an "infinite" convex combination of the vectors $u_n$. The only representation of
0 is the trivial one, and those coefficients do not add up to 1.
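The special case can be verified with exact arithmetic. In the sketch below (an illustration under the assumption that a vector is stored as its list of coefficients with respect to $u_1, u_2, \dots$; the function names are hypothetical), the coefficients of $x_n$ always add up to 1, while the squared norm $1/n$ and each individual coefficient tend to 0.

```python
from fractions import Fraction

def x_n(n):
    """Coefficients of x_n = (u_1 + ... + u_n)/n with respect to u_1, ..., u_n."""
    return [Fraction(1, n)] * n

def norm_squared(coeffs):
    """Squared norm of an orthonormal expansion: the sum of squared coefficients."""
    return sum(c * c for c in coeffs)

def coefficient_sum(coeffs):
    """Sum of the coefficients; equal to 1 for a convex combination."""
    return sum(coeffs)

sample = x_n(50)
```

For instance, `norm_squared(sample)` is exactly $1/50$ while `coefficient_sum(sample)` is exactly 1, which is the tension the example exploits.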
The foregoing example shows that in general, for a series of constants,
n 00
expanding that function in a Taylor series. The expansion in powers of the vari-
able is not obtained simply by allowing more and more terms in a polynomial
and appealing to the Weierstrass approximation theorem. Indeed, only a select
few of the continuous functions will have Taylor series. If f is continuous, say
on [-1,1], then there is a sequence of polynomials $p_n$ such that $\|f - p_n\|_\infty \to 0$.
If it is desired to represent f as a series, one can write
$\lim_{n \to \infty} p_n(x) = \lim_{n \to \infty} \sum_{i=0}^{n} c_{ni} x^i = \sum_{i=0}^{\infty} c_i x^i$
where $c_i = \lim_{n \to \infty} c_{ni}$. This last limit will not exist in most cases.
The expansion of a function in an orthogonal series has its own cautionary
examples. Consider the orthonormal family of Legendre polynomials, $p_0, p_1, \dots$
defined on the interval [-1, 1]. They have the property
For any continuous function f defined on this same interval, we can construct
its series in Legendre functions:
Here we write $\sim$ to remind us that equality may or may not hold. It is only
asserted that each continuous function has a corresponding formal series in Leg-
endre functions. Can we not appeal to the Weierstrass approximation theorem
to conclude that the series does converge to f? By now, the reader must guess
that the answer is "No". The reason is not at all obvious, but depends on a
startling theorem in Analysis, quoted here. (A proof is to be found in [Ch].)
where the coefficients $a_k$ are as above, defines a projection of the type appearing
in Theorem 4. That is, $P_n$ is a continuous linear idempotent map from C[-1, 1]
onto $\Pi_n$. Hence, $\|P_n\| \to \infty$. By the Banach-Steinhaus Theorem (Chapter 1,
Section 7, page 41) the set of $f$ in C[-1, 1] for which the series above converges
uniformly to $f$ is of the first category (relatively small) in C[-1, 1].
One should think of this phenomenon in the following way. The space
C[-1, 1] contains not only the nice familiar functions of elementary calculus, but
also the bizarre unmanageable ones that we do not see unless we go searching
for them. Most functions are of the latter type. See Example 1 on page 42 to
be convinced of this. To guarantee convergence of the series under discussion,
one must make further smoothness assumptions about f. For example, if f is
an analytic function of a complex variable in an ellipse that contains the line
segment [-1,1], then the Legendre series for f will converge uniformly to f on
that segment. For results about these series, consult [San].
Interchanging the order of two integrals in a double integral can also involve
difficulties. The Fubini Theorem in Chapter 8 addresses this issue. Here we offer
an example of a double integral in a discrete measure space, where the integrals
become sums. This is adapted from [MT].
Example. Consider a function of two positive integers defined by the formula
$f(n, m) = \begin{cases} 0 & \text{if } m > n \\ -1 & \text{if } m = n \\ 2^{m-n} & \text{if } m < n \end{cases}$
The two possible sums can be calculated in a straightforward way, and they turn
out to be different:
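$\sum_m \sum_n f(n, m) = \sum_{m=1}^{\infty} \sum_{n=m}^{\infty} f(n, m) = \sum_{m=1}^{\infty} \left[ -1 + \tfrac{1}{2} + \tfrac{1}{4} + \cdots \right] = \sum_{m=1}^{\infty} 0 = 0$
$\sum_n \sum_m f(n, m) = \sum_{n=1}^{\infty} \sum_{m=1}^{n} f(n, m) = f(1, 1) + \sum_{n=2}^{\infty} \sum_{m=1}^{n} f(n, m) = -1 + \sum_{n=2}^{\infty} \left( -2^{1-n} \right) = -1 - 1 = -2$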
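Both iterated sums can be checked with exact rational arithmetic. The following sketch is illustrative; the helper names and the truncation levels are assumptions made here. The inner sum over $m$ is finite and is computed exactly, while the inner sum over $n$ is truncated at a level $N$ and is seen to equal $-2^{m-N}$, which tends to 0.

```python
from fractions import Fraction

def f(n, m):
    """The function of the example: 0 if m > n, -1 if m = n, 2^(m-n) if m < n."""
    if m > n:
        return Fraction(0)
    if m == n:
        return Fraction(-1)
    return Fraction(1, 2 ** (n - m))

def sum_over_m(n):
    """Inner sum over m for fixed n; only m = 1..n contribute."""
    return sum(f(n, m) for m in range(1, n + 1))

def sum_over_n_truncated(m, N):
    """Inner sum over n for fixed m, truncated at n = N (terms with n < m vanish)."""
    return sum(f(n, m) for n in range(m, N + 1))

# Outer sum over n of the exact inner sums, truncated at n = 20.
outer_n_first_20 = sum(sum_over_m(n) for n in range(1, 21))
```

The truncated outer sum equals $-2 + 2^{-19}$, consistent with the value $-2$, whereas every inner sum over $n$ tends to 0, consistent with the value 0 for the other order.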
The difficulty here is not to be attributed to the fact that our domain $\mathbb{N} \times \mathbb{N}$
stretches infinitely far to the right and upwards in the 2-dimensional plane. One
can make this example work on the unit square in the plane by the following
construction. Define intervals $I_n = [1/(n + 1), 1/n]$. On each rectangle $I_n \times I_m$
define a function $F$ whose integral over that rectangle is $f(n, m)$, as defined
previously. We then find, from the above calculations, that
whereas
$\int_0^1 \int_0^1 F(x, y)\, dy\, dx = \sum_n \sum_m f(n, m) = -2$
By referring to the Fubini Theorem, page 426, we see that our functions
f and F do not satisfy the essential hypothesis of that theorem: They are not
integrable over the Cartesian domain. The function $|f|$, for example, has an
infinite number of values +1, and so cannot yield a finite integral over $\mathbb{N} \times \mathbb{N}$.
Had we wished to apply the Tonelli Theorem, the crucial missing hypothesis
would have been that $f \ge 0$ or $F \ge 0$.
Let us return to the functions defined by infinite series, for such functions are
truly ubiquitous in Mathematics. We can ask, "How does integration interact
with the summation process? Can integration be interchanged with summa-
tion?" The answer is that the conditions for this to be valid are less stringent
than those for differentiation. This is to be expected, for (in general) integration
is a smoothing process, whereas differentiation is the opposite: It emphasizes or
magnifies the variations in a function. The relevant theorem, again conveniently
accessible in [Wi], is as follows.
This theorem is often used to obtain Taylor series for troublesome functions.
For example, if a Taylor series is needed for the Arctan function, we can start
with the relationship
$\frac{d}{dt} \operatorname{Arctan}(t) = \frac{1}{1 + t^2} = \sum_{n=0}^{\infty} (-t^2)^n$
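Integrating the geometric series term by term from 0 to $t$ would give $\sum_{n \ge 0} (-1)^n t^{2n+1}/(2n+1)$ for $|t| < 1$. The sketch below is an illustration only (the function name and the sample point are assumptions); it compares a partial sum of that series with the library arctangent.

```python
import math

def arctan_series(t, terms):
    """Partial sum of sum_{n>=0} (-1)^n t^(2n+1) / (2n+1), intended for |t| < 1."""
    return sum((-1) ** n * t ** (2 * n + 1) / (2 * n + 1) for n in range(terms))

t = 0.5
approximation = arctan_series(t, 40)
error = abs(approximation - math.atan(t))
```

At $t = 0.5$ the series converges geometrically, so forty terms already agree with `math.atan` to within rounding error.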
(3)  $\frac{d}{dx} \int_T g(x, t)\, d\mu(t) = \int_T \frac{\partial g}{\partial x}(x, t)\, d\mu(t)$
$f(x) = \int_T g(x, t)\, d\mu(t)$
The derivative $f'(x_0)$ exists if and only if for each sequence $[x_n]$ converging to
$x_0$ we have
$f'(x_0) = \lim_{n \to \infty} \frac{f(x_n) - f(x_0)}{x_n - x_0} = \lim_{n \to \infty} \int_T \frac{g(x_n, t) - g(x_0, t)}{x_n - x_0}\, d\mu(t)$
By Hypothesis (B), the integrands in the preceding equation are bounded in
magnitude by the single $L^1$-function $G$. The Lebesgue Dominated Convergence
Theorem (see Chapter 8, page 406) allows an interchange of limit and integral.
Hence
Theorem 8. Let $(T, \mathcal{A}, \mu)$ be a measure space such that $\mu(T) < \infty$.
Let $g : (a, b) \times T \to \mathbb{R}$. Assume that for each $n$, $(\partial^n g / \partial x^n)(x, t)$ exists,
is measurable, and is bounded on $(a, b) \times T$. Then
Proof. Since $\mu(T) < \infty$, any bounded measurable function on $T$ is integrable.
To see that Hypothesis (B) of the preceding theorem is true, use the mean value
theorem:
$\left| \frac{g(x, t) - g(x_0, t)}{x - x_0} \right| = \left| \frac{\partial g}{\partial x}(\xi, t) \right| \le M$
where $M$ is a bound for $|\partial g / \partial x|$ on $(a, b) \times T$. By the preceding theorem,
Equation (4) is valid for n = 1. The same argument can be repeated to give an
inductive proof for all n. •
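Differentiation under the integral sign can be tested numerically on a smooth example. The sketch below is an illustration under assumptions introduced here: $T = [0, 1]$ with Lebesgue measure, $g(x, t) = e^{xt}$, a composite Simpson rule in place of the integral, and a central difference in place of the derivative. None of these choices comes from the text.

```python
import math

def simpson(func, a, b, n=1000):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * func(a + k * h)
    return total * h / 3

def g(x, t):
    return math.exp(x * t)

def dg_dx(x, t):
    return t * math.exp(x * t)

def f(x):
    """f(x) = integral over [0, 1] of g(x, t) dt."""
    return simpson(lambda t: g(x, t), 0.0, 1.0)

def derivative_by_difference(x, h=1e-5):
    """Central difference approximation to f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

def derivative_under_integral(x):
    """Integral over [0, 1] of the partial derivative of g in x."""
    return simpson(lambda t: dg_dx(x, t), 0.0, 1.0)
```

At $x = 1$ the right-hand side is $\int_0^1 t e^t\, dt = 1$, and the difference quotient of $f$ agrees with it closely.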
Chapter 8
This chapter gives, in as brief a form as possible, the main features of measure
theory and integration. The presentation is sufficiently general to cover Lebesgue
measures and measures that arise in studying the continuous linear functionals
on Banach spaces.
Since measures are employed to assign a size to sets, they are often allowed
to assume infinite values. The extended real number system is designed to
assist in this situation. This is the set $\mathbb{R}^* = \mathbb{R} \cup \{\infty\} \cup \{-\infty\}$. The two new
points, $\infty$ and $-\infty$, that have been adjoined to $\mathbb{R}$ are required to behave as
follows:
(1) $(-\infty, \infty) = \mathbb{R}$
(2) $x + \infty = \infty$ for $x \in (-\infty, \infty]$
(3) $x \cdot \infty = \infty$ for $x \in (0, \infty]$
(4) $0 \cdot \infty = 0$
From these rules various others follow, such as $x - \infty = -\infty$ when $x \in [-\infty, \infty)$.
One advantage of $\mathbb{R}^*$ is that every subset of $\mathbb{R}^*$ has a supremum and an infimum
in $\mathbb{R}^*$. For example, the equation $\sup A = \infty$ means that for each $x \in \mathbb{R}$, $A$
contains an element $a$ such that $a > x$. Note that certain expressions, such as
$-\infty + \infty$, must remain undefined.
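Readers who compute with floating-point infinities should note that rule (4) differs from the IEEE convention. The sketch below is illustrative; the helper `ext_mul` is a hypothetical name introduced here. It implements the product with $0 \cdot \infty = 0$ and contrasts it with Python's built-in behavior.

```python
import math

INF = math.inf

def ext_mul(x, y):
    """Product in the extended reals with the convention 0 * (+-inf) = 0."""
    if x == 0 or y == 0:
        return 0.0
    return x * y

rule_2 = 5 + INF               # x + inf = inf
rule_3 = ext_mul(2, INF)       # x * inf = inf for x > 0
rule_4 = ext_mul(0, INF)       # 0 by the convention of the text
ieee_product = 0 * INF         # nan in floating-point arithmetic
undefined_sum = -INF + INF     # nan, matching "must remain undefined"
```

The built-in product of 0 and infinity yields `nan`, so the measure-theoretic convention has to be imposed explicitly, as `ext_mul` does.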
382 Chapter 8 Measure and Integration
Example 3. Let $X$ be any set. For a finite subset $A$, let $\mu(A) = \#(A)$, the
number of elements in $A$. For all other sets, put $\mu(A) = \infty$. This is called
counting measure. •
Example 4. Let $X$ be any set and let $x_0$ be any point in $X$. Define $\mu(A)$ to
be 1 if $x_0 \in A$ and to be 0 if $x_0 \notin A$. •
Example 5. Let $X$ be any infinite set. Let $\{x_1, x_2, \dots\}$ be a countable set
of (distinct) points in $X$. Let $\lambda_n$ be positive numbers ($n = 1, 2, \dots$). Define
$\mu(A) = \sum\{\lambda_n : x_n \in A\}$, and $\mu(\emptyset) = 0$. •
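Examples 3, 4, and 5 are easy to experiment with on finite sets. The following sketch is an illustration under the assumption that sets are finite Python sets and that only finitely many weighted points are used; the function names and sample data are hypothetical.

```python
def counting_measure(A):
    """Example 3 restricted to finite sets: the number of elements of A."""
    return len(A)

def point_mass(x0):
    """Example 4: returns the set function that is 1 on sets containing x0, else 0."""
    return lambda A: 1 if x0 in A else 0

def weighted_measure(weights):
    """Example 5 with finitely many points; weights maps each x_n to lambda_n > 0."""
    return lambda A: sum(w for x, w in weights.items() if x in A)

A, B = {1, 2}, {3}                       # two disjoint sets
delta = point_mass(3)
mu = weighted_measure({1: 0.5, 2: 0.25, 3: 0.25})
```

On the disjoint pair `A`, `B`, each of the three set functions assigns to the union the sum of the values on the pieces.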
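Example 6. Lebesgue Outer Measure. Let X be the real line. Define
$\mu(A) = \inf\left\{ \sum_{i=1}^{\infty} |b_i - a_i| : A \subset \bigcup_{i=1}^{\infty} (a_i, b_i) \right\}$
•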
Example 7. Lebesgue Outer Measure on $\mathbb{R}^n$. In $\mathbb{R}^n$ define the "unit
cube" to be the set $Q$ of all points $(\xi_1, \dots, \xi_n)$ whose components lie in the
interval [0,1]. If $x \in \mathbb{R}^n$ and $\lambda \in \mathbb{R}$, define $x + \lambda Q = \{x + \lambda v : v \in Q\}$. For
$A \subset \mathbb{R}^n$, define
•
Example 8. Lebesgue-Stieltjes Outer Measure. Let $X$ be the real line,
and select a monotone nondecreasing function $\gamma : \mathbb{R} \to \mathbb{R}$. Define
Notice that Lebesgue outer measure (in Example 6) is a special case, obtained
when $\gamma(x) = x$. •
In order to see that Examples 6, 7, and 8 are bona fide outer measures, one
can appeal to the following theorem.
Section 8.1 Extended Reals, Outer Measures, Measurable Spaces 383
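$\mu(A) = \inf\left\{ \sum_{i=1}^{\infty} \beta(G_i) : A \subset \bigcup_{i=1}^{\infty} G_i,\ G_i \in \mathcal{C} \right\}$
defines an outer measure on X.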
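Proof. Assume all the hypotheses. There are now three postulates for an outer
measure to be verified. Our assumption about $\beta$ implies that $\beta(G) \ge 0$ for all
$G \in \mathcal{C}$. Therefore, $\mu(A) \ge 0$ for all $A$. Since $\emptyset \subset G$ for all $G \in \mathcal{C}$, $\mu(\emptyset) \le \beta(G)$
for all $G$. Taking an infimum yields $\mu(\emptyset) \le 0$.
If $A \subset B$ and $B \subset \bigcup_{i=1}^{\infty} G_i$, then $A \subset \bigcup_{i=1}^{\infty} G_i$ and $\mu(A) \le \sum_{i=1}^{\infty} \beta(G_i)$.
Taking an infimum over all countable covers of $B$, we have $\mu(A) \le \mu(B)$.
Let $A_i \subset X$ ($i \in \mathbb{N}$) and let $\varepsilon > 0$. By the definition of $\mu(A_i)$ there exist
$G_{ij} \in \mathcal{C}$ such that $A_i \subset \bigcup_{j=1}^{\infty} G_{ij}$ and $\sum_{j=1}^{\infty} \beta(G_{ij}) \le \mu(A_i) + \varepsilon/2^i$. Since
$\bigcup_{i=1}^{\infty} A_i \subset \bigcup_{i,j=1}^{\infty} G_{ij}$, we obtain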
The postulates for an outer measure do not include all the desirable at-
tributes that are needed for integration. For example, an essential property is
additivity:
$A \cap B = \emptyset \implies \mu(A \cup B) = \mu(A) + \mu(B)$
This property cannot be deduced from the axioms for an outer measure. See
Problem 6 for a simple example. Even Lebesgue outer measure is not additive,
although it seems to be a natural or intrinsic definition for "the measure of a
set" in $\mathbb{R}$. If we concentrate for the moment on this all-important example, we
ask, How can additivity be obtained? We could change the definition. But if the
measure of an interval is to be its length, changing the definition will not succeed.
The only solution is to reduce the domain of $\mu$ from $2^{\mathbb{R}}$ (or $2^X$ in general) to a
smaller class of sets. This is the brilliant idea of Lebesgue (1901) that leads to
Lebesgue measure on $\mathbb{R}$. The procedure for accomplishing this domain reduction
is dealt with in the theorem of Carathéodory, proved later. First, we describe in
the abstract the sort of domain that will be used for measures.
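The two-point set function of Problem 6 (values as printed there) gives a concrete outer measure that is not additive. The sketch below is an illustration; the dictionary encoding and the helper names are assumptions made here. It checks monotonicity and subadditivity by brute force over the four subsets and then tests additivity on disjoint pairs.

```python
X = frozenset({1, 2})
# Values as printed in Problem 6: mu(empty) = 0, mu({1}) = 1, mu({2}) = 2, mu(X) = 2.
mu = {
    frozenset(): 0,
    frozenset({1}): 1,
    frozenset({2}): 2,
    X: 2,
}
subsets = list(mu)

def is_monotone():
    """mu(A) <= mu(B) whenever A is a subset of B."""
    return all(mu[A] <= mu[B] for A in subsets for B in subsets if A <= B)

def is_subadditive():
    """Pairwise check; finite unions follow by induction, and only four sets exist."""
    return all(mu[A | B] <= mu[A] + mu[B] for A in subsets for B in subsets)

def is_additive():
    """mu(A u B) = mu(A) + mu(B) for every disjoint pair A, B."""
    return all(
        mu[A | B] == mu[A] + mu[B]
        for A in subsets for B in subsets if not (A & B)
    )
```

Here the disjoint pair $\{1\}$, $\{2\}$ has union of measure 2 while the individual values add to 3, so `is_additive()` fails although the outer-measure postulates hold.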
$\bigcup_{i=1}^{\infty} A_i \in \mathcal{A}$.
Example 10. Let $X$ be an arbitrary set, and let $\mathcal{A} = 2^X$ (the family of all
subsets of $X$). Then $(X, \mathcal{A})$ is a measurable space and $\mathcal{A}$ is the largest $\sigma$-algebra
of subsets of $X$. •
Example 11. Let $X$ be any set and let $A$ be a particular subset of $X$. Define
$\mathcal{A} = \{X, \emptyset, A, X \setminus A\}$. This is the smallest $\sigma$-algebra containing $A$. We are
observing that, as long as $\mathcal{A}$ is nonempty, the set $X$ and the empty set $\emptyset$ must
belong to $\mathcal{A}$. In other words, these two sets will always be measurable. •
Example 12. Let $X$ be any set, and let $\mathcal{A}$ consist of all countable subsets of
$X$ and their complements. Then $\mathcal{A}$ is the smallest $\sigma$-algebra containing all finite
subsets of $X$. •
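On a finite set, the smallest $\sigma$-algebra containing given sets can be computed by closing under complements and unions until nothing new appears. The sketch below is illustrative only; the function name and the sample sets are assumptions made here. On a finite set, countable unions reduce to finite ones, which is why this closure suffices.

```python
def generate_sigma_algebra(X, generators):
    """Smallest family containing the generators that is closed under
    complement and pairwise union; on a finite X this is the generated
    sigma-algebra."""
    X = frozenset(X)
    family = {frozenset(), X} | {frozenset(G) for G in generators}
    changed = True
    while changed:
        changed = False
        current = list(family)
        for A in current:
            candidates = [X - A] + [A | B for B in current]
            for C in candidates:
                if C not in family:
                    family.add(C)
                    changed = True
    return family

one_generator = generate_sigma_algebra({1, 2, 3, 4}, [{1}])
two_generators = generate_sigma_algebra({1, 2, 3, 4}, [{1}, {2}])
```

With the single generator $\{1\}$ the result has the four members described in Example 11; with generators $\{1\}$ and $\{2\}$ the atoms are $\{1\}$, $\{2\}$, $\{3, 4\}$ and the family has $2^3 = 8$ members.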
Problems 8.1
1. Does the extended real number system $\mathbb{R}^*$ become a topological space if a neighborhood
of $\infty$ is defined to be any set that contains an interval of the form $(a, \infty]$, and similarly
for $-\infty$?
2. Why, in defining Lebesgue outer measure, do we not "approximate from within" and
define
3. Prove that if $\mu$ is an outer measure and if $\mu(B) = 0$, then $\mu(A \cup B) = \mu(A)$.
4. An outer measure on a group is said to be invariant if $\mu(x + A) = \mu(A)$ for all $x$ and
$A$. Prove that Lebesgue outer measure has this property.
5. Under what conditions is Lebesgue-Stieltjes outer measure invariant, as defined in Problem 4?
6. Let $X = \{1, 2\}$, and define $\mu(\emptyset) = 0$, $\mu(X) = 2$, $\mu(\{1\}) = 1$ and $\mu(\{2\}) = 2$. Show that
$\mu$ is an outer measure but is not additive.
7. Prove that the Lebesgue outer measure of each countable subset of $\mathbb{R}$ is 0.
8. How many outer measures having range in $\{0, 1, \dots, n\}$ are there on a set of $n$ elements?
9. Prove that the Lebesgue outer measure of the interval $[a, b]$ is $b - a$.
10. Let $\mu$ be an outer measure on $X$, and let $Y \subset X$. Define $\nu(A) = \mu(A)$ when $A \subset Y$. Is
$\nu$ an outer measure on $Y$?
11. Does an outer measure necessarily obey this equation?
12. Let $\mu$ be an outer measure on $X$, and let $Y \subset X$. Define $\nu(A) = \mu(Y \cap A)$. Is $\nu$ an outer
measure on $X$?
13. Let $\mu$ and $\nu$ be outer measures on $X$. Define $\theta(A) = \max\{\mu(A), \nu(A)\}$ for all $A \subset X$. Is
$\theta$ an outer measure on $X$?
14. Are the outer measures in Examples 3 and 5 additive?
15. What is the Lebesgue outer measure of the set of irrational numbers in $[0, 1]$?
16. Prove Lemma 1.
17. Prove that every countable set in $\mathbb{R}$ is a Borel set.
18. Let $(X, \mathcal{A})$ be a measurable space, and let $A$ and $B$ be two subsets of $X$. If $A$ is measurable
and $B \setminus A$ is not, what conclusions can be drawn about $A \cup B$ and $A \setminus B$?
19. Let $(X, \mathcal{A})$ be a measurable space, and let $Y \in \mathcal{A}$. Define $\mathcal{B}$ to be the family of all sets
$Y \cap A$, where $A$ ranges over $\mathcal{A}$. Prove that $(Y, \mathcal{B})$ is a measurable space.
20. Does there exist a countably infinite $\sigma$-algebra?
21. Prove that $\mathcal{A}$ in Example 12 is a $\sigma$-algebra.
22. Is there an example of a $\sigma$-algebra containing exactly 5 sets?
(2)  $\mu\left( \bigcup_{i=1}^{\infty} A_i \right) \le \sum_{i=1}^{\infty} \mu(A_i)$ if $A_i \in \mathcal{A}$.
•
Section 8.2 Measures and Measure Spaces 387
(1)
IV. The postulate $\mu(\emptyset) = 0$ is true for $\mu$ on $\mathcal{A}$ because it is a postulate for
outer measures. That $\mu(A) \ge 0$ follows from the postulates of an outer measure:
because $\emptyset \subset A$.
V. For the countable additivity of $\mu$, look at the proof in Part III. If the
sequence $A_1, A_2, \dots$ is disjoint, then $C_i = A_i$, and Equation (2) will read, with
$A = \bigcup_{i=1}^{\infty} A_i$,
$\mu(S) \ge \sum_{i=1}^{n} \mu(S \cap A_i) + \mu(S \setminus A)$
the concepts of $\mu$-measurable and $\mathcal{A}$-measurable are the same (by the definition
of $\mathcal{A}$). However, there can be other $\sigma$-algebras present (for the same space $X$),
and there can be different kinds of measurability.
One example of this situation occurs when $\mu$ is Lebesgue outer measure,
as defined in Example 6 of the preceding section (page 382). The $\sigma$-algebra $\mathcal{A}$
that arises from Carathéodory's Theorem is called the $\sigma$-algebra of Lebesgue
measurable sets. A smaller $\sigma$-algebra is the family $\mathcal{B}$ of all Borel sets. This
is the smallest $\sigma$-algebra containing the open sets. It turns out that $\mathcal{B}$ is a
proper subset of $\mathcal{A}$. In some situations one uses the measure space $(\mathbb{R}, \mathcal{B}, \mu)$ in
preference to $(\mathbb{R}, \mathcal{A}, \mu)$. It is convenient to use $\mu$ without indicating notationally
whether its domain is $2^{\mathbb{R}}$, or $\mathcal{A}$, or $\mathcal{B}$. Remember, however, that $\mu$ on $2^{\mathbb{R}}$ is only
an outer measure, and countable additivity can fail for sets not in $\mathcal{A}$.
We have seen that every outer measure leads to a measure via Carathéodory's
Theorem. There is a converse theorem asserting, roughly, that every measure
can be obtained in this way.
Since this is true for all $n$, $\mu^*(S) = \mu^*(A)$. By the preceding theorem, $A$ is
$\mu^*$-measurable. •
Problems 8.2
1. Let $\mu$ be Lebesgue outer measure. Let $\mathbb{Q}$ be the set of all rational numbers. Prove that
if $\mathbb{Q} \cap [0, 1]$ is contained in $\bigcup_{i=1}^{n} (a_i, b_i)$, then $\sum_{i=1}^{n} (b_i - a_i) \ge 1$. Show that this is not
true if we permit a countable number of intervals $(a_i, b_i)$.
2. Is there an example of a set $X$ and a measure $\mu$ on $2^X$ such that $\mu(X) = 1$ and $\mu(\{x\}) = 0$
for all points $x$ in $X$?
3. In $\mathbb{R}$, is the smallest $\sigma$-algebra containing all singletons $\{x\}$ the same as the $\sigma$-algebra of
all Borel sets?
4. Let $(X, \mathcal{A}, \mu)$ be a measure space, and let $\mu^*$ be the outer measure defined in Theorem 2
(page 389). Define the inner measure $\mu_*$ induced by $\mu$ via the equation
Prove these properties of $\mu_*$: (i) $\mu_*(S) \le \mu^*(S)$; (ii) $\mu_*(S) \ge 0$; (iii) $\mu_*(S) \le \mu_*(T)$
when $S \subset T$; (iv) $\mu_*(\emptyset) = 0$; (v) $\mu_*(A) = \mu(A)$ if $A \in \mathcal{A}$.
5. Prove that an outer measure $\mu$ on $2^X$ is a measure on $2^X$ if and only if every set in $2^X$
is $\mu$-measurable.
6. Let $(X, \mathcal{A}, \mu)$ be a measure space, and let $A_i \in \mathcal{A}$. Prove that
$\mu\left(\bigcup_{i=1}^{\infty} A_i\right) = \lim_{n \to \infty} \mu\left(\bigcup_{i=1}^{n} A_i\right)$.
7. The symmetric difference of two sets $A$ and $B$ is $A \triangle B = (A \setminus B) \cup (B \setminus A)$. Prove
that for measurable sets $A$ and $B$ in a measure space, the condition $\mu(A \triangle B) = 0$ implies
that $\mu(A) = \mu(B)$.
8. Let $(X, \mathcal{A}, \mu)$ be an incomplete measure space. Show how to enlarge $\mathcal{A}$ and extend $\mu$ so
that a complete measure space is obtained.
9. Let $(X, \mathcal{A}, \mu)$ be a measure space. Let $\mathcal{B}$ be the family of all sets $B \in 2^X$ such that
$A \cap B \in \mathcal{A}$ whenever $A \in \mathcal{A}$ and $\mu(A) < \infty$. Show that $\mathcal{B}$ is a $\sigma$-algebra containing $\mathcal{A}$.
10. Prove that if $(X, \mathcal{A}, \mu)$ and $(X, \mathcal{A}, \nu)$ are measure spaces, then so is $(X, \mathcal{A}, \mu + \nu)$.
Generalize.
11. If $(X, \mathcal{A}, \mu)$ and $(X, \mathcal{A}, \nu)$ are measure spaces such that $\mu \ge \nu$, is there a measure $\theta$ (on
$\mathcal{A}$) such that $\nu + \theta = \mu$? (Caution: $\infty - \infty$ is not defined in $\mathbb{R}^*$.)
12. Let $(X, \mathcal{A}, \mu)$ be a measure space. Suppose that $A_n \in \mathcal{A}$ and $A_{n+1} \subset A_n$ for all $n$. Does
it follow that $\mu\left(\bigcap_{n=1}^{\infty} A_n\right) = \lim_{n \to \infty} \mu(A_n)$?
14. Prove that for any outer measure $\mu$ and any set $A$ such that $\mu(A) = 0$, $A$ is $\mu$-measurable.
Section 8.3 Lebesgue Measure 391
15. Let $(X, \mathcal{A}, \mu)$ be a measure space, and let $B \in \mathcal{A}$. Define $\nu$ on $\mathcal{A}$ by writing
$\nu(A) = \mu(A \cap B)$. Prove that $(X, \mathcal{A}, \nu)$ is a measure space.
16. Let $(X, \mathcal{A}, \mu)$ be a measure space. Let $A_n \in \mathcal{A}$ and $\sum_{n=1}^{\infty} \mu(A_n) < \infty$. Prove that the
set of $x$ belonging to infinitely many $A_n$ has measure 0.
17. Let $(X, \mathcal{A}, \mu)$ be a measure space for which $\mu(X) < \infty$. Let $A_n$ be a sequence of measurable
sets such that $A_1 \subset A_2 \subset \cdots$ and $X = \bigcup A_n$. Show that $\mu(X \setminus A_n) \downarrow 0$.
In this section $\mu$ will denote both Lebesgue outer measure and Lebesgue measure.
Both are defined for subsets of $\mathbb{R}$ by the equation
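(1)  $\mu(A) = \inf\left\{ \sum_{i=1}^{\infty} |b_i - a_i| : A \subset \bigcup_{i=1}^{\infty} (a_i, b_i) \right\}$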
The outer measure is defined for all subsets of $\mathbb{R}$, while the measure $\mu$ is the
restriction of the outer measure to the $\sigma$-algebra described in Carathéodory's
Theorem (page 387). The sets in this $\sigma$-algebra are called the Lebesgue measurable
subsets of $\mathbb{R}$. It is a very large class of sets, bigger than the $\sigma$-algebra of
Borel sets. The latter has cardinality $c$, while the former has cardinality $2^c$.
Proof. Consider first a compact interval $[a, b]$. Since $[a, b] \subset (a - \varepsilon, b + \varepsilon)$, we
conclude from the definition (1) that $\mu([a, b]) \le b - a + 2\varepsilon$ for every positive $\varepsilon$.
Hence $\mu([a, b]) \le b - a$. Suppose now that $\mu([a, b]) < b - a$. Find intervals $(a_i, b_i)$
such that $[a, b] \subset \bigcup_{i=1}^{\infty} (a_i, b_i)$ and $\sum_{i=1}^{\infty} |b_i - a_i| < b - a$. We can assume $a_i < b_i$
for all $i$. By compactness and renumbering we can get $[a, b] \subset \bigcup_{i=1}^{n} (a_i, b_i)$.
It follows that $\sum_{i=1}^{n} (b_i - a_i) < b - a$. By renumbering again we can assume
$a \in (a_1, b_1)$, $b_1 \in (a_2, b_2)$, $b_2 \in (a_3, b_3)$, and so on. There must exist an index
$k \le n$ such that $b < b_k$. Then we reach a contradiction:
$b - a > \sum_{i=1}^{n} (b_i - a_i) \ge \sum_{i=1}^{k} (b_i - a_i) = b_k - a_1 + \sum_{i=1}^{k-1} (b_i - a_{i+1}) > b_k - a_1 > b - a$
If $J$ is a bounded interval of the type $(a, b)$, $(a, b]$, or $[a, b)$, then from the
inclusions
$[a + \varepsilon, b - \varepsilon] \subset J \subset [a - \varepsilon, b + \varepsilon]$
we obtain $b - a - 2\varepsilon \le \mu(J) \le b - a + 2\varepsilon$ and $\mu(J) = b - a$.
Finally, if $J$ is an unbounded interval, then it contains intervals $[a, b]$ of
arbitrarily great length. Hence $\mu(J) = \infty$. •
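By contrast with an interval, a countable set admits open covers of arbitrarily small total length: give the $k$-th point an interval of length $\varepsilon / 2^k$. The sketch below is an illustration with exact rational arithmetic; the function names, the choice of sample points, and the value of $\varepsilon$ are assumptions made here.

```python
from fractions import Fraction

def cover(points, eps):
    """Open intervals (a_k, b_k): the k-th point gets an interval of length eps/2^k."""
    intervals = []
    for k, x in enumerate(points, start=1):
        half = Fraction(eps) / 2 ** (k + 1)
        intervals.append((x - half, x + half))
    return intervals

def total_length(intervals):
    """Sum of the lengths b_k - a_k."""
    return sum(b - a for a, b in intervals)

# Rationals in [0, 1] with denominator at most 7, in increasing order.
points = sorted({Fraction(p, q) for q in range(1, 8) for p in range(q + 1)})
eps = Fraction(1, 100)
intervals = cover(points, eps)
```

The total length is $\varepsilon (1 - 2^{-N})$ for $N$ points, which stays below $\varepsilon$ no matter how many points are listed.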
Proof. (S.J. Bernau) The family of Borel sets is the smallest $\sigma$-algebra containing
all the open sets. The Lebesgue measurable sets form a $\sigma$-algebra. Hence
it suffices to prove that every open set is Lebesgue measurable.
Recall that every open set in $\mathbb{R}$ can be expressed as a countable union of
open intervals $(a, b)$. Thus it suffices to prove that each interval $(a, b)$ is Lebesgue
measurable. We begin with an interval of the form $(a, \infty)$, where $a \in \mathbb{R}$.
To prove that the open interval $(a, \infty)$ is measurable, we must prove, for
any set $S$ in $\mathbb{R}$, that
(2)
Let us use the notation $|I|$ for the length of an interval $I$. Given $\varepsilon > 0$, select
open intervals $I_n$ such that $S \subset \bigcup_{n=1}^{\infty} I_n$ and $\sum |I_n| < \mu(S) + \varepsilon$. Define
$J_n = I_n \cap (a, \infty)$, $K_n = I_n \cap (-\infty, a)$, and $K_0 = (a - \varepsilon, a + \varepsilon)$. Then we have
$S \cap (a, \infty) \subset \bigcup_{n=1}^{\infty} J_n$
$S \setminus (a, \infty) \subset \bigcup_{n=0}^{\infty} K_n$
Consequently,
L
00
Because $\varepsilon$ was arbitrary, this establishes Equation (2). Since the measurable sets
make up a $\sigma$-algebra, each set of the form $(-\infty, b] = \mathbb{R} \setminus (b, \infty)$ is measurable.
Hence the set $(-\infty, b) = \bigcup_{n=1}^{\infty} (-\infty, b - \tfrac{1}{n}]$ is measurable and so is
$(a, b) = (-\infty, b) \cap (a, \infty)$. •
Proof. The statement means that $\mu(S) = \mu(v + S)$ for all $S \in 2^{\mathbb{R}}$ and all $v \in \mathbb{R}$.
The translate $v + S$ is defined to be $\{v + x : x \in S\}$. Notice that the condition
$S \subset \bigcup_{i=1}^{\infty} (a_i, b_i)$ is equivalent to the condition $v + S \subset \bigcup_{i=1}^{\infty} (v + a_i, v + b_i)$. Since
the length of $(v + a_i, v + b_i)$ is the same as the length of $(a_i, b_i)$, the definition
of $\mu$ gives equal values for $\mu(S)$ and $\mu(v + S)$. •
Theorem 4. There exists no translation-invariant measure $\nu$
defined on $2^{\mathbb{R}}$ such that $0 < \nu([0, 1]) < \infty$. Consequently, there exist
subsets of $\mathbb{R}$ that are not Lebesgue measurable.
Proof. The second assertion follows from the first because if every set of reals
were Lebesgue measurable, then Lebesgue measure would contradict the first
assertion.
To prove the first assertion, suppose that a measure v exists as described.
By the preceding lemma, the set P given there has the property
00
00 ] 00 00
•
Problems 8.3
1. Zermelo's Postulate states that if F is a disjoint family of nonempty sets, then there is
a set that contains exactly one element from each set in the family F. Prove this, using
the Axiom of Choice.
2. Prove that the set P in the lemma is not Lebesgue measurable.
3. Prove that Lebesgue measure restricted to the Borel sets is not complete.
4. An $F_\sigma$-set is any countable union of closed sets, and a $G_\delta$-set is any countable intersection
of open sets. Prove that both types of sets are Borel sets.
5. Prove that the Cantor "middle-third" set is an uncountable Borel set of Lebesgue mea-
sure O. This set is defined in Problem 1.7 26, page 46.
6. Prove that the infimum in the definition of Lebesgue outer measure is attained if the set
is bounded and open.
7. Prove that the set Q of all rational numbers is a Borel set of measure O.
8. Prove that for any Lebesgue measurable set $A$ of finite measure and for any $\varepsilon > 0$ there
are an open set $G$ and a closed set $F$ such that $F \subset A \subset G$ and $\mu(G) \le \mu(F) + \varepsilon$.
9. Let $S$ be a subset of $\mathbb{R}$ such that for each $\varepsilon > 0$ there is a closed set $F$ contained in $S$ for
which $\mu(S \setminus F) < \varepsilon$. Prove that $S$ is Lebesgue measurable.
10. Prove that a set of Lebesgue measure 0 cannot contain a nonmeasurable set, but every
set of positive measure does contain a nonmeasurable set.
11. Under what set operations is $2^{\mathbb{R}} \setminus \mathcal{B}$ closed? Here $\mathcal{B}$ is the $\sigma$-algebra of Borel sets.
12. Prove that if $S \subset \mathbb{R}$ and for every $\varepsilon > 0$ there is an open set $G$ containing $S$ and satisfying
$\mu(G \setminus S) < \varepsilon$, then $S$ is Lebesgue measurable.
13. In Theorem 4, is the result valid when the domain of $\nu$ is a subset of $2^{\mathbb{R}}$?
Proof. We shall prove that each condition implies the one following it, and
that condition f implies that $f$ is measurable. That a implies b follows from the equation
$f^{-1}([a,\infty]) = \bigcap_{n=1}^{\infty} f^{-1}((a - \frac{1}{n},\infty])$ and from the properties of a $\sigma$-algebra.
That b implies c follows in the same manner from the equation
$f^{-1}([-\infty,a)) = X \setminus f^{-1}([a,\infty])$. That c implies d follows from the equation
$f^{-1}([-\infty,a]) = \bigcap_{n=1}^{\infty} f^{-1}([-\infty,a + \frac{1}{n}))$. That d implies e follows from writing
$f^{-1}((a,b)) = \bigcup_{n=1}^{\infty} f^{-1}([-\infty,b - \frac{1}{n})) \setminus f^{-1}([-\infty,a])$. That e implies f is a consequence of
the theorem that each open set in $\mathbb{R}^*$ is a countable union of intervals of the
form $(a,b)$, where $a$ and $b$ are in $\mathbb{R}^*$. To complete the proof, assume condition
f. Let $\mathcal{S}$ be the family of all sets $S$ contained in $\mathbb{R}^*$ such that $f^{-1}(S) \in \mathcal{A}$. It is
straightforward to verify that $\mathcal{S}$ is a $\sigma$-algebra. By hypothesis, each open set in
$\mathbb{R}^*$ belongs to $\mathcal{S}$. Hence $\mathcal{S}$ contains the $\sigma$-algebra of Borel sets. Consequently,
$f^{-1}(B) \in \mathcal{A}$ for each Borel set $B$, and $f$ is measurable. •
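The criterion just proved can be tried out on a finite measurable space, where everything is computable. The following Python sketch is only an illustration under assumptions of our own: the set `X`, the partition generating the family `A`, and the helper names `sigma_algebra` and `is_measurable` are hypothetical and do not come from the text. It tests measurability by checking the threshold sets $\{x : f(x) > a\}$, which on a finite set need only be examined at finitely many values of $a$.

```python
from itertools import combinations

def sigma_algebra(blocks):
    """All unions of blocks: the sigma-algebra generated by a finite partition."""
    blocks = [frozenset(b) for b in blocks]
    family = set()
    for r in range(len(blocks) + 1):
        for chosen in combinations(blocks, r):
            family.add(frozenset().union(*chosen))
    return family

def is_measurable(f, X, family):
    """True if {x : f(x) > a} belongs to the family for every threshold a."""
    values = sorted(set(f[x] for x in X))
    # One threshold below the minimum, then the values themselves;
    # any other threshold produces one of the same sets.
    thresholds = [values[0] - 1] + values
    return all(frozenset(x for x in X if f[x] > a) in family for a in thresholds)

X = [0, 1, 2, 3]
A = sigma_algebra([{0, 1}, {2, 3}])      # hypothetical sigma-algebra on X
f = {0: 1.0, 1: 1.0, 2: 5.0, 3: 5.0}     # constant on each block of the partition
g = {0: 1.0, 1: 2.0, 2: 5.0, 3: 5.0}     # separates the points 0 and 1

print(is_measurable(f, X, A), is_measurable(g, X, A))
```

In this toy setting a function is measurable exactly when it is constant on each block of the partition, which is what the two test functions exhibit.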
Our next goal is to study, for a given measurable space $(X, \mathcal{A})$, the class of
measurable functions. First, we define the characteristic function of a set $A$
to be the function $\chi_A$ given by
$$\chi_A(x) = \begin{cases} 1 & \text{if } x \in A \\ 0 & \text{if } x \notin A \end{cases}$$
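$$(f+g)^{-1}((a,\infty]) = \bigcup_{i=1}^{\infty} \bigl[ f^{-1}((r_i,\infty]) \cap g^{-1}((a - r_i,\infty]) \bigr]$$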
To verify this, notice that $f(x) + g(x) > a$ if and only if $a - g(x) < f(x)$, and
this last inequality is true if and only if $a - g(x) < r_i < f(x)$ for some $i$. The last
term in the displayed equation is a countable union of measurable sets, because
$f$ and $g$ are measurable.
If $f$ is measurable, then so is $\lambda f$, because $(\lambda f)^{-1}((a,\infty])$ is either $\emptyset$ (when
$\lambda = 0$ and $a \ge 0$), or $X$ (when $\lambda = 0$ and $a < 0$), or $f^{-1}((a/\lambda,\infty])$ (when
$\lambda > 0$), or $f^{-1}([-\infty,a/\lambda))$ (when $\lambda < 0$).
If $f$ is measurable, then so is $f^2$, because $(f^2)^{-1}((a,\infty])$ is $X$ when $a < 0$,
and it is $f^{-1}((\sqrt{a},\infty]) \cup f^{-1}([-\infty,-\sqrt{a}))$ when $a \ge 0$. From the identity
$fg = \frac{1}{4}(f+g)^2 - \frac{1}{4}(f-g)^2$ it follows that $fg$ is measurable if $f$ and $g$ are
measurable.
If $f_i$ are measurable and if $g(x) = \sup_i f_i(x)$, then $g$ is measurable because
$g^{-1}((a,\infty]) = \bigcup_{i=1}^{\infty} f_i^{-1}((a,\infty])$. A similar argument applies to infima, if we
use an interval $[-\infty,a)$.
If $f_i$ are measurable and $g(x) = \limsup f_i(x)$, then $g$ is measurable, because
$g(x) = \lim_{n\to\infty} \sup_{i>n} f_i(x) = \inf_n \sup_{i>n} f_i(x)$. A similar argument applies to
the limit infimum. •
Consider now a measure space $(X, \mathcal{A}, \mu)$. Let $f$ and $g$ be functions on $X$
taking values in $\mathbb{R}^*$. If the set $\{x : f(x) \ne g(x)\}$ belongs to $\mathcal{A}$ and has measure
0, then we say that $f(x) = g(x)$ almost everywhere. This is an equivalence
relation if the measure space is complete (Problem 1). More generally, if $P(x)$ is
a proposition, for each $x$ in $X$, then we say that $P$ is true almost everywhere if
the set $\{x : P(x) \text{ is false}\}$ is a measurable set of measure 0. The abbreviation a.e.
is used for "almost everywhere." The French use p.p. for "presque partout."
On the right side of this equation we see the union of two sets. The first of
these is measurable because it is $f^{-1}((a,\infty]) \setminus A$. The second set is measurable
Section 8.4 Measurable Functions 397
Let $(X, \mathcal{A}, \mu)$ be a measure space, and let $f, f_1, f_2, \ldots$ be measurable
functions. We say that $f_n \to f$ almost uniformly if to each positive $\varepsilon$ there
corresponds a measurable set of measure at most $\varepsilon$ on the complement of which
$f_n \to f$ uniformly.
Let $(X, \mathcal{A})$ be a measurable space. A simple function is a measurable
function $f : X \to \mathbb{R}^*$ whose range is a finite subset of $\mathbb{R}^*$. Then $f$ can be written
in the form $f = \sum_{i=1}^{n} \lambda_i \chi_{A_i}$, where the $\lambda_i$ can be taken to be distinct elements
of $\mathbb{R}^*$, and $A_i$ can be the set $\{x : f(x) = \lambda_i\}$. It then turns out that each
$A_i$ is measurable, that these sets are mutually disjoint, and that their union is $X$
(Problem 2).
$$g_n = \sum_i \frac{i}{2^n} \chi_{A_i^n} + n \chi_{B_n}$$
The sets $A_i^n$ and $B_n$ are measurable, by Theorem 1. Hence $g_n$ is a simple
function. The definition of $g_n$ shows directly that $g_n \le f$. In order to verify
that $g_n(x)$ converges to $f(x)$ for each $x$, consider first the case when $f(x) \ne \infty$.
For large $n$ and a suitable $i$, $x \in A_i^n$. Then
$$f(x) - g_n(x) < \frac{i+1}{2^n} - \frac{i}{2^n} = \frac{1}{2^n}$$
On the other hand, if $f(x) = +\infty$, then $g_n(x) = n \to f(x)$.
For the monotonicity of $g_n(x)$ as a function of $n$ for $x$ fixed, first verify
(Problem 3) that (for $i < n2^n$) $A_i^n = A_{2i}^{n+1} \cup A_{2i+1}^{n+1}$. If $x \in A_{2i}^{n+1}$,
then $g_{n+1}(x) = 2i/2^{n+1} = i/2^n = g_n(x)$. If $x \in A_{2i+1}^{n+1}$, then
$g_{n+1}(x) = (2i+1)/2^{n+1} \ge 2i/2^{n+1} = g_n(x)$. If $x \in B_n$, then $f(x) \ge n$, and therefore
$g_{n+1}(x) \ge n = g_n(x)$.
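The construction in this proof is easy to compute pointwise. The sketch below is only an illustration: the function name `g` and the sample value $f(x) = \pi$ are our own choices, and it assumes the sets $A_i^n = \{x : i/2^n \le f(x) < (i+1)/2^n\}$ and $B_n = \{x : f(x) \ge n\}$ as in the proof. It evaluates $g_n(x)$ at a point where $f(x)$ is known and lets one observe the three properties being proved: $g_n \le f$, monotonicity in $n$, and the error bound $2^{-n}$ once $n$ exceeds $f(x)$.

```python
import math

def g(n, fx):
    """Value of the n-th simple approximant at a point where f equals fx >= 0."""
    if fx >= n:                              # the point lies in B_n (covers fx = +infinity)
        return float(n)
    return math.floor(fx * 2**n) / 2**n      # the point lies in A_i^n with i = floor(2^n f(x))

fx = math.pi
values = [g(n, fx) for n in range(1, 12)]
print(values)
```

Since $g_n(x)$ depends on $x$ only through the number $f(x)$, no measure space has to be modeled here.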
Problems 8.4
1. Prove that the relation of two functions being equal almost everywhere is an equivalence
relation if the underlying measure space is complete.
2. Prove the assertions about the sets Ai that were mentioned in the definition of a simple
function.
3. In the proof of Theorem 5, verify that $A_i^{(n)} = A_{2i}^{(n+1)} \cup A_{2i+1}^{(n+1)}$.
4. Let $(X,\mathcal{A})$ be a measurable space, and let $r_1, r_2, \ldots$ be an enumeration of the rational
numbers. Prove that a function $f : X \to \mathbb{R}^*$ is measurable if and only if all the sets
$f^{-1}((r_i,\infty])$ are measurable.
5. Prove that every Borel set in $\mathbb{R}^*$ is one of the four types $B$, $B \cup \{\infty\}$, $B \cup \{-\infty\}$,
$B \cup \{+\infty, -\infty\}$, where $B$ is a Borel set in $\mathbb{R}$.
6. Prove that in order for $f$ to be measurable it is necessary and sufficient that $f^{-1}(O)$ be
measurable for all open sets $O$ in $\mathbb{R}$ and that $f^{-1}(\{\infty\})$ and $f^{-1}(\{-\infty\})$ be measurable.
7. Prove that if $f$ and $g$ are measurable functions, then the sets $\{x : f(x) = g(x)\}$,
$\{x : f(x) \ge g(x)\}$, and $\{x : f(x) > g(x)\}$ are measurable.
8. Prove that if $f$ and $g$ are measurable functions and if $(f + g)(x)$ is assigned some constant
value on the set where $f(x)$ and $g(x)$ are infinite and of opposite sign, then $f + g$ is
measurable.
9. Let $(\mathbb{R}, \mathcal{A})$ be the measurable space in which $\mathcal{A}$ is the family of all Lebesgue measurable
sets. Give an example of a nonmeasurable function in this setting.
10. Let $\mathcal{B}$ be the $\sigma$-algebra of Borel sets in $\mathbb{R}$. Let $\mathcal{A}$ be the $\sigma$-algebra of Lebesgue measurable
sets. Prove that $\mathcal{B}$ is a proper subset of $\mathcal{A}$.
11. Let $(X, \mathcal{A}, \mu)$ be a measure space for which $\mu(X) < \infty$. Prove that if $f$ is a measurable
function that is finite-valued almost everywhere, then for each $\varepsilon > 0$ there is an $M$ such
that $\mu(\{x : |f(x)| > M\}) < \varepsilon$.
Section 8.5 The Integral for Nonnegative Functions 399
12. Let $(X, \mathcal{A})$ be a measurable space and $f$ a measurable function. What can you say about
the following set?
$\{S : S \subset \mathbb{R} \text{ and } f^{-1}(S) \in \mathcal{A}\}$
13. Prove that the composition $f \circ g$ of two Borel measurable functions on $\mathbb{R}$ is Borel
measurable.
14. Let $(X, \mathcal{A}, \mu)$ be a measure space and $f$ a measurable function. For each Borel set $B$ in
$\mathbb{R}^*$ define $\nu(B) = \mu(f^{-1}(B))$. Show that $\nu$ is a Borel measure, i.e., a measure on $\mathcal{B}$, the
$\sigma$-algebra of Borel sets.
15. If $|f|$ is measurable, does it follow that $f$ is measurable?
16. Show that the composition of two Lebesgue measurable functions need not be Lebesgue
measurable.
17. Prove that if $f$ is a real-valued Lebesgue measurable function, then there is a Borel
measurable function equal to $f$ almost everywhere.
18. Let $X = \mathbb{N}$, $\mathcal{A} = 2^X$, and let $\mu$ be counting measure, as defined in Example 3 in
Section 8.1, page 382. Let $f_n$ be the characteristic function of the set $\{1, 2, \ldots, n\}$.
Prove that the sequence $[f_n]$ has property (a) but not property (b) in Egorov's Theorem.
Resolve the apparent contradiction.
19. Refer to Problem 7 in Section 8.2, page 390, for the definition of the symmetric difference
of two sets. Prove that $|\chi_A - \chi_B| = \chi_{A \triangle B}$.
20. Prove that a monotone function $f : \mathbb{R} \to \mathbb{R}$ is Borel measurable.
21. Prove that the set of points where a sequence of measurable functions converges is a
measurable set.
With any measure space $(X, \mathcal{A}, \mu)$ there is associated (in a certain standard
way) an integral. It will be a linear functional on the space of all measurable
functions from $X$ into $\mathbb{R}^*$. The motivation for an appropriate definition arises
from our wish that the integral of the characteristic function of a measurable set
should be the measure of the set:
(1) $\int \chi_A = \mu(A) \qquad (A \in \mathcal{A})$
The requirement that the integral act linearly leads to the definition of the
integral of a simple function $f = \sum_{i=1}^{n} \alpha_i \chi_{A_i}$:
(2) $\int f = \sum_{i=1}^{n} \alpha_i \mu(A_i)$
In Equation (2) we assume that the sets $A_1, \ldots, A_n$ are mutually disjoint and
that the $\alpha_i$ are distinct. Such a representation $f = \sum \alpha_i \chi_{A_i}$ is called canonical.
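$$\int f = \sum_{i=1}^{k} \beta_i \mu(B_i) = \sum_{i=1}^{k} \beta_i \mu\Bigl(\bigcup_{j \in J_i} A_j\Bigr) = \sum_{i=1}^{k} \sum_{j \in J_i} \beta_i \mu(A_j) = \sum_{i=1}^{k} \sum_{j \in J_i} \alpha_j \mu(A_j) = \sum_{j=1}^{n} \alpha_j \mu(A_j)$$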
•
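The point of this calculation, that the sum $\sum \alpha_j \mu(A_j)$ does not change when sets carrying the same coefficient are merged, can be checked on a small example. The following sketch is an illustration under our own assumptions: a four-point space with hypothetical point masses `mu_point`, and helper names `integral` and `canonical` that do not appear in the text. Exact rational arithmetic is used so that the two sums can be compared for equality.

```python
from fractions import Fraction as F

# Hypothetical point masses on a four-point space.
mu_point = {"a": F(1, 2), "b": F(1, 3), "c": F(1, 6), "d": F(2)}

def mu(A):
    """Measure of a set of points."""
    return sum(mu_point[x] for x in A)

def integral(rep):
    """Sum of alpha * mu(A) over a representation [(alpha, A), ...] with disjoint sets."""
    return sum(alpha * mu(A) for alpha, A in rep)

def canonical(rep):
    """Merge the sets that carry equal coefficients, so the coefficients become distinct."""
    merged = {}
    for alpha, A in rep:
        merged[alpha] = merged.get(alpha, frozenset()) | frozenset(A)
    return sorted(merged.items())

# Disjoint sets, but the coefficient 3 is repeated, so this is not canonical.
rep = [(3, {"a"}), (5, {"b"}), (3, {"c", "d"})]

print(integral(rep), integral(canonical(rep)))
```

Both sums are equal, as the displayed calculation asserts for any representation by disjoint sets.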
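Lemma 2. If $g$ and $f$ are simple functions such that $g \le f$, then
$\int g \le \int f$.
Proof. Start with canonical representations, as described following Equation (2):
$$g = \sum_{i=1}^{n} \alpha_i \chi_{A_i} \qquad f = \sum_{j=1}^{k} \beta_j \chi_{B_j}$$
Then we have (non-canonical) representations conforming to Lemma 1:
$$g = \sum_{i=1}^{n} \sum_{j=1}^{k} \alpha_i \chi_{A_i \cap B_j} \qquad f = \sum_{j=1}^{k} \sum_{i=1}^{n} \beta_j \chi_{A_i \cap B_j}$$
$$\int g = \sum_{i=1}^{n} \sum_{j=1}^{k} \alpha_i \mu(A_i \cap B_j) \qquad \int f = \sum_{j=1}^{k} \sum_{i=1}^{n} \beta_j \mu(A_i \cap B_j)$$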
Hence
•
The next step in the process involves the approximation of nonnegative
measurable functions by simple functions, as addressed in Theorem 5 of Section 8.4,
page 397. Suppose, then, that $g_1, g_2, \ldots$ are nonnegative simple functions such
that $g_n \uparrow f$. Then we want the integral of $f$ to be the limit of $\int g_n$. For technical
reasons this is best accomplished by defining
(3) $\int f = \sup \Bigl\{ \int g : g \text{ simple and } 0 \le g \le f \Bigr\}$
Proof. Since $f$ itself is simple, the expression on the right of Equation (3) is
at least $\int f$. On the other hand, if $g$ is simple and if $g \le f$, then by Lemma 2,
$\int g \le \int f$. By taking a supremum, we see that the right side of Equation (3) is
at most $\int f$. •
Lemma 4. If $f$ and $g$ are nonnegative simple functions, then
$\int (f + g) = \int f + \int g$.
$$\int (g + f) = \sum_{i=1}^{n} \sum_{j=1}^{k} (\alpha_i + \beta_j) \mu(A_i \cap B_j)$$
(4)
Since this is true for any $\theta$ in $(0,1)$, one concludes that $\int g \le \lim_n \int f_n$. In this
inequality take a supremum over all simple $g$ for which $0 \le g \le f$, arriving at
$\int f \le \lim_n \int f_n$. •
$$\int f = \int (f\chi_A + f\chi_B) = \int f\chi_A + \int f\chi_B \le \int \infty\chi_A + \int 0\chi_B = \infty\mu(A) + 0\mu(B) = 0$$
Proof. Recall that the limit infimum of a sequence of real numbers $[x_n]$ is
defined to be $\lim_{n\to\infty} \inf_{i \ge n} x_i$. The limit infimum of a sequence of real-valued
functions is defined pointwise: $(\liminf f_n)(x) = \liminf f_n(x) = \lim g_n(x)$, where
$g_n(x) = \inf_{i \ge n} f_i(x)$. Observe that $g_{n-1}(x) \le g_n(x) \le f_n(x)$ and that
$g_n \uparrow \liminf f_n$. Hence by Theorem 1 (The Monotone Convergence Theorem)
Similarly, $\int g\chi_B = 0$. Hence
Theorem 5 states that $\int f$ is not affected if $f$ is altered on a set of measure
0, while retaining measurability.
Problems 8.5
1. Give an example in which strict inequality occurs in Fatou's Lemma (Theorem 4).
2. Show that a monotone convergence theorem for decreasing sequences is not true. For
example, consider $f_n$ as the characteristic function of the interval $[n,\infty)$.
3. Prove that if $f$ is nonnegative and Lebesgue integrable on $\mathbb{R}$ and if $F(x) = \int_{-\infty}^{x} f$, then
$F$ is continuous.
4. Define $f_n(x)$ to be $n$ if $|x| \le 1/n$ and to be 0 otherwise. What are $\int \lim f_n$ and $\lim \int f_n$?
5. Prove or disprove: If $(X,\mathcal{A},\mu)$ is a measure space and if $A$ and $B$ are measurable sets,
then $\int |\chi_A - \chi_B| = \mu(A \triangle B)$. Recall that $A \triangle B = (A \setminus B) \cup (B \setminus A)$.
6. Let $f$ be Lebesgue measurable on $[0,1]$, and define $\varphi(t) = \mu(f^{-1}((-\infty,t)))$. Find the
salient properties of $\varphi$. For example, is it continuous from the right or left? Is it
monotone? Is it measurable? Is it invertible? What are $\lim_{t\to\infty} \varphi(t)$ and $\lim_{t\to-\infty} \varphi(t)$?
7. (Continuation). Define $f^*(x) = \sup\{t : \varphi(t) \le x\}$. Prove that $\varphi(t) \le x$ if and only
if $t \le f^*(x)$. Prove that the sets $\{x : f(x) < t\}$ and $\{x : f^*(x) < t\}$ have equal
measure. Hence, $f^*$ is called an equimeasurable nondecreasing rearrangement of
$f$. Prove that the sets $\{x : f(x) \le t\}$ and $\{x : f^*(x) \le t\}$ have the same measure. Prove
that the sets $\{x : f(x) > t\}$ and $\{x : f^*(x) > t\}$ have the same measure. Prove that
$f^*(\varphi(x)) \ge x \ge \varphi(f^*(x))$.
8. Give an example to show that the nonnegativity hypothesis cannot be dropped from
Fatou's Lemma (Theorem 4).
9. Let $f_n$ be nonnegative measurable functions (on any measure space). Prove that if
$f_n \to f$ and $f \ge f_n$ for all $n$, then $\int f_n \to \int f$.
10. Prove, for any sequence in $\mathbb{R}$, that $\liminf(-x_n) = -\limsup x_n$.
11. Let $X = [0,1]$. Is there a Borel measure $\mu$ on $X$ that assigns the same positive measure
to each open interval $(0, 1/n)$, $n = 1, 2, 3, \ldots$?
12. If $f$ is a bounded function, then there is a sequence of simple functions converging
uniformly to $f$. (The domain of $f$ can be any set, and no measurability assumptions are
needed.)
13. Let $f_n$ be measurable functions such that $f_n \ge 0$ a.e. ("almost everywhere") and $f_n \uparrow f$
a.e. Prove that $\int f_n \uparrow \int f$.
14. Let $f_n$ be nonnegative and measurable. Prove that $\int \sum_{n=1}^{\infty} f_n = \sum_{n=1}^{\infty} \int f_n$.
15. Let $f$ be nonnegative and measurable. Prove that $\int f = \int_A f$, where $A = \{x : f(x) > 0\}$.
16. Let $f_n$ be measurable functions such that $f_n \ge f_{n+1} \ge 0$ for all $n$ and $\int f_n \downarrow 0$. Prove
that $f_n \downarrow 0$ a.e.
17. Let $f$ be the characteristic function of the set of irrational points in $[0,1]$. Is $f$ measurable?
What are the Riemann and the Lebesgue integrals of $f$?
18. Prove that if $\{A_n\}$ is a disjoint sequence of measurable sets and if $X = \bigcup_{n=1}^{\infty} A_n$, then
$\int f = \sum_{n=1}^{\infty} \int_{A_n} f$.
19. Prove that if $f_n$ are nonnegative measurable functions for which $\sum_{n=1}^{\infty} \int f_n < \infty$, then
$f_n \to 0$ a.e.
20. Give an example of a sequence of Riemann integrable functions such that the inequalities
$0 \le f_n \le f_{n+1} \le 1$ hold, yet $\lim f_n$ is not Riemann integrable.
21. Give an example of a sequence of simple functions $f_n$ converging pointwise to a simple
function $f$, and yet $\int |f_n - f| \not\to 0$.
In the preceding section, the integral for nonnegative functions was developed in
the general setting of an arbitrary measure space $(X, \mathcal{A}, \mu)$. Next on the agenda
is the extension of the integral to "arbitrary" functions.
Definition. Let $(X, \mathcal{A}, \mu)$ be a measure space, and let $f : X \to \mathbb{R}^*$. We define
$$\int f = \int f^+ - \int f^-$$
The general definition just given for the integral is in harmony with the
previous definition, Equation (3), Section 8.5, page 400, in the cases where both
definitions are applicable. Indeed, if $f \ge 0$, then $f^+ = f$ and $f^- = 0$.
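A small computation may make the definition concrete. The sketch below is an illustration only: it assumes the usual positive and negative parts $f^+ = \max(f, 0)$ and $f^- = \max(-f, 0)$, and it works on a hypothetical three-point measure space whose point masses `mu` and function values `f` are our own choices. On such a space the integral of a nonnegative function is a finite weighted sum.

```python
from fractions import Fraction as F

mu = {1: F(1), 2: F(1, 2), 3: F(3)}      # hypothetical point masses
f = {1: F(-4), 2: F(6), 3: F(1)}         # a function taking both signs

def integral_nonneg(g):
    """Integral of a nonnegative function on the finite space: a weighted sum."""
    assert all(v >= 0 for v in g.values())
    return sum(g[x] * mu[x] for x in g)

f_plus = {x: max(v, F(0)) for x, v in f.items()}     # assumed definition of f^+
f_minus = {x: max(-v, F(0)) for x, v in f.items()}   # assumed definition of f^-

integral_f = integral_nonneg(f_plus) - integral_nonneg(f_minus)
integral_abs = integral_nonneg(f_plus) + integral_nonneg(f_minus)
print(integral_f, integral_abs)
```

The second quantity is the integral of $|f| = f^+ + f^-$, which is the quantity that decides integrability in the definition that follows.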
Definition. A function $f : X \to \mathbb{R}^*$ is said to be integrable if it is measurable
and if $\int |f| < \infty$. The set of all integrable functions on the given measure space
is denoted by $L^1(X, \mathcal{A}, \mu)$, or simply by $L^1$ if there can be no ambiguity about
the underlying measure space.
Theorem 1. The set $L^1(X, \mathcal{A}, \mu)$ is a linear space, and the integral
is a linear functional on it.
Proof. Let $f$ and $g$ be members of $L^1$. To show that $f + g \in L^1$, write $h = f + g$,
and
Since these are all nonnegative functions, Theorem 2 of Section 8.5, page 402, is
applicable, and
Therefore, by Lemma 1,
With this equation now established, we use Lemma 5 in Section 8.5 (page 401)
to write
$$\int |f + g| \le \int (|f| + |g|) = \int |f| + \int |g| < \infty$$
For scalar multiplication, observe first that if $\lambda \ge 0$ and $f \ge 0$, then the
definition of the integral in Equation (3) of Section 8.5 (page 400) gives
$\int \lambda f = \lambda \int f$. If $f \ge 0$ and $\lambda < 0$, then
Since $\int g < \infty$, we conclude that $\int f \le \liminf \int f_n$. Since $-f$ and $-f_n$ satisfy
the hypotheses of our theorem, the same conclusion can be drawn for them:
$\int -f \le \liminf \int -f_n$. This is equivalent to $-\int f \le -\limsup \int f_n$ and to
$\int f \ge \limsup \int f_n$. Putting this all together produces
In order to establish the second part of the theorem, it now suffices to prove
it in the special case that $f$ is an integrable simple function. It is therefore a
linear combination of characteristic functions of measurable sets of finite
measure. It then suffices to prove this part of the theorem when $f = \chi_A$ for some
measurable set $A$ having finite measure. By the definition of Lebesgue
measure, there is a countable family of open intervals $\{I_n\}$ that cover $A$ and satisfy
$\mu(A) \le \sum_{n=1}^{\infty} \mu(I_n) < \mu(A) + \varepsilon$. There is no loss of generality in assuming
that the family $\{I_n\}$ is disjoint, because if two of these intervals have a point
in common, their union is a single open interval. Since the series $\sum \mu(I_n)$
converges, there is an index $m$ such that $\sum_{n=m+1}^{\infty} \mu(I_n) < \varepsilon$. Put $B = \bigcup_{n=1}^{m} I_n$,
$E = \bigcup_{n=m+1}^{\infty} I_n$, $h = \chi_B$, and $\varphi = \chi_E$. Then $h$ is a step function. Since
$A \subset B \cup E$, we have $f \le h + \varphi$. Then
$$|h - f| \le |h + \varphi - f| + |\varphi| = (h + \varphi - f) + \varphi$$
Consequently,
$$\int |h - f| \le \int (h + \varphi - f) + \int \varphi$$
For the third part of the proof it suffices to consider an f that is an integrable
step function. For this, in turn, it is enough to prove that the characteristic
function of a single compact interval can be approximated in $L^1$ by a continuous
function that vanishes outside that interval. This can certainly be done with a
piecewise linear function. •
The linear space $L^1(X, \mathcal{A}, \mu)$ becomes a pseudo-normed space upon
introducing the definition $\|f\| = \int |f|$. Since a function that is equal to 0 almost
everywhere will satisfy $\|f\| = 0$, we will not have a true norm unless we interpret
each $f$ in $L^1$ as an equivalence class consisting of all functions equal to $f$ almost
everywhere. This manner of proceeding is eventually the same as introducing
the null space of the norm, $N = \{g \in L^1 : \|g\| = 0\}$, and considering the
quotient space $L^1/N$. The elements of this space are cosets $f + N$, and the norm
of a coset is defined to be $\|f + N\| = \inf\{\|f + g\| : g \in N\}$. This is the same as
$\|f\|$.
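The failure of $\|f\| = \int |f|$ to be a true norm is visible already on a finite measure space that contains a point of measure 0. The following sketch is an illustration under our own assumptions (the three-point space, the point masses `mu`, and the function name `norm1` are hypothetical): a function that is nonzero only on the null point has pseudo-norm 0, and adding it to another function does not change that function's pseudo-norm, in agreement with $\|f + N\| = \|f\|$.

```python
mu = {"x": 1.0, "y": 2.0, "z": 0.0}      # z is a point of measure 0 (illustrative)

def norm1(f):
    """Pseudo-norm: the integral of |f| with respect to the point masses."""
    return sum(abs(f[p]) * mu[p] for p in mu)

f = {"x": 3.0, "y": -1.0, "z": 7.0}
g = {"x": 0.0, "y": 0.0, "z": 5.0}       # nonzero, yet equal to 0 almost everywhere
f_plus_g = {p: f[p] + g[p] for p in mu}

print(norm1(g), norm1(f), norm1(f_plus_g))
```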
A consequence of these considerations is that for $f$ in $L^1$, the expression
$f(x)$ is meaningless. After all, $f$ stands for a class of functions that can differ
from each other on sets of measure 0. The single point $x$ is a set of measure
zero, and we can change the value of $f$ at $x$ without changing $f$ as a member
of $L^1$. The conventional notation $\int f(x)\,dx$ should always be interpreted as
$\int f$. Remember that the integral of $f$ is not affected by changing the values
of f on any set of measure 0, such as the set of all rational points on the line!
Problems 8.6
Prove that the set of measurable functions on a given measurable space is closed under
these lattice operations. Prove the same assertion for $L^1$.
2. Complete the proof of Lemma 1.
3. Let $X = (0,1)$, let $\mathcal{A}$ be the $\sigma$-algebra of Lebesgue measurable subsets of $(0,1)$, and let $\mu$
be Lebesgue measure on $\mathcal{A}$. Which of these functions are in $L^1(X,\mathcal{A},\mu)$: (a) $f(x) = x^{-1}$,
(b) $g(x) = x^{-1/2}$, (c) $h(x) = \exp(-x^{-1})$, (d) $k(x) = \log x$?
4. If $f = g$ almost everywhere, does it follow that $f^+ = g^+$ and $f^- = g^-$ almost everywhere?
What can be said of the converse?
5. Prove or disprove: If $\{f_n\}$ is a sequence of measurable functions such that $f_n \uparrow f$, then
$\int f_n \uparrow \int f$.
6. Prove that if $f \in L^1(X, \mathcal{A}, \mu)$, then $|f| \in L^1$ and $|\int f| \le \int |f|$. Verify that $|f| = f^+ + f^-$,
that $f = f^+ - f^-$, that $0 \le f^+ \le |f|$, and $0 \le f^- \le |f|$.
7. Show that from the five hypotheses $f_n$ integrable, $h$ integrable, $g$ measurable, $f_n \to f$,
$|f_n| \le h$ one cannot draw the conclusion $\int f_n g \to \int f g$. Find an appropriately weak
additional hypothesis that makes the inference valid.
8. This problem and the next four involve convergence in measure. If $f, f_1, f_2, \ldots$ are
measurable functions on a measure space $(X, \mathcal{A}, \mu)$ and if $\lim_n \mu\{x : |f_n(x) - f(x)| > \varepsilon\}$
is 0 for each $\varepsilon > 0$, then we say that $f_n \to f$ in measure. Prove that if $f_n \to f$
almost uniformly, then $f_n \to f$ in measure. (Almost uniform convergence is defined in
Section 8.4, page 397.)
9. Consider the following sequence of intervals: $A_1 = [0,1]$, $A_2 = [0,1/2]$, $A_3 = [1/2,1]$,
$A_4 = [0,1/4]$, $A_5 = [1/4,1/2]$, $A_6 = [1/2,3/4]$, $A_7 = [3/4,1]$, $A_8 = [0,1/8]$, ... Let $f_n$
denote the characteristic function of $A_n$. Prove that $f_n \to 0$ in measure but $f_n$ does not
converge almost everywhere.
10. Using Lebesgue measure, test the sequence $f_n = \chi_{[n-1,n]}$ for pointwise convergence,
convergence almost everywhere, convergence almost uniformly, and convergence in
measure.
11. Let $(X,\mathcal{A},\mu)$ be a measure space such that $\mu(X) < \infty$. Let $f, f_1, f_2, \ldots$ be real-valued
measurable functions such that $f_n \to f$ almost everywhere. Prove that $f_n \to f$ in
measure.
12. Prove that the Monotone Convergence Theorem (page 401), Fatou's Lemma (page 403),
and the Dominated Convergence Theorem (page 406) are valid for sequences of functions
converging in measure.
13. Prove that if $A$ is a Lebesgue measurable set of finite measure, then for each $\varepsilon > 0$ there
is a finite union $B$ of open intervals such that $\mu(A \triangle B) < \varepsilon$.
14. Prove that if $f$ is Lebesgue measurable and finite-valued on a compact interval, then
there is a sequence of continuous functions $g_n$ defined on the same interval such that
$g_n \to f$ almost uniformly.
15. Prove Lusin's Theorem: If $f$ is Lebesgue measurable and finite-valued on $[a,b]$ and
if $\varepsilon > 0$, then there is a continuous function $g$ defined on $[a,b]$ that has the property
$\mu\{x : f(x) \ne g(x)\} < \varepsilon$.
Section 8.7 The $L^p$-Spaces 409
Throughout this section, a fixed measure space $(X, \mathcal{A}, \mu)$ is the setting. For
each $p > 0$, the notation $L^p(X,\mathcal{A},\mu)$, or just $L^p$, will denote the space of all
measurable functions $f$ such that $\int |f|^p < \infty$. The case when $p = 1$ has been
considered in the preceding section. We write
$$\|f\|_p = \Bigl( \int |f|^p \Bigr)^{1/p}$$
although this equation generally does not define a norm (nor even a seminorm
if $p < 1$). The case $p = \infty$ will be included in our discussion by making two
special definitions. First, $f \in L^\infty$ shall mean that for some $M$, $|f(x)| \le M$
almost everywhere. Second, we define
$$\|f\|_\infty = \inf\{M : |f(x)| \le M \text{ a.e.}\}$$
Theorem 1. Hölder's Inequality. Let $1 \le p \le \infty$, $\frac{1}{p} + \frac{1}{q} = 1$,
$f \in L^p$, and $g \in L^q$. Then $fg \in L^1$ and
(3) $\|fg\|_1 \le \|f\|_p \|g\|_q$
Proof. The seminorms involved here are homogeneous: $\|\lambda f\| = |\lambda| \|f\|$.
Consequently, it will suffice to establish Equation (3) in the special case when
$\|f\|_p = \|g\|_q = 1$. At first, let $p = 1$ and $q = \infty$. Since $g \in L^\infty$, we have $|g(x)| \le M$ a.e.
for some $M$. From this it follows that $\int |fg| \le M \int |f| = M\|f\|_1$. By taking
the infimum for all $M$, we obtain $\|fg\|_1 \le \|f\|_1 \|g\|_\infty$.
Suppose now that $p > 1$. We prove first that if $a > 0$, $b > 0$, and $0 \le t \le 1$,
then $a^t b^{1-t} \le ta + (1-t)b$. The accompanying Figure 8.1 shows the functions
of $t$ on the two sides of this inequality (when $a = 2$ and $b = 12$). It is clear that
we should prove convexity of the function $\varphi(t) = a^t b^{1-t}$. This requires that we
prove $\varphi''(t) \ge 0$. Since $\log \varphi(t) = t \log a + (1-t) \log b$, we have
Figure 8.1
Now let $a = |f(x)|^p$, $b = |g(x)|^q$, $t = 1/p$, $1 - t = 1/q$. Our inequality yields
then $|f(x)g(x)| \le \frac{1}{p}|f(x)|^p + \frac{1}{q}|g(x)|^q$. By hypothesis, the functions on the right
in this inequality belong to $L^1$. Hence by integrating we obtain
= < 00
By the homogeneity of Minkowski's inequality, we may assume that $\|f + g\|_p = 1$.
Observe now that Hölder's Inequality is applicable to the product $|f||f + g|^{p-1}$
and to the product $|g||f + g|^{p-1}$.
Consequently,
•
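The two inequalities of this section can be checked numerically in the simplest setting, counting measure on a finite set, where the integrals are finite sums. The sketch below is an illustration only: the vectors `f` and `g`, the exponents, and the function name `p_norm` are our own choices. It also samples the scalar inequality $a^t b^{1-t} \le ta + (1-t)b$ with $a = 2$ and $b = 12$, the values used for Figure 8.1, and it shows that the triangle inequality can fail for $p = 1/2$, which is why $\|\cdot\|_p$ is not a seminorm when $p < 1$.

```python
def p_norm(v, p):
    """(sum |v_i|^p)^(1/p): the L^p norm for counting measure on a finite set."""
    return sum(abs(t) ** p for t in v) ** (1.0 / p)

f = [1.0, -2.0, 3.0, 0.5]
g = [4.0, 0.0, -1.0, 2.0]
p, q = 3.0, 1.5                          # conjugate exponents: 1/3 + 2/3 = 1

# Holder: sum |f g| <= ||f||_p ||g||_q
holder_lhs = sum(abs(a * b) for a, b in zip(f, g))
holder_rhs = p_norm(f, p) * p_norm(g, q)

# Minkowski: ||f + g||_p <= ||f||_p + ||g||_p
mink_lhs = p_norm([a + b for a, b in zip(f, g)], p)
mink_rhs = p_norm(f, p) + p_norm(g, p)

# Scalar inequality a^t b^(1-t) <= t a + (1-t) b, sampled on [0, 1]
a, b = 2.0, 12.0
scalar_ok = all(a**t * b**(1 - t) <= t * a + (1 - t) * b + 1e-12
                for t in [k / 100 for k in range(101)])

# For p = 1/2 the triangle inequality fails
half_lhs = p_norm([1.0, 1.0], 0.5)                         # (1 + 1)^2
half_rhs = p_norm([1.0, 0.0], 0.5) + p_norm([0.0, 1.0], 0.5)

print(holder_lhs <= holder_rhs, mink_lhs <= mink_rhs, scalar_ok, half_lhs > half_rhs)
```

A numerical check of this kind proves nothing, of course; it only confirms that the inequalities behave as stated on one example.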
Proof. The case $p = \infty$ is special and is addressed first. Let $[f_n]$ be a Cauchy
sequence in $L^\infty$. Define
By Problem 1, these sets all have measure 0. Hence the same is true of their
union, $E$. If $x \in X \setminus E$, then $|f_n(x) - f_m(x)| \le \|f_n - f_m\|_\infty$, and thus $[f_n(x)]$
is a Cauchy sequence in $\mathbb{R}$ for each $x \in X \setminus E$. This sequence converges to a
number that we may denote by $f(x)$. Define $f(x) = 0$ for $x \in E$. On $X \setminus E$,
$|f(x)| = \lim |f_n(x)| \le \lim \|f_n\|_\infty < \infty$. (Use the fact that a Cauchy sequence
in a metric space is bounded.) Thus, $f \in L^\infty$. To prove that $\|f_n - f\|_\infty \to 0$,
let $\varepsilon > 0$ and select $N$ so that $\|f_n - f_m\|_\infty < \varepsilon$ when $n > m > N$. Then
$|f_n(x) - f_m(x)| < \varepsilon$ on $X \setminus E$, and $|f(x) - f_m(x)| \le \varepsilon$ for $m > N$.
All the cases when $1 \le p < \infty$ can be done together. Let $[f_n]$ be a Cauchy
sequence in $L^p$. For each $k = 1, 2, 3, \ldots$ there exists a least index $n_k$ such that
the following implication is valid:
It follows that $n_1 \le n_2 \le \cdots$ and that $\|f_{n_{k+1}} - f_{n_k}\|_p < 2^{-k}$. Let $g_0 = 0$ and
$g_k = f_{n_{k+1}} - f_{n_k}$ for $k \ge 1$. Then
$$\sum_{k=0}^{\infty} \|g_k\|_p < \sum_{k=1}^{\infty} 2^{-k} = 1$$
Problems 8.7
1. Prove that if $f \in L^\infty$, then the set $\{x : |f(x)| > \|f\|_\infty\}$ has measure 0.
2. Let $X$ be any set, and take $\mathcal{A}$ to be $2^X$ and $\mu$ to be counting measure. In this setting, the
space $L^p(X, \mathcal{A}, \mu)$ is often denoted by $\ell^p(X)$. Prove that for each $f \in \ell^p(X)$ the support
of $f$ is countable. Here, the support of $f$ is defined to be $\{x \in X : f(x) \ne 0\}$.
3. (Continuation) Prove that if $X$ is a set of $n$ points, then $\dim \ell^p(X)$ is $n$.
4. (Continuation) For $n = 2$, draw the set $\{f \in \ell^p(X) : \|f\|_p = 1\}$ using $p = 1, 2, 10, \infty$.
5. Let $f_n \in L^\infty$ and $f_n \ge 0$. Prove that $\sup \|f_n\|_\infty = \|\sup f_n\|_\infty$.
6. In $L^p(X, \mathcal{A}, \mu)$, write $f \equiv g$ if $\|f - g\|_p = 0$. Prove that $f \equiv g$ if and only if $f = g$ a.e.
(Thus the equivalence relation is independent of $p$.) Prove that the equivalence relation
is "consistent" with the other structure in $L^p$ by establishing that the conditions $f_1 \equiv f_2$
and $g_1 \equiv g_2$ imply that $f_1 + g_1 \equiv f_2 + g_2$, $\lambda f_1 \equiv \lambda f_2$, and $\|f_1\| = \|f_2\|$.
7. The space $\ell^p(\mathbb{N})$ of Problem 2 is usually written simply as $\ell^p$, and if $f \in \ell^p$, we usually
write $f_n$ instead of $f(n)$. Show that if $f \in \ell^p$, $g \in \ell^q$, $1/p + 1/q = 1$, then $fg \in \ell^1$ and
$\sum_{n=1}^{\infty} |f_n g_n| \le (\sum_{n=1}^{\infty} |f_n|^p)^{1/p} (\sum_{n=1}^{\infty} |g_n|^q)^{1/q}$.
8. Let $(E, \|\ \|)$ be a pseudo-normed linear space. Let $M = \{f \in E : \|f\| = 0\}$. Prove that
$M$ is a linear subspace of $E$. In the quotient space $E/M$ the elements are cosets $f + M$.
Define $\|f + M\| = \inf\{\|f + g\| : g \in M\}$. Show that this defines a norm in $E/M$.
9. Let $f$ and $f_n$ belong to $L^\infty(X,\mathcal{A},\mu)$. Show that $\|f - f_n\|_\infty \to 0$ if and only if $f_n \to f$
almost uniformly. (See the definition in Section 8.4, page 397.)
10. Let $(X, \mathcal{A}, \mu)$ be a measure space for which $\mu(X) < \infty$. Show that if $0 < \alpha < \beta \le \infty$,
then $L^\beta \subset L^\alpha$. Show that the hypothesis $\mu(X) < \infty$ cannot be omitted.
11. Prove that if $0 < \alpha < \beta \le \infty$, then $\ell^\alpha \subset \ell^\beta$. (See the definition in Problem 7.)
12. Show that in the proof of the Riesz-Fischer Theorem (Theorem 3), the sequence $[f_n]$
need not converge to $f$ almost everywhere. Consider, for example, the characteristic
functions of the intervals $[0,1]$, $[0,\frac{1}{2}]$, $[\frac{1}{2},1]$, $[0,\frac{1}{3}]$, $[\frac{1}{3},\frac{2}{3}]$, $[\frac{2}{3},1]$, ... Show that $\|f_n\|_p \to 0$
but $f_n(x)$ is divergent for each $x$ in $[0,1]$.
13. Let $(X, \mathcal{A}, \mu)$ be a measure space for which $\mu(X) < \infty$. Prove that for each $f \in L^\infty$,
$\lim_{p\to\infty} \|f\|_p = \|f\|_\infty$.
14. Prove that for any measure space, if $0 < \alpha < \beta \le \infty$, then $L^\infty \cap L^\alpha \subset L^\infty \cap L^\beta$.
15. Prove for any measure space: If $0 < \alpha < \beta < \gamma < \infty$, then $L^\alpha \cap L^\gamma \subset L^\beta \cap L^\gamma$.
16. Let $f(x) = [x \log^2(1/x)]^{-1}$ and prove that $f$ is in $L^1[0,\frac{1}{2}]$ but is not in $\bigcup_{p>1} L^p[0,\frac{1}{2}]$.
17. Prove that if $[f_n]$ is a Cauchy sequence in $L^p$, then it has a subsequence that converges
almost everywhere.
18. Let $f$ and $f_n$ belong to $L^p$. If $\|f_n - f\|_p \to 0$ and $f_n \to g$ a.e., what relationship exists
between $f$ and $g$?
19. Let $(X,\mathcal{A},\mu)$ be a measure space for which $\mu(X) = 1$. (Such a space is a probability
space.) Prove that if $f$ and $g$ are positive, measurable, and satisfy $fg \ge 1$, then the
inequality $\int f \cdot \int g \ge 1$ holds.
20. Prove that if $f_n \in L^1(X,\mathcal{A},\mu)$ and $\sum_{n=1}^{\infty} \|f_n\|_1 < \infty$, then $f_n \to 0$ a.e.
21. If $0 < \int |f| < \infty$, then there is a continuous function $g$ having compact support such
that $\int fg \ne 0$.
22. Prove that if $f \in L^p(X, \mathcal{A}, \mu)$ for all sufficiently large values of $p$, and if the limit of $\|f\|_p$
Section 8.8 The Radon-Nikodym Theorem 413
25. Prove that for $f \in L^1(X, \mathcal{A}, \mu)$ we have $|\int f| \le \int |f|$. When does equality occur here?
26. Let $1 < p < \infty$, $1/p + 1/q = 1$, and $f \in L^p$. Prove that $|f|^p \in L^1$, that $|f|^{p-1} \in L^q$, and
that for $r \ne 0$, $|f|^r \in L^{p/r}$.
where $\chi_A$ is the characteristic function of the set $A$. The set $A$ and the function
$f$ should be measurable with respect to the underlying measure space $(X, \mathcal{A}, \mu)$.
Now suppose that a second measure $\nu$ is defined on the $\sigma$-algebra $\mathcal{A}$. If
$\nu(A) = 0$ whenever $\mu(A) = 0$, we say that $\nu$ is absolutely continuous with
respect to $\mu$, and we write $\nu \ll \mu$.
One easy way to produce such a measure $\nu$ is given in the next theorem.
(1) $\nu(A) = \int_A f \, d\mu \qquad (A \in \mathcal{A})$
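$$\nu\Bigl(\bigcup_{i=1}^{\infty} A_i\Bigr) = \int_{\bigcup A_i} f = \int f\chi_{\bigcup A_i} = \int f \sum_{i=1}^{\infty} \chi_{A_i} = \int \sum_{i=1}^{\infty} f\chi_{A_i} = \sum_{i=1}^{\infty} \nu(A_i)$$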
This calculation used the Monotone Convergence Theorem (Section 8.5, page 401).
The absolute continuity of $\nu$ is clear: if $\mu(A) = 0$, then $\nu(A) = \int_A f = 0$. •
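On a finite set every measure is given by point masses, and the construction $\nu(A) = \int_A f \, d\mu$ becomes a finite sum. The sketch below is an illustration under assumptions of our own: the four-point space, the masses `mu_pt`, the density `f`, and the helper `density` are hypothetical. It checks additivity and absolute continuity, and then recovers a density from $\nu$ by dividing point masses wherever $\mu$ is positive. The recovered function agrees with `f` except on a $\mu$-null point, which is all one can expect.

```python
from fractions import Fraction as F

X = ["a", "b", "c", "d"]
mu_pt = {"a": F(1), "b": F(2), "c": F(0), "d": F(1, 2)}   # c is a mu-null point
f = {"a": F(3), "b": F(0), "c": F(9), "d": F(4)}          # a nonnegative density

def mu(A):
    return sum(mu_pt[x] for x in A)

def nu(A):
    """nu(A) = integral of f over A with respect to mu (a finite sum here)."""
    return sum(f[x] * mu_pt[x] for x in A)

def density(nu_fn):
    """Recover a density at points of positive mu-measure; the value 0 elsewhere is arbitrary."""
    return {x: (nu_fn({x}) / mu_pt[x] if mu_pt[x] > 0 else F(0)) for x in X}

h = density(nu)
print(nu({"a", "d"}), nu({"c"}), h)
```

The disagreement between `h` and `f` at the null point `c` is a small instance of the uniqueness "almost everywhere" that accompanies such densities.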
It is natural to seek a converse for this theorem. Thus we ask whether each
measure that is absolutely continuous with respect to $\mu$ must be of the form
in Equation (1). The answer is a qualified "Yes." It is necessary to make a
slight restriction. Consider a general measure space $(X, \mathcal{A}, \mu)$. We say that $X$
(or $\mu$) is $\sigma$-finite if $X$ can be written as a countable union of measurable sets,
each having finite measure. For example, the real line with Lebesgue measure is
$\sigma$-finite, since we can write $\mathbb{R} = \bigcup_{n=1}^{\infty} [-n, n]$.
Proof. We prove the theorem first under the assumption that $\mu(X) < \infty$ and
$\nu(X) < \infty$. Consider the Hilbert space $L^2 = L^2(X, \mathcal{A}, \mu + \nu)$. For any $f$ in
$L^2$, define $\varphi(f) = \int f \, d\mu$. It is easily verified that $\varphi$ is a linear functional on
$L^2$. Furthermore it is bounded (continuous) because by the Hölder Inequality
(Theorem 1 in Section 8.7, page 409)
$$|\varphi(f)| = \Bigl| \int f \cdot 1 \, d\mu \Bigr| \le \int |f| \cdot 1 \, d(\mu + \nu) \le \|f\|_2 \|1\|_2$$
By the Riesz Representation Theorem for Hilbert space (Section 2.3, page 81)
there exists an element $h_0$ in $L^2$ such that
Thus $\mu(B) = 0$ and $h_0(x) > 0$ a.e. (with respect to $\mu$). Since $\nu \ll \mu$, we have
$\nu(B) = 0$ also. Hence for any $A \in \mathcal{A}$,
To see that $h \ge 0$ a.e., with respect to $\mu$, write $A = \{x : h(x) < 0\}$, so that
$0 \le \nu(A) = \int_A h \, d\mu \le 0$, whence $\mu(A) = 0$.
For the second half of the proof we assume only that $\mu$ and $\nu$ are $\sigma$-finite.
Then $X = \bigcup_{n=1}^{\infty} A_n = \bigcup_{n=1}^{\infty} B_n$, where $A_n$ and $B_n$ are measurable sets such
that $\mu(A_n) < \infty$ and $\nu(B_n) < \infty$ for each $n$. Write the doubly-indexed family
$A_i \cap B_j$ as a sequence $C_n$. Then $X = \bigcup C_n$, $\mu(C_n) < \infty$, and $\nu(C_n) < \infty$.
With no loss of generality we assume that the sequence $[C_n]$ is disjoint. Define
measures $\nu_n$ and $\mu_n$ by putting $\nu_n(A) = \nu(A \cap C_n)$ and $\mu_n(A) = \mu(A \cap C_n)$.
Since $\nu \ll \mu$, we have $\nu_n \ll \mu_n$ for all $n$. By the first half of the proof there exist
functions $h_n$ such that $\nu_n(A) = \int_A h_n \, d\mu_n$, for all $A \in \mathcal{A}$. Since the $C_n$-sequence
is disjoint, we can define $h$ on $X$ by specifying that $h(x) = h_n(x)$ for $x \in C_n$.
Then we have
For the uniqueness of $h$, suppose that $\int_A h \, d\mu = \int_A h' \, d\mu$ for all $A \in \mathcal{A}$. Letting
$A = \{x : h(x) > h'(x)\}$, we have
It follows that $\mu(A) = 0$. By symmetry, the set where $h'(x) > h(x)$ is also of
measure 0. Hence $h = h'$ a.e. ($\mu$). •
The preceding paragraphs have involved the concept of absolute continuity
of one measure with respect to another. The antithesis of this is "mutual
singularity." Two measures $\mu$ and $\nu$ on the same measure space are said to be
mutually singular if there is a measurable set $B$ such that $\mu(B) = \nu(X \setminus B) = 0$.
This relation is written symbolically as $\mu \perp \nu$. It is obviously a symmetric
relation.
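$$\nu_1(A) = \nu(A \setminus B) \qquad \nu_2(A) = \nu(A \cap B) \qquad (A \in \mathcal{A})$$
Obviously, $\nu_1 + \nu_2 = \nu$. By Problem 8.2.15, page 391, $\nu_1$ and $\nu_2$ are measures.
Let us prove that $\nu_2 \perp \mu$. Since $h = 0$ on $B$, we have $\mu(B) = \int_B h \, d(\mu + \nu) = 0$.
On the other hand, $\nu_2(X \setminus B) = \nu((X \setminus B) \cap B) = \nu(\emptyset) = 0$. Next, we prove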
The same argument will prove that $\nu(A \cap D) = \nu_4(A)$. Hence $\nu_2 = \nu_4$. Since
$\nu = \nu_1 + \nu_2 = \nu_3 + \nu_4$, one is tempted to conclude outright that $\nu_1 = \nu_3$.
However, if $A$ is a set for which $\nu_2(A) = \nu_4(A) = \infty$, we cannot perform the
necessary subtraction. Using the $\sigma$-finite property of the space, we find a disjoint
sequence of measurable sets $X_n$ such that $X = \bigcup X_n$ and $\nu(X_n) < \infty$. Then
$\nu_1(X_n \cap A) = \nu_3(X_n \cap A)$ for all $n$ and for all $A$. It follows that $\nu_1(A) = \nu_3(A)$
and that $\nu_1 = \nu_3$.
•
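For measures given by point masses on a finite set, the decomposition discussed here can be written down directly. The sketch below is an illustration under our own assumptions: the four-point space and the masses `mu_pt` and `nu_pt` are hypothetical, and the set `B` is taken to be the set of points of $\mu$-measure 0, which plays the role of the set $B$ with $\mu(B) = 0$ in the proof. It splits $\nu$ into the part carried by $X \setminus B$ and the part carried by $B$.

```python
from fractions import Fraction as F
from itertools import combinations

X = [1, 2, 3, 4]
mu_pt = {1: F(1), 2: F(0), 3: F(2), 4: F(0)}
nu_pt = {1: F(5), 2: F(7), 3: F(0), 4: F(1)}

B = {x for x in X if mu_pt[x] == 0}      # a set with mu(B) = 0 (our choice of B)

def mu(A):
    return sum(mu_pt[x] for x in A)

def nu(A):
    return sum(nu_pt[x] for x in A)

def nu1(A):
    return nu(set(A) - B)                # part of nu carried by X \ B

def nu2(A):
    return nu(set(A) & B)                # part of nu carried by B

subsets = [set(c) for r in range(len(X) + 1) for c in combinations(X, r)]
print(nu1(X), nu2(X))
```

In this example `nu1` vanishes on every $\mu$-null set, while `nu2` and $\mu$ are carried by the disjoint sets $B$ and $X \setminus B$.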
Problems 8.8
1. Is the relation of absolute continuity (for measures) reflexive? What about symmetry
and transitivity? Is it a partial order? A linear order? A well-ordering? Give examples
to support each conclusion.
2. Solve Problem 1 for the relation of mutual singularity.
3. The function $h$ in the Radon-Nikodym Theorem is often denoted by $\frac{d\nu}{d\mu}$. Prove that
$\frac{d\nu}{d\sigma} = \frac{d\nu}{d\mu} \frac{d\mu}{d\sigma}$ if $\nu \ll \mu \ll \sigma$.
4. Refer to Problem 3 and prove that $\frac{d(\nu + \sigma)}{d\mu} = \frac{d\nu}{d\mu} + \frac{d\sigma}{d\mu}$ if $\nu \ll \mu$ and $\sigma \ll \mu$.
5. Refer to Problem 3 and prove that $\frac{d\mu}{d\nu} \frac{d\nu}{d\mu} = 1$ if $\nu \ll \mu \ll \nu$.
7. Let $X = [0,1]$, let $\mathcal{A}$ be the family of all Lebesgue measurable subsets of $X$, let $\nu$ be
Lebesgue measure, and let $\mu$ be counting measure. Show that $\nu \ll \mu$. Show that there
exists no function $h$ for which $\nu(A) = \int_A h \, d\mu$. Explain the apparent conflict with the
Radon-Nikodym Theorem.
8. Prove that in the Radon-Nikodym Theorem, $h(x) < \infty$ for all $x$. Show also that if
$\nu(X) < \infty$, then $h \in L^1(X,\mathcal{A},\mu)$.
9. Extension of Radon-Nikodym Theorem. Let $\mu$ and $\nu$ be measures on a measurable space
$(X,\mathcal{A})$. Suppose that there exists a disjoint family $\{B_\alpha\}$ of measurable sets having these
properties:
(i) $\mu(B_\alpha) < \infty$ for all $\alpha$.
is meaningless.
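Proof. For the first assertion, let $\mu_1$ and $\mu_2$ be measures, and suppose that $\mu_1$
is finite. Put $\mu = \mu_1 - \mu_2$. To see that $\mu$ is a signed measure, note first that $\mu$
does not assume the value $+\infty$. Next, we have $\mu(\emptyset) = 0$ since $\mu_1$ and $\mu_2$ have
this property. Finally, let $\{A_i\}$ be a disjoint sequence of measurable sets. Then
$$\begin{aligned} \mu\Bigl(\bigcup_{i=1}^{\infty} A_i\Bigr) &= \mu_1\Bigl(\bigcup_{i=1}^{\infty} A_i\Bigr) - \mu_2\Bigl(\bigcup_{i=1}^{\infty} A_i\Bigr) \\ &= \sum_{i=1}^{\infty} \mu_1(A_i) - \sum_{i=1}^{\infty} \mu_2(A_i) \\ &= \sum_{i=1}^{\infty} \mu(A_i) \end{aligned}$$
Notice that on the second line of this calculation the first sum is finite, although
the second may be infinite.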
For the other half of the proof, let $\mu$ be a signed measure that does not
assume the value $+\infty$. In an abuse of language, we say that a (measurable) set
S is positive if $\mu(A) \ge 0$ for all measurable subsets A in S. Define
$$\theta = \sup\{\mu(S) : S \text{ is a positive set}\}$$
Let $S_n$ be a sequence of positive sets such that $\mu(S_n) \uparrow \theta$, and define $P =
\bigcup_{n=1}^{\infty} S_n$. Let us prove that P is a positive set. If $A \subset P$, we write
Since $A_n \subset S_n$, we have $\mu(A_n) \ge 0$. Since A is the union of the disjoint family
$\{A_n\}$, it follows that
$$\mu(A) = \sum_{n=1}^{\infty} \mu(A_n) \ge 0$$
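As a finite illustration of positive sets (a sketch only; the four-point space and its point masses are hypothetical and not taken from the text), a signed measure on a finite set is determined by its point masses. A set is then positive exactly when each of its points carries nonnegative mass, and a positive set of largest measure can be found by brute force:

```python
from itertools import combinations

# Hypothetical signed point masses on X = {"a", "b", "c", "d"}.
mass = {"a": 2.0, "b": -1.5, "c": 0.0, "d": 3.0}

def mu(subset):
    """Signed measure of a subset: the sum of its point masses."""
    return sum(mass[x] for x in subset)

def subsets(s):
    """All subsets of the finite set s, as frozensets."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def is_positive(s):
    """S is positive if mu(A) >= 0 for every subset A of S."""
    return all(mu(a) >= 0 for a in subsets(s))

X = frozenset(mass)
positive_sets = [s for s in subsets(X) if is_positive(s)]
# A positive set on which mu attains its largest value over positive sets.
P = max(positive_sets, key=mu)
```

In this toy case every subset of the complement of P has measure at most zero, which is the finite analogue of the decomposition that the proof is building.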
Since P is a positive set,
measure. Define sets $A_1, A_2, \ldots$ as follows. Let $n_1$ be the first positive integer
such that there exists a set $A_1$ satisfying
Since $0 < \mu(A) = \mu(A \setminus A_1) + \mu(A_1)$, we see that $A \setminus A_1$ is a subset of A having
positive measure. It is therefore not a positive set. Hence there is a first positive
integer $n_2$ and a set $A_2$ such that
Continue in this manner, finding at the kth step a set $A_k$ and an integer $n_k$ such
that
By the symmetry in this situation, we can prove $\mu_1 \ge \nu_1$. Hence $\mu_1 = \nu_1$ and
$\mu_2 = \nu_2$. •
Proof. By the preceding theorem, there exist measures $\nu^+$ and $\nu^-$ such that
$\nu = \nu^+ - \nu^-$ and $\nu^+ \perp \nu^-$. Consequently, there exists a measurable set P for
which $\nu^+(X \setminus P) = 0 = \nu^-(P)$. If A is a measurable set satisfying $\mu(A) = 0$,
then $\mu(A \cap P) = 0$ and $\nu(A \cap P) = 0$, by the absolute continuity. Hence
It follows that $h_1$ and $h_2$ are finite almost everywhere. Thus, there is nothing
suspicious in the equation
1. Use the Jordan decomposition theorem to prove the Hahn decomposition theorem.
2. Prove that $\mu^+$ in Theorem 1 has the property
6. Let $\mu$ and $\nu$ be measures on the measurable space $(X, \mathcal{A})$. Suppose that 0 is the only
measurable function such that $\nu(A) \ge \int_A f\,d\mu$, for all $A \in \mathcal{A}$. Prove that $\nu \perp \mu$.
7. Let $(X, \mathcal{A}, \mu)$ be a measure space such that each singleton $\{x\}$ is measurable. Define
$\nu(A)$ to be the sum of all $\mu(\{x\})$ as x ranges over A. Does this define a measure on $\mathcal{A}$?
8. Is the function h in Theorem 2 unique?
Suppose that two measure spaces are given: $(X, \mathcal{A}, \mu)$ and $(Y, \mathcal{B}, \nu)$. Is there a
suitable way of making the Cartesian product $X \times Y$ into a measure space? In
particular, can this be done in such a way that
$$\int_{X \times Y} f(x,y) = \int_X \int_Y f(x,y)\,d\nu(y)\,d\mu(x)?$$
Section 8.10 Product Measures and Fubini's Theorem 421
$$E_x = \{y \in Y : (x, y) \in E\}$$
$$E^y = \{x \in X : (x, y) \in E\}$$
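To make the two kinds of sections concrete, here is a small sketch on finite sets. The sets X, Y, E and the helper names `x_section` and `y_section` are hypothetical, chosen only for illustration.

```python
# Hypothetical finite example: E is a subset of X x Y.
X = {1, 2, 3}
Y = {"p", "q"}
E = {(1, "p"), (1, "q"), (3, "p")}

def x_section(E, x):
    """E_x = {y : (x, y) in E}, a subset of Y."""
    return {y for (u, y) in E if u == x}

def y_section(E, y):
    """E^y = {x : (x, y) in E}, a subset of X."""
    return {x for (x, v) in E if v == y}

# For a rectangle A x B the section at height y is A when y is in B
# and empty otherwise.
A, B = {1, 2}, {"q"}
R = {(a, b) for a in A for b in B}
```

Here `y_section(R, "q")` returns A and `y_section(R, "p")` returns the empty set, which is the rectangle case used in the proof that follows.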
Proof. Define
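We shall prove that $\mathcal{M}$ is a $\sigma$-algebra containing all rectangles. From this it will
follow that $\mathcal{M} \supset \mathcal{A} \otimes \mathcal{B}$, since the latter is the smallest $\sigma$-algebra containing all
rectangles. Then, if $E \in \mathcal{A} \otimes \mathcal{B}$, we can conclude that $E \in \mathcal{M}$ and that $E^y \in \mathcal{A}$
for each y. Now consider any rectangle $E = A \times B$. If $y \in B$, then $E^y = A \in \mathcal{A}$.
If $y \notin B$, then $E^y = \emptyset \in \mathcal{A}$. Thus in all cases $E^y \in \mathcal{A}$ and $E \in \mathcal{M}$. Next, let E
be any member of $\mathcal{M}$. The equation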
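(1) $$\bigl[(X \times Y) \setminus E\bigr]^y = X \setminus E^y$$
(2) $$\Bigl[\bigcup_{i=1}^{\infty} E_i\Bigr]^y = \bigcup_{i=1}^{\infty} E_i^y$$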
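Proof. Let $\mathcal{C}$ be the collection referred to, and let E and F be members of $\mathcal{C}$.
Then E and F have expressions $E = \bigcup_{i=1}^{n} (A_i \times B_i)$ and $F = \bigcup_{j=1}^{m} (C_j \times D_j)$,
both being unions of disjoint families. Since
(3) $$E \cap F = \bigcup_{i=1}^{n} \bigcup_{j=1}^{m} \bigl[(A_i \cap C_j) \times (B_i \cap D_j)\bigr]$$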
422 Chapter 8 Measure and Integration
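we see that $E \cap F \in \mathcal{C}$, and that $\mathcal{C}$ is closed under the taking of intersections.
From the equation
$$(X \times Y) \setminus (A \times B) = \bigl[(X \setminus A) \times B\bigr] \cup \bigl[X \times (Y \setminus B)\bigr]$$
we get
$$(X \times Y) \setminus E = (X \times Y) \setminus \bigcup_{i=1}^{n} (A_i \times B_i) = \bigcap_{i=1}^{n} \bigl[(X \times Y) \setminus (A_i \times B_i)\bigr]$$
$$= \bigcap_{i=1}^{n} \bigl\{\bigl[(X \setminus A_i) \times B_i\bigr] \cup \bigl[X \times (Y \setminus B_i)\bigr]\bigr\}$$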
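The last line rests on the fact that the complement of a rectangle is the union of two disjoint rectangles. A quick finite check follows, as a sketch only; the sets X, Y, A, B are hypothetical.

```python
from itertools import product

# Hypothetical finite sets with A a subset of X and B a subset of Y.
X = {0, 1, 2}
Y = {"u", "v"}
A = {0, 1}
B = {"u"}

full = set(product(X, Y))   # X x Y
rect = set(product(A, B))   # A x B

lhs = full - rect                      # (X x Y) \ (A x B)
piece1 = set(product(X - A, B))        # (X \ A) x B
piece2 = set(product(X, Y - B))        # X x (Y \ B)
rhs = piece1 | piece2
```

Both sides agree, and the two pieces do not overlap, so the complement is again a finite disjoint union of rectangles.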
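Proof. Assume the hypothesis in (1), and define $B_n = A_n \setminus A_{n-1}$. The sequence
$\{B_n\}$ is disjoint, and consequently,
To establish (2), assume its hypothesis. Then $\{A_1 \setminus A_n\}$ is an increasing sequence,
and by part (1) we have
$$\mu\Bigl(A_1 \setminus \bigcap_{n=1}^{\infty} A_n\Bigr) = \mu\Bigl(\bigcup_{n=1}^{\infty} (A_1 \setminus A_n)\Bigr) = \lim_{n\to\infty} \mu(A_1 \setminus A_n) = \lim_{n\to\infty} \bigl(\mu(A_1) - \mu(A_n)\bigr)$$
$$= \mu(A_1) - \lim_{n\to\infty} \mu(A_n)$$ •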
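A brief worked example (not from the text) suggests why the subtraction in part (2) needs $\mu(A_1) < \infty$. Take $\mu$ to be counting measure on the positive integers and let $A_n$ be the tail starting at n. The sets decrease to the empty set, yet every $A_n$ has infinite measure:

```latex
A_n = \{n, n+1, n+2, \dots\}, \qquad
A_1 \supset A_2 \supset \cdots, \qquad
\bigcap_{n=1}^{\infty} A_n = \emptyset,
\\[4pt]
\mu\Bigl(\bigcap_{n=1}^{\infty} A_n\Bigr) = 0
\quad\text{while}\quad
\lim_{n\to\infty} \mu(A_n) = \infty .
```

So the conclusion for decreasing sequences can fail when no $A_n$ has finite measure.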
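Proof. Let $\mathcal{M}$ and $\mathcal{S}$ be respectively the monotone class and the $\sigma$-algebra
generated by $\mathcal{C}$. Since every $\sigma$-algebra is a monotone class, we have $\mathcal{M} \subset \mathcal{S}$.
The rest of the proof is devoted to showing that $\mathcal{M}$ is a $\sigma$-algebra (so that
$\mathcal{S} \subset \mathcal{M}$).
For any set F in the monotone class $\mathcal{M}$ we define
$$\mathcal{K}_F = \{A : \text{the sets } A \setminus F,\ F \setminus A, \text{ and } A \cup F \text{ belong to } \mathcal{M}\}$$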
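Assertion 1 $\mathcal{K}_F$ is a monotone class.
There are two properties to verify, one of which we leave to the reader. Suppose
that $A_i \in \mathcal{K}_F$ and $A_1 \subset A_2 \subset \cdots$ Let $A = \bigcup_{i=1}^{\infty} A_i$. Then $A_i \setminus F$, $F \setminus A_i$, and
$A_i \cup F$ all belong to $\mathcal{M}$ and form monotone sequences. Since $\mathcal{M}$ is a monotone
class, we have
$$A \setminus F = \bigcup_{i=1}^{\infty} (A_i \setminus F) \in \mathcal{M}$$
$$F \setminus A = \bigcap_{i=1}^{\infty} (F \setminus A_i) \in \mathcal{M}$$
$$A \cup F = \bigcup_{i=1}^{\infty} (A_i \cup F) \in \mathcal{M}$$
These calculations establish that $A \in \mathcal{K}_F$.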
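Assertion 2 If $F \in \mathcal{C}$, then $\mathcal{C} \subset \mathcal{K}_F$.
To prove this let E be any element of $\mathcal{C}$. Since $\mathcal{C}$ is an algebra, we have $E \setminus F$,
$F \setminus E$, and $E \cup F$ all belonging to $\mathcal{C}$ and to $\mathcal{M}$. By the definition of $\mathcal{K}_F$, $E \in \mathcal{K}_F$.
Assertion 3 If $F \in \mathcal{C}$, then $\mathcal{M} \subset \mathcal{K}_F$.
To prove this, note that $\mathcal{K}_F$ is a monotone class containing $\mathcal{C}$, by Assertions 1
and 2. Hence $\mathcal{K}_F \supset \mathcal{M}$, since $\mathcal{M}$ is the smallest monotone class containing $\mathcal{C}$.
Assertion 4 If $F \in \mathcal{C}$ and $E \in \mathcal{M}$, then $E \in \mathcal{K}_F$.
This is simply another way of expressing Assertion 3.
Assertion 5 If $F \in \mathcal{C}$ and $E \in \mathcal{M}$, then $F \in \mathcal{K}_E$.
This is true because the statement $E \in \mathcal{K}_F$ is logically equivalent to $F \in \mathcal{K}_E$.
Assertion 6 If $E \in \mathcal{M}$, then $\mathcal{C} \subset \mathcal{K}_E$.
This is a restatement of Assertion 5.
Assertion 7 If $E \in \mathcal{M}$, then $\mathcal{M} \subset \mathcal{K}_E$.
This follows from Assertions 6 and 1 because $\mathcal{K}_E$ is a monotone class containing
$\mathcal{C}$, while $\mathcal{M}$ is the smallest such monotone class.
Assertion 8 $\mathcal{M}$ is an algebra.
To prove this, let E and F be members of $\mathcal{M}$. Then $F \in \mathcal{K}_E$ by Assertion 7.
Hence $E \setminus F$, $F \setminus E$, and $E \cup F$ all belong to $\mathcal{M}$.
Assertion 9 $\mathcal{M}$ is a $\sigma$-algebra.
To prove this, let $A_i \in \mathcal{M}$ and define $B_n = A_1 \cup \cdots \cup A_n$. By Assertion 8, $\mathcal{M}$
is an algebra. Hence $B_n \in \mathcal{M}$ and $B_1 \subset B_2 \subset \cdots$ Since $\mathcal{M}$ is a monotone class,
$\bigcup_{n=1}^{\infty} B_n \in \mathcal{M}$. It follows that $\bigcup_{n=1}^{\infty} A_n \in \mathcal{M}$. •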
We can carry out the same argument for $\nu(E_x)$ to see that $E \in \mathcal{M}$.
In the second part of the proof, let $\mathcal{C}$ denote the class of all sets in $\mathcal{A} \otimes \mathcal{B}$
that are unions of finite disjoint families of rectangles. By Lemma 2, $\mathcal{C}$ is an
algebra. We shall prove that $\mathcal{C} \subset \mathcal{M}$. Let $E \in \mathcal{C}$. Then $E = \bigcup_{i=1}^{n} E_i$ where
$E_1, \ldots, E_n$ is a disjoint set of rectangles. Hence
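exist $A_n \in \mathcal{A}$ and $B_n \in \mathcal{B}$ such that $X = \bigcup_{n=1}^{\infty} A_n$, $Y = \bigcup_{n=1}^{\infty} B_n$, $\mu(A_n) < \infty$,
and $\nu(B_n) < \infty$. We may suppose further that $A_1 \subset A_2 \subset \cdots$ and that $B_1 \subset
B_2 \subset \cdots$ Let $\{E_n\}$ be a decreasing sequence of sets in $\mathcal{M}$, and set $E = \bigcap_{n=1}^{\infty} E_n$.
We want to prove that $E \in \mathcal{M}$. Since $E = \bigcup_{n=1}^{\infty} [E \cap (A_n \times B_n)]$ and since $\mathcal{M}$ is
closed under "increasing unions," it suffices to prove that $E \cap (A_n \times B_n) \in \mathcal{M}$
for each n. We therefore define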
This measure $\phi$ is called the product measure of $\mu$ and $\nu$. It is often denoted
by $\mu \otimes \nu$.
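On finite spaces the product measure can be computed directly. The sketch below assumes that $\phi(E)$ is obtained by integrating $\mu(E^y)$ against $\nu$, which becomes a finite sum here; the point masses and the function names are hypothetical.

```python
# Hypothetical point masses defining finite measures mu on X and nu on Y.
mu_mass = {1: 0.5, 2: 1.5, 3: 2.0}
nu_mass = {"p": 1.0, "q": 3.0}

def mu(A):
    """mu(A) as a sum of point masses."""
    return sum(mu_mass[x] for x in A)

def nu(B):
    """nu(B) as a sum of point masses."""
    return sum(nu_mass[y] for y in B)

def phi(E):
    """Integral over Y of mu(E^y) with respect to nu (a finite sum)."""
    return sum(mu({x for (x, v) in E if v == y}) * nu_mass[y]
               for y in nu_mass)

def phi_swapped(E):
    """The same quantity computed from the other sections, nu(E_x)."""
    return sum(nu({y for (u, y) in E if u == x}) * mu_mass[x]
               for x in mu_mass)

A, B = {1, 3}, {"q"}
rectangle = {(x, y) for x in A for y in B}
E = {(1, "p"), (2, "q"), (3, "q")}
```

On the rectangle the value is $\mu(A)\,\nu(B)$, and for the non-rectangular set E the two orders of integration give the same number.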
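Proof. It is clear that the set function $\phi$ has the property $\phi(\emptyset) = 0$ and the
property $\phi(E) \ge 0$. If $\{E_i\}$ is a disjoint sequence of sets in $\mathcal{A} \otimes \mathcal{B}$, then $\{E_i^y\}$ is
a disjoint sequence in $\mathcal{A}$. Hence, by the Dominated Convergence Theorem,
$$\phi\Bigl(\bigcup_{i=1}^{\infty} E_i\Bigr) = \int_Y \mu\Bigl(\Bigl(\bigcup_{i=1}^{\infty} E_i\Bigr)^y\Bigr)\,d\nu(y) = \int_Y \mu\Bigl(\bigcup_{i=1}^{\infty} E_i^y\Bigr)\,d\nu(y)$$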
the preceding lemma asserts that (3) is also true in this case. Part (4) is true
by symmetry. For part (5), write
The other equality is similar. Thus, Theorem 2 is true when f is the character-
istic function of a measurable set.
If f is a simple function, then f has properties (1) to (5) by the linearity of
the integrals.
(ii) $\mu(\bigcup A_i) = \sum_{i=1}^{\infty} \mu(A_i)$ for any disjoint sequence of measurable sets $A_i$.
$$|\mu|(A) = \sup \sum_{i=1}^{\infty} |\mu(A_i)|$$
where the supremum is over all partitions of A into a disjoint sequence of measurable
sets. It is clear that $|\mu(A)| \le |\mu|(A)$, because {A} is a competing partition
of A. The theory goes on to establish that $|\mu|$ is an ordinary (i.e., nonnegative)
measure and $|\mu|(X) < \infty$. This feature distinguishes the theory of complex or
signed measures from the traditional nonnegative measures. References: [DS],
[Roy], [Ru3], [HS], [Berb3], [Berb4].
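For a complex measure on a finite set, the supremum over partitions can be checked by brute force. In the sketch below the point masses are hypothetical, and finite partitions stand in for the disjoint sequences of the definition.

```python
# Hypothetical complex point masses on a three-point set.
mass = {"a": 3 + 4j, "b": -2.0, "c": -1j}

def mu(block):
    """Complex measure of a block: the sum of its point masses."""
    return sum(mass[x] for x in block)

def partitions(items):
    """Yield every partition of the list `items` into nonempty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        # Put `first` into each existing block in turn ...
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        # ... or give it a block of its own.
        yield [[first]] + part

def total_variation(A):
    """Largest value of the sum of |mu(block)| over partitions of A."""
    return max(sum(abs(mu(b)) for b in part)
               for part in partitions(sorted(A)))

A = {"a", "b", "c"}
```

Here the maximum is attained by the partition into singletons, giving 5 + 2 + 1 = 8, while $|\mu(A)| = |1 + 3i|$ is strictly smaller.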
The Fubini Theorem in this new setting is as follows:
Problems 8.10
430 References
[GZ2] Garcia, C. B. and W. I. Zangwill, "Finding all solutions to polynomial systems and
other systems of equations," Math. Programming 16 (1979), 159-176.
[GZ3] Garcia, C. B. and W. I. Zangwill, Pathways to Solutions, Fixed Points and Equilibria,
Prentice Hall, Englewood Cliffs, N.J., 1981.
[GF] Gelfand, I. M. and S. V. Fomin, Calculus of Variations, Prentice-Hall, Englewood Cliffs,
N.J., 1963.
[GV] Gelfand, I. M. and N. Ya. Vilenkin, Generalized Functions, 4 volumes, Academic Press,
1964. (Vol. 1 is by Gelfand and G.E. Shilov.)
[Go] Gödel, K., The Consistency of the Axiom of Choice and of the Generalized Continuum
Hypothesis with the Axioms of Set Theory, Princeton University Press, 1940.
[GP] Goffman, C. and G. Pedrick, First Course in Functional Analysis, Chelsea Publishing
Co., New York. Reprint, American Mathematical Society.
[Gol] Goldberg, R. R., Fourier Transforms, Cambridge University Press, 1970.
[Gold] Goldstein, A. A., Constructive Real Analysis, Harper and Row, New York, 1967.
[Gr] Graves, L. M., The Theory of Functions of Real Variables, McGraw-Hill, New York, 1946.
[Green] Green, G., Mathematical Papers of George Green, edited by N.M. Ferrers, Amer.
Math. Soc., Providence, RI, 1970.
[Gre] Greenberg, M. D., Foundations of Applied Mathematics, Prentice Hall, Englewood Cliffs,
NJ, 1978.
[Gri] Griffel, D. H., Applied Functional Analysis, John Wiley, New York, 1981.
[Gro] Groetsch, C. W., Elements of Applicable Functional Analysis, Marcel Dekker, New York,
1980.
[Hal1] Halmos, P. R., "What does the spectral theorem say?," Amer. Math. Monthly 70
(1963), 241-247.
[Hal2] Halmos, P. R., A Hilbert Space Problem Book, Van Nostrand, Princeton, 1967.
[Hal3] Halmos, P. R., Introduction to Hilbert Space, Chelsea Publishing Co., New York, 1951.
[Hal4] Halmos, P. R., Measure Theory, Van Nostrand, New York, 1950. Reprint, Springer-
Verlag, New York.
[Hel] Helson, H., Harmonic Analysis, Addison-Wesley, London, 1983.
[Hen] Henrici, P., Discrete Variable Methods in Ordinary Differential Equations, Wiley, New
York, 1962.
[Hes1] Hestenes, M. R., Calculus of Variations and Optimal Control Theory, Wiley, New York,
1965.
[Hes2] Hestenes, M. R., "Elements of the Calculus of Variations," pp. 59-91 in Modern Mathematics
for the Engineer, E. F. Beckenbach, ed., McGraw-Hill, New York, 1956.
[HS] Hewitt, E. and K. Stromberg, Real and Abstract Analysis, Springer-Verlag, New York,
1965.
[HP] Hille, E. and R. S. Phillips, Functional Analysis and Semigroups, Amer. Math. Soc.,
Providence, RI 1957.
[HS] Hirsch, M. W. and S. Smale, "On algorithms for solving f(x) = 0," Comm. Pure Appl.
Math. 32 (1979), 281-312.
[Hol] Holmes, R. B., Geometric Functional Analysis and its Applications, Springer-Verlag,
New York, 1975.
[Ho] Hörmander, L., The Analysis of Linear Partial Differential Operators I, Springer-Verlag,
Berlin, 1983.
[Horv] Horvath, J., Topological Vector Spaces and Distributions, Addison-Wesley, London,
1966.
[Hu] Huet, D., Distributions and Sobolev Spaces, Lecture Note #6, Department of Mathemat-
ics, University of Maryland, 1970.
[Hur] Hurley, J. F., Multivariate Calculus, Saunders, Philadelphia, 1981.
[In] Ince, E. L., Ordinary Differential Equations, Longmans Green, London, 1926. Reprint,
Dover Publications, New York, 1948.
[IK] Isaacson, E. and H. B. Keller, Analysis of Numerical Methods, Wiley, New York, 1966.
[Ja1] James, R., "Weak compactness and reflexivity," Israel J. Math. 2 (1964), 101-119.
[Ja2] James, R., "A non-reflexive Banach space isometric with its second conjugate space,"
Proc. Nat. Acad. Sci. U.S.A. 37 (1951), 174-177.
[Jam] Jameson, G. J. O., Topology and Normed Spaces, Chapman and Hall, London, 1974.
[JKP] Jaworowski, J., W. A. Kirk, and S. Park, Antipodal Points and Fixed Points, Lecture
Note Series, Number 28, Seoul National University, Seoul 1995.
[Jon] Jones, D. S., The Theory of Generalised Functions, McGraw-Hill, 1966. 2nd. Edition,
Cambridge University Press, 1982.
[Jo] Jones, F., Lebesgue Integration on Euclidean Space, Jones and Bartlett, Boston, 1993.
[JLJ] Jost, J., and X. Li-Jost, Calculus of Variations, Cambridge University Press, 1999.
[KA] Kantorovich, L. V. and G. P. Akilov, Functional Analysis in Normed Spaces, Pergamon
Press, London, 1964.
[KK] Kantorovich, L. V. and V. I. Krylov, Approximate Methods of Higher Mathematics,
Interscience, New York, 1964.
[Kar] Karmarkar, N., "A new polynomial-time algorithm for linear programming," Combina-
torica 4 (1984), 373-395.
[Kat] Katznelson, Y., An Introduction to Harmonic Analysis, Wiley, New York, 1968. Reprint,
Dover Publications, New York.
[Kee] Keener, J. P., Principles of Applied Mathematics, Addison-Wesley, New York, 1988.
[Kel] Kelley, J. L., General Topology, D. Van Nostrand, New York, 1955. Reprint, Springer-
Verlag, New York.
[KN] Kelley, J. L., I. Namioka, et al., Linear Topological Spaces, D. Van Nostrand, New York,
1963.
[KS] Kelley, J. L. and T. P. Srinivasan, Measure and Integral, Springer-Verlag, New York,
1988.
[Kello] Kellogg, O. D., Foundations of Potential Theory, Dover, New York.
[Ken] Keener, J. P., Principles of Applied Mathematics, Perseus Books Group, Boulder, CO,
1999.
[KC] Kincaid, D. and W. Cheney, Numerical Analysis, 3rd ed., Brooks/Cole, Pacific Grove,
CA., 2001.
[KF] Kolmogorov, A. N. and S. V. Fomin, Introductory Real Analysis, Dover Publications,
New York, 1975.
[Ko] Körner, T. W., Fourier Analysis, Cambridge University Press, 1988.
[Kras] Krasnoselski, M.A., Topological Methods in the Theory of Nonlinear Integral Equations,
Pergamon, New York, 1964.
[Kr] Kress, R., Linear Integral Equations, Springer-Verlag, Berlin, 1989. 2nd edition, 1999.
[Kre] Kreyszig, E., Introductory Functional Analysis with Applications, Wiley, New York, 1978.
[KRN] Kuratowski, K. and C. Ryll-Nardzewski, "A general theorem on selectors", Bull. Acad.
Polonaise Sciences, Serie des Sciences Math. Astr. Phys. 13 (1965), 397-403.
[Lanc] Lanczos, C., Applied Mathematics, Dover Publications, New York, 1988.
[Lan1] Lang, S., Analysis II, Addison-Wesley, London, 1969.
[Lan2] Lang, S., Introduction to Differentiable Manifolds, Interscience, New York, 1962.
[Las] Lass, H., Vector and Tensor Analysis, McGraw Hill, New York, 1950.
[Lax] Lax, P. D., "Change of variables in multiple integrals," Amer. Math. Monthly 106
(1999), 497-501.
[LSU] Lebedev, N. N., I. P. Skalskaya, and Y. S. Uflyand, Worked Problems in Applied Mathematics,
Reprint, Dover Publications, New York, 1979.
[Leis] Leis, R., Initial Boundary Value Problems in Mathematical Physics, Wiley, New York,
1986.
[Li] Li, T. Y., "Solving polynomial systems," Math. Intelligencer 9 (1987), 33-39.
[LL] Lieb, E. H. and M. Loss, Analysis, Amer. Math. Soc., Providence, 1997.
[LT] Lindenstrauss, J. and L. Tzafriri, Classical Banach Spaces I, Springer-Verlag, Berlin.
[LM] Lions, J.L. and E. Magenes, Nonhomogeneous Boundary Value Problems and Applica-
tions, Springer-Verlag, New York, 1972.
[Lo] Logan, J. D., Applied Mathematics: A Contemporary Approach, Wiley, New York, 1987.
[Lov] Lovitt, W. V., Linear Integral Equations, McGraw-Hill, New York, 1924. Reprint, Dover
Publications, New York, 1950.
[Loo] Loomis, L. H., An Introduction to Abstract Harmonic Analysis, Van Nostrand, New
York, 1953.
[Lue1] Luenberger, D. G., Introduction to Linear and Nonlinear Programming, Addison-Wesley,
London, 1965.
[Lue2] Luenberger, D. G., Optimization by Vector Space Methods, Wiley, New York, 1969.
[MT] Marsden, J. E. and A. J. Tromba, Vector Calculus (2nd ed.), W.H. Freeman, San Francisco,
1981.
[Mar] Martin, J. B., Plasticity: Fundamentals and General Results, MIT Press, Cambridge,
MA, 1975.
[Mas] Mason, J., Methods of Functional Analysis for Applications in Solid Mechanics, Elsevier,
Amsterdam, 1985.
[Maz] Mazja, V. G., Sobolev Spaces, Springer-Verlag, Berlin, 1985.
[McK] McKinsey, J. C. C., Introduction to the Theory of Games, McGraw-Hill, New York,
1952.
[Mey] Meyer, G. H., "On solving nonlinear equations with a one-parameter operator embed-
ding," SIAM J. Numer. Analysis 5 (1968), 739-752.
[Mich1] Michael, E., "Continuous Selections," Ann. Math. 63 (1956), 361-382.
[Mich2] Michael, E., "Selected Selection Theorems," Amer. Math. Monthly 63 (1956), 233-
238.
[Mil] Milne, W. E., Numerical Solution of Differential Equations, Dover, New York.
[Moo] Moore, R. E., Computational Functional Analysis, Wiley, New York, 1985.
[Mor1] Morgan, A., "A homotopy for solving polynomial systems," Applied Math. and Comp.
18 (1986), 87-92.
[Mor2] Morgan, A., Solving Polynomial Systems Using Continuation for Engineering and Scientific
Problems, Prentice Hall, Englewood Cliffs, N.J., 1987.
[Morr] Morris, P., Introduction to Game Theory, Springer-Verlag, New York, 1994.
[NaSn] Naylor, A.W. and G.R. Snell, Linear Operator Theory in Engineering and Science,
Springer-Verlag, New York, 1982.
[Naz1] Nazareth, J. L., "Homotopy techniques in linear programming," Algorithmica 1 (1986),
529-535.
[Naz2] Nazareth, J. L., "The implementation of linear programming algorithms based on homotopies,"
Algorithmica 15 (1996), 332-350.
[Nel] Nelson, E., Topics in Dynamics, Vol 1: Flows, Princeton University Press (1969).
[NSS] Nickerson, H. K., D. C. Spencer, and N. E. Steenrod, Advanced Calculus, van Nostrand,
New York, 1959.
[OD] Oden, J. T. and L. F. Demkowicz, Applied Functional Analysis, CRC Press, New York,
1996.
[OdR] Oden, J. T. and J. N. Reddy, An Introduction to the Mathematical Theory of Finite
Elements, Wiley, New York, 1976.
[Ol] Olver, F. W. J., Asymptotics and Special Functions, Academic Press, New York, 1974.
[OR] Ortega, J. M. and W. C. Rheinboldt, Iterative Solution of Nonlinear Equations in Several
Variables, Academic Press, New York, 1970.
[Par] Park, Sehie, "Eighty years of the Brouwer fixed point theorem," in Antipodal Points and
Fixed Points by J. Jaworowski, W. A. Kirk, and S. Park. Lecture Notes Series, No. 28, Seoul
National University, 1995. 55-97.
[Part] Parthasarathy, T., Selection Theorems and Their Applications, Lecture Notes in Math.
No. 263, Springer-Verlag, New York, 1972.
[Ped] Pedersen, M., Functional Analysis in Applied Mathematics and Engineering, CRC, Boca
Raton, FL, 1999.
[Pet] Petrovskii, I. G., Lectures on the Theory of Integral Equations, Graylock Press, Rochester,
NY, 1957.
[PM] Polyanin, A. and A. V. Manzhirov, Handbook of Integral Equations, CRC Press, Boca
Raton, FL, 1998.
[PBGM] Pontryagin, L. S., V. G. Boltyanskii, R. V. Gamkrelidze, E. F. Mishchenko, The
Mathematical Theory of Optimal Processes, Interscience Publ., New York, 1962.
[Pry] Pryce, J. D., Numerical Solution of Sturm-Liouville Problems, Oxford University Press,
1993.
[Red] Reddy, J. N., Applied Functional Analysis and Variational Methods in Engineering,
McGraw-Hill, New York, 1986.
[RS] Reed, M. and B. Simon, Methods of Modern Mathematical Physics, Vol. I, Academic
Press, New York, 1980.
[Rh1] Rheinboldt, W. C., Numerical Analysis of Parameterized Nonlinear Equations, Wiley,
New York, 1986.
[Rh2] Rheinboldt, W. C., "Solution fields of nonlinear equations and continuation methods,"
SIAM J. Numer. Analysis 17 (1980), 221-237.
[Ri] Richtmyer, R. D., Principles of Advanced Mathematical Physics, 2 volumes, Springer-
Verlag, New York, 1978.
[RN] Riesz, F. and B. Sz.-Nagy, Functional Analysis, Frederick Ungar, 1955. Reprint, Dover
Publications, New York, 1991.
[Rie] Riesz, T., Perturbation Theory for Linear Operators, Springer-Verlag, New York, 1966.
[Ro] Roach, G. F., Green's Functions, 2nd ed., Cambridge University Press, 1982.
[Ros] Rosenbloom, P. C., "The method of steepest descent," in Numerical Analysis, J. H.
Curtiss, ed., Symposia in Applied Math., vol. VI, 1956, 127-176.
[Roy] Royden, H. L., Real Analysis, Macmillan, New York, 1968.
[Rub] Rubin, H. and J. E. Rubin, Equivalents of the Axiom of Choice, North Holland Publ.
Co., Amsterdam, 1985.
[Ru1] Rudin, W., Functional Analysis, McGraw-Hill, New York, 1973.
[Ru2] Rudin, W., Fourier Analysis on Groups, Interscience, New York, 1963.
[Ru3] Rudin, W., Real and Complex Analysis, 2nd ed., McGraw-Hill, New York, 1974.
[Sa] Saaty, T. L., Modern Nonlinear Equations, McGraw-Hill, New York, 1967. Reprint, Dover
Publications, New York, 1981.
[Sag] Sagan, H., Introduction to the Calculus of Variations, McGraw-Hill Book Co., 1969.
Reprint, Dover Publications, 1992.
[San] Sansone, G., Orthogonal Functions, Interscience, New York, 1959.
[Schu] Schur, I., "Über lineare Transformationen in der Theorie der unendlichen Reihen," J.
Reine Angew. Math. 151 (1920), 79-111.
[Schj] Schwartz, J. T., Non-Linear Functional Analysis, Gordon and Breach, New York, 1969.
[Schl] Schwartz, L., Mathematics for the Physical Sciences, Addison-Wesley, London, 1966.
[Schl2] Schwartz, L., Théorie des Distributions, I, II, Hermann et Cie, Paris, 1951.
[Sem] Semadeni, Z., Schauder Bases in Banach Spaces of Continuous Functions, Lecture
Notes in Mathematics, vol. 918, Springer-Verlag, New York, 1982.
[Sho] Showalter, R. E., Hilbert Space Methods for Partial Differential Equations, Pitman,
London, 1977. (Available on-line from http://ejde.math.swt.edu//mono-toc.html.)
[Sim] Simmons, G. F., Introduction to Topology and Modern Analysis, McGraw-Hill, 1963.
[Sing] Singer, I., Bases in Banach Spaces (2 volumes), Springer-Verlag, Berlin, 1970, 1981.
[Sm] Smale, S., "Algorithms for solving equations," Proceedings of the International Congress
of Mathematicians, 1986.
[Sma] Smart, D. R., Fixed Point Theorems, Cambridge University Press, 1974.
[Smi] Smith, K. T., Primer of Modern Analysis, Bogden and Quigley, Belmont, CA, 1971.
Springer-Verlag, Berlin, 1983.
[So] Sobolev, S. L., Applications of Functional Analysis in Mathematical Physics, Amer. Math.
Soc. Translations Series, 1963.
[Sta] Stakgold, I., Green's Functions and Boundary Value Problems, Wiley, New York, 1979.
[SW] Stein, E. M. and G. Weiss, Introduction to Fourier Analysis on Euclidean Spaces, Prince-
ton University Press, 1971.
[StWi] Stoer, J. and C. Witzgall, Convexity and Optimization in Finite Dimensions, Springer-
Verlag, New York, 1970.
[Str1] Strang, G., Linear Algebra and Its Applications, 3rd ed., Harcourt Brace Jovanovich,
San Diego, 1988.
[Str2] Strang, G., Introduction to Applied Mathematics, Wellesley-Cambridge, Wellesley, MA,
1986.
[Sz] Szegö, G., Orthogonal Polynomials, American Mathematical Society Colloquium Publications,
vol. 23, 1959.
[Tay1] Taylor, A. E., Advanced Calculus, Ginn, New York, 1955.
[Tay2] Taylor, A. E., Introduction to Functional Analysis, Wiley, New York, 1958. Reprint,
Dover Publications.
[Tay3] Taylor, A. E., General Theory of Functions and Integration, Blaisdell, New York, 1965.
Reprint, Dover Publications, New York.
[Ti1] Titchmarsh, E. C., Introduction to the Theory of Fourier Integrals, Oxford University
Press, 1937. Reprinted by Chelsea Publ. Co., New York, 1986.
[Ti2] Titchmarsh, E. C., The Theory of Functions, Oxford University Press, 1939.
[Tod] Todd, M. J., "An introduction to piecewise linear homotopy algorithms for solving
systems of equations" in Topics in Numerical Analysis, P. R. Turner, ed., Lecture Notes in
Mathematics, vol. 965, Springer-Verlag, New York, 1982, 147-202.
[Tri] Tricomi, F. G., Integral Equations, Interscience, New York, 1957. Reprint, Dover Publi-
cations, New York, 1985.
[Vic] Vick, J. W., Homology Theory, Academic Press, New York, 1973.
[Wac] Wacker, H. G., ed., Continuation Methods, Academic Press, New York, 1978.
[Wag] Wagner, D. H., "Survey of measurable selection theorems: an update," in Measure
Theory Oberwolfach 1979, D. Kölzow, ed., Lecture Notes in Mathematics, vol. 794, Springer-
Verlag, Berlin, 1980.
[Wal] Walter, G. G., Wavelets and Other Orthogonal Systems with Applications, CRC Press,
Boca Raton, FL, 1994.
[Was] Wasserstrom, E., "Numerical solutions by the continuation method," SIAM Review 15
(1973), 89-119.
[Wat] Watson, L. T., "A globally convergent algorithm for computing fixed points of $C^2$ maps,"
Appl. Math. Comput. 5 (1979), 297-311.
[Wein] Weinstock, R., Calculus of Variations, with Applications to Physics and Engineering,
McGraw-Hill, New York, 1952. Reprint, Dover Publications 1974.
[West] Westfall, R. S., Never at Rest: A Biography of Isaac Newton, Cambridge University
Press, 1980.
[Whi] Whitehead, G. W., Homotopy Theory, MIT Press, Cambridge, Massachusetts, 1966.
[Wid1] Widder, D. V., Advanced Calculus, 2nd ed., Prentice-Hall, Englewood Cliffs, NJ, 1961.
Reprint, Dover Publications, New York.
[Wie] Wiener, N., The Fourier Integral and Certain of Its Applications, Cambridge University
Press, Cambridge, 1933. Reprint, Dover Publications, New York, 1958.
[Wilf] Wilf, H. S., Mathematics for the Physical Sciences, Dover Publications, New York, 1978.
[Will] Williamson, J. H., Lebesgue Integration, Holt, Rinehart and Winston, New York, 1962.
[Yo] Yosida, K., Functional Analysis, 4th ed., Springer-Verlag, Berlin, 1974.
[Youl] Young, L. C., Lectures on the Calculus of Variations and Optimal Control Theory,
Chelsea Publishing Co., 1980.
[Youn] Young, N., An Introduction to Hilbert Space, Oxford University Press, 1988.
[Ze] Zeidler, E., Applied Functional Analysis, Springer-Verlag, New York, 1995.
[Zem] Zemanian, A. H., Distribution Theory and Transform Analysis, Dover Publications,
New York, 1987.
[Zie] Ziemer, W. P., Weakly Differentiable Functions, Springer, New York, 1989.
[Zien] Zienkiewicz, O. C. and K. Morgan, Finite Elements and Approximation, Wiley, New
York, 1983.
[Zy] Zygmund, A., Trigonometric Series, 2nd ed., Cambridge University Press, 1959.
Index
A-orthogonal, 233
A-orthonormal, 233
Absolute continuity, 413
Absolutely convergent, 14, 17
Accumulation point, 12
Adjoint of an operator, 50, 82-83
Adjoint space, 34
Affine map, 120
Alaoglu Theorem, 370
Alexander's Theorem, 365
Algebra of sets, 421
Almost everywhere, 396
Almost periodic functions, 76, 77
Almost uniformly, 397
Angle between vectors, 67
Annihilator, 36
Approximate inverse, 188
Arzela-Ascoli Theorems, 347ff
Autocorrelation, 293
Axiom of Choice, 31
Babuska-Lax-Milgram Theorem, 201
Baire Theorem, 40
Banach limits, 37
Banach space, 10
Banach-Alaoglu Theorem, 370
Banach-Steinhaus Theorem, 41
Bartle-Graves Theorem, 342
Base for a topology, 362
Base, 5
Basin of attraction, 135
Bernoulli, J., 153
Bessel functions, 179
Bessel's Inequality, 72
Best approximation, 192
Bilinear functional, 201
Binomial Theorem, 262
Binomial coefficients, 261
Biorthogonal System, 82, 192
Bohl's Theorem, 339
Borel Sigma-algebra, 384
Borel sets, 392
Bounded above, 6
Bounded functional, 81
Bounded map, 25
Bounded set, 20, 368
Brachistochrone Problem, 153, 157ff
Brouwer's Theorem, 333
Calculus of Variations, 152
Canonical embedding, 58
Cantor set, 46
Caratheodory's Theorem, 387
Category argument, 45, 46, 47, 48
Category, 41
Catenary, 153, 156, 169
Cauchy sequence, 10
Cauchy-Riemann equations, 199
Cauchy-Schwarz Inequality, 62
Cesaro means, 13
Chain Rule, 121
Chain, 31
Characteristic function of a set, 395
Characters, 288
Chebyshev polynomials, 214
Closed Graph Theorem, 49
Closed Range Theorem, 50
Closed graph, 47
Closed mapping, 47
Closed set, 16
Closure of a set, 16, 363
Cluster point, 12
Collocation methods, 213ff
Compact operator, 85, 351
Compact set, 8
Compactness in the weak topologies, 369
Compactness, 19, 20, 364
Complete measure space, 387
Completeness, 9, 10, 15, 21
Completion of a space, 15, 60
Composition operator, 252
Condensation of singularities, 46
Conjugate direction methods, 232
Conjugate gradient method, 235