
MATH 175 NOTES

MOOR XU
NOTES FROM A COURSE BY LEON SIMON

Abstract. These notes were taken during Math 175 (Functional Analysis) taught by Leon
Simon in Spring 2011 at Stanford University. They were live-TEXed during lectures in vim
and compiled using latexmk. Each lecture gets its own section. The notes are not edited
afterward, so there may be typos; please email corrections to [email protected].

1. 3/28
You’ve probably got a vague notion of what functional analysis is about. It is the study
of continuous linear operators on infinite dimensional spaces. This is interesting and has
applications.

1.1. Review of vector spaces. Recall the basics of vector spaces.


Definition 1.1. A linear space or vector space X has two operations: addition and multi-
plication. For u, v ∈ X and a scalar λ, we can define u + v and λu. For this course, scalars
will be in either R or C, and X will be called either a real vector space or a complex vector
space.
There are eight vector space axioms.
(1) u + v = v + u.
(2) u + (v + w) = (u + v) + w.
(3) There exists 0 such that u + 0 = u for all u ∈ X.
(4) If u ∈ X then there exists −u ∈ X with u + (−u) = 0. (We usually write u − v
instead of the more cumbersome u + (−v).)
(5) λ(µu) = (λµ)u for all u ∈ X and scalars λ, µ.
(6) λ(u + v) = λu + λv.
(7) (λ + µ)u = λu + µu.
(8) 1u = u for all u ∈ X.
Recall the concepts of linear dependence, linear independence, and span.
Definition 1.2. v1 , . . . , vn ∈ X are linearly dependent if there exist scalars c1 , . . . , cn not all
zero with c1 v1 + · · · + cn vn = 0. Linearly independent means “not linearly dependent.”
Definition 1.3. If A ⊆ X is a nonempty set,
span A = {c_1 v_1 + · · · + c_n v_n : n ≥ 1, v_1, . . . , v_n ∈ A, c_1, . . . , c_n scalars}.
Definition 1.4. W ⊂ X is a subspace of X if 0 ∈ W and W is closed under addition and
multiplication by scalars.
Definition 1.5. X is finite dimensional if there exist finitely many vectors v_1, . . . , v_n ∈ X
with X = span{v_1, . . . , v_n}. If in addition v_1, . . . , v_n are linearly independent, they are called a basis of X.
If X is not finite dimensional, we say that X is infinite dimensional.
Example 1.6. Rn = {x = (x1 , . . . , xn ) | xi ∈ R} with addition defined as x + y = (x1 +
y1 , . . . , xn + yn ) and multiplication by scalars defined as λx = (λx1 , . . . , λxn ) for λ ∈ R. This
is a real vector space with the standard basis vectors.
Example 1.7. In the same way, we can define Cn as a complex vector space. This can also
be viewed as a real vector space.
Viewed as a complex vector space, we clearly have dim Cn = n. Viewed as a real vector
space, we have dim Cn = 2n.
We consider an example that is more relevant to functional analysis.
Example 1.8. Define
ℓ²(R) = { x = {x_j}_{j=1,2,...} : x_j ∈ R, ∑_{j=1}^∞ x_j² < ∞ }

with addition and multiplication by scalars defined componentwise. Similarly, define
ℓ²(C) = { z = {z_j}_{j=1,2,...} : z_j ∈ C, ∑_{j=1}^∞ |z_j|² < ∞ }
with the same operations. We should also check that these are vector spaces; this is easy to
do.
Example 1.9. Define C(R) = {continuous functions R → R} with the natural operations
(f + g)(x) = f (x) + g(x) and (λf )(x) = λf (x). This is a real vector space. Similarly, we can
define C(C) = {continuous functions C → C} with the same operations; this is a complex
vector space.
Notice that the set of real polynomials {p(x) = a0 + a1 x + · · · + an xn } is a subspace of
C(R).
1.2. Inner product spaces.
Definition 1.10. A complex inner product space is a complex vector space X with an inner
product, denoted (u, v) ∈ C. The inner product is a map (·, ·) : X × X → C with the
properties
(1) (u, v) = \overline{(v, u)} (conjugate symmetry);
(2) (λu + µv, w) = λ(u, w) + µ(v, w) (linear in the first component);
(3) (u, u) is real and positive for u ≠ 0.
A real inner product is defined analogously, with C replaced by R (and no conjugation).
Remark. Note that we have (λu, v) = λ(u, v), but (u, λv) = λ̄(u, v). Also, check that
(u, v + w) = (u, v) + (u, w).
Example 1.11. X = Rn with (real) inner product (x, y) = x · y defined as the dot product.
Example 1.12. X = Cⁿ with (complex) inner product (z, w) = ∑_{j=1}^n z_j w̄_j.
Example 1.13. For X = ℓ²(R), we can define the inner product as (x, y) = ∑_{j=1}^∞ x_j y_j.
Similarly, for X = ℓ²(C), we can define the inner product as (z, w) = ∑_{j=1}^∞ z_j w̄_j.
We should check that these series converge absolutely. This is an easy exercise.
Definition 1.14. Define the inner product norm or length of u to be ‖u‖ = √(u, u).
We can now derive some properties. We begin with a basic identity.
Proposition 1.15. (u + v, u + v) = (u, u) + (u, v) + (v, u) + (v, v). Therefore ‖u + v‖² =
‖u‖² + (u, v) + \overline{(u, v)} + ‖v‖², and hence
‖u + v‖² = ‖u‖² + 2 Re(u, v) + ‖v‖².
Similarly, we have
‖u − v‖² = ‖u‖² − 2 Re(u, v) + ‖v‖².
Adding these gives the “parallelogram identity”
‖u + v‖² + ‖u − v‖² = 2(‖u‖² + ‖v‖²).
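As a quick numerical sanity check (an added Python/NumPy sketch, not part of the lecture), we can verify the expansion of ‖u + v‖² and the parallelogram identity for random vectors in Cⁿ:

```python
import numpy as np

rng = np.random.default_rng(0)
u = rng.normal(size=5) + 1j * rng.normal(size=5)
v = rng.normal(size=5) + 1j * rng.normal(size=5)

# (x, y) = sum x_j * conj(y_j); np.vdot conjugates its first argument
inner = lambda x, y: np.vdot(y, x)
norm2 = lambda x: inner(x, x).real        # ||x||^2

# ||u + v||^2 = ||u||^2 + 2 Re(u, v) + ||v||^2
assert np.isclose(norm2(u + v), norm2(u) + 2 * inner(u, v).real + norm2(v))

# parallelogram identity
assert np.isclose(norm2(u + v) + norm2(u - v), 2 * (norm2(u) + norm2(v)))
```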

2. 3/30
Proposition 2.1. If ∑_{j=1}^∞ x_j² and ∑_{j=1}^∞ y_j² are convergent, then ∑_{j=1}^∞ x_j y_j is absolutely convergent.

Proof. Recall the elementary inequality |x_j y_j| ≤ ½(x_j² + y_j²). Then
∑_{j=1}^N |x_j y_j| ≤ ½ ∑_{j=1}^N (x_j² + y_j²) ≤ C. □

2.1. Inner product spaces. Recall that we were talking about inner product spaces. We
have the inner product norm ‖u‖ = √(u, u). This has a number of properties.
(1) ‖u + v‖² = ‖u‖² + ‖v‖² + 2 Re(u, v).
(2) ‖λu‖ = |λ| ‖u‖.
Proof. ‖λu‖ = √(λu, λu) = √(λλ̄(u, u)) = |λ| ‖u‖. □
(3) |(u, v)| ≤ ‖u‖ ‖v‖ for all u, v ∈ X. This is the Cauchy–Schwarz inequality.
Proof. Exercise. □
(4) Triangle inequality: ‖u + v‖ ≤ ‖u‖ + ‖v‖.
Proof. Properties (1) and (3) imply that
‖u + v‖² ≤ ‖u‖² + ‖v‖² + 2|(u, v)| ≤ ‖u‖² + ‖v‖² + 2 ‖u‖ ‖v‖ = (‖u‖ + ‖v‖)². □
2.2. Norms. Let X be any linear space.
Definition 2.2. ‖·‖ is a norm on X if
(a) ‖λu‖ = |λ| ‖u‖ for all scalars λ and for all u ∈ X;
(b) ‖u + v‖ ≤ ‖u‖ + ‖v‖ for all u, v ∈ X;
(c) ‖u‖ > 0 if u ≠ 0.
Example 2.3. The above properties hold if X is an inner product space and ‖u‖ = √(u, u).
Every inner product gives a norm, but not all norms come from inner products.
2.2.1. Relation to metric spaces.
Definition 2.4. Define d(u, v) = ‖u − v‖. This is called the norm metric.
Recall the three properties of a metric:
(1) d(u, v) = d(v, u);
(2) d(u, v) ≤ d(u, w) + d(w, v) for all u, v, w;
(3) d(u, v) ≥ 0, and d(u, v) = 0 if and only if u = v.
We can check that the norm metric is indeed a metric. For example, the triangle inequality
follows from the triangle inequality for norms: d(u, v) = ‖u − v‖ = ‖u − w + w − v‖ ≤
‖u − w‖ + ‖w − v‖ = d(u, w) + d(w, v).
Now that we have a metric, we can review some metric space terminology.
Definition 2.5. B_ρ(u) = {v ∈ X : d(v, u) < ρ} = {v ∈ X : ‖v − u‖ < ρ} is the open ball of
radius ρ and center u.
Similarly, B̄_ρ(u) = {v ∈ X : d(v, u) ≤ ρ} = {v ∈ X : ‖v − u‖ ≤ ρ} is the closed ball of
radius ρ and center u.
Definition 2.6. U ⊂ X is open if u ∈ U implies that there exists ρ > 0 such that Bρ (u) ⊂ U .
C ⊂ X is closed if C contains all of its limit points, i.e. y = lim uk with uk ∈ C for all k
implies that y ∈ C.
Definition 2.7. A set K ⊂ X is compact if for every sequence {x_k}_{k=1,...} ⊂ K there exists
a convergent subsequence {x_{k_j}}_{j=1,...} with lim x_{k_j} ∈ K.
Definition 2.8. X is complete if every Cauchy sequence in X converges in X.
Here’s a diagram of what we’ve considered so far:
inner product spaces ⊂ normed spaces ⊂ general metric spaces.
Definition 2.9. A complete normed linear space is called a Banach space.
Definition 2.10. A complete inner product space is called a Hilbert space.
Remark. This theory might seem abstract, but functional analysis was developed to solve
real problems, in areas such as partial differential equations. We’ll respect that history and
we will discuss applications of functional analysis to ODEs. All of this theory was created
to attack concrete problems.
Before we can talk about more general spaces, we should first understand finite dimensional
normed linear spaces. These have some special properties.
We can pick any basis e_1, . . . , e_n. Then any x ∈ X can be written as x = ∑_{j=1}^n x_j e_j.
Remark. We will call the x_j the coordinates of x with respect to the basis, and we will write
x = (x_1, . . . , x_n) for the corresponding point of Cⁿ or Rⁿ.
Then (using the Cauchy–Schwarz inequality), we have
‖x‖ = ‖∑_{j=1}^n x_j e_j‖ ≤ ∑_{j=1}^n ‖x_j e_j‖ = ∑_{j=1}^n |x_j| ‖e_j‖ ≤ √(∑_{j=1}^n |x_j|²) √(∑_{j=1}^n ‖e_j‖²) = M ‖x‖_{R^n},
where M = √(∑_{j=1}^n ‖e_j‖²). We have therefore shown that ‖x‖ ≤ M ‖x‖_{R^n} for every x ∈ X.
Proposition 2.11. Define S = {x ∈ X : ‖x‖_{R^n} = 1}. Then S is a compact subset of X.

Proof. Let {x^(k)}_{k=1,...} ⊂ S. Then ‖x^(k)‖_{R^n} = 1 for all k, so the coordinate vectors of
the x^(k) lie on the unit sphere of R^n, which is compact. Hence there exists a subsequence
{x^(k_j)}_{j=1,...} whose coordinate vectors converge to some point (y_1, . . . , y_n) of the sphere; let
y = ∑_{j=1}^n y_j e_j be the corresponding point of X.
Note that ‖x^(k_j) − y‖ ≤ M ‖x^(k_j) − y‖_{R^n} → 0 as j → ∞. Moreover, the triangle inequality gives
1 − ‖y − x^(k_j)‖_{R^n} ≤ ‖y‖_{R^n} ≤ ‖y − x^(k_j)‖_{R^n} + 1,
so letting j → ∞ shows ‖y‖_{R^n} = 1, i.e. y ∈ S, which completes the proof. □
Proposition 2.12. The function f(x) = ‖x‖ is continuous on X (and hence on S).

Proof.
|f(x) − f(y)| = | ‖x‖ − ‖y‖ | ≤ ‖x − y‖ = d(x, y). □

We know that continuous functions on compact sets attain their minima. Let m =
min_{x∈S} ‖x‖, and note m > 0 since 0 ∉ S. Then x ≠ 0 implies that
x/‖x‖_{R^n} ∈ S, so ‖ x/‖x‖_{R^n} ‖ ≥ m,
and therefore ‖x‖ ≥ m ‖x‖_{R^n}. Together with what we did before, we have proved that
m ‖x‖_{R^n} ≤ ‖x‖ ≤ M ‖x‖_{R^n}
for all x ∈ X. The norm ‖·‖ is therefore equivalent to the Euclidean norm.
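As an added illustration (a Python sketch; the choice X = R³ with the 1-norm is my own example, not from the lecture), we can estimate the equivalence constants m and M for ‖x‖_1 against the Euclidean norm by sampling the Euclidean unit sphere S:

```python
import numpy as np

rng = np.random.default_rng(1)
samples = rng.normal(size=(100_000, 3))
samples /= np.linalg.norm(samples, axis=1, keepdims=True)  # project onto the unit sphere S

one_norms = np.abs(samples).sum(axis=1)      # ||x||_1 for each sample with ||x||_2 = 1
m, M = one_norms.min(), one_norms.max()
print(m, M)   # theory: m = 1 and M = sqrt(3), so ||x||_2 <= ||x||_1 <= sqrt(3) ||x||_2
```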

3. 4/1
3.1. Finite dimensional spaces. Recall that we are working with a finite dimensional
normed linear space X with norm ‖·‖. Last time we showed that, for any given basis
e_1, . . . , e_n, there exist M, m > 0 with m ‖x‖_{R^n} ≤ ‖x‖ ≤ M ‖x‖_{R^n} for all x ∈ X.

Proposition 3.1. This guarantees that all norms on a finite dimensional space are equivalent:
if ‖·‖_1 and ‖·‖_2 are two norms on a finite dimensional space X, then there exists a constant
C with C^{−1} ‖x‖_2 ≤ ‖x‖_1 ≤ C ‖x‖_2.
Proof. We check this fact. We have
m_1 ‖x‖_{R^n} ≤ ‖x‖_1 ≤ M_1 ‖x‖_{R^n} and m_2 ‖x‖_{R^n} ≤ ‖x‖_2 ≤ M_2 ‖x‖_{R^n}.
This shows that
(m_1/M_2) ‖x‖_2 ≤ m_1 ‖x‖_{R^n} ≤ ‖x‖_1 ≤ M_1 ‖x‖_{R^n} ≤ (M_1/m_2) ‖x‖_2,
which proves our statement. □
Proposition 3.2. All norms on a finite dimensional space give the same open sets.

Proof. Define
B_ρ^{‖·‖_1}(y) = {x ∈ X : ‖x − y‖_1 < ρ}.
Note that ‖x − y‖_1 < ρ implies that ‖x − y‖_2 < (M_2/m_1) ρ. Therefore B_ρ^{‖·‖_1}(y) ⊆
B_{(M_2/m_1)ρ}^{‖·‖_2}(y). The opposite inclusion is shown similarly. □
Proposition 3.3. In a finite dimensional normed space X, the closed unit ball is compact.

Proof. Let {x_k}_{k=1,...} be a sequence in B̄_1(0) = {x ∈ X : ‖x‖ ≤ 1}.
We have ‖x_k‖_{R^n} ≤ m^{−1} ‖x_k‖ ≤ m^{−1}, so the coordinate vectors form a bounded sequence
in R^n. Hence there exists a subsequence {x_{k_j}} whose coordinate vectors converge to some
y ∈ R^n; in X there is a corresponding point y = ∑_{j=1}^n y_j e_j. Then ‖x_{k_j} − y‖ ≤
M ‖x_{k_j} − y‖_{R^n} → 0, and ‖y‖ = lim ‖x_{k_j}‖ ≤ 1, so y ∈ B̄_1(0). This proves compactness. □
Lemma 3.4. If X is any infinite dimensional normed linear space, the closed unit ball is
not compact.

Proof. Take some e_1 ∈ X with ‖e_1‖ = 1.
Take e_2 ∈ X \ span{e_1}.
Take e_3 ∈ X \ span{e_1, e_2}.
Inductively, take e_n ∈ X \ span{e_1, . . . , e_{n−1}}. These must exist, because otherwise the
space would be finite dimensional.
Homework 1, problem 8 says that there exists w_n ∈ span{e_1, . . . , e_{n−1}} with
0 < λ_n = ‖e_n − w_n‖ = min{‖e_n − y‖ : y ∈ span{e_1, . . . , e_{n−1}}}.
Define
ẽ_n = (e_n − w_n)/‖e_n − w_n‖ = (e_n − w_n)/λ_n,
so ‖ẽ_n‖ = 1. For n > l, we have
‖ẽ_n − ẽ_l‖ = ‖(e_n − w_n)/λ_n − ẽ_l‖.
We claim that ‖ẽ_n − ẽ_l‖ ≥ 1. Otherwise,
‖(e_n − w_n)/λ_n − ẽ_l‖ < 1,
which means that
‖e_n − (w_n + λ_n ẽ_l)‖ < λ_n.
But w_n + λ_n ẽ_l ∈ span{e_1, . . . , e_{n−1}}, which contradicts the definition of λ_n as the
minimal distance. Therefore we indeed have ‖ẽ_n − ẽ_l‖ ≥ 1, and hence {ẽ_n} has no convergent
subsequence. □
Remark. Everything we said also works in complex spaces; just change R to C.
This concludes the discussion of finite dimensional spaces vs infinite dimensional spaces.
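An added sketch (Python, with ℓ² vectors truncated to finitely many coordinates) of the phenomenon behind Lemma 3.4: the standard unit vectors e_n all lie in the closed unit ball of ℓ², yet stay distance √2 apart, so no subsequence can be Cauchy:

```python
import numpy as np

N = 10
e = np.eye(N)   # row n stands in for the l^2 unit vector e_n (first N coordinates)

# every pair of distinct unit vectors is exactly sqrt(2) apart
dists = [np.linalg.norm(e[n] - e[l]) for n in range(N) for l in range(n)]
assert np.allclose(dists, np.sqrt(2))
```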
3.2. More about completeness. We haven't yet proven that any infinite dimensional space is complete.
We claim that this is true for the space ℓ²_R = { x = (x_1, x_2, . . . ) : ∑_{j=1}^∞ x_j² < ∞ }.

Definition 3.5. Define (x, y) = ∑_{j=1}^∞ x_j y_j. Check that this is indeed an inner product. We
also get the inner product norm ‖x‖ = √(x, x).

Proposition 3.6. ℓ²_R is complete with respect to this norm. This means that it is a Hilbert
space.

Proof. Let {x^(k)}_{k=1,...} be a Cauchy sequence. Let ε > 0. Then there exists N such that
‖x^(k) − x^(l)‖ < ε for every k > l ≥ N, i.e.
√(∑_{j=1}^∞ (x_j^(k) − x_j^(l))²) = ‖x^(k) − x^(l)‖ < ε.
Therefore |x_j^(k) − x_j^(l)| < ε for all k > l ≥ N and for all j = 1, 2, . . . . Hence {x_j^(k)}_{k=1,2,...}
is a Cauchy sequence in R for each j = 1, 2, . . . , so there is some y_j ∈ R with
lim_{k→∞} x_j^(k) = y_j.
We have the inequality
√(∑_{j=1}^M (x_j^(k) − x_j^(l))²) ≤ √(∑_{j=1}^∞ (x_j^(k) − x_j^(l))²) < ε.
In this inequality, take the limit as k → ∞. Then
√(∑_{j=1}^M (y_j − x_j^(l))²) ≤ ε
for all l ≥ N and for all M. Letting M → ∞, we see that
‖y − x^(l)‖ = √(∑_{j=1}^∞ (y_j − x_j^(l))²) ≤ ε
for all l ≥ N. This shows that y − x^(l) ∈ ℓ²_R. We also know that x^(l) ∈ ℓ²_R, so that
y ∈ ℓ²_R. Furthermore, for every ε > 0 there exists N with ‖y − x^(l)‖ ≤ ε for all l ≥ N.
Hence lim x^(l) = y in ℓ²_R. This proves that every Cauchy sequence converges, which proves
completeness.
Note that the crucial step was to use the completeness of R. □
Definition 3.7. A set A is convex if for all x, y ∈ A the line segment joining them
lies in A, i.e. tx + (1 − t)y = y + t(x − y) ∈ A for all t ∈ [0, 1].

Theorem 3.8. Let X be any Hilbert space, and let A be any nonempty closed convex subset
of X. Let x ∈ X \ A. Then there exists a unique nearest point of A to x. More precisely,
there exists a ∈ A such that ‖x − a‖ < ‖x − y‖ for every y ∈ A \ {a}.
4. 4/4
Last time, we stated Theorem 3.8, that there is a unique closest point. We can now
prove it. The proof is based on the parallelogram identity. We don't yet know that the
minimum exists, but we can consider the infimum.
Proof of Theorem 3.8. Let α = inf{‖x − y‖ : y ∈ A}. For each k = 1, 2, . . . , there exists
y_k ∈ A with ‖x − y_k‖ < √(α² + 1/k), since otherwise we would have ‖x − y‖ ≥ √(α² + 1/k) for all
y ∈ A, contradicting the definition of α.
We can now apply the parallelogram identity, ‖z − w‖² + ‖z + w‖² = 2(‖z‖² + ‖w‖²),
with z = x − y_k and w = x − y_l. Then
‖y_k − y_l‖² + ‖(x − y_k) + (x − y_l)‖² = 2(‖x − y_k‖² + ‖x − y_l‖²) ≤ 2(α² + 1/k + α² + 1/l).
Note that ‖(x − y_k) + (x − y_l)‖² = 4 ‖x − (y_k + y_l)/2‖². By the convexity of A, we know
that (y_k + y_l)/2 ∈ A. Therefore ‖x − (y_k + y_l)/2‖ ≥ α. Hence, we have
‖y_k − y_l‖² + 4α² ≤ 2(α² + 1/k + α² + 1/l),
so
‖y_k − y_l‖² ≤ 2/k + 2/l
for all k, l = 1, 2, . . . . Therefore {y_k} is a Cauchy sequence. X is a Hilbert space, so
it is complete. Hence {y_k} is convergent, so there exists a ∈ X with a = lim y_k, i.e.
lim ‖a − y_k‖ = 0. Then a ∈ A because A is closed.
We now claim that ‖x − a‖ is the minimum distance. We have
α ≤ ‖x − a‖ = ‖x − y_k + y_k − a‖ ≤ ‖x − y_k‖ + ‖y_k − a‖ ≤ √(α² + 1/k) + ‖y_k − a‖ → α + 0.
Therefore ‖x − a‖ = α.
We still need to show uniqueness. Suppose that ã ∈ A also attains the minimum distance,
‖x − ã‖ = α. We want to show that a = ã. We can plug a and ã into the parallelogram law
in place of y_k and y_l. This gives
‖a − ã‖² + 4α² ≤ ‖a − ã‖² + 4 ‖x − (a + ã)/2‖² = 2(‖x − a‖² + ‖x − ã‖²) = 4α²,
hence ‖a − ã‖² = 0 and therefore a = ã. □
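As an added concrete illustration of Theorem 3.8 (my own example, not the lecturer's: A is the closed unit ball of R², for which the nearest point to an outside x is x/‖x‖), a Python sketch confirms the minimizing property by sampling A:

```python
import numpy as np

x = np.array([3.0, 4.0])            # a point outside A, ||x|| = 5
a = x / np.linalg.norm(x)           # claimed unique nearest point of A to x

rng = np.random.default_rng(2)
ys = rng.normal(size=(50_000, 2))
# map each sample into the closed unit ball (points inside stay put)
ys /= np.maximum(np.linalg.norm(ys, axis=1, keepdims=True), 1.0)

# no sampled point of A comes closer to x than a does
assert np.linalg.norm(x - a) <= np.linalg.norm(x - ys, axis=1).min() + 1e-12
```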
4.1. Orthogonality.
Definition 4.1. In a Hilbert space X, vectors x_1, . . . , x_N ∈ X are orthogonal if
(x_i, x_j) = 0 for all i ≠ j, i, j = 1, . . . , N.
x_1, . . . , x_N ∈ X are orthonormal if
(x_i, x_j) = 0 for i ≠ j, and (x_i, x_i) = 1.
There is a fundamental identity for orthonormal vectors.
Proposition 4.2. Let x ∈ X, and suppose that e_1, . . . , e_N are orthonormal. Let λ_1, . . . , λ_N
be scalars, and let c_j = (x, e_j) for j = 1, . . . , N. Then
‖x − ∑_{j=1}^N λ_j e_j‖² = ‖x‖² + ∑_{i=1}^N |c_i − λ_i|² − ∑_{i=1}^N |c_i|².

Proof.
‖x − ∑_{j=1}^N λ_j e_j‖² = (x − ∑_{i=1}^N λ_i e_i, x − ∑_{j=1}^N λ_j e_j)
= (x, x) − (∑_{i=1}^N λ_i e_i, x) − (x, ∑_{j=1}^N λ_j e_j) + (∑_{i=1}^N λ_i e_i, ∑_{j=1}^N λ_j e_j)
= ‖x‖² − ∑_{i=1}^N λ_i c̄_i − ∑_{j=1}^N λ̄_j c_j + ∑_{i=1}^N |λ_i|²
= ‖x‖² + ∑_{i=1}^N |c_i − λ_i|² − ∑_{i=1}^N |c_i|². □

This identity is nice. Staring at it for a bit, we can read off the point that satisfies the
closest point property.

Theorem 4.3. The point of span{e_1, . . . , e_N} which has minimum distance from x is exactly
∑_{i=1}^N c_i e_i, and it is the unique such point.

Intuitively, we expect that the difference between x and this minimizing point is orthogonal
to span{e_i}. We can check this:
(x − ∑_{i=1}^N c_i e_i, ∑_{j=1}^N λ_j e_j) = ∑_{j=1}^N λ̄_j (x, e_j) − ∑_{i=1}^N c_i λ̄_i = ∑_{j=1}^N c_j λ̄_j − ∑_{i=1}^N c_i λ̄_i = 0.
As we expected, this means that
x − ∑_{i=1}^N c_i e_i ∈ (span{e_1, . . . , e_N})^⊥.
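An added numerical sketch of Theorem 4.3 and the orthogonality relation above (Python; generating the orthonormal pair by a QR factorization in R⁵ is my own choice of example):

```python
import numpy as np

rng = np.random.default_rng(3)
E, _ = np.linalg.qr(rng.normal(size=(5, 2)))   # columns e_1, e_2 are orthonormal
x = rng.normal(size=5)

c = E.T @ x        # c_i = (x, e_i)
p = E @ c          # the projection sum_i c_i e_i

# the residual x - p is orthogonal to span{e_1, e_2}
assert np.allclose(E.T @ (x - p), 0)

# any other coefficients give a worse approximation (the identity in Prop. 4.2)
lam = rng.normal(size=2)
assert np.linalg.norm(x - E @ lam) >= np.linalg.norm(x - p)
```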

Theorem 4.4. Let e_1, e_2, . . . be an (infinite) sequence of orthonormal vectors in the Hilbert
space X. Then for every x ∈ X:
(i) ∑_{i=1}^∞ c_i e_i is convergent, where c_i = (x, e_i) for all i; that is, there exists y ∈ X with
lim_{N→∞} ‖∑_{i=1}^N c_i e_i − y‖ = 0.
(ii) Bessel's inequality (which is very important):
∑_{i=1}^∞ |c_i|² ≤ ‖x‖².
(iii) Equality holds in (ii) if and only if x = ∑_{i=1}^∞ c_i e_i.
The proof will use the identity in Proposition 4.2.

Proof. We first prove (ii). By the identity in Proposition 4.2 with λ_i = c_i for all i, we have
0 ≤ ‖x − ∑_{i=1}^N c_i e_i‖² = ‖x‖² − ∑_{i=1}^N |c_i|²,
so ∑_{i=1}^N |c_i|² ≤ ‖x‖² for all N. This proves (ii).
Now we prove (iii). We have
∑_{i=1}^∞ |c_i|² = ‖x‖² ⟺ lim_{N→∞} ∑_{i=1}^N |c_i|² = ‖x‖² ⟺ lim_{N→∞} ‖x − ∑_{i=1}^N c_i e_i‖² = 0 ⟺ x = ∑_{i=1}^∞ c_i e_i.
Finally, we prove (i). Let S_N = ∑_{i=1}^N c_i e_i. Then N > M implies that
‖S_N − S_M‖² = ‖∑_{i=M+1}^N c_i e_i‖² = ∑_{i=M+1}^N |c_i|² ≤ ∑_{i=M+1}^∞ |c_i|² → 0
as M → ∞, using (ii). This proves that {S_N} is a Cauchy sequence, and hence S_N is convergent. That
concludes the proof. □

5. 4/6
X is a Hilbert space and e1 , e2 , . . . is an orthonormal sequence.
Definition 5.1. e_1, e_2, . . . is a complete orthonormal sequence if x = ∑_{i=1}^∞ (x, e_i) e_i for all
x ∈ X.

Theorem 5.2. The statement (C) that e_1, e_2, . . . is a complete orthonormal sequence (i.e.
x = ∑_{i=1}^∞ (x, e_i) e_i for all x ∈ X) is equivalent to each of the following:
(i) Equality holds in Bessel's inequality for every x ∈ X (i.e. ‖x‖² = ∑_{i=1}^∞ |c_i|²).
(ii) There does not exist x ∈ X \ {0} with (x, e_i) = 0 for every i.
(iii) span{e_1, e_2, . . . } is a dense subset of X.
Proof. We already proved (C) ⟺ (i) last time, in Theorem 4.4.
(ii) ⟹ (C). Consider ∑_{i=1}^∞ (x, e_i) e_i. We know that this converges. Then
(x − ∑_{i=1}^∞ (x, e_i) e_i, e_j) = (x, e_j) − ∑_{i=1}^∞ (x, e_i)(e_i, e_j) = (x, e_j) − (x, e_j) = 0.
By (ii), this means that x − ∑_{i=1}^∞ (x, e_i) e_i = 0, which is (C).
(C) ⟹ (ii). Suppose x ∈ X. Then x = ∑_{i=1}^∞ (x, e_i) e_i, so x = 0 if (x, e_i) = 0
for every i.
(C) ⟹ (iii). Take x ∈ X. Then (C) implies that
x = ∑_{i=1}^∞ c_i e_i = lim_{N→∞} ∑_{i=1}^N c_i e_i,
which is a limit of sums in span{e_1, e_2, . . . }. Therefore x is a limit point of span{e_1, e_2, . . . }.
Since x was arbitrary, this means that span{e_1, e_2, . . . } is dense.
(iii) ⟹ (C). Take any point x ∈ X. Then (iii) implies that x = lim_{N→∞} y_N where
y_N = ∑_{j=1}^{Q_N} λ_j^N e_j ∈ span{e_1, e_2, . . . }. This means that
‖x − ∑_{j=1}^{Q_N} λ_j^N e_j‖² → 0
as N → ∞. By the fundamental identity that we proved last time (Proposition 4.2), we have
‖x − ∑_{j=1}^{Q_N} c_j e_j‖² ≤ ‖x − ∑_{j=1}^{Q_N} λ_j^N e_j‖² → 0.
Note that this is also true with sums up to M ≥ Q_N, by taking λ_j^N = 0 for Q_N < j ≤ M.
Let ε > 0. Then there exists N_0 such that
‖x − ∑_{j=1}^M c_j e_j‖ < ε
for all M ≥ N_0. This is the definition of convergence, so x = lim_{M→∞} ∑_{j=1}^M c_j e_j = ∑_{j=1}^∞ c_j e_j. □
Example 5.3. Define
L²_C[−π, π] = { f : ∫_{−π}^{π} |f|² exists and is finite }.
This has an inner product
(f, g) = (1/2π) ∫_{−π}^{π} f(x) ḡ(x) dx.
We can easily check the inner product properties.
This is not as simple as it looks. What type of integration are we using? The Riemann
integral is not good enough: using the Riemann integral, this space lacks completeness, and
there is a huge class of counterexamples.
Therefore, we will use the Lebesgue integral. We will spend a couple of lectures on this,
but for now, don't worry about it. All Riemann-integrable functions are also Lebesgue-
integrable with the same integral; however, Lebesgue integration allows us to handle a much
larger class of functions. In particular, Lebesgue integration gives us completeness.
There is an extremely important application of the abstract theory that we have just
developed. There is a simple orthonormal sequence in this space:
{1, e^{ix}, e^{−ix}, e^{2ix}, e^{−2ix}, . . . , e^{nix}, e^{−nix}, . . . }.

Proposition 5.4. This is an orthonormal sequence.

Proof.
(e^{inx}, e^{imx}) = (1/2π) ∫_{−π}^{π} e^{inx} e^{−imx} dx = (1/2π) ∫_{−π}^{π} e^{i(n−m)x} dx,
which equals 1 if n = m, and equals (1/2π) [e^{i(n−m)x}/(i(n−m))]_{−π}^{π} = 0 if n ≠ m. □
In particular, all the previous theory applies to this case. In fact, we will show that this
is a complete orthonormal sequence.

6. 4/8
We are in the process of talking about an extremely important application. We are
considering the following example:
Example 6.1.
X = L²_C([−π, π]) = { f = f_1 + i f_2 : ∫_{−π}^{π} |f|² exists and is finite }.
The inner product is (f, g) = (1/2π) ∫_{−π}^{π} f(x) ḡ(x) dx. For now, we will believe that this
is a complete space (with the Lebesgue integral); this will be proved soon. Historically, this
completeness is the main reason why people use the Lebesgue integral.
The big claim is
Proposition 6.2.
{1, e^{ix}, e^{−ix}, e^{2ix}, e^{−2ix}, . . . , e^{inx}, e^{−inx}, . . . }
is a complete orthonormal sequence.
Proof. Recall that for an orthonormal sequence to be complete, we need
x = ∑_n (x, e_n) e_n
for all x. In this case, the relevant series is the Fourier series of f:
∑_{n=−∞}^{∞} (f, e^{inx}) e^{inx} = lim_{N→∞} ∑_{n=−N}^{N} (f, e^{inx}) e^{inx}.
(We don't have to worry too much about the limit because we already proved earlier that
everything converges.) Here,
(f, e^{inx}) = (1/2π) ∫_{−π}^{π} f(t) e^{−int} dt.
Therefore, the partial sums are
S_N(f)(x) = ∑_{n=−N}^{N} ((1/2π) ∫_{−π}^{π} f(t) e^{−int} dt) e^{inx} = (1/2π) ∫_{−π}^{π} f(t) ∑_{n=−N}^{N} e^{in(x−t)} dt.
We have the Dirichlet kernel
D_N(s) = ∑_{n=−N}^{N} e^{ins} = e^{−iNs} + e^{−i(N−1)s} + · · · + 1 + · · · + e^{iNs}
= e^{−iNs} (1 + e^{is} + e^{2is} + · · · + e^{2iNs}) = e^{−iNs} (1 − e^{(2N+1)is})/(1 − e^{is}) = (e^{−iNs} − e^{i(N+1)s})/(1 − e^{is})
= e^{is/2}(e^{−i(N+1/2)s} − e^{i(N+1/2)s}) / (e^{is/2}(e^{−is/2} − e^{is/2})) = sin((N + 1/2)s)/sin(s/2)
for 0 < |s| ≤ π. Note also that D_N(0) = 2N + 1.
Here, convergence does not mean pointwise convergence. We actually mean L² convergence, i.e.
‖f − ∑_{n=−N}^{N} (f, e^{inx}) e^{inx}‖ → 0
as N → ∞ in the L² inner product norm. This is what we need to show.

Remark. Bessel's inequality in this case states that
∑_{n=−∞}^{∞} |c_n|² ≤ ‖f‖² = (1/2π) ∫_{−π}^{π} |f(t)|² dt.
This is nice, but we don't need it here.


We will now show that span{1, e^{ix}, e^{−ix}, . . . } is dense in L²_C([−π, π]), i.e. if f ∈ L²_C([−π, π])
then for every ε > 0 there are λ_j for j = −N, . . . , N such that
‖f − ∑_{j=−N}^{N} λ_j e^{ijx}‖ < ε.
First, we check this when f is a continuous function with f(−π) = f(π) = 0. We claim
that we can use the averages of the partial sums,
(1/(m+1)) ∑_{N=0}^{m} S_N(f)(x) = (1/2π) ∫_{−π}^{π} f(t) [ (1/(m+1)) ∑_{N=0}^{m} D_N(x − t) ] dt.
Here,
K_m(s) = (1/(m+1)) ∑_{N=0}^{m} D_N(s) = (1/(m+1)) ∑_{N=0}^{m} (e^{−iNs} − e^{i(N+1)s})/(1 − e^{is})
= (1/(m+1)) ( ∑_{N=0}^{m} e^{−iNs} − e^{is} ∑_{N=0}^{m} e^{iNs} )/(1 − e^{is})
= (1/(m+1)) (1/(1 − e^{is})) ( (1 − e^{−i(m+1)s})/(1 − e^{−is}) − e^{is} (1 − e^{i(m+1)s})/(1 − e^{is}) )
= · · ·
= (1/(m+1)) sin²((m+1)s/2)/sin²(s/2),
where the calculation will be finished next time. This is the Fejér kernel. It is a nonnegative
function.
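An added numerical check (Python sketch, not from the lecture) that the closed forms for the Dirichlet and Fejér kernels agree with their defining sums, and that the Fejér kernel is nonnegative:

```python
import numpy as np

s = np.linspace(-np.pi, np.pi, 1001)
s = s[np.abs(s) > 1e-6]   # avoid s = 0, where the closed forms need a limit

def D(N, s):
    """Dirichlet kernel as the defining exponential sum."""
    return sum(np.exp(1j * n * s) for n in range(-N, N + 1)).real

N = m = 7
# D_N(s) = sin((N + 1/2) s) / sin(s / 2)
assert np.allclose(D(N, s), np.sin((N + 0.5) * s) / np.sin(s / 2))

# K_m = average of D_0, ..., D_m equals the Fejer closed form, and is >= 0
K = sum(D(n, s) for n in range(m + 1)) / (m + 1)
assert np.allclose(K, np.sin((m + 1) * s / 2) ** 2 / ((m + 1) * np.sin(s / 2) ** 2))
assert (K >= -1e-12).all()
```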
7. 4/11
We want to show completeness of the orthonormal sequence {e^{inx} : n = 0, ±1, . . . } in
L²_C[−π, π].
Last time, we showed that
S_N(f)(x) = ∑_{n=−N}^{N} c_n e^{inx}
with
c_n = (f, e^{inx}) = (1/2π) ∫_{−π}^{π} f(t) e^{−int} dt.
Then
S_N(f)(x) = (1/2π) ∫_{−π}^{π} f(t) D_N(x − t) dt
with
D_N(s) = ∑_{n=−N}^{N} e^{ins} = sin((N + 1/2)s)/sin(s/2).
Note that D_N(0) = 2N + 1. Consider
(1/(m+1)) ∑_{N=0}^{m} S_N(f)(x) = (1/2π) ∫_{−π}^{π} f(t) (1/(m+1)) ∑_{N=0}^{m} D_N(x − t) dt = (1/2π) ∫_{−π}^{π} f(t) K_m(x − t) dt,
where we define
K_m(s) = (1/(m+1)) ∑_{N=0}^{m} D_N(s)
= (1/(m+1)) (1/(1 − e^{is})) ( (1 − e^{−i(m+1)s})/(1 − e^{−is}) − e^{is} (1 − e^{i(m+1)s})/(1 − e^{is}) )
= (1/(m+1)) (e^{i(m+1)s} + e^{−i(m+1)s} − 2)/(e^{is/2} − e^{−is/2})²
= (1/(m+1)) (e^{i(m+1)s/2} − e^{−i(m+1)s/2})²/(e^{is/2} − e^{−is/2})²
= (1/(m+1)) sin²((m+1)s/2)/sin²(s/2)
for 0 < |s| ≤ π. Additionally, K_m(0) = (1/(m+1)) ∑_{N=0}^{m} (2N + 1) = m + 1.
This was not an obvious thing to do. It's not clear that the average of the S_N should be
nice, but it does come out nicely.
Let's consider some properties of K_m. Notice that it is even and nonnegative. It is zero
whenever (m+1)s/2 is a nonzero multiple of π. There is a sharp peak at zero, with height m + 1. There are a lot
of zeros, and between the zeros there are violent oscillations. It is a good idea to draw a
picture of this.
What is the area under the graph? We have
(1/2π) ∫_{−π}^{π} K_m(s) ds = (1/(m+1)) ∑_{N=0}^{m} (1/2π) ∫_{−π}^{π} D_N(s) ds = 1.
This is nice. It is like an oscillatory version of the Dirac delta “function”. In the distribution
sense, these converge to the Dirac delta.
Consider this function on the intervals [−π, −δ) ∪ (δ, π], where 0 < δ < π. For δ ≤ |s| ≤ π,
0 ≤ K_m(s) ≤ (1/(m+1)) (1/sin²(δ/2)) → 0
as m → ∞ for fixed δ.
Take f = g, where g is continuous and g(−π) = g(π) = 0. Extend g to be 2π-periodic.
Then
(1/(m+1)) ∑_{N=0}^{m} S_N(g)(x) = (1/2π) ∫_{−π}^{π} g(t) K_m(x − t) dt.
This is an average of the Fourier partial sums of g, and these functions are in span{e^{inx}}.
We claim that this is a really good approximation to g. To see this, consider
(1/(m+1)) ∑_{N=0}^{m} S_N(g)(x) − g(x) = (1/2π) ∫_{−π}^{π} g(t) K_m(x − t) dt − g(x)
= (1/2π) ∫_{−π}^{π} (g(t) − g(x)) K_m(x − t) dt,
because (1/2π) ∫_{−π}^{π} K_m(x − t) dt = 1. Make the change of variable s = x − t, and observe that all
functions under consideration are 2π-periodic. Then the above equals
(1/2π) ∫_{x−π}^{x+π} (g(x − s) − g(x)) K_m(s) ds = (1/2π) ∫_{−π}^{π} (g(x − s) − g(x)) K_m(s) ds.
For any δ ∈ (0, π), we can break this integral up into
(1/2π) ∫_{−δ}^{δ} (g(x − s) − g(x)) K_m(s) ds + (1/2π) ∫_{δ≤|s|≤π} (g(x − s) − g(x)) K_m(s) ds.
Note that g is uniformly continuous because it is continuous on a compact set. Let ε > 0.
Then there exists δ > 0 such that |x − y| < δ implies that |g(x) − g(y)| < ε. Picking this δ,
the absolute value of the expression above is at most
(1/2π) ∫_{−δ}^{δ} |g(x − s) − g(x)| K_m(s) ds + (1/2π) ∫_{δ≤|s|≤π} |g(x − s) − g(x)| K_m(s) ds.
Also, g is continuous on a compact set, so it attains its maximum. Let L = max |g|. Then
we have the bound
≤ (ε/2π) ∫_{−δ}^{δ} K_m(s) ds + (2L/2π) ∫_{δ≤|s|≤π} K_m(s) ds
≤ (ε/2π) ∫_{−π}^{π} K_m(s) ds + 2L (1/(m+1)) (1/sin²(δ/2))
= ε + 2L (1/(m+1)) (1/sin²(δ/2)) < ε + ε = 2ε
provided that m > m_0, where m_0 is chosen so that 2L/((m_0 + 1) sin²(δ/2)) < ε. Therefore,
max_{[−π,π]} | (1/(m+1)) ∑_{N=0}^{m} S_N(g)(x) − g(x) | < 2ε
for all m > m_0. We're almost done. We have proved this for when g is continuous. We want
to show that this is true for all L² functions, and we'll finish this next time.
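An added numerical illustration (Python) of the argument above: the Fejér means of a continuous 2π-periodic function vanishing at ±π approach it in the sup norm. The test function and the quadrature rule are my own choices, not part of the lecture:

```python
import numpy as np

t = np.linspace(-np.pi, np.pi, 4001)
g = (np.pi**2 - t**2) / 10          # continuous, vanishes at +-pi

def fejer_mean(m, x):
    """sigma_m(g)(x) = sum_{|n|<=m} (1 - |n|/(m+1)) c_n e^{inx},
    the average of the partial sums S_0, ..., S_m."""
    out = np.zeros_like(x, dtype=complex)
    for n in range(-m, m + 1):
        # c_n ~ (1/2pi) * integral of g e^{-int}; mean works on a uniform grid
        c_n = (g * np.exp(-1j * n * t)).mean()
        out += (1 - abs(n) / (m + 1)) * c_n * np.exp(1j * n * x)
    return out.real

for m in (5, 20, 80):
    print(m, np.abs(fejer_mean(m, t) - g).max())   # sup-error shrinks as m grows
```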
8. 4/13
We've almost finished the proof that the span of our orthonormal sequence {e^{inx} : n = 0, ±1, . . . } is
dense in L²_C[−π, π].
We proved that if g : [−π, π] → R is continuous with g(−π) = g(π) = 0, then for any
ε > 0, there exists m_0 with
max_{x∈[−π,π]} | g(x) − (1/(m+1)) ∑_{N=0}^{m} S_N(g)(x) | < ε
for all m ≥ m_0. Then
‖g − (1/(m+1)) ∑_{N=0}^{m} S_N(g)‖_{L²} = √( (1/2π) ∫_{−π}^{π} | g(x) − (1/(m+1)) ∑_{N=0}^{m} S_N(g)(x) |² dx ) ≤ ε.
Now, take any f ∈ L²_C[−π, π]. One of the properties of the Lebesgue integral (that we will
prove later) is that the continuous functions are dense in L².

Proposition 8.1. There exists a continuous g with g(−π) = g(π) = 0 and ‖f − g‖_{L²} < ε.

Using this claim, we're almost done. Then
‖f − (1/(m+1)) ∑_{N=0}^{m} S_N(g)‖_{L²} = ‖f − g + g − (1/(m+1)) ∑_{N=0}^{m} S_N(g)‖_{L²}
≤ ‖f − g‖_{L²} + ‖g − (1/(m+1)) ∑_{N=0}^{m} S_N(g)‖_{L²} < ε + ε
for all m > m_0.
We have now shown that {e^{inx}} is complete in L² space.
8.1. Lebesgue integral. Let’s talk about the Lebesgue integral.
We should first consider the idea of “measure zero.” Let’s remind ourselves of what this
means in the Riemann theory.
Definition 8.2. A set S ⊂ [a, b] has content zero if for every ε > 0 there exist finitely many
open intervals I_1, . . . , I_N with S ⊂ ∪_{j=1}^N I_j and ∑_{j=1}^N |I_j| < ε, where |I_j| is the length of I_j.

In the Lebesgue theory, we make what appears to be an innocent change, by allowing
infinitely many intervals.

Definition 8.3. A set S ⊂ [a, b] has Lebesgue measure zero if for every ε > 0 there exist
open intervals I_1, I_2, . . . such that S ⊂ ∪_{j=1}^∞ I_j and ∑_{j=1}^∞ |I_j| < ε.

This small change doesn’t seem profound, but it makes a huge difference to the theory.
Lemma 8.4. If S_1, S_2, . . . each has measure zero, then ∪_{j=1}^∞ S_j also has measure zero.

Remark. This is hopelessly false in the case of content zero.


A special case of this lemma is when each Sj contains precisely one point. Then every
countable set of points also has measure zero. For example, the set of all rationals has
Lebesgue measure zero.
Proof. Let ε > 0 be given. Since S_j has measure zero, there are intervals with S_j ⊂
I_1^j ∪ I_2^j ∪ · · · = ∪_{i=1}^∞ I_i^j and ∑_{i=1}^∞ |I_i^j| < ε/2^j.
Then ∪_{j=1}^∞ S_j ⊂ ∪_{i,j} I_i^j and ∑_{j=1}^∞ ∑_{i=1}^∞ |I_i^j| < ∑_{j=1}^∞ ε/2^j = ε. □
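An added toy illustration (Python) of the special case mentioned above: an enumeration of the rationals in [0, 1] can be covered by intervals of length ε/2^{j+1} whose total length stays below ε, no matter how many rationals we cover. The enumeration scheme is my own:

```python
from fractions import Fraction

def rationals_01(n):
    """First n rationals in [0, 1], enumerated by increasing denominator."""
    seen, q = [], 1
    while len(seen) < n:
        for p in range(q + 1):
            f = Fraction(p, q)
            if f not in seen:
                seen.append(f)
                if len(seen) == n:
                    break
        q += 1
    return seen

eps = 0.01
# cover the j-th rational by an open interval of length eps / 2^(j+1)
cover = [(float(r) - eps / 2**(j + 2), float(r) + eps / 2**(j + 2))
         for j, r in enumerate(rationals_01(100))]
print(sum(b - a for a, b in cover))   # total length < eps
```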
Definition 8.5. A property holds almost everywhere on [a, b] if it holds except on a set
of measure zero.
Definition 8.6. ϕ is a step function on an interval [a, b] if there exists a partition a = x_0 <
x_1 < · · · < x_N = b such that ϕ|_{(x_{i−1}, x_i)} = a_i is constant for each i.
Note that the sum, difference, or product of step functions is a step function. This requires
picking a common refinement of the two partitions. The integral of a step function is just
∫_{[a,b]} ϕ = ∑_{i=1}^N a_i (x_i − x_{i−1}).
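An added minimal sketch (Python; the helper name is mine) of this integral of a step function:

```python
def step_integral(partition, values):
    """Integral of a step function: sum of a_i * (x_i - x_{i-1}).

    partition: a = x_0 < x_1 < ... < x_N = b
    values:    the constant value a_i on each open interval (x_{i-1}, x_i)
    """
    assert len(values) == len(partition) - 1
    return sum(a_i * (x1 - x0)
               for a_i, x0, x1 in zip(values, partition, partition[1:]))

print(step_integral([0, 1, 3, 4], [2, -1, 5]))   # 2*1 + (-1)*2 + 5*1 = 5
```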
We can now state the main technical lemma that allows us to define the Lebesgue integral
for a huge class of functions. We defer the proof until later.

Lemma 8.7.
(1) Suppose that {ϕ_k} is an increasing sequence of step functions on [a, b] (i.e. ϕ_k(x) ≤
ϕ_{k+1}(x) for all x ∈ [a, b] and for all k) such that {∫_{[a,b]} ϕ_k}_{k=1,2,...} is bounded. Then
{ϕ_k(x)}_{k=1,2,...} is a bounded (hence convergent) sequence for almost all x ∈ [a, b] (i.e.
there exists S ⊂ [a, b] of measure zero with {ϕ_k(x)} bounded for all x ∈ [a, b] \ S).
(2) If {ψ_k} is any other increasing sequence of step functions with ∫_{[a,b]} ψ_k bounded, and
with lim ψ_k(x) = lim ϕ_k(x) for almost every x ∈ [a, b], then lim ∫_{[a,b]} ϕ_k = lim ∫_{[a,b]} ψ_k.

Definition 8.8. Let L⁰[a, b] be the set of all real valued functions f : [a, b] → R such
that there exists an increasing sequence of step functions {ϕ_k} with ∫_{[a,b]} ϕ_k bounded and
lim ϕ_k(x) = f(x) for almost every x ∈ [a, b].
Now, define
∫_{[a,b]} f = lim ∫_{[a,b]} ϕ_k.

The first thing we do when we meet a mathematical definition is to ask: Does it make
sense? It is well-defined by the second part of our lemma.
We now look at some properties of this integral. We will prove these next time.
Properties 8.9.
(1) If α, β ≥ 0 and f, g ∈ L⁰, then αf + βg ∈ L⁰ and ∫_{[a,b]} (αf + βg) = α ∫_{[a,b]} f + β ∫_{[a,b]} g.
(2) If f, g ∈ L⁰ then max{f, g}, min{f, g} ∈ L⁰.
(3) If f, g ∈ L⁰ with f ≤ g then ∫_{[a,b]} f ≤ ∫_{[a,b]} g.
Warning: f ∈ L⁰ does not imply that −f ∈ L⁰.
9. 4/15
Recall that L⁰ is the set of f : [a, b] → R such that f(x) = lim ϕ_k(x) for almost every
x ∈ [a, b], where the ϕ_k are step functions with ϕ_{k+1} ≥ ϕ_k and ∫ ϕ_k bounded.
We have some properties:
Properties 9.1.
(1) If f ∈ L⁰ and f̃ : [a, b] → R with f̃(x) = f(x) for almost every x, then f̃ ∈ L⁰ and
∫_{[a,b]} f̃ = ∫_{[a,b]} f.
(2) If α, β ≥ 0 and f, g ∈ L⁰, then αf + βg ∈ L⁰ and ∫(αf + βg) = α ∫f + β ∫g.
(3) If f, g ∈ L⁰ and f ≤ g, then ∫f ≤ ∫g.

Proof. We prove (3). Pick increasing sequences ϕ_k and ψ_k of step functions with f = lim ϕ_k and g = lim ψ_k
almost everywhere.
Let ϕ̃_k = min{ϕ_k, ψ_k}. Note that this is still a step function, and it still converges almost
everywhere to f. To see this, notice that
lim ϕ̃_k(x) = min{lim ϕ_k(x), lim ψ_k(x)} = min{f(x), g(x)} = f(x)
almost everywhere. Similarly, ψ̃_k(x) = max{ϕ_k(x), ψ_k(x)} is a step function
with lim ψ̃_k(x) = g(x). Since ϕ̃_k ≤ ψ̃_k, we see that ∫f ← ∫ϕ̃_k ≤ ∫ψ̃_k → ∫g.
The point is that we should reduce everything to step functions. □
A big defect is that f ∈ L⁰ does not imply that −f ∈ L⁰. We enlarge this space to get
around this problem.

Definition 9.2. Let L¹ = {g − h : g, h ∈ L⁰}. This is the space of Lebesgue integrable
functions.

Observe that L¹ is indeed a linear space.
We should check this. For example, for multiplication by scalars, if g − h ∈ L¹, we should
show that λ(g − h) ∈ L¹. To do this, we have two cases: λ ≥ 0 and λ < 0. In each case,
either λg, λh ∈ L⁰ or −λg, −λh ∈ L⁰, so the result follows simply.

Definition 9.3. For f = g − h ∈ L¹, we define ∫f = ∫g − ∫h.

Such a definition might plausibly be nonsense: there might be lots of ways of writing f as
a difference of two functions in L⁰. Suppose that g_1 − h_1 = g_2 − h_2. Then g_1 + h_2 = g_2 + h_1.
Each side is a sum of functions in L⁰, so we can use the linearity in L⁰ to see that ∫g_1 + ∫h_2 =
∫g_2 + ∫h_1. Therefore ∫g_1 − ∫h_1 = ∫g_2 − ∫h_2, and our definition works.
It seems like we got something for nothing, by extending L⁰ so simply to L¹. But we can't
get something for nothing: it will turn out to be hard to prove some really simple facts.
We have some properties:
Properties 9.4.
(1) If f ∈ L¹ and f̃ : [a, b] → R with f̃(x) = f(x) almost everywhere, then f̃ ∈ L¹ and
∫f̃ = ∫f.
(2) If f_1, f_2 ∈ L¹ with f_1 ≤ f_2, then ∫f_1 ≤ ∫f_2.
(3) If f ∈ L¹ then |f| ∈ L¹ and |∫f| ≤ ∫|f|.
(4) If f ∈ L¹ then there exists a decreasing sequence {g_k} ⊂ L⁰ with g_k(x) → f(x) for
almost every x ∈ [a, b] and ∫g_k → ∫f.
(5) If f ∈ L¹ and ∫|f| = 0, then f(x) = 0 for almost every x ∈ [a, b].
(6) If we have a sequence {f_k} ⊂ L¹ of nonnegative functions f_k ≥ 0 with ∫f_k → 0, then
there exists a subsequence {f_{k_j}}_{j=1,2,...} with f_{k_j}(x) → 0 for almost every x ∈ [a, b].
Proof. (1) follows trivially from the corresponding fact about L⁰.
We check (2). Write f_1 = g_1 − h_1 and f_2 = g_2 − h_2 with g_1, g_2, h_1, h_2 ∈ L⁰. We know that
g_1 − h_1 ≤ g_2 − h_2, so g_1 + h_2 ≤ g_2 + h_1, and so ∫(g_1 + h_2) ≤ ∫(g_2 + h_1), and hence ∫f_1 ≤ ∫f_2.
We check (3). f ∈ L¹, so f = g − h, where g, h ∈ L⁰. Here we use a trick to express
|g − h| as a difference of functions in L⁰, namely |g − h| = max{g, h} − min{g, h} ∈ L¹.
Now ±f ≤ |f|, so that |∫f| ≤ ∫|f|.
We check (4). f ∈ L¹, so that f = g − h for g, h ∈ L⁰. Then h = lim ϕ_k with ϕ_{k+1} ≥ ϕ_k,
∫ϕ_k bounded, and ∫ϕ_k → ∫h; a similar statement is true for g = lim ψ_l. Set g_k = g − ϕ_k.
Then {g_k} is decreasing, and g_k → g − h = f almost everywhere. Moreover g − ϕ_k = lim_{l→∞}(ψ_l − ϕ_k) is the limit of an increasing
sequence of step functions (with bounded integrals), so g_k = g − ϕ_k ∈ L⁰, and ∫g_k = ∫g − ∫ϕ_k → ∫f, so we are done.
Note that (6) implies (5); just take f_k = |f| for all k.
We now need to prove (6). This is a bit trickier than the previous properties. Pick f_{k_j}
such that ∫f_{k_j} < 1/2^j and k_{j+1} > k_j. We can certainly do this because ∫f_k → 0. By (4), there
exist g_j ∈ L⁰ with g_j ≥ f_{k_j} and ∫g_j ≤ ∫f_{k_j} + 1/2^j. Pick an increasing sequence {ψ_{j,i}}_{i=1,2,...}
of step functions with lim_{i→∞} ψ_{j,i}(x) = g_j(x) for almost every x and lim_{i→∞} ∫ψ_{j,i} = ∫g_j,
for j = 1, 2, . . . . We can take these to be nonnegative, as otherwise we could just redefine
ψ_{j,i} = max{ψ_{j,i}, 0}.
Define ψ_i = ∑_{j=1}^i ψ_{j,i}; each ψ_i is a nonnegative step function. Then for i > N,
∑_{j=1}^N ψ_{j,i} ≤ ψ_i = ∑_{j=1}^i ψ_{j,i},
and
∫ψ_i = ∑_{j=1}^i ∫ψ_{j,i} ≤ ∑_{j=1}^i ∫g_j ≤ ∑_{j=1}^i (∫f_{k_j} + 1/2^j) ≤ ∑_{j=1}^i 1/2^{j−1} ≤ 2
for every i. Then ∫ψ_i is bounded and ψ_{i+1} ≥ ψ_i. By the main technical lemma 8.7, the
ψ_i(x) are bounded almost everywhere, i.e. for almost every x there exists M_x with
∑_{j=1}^N ψ_{j,i}(x) ≤ ψ_i(x) ≤ M_x
for all i > N. Letting i → ∞ gives ∑_{j=1}^N g_j(x) ≤ M_x for all N. Hence ∑_{j=1}^∞ g_j(x) ≤ M_x. This series
converges, so g_j(x) → 0 as j → ∞ for almost every x. Since 0 ≤ f_{k_j} ≤ g_j, we also have f_{k_j}(x) → 0 for
almost every x ∈ [a, b], which concludes the proof. □

10. 4/18
Recall that we defined L¹([a, b]) = {f : f = g − h, g, h ∈ L⁰[a, b]}.

Definition 10.1. Define ‖f‖_1 = ∫_{[a,b]} |f|.

Note that this is almost a norm: it has two of the three norm properties. We have
‖f_1 + f_2‖_1 = ∫_{[a,b]} |f_1 + f_2| ≤ ∫_{[a,b]} (|f_1| + |f_2|) = ‖f_1‖_1 + ‖f_2‖_1,
and ‖λf‖_1 = |λ| ‖f‖_1. However, for the third property, ‖f‖_1 = 0 implies only that f(x) = 0 for
almost every x ∈ [a, b]. It formally fails the third property, but it does not fail badly. We
say that ‖·‖_1 is a seminorm.

Remark. If f ∈ L¹[a, b], there exists a sequence {ζ_k} of step functions such that ‖f − ζ_k‖_1 →
0 and ζ_k(x) → f(x) for almost every x ∈ [a, b]. That is, the sequence converges in the seminorm
and also converges pointwise almost everywhere.
Proof. We have f = g − h with g, h ∈ L⁰. This means that g(x) = lim ϕ_k(x) for almost every
x and lim ∫ϕ_k = ∫g, with ϕ_{k+1} ≥ ϕ_k. Likewise, h(x) = lim ψ_k(x) for almost every x and
lim ∫ψ_k = ∫h with ψ_{k+1} ≥ ψ_k.
Take ζ_k = ϕ_k − ψ_k. Then ζ_k(x) = ϕ_k(x) − ψ_k(x) → g(x) − h(x) for almost every x. Also,
since g − ϕ_k ≥ 0 and h − ψ_k ≥ 0 almost everywhere, we have
∫_{[a,b]} |f − ζ_k| ≤ ∫_{[a,b]} |g − ϕ_k| + ∫_{[a,b]} |h − ψ_k| = ∫_{[a,b]} (g − ϕ_k) + ∫_{[a,b]} (h − ψ_k) → 0. □
Theorem 10.2 (Completeness of L¹). Suppose that {f_k} ⊂ L¹[a, b] is Cauchy with respect
to ‖·‖_1. (This means that for every ε > 0 there exists N such that ‖f_k − f_l‖_1 < ε for all
l > k ≥ N.) Then there exists f ∈ L¹[a, b] such that ‖f_k − f‖_1 → 0 (i.e. f_k → f with respect
to ‖·‖_1).
Proof. Consider applying the Cauchy property with ε = 2^{−j}. Then there exists k_j such
that ‖f_l − f_{k_j}‖_1 < 2^{−j} for all l ≥ k_j, and we can choose k_{j+1} > k_j for every j. Hence
‖f_{k_{j+1}} − f_{k_j}‖_1 < 2^{−j}.
By the above remark, there exists a step function ζ_j with ‖f_{k_j} − ζ_j‖_1 < 2^{−j}. Then we can
see that
‖ζ_{j+1} − ζ_j‖_1 = ‖ζ_{j+1} − f_{k_{j+1}} + f_{k_{j+1}} − f_{k_j} + f_{k_j} − ζ_j‖_1
≤ ‖ζ_{j+1} − f_{k_{j+1}}‖_1 + ‖f_{k_{j+1}} − f_{k_j}‖_1 + ‖f_{k_j} − ζ_j‖_1 < 2^{−j} + 2^{−j} + 2^{−j} ≤ 2^{−j+2}.
Take ζ_0 ≡ 0. Observe that we can always write a_+ = max{a, 0} ≥ 0 and a_− = max{−a, 0} ≥
0, so that a = a_+ − a_−; then
ζ_l = ∑_{j=1}^l (ζ_j − ζ_{j−1}) = ∑_{j=1}^l (ζ_j − ζ_{j−1})_+ − ∑_{j=1}^l (ζ_j − ζ_{j−1})_− = Φ_l − Ψ_l,
where Φ_l and Ψ_l are increasing sequences of step functions. Then
∫_{[a,b]} (Φ_l + Ψ_l) = ∫_{[a,b]} ∑_{j=1}^l |ζ_j − ζ_{j−1}| = ∑_{j=1}^l ‖ζ_j − ζ_{j−1}‖_1 ≤ C
for a constant C independent of l (the series ∑ 2^{−j+2} converges).
Then the first part of the main technical lemma 8.7 tells us that Φ_l(x) → g(x) and Ψ_l(x) →
h(x) almost everywhere, where g, h ∈ L⁰, and ∫_{[a,b]} g = lim ∫_{[a,b]} Φ_l and ∫_{[a,b]} h = lim ∫_{[a,b]} Ψ_l.
Therefore ζ_l(x) → g(x) − h(x) almost everywhere. Define f = g − h ∈ L¹. We know that this
is the pointwise limit of the ζ_l. In addition,
‖f − ζ_l‖_1 = ∫ |f − ζ_l| = ∫ |(g − Φ_l) − (h − Ψ_l)|
≤ ∫ |g − Φ_l| + ∫ |h − Ψ_l| = ∫ (g − Φ_l) + ∫ (h − Ψ_l) → 0.
Now,
‖f − f_{k_l}‖_1 = ∫ |f − ζ_l + ζ_l − f_{k_l}| ≤ ‖f − ζ_l‖_1 + ‖ζ_l − f_{k_l}‖_1 → 0.
On the other hand, we know that ∫ |f_l − f_{k_l}| = ‖f_l − f_{k_l}‖_1 → 0 as l → ∞ by the Cauchy
property. Hence,
‖f − f_l‖_1 ≤ ‖f − f_{k_l}‖_1 + ‖f_{k_l} − f_l‖_1 → 0.
We cooked up a function f, and this took some work to do: we needed to find an increasing
sequence of step functions, and then we checked that f actually satisfied the properties that we
wanted. This concludes the proof. □

There is an important corollary, called the Monotone Convergence Theorem.

Theorem 10.3 (Monotone Convergence Theorem). Suppose that we have an increasing
sequence of functions {f_k} ⊂ L¹, i.e. f_{k+1} ≥ f_k. Further suppose that ∫_{[a,b]} f_k is bounded.
Then there exists f ∈ L¹ such that f_k(x) → f(x) almost everywhere and ∫_{[a,b]} f_k → ∫_{[a,b]} f.

Proof. Take l > k. Then
‖f_k − f_l‖_1 = ∫_{[a,b]} (f_l − f_k) = ∫_{[a,b]} f_l − ∫_{[a,b]} f_k.
Also, ∫_{[a,b]} f_k is a bounded increasing sequence, so lim ∫_{[a,b]} f_k exists. This implies that {f_k}
is Cauchy. By completeness, we have therefore shown that there exists f ∈ L¹ such that
‖f_k − f‖_1 → 0, and in particular ∫_{[a,b]} f_k → ∫_{[a,b]} f.
Now, ‖f_k − f‖_1 → 0 means that ∫_{[a,b]} |f_k − f| → 0. By Property 9.4(6), there exists
a subsequence {f_{k_j}} with f_{k_j}(x) → f(x) almost everywhere; since the f_k are increasing, this means that
f_k(x) → f(x) almost everywhere. □
Corollary 10.4. Let {f_k} ⊂ L¹ with ∫_{[a,b]} |f_k| bounded, and assume also that there exists
f : [a, b] → R such that f(x) = lim f_k(x) for almost every x ∈ [a, b]. Then f ∈ L¹.

Remark. Caution: it may be false in general that ∫f_k → ∫f.

It is a very powerful fact that pointwise limits of L¹ functions (with bounded integrals) give an L¹
function. Nothing of this sort is true for the Riemann integral. This fact is extremely general.

11. 4/20
Today we will discuss linear functionals, and go back to the Lebesgue integral next time.
We first make some final remarks about orthogonality.

11.1. Final remarks on orthogonality. We work in a Hilbert space X. Let E be any
non-empty subset of X.
Definition 11.1. The orthogonal complement of E is
E ⊥ = {x ∈ X : (x, e) = 0 for all e ∈ E} .
We claim that E^⊥ is a closed linear subspace of X. Linearity should be an easy exercise; we
check that it is closed. Suppose that y_k → y with y_k ∈ E^⊥ for all k, and take any e ∈ E.
Then
|(y, e)| = |(y − y_k, e) + (y_k, e)| = |(y − y_k, e)| ≤ ‖y − y_k‖ ‖e‖ → 0,
so (y, e) = 0 and therefore y ∈ E^⊥.
Lemma 11.2. If M is a closed linear subspace of X then
(i) M^⊥ ∩ M = {0}.
(ii) Every x ∈ X can be written as x = y + z with y ∈ M and z ∈ M^⊥, and this representation is
unique.
(iii) (M^⊥)^⊥ = M.

Proof. We prove (i). If x ∈ M ∩ M^⊥ then (x, x) = 0, so that ‖x‖² = 0 and hence x = 0.
We prove (ii). Recall that there exists y ∈ M with ‖x − y‖ = min_{w∈M} ‖x − w‖. For any
scalar λ ≠ 0 and nonzero z ∈ M, the function t ↦ ‖x − y − tλz‖² has a minimum at t = 0. Expanding,
‖(x − y) − tλz‖² = ‖x − y‖² + t² ‖λz‖² − 2t Re(x − y, λz),
so minimality at t = 0 implies that Re((x − y, λz)) = Re(λ̄ (x − y, z)) = 0 for every scalar λ. Taking λ = (x − y, z)
gives |(x − y, z)|² = 0, hence (x − y, z) = 0, and so x − y ∈ M^⊥.
We still need to check uniqueness. Assume that x = y + z = ỹ + z̃ with y, ỹ ∈ M and
z, z̃ ∈ M^⊥. Subtracting, we see that y − ỹ = z̃ − z lies in both M and M^⊥, so by part (i),
y − ỹ = z̃ − z = 0, which implies uniqueness.
We prove (iii). Take x ∈ M. Then (x, w) = 0 for every w ∈ M^⊥. This says that
x ∈ (M^⊥)^⊥. Hence M ⊂ (M^⊥)^⊥. We should prove the reverse inclusion.
Take any x ∈ (M^⊥)^⊥. By part (ii), we can write x = y + z with y ∈ M and z ∈ M^⊥.
This tells us that 0 = (x, z) = (y, z) + (z, z) = (z, z) = ‖z‖². This means that z = 0, so
x = y ∈ M. Hence we've shown that (M^⊥)^⊥ ⊂ M. Therefore (M^⊥)^⊥ = M. □
The main content of this lemma was part (ii).
11.2. Linear functionals. Let X be any normed space.
Definition 11.3. f is a linear functional on X if f is a linear map f : X → R (if X is a real vector space) or
f : X → C (if X is a complex vector space).
Lemma 11.4. Let f be such a linear functional on X. The following are equivalent:
(1) f is continuous (at each point of X);
(2) f is continuous at 0;
(3) there exists M with |f(x)| ≤ M ‖x‖ for every x ∈ X.
When the final condition holds, we say that f is a bounded linear functional. The final
condition in fact gives Lipschitz continuity.
Proof. (1) ⟹ (2) is trivial.
We want to prove that (2) ⟹ (3). We use the ε-δ definition of continuity at 0 with
ε = 1. This means that there exists δ > 0 such that |f(x)| = |f(x) − f(0)| < 1 whenever
‖x‖ < δ. Take x ∈ X \ {0}. Then
|f( (δ/2) x/‖x‖ )| < 1,
so by linearity |f(x)| ≤ (2/δ) ‖x‖ for every x ∈ X. This proves (2) ⟹ (3).
Finally, we will show that (3) ⟹ (1). Take x, y ∈ X; then |f(x) − f(y)| = |f(x − y)| ≤
M ‖x − y‖, which implies continuity at y. □
From now on, we will usually refer to bounded linear functionals instead of continuous
linear functionals; this is more convenient for our purposes.
Definition 11.5. If f is a bounded linear functional on a nontrivial space X, we define
‖f‖ = sup{ |f(x)|/‖x‖ : x ∈ X \ {0} }.
This makes sense because f is a bounded linear functional, so the set above is
nonempty and bounded.
Definition 11.6. Let X ∗ be the set of all bounded linear functionals on X.
Note that X ∗ is a linear space.
Proposition 11.7. ‖·‖ is a norm on X*.

Proof. We check each of the three properties:
(1) ‖λf‖ = |λ| ‖f‖.
(2) ‖f + g‖ ≤ ‖f‖ + ‖g‖.
(3) ‖f‖ ≥ 0, and ‖f‖ = 0 if and only if f = 0. □
Proposition 11.8. X ∗ is a Banach space.
We will prove this proposition later, but it is fairly elementary, so try to do this on your
own.
Theorem 11.9 (Riesz(-Fréchet) representation theorem). Suppose that X is any Hilbert
space and f is any bounded linear functional on X. Then there exists y ∈ X such that
f(x) = (x, y) for all x ∈ X, and such y is unique.

Proof. First, we consider uniqueness. Suppose (x, y) = (x, ỹ) for all x ∈ X. Then (x, y − ỹ) =
0 for every x ∈ X. In particular, choose x = y − ỹ, so that ‖y − ỹ‖² = (y − ỹ, y − ỹ) = 0, so
that y = ỹ.
Now, let K = ker f = {x ∈ X : f(x) = 0}. This is clearly a linear subspace of X. We
claim that it is closed. Suppose that y_k → y with y_k ∈ K for every k. Then f(y) = lim f(y_k)
by continuity of f at y, but f(y_k) = 0 for every k. Hence f(y) = 0 and hence y ∈ K.
There are two possibilities. If f = 0 then K = X, and we can take y = 0. We can
therefore safely assume that f is not the zero functional. This means that K ≠ X, so there
exists a point in X \ K, and by Lemma 11.2(ii) we can choose z ∈ K^⊥ \ {0}; note f(z) ≠ 0
since z ∉ K. Observe that
f(x − (f(x)/f(z)) z) = f(x) − (f(x)/f(z)) f(z) = 0,
which means that x − (f(x)/f(z)) z ∈ K, so it is orthogonal to z, i.e.
(x − (f(x)/f(z)) z, z) = (x, z) − (f(x)/f(z)) ‖z‖² = 0,
so hence
(x, z) = (f(x)/f(z)) ‖z‖²,
and hence
f(x) = (x, (\overline{f(z)}/‖z‖²) z),
so we can simply choose y = (\overline{f(z)}/‖z‖²) z. □
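An added finite-dimensional illustration (Python) of the Riesz representation theorem on the real Hilbert space Rⁿ: the representing vector is y = (f(e_1), . . . , f(e_n)) in the standard orthonormal basis. The particular functional is my own example:

```python
import numpy as np

rng = np.random.default_rng(4)
w = rng.normal(size=4)
f = lambda x: float(w @ x)                   # an arbitrary bounded linear functional

E = np.eye(4)                                # standard orthonormal basis e_1, ..., e_4
y = np.array([f(E[i]) for i in range(4)])    # recover the representing vector

x = rng.normal(size=4)
assert np.isclose(f(x), x @ y)               # f(x) = (x, y)
```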

12. 4/22
Let’s finish our discussion of the Lebesgue integral. Recall that we proved the Monotone
Convergence Theorem 10.3.
This has a very useful corollary:

Corollary 12.1. If we have a given function f : [a, b] → R and {f_k}_{k=1,2,...} ⊂ L¹([a, b]) with f_k → f
almost everywhere, and {∫_{[a,b]} |f_k|}_{k=1,2,...} is bounded, then f ∈ L¹.

This has remarkably few hypotheses.


Proof. The proof is two applications of the Monotone Convergence Theorem. Without loss
of generality, we may assume f_k ≥ 0 for all k, because otherwise we can write f_k = (f_k)_+ − (f_k)_−,
with (f_k)_+ and (f_k)_− nonnegative L¹ functions converging to f_+ and f_− respectively.
Fix k and define the increasing sequence
F_{k,l} = − min{f_k, . . . , f_{k+l}}, l = 1, 2, . . . .
Each of these is bounded above by zero, so the integrals are bounded and the Monotone Convergence
Theorem applies. There is therefore some F_k = lim_{l→∞} F_{k,l} ∈ L¹([a, b]), namely
F_k(x) = − inf{f_k(x), f_{k+1}(x), . . . }.
Now −F_k = inf_{j≥k} f_j is itself an increasing sequence in k, bounded above by the f_k, with ∫(−F_k) ≤ ∫f_k
bounded. We can again apply the Monotone Convergence
Theorem: lim(−F_k) = f almost everywhere, so f ∈ L¹([a, b]). □
Definition 12.2. Define L²([a, b]) to be the set of functions f : [a, b] → R such that both
f, f² ∈ L¹.

Proposition 12.3. L²([a, b]) is a linear space.

Proof. It is clear that if f ∈ L² and λ ∈ R, then trivially λf ∈ L².
What's less obvious is that if f, g ∈ L² then f + g ∈ L². We should check this. We are
given f, f², g, g² ∈ L¹. There exists a sequence ϕ_k of step functions with ϕ_k → f almost
everywhere. Likewise, there exists a sequence ψ_k of step functions with ψ_k → f² almost
everywhere. We can arrange to have ∫|f − ϕ_k| → 0 and ∫|f² − ψ_k| → 0. Without
loss of generality, we can take ψ_k ≥ 0 because otherwise, replace ψ_k by max{ψ_k, 0}.
Recall that
sgn h(x) = 1 if h(x) > 0, −1 if h(x) < 0, and 0 if h(x) = 0.
Let Ψ_k = sgn(ϕ_k) √ψ_k. Then Ψ_k(x) → f(x) almost everywhere, and Ψ_k² ≤ ψ_k, which
implies that ∫_{[a,b]} Ψ_k² is bounded (in fact, ∫Ψ_k² ≤ ∫ψ_k).
Similarly, there exists a sequence Φ_k of step functions with Φ_k(x) → g(x) almost everywhere
and ∫_{[a,b]} Φ_k² bounded. Therefore, (Φ_k + Ψ_k)² → (f + g)² almost everywhere. Also,
(Φ_k + Ψ_k)² ≤ 2(Φ_k² + Ψ_k²), so ∫(Φ_k + Ψ_k)² is bounded.
Finally, clearly f + g ∈ L¹, and Corollary 12.1 implies that (f + g)² ∈ L¹. □
Suppose that f, g ∈ L²[a, b]. Then fg = ¼((f + g)² − (f − g)²) ∈ L¹. We can therefore
define a semi-inner product:

Definition 12.4.
(f, g)_{L²} = ∫_{[a,b]} fg
for any f, g ∈ L².

We should check the properties. Trivially, this is linear in f, and (f, g) = (g, f). It mildly
fails the last property: it is true that (f, f) ≥ 0, but (f, f) = 0 only implies that f = 0 almost
everywhere. We can define the L²-seminorm:

Definition 12.5. ‖f‖_2 = ‖f‖_{L²} = √(f, f).

The triangle inequality still holds: ‖f + g‖_{L²} ≤ ‖f‖_{L²} + ‖g‖_{L²}.
Happily, that's enough to show the Cauchy–Schwarz inequality:

Proposition 12.6. |(f, g)_{L²}| ≤ ‖f‖_{L²} ‖g‖_{L²}. Equivalently,
|∫_{[a,b]} fg| ≤ √(∫_{[a,b]} f²) √(∫_{[a,b]} g²).

We proved that L¹ is complete with respect to the L¹-seminorm. We claim that this is
also true in the L² case.

Proposition 12.7. L² is complete with respect to the seminorm ‖·‖_2; that is, if {f_k} ⊂
L²[a, b] is a Cauchy sequence with respect to this seminorm ‖·‖_2 (i.e. for every ε > 0 there
exists N such that ‖f_k − f_l‖_2 < ε for every k > l ≥ N), then there exists f ∈ L²[a, b] with
‖f_k − f‖_2 → 0.

Proof. We can always write f_k = (f_k)_+ − (f_k)_−, so hence without loss of generality we can take
f_k ≥ 0. Then by the Cauchy–Schwarz inequality,
‖f_k² − f_l²‖_1 = ∫_{[a,b]} |f_k² − f_l²| = ∫_{[a,b]} |f_k − f_l|(f_k + f_l) ≤ ‖f_k − f_l‖_2 ‖f_k + f_l‖_2.
If we use the Cauchy sequence property with ε = 1, then there exists some N_1 such that
‖f_k − f_l‖_2 < 1 for every k > l ≥ N_1. In particular, this means that ‖f_k − f_{N_1}‖_2 ≤ 1 for
every k ≥ N_1. By the triangle inequality, we can then write
‖f_k‖_2 = ‖f_k − f_{N_1} + f_{N_1}‖_2 ≤ ‖f_k − f_{N_1}‖_2 + ‖f_{N_1}‖_2 ≤ 1 + ‖f_{N_1}‖_2
for all k ≥ N_1, so ‖f_k‖_2 is bounded. Hence ‖f_k + f_l‖_2 ≤ ‖f_k‖_2 + ‖f_l‖_2 ≤ C, so therefore
‖f_k² − f_l²‖_1 ≤ Cε
for all k > l ≥ N. Therefore {f_k²} is Cauchy with respect to the L¹-seminorm ‖·‖_1. There-
fore, there exists h ∈ L¹([a, b]) with ∫|f_l² − h| → 0, and there exists a subsequence
{f_{k_j}} with f_{k_j}² → h almost everywhere.
Also, by Cauchy–Schwarz,
‖f_k − f_l‖_1 = ∫_{[a,b]} |f_k − f_l| · 1 ≤ ‖f_k − f_l‖_2 · √(b − a),
i.e. {f_k} is a Cauchy sequence in L¹, and so hence there exists f ∈ L¹ with ‖f_k − f‖_1 → 0.
There is again a subsequence {f_{l_j}} with f_{l_j}(x) → f(x) for almost every x. We can take
{f_{l_j}} to be a subsequence of the {f_{k_j}}, because {f_{k_j}} is also Cauchy in L¹. Therefore we have a
subsequence {f_{l_j}} such that f_{l_j} → f almost everywhere (with f ∈ L¹) and f_{l_j}² → h with
h ∈ L¹. Hence h = f² almost everywhere, and hence f² ∈ L¹. Therefore f ∈ L².
We need to check convergence in the L² seminorm. By Cauchy–Schwarz and f_k, f ≥ 0,
‖f_k − f‖_2² = ∫_{[a,b]} (f_k − f)² = ∫_{[a,b]} |f_k − f| |f_k − f|
≤ ∫_{[a,b]} |f_k − f|(f_k + f) = ‖f_k² − f²‖_1 → 0,
so hence L² is a complete space. □
There are still a few points to clean up, such as the proof of the technical Lemma 8.7.
Also, we'll turn the seminorm into an actual norm through an underhanded trick.
13. 4/25
We will finish the discussion of the Lebesgue integral. We want to get around the difficulty
that our supposed “norm” is only a seminorm.
Recall that for f ∈ L¹[a, b] we have the seminorm ‖f‖_1 = ∫_{[a,b]} |f|. Similarly, for f ∈
L²[a, b] we have a semi-inner product (f, g) = ∫_{[a,b]} fg.
The problem is that ‖f‖_1 = 0 or ‖f‖_2 = 0 implies only that f(x) = 0 almost everywhere.
To get around this, instead of considering functions, we consider classes of functions.

Definition 13.1. The L¹ class of f is defined as
f̄ = { g ∈ L¹[a, b] : g(x) = f(x) for almost every x ∈ [a, b] }.

Note that g ∈ f̄ if and only if f ∈ ḡ if and only if f̄ = ḡ. Treat these as elements in a new
space instead of thinking of them as classes.
We can now define f̄ + ḡ = (f + g)‾. We should check that this makes sense. Indeed, if
f_1 ∈ f̄ and g_1 ∈ ḡ, then certainly f_1 + g_1 ∈ (f + g)‾. Similarly, λf̄ = (λf)‾. Then we can define:

Definition 13.2. L̄¹[a, b] = { f̄ : f ∈ L¹[a, b] } is a linear space.

We can now define:

Definition 13.3. ∫_{[a,b]} f̄ = ∫_{[a,b]} f and ‖f̄‖_1 = ‖f‖_1 = ∫_{[a,b]} |f|.

Proposition 13.4. L̄¹[a, b] is a normed (in fact Banach) space.

Proof. We just need to check the properties: ‖λf̄‖_1 = |λ| ‖f̄‖_1 and ‖f̄ + ḡ‖_1 ≤ ‖f̄‖_1 + ‖ḡ‖_1.
Also, ‖f̄‖_1 = 0 implies that ‖f‖_1 = 0, so that f(x) = 0 almost everywhere, and so f̄ = 0̄ =
{ f ∈ L¹[a, b] : f(x) = 0 for a.e. x ∈ [a, b] }, the zero element. □
Similarly, we can define
Definition 13.5. L̄²[a, b] = { f̄ : f ∈ L²[a, b] }, with (f̄, ḡ) = (f, g) = ∫_{[a,b]} fg
and norm ‖f̄‖_2 = √(f̄, f̄). This actually is now an inner product, so this is now a genuine
Hilbert space.
To finish off the Lebesgue theory, we still need to finish the proof of our main technical
lemma 8.7.

Lemma 13.6 (Lemma 8.7, part (1)). If {ψ_k} is an increasing sequence of step functions on [a, b]
with {∫_{[a,b]} ψ_k} bounded, then {ψ_k(x)} is bounded (and hence convergent) for almost every
x ∈ [a, b].
Proof of Lemma 8.7. Let S = {x ∈ [a, b] : {ψ_k(x)} is not bounded}. Since the sequence is increasing, for x ∈ S
we have lim_{k→∞} ψ_k(x) = ∞. We want to prove that S has measure zero.
Let's agree to use the notation ψ̃_k(x) = ψ_k(x) − ψ_1(x). Notice that this is an increasing sequence of nonnegative
step functions with ψ̃_1 ≡ 0 and ∫_{[a,b]} ψ̃_k ≤ C for some constant C. Equivalently, S = {x ∈ [a, b] : {ψ̃_k(x)} is not bounded}.
Take an arbitrary α > 0. Let S_k = {x ∈ [a, b] : ψ̃_k(x) > α}. Then S_{k+1} ⊃ S_k for all k, and
S_1 = ∅. We can therefore write
S_N = (S_N \ S_{N−1}) ∪ (S_{N−1} \ S_{N−2}) ∪ · · · ∪ (S_2 \ S_1) = ∪_{k=1}^{N−1} (S_{k+1} \ S_k),
where S_{k+1} \ S_k = { x : ψ̃_k(x) ≤ α < ψ̃_{k+1}(x) }.
Recall that the ψ̃_k are step functions. For each k = 1, 2, . . . , take a partition a = x_0^(k) <
x_1^(k) < · · · < x_{N_k}^(k) = b compatible with both ψ̃_k and ψ̃_{k+1} (e.g. a common refinement), so that both
functions are constant on each interval of the partition, and define P_k = {(x_{i−1}^(k), x_i^(k)) : i = 1, . . . , N_k}. Then S_{k+1} \ S_k is,
up to partition points, a union of intervals of P_k:
∪Q_k ⊂ S_{k+1} \ S_k ⊂ (∪Q_k) ∪ {x_0^(k), . . . , x_{N_k}^(k)},
where Q_k is some subcollection of P_k.
Therefore,
∪_{k=1}^{N−1} ∪Q_k ⊂ S_N ⊂ ∪_{k=1}^{N−1} ( (∪Q_k) ∪ {x_0^(k), . . . , x_{N_k}^(k)} ).
Since ψ̃_N > α on each S_{k+1} \ S_k for k ≤ N − 1, and the intervals appearing in Q_1, . . . , Q_{N−1} are pairwise disjoint,
ψ̃_N ≥ α ∑_{k=1}^{N−1} ∑_{I∈Q_k} χ_I.
We can now integrate this. This implies that
C ≥ ∫_{[a,b]} ψ̃_N ≥ α ∑_{k=1}^{N−1} ∑_{I∈Q_k} |I|,
with the upper bound C independent of N. Hence
α ∑_{k=1}^{∞} ∑_{I∈Q_k} |I| ≤ C,
i.e. the intervals in all the Q_k together have total length ≤ C/α.
Notice that by our preceding inclusions, we have
S_N ⊂ (∪_{k=1}^{N−1} ∪Q_k) ∪ E_N,
where E_N is a finite set. Then
∪_{N=1}^{∞} S_N ⊂ (∪_{N=1}^{∞} ∪_{k=1}^{N−1} ∪Q_k) ∪ (∪_{N=1}^{∞} E_N).
Note that ∪_{N=1}^{∞} E_N is countable and hence has measure zero, and the intervals of the Q_k have total
length ≤ C/α.
Observe that if we take a point y ∈ S then ψ̃_k(y) → ∞, so y ∈ S_k for all sufficiently large k; hence S ⊂
∪_{N=1}^{∞} S_N.
We can now finish the proof. Let ε > 0 be given. Choose α > C/ε, so that the intervals of the Q_k have total length < ε. Then select open intervals J_1, J_2, . . .
with ∪_{N=1}^{∞} E_N ⊂ ∪_{j=1}^{∞} J_j and ∑_{j=1}^{∞} |J_j| < ε. Then
S ⊂ (∪_{k=1}^{∞} ∪Q_k) ∪ (∪_{j=1}^{∞} J_j),
a countable collection of open intervals of total length < 2ε. Since ε was arbitrary, S has measure zero, and we are done. □


This was the hardest proof of the Lebesgue theory, which shouldn’t be surprising since
everything followed from it.

13.1. Linear operators. Now we will consider the next topic, which is linear operators.
Definition 13.7. Let E and F be normed spaces. Then T : E → F is a linear operator (i.e.
linear map) if T (αx + βy) = αT (x) + βT (y) for all x, y ∈ E and for all scalars α, β.
Definition 13.8. We say that T is bounded if there exists M such that ‖T(x)‖ ≤ M ‖x‖
for all x ∈ E.
We should be somewhat careful here; ‖T(x)‖ is a norm in F, while ‖x‖ is a norm in E.
In the same way as for the linear functionals (as in Lemma 11.4), we get the following
theorem:
Theorem 13.9. T is continuous at each point of E if and only if T is continuous at 0 if
and only if T is bounded.

14. 4/29
14.1. Bounded linear operators. We started our discussion of bounded linear operators.
Let E and F be normed spaces, and suppose that T : E → F is linear. Then T is bounded
means that there exists M with ‖T(x)‖ ≤ M ‖x‖ for every x ∈ E. In this case, we can define
the operator norm
‖T‖ = sup{ ‖T(x)‖/‖x‖ : x ∈ E \ {0} } = sup{ ‖T(x)‖ : ‖x‖ = 1 }.

Exercise 14.1. Check that ‖T‖ is a norm on L(E, F) (defined below).
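An added numerical sketch (Python; the finite-dimensional setting E = F = R³ with a random matrix is my own example): for a matrix, the operator norm defined above is the largest singular value, and sampling the unit sphere approaches it from below:

```python
import numpy as np

rng = np.random.default_rng(5)
T = rng.normal(size=(3, 3))

xs = rng.normal(size=(200_000, 3))
xs /= np.linalg.norm(xs, axis=1, keepdims=True)     # points with ||x|| = 1

sampled = np.linalg.norm(xs @ T.T, axis=1).max()    # sup of ||T x|| over samples
exact = np.linalg.norm(T, 2)                        # spectral norm (largest singular value)
print(sampled, exact)                               # sampled <= exact, and nearly equal
```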
A special case of this discussion is when F = R or C, when T is just a linear functional.
Note that T is continuous if and only if T is continuous at 0 if and only if T is bounded.
Definition 14.2. Let L(E, F ) be the set of all bounded linear maps E → F , equipped with
the operator norm.
Lemma 14.3. L(E, F ) is a Banach space provided that F is a Banach space.
The proof will be very similar to all of the other completeness proofs that we’ve seen.
Proof. L(E, F ) is clearly a normed linear space, so we just have to check completeness. Let
{Tk }k=1,2,... be any Cauchy sequence in L(E, F ). That is, let ε > 0; then there exists N
such that kTk − Tl k < ε for every l > k ≥ N . This implies that kTk (x) − Tl (x)k ≤ ε kxk for
every x ∈ E. Take any fixed x; this means that {Tk (x)}k=1,2,... is Cauchy in F and hence
convergent (since F is a Banach space) to some limit which we call T (x).
Now,
\[
T(\lambda x + \mu y) = \lim_{k\to\infty} T_k(\lambda x + \mu y) = \lim_{k\to\infty} (\lambda T_k(x) + \mu T_k(y)) = \lambda T(x) + \mu T(y).
\]
Hence, $T : E \to F$ is linear. Now, $\lim_{l\to\infty} (T_k(x) - T_l(x)) = T_k(x) - T(x)$.

Recall that if $\lim z_k = z$ then $\lim \|z_k\| = \|z\|$, because $|\,\|z_k\| - \|z\|\,| \le \|z_k - z\|$. Therefore, $\lim_{l\to\infty} \|T_k(x) - T_l(x)\| = \|T_k(x) - T(x)\|$, and thus $\|T_k(x) - T(x)\| \le \varepsilon \|x\|$ for each $x \in E$ and for all $k \ge N$. This says that $T_k - T \in L(E, F)$, with $\|T_k - T\| \le \varepsilon$ for all $k \ge N$. Since $L(E, F)$ is a linear space, we see that $T = T_k - (T_k - T) \in L(E, F)$; furthermore, $\|T_k - T\| \le \varepsilon$ for all $k \ge N$ implies that $\lim_{k\to\infty} \|T_k - T\| = 0$. This concludes the proof. □
Recall that E ∗ is the set of all bounded linear functionals on E, i.e. it is the set of all
bounded linear maps E → R if E is a real vector space, and it is the set of all bounded linear
maps E → C if E is a complex vector space. Since R and C are complete, we can apply the
previous lemma to get the following corollary:
Corollary 14.4. Let E be a normed space. Then E ∗ is complete.
Proposition 14.5. Let E, F, G be three normed spaces, and suppose S ∈ L(E, F ) and
T ∈ L(F, G). Then T ◦ S ∈ L(E, G) and kT ◦ Sk ≤ kT k kSk.
Proof. We check this claim. Here, kT ◦ S(x)k = kT (S(x))k ≤ kT k kS(x)k ≤ kT k kSk kxk
for all x ∈ E. The desired result now follows. 
In particular, when T ∈ L(E, E), this result allows us to consider the nth power T n =
T ◦ · · · ◦ T , and by induction, kT n k ≤ kT kn . Notice that equality needn’t hold. We might
even get that T n = 0. We use the convention that T 0 = I is the identity.
14.2. Inverses. Now we discuss inverses.
Definition 14.6. T ∈ L(E, F ) is invertible if there exists S ∈ L(F, E) with T ◦ S = IF and
S ◦ T = IE .
Remark. There exists a linear S (not necessarily bounded) with T ◦ S = IF and S ◦ T = IE
if and only if T is both one-to-one and onto.
Theorem 14.7 (Bounded inverse theorem). Such an S is automatically bounded if E and
F are Banach spaces.
We will not prove this theorem in this course.
Remark. Suppose that T ∈ L(E, F ). If there exists S such that T ◦ S = IF then T is onto.
If there exists S such that S ◦ T = IE then T is one-to-one. These statements are not the
same. Here’s a nice example:
Example 14.8. Let E = F = `2 , and define S to be the shift operator S(x1 , x2 , . . . ) =
(0, x1 , x2 , . . . ). This is one-to-one but not onto. There is also a reverse shift operator, so
that S̃(x1 , x2 , . . . ) = (x2 , x3 , . . . ) is onto but not one-to-one.
Note that $\tilde S \circ S = 1_{\ell^2}$ is the identity, but $S \circ \tilde S \ne 1_{\ell^2}$.
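As a quick numerical aside (an added sketch, not from the lecture): truncating $\ell^2$ to finitely many coordinates and acting on sequences with trailing zeros, so that the truncation loses nothing, one can see the two compositions behave differently.

import numpy as np

n = 6  # truncation dimension: a finite-dimensional stand-in for l^2

# Forward shift S(x1,x2,...) = (0,x1,x2,...) and reverse shift
# S~(x1,x2,...) = (x2,x3,...), truncated to n coordinates.
S = np.eye(n, k=-1)      # ones on the subdiagonal
S_rev = np.eye(n, k=1)   # ones on the superdiagonal

x = np.array([1.0, 2.0, 3.0, 0.0, 0.0, 0.0])  # trailing zeros, so no truncation loss
print(S_rev @ (S @ x))   # [1 2 3 0 0 0] = x:  S~ o S acts as the identity
print(S @ (S_rev @ x))   # [0 2 3 0 0 0] != x: S o S~ deletes the first entry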
Here is a very useful theorem about finding inverses.
Theorem 14.9. Suppose that E is Banach and T ∈ L(E, E) with kT k < 1. Then I − T is
invertible. In fact,
\[
(I - T)^{-1} = \sum_{n=0}^{\infty} T^n.
\]
Here, we have $\sum_{n=0}^{\infty} T^n = \lim_{N\to\infty} \sum_{n=0}^{N} T^n$ with the limit taken in the operator norm.
Proof. Suppose that $M > N$. Then
\[
\left\| \sum_{n=0}^{M} T^n - \sum_{n=0}^{N} T^n \right\| = \left\| \sum_{n=N+1}^{M} T^n \right\| \le \sum_{n=N+1}^{M} \|T\|^n \le \sum_{n=N+1}^{\infty} \|T\|^n = \|T\|^{N+1} \left( \frac{1}{1 - \|T\|} \right) \to 0
\]
as $N \to \infty$, which means that $\left\{ \sum_{n=0}^{N} T^n \right\}_{N=1,2,\dots}$ is a Cauchy sequence with respect to the operator norm, and is hence convergent (since $L(E, E)$ is complete). Hence, our infinite sum actually makes sense.
Now
\[
(I - T) \circ \sum_{n=0}^{\infty} T^n = \sum_{n=0}^{\infty} T^n - \sum_{n=1}^{\infty} T^n = I,
\]
and similarly, $\left( \sum_{n=0}^{\infty} T^n \right) \circ (I - T) = I$. □
This is a very useful fact.
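As a numerical sanity check (an added sketch; the matrix and the rescaling factor are arbitrary choices), we can watch the Neumann series $\sum T^n$ converge to $(I - T)^{-1}$ for a small matrix with $\|T\| < 1$:

import numpy as np

rng = np.random.default_rng(0)
T = rng.standard_normal((4, 4))
T *= 0.9 / np.linalg.norm(T, 2)    # rescale so the operator norm is 0.9 < 1

exact = np.linalg.inv(np.eye(4) - T)

partial = np.zeros((4, 4))
power = np.eye(4)                  # T^0
for n in range(200):               # sum_{n=0}^{199} T^n
    partial += power
    power = power @ T

# The partial sums converge to (I - T)^{-1} in the operator norm;
# the remainder is bounded by ||T||^200 / (1 - ||T||), roughly 1e-8 here.
print(np.linalg.norm(partial - exact, 2))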
15. 5/2
Last time we proved that if $E$ is a Banach space, $A \in L(E, E)$, and $\|A\| < 1$, then $I - A$ is invertible (i.e. $(I - A)^{-1} \in L(E, E)$). In fact, $(I - A)^{-1} = \sum_{n=0}^{\infty} A^n$ in the operator norm.
Corollary 15.1. If E and F are Banach spaces, the set of all invertible A ∈ L(E, F ) is an
open subset of L(E, F ).
Proof. We claim that if $A \in L(E, F)$ is invertible then $B \in L(E, F)$ is invertible for every $B \in L(E, F)$ such that $\|B - A\| < \frac{1}{\|A^{-1}\|}$. First, we check this claim.
Observe that if $\|B - A\| < \frac{1}{\|A^{-1}\|}$ then $\|(A - B) \circ A^{-1}\| \le \|B - A\| \|A^{-1}\| < 1$. By Theorem 14.9, $I - (A - B) \circ A^{-1}$ is then invertible, and hence so is the composition $(I - (A - B) \circ A^{-1}) \circ A = B$ of invertible operators.
This proves the claim and hence the corollary. □

15.1. Adjoints. We briefly digress and talk about adjoints.


Definition 15.2. If $H$ and $K$ are Hilbert spaces and $A \in L(H, K)$ is a bounded linear operator, then there exists a unique operator $A^* \in L(K, H)$ with $(A(x), y) = (x, A^*(y))$ for every $x \in H$ and $y \in K$. Here, $A^*$ is called the adjoint operator.
This is a direct consequence of the Riesz representation theorem.
Proof. Homework 5, problem 3 handled the real case. The complex case is almost exactly the same, except we need to check that linearity works out correctly. We can do this:
\[
(x, A^*(\lambda y + \mu w)) = (A(x), \lambda y + \mu w) = \bar\lambda (A(x), y) + \bar\mu (A(x), w) = \bar\lambda (x, A^* y) + \bar\mu (x, A^* w) = (x, \lambda A^* y) + (x, \mu A^* w)
\]
for every $x \in H$ and $y, w \in K$, so that $(x, A^*(\lambda y + \mu w) - (\lambda A^* y + \mu A^* w)) = 0$ for all $x \in H$, which means that $A^*(\lambda y + \mu w) = \lambda A^*(y) + \mu A^*(w)$.
We also want to say that $A^*$ is unique. If $(x, A^* y - \tilde A^* y) = 0$ for all $x \in H$ and $y \in K$, then $A^* y = \tilde A^* y$ for all $y$, and hence $A^*$ is unique. □
Proposition 15.3. A∗ is bounded, kA∗ k = kAk, and in fact, (A∗ )∗ = A.
Proof. Take $x = A^* y$ in the definition of the adjoint, and use the Cauchy–Schwarz inequality to see that
\[
\|A^* y\|^2 = (A(A^*(y)), y) \le \|A(A^*(y))\| \|y\| \le \|A\| \|A^* y\| \|y\|,
\]
which means that $\|A^* y\| \le \|A\| \|y\|$ for every $y \in K$; this tells us that $A^*$ is bounded and $\|A^*\| \le \|A\|$.
In addition, we also have $(A^* y, x) = (y, Ax)$. This shows that $(A^*)^* = A$, and applying the first part to $A^*$ gives $\|A\| = \|(A^*)^*\| \le \|A^*\|$. □
Let’s consider what this looks like in finite dimensional space.
Example 15.4. Suppose that $H = \mathbb{C}^n$ and $K = \mathbb{C}^m$. Then $A \in L(\mathbb{C}^n, \mathbb{C}^m)$ is the same as saying that $A$ is given by an $m \times n$ matrix $(a_{ij})$. Given $z = (z_1, \dots, z_n)$, we have
\[
Az = \sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij} z_j e_i.
\]
Suppose that $w = (w_1, \dots, w_m)$. In this case, the adjoint property says that
\[
(A(z), w) = \sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij} z_j \bar w_i = \sum_{j=1}^{n} z_j \overline{\left( \sum_{i=1}^{m} \bar a_{ij} w_i \right)}.
\]
Here, $A^*$ is given by
\[
(A^* w)_j = \sum_{i=1}^{m} \bar a_{ij} w_i,
\]
which means that $A^* = (\bar a_{ij})^T$, the conjugate transpose.
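A quick numerical check (an added sketch): with $A^*$ taken to be the conjugate transpose, the adjoint identity $(Az, w) = (z, A^* w)$ holds for random complex vectors.

import numpy as np

rng = np.random.default_rng(1)
m, n = 3, 4
A = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
A_star = A.conj().T                        # the adjoint: conjugate transpose

z = rng.standard_normal(n) + 1j * rng.standard_normal(n)
w = rng.standard_normal(m) + 1j * rng.standard_normal(m)

inner = lambda u, v: np.sum(u * v.conj())  # (u, v), conjugate-linear in v
print(np.isclose(inner(A @ z, w), inner(z, A_star @ w)))  # True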
Definition 15.5. A ∈ L(H, H) is self-adjoint (or in the complex case, Hermitian) if A∗ = A.
In the finite dimensional case $H = \mathbb{C}^n$, this means that $(\bar a_{ij})^T = (a_{ij})$, i.e. $a_{ij} = \bar a_{ji}$ for all $i$ and $j$. This also means that the diagonal entries are real.
Proposition 15.6. If A ∈ L(H, H) is self-adjoint then (Ax, x) is always real.
Proof. This is because $(Ax, x) = (x, Ax) = \overline{(Ax, x)}$ by the self-adjointness property, so $(Ax, x)$ equals its own conjugate. □
Proposition 15.7. If $A \in L(H, H)$ is self-adjoint, then
\[
\|A\| = \sup_{x \ne 0} \frac{\|Ax\|}{\|x\|} = \sup_{x \ne 0} \frac{|(Ax, x)|}{\|x\|^2}.
\]
Proof. Let $m = \sup_{x \ne 0} \frac{|(Ax, x)|}{\|x\|^2}$. Notice that Cauchy–Schwarz implies that $m \le \sup_{x \ne 0} \frac{\|Ax\|}{\|x\|} = \|A\|$. We need to prove the reverse inequality.
Note that $(A(x + y), x + y) = (A(x), x) + (A(y), y) + (A(y), x) + (A(x), y)$. Here, notice that $(A(y), x) = (y, A(x)) = \overline{(A(x), y)}$. Then we have
\begin{align*}
(A(x + y), x + y) &= (A(x), x) + (A(y), y) + 2 \operatorname{Re}(A(x), y) \\
(A(x - y), x - y) &= (A(x), x) + (A(y), y) - 2 \operatorname{Re}(A(x), y).
\end{align*}
Subtracting, we get that
\[
4 \operatorname{Re}(A(x), y) = (A(x + y), x + y) - (A(x - y), x - y) \le m(\|x + y\|^2 + \|x - y\|^2) = 2m(\|x\|^2 + \|y\|^2).
\]
Suppose that $A(x) \ne 0$. Then choose $y = \frac{\|x\|}{\|Ax\|} A(x)$ to get $4\|x\|\|A(x)\| \le 4m\|x\|^2$. Therefore, $\|Ax\| \le m\|x\|$ for every $x$, so $\|A\| \le m$, which means that $\|A\| = m$. □
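Numerically (an added sketch, using the fact that for a Hermitian matrix the supremum of $|(Ax, x)|/\|x\|^2$ is the largest absolute eigenvalue), we can compare the two quantities:

import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))
A = (B + B.conj().T) / 2             # Hermitian: A* = A

op_norm = np.linalg.norm(A, 2)       # largest singular value = ||A||
eigs = np.linalg.eigvalsh(A)         # real, since A is Hermitian
print(np.isclose(op_norm, np.max(np.abs(eigs))))  # True

# Sampled Rayleigh quotients |(Ax,x)|/||x||^2 never exceed ||A||:
xs = rng.standard_normal((1000, 5)) + 1j * rng.standard_normal((1000, 5))
rq = np.abs(np.einsum('ki,ij,kj->k', xs.conj(), A, xs)) / np.sum(np.abs(xs)**2, axis=1)
print(rq.max() <= op_norm + 1e-12)   # True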
16. 5/4
Today we will talk about the spectrum. Suppose that E is a complex Banach space and
suppose that we have a bounded linear map A ∈ L(E, E).
16.1. Spectrum.
Definition 16.1. The spectrum of A is σ(A) = {λ ∈ C : λI − A is not invertible}.
Recall that we say that A ∈ L(E, E) is invertible if A is 1-1, onto, and A−1 ∈ L(E, E).
Remark. {λ : λI − A is not 1:1} ∪ {λ : λI − A is not onto} ⊂ σ(A).
If λI − A is not 1:1, then there exists x 6= 0 with (λI − A)(x) = 0, i.e. Ax = λx. Here,
λ is an eigenvalue, and this means that all eigenvalues are in σ(A). Note that the spectrum
might be much bigger, however.
Example 16.2. Let E = Cn with the usual inner product norm. A ∈ L(E, E) is given by
an n × n matrix, so σ(A) is exactly the set of eigenvalues.
Example 16.3. Let E = `2C . Take any z = (z1 , z2 , . . . ) ∈ `2C . Let S be the shift oper-
ator S(z) = (0, z1 , z2 , . . . ). Clearly, S ∈ L(E, E); in fact, it is norm preserving. It has
no eigenvalues. To see this, suppose that there were some $z \ne 0$ such that $Sz = \lambda z$. Then $(0, z_1, z_2, \dots) = (\lambda z_1, \lambda z_2, \dots)$. Comparing coordinates, this means that $z_1 = 0$, which means that $z_2 = 0$, and so on. That is, $z = 0$, and hence no $\lambda$ is an eigenvalue. However, the spectrum of this shift operator is actually very big: $\sigma(S) = \{\lambda \in \mathbb{C} : |\lambda| \le 1\}$.
Lemma 16.4. σ(A) is a closed subset of {λ ∈ C : |λ| ≤ kAk}.
Proof. Take any λ ∈ / σ(A), which implies that λI − A is invertible. Recall that the set of
invertible operators in L(E, E) is an open subset of L(E, E) with respect to the operator
norm. Then there exists δ > 0 such that kB − (λI − A)k < δ implies that B is invertible.
In particular, if µ ∈ C then |µ − λ| < δ implies that k(µI − A) − (λI − A)k < δ, and hence
µI − A is invertible. We have therefore shown that σ(A) is closed.
Now, suppose that $|\lambda| > \|A\|$. Then $\lambda I - A = \lambda(I - \frac{1}{\lambda} A)$. Note that $\left\| \frac{1}{\lambda} A \right\| = \frac{\|A\|}{|\lambda|} < 1$, so by our general Theorem 14.9, $I - \frac{1}{\lambda} A$ is invertible, and hence so is $\lambda I - A$.
This means that no $\lambda$ with $|\lambda| > \|A\|$ is in $\sigma(A)$; in other words, we've proved that $\sigma(A) \subset \{\lambda : |\lambda| \le \|A\|\}$. □
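As a numerical illustration (an added sketch): for a matrix, the spectrum is the set of eigenvalues, and these indeed lie in the closed disc of radius $\|A\|$:

import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((6, 6))

eigenvalues = np.linalg.eigvals(A)    # sigma(A) for a matrix
op_norm = np.linalg.norm(A, 2)        # ||A||

print(np.max(np.abs(eigenvalues)) <= op_norm + 1e-12)  # True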
16.2. Compact operators.
Definition 16.5. Suppose that E and F are normed linear spaces. A linear operator T :
E → F is compact if T ({x ∈ E : kxk ≤ 1}) is contained in a compact subset of F .
Remark. Note that this is the same as saying that for all R > 0, we know that T ({x ∈ E :
kxk < R}) is contained in a compact set.
Note that $T$ compact implies that $T$ is bounded. Assume otherwise: then there exists a sequence $\{x_k\} \subset E$ with $\|x_k\| = 1$ and $\|T(x_k)\| \ge k$, which means that there does not exist a convergent subsequence of $\{T(x_k)\}$; indeed, if there were a convergent subsequence $T(x_{k_j}) \to y$, then $\|T(x_{k_j})\| \to \|y\| < \infty$, contradicting $\|T(x_{k_j})\| \ge k_j \to \infty$. This contradicts compactness.
Example 16.6. Suppose that $E$ is an infinite dimensional normed space. Then $I_E \in L(E, E)$ is certainly bounded, but it is not compact, because the closed unit ball in $E$ is not compact: there exists a sequence $e_1, e_2, \dots \in E$ with $\|e_j\| = 1$ for all $j$ and $\|e_i - e_j\| \ge 1$ for $i \ne j$.
Example 16.7. Suppose that $T \in L(E, F)$ with $H = T(E)$ a finite dimensional subspace of $F$. We say that $T$ is an operator of finite rank, and we claim that $T$ is compact.
We should check this. Take any sequence $\{T(x_k)\}_{k=1,2,\dots} \subset H$, where $x_k \in E$ and $\|x_k\| \le 1$. Since $H$ is finite-dimensional, all bounded subsets of $H$ have compact closure, and $\{T(x_k)\}$ is bounded. Thus there exists a convergent subsequence $\{T(x_{k_j})\}$.
There is an extremely important theorem.
Theorem 16.8. Suppose that $E$ and $F$ are Banach spaces, and that $\{T_k\}_{k=1,2,\dots} \subset L(E, F)$ is a sequence with each $T_k$ compact and $\|T_k - T\| \to 0$ for some $T \in L(E, F)$ (i.e. $T_k \to T$ in $L(E, F)$). Then $T$ is compact.
Proof. Take any sequence $\{x_k\}_{k=1,2,\dots} \subset E$ with $\|x_k\| \le 1$. Then $\{T_1(x_k)\}$ has a convergent subsequence $T_1(x_{1,k})$. Now, $\{T_2(x_{1,k})\}$ has a convergent subsequence $T_2(x_{2,k})$. The important thing is that $\{x_{2,k}\}$ is a subsequence of $\{x_{1,k}\}$, which is a subsequence of $\{x_k\}$. We continue this process of finding subsequences of subsequences: inductively, $T_q(x_{q-1,k})$ has a convergent subsequence $T_q(x_{q,k})$.
Now we use a standard trick in analysis: take the diagonal sequence $\{x_{q,q}\}$. For each fixed $p$, the tail $\{x_{q,q}\}_{q=p,p+1,\dots}$ is a subsequence of $\{x_{p,k}\}_{k=1,2,\dots}$, so $\{T_p(x_{q,q})\}_{q=1,2,\dots}$ converges.
The desired result should now follow easily, and we'll finish it next time. □

17. 5/6
We have Banach spaces E and F , and Tk → T in L(E, F ). We want to show that if Tk is
compact then T is compact. This is Theorem 16.8.
Finishing proof of theorem 16.8. We proved that for any sequence {xk }k=1,2,... with kxk k ≤ 1
for all k, there exists a subsequence {xq,q }q=1,2,... of {xk }k=1,2,... such that {Tp (xq,q )}q=1,2,...
converges for each p. This was done with a diagonal argument. We claim that {T (xq,q )}q=1,2,...
converges.
Let $\varepsilon > 0$ be given. Pick $p$ such that $\|T_p - T\| < \varepsilon$. Using the Cauchy property, we can pick $N$ such that $\|T_p(x_{q,q}) - T_p(x_{r,r})\| < \varepsilon$ for every $q > r \ge N$. Then
\begin{align*}
\|T(x_{q,q}) - T(x_{r,r})\| &= \|T(x_{q,q}) - T_p(x_{q,q}) + T_p(x_{q,q}) - T_p(x_{r,r}) + T_p(x_{r,r}) - T(x_{r,r})\| \\
&\le \|T(x_{q,q}) - T_p(x_{q,q})\| + \|T_p(x_{q,q}) - T_p(x_{r,r})\| + \|T_p(x_{r,r}) - T(x_{r,r})\| \\
&\le \|T - T_p\| + \varepsilon + \|T_p - T\| < 3\varepsilon
\end{align*}
for every $q > r \ge N$ (using $\|x_{q,q}\| \le 1$). Hence, $\{T(x_{q,q})\}_{q=1,2,\dots}$ is Cauchy and hence convergent. □
17.1. Hilbert-Schmidt operators.
Definition 17.1. Suppose that $H$ is a Hilbert space and $Y$ is a Banach space. An operator $T \in L(H, Y)$ is said to be a Hilbert–Schmidt operator if there exists a complete orthonormal sequence $e_1, e_2, \dots$ for $H$ such that $\sum_{j=1}^{\infty} \|T(e_j)\|^2 < \infty$.
Remark. It doesn’t matter which complete orthonormal
P∞ sequence we use. If (f1 , f2 , . . . )
2
is any other complete orthonormal sequence then j=1 kT (f j )k < ∞ as well. This is
something that will be checked on the homework.
Hilbert-Schmidt operators are important because a lot of common operators satisfy this
property, and because of the following theorem.
Theorem 17.2. Hilbert-Schmidt operators are automatically compact.
Proof. Let $x_j = (x, e_j)$. Define $T_N(x) = T\left( \sum_{j=1}^{N} x_j e_j \right) = \sum_{j=1}^{N} x_j T(e_j)$. Then $T_N(H)$ is a finite-dimensional space: $T_N(H) = \operatorname{span}\{T(e_1), \dots, T(e_N)\}$. That is, $T_N$ is finite rank and hence compact by Example 16.7. Then
\[
(T - T_N)(x) = T\left( \sum_{j=1}^{\infty} x_j e_j \right) - T\left( \sum_{j=1}^{N} x_j e_j \right) = T\left( \sum_{j=N+1}^{\infty} x_j e_j \right) = \sum_{j=N+1}^{\infty} x_j T(e_j).
\]
We should be careful and check that this final equality holds. We have
\[
\sum_{j=N+1}^{\infty} x_j e_j = \lim_{M\to\infty} \sum_{j=N+1}^{M} x_j e_j \implies T\left( \sum_{j=N+1}^{\infty} x_j e_j \right) = \lim_{M\to\infty} \sum_{j=N+1}^{M} x_j T(e_j),
\]
by the continuity of $T$. For $M_2 > M_1$, we can write
\[
\left\| \sum_{j=N+1}^{M_2} x_j T(e_j) - \sum_{j=N+1}^{M_1} x_j T(e_j) \right\| = \left\| \sum_{j=M_1+1}^{M_2} x_j T(e_j) \right\| \le \sum_{j=M_1+1}^{M_2} |x_j| \|T(e_j)\| \le \|x\| \sqrt{ \sum_{j=M_1+1}^{M_2} \|T(e_j)\|^2 },
\]
which goes to zero because $T$ is Hilbert–Schmidt, so the series $\sum_{j=N+1}^{\infty} x_j T(e_j)$ converges.
We now have
\[
\|(T - T_N)(x)\| \le \|x\| \sqrt{ \sum_{j=N+1}^{\infty} \|T(e_j)\|^2 } \implies \|T - T_N\| \le \sqrt{ \sum_{j=N+1}^{\infty} \|T(e_j)\|^2 } \to 0
\]
as $N \to \infty$. Thus $T$ is a norm limit of compact operators, and Theorem 16.8 shows that $T$ is compact. □

We should see that not all compact operators are Hilbert-Schmidt.


Example 17.3. Recall that we have the space $\ell^2 = \{x = (x_1, x_2, \dots) : x_j \in \mathbb{R},\ \sum x_j^2 < \infty\}$ with the usual inner product norm.
Consider $T : \ell^2 \to \ell^2$ defined by $T(x) = \left(x_1, \frac{x_2}{\sqrt 2}, \frac{x_3}{\sqrt 3}, \dots, \frac{x_n}{\sqrt n}, \dots\right)$. We claim that this is a compact operator that is not Hilbert–Schmidt.
Let $e_1 = (1, 0, \dots)$, $e_2 = (0, 1, 0, \dots)$, $\dots$ be the standard complete orthonormal basis. Then $T(x) = \sum_{j=1}^{\infty} \frac{x_j}{\sqrt j} e_j$. Now, $T(e_j) = \frac{1}{\sqrt j} e_j$ for all $j = 1, 2, \dots$, which implies that
\[
\sum_{j=1}^{\infty} \|T(e_j)\|^2 = \sum_{j=1}^{\infty} \frac{1}{j} = \infty.
\]
Therefore, $T$ is not Hilbert–Schmidt.
We still need to show that $T$ is compact. Consider $T_N(x) = \sum_{j=1}^{N} \frac{x_j}{\sqrt j} e_j$. These have finite rank and are hence compact. Then
\[
\|T(x) - T_N(x)\|^2 = \left\| \sum_{j=N+1}^{\infty} \frac{x_j}{\sqrt j} e_j \right\|^2 = \sum_{j=N+1}^{\infty} \frac{x_j^2}{j} \le \frac{1}{N+1} \sum_{j=N+1}^{\infty} x_j^2 \le \frac{1}{N+1} \|x\|^2.
\]
This means that $\|T - T_N\| \le \frac{1}{\sqrt{N+1}} \to 0$, and hence $T$ is compact by Theorem 16.8.
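Numerically (an added sketch), truncating to finitely many coordinates, one can watch $\sum_j \|T(e_j)\|^2$ grow like $\log N$ while $\|T - T_N\| = 1/\sqrt{N+1}$ still decays:

import numpy as np

n = 10_000
diag = 1.0 / np.sqrt(np.arange(1, n + 1))    # T e_j = e_j / sqrt(j)

# The Hilbert-Schmidt sum diverges (partial sums grow like log m):
for m in (10, 100, 1000, 10_000):
    print(m, np.sum(diag[:m] ** 2))          # harmonic partial sum

# But ||T - T_N|| = 1/sqrt(N+1) -> 0, so T is a norm limit of
# finite-rank operators and hence compact:
for N in (10, 100, 1000):
    print(N, diag[N])                        # = 1/sqrt(N+1)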

17.2. Spectral theorem. This is the simplest spectral theorem, for compact Hermitian operators. We might have seen the spectral theorem for symmetric finite dimensional matrices; this is the generalization to infinite dimensional complex space.
Theorem 17.4. Suppose that $H$ is a Hilbert space and $T \in L(H, H)$ is a compact Hermitian operator, i.e. $(Tx, y) = (x, Ty)$ for all $x, y \in H$. Then $(\ker T)^\perp = \overline{T(H)}$ (the closure of $T(H)$). Equivalently, we can write $\ker T = (T(H))^\perp$.
If $T(H)$ is infinite-dimensional, there exists a complete orthonormal sequence $e_1, e_2, \dots$ for $\overline{T(H)}$ such that $T(e_j) = \lambda_j e_j$ for every $j$, where the $\lambda_j$ are real numbers and $\lambda_j \to 0$ as $j \to \infty$. Also, every $x \in (\ker T)^\perp$ can be written $x = \sum_{j=1}^{\infty} x_j e_j$, so that $T(x) = \sum_{j=1}^{\infty} \lambda_j x_j e_j$.
If $T(H)$ is finite dimensional, then there exists an orthonormal basis $e_1, \dots, e_N$ of $T(H)$ with $T(e_j) = \lambda_j e_j$ for all $j = 1, \dots, N$.
18. 5/9
We will prove the Spectral Theorem 17.4. This requires three lemmas. The first two only require the Hermitian property and do not need compactness.
Lemma 18.1. Let T : H → H be Hermitian. Then all of its eigenvalues are real and
eigenvectors corresponding to distinct eigenvalues are orthogonal.
Proof. Suppose that $Tx = \lambda x$ for $x \ne 0$. Then $(Tx, x) = \lambda(x, x) = \lambda \|x\|^2$. Note that $(Tx, x) = (x, Tx) = \overline{(Tx, x)}$, so $(Tx, x)$ is real, and hence $\lambda$ is real.
Now, suppose that $Tx = \lambda x$ and $Ty = \mu y$ with $\lambda \ne \mu$. Then $(Tx, y) = \lambda(x, y)$ and $(x, Ty) = \bar\mu(x, y) = \mu(x, y)$, since $\mu$ is real. These are equal, so $0 = (\lambda - \mu)(x, y)$, which means that $(x, y) = 0$. □
Lemma 18.2. Suppose that T : H → H is Hermitian, and suppose that L is any subspace
of H with T (L) ⊂ L. Then T (L⊥ ) ⊂ L⊥ .
Proof. Take y ∈ T (L⊥ ), that is y = T (x) with x ∈ L⊥ . We need to check that y is orthogonal
to L. Suppose that z ∈ L. Then by the Hermitian property, we have (z, T (x)) = (T (z), x) =
0 because T (z) ∈ L and x ∈ L⊥ . This means that y = T (x) ∈ L⊥ . 
These two lemmas were straightforward and just required following the definitions. The
next lemma is slightly more complicated and is the main ingredient of the proof of the
Spectral Theorem.
Lemma 18.3. Suppose that T : H → H is nonzero, compact, and Hermitian. Then there
exists a vector e1 with ke1 k = 1 and T (e1 ) = λ1 e1 where either λ1 = kT k or λ1 = − kT k.
Therefore, either kT k or − kT k is an eigenvalue of T .
Proof. Recall from Proposition 15.7 that for Hermitian operators, kT k = supkxk=1 |(T x, x)|.
Then there exists a sequence {xk }k=1,2,... with kxk k = 1 for all k and (T (xk ), xk ) → λ1 , where
either λ1 = kT k or λ1 = − kT k. We need to show that xk converges, because then the limit
could be taken as our e1 .
Note that (using the Hermitian property and $\|x_k\| = 1$),
\[
\|T(x_k) - \lambda_1 x_k\|^2 = \|T(x_k)\|^2 + \lambda_1^2 - 2\lambda_1 (T(x_k), x_k) \le 2\lambda_1^2 - 2\lambda_1 (T(x_k), x_k) \to 0
\]
as $k \to \infty$, since $\|T(x_k)\|^2 \le \|T\|^2 = \lambda_1^2$. This means that $T(x_k) - \lambda_1 x_k \to 0$ in $H$. Since $T$ is compact, $T(x_k)$ must have a convergent subsequence; pick $\{x_{k_j}\}_{j=1,2,\dots}$ such that $y = \lim T(x_{k_j})$ exists. Then $T(x_{k_j}) - \lambda_1 x_{k_j} \to 0$ in $H$, which means that $\lambda_1 x_{k_j} \to y$ and hence $x_{k_j} \to \frac{y}{\lambda_1}$ (note $\lambda_1 = \pm\|T\| \ne 0$ since $T \ne 0$). Therefore $T(x_{k_j}) \to T(\frac{y}{\lambda_1})$. This means that $y = T(\frac{y}{\lambda_1})$, or $Ty = \lambda_1 y$. We can now pick $e_1 = y/\lambda_1$ (which has norm $1$, being the limit of the $x_{k_j}$), and this concludes the proof. □
This is a powerful result, and this is the main result and the inductive step of the proof
of the Spectral Theorem.
We are now ready to prove the Spectral Theorem.
Proof of the Spectral Theorem 17.4. First, Lemma 18.3 gives $e_1$ with $T(e_1) = \lambda_1 e_1$, $\lambda_1 \ne 0$, and $\|e_1\| = 1$. Then either $T|_{(\operatorname{span}\{e_1\})^\perp} = 0$ or we can repeat this step with $(\operatorname{span}\{e_1\})^\perp$ in place of $H$ and $T|_{(\operatorname{span}\{e_1\})^\perp}$ in place of $T$. This works because Lemma 18.2 implies that $T((\operatorname{span}\{e_1\})^\perp) \subset (\operatorname{span}\{e_1\})^\perp$.
In the first case, we have $(\operatorname{span}\{e_1\})^\perp \subset \ker T$. Also, Lemma 18.1 implies that $\operatorname{span}\{e_1\} \subset (\ker T)^\perp$ (an eigenvector with eigenvalue $\lambda_1 \ne 0$ is orthogonal to every eigenvector with eigenvalue $0$), which implies that $\ker T \subset (\operatorname{span}\{e_1\})^\perp$. This means that $(\operatorname{span}\{e_1\})^\perp = \ker T$, so that $\operatorname{span}\{e_1\} = (\ker T)^\perp$. In this case, we are done with the proof. This is the one-dimensional case of the Spectral Theorem.
Now, we consider the inductive step, which is more or less the same as the preceding argument. Assume that $k \ge 2$ and we have orthonormal $e_1, \dots, e_{k-1}$ with $T(e_j) = \lambda_j e_j$ for $\lambda_j \ne 0$. By Lemma 18.2, we have $T((\operatorname{span}\{e_1, \dots, e_{k-1}\})^\perp) \subset (\operatorname{span}\{e_1, \dots, e_{k-1}\})^\perp$. Then by Lemma 18.3, either $T|_{(\operatorname{span}\{e_1, \dots, e_{k-1}\})^\perp} = 0$ or there exists $e_k \in (\operatorname{span}\{e_1, \dots, e_{k-1}\})^\perp$ with $\|e_k\| = 1$ and $T(e_k) = \lambda_k e_k$ for $\lambda_k \ne 0$.
In the first case, we have $(\operatorname{span}\{e_1, \dots, e_{k-1}\})^\perp \subset \ker T$, and by Lemma 18.1, we have $\operatorname{span}\{e_1, \dots, e_{k-1}\} \subset (\ker T)^\perp$. Therefore, just as before, $\ker T = (\operatorname{span}\{e_1, \dots, e_{k-1}\})^\perp$ and hence $(\ker T)^\perp = \operatorname{span}\{e_1, \dots, e_{k-1}\}$. In this case, we terminate and are done. This is the finite dimensional case.
Therefore, we’ve proved that either the process terminates and we are in the finite di-
mensional case of the theorem (i.e. (ker T )⊥ is a finite dimensional subspace) or we get an
orthonormal sequence e1 , e2 , . . . with T (ej ) = λj ej with λj 6= 0 for all j and ej ∈ (ker T )⊥ .
We claim that in this case, $e_1, e_2, \dots$ is complete in $(\ker T)^\perp$. This is tricky to check.
Suppose that $n \ge 1$. Then $x \in H$ can be written $x = y_n + x_n$, where $y_n \in (\operatorname{span}\{e_1, \dots, e_n\})^\perp$ and $x_n \in \operatorname{span}\{e_1, \dots, e_n\}$. Note that in this case, Pythagoras's Theorem holds, so that $\|x\|^2 = \|y_n\|^2 + \|x_n\|^2 \ge \|y_n\|^2$. In addition, we know that $T(x) = T(y_n) + T(x_n)$.
In Homework 6, question 4, we showed that $\lambda_j \to 0$. In addition, we have
\[
\|T(y_n)\| \le \left\| T|_{(\operatorname{span}\{e_1, \dots, e_n\})^\perp} \right\| \|y_n\|,
\]
where $|\lambda_{n+1}| = \left\| T|_{(\operatorname{span}\{e_1, \dots, e_n\})^\perp} \right\|$ by the construction of $e_{n+1}$ via Lemma 18.3. This means that $\|T(y_n)\| \le |\lambda_{n+1}| \|x\| \to 0$, and hence $T(x_n) \to T(x)$.
Recall that we have $x_n = \sum_{j=1}^{n} (x, e_j) e_j$, so we have
\[
\sum_{j=1}^{n} (x, e_j) \lambda_j e_j \to T(x),
\]
and hence this series converges with respect to the inner product norm. Therefore $T(x) = \sum_{j=1}^{\infty} \lambda_j (x, e_j) e_j$.
It remains to check completeness. We know from Lemma 18.1 that $\operatorname{span}\{e_1, e_2, \dots\} \subset (\ker T)^\perp$. Also, if $x \in (\operatorname{span}\{e_1, e_2, \dots\})^\perp$, then by the formula above $T(x) = 0$, and hence $x \in \ker T$; therefore $(\operatorname{span}\{e_1, e_2, \dots\})^\perp \subset \ker T$. Note that $\operatorname{span}\{e_1, e_2, \dots\}$ need not be closed; taking orthogonal complements gives $(\ker T)^\perp \subset ((\operatorname{span}\{e_1, e_2, \dots\})^\perp)^\perp = \overline{\operatorname{span}}\{e_1, e_2, \dots\}$. That's the end of the proof. □

19. 5/11
First, we need to finish the last few lines of the proof of the Spectral Theorem 17.4.
End of proof of Spectral Theorem 17.4. We have already showed that either there exist $e_1, \dots, e_n$ orthonormal with $\operatorname{span}\{e_1, \dots, e_n\} = (\ker T)^\perp$ and $Te_j = \lambda_j e_j$ for all $j = 1, 2, \dots, n$, or there exists an orthonormal sequence $e_1, e_2, \dots$ with $\operatorname{span}\{e_1, e_2, \dots\} \subset (\ker T)^\perp$, $T(e_j) = \lambda_j e_j$ for all $j$, and $T(x) = \sum_{j=1}^{\infty} \lambda_j (x, e_j) e_j$ for all $x$. Our inclusion shows that $\ker T \subset (\operatorname{span}\{e_1, e_2, \dots\})^\perp$. Also, if $x \in (\operatorname{span}\{e_1, e_2, \dots\})^\perp$, then $T(x) = 0$ and hence $x \in \ker T$. This means that $(\operatorname{span}\{e_1, e_2, \dots\})^\perp \subset \ker T$.
Therefore, $\ker T = (\operatorname{span}\{e_1, e_2, \dots\})^\perp$ and hence $(\ker T)^\perp = \overline{\operatorname{span}}\{e_1, e_2, \dots\}$. This means that $\operatorname{span}\{e_1, e_2, \dots\}$ is dense in $(\ker T)^\perp$. That is, $e_1, e_2, \dots$ is a complete orthonormal sequence in $(\ker T)^\perp$. This concludes the proof of the Spectral Theorem. □
19.1. An application of the Spectral Theorem. We will spend the next few lectures on
an application of the spectral theorem. We will give an application to ordinary differential
equations. There are equally important applications to partial differential equations, but
those require more machinery than what we have available.
Consider an interval $[a, b]$ and two real valued functions: $p : [a, b] \to \mathbb{R}$ of class $C^1$ and $q : [a, b] \to \mathbb{R}$ of class $C^0$. Assume that $p > 0$ on $[a, b]$.
For $u \in C^2([a, b])$, consider the differential operator $Lu = (pu')' + qu = pu'' + p'u' + qu$.
There is an existence and uniqueness theorem for ordinary differential equations, which
we will not prove in this class.
Theorem 19.1 (Existence and uniqueness theorem). Let $g : [a, b] \to \mathbb{R}$ be a given $C^0$ function, and suppose we are given values $c_1, c_2$ at some given point $t_0 \in [a, b]$. Then there exists a unique $C^2([a, b])$ function $u$ with $Lu = g$ on $[a, b]$, $u(t_0) = c_1$, and $u'(t_0) = c_2$.
We will be particularly interested in what is called an “eigenvalue problem”, and we’ll see
that this relates to a certain Hermitian operator.
19.2. Sturm-Liouville.
Problem 19.2. We are given the Sturm–Liouville eigenvalue problem:
\[
\begin{cases}
-Lu = \lambda u & \text{on } [a, b] \\
\alpha u(a) + \beta u'(a) = 0 & \text{for given } \alpha, \beta \text{ not both zero} \\
\gamma u(b) + \delta u'(b) = 0 & \text{for given } \gamma, \delta \text{ not both zero.}
\end{cases}
\]

Example 19.3. We do a very simple example. Consider the interval $[a, b] = [0, \pi]$ with $p \equiv 1$ and $q \equiv 0$ on the interval. Take $(\alpha, \beta) = (1, 0)$ and $(\gamma, \delta) = (1, 0)$. In this case, our problem is
\[
\begin{cases}
-u'' = \lambda u & \text{on } [0, \pi] \\
u(0) = u(\pi) = 0.
\end{cases}
\]
In the case $\lambda > 0$, the general solution is $u = A\cos(\sqrt\lambda\, t) + B\sin(\sqrt\lambda\, t)$. Which of these satisfy the boundary conditions? Checking $u(0) = 0$, we see that $A = 0$. Then $u(\pi) = 0$ forces either $B = 0$ or $\sqrt\lambda = j$ for some $j = 1, 2, \dots$. In particular, if $u$ is nonzero then $\lambda = j^2$ and the corresponding solution is $u_j = B\sin jt$.
In the case $\lambda = 0$, we get $u'' = 0$, which means that $u = At + B$. The boundary conditions force $A = B = 0$, so there are no nonzero solutions.
In the case $\lambda < 0$, we have $u = Ae^{\sqrt{-\lambda}\, t} + Be^{-\sqrt{-\lambda}\, t}$, and again, the boundary conditions imply that $A = B = 0$.
The point is that there are some special values of $\lambda$ that work. In addition, the (suitably normalized) functions $u_j$ form a complete orthonormal sequence for $L^2_{\mathbb{R}}[0, \pi]$.
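As a numerical sanity check (an added sketch), discretizing $-u'' = \lambda u$ with $u(0) = u(\pi) = 0$ by standard finite differences recovers the eigenvalues $1, 4, 9, \dots$:

import numpy as np

n = 500                          # number of interior grid points on (0, pi)
h = np.pi / (n + 1)

# Second-order finite-difference matrix for -d^2/dx^2 with Dirichlet BCs:
A = (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2

eigs = np.sort(np.linalg.eigvalsh(A))
print(eigs[:4])                  # approximately [1, 4, 9, 16]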
We want to show that the same type of behavior occurs for the general Sturm–Liouville problem. We will end up showing that there exists a complete orthonormal sequence of eigenfunctions.
Proposition 19.4. Suppose that $u, v \in C^2([a, b])$ are not identically zero, and suppose that $Lu = 0$ and $Lv = 0$. Then either $u \equiv cv$ for some constant $c$, or the vectors $\binom{u(t)}{u'(t)}$ and $\binom{v(t)}{v'(t)}$ are linearly independent vectors in $\mathbb{R}^2$ for every $t \in [a, b]$.

Proof. We can check this. If $\binom{u(t_0)}{u'(t_0)}$ and $\binom{v(t_0)}{v'(t_0)}$ are linearly dependent for some $t_0$, then either $\binom{u(t_0)}{u'(t_0)} = c_1 \binom{v(t_0)}{v'(t_0)}$ or $\binom{v(t_0)}{v'(t_0)} = c_2 \binom{u(t_0)}{u'(t_0)}$ (these cases are not the same when one of the vectors is $0$).
We can just consider the first case; the second case is the same. We then have $u(t_0) - c_1 v(t_0) = 0$ and $u'(t_0) - c_1 v'(t_0) = 0$, which means that $w(t_0) = w'(t_0) = 0$ for $w = u - c_1 v$. Since $Lw = 0$, the uniqueness theorem implies that $w \equiv 0$, which means that $u \equiv c_1 v$. This proves the proposition. □
Corollary 19.5. In the case $Lu = 0$ and $Lv = 0$, we have either $u = cv$ or $uv' - vu' \ne 0$ for all $t \in [a, b]$. Actually, $p(uv' - vu')$ is constant.

Proof. The first statement is simply a restatement of the preceding proposition, since $uv' - vu'$ is the determinant of the matrix with those two columns. Now, we have
\[
\frac{d}{dt}\left( p(uv' - vu') \right) = \frac{d}{dt}\left( (pv')u - (pu')v \right) = (pv')'u + pv'u' - (pu')'v - pu'v' = -qvu + quv = 0,
\]
so indeed $p(uv' - vu')$ is constant. □

Here, the quantity $p(uv' - vu')$ is called the Wronskian.
Proposition 19.6. As before, suppose that $u, v \in C^2([a, b])$ with $Lu = 0$ and $Lv = 0$ and $u, v$ not identically zero. Assume that $0$ is not an eigenvalue of the Sturm–Liouville problem, and that we have the boundary conditions
\[
\begin{cases}
\alpha u(a) + \beta u'(a) = 0 \\
\gamma v(b) + \delta v'(b) = 0,
\end{cases}
\]
i.e. each of $u$ and $v$ satisfies the boundary condition of the Sturm–Liouville problem at one endpoint. Then $u \ne cv$, and hence $p(uv' - vu') \equiv c \ne 0$ on $[a, b]$.

Proof. If we had $u = cv$, then $u$ would satisfy both boundary conditions, so $u$ would be a nonzero solution of the Sturm–Liouville problem with $\lambda = 0$; that is, $0$ would be an eigenvalue, which is a contradiction.
Therefore, by Corollary 19.5, $p(uv' - vu') \equiv c \ne 0$ on $[a, b]$. □

20. 5/13
Recall that we are talking about the Sturm–Liouville eigenvalue problem 19.2. We have $Lu = (pu')' + qu$ for $p \in C^1([a, b])$ and $q \in C^0([a, b])$. Assume for the moment that $\lambda = 0$ is not an eigenvalue.
Recall that we showed last time that if $Lu = 0$ and $Lv = 0$, and each of $u$ and $v$ satisfies one of the two boundary conditions, then $uv' - vu' \ne 0$. Also $p(uv' - vu') = c$ is a constant, called the Wronskian. Note that we can always get such $u$ and $v$ by the existence and uniqueness theorem: we can solve $Lu = 0$ with $u(a) = c_1$, $u'(a) = c_2$, where $c_1\alpha + c_2\beta = 0$, and we can solve $Lv = 0$ with $v(b) = d_1$, $v'(b) = d_2$, where $d_1\gamma + d_2\delta = 0$.
Problem 20.1. For a given $g \in C^0([a, b])$, we want to solve
\[
\begin{cases}
Lw = g \\
\alpha w(a) + \beta w'(a) = 0 \\
\gamma w(b) + \delta w'(b) = 0.
\end{cases}
\]

To do this, we use the method of variation of parameters. The goal is to find $w$ in the form $w = \varphi u + \psi v$ with some $\varphi$ and $\psi$ to be chosen.
Note that $w' = \varphi u' + \psi v' + u\varphi' + v\psi'$. Suppose that we stipulate that $\varphi' u + \psi' v = 0$. Then $(pw')' = (\varphi(pu'))' + (\psi(pv'))' = -qu\varphi + pu'\varphi' - qv\psi + pv'\psi' = -qw + p(u'\varphi' + v'\psi')$, using $(pu')' = -qu$ and $(pv')' = -qv$. Hence $Lw = (pw')' + qw = p(u'\varphi' + v'\psi') = g$ if and only if $u'\varphi' + v'\psi' = \frac{g}{p}$. Therefore we have two equations:
\[
\begin{cases}
\varphi' u + \psi' v = 0 \\
\varphi' u' + \psi' v' = \frac{g}{p}.
\end{cases}
\]
This can be written in matrix form as
\[
\begin{pmatrix} u & v \\ u' & v' \end{pmatrix} \begin{pmatrix} \varphi' \\ \psi' \end{pmatrix} = \frac{g}{p} \begin{pmatrix} 0 \\ 1 \end{pmatrix}.
\]
This can be solved by brute force, or by computing the inverse of the $2 \times 2$ matrix (whose determinant is $uv' - vu' = c/p$), yielding solutions
\[
\begin{cases}
\varphi' = -\frac{g}{c} v \\
\psi' = \frac{g}{c} u.
\end{cases}
\]
Hence, we have
\[
\varphi(t) = \int_t^b \frac{g(s)v(s)}{c}\, ds + A, \qquad \psi(t) = \int_a^t \frac{g(s)u(s)}{c}\, ds + B.
\]
Thus, supposing that $A = B = 0$,
\[
w(t) = u(t) \int_t^b \frac{g(s)v(s)}{c}\, ds + v(t) \int_a^t \frac{g(s)u(s)}{c}\, ds.
\]
We check the boundary conditions: $w(a) = u(a) \int_a^b \frac{g(s)v(s)}{c}\, ds$ and, since $\varphi'u + \psi'v \equiv 0$ gives $w' = u'\varphi + v'\psi$, also $w'(a) = u'(a) \int_a^b \frac{g(s)v(s)}{c}\, ds$. Then
\[
\alpha w(a) + \beta w'(a) = (\alpha u(a) + \beta u'(a)) \int_a^b \frac{g(s)v(s)}{c}\, ds = 0.
\]
Similarly, we have $w(b) = v(b) \int_a^b \frac{g(s)u(s)}{c}\, ds$ and $w'(b) = v'(b) \int_a^b \frac{g(s)u(s)}{c}\, ds$. Then again,
\[
\gamma w(b) + \delta w'(b) = (\gamma v(b) + \delta v'(b)) \int_a^b \frac{g(s)u(s)}{c}\, ds = 0,
\]
so our proposed solution $w(t)$ is actually the solution to this problem. That is,
\[
w(t) = \int_a^b \left[ H(s - t)u(t)v(s) + H(t - s)u(s)v(t) \right] \frac{g(s)}{c}\, ds = \int_a^b k(s, t)g(s)\, ds,
\]
where $k(s, t) = \frac{H(s - t)u(t)v(s) + H(t - s)u(s)v(t)}{c}$. This is a symmetric function, and $H$ is the Heaviside function
\[
H(s) = \begin{cases} 0 & s < 0 \\ 1 & s > 0. \end{cases}
\]
Remark. We have
\[
k(s, t) = \begin{cases} \frac{u(t)v(s)}{c} & t \le s \\ \frac{u(s)v(t)}{c} & t > s, \end{cases}
\]
which is continuous on the closed square $[a, b] \times [a, b]$.
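As an illustration (an added sketch, not from the lecture), we can build this kernel explicitly for the simplest case $Lu = u''$ on $[0, \pi]$ with $w(0) = w(\pi) = 0$: take $u(t) = t$ and $v(t) = t - \pi$, so that $c = p(uv' - vu') = \pi$, and check numerically that $w'' = g$ and the boundary conditions hold.

import numpy as np

a, b = 0.0, np.pi
u = lambda t: t             # Lu = u'' = 0 and u(0) = 0
v = lambda t: t - np.pi     # Lv = 0 and v(pi) = 0
c = np.pi                   # Wronskian p(uv' - vu') = t - (t - pi) = pi

def k(s, t):                # the kernel from the Remark above
    return np.where(t <= s, u(t) * v(s), u(s) * v(t)) / c

g = lambda s: np.sin(2 * s)           # a sample right-hand side
t = np.linspace(a, b, 2001)
s = t                                  # quadrature nodes
ds = s[1] - s[0]
w = (k(s[None, :], t[:, None]) * g(s)[None, :]).sum(axis=1) * ds

w_dd = np.gradient(np.gradient(w, t), t)        # numerical second derivative
print(abs(w[0]), abs(w[-1]))                    # both ~0: boundary conditions hold
print(np.max(np.abs(w_dd[5:-5] - g(t)[5:-5])))  # small: w'' = g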
Define $T : C^0([a, b]) \to C^0([a, b])$ by $T(g)(t) = \int_a^b k(s, t)g(s)\, ds$. Observe that $T$ is linear, and the formula makes sense for all $g \in L^2([a, b])$. Also, we claim that $T(g) \in C^0([a, b])$ even when $g \in L^2([a, b])$. We can check this last statement. Let $\varepsilon > 0$ be given. By (uniform) continuity of $k$, there exists $\delta > 0$ such that $|k(s, t_1) - k(s, t_2)| < \varepsilon$ whenever $|t_1 - t_2| < \delta$ for $t_1, t_2 \in [a, b]$ and $s \in [a, b]$. Therefore,
\[
|T(g)(t_1) - T(g)(t_2)| = \left| \int_a^b (k(s, t_1) - k(s, t_2))g(s)\, ds \right| \le \int_a^b |k(s, t_1) - k(s, t_2)||g(s)|\, ds \le \varepsilon \int_a^b 1 \cdot |g(s)|\, ds \le \varepsilon \sqrt{b - a}\, \|g\|_{L^2([a,b])} = C\varepsilon
\]
for some fixed constant $C$ whenever $t_1, t_2 \in [a, b]$ with $|t_1 - t_2| < \delta$. This means that $T : L^2([a, b]) \to C^0([a, b])$.
Now, we have (by Fubini's Theorem):
\[
(h, T(g)) = \int_a^b T(g)(t)h(t)\, dt = \int_a^b h(t) \int_a^b k(s, t)g(s)\, ds\, dt = \int_a^b \left( \int_a^b k(s, t)h(t)\, dt \right) g(s)\, ds = (T(h), g),
\]
using the symmetry $k(s, t) = k(t, s)$. Therefore, $(T(h), g) = (T(g), h)$ for all $h, g \in C^0([a, b])$. Next time, we'll show that this is also true for $h, g \in L^2$.

21. 5/16
We continue the discussion of the Sturm-Liouville eigenvalue problem 19.2.
We are assuming for the moment that zero is not an eigenvalue. We used this to get particular solutions $u, v$ with $Lu = 0$ and $Lv = 0$, each satisfying half of the boundary conditions: $\alpha u(a) + \beta u'(a) = 0$ and $\gamma v(b) + \delta v'(b) = 0$, with $u \ne cv$ and $v \ne cu$. This means that $p(uv' - vu') = c \ne 0$. We'll remove this assumption later.
Recall that we can now solve the inhomogeneous problem 20.1. Our solution was
\[
w(t) = \int_a^b k(s, t)g(s)\, ds,
\]
where $k(s, t) = k(t, s)$ and $k$ is continuous on the square $[a, b] \times [a, b]$. In fact, we have
\[
k(s, t) = \begin{cases} \frac{v(s)u(t)}{c} & s \ge t \\ \frac{u(s)v(t)}{c} & s \le t. \end{cases}
\]
We also showed that this is $\varphi(t)u(t) + \psi(t)v(t)$, where
\[
\varphi(t) = \frac{1}{c} \int_t^b v(s)g(s)\, ds \qquad \text{and} \qquad \psi(t) = \frac{1}{c} \int_a^t u(s)g(s)\, ds.
\]
This allowed us to define an operator $T : C^0([a, b]) \to C^2([a, b])$ and $T : L^2[a, b] \to C^0([a, b])$. For $g, h \in C^0([a, b])$, we saw that $T$ is symmetric: $(T(g), h)_{L^2[a,b]} = (g, T(h))_{L^2[a,b]}$. We claim that this is also true for $g, h \in L^2[a, b]$.
Proposition 21.1. $T$ is self-adjoint as an operator from $L^2$ to $L^2$.

Proof. Recall that the continuous functions are dense in $L^2$, so for any $g, h \in L^2[a, b]$ there exist sequences $g_k, h_k \in C^0([a, b])$ of continuous functions with $\|g_k - g\|_{L^2} \to 0$ and $\|h_k - h\|_{L^2} \to 0$.
According to the self-adjointness property that we showed for continuous functions, we have $(T(g_k), h_k)_{L^2[a,b]} = (g_k, T(h_k))_{L^2[a,b]}$ for all $k$. Recall from last time that we also have $T(g)(t) = \int_a^b k(s, t)g(s)\, ds = (k(\cdot, t), g)_{L^2[a,b]}$. Hence,
\[
|T(g)(t)| \le \int_a^b |k(s, t)||g(s)|\, ds \le \sqrt{\int_a^b k(s, t)^2\, ds}\; \|g\|_{L^2[a,b]},
\]
so therefore $\max_{[a,b]} |T(g)| \le \max |k| \sqrt{b - a}\, \|g\|_{L^2[a,b]}$ (here $\max |k|$ refers to the kernel $k(s, t)$). Note that for any continuous $f$ we have $\|f\|_{L^2} = \sqrt{\int_a^b f(s)^2\, ds} \le \max |f| \sqrt{b - a}$. Therefore, we've shown that $\|T(g)\|_{L^2[a,b]} \le \max |k| (b - a) \|g\|_{L^2[a,b]}$.
We now claim that $(T(g_k), h_k)_{L^2[a,b]} \to (T(g), h)_{L^2[a,b]}$ as $k \to \infty$. We check this:
\[
(T(g_k), h_k)_{L^2[a,b]} = (T(g_k) - T(g), h_k) + (T(g), h) + (T(g), h_k - h).
\]
Here, the first term satisfies
\[
|(T(g_k) - T(g), h_k)| \le \|T(g_k) - T(g)\| \|h_k\| = \|T(g_k - g)\| \|h_k\| \le \max |k| (b - a) \|g_k - g\| \|h_k\| \to 0
\]
(note $\|h_k\|$ is bounded since $h_k \to h$ in $L^2$), and the last term satisfies $|(T(g), h_k - h)| \le \|T(g)\| \|h_k - h\| \to 0$. Therefore, we see that indeed $(T(g_k), h_k) \to (T(g), h)$, and similarly $(g_k, T(h_k)) \to (g, T(h))$.
Therefore, $T : L^2([a, b]) \to C^0([a, b]) \subset L^2([a, b])$ satisfies $(T(g), h)_{L^2[a,b]} = (g, T(h))_{L^2[a,b]}$, so $T$ is self-adjoint as an operator $L^2 \to L^2$. □
In addition, we also have the following result:
Proposition 21.2. $T : L^2[a, b] \to L^2[a, b]$ is compact. In fact, it is Hilbert–Schmidt, i.e. there exists a complete orthonormal sequence $f_1, f_2, \dots$ for $L^2[a, b]$ with $\sum_{j=1}^{\infty} \|T(f_j)\|^2_{L^2[a,b]} < \infty$.
Proof. Recall that for $L^2([0, 2\pi])$ with the inner product $(f, g) = \frac{1}{\pi} \int_0^{2\pi} f(x)g(x)\, dx$, we have already found a complete orthonormal sequence $\{\frac{1}{\sqrt 2}, \cos x, \sin x, \cos 2x, \sin 2x, \dots\}$. We just need to rescale. Define a new variable $x = \frac{2\pi(t - a)}{b - a}$, so that $dx = \frac{2\pi}{b - a}\, dt$. This yields the inner product $(f, g) = \frac{2}{b - a} \int_a^b \tilde f(t)\tilde g(t)\, dt$ on $[a, b]$, and our new orthonormal sequence is
\[
\frac{1}{\sqrt 2}, \quad \cos \frac{2\pi(t - a)n}{b - a}, \quad \sin \frac{2\pi(t - a)n}{b - a}, \qquad n = 1, 2, \dots.
\]
Note that this is a complete orthonormal sequence $f_1, f_2, \dots$, and all of these functions are in $C^0[a, b]$.
We now have that
\[
\frac{2}{b - a} T(f_j)(t) = \frac{2}{b - a} \int_a^b k(s, t)f_j(s)\, ds = (k(\cdot, t), f_j)_{L^2[a,b]}.
\]
Bessel's identity then tells us that
\[
\left( \frac{2}{b - a} \right)^2 \sum_{j=1}^{\infty} (T(f_j)(t))^2 = \|k(\cdot, t)\|^2_{L^2[a,b]} = \frac{2}{b - a} \int_a^b k(s, t)^2\, ds
\]
for all $t \in [a, b]$. This means that
\[
\frac{2}{b - a} \sum_{j=1}^{N} (T(f_j)(t))^2 \le \int_a^b k(s, t)^2\, ds,
\]
and integrating over $t$,
\[
\sum_{j=1}^{N} \|T(f_j)\|^2_{L^2[a,b]} \le \int_a^b \int_a^b k(s, t)^2\, ds\, dt,
\]
which is a fixed constant. This shows that $\sum_{j=1}^{\infty} \|T(f_j)\|^2_{L^2[a,b]} < \infty$, so $T$ is Hilbert–Schmidt.
We already showed that $T$ is bounded (in fact, $\|T(g)\|_{L^2[a,b]} \le \max |k| (b - a) \|g\|_{L^2[a,b]}$), so the definition of Hilbert–Schmidt applies with $T \in L(L^2, L^2)$, and Theorem 17.2 shows that $T$ is compact. (Note that the definition of Hilbert–Schmidt already requires $T$ to be a bounded operator; that hypothesis is what we verified above.) □
Therefore, the Spectral Theorem 17.4 applies, and hence there exists a complete orthonormal sequence $h_1, h_2, \dots$ for $(\ker T)^\perp$ with $T(h_j) = \lambda_j h_j$, $\lambda_j \ne 0$, and $\lambda_j \to 0$ as $j \to \infty$.
We claim that $\ker T = \{0\}$ (this will be checked next time). Assuming this for now, we see that $h_1, h_2, \dots$ is a complete orthonormal sequence for $L^2[a, b]$ and $T(h_j) = \lambda_j h_j$ with $\lambda_j \ne 0$ for all $j$. How does this relate to the original Sturm–Liouville eigenvalue problem?
Recall that $T : L^2[a, b] \to C^0[a, b] \subset L^2[a, b]$ maps into the continuous functions, so each $h_j = \frac{1}{\lambda_j} T(h_j) \in C^0[a, b]$ is continuous. However, we also know that $T : C^0[a, b] \to C^2[a, b]$, which means that in fact each $h_j \in C^2[a, b]$. By construction, $T$ solves Problem 20.1, so that $L(T(g)) = g$; hence $\lambda_j L(h_j) = h_j$, and we've transformed the problem into
\[
\begin{cases}
L(h_j) = \frac{1}{\lambda_j} h_j \\
\alpha h_j(a) + \beta h_j'(a) = 0 \\
\gamma h_j(b) + \delta h_j'(b) = 0.
\end{cases}
\]
In summary, we have found a complete orthonormal sequence of functions for $L^2[a, b]$, each of which is in $C^2[a, b]$, and which solve the Sturm–Liouville eigenvalue problem with our boundary conditions. This completes the solution of the Sturm–Liouville eigenvalue problem, except that we still need to check that $\ker T = \{0\}$.
This was a highly nontrivial result, so it wasn’t surprising that we had to work so hard
for it.

22. 5/18
Recall from last time that we still need to check that ker T = {0}.
Proposition 22.1. ker T = {0}.
Proof. Recall that for $g \in L^2[a, b]$, we have
\[
T(g)(t) = \int_a^b k(s, t)g(s)\, ds = u(t)\varphi_g(t) + v(t)\psi_g(t),
\]
where $c = p(uv' - vu') \ne 0$ and
\[
\varphi_g(t) = \frac{1}{c} \int_t^b v(s)g(s)\, ds \qquad \text{and} \qquad \psi_g(t) = \frac{1}{c} \int_a^t u(s)g(s)\, ds.
\]
Suppose that $g \in L^2[a, b]$. Recall that there exists a sequence $\{g_k\} \subset C^0([a, b])$ with $\|g_k - g\|_{L^2} \to 0$. Then
\[
T(g_k) = u(t)\varphi_{g_k}(t) + v(t)\psi_{g_k}(t).
\]
Note that
\[
|\varphi_{g_k}(t) - \varphi_g(t)| = \frac{1}{|c|} \left| \int_t^b v(s)(g_k(s) - g(s))\, ds \right| \le \frac{1}{|c|} \int_a^b |v(s)||g_k(s) - g(s)|\, ds \le \frac{\max_{[a,b]} |v|}{|c|} \sqrt{b - a}\, \|g_k - g\|_{L^2[a,b]} \to 0.
\]
The same argument works for $\psi_{g_k}(t)$, so therefore $T(g_k)(t) \to u(t)\varphi_g(t) + v(t)\psi_g(t)$ uniformly; in particular, the limit $u(t)\varphi_g(t) + v(t)\psi_g(t)$ is in $C^0[a, b]$.
Last time, we also checked that $T(g_k)(t) \to T(g)(t)$. Thus, $T(g)(t) = u(t)\varphi_g(t) + v(t)\psi_g(t)$. We knew this to be true for $g \in C^0([a, b])$, and now we've checked it for $g \in L^2[a, b]$.
Also, by construction, $(T(g_k))'(t) = u'(t)\varphi_{g_k}(t) + v'(t)\psi_{g_k}(t)$; we chose $\varphi$ and $\psi$ to make this happen, and in fact $u(t)\varphi_{g_k}'(t) + v(t)\psi_{g_k}'(t) \equiv 0$. The same uniform convergence as before shows that $(T(g_k))'(t) \to u'(t)\varphi_g(t) + v'(t)\psi_g(t)$ uniformly.
We integrate this identity over $[a, t]$ to get that
\[
T(g_k)(t) = T(g_k)(a) + \int_a^t \left( u'(s)\varphi_{g_k}(s) + v'(s)\psi_{g_k}(s) \right) ds.
\]
By uniform convergence, we can take limits under the integral, and we see that
\[
T(g)(t) = T(g)(a) + \int_a^t \left( u'(s)\varphi_g(s) + v'(s)\psi_g(s) \right) ds.
\]
Here, as we checked earlier, the integrand is a continuous function. This means that $T(g)$ is $C^1[a, b]$ and $T(g)'(t) = u'(t)\varphi_g(t) + v'(t)\psi_g(t)$.
To summarize, we have that
\begin{align*}
T(g)(t) &= u(t)\varphi_g(t) + v(t)\psi_g(t) \\
T(g)'(t) &= u'(t)\varphi_g(t) + v'(t)\psi_g(t).
\end{align*}
Therefore, if $T(g) = 0$ then
\begin{align*}
u(t)\varphi_g(t) + v(t)\psi_g(t) &= 0 \\
u'(t)\varphi_g(t) + v'(t)\psi_g(t) &= 0.
\end{align*}
This yields
\[
\begin{pmatrix} u(t) & v(t) \\ u'(t) & v'(t) \end{pmatrix} \begin{pmatrix} \varphi_g(t) \\ \psi_g(t) \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.
\]
The determinant of the matrix is $\frac{1}{p}$ times the Wronskian, which we know is nonzero. Hence, $\varphi_g(t) \equiv 0$ and $\psi_g(t) \equiv 0$. Hence, we've derived that
\[
\int_t^b v(s)g(s)\, ds \equiv 0 \qquad \text{and} \qquad \int_a^t u(s)g(s)\, ds \equiv 0
\]
for all $t$. We claim that $g = 0$ almost everywhere. If $g$ were continuous, we could differentiate and use the fundamental theorem of calculus; however, $g \in L^2[a, b]$, so we have to be a bit more careful. This is the content of the following lemma:
Lemma 22.2. If $f \in L^2[a, b]$ with $\int_a^t f(s)\, ds = 0$ for all $t$, then $f = 0$ almost everywhere.

Proof. There exists a sequence of step functions $\varphi_k$ with $\|\varphi_k - f\|_{L^2} \to 0$. For each $\varphi_k$, there is an associated partition $a = x_0^{(k)} < x_1^{(k)} < \cdots < x_{N_k}^{(k)} = b$ such that
\[
\varphi_k = \sum_{j=1}^{N_k} a_j^{(k)} \chi_{[x_{j-1}^{(k)}, x_j^{(k)}]}.
\]
Then
\[
\|\varphi_k - f\|_{L^2}^2 = (\varphi_k - f, \varphi_k - f)_{L^2[a,b]} = \|\varphi_k\|^2 + \|f\|^2 - 2(\varphi_k, f).
\]
Now,
\[
(\varphi_k, f) = \int_a^b \varphi_k(s)f(s)\, ds = \sum_{j=1}^{N_k} a_j^{(k)} \int_{x_{j-1}^{(k)}}^{x_j^{(k)}} f(s)\, ds = 0
\]
because
\[
\int_{x_{j-1}^{(k)}}^{x_j^{(k)}} f(s)\, ds = \int_a^{x_j^{(k)}} f(s)\, ds - \int_a^{x_{j-1}^{(k)}} f(s)\, ds = 0.
\]
Thus $\|\varphi_k - f\|_{L^2}^2 = \|\varphi_k\|^2 + \|f\|^2 \ge \|f\|^2$, and since the left-hand side tends to $0$, we get $\|f\|_{L^2} = 0$ and hence $f = 0$ almost everywhere. □

Thus, we’ve seen that v(s)g(s) = 0 and v(s)g(s) = 0 almost everywhere. We never have
u(s) = v(s) = 0 because the Wronskian is nonzero, so hence g(s) = 0 almost everywhere.
This shows that ker T = {0}. 
45
Therefore, if $0$ is not an eigenvalue, we know basically everything about the solutions of the Sturm–Liouville problem 19.2. That is, there exists a complete orthonormal sequence $h_1, h_2, \dots$ for $L^2[a, b]$ with $h_j \in C^2[a, b]$ and $T(h_j) = \lambda_j h_j$, where $\lambda_j \ne 0$ and $\lambda_j \to 0$. Here, $Lh_j = \frac{1}{\lambda_j} h_j$, and the $h_j$ satisfy the boundary conditions.
So far, we've assumed that $0$ is not an eigenvalue of the Sturm–Liouville eigenvalue problem 19.2:
\[
\begin{cases}
-Lu = \lambda u & \text{on } [a, b] \\
\alpha u(a) + \beta u'(a) = 0 & \text{for given } \alpha, \beta \text{ not both zero} \\
\gamma u(b) + \delta u'(b) = 0 & \text{for given } \gamma, \delta \text{ not both zero.}
\end{cases}
\]

Currently, if $0$ is an eigenvalue, we know nothing. We want to remove this assumption, and this takes some work.

Proposition 22.3. There exists $\mu_0 > 0$ such that no $\lambda \le -\mu_0$ is an eigenvalue. (Hence, $\frac{1}{\lambda_j} \to +\infty$.)

Proof. Suppose that $-Lu = \lambda u$ for $u \ne 0$, and suppose that we have the boundary condition $\alpha u(a) + \beta u'(a) = 0$. If $\beta = 0$, we have $u(a) = 0$; otherwise $\beta \ne 0$ and $u'(a) = -\frac{\alpha}{\beta} u(a)$. In either case, $p(a)u'(a)u(a) = c\,u(a)^2$ for a constant $c$ depending only on $p, \alpha, \beta$, and similarly $-p(b)u'(b)u(b) = -d\,u(b)^2$ for a constant $d$.
Consider $u^2(t)$. This is a continuous function on a closed interval, so it attains its minimum. Let $y \in [a, b]$ be a point where $u^2$ attains its minimum. Then
\[
\int_a^y (u^2)' = u^2(y) - u^2(a),
\]
which implies that
\[
u^2(a) = u^2(y) - 2\int_a^y u(s)u'(s)\, ds \le \frac{1}{b - a} \int_a^b u^2(s)\, ds + 2\int_a^b |u||u'|\, ds,
\]
using that the minimum value $u^2(y)$ is at most the average of $u^2$. The same bound holds for $u^2(b)$.
Now, integration by parts yields
\begin{align*}
\lambda \|u\|^2 = (-Lu, u)_{L^2} &= \int_a^b -\left( (pu')'u + qu^2 \right) = -pu'u \Big|_a^b + \int_a^b \left( p(u')^2 - qu^2 \right) \\
&= p(a)u'(a)u(a) - p(b)u'(b)u(b) + \int_a^b \left( p(u')^2 - qu^2 \right) \\
&= c\,u(a)^2 - d\,u(b)^2 + \int_a^b \left( p(u')^2 - qu^2 \right)
\end{align*}
for the constants $c$ and $d$ above. Thus,
\[
-\lambda \|u\|^2 = -c\,u(a)^2 + d\,u(b)^2 - \int_a^b \left( p(u')^2 - qu^2 \right) \le (|c| + |d|)\left( \frac{1}{b - a}\int_a^b u^2 + 2\int_a^b |u||u'| \right) - \int_a^b \left( p(u')^2 - qu^2 \right).
\]
We are now almost done, and we'll finish this next time. □
23. 5/20
We’re trying to prove that there is a fixed lower bound on all of the eigenvalues.

Finishing the proof of 22.3. Recall that we showed that if $-Lu = \lambda u$ (and $u \ne 0$) then
\[
-\lambda \|u\|^2 \le -\int_a^b \left( p(u')^2 - qu^2 \right) + \frac{|c| + |d|}{b - a} \int_a^b u^2 + 2(|c| + |d|) \int_a^b |u||u'|.
\]
Let $\delta = \min_{[a,b]} p$; we know that $p > 0$, so $\delta > 0$. Then the right-hand side is
\[
\le -\delta \int_a^b (u')^2 + \left( \max |q| + \frac{|c| + |d|}{b - a} \right) \int_a^b u^2 + 2(|c| + |d|) \int_a^b |u||u'|.
\]
We will now use a tricky little inequality: $|ab| \le \frac{1}{2}(a^2 + b^2)$, which is equivalent to $(|a| - |b|)^2 \ge 0$. For any $\varepsilon > 0$, we can write this as
\[
|ab| = \left( \sqrt{2\varepsilon}\, a \right)\left( \frac{b}{\sqrt{2\varepsilon}} \right) \le \varepsilon a^2 + \frac{1}{4\varepsilon} b^2.
\]
Applying this with $a = |u'|$ and $b = |u|$ (and absorbing the factor $2(|c| + |d|)$ into the constants), and setting $C = \max |q| + \frac{|c| + |d|}{b - a}$, we have
\[
-\lambda \int_a^b u^2 \le -\delta \int_a^b (u')^2 + \varepsilon \int_a^b (u')^2 + \left( C + \frac{(|c| + |d|)^2}{\varepsilon} \right) \int_a^b u^2
\]
for every $\varepsilon > 0$. Choose $\varepsilon = \delta$ to get
\[
-\lambda \int_a^b u^2 \le \left( \max |q| + \frac{|c| + |d|}{b - a} + \frac{(|c| + |d|)^2}{\delta} \right) \int_a^b u^2.
\]
Let $\gamma = \max |q| + \frac{|c| + |d|}{b - a} + \frac{(|c| + |d|)^2}{\delta}$ be the constant in this expression. Then any eigenvalue $\lambda$ must satisfy $-\lambda \le \gamma$; so if $\mu_0$ is any constant with $\mu_0 > \gamma$, then $-\lambda < \mu_0$, i.e. $\lambda > -\mu_0$. □

Recall that if $\mu_1, \mu_2, \mu_3, \dots$ are the eigenvalues, then we showed that $|\mu_j| \to \infty$; now that we've shown that they are all bounded below, we know that $\mu_j \to +\infty$ and there are only finitely many negative eigenvalues $\mu_j$.
Furthermore, we can now get rid of the annoying assumption that $0$ is not an eigenvalue. Let $L_0 u = (pu')' + (q - \mu_0)u = Lu - \mu_0 u$. Then $-L_0 u = \lambda u$ if and only if $-Lu = (\lambda - \mu_0)u$, and the boundary conditions for $u$ are the same for $L$ and $L_0$.
Thus, $0$ is not an eigenvalue for $L_0$ (since $-\mu_0$ is not an eigenvalue for $L$). All of our preceding results therefore apply: there exists a complete orthonormal sequence $h_1, h_2, \dots$ for all of $L^2[a, b]$ with corresponding eigenvalues $\mu_1', \mu_2', \dots$, i.e. $-L_0 h_j = \mu_j' h_j$, and hence $-Lh_j = \mu_j h_j$ for $\mu_j = \mu_j' - \mu_0$. The same result in fact holds in general: we have a complete orthonormal sequence of eigenfunctions, and the eigenvalues satisfy $\mu_j \to \infty$ as $j \to \infty$.
23.1. Application: Heat Flow. We now consider an application of this discussion to
partial differential equations. This application is heat flow.
Suppose you have a region of Rn . (The physically useful situations are n = 1, 2, 3.) Say
this region is made of some homogeneous and isotropic material. Isotropic means that the
material is not made of crystals, so heat has no preference for flowing in a particular direction.
Let u(x, t) be the temperature at position x and time t.
There is a very accurate model for heat flow. Here are the physical assumptions:
R
(1) The quantity of heat in any ball Bρ (y) is λ Bρ (y) u(x, t) dx where λ is some constant
dependent on the material.
(2) Heat should flow in the direction −∇x u(x, t), and the rate of flow should be pro-
portional
R to |∇u|. The rate of flow of heat across the boundary ∂Bρ (y) should be
µ ∂Bρ (y) η · ∇u where η is the unit normal of ∂Bρ (y).
Hence, we should have that
Z Z
d
λ u(x, t) dx = µ η · ∇u.
dt Bρ (y) ∂Bρ (y)

We assume that u is a C 1 function so we put the derivative under the integral sign. This
is just a model, and it is close enough to a smooth function so assuming this should not be
problematic. Then we can apply the Gauss’ Theorem to see that
Z Z Z Z
∂u
λ =µ η · ∇u = µ div(∇u) = µ ∆u.
Bρ (y) ∂t ∂Bρ (y) Bρ (y) Bρ (y)

Here, we used the fact that div(∇u) = ∆u, where ∆ is the Laplacian.
Summarizing, we see that for any ball $B_\rho(y)$ inside the material,
\[
\int_{B_\rho(y)} \frac{\partial u}{\partial t} = \frac{\mu}{\lambda} \int_{B_\rho(y)} \Delta u.
\]
Dividing both sides by the volume of $B_\rho(y)$ and letting $\rho \to 0$, we have that
\[
\frac{\partial u}{\partial t} = \frac{\mu}{\lambda} \Delta u.
\]
23.1.1. Example: $n = 1$. Let's now apply this in the case $n = 1$. This is the case of a metal bar, and there are various natural boundary conditions. We could impose $u(a, t) \equiv u(b, t) \equiv 0$ to keep the ends at a fixed temperature. Alternatively, we could insulate the ends to prevent heat flow, yielding $\frac{\partial u}{\partial x}(a, t) \equiv \frac{\partial u}{\partial x}(b, t) \equiv 0$.
Problem 23.1. In this case, our problem has boundary conditions and initial conditions:
\[
\begin{cases}
\frac{\partial u}{\partial t} = \frac{\partial^2 u}{\partial x^2}, & t > 0,\ a < x < b \\
\alpha u(a, t) + \beta \frac{\partial u}{\partial x}(a, t) = 0, & t > 0 \\
\gamma u(b, t) + \delta \frac{\partial u}{\partial x}(b, t) = 0, & t > 0 \\
u(x, 0) = \varphi(x), & a \le x \le b.
\end{cases}
\]
Remark. There is no loss of generality in dropping the constant $\frac{\mu}{\lambda}$: simply rescale time via $t \mapsto \frac{\mu}{\lambda} t$.

First, we look for separated-variable solutions, i.e. $u(x, t) = f(x)g(t)$. This won't solve the problem completely, but it's a good start.
We have
\[
\frac{\partial u}{\partial t} = f(x)g'(t), \qquad \frac{\partial^2 u}{\partial x^2} = f''(x)g(t),
\]
so our PDE becomes $f(x)g'(t) = f''(x)g(t)$. This is equivalent to
\[
\frac{g'(t)}{g(t)} = \frac{f''(x)}{f(x)}
\]
at points where $g(t) \ne 0$ and $f(x) \ne 0$. How could that be? One side is a function of $t$ and the other is a function of $x$, so they can be equal only when both are constant. We can therefore get a family of many solutions as follows: take $\lambda \in \mathbb{R}$, and solve
\[
\begin{cases}
g'(t) = -\lambda g(t) & t \ge 0 \\
f''(x) = -\lambda f(x) & a \le x \le b.
\end{cases}
\]
We can solve the first equation easily; this is just $g(t) = Ce^{-\lambda t}$ for some constant $C$. We know how to solve second order equations, so we can also handle the second equation. We're interested in taking $u = f(x)g(t)$ while satisfying as many of our conditions as possible, so we'll make $u$ also satisfy the boundary conditions. Observe that the boundary conditions for $u$ translate directly into boundary conditions for $f$, so we want to solve
\[
\begin{cases}
-f'' = \lambda f \\
\alpha f(a) + \beta f'(a) = 0 \\
\gamma f(b) + \delta f'(b) = 0.
\end{cases}
\]
This is a Sturm–Liouville problem that we know how to solve: there exists a complete orthonormal sequence of eigenfunctions $h_1, h_2, \dots$ with corresponding eigenvalues $\lambda_1, \lambda_2, \dots$ with $\lambda_j \to \infty$.
That is, we have solutions $c_j h_j(x) e^{-\lambda_j t}$ for $j = 1, 2, \dots$. These satisfy the partial differential equation and the boundary conditions. Now we have to try to satisfy the initial conditions. The eigenfunctions form a complete orthonormal sequence, so for $\varphi \in L^2[a, b]$ we can pick $c_j = (\varphi, h_j)$ to get that
\[
\sum_{j=1}^{\infty} c_j h_j(x) e^{-\lambda_j t} \Big|_{t=0} = \sum_{j=1}^{\infty} c_j h_j(x) = \varphi(x)
\]
with convergence in the $L^2$ norm.
So it looks like we're done! There's still something left to check: we need to make sure that this series remains smooth and still satisfies the PDE. There's still some checking and a little way to go, but that's the general idea.
24. 5/23
Recall that we were considering the heat equation problem 23.1.
We showed that $u(x, t) = \sum_{j=1}^{\infty} c_j e^{-\lambda_j t} h_j(x)$, where $\lambda_j$ is the $j$-th eigenvalue of a Sturm–Liouville problem, $h_j$ is the $j$-th eigenfunction, and $c_j = (\varphi, h_j)_{L^2[a,b]}$, is a candidate solution.
By construction, each partial sum is a solution of the heat equation satisfying the boundary conditions. Consider the initial conditions; at $t = 0$, we have $\sum c_j h_j = \varphi$ in $L^2$. There are a number of things to check: does this sum converge pointwise? Is it continuous?
To check these properties, we need to use the Weierstrass $M$-test.

Proposition 24.1 (Weierstrass $M$-test). Suppose we have a sequence of real-valued functions $f_n : X \to \mathbb{R}$, and suppose we have numbers $M_n \ge 0$ such that $\sup_X |f_n| \le M_n$ and $\sum M_n$ converges. Then $\sum_{n=1}^{\infty} f_n(x)$ converges uniformly on $X$.

Proof. First, the comparison test guarantees convergence: $|f_n(x)| \le M_n$ and $\sum M_n$ convergent imply that $\sum f_n(x)$ is absolutely convergent for each $x$.
Let $S_N(x) = \sum_{n=1}^{N} f_n(x)$ and $S(x) = \sum_{n=1}^{\infty} f_n(x)$.
Let $\varepsilon > 0$ be given. Pick $J$ such that $\sum_{n=J+1}^{\infty} M_n \le \varepsilon$. Then take any $N > J$. Then
\[
|S_N(x) - S(x)| = \left| \sum_{n=N+1}^{\infty} f_n(x) \right| \le \sum_{n=N+1}^{\infty} |f_n(x)| \le \sum_{n=N+1}^{\infty} M_n \le \varepsilon
\]
for every $x$. This means that $\sup_{x \in X} |S_N(x) - S(x)| \le \varepsilon$ for every $N > J$, which gives us uniform convergence. □

We only have finite time, so let's only do this under the simplest boundary conditions. Consider the special case of the interval $[0, \pi]$ with boundary conditions $u(0, t) = 0$ and $u(\pi, t) = 0$. In the notation of Problem 23.1, this corresponds to $(\alpha, \beta) = (\gamma, \delta) = (1, 0)$.
In this case, we know the eigenvalues of the Sturm–Liouville problem from Example 19.3: $\lambda_j = j^2$ for $j = 1, 2, \dots$ and $h_j(x) = \sin jx$. Here, we use the $L^2$ inner product with the appropriate scaling to make the $h_j$ have norm one: $(f, g) = \frac{2}{\pi} \int_0^\pi f(x)g(x)\, dx$. Then we have
\[
u(x, t) = \sum_{j=1}^{\infty} c_j e^{-j^2 t} \sin jx
\]
where
\[
c_j = \frac{2}{\pi} \int_0^\pi \varphi(x) \sin jx\, dx.
\]

Proposition 24.2. This series for $u(x, t)$ converges uniformly on $[0, \pi] \times [0, \infty)$.

Proof. We will check that this converges using the Weierstrass $M$-test. To do this, we will also assume that $\varphi \in C^2([0, \pi])$ satisfies the boundary conditions, so that $\varphi(0) = \varphi(\pi) = 0$. Note that then $u(0, t) = u(\pi, t) = 0$.
We can integrate by parts twice, observing that all boundary terms are zero, to get that
\begin{align*}
c_j &= \frac{2}{\pi} \int_0^\pi \varphi(x)\sin jx\, dx = -\frac{2}{j\pi} \varphi(x)\cos jx \Big|_0^\pi + \frac{2}{j\pi} \int_0^\pi \varphi'(x)\cos jx\, dx \\
&= \frac{2}{j^2\pi} \varphi'(x)\sin jx \Big|_0^\pi - \frac{2}{j^2\pi} \int_0^\pi \varphi''(x)\sin jx\, dx = -\frac{2}{j^2\pi} \int_0^\pi \varphi''(x)\sin jx\, dx,
\end{align*}
and hence
\[
|c_j| \le \frac{2\max_{[0,\pi]} |\varphi''|}{j^2}
\]
for $j = 1, 2, \dots$. We claim that $\sum_{j=1}^{\infty} c_j e^{-j^2 t} \sin jx$ converges uniformly on $[0, \pi] \times [0, \infty)$. We can now check this:
\[
\left| c_j e^{-j^2 t} \sin jx \right| \le \frac{2\max_{[0,\pi]} |\varphi''|}{j^2},
\]
which is the $j$th term of a convergent series (a multiple of $\sum \frac{1}{j^2}$). This is what we call $M_j$ in the Weierstrass $M$-test. The Weierstrass $M$-test therefore shows uniform convergence. □
The uniform limit of continuous functions is continuous, which shows the following corol-
lary:
Corollary 24.3. $\sum_{j=1}^{\infty} c_j e^{-j^2 t} \sin jx$ is continuous on $[0, \pi] \times [0, \infty)$. Hence the boundary conditions and initial conditions are satisfied.
Now, we just need to show that u(x, t) satisfies the heat equation. To do this, we need to
be able to differentiate; this will come from the following general result:
Theorem 24.4 (Differentiation of series). Suppose we have a sequence of $C^1$ functions $f_n : [c, d] \to \mathbb{R}$ for $n = 1, 2, \dots$; suppose that $\sum_{n=1}^{\infty} f_n(x)$ converges for all $x \in [c, d]$, and suppose that $\sum_{n=1}^{\infty} f_n'(x)$ is uniformly convergent on $[c, d]$. Then $f(x) = \sum_{n=1}^{\infty} f_n(x)$ is $C^1$ on $[c, d]$ and $f'(x) = \sum_{n=1}^{\infty} f_n'(x)$.

Remark. We have to be careful: there are counterexamples where we cannot differentiate inside an infinite series.
To check that $\sum f_n'(x)$ is uniformly convergent, the Weierstrass $M$-test gives a sufficient condition: $|f_n'(x)| \le M_n$ where $\sum M_n$ is convergent.
Proof. We can differentiate a finite sum without any problems, so that
\[
\frac{d}{dx}\left( \sum_{n=1}^{N} f_n(x) \right) = \sum_{n=1}^{N} f_n'(x),
\]
which implies that
\[
\int_c^x \sum_{n=1}^{N} f_n'(t)\, dt = \sum_{n=1}^{N} f_n(x) - \sum_{n=1}^{N} f_n(c).
\]
Since $\sum f_n'$ converges uniformly, we can take the limit as $N \to \infty$ under the integral sign. This gives
\[
\int_c^x \sum_{n=1}^{\infty} f_n'(t)\, dt = \sum_{n=1}^{\infty} (f_n(x) - f_n(c)) = \sum_{n=1}^{\infty} f_n(x) - \sum_{n=1}^{\infty} f_n(c).
\]
Note that $\sum_{n=1}^{\infty} f_n'(x)$ is a continuous function on $[c, d]$ because it is the uniform limit of continuous functions. By the fundamental theorem of calculus, $\sum_{n=1}^{\infty} f_n(x)$ is $C^1[c, d]$ and $\frac{d}{dx}\left( \sum_{n=1}^{\infty} f_n(x) \right) = \sum_{n=1}^{\infty} f_n'(x)$, which is what we wanted. □
We can now use the theorem to see our next claim:
Proposition 24.5. $u(x, t) = \sum_{j=1}^{\infty} c_j e^{-j^2 t} \sin jx$ is $C^2$ on $(0, \pi) \times (0, \infty)$, its partial derivatives can be computed termwise, and hence the equation $\frac{\partial u}{\partial t} = \frac{\partial^2 u}{\partial x^2}$ does hold.

Proof. We will check that we can apply our general differentiation theorem; that is, we need to see that the termwise differentiated series are uniformly convergent.
Consider any fixed $x \in (0, \pi)$ and suppose that $t \in [c, d]$ for some $0 < c < d < \infty$. Then the termwise differentiated series with respect to $t$,
\[
\sum_{j=1}^{\infty} c_j(-j^2) e^{-j^2 t} \sin jx,
\]
is uniformly convergent on the interval $[c, d]$ by the Weierstrass $M$-test. Indeed,
\[
\left| c_j j^2 e^{-j^2 t} \sin jx \right| \le 2\max_{[0,\pi]} |\varphi''|\, e^{-j^2 c} \le 2\max_{[0,\pi]} |\varphi''|\, e^{-jc},
\]
which is the $j$th term of a convergent series. Therefore the termwise differentiation theorem 24.4 applies and tells us that
\[
\frac{\partial u}{\partial t} = -\sum_{j=1}^{\infty} j^2 c_j e^{-j^2 t} \sin jx.
\]
Similarly, in an almost identical argument, if we take some $0 < c < d < \pi$ and some fixed $t > 0$, we see that the termwise differentiated series with respect to $x$,
\[
\sum_{j=1}^{\infty} c_j j e^{-j^2 t} \cos jx,
\]
is uniformly convergent by the Weierstrass $M$-test; indeed, $|c_j j e^{-j^2 t} \cos jx| \le 2\max |\varphi''|\, e^{-jt}$ is the $j$th term of a convergent series. Also, differentiating a second time, we see that
\[
-\sum_{j=1}^{\infty} j^2 c_j e^{-j^2 t} \sin jx
\]
is uniformly convergent on $[c, d]$ as well. This shows that
\[
\frac{\partial^2 u}{\partial x^2} = -\sum_{j=1}^{\infty} c_j j^2 e^{-j^2 t} \sin jx = \frac{\partial u}{\partial t}.
\]
This concludes the argument, and we can conclude that $u(x, t)$ is indeed a solution to the heat equation 23.1. Note that if we had continued in this way, we could have shown that $u(x, t)$ is actually $C^\infty$ in the interior. □
25. 5/25
Let’s briefly revisit the spectrum. Suppose that X is a Banach space and A ∈ L(X, X).
Recall that the spectrum of $A$ is $\sigma(A) = \{z \in \mathbb{C} : A - zI \text{ is not invertible}\}$, and that $\sigma(A)$ is a closed subset of the closed disc $\{z \in \mathbb{C} : |z| \le \|A\|\}$. We also showed that the set of eigenvalues is trivially contained in $\sigma(A)$, though the set of eigenvalues might be empty (e.g. in the case of the shift operator).
There was one thing that we neglected to address: is the spectrum nonempty?
Theorem 25.1. The spectrum σ(A) is always nonempty.
We need two preliminaries before we can prove this claim.
Definition 25.2. A function $h$ is holomorphic on $\{z : |z| \le R\}$ if for every $z_0$ with $|z_0| \le R$, we can write $h(z) = \sum_{j=0}^{\infty} a_j (z - z_0)^j$ for all sufficiently small $|z - z_0|$.
Theorem 25.3 (Maximum modulus principle). If h(z) is holomorphic on {z : |z| ≤ R} then
max|z|≤R |h(z)| = max|z|=R |h(z)|.
We will also need another very important result, which is one of the basic results of
functional analysis.
Theorem 25.4 (Hahn-Banach theorem). Let X be any normed linear space. Then there
exists a nontrivial bounded linear functional on X.
Example 25.5. For example, if X is an inner product space, f (x) = (x, y) for some fixed
y ∈ X \ {0} would be a bounded linear functional. This is less obvious in normed spaces.
We will assume these two preliminaries for now, and defer their proofs to later. First, we
will prove that the spectrum is nonempty.
Proof of Theorem 25.1. Suppose to the contrary that $\sigma(A) = \emptyset$, i.e. $A - zI$ is invertible for every $z \in \mathbb{C}$. In particular, if $z_0 \in \mathbb{C}$ then we can write
\[
A - zI = (A - z_0 I) - (z - z_0)I = (A - z_0 I)\left( I - (z - z_0)(A - z_0 I)^{-1} \right).
\]
Let $B = (A - z_0 I)^{-1}$. Recall that by Theorem 14.9, $I - (z - z_0)B$ is invertible if $|z - z_0|\|B\| < 1$, i.e. if
\[
|z - z_0| < \frac{1}{\|(A - z_0 I)^{-1}\|} = \frac{1}{\|B\|},
\]
and in this case, we have
\[
(A - zI)^{-1} = \left( \sum_{j=0}^{\infty} (z - z_0)^j B^j \right) B = \sum_{j=0}^{\infty} (z - z_0)^j B^{j+1},
\]
so
\[
(A - zI)^{-1}(y) = \sum_{j=0}^{\infty} (z - z_0)^j B^{j+1}(y)
\]
for any $y \in X$, as long as $|z - z_0| < \frac{1}{\|B\|}$. By linearity and continuity, we get that for any nontrivial bounded linear functional $f$ (which exists by the Hahn–Banach theorem 25.4),
\[
f((A - zI)^{-1}(y)) = \sum_{j=0}^{\infty} (z - z_0)^j f(B^{j+1}(y))
\]
in $\mathbb{C}$, so $h(z) = f((A - zI)^{-1}(y))$ is holomorphic for $|z - z_0| < \frac{1}{\|B\|}$. Since we picked $z_0$ arbitrarily, we've shown that $h(z)$ is holomorphic on the entire plane $\mathbb{C}$.
Suppose that $|z| > 2\|A\|$. Then we can write $A - zI = -z(I - \frac{1}{z}A)$, where $\left\| \frac{1}{z} A \right\| < \frac{1}{2} < 1$. Applying Theorem 14.9 again, we see that
\[
(A - zI)^{-1} = -\frac{1}{z} \sum_{j=0}^{\infty} \frac{1}{z^j} A^j.
\]
This time, we see that
\[
h(z) = f((A - zI)^{-1} y) = f\left( -\frac{1}{z} \sum_{j=0}^{\infty} \frac{1}{z^j} A^j y \right) = -\frac{1}{z} \sum_{j=0}^{\infty} f\left( \frac{1}{z^j} A^j y \right)
\]
for $|z| > 2\|A\|$. Therefore,
\[
|h(z)| \le \frac{1}{|z|} \sum_{j=0}^{\infty} \frac{\|f\|\|A\|^j\|y\|}{|z|^j} = \frac{\|f\|\|y\|}{|z|} \sum_{j=0}^{\infty} \left( \frac{\|A\|}{|z|} \right)^j \le \frac{2\|f\|\|y\|}{|z|}.
\]
Therefore, we see that $\max_{|z|=R} |h(z)| \to 0$ as $R \to \infty$. By the maximum modulus principle 25.3, this shows that $\max_{|z|\le R} |h(z)| \to 0$ as $R \to \infty$, which shows that $h \equiv 0$.
We have now shown that $0 \equiv h(z) = f((A - zI)^{-1}(y))$ for every $y \in X$. Since $(A - zI)^{-1}$ maps the Banach space onto itself, this means that $f \equiv 0$, which is a contradiction because we chose $f$ to be a nontrivial linear functional. □

Now, we need to prove our preliminaries. First, we'll prove the maximum principle for subharmonic functions.

Proposition 25.6. Let $u \in C^2(B_R) \cap C^0(\bar B_R)$, where $B_R = \{x \in \mathbb{R}^n : \|x\| < R\}$ and $\bar B_R$ is its closure; that is, $u$ is twice differentiable on the open ball and continuous on the closed ball. Suppose that $\Delta u \ge 0$ on $B_R$. Then $\max_{\bar B_R} u = \max_{\partial B_R} u$.

Here, $\Delta u = \sum_{j=1}^{n} \frac{\partial^2 u}{\partial x_j^2}$ is the Laplacian.

Proof. Let $\varepsilon > 0$, and let $v(x) = u(x) + \varepsilon\|x\|^2$. Here, $\Delta v = \Delta u + 2n\varepsilon > 0$ in $B_R$.
First, suppose there were some interior point $y \in B_R$ such that $v(y) = \max_{\bar B_R} v$. Then
\[
\frac{\partial v}{\partial x_j}(y) = 0, \qquad \frac{\partial^2 v}{\partial x_j^2}(y) \le 0,
\]
which implies that $\Delta v(y) \le 0$; this contradicts $\Delta v > 0$. (We defined $v$ precisely in order to have strictly positive Laplacian, to make this work.) Hence the maximum of $v$ occurs on the boundary, and
\[
\max_{\bar B_R} u \le \max_{\bar B_R} v = \max_{\partial B_R} v \le \max_{\partial B_R} u + \varepsilon R^2
\]
for all $\varepsilon > 0$, so therefore $\max_{\bar B_R} u \le \max_{\partial B_R} u$. The reverse inequality is trivial, so we've proven the maximum principle. □

We are now ready to prove theorem 25.3.


Proof of theorem 25.3. Suppose that h(z) is holomorphic on D_R = {z ∈ C : |z| ≤ R}. We
can write h(z) = h(x + iy) = u(x, y) + iv(x, y), where u and v are real-valued functions on
$\{(x, y) : \sqrt{x^2 + y^2} \le R\}$. Then recall the Cauchy-Riemann equations:
$$\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y}, \qquad \frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x}.$$
Differentiating the first with respect to x and the second with respect to y yields
$$\frac{\partial^2 u}{\partial x^2} = \frac{\partial^2 v}{\partial x \partial y} = \frac{\partial^2 v}{\partial y \partial x}, \qquad \frac{\partial^2 u}{\partial y^2} = -\frac{\partial^2 v}{\partial y \partial x},$$
hence
$$\Delta u = \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 0.$$
Similarly, ∆v = 0 as well; this means that u and v are harmonic. Now,
$$|h(z)|^2 = u^2 + v^2,$$
hence
$$\Delta |h(z)|^2 = \frac{\partial}{\partial x}\Bigl(2u \frac{\partial u}{\partial x} + 2v \frac{\partial v}{\partial x}\Bigr) + \frac{\partial}{\partial y}\Bigl(2u \frac{\partial u}{\partial y} + 2v \frac{\partial v}{\partial y}\Bigr) = 2\biggl(\Bigl(\frac{\partial u}{\partial x}\Bigr)^2 + \Bigl(\frac{\partial v}{\partial x}\Bigr)^2 + \Bigl(\frac{\partial u}{\partial y}\Bigr)^2 + \Bigl(\frac{\partial v}{\partial y}\Bigr)^2\biggr) \ge 0,$$
where the terms 2u∆u and 2v∆v vanish because u and v are harmonic. Therefore |h(z)|² satisfies the hypotheses of the maximum principle 25.6, and we are done. □
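
One can also see the maximum modulus principle numerically. A minimal sketch, with an arbitrarily chosen holomorphic function and grid: sample |h| over a closed disc and observe that the maximum over the disc already occurs on the boundary circle.

```python
# Minimal numerical illustration of the maximum modulus principle;
# the function h and the grid resolution are arbitrary choices.
import numpy as np

h = lambda z: np.exp(z) + z**2
R = 2.0

r = np.linspace(0, R, 400)
theta = np.linspace(0, 2 * np.pi, 400)
rr, tt = np.meshgrid(r, theta)
disc = rr * np.exp(1j * tt)            # sample points of the closed disc

max_disc = np.abs(h(disc)).max()
max_boundary = np.abs(h(R * np.exp(1j * theta))).max()
print(max_disc, max_boundary)          # equal: the max sits on |z| = R
```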

26. 5/27
Today we will prove the Hahn-Banach theorem. To reiterate, here is the statement of the
theorem:
Theorem 26.1 (Hahn-Banach theorem). Let X be any real vector space, and suppose that
p : X → R is positively homogeneous (i.e. p(tx) = tp(x) for all t > 0 and all
x ∈ X) and subadditive (i.e. p(x + y) ≤ p(x) + p(y) for all x, y ∈ X). Let S0 be any subspace
of X and suppose that f0 : S0 → R is linear with f0(x) ≤ p(x) for all x ∈ S0.
Then there exists an extension f : X → R that is linear with f |S0 = f0 and f (x) ≤ p(x)
for all x ∈ X.
As we discussed last time, we are interested in a corollary:
Corollary 26.2. Let X be any normed space, and take p(x) = ‖x‖. Consider some arbitrary
y ∈ X \ {0} and S0 = span{y}. Then f0(ty) = t‖y‖ is linear on S0 with f0 ≤ p there. The Hahn-Banach
theorem 26.1 then implies that there exists a linear f : X → R with f|_{S0} = f0 and f(x) ≤ ‖x‖ for
every x ∈ X. Since also −f(x) = f(−x) ≤ ‖−x‖ = ‖x‖, we get |f(x)| ≤ ‖x‖, so f is bounded; and f(y) = ‖y‖ ≠ 0, so f is a nontrivial bounded linear functional.
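
In concrete spaces the functional can often be written down directly, with no appeal to the general theorem. As a sketch (a toy finite-dimensional case, not the construction used in the proof): in Rⁿ with the sup norm, picking the coordinate where ‖y‖∞ is attained gives f with f(y) = ‖y‖∞ and |f(x)| ≤ ‖x‖∞.

```python
# Toy instance of corollary 26.2 in (R^n, ||.||_inf): an explicit norming
# functional, no Zorn's lemma needed.  The example is illustrative only.
import numpy as np

def norming_functional(y):
    """Return f with f(y) = ||y||_inf and |f(x)| <= ||x||_inf for all x."""
    k = int(np.argmax(np.abs(y)))        # coordinate achieving the sup norm
    s = np.sign(y[k])
    return lambda x: s * x[k]

y = np.array([1.0, -3.0, 2.0])
f = norming_functional(y)
print(f(y), np.abs(y).max())             # both 3.0, so f(y) = ||y||_inf
x = np.array([0.5, 0.2, -0.9])
print(abs(f(x)) <= np.abs(x).max())      # True: |f| <= ||.||_inf everywhere
```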
We are now ready to attack the proof of the Hahn-Banach theorem.
55
Proof of Hahn-Banach theorem 26.1. Suppose that S1 is a subspace of X which contains S0
and that there exists f1 : S1 → R which is linear with f1|_{S0} = f0 and f1(x) ≤ p(x) for
every x ∈ S1.
If S1 ≠ X, then we can pick a vector a ∈ X \ S1, and we can define a new subspace
S2 = {x + ta : x ∈ S1, t ∈ R}. We define a candidate extension f2 : S2 → R by f2(x + ta) =
f1(x) + tλ, where x ∈ S1 and λ ∈ R is a constant to be chosen (note λ = f2(a)). This is clearly linear, and it is an extension of f1
because f2|_{S1} = f1 (this is the case t = 0).
Here is the main step of the proof:
Claim. There exists a choice of λ such that f2 (x) ≤ p(x) for all x ∈ S2 .
Proof. In the case t = +1, the requirement f2 ≤ p reads f2(x + a) = f1(x) + λ ≤ p(x + a) for all x ∈ S1, and in the case
t = −1, it reads f2(y − a) = f1(y) − λ ≤ p(y − a) for all y ∈ S1. Adding these two desired inequalities, the λ's cancel, leaving f1(x) + f1(y) =
f1(x + y) ≤ p(x + y) for all x, y ∈ S1.
We use subadditivity in a somewhat tricky way to see that
f1 (x) + f1 (y) ≤ p(x + y) = p((x + a) + (y − a)) ≤ p(x + a) + p(y − a),
which means that
f1 (y) − p(y − a) ≤ p(x + a) − f1 (x)
for all x, y ∈ S1 . By taking a fixed x, we see that
sup (f1 (y) − p(y − a)) ≤ p(x + a) − f1 (x)
y∈S1

for all x, which in turn implies that


sup (f1 (y) − p(y − a)) ≤ inf (p(x + a) − f1 (x)).
y∈S1 x∈S1

That’s great, because there’s a number that fits in between these two (possibly equal to
both). Choose λ to be this number. That is, there exists λ ∈ R with f1 (y) − p(y − a) ≤ λ ≤
p(x + a) − f1(x) for every x, y ∈ S1. This gives us two inequalities:
f1 (x) + λ ≤ p(x + a) for every x ∈ S1
f1 (y) − λ ≤ p(y − a) for every y ∈ S1 .
If t > 0, the first inequality (applied with x/t in place of x) together with positive homogeneity of p gives
$$f_2(x + ta) = f_1(x) + t\lambda = t\Bigl(f_1\Bigl(\frac{x}{t}\Bigr) + \lambda\Bigr) \le t\, p\Bigl(\frac{x}{t} + a\Bigr) = p(x + ta).$$
Similarly, if t < 0, the second inequality gives
$$f_2(x + ta) = f_1(x) + t\lambda = |t|\Bigl(f_1\Bigl(\frac{x}{|t|}\Bigr) - \lambda\Bigr) \le |t|\, p\Bigl(\frac{x}{|t|} - a\Bigr) = p(x + ta).$$
That was the clever part of the proof. □
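
The window for λ can be seen explicitly in simple cases. A sketch under made-up data: take X = R² with p the Euclidean norm, S1 = span{e1}, f1(te1) = t, and a = e2; approximating the sup and the inf numerically shows both equal 0, so λ = 0 is the only admissible choice there.

```python
# Toy computation of the window for lambda (X = R^2, p = Euclidean norm,
# S1 = span{e1}, f1(t e1) = t, a = e2); the grid below is an arbitrary choice.
import numpy as np

t = np.linspace(-1000.0, 1000.0, 200_001)
lower = np.max(t - np.sqrt(t**2 + 1))   # sup_y (f1(y) - p(y - a)) over y = t e1
upper = np.min(np.sqrt(t**2 + 1) - t)   # inf_x (p(x + a) - f1(x)) over x = t e1
print(lower, upper)                     # both approach 0, so lambda = 0 works
```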
It’s starting to look like the theorem is true. If we can extend it at all, we’ve shown that
we can extend it a bit more. It’s tempting that say that we’re done, but that’s not true
because we are working with infinite dimensional spaces. We need to do something a bit
more sophisticated.
Let 𝒮 be the set of ordered pairs (S, fS) such that S is a subspace of X which contains
S0 and fS : S → R is linear with fS(x) ≤ p(x) for all x ∈ S, and fS|_{S0} = f0. Here, we've
collected all extensions of f0. We can define a partial ordering on 𝒮.
Definition 26.3. Recall that a partial ordering ≼ on any set Q means that x ≼ x for all x ∈ Q,
that x ≼ y and y ≼ x implies that x = y, and also that x ≼ y and y ≼ z implies that
x ≼ z. This differs from a total ordering because two elements of Q do not have to be related in a
partial order.
Q is totally ordered (or a "chain") if x, y ∈ Q implies that either x ≼ y or y ≼ x.
Example 26.4. The real numbers are totally ordered (and hence partially ordered). The
inclusion of sets is an example of a partial order.
We can define our partial ordering on 𝒮 by declaring (S, fS) ≼ (T, fT) to mean that S ⊂ T and
fT|_S = fS. Suppose that 𝒯 ⊂ 𝒮 is a chain, and let $\bar{T} = \bigcup_{(T, f_T) \in \mathcal{T}} T$.
Claim. We claim that T̄ is a subspace.
Normally, this would be nonsense; the union of two subspaces is not in general a subspace. How-
ever, we do have a total ordering, and we will use this to check the claim.
Proof. Take any x, y ∈ T̄ and α, β ∈ R. Then x ∈ T1, y ∈ T2 for some (T1, fT1), (T2, fT2) ∈ 𝒯.
Since 𝒯 is totally ordered, we know that either T1 ⊂ T2 or T2 ⊂ T1. Assume without loss
of generality that T1 ⊂ T2. Then x, y ∈ T2, so therefore αx + βy ∈ T2 ⊂ T̄ as well, and we are
done. □
We can now define a function f̄ : T̄ → R. For x ∈ T̄, define f̄(x) = fT(x) for any T with
(T, fT) ∈ 𝒯 and x ∈ T. This is unambiguous due to our total ordering.
Note that f̄ : T̄ → R is linear. To see this, consider α, β ∈ R and x, y ∈ T̄ as above. Then
we know that x, y ∈ T2 for some T2, so that f̄(αx+βy) = fT2(αx+βy) = αfT2(x)+βfT2(y) =
αf̄(x) + βf̄(y). Clearly, we also have f̄(x) ≤ p(x). In particular, this shows that (T̄, f̄) ∈ 𝒮.
By construction, (T, fT) ≼ (T̄, f̄) for every (T, fT) ∈ 𝒯, i.e. (T̄, f̄) is an upper bound for 𝒯 relative to this partial
order.
These are all of the hypotheses that we need for Zorn's Lemma:
Lemma 26.5 (Zorn's Lemma). If 𝒮 is any partially ordered set such that every chain 𝒯 ⊂ 𝒮 has
an upper bound in 𝒮, then 𝒮 has at least one maximal element.
That is, there exists (S, fS) ∈ 𝒮 such that if (S, fS) ≼ (T, fT) for some (T, fT) ∈ 𝒮 then
(S, fS) = (T, fT).
We now claim that a maximal element (S, fS) must have S = X, so that f = fS is the required extension. Indeed, if
S ≠ X, then the first part of the proof produces a strictly larger subspace and an extension of fS to it dominated by p, which contradicts maximality of
(S, fS). This proves the Hahn-Banach theorem. □

27. 6/1
It is impossible to review everything, so we’ll give a brief and sketchy overview of most
(but not all) of the main topics.
We had inner product spaces and normed spaces, and we usually denote inner products
as (x, y). We checked that $\sqrt{(x, x)}$ is a norm, called the inner product norm. In particular,
this means that inner product spaces are contained in normed spaces. To do this, we needed
Cauchy-Schwarz: |(x, y)| ≤ ‖x‖ ‖y‖, and we had the triangle inequality: ‖x + y‖ ≤ ‖x‖ +
‖y‖.
We had some special results about finite-dimensional normed spaces.
(1) We showed that in such spaces, all norms are equivalent. This means that if ‖·‖1 and ‖·‖2
are any two norms then there exists a constant C such that C^{−1}‖x‖1 ≤ ‖x‖2 ≤ C‖x‖1
for every x ∈ X.
(2) Any closed bounded subset is compact. In particular, the closed unit ball is compact.
Of course, these fail miserably in infinite dimensional spaces. In fact, in an infinite dimensional
space, the closed unit ball is never compact. To prove this, we showed that there exists
a sequence e1, e2, . . . with ‖ej‖ = 1 for every j such that ‖ei − ej‖ ≥ 1 for every i ≠ j, which
violates sequential compactness. This provided a major contrast between finite and infinite
dimensional spaces.
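
In a Hilbert space, the standard orthonormal sequence already witnesses this: ‖ei − ej‖² = ‖ei‖² + ‖ej‖² = 2 for i ≠ j, so no subsequence of (ej) is Cauchy. A finite truncation makes this visible (the dimension below is an arbitrary choice):

```python
# Orthonormal vectors e_1, ..., e_N are pairwise sqrt(2) apart; in the
# infinite-dimensional case this rules out compactness of the unit ball.
import numpy as np

N = 10
E = np.eye(N)                         # rows are e_1, ..., e_N
dists = [np.linalg.norm(E[i] - E[j])
         for i in range(N) for j in range(i + 1, N)]
print(set(np.round(dists, 12)))       # a single value: sqrt(2) ~ 1.4142...
```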
We then talked about complete spaces. A complete inner product space is called a Hilbert
space and a complete normed space is called a Banach space.
Example 27.1. For example, Rⁿ and Cⁿ are standard examples of real and complex finite-
dimensional Hilbert spaces, with the inner products (x, y) = x · y and $(z, w) = z \cdot \bar{w}$. Other
Hilbert spaces included ℓ²_R and ℓ²_C, and most importantly, L²[a, b] and L²_C[a, b] with inner
products $(f, g) = \int_{[a,b]} f \bar{g}$.
In a Hilbert space, we had the parallelogram identity for the inner product
norm:
$$\|x - y\|^2 + \|x + y\|^2 = 2(\|x\|^2 + \|y\|^2).$$
Using this, we proved the nearest point property. That is, if A is a closed convex subset of
H and x ∈ H then there exists a unique a ∈ A with ‖x − a‖ < ‖x − y‖ for all y ∈ A \ {a}.
A special case of this is when A is a closed linear subspace, in which case the nearest point
a ∈ A has the additional property that (x − a) ⊥ A.
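
Here is a sketch of the subspace case in finite dimensions (a made-up example: A is the column span of a random matrix in R⁵): the nearest point is the orthogonal projection of x onto A, and the residual x − a is perpendicular to A.

```python
# Nearest point property for a closed linear subspace of R^5 (toy example).
import numpy as np

rng = np.random.default_rng(1)
V = rng.standard_normal((5, 2))        # A = column span of V
x = rng.standard_normal(5)

P = V @ np.linalg.inv(V.T @ V) @ V.T   # orthogonal projection onto A
a = P @ x                              # the nearest point of A to x
print(V.T @ (x - a))                   # ~0: (x - a) is perpendicular to A

for _ in range(3):                     # any other point of A is farther away
    y = V @ rng.standard_normal(2)
    print(np.linalg.norm(x - a) < np.linalg.norm(x - y))   # True
```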
This brings us to orthogonality. If E ⊂ H is any nonempty subset, then the orthogonal
complement of E is E ⊥ = {x ∈ H : (x, e) = 0 for all e ∈ E}. We checked that E ⊥ is a closed
linear subspace. We also had a theorem: If M is any closed linear subspace then
(1) M ⊥ ∩ M = {0}
(2) x ∈ H implies that x = y + z, y ∈ M and z ∈ M ⊥
(3) (M ⊥ )⊥ = M
The proof of (2) uses the nearest point property, and the proof of (3) uses (2) to show one
of two inclusions.
We then discussed orthonormal sequences in H. Suppose that e1, . . . , eN is a finite or-
thonormal sequence, and write cj = (x, ej). We had a basic identity from which many results followed:
$$\Bigl\|x - \sum_{j=1}^{N} \lambda_j e_j\Bigr\|^2 = \|x\|^2 + \sum_{j=1}^{N} |c_j - \lambda_j|^2 - \sum_{j=1}^{N} |c_j|^2.$$
In particular, the nearest point of span{e1, . . . , eN} to x is $\sum_{j=1}^{N} c_j e_j$. In this case, our basic
identity reduces to
$$\Bigl\|x - \sum_{j=1}^{N} c_j e_j\Bigr\|^2 = \|x\|^2 - \sum_{j=1}^{N} |c_j|^2,$$
which is Bessel's identity.
We can also take an infinite orthonormal sequence e1, e2, . . . . This allows several con-
clusions:
(1) $\sum_{j=1}^{\infty} c_j e_j$ always converges. We proved this by checking that the partial sums form
a Cauchy sequence. That is, there exists y such that
$$\Bigl\|y - \sum_{j=1}^{N} c_j e_j\Bigr\| \to 0$$
as N → ∞.
(2) $\sum_{i=1}^{\infty} |c_i|^2 \le \|x\|^2$. This is Bessel's inequality.
(3) $x = \sum_{j=1}^{\infty} c_j e_j$ if and only if equality holds in Bessel's inequality. In this case, the equality is
called Bessel's identity.
This leads us to our next definition. The orthonormal sequence e1, e2, . . . is complete if
$x = \sum_{j=1}^{\infty} c_j e_j$ for every x. We showed that the following are equivalent:

(1) e1, e2, . . . is complete
(2) Equality holds in Bessel's inequality for every x ∈ H
(3) No x ≠ 0 satisfies (x, ej) = 0 for all j
(4) span{e1, e2, . . . } is dense in H.
Our key example of a complete orthonormal sequence is in the case H = L²_C[−π, π]. We
showed that {e^{inx}}_{n=0,±1,±2,...} is a complete orthonormal sequence with respect to the inner product
$(f, g) = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(t) \overline{g(t)}\, dt$. This was a long story: the proof involved the Fejér kernel, and
it's hard to overstate the importance of this result.
We digressed a little bit because we didn't fully understand L²_C. To understand this, we
discussed the Lebesgue integral. We discussed:
(1) Definition of measure zero.
(2) Step functions.
(3) The main technical theorem.
(a) If ϕk is an increasing sequence of step functions and $\int_a^b \varphi_k$ is bounded, then
{ϕk(x)}_{k=1,2,...} is bounded for almost every x ∈ [a, b].
(b) If ϕk and ψk are increasing sequences of step functions and lim ϕk = lim ψk almost everywhere, then
$\lim \int_a^b \varphi_k = \lim \int_a^b \psi_k$.
(4) Definition of $\int_{[a,b]} f$ for f ∈ L0.
(5) L1 = {g − h : g, h ∈ L0} is a linear space.
(6) Properties of the integral. For example, if fk ≥ 0 and $\int_{[a,b]} f_k \to 0$, then there exists a
subsequence fkj with fkj(x) → 0 for almost every x ∈ [a, b].
(7) L2 = {f : f ∈ L1, f² ∈ L1} is a linear space.
(8) Completeness.
We then discussed linear operators. Here, X is a normed space, and we let X* be the
set of bounded linear functionals on X, known as the dual space. This is a Banach space (even
if X is not complete). Furthermore, if X and Y are both normed spaces then we defined
L(X, Y) = {bounded linear operators X → Y}. This has the operator norm
$$\|T\| = \sup_{x \ne 0} \frac{\|T(x)\|}{\|x\|} = \sup_{\|x\|=1} \|T(x)\|.$$
We showed that if Y is complete then L(X, Y) is a Banach space.
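
For a matrix acting on (Rⁿ, ‖·‖₂), both expressions for the operator norm can be checked directly; random unit vectors only give lower bounds, and the supremum is the largest singular value (a sketch with a random matrix):

```python
# Operator norm of a matrix: sampling the unit sphere approaches the supremum,
# which equals the spectral norm (largest singular value).
import numpy as np

rng = np.random.default_rng(3)
T = rng.standard_normal((6, 4))

exact = np.linalg.norm(T, 2)                  # sup over the unit ball
samples = rng.standard_normal((4, 100_000))
samples /= np.linalg.norm(samples, axis=0)    # normalize each column
estimate = np.linalg.norm(T @ samples, axis=0).max()
print(estimate, exact)                        # estimate <= exact, close below
```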


We discussed adjoint operators. If H and K are Hilbert spaces and if T ∈ L(H, K) then
there exists T ∗ ∈ L(K, H) such that (T (x), y) = (x, T ∗ (y)) for all x ∈ H and y ∈ K. This
needed the Riesz representation theorem. That is, in a Hilbert space H, any f ∈ H ∗ can be
written as f (x) = (x, z) for some fixed z ∈ H.
Next, we discussed compact operators. If X and Y are Banach spaces, then T ∈ L(X, Y) is compact means
that T({x : ‖x‖ ≤ 1}) is contained in a compact subset of Y. We proved that if {Tk} is a
sequence of compact operators and Tk → T in the operator norm, then T is also compact.
The proof of this required a diagonal process.
If X is a Hilbert space, we checked that any Hilbert-Schmidt operator is automatically
compact, and we also gave an example of an operator that is compact but not Hilbert-
Schmidt. Here, H = ℓ²_R and we had the operator T ∈ L(ℓ²_R, ℓ²_R) with $T(x) = \bigl(x_1, \frac{x_2}{\sqrt{2}}, \frac{x_3}{\sqrt{3}}, \dots\bigr)$.
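
A sketch of why this example works (the truncation points below are arbitrary): the squared Hilbert-Schmidt norm is ∑ 1/j, which diverges, while the finite-rank truncations T_N satisfy ‖T − T_N‖ = 1/√(N+1) → 0, and a norm limit of finite-rank operators is compact.

```python
# Diagonal operator with entries 1/sqrt(j): compact but not Hilbert-Schmidt.
import numpy as np

J = 10_000
d = 1.0 / np.sqrt(np.arange(1, J + 1))    # diagonal entries 1/sqrt(j)

print(np.sum(d ** 2))                     # ~log(J): the HS norm diverges

for N in [10, 100, 1000]:
    tail = d.copy()
    tail[:N] = 0.0                        # T - T_N keeps only entries j > N
    print(N, tail.max())                  # ||T - T_N|| = 1/sqrt(N+1) -> 0
```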
We also had various theorems about the spectrum. If X is a Banach space and T ∈ L(X, X),
then we showed that σ(T) is a closed subset of the closed disc in the complex plane of radius
‖T‖. One of the last things that we did was to show that the spectrum is nonempty. One of
the main tools for discussing the spectrum was that if T ∈ L(X, X) with X a Banach space,
then ‖T‖ < 1 implies that I − T is invertible.
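
In the finite-dimensional case the first statement is easy to check, since the spectrum is just the set of eigenvalues (a sketch with a random matrix):

```python
# Every eigenvalue of T lies in the closed disc of radius ||T||.
import numpy as np

rng = np.random.default_rng(4)
T = rng.standard_normal((7, 7))

eigs = np.linalg.eigvals(T)
print(np.abs(eigs).max() <= np.linalg.norm(T, 2))   # True
```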
The last part of the course was the spectral theorem, and the main application of that
theory was the Sturm-Liouville theory.
E-mail address: [email protected]
