175 Main
MOOR XU
NOTES FROM A COURSE BY LEON SIMON
Abstract. These notes were taken during Math 175 (Functional Analysis) taught by Leon
Simon in Spring 2011 at Stanford University. They were live-TEXed during lectures in vim
and compiled using latexmk. Each lecture gets its own section. The notes are not edited
afterward, so there may be typos; please email corrections to [email protected].
1. 3/28
You’ve probably got a vague notion of what functional analysis is about. It is the study
of continuous linear operators on infinite dimensional spaces. This is interesting and has
applications.
with the same operations. We should also check that they are vector spaces; this is easy to
do.
Example 1.9. Define C(R) = {continuous functions R → R} with the natural operations
(f + g)(x) = f (x) + g(x) and (λf )(x) = λf (x). This is a real vector space. Similarly, we can
define C(C) = {continuous functions C → C} with the same operations; this is a complex
vector space.
Notice that the set of real polynomials {p(x) = a0 + a1 x + · · · + an xn } is a subspace of
C(R).
1.2. Inner product spaces.
Definition 1.10. A complex inner product space is a complex vector space with an inner
product, denoted (u, v) ∈ C. The inner product is a map (·, ·) : X × X → C with the
properties
(1) $(u, v) = \overline{(v, u)}$
(2) $(\lambda u + \mu v, w) = \lambda(u, w) + \mu(v, w)$ (linear in the first component)
(3) $(u, u)$ is real and positive for $u \neq 0$.
A real inner product is defined analogously, with C replaced by R.
Remark. Note that we have $(\lambda u, v) = \lambda(u, v)$, but $(u, \lambda v) = \overline{\lambda}(u, v)$. Also, check that $(u, v + w) = (u, v) + (u, w)$.
Example 1.11. X = Rn with (real) inner product (x, y) = x · y defined as the dot product.
Example 1.12. $X = \mathbb{C}^n$ with (complex) inner product $(z, w) = \sum_{j=1}^n z_j \overline{w_j}$.
Example 1.13. For $X = \ell^2(\mathbb{R})$, we can define the inner product as $(x, y) = \sum_{j=1}^\infty x_j y_j$. Similarly, for $X = \ell^2(\mathbb{C})$, we can define the inner product as $(z, w) = \sum_{j=1}^\infty z_j \overline{w_j}$.
We should check that these series converge absolutely. This is an easy exercise.
Definition 1.14. Define the inner product norm or length of $u$ to be
\[ \|u\| = \sqrt{(u, u)}. \]
We can now derive some properties. We begin with a basic identity.
Proposition 1.15. $(u + v, u + v) = (u, u) + (u, v) + (v, u) + (v, v)$. Therefore $\|u + v\|^2 = \|u\|^2 + (u, v) + \overline{(u, v)} + \|v\|^2$, hence
\[ \|u + v\|^2 = \|u\|^2 + 2\operatorname{Re}(u, v) + \|v\|^2. \]
Similarly, we have
\[ \|u - v\|^2 = \|u\|^2 - 2\operatorname{Re}(u, v) + \|v\|^2. \]
Adding these gives the "parallelogram identity"
\[ \|u + v\|^2 + \|u - v\|^2 = 2(\|u\|^2 + \|v\|^2). \]
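As a quick numerical sanity check of these identities (a sketch, not from the lecture; it assumes Python with NumPy, and the vectors are arbitrary illustrative data):

    import numpy as np

    rng = np.random.default_rng(0)
    u = rng.standard_normal(5) + 1j * rng.standard_normal(5)
    v = rng.standard_normal(5) + 1j * rng.standard_normal(5)

    # Complex inner product (x, y) = sum_j x_j conj(y_j), as in Example 1.12.
    ip = lambda x, y: np.sum(x * np.conj(y))
    nsq = lambda x: ip(x, x).real            # ||x||^2 = (x, x)

    # ||u+v||^2 + ||u-v||^2 = 2(||u||^2 + ||v||^2)
    assert np.isclose(nsq(u + v) + nsq(u - v), 2 * (nsq(u) + nsq(v)))
    # ||u+v||^2 = ||u||^2 + 2 Re(u, v) + ||v||^2
    assert np.isclose(nsq(u + v), nsq(u) + 2 * ip(u, v).real + nsq(v))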
2. 3/30
Proposition 2.1. If $\sum_{j=1}^\infty x_j^2$ and $\sum_{j=1}^\infty y_j^2$ are convergent, then $\sum_{j=1}^\infty x_j y_j$ is absolutely convergent.
Proof. Recall the elementary inequality $|x_j y_j| \le \frac{1}{2}(x_j^2 + y_j^2)$. Then
\[ \sum_{j=1}^N |x_j y_j| \le \frac{1}{2}\sum_{j=1}^N (x_j^2 + y_j^2) \le C. \]
The partial sums are increasing and bounded, so the series converges absolutely.
2.1. Inner product spaces. Recall that we were talking about inner product spaces. We have the inner product norm $\|u\| = \sqrt{(u, u)}$. This has a number of properties.
(1) $\|u + v\|^2 = \|u\|^2 + \|v\|^2 + 2\operatorname{Re}(u, v)$.
(2) $\|\lambda u\| = |\lambda|\,\|u\|$.
Proof. $\|\lambda u\| = \sqrt{(\lambda u, \lambda u)} = \sqrt{\lambda\overline{\lambda}(u, u)} = |\lambda|\,\|u\|$.
(3) $|(u, v)| \le \|u\|\,\|v\|$ for all $u, v \in X$. This is the Cauchy–Schwarz inequality.
Proof. Exercise.
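Since the proof is left as an exercise, here is one standard argument (a sketch; not necessarily the proof intended in lecture). For $v = 0$ the inequality is trivial. For $v \neq 0$, set $\lambda = \frac{(u,v)}{\|v\|^2}$. Then
\[ 0 \le \|u - \lambda v\|^2 = \|u\|^2 - \overline{\lambda}(u,v) - \lambda\overline{(u,v)} + |\lambda|^2\|v\|^2 = \|u\|^2 - \frac{|(u,v)|^2}{\|v\|^2}, \]
and rearranging gives $|(u,v)|^2 \le \|u\|^2\|v\|^2$.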
3. 4/1
3.1. Finite dimensional spaces. Recall that we are working with a finite dimensional normed linear space $X$ with norm $\|\cdot\|$. Last time, we showed that for any given basis $e_1, \ldots, e_n$, there exist $M, m > 0$ with $m\|x\|_{\mathbb{R}^n} \le \|x\| \le M\|x\|_{\mathbb{R}^n}$ for all $x \in X$, where $\|x\|_{\mathbb{R}^n}$ denotes the Euclidean norm of the coordinate vector of $x$ with respect to this basis.
Proposition 3.1. This guarantees that all norms on a finite dimensional space are equivalent: if $\|\cdot\|_1$ and $\|\cdot\|_2$ are two norms on a finite dimensional space $X$, then there exists a constant $C$ with $C^{-1}\|x\|_2 \le \|x\|_1 \le C\|x\|_2$.
Proof. We check this fact. We have
\[ m_1\|x\|_{\mathbb{R}^n} \le \|x\|_1 \le M_1\|x\|_{\mathbb{R}^n}, \qquad m_2\|x\|_{\mathbb{R}^n} \le \|x\|_2 \le M_2\|x\|_{\mathbb{R}^n}. \]
This shows that
\[ \frac{m_1}{M_2}\|x\|_2 \le m_1\|x\|_{\mathbb{R}^n} \le \|x\|_1 \le M_1\|x\|_{\mathbb{R}^n} \le \frac{M_1}{m_2}\|x\|_2, \]
which proves our statement.
Proposition 3.2. All norms on a finite dimensional space give the same open sets.
Proof. Define
\[ B_\rho^{\|\cdot\|_1}(y) = \{x \in X : \|x - y\|_1 < \rho\}. \]
Note that $\|x - y\|_1 < \rho$ implies that $\|x - y\|_2 < \frac{M_2}{m_1}\rho$. Therefore, $B_\rho^{\|\cdot\|_1}(y) \subseteq B_{\frac{M_2}{m_1}\rho}^{\|\cdot\|_2}(y)$. The opposite inclusion can be shown similarly.
Proposition 3.3. In a finite dimensional normed space X, the closed unit ball is compact.
Proof. Let $\{x_k\}_{k=1,2,\ldots}$ be a sequence in $B_1(0) = \{x \in X : \|x\| \le 1\}$.
We have $\|x_k\|_{\mathbb{R}^n} \le m^{-1} \cdot 1$. Therefore, the corresponding sequence of coordinate vectors in $\mathbb{R}^n$ is bounded. Hence, there exists a subsequence $\{x_{k_j}\}$ whose coordinate vectors converge to some $y \in \mathbb{R}^n$. In $X$, there is a corresponding point $y = \sum_{j=1}^n y_j e_j$. Then $\|x_{k_j} - y\| \le M\|x_{k_j} - y\|_{\mathbb{R}^n} \to 0$. This proves compactness.
Lemma 3.4. If X is any infinite dimensional normed linear space, the closed unit ball is
not compact.
Proof. Take some e1 ∈ X, say ke1 k = 1.
Take e2 ∈ X \ span {e1 }.
Take e3 ∈ X \ span {e1 , e2 }.
Inductively, take en ∈ X \ span {e1 , . . . , en−1 }. These must exist because otherwise the
space would be finite dimensional.
Homework 1, problem 8 said that there exists $w_n \in \operatorname{span}\{e_1, \ldots, e_{n-1}\}$ with $0 < \lambda_n = \|e_n - w_n\| = \min\{\|e_n - y\| : y \in \operatorname{span}\{e_1, \ldots, e_{n-1}\}\}$.
Define
\[ \tilde{e}_n = \frac{e_n - w_n}{\|e_n - w_n\|} = \frac{e_n - w_n}{\lambda_n}. \]
For $n > l$, we have
\[ \|\tilde{e}_n - \tilde{e}_l\| = \left\| \frac{e_n - w_n}{\lambda_n} - \tilde{e}_l \right\|. \]
We claim that $\|\tilde{e}_n - \tilde{e}_l\| \ge 1$. Otherwise,
\[ \left\| \frac{e_n - w_n}{\lambda_n} - \tilde{e}_l \right\| < 1, \]
which means that
\[ \|e_n - (w_n + \lambda_n \tilde{e}_l)\| < \lambda_n. \]
Note that $w_n + \lambda_n\tilde{e}_l \in \operatorname{span}\{e_1, \ldots, e_{n-1}\}$, which contradicts the definition of $\lambda_n$ as the minimal distance. Therefore, we indeed have $\|\tilde{e}_n - \tilde{e}_l\| \ge 1$, and hence there is no convergent subsequence.
Remark. Everything we said also works in complex spaces; just change R to C.
This concludes the discussion of finite dimensional spaces vs infinite dimensional spaces.
3.2. More about completeness. We haven't yet proven that any infinite dimensional space is complete. We claim that this is true for the space $\ell^2_{\mathbb{R}} = \{x = (x_1, x_2, \ldots) : \sum_{j=1}^\infty x_j^2 < \infty\}$.
... for all $l \ge N$. This shows that $y - x^{(l)} \in \ell^2_{\mathbb{R}}$. We also know that $x^{(l)} \in \ell^2_{\mathbb{R}}$, so $y \in \ell^2_{\mathbb{R}}$. Furthermore, for every $\varepsilon > 0$, there exists $N$ with $\|y - x^{(l)}\| \le \varepsilon$ for all $l \ge N$. Hence, $\lim x^{(l)} = y$ in $\ell^2_{\mathbb{R}}$. This proves that every Cauchy sequence converges, which proves completeness.
Note that the crucial step was to use the completeness of R.
Definition 3.7. A set A is convex means that for all x, y ∈ A, the line segment joining them
is in A, i.e. tx + (1 − t)y = y + t(x − y) ∈ A for all t ∈ [0, 1].
Theorem 3.8. Let X be any Hilbert space, and let A be any nonempty closed convex subset
of X. Let x ∈ X \ A. Then there exists a unique nearest point of A to x. More precisely,
there exists $a \in A$ such that $\|x - a\| < \|x - y\|$ for every $y \in A \setminus \{a\}$.
4. 4/4
Last time, we stated Theorem 3.8 that we can find a unique closest point. We can now prove it. The proof is based on the parallelogram identity. We don't yet know that the minimum exists, but we can consider the infimum.
Proof of Theorem 3.8. Let $\alpha = \inf\{\|x - y\| : y \in A\}$. For all $k = 1, 2, \ldots$, there exists $y_k \in A$ with $\|x - y_k\| < \sqrt{\alpha^2 + \frac{1}{k}}$, since otherwise we would have $\|x - y\| \ge \sqrt{\alpha^2 + \frac{1}{k}}$ for all $y \in A$, contradicting the definition of $\alpha$.
We can now apply the parallelogram identity. This states that $\|z - w\|^2 + \|z + w\|^2 = 2(\|z\|^2 + \|w\|^2)$. We plug in $z = x - y_k$ and $w = x - y_l$. Then
\[ \|y_k - y_l\|^2 + \|(x - y_k) + (x - y_l)\|^2 = 2(\|x - y_k\|^2 + \|x - y_l\|^2) \le 2\left(\alpha^2 + \frac{1}{k} + \alpha^2 + \frac{1}{l}\right). \]
Note that $\|(x - y_k) + (x - y_l)\|^2 = 4\|x - (y_k + y_l)/2\|^2$. By the convexity of $A$, we know that $(y_k + y_l)/2 \in A$. Therefore, $\|x - (y_k + y_l)/2\| \ge \alpha$. Hence, we have
\[ \|y_k - y_l\|^2 + 4\alpha^2 \le 2\left(\alpha^2 + \frac{1}{k} + \alpha^2 + \frac{1}{l}\right), \]
so
\[ \|y_k - y_l\|^2 \le \frac{2}{k} + \frac{2}{l} \]
for all $k, l = 1, 2, \ldots$. Therefore, $\{y_k\}$ is a Cauchy sequence. $X$ is a Hilbert space, so it is complete. Hence, $\{y_k\}$ is convergent: there exists $a \in X$ with $a = \lim y_k$, i.e. $\lim\|a - y_k\| = 0$. Hence, $a \in A$ because $A$ is closed.
We now claim that $\|x - a\|$ is the minimum distance. We have
\[ \alpha \le \|x - a\| = \|x - y_k + y_k - a\| \le \|x - y_k\| + \|y_k - a\| \le \sqrt{\alpha^2 + \frac{1}{k}} + \|y_k - a\| \to \alpha + 0. \]
Therefore, $\|x - a\| = \alpha$.
We still need to show uniqueness. Suppose that $\tilde{a} \in A$ also has the minimum distance $\|x - \tilde{a}\| = \alpha$. We want to show that $a = \tilde{a}$. We can plug $a$ and $\tilde{a}$ into the parallelogram law in place of $y_k$ and $y_l$. This gives
\[ \|a - \tilde{a}\|^2 + 4\alpha^2 \le \|a - \tilde{a}\|^2 + 4\left\|x - \frac{a + \tilde{a}}{2}\right\|^2 = 2(\|x - a\|^2 + \|x - \tilde{a}\|^2) = 4\alpha^2, \]
hence $\|a - \tilde{a}\|^2 = 0$ and therefore $a = \tilde{a}$.
4.1. Orthogonality.
Definition 4.1. In a Hilbert space $X$, vectors $x_1, \ldots, x_N \in X$ are orthogonal means that $(x_i, x_j) = 0$ for all $i \neq j$, $i, j = 1, \ldots, N$. Vectors $x_1, \ldots, x_N \in X$ are orthonormal if
\[ (x_i, x_j) = \begin{cases} 0 & i \neq j \\ 1 & i = j. \end{cases} \]
There is a fundamental identity for orthonormal vectors.
Proposition 4.2. Let $x \in X$, and suppose that $e_1, \ldots, e_N$ are orthonormal. Let $\lambda_1, \ldots, \lambda_N$ be scalars, and let $c_j = (x, e_j)$ for $j = 1, \ldots, N$. Then
\[ \left\| x - \sum_{j=1}^N \lambda_j e_j \right\|^2 = \|x\|^2 + \sum_{i=1}^N |c_i - \lambda_i|^2 - \sum_{i=1}^N |c_i|^2. \]
Proof.
\begin{align*}
\left\| x - \sum_{j=1}^N \lambda_j e_j \right\|^2 &= \left( x - \sum_{i=1}^N \lambda_i e_i,\; x - \sum_{j=1}^N \lambda_j e_j \right) \\
&= (x, x) - \left( \sum_{i=1}^N \lambda_i e_i, x \right) - \left( x, \sum_{j=1}^N \lambda_j e_j \right) + \left( \sum_{i=1}^N \lambda_i e_i, \sum_{j=1}^N \lambda_j e_j \right) \\
&= \|x\|^2 - \sum_{i=1}^N \lambda_i \overline{c_i} - \sum_{j=1}^N \overline{\lambda_j} c_j + \sum_{i=1}^N |\lambda_i|^2 \\
&= \|x\|^2 + \sum_{i=1}^N |c_i - \lambda_i|^2 - \sum_{i=1}^N |c_i|^2.
\end{align*}
This identity is nice. Staring at it for a bit, we can read off the point that satisfies the
closest point property.
Theorem 4.3. The point of $\operatorname{span}\{e_1, \ldots, e_N\}$ which has minimum distance from $x$ is exactly $\sum_{i=1}^N c_i e_i$, and it is the unique such point.
Intuitively, we expect that $x$ minus this closest point is orthogonal to $\operatorname{span}\{e_1, \ldots, e_N\}$. We can check this:
\[ \left( x - \sum_{i=1}^N c_i e_i,\; \sum_{j=1}^N \lambda_j e_j \right) = \sum_{j=1}^N \overline{\lambda_j}(x, e_j) - \sum_{i=1}^N c_i\overline{\lambda_i} = \sum_{j=1}^N c_j\overline{\lambda_j} - \sum_{i=1}^N c_i\overline{\lambda_i} = 0. \]
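As an illustrative check of Theorem 4.3 in $\mathbb{R}^{10}$ (a sketch assuming NumPy; the matrix $E$ of orthonormal columns is hypothetical data):

    import numpy as np

    rng = np.random.default_rng(1)
    # Columns of E are orthonormal vectors e_1, e_2, e_3 inside R^10.
    E, _ = np.linalg.qr(rng.standard_normal((10, 3)))
    x = rng.standard_normal(10)

    c = E.T @ x                       # c_j = (x, e_j)
    closest = E @ c                   # sum_j c_j e_j
    # Compare with the least-squares minimizer of ||x - E @ lam||.
    lam, *_ = np.linalg.lstsq(E, x, rcond=None)
    assert np.allclose(closest, E @ lam)
    # The residual x - sum_j c_j e_j is orthogonal to each e_j.
    assert np.allclose(E.T @ (x - closest), 0)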
5. 4/6
X is a Hilbert space and e1 , e2 , . . . is an orthonormal sequence.
Definition 5.1. $e_1, e_2, \ldots$ is a complete orthonormal sequence if $x = \sum_{i=1}^\infty (x, e_i)e_i$ for all $x \in X$.
Theorem 5.2. The statement (C) that $e_1, e_2, \ldots$ is a complete orthonormal sequence (i.e. $x = \sum_{i=1}^\infty (x, e_i)e_i$ for all $x \in X$) is equivalent to each of the following:
(i) Equality holds in Bessel's inequality for every $x \in X$ (i.e. $\|x\|^2 = \sum_{i=1}^\infty |c_i|^2$).
(ii) There does not exist an $x \in X \setminus \{0\}$ with $(x, e_i) = 0$ for every $i$.
(iii) $\operatorname{span}\{e_1, e_2, \ldots\}$ is a dense subset of $X$.
Proof. We already proved (i) last time in Theorem 4.4.
(ii) $\Rightarrow$ (C). Consider $\sum_{i=1}^\infty (x, e_i)e_i$. We know that this converges. Then
\[ \left( x - \sum_{i=1}^\infty (x, e_i)e_i,\; e_j \right) = (x, e_j) - \sum_{i=1}^\infty (x, e_i)(e_i, e_j) = (x, e_j) - (x, e_j) = 0. \]
By (ii), the element $x - \sum_{i=1}^\infty (x, e_i)e_i$, being orthogonal to every $e_j$, must be $0$, which is (C).
(iii) $\Rightarrow$ (C). By density, given $x$ there exist $N$ and scalars $\lambda_1^N, \ldots, \lambda_N^N$ with
\[ \left\| x - \sum_{j=1}^N \lambda_j^N e_j \right\|^2 \to 0. \]
Note that this is also true with sums up to $M$, with $M \ge N$ and $\lambda_j^N = 0$ for all $j > N$. Let $\varepsilon > 0$. Since by Theorem 4.3 the coefficients $c_j = (x, e_j)$ minimize the distance to $\operatorname{span}\{e_1, \ldots, e_M\}$, there exists $N_0$ such that
\[ \left\| x - \sum_{j=1}^M c_j e_j \right\| < \varepsilon \]
for all $M \ge N_0$. This is the definition of convergence, so $x = \lim_{M\to\infty} \sum_{j=1}^M c_j e_j = \sum_{j=1}^\infty c_j e_j$.
Example 5.3. Define
\[ L^2_{\mathbb{C}}[-\pi, \pi] = \left\{ f : \int_{-\pi}^{\pi} |f|^2 \text{ exists and is finite} \right\}. \]
This has an inner product
\[ (f, g) = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x)\overline{g(x)}\, dx. \]
We can easily check the inner product properties.
This is not as simple as it looks. What type of integration are we using? The Riemann integral is not good enough: using the Riemann integral, this space lacks completeness, and there is a huge class of counterexamples.
Therefore, we will use the Lebesgue integral. We will spend a couple of lectures on this,
but for now, don’t worry about it. All Riemann-integrable functions are also Lebesgue-
integrable with the same integral; however, Lebesgue integration allows us to handle a much
larger class of functions. In particular, Lebesgue integration gives us completeness.
There is an extremely important application of the abstract theory that we have just
developed. There is a simple orthonormal sequence in this space. This is
\[ 1, e^{ix}, e^{-ix}, e^{2ix}, e^{-2ix}, \ldots, e^{nix}, e^{-nix}, \ldots. \]
Proposition 5.4. This is an orthonormal sequence.
Proof.
\[ (e^{inx}, e^{imx}) = \frac{1}{2\pi}\int_{-\pi}^{\pi} e^{inx}e^{-imx}\, dx = \frac{1}{2\pi}\int_{-\pi}^{\pi} e^{i(n-m)x}\, dx = \begin{cases} 1 & n = m \\ \dfrac{1}{2\pi}\left[\dfrac{e^{i(n-m)x}}{i(n-m)}\right]_{-\pi}^{\pi} = 0 & n \neq m. \end{cases} \]
In particular, all the previous theory applies to this case. In fact, we will show that this
is a complete orthonormal sequence.
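As a quick numerical sanity check of Proposition 5.4 (a sketch assuming NumPy; the grid average below is exact for these exponentials on a uniform grid):

    import numpy as np

    N = 4096
    x = -np.pi + 2 * np.pi * np.arange(N) / N   # uniform grid on [-pi, pi)

    # (f, g) = (1/2pi) int f conj(g) dx, approximated by the grid average.
    ip = lambda f, g: np.sum(f * np.conj(g)) / N

    for n in range(-3, 4):
        for m in range(-3, 4):
            val = ip(np.exp(1j * n * x), np.exp(1j * m * x))
            assert np.isclose(val, 1.0 if n == m else 0.0, atol=1e-12)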
6. 4/8
We are in the process of talking about an extremely important application. We are
considering the following example:
Example 6.1.
\[ X = L^2_{\mathbb{C}}([-\pi, \pi]) = \left\{ f = f_1 + if_2 : \int_{-\pi}^{\pi} |f|^2 \text{ exists and is finite} \right\} \]
The inner product $(f, g) = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x)\overline{g(x)}\, dx$ is clear. For now, we will believe that this is a complete space (with the Lebesgue integral). Historically, this is the main reason why people use the Lebesgue integral. This will be proved soon.
The big claim is
Proposition 6.2.
\[ 1, e^{ix}, e^{-ix}, e^{2ix}, e^{-2ix}, \ldots, e^{inx}, e^{-inx}, \ldots \]
is a complete orthonormal sequence.
Proof. Recall that for an orthonormal sequence to be complete, we need
\[ x = \sum_{n=1}^\infty (x, e_n)e_n \]
for all $x$. In this case, we have the Fourier series of $f$:
\[ \sum_{n=1}^\infty (x, e_n)e_n = \sum_{n=-\infty}^\infty (f, e^{inx})e^{inx} = \lim_{N\to\infty} \sum_{n=-N}^N (f, e^{inx})e^{inx}. \]
(We don't have to worry too much about the limit because we already proved earlier that everything converges.) Here,
\[ (f, e^{inx}) = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(t)e^{-int}\, dt. \]
Therefore,
\[ S_N(f)(x) = \sum_{n=-N}^N \left( \frac{1}{2\pi}\int_{-\pi}^{\pi} f(t)e^{-int}\, dt \right) e^{inx} = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(t)\sum_{n=-N}^N e^{in(x-t)}\, dt. \]
First, we check this for when $f$ is a continuous function with $f(-\pi) = f(\pi) = 0$. We claim that we can use the averages
\[ \frac{1}{m+1}\sum_{N=0}^m S_N(f)(x) = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(t)\,\frac{1}{m+1}\sum_{N=0}^m D_N(x - t)\, dt. \]
Here,
\begin{align*}
K_m(s) &= \frac{1}{m+1}\sum_{N=0}^m D_N(s) = \frac{1}{m+1}\sum_{N=0}^m \frac{e^{-iNs} - e^{i(N+1)s}}{1 - e^{is}} \\
&= \frac{1}{m+1}\cdot\frac{\sum_{N=0}^m e^{-iNs} - e^{is}\sum_{N=0}^m e^{iNs}}{1 - e^{is}} \\
&= \frac{1}{m+1}\cdot\frac{1}{1 - e^{is}}\left( \frac{1 - e^{-i(m+1)s}}{1 - e^{-is}} - e^{is}\,\frac{1 - e^{i(m+1)s}}{1 - e^{is}} \right) \\
&= \cdots = \frac{1}{m+1}\cdot\frac{\sin^2\left(\frac{m+1}{2}s\right)}{\sin^2\left(\frac{s}{2}\right)},
\end{align*}
where the calculation will be finished next time. This is the Fejér kernel. This is a nonnegative function.
7. 4/11
We want to show completeness of the orthonormal sequence {einx : n = 0, ±1, . . . } in
L2C [−π, π].
Last time, we showed that
\[ S_N(f)(x) = \sum_{n=-N}^N c_n e^{inx} \]
with
\[ c_n = (f, e^{inx}) = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(t)e^{-int}\, dt. \]
Then
\[ S_N(f)(x) = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(t)D_N(x - t)\, dt \]
with
\[ D_N(s) = \sum_{n=-N}^N e^{ins} = \frac{\sin\left(N + \frac{1}{2}\right)s}{\sin\frac{s}{2}}. \]
Note that $D_N(0) = 2N + 1$. Consider
\[ \frac{1}{m+1}\sum_{N=0}^m S_N(f)(x) = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(t)\,\frac{1}{m+1}\sum_{N=0}^m D_N(x - t)\, dt = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(t)K_m(x - t)\, dt, \]
where we define
\begin{align*}
K_m(s) &= \frac{1}{m+1}\sum_{N=0}^m D_N(s) = \frac{1}{m+1}\cdot\frac{1}{1 - e^{is}}\left( \frac{1 - e^{-i(m+1)s}}{1 - e^{-is}} - e^{is}\,\frac{1 - e^{i(m+1)s}}{1 - e^{is}} \right) \\
&= \frac{1}{m+1}\left( \frac{1 - e^{-i(m+1)s}}{e^{is/2}(e^{-is/2} - e^{is/2})\cdot e^{-is/2}(e^{is/2} - e^{-is/2})} - \frac{e^{is}(1 - e^{i(m+1)s})}{e^{is}(e^{-is/2} - e^{is/2})^2} \right) \\
&= \frac{1}{m+1}\cdot\frac{e^{i(m+1)s} + e^{-i(m+1)s} - 2}{(e^{is/2} - e^{-is/2})^2} = \frac{1}{m+1}\cdot\frac{(e^{i(m+1)s/2} - e^{-i(m+1)s/2})^2}{(e^{is/2} - e^{-is/2})^2} \\
&= \frac{1}{m+1}\cdot\frac{\sin^2\left(\frac{m+1}{2}s\right)}{\sin^2\frac{s}{2}}
\end{align*}
for $0 < |s| \le \pi$. Additionally, $K_m(0) = \frac{1}{m+1}\sum_{N=0}^m (2N + 1) = m + 1$.
This was not an obvious thing to do. It’s not clear that the average of the SN should be
nice, but it does come out nicely.
Let's consider some properties of $K_m$. Notice that it is even and nonnegative. It is zero whenever $\frac{m+1}{2}s = k\pi$ (for nonzero integers $k$ with $0 < |s| \le \pi$). There is a sharp peak at zero, with height $m + 1$. There are a lot of zeros, and between the zeros there are violent oscillations. It is a good idea to draw a picture of this.
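Instead of drawing by hand, here is a minimal sketch (assuming NumPy; the helper name fejer is ours) that evaluates $K_m$ and checks nonnegativity and the normalization computed below:

    import numpy as np

    def fejer(m, s):
        # Fejer kernel K_m(s), with K_m(0) = m + 1 at the removable singularity.
        s = np.asarray(s, dtype=float)
        out = np.full_like(s, m + 1.0)
        nz = np.abs(np.sin(s / 2)) > 1e-12
        out[nz] = (np.sin((m + 1) * s[nz] / 2) ** 2
                   / ((m + 1) * np.sin(s[nz] / 2) ** 2))
        return out

    s = -np.pi + 2 * np.pi * np.arange(4096) / 4096
    K = fejer(20, s)
    assert np.all(K >= 0)                 # nonnegative
    assert np.isclose(np.mean(K), 1.0)    # (1/2pi) int K_m = 1, as shown below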
What is the area under the graph? We have
\[ A = \frac{1}{2\pi}\int_{-\pi}^{\pi} K_m(s)\, ds = \frac{1}{m+1}\sum_{N=0}^m \frac{1}{2\pi}\int_{-\pi}^{\pi} D_N(s)\, ds = 1. \]
This is nice. It is like an oscillatory version of the Dirac delta "function". In the distribution sense, these converge to the Dirac delta.
Consider this function on the intervals $[-\pi, -\delta) \cup (\delta, \pi]$, i.e. for $\delta < |s| \le \pi$. There
\[ 0 \le K_m(s) \le \frac{1}{m+1}\cdot\frac{1}{\sin^2\frac{\delta}{2}} \to 0 \]
as $m \to \infty$ for fixed $\delta$.
Take $f = g$, where $g$ is continuous and $g(-\pi) = g(\pi) = 0$. Extend $g$ to be $2\pi$-periodic. Then
\[ \frac{1}{m+1}\sum_{N=0}^m S_N(g)(x) = \frac{1}{2\pi}\int_{-\pi}^{\pi} g(t)K_m(x - t)\, dt. \]
This is an average of the Fourier partial sums of g, and these functions are in span {einx }.
We claim that this is a really good approximation to $g$. To see this, consider
\begin{align*}
\frac{1}{m+1}\sum_{N=0}^m S_N(g)(x) - g(x) &= \frac{1}{2\pi}\int_{-\pi}^{\pi} g(t)K_m(x - t)\, dt - g(x) \\
&= \frac{1}{2\pi}\int_{-\pi}^{\pi} (g(t) - g(x))K_m(x - t)\, dt
\end{align*}
because $\frac{1}{2\pi}\int_{-\pi}^{\pi} K_m(x - t)\, dt = 1$. Make the change of variable $s = x - t$, and observe that all functions under consideration are $2\pi$-periodic. Then this equals
\[ \frac{1}{2\pi}\int_{x-\pi}^{x+\pi} (g(x - s) - g(x))K_m(s)\, ds = \frac{1}{2\pi}\int_{-\pi}^{\pi} (g(x - s) - g(x))K_m(s)\, ds. \]
For any $\delta \in (0, \pi)$, we can break this integral up into
\[ \frac{1}{2\pi}\int_{-\delta}^{\delta} (g(x - s) - g(x))K_m(s)\, ds + \frac{1}{2\pi}\int_{\delta \le |s| \le \pi} (g(x - s) - g(x))K_m(s)\, ds. \]
Note that $g$ is uniformly continuous because it is continuous on a compact set. Let $\varepsilon > 0$. Then there exists $\delta > 0$ such that $|x - y| < \delta$ implies that $|g(x) - g(y)| < \varepsilon$. Therefore, picking this $\delta$, the quantity above is bounded in absolute value by
\[ \frac{1}{2\pi}\int_{-\delta}^{\delta} |g(x - s) - g(x)|K_m(s)\, ds + \frac{1}{2\pi}\int_{\delta \le |s| \le \pi} |g(x - s) - g(x)|K_m(s)\, ds. \]
$g$ is continuous on a compact set, so it attains its maximum. Let $L = \max|g|$. Then we have the bound
\begin{align*}
&\le \frac{\varepsilon}{2\pi}\int_{-\delta}^{\delta} K_m(s)\, ds + \frac{2L}{2\pi}\int_{\delta \le |s| \le \pi} K_m(s)\, ds \\
&\le \frac{\varepsilon}{2\pi}\int_{-\pi}^{\pi} K_m(s)\, ds + \frac{L}{\pi}\cdot\frac{1}{m+1}\cdot\frac{1}{\sin^2\frac{\delta}{2}} = \varepsilon + \frac{L}{\pi}\cdot\frac{1}{m+1}\cdot\frac{1}{\sin^2\frac{\delta}{2}} < \varepsilon + \varepsilon = 2\varepsilon
\end{align*}
provided that $m > m_0$, where $m_0$ is chosen with $\frac{L}{\pi}\frac{1}{m_0+1}\frac{1}{\sin^2\frac{\delta}{2}} < \varepsilon$. Therefore,
\[ \max_{[-\pi,\pi]} \left| \frac{1}{m+1}\sum_{N=0}^m S_N(g)(x) - g(x) \right| < 2\varepsilon \]
for all m > m0 . We’re almost done. We have proved this for when g is continuous. We want
to show that this is true for all L2 functions, and we’ll finish this next time.
8. 4/13
We’ve almost finished the proof that our orthonormal sequence {einx : n = 0, ±1, . . . } is
dense in L2C [−π, π].
We proved that if g : [−π, π] → R is continuous with g(−π) = g(π) = 0, then for any
ε > 0, there exists m0 with
m
1 X
max g(x) − SN (g)(x) < ε
x∈[−π,π] m + 1 N =0
for all $m \ge m_0$. Then,
\[ \left\| g - \frac{1}{m+1}\sum_{N=0}^m S_N(g) \right\|_{L^2} = \sqrt{\frac{1}{2\pi}\int_{-\pi}^{\pi} \left| g(x) - \frac{1}{m+1}\sum_{N=0}^m S_N(g)(x) \right|^2 dx} \le \varepsilon. \]
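A minimal numerical sketch of this Fejér (Cesàro-mean) approximation (assuming NumPy; the test function $g$ is our own illustrative choice):

    import numpy as np

    N = 2048
    x = -np.pi + 2 * np.pi * np.arange(N) / N
    g = (np.pi - np.abs(x)) * np.abs(x)      # continuous, g(-pi) = g(pi) = 0

    def cesaro(g, x, m):
        # (1/(m+1)) sum_{K=0}^m S_K(g) = sum_{|n|<=m} (1 - |n|/(m+1)) c_n e^{inx}
        out = np.zeros_like(x, dtype=complex)
        for n in range(-m, m + 1):
            c_n = np.mean(g * np.exp(-1j * n * x))   # ~ (1/2pi) int g e^{-int} dt
            out += (1 - abs(n) / (m + 1)) * c_n * np.exp(1j * n * x)
        return out.real

    for m in (4, 16, 64):
        print(m, np.max(np.abs(cesaro(g, x, m) - g)))   # sup-error decreases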
This small change doesn’t seem profound, but it makes a huge difference to the theory.
Lemma 8.4. If $S_1, S_2, \ldots$ each has measure zero, then $\bigcup_{j=1}^\infty S_j$ also has measure zero.
The first thing we do when we meet a mathematical definition is to ask: Does it make
sense? It is well-defined by the second part of our lemma.
We now look at some properties of this integral. We will prove these next time.
Properties 8.9.
(1) If $\alpha, \beta \ge 0$ and $f, g \in L^0$, then $\alpha f + \beta g \in L^0$ and $\int_{[a,b]} (\alpha f + \beta g) = \alpha\int_{[a,b]} f + \beta\int_{[a,b]} g$.
(2) If $f, g \in L^0$ then $\max\{f, g\}, \min\{f, g\} \in L^0$.
(3) If $f, g \in L^0$ with $f \le g$ then $\int_{[a,b]} f \le \int_{[a,b]} g$.
Warning: $f \in L^0$ does not imply that $-f \in L^0$.
9. 4/15
Recall that $L^0$ is the set of $f : [a, b] \to \mathbb{R}$ such that $f(x) = \lim \varphi_k(x)$ for almost every $x \in [a, b]$, where the $\varphi_k$ are step functions with $\varphi_{k+1} \ge \varphi_k$ and $\int \varphi_k$ bounded.
We have some properties:
Properties 9.1.
(1) If $f \in L^0$ and if $\tilde{f} : [a, b] \to \mathbb{R}$ with $\tilde{f}(x) = f(x)$ for almost every $x$, then $\tilde{f} \in L^0$ and $\int_{[a,b]} \tilde{f} = \int_{[a,b]} f$.
(2) $\alpha, \beta \ge 0$ and $f, g \in L^0$ imply that $\alpha f + \beta g \in L^0$ and $\int(\alpha f + \beta g) = \alpha\int f + \beta\int g$.
(3) $f, g \in L^0$ and $f \le g$ imply that $\int f \le \int g$.
Proof. We prove (3). Pick increasing sequences of step functions $\varphi_k$ and $\psi_k$ with $f = \lim \varphi_k$ and $g = \lim \psi_k$ almost everywhere.
Let $\tilde{\varphi}_k = \min\{\varphi_k, \psi_k\}$. Note that this is still a step function, and it still converges almost everywhere to $f$. To see this, notice that
\[ \lim \tilde{\varphi}_k(x) = \min\{\lim \varphi_k(x), \lim \psi_k(x)\} = \min\{f(x), g(x)\} = f(x) \]
almost everywhere. Similarly, we define $\tilde{\psi}_k(x) = \max\{\varphi_k(x), \psi_k(x)\}$, a step function with $\lim \tilde{\psi}_k(x) = g(x)$. Therefore, we see that $\int f \leftarrow \int \tilde{\varphi}_k \le \int \tilde{\psi}_k \to \int g$.
The point is that we should reduce everything to step functions.
A big defect is that f ∈ L0 does not imply that −f ∈ L0 . We enlarge this space to get
around this problem.
Definition 9.2. Let $L^1 = \{g - h : g, h \in L^0\}$. This is the space of Lebesgue integrable functions.
Observe that L1 is indeed a linear space.
We should check this. For example, for multiplication by scalars, if g − h ∈ L1 , we should
show that λ(g − h) ∈ L1 . To do this, we have two cases: λ > 0 and λ < 0. In each case,
either λg, λh ∈ L0 or −λg, −λh ∈ L0 , so the result follows simply.
Definition 9.3. For $f = g - h \in L^1$, we define $\int f = \int g - \int h$.
Such a definition might plausibly be nonsense: there might be lots of ways of writing $f$ as a difference of two functions in $L^0$. Suppose that $g_1 - h_1 = g_2 - h_2$. Then $g_1 + h_2 = g_2 + h_1$. Each side is a sum of functions in $L^0$, so we can use the linearity in $L^0$ to see that $\int g_1 + \int h_2 = \int g_2 + \int h_1$. Therefore, $\int g_1 - \int h_1 = \int g_2 - \int h_2$. Therefore, our definition works.
It seems like we got something for nothing, by extending $L^0$ so simply to $L^1$. But we can't get something for nothing. It turns out that it will be hard to show some really simple facts.
We have some properties:
Properties 9.4.
(1) If $f \in L^1$ and $\tilde{f} : [a, b] \to \mathbb{R}$ with $\tilde{f}(x) = f(x)$ almost everywhere, then $\tilde{f} \in L^1$ and $\int \tilde{f} = \int f$.
(2) If $f_1, f_2 \in L^1$ with $f_1 \le f_2$, then $\int f_1 \le \int f_2$.
(3) If $f \in L^1$ then $|f| \in L^1$ and $\left|\int f\right| \le \int |f|$.
(4) If $f \in L^1$ then there exists a decreasing sequence $\{g_k\} \subset L^0$ with $g_k(x) \to f(x)$ for almost every $x \in [a, b]$ and $\int g_k \to \int f$.
(5) If $f \in L^1$ and $\int |f| = 0$, then $f(x) = 0$ for almost every $x \in [a, b]$.
(6) If we have a sequence $\{f_k\} \subset L^1$ of nonnegative functions $f_k \ge 0$ with $\int f_k \to 0$, then there exists a subsequence $\{f_{k_j}\}_{j=1,2,\ldots}$ with $f_{k_j}(x) \to 0$ for almost every $x \in [a, b]$.
Proof. (1) follows trivially from the corresponding fact about $L^0$.
We check (2). Write $f_1 = g_1 - h_1$ and $f_2 = g_2 - h_2$ for $g_1, g_2, h_1, h_2 \in L^0$. We know that $g_1 - h_1 \le g_2 - h_2$, so $g_1 + h_2 \le g_2 + h_1$, and so $\int(g_1 + h_2) \le \int(g_2 + h_1)$, and hence $\int f_1 \le \int f_2$.
We check (3). $f \in L^1$, so $f = g - h$, where $g, h \in L^0$. Here, we use a trick to express $|g - h|$ as a difference of functions in $L^0$: $|g - h| = \max\{g, h\} - \min\{g, h\} \in L^1$. Now, $\pm f \le |f|$, so $\left|\int f\right| \le \int |f|$.
We check (4). $f \in L^1$, so $f = g - h$ for $g, h \in L^0$. Then $h = \lim \varphi_k$ with $\varphi_{k+1} \ge \varphi_k$ and $\int \varphi_k$ bounded and $\int \varphi_k \to \int h$; a similar statement is true for $g = \lim \psi_l$. Then $g - h = \lim(g - \varphi_k)$ almost everywhere, and the sequence $g - \varphi_k$ is decreasing with $\int(g - \varphi_k) = \int g - \int \varphi_k \to \int f$. Moreover, $g - \varphi_k = \lim_{l\to\infty}(\psi_l - \varphi_k)$ is the limit of an increasing sequence of step functions, so $g - \varphi_k \in L^0$, and we are done.
Note that (6) implies (5); just take $f_k = |f|$ for all $k$.
We now need to prove (6). This is a bit trickier than the previous properties. We pick $f_{k_j}$ such that $\int f_{k_j} < \frac{1}{2^j}$ and $k_{j+1} > k_j$. We can certainly do this because $\int f_k \to 0$. By (4), there exist $g_j \in L^0$ with $g_j \ge f_{k_j}$ and $\int g_j \le \int f_{k_j} + \frac{1}{2^j}$. Pick an increasing sequence $\{\psi_{j,i}\}_{i=1,2,\ldots}$ of step functions with $\lim_{i\to\infty} \psi_{j,i}(x) = g_j(x)$ for almost every $x$ and $\lim_{i\to\infty} \int \psi_{j,i} = \int g_j$ for $j = 1, 2, \ldots$. We can take these to be nonnegative, as otherwise we could just redefine $\psi_{j,i} = \max\{\psi_{j,i}, 0\}$.
Define $\psi_i = \sum_{j=1}^i \psi_{j,i}$, a nonnegative step function. Then for $i > N$,
\[ \sum_{j=1}^N \int \psi_{j,i} \le \int \psi_i = \sum_{j=1}^i \int \psi_{j,i} \le \sum_{j=1}^i \int g_j \le \sum_{j=1}^i \left( \int f_{k_j} + \frac{1}{2^j} \right) \le \sum_{j=1}^i \frac{1}{2^{j-1}} = 2 \]
for every $i$. Then $\int \psi_i$ is bounded and $\psi_{i+1} \ge \psi_i$. By the main technical Lemma 8.7, we see that the $\psi_i(x)$ are bounded almost everywhere, i.e. for almost every $x$ there exists $M_x$ with
\[ \sum_{j=1}^N \psi_{j,i}(x) \le \psi_i(x) \le M_x. \]
10. 4/18
Recall that we defined L1 ([a, b]) = {f : f = g − h, g, h ∈ L0 [a, b]}.
Definition 10.1. Define
\[ \|f\|_1 = \int_{[a,b]} |f|. \]
Note that this is almost a norm. It has two of the three norm properties. We have
\[ \|f_1 + f_2\|_1 = \int_{[a,b]} |f_1 + f_2| \le \int_{[a,b]} (|f_1| + |f_2|) = \|f_1\|_1 + \|f_2\|_1, \]
and $\|\lambda f\|_1 = |\lambda|\,\|f\|_1$. However, for the third property, $\|f\|_1 = 0$ implies only that $f(x) = 0$ for almost every $x \in [a, b]$. It formally fails the third property, but it does not fail badly. We say that $\|\cdot\|_1$ is a seminorm.
Remark. If $f \in L^1[a, b]$, there exists a sequence $\{\zeta_k\}$ of step functions such that $\|f - \zeta_k\|_1 \to 0$ and $\zeta_k(x) \to f(x)$ for almost every $x \in [a, b]$. That is, the sequence converges in the seminorm and converges pointwise almost everywhere.
Proof. We have $f = g - h$ with $g, h \in L^0$. This means that $g(x) = \lim \varphi_k(x)$ for almost every $x$ and $\int \varphi_k \to \int g$, with $\varphi_{k+1} \ge \varphi_k$. Likewise, $h(x) = \lim \psi_k(x)$ for almost every $x$ and $\int \psi_k \to \int h$ with $\psi_{k+1} \ge \psi_k$.
Take $\zeta_k = \varphi_k - \psi_k$. Then $\zeta_k(x) = \varphi_k(x) - \psi_k(x) \to g(x) - h(x)$ for almost every $x$. Also, since $g - \varphi_k \ge 0$ and $h - \psi_k \ge 0$ almost everywhere, we have
\[ \int_{[a,b]} |f - \zeta_k| \le \int_{[a,b]} |g - \varphi_k| + \int_{[a,b]} |h - \psi_k| = \int_{[a,b]} (g - \varphi_k) + \int_{[a,b]} (h - \psi_k) \to 0. \]
Theorem 10.2 (Completeness of $L^1$). Suppose that $\{f_k\} \subset L^1[a, b]$ is Cauchy with respect to $\|\cdot\|_1$. (This means that for every $\varepsilon > 0$ there exists $N$ such that $\|f_k - f_l\|_1 < \varepsilon$ for all $l > k \ge N$.) Then there exists $f \in L^1[a, b]$ such that $\|f_k - f\|_1 \to 0$ (i.e. $f_k \to f$ with respect to $\|\cdot\|_1$).
Proof. Apply the Cauchy property with $\varepsilon = 2^{-j}$. Then there exists $k_j$ such that $\|f_l - f_{k_j}\|_1 < 2^{-j}$ for all $l \ge k_j$, and we can choose $k_{j+1} > k_j$ for every $j$. Hence, $\|f_{k_{j+1}} - f_{k_j}\|_1 < 2^{-j}$.
By the above remark, there exists a step function $\zeta_j$ with $\|f_{k_j} - \zeta_j\|_1 < 2^{-j}$. Then we can see that
\begin{align*}
\|\zeta_{j+1} - \zeta_j\|_1 &= \|\zeta_{j+1} - f_{k_{j+1}} + f_{k_{j+1}} - f_{k_j} + f_{k_j} - \zeta_j\|_1 \\
&\le \|\zeta_{j+1} - f_{k_{j+1}}\|_1 + \|f_{k_{j+1}} - f_{k_j}\|_1 + \|f_{k_j} - \zeta_j\|_1 < 2^{-j} + 2^{-j} + 2^{-j} \le 2^{-j+2}.
\end{align*}
Take $\zeta_0 \equiv 0$. Observe that we can always write $a_+ = \max\{a, 0\} \ge 0$ and $a_- = \max\{-a, 0\} \ge 0$, so that $a = a_+ - a_-$; then
\[ \zeta_l = \sum_{j=1}^l (\zeta_j - \zeta_{j-1}) = \sum_{j=1}^l (\zeta_j - \zeta_{j-1})_+ - \sum_{j=1}^l (\zeta_j - \zeta_{j-1})_- = \Phi_l - \Psi_l, \]
where $\Phi_l$ and $\Psi_l$ are increasing sequences of step functions whose integrals are bounded (by the estimate on $\|\zeta_{j+1} - \zeta_j\|_1$).
Then the first part of the main technical Lemma 8.7 tells us that $\Phi_l(x) \to g(x)$ and $\Psi_l(x) \to h(x)$ almost everywhere, where $g, h \in L^0$, and $\int_{[a,b]} g = \lim \int_{[a,b]} \Phi_l$ and $\int_{[a,b]} h = \lim \int_{[a,b]} \Psi_l$. Therefore, $\zeta_l(x) \to g(x) - h(x)$ almost everywhere. Define $f = g - h \in L^1$. We know that this is the pointwise limit of the $\zeta_l$. In addition,
\begin{align*}
\|f - \zeta_l\|_1 = \int |f - \zeta_l| &= \int |(g - \Phi_l) - (h - \Psi_l)| \\
&\le \int |g - \Phi_l| + \int |h - \Psi_l| = \int (g - \Phi_l) + \int (h - \Psi_l) \to 0.
\end{align*}
Now,
\[ \|f - f_{k_l}\|_1 = \int |f - \zeta_l + \zeta_l - f_{k_l}| \le \|f - \zeta_l\|_1 + \|\zeta_l - f_{k_l}\|_1 \to 0. \]
On the other hand, we know that $\int |f_l - f_{k_l}| = \|f_l - f_{k_l}\|_1 \to 0$ as $l \to \infty$ by the Cauchy property. Hence,
\[ \|f - f_l\|_1 \le \|f - f_{k_l}\|_1 + \|f_{k_l} - f_l\|_1 \to 0. \]
We cooked up a function f , and this took some work to do. We needed to find an increasing
sequence of step functions. Then we checked that f actually satisfied the properties that we
wanted. This concludes the proof.
11. 4/20
Today we will discuss linear functionals, and go back to the Lebesgue integral next time.
We first make some final remarks about orthogonality.
12. 4/22
Let’s finish our discussion of the Lebesgue integral. Recall that we proved the Monotone
Convergence Theorem 10.3.
This has a very useful corollary:
Corollary 12.1. If we have a given function $f : [a, b] \to \mathbb{R}$ and $\{f_k\} \subset L^1([a, b])$ with $f_k \to f$ almost everywhere, and $\left\{\int_{[a,b]} |f_k|\right\}_{k=1,2,\ldots}$ is bounded, then $f \in L^1$.
We proved that L1 is complete with respect to the L1 -seminorm. We claim that this is
also true in the L2 case.
Proposition 12.7. $L^2$ is complete with respect to the seminorm $\|\cdot\|_2$; that is, if $\{f_k\} \subset L^2[a, b]$ is a Cauchy sequence with respect to this seminorm $\|\cdot\|_2$ (i.e. for every $\varepsilon > 0$ there exists $N$ such that $\|f_k - f_l\|_2 < \varepsilon$ for every $k > l \ge N$), then there exists $f \in L^2[a, b]$ with $\|f_k - f\|_2 \to 0$.
Proof. We can always write $f_k = f_k^+ - f_k^-$, so without loss of generality we can take $f_k \ge 0$. Then by the Cauchy–Schwarz inequality,
\[ \|f_k^2 - f_l^2\|_1 = \int_{[a,b]} |f_k^2 - f_l^2| = \int_{[a,b]} |f_k - f_l|(f_k + f_l) \le \|f_k - f_l\|_2\,\|f_k + f_l\|_2. \]
If we use the Cauchy sequence property with $\varepsilon = 1$, then there exists some $N_1$ such that $\|f_k - f_l\|_2 < 1$ for every $k > l \ge N_1$. In particular, this means that $\|f_k - f_{N_1}\|_2 \le 1$ for every $k \ge N_1$. By the triangle inequality, we can then write
\[ \|f_k\|_2 = \|f_k - f_{N_1} + f_{N_1}\|_2 \le \|f_k - f_{N_1}\|_2 + \|f_{N_1}\|_2 \le 1 + \|f_{N_1}\|_2 \]
for all $k \ge N_1$, so $\|f_k\|_2$ is bounded. Hence, $\|f_k + f_l\|_2 \le \|f_k\|_2 + \|f_l\|_2 \le C$, so therefore
\[ \|f_k^2 - f_l^2\|_1 \le C\varepsilon \]
for all $k > l \ge N$. Therefore, $\{f_k^2\}$ is Cauchy with respect to the $L^1$-seminorm $\|\cdot\|_1$. Therefore, there exists $h \in L^1([a, b])$ with $\int |f_k^2 - h| \to 0$. Therefore, there exists a subsequence $\{f_{k_j}\}$ with $f_{k_j}^2 \to h$ almost everywhere.
Also, by Cauchy–Schwarz,
\[ \|f_k - f_l\|_1 = \int_{[a,b]} |f_k - f_l| \cdot 1 \le \|f_k - f_l\|_2 \cdot \sqrt{b - a}, \]
i.e. $\{f_k\}$ is a Cauchy sequence in $L^1$, and so there exists $f \in L^1$ with $\|f_k - f\|_1 \to 0$. There is again a subsequence $\{f_{l_j}\}$ with $f_{l_j}(x) \to f(x)$ for almost every $x$. We can take $\{f_{l_j}\}$ to be a subsequence of the $\{f_{k_j}\}$ because $\{f_{k_j}\}$ is also Cauchy in $L^1$. Therefore, we have a subsequence $\{f_{l_j}\}$ such that $f_{l_j} \to f$ almost everywhere (with $f \in L^1$) and $f_{l_j}^2 \to h$ with $h \in L^1$. Hence $h = f^2$ almost everywhere, and hence $f^2 \in L^1$. Therefore, $f \in L^2$.
We need to check convergence in the $L^2$ seminorm. By Cauchy–Schwarz,
\[ \|f_k - f\|_2^2 = \int_{[a,b]} (f_k - f)^2 = \int_{[a,b]} |f_k - f||f_k - f| \le \int_{[a,b]} |f_k - f|(f_k + f) = \|f_k^2 - f^2\|_1 \to 0, \]
hence $L^2$ is a complete space.
There are still a few points to clean up, such as the proof of the technical Lemma 8.7.
Also, we’ll turn the seminorm to an actual norm through an underhanded trick.
13. 4/25
We will finish the discussion of the Lebesgue integral. We want to get around the difficulty
that our supposed "norm" is only a seminorm.
Recall that for $f \in L^1[a, b]$, we have the seminorm $\|f\|_1 = \int_{[a,b]} |f|$. Similarly, for $f, g \in L^2[a, b]$, we have a semi-inner product $(f, g) = \int_{[a,b]} fg$.
The problem is that kf k1 = 0 or kf k2 = 0 implies only that f (x) = 0 almost everywhere.
To get around this, instead of considering functions, we consider classes of functions.
Definition 13.1. The $L^1$ class of $f$ is defined as
\[ \bar{f} = \left\{ g \in L^1[a, b] : g(x) = f(x) \text{ for almost every } x \in [a, b] \right\}. \]
Note that $g \in \bar{f}$ if and only if $f \in \bar{g}$ if and only if $\bar{f} = \bar{g}$. Treat these as elements in a new space instead of thinking of them as classes.
We can now define $\bar{f} + \bar{g} = \overline{f + g}$. We should check that this makes sense. Indeed, if $f_1 \in \bar{f}$ and $g_1 \in \bar{g}$, then certainly $f_1 + g_1 \in \overline{f + g}$. Similarly, $\lambda\bar{f} = \overline{\lambda f}$. Then we can define
Definition 13.2. $L^1[a, b] = \{\bar{f} : f \in L^1[a, b]\}$ is a linear space.
We can now define:
Definition 13.3. $\int_{[a,b]} \bar{f} = \int_{[a,b]} f$ and $\|\bar{f}\|_1 = \|f\|_1 = \int_{[a,b]} |f|$.
Proposition 13.4. $L^1[a, b]$ is a normed (Banach) space.
Proof. We just need to check the properties: $\|\lambda\bar{f}\|_1 = |\lambda|\,\|\bar{f}\|_1$ and $\|\bar{f} + \bar{g}\|_1 \le \|\bar{f}\|_1 + \|\bar{g}\|_1$. Also, $\|\bar{f}\|_1 = 0$ implies that $\|f\|_1 = 0$, so that $f(x) = 0$ almost everywhere, and so $\bar{f} = \bar{0} = \{f \in L^1[a, b] : f(x) = 0 \text{ for a.e. } x \in [a, b]\}$.
Similarly, we can define
Definition 13.5. $L^2[a, b] = \{\bar{f} : f \in L^2[a, b]\}$, and we can define $(\bar{f}, \bar{g}) = (f, g) = \int_{[a,b]} fg$ with the norm $\|\bar{f}\|_2 = \sqrt{(\bar{f}, \bar{f})}$. This actually is now an inner product, so this is now a genuine Hilbert space.
To finish off the Lebesgue theory, we still need to finish the proof of our main technical
lemma 8.7.
Lemma 13.6 (Lemma 8.7). If $\{\psi_k\}$ is an increasing sequence of step functions on $[a, b]$ with $\left\{\int_{[a,b]} \psi_k\right\}$ bounded, then $\{\psi_k(x)\}$ is bounded (and hence convergent) for almost every $x \in [a, b]$.
Proof of Lemma 8.7. Let $S = \{x \in [a, b] : \{\psi_k(x)\} \text{ is not bounded}\}$. For $x \in S$, we see that $\lim_{k\to\infty} \psi_k(x) = \infty$. We want to prove that $S$ has measure zero.
Let's agree to use the notation $\tilde{\psi}_k(x) = \psi_k(x) - \psi_1(x)$. Notice that this is a nonnegative function, and $\tilde{\psi}_1(x) \equiv 0$. Equivalently, $S = \{x \in [a, b] : \{\tilde{\psi}_k(x)\} \text{ is not bounded}\}$.
Take an arbitrary $\alpha > 0$. Let $S_k = \{x \in [a, b] : \tilde{\psi}_k(x) > \alpha\}$. Then $S_{k+1} \supset S_k$ for all $k$, and $S_1 = \emptyset$. We can therefore write
\[ S_N = (S_N \setminus S_{N-1}) \cup (S_{N-1} \setminus S_{N-2}) \cup \cdots \cup (S_2 \setminus S_1) = \bigcup_{k=1}^{N-1} (S_{k+1} \setminus S_k) = \bigcup_{k=1}^{N-1} \left\{ x : \tilde{\psi}_k(x) \le \alpha < \tilde{\psi}_{k+1}(x) \right\}. \]
Recall that the $\tilde{\psi}_k$ are step functions. For each $k = 1, 2, \ldots$, consider a partition $a = x_0^{(k)} < x_1^{(k)} < \cdots < x_{N_k}^{(k)} = b$ such that $\tilde{\psi}_k$ is constant on each interval of the partition; choose these partitions to be compatible with both $\tilde{\psi}_k$ and $\tilde{\psi}_{k+1}$ (e.g. by taking a common refinement). Define $P_k = \{(x_{i-1}^{(k)}, x_i^{(k)}) : i = 1, \ldots, N_k\}$. Then $S_{k+1} \setminus S_k$ is, up to endpoints, a union of intervals of $P_k$: there is a subcollection $Q_k \subset P_k$ with
\[ \bigcup_{I \in Q_k} I \;\subset\; S_{k+1} \setminus S_k \;\subset\; \bigcup_{I \in Q_k} I \cup \{x_0^{(k)}, \ldots, x_{N_k}^{(k)}\}. \]
Therefore,
\[ \bigcup_{k=1}^{N-1} \bigcup_{I \in Q_k} I \;\subset\; S_N \;\subset\; \bigcup_{k=1}^{N-1} \left( \bigcup_{I \in Q_k} I \cup \{x_0^{(k)}, \ldots, x_{N_k}^{(k)}\} \right). \]
Since $\tilde{\psi}_N > \alpha$ on $S_N$ and these intervals are disjoint, we get
\[ \tilde{\psi}_N \ge \alpha \sum_{k=1}^{N-1} \sum_{I \in Q_k} \chi_I. \]
k=1 I∈Qk
13.1. Linear operators. Now we will consider the next topic, which is linear operators.
Definition 13.7. Let E and F be normed spaces. Then T : E → F is a linear operator (i.e.
linear map) if T (αx + βy) = αT (x) + βT (y) for all x, y ∈ E and for all scalars α, β.
Definition 13.8. We say that $T$ is bounded if there exists $M$ such that $\|T(x)\| \le M\|x\|$ for all $x \in E$.
We should be somewhat careful here; $\|T(x)\|$ is a norm in $F$, while $\|x\|$ is a norm in $E$.
In the same way as for the linear functionals (as in Lemma 11.4), we get the following
theorem:
Theorem 13.9. T is continuous at each point of E if and only if T is continuous at 0 if
and only if T is bounded.
14. 4/29
14.1. Bounded linear operators. We started our discussion of bounded linear operators. Let $E$ and $F$ be normed spaces, and suppose that $T : E \to F$ is linear. Then $T$ is bounded means that there exists $M$ with $\|T(x)\| \le M\|x\|$ for every $x \in E$. In this case, we can define the operator norm
\[ \|T\| = \sup\left\{ \frac{\|T(x)\|}{\|x\|} : x \in E \setminus \{0\} \right\} = \sup\{\|T(x)\| : \|x\| = 1\}. \]
Exercise 14.1. Check that $\|T\|$ is a norm on $L(E, F)$.
A special case of this discussion is when F = R or C, when T is just a linear functional.
Note that T is continuous if and only if T is continuous at 0 if and only if T is bounded.
Definition 14.2. Let L(E, F ) be the set of all bounded linear maps E → F , equipped with
the operator norm.
Lemma 14.3. L(E, F ) is a Banach space provided that F is a Banach space.
The proof will be very similar to all of the other completeness proofs that we’ve seen.
Proof. $L(E, F)$ is clearly a normed linear space, so we just have to check completeness. Let $\{T_k\}_{k=1,2,\ldots}$ be any Cauchy sequence in $L(E, F)$. That is, let $\varepsilon > 0$; then there exists $N$ such that $\|T_k - T_l\| < \varepsilon$ for every $l > k \ge N$. This implies that $\|T_k(x) - T_l(x)\| \le \varepsilon\|x\|$ for every $x \in E$. Take any fixed $x$; this means that $\{T_k(x)\}_{k=1,2,\ldots}$ is Cauchy in $F$ and hence convergent (since $F$ is a Banach space) to some limit which we call $T(x)$.
Now,
\[ T(\lambda x + \mu y) = \lim_{k\to\infty} T_k(\lambda x + \mu y) = \lim_{k\to\infty} (\lambda T_k(x) + \mu T_k(y)) = \lambda T(x) + \mu T(y). \]
Suppose that $w = (w_1, \ldots, w_m)$. In this case, the adjoint property says that
\[ (A(z), w) = \sum_{i=1}^m \sum_{j=1}^n a_{ij} z_j \overline{w_i} = \sum_{j=1}^n z_j\,\overline{\left( \sum_{i=1}^m \overline{a_{ij}}\, w_i \right)} = (z, A^*(w)). \]
Here, $A^*$ is given by
\[ (A^* w)_j = \sum_{i=1}^m \overline{a_{ij}}\, w_i, \]
which means that $A^* = (\overline{a_{ij}})^T$.
Definition 15.5. A ∈ L(H, H) is self-adjoint (or in the complex case, Hermitian) if A∗ = A.
In the finite dimensional case, $H = \mathbb{C}^n$. This means that $(\overline{a_{ij}})^T = (a_{ij})$, i.e. $a_{ij} = \overline{a_{ji}}$ for all $i$ and $j$. This also means that the diagonal entries are real.
Proposition 15.6. If $A \in L(H, H)$ is self-adjoint then $(Ax, x)$ is always real.
Proof. This is because $(Ax, x) = (x, Ax) = \overline{(Ax, x)}$ by the self-adjointness property.
Proposition 15.7. If $A \in L(H, H)$ is self-adjoint, then
\[ \|A\| = \sup_{x \neq 0} \frac{\|Ax\|}{\|x\|} = \sup_{x \neq 0} \frac{|(Ax, x)|}{\|x\|^2}. \]
Proof. Let $m = \sup_{x \neq 0} \frac{|(Ax,x)|}{\|x\|^2}$. Notice that Cauchy–Schwarz implies that $m \le \sup_{x \neq 0} \frac{\|Ax\|}{\|x\|} = \|A\|$. We need to prove the reverse inequality.
Note that $(A(x + y), x + y) = (A(x), x) + (A(y), y) + (A(y), x) + (A(x), y)$. Here, notice that $(A(y), x) = (y, A(x)) = \overline{(A(x), y)}$. Then we have
\begin{align*}
(A(x + y), x + y) &= (A(x), x) + (A(y), y) + 2\operatorname{Re}(A(x), y) \\
(A(x - y), x - y) &= (A(x), x) + (A(y), y) - 2\operatorname{Re}(A(x), y).
\end{align*}
Subtracting, we get that
\[ 4\operatorname{Re}(A(x), y) = (A(x + y), x + y) - (A(x - y), x - y) \le m(\|x + y\|^2 + \|x - y\|^2) = 2m(\|x\|^2 + \|y\|^2). \]
Suppose that $A(x) \neq 0$. Then choose $y = \frac{\|x\|}{\|Ax\|}A(x)$ to get $4\|x\|\,\|A(x)\| \le 4m\|x\|^2$. Therefore, $\|Ax\| \le m\|x\|$ for every $x$, so $\|A\| \le m$, which means that $\|A\| = m$.
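For a Hermitian matrix this can be checked directly, since $\sup_{x \neq 0}|(Ax,x)|/\|x\|^2$ is the largest $|\lambda|$ over eigenvalues; a minimal sketch assuming NumPy, with a random illustrative matrix:

    import numpy as np

    rng = np.random.default_rng(3)
    B = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))
    A = (B + B.conj().T) / 2                       # Hermitian: A* = A

    x = rng.standard_normal(5) + 1j * rng.standard_normal(5)
    rayleigh = np.vdot(x, A @ x)                   # (Ax, x)
    assert np.isclose(rayleigh.imag, 0)            # (Ax, x) is real (Prop. 15.6)

    # ||A|| (largest singular value) equals max |eigenvalue| for Hermitian A,
    # which is sup |(Ax, x)| / ||x||^2 (Prop. 15.7).
    assert np.isclose(np.linalg.norm(A, 2),
                      np.max(np.abs(np.linalg.eigvalsh(A))))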
16. 5/4
Today we will talk about the spectrum. Suppose that E is a complex Banach space and
suppose that we have a bounded linear map A ∈ L(E, E).
16.1. Spectrum.
Definition 16.1. The spectrum of A is σ(A) = {λ ∈ C : λI − A is not invertible}.
Recall that we say that A ∈ L(E, E) is invertible if A is 1-1, onto, and A−1 ∈ L(E, E).
Remark. {λ : λI − A is not 1:1} ∪ {λ : λI − A is not onto} ⊂ σ(A).
If λI − A is not 1:1, then there exists x 6= 0 with (λI − A)(x) = 0, i.e. Ax = λx. Here,
λ is an eigenvalue, and this means that all eigenvalues are in σ(A). Note that the spectrum
might be much bigger, however.
Example 16.2. Let E = Cn with the usual inner product norm. A ∈ L(E, E) is given by
an n × n matrix, so σ(A) is exactly the set of eigenvalues.
Example 16.3. Let $E = \ell^2_{\mathbb{C}}$. Take any $z = (z_1, z_2, \ldots) \in \ell^2_{\mathbb{C}}$. Let $S$ be the shift operator $S(z) = (0, z_1, z_2, \ldots)$. Clearly, $S \in L(E, E)$; in fact, it is norm preserving. It has no eigenvalues. To see this, suppose that there were some $z$ such that $Sz = \lambda z$. Then $(0, z_1, z_2, \ldots) = (\lambda z_1, \lambda z_2, \ldots)$. Comparing coordinates, this means that $z_1 = 0$, which means that $z_2 = 0$, and so on. That is, $z = 0$, and hence no $\lambda$ is an eigenvalue. However, the spectrum of this shift operator is actually very big: $\sigma(S) = \{\lambda \in \mathbb{C} : |\lambda| \le 1\}$.
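A small sketch of the shift (assuming NumPy; the finite truncation is only illustrative, since $S$ really acts on $\ell^2_{\mathbb{C}}$):

    import numpy as np

    def S(z):
        # shift: (z1, z2, ...) -> (0, z1, z2, ...)
        return np.concatenate(([0.0], z))

    rng = np.random.default_rng(4)
    z = rng.standard_normal(50)
    assert np.isclose(np.linalg.norm(S(z)), np.linalg.norm(z))  # norm preserving

    # Solving S z = lambda z coordinatewise gives 0 = lambda z1, z1 = lambda z2,
    # ..., which forces z = 0 for every lambda: S has no eigenvalues.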
Lemma 16.4. $\sigma(A)$ is a closed subset of $\{\lambda \in \mathbb{C} : |\lambda| \le \|A\|\}$.
Proof. Take any $\lambda \notin \sigma(A)$, so $\lambda I - A$ is invertible. Recall that the set of invertible operators in $L(E, E)$ is an open subset of $L(E, E)$ with respect to the operator norm. Then there exists $\delta > 0$ such that $\|B - (\lambda I - A)\| < \delta$ implies that $B$ is invertible. In particular, if $\mu \in \mathbb{C}$ with $|\mu - \lambda| < \delta$, then $\|(\mu I - A) - (\lambda I - A)\| < \delta$, and hence $\mu I - A$ is invertible. We have therefore shown that the complement of $\sigma(A)$ is open, i.e. $\sigma(A)$ is closed.
Now, suppose that $|\lambda| > \|A\|$. Then $\lambda I - A = \lambda(I - \frac{1}{\lambda}A)$. Note that $\left\|\frac{1}{\lambda}A\right\| = \frac{\|A\|}{|\lambda|} < 1$, so by our general theorem 14.9, this means that $\lambda I - A$ is invertible.
This means that no $\lambda$ with $|\lambda| > \|A\|$ is in $\sigma(A)$, or in other words, we've proved that $\sigma(A) \subset \{\lambda : |\lambda| \le \|A\|\}$.
16.2. Compact operators.
Definition 16.5. Suppose that $E$ and $F$ are normed linear spaces. A linear operator $T : E \to F$ is compact if $T(\{x \in E : \|x\| \le 1\})$ is contained in a compact subset of $F$.
Remark. Note that this is the same as saying that for all $R > 0$, the set $T(\{x \in E : \|x\| < R\})$ is contained in a compact set.
$T$ compact implies that $T$ is bounded. Assume otherwise, i.e. that $T$ is not bounded. Then there exists a sequence $\{x_k\} \subset E$ with $\|x_k\| = 1$ and $\|T(x_k)\| \ge k$, which means that there does not exist a convergent subsequence of $\{T(x_k)\}$: if there were a convergent subsequence $T(x_{k_j}) \to y$, then $\|T(x_{k_j})\| \to \|y\| < \infty$, a contradiction.
Example 16.6. Suppose that $E$ is an infinite dimensional normed space. Then $I_E \in L(E, E)$ is certainly bounded, but it is not compact because the closed unit ball in $E$ is not compact: there exists a sequence $e_1, e_2, \ldots \in E$ with $\|e_j\| = 1$ for all $j$ and $\|e_i - e_j\| \ge 1$ for $i \neq j$.
Example 16.7. Suppose that $T \in L(E, F)$ with $H = T(E)$ a finite dimensional subspace of $F$. We say that $T$ is an operator of finite rank, and we claim that $T$ is compact.
We should check this. Take any sequence $\{T(x_k)\}_{k=1,2,\ldots} \subset H$, where $x_k \in E$ and $\|x_k\| \le 1$. Since $H$ is finite dimensional, all bounded subsets of $H$ have compact closure. Thus there exists a convergent subsequence $\{T(x_{k_j})\}$.
There is an extremely important theorem.
Theorem 16.8. Let $E$ and $F$ be Banach spaces, and let $\{T_k\}_{k=1,2,\ldots} \subset L(E, F)$ be a sequence with each $T_k$ compact and $\|T_k - T\| \to 0$ for some $T \in L(E, F)$ (i.e. $T_k \to T$ in $L(E, F)$). Then $T$ is compact.
Proof. Take any sequence $\{x_k\}_{k=1,2,\ldots}$ with $\|x_k\| \le 1$. Then $\{T_1(x_k)\}$ has a convergent subsequence $\{T_1(x_{1,k})\}$. Now, $\{T_2(x_{1,k})\}$ has a convergent subsequence $\{T_2(x_{2,k})\}$. The important thing is that $\{x_{2,k}\}$ is a subsequence of $\{x_{1,k}\}$, which is a subsequence of $\{x_k\}$. We continue this process of finding subsequences of subsequences. Inductively, $\{T_q(x_{q-1,k})\}$ has a convergent subsequence $\{T_q(x_{q,k})\}$.
Now we use a standard trick in analysis by taking the diagonal sequence. Consider $\{x_{q,q}\}$. For each $p$, the sequence $\{x_{q,q}\}_{q=p,p+1,\ldots}$ is a subsequence of $\{x_{p,k}\}_{k=1,2,\ldots}$. Therefore, $\{T_p(x_{q,q})\}_{q=1,2,\ldots}$ converges.
The desired result should now follow easily, and we’ll finish it next time.
17. 5/6
We have Banach spaces E and F , and Tk → T in L(E, F ). We want to show that if Tk is
compact then T is compact. This is Theorem 16.8.
Finishing the proof of Theorem 16.8. We proved that for any sequence $\{x_k\}_{k=1,2,\ldots}$ with $\|x_k\| \le 1$ for all $k$, there exists a subsequence $\{x_{q,q}\}_{q=1,2,\ldots}$ of $\{x_k\}_{k=1,2,\ldots}$ such that $\{T_p(x_{q,q})\}_{q=1,2,\ldots}$ converges for each $p$. This was done with a diagonal argument. We claim that $\{T(x_{q,q})\}_{q=1,2,\ldots}$ converges.
Let $\varepsilon > 0$ be given. Pick $p$ such that $\|T_p - T\| < \varepsilon$. Using the Cauchy property, we can pick $N$ such that $\|T_p(x_{q,q}) - T_p(x_{r,r})\| < \varepsilon$ for every $q > r \ge N$. Then
\begin{align*}
\|T(x_{q,q}) - T(x_{r,r})\| &= \|T(x_{q,q}) - T_p(x_{q,q}) + T_p(x_{q,q}) - T_p(x_{r,r}) + T_p(x_{r,r}) - T(x_{r,r})\| \\
&\le \|T(x_{q,q}) - T_p(x_{q,q})\| + \|T_p(x_{q,q}) - T_p(x_{r,r})\| + \|T_p(x_{r,r}) - T(x_{r,r})\| \\
&\le \|T - T_p\| + \varepsilon + \|T_p - T\| < 3\varepsilon
\end{align*}
for every $q > r \ge N$. Hence, $\{T(x_{q,q})\}_{q=1,2,\ldots}$ is Cauchy and hence convergent.
17.1. Hilbert-Schmidt operators.
Definition 17.1. Suppose that $H$ is a Hilbert space and $Y$ is a Banach space. An operator $T \in L(H, Y)$ is said to be a Hilbert–Schmidt operator if there exists a complete orthonormal sequence $e_1, e_2, \ldots$ for $H$ such that $\sum_{j=1}^\infty \|T(e_j)\|^2 < \infty$.
Remark. It doesn't matter which complete orthonormal sequence we use. If $f_1, f_2, \ldots$ is any other complete orthonormal sequence, then $\sum_{j=1}^\infty \|T(f_j)\|^2 < \infty$ as well. This is something that will be checked on the homework.
Hilbert-Schmidt operators are important because a lot of common operators satisfy this
property, and because of the following theorem.
Theorem 17.2. Hilbert-Schmidt operators are automatically compact.
Proof. Let $x_j = (x, e_j)$. Define $T_N(x) = T\left(\sum_{j=1}^N x_j e_j\right) = \sum_{j=1}^N x_j T(e_j)$. Then $T_N(H)$ is a finite dimensional space: $T_N(H) = \operatorname{span}\{T(e_1), \ldots, T(e_N)\}$. That is, $T_N$ is finite rank and hence compact by Example 16.7. Then
\[ (T - T_N)(x) = T\left(\sum_{j=1}^\infty x_j e_j\right) - T\left(\sum_{j=1}^N x_j e_j\right) = T\left(\sum_{j=N+1}^\infty x_j e_j\right) = \sum_{j=N+1}^\infty x_j T(e_j). \]
We should be careful and check that this final equality holds. We have
\[ \sum_{j=N+1}^\infty x_j e_j = \lim_{M\to\infty} \sum_{j=N+1}^M x_j e_j \implies T\left(\sum_{j=N+1}^\infty x_j e_j\right) = \lim_{M\to\infty} \sum_{j=N+1}^M x_j T(e_j). \]
For $M_2 > M_1$, we can write
\[ \left\| \sum_{j=N+1}^{M_2} x_j T(e_j) - \sum_{j=N+1}^{M_1} x_j T(e_j) \right\| = \left\| \sum_{j=M_1+1}^{M_2} x_j T(e_j) \right\| \le \sum_{j=M_1+1}^{M_2} |x_j|\,\|T(e_j)\| \le \|x\|\sqrt{\sum_{j=M_1+1}^{M_2} \|T(e_j)\|^2}, \]
so the partial sums are Cauchy and the series converges. The same estimate gives
\[ \|(T - T_N)(x)\| \le \|x\|\sqrt{\sum_{j=N+1}^\infty \|T(e_j)\|^2}, \]
so $\|T - T_N\| \to 0$ as $N \to \infty$, and $T$ is compact by Theorem 16.8.
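A concrete illustrative example (our own, not from lecture): the diagonal operator $T(e_j) = e_j/j$ on $\ell^2$ is Hilbert–Schmidt, and the tail bound from the proof is easy to compute; a sketch assuming NumPy:

    import numpy as np

    # sum_j ||T(e_j)||^2 = sum_j 1/j^2 < infinity, so T is Hilbert-Schmidt.
    j = np.arange(1, 100001)
    print(np.sum(1.0 / j**2))          # -> pi^2/6 ~ 1.6449

    # Tail bound from the proof: ||T - T_N|| <= sqrt(sum_{j>N} 1/j^2) -> 0.
    # (Here the exact operator norm is even smaller: ||T - T_N|| = 1/(N+1).)
    for N in (10, 100, 1000):
        print(N, np.sqrt(np.sum(1.0 / j[N:]**2)))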
17.2. Spectral theorem. This is the simplest spectral theorem, for compact Hermitian operators. We might have seen the spectral theorem for symmetric finite dimensional matrices. This is the generalization to infinite dimensional complex space.
Theorem 17.4. Suppose that $H$ is a Hilbert space and $T \in L(H, H)$ is a compact Hermitian operator, i.e. $(Tx, y) = (x, Ty)$ for all $x, y \in H$. Then $(\ker T)^\perp = \overline{T(H)}$ (the closure of $T(H)$). Equivalently, we can write $\ker T = (T(H))^\perp$.
If $T(H)$ is infinite dimensional, there exists a complete orthonormal sequence $e_1, e_2, \ldots$ for $\overline{T(H)}$ such that $T(e_j) = \lambda_j e_j$ for every $j$, where the $\lambda_j$ are real numbers and $\lambda_j \to 0$ as $j \to \infty$. Also, every $x \in (\ker T)^\perp$ can be written $x = \sum_{j=1}^\infty x_j e_j$, so that $T(x) = \sum_{j=1}^\infty \lambda_j x_j e_j$.
If $T(H)$ is finite dimensional, then there exists an orthonormal basis $e_1, \ldots, e_N$ of $T(H)$ with $T(e_j) = \lambda_j e_j$ for all $j = 1, \ldots, N$.
18. 5/9
We will prove the Spectral Theorem 17.4. This requires three lemmas. The first two only require the Hermitian property and do not need compactness.
Lemma 18.1. Let T : H → H be Hermitian. Then all of its eigenvalues are real and
eigenvectors corresponding to distinct eigenvalues are orthogonal.
Proof. Suppose that $Tx = \lambda x$ for $x \neq 0$. Then $(Tx, x) = \lambda(x, x) = \lambda\|x\|^2$. Note that $(Tx, x) = (x, Tx)$, so this quantity is real, and hence $\lambda$ is real.
Now, suppose that $Tx = \lambda x$ and $Ty = \mu y$ with $\lambda \neq \mu$. Then $(Tx, y) = \lambda(x, y)$ and $(x, Ty) = \mu(x, y)$, which means that $\lambda(x, y) = \mu(x, y)$, so $0 = (\lambda - \mu)(x, y)$, and hence $(x, y) = 0$.
Lemma 18.2. Suppose that T : H → H is Hermitian, and suppose that L is any subspace
of H with T (L) ⊂ L. Then T (L⊥ ) ⊂ L⊥ .
Proof. Take y ∈ T (L⊥ ), that is y = T (x) with x ∈ L⊥ . We need to check that y is orthogonal
to L. Suppose that z ∈ L. Then by the Hermitian property, we have (z, T (x)) = (T (z), x) =
0 because T (z) ∈ L and x ∈ L⊥ . This means that y = T (x) ∈ L⊥ .
These two lemmas were straightforward and just required following the definitions. The
next lemma is slightly more complicated and is the main ingredient of the proof of the
Spectral Theorem.
Lemma 18.3. Suppose that $T : H \to H$ is nonzero, compact, and Hermitian. Then there exists a vector $e_1$ with $\|e_1\| = 1$ and $T(e_1) = \lambda_1 e_1$, where either $\lambda_1 = \|T\|$ or $\lambda_1 = -\|T\|$. Therefore, either $\|T\|$ or $-\|T\|$ is an eigenvalue of $T$.
Proof. Recall from Proposition 15.7 that for Hermitian operators, $\|T\| = \sup_{\|x\|=1} |(Tx, x)|$. Then there exists a sequence $\{x_k\}_{k=1,2,\ldots}$ with $\|x_k\| = 1$ for all $k$ and $(T(x_k), x_k) \to \lambda_1$, where either $\lambda_1 = \|T\|$ or $\lambda_1 = -\|T\|$. We need to show that a subsequence of the $x_k$ converges, because then the limit can be taken as our $e_1$.
Note that (using the Hermitian property and the fact that $\lambda_1$ is real)
\[ \|T(x_k) - \lambda_1 x_k\|^2 = \|T(x_k)\|^2 + \lambda_1^2 - 2\lambda_1(T(x_k), x_k) \le 2\lambda_1^2 - 2\lambda_1(T(x_k), x_k) \to 0 \]
as $k \to \infty$. This means that $T(x_k) - \lambda_1 x_k \to 0$ in $H$. Since $T$ is compact, $\{T(x_k)\}$ must have a convergent subsequence. Pick $\{x_{k_j}\}_{j=1,2,\ldots}$ such that $y = \lim T(x_{k_j})$ exists. Then $T(x_{k_j}) - \lambda_1 x_{k_j} \to 0$ in $H$, which means that $\lambda_1 x_{k_j} \to y$ and hence $x_{k_j} \to \frac{y}{\lambda_1}$. Therefore $T(x_{k_j}) \to T(\frac{y}{\lambda_1})$. This means that $y = T(\frac{y}{\lambda_1})$, i.e. $Ty = \lambda_1 y$. We can now pick $e_1 = y/\lambda_1$, and this concludes the proof.
This is a powerful result; it is the main ingredient and the inductive step of the proof of the Spectral Theorem.
We are now ready to prove the Spectral Theorem.
Proof of the Spectral Theorem 17.4. First, Lemma 18.3 gives $T(e_1) = \lambda_1 e_1$ with $\lambda_1 \neq 0$ and $\|e_1\| = 1$. Then either $T|_{(\operatorname{span}\{e_1\})^\perp} = 0$ or we can repeat this step with $(\operatorname{span}\{e_1\})^\perp$ in place of $H$ and $T|_{(\operatorname{span}\{e_1\})^\perp}$ in place of $T$. This works because Lemma 18.2 implies that $T((\operatorname{span}\{e_1\})^\perp) \subset (\operatorname{span}\{e_1\})^\perp$.
In the first case, we have (span {e1 })⊥ ⊂ ker T . Also, Lemma 18.1 implies that span {e1 } ⊂
(ker T )⊥ , which implies that ker T ⊂ (span {e1 })⊥ . This means that (span {e1 })⊥ = ker T ,
so that span {e1 } = (ker T )⊥ . In this case, we are done with the proof. This is the one-
dimensional case of the Spectral Theorem.
Now, we consider the inductive step. This is more or less the same as the preceding argument. Assume that $k \ge 2$ and we have orthonormal $e_1, \ldots, e_{k-1}$ with $T(e_j) = \lambda_j e_j$ and $\lambda_j \neq 0$ for $j = 1, \ldots, k-1$. By Lemma 18.2, we have $T((\operatorname{span}\{e_1, \ldots, e_{k-1}\})^\perp) \subset (\operatorname{span}\{e_1, \ldots, e_{k-1}\})^\perp$. Then by Lemma 18.3, either $T|_{(\operatorname{span}\{e_1,\ldots,e_{k-1}\})^\perp} = 0$ or there exists $e_k \in (\operatorname{span}\{e_1, \ldots, e_{k-1}\})^\perp$ with $T(e_k) = \lambda_k e_k$ for some $\lambda_k \neq 0$.
In the first case, we have $(\operatorname{span}\{e_1, \ldots, e_{k-1}\})^\perp \subset \ker T$, and by Lemma 18.1 we have $\operatorname{span}\{e_1, \ldots, e_{k-1}\} \subset (\ker T)^\perp$. Therefore, just as before, $\ker T = (\operatorname{span}\{e_1, \ldots, e_{k-1}\})^\perp$ and hence $(\ker T)^\perp = \operatorname{span}\{e_1, \ldots, e_{k-1}\}$. In this case, we terminate and are done. This is the finite dimensional case.
Therefore, we've proved that either the process terminates and we are in the finite dimensional case of the theorem (i.e. $(\ker T)^\perp$ is a finite dimensional subspace) or we get an orthonormal sequence $e_1, e_2, \ldots$ with $T(e_j) = \lambda_j e_j$, $\lambda_j \neq 0$ for all $j$, and $e_j \in (\ker T)^\perp$.
We claim that in this case, $e_1, e_2, \ldots$ is complete in $(\ker T)^\perp$. This is tricky to check.
Suppose that $n \ge 1$. Then $x \in H$ implies that $x = y_n + x_n$ where $y_n \in (\operatorname{span}\{e_1, \ldots, e_n\})^\perp$ and $x_n \in \operatorname{span}\{e_1, \ldots, e_n\}$. Note that in this case, Pythagoras's Theorem holds, so that $\|x\|^2 = \|y_n\|^2 + \|x_n\|^2 \ge \|y_n\|^2$. In addition, we know that $T(x) = T(y_n) + T(x_n)$.
In Homework 6, question 4, we showed that $\lambda_j \to 0$. In addition, we have
\[ \|T(y_n)\| \le |\lambda_{n+1}|\,\|y_n\|, \]
where $|\lambda_{n+1}| = \|T|_{(\operatorname{span}\{e_1,\ldots,e_n\})^\perp}\|$. This means that $\|T(y_n)\| \le |\lambda_{n+1}|\,\|x\| \to 0$, and hence $T(x_n) \to T(x)$.
Recall that we have $x_n = \sum_{j=1}^n (x, e_j)e_j$, so we have
\[ T(x_n) = \sum_{j=1}^n (x, e_j)\lambda_j e_j \to T(x), \]
and hence this series converges with respect to the inner product norm. Therefore $T(x) = \sum_{j=1}^\infty \lambda_j(x, e_j)e_j$.
It remains to check completeness. We know from Lemma 18.1 that $\operatorname{span}\{e_1, e_2, \ldots\} \subset (\ker T)^\perp$. Also, if $x \in (\operatorname{span}\{e_1, e_2, \ldots\})^\perp$, then $T(x) = 0$ by the formula above, and hence $x \in \ker T$. Therefore, $(\operatorname{span}\{e_1, e_2, \ldots\})^\perp \subset \ker T$. Note that $\operatorname{span}\{e_1, e_2, \ldots\}$ need not be closed, but $(\operatorname{span}\{e_1, e_2, \ldots\})^\perp = (\overline{\operatorname{span}\{e_1, e_2, \ldots\}})^\perp$. That's the end of the proof.
19. 5/11
First, we need to finish the last few lines of the proof of the Spectral Theorem 17.4.
End of proof of Spectral Theorem 17.4. We have already showed that either there exist $e_1, \ldots, e_n$ orthonormal with $\operatorname{span}\{e_1, \ldots, e_n\} = (\ker T)^\perp$ and $Te_j = \lambda_j e_j$ for all $j = 1, 2, \ldots, n$, or there exists an orthonormal sequence $e_1, e_2, \ldots$ with $\operatorname{span}\{e_1, e_2, \ldots\} \subset (\ker T)^\perp$, $T(e_j) = \lambda_j e_j$ for all $j$, and $T(x) = \sum_{j=1}^\infty \lambda_j(x, e_j)e_j$ for all $x$. Our inclusion shows that $\ker T \subset (\operatorname{span}\{e_1, e_2, \ldots\})^\perp$. Also, if $x \in (\operatorname{span}\{e_1, e_2, \ldots\})^\perp$, then $T(x) = 0$ and hence $x \in \ker T$. This means that $(\operatorname{span}\{e_1, e_2, \ldots\})^\perp \subset \ker T$.
Therefore, $\ker T = (\operatorname{span}\{e_1, e_2, \ldots\})^\perp$ and hence $(\ker T)^\perp = \overline{\operatorname{span}\{e_1, e_2, \ldots\}}$. This means that $\operatorname{span}\{e_1, e_2, \ldots\}$ is dense in $(\ker T)^\perp$. That is, $e_1, e_2, \ldots$ is a complete orthonormal sequence in $(\ker T)^\perp$. This concludes the proof of the Spectral Theorem.
19.1. An application of the Spectral Theorem. We will spend the next few lectures on an application of the spectral theorem to ordinary differential equations. There are equally important applications to partial differential equations, but those require more machinery than what we have available.
Consider an interval $[a, b]$ and two real valued functions: $p : [a, b] \to \mathbb{R}$ of class $C^1$ and $q : [a, b] \to \mathbb{R}$ of class $C^0$. Assume that $p > 0$ on $[a, b]$.
For $u \in C^2([a, b])$, consider the differential operator $Lu = (pu')' + qu = pu'' + p'u' + qu$.
There is an existence and uniqueness theorem for ordinary differential equations, which
we will not prove in this class.
Theorem 19.1 (Existence and uniqueness theorem). Let $g : [a, b] \to \mathbb{R}$ be a given $C^0$ function, and suppose we are given $c_1, c_2$ and a point $t_0 \in [a, b]$. Then there exists a unique $C^2([a, b])$ function $u$ with $Lu = g$ on $[a, b]$, $u(t_0) = c_1$, and $u'(t_0) = c_2$.
We will be particularly interested in what is called an “eigenvalue problem”, and we’ll see
that this relates to a certain Hermitian operator.
19.2. Sturm-Liouville.
Problem 19.2. We are given the Sturm–Liouville eigenvalue problem:
\begin{align*}
-Lu &= \lambda u \text{ on } [a, b] \\
\alpha u(a) + \beta u'(a) &= 0 \quad \text{for given } \alpha, \beta \text{ not both zero} \\
\gamma u(b) + \delta u'(b) &= 0 \quad \text{for given } \gamma, \delta \text{ not both zero.}
\end{align*}
Example 19.3. We do a very simple example. Consider the interval $[a, b] = [0, \pi]$ with $p \equiv 1$ and $q \equiv 0$ on the interval. Take $(\alpha, \beta) = (1, 0)$ and $(\gamma, \delta) = (1, 0)$. In this case, our problem is
\[ \begin{cases} -u'' = \lambda u & \text{on } [0, \pi] \\ u(0) = u(\pi) = 0. \end{cases} \]
In the case $\lambda > 0$, the general solution is $u = A\cos\sqrt{\lambda}\,t + B\sin\sqrt{\lambda}\,t$. Which of these satisfy the boundary conditions? Checking $u(0) = 0$, we see that $A = 0$. Then $u(\pi) = 0$ forces either $B = 0$ or $\sqrt{\lambda} = j$ for $j = 1, 2, \ldots$. In particular, if $u$ is nonzero then $\lambda = j^2$ and the corresponding solution is $u_j = B\sin jt$.
In the case $\lambda = 0$, we get $u'' = 0$, which means that $u = At + B$. The boundary conditions force $A = B = 0$, so there are no nonzero solutions.
In the case $\lambda < 0$, we have $u = Ae^{\sqrt{-\lambda}\,t} + Be^{-\sqrt{-\lambda}\,t}$, and again the boundary conditions imply that $A = B = 0$.
The point is that there are some special values of $\lambda$ that work. In addition, the (suitably normalized) functions $u_j$ form a complete orthonormal sequence for $L^2_{\mathbb{R}}[0, \pi]$.
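A numerical sketch of this example (assuming NumPy): the functions $\sin jt$ satisfy $-u'' = j^2 u$ and are orthonormal for the inner product $(f, g) = \frac{2}{\pi}\int_0^\pi fg\,dt$:

    import numpy as np

    N = 4096
    h = np.pi / N
    t = (np.arange(N) + 0.5) * h          # midpoint grid on (0, pi)

    for j in (1, 2, 3):
        u = np.sin(j * t)
        # second difference approximates u''; check -u'' = j^2 u
        upp = (np.sin(j * (t + h)) - 2 * u + np.sin(j * (t - h))) / h**2
        assert np.allclose(-upp, j**2 * u, atol=1e-4)

    # Orthonormality: (2/pi) int_0^pi sin(jt) sin(kt) dt = delta_jk
    for j in (1, 2, 3):
        for k in (1, 2, 3):
            val = (2 / np.pi) * np.sum(np.sin(j * t) * np.sin(k * t)) * h
            assert np.isclose(val, 1.0 if j == k else 0.0, atol=1e-6)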
We want to show that the same type of behavior occurs for the general Sturm-Liouville
problem. We will end up showing that there exist a complete orthonormal sequence of
eigenfunctions.
Proposition 19.4. Suppose that $u, v \in C^2([a, b])$ are not identically zero, and suppose that $Lu = 0$ and $Lv = 0$. Then either $u \equiv cv$ for some constant $c$, or the vectors $\binom{u(t)}{u'(t)}$ and $\binom{v(t)}{v'(t)}$ are linearly independent vectors in $\mathbb{R}^2$ for every $t \in [a, b]$.
Proof. We can check this. If $\binom{u(t_0)}{u'(t_0)}$ and $\binom{v(t_0)}{v'(t_0)}$ are linearly dependent, then either $\binom{u(t_0)}{u'(t_0)} = c_1\binom{v(t_0)}{v'(t_0)}$ or $\binom{v(t_0)}{v'(t_0)} = c_2\binom{u(t_0)}{u'(t_0)}$ (these cases are not the same when one of the vectors is $0$).
We can just consider the first case; the second case is the same. We then have $u(t_0) - c_1 v(t_0) = 0$ and $u'(t_0) - c_1 v'(t_0) = 0$, which means that $w(t_0) = w'(t_0) = 0$ for $w = u - c_1 v$. By the uniqueness theorem, this means that $w \equiv 0$, so $u \equiv c_1 v$. This proves the proposition.
Corollary 19.5. In the case $Lu = 0$ and $Lv = 0$, we have either $u = cv$ or $uv' - vu' \neq 0$ for all $t \in [a, b]$. Moreover, $p(uv' - vu')$ is constant.
Proof. The first statement is simply a restatement of the preceding proposition. Now, we have
\[ \frac{d}{dt}\left(p(uv' - vu')\right) = \frac{d}{dt}\left((pv')u - (pu')v\right) = (pv')'u + pv'u' - (pu')'v - pu'v' = -qvu + quv = 0, \]
so indeed $p(uv' - vu')$ is constant.
Here, the quantity $p(uv' - vu')$ is called the Wronskian.
Proposition 19.6. As before, suppose that $u, v \in C^2([a, b])$ with $Lu = 0$ and $Lv = 0$ and $u, v$ not identically zero. Assume that $0$ is not an eigenvalue of the Sturm–Liouville problem, and that we have the boundary conditions
\[ \alpha u(a) + \beta u'(a) = 0, \qquad \gamma v(b) + \delta v'(b) = 0, \]
i.e. each of $u$ and $v$ satisfies the boundary condition of the Sturm–Liouville problem at one endpoint. Then $u \neq cv$, and hence $p(uv' - vu') \equiv c \neq 0$ on $[a, b]$.
Proof. Otherwise $u$ and $v$ would be proportional, so each would satisfy both boundary conditions, and then either one would be a nonzero solution of the Sturm–Liouville problem with $\lambda = 0$, a contradiction.
Therefore, we see that $p(uv' - vu') \equiv c \neq 0$ on $[a, b]$.
20. 5/13
Recall that we are talking about the Sturm–Liouville eigenvalue problem 19.2. We have $Lu = (pu')' + qu$ for $p \in C^1([a, b])$ and $q \in C^0([a, b])$. Assume for the moment that $\lambda = 0$ is not an eigenvalue.
Recall that we showed last time that if $Lu = 0$ and $Lv = 0$, and each of $u$ and $v$ satisfies one of the two boundary conditions, then $uv' - vu' \neq 0$. Also $p(uv' - vu') = c$ is a constant, called the Wronskian. Note that we can always get such $u$ and $v$ due to the existence and uniqueness theorem. That is, we can solve $Lu = 0$ with $u(a) = c_1$, $u'(a) = c_2$, where $c_1\alpha + c_2\beta = 0$, and we can solve $Lv = 0$ with $v(b) = d_1$, $v'(b) = d_2$, where $d_1\gamma + d_2\delta = 0$.
Problem 20.1. For a given $g \in C^0([a, b])$, we want to solve
\begin{align*}
Lw &= g \\
\alpha w(a) + \beta w'(a) &= 0 \\
\gamma w(b) + \delta w'(b) &= 0.
\end{align*}
To do this, we use the method of variation of parameters. The goal is to find $w$ in the form $w = \varphi u + \psi v$ with some $\varphi$ and $\psi$ to be chosen.
Note that $w' = \varphi u' + \psi v' + u\varphi' + v\psi'$. Suppose that we stipulate that $\varphi'u + \psi'v = 0$. Then $(pw')' = (\varphi(pu'))' + (\psi(pv'))' = -qu\varphi + pu'\varphi' - qv\psi + pv'\psi' = -qw + p(u'\varphi' + v'\psi')$. Hence $Lw = (pw')' + qw = p(u'\varphi' + v'\psi') = g$ if and only if $u'\varphi' + v'\psi' = \frac{g}{p}$. Therefore we have two equations:
\[ \begin{cases} \varphi'u + \psi'v = 0 \\ u'\varphi' + v'\psi' = \frac{g}{p}. \end{cases} \]
This can be written in matrix form as
\[ \begin{pmatrix} u & v \\ u' & v' \end{pmatrix}\begin{pmatrix} \varphi' \\ \psi' \end{pmatrix} = \frac{g}{p}\begin{pmatrix} 0 \\ 1 \end{pmatrix}. \]
This can be solved by brute force, or by computing the inverse of the $2\times 2$ matrix, yielding
\[ \varphi' = -\frac{g}{c}v, \qquad \psi' = \frac{g}{c}u. \]
Hence, we have
\[ \varphi(t) = \int_t^b \frac{g(s)v(s)}{c}\, ds + A, \qquad \psi(t) = \int_a^t \frac{g(s)u(s)}{c}\, ds + B. \]
Thus, taking $A = B = 0$,
\[ w(t) = u(t)\int_t^b \frac{g(s)v(s)}{c}\, ds + v(t)\int_a^t \frac{g(s)u(s)}{c}\, ds. \]
We check the boundary conditions: $w(a) = u(a)\int_a^b \frac{g(s)v(s)}{c}\, ds$ and $w'(a) = u'(a)\int_a^b \frac{g(s)v(s)}{c}\, ds$. Then
\[ \alpha w(a) + \beta w'(a) = (\alpha u(a) + \beta u'(a))\int_a^b \frac{g(s)v(s)}{c}\, ds = 0. \]
Similarly, we have $w(b) = v(b)\int_a^b \frac{g(s)u(s)}{c}\, ds$ and $w'(b) = v'(b)\int_a^b \frac{g(s)u(s)}{c}\, ds$. Then again,
\[ \gamma w(b) + \delta w'(b) = (\gamma v(b) + \delta v'(b))\int_a^b \frac{g(s)u(s)}{c}\, ds = 0, \]
so our proposed solution $w(t)$ is actually the solution to this problem. That is,
\[ w(t) = \int_a^b \left[ H(s - t)u(t)v(s) + H(t - s)u(s)v(t) \right]\frac{g(s)}{c}\, ds = \int_a^b k(s, t)g(s)\, ds, \]
where $H$ is the Heaviside function.
Therefore, (T (h), g) = (T (g), h) for all h, g ∈ C 0 ([a, b]). Next time, we’ll show that this is
also true for h, g ∈ L2 .
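A numerical sketch of this formula in the simplest case $[0, \pi]$, $p \equiv 1$, $q \equiv 0$, $w(0) = w(\pi) = 0$ (assuming NumPy; here $u(t) = t$, $v(t) = t - \pi$, $c = \pi$, and for $g = \sin$ the exact solution of $w'' = \sin$ is $w = -\sin$):

    import numpy as np

    N = 20000
    s = np.pi * (np.arange(N) + 0.5) / N     # quadrature nodes on (0, pi)
    ds = np.pi / N
    g = np.sin(s)
    u = lambda t: t                          # Lu = u'' = 0, u(0) = 0
    v = lambda t: t - np.pi                  # Lv = v'' = 0, v(pi) = 0
    c = np.pi                                # Wronskian p(uv' - vu')

    # w(t) = u(t) int_t^b g v / c + v(t) int_a^t g u / c
    for t in (0.5, 1.5, 2.5):
        w = (u(t) * np.sum((g * v(s))[s >= t]) * ds / c
             + v(t) * np.sum((g * u(s))[s < t]) * ds / c)
        assert np.isclose(w, -np.sin(t), atol=1e-3)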
21. 5/16
We continue the discussion of the Sturm-Liouville eigenvalue problem 19.2.
We are assuming for the moment that zero is not an eigenvalue. We used this to get particular solutions $u, v$ with $Lu = 0$, $Lv = 0$, each satisfying half of the boundary conditions: $\alpha u(a) + \beta u'(a) = 0$ and $\gamma v(b) + \delta v'(b) = 0$, with $u \neq cv$ and $v \neq cu$. This means that $p(uv' - vu') = c \neq 0$. We'll remove this assumption later.
Recall that we can now solve the inhomogeneous problem 20.1. Our solution was
\[ w(t) = \int_a^b k(s, t)g(s)\, ds, \]
where $k(s, t) = k(t, s)$ and $k$ is continuous on the square $[a, b] \times [a, b]$. In fact, we have
\[ k(s, t) = \begin{cases} \dfrac{v(s)u(t)}{c} & s \ge t \\[1ex] \dfrac{u(s)v(t)}{c} & s \le t. \end{cases} \]
We also showed that this is $\varphi(t)u(t) + \psi(t)v(t)$, where
\[ \varphi(t) = \frac{1}{c}\int_t^b v(s)g(s)\, ds \qquad \text{and} \qquad \psi(t) = \frac{1}{c}\int_a^t u(s)g(s)\, ds. \]
This allowed us to define an operator $T : C^0([a, b]) \to C^2([a, b])$, and also $T : L^2[a, b] \to C^0([a, b])$. For $g, h \in C^0([a, b])$, we saw that $T$ is symmetric: $(T(g), h)_{L^2[a,b]} = (g, T(h))_{L^2[a,b]}$. We claim that this is also true for $g, h \in L^2[a, b]$.
Proposition 21.1. T is self-adjoint as an operator from L2 to L2 .
Proof. Recall that the continuous functions are dense in $L^2$, so for any $g, h \in L^2[a, b]$ there exist sequences $g_k, h_k \in C^0([a, b])$ of continuous functions with $\|g_k - g\|_{L^2} \to 0$ and $\|h_k - h\|_{L^2} \to 0$.
According to the self-adjointness property that we showed for continuous functions, we have $(T(g_k), h_k)_{L^2[a,b]} = (g_k, T(h_k))_{L^2[a,b]}$ for all $k$. Recall from last time that we also have $T(g)(t) = \int_a^b k(s, t)g(s)\, ds$. Hence, by Cauchy–Schwarz,
\[ |T(g)(t)| \le \int_a^b |k(s, t)||g(s)|\, ds \le \sqrt{\int_a^b k(s, t)^2\, ds}\;\|g\|_{L^2[a,b]}, \]
so $\max_{[a,b]} |T(g)| \le \max|k|\sqrt{b - a}\,\|g\|_{L^2[a,b]}$. Note that for any $f \in L^2[a, b]$, we have $\|f\|_{L^2} = \sqrt{\int_a^b f(s)^2\, ds} \le \max|f|\sqrt{b - a}$. Therefore, we've shown that $\|T(g)\|_{L^2[a,b]} \le \max|k|(b - a)\|g\|_{L^2[a,b]}$.
We now claim that (T(gk), hk)_{L²[a,b]} → (T(g), h)_{L²[a,b]} as k → ∞. We check this:
(T(gk), hk)_{L²[a,b]} = (T(gk) − T(g), hk) + (T(g), h) + (T(g), hk − h).
Here, the first term satisfies
|(T(gk) − T(g), hk)| ≤ ‖T(gk) − T(g)‖ ‖hk‖ = ‖T(gk − g)‖ ‖hk‖ ≤ max |k| (b − a) ‖gk − g‖ ‖hk‖ → 0
(note that ‖hk‖ is bounded, since hk converges in L²), and the last term satisfies |(T(g), hk − h)| ≤ ‖T(g)‖ ‖hk − h‖ → 0. Therefore, we see that indeed (T(gk), hk) → (T(g), h), and similarly (gk, T(hk)) → (g, T(h)).
Therefore, T : L²([a, b]) → C⁰([a, b]) ⊂ L²([a, b]) with (T(g), h)_{L²[a,b]} = (g, T(h))_{L²[a,b]}, and hence T is self-adjoint as an operator L² → L². □
In addition, we also have the following result:
Proposition 21.2. T : L²[a, b] → L²[a, b] is compact. In fact, it is Hilbert-Schmidt, i.e. there exists a complete orthonormal sequence f1, f2, . . . for L²[a, b] with Σ_{j=1}^∞ ‖T(fj)‖²_{L²[a,b]} < ∞.
Proof. Recall that for L²([0, 2π]) with the inner product (f, g) = (1/π) ∫_0^{2π} f(x)g(x) dx, we have already found an orthonormal sequence {1/√2, cos x, sin x, cos 2x, sin 2x, . . . }. To get such a sequence for [a, b], we just need to rescale. Define a new variable t, where x = 2π(t − a)/(b − a) and dx = (2π/(b − a)) dt. This yields the new inner product (f̃, g̃) = (2/(b − a)) ∫_a^b f̃(t)g̃(t) dt. Our new orthonormal sequence is then
1/√2, cos(2πn(t − a)/(b − a)), sin(2πn(t − a)/(b − a)), n = 1, 2, . . . .
Note that this is a complete orthonormal sequence, and all of these functions are in C⁰([a, b]).
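For instance, checking one of the normalizations directly (the others are identical substitutions): with x = 2π(t − a)/(b − a),
(2/(b − a)) ∫_a^b cos²(2πn(t − a)/(b − a)) dt = (1/π) ∫_0^{2π} cos²(nx) dx = 1.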
We now have that
T(fj)(t) = ∫_a^b k(s, t)fj(s) ds, so that (k(·, t), fj)_{L²[a,b]} = (2/(b − a)) T(fj)(t)
in the rescaled inner product. Bessel's identity, applied to the function s ↦ k(s, t), then tells us that
Σ_{j=1}^∞ ((2/(b − a)) T(fj)(t))² = (2/(b − a)) ∫_a^b k(s, t)² ds,
that is, Σ_{j=1}^∞ (T(fj)(t))² = ((b − a)/2) ∫_a^b k(s, t)² ds. Since k is bounded on the square, integrating in t gives Σ_{j=1}^∞ ‖T(fj)‖²_{L²[a,b]} < ∞, as required. □
In summary, combining this with the spectral theorem for compact self-adjoint operators, we have found a complete orthonormal sequence of functions for L²[a, b], each of which is in C²[a, b] and solves the Sturm-Liouville problem with our boundary conditions.
This completes the solution of the Sturm-Liouville eigenvalue problem, except that we still need to check that ker T = {0}.
This was a highly nontrivial result, so it wasn’t surprising that we had to work so hard
for it.
22. 5/18
Recall from last time that we still need to check that ker T = {0}.
Proposition 22.1. ker T = {0}.
Proof. Recall that for g ∈ L²[a, b], we have
T(g)(t) = ∫_a^b k(s, t)g(s) ds = u(t)ϕg(t) + v(t)ψg(t),
where ϕg(t) = (1/c) ∫_t^b v(s)g(s) ds and ψg(t) = (1/c) ∫_a^t u(s)g(s) ds.
By approximating g with continuous functions and using uniform convergence, we can take limits under the integral, and we see that
T(g)(t) = T(g)(a) + ∫_a^t (u′(s)ϕg(s) + v′(s)ψg(s)) ds.
Here, as we checked earlier, the integrand is a continuous function. This means that T(g) is C¹[a, b] and T(g)′(t) = u′(t)ϕg(t) + v′(t)ψg(t).
To summarize, we have that
T(g)(t) = u(t)ϕg(t) + v(t)ψg(t)
T(g)′(t) = u′(t)ϕg(t) + v′(t)ψg(t).
Therefore, if T(g) = 0, then
u(t)ϕg(t) + v(t)ψg(t) = 0
u′(t)ϕg(t) + v′(t)ψg(t) = 0.
This yields
\begin{pmatrix} u(t) & v(t) \\ u′(t) & v′(t) \end{pmatrix} \begin{pmatrix} ϕg(t) \\ ψg(t) \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.
The determinant of the matrix is the Wronskian uv′ − vu′ = c/p, which we know is nonzero. Hence, ϕg(t) ≡ 0 and ψg(t) ≡ 0, and we've derived that
∫_t^b v(s)g(s) ds ≡ 0 and ∫_a^t u(s)g(s) ds ≡ 0
for all t. We claim that g(s) = 0. If g were continuous, we could differentiate and use the
fundamental theorem of calculus; however, g ∈ L2 [a, b] and we have to be a bit more careful.
This is the content of the following lemma:
Lemma 22.2. If f ∈ L²[a, b] with ∫_a^t f(s) ds = 0 for all t, then f(x) = 0 almost everywhere.
Proof. There exists a sequence of step functions ϕk with ‖ϕk − f‖_{L²} → 0. For each ϕk there is an associated partition a = x0^(k) < x1^(k) < · · · < x_{Nk}^(k) = b, so that
ϕk = Σ_{j=1}^{Nk} aj^(k) χ_{[x_{j−1}^(k), x_j^(k)]}.
Then
‖ϕk − f‖²_{L²} = (ϕk − f, ϕk − f)_{L²[a,b]} = ‖ϕk‖² + ‖f‖² − 2(ϕk, f).
Now,
(ϕk, f) = ∫_a^b ϕk(s)f(s) ds = Σ_{j=1}^{Nk} aj^(k) ∫_{x_{j−1}^(k)}^{x_j^(k)} f(s) ds = 0,
because
∫_{x_{j−1}^(k)}^{x_j^(k)} f(s) ds = ∫_a^{x_j^(k)} f(s) ds − ∫_a^{x_{j−1}^(k)} f(s) ds = 0.
It follows that ‖f‖² = (f − ϕk, f) + (ϕk, f) = (f − ϕk, f) ≤ ‖f − ϕk‖_{L²} ‖f‖_{L²} → 0, so ‖f‖_{L²} = 0 and f = 0 almost everywhere. □
Thus, we’ve seen that v(s)g(s) = 0 and v(s)g(s) = 0 almost everywhere. We never have
u(s) = v(s) = 0 because the Wronskian is nonzero, so hence g(s) = 0 almost everywhere.
This shows that ker T = {0}.
Therefore, if 0 is not an eigenvalue, we know basically everything about the solutions of the Sturm-Liouville problem 19.2. That is, there exists a complete orthonormal sequence h1, h2, . . . for L²[a, b] with hj ∈ C²[a, b] and T(hj) = λj hj, where λj ≠ 0 and λj → 0. Here, −Lhj = (1/λj) hj, and the hj satisfy the boundary conditions.
So far, we’ve assumed that 0 is not an eigenvalue of the Sturm-Liouville eigenvalue problem
19.2:
−Lu = λu on [a, b]
αu(a) + βu0 (a) = 0 for given α, β not both zero
γu(b) + δu0 (b) = 0 for given γ, δ not both zero.
Proposition 22.3. There exists µ0 > 0 such that no λ ≤ −µ0 is an eigenvalue. (Hence, 1/λj → +∞.)
Proof. Suppose that −Lu = λu for u ≠ 0, and suppose that we have the boundary condition αu(a) + βu′(a) = 0. If β = 0, we have u(a) = 0; otherwise, β ≠ 0 and u′(a) = −(α/β)u(a).
Consider u²(t). This is a continuous function on a closed interval, so it attains its minimum. Let y ∈ [a, b] be a point where u²(t) attains its minimum. Then
∫_a^y (u²)′ = u²(y) − u²(a).
We are now almost done, and we’ll finish this next time.
23. 5/20
We’re trying to prove that there is a fixed lower bound on all of the eigenvalues.
Finishing the proof of 22.3. Recall that we showed that if −Lu = λu (and u ≠ 0), then
−λ‖u‖² < −∫_a^b (p(u′)² − qu²) + ((|c| + |d|)/(b − a)) ∫_a^b u² + 2(|c| + |d|) ∫_a^b |u||u′|.
We will now use a tricky little inequality: |ab| ≤ ½(a² + b²), which is equivalent to (|a| − |b|)² ≥ 0. For any ε > 0, we can write this as
|ab| = |(√(2ε) a)(b/√(2ε))| ≤ εa² + b²/(4ε).
Applying this to the last term (with |u′| in the role of a and 2(|c| + |d|)|u| in the role of b), and setting C = max |q| + (|c| + |d|)/(b − a), we have
−λ‖u‖² < −δ ∫_a^b (u′)² + C ∫_a^b u² + ε ∫_a^b (u′)² + ((|c| + |d|)²/ε) ∫_a^b u²,
where δ > 0 is a lower bound for p. Taking ε = δ, the (u′)² terms cancel, and we conclude that −λ‖u‖² < C′‖u‖² with C′ = C + (|c| + |d|)²/δ. Hence λ > −C′, so the proposition holds with µ0 = C′. □
Recall that if µ1, µ2, µ3, . . . are the eigenvalues, then we showed that |µj| → ∞. Now that we've shown that they are all bounded below, it follows that µj → +∞ and that there are only finitely many negative eigenvalues µj.
Furthermore, we can now get rid of that annoying assumption that 0 is not an eigenvalue. Let L0u = (pu′)′ + (q − µ0)u = Lu − µ0u. This means that −L0u = λu if and only if −Lu = (λ − µ0)u, and the boundary conditions for u are the same for L and L0.
Thus, 0 is not an eigenvalue of L0 (that would make −µ0 an eigenvalue of −L, which Proposition 22.3 rules out). All of our preceding results therefore apply: there exists a complete orthonormal sequence h1, h2, . . . for all of L²[a, b] with corresponding eigenvalues µ′1, µ′2, . . . . Then −L0hj = µ′j hj, and hence −Lhj = µj hj for µj = µ′j − µ0. The same result therefore holds in general: we have a complete orthonormal sequence of eigenvectors, and we know that the eigenvalues satisfy µj → ∞ as j → ∞.
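For example, for −u″ = λu on [0, π] with the Neumann conditions u′(0) = u′(π) = 0 (i.e. (α, β) = (γ, δ) = (0, 1), p ≡ 1, q ≡ 0), the eigenfunctions are cos jx with eigenvalues µj = j², j = 0, 1, 2, . . . , so 0 genuinely is an eigenvalue (with the constant eigenfunction). Passing to L0 shifts the eigenvalues to µ′j = j² + µ0, which are all nonzero, and the preceding theory applies.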
23.1. Application: Heat Flow. We now consider an application of this discussion to
partial differential equations. This application is heat flow.
Suppose you have a region of Rn . (The physically useful situations are n = 1, 2, 3.) Say
this region is made of some homogeneous and isotropic material. Isotropic means that the
material is not made of crystals, so heat has no preference for flowing in a particular direction.
Let u(x, t) be the temperature at position x and time t.
There is a very accurate model for heat flow. Here are the physical assumptions:
(1) The quantity of heat in any ball Bρ(y) is λ ∫_{Bρ(y)} u(x, t) dx, where λ is some constant depending on the material.
(2) Heat should flow in the direction −∇x u(x, t), and the rate of flow should be proportional to |∇u|. The rate of flow of heat across the boundary ∂Bρ(y) should be µ ∫_{∂Bρ(y)} η · ∇u, where η is the unit normal of ∂Bρ(y).
Hence, we should have that
λ (d/dt) ∫_{Bρ(y)} u(x, t) dx = µ ∫_{∂Bρ(y)} η · ∇u.
We assume that u is a C¹ function, so we can put the derivative under the integral sign. (This is just a model, and the temperature is close enough to a smooth function that assuming this is not problematic.) Then we can apply Gauss's theorem to see that
λ ∫_{Bρ(y)} ∂u/∂t = µ ∫_{∂Bρ(y)} η · ∇u = µ ∫_{Bρ(y)} div(∇u) = µ ∫_{Bρ(y)} ∆u.
Here, we used the fact that div(∇u) = ∆u, where ∆ is the Laplacian.
Summarizing, we see that for any ball Bρ(y) in the material,
∫_{Bρ(y)} ∂u/∂t = (µ/λ) ∫_{Bρ(y)} ∆u.
This means that
(1/|Bρ(y)|) ∫_{Bρ(y)} ∂u/∂t = (µ/λ) (1/|Bρ(y)|) ∫_{Bρ(y)} ∆u,
and letting ρ → 0, we have that
∂u/∂t = (µ/λ) ∆u.
23.1.1. Example: n = 1. Let's now apply this in the case n = 1. This is the case of a metal bar, and there are various boundary conditions. We could fix u(b) ≡ 0 to keep an end at a fixed temperature. Alternatively, we could insulate an end to prevent heat flow, yielding ∂u/∂x(b) ≡ 0.
Problem 23.1. In this case, our problem has boundary conditions and initial conditions:
∂u/∂t = ∂²u/∂x², t > 0, a < x < b
αu(a, t) + β ∂u/∂x(a, t) = 0, t > 0
γu(b, t) + δ ∂u/∂x(b, t) = 0, t > 0
u(x, 0) = ϕ(x), a ≤ x ≤ b.
Remark. There is no loss of generality in dropping the constant µ/λ, simply by rescaling time via t ↦ (µ/λ)t.
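To check this: if ũ(x, τ) = u(x, (λ/µ)τ), i.e. τ = (µ/λ)t, then
∂ũ/∂τ = (λ/µ) ∂u/∂t = (λ/µ)(µ/λ) ∆u = ∆ũ,
so in the rescaled time variable the constant disappears.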
First, we look for separated variable solutions, i.e. u(x, t) = f (x)g(t). This won’t solve
the problem completely, but it’s a good start.
We have
∂u/∂t = f(x)g′(t)
∂²u/∂x² = f″(x)g(t),
so our PDE becomes f(x)g′(t) = f″(x)g(t). This is equivalent to
g′(t)/g(t) = f″(x)/f(x)
at points where g(t) ≠ 0 and f(x) ≠ 0. How could that be? One side is a function of t and the other is a function of x, so they can be equal only when both are constant. We can therefore get a family of many solutions as follows: take λ ∈ R, and solve
g′(t) = −λg(t), t ≥ 0
f″(x) = −λf(x), a ≤ x ≤ b.
We can solve the first equation easily; this is just g(t) = Ce^{−λt} for some constant C. We know how to solve second order equations, so we can also do the second equation. We're interested in taking u = f(x)g(t) while satisfying as many of our conditions as possible, so we'll make u also satisfy the boundary conditions. Observe that the boundary conditions for u translate directly into boundary conditions on f, so we want to solve
−f″ = λf
αf(a) + βf′(a) = 0
γf(b) + δf′(b) = 0.
This is a Sturm-Liouville problem that we know how to solve. There exists a complete
orthonormal sequence of eigenfunctions h1 , h2 , . . . with corresponding λ1 , λ2 , . . . with λj →
∞.
That is, we have solutions cj hj(x)e^{−λj t} for j = 1, 2, . . . . These satisfy the partial differential equation and the boundary conditions. Now we have to try to satisfy the initial conditions. The eigenfunctions form a complete orthonormal sequence, so for ϕ ∈ L²[a, b] we can pick cj = (ϕ, hj) to get that
Σ_{j=1}^∞ cj hj(x)e^{−λj t} = ϕ
at t = 0, in the L² norm.
So it looks like we’re done! There’s still something left to check; we need to make sure
that this series remains smooth and still satisfies the PDE. There’s still some checking and
a little way to go, but that’s the general idea.
24. 5/23
Recall that we were considering the heat equation problem 23.1.
We showed that u(x, t) = Σ_{j=1}^∞ cj e^{−λj t} hj(x) is a candidate solution, where λj is the j-th eigenvalue of a Sturm-Liouville problem, hj is the j-th eigenfunction from Sturm-Liouville, and cj = (ϕ, hj)_{L²[a,b]}.
By construction, each partial sum is a solution of the heat equation satisfying the boundary conditions. Consider the initial conditions; at t = 0, we have Σ cj hj = ϕ. There are a number of things to check: does this sum converge? Is it continuous?
To check these properties, we need to use the Weierstrass M-test: if functions fn satisfy |fn(x)| ≤ Mn for all x, where Σ Mn < ∞, then Σ fn converges uniformly. Indeed, writing SN for the partial sums and S for the pointwise limit, given ε > 0 we can choose J with Σ_{n=J+1}^∞ Mn < ε; then for every N > J,
|SN(x) − S(x)| = |Σ_{n=N+1}^∞ fn(x)| ≤ Σ_{n=N+1}^∞ |fn(x)| ≤ Σ_{n=N+1}^∞ Mn < ε
for every x. This means that sup_{x∈X} |SN(x) − S(x)| ≤ ε for every N > J, which gives us uniform convergence.
We only have finite time, so let's only do this under the simplest boundary conditions. Consider the special case of the interval [0, π] with boundary conditions u(0, t) = 0 and u(π, t) = 0. In the notation of problem 23.1, this corresponds to (α, β) = (γ, δ) = (1, 0).
In this case, we know the eigenvalues of the Sturm-Liouville problem: λj = j² for j = 1, 2, . . . , with hj(x) = sin jx. Here, we use the L² inner product with the appropriate scaling to make hj have norm one: (f, g) = (2/π) ∫_0^π f(x)g(x) dx. Then we have
u(x, t) = Σ_{j=1}^∞ cj e^{−j²t} sin jx,
where
cj = (2/π) ∫_0^π ϕ(x) sin jx dx.
Proof. We will check that this converges using the Weierstrass M-test. To do this, we will also assume that ϕ ∈ C²([0, π]) satisfies the boundary conditions, so that ϕ(0) = ϕ(π) = 0. Note that here, u(0, t) = u(π, t) = 0.
We can integrate by parts twice, observing that all boundary terms are zero, to get that
cj = (2/π) ∫_0^π ϕ(x) sin jx dx = −(2/(jπ)) [ϕ(x) cos jx]_0^π + (2/(jπ)) ∫_0^π ϕ′(x) cos jx dx
= (2/(j²π)) [ϕ′(x) sin jx]_0^π − (2/(j²π)) ∫_0^π ϕ″(x) sin jx dx = −(2/(j²π)) ∫_0^π ϕ″(x) sin jx dx,
and hence
|cj| ≤ 2 max_{[0,π]} |ϕ″| / j²
for j = 1, 2, . . . . We claim that Σ_{j=1}^∞ cj e^{−j²t} sin jx converges uniformly on [0, π] × [0, ∞). We can now check this:
|cj e^{−j²t} sin jx| ≤ 2 max_{[0,π]} |ϕ″| / j²,
which is the j-th term of a convergent series (a constant times Σ 1/j²). This is what we call Mj in the Weierstrass M-test. The Weierstrass M-test therefore shows uniform convergence. □
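For a concrete illustration: take ϕ(x) = x(π − x), which is C² with ϕ(0) = ϕ(π) = 0. Integrating by parts twice, as above, gives
cj = (2/π) ∫_0^π x(π − x) sin jx dx = (4/(πj³))(1 − (−1)^j),
i.e. cj = 8/(πj³) for odd j and cj = 0 for even j. In particular |cj| ≤ 4/j² = 2 max |ϕ″| / j², matching the bound just proved (here max |ϕ″| = 2).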
The uniform limit of continuous functions is continuous, which shows the following corollary:
Corollary 24.3. Σ_{j=1}^∞ cj e^{−j²t} sin jx is continuous on [0, π] × [0, ∞). Hence the boundary conditions and initial conditions are satisfied.
Now, we just need to show that u(x, t) satisfies the heat equation. To do this, we need to be able to differentiate; this will come from the following general result:
Theorem 24.4 (Differentiation of series). Suppose we have a sequence of C¹ functions fn : [c, d] → R for n = 1, 2, . . . ; suppose that Σ_{n=1}^∞ fn(x) is convergent for all x ∈ [c, d]; and suppose that Σ_{n=1}^∞ fn′(x) is uniformly convergent on [c, d]. Then f(x) = Σ_{n=1}^∞ fn(x) is C¹ on [c, d] and f′(x) = Σ_{n=1}^∞ fn′(x).
Remark. We have to be careful: there are counterexamples where we cannot differentiate term by term inside an infinite series.
To check that Σ fn′(x) is uniformly convergent, the Weierstrass M-test gives a sufficient condition: |fn′(x)| ≤ Mn where Σ Mn is convergent.
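For instance, a standard counterexample: Σ_{n=1}^∞ sin(nx)/n² converges uniformly on R by the M-test with Mn = 1/n², but the termwise differentiated series Σ_{n=1}^∞ cos(nx)/n fails to converge at x = 0 (there it is the harmonic series), so Theorem 24.4 does not apply and termwise differentiation is not justified.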
Proof. We can differentiate a finite sum without any problems, so that
(d/dx) Σ_{n=1}^N fn(x) = Σ_{n=1}^N fn′(x).
Let g(x) = Σ_{n=1}^∞ fn′(x); as a uniform limit of continuous functions, g is continuous, and by uniform convergence we can integrate term by term:
∫_c^x g(s) ds = lim_{N→∞} Σ_{n=1}^N (fn(x) − fn(c)) = f(x) − f(c),
so the fundamental theorem of calculus gives that f is C¹ with f′ = g. □
Returning to the heat series: on any region [0, π] × [c, ∞) with c > 0, the termwise t-differentiated series is clearly uniformly convergent by the Weierstrass M-test. Indeed, for t ≥ c,
|cj j² e^{−j²t} sin jx| ≤ 2 max_{[0,π]} |ϕ″| e^{−j²c} ≤ 2 max_{[0,π]} |ϕ″| e^{−jc},
which is again the j-th term of a convergent series. Therefore the termwise differentiation theorem 24.4 applies and tells us that, for t > 0,
∂u/∂t = −Σ_{j=1}^∞ j² cj e^{−j²t} sin jx.
Similarly, in an almost identical argument, for fixed t > 0 we see that the termwise differentiated series with respect to x,
Σ_{j=1}^∞ cj j e^{−j²t} cos jx,
is uniformly convergent by the Weierstrass M-test; indeed, |cj j e^{−j²t} cos jx| ≤ 2 max |ϕ″| e^{−jt} is the j-th term of a convergent series. Also, differentiating a second time, we see that
−Σ_{j=1}^∞ j² cj e^{−j²t} sin jx
is uniformly convergent in the same way. Hence, by Theorem 24.4 applied in x,
∂²u/∂x² = −Σ_{j=1}^∞ j² cj e^{−j²t} sin jx = ∂u/∂t,
so u satisfies the heat equation for t > 0.
25. 5/25
Now, we need to prove our preliminaries. First, we'll prove the maximum principle for subharmonic functions.
Proposition 25.6. Let u ∈ C²(BR) ∩ C⁰(B̄R), where BR = {x ∈ Rⁿ : ‖x‖ < R} and B̄R is the closed ball, and suppose that ∆u ≥ 0 on BR. (This means that the function is twice differentiable on the open ball and is continuous on the closed ball.) Then max_{B̄R} u = max_{∂BR} u.
Here, ∆u = Σ_{j=1}^n ∂²u/∂xj² is the Laplacian.
Proof. Let ε > 0, and let v(x) = u(x) + ε‖x‖². Here, ∆v = ∆u + 2nε > 0 in BR.
First, suppose there is some y ∈ BR such that v(y) = max_{B̄R} v. Then
∂v/∂xj(y) = 0 and ∂²v/∂xj²(y) ≤ 0 for each j,
which implies that ∆v(y) ≤ 0; this is a contradiction to the statement that ∆v > 0. (We defined v to have strictly positive Laplacian precisely to make this work.) Hence the maximum of v is attained on ∂BR, and
max_{B̄R} u ≤ max_{B̄R} v = max_{∂BR} v ≤ max_{∂BR} u + εR²
for all ε > 0, so therefore max_{B̄R} u ≤ max_{∂BR} u. The reverse inequality is trivial, so we've proven the maximum principle. □
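As a quick illustration: u(x) = ‖x‖² has ∆u = 2n ≥ 0, and indeed max_{B̄R} u = R² is attained exactly on ∂BR. Likewise any harmonic function, such as u(x1, x2) = x1² − x2² in R², attains its maximum over the closed ball on the boundary.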
26. 5/27
Today we will prove the Hahn-Banach theorem. To reiterate, here is the statement of the
theorem:
Theorem 26.1 (Hahn-Banach theorem). Let X be any real vector space, and suppose that p : X → R is positively homogeneous (i.e. p(tx) = tp(x) for all t > 0 and all x ∈ X) and subadditive (i.e. p(x + y) ≤ p(x) + p(y) for all x, y ∈ X). Let S0 be any subspace of X, and suppose that f0 : S0 → R is linear with f0(x) ≤ p(x) for all x ∈ S0.
Then there exists an extension f : X → R that is linear with f|S0 = f0 and f(x) ≤ p(x) for all x ∈ X.
As we discussed last time, we are interested in a corollary:
Corollary 26.2. Let X be any normed space, and take p(x) = ‖x‖. Consider some arbitrary y ∈ X \ {0} and S0 = span{y}. Then f0(ty) = t‖y‖ is linear on S0. The Hahn-Banach theorem 26.1 then implies that there exists f : X → R with f|S0 = f0 and f(x) ≤ ‖x‖ for every x ∈ X. This implies that f is a nontrivial bounded linear functional.
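In fact, slightly more is true: since f is linear, −f(x) = f(−x) ≤ ‖−x‖ = ‖x‖, so |f(x)| ≤ ‖x‖ for all x ∈ X; combined with f(y) = ‖y‖, this gives ‖f‖ = 1. In particular, for every y ≠ 0 there is f ∈ X* with ‖f‖ = 1 and f(y) = ‖y‖, so the bounded linear functionals separate the points of X.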
We are now ready to attack the proof of the Hahn-Banach theorem.
Proof of Hahn-Banach theorem 26.1. Suppose that S1 is a subspace of X which contains S0, such that there exists f1 : S1 → R which is linear with f1|S0 = f0 and f1(x) ≤ p(x) for every x ∈ S1.
If S1 ≠ X, then we can pick a vector a ∈ X \ S1, and we can define a new subspace S2 = {x + ta : x ∈ S1, t ∈ R}. We can define an extension f2 : S2 → R by f2(x + ta) = f1(x) + tλ for x ∈ S1, where λ = f2(a) is a number yet to be chosen. This is clearly linear, and it is an extension of f1: taking t = 0 gives f2|S1 = f1.
Here is the main step of the proof:
Claim. There exists a choice of λ such that f2 (x) ≤ p(x) for all x ∈ S2 .
Proof. In the case t = +1, we want f2(x + a) = f1(x) + λ ≤ p(x + a) for all x ∈ S1, and in the case t = −1, we want f2(y − a) = f1(y) − λ ≤ p(y − a) for all y ∈ S1. Now, f1(x) + f1(y) = f1(x + y) ≤ p(x + y) for all x, y ∈ S1, and we use subadditivity in a somewhat tricky way to see that
f1(x) + f1(y) ≤ p(x + y) = p((x + a) + (y − a)) ≤ p(x + a) + p(y − a),
which means that
f1(y) − p(y − a) ≤ p(x + a) − f1(x)
for all x, y ∈ S1. By taking the supremum over y for fixed x, we see that
sup_{y∈S1} (f1(y) − p(y − a)) ≤ p(x + a) − f1(x).
That's great, because there's a number that fits in between the left side and the infimum over x of the right side (possibly equal to both). Choose λ to be this number. That is, there exists λ ∈ R with f1(y) − p(y − a) ≤ λ ≤ p(x + a) − f1(x) for every x, y ∈ S1. This gives us two inequalities:
f1(x) + λ ≤ p(x + a) for every x ∈ S1
f1(y) − λ ≤ p(y − a) for every y ∈ S1.
If t > 0, the first inequality and positive homogeneity give
f2(x + ta) = f1(x) + tλ = t(f1(x/t) + λ) ≤ t p(x/t + a) = p(x + ta).
Similarly, if t < 0, the second inequality gives
f2(x + ta) = f1(x) + tλ = |t|(f1(x/|t|) − λ) ≤ |t| p(x/|t| − a) = p(x + ta).
The case t = 0 is immediate, since f2(x) = f1(x) ≤ p(x). □
That was the clever part of the proof.
It's starting to look like the theorem is true: if we can extend at all, we've shown that we can extend a bit further. It's tempting to say that we're done, but that's not quite right, because we are working with infinite dimensional spaces. We need to do something a bit more sophisticated.
Let S be the set of ordered pairs (S, fS ) such that S is a subspace of X which contains
S0 and fS : S → R is linear with fS (x) ≤ p(x) for all x ∈ S, and fS |S0 = f0 . Here, we’ve
collected all extensions of f0 . We can define a partial ordering on S.
Definition 26.3. Recall that a partial ordering ⪯ on any set Q means that x ⪯ x for all x ∈ Q; that x ⪯ y and y ⪯ x implies x = y; and that x ⪯ y and y ⪯ z implies x ⪯ z. This differs from a total ordering because two elements of Q do not have to be related in a partial order.
Q is totally ordered (or a "chain") if x, y ∈ Q implies that either x ⪯ y or y ⪯ x.
Example 26.4. The real numbers are totally ordered (and hence partially ordered). The
inclusion of sets is an example of a partial order.
We can define our partial ordering on S by declaring (S, fS) ⪯ (T, fT) to mean that S ⊂ T and fT|S = fS. Suppose that 𝒯 ⊂ S is a chain, and let T̄ = ∪_{(T,fT)∈𝒯} T.
Claim. We claim that T̄ is a subspace.
Normally, this would be nonsense; the union of two subspaces is not in general a subspace. However, here we have a total ordering, and we will use this to check the claim.
Proof. Take any x, y ∈ T̄ and α, β ∈ R. Then x ∈ T1, y ∈ T2 for some (T1, fT1), (T2, fT2) ∈ 𝒯. Since 𝒯 is totally ordered, we know that either T1 ⊂ T2 or T2 ⊂ T1. Assume without loss of generality that T1 ⊂ T2. Then x, y ∈ T2, so therefore αx + βy ∈ T2 ⊂ T̄ as well, and we are done. □
We can now define a function f̄ : T̄ → R. For x ∈ T̄, define f̄(x) = fT(x) for any T with (T, fT) ∈ 𝒯 and x ∈ T. This is unambiguous due to our total ordering.
Note that f̄ : T̄ → R is linear. To see this, consider α, β ∈ R and x, y ∈ T̄ as above. Then we know that x, y ∈ T2 for some T2, so that f̄(αx + βy) = fT2(αx + βy) = αfT2(x) + βfT2(y) = αf̄(x) + βf̄(y). Clearly, we also have f̄(x) ≤ p(x). In particular, this shows that (T̄, f̄) ∈ S. By construction, (T, fT) ⪯ (T̄, f̄) for every (T, fT) ∈ 𝒯, i.e. (T̄, f̄) is an upper bound for 𝒯 relative to this partial order.
These are all of the hypotheses that we need for Zorn's Lemma:
Lemma 26.5 (Zorn's Lemma). If S is any partially ordered set such that every chain 𝒯 ⊂ S has an upper bound in S, then S has at least one maximal element.
That is, there exists (S, fS) ∈ S such that if (S, fS) ⪯ (T, fT) for some (T, fT) ∈ S, then (S, fS) = (T, fT).
We now claim that the domain of this maximal element is the whole space: S = X. Indeed, if S ≠ X, then by the first part of the proof we can find S1 ⊋ S and an extension of fS to S1, which contradicts the maximality of (S, fS). This proves the Hahn-Banach theorem. □
27. 6/1
It is impossible to review everything, so we’ll give a brief and sketchy overview of most
(but not all) of the main topics.
We had inner product spaces and normed spaces, and we usually denote inner products as (x, y). We checked that √(x, x) is a norm, called the inner product norm. In particular, this means that inner product spaces are contained in normed spaces. To do this, we needed Cauchy-Schwarz: |(x, y)| ≤ ‖x‖ ‖y‖, and we had the triangle inequality: ‖x + y‖ ≤ ‖x‖ + ‖y‖.
We had some special results about finite-dimensional normed spaces.
(1) We showed that in such spaces, all norms are equivalent. This means that if ‖·‖1 and ‖·‖2 are any two norms, then there exists a constant C such that C⁻¹‖x‖1 ≤ ‖x‖2 ≤ C‖x‖1 for every x ∈ X.
(2) Any closed bounded subset is compact. In particular, the closed unit ball is compact.
Of course, these fail miserably in infinite dimensional spaces. In fact, in an infinite dimensional space, the closed unit ball is never compact. The proof was that we showed there exists a sequence e1, e2, . . . with ‖ej‖ = 1 for every j such that ‖ei − ej‖ ≥ 1 for every i ≠ j, which violates sequential compactness. This provided a major contrast between finite and infinite dimensional space.
We then talked about complete spaces. A complete inner product space is called a Hilbert
space and a complete normed space is called a Banach space.
Example 27.1. For example, Rⁿ and Cⁿ are the standard examples of real and complex finite-dimensional Hilbert spaces, with the inner products (x, y) = x · y and (z, w) = z · w̄. Other Hilbert spaces included ℓ²_R and ℓ²_C, and most importantly, L²[a, b] and L²_C[a, b] with inner products (f, g) = ∫_{[a,b]} f ḡ.
In a Hilbert space, we could discuss the parallelogram identity with the inner product norm:
‖x − y‖² + ‖x + y‖² = 2(‖x‖² + ‖y‖²).
Using this, we proved the nearest point property: if A is a closed convex subset of H and x ∈ H, then there exists a unique a ∈ A with ‖x − a‖ < ‖x − y‖ for all y ∈ A \ {a}. A special case of this is when A is a closed linear subspace, in which case the nearest point a ∈ A has the additional property that (x − a) ⊥ A.
This brings us to orthogonality. If E ⊂ H is any nonempty subset, then the orthogonal complement of E is E⊥ = {x ∈ H : (x, e) = 0 for all e ∈ E}. We checked that E⊥ is a closed linear subspace. We also had a theorem: if M is any closed linear subspace, then
(1) M⊥ ∩ M = {0}
(2) every x ∈ H can be written x = y + z with y ∈ M and z ∈ M⊥
(3) (M⊥)⊥ = M.
The proof of (2) uses the nearest point property, and the proof of (3) uses (2) to show one of the two inclusions.
We then discussed orthonormal sequences in H. Suppose that e1, . . . , eN is a finite orthonormal sequence, and write cj = (x, ej). We had a basic identity from which many results followed:
‖x − Σ_{j=1}^N λj ej‖² = ‖x‖² + Σ_{j=1}^N |cj − λj|² − Σ_{j=1}^N |cj|².
In particular, the nearest point of span{e1, . . . , eN} to x is Σ_{j=1}^N cj ej. In this case, our basic identity reduces to
‖x − Σ_{j=1}^N cj ej‖² = ‖x‖² − Σ_{j=1}^N |cj|²,
which is Bessel's identity.
We can also take an infinite orthonormal sequence e1, e2, . . . . This allows several conclusions:
(1) Σ_{j=1}^∞ cj ej always converges. We proved this by checking that the partial sums form a Cauchy sequence. That is, there exists y such that
‖y − Σ_{j=1}^N cj ej‖ → 0
as N → ∞.
(2) Σ_{i=1}^∞ |ci|² ≤ ‖x‖². This is Bessel's inequality.
(3) x = Σ_{j=1}^∞ cj ej if and only if equality holds in Bessel's inequality. In this case, this is called Bessel's identity.
This leads us to our next definition. The orthonormal sequence e1, e2, . . . is complete if x = Σ_{j=1}^∞ cj ej for every x. We showed that the following are equivalent:
(1) e1, e2, . . . is complete
(2) equality holds in Bessel's inequality for every x ∈ H
(3) no x ≠ 0 satisfies (x, ej) = 0 for all j
(4) span{e1, e2, . . . } is dense in H.
Our key example of a complete orthonormal sequence is in the case H = L²_C[−π, π]. We showed that {e^{inx}}_{n=0,±1,±2,...} is a complete orthonormal sequence with the inner product (f, g) = (1/2π) ∫_{−π}^π f(t) ḡ(t) dt. This was a long story; the proof involved the Fejér kernel, and it's hard to overstate the importance of this.
We digressed a little bit because we didn't fully understand L²_C. To understand this, we discussed the Lebesgue integral. We discussed:
(1) The definition of measure zero.
(2) Step functions.
(3) The main technical theorem:
(a) If ϕk is an increasing sequence of step functions and ∫_a^b ϕk is bounded, then {ϕk(x)}_{k=1,2,...} is bounded for almost every x ∈ [a, b].
(b) If ψk is another increasing sequence of step functions and lim ϕk = lim ψk almost everywhere, then lim ∫_a^b ϕk = lim ∫_a^b ψk.
(4) The definition of ∫_{[a,b]} f for f ∈ L⁰.
(5) L¹ = {g − h : g, h ∈ L⁰} is a linear space.
(6) Properties of the integral. For example, if fk ≥ 0 and ∫_{[a,b]} fk → 0, then there exists a subsequence fkj with fkj(x) → 0 for almost every x ∈ [a, b].
(7) L² = {f : f ∈ L¹, f² ∈ L¹} is a linear space.
(8) Completeness.
We then discussed linear operators. Here, X is a normed space, and we let X* be the set of bounded linear functionals, known as the dual space. This is a Banach space (even if X is not complete). Furthermore, if X and Y are both normed spaces, then we defined L(X, Y) = {bounded linear operators X → Y}. This has the operator norm
‖T‖ = sup_{x≠0} ‖T(x)‖/‖x‖ = sup_{‖x‖=1} ‖T(x)‖.